Hi

Well, thanks to wbx, brutefir runs again! :-)
I reran the performance test on my core2duo machine on an asus-p5bvm in adk with kernel 4.4.50-1.
For the 30:00 test file it now takes 18 seconds!
That is 40% less runtime than the earlier runs using adk on the same machine (around 30 seconds)! :-)
It also matches the runtime I had on this machine running lfs with kernel 3.10.10, which is encouraging.
However, I do not know where the performance boost comes from. Does anyone have an idea?
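
For anyone comparing their own numbers: "less runtime" and "faster" are two different figures for the same pair of measurements. A minimal sketch of the arithmetic, assuming awk is available; runtime_reduction and speedup are hypothetical helper names, not part of adk or brutefir:

```shell
# Hypothetical helpers, not part of adk or brutefir.

# runtime_reduction OLD NEW: percent by which the runtime went down.
runtime_reduction() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (old - new) / old * 100 }'
}

# speedup OLD NEW: how many times faster the new run is.
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", old / new }'
}

runtime_reduction 30 18   # prints 40
speedup 30 18             # prints 1.67
```

So going from around 30 seconds to 18 seconds is 40% less runtime, or about 1.67 times the throughput.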

The next step will be to rerun the test on the pi2, to see if it runs faster there as well.

Cheers
Oliver



----- Forwarded Message -----
From: "lich000king@yahoo.de" <lich000king@yahoo.de>
To: "dev@openadk.org" <dev@openadk.org>
Sent: 21:09 Wednesday, 2 December 2015
Subject: BruteFIR performance test

Hi there

So, I have also run the test on my Hummingboard i2ex. Here it takes around 4.5 minutes, i.e. roughly 35-40% longer than on the pi 2 (3 m 10 s to 3 m 20 s).
On the core2duo machine I have also repeated the test in adk. It takes around 30 seconds, more than 50% longer than on the same machine running lfs (linux 3.10.10). This may well be due to an unoptimized kernel config. I will need to delve deeper into this to find out.

Any ideas and comments are welcome :-)

Cheers
Oliver

----- Forwarded Message -----
From: lich000king <lich000king@yahoo.de>
To: dev@openadk.org
Sent: 20:47 Tuesday, 27 October 2015
Subject: BruteFIR performance test

Hi again

Performed the same test on my core2duo machine (approx. 10 years old).
It takes around 18 seconds. So it is roughly 10x faster.
Does this sound reasonable or should I expect better performance from the raspberry pi 2?
The test is designed to use two cores only.

I have not managed to set up OpenADK on this machine yet, so this test ran on lfs (linux 3.10.10-rt7). But this should not affect the results much.

I also tried to run the test on the hummingboard. But there BruteFIR gives me an error which I don't quite understand:

BruteFIR v1.0m (November 2013)                                (c) Anders Torger

Internal resolution is 32 bit floating point.
Creating 4 FFTW plans of size 8192...finished.
Loading coefficient set...finished.
Failed to open module "file" in "/usr/lib/brutefir/file.bfio": /usr/lib/brutefir/file.bfio: undefined symbol: __aeabi_idivmod.

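It looks like a linking problem rather than a floating point one: __aeabi_idivmod is the ARM EABI helper for integer division with remainder, which is normally provided by libgcc.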
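
To see which symbols the module expects the loader to resolve, its undefined symbols can be listed. A minimal sketch, assuming binutils nm is available (on the target or from the cross toolchain); undefined_symbols is a hypothetical helper name:

```shell
# Hypothetical helper: read `nm -D` output on stdin and print the names of
# undefined symbols (type "U"), i.e. those that must be resolved at load time.
undefined_symbols() {
    awk '$1 == "U" { print $2 }'
}

# Usage on the target (assumes nm is installed there):
#   nm -D /usr/lib/brutefir/file.bfio | undefined_symbols | grep aeabi
```

If __aeabi_idivmod shows up in that list, the module was built without the library that provides it being linked in.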

If anyone has any insight into any of these, it will be much appreciated. :-)

Cheers
Oliver





On 18.10.2015 14:58, lich000king wrote:
Hi everyone

I am still wondering if it is possible to improve brutefir performance on arm.
In order to assess the convolution performance, I have prepared a little test procedure.

So here is how to test BruteFIR performance on OpenADK:

I used adk revision 463aa3b 2015-10-14 | update firmware and kernel to latest.
In order to perform this test you have to build adk with the brutefir and sox packages (for now it is better not to enable PREEMPT_RT_FULL, as there is an unresolved conflict with the bcm driver).

In adk do the following:

**********************************************************************************

rw
mkdir /root/bftest
cd /root/bftest

cat > /root/bftest/wavtest.conf << "EOF"
# Begin /root/bftest/wavtest.conf

# File used for testing brutefir
# Uses dirac pulse as filter.
# Stereo only. Input from file /root/bftest/pinknoise.wav. Output to file /root/bftest/output.wav.

## DEFAULT GENERAL SETTINGS ##

float_bits: 32;             # internal floating point precision
sampling_rate: 44100;       # sampling rate in Hz of audio interfaces
filter_length: 4096,16;      # length of filters
# config_file: "/home/audiovero/.brutefir_config"; # standard location of main config file
overflow_warnings: true;    # echo warnings to stderr if overflow occurs
show_progress: false;        # echo filtering progress to stderr
max_dither_table_size: 0;   # maximum size in bytes of precalculated dither
allow_poll_mode: false;     # allow use of input poll mode
modules_path: "/usr/lib/brutefir";   # extra path where to find BruteFIR modules
monitor_rate: true;        # monitor sample rate
powersave: -80;           # pause filtering when input is zero
lock_memory: false;          # try to lock memory if realtime prio is set
convolver_config: "/root/bftest/brutefir_convolver"; # location of convolver config file
#benchmark: true;

## LOGIC ##

logic: "cli" { port: 3000; };

## COEFFS ##


coeff "dirac" {
        filename: "dirac pulse";
	format: "FLOAT64_LE";
	};


## INPUT, OUTPUT ##							

input "fleft", "fright" {
        device: "file" { path: "/root/bftest/pinknoise.wav"; skip: 44;}; # ignore_xrun: true; };
        sample: "S16_LE";
	channels: 2/0,1;
	
};

output "fr", "fl" {
        # device: "alsa" { device: "hw:0";}; # ignore_xrun: true; };
	device: "file" { path: "/root/bftest/output.wav"; };
        sample: "S16_LE";
	channels: 2/0,1;
	dither: true;
};




## FILTERS ##



filter "frfilter" {
       from_inputs: "fright";
       to_outputs: "fr";
       coeff: "dirac";
};

filter "flfilter" {
       from_inputs: "fleft";
       to_outputs: "fl";
       coeff: "dirac";
};

# End /root/bftest/wavtest.conf
EOF


# generate 30 minutes of pink noise as test input for brutefir

sox -t sl -r 44100 -c 2 /dev/zero -r 44100 -c 2 -b 16 pinknoise.wav synth 30:00 pinknoise vol 0.6

# run brutefir once in order to generate fftw plans (you can cancel this as soon as you see the "Audio processing starts now..."):

brutefir -nodefault /root/bftest/wavtest.conf

# Then measure the time it takes to convolve this test file by doing:

time brutefir -nodefault /root/bftest/wavtest.conf

**********************************************************************************
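
Since a single measurement can be noisy, the timing step above could also be repeated a few times. A minimal sketch in plain sh; time_runs is a hypothetical helper name and it only has one-second resolution because it relies on date +%s:

```shell
# Hypothetical helper: time_runs N CMD [ARGS...]
# Runs CMD N times and prints the wall-clock seconds of each run.
time_runs() {
    n="$1"; shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start)) s"
        i=$((i + 1))
    done
}

# Usage (assumes the FFTW plans already exist from a first run):
#   time_runs 3 brutefir -nodefault /root/bftest/wavtest.conf
```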

The first run will not give the correct time since it has to generate the FFTW plans, which takes quite some time. So you can cancel it the first time as soon as you see "Audio processing starts now..." and restart it. The second time it will start convolving immediately.
On my rpi2 it takes approx. 3 m 10 s to 3 m 20 s.
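
To make results from different machines easier to compare, the runtime can be expressed as a multiple of realtime: the test file holds 30 minutes (1800 seconds) of audio. A minimal sketch, assuming awk is available; realtime_factor is a hypothetical helper name:

```shell
# Hypothetical helper: realtime_factor AUDIO_SECONDS RUNTIME_SECONDS
# Prints how many seconds of audio are convolved per second of runtime.
realtime_factor() {
    awk -v audio="$1" -v run="$2" 'BEGIN { printf "%.1f\n", audio / run }'
}

realtime_factor 1800 190   # rpi2 at 3 m 10 s, prints 9.5
realtime_factor 1800 200   # rpi2 at 3 m 20 s, prints 9.0
```

So for this stereo test the rpi2 convolves roughly 9 to 9.5 times faster than realtime.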

The question now is whether there is a way to improve brutefir performance on arm, possibly by using some hand-coded assembler.

Happy testing and thanks for any feedback!

All the best
Oliver