Differences

This shows you the differences between two versions of the page.

--- conv:conv [2008/01/10 13:54] – deva
+++ conv:conv [2008/01/24 12:08] – deva
@@ Line 1: / Line 1: @@
-=====Convolution implementation pages=====
+=====Convolution implementation pages (Spring 2008 studies group under Ole Caprani)=====
 Our goal is to implement a realtime plugin to do convolution.\\
-The basic "Input Side" is way too slow, so we will be experimenting with a dft-multiply-idft algorithm instead.\\
+The basic "Input Side" is way too slow (convolution implemented with two for loops), so we will be experimenting with a dft-multiply-idft algorithm (sometimes called "The Overlap-and-Save algorithm") instead.\\
 To do the actual fft we will use the fftw3 library (http://www.fftw.org), which implements the "Fastest Fourier Transform in the West" algorithm.\\
+The next step is to implement at test the "Partitioned Overlap-and-Save" algorithm, where the filter is split into partitions prior to convolution.
+====Group====
+  * Jonas Suhr Christensen - //suhr@daimi.au.dk// - 20032491
+  * Bent Bisballe Nyeng - //deva@daimi.au.dk// - 20001467
+====Analysis====
+The following graphs show the execution time of a single convolution iteration, using the overlap and save algorithm with various optimizations enabled.\\
+The timings are produced using a Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz, with 4096KB cache.\\
+The algorithm optimizations are named as follows:
+  * **Double** This is an implementation with no particular optimizations.
+  * **Float** This is a simple implementation using float as the internal datatype.
+  * **Real Double** This is an implementation using real ffts instead of complex.
+  * **Real Float** This is an implementation using real ffts instead of complex, with float as internal datatype.
+  * **MT Double** This is an implementation running on multiple CPUs.
+  * **MT Float** This is an implementation with float as internal datatype, running on multiple CPUs.
+  * **MT Real Double** This is an implementation using real ffts instead of complex, running on multiple CPUs.
+  * **MT Real Float** This is an implementation using real ffts instead of complex, with float as internal datatype, running on multiple CPUs.
+{{:conv:graph_256.png?450}}
+{{:conv:graph_512.png?450}}\\
+{{:conv:graph_1024.png?450}}
+{{:conv:graph_2048.png?450}}\\
+{{:conv:graph_4096.png?450}}\\
+As it is seen on the graphs the speed improves with the buffer size increasing, up to 1024 samples. At this point the iteration time starts fluctuating. This may be caused by internal cache size on the given CPU.
 ====Links====
@@ Line 11: / Line 35: @@
   * http://people.scs.fsu.edu/~burkardt/c_src/fftw3/fftw3.html - Some info about the fftw3 library.
   * http://www.ludd.luth.se/~torger/brutefir.html#bruteconv_3 - Some info about partitioned convolution.
+  * http://www.aurora-plugins.it/Public/Papers/164-Mohonk2001.PDF - More on partitioned convolution.
+  * http://www.linuxdevcenter.com/lpt/a/586 - A LADSPA tutorial
 ====Code====
@@ Line 116: / Line 142: @@
 #define RE 0
 #define IM 1
 static FFTW_DATATYPE *hfft = NULL;