wtorek, 5 maja 2015

Finding a 'corr' representative in Octave

This article describes the methodology used to compare function from GNU Octave and the TISEAN package. To achieve the desired results the author assumes you have installed the TISEAN package (available here) and have downloaded amplitude.dat and have installed GNU Octave with the 'signal' package in version 1.3.0 or newer.


1. Comparison

Procedure taken to receive results:

  1. Generate amp_corr.dat using the TISEAN package 'corr' with the call:
 '$ corr amplitude.dat -D5000 -o "amp_corr.dat"'
  2. Generate similar autocorelation data using (in GNU Octave):
 'load amplitude.dat; [a,b] = xcorr(amplitude, 5000, 'coeff');'
     Then to save the data you can use:
 'idx = [rows(amplitude):2*rows(amplitude)-1]; 
  xcorr_res = a(idx); 
  save "xcorr_res.dat" xcorr_res'
There is a strong difference in the data. This might be because of the different methods used in both cases (as explained further in the methods used Section 2. Methods). Because of those differences the amplitudes of the data generated using 'xcorr' from 'signal' decreases linearly. Thus to compare the data the oscillation amplitude of the data generated by 'xcorr' must be amplified. This linear decrease was not proven but observed on the 'amplitude.dat' data.

When a linear correction is applied:
 'mult = rows (amplitude) ./ (rows (amplitude) - [0:rows(amplitude)-1]);  
  xcorr_tisean_res = mult .* xcorr_res' 
Fig. 1 Difference between xcorr_tisean_res and amp_corr

The resultant xcorr_tisean_res is close to the TISEAN 'corr' function, and the difference is smaller than 3% (see Fig. 1). The end of the data begins to change and this is most likely because there is no more data past 5000 and so the results vary. If a autocorrelation is calculated for less data (e.g.4500 instead of 5000) the difference is much less, as can be seen on the chart above.

Even better results can be obtained for different data. We can generate a different set using the TISEAN package
 '$ar-model -s5000 amplitude.dat -p10 -o "amp_ar.dat"' 
When the process described above is applied to this new data set ('amp_ar.dat') the resulting difference between 'xcorr' and 'corr' is shown on Fig. 2.
Fig. 2 Difference between 'xcorr' and 'corr' on 'amp_ar.dat'

Similarly to the previous case the data is the same for small ( < 4000) numbers but when they get close to the edge the difference becomes more pronounced.


2. Methods

The way TISEAN calculates autocorrelation in the 'corr' program is by using estimation method. It is described here:
http://en.wikipedia.org/wiki/Autocorrelation#Estimation

On the other hand the 'xcorr' function from the signal package uses the FFT (Fast Fourier Transform) method (it is described in the same Wikipedia article: here)

This difference in methodology is the cause of the difference in the data results between both functions.

3. Conclusions [edited]
After more test we found 'corr' from TISEAN and 'xcorr' from 'signal' to perform the same autocorrelation and therefor it is not necessary to port it.

It is important to note the different usage:
 $ corr amplitude.dat -Dn# -n -o "amplitude_tisean.dat" 
 [data, lags] xcorr (amplitude, n#, 'unbiased')   
  data = data(find (lags >0)) 
Both of the usage noted above produce the same data.

It is important to note the '-n' in the calling of the TISEAN program. It mean the data is not normalized. You can achieve similar data even when calling 'corr' with normalization, but it is more tricky:
 $ corr amplitude.dat -Dn# -o "amplitude_tisean.dat" 
 [data, lags] xcorr (center (amplitude), n#, 'coeff')   
  data = data(find (lags >0))   
  data = data .* (n# ./ (n# - (transpose ([0:n#-1])))) 
The results of this can be viewed in tests/corr with function test_corr.m (note: 'signal' package needed) available in from the tisean port package repo: https://bitbucket.org/josiah425/tisean.

Brak komentarzy:

Prześlij komentarz