GSoC 2015 porting TISEAN package: maja 2015

środa, 27 maja 2015

Milestone

I am happy to announce that I completed the first two sections.

Nonlinear Noise Reduction
From this section I added:

ghkss

Thus deprecating project. It is important to note that data allocation in ghkss (only) is done via new/delete because of the way the function was originally written. In the future this will be replaced with OCTAVE_LOCAL_BUFFER (OLB) macro.

Closer examination has revealed another interesting function in this section: nrlazy, which according to the documentation is similar to lazy. Because of this similarity porting it has low priority.

Phase Space Representation
From this section I added:


mutual
false_nearest
poincare

The function poincare was discovered when further studying the documentation and was omitted in the initial plan, but as it seems important in the documentation it was ported.

The function false_nearest is in a similar situation to ghkss, that is, new/delete is used instead of OLB. This will be improved on in the future.

Since corr does not need to be ported, this section is complete.

Nonlinear Prediction
This next section is well on the way as the following have been ported:


upo
upoembed
lzo_test

The function predict turned out to be essentially the same with lzo_test. This had not been verified so far, but according to the documentation they do essentially the same. The additional option that predict has (flag -v) can be easily replaced using GNU Octave's std when calling lzo_test with parameter r. Therefore porting predict will be most likely unnecessary.

The function upoembed is closely associated with upo. It takes the output of upo and creates a cell array of delay vectors for each orbit produced by upo. It was not mentioned in the original outline, but it is an important function for the package.

The state of upo is not optimal. The original implementation only supported input up to 1e6 data points. This might not be a big problem as calculating upo on a 1e4 henon map takes about 8 seconds, so a 1e6 would take about 800 seconds ~ 15 min. Changing this might be problematic as the main data in the FORTAN program is in a common block (it is stored in a global variable), which cannot contain arrays of variable dimensions. The authors of TISEAN chose 1e6, but because of how the program is written making it unlimited would not be trivial. It might be beneficial to lift this number to 1e8 or 1e9 but one must keep in mind that because of how the FORTRAN program is written it will always allocate a local array of the maximum size possible (currently 1e6 elements). If the maximum input length is lifted to 1e8 or 1e9 the size of the data allocated by Octave and the local copy that the FORTRAN program uses can be a sizable amount of memory to allocate. Moreover it is important to note that each data point will be a real*8 (not a real*4). This brings me to the next point.

FORTRAN data types
This topic has been problematic for me from the very beginning. I had trouble realizing just how the dummy variables work. After some research I found that one can pass -freal-4-real-8 and promote all real*4 to real*8. This is beneficial as the input into a FOTRAN program is passed as doubles. This caused serious issues as I needed to ensure that when I call any TISEAN or SLATEC function/subroutine I needed to use real*4 instead of real*8. The solution I originally used was to copy the input variables into local variables. Apart from eliminating potential bugs and improving code complexity the previously mentioned flag also allows the FORTRAN programs to have the same type of precision expected from GNU Octave programs.

Other things
I spent my time also on other things. I significantly changed the Makefiles, removed all compilation warnings and everywhere except for ghkss and false_nearest moved from new/delete to OCTAVE_LOCAL_BUFFER or some type of Array.

piątek, 15 maja 2015

Progress report

As I haven't had any significant roadblock or breakthroughs this week I wanted to give a little progress report on my work.
1. Added functionality
I have managed to add the following functions:

mutual
spectrum
lazy
delay
pca

Along with their documentation, tests and a demo (for lazy). I was really happy that once I had produced some examples of how I want to port these functions the process of porting each one accelerated rapidly.

I am especially excited about the fact that I have now henon, delay, an equivalent of addnoise and project available as this allowed me to create a nice noise reduction demo for project (and for lazy, but the one for project is more impressive). Fig.1 sums those efforts up.

Fig. 1 Noisy data and data cleaned up by project.

2. Functions found to be non-equivalent

I also spent a lot of my time (almost a week) researching which programs from TISEAN have a GNU Octave equivalent. Apart from my positive identifications, most of which I discussed in previous posts, I have made some negative ones. I found that both extrema and polynom have no GNU Octave equivalent.

There was a suggestion made that extrema might be similar to findpeaks from signal. The only problem is that findpeaks searches (and returns) all peaks, whereas extrema returns either minimums or maximums. It might be easier to implement it in Octave than to port, but this decision has not been made yet.

The latter program polynom was compared to detrend, polyfit and wpolyfit. The results were disappointing. All of the GNU Octave functions attempt to fit a polynomial onto the data, whereas polynom tries to make a "polynomial ansatz" for the data. The results are vastly different as can be seen on Fig. 2.

Fig. 2 Comparison of original data (green), polyfit fit (red), and polynom prediction (blue).

Both programs were run to try to use a 4th order polynomial.

czwartek, 7 maja 2015

The problem with 'spectrum'

The 'spectrum' function from TISEAN most likely needs to be rewritten in GNU Octave or there is no need for it. This is because linking to it does not seem like a good idea. This is because there is a suspicion it does not produce good results for some data inputs.

1. Where 'spectrum' works
First it is important to note that 'spectrum' from TISEAN is basically a GNU Octave 'abs(fft(:))' with additional data manipulation/adjustment. This additional work is not an elegant one-line solution, which might warrant designating a separate function that would translate the Octave respective function into a form similar to the output of 'spectrum'. Although this might not be necessary since the data obtained from the Octave function is very similar to 'spectrum' (Fig. 1).

Fig. 1 Unadjusted data from Octave

After adjusting the data (which was done by analyzing the source code to determine what actions the TISEAN programs perform) it was possible to get a close fit with a small difference. An example of this type of adjustment is listed below (Fig. 2)

Fig. 2 Adjusted data from Octave.

As it is not a one-line fix to convert ' abs (fft (:))' into a similar format as 'spectrum' it will not be shown in the post. It is available in the 'tests/spectrum/test_spectrum.m' function located on the tisean package repo (here).

2. Where the problem lies
The problem is that when 'spectrum' is used to create a step response its results vary substantially from what is produced by Octave. The way the data looks suggests that there is something wrong with 'spectrum'. The adjusted version is situated below.

For the most part, the data fits perfectly, but there seems to be a shadow on the bottom of the TISEAN data. If it is the case that there is a problem with 'spectrum' then its code should not be used in the future Octave package and should be rewritten or omitted (as similar results can be obtained from a simple Octave call).

środa, 6 maja 2015

Finding 'histogram' in GNU Octave

Unlike 'corr' it is quite easy to find a representative for 'histogram' from TISEAN. It is 'hist' from GNU Octave. The data is almost the same with the exception that the TISEAN package normalizes by default so one needs to be careful when calling the respective functions. I will describe differences in the data and describe the differences in usage.

1. Data comparison
I have attached a comparison of the two data sets (from 'hist' and 'histogram' on one chart)

Fig. 1 Comparison between 'hist' (Octave) and 'histogram' (TISEAN)

When one analyses the data close there is a slight discrepancy between the value on the 40th and 41st bar. But not only is it slight, it basically means that both programs assigned a certain value to two different bins, which should not be a major problem. All told we can say that both of those functions perform the same task.

2. Usage comparison
As mentioned before, usage varies on both functions.

 $ histogram amplitude.dat -b#n -o "amplitude_histogram.dat"

 [nn, xx] = hist (amplitude, #n, 1);  
  nn = transpose (nn); xx = transpose (xx)  
  amplitude_hist = [xx, nn];

This way the data stored in 'amplitude_hist' is essentially the same with 'amplitude_histogram.dat'.

wtorek, 5 maja 2015

Finding a 'corr' representative in Octave

This article describes the methodology used to compare function from GNU Octave and the TISEAN package. To achieve the desired results the author assumes you have installed the TISEAN package (available here) and have downloaded amplitude.dat and have installed GNU Octave with the 'signal' package in version 1.3.0 or newer.

1. Comparison

Procedure taken to receive results:

1. Generate amp_corr.dat using the TISEAN package 'corr' with the call:

 '$ corr amplitude.dat -D5000 -o "amp_corr.dat"'

2. Generate similar autocorelation data using (in GNU Octave):

 'load amplitude.dat; [a,b] = xcorr(amplitude, 5000, 'coeff');'

Then to save the data you can use:

 'idx = [rows(amplitude):2*rows(amplitude)-1]; 
  xcorr_res = a(idx); 
  save "xcorr_res.dat" xcorr_res'

There is a strong difference in the data. This might be because of the different methods used in both cases (as explained further in the methods used Section 2. Methods). Because of those differences the amplitudes of the data generated using 'xcorr' from 'signal' decreases linearly. Thus to compare the data the oscillation amplitude of the data generated by 'xcorr' must be amplified. This linear decrease was not proven but observed on the 'amplitude.dat' data.

When a linear correction is applied:

 'mult = rows (amplitude) ./ (rows (amplitude) - [0:rows(amplitude)-1]);  
  xcorr_tisean_res = mult .* xcorr_res'

Fig. 1 Difference between xcorr_tisean_res and amp_corr

The resultant xcorr_tisean_res is close to the TISEAN 'corr' function, and the difference is smaller than 3% (see Fig. 1). The end of the data begins to change and this is most likely because there is no more data past 5000 and so the results vary. If a autocorrelation is calculated for less data (e.g.4500 instead of 5000) the difference is much less, as can be seen on the chart above.

Even better results can be obtained for different data. We can generate a different set using the TISEAN package

 '$ar-model -s5000 amplitude.dat -p10 -o "amp_ar.dat"'

When the process described above is applied to this new data set ('amp_ar.dat') the resulting difference between 'xcorr' and 'corr' is shown on Fig. 2.

Fig. 2 Difference between 'xcorr' and 'corr' on 'amp_ar.dat'

Similarly to the previous case the data is the same for small ( < 4000) numbers but when they get close to the edge the difference becomes more pronounced.

2. Methods

The way TISEAN calculates autocorrelation in the 'corr' program is by using estimation method. It is described here:
http://en.wikipedia.org/wiki/Autocorrelation#Estimation

On the other hand the 'xcorr' function from the signal package uses the FFT (Fast Fourier Transform) method (it is described in the same Wikipedia article: here)

This difference in methodology is the cause of the difference in the data results between both functions.

3. Conclusions [edited]
After more test we found 'corr' from TISEAN and 'xcorr' from 'signal' to perform the same autocorrelation and therefor it is not necessary to port it.

It is important to note the different usage:

 $ corr amplitude.dat -Dn# -n -o "amplitude_tisean.dat"

 [data, lags] xcorr (amplitude, n#, 'unbiased')   
  data = data(find (lags >0))

Both of the usage noted above produce the same data.

It is important to note the '-n' in the calling of the TISEAN program. It mean the data is not normalized. You can achieve similar data even when calling 'corr' with normalization, but it is more tricky:

 $ corr amplitude.dat -Dn# -o "amplitude_tisean.dat"

 [data, lags] xcorr (center (amplitude), n#, 'coeff')   
  data = data(find (lags >0))   
  data = data .* (n# ./ (n# - (transpose ([0:n#-1]))))

The results of this can be viewed in tests/corr with function test_corr.m (note: 'signal' package needed) available in from the tisean port package repo: https://bitbucket.org/josiah425/tisean.