Sunday, 19 July 2015

Plans for randomize

This past week I spent time trying to set up a testing framework for the C++ methods and designing the abstract base classes for cool, cost and permute. I would like them to have the following methods/members:
  • Cost (Cost_fcn):
    • const Matrix *series
      -> pointer to the series for which a surrogate is generated
    • double cost
      -> current cost
    • void cost_transform (Matrix *)
      -> initial transformation of the series into a form that makes the cost easier to calculate
    • Matrix cost_invert () const
      -> returns the inverse of the transformation performed above (to get the actual surrogate, not just its representation in a different form)
    • double cost_update (octave_idx_type nn1, octave_idx_type nn2, double cmax, bool &accept)
      -> performs a quick update of the cost for a swap of the elements at indices nn1 and nn2 and checks whether the new cost is smaller than the maximum cost cmax: if so, it accepts the new cost and sets accept to true; otherwise it rejects the new cost and sets accept to false
    • double cost_full()
      -> performs a full calculation of the cost, takes longer than cost_update()
    • getters/setters for cost
  • Cool (Cool_fcn):
    • double temp
      -> holds the current temperature
    • double cool (double cost, bool accept, bool &end)
      -> takes the current cost and accept (whether the last cost_update() was accepted), returns the new temperature and sets the flag end to indicate whether the simulated annealing is over
  • Permute (Permute_fcn):
    • Matrix *series
      -> holds the pointer to the series, and modifies the series only when exch() is called
    • void permute (octave_idx_type &n1, octave_idx_type &n2) const
      -> generates two indices n1 and n2 that can be passed to Cost_fcn::cost_update()
    • void exch (octave_idx_type n1, octave_idx_type n2)
      -> exchanges the element at index n1 with the element at index n2 in the series
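A minimal C++ sketch of how these three abstract classes could be declared. It assumes stand-ins for Octave's types (Matrix as std::vector<double>, octave_idx_type as std::ptrdiff_t) so that it compiles without Octave's headers; the Random_permute subclass and the get_temp() getter are hypothetical additions for illustration, not part of the design above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Stand-ins so the sketch compiles without Octave's headers (assumption):
// in the package these would be Octave's octave_idx_type and Matrix.
using octave_idx_type = std::ptrdiff_t;
using Matrix = std::vector<double>;

class Cost_fcn {
public:
  explicit Cost_fcn(const Matrix *s) : series(s) {}
  virtual ~Cost_fcn() = default;
  // Transform the series into a form in which the cost is easier to compute.
  virtual void cost_transform(Matrix *m) = 0;
  // Undo cost_transform() to recover the actual surrogate.
  virtual Matrix cost_invert() const = 0;
  // Quick evaluation of swapping nn1 and nn2; sets accept if cost < cmax.
  virtual double cost_update(octave_idx_type nn1, octave_idx_type nn2,
                             double cmax, bool &accept) = 0;
  // Slower, full recomputation of the cost.
  virtual double cost_full() = 0;
  double get_cost() const { return cost; }
  void set_cost(double c) { cost = c; }
protected:
  const Matrix *series;
  double cost = 0.0;
};

class Cool_fcn {
public:
  explicit Cool_fcn(double t0) : temp(t0) {}
  virtual ~Cool_fcn() = default;
  // Returns the new temperature; sets end when the annealing should stop.
  virtual double cool(double cost, bool accept, bool &end) = 0;
  double get_temp() const { return temp; }  // hypothetical getter
protected:
  double temp;
};

class Permute_fcn {
public:
  explicit Permute_fcn(Matrix *s) : series(s) {}
  virtual ~Permute_fcn() = default;
  // Proposes two indices; must not modify the series.
  virtual void permute(octave_idx_type &n1, octave_idx_type &n2) const = 0;
  // The only place where the series is modified.
  virtual void exch(octave_idx_type n1, octave_idx_type n2) {
    std::swap((*series)[n1], (*series)[n2]);
  }
protected:
  Matrix *series;
};

// Hypothetical subclass: proposes two distinct, uniformly random indices.
class Random_permute : public Permute_fcn {
public:
  Random_permute(Matrix *s, unsigned seed) : Permute_fcn(s), rng(seed) {}
  void permute(octave_idx_type &n1, octave_idx_type &n2) const override {
    const auto n = static_cast<octave_idx_type>(series->size());
    assert(n >= 2);
    std::uniform_int_distribution<octave_idx_type> first(0, n - 1), second(0, n - 2);
    n1 = first(rng);
    n2 = second(rng);
    if (n2 >= n1) ++n2;  // skip n1, so the pair is distinct and uniform
  }
private:
  mutable std::mt19937 rng;  // permute() is const but must advance the generator
};
```

Declaring the generator mutable is one way to reconcile a const permute() with a random number generator whose state changes on every call.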
These are the methods I intend to have in the base/abstract classes, which will be called by the simulated annealing runner code. I have not yet decided what that code should look like, but the current version seems to work as well as the randomize program from the TISEAN package.
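As a sketch of how a runner could tie the three pieces together, the loop below uses the Metropolis acceptance rule written as a threshold: a swap is accepted when the new cost is below cmax = c - T ln u, with u uniform on (0, 1], which is equivalent to accepting with probability exp(-(c_new - c)/T). The function anneal and the Toy_* classes are hypothetical; they reuse the method names from the list above but use std::vector<double> and long instead of Octave's types, and they assume the temperature is readable as a public member temp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical SA runner: works with any Cost/Cool/Permute types that expose
// cost_full(), cost_update(), a public temp, cool(), permute() and exch().
template <class Cost, class Cool, class Permute>
double anneal(Cost &cost, Cool &cool, Permute &perm, std::mt19937 &rng)
{
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  double c = cost.cost_full();
  bool end = false;
  while (!end) {
    long n1 = 0, n2 = 0;
    perm.permute(n1, n2);
    // Metropolis threshold; 1 - unif(rng) lies in (0, 1], so log() is finite.
    const double cmax = c - cool.temp * std::log(1.0 - unif(rng));
    bool accept = false;
    c = cost.cost_update(n1, n2, cmax, accept);  // evaluates the swap only
    if (accept)
      perm.exch(n1, n2);                         // the series changes only here
    cool.cool(c, accept, end);
  }
  return c;
}

// Toy cost (hypothetical): sum of |x[i] - target[i]|; a swap touches two terms.
struct Toy_cost {
  const std::vector<double> *series;
  std::vector<double> target;
  double cost = 0.0;

  double cost_full() {
    cost = 0.0;
    for (std::size_t i = 0; i < target.size(); ++i)
      cost += std::fabs((*series)[i] - target[i]);
    return cost;
  }
  double cost_update(long n1, long n2, double cmax, bool &accept) {
    const std::vector<double> &x = *series;
    const double delta = std::fabs(x[n2] - target[n1]) + std::fabs(x[n1] - target[n2])
                       - std::fabs(x[n1] - target[n1]) - std::fabs(x[n2] - target[n2]);
    accept = cost + delta < cmax;
    if (accept)
      cost += delta;
    return cost;
  }
};

// Toy geometric cooling (hypothetical): T <- alpha * T for a fixed number of steps.
struct Toy_cool {
  double temp;
  double alpha;
  long steps_left;

  double cool(double /*cost*/, bool /*accept*/, bool &end) {
    temp *= alpha;
    end = --steps_left <= 0;
    return temp;
  }
};

// Toy permutation (hypothetical): two distinct, uniformly random indices.
struct Toy_permute {
  std::vector<double> *series;
  std::mt19937 *rng;

  void permute(long &n1, long &n2) {
    const long n = static_cast<long>(series->size());
    assert(n >= 2);
    std::uniform_int_distribution<long> first(0, n - 1), second(0, n - 2);
    n1 = first(*rng);
    n2 = second(*rng);
    if (n2 >= n1) ++n2;
  }
  void exch(long n1, long n2) { std::swap((*series)[n1], (*series)[n2]); }
};
```

cost_update() only evaluates a proposed swap; the series itself changes solely through exch(), which keeps the cost's view of the series and the permutation's copy consistent.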

I was also hoping to create a subclass of each of these abstract classes that calls GNU Octave code. This would allow users to create their own functions without writing any C++. However, this idea might not be practical, for the following reasons:
  1. The simulated annealing example provided in the TISEAN package (ver. 3.0.1) takes about 0.7 seconds to run using only C++ code and makes on average 900,000 calls to Cost_fcn::cost_update(), whereas calling a simple Octave function that many times (in a for loop) took 16 seconds.
  2. I have trouble deciding how to neatly pass these functions/classes to randomize, along with any parameters the user might want to include. I originally thought of using classdef, the new keyword introduced in Octave 4.0.0: I hoped to create an abstract Octave class and let anyone subclass it to create their own cost, cool and permute classes. The problem is that classdef and its associated keywords are not documented; moreover, according to Carnë Draug, the help function will not recognize this new type of Octave class. So even if all of the needed functionality were available in Octave, I might not be able to document it for the user.
If obstacle 2 can be overcome, it might still be worthwhile to add this functionality to the package, regardless of how long the code takes to execute.
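A rough estimate from the figures in reason 1. The 0.7 seconds cover the whole C++ run, so dividing by the number of calls gives the average time per annealing step, while the Octave figure measures only the overhead of calling a simple function:

```latex
\[
\frac{0.7\,\mathrm{s}}{9\times10^{5}} \approx 0.78\,\mu\mathrm{s}\ \text{per step (C++)},\qquad
\frac{16\,\mathrm{s}}{9\times10^{5}} \approx 17.8\,\mu\mathrm{s}\ \text{per Octave call},\qquad
\frac{16\,\mathrm{s}}{0.7\,\mathrm{s}} \approx 23.
\]
```

So merely calling into Octave once per step would make the run roughly 23 times slower, before the Octave function does any real work.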

This week I plan to refine the design of the abstract classes as well as port more of the cost function options from TISEAN.

[Update]: I modified the design a bit and updated this post to fit the new code.

Thursday, 9 July 2015

Progress report and plans

So far my progress has gone as planned. Before the end of the midterm evaluation, I published version 0.2.0 of the package in my repository; it included all of the functions from the section Dimensions and entropies of the TISEAN documentation. As I mentioned in my previous post, the functions that needed to be ported in this section are slightly different from those listed in my outline. The ported functions are:
  • d2
  • d1
  • boxcount
  • c2t
  • c2g
  • c2d
  • av_d2
I also wrote demos for most of those functions and updated the tutorial on the wiki page.

I spent the first part of this week improving the build process. The function __c2g__ relies on C++ lambdas, so a configure script had to be introduced to check that the compiler supports them. As John Eaton suggested, I tried to keep the impact of missing lambda support as small as possible: currently, if the compiler does not support C++ lambdas, __c2g__ is simply not compiled and the function c2g does not work.
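A configure check of this kind usually just tries to compile a small test program; with autoconf this is typically done with the AC_COMPILE_IFELSE macro. A hypothetical C++ probe for lambda support might look like the one below; it is a sketch, not the package's actual configure test.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical configure probe: if this translation unit fails to compile,
// the build would leave out __c2g__ (and c2g would be unavailable).
int lambda_probe()
{
  std::vector<int> v = {3, 1, 2};
  int calls = 0;
  // A capturing lambda used as a comparator is the feature being tested.
  std::sort(v.begin(), v.end(),
            [&calls](int a, int b) -> bool { ++calls; return a < b; });
  assert(calls > 0);
  return v.front();
}
```

Only successful compilation matters to configure; the return value is there so the probe can also be run as a quick sanity check.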

The plans

I was hoping to port all of the functions in the next section, Testing for Nonlinearity, by the end of the week. This might not be possible, as randomize turned out to be bigger than I anticipated. It is actually not a single function at all but, as the author of the TISEAN documentation puts it, "a toolbox": it generates surrogate data using simulated annealing and needs to be supplied with three functions:
  1. the cost function
  2. the cooling function: how the temperature decreases
  3. the permutation function: what to change at every step
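To illustrate the cooling function, here is a minimal sketch of an exponential schedule with the cool(double cost, bool accept, bool &end) signature described above. The temperature is multiplied by a factor alpha at the end of each phase; a phase ends after a fixed number of steps or of accepted steps, and the run ends when a phase has too few accepted steps. The class name, parameters and stopping rule are hypothetical simplifications; TISEAN's own exponential cooling has its own options.

```cpp
#include <cassert>

// Hypothetical exponential cooling schedule with the Cool_fcn::cool() signature.
class Exp_cool {
public:
  Exp_cool(double t0, double alpha, long steps_per_phase, long succ_per_phase,
           long min_succ)
    : temp(t0), alpha_(alpha), max_steps_(steps_per_phase),
      max_succ_(succ_per_phase), min_succ_(min_succ) {
    assert(alpha > 0.0 && alpha < 1.0);
  }

  double cool(double /*cost*/, bool accept, bool &end) {
    ++steps_;
    if (accept) ++succ_;
    end = false;
    if (steps_ >= max_steps_ || succ_ >= max_succ_) {
      end = succ_ < min_succ_;  // too few accepted moves in this phase: frozen
      temp *= alpha_;           // T <- alpha * T
      steps_ = succ_ = 0;
    }
    return temp;
  }

  double temp;

private:
  double alpha_;
  long max_steps_, max_succ_, min_succ_;
  long steps_ = 0, succ_ = 0;
};
```

Counting accepted steps lets the schedule cool quickly while many moves are still accepted and detect when the system has effectively frozen.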
Currently, a user who wants their own version of any of these functions has to write it in FORTRAN. My goal for this project is to let users write (and use) their own Octave functions. However, SA is an iterative method, so running it as Octave code is slow: every line inside a for or while loop has to be evaluated by the interpreter again on each iteration. As far as I understand, the samin routine from the optim package will not suffice, as it does not generate surrogate data and has fewer options. Given the size of this function, it might take me some time to complete it.

I plan to tackle this problem as follows: I will rewrite the equivalent of randomize_auto_exp_random in C++, and then refactor and modify the code to accept other functions. I plan to include in the Octave package all of the functions that are available in TISEAN, either by rewriting them or by linking to them, and I would like to make it easy to add new functions.
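The benefit of a quick cost update is easiest to see on an autocorrelation cost, which is presumably what the "auto" in randomize_auto_exp_random refers to. The sketch below assumes the cost is the sum over lags tau = 1..L of |C(tau) - C0(tau)|, with the unnormalized C(tau) = sum_n x[n] x[n - tau] and C0 taken from a target series; TISEAN's actual normalization and averaging may differ, and the class Auto_cost is hypothetical. A full evaluation costs O(N L), but swapping two elements only changes the products that involve them, so the update costs O(L).

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>
#include <vector>

// Hypothetical autocorrelation cost shaped like Cost_fcn (std::vector and long
// stand in for Octave's Matrix and octave_idx_type).
class Auto_cost {
public:
  Auto_cost(const std::vector<double> *series, const std::vector<double> &target,
            long maxlag)
    : x(series), L(maxlag), c0(autocorr(target)), c(autocorr(*series)),
      cost(distance(c)) {}

  // Full recomputation: O(N * L).
  double cost_full() {
    c = autocorr(*x);
    cost = distance(c);
    return cost;
  }

  // Quick update for swapping x[a] and x[b]: O(L). The series itself is left
  // untouched; the caller swaps it (exch) only if the move is accepted.
  double cost_update(long a, long b, double cmax, bool &accept) {
    assert(a != b);
    std::vector<double> cn = c;
    for (long tau = 1; tau <= L; ++tau)
      cn[tau - 1] += lag_delta(tau, a, b);
    const double newcost = distance(cn);
    accept = newcost < cmax;
    if (accept) {
      c = cn;
      cost = newcost;
    }
    return cost;
  }

  double get_cost() const { return cost; }

private:
  const std::vector<double> *x;
  long L;                     // largest lag taken into account
  std::vector<double> c0, c;  // target and current C(1..L)
  double cost;

  // Unnormalized C(tau) = sum_n s[n] * s[n - tau] for tau = 1..L.
  std::vector<double> autocorr(const std::vector<double> &s) const {
    std::vector<double> r(L, 0.0);
    const long n = static_cast<long>(s.size());
    for (long tau = 1; tau <= L; ++tau)
      for (long i = tau; i < n; ++i)
        r[tau - 1] += s[i] * s[i - tau];
    return r;
  }

  // Sum over lags of |C(tau) - C0(tau)|.
  double distance(const std::vector<double> &r) const {
    double d = 0.0;
    for (long k = 0; k < L; ++k)
      d += std::fabs(r[k] - c0[k]);
    return d;
  }

  // Change in C(tau) if x[a] and x[b] were swapped. Only products with one
  // factor at a or b change; the product x[a] * x[b] itself is unaffected.
  double lag_delta(long tau, long a, long b) const {
    const std::vector<double> &s = *x;
    const long n = static_cast<long>(s.size());
    double d = 0.0;
    for (long i : {a, b}) {
      const double diff = s[i == a ? b : a] - s[i];  // new value minus old value
      for (long j : {i - tau, i + tau}) {
        if (j < 0 || j >= n || j == a || j == b)
          continue;
        d += diff * s[j];
      }
    }
    return d;
  }
};
```

Skipping the pair (a, b) in lag_delta handles the case where the two swapped elements are exactly tau apart, since their product does not change.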

Further reading on randomize is available in the TISEAN documentation: General Constrained Randomization, the Surrogates paper appendix, the randomize description and the randomize function extension description.