Monday, April 29, 2013

Why waste a good polynomial

In the past week Jan finished the implementation of Taylor series methods for ODE-IVPs. The algorithm uses forward-mode automatic differentiation to recursively compute Taylor coefficients of increasing order for the field and the solution components, and the implementation relies on memoization of sub-expressions to guarantee that syntactically equivalent expressions only get computed once. Jan has been reflecting on the fact that the algorithm actually produces the Taylor polynomials for the field and solution components, but then collapses them into a single interval box. While the current line of inquiry aims for an algorithm where components exchange approximations given by intervals, and not polynomials, Jan hopes to return to investigate the potential of keeping the polynomials unevaluated, and applying the algorithm to them directly, as a potential means of combating wrapping effects. Jan wonders if this is one of the techniques employed by advanced solvers such as VNODE or ValEncIA-IVP, which also rely on Taylor series methods and on conditioning techniques similar to those proposed by Lohner and Zgliczynski.
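As a rough illustration of the memoization idea (using a made-up expression type, not the actual Acumen AST), caching results keyed on the expression value itself ensures that syntactically equal sub-expressions are evaluated only once:

```scala
// Illustrative sketch only: case classes compare structurally, so using the
// expression itself as the cache key de-duplicates syntactically equal terms.
sealed trait Expr
case class Const(c: Double)      extends Expr
case class Var(n: String)        extends Expr
case class Add(l: Expr, r: Expr) extends Expr
case class Mul(l: Expr, r: Expr) extends Expr

class MemoEval(env: Map[String, Double]) {
  private val cache = scala.collection.mutable.Map[Expr, Double]()

  def eval(e: Expr): Double = cache.get(e) match {
    case Some(v) => v                     // already computed once
    case None =>
      val v = e match {
        case Const(c)  => c
        case Var(n)    => env(n)
        case Add(l, r) => eval(l) + eval(r)
        case Mul(l, r) => eval(l) * eval(r)
      }
      cache(e) = v
      v
  }
}
```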

Planar Vehicle Safety Maneuver Function

Jawad has created the next simulation case for the lateral function. The simulation case describes the test vehicle approaching a reference object (car/pedestrian) from behind. The simulation shows the test vehicle performing a kinematic maneuver (smooth steering) when the distance between the test vehicle and the stationary reference object/vehicle crosses a critical threshold. The planar differential-drive vehicle was modeled in a local reference frame, where position, orientation, linear velocity and rotational velocity are defined in terms of the right and left wheel velocities. The yaw angle, vertical distance and horizontal distance were used as the control variables for the safe maneuver. The local planar position and velocity vectors were then transformed into global Cartesian coordinates. In the 3D visualization the planar vehicle is represented by a rectangular box, the straight road by a fixed rectangular box, and the sensor by a single variable that computes (ideally, without uncertainty or faults) the distance between the test vehicle and the stationary vehicle. Constant left and right tire velocities were provided to obtain the velocity and orientation at the center of the vehicle. Please click Simulation 3 to watch the video.
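For reference, a minimal sketch of the standard differential-drive kinematics (illustrative names and parameters, not the actual Acumen model) that maps left and right wheel speeds to the pose of the vehicle center in global coordinates:

```scala
// Illustrative sketch: one Euler step of a planar differential-drive vehicle
// expressed in global Cartesian coordinates.
object DiffDriveSketch {
  case class Pose(x: Double, y: Double, yaw: Double)

  def step(p: Pose, vLeft: Double, vRight: Double, track: Double, dt: Double): Pose = {
    val v = (vRight + vLeft) / 2.0     // forward speed at the vehicle center
    val w = (vRight - vLeft) / track   // yaw rate from the wheel speed difference
    Pose(
      x   = p.x + v * math.cos(p.yaw) * dt,  // local velocity mapped to global x
      y   = p.y + v * math.sin(p.yaw) * dt,  // local velocity mapped to global y
      yaw = p.yaw + w * dt
    )
  }

  def main(args: Array[String]): Unit = {
    var p = Pose(0, 0, 0)
    for (_ <- 1 to 100) p = step(p, vLeft = 1.0, vRight = 1.2, track = 1.5, dt = 0.01)
    println(p)
  }
}
```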


Sunday, April 28, 2013

Two options for parallelizing the CStore representation function

Adam has been working on the parallelization of an expensive operation performed by the imperative interpreter. This operation (repr) converts the type used to represent the simulation state internally in the imperative interpreter (Object) to the type used by the reference interpreter (CStore), as this is the format used to communicate results to the graphical user interface. Currently, this operation is done sequentially. The imperative interpreter first collects snapshots of the state at each step of the simulation and then maps repr over these to produce a stream of CStore objects, which are then passed to the GUI.
There are several possible options for parallelizing this operation. An attractive first option, which is trivial to implement, is to use Scala's parallel collections. This amounts to adding the .par suffix to the collection, which makes all subsequent operations on the collection resolve to their parallel counterparts. Unfortunately, the simplicity of this solution comes at a price: Scala's parallel collections operations only work on compatible collection types (vectors, arrays, hash maps), and will copy the contents of sequential collections (such as the Stream used to represent the simulation history) to a compatible type before performing their computation. This means that additional copies of the simulation state have to be kept in memory, and memory is already a bottleneck in certain simulations.
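A self-contained sketch of this first option, where Snapshot and toCStore are stand-ins for the interpreter's Object type and repr:

```scala
// Stand-in types for the interpreter's Object and repr.
case class Snapshot(fields: Map[String, Double])
def toCStore(s: Snapshot): Map[String, Double] = s.fields

val history: Stream[Snapshot] =
  Stream.iterate(Snapshot(Map("x" -> 0.0)), 100)(s =>
    Snapshot(s.fields.map { case (k, v) => (k, v + 1.0) }))

val sequential = history.map(toCStore)     // current, sequential behaviour
// .par makes the map parallel, but first copies the Stream into a parallel
// collection, duplicating the snapshots in memory.
val parallel   = history.par.map(toCStore)
```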
Another option that Adam looked at was to push the computation of the CStore into the iterateMain function, which updates the state during simulation and is already parallel. Fusing iterateMain and repr was straightforward given that both are traversals of the heap, and the necessary changes to the interpreter were relatively small. Surprisingly, the fused implementation did not yield any measurable performance improvement. Additional investigation is needed to identify why this optimization was not successful.
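A sketch of the fusion idea with stand-in types, where update and convert play the roles of iterateMain and repr:

```scala
// Stand-in types: Obj for the interpreter's heap objects.
case class Obj(vals: Map[String, Double])
def update(o: Obj): Obj = Obj(o.vals.map { case (k, v) => (k, v + 1.0) })  // "iterateMain"
def convert(o: Obj): String = o.vals.toString                             // "repr"

// Unfused: one traversal for the update, another for the conversion.
def step(objs: Vector[Obj]): (Vector[Obj], Vector[String]) = {
  val updated = objs.map(update)
  (updated, updated.map(convert))
}

// Fused: a single traversal produces both the new state and its CStore form.
def stepFused(objs: Vector[Obj]): (Vector[Obj], Vector[String]) =
  objs.map { o => val u = update(o); (u, convert(u)) }.unzip
```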

Monday, April 22, 2013

Constants and their Taylor coefficients

During the past week Jan has been working on a Taylor series-based method for solving ODE-IVPs as part of the ongoing work to implement the C^1-Lohner solver. Midway into the implementation described in An Introduction to Automatic Differentiation by Corliss and Rall, Jan decided to hand-execute the algorithm to have a case ready to test the implementation with. He chose the ODE x'' = -10, x(0) = 5, x'(0) = 0, and expected to obtain the polynomial solution as the first three terms of the computed expansion, with zeroes from the fourth term on, but, to his surprise, the computation produced non-zero terms after the third term as well. It turns out that Jan had forgotten to use the proper Taylor coefficients for the constant -10 in the field of the ODE, effectively equating x'' with a function whose every Taylor coefficient is -10, rather than one whose zeroth coefficient is -10 and whose remaining coefficients are all zero! When using the prescribed series the computation, of course, yielded the correct result.
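A small sketch of the hand-executed example (plain doubles, illustrative code rather than the actual implementation) shows the correct treatment of the constant:

```scala
// x'' = -10, x(0) = 5, x'(0) = 0, written as the system x' = v, v' = -10.
// The constant field -10 has Taylor coefficients (-10, 0, 0, ...),
// not (-10, -10, -10, ...).
val order    = 6
val constant = Array.tabulate(order + 1)(k => if (k == 0) -10.0 else 0.0)
val x = Array.fill(order + 1)(0.0)
val v = Array.fill(order + 1)(0.0)
x(0) = 5.0; v(0) = 0.0
for (k <- 0 until order) {
  x(k + 1) = v(k) / (k + 1)         // from x' = v
  v(k + 1) = constant(k) / (k + 1)  // from v' = -10
}
println(x.mkString(", "))  // 5.0, 0.0, -5.0, 0.0, ...  i.e. x(t) = 5 - 5 t^2
```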

Monday, April 15, 2013

Naive Taylor approximations

For the past week Jan has been considering various ways to implement Taylor approximations for real vector functions. While automatic differentiation seemed a natural choice, the issue of computing safe approximations of the interval remainder makes that choice questionable, as it requires the re-computation of all Taylor coefficients of lower orders. With the requirement of a rigorous Taylor approximation, symbolic differentiation becomes a viable alternative. The large degree of sharing between terms of a Taylor approximation makes it crucial, from a performance perspective, to memoize the intermediate values in the computation. However, the optimization space is very large and Jan decided to aim to complete a prototype as soon as possible, in order to observe the expected behavior of the Lohner algorithm, and thus motivate the considerable optimization work that will be needed to implement the algorithm in a satisfactory manner. Jan has finalized a naive implementation of Taylor approximations for multivariate functions and will aim for an implementation of Taylor methods for ODEs next.
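As a flavour of the kind of building block involved (plain coefficient arrays, no interval remainder), the Taylor coefficients of a product are the Cauchy convolution of the operands' coefficients, which is exactly where lower-order results get reused and memoization pays off:

```scala
// Sketch: Taylor coefficients of a product via the Cauchy convolution.
// Every lower-order coefficient is reused for each higher-order term.
def mulCoeffs(a: Array[Double], b: Array[Double]): Array[Double] =
  Array.tabulate(math.min(a.length, b.length)) { k =>
    (0 to k).map(i => a(i) * b(k - i)).sum
  }

// Example: (1 + t) * (1 + t) = 1 + 2t + t^2
println(mulCoeffs(Array(1.0, 1.0, 0.0), Array(1.0, 1.0, 0.0)).mkString(", "))
```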

Adding a generic, parallel array to a Delite DSL

During the past week Adam dusted off the Delite DSL for interval arithmetic that he has worked on previously and extended it with a generic data structure supporting a data-parallel map operation. The Delite framework supports this through its DeliteCollection and DeliteOpMap interface traits. Generic data types in Delite DSLs require some additional effort to develop, e.g. type information needs to be carried throughout the implementation to get around the erasure that eliminates type information in JVM bytecode. This is most conveniently done using Scala's type manifests.
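A simplified sketch of the erasure workaround, stripped of the Delite-specific plumbing: a Manifest context bound carries the element type to runtime so a typed array can be allocated despite erasure.

```scala
// Simplified sketch, outside of Delite: the Manifest context bound lets the
// generic wrapper allocate an Array[T] at runtime despite JVM erasure.
// (The real DeliteCollection/DeliteOpMap plumbing is more involved.)
class GenericArray[T: Manifest](val length: Int) {
  val data: Array[T] = new Array[T](length)   // needs the manifest to construct
  def map[U: Manifest](f: T => U): GenericArray[U] = {
    val out = new GenericArray[U](length)
    var i = 0
    while (i < length) { out.data(i) = f(data(i)); i += 1 }
    out
  }
}
```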
Thanks to the kind people on the delite-devel mailing list, Adam was also able to understand how to implement the supporting operations that need to be added so that Delite can use the custom data type (a generic array). Something that is still under investigation is how to generate CUDA code from the example DSL program (an Interval Newton-based root finder), as the side-effectful functions that work well with the Scala code generator seem to cause problems with the CUDA generator.

Monday, April 8, 2013

Eliminating book-keeping using locks

In an earlier experiment, Adam and Jan showed that eliminating the book-keeping data structure Changeset, which encodes object graph transformations, caused the parallel interpreter to become non-deterministic. Adam realized that the non-deterministic behavior was likely caused by data races during concurrent updates of the Acumen heap. A fix was implemented using locks to sequentialize updates to the object graph. Indeed, this removed the non-deterministic behavior, and Adam then proceeded to remove the book-keeping from the Changeset data structure altogether, reducing it to a simple flag indicating whether the fixpoint of the discrete update computation has been reached. As a result, the implementation of the parallel interpreter was simplified and, as a bonus, an average 5-10% speedup was observed during benchmarking. An additional benefit is that the resulting interpreter is more suitable for implementation in imperative languages.
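A minimal sketch of the locking idea (a stand-in heap type, not the interpreter's actual object graph):

```scala
// Stand-in heap: tasks may compute updates in parallel, but writes to the
// shared heap are serialised through a single lock.
import scala.collection.mutable

object LockedHeap {
  private val heap = mutable.Map[Int, Double]()
  private val lock = new Object

  def write(id: Int, value: Double): Unit = lock.synchronized {
    heap(id) = value   // only one thread mutates the heap at a time
  }

  def read(id: Int): Option[Double] = lock.synchronized { heap.get(id) }
}
```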

Going recursive with Taylor methods

Jan has been working on an implementation of the C^1-Lohner ODE-IVP solver described previously. As a first step he has been considering various ways to implement Taylor methods for ODEs, and following the advice of Barrio's Performance of the Taylor series method for ODEs/DAEs, Jan decided to use a recursive scheme based on automatic differentiation. This is also the scheme suggested by Zgliczynski in the paper describing the Lohner algorithm itself. The correct choice of algorithm for this sub-procedure of the solver is crucial, as the computation is performed in each time step. Ideally, as much of the computation as possible should be performed at initialization, when the field of the IVP is first identified, and Jan will keep this possibility in mind when making design choices.
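As a tiny illustration of the recursive scheme (not the C^1-Lohner code itself): for x' = x with x(0) = 1, each Taylor coefficient is obtained from the previous one as x_{k+1} = x_k / (k + 1), reproducing the series of exp(t):

```scala
// Recursive Taylor coefficients for x' = x, x(0) = 1: coefficient k is 1/k!
val coeffs = (1 to 8).scanLeft(1.0)((xk, k) => xk / k)
println(coeffs.mkString(", "))  // 1.0, 1.0, 0.5, 0.16666..., ...
```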