Dan Barowy edited this page Feb 14, 2014 · 6 revisions

Write a Partial Recomputation Engine

Presently, CheckCell relies on Excel's own computation engine to explore the effect of altering function inputs. While this works well, calls to Excel's API are extremely expensive, so these recomputations carry a very high overhead. For thread-safety, Excel also serializes all accesses to Excel objects, which means that parallel recomputation is currently not possible even though data debugging is an embarrassingly parallel algorithm.

We would like a GSoC student to build a spreadsheet calculation engine that allows us to make better use of parallelism in CheckCell. The following things will need to be done:

  • Reimplement a large number of Excel functions. There should be a fallback mechanism (i.e., use Excel) to handle functions that we do not implement. We have a small codebase started by a different GSoC student that can be used as a starting point.
  • The recomputation engine should use the graph (specifically, a DAG) of the computation generated by CheckCell in order to determine which subgraphs don't need to be recalculated, thus allowing efficient partial recomputation.
  • Ideally, the engine should be written in a way that does not rely on Excel specifically, so that the same system can be used with other spreadsheet software packages (e.g., OpenOffice).
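
The first two items can be illustrated with a minimal sketch. It is written in Python for brevity and is not CheckCell code; the real engine would be written in C# or F#. Everything in it is an assumption made for illustration: the `NATIVE` table, the `call_function`, `dirty_cells`, and `recompute` helpers, the dictionary-based graph representation, and the `excel_stub` fallback that stands in for a real call into Excel. It also assumes that the changed cells are plain data inputs rather than formulas.

```python
from collections import defaultdict, deque

# Hypothetical table of natively reimplemented functions.
NATIVE = {
    "SUM": lambda *xs: sum(xs),
    "MAX": lambda *xs: max(xs),
}

def call_function(name, args, fallback):
    """Use the native implementation if one exists, else defer to the fallback."""
    impl = NATIVE.get(name)
    if impl is None:
        return fallback(name, args)
    return impl(*args)

def dirty_cells(dependents, changed):
    """Return the changed cells plus every cell downstream of them in the DAG."""
    dirty = set(changed)
    queue = deque(changed)
    while queue:
        cell = queue.popleft()
        for dep in dependents.get(cell, ()):
            if dep not in dirty:
                dirty.add(dep)
                queue.append(dep)
    return dirty

def recompute(formulas, values, changed, fallback):
    """Recompute only the formulas affected by `changed`, in dependency order.

    formulas: cell -> (function name, list of input cells)
    values:   cell -> current value (updated in place)
    Returns the list of cells that were recomputed.
    """
    # Invert the input edges so we can walk from a cell to its dependents.
    dependents = defaultdict(list)
    for cell, (_, inputs) in formulas.items():
        for inp in inputs:
            dependents[inp].append(cell)

    todo = {c for c in dirty_cells(dependents, changed) if c in formulas}
    # Kahn's topological sort, restricted to the dirty subgraph.
    indegree = {c: sum(1 for i in formulas[c][1] if i in todo) for c in todo}
    ready = deque(c for c in todo if indegree[c] == 0)
    order = []
    while ready:
        cell = ready.popleft()
        name, inputs = formulas[cell]
        values[cell] = call_function(name, [values[i] for i in inputs], fallback)
        order.append(cell)
        for dep in dependents.get(cell, ()):
            if dep in todo:
                indegree[dep] -= 1
                if indegree[dep] == 0:
                    ready.append(dep)
    return order

# Stand-in for the Excel fallback; records which functions it was asked for.
fallback_calls = []
def excel_stub(name, args):
    fallback_calls.append(name)
    if name == "AVERAGE":
        return sum(args) / len(args)
    raise NotImplementedError(name)

formulas = {
    "B1": ("SUM", ["A1", "A2"]),
    "B2": ("MAX", ["A3", "A4"]),
    "C1": ("AVERAGE", ["B1", "B2"]),  # not in NATIVE, so it uses the fallback
}
values = {"A1": 1, "A2": 2, "A3": 3, "A4": 4, "B1": 3, "B2": 4, "C1": 3.5}

values["A1"] = 10
order = recompute(formulas, values, ["A1"], excel_stub)
```

In this example only `B1` and `C1` are recomputed after `A1` changes; `B2` lies outside the affected subgraph and is left alone. Because the engine only sees a graph, a function table, and a fallback callable, nothing in the sketch depends on Excel itself, which is the property the third item asks for.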

Technical Challenges

  • Our parser will need to be extended to handle new functions, one at a time. This parser is written in F# using FParsec.
  • Excel's APIs and object model are poorly designed, and do not make good use of the type-safety features in modern managed languages like C#. We have attempted to abstract away many of these problems in CheckCell, but it is possible that our abstraction is leaky and will need to be further refined.
  • Reimplementing Excel's many functions will require a deep understanding of their semantics, which in some cases are poorly documented. It will be essential to have a comprehensive test suite to ensure that our implementations match Excel's behavior.
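
One way to approach the test-suite requirement is differential testing: run each reimplemented function and a reference oracle on the same inputs and report any disagreement. The sketch below is in Python for brevity and is only an illustration; the real suite would live in Visual Studio's test framework and would query Excel directly. The names `differential_test`, `naive_round`, and `half_away_round` are hypothetical, and the `recorded` table is a hand-written stand-in for values obtained from Excel. It uses `ROUND` as the example because Excel rounds ties away from zero, whereas Python's built-in `round` rounds ties to even.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

def differential_test(name, native, oracle, cases, tol=1e-9):
    """Return (args, native result, oracle result) for every disagreement."""
    mismatches = []
    for args in cases:
        got = native(*args)
        want = oracle(name, args)
        if not math.isclose(got, want, rel_tol=tol, abs_tol=tol):
            mismatches.append((args, got, want))
    return mismatches

# Hand-written stand-in for results recorded from Excel's ROUND.
recorded = {(2.5, 0): 3.0, (1.4, 0): 1.0, (-2.5, 0): -3.0}

def oracle(name, args):
    return recorded[tuple(args)]

def naive_round(x, digits):
    # Python's round() sends ties to the nearest even digit: round(2.5, 0) == 2.0.
    return round(x, digits)

def half_away_round(x, digits):
    # Round ties away from zero by quantizing to 10**-digits.
    quantum = Decimal(1).scaleb(-digits)
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))

cases = list(recorded)
naive_mismatches = differential_test("ROUND", naive_round, oracle, cases)
fixed_mismatches = differential_test("ROUND", half_away_round, oracle, cases)
```

Here the naive version disagrees with the recorded values on both tie cases, while the tie-aware version agrees on all three. That only shows agreement on these recorded inputs; a realistic suite would need far more cases per function, including error values and non-numeric arguments.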

Ideal Skills

  • Experience with C# is required. Experience with F# is a major plus.
  • Familiarity with Visual Studio's test suite is required.
  • Familiarity with Excel's API is recommended.
  • Some knowledge of programming language design (parsers, code transformers, etc.) is a plus.
