API design

With the current API, if one wants to project in d=3, one has to know the exact number *n* of optional arguments before specifying 3 as the *n+1*th argument. This feels a bit uneasy, and it means that we can't add a supplementary hyperparameter to any method without it being a breaking change.

It seems to be that it would be nice to rethink the API "à la D3", so that:
- all the algorithms can be called interchangeably
- we could separate the training and transform phases (#11)
- we could specify hyperparameters individually
- we could serialize the model (in and out : save and load)

I would imagine that this could be structured as:

* new Druid([method or model]) — create a druid
* _druid_.values([accessor]) — sets the values accessor if specified, and returns the druid; return the values accessor if not specified
* _druid_.dimensions([number]) — sets or returns the dimensions (default: 2)
* _druid_.class([accessor]) — sets or returns the class accessor (for LDA)
* _druid_.method([name or class])  — sets the current method (UMAP, FASTMAP etc) if specified and returns the druid ; if not specified, return the method (as a Class or function).
* _druid_.fit(data) — train the model on the data and returns the druid
* _druid_.transform([data]) — transforms the data if specified; if data is not specified, returns the transformed train set
* _druid_.model([model]) — returns the serialized model (JSON) if a model is not specified, loads the model if specified

And for each hyperparameter, for example UMAP/*min_dist*
* _druid_.min_dist([min_dist]) — if specified, sets the min_dist hyperparameter and returns the druid, or read it if not specified

With this we could say for example:

~~~js
const dr = new Druid("LDA"); // dr
dr.dimensions(2).class(d => d.species).values(d => [+d.sepal_length, +d.petal_length, …]).fit(data); // dr
dr.transform(); // transformed data
const model = dr.model(); // JSON {}
…
const dr = new Druid(model); // dr
dr.transform([new data]); // apply the model to new data…
~~~

I wonder what should be done for NaN, I suppose they should be automatically ignored if the values accessor returns any NaN.

Note also that some methods such as UMAP can accept a distance matrix instead of a data array.


PS: Sorry for spamming your project :) The potential is very exciting.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API design #13

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

API design #13

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions