With the current API, if one wants to project in d=3, one has to know the exact number n of optional arguments before specifying 3 as the (n+1)th argument. This feels a bit awkward, and it means that we can't add a new hyperparameter to any method without it being a breaking change.
It seems to me that it would be nice to rethink the API "à la D3", so that:
we could serialize the model (in and out: save and load)
I would imagine that this could be structured as:
new Druid([method or model]) — create a druid
druid.values([accessor]) — sets the values accessor if specified, and returns the druid; returns the values accessor if not specified
druid.dimensions([number]) — sets or returns the dimensions (default: 2)
druid.class([accessor]) — sets or returns the class accessor (for LDA)
druid.method([name or class]) — sets the current method (UMAP, FASTMAP etc.) if specified, and returns the druid; if not specified, returns the method (as a class or function)
druid.fit(data) — trains the model on the data and returns the druid
druid.transform([data]) — transforms the data if specified; if data is not specified, returns the transformed train set
druid.model([model]) — returns the serialized model (JSON) if a model is not specified, loads the model if specified
And for each hyperparameter, for example UMAP/min_dist
druid.min_dist([min_dist]) — if specified, sets the min_dist hyperparameter and returns the druid; if not specified, returns its current value
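To make the getter/setter convention above concrete, here is a minimal sketch of how the configuration side could look. Everything in it is an assumption for illustration: `DruidSketch`, its private fields and the shape of the object returned by `model()` are hypothetical, and it deliberately leaves out `fit` and `transform`. Note that accessor functions cannot go through JSON, so only the plain settings are saved here.

```javascript
// Hypothetical sketch of the proposed accessor pattern; not the library's actual API.
class DruidSketch {
  constructor(methodOrModel) {
    this._dimensions = 2;      // default from the proposal
    this._values = (d) => d;   // assumed default: use the datum as-is
    this._params = {};         // per-method hyperparameters
    if (methodOrModel !== null && typeof methodOrModel === "object") {
      this.model(methodOrModel); // an object is treated as a saved model
    } else {
      this._method = methodOrModel; // otherwise a method name (or undefined)
    }
  }

  // Each accessor sets and returns `this` when given a value, and reads otherwise.
  dimensions(n) {
    if (n === undefined) return this._dimensions;
    this._dimensions = n;
    return this;
  }

  values(accessor) {
    if (accessor === undefined) return this._values;
    this._values = accessor;
    return this;
  }

  method(name) {
    if (name === undefined) return this._method;
    this._method = name;
    return this;
  }

  min_dist(value) {
    if (value === undefined) return this._params.min_dist;
    this._params.min_dist = value;
    return this;
  }

  // Without an argument: return a JSON-compatible snapshot of the settings.
  // With an argument: restore the settings from such a snapshot.
  model(saved) {
    if (saved === undefined) {
      return {
        method: this._method,
        dimensions: this._dimensions,
        params: { ...this._params },
      };
    }
    this._method = saved.method;
    this._dimensions = saved.dimensions;
    this._params = { ...saved.params };
    return this;
  }
}

// Settings survive a round trip through JSON.
const dr = new DruidSketch("UMAP").dimensions(3).min_dist(0.1);
const saved = JSON.stringify(dr.model());
const restored = new DruidSketch(JSON.parse(saved));
console.log(restored.dimensions()); // 3
```

With this shape, adding a new hyperparameter means adding one accessor, so existing calls keep working.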
With this we could say for example:
    const dr = new Druid("LDA"); // dr
    dr.dimensions(2).class(d => d.species).values(d => [+d.sepal_length, +d.petal_length, …]).fit(data); // dr
    dr.transform(); // transformed data
    const model = dr.model(); // JSON {}
    …
    const dr = new Druid(model); // dr
    dr.transform([newdata]); // apply the model to new data
    …
I wonder what should be done about NaN values. I suppose a datum should be automatically ignored if the values accessor returns any NaN for it.
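One possible way to handle this, sketched under assumptions: apply the values accessor to each datum, drop the rows that contain a NaN, and remember which input indices were kept so the results can be matched back to the original data. The helper name `dropNaNRows` and the sample records are made up for illustration.

```javascript
// Hypothetical helper: apply a values accessor and skip any datum
// whose extracted row contains a NaN.
function dropNaNRows(data, values) {
  const rows = []; // numeric rows that passed the check
  const kept = []; // indices into `data` of the rows that were kept
  data.forEach((d, i) => {
    const row = values(d);
    if (!row.some(Number.isNaN)) {
      rows.push(row);
      kept.push(i);
    }
  });
  return { rows, kept };
}

// Made-up sample records; "n/a" coerces to NaN with the unary plus.
const data = [
  { sepal_length: "5.1", petal_length: "1.4" },
  { sepal_length: "n/a", petal_length: "1.3" },
  { sepal_length: "6.2", petal_length: "4.5" },
];
const { rows, kept } = dropNaNRows(data, (d) => [+d.sepal_length, +d.petal_length]);
console.log(rows); // [[5.1, 1.4], [6.2, 4.5]]
console.log(kept); // [0, 2]
```

Returning the kept indices is a design choice: it lets `transform()` output be aligned with the input even when some rows were skipped.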
Note also that some methods such as UMAP can accept a distance matrix instead of a data array.
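For reference, this is roughly what such a distance matrix looks like when built from a data array with the Euclidean metric. It is only a sketch: `distanceMatrix` is a hypothetical helper, and how the druid would be told that its input is already a distance matrix is left open.

```javascript
// Hypothetical helper: symmetric Euclidean distance matrix for an array of numeric rows.
function distanceMatrix(rows) {
  const n = rows.length;
  const D = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      let sum = 0;
      for (let k = 0; k < rows[i].length; k++) {
        const diff = rows[i][k] - rows[j][k];
        sum += diff * diff;
      }
      // The matrix is symmetric, so fill both halves at once.
      D[i][j] = D[j][i] = Math.sqrt(sum);
    }
  }
  return D;
}

console.log(distanceMatrix([[0, 0], [3, 4]])); // [[0, 5], [5, 0]]
```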
PS: Sorry for spamming your project :) The potential is very exciting.