Conversation
| // Use the path object to pass to GetLeaf, which will populate path with the IDs of th nodes from root to leaf. | ||
| List<int> path = default; | ||
| // Get the ID of the leaf this example ends up in tree 0. | ||
| var leafID = modelParams.GetLeaf(0, in testRow, ref path); |
There was a problem hiding this comment.
Removed because our public APIs doesn't provide those functions. #Resolved
There was a problem hiding this comment.
But you can still traverse through trees with new structure, right?
In reply to: 260562343 [](ancestors = 260562343)
There was a problem hiding this comment.
No, we cannot but this definitely should be hidden here. @[email protected], do you think this feature is required in tree's public APIs?
In reply to: 261011054 [](ancestors = 261011054,260562343)
There was a problem hiding this comment.
I think the trees should enable someone to explain what the tree is. I wouldn't make evaluation code part of the API. Any more than we insist that linear model parameters expose "do dot product" as a method on the model parameters. Both things are potentially useful of course, but we don't need it as part of our public API (at least at first).
In reply to: 261417026 [](ancestors = 261417026,261011054,260562343)
There was a problem hiding this comment.
Sounds good. I feel evaluation module should be independent to the expression of the trained objects. Thanks you!
In reply to: 261734289 [](ancestors = 261734289,261417026,261011054,260562343)
| [Argument(ArgumentType.LastOccurenceWins, HelpText = "The seed of the random number generator", ShortName = "r1")] | ||
| public int RngSeed = 123; | ||
| [Argument(ArgumentType.LastOccurenceWins, HelpText = "The seed of the random number generator", ShortName = "r1,RngSeed")] | ||
| public int RandomSeed = 123; |
There was a problem hiding this comment.
RandomSeed [](start = 19, length = 10)
I think everywhere else we call it just seed. #Resolved
There was a problem hiding this comment.
| [Argument(ArgumentType.LastOccurenceWins, HelpText = "The seed of the active feature selection", ShortName = "r3,FeatureSelectSeed", Hide = true)] | ||
| [TGUI(NotGui = true)] | ||
| public int FeatureSelectSeed = 123; | ||
| public int FeatureSelectionRandomSeed = 123; |
There was a problem hiding this comment.
FeatureSelectionRandomSeed [](start = 19, length = 26)
Do we really need two of them? #Resolved
There was a problem hiding this comment.
Kind of. There are two degrees of control that internal people wanted to independently vary or not vary: selection of features and selection of data. I have not kept touch with this set of people over the years but the reasoning at one point was definitely considered important. Let us not change it. #Resolved
There was a problem hiding this comment.
| [Argument(ArgumentType.LastOccurenceWins, HelpText = "Minimum categorical docs percentage in a bin to consider for a split.", ShortName = "mdop")] | ||
| public double MinDocsPercentageForCategoricalSplit = 0.001; | ||
| [Argument(ArgumentType.LastOccurenceWins, HelpText = "Minimum categorical docs percentage in a bin to consider for a split.", ShortName = "mdop,MinDocsPercentageForCategoricalSplit")] | ||
| public double MinExamplePercentageForCategoricalSplit = 0.001; |
There was a problem hiding this comment.
Min [](start = 22, length = 3)
Minimum #Resolved
| [Argument(ArgumentType.LastOccurenceWins, HelpText = "Minimum categorical docs percentage in a bin to consider for a split.", ShortName = "mdop")] | ||
| public double MinDocsPercentageForCategoricalSplit = 0.001; | ||
| [Argument(ArgumentType.LastOccurenceWins, HelpText = "Minimum categorical docs percentage in a bin to consider for a split.", ShortName = "mdop,MinDocsPercentageForCategoricalSplit")] | ||
| public double MinExamplePercentageForCategoricalSplit = 0.001; |
There was a problem hiding this comment.
0.001 [](start = 64, length = 5)
is it 0.001% or 0.1% ? #Resolved
There was a problem hiding this comment.
0.1%. Docs are updated to all Percentage and Fraction arguments.
In reply to: 260563129 [](ancestors = 260563129)
| /// <summary> | ||
| /// Copy the weights of all training features to <paramref name="weights"/>. | ||
| /// </summary> | ||
| public void GetFeatureWeights(ref VBuffer<float> weights) |
There was a problem hiding this comment.
Not sure if this should return IReadOnlyList<float>.
| [Argument(ArgumentType.LastOccurenceWins, HelpText = "Maximum number of distinct values (bins) per feature", ShortName = "mb")] | ||
| public int MaxBins = 255; // save one for undefs | ||
| [Argument(ArgumentType.LastOccurenceWins, HelpText = "Maximum number of distinct values (bins) per feature", ShortName = "mb,MaxBins")] | ||
| public int MaxBinCountPerFeature = 255; // save one for undefs |
There was a problem hiding this comment.
Max [](start = 19, length = 3)
Maximum #Resolved
| public Double FeatureFractionPerSplit = 1; | ||
|
|
||
| /// <summary> | ||
| /// Smoothing paramter for tree regularization. |
There was a problem hiding this comment.
paramter [](start = 22, length = 8)
parameter #Resolved
It has nothing to do with TLC TV. It is however important for command line applications, less so for API. Let us make internal for now. #Resolved |
I've looked into RegressionTree.cs and some things like: Refers to: src/Microsoft.ML.FastTree/FastTree.cs:3393 in 3257470. [](commit_id = 3257470, deletion_comment = False) |
|
Sure. In reply to: 467958748 [](ancestors = 467958748) |
We use using (var ch = Host.Start("Training"))
{
ch.CheckValue(trainData, nameof(trainData));
trainData.CheckBinaryLabel();
trainData.CheckFeatureFloatVector();
trainData.CheckOptFloatWeight();
FeatureCount = trainData.Schema.Feature.Value.Type.GetValueCount();
ConvertData(trainData);
TrainCore(ch);
}In reply to: 467689294 [](ancestors = 467689294) Refers to: src/Microsoft.ML.FastTree/FastTreeArguments.cs:301 in d64a4ae. [](commit_id = d64a4ae, deletion_comment = False) |
| using Microsoft.ML.CommandLine; | ||
| using Microsoft.ML.Data; | ||
| using Microsoft.ML.EntryPoints; | ||
| using Microsoft.ML.Internal.Internallearn; |
There was a problem hiding this comment.
Are you merged with master? I thought that InternalLearn was removed in a prior change. #Resolved
There was a problem hiding this comment.
suggestion: Refers to: src/Microsoft.ML.FastTree/FastTree.cs:104 in 3257470. [](commit_id = 3257470, deletion_comment = False) |
|
|
||
| /// <summary> | ||
| /// Copy the weights of all training features to <paramref name="weights"/>. | ||
| /// </summary> |
There was a problem hiding this comment.
</ [](start = 12, length = 2)
Missing pararm for weights #Resolved
There was a problem hiding this comment.
Thanks. I will add
/// <param name="weights">a <see cref="VBuffer{T}"/> where feature weights would be assigned to.
/// The i-th element in <paramref name="weights"/> stores the weight of the i-th feature.</param>In reply to: 261459910 [](ancestors = 261459910)
| int numTrees = Defaults.NumTrees, | ||
| int minDatapointsInLeaves = Defaults.MinDocumentsInLeaves, | ||
| double learningRate = Defaults.LearningRates) | ||
| int numLeaves = Defaults.NumberOfLeaves, |
There was a problem hiding this comment.
numLeaves [](start = 16, length = 9)
same suggestion here to use the full names for the arguments. #Resolved
There was a problem hiding this comment.
Should this be renamed to GamBinaryClassificationTrainer? I know there is an issue that is tracking consistent naming with our algorithms, so probably not worth doing in this pr. Refers to: src/Microsoft.ML.FastTree/GamClassification.cs:52 in 3257470. [](commit_id = 3257470, deletion_comment = False) |
| int numTrees = Defaults.NumTrees, | ||
| int minDatapointsInLeaves = Defaults.MinDocumentsInLeaves, | ||
| double learningRate = Defaults.LearningRates, | ||
| int numLeaves = Defaults.NumberOfLeaves, |
There was a problem hiding this comment.
numLeaves [](start = 16, length = 9)
Shouldnt these parameters also be fully named? #Resolved
There was a problem hiding this comment.
Thanks! I didn't realize they are FastTree static APIs.
In reply to: 261464899 [](ancestors = 261464899)
| /// https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/bitwise-complement-operator. | ||
| /// </summary> | ||
| public IReadOnlyList<int> LteChild => _lteChild; | ||
| public IReadOnlyList<int> LessThanOrEqualToThresholdChildren => _lteChild; |
There was a problem hiding this comment.
LessThanOrEqualToThresholdChildren [](start = 34, length = 34)
It looks like across lightgbm, catboost and whatever they use in scikit this one called: leftChildrends, or leftChild, another one is rightChild. #Resolved
There was a problem hiding this comment.
As suggested by @[email protected], this will be internal. In reply to: 468890167 [](ancestors = 468890167) Refers to: src/Microsoft.ML.FastTree/FastTreeArguments.cs:105 in ee67d50. [](commit_id = ee67d50, deletion_comment = False) |
| ectx.CheckUserArg(0 <= secondaryMetricShare && secondaryMetricShare <= 1, "secondaryMetricShare", "secondaryMetricShare must be between 0 and 1."); | ||
| #endif | ||
| ectx.CheckUserArg(0 < LambdaMartMaxTruncation, nameof(LambdaMartMaxTruncation), "lambdaMartMaxTruncation must be positive."); | ||
| ectx.CheckUserArg(0 < NdcgTruncationLevel, nameof(NdcgTruncationLevel), "lambdaMartMaxTruncation must be positive."); |
There was a problem hiding this comment.
lambdaMartMaxTruncation [](start = 89, length = 23)
no need for this one. you already use nameof #Resolved
| /// <summary> | ||
| /// Early stopping metrics. | ||
| /// </summary> | ||
| public EarlyStoppingMetric EarlyStoppingMetric |
There was a problem hiding this comment.
EarlyStoppingMetric [](start = 19, length = 19)
From comments for previous option it looks like it should be applicable only for regression and ranking problem. Does it actually works for binary classification? #Pending
There was a problem hiding this comment.
I don't exactly know but conceptually yes. It can be used to measure gradient norm.
In reply to: 262295829 [](ancestors = 262295829)
make this one internal and next two below it. Refers to: src/Microsoft.ML.FastTree/FastTreeArguments.cs:737 in 157aedc. [](commit_id = 157aedc, deletion_comment = False) |
| }, | ||
| { | ||
| "Name": "MaxTreesAfterCompression", | ||
| "Type": "Int", |
There was a problem hiding this comment.
This PR fixes #2617 (also fixes #2375, fixes #2857) and is also a part of #2613.
Cleaned items: