Add V1 Introspective Training Tests #2859
6d3fdb6 to 55e7966
using Microsoft.ML.Trainers.FastTree;
using Microsoft.ML.Trainers;
using Xunit;
using Microsoft.ML.Functional.Tests.Datasets;
/// Verify that a numerical array has no NaNs or infinities.
/// </summary>
/// <param name="array">An array of doubles.</param>
public static void AssertFiniteNumbers(double[] array, int ignoreElementAt = -1)
AssertFiniteNumbers
Where is this function being used? #Resolved
That's right. I put it in Common because I imagine that I'll use it again. Although ignoreElementAt is definitely a binning-only kind of thing.
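For readers following the thread, here is a minimal sketch of how a helper with this signature might behave. The body is an assumption, not the code in this PR: it throws a plain exception rather than using xUnit assertions so that it stands alone, and the class name `Common` is taken only from the reply above.

```csharp
using System;

public static class Common
{
    // Hypothetical sketch: fail if any element is NaN or infinite,
    // optionally skipping the element at index ignoreElementAt.
    public static void AssertFiniteNumbers(double[] array, int ignoreElementAt = -1)
    {
        for (int i = 0; i < array.Length; i++)
        {
            // Skip the one position the caller asked us to ignore.
            if (i == ignoreElementAt)
                continue;

            if (double.IsNaN(array[i]) || double.IsInfinity(array[i]))
                throw new InvalidOperationException(
                    $"Element {i} is not a finite number: {array[i]}");
        }
    }
}
```

A call such as `Common.AssertFiniteNumbers(bounds, ignoreElementAt: bounds.Length - 1)` would then tolerate a single infinite entry, which is presumably why the parameter is described as a binning-only concern.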
}

/// <summary>
/// I can take an existing model file and inspect what transformers were included in the pipeline.
I can take an existing model file
You are not taking a model file. You are constructing the pipeline in the test. #Resolved
There was a problem hiding this comment.
Good point. I am updating the summary. I changed this test to just look at pipelines, and not necessarily at serialization / deserialization. There will be model-file-specific tests that test serialization and deserialization, so I decided to not test that here.
var column = currentSchema.GetColumnOrNull(expectedColumn);
Assert.Null(column);
}
i++;
Seems a bit complex and overkill. We only have two transforms in the chain, so this will run for the first transform and will check that the output schema does not contain Score. #Resolved
// Transform the data.
var transformedData = model.Transform(data);

// Verify that the slotnames can be used to backtrack by confirming that
}

[Fact]
public void InspectNestedPipeline()
InspectNestedPipeline
Missing summary. #Resolved
artidoro left a comment
After you address the comments I think it's ready to go!
var model = pipeline.Fit(data);

// Extract the normalizer from the trained pipeline.
// TODO #2854: Extract the normalizer parameters.
2854
See the issue and the sample on normalizers; I think we can extract the parameters. #Resolved
public float HoursPerWeek { get; set; }

/// <summary>
/// The list of columns commonly used as numerical features.
Suggested change:
- /// The list of columns commonly used as numerical features.
+ /// The list of columns commonly used as categorical features.
#Resolved
29371f4 to 4f7d8f5
Codecov Report
@@ Coverage Diff @@
## master #2859 +/- ##
=========================================
Coverage ? 71.72%
=========================================
Files ? 812
Lines ? 142678
Branches ? 16124
=========================================
Hits ? 102330
Misses ? 35936
Partials ? 4412
This PR adds tests to cover the Introspective Training scenarios we want fully supported in V1.
I can take an existing model file and inspect what transformers were included in the pipeline.
I can inspect the coefficients (weights and bias) of a linear model without much work. Easy to find via auto-complete.
I can inspect the normalization coefficients of a normalizer in my pipeline without much work. Easy to find via auto-complete.
I can inspect the trees of a boosted decision tree model without much work. Easy to find via auto-complete.
I can inspect the topics after training an LDA transform. Easy to find via auto-complete.
I can inspect a categorical transform and see which feature values map to which key values. Easy to find via auto-complete.
P1: I can access the GAM feature histograms through APIs
Fixes: #2498