SDCA Regression and BinaryClassification Pigsty extensions by TomFinley · Pull Request #837 · dotnet/machinelearning

TomFinley · 2018-09-05T06:21:19Z

Related to the closed #632 and the API overall. In here I introduce extensions for SDCA regression and binary classification. (No multiclass until I also write extensions for term.)

This also includes a set of general-purpose utilities for writing reconcilers for trainers, though, again, only for regression, binary classification, and binary classification without probabilities so far.

Also a minor change to the linear classification trainer, as it sometimes failed to identify that it was producing probabilities.

sfilipi · 2018-09-05T15:38:12Z

src/Microsoft.ML.Data/StaticPipe/TrainerEstimatorReconciler.cs

+        /// Constructor for the base class.
+        /// </summary>
+        /// <param name="inputs">The set of inputs</param>
+        /// <param name="outputNames">The names of the outputs, which we assume cannot be changed</param>


Real question: do we put a period in the sentences of the <summary> section, but not in the sentences in <param>? #Closed

Having complete sentences in the summary is essential. For params I'm less sure, but I think I prefer to have them; I'll try to add them.

In reply to: 215322337 [](ancestors = 215322337)

sfilipi · 2018-09-05T15:39:44Z

src/Microsoft.ML.Data/StaticPipe/TrainerEstimatorReconciler.cs

+namespace Microsoft.ML.Data.StaticPipe.Runtime
+{
+    /// <summary>
+    /// General purpose reconciler for a typical case with trainers, where they accept some generally


reconciler [](start = 24, length = 10)

I am still fuzzy on the purpose of reconcilers. Is it a bridge from estimators to trainers? #Closed

Hi @sfilipi, thanks for the question -- I covered this somewhat in #632. In particular, if you check out the section "one or two implementation details," paragraphs 2 through 4 of that section deal with the role of reconcilers. (As do, I suppose, the XML docs for reconcilers.)

The idea is that the user has declaratively defined a delegate that describes what they want done, and the analyzer that calls that delegate does a lot of work to determine what should go where, but at the last step there has to be something responsible for actually creating those IEstimator (or, in some cases, IDataReaderEstimator) objects. That thing is the reconciler.

It's not a great name. #Resolved
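To make the division of labor described above concrete, here is a minimal, self-contained sketch. Every type name in it (IEstimatorSketch, ColumnSketch, ReconcilerSketch, ScaleReconciler, AnalyzerSketch) is hypothetical and invented for illustration; none of them is the actual ML.NET API, and the real analyzer does far more work than this.

```csharp
using System;

// Hypothetical stand-in for an estimator; not the real IEstimator.
public interface IEstimatorSketch
{
    string Describe();
}

public sealed class NamedEstimator : IEstimatorSketch
{
    private readonly string _description;
    public NamedEstimator(string description) => _description = description;
    public string Describe() => _description;
}

// The object that is finally responsible for creating the estimator.
public abstract class ReconcilerSketch
{
    // Called once the analyzer has decided on concrete column names.
    public abstract IEstimatorSketch Reconcile(string[] inputNames, string[] outputNames);
}

// A column handle the user manipulates declaratively. It holds no data;
// it only remembers which reconciler will eventually produce it.
public sealed class ColumnSketch
{
    public ReconcilerSketch Reconciler { get; }
    public ColumnSketch(ReconcilerSketch reconciler) => Reconciler = reconciler;
}

public sealed class ScaleReconciler : ReconcilerSketch
{
    public override IEstimatorSketch Reconcile(string[] inputNames, string[] outputNames)
        => new NamedEstimator($"Scale({inputNames[0]} -> {outputNames[0]})");
}

public static class AnalyzerSketch
{
    // Runs the user's delegate, then asks the reconciler attached to the
    // resulting column to build the estimator with the chosen names.
    public static IEstimatorSketch Analyze(
        Func<ColumnSketch, ColumnSketch> userDelegate, string inputName, string outputName)
    {
        var input = new ColumnSketch(null); // source column: nothing to reconcile
        ColumnSketch output = userDelegate(input);
        return output.Reconciler.Reconcile(new[] { inputName }, new[] { outputName });
    }
}
```

In this sketch the user's delegate never creates an estimator itself; it only returns a column tagged with a reconciler, and the estimator is created at the very end, once the names are known.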

sfilipi · 2018-09-05T15:47:54Z

src/Microsoft.ML.Data/StaticPipe/TrainerEstimatorReconciler.cs

+        /// <summary>
+        /// Produces the estimator. Note that this is made out of <see cref="ReconcileCore(IHostEnvironment, string[])"/>'s
+        /// return value, plus whatever usages of <see cref="CopyColumnsEstimator"/> are necessary to avoid collisions with
+        /// the output names fed to the constructor. This class provides the implementation, and subclassses should instead


subclassses [](start = 97, length = 11)

typo #Closed

sfilipi · 2018-09-05T15:59:01Z

src/Microsoft.ML.StandardLearners/Standard/SdcaStatic.cs

+            return rec.Output;
+        }
+
+        private sealed class TrivialFactory : ISupportSdcaClassificationLossFactory


TrivialFactory [](start = 29, length = 14)

would 'BaseLossFactory' be a better name? #Closed

I think the idea is that this 'factory' will always return an existing object, so it's more 'trivial' than it is 'base'

In reply to: 215330387 [](ancestors = 215330387)

Right, I agree with @Zruty0. I also do not like putting "base" in since we tend to reserve that word for abstract classes.

In reply to: 215369185 [](ancestors = 215369185,215330387)
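The naming argument above is easier to see in code. The following is a sketch under stated assumptions: IEnvSketch, ILossSketch, ILossFactorySketch, HingeLossSketch and TrivialFactorySketch are hypothetical stand-ins, not the real IHostEnvironment, ISupportSdcaClassificationLoss or ISupportSdcaClassificationLossFactory types. It only illustrates a factory that hands back an object it was already given.

```csharp
using System;

// Hypothetical stand-ins for the environment and loss interfaces.
public interface IEnvSketch { }

public interface ILossSketch
{
    double Loss(double output, double label);
}

public interface ILossFactorySketch
{
    ILossSketch CreateComponent(IEnvSketch env);
}

// A simple hinge loss, with the label assumed to be +1 or -1.
public sealed class HingeLossSketch : ILossSketch
{
    public double Loss(double output, double label)
        => Math.Max(0.0, 1.0 - output * label);
}

// "Trivial" because it manufactures nothing: it always returns the
// instance supplied at construction. It is sealed and concrete, so
// "base" would be a misleading name.
public sealed class TrivialFactorySketch : ILossFactorySketch
{
    private readonly ILossSketch _loss;

    public TrivialFactorySketch(ILossSketch loss)
        => _loss = loss ?? throw new ArgumentNullException(nameof(loss));

    // env is deliberately unused, since the loss already exists.
    public ILossSketch CreateComponent(IEnvSketch env) => _loss;
}
```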

sfilipi · 2018-09-05T16:01:27Z

src/Microsoft.ML.StandardLearners/Standard/SdcaStatic.cs

+                int? maxIterations = null,
+                bool shuffle = true,
+                float biasLearningRate = 0,
+                Action<LinearBinaryPredictor, ParameterMixingCalibratedPredictor> onFit = null)


onFit [](start = 82, length = 5)

is this so we can pass different types of calibrators? #Closed

No, this is so the user of this method can be informed about the calibrator, so they know the slope, etc. I have added some XML docs for this, which attempt to explain what onFit is.

In reply to: 215331242 [](ancestors = 215331242)
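As an illustration of the onFit pattern, here is a minimal sketch. LinearModelSketch, CalibratorSketch and TrainerSketch are hypothetical types with made-up parameter values; they are not LinearBinaryPredictor or ParameterMixingCalibratedPredictor, and no real training happens. The point is only that the callback observes what was fit and cannot alter it.

```csharp
using System;

// Hypothetical stand-ins for the trained model and its calibrator.
public sealed class LinearModelSketch
{
    public float[] Weights { get; }
    public float Bias { get; }
    public LinearModelSketch(float[] weights, float bias)
    {
        Weights = weights;
        Bias = bias;
    }
}

public sealed class CalibratorSketch
{
    public double Slope { get; }
    public double Offset { get; }
    public CalibratorSketch(double slope, double offset)
    {
        Slope = slope;
        Offset = offset;
    }
}

public sealed class TrainerSketch
{
    private readonly Action<LinearModelSketch, CalibratorSketch> _onFit;

    // onFit is optional; callers who do not care about the model pass nothing.
    public TrainerSketch(Action<LinearModelSketch, CalibratorSketch> onFit = null)
        => _onFit = onFit;

    public LinearModelSketch Fit()
    {
        // Pretend training: the values below are placeholders.
        var model = new LinearModelSketch(new[] { 0.5f, -1.5f }, 0.25f);
        var calibrator = new CalibratorSketch(-1.0, 0.0);

        // Notify the caller on every Fit. The delegate returns nothing,
        // so it has no way to change what Fit returns.
        _onFit?.Invoke(model, calibrator);
        return model;
    }
}
```

A caller would typically capture the values it wants in a local variable inside the lambda, for example `new TrainerSketch((model, cali) => slope = cali.Slope)`.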

sfilipi · 2018-09-05T16:06:28Z

src/Microsoft.ML.StandardLearners/Standard/SdcaStatic.cs

+        }
+
+        public static (Scalar<float> score, Scalar<bool> predictedLabel)
+            PredictSdcaBinaryClassificationCustomLoss(this Scalar<bool> label, Vector<float> features, Scalar<float> weights = null,


PredictSdcaBinaryClassificationCustomLoss [](start = 12, length = 41)

Are we going to test those methods through the samples code we're doing later? #Closed

I guess so. But I wanted to test them now.

In reply to: 215332900 [](ancestors = 215332900)

Zruty0 · 2018-09-05T17:49:30Z

src/Microsoft.ML.Data/StaticPipe/TrainerEstimatorReconciler.cs

+            /// <param name="features">The features column name</param>
+            /// <param name="weights">The weights column name, or <c>null</c> if the reconciler was constructed with <c>null</c> weights</param>
+            /// <returns>Some sort of estimator producing columns with the fixed name <see cref="DefaultColumnNames.Score"/></returns>
+            public delegate IEstimator<ITransformer> EstimatorMaker(IHostEnvironment env, string label, string features, string weights);


Maker [](start = 62, length = 5)

Either 'builder' or 'factory' seems a bit more common for a factory method like this #Closed

Let's do factory I guess.

In reply to: 215366624 [](ancestors = 215366624)

Zruty0 · 2018-09-05T17:52:12Z

src/Microsoft.ML.StandardLearners/Standard/SdcaStatic.cs

+    /// </summary>
+    public static class SdcaStatic
+    {
+        public static Scalar<float> PredictSdcaRegression(this Scalar<float> label, Vector<float> features, Scalar<float> weights = null,


PredictSdcaRegression [](start = 36, length = 21)

Let's get into the habit of writing extensive summary comments for these, since this is our public API now #Pending

Let me know if you like my documentation.

In reply to: 215367489 [](ancestors = 215367489)

I do

In reply to: 215430305 [](ancestors = 215430305,215367489)

Zruty0 · 2018-09-05T17:54:07Z

src/Microsoft.ML.StandardLearners/Standard/SdcaStatic.cs

+            var rec = new TrainerEstimatorReconciler.Regression(
+                (env, labelName, featuresName, weightsName) =>
+                {
+                    var trainer = new SdcaRegressionTrainer(env, args, featuresName, labelName, weightsName);


args [](start = 65, length = 4)

Yuck, args! In transforms, I was trying to get rid of them. I think we should do the same for trainers? #Closed

asking for a friend :)
That is, not insisting that you do so now.

In reply to: 215368077 [](ancestors = 215368077)

Hi @Zruty0 , I agree with you. I can make an issue on this subject. This will of course not be resolved in this PR, but it is important we begin to think about it.

In reply to: 215368265 [](ancestors = 215368265,215368077)

Zruty0 · 2018-09-05T17:58:17Z

src/Microsoft.ML.StandardLearners/Standard/SdcaStatic.cs

+
+            public ISupportSdcaClassificationLoss CreateComponent(IHostEnvironment env)
+            {
+                // REVIEW: We are ignoring env?


REVIEW: We are ignoring env? [](start = 19, length = 28)

well, in this case the loss has been given to us, so no need to use env to manufacture anything, right? #Closed

That's fair I suppose.

In reply to: 215369392 [](ancestors = 215369392)

Zruty0 · 2018-09-05T18:00:00Z

src/Microsoft.ML.Data/StaticPipe/TrainerEstimatorReconciler.cs

+        }
+
+        /// <summary>
+        /// A reconciler for regression capable of handling the most common cases for binary classification


regression [](start = 29, length = 10)

binary classification? #Closed

Zruty0 · 2018-09-05T18:00:15Z

src/Microsoft.ML.Data/StaticPipe/TrainerEstimatorReconciler.cs

+        }
+
+        /// <summary>
+        /// A reconciler for regression capable of handling the most common cases for binary classifier with calibrated outputs.


regression [](start = 29, length = 10)

seems like a copypasta error #Closed

Zruty0 · 2018-09-05T18:05:24Z

src/Microsoft.ML.Data/StaticPipe/TrainerEstimatorReconciler.cs

+            /// </summary>
+            public (Scalar<float> score, Scalar<bool> predictedLabel) Output { get; }
+
+            protected override IEnumerable<PipelineColumn> Outputs { get; }


I'm not sure I fully grasp the difference between Output and Outputs... #Closed

I'll add a comment. They only really differ in content in this one class, due to the fact that at compile time we can only be sure two are present, while at runtime we also have to be sensitive as to whether we are in fact producing that extra probability column.

In reply to: 215371730 [](ancestors = 215371730)
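The distinction can be sketched as follows. BinaryOutputsSketch is a hypothetical class that uses plain column-name strings in place of Scalar<T> and PipelineColumn, and the "Probability" name is an assumption for illustration. It shows a statically typed pair that is always present next to a runtime list that may carry one extra column.

```csharp
using System.Collections.Generic;

public sealed class BinaryOutputsSketch
{
    // What the compiler can promise: exactly these two columns.
    public (string score, string predictedLabel) Output { get; }

    // What actually exists at runtime: the two above, plus a probability
    // column if the trainer turns out to be calibrated.
    public IReadOnlyList<string> Outputs { get; }

    public BinaryOutputsSketch(bool producesProbability)
    {
        Output = ("Score", "PredictedLabel");

        var all = new List<string> { Output.score, Output.predictedLabel };
        if (producesProbability)
            all.Add("Probability"); // hypothetical column name
        Outputs = all;
    }
}
```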

Zruty0 · 2018-09-05T18:06:45Z

test/Microsoft.ML.StaticPipelineTesting/Training.cs

+
+namespace Microsoft.ML.StaticPipelineTesting
+{
+    public sealed class Training : MakeConsoleWork


MakeConsoleWork [](start = 35, length = 15)

can we rename this to something that indicates that it's a base class for tests?
Like BaseTestClassWithConsole ? #Closed

:D

In reply to: 215372112 [](ancestors = 215372112)

Zruty0 · 2018-09-05T18:25:14Z

src/Microsoft.ML.StandardLearners/Standard/SdcaStatic.cs

+        }
+
+        public static (Scalar<float> score, Scalar<bool> predictedLabel)
+            PredictSdcaBinaryClassificationCustomLoss(this Scalar<bool> label, Vector<float> features, Scalar<float> weights = null,


PredictSdcaBinaryClassificationCustomLoss [](start = 12, length = 41)

Maybe make this an overload instead of a method with a different name? #Closed

OK. In order to encourage "early" resolution I will also move the loss argument to be right after the delegate, and make it a non-default argument to encourage people to actually set it.

In reply to: 215378059 [](ancestors = 215378059)

Ooo... hmmm. Maybe it has to go after features, but before weights. Hmmm. Hmmm.

In reply to: 215414897 [](ancestors = 215414897,215378059)

TomFinley · 2018-09-05T21:41:30Z

src/Microsoft.ML.Data/StaticPipe/SchemaBearing.cs

        /// </summary>
-        internal Estimator<TTupleShape, TTupleShape, ITransformer> MakeNewEstimator()
+        /// <returns>An empty estimator with the same shape as the object on which it was created</returns>
+        public Estimator<TTupleShape, TTupleShape, ITransformer> MakeNewEstimator()


public [](start = 8, length = 6)

As discussed here, I believe the POR for right now (until we change our minds again) is for the method that creates new estimator pipes to exist directly on these schema-bearing objects (whatever they may be).

Zruty0 · 2018-09-05T23:05:23Z

src/Microsoft.ML.StandardLearners/Standard/SdcaStatic.cs

+        /// <param name="onFit">A delegate that is called every time the
+        /// <see cref="Estimator{TTupleInShape, TTupleOutShape, TTransformer}.Fit(DataView{TTupleInShape})"/> method is called on the
+        /// <see cref="Estimator{TTupleInShape, TTupleOutShape, TTransformer}"/> instance created out of this. This delegate will receive
+        /// the linear model that was learnt.  Note that this action cannot change the result in any way; it is only a way for the caller to


learnt [](start = 38, length = 6)

maybe 'trained'? :)

Yes. I think the first instance can be trained, but I feel like the trailing word has to be learnt or something.

TomFinley self-assigned this Sep 5, 2018

TomFinley requested review from Ivanidzo4ka, Zruty0 and sfilipi September 5, 2018 06:21

TomFinley force-pushed the tfinley/Pigsty branch from 22dfd0b to b07b7f2 Compare September 5, 2018 14:55

sfilipi reviewed Sep 5, 2018

Zruty0 reviewed Sep 5, 2018

TomFinley added 2 commits September 5, 2018 14:35

SDCA Regression and BinaryClassification Pigsty extensions

74708e7

Use method on the typed instances intead of static Estimator.CreateNe…

e5abdc5

…w method.

TomFinley force-pushed the tfinley/Pigsty branch from b07b7f2 to 68cfc30 Compare September 5, 2018 21:37

TomFinley commented Sep 5, 2018

Review comments Senja & Pete

ba6a232

TomFinley force-pushed the tfinley/Pigsty branch from 68cfc30 to ba6a232 Compare September 5, 2018 21:45

Fix resource for test to use MakeNewEstimator

4d707fb

Zruty0 reviewed Sep 5, 2018

Zruty0 approved these changes Sep 5, 2018

learnt => trained

ed2f33f

sfilipi approved these changes Sep 5, 2018

TomFinley merged commit 5654d72 into dotnet:master Sep 6, 2018

TomFinley deleted the tfinley/Pigsty branch September 6, 2018 01:39

ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
