Fix: correctly order the ground truth and prediction for ARFF files in run.data_content #1209
Merged: mfeurer merged 22 commits into openml:develop from LennartPurucker:develop on Feb 24, 2023
Conversation
mfeurer reviewed on Feb 21, 2023
new unit test for run consistency and bug fixed in read from xml
mfeurer approved these changes on Feb 22, 2023

mfeurer (Collaborator) left a comment:
Looks good to me, let's wait for the unit tests to work again and then merge this.
PGijsbers reviewed on Feb 22, 2023
* Add new contributions here.
* FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the ``OpenMLRun`` object and in ``format_prediction``.
PGijsbers (Collaborator) commented:
Can you add that it is specifically about regression tasks? Thanks.
LennartPurucker (Author) replied:
The switched order is a problem for regression and classification tasks (and possibly also learning curve tasks).
PGijsbers reviewed on Feb 22, 2023
PGijsbers reviewed on Feb 23, 2023 (twice)
Co-authored-by: Pieter Gijsbers <[email protected]>
PGijsbers approved these changes on Feb 23, 2023
* Add sklearn marker
* Mark tests that use scikit-learn
* Only run scikit-learn tests multiple times: the generic tests that don't use scikit-learn should only be tested once (per platform).
* Rename for correct variable
* Add sklearn mark for filesystem test
* Remove quotes around sklearn
* Instead include sklearn in the matrix definition
* Update jobnames
* Add explicit false to jobname
* Remove space
* Add function inside of expression?
* Do string testing instead
* Add missing ${{
* Add explicit true to old sklearn tests
* Add instruction to add pytest marker for sklearn tests
* …o develop # Conflicts: # tests/test_runs/test_run.py
* …lt of the random state problems for sklearn < 0.24
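The commits above add a dedicated ``sklearn`` pytest marker so that scikit-learn-dependent tests can be selected or skipped as a group. A minimal sketch of how such a marker is typically registered and used (file contents are assumed for illustration, not copied from this PR):

```python
# conftest.py -- register the custom marker so pytest does not warn about it
def pytest_configure(config):
    config.addinivalue_line("markers", "sklearn: test depends on scikit-learn")

# test_example.py -- opt a test into the marker
import pytest

@pytest.mark.sklearn
def test_needs_sklearn():
    # Skip gracefully if scikit-learn is not installed
    sklearn = pytest.importorskip("sklearn")
    assert sklearn is not None

# Run only marked tests:   pytest -m sklearn
# Skip marked tests:       pytest -m "not sklearn"
```

Selecting on the marker is what lets CI run the scikit-learn suite across multiple library versions while running the generic tests only once.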
mfeurer approved these changes on Feb 24, 2023
Bug - What does this PR fix?
The order of the ground truth and the predictions is mixed up in the current implementation, both in how a run stores prediction data and in how it uses that data to build an ARFF file.
As a result, the ground truth is treated as the predictions, and the predictions are treated as the ground truth.
Consequently, publishing a run uploads the wrong values for these columns to the OpenML server.
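As an illustration, the bug amounts to writing the two values into each other's columns when a prediction row is assembled. The following is a minimal sketch with hypothetical names and an assumed column layout, not the actual run.data_content implementation:

```python
# Hypothetical sketch of the column swap; names and the
# (repeat, fold, row_id, prediction, truth) layout are illustrative.

def build_row_buggy(repeat, fold, row_id, prediction, truth):
    # BUG: truth lands in the prediction column and vice versa
    return (repeat, fold, row_id, truth, prediction)

def build_row_fixed(repeat, fold, row_id, prediction, truth):
    # FIX: each value goes into its own column
    return (repeat, fold, row_id, prediction, truth)

print(build_row_buggy(0, 0, 7, prediction=3.2, truth=3.0))  # (0, 0, 7, 3.0, 3.2)
print(build_row_fixed(0, 0, 7, prediction=3.2, truth=3.0))  # (0, 0, 7, 3.2, 3.0)
```

For a regression task the swap silently produces a plausible-looking file, which is why server-side evaluation did not catch it.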
Impact
This is not validated on the server side (to my understanding). Hence, all ARFF files of predictions uploaded using the OpenML Python API are most likely wrong. Moreover, evaluations of such runs also report the wrong scores. This might have impacted the results of papers that used scores of runs uploaded by the Python Client for meta-analysis.
Reference Issues
Multiple issues exist as a result of this. The following issues are likely related to this problem: #1197, #559, openml/OpenML#1185
Fix
I changed the order in the appropriate places and added a test for the (IMO) expected/correct behavior. Additionally, I updated the tests that checked for the old order so that they expect the new order.
New Order
The order I used follows the order of ARFF files uploaded by the R Client API. I used the following code snippet to find the order.
Open Questions
The contribution guidelines mention that I should change progress.rst. I am unsure whether this belongs under 13.1 and what I should mention there. What do you think?

Update: I updated progress.rst for the newest version.

Required (Server-Side) Follow-up Actions
This bug affects a lot of already published runs on OpenML.
We might need to change/adjust the uploaded ARFF files and re-evaluate all these runs.
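If the affected files were repaired in place rather than regenerated, the fix could be as simple as swapping the two columns back. A sketch under assumed column positions (real prediction files may carry additional columns, e.g. class confidences):

```python
# Assumed layout per row: (repeat, fold, row_id, prediction, truth)
PRED_COL, TRUTH_COL = 3, 4

def swap_columns(rows, i=PRED_COL, j=TRUTH_COL):
    """Return new rows with columns i and j exchanged."""
    fixed = []
    for row in rows:
        row = list(row)
        row[i], row[j] = row[j], row[i]
        fixed.append(tuple(row))
    return fixed

# Row where the truth value 3.0 ended up in the prediction slot:
print(swap_columns([(0, 0, 7, 3.0, 3.2)]))  # [(0, 0, 7, 3.2, 3.0)]
```

Any such repair would still require re-running the server-side evaluations, since the stored scores were computed from the swapped columns.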