feat: multiple metrics in evaluate#643

Merged
JoanFM merged 26 commits into main from feat-multiple-metrics-in-evaluate
Oct 19, 2022
Conversation

@guenthermi
Contributor

Goals:

  • Allow users to pass multiple metrics to the evaluation function
  • Calculate all metric scores at once
  • Check and update the documentation if required (see guide)
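
As a rough sketch of the goal — computing several metric scores in one call and returning them keyed by name. The metric functions and `evaluate` signature below are simplified stand-ins, not docarray's actual implementation in `docarray/array/mixins/evaluation.py`:

```python
from typing import Callable, Dict, List

# Toy metric functions over a boolean relevance list of ranked matches.
def precision_at_k(relevant: List[bool], k: int = 3) -> float:
    return sum(relevant[:k]) / k

def recall_at_k(relevant: List[bool], k: int = 3) -> float:
    return sum(relevant[:k]) / max(sum(relevant), 1)

METRIC_FNS: Dict[str, Callable[..., float]] = {
    'precision_at_k': precision_at_k,
    'recall_at_k': recall_at_k,
}

def evaluate(relevant: List[bool], metrics: List[str], k: int = 3) -> Dict[str, float]:
    """Compute every requested metric in a single pass and return a dict of scores."""
    return {name: METRIC_FNS[name](relevant, k=k) for name in metrics}

scores = evaluate([True, False, True, True], metrics=['precision_at_k', 'recall_at_k'])
```

Returning a dict for any number of metrics keeps the return type uniform, which is the design point debated later in this thread.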

@guenthermi guenthermi linked an issue Oct 17, 2022 that may be closed by this pull request
@codecov

codecov bot commented Oct 17, 2022

Codecov Report

Merging #643 (096dbee) into main (5dbf1a3) will decrease coverage by 0.04%.
The diff coverage is 76.31%.

@@            Coverage Diff             @@
##             main     #643      +/-   ##
==========================================
- Coverage   86.50%   86.45%   -0.05%     
==========================================
  Files         133      133              
  Lines        6735     6755      +20     
==========================================
+ Hits         5826     5840      +14     
- Misses        909      915       +6     
Flag Coverage Δ
docarray: 86.45% <76.31%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
docarray/array/mixins/evaluation.py: 80.72% <76.31%> (-3.41%) ⬇️


@JoanFM JoanFM changed the title Feat multiple metrics in evaluate feat: multiple metrics in evaluate Oct 17, 2022
Member

@JoanFM JoanFM left a comment


is this a breaking change with respect to the return of evaluate?

@guenthermi
Contributor Author

> is this a breaking change with respect to the return of evaluate?

Yes. I could return only a single number when the user passes a single metric, but having different return types for different inputs is a bit strange.

@guenthermi
Contributor Author

depends on #617

@guenthermi guenthermi requested a review from JoanFM October 18, 2022 15:39
@alexcg1
Contributor

alexcg1 commented Oct 19, 2022

@guenthermi remember to tag @NicholasDunham for any future PRs that include docs (or docstrings)


@bwanglzu bwanglzu left a comment


Two comments:

  1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
  2. Can we provide an `all` option to evaluate all the metrics we have?

It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one.
I'm expecting:

.evaluate(metrics=['precision_at_k'])

@guenthermi
Contributor Author

guenthermi commented Oct 19, 2022

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

I don't care much whether we support passing only a string or not.
@JoanFM @samsja @JohannesMessner, do you have an opinion on this?

Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

@JoanFM
Member

JoanFM commented Oct 19, 2022

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

> I don't care much whether we support passing only a string or not. @JoanFM @samsja @JohannesMessner, do you have an opinion on this?
>
> Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

I believe that to keep backwards compatibility we need to accept at least `metric=STR`, but I agree it is cleaner if the user has to pass a list. If it is complex, though, this seems like a minor issue to me.

@JoanFM
Member

JoanFM commented Oct 19, 2022

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

> I don't care much whether we support passing only a string or not. @JoanFM @samsja @JohannesMessner, do you have an opinion on this?
>
> Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

> I believe that to keep backwards compatibility we need to accept at least `metric=STR`, but I agree it is cleaner if the user has to pass a list. If it is complex, though, this seems like a minor issue to me.

Agreed, `all` is not good.

@bwanglzu

Makes sense, ignore my `all` suggestion.

@guenthermi
Contributor Author

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

> I don't care much whether we support passing only a string or not. @JoanFM @samsja @JohannesMessner, do you have an opinion on this?
>
> Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

> I believe that to keep backwards compatibility we need to accept at least `metric=STR`, but I agree it is cleaner if the user has to pass a list. If it is complex, though, this seems like a minor issue to me.

We could transform a string into a single-element list in the deprecation decorator and adjust the documentation, so I would not say it is complex.
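
The suggested shim can be sketched roughly as follows. This is a minimal, hypothetical sketch — the names `normalize_metrics` and the dummy `evaluate` body are not docarray's actual deprecation decorator:

```python
import functools
import warnings

def normalize_metrics(func):
    """Backwards-compat shim: if `metrics` arrives as a single string,
    warn and wrap it into a one-element list before delegating."""
    @functools.wraps(func)
    def wrapper(*args, metrics=None, **kwargs):
        if isinstance(metrics, str):
            warnings.warn(
                "Passing `metrics` as a string is deprecated; "
                "pass a list of metric names instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            metrics = [metrics]
        return func(*args, metrics=metrics, **kwargs)
    return wrapper

@normalize_metrics
def evaluate(metrics=None):
    # Stand-in for the real evaluation: one dummy score per metric name.
    return {name: 0.0 for name in metrics}
```

With this shim, `evaluate(metrics='precision_at_k')` and `evaluate(metrics=['precision_at_k'])` return the same dict, so old call sites keep working while the list form becomes the documented one.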

@JoanFM
Member

JoanFM commented Oct 19, 2022

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

> I don't care much whether we support passing only a string or not. @JoanFM @samsja @JohannesMessner, do you have an opinion on this?
>
> Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

> I believe that to keep backwards compatibility we need to accept at least `metric=STR`, but I agree it is cleaner if the user has to pass a list. If it is complex, though, this seems like a minor issue to me.

> We could transform a string into a single-element list in the deprecation decorator and adjust the documentation, so I would not say it is complex.

Then go for it.


@bwanglzu bwanglzu left a comment


LGTM!

Member

@samsja samsja left a comment


I think we can make the documentation a bit better for evaluation. @NicholasDunham what do you think?

Member

@samsja samsja left a comment


The docstring for evaluate is not reported because of the decorator.

old:
[Screenshot from 2022-10-19 13-55-51]

new:
[Screenshot from 2022-10-19 13-54-52]
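
The docstring problem samsja screenshots is the classic decorator pitfall: a wrapper function shadows the wrapped function's metadata unless it is copied over. A generic Python sketch of cause and fix (not the PR's actual code; decorator names here are illustrative):

```python
import functools

def plain_decorator(func):
    # No metadata copying: the wrapper's (absent) docstring shadows func's.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def wrapped_decorator(func):
    @functools.wraps(func)  # copies __doc__, __name__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain_decorator
def evaluate_a():
    """Compute evaluation metrics."""

@wrapped_decorator
def evaluate_b():
    """Compute evaluation metrics."""

assert evaluate_a.__doc__ is None  # docstring lost to the plain wrapper
assert evaluate_b.__doc__ == "Compute evaluation metrics."
```

Documentation generators like Sphinx read `__doc__`, so decorating without `functools.wraps` makes docstrings disappear from the rendered API reference.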

@guenthermi
Copy link
Copy Markdown
Contributor Author

guenthermi commented Oct 19, 2022

> I think that we can make the documentation a bit better for evaluation. @NicholasDunham what do u think?

Maybe we can rewrite the documentation in another PR. There are also some older features of the evaluate function that are not yet documented, and I don't have more time for this today.

@samsja
Member

samsja commented Oct 19, 2022

> Maybe we can rewrite the documentation in another PR. There are also some old features of the evaluate functions which are not yet documented and today, I have not more time to do this.

Okay, then let's open an issue for improving this documentation, and please align with @NicholasDunham on how to refactor it.

@guenthermi
Contributor Author

> Maybe we can rewrite the documentation in another PR. There are also some old features of the evaluate functions which are not yet documented and today, I have not more time to do this.

> Okay then lets open an issue for improving this documentation and please align with @NicholasDunham on how do refactor it

I created #651 for the documentation changes.

@github-actions

📝 Docs are deployed on https://ft-feat-multiple-metrics-in-evaluate--jina-docs.netlify.app 🎉

@JoanFM JoanFM merged commit 135c007 into main Oct 19, 2022
@JoanFM JoanFM deleted the feat-multiple-metrics-in-evaluate branch October 19, 2022 12:58

Projects

None yet

Development

Successfully merging this pull request may close these issues.

allow evaluate function to calculate multiple metrics at once

6 participants