fix: remove cosine similarity field with false assignment #835

Merged
JoanFM merged 14 commits into main from fix-distance-metrics-in-storage-backends
Dec 1, 2022

@anna-charlotte
Contributor

@anna-charlotte anna-charlotte commented Nov 23, 2022

Signed-off-by: anna-charlotte [email protected]

In the Weaviate storage backend, the distance is mistakenly assigned to the cosine_similarity field. This PR removes that line, since Weaviate does not provide cosine similarity but (cosine) distance instead.
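
As a rough illustration of why the two quantities should not share a field, the sketch below computes both for a pair of vectors. It assumes the common convention that cosine distance is 1 minus cosine similarity; the helper names are hypothetical and nothing here is taken from the Weaviate backend code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Assumed convention: distance = 1 - similarity."""
    return 1.0 - cosine_similarity(a, b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # similarity 1.0, distance 0.0
orthogonal = [0.0, 3.0]      # similarity 0.0, distance 1.0

sim_same = cosine_similarity(query, same_direction)
dist_same = cosine_distance(query, same_direction)
sim_orth = cosine_similarity(query, orthogonal)
dist_orth = cosine_distance(query, orthogonal)
```

Under this convention the best match has the highest similarity but the lowest distance, so a distance stored under cosine_similarity would be read with the wrong ordering.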

Added more documentation on which keys to access in the .scores dictionary, since most backends differ from the default.

Goals:

  • Remove the cosine_similarity assignment from the Weaviate backend.
  • Check and update the documentation, if required. See guide

@anna-charlotte anna-charlotte changed the title from "fix: remove cosine similarity field with false assignment and set metric_name default to score" to "fix: remove cosine similarity field with false assignment" Nov 23, 2022
@anna-charlotte anna-charlotte linked an issue Nov 23, 2022 that may be closed by this pull request
@anna-charlotte anna-charlotte marked this pull request as ready for review November 23, 2022 11:11
@anna-charlotte anna-charlotte marked this pull request as draft November 23, 2022 13:00
Signed-off-by: anna-charlotte <[email protected]>
@codecov-commenter

codecov-commenter commented Nov 23, 2022

Codecov Report

Base: 88.13% // Head: 88.13% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (a202cef) compared to base (37c6001).
Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #835      +/-   ##
==========================================
- Coverage   88.13%   88.13%   -0.01%     
==========================================
  Files         138      138              
  Lines        7137     7136       -1     
==========================================
- Hits         6290     6289       -1     
  Misses        847      847              
Flag Coverage Δ
docarray 88.13% <ø> (-0.01%) ⬇️

Impacted Files Coverage Δ
docarray/array/storage/weaviate/find.py 88.00% <ø> (-0.16%) ⬇️

@github-actions github-actions bot added size/s and removed size/xs labels Nov 23, 2022
@anna-charlotte anna-charlotte force-pushed the fix-distance-metrics-in-storage-backends branch from ecbf52c to 7fafa58 Compare November 23, 2022 16:20
@anna-charlotte anna-charlotte marked this pull request as ready for review November 25, 2022 16:07
anna-charlotte and others added 2 commits November 28, 2022 10:43
Contributor

@alexcg1 alexcg1 left a comment

LGTM 👍

Signed-off-by: anna-charlotte <[email protected]>
@github-actions github-actions bot added size/m and removed size/s labels Nov 28, 2022
JoanFM previously requested changes Nov 28, 2022
Member

@JoanFM JoanFM left a comment

Should we have a common key in scores for every docstore, in addition to a specialized one? That would make it easier to use the same code for every backend.

@anna-charlotte
Contributor Author

Yes, we have discussed this too. I think it would be nicer to have the same solution for all storages, but changing it now would be a breaking change, right? Do you think it is worth it?
Also, some storages only allow the key 'score', which means scores cannot be saved for several distances (e.g. for both 'euclidean' and 'cosine'), but only for one.
I think it would be nice to use the self.distance value as the default key in scores. This way the approach is the same for all storages, even when they offer different options for the distance parameter.
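
A minimal sketch of that idea, under stated assumptions: `Backend`, `make_scores`, and `read_score` are hypothetical names used only for illustration, and plain dicts stand in for whatever score objects the storages actually use.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical stand-in for a storage backend, not DocArray's real class."""

    distance: str  # configured metric name, e.g. 'cosine' or 'euclidean'

    def make_scores(self, value: float) -> dict:
        # Key the score by the configured distance name rather than a
        # backend-specific key, so callers never hard-code the key.
        return {self.distance: {"value": value}}


def read_score(backend: Backend, scores: dict) -> float:
    # The same lookup works for every backend, whatever metric it uses.
    return scores[backend.distance]["value"]


cosine_backend = Backend(distance="cosine")
euclidean_backend = Backend(distance="euclidean")

cosine_scores = cosine_backend.make_scores(0.25)
euclidean_scores = euclidean_backend.make_scores(1.5)
```

Calling code would then read `read_score(backend, scores)` without knowing which key a given storage writes.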

@JoanFM JoanFM dismissed their stale review December 1, 2022 09:54

Taken care of.

@JoanFM JoanFM merged commit 59db0ff into main Dec 1, 2022
@JoanFM JoanFM deleted the fix-distance-metrics-in-storage-backends branch December 1, 2022 09:55
Development

Successfully merging this pull request may close these issues.

Weaviate's cosine similarity vs cosine distance

5 participants