feat: add benchmark adapted for sift1m #301

Closed
davidbp wants to merge 11 commits into main from feat-benchmark-sift

Conversation

@davidbp (Contributor) commented Apr 25, 2022

No description provided.

@github-actions github-actions bot added size/l and removed size/m labels Apr 26, 2022

codecov bot commented Apr 26, 2022

Codecov Report

Merging #301 (1ce1433) into main (1482421) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #301      +/-   ##
==========================================
+ Coverage   86.51%   86.53%   +0.02%     
==========================================
  Files         134      134              
  Lines        6385     6388       +3     
==========================================
+ Hits         5524     5528       +4     
+ Misses        861      860       -1     
Flag        Coverage Δ
docarray    86.53% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                               Coverage Δ
docarray/__init__.py                         75.00% <100.00%> (ø)
docarray/array/storage/weaviate/find.py      86.66% <0.00%> (ø)
docarray/array/storage/annlite/find.py       93.33% <0.00%> (+10.00%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aad1e1f...1ce1433.



def run_benchmark(
    X_tr, X_te, dataset, n_index_values, n_vector_queries, n_query, storage_backends
):

Member: Let's use more meaningful variable names than X_tr and X_te; maybe just test and train?
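A minimal sketch of the suggested rename (the new parameter names are hypothetical; the rest of the signature is unchanged):

# Hypothetical sketch of the rename suggested above; the benchmark body is elided.
def run_benchmark(
    train, test, dataset, n_index_values, n_vector_queries, n_query, storage_backends
):
    ...  # per-backend indexing and query loop goes here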
table = Table(
    title=f'DocArray Benchmarking n_index={n_index_values[-1]} n_query={n_query} D={D} K={K}'

Member: I guess since n_index_values always contains one element, maybe it shouldn't be a list.
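A hedged sketch of what dropping the list could look like, assuming a single integer n_index replaces the one-element list (n_index is a hypothetical name):

# Hypothetical sketch: n_index is a single int, so no [-1] indexing is needed.
table = Table(
    title=f'DocArray Benchmarking n_index={n_index} n_query={n_query} D={D} K={K}'
)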

console.print(f'\treading {n_query} docs ...')
read_time, _ = read(
    da,
    random.sample([d.id for d in docs], n_query),

Member: Let's try to have the same query for all backends.
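One way to do that, sketched under the assumption that docs and n_query are known before the per-backend loop (query_ids is a hypothetical name): sample the ids once and reuse them for every backend.

# Hypothetical sketch: fix the query ids once so every backend reads the same documents.
query_ids = random.sample([d.id for d in docs], n_query)

for storage in storage_backends:
    ...  # build `da` for this backend, then:
    read_time, _ = read(da, query_ids)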

Comment on lines +136 to +138:

ground_truth = [
    x for x in dataset['neighbors'][0 : len(vector_queries)]
]

Member: Let's put this in a higher scope.
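A sketch of the hoisting, assuming dataset and vector_queries are already available at the outer scope:

# Hypothetical sketch: compute the ground truth once, outside the per-backend loop.
ground_truth = list(dataset['neighbors'][0 : len(vector_queries)])

for storage in storage_backends:
    ...  # recall for every backend is then measured against the same ground_truth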

find_by_condition_time, _ = find_by_condition(
    da, {'tags__i': {'$eq': 0}}
)
if idx == len(n_index_values) - 1:

Member: No need for this check once n_index_values becomes a single value instead of a list.
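If n_index_values becomes a single value there is no loop over index sizes, so the guard can be dropped (a sketch under that assumption):

# Hypothetical sketch: with a single n_index value, run the condition query unconditionally.
find_by_condition_time, _ = find_by_condition(da, {'tags__i': {'$eq': 0}})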

@alaeddine-13 (Member): By the way, it looks like the SIFT dataset needs Euclidean distance.
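For reference, SIFT1M ground-truth neighbors are defined by Euclidean (L2) distance, so recall should be measured against L2 nearest neighbors rather than cosine. A small NumPy sketch of exact L2 ground truth (function and array names are hypothetical; brute force, not memory-efficient at the full 1M scale):

import numpy as np

def l2_ground_truth(index_vectors, query_vectors, k=10):
    # index_vectors: (n_index, 128) float32, query_vectors: (n_query, 128) float32
    # squared Euclidean distance between every query and every indexed vector
    dists = ((query_vectors[:, None, :] - index_vectors[None, :, :]) ** 2).sum(axis=-1)
    # ids of the k closest indexed vectors per query, nearest first
    return np.argsort(dists, axis=1)[:, :k]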

@hanxiao linked an issue on Apr 28, 2022 that may be closed by this pull request
@JoanFM (Member) commented Jun 21, 2022

Closing until further notice



Development

Successfully merging this pull request may close these issues.

refactor benchmarks to use a dataset instead of random data

3 participants