Codecov Report
@@            Coverage Diff             @@
##             main     #301      +/-   ##
==========================================
+ Coverage   86.51%   86.53%   +0.02%
==========================================
  Files         134      134
  Lines        6385     6388       +3
==========================================
+ Hits         5524     5528       +4
+ Misses        861      860       -1
scripts/benchmarking_dataset.py
Outdated
def run_benchmark(
    X_tr, X_te, dataset, n_index_values, n_vector_queries, n_query, storage_backends
Let's use more meaningful variable names than X_tr and X_te. Maybe just test and train?
    X_tr, X_te, dataset, n_index_values, n_vector_queries, n_query, storage_backends
):
    table = Table(
        title=f'DocArray Benchmarking n_index={n_index_values[-1]} n_query={n_query} D={D} K={K}'
Since n_index_values always contains one element, maybe it shouldn't be a list.
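A minimal sketch of what the title could look like with a scalar instead of a list. The helper name `build_title` and the parameters `n_index`, `dim` and `k` are assumptions for illustration and are not taken from the script, where `D` and `K` appear to be module-level values.

```python
def build_title(n_index: int, n_query: int, dim: int, k: int) -> str:
    """Format the table title from a single index size (no list indexing)."""
    return f'DocArray Benchmarking n_index={n_index} n_query={n_query} D={dim} K={k}'


# Illustrative values only; the real script would pass its own settings.
print(build_title(n_index=1000, n_query=5, dim=128, k=10))
```

With a scalar, the `n_index_values[-1]` lookup goes away and the signature states directly that exactly one index size is benchmarked.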
console.print(f'\treading {n_query} docs ...')
read_time, _ = read(
    da,
    random.sample([d.id for d in docs], n_query),
Let's try to use the same queries for all backends.
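One way to do this, sketched under assumptions: draw the ids once, before the loop over backends, from a seeded local RNG, and pass the same list to every backend. The helper `sample_query_ids`, the `seed` parameter, the fake ids and the backend names are hypothetical stand-ins, not code from the script.

```python
import random


def sample_query_ids(doc_ids, n_query, seed=0):
    """Pick n_query ids once so every backend reads the same documents."""
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    return rng.sample(list(doc_ids), n_query)


# Hypothetical stand-ins for the real document ids and storage backends.
doc_ids = [f'doc-{i}' for i in range(1000)]
query_ids = sample_query_ids(doc_ids, n_query=5)

for backend in ['memory', 'sqlite']:
    # Each backend would be timed on the identical id list here.
    print(backend, query_ids)
```

Sampling inside the per-backend call, as the snippet above does, gives each backend a different set of ids, so read times are not measured on the same workload.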
scripts/benchmarking_dataset.py
Outdated
ground_truth = [
    x for x in dataset['neighbors'][0 : len(vector_queries)]
]
Let's move this to a higher scope.
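A rough sketch of the hoisting, assuming the list only depends on the dataset and the queries and not on the backend. The function body, the toy `dataset` and the backend names below are placeholders for illustration, not the script's actual logic.

```python
def run_benchmark(dataset, vector_queries, storage_backends):
    # Built once, before the loop, because it does not depend on the backend.
    ground_truth = list(dataset['neighbors'][: len(vector_queries)])

    checked = {}
    for backend in storage_backends:
        # Placeholder: the real code would index, query, and compare
        # each backend's results against ground_truth here.
        checked[backend] = len(ground_truth)
    return checked


# Toy data standing in for the real dataset and queries.
dataset = {'neighbors': [[1, 2], [3, 4], [5, 6]]}
print(run_benchmark(dataset, vector_queries=[[0.1], [0.2]], storage_backends=['memory', 'sqlite']))
```

`list(...)` over the slice builds the same list as the `[x for x in ...]` comprehension in the snippet above.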
find_by_condition_time, _ = find_by_condition(
    da, {'tags__i': {'$eq': 0}}
)
if idx == len(n_index_values) - 1:
No need for this check once n_index_values becomes a single value instead of a list.
btw, it looks like the SIFT dataset needs Euclidean distance
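To illustrate why the metric matters when comparing against precomputed neighbors, here is a small standard-library sketch in which Euclidean and cosine distance pick different nearest neighbors for the same query. The vectors, names and helper functions are made up for the example and do not come from the script or the dataset.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two points."""
    return math.dist(u, v)


def cosine_distance(u, v):
    """1 - cosine similarity; ignores vector length, only direction matters."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def nearest(query, candidates, metric):
    """Name of the candidate with the smallest distance to the query."""
    return min(candidates, key=lambda name: metric(query, candidates[name]))


query = (1.0, 0.0)
candidates = {'a': (3.0, 0.0), 'b': (1.0, 0.5)}

print(nearest(query, candidates, euclidean))        # b
print(nearest(query, candidates, cosine_distance))  # a
```

If the ground-truth neighbors were computed under one metric and the backend searches with another, the benchmark would compare against the wrong answers even when the backend behaves correctly.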
Closing until further notice
No description provided.