Tags: yaakovs/DiskANN

Tags

0.2

Revised inner product (microsoft#10)

* working towards inner product in memory indices

* done with in-memory code

* made the inner product distance function return std::float_max if negative

* more changes for disk index support

* on the way to disk index support for MIPS

* works now, need to change the PQ generation for MIPS

* now incorporated disk+memory search for inner product

* support for mips and l2

* changed inner product to -IP rather than 1/IP

* towards adding support for storing PQ vectors in disk index for very large data

* halfway through PQ-based disk search option

* code compiles for disk index pq

* fixed some bug

* shards are written as and when necessary

* sharding is now on demand

* minor changes

* fixed one malloc bug in parameters

* added a vector analyzer util

* added missing file

* fixed a bug which used L2 instead of inner product in cached beam search

* now setting up the normalizing approach

* towards pre-processing data

* working towards newer inner product

* more changes to do MIPS by reducing to L2 with extra coordinate

* cleaned up code a bit, need to test everything again

* testing underway

* added back saturate graph to create denser indices

* now we don't sample a new test dataset every iteration for estimating sharding

* now num_parts increases by 2

* cleaned up warnings in Debug mode compiler

* added a normalizer to vector analysis

* fixed one bug for MIPS

* addressed all comments of PR

* fixed minor typos. now running unit tests

* ran clang-format as it doesn't run by default due to LINUX flag not set anywhere

* clang introduced a bug in distance.h, fixed it

* added unit tester partially

* minor bugfix

* finished unit tester

* changed back training size to 100K for now, we can increase to 1M later if necessary

* added comments for unit_tester.sh

* added auto tuning parameters for unit tester

* re-ran clang formatting

* small change to unit tester

* fixed minor bug in unit tester

* fixed some formatting on unit tester

* started code for range search support in pq_flash_index

* added more code for range search in disk index

* added range search support

* tested range search on small dataset

* Update memory_mapper.h

* minor edits

Co-authored-by: ravishankar <[email protected]>
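
The entry "more changes to do MIPS by reducing to L2 with extra coordinate" refers to a standard trick for maximum inner product search, which can be sketched as follows. This is a minimal illustration of the general reduction, not the code from this commit: every function name here (`augment_base`, `augment_query`, `argmax_ip`, `argmin_l2`) is hypothetical, and the actual DiskANN preprocessing may scale or store vectors differently. Each base vector x gets one extra coordinate sqrt(M^2 - ||x||^2), where M is the largest base norm, and the query gets an extra 0. The squared L2 distance then equals ||q||^2 + M^2 - 2<q, x>, so the nearest L2 neighbor is the vector with the largest inner product.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

// Plain inner product of two equal-length vectors.
float inner_product(const Vec& a, const Vec& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}

// Squared Euclidean distance between two equal-length vectors.
float l2_squared(const Vec& a, const Vec& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Hypothetical helper: append sqrt(M^2 - ||x||^2) to every base vector,
// where M^2 is the largest squared norm in the base set.
std::vector<Vec> augment_base(const std::vector<Vec>& base) {
  float max_sq = 0.0f;
  for (const Vec& x : base) max_sq = std::max(max_sq, inner_product(x, x));
  std::vector<Vec> out;
  out.reserve(base.size());
  for (const Vec& x : base) {
    Vec y = x;
    // Clamp at zero to guard against tiny negative values from rounding.
    const float gap = std::max(max_sq - inner_product(x, x), 0.0f);
    y.push_back(std::sqrt(gap));
    out.push_back(std::move(y));
  }
  return out;
}

// Hypothetical helper: the query gets a zero in the extra coordinate.
Vec augment_query(const Vec& q) {
  Vec y = q;
  y.push_back(0.0f);
  return y;
}

// Brute-force index of the base vector with the largest inner product.
std::size_t argmax_ip(const std::vector<Vec>& base, const Vec& q) {
  assert(!base.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < base.size(); ++i)
    if (inner_product(base[i], q) > inner_product(base[best], q)) best = i;
  return best;
}

// Brute-force index of the base vector with the smallest L2 distance.
std::size_t argmin_l2(const std::vector<Vec>& base, const Vec& q) {
  assert(!base.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < base.size(); ++i)
    if (l2_squared(base[i], q) < l2_squared(base[best], q)) best = i;
  return best;
}
```

Calling `argmin_l2(augment_base(base), augment_query(q))` should pick the same vector as `argmax_ip(base, q)` whenever the maximum is unique, which is why an L2-only index can serve inner-product queries after this preprocessing.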

0.1

format conversion tools (microsoft#9)

* added tsv to bin format converter

* added tool to convert float binary to int8 binary