Tags: yaakovs/DiskANN
Revised inner product (microsoft#10)

* working towards inner product in memory indices
* done with in-memory code
* made the inner product distance function return std::float_max if negative
* more changes for disk index support
* on the way to disk index support for MIPS
* works now, need to change the PQ generation for MIPS
* now incorporated disk+memory search for inner product
* support for MIPS and L2
* changed inner product to -IP rather than 1/IP
* towards adding support for storing PQ vectors in disk index for very large data
* halfway through PQ-based disk search option
* code compiles for disk index PQ
* fixed some bug
* shards are written as and when necessary
* sharding is now on demand
* minor changes
* fixed one malloc bug in parameters
* added a vector analyzer util
* added missing file
* fixed a bug which used L2 instead of inner product in cached beam search
* now setting up the normalizing approach
* towards pre-processing data
* working towards newer inner product
* more changes to do MIPS by reducing to L2 with an extra coordinate
* cleaned up code a bit, need to test everything again
* testing underway
* added back saturate graph to create denser indices
* now we don't sample a new test dataset every iteration for estimating sharding
* now num_parts increases by 2
* cleaned up warnings in Debug mode compiler
* added a normalizer to vector analysis
* fixed one bug for MIPS
* addressed all comments of PR
* fixed minor typos; now running unit tests
* ran clang-format, as it doesn't run by default because the LINUX flag is not set anywhere
* clang-format introduced a bug in distance.h, fixed it
* added unit tester partially
* minor bugfix
* finished unit tester
* changed training size back to 100K for now; we can increase it to 1M later if necessary
* added comments for unit_tester.sh
* added auto-tuning parameters for unit tester
* re-ran clang formatting
* small change to unit tester
* fixed minor bug in unit tester
* fixed some formatting in unit tester
* started code for range search support in pq_flash_index
* added more code for range search in disk index
* added range search support
* tested range search on small dataset
* Update memory_mapper.h
* minor edits

Co-authored-by: ravishankar <[email protected]>
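The commits above mention doing MIPS "by reducing to L2 with an extra coordinate" and using -IP as the inner product distance. A common way to do this reduction is to let M be the largest base-vector norm, append sqrt(M^2 - ||x||^2) to every base vector x, and append 0 to every query q; then ||q' - x'||^2 = ||q||^2 + M^2 - 2<q, x>, so the nearest neighbor under L2 is the vector with the largest inner product. The sketch below is a minimal illustration assuming this standard construction, not necessarily DiskANN's exact code; all function names are hypothetical.

```python
import math

# Hypothetical helpers illustrating the standard MIPS-to-L2 reduction; not DiskANN's API.

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_inner_product(a, b):
    # "-IP" distance: smaller is better, so minimizing it maximizes the inner product.
    return -inner_product(a, b)

def l2_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def augment_base(vectors):
    """Append sqrt(M^2 - ||x||^2) to each base vector, where M is the max norm.

    Returns the augmented vectors and M^2."""
    norms_sq = [inner_product(x, x) for x in vectors]
    max_sq = max(norms_sq)
    # max(..., 0.0) guards against tiny negative values from floating-point error.
    augmented = [list(x) + [math.sqrt(max(max_sq - n, 0.0))]
                 for x, n in zip(vectors, norms_sq)]
    return augmented, max_sq

def augment_query(q):
    # A 0 in the extra coordinate makes that coordinate add exactly M^2 - ||x||^2.
    return list(q) + [0.0]

def best_by_mips(base, q):
    # Brute-force index of the base vector with the largest inner product.
    return min(range(len(base)), key=lambda i: neg_inner_product(q, base[i]))

def best_by_reduced_l2(base, q):
    # Brute-force index of the nearest augmented base vector under squared L2.
    aug_base, _ = augment_base(base)
    aug_q = augment_query(q)
    return min(range(len(aug_base)), key=lambda i: l2_sq(aug_q, aug_base[i]))

if __name__ == "__main__":
    base = [[1.0, 0.0], [0.5, 0.5], [3.0, 1.0], [-2.0, 4.0]]
    q = [1.0, 2.0]
    print(best_by_mips(base, q), best_by_reduced_l2(base, q))  # prints "3 3"
```

Since every augmented base vector has norm M, the ||q||^2 + M^2 term is the same for all candidates, which is why an unmodified L2 graph index can answer the inner product query.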
format conversion tools (microsoft#9)

* added TSV to bin format converter
* added tool to convert float binary to int8 binary
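The two conversion tools can be sketched as follows. This is a minimal illustration, assuming the `.bin` layout is a little-endian header of two 32-bit integers (number of points, then dimension) followed by row-major values; the function names are hypothetical, and the int8 policy (round to nearest, then clamp to [-128, 127]) is an assumption, not necessarily what the actual tool does.

```python
import io
import struct

# Assumed layout: <int32 npts><int32 dim> followed by npts * dim row-major values.

def tsv_to_bin(tsv_text: str) -> bytes:
    """Convert tab-separated rows of numbers into a float32 .bin blob (hypothetical helper)."""
    rows = [[float(tok) for tok in line.split("\t")]
            for line in tsv_text.splitlines() if line.strip()]
    npts = len(rows)
    dim = len(rows[0]) if rows else 0
    if any(len(r) != dim for r in rows):
        raise ValueError("all rows must have the same number of columns")
    out = io.BytesIO()
    out.write(struct.pack("<ii", npts, dim))
    for r in rows:
        out.write(struct.pack(f"<{dim}f", *r))
    return out.getvalue()

def read_bin(data: bytes, fmt_char: str = "f"):
    """Parse a .bin blob; fmt_char is 'f' for float32 or 'b' for int8."""
    npts, dim = struct.unpack_from("<ii", data, 0)
    size = struct.calcsize(f"<{fmt_char}")
    rows, offset = [], 8
    for _ in range(npts):
        rows.append(list(struct.unpack_from(f"<{dim}{fmt_char}", data, offset)))
        offset += dim * size
    return rows

def float_bin_to_int8_bin(data: bytes) -> bytes:
    """Convert a float32 .bin blob to int8 (assumed policy: round, then clamp)."""
    npts, dim = struct.unpack_from("<ii", data, 0)
    out = io.BytesIO()
    out.write(struct.pack("<ii", npts, dim))
    for r in read_bin(data, "f"):
        # Python's round() uses round-half-to-even for exact .5 values.
        q = [max(-128, min(127, round(v))) for v in r]
        out.write(struct.pack(f"<{dim}b", *q))
    return out.getvalue()
```

For example, `read_bin(float_bin_to_int8_bin(tsv_to_bin("200.0\t-300.0\t3.25\n")), "b")` gives `[[127, -128, 3]]` under these assumptions; a real tool might instead rescale values to use the full int8 range.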