support primary key #266

@nautaa

This plan adds primary key support so that data can be deduplicated.

As a first step, only a primary key on a single column is supported. When data is inserted, the primary key is used to check whether the row already exists in the table. If a row with the same primary key already exists, the insertion is skipped.

The initial plan is to deduplicate by maintaining a deduplication container for each table. When the database restarts, the primary key column is read from disk and the in-memory container is rebuilt.
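To make the insert and recovery behaviour described above concrete, here is a minimal sketch. It assumes a `u64` primary key and uses a plain `HashSet` as the container; `DedupContainer`, `insert_rows`, and the `(u64, String)` row shape are hypothetical names for illustration, not existing code in this project.

```rust
use std::collections::HashSet;

/// Hypothetical per-table deduplication container for a single-column
/// integer primary key. A HashSet is used here only for illustration.
pub struct DedupContainer {
    seen: HashSet<u64>,
}

impl DedupContainer {
    /// Empty container for a newly created table.
    pub fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Recovery: rebuild the in-memory container from the primary key
    /// column read back from disk after a restart.
    pub fn rebuild<I: IntoIterator<Item = u64>>(keys: I) -> Self {
        Self { seen: keys.into_iter().collect() }
    }

    /// Returns true if the key is new (the row should be inserted),
    /// false if a row with this primary key already exists.
    pub fn try_insert(&mut self, key: u64) -> bool {
        self.seen.insert(key)
    }
}

/// Hypothetical insert path: append rows whose key is new, skip the rest.
/// Returns the number of skipped (duplicate) rows.
pub fn insert_rows(
    container: &mut DedupContainer,
    table: &mut Vec<(u64, String)>,
    rows: Vec<(u64, String)>,
) -> usize {
    let mut skipped = 0;
    for row in rows {
        if container.try_insert(row.0) {
            table.push(row);
        } else {
            skipped += 1;
        }
    }
    skipped
}
```

With this shape, the insert path never scans the table itself, and recovery is a single pass over the stored primary key column.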

After investigation, roaring bitmap looks like a good fit: it is a compressed bitmap index with excellent performance and low memory usage.

We can use RoaringBitmap and RoaringTreemap from roaring-rs to store ordinary integer primary keys. For string types, which roaring bitmap cannot support, we can store the keys in a HashSet.
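One way to pick the container by key type is sketched below. So that the sketch depends only on the standard library, a `BTreeSet<u64>` stands in for the roaring container; the assumption is that roaring-rs's `RoaringTreemap` (or `RoaringBitmap` for 32-bit keys) could be swapped in at that spot. `Key` and `KeySet` are hypothetical names.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical primary key value: an integer or a string.
#[derive(Debug, Clone, PartialEq)]
pub enum Key {
    Int(u64),
    Str(String),
}

/// Hypothetical deduplication container, chosen by the key column's type.
pub enum KeySet {
    /// Integer keys. BTreeSet is a stand-in here; the assumption is that
    /// a roaring-rs RoaringTreemap would take its place.
    Int(BTreeSet<u64>),
    /// String keys, which a roaring bitmap cannot store directly.
    Str(HashSet<String>),
}

impl KeySet {
    pub fn for_integer_column() -> Self {
        KeySet::Int(BTreeSet::new())
    }

    pub fn for_string_column() -> Self {
        KeySet::Str(HashSet::new())
    }

    /// Ok(true) if the key is new, Ok(false) if it is a duplicate,
    /// Err if the key's type does not match the column's type.
    pub fn insert(&mut self, key: Key) -> Result<bool, String> {
        match (self, key) {
            (KeySet::Int(set), Key::Int(k)) => Ok(set.insert(k)),
            (KeySet::Str(set), Key::Str(k)) => Ok(set.insert(k)),
            _ => Err("primary key type does not match the column type".to_string()),
        }
    }
}
```

Keeping both variants behind one `insert` method means the insert path does not need to know which container backs a given table.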

Also, where should each table's deduplication container live? Could it be placed in the MetaStore?

  • SQL parsing
  • deduplication by primary key on insert
  • recovery
  • performance test
