Merged
12 changes: 6 additions & 6 deletions .gitignore
@@ -9,15 +9,15 @@ metaStore

# Test data and runtime files
store-*/
store-*-rocksdb/
store-*-pebble/
*.wal
*.tmp

# RocksDB platform-specific builds (generated by make rocksdb)
third_party/rocksdb/darwin-arm64/
third_party/rocksdb/darwin-x86_64/
third_party/rocksdb/linux-x86_64/
third_party/rocksdb/linux-aarch64/
# Pebble platform-specific builds (generated by make pebble)
third_party/pebble/darwin-arm64/
third_party/pebble/darwin-x86_64/
third_party/pebble/linux-x86_64/
third_party/pebble/linux-aarch64/

etcd-3.6.7/
.worktrees/
339 changes: 46 additions & 293 deletions Makefile

Large diffs are not rendered by default.

78 changes: 15 additions & 63 deletions README.md
@@ -19,7 +19,7 @@ A lightweight, high-performance, production-ready distributed metadata managemen
- **🏗️ Raft Consensus**: Built on etcd's battle-tested raft library for strong consistency
- **🚀 High Availability**: Tolerates up to (N-1)/2 node failures in an N-node cluster
- **👁️ 2-Node HA with Witness**: Support for 2 data nodes + 1 lightweight witness node for cost-effective HA
- **💾 Dual Storage Modes**: Memory+WAL (fast) or RocksDB (persistent)
- **💾 Dual Storage Modes**: Memory+WAL (fast) or Pebble (persistent, pure Go)
- **📊 Observability**: Prometheus metrics, structured logging, and health checks
- **🔧 Production Features**: Graceful shutdown, panic recovery, rate limiting, and input validation

@@ -241,7 +241,7 @@ See [docs/MYSQL_API_QUICKSTART.md](docs/MYSQL_API_QUICKSTART.md) for complete My

#### Performance Optimization
- ✅ Object pooling for KV pairs (reduces GC pressure)
- ✅ Memory-mapped I/O for RocksDB
- ✅ Pebble storage engine (pure Go, no CGO required)
- ✅ Efficient serialization with protobuf
- ✅ Connection pooling and keep-alive

@@ -261,13 +261,11 @@ See [docs/MYSQL_API_QUICKSTART.md](docs/MYSQL_API_QUICKSTART.md) for complete My
git clone https://github.com/axfor/MetaStore.git
cd MetaStore

# Build with Make (recommended)
# Build with Make (recommended, pure Go, no CGO required)
make build

# Or build manually
export CGO_ENABLED=1
export CGO_LDFLAGS="-lrocksdb -lpthread -lstdc++ -ldl -lm -lzstd -llz4 -lz -lsnappy -lbz2"
go build -o metastore cmd/metastore/main.go
CGO_ENABLED=0 go build -ldflags="-s -w" -o metastore cmd/metastore/main.go
```

### Running a Single Node
@@ -276,9 +274,9 @@ go build -o metastore cmd/metastore/main.go
# Memory + WAL mode (default, fast)
./metastore --member-id 1 --cluster http://127.0.0.1:12379 --port 12380

# RocksDB mode (persistent)
# Pebble mode (persistent, pure Go)
mkdir -p data
./metastore --member-id 1 --cluster http://127.0.0.1:12379 --port 12380 --storage rocksdb
./metastore --member-id 1 --cluster http://127.0.0.1:12379 --port 12380 --storage pebble
```

### Using etcd Client
@@ -338,7 +336,7 @@ func main() {
```bash
# Using Make
make cluster-memory # Memory storage cluster
make cluster-rocksdb # RocksDB storage cluster
make cluster-pebble # Pebble storage cluster

# Check cluster status
make status
@@ -508,68 +506,22 @@ go test -v -run="TestCrossProtocol" ./test
- 🔍 [Performance Assessment](docs/ASSESSMENT_PERFORMANCE.md) - Performance analysis
- 🔍 [Best Practices Assessment](docs/ASSESSMENT_BEST_PRACTICES.md) - Go best practices compliance

### RocksDB Documentation
- 🔧 [RocksDB Build Guide (macOS)](docs/ROCKSDB_BUILD_MACOS.md) - macOS build instructions
- 🔧 [RocksDB Test Guide](docs/ROCKSDB_TEST_GUIDE.md) - RocksDB testing
- 📊 [RocksDB Test Report](docs/ROCKSDB_TEST_REPORT.md) - Test results

## 🏗️ Building from Source

### Prerequisites
- **Go 1.23 or higher**
- **CGO enabled** (`CGO_ENABLED=1`)
- **RocksDB C++ library** (for RocksDB storage mode)
- No CGO or C libraries required (pure Go build)

### Linux (Ubuntu/Debian)
### All Platforms (Linux, macOS, Windows)

```bash
# Install dependencies
sudo apt-get update
sudo apt-get install -y librocksdb-dev build-essential

# Build
export CGO_ENABLED=1
export CGO_LDFLAGS="-lrocksdb -lpthread -lstdc++ -ldl -lm -lzstd -llz4 -lz -lsnappy -lbz2"
go build -ldflags="-s -w" -o metastore cmd/metastore/main.go
```

### macOS

```bash
# Install dependencies
brew install rocksdb

# Build
export CGO_ENABLED=1
export CGO_LDFLAGS="-lrocksdb -lpthread -lstdc++ -ldl -lm -lzstd -llz4 -lz -lsnappy -lbz2"
go build -ldflags="-s -w" -o metastore cmd/metastore/main.go
```

### Build from RocksDB Source (Latest Version)

For the latest RocksDB version with optimal performance:
# Build (pure Go, cross-platform)
make build

```bash
# Install build dependencies (Ubuntu)
sudo apt-get install -y gcc-c++ make cmake git \
libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev

# Clone and build RocksDB v10.7.5
git clone --branch v10.7.5 https://github.com/facebook/rocksdb.git
cd rocksdb
make clean
make static_lib -j$(nproc)
sudo make install

# Build MetaStore
cd /path/to/MetaStore
export CGO_ENABLED=1
export CGO_LDFLAGS="-lrocksdb -lpthread -lstdc++ -ldl -lm -lzstd -llz4 -lz -lsnappy -lbz2"
go build -ldflags="-s -w" -o metastore cmd/metastore/main.go
# Or manually
CGO_ENABLED=0 go build -ldflags="-s -w" -o metastore cmd/metastore/main.go
```

See [ROCKSDB_BUILD_MACOS.md](docs/ROCKSDB_BUILD_MACOS.md) for macOS-specific instructions.

## 🔧 Configuration

MetaStore can be configured via:
@@ -587,7 +539,7 @@ Flags:
--cluster string Comma-separated cluster peer URLs
--port int HTTP API port (default: 9121)
--grpc-port int gRPC API port (default: 2379)
--storage string Storage engine: "memory" or "rocksdb" (default: "memory")
--storage string Storage engine: "memory" or "pebble" (default: "memory")
--join Join existing cluster
--config string Config file path (default: "configs/metastore.yaml")

@@ -638,7 +590,7 @@ See [configs/metastore.yaml](configs/metastore.yaml) for complete configuration
- ✅ Best for: Datasets < 10GB, read-heavy workloads
- ⚠️ Note: WAL replay on restart for large datasets can be slow

**RocksDB Mode**:
**Pebble Mode**:
- ✅ Use for: Large datasets (TB-scale), guaranteed persistence
- ✅ Best for: Write-heavy workloads, large key-value pairs
- ⚠️ Note: Slightly higher latency due to disk I/O
2 changes: 1 addition & 1 deletion api/etcd/lease_manager.go
@@ -196,7 +196,7 @@ func (lm *LeaseManager) TimeToLive(id int64) (*kvstore.Lease, error) {
// (e.g., SyncFromStore loaded a lease that was concurrently revoked),
// so treat any store "not found" error as ErrLeaseNotFound.
lease, err := lm.store.LeaseTimeToLive(context.Background(), id)
if err != nil {
if err != nil || lease == nil {
// Remove stale entry from cache
lm.mu.Lock()
delete(lm.leases, id)
2 changes: 1 addition & 1 deletion api/etcd/maintenance.go
@@ -105,7 +105,7 @@ func (s *MaintenanceServer) Status(ctx context.Context, req *pb.StatusRequest) (
// Defragment is a no-op implementation (compatible with the etcd interface)
func (s *MaintenanceServer) Defragment(ctx context.Context, req *pb.DefragmentRequest) (*pb.DefragmentResponse, error) {
// Defragment is intentionally a no-op:
// for RocksDB: the storage engine handles compaction itself; no manual trigger is needed
// for PebbleDB: the storage engine handles compaction itself; no manual trigger is needed
// for Memory: in-memory storage needs no defragmentation
// Return a success response to remain etcd API compatible

40 changes: 20 additions & 20 deletions cmd/metastore/main.go
@@ -34,7 +34,7 @@ import (
"metaStore/internal/kvstore"
"metaStore/internal/memory"
"metaStore/internal/raft"
"metaStore/internal/rocksdb"
"metaStore/internal/pebbledb"
"metaStore/pkg/config"
"metaStore/pkg/log"
"metaStore/pkg/metrics"
@@ -76,33 +76,33 @@ func (s *serverStarter) buildStore(engine string) (kvstore.Store, raftNodeHook,
kvs = memory.NewMemory(<-snapshotterReady, s.proposeC, commitC, errorC)
kvs.SetRaftNode(raftNode, s.cfg.Server.MemberID)
return kvs, raftNode, func() {}, nil
case "rocksdb":
dbPath := fmt.Sprintf("data/rocksdb/%d", s.cfg.Server.MemberID)
db, err := rocksdb.Open(dbPath, &s.cfg.Server.RocksDB)
case "pebble":
dbPath := fmt.Sprintf("data/pebble/%d", s.cfg.Server.MemberID)
db, err := pebbledb.Open(dbPath, &s.cfg.Server.Pebble)
if err != nil {
return nil, nil, nil, err
}

log.Info("RocksDB configuration applied",
zap.Uint64("block_cache_size", s.cfg.Server.RocksDB.BlockCacheSize),
zap.Uint64("write_buffer_size", s.cfg.Server.RocksDB.WriteBufferSize),
zap.Int("max_background_jobs", s.cfg.Server.RocksDB.MaxBackgroundJobs),
zap.Int("max_open_files", s.cfg.Server.RocksDB.MaxOpenFiles),
zap.Bool("bloom_filter_enabled", s.cfg.Server.RocksDB.BlockBasedTableBloomFilter),
zap.String("component", "rocksdb"))
log.Info("Pebble configuration applied",
zap.Uint64("block_cache_size", s.cfg.Server.Pebble.BlockCacheSize),
zap.Uint64("write_buffer_size", s.cfg.Server.Pebble.WriteBufferSize),
zap.Int("max_background_jobs", s.cfg.Server.Pebble.MaxBackgroundJobs),
zap.Int("max_open_files", s.cfg.Server.Pebble.MaxOpenFiles),
zap.Bool("bloom_filter_enabled", s.cfg.Server.Pebble.BlockBasedTableBloomFilter),
zap.String("component", "pebble"))

var kvs *rocksdb.RocksDB
var kvs *pebbledb.PebbleDB
getSnapshot := func() ([]byte, error) { return kvs.GetSnapshot() }
commitC, errorC, snapshotterReady, raftNode := raft.NewNodeRocksDB(int(s.cfg.Server.MemberID), s.clusterPeers, s.join, getSnapshot, s.proposeC, s.confChangeC, db, dbPath, s.cfg)
kvs = rocksdb.NewRocksDB(db, <-snapshotterReady, s.proposeC, commitC, errorC)
commitC, errorC, snapshotterReady, raftNode := raft.NewNodePebble(int(s.cfg.Server.MemberID), s.clusterPeers, s.join, getSnapshot, s.proposeC, s.confChangeC, db, dbPath, s.cfg)
kvs = pebbledb.NewPebbleDB(db, <-snapshotterReady, s.proposeC, commitC, errorC)
kvs.SetRaftNode(raftNode, s.cfg.Server.MemberID)
closeFunc := func() {
kvs.Close()
db.Close()
}
return kvs, raftNode, closeFunc, nil
default:
return nil, nil, nil, fmt.Errorf("Unknown storage engine: %s. Supported engines: memory, rocksdb", engine)
return nil, nil, nil, fmt.Errorf("Unknown storage engine: %s. Supported engines: memory, pebble", engine)
}
}
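The switch above pairs each engine with a cleanup closure, so the caller can defer one close call regardless of which engine was selected. A stripped-down sketch of that shape (the interface and type names are illustrative, not MetaStore's actual API):

```go
package main

import "fmt"

// Store is a stand-in for kvstore.Store.
type Store interface {
	Name() string
}

type memStore struct{}

func (memStore) Name() string { return "memory" }

type pebbleStore struct{ closed bool }

func (p *pebbleStore) Name() string { return "pebble" }

// buildStore returns the store plus a cleanup closure; only engines
// that own external resources (here, pebble) need a non-trivial close.
func buildStore(engine string) (Store, func(), error) {
	switch engine {
	case "memory":
		return memStore{}, func() {}, nil
	case "pebble":
		p := &pebbleStore{}
		return p, func() { p.closed = true }, nil
	default:
		return nil, nil, fmt.Errorf("unknown storage engine: %s (supported: memory, pebble)", engine)
	}
}

func main() {
	store, closeStore, err := buildStore("pebble")
	if err != nil {
		panic(err)
	}
	defer closeStore() // one deferred call works for every engine
	fmt.Println(store.Name()) // pebble
}
```

Returning the closure from the factory keeps engine-specific teardown (here, closing the Pebble handle) next to the code that opened the resource.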

@@ -233,7 +233,7 @@ func main() {
grpcAddr := flag.String("grpc-addr", ":2379", "gRPC server address for etcd compatibility")
clientURLs := flag.String("client-urls", "", "comma separated advertised client URLs")
join := flag.Bool("join", false, "join an existing cluster")
storageEngine := flag.String("storage", "memory", "storage engine: memory or rocksdb")
storageEngine := flag.String("storage", "memory", "storage engine: memory or pebble")

flag.Parse()

@@ -322,8 +322,8 @@ }
}

switch *storageEngine {
case "rocksdb":
log.Info("Starting with RocksDB persistent storage", zap.String("component", "main"))
case "pebble":
log.Info("Starting with Pebble persistent storage", zap.String("component", "main"))
case "memory":
log.Info("Starting with memory + WAL storage and etcd gRPC support", zap.String("component", "main"))
default:
@@ -334,8 +334,8 @@

store, raftNode, closeStore, err := starter.buildStore(*storageEngine)
if err != nil {
if *storageEngine == "rocksdb" {
log.Fatal("Failed to open RocksDB", zap.Error(err), zap.String("component", "main"))
if *storageEngine == "pebble" {
log.Fatal("Failed to open Pebble", zap.Error(err), zap.String("component", "main"))
} else {
log.Fatal("Failed to build store", zap.Error(err), zap.String("component", "main"))
}
4 changes: 2 additions & 2 deletions configs/config.yaml
@@ -192,10 +192,10 @@ server:
# Recommended for cross-continent deployments: 500ms (increase election_timeout accordingly)
read_timeout: 5s # Read timeout (prevents read requests from hanging forever)

# RocksDB performance configuration (only effective when using the RocksDB storage engine)
# Pebble performance configuration (only effective when using Pebble storage engine)
# The default configuration is optimized for a light footprint (~10MB of memory)
# For higher performance, increase block_cache_size and write_buffer_size
rocksdb:
pebble:
# Block cache configuration (affects read performance)
block_cache_size: 8388608 # 8MB (lightweight default); 256MB-512MB recommended for high performance

26 changes: 13 additions & 13 deletions docs/ADVANCED_BATCH_OPTIMIZATION_REPORT.md
@@ -67,7 +67,7 @@ message BatchOperation {

### 1.3 Encoding Logic

**File**: [internal/rocksdb/raft_proto.go](../internal/rocksdb/raft_proto.go)
**File**: [internal/pebble/raft_proto.go](../internal/pebble/raft_proto.go)

New functions:

@@ -96,7 +96,7 @@ if len(batch) == 1 {

### 1.4 Decoding and Execution

**File**: [internal/rocksdb/kvstore.go](../internal/rocksdb/kvstore.go) (Lines 204-216)
**File**: [internal/pebble/kvstore.go](../internal/pebble/kvstore.go) (Lines 204-216)

Three-tier fallback strategy:

@@ -120,7 +120,7 @@ for _, data := range commit.Data {
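The collapsed hunk above loops over `commit.Data`, trying each entry format in turn. The three-tier idea can be sketched as follows; the real code decodes protobuf messages, so the JSON batch/single formats and the plain `key=value` legacy format used here are stand-ins:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Op is a stand-in for one KV operation.
type Op struct {
	Type  string `json:"type"`
	Key   string `json:"key"`
	Value string `json:"value"`
}

type batchMsg struct {
	Ops []Op `json:"ops"`
}

// decodeEntry applies the three-tier fallback: batch first, then a
// single operation, then the legacy plain format.
func decodeEntry(data []byte) ([]Op, error) {
	// Tier 1: batch message
	var b batchMsg
	if err := json.Unmarshal(data, &b); err == nil && len(b.Ops) > 0 {
		return b.Ops, nil
	}
	// Tier 2: single operation
	var op Op
	if err := json.Unmarshal(data, &op); err == nil && op.Key != "" {
		return []Op{op}, nil
	}
	// Tier 3: legacy "key=value" format
	if k, v, ok := bytes.Cut(data, []byte("=")); ok {
		return []Op{{Type: "put", Key: string(k), Value: string(v)}}, nil
	}
	return nil, fmt.Errorf("unrecognized log entry format")
}

func main() {
	ops, _ := decodeEntry([]byte(`{"ops":[{"type":"put","key":"a","value":"1"},{"type":"del","key":"b"}]}`))
	fmt.Println(len(ops)) // 2 (tier 1: batch)
	ops, _ = decodeEntry([]byte(`{"type":"put","key":"a","value":"1"}`))
	fmt.Println(len(ops)) // 1 (tier 2: single operation)
	ops, _ = decodeEntry([]byte("a=1"))
	fmt.Println(len(ops)) // 1 (tier 3: legacy)
}
```

Ordering the tiers from newest to oldest format means entries written before the batching change still replay correctly after an upgrade.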

### 1.5 BatchProposer Integration

**File**: [internal/rocksdb/batch_proposer.go](../internal/rocksdb/batch_proposer.go)
**File**: [internal/pebble/batch_proposer.go](../internal/pebble/batch_proposer.go)

Key changes:

@@ -169,7 +169,7 @@ if len(batch) == 1 {

### 2.2 Configuration Extension

**File**: [internal/rocksdb/batch_proposer.go](../internal/rocksdb/batch_proposer.go) (Lines 27-32)
**File**: [internal/pebble/batch_proposer.go](../internal/pebble/batch_proposer.go) (Lines 27-32)

```go
type BatchConfig struct {
    // … (remaining fields collapsed in the diff view)
}
```

@@ -273,7 +273,7 @@ func (bp *BatchProposer) adjustWaitTime() {
### 3.1 Functional Tests

**All tests passed (100%)**:
- ✅ RocksDB core tests: 13/13
- ✅ Pebble core tests: 13/13
- ✅ Cross-protocol integration tests: 19/19
- ✅ Single-node operation tests: 3/3

@@ -646,10 +646,10 @@ BatchConfig{
| Feature | File | Lines |
|------|------|------|
| Protobuf definition | internal/proto/raft.proto | 21-34 |
| Batch encoding functions | internal/rocksdb/raft_proto.go | 191-233 |
| Adaptive logic | internal/rocksdb/batch_proposer.go | 255-291 |
| Batch decoding | internal/rocksdb/kvstore.go | 204-216 |
| BatchProposer integration | internal/rocksdb/batch_proposer.go | 196-231 |
| Batch encoding functions | internal/pebble/raft_proto.go | 191-233 |
| Adaptive logic | internal/pebble/batch_proposer.go | 255-291 |
| Batch decoding | internal/pebble/kvstore.go | 204-216 |
| BatchProposer integration | internal/pebble/batch_proposer.go | 196-231 |

### 11.2 Related Documents

Expand All @@ -667,11 +667,11 @@ Tier 3 基础版: 5.92s
+ 自适应批处理: 4.67s (-21%)
```

**RocksDB core tests**:
**Pebble core tests**:
```
TestRocksDB_Compact_Basic: 54ms (vs 150ms baseline, -64%)
TestRocksDB_Compact_Sequential: 115ms first, <1ms subsequent
TestCrossProtocolRocksDB/Concurrent: 4.67s (vs 5.92s, -21%)
TestPebble_Compact_Basic: 54ms (vs 150ms baseline, -64%)
TestPebble_Compact_Sequential: 115ms first, <1ms subsequent
TestCrossProtocolPebble/Concurrent: 4.67s (vs 5.92s, -21%)
```

---