Demonstrating Shards On Elastic Stack Performance

This repository contains an Elasticsearch Rally setup, run with Docker, and custom tracks for several scenarios. It demonstrates the effect that oversharding, or a correctly sized but simply very large number of shards, can have on the non-functional requirements of an Elastic cluster.
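
To make "a lot of shards" concrete, here is a minimal sketch of the arithmetic. The scenario numbers (100 indices, 5 primaries, 1 replica) are hypothetical and are not taken from this repository's experiments; only the 3-node cluster size comes from this README.

```python
def total_shards(indices: int, primaries_per_index: int, replicas: int) -> int:
    """Total shard copies the cluster holds: every primary plus its replica copies."""
    return indices * primaries_per_index * (1 + replicas)


def shards_per_node(total: int, data_nodes: int) -> float:
    """Average number of shard copies each data node has to manage."""
    return total / data_nodes


# Hypothetical scenario: 100 indices, 5 primaries each, 1 replica, 3 data nodes.
total = total_shards(indices=100, primaries_per_index=5, replicas=1)
print(total)                                            # 1000
print(round(shards_per_node(total, data_nodes=3), 1))   # 333.3
```

Even a modest number of indices multiplies quickly once primaries and replicas are counted, which is the effect the custom tracks are meant to expose.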

Results

Detailed analysis is contained in the write-up.

Setup

To run

# Run the setup script
./setup.sh

# Or manually:
poetry install
poetry run poe cluster_up
poetry run poe rally_help

📋 Prerequisites

Docker (Compose v2 CLI)
Poetry ≥ 1.6
curl and jq (for health checks)

🏗️ Architecture

This project provides:

3-node Elasticsearch cluster optimized for load testing
Custom Rally tracks for different workload scenarios
Helper scripts for common benchmarking tasks
Docker Compose configuration tuned for performance

🎯 Custom Tracks

1. E-commerce Products (`tracks/ecommerce-products/`)

Use case: Product catalog search and filtering

Challenges: index-and-search, search-heavy
Operations: Product indexing, search, filtering, aggregations
Data: 1M products with categories, prices, ratings, locations

# Run ecommerce benchmark
poetry run poe rally_ecommerce challenge=search-heavy user_tag=test-run-1

2. Log Aggregation (`tracks/log-aggregation/`)

Use case: High-volume log ingestion and analysis

Challenges: high-throughput-ingestion, search-heavy, mixed-workload
Operations: Log indexing, search, filtering, aggregations
Data: 10M log entries with timestamps, levels, services

# Run log aggregation benchmark
poetry run poe rally_logs challenge=high-throughput-ingestion user_tag=logs-test

3. Time-Series Metrics (`tracks/time-series-metrics/`)

Use case: Metrics collection and monitoring

Challenges: metrics-ingestion, metrics-analysis
Operations: Metrics indexing, time-series queries, aggregations
Data: 5M metrics with timestamps, hosts, services

# Run metrics benchmark
poetry run poe rally_metrics challenge=metrics-analysis user_tag=metrics-test

🛠️ Available Commands

Cluster Management

poetry run poe cluster_up          # Start Elasticsearch cluster
poetry run poe cluster_down        # Stop cluster (keeps data)
poetry run poe cluster_status      # Check container status
poetry run poe cluster_logs        # Follow cluster logs

Rally Operations

poetry run poe rally_help          # Show Rally help
poetry run poe rally_list_tracks   # List available tracks
poetry run poe rally_list_races    # List completed races
poetry run poe rally_compare baseline=<id> contender=<id>  # Compare runs

Custom Track Benchmarks

poetry run poe rally_ecommerce     # E-commerce products track
poetry run poe rally_logs          # Log aggregation track
poetry run poe rally_metrics       # Time-series metrics track
poetry run poe rally_quick_benchmark  # Quick test with geonames track

Track Creation

# Create track from live cluster
poetry run poe rally_create_track target_hosts=localhost:9200 indices=my-index track_name=my-track

# Run custom track
poetry run poe rally_race target_hosts=localhost:9200 track_path=tracks/my-track

⚙️ Configuration

Docker Compose Tuning

The docker-compose.yml is optimized for load testing:

Memory: 2GB heap per node (3GB container limit)
Performance: Disabled monitoring, increased thread pools
Storage: Persistent volumes for data retention
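
As a rough illustration of the memory figures above, the sketch below builds the JVM heap flags and reports how much of the container limit the heap uses. It assumes the heap is set through `ES_JAVA_OPTS` (the variable named under Cluster Tuning); the helper names are hypothetical and not part of this repository.

```python
def es_java_opts(heap_gb: int) -> str:
    """Build heap flags with equal minimum and maximum size, as Elastic recommends."""
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"


def heap_fraction(heap_gb: float, container_gb: float) -> float:
    """Fraction of the container memory limit taken by the JVM heap."""
    if container_gb <= 0:
        raise ValueError("container_gb must be positive")
    return heap_gb / container_gb


# Values from this README: 2GB heap inside a 3GB container.
print(es_java_opts(2))                   # -Xms2g -Xmx2g
print(round(heap_fraction(2, 3), 2))     # 0.67
```

The memory left over after the heap is what the operating system can use for the filesystem cache, so raising the heap without raising the container limit shrinks that cache.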

Environment Variables

Create a .env file to customize:

ELASTIC_VERSION=8.12.2
RALLY_HOME=.rally

Rally Configuration

Rally stores its state in the .rally/ directory. To configure advanced options:

poetry run esrally configure --advanced

📊 Performance Monitoring

Cluster Health

# Check cluster status
curl -s http://localhost:9200/_cluster/health | jq .

# Monitor resource usage
poetry run poe cluster_logs
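
The same health check can be scripted before a race starts. This is a minimal sketch, assuming the cluster is reachable without authentication at `http://localhost:9200` as in the curl command above; `fetch_health` and `is_ready` are hypothetical helper names, and the readiness rule (green status, all nodes present, no shards in flight) is one reasonable choice rather than this project's policy.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """Fetch the _cluster/health document from a running cluster."""
    with urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)


def is_ready(health: dict, expected_nodes: int = 3) -> bool:
    """True when the cluster is green, complete, and has no shards in motion."""
    return (
        health["status"] == "green"
        and health["number_of_nodes"] >= expected_nodes
        and health["unassigned_shards"] == 0
        and health["initializing_shards"] == 0
        and health["relocating_shards"] == 0
    )


# Sample payload with the relevant fields of a _cluster/health response.
sample = {
    "status": "green",
    "number_of_nodes": 3,
    "unassigned_shards": 0,
    "initializing_shards": 0,
    "relocating_shards": 0,
}
print(is_ready(sample))  # True
```

Against a live cluster, `is_ready(fetch_health())` gives the same answer for the real response.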

Rally Metrics

# List recent races
poetry run poe rally_list_races

# Compare performance
poetry run poe rally_compare baseline=2024-01-01-01-01-01 contender=2024-01-01-02-02-02

🔧 Customization

Adding New Tracks

Create track directory: tracks/my-track/
Add track.json with track definition
Add index mapping: index.json
Add query files: search_queries.json, etc.
Add poe task in pyproject.toml
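
The steps above can be sketched as a small generator for `track.json`. This is an illustrative skeleton, not one of the tracks shipped here: the index name, description, and iteration counts are placeholders, and it uses a flat `schedule` rather than the named challenges the existing tracks define. Bulk indexing would additionally need a `corpora` section describing the data files.

```python
import json


def minimal_track(index_name: str, description: str) -> dict:
    """Return a minimal Rally track definition that recreates an index and runs one search."""
    return {
        "version": 2,
        "description": description,
        # "body" points at the mapping file kept next to track.json.
        "indices": [{"name": index_name, "body": "index.json"}],
        "schedule": [
            {"operation": {"operation-type": "delete-index"}},
            {"operation": {"operation-type": "create-index"}},
            {
                "operation": {
                    "name": "match-all",
                    "operation-type": "search",
                    "body": {"query": {"match_all": {}}},
                },
                "clients": 1,
                "warmup-iterations": 10,  # placeholder value
                "iterations": 100,        # placeholder value
            },
        ],
    }


track = minimal_track("my-index", "Hypothetical example track")
print(json.dumps(track, indent=2))
```

Writing the result to `tracks/my-track/track.json` gives a starting point that can then be extended with queries and challenges.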

Modifying Existing Tracks

Edit track definitions in tracks/*/track.json
Modify queries in tracks/*/*.json
Adjust challenges and schedules as needed

Cluster Tuning

Modify docker-compose.yml for different cluster sizes
Adjust JVM settings in ES_JAVA_OPTS
Change shard/replica counts in track definitions
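
For the shard and replica counts, a minimal sketch of generating the index settings for several sharding variants is shown below. It assumes the counts live in the index body referenced by a track (for example `index.json`); the specific shard counts are hypothetical and are not the configurations used in this repository's experiments.

```python
def index_settings(shards: int, replicas: int) -> dict:
    """Build an Elasticsearch index settings body with the given shard layout."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


# Hypothetical variants: from a single shard up to a heavily oversharded index.
variants = {n: index_settings(shards=n, replicas=1) for n in (1, 5, 50)}
for n, body in variants.items():
    print(n, body["settings"]["index"])
```

Note that `number_of_shards` is fixed when an index is created, so each variant has to be benchmarked against a freshly created index.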

📈 Example Workflows

1. Basic Performance Test

# Start cluster
poetry run poe cluster_up

# Run quick benchmark
poetry run poe rally_quick_benchmark

# Check results
poetry run poe rally_list_races

2. E-commerce Load Test

# Run product indexing and search
poetry run poe rally_ecommerce challenge=index-and-search user_tag=ecommerce-test-1

# Run search-heavy workload
poetry run poe rally_ecommerce challenge=search-heavy user_tag=ecommerce-test-2

# Compare results (use the race IDs shown by rally_list_races)
poetry run poe rally_compare baseline=<id> contender=<id>

3. High-Volume Log Testing

# Test log ingestion performance
poetry run poe rally_logs challenge=high-throughput-ingestion user_tag=logs-ingestion

# Test mixed workload
poetry run poe rally_logs challenge=mixed-workload user_tag=logs-mixed

🐛 Troubleshooting

Cluster Issues

# Check cluster status
poetry run poe cluster_status

# View logs
poetry run poe cluster_logs

# Restart cluster
poetry run poe cluster_down
poetry run poe cluster_up

Rally Issues

# Check Rally configuration
poetry run poe rally_help

# List available tracks
poetry run poe rally_list_tracks

# Check race results
poetry run poe rally_list_races

Performance Issues

Increase heap size in docker-compose.yml
Adjust bulk sizes in track definitions
Monitor system resources during tests

📚 Resources

🤝 Contributing

Fork the repository
Create a feature branch
Add your custom tracks or improvements
Test with ./setup.sh
Submit a pull request

📄 License

MIT License - see LICENSE file for details.
