
kv_bench (Mace vs RocksDB)

Quick start for reproducible Mace vs RocksDB comparison. Full guide: docs/repro.md.

5-Minute Quickstart

  1. Set your storage root (any mount path, not hardcoded to /nvme):
export KV_BENCH_STORAGE_ROOT=/path/to/your/storage/kvbench
mkdir -p "${KV_BENCH_STORAGE_ROOT}"
  2. Initialize the Python environment once (assuming the kv_bench repo is located in $HOME):
cd "$HOME/kv_bench/scripts"
./init.sh
source ./bin/activate
cd "$HOME/kv_bench"
  3. Run the baseline comparison (both engines append to the same CSV):

Default quick-baseline timing in these scripts is WARMUP_SECS=3 and MEASURE_SECS=5.

rm -rf "${KV_BENCH_STORAGE_ROOT}/basic_mace" "${KV_BENCH_STORAGE_ROOT}/basic_rocks"
mkdir -p "${KV_BENCH_STORAGE_ROOT}/basic_mace" "${KV_BENCH_STORAGE_ROOT}/basic_rocks"

./scripts/mace.sh "${KV_BENCH_STORAGE_ROOT}/basic_mace" ./scripts/benchmark_results.csv
./scripts/rocksdb.sh "${KV_BENCH_STORAGE_ROOT}/basic_rocks" ./scripts/benchmark_results.csv
  4. Plot the results:
./scripts/bin/python ./scripts/plot.py ./scripts/benchmark_results.csv ./scripts
  5. Print a direct comparison table from the CSV:
./scripts/bin/python ./scripts/compare_baseline.py ./scripts/benchmark_results.csv
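
Before plotting or comparing, it can help to check that the merged CSV actually exposes the columns the comparison key relies on. The sketch below is illustrative and not part of scripts/; the column names are the ones listed under "What Is Compared".

```python
# Sketch: sanity-check that a merged benchmark CSV carries the columns
# the comparison key needs. Illustrative helper, not part of scripts/.
import csv
import io

REQUIRED = {
    "workload_id", "threads", "key_size", "value_size",
    "durability_mode", "read_path", "ops_per_sec", "p99_us",
}

def missing_columns(csv_text: str) -> set:
    """Return comparison-key columns absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return REQUIRED - set(reader.fieldnames or [])

sample = (
    "workload_id,threads,key_size,value_size,durability_mode,"
    "read_path,ops_per_sec,p99_us\n"
    "W1,8,16,128,sync,point,123456.0,87.5\n"
)
print(sorted(missing_columns(sample)))  # → []
```

Running this against ./scripts/benchmark_results.csv before step 4 surfaces schema drift early instead of failing inside plot.py.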

What Is Compared

  • Comparison unit: rows with identical workload_id, threads, key_size, value_size, durability_mode, read_path
  • Fairness rule for read-heavy workloads: get, scan, and W1-W6 run one GC/compaction pass after prefill and before warmup/measurement, so that RocksDB is not measured with compaction artificially disabled, a state in which reads may have to touch multiple SSTs
  • Throughput metric: workload-level ops_per_sec (higher is better)
  • Tail latency metric: workload-level p99_us (lower is better)
    • This is the workload-level p99 of all operations executed in that row, not per-op-type p99
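
The matching rule above can be sketched in Python. This is a hypothetical re-implementation for illustration only: it assumes an `engine` column distinguishes Mace and RocksDB rows, while the real logic lives in scripts/compare_baseline.py and its column layout may differ.

```python
# Sketch of the comparison-unit matching: pair Mace and RocksDB rows
# that share the full comparison key, then report the throughput ratio.
# Assumes an `engine` column exists (an assumption, not the real schema).
import csv
import io

KEY = ("workload_id", "threads", "key_size", "value_size",
       "durability_mode", "read_path")

def pair_rows(csv_text: str):
    """Yield (key, mace_ops, rocks_ops, ratio) for keys both engines ran."""
    by_key = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        k = tuple(row[c] for c in KEY)
        by_key.setdefault(k, {})[row["engine"]] = float(row["ops_per_sec"])
    for k, engines in sorted(by_key.items()):
        if {"mace", "rocksdb"} <= engines.keys():
            m, r = engines["mace"], engines["rocksdb"]
            yield k, m, r, m / r

sample = (
    "engine,workload_id,threads,key_size,value_size,durability_mode,"
    "read_path,ops_per_sec,p99_us\n"
    "mace,W1,8,16,128,sync,point,200000,80\n"
    "rocksdb,W1,8,16,128,sync,point,100000,120\n"
)
for key, mace, rocks, ratio in pair_rows(sample):
    print(key[0], f"mace/rocksdb = {ratio:.2f}")  # → W1 mace/rocksdb = 2.00
```

Rows that lack a counterpart under the same key are silently skipped, which mirrors the idea that only identical configurations are comparable.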

Workloads

  • W1: 95% read + 5% update, uniform distribution
  • W2: 95% read + 5% update, Zipf distribution
  • W3: 50% read + 50% update, uniform distribution
  • W4: 5% read + 95% update, uniform distribution
  • W5: 70% read + 25% update + 5% scan, uniform distribution
  • W6: 100% scan, uniform distribution; throughput is counted by scan requests, not scanned key count
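
As an illustration of the read/update mixes above, here is a sketch of one workload step with either a uniform or a Zipf key distribution. The names (`zipf_weights`, `next_op`) and the Zipf exponent s=1.0 are assumptions made for this example; the actual generator lives in src/ and may differ.

```python
# Sketch of a W1/W2-style operation mix: choose an op type by percentage,
# then a key either uniformly (W1) or Zipf-skewed toward low ranks (W2).
# Illustrative only; not the repo's actual workload generator.
import random

def zipf_weights(n: int, s: float = 1.0) -> list:
    """Unnormalized Zipf weights: rank r (1-based) gets 1 / r**s."""
    return [1.0 / (r ** s) for r in range(1, n + 1)]

def next_op(rng, read_pct, keys, weights=None):
    """One workload step: pick an op type, then a key index."""
    op = "read" if rng.randrange(100) < read_pct else "update"
    if weights is not None:            # Zipf-distributed key choice (W2)
        key = rng.choices(keys, weights=weights)[0]
    else:                              # uniform key choice (W1)
        key = rng.choice(keys)
    return op, key

rng = random.Random(42)
keys = range(200)
w = zipf_weights(200)                  # computed once, reused per step
ops = [next_op(rng, 95, keys, w) for _ in range(5000)]
reads = sum(1 for op, _ in ops if op == "read")
print(f"read fraction ≈ {reads / len(ops):.3f}")
```

With a 95/5 split the observed read fraction converges near 0.95, and under Zipf the lowest-ranked keys absorb most accesses, which is what makes W2 stress cache-friendliness differently from W1.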

Raw CSV path: ./scripts/benchmark_results.csv

Full Reproduction

For phase-by-phase commands, tuning knobs, and interpretation rules, see docs/repro.md.