diff --git a/README.md b/README.md
index 43c5ec7..96bbb86 100644
--- a/README.md
+++ b/README.md
@@ -1,33 +1,74 @@
-# mace 0.0.27 vs rocksdb 10.4.2
+# kv_bench (Mace vs RocksDB)
 
-## sequential insert
-![mace_sequential_insert](./scripts/mace_sequential_insert.png)
+Quick start for a reproducible comparison. Full guide: [docs/repro.md](./docs/repro.md).
 
-![rocksdb_sequential_insert](./scripts/rocksdb_sequential_insert.png)
+## 5-Minute Quickstart
+1. Set your storage root (any mount path, not hardcoded to `/nvme`):
 
-## random insert
-![mace_random_insert](./scripts/mace_random_insert.png)
+```bash
+export KV_BENCH_STORAGE_ROOT=/path/to/your/storage/kvbench
+mkdir -p "${KV_BENCH_STORAGE_ROOT}"
+```
 
-![rocksdb_random_insert](./scripts/rocksdb_random_insert.png)
+2. Initialize the Python environment (one-time setup):
 
----
+```bash
+cd /path/to/kv_bench/scripts
+./init.sh
+source ./bin/activate
+cd /path/to/kv_bench
+```
 
-## random get (warm get)
+3. Run baseline comparison (both engines write to the same CSV):
 
-![mace_get](./scripts/mace_get.png)
+```bash
+rm -rf "${KV_BENCH_STORAGE_ROOT}/basic_mace" "${KV_BENCH_STORAGE_ROOT}/basic_rocks"
+mkdir -p "${KV_BENCH_STORAGE_ROOT}/basic_mace" "${KV_BENCH_STORAGE_ROOT}/basic_rocks"
 
-![rocksdb_get](./scripts/rocksdb_get.png)
+./scripts/mace.sh "${KV_BENCH_STORAGE_ROOT}/basic_mace" ./scripts/benchmark_results.csv
+./scripts/rocksdb.sh "${KV_BENCH_STORAGE_ROOT}/basic_rocks" ./scripts/benchmark_results.csv
+```
 
----
+4. View and plot results:
 
-# mixed perfomance (hot get)
+```bash
+./scripts/bin/python ./scripts/plot.py ./scripts/benchmark_results.csv ./scripts
+```
 
-![mace_mixed](./scripts/mace_mixed.png)
+## Fast Result Reading
+- Raw input CSV: `./scripts/benchmark_results.csv`
+- Key columns:
+  - `engine` (`mace` / `rocksdb`)
+  - `workload_id` (`W1..W6`)
+  - `ops_per_sec` (higher is better)
+  - `p99_us` (lower is better)
+  - `error_ops` (must be 0 before drawing conclusions)
 
-![rockdb_mixed](./scripts/rocksdb_mixed.png)
+## Phase Reports
+- Phase 1 (stability, measured as coefficient of variation):
 
-# sequential scan (warm scan)
+```bash
+./scripts/bin/python ./scripts/phase1_eval.py ./scripts/phase1_results.csv
+```
 
-![mace_scan](./scripts/mace_scan.png)
+- Phase 2 (core median + slow scenarios):
 
-![rocksdb_scan](./scripts/rocksdb_scan.png)
+```bash
+./scripts/bin/python ./scripts/phase2_report.py ./scripts/phase2_results.csv
+```
+
+- Phase 3 (durability cost):
+
+```bash
+./scripts/bin/python ./scripts/phase3_report.py ./scripts/phase3_results.csv
+```
+
+- Phase 4 (restart/recovery):
+
+```bash
+./scripts/bin/python ./scripts/phase4_report.py ./scripts/phase4_restart_mace.csv
+./scripts/bin/python ./scripts/phase4_report.py ./scripts/phase4_restart_rocks.csv
+```
+
+## Full Reproduction
+For phase-by-phase commands, knobs, and interpretation rules, use [docs/repro.md](./docs/repro.md).
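Note on the "Fast Result Reading" section added above: the rule it states (treat a row as interpretable only when `error_ops` is 0, then read `ops_per_sec` and `p99_us`) can be sketched in Python as follows. This is a minimal sketch, not the logic of `plot.py` or any report script. It assumes the CSV has a header row with the column names listed in the README; the sample rows and the helper names `load_rows` and `split_by_errors` are hypothetical, and the numbers are placeholders rather than measured results.

```python
import csv
import io

# Hypothetical sample rows; the numbers are placeholders, not measured results.
SAMPLE_CSV = """engine,workload_id,ops_per_sec,p99_us,error_ops
mace,W1,1000.0,50,0
rocksdb,W1,800.0,70,0
mace,W2,500.0,90,3
rocksdb,W2,450.0,95,0
"""


def load_rows(handle):
    """Parse benchmark rows and convert the numeric columns (assumed header names)."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "engine": raw["engine"],
            "workload_id": raw["workload_id"],
            "ops_per_sec": float(raw["ops_per_sec"]),
            "p99_us": float(raw["p99_us"]),
            "error_ops": int(raw["error_ops"]),
        })
    return rows


def split_by_errors(rows):
    """Separate rows that are safe to interpret from rows with failed operations."""
    clean = [r for r in rows if r["error_ops"] == 0]
    failed = [r for r in rows if r["error_ops"] > 0]
    return clean, failed


rows = load_rows(io.StringIO(SAMPLE_CSV))
clean, failed = split_by_errors(rows)

# Rows with failed operations are reported first and excluded from the summary.
for r in failed:
    print(f"investigate first: {r['engine']} {r['workload_id']} error_ops={r['error_ops']}")
for r in clean:
    print(f"{r['engine']:8s} {r['workload_id']} ops/s={r['ops_per_sec']:.1f} p99_us={r['p99_us']:.0f}")
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an open handle on `./scripts/benchmark_results.csv`. `csv.DictReader` ignores columns the sketch does not name, so extra columns in the real schema should not break it.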
diff --git a/docs/repro.md b/docs/repro.md
new file mode 100644
index 0000000..3de1ec4
--- /dev/null
+++ b/docs/repro.md
@@ -0,0 +1,167 @@
+# kv_bench Reproduction Guide (Mace vs RocksDB)
+
+This repository reproduces and compares `mace` and `rocksdb` benchmark results across phase 0 through phase 4.
+
+## 1. Prerequisites
+- Linux
+- A directory on fast storage of your choice (typically an NVMe mount point)
+- Rust/Cargo
+- CMake (to build `rocksdb_bench`)
+- Python 3 (for result aggregation and plotting)
+
+## 2. Storage Directory Configuration (Important)
+`/nvme` is no longer hardcoded. You can use any mount directory.
+
+Recommended: set one shared variable first:
+
+```bash
+export KV_BENCH_STORAGE_ROOT=/path/to/your/nvme_mount/kvbench
+mkdir -p "${KV_BENCH_STORAGE_ROOT}"
+```
+
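+The scripts below take this directory (or one of its subdirectories) as their storage path. It is the first argument for every script except `phase4_soak.sh`, which takes the engine name first and the path second.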
+
+## 3. Initialization
+```bash
+cd /path/to/kv_bench/scripts
+./init.sh
+source ./bin/activate
+cd /path/to/kv_bench
+```
+
+## 4. Quick Baseline Comparison (W1..W6)
+Clean old data first:
+
+```bash
+rm -rf "${KV_BENCH_STORAGE_ROOT}/basic_mace" "${KV_BENCH_STORAGE_ROOT}/basic_rocks"
+mkdir -p "${KV_BENCH_STORAGE_ROOT}/basic_mace" "${KV_BENCH_STORAGE_ROOT}/basic_rocks"
+```
+
+Run both engines:
+
+```bash
+./scripts/mace.sh "${KV_BENCH_STORAGE_ROOT}/basic_mace" ./scripts/benchmark_results.csv
+./scripts/rocksdb.sh "${KV_BENCH_STORAGE_ROOT}/basic_rocks" ./scripts/benchmark_results.csv
+```
+
+Generate plots:
+
+```bash
+./scripts/bin/python ./scripts/plot.py ./scripts/benchmark_results.csv ./scripts
+```
+
+## 5. Phase Reproduction Commands
+
+### Phase 1
+```bash
+rm -rf "${KV_BENCH_STORAGE_ROOT}/phase1"
+mkdir -p "${KV_BENCH_STORAGE_ROOT}/phase1"
+./scripts/phase1.sh "${KV_BENCH_STORAGE_ROOT}/phase1" ./scripts/phase1_results.csv
+```
+
+### Phase 2
+```bash
+rm -rf "${KV_BENCH_STORAGE_ROOT}/phase2"
+mkdir -p "${KV_BENCH_STORAGE_ROOT}/phase2"
+./scripts/phase2.sh "${KV_BENCH_STORAGE_ROOT}/phase2" ./scripts/phase2_results.csv
+```
+
+Optional: enable the tier-L representative subset:
+
+```bash
+RUN_TIER_L_REPRESENTATIVE=1 TIER_L_REPEATS=1 \
+./scripts/phase2.sh "${KV_BENCH_STORAGE_ROOT}/phase2" ./scripts/phase2_results.csv
+```
+
+### Phase 3
+```bash
+rm -rf "${KV_BENCH_STORAGE_ROOT}/phase3"
+mkdir -p "${KV_BENCH_STORAGE_ROOT}/phase3"
+./scripts/phase3.sh "${KV_BENCH_STORAGE_ROOT}/phase3" ./scripts/phase3_results.csv
+```
+
+### Phase 4 (run one engine at a time)
+Mace:
+```bash
+rm -rf "${KV_BENCH_STORAGE_ROOT}/phase4_mace"
+./scripts/phase4_soak.sh mace "${KV_BENCH_STORAGE_ROOT}/phase4_mace" \
+  ./scripts/phase4_results_mace.csv ./scripts/phase4_restart_mace.csv
+```
+
+RocksDB:
+```bash
+rm -rf "${KV_BENCH_STORAGE_ROOT}/phase4_rocks"
+./scripts/phase4_soak.sh rocksdb "${KV_BENCH_STORAGE_ROOT}/phase4_rocks" \
+  ./scripts/phase4_results_rocks.csv ./scripts/phase4_restart_rocks.csv
+```
+
+## 6. Where Result CSVs Are Stored
+Default output files:
+- `./scripts/benchmark_results.csv`
+- `./scripts/phase1_results.csv`
+- `./scripts/phase2_results.csv`
+- `./scripts/phase3_results.csv`
+- `./scripts/phase4_results_*.csv`
+- `./scripts/phase4_restart_*.csv`
+
+Both engine binaries emit the same unified CSV schema. Key columns:
+- `engine`: `mace` / `rocksdb`
+- `workload_id`: `W1..W6`
+- `durability_mode`: `relaxed` / `durable`
+- `threads,key_size,value_size,prefill_keys`: case configuration
+- `ops_per_sec`: throughput
+- `p50_us,p95_us,p99_us,p999_us`: latency percentiles
+- `error_ops`: number of failed operations
+- `read_path`: `snapshot` / `rw_txn`
+
+## 7. How to Interpret Results
+
+### Phase 1 (stability)
+```bash
+./scripts/bin/python ./scripts/phase1_eval.py ./scripts/phase1_results.csv
+```
+Check:
+- `throughput_cv` (<=10%)
+- `p99_cv` (<=15%)
+- `stable` and overall pass ratio
+
+### Phase 2 (core report)
+```bash
+./scripts/bin/python ./scripts/phase2_report.py ./scripts/phase2_results.csv
+```
+Check:
+- `throughput_median`
+- `p95_median`, `p99_median`
+- `slower_engine`, `slower_ratio`
+
+### Phase 3 (durability cost)
+```bash
+./scripts/bin/python ./scripts/phase3_report.py ./scripts/phase3_results.csv
+```
+Check:
+- `throughput_drop_pct` (durable vs relaxed throughput drop)
+- `p99_inflation_pct` (durable vs relaxed p99 inflation)
+
+### Phase 4 (recovery capability)
+```bash
+./scripts/bin/python ./scripts/phase4_report.py ./scripts/phase4_restart_mace.csv
+./scripts/bin/python ./scripts/phase4_report.py ./scripts/phase4_restart_rocks.csv
+```
+Check:
+- `restart_success`
+- `restart_ready_ms` at `p50/p95/p99/max`
+
+## 8. CLI Configurability (No Hardcoded Disk Prefix)
+- Both benchmark binaries support `--path` to set the DB directory.
+- Every script takes the storage root/path as its first argument, except `phase4_soak.sh`, which takes the engine name first and the path second.
+- You can point `${KV_BENCH_STORAGE_ROOT}` to any mount point (NVMe, SSD, RAID, ephemeral disk).
+
+## 9. Comparison Best Practices
+Only compare cases that match on every one of these dimensions:
+- `workload_id`
+- `key_size/value_size`
+- `threads`
+- `durability_mode`
+- `read_path`
+
+If `error_ops > 0`, investigate that case before drawing any performance conclusions.
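Note on section 9 of docs/repro.md above: the matching rule (compare two rows only when `workload_id`, `key_size`, `value_size`, `threads`, `durability_mode`, and `read_path` are all identical, and set aside any row with `error_ops > 0`) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the logic of `phase2_report.py`. The helper names `make_row` and `pair_cases` are hypothetical, all sample values are placeholders rather than measurements, and for simplicity a later row overwrites an earlier one when an engine has repeated runs for the same case.

```python
from collections import defaultdict

# Dimensions that must be identical before two rows are compared (section 9).
DIMENSIONS = ("workload_id", "key_size", "value_size", "threads",
              "durability_mode", "read_path")


def make_row(engine, threads, ops_per_sec, error_ops=0):
    """Build a hypothetical result row; every value here is a placeholder."""
    return {
        "engine": engine, "workload_id": "W1", "key_size": 32,
        "value_size": 1024, "threads": threads,
        "durability_mode": "relaxed", "read_path": "snapshot",
        "ops_per_sec": ops_per_sec, "error_ops": error_ops,
    }


def pair_cases(rows):
    """Group rows by the comparison dimensions and keep mace/rocksdb pairs.

    Rows with error_ops > 0 are excluded from pairing and returned separately.
    """
    groups = defaultdict(dict)
    flagged = []
    for row in rows:
        if row["error_ops"] > 0:
            flagged.append(row)
            continue
        key = tuple(row[d] for d in DIMENSIONS)
        groups[key][row["engine"]] = row  # simplification: last run wins
    pairs = {k: v for k, v in groups.items()
             if {"mace", "rocksdb"} <= set(v)}
    return pairs, flagged


rows = [
    make_row("mace", threads=4, ops_per_sec=1000.0),
    make_row("rocksdb", threads=4, ops_per_sec=800.0),
    make_row("mace", threads=8, ops_per_sec=1500.0),       # no rocksdb counterpart
    make_row("rocksdb", threads=16, ops_per_sec=900.0, error_ops=2),
]
pairs, flagged = pair_cases(rows)

# Throughput ratio (mace / rocksdb) for each fully matched case.
ratios = {k: v["mace"]["ops_per_sec"] / v["rocksdb"]["ops_per_sec"]
          for k, v in pairs.items()}
for key, ratio in ratios.items():
    print(dict(zip(DIMENSIONS, key)), f"mace/rocksdb throughput ratio = {ratio:.2f}")
print(f"{len(flagged)} case(s) need investigation before comparison")
```

Keying on the full tuple of dimensions means a case present for only one engine is silently left out of the comparison rather than matched against a near neighbor, which is the conservative choice for a fairness rule like this one.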