## swisstable -- a joke

`swisstable` is the hash table implementation behind `flat_hash_set/map` in `abseil-cpp`. It uses quadratic probing and is claimed to outperform the STL. Below is a look at how this `swisstable` works.

The table is built around the idea of a `group`. One contiguous block of memory is divided into several `group`s, and each `group` holds 16 elements, each made of a `meta` part and a `slot` part. A `meta` byte stores 7 bits of the 64-bit `hash` value (called h2). The `slot` stores the Key, and the Key is located through the remaining 57 bits of the `hash` (called h1). SSE2 compares 128 bits of ctrl (`meta`) values at once, which quickly gives the Key's offset within a `group`. This is why a group has exactly 16 `meta` bytes (128 / 8 = 16). `meta` also has three special values, `-1`, `-2` and `-128`, meaning sentinel, invalid (deleted) and empty respectively. Initially every `meta` is set to `empty`.

Lookup:

1. Compute the hash and split it into h1 and h2.
2. Use h1 to pick the starting `group`.
3. Match h2 against that `group`'s `meta` bytes (using SSE2).
4. For each match, compute the slot offset from the start of the table and compare the stored key; if it is equal, the key is found.
5. If the `group` contains an `empty` `meta`, the key does not exist.
6. Otherwise, move on to the next `group` in the probe sequence and repeat from step 3.

The code:

```c++
auto pattern = _mm_set1_epi8(H2(hash));  // 1: broadcast h2 into all 16 bytes
prober seq { H1(hash), groups_ - 1 };    // 1: probe sequence seeded by h1, groups_ - 1 used as the mask
matcher m { 0 };
while (true) {
    group g { ctrl_ + seq.offset() };    // 2: the 16 meta bytes of the current group
    m = g.match(pattern);                // 3: SSE2 compare, one bit per matching meta byte
    while (m) {
        auto offset = seq.offset(m.index());  // slot offset from the start of the table
        assert(offset < cap_);
        if (likely(slot_[offset] == key))     // 4: h2 matched, confirm with a full key compare
            return true;
        ++m;                                  // move on to the next candidate
    }
    if (likely(g.match_empty()))         // 5: an empty meta means the key was never inserted
        return false;
    seq.next();                          // 6: advance to the next group in the probe sequence
}
```

This `swisstable` is advertised as having faster lookups than the STL, but in my tests neither the official `absl::flat_hash_set` nor my own implementation was faster than STL at lookups (mine only occasionally edged it out slightly). Insertion was a different story: both were several times faster than `std::unordered_set`. `./bench2` compares against `absl::flat_hash_set`, and `./bench` against my own implementation (labelled `swiss`):

```shell
abby@Serenity ~/s/build> ./bench2
----------- insert ------------
unordered_set => 183.605811ms
flat_hash_set => 41.932002ms
----------- search ------------
unordered_set => 4.378100ms
flat_hash_set => 22.928702ms
unordered_set cap 1056323 size 787268 load_factor 0.745291
flat_hash_set cap 1048575 size 787268 load_factor 0.750798
abby@Serenity ~/s/build> ./bench2
----------- insert ------------
unordered_set => 182.890869ms
flat_hash_set => 39.940193ms
----------- search ------------
unordered_set => 4.542800ms
flat_hash_set => 21.498796ms
unordered_set cap 1056323 size 786505 load_factor 0.744569
flat_hash_set cap 1048575 size 786505 load_factor 0.750070
abby@Serenity ~/s/build> ./bench2
----------- insert ------------
unordered_set => 183.804469ms
flat_hash_set => 38.301994ms
----------- search ------------
unordered_set => 5.001399ms
flat_hash_set => 22.470496ms
unordered_set cap 1056323 size 787110 load_factor 0.745141
flat_hash_set cap 1048575 size 787110 load_factor 0.750647
abby@Serenity ~/s/build> ./bench
----------- insert ------------
unordered_set => 187.271974ms
swiss => 22.850596ms
----------- search ------------
unordered_set => 4.307000ms
swiss => 4.016499ms
unordered_set cap 1056323 size 786815 load_factor 0.744862
swiss cap 1048576 size 786815 load_factor 0.750365
abby@Serenity ~/s/build> ./bench
----------- insert ------------
unordered_set => 197.831773ms
swiss => 20.787797ms
----------- search ------------
unordered_set => 4.811299ms
swiss => 4.107399ms
unordered_set cap 1056323 size 787381 load_factor 0.745398
swiss cap 1048576 size 787381 load_factor 0.750905
```

Rust reportedly also uses a swisstable implementation (its standard `HashMap` has been based on the `hashbrown` crate, a SwissTable port, since Rust 1.36). My measurements:

```shell
abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
Compiling libc v0.2.148
Compiling gcc v0.3.55
Compiling autocfg v1.1.0
Compiling version_check v0.9.4
Compiling ahash v0.8.3
Compiling num-traits v0.2.16
Compiling cfg-if v1.0.0
Compiling once_cell v1.18.0
Compiling fasthash-sys v0.3.2
Compiling cfg-if v0.1.10
Compiling rand v0.4.6
Compiling seahash v3.0.7
Compiling allocator-api2 v0.2.16
Compiling xoroshiro128 v0.3.0
Compiling hashbrown v0.14.1
Compiling fasthash v0.4.0
Compiling hash_test v0.1.0 (/home/abby/hash_test)
Finished release [optimized] target(s) in 12.99s
Running `target/release/hash_test /home/abby/numbers.txt`
swiss insert => 73ms
swiss search => 57ms
std insert => 76ms
std search => 69ms
abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
Finished release [optimized] target(s) in 0.01s
Running `target/release/hash_test /home/abby/numbers.txt`
swiss insert => 71ms
swiss search => 56ms
std insert => 71ms
std search => 55ms
abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
Finished release [optimized] target(s) in 0.02s
Running `target/release/hash_test /home/abby/numbers.txt`
swiss insert => 73ms
swiss search => 78ms
std insert => 72ms
std search => 61ms
abby@Serenity ~/hash_test>
```

Conclusion:
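Why would the swisstable implementation be slower than STL? Possibly because in this implementation `meta` and `slot` are stored separately, roughly like this:

```
|        meta       |                  slot                  |
+-------------------+----------------------------------------+
```

so a lookup has to jump back and forth between `meta` and `slot`, which is unfriendly to the cache. If `meta` and `slot` were laid out aligned to the L1 cache, performance should be somewhat better.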
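To make the layout argument above concrete, here is a minimal toy sketch (not abseil's code, and not the implementation benchmarked above) of the lookup steps over two separate arrays: `ctrl_` for the `meta` bytes and `slots_` for the keys. It assumes x86-64 with SSE2, GCC or Clang (for `__builtin_ctz`), `uint64_t` keys, the splitmix64 finalizer as the hash, a power-of-two number of groups, and no deletion or resizing. `ToySwissSet`, `match_byte` and `mix` are hypothetical names, and the probe sequence here steps over whole groups by triangular numbers, which is a simplification.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_set1_epi8, _mm_cmpeq_epi8, _mm_movemask_epi8
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int8_t kEmpty = -128;     // empty meta; -2 (deleted) and -1 (sentinel) unused here
constexpr std::size_t kGroupWidth = 16;  // 16 one-byte meta values = one 128-bit SSE2 register

// Hypothetical hash: the splitmix64 finalizer, used only to scramble the key bits.
inline std::uint64_t mix(std::uint64_t x) {
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// This sketch's split: upper 57 bits pick the group, low 7 bits go into meta.
inline std::uint64_t H1(std::uint64_t hash) { return hash >> 7; }
inline std::int8_t H2(std::uint64_t hash) { return static_cast<std::int8_t>(hash & 0x7f); }

// Bit i of the result is set when group[i] == value (used for steps 3 and 5).
inline std::uint32_t match_byte(const std::int8_t* group, std::int8_t value) {
    __m128i ctrl = _mm_loadu_si128(reinterpret_cast<const __m128i*>(group));
    __m128i eq = _mm_cmpeq_epi8(ctrl, _mm_set1_epi8(value));
    return static_cast<std::uint32_t>(_mm_movemask_epi8(eq));
}

// Toy set with meta bytes and keys in two separate arrays (the layout discussed above).
class ToySwissSet {
public:
    explicit ToySwissSet(std::size_t groups)
        : ctrl_(groups * kGroupWidth, kEmpty), slots_(groups * kGroupWidth), mask_(groups - 1) {
        assert(groups != 0 && (groups & (groups - 1)) == 0);  // must be a power of two
    }

    bool contains(std::uint64_t key) const {
        const std::uint64_t hash = mix(key);
        std::size_t g = H1(hash) & mask_;
        for (std::size_t step = 1; step <= mask_ + 1; ++step) {
            const std::int8_t* group = ctrl_.data() + g * kGroupWidth;  // access 1: ctrl_ array
            for (std::uint32_t m = match_byte(group, H2(hash)); m != 0; m &= m - 1) {
                std::size_t offset = g * kGroupWidth + __builtin_ctz(m);
                if (slots_[offset] == key) return true;  // access 2: separate slots_ array
            }
            if (match_byte(group, kEmpty) != 0) return false;  // empty meta: key absent
            g = (g + step) & mask_;  // next group: start + 1, 3, 6, 10, ... (mod group count)
        }
        return false;
    }

    // Returns false if the key is already present or no empty slot is left.
    bool insert(std::uint64_t key) {
        if (contains(key)) return false;
        const std::uint64_t hash = mix(key);
        std::size_t g = H1(hash) & mask_;
        for (std::size_t step = 1; step <= mask_ + 1; ++step) {
            std::uint32_t empties = match_byte(ctrl_.data() + g * kGroupWidth, kEmpty);
            if (empties != 0) {  // take the first empty slot in this group
                std::size_t offset = g * kGroupWidth + __builtin_ctz(empties);
                ctrl_[offset] = H2(hash);
                slots_[offset] = key;
                return true;
            }
            g = (g + step) & mask_;
        }
        return false;
    }

private:
    std::vector<std::int8_t> ctrl_;     // meta bytes, all initialised to kEmpty
    std::vector<std::uint64_t> slots_;  // keys, in a separate allocation
    std::size_t mask_;                  // group count - 1
};
```

`contains` first loads 16 bytes from `ctrl_` and only then reads `slots_` for each h2 candidate. Because the two vectors are separate allocations, each candidate check usually touches a second cache line, which is the back-and-forth described in the conclusion. Storing each group's `meta` bytes next to its slots would be one way to try out the L1-alignment idea.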