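diff --git a/swisstable -- a joke.md b/swisstable -- a joke.md
new file mode 100755
index 0000000..b9cf2cc
--- /dev/null
+++ b/swisstable -- a joke.md
@@ -0,0 +1,155 @@
+## swisstable -- a joke
+
+`swisstable` is the hash table implementation behind `flat_hash_set/map` in `abseil-cpp`. It uses quadratic probing and is claimed to outperform the STL. Here is a look at how `swisstable` works.
+
+
+
+The table is built around the notion of a `group`: a contiguous block of memory is divided into `group`s of 16 elements, and each element has a `meta` part and a `slot` part. `meta` stores 7 bits of the 64-bit `hash` value (called h2), and `slot` stores the Key. The other 57 bits of the `hash` (called h1) are used to index the Key. SSE2 compares 128 bits of ctrl values at once to quickly find the Key's offset within a `group`, which is why 16 `meta` bytes make up one group (128 / 8 = 16). `meta` also has three special values, `-1`, `-2` and `-128`, which stand for sentinel, invalid (deleted) and empty respectively. Initially every `meta` is set to `empty`.
+
+
+
+Lookup procedure:
+
+1. Compute the hash and split it into h1 and h2
+
+2. Use h1 to locate the starting `group`
+
+3. Match h2 against the `meta` bytes of that `group` (using SSE2)
+
+4. On a match, compute the offset from the start of the table and check the Key stored in that `slot`
+
+5. If the `group` contains an `empty` entry, report that the Key is absent
+
+The code looks like this
+
+```c++
+auto pattern = _mm_set1_epi8(H2(hash));   // 1: broadcast h2 into all 16 byte lanes
+prober seq { H1(hash), groups_ - 1 };     // 1: probe sequence seeded with h1
+matcher m { 0 };
+
+while (true) {
+    group g { ctrl_ + seq.offset() };     // 2: group at the current probe position
+    m = g.match(pattern);                 // 3: bitmask of meta bytes equal to h2
+
+    while (m) {
+        offset = seq.offset(m.index());
+        assert(offset < cap_);
+        if (likely(slot_[offset] == key)) // 4: candidate found, confirm the full key
+            return true;
+        ++m;
+    }
+    if (likely(g.match_empty()))          // 5: an empty meta ends the probe
+        return false;
+    seq.next();
+}
+```
+
+`swisstable` is advertised as having faster lookups than the STL, but whether I use the official `absl::flat_hash_set` or my own implementation, lookups do not come out ahead of the STL (occasionally they are slightly faster). In the runs below, `flat_hash_set` searches take about 21 to 23 ms against 4 to 5 ms for `unordered_set`, although inserts are about four to five times faster.
+
+```shell
+abby@Serenity ~/s/build> ./bench2
+----------- insert ------------
+ unordered_set => 183.605811ms
+ flat_hash_set => 41.932002ms
+----------- search ------------
+ unordered_set => 4.378100ms
+ flat_hash_set => 22.928702ms
+unordered_set cap 1056323 size 787268 load_factor 0.745291
+flat_hash_set cap 1048575 size 787268 load_factor 0.750798
+abby@Serenity ~/s/build> ./bench2
+----------- insert ------------
+ unordered_set => 182.890869ms
+ flat_hash_set => 39.940193ms
+----------- search ------------
+ unordered_set => 4.542800ms
+ flat_hash_set => 21.498796ms
+unordered_set cap 1056323 size 786505 load_factor 0.744569
+flat_hash_set cap 1048575 size 786505 load_factor 0.750070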
+abby@Serenity ~/s/build> ./bench2
+----------- insert ------------
+ unordered_set => 183.804469ms
+ flat_hash_set => 38.301994ms
+----------- search ------------
+ unordered_set => 5.001399ms
+ flat_hash_set => 22.470496ms
+unordered_set cap 1056323 size 787110 load_factor 0.745141
+flat_hash_set cap 1048575 size 787110 load_factor 0.750647
+abby@Serenity ~/s/build> ./bench
+----------- insert ------------
+ unordered_set => 187.271974ms
+ swiss => 22.850596ms
+----------- search ------------
+ unordered_set => 4.307000ms
+ swiss => 4.016499ms
+unordered_set cap 1056323 size 786815 load_factor 0.744862
+swiss cap 1048576 size 786815 load_factor 0.750365
+abby@Serenity ~/s/build> ./bench
+----------- insert ------------
+ unordered_set => 197.831773ms
+ swiss => 20.787797ms
+----------- search ------------
+ unordered_set => 4.811299ms
+ swiss => 4.107399ms
+unordered_set cap 1056323 size 787381 load_factor 0.745398
+swiss cap 1048576 size 787381 load_factor 0.750905
+```
+
+
+
+Rust reportedly has a swisstable-based implementation as well. Here are the actual measurements
+
+```shell
+abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
+   Compiling libc v0.2.148
+   Compiling gcc v0.3.55
+   Compiling autocfg v1.1.0
+   Compiling version_check v0.9.4
+   Compiling ahash v0.8.3
+   Compiling num-traits v0.2.16
+   Compiling cfg-if v1.0.0
+   Compiling once_cell v1.18.0
+   Compiling fasthash-sys v0.3.2
+   Compiling cfg-if v0.1.10
+   Compiling rand v0.4.6
+   Compiling seahash v3.0.7
+   Compiling allocator-api2 v0.2.16
+   Compiling xoroshiro128 v0.3.0
+   Compiling hashbrown v0.14.1
+   Compiling fasthash v0.4.0
+   Compiling hash_test v0.1.0 (/home/abby/hash_test)
+    Finished release [optimized] target(s) in 12.99s
+     Running `target/release/hash_test /home/abby/numbers.txt`
+swiss insert => 73ms
+swiss search => 57ms
+std insert => 76ms
+std search => 69ms
+abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
+    Finished release [optimized] target(s) in 0.01s
+     Running `target/release/hash_test /home/abby/numbers.txt`
+swiss insert => 71ms
+swiss search => 56ms
+std insert => 71ms
+std search => 55ms
+abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
+    Finished release [optimized] target(s) in 0.02s
+     Running `target/release/hash_test /home/abby/numbers.txt`
+swiss insert => 73ms
+swiss search => 78ms
+std insert => 72ms
+std search => 61ms
+abby@Serenity ~/hash_test>
+```
+
+Conclusion:
+
+Why would the swisstable implementation be slower than the STL one? Possibly because `meta` and `slot` are stored separately in this implementation, roughly like
+
+```
+|       meta        |                  slot                  |
++-------------------+----------------------------------------+
+```
+
+so a lookup has to jump back and forth between `meta` and `slot`, which is unfriendly to the cache
+
+If `meta` and `slot` were laid out to align with L1 cache lines, the results should be somewhat better
+