hash_test/swisstable -- a joke.md
2023-10-04 10:38:36 +08:00


## swisstable -- a joke
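`swisstable` is the hash table implementation behind `flat_hash_set/map` in `abseil-cpp`. It uses quadratic probing and is claimed to outperform the STL containers. Here is how `swisstable` works.
The table is built around the concept of a `group`: one contiguous block of memory is divided into a number of `group`s, each holding 16 elements. Every element has two parts, a `meta` byte and a `slot`. The `meta` byte stores 7 bits of the 64-bit `hash` value (called h2), and the `slot` stores the Key. The remaining 57 bits of the `hash` (called h1) are used to index into the table. SSE2 compares 128 bits of ctrl values at once, which quickly yields the offset of the Key inside its `group`; this is also why 16 `meta` bytes form one group (128 / 8 = 16). `meta` has three special values, `-1`, `-2` and `-128`, which stand for sentinel, invalid (deleted) and empty respectively. Initially every `meta` byte is set to `empty`.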
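To make the h1/h2 split and the 16-way comparison concrete, here is a minimal sketch. It assumes h2 is the low 7 bits of the hash and h1 the remaining 57 bits; the names `kSentinel`, `kDeleted`, `kEmpty` and `match_h2` are illustrative and are not taken from the implementation benchmarked here.

```c++
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Special meta values named in the text (all negative).
constexpr int8_t kSentinel = -1;
constexpr int8_t kDeleted  = -2;
constexpr int8_t kEmpty    = -128;

// Assumption: h2 is the low 7 bits, h1 is the remaining 57 bits.
inline uint64_t H1(uint64_t hash) { return hash >> 7; }
inline int8_t   H2(uint64_t hash) { return static_cast<int8_t>(hash & 0x7F); }

// Compare the 16 meta bytes of one group against h2 in a single SSE2 step.
// Bit i of the result is set when ctrl[i] == h2.
inline uint32_t match_h2(const int8_t* ctrl, int8_t h2) {
    __m128i group   = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctrl));
    __m128i pattern = _mm_set1_epi8(h2);
    __m128i equal   = _mm_cmpeq_epi8(group, pattern);
    return static_cast<uint32_t>(_mm_movemask_epi8(equal));
}
```

Because h2 always lies in 0..127 while the three special values are negative, a stored h2 can never be mistaken for sentinel, deleted or empty. The same comparison with `kEmpty` as the pattern gives the bitmask of empty entries in the group.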
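Lookup proceeds as follows:
1. Compute the hash and split it into h1 and h2
2. Use h1 to find the starting `group`
3. Match h2 against the `meta` bytes of this `group` (using SSE2)
4. If a `meta` byte matches and the Key in the corresponding `slot` is equal, the Key is found at that offset
5. If the `group` contains an `empty` entry, report that the Key does not exist; otherwise move on to the next `group` and repeat from step 3
The code looks like this: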
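```c++
auto pattern = _mm_set1_epi8(H2(hash));    // 1: broadcast h2 into all 16 byte lanes
prober seq { H1(hash), groups_ - 1 };      // 1: h1 seeds the probe sequence
matcher m { 0 };
while (true) {
    group g { ctrl_ + seq.offset() };      // 2: load the meta bytes of the current group
    m = g.match(pattern);                  // 3: SSE2 compare of all 16 meta bytes with h2
    while (m) {
        offset = seq.offset(m.index());
        assert(offset < cap_);
        if (likely(slot_[offset] == key))  // 4: h2 matched, confirm with the full key
            return true;
        ++m;                               // try the next candidate in this group
    }
    if (likely(g.match_empty()))           // 5: an empty entry ends the probe
        return false;
    seq.next();                            // otherwise continue with the next group
}
```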
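The `prober` used above is not shown, so the following is only a sketch of what such a type could look like, assuming the number of groups is a power of two and that `offset()` is measured in `meta` bytes. The name `probe_seq` and its body are hypothetical. It steps by 1, 2, 3, ... groups (triangular numbers), which is one common way to do quadratic probing.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kGroupWidth = 16;  // meta bytes per group

// Hypothetical prober: the interface mirrors the lookup code above,
// the implementation is an assumption.
struct probe_seq {
    std::size_t mask;      // number of groups - 1 (group count is a power of two)
    std::size_t group;     // index of the group currently being probed
    std::size_t step = 0;  // grows by one on every call to next()

    probe_seq(std::uint64_t h1, std::size_t m)
        : mask(m), group(static_cast<std::size_t>(h1) & m) {}

    // Offset of the first meta byte of the current group.
    std::size_t offset() const { return group * kGroupWidth; }
    // Offset of the i-th entry inside the current group.
    std::size_t offset(std::size_t i) const { return group * kGroupWidth + i; }
    // Advance by 1, 2, 3, ... groups, wrapping around with the mask.
    void next() { ++step; group = (group + step) & mask; }
};
```

With a power-of-two group count, this triangular stepping visits every group exactly once before it repeats, so a lookup that keeps calling `next()` eventually inspects the whole table.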
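`swisstable` is advertised as having faster lookups than the STL, but neither the official `absl::flat_hash_set` nor my own implementation bears that out. In the runs below, `flat_hash_set` lookups take about 21 to 23 ms against 4 to 5 ms for `unordered_set`, and my own implementation (`swiss`) is only marginally faster than the STL (about 4.0 to 4.1 ms against 4.3 to 4.8 ms). Insertion is a different story: both are several times faster than `unordered_set`.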
```shell
abby@Serenity ~/s/build> ./bench2
----------- insert ------------
unordered_set => 183.605811ms
flat_hash_set => 41.932002ms
----------- search ------------
unordered_set => 4.378100ms
flat_hash_set => 22.928702ms
unordered_set cap 1056323 size 787268 load_factor 0.745291
flat_hash_set cap 1048575 size 787268 load_factor 0.750798
abby@Serenity ~/s/build> ./bench2
----------- insert ------------
unordered_set => 182.890869ms
flat_hash_set => 39.940193ms
----------- search ------------
unordered_set => 4.542800ms
flat_hash_set => 21.498796ms
unordered_set cap 1056323 size 786505 load_factor 0.744569
flat_hash_set cap 1048575 size 786505 load_factor 0.750070
abby@Serenity ~/s/build> ./bench2
----------- insert ------------
unordered_set => 183.804469ms
flat_hash_set => 38.301994ms
----------- search ------------
unordered_set => 5.001399ms
flat_hash_set => 22.470496ms
unordered_set cap 1056323 size 787110 load_factor 0.745141
flat_hash_set cap 1048575 size 787110 load_factor 0.750647
abby@Serenity ~/s/build> ./bench
----------- insert ------------
unordered_set => 187.271974ms
swiss => 22.850596ms
----------- search ------------
unordered_set => 4.307000ms
swiss => 4.016499ms
unordered_set cap 1056323 size 786815 load_factor 0.744862
swiss cap 1048576 size 786815 load_factor 0.750365
abby@Serenity ~/s/build> ./bench
----------- insert ------------
unordered_set => 197.831773ms
swiss => 20.787797ms
----------- search ------------
unordered_set => 4.811299ms
swiss => 4.107399ms
unordered_set cap 1056323 size 787381 load_factor 0.745398
swiss cap 1048576 size 787381 load_factor 0.750905
```
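Rust is said to have a swisstable implementation as well (the `hashbrown` crate in the build log below). Note that since Rust 1.36 the standard library's `HashMap` is itself built on `hashbrown`, so the two are expected to behave similarly. The measured results are: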
```shell
abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
Compiling libc v0.2.148
Compiling gcc v0.3.55
Compiling autocfg v1.1.0
Compiling version_check v0.9.4
Compiling ahash v0.8.3
Compiling num-traits v0.2.16
Compiling cfg-if v1.0.0
Compiling once_cell v1.18.0
Compiling fasthash-sys v0.3.2
Compiling cfg-if v0.1.10
Compiling rand v0.4.6
Compiling seahash v3.0.7
Compiling allocator-api2 v0.2.16
Compiling xoroshiro128 v0.3.0
Compiling hashbrown v0.14.1
Compiling fasthash v0.4.0
Compiling hash_test v0.1.0 (/home/abby/hash_test)
Finished release [optimized] target(s) in 12.99s
Running `target/release/hash_test /home/abby/numbers.txt`
swiss insert => 73ms
swiss search => 57ms
std insert => 76ms
std search => 69ms
abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
Finished release [optimized] target(s) in 0.01s
Running `target/release/hash_test /home/abby/numbers.txt`
swiss insert => 71ms
swiss search => 56ms
std insert => 71ms
std search => 55ms
abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
Finished release [optimized] target(s) in 0.02s
Running `target/release/hash_test /home/abby/numbers.txt`
swiss insert => 73ms
swiss search => 78ms
std insert => 72ms
std search => 61ms
abby@Serenity ~/hash_test>
```
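Conclusion
Why would a swisstable implementation be slower than the STL? Possibly because in these implementations `meta` and `slot` are stored separately, roughly like this: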
```
| meta | slot |
+-------------------+----------------------------------------+
```
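As a result, a lookup has to jump back and forth between `meta` and `slot`, which is not cache friendly.
If `meta` and `slot` were laid out together and aligned to L1 cache lines, the results should be somewhat better.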
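One way to picture that idea is to keep each group's `meta` bytes and slots in a single cache-line-aligned block. The sketch below is untested as a performance fix; the name `packed_group` and the 64-byte line size are assumptions for illustration only.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine  = 64;  // assumed L1 cache line size
constexpr std::size_t kGroupWidth = 16;  // entries per group

// Hypothetical layout: the 16 meta bytes sit directly in front of the
// 16 slots of the same group, and every group starts on a cache line.
template <typename Key>
struct alignas(kCacheLine) packed_group {
    std::int8_t meta[kGroupWidth];
    Key         slot[kGroupWidth];
};

static_assert(alignof(packed_group<std::uint64_t>) == kCacheLine,
              "each group starts on a cache-line boundary");
static_assert(sizeof(packed_group<std::uint64_t>) % kCacheLine == 0,
              "groups tile the table without straddling a boundary");
```

For 8-byte keys the first cache line of a group then holds the 16 `meta` bytes plus the first 6 slots, so a hit on one of those slots needs no additional line. The remaining slots still live in the following two lines, so whether this layout actually wins would have to be measured.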