hash_test/swisstable -- a joke.md at 9c782b4391bbaece05de8732224d02c32e5be870

2023-10-04 10:38:36 +08:00


swisstable -- a joke

swisstable is the hash table implementation behind flat_hash_set/flat_hash_map in abseil-cpp. It uses quadratic probing and is claimed to outperform the STL. Here is how swisstable works.

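The table is built around the notion of a group: a contiguous block of memory is divided into groups of 16 elements, and each element has two parts, meta and slot. The meta byte stores 7 bits of the 64-bit hash (called h2). The slot stores the Key, which is indexed by the other 57 bits of the hash (called h1). SSE2 compares 128 bits of ctrl (meta) values at once to quickly find a Key's offset within its group, which is also why 16 metas form a group (128 / 8 = 16). Three meta values are special: -1, -2 and -128 stand for sentinel, invalid (deleted) and empty respectively. All metas are initialized to empty.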
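To make the h1/h2 split and the 16-wide comparison concrete, here is a minimal sketch. The names `H1`, `H2` and `match_group` are illustrative, and the choice of the low 7 bits for h2 is an assumption (the text above only says "7 bits"). The scalar fallback exists only so the sketch compiles on targets without SSE2.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Special meta values as listed above. All are negative, so they can
// never collide with a 7-bit h2 value (0..127).
constexpr std::int8_t kEmpty    = -128;
constexpr std::int8_t kDeleted  = -2;
constexpr std::int8_t kSentinel = -1;

// Assumption: h2 is the low 7 bits, h1 is the remaining 57 bits.
inline std::uint64_t H1(std::uint64_t hash) { return hash >> 7; }
inline std::int8_t H2(std::uint64_t hash) {
    return static_cast<std::int8_t>(hash & 0x7F);
}

// Compare 16 meta bytes against `tag` in one step.
// Bit i of the result is set when ctrl[i] == tag.
inline std::uint32_t match_group(const std::int8_t* ctrl, std::int8_t tag) {
#if defined(__SSE2__)
    __m128i group   = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctrl));
    __m128i pattern = _mm_set1_epi8(tag);
    return static_cast<std::uint32_t>(
        _mm_movemask_epi8(_mm_cmpeq_epi8(group, pattern)));
#else
    std::uint32_t mask = 0;
    for (int i = 0; i < 16; ++i)
        if (ctrl[i] == tag) mask |= 1u << i;
    return mask;
#endif
}
```

Calling `match_group(ctrl, kEmpty)` with the same helper answers the "does this group contain an empty meta" question used to stop a probe.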

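Lookup procedure:

Compute the hash and derive h1 and h2
Use h1 to find the starting group
Match h2 against the metas of that group (using SSE2)
If a meta matches, compute the slot's offset from the start of the table and compare the Key stored there; return found if it is equal
If the group contains an empty meta, return not found; otherwise move on to the next group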

The code is as follows:

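auto pattern = _mm_set1_epi8(H2(hash)); // 1: broadcast h2 into all 16 byte lanes
prober seq { H1(hash), groups_ - 1 }; // 1: h1 seeds the probe sequence
matcher m { 0 };

while (true) {
    group g { ctrl_ + seq.offset() }; // 2: the 16 metas of the current group
    m = g.match(pattern); // 3: compare all 16 metas with h2 at once

    while (m) {
        offset = seq.offset(m.index());
        assert(offset < cap_);
        if (likely(slot_[offset] == key)) // 4: h2 matched, confirm with the full key
            return true;
        ++m; // move to the next matching position in this group
    }
    if (likely(g.match_empty())) // 5: an empty meta ends the probe, the key is absent
        return false;
    seq.next(); // no empty meta here, probe the next group
}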
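The `prober` type used in the loop above is not shown. As a rough illustration of quadratic probing over groups, here is a hypothetical `Prober` with the same three operations (`offset()`, `offset(i)`, `next()`). It is a sketch under the assumption that probing is group-aligned and the group count is a power of two, not the author's or Abseil's actual code. The step grows by one on every call, so the visited groups follow the triangular numbers, which reach every group exactly once before repeating when the group count is a power of two.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the `prober` used in the lookup loop.
struct Prober {
    static constexpr std::size_t kWidth = 16;  // metas per group

    std::size_t mask;      // group count - 1 (group count is a power of two)
    std::size_t group;     // index of the group currently being probed
    std::size_t step = 0;  // grows by one per probe: 1, 2, 3, ...

    Prober(std::size_t h1, std::size_t group_mask)
        : mask(group_mask), group(h1 & group_mask) {
        assert(((mask + 1) & mask) == 0 && "group count must be a power of two");
    }

    // Offset of the first meta/slot of the current group.
    std::size_t offset() const { return group * kWidth; }

    // Offset of the i-th meta/slot inside the current group.
    std::size_t offset(std::size_t i) const { return group * kWidth + i; }

    // Advance by an increasing step: h, h+1, h+3, h+6, ... (mod group count).
    void next() {
        ++step;
        group = (group + step) & mask;
    }
};
```

With 8 groups and a starting group of 5, the sequence is 5, 6, 0, 3, 7, 4, 2, 1, so a lookup that never meets an empty meta still terminates after scanning every group once.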

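swisstable is advertised as having faster lookups than the STL, but in my tests neither the official absl::flat_hash_set nor my own implementation beats the STL on lookups (at best they are occasionally a little faster):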

abby@Serenity ~/s/build> ./bench2
----------- insert ------------
  unordered_set => 183.605811ms
  flat_hash_set => 41.932002ms
----------- search ------------
  unordered_set => 4.378100ms
  flat_hash_set => 22.928702ms
unordered_set cap 1056323 size 787268 load_factor 0.745291
flat_hash_set cap 1048575 size 787268 load_factor 0.750798
abby@Serenity ~/s/build> ./bench2
----------- insert ------------
  unordered_set => 182.890869ms
  flat_hash_set => 39.940193ms
----------- search ------------
  unordered_set => 4.542800ms
  flat_hash_set => 21.498796ms
unordered_set cap 1056323 size 786505 load_factor 0.744569
flat_hash_set cap 1048575 size 786505 load_factor 0.750070
abby@Serenity ~/s/build> ./bench2
----------- insert ------------
  unordered_set => 183.804469ms
  flat_hash_set => 38.301994ms
----------- search ------------
  unordered_set => 5.001399ms
  flat_hash_set => 22.470496ms
unordered_set cap 1056323 size 787110 load_factor 0.745141
flat_hash_set cap 1048575 size 787110 load_factor 0.750647
abby@Serenity ~/s/build> ./bench
----------- insert ------------
  unordered_set => 187.271974ms
          swiss => 22.850596ms
----------- search ------------
  unordered_set => 4.307000ms
          swiss => 4.016499ms
unordered_set cap 1056323 size 786815 load_factor 0.744862
swiss         cap 1048576 size 786815 load_factor 0.750365
abby@Serenity ~/s/build> ./bench
----------- insert ------------
  unordered_set => 197.831773ms
          swiss => 20.787797ms
----------- search ------------
  unordered_set => 4.811299ms
          swiss => 4.107399ms
unordered_set cap 1056323 size 787381 load_factor 0.745398
swiss         cap 1048576 size 787381 load_factor 0.750905

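Rust reportedly also has a swisstable implementation (the hashbrown crate in the build log below). The measured results are as follows: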

abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
   Compiling libc v0.2.148
   Compiling gcc v0.3.55
   Compiling autocfg v1.1.0
   Compiling version_check v0.9.4
   Compiling ahash v0.8.3
   Compiling num-traits v0.2.16
   Compiling cfg-if v1.0.0
   Compiling once_cell v1.18.0
   Compiling fasthash-sys v0.3.2
   Compiling cfg-if v0.1.10
   Compiling rand v0.4.6
   Compiling seahash v3.0.7
   Compiling allocator-api2 v0.2.16
   Compiling xoroshiro128 v0.3.0
   Compiling hashbrown v0.14.1
   Compiling fasthash v0.4.0
   Compiling hash_test v0.1.0 (/home/abby/hash_test)
    Finished release [optimized] target(s) in 12.99s
     Running `target/release/hash_test /home/abby/numbers.txt`
swiss insert => 73ms
swiss search => 57ms
std  insert => 76ms
std  search => 69ms
abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
    Finished release [optimized] target(s) in 0.01s
     Running `target/release/hash_test /home/abby/numbers.txt`
swiss insert => 71ms
swiss search => 56ms
std  insert => 71ms
std  search => 55ms
abby@Serenity ~/hash_test> cargo run --release ~/numbers.txt
    Finished release [optimized] target(s) in 0.02s
     Running `target/release/hash_test /home/abby/numbers.txt`
swiss insert => 73ms
swiss search => 78ms
std  insert => 72ms
std  search => 61ms
abby@Serenity ~/hash_test>

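Conclusion:

Why is the swisstable implementation slower than the STL? Possibly because meta and slot are stored separately in this implementation, something like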

| meta              |  slot                                  |
+-------------------+----------------------------------------+

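As a result, a lookup has to jump back and forth between meta and slot, which is not cache friendly.

If meta and slot were laid out to align with the L1 cache, the results should be better.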
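One way to picture that idea is a group whose metas and slots share a single cache line. The sketch below is only an illustration under loud assumptions: 64-byte cache lines, 4-byte keys, and a hypothetical `InterleavedGroup` type that holds 12 slots per group instead of 16 so that everything fits. It has not been benchmarked and is not what Abseil does.

```cpp
#include <cstddef>
#include <cstdint>

// Assumptions: 64-byte cache lines and 4-byte keys.
constexpr std::size_t kCacheLine     = 64;
constexpr std::size_t kSlotsPerGroup = 12;

// Hypothetical layout: the metas and the slots they describe live in
// the same cache line, so a meta hit does not touch a second line.
struct alignas(kCacheLine) InterleavedGroup {
    std::int8_t   ctrl[16];              // 12 live metas, 4 unused (kept so one 128-bit load still works)
    std::uint32_t slot[kSlotsPerGroup];  // keys stored right next to their metas
};

static_assert(sizeof(InterleavedGroup) == kCacheLine,
              "one group must occupy exactly one cache line");
static_assert(alignof(InterleavedGroup) == kCacheLine,
              "groups must start on a cache-line boundary");
```

The trade-off is fewer candidates per SIMD comparison (12 instead of 16) in exchange for keeping the key comparison in the line that was already loaded. Larger keys would need a different split.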
