fca_unordered

Proof of concept of closed-addressing, O(1)-erase, standards-compliant unordered associative containers.

template<
  typename T,typename Hash=boost::hash<T>,typename Pred=std::equal_to<T>,
  typename Allocator=std::allocator<T>,
  typename SizePolicy=prime_size,typename BucketArrayPolicy=grouped_buckets
>
class fca_unordered_set;

template<
  typename Key,typename Value,
  typename Hash=boost::hash<Key>,typename Pred=std::equal_to<Key>,
  typename Allocator=std::allocator</* equivalent to std::pair<const Key,Value> */>,
  typename SizePolicy=prime_size,typename BucketArrayPolicy=grouped_buckets
>
class fca_unordered_map;

SizePolicy

Specifies how the bucket array grows and the algorithm used for determining the position
of an element with hash value h in the array.

  • prime_size: Sizes are prime numbers in an approximately doubling sequence. position(h) = h % size,
    modulo operations are sped up by keeping a function pointer table with fpt[i](h) == h % size(i),
    where i ranges over the size sequence.
  • prime_switch_size: Same as before, but instead of a table a switch statement over i is used.
  • prime_fmod_size: position(h) = fastmod(high32bits(h) + low32bits(h), size).
    fastmod is a fast implementation of modulo for 32-bit numbers
    by Daniel Lemire.
  • prime_frng_size: position(h) = fastrange(h, size). Daniel Lemire’s fastrange
    maps a uniform distribution of values in the range [0, size). This policy is ill behaved for
    low quality hash functions because it ignores the low bits of the hash value.
  • prime_frng_fib_size: Fixes pathological situations of prime_frng_size by doing
    positon(h) = fastrange(h * F, size), where F is the
    Fibonacci hashing constant.
  • pow2_size: Sizes are consecutive powers of two. position(h) returns the higher bits of the
    hash value, which, as it happens with prime_frng_size, works poorly for low quality hash functions.
  • pow2_fib_size: h is Fibonacci hashed before calculating the position.

BucketArrayPolicy

  • simple_buckets: The bucket array is a plain vector of node pointers without additional metadata.
    The resulting container deviates from the C++ standard requirements for unordered associative
    containers in two aspects:

    • Iterator increment is not constant but gets slower as the number of empty buckets grow;
      see N2023 for details.
    • Because of the former, erase(iterator) returns void instead of an iterator to the next
      element.
  • grouped_buckets: The resulting container is fully standards-compliant, including constant
    iteration and average constant erasure(iterator). Besides the usual bucket array, a vector
    of bucket group metadata is kept. Buckets are logically grouped in 32/64 (depending on the
    architecture) consecutive buckets: the associated bucket group metadata consists of a pointer
    to the first bucket of the group, a bitmask signalling which buckets are occupied,
    and prev and next pointers to link non-empty bucket groups in a bidirectional list.
    Going from a given bucket to the next occupied one is implemented as follows:

    • Use bit counting operations to
      determine, in constant time, the following occupied bucket within the group.
    • If there are no further occupied buckets in the group, go the next group (which is
      guaranteed to be non-empty).
    The memory overhead added by bucket groups is 4 bits per bucket.

template<
  typename T,typename Hash=boost::hash<T>,typename Pred=std::equal_to<T>,
  typename Allocator=std::allocator<T>
>
class fca_simple_unordered_set;

template<
  typename Key,typename Value,
  typename Hash=boost::hash<Key>,typename Pred=std::equal_to<Key>,
  typename Allocator=std::allocator</* equivalent to std::pair<const Key,Value> */>
>
class fca_simple_unordered_map;

Abandoned experiment where individual occupied buckets where linked in a bidirectional
list. Outperformed by fca_unordered_[set|map] with grouped_buckets.

GitHub

View Github