Exploring Swift Dictionary's Implementation


Swift Dictionary :

  • Swift dictionary is made of two generic types: Key (which has to be Hashable) and Value
  • An entry can be made by providing a key and its value
  • A value can be retrieved by providing a key which has been inserted before
  • A entry can be deleted by providing a key
  • Each key will map to one and only one value

There are multiple ways to store these (key, value) records one of which is open addressing with linear probing which is used by swift’s dictionary implementation.

Consider a dictionary of capacity = 8, it can store a max of 7 records of (key, value) because there should be atleast one empty space (called holes) in the dictionary buffer which acts as sentinel for stopping the search during retrivals/insertions.

This will look something like this in memory:

A continuous memory is allocated with

size = capacity * (sizeof(Bitmap) + sizeof(Keys) + sizeof(Values))

This can be rearranged logically to look something like this :

Each column is called a bucket which holds three things : a bitmap value, a key and a value

Bitmap value of a bucket tells if the key and value in that bucket is valid and initialized or not. If not, then that bucket is called a hole.

_HashedContainerStorageHeader (struct)

This struct acts as header to the memory buffer which will be created. It contains the capacity and count of the allocated storage where capacity is actual capacity of the allocated memory and count is number of valid elements which are held right now in the buffer.

maxLoadFactorInverse is factor used to grow the buffer when it needs to expand.

_NativeDictionaryStorageImpl<Key, Value> (class)

subclass of ManagedBuffer<_HashedContainerStorageHeader, UInt8>

The main responsibility of this class is to allocate the memory needed for the dictionary and give pointers to bitmap, keys and values array for mutating them for eg:

internal var _values: UnsafeMutablePointer<Value> {
  @warn_unused_result
  get {
    let start = UInt(Builtin.ptrtoint_Word(_keys._rawValue)) &+
      UInt(_capacity) &* UInt(strideof(Key.self))
    let alignment = UInt(alignof(Value))
    let alignMask = alignment &- UInt(1)
    return UnsafeMutablePointer<Value>(
        bitPattern:(start &+ alignMask) & ~alignMask)
  }
}

Since Bitmap, keys and values memory is allocated contiguously, values_pointer will be keys_pointer + capacity * keys_pointer.

internal class func create(capacity: Int) -> StorageImpl {
  let requiredCapacity = bytesForBitMap(capacity) + bytesForKeys(capacity)
      + bytesForValues(capacity)

  let r = super.create(requiredCapacity) { _ in
    return _HashedContainerStorageHeader(capacity: capacity)
  }
  let storage = r as! StorageImpl
  let initializedEntries = _BitMap(
      storage: storage._initializedHashtableEntriesBitMapStorage,
      bitCount: capacity)
  initializedEntries.initializeToZero()
  return storage
}

All the entries in bitmap are initalized to zero when buffer is created i.e all buckets are holes. It is also worth noting that the generic type Key of this class doesn’t need to follow Hashable to use this storage implementation.

// Note: It is intended that Key, Value
// (without : Hashable) is used here - this storage must work
// with non-Hashable types.

_NativeDictionaryStorage<Key : Hashable, Value> (struct)

This struct keeps the above storage implementation in its buffer property and provides method which converts contiguous allocated memory into logical bucket array. Also note that the end of bucket array is connected to start of the bucket array forming a logical ring. This just means that if you’re at end of the bucket array and you call next method, you’ll end up at begining of the array.

To do operation like insert or get an value using its key, we need to figure out which bucket the key belongs to. Since Key is Hashable, there is a hashValue method on it which will return an Int, but this hashValue can be greater than capacity of the dictionary buffer so it must be squeezed to be within bounds of 0 and capacity :

@warn_unused_result
internal func _bucket(k: Key) -> Int {
  return _squeezeHashValue(k.hashValue, 0..<capacity)
}

We can move between the buckets using _next and _prev methods. Capacity is always a number which can be represented in powers of 2. If we’re at end of the bucket array we’ll just cycle back to 0th position by calling _next

internal var _bucketMask: Int {
  // The capacity is not negative, therefore subtracting 1 will not overflow.
  return capacity &- 1
}

@warn_unused_result
internal func _next(bucket: Int) -> Int {
  // Bucket is within 0 and capacity. Therefore adding 1 does not overflow.
  return (bucket &+ 1) & _bucketMask
}

@warn_unused_result
internal func _prev(bucket: Int) -> Int {
  // Bucket is not negative. Therefore subtracting 1 does not overflow.
  return (bucket &- 1) & _bucketMask
}

To add an element, we just need to figure out which bucket it belongs to and then call this method which will insert the key and value in the bucket and also set its bitmap’s entry to true.

@_transparent
internal func initializeKey(k: Key, value v: Value, at i: Int) {
  _sanityCheck(!isInitializedEntry(i))

  (keys + i).initialize(k)
  (values + i).initialize(v)
  initializedEntries[i] = true
  _fixLifetime(self)
}

_find method :

  • _find method is used to figure out which bucket has to be used for a key (insert or retrieval)
  • Needs the starting bucket for a key => _bucket(key)
  • Returns the position where the item is if the key in parameter matches the key in the bucket
  • Or the if item is not found, the position of the first hole encountered is where that key can be inserted
  • The bitmap is used to check if the current bucket is hole or not
  • This is called linear probing
/// Search for a given key starting from the specified bucket.
///
/// If the key is not present, returns the position where it could be
/// inserted.
@warn_unused_result
internal
func _find(key: Key, _ startBucket: Int) -> (pos: Index, found: Bool) {
  var bucket = startBucket

  // The invariant guarantees there's always a hole, so we just loop
  // until we find one
  while true {
    let isHole = !isInitializedEntry(bucket)
    if isHole {
      return (Index(nativeStorage: self, offset: bucket), false)
    }
    if keyAt(bucket) == key {
      return (Index(nativeStorage: self, offset: bucket), true)
    }
    bucket = _next(bucket)
  }
}
  • So basically the squeezeHashValue is the index of that key in the bucket array but there might be collisions, i.e two keys can have same hashValues
  • In this case the find method will find the next best possible bucket with hole so that key can be inserted

This is why the hashValue of Hashable should be extremely good function for good performance.

  • squeezeHashValue does try to create a good hash by applying some transformation operations.
  • Also it looks like to keep hashValues different across executions, execution seed was supposed to mixed with hashValue provided by the Key type but its a constant placeholder value for now
func _squeezeHashValue(hashValue: Int, _ resultRange: Range<UInt>) -> UInt {
  let mixedHashValue = UInt(bitPattern: _mixInt(hashValue))
  let resultCardinality: UInt = resultRange.endIndex - resultRange.startIndex
  if _isPowerOf2(resultCardinality) {
    return mixedHashValue & (resultCardinality - 1)
  }
  return resultRange.startIndex + (mixedHashValue % resultCardinality)
}

func _mixUInt64(value: UInt64) -> UInt64 {
  // Similar to hash_4to8_bytes but using a seed instead of length.
  let seed: UInt64 = _HashingDetail.getExecutionSeed()
  let low: UInt64 = value & 0xffff_ffff
  let high: UInt64 = value >> 32
  return _HashingDetail.hash16Bytes(seed &+ (low << 3), high)
}

static func getExecutionSeed() -> UInt64 {
  // FIXME: This needs to be a per-execution seed. This is just a placeholder
  // implementation.
  let seed: UInt64 = 0xff51afd7ed558ccd
  return _HashingDetail.fixedSeedOverride == 0 ? seed : fixedSeedOverride
}

static func hash16Bytes(low: UInt64, _ high: UInt64) -> UInt64 {
  // Murmur-inspired hashing.
  let mul: UInt64 = 0x9ddfea08eb382d69
  var a: UInt64 = (low ^ high) &* mul
  a ^= (a >> 47)
  var b: UInt64 = (high ^ a) &* mul
  b ^= (b >> 47)
  b = b &* mul
  return b
}
  • _BitMap struct is wrapper which helps in reading and writing to the bitmap

This can be summarized in this image :

_NativeDictionaryStorageOwner (class)

  • This class was created to keep track of correct reference count for Copy on Write feature, since the storage will be referenced by Dictionary and DictionaryIndex both, reference count will be 2 for the storage but if Dictionary refers this class and reference count of this class is tracked, everything should be fine.

      /// This class is an artifact of the COW implementation.  This class only
      /// exists to keep separate retain counts separate for:
      /// - `Dictionary` and `NSDictionary`,
      /// - `DictionaryIndex`.
      ///
      /// This is important because the uniqueness check for COW only cares about
      /// retain counts of the first kind.
    

This is just becoming awkward now :

_VariantDictionaryStorage<Key : Hashable, Value> (enum)

This enum has two cases with storage for Native/Cocoa dictionary as stored properties inside those cases.

case Native(_NativeDictionaryStorageOwner<Key, Value>)
case Cocoa(_CocoaDictionaryStorage)

Major features :

  • This enum based on the storage Native or Cocoa performs correct methods on the storage for operations on the dictionary like insert, fetch, delete etc
  • Expands the storage if it is getting full
  • Update or intialize key-value
  internal mutating func nativeUpdateValue(
    value: Value, forKey key: Key
  ) -> Value? {
    var (i, found) = native._find(key, native._bucket(key))
    
    let minCapacity = found
      ? native.capacity
      : NativeStorage.getMinCapacity(
          native.count + 1,
          native.maxLoadFactorInverse)

    let (_, capacityChanged) = ensureUniqueNativeStorage(minCapacity)
    if capacityChanged {
      i = native._find(key, native._bucket(key)).pos
    }

    let oldValue: Value? = found ? native.valueAt(i.offset) : nil
    if found {
      native.setKey(key, value: value, at: i.offset)
    } else {
      native.initializeKey(key, value: value, at: i.offset)
      native.count += 1
    }

    return oldValue
  }
  • If a Value has to be removed from the bucket array, we might end up creating a hole which might cause side effect of probing to stop earlier than expected. To take care of that the elements are rearranged at their ideal position.
/// - parameter idealBucket: The ideal bucket for the element being deleted.
/// - parameter offset: The offset of the element that will be deleted.
/// Requires an initialized entry at offset.
internal mutating func nativeDeleteImpl(
  nativeStorage: NativeStorage, idealBucket: Int, offset: Int
) {
  _sanityCheck(
      nativeStorage.isInitializedEntry(offset), "expected initialized entry")

  // remove the element
  nativeStorage.destroyEntryAt(offset)
  nativeStorage.count -= 1

  // If we've put a hole in a chain of contiguous elements, some
  // element after the hole may belong where the new hole is.
  var hole = offset

  // Find the first bucket in the contiguous chain
  var start = idealBucket
  while nativeStorage.isInitializedEntry(nativeStorage._prev(start)) {
    start = nativeStorage._prev(start)
  }

  // Find the last bucket in the contiguous chain
  var lastInChain = hole
  var b = nativeStorage._next(lastInChain)
  while nativeStorage.isInitializedEntry(b) {
    lastInChain = b
    b = nativeStorage._next(b)
  }

  // Relocate out-of-place elements in the chain, repeating until
  // none are found.
  while hole != lastInChain {
    // Walk backwards from the end of the chain looking for
    // something out-of-place.
    var b = lastInChain
    while b != hole {
      let idealBucket = nativeStorage._bucket(nativeStorage.keyAt(b))

      // Does this element belong between start and hole?  We need
      // two separate tests depending on whether [start,hole] wraps
      // around the end of the buffer
      let c0 = idealBucket >= start
      let c1 = idealBucket <= hole
      if start <= hole ? (c0 && c1) : (c0 || c1) {
        break // Found it
      }
      b = nativeStorage._prev(b)
    }

    if b == hole { // No out-of-place elements found; we're done adjusting
      break
    }

    // Move the found element into the hole
    nativeStorage.moveInitializeFrom(nativeStorage, at: b, toEntryAt: hole)
    hole = b
  }
}

And finally the dictionary looks like this

(just one single property of the above enum)