Swift Dictionary :
- Swift dictionary is made of two generic types: Key (which has to be
Hashable) and Value
- An entry can be made by providing a key and its value
- A value can be retrieved by providing a key which has been inserted before
- A entry can be deleted by providing a key
- Each key will map to one and only one value
There are multiple ways to store these (key, value) records one of which is open addressing with linear probing which is used by swift’s dictionary implementation.
Consider a dictionary of capacity = 8, it can store a max of 7 records of (key, value) because there should be atleast one empty space (called holes) in the dictionary buffer which acts as sentinel for stopping the search during retrivals/insertions.
This will look something like this in memory:
A continuous memory is allocated with
size = capacity * (sizeof(Bitmap) + sizeof(Keys) + sizeof(Values))
This can be rearranged logically to look something like this :
Each column is called a bucket which holds three things : a bitmap value, a key and a value
Bitmap value of a bucket tells if the key and value in that bucket is valid and initialized or not. If not, then that bucket is called a hole.
This struct acts as header to the memory buffer which will be created. It contains the capacity and count of the allocated storage where capacity is actual capacity of the allocated memory and count is number of valid elements which are held right now in the buffer.
maxLoadFactorInverse is factor used to grow the buffer when it needs to expand.
_NativeDictionaryStorageImpl<Key, Value> (class)
The main responsibility of this class is to allocate the memory needed for the dictionary and give pointers to bitmap, keys and values array for mutating them for eg:
Since Bitmap, keys and values memory is allocated contiguously, values_pointer will be
keys_pointer + capacity * keys_pointer.
All the entries in bitmap are initalized to zero when buffer is created i.e all buckets are holes. It is also worth noting that the generic type Key of this class doesn’t need to follow Hashable to use this storage implementation.
// Note: It is intended that Key, Value // (without : Hashable) is used here - this storage must work // with non-Hashable types.
_NativeDictionaryStorage<Key : Hashable, Value> (struct)
This struct keeps the above storage implementation in its
buffer property and provides method which converts contiguous allocated memory into logical bucket array. Also note that the end of bucket array is connected to start of the bucket array forming a
logical ring. This just means that if you’re at end of the bucket array and you call next method, you’ll end up at begining of the array.
To do operation like insert or get an value using its key, we need to figure out which bucket the key belongs to. Since
Key is Hashable, there is a
hashValue method on it which will return an
Int, but this
hashValue can be greater than capacity of the dictionary buffer so it must be squeezed to be within bounds of 0 and capacity :
We can move between the buckets using
_prev methods. Capacity is always a number which can be represented in powers of 2.
If we’re at end of the bucket array we’ll just cycle back to 0th position by calling
To add an element, we just need to figure out which bucket it belongs to and then call this method which will insert the key and value in the bucket and also set its bitmap’s entry to true.
_find method :
_findmethod is used to figure out which bucket has to be used for a key (insert or retrieval)
- Needs the starting bucket for a key => _bucket(key)
- Returns the position where the item is if the key in parameter matches the key in the bucket
- Or the if item is not found, the position of the first hole encountered is where that key can be inserted
- The bitmap is used to check if the current bucket is hole or not
- This is called linear probing
- So basically the squeezeHashValue is the index of that key in the bucket array but there might be collisions, i.e two keys can have same hashValues
- In this case the find method will find the next best possible bucket with hole so that key can be inserted
This is why the hashValue of
Hashable should be extremely good function for good performance.
- squeezeHashValue does try to create a good hash by applying some transformation operations.
- Also it looks like to keep hashValues different across executions, execution seed was supposed to mixed with hashValue provided by the Key type but its a constant placeholder value for now
_BitMapstruct is wrapper which helps in reading and writing to the bitmap
This can be summarized in this image :
This class was created to keep track of correct reference count for Copy on Write feature, since the storage will be referenced by Dictionary and DictionaryIndex both, reference count will be 2 for the storage but if Dictionary refers this class and reference count of this class is tracked, everything should be fine.
/// This class is an artifact of the COW implementation. This class only /// exists to keep separate retain counts separate for: /// - `Dictionary` and `NSDictionary`, /// - `DictionaryIndex`. /// /// This is important because the uniqueness check for COW only cares about /// retain counts of the first kind.
This is just becoming awkward now :
_VariantDictionaryStorage<Key : Hashable, Value> (enum)
This enum has two cases with storage for Native/Cocoa dictionary as stored properties inside those cases.
Major features :
- This enum based on the storage Native or Cocoa performs correct methods on the storage for operations on the dictionary like insert, fetch, delete etc
- Expands the storage if it is getting full
- Update or intialize key-value
- If a Value has to be removed from the bucket array, we might end up creating a hole which might cause side effect of probing to stop earlier than expected. To take care of that the elements are rearranged at their ideal position.
And finally the dictionary looks like this
(just one single property of the above enum)