How Sets and Dictionaries work

Sets and Dictionaries

Sets and dictionaries are suitable data structures when the data has no intrinsic order and has a unique object that can be used to reference the data.This reference object is called a key and it can be any hashable type. Dictionaries and sets are almost identical,except that sets do not contain values. A set is simply a collection of keys.

A hashable type is one that implements both the _hash_ and either __eq__ or __cmp__ methods.

Example

Let’s say we want to store contact information in a phonebook, we will first look how to do tat using a List:

def find_contact(phonebook,name):
    for n,p in phonebook:
        if n == name:
            return p
    return None

phone_book = [("Rob Pike","222-222-2222"),("John Doe","333-333-3333")]
print(f"Rob Pike's number is {find_contact(phone_book,'Rob Pike')}")

If we use a dictionary, the index will be the name and the value will be the phone number:

phone_book = {
    "Rob Pike": "222-222-2222",
    "John Doe": "333-333-3333"
}
print(f"Rob Pike's number is {phone_book['Rob Pike']}")

For large amounts of records in the phonebook,the difference between the O(1) lookup of the dictionary and the O(n) time for linear search over a list is quite substantial. Supposingly we wanted to answer the question,“How many unique first names are there in the Phone book? The obvious data structure will be sets. If we were to use lists, than we would have had to enforce this property by comparing all names with other names. The example below illustrated the two different implementations:

Using Lists

def unique_names(phone_book):
    uniques = []
    for name,number in phone_book:
        fname,lname = name.split(" ",1)

        for unique in uniques:
            if unique == fname:
                print(fname)
                break
        else:
            
            uniques.append(fname)
    return len(uniques)

Using Sets

def unique_names_sets(phone_book):
    uniques = set()
    for name,number in phone_book:
        fname,lname = name.split(" ",1)
        uniques.add(fname)

    return len(uniques)

Usage:

phoneBook = [("Rob Pike","222-222-2222"),("John Doe","333-333-3333"),
             ("John Defoe","444-444-4444"),("Doe John","555-555-5555")
 ]
print("Number of Unique names from List:",unique_names(phoneBook))
print("Number of Unique names from Set:",unique_names_sets(phoneBook))

For Lists,we must loop over all the names in the Phone book which costs O(n),and than the inner loop which iterates over the unique names,which starts out empty and than grows constantly, thus the total operation cost is O(n2). While for sets, which has the property of guranteeing uniqueness and since the only cost is the single looping over the contacts, hence the cost is O(n)

Dictionary Comprehension

A dictionary comprehension builds a dictionary from any interable:

Let’s say we have a list of pairs that we want to convert to a dictionary:

country_codes = [(
    61,"Australia"),
    (880,"Bangladesh"),
    (55,"Brazil"),
    (45,"Denmark"),
    (679,"Fiji"),
    (33,"France"),
    (49,"Germany"),
    (91,"India")]

codes = {country:code for code,country in country_codes}
print(codes)

Some common methods for dictionaries

  1. d.clear() - Remove all items
  2. *d._contains_(k)* - checking if k is in d
  3. d.copy() - shallow copy
  4. d.fromkeys(k,[initial]) - New mapping from Keys iterable with optional default value
  5. d.items() - get view of key,value pairs
  6. d.get(k,default) - Get item with key k, return default or none if missing
  7. d.keys() - Get all the keys
  8. d.pop(k,default) - Remove and return value at k, if missing return default of None
  9. d.popitem() - Remove and arbitrary value
  10. d.update(v,[**kargs]) - Update d with items from mapping or iterable of key,value pairs
  11. d.values() - Get all the values

Variations of dicts

In the collections module there are some variations of dictionaries that can come in handy:

  1. collections.OrderedDict - Maintains a Key insertion order,allowing iteration in a predictable manner.
  2. collections.ChainMap - Groups multiple dicts to create a single updateable view.The underlying mappings are stored as lists

Example

from collections import ChainMap
call_codes = {'Australia': 61, 'Bangladesh': 880, 'Brazil': 55, 'Denmark': 45, 'Fiji': 679, 'France': 33, 'Germany': 49, 'India': 91}
country_codes = {'Australia':"AU", 'Bangladesh': "BG", 'Brazil': "BR", 'Denmark': "DN", 'Fiji': "FJ", 'France': "FR", 'Germany': "DE", 'India': "IN"}
combined = ChainMap(call_codes,country_codes)
print(combined)

Output

ChainMap({'Australia': 61, 'Bangladesh': 880, 'Brazil': 55, 'Denmark': 45, 'Fiji': 679, 'France': 33, 'Germany': 49, 'India': 91}, {'Australia': 'AU', 'Bangladesh': 'BG', 'Brazil': 'BR', 'Denmark': 'DN', 'Fiji': 'FJ', 'France': 'FR', 'Germany': 'DE', 'India': 'IN'})
  1. collections.Counter - A mapping that holds an integer count for each key.Updating an existing key adds to the count.
from collections import Counter
words = ['Peek','Post','Random','Post','Peek','Element']
count = Counter(words)
print(count)

Output

Counter({'Peek': 2, 'Post': 2, 'Random': 1, 'Element': 1})
  1. creating immutable dictionaries: In the types module, there is a wrapper class called MappingProxyType which, given a mapping returns a mappingproxy instance which is a read only view of the orignal mapping.
from types import MappingProxyType
data = {'Name':"India"}
data_proxy = MappingProxyType(data)
print(data_proxy)
data_proxy['Code'] = 'IN'

Output

TypeError: 'mappingproxy' object does not support item assignment
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_4370/748463367.py in <module>
      3 data_proxy = MappingProxyType(data)
      4 print(data_proxy)
----> 5 data_proxy['Code'] = 'IN'

TypeError: 'mappingproxy' object does not support item assignment

Sets

A set is a collection of unique object.The immutable sibbiling of set is frozenset.Set elements must be hashsble. The set itself is not hashable,but frozenset is and can be passed inside a set.

devices = ['Router','Switch','Router','Firewall','Appliance','Switch']
print(set(devices))
# Converting set to list
print(list(set(devices)))

In addition to providing uniqueness, it support the basic mathematical set operations: Given two sets a and b

  1. a | b returns their unions
  2. a & b computes the intersection
  3. a - b computes the difference
  4. a ^ b Symmetric difference (the complement of intersection)
  5. e in b Element e is a member of b
  6. s <= a s is a subset of a
  7. s < a s is a proper subset of a
  8. s >= b s is a superset of b
  9. s > b s is a proper superset of b

Operations from 5 to 9 return boolean value.

Methods supported in Set

  1. a.add(e) - Add element to a
  2. a.clear() - clear all elements of a
  3. a.copy() - shallow copy of a
  4. a.discard(e) - Remove element e from a
  5. a.pop() - Remove and return an element from a,will raise KeyError if set is empty
  6. a.remove(e) - Remove element e from the set.
  7. a._iter_() - get an iterator
  8. a._len_() - len(a)

Creating a frozenset is straight foreward and can be created as:

fs = frozenset(range(20))
print(fs)

# Output
frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19})