How Lists and Tuples work

List and Tuples

Lists and tuples are a class of data structures called arrays. Lists are dynamic arrays that let us modify and resize the data we are storing,whie tuples are static arrays whose contents are fixed and immutable.

System memory in the computer can be thought of as a series of numbered buckets,each capable of holding a number.Python stores data in these buckets by reference i.e. the number itself points to the data we store.

When we create arrays,we first allocate a block of system memory which involves going to the system kernel and requesting the use of N consecutive buckets.For example if we want to allocate an array of size six,this is how it will be represented.

image info

In Python, lists also store how large they are, so of the six allocated blocks, only five are usable—the zeroth element is the length.

If we wanted to retrieve the zeroth element from the array,we would simply go to the first bucket in the sequence M and read the value inside it.On the other hand if we needed the fourth element in the array we would go to the buckets at the position M + 4.In general if we want to retrieve the ith element, we go to the bucket M + i.

If our data is stored in consecutive buckets, and having knowledge of the ordering of the data, we can locate the data by knowing which bucket to look into in one step i.e O(1).

%%timeit l = list(range(10))
l[5]
41.9 ns ± 1.18 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit l = list(range(1000))
l[5]
40.3 ns ± 0.459 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

What if we are to do a search of a particular element on an array where we do not know it’s order, than we must do a search and the most basic search is linear search which has a worst case performance of O(n).In order to know that the element we are searching for isn’t in the array,we must check against each and every other element.

Python’s list.index() uses this algorithm

def linear_search(elem,array):
    for i,item in enumerate(array):
        if item == elem:
            return i

    return -1

The performance of the search can be improved if the data is sorted either way. Python list have a built-in sorting algorithm that uses Tim sort which can sort a list in O(n) in the best case and O(log n) in the worst case.

Timsort is a hybrid stable sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data.The algorithm finds subsequences of the data that are already ordered (runs) and uses them to sort the remainder more efficiently. This is done by merging runs until certain criteria are fulfilled. Wikipedia

Once we have the list sorted,we can use binary search to find the desired element.

def binarySearch(elem,arr):
    min,max = 0,len(arr)
    while True:
        if min > max:
            return -1
        midpoint = (min + max)//2
        if arr[midpoint] > elem:
            max = midpoint
        elif arr[midpoint] < elem:
            min = midpoint + 1
        else:
            return midpoint 

The bisect module in Python simplifies much of the process by providing easy methods to add elements into a list while maintaining it’s sorting. Example: finding the close values in a list

import bisect
import random

def closestMatch(arr,value):
    #  bisect_left returns the closest value in the array that is greater than the value
    i = bisect.bisect_left(arr,value)
    if i == len(arr):
        return i - 1
    elif arr[i] == value:
        return i
    elif i > 0:
        j = i -1
        # Since i is larger than the value we passed 
        if arr[i] - value > value - arr[j]:
            return j
    return i

Generating an ordered list using the bisect module

num = list()
for i in range(10):
    new = random.randint(0,1000)
    bisect.insort(num,new)

Finding the closest match

closest = closestMatch(num,500)
print(f'closest number is {num[closest]}')

Differences between Lists and Tuples

The following are the main differences between a list and a tuple:

  1. Lists are dynamic arrays, they are mutable and allow for resizing.
  2. Tuples are static arrays, they are immutable and the data within them cannot be changed.
  3. Tuples are cached by the Python runtime,that means whenever we want to use one, there is no need to talk to the kernel to reserve memory. That also means tuples are lightwight data structures.

List as Dynamic arrays

Once a list is created,we can change it’s contents freely:

num = [2,4,6,8,10]
num[2] = 2 * num[0]

We can also append to a list:

num.append(12)

This is possible because:

  1. Lists supports a resize operation that increases the capacity of the array.
  2. When a list of size N is first appended to, Python must create a new list that is big enough to hold the orignal N items in addition to the extra one.
  3. However, instead of allocating N + 1 items, M items are actually allocated, where M > N, in order to provide extra room for future appends.

Tuples as Static arrays

Tuples are fixed and immutable that means we cannot do an operation like this:

t = (2,4,6,8)
t[0] = 5 # This is not correct

Tuples do not support resizing, we can concatenate two tuples together and form a new tuple:

t1 = (2,4,6,8)
t2 = (3,5,7,9)
t1 + t2
(2,4,6,8,3,5,7,9)

A benefit of the static nature of Tuples is that, Python performs resource caching in the background. This means that when a new tuple of that size is needed in the future, we don’t need to communicate with the operating system to find a region in memory to put the data into, since we have a reserve of free memory already.