Python list search speed

Searching for data stored in different data structures is a crucial part of pretty much every single application.

There are many different algorithms available to utilize when searching, and each have different implementations and rely on different data structures to get the job done.

Being able to choose a specific algorithm for a given task is a key skill for developers and can mean the difference between a fast, reliable and stable application and an application that crumbles from a simple request.

Membership Operators

Algorithms develop and become optimized over time as a result of constant evolution and the need to find the most efficient solutions for underlying problems in different domains.

One of the most common problems in the domain of Computer Science is searching through a collection and determining whether a given object is present in the collection or not.

Almost every programming language has its own implementation of a basic search algorithm, usually as a function which returns a Boolean value of True or False when an item is found in a given collection of items.

In Python, the easiest way to search for an object is to use Membership Operators - named that way because they allow us to determine whether a given object is a member in a collection.

These operators can be used with any iterable data structure in Python, including Strings, Lists, and Tuples.

  • in - Returns True if the given element is a part of the structure.
  • not in - Returns True if the given element is not a part of the structure.
'apple' in ['orange', 'apple', 'grape'] True 't' in 'stackabuse' True 'q' in 'stackabuse' False 'q' not in 'stackabuse' True

Membership operators suffice when all we need to do is find whether a substring exists within a given string, or determine whether two Strings, Lists, or Tuples intersect in terms of the objects they hold.

In most cases we need the position of the item in the sequence, in addition to determining whether or not it exists; membership operators do not meet this requirement.

There are many search algorithms that don't depend on built-in operators and can be used to search for values faster and/or more efficiently. In addition, they can yield more information, such as the position of the element in the collection, rather than just being able to determine its existence.

Linear Search

Linear search is one of the simplest searching algorithms, and the easiest to understand. We can think of it as a ramped-up version of our own implementation of Python's in operator.

The algorithm consists of iterating over an array and returning the index of the first occurrence of an item once it is found:

def LinearSearch[lys, element]: for i in range [len[lys]]: if lys[i] == element: return i return -1

So if we use the function to compute:

print[LinearSearch[[1,2,3,4,5,2,1], 2]]

Upon executing the code, we're greeted with:

1

This is the index of the first occurrence of the item we are searching for - keeping in mind that Python indexes are 0-based.

The time complexity of linear search is O[n], meaning that the time taken to execute increases with the number of items in our input list lys.

Linear search is not often used in practice, because the same efficiency can be achieved by using inbuilt methods or existing operators, and it is not as fast or efficient as other search algorithms.

Linear search is a good fit for when we need to find the first occurrence of an item in an unsorted collection because unlike most other search algorithms, it does not require that a collection be sorted before searching begins.

Binary Search

Binary search follows a divide and conquer methodology. It is faster than linear search but requires that the array be sorted before the algorithm is executed.

Assuming that we're searching for a value val in a sorted array, the algorithm compares val to the value of the middle element of the array, which we'll call mid.

  • If mid is the element we are looking for [best case], we return its index.
  • If not, we identify which side of mid val is more likely to be on based on whether val is smaller or greater than mid, and discard the other side of the array.
  • We then recursively or iteratively follow the same steps, choosing a new value for mid, comparing it with val and discarding half of the possible matches in each iteration of the algorithm.

The binary search algorithm can be written either recursively or iteratively. Recursion is generally slower in Python because it requires the allocation of new stack frames.

Since a good search algorithm should be as fast and accurate as possible, let's consider the iterative implementation of binary search:

def BinarySearch[lys, val]: first = 0 last = len[lys]-1 index = -1 while [first 1]: i = min[index + fibM_minus_2, [len[lys]-1]] if [lys[i] < val]: fibM = fibM_minus_1 fibM_minus_1 = fibM_minus_2 fibM_minus_2 = fibM - fibM_minus_1 index = i elif [lys[i] > val]: fibM = fibM_minus_2 fibM_minus_1 = fibM_minus_1 - fibM_minus_2 fibM_minus_2 = fibM - fibM_minus_1 else : return i if[fibM_minus_1 and index < [len[lys]-1] and lys[index+1] == val]: return index+1; return -1

If we use the FibonacciSearch function to compute:

print[FibonacciSearch[[1,2,3,4,5,6,7,8,9,10,11], 6]]

Let's take a look at the step-by-step process of this search:

  • Determining the smallest Fibonacci number greater than or equal to the length of the list as fibM; in this case, the smallest Fibonacci number meeting our requirements is 13.
  • The values would be assigned as:
    • fibM = 13
    • fibM_minus_1 = 8
    • fibM_minus_2 = 5
    • index = -1
  • Next, we check the element lys[4] where 4 is the minimum of -1+5 . Since the value of lys[4] is 5, which is smaller than the value we are searching for, we move the Fibonacci numbers one step down in the sequence, making the values:
    • fibM = 8
    • fibM_minus_1 = 5
    • fibM_minus_2 = 3
    • index = 4
  • Next, we check the element lys[7] where 7 is the minimum of 4+3. Since the value of lys[7] is 8, which is greater than the value we are searching for, we move the Fibonacci numbers two steps down in the sequence.
    • fibM = 3
    • fibM_minus_1 = 2
    • fibM_minus_2 = 1
    • index = 4
  • Now we check the element lys[5] where 5 is the minimum of 4+1 . The value of lys[5] is 6, which is the value we are searching for!

The result, as expected is:

5

The time complexity for Fibonacci search is O[log n]; the same as binary search. This means the algorithm is faster than both linear search and jump search in most cases.

Fibonacci search can be used when we have a very large number of elements to search through, and we want to reduce the inefficiency associated with using an algorithm which relies on the division operator.

An additional advantage of using Fibonacci search is that it can accommodate input arrays that are too large to be held in CPU cache or RAM, because it searches through elements in increasing step sizes, and not in a fixed size.

Exponential Search

Exponential search is another search algorithm that can be implemented quite simply in Python, compared to jump search and Fibonacci search which are both a bit complex. It is also known by the names galloping search, doubling search and Struzik search.

Exponential search depends on binary search to perform the final comparison of values. The algorithm works by:

  • Determining the range where the element we're looking for is likely to be
  • Using binary search for the range to find the exact index of the item

The Python implementation of the exponential search algorithm is:

def ExponentialSearch[lys, val]: if lys[0] == val: return 0 index = 1 while index < len[lys] and lys[index]

Chủ Đề