The internal implementation of Python’s list is an array, which is a linear list. To find an element in a list, you can use the list.index() method, which has a time complexity of O(n). For large amounts of data, binary search can be used for optimization. Binary search requires that objects must be ordered. The basic principles are as follows:
1. Start from the middle element of the array. If the middle element happens to be the element to be searched, the search process ends;
2. If a specific element is greater than or less than the middle element, then search for the half of the array that is greater than or less than the middle element, and start the comparison from the middle element as before.
3. If the array is empty at a certain step, it means it cannot be found.
Binary search also becomes a binary search. Each comparison of the algorithm reduces the search range by half, and its time complexity is O(logn).
We use recursion and loop to implement binary search respectively:
def binary_search_recursion(lst, value, low, high): if high < low: return None mid = (low + high) / 2 if lst[mid] > value: return binary_search_recursion(lst, value, low, mid-1) elif lst[mid] < value: return binary_search_recursion(lst, value, mid+1, high) else: return mid def binary_search_loop(lst,value): low, high = 0, len(lst)-1 while low <= high: mid = (low + high) / 2 if lst[mid] < value: low = mid + 1 elif lst[mid] > value: high = mid - 1 else: return mid return None
Then we conduct a performance test on these two implementations:
if __name__ == "__main__": import random lst = [random.randint(0, 10000) for _ in xrange(100000)] lst.sort() def test_recursion(): binary_search_recursion(lst, 999, 0, len(lst)-1) def test_loop(): binary_search_loop(lst, 999) import timeit t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion") t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop") print "Recursion:", t1.timeit() print "Loop:", t2.timeit()
The execution results are as follows:
Recursion: 3.12596702576 Loop: 2.08254289627
It can be seen that the loop method is more efficient than recursion.
Python has a bisect module for maintaining ordered lists. The bisect module implements an algorithm for inserting elements into an ordered list. In some cases, this is more efficient than sorting the list repeatedly or constructing a large list and then sorting it. Bisect means bisection. The bisection method is used here to sort. It will insert an element into the appropriate position of an ordered list, which eliminates the need to call sort every time to maintain the ordered list.
The following is a simple usage example:
import bisect import random random.seed(1) print'New Pos Contents' print'--- --- --------' l = [] for i in range(1, 15): r = random.randint(1, 100) position = bisect.bisect(l, r) bisect.insort(l, r) print'%3d %3d' % (r, position), l
Output result:
New Pos Contents --- --- -------- 14 0 [14] 85 1 [14, 85] 77 1 [14, 77, 85] 26 1 [14, 26, 77, 85] 50 2 [14, 26, 50, 77, 85] 45 2 [14, 26, 45, 50, 77, 85] 66 4 [14, 26, 45, 50, 66, 77, 85] 79 6 [14, 26, 45, 50, 66, 77, 79, 85] 10 0 [10, 14, 26, 45, 50, 66, 77, 79, 85] 3 0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85] 84 9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85] 44 4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85] 77 9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85] 1 0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
The functions provided by the Bisect module are:
bisect.bisect_left(a,x, lo=0, hi=len(a)) :
Find the index at which x is inserted into the ordered list a. lo and hi are used to specify the range of the list, and the default is to use the entire list. If x already exists, insert it to the left. The return value is index.
bisect.bisect_right(a,x, lo=0, hi=len(a))
bisect.bisect(a, x,lo=0, hi=len(a)) :
The sum of these two functions bisect_left Similar, but if x already exists, insert it to the right.
bisect.insort_left(a,x, lo=0, hi=len(a)):
Insert x in the ordered list a. The effect is the same as a.insert(bisect.bisect_left(a,x, lo, hi), x).
bisect.insort_right(a,x, lo=0, hi=len(a))
bisect.insort(a, x,lo=0, hi=len(a)) :
Similar to insort_left, but If x already exists, insert it to the right.
The functions provided by the Bisect module can be divided into two categories: bisect* is only used to find the index and does not perform actual insertion; while insort* is used for actual insertion. A typical application of this module is to calculate the score level:
def grade(score,breakpoints=[60, 70, 80, 90], grades='FDCBA'): i = bisect.bisect(breakpoints, score) return grades[i] print [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
Execution result:
['F', 'A', 'C', 'C', 'B', 'A', 'A']
Similarly, we can use the bisect module to implement binary search:
def binary_search_bisect(lst, x): from bisect import bisect_left i = bisect_left(lst, x) if i != len(lst) and lst[i] == x: return i return None
Let’s test its performance with binary search implemented by recursion and loops:
Recursion: 4.00940990448 Loop: 2.6583480835 Bisect: 1.74922895432
You can see that it is slightly faster than the loop implementation and almost half faster than the recursive implementation.
Python's famous data processing library numpy also has a function numpy.searchsorted for binary search. The usage is basically the same as bisect, except that if you want to insert on the right, you need to set the parameter side='right', for example:
>>> import numpy as np >>> from bisect import bisect_left, bisect_right >>> data = [2, 4, 7, 9] >>> bisect_left(data, 4) 1 >>> np.searchsorted(data, 4) 1 >>> bisect_right(data, 4) 2 >>> np.searchsorted(data, 4, side='right') 2
Then , let’s compare the performance again:
In [20]: %timeit -n 100 bisect_left(data, 99999) 100 loops, best of 3: 670 ns per loop In [21]: %timeit -n 100 np.searchsorted(data, 99999) 100 loops, best of 3: 56.9 ms per loop In [22]: %timeit -n 100 bisect_left(data, 8888) 100 loops, best of 3: 961 ns per loop In [23]: %timeit -n 100 np.searchsorted(data, 8888) 100 loops, best of 3: 57.6 ms per loop In [24]: %timeit -n 100 bisect_left(data, 777777) 100 loops, best of 3: 670 ns per loop In [25]: %timeit -n 100 np.searchsorted(data, 777777) 100 loops, best of 3: 58.4 ms per loop
You can find that the efficiency of numpy.searchsorted is very low, not at the same order of magnitude as bisect. Therefore searchsorted is not suitable for searching ordinary arrays, but it is quite fast for searching numpy.ndarray:
In [30]: data_ndarray = np.arange(0, 1000000) In [31]: %timeit np.searchsorted(data_ndarray, 99999) The slowest run took 16.04 times longer than the fastest. This could mean that an intermediate result is being cached. 1000000 loops, best of 3: 996 ns per loop In [32]: %timeit np.searchsorted(data_ndarray, 8888) The slowest run took 18.22 times longer than the fastest. This could mean that an intermediate result is being cached. 1000000 loops, best of 3: 994 ns per loop In [33]: %timeit np.searchsorted(data_ndarray, 777777) The slowest run took 31.32 times longer than the fastest. This could mean that an intermediate result is being cached. 1000000 loops, best of 3: 990 ns per loop
numpy.searchsorted can search multiple values at the same time:
>>> np.searchsorted([1,2,3,4,5], 3) 2 >>> np.searchsorted([1,2,3,4,5], 3, side='right') 3 >>> np.searchsorted([1,2,3,4,5], [-10, 10, 2, 3]) array([0, 5, 1, 2])