TopK problem, that is, finding the largest K number. This problem is very common, such as finding the 10 most popular keywords from 10 million search records.
Method 1:
Sort first, and then intercept the first k numbers.
Time complexity: O(n*logn)+O(k)=O(n*logn).
This method is relatively simple and crude, just mention it.
Method 2: Maximum Heap
We can create a data container of size K to store the smallest K numbers, and then traverse the entire array and add each number Compare it with the maximum number in the container. If this number is greater than the maximum value in the container, continue traversing, otherwise replace the maximum value in the container with this number. The understanding of this method is also very simple. As for the choice of container, many people's first reaction is the maximum heap. But how to implement the maximum heap in Python? We can use the heapq library that implements the minimum heap, because in an array, if each number is inverted, the maximum number becomes the minimum number, and the order of the entire number changes, so each number in the array can be inverted. , and then use the minimum heap to negate the result when finally returning it. The code is as follows:
import heapq def get_least_numbers_big_data(self, alist, k): max_heap = [] length = len(alist) if not alist or k <= 0 or k > length: return k = k - 1 for ele in alist: ele = -ele if len(max_heap) <= k: heapq.heappush(max_heap, ele) else: heapq.heappushpop(max_heap, ele) return map(lambda x:-x, max_heap) if __name__ == "__main__": l = [1, 9, 2, 4, 7, 6, 3] min_k = get_least_numbers_big_data(l, 3)
Method 3: quick select
quick select algorithm. In fact, it is similar to quick sort. The difference is that quick select only needs to go in one direction for each trip.
Time complexity: O(n).
def qselect(A,k): if len(A)<k:return A pivot = A[-1] right = [pivot] + [x for x in A[:-1] if x>=pivot] rlen = len(right) if rlen==k: return right if rlen>k: return qselect(right, k) else: left = [x for x in A[:-1] if x<pivot] return qselect(left, k-rlen) + right for i in range(1, 10): print qselect([11,8,4,1,5,2,7,9], i)
For more articles related to the Python version of the maximum K number problem, please pay attention to the PHP Chinese website!