Detailed introduction to seven classic sorting algorithms based on Python

高洛峰 | Original | 2017-03-23 16:43:04 | 1508 views

1. Basic concepts and classifications of sorting

Sorting arranges a sequence of records in increasing or decreasing order according to the value of one or more of their keywords (keys). A sorting algorithm is a method for arranging the records as required.

Stability of sorting:

If two records have equal keywords and, after sorting, their relative order is the same as it was in the original unordered sequence, the sorting method is said to be stable; otherwise it is unstable.
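
To make the definition of stability concrete, here is a minimal sketch. The record names and scores are made up for illustration; the only fact relied on is that Python's built-in sorted() is a stable sort.

```python
# Hypothetical records: (name, score). Only the score is used as the sort key.
records = [("Ann", 90), ("Bob", 85), ("Cid", 90), ("Dee", 85)]

# sorted() is stable, so records with equal scores keep their original order.
by_score = sorted(records, key=lambda rec: rec[1])

print(by_score)
# Bob still comes before Dee, and Ann still comes before Cid.
```

An unstable method would be free to output Dee before Bob, because their keywords are equal.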

Internal sorting and external sorting

Internal sorting: during the sorting process, all records to be sorted are held in memory.

External sorting: during the sorting process, external storage is used.

Usually what is discussed is internal sorting.

Three factors that affect the performance of the internal sorting algorithm:

Time complexity: that is, time performance. An efficient sorting algorithm should need as few keyword comparisons and record moves as possible.

Space complexity: mainly the auxiliary space required to run the algorithm; the less, the better.

Algorithm complexity: mainly refers to the complexity of the code.

According to the main operations used in the sorting process, internal sorting can be divided into:

Insertion sort

Exchange sort

Selection sort

Merge sort

By algorithm complexity, they fall into two categories:

Simple algorithms: bubble sort, simple selection sort and direct insertion sort

Improved algorithms: Shell sort, heap sort, merge sort and quick sort

The seven sorting algorithms below are only the most classic ones; they do not cover every sorting algorithm.

2. Bubble sort

Bubble sort: time complexity O(n^2)

A kind of exchange sort. The core idea: compare the keywords of adjacent records pairwise and exchange them if they are in reverse order, until no records are in reverse order.

The implementation details can vary, as in the following three versions:

1. The simplest exchange sort: bubble_sort_simple

2. Bubble sort: bubble_sort

3. Improved bubble sort: bubble_sort_advance

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: Liu Jiang
# Python 3.5
# Bubble sort algorithms

class SQList:
    def __init__(self, lis=None):
        self.r = lis

    def swap(self, i, j):
        """Swap two elements; a helper used by the sorting methods."""
        temp = self.r[i]
        self.r[i] = self.r[j]
        self.r[j] = temp

    def bubble_sort_simple(self):
        """
        The simplest exchange sort, time complexity O(n^2).
        """
        lis = self.r
        length = len(self.r)
        for i in range(length):
            for j in range(i+1, length):
                if lis[i] > lis[j]:
                    self.swap(i, j)

    def bubble_sort(self):
        """
        Bubble sort, time complexity O(n^2).
        """
        lis = self.r
        length = len(self.r)
        for i in range(length):
            # Walk from the tail towards i, bubbling the smallest value to position i
            j = length-2
            while j >= i:
                if lis[j] > lis[j+1]:
                    self.swap(j, j+1)
                j -= 1

    def bubble_sort_advance(self):
        """
        Improved bubble sort, time complexity O(n^2).
        A flag is set: if a full pass makes no swap, the remaining elements are already in order.
        For fairly regular input this improves efficiency somewhat.
        """
        lis = self.r
        length = len(self.r)
        flag = True
        i = 0
        while i < length and flag:
            flag = False
            j = length - 2
            while j >= i:
                if lis[j] > lis[j + 1]:
                    self.swap(j, j + 1)
                    flag = True
                j -= 1
            i += 1

    def __str__(self):
        ret = ""
        for i in self.r:
            ret += " %s" % i
        return ret

if __name__ == '__main__':
    sqlist = SQList([4,1,7,3,8,5,9,2,6])
    # sqlist.bubble_sort_simple()
    # sqlist.bubble_sort()
    sqlist.bubble_sort_advance()
    print(sqlist)
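
To see what the flag in the improved version buys, the sketch below counts how many passes are made before the sort stops. It is a standalone function rather than a method of SQList, and the name bubble_sort_passes is an assumption made for this illustration.

```python
def bubble_sort_passes(items):
    """Flagged bubble sort on a copy of items; returns (sorted_list, passes)."""
    lis = list(items)
    length = len(lis)
    passes = 0
    swapped = True
    i = 0
    while i < length and swapped:
        swapped = False
        passes += 1
        # Walk from the tail towards i, swapping adjacent out-of-order pairs
        for j in range(length - 2, i - 1, -1):
            if lis[j] > lis[j + 1]:
                lis[j], lis[j + 1] = lis[j + 1], lis[j]
                swapped = True
        i += 1
    return lis, passes


already_sorted, sorted_passes = bubble_sort_passes([1, 2, 3, 4, 5])
from_reversed, reversed_passes = bubble_sort_passes([5, 4, 3, 2, 1])
print(sorted_passes, reversed_passes)
```

On input that is already in order, the first pass makes no swap and the loop ends at once; on reversed input, every pass is needed.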

3. Simple selection sort

Simple selection sort: time complexity O(n^2)

Through n-i comparisons between keywords, select the record with the smallest keyword from n-i+1 records and exchange it with the i-th (1 <= i <= n) record.

In plain terms, each round scans all the elements that have not yet been sorted, records the subscript of the smallest one, and then swaps that element to the front of the unsorted part. Its advantage is that each round makes many comparisons but at most one exchange. So although its time complexity is also O(n^2), it generally performs somewhat better than bubble sort.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: Liu Jiang
# Python 3.5
# Simple selection sort

class SQList:
    def __init__(self, lis=None):
        self.r = lis

    def swap(self, i, j):
        """Swap two elements; a helper used by the sorting methods."""
        temp = self.r[i]
        self.r[i] = self.r[j]
        self.r[j] = temp

    def select_sort(self):
        """
        Simple selection sort, time complexity O(n^2).
        """
        lis = self.r
        length = len(self.r)
        for i in range(length):
            # Find the subscript of the smallest element in lis[i:]
            minimum = i
            for j in range(i+1, length):
                if lis[minimum] > lis[j]:
                    minimum = j
            # At most one exchange per round
            if i != minimum:
                self.swap(i, minimum)

    def __str__(self):
        ret = ""
        for i in self.r:
            ret += " %s" % i
        return ret

if __name__ == '__main__':
    sqlist = SQList([4, 1, 7, 3, 8, 5, 9, 2, 6, 0])
    sqlist.select_sort()
    print(sqlist)
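
The claim that selection sort exchanges at most once per round can be checked by counting swaps. The sketch below uses standalone functions whose names (selection_sort_swaps, bubble_sort_swaps) are assumptions for this illustration; they follow the same logic as the methods above, with a counter added.

```python
def selection_sort_swaps(items):
    """Simple selection sort on a copy; returns (sorted_list, number_of_swaps)."""
    lis = list(items)
    swaps = 0
    for i in range(len(lis)):
        minimum = i
        for j in range(i + 1, len(lis)):
            if lis[minimum] > lis[j]:
                minimum = j
        if i != minimum:
            lis[i], lis[minimum] = lis[minimum], lis[i]
            swaps += 1
    return lis, swaps


def bubble_sort_swaps(items):
    """Plain bubble sort on a copy; returns (sorted_list, number_of_swaps)."""
    lis = list(items)
    swaps = 0
    for i in range(len(lis)):
        for j in range(len(lis) - 2, i - 1, -1):
            if lis[j] > lis[j + 1]:
                lis[j], lis[j + 1] = lis[j + 1], lis[j]
                swaps += 1
    return lis, swaps


data = [4, 1, 7, 3, 8, 5, 9, 2, 6, 0]
sel_result, sel_swaps = selection_sort_swaps(data)
bub_result, bub_swaps = bubble_sort_swaps(data)
print("selection swaps:", sel_swaps, "bubble swaps:", bub_swaps)
```

Selection sort can never need more than n-1 swaps, whereas bubble sort needs one swap for every pair of elements that is out of order.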

4. Direct insertion sort

Direct insertion sort (also called straight insertion sort): time complexity O(n^2)

The basic operation is to insert a record into an already sorted ordered list, thereby obtaining a new ordered list with the number of records increased by 1.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: Liu Jiang
# Python 3.5
# Direct insertion sort

class SQList:
    def __init__(self, lis=None):
        self.r = lis

    def insert_sort(self):
        lis = self.r
        length = len(self.r)
        # Start from subscript 1; lis[0] alone is already an ordered list
        for i in range(1, length):
            if lis[i] < lis[i-1]:
                temp = lis[i]
                j = i-1
                # Check the bound first, then shift larger records one place right
                while j >= 0 and lis[j] > temp:
                    lis[j+1] = lis[j]
                    j -= 1
                lis[j+1] = temp

    def __str__(self):
        ret = ""
        for i in self.r:
            ret += " %s" % i
        return ret

if __name__ == '__main__':
    sqlist = SQList([4, 1, 7, 3, 8, 5, 9, 2, 6, 0])
    sqlist.insert_sort()
    print(sqlist)

This algorithm needs auxiliary space for only one record. In the best case, when the original data is already in order, a single round of comparisons is enough and no record has to be moved, so the time complexity is O(n). In practice, however, input is rarely that well ordered.
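
The gap between the best case and the worst case can be made visible by counting keyword comparisons. This is a sketch under the assumption that one comparison is counted each time a stored record is compared with the record being inserted; the function name is hypothetical.

```python
def insertion_sort_comparisons(items):
    """Direct insertion sort on a copy; returns (sorted_list, comparisons)."""
    lis = list(items)
    comparisons = 0
    for i in range(1, len(lis)):
        temp = lis[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if lis[j] <= temp:
                break               # insertion point found
            lis[j + 1] = lis[j]     # shift the larger record right
            j -= 1
        lis[j + 1] = temp
    return lis, comparisons


n = 10
_, best = insertion_sort_comparisons(range(n))            # already ordered
_, worst = insertion_sort_comparisons(range(n, 0, -1))    # reverse ordered
print(best, worst)
```

Ordered input costs n-1 comparisons, while reversed input costs n(n-1)/2, which is where the O(n) and O(n^2) figures come from.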

5. Shell sort

Shell sort is an improved version of insertion sort. Its core idea is to divide the original data set into several subsequences, perform direct insertion sort on each subsequence so that the whole sequence becomes basically ordered, and finally perform one direct insertion sort on all records.

The key is the strategy for jumping and segmenting, that is, how the data is split and how large the interval is. Usually, records separated by a certain "increment" form one subsequence, which ensures that after direct insertion sort within the subsequences the overall result is basically ordered rather than only locally ordered. In the following example, the "increment" is updated by: increment = int(increment/3)+1.

The time complexity of Shell sort depends on the increment sequence; with a well-chosen sequence it can reach O(n^(3/2)).
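
To get a feel for the increment rule described above, this small sketch lists the increments it generates for a given length. The function name shell_increments is an assumption; integer division // is used, which gives the same value as int(increment/3) for the positive lengths involved here.

```python
def shell_increments(length):
    """Increments produced by repeatedly applying increment = increment // 3 + 1."""
    increments = []
    increment = length
    while increment > 1:
        increment = increment // 3 + 1
        increments.append(increment)
    return increments


print(shell_increments(12))    # [5, 2, 1]
print(shell_increments(100))   # [34, 12, 5, 2, 1]
```

The sequence always ends with 1, so the last pass is an ordinary direct insertion sort over data that is by then basically ordered.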

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: Liu Jiang
# Python 3.5
# Shell sort

class SQList:
    def __init__(self, lis=None):
        self.r = lis

    def shell_sort(self):
        """Shell sort"""
        lis = self.r
        length = len(lis)
        increment = len(lis)
        while increment > 1:
            increment = increment // 3 + 1
            # With 0-based subscripts the first record to insert is lis[increment]
            for i in range(increment, length):
                if lis[i] < lis[i - increment]:
                    temp = lis[i]
                    j = i - increment
                    # Insertion sort within the subsequence, stepping by increment
                    while j >= 0 and temp < lis[j]:
                        lis[j+increment] = lis[j]
                        j -= increment
                    lis[j+increment] = temp

    def __str__(self):
        ret = ""
        for i in self.r:
            ret += " %s" % i
        return ret

if __name__ == '__main__':
    sqlist = SQList([4, 1, 7, 3, 8, 5, 9, 2, 6, 0,123,22])
    sqlist.shell_sort()
    print(sqlist)

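6. Heap sort

A heap is a complete binary tree with one of the following properties:

The value of every branch node is greater than or equal to the values of its left and right children; this is called a max-heap.

The value of every branch node is less than or equal to the values of its left and right children; this is called a min-heap.

Therefore, the root node always holds the largest (or smallest) value of all nodes.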

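If the nodes are numbered from 1 in level order (breadth-first), the following relations hold: in a max-heap, k[i] >= k[2i] and k[i] >= k[2i+1]; in a min-heap, k[i] <= k[2i] and k[i] <= k[2i+1], for 1 <= i <= floor(n/2).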
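
With level-order numbering from 1, the children of node i in a complete binary tree are nodes 2i and 2i+1. The sketch below checks the max-heap property on a Python list under that numbering; the function name is_max_heap and the sample values are assumptions for illustration.

```python
def is_max_heap(values):
    """Check that every branch node is >= its children (1-based node numbers)."""
    n = len(values)
    for i in range(1, n // 2 + 1):       # branch nodes are numbered 1 .. n // 2
        parent = values[i - 1]           # node i is stored at list index i - 1
        left, right = 2 * i, 2 * i + 1
        if values[left - 1] > parent:
            return False
        if right <= n and values[right - 1] > parent:
            return False
    return True


print(is_max_heap([90, 70, 80, 60, 10, 40, 50, 30, 20]))
print(is_max_heap([1, 2, 3]))
```

When a list is indexed from 0 instead, as Python lists are, the children of index s sit at 2s+1 and 2s+2.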

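Heap sort is a sorting method that exploits the properties of a max-heap or a min-heap. Its overall time complexity is O(nlogn). (A max-heap is used below.)

The core idea: build a max-heap from the sequence to be sorted. The maximum of the whole sequence is then the root of the heap. Swap it with the last element of the heap array, and rebuild a max-heap from the remaining n-1 elements. Repeat these steps until the sequence is ordered.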

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: Liu Jiang
# Python 3.5
# Heap sort

class SQList:
    def __init__(self, lis=None):
        self.r = lis

    def swap(self, i, j):
        """Swap two elements; a helper used by the sorting methods."""
        temp = self.r[i]
        self.r[i] = self.r[j]
        self.r[j] = temp

    def heap_sort(self):
        length = len(self.r)
        # Build a max-heap from the original sequence.
        # Start at the last branch node and work back to subscript 0.
        i = length // 2 - 1
        while i >= 0:
            self.heap_adjust(i, length-1)
            i -= 1
        # Walk the sequence backwards, repeatedly taking out the root to do the actual sorting.
        j = length-1
        while j > 0:
            # Swap the current root (subscript 0) to the end of the heap, position j
            self.swap(0, j)
            # Rebuild a max-heap from the part that changed
            self.heap_adjust(0, j-1)
            j -= 1

    def heap_adjust(self, s, m):
        """Core sift-down step: restore the max-heap property for lis[s..m]."""
        lis = self.r
        temp = lis[s]
        # With 0-based subscripts the children of s are 2*s+1 and 2*s+2
        i = 2*s + 1
        while i <= m:
            # Pick the larger of the two children
            if i < m and lis[i] < lis[i+1]:
                i += 1
            if temp >= lis[i]:
                break
            lis[s] = lis[i]
            s = i
            i = 2*s + 1
        lis[s] = temp

    def __str__(self):
        ret = ""
        for i in self.r:
            ret += " %s" % i
        return ret

if __name__ == '__main__':
    sqlist = SQList([4, 1, 7, 3, 8, 5, 9, 2, 6, 0, 123, 22])
    sqlist.heap_sort()
    print(sqlist)

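The running time of heap sort is spent mainly on building the initial heap and on the repeated sifting needed to rebuild the heap.

Building the initial heap has time complexity O(n).

During the actual sorting, rebuilding the heap has time complexity O(nlogn).

So the overall time complexity of heap sort is O(nlogn).

Heap sort is not sensitive to the initial order of the records, so its best, worst and average time complexity are all O(nlogn). Its performance is better than that of bubble sort, simple selection sort and direct insertion sort.

In terms of space, only one temporary storage unit is needed for swapping. However, because records are compared and exchanged in jumps, heap sort is an unstable sorting method.

In addition, because building the initial heap needs a relatively large number of comparisons, heap sort is not well suited to sorting a small number of records.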
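
For comparison, the same idea can be sketched with Python's standard heapq module. Note that this is a swapped-in variant, not the method above: heapq maintains a min-heap and this version builds a new list instead of sorting in place. The function name heap_sort_with_heapq is an assumption.

```python
import heapq


def heap_sort_with_heapq(items):
    """Return a new ascending list by repeatedly popping from a min-heap."""
    heap = list(items)
    heapq.heapify(heap)      # build the initial heap in O(n)
    # Each heappop removes the smallest remaining value in O(log n)
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort_with_heapq([4, 1, 7, 3, 8, 5, 9, 2, 6, 0, 123, 22]))
```

The two phases match the analysis above: one O(n) build followed by n removals of O(logn) each.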

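7. Merge sort

Merge sort is an effective sorting algorithm built on the merge operation and a very typical application of divide and conquer. Already ordered subsequences are merged to obtain a completely ordered sequence; that is, each subsequence is first made ordered, and then the subsequences are made ordered with respect to one another. Merging two ordered lists into one ordered list is called a two-way merge.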
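
The two-way merge described above can be sketched on its own, independent of any class. The function name merge_two is an assumption; it returns a new list rather than writing into a target array.

```python
def merge_two(left, right):
    """Merge two already ordered lists into one ordered list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # <= takes from the left list on ties, which keeps the merge stable
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has elements left
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_two([1, 4, 7], [2, 3, 9]))   # [1, 2, 3, 4, 7, 9]
```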

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: Liu Jiang
# Python 3.5
# Merge sort

class SQList:
    def __init__(self, lis=None):
        self.r = lis

    def swap(self, i, j):
        """Swap two elements; a helper used by the sorting methods."""
        temp = self.r[i]
        self.r[i] = self.r[j]
        self.r[j] = temp

    def merge_sort(self):
        # Nothing to do for an empty list
        if self.r:
            self.msort(self.r, self.r, 0, len(self.r)-1)

    def msort(self, list_sr, list_tr, s, t):
        """Sort list_sr[s..t] and write the result into list_tr[s..t]."""
        temp = [None for i in range(0, len(list_sr))]
        if s == t:
            list_tr[s] = list_sr[s]
        else:
            m = (s+t) // 2
            # Sort each half into temp, then merge the halves into list_tr
            self.msort(list_sr, temp, s, m)
            self.msort(list_sr, temp, m+1, t)
            self.merge(temp, list_tr, s, m, t)

    def merge(self, list_sr, list_tr, i, m, n):
        """Merge the ordered runs list_sr[i..m] and list_sr[m+1..n] into list_tr[i..n]."""
        j = m+1
        k = i
        while i <= m and j <= n:
            # <= prefers the left run on equal keywords, which keeps the sort stable
            if list_sr[i] <= list_sr[j]:
                list_tr[k] = list_sr[i]
                i += 1
            else:
                list_tr[k] = list_sr[j]
                j += 1

            k += 1
        # Copy whatever remains of the left or right run
        if i <= m:
            for offset in range(0, m-i+1):
                list_tr[k+offset] = list_sr[i+offset]
        if j <= n:
            for offset in range(0, n-j+1):
                list_tr[k+offset] = list_sr[j+offset]

    def __str__(self):
        ret = ""
        for i in self.r:
            ret += " %s" % i
        return ret

if __name__ == '__main__':
    sqlist = SQList([4, 1, 7, 3, 8, 5, 9, 2, 6, 0, 12, 77, 34, 23])
    sqlist.merge_sort()
    print(sqlist)

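Merge sort is not sensitive to how the elements of the original sequence are distributed; its time complexity is O(nlogn).

Merge sort needs a certain amount of auxiliary space during computation, for the recursion and for holding results, so its space complexity is O(n+logn).

Merge sort involves no jumps, only pairwise comparisons, so it is a stable sort.

In short, merge sort uses more memory, but it is efficient and stable.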

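8. Quick sort

Quick sort was invented by Turing Award winner Tony Hoare and has been listed as one of the top ten algorithms of the 20th century. It is an upgraded version of bubble sort and a kind of exchange sort. The time complexity of quick sort is O(nlog(n)).

The core idea of quick sort: one pass splits the records to be sorted into two independent parts, such that the keywords of all records in one part are smaller than those in the other part; the two parts are then sorted in the same way, so that the whole collection ends up sorted.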
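
Before looking at an in-place version, the core idea can be sketched in a few lines that build new lists. This is only an illustration of the split-then-recurse idea, not an efficient implementation; the function name quick_sort_simple is an assumption.

```python
def quick_sort_simple(items):
    """Return a new sorted list using the first element as the pivot."""
    if len(items) <= 1:
        return list(items)
    pivot_key = items[0]
    # One pass splits the rest into a smaller part and a larger-or-equal part
    smaller = [x for x in items[1:] if x < pivot_key]
    larger = [x for x in items[1:] if x >= pivot_key]
    # Each part is sorted independently, then joined around the pivot
    return quick_sort_simple(smaller) + [pivot_key] + quick_sort_simple(larger)


print(quick_sort_simple([4, 1, 7, 3, 8, 5, 9, 2, 6, 0, 123, 22]))
```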

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: Liu Jiang
# Python 3.5
# Quick sort

class SQList:
    def __init__(self, lis=None):
        self.r = lis

    def swap(self, i, j):
        """Swap two elements; a helper used by the sorting methods."""
        temp = self.r[i]
        self.r[i] = self.r[j]
        self.r[j] = temp

    def quick_sort(self):
        """Entry point."""
        self.qsort(0, len(self.r)-1)

    def qsort(self, low, high):
        """Recursive step."""
        if low < high:
            pivot = self.partition(low, high)
            self.qsort(low, pivot-1)
            self.qsort(pivot+1, high)

    def partition(self, low, high):
        """
        The core of quick sort.
        The chosen pivot_key is swapped back and forth: smaller records go to its left,
        larger records go to its right.
        Its own position keeps changing during the swaps until all swaps are done,
        but within one call the value of pivot_key never changes.
        :param low: subscript of the left boundary
        :param high: subscript of the right boundary
        :return: subscript where pivot_key ends up after partitioning
        """
        lis = self.r
        pivot_key = lis[low]
        while low < high:
            while low < high and lis[high] >= pivot_key:
                high -= 1
            self.swap(low, high)
            while low < high and lis[low] <= pivot_key:
                low += 1
            self.swap(low, high)
        return low

    def __str__(self):
        ret = ""
        for i in self.r:
            ret += " %s" % i
        return ret

if __name__ == '__main__':
    sqlist = SQList([4, 1, 7, 3, 8, 5, 9, 2, 6, 0, 123, 22])
    sqlist.quick_sort()
    print(sqlist)

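The time performance of quick sort depends on the depth of the recursion.

When pivot_key happens to be close to the median of the keywords, the two partitions are well balanced, the recursion resembles a balanced binary tree, and the time complexity is O(nlog(n)).

When the original records are already in ascending or descending order, the partitions form a skewed tree of depth n-1, and the i-th partitioning step needs n-i comparisons, so the final time complexity is O(n^2).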
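
The effect of input order on recursion depth can be measured directly. The sketch below reuses the first-element pivot and swap-based partition from the code above, wrapped in a standalone function that records the deepest level at which a partition is made; the name quick_sort_depth, the seed and the size of 200 are assumptions for illustration.

```python
import random


def quick_sort_depth(items):
    """First-element-pivot quick sort on a copy; returns (sorted_list, max_depth)."""
    lis = list(items)
    max_depth = 0

    def partition(low, high):
        pivot_key = lis[low]
        while low < high:
            while low < high and lis[high] >= pivot_key:
                high -= 1
            lis[low], lis[high] = lis[high], lis[low]
            while low < high and lis[low] <= pivot_key:
                low += 1
            lis[low], lis[high] = lis[high], lis[low]
        return low

    def qsort(low, high, depth):
        nonlocal max_depth
        if low < high:
            max_depth = max(max_depth, depth)
            pivot = partition(low, high)
            qsort(low, pivot - 1, depth + 1)
            qsort(pivot + 1, high, depth + 1)

    qsort(0, len(lis) - 1, 1)
    return lis, max_depth


n = 200
ordered = list(range(n))
shuffled = ordered[:]
random.seed(0)
random.shuffle(shuffled)

_, ordered_depth = quick_sort_depth(ordered)
_, shuffled_depth = quick_sort_depth(shuffled)
print(ordered_depth, shuffled_depth)
```

On ordered input every partition peels off a single record, so the depth reaches n-1; on shuffled input the depth is far smaller.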

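In the general (average) case, it can be proved by mathematical induction that the time complexity of quick sort is O(nlog(n)).

However, because keywords are compared and exchanged in jumps, quick sort is an unstable sort.

Also, because recursion is used, the algorithm needs some auxiliary space; its space complexity is O(logn) when the partitions are balanced.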

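The basic quick sort can still be optimized in several ways:

1. Choose a better pivot_key

So far pivot_key has always been the first element of the subsequence, lis[low], which depends on luck. With good luck the value is close to the median of the whole sequence and the resulting tree is fairly balanced; with bad luck it is near the maximum or minimum and the tree is close to a skewed tree.

To make pivot_key as central as possible, take the values at three special positions (left, middle and right) and use the median of the three as pivot_key. This is usually somewhat better than using lis[low] directly. In the code, add the following lines just before the original line pivot_key = lis[low]: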

m = low + (high - low) // 2
# After these three steps lis[low] holds the median of the three values
if lis[low] > lis[high]:
    self.swap(low, high)
if lis[m] > lis[high]:
    self.swap(high, m)
if lis[m] > lis[low]:
    self.swap(m, low)

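If that is still not good enough, the whole sequence can first be divided into 3 parts, a pivot_key found for each part, and the same comparison applied once more to those 3 values to obtain the final pivot_key. The pivot_key chosen this way is very likely to be a reasonable value.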
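
One way to sketch this median-of-three-medians idea is shown below. Both function names, and the choice to sample three evenly spaced positions from each third, are assumptions made for illustration; the functions return a subscript and leave the list unchanged.

```python
def median_of_three_index(lis, a, b, c):
    """Return whichever of the subscripts a, b, c holds the median value."""
    x, y, z = lis[a], lis[b], lis[c]
    if x <= y:
        if y <= z:
            return b
        return c if x <= z else a
    # From here on y < x
    if x <= z:
        return a
    return c if y <= z else b


def ninther_index(lis, low, high):
    """Median of three medians, one taken from each third of lis[low..high]."""
    step = (high - low) // 8
    mid = low + (high - low) // 2
    if step == 0:
        # Range too short to split further: plain median of three
        return median_of_three_index(lis, low, mid, high)
    m1 = median_of_three_index(lis, low, low + step, low + 2 * step)
    m2 = median_of_three_index(lis, mid - step, mid, mid + step)
    m3 = median_of_three_index(lis, high - 2 * step, high - step, high)
    return median_of_three_index(lis, m1, m2, m3)


values = list(range(27))
print(ninther_index(values, 0, len(values) - 1))   # 13
```

A caller would then swap the returned position with lis[low] before partitioning, so that the rest of the partition code stays unchanged.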

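2. Reduce unnecessary swaps

In the original code the pivot_key record is swapped again and again, which is unnecessary. It can simply be kept in a temporary variable, as shown below: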

def partition(self, low, high):

    lis = self.r

    # Median of three: leave the median of lis[low], lis[m], lis[high] at lis[low]
    m = low + (high - low) // 2
    if lis[low] > lis[high]:
        self.swap(low, high)
    if lis[m] > lis[high]:
        self.swap(high, m)
    if lis[m] > lis[low]:
        self.swap(m, low)

    pivot_key = lis[low]
    # temp holds the value of pivot_key
    temp = pivot_key
    while low < high:
        while low < high and lis[high] >= pivot_key:
            high -= 1
        # Overwrite directly instead of swapping
        lis[low] = lis[high]
        while low < high and lis[low] <= pivot_key:
            low += 1
        lis[high] = lis[low]
    # Write the pivot into its final position once, after the scan
    lis[low] = temp
    return low

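3. Optimize the sorting of small arrays

When sorting large amounts of data, the overhead of quick sort's recursion is acceptable and the algorithm is fast. For small arrays, however, it is not as fast as direct insertion sort: using an ox cleaver to kill a chicken is not necessarily faster than using a kitchen knife.

So a very simple approach is to choose the algorithm according to the amount of data, by rewriting the qsort method as follows: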

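def qsort(self, low, high):
    """Choose quick sort or direct insertion sort depending on the length of the range."""
    # 7 is an empirical value; adjust it to suit your data.
    MAX_LENGTH = 7
    if high - low > MAX_LENGTH:
        # Long range: partition and recurse as before
        pivot = self.partition(low, high)
        self.qsort(low, pivot - 1)
        self.qsort(pivot + 1, high)
    else:
        # Short range: fall back to the direct insertion sort written earlier.
        # Note: insert_sort() as written sorts the whole list, not just lis[low..high];
        # the result is still correct, but a range-limited version would avoid repeated work.
        self.insert_sort()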

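4. Optimize the recursion

The recursion can be optimized by eliminating the tail call: the second recursive call is replaced by a loop. Rewrite the qsort method as follows: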

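def qsort(self, low, high):
    """Quick sort for long ranges, with the second recursive call replaced by a loop."""
    # 7 is an empirical value; adjust it to suit your data.
    MAX_LENGTH = 7
    if high - low > MAX_LENGTH:
        # Use a while loop instead of a second recursive call
        while low < high:
            pivot = self.partition(low, high)
            self.qsort(low, pivot - 1)
            # The right part is handled by the next loop iteration
            low = pivot + 1
    else:
        # insert_sort is the direct insertion sort written earlier (it sorts the whole list)
        self.insert_sort()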
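
Putting the four optimizations together, a self-contained version might look like the sketch below. It is written as plain functions rather than SQList methods, and insertion_sort_range, partition and hybrid_quick_sort are names chosen for this illustration. The main assumption is that the insertion sort is restricted to lis[low..high], so short ranges are sorted without touching the rest of the list.

```python
MAX_LENGTH = 7  # empirical cut-off, as in the text; tune for real data


def insertion_sort_range(lis, low, high):
    """Direct insertion sort restricted to lis[low..high]."""
    for i in range(low + 1, high + 1):
        temp = lis[i]
        j = i - 1
        while j >= low and lis[j] > temp:
            lis[j + 1] = lis[j]
            j -= 1
        lis[j + 1] = temp


def partition(lis, low, high):
    """Median-of-three pivot, then partition by overwriting instead of swapping."""
    m = low + (high - low) // 2
    if lis[low] > lis[high]:
        lis[low], lis[high] = lis[high], lis[low]
    if lis[m] > lis[high]:
        lis[m], lis[high] = lis[high], lis[m]
    if lis[m] > lis[low]:
        lis[m], lis[low] = lis[low], lis[m]
    pivot_key = lis[low]
    while low < high:
        while low < high and lis[high] >= pivot_key:
            high -= 1
        lis[low] = lis[high]
        while low < high and lis[low] <= pivot_key:
            low += 1
        lis[high] = lis[low]
    lis[low] = pivot_key
    return low


def hybrid_quick_sort(lis, low=0, high=None):
    """Sort lis in place: quick sort on long ranges, insertion sort on short ones."""
    if high is None:
        high = len(lis) - 1
    # Loop on the right part instead of making a second recursive call
    while high - low > MAX_LENGTH:
        pivot = partition(lis, low, high)
        hybrid_quick_sort(lis, low, pivot - 1)
        low = pivot + 1
    insertion_sort_range(lis, low, high)


data = [4, 1, 7, 3, 8, 5, 9, 2, 6, 0, 123, 22]
hybrid_quick_sort(data)
print(data)
```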

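9. Summary of sorting algorithms

Classification of sorting algorithms:

[Figure: classification of sorting algorithms]

No algorithm is perfect; where there are strengths there are also weaknesses. Even quick sort is only superior in overall performance: it is unstable, needs a fair amount of auxiliary space, and is not suited to sorting small amounts of data.

Performance comparison of the seven sorting algorithms

[Figure: performance comparison of the seven sorting algorithms]

If the sequence to be sorted is basically ordered, use a simple algorithm directly rather than a complex improved one.

Merge sort and quick sort perform well but need more auxiliary space; in effect they trade space for time.

The fewer elements there are to sort, the better suited the simple sorting methods are; the more elements there are, the better suited the improved algorithms are.

Simple selection sort does not perform well in time, but it is very economical in space. It is particularly suitable for sorting elements when the amount of data is small and each record carries a lot of information.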
