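In this article, the terms Python and CPython (the reference implementation of the language) are used interchangeably. The article deals specifically with CPython and does not cover any other implementation of Python.
Python is a beautiful language that lets programmers express their ideas in simple terms while leaving the complexity of the actual implementation behind.
One of the things it abstracts away is sorting.
You can easily find answers to the question "How is sorting implemented in Python?", and they almost always answer a different question: "What sorting algorithm does Python use?".
However, this often leaves out some interesting implementation details.
There is one implementation detail that I think is not discussed enough, even though it was introduced seven years ago in Python 3.7:
sorted() and list.sort() have been optimized for common cases to be up to 40-75% faster. (Contributed by Elliot Gorokhovsky in bpo-28685.)
But before we start...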
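When you need to sort a list in Python, you have two options:
the list method list.sort(*, key=None, reverse=False), which sorts the list in place
the built-in function sorted(iterable, /, *, key=None, reverse=False), which returns a sorted list without modifying its argument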
If you need to sort any other built-in iterable, sorted is the only option, whatever the type of the iterable or generator passed as the argument.
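The difference between the two options can be illustrated with a short sketch. The variable names are only for illustration; the behavior shown (list.sort returning None, sorted accepting any iterable) is standard Python.

```python
# list.sort() works in place and returns None.
numbers = [3, 1, 2]
result = numbers.sort()

# sorted() accepts any iterable and always returns a new list.
point = (3, 1, 2)
from_tuple = sorted(point)                            # tuple in, list out
from_generator = sorted(n * n for n in (3, -1, 2))    # generator in, list out
from_string = sorted("cab")                           # string in, list of characters out
```

Note that the tuple itself is left unchanged, since sorted never modifies its argument.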
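Here is a rough pure-Python equivalent of CPython's C implementation of sorted:
from typing import Any, Iterable

def sorted(iterable: Iterable[Any], key=None, reverse=False):
    # Copy the iterable into a new list, then sort that list in place.
    new_list = list(iterable)
    new_list.sort(key=key, reverse=reverse)
    return new_list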
Yes, it's that simple.
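How Python Makes Sorting Faster
As Python's internal documentation for sorting puts it:
It is sometimes possible to substitute faster type-specific comparisons for the slower, generic PyObject_RichCompareBool
In short, the optimization can be described as follows: when a list is homogeneous, Python uses a type-specific comparison function.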
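What Is a Homogeneous List?
A homogeneous list is a list whose elements are all of the same type.
For example:
homogeneous = [1, 2, 3, 4]
On the other hand, this is not a homogeneous list:
heterogeneous = [1, "2", (3, ), {'4': 4}]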
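A minimal sketch of what "homogeneous" means in code. The helper name is_homogeneous is hypothetical and not part of Python. It compares exact types rather than using isinstance, on the assumption that this matches the exact-type check described later in the article.

```python
def is_homogeneous(items):
    """Return True if every element has exactly the same type as the first one."""
    if not items:
        return True
    first_type = type(items[0])
    # Exact type identity: a bool is not treated as an int here.
    return all(type(item) is first_type for item in items)


homogeneous = [1, 2, 3, 4]
heterogeneous = [1, "2", (3,), {"4": 4}]
```

Under this definition, [1, True] is not homogeneous, even though bool is a subclass of int.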
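Interestingly, the official Python tutorial states:
Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list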
A side note about tuples
The same tutorial states:
Tuples are immutable, and usually contain a heterogeneous sequence of elements
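So if you ever wonder whether to use a tuple or a list, here is a rule of thumb:
if the elements are all of the same type, use a list; otherwise, use a tuple
Wait, what about arrays?
Python implements a homogeneous array container object for numeric values: array.array.
Arrays have no sort method of their own, so the way to sort one is to pass it to sorted, which internally creates a list from the array and discards any type-related information in the process.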
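As a small illustration of the point about arrays, the sketch below sorts an array.array with sorted and then rebuilds an array from the result. The variable names are illustrative only.

```python
from array import array

values = array("d", [2.5, -1.0, 0.5])

# sorted() copies the array into a list, so the result is a plain list of floats.
sorted_values = sorted(values)

# If an array is needed again, it has to be rebuilt from the sorted list.
sorted_array = array(values.typecode, sorted_values)
```

The round trip through a list is the price of array.array not providing a sort method of its own.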
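Why Does Using a Type-Specific Comparison Function Help?
Comparisons in Python are costly, because the generic comparison runs through several checks before it compares anything. In simplified terms:
It checks that neither operand is NULL
If the right operand's type is a subclass of the left operand's type, it tries the right operand's reflected comparison first
Otherwise, it tries the left operand's comparison function and then the right operand's reflected one
If neither produces a result, == and != fall back to an identity comparison
Otherwise, it raises TypeError
On top of this, each type's own comparison function performs additional checks.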
For example, when comparing strings, Python checks whether the characters of the strings take more than one byte of memory, and float comparison handles a pair of floats differently from a float and an int.
A more detailed explanation and diagram can be found here: Adding Data-Aware Sort Optimizations to CPython
Before this optimization was introduced, Python had to run all of these type-specific and generic checks every time two values were compared during sorting.
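Some of the generic fallback behavior can be observed from pure Python. The sketch below uses two hypothetical classes, Plain and Reflecting, to show the reflected comparison, the TypeError for unordered types, and the identity fallback for equality. It illustrates the visible behavior only, not the C code path itself.

```python
class Plain:
    """Defines no ordering methods of its own."""


class Reflecting:
    """Only answers 'other < self' through the reflected __gt__."""

    def __init__(self):
        self.calls = 0

    def __gt__(self, other):
        self.calls += 1
        return True


left, right = Plain(), Reflecting()

# Plain's inherited __lt__ gives no answer, so Python tries right.__gt__(left).
reflected_result = left < right

# With no usable ordering method on either side, the comparison raises TypeError.
try:
    Plain() < Plain()
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Equality does not fail this way: it falls back to an identity check.
same_object_equal = left == left
different_objects_equal = Plain() == Plain()
```

Each of these decisions is made again for every single comparison, which is the overhead the sorting optimization tries to avoid.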
There's no magical way to know if all the elements of a list are of the same type other than to iterate over the list and check each element.
Python does almost exactly that: it checks the types of the sorting keys generated by the key function passed to list.sort or sorted.
If a key function is provided, Python uses it to construct a list of keys, otherwise it uses the list's own values as sorting keys.
In an oversimplified form, key construction can be expressed as the following Python code:
if key is None:
    keys = list_items
else:
    keys = [key(list_item) for list_item in list_items]
Note that the keys used internally in CPython are a C array of CPython object references, not a Python list.
Once the keys are constructed, Python checks their types.
When checking the types of keys, Python's sorting algorithm tries to determine if all elements in the keys array are either str, int, float or tuple, or simply of the same type, with some constraints for base types.
It's worth noting that checking the types of the keys adds some extra work up front. Python does this because it usually pays off by making the actual sorting faster, especially for longer lists.
int should not be a bignum
In practice, this means that for the optimization to apply, the absolute value of every integer must not exceed 2^30 - 1 (this may vary depending on the platform).
As a side note, here is a great article that explains how Python handles big integers: How python implements super long integers?
Every character of a string should take no more than 1 byte of memory, meaning that it is represented by an integer value in the range 0-255.
In practice, this means that strings should consist only of characters from the Latin-1 range: ASCII letters, digits, spaces and punctuation, plus the accented Latin letters and symbols with code points up to 255.
There are no constraints for floats in order for this optimization to work.
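The pre-check and its constraints can be approximated in pure Python. This is only a sketch: the function pick_comparison and the labels it returns are hypothetical, the 2**30 bound assumes a build with 30-bit integer digits, and tuple keys are reported without modelling any element-level checks.

```python
def pick_comparison(keys):
    """Hypothetical helper: name the comparison strategy the pre-check would allow."""
    if not keys:
        return "generic"
    key_type = type(keys[0])
    if any(type(key) is not key_type for key in keys):
        return "generic"            # heterogeneous keys: no shortcut
    if key_type is int:
        # Assumption: 30-bit digits, so "not a bignum" means abs(key) < 2**30.
        if all(abs(key) < 2**30 for key in keys):
            return "small-int"
        return "same-type"
    if key_type is str:
        # One byte per character means every code point is below 256.
        if all(ord(ch) < 256 for key in keys for ch in key):
            return "one-byte-str"
        return "same-type"
    if key_type is float:
        return "float"              # no extra constraint for floats
    if key_type is tuple:
        return "tuple"              # element-level checks are not modelled here
    return "same-type"
```

For instance, under these assumptions a list of small integers is labelled "small-int", while adding a single value of 2**30 or larger downgrades the whole list to "same-type".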
First of all, isn’t it fascinating to know?
Secondly, mentioning this knowledge could be a nice touch in a Python Developer interview.
As for actual code development, understanding this optimization can help you improve sorting performance.
According to the benchmark in the PR that introduced this optimization, sorting a list that consists only of floats is almost twice as fast as sorting a list of floats with even a single integer at the end.
So when it's time to optimize, transforming a list like this
floats_and_int = [1.0, -1.0, -0.5, 3]
into a list that looks like this
just_floats = [1.0, -1.0, -0.5, 3.0]  # note that 3.0 is a float now
might improve performance.
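One way to check this on your own machine is a small timeit comparison. The sketch below is not the benchmark from the PR. The list size, the seed and the helper name time_sort are arbitrary choices, and the measured ratio will vary with the Python version and hardware.

```python
import random
import timeit

random.seed(0)
just_floats = [random.random() for _ in range(10_000)]
floats_and_int = just_floats[:-1] + [1]   # same length, a single int at the end


def time_sort(data, repeats=20):
    """Return the best time in seconds needed to sort a copy of data."""
    return min(timeit.repeat(lambda: sorted(data), number=1, repeat=repeats))


homogeneous_time = time_sort(just_floats)
mixed_time = time_sort(floats_and_int)
print(f"floats only: {homogeneous_time:.6f}s, with one int: {mixed_time:.6f}s")
```

Taking the minimum over several repeats reduces noise from other processes, but the result is still only indicative.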
While Python's sorting optimization works well with built-in types, it's important to understand how it interacts with custom classes.
When sorting objects of custom classes, Python relies on the comparison methods you define, such as __lt__ (less than) or __gt__ (greater than).
However, the type-specific optimization doesn't apply to custom classes.
Python will always use the general comparison method for these objects.
Here's an example:
class MyClass:
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value < other.value

my_list = [MyClass(3), MyClass(1), MyClass(2)]
sorted_list = sorted(my_list)
In this case, Python will use the __lt__ method for comparisons, but it won't benefit from the type-specific optimization. The sorting will still work correctly, but it may not be as fast as sorting built-in types.
If performance is critical when sorting custom objects, consider using a key function that returns a built-in type:
sorted_list = sorted(my_list, key=lambda x: x.value)
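To see that a key function really moves the comparisons onto the built-in type, the sketch below counts calls to __lt__. The class Item and its lt_calls counter are hypothetical and exist only for this illustration. operator.attrgetter is used as an equivalent of the lambda above.

```python
from operator import attrgetter


class Item:
    """Hypothetical class that counts how often its __lt__ is called."""

    lt_calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Item.lt_calls += 1
        return self.value < other.value


items = [Item(3), Item(1), Item(2)]

by_lt = sorted(items)                              # compares Item objects via __lt__
calls_without_key = Item.lt_calls

Item.lt_calls = 0
by_key = sorted(items, key=attrgetter("value"))    # compares the int keys instead
calls_with_key = Item.lt_calls
```

With the key function, Item.__lt__ is never called, because only the extracted integers are compared. Building the keys has its own cost, so whether this is faster overall is worth measuring for your data.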
Premature optimization, especially in Python, is evil.
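You should not design an entire application around specific optimizations in CPython, but it is good to be aware of them: knowing your tools well is one way to become a more skilled developer.
Being mindful of such optimizations lets you take advantage of them when the situation calls for it, especially when performance becomes critical:
Consider a scenario in which sorting is based on timestamps: using a homogeneous list of integers (Unix timestamps) instead of datetime objects can make effective use of this optimization. Note, though, that present-day timestamps in whole seconds already exceed 2^30 - 1, so on typical builds they miss the small-integer fast path described above; float timestamps have no such constraint.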
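The timestamp scenario might be sketched as follows, assuming timezone-aware datetime objects and using datetime.timestamp as the key function, which yields float keys. The sample events are made up for illustration, and any speed-up should be measured rather than assumed.

```python
from datetime import datetime, timezone

events = [
    datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2023, 1, 15, 8, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
]

# Sorting the datetime objects directly compares datetime instances.
by_datetime = sorted(events)

# Sorting by timestamp compares a homogeneous list of float keys instead.
by_timestamp = sorted(events, key=datetime.timestamp)

keys = [event.timestamp() for event in events]
```

Both calls produce the same order; the only difference is which type is being compared during the sort.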
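However, it is important to remember that code readability and maintainability should take priority over such optimizations.
While it is important to know these low-level details, it is just as important to appreciate the high-level abstractions that make Python such a productive language.
Python is an amazing language, and exploring its depths can help you understand it better and become a better Python programmer.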