
Python standard library functools/itertools/operator

高洛峰 · Original · 2017-02-09 11:04:56

Introduction

functools, itertools, and operator are the three modules in the Python standard library that support functional programming. Used properly, they let us write more concise and readable Pythonic code. The examples below walk through how each of the three modules is used.

Usage of functools

functools is a very important module in Python: it provides some very useful higher-order functions. A higher-order function is one that accepts a function as a parameter or returns a function as its result. Because functions in Python are also objects, it is easy to support such functional features.

partial

>>> from functools import partial

>>> basetwo = partial(int, base=2)

>>> basetwo('10010')
18

basetwo('10010') is equivalent to calling int('10010', base=2). When a function takes many parameters, functools.partial lets you create a new function with some of them already filled in, which simplifies the calling code and improves readability. Internally, partial is essentially implemented with a simple closure:

def partial(func, *args, **keywords):
    def newfunc(*fargs, **fkeywords):
        newkeywords = keywords.copy()
        newkeywords.update(fkeywords)
        return func(*args, *fargs, **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc
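The real functools.partial object exposes the same func, args and keywords attributes as the closure sketch above, so the pre-filled pieces can be inspected; a quick check:

from functools import partial

basetwo = partial(int, base=2)
print(basetwo.func is int)   # True: the wrapped callable
print(basetwo.args)          # (): no pre-bound positional arguments
print(basetwo.keywords)      # {'base': 2}: the pre-bound keyword arguments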
partialmethod

partialmethod is similar to partial, but it is used inside a class body to bind a function that is not originally a method of that class; in that situation only partialmethod works. The following example shows the difference between the two.

from functools import partial, partialmethod


def standalone(self, a=1, b=2):
    "Standalone function"
    print('  called standalone with:', (self, a, b))
    if self is not None:
        print('  self.attr =', self.attr)


class MyClass:
    "Demonstration class for functools"
    def __init__(self):
        self.attr = 'instance attribute'
    method1 = partialmethod(standalone)  # using partialmethod
    method2 = partial(standalone)        # using partial
>>> o = MyClass()

>>> o.method1()
  called standalone with: (<__main__.MyClass object at 0x7f46d40cc550>, 1, 2)
  self.attr = instance attribute

# partial does not work here
>>> o.method2()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: standalone() missing 1 required positional argument: 'self'
singledispatch

Although Python does not support overloading a method with different parameter types under the same name, we can use singledispatch to register different implementations for different types of the first argument and dispatch to them dynamically, instead of writing type checks inside the method body, which would hurt readability.

from functools import singledispatch


class TestClass(object):
    @singledispatch
    def test_method(arg, verbose=False):
        if verbose:
            print("Let me just say,", end=" ")
        print(arg)

    @test_method.register(int)
    def _(arg):
        print("Strength in numbers, eh?", end=" ")
        print(arg)

    @test_method.register(list)
    def _(arg):
        print("Enumerate this:")

        for i, elem in enumerate(arg):
            print(i, elem)
@test_method.register(int) and @test_method.register(list) above register different handlers for the cases where the first argument of test_method is an int or a list; the calls below show how they are dispatched.

>>> TestClass.test_method(55555)  # call @test_method.register(int)
Strength in numbers, eh? 55555

>>> TestClass.test_method([33, 22, 11])   # call @test_method.register(list)
Enumerate this:
0 33
1 22
2 11

>>> TestClass.test_method('hello world', verbose=True)  # call default
Let me just say, hello world
wraps

A decorator replaces the decorated function with its wrapper, so the function's original __name__ and __doc__ attributes are lost; @wraps restores them.

from functools import wraps


def my_decorator(f):
    @wraps(f)
    def wrapper():
        """wrapper_doc"""
        print('Calling decorated function')
        return f()
    return wrapper


@my_decorator
def example():
    """example_doc"""
    print('Called example function')
>>> example.__name__
'example'
>>> example.__doc__
'example_doc'

# Try removing @wraps(f) and run it again: example's own __name__ and __doc__ are lost
>>> example.__name__
'wrapper'
>>> example.__doc__
'wrapper_doc'
We can also achieve the same effect by calling update_wrapper directly:

from functools import update_wrapper


def g():
    ...
g = update_wrapper(g, f)


# equal to
@wraps(f)
def g():
    ...
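For a runnable version of this idea, here is the earlier my_decorator rewritten (as a sketch) to call update_wrapper directly instead of using @wraps; the observable behaviour is the same:

from functools import update_wrapper


def my_decorator(f):
    def wrapper(*args, **kwargs):
        """wrapper_doc"""
        print('Calling decorated function')
        return f(*args, **kwargs)
    return update_wrapper(wrapper, f)  # copies __name__, __doc__, __module__, __dict__, ...


@my_decorator
def example():
    """example_doc"""
    print('Called example function')


print(example.__name__)  # 'example'
print(example.__doc__)   # 'example_doc'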
@wraps is actually implemented internally based on update_wrapper.

def wraps(wrapped, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES):
    def decorator(wrapper):
        return update_wrapper(wrapper, wrapped=wrapped,
                              assigned=assigned, updated=updated)
    return decorator
lru_cache

lru_cache, like singledispatch, is a piece of "black magic" that gets heavy use in day-to-day development. For repetitive computations, caching results is essential for speed. Let's use a Fibonacci example to see the difference in speed with and without lru_cache.

# clockdeco.py

import time
import functools


def clock(func):
    @functools.wraps(func)
    def clocked(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - t0
        name = func.__name__
        arg_lst = []
        if args:
            arg_lst.append(', '.join(repr(arg) for arg in args))
        if kwargs:
            pairs = ['%s=%r' % (k, w) for k, w in sorted(kwargs.items())]
            arg_lst.append(', '.join(pairs))
        arg_str = ', '.join(arg_lst)
        print('[%0.8fs] %s(%s) -> %r ' % (elapsed, name, arg_str, result))
        return result
    return clocked

Without lru_cache

from clockdeco import clock


@clock
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-2) + fibonacci(n-1)


if __name__ == '__main__':
    print(fibonacci(6))

The results below show that during the recursion fibonacci(n) is recomputed many times for the same n, which wastes both time and resources.

[0.00000119s] fibonacci(0) -> 0 
[0.00000143s] fibonacci(1) -> 1 
[0.00021172s] fibonacci(2) -> 1 
[0.00000072s] fibonacci(1) -> 1 
[0.00000095s] fibonacci(0) -> 0 
[0.00000095s] fibonacci(1) -> 1 
[0.00011444s] fibonacci(2) -> 1 
[0.00022793s] fibonacci(3) -> 2 
[0.00055265s] fibonacci(4) -> 3 
[0.00000072s] fibonacci(1) -> 1 
[0.00000072s] fibonacci(0) -> 0 
[0.00000095s] fibonacci(1) -> 1 
[0.00011158s] fibonacci(2) -> 1 
[0.00022268s] fibonacci(3) -> 2 
[0.00000095s] fibonacci(0) -> 0 
[0.00000095s] fibonacci(1) -> 1 
[0.00011349s] fibonacci(2) -> 1 
[0.00000072s] fibonacci(1) -> 1 
[0.00000095s] fibonacci(0) -> 0 
[0.00000095s] fibonacci(1) -> 1 
[0.00010705s] fibonacci(2) -> 1 
[0.00021267s] fibonacci(3) -> 2 
[0.00043225s] fibonacci(4) -> 3 
[0.00076509s] fibonacci(5) -> 5 
[0.00142813s] fibonacci(6) -> 8 
8

With lru_cache

import functools
from clockdeco import clock


@functools.lru_cache() # 1
@clock # 2
def fibonacci(n):
    if n < 2:
       return n
    return fibonacci(n-2) + fibonacci(n-1)

if __name__ == '__main__':
    print(fibonacci(6))

The results below show that each value is computed only once; afterwards it is served from the cache.

[0.00000095s] fibonacci(0) -> 0 
[0.00005770s] fibonacci(1) -> 1 
[0.00015855s] fibonacci(2) -> 1 
[0.00000286s] fibonacci(3) -> 2 
[0.00021124s] fibonacci(4) -> 3 
[0.00000191s] fibonacci(5) -> 5 
[0.00024652s] fibonacci(6) -> 8 
8
The argument chosen above is small; try a larger number to see how big the speed difference between the two versions becomes.
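lru_cache also accepts a maxsize argument (None means an unbounded cache) and exposes hit/miss statistics through cache_info(); a minimal sketch:

import functools


@functools.lru_cache(maxsize=128)
def fibonacci(n):
    return n if n < 2 else fibonacci(n - 2) + fibonacci(n - 1)


print(fibonacci(100))          # 354224848179261915075, computed almost instantly
print(fibonacci.cache_info())  # CacheInfo(hits=..., misses=..., maxsize=128, currsize=...)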

total_ordering

In Python 2 you could control how objects compare by defining __cmp__ and returning -1/0/1. In Python 3 __cmp__ is gone; instead, functools.total_ordering lets you customise a class's comparison rules from a minimal set of rich-comparison methods such as __lt__(), __le__(), __gt__(), __ge__(), __eq__() and __ne__(). Note: to use it, the class must define __eq__() plus at least one of __lt__(), __le__(), __gt__() or __ge__(); the decorator fills in the rest.

import functools


@functools.total_ordering
class MyObject:
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        print('  testing __eq__({}, {})'.format(
            self.val, other.val))
        return self.val == other.val

    def __gt__(self, other):
        print('  testing __gt__({}, {})'.format(
            self.val, other.val))
        return self.val > other.val


a = MyObject(1)
b = MyObject(2)

for expr in ['a < b', 'a <= b', 'a == b', 'a >= b', 'a > b']:
    print('\n{:<6}:'.format(expr))
    result = eval(expr)
    print('  result of {}: {}'.format(expr, result))

The following are the running results:

a < b :
  testing __gt__(1, 2)
  testing __eq__(1, 2)
  result of a < b: True

a <= b:
  testing __gt__(1, 2)
  result of a <= b: True

a == b:
  testing __eq__(1, 2)
  result of a == b: False

a >= b:
  testing __gt__(1, 2)
  testing __eq__(1, 2)
  result of a >= b: False

a > b :
  testing __gt__(1, 2)
  result of a > b: False
Usage of itertools

itertools provides a set of very useful functions for working with iterable objects.

Infinite iterator

count

count(start=0, step=1) returns an infinite iterator of numbers, starting at start (default 0) and increasing by step (default 1) on each iteration.

>>> from itertools import count

>>> for i in zip(count(1), ['a', 'b', 'c']):
...     print(i, end=' ')
...
(1, 'a') (2, 'b') (3, 'c')
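count also accepts a step argument; continuing the same session, a small sketch:

>>> list(zip(count(10, 2), ['a', 'b', 'c']))
[(10, 'a'), (12, 'b'), (14, 'c')]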
cycle

cycle(iterable) repeats the elements of the given sequence indefinitely.

>>> from itertools import cycle

>>> for i in zip(range(6), cycle(['a', 'b', 'c'])):
...     print(i, end=' ')
...
(0, 'a') (1, 'b') (2, 'c') (3, 'a') (4, 'b') (5, 'c')
repeat

repeat(object[, times]) returns an iterator that yields the same object over and over; the optional times argument limits the number of repetitions.

>>> from itertools import repeat

>>> for i, s in zip(count(1), repeat('over-and-over', 5)):
...     print(i, s)
...
1 over-and-over
2 over-and-over
3 over-and-over
4 over-and-over
5 over-and-over
Iterators terminating on the shortest input sequence

accumulate

accumulate(iterable[, func]) returns the running results of applying func (addition by default) to the elements of iterable:

>>> from itertools import accumulate
>>> import operator

>>> list(accumulate([1, 2, 3, 4, 5], operator.add))
[1, 3, 6, 10, 15]

>>> list(accumulate([1, 2, 3, 4, 5], operator.mul))
[1, 2, 6, 24, 120]
chain

itertools.chain(*iterables) combines multiple iterables into a single iterator.

>>> from itertools import chain

>>> list(chain([1, 2, 3], ['a', 'b', 'c']))
[1, 2, 3, 'a', 'b', 'c']
chain is implemented roughly as follows:

def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element
chain.from_iterable

chain.from_iterable(iterable) is similar to chain, but it takes a single iterable of iterables and chains their elements into one iterator.

>>> from itertools import chain

>>> list(chain.from_iterable(['ABC', 'DEF']))
['A', 'B', 'C', 'D', 'E', 'F']
Its rough implementation is also similar to chain's:

def from_iterable(iterables):
    # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
    for it in iterables:
        for element in it:
            yield element
compress

compress(data, selectors) takes two iterables and yields only those elements of data whose corresponding element in selectors is truthy; it stops as soon as either data or selectors is exhausted.

>>> list(compress([1, 2, 3, 4, 5], [True, True, False, False, True]))
[1, 2, 5]
zip_longest

zip_longest(*iterables, fillvalue=None) is similar to zip, but whereas zip stops as soon as any of its inputs is exhausted, zip_longest continues until the longest input is exhausted, filling in missing values with fillvalue. The example below shows the difference.

from itertools import zip_longest

r1 = range(3)
r2 = range(2)

print('zip stops early:')
print(list(zip(r1, r2)))

r1 = range(3)
r2 = range(2)

print('\nzip_longest processes all of the values:')
print(list(zip_longest(r1, r2)))
The following is the output result

zip stops early:
[(0, 0), (1, 1)]

zip_longest processes all of the values:
[(0, 0), (1, 1), (2, None)]
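A fillvalue other than the default None can be supplied; a small REPL sketch:

>>> from itertools import zip_longest
>>> list(zip_longest(range(3), range(2), fillvalue=-1))
[(0, 0), (1, 1), (2, -1)]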

islice

islice(iterable, stop) or islice(iterable, start, stop[, step]) works much like Python's string and list slicing, except that negative values are not allowed for start, stop or step.

>>> from itertools import islice

>>> for i in islice(range(100), 0, 100, 10):
...     print(i, end=' ')
...
0 10 20 30 40 50 60 70 80 90

tee

tee(iterable, n=2) returns n independent iterators over the same input; n defaults to 2.

from itertools import count, islice, tee

r = islice(count(), 5)
i1, i2 = tee(r)

print('i1:', list(i1))
print('i2:', list(i2))

for i in r:
    print(i, end=' ')
    if i > 1:
        break

Below is the output. Note that after calling tee(r), r itself should no longer be used as an iterator (here it has already been consumed), so the for loop prints nothing.

i1: [0, 1, 2, 3, 4]
i2: [0, 1, 2, 3, 4]

starmap

starmap(func, iterable) assumes that iterable yields a stream of tuples and calls func with each tuple unpacked as its arguments:

>>> from itertools import starmap
>>> import os

>>> iterator = starmap(os.path.join,
...                    [('/bin', 'python'), ('/usr', 'bin', 'java'),
...                    ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')])

>>> list(iterator)
['/bin/python', '/usr/bin/java', '/usr/bin/perl', '/usr/bin/ruby']

filterfalse

filterfalse(predicate, iterable) is the opposite of filter(): it returns every element for which predicate returns False.

itertools.filterfalse(is_even, itertools.count()) =>
1, 3, 5, 7, 9, 11, 13, 15, ...
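The snippet above is schematic (is_even is not defined there); a runnable sketch of the same idea, using islice to truncate the infinite stream:

from itertools import count, filterfalse, islice


def is_even(x):
    return x % 2 == 0


print(list(islice(filterfalse(is_even, count()), 8)))
# [1, 3, 5, 7, 9, 11, 13, 15]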

takewhile

takewhile(predicate, iterable) keeps returning elements from iterable as long as predicate returns True; as soon as predicate returns False, the iteration stops.

def less_than_10(x):
    return x < 10

itertools.takewhile(less_than_10, itertools.count())
=> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

itertools.takewhile(is_even, itertools.count())
=> 0

dropwhile

dropwhile(predicate, iterable) discards elements while predicate returns True, then returns the rest of the iteration.

itertools.dropwhile(less_than_10, itertools.count())
=> 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...

itertools.dropwhile(is_even, itertools.count())
=> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
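Again, the snippets above are schematic; a runnable sketch that combines takewhile and dropwhile with islice:

from itertools import count, dropwhile, islice, takewhile


def less_than_10(x):
    return x < 10


print(list(takewhile(less_than_10, count())))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(islice(dropwhile(less_than_10, count()), 5)))
# [10, 11, 12, 13, 14]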

groupby

groupby(iterable, key=None) gathers adjacent equal elements (or elements with equal key values) of the iterator into groups. Note: the input sequence needs to be sorted on the key value for the groupings to work out as expected.

  • [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B

  • [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D

>>> import itertools

>>> for key, group in itertools.groupby('AAAABBBCCDAABBB'):
...     print(key, list(group))
...
A ['A', 'A', 'A', 'A']
B ['B', 'B', 'B']
C ['C', 'C']
D ['D']
A ['A', 'A']
B ['B', 'B', 'B']
city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
             ('Anchorage', 'AK'), ('Nome', 'AK'),
             ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'),
             ...
            ]

def get_state(city_state):
    return city_state[1]

itertools.groupby(city_list, get_state) =>
  ('AL', iterator-1),
  ('AK', iterator-2),
  ('AZ', iterator-3), ...

iterator-1 =>  ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')
iterator-2 => ('Anchorage', 'AK'), ('Nome', 'AK')
iterator-3 => ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
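A runnable sketch of the same grouping, using a shortened city list and operator.itemgetter as the key function:

from itertools import groupby
from operator import itemgetter

city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'),
             ('Anchorage', 'AK'), ('Nome', 'AK'),
             ('Flagstaff', 'AZ'), ('Phoenix', 'AZ')]

for state, cities in groupby(city_list, key=itemgetter(1)):
    print(state, [name for name, _ in cities])
# AL ['Decatur', 'Huntsville']
# AK ['Anchorage', 'Nome']
# AZ ['Flagstaff', 'Phoenix']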

Combinatoric generators

product

product(*iterables, repeat=1)

  • product(A, B) returns the same as ((x,y) for x in A for y in B)

  • product(A, repeat=4) means the same as product(A, A, A, A)

from itertools import product


def show(iterable):
    for i, item in enumerate(iterable, 1):
        print(item, end=' ')
        if (i % 3) == 0:
            print()
    print()


print('Repeat 2:\n')
show(product(range(3), repeat=2))

print('Repeat 3:\n')
show(product(range(3), repeat=3))
Repeat 2:

(0, 0) (0, 1) (0, 2)
(1, 0) (1, 1) (1, 2)
(2, 0) (2, 1) (2, 2)

Repeat 3:

(0, 0, 0) (0, 0, 1) (0, 0, 2)
(0, 1, 0) (0, 1, 1) (0, 1, 2)
(0, 2, 0) (0, 2, 1) (0, 2, 2)
(1, 0, 0) (1, 0, 1) (1, 0, 2)
(1, 1, 0) (1, 1, 1) (1, 1, 2)
(1, 2, 0) (1, 2, 1) (1, 2, 2)
(2, 0, 0) (2, 0, 1) (2, 0, 2)
(2, 1, 0) (2, 1, 1) (2, 1, 2)
(2, 2, 0) (2, 2, 1) (2, 2, 2)
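To make the first bullet above concrete, here is product over two different iterables (a small REPL sketch):

>>> from itertools import product
>>> list(product('ab', [1, 2]))
[('a', 1), ('a', 2), ('b', 1), ('b', 2)]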

permutations

permutations(iterable, r=None) returns all possible orderings (permutations) of length r; r defaults to the full length of the iterable.

from itertools import permutations


def show(iterable):
    first = None
    for i, item in enumerate(iterable, 1):
        if first != item[0]:
            if first is not None:
                print()
            first = item[0]
        print(''.join(item), end=' ')
    print()


print('All permutations:\n')
show(permutations('abcd'))

print('\nPairs:\n')
show(permutations('abcd', r=2))

The output is:

All permutations:

abcd abdc acbd acdb adbc adcb
bacd badc bcad bcda bdac bdca
cabd cadb cbad cbda cdab cdba
dabc dacb dbac dbca dcab dcba

Pairs:

ab ac ad
ba bc bd
ca cb cd
da db dc

combinations

combinations(iterable, r) returns an iterator of r-tuples covering all possible combinations of the elements of iterable. Within each tuple, the elements keep the order in which they appear in iterable. In the example below, unlike permutations above, a always comes before b, c and d; b always before c and d; and c always before d.

from itertools import combinations


def show(iterable):
    first = None
    for i, item in enumerate(iterable, 1):
        if first != item[0]:
            if first is not None:
                print()
            first = item[0]
        print(''.join(item), end=' ')
    print()


print('Unique pairs:\n')
show(combinations('abcd', r=2))

The output is:

Unique pairs:

ab ac ad
bc bd
cd

combinations_with_replacement

combinations_with_replacement(iterable, r) relaxes a constraint: elements may repeat within a single tuple, so combinations such as aa, bb, cc and dd also appear.

from itertools import combinations_with_replacement


def show(iterable):
    first = None
    for i, item in enumerate(iterable, 1):
        if first != item[0]:
            if first is not None:
                print()
            first = item[0]
        print(''.join(item), end=' ')
    print()


print('Unique pairs:\n')
show(combinations_with_replacement('abcd', r=2))

The output is:

aa ab ac ad
bb bc bd
cc cd
dd

Usage of operator

attrgetter

operator.attrgetter(attr) and operator.attrgetter(*attrs)

  • After f = attrgetter('name'), the call f(b) returns b.name.

  • After f = attrgetter('name', 'date'), the call f(b) returns (b.name, b.date).

  • After f = attrgetter('name.first', 'name.last'), the call f(b) returns (b.name.first, b.name.last).

The following example shows how attrgetter is used.

>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))

>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]

>>> sorted(student_objects, key=lambda student: student.age)   # the traditional lambda approach
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> from operator import itemgetter, attrgetter

>>> sorted(student_objects, key=attrgetter('age'))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

# But for a two-key comparison like the one below, Python's limited lambda is no longer a good fit
>>> sorted(student_objects, key=attrgetter('grade', 'age'))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
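The dotted form mentioned in the bullets above also works; a minimal sketch using namedtuples (the Person and Name types here are just for illustration):

from collections import namedtuple
from operator import attrgetter

Name = namedtuple('Name', 'first last')
Person = namedtuple('Person', 'name age')

p = Person(Name('Ada', 'Lovelace'), 36)
print(attrgetter('name.first', 'name.last')(p))   # ('Ada', 'Lovelace')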

attrgetter is implemented roughly as follows:

def attrgetter(*items):
    if any(not isinstance(item, str) for item in items):
        raise TypeError('attribute name must be a string')
    if len(items) == 1:
        attr = items[0]
        def g(obj):
            return resolve_attr(obj, attr)
    else:
        def g(obj):
            return tuple(resolve_attr(obj, attr) for attr in items)
    return g

def resolve_attr(obj, attr):
    for name in attr.split("."):
        obj = getattr(obj, name)
    return obj

itemgetter

operator.itemgetter(item) and operator.itemgetter(*items)

  • After f = itemgetter(2), the call f(r) returns r[2].

  • After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3]).

The following example shows how itemgetter is used.

>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]

>>> sorted(student_tuples, key=lambda student: student[2])   # the traditional lambda approach
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> from operator import itemgetter

>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

# But for a two-key comparison like the one below, Python's limited lambda is no longer a good fit
>>> sorted(student_tuples, key=itemgetter(1,2))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

itemgetter is implemented roughly as follows:

def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g

methodcaller

operator.methodcaller(name[, args...])

  • After f = methodcaller('name'), the call f(b) returns b.name().

  • After f = methodcaller('name', 'foo', bar=1), the call f(b) returns b.name('foo', bar=1).

methodcaller is implemented roughly as follows:

def methodcaller(name, *args, **kwargs):
    def caller(obj):
        return getattr(obj, name)(*args, **kwargs)
    return caller
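A minimal usage sketch (the string methods here are just examples):

from operator import methodcaller

upcase = methodcaller('upper')
print(upcase('hello'))                 # 'HELLO'

hyphenate = methodcaller('replace', ' ', '-')
print(hyphenate('functional python'))  # 'functional-python'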


