Home >Backend Development >Python Tutorial >Detailed explanation of the implementation principle of Python probes

Detailed explanation of the implementation principle of Python probes

高洛峰
高洛峰Original
2017-03-04 16:07:362191browse

This article will briefly describe the implementation principle of Python probes. At the same time, in order to verify this principle, we will also implement a simple probe program that counts the execution time of a specified function.

The implementation of the probe mainly involves the following knowledge points:

sys.meta_path
sitecustomize.py
sys.meta_path

sys.meta_path This is simple In other words, the function of import hook can be realized.
When import-related operations are performed, the objects defined in the sys.meta_path list will be triggered.
For more detailed information about sys.meta_path, please refer to the sys.meta_path related content in the python documentation and
PEP 0302.

The objects in sys.meta_path need to implement a find_module method,
This find_module method returns None or an object that implements the load_module method
(The code can be downloaded from github part1):

import sys
 
class MetaPathFinder:
 
  def find_module(self, fullname, path=None):
    print('find_module {}'.format(fullname))
    return MetaPathLoader()
 
class MetaPathLoader:
 
  def load_module(self, fullname):
    print('load_module {}'.format(fullname))
    sys.modules[fullname] = sys
    return sys
 
sys.meta_path.insert(0, MetaPathFinder())
 
if __name__ == '__main__':
  import http
  print(http)
  print(http.version_info)

The load_module method returns a module object, which is the module object of the import.
For example, I replaced http with the sys module as I did above.

$ python meta_path1.py
find_module http
load_module http

sys.version_info(major=3, minor=5, micro=1, releaselevel='final', serial =0)
Through sys.meta_path we can realize the function of import hook:
When importing the scheduled module, the object in this module will be replaced with a civet cat,
so as to obtain the function or method Execution time and other detection information.

The above mentioned the civet cat for the prince, so how to perform the operation of civet cat for the prince on an object?
For function objects, we can use decorators to replace function objects (the code can be downloaded from github part2):

import functools
import time
 
def func_wrapper(func):
  @functools.wraps(func)
  def wrapper(*args, **kwargs):
    print('start func')
    start = time.time()
    result = func(*args, **kwargs)
    end = time.time()
    print('spent {}s'.format(end - start))
    return result
  return wrapper
 
def sleep(n):
  time.sleep(n)
  return n
 
if __name__ == '__main__':
  func = func_wrapper(sleep)
  print(func(3))

Execution results:

$ python func_wrapper.py
start func
spent 3.004966974258423s
3

Let's implement a function to calculate the execution time of a specified function of a specified module (the code can be downloaded from github part3).

Suppose our module file is hello.py:

import time
 
def sleep(n):
  time.sleep(n)
  return n

Our import hook is hook.py:

import functools
import importlib
import sys
import time
 
_hook_modules = {'hello'}
 
class MetaPathFinder:
 
  def find_module(self, fullname, path=None):
    print('find_module {}'.format(fullname))
    if fullname in _hook_modules:
      return MetaPathLoader()
 
class MetaPathLoader:
 
  def load_module(self, fullname):
    print('load_module {}'.format(fullname))
    # ``sys.modules`` 中保存的是已经导入过的 module
    if fullname in sys.modules:
      return sys.modules[fullname]
 
    # 先从 sys.meta_path 中删除自定义的 finder
    # 防止下面执行 import_module 的时候再次触发此 finder
    # 从而出现递归调用的问题
    finder = sys.meta_path.pop(0)
    # 导入 module
    module = importlib.import_module(fullname)
 
    module_hook(fullname, module)
 
    sys.meta_path.insert(0, finder)
    return module
 
sys.meta_path.insert(0, MetaPathFinder())
 
def module_hook(fullname, module):
  if fullname == 'hello':
    module.sleep = func_wrapper(module.sleep)
 
def func_wrapper(func):
  @functools.wraps(func)
  def wrapper(*args, **kwargs):
    print('start func')
    start = time.time()
    result = func(*args, **kwargs)
    end = time.time()
    print('spent {}s'.format(end - start))
    return result
  return wrapper

Test code:

>>> import hook
>>> import hello
find_module hello
load_module hello
>>>
>>> hello.sleep(3)
start func
spent 3.0029919147491455s
3
>>>

In fact, the above code has realized the basic functions of the probe. However, there is a problem that the above code needs to execute the import hook operation to register the hook we defined.

So is there a way to automatically execute the import hook operation when starting the python interpreter?

The answer is that this function can be achieved by defining sitecustomize.py.

sitecustomize.py

To put it simply, when the python interpreter is initialized, it will automatically import the sitecustomize and usercustomize modules that exist under PYTHONPATH:

The directory structure of the experimental project is as follows (the code can Download part4 from github)

$ tree

.
├── sitecustomize.py
└── usercustomize.py
sitecustomize.py:

$ cat sitecustomize.py

print('this is sitecustomize')
usercustomize.py:

$ cat usercustomize.py

print('this is usercustomize')
Change the current directory Add it to PYTHONPATH, and then see the effect:

$ export PYTHONPATH=.
$ python
this is sitecustomize    <----
this is usercustomize    <----
Python 3.5.1 (default, Dec 24 2015, 17:20:27)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

You can see that it is indeed imported automatically. So we can change the previous detection program to support automatic execution of import hook (the code can be downloaded from github part5).

Directory structure:

$ tree

.
├── hello.py
├── hook.py
├── sitecustomize.py
sitecustomize.py:

$ cat sitecustomize.py
import hook

Result:

$ export PYTHONPATH=.
$ python
find_module usercustomize
Python 3.5.1 (default, Dec 24 2015, 17:20:27)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
find_module readline
find_module atexit
find_module rlcompleter
>>>
>>> import hello
find_module hello
load_module hello
>>>
>>> hello.sleep(3)
start func
spent 3.005002021789551s
3

But the above detection program In fact, there is another problem, that is, PYTHONPATH needs to be modified manually. Friends who have used probe programs will remember that to use probes such as newrelic, you only need to execute one command: newrelic-admin run-program python hello.py In fact, the operation of modifying PYTHONPATH is in the newrelic-admin program. completed in.

Next we will also implement a similar command line program, let's call it agent.py.

agent

is still modified based on the previous program. First adjust a directory structure and put the hook operation in a separate directory so that there will be no other interference after setting PYTHONPATH (the code can be downloaded from github part6).

$ mkdir bootstrap
$ mv hook.py bootstrap/_hook.py
$ touch bootstrap/__init__.py
$ touch agent.py
$ tree
.
├── bootstrap
│  ├── __init__.py
│  ├── _hook.py
│  └── sitecustomize.py
├── hello.py
├── test.py
├── agent.py

The content of bootstrap/sitecustomize.py is modified to:

$ cat bootstrap/sitecustomize.py

import _hook
The content of agent.py is as follows:

<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
 
<span class="n">current_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">realpath</span><span class="p">(</span><span class="n">__file__</span><span class="p">))</span>
<span class="n">boot_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">current_dir</span><span class="p">,</span> <span class="s">&#39;bootstrap&#39;</span><span class="p">)</span>
 
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
  <span class="n">args</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
  <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s">&#39;PYTHONPATH&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">boot_dir</span>
  <span class="c"># 执行后面的 python 程序命令</span>
  <span class="c"># sys.executable 是 python 解释器程序的绝对路径 ``which python``</span>
  <span class="c"># >>> sys.executable</span>
  <span class="c"># &#39;/usr/local/var/pyenv/versions/3.5.1/bin/python3.5&#39;</span>
  <span class="n">os</span><span class="o">.</span><span class="n">execl</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">executable</span><span class="p">,</span> <span class="n">sys</span><span class="o">.</span><span class="n">executable</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
 
<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">&#39;__main__&#39;</span><span class="p">:</span>
  <span class="n">main</span><span class="p">()</span>

The content of test.py is:

$ cat test.py
import sys
import hello
 
print(sys.argv)
print(hello.sleep(3))

Usage:

$ python agent.py test.py arg1 arg2
find_module usercustomize
find_module hello
load_module hello
[&#39;test.py&#39;, &#39;arg1&#39;, &#39;arg2&#39;]
start func
spent 3.005035161972046s
3

At this point, we have implemented a simple python probe program. Of course, there is definitely a big gap compared with the actual probe program. This article mainly explains the implementation principle behind the probe.

If you are interested in the specific implementation of commercial probe programs, you can take a look at the source code of commercial python probes from foreign New Relic or domestic OneAPM, TingYun and other APM manufacturers. I believe you will find Some very interesting things.

For more detailed explanations of the implementation principles of Python probes, please pay attention to the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn