
A detailed introduction to how to write CUDA programs using Python

高洛峰
Release: 2017-03-28 09:29:19

This article introduces how to write CUDA programs using Python and is shared here as a reference. Follow along with the examples below.

There are two ways to write CUDA programs in Python:

* Numba
* PyCUDA

NumbaPro is now deprecated; its functionality has been split up and integrated into Accelerate and Numba, respectively.

Example

Numba

Numba optimizes Python code with a just-in-time (JIT) compilation mechanism. It can optimize for the local hardware environment, supports both CPU and GPU acceleration, and integrates with NumPy so that Python code can run on the GPU. All you need to do is add the appropriate decorator above a function, as follows:

import numpy as np
from timeit import default_timer as timer
from numba import vectorize

# Compile a ufunc that runs element-wise on the GPU
@vectorize(["float32(float32, float32)"], target='cuda')
def vectorAdd(a, b):
    return a + b

def main():
    N = 320000000
    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)
    C = np.zeros(N, dtype=np.float32)

    start = timer()
    C = vectorAdd(A, B)
    vectorAdd_time = timer() - start

    print("c[:5] = " + str(C[:5]))
    print("c[-5:] = " + str(C[-5:]))
    print("vectorAdd took %f seconds" % vectorAdd_time)

if __name__ == '__main__':
    main()
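Numba can also compile kernels written directly in Python (this is what the comparison at the end refers to). The following is a minimal sketch, not part of the original article, of the same vector addition written as an explicit kernel with numba.cuda; the function and variable names are illustrative only:

import numpy as np
from numba import cuda

@cuda.jit
def vector_add_kernel(a, b, c):
    # absolute index of this thread in the 1-D grid
    i = cuda.grid(1)
    if i < c.size:  # guard threads that fall outside the array
        c[i] = a[i] + b[i]

def run(n=1024 * 1024):
    a = np.ones(n, dtype=np.float32)
    b = np.ones(n, dtype=np.float32)
    c = np.zeros(n, dtype=np.float32)
    threads_per_block = 256
    blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
    # Numba copies the NumPy arrays to the device and back automatically
    vector_add_kernel[blocks_per_grid, threads_per_block](a, b, c)
    print(c[:5])

if __name__ == '__main__':
    run()

Compared with @vectorize, this gives explicit control over the thread and block layout, at the cost of writing the index arithmetic yourself.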

PyCUDA

The PyCUDA kernel is actually written in CUDA C/C++. It is compiled into GPU code at runtime, and the Python code then launches it and exchanges data with it, as shown below:

import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
from timeit import default_timer as timer
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void func(float *a, float *b, int N)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N)
    {
        return;
    }
    float temp_a = a[i];
    float temp_b = b[i];
    a[i] = (temp_a * 10 + 2) * ((temp_b + 2) * 10 - 5) * 5;
    // a[i] = a[i] + b[i];
}
""")
func = mod.get_function("func")

def test(N):
    # N = 1024 * 1024 * 90  # float: 4M = 1024 * 1024
    print("N = %d" % N)
    N = np.int32(N)  # matches the int parameter of the kernel
    a = np.random.randn(N).astype(np.float32)
    b = np.random.randn(N).astype(np.float32)
    # copy a to aa so the CPU can compute the same result for comparison
    aa = np.empty_like(a)
    aa[:] = a
    # GPU run
    nThreads = 256
    nBlocks = int((N + nThreads - 1) // nThreads)
    start = timer()
    func(
        drv.InOut(a), drv.In(b), N,
        block=(nThreads, 1, 1), grid=(nBlocks, 1))
    run_time = timer() - start
    print("gpu run time %f seconds" % run_time)
    # CPU run
    start = timer()
    aa = (aa * 10 + 2) * ((b + 2) * 10 - 5) * 5
    run_time = timer() - start
    print("cpu run time %f seconds" % run_time)
    # check result
    r = a - aa
    print(r.min(), r.max())

def main():
    for n in range(1, 10):
        N = 1024 * 1024 * (n * 10)
        print("------------%d---------------" % n)
        test(N)

if __name__ == '__main__':
    main()
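Note that the GPU time measured above includes the host-to-device and device-to-host copies performed by drv.In and drv.InOut, not just the kernel itself. Below is a minimal sketch, not part of the original article, of how kernel-only time could be measured with CUDA events and explicit device buffers; it assumes the func compiled by the SourceModule above is passed in, and the helper name kernel_time is illustrative:

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as drv

def kernel_time(func, N=1024 * 1024):
    # illustrative helper; `func` is the kernel compiled by SourceModule above
    a = np.random.randn(N).astype(np.float32)
    b = np.random.randn(N).astype(np.float32)

    # explicit transfers, so they stay outside the timed region
    a_gpu = drv.mem_alloc(a.nbytes)
    b_gpu = drv.mem_alloc(b.nbytes)
    drv.memcpy_htod(a_gpu, a)
    drv.memcpy_htod(b_gpu, b)

    nThreads = 256
    nBlocks = (N + nThreads - 1) // nThreads

    start, end = drv.Event(), drv.Event()
    start.record()
    func(a_gpu, b_gpu, np.int32(N),
         block=(nThreads, 1, 1), grid=(nBlocks, 1))
    end.record()
    end.synchronize()
    print("kernel time: %f ms" % start.time_till(end))

    drv.memcpy_dtoh(a, a_gpu)  # copy the result back to the host
    return a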

Comparison

Numba uses decorators to mark functions for acceleration (you can also write kernel functions in Python), which is similar in spirit to OpenACC, while PyCUDA requires you to write your own kernel, which is compiled at runtime and implemented on top of C/C++. In testing, the speed-ups achieved by the two approaches are basically the same. However, Numba behaves more like a black box, since you cannot see what it does internally, whereas PyCUDA is very explicit. The two approaches therefore suit different situations:

* If you just want to accelerate your own algorithm and do not care about CUDA programming itself, it is better to use Numba directly.

* If you want to learn or research CUDA programming, or to test whether an algorithm is feasible under CUDA, use PyCUDA.

* If the program you write will later be ported to C/C++, you should use PyCUDA, because the kernel you write with PyCUDA is itself written in CUDA C/C++.

