Python GPU programming

Python Problem Overview

I am currently working on a project in Python, and I would like to make use of the GPU for some calculations.

At first glance it seems like there are many tools available; at second glance, I feel like I'm missing something.

Copperhead looks awesome but has not yet been released. It would appear that I'm limited to writing low-level CUDA or OpenCL kernels; no Thrust, no CUDPP. If I'd like to have something sorted, I'm going to have to do it myself.

That doesn't seem quite right to me. Am I indeed missing something? Or is this GPU scripting not quite living up to the hype yet?

Edit: GPULIB seems like it might be what I need. Documentation is rudimentary, and the Python bindings are mentioned only in passing, but I'm applying for a download link right now. Does anyone have experience with it, or links to similar free-for-academic-use GPU libraries? ReEdit: OK, the Python bindings are in fact nonexistent.

Edit2: So I guess my best bet is to write something in C/CUDA and call that from Python?

Python Solutions

Solution 1 - Python

PyCUDA provides very good integration with CUDA and has several helper interfaces that make writing CUDA code easier than with the plain C API. Here is an example from the wiki which does a 2D FFT without needing any C code at all.
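A minimal sketch of that "no C code" style, assuming PyCUDA's `gpuarray` module: the array is copied to the device, scaled there with ordinary arithmetic, and copied back, with no kernel source written by hand. This is not the 2D FFT example from the wiki. The function name `double_on_gpu` and the NumPy fallback (taken when PyCUDA or a CUDA device is unavailable) are illustrative assumptions.

```python
import numpy as np


def double_on_gpu(values):
    """Return 2 * values as float32, computed on the GPU when PyCUDA is usable."""
    host = np.asarray(values, dtype=np.float32)
    try:
        # Importing pycuda.autoinit creates a CUDA context on the first device.
        import pycuda.autoinit  # noqa: F401
        import pycuda.gpuarray as gpuarray
    except Exception:
        # Assumption for this sketch: fall back to NumPy when PyCUDA
        # or a CUDA device is not available.
        return 2 * host

    device = gpuarray.to_gpu(host)  # host -> device copy
    doubled = 2 * device            # elementwise multiply runs on the GPU
    return doubled.get()            # device -> host copy


print(double_on_gpu([1.0, 2.0, 3.0]))
```

The same `GPUArray` object supports a limited set of NumPy-style operations, so simple elementwise work may not need a hand-written kernel at all.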

Solution 2 - Python

I will post here some information that I read on Reddit. It should be useful for people who arrive without a clear idea of what the different packages do and how they connect CUDA with Python:

From: Reddit

There's a lot of confusion in this thread about what various projects aim to do and how ready they are. There is no "GPU backend for NumPy" (much less for any of SciPy's functionality). There are a few ways to write CUDA code inside of Python, and some GPU array-like objects which support subsets of NumPy's ndarray methods (but not the rest of NumPy, like linalg, fft, etc.).

PyCUDA and PyOpenCL come closest. They eliminate a lot of the plumbing surrounding launching GPU kernels (simplified array creation and memory transfer, no need for manual deallocation, etc.). For the most part, however, you're still stuck writing CUDA kernels manually; they just happen to be inside your Python file as a triple-quoted string. PyCUDA's GPUArray does include some limited NumPy-like functionality, so if you're doing something very simple you might get away without writing any kernels yourself.
NumbaPro includes a "cuda.jit" decorator which lets you write CUDA kernels using Python syntax. It's not actually much of an advance over what PyCUDA does (quoted kernel source); it's just that your code now looks more Pythonic. It definitely doesn't, however, automatically run existing NumPy code on the GPU.
Theano lets you construct symbolic expression trees and then compiles them to run on the GPU. It's not NumPy and only has equivalents for a small subset of NumPy's functionality.
gnumpy is a thinly documented wrapper around CudaMat. The only supported element type is float32 and only a small subset of NumPy is implemented.
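To make the "triple-quoted string" point concrete, here is a minimal sketch of a hand-written CUDA kernel launched through PyCUDA's `SourceModule`. The kernel, the function name `add_vectors`, the block size of 128, and the NumPy fallback for machines without PyCUDA or a CUDA device are all illustrative assumptions, not something taken from the quoted thread.

```python
import numpy as np

# CUDA C source kept in an ordinary triple-quoted Python string.
KERNEL_SOURCE = """
__global__ void add_vectors(float *out, const float *a, const float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}
"""


def add_vectors(a, b):
    """Add two equal-length 1-D float32 vectors, on the GPU when possible."""
    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("expected two 1-D arrays of the same length")
    try:
        import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
        import pycuda.driver as drv
        from pycuda.compiler import SourceModule
    except Exception:
        # Assumption for this sketch: plain NumPy when no GPU stack is present.
        return a + b

    # Compile the string with nvcc and look up the kernel by name.
    kernel = SourceModule(KERNEL_SOURCE).get_function("add_vectors")
    out = np.empty_like(a)
    threads = 128
    blocks = (a.size + threads - 1) // threads
    # drv.In / drv.Out handle the host<->device copies around the launch.
    kernel(drv.Out(out), drv.In(a), drv.In(b), np.int32(a.size),
           block=(threads, 1, 1), grid=(blocks, 1))
    return out


print(add_vectors([1.0, 2.0], [3.0, 4.0]))
```

The Python side only handles compilation, memory transfer, and the launch; the indexing and bounds check still have to be written in CUDA C, which is the limitation the quoted post describes.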

Solution 3 - Python

I know that this thread is old, but I think I can add some relevant information that answers the question asked.

Continuum Analytics has a package containing libraries that handle the CUDA computing for you. Basically, you import a library and add a decorator to the function whose code needs to be parallelized. Thus, you don't need any knowledge of CUDA instructions.
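The decorator style can be sketched as follows. This is a stand-in rather than the commercial package itself: it assumes the open-source Numba project, whose `vectorize` decorator accepts `target="cuda"`. The function name `rel_diff`, the helper `build_rel_diff`, and the NumPy fallback for machines without Numba or a CUDA device are illustrative choices.

```python
import numpy as np


def build_rel_diff():
    """Return an elementwise function, GPU-compiled when Numba and CUDA are usable."""
    try:
        from numba import cuda, vectorize
        gpu_ok = cuda.is_available()
    except Exception:
        gpu_ok = False

    if not gpu_ok:
        # Assumption for this sketch: same formula evaluated by NumPy on the CPU.
        return lambda x, y: 2.0 * (x - y) / (x + y)

    # The decorator compiles the scalar function into a ufunc that runs on the GPU.
    @vectorize(["float32(float32, float32)"], target="cuda")
    def rel_diff(x, y):
        return 2.0 * (x - y) / (x + y)

    return rel_diff


rel_diff = build_rel_diff()
x = np.array([3.0, 1.0], dtype=np.float32)
y = np.array([1.0, 1.0], dtype=np.float32)
print(rel_diff(x, y))
```

The body of `rel_diff` is plain scalar Python; the decorator takes care of mapping it over the arrays and moving data to and from the device.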

Information can be found on the NVIDIA page

https://developer.nvidia.com/anaconda-accelerate

or you can go directly to the Continuum Analytics' page

https://store.continuum.io/cshop/anaconda/

There is a 30 day trial period and a free licence for academics.

I use this extensively, and it speeds up my code by a factor of 10 to 50.

Solution 4 - Python

Theano looks like it might be what you're looking for. From what I understand, it is very capable of doing some heavy mathematical lifting with the GPU and appears to be actively maintained.

Good luck!

Solution 5 - Python

Check this page for an open-source library distributed with Anaconda: https://www.anaconda.com/blog/developer-blog/open-sourcing-anaconda-accelerate/

"Today, we are releasing two new Numba sub-projects called pyculib and pyculib_sorting, which contain the NVIDIA GPU library Python wrappers and sorting functions from Accelerate. These wrappers work with NumPy arrays and Numba GPU device arrays to provide access to accelerated functions from:

cuBLAS: Linear algebra
cuFFT: Fast Fourier Transform
cuSparse: Sparse matrix operations
cuRand: Random number generation (host functions only)
Sorting: Fast sorting algorithms ported from CUB and ModernGPU

Going forward, the Numba project will take stewardship of pyculib and pyculib_sorting, releasing updates as needed when new Numba releases come out. These projects are BSD-licensed, just like Numba."

Solution 6 - Python

Have you taken a look at PyGPU?

http://fileadmin.cs.lth.se/cs/Personal/Calle_Lejdfors/pygpu/

Solution 7 - Python

I can recommend scikits.cuda, but for that you need to download the full version of CULA (free for students). Another option is CUV. If you are looking for something better and are ready to pay for it, you can also take a look at ArrayFire. Right now I am using scikits.cuda and am quite satisfied so far.

Content Type	Original Author	Original Content on Stackoverflow
Question	Eelco Hoogendoorn	View Question on Stackoverflow
Solution 1 - Python	Joseph Lisee	View Answer on Stackoverflow
Solution 2 - Python	Heberto Mayorquin	View Answer on Stackoverflow
Solution 3 - Python	Bogdan	View Answer on Stackoverflow
Solution 4 - Python	katzenklavier	View Answer on Stackoverflow
Solution 5 - Python	Cristiana SP	View Answer on Stackoverflow
Solution 6 - Python	onteria_	View Answer on Stackoverflow
Solution 7 - Python	Moj	View Answer on Stackoverflow