
Basic C++ Python extension

C++ and Python are very different programming languages: the first is compiled and low level, whereas the second is interpreted. C++ is a lot faster than Python, but can we combine the performance of C++ with the versatility of Python? Yes, we can, by writing C++ extensions and creating bindings for Python. In this post we will create a Python package with compiled code, using the pybind11 library to create the Python bindings. As usual, the code is available alongside the blog.

The C++ project

In a single repository we need Python code and C++ code to coexist. In this example we will write a C++ matrix multiplication that we want to expose to Python. We define the following file structure:

.
├── README.md
├── include
│   └── matmul.h
├── scripts
│   └── compile.sh
├── src
│   ├── bindings.cpp
│   ├── main.cpp
│   └── matmul.cpp
└── tests
    └── test_matmul.py

With the following contents for matmul.h:

#ifndef MATMUL_H
#define MATMUL_H

void matmul(const float* A, const float* B, float* C, int M, int N, int K);
void printmatrix(const float* A, int M, int N);

#endif

and matmul.cpp

#include <iostream>
#include "matmul.h"

// Matrices are indexed row-major in this example. E.g. if A is [M x N]
// If i,j are the row and column indices, the element A[i, j] is
// A[i, j] = A[i * N + j] // if row-major
// A[i, j] = A[j * M + i] // if column-major

void matmul(const float* A, const float* B, float* C, int M, int N, int K){
// Matrix multiplication, C[M x K] = A[M x N] * B[N x K]
// Multiplication is $\sum_n A[m, n] * B[n, k]$
    for(int m=0; m<M; m++){
        for(int k=0; k<K; k++){
            C[m * K + k] = 0;
            for(int n=0; n<N; n++){
                C[m * K + k] += A[m * N + n] * B[n * K + k];
            }
        }
    }
}

void printmatrix(const float* A, int M, int N) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            std::cout << A[i * N + j] << " ";
        }
        std::cout << "\n";
    }
}

This file contains just two functions. matmul takes three matrices A[M x N], B[N x K] and C[M x K] and stores the product $C = A \times B$ in C; printmatrix prints a matrix to standard output. The matrices are single-precision float arrays, stored in row-major order.
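To make the row-major indexing concrete, here is a small sketch of the same triple loop in pure Python, operating on flat lists. It only mirrors the C++ matmul above; the name matmul_flat and the 2 x 3 by 3 x 2 example values are mine, chosen for illustration.

```python
def matmul_flat(A, B, M, N, K):
    """Return C[M x K] = A[M x N] * B[N x K], all stored as flat row-major lists."""
    C = [0.0] * (M * K)
    for m in range(M):
        for k in range(K):
            acc = 0.0
            for n in range(N):
                # element (m, n) of A is A[m * N + n]; element (n, k) of B is B[n * K + k]
                acc += A[m * N + n] * B[n * K + k]
            C[m * K + k] = acc
    return C


# A is 2 x 3 and B is 3 x 2, both written row by row
A = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]
B = [7.0, 8.0,
     9.0, 10.0,
     11.0, 12.0]

C = matmul_flat(A, B, M=2, N=3, K=2)
print(C)  # [58.0, 64.0, 139.0, 154.0]
```

The first entry is 1*7 + 2*9 + 3*11 = 58, which is the dot product of the first row of A with the first column of B.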

We can use the matmul library in a main function to compile a binary and test that our function is correct. For that we define a main.cpp:

#include <cstdlib>   // srand, rand
#include <iostream>
#include "matmul.h"

#define M 32
#define N 64
#define K 32

void initializeMatrices(float* A, float* B) {
    srand(7);

    for (int i = 0; i < M * N; ++i)
        A[i] = (rand() % 100) / 10.0f;  // Random float in range [0, 9.9]

    for (int i = 0; i < N * K; ++i)
        B[i] = (rand() % 100) / 10.0f;  // Random float in range [0, 9.9]
}

int main() {

    float* A = new float[M * N];
    float* B = new float[N * K];
    float* C = new float[M * K];

    initializeMatrices(A, B);
    matmul(A, B, C, M, N, K);

    std::cout << "C = A x B:" << std::endl;
    printmatrix(C, M, K);

    delete[] A;
    delete[] B;
    delete[] C;

    return 0;
}

This is pretty simple code: we define the matrices, randomly initialize A and B (C needs no initialization, since matmul overwrites every element) and perform the multiplication. Then we print the result on screen. Let’s compile this main.cpp entry point. First we manually create our usual build directory, then we compile the objects, and lastly we link the objects into the final executable.

# create build directories
rm -rf build
mkdir -p build/obj
mkdir build/bin
mkdir build/lib

# compile to objects
g++ -std=c++17 -Iinclude -c src/matmul.cpp -o build/obj/matmul.o
g++ -std=c++17 -Iinclude -c src/main.cpp -o build/obj/main.o

# link all the objects
g++ build/obj/matmul.o \
    build/obj/main.o \
    -o build/bin/main

With this we can execute the binary and see the result of multiplying the two matrices:

./build/bin/main

Understanding what Python bindings are

Now the question is: how do we convert this code so that we can run it from Python? I would like to use the matmul function from Python. We first need to understand that Python is, in practice, an executable plus a collection of shared libraries that are loaded dynamically. Let’s create a new environment and inspect it:

python -m venv .venv
source .venv/bin/activate

First, with the tool otool on macOS (ldd on Linux), let’s see which libraries the python executable depends on. Type

otool -L .venv/bin/python

to see that

.venv/bin/python:
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 2420.0.0)
	/Users/sebas/.pyenv/versions/3.12.4/lib/libpython3.12.dylib (compatibility version 3.12.0, current version 3.12.0)
	/usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 13.0.0, current version 13.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1345.100.2)

These are the libraries that the python binary expects to load at runtime. The most important is libpython3.12.dylib (that’s for macOS; you would see a .so file on Linux or a .dll file on Windows). It contains the compiled core of the Python interpreter, including the bytecode evaluator, built-in types, and other core runtime components. This library is used to embed Python into other applications or to link C/C++ extensions dynamically. Let’s continue inspecting the paths that Python uses. At runtime the interpreter searches a list of directories for modules. Find them by running

python -c "import sys; print(sys.path)"

The output is:

['',
'/Users/sebas/.pyenv/versions/3.12.4/lib/python3.12',
'/Users/sebas/.pyenv/versions/3.12.4/lib/python3.12/lib-dynload',
'/Users/sebas/tmp/blogging-code/cpp-compile-link-external-lib/.venv/lib/python3.12/site-packages']

The file libpython3.12.dylib lives in /Users/sebas/.pyenv/versions/3.12.4/lib (see the otool output above). One level deeper is /Users/sebas/.pyenv/versions/3.12.4/lib/python3.12, the first real directory in the list, where you will find the Python files of well-known modules (the standard library of Python): hashlib.py, datetime.py, dataclasses.py, abc.py… These are the “packages” that come by default with the Python installation.

Let’s take a look at the file hashlib.py. Some of its imports are import _sha1 and import _md5, which are cryptographic algorithms. Where do those imports come from? If you check the next path, /Users/sebas/.pyenv/versions/3.12.4/lib/python3.12/lib-dynload, there are files like _sha1.cpython-312-darwin.so and _md5.cpython-312-darwin.so. These are compiled shared libraries that the Python interpreter can load and import as modules. That’s what we want to do: compile the C++ matmul function into a shared library that we can import in our Python script.
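You can make the same inspection from Python itself. The sketch below is only an illustration: the helper name describe is mine, and on some Python builds _sha1 or _md5 may be compiled into the interpreter (reported as built-in) or missing altogether, so the output will vary from machine to machine.

```python
import hashlib
import importlib.machinery
import importlib.util
import os

# Directory of the pure-Python part of the standard library (where hashlib.py lives)
stdlib_dir = os.path.dirname(hashlib.__file__)
print("hashlib.py lives in:", stdlib_dir)

# File suffixes the interpreter accepts for compiled extension modules
print("extension suffixes:", importlib.machinery.EXTENSION_SUFFIXES)


def describe(module_name):
    """Return where a module comes from: a file path, 'built-in', or None if absent."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


for name in ("_sha1", "_md5"):
    print(name, "->", describe(name))
```

On a setup like mine, the last two lines should point to files inside lib-dynload.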

Compiling a shared library for Python

In a previous C++ post I showed how to compile a shared object, so this should be easy. However, we cannot simply compile C++ code and expect a Python shared object: we need to define how the C++ code translates into Python objects. For this we need the Python headers, which give access to the Python C API. You can see their path by executing

python -c "import sysconfig; print(sysconfig.get_path('include'))"

which in my case is /Users/sebas/.pyenv/versions/3.12.4/include/python3.12. There you can find many headers, but the most important is Python.h (which basically includes all the others).
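A quick way to check that the headers are really there is to look for Python.h in that directory. This is a small sketch using only the standard library; on some systems the headers ship in a separate development package, so the second line may print False.

```python
import os
import sysconfig

include_dir = sysconfig.get_path("include")
python_h = os.path.join(include_dir, "Python.h")

print("include directory:", include_dir)
print("Python.h present: ", os.path.isfile(python_h))
```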

Apart from the headers you need the library python3.12. You can get the linking flags by asking your Python installation:

python3-config --ldflags

which returns -lintl -ldl -L/Users/sebas/.pyenv/versions/3.12.4/lib -Wl,-rpath,/Users/sebas/.pyenv/versions/3.12.4/lib -framework CoreFoundation. That is: link against the libraries intl and dl, and look for libraries in the directory given by -L. It also tells the linker to add a runtime path (-rpath) where the executable will look for the libraries at runtime. The last flag, -framework CoreFoundation, is specific to macOS and provides utilities for the operating system. This output would be different on Windows or on a Linux machine.
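As far as I know, python3-config assembles these flags from the build configuration recorded by the interpreter, which you can also query with sysconfig. The sketch below prints a few of those variables; the selection of names is my own, and some of them may be None on platforms such as Windows.

```python
import sysconfig

# Build-time settings recorded by the interpreter; values may be None on some platforms
names = ("LIBDIR", "LDVERSION", "LIBS", "SYSLIBS")
settings = {name: sysconfig.get_config_var(name) for name in names}
for name, value in settings.items():
    print(f"{name:10s} {value!r}")

# The library name passed to the linker is "python" + LDVERSION when it is available
version = settings["LDVERSION"] or sysconfig.get_python_version()
print("link with: -lpython" + version)
```

On my setup the last line would read -lpython3.12, which is the library name used later when linking.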

At this point we have the includes and libraries that we need to compile the C++ code into a Python extension. We could write our bindings using the definitions in Python.h. This is the official Python API for writing C code; it gives you full control of the program, but it is generally more difficult to write code with it than with other options (see next section).

Compiling a shared library for Python using Pybind11

A more convenient way to build your Python extension is pybind11, a header-only library that exposes C++ types in Python and vice versa. For this you will need the Python headers and libraries (shown in the previous section) and pybind11, which can be installed with pip install pybind11.

For now we will write a bindings.cpp file with all the “translated code”:

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <pybind11/numpy.h>
#include <stdexcept>  // std::runtime_error
#include "matmul.h"

namespace py = pybind11;

void matmul_py(py::array_t<float> A, py::array_t<float> B, py::array_t<float> C) {
    auto bufA = A.request(), bufB = B.request(), bufC = C.request();

    if (bufA.ndim != 2 || bufB.ndim != 2 || bufC.ndim != 2) {
        throw std::runtime_error("All matrices must be 2D");
    }

    size_t M = bufA.shape[0];
    size_t N = bufA.shape[1];
    size_t K = bufB.shape[1];

    if (bufB.shape[0] != N || bufC.shape[0] != M || bufC.shape[1] != K) {
        throw std::runtime_error("Matrix dimensions do not match for multiplication");
    }

    float* ptrA = static_cast<float*>(bufA.ptr);
    float* ptrB = static_cast<float*>(bufB.ptr);
    float* ptrC = static_cast<float*>(bufC.ptr);

    matmul(ptrA, ptrB, ptrC, M, N, K);  // same call as before
}


PYBIND11_MODULE(matrix_mul, m) {
    m.def("matmul", &matmul_py, "Matrix multiplication function");
}

The function matmul_py takes three NumPy arrays, A, B and C, and first checks that they are two-dimensional. After that, we read the shapes of the arrays, check that they are compatible, and get the pointers to the memory of each array. Finally we call the C++ function matmul. Note that matmul_py reads the raw memory directly, so it assumes C-contiguous (row-major) float32 arrays. Lastly we define our PYBIND11_MODULE, where we expose the function matmul_py to be called as matmul in Python. And that should be it; now it is time to compile. Bear with me, I’m going to throw a bunch of bash commands at you while explaining them in inline comments. In the root directory of the project run:
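The request() call returns a description of the array's memory: a pointer, the item size, the number of dimensions, the shape and the strides. Python's own memoryview exposes analogous fields, so it can give a feel for what matmul_py reads. This is a sketch with the standard library only, not pybind11 itself, and the 2 x 3 example data is made up.

```python
from array import array

M, N = 2, 3
flat = array("f", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # 6 single-precision floats

# View the same memory as a 2D, row-major (C-contiguous) M x N matrix.
# memoryview.cast has to go through a byte format to change the shape.
view = memoryview(flat).cast("B").cast("f", shape=[M, N])

print("ndim:        ", view.ndim)
print("shape:       ", view.shape)
print("strides:     ", view.strides)   # bytes to step to the next row / next column
print("itemsize:    ", view.itemsize)
print("C-contiguous:", view.c_contiguous)

# Row-major indexing: element (i, j) sits at flat offset i * N + j
i, j = 1, 2
assert view[i, j] == flat[i * N + j]
```

The strides are what a raw pointer ignores: the C++ matmul computes offsets as i * N + j, which is only valid when the memory is laid out contiguously like this.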

# create a building python environment
rm -rf .venv_build
python -m venv .venv_build
.venv_build/bin/pip install --upgrade pip

# install pybind11 using pip
.venv_build/bin/python -m pip install pybind11

Create the directory to hold the objects, binaries and the library

rm -rf build

# creating directories for the build
mkdir -p build/obj
mkdir build/bin
mkdir build/lib

Now we can compile the objects including the python and pybind11 headers (the output of python -m pybind11 --includes is -I/Users/sebas/.pyenv/versions/3.12.4/include/python3.12 -I/Users/sebas/tmp/blogging-code/cpp-basic-cpp-python-extension/.venv_build/lib/python3.12/site-packages/pybind11/include in my setup).
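If you are curious where those -I flags come from, the sketch below roughly reproduces them from Python. It is my approximation rather than the exact logic of the pybind11 command: it combines the include paths reported by sysconfig with pybind11.get_include(), and it skips the pybind11 part when the package is not installed in the current environment.

```python
import sysconfig

# Python headers: usually one directory, sometimes a separate platform-specific one
include_dirs = [sysconfig.get_path("include")]
plat_include = sysconfig.get_path("platinclude")
if plat_include not in include_dirs:
    include_dirs.append(plat_include)

try:
    import pybind11  # only available if pybind11 was installed in this environment
    include_dirs.append(pybind11.get_include())
except ImportError:
    print("pybind11 is not installed here; showing only the Python include flags")

flags = ["-I" + d for d in include_dirs]
print(" ".join(flags))
```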

g++ -std=c++17 -Iinclude -c src/matmul.cpp -o build/obj/matmul.o
g++ -std=c++17 -Iinclude \
                $(.venv_build/bin/python -m pybind11 --includes) \
                -c src/bindings.cpp \
                -o build/obj/bindings.o

And finally we create the shared object

# grep the name of the major and minor versions of python, i.e. if we use 3.12.8 this will return python3.12
# this is the name of the python library
python_library=python$(.venv_build/bin/python --version | awk '{print $2}' | awk -F. '{print $1"."$2}')

g++ -O3 -Wall -shared -std=c++17 -fPIC \
    $(python3-config --ldflags) \
    -l${python_library} \
    build/obj/matmul.o \
    build/obj/bindings.o \
    -o build/lib/matrix_mul$(python3-config --extension-suffix)

And you will see a file matrix_mul.cpython-312-darwin.so in your build/lib directory. This is your compiled library! Let me explain some key concepts. The command python3-config --ldflags gives you the flags needed to link the Python extension (explained before). The -l flag links a specific library, in my case python3.12. Finally, python3-config --extension-suffix gives the file suffix that encodes the Python implementation and version and the platform (here .cpython-312-darwin.so). It is commonly used to name the extension.
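The same suffix can be obtained from Python, which is handy if python3-config is not on your PATH. A minimal sketch, assuming the EXT_SUFFIX configuration variable matches what python3-config prints (it does on my setup); if it is not defined, the code falls back to the first suffix the interpreter accepts.

```python
import importlib.machinery
import sysconfig

# EXT_SUFFIX should match the output of `python3-config --extension-suffix`;
# fall back to the interpreter's first accepted suffix if it is not defined
suffix = sysconfig.get_config_var("EXT_SUFFIX") or importlib.machinery.EXTENSION_SUFFIXES[0]

module_name = "matrix_mul"  # must match the name given to PYBIND11_MODULE
filename = module_name + suffix
print(filename)
```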

How can you import it? Change directory to where the shared library is and try to import it from there:

cd build/lib
python -c "import matrix_mul"

This works here because the current directory is in sys.path when the interpreter is run this way. To import the library from anywhere, we could place it in the site-packages directory of our environment; then it could be imported every time we open a Python prompt.
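To illustrate how the lookup works, the sketch below mimics, in a very simplified way, what the import system does for compiled modules: it checks a directory for a file named after the module plus one of the accepted suffixes. The helper find_extension is hypothetical, and the demonstration uses an empty placeholder file rather than a real library.

```python
import importlib.machinery
import os
import sysconfig
import tempfile


def find_extension(directory, module_name):
    """Return the path of a compiled module named `module_name` in `directory`, or None."""
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        candidate = os.path.join(directory, module_name + suffix)
        if os.path.isfile(candidate):
            return candidate
    return None


# Where packages installed into the current environment live
print("site-packages:", sysconfig.get_path("platlib"))

# Demonstration with an empty placeholder file (not a real library)
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "matrix_mul" + importlib.machinery.EXTENSION_SUFFIXES[0])
    open(fake, "w").close()
    found = find_extension(tmp, "matrix_mul")
    missing = find_extension(tmp, "other_module")
    print("found:  ", os.path.basename(found))
    print("missing:", missing)
```

Copying the real matrix_mul file into the directory printed in the first line would make it importable from any working directory in that environment.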

Testing

Let’s write a Python script to test our library. This file will be called test_matmul.py and will be placed under the tests directory.

import numpy as np
import sys
from pathlib import Path

# we can't really import the library (shared object) from a script
# unless it's in the sys.path
SHARED_LIBRARY_DIR = Path(__file__).parents[1] / "build" / "lib"
sys.path.insert(0, str(SHARED_LIBRARY_DIR))

# now we can import our compiled library
import matrix_mul

# Define matrix dimensions
M, N, K = 32, 64, 32

# Create random matrices A[MxN] and B[NxK]
A = np.random.rand(M, N).astype(np.float32)
B = np.random.rand(N, K).astype(np.float32)
C = np.zeros((M, K), dtype=np.float32)  # Initialize C with zeros

# Call the compiled function
matrix_mul.matmul(A, B, C)

# Verify with NumPy
C_np = np.dot(A, B)

# Check if the results match
assert np.allclose(C, C_np), "something went wrong, C and C_np are not equal"

print("Tests passed!")

The first part adds the path to the library we just compiled, so that the Python interpreter can find it. The rest of the script is self-explanatory: we use NumPy to compare the two matrix multiplications. To start “fresh” we create a new Python environment and call the script:

rm -rf .venv_test
python -m venv .venv_test
.venv_test/bin/pip install --upgrade pip

# install numpy, required for the script tests/test_matmul.py
.venv_test/bin/pip install numpy

# run the test
.venv_test/bin/python tests/test_matmul.py

When executing this you should see Tests passed! as the last output. Congrats! You have learned the basics of Python bindings for C++ extensions.

This post is licensed under CC BY 4.0 by the author.