Python for HPC: Community Materials

Getting Started

This site provides a combination of original resources and recommended links for Python users in the ECP and broader scientific community. It is part of the Better Scientific Software initiative.

Quick-Start Guides

Presentations and Webinars

Python Resources for Scientists

Best Practices in Python

Testing

  • pytest - Recommended testing framework. Easy to use, with auto-discovery of tests, fixtures, and many useful plugins (e.g. pytest-cov, pytest-timeout).
  • HPC Python Testing and Debugging Tutorial, ECP Annual Meeting 2018. Matt Belhorn (OLCF), William Scullin (ALCF), Rollin Thomas (NERSC).
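
As a minimal sketch of the auto-discovery and fixture features mentioned above, the file below could be saved under a hypothetical name such as test_stats.py and run with the pytest command. The mean function and the test names are illustrative assumptions, not taken from the linked materials; tmp_path is one of pytest's built-in fixtures.

```python
# Contents of a hypothetical file named test_stats.py.
# pytest collects files named test_*.py and functions named test_*.


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_integers():
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_rejects_empty_input():
    # With pytest imported, pytest.raises(ValueError) is the usual idiom;
    # a plain try/except keeps this sketch free of imports.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")


def test_mean_from_file(tmp_path):
    # tmp_path is a built-in pytest fixture: a fresh temporary directory
    # passed in as a pathlib.Path when pytest runs this test.
    data_file = tmp_path / "values.txt"
    data_file.write_text("1.0\n2.0\n6.0\n")
    values = [float(token) for token in data_file.read_text().split()]
    assert mean(values) == 3.0
```

Running pytest in the directory containing this file should discover and run all three tests without any registration code.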

Coding Standards and Style

Design Patterns

“Design Patterns are general, repeatable solutions to common recurring problems in software development.” [From Wikipedia, the free encyclopedia, “Design pattern (computer science)”].

  • python-patterns A collection of design patterns and idioms in Python with coded examples.
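
As one illustration of what such a pattern can look like in Python, here is a minimal sketch of the Strategy pattern, in which an algorithm is passed in as an interchangeable object. The names (Integrator, midpoint, trapezoid) are hypothetical and are not taken from the python-patterns collection.

```python
from typing import Callable

# A "strategy" here is any callable with this signature (illustrative alias).
Rule = Callable[[Callable[[float], float], float, float, int], float]


def midpoint(f, a, b, n):
    """Composite midpoint rule with n sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n sub-intervals."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


class Integrator:
    """Context object: delegates the numerical work to the chosen rule."""

    def __init__(self, rule: Rule, n: int = 100):
        self.rule = rule
        self.n = n

    def integrate(self, f, a, b):
        return self.rule(f, a, b, self.n)


if __name__ == "__main__":
    for rule in (midpoint, trapezoid):
        approx = Integrator(rule, n=1000).integrate(lambda x: x * x, 0.0, 1.0)
        print(rule.__name__, approx)
```

Because Python functions are first-class objects, the strategies can be plain functions rather than a class hierarchy, which is a common idiom for this pattern in Python.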

Scientific Notebooks

Jupyter Notebooks

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

Notebook Example: Python at Speed

Jupyter on Supercomputers and Science Facilities

There is currently lively discussion on the use of Jupyter at science centres.

Some HPC centres now support running Jupyter notebooks on supercomputers. An account on these systems is required to access the JupyterHub servers.

Jupyter at NERSC

Jupyter at Argonne

The ALCF (Argonne Leadership Computing Facility) now supports Notebooks on Theta and Cooley. An ALCF account is required.

Scientific Programming in Python

Scientific Computing Packages:

  • NumPy NumPy is the fundamental package for scientific computing with Python.
  • SciPy A Python-based ecosystem of open-source software for mathematics, science, and engineering. Now incorporates: NumPy, the SciPy library, Matplotlib, IPython, SymPy and pandas.
  • (NEW) SymPy examples used for deriving equations for code generation and testing by Mark Dewing.

Speeding up Python

The tools in this section all speed up Python by running compiled code. One route is to rely on packages such as NumPy and SciPy, which call precompiled routines. Alternatively, you can use tools that compile Python code itself. Some popular ones are given below.

  • Cython Generates C code from modified Python, enabling better performance and threading. Beginners video
  • PyPy Implementation of the Python language that JIT compiles for performance. No code changes required.
  • Numba Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Numba supports Intel and AMD x86, POWER8/9, and ARM CPUs, and NVIDIA and AMD GPUs. Relies on decorators to identify code sections to accelerate. Requires LLVM as its compiler backend, but works with the standard CPython. Accelerating Python with Numba (Video)
  • Note: A growing alternative to speeding up Python is the compiled language Julia.
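
The two routes described above can be compared with a small sketch: the same sum of squares written as a pure-Python loop, as a vectorized NumPy call, and as a loop marked for JIT compilation with Numba's njit decorator. The function names are illustrative, and the fallback decorator is an assumption added so that the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional: JIT-compiles the decorated function
except ImportError:
    def njit(func):
        # Fallback so the sketch still runs (uncompiled) without Numba.
        return func


def sum_squares_loop(values):
    """Pure-Python loop: every iteration runs in the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_squares_numpy(values):
    """Vectorized: the loop runs inside NumPy's precompiled code."""
    return float(np.dot(values, values))


@njit
def sum_squares_jit(values):
    """Same loop as above; with Numba it is compiled on the first call."""
    total = 0.0
    for v in values:
        total += v * v
    return total


data = np.arange(1000, dtype=np.float64)

if __name__ == "__main__":
    print(sum_squares_loop(data), sum_squares_numpy(data), sum_squares_jit(data))
```

All three functions compute the same value; timing them on a larger array (for example with the standard timeit module) is a simple way to see the difference on a given machine.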

Create Python Bindings to Code:

  • SWIG – Generate bindings to C/C++. Works with Python and other high level languages.
  • F2PY – Create Python interfaces to Fortran (part of NumPy).
  • PyBind11 – “seamless operability between C++11 and Python”
  • Boost.Python – “seamless operability between C++ and Python”
  • ctypes – built-in Python FFI for interfacing with C

  • Recommended reading: Python modules in C
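
Of the options above, ctypes needs no extra build step, so it makes for a short illustration. The sketch below calls sqrt from the C math library; it assumes a POSIX-like system where that library can be located, and the wrapper name c_sqrt is hypothetical.

```python
import ctypes
import ctypes.util

# Locate the C math library (assumes a POSIX-like system). If find_library
# returns None, CDLL(None) on POSIX falls back to the symbols already
# loaded into the running Python process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double). Without this, ctypes
# would assume int arguments and an int return value.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double


def c_sqrt(x):
    """Call the C library's sqrt on a Python float."""
    return libm.sqrt(x)


if __name__ == "__main__":
    print(c_sqrt(2.0))
```

Declaring argtypes and restype is the step most often forgotten; the other tools in the list generate this kind of type information for you.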

Python on Accelerators

  • Numba supports GPUs (see description above). GitHub
  • CuPy NumPy-like API accelerated with CUDA. “CuPy’s interface is highly compatible with NumPy; in most cases it can be used as a drop-in replacement. All you need to do is just replace numpy with cupy in your Python code.” GitHub

  • PyCUDA PyCUDA lets you access NVIDIA’s CUDA parallel computation API from Python.
  • PyOpenCL PyOpenCL lets you access the OpenCL parallel computation API from Python.
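
The "drop-in replacement" idea quoted for CuPy can be sketched by writing array code against a module alias. The helper get_array_module and the function rms are hypothetical names, and the fallback to NumPy is an assumption so that the sketch also runs on machines without CuPy or a usable GPU.

```python
import numpy as np


def get_array_module():
    """Return CuPy if it is installed and a GPU is usable, otherwise NumPy."""
    try:
        import cupy
        if cupy.cuda.runtime.getDeviceCount() > 0:
            return cupy
    except Exception:
        # CuPy is not installed, or no working CUDA driver/device was found.
        pass
    return np


xp = get_array_module()


def rms(n):
    """Root mean square of n evenly spaced points on [0, 1]."""
    x = xp.linspace(0.0, 1.0, n)
    # float() copies the scalar result back to the host when xp is CuPy.
    return float(xp.sqrt(xp.mean(x * x)))


if __name__ == "__main__":
    print(xp.__name__, rms(101))
```

Only the module bound to xp changes between the CPU and GPU paths; the numerical code in rms is identical for both.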

Parallel and Distributed Programming:

Shared Memory Parallelism

Note: Python provides a built-in threading module. However, in the standard CPython interpreter it is not well suited to parallel computation, because the GIL (Global Interpreter Lock) lets only one thread execute Python bytecode at a time.

  • Multiprocessing module Basic multiple-process parallelism through forked interpreters (with a threading-like interface). Be aware of issues when mixing it with OpenMP, MPI, or shared memory tools.
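
A minimal sketch of process-based parallelism with the multiprocessing module follows. The function names, the problem size, and the choice of four workers are illustrative assumptions; each worker is a separate interpreter process, so the GIL does not serialize the work.

```python
from multiprocessing import Pool


def partial_sum(bounds):
    """Sum i*i over the half-open range [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for k in range(parts):
        stop = start + step + (1 if k < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when the
    # module is re-imported (required with the "spawn" start method).
    n, workers = 100_000, 4
    with Pool(processes=workers) as pool:
        total = sum(pool.map(partial_sum, split_range(n, workers)))
    assert total == sum(i * i for i in range(n))
    print(total)
```

Arguments and results are pickled and sent between processes, so this approach tends to pay off when each task does a substantial amount of work relative to the data it exchanges.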

Distributed Memory Parallelism

  • mpi4py Python wrapper for MPI
  • PyCOMPSs PyCOMPSs is a programming model that enables the parallelization of sequential Python codes following a task-based approach. PyCOMPSs enables the execution in distributed infrastructures, such as Clusters, Grids, Clouds, and containers.

Scientific Libraries for HPC

Python Bindings to HPC Libraries:

  • petsc4py Python bindings for PETSc/TAO, the scalable library for partial differential equations and numerical optimization.
  • slepc4py Python bindings for SLEPc, the Scalable Library for Eigenvalue Problem Computations.
  • PyTrilinos A set of Python wrappers for selected Trilinos packages.

I/O Libraries:

  • h5py The h5py package is a Pythonic interface to the HDF5 binary data format.

Ensemble and Workflow Tools

  • Parsl Use Parsl with Jupyter notebooks to scale interactive analyses from laptops to supercomputers. video
  • Balsam Workflow system for managing large campaigns of interdependent jobs (unlimited queue depth). Balsam manages a database of jobs with specified dependencies. Jobs can be added to the database from anywhere on the system, which supports dynamic workflows.
  • libEnsemble Library for dynamic ensembles using generator and simulator functions (e.g. using numerical optimization).

Other

Conferences and Events

Upcoming:

Previous Events:

Find more software engineering materials for computational scientists at the Better Scientific Software website.

Alternatives to Python

  • Julia A scripting-like language that compiles to efficient native code for multiple platforms via LLVM.

Feedback

Any feedback/corrections/additions are welcome:

  • Leave a comment below.
  • Email: shudson@anl.gov
  • Or fork on GitHub and make a pull request.
