Python for HPC: Community Materials
Getting Started
This site provides a combination of original resources and recommended links for Python users in the ECP and broader scientific community. It is part of the Better Scientific Software initiative.
Quick-Start Guides
- Python for HPC
- Creating a PyPI package
- Using Conda / Anaconda
- Python 102 Covers topics that are essential for scientific computing and data analysis in Python, but typically not covered in an introductory course or workshop.
Presentations and Webinars
- Python for High Performance Computing ECP Annual Meeting 2018. William Scullin (ALCF), Matt Belhorn (OLCF) and Rollin Thomas (NERSC)
- HPC Python Testing and Debugging Tutorial 2018 ECP Annual Meeting 2018. Matt Belhorn (OLCF), William Scullin (ALCF), Rollin Thomas (NERSC)
- Analyzing Python Performance with Intel VTune 2017 Intel presentation.
- Using and Scaling Python ALCF Simulation, Data, and Learning Workshop 2018. William Scullin (ALCF) and Oleksandr Pavlyk (Intel)
- Python in HPC Webinar 2017
- Python on Summit OLCF Feb 2019 Note on mpi4py
Python Resources for Scientists
- XSD Python lecture/video series for Scientists (From Argonne APS) Recommended as an introductory course for scientists.
- SciPy Lectures A community-based series of tutorials.
- On-demand learning for Python - using a Transmedia Learning Framework Webinar Python TLF
- MolSSI institute Best Practices A concise set of best practices that apply to all scientific software. Webinar
Best Practices in Python
Testing
- pytest - Recommended testing framework. Easy to use, with auto-discovery of tests, fixtures, and many useful plugins (e.g. pytest-cov, pytest-timeout).
- HPC Python Testing and Debugging Tutorial 2018 ECP Annual Meeting 2018. Matt Belhorn (OLCF), William Scullin (ALCF), Rollin Thomas (NERSC)
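As a minimal sketch of the auto-discovery style described above: pytest collects functions whose names start with `test_` from files named `test_*.py` or `*_test.py`, and plain `assert` statements are enough. The function `kinetic_energy` is a hypothetical example and is not taken from any of the linked materials.

```python
import math


def kinetic_energy(mass, velocity):
    """Return the classical kinetic energy 0.5 * m * v**2 (hypothetical example)."""
    if mass < 0:
        raise ValueError("mass must be non-negative")
    return 0.5 * mass * velocity ** 2


# Test functions: pytest would discover these by their "test_" prefix.
def test_kinetic_energy_value():
    assert math.isclose(kinetic_energy(2.0, 3.0), 9.0)


def test_kinetic_energy_rejects_negative_mass():
    # Written without importing pytest so the file also runs on its own.
    try:
        kinetic_energy(-1.0, 1.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative mass")
```

Saved as, say, `test_energy.py`, this would be run with the `pytest` command from the same directory. With pytest imported, `pytest.raises(ValueError)` is the more idiomatic way to write the second test.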
Coding Standards and Style
- (NEW) Static analysis and style checking in Python packages Blog entry - S. Hudson June 2019.
- PEP 8 contains the official Python coding conventions.
- Flake8 is a popular, configurable tool comprising lint checks, PEP 8 style compliance and complexity analysis. github
- YAPF “Yet Another Python Formatter” Auto-formats Python to style-guides (select from built-in styles PEP 8(default), Google, Chromium, Facebook). Also customisable. Try online
Design Patterns
“Design Patterns are general, repeatable solutions to common recurring problems in software development.” [From Wikipedia, the free encyclopedia, “Design pattern (computer science)”].
- python-patterns A collection of design patterns and idioms in Python with coded examples.
Scientific Notebooks
Jupyter Notebooks
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
- Jupyter Notebooks Official Documentation
- Beginners Tutorial
- Beginner’s Video
Notebook Example: Python at Speed
- Electrostatics loop example A simple electrostatics loop implemented using pure Python, NumPy, Cython, Numba (inc GPU version) with timings. Contributed by Daniel Smith (MolSSI). Notes on running
Jupyter on Supercomputers and Science Facilities
There is currently lively discussion on the use of Jupyter at science centres.
- (NEW) Jupyter for Science User Facilities and High Performance Computing Blog entry by Rollin Thomas (Big Data Architect, NERSC) on supporting Jupyter at HPC centres and other scientific facilities. July 2019.
- Jupyter Community Workshop (inc. presentations) Jupyter for Science User Facilities and High Performance Computing Workshops held in July 2019. Media Spot
Some HPC centres now support running Jupyter notebooks on supercomputers. An account on these systems is required to access the Jupyterhub servers.
Jupyter at NERSC
- Jupyter and HPC Webinar Current State and Future Roadmap 2018. Includes information on NERSC’s support for running notebooks on Cori.
- Example Notebook with Slurm A NERSC example using SLURM magic (provided by Rollin Thomas).
- More NERSC Examples
Jupyter at Argonne
The ALCF (Argonne Leadership Computing Facility) now supports Notebooks on Theta and Cooley. An ALCF account is required.
- Theta Jupyterhub for Theta
- Cooley Jupyterhub for Cooley
- Using Jupyter on Theta A notebook for getting started with Jupyter on Theta
Scientific Programming in Python
Scientific Computing Packages:
- NumPy NumPy is the fundamental package for scientific computing with Python.
- SciPy A Python-based ecosystem of open-source software for mathematics, science, and engineering. Now incorporates: NumPy, the SciPy library, Matplotlib, IPython, SymPy and Pandas.
- (NEW) SymPy examples used for deriving equations for code generation and testing by Mark Dewing.
Speeding up Python
The approaches in this section all rely on running compiled code from Python. One route is to use packages such as NumPy and SciPy, whose core routines are already compiled. Alternatively, you can use tools that compile your own Python code. Some popular ones are given below.
- Cython Create C code from modified Python. Enables performance and threading. Beginners video
- PyPy Implementation of the Python language that JIT compiles for performance. No code changes required.
- Numba Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Numba supports Intel and AMD x86, POWER8/9, and ARM CPUs, NVIDIA and AMD GPUs. Relies on decorators to identify code sections to accelerate. Requires LLVM for compiler, but works with the standard CPython. Accelerating Python with Numba (Video)
- Note: A growing alternative to speeding up Python is the compiled language Julia
Create Python Bindings to Code:
- SWIG – Generate bindings to C/C++. Works with Python and other high level languages.
- F2PY – Create Python interfaces to Fortran (part of NumPy).
- PyBind11 – “seamless operability between C++11 and Python”
- Boost.Python – “seamless operability between C++ and Python”
- ctypes – built-in Python FFI for interfacing C
- Recommended reading: Python modules in C
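Of the binding options listed above, ctypes needs no compiler or extra package, so it is the easiest to sketch. The example below calls `cos` from the C math library; it assumes a Unix-like system where that library (or the running process itself) exports `cos`, and the wrapper name `c_cos` is purely illustrative.

```python
import ctypes
import ctypes.util

# Locate the C math library. find_library may return None on some systems;
# on POSIX, CDLL(None) then exposes symbols already loaded into the process.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)

# Declare the C signature double cos(double). Without this, ctypes would
# assume int arguments and an int return value.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the C library's cos() through ctypes (illustrative wrapper)."""
    return libm.cos(x)


if __name__ == "__main__":
    print(c_cos(0.0))
```

Declaring `argtypes` and `restype` is the step most often missed. The other tools in the list generate this kind of glue for you, which matters more as the wrapped interface grows.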
Python on Accelerators
- Numba supports GPUs (see description above). github
- CuPy NumPy-like API accelerated with CUDA. “CuPy’s interface is highly compatible with NumPy; in most cases it can be used as a drop-in replacement. All you need to do is just replace numpy with cupy in your Python code.” github
- PyCUDA PyCUDA lets you access Nvidia’s CUDA parallel computation API from Python.
- PyOpenCL PyOpenCL lets you access the OpenCL parallel computation API from Python
Parallel and Distributed Programming:
Shared Memory Parallelism
Note: Python provides a built-in threading module. However, it is not well suited to parallel computation in CPython, because the GIL (Global Interpreter Lock) allows only one thread at a time to execute Python bytecode.
- Multiprocessing module Basic multiple process parallelism through forked interpreters (with threading like interface). Be aware of issues mixing with OpenMP, MPI, or shared memory tools.
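A minimal sketch of process-based parallelism with the standard library's `multiprocessing.Pool`, which sidesteps the GIL by running work in separate interpreter processes. The helper names `sum_of_squares` and `split_range` are hypothetical, and the chunk count of 4 is an arbitrary choice for illustration.

```python
from multiprocessing import Pool


def sum_of_squares(bounds):
    """Sum i*i for i in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for k in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + step + (1 if k < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when the
    # module is re-imported under the "spawn" start method.
    chunks = split_range(1_000_000, 4)
    with Pool(processes=4) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    print(sum(partial_sums))
```

Each worker receives a picklable `(start, stop)` tuple and returns a single number, so little data crosses process boundaries. On an HPC node, the number of processes would normally be matched to the cores allocated to the job.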
Distributed Memory Parallelism
- mpi4py Python wrapper for MPI
- PyCOMPSs PyCOMPSs is a programming model that enables the parallelization of sequential Python codes following a task-based approach. PyCOMPSs enables the execution in distributed infrastructures, such as Clusters, Grids, Clouds, and containers.
Scientific Libraries for HPC
Python Bindings to HPC Libraries:
- petsc4py Python bindings for PETSc/TAO, the scalable library for partial differential equations and numerical optimization.
- slepc4py Python bindings for SLEPc, the Scalable Library for Eigenvalue Problem Computations
- PyTrilinos A set of python wrappers for selected Trilinos packages
I/O Libraries:
- h5py The h5py package is a Pythonic interface to the HDF5 binary data format.
Ensemble and Workflow Tools
- Parsl Use Parsl with Jupyter notebooks to scale interactive analyses from laptops to supercomputers. video
- Balsam Workflow system for managing large campaigns of interdependent jobs (unlimited queue depth). Balsam manages a database of jobs with specified dependencies. Jobs can be added to the database from anywhere on the system, for dynamic workflows.
- libEnsemble Library for dynamic ensembles using generator and simulator functions (e.g. using numerical optimization).
Other
Conferences and Events
Upcoming:
Previous Events:
- Open Data Science Conference (London) Nov 19-22 2019
- Jupyter Community Workshop 2019 (inc. presentations) Jupyter for Science User Facilities and High Performance Computing Workshops held in July 2019.
- SciPy 2019 Austin, Texas, July 8-14th Videos
- SciPy 2018 Videos
- ALCF Simulation, Data and Learning Workshop 2018 Includes slides for many relevant presentations.
- PyHPC Workshop 2018 (In conjunction with SC18) Twitter page
Find more software engineering materials for computational scientists at the Better Scientific Software website.
Alternatives to Python
- Julia A scripting-like language that compiles to efficient native code for multiple platforms via LLVM.
Feedback
Any feedback/corrections/additions are welcome:
- Leave a comment below.
- Email: shudson@anl.gov
- Or fork on github and make a pull request