Because Python is a dynamic language, making it faster has been a challenge. But over the last couple of years, developers in the core Python team have focused on various ways to do it.
At PyCon 2023, held in Salt Lake City, Utah, several talks highlighted Python’s future as a faster and more efficient language. Python 3.12 will showcase many of those improvements. Some are new in that version; others are already in Python but have been further refined.
Mark Shannon, a longtime core Python contributor now at Microsoft, summarized many of the initiatives to speed up and streamline Python. Most of the work he described in his presentation centered on reducing Python’s memory use, making the interpreter faster, and optimizing the compiler to yield more efficient code.
Other projects, still under wraps but already showing promise, offer ways to expand Python’s concurrency model. This will allow Python to better use multiple cores with fewer of the tradeoffs imposed by threads, async, or multiprocessing.
The per-interpreter GIL and subinterpreters
What keeps Python from being truly fast? One of the most common answers is “lack of a better way to execute code across multiple cores.” Python does have multithreading, but threads take turns rather than running in parallel, so CPU-bound work gains little from them. And Python’s support for multiprocessing is top-heavy: you have to spin up a separate copy of the Python runtime for each core and distribute your work between them.
One long-dreamed way to solve this problem is to remove Python’s GIL, or Global Interpreter Lock. The GIL synchronizes operations between threads to ensure objects are accessed by only one thread at a time. In theory, removing the GIL would allow true multithreading. In practice—and it’s been tried many times—it slows down non-threaded use cases, so it’s not a net win.
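To make the effect of the GIL concrete, here is a minimal sketch that runs the same CPU-bound job first sequentially and then across four threads. The function names and the workload (counting primes by trial division) are illustrative choices, not taken from any of the talks. On a GIL-enabled build of CPython, the threaded run would typically take about as long as the sequential one, though exact timings depend on the machine and Python version.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    """Run every job one after another on the calling thread."""
    return [count_primes(limit) for limit in limits]


def run_threaded(limits, workers=4):
    """Run the same jobs on a thread pool; results keep their input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [20_000] * 4
    _, sequential_s = timed(run_sequential, jobs)
    _, threaded_s = timed(run_threaded, jobs)
    print(f"sequential: {sequential_s:.3f}s  threaded: {threaded_s:.3f}s")
```

Both versions return identical results; only the elapsed time is of interest here.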
Core Python developer Eric Snow, in his talk, unveiled a possible future solution for all this: subinterpreters and a per-interpreter GIL. In short, the GIL wouldn’t be removed, just sidestepped.
Subinterpreters are a mechanism by which the Python runtime can have multiple interpreters running together inside a single process, as opposed to each interpreter being isolated in its own process (the current multiprocessing mechanism). Each subinterpreter gets its own GIL, but all subinterpreters can share state more readily.
While subinterpreters have been available in the Python runtime for some time now, they haven’t had an interface for the end user. Also, the messy state of Python’s internals hasn’t allowed subinterpreters to be used effectively.
With Python 3.12, Snow and his cohort cleaned up Python’s internals enough to make subinterpreters useful, and they are adding a minimal module to the Python standard library called interpreters. This gives programmers a rudimentary way to launch subinterpreters and execute code on them.
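As a rough sketch of what launching a subinterpreter and executing code on it can look like, the example below assumes the high-level module is importable as concurrent.interpreters, which is the path it took when it shipped in Python 3.14. That import path, and the helper name run_in_subinterpreter, are assumptions made for illustration and are not details from Snow’s talk. On interpreters where the import fails, the helper simply reports that the feature is unavailable.

```python
def run_in_subinterpreter(source):
    """Run `source` in a fresh subinterpreter.

    Returns True if the code ran, or False if this Python build does not
    provide the high-level interpreters module.
    """
    try:
        # Assumption: module path as shipped in Python 3.14 and later.
        from concurrent import interpreters
    except ImportError:
        return False

    interp = interpreters.create()   # new interpreter in the same process
    try:
        interp.exec(source)          # runs in the subinterpreter's own namespace
    finally:
        interp.close()               # release the subinterpreter's resources
    return True


if __name__ == "__main__":
    ran = run_in_subinterpreter("print('hello from a subinterpreter')")
    if not ran:
        print("subinterpreters are not available in this Python build")
```

The code string runs in isolation: names it defines do not appear in the calling interpreter, which is why sharing state needs separate mechanisms.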
Snow’s own initial experiments with subinterpreters significantly outperformed threading and multiprocessing. One example, a simple web service that performed some CPU-bound work, maxed out at 100 requests per second with threads, and 600 with multiprocessing. But with subinterpreters, it reached 11,500 requests per second, with little to no drop-off when scaled up from one client.
The interpreters module has very limited functionality right now, and it lacks robust mechanisms for sharing state between subinterpreters. But Snow believes by Python 3.13 a good deal more functionality will appear, and in the interim developers are encouraged to experiment.
A faster Python interpreter
Another major set of performance improvements Shannon mentioned, Python’s new adaptive specializing interpreter, was discussed in detail in a separate session by core Python developer Brandt Bucher.
Python 3.11 introduced new bytecodes to the interpreter, called adaptive instructions. These instructions can be replaced automatically at runtime with versions specialized for a given Python type, a process called quickening. This saves the interpreter the step of having to look up what types the objects are, speeding up the whole process enormously. For instance, if a given addition operation regularly takes in two integers, that instruction can be replaced with one that assumes the operands are both integers.
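One way to watch this happen is with the standard library’s dis module, which accepts an adaptive flag from Python 3.11 onward. The sketch below warms up a small function with integer arguments and then lists the instruction names currently in place. The helper name and the warm-up count are arbitrary choices for illustration, and the exact instruction names vary between Python versions, so none are shown here.

```python
import dis
import sys


def add(a, b):
    return a + b


def opnames_after_warmup(func, args, calls=1_000):
    """Call `func` repeatedly, then return the names of its instructions."""
    for _ in range(calls):
        func(*args)  # repeated calls give the interpreter a chance to specialize

    if sys.version_info >= (3, 11):
        # adaptive=True asks dis to report specialized instructions, if any.
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        # Older interpreters have no adaptive instructions to report.
        instructions = dis.get_instructions(func)
    return [instruction.opname for instruction in instructions]


if __name__ == "__main__":
    print(opnames_after_warmup(add, (1, 2)))
```

On an interpreter with adaptive specialization, the generic addition instruction would be expected to appear under a more specific name after the warm-up.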
Not all code specializes well, though. For instance, arithmetic between ints and floats is allowed in Python, but such mixed-type operations don’t specialize well, whereas operations between ints and ints, or floats and floats, specialize far better. Bucher provides a tool called specialist, available on PyPI, to determine if code will specialize well or badly, and to suggest where it can be improved.
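One way to act on this is to keep operand types consistent inside hot loops. The sketch below is an illustrative pattern, not something shown in Bucher’s session: both functions return the same values, but the second converts the factor to a float once so that every multiplication in the loop sees two floats. Whether this makes a measurable difference depends on the workload and the Python version, so it is worth measuring before adopting it.

```python
def scale_mixed(values, factor):
    """Multiply each float by `factor`; float * int when `factor` is an int."""
    return [value * factor for value in values]


def scale_stable(values, factor):
    """Same result, but the loop only ever multiplies float by float."""
    factor = float(factor)  # convert once, outside the loop
    return [value * factor for value in values]


if __name__ == "__main__":
    readings = [1.5, 2.0, 4.25]
    print(scale_mixed(readings, 2))
    print(scale_stable(readings, 2))
```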
Python 3.12 has more adaptive specialization opcodes, such as accessors for dynamic attributes, which are slow operations. Version 3.12 also simplifies the overall process of specializing, with fewer steps involved.
The big Python object slim-down
Python objects have historically used a lot of memory. A Python 3 object header, even without the data for the object, occupied 208 bytes.
Over the last several versions of Python, though, various efforts took place to streamline the way Python objects were designed, finding ways to share memory or represent things more compactly. Shannon outlined how, as of Python 3.12, the object header is now a mere 96 bytes, slightly less than half of what it was before.
These changes don’t just allow more Python objects to be kept in memory; they also improve cache locality for Python objects. While that by itself may not speed things up as significantly as other efforts, it’s still a boon.
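To get a rough feel for per-object memory cost on a given interpreter, the sketch below uses the standard library’s tracemalloc module to measure how much memory a batch of small instances consumes. The Point class and helper name are hypothetical, and the figure is only an approximation: it includes the list slot that refers to each instance, and it will not necessarily match the numbers quoted from Shannon’s talk.

```python
import tracemalloc


class Point:
    """A small object with three attributes, used only for measurement."""

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def bytes_per_instance(count=10_000):
    """Approximate bytes allocated per Point, including its list slot."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        # Small integers are cached by CPython, so the attribute values
        # themselves should add little or nothing to the total.
        points = [Point(1, 2, 3) for _ in range(count)]
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        if not was_tracing:
            tracemalloc.stop()
    return (after - before) / len(points)


if __name__ == "__main__":
    print(f"about {bytes_per_instance():.0f} bytes per instance")
```

Running the same script under different Python versions is a simple way to compare how object overhead has changed between them.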
Future-proofing Python’s internals
The default Python implementation, CPython, has three decades of development behind it. That also means three decades of cruft, legacy APIs, and design decisions that can be hard to transcend—all of which make it hard to improve Python in key ways.
Core Python developer Victor Stinner, in a presentation about how Python features are deprecated over time, touched on some of the ways Python’s internals are being cleaned up and future-proofed.
One key issue is the proliferation of C APIs found in CPython, the reference runtime for the language. As of Python 3.8, there are a few different sets of APIs, each with different maintenance requirements. Over the last five years, Stinner worked to make many public APIs private, so programmers don’t need to deal as directly with sensitive CPython internals. The long-term goal is to make components that use the C APIs, like Python extension modules, less dependent on things that might change with each version.
A third-party project named HPy aims to ease the maintenance burden on the developer. HPy is a substitute C API for Python: more stable across versions, yielding faster code at runtime, and abstracted from CPython’s often messy internals. The downside is that it’s an opt-in project, not a requirement, but various key projects like NumPy are experimenting with using it, and some (like the HPy port of ultrajson) are enjoying big performance gains as a result.
The biggest win for cleaning up the C API is that it opens the door to many more kinds of improvements that previously weren’t possible. Like all the other improvements described here, they’re about paving the way toward future Python versions that run faster and more efficiently than ever.
Copyright © 2023 IDG Communications, Inc.