Description
Currently, `--enable-experimental-jit` and `--disable-gil` can be configured together. However, the behavior in this configuration is probably surprising, and not what the user intends: the JIT is built, but never actually used.
I've been seeing a lot of people configuring these "bleeding edge" experimental builds now, and we even have one in CI (we've been using it to make sure the JIT build step isn't broken). Until the free-threaded build supports the JIT, we should start by erroring during the configure step for these builds in 3.14. I'll open a PR to do this soon.
For 3.15, the JIT should be updated to work on the free-threaded build. This shouldn't be done in a way that pessimizes the JIT in the default build... it will probably start with a bunch of `#ifdef Py_GIL_DISABLED`/`#ifndef Py_GIL_DISABLED` blocks, in order to at least get something working.
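To make the `#ifdef` approach concrete, here is a minimal sketch of the pattern. The function `example_can_fold_guard` is hypothetical (it is not a real CPython function or optimization); it only illustrates how a free-threaded build could conservatively opt out while the default build stays unchanged.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of CPython: reports whether a made-up
 * tier-two optimization should be applied in this build. */
static bool
example_can_fold_guard(void)
{
#ifdef Py_GIL_DISABLED
    /* Free-threaded build: conservatively disabled until it is audited. */
    return false;
#else
    /* Default (GIL) build: behavior is left exactly as it was. */
    return true;
#endif
}
```

Compiled without `Py_GIL_DISABLED` defined, the helper returns `true`, so the default build pays no cost for the free-threaded fallback.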
This isn't something that I am planning on working on personally. This isn't a good beginner issue either. If there is somebody with free-threading experience interested in learning more about the JIT and willing to take on this project, then that would be great. I can't claim deep free-threading knowledge, but I'm willing to walk you through how the JIT works, help identify what the potential thread-safety issues may be, and review/debug any PRs.
In my understanding, here's a very high-level triage of the current situation:
- `JUMP_BACKWARD`: We need to re-enable specialization for this instruction. This shouldn't be too hard, since we already have idioms for specialization on the free-threaded build.
- `bytecodes.c`: There are several instructions that are tier-two (JIT) only. These need to be audited for thread-safety.
- `_PyOptimizer_Optimize`: Finding space for and inserting executors into the code object needs to be done in a thread-safe way, which probably will involve locking the code object such that only one part of it can be compiled at a time.
- `_PyExecutorObject`: These are objects that manage JIT code. They are joined into a giant doubly-linked list, and are unlinked as they are invalidated or no longer needed. This list needs to be made thread-safe.
- `translate_bytecode_to_trace`: This shouldn't be too bad, since each thread has its own copy of the bytecode and inline caches (so we don't need to worry about concurrent mutations during tracing). Function inlining and `for` loop headers may need some attention.
- `_Py_uop_analyze_and_optimize` is going to be where most of our time and ongoing effort is spent. Thankfully it's all optional (in theory), so we may just be able to turn it all off and selectively turn things back on. It consists of three passes:
  - `remove_globals`: This has the potential to be annoying, but I think it may not be too bad (as long as watchers are already thread-safe?).
  - `optimize_uops`: This is going to be the long tail, I think. This one is different, since we don't only care if `optimize_uops` itself is thread-safe (easy), but also if our optimizations will be thread-safe at runtime (hard). We'll probably just end up `#ifdef`'ing out many of the bodies in `optimizer_bytecodes.c` and figuring out how to turn them back on (if at all).
  - `remove_unneeded_uops`: This already looks fine to me.
- `_PyJIT_Compile`: This is probably fine, actually, which is a relief. The entire machine-code backend is stateless and self-contained. As long as `mmap`/`mprotect`/`munmap`/etc. are thread-safe, then we're okay. The generated code is immutable, and can be shared between threads without issue.
- (I'm probably missing stuff.)
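As an illustration of the executor-list item above, here is a rough sketch of a doubly-linked list guarded by a single lock. Everything in it is hypothetical: `example_executor` is a stand-in for `_PyExecutorObject` that keeps only the links, and a pthread mutex stands in for whatever locking primitive CPython would actually use. It is a sketch of the idea, not the real layout or API.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for _PyExecutorObject: only the list links. */
typedef struct example_executor {
    struct example_executor *prev;
    struct example_executor *next;
    int linked;  /* nonzero while the executor is on the global list */
} example_executor;

static example_executor *example_head = NULL;
static pthread_mutex_t example_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Push an executor onto the front of the list (no-op if already linked). */
static void
example_link(example_executor *e)
{
    pthread_mutex_lock(&example_list_lock);
    if (!e->linked) {
        e->prev = NULL;
        e->next = example_head;
        if (example_head != NULL) {
            example_head->prev = e;
        }
        example_head = e;
        e->linked = 1;
    }
    pthread_mutex_unlock(&example_list_lock);
}

/* Remove an executor from the list (no-op if it was already unlinked,
 * so concurrent invalidations of the same executor are harmless). */
static void
example_unlink(example_executor *e)
{
    pthread_mutex_lock(&example_list_lock);
    if (e->linked) {
        if (e->prev != NULL) {
            e->prev->next = e->next;
        }
        else {
            example_head = e->next;
        }
        if (e->next != NULL) {
            e->next->prev = e->prev;
        }
        e->prev = NULL;
        e->next = NULL;
        e->linked = 0;
    }
    pthread_mutex_unlock(&example_list_lock);
}

/* Count linked executors while holding the lock. */
static size_t
example_count(void)
{
    size_t n = 0;
    pthread_mutex_lock(&example_list_lock);
    for (example_executor *e = example_head; e != NULL; e = e->next) {
        n++;
    }
    pthread_mutex_unlock(&example_list_lock);
    return n;
}
```

A single lock is the simplest choice and may well be too coarse in practice; the `linked` flag is there so that two threads invalidating the same executor cannot corrupt the links.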
It may even be reasonable to stop the world for JIT compilation initially (just to get something working), then later introduce finer-grained locking as needed.
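The "coarse first, refine later" idea can be sketched with one process-wide lock around compilation. Note the assumptions: this is not a true stop-the-world (other threads would keep running; only compilers are serialized), and `example_code_slot`, `example_compile`, and `example_get_or_compile` are all hypothetical names, with a pthread mutex standing in for CPython's own primitives.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the place in a code object where an
 * executor gets installed. */
typedef struct {
    void *executor;  /* installed JIT code, or NULL if not compiled yet */
} example_code_slot;

static pthread_mutex_t example_compile_lock = PTHREAD_MUTEX_INITIALIZER;
static int example_compile_calls = 0;

/* Placeholder for the expensive compilation step. */
static void *
example_compile(void)
{
    static int dummy_code;
    example_compile_calls++;
    return &dummy_code;
}

/* Return the executor for this slot, compiling at most once. The check
 * happens under the lock, so two racing threads cannot both install. */
static void *
example_get_or_compile(example_code_slot *slot)
{
    void *result;
    pthread_mutex_lock(&example_compile_lock);
    if (slot->executor == NULL) {
        slot->executor = example_compile();
    }
    result = slot->executor;
    pthread_mutex_unlock(&example_compile_lock);
    return result;
}
```

Replacing the global lock with a per-code-object lock later would be the kind of finer-grained refinement mentioned above, without changing the callers.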