Podcast episode with a full transcript, 31 minutes long.
Utsav: [09:45] Could you walk us through the idea of the tiers of execution of the Python Interpreter?
Guido: When you execute a program, you don’t know if it’s going to crash after running a fraction of a millisecond, or whether it’s going to be a three-week-long computation. Because it could be the same code, just in the first case, it has a bug. And so, if it takes three weeks to run the program, maybe it would make sense to spend half an hour ahead of time optimizing all the code that’s going to be run. But obviously, especially in dynamic languages like Python, where we do as much as we can without asking the user to tell us exactly how they need it done, you just want to start executing code as quickly as you can. So that if it’s a small script, or a large program that happens to fail early, or just exits early for a good reason, you don’t spend any time being distracted by optimizing all that code.
So, what we try to do there is keep the bytecode compiler simple so that we get to execute the beginning of the code as soon as possible. Then, if a function turns out to be executed many times over, we treat it as hot, for some definition of “hot”. For some purposes, maybe it’s a hot function if it gets called more than once, or more than twice, or more than 10 times. For other purposes, you want to be more conservative, and you can say, “Well, it’s only hot if it’s been called 1000 times.”
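Editor's sketch: the idea of counting calls until a function crosses a "hot" threshold can be illustrated in plain Python. This is only a toy model, not how CPython tracks hotness internally; the names `track_hotness`, `calls`, and `is_hot` are hypothetical, and the threshold of 10 is just one of the values mentioned above.

```python
from functools import wraps


def track_hotness(threshold):
    """Hypothetical decorator: mark a function as hot once it has been
    called more than `threshold` times."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            wrapper.calls += 1
            # "Hot" here means: called more than `threshold` times.
            if not wrapper.is_hot and wrapper.calls > threshold:
                wrapper.is_hot = True
            return func(*args, **kwargs)

        wrapper.calls = 0
        wrapper.is_hot = False
        return wrapper
    return decorator


@track_hotness(threshold=10)
def square(x):
    return x * x


for i in range(10):
    square(i)
hot_after_10 = square.is_hot  # still cold: 10 calls is not more than 10

square(10)
hot_after_11 = square.is_hot  # the 11th call crosses the threshold
```

A lower threshold starts optimizing sooner but risks spending effort on code that never runs again; a higher one is the more conservative choice described above.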
A simple hypothetical example is the plus operator in Python. It can add lots of things like integers, strings, lists, or even tuples. On the other hand, you can’t add an integer to a string. So, the optimization step - often called quickening, but usually in our context, we call it specializing - is to have a separate “binary add” integer bytecode, a second-tier bytecode hidden from the user. This opcode assumes that both of its arguments are actual Python integer objects, reaches directly into those objects to find the values, adds those values together in machine registers, and pushes the result back on the stack.
The binary add integer operation still has to make a type check on the arguments. So, it’s not completely free, but a type check can be implemented much faster than a completely generic object-oriented dispatch, like what normally happens for most generic add operations.
Finally, it’s always possible that a function is called millions of times with integer arguments, and then suddenly a piece of data calls it with a floating-point argument, or something worse. At that point, the interpreter will simply execute the original bytecode. That’s an important part so that you still have the full Python semantics.
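Editor's sketch: the specialize, guard, and fall-back behavior described above can be modeled in a few lines of Python. This is a simplified simulation under stated assumptions, not the interpreter's actual implementation: `AddSite`, `generic_add`, and `hot_threshold` are hypothetical names, and the "fast path" here is not really faster in pure Python. It only shows the control flow.

```python
def generic_add(a, b):
    """Stand-in for the fully generic add with normal Python semantics."""
    return a + b


class AddSite:
    """Hypothetical model of a single '+' location in some bytecode."""

    def __init__(self, hot_threshold=2):
        self.hot_threshold = hot_threshold
        self.int_hits = 0         # how often both operands were exact ints
        self.specialized = False  # whether the int-only variant is in use
        self.fallbacks = 0        # how often the guard failed after specializing

    def execute(self, a, b):
        # The cheap type check that the specialized operation still needs.
        both_ints = type(a) is int and type(b) is int
        if self.specialized:
            if both_ints:
                return int.__add__(a, b)  # int-only path, no generic dispatch
            # Guard failed: run the original, generic operation instead.
            self.fallbacks += 1
            return generic_add(a, b)
        if both_ints:
            self.int_hits += 1
            if self.int_hits >= self.hot_threshold:
                self.specialized = True
        return generic_add(a, b)


site = AddSite(hot_threshold=2)
results = [
    site.execute(1, 2),      # generic path, first int hit
    site.execute(3, 4),      # generic path, site becomes specialized
    site.execute(5, 6),      # specialized int path
    site.execute(1.5, 2),    # guard fails, falls back to generic add
    site.execute("a", "b"),  # guard fails again, still correct
]
```

Whichever path runs, the result is what ordinary `+` would produce, which is the point made above about keeping the full Python semantics.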
Utsav: [18:20] Generally you hear of these techniques in the context of JIT, a Just-In-Time compiler, but that’s not being implemented right now.
Guido: Just-In-Time compilation has a whole bunch of emotional baggage with it at this point that we’re trying to avoid. In our case, it’s unclear what and when we’re exactly compiling. At some point ahead of program execution, we compile your source code into bytecode. Then we translate the bytecode into specialized bytecode. I mean, everything happens at some point during runtime, so which part would you call Just-In-Time?
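Editor's sketch: the first step mentioned above, compiling source code into bytecode, can be observed with the standard `compile` built-in and the `dis` module. The exact instruction names differ between Python versions, so no output is shown; the expression `a + b` and the variable names are just illustrative choices.

```python
import dis

# Compile a tiny expression into a code object (the unspecialized bytecode).
code = compile("a + b", "<example>", "eval")

# List the instruction names the compiler produced for it.
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(opnames)

# Execute the compiled code object with concrete values for a and b.
result = eval(code, {"a": 2, "b": 3})
```

The instructions listed this way are the generic ones; any specialized variants are an internal detail of the running interpreter.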
Also, it’s often assumed that Just-In-Time compilation automatically makes all your code better. Unfortunately, you often can’t actually predict what the performance of your code is going to be. And we have enough of that with modern CPUs and their fantastic branch prediction. For example, we write code in a way that we think will clearly reduce the number of memory accesses. When we benchmark it, we find that it runs just as fast as the old unoptimized code because the CPU figured out access patterns without any of our help. I wish I knew what went on in modern CPUs when it comes to branch prediction and inline caching because that is absolute magic.