-2

what i know is , in cpython when we run a code then its first compiled into byte code and the cpython interpreter( which is written in c interprets it) interprets the byte code and converts it into machine code. Does that mean the byte code is represented as c code by the interpreter and then its carried out as c code ?

what exactly does it mean when we say python interpreter is written in c/java? how do they differ in their process of converting byte code into machine code?

what exactly does the interpreter do with the Byte code?

2

3 Answers 3

5

For all practical purposes, bytecode is just a data structure that is convenient for an interpreter to use. The interpreter looks at each instruction in the byte code and immediately performs that action.

A simple interpreter for arithmetic expressions might look like this when written in Python:

bytecode = [
  {'type': 'const', 'value': 40},
  {'type': 'const', 'value': 2},
  {'type': '+'},
  {'type': 'print'},
]

stack = []

for instruction in bytecode:
  action = instruction['type']
  if action == 'const':
    stack.append(instruction['value'])
  elif action == '+':
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)
  elif action == 'print':
    print(stack.pop())
  else:
    raise TypeError(f'Unknown instruction type {action}')

That is, the interpreter checks the type of the type of the instruction and selects which code snippet to run depending on that action. The interpreter is not translating the bytecode into another language in the sense that we'd get a program in that language, but it does map the instructions to code snippets.

Thus, the CPython interpreter does not translate Python bytecode into C code, but selects which C code snippet to run depending on the instruction.

This is useful, because generating C code that works correctly and is as fast as expected from C is quite tricky. Interpreters typically spend a lot of time doing extra bookkeeping (like reference counting), and in the dispatch logic itself – that loop through all instructions isn't quite free.

Programs that translate a source language into another language are sometimes called a transpiler (a kind of compiler that doesn't output machine code). It is easy to create a transpiler that just calls into the code snippets of an interpreter (sometimes called threaded code in older literature):

bytecode = [
  {'type': 'const', 'value': 40},
  {'type': 'const', 'value': 2},
  {'type': '+'},
  {'type': 'print'},
]

# built-in function for our "compiled" code to call

def do_const(stack, value):
  stack.append(value)

def do_add(stack):
  right = stack.pop()
  left = stack.pop()
  stack.append(left + right)

def do_print(stack):
  print(stack.pop())

# assembling Python source code for our "bytecode"
code = 'stack = []\n'
for instruction in bytecode:
  action = instruction['type']
  if action == 'const':
    code += f'do_const(stack, {instruction["value"]})\n'
  elif action == '+':
    code += 'do_add(stack)\n'
  elif action == 'print':
    code += 'do_print(stack)\n'
  else:
    raise TypeError(f'Unknown instruction type {action}')

# we can now execute the code by "compiling" it as Python:
exec(code, locals(), {})

The resulting code might be slightly faster because we've gotten rid of the dispatch logic, but we still have interpreter overhead like stack manipulation. The code we've generated doesn't look like normal Python code. But to get to that Python code we still had to run through our dispatch logic, and now Python has to parse the code we've generated. Similarly, a Python interpreter that translates to C wouldn't be very fast.

The Java reference implementation OpenJDK/HotSpot is interesting because its runtime combines a just-in-time compiler with an interpreter. By default, it interprets byte code with an interpreter written in C++. But if the same code is executed often enough, it compiles that part of the code directly to machine code. Depending on how important that code is, HotSpot spends more effort on optimizing the machine code. This allows Java to be as fast as C in some benchmarks. CPython is very simplistic in comparison.

There is (was?) a Python implementation called Jython that was written in Java. Jython works by compiling the Python code to Java byte code. That byte code is then handled by the Java virtual machine, which either interprets it or compiles it on the fly to machine code, as discussed above. Because the Java runtime is awesome this could make that Python code run very fast, in some circumstances. But the added complexity also comes at a cost. Additionally, Jython is not compatible with Python modules that need to interact with internal CPython data structures.

1
  • Jython still exists. There is also IronPython. The most interesting Python implementation(s) currently are, IMO, PyPy (a Python implementation using RPython, which is both a statically typed programming language as well as a framework for implementing programming languages) and TrufflePython (built using the Truffle AST Interpreter framework, which can use the Graal Compiler Framework and the GraalVM for some quite amazing performance; its cousin TruffleRuby not only beats CRuby but depending on the benchmark even C). Commented Sep 27, 2020 at 19:01
2

what i know is , in cpython when we run a code then its first compiled into byte code and the cpython interpreter( which is written in c interprets it) interprets the byte code and converts it into machine code.

No, it doesn't "convert it into machine code." That would be a compiler, not an interpreter.

A compiler is a program which takes a program written in Language A and converts it into a semantically equivalent program written in Language B.

An interpreter is a program which takes a program written in Language A and runs it.

Does that mean the byte code is represented as c code by the interpreter and then its carried out as c code ?

It's not clear what you mean by that. CPython is written in C, so in some sense, at some point, the byte code will be represented by some sort of C data structure, even if just as a byte[]. So, in some sense, there is some sort of representation in C.

what exactly does it mean when we say python interpreter is written in c/java? how do they differ in their process of converting byte code into machine code?

They don't convert byte code into machine code.

Again, an interpreter doesn't convert. It interprets.

A compiler converts.

Maybe you are confused by the standard English meaning of interpreter which is a person that translates from language to another language in real time. That is not what is meant when we use the term interpreter in programming.

Words can have different meanings in different contexts, the same word can mean different things in "general" English and in Programming Jargon.

what exactly does the interpreter do with the Byte code?

The simplest answer is: it runs it.

For example, if you have an ADD byte code whose semantics are that you take the two values on top of the stack, add them, and put the result back on the stack, the byte code interpreter will then do exactly that: it will take two values from the stack, add them, and put the result back on the stack. (Assuming we are using a stack-based VM, which I believe CPython does.)

I wrote a little bit about the differences between interpreters and compilers in my answer to the question Understanding the differences: traditional interpreter, JIT compiler, JIT interpreter and AOT compiler, in case you are interested.

0
0

Conceptually, a bytecode interpreter has a big switch() statement at its core where each case corresponds to one bytecode. Just-in-time implementations get better performance by translating a sequence of bytecodes to machine language and calling the generated code (which may be cached) instead of interpreting.

Translating bytecodes to C and compiling that using the C compiler would cost much more time without necessarily getting better machine code.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.