Python compatibility
The goal of this page is to point out some of the differences between running Python with PyPy and with CPython.
TL;DR
Pure Python code works, but there are a few differences in object lifetime management. Modules that use the CPython C API will probably work, but will not achieve a speedup via the JIT. We encourage library authors to use CFFI instead.
If you are looking for how to use PyPy with the scientific Python ecosystem, we encourage you to use conda, which repackages common libraries like scikit-learn and SciPy for PyPy.
Refcounting, __del__, and resource use
The main difference in pure-Python code that is not going to be fixed is that PyPy does not support refcounting semantics for "automatically" releasing state when an object's __del__ is called. Code that writes to a file and then drops the file object without closing it won't fill the file immediately, but only after a certain period of time, when the GC does a collection and flushes the output, since the file is only closed when the __del__ method is called.
The proper fix is to close the file explicitly, for example by opening it in a with statement.
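As a minimal sketch of the difference (the function names and the path argument are illustrative, not part of PyPy), the first function relies on the file object being finalized, while the second closes the file deterministically:

```python
def write_without_close(path):
    # The file object becomes garbage right after write(). CPython's
    # refcounting closes it at once; PyPy closes it only when the GC
    # later collects the object, so the data may not be on disk yet.
    open(path, "w").write("stuff")


def write_with_close(path):
    # The with block closes (and therefore flushes) the file on exit,
    # regardless of how the interpreter manages object lifetimes.
    with open(path, "w") as f:
        f.write("stuff")
```

Only the second form guarantees that the data is readable as soon as the function returns.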
The same problem (not closing your files) can also show up if your program opens a large number of files without closing them explicitly. In that case, you can easily hit the system limit on the number of file descriptors that may be open at the same time.
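The following sketch illustrates the descriptor problem with two hypothetical helpers that count lines across many files. The names are made up for this example; the point is only that the second version never has more than one file open:

```python
def count_lines_leaky(paths):
    # Each open() result is dropped without close(). On PyPy the
    # descriptors stay open until a GC cycle, so a long list of paths
    # may exhaust the per-process descriptor limit.
    return sum(len(open(p).readlines()) for p in paths)


def count_lines(paths):
    # At most one file is open at any time.
    total = 0
    for p in paths:
        with open(p) as f:
            total += sum(1 for _ in f)
    return total
```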
PyPy can be run with the command-line option -X track-resources (as in, pypy -X track-resources myprogram.py). This produces a ResourceWarning when the GC closes a non-closed file or socket. The traceback for the place where the file or socket was allocated is given as well, which aids finding places where close() is missing.
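ResourceWarning is a standard warning category, so it can also be recorded from inside a test with the warnings module. The helper below is a hypothetical sketch, not a PyPy API; whether a warning is actually emitted depends on the interpreter and, on PyPy, on options such as -X track-resources:

```python
import gc
import warnings


def unclosed_resource_warnings(func):
    """Run func() and return the ResourceWarning records it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by the default filters.
        warnings.simplefilter("always", ResourceWarning)
        func()
        # Force a collection so pending finalizers run inside this block.
        gc.collect()
    return [w for w in caught if issubclass(w.category, ResourceWarning)]
```

A test can then assert that the returned list is empty for code that is expected to close everything it opens.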
Similarly, remember that you must close() a non-exhausted generator in order to have its pending finally or with clauses executed immediately:
    def mygen():
        with foo:
            yield 42

    for x in mygen():
        if x == 42:
            break    # foo.__exit__ is not run immediately!

    # fixed version:
    gen = mygen()
    try:
        for x in gen:
            if x == 42:
                break
    finally:
        gen.close()
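An equivalent way to write the fixed version is contextlib.closing, which calls close() on the generator when the block exits. The sketch below is a self-contained variant of the example above; foo is replaced by a stand-in context manager that records its enter and exit events, and find_first is a hypothetical helper name:

```python
from contextlib import closing, contextmanager


@contextmanager
def foo(events):
    # Stand-in for the resource used by the generator; it records when
    # it is entered and exited.
    events.append("enter")
    try:
        yield
    finally:
        events.append("exit")


def mygen(events):
    with foo(events):
        yield 42
        yield 43


def find_first(events):
    # closing() calls gen.close() when the block exits, which raises
    # GeneratorExit inside mygen and runs foo's exit code right away.
    with closing(mygen(events)) as gen:
        for x in gen:
            if x == 42:
                break
    return x
```

After find_first returns, the events list already contains the exit record, without waiting for a garbage collection.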
More generally, __del__() methods are not executed as predictably as on CPython: they run "some time later" in PyPy (or not at all if the program finishes running in the meantime). See more details
here.
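One common way to avoid depending on __del__ timing is to give a class an explicit close() method and context-manager support, keeping __del__ only as a fallback. The class below is a hypothetical sketch of that pattern, not code from PyPy:

```python
class Connection:
    """Hypothetical resource holder with deterministic cleanup."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Idempotent, so calling it from several places is harmless.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions

    def __del__(self):
        # Safety net only: on PyPy this may run late or never.
        self.close()
```

Callers that write `with Connection() as conn:` get cleanup at the end of the block on both interpreters.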
Why is memory usage so high?
Note that PyPy returns unused memory to the operating system only after a madvise() system call (at least on Linux, OS X, and BSD) or on Windows. It is important to realize that you may not see this in top. The unused pages are marked with MADV_FREE, which tells the system "if you need more memory at some point, grab this page". As long as memory is plentiful, the RES column in top might remain high. (Exceptions to this rule are systems with no MADV_FREE, where we use MADV_DONTNEED, which forcefully lowers the RES. This includes Linux <= 4.4.)
More info
A more complete list of known differences is available at our dev site.