[DiceCTF Quals 2024] Diligent Auditor

Robin Jadoul

Feb 11, 2024

Points: 281 (15 solves) Challenge Author: aplet123

I hired an auditor to secure my python interpreter!

from challenge import source

#!/usr/local/bin/python

import os
import sys
import string

mod = input("mod? ")
cod = input("cod? ")
sys.stdin.close()

if len(mod) > 16 or len(cod) > 512 or any(x not in string.ascii_lowercase for x in mod) or any(x not in string.printable for x in cod):
    print("nope")
    exit(1)

code = f"""
import sys
import os
import inspect
import {mod}

if "_posixsubprocess" in sys.modules:
    print("nope")
    os._exit(1)

for k in list(sys.modules):
    del sys.modules[k]

f = inspect.currentframe()
sys.addaudithook((lambda x: lambda *_: x(1))(os._exit))

for k in f.f_builtins:
    f.f_builtins[k] = None
for k in f.f_globals:
    if k != "f":
        f.f_globals[k] = None
for k in f.f_locals:
    f.f_locals[k] = None
del f
del k

{cod}
""".strip()

os.execv(sys.executable, [sys.executable, "-c", code])

Additionally, in the Dockerfile, we can observe that we’re running in python 3.12 and that the flag has some unknown filename with a known format flag-*.txt, in the eventual working directory. The latter is usually a measure to prevent a pure file read from obtaining the flag, and to encourage arbitrary code execution for the solution. As we’ll see, however, full code execution is not needed for this challenge, as it is possible to first recover the flag filename and then still perform an arbitrary file read to recover its contents.

from sys import addaudithook as pain

First, let’s have a look and summarize all restrictions that this jail places upon our code.

On the outside, we just have a few sanity checks on our input to be included in the executed code. We can choose a module to be imported, but importantly, it cannot contain underscores or dots. That means that most extension modules and all submodules are already excluded, on the top level. Of course, as long as we have an importable module that eventually transitively imports our target module, there’s no problem with that. On the code itself, there’s just a minor restriction on the used character set, presumably just to play it safe and avoid any weird unicode shenanigans such as weird interpretations of an rtl override character. There are some length restrictions, but nothing too stringent to make us worry.
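
As a small sketch of those outer checks, the two conditions from the wrapper script can be restated as predicates. The helper names mod_allowed and cod_allowed are hypothetical and only serve the illustration; the limits and character sets are the ones from the challenge source.

```python
import string


def mod_allowed(mod: str) -> bool:
    """Hypothetical helper mirroring the wrapper's check on the module name."""
    return len(mod) <= 16 and all(ch in string.ascii_lowercase for ch in mod)


def cod_allowed(cod: str) -> bool:
    """Hypothetical helper mirroring the wrapper's check on the code line."""
    return len(cod) <= 512 and all(ch in string.printable for ch in cod)


# Underscores and dots are not lowercase ASCII letters, so private extension
# modules and submodules are rejected outright.
candidates = ["ctypes", "rlcompleter", "os.path", "_posixsubprocess"]
verdicts = {name: mod_allowed(name) for name in candidates}
print(verdicts)
```

Both ctypes and rlcompleter, the two modules used later on, pass this filter.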

Then, on the inside, we first have our unrestricted import, before the pain begins. sys.modules is checked for the _posixsubprocess module having been (transitively) imported, since it contains the fork_exec function that does what it says on the tin, without any audit hooks. Then, the module cache is cleared (which could probably be done with sys.modules.clear() just fine too), before installing a killer audit hook and clearing out essentially all available variables. The audit hook does a bit of trickery to hide a reference to os._exit in a closure, where it becomes hard to touch without significant manipulations, to avoid us easily disabling it. To see what the audit hook prevents us from doing, I suggest you have a quick look at the table of events and wonder which of those are actually comprehensively covered ;) All variables in all of the builtins, globals and locals are then set to None, including the imported modules like sys or our {mod}. Observe that even if we get access to the sys module, this will not be enough to get immediate access to our import of choice, as it has been removed from the sys.modules cache at this point too.
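
To get a feeling for which operations are visible to such a hook, here is a minimal sketch that installs a harmless recording hook instead of the killer one. The names seen and recorder are made up for this illustration; note that an audit hook cannot be removed again from python code once it has been added, so this is best tried in a throwaway interpreter.

```python
import os
import sys

seen = []  # names of the audit events observed so far


def recorder(event, args):
    # The jail calls os._exit(1) here; we only take notes.
    seen.append(event)


sys.addaudithook(recorder)

with open(os.devnull) as fh:          # raises the "open" event
    pass
id(seen)                              # raises the "builtins.id" event
compile("1 + 1", "<demo>", "eval")    # raises the "compile" event

print(sorted(set(seen)))
```

With the challenge's hook in place, any one of these three calls would terminate the interpreter immediately.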

print(solution.intended(ctypes))

We shall first examine the intended solution, yielding arbitrary code execution, before we move on to the unintended approach of just reading the right file.

One prominent choice of module to make is of course ctypes. It allows you to call into C functions, mess with interpreter internals, and do all around “dangerous things”. But… with great power comes great ~~responsibility~~ auditing. As of the time of this writing, there are no fewer than 16 unique audit hook events defined specifically for ctypes, in the audit hook events list. That means we can’t just get the address of some object (and while the builtin id function gives you the memory address – and transitively, the standard __repr__¹ – it is of course audited, with the builtins.id event), or resolve the address of a symbol (with dlsym), or much of anything else. One thing that does turn out to be possible is to obtain a pointer through ctypes.byref, which can be turned into a proper pointer after some ctypes.cast. It’s interesting to observe here that directly passing byref(py_object(())) into cast will give a null pointer, but as long as some reference is kept to either the byref input or result, we do get a proper pointer.² This behavior is likely related to the “reference-stealing” nature of byref.
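
As a hedged illustration of leaking an address without touching id, the sketch below uses the byref-free variant from the footnotes rather than the byref and cast route. It assumes a regular CPython build, where id happens to return the memory address; id is only called at the very end to check the result, not to obtain it.

```python
import ctypes

target = ("any", "python", "object")

# Keep the py_object alive in a variable: it owns the buffer holding the pointer.
boxed = ctypes.py_object(target)

# Reinterpret that buffer as a void* and read the stored pointer back out.
pointer = ctypes.POINTER(ctypes.c_void_p)(boxed)
leaked_address = pointer.contents.value

print(hex(leaked_address), leaked_address == id(target))
```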

From there, we can start calculating offsets to interesting things in interpreter memory. One of those interesting things would then be the audit hook, so that we could either mess with the closure variables of the hook to avoid the exit, overwrite the code, or remove the audit hook itself altogether. Conveniently enough, the audit hooks defined in python and added via sys.addaudithook³ are stored in a python list⁴ and not a C linked list, which in turn means that we can list.clear it, getting rid of all those pesky os._exit calls.

From there, all that remains is to get a handle to os, which can even be found in ctypes._os, and spawn all the shells we’d like through os.system.

As an alternative ctypes-based solution, you could observe that, while ctypes.call_function is a defined audit hook event, it does not trigger when you call ctypes.memmove for instance. This again allows us to overwrite arbitrary interpreter memory, or, when digging deeper into the type of this object, again get some address leaks (like ctypes.pythonapi._handle for libpython or ctypes._memmove_addr for libc) and call arbitrary C functions by constructing new objects of the same type but with a different address. Like so: ctypes.memmove.__class__(addr)(arguments).⁵
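
A minimal sketch of that construction, kept deliberately boring: it rebuilds a callable from the address of memmove itself (via the private ctypes._memmove_addr attribute) and copies a few bytes with it. In an exploit, the address would instead point at whatever function you want to call with a compatible signature; the buffer and variable names here are only illustrative.

```python
import ctypes

# The function-pointer type that ctypes builds for memmove:
# (void *dst, const void *src, size_t n) -> void *
func_type = type(ctypes.memmove)  # same object as ctypes.memmove.__class__

# Construct a brand new callable of that type from a raw address.
raw_memmove = func_type(ctypes._memmove_addr)

dst = ctypes.create_string_buffer(8)
raw_memmove(dst, b"hello", 5)
copied = dst.raw[:5]
print(copied)
```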

Finally, it is worth noting that all of this ctypes magic does rely on some functions that trigger audit hooks,⁶ but only during import, which in this case happens well before any hooks are installed.

And now for something completely different: back to our scheduled unintended solution.

import {mod}

Have you ever considered what goes on inside your python interpreter when you type those six innocent letters: import? Maybe you have, but have you actually looked at the source code yet? Time for a bit of a journey through the bowels of python! Feel free to skip this if it gets too long, I’ll make sure to mention the important bits we saw along the way again when we use them.

As we can verify experimentally with a quick del __builtins__.__import__, the first step in importing a module is calling that function: __import__. After a quick detour through bltinmodule.c, we end up in the aptly named import.c. There, other than dealing with some caching, package and fromlist stuff, the important call goes to import_find_and_load, which in turn – of course – raises an audit event, and hands things over to the _frozen_importlib module, which also goes by the name of importlib._bootstrap and is actually written in python.
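
A slightly less destructive way to run the same experiment is to wrap __import__ instead of deleting it. The sketch below temporarily swaps in a spy (a made-up name) that records every module name the import statement asks for, and restores the original afterwards.

```python
import builtins

calls = []
original_import = builtins.__import__


def spy(name, globals=None, locals=None, fromlist=(), level=0):
    # Record the request, then defer to the real implementation.
    calls.append(name)
    return original_import(name, globals, locals, fromlist, level)


builtins.__import__ = spy
try:
    import json  # the import statement looks up __import__ in the builtins
finally:
    builtins.__import__ = original_import

print(calls[:3])
```

The statement goes through __import__ even when the module is already cached in sys.modules, so the spy sees the request either way.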

Once in python land, we again check the sys.modules cache, followed by acquiring a lock to avoid threading interfering with importing. The _find_and_load_unlocked function takes over, again looks at package paths to ensure all parent modules have been imported, and starts the actual importing “logic”.

When importing, there are several knobs to tweak and things to hook. First, a ModuleSpec is needed, which can be found by any “finder” that has been registered on sys.meta_path. The responsibility of a finder is to locate the module in some form, be it a file path, an entry in a zip file or even some network location, and to return a corresponding spec, all through the find_spec classmethod. By default, sys.meta_path contains three importers:

BuiltinImporter for – you guessed it – builtin modules.
FrozenImporter for frozen modules. That is, modules written in python but “translated” to C so they don’t depend on importing .py files. As an example, the importlib._bootstrap module we’re now looking in has been frozen as _frozen_importlib to avoid the import system messing with itself.
PathFinder which lives in an adjacent module importlib._bootstrap_external and concerns itself with file system loading.
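
These finders can be poked at directly from a regular interpreter. The sketch below lists whatever is registered on sys.meta_path (tools such as test runners may add entries of their own) and asks two of the default finders for a spec by hand; the choice of sys and json as example modules is arbitrary.

```python
import sys
from importlib.machinery import BuiltinImporter, PathFinder

# The default entries are classes, hence __name__; fall back to the type name
# for finder instances that other tools may have registered.
finder_names = [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]
print(finder_names)

# find_spec is a classmethod on the default finders, so no instance is needed.
builtin_spec = BuiltinImporter.find_spec("sys")
path_spec = PathFinder.find_spec("json")

print(builtin_spec.origin)
print(path_spec.origin)
```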

The first two cycle back into C land and essentially follow the normal rules of extension modules. They define themselves and a module gets created. The last one has some more interesting stuff and is likely more immediately relevant, so let’s follow that rabbit.

As the docstring indicates, this looks at even more hookable things, including sys.path specifying where to look for files, and sys.path_hooks that allow further customization on how to interpret the entries thereof, such as looking in the regular file system or inside of zip files. We also see the first mention of sys.path_importer_cache here, which will turn out rather interesting later. The method itself is simple, and defers to _get_spec. That method then starts going through sys.path, and querying or populating sys.path_importer_cache along the way, storing the finder produced by the appropriate sys.path_hooks entry for each entry on sys.path. Once the appropriate finder is found from that cache, it is in turn called to produce the requested spec, from the given location of sys.path. For convenience, we assume it is a FileFinder.

This FileFinder can then proceed by enumerating its directory for any files that would satisfy the requested module. Of course, it also stores all possibly relevant files in its own cache: finder._path_cache. If a file is found (be it mod.py, mod.pyc, mod.so, mod/__init__.py or something else completely), it can then put all the information together in a module spec, along with the next object in this long chain: a loader that knows how to load such a file (e.g. you can have a different Loader class for .py vs .pyc files). Of course, for FileFinder, it is also possible to tweak which kinds of loaders it has access to, although I believe this would need to be addressed at the sys.path_hooks level.

Remember where we first had importlib._bootstrap._find_and_load_unlocked ask for a spec? We finally found it now! It then does some more things to create a module object and populate it, all guided by the loader object provided by the spec, as assembled by the finder,⁷ and populates the correct entries in sys.modules and optional parent modules. For completeness, I’ll just reference that the loader drives this through the create_module and exec_module methods. These can in the simplest case just create a types.ModuleType and exec the obtained code in its namespace.
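
To make that last step concrete, here is a minimal sketch of a loader that does exactly the simplest case. StringLoader and the module name handmade are invented for the example; the importlib.util helpers are the public ones.

```python
import importlib.abc
import importlib.util


class StringLoader(importlib.abc.Loader):
    """Toy loader that 'loads' a module from a source string."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        # Returning None asks the import machinery for a plain module object.
        return None

    def exec_module(self, module):
        # Run the code with the module's namespace as its globals.
        exec(compile(self.source, "<handmade>", "exec"), module.__dict__)


spec = importlib.util.spec_from_loader("handmade", StringLoader("ANSWER = 6 * 7\n"))
module = importlib.util.module_from_spec(spec)  # calls create_module
spec.loader.exec_module(module)                 # populates the namespace

print(module.__name__, module.ANSWER)
```

Driving the loader by hand like this skips sys.modules entirely, which the real import machinery would populate for us.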

From cache to history to flag

Now, during all this importing, we’ve seen quite some caching going on. Starting from the well known sys.modules, through sys.path_importer_cache, up to the _path_cache of a specific FileFinder. And of course, those last two will become quite useful to us. Since python’s working directory is on sys.path by default, sys.path_importer_cache will contain a FileFinder entry for that directory, once all caches have been populated during the first few imports. In turn, that FileFinder will have cached a directory listing in its _path_cache, including a juicy bit of information: the flag file name.
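
The leak is easy to reproduce locally. The sketch below sets up a scratch directory with a made-up flag file and a dummy module (both names are invented for the demo), imports the module from there, and then reads the directory listing back out of the finder's cache without ever listing the directory itself. Note that _path_cache is a private attribute of FileFinder, so this relies on an implementation detail.

```python
import importlib
import os
import shutil
import sys
import tempfile

workdir = tempfile.mkdtemp()
try:
    # Hypothetical stand-ins for the challenge's flag file and some module.
    with open(os.path.join(workdir, "flag-0123abcd.txt"), "w") as fh:
        fh.write("not a real flag\n")
    with open(os.path.join(workdir, "path_cache_demo_mod.py"), "w") as fh:
        fh.write("VALUE = 1\n")

    sys.path.insert(0, workdir)
    importlib.invalidate_caches()
    importlib.import_module("path_cache_demo_mod")

    # The FileFinder for our directory now holds a cached directory listing.
    finder = sys.path_importer_cache[workdir]
    leaked = sorted(name for name in finder._path_cache if name.startswith("flag"))
    print(leaked)
finally:
    # Undo the changes to interpreter state and remove the scratch directory.
    if workdir in sys.path:
        sys.path.remove(workdir)
    sys.modules.pop("path_cache_demo_mod", None)
    sys.path_importer_cache.pop(workdir, None)
    shutil.rmtree(workdir, ignore_errors=True)
```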

Now all that stands in between us and the flag is being able to read a file, or even just the first line of the file will probably do. Oh, and getting our hands on the sys module to access those caches of course, but most usual tricks via __globals__ on methods of entries in object.__subclasses__() will work there. Annoyingly, open is also audit hooked. Less annoyingly, and even luckily, I was already aware of an extension module and function that would provide me with a file read and no audit hooks, assuming the module was already imported and available: readline. Now, to get access to that function through object.__subclasses__(), we’d need to have a class in the readline module, of which there are none. So instead, we find a module that has at least one class and imports readline for itself, so that we can find it there. Such a module luckily exists, and for instance rlcompleter will do the job. In readline, we “simply” need to call read_history_file(flag_filename) followed by a print of get_history_item(1), since history items are indexed starting from 1. A last hurdle was getting to print things and some weirdly behaving buffering, which I eventually got around by using sys.stdout.buffer.write followed by sys.stdout.buffer.flush.
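
The __globals__ trick for recovering sys can be tried outside the jail too. This is a sketch under one substitution: it goes through find_spec rather than the load_module method used in the actual payload, on the assumption that find_spec is the more future-proof attribute; both are defined in importlib._bootstrap and share the same globals. Inside the jail, os is None, so os.__class__.__base__ is simply another spelling of object.

```python
import sys as real_sys  # only imported here to check the result

# Every class still alive that derives directly from object.
subclasses = object.__subclasses__()

# BuiltinImporter lives in importlib._bootstrap, whose globals include sys.
importer = [cls for cls in subclasses if cls.__name__ == "BuiltinImporter"][0]
recovered_sys = importer.find_spec.__globals__["sys"]

print(recovered_sys is real_sys)
```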

And putting all this together, we get a fairly simple piece of exploit code, since most of the hard work was already in our heads, and resolved while reading the cpython source code :)

from pwn import remote, process

PAYLOAD = r"""
subs = os.__class__.__base__.__subclasses__()
sys=[x for x in subs if "BuiltinImporter" in x.__name__][0].load_module.__globals__['sys']
p=[x for x in sys.path_importer_cache['/app']._path_cache if "flag" in x][0]
readline=[x for x in subs if "Completer" in x.__name__][0].__init__.__globals__['readline']
readline.clear_history()
readline.read_history_file("/app/" + p)
sys.stdout.buffer.write(p.encode() + b"\n" + readline.get_history_item(1).__str__().encode() + b"\ntest\n")
sys.stdout.buffer.flush()
""".strip().replace("\n", ";")
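# Connect to the challenge server and answer both prompts.
io = remote("mc.ax", 31130)
io.sendlineafter(b"mod? ", b"rlcompleter")
io.sendlineafter(b"cod? ", PAYLOAD.encode())
# Print everything the server sends back, which should include the flag.
io.stream()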

All eventually leading up to the glorious feeling of a flagged challenge.

dice{python_audit_hooks_not_exactly_secure}, stored in flag-e639626913ad08d1.txt

¹ When I wrote this, I was convinced that I had seen the default __repr__ result in audit hooks being called. This must have been a hallucination, as both later experiments and reading the cpython source code again disprove it. As a result, repr also leads to easy address leaks to use.
² There are also ways that don’t need byref to obtain pointer casts. For instance ctypes.POINTER(ctypes.c_voidp)(ctypes.py_object(x)).contents.value == id(x).
³ As opposed to audit hooks that are defined and added in native code via the C API.
⁴ This list is explicitly excluded from the garbage collector to avoid obtaining access to it.
⁵ To be honest, this approach seems like a python issue to me, as both the construction of memmove.__class__ should probably trigger a ctypes.cdata event, and calling it should absolutely trigger a ctypes.call_function. I am considering making an issue and/or a patch to cpython for this. UPDATE: Here it is, GH-115322, with my apologies to future pyjail constructors :)
⁶ Importing anything will of course trigger several hooks, but ctypes in particular will also construct the ctypes.pythonapi object at import time, which requires a ctypes.dlopen.
⁷ You know… the finder that was first found by the other finder.