An updated approach to content sandboxing

infobeamer-fw · April 14, 2024, 9:11pm

Safe file format probing and thumbnail extraction

There are probably too few blog posts describing some of the info-beamer hosted internals, so here’s a short post describing how info-beamer securely sandboxes the parsing process of user content.

Introduction

info-beamer hosted has to handle content uploaded by users (that’s you!) in a safe way. Uploaded content first has to be examined during upload to ensure it’s one of the supported file formats. This requires parsing the content to extract information like width and height for images or duration for video files. Once uploaded, assets are represented by thumbnails throughout the dashboard, so the content also has to be converted to a small thumbnail image. This means loading PNG or JPEG files as well as opening video files to extract the first frame.

Uploaded content cannot be trusted, and image and video file parsers have had various security issues in the past. Keeping up with updates is necessary, but that doesn’t help against unknown future bugs or 0-days. A rock-solid approach is needed to handle this.

One approach is to move the metadata and thumbnail extraction process to a different machine, or to use a process sandbox like bubblewrap or similar features. This works, but info-beamer now goes one step further than this.

Seccomp strict mode

There are different methods to isolate untrusted code on a Linux machine. Probably the most restrictive one is seccomp’s strict mode. Once a program switches to this mode, the only system calls allowed are read, write, exit and sigreturn. Any other system call makes the kernel terminate the program immediately with SIGKILL. This means that once the mode is active, the code cannot (among other things):

Open any new file
Use the network
Start other programs
~~Measure time~~ (Turns out this isn’t true: reading the clock goes through the vDSO, so no syscall is needed to fetch the time)
Allocate additional memory (as sbrk and mmap are not allowed)

The only abilities left are to read or write data from an already open file descriptor and do computations. Luckily that’s all that’s required to decode an image or video and to generate a thumbnail.
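
To get a feeling for how strict this is, here is a minimal sketch in Python (the actual extractors are written in C, so this is only an illustration). It forks a child, switches the child to strict mode by calling prctl through ctypes, writes to an already open pipe and then tries to open a file. The function name strict_mode_demo is made up for this example; the two constants are taken from the Linux headers.

```python
import ctypes
import os
import signal
import sys

PR_SET_SECCOMP = 22      # from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1  # from <linux/seccomp.h>


def strict_mode_demo():
    """Return (data, killed): what the child wrote, and whether it got SIGKILL.

    Hypothetical helper for illustration only. On systems without seccomp
    strict mode it returns (b"", False).
    """
    if not sys.platform.startswith("linux"):
        return b"", False
    # Load libc before forking so no library loading happens in strict mode.
    libc = ctypes.CDLL(None, use_errno=True)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
                os._exit(2)  # strict mode not available here
            # write() on an already open descriptor is still allowed.
            os.write(w, b"still alive")
            # Opening a file is not allowed: the kernel kills the child here.
            os.open("/dev/null", os.O_RDONLY)
        finally:
            os._exit(3)  # only reached if strict mode was not enforced
    os.close(w)
    data = b""
    while chunk := os.read(r, 64):
        data += chunk
    os.close(r)
    _, status = os.waitpid(pid, 0)
    killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL
    return data, killed
```

On a Linux kernel with seccomp support, the child should manage to deliver its message through the pipe and then be killed as soon as it attempts the open.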

Implementing the sandbox

There are currently four extraction programs written in C that all use shared code:

Thumbnail and metadata from images (JPEG/PNG)
Thumbnail and metadata from videos (H264 and HEVC)
Loading and rendering TrueType font preview thumbnails
Loading and validating Lua syntax (for uploaded package code)

All of them pre-allocate the memory required for the task at hand, which puts an upper limit on memory usage. A custom pool allocator replacing malloc/free is used for that. That way memory allocations never fall back to using sbrk or mmap and only use memory from the provided pool. As threading isn’t allowed either, video decoding has to be single-threaded. This makes no real difference, as we’re not trying to decode a complete video but only the first frame.
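
The idea behind such a pool can be sketched in a few lines. The following Python class is a conceptual illustration only, assuming the simplest possible design (a bump allocator that never reuses freed memory); the real allocator is C code replacing malloc/free, and its internals are not described here. The names FixedPool and PoolExhausted are invented for the sketch.

```python
class PoolExhausted(MemoryError):
    """Raised when the pre-allocated pool cannot satisfy a request."""


class FixedPool:
    """Hypothetical bump allocator over one buffer allocated up front."""

    def __init__(self, size, align=16):
        # All memory is obtained once, before any untrusted data is parsed.
        self._buf = bytearray(size)
        self._view = memoryview(self._buf)
        self._offset = 0
        self._align = align  # must be a power of two

    def alloc(self, nbytes):
        """Hand out a writable slice of the pool, or fail. Never grows."""
        if nbytes < 0:
            raise ValueError("negative allocation size")
        # Round the current offset up to the next aligned position.
        start = (self._offset + self._align - 1) & ~(self._align - 1)
        end = start + nbytes
        if end > len(self._buf):
            raise PoolExhausted(f"pool of {len(self._buf)} bytes exhausted")
        self._offset = end
        return self._view[start:end]

    @property
    def used(self):
        """Number of bytes consumed so far, including alignment padding."""
        return self._offset
```

The important property is the failure mode: when the pool runs out, the request fails instead of asking the kernel for more memory, so a decoder confronted with an absurdly large image hits a hard ceiling.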

If the provided input file isn’t a fifo, the file is mmap’ed into the process’s memory, avoiding an extra copy, which helps when analyzing large video files. All file descriptors except those required to communicate with the Python code are closed, and a time limit is set up that terminates the process if it takes too long to provide a result. Finally the code switches to seccomp strict mode. All that is implemented using a C macro named:

_______________UNTRUSTED_CODE_BELOW_THIS_LINE________________
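
The first of those steps, choosing between mmap and a plain read, might look roughly like this. It is a Python sketch of the decision only, not the C code behind the macro, and load_input is a name made up for the example. Closing descriptors, the time limit and the switch to strict mode are left out.

```python
import mmap
import os
import stat


def load_input(fd):
    """Return the input as a bytes-like object (hypothetical helper).

    Regular files are memory-mapped read-only, so no extra copy is made.
    A fifo cannot be mapped, so it is read into memory until EOF.
    """
    st = os.fstat(fd)
    if stat.S_ISFIFO(st.st_mode):
        chunks = []
        while chunk := os.read(fd, 65536):
            chunks.append(chunk)
        return b"".join(chunks)
    if st.st_size == 0:
        # An empty file cannot be mapped; there is nothing to parse anyway.
        return b""
    return mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
```

Either way this has to happen before the switch to strict mode, since mmap is no longer available afterwards.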

Once extraction finishes, a simple binary protocol over stdout is used to communicate the result. It can be either an error message, a metadata block (width/height/duration/file format) or image data. The latter is sent out as a non-exploitable linear RGB(A) pixel stream that can be safely imported into PIL using Image.frombuffer.
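
The actual wire format isn’t documented in this post, so the following Python sketch invents one purely for illustration: a one-byte message type, a four-byte little-endian payload length, and for images a small header with width, height and channel count. Every constant and function name in it is an assumption. What it is meant to show is the receiving side’s job: trust nothing and check that the pixel data matches the announced dimensions before handing it to PIL.

```python
import struct

# Hypothetical message types and layout, not the real protocol.
MSG_ERROR, MSG_IMAGE = 1, 3
HEADER = struct.Struct("<BI")       # message type, payload length
IMAGE_HEAD = struct.Struct("<IIB")  # width, height, channels (3 or 4)
MAX_PAYLOAD = 64 * 1024 * 1024      # arbitrary upper bound for the sketch


class DecodeError(Exception):
    pass


def pack_message(msg_type, payload):
    """Build one message the way a sandboxed extractor might emit it."""
    return HEADER.pack(msg_type, len(payload)) + payload


def parse_message(buf):
    """Validate one complete message and return (mode, size, pixels)."""
    if len(buf) < HEADER.size:
        raise DecodeError("truncated header")
    msg_type, length = HEADER.unpack_from(buf)
    if length > MAX_PAYLOAD or len(buf) != HEADER.size + length:
        raise DecodeError("length mismatch")
    payload = buf[HEADER.size:]
    if msg_type == MSG_ERROR:
        raise DecodeError(payload.decode("utf-8", "replace"))
    if msg_type != MSG_IMAGE:
        raise DecodeError("unknown message type")
    if length < IMAGE_HEAD.size:
        raise DecodeError("truncated image header")
    width, height, channels = IMAGE_HEAD.unpack_from(payload)
    pixels = payload[IMAGE_HEAD.size:]
    if channels not in (3, 4) or width == 0 or height == 0:
        raise DecodeError("invalid image header")
    # The pixel stream must be exactly as large as the header claims.
    if len(pixels) != width * height * channels:
        raise DecodeError("pixel data does not match dimensions")
    mode = "RGB" if channels == 3 else "RGBA"
    return mode, (width, height), pixels
```

With Pillow installed, the returned values could then be passed on as Image.frombuffer(mode, size, pixels, "raw", mode, 0, 1). Since raw pixels carry no structure beyond their length, there is no parser left on the trusted side to exploit.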

On the Python side, a wrapper makes it easy to use all that by just calling two functions, probe_format and load_image. Here’s an example generating a thumbnail from untrustworthy binary data in stream:

try:
    im = sandbox_extractor.load_image(
        sandbox_extractor.STREAM_IMAGE, stream,
    ) 
    # im is a normal PIL image, so the following just works.
    im.save("thumb.jpg")
except sandbox_extractor.DecodeError as err:
    # error handling: the upload could not be decoded
    print(f"cannot decode upload: {err}")

Rollout

The code has been tested on thousands of existing assets to ensure it works identically to the previous implementation and has been rolling out slowly to ensure everything works as expected. There’s no difference in server load and the implementation is rock solid so far. Suddenly a 0-day vulnerability in libjpeg or FFmpeg doesn’t sound so scary any more. Feels good.

infobeamer-fw · March 18, 2025, 11:26am

Worth it: CVE-2025-27363 (Ubuntu security tracker), a FreeType vulnerability. It would probably be exploitable by uploading a malicious font file, but would be contained by the sandbox approach.

infobeamer-fw · July 9, 2025, 1:00pm

The same approach is now also used to transcode those small video thumbnail previews in WebM format. The transcoding process runs entirely within the same sandbox, so generating those thumbnail videos is completely safe, even on untrusted videos. Right now the output of the sandboxed process is the full (untrustworthy) WebM byte stream. As such the encoding part can’t benefit from any type of hardware encoding acceleration - which doesn’t exist right now, so it doesn’t matter.

It should also be possible to set up a small protocol to pass out individual RGB or YUV420 frames from inside the sandbox to an accelerated encoder that then works outside the sandbox. But for the small preview thumbnail videos that wasn’t necessary, as those tend to be rather small and encoding can’t be accelerated anyway.
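
Such a frame protocol hasn’t been built, so the following is only a thought experiment in Python. Raw frames have a size that follows directly from their dimensions, which means the trusted side could cut a byte stream into frames by length alone, without parsing anything. The function names and the pixel format labels rgb24 and yuv420p are choices made for this sketch.

```python
def frame_size(width, height, pix_fmt):
    """Number of bytes in one raw frame (hypothetical helper)."""
    if pix_fmt == "rgb24":
        return width * height * 3
    if pix_fmt == "yuv420p":
        # Full-resolution luma plane plus two chroma planes that are
        # subsampled by two in each direction (rounded up for odd sizes).
        chroma_w, chroma_h = (width + 1) // 2, (height + 1) // 2
        return width * height + 2 * chroma_w * chroma_h
    raise ValueError(f"unsupported pixel format: {pix_fmt}")


def split_frames(stream, width, height, pix_fmt):
    """Cut a byte stream into frames, rejecting streams of the wrong length."""
    size = frame_size(width, height, pix_fmt)
    if size == 0 or len(stream) % size != 0:
        raise ValueError("stream is not a whole number of frames")
    return [stream[i:i + size] for i in range(0, len(stream), size)]
```

Each frame could then be handed to an encoder outside the sandbox, which only ever sees fixed-size pixel buffers.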