The Vis x GenAI workshop at IEEE VIS 2025 invited participants to build AI agents that could autonomously generate data visualizations and explanations from real-world datasets. Running a challenge like this needs a way for participants to submit their agents and a way to actually run those agents in a controlled environment. That's what this server does.
The challenge had a few requirements that shaped the system. Participants needed to sign up, form teams, and submit code through a web interface. Each submission had to run in isolation because participants' code is arbitrary and untrusted. The agents needed access to LLM APIs like OpenAI and Azure OpenAI, with keys provided by us so participants didn't have to bring their own. And the results had to feed a public leaderboard showing each team's best output.
The system is split into four pieces that communicate over a shared Postgres database and a Redis queue.
output.pdf which
gets stored and made available via the API.
When a participant uploads their agent as a ZIP file, the web service
unpacks it, validates that it contains the expected files (agent.py
and requirements.txt), and inserts a submission record. It
then dispatches a Celery task with the submission ID.
The worker picks up the task, builds a Docker image based on the
participant's requirements.txt, and runs the container.
Inside the container, the evaluation script dynamically loads the
participant's agent.py, instantiates their
Agent class, calls initialize() and
process(), and captures the output PDF. The container is
short-lived and gets torn down once evaluation finishes.
This isolation matters because we have no idea what's in the participant's code. The container runs as a non-privileged user with a strict time and memory budget. If an agent crashes, hangs, or tries something malicious, the blast radius is one container.
Each participant implements a simple interface in agent.py:
class Agent:
def initialize(self):
"""Set up the agent. Called once before process."""
pass
def process(self, input_data: dict) -> dict:
"""Run the agent on a single input."""
pass
The agent is expected to produce a single output.pdf at the
root of the container โ that PDF is the artifact that gets stored
against the submission and shown on the leaderboard.
I was responsible for designing and implementing this server end-to-end.
That covered the data model and API surface, the Docker sandbox, the
Celery + Redis pipeline, and packaging everything so it could be brought
up or torn down with a single script (./run.sh dev for
development, ./run.sh prod for production).
The platform went live for IEEE VIS 2025 and hosted the workshop challenge submissions. Each team's agent output got picked up, evaluated, and surfaced on the platform.