Retro game generator: editing games with Claude Code
Building a Django app that runs Claude Code programmatically
I ended my Retro Game Generator! post by mentioning revisions to already-built games. At first, we tried a one-shot “Claude, reply with the full contents of all files you want to edit” API call, but it didn’t work very well:
Claude frequently /* lazycode */s the bits it isn’t changing, no matter how much I ask it not to
If code files get long, its responses can get truncated
If it decides (or I ask it) to refactor and split files, it has trouble staying on topic
It frequently gets lost in dependency management, so we end up with files imported in the wrong order, code executed too soon, and similar “can’t keep this all in my head” problems
Amusingly, the /* lazycode */ problems often led to conceptual drift too:
We ask for an edit
Claude responds by eliding half of the game logic into a “rest of code remains the same” comment
We ask it to put the missing code back in
Claude responds by implementing a totally different game to fill in the gap
I knew that fixing all this would require a different architecture, likely leveraging something like Aider that’s already done the heavy lifting on those pretty fundamental issues.
Since then, of course, Claude Code was released, and the Claude Code SDK. So I recently decided to tackle revisions.
Changing the schema
Originally, I had simple database-backed models for a Game (with a name, JSON blobs of files and assets, and a Vercel deployment URL) and a Message (attached to a game, with a user/assistant role, text content, and a way to keep them in order).
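Roughly like this, as a sketch reconstructed from that description (field names are approximate, and BaseModel is the project’s own shared base class):
from django.db import models

class Game(BaseModel):
    name = models.CharField(max_length=200)
    files = models.JSONField(default=list)    # JSON blob of file objects
    assets = models.JSONField(default=list)   # JSON blob of asset paths
    vercel_url = models.URLField(blank=True, null=True)

class Message(BaseModel):
    game = models.ForeignKey(Game, on_delete=models.CASCADE, related_name='messages')
    role = models.CharField(max_length=20, choices=[('user', 'User'), ('assistant', 'Assistant')])
    content = models.TextField()
    order = models.PositiveIntegerField()     # keeps messages in order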
To properly accommodate revisions, I extended this with a GameRevision model[1]:
class GameRevision(BaseModel):
    game = models.ForeignKey(Game, on_delete=models.CASCADE, related_name='revisions')
    version_number = models.PositiveIntegerField(help_text="Sequential version number for this game")
    files = models.JSONField(default=list, help_text="JSON array of file objects with name and body")
    assets = models.JSONField(default=list, help_text="JSON array of asset paths used in the game")
    vercel_url = models.URLField(blank=True, null=True, help_text="The deployed game URL")
    deployment_status = models.CharField(
        max_length=20,
        choices=[
            ('pending', 'Pending'),
            ('deploying', 'Deploying'),
            ('deployed', 'Deployed'),
            ('failed', 'Failed'),
        ],
        default='pending'
    )
I also extended the Message model with a nullable foreign key to GameRevision, so that we can track which message prompted any given version of the game.
Giving Claude a code environment
Of course, Claude Code works against a filesystem, and it expects to be able to read files, write files, and run bash commands.
But the code for our games is all just living in database fields until it gets deployed to Vercel.
To bridge this gap I introduced a CodeEnvironment concept. It’s responsible for turning a GameRevision from the database into an isolated Claude Code-friendly environment, and harvesting the results of a Claude Code session for persistence and deployment:
class CodeEnvironment(ABC):
    """
    Abstract base class for code execution environments.

    Each environment provides:
    1. File materialization (revision -> filesystem)
    2. Isolated execution space (hopefully)
    3. Change harvesting (filesystem -> revision data)
    4. Cleanup
    """

    def __init__(self, revision: GameRevision):
        self.revision = revision
        self.game = revision.game

    @abstractmethod
    def __enter__(self):
        pass

    @abstractmethod
    def __exit__(self, exc_type, exc_val, exc_tb):
        pass

    @abstractmethod
    def get_working_directory(self) -> str:
        pass

    @abstractmethod
    def materialize_files(self) -> None:
        pass

    @abstractmethod
    def harvest_changes(self):
        """
        Returns a list of {"name": str, "body": str} dicts
        """
        pass
For now, I just implemented this with a tempdir-based approach that dumps the files out to a new temporary directory and cleans up after itself:
class TempDirCodeEnvironment(CodeEnvironment):
    def __init__(self, revision: GameRevision, auto_cleanup: bool = True):
        super().__init__(revision)
        self._temp_dir = None
        self._auto_cleanup = auto_cleanup  # set to False to keep the tempdir around for debugging

    def __enter__(self):
        self._temp_dir = Path(tempfile.mkdtemp(prefix=f"game-{self.game.id}-rev-{self.revision.id}-"))
        try:
            self.materialize_files()
            return self
        except Exception:
            self.__exit__(None, None, None)
            raise

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._auto_cleanup and self._temp_dir and self._temp_dir.exists():
            shutil.rmtree(self._temp_dir, ignore_errors=True)
        if self._auto_cleanup:
            self._temp_dir = None

    def materialize_files(self) -> None:
        for file_data in self.revision.files:
            file_path = self._temp_dir / file_data['name']
            file_path.parent.mkdir(parents=True, exist_ok=True)
            file_path.write_text(file_data['body'], encoding='utf-8')
This is, of course, a terrible idea, no matter how much Claude Code tries under the hood not to rm -rf /, so in the longer term I’m planning to provide another environment. Maybe one that spins up Fly.io containers on demand[2].
Driving Claude Code with code
The next step was integrating the Claude Code SDK itself. First I construct a claude_code_sdk.ClaudeCodeOptions to specify the tempdir our code is “checked out” to, the tools and blanket permission mode we should use, and our overall system prompt. Then I can run a prompt and watch the output:
options = claude_code_sdk.ClaudeCodeOptions(
    cwd=working_directory,
    system_prompt=system_prompt,
    allowed_tools=["Read", "Write", "Edit", "MultiEdit", "LS", "Bash"],
    permission_mode='acceptEdits',
)

async for message in claude_code_sdk.query(prompt=prompt, options=options):
    print(message)
That will run for a while, print out a bunch of interesting output, and eventually stop.
When we send a command to Claude Code programmatically, a whole bunch of assistant messages (thinking blocks, intermediate messages that the user may or may not care about, tool uses) and user messages (tool results) will occur semi-autonomously.
This will all happen while we loop over that single claude_code_sdk.query function call and wait for it to terminate.
Claude eventually decides it's completed the assigned task. At that point, the SDK conveniently marks its final message as a special ResultMessage type. That will include a subtype that can indicate success, some interesting metadata like token usage and total cost, and a text result.
This means I can log-but-ignore all the intermediate assistant messages and tool uses, and just treat the ResultMessage as a response to the “real” end user.
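In code, watching the stream for those different message types looks something like this (the block and field names here are from my reading of the SDK’s types, so treat the details as approximate):
# Sketch only: message/block type names and ResultMessage fields may not match the SDK exactly.
async for message in claude_code_sdk.query(prompt=prompt, options=options):
    if isinstance(message, claude_code_sdk.AssistantMessage):
        for block in message.content:
            if isinstance(block, claude_code_sdk.ToolUseBlock):
                print(f"tool use: {block.name}")           # e.g. Read, Edit, Bash
    elif isinstance(message, claude_code_sdk.ResultMessage):
        print(message.subtype)                             # e.g. "success"
        print(message.total_cost_usd, message.usage)       # spend and token counts
        print(message.result)                              # the final text reply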
So I can build up a clean conversation history, where the user's message and the final ResultMessage from Claude Code are maintained in the top-level exchange for both UI display and followup LLM calls:
await save_to_conversation_table(prompt, role="user")

async for message in claude_code_sdk.query(prompt=prompt, options=options):
    save_to_audit_log(message)
    if isinstance(message, claude_code_sdk.ResultMessage):
        await save_to_conversation_table(message.result, role="assistant")
Note that the SDK is all async. I didn't want to bother with that in any Django view code or ORM code, so I wrapped it in a simple synchronous function that takes a revision and the prompt messages, kicks the whole thing off, and returns a dict of files and a reply:
def run_claude_code_sync(revision, system_prompt, messages, progress_callback=None):
    with asyncio.Runner() as runner:
        with TempDirCodeEnvironment(revision) as code_environment:
            claude_code_service = ClaudeCodeService(
                working_directory=code_environment.get_working_directory(),
            )
            return runner.run(
                claude_code_service.converse(
                    system_prompt=system_prompt,
                    messages=messages,
                    progress_callback=progress_callback,
                )
            )
I then called that from a simple django-q task for my view to enqueue and my front end to poll.
This feels like a good base. I should be able to e.g. build out that whole Fly.io backend — or more complex multi-agent pipelines — without any of my Django code or UI layer needing to know at all.
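Because everything goes through the CodeEnvironment context manager, swapping backends should mostly be a matter of constructing a different class — something like this entirely hypothetical sketch (FlyMachineCodeEnvironment and the settings flag don’t exist yet):
# Hypothetical: only TempDirCodeEnvironment exists today.
from django.conf import settings

environment_class = (
    FlyMachineCodeEnvironment if settings.USE_FLY_ENVIRONMENTS else TempDirCodeEnvironment
)
with environment_class(revision) as code_environment:
    working_directory = code_environment.get_working_directory()
    # ...run Claude Code exactly as before...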
Who’s paying for this?
After using this to revise games for a few days I logged in to my Anthropic developer console and noticed something strange: it wasn't logging any money spent against the API key I thought it was using.
Or any other API key!
Apparently, under the hood this whole Python SDK just shells out to my system-installed claude CLI with --output-format stream-json and gathers up the output. So it's using whatever auth I have configured there, which happens to be my Claude Pro subscription plan via a previous OAuth /login.
This is great, because it means I'm not spending any extra money!
…until it’s not, because I hit my usage cap.
I’d love to figure out how to toggle between usage modes programmatically, so I can use my Pro subscription while I can but then seamlessly switch to pay-per-token if I really need to make more games before usage resets.[3]
Should I use this everywhere?
Now I have two totally distinct code paths:
When creating a new game: uses the Claude API, instructs Claude to put code inside XML tags with a one-shot prompt, extracts files immediately from the response
When revising a game: uses Claude Code, lets it do whatever it wants in its code environment, extracts files from the environment when it says it’s done
At some point, I should probably unify these and move to the Claude Code environment for all work. Apart from maintainability I suspect this would give me a few other benefits:
The initial game could be more complex, with more code, and better-organized files — since I wouldn’t be limited to a single response from a single LLM turn
Instead of establishing my ground rules with a system prompt, I could create an empty code scaffold for Claude Code to start all new games against — with my preferences (vanilla JS, simple <script> tags, no build step) baked in
Longer term, I could also introduce a general-purpose game framework in here too, with an event loop, keyboard bindings, title screen state … and eventually real-time multiplayer and a tile-based map layer.
The results
This has been completely transformative! We’re reliably getting great results from edit requests now: Claude Code learns the code, goes off and makes our changes, and delivers a revised game that builds incrementally on what was there before, without errors.
Here’s a game we recently built that illustrates the progression of versions nicely — EARTH VS LAVA: GROUND WARS:
V1 was a simple directionless platformer (jump around and avoid the lava baddies)
V2 added a goal (defeat ten lava baddies)
V3 ensured there were enough lava baddies to make the goal achievable :-)
V4 made the baddies start fighting back by spewing lava
V5 introduced levels with increasing difficulty and procedural generation
V6 gave us the EARTH DEVASTATOR
V7 dialed down the difficulty on early levels so you have time to get the hang of it
[1] Our original code was built on FastAPI. At some point I ported it to Django.
[2] I should probably build a middle-ground version with Docker first, but that seems less fun, and I’m not sure I’d trust it anyway.
[3] Come to think of it, I also really want that for my own regular Claude Code sessions. But there seems to be a rabbit hole here.

