07.06.2025 22:00

Codex Wrote WASD Controls for the Caretaker

In the comments to the video demonstrating the Caretaker, people have long been asking me to add WASD controls to the interface.

And this week OpenAI gave Plus subscribers access to Codex, its code-writing agent with GitHub integration.

Coincidences aren't coincidental, so I decided to test Codex in action using this very task as an example.

What is Codex?

Codex is an autonomous cloud AI agent that can perform multiple tasks in parallel.

This means you can connect it to your GitHub account, grant it the necessary repository permissions (the list of repositories can be limited) and put it to work.

The main settings it provides:

general settings

You can specify a custom prompt, choose whether to split the screen vertically (chat on the left, diff with code changes on the right), and set a template for the names of the branches Codex will work in.

You can also configure the environment (Codex deploys your project in its own virtual machine).

basic settings

There are code execution settings: you can choose an image for the VM and pre-install some packages, set environment variables and secrets, and provide a setup script that will run on the image.

You can also enable internet access for the agent and limit both the list of sites it can connect to and the allowed HTTP methods (funny, as if you couldn't cause trouble without POST / PUT / DELETE).

code execution settings
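To make the idea of restricting both sites and HTTP methods more concrete, here is a minimal sketch of such a check. It is only an illustration of the concept, not how Codex implements it: the allowlists and the is_request_allowed function are hypothetical names of mine.

```python
from urllib.parse import urlsplit

# Hypothetical allowlists; in Codex the real values are set in the environment settings.
ALLOWED_DOMAINS = {"pypi.org", "github.com"}
ALLOWED_METHODS = {"GET", "HEAD", "OPTIONS"}


def is_request_allowed(method: str, url: str) -> bool:
    """Return True only if both the HTTP method and the host pass the allowlists."""
    host = (urlsplit(url).hostname or "").lower()
    # Accept a listed domain itself and any of its subdomains.
    host_ok = any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
    return host_ok and method.upper() in ALLOWED_METHODS
```

Note that a check like this still lets a GET request to an allowed host carry arbitrary data in the query string, which is why limiting methods alone does not make the agent harmless.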

You can also connect to the VM terminal and discover Ubuntu 24.04 there:

/workspace/esp32-caretaker$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.2 LTS
Release:    24.04
Codename:   noble

Or just display a greeting:

terminal

What can you do next? When connecting a repository, Codex automatically starts 3 tasks (they might differ, here are mine):

starter tasks

Each one works for a while, goes through the code and produces some result. This can be a simple text description, as in the case of "Explain codebase to newcomer", or ready-made code changes that can be sent to the repository as a branch and/or a PR with a couple of clicks.

But the standard tasks aren't particularly interesting, and I had a long-standing issue about adding WASD controls to the interface. Without much thought, I sent Codex just a link to this issue, with no additional explanations. Three minutes later I had ready-made changes adding the controls to the code, which I turned into a PR with a couple of clicks.

After that, with a couple more prompts, I made the on-screen stick move when controlled from the keyboard and added support for the Russian keyboard layout. These are optional extras, so you can consider that Codex handled the task brilliantly on the first try. The process of analyzing the project, writing the code and sending the PR can be watched in this YouTube video (sped up 2x).
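The article doesn't show the code Codex produced, so the following is only a sketch of the underlying logic, under my own assumptions: pressed keys are mapped to a stick position, and the Cyrillic letters that the same physical keys produce in the standard Russian layout are mapped too. KEY_DIRECTIONS and stick_position are hypothetical names, not taken from the repository.

```python
# Physical W/A/S/D keys mapped to (x, y) stick directions.
# The Cyrillic letters are what the same physical keys type in the
# standard Russian layout: W -> ц, A -> ф, S -> ы, D -> в.
KEY_DIRECTIONS = {
    "w": (0, 1), "ц": (0, 1),
    "s": (0, -1), "ы": (0, -1),
    "a": (-1, 0), "ф": (-1, 0),
    "d": (1, 0), "в": (1, 0),
}


def stick_position(pressed_keys):
    """Combine the currently pressed keys into a stick position in [-1, 1] per axis."""
    # Collect unique directions so that "w" and "W" are not counted twice.
    directions = {
        KEY_DIRECTIONS[key.lower()]
        for key in pressed_keys
        if key.lower() in KEY_DIRECTIONS
    }
    x = sum(dx for dx, _ in directions)
    y = sum(dy for _, dy in directions)
    return x, y
```

Opposite keys cancel out, and two adjacent keys give a diagonal. In a browser, an alternative to listing Cyrillic letters is to read KeyboardEvent.code (for example "KeyW"), which identifies the physical key regardless of the active layout.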

Some comments on the problem-solving process and a comparison with Cursor.

The entire solution process can be watched live on screen and reviewed afterwards in chat mode.

  1. Codex checks for AGENTS.md in the project. If you decide to use Codex on a permanent basis, it makes sense to add this file with all the necessary project instructions.
  2. Unlike in Cursor, all interaction with the project is fully visible: Codex has no way to browse directories or read files except through the console, so every operation runs directly in the shell via ls / sed / grep / find and other commands.
  3. When changes need to be made, this also happens visibly in the console, through the apply_patch command. For example, here is how it added a line to the readme:
root@99b15073bb5c:/workspace/esp32-caretaker# apply_patch<<'PATCH'
> *** Begin Patch
> *** Update File: readme.md
> @@
>    * **Web Interface Joystick:** Intuitive virtual joystick control
>    * **Web Interface Sliders:** Fine-grained control over individual motors
>    * **Bluetooth Controller:** Support for physical Bluetooth controllers (e.g., ExpressLRS Joystick)
> +  * **Keyboard WASD:** Use W/A/S/D keys in the web interface
>    * **Adjustable Video Settings:** Multiple resolution options (from QQVGA to UXGA)
> *** End Patch
> PATCH
Success. Updated the following files:
M readme.md

  4. One similarity with Cursor: after applying changes, Codex tries to validate them with a linter or other checks, if they are configured in your project. For me, it checked everything with a Python linter but couldn't run pio run, because pio isn't installed on the VM (this can possibly be added through the settings).
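The "run the checks that are available, skip the rest" behavior from the last item can be imitated in a few lines. This is a sketch of the general idea, not Codex's actual logic: plan_checks is a hypothetical helper, and flake8 stands in for the unnamed Python linter, while pio run is the command mentioned above.

```python
import shutil


def plan_checks(commands, which=shutil.which):
    """Split check commands into runnable and skipped ones.

    A command is runnable if its executable is found on PATH.
    The lookup function is injectable to make the helper easy to test.
    """
    runnable, skipped = [], []
    for command in commands:
        executable = command[0]
        if which(executable):
            runnable.append(command)
        else:
            skipped.append(command)
    return runnable, skipped


if __name__ == "__main__":
    # flake8 is an assumed linter name; the article only says "Python linter".
    checks = [["flake8", "."], ["pio", "run"]]
    runnable, skipped = plan_checks(checks)
    for command in skipped:
        print("skipping (not installed):", " ".join(command))
```

On a VM without PlatformIO, pio run would land in the skipped list, which matches what happened in my case.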

Then you just click a button to send the changes to the repository: a codex/{feature} branch is created there, and you can also open a PR from it. Codex couldn't link the PR to the issue on its own: despite knowing the issue id, which is even part of the branch name, it simply didn't put the corresponding link in the PR description. I had to help it with that.

Overall, it looks quite interesting. I don't know whether I'll use it permanently; for now, Cursor with direct code access seems more convenient purely from a development perspective, since you can always fix something by hand faster than with prompts. But for tasks that don't require this, or for people who aren't on close terms with programming, the tool is quite intriguing.
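On GitHub, a PR is linked to an issue when its description contains a closing keyword such as "Closes #N". Since the issue id is already in the branch name, adding that line can be automated. Below is a small sketch: the helper names and the branch name in the test are made up for illustration, and the issue number is a placeholder rather than the real one.

```python
import re
from typing import Optional


def closing_line(branch_name: str) -> Optional[str]:
    """Build a GitHub closing keyword line from the first number in the branch name."""
    match = re.search(r"\d+", branch_name)
    return f"Closes #{match.group()}" if match else None


def with_issue_link(description: str, branch_name: str) -> str:
    """Append the closing line to a PR description unless it is already there."""
    line = closing_line(branch_name)
    if line is None or line in description.splitlines():
        return description
    return f"{description.rstrip()}\n\n{line}"
```

Taking the first number in the branch name is a simplifying assumption; a branch that contains other digits would need a stricter pattern.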

Demo

A demo of the resulting keyboard control mode is in the video below. You can also assess the control delay and the video streaming: everything is in one frame, whereas in previous demos this was sometimes unclear because of editing.

Tags: AI ChatGPT Codex