Anton Kopylov

Codex Ruby: a Ruby SDK for the Codex CLI

I use coding agents from the terminal a lot. That works when a human is sitting there, reading output and deciding what to do next.

But once I wanted to build Codex-powered workflows inside Ruby apps, the terminal interface was not enough.

I needed a boring Ruby API:

  • start a Codex thread
  • send a task
  • stream events as they happen
  • parse file changes and command executions as structured objects
  • resume a thread later
  • interrupt a run when needed
  • track token usage and final context size

So I built codex-ruby, a small Ruby SDK for the Codex CLI.

The gem name is codex-ruby; the Ruby module is CodexSDK.

Why wrap the CLI instead of calling an API directly?

The Codex CLI already knows how to run an agent in a project directory. It handles the local workflow: sandboxing, command execution, file edits, streaming JSONL events, MCP/tool activity, and session files.

I did not want to reimplement that.

I wanted Ruby code to treat the CLI as an engine and interact with it through a stable object model.

That means the SDK is intentionally thin. It manages the subprocess, serializes options, parses events, and gives the caller typed Ruby objects.
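As a rough sketch of what "thin" means here (not the gem's actual implementation): spawn a child process, read its stdout line by line, and parse each line as JSON. The helper name each_jsonl_event, the stand-in child command, and the event type strings are all made up for illustration.

```ruby
require "json"
require "open3"
require "rbconfig"

# Hypothetical helper: run a command and yield each stdout line as a parsed Hash.
def each_jsonl_event(*argv)
  return enum_for(:each_jsonl_event, *argv) unless block_given?

  Open3.popen2(*argv) do |stdin, stdout, wait_thread|
    stdin.close # nothing to send to the child in this sketch
    stdout.each_line do |line|
      line = line.strip
      next if line.empty?

      yield JSON.parse(line)
    end
    wait_thread.value # wait for the child to exit
  end
end

# Stand-in child process that prints two JSONL lines with made-up event types.
STAND_IN = [
  RbConfig.ruby, "-e",
  'puts %q({"type":"turn.started"}); puts %q({"type":"turn.completed"})'
].freeze

each_jsonl_event(*STAND_IN) { |event| puts event["type"] }
```

Passing the command as an argument list rather than a single string avoids going through a shell, so nothing has to be quoted or escaped.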

Starting a thread

A minimal example looks like this:

require "codex_sdk"

client = CodexSDK::Client.new(
  api_key: ENV.fetch("CODEX_API_KEY")
)

thread = client.start_thread(
  model: "o4-mini",
  sandbox_mode: "read-only",
  working_directory: "/path/to/project"
)

turn = thread.run("Explain this codebase")

puts turn.final_response
puts "Tokens used: #{turn.usage.input_tokens} in, #{turn.usage.output_tokens} out"

That is the blocking mode: send a prompt, wait for completion, get a Turn object back.

For many automation tasks, that is enough.

Streaming events

For anything interactive, streaming matters more.

The CLI emits JSONL events while the agent works. The SDK turns those into typed Ruby objects:

thread.run_streamed("Fix the failing tests") do |event|
  case event
  when CodexSDK::Events::ItemCompleted
    case event.item
    when CodexSDK::Items::AgentMessage
      puts event.item.text
    when CodexSDK::Items::CommandExecution
      puts "Ran: #{event.item.command} (exit #{event.item.exit_code})"
    when CodexSDK::Items::FileChange
      event.item.changes.each do |change|
        puts "#{change[:kind]}: #{change[:path]}"
      end
    end
  when CodexSDK::Events::TurnCompleted
    puts "Done! Used #{event.usage.output_tokens} output tokens"
  when CodexSDK::Events::TurnFailed
    puts "Error: #{event.error_message}"
  end
end

The caller does not have to parse raw JSON strings or guess what shape an event has. The SDK exposes the parts I care about: messages, reasoning, commands, file changes, MCP tool calls, web searches, todo lists, and errors.
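To illustrate the general idea of typed items, here is a minimal sketch that maps raw JSONL lines to small Ruby objects so callers can use case/when on classes. The Struct names, the "type" values, and the field names are hypothetical stand-ins, not CodexSDK's real classes or the CLI's actual event schema.

```ruby
require "json"

# Hypothetical stand-ins for typed items; CodexSDK's real classes may differ.
AgentMessage     = Struct.new(:text, keyword_init: true)
CommandExecution = Struct.new(:command, :exit_code, keyword_init: true)
UnknownItem      = Struct.new(:raw, keyword_init: true)

# Map one JSONL line to a typed object. The "type" values are assumptions.
def parse_item(line)
  data = JSON.parse(line)
  case data["type"]
  when "agent_message"
    AgentMessage.new(text: data["text"])
  when "command_execution"
    CommandExecution.new(command: data["command"], exit_code: data["exit_code"])
  else
    UnknownItem.new(raw: data) # keep unknown shapes instead of raising
  end
end

lines = [
  '{"type":"agent_message","text":"Tests fixed."}',
  '{"type":"command_execution","command":"bundle exec rspec","exit_code":0}',
  '{"type":"something_new"}'
]
items = lines.map { |line| parse_item(line) }
```

Falling back to an UnknownItem is one possible design choice: a newer CLI that emits an unfamiliar item type does not crash an older caller.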

Thread options as Ruby options

Codex has a lot of runtime options. The SDK keeps those in Ruby instead of making every app build command-line strings by hand:

client.start_thread(
  model: "o4-mini",
  sandbox_mode: "read-write",
  working_directory: "/path/to/project",
  approval_policy: "unless-allow-listed",
  reasoning_effort: "high",
  network_access: true,
  web_search: true,
  additional_directories: ["/other/path"]
)

Under the hood, codex-ruby serializes config and CLI flags in the format the Codex CLI expects.
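A minimal sketch of that kind of serialization, assuming a simple one-flag-per-option mapping. The flag names in FLAG_NAMES and the build_argv helper are illustrative assumptions; the real Codex CLI flags and the gem's own serialization may differ.

```ruby
# Hypothetical flag mapping for illustration only.
FLAG_NAMES = {
  model: "--model",
  sandbox_mode: "--sandbox",
  working_directory: "--cd"
}.freeze

# Turn a Ruby options hash into an argv array, skipping nil values and
# rejecting option names that have no known flag.
def build_argv(options)
  options.each_with_object([]) do |(key, value), argv|
    next if value.nil?

    flag = FLAG_NAMES.fetch(key) { raise ArgumentError, "unknown option: #{key}" }
    argv << flag << value.to_s
  end
end

argv = build_argv(
  model: "o4-mini",
  sandbox_mode: "read-only",
  working_directory: "/path/to/project"
)
```

Rejecting unknown keys up front turns a typo in an option name into a Ruby error instead of a silently ignored setting.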

That sounds small, but this is exactly the kind of glue that becomes annoying when every app has to do it separately.

Resuming and interrupting

Agent work is not always a single request. Sometimes the useful workflow is:

  1. start a thread
  2. let the agent inspect the repo
  3. ask a follow-up
  4. resume later from a stored thread id

The SDK supports that:

thread = client.resume_thread("thread_abc123", model: "o4-mini")
turn = thread.run("Now add tests for the changes")

And if a run needs to stop, a separate Ruby thread (a plain Thread, not the Codex thread) can interrupt it:

thread.interrupt

That matters when Codex is part of a larger app workflow. The app needs lifecycle control, not just text output.
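The general mechanism behind this kind of lifecycle control can be sketched with plain Ruby: one thread waits on a child process while another signals it to stop. This is an illustration for Unix-like systems, not the gem's implementation; the child is a stand-in Ruby process that sleeps, and the 0.2 second delay is arbitrary.

```ruby
require "rbconfig"

# Stand-in for a long agent run: a child Ruby process that sleeps.
pid = Process.spawn(RbConfig.ruby, "-e", "sleep 30")

# A second Ruby thread decides the run should stop and signals the child.
interrupter = Thread.new do
  sleep 0.2
  Process.kill("TERM", pid)
end

# The main thread blocks here until the child exits.
_, status = Process.wait2(pid)
interrupter.join

puts "child exited cleanly: #{status.success? ? 'yes' : 'no'}"
```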

Context snapshots

One feature I care about is context usage.

Codex writes rollout/session logs under ~/.codex/sessions, or under $CODEX_HOME/sessions when CODEX_HOME is set. Those logs include token_count entries. codex-ruby can read the latest snapshot and expose it after a run:

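snapshot = thread.context_snapshot
snapshot.context_tokens                  # tokens currently in the thread's context
snapshot.model_context_window            # size of the model's context window
snapshot.last_token_usage.total_tokens   # usage from the most recent token_count entry
snapshot.total_token_usage.total_tokens  # cumulative usage for the thread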

This is separate from the per-turn token usage reported as turn.usage. It answers a different question:

“How full is this agent thread’s context now?”

That is useful for long-running agent workflows where the next turn may fail, trigger context compaction, or behave differently because the context window is getting crowded.
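One way an app might act on those numbers is to compute how full the context is and decide whether to start a fresh thread. In this sketch the Snapshot struct is a hypothetical stand-in carrying only the two fields used (the real object comes from thread.context_snapshot), and the 0.8 threshold is an arbitrary example value, not a Codex default.

```ruby
# Hypothetical stand-in with the two fields used here.
Snapshot = Struct.new(:context_tokens, :model_context_window, keyword_init: true)

# Fraction of the context window in use, from 0.0 to 1.0.
def context_fullness(snapshot)
  snapshot.context_tokens.to_f / snapshot.model_context_window
end

# Decide whether the next task should go to a new thread.
# The default threshold is an example value chosen for this sketch.
def start_fresh_thread?(snapshot, threshold: 0.8)
  context_fullness(snapshot) >= threshold
end

snapshot = Snapshot.new(context_tokens: 150_000, model_context_window: 200_000)
puts format("%.0f%% full", context_fullness(snapshot) * 100) # prints "75% full"
puts "start a fresh thread" if start_fresh_thread?(snapshot)
```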

Design constraint: keep it boring

The gem is intentionally dependency-light. It shells out to the Codex CLI and parses JSONL with the Ruby standard library.

The public API is small:

  • CodexSDK::Client
  • CodexSDK::AgentThread
  • CodexSDK::Events::*
  • CodexSDK::Items::*

Most of the work is not glamorous. It is subprocess lifecycle management, graceful shutdown, stream parsing, option serialization, and error handling.

But that is the point. A good SDK makes the messy boundary boring.

Where this fits

I do not see codex-ruby as “another agent framework.” It is a bridge.

The Codex CLI remains the agent runtime. Ruby remains the application language. The SDK connects them cleanly enough that a Rails app, background job, or internal tool can run agent workflows without pretending a terminal session is an API.

That is the pattern I keep coming back to in my projects: use the strongest tool for the job, then build the small adapter that makes it usable from the rest of the system.
