Running Tests with Skills in a Devbox

Sebastian Witalec
5 MIN READ

Have you ever spent an afternoon stuck in the never-ending cycle of commit, push, wait, fix, push again? CI has plenty of papercuts, but the loop of fixing a broken test can really eat up the hours in your day.

The Devbox agent skill runs your test suite without you having to push your branch. It provisions the environment, syncs your code, runs the tests, and reports back. The Devbox reflects your working tree, including uncommitted changes, so you can validate local edits before pushing.

You catch failures locally instead of cycling through commit, push, wait for CI, fix, and push again.

Installing Namespace Devbox

If you are new to Namespace Devbox, follow these instructions to get set up.
Otherwise, you can skip to the next section.

Let’s walk through the steps to set up Devboxes on your machine:

Create a Namespace account

Sign up at cloud.namespace.so and then enable the GitHub integration by visiting Devboxes, clicking Connect Organization, and following the instructions.

Install the CLI and log in

curl -fsSL get.namespace.so/devbox/install.sh | bash
devbox auth login

(Optional) Test your setup

You can quickly test your setup by creating an ephemeral Devbox called hello-world:

devbox create --name=hello-world --ephemeral --size=S --image=builtin:agents

Then list your Devboxes to see it running:

devbox list

And when you are done, delete it:

devbox expire hello-world --force
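
The three commands above can be bundled into one function so the Devbox is removed even when an intermediate step fails. This is a minimal sketch that uses only the commands and flags shown above; the function name devbox_smoke_test is a hypothetical helper, not part of the Devbox CLI.

```shell
#!/usr/bin/env bash
# Sketch: create an ephemeral Devbox, list Devboxes, and always expire it.
# devbox_smoke_test is a hypothetical helper name, not part of the Devbox CLI.
devbox_smoke_test() {
  local name="${1:-hello-world}"
  local status=0

  # Same flags as the create command above; stop here if creation fails.
  devbox create --name="$name" --ephemeral --size=S --image=builtin:agents || return 1

  # Show the current Devboxes; remember a failure but keep going so cleanup runs.
  devbox list || status=$?

  # Delete the Devbox regardless of what happened above.
  devbox expire "$name" --force || status=$?

  return "$status"
}
```

Call it as devbox_smoke_test hello-world, or pass another name to avoid clashing with a Devbox you already have.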

Install the Devbox Skills

The Devbox CLI ships with a set of agent skills: small instruction packs that tell an agent how to perform Devbox-related tasks, like provisioning a Devbox and executing your tests inside it.

You can install the skill at the project level or globally on your machine. A project install checks the skill file into the repo, so everyone who clones it gets the same behavior, and you can tweak the file to match your project (test commands, required dependencies, environment variables) like any other source file. This is the right default for team projects.

A global install lives in your home directory, isn't shared with collaborators, and applies anywhere you run the agent. It's useful for personal scratch projects, or when you want the skill available everywhere without committing anything.

Run the following command from your project directory:

npx skills add namespacelabs/agent-skills

Running Tests with Skills in a Devbox

With the skill installed, the rest is handled by your agent.

Start your agent

Start your agent of choice (Amp, Copilot, Claude Code, etc.) from your project directory:

amp

Run tests in a Devbox

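Then tell your agent to use Devboxes to run the tests. The skill loads automatically when the agent recognizes a test-running task, so no special syntax is required. Two patterns cover most of what you'll do day-to-day: a quick run in a single Devbox, and a parallel run that splits the suite into shards across several Devboxes.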

For a quick run of a single suite, package, or file, ask the agent to use one Devbox:

run the tests in a devbox

The agent will create one ephemeral Devbox, sync the code, install dependencies, run the suite, report back, and then delete the Devbox.

What happens under the hood?

Once you send the prompt, the agent runs through the skill step by step. You can watch each step in the agent's output, but it boils down to five phases.

First, the agent inspects the project layout (test files, configuration, package manifests) to figure out what's being run and how to split it if you asked for parallelism.
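
One simple way to split a suite is to deal test files round-robin across shards. The sketch below is an assumption about how such a split could work, not a description of the skill's actual logic; shard_files is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the test files that belong to one shard.
# Reads one path per line on stdin; shard indexes start at 0.
# shard_files is a hypothetical helper, not part of the Devbox skill.
shard_files() {
  local index="$1" total="$2"
  # Sort first so every shard sees the same order, then keep every
  # total-th line starting at the requested index.
  LC_ALL=C sort | awk -v i="$index" -v n="$total" '(NR - 1) % n == i'
}
```

For example, find . -name '*_test.go' | shard_files 0 3 would list the files for the first of three shards. Round-robin balances file counts, not run time, so slow files can still leave one shard lagging.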

Next, it provisions one ephemeral Devbox per shard, each with an auto-generated unique name. Unique names let you run multiple agents and suites at the same time without colliding on a Devbox.
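
To make the idea of collision-free names concrete, here is a minimal sketch of a name generator. The naming scheme and the unique_devbox_name helper are illustrative assumptions; the post does not describe how the skill actually builds its names, and the sketch assumes names may contain lowercase letters, digits, and hyphens, as hello-world does.

```shell
#!/usr/bin/env bash
# Sketch: build a Devbox name that is unlikely to collide across agents or runs.
# The "<prefix>-shard<N>-<8 hex chars>" scheme is an illustrative assumption.
unique_devbox_name() {
  local prefix="${1:-tests}" shard="${2:-0}" suffix
  # Read 4 random bytes and render them as 8 lowercase hex characters.
  suffix="$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')"
  printf '%s-shard%s-%s\n' "$prefix" "$shard" "$suffix"
}
```

The result can be passed to the --name flag of devbox create, so two agents working on the same repository end up with different Devboxes.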

The agent then uploads your working tree (including uncommitted changes) into each Devbox and runs the appropriate install step: pnpm install, go mod download, pip install, and so on.
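
Choosing the install step can be as simple as checking which manifest file is present. The following is a sketch under that assumption; the mapping from files to commands and the detect_install_cmd helper are hypothetical, and the skill may decide differently.

```shell
#!/usr/bin/env bash
# Sketch: choose an install command from the files present in a project directory.
# The manifest-to-command mapping is an illustrative assumption.
detect_install_cmd() {
  local dir="${1:-.}"
  if [ -f "$dir/pnpm-lock.yaml" ]; then
    echo "pnpm install"
  elif [ -f "$dir/go.mod" ]; then
    echo "go mod download"
  elif [ -f "$dir/requirements.txt" ]; then
    echo "pip install -r requirements.txt"
  else
    # No recognized manifest: report it on stderr and signal failure.
    echo "no known manifest in $dir" >&2
    return 1
  fi
}
```

If your project needs something else, a project-level skill install lets you state the exact command in the skill file instead of relying on detection.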

The suite executes inside the Devbox. Results, logs, and any failure output stream back to your terminal. If anything fails, the agent can patch the code, push the diff into the Devbox, and rerun without leaving the conversation.

When the run is complete, every Devbox the agent created is expired. No leftover state, no cost beyond the minutes the tests actually ran.
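
If you script Devboxes by hand, the same cleanup guarantee can be approximated with a small helper. This sketch reuses the devbox expire command shown earlier in this post; expire_devboxes is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch: expire every Devbox named in the arguments, continuing past failures.
# expire_devboxes is a hypothetical helper, not part of the Devbox CLI.
expire_devboxes() {
  local failed=0 name
  for name in "$@"; do
    # Same command as the manual cleanup shown earlier in this post.
    devbox expire "$name" --force || failed=1
  done
  # Non-zero if any Devbox could not be expired.
  return "$failed"
}
```

Registering it with trap 'expire_devboxes "${created[@]}"' EXIT, where created is an array of the names your script provisioned, runs the cleanup even when the script exits early.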

Prebuilt images

By default, the skill provisions Devboxes from a general-purpose image that includes a broad toolchain and several language runtimes. It works for most projects out of the box, but every run starts with a setup phase: installing runtimes, pulling base packages, restoring dependencies. For small projects that's fine. For larger ones, it adds up.

You can shorten the setup phase with a custom prebuilt image. Instead of starting from the default image and reinstalling everything, you bake your toolchain, language runtimes, and project dependencies into a Devbox image ahead of time. The next time the agent provisions a Devbox, it starts from that image with most of the install work already done, and goes straight to running your tests.

See the Custom Images docs for how to define a Dockerfile, build it, and point your Devboxes at it.

The bottom line

The Devbox test skill turns "run the tests" into a single instruction your agent can act on. It provisions an ephemeral Devbox (or several, in parallel), syncs your working tree, installs dependencies, runs the suite, reports back, and tears everything down. Your laptop stays free, you skip the CI queue, and the agent iterates on failures without you ever opening a shell. Install the skill with npx skills add namespacelabs/agent-skills and get started.
