Clean-room · Prism + stdlib · Ruby ≥ 3.4

Your tests pass.
But do they test?

Mutineer mutates your source one change at a time, runs your suite against each mutant, and reports the ones your tests failed to catch — the gaps where coverage is a green lie. Built to gate CI and to give AI coding agents an objective signal of test value.


0 runtime dependencies · 8 mutation operators · Fork-isolated execution · JSON + human output · --baseline CI gating


$ mutineer run lib/calculator.rb --test test/calculator_test.rb

Mutineer — Mutation Results
=========================

  total      26
  killed     24
  survived    1
  no-coverage 1

  mutation score: 92.3%

── survivor ──────────────────────────────────────
Calculator#discount  lib/calculator.rb:14  (comparison)
--- a/lib/calculator.rb
+++ b/lib/calculator.rb
-    return 0 if qty >= 10
+    return 0 if qty > 10
   ↳ no test asserts the boundary at exactly 10

How it works

Mutate. Run. Report.

A mutant your suite still passes marks code that no test is really checking. Mutineer finds those gaps by breaking your code on purpose.

Mutate

Mutineer parses your source with Prism and applies one change per mutant — flip a < to <=, swap && for ||, delete a statement. Each mutant is re-parsed to confirm it's still valid Ruby.
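A minimal sketch of this step for the comparison rule only, assuming Prism's public Ruby API (`Prism.parse`, `Prism::Visitor`, `CallNode#message_loc`). It is not Mutineer's implementation; `SWAPS`, `ComparisonFinder`, and `comparison_mutants` are hypothetical names. Each mutant splices one replacement operator into the original bytes, then is re-parsed and dropped if Prism reports errors.

```ruby
require "prism"

# Comparison swaps listed under Operators: < <-> <=, > <-> >=, == <-> !=.
SWAPS = { :< => "<=", :<= => "<", :> => ">=", :>= => ">", :== => "!=", :!= => "==" }.freeze

# Hypothetical visitor: collects every binary comparison call in the tree.
class ComparisonFinder < Prism::Visitor
  attr_reader :hits

  def initialize
    super
    @hits = []
  end

  def visit_call_node(node)
    @hits << node if SWAPS.key?(node.name) && node.message_loc
    super # keep walking into the receiver and arguments
  end
end

# Returns one mutated source string per comparison found, keeping only
# mutants that still parse as valid Ruby.
def comparison_mutants(source)
  finder = ComparisonFinder.new
  Prism.parse(source).value.accept(finder)

  finder.hits.filter_map do |node|
    loc = node.message_loc # byte range of the operator token itself
    mutant = source.byteslice(0, loc.start_offset) +
             SWAPS.fetch(node.name) +
             source.byteslice(loc.end_offset..)
    mutant if Prism.parse(mutant).success? # re-parse: skip syntactically broken mutants
  end
end
```

For a method containing `return 0 if qty >= 10`, this yields a single mutant with `qty > 10`, the same change as the survivor in the sample run.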

Run

Each mutant gets its own forked, isolated process, and mutants run in parallel across all your cores. A coverage map means each mutant runs only the test files that actually reach the mutated line.
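A sketch of fork-per-mutant execution with a small process pool, assuming a POSIX platform where `Process.fork` is available (matching the Linux & macOS note under Why Mutineer). Each mutant is represented by a hypothetical `check` lambda standing in for "load the mutant and run its selected tests"; a passing suite means the mutant survived. `fork_check` and `run_all` are illustrative names, not Mutineer's API.

```ruby
# Forks one child for a mutant. `check` is a hypothetical lambda that returns
# true when the selected tests pass against the mutant.
def fork_check(check)
  Process.fork do
    passed = begin
      check.call
    rescue StandardError
      false # an error inside the suite counts as a failure, so the mutant is killed
    end
    exit!(passed ? 0 : 1) # exit! skips at_exit hooks inherited from the parent
  end
end

# Runs every check with at most `jobs` children alive at once and maps each
# mutant id to :survived (suite still passed) or :killed (suite failed).
def run_all(checks, jobs: 4)
  pending = checks.to_a
  running = {} # pid => mutant id
  results = {}

  until pending.empty? && running.empty?
    while running.size < jobs && !pending.empty?
      id, check = pending.shift
      running[fork_check(check)] = id
    end
    pid, status = Process.wait2 # blocks until any child exits
    id = running.delete(pid)
    results[id] = status.success? ? :survived : :killed if id
  end
  results
end
```

Because each check runs in a child process, any state it changes (globals, reopened classes, loaded constants) disappears when the child exits; the parent only ever sees the exit status.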

Report

Killed = your tests caught it. Survived = they didn't. No-coverage = no test reached the mutated line. You get a mutation score and a unified diff for every survivor, pinned to the exact line. Set --threshold to fail CI when the score falls below it.
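The sample run at the top reports 24 killed out of 26 mutants as 92.3%, which is consistent with score = killed / total, counting both survived and no-coverage mutants against you. A sketch of that arithmetic and a threshold gate; the formula is inferred from those numbers rather than documented, and `mutation_score` and `gate_status` are hypothetical names.

```ruby
# Score as inferred from the sample run: killed / all mutants, in percent.
# Survived and no-coverage mutants both lower it.
def mutation_score(counts)
  total = counts.values.sum
  return 100.0 if total.zero? # assumption: nothing to mutate counts as a perfect score
  100.0 * counts.fetch(:killed, 0) / total
end

# Exit status a CI gate could use: 0 when the score meets the threshold, 1 otherwise.
def gate_status(score, threshold)
  score >= threshold ? 0 : 1
end

counts = { killed: 24, survived: 1, no_coverage: 1 }
score = mutation_score(counts)
puts format("mutation score: %.1f%%", score) # prints "mutation score: 92.3%"
puts "gate exit status: #{gate_status(score, 90)}"
```

The unrounded score is compared against the threshold, so a run at 89.96% does not slip past a threshold of 90 through display rounding.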

AI agents & CI

A test-value oracle for your pipeline.

Line coverage tells you which code ran under test. It can't tell you whether a test would notice if that code broke — exactly where AI-generated tests are weakest. Mutineer answers that, programmatically: versioned JSON, stable mutant ids, structured exit codes, and diff-scoped runs.

Agent inner loop

Close the loop on test quality

An agent writes code and tests, then runs Mutineer on just the diff. Each survivor ships a ready-made diff to feed back: "write a test that fails under this change." Stop when survivors hit zero.

--since origin/main --format json

PR / CI gate

Fail only when a PR makes tests worse

Mutineer diffs the run against a stored baseline by stable mutant id and exits 1 on any new survivor or a score drop, so you can adopt mutation testing on a legacy suite without fixing everything first.
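A sketch of that baseline comparison. The report shape (a `score`, plus `mutants` entries with `id` and `status`) is a hypothetical stand-in, not Mutineer's published JSON schema. Survivors already recorded in the baseline are tolerated, which is what lets a legacy suite adopt the gate; only new survivor ids or a lower score fail it.

```ruby
require "json"

# Hypothetical report shape, NOT Mutineer's documented schema:
#   { "score" => 92.3, "mutants" => [{ "id" => "m1", "status" => "survived" }] }
def survivor_ids(report)
  report.fetch("mutants").select { |m| m["status"] == "survived" }.map { |m| m["id"] }
end

# 0 = pass, 1 = fail. Known survivors pass; new survivor ids or a score drop fail.
def baseline_gate(baseline, current)
  new_survivors = survivor_ids(current) - survivor_ids(baseline)
  score_dropped = current.fetch("score") < baseline.fetch("score")
  new_survivors.empty? && !score_dropped ? 0 : 1
end

baseline = JSON.parse('{"score": 90.0, "mutants": [{"id": "m1", "status": "survived"}]}')
current  = JSON.parse('{"score": 88.0, "mutants": [{"id": "m1", "status": "survived"},
                                                   {"id": "m2", "status": "survived"}]}')
puts baseline_gate(baseline, current) # prints 1: m2 is new and the score dropped
```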

--baseline prior.json --threshold 90

GitHub Action

One step in your workflow

A composite action wraps the CLI: point it at your sources, a baseline, and a threshold. It reads the diff, posts the report, and sets the exit code.

uses: davidteren/mutineer@main


Why Mutineer

Built lean, on purpose.

No agent, no boot hooks, no monkey-patching your test framework. Just Ruby's own parser and standard library.

Zero runtime deps

Prism ships with Ruby ≥ 3.4 and everything else is stdlib. Nothing to pin, nothing to break on upgrade.

Valid mutants only

Every mutation is re-parsed before it runs. Syntactically broken mutants are skipped, not counted against your score.

Fork isolation

Each mutant runs in its own forked process on Linux & macOS — no state bleed between runs, parallel by default.

Coverage-mapped

A per-mutant coverage map runs only the test files that reach the mutated line, and a digest-keyed cache skips unchanged work.
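A sketch of both ideas under stated assumptions: the coverage map is a hypothetical hash from test file to covered line numbers per source file (one way to build it is stdlib `Coverage` in `lines:` mode, recorded per test file), and the cache key is a SHA-256 digest over the mutant source plus the selected test files. Mutineer's actual map and key layout may differ; `tests_reaching` and `cache_key` are illustrative names.

```ruby
require "digest"

# Hypothetical coverage map: test file => { source file => [covered line numbers] }.
# Returns the test files that execute the given line, in a stable order.
def tests_reaching(coverage_map, file, line)
  coverage_map.select { |_test, files| files.fetch(file, []).include?(line) }.keys.sort
end

# Cache key for one mutant run: changes if the mutant or any selected test changes.
def cache_key(mutant_source, test_sources)
  digest = Digest::SHA256.new
  [mutant_source, *test_sources].each do |text|
    digest << "#{text.bytesize}:" << text # length prefix keeps field boundaries unambiguous
  end
  digest.hexdigest
end

map = {
  "test/a_test.rb" => { "lib/calc.rb" => [1, 2, 14] },
  "test/b_test.rb" => { "lib/calc.rb" => [1, 2] }
}
p tests_reaching(map, "lib/calc.rb", 14) # prints ["test/a_test.rb"]
```

An empty result from `tests_reaching` corresponds to a no-coverage mutant: there is no test to run, so nothing gets forked.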

CI & agent ready

Stable, sorted --format json with a versioned schema and stable mutant ids. --threshold and --baseline gate the build.

Rails-aware

--rails boots config/environment once, then forks per mutant with fork-safe ActiveRecord — dogfooded against a real Rails app.

Operators

Eight ways to break your code.

Five run by default. Three Tier-2 operators are off until you ask for them with --operators or .mutineer.yml — they're noisier, but they catch deeper gaps.

Default set keeps survivor output focused; opt into Tier 2 for a harder pass.
Operator	Mutation rule	Tier
comparison	< ↔ <= , > ↔ >= , == ↔ !=	default
arithmetic	+ ↔ - , * ↔ / , % → * , ** → *	default
boolean_connector	&& ↔ ||	default
boolean_literal	true ↔ false , nil → true	default
statement_removal	replace a non-final statement with nil	default
return_nil	replace a return / final expression with nil	tier 2
literal_mutation	integer → 0, 1, n+1 ; string → empty	tier 2
condition_negation	wrap if/unless/ternary condition in !( … )	tier 2
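The token-level rows of the table can be written down as plain data. This is an illustrative encoding, not Mutineer's internal representation; `RULES`, `replacements`, and `integer_mutants` are hypothetical, and dropping a literal replacement equal to the original value is an assumption (it would be an equivalent mutant that no test could kill).

```ruby
# Token swaps from the default-tier rows. statement_removal, return_nil and
# condition_negation act on whole nodes, so they are not token rules.
RULES = {
  comparison:        { "<" => ["<="], "<=" => ["<"], ">" => [">="], ">=" => [">"],
                       "==" => ["!="], "!=" => ["=="] },
  arithmetic:        { "+" => ["-"], "-" => ["+"], "*" => ["/"], "/" => ["*"],
                       "%" => ["*"], "**" => ["*"] },
  boolean_connector: { "&&" => ["||"], "||" => ["&&"] },
  boolean_literal:   { "true" => ["false"], "false" => ["true"], "nil" => ["true"] }
}.freeze

# Replacement tokens for `token` under `operator`; empty when the rule doesn't apply.
def replacements(operator, token)
  RULES.fetch(operator).fetch(token, [])
end

# literal_mutation (tier 2) for integers: n -> 0, 1, n + 1, minus any no-op.
def integer_mutants(n)
  [0, 1, n + 1].uniq - [n]
end

p replacements(:comparison, ">=") # prints [">"]
p integer_mutants(10)             # prints [0, 1, 11]
```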

Run mutineer --list-operators to see the live set for your installed version.

Get started

Install & run.

Ruby ≥ 3.4 is the only requirement. Point Mutineer at your source and the tests that cover it.

Install

$ gem install mutineer

Run against a file and its test

$ mutineer run lib/foo.rb \
  --test test/foo_test.rb \
  --threshold 90

Gate a PR against a baseline

$ mutineer run app/ \
  --since origin/main \
  --baseline .mutineer/baseline.json

Preview mutations without running tests

$ mutineer run --dry-run lib/foo.rb

Configure once in .mutineer.yml

# .mutineer.yml
operators:
  - comparison
  - arithmetic
  - return_nil
threshold: 85
jobs: 8
format: json

Key flags: --operators, --threshold, --baseline, --since, --rails, --only, --jobs, --format human|json, --output FILE, --dry-run. Flags given on the command line override .mutineer.yml.
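A sketch of that precedence rule with stdlib `OptionParser` and `YAML`: load the `.mutineer.yml` contents first, then let any flag given on the command line replace the matching key. Only four of the documented flags are wired up, and their argument parsing (for example a comma-separated `--operators` list) is an assumption; `effective_config` is a hypothetical name.

```ruby
require "optparse"
require "yaml"

# Merges config file values with command-line flags; flags win on conflict.
def effective_config(argv, yaml_text)
  file = YAML.safe_load(yaml_text) || {}
  cli = {}
  parser = OptionParser.new do |o|
    o.on("--threshold N", Float) { |v| cli["threshold"] = v }
    o.on("--jobs N", Integer) { |v| cli["jobs"] = v }
    o.on("--format FORMAT", %w[human json]) { |v| cli["format"] = v }
    o.on("--operators LIST", Array) { |v| cli["operators"] = v } # assumption: comma-separated
  end
  parser.parse!(argv.dup)
  file.merge(cli) # Hash#merge keeps the right-hand value, so CLI overrides file
end

yaml = <<~YML
  operators:
    - comparison
    - arithmetic
    - return_nil
  threshold: 85
  jobs: 8
  format: json
YML
p effective_config(%w[--threshold 90 --format human], yaml)
```

With the sample `.mutineer.yml`, `--threshold 90 --format human` replaces those two keys while `jobs` and `operators` keep their file values.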
