Getting Started with SelfTune in 2 Minutes
Prerequisites
You need an agent CLI with skills support. SelfTune works with:
- Claude Code (Anthropic)
- Codex (OpenAI)
- OpenCode (open-source)
You also need Node.js 18+ or Bun installed. That is it. No accounts, no API keys, no configuration services.
Step 1: Install SelfTune
npx skills add WellDunDun/selftune
This installs SelfTune as a skill in your agent environment. It adds the selftune command to your agent's available tools.
Alternatively, you can tell your agent directly:
"Install the selftune skill from WellDunDun"
The agent will handle the installation for you.
Step 2: Initialize
selftune init
Or tell your agent:
"Initialize selftune"
This sets up SelfTune's local data directory, scans your installed skills, and creates the baseline configuration. The output shows which skills were detected and their initial status.
✓ Found 12 installed skills
✓ Created .selftune/ directory
✓ Baseline configuration written
✓ Ready for observation
Step 3: Health Check
selftune doctor
The doctor command runs a quick diagnostic on your setup and your installed skills. It checks for:
- SelfTune configuration integrity
- Skill description quality (basic heuristics)
- Known anti-patterns in skill metadata
- Agent transcript availability for analysis
Think of it as a linter for your skill setup. Fix any warnings it reports before moving on.
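To make the description-quality heuristics concrete, here is a minimal sketch of the kind of checks a doctor-style linter can run. The rules, terms, and thresholds below are illustrative assumptions, not SelfTune's actual implementation.

```python
# Illustrative description-quality checks, similar in spirit to what a
# doctor-style linter might run. Rules and thresholds are assumptions.

VAGUE_TERMS = {"stuff", "things", "various", "etc"}

def lint_description(description: str) -> list[str]:
    """Return a list of warnings for a skill description."""
    warnings = []
    words = description.lower().split()
    if len(words) < 5:
        warnings.append("description too short to match user phrasing reliably")
    if len(description) > 300:
        warnings.append("description too long; trim to the core trigger intent")
    if VAGUE_TERMS & set(words):
        warnings.append("vague terms reduce semantic match precision")
    return warnings

print(lint_description("Does stuff"))
```

A description like "Does stuff" trips both the length and vagueness rules; a concrete, task-oriented sentence passes clean.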
Step 4: Check Status
selftune status
Status gives you a snapshot of where things stand:
Skill                Triggers   Misses   Accuracy
─────────────────────────────────────────────────
my-deploy-skill            47        3      94.0%
my-test-runner             31        8      79.5%
my-docs-generator          12       11      52.2%
If this is your first run, the table will be empty until SelfTune has observed some sessions. Move on to generating evals to get data faster.
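The accuracy column is simple arithmetic, consistent with the sample table above: triggers divided by total observed opportunities (triggers plus misses). A quick sketch:

```python
# Trigger accuracy as shown in the status table:
#   accuracy = triggers / (triggers + misses)

def trigger_accuracy(triggers: int, misses: int) -> float:
    total = triggers + misses
    return 100.0 * triggers / total if total else 0.0

for skill, triggers, misses in [
    ("my-deploy-skill", 47, 3),
    ("my-test-runner", 31, 8),
    ("my-docs-generator", 12, 11),
]:
    print(f"{skill}: {trigger_accuracy(triggers, misses):.1f}%")
```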
Step 5: Generate Evals
selftune evals --skill <name>
This generates evaluation cases for a specific skill. SelfTune creates synthetic prompts that should trigger your skill and prompts that should not, giving you a test suite for trigger accuracy.
selftune evals --skill my-deploy-skill
Generated 24 eval cases for my-deploy-skill:
16 positive triggers (should invoke)
8 negative triggers (should not invoke)
Written to .selftune/evals/my-deploy-skill.json
These evals are derived from your skill's description and metadata. They represent SelfTune's best estimate of the trigger boundary. Review them, adjust if needed, and use them as a regression suite.
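So you know roughly what to expect when you open the file, here is a plausible shape for a generated eval suite. The field names and structure are assumptions for illustration; inspect the actual file SelfTune writes before relying on the schema.

```python
import json

# Hypothetical shape for .selftune/evals/<skill>.json -- the real schema
# may differ; inspect the generated file before relying on field names.
evals = {
    "skill": "my-deploy-skill",
    "cases": [
        {"prompt": "deploy this branch to staging", "should_trigger": True},
        {"prompt": "ship the hotfix to production", "should_trigger": True},
        {"prompt": "explain what this function does", "should_trigger": False},
    ],
}

positives = sum(c["should_trigger"] for c in evals["cases"])
negatives = len(evals["cases"]) - positives
print(f"{positives} positive, {negatives} negative")
print(json.dumps(evals, indent=2)[:120])
```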
Step 6: Evolve
selftune evolve --skill <name>
This is where it gets interesting. The evolve command analyzes observed trigger data (from real sessions or generated evals) and produces concrete suggestions for improving your skill.
selftune evolve --skill my-docs-generator
Analysis for my-docs-generator:
Current trigger accuracy: 52.2%
Detected issues:
✗ Description uses "generate documentation" but users say
"write docs", "document this", "add docs"
✗ Missing trigger coverage for "README" and "changelog" requests
✗ Technical jargon in description reduces semantic match range
Suggested description:
"Write, update, and maintain documentation including READMEs,
changelogs, API docs, and code comments. Works with any
documentation format."
Estimated accuracy after evolution: 78-84%
Apply? [y/n]
The suggestions come from observed data. The accuracy estimates come from running the proposed changes against your eval suite. Nothing is guesswork.
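One way to understand where that estimate comes from: run every eval prompt against the proposed description with some matcher, and score how often the predicted trigger decision matches the expected one. The keyword-overlap matcher below is a deliberately naive stand-in for whatever semantic matching the agent actually does; the threshold and cases are invented.

```python
# Naive sketch of scoring a proposed description against an eval suite.
# Real agents use semantic matching; keyword overlap is a stand-in here,
# and the threshold is an arbitrary assumption.

def would_trigger(description: str, prompt: str, threshold: int = 1) -> bool:
    desc_words = set(description.lower().split())
    prompt_words = set(prompt.lower().split())
    return len(desc_words & prompt_words) >= threshold

def estimate_accuracy(description: str, cases: list[tuple[str, bool]]) -> float:
    correct = sum(
        would_trigger(description, prompt) == expected
        for prompt, expected in cases
    )
    return 100.0 * correct / len(cases)

cases = [
    ("write docs for this module", True),
    ("update the changelogs", True),
    ("document this function", True),   # exact-word matching misses this one
    ("deploy to staging", False),
]
proposed = "write update and maintain documentation readmes changelogs api docs"
print(estimate_accuracy(proposed, cases))
```

Note how "document this function" slips past exact-word matching even though "documentation" is in the description; this is the kind of gap the evolve analysis surfaces.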
Step 7: Monitor
selftune watch --skill <name>
Watch sets up continuous monitoring for a skill. As you use your agent normally, SelfTune observes trigger events in the background and updates accuracy metrics.
selftune watch --skill my-docs-generator
Watching my-docs-generator...
Monitoring agent transcripts for trigger events.
Metrics will update in selftune status.
Press Ctrl+C to stop, or run in background with --daemon.
This is how you validate that your evolve changes actually work in practice, not just in evals.
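Conceptually, watch is tailing the transcript log and classifying events as they appear. A minimal sketch of the offset-based tailing at the core of such a loop, using an in-memory stream and an invented event-line format in place of the real transcript file:

```python
import io

# Minimal sketch of offset-based tailing, the core of a watch loop.
# A real daemon would run this on a timer against the agent's transcript
# file; the "TRIGGER"/"MISS" line format is an invented example.

def read_new_events(stream: io.TextIOBase, offset: int) -> tuple[list[str], int]:
    """Read lines appended since `offset`; return them and the new offset."""
    stream.seek(offset)
    lines = [line.strip() for line in stream if line.strip()]
    return lines, stream.tell()

log = io.StringIO("TRIGGER my-docs-generator\nMISS my-docs-generator\n")
events, offset = read_new_events(log, 0)        # picks up both events
log.write("TRIGGER my-docs-generator\n")        # a new event arrives
more, offset = read_new_events(log, offset)     # picks up only the new one
print(events, more)
```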
Step 8: Dashboard
selftune dashboard
The dashboard gives you a consolidated view of all monitored skills, their trigger rates, recent changes, and trend lines. It is a terminal UI; no browser required.
Use it for a quick overview when you want to check the health of your entire skill portfolio at once.
Bonus: Backfill Existing Data
If you have been using an agent CLI for a while, you likely have existing session transcripts that SelfTune can analyze retroactively:
selftune replay
Replay scans your historical agent transcripts and extracts trigger data from past sessions. This gives you an immediate baseline without waiting for new sessions to accumulate.
Depending on how many transcripts you have, this can take a few seconds to a few minutes. The result is a populated status view from day one.
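Replay-style backfill is essentially a fold over historical trigger events into per-skill counts that a status view can render. A sketch, with the event tuple format as an assumed stand-in for whatever SelfTune parses out of transcripts:

```python
from collections import Counter

# Sketch of replay-style aggregation: fold historical trigger events into
# per-skill counts. The (skill, triggered) tuple is an assumed stand-in
# for whatever SelfTune extracts from real transcripts.

def backfill(events: list[tuple[str, bool]]) -> dict[str, Counter]:
    stats: dict[str, Counter] = {}
    for skill, triggered in events:
        bucket = stats.setdefault(skill, Counter())
        bucket["triggers" if triggered else "misses"] += 1
    return stats

history = [
    ("my-deploy-skill", True),
    ("my-deploy-skill", True),
    ("my-docs-generator", False),
]
print(backfill(history))
```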
What Comes Next
Once you have SelfTune running, the typical workflow is:
- Monitor with selftune watch during normal agent usage
- Check with selftune status periodically
- Evolve skills that show low accuracy with selftune evolve
- Validate improvements with continued monitoring
- Repeat as user language patterns shift over time
The whole cycle is data-driven. No guessing, no hoping, no waiting weeks for marketplace proxy metrics.
Total Setup Time
If you followed along, your elapsed time is somewhere around two minutes. Your skills now have a self-improvement loop that did not exist anywhere in the ecosystem before SelfTune.
npx skills add WellDunDun/selftune # 30 seconds
selftune init # 5 seconds
selftune doctor # 10 seconds
selftune replay # varies
selftune status # instant
That is it. Your skills are no longer invisible.