Getting Started with SelfTune in 2 Minutes
Prerequisites
You need an agent CLI with skills support. SelfTune works with:
- Claude Code (Anthropic)
- Codex (OpenAI)
- OpenCode (open-source)
You also need Node.js 18+ or Bun installed. That is it. No accounts, no API keys, no configuration services.
Step 1: Install SelfTune
npx skills add WellDunDun/selftune
This installs SelfTune as a skill in your agent environment. It adds the selftune command to your agent's available tools.
Alternatively, you can tell your agent directly:
"Install the selftune skill from WellDunDun"
The agent will handle the installation for you.
Step 2: Initialize
selftune init
Or tell your agent:
"Initialize selftune"
This sets up SelfTune's local data directory, scans your installed skills, and creates the baseline configuration. The output shows which skills were detected and their initial status.
✓ Found 12 installed skills
✓ Created .selftune/ directory
✓ Baseline configuration written
✓ Ready for observation
Step 3: Health Check
selftune doctor
The doctor command runs a quick diagnostic on your setup and your installed skills. It checks for:
- SelfTune configuration integrity
- Skill description quality (basic heuristics)
- Known anti-patterns in skill metadata
- Agent transcript availability for analysis
Think of it as a linter for your skill setup. Fix any warnings it reports before moving on.
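To make the description-quality heuristics concrete, here is a minimal sketch of the kind of checks a doctor-style linter can run. The rules, terms, and thresholds below are illustrative assumptions, not SelfTune's actual implementation.

```python
# Illustrative description-quality checks, similar in spirit to what a
# doctor-style linter might run. Rules and thresholds are assumptions.

VAGUE_TERMS = {"stuff", "things", "various", "etc"}

def lint_description(description: str) -> list[str]:
    """Return a list of warnings for a skill description."""
    warnings = []
    words = description.lower().split()
    if len(words) < 5:
        warnings.append("description too short to match user phrasing reliably")
    if len(description) > 300:
        warnings.append("description too long; trim to the core trigger intent")
    if VAGUE_TERMS & set(words):
        warnings.append("vague terms reduce semantic match precision")
    return warnings

print(lint_description("Does stuff"))
```

A description like "Does stuff" trips both the length and vagueness rules; a concrete, task-oriented sentence passes clean.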
Step 4: Check Status
selftune status
Status gives you a snapshot of where things stand:
Skill                Triggers   Misses   Accuracy
─────────────────────────────────────────────────
my-deploy-skill            47        3      94.0%
my-test-runner             31        8      79.5%
my-docs-generator          12       11      52.2%
If this is your first run, the table will be empty until SelfTune has observed some sessions. Move on to generating evals to get data faster.
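The accuracy column is simple arithmetic, consistent with the sample table above: triggers divided by total observed opportunities (triggers plus misses). A quick sketch:

```python
# Trigger accuracy as shown in the status table:
#   accuracy = triggers / (triggers + misses)

def trigger_accuracy(triggers: int, misses: int) -> float:
    total = triggers + misses
    return 100.0 * triggers / total if total else 0.0

for skill, triggers, misses in [
    ("my-deploy-skill", 47, 3),
    ("my-test-runner", 31, 8),
    ("my-docs-generator", 12, 11),
]:
    print(f"{skill}: {trigger_accuracy(triggers, misses):.1f}%")
```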
Step 5: Generate Evals
selftune evals --skill <name>
This generates evaluation cases for a specific skill. SelfTune creates synthetic prompts that should trigger your skill and prompts that should not, giving you a test suite for trigger accuracy.
selftune evals --skill my-deploy-skill
Generated 24 eval cases for my-deploy-skill:
16 positive triggers (should invoke)
8 negative triggers (should not invoke)
Written to .selftune/evals/my-deploy-skill.json
These evals are derived from your skill's description and metadata. They represent SelfTune's best estimate of the trigger boundary. Review them, adjust if needed, and use them as a regression suite.
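So you know roughly what to expect when you open the file, here is a plausible shape for a generated eval suite. The field names and structure are assumptions for illustration; inspect the actual file SelfTune writes before relying on the schema.

```python
import json

# Hypothetical shape for .selftune/evals/<skill>.json -- the real schema
# may differ; inspect the generated file before relying on field names.
evals = {
    "skill": "my-deploy-skill",
    "cases": [
        {"prompt": "deploy this branch to staging", "should_trigger": True},
        {"prompt": "ship the hotfix to production", "should_trigger": True},
        {"prompt": "explain what this function does", "should_trigger": False},
    ],
}

positives = sum(c["should_trigger"] for c in evals["cases"])
negatives = len(evals["cases"]) - positives
print(f"{positives} positive, {negatives} negative")
print(json.dumps(evals, indent=2)[:120])
```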
Step 6: Evolve
selftune evolve --skill <name>
This is where it gets interesting. The evolve command analyzes observed trigger data (from real sessions or generated evals) and produces concrete suggestions for improving your skill.
selftune evolve --skill my-docs-generator
Analysis for my-docs-generator:
Current trigger accuracy: 52.2%
Detected issues:
✗ Description uses "generate documentation" but users say
"write docs", "document this", "add docs"
✗ Missing trigger coverage for "README" and "changelog" requests
✗ Technical jargon in description reduces semantic match range
Suggested description:
"Write, update, and maintain documentation including READMEs,
changelogs, API docs, and code comments. Works with any
documentation format."
Estimated accuracy after evolution: 78-84%
Apply? [y/n]
The suggestions come from observed data. The accuracy estimates come from running the proposed changes against your eval suite. Nothing is guesswork.
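One way to understand where that estimate comes from: run every eval prompt against the proposed description with some matcher, and score how often the predicted trigger decision matches the expected one. The keyword-overlap matcher below is a deliberately naive stand-in for whatever semantic matching the agent actually does; the threshold and cases are invented.

```python
# Naive sketch of scoring a proposed description against an eval suite.
# Real agents use semantic matching; keyword overlap is a stand-in here,
# and the threshold is an arbitrary assumption.

def would_trigger(description: str, prompt: str, threshold: int = 1) -> bool:
    desc_words = set(description.lower().split())
    prompt_words = set(prompt.lower().split())
    return len(desc_words & prompt_words) >= threshold

def estimate_accuracy(description: str, cases: list[tuple[str, bool]]) -> float:
    correct = sum(
        would_trigger(description, prompt) == expected
        for prompt, expected in cases
    )
    return 100.0 * correct / len(cases)

cases = [
    ("write docs for this module", True),
    ("update the changelogs", True),
    ("document this function", True),   # exact-word matching misses this one
    ("deploy to staging", False),
]
proposed = "write update and maintain documentation readmes changelogs api docs"
print(estimate_accuracy(proposed, cases))
```

Note how "document this function" slips past exact-word matching even though "documentation" is in the description; this is the kind of gap the evolve analysis surfaces.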
Step 7: Monitor
selftune watch --skill <name>
Watch sets up continuous monitoring for a skill. As you use your agent normally, SelfTune observes trigger events in the background and updates accuracy metrics.
selftune watch --skill my-docs-generator
Watching my-docs-generator...
Monitoring agent transcripts for trigger events.
Metrics will update in selftune status.
Press Ctrl+C to stop, or run in background with --daemon.
This is how you validate that your evolve changes actually work in practice, not just in evals.
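Conceptually, watch is tailing the transcript log and classifying events as they appear. A minimal sketch of the offset-based tailing at the core of such a loop, using an in-memory stream and an invented event-line format in place of the real transcript file:

```python
import io

# Minimal sketch of offset-based tailing, the core of a watch loop.
# A real daemon would run this on a timer against the agent's transcript
# file; the "TRIGGER"/"MISS" line format is an invented example.

def read_new_events(stream: io.TextIOBase, offset: int) -> tuple[list[str], int]:
    """Read lines appended since `offset`; return them and the new offset."""
    stream.seek(offset)
    lines = [line.strip() for line in stream if line.strip()]
    return lines, stream.tell()

log = io.StringIO("TRIGGER my-docs-generator\nMISS my-docs-generator\n")
events, offset = read_new_events(log, 0)        # picks up both events
log.write("TRIGGER my-docs-generator\n")        # a new event arrives
more, offset = read_new_events(log, offset)     # picks up only the new one
print(events, more)
```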
Step 8: Dashboard
selftune dashboard
The dashboard gives you a consolidated view of all monitored skills, their trigger rates, recent changes, and trend lines. It is a terminal UI; no browser required.
Use it for a quick overview when you want to check the health of your entire skill portfolio at once.
Bonus: Backfill Existing Data
If you have been using an agent CLI for a while, you likely have existing session transcripts that SelfTune can analyze retroactively:
selftune replay
Replay scans your historical agent transcripts and extracts trigger data from past sessions. This gives you an immediate baseline without waiting for new sessions to accumulate.
Depending on how many transcripts you have, this can take a few seconds to a few minutes. The result is a populated status view from day one.
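Replay-style backfill is essentially a fold over historical trigger events into per-skill counts that a status view can render. A sketch, with the event tuple format as an assumed stand-in for whatever SelfTune parses out of transcripts:

```python
from collections import Counter

# Sketch of replay-style aggregation: fold historical trigger events into
# per-skill counts. The (skill, triggered) tuple is an assumed stand-in
# for whatever SelfTune extracts from real transcripts.

def backfill(events: list[tuple[str, bool]]) -> dict[str, Counter]:
    stats: dict[str, Counter] = {}
    for skill, triggered in events:
        bucket = stats.setdefault(skill, Counter())
        bucket["triggers" if triggered else "misses"] += 1
    return stats

history = [
    ("my-deploy-skill", True),
    ("my-deploy-skill", True),
    ("my-docs-generator", False),
]
print(backfill(history))
```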
What Comes Next
Once you have SelfTune running, the typical workflow is:
- Monitor with selftune watch during normal agent usage
- Check with selftune status periodically
- Evolve skills that show low accuracy with selftune evolve
- Validate improvements with continued monitoring
- Repeat as user language patterns shift over time
The whole cycle is data-driven. No guessing, no hoping, no waiting weeks for marketplace proxy metrics.
Total Setup Time
If you followed along, your elapsed time is somewhere around two minutes. Your skills now have a self-improvement loop that did not exist anywhere in the ecosystem before SelfTune.
npx skills add WellDunDun/selftune # 30 seconds
selftune init # 5 seconds
selftune doctor # 10 seconds
selftune replay # varies
selftune status # instant
That is it. Your skills are no longer invisible.