For skill creators

See which skills work, which don't, and why.

Your skill works for you. But every user talks differently. selftune gives you a comparison grid with confidence scores, trigger rates, and miss patterns — then lets your users' real signals improve your skills for everyone.

$ npx skills add selftune-dev/selftune


Your default surface

Numbers you can trust

All your skills, side by side. Confidence scores, trigger rates, session counts. Instantly see which skills need attention and which are healthy.

Confidence scores

Every skill gets a confidence score based on real session data. See at a glance which skills are solid and which need work.

Trigger rates

How often does your skill fire when it should? selftune tracks trigger rates and surfaces miss patterns — the queries that should have matched but didn't.
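
As a rough illustration of what these two numbers mean, here is a minimal Python sketch. It is not selftune's implementation: the `Session` record and its `should_trigger` / `did_trigger` fields are hypothetical stand-ins for whatever session data the tool actually records. The trigger rate is taken as the share of relevant queries where the skill fired, and the miss patterns are the relevant queries where it did not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical session record; field names are assumptions."""
    query: str
    should_trigger: bool  # the skill was relevant to this query
    did_trigger: bool     # the skill actually fired


def trigger_rate(sessions):
    """Share of relevant queries where the skill fired, or None if none were relevant."""
    relevant = [s for s in sessions if s.should_trigger]
    if not relevant:
        return None
    return sum(s.did_trigger for s in relevant) / len(relevant)


def miss_patterns(sessions):
    """Queries that should have matched the skill but did not."""
    return [s.query for s in sessions if s.should_trigger and not s.did_trigger]


sessions = [
    Session("make a slide deck", True, True),
    Session("build me a presentation", True, False),
    Session("fix this bug", False, False),
    Session("create slides for Monday", True, True),
]

rate = trigger_rate(sessions)    # 2 of the 3 relevant queries fired
misses = miss_patterns(sessions)  # the one relevant query that was missed
```

In this toy data the skill's author says "slides" and "deck", so the user who says "presentation" shows up as a miss.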

Grade distributions

See how each skill performs across sessions. Grade distributions show whether a skill is consistently good or wildly inconsistent.
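
To make "consistently good" versus "wildly inconsistent" concrete, the sketch below builds a grade distribution from a list of per-session grades. The letter grades, the `is_consistent` helper, and its 0.7 threshold are all assumptions chosen for illustration; the page does not say how selftune grades sessions or judges consistency.

```python
from collections import Counter


def grade_distribution(grades):
    """Map each grade to its share of sessions."""
    counts = Counter(grades)
    total = len(grades)
    return {grade: counts[grade] / total for grade in sorted(counts)}


def is_consistent(grades, threshold=0.7):
    """Hypothetical rule: consistent if one grade covers at least `threshold` of sessions."""
    if not grades:
        return False
    top_count = Counter(grades).most_common(1)[0][1]
    return top_count / len(grades) >= threshold


steady = ["A", "A", "A", "B", "A"]   # mostly one grade
erratic = ["A", "F", "C", "A", "D"]  # spread across many grades

steady_dist = grade_distribution(steady)
steady_ok = is_consistent(steady)
erratic_ok = is_consistent(erratic)
```

Two skills can share a similar average and still look very different here, which is the reason to look at the distribution rather than a single score.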

The moat

Your users make your skills better.

Add a config file to your skill package. Users who install it can opt in to share anonymous signals — what triggered, what missed, how well it worked. selftune aggregates across 20+ users and generates evolution proposals you could never get from testing alone.

Without selftune

You test with your vocabulary. Your users talk differently. You never know what's missing.

Personal mode

Your skills improve from your own sessions. Triggers match how you talk. But you're still just one person.

Crowdsourced

Anonymous signals from real users. Miss patterns you'd never discover. Evolution proposals from how everyone talks.

Creator workflow

Detect. Evolve. Validate. Ship.

From detection to deployment in one command. Every change is backed by evidence from real sessions.

Detect

See what's missing

The comparison grid shows which skills are underperforming. Drill into any skill to see specific miss patterns — the exact queries that should have triggered your skill but didn't.

Miss pattern detection
Query-level evidence
Confidence scoring

Evolve

Fix it in one command

Running 'selftune evolve my-skill' rewrites the skill's description based on real usage data. The new version is validated against your eval set before it goes live, and the old version is backed up automatically.

Evidence-based rewrites
Eval set validation
Automatic backup
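
The validate-then-replace step described above can be sketched in a few lines of Python. This is only an illustration of the idea, under stated assumptions: `apply_if_validated`, the keyword-based `keyword_match` matcher, and the eval set format of (query, expected) pairs are all hypothetical, and the real command may validate and back up differently.

```python
def keyword_match(description, query):
    """Toy matcher (an assumption): fires if any query word appears in the description."""
    words = set(description.lower().split())
    return any(word in words for word in query.lower().split())


def pass_rate(description, eval_set, matches):
    """Share of eval queries where the matcher agrees with the expected outcome."""
    if not eval_set:
        raise ValueError("eval set must not be empty")
    hits = sum(1 for query, expected in eval_set if matches(description, query) == expected)
    return hits / len(eval_set)


def apply_if_validated(current, candidate, eval_set, matches=keyword_match):
    """Return (live_description, backup). The candidate goes live only if it scores higher."""
    if pass_rate(candidate, eval_set, matches) > pass_rate(current, eval_set, matches):
        return candidate, current  # keep the old description as a backup
    return current, None


eval_set = [
    ("make slides", True),
    ("build a presentation", True),
    ("fix this bug", False),
]
current = "Create slides"
candidate = "Create slides or a presentation deck"

live, backup = apply_if_validated(current, candidate, eval_set)
```

Here the candidate passes all three eval queries while the current description misses one, so the candidate becomes live and the old text is kept as the backup.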

Scale

Crowdsource from your users

Add a selftune config to your skill package. Users who install it can opt in to share anonymous signals. You'll see aggregate proposals across 20+ contributors — miss patterns and evolution ideas from how everyone talks.

Opt-in only
Anonymized signals
Aggregate evolution proposals

How the contribution pipeline protects users

Explicit opt-in only

Users must actively opt in to share signals. The default is off. No silent data collection — ever.

Anonymized before it leaves the machine

selftune strips raw prompts and session content on the user's machine. Only aggregate signals are shared: trigger rates, miss patterns, confidence scores. You never see individual user data.
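
One way to picture on-machine stripping is a function that reduces local session records to an aggregate summary and never copies the prompt text. The sketch below is an assumption-laden illustration, not selftune's actual pipeline: the record fields (`prompt`, `triggered`) and the summary shape are hypothetical, and it covers only a trigger rate, not every signal listed above.

```python
def summarize_for_sharing(skill, records):
    """Reduce local records to aggregate numbers; raw prompts are never copied."""
    total = len(records)
    fired = sum(1 for r in records if r["triggered"])
    return {
        "skill": skill,
        "sessions": total,
        "trigger_rate": fired / total if total else None,
    }


# Local records stay on the user's machine (hypothetical shape).
local_records = [
    {"prompt": "draft slides about our Q3 revenue", "triggered": True},
    {"prompt": "build a presentation for the board", "triggered": False},
]

shared = summarize_for_sharing("my-skill", local_records)
```

Because the summary is built from counts alone, there is no code path by which a prompt string could end up in the shared payload.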

Minimum 20 contributors for proposals

Evolution proposals are generated only once signals from at least 20 users have been aggregated. This prevents any single user's patterns from being identifiable.
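
A minimum-contributor gate like this can be sketched as a simple threshold on distinct contributors. The code below is a hedged illustration only: the signal shape (`contributor`, `miss_pattern`) and the function names are hypothetical, and the threshold of 20 is the only detail taken from this page.

```python
from collections import defaultdict

MIN_CONTRIBUTORS = 20  # the threshold stated on this page


def aggregate_miss_patterns(signals):
    """Count distinct contributors per miss pattern, or return {} below the threshold."""
    contributors = {s["contributor"] for s in signals}
    if len(contributors) < MIN_CONTRIBUTORS:
        return {}  # too few contributors: produce nothing at all
    by_pattern = defaultdict(set)
    for s in signals:
        by_pattern[s["miss_pattern"]].add(s["contributor"])
    return {pattern: len(users) for pattern, users in by_pattern.items()}


def make_signals(n_contributors):
    """Build toy signals: everyone misses on 'presentation', even-numbered users on 'deck'."""
    signals = []
    for i in range(n_contributors):
        signals.append({"contributor": f"anon-{i}", "miss_pattern": "presentation"})
        if i % 2 == 0:
            signals.append({"contributor": f"anon-{i}", "miss_pattern": "deck"})
    return signals


too_few = aggregate_miss_patterns(make_signals(19))
enough = aggregate_miss_patterns(make_signals(20))
```

With 19 contributors the function returns nothing; with 20 it returns per-pattern contributor counts, which is the kind of aggregate a proposal could be built from.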

You review every change

Crowdsourced proposals are suggestions, not automatic deploys. You review, approve, reject, or modify every evolution before it ships.

Numbers you can trust. From how everyone talks.

Start with the free CLI. Upgrade to Team when you're ready for crowdsourced evolution.