Introducing SelfTune: Self-Improving Skills for AI Agents

The Problem No One Measures

There are over 270,000 skills published across agent marketplaces today. Claude Code, Codex, OpenCode, and a growing list of agent CLIs depend on these skills for everything from file manipulation to deployment pipelines.

Not a single one of those 270,000 skills learns from how you actually talk.

When a skill fails to trigger, there is no error. No log line. No alert. The agent simply does not use it, and the user never knows why. The skill developer never finds out. The marketplace has no signal. Everyone loses, silently.

This is the gap SelfTune was built to close.

What SelfTune Does

SelfTune makes your agent skills learn how you work. It watches real sessions, detects where your language doesn't match skill descriptions, and automatically rewrites those descriptions to match.

The core loop has four stages:

Observe

selftune watch --skill my-skill

SelfTune monitors your skill in real agent sessions. It captures the prompts that triggered your skill, the prompts that should have but didn't, and the context surrounding both.

Detect

SelfTune identifies missed triggers automatically. If a user says "make me a slide deck" and your presentation skill has a description optimized for "generate PowerPoint presentation," SelfTune catches that mismatch. No manual log review required.
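
To see why that mismatch is invisible to a naive matcher, here is a minimal Python sketch that scores word overlap between a prompt and a skill description. It is an illustration only: this post does not describe how SelfTune scores a match, and the tokens and jaccard helpers are hypothetical names, not part of SelfTune.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Both strings come from the example above.
description = "generate PowerPoint presentation"
prompt = "make me a slide deck"

score = jaccard(prompt, description)
print(f"overlap = {score:.2f}")  # the two texts share no words, so 0.00
```

The user's intent matches the skill, yet the two strings share no vocabulary at all. That gap between intent and wording is what a missed trigger looks like in the data.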

Evolve

selftune evolve --skill my-skill

Based on observed data, SelfTune generates concrete improvements to your skill's description, trigger patterns, and metadata. These are not generic suggestions. They are derived from actual user language, ranked by frequency and impact.
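
As a rough picture of what "ranked by frequency" can mean, the sketch below counts how often each phrasing shows up among missed prompts. The sample prompts and the rank_phrasings function are made up for illustration and are not SelfTune's code or output. The post does not define how impact is measured, so this sketch ranks by frequency alone.

```python
from collections import Counter

# Hypothetical missed prompts: cases where the skill should have
# triggered but did not. Invented for illustration.
missed_prompts = [
    "make me a slide deck",
    "build a slide deck for the board",
    "put together some slides",
    "Make me a  slide deck",
]


def rank_phrasings(prompts: list[str]) -> list[tuple[str, int]]:
    """Count each distinct prompt after normalizing case and spacing,
    most frequent first."""
    normalized = (" ".join(p.lower().split()) for p in prompts)
    return Counter(normalized).most_common()


for phrase, count in rank_phrasings(missed_prompts):
    print(f"{count}x  {phrase}")
```

The phrasings at the top of that list are the strongest candidates to work into a skill description, because they are the words users reach for most often.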

Watch

Continuous monitoring validates that changes work. If an evolution improves trigger rates for one pattern but regresses another, SelfTune flags it before users notice.
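
One way to picture that check is a per-pattern comparison of trigger rates before and after a change. The following is a sketch under stated assumptions: the find_regressions function, the 5-point tolerance, and the counts are all hypothetical, and SelfTune's own validation logic may differ.

```python
def trigger_rate(hits: int, total: int) -> float:
    """Fraction of relevant prompts that triggered the skill."""
    return hits / total if total else 0.0


def find_regressions(
    before: dict[str, tuple[int, int]],
    after: dict[str, tuple[int, int]],
    tolerance: float = 0.05,  # assumed threshold, not a SelfTune default
) -> dict[str, float]:
    """Return patterns whose trigger rate dropped by more than `tolerance`.

    Each dict maps a pattern to (hits, total) counts.
    """
    regressions = {}
    for pattern, (hits_b, total_b) in before.items():
        if pattern not in after:
            continue  # no data after the change, nothing to compare
        hits_a, total_a = after[pattern]
        drop = trigger_rate(hits_b, total_b) - trigger_rate(hits_a, total_a)
        if drop > tolerance:
            regressions[pattern] = drop
    return regressions


# Invented counts: the new description helps "slide deck" prompts
# but hurts the original "powerpoint presentation" phrasing.
before = {"slide deck": (2, 10), "powerpoint presentation": (9, 10)}
after = {"slide deck": (8, 10), "powerpoint presentation": (6, 10)}

for pattern, drop in find_regressions(before, after).items():
    print(f"regressed: {pattern} (down {drop:.0%})")
```

Tracking each pattern separately matters because an aggregate trigger rate can rise while one important phrasing quietly gets worse.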

Why This Matters Now

The agent skill ecosystem is at an inflection point. Claude Code now accounts for approximately 4% of all GitHub commits, roughly 135,000 per day. Codex onboarded over 1 million developers in its first month. OpenCode has 2.5 million monthly active users and 112,000 GitHub stars.

These agents are not toys. They are production infrastructure. And the skills they rely on have the quality assurance of a README and a hope.

General-purpose LLM observability tools like Langfuse track token usage, latency, and model calls. They tell you nothing about whether a skill triggered when it should have. Skill-level observability is a different problem. It requires understanding intent matching, description quality, and trigger reliability across diverse natural language inputs.

That is exactly what SelfTune measures.
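
To make "trigger reliability" concrete, here is a small sketch of two numbers a skill-level metric could report: recall (how often the skill fired when it should have) and precision (how often it was right when it fired). The Outcome record and trigger_metrics function are hypothetical, and the labels are invented. This post does not specify which metrics SelfTune reports.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    should_trigger: bool  # hypothetical label: the prompt called for this skill
    did_trigger: bool     # the agent actually invoked the skill


def trigger_metrics(outcomes: list[Outcome]) -> dict[str, float]:
    """Compute recall, precision, and the count of silent misses."""
    tp = sum(o.should_trigger and o.did_trigger for o in outcomes)
    fn = sum(o.should_trigger and not o.did_trigger for o in outcomes)
    fp = sum(not o.should_trigger and o.did_trigger for o in outcomes)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"recall": recall, "precision": precision, "missed": fn}


# Invented sample: one correct trigger, two silent misses, one false trigger.
outcomes = [
    Outcome(should_trigger=True, did_trigger=True),
    Outcome(should_trigger=True, did_trigger=False),
    Outcome(should_trigger=True, did_trigger=False),
    Outcome(should_trigger=False, did_trigger=True),
]

print(trigger_metrics(outcomes))
```

Token counts and latency say nothing about the two misses in this sample, since a skill that never fired made no model call to trace.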

Zero Dependencies, Zero Friction

SelfTune ships as a single skill install:

npx skills add WellDunDun/selftune

From there, initialization takes one command:

selftune init

No API keys. No external services. No configuration files to maintain. SelfTune runs entirely locally, analyzing your agent transcripts on your machine. Your data stays yours.

First insight in under 2 minutes. That is the bar we set and the bar we hit.

MIT Licensed, Open Source

SelfTune is MIT licensed. The entire codebase is open. We believe the tools that make skills better should be infrastructure, not a product moat.

If you build skills for Claude Code, Codex, or OpenCode, SelfTune makes them learn how each user talks. If you use skills as a power user, SelfTune adapts them to how you actually work.

What We Are Calling This

We searched for an existing category. There isn't one. LLM observability covers model calls. APM covers application performance. Neither covers the specific problem of "did the right skill fire for the right prompt."

So we are creating one: self-improving skills, powered by skill observability under the hood.

It is a narrow, well-defined domain. SelfTune is the first tool in it. We suspect it will not be the last.

Get Started

npx skills add WellDunDun/selftune
selftune init
selftune doctor

Read the getting-started guide for a full walkthrough. Check the GitHub repository for source code and documentation.

We built SelfTune because we needed it. If you build or use agent skills, you probably need it too.