Self-Improving Skills
selftune watches the skill layer: the place where skills should fire but don't. Consumers get invisible improvement. Creators get the data to fix the misses. And when creators opt their users in, everyone's skills get better.
The Problem
Your agent has infinite knowledge and zero habits
Skills are how you teach your agent — marketing workflows, PDF generation, compliance checks, research pipelines. But skill descriptions are written based on what you think you'll say, not what you actually say. The gap means skills miss, and nobody notices.
Every correction you make is lost by the next session. selftune turns those corrections into permanent improvements — learning from real usage, validating every change, rolling back if anything regresses.
Our Approach
One product. Two surfaces.
Consumers and creators have different information appetites. Consumers want outcomes — install it, forget it. Creators want evidence — confidence scores, trigger rates, comparison grids. Same product, same data. Different default surfaces.
And when creators opt their users into the contribution pipeline, anonymous usage signals flow back — enabling crowdsourced skill evolution that no amount of personal testing can match.
Who We Built This For
Anyone who teaches their agent how to work
You use skills
You want your agent to just work. selftune runs in the background, improves your skills from real sessions, and never asks you to look at a dashboard. Seen and not heard.
You build skills
You publish skills that others install. selftune gives you a comparison grid with confidence scores, trigger rates, and miss patterns. When your users opt in to share signals, you get evolution proposals powered by how everyone talks — not just you.
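To make "trigger rates" and "miss patterns" concrete, here is a minimal sketch of how they could be computed from logged sessions. The `SessionEvent` shape and the `triggerStats` function are assumptions made for illustration; this page does not describe selftune's actual data model.

```typescript
// Hypothetical record of one prompt in a session (not selftune's real schema).
type SessionEvent = {
  skill: string;     // skill the prompt relates to
  prompt: string;    // what the user actually said
  expected: boolean; // should the skill have fired?
  fired: boolean;    // did it fire?
};

// Trigger rate: of the prompts where the skill should have fired, how many did.
// Misses: the prompts where it should have fired but didn't.
function triggerStats(
  events: SessionEvent[],
  skill: string
): { triggerRate: number; misses: string[] } {
  const relevant = events.filter((e) => e.skill === skill && e.expected);
  const misses = relevant.filter((e) => !e.fired).map((e) => e.prompt);
  const triggerRate =
    relevant.length === 0
      ? 0
      : (relevant.length - misses.length) / relevant.length;
  return { triggerRate, misses };
}
```

The list of missed prompts is the interesting part: it shows the wording people really use, which is what a description rewrite would need to cover.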
Comparison
Why not just rewrite skills manually?
| Approach | Problem |
|---|---|
| Rewrite the description yourself | No data on what users actually say. No validation. No regression detection. |
| Add "ALWAYS invoke when..." directives | Brittle. One agent rewrite away from breaking. |
| Force-load skills on every prompt | Doesn't fix the description. Expensive band-aid. |
| selftune | Learns from real usage, rewrites descriptions to match how you work, validates against eval sets, and rolls back automatically on regressions. |
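
The validate-then-roll-back idea in the last row can be sketched in a few lines. This is an illustration under stated assumptions, not selftune's implementation: `EvalCase`, `validateOrRollback`, and the toy keyword matcher are hypothetical names, and the sketch collapses validation into a single baseline comparison.

```typescript
// Hypothetical types; none of these names come from selftune itself.
type EvalCase = { prompt: string; shouldTrigger: boolean };
type Matcher = (description: string, prompt: string) => boolean;

// Fraction of eval cases where the trigger decision matches the expected one.
function accuracy(description: string, evalSet: EvalCase[], matches: Matcher): number {
  if (evalSet.length === 0) return 0;
  const correct = evalSet.filter(
    (c) => matches(description, c.prompt) === c.shouldTrigger
  ).length;
  return correct / evalSet.length;
}

// Keep the candidate only if it does not score below the current description;
// otherwise roll back to the current one.
function validateOrRollback(
  current: string,
  candidate: string,
  evalSet: EvalCase[],
  matches: Matcher
): { description: string; rolledBack: boolean } {
  const baseline = accuracy(current, evalSet, matches);
  const proposed = accuracy(candidate, evalSet, matches);
  return proposed >= baseline
    ? { description: candidate, rolledBack: false }
    : { description: current, rolledBack: true };
}

// Toy stand-in for an agent's routing: the skill "fires" when the prompt
// shares at least one word with the description.
const keywordMatcher: Matcher = (description, prompt) => {
  const words = new Set(description.toLowerCase().split(/\W+/).filter(Boolean));
  return prompt.toLowerCase().split(/\W+/).some((w) => words.has(w));
};

const evalSet: EvalCase[] = [
  { prompt: "make a pdf report", shouldTrigger: true },
  { prompt: "export this as a document", shouldTrigger: true },
  { prompt: "what is the weather", shouldTrigger: false },
];

const result = validateOrRollback(
  "Generate PDF files",
  "Generate or export PDF document files",
  evalSet,
  keywordMatcher
);
console.log(result.rolledBack); // false: the candidate matches more eval cases
```

A real agent decides which skill to fire with far more than keyword overlap, so the matcher here only stands in for that decision. The point is the shape of the loop: score the current description, score the proposed one on the same eval set, and keep the old description whenever the new one does worse.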
Different Layer
MCP solved connection. selftune solves usage.
Langfuse, LangSmith, and OpenLIT trace LLM calls. selftune operates at the skill layer and uniquely captures consumer usage signals that flow back to creators through a privacy-preserving relay. The crowdsourced evolution loop is how skills improve for everyone.
| Dimension | selftune | Langfuse | LangSmith | OpenLIT |
|---|---|---|---|---|
| Observes | Skill triggers, missed fires, description drift | LLM calls, token usage | Agent traces, chain steps | Infrastructure metrics |
| Diagnoses | Why a skill didn't fire for a real user request | Latency and cost | Chain failures | System bottlenecks |
| Improves | Rewrites descriptions, bodies, and routing tables — 3-gate validation, auto-rollback | — | — | — |
| Validates | 3-gate pipeline, baseline comparison, auto-rollback | — | Custom evals | — |
| Runs | Locally, zero deps, zero API keys | Self-host or cloud | Cloud required | Helm chart |
| Price | Free (MIT) | Freemium | Paid | Free |
These tools are complementary. They trace what happens inside the LLM. selftune makes sure the right skill fires in the first place — and improves it from how you actually work.
Open source. Self-improving.
One npx command. No API keys, no configuration. Your skills start learning from your next session. MIT licensed, forever.