Self-Improving Skills

selftune watches the skill layer, where skills should fire but don't. Consumers get improvements that happen invisibly. Creators get the data to fix what's broken. And when creators opt their users in, everyone's skills get better.

The Problem

Your agent has infinite knowledge and zero habits

Skills are how you teach your agent: marketing workflows, PDF generation, compliance checks, research pipelines. But skill descriptions are written around what you think you'll say, not what you actually say. In that gap, skills fail to fire, and nobody notices.

Every correction you make is lost by the next session. selftune turns those corrections into permanent improvements: it learns from real usage, validates every change, and rolls back if anything regresses.

Our Approach

One product. Two surfaces.

Consumers and creators have different information appetites. Consumers want outcomes: install it, forget it. Creators want evidence: confidence scores, trigger rates, comparison grids. Same product, same data, different default surfaces.

And when creators opt their users into the contribution pipeline, anonymous usage signals flow back to them, enabling crowdsourced skill evolution that no amount of personal testing can match.

Who We Built This For

Anyone who teaches their agent how to work

You use skills

You want your agent to just work. selftune runs in the background, improves your skills from real sessions, and never asks you to look at a dashboard. Seen and not heard.

You build skills

You publish skills that others install. selftune gives you a comparison grid with confidence scores, trigger rates, and miss patterns. When your users opt in to share signals, you get evolution proposals informed by how all of them phrase their requests, not just how you do.

Comparison

Why not just rewrite skills manually?

Approach	Problem
Rewrite the description yourself	No data on what users actually say. No validation. No regression detection.
Add "ALWAYS invoke when..." directives	Brittle. One agent rewrite away from breaking.
Force-load skills on every prompt	Doesn't fix the description. Expensive band-aid.
selftune	Learns from real usage, rewrites descriptions to match how you work, validates against eval sets, and rolls back automatically on regressions.
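The validate-then-rollback idea in the last row can be sketched roughly as follows. This is a hypothetical illustration, not selftune's implementation: the `EvalCase` type, the `accuracy` and `evolve` functions, and the keyword-overlap matcher are all assumptions made for the example. A real agent decides routing with its model, and selftune's actual validation gates are not described on this page. The sketch scores a proposed description against a labeled eval set and keeps it only if it beats the current baseline. Otherwise the current description stays in place.

```typescript
// Hypothetical eval case: a real prompt plus whether the skill should have fired.
interface EvalCase {
  prompt: string;
  shouldTrigger: boolean;
}

// A matcher decides whether a skill with this description fires for a prompt.
type Matcher = (description: string, prompt: string) => boolean;

// Toy stand-in for the agent's router: fires if any description word
// (3+ characters) appears in the prompt. Real routing is model-driven.
const keywordMatcher: Matcher = (description, prompt) => {
  const words = description
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length >= 3);
  const text = prompt.toLowerCase();
  return words.some((w) => text.includes(w));
};

// Fraction of eval cases where the fire/no-fire decision matches the label.
function accuracy(description: string, evals: EvalCase[], match: Matcher): number {
  if (evals.length === 0) return 0;
  const correct = evals.filter(
    (e) => match(description, e.prompt) === e.shouldTrigger,
  ).length;
  return correct / evals.length;
}

// Accept the proposal only if it strictly beats the baseline;
// ties and regressions keep (roll back to) the current description.
function evolve(
  current: string,
  proposal: string,
  evals: EvalCase[],
  match: Matcher = keywordMatcher,
): { description: string; accepted: boolean } {
  const baseline = accuracy(current, evals, match);
  const candidate = accuracy(proposal, evals, match);
  return candidate > baseline
    ? { description: proposal, accepted: true }
    : { description: current, accepted: false };
}

// Usage: labeled prompts drawn from (hypothetical) real sessions.
const evals: EvalCase[] = [
  { prompt: "make me an invoice for client X", shouldTrigger: true },
  { prompt: "export this as a PDF", shouldTrigger: true },
  { prompt: "what's the weather", shouldTrigger: false },
];

// Improves from 2/3 to 3/3, so it is accepted.
console.log(evolve("Generate PDF reports", "Generate PDF reports and invoice documents", evals));
// Drops the PDF case and only ties the baseline, so it is rolled back.
console.log(evolve("Generate PDF reports", "Generate invoice documents", evals));
```

Requiring a strict improvement is a conservative choice: a proposal that only matches the baseline adds churn without evidence that it helps.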

Different Layer

MCP solved connection. selftune solves usage.

Langfuse, LangSmith, and OpenLIT trace LLM calls. selftune operates at the skill layer and uniquely captures consumer usage signals that flow back to creators through a privacy-preserving relay. The crowdsourced evolution loop is how skills improve for everyone.

Dimension	selftune	Langfuse	LangSmith	OpenLIT
Observes	Skill triggers, missed fires, description drift	LLM calls, token usage	Agent traces, chain steps	Infrastructure metrics
Diagnoses	Why a skill didn't fire for a real user request	Latency and cost	Chain failures	System bottlenecks
Improves	Rewrites descriptions, bodies, and routing tables	—	—	—
Validates	3-gate pipeline, baseline comparison, auto-rollback	—	Custom evals	—
Runs	Locally, zero deps, zero API keys	Self-host or cloud	Cloud required	Helm chart
Price	Free (MIT)	Freemium	Paid	Free

These tools are complementary: they trace what happens inside the LLM, while selftune makes sure the right skill fires in the first place and improves it based on how you actually work.

Open source. Self-improving.

One npx command. No API keys, no configuration. Your skills start learning from your next session. MIT licensed, forever.
