Intent Ships Your Skills. selftune Tells You If They Work.
TanStack just shipped Intent — a CLI that lets library maintainers package agent skills inside npm modules. Skills update when you npm update. No more stale training data, no more copy-pasting from Discord threads, no more version confusion. It's a clean solution to a real problem.
Congratulations to Sarah Gerrard, Kyle Mathews, and the TanStack team. This is exactly the kind of infrastructure the skill ecosystem needs.
But distribution is only half the problem.
The Problem Intent Solves
Intent addresses a genuine gap: your docs are good, your types are solid, but your agent still gets it wrong. The reason? Documentation targets humans, not agents. TypeScript types validate individual API calls but can't encode intent. Training data mixes multiple library versions with no way to disambiguate. Breaking changes create permanent split-brain in model knowledge.
Intent fixes this by shipping skills alongside your package. Install a library, get its skills. Update the library, get updated skills. The Agent Skills spec — now adopted by VS Code, GitHub Copilot, OpenAI Codex, Cursor, Claude Code, Goose, and Amp — gives this a stable, cross-platform foundation.
This matters. Before Intent, skill distribution was ad hoc. Community-maintained rules files. Blog posts. README snippets. Copy-paste from GitHub issues. None of it versioned. None of it pinned to the library version you actually have installed. Intent solves this cleanly.
The Problem Intent Doesn't Solve
Here's what happens after you distribute a skill:
Skills degrade. A skill that triggers correctly on Claude Opus 4 might miss on Opus 4.5. Model updates shift how agents interpret descriptions. You won't know until users report failures — if they report them at all.
Triggers misfire. Your skill description says "database migration." Your user says "I need to update the schema for the new user roles table." Same intent, different language. The skill doesn't fire. Intent's staleness check compares your skill against your docs. It doesn't compare your skill against how users actually talk.
Models absorb capabilities. Six months from now, the base model might handle your library's patterns natively. Your skill still fires, adding instructions the agent doesn't need, potentially conflicting with its built-in knowledge. Nobody notices until something breaks.
Edge cases compound. Your skill works for the 80% case you tested. The other 20% — monorepo setups, unusual configurations, platform-specific paths — fail silently. Authored test prompts can't anticipate every environment.
Intent checks whether your skill matches your documentation. It doesn't check whether your skill works in production.
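The trigger-misfire case above can be made concrete with a toy measurement. This is only an analogy: real agents match skill descriptions with a language model, not keyword overlap, and the `keyword_overlap` helper below is hypothetical, not part of Intent or selftune. It shows how little surface vocabulary the example description shares with the example query.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_overlap(description: str, query: str) -> float:
    """Fraction of the description's words that also appear in the query."""
    desc = tokens(description)
    if not desc:
        return 0.0
    return len(desc & tokens(query)) / len(desc)


# The description and query from the example above.
description = "database migration"
query = "I need to update the schema for the new user roles table"

print(f"overlap: {keyword_overlap(description, query):.2f}")  # prints "overlap: 0.00"
```

The two phrasings describe the same task and share no words. A check that only compares a skill against its source docs never sees the second phrasing at all.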
Three Layers of the Skill Lifecycle
Agent skills need three layers of tooling. Each solves a different problem at a different stage:
Layer 1: Authoring → Test in the lab
Layer 2: Distribution → Ship via npm
Layer 3: Runtime → Monitor in production, auto-improve
Layer 1 — Authoring. Anthropic's skill-creator lets you write test prompts, run them, compare skill versions, and detect regressions before you ship. This is unit testing for skills.
Layer 2 — Distribution. TanStack Intent lets you package skills inside npm modules, version them alongside your library, check for staleness against source docs, and validate format in CI. This is npm publish for skills.
Layer 3 — Runtime. selftune monitors skills after deployment using real session telemetry. When a skill misses a trigger, selftune detects it, proposes an improved description, validates the change against eval sets, deploys it with a backup, and rolls back automatically if quality drops. This is APM for skills.
Each layer is necessary. None is sufficient alone.
You wouldn't ship a web service with only unit tests and no monitoring. You wouldn't publish a package and never look at how it behaves at runtime. Skills are no different.
Where selftune Picks Up
Intent's lifecycle ends once a skill is published and checked against its docs:
Intent: Author → Package → Distribute → Version → Stale-check
selftune: Deploy → Monitor → Detect failures → Evolve → Redeploy → Watch
selftune starts where Intent stops. After a skill is installed — whether from a SKILL.md file, a community skill pack, or an Intent-enabled npm package — selftune watches what happens next.
Real signals, not synthetic tests. selftune generates eval sets from actual user queries, not authored test prompts. Users don't talk the way you think they talk. The gap between your test prompts and real user language is where most skill failures live.
Autonomous improvement. When selftune detects missed triggers, it proposes an improved skill description, validates it against the full eval set (checking for regressions, not just improvements), and deploys it automatically. No maintainer action required.
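A minimal sketch of the "validate against the full eval set" step, under stated assumptions: `EvalCase`, `keyword_trigger`, and `accept_candidate` are hypothetical names, and the keyword matcher is a toy stand-in for the agent. selftune's actual validation logic is not shown in this post. The idea is to accept a new description only if it fixes at least one case and breaks none that previously passed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    query: str
    should_trigger: bool


# Decides whether a skill with the given description fires on a query.
TriggerFn = Callable[[str, str], bool]


def keyword_trigger(description: str, query: str) -> bool:
    """Toy stand-in for the agent: fire if any description word is in the query."""
    return bool(set(description.lower().split()) & set(query.lower().split()))


def passes(description: str, case: EvalCase, triggers: TriggerFn) -> bool:
    return triggers(description, case.query) == case.should_trigger


def accept_candidate(
    current: str,
    candidate: str,
    eval_set: Sequence[EvalCase],
    triggers: TriggerFn = keyword_trigger,
) -> bool:
    """Accept only if the candidate fixes something and regresses nothing."""
    fixed = 0
    for case in eval_set:
        before = passes(current, case, triggers)
        after = passes(candidate, case, triggers)
        if before and not after:
            return False  # regression on a case that used to pass
        if after and not before:
            fixed += 1
    return fixed > 0


EVAL_SET = [
    EvalCase("run the database migration", should_trigger=True),
    EvalCase("update the schema for the user roles table", should_trigger=True),
    EvalCase("write a haiku about tables", should_trigger=False),
]

print(accept_candidate("database migration", "database migration schema", EVAL_SET))
print(accept_candidate("database migration", "schema update", EVAL_SET))
```

The first candidate is accepted because it picks up the schema query without breaking anything. The second is rejected: it catches the schema query but stops firing on the plain migration query, which is the kind of regression a check that only measures improvements would miss.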
Continuous monitoring. After deployment, selftune watch monitors for regressions. If a model update or environment change causes quality to drop, selftune rolls back to the last known good version and alerts you. Intent's staleness check runs when you ask. selftune watches continuously.
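One way to picture the watch-and-rollback loop is a rolling window over recent trigger outcomes. The sketch below is illustrative only: `RegressionWatch`, its parameters, and the version list are assumptions made for this example and do not describe how selftune watch is implemented. It also assumes the previous version in the list is the last known good one.

```python
from collections import deque


class RegressionWatch:
    """Flags a regression when recent accuracy falls well below a baseline."""

    def __init__(self, baseline: float, window: int = 20, tolerance: float = 0.10) -> None:
        self.baseline = baseline
        self.window = window
        self.tolerance = tolerance
        self.outcomes: deque[bool] = deque(maxlen=window)

    def record(self, correct: bool) -> bool:
        """Record one outcome; return True if a rollback is warranted."""
        self.outcomes.append(correct)
        if len(self.outcomes) < self.window:
            return False  # not enough data to judge yet
        rate = sum(self.outcomes) / self.window
        return rate < self.baseline - self.tolerance


# Hypothetical description history; the last entry is the deployed one.
versions = [
    "v1: database migration",
    "v2: database migration and schema changes",
]

watch = RegressionWatch(baseline=0.9, window=5, tolerance=0.15)
for correct in [True, True, True, False, False]:
    if watch.record(correct) and len(versions) > 1:
        versions.pop()  # roll back to the previous version

print(versions[-1])  # prints "v1: database migration"
```

The window keeps a single bad session from triggering a rollback, and the tolerance sets how far below the baseline accuracy may drift before the previous version is restored.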
Cross-platform. selftune works across Claude Code, Codex, OpenCode, and OpenClaw. The same monitoring pipeline ingests sessions from any supported platform.
What This Means for Skill Authors
If you maintain an Intent-enabled library, selftune gives you something Intent can't: production feedback.
Intent tells you whether your skill matches your docs. selftune tells you whether your skill matches your users. It shows you the queries that should have triggered your skill but didn't — in the exact language your users used. It shows you the environments where your skill fails. It shows you when a model update changed how your skill description gets interpreted.
You can use this data to improve your skills manually, or let selftune improve them automatically. Either way, you're working from real signals instead of guessing.
What This Means for Skill Consumers
If you use Intent to install skills from your dependencies, selftune gives you visibility into whether those skills actually perform.
selftune status shows you trigger accuracy, false positive rates, and health scores for every installed skill. selftune doctor detects misconfigurations, stale logs, and missing hooks. selftune evolve improves underperforming skills without waiting for the library maintainer to ship an update.
You get Intent's distribution benefits and selftune's runtime quality checks: skills arrive automatically and improve automatically.
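For readers who want a concrete picture of the two metrics named above, here is one plausible way to compute trigger accuracy and false positive rate from labeled sessions. The `SessionRecord` shape and both definitions are assumptions for illustration. The post does not specify how selftune status calculates its numbers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SessionRecord:
    should_trigger: bool  # did the query call for this skill?
    did_trigger: bool     # did the skill actually fire?


def trigger_accuracy(records: Sequence[SessionRecord]) -> float:
    """Share of sessions where the skill fired exactly when it should have."""
    if not records:
        return 0.0
    correct = sum(r.should_trigger == r.did_trigger for r in records)
    return correct / len(records)


def false_positive_rate(records: Sequence[SessionRecord]) -> float:
    """Share of sessions that did not need the skill in which it fired anyway."""
    negatives = [r for r in records if not r.should_trigger]
    if not negatives:
        return 0.0
    return sum(r.did_trigger for r in negatives) / len(negatives)


# Made-up sample data: 4 sessions that needed the skill, 4 that did not.
RECORDS = [
    SessionRecord(True, True),
    SessionRecord(True, True),
    SessionRecord(True, False),   # missed trigger
    SessionRecord(True, True),
    SessionRecord(False, False),
    SessionRecord(False, False),
    SessionRecord(False, True),   # false positive
    SessionRecord(False, False),
]

print(trigger_accuracy(RECORDS))     # prints 0.75
print(false_positive_rate(RECORDS))  # prints 0.25
```

Tracking the two numbers separately matters: a description broadened to catch more queries can raise accuracy on the sessions that need the skill while pushing the false positive rate up.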
The Complementary Stack
Intent and selftune aren't competing. They're solving different halves of the same problem:
| | Intent | selftune |
|---|---|---|
| Problem | Knowledge distribution to agents | Skill quality and autonomous evolution |
| Signal | Doc-to-skill drift | Real user session telemetry |
| Improvement | Maintainer reviews and updates | Auto-propose, validate, deploy, rollback |
| When it runs | At publish time and in CI | Continuously after deployment |
| Staleness | Compares skill to source docs | Compares skill to production performance |
Use both. Author your skills with skill-creator. Distribute them with Intent. Monitor and improve them with selftune.
That's the complete stack.
Try It
# Install selftune
npx selftune@latest init
# Check skill health
npx selftune@latest doctor
# See how your skills perform
npx selftune@latest status
# Auto-improve underperforming skills
npx selftune@latest evolve
selftune is open-source, zero-dependency, and works with Claude Code, Codex, OpenCode, and OpenClaw.