MCP Tool Research

This week, we’re launching a new phase of research at Lyr3: digging into the tools that power the MCP ecosystem.

So far:

  • We’ve collected data on more than 420 MCP servers.
  • Across roughly 300 of them, we’ve identified 1,400+ distinct tools, along with their descriptions and input schemas.

These numbers are more than a showcase. They’re the foundation for something deeper: understanding how AI systems interact with tools, and how tool design affects adoption, usability, and effectiveness.

Why This Matters — Especially Through the Lens of AI-UX

Tools aren’t enough — clarity and alignment matter

An MCP server may expose ten tools, but if their names, schemas, or descriptions are ambiguous, even the best model (or developer) will struggle to use them correctly. The AI-UX (how a tool is described, and how intuitively its interface is exposed) is the gatekeeper between “tool exists” and “tool actually used.”
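To make this concrete, here is a hypothetical contrast in the shape MCP servers use to advertise tools (a name, a description, and a JSON Schema for inputs). The tool names and fields below are invented for illustration, not taken from any real server:

```python
# Two hypothetical tool listings in the MCP tool-definition shape:
# a name, a free-text description, and a JSON Schema for inputs.

# Ambiguous: the name and description give a model almost nothing
# to decide when (or how) to call this tool.
ambiguous_tool = {
    "name": "do_search",
    "description": "Search stuff.",
    "inputSchema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
    },
}

# Clear: a scoped name, a description that states what comes back,
# per-parameter descriptions, and an explicit required-fields list.
clear_tool = {
    "name": "search_issues",
    "description": (
        "Search open issues in a repository by keyword. "
        "Returns up to `limit` results sorted by relevance."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keywords to match in issue titles and bodies.",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of results (default 10).",
                "minimum": 1,
            },
        },
        "required": ["query"],
    },
}
```

The second definition gives a model (or a human) enough signal to know when the tool applies, what it returns, and which argument is mandatory; the first forces guesswork on all three counts.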

Usability breeds trust and adoption

When tools are well-described and aligned with models’ expectations, they reduce friction, errors, and hallucinations. That matters beyond hobbyist devs: for enterprise adoption, reliability is non-negotiable.

Models differ in tool-use ability

Recent work (e.g. ToolEyes) shows that model size or sophistication alone doesn’t guarantee good tool use: aspects like format alignment, intent understanding, tool selection, and behavior planning are critical.

Also, methods like TRICE show that giving models execution feedback (i.e. letting them see the outcome of tool calls) helps them learn when to call a tool — and when not to — which improves effectiveness.

Thus, it’s not enough to catalog tools. You need to understand which ones are usable by which models, and why.

Too many vague tools might hurt rather than help

There’s a design principle in other UX domains: fewer, well-defined, high-quality tools outperform a shotgun of loosely specified ones. While I haven’t found a single canonical study that exactly states “smaller sets of focused tools are better” in tool-UX, usability research broadly supports minimizing cognitive load and making choices clear. (See Nielsen Norman Group’s work on AI tool usability and human expectations.)

In practice, a model (or developer) that has to choose between 20 underspecified tools may make more errors than if given 5 narrowly scoped, well-documented ones.

Strategic & business implications

  • For tool authors: You’ll get feedback on how to improve your API design, naming, documentation, and schema clarity.
  • For ecosystem builders: You can identify which tool types are over-supplied vs underexplored.
  • For platform integrators: You’ll know which tools are safe, reliable bets to integrate into agents or apps.
  • For AI/ML teams: You gain a dataset to benchmark tool-use performance across models.

We believe that by extracting, normalizing, and scoring tool descriptions, we can help shift MCP from “lots of servers with hidden promise” to “an ecosystem of tools that models and humans can reliably converge on.”
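As a minimal sketch of what scoring a tool description could look like, here is an illustrative heuristic. The specific checks and weights are invented for this example and are not our actual rubric:

```python
def score_tool(tool: dict) -> int:
    """Crude usability score (0-100) for one MCP tool entry.

    The checks and weights here are illustrative, not a production rubric.
    """
    points = 0
    desc = tool.get("description", "")
    schema = tool.get("inputSchema", {})
    props = schema.get("properties", {})

    # A substantive free-text description (more than a few words).
    if len(desc.split()) >= 8:
        points += 40
    # Every parameter carries its own description.
    if props and all("description" in p for p in props.values()):
        points += 30
    # Required parameters are declared explicitly.
    if schema.get("required"):
        points += 30
    return points
```

A real pipeline would add many more signals (naming conventions, schema strictness, overlap with sibling tools), but even a heuristic this simple separates tools a model can use unaided from tools it has to guess at.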

More insights and deeper analyses are coming soon. Stay tuned as we layer in usability scoring, cross-model testing, and narratives rooted in the data.