Skip to main content
📝 AI Tools

Stop Building AI Agents — Start Building Skills Instead

Stop Building AI Agents — Start Building Skills Instead I spent three months last year building a custom AI agent from scratch. Hundreds of lines of s...

17 min

Read time

3,250

Words

Feb 15, 2026

Published

Engr Mejba Ahmed

Written by

Engr Mejba Ahmed

Share Article

Stop Building AI Agents — Start Building Skills Instead

Stop Building AI Agents — Start Building Skills Instead

I spent three months last year building a custom AI agent from scratch. Hundreds of lines of system prompts. Carefully tuned tool definitions. Retry logic, error handling, output formatting rules — the whole production. It worked beautifully for one specific workflow. Then the client asked me to add a second workflow, and I realized I'd built a masterpiece that couldn't grow.

The agent knew how to do exactly one thing really well. Adding a second thing meant rewriting half the architecture. Adding a third would've meant starting over.

Sound familiar? If you've built any kind of AI agent in the past year, you've probably hit this wall. Your agent is brilliant at its original task and completely helpless the moment you push it outside that lane. I called it the "brilliant specialist trap" — and I couldn't figure out the way around it until I watched Barry and Mahes present something that reframed everything I thought I knew about agent development.

They didn't build a better agent. They built a better way to teach agents. And the difference between those two ideas is the gap between where most developers are stuck and where this entire field is heading.


Why Your Brilliant Agent Is Actually Kind of Dumb

Here's a thought experiment that changed how I approach agent architecture.

Take the smartest person you know. Give them a task they've never encountered — say, filing a corporate tax return in Germany, or performing a root canal. Their raw intelligence is still there. Their reasoning ability hasn't changed. But they'll fail spectacularly because intelligence without procedural knowledge is just potential energy with nowhere to go.

That's exactly what's happening with AI agents right now. Claude, GPT-4, Gemini — these models are extraordinarily capable reasoners. But when you deploy them as agents, you're essentially dropping a genius into an unfamiliar job and saying "figure it out." Sometimes they do. Often they don't. And even when they succeed, the quality is inconsistent because they're reasoning from first principles every single time instead of applying learned procedures.

Traditional agent development tries to solve this by cramming everything into the system prompt. You write paragraph after paragraph of instructions: "When the user asks about X, first check Y, then format the output as Z, and if there's an error, do W." I've written system prompts that were literally longer than the code they controlled.

The problem isn't that this approach doesn't work. It does — for one agent doing one thing. The problem is that none of that knowledge transfers. Build a second agent for a different task and you start from zero. Every instruction, every edge case, every hard-won insight about how to handle failures — locked inside a single prompt file, invisible to everything else in your system.

Barry and Mahes articulated what I'd been feeling but couldn't name: agents have intelligence and capabilities, but they lack organized expertise. They're brilliant generalists who can reason about anything but have mastered nothing. And the fix isn't building smarter agents — it's giving agents a way to acquire, store, and share expertise.

That fix is skills. And the architecture behind them is more clever than it first appears.


What Agent Skills Actually Are (And Why They're Not Just Fancy Prompts)

When I first heard "agent skills," I assumed it was a rebrand of prompt templates. It's not. The difference is structural, and it matters.

A skill is a folder. That's it at the most basic level — a directory containing files that encode procedural knowledge. But the specific composition of those files is where the design gets interesting:

Scripts (tools): Executable code that gives the agent new capabilities. A skill for web scraping might include a Python script that handles authentication, pagination, and rate limiting. The agent doesn't need to figure out how to scrape — it calls the script.

Markdown instructions: Procedural knowledge written in natural language. When to use the skill, what order to perform steps in, what edge cases to watch for, how to format outputs. This is where the domain expertise lives.

Assets: Configuration files, templates, reference data, example outputs — anything the skill needs to do its job consistently.

Here's the design choice that makes this architecture work: progressive disclosure. When Claude Code loads your skills, it doesn't dump all of them into the context window at once. It sees only the metadata — the skill name, a brief description, and trigger conditions. The full content of the skill loads only when the agent actually needs it.

Why does this matter? Context window management is the single biggest constraint in agent systems. Every token spent on instructions you're not currently using is a token stolen from reasoning about the task at hand. Progressive disclosure means you can have fifty skills installed without paying the context cost for all fifty simultaneously. The agent sees a menu of capabilities and pulls the relevant recipe only when it starts cooking.

I tested this with my own setup. I have skills for front-end design, SEO auditing, content generation, security analysis, and deployment automation. Claude Code shows me the list when I ask what skills are available. But when I say "redesign this page," only the front-end design skill's full content gets loaded. My SEO skill sits quietly in the background, ready but not consuming tokens.

Compare this to the monolithic system prompt approach. With everything in one prompt, you pay the context cost for every capability whether you use it or not. It's like carrying every tool in the hardware store in your pockets instead of walking to the right aisle when you need something.

But the folder-based structure unlocks something even more important than context efficiency — and this is the part most people haven't caught onto yet.


The Real Power Move: Skills Are Software, Not Prompts

Because skills are just folders with files, they inherit all the properties of software that we already know how to manage.

Version control. Drop your skills folder into a Git repository and you've got full history. Who changed what, when, why. Roll back a skill to last Tuesday's version because the new one introduced a bug. Branch a skill to test modifications without affecting the main version. This is foundational stuff for software but revolutionary for prompt engineering, which has traditionally been "edit the prompt, hope it works, pray you remember what you changed."

Sharing and collaboration. Zip a skill folder, send it to a colleague, and they have your exact capability. No "copy this prompt and tweak it for your setup." No "make sure you also change the model config." The skill is self-contained. Everything it needs is in the folder.

Composability. This is the big one. Multiple skills can coexist and complement each other. My front-end design skill handles visual changes. My SEO skill handles optimization. When I ask Claude to "redesign this page and fix the SEO," both skills activate. They don't conflict because each operates in its own domain with its own instructions.

I've been treating skills like microservices for agent knowledge. Each skill has a single responsibility, a clean interface, and no hidden dependencies on other skills. When I need a new capability, I don't modify an existing skill — I create a new one. When a skill becomes obsolete, I delete the folder. No refactoring required.

This microservices analogy goes deeper than I expected. In traditional software, microservices communicate through APIs. In the skills architecture, skills communicate through the agent's reasoning — the model acts as the orchestration layer, deciding which skills to invoke and how to combine their outputs. It's an elegant inversion of the usual pattern.

And the ecosystem growing around this concept is moving faster than I anticipated. Let me walk you through what's actually out there.


The Three Layers of the Skill Ecosystem

Five weeks after launch, thousands of skills already exist. They fall into three categories that reveal a lot about where this technology is heading.

Layer 1: Foundational skills. These are general-purpose capabilities that make agents better at common tasks. Document editing, scientific research synthesis, data analysis, code review. Think of these as the standard library — the skills most agents should probably have installed. The front-end design skill I use has over 100,000 installs. The SEO audit skill has around 19,000. These numbers tell you something about demand.

Layer 2: Third-party skills. This is where things get interesting. Companies are building skills for their own products. Browserbase created a web automation skill that lets Claude navigate and interact with web pages through their infrastructure. Notion built a workspace integration skill. These aren't generic — they're purpose-built by the teams who know their tools best.

What I find compelling about third-party skills is the incentive alignment. Browserbase wants developers to use their platform. Building a Claude Code skill that makes their platform effortless to use is brilliant distribution. The skill becomes the user interface, and Claude becomes the user. I expect every major developer tool to ship a Claude Code skill within the next twelve months.

Layer 3: Enterprise skills. This is the layer nobody talks about publicly but where the real money lives. Fortune 100 companies are building internal skills that encode their specific workflows, best practices, compliance requirements, and institutional knowledge.

Imagine a financial services firm that builds a skill encoding their specific risk assessment procedures. Every analyst using Claude Code gets access to the same procedural knowledge — the same checklists, the same regulatory references, the same formatting requirements. New hires ramp up faster because the skill carries the expertise that used to live only in senior employees' heads.

I've started doing this for my own clients. When I deliver a project, I include a skill folder that encodes the project's maintenance procedures. How to deploy. How to run tests. What the naming conventions are. Where the config files live. The client's team doesn't need to memorize my documentation — they ask Claude, and the skill provides the answer with the exact commands to run.

This is procedural knowledge management in its purest form. And it solves a problem I've watched organizations struggle with for my entire career: how do you preserve institutional knowledge when people leave?


Building Your First Skill: The Practical Walkthrough

Enough theory. Here's how to actually build a skill. I'll walk through creating one I built last month — a deployment verification skill that checks whether a deployment succeeded and reports any issues.

Step 1: Create the folder structure.

deployment-check/
├── skill.md
├── tools/
│   └── verify-deployment.sh
└── examples/
    └── sample-output.md

The skill.md file is the brain. It contains the trigger conditions, instructions, and context:

# Deployment Verification Skill

## Trigger
Activate when the user asks to verify, check, or validate
a deployment, or after any deploy command completes.

## Instructions
1. Run the verify-deployment.sh script with the target URL
2. Check HTTP status codes for all critical endpoints
3. Verify SSL certificate validity
4. Compare response times against baseline
5. Check for error patterns in response bodies
6. Generate a summary report with pass/fail status

## Edge Cases
- If the site returns 503, wait 30 seconds and retry once
- If SSL check fails, note whether the cert is expired
  or misconfigured
- If response times exceed 2x baseline, flag as warning
  not failure

Step 2: Write the tool script.

#!/bin/bash
# verify-deployment.sh
# Checks critical health indicators for a deployed site

TARGET_URL=$1

echo "## Deployment Verification Report"
echo "Target: $TARGET_URL"
echo "Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo ""

# Check HTTP status
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$TARGET_URL")
echo "### HTTP Status: $STATUS"

# Check SSL
SSL_EXPIRY=$(echo | openssl s_client -connect \
  "${TARGET_URL#https://}:443" 2>/dev/null | \
  openssl x509 -noout -enddate 2>/dev/null)
echo "### SSL: $SSL_EXPIRY"

# Check response time
RESPONSE_TIME=$(curl -s -o /dev/null \
  -w "%{time_total}" "$TARGET_URL")
echo "### Response Time: ${RESPONSE_TIME}s"

Step 3: Test it.

Install the skill in your Claude Code environment and run it:

Verify the deployment at https://mysite.com

Claude picks up the skill trigger, loads the full instructions, executes the verification script, and formats the output according to the skill's instructions. First time I ran this, it caught an SSL certificate that was expiring in three days — something I would have missed until users started seeing browser warnings.

Pro tip: Start your skills simple. My first version of this skill was just the markdown instructions with no script — Claude ran the curl commands itself based on the procedural instructions. I added the script later when I wanted more consistent output formatting. You don't need to over-engineer skills on day one. Let them evolve based on actual usage.


The Architecture Behind the Curtain: Skills + MCP

One piece of this puzzle I didn't understand initially was how skills relate to MCP servers. They looked redundant — both extend what Claude can do, so what's the difference?

After working with both extensively, here's the mental model that clicked for me.

MCP (Model Context Protocol) servers handle connectivity. They're the plumbing that lets Claude talk to external systems — databases, APIs, file systems, third-party services. An MCP server for GitHub lets Claude read repos, create issues, and open pull requests. An MCP server for Slack lets Claude send messages and read channels.

Skills handle expertise. They're the knowledge that tells Claude what to do with those connections and how to do it well.

Think of it like a kitchen. MCP servers are the appliances — the stove, the oven, the mixer. Skills are the recipes. Having a stove doesn't make you a chef. Having a recipe doesn't cook the food. You need both.

In practice, this means skills often depend on MCP servers. My deployment verification skill uses curl commands that run through Claude's shell access. A more sophisticated version might use an MCP server for cloud infrastructure to check container health, verify load balancer configurations, and query monitoring APIs. The skill provides the procedural knowledge ("check these five things in this order"), and the MCP servers provide the connections ("here's how to reach the load balancer API").

This combined architecture is already running in production at companies in financial services and life sciences. The skills encode compliance procedures and analysis workflows. The MCP servers connect to internal databases, regulatory APIs, and reporting tools. The agent orchestrates everything based on the skill's instructions.

I think this is the architecture that will define the next generation of enterprise AI deployments. Not monolithic agents with everything baked in, but thin agents with rich skill libraries and broad MCP connectivity.


What Most People Are Getting Wrong About This Shift

Let me push back on a few things I keep seeing in discussions about agent skills.

"Skills are just prompt engineering with extra steps." No. Prompt engineering is writing instructions. Skills are packaging expertise into a reusable, versionable, shareable, composable unit. That packaging is the entire point. The difference between a loose recipe scribbled on a napkin and a published cookbook isn't the content — it's the format, discoverability, and reliability. Same principle here.

"I don't need skills because my agent works fine." Your agent works fine for what it currently does. Skills are about what comes next. When your boss asks your agent to handle a new workflow, do you rewrite the system prompt? Do you create a second agent? Do you build a router to dispatch between agents? Skills let you just add a folder. The operational simplicity is the feature.

"Non-technical people can't build skills." This one's actually being disproven in real time. The markdown-based instruction format means anyone who can write clear procedures can create a skill. I watched a recruiting coordinator — zero coding background — build a skill that standardized how Claude screens resumes for their company's specific criteria. The "tool" scripts are optional for skills that don't need automation. Many skills are pure procedural knowledge in markdown, and they work surprisingly well.

The one criticism I do agree with: testing and evaluation for skills is still immature. When you write code, you have test suites, linters, type checkers. When you write a skill, how do you verify it works correctly? How do you measure whether version 2 is better than version 1? The team behind skills has acknowledged this gap and listed it as a priority for future development. Until those tools exist, I test my skills the old-fashioned way — try them on real tasks and iterate based on results.


What Happens When Agents Build Their Own Skills

Here's where my thinking gets speculative — but grounded in what I'm already seeing.

Claude Code can create skills autonomously. When you're working with Claude and it discovers a useful procedure, it can write that procedure down as a skill for future use. I watched this happen accidentally. I was debugging a deployment issue, walking Claude through my troubleshooting process step by step. After we fixed it, Claude asked if I wanted it to save the troubleshooting procedure as a skill. I said yes. Now every time I encounter a similar issue, Claude already knows the diagnostic steps.

This is transferable learning made tangible. The agent experienced a situation, extracted the procedural knowledge, and stored it in a format that persists beyond the current session. Future conversations — even future versions of Claude — can access that knowledge instantly.

The analogy to computing history is striking and I don't think it's accidental. Models are processors — raw computational power. Agent runtimes are operating systems — managing resources and processes around the processor. Skills are applications — the software that solves actual problems and encodes real expertise.

We're essentially watching the application layer of AI develop in real time. And just like software development democratized over decades — from assembly language to no-code builders — skill creation is already on that trajectory. Today it requires understanding folder structures and markdown syntax. Tomorrow it'll probably require nothing more than showing an agent how to do something and saying "remember this."


Your Next Twelve Hours

Here's what I want you to do before tomorrow. Pick one workflow you do repeatedly — could be deployment checks, code reviews, data formatting, client onboarding procedures, anything you do more than twice a month.

Write it down as a skill. Create a folder with a skill.md file. Describe the trigger condition. List the steps. Note the edge cases. You don't need tool scripts for your first skill. Just the procedural knowledge in markdown.

Install it and try it.

I guarantee two things will happen. First, you'll be surprised at how well Claude follows the procedure — better than most humans follow written documentation, honestly. Second, you'll immediately think of five more workflows you want to convert into skills.

That second reaction is the important one. It means you've stopped thinking about AI agents as products you build and started thinking about them as platforms you extend. And that shift in thinking — from building agents to building skills — is the difference between fighting the complexity ceiling and breaking through it.

The agent I spent three months building last year? I've rebuilt it as seven skills. Each one took less than a day. Together they do everything the original agent did, plus three things it couldn't. And when the client asks for a new capability next month, I'll add an eighth skill instead of starting over.

That's not incremental improvement. That's a fundamentally different way to build.


Let's Work Together

Looking to build AI systems, automate workflows, or scale your tech infrastructure? I'd love to help.

Coffee cup

Enjoyed this article?

Your support helps me create more in-depth technical content, open-source tools, and free resources for the developer community.

Related Topics

Engr Mejba Ahmed

About the Author

Engr Mejba Ahmed

Engr. Mejba Ahmed builds AI-powered applications and secure cloud systems for businesses worldwide. With 10+ years shipping production software in Laravel, Python, and AWS, he's helped companies automate workflows, reduce infrastructure costs, and scale without security headaches. He writes about practical AI integration, cloud architecture, and developer productivity.

Discussion

Comments

0

No comments yet

Be the first to share your thoughts

Leave a Comment

Your email won't be published

10  +  14  =  ?

Continue Learning

Related Articles

Browse All