The AI Daily Brief: Artificial Intelligence News and Analysis

How to Use Agent Skills

2h ago · 27:58 · 5,620 words

The team behind Claude Code's agent skills shares lessons on building, testing, and organizing skills — and the concept is converging across the entire AI stack, from hardcore developers to mainstream users.

Transcript


Today on the AI Daily Brief: how the team that designed agent skills uses agent skills,

and before that in the headlines, you can now control Claude Cowork from your phone.

The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.

Alright friends, quick announcements before we dive in. First of all, thank you to today's sponsors: KPMG, Blitzy, AIUC, and Mercury. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe at Apple Podcasts; ad-free is just $3 a month. If you are interested in sponsoring the show, send us a note at sponsors@aidailybrief.ai. At this point, we are firmly selling into the summer, so if you are planning campaigns in the future, it is a good time to reach out.

And of course, if you need to know anything else about the ecosystem, you can also find that on aidailybrief.ai. I would once again point you to the newsletter, which is back, and is basically the best way to get access to the links that I talk about in the show. Again, you can get that all on aidailybrief.ai, and with that out of the way, let's dive in. One of the interesting ways that you can tell what's really important to AI builders and people on the front lines is when there's a story that on the surface looks fairly small,

but which is getting a disproportionate share of the conversation in AI circles. Our first story today is exactly that. On the surface, it's just a simple new feature for Claude Cowork. In this case, it's called Dispatch, and it allows you to bring your Claude

Cowork with you on the go. That said, based on the reaction, 3 million views on the announcement

tweet and 9,000 bookmarks, this one is a big deal to people. In the wake of OpenClaw, companies in the agent space have either been releasing their own versions of OpenClaw, which was obviously the topic of our show yesterday, or they've been slowly adding the

important features of OpenClaw to their existing product suites, which has of course been Anthropic's approach. A couple weeks ago, we got remote control for Claude Code, which allowed users to initiate Claude sessions on their computer and then carry them on to their mobile devices, where they could control them doing whatever it was that they were doing, basically coding from the gym. Dispatch is basically that, but for Cowork. The Cowork sessions are still hosted in the sandbox on your computer, meaning

Claude still has the same access and protections. However, you can now kick off a Cowork session and then continue monitoring progress and providing approvals while out and about. Anthropic described the feature as like having a walkie-talkie for communicating with Claude. Cowork developer Felix Rieseberg wrote, "It feels pretty magical to give Claude a mission on my computer and get occasional updates, like creating reports from internal dashboards or finding me a better seat on my next flight. Everything Claude can do on your computer, files, browser tools, is reachable from wherever you go."

The first impressions are good. Daniel Sahn writes, "Testing Cowork from my phone, the walkie-talkie analogy is spot on. Your phone becomes a remote control that talks to Claude running on your desktop. One more for the weekend testing list; stay tuned, post incoming on how it works." Ethan Mollick writes, "After using it a bit, Claude Cowork Dispatch covers 90% of what I was trying to use OpenClaw for, but feels far less likely to upload my entire drive to a malware site." He continues, "What I like better: easy, much more stable and safe; existing connectors mean better integration with Gmail, browsers, etc.; very good tool use. What is missing for me: ability to invite Claude to any channel, the heartbeat and proactivity, and multiple sessions. Right now Dispatch is one chat." Now, for hardcore OpenClaw users, all of those things would be dealbreakers, but this isn't necessarily about converting hardcore OpenClaw users. It's about

bringing those types of feature sets to the full spectrum of tools for all the different types of agent users. Indeed, I think Powell Hurren gets it right when he writes, "The bigger story: Code, Cowork, web, and now Dispatch are all converging towards the same thing, a persistent AI layer that follows you across devices and contexts." I think that is exactly right. This convergence that we keep talking about is really just form-factor adjustment as everyone figures out the right way for people to interact with agents across a variety of different use cases and behavior patterns. Speaking of OpenClaw, one of the things that we've been tracking is the rise of OpenClaw

in China. You might remember seeing a bunch of viral videos about people standing in line to get access to their first OpenClaws, supported by some of the big Chinese tech companies. But apparently the Chinese government is now growing concerned. In recent weeks, regulators warned staff at government agencies and state-owned enterprises of the dangers of OpenClaw and advised them not to install the agent. This seems to be somewhere between a stern warning and an outright ban across different regions and entities. Last week, authorities released a list of six dos and don'ts for organizations deploying OpenClaw. Among their suggestions were using the official version and minimizing internet access and permissions. Adoption is so pervasive that the Hong Kong Monetary Authority, which is basically their central bank, issued an official statement that they had no plans to deploy OpenClaw on their internal IT systems. Chinese media is now

running OpenClaw horror stories regarding privacy leaks and financial screwups, with one user apparently giving their OpenClaw access to a credit card, which was promptly run up to the limit. Wendy Chang, a senior analyst at the Mercator Institute for China Studies, believes that OpenClaw has a natural cultural resonance in China. She said most people view technology as a convenience, so when something new comes out, they're more willing...

She also suggested OpenClaw being free and open source has a major role to play in its popularity.

Many analysts have noted that Chinese tech firms have struggled to monetize their models

among consumers, as the concept of software subscriptions is far less developed in the East. Then there's Professor Graham Webster, who focuses on geopolitics in tech but who, before that, was my roommate and entrepreneurial collaborator at Northwestern back 20 years ago. He suggested that the rise of OpenClaw could be a flashpoint for China's AI industry. Until now, any and all experiments have been encouraged under a formal national initiative called AI Plus. However, the clear privacy and security concerns could trigger a rethink, according to Webster, who said it could be a moment that starts to cause the Chinese government to think about the downsides of widely available open models. It feels to me like there's an interesting story brewing here, although I'm still not exactly sure what it is and what it says about where we are, but it's something that I'm going to continue to pay attention to.

One flag related to that: while in general optimism about AI is way higher in China than it is in the US, there was a huge spike in the term "AI anxiety" on WeChat in February, peaking in mid-March as OpenClaw mania hit a crescendo. Tony Peng of Recode China AI wrote,

"What is different this time is the mood. In those earlier waves, the mainstream mood was excitement and curiosity. This time, more and more people are expressing anxiety, fear, and concern." Tony argues that the most obvious reason is job insecurity. He writes, "For most ordinary people in China, AI still means chatbots. Claude Code or Codex is not available. There's no household AI agent with real penetration. Then, all of a sudden, media reports are claiming OpenClaw can handle a wide range of tasks autonomously, and the gap between what people knew and what they're being told deepened the sense of being left behind." In other words, even in a place with high AI optimism, the job displacement fear persists. Now, separately, Chinese authorities are

taking a second look at Meta's acquisition of Manus. From the outset, it seemed that Manus had

designed their corporate structure to circumvent controls on Chinese tech exports. The company relocated their headquarters from Beijing to Singapore in July of last year, shortly after they began taking capital from US venture firms. Sources said that officials at China's National Development and Reform Commission called executives from Meta and Manus to a meeting last week to express concerns over the deal. Government actions remain unclear, but they appear to include

an effort to bar Manus executives from departing China for Singapore. The New York Times discussed a range of different options that Chinese officials might pursue, including clawing back data exports or declaring the relocation unlawful. This could be a reaction to growing concerns about losing AI talent to the West. However, some analysts have suggested it's just a maneuver to create leverage ahead of trade talks later this month. Meta is trying to present themselves as unconcerned, with a spokesperson stating, "The transaction complied fully with applicable law. The outstanding team at Manus is now deeply integrated into Meta. We anticipate an appropriate resolution to the inquiry." And one last one on China: Nvidia says it's restarting production as Chinese export plans get back on track. Speaking at a press conference on Tuesday, Jensen Huang said, "We've been licensed for many customers in China. We've received

purchase orders from many customers, and we're in the process of restarting our manufacturing. Our supply chain is getting fired up." Now, the process of getting export approval for the H200 has been an on-again, off-again affair since the idea was floated by President Trump back in December. The most recent chatter, from the beginning of March, was that Nvidia would shut down production and reallocate the fab time to producing next-generation Vera Rubin hardware.

No single catalyst was attributed to the decision, but export plans have seen multiple setbacks from both Beijing and Washington in recent months. Huang suggested on Tuesday that the squabbling within the Trump administration had been settled, commenting, "President Trump's intention is that the United States should have a leadership position in access to Nvidia's best technology. However, he would like us to compete worldwide and not concede those markets unnecessarily."

Reuters meanwhile reported that it's all systems go from the Chinese side as well. Sources familiar with the situation confirmed that Chinese authorities had granted approval for multiple companies to purchase H200s. Earlier reports suggested demand was staggering, with multiple Chinese firms placing orders for hundreds of thousands of chips. That demand could go towards explaining Huang's new forecast that Nvidia could see a trillion

dollars in sales by 2027. Lastly today, speaking of that big prediction from Jensen about revenue, Amazon CEO Andy Jassy also sees AI doubling revenue for AWS. According to Reuters sources, Jassy shared the lofty projection with staff at a recent all-hands. He said that over the long term,

AI could boost annual sales for AWS to $600 billion, double his prior estimate.

Jassy said, "I've been thinking for the last number of years that AWS, call it 10 years from now, could be $300 billion annual revenue run rate business."

"I think, with what's happening in AI, AWS has a chance to be at least double that."

AWS most recently booked $128 billion in sales for 2025, 19% growth from the prior year. And while the numbers he's throwing around seem big, the prediction might not be all that extravagant: it would represent 17% annual growth for the coming decade. Analyst Patrick Moorhead writes, "In my view, this is the clearest signal yet that hyperscale cloud is entering a second growth phase that dwarfs the first. Net: AI is repricing the entire cloud total addressable market." Brock, meanwhile, points out that if AI genuinely doubles AWS revenue to $600 billion by 2036,

then Amazon will emerge as one of the biggest beneficiaries of the entire AI ...

without even having to build the models themselves.

Interesting stuff going on, but that is going to do it for today's headlines.

Next up, the main episode. Agentic AI is powering a $3 trillion productivity revolution, and leaders are hitting a real decision point: do you build your own AI agents, buy off the shelf, or borrow by partnering to scale faster? KPMG's latest thought leadership paper, Agentic AI Untangled: Navigating the Build, Buy, or Borrow Decision, does a great job cutting through the noise with a practical framework to help you choose based on value, risk, and readiness, and how to scale agents with the right trust, governance, and orchestration foundation. Don't lock in the wrong model. You can download the paper right now at www.kpmg.us/navigate; again, that's www.kpmg.us/navigate. If you're looking to adopt an agentic SDLC,

Blitzy is the key to unlocking unmatched engineering velocity.

Blitzy's differentiation starts with infinite code context. Thousands of specialized agents ingest millions of lines of your code in a single pass, mapping every dependency. With a complete contextual understanding of your code base, enterprises leverage Blitzy at the beginning of every sprint to deliver over 80% of the work autonomously: enterprise-grade, end-to-end tested code that leverages your existing services, components, and standards. This is not an AI autocomplete. This is spec- and test-driven development at the speed of compute. Schedule a technical deep dive with our AI experts at blitzy.com, that's B-L-I-T-Z-Y dot com. There's a new standard that I think is going to matter a lot for the

enterprise AI agent space. It's called AIUC-1, and it bills itself as the world's first AI agent standard. It's designed to cover all the core enterprise risks, things like data and privacy, security, safety, reliability, accountability, and societal impact, all verified by a trusted third party. One of the reasons it's on my radar is that ElevenLabs, who you've heard me talk about before and is just an absolute juggernaut right now, just became the first voice agent to be certified against AIUC-1 and is launching a first-of-its-kind insurable AI agent. What that means in practice is real-time guardrails that block unsafe responses and protect against manipulation, plus a full safety stack. This is the kind of thing that unlocks enterprise adoption. When a company building on ElevenLabs can point to a third-party certification and say our agents are secure, safe, and verified, that changes the conversation. Go to AIUC.com to learn about the world's first standard for AI agents; that's AIUC.com.

This episode is brought to you by Mercury, radically different banking, now available for personal accounts. I already use Mercury for my business, so when they introduce personal accounts, it made immediate sense for me. I try to bring the same level of intention to my personal finances that I bring to building companies, and most traditional banks just do not feel designed for that. With Mercury personal, you can toggle between business and personal in a click. You can set up

sub-accounts for specific goals, automate transfers so projects and savings fund themselves, and put idle cash to work with high-yield savings, all without friction. It's built for people who care about how their money moves and want tools that actually keep up. Visit mercury.com/personal to learn more. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC.

Welcome back to the AI Daily Brief. Today, we are doing a bit more of a practical, hands-on style episode. It was inspired by this post from Tarek over at the Claude Code team at Anthropic, called Lessons from Building Claude Code: How We Use Skills. And the context for this is that if you take away one theme from pretty much all of 2026's episodes so far,

it's that we are moving into a much more agentic era of AI. Skills are a key component of how to get

value out of agents, and today we're going to first give a little bit of a background of what

skills are, talk about some of these lessons and best practices from the team at Claude Code, and then share a few more resources where you can take the conversation further. First of all, let's talk about what skills are. The official GitHub repo calls them a simple, open format for giving agents new capabilities and expertise. Skills are folders of instructions, scripts, and resources that agents can discover and use to perform better at specific tasks,

write once, use everywhere. The background is this: as AI coding agents were getting more and more capable throughout 2025, people started to hit a very similar wall, which was basically that system prompts kept ballooning. Every new capability meant more instructions, more examples, and more edge cases crammed into a single context window. Of course, the more you try to jam into a context window, the more performance degradation you're going to see. Having to juggle all of

that knowledge all at once was crowding out space for actual execution on the task at hand. That led to agents getting slower, more expensive, and less reliable. Now, the insight that ended up driving skills was that agents don't need access to all of their knowledge all the time. What they need is to be able to load the right knowledge at the right moment. On October 16th, Anthropic officially announced skills in a blog post. The post was called Equipping Agents

for the Real World with Agent Skills, and it framed the issue as this: real-world

work requires procedural knowledge and organizational context. They write, "As model capabilities improve, we can now build general-purpose agents that interact with full-fledged computing environments." Claude Code, for example, can accomplish complex tasks across domains using local code execution

and file systems. But as these agents become more powerful, we need more composable, scalable,

and portable ways to equip them with domain-specific expertise. This led us to create agent skills, organized folders of instructions, scripts, and resources that agents can discover and load dynamically to perform better at specific tasks. So what a skill actually is, is a directory anchored by a Markdown file. Every skill directory is going to have a SKILL.md file with some required metadata, like a name and a short description.
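As a concrete illustration, here is what a minimal skill's SKILL.md might look like. The skill name, description, and referenced files below are invented for the example, not taken from Anthropic's repo; only the overall shape (metadata frontmatter, then a short body that links out to other files in the folder) reflects the format described here.

```markdown
---
name: weekly-recap
description: Formats a weekly team recap from merged PRs and closed tickets. Use when asked for a weekly summary or recap post.
---

# Weekly recap

1. Collect merged PRs and closed tickets from the past 7 days.
2. Group them by project, then render using `templates/recap.md`.

For edge cases (empty weeks, reverted PRs), see `references/gotchas.md`.
```

The two frontmatter fields are what the agent keeps in context at all times; the body and the linked files are only read if the skill looks relevant.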

When agents have access to skills, rather than having to hold all of the context all at once, they simply load the name and the description. The idea of progressive disclosure in skills is to give the agent just the information that it needs in order to make good decisions without overloading its context. So basically, the first layer of detail is just the short description, which means that when the agent is doing a task, it has those descriptions in mind,

and can go call up that skill if it seems like it would be useful. The second level of detail in this progressive disclosure regime is the actual body of the SKILL.md file. If the agent thinks that skill is going to be useful, it'll move from just reading the description to reading the contents of that SKILL.md. Now, the SKILL.md body is still very small; while the metadata is tiny, at roughly 100 tokens per skill, even the full SKILL.md body is recommended to stay pretty small. This leads to the third level of detail in progressive

disclosure. Basically, as skills grow in complexity, they also might have context that's relevant

only in specific scenarios. And in fact, this is a really important part that gets missed.

In the article from Anthropic's Tarek that we're going to come back to, he writes, "A common misconception we hear about skills is that they are just Markdown files. But the most interesting part of skills is that they're not just text files. They're folders that can include scripts, assets, data, etc., that the agent can discover,

explore, and manipulate." Basically, you can bundle additional context in the form of other Markdown files, references, or scripts that get linked out from the SKILL.md file. The analogy they use is a well-organized manual that starts with a table of contents, then specific chapters, and finally a detailed appendix. Almost immediately, skills began being adopted outside of just the Anthropic ecosystem. OpenAI added skills support to ChatGPT, the GitHub Copilot family of coding agents adopted the standard,

and other ecosystems and harnesses have jumped onboard as well. The launch of OpenClaw really took the skills conversation to the next level. As people started building all of these different agents en masse, a lot of them had common skills needs: for example, understanding how to use specific tools, how to interact with certain types of file formats, like documents and PDFs, or how to take specific actions, like transcribing audio. A site called ClawHub quickly launched and now has something like 28,000 skills, and other people have their own collections focused on particular use cases or areas of interest. And yet, what Anthropic found when they actually sat back and looked was that, as many skills as were available, many if not most of them could fit into one of nine categories: library and API reference, product verification, data and analysis, business automation,

scaffolding and templates, code quality and review, CI/CD and deployment,

incident runbooks and infrastructure ops. So that's what led them to this post.

Let's talk first about some of the categories in this taxonomy, and then some of the more general best practices that Anthropic shared. I'm not going to go through all nine categories, but let's talk about a couple. One key category they found was data fetching and analysis: skills that, for example, give access to your data. These skills, they write, might include libraries to fetch your data with credentials, specific dashboard IDs, et cetera, as well as instructions on common workflows or ways to get data. Another category, which I can see being important to listeners of this show, is business process and team automation. In other words, skills that automate repetitive workflows into one command. They write that these skills are usually fairly simple instructions, but might have more complicated dependencies on other skills or MCPs. An example might be a weekly recap skill, where merged PRs plus closed tickets plus

deploys come together in a formatted recap post. Another category in their taxonomy of skills, which relates to some conversations we've been having recently, is code quality and review. Now, the conversation that we've been sharing here is about what happens when the sprawl of coding agents gets to the point that it just becomes impossible for humans to review all the code. There are some who argue that we're already far past that point, while others cling to the

idea that humans need to have the final look. My very strong instinct is that even if it would be better, if all code that was released as products and services actually had a human review, I don't

know that there's any chance that that paradigm gets out of 2026. I think we're going to have to

solve the problem of code review in new ways, which, to be clear, is a problem that I am not qualified to solve, but I just think that we're going to be producing such an incredibly high volume of code that at some point we'll give up the ghost on the idea of being able to review it all. That makes code quality and review skills seem all the more potentially important. These, Anthropic

describes as skills that enforce code quality inside of your org and help review code. Examples are adversarial review, which would spawn a fresh-eyes sub-agent to critique, implement fixes, and iterate until findings degrade into nitpicks, or a code style skill that enforces code styling, especially styles that Claude does not do well by default. Interestingly, and I think

related to that, Tarek argues that one of the highest-ROI categories is verification skills. They describe these as skills that explain how to test or verify that your code is working. Verification skills are extremely useful for ensuring Claude's output is correct. It can be worth having an engineer spend a week just making your verification skills excellent. Consider techniques like having Claude record a video of its outputs so you can see exactly what it tested, or enforcing

programmatic assertions on state at each step. So there are more categories in the taxonomy, but that gives you a feel for what Anthropic is seeing in terms of their most valuable skills. Now, admittedly, this is from the Claude Code team, so it's going to index highly technical; whereas if you had an agent builder who was mostly focused on business processes, you'd probably see more gradations of this category for business process and team automation.
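To make the "programmatic assertions on state at each step" idea concrete, here is a minimal sketch of the kind of check a harness might run after an agent step, asserting facts about the workspace rather than trusting the agent's own summary. Everything here (the file name, the size threshold, the function name) is invented for illustration, not taken from Anthropic's tooling.

```python
# Hypothetical sketch of "programmatic assertions on state at each step":
# after an agent step, verify the claimed output actually exists on disk.
from pathlib import Path


def assert_state(workspace: str) -> list[str]:
    """Return failed checks for one step; an empty list means the state looks right."""
    ws = Path(workspace)
    failures = []
    report = ws / "report.md"
    if not report.exists():
        # The agent claimed to write a report; verify it was actually created.
        failures.append("report.md was not created")
    elif report.stat().st_size < 100:
        # Guard against empty-placeholder output.
        failures.append("report.md looks like an empty placeholder")
    return failures
```

A harness would run this after the relevant step and feed any failure strings back to the agent as corrective context, rather than letting it proceed on an unverified claim.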

Maybe even more useful than the taxonomy are Claude Code's tips for actually making skills. One thing that a number of folks missed is that Anthropic actually just updated their Skill Creator tool. Skill Creator, they write, helps you write evals, run benchmarks, and keep your skills working as models evolve, and it was meant to answer a specific challenge. Since launching agent skills last October, they wrote, we've noticed that most authors are subject-matter experts, not engineers. They know their workflows but don't have the tools to tell whether a skill still works with a new model, triggers when it should, or if it actually improved after an edit.

Ultimately, they write, the goal is bringing some of the rigor of software development, like testing, benchmarking, and iterative improvement, to skill authoring without requiring anyone to write code. Solopreneur and educator Ole Lehmann actually called this out as a fairly big deal. He wrote: Anthropic shipped three upgrades to skills that fix problems almost everyone runs into. Problem one: you had no way to measure how well your skills were actually performing. Now you can run evals that test your skill against multiple prompts and get a score. Problem two: your skills break when models update and you don't notice. With the new Skill Creator, you can run an A/B test comparing your skill and raw Claude. Problem three, he writes: Claude doesn't even use your skill half the time because the description is too vague or too specific. Now the Skill Creator rewrites your descriptions automatically so they trigger at the right time.
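The triggering evals being described boil down to a simple loop: run a set of labeled prompts, check whether the skill fired when it should have, and report a score. Here is a self-contained sketch of that loop. The real tooling asks a model to make the "does this skill apply?" judgment; a trivial keyword matcher stands in for the model here, and the description and prompts are invented for the example.

```python
# Hypothetical sketch of a skill-triggering eval: score how often a skill's
# description causes it to fire on prompts where it should, and stay quiet
# where it shouldn't. A keyword matcher stands in for the model's judgment.

def skill_triggers(description: str, prompt: str) -> bool:
    """Stand-in for the model's decision: does this skill apply to the prompt?"""
    keywords = {w.lower().strip(".,") for w in description.split()}
    return any(w.lower().strip(".,") in keywords for w in prompt.split() if len(w) > 3)


def trigger_accuracy(description: str, labeled_prompts) -> float:
    """labeled_prompts is a list of (prompt, should_trigger) pairs."""
    hits = sum(skill_triggers(description, p) == want for p, want in labeled_prompts)
    return hits / len(labeled_prompts)


# Usage: a hypothetical weekly-recap skill and two labeled prompts.
description = "Formats a weekly recap post from merged PRs and closed tickets."
cases = [
    ("draft the weekly recap from our merged PRs", True),
    ("what is the capital of France", False),
]
score = trigger_accuracy(description, cases)
```

The point of the sketch is the harness shape, not the matcher: swap the keyword check for a model call and you have the basic A/B structure being described, where a description edit is kept only if the score improves.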

Anthropic points out that they ran this on their own skills and saw better triggering five out of six times. Now, one other note from the Skill Creator that I thought was valuable is the framework for organizing skills into two categories. The first category is capability uplift: skills that help Claude do something the base model either can't do or can't

do consistently, i.e., certain types of document creation. The second category of skills is called encoded preferences: skills that document workflows where Claude can already do each piece, but the skill sequences them according to your team's processes. The distinction matters, they say, because these two types of skills may need testing for different reasons. Capability uplift skills may become less necessary as models improve, while encoded preference skills are more durable, but only as valuable as their

fidelity to your actual workflow. So, back to Tarek's post. Here are some of their top tips for making skills better. The first is: don't state the obvious. They write, "If you're publishing a skill that is primarily about knowledge, try to focus on information that pushes Claude out of its normal way of thinking." The front-end design skill is a great example. It was built by one of the engineers at Anthropic by iterating with customers on improving Claude's design taste, avoiding classic patterns like the Inter font and purple gradients. The second tip is to build a

gotchas section. In fact, Tarek argues that the highest-signal content in any skill is the gotchas section. These sections articulate common failure points that Claude runs into when using your skill, and ideally, he says, you update your skill over time to capture these gotchas. A third tip goes back to that idea that people still think of skills as just a single Markdown file rather than an entire folder; Tarek says you should think of the entire file system as a form of context engineering. They also suggest you should avoid railroading Claude: give Claude the information it needs, but give it the flexibility to adapt to the situation. As Tarek put it in the conclusion, this should be thought of more as a grab bag of useful tips than as some sort of definitive guide. That makes sense, because right now everyone is just racing to figure out how to actually engage with the new capabilities of agents.

And so every bit of advice at this point is going to be at least a little bit of a work in progress. Now, one of the interesting things, then, is how all of these work-in-progress lessons apply to different categories of users. The most obvious is probably the advanced agent builders who are building and maintaining complex multi-agent teams. For them, obviously, skills are essentially a modular architecture for agent capabilities. And frankly, this is kind of the audience that Tarek most wrote this post for. A level down from that are the individual power users, and my guess is that a lot of you fall into this category. This is not a person who's building complex agent teams and orchestration models; instead, they are using one or a small number of agents to get their own work done faster or better, or to do things that weren't possible before. For that type of user, skills are basically reusable prompts with superpowers. The difference

between a skill and a saved prompt is that a skill can include actual code, templates,

reference data, and examples, not just instructions. The idea is you figure out how to get the agent to do something well once, and then you package it so it works reliably every time.

The stand-up post example from Tarek's post is perfect for this tier. This is an automation of a daily task you do, and the type of thing that you want to happen consistently, over and over again. This also helps demonstrate why that gotchas section can be really valuable: every time the agent makes a mistake, you add it to the skill so it doesn't happen again, and the skill becomes a living document that gets smarter over time. This also helps you avoid being locked into one specific ecosystem: because skills are supported by Codex, Claude Code, Cursor, etc., you're not locked into

any one tool's prompting format. But what about the mainstream user? The person who isn't even yet fully in Claude Code or Codex; people who are using off-the-shelf tools or experimenting with Perplexity Computer or Notion custom agents. What's interesting here is that the design pattern holds, and you can see, even in the simpler prosumer and consumer tools, the idea of skills as reusable capabilities infiltrating into the mainstream. In fact, earlier this week, Notion announced custom skills for Notion AI. In their announcement tweet, they write:

Write a prompt and you'll use it once; write a skill and you'll use it forever. And this is the mental-model shift, even if you are not an agent builder with Claude Code. The shift is from thinking about ad hoc prompting to reusable capabilities. For a lot of folks out there, you're not ultimately going to have to care about the full architecture of SKILL.md files and progressive disclosure and all these things; those folks just know they can teach the AI to

do a specific thing their way, give it a name, and invoke it whenever they want. For some, it'll almost be an update to custom GPTs, which for many promised essentially this, even though they never fully took off. Now you can see how Notion has simplified skills into their own ecosystem: basically, you can take any page in Notion, click the menu, and turn that page into a skill. And the point is that this concept of skills as reusable capabilities is converging across the entire AI stack, from consumer uses up to much more advanced uses, all at once. The underlying idea is that AI is less and less a one-off conversation and more and more

a library of reliable repeatable capabilities. Skills I think are a useful framework for that

no matter what level you're engaging with it on. And hopefully this episode has given you a little bit of a better starting point. We might go deeper in a future operator episode, but for now, that is going to do it for today's AI Daily Brief. I appreciate you listening or watching.

As always, and until next time, peace!

[Music]
