The AI Daily Brief: Artificial Intelligence News and Analysis

Introducing Maturity Maps — A New Way to Measure AI Adoption


Today we're introducing Maturity Maps, a new framework for benchmarking AI and agent adoption across six key dimensions - from deployment depth to systems integration to people and governance. Thi...

Transcript


Today, I am discussing a new way to think about AI and agent readiness inside ...

and it is called maturity maps.

The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.

Alright friends, quick announcements before we dive in.

First of all, thank you to today's sponsors, KPMG, Robots and Pencils, and Blitzy.

To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe directly on Apple Podcasts. To learn more about sponsoring the show, or really find out anything else about the show, like where we are with our agent madness bracket, which is going on right now, head on over to aidailybrief.ai.

Today is the second day in our build week, which is happening while I'm traveling with my family. Yesterday we did a very high level overview around everything that had happened last quarter and what it meant for this quarter, and if we put this in the framework of this being build week, yesterday was sort of the context setting. The environment in which your building is happening.

Today's episode takes that down a level to discuss the benchmarks that show where others like you are. Now, one of the things that I've been thinking about a lot over the last six months or so is just how much we need a totally different set of data and benchmarks for this new AI era. Everyone is adapting incredibly quickly right now, or at least they're trying to. It's new processes, new workflows, new tooling, new everything.

And by and large, we're doing all that exploration without a map.

Let me give you a practical example of where I think our lack of benchmarks could actually

very significantly and meaningfully negatively impact a company when it comes to their AI adoption. Let's say that you are an early adopter company. Across your different functions, you've had really strong hands-on efforts to get your AI up and running. In the absence of knowing exactly what to measure, you're just trying to measure whatever you can, and early results are pretty positive.

For example, in the marketing function, you have increased your content output 30% year-over-year, without any sort of proportional increase in the resources it takes to produce that content. Now that 30% year-over-year growth sounds great, but what if I told you that all of your competitors had actually grown their content output by 50%? In AI world, this is actually not a far-fetched scenario,

and it shows how the need for better benchmarks and numbers is not just a vanity exercise. When we don't know how we're doing relative to peers and competitors, it makes it really hard for us to judge what we need to change, what we need to shift, and what we need to do next. Now at AIDB and at Superintelligent, which is my enterprise AI planning and strategy company, we started exploring some of this with our AI ROI benchmarking survey at the end of last year.

We had people submit their real use cases and share with us the impact those use cases were having across an array of eight different impact dimensions, things like time savings, cost savings, new capabilities, increased output, and a handful of others. We asked them to rate impact from negative to transformational and found that by and large, at least when it comes to people self-reporting, they were already seeing strong and positive impact

from their AI initiatives. But there are a couple of clear and obvious limitations with that study.

First of all, while self-reporting is better than nothing, it's always going to be somewhat

imprecise. Second, like pretty much everything that we do with this audience, you have to

calibrate it to a more advanced individual and organizational user than if you just surveyed a broad cross-section of businesses in the world. Third, while it did give us some great information around individual use case impact, it didn't tell us all that much about other dimensions of AI readiness and adoption outside of just the use cases themselves. Anyone who's felt the sting of the capability overhang, in other words the gap between what AI can do and what we're actually

using it for, knows that raw capability isn't really the question. It's the systems we put around it to get value from it. And unfortunately the research and information apparatus just has not adapted to this new reality. Not to pick on Gartner specifically, but they're the biggest in the space and so present a sort of easy target. Tried-and-true benchmarks and information products like

the Gartner Magic Quadrant have literally never been less useful than they are right now.

The idea that success in something like AI application development was going to be even a little bit dictated by choosing the right AI application development platform vendor is just so far outside of the reality of these tools as to be almost actively harmful if that's where you're putting your time and effort when it comes to trying to figure out how to adopt AI. Now, Gartner is more than the Magic Quadrant and they are doing lots to try to catch up to the AI

world, so it's not to single them out. It's more to make the point that we are in desperate need of some new frameworks, some new benchmarks, and some new tools. And so at both AIDB and Superintelligent we've been thinking about this a lot over the last, call it, three to four months. We've experimented with a couple of things that you'll probably see some version of come out at some point in the near future. One of them I call AI opportunity radars,

which are basically a way of organizing use cases by function but then also by applicability depending on where your organization is in its development cycle. Simply put, it's a

radar or bullseye type of visual where use cases are organized into one of three categories: prime time, emerging, and frontier.

Prime time means that most organizations, as they are, are well suited to get value from

that use case right now. Emerging means that while there are a lot of organizations that can

get value out of them, there is some amount of setup cost, right set of circumstances, or infrastructure that is going to be needed to get value from that, and not all organizations are

going to be there just yet. Finally, frontier is exactly what it sounds like, where if your organization

is well set up with the right infrastructure, you can be getting a lot of value from those use cases, but at this point most organizations aren't there yet. So over the last quarter we built an agent system that is basically constantly seeking out every new resource that it can get its hands on, assessing what those resources tell us about the use cases in different functions, and keeping these radars continuously updated both with new use cases as well as changes in where the

existing use cases are placed. But as we were working on radars, it was clear again that there was something even more fundamental and that overly or only focusing on use cases was leaving out so much of what actual AI readiness means. When we're doing AI readiness and planning assessments at super intelligent, we're not just thinking about what use cases a company should do, but what's the full set of change management and infrastructure development and new policy

and investment in people and all this other stuff that needs to go around it to actually get value from those use cases. And that led to the development of the framework which I'm going to be sharing today, which we call, for simplicity, AI maturity maps. Now the concept of maturity is certainly not some proprietary thing that we invented. Maturity is just a heuristic and a framework

to look at where different organizations are around some key areas relative to one another and where

they should be. So the way the maturity maps work is that they organize AI and agent maturity

into six different categories. Those categories are first deployment depth which is sort of an

expanded notion of use cases. Deployment depth in the context of AI maturity not only thinks about how many use cases you have in play, but how much those use cases are assistance versus full workflow automations versus actual applied agent systems that are doing work with some meaningful degree of autonomy. The second category is systems integration. This is a measure of how deeply the AI solutions and workflows that you're deploying are integrated with the existing

systems that run your enterprise. Is everyone using ChatGPT independently, or does your CRM system have an agent running through it, automatically extracting insights, making recommendations, and even setting up new outreach campaigns? Systems integration is in some ways one part of the measure of how good the context is that an enterprise's AI has to work with. Now the other piece that relates

to context is of course data. How much, what quality, and how well managed is your company's AI's

access to your company's data? Does it require people dropping in PDFs? Do you have company knowledge all set up on MCP servers? How does the AI that your company is looking to use to transform your company have access to the information it needs to know what that transformation should look like? Outcomes is almost a measure of measurement. Are all of your deployments piloted experiments? Or do you have a track record of actual demonstrable and measured outcomes?

Outcomes in some ways are the information you need to know what you should do next across all

these other dimensions. The fifth dimension of AI maturity maps is people. And this is an admittedly broad category. A big part of this refers to upskilling and capabilities, but another piece has to do with attitudes, given that one of the major barriers to adoption in many companies is not just going to be skills using AI but attitudes towards AI. People is an extremely important and, unfortunately, as we'll see, often neglected piece of the AI maturity pie. Lastly, of course,

is governance. How clear, how established, how communicable, how known are the rules and guidelines and access provisioning around your AI systems? Do people know where to go to get the permissions they need? Do they know what the expectations are when issues come up, and are there mechanisms for resolving those issues? So those are the six areas across which we look at AI maturity. Now for the purposes of developing these maps, we've started with 10 functional maps,

split across some of the most common very broad brush categories of knowledge work. That includes customer service, engineering, IT, which by the way the difference between those two for our purposes is effectively that engineering is all the stuff that's external facing and IT is all the technology stuff that's internal facing, sales, marketing, HR, operations, finance, legal, and product. So at the end of last year, we started to put together a process for actually

assessing and visualizing AI maturity across all these dimensions. What came out of that is the chart that you see here, which plots each of these six categories within a specific function on a five point scale. Number three, the center of the chart, is the on track line. In other words, where an average organization should be, and the word should, as you'll see, is doing a lot of heavy lifting there. Now if on track is a three, four is ahead and five is significantly ahead,

while two is behind, and one is significantly behind. The idea is that when you look at a maturity

map, without having to read a lot of words, you can instantly see the gaps between where the average

organization should be and where the average organization actually is, and when you compare your

organization to it, also see where you are relative to both the general on track line and the average.
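Since this is an audio show, a tiny code sketch may help make the scale concrete for anyone building something similar. The six dimension names and the 1-to-5 scale below come straight from the framework as described in this episode; the example scores, the function name, and everything else are purely hypothetical illustrations, not part of the actual maturity maps system.

```python
# Minimal sketch: label each dimension's 1-5 score and compute its gap
# versus the on-track line (3), as described in the episode.

SCALE = {
    1: "significantly behind",
    2: "behind",
    3: "on track",
    4: "ahead",
    5: "significantly ahead",
}

ON_TRACK = 3  # the center line of the chart

DIMENSIONS = [
    "deployment depth",
    "systems integration",
    "data",
    "outcomes",
    "people",
    "governance",
]

def describe(scores):
    """Map each dimension to (label, gap vs. the on-track line)."""
    return {dim: (SCALE[scores[dim]], scores[dim] - ON_TRACK)
            for dim in DIMENSIONS}

# Hypothetical example: a function that is on track for deployment depth
# but significantly behind on data and people.
example = describe({
    "deployment depth": 3,
    "systems integration": 2,
    "data": 1,
    "outcomes": 2,
    "people": 1,
    "governance": 2,
})
```

Negative gaps here are exactly the "capability overhang" visualization discussed below: anything under zero sits inside the on-track line on the radar chart.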

So clarifying this a little bit more: this designation of on track is not where the average organization is. It is a subjective measure of where we think the average organization should be. As you'll see when you dig into this quarter's numbers, in the vast majority of cases, we believe that the average organization is behind that on track line across pretty much all of these dimensions. To use a term that comes up a lot on this show, the fact that organizations tend to be

behind this on track line is effectively a visualization of the capability overhang. Now at this point, you might be wondering, what gives you authority to determine what the on track line is? It's a totally reasonable question, and believe it or not, it is a little bit more, at least, than just my opinion. We have a few different places to pull from. The first is the sort of proprietary research and surveying that we do as part of AIDB Intel, which gives us a pretty good

insight into where particularly leading organizations are. Second, there's Superintelligent,

given that we are doing thousands and thousands of voice agent interviews every month to help organizations assess their AI maturity and plan their AI strategy, that's another pretty unique source of frontline data. And then combined with that, we built a system to go out and effectively aggregate pretty much every new survey or study that comes out that even vaguely touches AI. You might have heard me mention before that my most useful OpenClaws are my research

OpenClaws, and this is one of the main things that they do. They are in a never-ending 24-hour

a day, constantly hunting loop to both surface new sources, to assess those sources in terms of their legitimacy, credibility, and bias, and then to integrate that information into our larger assessment system. There are more than 480 studies and surveys from the last quarter that went into these Q2 maturity maps. Among the sources that have explicit sample sizes, the combined survey respondent base exceeds 150,000 professionals across more than 50 countries.

The types of source categories that we have are: one, big four and top-tier consulting firm research, there's over 20 of those sources in that mix; major platform earnings and public market statements; analyst firm predictions and research from companies like Gartner, Forrester, and IDC; function-specific regular annual surveys, such as Stack Overflow's engineering study, or other similar things for areas like marketing, legal, and IT; academic and government research;

behavioral data sources, where companies that have access to some unique user behavior data aggregate, analyze, and share that. A good example of that is Jellyfish's AI coding benchmark,

which used behavioral data for more than 200,000 engineers across 700 companies with 20 million

PRs. Finally, there are of course practitioner reports and vendor case studies, although the system is careful to rate them with some amount of skepticism given that they are, of course, selling something. All right, folks, quick pause. Here's the uncomfortable truth: if your enterprise AI strategy is we bought some tools, you don't actually have a strategy. KPMG took the harder route and became their own client zero. They embedded AI and agents across the enterprise, how work gets done, how teams

collaborate, how decisions move, not as a tech initiative but as a total operating model shift. And here's the real unlock. That shift raised the ceiling on what people could do, humans stayed firmly at the center while AI reduced friction, surfaced insight, and accelerated

momentum. The outcome was a more capable, more empowered workforce. If you want to understand what

that actually looks like in the real world, go to www.kpmg.us/AI. That's www.kpmg.us/AI. Today's episode is brought to you by Robots and Pencils, a company that is growing fast. Their work as a high-growth AWS and Databricks partner means that they're looking for elite talent ready to create real impact at velocity. Their teams are made up of AI-native engineers, strategists, and designers who love solving hard problems and pushing how AI shows up in real products.

They move quickly using RoboWorks, their agentic acceleration platform, so teams can deliver meaningful outcomes in weeks, not months. They don't build big teams. They build high-impact nimble ones. The people there are wicked smart with patents, published research, and work that's helped shape entire categories. They work in velocity pods and studios that stay focused and move with intent. If you're ready for career-defining work with peers who challenge you and have your back,

Robots and Pencils is the place. Explore open roles at robotsandpencils.com/careers, that's robotsandpencils.com/careers. You've tried in-IDE copilots. They're fast, but they only see local silos of your code. Leverage these tools across a large enterprise codebase and they quickly become less effective. The fundamental constraint? Context. Blitzy solves this with infinite code context, understanding your codebase down to line-level dependencies across

millions of lines of code. While copilots help developers write code faster, Blitzy orchestrates

thousands of agents that reason across your full codebase,

delivering over 80% of every sprint autonomously with rigorously validated code. Blitzy provides

a granular list of the remaining work for humans to complete with their copilots.

Tackle feature additions, large-scale refactors, legacy modernization, greenfield initiatives, all 5x faster. See the Blitzy difference at Blitzy.com. That's BLITZY.com.

So in Q2, what are some of the patterns that we saw? The first you might call the Adoption

Embedding Gap. Basically, every single function-specific survey reports the same pattern: high claimed adoption, but fairly low depth in utilization. This is maybe the most dominant finding across all these sources, that the story of Q2 when it comes to enterprise AI adoption is not just the capability overhang in general, but the applied capability overhang even when it comes to adoption inside an individual organization. A second, very common finding across a huge

array of these sources. There tends to be a fairly big gap between worker level data and leader level data. For example, one study found that in the area of customer service, 72% of leaders said that their AI training was adequate with 55% of their employees disagreeing. In HR, a huge percentage of leaders report that AI is a priority, but more than 2/3 of HR staff say that their organizations are not proactive in upskilling. In fact, one can argue that people are the bottleneck

that is not getting nearly enough investment. We gave 7 of the 10 functions a score of 1, significantly behind, when it came to that people category. The irony is that one could argue that the single largest barrier to converting AI adoption into AI value is on the human side, and it's the thing organizations are spending the least on. To use one dramatic example, Deloitte found 93% of AI spend going to infrastructure with only 7% going to anything related to people.

Now, outside of people, another finding across all of these sources is that data is kind of the ceiling on everything else. Eight of the 10 functions score a 1 or a 1.5 on data. Now, I don't need to beat this drum anymore for this audience, but obviously without proprietary context feeding AI, things like your code base, your customer history, your deal data, you really are not going to get past basic assisted usage no matter how good the assisted tools get. One could argue that

data is not one pillar among six, but the floor constraint that caps all the others. Another area of universal challenge is around outcome measurement, and this is not surprising. Given how much pressure there has been to adopt AI as fast as possible, one of the consequences is that no one slowed down or paused their adoption while they went out and figured out how to actually measure the ROI of all those investments. Can you imagine right now someone in the

C suite suggesting with a straight face that you take six months off of adoption to figure out

better ways to measure the ROI first to ensure that you weren't spending too much? That my friend

is a recipe for an early retirement. The byproduct of that, however, is that the actual evidence for AI ROI is pretty thin. Now, I will say that if I had to make a prediction about one area where

you're going to see the biggest glow up this year, I think there are tons and tons of efforts

around ROI measurement, and I would expect that to jump significantly in the quarters to come. Across all of these functions, the vast majority do not have any scores that are actually on track. Customer service we rated on track in terms of deployment depth and systems. It makes sense given how much focus there has been on customer service related solutions, vertical, and otherwise. Engineering as well, we have as on track for deployment depth, systems, and people. Again,

given that software engineering organizations are effectively the harbingers and guinea pigs for everything else that's going to happen to all the other knowledge workers, I think that this will be intuitive. And then relatedly, in IT, we also have deployment depth, systems, and people as on track. Now, of course, all of these areas share structural advantages: mature tooling, technical practitioners, and measurable workflows. Unfortunately, though, even if it is the case

that there are bigger challenges in some ways when it comes to the technical capabilities of practitioners and the measurability of what you do, those other areas of the enterprise are going to have to catch up. Now, a few observations looking across the different functions,

while customer service did have a couple areas that we rated as on track, it also, I think,

reveals something that could be a harbinger for other areas. Remember, when it came to CS, we heard 72% of leaders say that training is adequate, but 55% of people actually working in CS say it's not. 87% of customer service workers report high stress, and 75% of leaders acknowledge that AI may be increasing stress. So, you've got a situation where AI is absorbing routine cases, humans get the harder, more emotional ones. Many people might not be trained for that shift, and that,

plus the fact of just increased questions about the long-term sustainability of your job,

the result is stress, anxiety, and burnout. Basically, CS could be the canary in the coal mine

for what happens when you deploy AI without investing simultaneously in the humans who work alongside it. One of the areas that I think is interesting to point out is IT's two rating, or behind, when it comes to governance. In most organizations, IT owns AI governance for a big part of,

if not the entire, organization, and yet only 54% have centralized frameworks that are

monitored. 88% have had security incidents, and the question becomes: if the governance function

is behind on governance, what does that tell us about the rest of the organization? One of the most interesting findings when it comes to sales is that it might be the cleanest example of the adoption mirage. 88% of sales teams say they use AI, but only 24% have it in their actual revenue workflows. A fair bit of adoption theater, in other words. This is why we rate them behind in deployment depth. Much of the quote-unquote adoption is reps using ChatGPT in a separate

browser tab for email drafts and call prep, which is not a bad thing at all. It's just not the

level of automation and autonomy that I think sales organizations are hoping for. The autonomous

SDR dream has not fully come to fruition yet, and I don't think that most sales organizations have figured out the right integration and balance between humans and agents in the new sales working system. The deployment depth score on operations I think is another really interesting one. In some ways, operations has had automatable functions longer than any other function in the enterprise. Think statistical forecasting, rules-based inventory management, predictive maintenance.

That's all stuff that predates this latest wave of Gen AI by a decade. What that means is that when 90% of operations teams say they're investing in AI, it sounds impressive. When you actually look at what that is, a lot of it is legacy optimization infrastructure that's been running since around 2015. The Gen AI layer on top is often very thin, mostly asking AI questions about operational data and generating reports. In fact, one study found that only 23% of operations groups

even have a formal AI strategy. Operations is the function where the distinction between

old automation and new AI maturity is showing up as a real distinct challenge.

Lastly, one more interesting stat: outside of the technical areas of engineering and IT, and the long-duration area of customer service, which has had so much emphasis on automation, finance is the only other function to hit on track on any pillar, and it does so on governance. Now, why do you think this might be? It's because, of course, finance is operating in an area where governance is not optional. 69% of CFOs report advanced or established AI risk governance frameworks.

Why? SOX compliance, audit trails, fiduciary duty, decades of regulatory muscle memory,

basically finance already knew how to govern risky tools even before AI existed.

Now, look at the rest of finance, where we rated it significantly behind in every other category. Basically, they know how to control AI but haven't figured out how to use it. What will be interesting is whether over the next few quarters we see this turn into a tortoise and hare thing. In other words, when finance does figure out how to deploy, will they do it more safely and more effectively than functions that deployed first and governed later,

and will that actually allow them to catapult at some point and jump out ahead of other functions, who at the moment feel like they are farther along when it comes to deployment depth? So this is the idea of maturity maps. They are, of course, very nascent, as this is the first quarter we've actually fully tried them out, and in the spirit of getting feedback and seeing how useful these things can be, we're putting up the ability on the Superintelligent website,

besuper.ai, to not only review all of these maturity maps, but also do a short quiz that actually shows you where your organization is relative to both the on track line and what we think is the average. And I do want to emphasize that this is not an assessment, this is not an audit, this is a quiz. It is an online quiz of 18 questions that is going to give you a very general idea of where you stand. We of course have ways to go deeper with your organization and get

much better data to actually inform these things, but we want as many people as possible to have access to this to actually help us figure out where these lines should be and how we should evolve the entire system. Obviously, I'll have links to all of this in the show notes, but again, it's going to be at besuper.ai/quiz. In terms of where we want to take this next, in addition to just continuously having better and more sources of data, probably the most glaring thing that stands

out to me is that we're trying to argue for one on track line and one average line across all different organization types. In other words, it isn't really even remotely reasonable to judge a 10 person startup by the same on track lines as a 10,000 person enterprise. I obviously believe that there's enough value that it's worth putting it out while acknowledging that where I'd like to go next with this is vastly more gradations in both the on track and the average lines, organized by

things like organization size. Industry is another obvious area where you might see some fairly

significant differences and while I think it is the right call to start with some very broad

brush high level functions, obviously most organizations are a lot more nuanced than just having

these 10 clear departments. Still, ultimately, it is my argument that right now we are in a moment

where we need all the data that we can get and the more we can all pile in on top of each other and actually share that information, create new benchmarks, and help each other know what on track is and then how to get there, the better off we're all going to be. So that is maturity maps. I'm excited to share them with you. I'm excited to see what you think. Again, you can go to besuper.ai/quiz, which of course will be on all my websites and on the show notes and check us out

for yourself.

There will no doubt be a fair bit of argument about what it means to be behind, significantly behind, on track, ahead, or significantly ahead. Anyways friends, that is going to do it for day two of Build Week. Appreciate you listening or watching, as always, and until next time, peace!
