Deep Questions with Cal Newport

AI Reality Check: Are LLMs a Dead End?

4d ago · 30:38 · 5,720 words

Cal Newport takes a critical look at recent AI news. Video from today’s episode: youtube.com/calnewportmedia SUB QUESTION #1: What is Yann LeCun Up To? [2:55] SUB QUESTION #2: How is it possible that L...

Transcript


We've been told time and again that the massive large language models trained...

We've been told that huge percentages of existing jobs are soon to be automated.

We've been told that skills like writing and photography and filmmaking are all about to be outsourced.

And we've been told that if we're not careful, systems built on these models might someday soon become sentient and even threaten the existence of the human race. But here's the thing. One of the AI pioneers who helped usher in this current age is not convinced. His name is Yann LeCun. And he has long argued that not only will LLM-based AI fail to deliver all of these disruptions, but that it is, and I'm quoting him here, a technological dead end.

People have started to listen. Earlier this month, a syndicate of investors, including Jeff Bezos and Mark Cuban, along with a bunch of different VC firms,

raised over a billion dollars to fund LeCun's new startup, Advanced Machine Intelligence Labs,

which seeks to build an alternative path to true AI, one that avoids LLMs altogether. After all of the hype and stress and hand-wringing around LLM-based tools like ChatGPT and Claude Code,

is it possible that Yann LeCun was right, that those specific types of tools won't change everything?

And if so, what's going to come next? If you've been following AI news recently, you've probably been asking these questions. And today, we're going to seek some measured answers. I'm Cal Newport, and this is the AI Reality Check. Okay, so here's the plan.

I've broken down this discussion into three sub-questions. Sub-question number one: what exactly is Yann LeCun up to, and how does this differ from what the existing major AI companies are doing? Sub-question number two: how is it possible that he could be right about LLMs running out of steam, if everything we've been hearing recently from tech CEOs and the news media is about how fast LLMs are advancing and how this technology is about to change everything?

And number three: if LeCun is right, what should we expect to happen in the next few years?

And what should we expect to happen on the maybe-decade time span? All right, so that's our game plan here. It's going to get a little technical. I'm going to put on my computer science hat. But I'll try to keep things simple, which really is the worst of both worlds, because it means that the technical people will say I'm oversimplifying and the non-technical people will say I still don't make sense.

I'm going to do my best here to walk this high-wire act. Let's get started with our first sub-question: what is Yann LeCun up to? All right, well, let's just start with the basics. I want to read a couple quotes here from a recent article that Cade Metz wrote for the New York Times,

discussing what just happened with LeCun's new company. All right, so I'm quoting here.

LeCun's startup, Advanced Machine Intelligence Labs, or AMI Labs, has raised over $1 billion in seed funding from investors in the United States, Europe, and Asia.

Although AMI Labs is only a month old and employs only 12 people, this funding round values the company at $3.5 billion. Dr. LeCun, who is 65, was one of the three pioneering researchers who received a Turing Award, often called the Nobel Prize of computing, for their work on the technology that is now the foundation of modern AI. Dr. LeCun has long argued that LLMs are not a path to truly intelligent machines.

The problem with LLMs, he said, is that they do not plan ahead. Trained solely on digital data, they do not have a way of understanding the complexities of the real world. Quote, "If you try to take robots into open environments, into households or into the street, they will not be useful with current technology," end quote. Dr. LeCun, who is the CEO of AMI Labs, told the Times, "We want to help them react to new situations with more common sense." All right, so that's kind of a high-level summary of what's going on.

Let's get into the weeds here to really get into the technical details of what LeCun is saying, and how his vision differs from what the major existing frontier AI companies are actually doing. All right, let's start with a basic idea here. If you're an AI company, you're trying to build artificial-intelligence-based systems that help people do useful things. This could be answering questions with a chatbot, or having a system help you produce computer code, if we're talking about coding agents. At the core of all of these products needs to be some sort of what we can call a digital brain:

something that encapsulates the core of the artificial intelligence that your tool or system is leveraging. And the major AI companies, like OpenAI and Anthropic, have a different strategy for creating those underlying digital brains than Yann LeCun's new company has.

All right, so what are the existing AI companies doing?

They're all in on the idea that the digital brain behind these AI products should be a large language model.

Now, we've talked about this before, you've heard this before, so I'll go quick, but it's worth reiterating.

A large language model is an AI system that takes as input text and outputs a prediction of what word, or part of a word, should follow. So if we want to be sort of anthropomorphic here, what it's trying to do is this: it assumes the text it has as input is part of a real, preexisting text, and it's trying to correctly guess what word followed that text in the actual preexisting document. That's really what a language model does. So you call it a bunch of times: you give it input, you get a word or part of a word as output, you then append that to your input, and now put the slightly longer input into the language model.

You get another word or part of a word, and if you add that to the input and put that through the model, you slowly expand the input into a longer answer.

This is called autoregressive text generation: you keep taking the output and putting it back into the input until the model finally says, "I'm done," and then you have your response.
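To make that loop concrete, here is a minimal Python sketch of autoregressive generation. The `predict_next_token` function is an invented stand-in for a real model call (a real LLM scores its entire vocabulary and samples a token); only the loop structure mirrors what's described above.

```python
def predict_next_token(tokens):
    # A real LLM scores its whole vocabulary and picks a likely token;
    # this toy stand-in just replays a canned continuation.
    canned = ["on", "the", "mat", "<end>"]
    i = len(tokens) - 3  # position past the 3-token prompt
    return canned[i] if 0 <= i < len(canned) else "<end>"

def generate(prompt_tokens, max_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = predict_next_token(tokens)  # one model call
        if next_token == "<end>":                # model says "I'm done"
            break
        tokens.append(next_token)                # output fed back as input
    return tokens

print(generate(["the", "cat", "sat"]))
# -> ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

The key design point is in the loop: each new token is appended to the input before the next call, which is exactly why long responses cost many model calls.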

So we can think about it then, if we zoom out a little bit: the large language model takes text as input and then expands whatever story you told it, trying to finish it in a way that it feels is reasonable. Under the hood, they look something like this. Jesse, let's bring this up on the screen here. This is a typical architecture for a large language model. You have input; here it says "the cat sat." That gets broken into tokens, and those get embedded into some sort of mathematical, semantics-based space (don't worry too much about that). They then go through a bunch of transformer layers.

Each layer has two sub-layers, an attention sub-layer and a feed-forward neural network, and out of the end of those layers comes some information that goes into an output head that selects what word or part of a word to output next. So that linear structure is the architecture of a large language model. Now, the way you train a large language model is you give it lots of real existing text.
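For readers who want to see the shape of that stack, here is a hedged numpy sketch of a single transformer layer: an attention sub-layer followed by a feed-forward sub-layer, each with a residual connection. Layer normalization and multiple heads are omitted, and all the weights are random, so this shows structure only, not a trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # Each token builds query, key, and value vectors, then mixes in
    # information from other tokens weighted by query-key similarity.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

def feed_forward(x, W1, W2):
    # Per-token MLP; these are the layers where much of the "knowledge"
    # is thought to be encoded.
    return np.maximum(0, x @ W1) @ W2

def transformer_layer(x, params):
    x = x + attention(x, *params["attn"])    # attention sub-layer (residual)
    x = x + feed_forward(x, *params["ffn"])  # feed-forward sub-layer (residual)
    return x

rng = np.random.default_rng(0)
d = 8  # tiny embedding size for illustration
params = {"attn": [rng.normal(size=(d, d)) for _ in range(3)],
          "ffn": [rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))]}
x = rng.normal(size=(3, d))  # three embedded tokens, e.g. "the cat sat"
out = transformer_layer(x, params)
print(out.shape)  # one updated vector per input token
```

A real model stacks dozens of these layers and ends with an output head that turns the final vectors into a probability over the vocabulary.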

And what you do is you knock words out of that text, you have it try to predict the missing word, and then you correct it to make it a little bit more accurate.

If you do this long enough, on a big enough network, with enough words, this process, which is called pre-training, produces language models that are really good at predicting missing words. And to get really good at predicting missing words, they end up encoding, into those feed-forward neural network layers within their architecture, lots of knowledge about the world: sort of how things work, different types of tones. They become really good pattern recognizers with really good rules. Emergently and implicitly, within the feed-forward neural networks of the language model, a lot of smarts and knowledge begins to emerge. That's the basic idea with a large language model. So the AI companies' bet is this: if these things are large enough, and we train them long enough, and then we do enough fine-tuning afterwards with post-training,
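As a drastically simplified sketch of the pre-training idea, here is a toy next-word predictor "trained" on a tiny made-up corpus. Real pre-training adjusts billions of weights by gradient descent; the word counts here are just a stand-in for that correction step.

```python
from collections import Counter, defaultdict

# Toy "pre-training": learn next-word statistics from real text by
# repeatedly looking at a word and the word that actually followed it,
# and updating the model (here, simple counts) toward the truth.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

model = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    model[prev][nxt] += 1  # the "correction" step, vastly simplified

def predict_missing(prev_word):
    # Predict the most likely word to follow prev_word.
    counts = model[prev_word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_missing("sat"))  # -> "on" (seen twice in the corpus)
```

Even this toy version shows the core dynamic: knowledge about the "world" of the corpus ends up implicitly stored inside the model's parameters, which is the same claim made about the feed-forward layers of a real LLM.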

you can use a single massive large language model as a digital brain for many, many different applications, right?

When you're speaking with a chatbot, it's referencing the same large language model that your coding agent might also be talking to to help figure out what computer code to produce. It'll be the same large language model that your OpenClaw-style personal assistant agent is also accessing. So it's all about one HAL 9000-style massive large language model that is so smart you can use it as a digital brain for anything people might want to do in the economic sphere. And that is the model of companies like OpenAI and Anthropic.

All right, so what is Yann LeCun's AMI Labs doing differently? Well, he doesn't believe in this idea that having a single large model that implicitly learns how to do everything makes sense. He thinks that's going to hit a dead end, that it's an incredibly inefficient way to try to build intelligence, and that the intelligence you get is going to be brittle, because it's all implicit and emergent. You're going to get hallucinations, or sort of odd flights of fancy in responses that don't really make sense in the real world.

So what is his alternative approach? Well, he says instead of having just one large single model, he wants to shift to what we could call a modular architecture, where your digital brain has lots of different modules in it that each specialize in different things and that are all wired together. Let me show you what this might look like.

I'm going to bring up on the screen here a key paper that LeCun published in 2022 called "A Path Towards Autonomous Machine Intelligence."

This has most of the ideas that are behind AMI Labs. This paper has this diagram here that I have on the screen. It's an example of a modular architecture. So he imagines an AI digital brain that now has multiple modules, including a world model, which is separate from an actor, which is separate from a critic,

which is separate from a perception module, which is separate from a short-term memory module.

You might have, for example, the perception module making sense of the input it's getting, maybe through text, or through cameras if it's a robot.

It passes that to an actor, which is going to propose options, like "here's what we could do next," and then the critic is going to analyze those different options,

using the world model, which has a model of how the relevant world works, to try to figure out which of these options is best, pulling from short-term memory. Then the actor can choose the best of those options, which then gets executed. So it's much more of a "we have different pieces that do different things" design. Now, another piece of the Yann LeCun vision is that you can train different modules within the modular architecture differently. Again, in a language model, there's one way you train the whole model, and all the intelligence implicitly emerges. In LeCun's architecture, he says, "Well, wait a second."
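The perceive-propose-evaluate-act loop just described can be sketched in a few lines of Python. Every module body here is invented for illustration; in LeCun's proposal each would be a learned network, not hand-written rules.

```python
# Hypothetical sketch of a LeCun-style modular control loop.
# All module internals are made-up stand-ins for learned networks.

def perception(observation):
    # Turn raw input into an abstract state.
    return {"ball_distance_to_window": observation["distance"]}

def world_model(state, action):
    # Predict the next abstract state if this action is taken.
    d = state["ball_distance_to_window"]
    if action == "catch_ball":
        return {"window_broken": False}
    return {"window_broken": d < 1.0}

def critic(predicted_state):
    # Score predicted outcomes; broken windows are bad.
    return -10.0 if predicted_state["window_broken"] else 1.0

def actor(state, candidate_actions):
    # Propose options, consult world model + critic, pick the best.
    scored = {a: critic(world_model(state, a)) for a in candidate_actions}
    return max(scored, key=scored.get)

state = perception({"distance": 0.5})
print(actor(state, ["do_nothing", "catch_ball"]))  # -> "catch_ball"
```

The point of the sketch is the wiring: perception, world model, critic, and actor are separate components with separate jobs, rather than one monolithic text predictor.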

"Train each module with whatever approach makes the most sense for what that module does."

So like the perception module, let's say it's making sense of the world through cameras.

Well, there we want to use a vision network trained with classic deep learning vision recognition, of the type that, you know, LeCun actually helped pioneer back in the 90s and early 2000s. But then the world model, which is trying to build an understanding of how the world works, he's like, "Oh, we would train that very differently." In fact, he has a particular technique. So if you've heard of JEPA, joint embedding predictive architecture, this is a training technique that LeCun came up with for training a world model.

Right, at a very high level, he says, here's the right way to train a model that tries to understand how a particular domain works. Don't just train it with the low-level data, like the actual raw words from a book or raw images from a camera. What you want to do is take these real-world experiences, convert them all to high-level representations, and train on those high-level representations. I'm simplifying here a lot, but let's say you have as input a picture of a baseball about to hit a window, and then a subsequent picture where the window is broken.

You don't want to train the world model, he argues, just on those pictures, as in, "Oh, if I see a picture like this, the picture that would follow is one where the glass is broken."

That's how something like a standard LLM-style generative picture generator might work.

He's like, "Instead, take both pictures and build a high-level representation." So it's a mathematical encoding of something like "a baseball is getting near a window." What actually matters? What are the key factors of this picture? And then in the next picture, the window breaks. What you really want to teach the model is that when it has this high-level setup, a baseball about to hit a window, that leads to the window breaking. So it's not stuck on particular inputs, but learning causal rules about how the relevant domain works. And, you know, there are a lot of other ideas like this, such as the critic and actor, which come out of the RL, reinforcement learning, world.
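At the risk of oversimplifying further, here is a hedged numpy sketch of that JEPA-style idea: encode both frames into small abstract representations, and compute the training loss in that abstract space rather than on raw pixels. The linear "encoder" and "predictor" are made-up stand-ins, not the real architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
PIXELS, LATENT = 64, 4  # toy sizes: 64 "pixels" in, 4 abstract features out

# Invented linear encoder: maps a raw "image" to an abstract representation.
ENC = rng.normal(size=(PIXELS, LATENT)) / np.sqrt(PIXELS)

def encoder(frame):
    return frame @ ENC

def latent_loss(frame_before, frame_after, predictor_W):
    # Encode both frames, predict the next abstract state from the first,
    # and measure error in the abstract space, not the pixel space.
    z_before = encoder(frame_before)
    z_after = encoder(frame_after)
    z_pred = z_before @ predictor_W
    return float(((z_pred - z_after) ** 2).mean())

W = rng.normal(size=(LATENT, LATENT))       # invented predictor weights
before = rng.normal(size=PIXELS)            # "baseball approaching window"
after = rng.normal(size=PIXELS)             # "window broken"
print(latent_loss(before, after, W) >= 0.0)  # a scalar training signal
```

The design choice this illustrates is the one in the transcript: the prediction target is the abstract representation of the next frame, so the model never has to reproduce pixels, only the factors that matter.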

As is sort of well known there, you train one network with rewards and another one to propose actions. So there are a lot of different ideas coming together here.

The third piece of LeCun's vision that differs from the big AI companies is that he doesn't believe in having just one system that you train once

and that is then the digital brain for all the different types of things you want to do. He says this architecture is the right architecture for everything,

but you train different systems for different domains. So if I want a digital brain that we can build computer programming agent tools on, I'm going to take one of my systems, with its world model and perception and actor and critic, and I'm going to train it specifically for the domain of producing computer programs. And then all of the computer programming agents that people are building will use that particular system. But if I want to help with call centers or whatever, I might train a completely different version of this system

that's just really good at call centers. So we don't have just one massive HAL 9000 that everything uses, which is the OpenAI plan or the Anthropic plan. We custom-train systems that maybe all use the same general architecture, but we train them from scratch for different types of domains. You're going to get much better performance out of it. All right, so that is Yann LeCun's vision. And he says this is how you're going to get much more reliable and smart and useful activity out of AI. This idea that we're just going to train a massive model that can do everything based off of just text? He's like, "Come on, this makes no sense. That can't possibly be the best, most efficient route towards actually having smarter AI."

All right. So that is the key tension between the existing AI companies and Yann LeCun's idea. This brings us to our second sub-question: how is it possible that LeCun could be right that LLMs are a dead end, if we've been hearing nonstop in recent months about how these LLM-based companies are about to destroy the economy and change everything? How could we be so wrong?

LeCun is not surprised by that.

Which is fine if you're Sam Altman or Dario Amodei; that's great for you, because you need investment. But it's probably not the most accurate way to think about it. Now, if we ask LeCun, in this hypothetical, to give a longer explanation of how we could be so wrong about LLMs, he would probably say, "Okay, let me explain to you the trajectory of the LLM technology in three stages," and I think this will clarify a lot.

So the first stage was the pre-training scaling stage. This is the stage where the AI companies kept increasing the size of the LLMs, so how big those layers are inside of them, the amount of data they trained them on, and how long they trained them. And there was a period, starting in 2020 and lasting until 2024, where making the models bigger and training them longer demonstrably and unambiguously increased their capabilities. This petered out after about GPT-4. After about GPT-4, with OpenAI, and we have evidence that xAI had the same issue, and that Meta had the same issue, when they continued to make their models bigger,

they stopped getting those big performance jumps, so they couldn't just scale their way to more capability. This led to stage two, which I think of as starting in the summer of 2024, which is when they shifted their attention to post-training.

So now, the thinking went, we can't make the underlying smarts of these LLMs better by making them bigger or training them longer, so what we need to do is try to get more useful stuff out of these existing pre-trained LLMs. The first approach they came up with, and we saw this with the alphabet soup of models that really started in the fall of 2024, o1, o3, Nano Banana, all these types of names, was telling the models to think out loud. So instead of just directly producing a response, they post-trained the models to, essentially, actually explain their thinking. And this worked in a way because, remember, it's autoregressive: as the model explains its thinking, that text is always going back as input into the model.

And that gives it more to work off of, enriching the answer. So it turned out that if you had the model think out loud, you got slightly better results on certain types of benchmarks. These are the so-called reasoning models. But it was a bit of a wash, because this also made it more expensive to use the models: the model produced a lot more tokens to get to the answer you cared about. So it did better, but it was unclear how much of that you'd actually want to turn on for users.
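A toy illustration of that cost tradeoff, with made-up token counts: because generation is autoregressive, the reasoning tokens are produced (and billed) just like the answer tokens, even though the user only cares about the final answer.

```python
# Made-up token counts; the point is the ratio, not the numbers.
def tokens_billed(reasoning_tokens, answer_tokens):
    # API billing is roughly proportional to total output tokens,
    # and "thinking out loud" tokens count as output too.
    return reasoning_tokens + answer_tokens

direct = tokens_billed(reasoning_tokens=0, answer_tokens=50)
thinking = tokens_billed(reasoning_tokens=2000, answer_tokens=50)
print(thinking / direct)  # -> 41.0: same answer length, 41x the tokens
```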

The second approach they used in this second stage was fine-tuning. So now if you have, for example, a lot of examples of a particular type of question, prompts and correct answers, prompts and correct answers,

you could use those, combined with techniques out of reinforcement learning, to nudge the existing pre-trained model to be better on those types of tasks.

So we entered stage two, this post-training stage, where, because we couldn't make these LLM brains fundamentally smarter, we tried to tune them to get more performance out of them on particular types of tasks. This is when we began to see less of "hey, try this model, it's going to blow your socks off," and instead got lots of charts of inscrutable benchmarks. Look, the chart is going up on this alphabet-soup benchmark! Because, you know, you could post-train for particular benchmarks.

But in a lot of use cases, it was less obvious to the regular user; the underlying smarts seemed to be the same.

We then entered stage three, which I think started in the fall of 2025, where the LLM companies said: really, the big gains going forward are in the applications that use the LLMs. Let's make those applications smarter.

So it's not just how capable the LLM is; it's how capable the programs prompting the LLM are. Let's make those smarter. So we saw a lot of this effort going into the programs called coding agents, which help computer programmers edit and produce and plan computer code. Now, these types of agents had been around for a while, but a lot of the AI companies got really serious about them coming into the fall of last year. And how did they make the programs better? They weren't really changing the LLMs much. They did some fine-tuning for programming, but really the big breakthroughs in coding agents were in the programs that call the LLMs, where they figured out how to make these coding agents capable of working with enterprise code bases.

So not just for individuals vibe coding web apps, but something you could use if you're a professional programmer in a big company, all that's tool improvements.

You're able to send better prompts to the LLM when you hear about things like...

These are all improvements in the programs that use the LLM; none of this is breakthroughs in the digital brain itself.

And so in the stage we are in now, we're spending a lot more time building smarter programs that sit between us and the LLMs they're querying as their digital brain, so that in very particular domains, the result is more useful.

So what does this all tell us? This is what LeCun would tell you, right? I'm channeling LeCun here. He would say, once you understand this reality, you see that this impression that LLM-based AI has been on a super fast upward trajectory of rapid advances is pretty illusory. The fundamental improvements in the underlying brain stopped a couple of years ago.

What we saw was then a period of lots of bragging about benchmarks doing better, but this was all about post training.

And now, for the last four months, all these improvements we've been hearing about are about the programs that use the LLMs being made smarter and better fitted to particular use cases. But there really haven't been major fundamental improvements in the underlying smartness of the digital brains, which is why all the problems like hallucinations and unreliability persist. The brains are actually only incrementally improving, either in narrow areas or in narrow ways. It's what we're building on top of them that's creating an illusion of an ever-increasing trajectory of artificial intelligence.

When in reality, we might just be in a very long-tailed stage of product-market fit: doing the work of building more useful products on top of a mature digital brain technology that's only advancing at a very slow rate.

That would be LeCun's argument. We will find some good fits, but this is not a technology on a trajectory where it's going to be able to make massive leaps in what it's actually able to do. All right, so there you go. That would be the argument for how we could have gotten LLM progress so wrong.

All right. Sub-question number three: let's follow through the thought experiment. What would happen if LeCun is right about all that? What would we expect to happen in the near future?

Well, let's start with the window of the next one to three years. If he is right, we would see a long tail of applications based on existing LLMs begin to fill in. Computer coding agents have gotten more useful; we will see other use cases like that, ones that don't exist now, where people are really experimenting to figure out applications that will work in other types of fields. So there will be sort of Claude Code moments in other fields, which I think will be useful and exciting.

The tool sets used in many jobs will change. But because we're now just trying to find areas where we can build useful applications on top of existing LLMs, those doomsday scenarios we've been talking about on these AI reality checks recently, where knowledge workers are going to have to become pet masseuses, and then after that they're going to have to cook the pets on garbage-can fires because there's no money left in the economy, those scenarios are not what would unfold based on LLMs in this vision. There would still be a big economic hit, though, because if we've shifted our attention to building better applications on top of the LLMs,

what we're going to see is a lot more companies getting into that game, and they're going to say, "I don't want to pay for a cutting-edge frontier hyperscale LLM; that's too expensive. Let's look at cheaper LLMs. Let's look at open-source LLMs. Let's look at LLMs that can run on-device." We saw this already with the OpenClaw framework, which allowed people to build their own custom applications that use LLMs in personal-assistant-type roles. Right away, people said, "I don't want to pay all this money to use Claude or GPT," and you saw an explosion of interest in on-device models and open-source models. All this is going to be, I think, good news for consumers: it means we could have more people building these applications,

more variety in these applications, and they'll be cheaper. It's bad news for the stock market, because we've invested, depending on who you ask, somewhere between 400 and 600 billion dollars into these LLM hyperscalers like OpenAI and Anthropic. The market's not going to support that, so there's going to be a big crash. If this vision is correct, it will probably temporarily slow down AI progress, because investors are going to feel burnt. All right, what's going to happen if we zoom out to a three-to-ten-year range?

That's roughly the range in which the modular architecture approach LeCun is talking about would reach maturity. That's what he, as the company's CEO, is saying.

Again, it's a research company right now, and they've said it'll be several years until they really have products ready for market. If LeCun is right, what we're going to see is, domain by domain, these very bespoke, domain-specifically trained modular architecture systems, which, if he's right,

are going to be way more reliable and smarter, in the sense that they do...

We're going to see a lot more of what's promised with LLMs delivered instead, on that three-to-ten-year basis, if LeCun is right.

Because they're based on this modular architecture, I think these systems will, you know, be more reliable.

They're also going to be easier to align.

With an LLM, it's just like: here are 600 billion parameters in this big box that we trained for a month on all the text on the internet; let's just see what it does.

Modular architectures give you way more control. If you literally have a critic module in there that evaluates plans based on both a world model and some sort of hard-coded value system, to say "which of these do I like better," you can just go in there and hard-code "don't do these types of plans," or "low score for plans that lead to a lot of variability in outcome," or something like that. You have more direct knobs to turn, so it does make alignment easier. These systems would also be more economically efficient, because when you have to train one model long enough to do everything, it has to be huge, and it takes a huge amount of energy.
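Here is a hypothetical sketch of those "direct knobs": a critic whose score combines a learned value estimate with hard-coded rules, forbidding certain plans outright and penalizing high-variance outcomes. All the plan names and numbers are invented for illustration.

```python
import statistics

# Invented hard-coded value rules layered on top of a learned estimate.
FORBIDDEN = {"deceive_user", "disable_oversight"}

def critic_score(plan, predicted_outcomes, learned_value):
    if plan in FORBIDDEN:
        return float("-inf")             # hard-coded: never pick these
    spread = statistics.pstdev(predicted_outcomes)
    return learned_value - 2.0 * spread  # penalize variable outcomes

plans = {
    "deceive_user":   ([5.0, 5.0], 9.0),  # high value, but forbidden
    "risky_shortcut": ([0.0, 10.0], 8.0), # high value, high variance
    "safe_plan":      ([4.0, 5.0], 6.0),  # modest value, predictable
}
best = max(plans, key=lambda p: critic_score(p, *plans[p]))
print(best)  # -> "safe_plan"
```

The contrast with an LLM is the point: here the veto and the variance penalty are inspectable lines of code, not behavior buried in billions of weights.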

But when you're training different modules in a domain-specific system, these can be much smaller. I like to point out the example of a Google DeepMind tool called DreamerV3,

which can learn how to play video games from scratch. It's famous for figuring out how to find diamonds in Minecraft. It uses a modular architecture very similar to what LeCun is proposing here; we just read a paper about it in the doctoral seminar I'm teaching on superintelligence right now. DreamerV3, which can play Minecraft better than an LLM can, because it's domain-specific, requires around 200 million parameters, a small fraction of what you'd find in a frontier LLM, and it could be trained on a single GPU chip.

And it can handle this domain way better than a frontier language model, which is significantly larger and trained significantly more exhaustively. So there would be some advantages here. There would also be some downsides in this world: way more so than LLMs, these domain-specific models might actually have more of a displacement capability, so we'd have to keep an eye on them. All right, conclusion: what do I think is going to happen here? Well, you know, I don't know, right? It's possible that there are more performance breakthroughs to get out of LLMs and we're going to get more useful tools.

But gun to the head, if I had to predict: looking through my computer science glasses, LeCun's modular architecture feels like it has to be the right answer.

I think we're going to look back at this doubling down on LLMs as an economic mistake. It was the first really promising new technology built on top of deep learning, and it did cool things. But instead of stepping back and asking, okay, what will this be good for, and in what types of domains might we want different models, we said, no, let's just raise half a trillion dollars and go all in on everything with text-based LLMs, which are trained on text and made to produce text, and all artificial intelligence will run off of these things. I just think, when we zoom out on the 30-year scale, that was so naive.

The idea that this was the only type of model we need for artificial intelligence? It's super inefficient for, like, 99% of the domains we want to use it in. It's great for text-based domains and computer programming (kind of; the planning is a little suspect, but the code production is okay). But we're going to base all intelligence off just massive LLMs, and there will be, like, four of them, four companies that have massive ones, and that's it? That can't be the right way to do it. So my computer science instincts say a modular architecture just makes so much more sense: domain specificity, differential training of modules.

You have much more alignment capability there, and it's much more economically feasible. It just feels to me like that is probably going to be the right answer. Which means we're going to have some bumpiness in the stock market, because if this is true, the hyperscalers either have to pivot quickly enough before they run out of money, or some of them are going to go out of business, and the others are going to have to collapse before they expand again. So I think the modular architecture approach will work better. I don't know if LeCun's company is going to be the one to do it, but I think that architecture makes a lot of sense to a lot of computer scientists. Now, I hope they don't get too much better too fast, because I can much more easily imagine a very well-trained modular architecture AI digital brain creating justified worry than I can these Python agent programs that access some sort of massive LLM somewhere. All right, so, yeah: I think within a year we'll begin to get a sense of which of these trajectories is actually true.

I, of course, will do my best to keep you posted here on the AI Reality Check. All right, that's enough computer science talk for one day. Hopefully that made sense; hopefully that's useful.

Be back soon with another one of these checks, and until then, remember: take a...
