Odd Lots
This Is How to Tell if Writing Was Made by AI


Transcript


Bloomberg Audio Studios. Podcasts. Radio. News.

Hello and welcome to another episode of the Odd Lots podcast. I'm Joe Weisenthal. And I'm Tracy Alloway. So Tracy, you know, you ever come across some writing, and you can't articulate exactly why, but you're like, I'm pretty sure AI wrote this? Does this happen to you much?

So full disclosure, I haven't really thought about it that much. Yeah, because the thing is, I probably should think about it more. But there's a lot of bad writing out there, and I've become sort of inured to it. And I also think that, I don't know, trying to figure out whether or not something was generated by AI nowadays...

If you actually dedicate a lot of your own time to doing that, that is a huge mental burden to be taking on. Especially since you and I are in the journalism industry. How many of the pitches do you think that we get from PRs right now are being generated by AI? Imagine reading each one of those and trying to figure it out on a daily basis.

You know, I suppose I think about it the most when someone responds to a tweet.

Yeah, and I would be like, well, if this is a real person, then maybe this person deserves some engagement, and they ask a question or I want to respond. But if the person is a bot, then obviously I don't. And that's where, look, you know what, I want to figure it out. I would like to know the answer.

You know, I have a controversial view about AI writing, by the way, which is that it's pretty good. I mean, like, by and large. And I said this, I think, maybe in a recent episode: when you consider the fact that, I don't know, the majority of the population, like, doesn't know where to put a comma within a sentence... Well, this is my point, right?

That's pretty good. I mean, one thing I'll say about AI is it never gets the placement of a comma wrong.

On some level, it's perfect. Did you do that quiz? I think it was in the New York Times, the... Yeah, I kind of hated that. Okay, why? Well, I'll tell you why.

First of all, there were only five examples. That's not many.

Two, it asked the reader which they prefer, but I think they were different subjects as well.

Yes, and also I think most people probably treated that as, can you guess which one is the human? Because everyone wants to say they prefer the human. I didn't think it was, like, a great test. Nonetheless, look, not only is it often indistinguishable, it's often fine writing.

Sometimes AI can come up with a really remarkable turn of phrase. Yeah, but I still, by and large, don't like it. You read a thing, especially a long text by AI, and even if you can't articulate it, it's like, this feels AI. It has a certain sickly sweetness to it that is often annoying.

So what I notice about it is it doesn't do style very well, right? So if you ask it to write something in the style of a writer, if you choose anything other than something really obvious like Shakespeare... Yeah, it really suffers. But the text that it actually outputs is pretty clear, right?

For basic understanding, it's probably better than a lot of what's on the internet. The real people who are going to have to worry about this are, like, teachers, obviously. Universities and lawyers and maybe a lot of others.

But sometimes it's like, okay, did someone write this or not?

And it would be nice if we could know the answer. Well, the other thing that's starting to happen is, have you seen any books out there that actually come with a disclosure? That say this book has been written only by humans, no AI used at all.

I saw that for the first time on a book that we actually read for an Odd Lots episode.

I don't think it's come out yet, but that kind of threw me. Yeah, no, it's more and more. Anyway, as we enter a world in which the vast majority of written words, if not already, are written by AI, it's going to be interesting, this question of whether we know. Anyway, there's this company called Pangram Labs, and they have a little thing.

And you can pay for it, but there's also a free service where you can drop, like, a text in, and it'll say the odds that it was written by a human or AI. And I'm pretty impressed by it. I, like, did some samples of my own writing and then AI outputs. It got them all right.

But then I did some, like, further... like, I tried to stump it. So what I did was I took a piece of AI writing, and then I had it translated into Chinese. Okay. And then I had it translate that into high Chinese, so it's like, okay, imagine this has been rewritten in a more formal register.

And then I had that translated into Hebrew. Then I had that translated into English. So the original thing went through this series of AI telephone, through various translations. And then I put that output back into Pangram. And it got that right.

It said it was AI. So even after a series of transformations designed to obfuscate the original style of the piece, to see if, you know, eventually it would emerge as something else. So I was pretty impressed. It seems to work.

And you know, I think that's interesting for a couple of reasons. One is maybe there is something that you can just tell. But two, it sort of worries me, because, you know, there have been articles.

They'll say, like, this was written by AI.

And I think one of my big fears would be that I write something. And, like, I use an em dash.

I've always been an em dash fan.

I love em dashes. That's how people talk. And I'm sorry. And then what if it says I wrote this via AI? And I'm like, I didn't.

And then here's this black box that is suddenly judge, jury, and executioner for my career, potentially. You wrote this via AI. The lab says so. You are now done. Like, that worries me.

So I think this raises a lot of very interesting questions about these model detection things.

And I want to learn more about how... Well, there are also a lot of philosophical questions about just what we value in writing as well. Because no one's going to yell at you for using spell check or something like that. It's kind of crazy to think that reputational risk is going to hinge on whether or not you might have used a chat platform to do some basic copy editing. Well, I'm very happy to say we do, in fact, have the perfect guest.

We're going to be speaking with Max Spero. He is the founder and CEO of Pangram Labs, and he can answer all of our questions. So Max, thank you so much for coming on Odd Lots. Thanks for having me. How do you know it's right?

So someone puts in a piece of text, and we'll get into the method in a second.

Someone puts in a piece of text, and it says human or AI. What makes you believe that you have a very good track record on this question? So when we started Pangram, we started by doing this thing called a human baseline, which is, how well can we, as humans, predict whether something's AI or not? That's the first step of learning whether this problem is tractable, how hard or easy it is.

And I found, like, me personally, I was able to get about 90% accuracy. And so we figured an AI model should be able to do much better than that. So I have a bunch of methodology questions, which we can get into, but just before we get into any of that: why is AI slop bad, in your opinion? Why does it need to be tracked and identified?

I think the problem is it's just so easy to generate.

And so it's very difficult to know what the intent is behind it. Basically, right now, I think we're actually pretty lucky. We live in a world where the signal-to-noise ratio on the internet and our information channels is pretty high.

We have pretty high signal to noise, but any bad actor can come in and just flood our information channels with AI slop that looks legitimate, that looks like somebody put actual effort and thought into it. But really, it was just, like, a single prompt, which could have also been automated. This is something that I think about a lot, which is that there was a point in time, and maybe still is a point in time, where, if you read something that was grammatically correct, or the punctuation was strong, or the spelling was strong,

there was reason to think that the person who wrote it was a person of, like, a certain seriousness, with a certain intelligence behind it. And I think that the issue that you're identifying is that that link is now being severed, so that we can't use these heuristics anymore, such as the strict quality of the prose, to know whether this was in fact published by someone who was, like, a serious actor, intelligent or not. And now you have people inserting typos into their college essays.

I know, that's true, they actually do. Yeah, but, sorry, just to go back to my original question. You mentioned, okay, you were able to get it 90% right. But now it's being used a lot more, and you have people paying for your software, presumably teachers and journalists, et cetera. Given all of that, getting from 90% to 100...

I mean, if you think about it, one out of 10 is clearly an unacceptable error rate for a piece of commercial software that could call someone an AI cheater.

So you have to do a lot better than 90%.

Talk to us about, like, what you've seen so far in your data since releasing it as commercial software that makes you believe the software is doing a correct job of allocating between the two categories. So we've built out a really comprehensive set of evals. Okay. And in our evaluations, there are two kinds of errors.

There's a false positive, which is when something is written by a human and we say that it's written by an AI. Okay. And there's a false negative, which is when it was AI-written and we don't catch it. And so we track our numbers for both of these.

And for human writing, we're actually pretty fortunate. We have like millions and millions of samples. So we can get like a false positive number that we have a very high degree of confidence in. And our number right now is about one in 10,000. Okay.

So if we scan 10,000 documents on average, one will come back as AI when it was actually human. And what about the other direction? False negative. I would say around 99% accuracy. So around 1% false negative rate.
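The two error rates Max defines can be sketched with a toy scorer. All labels and numbers here are invented for illustration; this is not Pangram's evaluation code.

```python
# Toy illustration of the two error rates. Labels: 1 = AI-written,
# 0 = human-written.

def error_rates(y_true, y_pred):
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    humans = y_true.count(0)
    ais = y_true.count(1)
    # False positive rate: humans misflagged as AI.
    # False negative rate: AI texts we failed to catch.
    return fp / humans, fn / ais

# One human text misflagged out of four; one AI text missed out of four.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
print(error_rates(y_true, y_pred))  # (0.25, 0.25)
```

With the rates quoted in the interview, the same function would return roughly (0.0001, 0.01): one false positive per 10,000 human documents, and about a 1% miss rate on AI text.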

I think this depends a little bit more on, like, how adversarial the prompting is, how much they're trying to... Sort of what I did. Exactly. We send it through multiple translations to obfuscate the original output. That would be an example of adversarial prompting.

Exactly. But in like the general case where we're just looking at straight outputs from AI,

It's above 99%.

Okay.

Okay. So what is your model looking for exactly when it's evaluating a text?

Because, as we mentioned in the intro, you know, syntax and grammar tend to be pretty good in AI-generated copy.

The style is sometimes more of an identifier. I would argue, to your point, Joe, like, sometimes it reads very saccharine and kind of overly earnest in some ways. So what exactly are you focusing on here? What are the tells? Yeah, so the style and the word choices are definitely part of it. But I think what a lot of people don't realize is they're actually making a lot of decisions when they write a piece of text.

So there are, you know, dozens or hundreds of ways to phrase every single phrase, and over the course of 50 or 100 or 200 words, you're making thousands of decisions, actually. And so what we're doing is we're learning the patterns in how, like, these frontier models make these decisions. If the vast majority of these decisions line up with how the frontier models are doing it, then it's vanishingly unlikely that this was written by a human.

You would have to just happen to make the same exact decisions that the LLM does hundreds of times. Interesting. Okay. But this is a really important point. So everyone at this point has some feel for, let's say,

the em dash tell, right? But my understanding is you don't go in and, like, hard code: if you see a bunch of em dashes, this is the thing. These decisions, in many cases, I imagine neither you nor the model itself can articulate in English what the decisions are.

All you know is that the decision pattern exists. Is this correct? This is correct. Okay. Can you explain?

So therefore, what does it mean that your model has learned these decision patterns?

So what we're doing, on the very broad scale, is we're training a deep learning model. So it's a pretty big black box, but it has the base model of a language model. And then, instead of predicting the next token, it's predicting whether the text is AI or not. Okay. And how we train it is we train on tens of millions of examples.

So it sees millions and millions of human examples. And for each human example, we also show it an AI example. So for example, let's say one of these is a five-star review for Denny's

that's 78 words long. Then we'll ask an AI to write a five-star review of Denny's that's 78 words long, in the style of the first one.

And obviously, these two will be different. And so our model is able to learn, through contrast, what the difference is between these two. And the important thing, sorry, just to be clear here, is that you and I might not be able to articulate the difference. There will be some difference in maybe the sentence length, there will be some difference in word choice, there will be some difference in punctuation, syntax, whatever.

But you and I wouldn't obviously spot it. However, after millions of examples of these side by side, the model learns what the difference is.

Exactly. I think the best that a human can do is look for some of these, like, really obvious tells. Like, ChatGPT loves the, like,

"it's not just X, it's Y" framing. Earlier models really liked some specific words, like tapestry and intricate and delve. Yeah, delve, tapestry, yeah. But yeah, I think by training Pangram, we're able to go much deeper than this and look deeper than the high-level signs, at the, like, document-level signs.

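The paired-data construction Max described a moment ago (a human sample plus a matched synthetic mirror, labeled for contrastive training) can be sketched like this. `generate` is a stand-in for a real LLM call; all names here are hypothetical.

```python
# Hedged sketch of the contrastive pairing step: for each human sample,
# ask a generator for an AI "mirror" matched on topic, rating, and
# length, then label both sides for training.

def generate(prompt):
    # Placeholder: a real pipeline would call an LLM with `prompt`.
    return "Great food and friendly staff, will definitely return soon."

def make_pair(human_text, topic, stars):
    n_words = len(human_text.split())
    prompt = (f"Write a {stars}-star review of {topic} that is about "
              f"{n_words} words long, in the style of: {human_text}")
    return [
        {"text": human_text, "label": "human"},
        {"text": generate(prompt), "label": "ai"},
    ]

pair = make_pair("Pancakes were hot and the coffee kept coming.",
                 "Denny's", 5)
print([p["label"] for p in pair])  # ['human', 'ai']
```

The model then trains on many such pairs, learning the residual differences between the two sides rather than any single hard-coded tell.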

So one thing this kind of reminds me of, and I'm thinking how to phrase this, but it reminds me of, you know, those exercises people used to do where you would take a bunch of different faces and meld them all together and come up with, like, one composite face.

So, like, to what extent is this basically a distributional detector, in the sense that you're looking for, like, certain paths that you think AI would choose?

And I guess, like, could you get a false positive just from someone who happens to write, like, the average of the average?

Maybe. There's a reason our false positive rate is one in 10,000 and n...

It's because, you know, sometimes we look at the false positives, and it's like, oh, it reads exactly like an AI-generated review or essay, except that it was written in 2019.

So it was probably a human who just happened to write in the exact, like, mode-collapsed type of way that, like... yeah.

I would say, yeah, I think it's a good way to think about it: the distribution of writing, or writing as a distribution.

Where, like, you know, there's a space of all human writing, and then AI writing is really just, like, a small point within this space. No matter how much you prompt it, it doesn't go that far from where it was trained to be. Yeah. Okay. What's in the black box? So I built a little model myself. I built this thing where you can upload text and it says whether it's more resembling of the written word or the spoken word. Oh, I saw that. Yeah. And I used BERT, which is, like, one of these open source ones from Google. What is the core model that you train on? Or is it something you build yourself? Like, talk to us about... The very first model was actually built on BERT.

Okay. But for future models, we needed to up our capacity and go beyond it. So basically, we were running into capacity limits with our model.

It was capping out at a certain false positive and false negative rate. It wasn't learning the deeper signals. So we had to 10x and then 100x the parameter count so that it can learn, like, really deeply how these frontier models write. Have you noticed any interesting differences between how the models write? And actually, is your model trained to identify different models, as well as whether or not this is just broadly AI-generated? So we don't specifically train it on different models. We don't say, like, hey, this one is Claude 3 and this one is ChatGPT or GPT-5. But we've done some interpretability work to look at

basically the output embeddings of the model, and we find that it actually learns which model the text came from. So you could see, like, little clusters. Like, this is the Claude cluster, and, like, all the Claudes, yeah, cluster around here, and then these are, like, the DeepSeeks and Qwen, and then this is, like, ChatGPT, and they all kind of, like, cluster into different spaces in embedding space. So clearly the model is able to learn what the differences are between these frontier models. And since you mentioned Qwen, I'm very interested: is there anything, like, distinct in terms of how Qwen generates text versus platforms that have been developed in the US?
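The clustering described here can be illustrated with a toy nearest-centroid lookup. The 2-D vectors and the assignment of points to model families are invented; real detector embeddings have hundreds of dimensions.

```python
# Toy nearest-centroid lookup over (made-up) detector output embeddings,
# illustrating how clustered embeddings let you guess the generator family.
import math

clusters = {
    "claude":  [(0.9, 0.1), (1.0, 0.2)],
    "qwen":    [(0.1, 0.9), (0.2, 1.0)],
    "chatgpt": [(0.5, 0.5), (0.6, 0.4)],
}

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def nearest_family(embedding):
    # Assign a new embedding to the closest cluster centroid.
    return min(clusters,
               key=lambda f: math.dist(embedding, centroid(clusters[f])))

print(nearest_family((0.95, 0.15)))  # claude
```

Note that, per the interview, Pangram does not train this attribution explicitly; the clusters emerge as a side effect of the AI-vs-human objective.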

I think Qwen is unique because it's trained on a lot more Chinese and multilingual tokens than other models.

So, you know, I've heard from Chinese friends that it's much better at, like, being conversationally fluent in Chinese. Beyond that, I don't know that I can tell. It would be hard for me to look at a text and say, like, I know that's Qwen. But I think somebody who is more familiar with it might be able to. Let's talk about some of the philosophical or societal implications of this work. Have you had anyone whose text has been judged to be written by a program,

and they're like, I swear to God, this isn't AI, and they, like, really insist? And what do you think about this situation? What do you do? Talk to us about that. I've had a couple of times this happened. There have been times where I genuinely believe that, you know, this is just a false positive. We've scanned hundreds of millions of documents, so, like, at a certain scale, this will happen. But I also get people all the time who are just like, AI detectors don't work, it's a total fraud, and then whatever they're putting out on LinkedIn is just a hundred percent AI-generated.

And they're just, like, mad that they're getting called out. And then you look back farther into their past and their history, and, like, everything they're putting out is AI-generated until about, like, 2023. Like, for everyone, if you look historically, there are a lot of, like, slop accounts that are putting out total slop, and you can tell either they, like, weren't posting as much before, or, if you scan back in time, you see that they were writing human text at some point.

So there are a number of accounts out there where, basically, if you scan the entire corpus of their work,

it very clearly shows a switch right around early 2023. Yeah, it really, like, depends on the account.

I think one thing we saw that was interesting was there was a writer for the Guardian that was covering the Winter Olympics.

And somebody was like, hey, this article is, like, total AI slop. They ran it through Pangram; it was AI. And then we scanned this single writer's, like, history, and we found that they really did start picking up AI, like, mid to late 2024, and were using it more and more in their articles. I mean, just to play devil's advocate for a second: does intent matter when it comes to identifying AI slop? In the sense that, okay, I get you can have a bad actor who's maybe trying to influence how people feel about a particular topic.

They've created a bunch of bots on Twitter slash X, and they're using AI to just flood the zone with a bunch of AI slop supporting their particular viewpoints. On the other hand,

If you're a journalist and your business is to write you know like basic unde...

And that intent is very different to I'm going to try to influence something by just you know sheer volume.

Yeah, I mean, definitely one is a lot more severe than the other. But I think, at the same time, if you're a journalist and you're using AI to basically

do your work, like, not doing your work yourself, I think that's also a problem. And I think it's a reputational risk to the outlet, because people can tell, and people are going to call you out, and there are a lot of people who don't want to read AI slop, kind of regardless of where it's from. Yeah, this is definitely true. Are you ever going to run out of human material to train on? Right? Like, you can be pretty confident that if you find some piece of text that was published on the internet prior to 2023, but certainly prior to, like, 2019 or something like that...

You can be extremely sure that this is human-generated. Do you worry that in the future it's going to be harder to even establish the provenance of your training data? Yeah, it's definitely a concern for us. Talk to us about how to think about this. So we have a near-infinite data reservoir of pre-2023 data.

There's just like more than enough for us to train on for a long, long time but part of the problem is we also want to train on modern text.

We want to... there's all this talk about, like, if somebody's writing about LLMs or about AI, we don't want to incorrectly flag that as AI just because our training data has no sense of this topic. So I think we're looking at different ways to do this, but most of them are just, like, figuring out who is a trusted actor. Who do we know is putting out human-written content? And we can use our model for that, like, to some degree. And then, so, we have known actors, we know they're putting out human-written content, and then we can use their data as well.

A slightly random question, but using your model, are you able to quantify, like, what percentage of the internet at the moment is AI slop? It's about 40%. Based on what? Like, where do you... how do you get the number? So a lot of the internet is just, like, SEO-written articles. Yeah, it's articles written for search, basically, so that your website comes up more often in search because it's targeting certain keywords. And a lot of that industry has switched over to using AI, because then, instead of having to pay writers, you can churn out articles for pennies on the dollar.

And I think that kind of results in a lot of the internet being AI-written. It's a little bit... it's also kind of platform dependent. It's about 40% from, like, an internet page perspective. About a year and a half ago, we looked at Medium and found that over 50% of newly written Medium articles were AI-generated, which was a crazy high number. What about Reddit?

Reddit was 7% a year ago, I believe; a little over 10% today. Actually, this reminds me: so, I'm on Reddit a lot, and I really enjoy it nowadays as a platform, but I do worry about how much of it is being generated by AI. And the thing I don't necessarily understand is, what are the economic incentives to actually write a bunch of AI-generated posts on Reddit and get upvoted? Like, why does that system or motivation even exist?

So there are startups. I'm not going to name names, because I don't want to promote them, but they will sell a promise to companies: we're going to get you organic mentions on Reddit. We're going to run our AI bots that seem organic, and they're just going to naturally recommend your product, or, you know, just mention your product in the comments or in a post.

And I've seen evidence of this. We can find these... they're basically, like, bot farms that are mostly engaging seemingly organically, just, like, doing a short reply.

And then sometimes they're doing this brand mention, and so that's why these posts are very valuable.

It's really interesting. I have to also imagine it's valuable because all of the models train on Reddit, right? And if you want your product's name to appear in model output, it's like, what is the best, you know, nose hair trimmer or whatever, and there's a bunch of bots on Reddit that talked about that nose hair trimmer, and then that's probably more likely to show up in a ChatGPT response, right? Yeah, yeah, it's been weirdly gamed.

You know, you used to just Google "best nose hair trimmer." Yeah, and there's, like, a thousand results.

Well, the Reddit search results, like, show up first nowadays.

Yeah, you have people looking... Yeah, and then people started searching "best nose trimmer Reddit." Yeah, to get the Reddit comments on it.

And now people have realized that that's what people are searching for.

So you need to populate Reddit with your advertisements. Hmm. "Are you looking for a nose hair trimmer? The Panasonic ear and nose hair trimmer is the number one choice. Pros: easy to hold."

Anyway. Yeah, it's all these affiliate links. Yeah, it just destroyed the internet. I know, it's so bad, but whatever. Talk to us more about the whole pipeline. So I'm very fascinated by this idea.

It's like, okay, you see this review for Denny's. You have the AI model try to replicate it as best as it can.

There will be these subtle differences.

Talk to us about like the whole pipeline.

What are the other tests that you're using to get the true, you know...

because what I imagine you're trying to do is get the most similar data sets, with an almost imperceptible difference. It's a real stress test. I just love it. Yeah, talk to us really about the whole pipeline.

Yeah, so what we're really trying to do here is... We're... as a model maker myself... No, no, sorry, keep going. Yeah, as an AI expert... Yeah, as an AI expert, I need to hear some tips on this. Yeah, so what we're really looking for is examples that are as close to the boundary

between human and AI as possible, because our model learns better that way. From something that's very obviously AI, you know, our model is not learning as much. Same thing for something that's obviously human. And so step one is creating this data set with synthetic mirrors of human examples. And then we train a model.

And then step two is something called active learning. So we then take this model and use it to scan a much larger corpus of data, and look for errors: false positives, false negatives. And then we pull those back into our training set and are able to train a much better model, because it's seen these errors. And these errors, we believe, are just much closer to the boundary between human and AI.

So, sorry, just to be clear: the first pass is, like, okay, you have known human writing and known AI writing.

You train a model. And then the next pass is, once again, known human and known AI writing. So you already know the answer for each of these, and therefore you can come up with a list of which ones it got wrong. And then that gets fed back into the first version.

Exactly. And so, once we retrain, the model gets much, much better. And then we can do this as many times as we want, to kind of just have a self-improving model that gets better with every training run. I can also tell you... let me go a little bit more into how we deal with AI edits, because I think that's increasingly important.
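The self-improving loop just described (train, scan a bigger labeled corpus, harvest the errors, retrain) can be sketched with toy stand-ins. `Model` here is a trivial lookup, not a real detector.

```python
# Minimal sketch of one active-learning round.

class Model:
    def __init__(self, train_set):
        self.train_set = list(train_set)

    def predict(self, text):
        # Stand-in rule: only texts seen labeled "ai" are flagged.
        return "ai" if (text, "ai") in self.train_set else "human"

def active_learning_round(train_set, labeled_corpus):
    model = Model(train_set)
    # Scan the corpus and collect the errors (false positives/negatives)...
    errors = [(t, l) for t, l in labeled_corpus if model.predict(t) != l]
    # ...then retrain with those hard, near-boundary examples added.
    return Model(train_set + errors)

train = [("obvious slop", "ai")]
corpus = [("obvious slop", "ai"), ("subtle slop", "ai")]
better = active_learning_round(train, corpus)
print(better.predict("subtle slop"))  # ai
```

The real system repeats this round many times; each pass feeds the previous model's mistakes back into training.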

The problem is, like, I think most writing will be AI-assisted in the future. I think it's already in Google Docs, and it's in my Google keyboard. Grammarly, arguably, has been doing this for a while. Exactly. And Grammarly uses LLMs on the back end.

And we don't want to just say, like, all writing is AI now. We want to be able to differentiate between AI-assisted and AI-generated. So what we do is we also have different prompts. So, for our, like, human review of Denny's, rather than saying, generate a review like this,

we could say: help improve this, make it more formal, like, clean up the grammar. And so we have, like, a long list of AI editing prompts.

And then we're able to look at, basically, the cosine distance.

The distance between the original human text and the AI edit. In, like, embedding space. Exactly.

So, how much did AI change this text? Interesting. And then we're able to train our model to say, like, we're just going to, like, put a threshold on this distance and say, this is moderate AI assistance, this is, like, light AI assistance, and this is heavy AI assistance.
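A toy version of this distance-then-bucket scheme: real systems measure cosine distance between neural embeddings, while here the "embedding" is a bag-of-words count vector, and the thresholds are invented for illustration.

```python
# Grade AI assistance by how far an edit moved the text in vector space.
import math

def embed(text):
    # Toy embedding: word-count dictionary.
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine_similarity(a, b):
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm

def assistance_level(original, edited):
    sim = cosine_similarity(embed(original), embed(edited))
    if sim > 0.9:
        return "light"
    if sim > 0.5:
        return "moderate"
    return "heavy"

print(assistance_level("the pancakes at this diner were great",
                       "the pancakes at this diner were truly great"))  # light
```

A one-word insertion barely moves the vector, so it scores as light assistance; a full rewrite shares few words with the original and lands in the heavy bucket.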

Interesting. I'm going to do something I don't think I've ever done before, which is ask a founder about their corporate mission. But, you know, you've set up this company. And when you think about what you're trying to do here,

is it just basic AI detection in the sense that there might be, you know, a few groups of people like teachers that find this very valuable.

Or is the mission something broader where you're actually trying to improve the internet?

And what people see on it? I believe the technology of being able to detect AI-generated content is immensely valuable.

And it's valuable, not just for teachers, but for basically everybody in every profession.

Lawyers, publishers, just an individual who consumes content on the internet. I think it's valuable for all these people. But ultimately, yeah, our, like, high-level goal is to help mitigate some of these negative effects of growing AI content. But for instance, just using the product review example is the vision that, like,

a Yelp, for instance, would want to use this technology to make sure that its system isn't being gamed? Or is the vision, like: if I am a particularly diligent consumer who has a lot of time on my hands, and I'm looking to go out to a restaurant, I can run all these individual restaurant reviews through Pangram and then, like, actually figure out if it's real hype or not?

So, I think right now, it's a lot of the former. We work with platforms. One of our biggest customers is Quora, and they run a bunch of content through pangram. But we have a lot of different platforms that use pangram to help moderate and find AI bad actors and get them off their platform.

But I also think, yeah, the individual consumer case has been growing a lot, and we're really interested in pushing here. [Music]

The news doesn't stop on the weekends.

Context changes constantly. And now, Bloomberg is the place to stay on top of it all. Hi, I'm David Gurra. Join us every Saturday and Sunday for the new Bloomberg this weekend. I'm Christina Raphini. We'll bring you the latest headlines in death analysis and big interviews.

All the stories that hit home on your days off. And I'm Lisa Matteo, watch and listen to Bloomberg this weekend for thoughtful, enlightening conversations about business, lifestyle, people, and culture. On Saturday mornings, we put the past week's events into context, examining what happened in the markets and the world.

Then on Sunday, we speak with journalists, columnists, and key political figures to prepare you for the week ahead. Join us as soon as you wake up, and bring us with you wherever your weekend plans take you. Watch us on Bloomberg Television, listen on Bloomberg Radio, stream the show live on the Bloomberg Business app, or listen to the podcast.

That's Bloomberg This Weekend. Saturdays and Sundays starting at 7am Eastern. Make us part of your weekend routine on Bloomberg Television, radio, and wherever you get your podcasts. [Music] The free version of Pangram, on pangram.com, like, you get a handful of tests a day or something like that. If someone had an unlimited number of Pangram responses and maybe had access to

the Pangram API at infinite scale, could they theoretically learn a prompt that

they would then be able to put into an AI to generate human-style writing?

I actually had a friend do that. He put his Claude Code on a loop, and I gave him some API credits,

and then his Claude Code just basically worked overnight, writing a prompt, trying to get it to output

something that reads as human-written, or that came back from Pangram as human-written. It got there, but the text was pretty, like, incoherent. So, like, yeah, it was producing more or less long gibberish. It was, like, grammatically incorrect. A lot of the words just didn't really make sense.

Because this was my first thought, like, when I saw it. I was like, that would be, like, a fun experiment, to see if you could take all the outputs, find the difference, and just keep iterating on the prompt you would have to give the AI in order to eventually get an output that looked, to Pangram, like it was human-generated. Yeah, I think there's a way to do it if you also had, like, an LLM judge on coherency,

and used, like, Pangram and the coherency judge both to score your text. I think it's definitely possible, and I'm excited for someone to try to do it, because we could make our model a lot better and more robust if this existed. So I want to know what your personal, like, token budget is nowadays, that you're even, like, contemplating some of this stuff.
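The loop described here can be sketched roughly as follows. The detector and coherence scorers below are toy stand-ins, not the real Pangram API or an actual LLM judge — the point is only the shape of the optimization: score candidates on both "looks human to the detector" and coherence, and keep the best.

```python
# Toy sketch of the red-teaming loop: score candidate texts with a
# detector "human-ness" score AND a coherence score, keep the best.
# Both scorers are illustrative stand-ins for real APIs.

def detector_human_score(text: str) -> float:
    # Stand-in: pretend varied vocabulary looks more "human".
    words = text.lower().split()
    return len(set(words)) / max(len(words), 1)

def coherence_score(text: str) -> float:
    # Stand-in for an LLM coherence judge: penalize short fragments.
    return min(len(text.split()) / 10, 1.0)

def combined_score(text: str) -> float:
    # Optimizing the detector alone tends to yield gibberish (as in
    # the anecdote), so weight coherence equally.
    return 0.5 * detector_human_score(text) + 0.5 * coherence_score(text)

def best_candidate(candidates):
    """Pick the candidate with the best combined score."""
    return max(candidates, key=combined_score)

samples = [
    "word word word word word word word word word word",
    "The committee reviewed the budget and postponed the final vote.",
]
print(best_candidate(samples))
```

In a real run, each iteration would generate new candidate prompts from the current best one and re-score, rather than choosing from a fixed list.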

You know what, I feel like, I have the Claude Max plan, you know?

And when I'm at work, I don't work on any of my vibe coding projects. And, you know, like when we were kids, I don't know if you remember, if you didn't eat all your food, someone would say, well, there's, like, starving kids in the world.

Yeah, starving kids. Yeah.

It's like that. I have this five-hour token window and I'm almost never

maxing it out. And I'm just like, there are kids on the other side of the world that wish they had your tokens. And you're not using all of your tokens for the window. How dare you? I feel a little guilty when I don't max out my Claude Max token plan.

I also have Claude Max, and yeah, most days I'm not doing much coding, so I'm not maxing it out. And then some days I'm really going through the budget. We'll talk about that, though. Oh, yeah. Yeah.

So can I ask you, like, writing is kind of interesting. But, like, what are the prospects of this being able to work on, say,

and you must get this a lot, image and video generation?

Is it it all theoretically similar? Is there reason to think that it will be replicable? Or is this just a different beast of a problem? I think the approach is definitely doable. I think some of the economics change,

especially if we look at video and the cost of generating video today. Okay. We can't generate video at the same scale that we can generate text. And so we might need a kind of different approach. But I also believe that if we're able to solve this for image,

plus maybe, like, audio, that could be enough to just solve it for video as well. Huh, zero shot. Could you ever envision, I don't know, launching some sort of, like, certification program for video? Because, this seems to be, my dad's a boomer, spends a lot of time on Facebook.

Like, this seems to be what society needs, right? Like, a video that comes with a little thing that says this is not AI generated, and someone has actually, like, rubber-stamped that. So there's an organization called C2PA. And I think they're doing pretty good work on content provenance.

Basically they are working with phone makers and hardware makers to basically embed like hardware signatures.

To prove that image and video were truly taken from the hardware. Like watermarks, basically. Yeah, exactly. So rather than marking the AI outputs, we're instead embedding, like, a proof of authenticity in the thing that's real, captured in real life.

That's interesting.

All right, still big picture. Where's the internet going? You know, you mentioned 40% of the internet is already AI generated. But maybe that's not the end of the world.

Like, you know, if it's just a bunch of SEO pages that I never read.

I don't know, whatever. But, like, give us some thoughts, high level, about, like, the trajectory of the internet. Regardless of the uptake of Pangram and other AI detection models, I'm a little bit worried about the state of the internet.

I'm going to be honest.

I think, like, right now, there's still, like, so much of it that is built around trust and norms, in a way that, like, we're not really well equipped to suddenly deal with

an onslaught of bots at a completely different scale than we've dealt with before. There's maybe, like, a good case and a bad case. I would say, like, the bad case is the internet goes the way of dead internet theory. Just, like, every space that's open and accessible is just flooded by bots. And then the only place people are able to communicate authentically is in, like, very walled-garden, closed

servers, like Discord servers, for example, where, you know, everybody's identity is known, and you know there are no bots in here. So that's maybe, like, the bad scenario. Can I tell you the same thought that I've had? Go on. Let's see, you're going to get a kick out of this.

So, have you heard of, I forget what they call it, this idea with bad actors. It's called, like, heaven mode, or heaven banning. Have you heard of this? So there's this thought that one way you could deal with bad actors on the internet is, suddenly they're on a version of, say, Twitter. Okay.

In which there are only bots, and everyone always agrees with them on everything, and it drives them crazy and stuff like that.

And they would never know it, because they're like, oh, it's clever. And then, so, like, slowly, like, yeah, you could punish people by putting them on a version of the internet where they will never get any fighting. You get heaven banned and put into, basically, jail. You're talking to a bunch of bots. All right. That's right. That would be jail. We were heaven banned.

But I thought, and again, this is, you know, like, I built this little app myself and I, like, showed it to my friends. Like, oh, cool, Joe, I'm really impressed. Like, I'm really impressed that you were able to do this. And I was like, are people being honest with me, or have I been heaven banned? Because I was just like, you can be honest with me if it sucks. And I sort of have this fear.

The biggest humble brag ever. Like, I built this thing and everyone thought it was great. I'm just saying, like, I'm worried that, like, people are being nice to me because, like,

Oh, cool. Yeah. That's so impressive. You did that.

And I have this, like, deep anxiety that, like, people aren't giving it to me straight about it. I know that sounds like a humble brag, which it's really not. It's like, you can never get, like, too successful, lest you're surrounded by a bunch of yes-men.

So you never get, like, oh, you see, this is his first try doing something with vibe coding.

I'm, like, deeply anxious. Like, no, you can just tell me if it sucks. That's fine. That's my worry. I don't worry about this. If I tweet that I'm eating a steak, I will get, like, a hundred people criticizing me, that I didn't cook the meat right. Yeah. Yeah. So that's the other thing, which is that the two things you are never allowed to tweet about: meat preparation and enjoying life. Because if you ever enjoy life and if you ever prepare meat, people will flip out at you.

On the internet, those are the two things that you're not allowed to do online. Very true. It's a sort of related question, but just going back to the methodology. If you're focused on this sort of, like, path-dependent idea, I'm kind of envisioning it as, like, a giant decision tree, right? Yeah.

Is there a possibility that, as the models get better and better? And we know that they're already injecting, like, some degree of randomness into their output. Although, I know there's going to be a pedant out there who, like, messages me and says, like, "Well, you know, computers can't do, like, true randomness." But, you know, setting that aside, setting that aside, like, we know that they're adjusting. They're becoming more sophisticated at an incredible rate.

We know that they're trying to adjust and inject some randomness in order to avoid exactly this kind of detection. Do you worry about their own adaptation at all?

I have noticed that the models as they get more capable, I believe it's like their output distribution gets more complex.

Yeah. It's harder to learn with a simple model, which is why we've been increasing our model size, to capture a higher-complexity function that can capture the LLM output. So I think we may have to continue to make our models better. We're going to have to work to keep up with it. Yeah.

We can't just rest on our laurels. What are burstiness and perplexity? Yeah. So, these are metrics that are used by some AI detectors, but not Pangram. Okay.

And so I can explain a bit about how it works. So perplexity is basically a measure of— This is not Perplexity.ai, the website. This is a technical term. Okay.

This is a metric. This is a measure of how confusing a piece of text is to a language model. So basically, with every token, we can calculate some perplexity, which is basically, like, how expected is this? So, for example, if it's "I went home to my pet," and then the next token is "chinchilla," that would be a much higher-perplexity token than "my pet dog."

Mm-hmm. So low perplexity text, or really, like, LLM outputs tend to be low perplexity.

They're not going to produce outputs that are surprising to themselves.

So this is a decent way to get an AI detector that's around 90 to 95% accurate. But it has some problems. The main one is that you can't improve upon it.

Basically, it has false positives.

Text written by non-native English speakers often is low perplexity, just because when you're learning a language— They don't take as many risks, exactly. Yeah. Yeah.

So that's why a lot of the early AI detectors had a bunch of false positives with ESL speakers.

It's because their text was low perplexity. So I think this is a very cool metric, but it is not the path for Pangram. Instead, we went the deep learning approach, so we can do better. What's burstiness? Is that just the opposite side of the coin? Yeah, burstiness is basically— actually, yeah, I don't know if I can define it.
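The perplexity measure described above can be sketched in a few lines: perplexity is the exponential of the average negative log-probability the model assigns to each token, so one very surprising token ("chinchilla") pushes it up sharply. The per-token probabilities below are made up for illustration — a real detector would read them off an actual language model.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of the tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# "I went home to my pet dog" -- every token fairly expected.
low_ppl = perplexity([0.9, 0.8, 0.9, 0.7, 0.9, 0.6])

# Same sentence ending in "chinchilla" -- one very surprising token.
high_ppl = perplexity([0.9, 0.8, 0.9, 0.7, 0.9, 0.001])

print(low_ppl < high_ppl)  # the surprising ending raises perplexity
```

This is also why the metric alone tops out around 90-95% accuracy as a detector: consistently "expected" text, such as writing by non-native speakers, scores low-perplexity too and gets flagged.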

Okay, fine. Yeah, okay. Burstiness just sounds like one of those, like, sort of, I guess, manosphere terms, doesn't it? Oh, yeah. Oh, yeah.

He's, like, he's been looksmaxxing with high burstiness. Yeah. Yeah. That's great.

Yeah, I think it might just be, like, a measure of, like, sentence length, and the ups and downs of the text.
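Under that informal definition, burstiness can be sketched as variation in sentence length: uniform lengths (common in LLM output) score low, while human writing mixing short and long sentences scores high. This exact formula is an illustration, not the metric any particular detector uses.

```python
import statistics

def burstiness(text: str) -> float:
    """Std. deviation of sentence lengths (in words) as a rough burstiness score."""
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # need at least two sentences to measure variation
    return statistics.stdev(lengths)

uniform = "This is a sentence. Here is another one. Now a third sentence."
varied = "Short. But this next sentence runs on for quite a few more words than the first."
print(burstiness(uniform) < burstiness(varied))  # True
```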

But if we assume that the world is collectively concerned about AI slop and wants to do something about it, what would be, like, the single biggest change to the system, either in terms of, like, the economics of the internet or regulation or technology, like what you're developing, that would actually help reduce slop? Yeah, I think the biggest one is norms. So there have been a couple great blog posts written about how it is rude to send other people undisclosed AI outputs. And I, like, completely agree here.

I think, you know, if somebody, like, asks a question on the internet, and then somebody else, like, goes and puts it into ChatGPT and then, like, pastes the answer, it's kind of rude. Like, I was going here to ask the opinions of my friends or, you know, my followers, not ChatGPT. I could have done that myself. And so I think, like, building this norm is something that, you know, it's very new technology, so we need to do it quickly. But I think this would help a lot for society.

Well, that actually just gets to the question that I have then, which is, I feel as though the major internet platforms are actually moving in the exact opposite direction. I mean, I'm stunned. Maybe I accidentally clicked on something at some point, but the frequency with which I get an email, and then I open it up to respond in Gmail, and there's that ghost text there, like, do I just want Gemini to respond to this? I've never done that.

I also consider, I think that would be extremely rude.

I've never responded to any email with an AI-generated response.

But they're basically telling you to do that. They're doing the exact opposite. They're blowing up these norms. And so I'm curious, from your perspective, you mentioned work with Quora. But from your impression, do the major internet platforms

think this is a problem worth solving? Or is their attitude, like, you know what? The more content, the better. There's mixed incentives there. What does that mean for the big companies?

It's funny because Google seems to be playing both sides. So on one hand, they had that advertisement, which people kind of blew up about where it's like, Oh, children can now send their heroes notes on, like, how much they respect them by using AI instead of like writing the note themselves. Like, this is wrong. This is like, societally bad.

But at the same time, they're working very hard to deal with the AI slop on the internet in search results, to make sure they're dealing with it. I mean, I think obviously there's a lot of incentives at play around, like, product people who are incentivized to push AI because that is the corporate mandate. But yeah, I think overall, even, like, in my sphere of a bunch of people who are AI researchers, the general consensus is that, like,

AI is a powerful tool, but, like, slop is bad.

This reminds me, my parents used to make me do these, like, handmade greeting cards for every, you know, for Christmas, for, like, all relatives and stuff. And it was supposed to be a demonstration of my commitment to communicating with family. No, no, it traumatized me forever. And I hate greeting cards as a result of them making me do this, just spending hours manufacturing these things. But then secondly, the funniest thing was, once we got e-cards, my parents immediately switched to using e-cards.

And now, and this is also the funniest thing, my dad uses e-cards. He figured out that the e-card system can tell him whether or not you opened it. So he just uses it as, like, day-to-day communication now. That's so funny. Just send an email to your daughter.

I do it via e-card. It's like, I noticed you haven't opened my e-card for International Hot Dog Day. Please let me know what's going on. I was so terrible at writing as a kid, and my mother made me write all of these handwritten notes to thank people for the gifts I got from my bar mitzvah. Yeah.

I hated it. But you know what? I have deep connections with all of those people that have lasted over the years.

That miserable one week where I just wrote and I got, you know, hand cream.

I think it paid off. All right.

Well, imagine doing that for like 16 years, basically, um, in a never-ending stream.

Max Spero, thank you so much for coming on Odd Lots. That was a lot of fun. I was fascinated by this conversation. Thanks so much for having me.

Yeah, really exciting to talk about this, and I think slop is a growing problem, so hopefully

we're able to deal with this. Awesome. 40% of the internet. I can't tell if I'm surprised by that or not. Well, what's it going to be next year at this time?

Oh, man. I don't know. It'll be hard to say. It'll be higher, though. Um, yeah, almost certainly.

Great. All right. Thanks for coming on Odd Lots. Thanks. (music playing)

Tracy, I love that conversation.

I just think it's like a really fun puzzle, right? Yeah, no, totally. It's very, like it seems like a fun question to solve. And I'm fascinated by this idea of how, like, with both humans and AI,

there's going to be this gap, inevitably, between what we know and what we can articulate. Because, with identifying AI versus human text, there are things that we both know. For example, this is newsworthy and this isn't. This is a good episode of a podcast.

Yeah. This isn't. This is a credible-sounding guest and this isn't. The gap between that, and then being able to, like, explain why. It's like, well, you just sort of know it, right?

You just sort of have this feeling. There's an intuition. And that intuition is built up from numerous examples, which is the same way in a sense that, like, the AI is trained. It's like these things that you only know from patterns.

And you can see them without fully being able to, like, articulate exactly what's going on. Well, the other question I would have on that is, is it even going to matter in the long run? If you think about, like, so much of the internet is already built on bots. And the sort of false attention economy.

Like, if our entire, like, worldview becomes shaped by AI-driven content. Yeah.

Does it matter if, like, the economics of the internet are still attached to individual bot accounts and things like that?

I don't know if I'm explaining this, but... No, no, I think it makes sense. And I do think, like, it is important. Like, we're going to have to change the entire way we've been doing things. Yeah.

As Max said at the beginning, which is, and I've thought about this, which is that it used to be that if you came across a piece of writing, and the punctuation was excellent. Mm-hmm. And the spelling was excellent.

And it was, like, cogent-sounding. Okay, this has been written by a smart person. I will read it with a degree of seriousness. Right?

And now there is this complete severance of, sort of, like, craft and output. Because you could, and you did this, like, ask Claude to write an argument in favor of the most absurd proposition. Yeah. I asked Claude to write an argument for me that the reason why Reagan wanted to do tax cuts in the

early 1980s related to these reports of UFO sightings in the 1970s. And it will write something that not only is grammatically correct, it will actually, like, strain to come up with the best version of this argument. And before, if you came across that writing, you'd think, like, oh, maybe this person took this argument seriously.

But now this argument is just created, like, ex nihilo. We're going to have to really, like, change our heuristics about this stuff.

We've created an unlimited stream of basically cranks with really good grammar.

Yeah. That's right. That's right. Because it used to be, we knew the cranks because they had bad grammar, or they would email us and, like, half the words would be in yellow and the other half would be underlined.

Green ink was the classic example. These are the tools that we used to be like, oh, this person's a crank. Like, you know, half the words are in all caps and stuff like that. Those don't work anymore. All right.

On that note, shall we leave it there? Let's leave it there. This has been another episode of the Odd Lots podcast. I'm Tracy Alloway. You can follow me at Tracy Alloway.

And I'm Joe Weisenthal. You can follow me at The Stalwart. Follow our guest, Max Spero. He's at Max underscore Spero underscore. Follow our producers,

Carmen Rodriguez at Carman Armann, Dashiell Bennett at Dashbot, and Kail Brooks at Kail Brooks. For more Odd Lots content, go to Bloomberg.com/oddlots, where we have the daily newsletter and all of our episodes. And you can chat about all of these topics

24/7 in our Discord, discord.gg/oddlots. And if you enjoy Odd Lots, if you like it when we talk about how the internet is 40% slop, then please leave us a positive review on your favorite podcast platform.

And remember, if you are a Bloomberg subscriber, you can listen to all of our episodes. Absolutely, ad-free.

All you need to do is find the Bloomberg channel

on Apple Podcasts and follow the instructions there. Thanks for listening. [Music]

I'm Francine Lacqua, an award-winning journalist.

And I've got a new podcast.

Leaders with Lacqua, from Bloomberg Podcasts.

I've interviewed everyone from heads of state

to fashion icons about the news of the moment.

But I've always been curious who are these people as leaders.

I don't think there's one right way to be a leader.

Make decisions.

A poor decision is always better than no decision.

Listen to new episodes every other Monday. Follow Leaders with Lacqua wherever you get your podcasts.
