Science Friday

Move over, vibe-coding. Vibe-proving is here for math


When ChatGPT first came onto the scene, it wowed users with its writing abilities, but drew laughs for generating images of seven-fingered hands and struggling with basic math, where 2+2 didn’t always...

Transcript


(upbeat music)

- Hey, I'm Flora Lichtman, and you're listening to Science Friday. When ChatGPT came out a few years ago, one of its most impressive features was its writing abilities, but something it was often ridiculed for, you may remember, other than making images

of seven-fingered hands, was its inability to do basic math.

Two plus two did not always equal four.

But recently, things have changed. Last year, Google's and OpenAI's models bagged gold medals at the International Mathematical Olympiad, edging out some of the best high school mathletes in the world.

And now, some experts say AI is getting so good, it could pose an existential threat to the field. Here with their perspectives are two mathematicians who have thought a lot about this: Dr. Emily Riehl from Johns Hopkins University

and Dr. Daniel Litt from the University of Toronto. Emily, Daniel, welcome to Science Friday. - Thank you. - All right.

We have seen, at least in the nerdy media that I consume, these very big claims that AI is suddenly revolutionizing math, that it's fundamentally changing what it means to be a mathematician.

I want your takes on that. Emily, let's start with you.

- I think from the perspective of a professional mathematician,

we're trying to evaluate where AI is on kind of the trajectory of a mathematician's life.

So, you know, we first encounter mathematics at school,

where you're solving problems that have a numerical answer, maybe involving some geometric figures, maybe involving some arithmetic or some algebra. As you mentioned already, AIs used to be very bad at those problems,

and they're much better now. More recently, AIs have been doing well with more advanced contest-level problems, the contests for undergraduate mathematics majors, where they're writing proofs. And what's exciting,

and perhaps scary, in this past year is that they're starting to be tested on more research-level problems. So problems that are of interest to people who get paid to do mathematics. - Daniel, what do you think?

- Yeah, so the specific claim that AI tools are revolutionizing mathematics definitely has not come true yet. I think there are leading indicators that it will be very significant. It's definitely changing

the way professional mathematicians, including me, work. But in the cases Emily mentioned of AIs being tested on research-level mathematics, there are a few success stories. I think those are exciting and leading indicators.

There's not any example of anything

that I would consider to be revolutionary yet,

But who knows what the next year or two will bring. So, I'm excited about that.

- And I think an important part of the story

is that mathematics is not just about solving problems; even research mathematics is not just about solving problems. I mean, solving problems can be beautiful, but actually stating the problems, figuring out what is an interesting theorem to try and prove,

is equally important. That's really what drives the field forward, and we haven't seen any examples there where AIs are coming up with problem statements or new mathematical universes to explore. Like picking the things that we should be trying to solve.

- Yes. - I mean, is AI solving problems that people haven't been able to solve? Has that happened? Does it seem like it's on the horizon?

- One way to answer that is that there is so much math that still needs to be discovered, and there are a lot of mathematicians working together and individually all around the world trying to make new discoveries and prove new theorems.

But there aren't enough of us to solve every problem. And so in particular, there's a famous database of problems tied to the mathematician Paul Erdős that has been collected online. And some of them are quite famous

and have received a lot of attention. But others, we don't actually know if anybody's given a serious effort to them.

- So I'm aware of, I think, maybe three Erdős problems

that have been solved fully autonomously, and maybe another six or seven without prior solutions in the literature that were solved by a human with AI help. And then a number where solutions were kind of extracted from the literature by AI tools.

I think that's a really exciting development. I think that said, we should view this as an exciting sign that soon the tools will be useful to help us solve maybe more interesting questions that people have devoted effort into trying to answer.

- You know, it's interesting that you use the word exciting. Just as a contrast, we had mathematician Steven Strogatz on last year, and he said something I want your thoughts on.

- I think the days when we will understand math may be numbered, that it will not be far in the future

When computers are producing really impressive math

that we will not understand.

And it will be correct, but it'll be like they're oracles just telling us the truth, and we can't understand it. We will just be sitting there with our mouths open. - Emily? - I agree 100% with Steve that if we had an oracle to tell us which theorems are true

and which theorems are false, that would be unsatisfactory to mathematicians. We do like it when a proof tells us something new, that something is true or something is false. But if the proof doesn't also explain something deep,

to sort of justify the new conclusion, then mathematicians will go and search for a different proof that does. People get celebrated for reproving old theorems almost as much as for proving new theorems. What I would imagine, if an AI discovers a proof

of a theorem that humans have difficulty understanding, is that humans will then try to prove it themselves, and may discover a new proof that way. But I also want to bring into the conversation the fact that there are sort of two different modes

in which an AI can produce a proof. And if it's really a proof that is beyond the level of human understanding, and it's communicated just in natural language, say, as the output of a ChatGPT session,

there's an argument that actually mathematicians should just throw it in the garbage and ask instead for an AI to produce a proof that is formalized. - Formalized, like in the language of math,

is that right? - So a thing that's exciting about the possibility of generating AI for mathematics

that I think doesn't exist in quite the same way in other fields

is that there are these trusted software programs called computer proof assistants that are engineered by human experts, so not produced with AI in any way, that will take a very precisely written mathematical proof and check the logic line by line.
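The episode doesn't name a specific tool, but Lean is one widely used proof assistant of this kind. As an illustrative sketch (not from the episode), here is what a fully formalized statement and proof look like; the checker verifies each step mechanically and accepts nothing on faith:

```lean
-- Illustrative Lean 4 examples (not from the episode).
-- The proof assistant checks each claim mechanically: if any step
-- were wrong, it would refuse to accept the theorem.

-- Both sides compute to the same value, so reflexivity closes the goal.
theorem two_plus_two : 2 + 2 = 4 := by
  rfl

-- Commutativity of addition on the natural numbers, proved by
-- appealing to a lemma from Lean's standard library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If a step doesn't follow, the checker reports exactly which goal remains unjustified, which is the line-by-line verification described above.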

So a concern that many mathematicians share is that we're going to get a lot of sort of AI slop, a lot of new preprints purporting proofs of very famous theorems, but they're long and complicated

and require a ton of expertise to evaluate

and humans will just never be able to judge these proofs

for accuracy. But just as a large language model can produce Python code, a large language model can be asked to write the proof not in English, but in the language of these computer proof assistants, and then at least some aspect of the refereeing

can be automated, can be outsourced to a computer. And I think that's a more productive realm in which to interact with AI than just the natural language realm. - Daniel, I want to hear your thoughts on that, and also on this idea that AIs will become oracles for us in math.

- Yeah, so I definitely think it's possible that in the long term, AI tools will become better at proving mathematical statements than human beings. We're not there yet, but it could happen. And as Emily just said,

that would be a little bit unsatisfying. My goal as a mathematician is to understand mathematics. That said, it's not clear to me

that AI tools producing amazing and long proofs

of mathematical statements would mean that we couldn't understand them. Like, why wouldn't we expect an AI tool able to prove

some awesome theorem also to be able to explain it to us really well?

So yeah, I feel confident that if I'm sufficiently motivated to understand a piece of mathematics produced by a computer, then I'll be able to do it, and that's kind of my goal. - But I guess, I mean, there's a big difference, a kind of a moral difference, between AI-generated mathematics

and human-generated mathematics. So there's a very strong norm in the mathematics community that you do not post a proof to the internet, to the arXiv, which is our mathematics preprint server, unless you believe that it's true.

And we can't ask the same things of large language models, because they don't, you know... I mean, they're completely untrustworthy, they don't have beliefs. I don't really have a theory of mind of a large language model,

so these sort of metaphors are imperfect. So I think we should demand more of a proof written by an AI than a proof written by a human because we don't have this sort of norm of trust, this norm of belief.

So if an AI gives us a large proof that is difficult to understand, then, as Daniel said, the AI has not finished its job; it should also give us an explanation. But maybe that explanation should also be in this formal realm, so we can use computer tools to start the verification process,

which will be very long. - We have to take a break, but when we come back, I want to talk about vibe proving.

You've heard of vibe coding; have you heard of vibe proving?

You will. Stay with us. - Hey, it's Flora. I know that you have heard this before, but it's not a line.

Science Friday really can only continue with support

from you, our listeners.

I love making and telling these stories,

and if you like listening to them,

please go to sciencefriday.com/donate to make a donation. Join us, stand up for the value of Science Friday and public media, and help us continue to spark curiosity and spread the joy of science. Donations are fast, easy, and secure,

just go to sciencefriday.com/donate, and thank you. - How does AI even work? Where does creativity come from? What's the secret to living longer? TED Radio Hour explores the biggest questions

with some of the world's greatest thinkers. They will surprise, challenge, and even change you. Listen to NPR's TED Radio Hour, wherever you get your podcasts. - Okay, so we've heard of vibe coding. Is there an equivalent in math?

Like, could I ask it, could I be like, I don't understand how to do this? Can you solve this problem? - You could certainly ask that, and then, well, sometimes it would give you the right answer

and sometimes it would give you a wrong answer. There's still a lot of work to do for a human,

which is that you have to check the answer is correct.

That's hard. So you can vibe-prove something, and then, well, often it will be wrong, but sometimes it'll be correct, and sometimes it'll be really useful.

So, yeah, I think this is starting to become part of the mathematical workflow. It's still important at the end of the day that a human is there taking responsibility for the correctness of the results.

And I think we can see some examples of this going wrong. Like, in September of last year, I counted the number of papers put on the arXiv, which is the main math preprint server, with the words "Hodge conjecture"

in the title or abstract. The Hodge conjecture is one of the Millennium Prize Problems. I think there were about 12 total, and as far as I could tell, 11 of them were nonsense generated by AI tools.

- Like hallucinations, is that what we're talking about? - Well, I think it was a human who wanted to prove the Hodge conjecture,

'cause there's a million dollar prize associated to it,

so they said, "Claude, prove the Hodge conjecture, don't make any mistakes," and then they posted whatever came out of that to the arXiv. And it wasn't correct. - Right, and what Daniel's alluding to is there's a danger

with vibe proving, just like with vibe coding. So vibe coding, for an expert programmer, can be an enhancement of their workflow, but it can also allow somebody who's a total novice (I'll put my hand up here in the programming context)

to think that they can achieve more than they actually can with computer programming. It requires a lot of expertise to know when it's bluffing and when it's sound. - Well, one thought on this is that actually the ways

the models bluff are remarkably similar to how humans bluff in a proof. - Really, say more about that.

- Yeah, so you can ask, well, first of all,

you can ask an undergrad to prove something on their homework, and often they'll leave some things out that they don't know how to do and hope to get full marks. Sometimes that works out. Even if you ask a professional mathematician,

just off the top of their head, prove something, they might have some kind of general idea and not think through all the details, and sometimes that'll be right, and sometimes not. So I think it's actually remarkable

to watch the models doing math and see in some ways how human they are. - I mean, can the proof assistants sort of root out the AI math slop?

- Yes, I think so, though there are some caveats there.

Firstly, it's a lot more exacting, so you can't sort of wave your hands and skip a few steps; a proof assistant will not accept it, or it'll make a note of exactly what has not yet been justified. So I do think it can help weed out some slop,

and in particular, there are some AI startup companies that are aiming to get better at writing formal proofs, and they are training by using the feedback from the proof assistant to evaluate partial proofs. The proof assistant is meant to be used interactively,

and will give you some information about whether each line of your proof is correct or incorrect, and so that information could be integrated into an AI workflow to improve the search for proofs, for instance. - Do these tools change what skills are required

to be a great mathematician? - You know, I think one of the things that's most challenging about being a mathematician today is just the breadth of the field. So Daniel and I are in different enough research areas

that I think we would have difficulty understanding each other's most recent papers, and I also know that it's very difficult to keep on top of the literature, even in my own subject area, because there's a lot of exciting work

that gets finished every month, and I don't read at the pace that I would like to. So a thing that I'm optimistic about is that these tools will help me stay more on top of the literature. - Yeah, I've found them really useful to learn things

that I need to look at. Over the summer, I learned from

ChatGPT's o3 model about hyperkähler geometry,

which is something that I could have gotten a human expert and pestered them about, but they would have gotten annoyed with me much more rapidly than the AI did.

- I think that any time a new technology is developed,

it obviates some skills, by automating them, that humans previously needed, and it also opens up some new capabilities, and I absolutely expect that to happen with AI tools in mathematics.

Right now, I think it's sort of too early to figure out exactly what skills will become less necessary and what new capabilities will open up, but I'm hopeful that it will open up

a lot of new possibilities.

- I mean, I think there are a lot of misconceptions about what a great mathematician looks like. There are some stereotypes that this is something that's identifiable when you're very young, and in fact, maybe when you're my age...

I'm 41, which means I'm no longer eligible for the Fields Medal, but I'm a much better mathematician now than I was 20 years ago, because I have spent so much time thinking about mathematics in the last 20 years. And some of that is reading the literature,

some of that is attempting to solve a problem and failing, but failing in a way that when a similar problem comes up later, I will have better intuition about what avenues might be productive, and what might be not productive.

So I really think what makes a great mathematician is time and dedication to mathematics. Anybody who falls in love with the subject and is fortunate enough to be able to devote themselves to it is gonna be able to achieve some pretty cool things.

- Yeah, so, well, what makes a great mathematician? So certainly, I think loving the subject, the motivation, like Emily was talking about, is very important. You definitely need some technical ability,

and I think that's what people usually think about

when they think about strong mathematics, like, "Can you solve this problem? Can you prove this theorem?" But I think in practice, the most influential mathematics is not actually about solving problems,

or proving theorems; it's about something that's a little bit less tangible, like some kind of philosophy or mysticism, even. Like, you're trying to develop a theory, you're figuring out why something is true,

often that involves discovering a new structure, and that structure might not even be something that is really precise, like you spread a philosophy to a community of other mathematicians and give them a way of understanding new mathematical objects.

I think you don't really think of that as... it's not a purely formal thing. You don't necessarily think of that as being mathematics when you're in high school or middle school or college or whatever.

But yeah, the most important parts of mathematics

are actually closer to philosophy, I think,

than science. - Well, the final thing to highlight is creativity, and where great ideas often arise is really in conversation between mathematicians. I mean, we travel all the time

to speak to each other in person at a chalkboard about some mathematical ideas, and it's amazing in a collaboration that a new direction can just originate spontaneously from a bringing together of multiple minds.

- I love that. Emily and Daniel, thank you for taking the time to talk to us today. - Thanks for having us. - Thank you, yeah, that was fun. - Dr. Emily Riehl, mathematician at Johns Hopkins University,

and Dr. Daniel Litt, mathematician at the University of Toronto. This episode was produced by Dee Peterschmidt. If you're surprised to find that you got all the feels from a conversation about math, please leave a review and tap follow or subscribe wherever you listen.

Thank you for listening. I'm Flora Lichtman. (upbeat music)
