Conversations with Tyler

Brendan Foody on Teaching AI and the Future of Knowledge Work

January 7, 2026 · 1:01:18 · 11,114 words

At 22, Brendan Foody is both the youngest Conversations with Tyler guest ever and the youngest unicorn founder on record. His company Mercor hires the experts who train frontier AI models—from poets g...

Transcript


[MUSIC]

Conversations with Tyler is produced by the Mercatus Center at George Mason University,

bridging the gap between academic ideas and real-world problems. Learn more at Mercatus.org. For a full transcript of every conversation, enhanced with helpful links, visit conversationswithtyler.com. [MUSIC]

Hello everyone, and welcome back to Conversations with Tyler. Today I'm sitting here chatting with Brendan Foody at the offices of Mercor. Mercor is an AI company, which dates from early 2023; we'll get into more detail soon enough. Brendan is the CEO and co-founder.

I believe he's the youngest unicorn founder ever.

Mercor by some estimates is the fastest-growing company ever,

for instance, the quickest to reach $400 million.

Brendan, also, at age 22, is the youngest Conversations with Tyler guest ever. My proudest achievement. There's more we'll get to soon enough, but Brendan, welcome. Thank you so much for having me, Tyler,

excited to be here. Now, I saw an ad online not too long ago from Mercor, and it said $150 an hour for a poet. Why would you pay a poet $150 an hour? That's a phenomenal place to start.

I think it's because, so for a background on what the company does,

we hire all of the experts that teach the leading AI models. And so when one of the AI labs wants to teach their models how to be better at poetry, we'll find some of the best poets in the world that can help to measure success, to be creating evals and examples of how the model should behave. And one of the reasons that we're able to pay so well to attract the best talent is that

when we have these phenomenal poets that teach the models how to do things once, they're then able to apply those skills and that knowledge across billions of users. Hence allowing us to pay $150 an hour for some of the best poets in the world.
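The leverage Brendan describes, a one-time expert cost spread across an enormous user base, is easy to sketch in back-of-the-envelope form. All numbers below are illustrative assumptions, not Mercor's actual figures:

```python
# Back-of-the-envelope: an expert's one-time teaching cost, amortized
# across every user who benefits from the improved model.
# All numbers are illustrative assumptions, not Mercor's actuals.

hourly_rate = 150              # dollars per hour for a top poet
hours_of_work = 100            # hours spent writing rubrics and examples
users_reached = 1_000_000_000  # users of the resulting model

total_cost = hourly_rate * hours_of_work    # one-time fixed cost
cost_per_user = total_cost / users_reached  # amortized per user

print(f"Total expert cost: ${total_cost:,}")
print(f"Cost per user reached: ${cost_per_user:.6f}")
```

The point of the arithmetic: once the fixed cost is divided across billions of users, even a very high hourly rate becomes a negligible per-user expense.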

So the poets grade the poetry of the models or they grade the writing or what is it they're grading?

It could be some combination depending on the project, but an example might be similar to how a professor in an English class would create a rubric to grade an essay or a poem that they might have for the students. We could have a poet that creates a rubric to grade, you know, how well the model is creating whatever poetry you would like, and a

response that would be desirable to a given user. How do you know when you have a good poet or a great poet? That's so much of the challenge of it, especially with these very subjective domains and liberal arts, right? So much of it is this question of taste, where you want some degree of consensus of different exceptional people believing that they're each doing a good job, but you probably

don't want too much consensus, because you also want to get all of these edge-case scenarios of what are the models doing that might deviate a little bit from the norm. So you want your poet graders to disagree with each other some amount? Some amount, exactly, but still a response that is consistent with what most users would want to see in their model responses. Are you ever tempted to ask the AI models how good are the poet

graders? We often are. We do a lot of this, where we'll have the humans create what's called a rubric or some sort of eval to measure success, and then have the models share their perspective, because you actually can get a little bit of signal from that. Especially if you have an expert. I mean, you know, we have tens of thousands of people that are working on our platform at any given time, and so oftentimes there will be someone that

is tired or not putting a lot of effort into their work and the models are able to help us

with catching that. So you've had a recent project. You hired Larry Summers, I believe,

for finance and economics. That was a little bit of a unique deal. He's been a guest on this podcast. Cass Sunstein for law. He's been a guest twice on this podcast. Eric Topol for medicine. I've been a guest on his podcast. How do you pick those people? Obviously, they're highly accomplished. What makes them good at doing this other than just being smart, productive people? Absolutely. I'll step back and provide a little bit of context on

APEX, or the AI Productivity Index, and why we chose them to help with it. The largest disconnect that we were seeing in AI research is that everyone was focused on academic evals like GPQA for PhD-level reasoning or IMO for Olympiad math, which were wholly disconnected from the outcomes that customers actually care about: how do we get the model to automate a medical diagnosis or a legal draft or preparing a certain financial analysis of a company?
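As a rough illustration of the kind of economically weighted eval Brendan describes, here is a minimal sketch: per-task rubric scores are weighted by the share of expert time each task represents, so the aggregate tracks economic value rather than academic difficulty. The task names, time shares, and scores are invented for the example, not APEX's actual data:

```python
# Sketch: weight each task bucket's rubric score by the fraction of
# expert time that bucket represents (from a survey), so the aggregate
# reflects economic value. All numbers are invented for illustration.

# fraction of a consultant's week spent on each task bucket
time_share = {"client_meetings": 0.30,
              "research_analysis": 0.45,
              "deliverables": 0.25}

# model's average rubric score on tasks in each bucket (0.0 to 1.0)
model_score = {"client_meetings": 0.40,
               "research_analysis": 0.70,
               "deliverables": 0.65}

weighted = sum(time_share[t] * model_score[t] for t in time_share)
print(f"Economically weighted score: {weighted:.4f}")
```

A model that excels only on rare tasks scores poorly here; improvement on the buckets where experts spend the most time moves the index the most.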

And so we chose legal experts, medical experts, finance experts, people that have a broad economic perspective, to see what is the right methodology to think about measuring success across each of these domains. Working with them on segmenting what are all of the different

industries within law, and then leveraging our

marketplace of all of these experts to best capture and measure how well models have automated all of those domains. So it's because they've had real-world experience and they're not

only academics. Is that the way to think about it? I think that's part of it. I think a lot of them

obviously have meaningful real-world experience, but also this broad vantage point of the entire industry, right? It's not just someone that specializes in a particular type of law or a particular industry in big law, but rather having this very large perspective on how we should structure the project, how we should think about the rigorous processes associated with curating the datasets, setting up the reviews, etc. And the paper you did with that group of people as your

researchers, and many others, I should add: what's the main thing you all learned from that exercise and from the paper? I think the largest takeaway is the rate of model improvement at economically

valuable tasks is incredible. Like, if you look at the level that GPT-4o scored on this, right,

a frontier model a year ago, and compare that against GPT-5 today, the delta is profound. Can you put a number on that somehow? Call it a 25 to 30 percent improvement per year. Per year? Exactly. Well, now GPT-5 is at 64 percent, so maintaining that will definitely be challenging, but I mean,

it gets my mind wondering, like what will this technology be able to do in another year or two?

And how will that have this profound impact on the economy that so many of us have been wondering about for a while? When you give these numbers, to what extent are you measuring how well they do on the test versus how much economic value they're creating? Well, so I'll walk through the methodology and how we derive that. Essentially, within each industry we start out with surveys of hundreds of experts. So think within consulting: we get experts that

were previously at McKinsey, Bain, and BCG and other top consulting firms, and then we survey how they spend their time. What percentage of their time is in customer meetings, in online research and analysis, in preparing deliverables for customers? And then within each of those buckets, we ask them to write the corresponding prompts and rubrics associated with how they spend their time. So, you know, we're using their time as the best proxy we have for the economic value

associated with their salary or what customers are willing to pay for. And it's incredible to see

right, the model scoring 64 percent on that is pretty profound. Obviously, there is some complexity in mapping that to economic impact, because in certain industries like medicine, you can't have a

30 percent failure rate, right? You need it to be near perfect, sort of similar to driverless cars in some ways,

but in other industries, like an initial legal draft or a consulting analysis, this technology's already starting to have a profound impact, and it's only accelerating. But isn't there something about switching from task to task, which the models can't do at all? So the model would beat me on a test. The model might even write better podcast questions than I do, but somehow combining

those all in a single entity, I can do; and even the best model, it's still basically a zero,

as far as I can tell. So the economic value is in a way still zero? Well, so it's interesting. I think what you're getting at is there are sort of two key things the models struggle at that humans tend to be very good at. The first is these longer-horizon tasks: not just something that we could do in a few hours, but something that might take us 50 or 100 hours to do. And the second thing is integrating multiple tools with our response and going about doing these things,

maybe interacting with people as one of those elements. And I think that that's coming very soon, in the next version of APEX. And what does very soon mean? You're doing your best, yes. Well, I'll talk about it in terms of APEX, and then I'll talk about it in terms of model advancement, because there's a large correlation between the two. We're doing a lot to measure all of those capabilities, how models interact with the entire workspace, and how models do these

very long-horizon tasks, in an eval that we're launching in the next couple of months. And very quickly, once researchers are able to measure those capabilities, they'll be able to help climb them. And so I would be shocked if we don't have enormously capable models across those dimensions of lots of tool use with very long-horizon tasks in the next six or 12 months. Let's just take the body of knowledge alone. Forget about the long horizon, just on-the-spot

test. Let's say I'm Cass Sunstein. I know Cass. He has an incredibly impressive body of knowledge in many areas. Are we at the point where basically Cass cannot ask a question that the best models cannot answer? Well, that's an interesting question. I think it depends

domain by domain, but in law, I think it's going to be a long time. There's

so much taste involved in legal responses that effectively getting all of the taste that Cass has into the model is going to be difficult. I do think we'll very quickly get to the point where Cass has a really hard time finding a mistake the model makes, right? Where he has to spend maybe a week just, like, trying to probe it. How far away is that? That might be about two or three years. It wouldn't surprise me. For a question and response, I would think it's six

months away would be my guess, but if he asked it a thousand questions, I think he could induce an error.

At 50 questions, I think in less than a year we'd be there. It depends also a little bit on how tightly you define an error. Like, he might have all sorts of knowledge of niche areas of the law that the model isn't strong at. And so there's some question of how you measure this, but I hold Cass in very high regard with respect to his niche knowledge of the law and ability to stump the models. And what would be an area where the human expert is relatively strong,

an area where the human expert compared to the model is relatively weak? There's interestingly a lot of areas in law where the right way of approaching something is not written down or codified, at least not explicitly; it exists more in the heads of experts. And I think it's those domains, where there's a lot of taste that isn't well documented, that the models will struggle immensely with. Because they either need those tokens in the pre-training data of doing these

web-scale training runs, or they need it in the post-training data of having a legal expert from us create those datasets. And if they don't have those, then the model will inevitably struggle with that particular problem. Now, I've argued in economics that the leading economics journals should take their referee reports and the submissions and send them somewhere, arguably here. I mean, would that be useful to you? It certainly would. It's something we've talked about a

bunch in the past, but I think that the largest way that these deep domain experts can help

to contribute to the advancement of AI is defining the evals. When we have these phenomenal tests

for model capabilities, whether in economics, law or other domains, it's amazing how fast the

researchers can help climb them and optimize against them. And so more help in building these tests and sending them to us and other labs is extremely impactful. So those are nonprofits, those institutions. Why don't they send them to you now for free? Do you have a theory of this? I'm not sure exactly. It would improve science, right? It would improve science. I think maybe two things. One is awareness of this. I think that while evals are the thing that

everyone's talking about and so valuable to the AI labs, it feels like most people in the rest of the country couldn't quite describe exactly why you need an eval. And I think the second is a little bit of fear, right? Where everyone worries about how AI is going to impact their jobs,

their work, their ability to contribute to the economy and be meaningful. And I think that that's

always top of mind, even for nonprofit organizations that want to contribute and

preach this world of abundance. So let's say we took the live economics or legal seminars, whatever. Let's say the top 10, top 20 schools recorded them all, somehow anonymized the data, but you had the comments and transcript and sent that to you. Would that be useful? It would be very useful. One thing I will say, though, is that there are sort of two kinds of data, as a good way of thinking about it. The first kind of data is just the output. You have

some curriculum that the model is reading and learning from. The second kind of data is some way of measuring success, where you have the rubric for the response, you have the task question

and answer, you have the unit tests in code, and that second kind of data is the most valuable,

where we're able to have the models attempt the problem many, many times, score those responses, and learn from them, but both are incredibly impactful and things we would love to get support with. So on your wish list, just to make this more concrete: you can have some kind of data, forget about realism, you just get it for free. What is it you most want? Oh, and just say social sciences. I think that we tend to focus a lot on what's

economically valuable, and so if people have tests that the models are bad at, that map to a meaningful amount of economic value, you know, it could be an academic domain that can be applied to create a lot of value in other areas, that's super exciting for us. Maybe a good heuristic is, if we could build a model that without seeing this test and reading through it, could max out the

test, how much economic impact would that add? Whichever test adds the most

is most helpful, right? And so maybe in medicine, it's, you know, a test around how well the

model is doing a certain diagnosis in a particularly difficult domain where we think the models can add a ton of impact. Maybe in economics, it's, you know, areas of analysis and modeling of businesses that aren't well codified but could meaningfully impact the way that we underwrite businesses. Those types of things are what's going through my head. And let's say it's poetry. Let's say you get it for free. Grab what you want from the known universe. What's the data that's

going to make the models, working through your company, better at poetry? Well, I think that it's

people that have phenomenal taste for what users of the end products, users of these frontier models, would want to see. Like, someone that understands, when a given prompt is given to the model,

what is the type of response that people are going to be amazed with? How we define the

characteristics of those responses is imperative. And so probably more than just poets that have spent a lot of time in school, we would want people that know how to write work that gets a lot of traction from readers, that gains broad popularity and interest, drives the impact, so to speak, in whatever dimension we define it within poetry. But what's the data you want concretely? Is it a tape of them sitting around a table,

students discussing their poems? The person says, I like this one, here's why, here's why not. Is it that tape, or is it written reports, or what's, like, the thing that would come in the mail when you get your wish? The best analogy is a rubric, if you have some rubric for how to grade. A rubric for how to grade? So if you have, like, if the poem evokes this idea that is inevitably going to come up in this prompt, or has a characteristic of a really good response,

we'll reward the model a certain amount; if it says this thing, we'll penalize the model; if it styles the response in this way, we'll reward it. Those are the types of things. In many ways, very similar to the way that a professor might create a rubric to grade an essay or a poem.
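The grading scheme described here, rewards and penalties keyed to named characteristics of a response, can be sketched as a simple rubric scorer. The criteria and check functions below are invented stand-ins for what an expert poet might write; real evals would use expert-authored criteria and often a model as the judge:

```python
# Minimal rubric scorer: each criterion pairs a check on the response
# with a positive reward or a negative penalty. The criteria here are
# invented stand-ins, not an actual expert rubric.

def mentions_theme(text: str) -> bool:
    # stand-in for "evokes the idea the prompt calls for"
    return "sea" in text.lower()

def is_cliche(text: str) -> bool:
    # stand-in for "leans on a cliché"
    return "roses are red" in text.lower()

rubric = [
    (mentions_theme, +2.0),                  # reward: addresses the theme
    (is_cliche,      -3.0),                  # penalize: cliché
    (lambda t: len(t.split()) <= 60, +1.0),  # reward: stays concise
]

def score(response: str) -> float:
    """Sum the rewards and penalties of every criterion that fires."""
    return sum(weight for check, weight in rubric if check(response))

print(score("The grey sea folds its letters and forgets."))
```

A scorer like this is what lets a model attempt the problem many times and learn from the resulting numbers, which is exactly the second, more valuable kind of data Brendan describes a few exchanges earlier.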

Poetry is definitely a more difficult one, because I feel like it's very unbounded, right?

With a lot of essays that you might grade from your students, it's a relatively well-scoped prompt, where you can probably create a rubric that's easy to apply to all of them. Versus, I can only imagine in poetry classes how difficult it is to both create an accurate rubric as well as apply it. And so the people that are able to do that the best are certainly extremely

valuable and exciting. But to get all nerdy here, you know, Immanuel Kant, in his third critique,

the Critique of Judgment, said, in essence, taste is that which cannot be captured in a rubric. And if the data you want is a rubric, and taste is really important, like, maybe Kant was wrong. But how do I square that whole picture? Is it, by invoking taste, you're being circular, and wishing for a free lunch that comes from outside the model, in a sense? Well, there are other kinds of data that could do it if it can't be captured in a rubric.

Like, another kind is RLHF, where you could have the model generate two responses, sort of like what you might see in ChatGPT, and then have these people with a lot of taste choose which response they prefer, and do that many times until the model is able to understand our preferences. And so that could be one way of going about it as well. I'm sure you know these studies where there's some AI-generated

poems, and often the humans prefer the AI-generated poems, even though to people with

quote-unquote taste, they're worse. Yeah. I mean, what's that? Whose side do you take there?

Well, it depends what you're optimizing for. I mean, I think that generally we're in the mindset of: for the power users of these AI products, what are the types of responses that they would want to see and be happy with? But it's challenging, because that sometimes deviates from the types of responses that the top 1 percent of experts in poetry might see as a broadly good poem. And so striking that balance is really up to a lot of the researchers and product leaders of the labs, of what

do they think good looks like, and how do we act as their partner in defining that? If you could model a much older poet, William Wordsworth, Blake, John Milton, Rilke: some of my friends say, well, there are no truly great poets left anymore. The best poets were way back when. Is it a goal to model the older poets and figure out what they would think, rather than having Larry Summers and Cass Sunstein come in, that you have some AI-generated

model of John Milton? Maybe. Well, I will say it ties back to the goal of APEX, which is that we saw people were too focused on a lot of these purely academic domains and not focusing enough on how people actually use the models in the economy. But I certainly do think that, especially

as we start to automate more industries, and there are more liberal arts domains

where people want to spend time on poetry, certainly building the tools to help them create

phenomenal poems and make them happy and their readers happy is definitely the way we would go about it. I'm not sure if it would be using the archetypes of these former poets. How would you go about it, Tyler? I don't know. I don't trust contemporary poets. Frankly, there aren't many of them I like to read. You know, Geoffrey Hill would be one. Some are too postmodern, some maybe are too woke, some are too identity-driven. I love older poetry, so it's not that I don't like poetry,

but I worry about putting them in charge. They're not quite in charge, I get that, but we'd be giving them so much leeway. Yeah, it does evoke this really interesting idea of how we want to teach models and measure success of these models. Is it via consensus? Is it via a handful of the top experts in that

given domain? And there's really no correct answer, and I think that different AI labs, different

researchers will go down different routes, and that will frame the ways that these products feel

and the things that they ultimately achieve. Like, maybe we should only enshrine the current age

when the current age is at a peak. Like, Scott Sumner says the best movies were maybe made in the 1960s and '70s, whether or not you agree, but you could have movie evaluators be only from that time. There are some still alive. If you think the best heavy metal, say, comes from the 1980s, well, you wouldn't have, like, the current evaluators; you would pick an evaluator from the '80s. The best poetry does seem, by most people's standards, to be really quite old, and we can't

resurrect those individuals, but the notion that you enshrine current taste, when taste changes so much, it's a very interesting decision. It certainly is. My guess is that

in a long enough time horizon, we'll enshrine taste from every different decade and every different

era, and then the model will be able to learn what tastes you have, and how it should pull on each of those knowledge bases to best personalize to your preferences. How much of society, ideally, should become a big reinforcement-learning machine? We sort of tape everyone, everything,

every debate people have over the coffee table. I think it will become an immense amount very

quickly. There's obviously still going to be the personal conversations over the coffee table that people don't want recorded, but my firm belief is, especially for economically valuable tasks, we'll move towards a world where people do things once. Instead of the investment banker redundantly analyzing a data room to prepare an analysis of a company every couple of weeks for a new project and a new customer, they'll teach the model how to do that once in the

particular domains that they operate in, and similar to building software once, they'll be able to use that many times as they use their agent. Instead of the customer support rep responding to tickets every day, they'll find the mistakes that the agent makes, they'll turn those into an RL environment, and then all of a sudden the agent will be able to solve that problem many times. And so I think in many ways the economic incentives and how

knowledge work will change have a lot of similarities to software, and that we'll move towards these fixed-cost investments of teaching an agent how to do something, building an RL environment for something, and then being able to use agents as many times as we want to perform that

activity, and that's why I believe that a huge portion of the economy will become an

RL-environment machine. And do you think pendants or Meta-like glasses will be more important than that? Oh, I do. I think a lot about that. If I take myself, like, I don't do that much small talk. Say you attach the little pendant to me, and you've got the tape of all my conversations you could feed in. What's the social value of that? Is it, like, $5, $50, a bit more? How valuable is that? Well, it certainly depends a lot by person. I would imagine yours are quite valuable.

But quantify. Like, what's... come on, please. I'm not asking for an offer, but what exactly would you pay? Well, I would pay a lot just out of pure curiosity, but if I were trying to think about how valuable it would be to our customers and our business, I imagine it would be something on the order of, it's hard because it changes over time, certainly tens of thousands, if not many hundreds of

thousands of dollars a year, and how that evolves over time. But my guess is that for the vast majority of people, they'll still care a lot about privacy. And so maybe that data will be collected to personalize their individual agent, but they're not going to be comfortable with that

getting added to the broader model weights to customize the base model that buil...

But that's easy. So you can take me with my pendant. I run it through my AI, and I say, take out

anything I don't want Mercor to hear, and it will do that quite well, maybe not perfectly. And then you get what's left over, all the debates about elasticities and tax incidence. Maybe. I suspect you're probably more comfortable with that, but most people would probably say, "Well, you're asking the AI to be the layer of trust to remove the sensitive information,

but it's going to have a bias in doing so." And so I think there's always going to be some

level of sensitivity around these topics, and I actually believe that some of the companies that have done a very good job around their brand of privacy are going to have an advantage in it. Like,

I think Apple, while maybe not totally at the frontier of AI yet, has done such a good job in their

brand around privacy, and that's going to allow them to have a lot of trust from users, in a way that they're able to collect all of this precise information. Let's say three to five years out, when the top models will be both clearly better than virtually all human experts, or maybe all human experts, and recognized as such. And the latter we certainly don't have. What do you think, in that world, the reputation of expertise is like? Now, one view is no

one respects the experts because the machines are better, but I think an alternative possibility is that the machine, by not being tied to a personality, is less disliked, and people actually respect the experts more, because they get this impersonal distillation of the experts. It's like,

oh, the experts did that. They're so amazing, and they're not annoying me like on the

late night TV show. Like, what will happen to the status of human experts?

I think so. I think that I'm definitely already at the point where there are certain domains where I trust ChatGPT or whatever model I'm using more than I trust a particular expert in that industry, you know, for a very quick, like, medical perspective, even in some cases, or whatever it is. And so I think that there's some element of it being highly confident, there's some element of it not having a face to it, that causes us to place this high

trust in it. But I do think that the point you made at the beginning evokes the question of, what is the point at which these models will be able to do everything that experts are able to do? And my read on the market is that models are advancing very, very quickly in being able to automate, call it, 50 percent or 75 percent of what human experts are able to do, but will really struggle with that last 25 percent, and I think that for a very long time, human expertise will be imperative

to help accomplish that last 25 percent, as the ultimate bottleneck to more economic prosperity and productivity. How long until the best models can write a poem as good as the median Pablo Neruda poem? Oh, I think that's probably not too far off. I think that's less than a year. Yeah? The best Pablo Neruda poem? I'm not calibrated on poetry, so I'd have a hard time saying, but I think it's much further out. And is that your intuition? I agree. I think that's

consistent with my intuition as well. But I think that this longer tail of advancement is generally the most difficult. The other heuristic I have for it is, going back to this dimension of the time horizon of the task: models are almost superhuman with what you can do in a chat window, right, with your chatbot, but they still can't draft an email for us, they still can't schedule a meeting, and those things will come. But I think that there's a long way before we're

able to tell a model, go off and build a startup for 90 days, and there's going to be an immense amount of human expertise associated with how we get to that, across every knowledge-work vertical that we want the models to operate in. Insofar as we turn society into this big engine for reinforcement learning, what new jobs get created by doing that? Well, I think the most interesting part of our business is that everyone else in Silicon Valley is talking about how we automate away jobs, versus

we're very focused on how do we build this new job category of people training agents, building

RL environments to help teach models, and that's what I believe it'll converge to. Instead of

the investment bankers doing the analysis, they'll build RL environments and train agents, and it'll be the same across consulting and software engineering and customer support and pretty much every knowledge-work vertical. And so it's hard to say the exact pace at which that'll happen, but I would not be surprised if, within five years, a majority of high-end knowledge workers

are training models, whether in their full-time jobs or through our marketplace, to

improve agents at whatever workflows they want to automate. And to hold those jobs, how much

technical AI knowledge will a person need to have, or do they just have to know about the thing itself?

They just need to know about the thing. The only element of technical AI that they'll need is to find where the model makes a mistake. So long as they can find where the model makes a mistake, and sort of understand in some ways the frontier of the model and its capabilities, how you can push it to its limit, then it's relatively easy to create some criteria, some way of measuring that mistake, so that the model can learn from it. And I think we'll have that across every different vertical

with every different tool with these very long horizons, whether it's 100 hours or 100 days

that we want the model to work on something, and that's going to very quickly become the primary

bottleneck to model improvement. Is the demand for software price elastic? I think it's extremely price elastic. In fact, I think that elasticity is the exact right thing

to hone in on with respect to how job displacement will evolve in these domains. I think if we make

software engineers 10 times more efficient, we'll have even more software engineers, maybe we'll have 10 times as many software engineers and build a hundred times as much software, versus other domains, maybe that's not the case, maybe we only need so much accounting in the world, or we only need so much customer support, but I think software engineers certainly will be able to do so much

more. What else do you think of as price elastic? I think that building businesses is also elastic, so

a lot of the product and distribution associated with software is certainly going to be something we see a lot more of. I think there's a lot of domains; even if you think about investing, obviously it's not as price elastic as software, but I do think that there's still enormous inefficiency with respect to how we allocate capital in the economy. If I think back to the early days of Mercor, we were having a hard time getting our $10,000 of working capital for our

initial seed investment, and then very quickly, once you get to a reasonable scale, the markets are very, very capitalized. I think a lot of this early capital allocation, as well as even just better understanding how companies will develop over time, is going to be really interesting, and also how that information and analysis manifests itself within companies, right? For an operator, they sort of have this investing problem of what are all the different bets that they have within

their companies, how do they allocate capital and resources associated with that? And so I think that there's so much elasticity with respect to how we build more products, how we distribute those products, and how we allocate resources within companies more effectively. What will education look like five to ten years from now? I think education is one of the things I'm most excited about, where a good heuristic is if everyone has Sal Khan as their personal tutor, available 24/7 to

teach them whatever topic they want to learn. It'll be much easier for them to motivate themselves, much better access to information, much better ways of explaining that information, and that'll be profoundly impactful. That seems less price elastic, right? Like only so many hours of Sal Khan a day, no slight intended to him, but it's not going to be 27 hours a day, right? That's true. That's true. So employment for teachers, researchers might shrink?

I think in some ways areas of that may shrink, but I also think that there's a large element of teaching that exists in personal relationships, of which the model will be able to do part, but not all of it, of how the teacher guides the student through their journey and helps them to improve both in their curriculum as well as their emotional development.

And so I think teachers will still play an important role in the economy, and ideally able to just

provide higher touch points of contact with all the students and smaller class settings. So this is October 2025. How many people work at Mercor? Right now, we have just over 300 people across the world as our full-time employees. How did you hire so many good people so quickly? Well, we used our technology, our platform, to help with it a bunch. I mean, the origin story of the company was automating all of the ways that we would review resumes, conduct interviews, and decide

who to hire. And so the ways that we assess talent, the ways that we optimize funnels to build out teams, is really ingrained in the DNA of the company, and a top priority of me and my co-founders. And so I'm extremely grateful for everyone that we have on the team, and they make it look easy.

How do other people do interviews wrong?

Well, this is something we've talked about a bunch, because you obviously wrote a phenomenal book on talent. I think one of the

largest problems that people make is that they don't measure the actual skills and capabilities that they want someone to exhibit on the job. Instead of focusing on, how do we measure how well this person does their investment analysis and works through the data room, they have this vibes-based conversation of, you know, where did the person grow up? How similar are they? Do they think they would enjoy hanging out together? But obviously, that's still important if you're having a working relationship.

But I think that they often overindex on that relative to the skills that people actually exhibit.

And so you just give them a project, give them a problem, and grade it, in essence. I think that's the

cleanest way to do it. Let's say it's not programming as the company gets bigger. The major AI

companies, a lot of them now are quite large, and most of the people who work there don't do AI at all. They do jobs that are not so dissimilar from what they might do at Coca-Cola. Which is fine. They're just part of legal, communications; they do events, whatever. When you're trying to hire people like that, say, what's the test? What's the project? Or what is it you look for? I think that that's definitely more difficult. I think you

probably want to look for cases in their life where they've worked in similar roles because you can't curate a project that's as similar to exactly what they've done. And so you would

seek out, you know, the best proxy for that. And then really drill in to understand the details of

that working environment, how similar it is, how well they performed, and then talking to people that previously worked with them in that environment to get a gauge for it. But it definitely is more difficult to measure someone's slope, how they'll develop on the job over a six-month time horizon, than it is to measure their y-intercept. And so I think that's one tradeoff that we've found in talent assessment. Do you think body language in an interview is predictive?

I think it can be. But I also think it can be a false signal, because I've definitely had cases where I overindexed on, oh, this person feels a little bit awkward or whatever it is. But they do a

phenomenal job at the actual work. And so I think it's important to be very cautious around

which of these signals are actually correlated with performance and which ones aren't. Articulateness, overrated or underrated? Depends a lot on the job, depends a lot on the job. Let's say 10 years from now, when we can really measure pretty well the performance of people we're interviewing today. Maybe less than 10 years, but say 10 years. Let's say you have a company such as Amazon; it does a very large number of interviews. And let's say they're all taped and you

run them through the best AI models. How good a predictor do you think that will be, in your opinion? I think that it will be certainly superhuman, because humans aren't very good at it. But it's still such a difficult problem that there's going to be variance. And I think that for roles like the one you described, what's going through my head is there's a lot of confounding variables. Did the

person have an issue in their family that caused them to be off their game or not show up to work?

Did they get sick during the interview process and maybe weren't full of energy? There's all these things that just add noise to that problem. But I do believe that as we're able to get all of that data in context to have all the notes from the manager around what was happening in this person's life both during the interview process as well as on the job that will allow it to over time become phenomenal. So maybe we have that on a 10-year time horizon. How can we make labor

markets more efficient? I think that one of the largest inefficiencies in labor markets is that everything is disaggregated, and that when one of our friends is applying to a job, they would apply to a couple of dozen jobs. And when companies are considering who to hire, they'll consider a fraction of a percent of people in the economy. And it feels like there needs to be a structural change there, where there's an aggregator that everyone applies to and every company hires from, facilitating

this perfect flow of information. But we need a very good AI for that to work. I think a very good AI will help with that working. And the reason I think it doesn't happen today is that there's a very difficult matching problem. And let me give LinkedIn as an analog. LinkedIn has all the distribution to pretty much every company and every candidate. But at the same time, it's incredibly difficult to understand, based on someone's LinkedIn profile,

whether they'll actually perform well at a given job. And so I believe that in that case, it's very much a matching problem, less so a distribution and aggregation problem, to facilitate

this effective flow of information and aggregation within labor markets.

That's also in line with the fact that the nature of jobs is changing dramatically, right? Previously,

everyone would think about this problem in the context of full-time roles. But as we trend towards this world of everyone building out their own environments and being able to do work remotely and train models in this fractional way, that also will shift the dynamics of enabling more aggregation and enabling more globalized matching, and how that will impact the economy. Some of my friends think that mentors and nepotism will make a good comeback. And they say,

everyone will submit a perfect cover letter, have an optimized LinkedIn profile, and have practiced with an AI doing the interview. They won't all get up to speed, but a lot of them will. And they'll be the large mass of apparently pretty qualified candidates. And what you'll actually do is resort to the old tried and true: well, do you know this guy's uncle or someone else

who can recommend them? Agree, disagree? I think in some companies and industries that

will happen, I agree with it. My hope is that we have models that are helping to run companies in a very thoughtful, efficient way, that are data-driven about it, where the models have an eval set of all of the performance reviews of people in that given company. And they're able to make an accurate prediction over whether this reference or that piece of nepotism should actually be considered, or maybe is a counter signal, right? And so that's my hope, but it'll probably play out with some

combination of both over time. In this sort of AI-run labor market, let's say it's more efficient,

do you think there are fewer second chances and late bloomers in that world? You get scored too

early, so to speak. And then you're tracked; it's a bit more like how European schooling systems can differ from American. I think there will be a lot of second chances. And the reason that there will be is that oftentimes they're effective. And so the models will identify that and realize that maybe someone wasn't the right fit for that first role; there's another role that they could be a really good fit for. Because I do think that there are jobs in the economy that

almost everyone would excel at. And it's really just this matching problem of finding the intersection of something that they're excited about, where they'll also add an immense amount of economic value. As you know, there are AI services now. You're doing an interview, and across the top or the bottom

of your screen, the AI can give you advice, answers. Does that work at all? What do you think of those?

We run up against a lot of those. One thing I found in talent assessment is that initially

people tried to work against AI, similar to what we do in academic settings, where people would try to say, we're going to have you write the essay on paper so that you're not able to use ChatGPT to help you with the essay. When really the right way of approaching it is seeing what people can do when using all of those tools. If we tell them, hey, use all of these phenomenal code gen tools and record your screen while building a product, we see what you're able to do

over the course of an hour. That's a far better predictor of this person's ability to actually deliver impact than it is to say, don't use the tools at all. And so I think that's one shift that we're going to see, and it will likely reframe the relevance of a lot of these AI cheating tools over the coming few years. Can someone fool you by using an AI cheating tool? Or do you feel you

more or less always know? I think that there were cases where people could fool us, but now we're

quite good at figuring it out, and also moving towards assessments where we almost encourage it, right? And are comfortable with the fact that they're using these tools, because we want to see what they're able to do with them. So you were a Thiel fellow, right? And you dropped out. How could they improve their methods? Well, this is something we've talked with them a lot about, because the Thiel Fellowship is constrained by that exact matching

problem that we were talking about earlier, where they can only consider and interview a fraction of a percent of the people in the world that they think would be a good fit for the fellowship. And so we've worked with them on building out AI interviews that are able to better assess Thiel fellows, and using models to analyze the transcripts of those recordings to see, what are the signals to better select Thiel fellows, and all of that, which I find very interesting.

But isn't it part of their strengths? Peter, he's quite controversial, politically and otherwise. Being a Thiel fellow has a certain brand, just distinct from anything political. But it's a very particular thing. Not everyone wants to do it. Doesn't it work well because it's an extremely local market, and you get people with a certain kind of orneriness, and selecting from that pool just goes pretty well, and you don't want to be in the bigger pool of people?

Maybe. I think that you're right, and there's the element that referrals are very important, right?

Oftentimes, great people know great people, and so they'll always need to leverage that.

But at the same time, I think they rightfully care a lot about people that think unconventionally

and come from unconventional backgrounds, people from every part of the world that might otherwise not get a meeting with venture capitalists or some of these more traditional institutions. And so ensuring that they're able to consider those candidates and to give them this opportunity and incorporate them into the fellowship is incredibly

important and part of the mission. Could it be scaled 10x? Absolutely. 100x? I think so.

A hundred thousand x? Well, I think it sort of ties to what we were talking about earlier, the elasticity of demand for better investors, right? Because in some ways hiring has so much overlap with investing. Imagine if we could have Peter interview everyone in the world when they're 18 or 20, or whatever the age is, and make a decision around whether he wants to give them

a $100k check. That would probably be very powerful with respect to economic mobility and how many

companies we're able to create. And so I think that will happen, and it's just a matter of time of building the right technology and the right focus to enable it. But is the following possible? Let's say Peter is a just tremendous interviewer; that's easy to believe. But he's really a great interviewer for the subset of people attracted to him. And if you just put him out in the broader pool, interviewing someone who's going to be a lifeguard at the swimming pool or something, maybe he's just not that

good an interviewer for that. I agree with that. I think that's certainly the case. And so

imagine if you had a panel of domain experts across every industry that were able to perform these interviews. Because certainly the best models will be better than any single best individual. But I would expect that the aggregate sum of all experts in each domain will likely remain better than the models for a long time. Now, you dropped out of school, you're doing the company, you're obviously very busy. But imagine, as an act of magic, you could have a free year just inserted

between today and tomorrow, and you come back and nothing has changed. To go off and do anything you want with literature, with art, with travel, with music, with climbing the Alps, I don't know. What would you do with the year? That's a fascinating question. Can it be AI-related? No. Can it be company-related? Let's see. I would love to travel. I think that sounds like it would be a lot of fun. Because as you can imagine, in running the company, I've worked

100 hours a week for the last three years. And I love doing it and I'll continue doing that. But I do think that seeing the world and getting more of this understanding of how perspectives vary by country and geography, how people are thinking about AI differently elsewhere, is really interesting. I really like that. I remember after ChatGPT came out, Sam did this world tour of going to all the different places, seeing what they thought about AI, how they viewed it impacting the world.

And I think that global perspective is incredibly valuable and formative. Where do you want to go

the most? I want to go to Japan a lot. I've never been to Japan. So I'll have to make it out there.

That's probably my top pick. It's a great visit. One thing I found, since I have traveled a lot, obviously I'm older, and in some ways less busy than you are, is that it helps me interview quite a bit. Because people more and more come from all over. And it's like, if your model has the poetry of different eras, John Milton, Wordsworth, Shakespeare, whatever, traveling is an individual's way to get some version of that. So if, say, you hire a lot of people from India, I suspect

you do. It's a populous country, and a lot are in the Bay Area. Going to India then becomes very important, because you get a better sense of just where they're coming from. Yeah, I completely agree. I think also being able to connect with those individuals very quickly around, hey, I've been to this place, and I'm very familiar with India, and all these different things, is really helpful in building relationships and setting up trust across all the different people that we work with and interact with.

How did your eighth grade donut company do? One of my favorite topics. Well, so I could tell the story, which is I initially realized that Safeway donuts were selling for $5 a dozen. And my eighth grade mind was thinking, that is such a deal. I would pay $2 a donut; I bet my friends would as well. And so I would bike down to Safeway, I would buy Safeway donuts for $5 a dozen, go to my middle school, and sell them for $2 each. Eventually, my middle school called me into

the principal's office to shut me down because I was scaling up my operations. And then I moved my donut stand about 50 feet over off of school campus so they couldn't police me. I paid my mom $20 a week to drive me in her minivan to be able to bring more donuts to and

from.

I think it was about right. I anchored it on the cost of an Uber. And I was like, I'm not going to pay

more than an Uber, but I need the car to wait long enough that I'm able to load up, you know, 10 or 20 dozen donuts. And so I did this. I'd pay my friends in donuts, because I perceived the cost of the donuts as, you know, my cost basis, versus they perceived it as $2 each. And so I had a little bit of arbitrage in the salaries. I had competition pop up where they would sell Chuck's donuts, which are higher end donuts, but they had a $1 cost basis. And so I dropped my

prices to $1 for two weeks to drive them out of business, before I had learned anything about anti-competitive laws. So those were just a few of the stories from my eighth grade donut dynasty, as we called it. So other than just intelligence, what makes a person good at extemporaneous speaking? And you won awards for this, right? I did. Well, actually, I won awards for it, but I wasn't nearly as good as my co-founders. So in high school, we all did speech and debate

together. So you knew each other from high school? We knew each other from high school. Since you were 14, right? Since 14 exactly. And we were on the policy debate team together. We also did national extemporaneous speaking. And they were the winningest speech and debate team of all time in policy debate, the most competitive event, where they won the Tournament of Champions, the National Speech and Debate Association, and the NDCA, the three largest national tournaments,

which no other team has ever done. And I did okay, but I was a distant second. I'd stumble over words or misspeak; it wasn't quite the same level as them. But I think there's a few things that go into the answer of what makes one phenomenal at it. I think

that high clarity of thought often correlates very strongly with, you know, people that speak very well, and so as you mentioned, intelligence plays another role. I think a second thing is confidence, someone that's willing to speak and improve and iterate on it, because oftentimes it's just doing more of that activity that allows you to improve on it. And then maybe a third one is more than just intelligence. It's also the speed of thought. And I think about those as different dimensions.

There are certain people I think of as having very high aptitude, but thinking very deeply and slowly about a given thing, and other people that I think of as having, you know, reasonably high aptitude or medium aptitude, but being able to be quick on their feet. And so I definitely think there's some innate element of that. And which are you? I tend to think I'm more in this slower, deeper-thinking bucket, but it depends a little bit on how much coffee I've had.

So you started the company when you were 19? Yeah. Why is it that there's a positive statistical correlation between being dyslexic and entrepreneurship? And there is one in some published papers. Yeah, what's the mechanism? It's shockingly strong, actually. I'm not sure exactly, but I find that one unique thing is that it feels like my brain works a little bit differently, and that there's certain things that people are so much better at than I am, where, you know, they're reading through

evidence in a debate round very quickly, and I could never do that. But there are certain ideas

or ways of approaching a problem that are just different, that enable more creativity, potentially being unconventional in doing so. And I think that that is one advantage I've had. And one of our early investors, actually, Scott Sandell, is dyslexic and backed a lot of dyslexic entrepreneurs. And so we've talked about this a little bit. One of my hypotheses is that quite early on

you have to learn how to delegate. And that's a skill that when people are not forced to learn

often very competent people don't become good at it until much later, but a dyslexic person is good at it right away. Totally. Yeah, asking people to help read something for them. That's right. Can you please do this for me? That certainly could be the case. And focusing on the bigger picture, in some useful ways, at least for being a founder, not good for every job, of course. Totally. But I think one thing I really came to appreciate, especially

during high school is that there are certain things that some people are phenomenal at and others are horrible at. Like in areas of debate, reading through evidence quickly, I felt extremely unintelligent, and it was super humbling. And so much of finding success in your career is just understanding, like, what are your strengths and how do you leverage those, and much less about what are your weaknesses. And so that's something that I've sort of taken with how I

approach Mercor, but also how I encourage our employees to think about their roles within the business: what are the things where they have these comparative advantages and phenomenal strengths, and how do they leverage those most effectively? How much do you feel you're in touch with the general culture of intelligent 22-year-old men in the United States? Or are you just so in the

company, you have no idea what's going on? I'm so in the company, I don't know. I think that

I obviously was in college for a couple of years before I dropped out and so I had some people

around me. And so much of our company is 22 plus or minus a couple of years, so I fit

that heuristic. But I certainly don't think that I have spent as much time with people my age as

if I had stayed in school, as another comparison. This is not a question about you, because we don't ask personal questions, but a good tech friend of mine, you've probably heard of him, he says that for men in that age bracket, 22, 23, there truly is a dating crisis, that something has gone wrong. Not about you, but just America in general: for the very smart, possibly nerdy person in that age group,

is there a dating crisis? I think certainly in San Francisco, not in New York, but certainly in

San Francisco. And you think it's just gender imbalance, or is the country screwed up more generally? I haven't thought too much about this. I think it's probably gender imbalance in San Francisco, especially in certain industries. But I think that dating apps are probably, I don't use dating apps, but generally in society, helping to drive a lot more efficiency in solving this matching problem. So you're pro dating app? Most people I know are against them.

No, I think they're good. I'm very much a proponent of better technology to solve these matching problems and enable people to be happy in their lives. Your last name is Foody. Should I believe in nominative determinism? Are you a foodie? I certainly do love good food. It's funny.

My dad always loved cooking growing up, and was certainly much more of a foodie than I am,

but a little bit of it rubbed off on me. And so while I'm not as much into cooking, I love eating good food. Where in San Francisco should people eat? Or nearby? Lots of good restaurants. I think there's sort of the everyday restaurants that I think are very good, and then higher end. For every day, I love Mexican food, so El Metate is a great Mexican restaurant. And for higher end food, I like Cotogna and Californios. Lots of good restaurants like that.

At the meta level, what's the thing people should know about eating out here?

Like where I live, I would just say you need to know to go to the suburbs.

May or may not be true here, but here, what do they need to know, other than particular names of restaurants? I find that Beli, the app for food ratings, is really accurate in San Francisco, because there's a high density of users. And so if you use Beli as your guide, you'll generally find good spots. Why is the company called Mercor? Mercor means marketplace in Latin, and we want to build the largest marketplace in the world. So we named it Mercor.

We're from Mercatus. You know what Mercatus means in Latin? It's a variant. Yeah, it means market. Okay, there you go. So yeah, we're from the same-named institution. Exactly. Yeah, well, it's funny, in high school, my co-founders and I went to a Jesuit school,

and my co-founders all studied Latin, and so we've always certainly thought a lot about

Latin roots and Latin words. Your family was not Catholic, I believe, right?

That's correct. Going to a Jesuit school, did it help you think, or what? What did that add to the mix? Well, none of the three of us were Catholic, despite going to Catholic school, which was a little bit funny. But one interesting story is that my mom was concerned about whether I would start selling drugs when I was doing my donut stand in eighth grade because, you know, it's an easy step. And so I like to think that Catholic school helped to instill good values in what I should

care about, and being very focused on school at the time, on speech and debate, on building companies, and so very grateful for that education. Last two questions. First, what's the next goal you have for the company? The next goal for the company is really in scaling up a lot of these super realistic evaluations that I've talked about, of how do we measure the ways that models use all sorts of different tools on trajectories that would take someone days or weeks to do. That is a big focus for us,

and especially how that impacts enterprise, right? Where I think that so far, for the last two years, people have been very focused on the idea of intelligence rather than the idea of models being useful, and bridging the gap to what enterprises actually want to use. How do we measure that, and how do we get those capabilities into models, is to me the most exciting thing that I could work on. And what do you want to learn next? Work related or otherwise? That's an interesting question.

I feel like Mercor is at the intersection of labor markets and AI research, and we grew up with the DNA in labor markets of thinking all about how do we aggregate all these people on our platform, how do we match them. We hired people that are deep domain experts in labor markets, like Sundeep Jain, who was the Chief Product Officer at Uber and is our Chief Technology Officer. But I am most fascinated by all the advancements in AI research of how do we apply human talent and human labor

to all of these problems at the frontier in more efficient ways, to train models, and what are

The specific rubrics or data types that are driving the most model improvement.

That's what I'm most interested in learning. Brendan Foody, thank you very much. Thank you so much for

having me Tyler. Thanks for listening to Conversations with Tyler. You can subscribe to the show on

Apple Podcasts, Spotify or your favorite podcast app. If you like this podcast, please consider

giving us a rating and leaving a review. This helps other listeners find the show. On Twitter, I'm

@tylercowen and the show is @CowenConvos. Until next time, please keep listening and learning.

