Tradeoffs are very important in a database. That's the fundamental thing with databases, and why some databases are good at things that other databases are not as good at.
Welcome back to the Barchive. I'm your host, Barr Yaron, and today I'm excited to have Simon Eskildsen, co-founder and CEO of turbopuffer, joining us. turbopuffer powers search for fast-growing startups like Cursor, Notion, and Linear. The fundamental tradeoff for turbopuffer is that writes are slow, right? We have to commit them directly to object storage; it takes hundreds of milliseconds. So you're not going to do an OLTP workload on something like that. It just doesn't make sense. You can't do, like, checkouts for e-commerce on something like that, unless you get it all done in a single transaction. And then once in a while, you'll hit a node that doesn't have the data on disk, and you'll hit a cold query, and it will be a couple hundred milliseconds instead of tens of milliseconds. For search, that's a perfectly acceptable tradeoff. For high-frequency trading, maybe not so much. We happen to think that this set of tradeoffs is pretty phenomenal for a lot of workloads, and especially search. Hey, Simon. I'm so excited to have you on today. It's sort of insane to me that you started turbopuffer in 2023 and it's become such a beloved product. You're running hundreds of billions of vectors, top customers are building on top of turbopuffer, so I'm excited to get into it tonight. Thank you so much for having me, Barr. Okay, well, let's start with your aha realization around turbopuffer. You were a principal engineer working on infrastructure at Shopify, but then you had your period where you consulted with startups, you helped them with their infrastructure and their scalability issues. So when, in this process of working with companies, did you realize: maybe there is a need for a turbopuffer? It's time to build this thing. How did that happen?
Yeah, I think maybe the full background is useful here. So, as you said, I spent almost a decade working on infrastructure at Shopify. It was kind of a rag-tag team of software developers who learned infrastructure as we went, along with the operations people, to just make sure that this Rails app continued to scale as the company did. When I joined, we were doing a couple hundred requests per second.
And when I left in 2021, we had peaks of more than a million requests per second.
And the hardest thing to scale through all of that is the data layer. So as part of that, me and my co-founder Justin spent thousands of hours, well, probably 10,000 hours at this point, scaling every single part of the data layer of Shopify: MySQL, Redis, Memcached, Elasticsearch, all of these things, and the proxies in front of them. And so I had some experience running these Lucene-based solutions at scale. They're really good for the kind of e-commerce search that we needed, where it's very important to search almost all of the data a lot of the time. But it always seemed to me that there might be a better way to do search. It didn't occur to me what the future of search might look like until I'd spent a couple of years bobbing around, helping my friends' companies in small increments with their infrastructure challenges. One of those companies, my friend's company Readwise, asked me if I could build a small recommendation engine, after I was done spending a bunch of time tuning their Postgres autovacuum, which is, like, the most common scaling challenge of the 2020s. We wanted to build a recommendation engine, and I thought that vectors just looked amazing,
because one of the search problems that we talked a lot about at Shopify was mapping the vocabulary of the user to the vocabulary of the store. You search for "red dress" and they have a burgundy skirt. You search for "shoe" and they have some lime green sneaker. And it just doesn't come up, right? Because the words don't match exactly. You're searching for strings, not for things. So you've got to turn strings into things, and vectors are really good at that. You chop the head off of the LLM, and out come these numbers that you can plot in a very large coordinate system, and things that are adjacent in that coordinate system are also adjacent in the real world. The LLMs were wonderful for that.
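To make the "strings into things" idea concrete, here is a minimal sketch. The three-dimensional vectors and the `cosine` helper are toy stand-ins for a real embedding model, which would produce hundreds or thousands of dimensions:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity in the coordinate system: 1.0 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embeddings: nearby points mean related meanings, so a query for
# "red dress" can land near "burgundy skirt" even with zero shared words.
docs = {
    "burgundy skirt":     np.array([0.90, 0.10, 0.00]),
    "lime green sneaker": np.array([0.00, 0.20, 0.95]),
}
query = np.array([0.85, 0.20, 0.05])  # imagine: embed("red dress")

best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # -> "burgundy skirt"
```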
For my friends at Readwise, we made a small recommendation engine that did this, and it was pretty good. Without much tuning, it actually did an okay job recommending articles, and that makes sense, because these models are trained on articles from the web. But when I ran the napkin math on how much this was going to cost, it was close to maybe 30 to 40 grand on the reputable vector database of the time. It made sense, it was new, but it turned out that all of this was stored in memory. And at Readwise, the founders put this into the bucket of: well, that seemed really neat, it just costs too much, maybe it'll come down. And I couldn't stop thinking about it. I was like, why has no one built a database that
takes advantage of the things that we have available to us now? Because it seems like the perfect tradeoff for search. We have NVMe SSDs. They're about a hundred times cheaper than RAM, but the memory bandwidth is only maybe five to 10x lower. We have S3 that is strongly consistent, which is a very nice property to have when you're building a database, and which happened in late 2020. And then we also have compare-and-swap on object storage, which means that we can now build a database that, like a pufferfish, inflates through the memory hierarchy, from object storage into NVMe and finally into memory, with the only downside being that writes have a higher write latency.
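The ratios Simon cites imply the economics in a few lines of arithmetic. This is illustrative napkin math using the rough figures from the conversation, not actual cloud pricing:

```python
# Rough ratios from the conversation (illustrative, not price quotes).
ram_cost, ram_bandwidth = 1.0, 1.0    # normalize RAM to 1.0 on both axes
nvme_cost = ram_cost / 100            # NVMe: ~100x cheaper per GB...
nvme_bandwidth = ram_bandwidth / 7    # ...but only ~5-10x less bandwidth

# Bandwidth per dollar of stored data, relative to RAM:
print(f"NVMe: ~{nvme_bandwidth / nvme_cost:.0f}x more bandwidth per dollar than RAM")
# -> ~14x: why tiering hot data to NVMe (and cold data to object storage) pays off
```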
And we thought that this was a perfect set of tradeoffs for search. So that is the long story of how I went from Readwise to turbopuffer. I mean, that's very helpful. And, you know, for someone who spends a lot of time thinking about search, I'm surprised I haven't come across "things, not strings" before, but it's a good tagline. It makes sense: you have these new capabilities, like you mentioned, NVMe SSDs in the cloud, consistent S3, and compare-and-swap. And then you have what you mentioned, which is sort of this new workload of LLMs. What are some of the other things that made this the right point in time, whether it was workload, data type, or other capabilities? Yeah, so I think if you want to build a new database, you need two things: you need a new workload, and you need a new storage architecture. The new workload seemed to be that there's an enormous amount of data that wants to be connected to LLMs, and the models are just hungry for more data, and hungry to reason over that data. But in order to do that at the scale required, we also need to change the economics of the storage. If you take all of the data and store it in memory
as vectors, and these vectors are 10, maybe 100 times larger than the data itself, because a kilobyte of text easily turns into tens of kilobytes of vectors, the need for a new storage architecture is even more prominent, just in terms of the sheer costs. If you store a gigabyte of data on disk, it costs you about 20 cents: 10 cents per gigabyte of disk, and then you run it at around 50 percent disk utilization, and you replicate it three ways. So you end up paying about 60 cents all-in per gigabyte of data that you store, versus the two cents per gigabyte when you store it on object storage. And in case you're accessing it a lot, you only have to replicate it to a single machine, which costs you, on disk, maybe somewhere around five cents per gigabyte. So all-in, you're, like, an order of magnitude cheaper than the base cost of replicating this on disk.
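Spelled out, the disk-versus-object-storage arithmetic he just walked through looks like this (numbers from the episode, approximate per gigabyte-month):

```python
raw_disk = 0.10        # ~$0.10/GB-month for provisioned disk
utilization = 0.50     # disks typically run about half full
replicas = 3           # classic three-way replication
all_in_disk = raw_disk / utilization * replicas   # -> $0.60/GB-month

object_storage = 0.02  # ~$0.02/GB-month on S3/GCS
single_cache = 0.05    # one cached copy on a single node's disk for hot data
all_in_tiered = object_storage + single_cache     # -> $0.07/GB-month

print(all_in_disk, all_in_tiered)  # 0.60 vs 0.07: roughly an order of magnitude
```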
So the two things you need for a new database: one is the new workload, which is connecting lots of data to LLMs, and the second is a new storage architecture, the new storage architecture being that object storage is the source of truth, and we just aggressively cache the data that needs to be accessed a lot. Let's actually walk through, okay, you had this realization when you were helping Readwise, and you eventually built turbopuffer. What goes into the simplest version of turbopuffer, the initial version that you put in front of a first customer? I know you've done a lot of optimization and work since then, so architecturally, what goes into it, and what does it take to build that? It takes a lot of embarrassment, I would say.
To put out that version, I think there's a lot of these startup platitudes that you hear and that you don't fully internalize until you're in it. The play-by-play here was that it was March of 2023, and I had this idea, and I was talking it through with a friend, who really encouraged me to just go for it. Yeah, it was a really good friend; it's actually the friend who has since designed the website, which is now a big part of our presence on the internet, it's amazing. And I sat down and started working, learning everything I could about all the different vector indexing algorithms, and then reasoning through which ones would work on object storage and which ones wouldn't. So I spent a summer, a lot of it in the capital of Canada, just completely focused on this first version of the database. And it ended up being the simplest thing that I could possibly ship that had acceptable performance. How do you define acceptable performance? Acceptable performance is: a hot query around 100 milliseconds seemed good enough to ship, and a cold query around one to two seconds seemed good enough to ship, with the economics that we could get, right, as sort of the trade-off. But there was no reason to me why this couldn't be as fast as the fastest one out there
that was in memory, just with better economics, right out of the gate. So the first version of turbopuffer was literally just a file called "centroids" that was on object storage. You download the file of the centroids, and then you search through them all, and then every cluster in the vector index was in other files that were then downloaded, in a second round trip, into the process. That was it. I could go into more detail on what that exactly means, but it was very simple. It was just two round trips back and forth to object storage. At the time, there was not even an SSD cache; I just put a caching nginx in front of it. I ran the entire thing in a tmux session on a single node in prod. It was
literally the simplest possible thing that I could come up with that I could ship, after a summer of running an inordinate number of experiments on figuring out how to make all of this fast, because it's not quite as simple as I make it sound to do the indexing and everything in a way that has high recall. What was the biggest challenge there in version one? The biggest challenge was that it wasn't clear what indexing algorithm you wanted to use. At a high level, when you're building a vector index, you sort of have three options. The first option is the simplest one: when you have a query vector, you compare it to every single vector in the target dataset and you return the top-k closest ones. And then you have to look through everything. It works: if you have about a gigabyte of vectors, you can read that at maybe 10 to 20 gigabytes per second if you max out the machine, so you can do maybe 10 requests per second if you exhaust the machine, and latency will be around 100 milliseconds. It sort of works, but as you get into larger and larger sizes and more queries per second, it starts falling apart.
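Option one, exhaustive search, is a few lines with numpy. It's exact, but every query touches every vector, which is the scaling wall he describes:

```python
import numpy as np

def exhaustive_top_k(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Exact nearest neighbors: score every vector, keep the k best."""
    scores = vectors @ query        # one dot product per stored vector
    return np.argsort(-scores)[:k]  # indices of the top-k matches

rng = np.random.default_rng(0)
vectors = rng.standard_normal((100_000, 128), dtype=np.float32)
query = rng.standard_normal(128, dtype=np.float32)
print(exhaustive_top_k(query, vectors, k=10))
```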
The second option is to use a graph-based index, and this is all the rage. All of the existing productionized implementations were using this algorithm called HNSW. HNSW is essentially: you can sort of, with some heuristics, construct a graph where vectors that are adjacent in vector space are also connected in the graph. The problem with this approach is that if you store the data on object storage, every time you navigate a node in the graph, you have to go to object storage. And the P90 to object storage is maybe two to 300 milliseconds. So every time you navigate: you start at the center, 200 milliseconds; you go one out, 200 milliseconds; 200 milliseconds more as you keep navigating. This is really fast in memory, because you only need to do maybe 9 to 10 reads to go get all the closest vectors, but it is extremely slow on object storage. Even on disk it's slow, because disks are not good at a lot of small reads; you're getting very low bandwidth per read. HNSW is phenomenal because you just insert vectors, they just go into the graph, and it works great. It's the economics that are difficult. If you're storing a billion vectors in an HNSW graph, you kind of have to store the whole thing in memory, or maybe push some of it to disk, which gets very complicated very quickly. And the cost becomes astronomical, right? Tens of thousands, maybe even hundreds of thousands of dollars to store a billion vectors, which you can do at about a thousand dollars on turbopuffer. So these orders of magnitude of improvement in storage really add up. But this posed a challenge, because the reason HNSW is so popular is that it has very high recall, very high accuracy against exhaustive search, and it's very easy to maintain. So that's why it was so popular.
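The round-trip problem with graph traversal is easy to see with napkin math: each hop depends on the previous one, so the latencies add serially. Numbers below are the rough figures from the conversation:

```python
hops = 10                    # ~9-10 dependent reads to reach the nearest neighbors
object_storage_p90_ms = 200  # per-hop latency to object storage
memory_read_ms = 0.0001      # ~100ns per hop in RAM (illustrative)

print(f"in memory:      ~{hops * memory_read_ms:.3f} ms per query")
print(f"object storage: ~{hops * object_storage_p90_ms} ms per query")  # ~2 full seconds
```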
The third approach is actually, I think, almost the most obvious one if you sat down and drew a bunch of vectors in a coordinate system: if you draw a coordinate system, imagine it's in 2D, then naturally clusters will occur. Go back to the e-commerce example. You could imagine some of the vectors that talk about dresses and skirts are in one cluster, some that talk about shoes are in another cluster, and some that talk about pants are in a third cluster. Although I would greatly separate the dresses and skirts; I don't agree with that example, but directionally I see what you're saying. Well, the skirts and the dresses are adjacent-ish, right? But still, there are probably clothing items that I don't know the name of, that you would know the name of, that are right in between. A romper, maybe. And the shoe cluster, we can say, is a little bit further away. But either way, the idea is that there might be three natural clusters here. So you take the centroids of those clusters; the centroid is basically just the average of all of the members, like an artificial vector. It doesn't really make sense to take the average of, you know, a romper and pants and dresses, but that forms a centroid. Now, instead of having, say, 100 vectors, you have three vectors, one for each cluster. And when you do the search, you just look at which centroid is most adjacent to your query vector,
and then you exhaustively search the cluster with that centroid. This is the most old-school way of doing vector search. You run a big clustering algorithm over the entire thing, you return the most adjacent clusters, and you search those clusters exhaustively. And it works fine. It's not as fast as HNSW unless you're very careful about how you construct the clusters. But constructing the optimal clusters is essentially an NP-complete problem; it takes an enormous amount of time. So there's a lot of heuristics that go into it, just like the graph. But it works really well for disk, and it works really well for object storage, because whether you're downloading 100 megs or one meg from object storage, there's just not a big difference. On disk, there's also not a huge difference. Of course there is a difference, but it's not the same kind of difference as doing a lot of random reads, right? And you can do a lot of this in a round trip; it's just a small extra penalty. So it works really well for disk, because you just get the centroids and then you get the clusters that match, and for object storage it's the same thing. In memory, you can get away with a lot of random reads into a graph in the time span it takes to read all of that data. And for me, figuring this out, really moving myself away from the status quo of "everything should be a graph" and "to make the graphs work on disk, we just need to shrink them so they do less random search," which is what DiskANN does, took a long time, because everything seemed to be trending in the direction of graphs. Yep. Yep.
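In code, that clustered (IVF-style) search is roughly the two round trips he describes. This is a sketch of the idea, not turbopuffer's implementation, and `fetch_cluster` stands in for a GET of one cluster file from object storage:

```python
import numpy as np

def ivf_search(query, centroids, fetch_cluster, n_probe=2, k=10):
    # Round trip 1 (often cached): score all centroids, keep the closest few.
    nearest_clusters = np.argsort(-(centroids @ query))[:n_probe]
    ids, vecs = [], []
    # Round trip 2: download those clusters whole, search them exhaustively.
    for c in nearest_clusters:
        cluster_ids, cluster_vecs = fetch_cluster(int(c))
        ids.extend(cluster_ids)
        vecs.append(cluster_vecs)
    stacked = np.vstack(vecs)
    top = np.argsort(-(stacked @ query))[:k]
    return [ids[i] for i in top]
```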
And so the most difficult part was actually making that fundamental architectural decision, and making sure that it's the decision that's the hard point, not the implementation of it. I think it was, yeah, getting high recall on that kind of solution, with something that worked properly on object storage. And I had some false starts: I started by using a Cloudflare Worker and doing it there, and I had to move to servers and build a small storage engine. I tried a bunch of different ways of making the index online-updateable, so you didn't have to retrain the whole index every time you did enough writes. That took some time, building a simple implementation of the WAL. There's just a lot there; I probably did three or four rewrites before I shipped the simplest thing at the end of that summer. And then
from shipping the simplest thing to getting it into the hands of your first customer, what did that look like? Yeah. So when I launched it, I was kind of exhausted from having worked the whole summer on it, and I launched it at the beginning of October 2023. And I got a nice email from one of the Cursor co-founders. This is back when Cursor was a smaller team. And knowing that team so well now, I can imagine that they had sat around the dinner table and said: oh, these vectors are so large, and the query profile that we need for doing retrieval over a code base matches this so well, since we can hydrate it into a cache when we actually query it, and only a percentage is active. They would have just come up with this; I don't know if they did or not. But either way,
it slotted right into how they thought this problem should be solved, with the right set of tradeoffs for them, right? Graphs are great if you're searching, like, a billion products all the time and you're eBay or Shopify. But for something like Cursor, where so much of the data is inactive, this architecture made a lot of sense to them. And so they reached out and they just sent, like, 10 bullet points with a bunch of numbers: what kind of cost they were running up against right now, why it didn't match their unit economics, what kind of load they had, what kind of features they would need. And we just went back and forth a bit on bullet points. And Cursor was growing really well in 2023, but it was not as big as it is now. And I felt like I needed to go meet this team in person. I had the instinct that I just needed to fly to San Francisco, but not make them feel bad about it, so I just said that I was going to be in San Francisco on Monday. That's the classic move. I didn't know it at the time. But I went to their office and we had some long discussions. I spent a bunch of time helping them also with their Postgres, I mean, they were growing a lot at the time and they were a very, very small team, and we spent a lot of time talking about their Postgres and how to tune autovacuum, coming back to that. And then I told them how turbopuffer worked and where we were going with it. And we decided to partner. And they moved all of their load, over the coming weeks after that, to turbopuffer, back in 2023. And by moving to this new storage architecture with this new set of tradeoffs, they were able to reduce their storage costs, their vector costs, by 20x, or 95%, which just matched their unit economics a lot better. Cursor, first of all, is a phenomenal first anchor customer, and they've grown tremendously.
Their customers have large vector indices with very high usage, but only a fraction, for Cursor, need to be queryable at any point in time; they only need the index in memory for the period the user is actively querying the code base. It makes a lot of sense. When you thought about initial early customers, once you had it in the hands of the first one, how did you think about the tradeoffs of, kind of, who turbopuffer is not the best fit for, and where turbopuffer particularly excels? And how do you think that changes, or do you think that changes, over time? Because to your point on Readwise, some of it is "we cannot build a feature because it's too expensive right now." Cursor saved a lot of money; they can do more. That's going to be true for a long tail of customers. So maybe your belief is just that that market grows. How do you think about dividing the market and where turbopuffer slots
in, is the short version of that question? Yeah, I think that I didn't really think about any of those things at the time; that's the honest answer. I can talk now about ideal customer profiles; I can use all these terms that I didn't even know at the time. But at the time, it just came from a strong instinct that we could make this a hundred times cheaper, and it was offensive to me that all of these existing incumbents were in memory, because it felt like there are a lot of workloads out there, like the one I saw at Readwise, that just cannot afford this and are okay with a different set of tradeoffs than the incumbents offered at the time. And I happen to think they were a really good set of tradeoffs. I didn't know what the customers were going to look like. I was only thinking about Readwise at the time, and thinking that there must be others out there and that it must be a common problem. Now I can talk in much more sophisticated terms. I was just sitting down a bit earlier today thinking about what kind of questions you might ask, and one of the things I reflected on is just the language, and I mean, you've also gotten to know me over the past few years. The language that I use to describe these things sounds like: okay, yeah, sat down, did the napkin math, built the database, got customers matching the ICP, and it looks like this master plan being executed. But it never looks like that from the point of view of the founder, and I think any founder telling you that would be disingenuous. At the time, it just came from being immersed, having spent so much time in the napkin-math soup, knowing exactly what things cost in the cloud, down to the cent on almost every SKU, and then just thinking, hey, if we put these things together, we could build something very different, with very different economics, and there's got to be a bit of a Jevons paradox, you know, the "gas gets cheaper, people drive more" thing, at play here. And it turns out that that was right.
So Simon, I'll ask you something slightly differently and more pointed, although I do want to get into some of the technical trade-offs. At what point in time did you gain conviction? Because you're like: I'm doing this, I see that Readwise has this problem, I suspect this is going to be a problem for other people, it's a perfect use case for Cursor. But, you know, there have been many vector databases, today and in the past, and then there's also a subset of folks who are using things like pgvector on top of their databases. So at which point in time did you gain conviction that there is a large market here, and that this is what you want to do for many years to come? It took time. I think that in the beginning, we were very set
on scaling for Cursor and giving them an amazing experience, and we picked up some other customers that believed in us very early. These customers that are your first sign-ups, that joined the Slack channels, it's a very special relationship that you have with them, even now, years later. At some point, one of our peers launched an architecture that looked very similar, and at that time, we were just continuing to see people who really liked the product and liked the performance. I think early 2024 is when we started gaining very serious conviction about the kinds of workloads. I would say that there was a day where we showed one of our early customers a quote. Previous to that, they were using another vector database with a different set of trade-offs that turned out to not be ideal for them, so they were paying for performance they didn't need. And when I showed them the quote, they asked me to show them a quote for 10x the data volume, because now they realized that this would unlock some product that they wanted to build, but the per-user economics previously were just holding them back. This was around May of 2024, and that's when my conviction really dialed up. And now, seeing how much the modern agents and models are spending querying data sets has increased my conviction to just an inordinate level.
I mean, that's awesome. Let's talk a little bit about what you've learned with these customers. So we talked about what it took to make that first, simplest version of turbopuffer. What are the core optimizations and changes that you've made since then? And then, the last thing, we'll get there later: I'm curious how agents play into all of this and what you think ideal storage for agents looks like. So we'll do the optimizations so far and then the optimizations you see in the future. Yeah, so, turbopuffer V1, the team internally makes a lot of fun of it. They call it founder code. I call it the reason you have a job. And the other day, someone was tagging a Cursor agent inside of our Slack saying, hey, can you remove all the code done by Simon? So there's a running joke to get rid of every single vestige of the first version. But it got us very far. But late last year, we completely
replaced the initial engine. The first engine was extremely simple. It would basically just run a very optimized version of k-means, which is what I'd spent a lot of time on, to build this clustered index. And once you'd written enough data, it would completely rebuild the index. This got us really far; I did not expect it to get us that far, given that it was rebuilding the entire index periodically. And it was very simple. We moved from a very simple binary encoding to zero-copy; we moved away from nginx very quickly; we moved away from running everything in one tmux session very quickly. We were just maturing that first engine, but it became very clear in the beginning of 2024 that this initial engine was going to sort of reach end of life by the middle of that year, given the growth that we were seeing. We knew we had a lot of room for optimization there, but at the time, another engineer joined us, a phenomenal engineer, and more or less he was focused on just building a new engine based on the workloads that we'd seen: very write-heavy, needing incremental maintenance of the clustered index. It took a lot of time to get that right, building on top of a proper LSM for object storage rather than the very hacky storage engine I'd written. And so we sort of exhausted the potential of V1 by mid-2024, and then we completely replaced it with the V2 in the fall of 2024. The V2 engine is like a textbook, very simple, sort of CS 101 implementation of an LSM on object storage (at least initially; it's not so much anymore), with the trade-offs that that comes with, and it uses an incremental clustering algorithm called SPFresh to maintain these clusters without having to rebuild the world periodically. And we switched over completely to that. There are a lot more optimizations we could go into on the V2 engine, but we expect the V2 engine to be the foundation we'll iterate on for a very, very long time. You mentioned the indexing as sort of the big decision for V1. Between V1 and V2, what were the most challenging decisions? For example, you mentioned that one of your engineers focused a lot on
writes, and there's a trade-off in terms of the number of writes, so what were the core decisions between V1 and V2 that you all spent a lot of time thinking about? The biggest pain point really was to get to something that would maintain the clusters incrementally, right? Like, suddenly, you know, you have the pants cluster, right? And then someone starts adding a lot of dresses and whatever into it, and you have to split the cluster to make the search efficient. When you're doing this at tens of thousands of writes per second, over tens of millions of vectors, it is a very difficult problem, and it's very important, because otherwise the accuracy of the index will degrade over time. It's not like a B-tree, where it's very simple to prove that it just remains stable over time as you add and remove elements.
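A toy version of the maintenance step he's describing: when writes push a cluster past a size threshold, split it locally with a small 2-means instead of re-clustering the world. This captures only the spirit of the incremental approach (SPFresh, mentioned earlier), not the actual algorithm:

```python
import numpy as np

def maybe_split(cluster: np.ndarray, max_size: int = 1024) -> list[np.ndarray]:
    """Split an oversized cluster into two via a tiny 2-means; else keep it."""
    if len(cluster) <= max_size:
        return [cluster]
    rng = np.random.default_rng(0)
    centroids = cluster[rng.choice(len(cluster), size=2, replace=False)]
    for _ in range(5):  # a few Lloyd iterations are enough for a sketch
        dists = ((cluster[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        if 0 in np.bincount(assign, minlength=2):
            return [cluster]  # degenerate split; leave the cluster alone
        centroids = np.stack([cluster[assign == j].mean(axis=0) for j in range(2)])
    return [cluster[assign == 0], cluster[assign == 1]]
```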
It's very challenging to do, and a paper came out around that time about incrementally maintaining these clusters. I'd experimented with some of that during the first summer, because there was an intuition that at some point you could split a cluster, and maybe if you took enough away from a cluster, you could merge it, and things like that, but I could not get it right. There are a couple of good ideas in this paper; we talked about it, whenever that was. And I think we weren't even convinced that this paper was a good idea; when we implemented it, we were certainly not convinced that it was even remotely possible to do this at high recall. But we started experimenting and we saw good results, though we have had to do a lot of work to make this work properly at scale. I think that if data sets are not changing very much, you can get away with just rebuilding the world, and a lot of businesses will be able to do that. But if you want to maintain indexes with tens of millions, hundreds of millions of vectors, you really need something that can maintain these clusters without having to recluster the entire data set, which is extremely expensive. So that was really the biggest development in the V2 engine, moving to this, and then also redesigning the storage engine.
The first storage engine was very simple, in terms of: I'm going to put this file here and it has this data. Whereas the V2 storage engine is a key-value store, right? It's like an LSM, where we think about compaction and we think about SSTables and all of these different primitives, rather than just a struct that is put into a file and zero-copied out of that file. It is a much more structured thing to iterate on, as turbopuffer supports more query types: not just vector queries, but also full-text queries, and some of the aggregations we can do now, these kinds of things.
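A minimal sketch of what "LSM on object storage" means for the write path: buffered writes become immutable segment files, and a background compactor would later merge small segments. Here `bucket` is a hypothetical object-store client, and real engines use SSTable formats rather than JSON:

```python
import json, time, uuid

class ToyLSM:
    """Toy LSM write path: memtable -> immutable segment on object storage."""
    def __init__(self, bucket, namespace: str):
        self.bucket, self.ns = bucket, namespace
        self.memtable: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.memtable[key] = value  # buffer until the next flush

    def flush(self) -> str:
        # One durable PUT per batch: hundreds of milliseconds of write
        # latency, but no coordinating metadata nodes to scale.
        name = f"{self.ns}/segment-{time.time_ns()}-{uuid.uuid4().hex}.json"
        self.bucket.put(name, json.dumps(self.memtable))
        self.memtable = {}
        return name
```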
So it was really a maturing of the database, where V1 was: get us to market, get some customers, and learn from the workloads. Because it was clear to me that the workload these AI companies were going to have was not going to be completely clear to us, and there was going to be a different set of tradeoffs, and we really learned on V1 what those tradeoffs were that could go into the new engine, like very write-heavy, right? And we learned a lot about how long things should stay in cache, and so on and so forth. Maybe to be explicit about that: write-heavy is one of the things, but if you just summarize how AI-native workloads pressure databases in fundamentally different ways, how would you, in two sentences, describe that? Yeah, it's probably
like a hundred-to-one write-to-read ratio might be something to aim for; for some it's different, but that's something that we see. The other thing is that compaction is fundamentally different on object storage than it is on a disk; there's no literature about that. How frequently are you seeing writes, and what number of writes, and how are you dealing with that? I mean, the biggest thing about the number of writes is that turbopuffer is designed around doing everything on object storage and not having any metadata layer. Writes, when you have to coordinate across multiple nodes, are very challenging to do, but we just commit files to object storage, and object storage is extremely scalable. So that's one way that we think a lot about writes. The other one was the incremental updating of the indexes, which is obviously extremely important if you're doing a lot of writes. Those are probably some of the things. I mean, when you think about compaction, you also want to know how many writes are coming in. How often do you have to compact the database? How do you compact it? How do you lay out the LSM?
These things, the read-write ratio dictates all of them. Not to say that turbopuffer is not phenomenal with reads as well, but we do see a lot of writes. The answer may just be that it makes no sense with the architecture, but was there ever a consideration to have a metadata layer? There was. Justin, my co-founder, and I spent a lot of time talking about whether we should have a metadata layer, and it felt like everything was leading us to that point. Richie, who you also know, at WarpStream, early on we kind of became friends because we were both building on the same architecture. And for them it was a very clear decision, right? The Kafka protocol sort of required an enormous amount of coordination with the metadata layer. But we had the luxury of there not being a real standard for these search workloads, so we could design the protocol around not having a metadata layer, if we could get around to it. But we really thought we would have to. We also thought that we would want to replicate this to make the writes faster and have lower latency. But it turned out that with compare-and-swap on object storage, we were able to do all the metadata in object storage itself. And frankly, it probably came a little bit more out of necessity in the beginning. Again, it maybe looks very clever in retrospect, but really, at the time, it was like: well, our customers are scaling really fast, we don't really have time to build a metadata layer. The metadata files were literally just JSON files on object storage that we were doing compare-and-swap on, and it worked better than we expected it to.
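The compare-and-swap trick on a JSON metadata file looks roughly like this. `get_with_version` and `put_if_version` are hypothetical stand-ins for the conditional operations object stores expose (generation preconditions on GCS, `If-Match` ETags on S3); this is not turbopuffer's API:

```python
import json

def update_metadata(store, key: str, mutate):
    """Optimistic-concurrency update of a JSON doc living on object storage."""
    while True:
        raw, version = store.get_with_version(key)  # read doc + its version tag
        doc = mutate(json.loads(raw))               # apply the change locally
        # The write only lands if nobody else wrote in between; else retry.
        if store.put_if_version(key, json.dumps(doc), expected_version=version):
            return doc
```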
When we needed a queue, we also just implemented it on object storage, thinking, well, maybe we'll have to use a better queue at some point, but it kept scaling. And so, I mean, this is also the bitter lesson of scaling infrastructure: the simple thing often takes you very far. We kept learning that again and again at Shopify as well, but you keep being surprised, too. That makes sense. Look, you talked a lot here about trade-offs, right? Like the trade-off of, you know, we didn't even have time at the beginning for the metadata layer, and I think it makes sense. But, you know, all of these databases, they do make some trade-off
between latency and accuracy. So tell me a little bit about how you measure accuracy. I know that turbopuffer does automatic sampling of, I think, like 1% of queries to measure the accuracy of index recall, but just a little more color on how you all think about that. Yeah. So, the accuracy of recall, that's really important to us. I think that we didn't feel comfortable with just the academic benchmarks at the time. The academic benchmarks used dimensionalities that we weren't comfortable with, in terms of reflecting the datasets our customers actually use. A lot of the academic data sets are maybe 128 or 256 dimensions; most of the production data sets we see have much higher dimensionality than that. The other thing about those datasets is that they don't have filtering. So if you filter by products that are on sale in Canada, it sort of cuts maybe half of the
clusters in half, and then how many vectors are you supposed to look at to get good recall? This is where we really feel that nothing tells the truth like production. And the way to get the truth from production was to sample a small percentage of the queries, run them as an exhaustive search on the indexing nodes, and then just emit it to Datadog. In Datadog, we have a view of every organization and their recall, their p10 recall and all of that, and we spend a lot of time looking at that, looking at query plans and everything. At some point we'll, for sure, expose this to users, but it was the only way we felt comfortable that every query plan was going to have high recall. The academic data sets are just not sufficient; the only thing that's sufficient is production. So this has been a very important consideration in everything we've done for turbopuffer, because we don't want our users to have to guess whether their search results aren't what they want them to be because of inaccuracies in the search engine.
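The sampling loop he describes reduces to a few lines: for roughly 1% of live queries, re-run them exhaustively and compare. `ann_search`, `exhaustive_search`, and `emit_metric` are stand-ins for the real index, the brute-force path, and the metrics client:

```python
import random

def maybe_measure_recall(query, ann_search, exhaustive_search, emit_metric,
                         k: int = 10, sample_rate: float = 0.01) -> None:
    if random.random() >= sample_rate:        # only ~1% of queries pay the cost
        return
    approx = set(ann_search(query, k))        # what the index actually returned
    truth = set(exhaustive_search(query, k))  # ground truth over the same data
    emit_metric("search.recall_at_k", len(approx & truth) / k, tags=[f"k:{k}"])
```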
I mean, I love that. Look, you've lived through dealing with the mess of when things don't work in production, so you have a lot of empathy for that and for making sure that things work as you expect. You know, we alluded to talking about the optimizations and the changes to turbopuffer in the future. It sounds like you've learned a lot from being very, very hands-on with this initial set of customers. So the first thing I'll ask is: at which point in time did you say, we're ready for GA, we're kind of done with this picking, you know, you said maybe you didn't know the words at the beginning, but picking the customers that at least felt right and working really closely with them? So when were you ready for the GA button to be turned on? I forgot to answer the first part of your question before, so before GA, let me go back to that, which was the trade-offs. Trade-offs
are very important in a database. Again, if you go really deep, that's the fundamental thing with databases, and why some databases are good at things that other databases are not as good at. I spent so long being on the buy side of databases, and every single time I went to a database website, I was just like: where are the limits, where are the trade-offs, and where's the architecture doc? Those are the things I care about. I just need to load this mental model into my head, ASAP, and know what it's not good at, because otherwise I can't tell what it's good at. The fundamental trade-off for turbopuffer is that writes are slow, right? We have to commit them directly to object storage; it takes hundreds of milliseconds. So you're not going to do an OLTP workload on something like that; it just doesn't make sense. You can't do, like, checkouts for e-commerce on something like that, unless you get it all done in a single transaction. And then once in a while, you'll hit a node that doesn't have the data on disk, and you'll hit a cold query, and it will be a couple hundred milliseconds instead of tens of milliseconds. For search, that's a perfectly acceptable trade-off. For high-frequency trading, maybe not so much. We happen to think that this set of trade-offs is pretty phenomenal
for a lot of workloads, and especially search. I mean, I really agree, and I think your website shows that fundamentally well. You can slide and understand how much you're paying; you can go and very clearly see what turbopuffer is good for and what turbopuffer is not good for. That's very customer-centric and clear, which I think resonates with the types of people you're selling to. Yeah, I mean, I just wanted the website to be the website that I would have wanted. So on GA, basically we just shipped it when it was ready. All of our engineers spend a lot more time on, and gravitate more towards, writing Rust than React. So I wouldn't say that it was like, oh, we were at this point on this curve and blah, blah; we probably could have GAed in January. We happened to GA, what, like two or three months ago or something like that, maybe a little bit less, when it felt ready. We hired someone to maintain the front end, because it had been me, but I got busy with a lot of other stuff. Well, and they're deleting all of Simon's code. Yeah, yeah, so this is the thing now: all the Rust engineers that are complaining about my code, they have to go rewrite all my JavaScript code, and they don't want to do that. So we're hiring some other people to go do that, and they should. So GA was really
about that. I don't think it was a maturity thing. I mean, there are always things that you want to improve in your product, but I think we feel really good about the offering that we have. One of the things we wanted to do was scale up a bit the support and go-to-market staff that we had, to make sure all of our customers are really well supported if they run into anything we can help them with. But in general, there was no big-brain move around when to GA or not, other than: this is when we feel that it's ready. And it was at the beginning of this year that everyone felt very comfortable with going GA. It was sort of a question we asked each other monthly, and it was like, ah, we have a bit much going on right now. And around the beginning of the year, it sort of became, yeah, any time works. So I would say it was pretty vibes-based. You know, we talked about turbopuffer V1 and turbopuffer V2, and we could probably spend another three hours going into each of the optimizations. But if we just roll the tape forward and
think about, you know, what does search look like in five years, and what are some of the demands as folks move to more agentic workflows? What do you see as the database needs of the future? Yeah, I think there's a pattern, I mentioned it a little bit, across what we're seeing from our customers. What we're seeing is that the wave of AI companies that are doing well are trying to find more interesting ways to connect more data into their products to make the LLMs more useful. Where we feel, right now, that LLMs are better than any person, on the time frame they operate in, is doing research on something. They are just phenomenal at this, and at generating reports over enormous amounts of data. And so what we see our customers want is just to search more data, and I think we can help them with that, and I think search will do that too. I think a lot of search is going to be the LLM doing the search, more so than the human; there's probably going to be a 100-to-1 ratio, something like that, I don't know, of agents to humans doing the search. But it's very clear that even if the context window goes to infinity, it's just going to be a lot cheaper for them to converse with the data in some way, and it's very clear that they need some fuzzy, searchy way to do that. It doesn't make sense to put a billion rows into a context window and ask it to do analytics on a data set. It's never going to make sense to do ACLs in a context window. Recall is always going to suffer a little bit. So there will be some combination of this.
I have no idea where search is in five years. But I think our customers have a really good sense of what they want us to ship in the next three to six months. And so we're very focused on listening to our customers, pattern-matching across them, and working very directly with them to figure out whether they really need something, so we can make sure that we maintain simplicity in our product. That's maybe the long answer to your question. The short answer is: we don't know, but we listen to our customers. They don't know either, but they know what they need right now. And if we continue to do that long enough, in a principled way, I think that will serve them really well. I love that answer. It's honest. Yeah, and it's working. One of the things that has come up throughout this entire conversation is this customer centricity. How do you hire the right team that cares about this, and that is able to, I guess, engineer at the level of bullying your JavaScript code? I mean, the short answer: we just invite our customers into Slack channels, and our engineers too. I think our engineers take a lot of pride in the stuff that they work on, and they'll loosely pattern-match: oh yeah, this seems related to something that I'm working on, I'm going to dig into this. I think we have a lot of trust in the customers that we work with, that if they report an issue, there's almost always something there. And that mutual trust, I think, just shows, and it means that our engineers want to engage directly with our customers. I don't think I have any secret answer to this, other than this felt like a very natural way to build our business. It felt very natural that we needed to work very closely with our customers. And our customers really liked it; they've said things like, "We feel like you're a high-performing team inside of our company."
But are you screening for something at the door when you're interviewing candidates, when you're bringing people in to come work at turbopuffer? How do you balance that kind of cracked technical engineer with care about the customer? Are you looking for it explicitly? I don't think I've met a P99 engineer who doesn't care about the customer experience. So it's not something that we screen explicitly for. I think we make it very clear how we think about our business, but we don't have an interview session that's like, "Hey, you're on a customer call and they're running into this bug; what are you going to do?" Maybe we should, I don't know, but I don't think it would be very high-signal. You don't know what five years out looks like; neither do your customers, and you're doing this. They know what they need for the next six months. You're adjusting and operating very quickly on that. Is there something that AI teams are
doing today that you're confident is going to be considered bad storage hygiene in a few years, even if you don't know exactly what it will look like? No, I think that some customers should be doing more bad storage hygiene than they do. We work with a customer that wanted to ingest a lot of data from third parties like Google Drive and others. And so I asked them, how are you going to do ACLs? It seems very complicated to implement the Google ACLs. And they're like, "Oh, with your economics, we're just not going to deal with it. We're just going to have a complete copy of the Google Drive per user, with the ACLs they have access to," because that allows them to go to market quicker. And they'll solve the ACL thing later, as an optimization. I think that's exactly the right way to think about it. The pace that companies are moving at right now is faster than anything I've seen before. It's only reminiscent of the fastest pace I saw inside of Shopify as it was going through hypergrowth in the 2010s, but it feels like so many companies are moving at that pace. I love working at this pace. And I think that to work at that pace, you have to make some of those trade-offs. So I don't think there's anything our customers are doing that's, like, bad storage hygiene. We see our customers run very fast with turbopuffer by abusing these economics to go to market quicker with product. Yep. Yeah. And in many ways that is the Jevons paradox that you want, for now. So I will ask you one more question, and thank you so much, Simon. I'm curious how you think you've changed as a leader, as a person, since you
started turbopuffer. This is a very good question. I think what it comes down to is that we have some very simple principles that we operate on as a company. Some of these we believe in very strongly, and we try to put them into everything that we do. And I think that for a while I didn't have as much conviction that these principles would work. Like, I didn't know whether being this customer-centric would work. But we've seen it work, so we're continuing to do it, and to work directly with the customers in a way that is maybe unusual. I think a lot of my growth as a leader has been to just trust that these simple principles, when applied for long enough, will do the job, and you don't have to sit down and come up with some, like, strategy. "Strategy" feels like a buzzword at turbopuffer, right? That is not what we do. We have simple principles around how we do things, and we align on those principles, and you do that for long enough and I think you can build a really, really great company. I don't think I had the confidence for that two years ago. I think a lot of what we talked about is thematic with that, right? That first initial customer, and following your intuition with Cursor, the Readwise aha moment without knowing exactly how big the market is, and things have compounded on top of that. So I know we're at time, but thank you so much, Simon, for coming on and taking the time. Always fun catching up. Thank you for having me, Barr.

