Full Transcript
[Music] Welcome to the Solspace podcast. Thanks for listening.
Mitchell: Welcome back, everybody. This is the Solspace Podcast. I'm Mitchell Kimbrough, founder of Solspace, and I've been realizing that in all these podcast episodes, I've never said my own name.
That's a little bizarre. I'm trying to be more professional now. We're back with a second episode of a conversation with Thomas Payet at MeiliSearch. So, Thomas, you're the co-founder and chief operating officer at MeiliSearch. And in the previous episode, we talked about how MeiliSearch got its start as an open source project and then grew into a company, and now you're about to get into the space of monetizing the product. And you gave us a timeline on this previous episode about when that was happening, and we talked a little bit about the details of that.
I just wanted to welcome you back. And this episode, I wanted to get into use cases. So, I wanted to get into the different types of applications of MeiliSearch on different types of websites.
And maybe I could start us off by coming back to something you said on the previous episode. You've referenced the search bar many times as an abstraction. Developers are building websites, maintaining websites, and they're responsible for a search bar, meaning a bar that has a keyword field that customers, users, visitors, the audience of a website can use to find the data on the site.
There's a lot of other uses of MeiliSearch. Faceted search in particular is what we're going pretty deep in at Solspace. But let's just back up a little bit and let me ask you a question that I didn't thoroughly ask on the previous episode.
Can we talk about why MeiliSearch is so fast? When I put in a search query, send a search to the API, the results I get back come back faster in some cases than I feel like the browser can actually draw the HTML into the page. So, why is it so fast?
Can you talk about that?
Thomas: There are multiple reasons, and I don't know at which level to start. When we are building MeiliSearch, what we have in mind is to get the answer back as fast as possible, so the end user has this feeling of instantaneity, that it's not waiting behind the network or anything, just that you're typing something and you find it. So everything that MeiliSearch does is done for relevancy, of course, but also for speed.
And I don't know if you saw it, but when you're indexing documents in MeiliSearch, we are building a lot of different data structures that make it easier for MeiliSearch to find the document you are searching for. This is exactly how the index at the back of any book works: it's an inverted index, and you can find the pages that contain the word you are looking for. It works exactly the same.
We have a list of words for each document, and when you're typing a specific word, we know that it's present in all those documents and we have to give them back as soon as possible. So this is how a search engine works at first. And from there, there are a lot of different optimizations that you can do to make the search even faster.
One of the big things in the timeline when you're querying for documents is the network. So, talking to specific servers and getting the result back as fast as possible. We do advise having MeiliSearch connected directly to the front end, because if you have a proxy between MeiliSearch and your application, you might double the latency to get your results back.
Having a great search experience comes from getting the best result possible, but also getting the result as fast as possible, because that way you can modify your query if you're not happy with what you found. You can have feedback instantaneously instead of waiting for the page to load. This is what we call search as you type.
And yeah, that's one of the many things that we have in mind when using MeiliSearch.
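To make that concrete, here is a minimal sketch of what search-as-you-type can look like in the browser, using Meilisearch's official JavaScript client and talking to the engine directly rather than through a proxy, as Thomas suggests. The host, the search-only key, the venues index, and the title field are placeholders, not anything from this conversation.

```ts
import { MeiliSearch } from 'meilisearch'

// Placeholder host and key; a search-only key is safe to expose to the browser.
const client = new MeiliSearch({
  host: 'https://search.example.com',
  apiKey: 'SEARCH_ONLY_KEY',
})
const index = client.index('venues')

const input = document.querySelector<HTMLInputElement>('#search')!
const list = document.querySelector<HTMLUListElement>('#results')!

let timer: number | undefined
input.addEventListener('input', () => {
  // A very short debounce keeps the experience instant without flooding the engine.
  window.clearTimeout(timer)
  timer = window.setTimeout(async () => {
    const res = await index.search(input.value, { limit: 10 })
    list.innerHTML = res.hits.map((hit: any) => `<li>${hit.title}</li>`).join('')
  }, 50)
})
```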
Mitchell: So, the basic architecture is important and the architecture here is not necessarily unique to MeiliSearch, but the idea is that there's a body of data, there's a data set, and then that data set is indexed. So, a separate thing, an index of that data, is created. In the case of MeiliSearch, and Algolia too, you can have multiple indices for a variety of different reasons, but it matters that there's a concept of a separate index from the data set itself.
And the reason that matters is beyond me, but it's PhD level stuff. But the idea is that, well, I mean, you could go do a search on all of the content in the database, which is kind of the out-of-the-box CMS way that you might do it, but then there's the faster version where you index that content and you sort of anticipate what types of searches will have to take place on that. Can you tell me a little bit more about that separation and fundamentally why that matters?
Why is that fast?
Thomas: When you are querying a database, what the database is doing is looking through everything for you. It's like, okay, I'm checking this document. No, it's not there.
I'm checking the next document. No, it's not there, until it finds something. When we are building an index, we are actually making a copy of some of the data that you have, not all of it, but at least the fields that you want to search in.
And we are organizing it so that searches can go through it as easily as possible. So we are trying to reduce the number of lookups that we are doing by copying a lot of data, having different indices in different formats, so we can dedicate, if I can say so, all the resources, all your CPU time, all your RAM, to just making sure that we get to the result as fast as possible. And this is the reason you need to have a separate search engine on top of your database:
search is not meant to be the source of truth. Your database is there for keeping your data safe, whereas search is there to make the data accessible as fast as possible. So we make a copy of what makes sense for us to keep, to make the search as fast as possible, and we get rid of everything else.
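As a rough sketch of that separation, assuming the official JavaScript client: the database rows stay the source of truth, and only the fields worth searching on get copied into the index. The fetchVenuesFromDatabase helper, the venues index, and the field names are hypothetical.

```ts
import { MeiliSearch } from 'meilisearch'

// Hypothetical stand-in for a query against the primary database, the source of truth.
async function fetchVenuesFromDatabase() {
  return [
    { id: 1, name: 'Seaside Terrace', description: 'Venue with an ocean view', view: 'ocean', price: 3, internalNotes: 'never searched' },
  ]
}

const client = new MeiliSearch({ host: 'http://localhost:7700', apiKey: 'ADMIN_KEY' })
const index = client.index('venues')

const rows = await fetchVenuesFromDatabase()

// Copy only what search needs; drop internal bookkeeping fields entirely.
const documents = rows.map(({ id, name, description, view, price }) => ({
  id, // Meilisearch uses this as the primary key
  name,
  description,
  view,
  price,
}))

await index.addDocuments(documents)
```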
Mitchell: And the indexing process allows the developer, us, to hint to Meilisearch what matters as far as relevance. So Meilisearch wants to return relevant results fast. The indexing supports both of those requirements.
This is basically a guess on my part, but I know from working with Meilisearch that I could tell it, here's a little bit of information about how to know that something is relevant. For this website, for this client, for its audience, here's how to know. How does the relevance work?
Thomas: There are different ways to manage the relevancy. The thing is, what kind of relevancy rules are most important to your end user? Yeah.
On Google, they have this PageRank algorithm that ranks pages based on, I don't know, the number of links that point to them, the number of keywords they can find in your web page. And in Meilisearch, we have many different relevancy rules that you get to reorder if you want to, if you need to. By default, it manages to find the query words you're looking for in the right fields.
So if the word that you're searching for is in a title instead of the body of the content, maybe that makes more sense. And there are different rules like this that we have by default, but in certain use cases, and this is where we touch on other things like facets and filtering, in different contexts you want to have a different relevancy. And so you want to either reorder the relevancy rules, the ranking rules, or you want to apply some filters and facets on it.
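Roughly how that hinting can look through the JavaScript client: the order of searchableAttributes puts the title ahead of the body, and the ranking rules listed here are Meilisearch's defaults, which can be reordered or extended. The index and field names are placeholders.

```ts
import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({ host: 'http://localhost:7700', apiKey: 'ADMIN_KEY' })
const index = client.index('venues')

await index.updateSettings({
  // Order matters: a match in `title` outranks the same match in `body`.
  searchableAttributes: ['title', 'body'],
  // The default ranking rules; they can be reordered, or a custom rule such as
  // 'rating:desc' appended if the documents carry a `rating` field.
  rankingRules: ['words', 'typo', 'proximity', 'attribute', 'sort', 'exactness'],
})
```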
Mitchell: So also on the podcast is David Estrada from the Solspace team, the main developer who brought Meilisearch to everyone's attention over here. And on this episode, I wanted to, in addition to talking about the question of speed and relevancy, I wanted to talk about use cases. And David, maybe you could talk a little bit about this idea of faceted search.
Maybe we could just talk at a high level about what that's about, and we could get some questions over to Thomas about that.
David: Faceted search, you can look at it as a way of filtering data, or rather, subsets of data. So imagine you have a page where you want to look at a certain price point, let's say between $5 and $10. You want to display all of the items or all of the products that fall between those values.
So by having this faceted search, you can say to Meilisearch, give me all of the products that are within this $5 to $10 range, and it will return those results to you.
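In Meilisearch terms, that price-range request might look roughly like this with the JavaScript client, assuming a hypothetical products index whose price attribute has already been declared filterable:

```ts
import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({ host: 'http://localhost:7700', apiKey: 'SEARCH_ONLY_KEY' })
const index = client.index('products')

// `price` has to be listed in the index's filterableAttributes ahead of time.
const results = await index.search('headphones', {
  filter: 'price >= 5 AND price <= 10',
})

console.log(results.hits)
```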
Mitchell: Our recent deployment with Here Comes the Guide was the first production launch, and this website is a directory of wedding venues and wedding vendors. So if you're about to hold a wedding or you're planning it or you're starting the planning process, you come to this website and you start browsing for places that you might be able to have the event. And the reason why faceted search mattered so much here is there's so many different attributes that a bride and groom might want to consider as they're planning the wedding.
So there's, of course, the e-commerce implementation that you're talking about, which Thomas was saying is of wide importance for this kind of search. But there's also this ability to filter down a dataset based on all sorts of criteria. And you can undo those filters, expand the dataset back out, and drill down again.
And as you're doing that, the speed matters so much. If that's slow, then you can't properly browse and explore the offering of a given website or the directory that's being presented to you. The same with any sort of a product offering on a website.
If I'm exploring some sort of database of electronics parts, I need to be able to come in and out of the magnification of that data by using these facets and these filters. So Thomas, in your discussion on the previous episode, you talked a lot about keyword search and relevance, and in our use of Meilisearch, in the documentation and the different use cases we explored, there's a lot of emphasis there. But the most important thing for us and a lot of our clients is going to be the faceted search capability.
So what's the difference between a facet and a filter?
Thomas: That's a very good question. To me, the big difference would be that the facet is something that you know you will be using when building your search or your discoverability interface. And so when you're setting facets, you are telling Meilisearch, we will be using this attribute or this field to filter documents.
And so Meilisearch can prepare. Most of the work is happening at indexing time. We are reordering or writing the documents in different ways to make them more easily searchable for us later on.
Whereas when we are talking about a filter, if you say to Meilisearch, we want to filter on a specific field, we are not preparing anything beforehand and it just happens at query time. So it's maybe a bit more flexible, because the usage might vary more from one end user to another, whereas the facet is more strict. But a facet will be much more efficient in terms of speed and relevancy, because Meilisearch knows that it has to prepare for the facet and for the queries.
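In the engine's settings, both sides of this lean on filterableAttributes: declaring the attribute up front is what lets Meilisearch prepare at indexing time, and the filter expression itself is then passed freely at query time. A sketch with the JavaScript client, with the index and field names as placeholders:

```ts
import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({ host: 'http://localhost:7700', apiKey: 'ADMIN_KEY' })
const index = client.index('venues')

// Declared up front, so Meilisearch can build the supporting data structures while indexing.
await index.updateFilterableAttributes(['view', 'price'])

// Composed freely at query time over the attributes declared above.
const results = await index.search('garden', {
  filter: "view = 'ocean' AND price <= 3",
})
```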
Mitchell: So in this use case that we just launched, this wedding website, a facet here is view. Some venues have a view. Do they have a view of the ocean, mountains, cityscape, or none?
There are four options on that facet. And people might frequently search on that question. And David, it made sense to declare that as a facet through the API, to tell Meilisearch to bake it into the index it creates for us, so that queries against it came back fast.
David: Yeah, because we knew which attributes were going to be fixed, like Thomas just said. We knew those terms, those values, were going to be fixed in the data structure that we were sending to Meilisearch. So it made sense for us to do it this way, as a facet, not as a dynamic sort of thing with a filter.
And that gave us the performance that we were after.
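A sketch of how that view facet can be requested at query time with the JavaScript client. The parameter names follow recent versions of the client and may differ in older releases, and the counts shown in the comment are invented for illustration.

```ts
import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({ host: 'http://localhost:7700', apiKey: 'SEARCH_ONLY_KEY' })
const index = client.index('venues')

// Ask for facet counts on `view` alongside the hits, to drive the facet UI.
const results = await index.search('', {
  facets: ['view'],
})

// Something like: { view: { ocean: 42, mountains: 17, cityscape: 9, none: 120 } }
console.log(results.facetDistribution)
```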
Mitchell: So Thomas, on this particular instance where we've deployed Meilisearch, price is a facet, but it's not like $347. Price is, you know, one dollar sign, two dollar signs, three, four, five, just like a restaurant in a directory. But there are a lot of e-commerce implementations where the exact price is part of the dataset.
So is that still a facet or does that become something else?
Thomas: It would be a facet. And actually, it was not something we had from the beginning, being able to filter and facet based on a number, on numeric values. And it's actually something that is really useful, because it also allows you to facet on dates, if that makes sense.
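That numeric support is also what makes dates workable. Meilisearch has no dedicated date type, so a common approach, sketched here with a hypothetical events index and startsAt field, is to store dates as Unix timestamps and filter on them as plain numbers.

```ts
import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({ host: 'http://localhost:7700', apiKey: 'ADMIN_KEY' })
const index = client.index('events')

await index.updateFilterableAttributes(['startsAt'])

// Dates stored as Unix timestamps can be filtered like any other number.
const from = Math.floor(new Date('2024-06-01').getTime() / 1000)
const to = Math.floor(new Date('2024-08-31').getTime() / 1000)

const results = await index.search('wedding', {
  filter: `startsAt >= ${from} AND startsAt <= ${to}`,
})
```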
David: Yeah, we actually use it as a facet.
Mitchell: Well, the facets and the filtering matter to us because, I mean, Google will never do that. Google will never position one of the users of our client sites to be able to get a dataset based on a keyword and then filter that down further. It can never know what matters to the dataset and what matters to the users.
Like on this wedding website, we're very clear on what matters and which filtering attributes are important to them. So we build an interface around that and indexes that support it. So that's a big part here.
So we've talked about that kind of implementation. We've talked about e-commerce. Thomas, you're better positioned than David or myself to know what other sorts of implementations people are making out of MeiliSearch.
I mean, I'm sure they're doing stuff with iOS apps. Is that the case? What else is happening out there?
Thomas: Mobile applications, of course. We have more and more usage coming from B2B applications, you know, SaaS apps that either need search to look through the documents they have for each customer, or use it as an entry point for the application. An example could be Notion.
I'd love to have Notion as a customer because I'm using Notion every day and the search is really, really slow. And it's really something we could help with, because you could have an index for your whole room, I don't know what they call it in Notion, your own space, and have facets for everything that you write down in Notion, but also have different filtering based on the person who is searching for something. So yeah, I think most of the use cases, at least this is how we consider them, are B2B and e-commerce, so B2B apps and SaaS, and content websites, site search, documentation search also.
And more and more we have, and this is something difficult to work with, a search that searches through different types of datasets. We are working with platform.sh, and they have this search that searches in the documentation but in other websites as well. And the search begins to be the entry point for your app and for the people who are using your service or product, so they can have access to anything from the beginning, from the first search that they make.
David: I have a question for you. In MeiliCloud, that's what I'm calling it, what are you using to determine which server capabilities you're going to need for any given index?
Thomas: It actually depends on the use case. For example, when you have an e-commerce website, you don't have that many documents. I mean, maybe you have a few thousand different items you want to sell, but that's not that much.
But you might have a lot more queries or a lot of updates on a daily basis, because you want to update the prices, you want to update the stock. And so the usage here, when it comes to the server, is really write-intensive, because there are a lot of updates, even if it's on small datasets, plus some searches. Whereas if you're doing a documentation website or a content website that does not change that much, maybe you won't have that many writes into your search engine or your database at all.
But you might end up with a lot of different searches from all over the world. And in that case, you want to provide more RAM, some more memory and less CPU. So it really depends on the use cases that we have.
Today, we do have a standard machine that we offer for everyone but we optimize from there.
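One practical way to ease the write-heavy e-commerce pattern Thomas describes, sketched here with the JavaScript client and hypothetical product fields, is to group the day's price and stock changes into a single partial-document update rather than many tiny ones, so the server processes one indexing task instead of thousands.

```ts
import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({ host: 'http://localhost:7700', apiKey: 'ADMIN_KEY' })
const index = client.index('products')

// Imagine these price/stock changes accumulated from the storefront during the day.
const priceUpdates = [
  { id: 101, price: 7, stock: 12 },
  { id: 102, price: 9, stock: 0 },
  // ...more partial documents, keyed by the primary key
]

// One batched partial update (one indexing task) instead of one task per product.
await index.updateDocuments(priceUpdates)
```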
David: Okay, I see. Yeah, that was one of the things I had a little bit of trouble with when we were doing this project: determining the ideal server configuration we needed to make this work. So I was kind of curious about that.
And I'm sure I'm not the only one who's had this question about how to determine what server I need, because knowing the configuration helps you determine the cost per month for that server.
Thomas: Exactly. This is a work in progress on our side. We do need to provide more benchmarks, because even if they won't be adapted to your exact use case, they would give you a better understanding of the needs of your server based on the number of documents, the number of searches, or the number of indexing operations that you might need.
It won't be exact, but at least you will have a better understanding of that.
Mitchell: Thomas, if a client out there, someone who's responsible for a website as a marketing director, if they like what they've heard and they want to look into bringing Meilisearch to bear on their website, I know that there's this open source community of people who are engaging in the product. Do you have a developer community or directory where people can go and find someone to help them get started?
Thomas: We consider the documentation to be the entry point but you can find everything on GitHub. We have a Slack community actually. We will migrate to Discord.
But this is a place where developers are talking about their own implementations of Meilisearch. They can ask questions, and if someone from the community doesn't have the answer, the Meilisearch team is actually on there answering as well. So we make sure that we are building the right features, and this is also the big win when it comes to open source.
Once we release something, we have thousands of users using it and we get feedback very, very quickly. So we are able to fix what we've done and to react as fast as possible. So if you're interested in trying Meilisearch, you can join the Slack or the Discord.
On GitHub, there are already a lot of different questions that have been asked and that you can find answers to. But I do hope that the documentation is good enough for you to get started in a few minutes.
Mitchell: Yeah, that's definitely true. If you're a developer, yeah, definitely. Some of our clients, they come to us on a regular basis and say, I don't know the technical stuff.
And I tell them, you're not supposed to. That's our job. That's why you hire us.
And they still... Honestly, the documentation is so easy to read and so user-friendly, I think it's a good starting point too. It's helpful to have use cases like example implementations of that.
So the Slack option is a really good thing to do. There's a question that I had. I'm still curious about the Meilisearch cloud offering.
And something you said on the previous episode stuck with me and you were talking about how in instances of search, when someone puts in a keyword, you don't necessarily know whether they're searching for golf, the Volkswagen vehicle, or golf, the sport. And if someone's using a Meilisearch implementation that we have deployed for a client on a server on DigitalOcean or Google Cloud or whatever, there's no possibility of having some intelligence where we could infer what the user is looking for based on previous history or something like that. Is this possible now or in the future with Meilisearch or the Meilisearch cloud?
I mean, up to a point, that's a really useful feature for users to be able to experience. And if it's done right, it's not an invasion of privacy or anything. I just wonder, maybe we could talk about that.
You had me intrigued with that.
Thomas: Yeah, this is something on the roadmap. Actually, I think we call that contextual search. And that would mean that we would keep the information of what you've been searching for before, or maybe we will ask to get that information from a different source.
And so we can infer the best relevancy for you. But that would mean also, even when you're not searching for something, but just exploring the dataset using facets and so on, we can also improve the experience based on the context, so the history and the previous actions of the user.
There are many, many different levels of complexity in doing so. But I think we have a pretty good idea on how to make a first simple version that would improve the experience a lot. So this is one of the big features we have in mind for next year.
We also know that at some point, we want to give organizations, businesses, the opportunity to plug their own business logic inside the engine. An example could be: if you're searching for documents on Slack, you want to have a higher relevancy for messages that you exchanged with the people you exchange with the most. And so we want to give developers the ability to have this kind of very specific business rule that does not apply to anyone else, except if you're building a Slack app.
So the search can be really adapted to your use cases and your business.
Mitchell: So, I mean, are you heading in the direction where it could be possible for MeiliSearch to support the type of behavior that we see across the web, where people who searched for this also searched for that, and have the results informed by that sort of crowdsourced information? Yes. Okay.
That'd be sick. David, we're going to go crazy with that if it comes out.
David: Yeah, that recommendation system will be great.
Mitchell: There's so many different applications for that. Now, I mean, the challenge at the engineering level is how much of that to own and how much to hand over to developers like us to manipulate. I'm glad it's not my problem.
It's your problem. But yeah, that's pretty appealing. This is the challenge with our clients helping their users find the information they want.
One of the great vectors to use is: someone who searched like you did, what were they interested in, and how might that be relevant to you? That'd be pretty great. Well, maybe this is a good stopping point.
I feel like we could do a whole bunch more episodes because there's still a bunch of questions I want to ask you about this. David, did you have anything else you wanted to ask Thomas?
David: Yeah. I have kind of a geek question. Why did you decide to go with Rust and not Go?
Thomas: Yeah. When we met with my two co-founders, we were working on that search engine for an e-commerce company and we went with Go actually. We went with Go because the learning curve is a bit easier maybe.
You still get great performance, and it was easier to find developers who code in Go than in Rust, at least three or four years ago. But we quickly ran into performance issues because of the garbage collector. Go is at a higher level of abstraction than Rust.
And when we were managing the memory of the search engine and querying a lot, searching constantly, we saw a decrease in performance every time the garbage collector kicked in. And so it was harder for us to predict the speed of the search. So that's one of the reasons we went with Rust.
The other would be maintenance. Maintaining a code base in Rust seemed a lot easier for us because of all the tooling that you have when you're a Rust developer, so Cargo and all that stuff.
Yeah, a very modern way of managing your code base. Whereas in Go it was not that well defined, at least, again, three or four years ago. Maybe it has changed.
So we were more confident in switching to Rust and using Rust for that kind of tool instead of Go.
David: Okay, that's interesting. Nice to know. I was kind of curious why Rust, but now I see why.
Mitchell: Wow. All right. David Estrada, Thomas Payet of Meilisearch, thank you for joining the podcast.
Thanks for answering all of our questions. Best of luck to you guys. You're not going to need it.
You're doing a great job already. We're really excited that we have such a great tool that we can bring to bear on a lot of our client problems. I just want to thank you again for coming on and talking with us.
David: Thank you, Thomas, for being here.
Thomas: Thank you very much, Mitchell. Thank you very much, David, for having us. That's really nice.
[Music] You've been listening to the Solspace Podcast.