localfirst.fm

A podcast about local-first software development

Listen

Conference

#10 – Matt Wonlaw: cr-sqlite, syncing strategies and incremental view maintenance


The guest of this episode is Matt Wonlaw, a prolific local-first tool builder who’s behind projects such as Vlcn, cr-sqlite and Materialite. Most recently Matt also joined Rocicorp to work on their new product. This conversation will go deep on his projects covering CRDTs, SQLite and incremental view maintenance.

Mentioned in podcast

Links:

Thank you to Expo and Rocicorp for supporting the podcast.

Transcript

Intro
00:00So I think it's a lot easier to develop when your queries are
00:03co located with your components.
00:05So every component is responsible for getting the data it needs.
00:08And this makes your app like super composable, right?
00:11Like you can add and remove components to your app or your component tree.
00:16And because they're responsible for getting their data, like, you
00:19don't have to do any additional wiring or prop drilling.
00:21Welcome to the localfirst.fm podcast.
00:24I'm your host, Johannes Schickling, and I'm a web developer, a
00:27startup founder, and love the craft of software engineering.
00:30For the past few years, I've been on a journey to build a modern, high quality
00:34music app using web technologies.
00:37And in doing so, I've been falling down the rabbit hole of local-first software.
00:41This podcast is your invitation to join me in that journey.
00:44In this episode, I'm speaking to Matt Wanlaw, a prolific local-first tool
00:48builder who's behind projects such as Vlcn, CR-SQLite, and Materialite.
00:53Most recently, Matt also joined Rocicorp to work on their new product.
00:57In this conversation, we go deep on his projects covering CRDTs, SQLite,
01:02and incremental view maintenance.
01:04Before getting started, also a big thank you to Rocicorp and
01:08Expo for supporting this podcast.
01:10And now my interview with Matt.
01:13Hey Matt, welcome to the show.
01:14How are you doing?
01:16Good.
01:16Yeah.
01:17Thanks for having me.
01:17I'm super excited to have you on the show.
01:20You and I have been collaborating on a few projects now over the
01:23course of the last few years.
01:25Most notably, you've been working on CR-SQLite and also
01:29Materialite most recently.
01:31But for those of you who don't know Matt would you briefly
01:35mind introducing yourself?
01:36Yeah um, Matt Wonlaw, and I guess I've been involved in the local-first
01:41community for about two years now.
01:43It originally started with me trying to resurrect an ancient project
01:46of mine and trying to add, like, multiplayer and sync between devices.
01:51And I saw, like, that's actually a pretty hard problem, and there weren't, like,
01:54many off the shelf solutions, so then I just, like, dove into this , and that's
01:59where all these projects like CR-SQLite, Materialite and other things came from.
02:02Yeah, and I think this is also how you and I have met quite a while ago.
02:07And before that, you've been working at Facebook for many years.
02:10And even before that, I think you had your first engineering role
02:13working on some software that by definition had to be very local.
02:18There was no local-first yet.
02:20Would you mind telling us a little bit more about what those first apps were that
Lockheed Martin
02:26Yeah, so like my first job out of college was at a defense contractor
02:31called Lockheed Martin, but essentially what we were doing was we were building
02:35software for submarines, so like U.
02:37S.
02:37Navy submarines, and yeah, obviously a submarine has no internet connection,
02:41everything has to be self contained, but even beyond that, so the software I built
02:45was for, Like these thick clients for the workstations that sonar operators used.
02:50And, you know, there, there are servers on the sub, you know, servers that
02:54provide storage and some that provide the signal processing and whatnot.
02:58But like an extra requirement was, you know, a sub is in a hostile environment.
03:03Things can break.
03:04So the work, workstations need to be able to work even if all the servers were down.
03:08So, you know, they need to hold the last 24 hours of data.
03:10So the sonar operators can, you know.
03:12If every other system's down, they can at least work with the last
03:1624 hours of data that they had.
03:18So that, you know, I guess was my first brief introduction to local
03:23apps that eventually do need to do some sort of syncing once connections
03:26are re established and, you know, reconcile the changes that were made.
03:29That's fascinating.
03:30Have you learned some patterns back then that you still see out in the wild today?
03:36Or do you think the approach that you've taken back then is entirely
03:39different compared to today?
03:41Uh, Back then the data was very well segmented.
03:44So there was like never really conflicts.
03:46But event sourcing was the main pattern.
03:49So just have an event log of everything that has occurred.
03:52And when connectivity is reestablished, do you know.
03:55Play that event log forward from, you know, the time you lost
03:57connection to the current time.
03:59Yeah.
04:00I mean, event sourcing, I feel like is still a super active topic.
04:04I'm actually looking into it much more closely myself for the use case of
04:09building Overtone, and I'm also exploring to embrace it a bit more for like syncing
04:14and migration strategy for LiveStore.
04:16But I also think that there might be a renaissance for event sourcing.
04:20We had it for a while with Redux in a much more self contained space, but I
04:26think there's a, that's an interesting topic for research and production as well.
04:31Curious to hear your thoughts on event sourcing and how it
04:34could fit into local-first.
04:36I mean, I really like it.
04:37, you have a source of truth, like of all the facts, all
04:41the things that ever happened.
04:43Whereas, you know, a regular app or state or database that
04:46doesn't do event sourcing.
04:48You're losing all the things that happen, right?
04:51And you just have this snapshot of what things are right at this instant.
04:55So yeah there's no way to rewind.
04:57There's no way to, merge other people's changes if they did come in the past.
05:01, so I guess event sourcing, you know, probably fell out of favor in some
05:05ways because, you know, storage requirements, like are you really
05:08going to store all the event logs?
05:10And then also, you know, processing those event logs.
05:12You do eventually need to process them into some you know, snapshot of
05:16state so your app can work with it.
05:18So I think, you know, people seeing that, oh, I have to process the events
05:20and turn it into like these tables anyway, like why even do the event logs?
05:23Why not just mutate in place?
Meta
05:25Yeah, I think in case anyone of the listeners is interested in local-first
05:30and event sourcing, please get in touch.
05:32I think this is a very interesting topic to research a little bit After
05:37building those apps for like submarines which sounds super fascinating, you've
05:43at some point also decided to move on and join Facebook nowadays, Meta.
05:49I'm curious to hear more what you've been working on there and
05:52whether there was any intersection with your local-first topics now.
05:58Yeah.
05:58So Meta,
05:59, I guess, like, it shaped the way I approached the local-first problems after
06:04I left, but the problems there were very , , not related to local-first at all.
06:08So I guess, yeah, what I worked on, it was like three different teams, but
06:11they all kind of shared the same theme.
06:12And a lot of what I was doing, we had a system called Download Your Information.
06:16So this is like any Facebook user can go click a button and get
06:19an archive of all their data.
06:21So we'd have to like crawl your entire Facebook graph
06:25and find Everything you own.
06:27So we call it the ownership graph.
06:29And then another thing one of my team's built was the deletion framework.
06:33So this is like you delete your account.
06:35So go delete every post you've ever uploaded, every like you've ever
06:38made, every comment, every like ad you've ever run every message you've
06:43ever sent across all the products, Instagram, Messenger, Facebook.
06:46What's I guess not WhatsApp.
06:47And then, yeah, so that, that was like traversing this deletion graph.
06:50And then a third product, like, variation on all these themes was
06:54like investigations and human review.
06:56So, like, obviously there's lots of abuse on Facebook.
06:59And we had to have a way for, like, every product that's ever created, like,
07:03Can we make their content reviewable by like a human review team at Meta?
07:08So this is like, yeah, trying to fan out from the content and traverse the graph
07:12of things associated with it for review.
07:14So I guess like, all this like graph walking was very, goes
07:18very hand in hand with query languages and also schematization.
07:23So, I guess I mentioned like three different areas of like
07:27quote unquote compliance, right?
07:28Deletion, download your information, and content review.
07:32Well, if you're a product team at Meta, you don't want to
07:35be slowed down with having to integrate into all these systems.
07:37You don't have to learn about them.
07:38You just want to like ship your product.
07:40So like our key focus was how can we make it so developers can
07:43just make their product and not have to understand these things.
07:46So we kind of, we built it into the schema language.
07:49So as you're defining your schema for your product, you can say like for a
07:53given edge, like this is an ownership edge, or this is like a deletion edge.
07:57And then just like by declaratively specifying all this stuff all these
08:01other systems could just work.
08:03And.
08:03Yeah, eventually when I pivoted to local-first problem, I was like, wow, this
08:06like developer experience of everything being declarative and the schema layer
08:12down, taking care of it for you maybe we can do this for local-first too.
08:16I couldn't agree more.
08:17Did you eventually land on something where you got those declarative
CR-SQLite
08:24Yeah, so, the project I ended up working on was called CR-SQLite,
08:29so I was, like, taking these ideas and adding them to SQLite.
08:32So, yeah, SQLite's already, you know, a relational database, and it's already
08:36pretty declarative in terms of you can define your schema for your table,
08:40and you can specify an index, and you can specify foreign key relations, and
08:43the database manages all that for you.
08:45But it didn't have any primitives for collaboration, right?
08:49So, like What if I give you a copy of my database and I have a copy of my
08:54own database and we both go offline and we both make a bunch of changes?
08:58What if we want to like sync our changes together or merge our DBs?
09:01So what I started adding to SQLite was this concept of CRDTs.
09:05So you could say for a given table, what type of CRDT should we modeled as?
09:10Like if it's a grow only set, or if it's like an add or move set, like
09:15technical terms in CRDT literature.
09:17And then for specific columns in the table, you could also say
09:21like, what sort of CRDTs should be used to merge those columns.
09:24So yeah, the idea was any developer who has an app backed by SQLite, they can
09:30just go in and make some schema changes.
09:33And then that app can become collaborative by allowing merging
09:36databases with other collaborators.
09:39Got it.
09:39And I think the parallel there is that many things about SQL can be also seen
09:44as rather declarative and the schema modeling, et cetera, those is where
09:49I think what we gain with local-first software is the the collaborative
09:53nature of the apps working together.
09:56And I think this is where you.
09:58Brought some of the nice experiences that you've seen at Meta, where there's
10:02such a strong engineering culture for having a great developer experience,
10:07bringing that to the new topic that you were interested in, local-first.
10:12Meta had a really interesting philosophy when I got there.
10:15I think like one of my first questions I asked was like, oh, how do we like
10:18make sure people use our stuff that our teams, our security infrastructure?
10:22And they're building things to make sure Meta's code was more secure.
10:25That was my first team.
10:26And like, security at Lockheed was always Everybody hated it.
10:29It was like, nobody wants to do it.
10:31And there's always this like mandated thing.
10:33So I got to Facebook.
10:35I'm like, how are we, I'm on security for, how are we going to
10:37make sure people use our stuff?
10:39And it was interesting.
10:40Like the manager at the time was just like, yeah, like we can't do anything.
10:45Facebook's bottom's up.
10:46We can't do anything.
10:47Make sure anybody uses our stuff.
10:48The only way we can do it is make the safe way, the default, and easiest way.
10:53Like, people are going to choose the secure way because
10:55we've made it the easiest.
10:57And like, there's no other way we can make them choose it.
10:59So yeah, there's a huge, always a huge focus on everything we did from DevX.
11:03So, that like, developers would just pick that solution.
11:07Because it was easiest, not necessarily because it was secure, just because
11:09it was the easiest thing around.
11:10And then it was also secure.
11:12Which is like the nice added benefit.
Strut.io
11:14So after having worked for many years at Facebook, you've later then started
11:19working on CR-SQLite, but you didn't arrive at CR-SQLite immediately.
11:24I think you were for a little bit also still working on
11:28bringing back an app you've been working on before called Strut.
11:32So I'd be curious to hear a little bit more about that and particularly
11:36given that you worked on it quite a bit before you joined Facebook and then
11:41you started working on it again after.
11:43So tell me more about the app and how that led you to local-first.
11:47Yeah.
11:47So it was my first, I guess, web app I was building.
11:51So yeah, at Lockheed, I was doing all these like Java rich clients,
11:54and I wanted to get into web dev.
11:56So I started building this thing called Strut.io.
11:58It's a presentation editor.
12:00And I knew people's devices, you know, they have storage and
12:03compute, and the browser had, you know, some storage back then.
12:07And I didn't want to, like, run servers for people.
12:09I just wanted them to be able to work and edit presentations if
12:13they're online or offline and not have to sign up for an account.
12:17Like you go to the site and you can immediately start doing stuff.
12:20So yeah, like I guess without knowing it, I was building a presentation editor that
12:25was, you know, local-first in some ways.
12:28And at that time, it was like 2011, 2012, there was, you know, I guess the prequel
12:33to the local-first movement, there's like remoteStorage.io and ownCloud.
12:37And I started getting involved with those, but then, yeah, eventually
12:40I got the Facebook offer and I just put Strut on the shelf to be
12:43resurrected sometime in the future.
12:45And then, yeah, when I was leaving Meta or Facebook, yeah, I didn't
12:49know exactly what I was going to do.
12:50So I figured, oh, I can just resurrect Strut.io real quick and you know,
12:54it'll bring in some side income.
12:56In the early days before I abandoned it, it had like 10, 000 monthly actives.
12:59So I was like, Oh maybe a decade later we can get back to that again.
13:03But yeah, as I resurrected it, kind of the landscape had changed a bit.
13:06Like people had a lot more devices and also multiplayer was an expectation where
13:12like, You know, Google Slides, right?
13:13You can have multiple people on the same deck.
13:16So I was, you know, looking into how I was going to solve that problem for Strut.
13:19And, you know, rather than I guess I've been building frameworks for so long at
13:23Meta, like, rather than making this a one off thing in Strut to solve the problem.
13:29I was like, Oh, like, can I solve it at a lower level?
13:32And that's when I went down the rabbit hole of CRDTs investigating SQLite,
13:36adding stuff to SQLite and so on.
13:40So the data architecture you had for the first version of Strut around, like,
13:45yeah, before Facebook, you mentioned like around 2012 to when you've left
13:50Facebook and what you've arrived at.
13:53Very different.
13:54So I'm curious, like how you went from like whether you took rather
13:59some smaller steps or some much bigger steps altogether to rethink the data
14:05architecture and how you went about that.
14:07Yeah, I mean, I guess beyond just collaboration, I wanted to support
14:12Yeah, arbitrarily large presentations.
14:15So right, the original version of Strut, it would load everything off disk and
14:19into memory, and then when you're closing the tab, or periodically, it would save.
14:23A big dump of memory to disk and yeah, I think that works fine, probably
14:27for the vast majority of cases.
14:29But I wanted to be able to handle presentations with hundreds of slides
14:33and tons of binary content or like images and videos and all sorts of things.
14:39So like at first I was like, okay, how do I like lazily load this stuff off of disk?
14:43And, you know, as you're starting to implement that in your own data
14:46model, you're like, Oh, well, this is kind of like a database, which,
14:50you know, knows how to page in and out things in and out for me.
14:54So I think, you know, that was also an indication that, okay, like, something
14:58more like a database would be good.
15:00And then I don't know how I found that SQLite was better.
15:04I guess I saw James Long's Absurd-SQL and Actual Budget.
15:08That's when I first realized that, yeah, like, I mean, I'd always known Wasm was
15:12a thing but that like SQLite was being compiled to Wasm and could run in the
15:16browser and it was actually faster than IndexedDB when you were, I guess there's
15:21some caveats there, but SQLite would store to IndexedDB, but it would store pages.
15:26So it would store four kilobyte chunks.
15:28IndexedDB, when you're like.
15:29Reading things in and out individually, like single objects
15:32by key value is like super slow.
15:34And the fact that like, okay, you're using SQLite and reading in chunks
15:38rather than individual objects, like it would be faster than IndexedDB.
15:42Like if you change your IndexedDB access to be paged, okay, it
15:45could be faster in SQLite.
15:48SQLite, you can still do point queries and be faster than a point query in IndexedDB.
15:52So that was something else that turned me on, like, oh, maybe I should use SQLite
15:55in the browser rather than IndexedDB, so I don't have to implement all this
15:58crazy stuff to make IndexedDB go fast.
16:00And then, you know, there's all the problem of, like, collaboration still
16:04and my justification for picking SQLite then was, you know, not only is SQLite
16:09running the browser now, if I ever want to take this app and make it native
16:14well, SQLite, Everything I build for collaboration in the SQLite, like it'll
16:18work on native, it'll work on desktop it'll work in the cloud if you want
16:24some like cloud peer or something rather than like a JavaScript only solution.
Choosing the right CRDT
16:29So you made the decision, SQLite is the way forward, SQLite on top of IndexDB or
16:34on top of like some persistence mechanism, but you now also need collaboration.
16:39You looked a little bit into CRDTs, but you still decided
16:42instead of going CRDTs first in terms of the developer experience.
16:46You still want to give a developer yourself SQLite as the primary way to
16:52think about the developer experience, but you started to use CRDTs or you
16:57wanted to use CRDTs as a implementation detail to bring into SQLite.
17:02So how did you go about that?
17:03How did you solve that puzzle of making SQLite collaborative out of the box?
17:09Yeah, so SQLite has a bunch of mechanisms for extending it.
17:13You can make new virtual tables.
17:16yeah, a virtual table is essentially a table.
17:18It presents itself as a table in SQLite, except that The implementation of that
17:24table is any arbitrary code you want to write, so C code, Rust code, whatever.
17:28You can make functions, new functions, like a mat, you know, like min or max
17:32or whatever function you want to define.
17:34You can make new virtual file systems, so it's pretty extensible
17:38from the plugin side of things.
17:39And yeah, one of the teams I had worked with at Meta, they built this thing called
17:44osquery way back, I think, You know, some other company may have seen osquery
17:48now, but it basically lets you query all details about your OS through SQLite,
17:53and they did that through virtual tables.
17:55So yeah, for adding CRDTs to SQLite,
17:57it was kind of a combination of a bunch of special functions.
18:01That like when you create a table so, plus special functions
18:06and special virtual tables.
18:07So you can like, create a table, and then you can call a function that
18:11converts that table to a CRR, or Conflict-Free Replicated Relation.
18:16So just a fancy name for a table that's a CRDT.
18:20Or you could use this like virtual table syntax where you're like, I want
18:22to create this table as a CRR and then you list all the columns and in the
18:31columns, so when you're creating a virtual table, like you can customize that
18:34syntax all you want in your extension.
18:37So I could like have a slot for custom data types.
18:40So you can say this is a string.
18:42And then like another addition to the data type is like the CRDT type.
18:46So it's not only text, it's like a perryText, or it's like, not just an
18:50int, it's an int that's a counter.
18:53Or it's not just a date, it's a date that slash right wins.
18:56Yeah, and then and then I guess the last thing, right, like, it's great
19:00to have a database that's, you know, a CRDT but how do you merge efficiently?
19:06. There had been some earlier work, actually by a guy named Iver and we actually worked
19:11together pretty early on, on CR-SQLite.
19:15He did his master's thesis on essentially adding CRDTs to SQLite but some of the,
19:20like, Downsides of like that original approach was there's, there was no
19:24way to incrementally merge databases.
19:25You had to literally send the entire copy of the SQLiteDB to
19:29somebody else, and it would like scan every row and merge them all.
19:32So yeah, one of the key problems we had to solve was like how do we
19:34incrementally sync And that was done essentially like implementing a global
19:41version that gets tagged on every row and a virtual table that knows how
19:45to find all the rows past the given version and just send you those to sync.
19:51Was there any prior art to combining a relational database and CRDTs or
19:58another mechanism to synchronize?
20:00Yeah, so there's Iver's work, and then, I guess, James Long's work on
20:05well, Absurd-SQL, but I guess Yeah, I don't know if he ever gave a name to
20:08the layer above it that did the CRDTs.
20:10But yeah, James work, if I recall correctly, was all in JavaScript.
20:15So it wasn't built into SQLite itself.
20:17You couldn't just run it anywhere SQLite could run.
20:20It had to be in a JavaScript environment.
20:22And he took a different approach.
20:24I think he used hybrid logical clocks and Merkle DAG.
20:29So yeah, I guess we started getting some trade offs of like,
20:32Strut was the first use case.
20:34Which actually informed a lot of design decisions for CR-SQLite and why I
20:37didn't want to use something like James Long's approach or, I don't know other
20:41solutions that were available at the time.
20:43Yeah.
20:44I'm curious to hear more about your your judgment of the trade
20:48offs and what made it a good versus not so good fit for your use case.
20:52Yeah.
20:52So I think James's approach yeah, I hope I'm remembering everything correctly.
20:55This was like a couple of years ago, but when I looked into it, every
20:58single message for all time was stored.
21:01So this is, you know, kind of like event sourcing, right?
21:03But like, one of my key requirements, like, I was using Strut to drive
21:08requirements for CR-SQLite.
21:10And I wanted the ability for people to be offline as long as they wanted.
21:14For it to be able to run on constrained, low memory devices.
21:18And so those two in combination with the fact that it's like a presentation
21:22editor where you can like, you can drag and drop components, you can spin them,
21:26rotate them type, like I didn't want every single event for every keystroke
21:31and every drag and all these things to be logged for all time, right?
21:35Especially if you're offline indefinitely.
21:37So if you're offline for like, I don't know, a couple days editing
21:41a really big presentation, you come back offline, I want it to
21:43be able to sync almost instantly.
21:45So the approach I chose was the set of CRDTs that allow you to collapse history.
21:51So, yeah, if you've been off The database size never grows more than
21:56a constant factor of the base data.
21:59So no matter how much you're editing, you're always going to be within a
22:01constant factor of the regular DB size.
22:04I don't know, I think that constant factor was like three or four times,
22:07I don't recall the exact number.
22:09Whereas the other systems, if you're the longer offline, the more and
22:12more the data grew which was, you know, not a trade off I liked.
22:15Yeah, I guess the downside to my approach is there's a few less
22:18transactional guarantees and you can't, you know, rewind history.
22:22But if somebody really wanted rewinding history, I was like, Oh,
22:24they could do it in user space.
22:26Like, that's not the problem I'm trying to solve.
22:28That makes perfect sense.
22:29And I think I really like, you focusing on the use case that you're most familiar
22:34with Strut and the requirements around that, Since we're dealing with like
22:38distributed systems and programming is hard enough, that you can't really
22:43focus on all use cases at the same time.
22:47And this way you design for the use case that you know of.
22:51And I think this is where you have like a canvas app or something like that
22:55where you move things around constantly.
22:58If you work for multiple days, you don't really care exactly about
23:02the change you've made over like five seconds, three days ago.
23:06You just care about, okay, this is the state it is in right now and that
23:10other users converge to the same thing.
23:13Whereas I think a different set of requirements might be for a chat app
23:19where you might not send around, like, which sort of keystrokes someone has
23:25taken to arrive at the final message.
23:27You send the final message, but then those final messages, you
23:30might actually care about, okay, this came at this point in time.
23:32One message does rarely override another one.
23:35So this is, I think, where that might be a better fit.
23:39And that makes also sense that in James use case with Actual Budget that's all
23:44about, like, historic transactions.
23:46You want exactly those like records and time.
23:49There's probably not so many, and even if there are many,
23:52they're all kind of worth it.
23:54They need to be accurate.
23:56So I think that makes perfect sense, and I think there will be like many different
23:59solutions for many different use cases.
24:02And I think there's even, you know, an approach how you could apply a bit of like
24:06that compaction, even to event sourcing.
24:08I think there's like a concept called subsumption, I think.
24:12And that allows you to also like flatten some of those historic events.
24:16So that's super interesting topic of research, I think.
24:20. Yeah, I know there for certain CRDT algorithms and certain
24:23event sourcing things, like.
24:26Yeah, there are ways to compact history, but it did seem like, still
24:29a pretty evolving area of research.
24:32I think now it's a lot more understood especially in the,
24:35like, list based CRDT space.
24:37Sure, like, I don't know, I guess, Yjs kind of pioneered a lot of
24:41that work of, like, run length encoding, everything, and then stuff.
24:45Yeah, I guess what I want to quickly mention, so there's, when
24:47I was first getting started, there was a community called Braid.org
24:51which I got involved with early on.
24:54And yeah, I don't know, that was an amazing community in terms
24:56of like learning and being open.
24:59So I just want to call them out, like they helped me a ton to understand
25:03all these CRDT algorithms and when history can be compacted, when it can't.
25:07What kind of weird circumstances you get into if you know you're emerging
25:11changes peer-to-peer versus, you know, client server and whatnot.
25:14Yeah.
25:15There this entire community, I'm not yet a part of, but given that
25:19this year I'm also looking much more into implementing, syncing this.
25:24I feel like there's a lot I have to learn about and I would love
25:28to also have some folks from that community here on the podcast.
25:32Yeah they're trying to add.
25:34Syncing to the HTTP protocol, so essentially adding a new verb rather
25:40than get, put, and post to allow you to merge documents or resources together.
25:46That sounds fascinating.
25:48How far along is that effort?
25:49They are part of IETF now.
25:52I don't remember if they're a research, there's like two separate types of groups.
25:56There's like research groups and maybe there's standards groups.
25:59I think those are the two.
26:01Yeah, I don't know if they're still in the research phase or if
26:03they're on the standards track yet.
Challenges
26:05Very interesting.
26:06I have to look more into that.
26:08So, besides choosing the right CRDTs to fit into SQLite through virtual
26:14tables I think this, like, just picking SQLite to run in the browser quite a
26:21few years ago, that's probably you've must have been running into a bunch of
26:25challenges and issues along the way.
26:27So I'm curious to hear first getting sort of like a broad overview of
26:31the different challenges you were running into and then going into them.
26:35Yeah, I think I wasted months of my life, like fighting SQLite issues.
26:39Cause yeah, it was really early where the official SQLite WASM project
26:44had kind of like just started.
26:46And I guess some of the first problems I ran into was.
26:51I guess compiling an extension into the WASM build.
26:55So yeah, it's one thing to write a SQLite extension.
26:57It's another thing to get it to run correctly in WASM.
27:03It wasn't so hard when the extension was originally written in C, but like,
27:06I don't know, like trying to code review contributions in C and make sure there's
27:10no like memory issues was a nightmare.
27:13So I eventually like bit the bullet, started learning Rust and re implemented
27:16the extension in Rust, which I think was, yeah, it became like a third
27:21or two thirds less code and just way faster to write after doing that.
27:24I think it was worth the time.
27:26But that introduced this whole new complication of like getting the
27:29Rust compiler toolchain to compile an extension that can be linked to the
27:33SQLite compiler toolchain that then goes through Emscripten to build a Wasm bundle.
27:39And let me see what else was there.
27:42Transactions was an interesting thing.
27:45So the inner, the originally the interfaces to SQLite
27:49in the browser were async.
27:51And if you ever had Well, I guess there's two builds of SQLite.
27:55The official SQLite build was always synchronous.
27:57There's this unofficial build, which I think is better, called wa-sqlite, which
28:02all the interfaces were asynchronous.
28:03But it had these interesting caveats, like if you await two calls to
28:07SQLite at once, it would deadlock.
28:09So you had to make sure like there's only one Call to SQLite ever at once.
28:14So, you know, working around that challenge.
28:16Transaction interleaving was another, so like if it's asynchronous and you
28:20start a transaction and you have all the statements in the transaction, okay.
28:24You, you began the transaction and then you like do a read,
28:26you're waiting the read.
28:28Well, while you're waiting in the read, you delegate control of the event loop,
28:31so somebody else could come in and, you know, kick off a new task, right?
28:36A new statement to do a write or something.
28:39Well, now that's somebody else's transaction that's trying to run while
28:41your other transaction is running, so you get this, like, weird interleaving.
28:44So, yeah trying to lock out transactions in the JavaScript layer that was
28:48another annoying thing to deal with.
28:49And then, yeah, this is not related to WASM, but just getting SQLite
28:56with the extension built for all the platforms I wanted to target.
28:59So, like, eventually Expo wants to include it in their Expo SDK, so figuring
29:04out how to do Android and iOS builds in addition to, you know, the WASM build.
29:09And Mac and Linux builds.
29:12Those sound like quite the buffet of different challenges that some sounds
29:18familiar to me as well as I think some you've been pioneering a tad earlier.
29:23And by now, some of those are maybe a bit more.
29:27A well trodden path, but I think there's probably even more
29:31that we don't understand yet.
29:33One more, I know one more interesting one I think more interesting than
29:36the other ones is since SQLite in the browser, at least the one
29:41I'm using, is IndexedDB backed.
29:43IndexedDB has a very high cost for opening a transaction which
29:48I was unaware of at the time.
29:49So yeah, there's a lot of work.
29:51And anytime you open a SQLite transaction, it would open an IndexedDB transaction.
29:56So there's a lot of work making sure that, like, for reads Yeah, normally
30:00you just issue reads, you wouldn't necessarily batch them into a transaction.
30:03Like normally, you know, you'd batch your writes into a transaction.
30:06But yeah, to make SQLite in the browser go fast if you're doing a whole bunch
30:09of reads, like making sure a transaction was open automatically all the reads
30:14would happen and then the transaction would close once all the reads are done.
30:16So it's like way of batching that was like a 10x speed up in the browser for SQLite.
30:21So once you've solved all of those performance challenges and also like
30:26correctness, syncing challenges, et cetera, then it's also needs
30:30to be competitive somehow with the typical developer experience
30:34you have in a React app or in a different framework you're using.
30:40How did you go about designing that?
30:42Yeah, so, yeah, I've developed a set of hooks, so, I don't know, it's, I guess
30:47like the React integration was probably the least amount of, I don't know,
30:51I'd say it's the least amount of work.
30:52Maybe not once I like think about all the strict code problems and
30:55stuff but yeah, essentially the DevEx was like a use query hook.
30:59So I think it's a lot easier to develop when your queries are
31:03co located with your components.
31:04So every component is responsible for getting the data it needs.
31:08And this makes your app like super composable, right?
31:10Like you can add and remove components to your app or your component tree.
31:15And because they're responsible for getting their data, like, you
31:18don't have to do any additional wiring or prop drilling.
31:20And yeah, they're not going to break other components because suddenly,
31:26Some data dependencies are missing.
31:28So I really like that, yeah.
31:29Co located queries, so every component you could use this hook called UseQuery.
31:34And in that UseQuery, you just literally write your SQL that fetches
31:39the data required by that component.
31:41And that SQL could be as complex or as simple as you wanted.
31:44So, you know, like, select from issue where id equals issue id or something
31:49simple or, you know, something complex like select star from issue,
31:52order by modify, join labels join owner, and yeah, all these things.
31:58So you've managed to tame most challenges with SQLite in the browser, making sure
32:03it works in Wasm, making sure that your own extension is working, et cetera.
32:08And SQLite can be super, super fast for most workloads, but once your
32:14cardinality grows, et cetera, your tables get really big then queries can
32:19also start Taking a little bit longer.
32:22And that is because SQLite always kind of, besides a little bit of caching,
32:26always starts with your results for your queries from scratch, kind of.
32:31And you know, I've been chatting about this for quite a while back then.
32:35This is how I know about all of this context.
32:37And that has led you to look into.
32:40incrementally maintaining the query results and the views.
32:43So I'm very curious to hear more what you've learned back then
32:46and which path this led you down.
32:49Yeah.
32:50Yeah.
32:50So I guess, you know, I was talking earlier about each component
32:53fetches its own data, has this use query hook, or just, you know,
32:56specifies a SQL query it's running.
32:58So when you're developing like a rich client side application like this,
33:02ideally your database is reactive, right?
33:04So any, anytime some data changes.
33:07It's like you, you work on Overtone, right?
33:09The music app.
33:10If somebody presses play, then, you know, all the components that care about
33:16play state need to immediately update.
33:18And yeah, as you're saying in SQLite, if you're doing this somebody changes some
33:22state, well, you know, SQLite doesn't have any, you know, facilities for reactivity.
33:27It has a few but they're not very good.
33:29Like they don't give you many fine grained notions about what changed
33:32or like what queries it impacts.
33:34So what you're left doing is rerunning.
33:38Essentially from scratch the queries that could have been affected.
33:42So, like, the SQLite change notifications, they'll tell you the row ID that
33:46changed, but not the contents, and they'll tell you, like, the table.
33:49So it's usually not enough to figure out exactly which queries to invalidate.
33:52And yeah, so you rerun tons of queries throughout the app.
33:56And those are all rerunning from scratch, so like, if you have, like,
34:00for your track list, you're showing a few hundred tracks, and that track list
34:03has to do a number of joins, right?
34:05Has to, for a track, has to join the album to get the album title, has to
34:08join the artist to get the artist names.
34:12Yeah, I don't know if there's any other joins.
34:13Maybe there could be joins about, like, like status, or favorite status,
34:18or something about the track, right?
34:20And, yeah, rerunning that join to select, you know, 200 some items.
34:25And yeah, maybe they just want to sort, and sorting it doing that from
34:28scratch every time somebody mutates something severely limits I guess how
34:32fast you can interact with the app.
34:34And yeah, I'd worked, tried a bunch with like, okay, you know,
34:38maybe a purely in memory SQLite, and that gets you pretty fast.
34:42But yeah, somebody was saying, oh, like, The goal for a reactive
34:45database is it's memory fast.
34:47And then when I started, like, I don't know, that idea, a lot just
34:50helped my brain and I started like actually benchmarking, you know,
34:54if I'm just observing a value in JavaScript, how fast is that?
34:57And if I'm like, Observing a query in SQLite and updating the
35:02row and re running the query.
35:03Like, what is that?
35:04And I have a observable notebook somewhere that like compares all this
35:08and like the difference was massive.
35:09So like just seeing how far I was from memory fast, like started
35:13making me a bit sad on SQLite.
35:15And also like benchmarking the original strut and the new one, which use
35:20SQLite and these use query hooks.
35:23Yeah, like the original one I could throttle my Chrome, you know, but in
35:27the dev tools you can like downgrade your CPU right to like 6x slower or
35:31something and I could go all the way down to the slowest possible and everything
35:35was buttery smooth, but in the new one, like if I downgraded it too much,
35:40like you could see some visible lag.
35:42SQLite
35:45is fundamentally built in this request response style, right?
35:50This is the era of the LAMP stack where, you know, databases request
35:55response made sense, right?
35:56A user would go to the website it would do a query against the
35:59DB and it would render, right?
36:01There was no, like, rich interactivity.
36:03It was always a full page refresh every time.
36:04So request response made sense.
36:06Yeah, but for these rich apps, request response no longer makes sense.
36:09And trying to fit a request response sort of DB designed around that,
36:13it into this reactive scenario.
36:15And I think maybe one day somebody can get there.
36:19But I think right now it didn't seem to make much sense to me and the juice
36:23didn't seem to be worth the squeeze.
36:25So I started, I guess my love affair with SQLite has slowly been coming to a close.
36:30And I started, you know, investigating other projects.
36:33Yeah maybe one, one day port it if there's time or.
36:36porting some of these projects back to SQLite and contributing
36:39incremental data flow, differential data flow to them or something.
36:43But yeah, I guess I should describe what these projects are, right?
36:45, So I like the idea of queries, being able to like, Issue a complex declarative query
36:52against some set of relational data to get back the data you need for your view.
36:56So I wanted to like, how can I , incrementally maintain an
37:00arbitrary query against some data?
37:02So like that track list example you query the, Tracks and their
37:06artists, and their albums, and the play state, and all these things.
37:10How can we, when somebody does it right, rather than re running
37:13the query to get the track list it knows exactly which queries.
37:17Should be invalidated by that write, and rather than re running them,
37:22knows exactly how to patch up the rows that are impacted by the write.
37:27So, you know, at first this problem seems absurdly complex and hard.
37:32And you're like, Oh, yeah, I want to like invalidate and
37:34patch up any arbitrary SQL query.
37:37But yeah, I started reading some of the papers.
37:39There's like a paper called DBSP.
37:41It's a differential data flow paper.
37:43I think Materialize is based on that.
37:47And then, yeah, started implementing it.
37:49And then I realized like, I don't know, this, once you've read it and started
37:52implementing it and you realize like, this is very similar to other stream
37:56processing stuff , like, and other query, you know, builders I've done,, right,
38:02where rather than, you know, you create a series of operations and rather than
38:06them asking a database for the data, they're like taking streaming data in.
38:12So yeah, like some of the work I'd done at Meta was on real time abuse systems
38:17and that was all streaming systems.
38:20So I was like, oh, like this problem isn't that mysterious.
38:22Like I've done stuff like this before.
38:25It seems tractable.
38:26So I, you know.
38:27went full on, decided like, I think we, I think I can implement this yeah, based
38:32on prior experience, based on having done query languages before so yeah,
38:36MaterialLite was born which is bringing differential data flow to JavaScript, so
38:43you can compose, filter, map, reduce, and a join operator to create these pretty
38:48rich queries and have them reactively updated anytime there's a write.
38:53Right.
38:53And so for those of the listeners who have not yet built their own database
38:58and maybe are familiar with like SQL where in SQL, you have like select from
39:04where, and all of those and JavaScript or other programming languages who you
39:08might've like, Map, filter, et cetera.
39:11They might have different names, but they're sort of
39:13like conceptually very similar.
39:15And this is where you're basically just now like trying to recreate the same
39:20semantics that we have from SQL, where you can say select star from this where so,
39:25and you can basically have like an array where it now say .Map .Filter, et cetera,
39:32and where you flip the trade offs from before you owned in SQLite, you might get
39:38this query just once and then it's done.
39:40You no longer are interested in like subsequent changes and then like, smaller
39:45updates, but in JavaScript where our app stays warm, we click the button.
39:49Something's changed slightly.
39:52You want to change the trade offs quite significantly.
Materialite
39:57Yeah, so like just to build off on like how SQL concepts map to like filter,
40:02map, and reduce, or like map represents select or function application, like
40:08you're selecting some columns, well that's mapping the original object to some
40:12other object with a subset of the fields.
40:14Filter is the same as where Reduce is what you use for any aggregate
40:19function, like group by, sum, count.
40:21Yeah, and I guess one other operator would be concat.
40:24So if you want to do or, you just, you fork your stream that you filtered
40:28or mapped over, and then after you've filtered it, you concat the streams back
40:32together, and then you run a distinct.
40:34So you can do an or and then join, joins a special thing where essentially.
40:39You kind of maintain what you've seen from one stream and then as you
40:43see results from another stream, you link them together based on some key.
40:46At first, it like, Seemed like a large barrier, but then I don't know, once you
40:51get into it and you realize, Oh, like I've done all this before and filter,
40:53map, reduce, like these have direct analogs in SQL, like it's not so bad.
40:58So we've so far like touched the most common usage of SQL.
41:04But SQL has, or SQLite, Postgres, et cetera, but like sticking with
41:08SQLite for a moment there's quite a bit more than like the common
41:12like select from where, et cetera.
41:15There's like various kinds of aggregations, various kinds of ways
41:18to combine queries, sub queries, joins, various kinds of joins.
41:23Is the system that you've came up with MaterialLite, is that on feature
41:27parity on what SQLite is able to do?
41:31Or are there some parts where you say, okay, that's out of scope because
41:35there's years worth, maybe decades worth of optimization have gone into
41:40SQLite into making that fast and that's out of scope for Materialite.
41:44Yeah.
41:44So there's no window functions and there's no recursive queries.
41:48And base Materialite is strictly for streaming.
41:52So, like, if you're processing a stream of writes or events, like
41:57Materialite is a great fit because it's gonna, you know, patch up your
42:01query results as those events come in.
42:04But say you have a table of a million items and you want to
42:09query it from scratch, right?
42:11Like you didn't see the events, you couldn't maintain any queries.
42:15You're just like, I want to know what's in this table.
42:17So, so you're going to have to run that query from scratch, even in Materialite.
42:21But for Materialite, what that means running a query from scratch is like
42:24setting up the Dataflow pipeline and then feeding literally every row,
42:27all million rows into that pipeline.
42:30Which, yeah, since everything's implemented with, like, filter and map,
42:34where, like, it's opaque, like, when you do a filter, you provide a lambda.
42:38It's opaque.
42:38That's what fields you're filtering on which makes this from scratch case hard
42:42because, like, Oh, if I don't know what fields you're filtering on, then I don't
42:45know if I can apply some sort of index that you might have against this data.
42:49So, yeah, I guess after working on Material 8 and getting it to
42:54work pretty well, like, okay, that was the next problem.
42:57And it's like, we need an actual query language because in a
43:01query language is declarative.
43:02You express, like, This is the field name I'm filtering on.
43:05This is the operator being used and this is the value.
43:07So the engine knows exactly where the fields are ordering by the tables you're
43:12using the fields you're filtering on.
43:15And once you have that knowledge, you can also make that like first query
43:19case fast, where I guess you're doing the traditional database thing of like
43:24figuring out what indices to pick.
43:26I don't think we'll ever, Be as fast as SQLite for that case.
43:29But I think, you know, For an app like, for a rich client
43:32like, Overtone or Strut, right?
43:35, Most of your queries are subscriptions, they're not this
43:37like from scratch Sort of thing.
43:38Like, You're gonna subscribe, you're gonna like, You're gonna like, Set up
43:42a query that subscribes to the slide list, set up a query that subscribes
43:45to the component list on a slide.
43:48And then as writes happen, you just want to update those rather than,
43:52yeah, always, like, rather than having to have first class support
43:55for running a query from scratch.
43:58So, yeah, so that latter thing needs to be just fast enough, but
44:00it doesn't need to be, like, Yeah, the fastest thing in the world.
44:04I totally agree that there's a very interesting, different kinds of trade
44:08off that should be explored and should be built that makes so much more sense for a
44:14live active application where everything stays up to date and then changes
44:20quite minimally as opposed to request response, which is how SQLite works today.
44:26That being said, you can still get quite a bit of mileage out of SQLite today.
44:31If you are a little bit more mindful about like how you work with it.
44:36So some patterns that I found to Makes SQLite work for me with Overtone.
44:42Is that what makes SQLite the slowest in a browser context is shoveling data
44:48from within the SQLite database into JavaScript land and back and forth as
44:54this is where you need to cross the memory boundaries from memory staying
44:59within Wasm to how it's being like marshaled into JavaScript objects.
45:05And what I found as a pattern that helps quite a lot is just keeping that small.
45:11So if you need like a, and this is where I think you can apply quite a few of the
45:15tricks that you've now mentioned and you embrace with Materialite, you can actually
45:20bring back and layer on top of SQLite.
45:23Probably still not quite as fast as Materialite.
45:27But probably fast enough to make a use case like Overtone work within that
45:31paradigm as well, that when the list changes, instead of like, getting a
45:36new list of 5, 000 tracks for that playlist, every time those 5, 000
45:40tracks, you just get a change in terms of like, Oh, this new track was added.
45:46And getting that SQLite, I think can also be done with a few tricks namely
45:52also like a temporary table where you can save the previous result of your
45:58query before and after the change.
46:01And then you query that in a diffing way.
46:05And then you still need to sort of like work with that
46:08diff information in JavaScript.
46:11But this is how you can work around that performance challenge.
46:14But your system with Materialite.
46:17All of that is absorbed from you, which is very attractive.
46:20Yeah, and I think right, so Materialite, right now, it can be backed by
46:24an in memory collection, or it's just some little interface that's
46:28called a source that you implement.
46:30So I think, Yeah, SQLite could be that source, and then that would
46:33solve the like, first query problem.
46:36So I just compile the Materialite queries to SQL queries.
46:40and then the incremental updates are maintained by Materialite going forward.
46:44Yeah, obviously you have to like, listen to the transaction failures
46:46and somehow roll those back.
46:48But yeah, I think it's doable to like, Marry those two together and
46:51have kind of the best of both worlds.
46:54And I know there's like a project called GRDB.
46:56I think it's like a iOS SQLite wrapper which adds a bunch of reactivity.
47:02it's not like completely general.
47:04It doesn't support like completely arbitrary SQL queries, at
47:06least in an efficient way.
47:08But I think it gets you close enough with the queries it does support.
47:11One more thing about like, moving from the request response style to
47:15like, queries being subscriptions.
47:17It gives you these really cool opportunities to optimize
47:19queries at a global level.
47:22So like, if all your queries are subscriptions, you can see
47:24what subscriptions are open.
47:26And then you can start seeing which queries are either
47:29identical and deduplicate or which queries share operations.
47:34So like, maybe you have a whole bunch of queries that all do the same join.
47:37While in the streaming system.
47:39You can optimize that so that join happens once and then the results of the join are
47:44fanned out to all the queries that use it.
47:46So yeah, you can do some really neat tricks at a global level once you
47:49model your queries or subscriptions.
47:50Yeah, that's super interesting.
47:52And a few of those things I'm exploring for LiveStore myself where they're
47:58basically the problem of reactivity.
48:01And efficiency, trying to keep the performance as good as possible,
48:05but also squeezing that into a frame budget since LiveStore and Overton
48:10all run within the main thread, which I'm not sure that's probably
48:14also the way how MaterialLite works.
48:16There is Interesting to also see it through a lens of like a different
48:21paradigm shift evolution in the JavaScript front end world which is about signals.
48:27And signals is also about like, I think a more lighter weight reactivity system
48:32that is all about being composable.
48:35And and I think those worlds are now finding, like, a way together
48:39within live store with the prior work with Jeffrey and Nicholas on Riffle.
48:45This was like some early implementations based on a paper called Adapt On.
48:50And from my understanding all of like the research that has gone into adapt on.
48:56And sort of like in parallel, what's been evolving as signals.
49:00Those are actually the same thing.
49:02And I think now we can do some really interesting work on combining the
49:07power that like more substantial state management primitives, Like a database
49:13gives us or something like material light gives us and bring that together
49:17with something like a signal system that also kind of out of the box address the
49:23thing you've just mentioned, which is reusing parts of the computational graph.
49:28The only thing signals is missing.
49:29I'm surprised nobody's done this yet.
49:31Is Incremental computation against collections, right?
49:34So you think of like, I have an array, and I map the array, then I filter
49:37the array, then I reduce the array.
49:39That's like creating a new copy of the array each time.
49:42And if you like, add one element to the array, you like, rerun all
49:45those against the full array again.
49:47So it's kind of like the query from scratch case, right?
49:49Whereas Materialite, right, You add an element to the array, well,
49:52it's only going to run the filter on that one element and the map
49:55on that one element rather than running against the full array.
49:58So there's some interesting benchmarks maybe I can share with you after of
50:01like, yeah, incrementally maintaining a map filter reduce pipeline
50:05against an array of a million items.
50:07Yeah, so it'd be really cool if like the signals implementers would even
Rocicorp
50:12So even though your love affair with SQLite has for now come to an end.
50:18I think the problems and goals you're chasing after are still very
50:22similar to what I think is feasible with SQLite, just maybe with like
50:27a different performance footprint.
50:29I think with SQLite you also get the benefit.
50:32of just that being like a very familiar and trusted thing that's
50:35like been proven and battle tested for literally decades now.
50:40And even though you said like you were running into some performance
50:44issues SQLite is still so ridiculously fast that it's very feasible to do
50:49write those applications in the, in a browser context, et cetera.
50:52And I think even if you now on a modern CPU straddle your your
50:57CPU, it's still manageable.
51:00But I think there's multiple paths to arrive at the same goal.
51:05And so as my understanding is that the project Materialite opened a couple of
51:11really interesting follow up conversations for you which is now opening a new
51:16chapter for your professional life.
51:19So do you want to tell us a bit more about that?
51:21Yeah, I guess, yeah, Materialite is my least known project,
51:25least complete project.
51:27But yeah, it did pick up some interest by Rocicorp specifically Aaron, who I
51:32met at the local-first conference in St.
51:35Louis.
51:36And yeah, we're talking about, you know, the model of subscript, those queries as
51:39subscriptions and like having a reactive app and I think he's the one that said
51:42like, oh, things need to be memory fast.
51:45And yeah, so, so I started working on MaterialLite and we did some explorations
51:49of like, if you powered one of their demo apps with MaterialLite, like they
51:54have a linear clone that's a demo app and they want to see like, Oh,
51:56how many issues can we scale this to?
51:58If we like use this differential data flow approach.
52:02Yeah.
52:02And they really liked kind of the initial results.
52:05And yeah, we worked together, I guess.
52:08quite a bit over the last four slash six months.
52:10There was like a two month break for me on paternity leave.
52:13And yeah, so I'll be starting a job with Rocicorp here pretty soon as a partner.
52:18Yeah, I'm super excited about that.
52:20That's amazing.
52:21I mean, we had Aaron on the show.
52:23I think that the second show for localfirst.fm, and there were so many
52:28deep insights there by Aaron, and I'm a big fan of the products that's already
52:33been built there, Replicache, Reflect, and sounds like your work with Materialite
52:40is also going to be incorporated into making the products even better.
52:44So I'm curious to hear more of your thoughts on you know, what do you feel
52:49like where's local-first going with the products you're working on with Rosicorp?
52:55Local-first provides a superior DevX because you're no longer
52:59worried about APIs of, like, how do I get data from the server?
53:03Like, a sync engine, or your database is solving, you have a sync engine solving
53:07that for you, right, where you're just coding against local data rather than
53:11setting up all these weird REST APIs that you have to call to get the data.
53:14And yeah, I think that's the key value of local-first and the main thing
53:19that Rocicorp is pushing forward.
53:21It's like making that sync engine experience as best, as
Picking the right tool
53:25So you're now joining Rocicorp where you're building Replicache and Reflect,
53:30but the local-first space has grown quite significantly to a point that
53:34can be almost a bit disorienting.
53:36For newcomers who are trying to first figure out what does it mean to build
53:41the app local-first and then also choose the right tool for the job.
53:45Can you provide a little bit of guidance which how someone should think about
53:49use cases and picking the right tool?
53:52Yeah.
53:52So I guess maybe when you're somebody, when the first question somebody
53:56asks, should ask themselves is, "Do they want servers involved at all?"
54:01Is the first question.
54:02So, like, if you want all sync to go through a central authoritative
54:06server, and you want that server to be able to override decisions made
54:10by the client apply data integrity rules revert changes made by a client
54:16Replicash is a very good choice for that.
54:18If you want something, Where, you know, you don't want to preclude the
54:22ability for peer, for nodes to sync peer to peer, or maybe the data they're
54:27syncing is end to end encrypted.
54:29I think something like CR-SQLite actually shines there.
54:33CR-SQLite lets you merge in any topology you want, whether it be
54:37peer to peer or hub and spoke.
54:39So we've seen a couple of companies that are, have like end to end encryption,
54:44where they use CR-SQLite to sync, or they have literally a pub sub
54:48channel, where all peers just publish their messages, and any peer who
54:51receives the messages integrates it.
54:54In those use cases and then I guess if you're really looking for, like,
54:57if your data model is like document based, I guess going back to syncing,
55:01so not just CR-SQLite for peer to peer, but also Yjs and AutoMerge
55:05is capable of peer to peer sync.
55:07They're also, of course, if you can sync peer to peer, you can
55:09sync through central server too.
55:11If you're looking for, you know, you just want to support a collaborative text
55:16field collaborative text doc and you don't need, you know, ways to query over your
55:22documents, like you're fine just storing this doc in a blob in your Whatever your
55:26storage is you know, Yjs is a very good choice because it's primarily targeting
55:30it at text editing, collaborative text but they don't have, you know,
55:33facilities like built in for persistence or querying and these sorts of things.
55:38So I haven't kept super up to date with AutoMerge, but from what I've seen if
55:42you want like, the ability to fork and look at different portions of history.
55:47So like, say people are editing a document somebody wants to fork their
55:50document off, make a bunch of changes and merge the fork back in I mean, I see
55:55AutoMerge talks about this concept a lot.
55:57So I think if it's not there already, AutoMerge would be your best bet for some.
56:03for that sort of behavior.
56:04Yeah, then there's ElectricSQL.
56:06So that is like a, you know, Postgres centric solution.
56:11So if your database is Postgres and you're okay with, like, merging based on CRDT
56:17rules You know, that's a good choice.
56:18So yeah, so I say like your back end is Postgres and you're
56:21okay like using CRDT rules.
56:23Like yeah, what does that mean?
56:24So a really interesting thing about Replicache is it can work with any
56:27back end, pretty much any back end.
56:30So Postgres, MySQL, all sorts of things.
56:34And the way this works is that it uses sort of a rebase model where , the
56:39developer, defines their mutations.
56:41And those mutations get run in a specific order on the server and clients when
56:47they receive updates from the server, any mutations they have outstanding.
56:50They apply the server updates and any mutations that are
56:52outstanding, they rebase on top.
56:54And this is like super flexible in that you can write any, like, if you want to
56:59change merge and conflict behavior, well, you just write your mutations however
57:03you like to get the behavior you want.
57:05Whereas something like CR-SQLite or electricsql to get the merge
57:09behavior you want, you have to pick specific CRDT types.
57:12And it, it seems a little bit less flexible since you can't
57:15write like a custom mutation.
57:16But ElectricSQL does seem to have a pretty good end to end story of like,
57:19you set up your Postgres DB, that schema gets replicated down into a
57:22local SQLite DB, and as you make schema changes upstream, they flow downstream.
57:27So in terms of like, Having a good story between two specific
57:32databases it seems pretty good.
57:34And then I guess the last thing is PowerSync.
57:37So PowerSync, uh, works on a different model that's not rebase or CRDTs.
57:43So with PowerSync, a client will accumulate changes, and if a
57:47client has local changes, it won't take any changes from the server.
57:50So in the rebase model, if a client has changes, it'll take
57:53changes from the server and then replay its local changes on top.
57:56Uh, but in the PowerSync model, the client has local changes, it won't take any
57:59server changes until those local changes have been incorporated by the server,
58:02and then it can take server changes.
58:04Um, So yeah, to me, I've never built an app with PowerSync, but this seems like
58:07it'd be a lot higher latency, uh, for the syncing, because if you have any
58:12outstanding local changes, you simply cannot take a change from the server.
58:15Whereas rebase model, if you have outstanding changes, it's fine.
58:18You get the server changes, you just replay your local changes
58:21on top of the server changes.
58:22That's a really great overview, and there's many other tools that
58:26more more tools than we have time to exhaustively can cover here.
58:31But I think that's a really great orientation, particularly with Asking
58:34yourself the first question, like where should the authority originate from?
58:38Do you want to have a server?
58:40That's, I guess, more traditional how web apps work today.
58:43And that can simplify things a lot because in that server you can enforce if there's
58:48like a merge conflict or if there's like maybe a user has set some data in
58:54a way That might be not compliant with the app that you want to build or for
58:59other simplification reasons, et cetera.
59:02So I think picking that, where does the authority live?
59:04Is there an authority?
59:06And , which role should a server play?
59:09I think that's a really important one.
59:11And then also the two kinds of like ways to go about syncing or
59:15at least two ways going more about certain spacing, everything on CRDTs.
59:21Or going with a event rebase model, our Replicash is working.
59:27I'm curious how someone who has used neither and might have not really
59:32a lot of intuition for what are the implications of choosing one or the other.
59:37Can you provide a little bit of an intuition which kind of app use cases
59:42are a good fit for one or the other?
59:44Yeah.
59:44So with CRDTs, you don't.
59:47Have code on the right path, like the CRDT algorithm is going to
59:53determine, I guess I should say you don't have code on the merge path.
59:55The CRDT algorithm is going to determine how things merge.
59:58And you're just picking a set of CRDT algorithms where something like Replicash,
1:00:02where you write custom mutators.
1:00:04You can write a set of mutators that run on the client, and you can write mutators
1:00:08with the same name, same args, but do something totally different on the server.
1:00:11Which, one it's, since you're writing code in the mutation, it's giving you control
1:00:16over, kind of, how things merge and sync.
1:00:20And two, since The code doesn't have to be identical on every node because
1:00:24the server is the authoritative answer.
1:00:26You could do something in the server mutation like check permissions or,
1:00:30you know, set last modified times or something from the server's
1:00:33time rather than the client's time.
1:00:35Yeah, but I guess the one downside to that is a CRDT, like, you can sync messages in
1:00:40any order, like, yeah, if you have message A, B, and C, if you sync them B, C, A,
1:00:46or C, B, A, like, you always get the same state, which means that you don't have to
1:00:50have a single server being the authority.
1:00:52You can have every peer be on equal footing and essentially have no server
1:00:56whereas like this, you know, Replicache model, you do have to have, Someone
1:01:00that is that authority that's going to provide the total ordering of messages.
1:01:04And to me, it sounds like at least the latter, the rebasing model with
1:01:09that authority living in a server, for example, that is a bit easier to get
1:01:15into less of a radical shift of like thinking, changing the way, how you
1:01:20think about data modeling, et cetera, and it's probably the easier one to
1:01:25get into and might be a better catch all solution for many app use cases.
1:01:30Like if you look at yeah, I think it's the easier, it's like, it's more
1:01:33familiar and if you have an existing app that you're trying to add multiplayer
1:01:38to Like, yeah this can work on your existing data model without much
1:01:41changes or with little to no changes.
1:01:44Whereas, oh, you want to switch to CRDTs, like you might have to make a new data
1:01:47model specifically for this collaboration thing and migrate your stuff over.
Outro
1:01:51Right.
1:01:51And I mean, the advice I would give on top of that is that.
1:01:55There are now like so many interesting and easy to use options out there,
1:02:00whether it's Replicache, whether it's AutoMerge or others just maybe try
1:02:04building the same mini version of your app in different technology stacks,
1:02:09see what works best for you, and you'll develop that intuition just by yourself.
1:02:14And I think that's a really fun thing to do.
1:02:16Cool.
1:02:17Matt, thank you so much for sharing about like all those different
1:02:21projects you've been involved with.
1:02:22I'm really excited to see what you'll be building at Rosicorp.
1:02:27Is there anything else you want to share with the audience now?
1:02:30Yeah, just stay tuned.
1:02:31We're building the next.
1:02:32Kind of iteration of Replicache and Reflect.
1:02:34And , it was exciting enough to make me like drop everything I was doing
1:02:39and join the team and yeah, go back to, I yeah, work, working for somebody,
1:02:44I guess, rather than just, you know, following whatever research interests,
1:02:49wherever they took me from day to day.
1:02:50So yeah, it's, I think what they're building is super compelling and
1:02:53You'll hear more, you know, over the summer and coming months.
1:02:57That's incredible.
1:02:59I'm really excited to hear more about what you all have been working on.
1:03:03And yeah, thank you so much for coming on the show.
1:03:06Yeah.
1:03:06Thanks for having me.
1:03:08Thank you for listening to the localfirst.fm podcast.
1:03:11If you've enjoyed this episode and haven't done so already, please subscribe and
1:03:15leave a review wherever you're listening.
1:03:17Please also tell your friends about it.
1:03:18If you think they could be interested in local-first, if you have feedback,
1:03:22questions or ideas for the podcast, please get in touch via hello at
1:03:26localfirst.fm or use the feedback form on our website, special thanks to Expo and
1:03:31Crab Nebula for supporting this podcast.