Embedded engine

April 11th, 2023

I’ve come to the realization that Pipet isn’t a traditional server-database architecture, because right now the slowest part of Pipet is the network calls. This is especially true if I want to introduce more complicated state types that are not loaded into memory and are therefore exposed as cursors instead of values.

Even if I use Redis, it introduces at least 200µs per reducer event for the network call alone. Postgres isn’t even worth discussing. Compare that to reading from SSD and the JavaScript execution, each of which is under 20µs. Not to mention what happens if I load state into memory. The point is that embedding the processing with the data gives us a speedup of almost 10x. The net difference would be between running 3,500 events/second and 25,000 events/second, or even 50,000 events/second.

0.1µs main memory reference
 16µs SSD random read
 50µs Read 1MB from SSD
200µs Redis network latency alone[^latency]
500µs RT in the same datacenter
 20ms Postgres latency (including network)
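
A quick back-of-the-envelope sketch of where throughput numbers like these come from. It assumes a reducer handles events strictly one at a time, so throughput is just one second divided by the per-event latency. The function and constant names are made up for illustration, and the latencies are the rough bounds quoted above, not measurements.

```typescript
// Throughput of a reducer that processes events serially.
// Latencies are in microseconds and are rough figures, not benchmarks.
const MICROS_PER_SECOND = 1_000_000;

function eventsPerSecond(perEventMicros: number): number {
  return MICROS_PER_SECOND / perEventMicros;
}

const jsExecutionMicros = 20;   // upper bound quoted for running the reducer
const ssdReadMicros = 20;       // upper bound quoted for an SSD read
const redisNetworkMicros = 200; // lower bound quoted for the Redis network call

// State lives in Redis: pay the network call, then run the reducer.
const overRedis = eventsPerSecond(redisNetworkMicros + jsExecutionMicros);
// State lives on local SSD next to the engine.
const embeddedOnSsd = eventsPerSecond(ssdReadMicros + jsExecutionMicros);
// State is already in memory, so only the reducer execution remains.
const embeddedInMemory = eventsPerSecond(jsExecutionMicros);

console.log(Math.round(overRedis)); // 4545
console.log(embeddedOnSsd);         // 25000
console.log(embeddedInMemory);      // 50000
```

The 200µs figure is a lower bound, so the real Redis-backed number lands below 4,545. A figure of 3,500 events/second corresponds to roughly 285µs per event.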

The difference between the two feels negligible from a human point of view, but if you have an app with 10,000 DAUs, about 2k of them might be active at any given time. Assuming they’re each generating ~5 events/second, that’s 10,000 events/second. Suddenly, you get this pileup. Plus, Redis itself caps out at about 100,000 operations per second. That’s fine for a single developer, but if you’d like to provide an analytics platform, it’s wholly unacceptable for one tenant to occupy 10% of your entire theoretical capacity.
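
The tenant math can be sketched the same way. The inputs are 2,000 concurrently active users at about 5 events/second each and a ceiling of roughly 100,000 Redis operations/second. The one-Redis-operation-per-event ratio is my simplifying assumption, and the names are hypothetical.

```typescript
// Rough load estimate for a single tenant on a shared Redis instance.
// Assumes exactly one Redis operation per event, which is a simplification.
interface TenantLoad {
  eventsPerSecond: number;
  percentOfStoreCapacity: number;
}

function estimateTenantLoad(
  activeUsers: number,
  eventsPerUserPerSecond: number,
  storeOpsPerSecond: number,
): TenantLoad {
  const tenantEventsPerSecond = activeUsers * eventsPerUserPerSecond;
  return {
    eventsPerSecond: tenantEventsPerSecond,
    // Multiply before dividing to keep the arithmetic exact for round inputs.
    percentOfStoreCapacity: (tenantEventsPerSecond * 100) / storeOpsPerSecond,
  };
}

const tenantLoad = estimateTenantLoad(2_000, 5, 100_000);
console.log(tenantLoad.eventsPerSecond);        // 10000
console.log(tenantLoad.percentOfStoreCapacity); // 10
```

A real reducer would likely issue more than one operation per event (at least a read and a write), which only makes the share worse.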

Embedding makes it easier to split out bigger streams. Not automatically, but you can impose limitations on free users so they can be put on a big “free” tier. Limited state size, limited number of reducers, limited events per second. Paid customers can be put on their own machines, which can scale to the theoretical limits of

This feels like the “right” way to approach this, without me spending all this time researching the performance characteristics of clustered Redis. I am building a database whose processing engine is V8. Funnily enough, I came to realize this when writing my own post on React Server Components, where I berated the architecture of server components which entirely fail to prevent a waterfall of redundant data requests. Writing is terribly good at exposing your own hypocrisy.