October 12th, 2023
Your product will have tradeoffs because otherwise…why didn’t the incumbent just do it? Jira sucks because it’s complicated, and it’s complicated because it’s a one-size-fits-all-departments kind of product. For me, I’ve realized that I’ve been wrestling against CAP. Trying to achieve consistency across partitioned streams is slow. Consistency requires that destination streams check with origin streams on whether it’s safe to process beyond X. Whether or not it’s safe depends not only on where the origin head is, but also on whether it’s behind the tip. The answer to the second question changes as new events roll in.
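Here is a minimal sketch of that safety check. It rests on one reading of the terms, which is an assumption: `head` is the timestamp of the last event an origin has emitted downstream, and `tip` is the timestamp of the newest event appended to it. `Origin` and `safe_to_process_beyond` are hypothetical names, and the rule "a caught-up origin has nothing pending before X" is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Origin:
    head: int  # assumed: timestamp of the last event emitted downstream
    tip: int   # assumed: timestamp of the newest event appended to the origin

    @property
    def caught_up(self) -> bool:
        # The head is not behind the tip, so nothing is waiting to be emitted.
        return self.head >= self.tip


def safe_to_process_beyond(x: int, origins: List[Origin]) -> bool:
    # Safe only if every origin has moved past x or has nothing pending.
    return all(o.head > x or o.caught_up for o in origins)


origins = [Origin(head=12, tip=15), Origin(head=8, tip=8)]
first_answer = safe_to_process_beyond(10, origins)

# A new event lands on the second origin: its head is now behind its tip,
# so the same question gets a different answer.
origins[1].tip = 9
second_answer = safe_to_process_beyond(10, origins)
```

The destination has to ask every origin each time, and the answer can flip between two calls. That round trip is where the slowness comes from.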
So one choice is to do nothing. First come, first record. But you do want some semblance of order when merging streams. If you’re merging two streams and you know approximately the order in which the events were received on the global time scale, you’d rather have them in that order, even at the cost of some milliseconds of “slippage”. That’s consistency and partitioning at the cost of availability.
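The slippage idea can be sketched as a small reorder buffer. This is an illustration under assumptions, not a description of a real implementation: events are `(timestamp_ms, payload)` tuples, `merge_with_slippage` and `slippage_ms` are hypothetical names, and an event is released once a later arrival shows that at least `slippage_ms` has passed since its timestamp.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, str]  # (timestamp_ms, payload)


def merge_with_slippage(arrivals: Iterable[Event], slippage_ms: int) -> Iterator[Event]:
    buffer: List[Event] = []  # min-heap ordered by timestamp
    latest = None             # newest timestamp seen so far
    for event in arrivals:
        heapq.heappush(buffer, event)
        latest = event[0] if latest is None else max(latest, event[0])
        # Release everything old enough that a straggler is unlikely.
        while buffer and buffer[0][0] <= latest - slippage_ms:
            yield heapq.heappop(buffer)
    # Input is exhausted: flush whatever is still held back, in order.
    while buffer:
        yield heapq.heappop(buffer)


# Interleaved arrivals from two streams; (102, "a2") and (110, "a3") show up late.
arrivals = [(100, "a1"), (105, "b1"), (102, "a2"), (120, "b2"), (110, "a3")]

ordered = list(merge_with_slippage(arrivals, slippage_ms=10))
unordered = list(merge_with_slippage(arrivals, slippage_ms=0))
```

With `slippage_ms=0` this degenerates into first come, first record. With a 10 ms window the late events land in their proper place, at the price of holding every event back for a moment.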
There’s also the problem of backfill. Say you have streams A and B going into X. Later, you realize you want stream C piping into X as well, and you’d like its historical events in there too. What is stream X? Is it a historical record? Or just notifications going forward? You definitely want it to serve the purpose of a historical record.
What if you recreate (reindex) stream X? What about its downstream effects? I think you can have splice requests for handling any downstream effects. Reindexing assigns a new stream id and rebuilds the order based on create time, not append time. Is create time the original time or the parent’s time? Probably original, based on the producer time. Reindexing a stream also repoints all of its pipes and reducers to the new id.
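A minimal sketch of reindexing as described: a new stream id, with order rebuilt from create time instead of append time. The `Event` and `Stream` shapes, the field names, and the use of a random UUID for the id are all assumptions made for illustration. Splice requests and the repointing of pipes and reducers are left out.

```python
import uuid
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Event:
    payload: str
    created_at: int   # assumed: original producer time
    appended_at: int  # assumed: time the event landed in this stream


@dataclass(frozen=True)
class Stream:
    id: str
    events: Tuple[Event, ...]  # stored in append order


def reindex(stream: Stream) -> Stream:
    # sorted() is stable, so events with equal create times keep append order.
    ordered = sorted(stream.events, key=lambda e: e.created_at)
    return Stream(id=str(uuid.uuid4()), events=tuple(ordered))


# X received a1 and b1 live; c1 and c2 were backfilled later from stream C.
x = Stream(
    id="x-original",
    events=(
        Event("a1", created_at=10, appended_at=10),
        Event("b1", created_at=20, appended_at=21),
        Event("c1", created_at=5, appended_at=30),
        Event("c2", created_at=15, appended_at=31),
    ),
)
rebuilt = reindex(x)
rebuilt_order = [e.payload for e in rebuilt.events]
```

The original stream is left untouched here, which keeps the old id valid for anything still reading from it until it is repointed.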
Why? Going back to the question of “Why would you use this thing?” You use it for integrating distributed systems, where consistency is already impossible (unless your third-party tool allows for locks), or for simple analytics or onboarding, where order matters much less.