An engineer's SQL client
May 19th, 2019
Someday, I’d love to build a SQL client for engineers.1 For the sake of this article, let’s call this hypothetical product “Potrero”.
First off, Potrero wouldn’t target the analyst. That market is done, captured. There’s way too many SQL clients already. Instead, it’s engineers. Engineers already know the basics of SQL. Many startup engineers will write their own queries to satisfy occasional question. They generally don’t use Looker or Mode. They use psql. It’s hard to compete with built-in solutions, people aren’t attached to psql in the way they’re attached to vim and even in that category, VSCode has emerged as the new dominate player in a decades-old market (though obviously subsidized by a massive company).
If your target is a different audience, a very different product starts to evolve.
Instead of starting with Snowflake or Redshift, you start with localhost Postgres. Engineers need to be able to explore their local data. Sort of like requestb.in (RIP), it’s sometimes hard to know, within a complicated code base, what is actually being put in the database. Sure, you could fire up psql, but you really just want to lazily click around. I actually saw a lot of adoption of Postico for this reason — UI clients are just nice. You’d probably want some kind of Electron app.
Also, no chart builder. This isn’t a BI tool. Chart builders are stupid complex to build. You might want conditional formatting and result column stat graphs, but if you figure this is mostly used for localhost, you’re not building charts and dashboards. You’re inspecting, counting, and modifying values.
From there, you might annotate a column or table to remind your future self of weirdness in a particular table. But you’d also want to be able to share with your teammates. So you’d actually want a sort of collaborative document. You don’t really need/want it to be realtime, so an asynchronous merge strategy like git’s would be fine. You could make local modifications and when you’re ready, publish them to the team, provided you pointed it to a migration directory.
You’d also want some kind of data summary stats displayed alongside the human-editable parts. Postgres already has pg_stats, which tells you about NULL values, popular values, percentile distributions (if it’s a continuous value like numbers or datetime). Show those!
Now for running SQL, you have two parts: Query + Results. Yet at the same time, you want access to browse the schema objects as well as previous queries. For the quick answer, we can just look at other code editors: they use tabs and split views. I know, I know. No one except engineers know about split views. But remember? I told you Potero was for engineers. Imagining
cmd+p GoTo’ing a table, with fuzzy search included. And then tabbing back to your query. And then splitting the view so you can view a few schemas on the left while you type your query on the right. Datagrip is closest here. VSCO used them and I was genuinely impressed with some of its capabilities.
Typing SQL is of course tiresome, so intensely great autocomplete is absolutely necessary. Luckily, because you’re only support Postgres, syntactical autocomplete is much easier. One of the clever tricks we used at Numeracy was mostly autocompleting within single clauses. Sometimes, a person might right the
from clause before the
select clause and autocomplete should still detect that it should prioritize table/view/table-returning functions first. “Perfect” autocomplete would be able to autocomplete not only keywords, tokens, and values based on the database, but from frequency queried (not for any user on the database, lest some machine is using the same connection, but for people using Potrero). Of course, the autocomplete should have references to the notes you commented on earlier.
Actually, since we’re talking about using query history for ordering autocomplete by frequency, you could add this data to the schema browser: what are common join’s, where’s, or group by’s for a table. What tables in a schema are most often queried. That sort of thing.
As another side note, some SQL clients sell the idea of real-time collaboration. Localhost access makes this really tricky — you’d need a central service to support this. Sort of like VSCode’s remote system.
Plus, eventually you’d want to hook this up to your production app. Your schema comments and usage metadata would carry over, but now, it wouldn’t just be yours, but your entire team’s. This is where Potrero would actually solve a valuable but subtle problem: contextual knowledge transmission. Most SQL seems to be easy to write, but the devil’s always in the details. Understanding the subtleties (or just oddities) of SQL, your database, or your dataset is everything. Hopefully a product like Potrero would assist with that.
I don’t plan on building this myself anytime soon, because I don’t think this market is viable for a business (though it might be viable as a side project making beer money). The pain point isn’t sharp enough that anyone will pay enough to pay for CAC and isn’t general enough that a free product will spread virally enough to support a VC model. ↩