Executing SQL from the browser

July 14th, 2020

The Gordian knot
Photos by Mohamed Ziyaadh

Update: pgpassthru is a rough sketch of this idea based on how we did it at Numeracy.

As Numeracy integrates into Snowflake, it necessarily is becoming not only a SQL client, but also a database GUI. Snowflake customers aren’t just analysts; they’re also database administrators and they want a GUI with grids and buttons for their most common tasks.

Normally, you’d build out RESTful endpoints for every database object and then custom logic on the backend to translate those RESTful JSON requests into SQL. The problem is that databases have a dozen object types and lots of different properties on them. Writing this translation is a huge undertaking. At this point, code generation becomes tempting. We have configuration files that define SQL (in Snowflake’s case, ANTRL files), why not translate those into node endpoints?

You should always be suspicious of code generation. That’s not to say that it’s never good, but it should be a last resort. That’s because your code generator is hard to debug later and it will have bugs. On top of that, the expect JSON payload probably won’t be obvious to the frontend developer, so you’ll either need to generate code comments, or worse yet, create a “discovery” endpoint, or start generating browser libraries. Oh! you might say, this is the perfect use case for GraphQL to SQL. I can write a translation layer and then I can take advantage of the GraphQL ecosystem! If the idea of translating GraphSQL to SQL and back doesn’t immediately sound like a massive headache, best of luck.

Instead, we choose to write SQL in frontend and had a single “SQL passthrough” endpoint, with some guidelines:

  1. 80/20 rule. A fraction of possible DDL will cover most needs. For anything else, there’s SQL.
  2. No DQL or DML. Data warehouses store terabytes and the browser can’t really handle that scale of data.1
  3. Don’t build a DDL statement builder. Write SQL strings and take advantage of SQL’s parameters to pass in identifiers.

The first objection is usually security — how do we ensure users don’t write malicious SQL? Databases have a security model. Sure, you’d need to have define a database user per app user and properly scope that user, but we get that for free with Snowflake2. Now, it’s possible that a bad client could DDoS the SQL passthrough endpoint. Just because it’s a passthrough, doesn’t mean it can’t have any intelligence. You can still (and should) build in caching and rate-limiting to prevent abuse or just ensure idempotency. For instance, if you have a “create table” button, you might event want to run all your DDL3 serially, so if someone clicks the button twice, you don’t end up with an error on the second try.

This approach prioritizes simplicity over cleverness. It minimizes your surface area and provides the most transparency to frontend developers. Yes, our frontend developers need to learn the basic Snowflake SQL commands, but trust me, if you can learn to love JavaScript, you can learn to love SQL. Plus, it’s gotta be better than learning a homemade DDL-DSL language without a consistent maintainer.

  1. It’s possible for show commands to return large datasets, in practice it’s rare. Those cases are on the other side of the 80/20 rule and specialized endpoints and caching (especially systems for browser-side cachine) will still need to be built. 

  2. That doesn’t mean you can’t do this with Postgres. When users sign up, you can also create a database user and with row-level security on the table. It’s just way more of a hassle and usually there’s no point — you don’t want your users creating tables or adding triggers to tables. 

  3. I’m not counting SHOW commands in DDL. These should very much be parallelizeable.