REPLs vs Notebooks
December 30th, 2019
There are three common ways of “running” code: REPLs, notebooks and execution.
Let’s start with execution. By this, I’m referring to running
python main.py from your command line. It runs like you might expect, with only things explicitly
REPLs (read-eval-print loops) are most commonly used for learning: my first experience with programming was summoning python from the command line, typing x = 1 and then running x * 2. Don’t like the result? Type x = 2 and then re-run x * 2.
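To make the loop concrete, here is a minimal sketch of a read-eval-print loop replaying the session described above. It is not how the real python prompt is implemented; the function name run_repl and the idea of feeding it a list of lines (rather than reading from the keyboard) are assumptions made so the example is self-contained.

```python
def run_repl(lines):
    """Toy REPL: evaluate each line against one persistent namespace."""
    namespace = {}  # state survives between lines, as in an interactive session
    outputs = []
    for line in lines:
        try:
            # Expressions are evaluated and their value is "printed".
            result = eval(line, namespace)
            if result is not None:
                outputs.append(repr(result))
        except SyntaxError:
            # Statements such as assignments are executed for their side effect.
            exec(line, namespace)
    return outputs


session = ["x = 1", "x * 2", "x = 2", "x * 2"]
print(run_repl(session))
```

The only way to change an earlier result is to type more lines: the history itself is never edited, only appended to.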
Notebooks are REPLs as documents. Jupyter notebooks have code cells that are executed by the
x = 1 might not actually be true, depending on what you’ve run last.
x no matter where you are in the notebook.
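The effect of run order on shared state can be sketched as follows. This is only an illustration of cells sharing one namespace, not Jupyter’s actual machinery; the cell names and the run_cells helper are hypothetical.

```python
# Three hypothetical cells, keyed by name in document order.
cells = {
    "a": "x = 1",
    "b": "x = 2",
    "c": "y = x * 2",
}


def run_cells(order):
    """Run the named cells in the given order against one shared namespace."""
    namespace = {}  # stands in for the state shared by every cell
    for name in order:
        exec(cells[name], namespace)
    return namespace["x"], namespace["y"]


print(run_cells(["a", "b", "c"]))  # cells run top to bottom
print(run_cells(["b", "a", "c"]))  # same cells, different run order
```

The same document yields different values of x and y depending on which cell ran last, which is why reading x = 1 in a cell does not tell you what x currently is.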
From the user’s perspective, what differentiates REPLs from notebooks is the ability to easily view and modify history local to your notebook. You run a series of commands transforming data, with each command producing intermediary state. If the final data is not what you expect, you review your commands and intermediary states, find the one that is not correct (sometimes by visualizing the intermediary state), fix it and re-run. When exploring and experimenting with ideas, not having to reset state or manually backtrack from your final state saves cognitive overhead. You technically could do this with databases, but you’d need to retain all past commands run against the database and (for performance reasons) the intermediary states to be able to rewind to a past state and play forward. This would be cost-prohibitive.
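The rewind-and-replay workflow can be sketched in a few lines. This is a toy model under stated assumptions, not how any notebook or database implements it: the History class, its method names and the sample data are all hypothetical.

```python
class History:
    """Hypothetical history: commands plus the cached state before and after each."""

    def __init__(self, initial):
        self.commands = []
        self.states = [initial]  # states[i] is the input to commands[i]

    def run(self, command):
        """Append a command and cache the intermediary state it produces."""
        self.commands.append(command)
        self.states.append(command(self.states[-1]))
        return self.states[-1]

    def fix(self, index, command):
        """Replace one command, then replay only from that point forward."""
        self.commands[index] = command
        del self.states[index + 1:]  # states up to the fixed command are reused
        for cmd in self.commands[index:]:
            self.states.append(cmd(self.states[-1]))
        return self.states[-1]


history = History([1, 2, 3, 4])
history.run(lambda rows: [r * 10 for r in rows])
history.run(lambda rows: [r for r in rows if r > 30])  # mistake: meant >= 30
history.run(sum)

# Inspecting history.states reveals the filter dropped 30; fix it and replay.
final = history.fix(1, lambda rows: [r for r in rows if r >= 30])
print(final)
```

Keeping every intermediary state is what makes the fix cheap here, and it is also the cost the paragraph above points to: at database scale, those cached states are whole copies of the data.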
Notebooks, therefore, are mostly useful as data dumping grounds: extracting a narrow slice of truth into notebook state and preparing it for presentation. REPLs, on the other hand, are much more useful for working with the trunk, and any REPL state is ephemeral.
Snowflake’s acquisition of Numeracy has crystallized this for me. Historically, one of the most common requests from Numeracy customers was Python notebooks. We eschewed them, and with Snowflake’s acquisition, they seem even less likely. SQL clients are effectively REPLs.[1] Introducing notebook-like state would force the product to represent the notebook state in addition to the query and database state. This is made doubly confusing by sharing and collaboration, where you must now resolve divergent state for arbitrary data objects.
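As a rough illustration of a SQL client behaving like a REPL, the sketch below reads statements, evaluates them against a connection and hands back the rows. It uses Python’s built-in sqlite3 module with an in-memory database purely as a stand-in; the sql_repl function and the table t are hypothetical and say nothing about how Numeracy or Snowflake work.

```python
import sqlite3


def sql_repl(conn, statements):
    """Toy SQL client: evaluate each statement and collect the rows it returns."""
    outputs = []
    for statement in statements:
        rows = conn.execute(statement).fetchall()
        outputs.append(rows)
    return outputs


conn = sqlite3.connect(":memory:")  # stand-in database for the example
results = sql_repl(conn, [
    "CREATE TABLE t (x INTEGER)",
    "INSERT INTO t VALUES (1), (2)",
    "SELECT x * 2 FROM t ORDER BY x",
])
print(results[-1])
conn.close()
```

The client itself holds nothing between statements except the text you typed; everything durable lives in the database, so there is no third kind of state to represent or reconcile.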
[1] As a side note, one of the reasons that analysts turn to Python notebooks is being able to use a Turing-complete, imperative language. SQL was designed for the insertion and extraction of relational data, not the transformation of ordered data.