Spreadsheets are "the dark matter of business software": they're everywhere, they're invisible, and they hold everything together. Business and finance run on spreadsheets; no other software tool has empowered so many people to build solutions to so many different problems. In this context, you have to understand any assertion that "Jupyter is the new Excel" as intentionally sensational.
Jupyter notebooks do, however, share some key similarities with Excel spreadsheets. Notebooks are ubiquitous in scientific and statistical computing, in the same way that spreadsheets dominate business operations and front-office finance. In this post, we'll explore some philosophical and practical similarities and differences between the two tools in an attempt to explain why both have such passionate fans and critics.
similarities: the positives
- Superficially, both Jupyter notebooks and Excel spreadsheets use "cells" as a visual metaphor for breaking an analysis into discrete steps. Cells in both formats contain code and show results.
- Both are designed for interactive, iterative, exploratory analysis, combining computation with data visualizations.
- Both aim to have a shallow learning curve for beginners.
- Both are designed to be self-contained and easy to share. Online environments like Google Colab and JupyterHub abstract away the often-complex Python setup process.
- Both have a strong hold on higher education in their respective fields. Business schools almost universally teach financial modeling with Excel, and STEM departments usually teach data analysis with Jupyter notebooks[1]. New graduates bring their familiarity with these tools into the workplace.
similarities: the negatives
Both Excel spreadsheets and Jupyter notebooks are criticized by software engineers as not being "real software". Aside from the obvious limitation that both artifacts require another program to run, they also make it difficult to adhere to software engineering best practices:
- As large, monolithic files, they're difficult to version control with developer tools like git. Office Open XML documents are ZIP archives, so git treats them as opaque binary files and can't show meaningful changes to the underlying data. Jupyter notebooks are really just large JSON files, which git can diff, but changes to cell outputs and execution counts introduce superfluous deltas[2].
- Both Excel spreadsheets and Jupyter notebooks are difficult to productionize, although both tools do get used in production in practice. Excel and Jupyter are heavy execution environments that introduce their own dependencies and seem wasteful to engineers used to writing standalone scripts.
- Both are error-prone and difficult to test. The fact that both platforms cater to users with less experience writing code gives them a reputation for creating solutions riddled with bugs. In reality it might just be that, without tools like unit testing or a culture of quality control, bugs in spreadsheets and notebooks are more likely to make it into production.
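To make the version-control point concrete: because a notebook is a JSON document, the noisy fields can be cleared before committing. The sketch below is only an illustration, not a substitute for dedicated tooling. The function name `strip_notebook` and the sample notebook are hypothetical, and the code assumes the nbformat 4 layout, in which code cells carry `outputs` and `execution_count` keys.

```python
import json


def strip_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with outputs and execution counts cleared."""
    cleaned = json.loads(json.dumps(nb))  # cheap deep copy of JSON-compatible data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny, hand-written notebook in the nbformat 4 layout (illustrative only).
sample = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 7,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

stripped = strip_notebook(sample)
print(json.dumps(stripped["cells"][1], indent=1))
```

After stripping, re-running the notebook without editing any code no longer changes the file, so a diff shows only real edits to the source cells.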
differences
- Excel makes it easier for non-programmers to understand how data flows between cells.
- Excel's grid provides a natural way to reference data via cell coordinates, whereas Jupyter relies on named variables, forcing users to confront the reality that naming variables is hard.
- It's easier to inspect intermediate results of multi-step calculations in Excel because the cells are right in front of you. Print statements in Jupyter notebooks require more effort to set up and execute.
- Excel is self-contained; Jupyter's value lies in Python's package ecosystem.
- Python's reliance on external libraries makes it easier for IT departments to restrict Jupyter's use.
- Both installing Jupyter locally and running notebooks over a network require more setup than opening Excel.
- Most Excel spreadsheets only use functions that ship with Excel, which means that a business contact can just open your model, modify it, and run it. Notebooks are difficult to share outside an organization, and even within one, because they're so tied to a specific Python environment and Python environments are difficult to set up.
- Excel can function as a "poor man's database", storing tabular data across multiple sheets and providing OLAP-like capabilities via PivotTables. Jupyter notebooks usually load data from an API or shared file location, another reason why they're not as self-contained.
- "Fudging the numbers" is easier in Excel than in Jupyter. Spreadsheets update in real-time without having to re-run code or set up interactive widgets. One-off changes are easier to make, which matters when speed is of the essence.
- Working with code is unavoidable in Jupyter, but Excel can be used entirely through a GUI: there are even menus to select functions in cell formulas.
- Jupyter is more open-ended and flexible, but it requires more technical knowledge to use effectively.
- Jupyter has a stronger emphasis on narrative and storytelling than Excel.
- Jupyter notebooks are designed for literate programming, where code and prose are interspersed to create a narrative flow.
- Reporting and presentation in Excel typically rely on either copy/paste or integrations with PowerPoint.
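The recalculation difference described above can be sketched in a few lines. The first half mimics notebook cells that were each run once, where editing an input leaves a downstream value stale until its cell is re-run. The second half uses a hypothetical `Sheet` class (a toy for illustration, not how Excel is implemented) that re-evaluates formulas on every read, which is roughly the behavior spreadsheet users take for granted.

```python
# Notebook-style: each statement stands in for a cell that was run once.
price = 100
total = price + 20       # "cell 2" computed total from the old price
price = 200              # "cell 1" is edited and re-run; cell 2 is NOT re-run
stale_total = total      # still reflects price == 100


class Sheet:
    """Toy spreadsheet: cells hold constants or formulas (callables)."""

    def __init__(self):
        self._cells = {}

    def set(self, ref, value):
        self._cells[ref] = value

    def get(self, ref):
        value = self._cells[ref]
        # Formulas are re-evaluated on every read, so they never go stale.
        return value(self) if callable(value) else value


sheet = Sheet()
sheet.set("A1", 100)
sheet.set("B1", lambda s: s.get("A1") + 20)  # like the formula =A1+20
before = sheet.get("B1")
sheet.set("A1", 200)
after = sheet.get("B1")
print(stale_total, before, after)
```

Note that the toy sheet still needs a name for every cell; the grid coordinates simply supply those names for free.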
implications
Microsoft's efforts to integrate Python into Excel won't significantly erode Jupyter's dominance in scientific and technical computing. Spreadsheets lack a natural narrative structure, which makes them less suitable for education and reproducible research. Moreover, the "open science" community will never adopt a closed-source tool built by an American tech giant.
Tools and "best practices" will emerge to mitigate the operational disadvantages of Jupyter notebooks[3], just as they have for spreadsheets. Most front-office users will ignore such guidelines[4], engendering ongoing tension with IT departments. Having witnessed how things turned out with Excel, many IT departments view supporting Jupyter as opening a Pandora's box of security vulnerabilities and maintenance headaches.
Both platforms will survive into the foreseeable future. Neither will supplant the other because they target user bases with fundamentally different skill sets. People working at the intersection of quantitative modeling and business decision-making will continue to need familiarity with both tools.
conclusion
Use the tool that best fits into the culture of the organization in which you're solving problems. There are situations where technical requirements will force you to use one tool over the other, just as there are organizations that will only allow you to use one tool or the other. If you work in an Excel-dominated field and do need Python's capabilities, in my experience it's easier to read and write Excel spreadsheets from Python code than it is to get Excel users to open a Jupyter notebook.
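As a rough illustration of reading and writing spreadsheets from Python, here is a minimal sketch using the third-party openpyxl package. The choice of library is an assumption (the post does not name one), it must be installed separately, and the file name, sheet name, and values are made up for the example.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook  # third-party: pip install openpyxl

# Write a small model to a temporary .xlsx file (path and values are illustrative).
path = os.path.join(tempfile.mkdtemp(), "model.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Inputs"
ws.append(["item", "units", "unit_price"])
ws.append(["widget", 3, 2.5])
ws.append(["gadget", 2, 10.0])
wb.save(path)

# Read it back as plain Python tuples, skipping the header row.
rows = list(load_workbook(path)["Inputs"].iter_rows(min_row=2, values_only=True))
revenue = sum(units * unit_price for _, units, unit_price in rows)
print(rows, revenue)
```

The resulting file opens in Excel like any other workbook, so colleagues can keep working in the tool they already know.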
Software engineers and IT departments worldwide will continue to look down on Jupyter notebooks, just as they have done with spreadsheets for decades. The fact that MBA-types don't use Jupyter notebooks makes it easier for IT to enforce draconian restrictions on their use. Ironically, many front-office users may only gain access to Python once Microsoft finishes integrating it into Excel.
[1] Some holdouts still use MATLAB, R, SPSS, or SAS, but hefty licensing fees will continue to push users towards free and open-source alternatives over time. Capturing the education market is a key part of the business strategy for firms like MathWorks, but it's unlikely they'll hold on forever.
[2] Tools like nbdime can help with version control for Jupyter notebooks, but using them adds another layer of complexity.
[3] Tools like papermill aim to streamline running notebooks in production environments. Cloud providers also support creating pipelines involving Jupyter notebooks in production.
[4] How many people have even heard of the FAST standard for building spreadsheets?