Robert Važan

Scripting as a programming paradigm

Scripting is not a language feature. It's a programming style that may have some language support. Scripting is the opposite of software engineering. A script is designed for a single task, often a single run. It is used by its own developer and frequently relies on the developer's manual intervention.

It is surprising how something as common as scripting could be so poorly understood. Most people explaining it online are wrong. Most answers on StackOverflow are wrong. Wikipedia is wrong. Let's quickly summarize the common misconceptions about scripting.

What scripting is not

Some people equate scripting with interpreted or dynamically typed languages, but scripting can be done equally well in compiled and statically typed languages. People who make this mistake will pick some dynamic language in the hope of getting the benefits of scripting, but then continue to use software engineering practices that are incompatible with scripting.

Many professional software developers think of scripting as a toy programming system for users. In reality, scripting requires as little or as much skill as software engineering. Data science is an example of scripting that requires considerable skill and knowledge. There isn't any audience split between software engineering and scripting either. Both can be practiced by the same person. This misconception results in poorly designed scripting systems that are dumbed down in the hope that they will be easier to use, which only annoys and offends their predominantly expert users.

Glue code and customization code are often called "scripts", but that's not what this code really is. Both glue code and customization code are typically developed by professional software developers using standard software engineering practices. Misrepresenting glue and customization code as scripts results in shoddy code.

So what is scripting then?

Software engineering is superior to scripting in all-around software quality, but this quality is expensive in terms of implementation complexity, cost, and delivery time. Scripting enables orders of magnitude higher productivity by cutting some corners regarding quality.

Scripts are developed by their users. There is no need for a nice user interface. Plaintext input and output files in predefined locations will do. There is no need for any kind of documentation, because the sole user has just written the code and fully understands what it does. There is no team or customer communication overhead, no formal requirements, no budgeting, no vetting, no bureaucracy. Since the user is the developer, he can bend the task definition to make the implementation easier to develop. There are no "fountain wristwatch" requirements, that is, demands for extravagant features of little practical use.

A script is executed under manual supervision of its developer. This stands in sharp contrast to engineered code, which runs under Mars expedition conditions, completely detached from its developers. Supervised execution almost entirely eliminates the need for error handling. The script just crashes when it encounters an error and its developer investigates the issue at that point. Scripts don't have to handle all corner cases. They can just throw exceptions. When some rare case actually occurs, the developer adds handling code at that moment. There is no need for telemetry, analytics, metrics, or logging. The script is developed interactively using direct observation of its behavior. There is no need for automated tests either. One-off manual tests are sufficient to satisfy the relaxed reliability requirements for supervised scripts.
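As a rough illustration of this supervised style, here is a minimal Python sketch. The file names, the "name,amount" line format, and the function name are all hypothetical, chosen only for this example. The point is what is missing: there is no validation and no error handling, so malformed input raises an exception and the developer looks into it at that moment.

```python
from pathlib import Path

# Hypothetical fixed locations; a script can simply hardcode these.
INPUT = Path("input.txt")
OUTPUT = Path("output.txt")


def total_per_name(lines):
    """Sum the amounts per name from 'name,amount' lines."""
    totals = {}
    for line in lines:
        # No validation: a malformed line raises ValueError and the
        # script crashes, which is the intended behavior here.
        name, amount = line.split(",")
        totals[name] = totals.get(name, 0) + int(amount)
    return totals


if __name__ == "__main__":
    totals = total_per_name(INPUT.read_text().splitlines())
    OUTPUT.write_text("\n".join(f"{n},{t}" for n, t in sorted(totals.items())))
```

If a rare case such as a blank line ever shows up in the input, the crash points straight at it, and a one-line fix can be added then rather than upfront.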

Scripts are narrowly specialized. There is no generalization overhead. This increases the number of assumptions that can be made to simplify the code. Many corner cases become irrelevant. Scripts can assume everything about the execution environment. There are no deployment or compatibility issues.

Scripts often run only once, aside from development and debugging runs. Throwaway code can be messier and less general than code designed for regular execution. There is no naming, archiving, or versioning of code. Output of the one-off execution can be manually examined, which replaces internal sanity checks and testing. Assumptions can be based on the actual input data rather than the general problem domain, which simplifies the script.

Engineering software by scripting

Software engineering and scripting are opposite tradeoffs between productivity on one side and generality of software on the other side. It is nevertheless apparent that some ideas from scripting can be transferred to software engineering without sacrificing quality. The question is how far we can go without undermining quality.

Cloud software adds some aspects of scripting to software that is otherwise engineered. Cloud software can make assumptions about its execution environment. It only has to deal with browser diversity. The cloud environment is known and controlled. Even though cloud software is used by end users, it still runs on computers controlled by the developer. This makes all activity of the software visible and deployment of fixes faster. It's still not scripting, but it's not space expedition software either. It is possible to handle rare cases by showing an error message and notifying the developer to handle the case manually. Analytics and easy access to user opinion simplify requirements gathering and software design.

Some scripting concepts are already present in programming languages. Exception handling can be used to reduce error handling to a few carefully picked sites in the code. Good libraries make UI authoring almost as easy as handling raw data files.
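To make the point about exception handling concrete, here is a minimal Python sketch of error handling reduced to a single site. All names in it (`notify_developer`, `developer_inbox`, `handle_request`, and the order example itself) are hypothetical, and the in-memory list merely stands in for whatever notification channel a real system would use. The inner functions contain no error handling at all. Only the outermost handler catches exceptions, shows a message to the user, and passes the details to the developer.

```python
import traceback

# Hypothetical stand-in for emailing or paging the developer.
developer_inbox = []


def notify_developer(report):
    developer_inbox.append(report)


def parse_quantity(text):
    # Inner code does no error handling; int() raises ValueError on bad input.
    return int(text)


def order_total(quantity_text, unit_price):
    return parse_quantity(quantity_text) * unit_price


def handle_request(quantity_text, unit_price):
    """The single site where errors are caught for this kind of request."""
    try:
        return f"Total: {order_total(quantity_text, unit_price)}"
    except Exception:
        # Rare cases end up here: tell the user, hand the details to the developer.
        notify_developer(traceback.format_exc())
        return "Something went wrong. The developer has been notified."
```

Calling `handle_request("3", 5)` returns the total, while `handle_request("three", 5)` returns the error message and leaves a traceback in `developer_inbox` for manual follow-up.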

Some overhead of software engineering unfortunately cannot be avoided. There is communication overhead in the form of documentation and user interface, not to mention requirements gathering for custom software. Many use cases must be handled upfront, not just after some user encounters them. And automated testing is unavoidable despite all the visibility in the cloud.

Personal and team script libraries

There were times when I shunned script libraries as a dirty hack that highlights the lack of quality software. I deliberately and systematically avoided scripting as a bad habit. But the pesky scripts just kept reemerging and growing. Eventually, I came to understand that having a personal script library is an unavoidable part of being a developer. It's a skill and a habit that improves productivity and that is often a job requirement these days. This is one of the reasons I recommend Linux, especially to developers. It is far easier to script than Windows.

Personal scripts are a problem in a team environment, including in opensource projects, because scripts are often an essential part of the software being developed. This includes build, deploy, and server setup scripts as well as code generation scripts. If a developer leaves the team, part of the source code leaves with him, which can cripple and possibly kill the whole project. For this reason, I recommend keeping all scripts in the repository with the other source code, in both opensource and proprietary projects. Sharing scripts moves them slightly in the direction of software engineering, but that's inevitable in a team environment and it does not cost much.

Research and data science

Data science, and more broadly research, is an ideal application for the scripting paradigm. Research may be published, but research kits and notebooks will only ever be used by skilled experts in the field. All inputs are hardcoded in the form of input or training datasets. The only "multiple runs" requirement is that outputs must be reproducible. Lower-level code might need unit tests, but all high-level report and visualization code is tested only manually. Research kits and notebooks are therefore largely just script collections. Most of the scripts I write these days are part of some research kit.
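The reproducibility requirement can be sketched in a few lines of Python. This is only an illustration under assumptions of my choosing: the measurements, the seed value, and the bootstrap resampling task are made up for the example. The input is hardcoded and the random generator is seeded with a fixed value, so every run yields the same numbers.

```python
import random

# Hardcoded input, standing in for a dataset file kept with the research kit.
MEASUREMENTS = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8, 4.3]
SEED = 12345  # arbitrary; any fixed value gives repeatable output


def bootstrap_means(data, rounds, seed):
    """Return the means of `rounds` resamples drawn with replacement."""
    rng = random.Random(seed)  # private generator, independent of global state
    means = []
    for _ in range(rounds):
        sample = rng.choices(data, k=len(data))
        means.append(sum(sample) / len(sample))
    return means


if __name__ == "__main__":
    for mean in bootstrap_means(MEASUREMENTS, rounds=5, seed=SEED):
        print(f"{mean:.3f}")
```

Using a private `random.Random` instance rather than the module-level functions keeps the result independent of any other code that happens to draw random numbers in the same notebook or session.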