Tuesday, March 10, 2020

Install scripts: A maybe not so good idea

Many software projects offer a script to install the application with a single command. CrateDB has one as well, and I think creating it was a mistake.

Install CrateDB: The difficult way

Let me first explain the steps required to get CrateDB running without the use of a script. If you’re on a Unix system, the steps to run CrateDB (as of CrateDB 4.1) are:

  1. Install a JRE >= 11
  2. Download the CrateDB tarball
  3. Extract the tarball
  4. Launch it

The commands for that, under Arch Linux, are:

sudo pacman -S jdk-openjdk
wget https://cdn.crate.io/downloads/releases/crate-4.1.3.tar.gz
tar xvzf crate-4.1.3.tar.gz
./crate-4.1.3/bin/crate

I’m sure most developers are familiar with those steps. Remembering the arguments to tar can be a challenge, but aunpack from atool helps with that.

Install CrateDB: The easy way

Still, four commands are apparently three too many, so the install instructions for CrateDB look like this instead:

sh$ bash -c "$(curl -L https://try.crate.io/)"

That’s better - or maybe it isn’t? What’s wrong with using try.crate.io?

It obfuscates the steps required to get CrateDB up and running. If you invoke bash -c "$(curl -L https://try.crate.io/)" you have no idea what happens. It might call sudo and install CrateDB system-wide. It could try to create new users and groups or change system settings. Sure, a password prompt would alert you, unless you configured sudo to run certain commands without asking for a password.

The point is, you don’t know what happens. But what does happen?
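One way to find out before running anything is to download the script to a file and read it first. As a rough first pass, a grep for commands that reach beyond the current directory can point at the lines worth reading closely. The sketch below is only an illustration: the function name flag_privileged_lines and the list of commands it looks for are my own assumptions, and a grep is no substitute for actually reading the script.

```shell
# Print (with line numbers) the lines of a script that mention commands
# which typically change the system: sudo, su, user/group management,
# sysctl, systemctl. The command list is an illustrative assumption,
# not an exhaustive one. Exits non-zero when nothing matches.
flag_privileged_lines() {
  grep -nE '(^|[^[:alnum:]_])(sudo|su|useradd|groupadd|usermod|sysctl|systemctl)([^[:alnum:]_]|$)' "$1"
}

# Possible usage (requires network access, so it is left commented out):
#   curl -fsSL https://try.crate.io/ -o try.sh   # download, do not execute
#   flag_privileged_lines try.sh                 # quick look at risky lines
#   less try.sh                                  # read the whole thing
#   bash try.sh                                  # run it only once satisfied
```

Splitting the download from the execution costs two extra commands, but it turns an opaque one-liner back into something you can inspect.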

The script backing try.crate.io is available on GitHub, where it is called try.sh. It is about 200 lines of Bash, fewer if you don’t count the license header.

It tries to do the following:

  • Detect the kind of system you’re running, by sourcing various files in /etc, like /etc/os-release.
  • Detect whether Java is installed. If not, call sudo <pkgManager> <java-packageName> to install it.
  • Verify that you have the right version of Java, using awk to parse the version.
  • Download the tarball.
  • Extract the tarball.
  • Launch CrateDB.
  • Wait for CrateDB to start up and become responsive, using nc.
  • Depending on your system, open the administration interface in your browser.
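To give a feel for what three of these steps involve, here is a minimal sketch in POSIX shell. It is not the code from try.sh. The function names, the retry count, and the use of nc -z (a flag that not every netcat variant supports) are my own assumptions for illustration.

```shell
# Print the distribution ID (e.g. "arch") from an os-release style file.
# Sourcing happens in a subshell so the file's variables do not leak
# into the calling shell.
detect_distro_id() {
  release_file=${1:-/etc/os-release}
  [ -r "$release_file" ] || return 1
  ( . "$release_file" && printf '%s\n' "$ID" )
}

# Read `java -version` output on stdin and print the major version.
# Handles the legacy scheme ("1.8.0_242" gives 8) and the current
# scheme ("11.0.6" gives 11).
java_major_version() {
  awk -F '"' 'NR == 1 {
    split($2, parts, "[.]")
    if (parts[1] == "1") print parts[2] + 0
    else print parts[1] + 0
  }'
}

# Poll host:port once per second until it accepts connections or the
# retries run out. Returns 0 on success, 1 on timeout.
wait_for_port() {
  host=$1
  port=$2
  retries=${3:-30}
  attempt=0
  while [ "$attempt" -lt "$retries" ]; do
    if nc -z "$host" "$port" 2>/dev/null; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 1
}

# Possible usage (java prints its version to stderr, hence 2>&1):
#   [ "$(java -version 2>&1 | java_major_version)" -ge 11 ] || echo "need Java 11+"
#   wait_for_port localhost 4200 && echo "CrateDB is up"   # 4200: default HTTP port
```

Even in this reduced form, each function depends on something outside the script: the layout of a file in /etc, the output format of java -version, and the flags of whichever nc happens to be installed.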

This is a lot of extra complexity to turn the four “difficult” steps into a single step. Is this a good trade-off? My answer is no. Consider the maintenance burden:

  • The default packages available in a distribution could change. nc and awk, which the script relies on, could become unavailable.
  • The name of the Java package could change.
  • The files in /etc used for system detection could change.

All of these are outside of our control, and if anything breaks, the user has no clue how to proceed.
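If such a script is kept anyway, a check at the very top can at least tell the user which tool is missing instead of failing halfway through. The following is a sketch under my own assumptions (the function names are hypothetical), and it only softens the problem. The maintenance burden itself stays.

```shell
# Print the name of every given tool that is not found on PATH,
# one per line. Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Fail early with an actionable message if any required tool is missing.
require_tools() {
  missing=$(missing_tools "$@")
  if [ -n "$missing" ]; then
    printf 'Missing required tools, please install them first:\n%s\n' "$missing" >&2
    return 1
  fi
}

# Possible usage at the top of an install script:
#   require_tools curl tar awk nc || exit 1
```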

This is but one example of a common theme: We take something that is simple and try to make it easier, but in an attempt to do so, we add complexity.

All problems in computer science can be solved by another level of indirection … except for the problem of too many layers of indirection