A (cough) short (cough) history of C++ build systems

2018-10-13, 06:53 (Modified: 2018-10-13, 06:53)

I will confess, this title is dishonest. I don’t intend to make a history of C++ build systems, but complain about the state of the most used build system – one that’s portable but is close to impossible to use on Windows, for example, and one that’s too inconsistent to be properly used in your projects. People, however, jumped on the cmake train, although most of us are not enjoying the ride.

So while the start of this article will feel like a legitimate introduction to build systems, the hate will come later. You were warned.

The problem – why is it complicated to build C++ projects?

Languages nowadays usually come with a clear, well-defined way to build projects in that language. Go comes with an executable called go which does all the magic. Go is very opinionated about how things should be done; I guess the designers were sick of the history I’m just preparing to write here. Similar tools exist for C#/DotNetCore and node.js. I think it’s a sane idea to do this from the ground up. But that was probably not the case in the 70s, when people were just beginning to write code in C. I have no idea how Algol or Fortran did things, but I think there was no huge change when C came into existence. C++ focused, just like C, on the language itself, and it did little to address the already big and messy projects; and C++ brought something very bad to the table: the possibility of writing ginormous, bloated projects very fast (you couldn’t do that easily in C).

But still, why is it complicated?

Let’s say your project has one file. It’s quite easy. I’ll use the GCC way here, you only have to call the compiler.

g++ test.cpp

The output is named a.out and it contains the executable that you wanted. If you’re 12, learning programming, and someone really suggested C++ as a starting place, this and (shudders) vim should suffice. But this obviously doesn’t work with big projects.

Compiling C++ projects requires at least the following stages:

pre-compilation – running the cpp preprocessor over the file. This solves all the #whatever code you have in your program. #include, #define, #ifxxx, #pragma, all these are used during pre-compilation. At the end of pre-compilation, your file should no longer contain any #stuff you put in, all the macros are processed, and the file should be self-sufficient. You never see this file.
compilation – from the self-sufficient code, you obtain binary code specific to the target platform. The binary format is not assembly, but compiled assembly. The resulting file is an object file, usually with the extension .o or .obj on platforms that like three letters in extensions.
linking – you almost never write self-sufficient files. Almost all the time you will use a library, or code written in other files. In order to make them work together you go through this process of linking. You can link one object file with other object files or with libraries. You can link dynamically or statically. Dynamic linking means that you’ll not put the code of the libraries you link with in your executable, static linking means that you’ll put the code of the library together with your code to obtain the executable.

Simple? No. All these stages are quite complicated in themselves; for example, pre-compilation requires you to know where the files you include in your C++ code are, and you specify these paths with flags you pass to the preprocessor. Compilation requires you to know the target platform and the full declarations of the classes and functions you use (that’s C++ for you; C has a simpler model in which there are no function overloads and you only need the name of a function, not its full signature). Linking requires the location of the libraries you use and all the other object files you want to include in your executable.

But it doesn’t end here either. Some compilation flags affect the way the code is generated and linked, down to very intimate aspects of compilation such as which registers are used for passing parameters; if you link two objects compiled with different flags, they might not work well together. And that’s nothing compared to ABI (Application Binary Interface) changes across compilers. The situation is so bad that at times it’s either impossible or unsafe to link together two components of your software compiled with different versions of the same compiler.

The genius approach – script files

Our idea of genius, the 10x programmer, rockstar, etc. comes from a bad study published in the late 60s, followed by a number of studies that don’t really do things much differently either. But I heard this story (not sure if it’s true) that batch programmers were considered the 10x programmers: people who wrote batch files that automated some jobs were the ones who earned that label. And it does make sense, in a time when operating the computer was time consuming and costly.

A batch script is quite simple: a number of shell commands to be issued in sequence. Smart scripts will stop when they encounter the first error (see set -e in bash), but the point is quite simple: execute all commands, in sequence, until done or a failure.

One of the first „build systems” I ever worked with was such a batch script. The script used the standard variables (CC, CFLAGS, LDFLAGS) to build the system, it was relatively small, and it did something like a „unity” build: all the C files were included in one C file, which was then compiled. This, of course, was not an amazing solution, but it worked. It was actually a first attempt at fixing one of the issues that plague such systems.

Redundancy. You can solve this in a script by using variables, but you must copy-paste some stuff. You can fix this by using for in bash, and a few wildcard tricks, and it has one advantage: it’s imperative, the order of doing things is quite clear, and you can see clearly when things don’t work right. Simplicity at its best. Why even bother with more?
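A minimal sketch of such a script, assuming a flat folder of .cpp files. The function name build_all and the default flags are invented for illustration, and it uses CXX/CXXFLAGS for C++ where the script described above used CC/CFLAGS for C.

```shell
#!/usr/bin/env bash
# Stop at the first failing command instead of carrying on blindly.
set -e

# The conventional variables, with fallbacks when the caller sets nothing.
CXX=${CXX:-g++}
CXXFLAGS=${CXXFLAGS:--O2 -Wall}
LDFLAGS=${LDFLAGS:-}

# build_all OUTPUT: compile every .cpp in the current directory, then link.
build_all() {
    local output=$1 src
    local objects=()
    for src in *.cpp; do
        # ${src%.cpp}.o swaps the extension: main.cpp -> main.o
        $CXX $CXXFLAGS -c "$src" -o "${src%.cpp}.o"
        objects+=("${src%.cpp}.o")
    done
    # Link all the object files that were just produced.
    $CXX $LDFLAGS -o "$output" "${objects[@]}"
}
```

Calling build_all app in a folder holding main.cpp and helper.cpp leaves main.o, helper.o and app behind. It also recompiles every file on every run, whether anything changed or not.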

Turns out this is not a maintainable system. You compile everything every single time, and if you’re not compiling C it can become quite slow. C++ is a complicated language to compile, and if you dare use something like Boost, that will not fly.

Make

Looking back, I think that make is the only innovation when it comes to build systems. An all-purpose commands runner, it’s not aimed at building C applications, although that’s the most common use. It’s the epitome of mechanism over policy – and while some built in rules make it an out of the box compilation tool, it can do a lot of other things.

Make works by defining dependency trees and building only the stuff that needs to be built, when it needs to be built. A rule (or target) is „activated” if any of its dependencies is „activated”; the basic dependency is a target that names a file in your build tree. So if you updated a certain file, its corresponding target is activated. Rules are written in a file usually called Makefile.
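The core decision can be sketched in a few lines of bash. This is only an approximation of what make does, assuming plain file timestamps; needs_rebuild is a made-up helper name, and real make also knows how to build missing dependencies and walk the whole tree.

```shell
#!/usr/bin/env bash

# needs_rebuild TARGET DEP...
# Succeeds (exit status 0) when TARGET is missing or older than any DEP,
# which is roughly the question make asks for every rule.
needs_rebuild() {
    local target=$1 dep
    shift
    # A target that does not exist yet always has to be built.
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        # -nt compares modification times: "dep is newer than target".
        [ "$dep" -nt "$target" ] && return 0
    done
    return 1
}

# Usage sketch: only run the compiler when file.o is out of date.
# if needs_rebuild file.o file.cpp file.h; then
#     g++ -c -o file.o file.cpp
# fi
```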

Let’s look at an example. Below we define a simplistic build system in which file.o depends on file.cpp and file.h, main.o depends on file.h, main.h and main.cpp, and so on.

test_exe: file.o main.o
	g++ -o test_exe file.o main.o
file.o: file.cpp file.h
	g++ -c -o file.o file.cpp
main.o: main.cpp main.h file.h
	g++ -c -o main.o main.cpp

The order of the rules doesn’t matter much, this will build your executable file as you’d expect. But this feels more complicated than expected, why should we go this way? In what way is this any smarter?

SOURCES=file.cpp main.cpp
test_exe: $(SOURCES:.cpp=.o)
	$(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $^
%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c -o $@ $<

As a matter of fact, the last rule is not even needed, as it is already predefined inside the make engine. The only thing to add is a preamble containing the flag definitions and any environment variable updates one might want to make, and you’re all set. For single file programs you don’t even need a Makefile – just type make executable_name and it will automatically compile executable_name.cpp into executable_name.

The make rule definition language is the first functional language I ever used. The beauty of it is that you can express a lot of things in very simple statements, and, for correctly defined targets, make can be incredibly effective, being able to parallelize tasks and to resume tasks from where it left off (if resuming is reasonable).

As you may have noticed, I didn’t write the dependencies properly; it can be quite complicated to maintain a proper set of dependencies. In fact, nowadays people don’t write dependencies by hand; they are generated by the compiler itself (see the output of g++ -M $(CXXFLAGS) file.cpp and the thinner output of g++ -MM $(CXXFLAGS) file.cpp). Understanding how make works is key to understanding most of the build systems that follow.

The beauty of make is that it’s a tool implementing a mechanism – what you do with it is your business. Compilation is just one task this tool is very good at. So good, in fact, that most of the tools that follow are either copying make’s features or wrap around make.

Being a tool that implements a mechanism, it is oblivious to the platform you’re running on; it can do a lot of things, but leaves a lot of issues out – it’s a task runner – but defining these tasks can be cumbersome. Make, however, works splendidly for small projects.

From my point of view, make is the first and last innovation when it comes to building – everything else is just adding new things to what make does, but the core of compilation remains something make-like.

autotools

I have truly mixed feelings about autotools. On the one hand, autotools are an utter mess, and were an utter mess most of the 90s and 00s; however, they did provide the mantra of ./configure && make && sudo make install that works all the time on all the systems. And it’s fascinating to discover autotools today, even if they are definitely dated and slow.

Autotools tried to fill in for the many things missing from the existing build systems – it was a stab at unifying the build system of many GNU tools. Unfortunately, the system became bloated, under-documented, and hard to use by people who don’t know the right people to ask.

Autotools broke the task of setting up a software tree to be built into two major tasks: the configuration (a configure script that checks for external dependencies of the software and exposes them in a consistent manner to the software writer) and the makefile generation, thus autoconf and automake. Together they generated configuration scripts that would run, prepare the tree for building the software, then offer standardized targets (all, install, clean, distclean). Autotools prepared the advent of package managers for Linux systems. The usage was standardized, offering as few surprises as possible along with a self-documenting configuration system (with its flaws), so for the end-users, people who downloaded software from the internet and built and installed it on their boxes, it worked perfectly. For the developers, though, it is not as fun.

The documentation of autotools is obscure, to say the least. The choice of M4 (a macro language) as the language to build stuff from, the existence of two separate tools with unclear responsibilities (I lied a bit a paragraph earlier, the responsibilities were quite vague), and an elitist and arrogant attitude of the developers condemned the autotools system to the „stuff we don’t talk about” corner. Project management is impossible: the whole system is inclined towards generation, sometimes two-levels-deep generation of files, and people trying to add new content or change stuff in the build will fail due to the complexity of the required changes.

Because yes, project maintenance is not only about adding or removing files; sometimes you need to fix some more complex issues with the build system. Sadly, autotools doesn’t offer solutions, and while brave, the IDEs that initially embraced autotools eventually stopped doing that.

Other points of critique for autotools include the fact that most tests the configure script generates are useless; one rarely needs them, and most people never need to write code that portable. A configure script can sometimes run longer than the compilation itself. Issues were (and still are) hard to diagnose. Building software with autotools is slow; if the makefile decides that it should reconfigure the tree, you’ll be waiting three or four minutes for the reconfiguration to happen. And autotools are not easy to integrate in an IDE, and they need a great number of configuration files. This verbosity follows naturally from the way autotools are built: a collection of scripts under one umbrella, autogenerated, with atrocious speed and performance issues.

The greatest win is that autotools bring a standardized and self-documenting approach to how builds should be made. The path of least surprise, that’s what this brings to the table. configure --help would tell you what switches you can use, you have a lot of --enable-x/--disable-x or --with-x/--without-x. The intention is more than laudable. But the multiple generation steps and the voodoo ways to edit the .in and .am files, hardly documented, made the effort of proficient users of autotools look like magic.

There are good points and bad points about autotools. Unfortunately, the fact that nobody in their right mind could start an autotools project nowadays is mostly due to the core technological choices. Although it looked like a clear winner in 2005, autotools is legacy nowadays, when everyone tries to switch to other, better options.

Enter cmake

So you want a project file that’s declarative, short, and to the point. You want simple statements that are meaningful; you don’t want to fight with too much boilerplate, but you want to specify a set of files, some flags, and get the whole project compiled. Enter cmake, the worst option you can choose that’s not autotools.

The main issue of cmake is its success; not only was cmake successful and ported to different platforms, it is so successful that it has to support years of stackoverflow answers documenting it. One can understand how bad autotools were by looking at how bad this other option is.

cmake is not a novel idea: let’s have a declarative system, which we can hack until we can put everything we want in. In fact, a far superior build system that started with this idea in mind is qmake, but qmake didn’t really impress outside the Qt community for reasons that are a mystery to me. QMake is declarative, easy to understand and extend; you list the files that you want to compile and the flags you want to use, you set the project type, and all’s good. qmake was meant to work with QtCreator, a capable IDE inspired by the original Visual Studio 6, probably still the most influential IDE ever. I’m not sure why it’s not the build system that we all use; perhaps someone can enlighten me.

But since the whole reason for this enormous rant of mine is explaining to the world that cmake is a bad, bad option, let’s get to it. cmake is somewhat like qmake; it’s declarative, somehow, but it uses some strange syntax from a made-up, inconsistent language. It consists of statements that start with a function name followed by a set of parameters between parentheses. The set of parameters is completely inconsistent, and one has to learn by heart what does what, because the documentation for cmake is atrocious and the documentation site has no search button. Seriously. Not even 1999 level of search can help you.

In fact, most search results on Google lead to stackoverflow, which became a documentation replacement. The obvious caveat of using stackoverflow for documentation is that you hope your question has been properly answered, and it usually hasn’t. Nobody explains concepts and nobody says what’s „proper” and what’s not; it’s complicated to have a proper build system when every single time the solution to your problems is to copy-paste something out of SO.

cmake is patchwork. We’ll see that below with an example, but first let’s discuss a bit how cmake is supposed to work. The cmake project file is called „CMakeLists.txt”. Not negotiable. This means that only one such file can reside in a folder (so this is a per-folder file). This choice is dubious from my point of view, but we can go with it; it’s not a bad thing to have different folders for different sub-projects. However, I think that this limitation will make you have multiple CMakeLists.txt files spread all over your project.

This is not a problem if you start from scratch, but it can be a tough proposition for existing projects. It’s not a completely bad choice either; some people like it like that. I personally hate capital letter files; I got used to having Makefile around, but that camel case and .txt extension really bother me. .txt reminds me of DOS, and it’s an uncomfortable place I don’t really want to go to. But fine, CMakeLists.txt, horrible name, horrible extension, but after a while you can live with it. Let’s look at a basic cmake example to see what that looks like.

This is the second cmake file that comes straight from the cmake tutorial. The first one deserves a comment, but we can include it in this one. The good thing is that unlike autotools, this is not really a Makefile generator, and it shows in the project below.

cmake_minimum_required (VERSION 2.6)
project (Tutorial)
# The version number.
set (Tutorial_VERSION_MAJOR 1)
set (Tutorial_VERSION_MINOR 0)

# configure a header file to pass some of the CMake settings
# to the source code
configure_file (
  "${PROJECT_SOURCE_DIR}/TutorialConfig.h.in"
  "${PROJECT_BINARY_DIR}/TutorialConfig.h"
  )

# add the binary tree to the search path for include files
# so that we will find TutorialConfig.h
include_directories("${PROJECT_BINARY_DIR}")

# add the executable
add_executable(Tutorial tutorial.cxx)

I approve of the usage of # to start a comment. This actually makes it possible to turn CMakeLists.txt into an executable file by using the #! line, which most shells understand. Of course, the whole idea is compromised by that txt extension, because this is not simple text, but in theory this could be done.

The first line in a project file talks about the minimum required version of cmake. Why? Is there a reason to build only with a certain version? We’ll see later, it’s one of the reasons why I got to hate cmake. But let’s notice for the time being how we specify the fact that we link to a certain version:

cmake_minimum_required(VERSION 2.6)

Why do we specify VERSION there? What does it mean? Can we use other parameters? Is the first parameter a field name and the second one a value? Is this how all statements are built? The simple answer is no. That VERSION is there only so that cmake can check that it exists. It could just as easily have been cmake_minimum_required(2.6), or it could simply not have existed at all. Does anyone know or care what cmake added in version 3.11? Not really. This line should not exist at all. But it exists because you have to know the level of patchwork put into cmake to make it work.

project(Tutorial)

Now, why wasn’t this project(NAME Tutorial)? I don’t know. There is absolutely no logic to why you need to specify that it’s about the version in the previous statement, but you don’t need to specify a name here. What sort of name do you use? Do you use a friendly name with spaces and diacritic marks? No. Is this name used later? Turns out it is, but not directly, and you can’t really use the project name as anything. What if you have two projects in the current folder? Tough luck, buddy, but be sure that there’s some patchwork that will support that too. Or no, no, a project is not what you think it is. A project is… Ok, never mind. But you have to have a project statement as well.

In fact, looking at the documentation, which is hard to find because they have no search button on their site, project does support a version field. The whole thing looks like this:

project(project-NAME [VERSION major[.minor[.patch[.tweak]]]] [DESCRIPTION project-description-string] [HOMEPAGE_URL url-string] [LANGUAGES language-name...])

This actually defines some variables. Why the project name doesn’t require a NAME in front is a mystery, but it’s fine, you don’t have to get it, you don’t need to get it, and you’re stupid for asking why there’s not a NAME in front of the project name.

These sorts of inconsistencies spring up all over the place. And these are only the first two lines, which are mandatory for the top level cmake project, whether you actually use anything from it or not. In fact, if you don’t have them, but you want to include a definition from a different file, your definition will be discarded, because magic.

There’s absolutely no reason for anyone to do this. If you have multi-level cmake files, and most projects do, that means that the lower level ones are not independent, and they won’t build on their own; in fact, these lines are mandatory only in the top level file, and their presence in lower-level files will only mess your project up.

The set is adorable, but let’s say that we get it. It’s intuitive, at least that. But it’s one of the weakest sets I have ever seen, and I have worked with scripting languages defined and implemented in under one hour. We’ll see a bit later that perhaps it needed a bit more teeth.

The configure thing beats me. From my understanding, it offers the possibility to replace certain variables in a text file, that much I get, but why is the output in the Binary directory? That means that my code cannot be checked by some IDE unless I actually include stuff from the binary directory. Why would I ever do that? Hmm, there are reasons, if I work with generated files, but this is bad practice anyway, there are other ways to define a version number, why define them in a build file and inject them in code when I can do it the other way around? I mean why not make my source code be the master of what’s versioned how, and don’t make me generate stuff. Please keep in mind that this is the cmake tutorial.

Then include_directories. The idea of adding binary directories to the list of include directories is really bad, but ok, that’s what the makers of this program suggest we should do, so we’ll do it. But can you add multiple folders? The full documentation lets us know that yes, but… I don’t know.

include_directories([AFTER|BEFORE] [SYSTEM] dir1 [dir2 ...])

AFTER/BEFORE what? SYSTEM what? What if I want to add dir5 as SYSTEM?

There is no reason why things are written like this. Why did the example above have keywords before every value except the first, and this one only in the first or second position? What’s the logic? What goes in the parentheses? How can we discover this intuitively?

add_executable(Tutorial tutorial.cxx)

What do you add it to? You add it to a project. But you can’t have multiple projects in a file. So is it a project, really? No, it’s a name. Let’s look at the documentation: add_executable(name [WIN32] [MACOSX_BUNDLE] [EXCLUDE_FROM_ALL] [source1] [source2 ...]). Notice some things. I’m not sure why you need to say WIN32 or MACOSX_BUNDLE there; it’s definitely not consistent with any definition we have seen before. include_directories had the flags at the start, project had them inside and with values. So what’s the deal? We definitely cannot learn how the cmake magic incantations have to be written, so we’re meant to copy-paste from stack overflow solutions that worked for someone sometime.

It’s not the only place where you’ll find inconsistencies; in fact, each command is so different from the rest in so many ways that they had to make you specify from which version of nonsense onwards you’ve been playing with this, because they cannot offer a consistent experience for users with older versions, because none of the commands of cmake make any sense without the special cases they implement for every single line in your code. I’ll only add for fun the if, else and endif which, you guessed it, are all functions. Why would you write else() in any language? Probably the point of consistency is the presence of the parentheses, but if you need to add them everywhere, why would you add them at all?

For cmake programmers (and I call them programmers because you can’t be a user) the only solution is StackOverflow. SO is the go-to place for all the failed projects like this, and it offered Kitware the possibility of building something with this level of inconsistency because people were willing to actually invest the time in this.

Project management with CMake

Now let’s look at it again. You can’t specify all the files from a folder to be added to your project. You cannot select a number of files, you have to add each one, independently. You add a new file? You have to run cmake again. There is a mumbo-jumbo magic line that you can write that will address all the C++ files in your folder, but I don’t know it. It’s on StackOverflow.

The same goes for „how do I select the C++ standard”? This is an important issue, so important that they invented a new variable for it. I’ll give it to you, because you have no means to actually discover it without Stack Overflow. It’s set(CMAKE_CXX_STANDARD 17). The surprise? If you want the standard to be 17 you have to use cmake 3.10 or later. 3.9 will tell you that this doesn’t work, only 11 and 14 can be valid values. And now you know why you have to write that silly nonsense at the beginning. And we’re talking here about a compiler flag. Now imagine what you’d have to do to compile your code against Google Test. Or just click this link, and be awed. gtest’s cmake scripts are also a sign that you can’t simply have 20 files compiled for multiple platforms without hundreds of lines of code in cmake.

Is cmake a tool that you can use? They had people work on it for years, and cmake is now on every platform. But is it good? No, it’s crap. Highly portable crap, but crap nonetheless. It seems that people don’t understand that portability does not mean having ifs for each platform the way you have to with cmake; but I guess we’re at a point in time where this is impossible. People are using it, successfully, because the nature of build systems is that you end up working around all the issues and making one very fragile setup which works. But from my point of view, cmake is only slightly better than hand-written scripts, and way below well-thought-out Makefiles when we talk about clarity and expressivity.

You don’t have a clear description of the project, and you have bad documentation. You don’t have easily discoverable ways to extend your build system. Identifying where some libraries are installed might work vaguely on well built Linux distributions, but it won’t work at all on Windows.

The problem I have is that a lot of projects actually embraced cmake. And it’s a shame, because we’re stuck with this deadweight; in fact, one of the reasons why I can’t get any work done outside my professional projects is that I cannot find a working setup with cmake and nobody supports anything else, and there’s too much work to fix all this insanity.

I’ll cut this part short now. There are good things in cmake as well, as it borrowed from other tools, but it lacks the consistency that would make it a sane option for the future. I’m afraid to see how C++ modules will affect cmake. And modules are coming, in 2020 or at some other time soon enough. And we’ll still be building our projects with cmake. That’s probably the saddest thing I ever said.