KLB Build Tool

2021-07-30, 10:55

Some time ago I wrote about the sorry state of build systems in the C++ world. In the meantime, I educated myself with a few other alternatives, yet I wasn’t happy. A build system is best when it’s not there. After writing that rant, and after being basically mauled for bashing cmake the way that I did, I still stand by it. cmake is based on an inconsistent language. There is the promise of a modern cmake that will fix all the confusion, but I never saw a clear description of what that modern cmake is. But I’m not here to bash cmake again; cmake sadly became a de facto standard, and now, when you write code, you need to go through the pain of maintaining a complex set of files that are spread all over your source code.

Cmake is not in an enviable position. It has to offer that one-size-fits-all solution. It does that job poorly: it over-complicates things and takes over more responsibilities than it should. It’s the nature of things to grow, and of things that grow to take on more than they should. But if cmake tries to offer the one-size-fits-all, I’m in the position to say that cmake doesn’t fit me, and I want something else. One could say „Fix cmake” (the code is relatively clean, if you ask me). But I don’t need cmake.

One size fits one

I started working on klb a few months ago, when I grew frustrated with the time it took me to set up a proper CMake build. I saw alternatives at work; I saw Vlad Petric’s akro doing the job quite nicely with very little overhead. I also discussed with him how he would see the evolution of his build system. Of course, the objective is, in the end, to have zero-configuration – the source code should be the ultimate source of truth. But is that even possible?

Turns out it isn’t, really. You have things like the language standard (which one doesn’t write in the code). You have things like external libraries, which have to be installed in the system and reachable from your development environment, and that is always something one cannot predict: there is no single central repository of C/C++ libraries, and no ultimate authority on how a package should be named, installed, and so on. These are the things that make C/C++ development harder.

Also, while other systems for other languages imposed from the start a certain layout of the code, C++ doesn’t do that, and it’s not necessarily bad. It offers a bit more freedom, although there is wisdom and order in the way others do it.

When I started my implementation, I first created a pilot in Python. But needing Python to be installed is somewhat unpleasant for C++ development, and I thought C++ was mature enough to manage its own code. Shortly after, I realized that no, perhaps it isn’t. Not if you want to feel good writing and reading the code.

My initial project was quite different. I did not want to build a build system. But I had to, the moment I wrote my Makefiles and realized that this would get out of hand soon, no matter how I arranged things. So I started writing klb. My objective? Compile my own code when running klb. That is all.

I realized I don’t want to solve all the problems that normal build systems want to solve. I don’t want to offer options. One doesn’t need as many options as they might think. One rarely needs compile flags. And when that one is me, it’s easy to fix the things that bother me, and move on.

How klb works

klb builds stuff by reading the source code and determining what code should be compiled and what objects should be linked to create an executable. So there are a few basic functions:

1. Determine which object files need to be rebuilt when you change, for example, a header file.
2. Determine all the object files that are required to build an executable.

While #1 is easier, it’s definitely no trivial task: one needs to read the files that are included by a certain C++ source file, then look at all the files included by those files, and so on. My solution was to scan all the source code in the src/ folder, and remember all these inclusion chains.

For #2, however, things are a bit more complicated. How can one know which object files have to be linked together to form an executable? Here’s one restriction I had to come up with: a header should have its full implementation in the similarly named C++ source file. So if you have a feature.h header, feature.cpp should have the full implementation of the features presented in feature.h. This is a common-sense request, from my point of view. As long as all the features advertised in feature.h are implemented in feature.cpp, the problem becomes relatively similar to the first one. However, we now have to build a deeper chain of dependencies, because each feature.cpp might include sub-feature1.h and sub-feature2.h, which the original source file knows nothing about. So what we need is to pull those dependencies along: if sub-feature1.cpp or sub-feature2.cpp is present, they should be built into object files and linked into the executable, as they offer sub-features for a feature that you want to include.
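The header-to-source convention can be sketched as a graph walk. This is an illustration under assumptions, not klb’s implementation: implementationFor and sourcesToLink are hypothetical names, headers are assumed to end in .h, and the include map is assumed to have been built by an earlier scan.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical type: file name -> names it includes with #include "..."
using IncludeMap = std::map<std::string, std::vector<std::string>>;

// Maps "feature.h" to "feature.cpp"; returns "" for names not ending in ".h".
std::string implementationFor(const std::string& header) {
  const std::string suffix = ".h";
  if (header.size() <= suffix.size() ||
      header.compare(header.size() - suffix.size(), suffix.size(), suffix) != 0) {
    return "";
  }
  return header.substr(0, header.size() - suffix.size()) + ".cpp";
}

// Returns every source file whose object must be linked into the executable
// built from `mainSource`. `existing` lists the source files found on disk.
std::set<std::string> sourcesToLink(const IncludeMap& includes,
                                    const std::set<std::string>& existing,
                                    const std::string& mainSource) {
  std::set<std::string> sources{mainSource};
  std::set<std::string> visited{mainSource};
  std::vector<std::string> pending{mainSource};
  while (!pending.empty()) {
    std::string current = pending.back();
    pending.pop_back();
    auto it = includes.find(current);
    if (it == includes.end()) continue;
    for (const auto& header : it->second) {
      // Headers may include further headers, so they are walked as well.
      if (visited.insert(header).second) pending.push_back(header);
      // If the matching source exists, pull it in and follow its includes.
      std::string impl = implementationFor(header);
      if (!impl.empty() && existing.count(impl) != 0 && visited.insert(impl).second) {
        sources.insert(impl);
        pending.push_back(impl);
      }
    }
  }
  return sources;
}
```

A source file that nothing reaches through a header never lands in the result, so unrelated code in src/ is not linked into the executable.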

Based on these dependency trees one can determine if something needs a rebuild or not. klb naturally skips the targets that are up to date. Then, when it determines which targets should be executed, klb moves on and compiles and links those targets.
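The text does not say how klb decides that a target is up to date. A common approach, assumed here, is the make-style comparison of modification times. The sketch below uses plain integers as timestamps so it stays self-contained; in real code they could come from std::filesystem::last_write_time. The names Timestamp and needsRebuild are hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical timestamp type; any monotonically comparable value works.
using Timestamp = std::int64_t;

// A target needs a rebuild when it does not exist yet (no timestamp),
// or when at least one dependency is newer than the target itself.
// Equal timestamps count as up to date in this sketch.
bool needsRebuild(std::optional<Timestamp> target,
                  const std::vector<Timestamp>& dependencies) {
  if (!target) return true;
  for (Timestamp dependency : dependencies) {
    if (dependency > *target) return true;
  }
  return false;
}
```

For an object file, the dependencies would be the source file plus every header in its inclusion chain; for an executable, they would be the object files it links.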

This is all that klb does.

Current status

What I wrote above works and I can use it to build my own code, no problem. But there are things that I can and will improve, for certain:

Parallelism

klb runs tasks in sequence, so there’s no make -jN equivalent, not yet, at least. The first step would be to compile all the object files first with multiple processes and then perform all the linking with multiple processes. However, linking is often a very memory-intensive operation (especially if you use something like LTO), and this solution will place a higher burden on your system than necessary. But parallelism should be there, and enabled by default.

Single target build

While right now scanning the whole source code is convenient, I think that as the project grows, building a single target at a time could be more efficient, reading the source code only when we need to update something. This is a longer-term plan, and really, a secondary concern.

Using external libraries

When I started this, I assumed that a minimal configuration would have to say something like: executable: LINK FLAGS. I didn’t think too much about it, then I moved on. When I talked with other people about my idea, someone pointed me towards Rachel by the Bay and her depot building tool. I felt silly, awkward, and redundant, but then I realized that I could borrow some ideas from there. So the best way to configure this is by keeping an association between the system headers that are included and the flags that one needs to add. So I’ll steal that idea and run with it, and will probably also add pkg-config as an option.
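The association between system headers and flags could look roughly like this. It is only a sketch of the idea, not code from klb or from the depot tool: linkFlagsFor and the table layout are assumptions, and the example entries in the comment (zlib.h needing -lz, curl/curl.h needing -lcurl) are just the usual flags for those libraries.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table: system header -> linker flags it requires, e.g.
//   {"zlib.h", {"-lz"}}, {"curl/curl.h", {"-lcurl"}}
using FlagTable = std::map<std::string, std::vector<std::string>>;

// Collects the linker flags for all system headers seen during the scan.
// Headers missing from the table (like <vector>) contribute nothing,
// and each flag is emitted only once.
std::vector<std::string> linkFlagsFor(const std::set<std::string>& systemHeaders,
                                      const FlagTable& table) {
  std::vector<std::string> flags;
  for (const auto& header : systemHeaders) {
    auto it = table.find(header);
    if (it == table.end()) continue;
    for (const auto& flag : it->second) {
      if (std::find(flags.begin(), flags.end(), flag) == flags.end()) {
        flags.push_back(flag);
      }
    }
  }
  return flags;
}
```

The appeal of this design is that the source code stays the source of truth: including a header is what requests the library, and the table only has to be written once per system.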

Speaking of flags

Right now absolutely nothing can be customized about how things are getting compiled. Honoring CXXFLAGS, CFLAGS, and perhaps LDFLAGS might be a good starting point. Perhaps honoring CXX and CC as well, with a driver for clang alongside the one for gcc, might be a good idea too.
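Honoring these variables could start as small as the sketch below. None of it exists in klb according to the text above; envOr, splitFlags, and compileCommand are hypothetical helpers, g++ is an assumed default, and the whitespace splitting does not handle quoted arguments.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Returns the value of an environment variable, or `fallback` when the
// variable is unset or empty.
std::string envOr(const char* name, const std::string& fallback) {
  const char* value = std::getenv(name);
  return (value != nullptr && *value != '\0') ? std::string(value) : fallback;
}

// Splits a flags string on whitespace. Quoting is not supported here.
std::vector<std::string> splitFlags(const std::string& flags) {
  std::istringstream in(flags);
  std::vector<std::string> out;
  std::string flag;
  while (in >> flag) out.push_back(flag);
  return out;
}

// Builds the argument list for compiling one source file into an object,
// taking the compiler from CXX and extra flags from CXXFLAGS.
std::vector<std::string> compileCommand(const std::string& source,
                                        const std::string& object) {
  std::vector<std::string> command{envOr("CXX", "g++")};
  for (const auto& flag : splitFlags(envOr("CXXFLAGS", ""))) {
    command.push_back(flag);
  }
  command.push_back("-c");
  command.push_back(source);
  command.push_back("-o");
  command.push_back(object);
  return command;
}
```

Keeping the command as a list of arguments rather than one string avoids a trip through the shell when the process is eventually spawned.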

Otherwise, I’m quite happy with the progress thus far. It’s not perfect, the initial version has at least one very visible bug, but gcc is kind and understanding with me, and it doesn’t spit out thousands of lines of errors when compiling my own code.

Writing klb

I love C++, and, like any C++ developer, I loathe parts of it with passion. I needed to focus on the good parts, and make the bad parts go away.

A few years ago I was asked about how I would see the evolution of a C# solution. There were good reasons to think about performance, but C# brought to the table something that C++ did not: how fast people could write correct code, with very few hidden bugs. The development speed of a C# solution is amazing. I love the productivity of that language, and would probably use it more often, if I didn’t have this fondness for C++. It was the first language I learned (granted, it was C with classes, but still) and really understood. Not completely, but it helped me make sense of computing in general. After writing a lot of code in Z80 assembly language and SPECTRUM BASIC, C++ was mind-blowing. C++11 made it both harder and easier to hate the language, and the next versions of the standard even more so. The continuous evolution of the language makes me want to stick around, and see what’s going on.

So when asked if C++ can be as effective as C# when it comes to development, I said that with enough investment in the core tools, C++ can be as productive as C#, with the added performance bonus. I believe in this: it’s an API issue with C++. Sure, there are things that are easier in other languages, and there are things that are made harder due to C++’s heritage, but productivity-wise, one can be as productive with C++ as with C#, given the same tools. Like any C++ developer, I’m deeply opinionated about various topics. However, that’s a distraction. I’m more interested in what I can do to make things better for me, and I want to talk about the things I did do.

I needed to start from the basics, and the most basic, and probably the number one cause of failures in all the C/C++ projects is the lack of proper text handling. So I created a Text class that acts like a std::string_view but has the safety guarantees of std::shared_ptr behind it. This is naturally a const class, one doesn’t modify existing strings without doing weird casts that no sane person would do. This means that Text doesn’t benefit from something like SSO. It’s a cost I am willing to pay.
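The combination of a view with shared ownership can be sketched as follows. This is not the Text class from the kl repository, only an illustration of the idea under assumptions: the member names (subtext, view, toString) are invented, and the storage is a heap-allocated std::string shared between all slices.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// An immutable text slice: it looks at a window of a shared buffer and
// keeps that buffer alive for as long as any slice of it exists.
class Text {
 public:
  Text() = default;
  explicit Text(std::string value)
      : buffer_(std::make_shared<const std::string>(std::move(value))),
        view_(*buffer_) {}

  std::size_t size() const { return view_.size(); }
  std::string_view view() const { return view_; }

  // Returns a slice that shares the same buffer; no characters are copied.
  // Out-of-range requests are clamped rather than throwing.
  Text subtext(std::size_t start, std::size_t length) const {
    Text result;
    result.buffer_ = buffer_;
    result.view_ = view_.substr(std::min(start, view_.size()), length);
    return result;
  }

  // Copies the slice into an owned, zero-terminated std::string, which is
  // what a C API expecting `const char*` needs (via c_str()).
  std::string toString() const { return std::string(view_); }

  friend bool operator==(const Text& a, const Text& b) { return a.view_ == b.view_; }

 private:
  std::shared_ptr<const std::string> buffer_;  // shared, never modified
  std::string_view view_;                      // window into *buffer_
};
```

Every construction allocates on the heap for the shared buffer, which is the price mentioned above: a small string can no longer live inline in the object the way SSO allows.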

The most annoying part is interacting with C APIs and having to produce zero-terminated strings. But I guess that’s one additional price to pay (and, if possible, one should avoid APIs that take zero-terminated strings and prefer those that take a pointer and a length). If possible.

Next, I needed to create a DateTime that would spare me the messy std::chrono API. I borrowed some ideas from the C# DateTime, and although it’s far from a complete implementation, it’s functional enough, if you can accept a tenth of a microsecond granularity.
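A tenth of a microsecond is 100 nanoseconds, the same tick size the C# DateTime uses. The sketch below shows what a tick-based type might look like; it is not the DateTime from the kl repository. The names are invented, and counting ticks from the Unix epoch is an assumption made to keep the example short (C# counts from year 1).

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// One tick is 100 nanoseconds, so there are ten million ticks per second.
using Ticks = std::chrono::duration<std::int64_t, std::ratio<1, 10'000'000>>;

class DateTime {
 public:
  static constexpr std::int64_t TicksPerSecond = 10'000'000;

  // Builds a DateTime from Unix seconds plus a nanosecond part.
  // Anything finer than 100 ns is truncated.
  static DateTime fromUnix(std::int64_t seconds, std::int64_t nanoseconds = 0) {
    return DateTime(seconds * TicksPerSecond + nanoseconds / 100);
  }

  // Converts from std::chrono. system_clock counts from the Unix epoch
  // (guaranteed since C++20, true in practice on common platforms before).
  static DateTime from(std::chrono::system_clock::time_point point) {
    return DateTime(std::chrono::duration_cast<Ticks>(point.time_since_epoch()).count());
  }

  std::int64_t ticks() const { return ticks_; }
  std::int64_t unixSeconds() const { return ticks_ / TicksPerSecond; }

  friend bool operator<(const DateTime& a, const DateTime& b) { return a.ticks_ < b.ticks_; }
  friend bool operator==(const DateTime& a, const DateTime& b) { return a.ticks_ == b.ticks_; }

 private:
  explicit DateTime(std::int64_t ticks) : ticks_(ticks) {}
  std::int64_t ticks_;  // 100 ns ticks since the Unix epoch (assumption)
};
```

A single 64-bit tick count makes comparing file modification times a plain integer comparison, which is most of what a build tool needs from a date type.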

I also wrote a List (a wrapper for std::vector), Set, Pair, and Dict, with APIs that make more sense to me, and that I feel make me more productive too. I will look into them more, and probably improve them in time, but I like the public APIs so far.

Now, about klb itself. The code is quite a mess, I admit it; it went through two different iterations, and it’s still a mess. A more readable mess now, and still more readable than gnumake’s code. After a few weeks of blocking myself over how ugly the code is, I decided to implement a solution that works, and improve on that. At least now I have something that works, rather than something perfect that doesn’t work at all.

I guess that’s it. Have a look at it, at https://github.com/dorinlazar/kl.