Learning curve of a complex system

@lokxii.bsky.social

Alternative title:

A rant on CMake by an inexperienced programmer

A while ago, I wrote an article about writing an add-on for fcitx5. I chose CMake for the build system because that was what the tutorial I was following used. So when I decided to turn one of the modules (the input method engine, or IME) into a library, I used CMake again. The problem was that the add-on itself is a shared library. I had to link the IME library into that shared library, and CMake refused to spit out the correct flags for ld to link them properly.

I had set up the IME as a shared library, and it seemed that a shared library could not be linked into another shared library this way. After digging deep into Stack Overflow and the CMake forums, I finally learned that I had to set -DCMAKE_POSITION_INDEPENDENT_CODE=ON and make the IME a static library.
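For readers who hit the same wall, here is a minimal sketch of what that fix can look like in a CMakeLists.txt. It is not my actual build file: the project name, target names and source paths are made up for illustration, and it sets the position-independent-code property on one target instead of passing the -D flag for the whole build.

```cmake
cmake_minimum_required(VERSION 3.16)
project(my_addon LANGUAGES CXX)   # hypothetical project name

# Hypothetical IME engine, built as a static library.
add_library(ime STATIC ime/engine.cpp)

# Code that ends up inside a shared object has to be position independent.
# This property scopes the setting to this one target;
# -DCMAKE_POSITION_INDEPENDENT_CODE=ON turns it on for every target.
set_target_properties(ime PROPERTIES POSITION_INDEPENDENT_CODE ON)

# Hypothetical add-on, built as a shared library that absorbs the IME code.
add_library(addon SHARED addon/addon.cpp)
target_link_libraries(addon PRIVATE ime)
```

With this layout the IME object code is copied into the add-on's shared object at link time, so only one file has to be shipped.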

My thought during the hour of struggle was: I hate CMake. But when I thought about it more deeply, it was not really a CMake problem.

When we are forced to use complex systems

There are times when we are forced to use systems we aren't familiar with, be it CMake, ffmpeg[1] or React[2]. They feel exceptionally difficult when we don't know the tool well enough. Remember the early days of learning to program, when nothing seemed to make sense? You kept banging your head against the wall, trying to resolve every error you encountered.

There were a few times when I simply wanted to concatenate a few video segments into one. I don't have any video editing applications installed, and ffmpeg was all I had. Concatenating videos is a frequently asked question on the internet, except that my video segments had different resolutions and aspect ratios, and some of them didn't even have an audio channel. I couldn't find anyone asking about such a specific situation. ChatGPT couldn't spit out the correct command[3]. So I spent half an hour staring at the red error messages, trying different combinations of options and arguments, until I finally got it working[4].

I don't want to learn ffmpeg. There are simply too many options and filters to memorize. I still can't figure out how to map channels correctly so that I can refer to them later.

The number of features a system has scales with its complexity. Simple tools like wc have only a few flags, while more complicated ones like sed have so many commands that you would probably only ever use s// and //d. Not to mention that the man page of ffmpeg runs to 2800 lines in an 81 character wide terminal window (sed's is 250 lines).

Do I need all these features?

There was a time when I wrote my own Makefile to build my projects. It had its own way to build libraries, handle dependencies, look for headers, etc. As long as I followed a certain convention for creating modules and setting up vendor libraries, everything just worked when I typed make in the terminal. It is probably more difficult to write a Makefile than a CMakeLists.txt once target dependencies and secondary expansion of variables are involved, but I would argue that my Makefile was easier to use than CMake. I wrote the Makefile once and used it for 2 years.

Of course there are lots of things that my Makefile cannot do. It cannot automatically search for installed libraries. It cannot clone a GitHub repo to use as a dependency. It cannot install additional files to specific locations. It cannot produce a library.

Because I didn't need these features.

Try to remember how often you install a complex tool just to use one feature, or bundle the whole universe into your JavaScript project just to use one function. Most of the time we don't fully utilize the tools we use. Why should we install things that we don't use?

The so-called "UNIX Philosophy" has several rules. It went through a few revisions as it was summarized by different people at different times. None of that is really important here. The idea I want to focus on is:

Write programs that do one thing and do it well.

Uncle Bob wrote in the book "Clean Code":

Functions should do one thing. They should do it well. They should do it only.

What is "one thing"?

It is an abstraction problem

There is really no correct answer to this. I like to think about it in terms of abstraction. Building a project is "one thing". Compiling a library is "one thing". Compiling a module in the library is "one thing". Compiling a file is "one thing". Parsing a file is "one thing". Programs and functions are different forms of abstractions anyways.

When your goal is just to "build the project", CMake may be a good abstraction because it handles all the configuration, compilation and linking for you (given that you have written the CMakeLists.txt correctly). However, when you think one level deeper, you notice that CMake is doing at least 3 things: configuring, compiling and linking. And when you think one more level deeper, the compiler (gcc or clang) is doing a whole bunch of things to compile your project.

Compiling is one thing because I don't want to care about parsing the language, building the AST, performing analysis, optimizing, spitting out assembly, etc. Configuration is one thing because I don't know what it does. Linking is one thing because yet again I don't know how it works. I don't care how configuration and linking work because 99% of the time it just works, and I expect them to work given the long history of the software.

It is all about the level of abstraction you care about.

I don't want to learn the thing

In the case of CMake, I don't really want to learn the tool because

  1. I am not using the tool daily for every C/C++ project
  2. It seems too complicated
  3. I am not interested in the tool
  4. I just want to get the job done

Remember, I used CMake because that is what the tutorial told me to use, and it worked well until I demanded more from the build system. My naive understanding of CMake led me to think that isolating the IME as a standalone CMake project was a good choice. I had seen and heard of people building libraries with CMake. It is the classic "surely it won't be that hard" pitfall.

But then the problem is, how much should we learn about the tool we use?

For git, of course you need to know how to commit changes, how branches work, how to merge branches and how to resolve conflicts. If you are constantly dealing with JSON in the terminal, you probably want to learn jq so you can pull data out, do simple operations on it, and use the results to construct new JSON. If you are using JavaScript, I'd expect you to have at least heard of the term "event loop" because it is such a fundamental concept in JS.

How about CMake?

I know I have to create a build directory, cd into it, call cmake .. with some options, and then run make[5]. I also know I can specify a release build with -DCMAKE_BUILD_TYPE=Release. For the CMakeLists.txt, I know the basic structure: adding dependencies, defining targets, and adding source files. That is all I know about CMake. This knowledge is usually enough to help me set up a working build system. We always learn just enough to do the job.
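To make that concrete, the sum of my CMake knowledge fits in roughly the sketch below. The project name, the source path and the choice of the Threads package as the dependency are placeholders picked for illustration, not taken from a real project.

```cmake
# CMakeLists.txt (illustrative only; names and paths are made up)
cmake_minimum_required(VERSION 3.16)
project(hello LANGUAGES CXX)

# Adding a dependency: look up a package that is already installed.
find_package(Threads REQUIRED)

# Setting a target and adding its source files.
add_executable(hello src/main.cpp)
target_link_libraries(hello PRIVATE Threads::Threads)

# The usual out-of-source build, run from the project root:
#   mkdir build
#   cd build
#   cmake .. -DCMAKE_BUILD_TYPE=Release
#   make
```

Anything beyond this shape is where I start searching the internet.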

I believe most people on Earth learn their tools this way, unless they are really, really interested in a particular tool. Therefore a tool should contain only the set of features it needs to serve its purpose. Of course there are always edge cases that a tool needs to cover. I think a tool should either generalize the problem so that the same simple interface automatically handles those edge cases, or straight up not support them.

Building projects is complicated. Imagine those multimillion dollar Silicon Valley companies having an entire team dedicated just to building the thing. I recognize that it is impossible to generalize "building" behind a simple interface. But those companies simply have their own build systems. Is CMake being too greedy in trying to cover every situation and serve everyone's needs? Is it worth providing that many features at the cost of a steeper learning curve? How many times have you encountered a problem that you knew CMake could solve, except that the solution was hidden behind some feature you had never heard of?

I don't actually know what I am looking for

Probably the biggest reason I spent an hour looking for the solution is that I didn't even know what I was looking for. I wanted to link a library into another library, combining them into one. I thought it was a straightforward thing to do. The error ld gave at the last stage of the build just confused me.

As a normal programmer would, I copied the error message, which was just ld complaining that it could not resolve some symbols, added the word CMake, and pasted it into Google search. No working results. I tweaked the search query and continued searching.

Obviously I didn't know that I could not simply fold one shared library into another. Most of us don't know what we are looking for when we encounter an error message for the first time, because we don't know what contributes to the error. In my case, I didn't know that building the IME module as a shared library contributed to the error, so I kept searching without mentioning it.

I know this is how we learn in programming. It just sucks because I expected CMake, the de facto standard build tool in C/C++ land[6], to handle this situation out of the box, and it could not.

Conclusion

It is all about inexperience and a skill issue. I know, I know. I remember my early days of learning pointers, serializing a raw pointer into a binary file and wondering why I couldn't get the object back from the file. I can't help but wonder: why are complex tools so difficult to learn?

Footnotes

  1. ffmpeg should have been a GUI application.

  2. Not a web dev.

  3. It is impossible to get ffmpeg working properly.

  4. Or sometimes straight up giving up.

  5. Who uses ninja?

  6. CMake is also known for the hatred towards it.

