Abstractions - tagg.link

2021-03-30
In computer science, abstraction is a constant driving force. Object-Oriented Programming takes it very seriously and models the real world as a set of systems, each with its own state, working independently from each other... We have an ecosystem of APIs and libraries that "abstract" away the internal workings of windowing systems, UI layouts, networking... And the lowest level of these abstractions (at least software abstractions) is the operating system.

Thus, the complexity of every layer of abstraction that lies on top of any single operating system is derived from the complexity of that operating system. That is, the more complicated the API of the OS is to use, the more complicated the systems that are built on top of it need to be. Also, any bugs, errors, or badly written algorithms in the operating system directly influence every system built on top of it, all the way to the user.

Operating systems that are popular today have not aged well. Technology advances so fast that a lot of the complexity in modern operating systems could be simplified, rewritten, or redesigned entirely from scratch. Only, we're scared of introducing more bugs, introducing terrible code, doing unnecessary work, and above all, I think, scared to admit that we are at the point where we should "throw one away".

The great danger is that we don't realize how much better things could be. We're a species that is very good at being ignorant, shrugging our shoulders and saying "good enough". Once a system works, we no longer stop to think about how to improve it. In fact, we're proud of what we've made, which makes us reluctant to start over from scratch, as if doing so would insult us. It's as if we're drawing a line somewhere, labeling software "done", and deciding that any work beyond that point should not change the core structure of the software.

The truth is that everything has an expiration date, including legacy Linux code, the C programming language, HTTP, the Emacs environment, and the old pioneers of these inventions. C was made to run on the PDP-11, a computer with linear execution of instructions. Linux is an amalgam of variants with very little transparency into its internal workings available to the user. HTTP was meant to help researchers at CERN "find documents faster" (something that still isn't achieved by the internet, although the wiki is a start). Emacs is... well, Emacs.

We ended up here because we stopped thinking. The people who used to think about these things, and who invented them, are long gone, and they left a legacy without anyone to question how they did what they did. Those with the wit to invent the correct systems at the correct time (read: Google) did so, and became powerful to the point of self-corruption, allowing the need to sustain themselves as a "thing" to rule over all other values.

So, let's think again. When approaching a difficult problem, you always remove all the complicated details first, until you find the core. Once at the core, your guiding principle should be usefulness. We're software engineers, and we write useful abstractions. And there is only one true rule to an abstraction: it never applies to all cases. So we don't just write one abstraction, but many. Our intuition tells us that we're redoing work, that this could be compressed, but don't let it fool you. Rather than ignoring problems, solve them, each and every one of them.

Tons of people have already realized this, I just felt like writing about it. There is no reason to obsess over one operating system, no reason to rely on some singular set of abstractions, when each and every single abstraction could be optimized for its specific use case.

But what about code reuse??

Forget it! Use the right code for the right problem!

Actually, feel free to share your code, but understand that for that sharing to be truly useful, the person receiving it has to understand its internal properties. Which removes its purpose as an abstraction. Or does it?

How about defining a "language" of sorts for the specific sets of algorithms that the system is built on? How about picturing the data transforms in a graph, each with its big-O complexity (as a first example of a metric), and other terms used to describe exactly what "type" of data transform it is?
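To make that a little more concrete, here is a minimal sketch of what such a description could look like, written in Python. Everything in it is a made-up assumption for illustration: the `Transform` record, the `kind` vocabulary ("source", "reorder", "filter"), the transform names, and the `describe` helper are hypothetical, not an existing tool or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    """One node in the graph: a data transform plus the terms that describe it."""
    name: str          # hypothetical identifier for the transform
    kind: str          # what "type" of transform it is (assumed vocabulary)
    complexity: str    # big-O cost, as a first example of a metric
    inputs: tuple = () # names of the transforms this one consumes


def describe(graph, name, depth=0):
    """Walk the graph from `name` down to its sources, one indented line per node."""
    node = graph[name]
    lines = [f"{'  ' * depth}{node.name} [{node.kind}, {node.complexity}]"]
    for dep in node.inputs:
        lines.extend(describe(graph, dep, depth + 1))
    return lines


# A tiny, invented pipeline: read lines, sort them, drop duplicates.
GRAPH = {t.name: t for t in (
    Transform("read_lines", "source", "O(n)"),
    Transform("sort_lines", "reorder", "O(n log n)", inputs=("read_lines",)),
    Transform("unique_lines", "filter", "O(n)", inputs=("sort_lines",)),
)}

summary = describe(GRAPH, "unique_lines")
print("\n".join(summary))
```

The point of the sketch is only that the description lives next to the code as data, so whoever receives the code can read (or search) the shape and cost of the pipeline without opening the implementation.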

What if reading C/C++ header files could be done quicker and with a higher understanding of the code's internal structure..?

We no longer have any excuses not to do this. Knowledge of a system should be just as available as the system itself. For every line of code, there should be an accompanying sentence explaining what the code does. For every independent program, there should be a searchable database of its clear, well-defined properties and use cases. For every computer, there should be several other computers communicating with it through an encrypted tunnel. The "thing" should no longer be sacred; it should be replaced by "the concept of the thing". This should be the operating system on which society evolves... not corrupt, old men swallowed by their own pride.