As a professional procrastinator, I always have a list of side projects lying around.
One of these items is a website:
I always wanted to create a website which serves as my own cheat sheet,
because my memory is utter garbage and some of these notes might be helpful to other people as well.
Given that I had just gotten an ancient computer up and running as a git server/NAS,
I thought: Why not finally host the website on it as well?
And well, once one site is set up, I might as well create a second, less useful site
for all of my ramblings which literally no one asked for.
Ok, let's talk websites. I kind of despise how the modern web turned out [0]. To make this brief: I want a static website with some personal styling to make it pretty. Static website generators are a thing. I could have just picked an existing solution, spent a day (or, more realistically, a week) configuring it to my liking, and then forgotten about it. But, speaking from experience, it usually goes somewhat like this:
Here comes the infamous thought: "Well, it can't be that difficult." And that kicked off my ~7 sessions of writing my own static website generator.
The very first step was to write the HTML/CSS I wanted "manually", by hand. Now that I knew what to generate, I created some theoretical mockup input in markdown, simply because I already somewhat knew the format and thought it would be good enough for my type of content. After I sketched out a folder structure for posts, the coding started. I ended up with the following structure:
That... doesn't sound particularly efficient, does it? Welp, doing these projects taught me to do the simplest thing first. All this re-walking of folders and trees kept memory allocation and content generation pretty simple.
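The folder structure itself isn't shown here, but the "walk it twice" idea behind simple memory allocation is easy to illustrate. Below is a minimal sketch, not the generator's actual code: a hypothetical document tree (`Node`, `NodeKind`) is walked once just to measure the output, the buffer is allocated exactly once, and the same walk then writes the HTML. The names `emit` and `render` are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds; the real generator's data structures are not shown. */
typedef enum { NODE_TEXT, NODE_PARAGRAPH, NODE_EMPHASIS } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;        /* only used by NODE_TEXT */
    struct Node *children;   /* first child, or NULL */
    struct Node *next;       /* next sibling, or NULL */
} Node;

/* Walks a sibling list and all of its subtrees.
   With out == NULL it only counts bytes; otherwise it also writes them to out.
   Returns the number of bytes produced (no terminating NUL). */
static size_t emit(const Node *n, char *out) {
    size_t len = 0;
    for (; n != NULL; n = n->next) {
        const char *open_tag = "", *close_tag = "";
        switch (n->kind) {
        case NODE_TEXT:      open_tag = n->text;                        break;
        case NODE_PARAGRAPH: open_tag = "<p>";  close_tag = "</p>";     break;
        case NODE_EMPHASIS:  open_tag = "<em>"; close_tag = "</em>";    break;
        }
        size_t open_len = strlen(open_tag);
        if (out != NULL) memcpy(out + len, open_tag, open_len);
        len += open_len;

        size_t inner = emit(n->children, out != NULL ? out + len : NULL);
        len += inner;

        size_t close_len = strlen(close_tag);
        if (out != NULL) memcpy(out + len, close_tag, close_len);
        len += close_len;
    }
    return len;
}

/* Two walks over the same tree: measure, allocate exactly once, then write.
   Returns a heap-allocated, NUL-terminated string the caller must free(). */
static char *render(const Node *root) {
    size_t len = emit(root, NULL);
    char *buf = malloc(len + 1);
    if (buf == NULL) return NULL;
    size_t written = emit(root, buf);
    assert(written == len);   /* both walks must agree */
    (void)written;            /* silence the unused warning under NDEBUG */
    buf[len] = '\0';
    return buf;
}
```

A paragraph node with the children `a` and an emphasis node wrapping `b` renders to `<p>a<em>b</em></p>`. The second walk costs time, but it removes every `realloc` and every "is the buffer big enough yet?" check.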
Full disclosure: After running into some parsing issues, I made things easier for myself by slightly modifying the markdown syntax I use for my sites. That way I didn't need to implement proper backtracking; a bit of lookahead is enough. I'll probably still be ironing out bugs from time to time, but the result is already usable enough for me (you are looking at the website, after all).
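To illustrate what "lookahead instead of backtracking" can mean, here is a minimal sketch under an assumed syntax: `**` toggles bold and a single `*` toggles italics. The actual modified syntax isn't described here, so `inline_to_html` and its rules are hypothetical. At every `*` the parser peeks one character ahead to decide which marker it is looking at, and it never has to rewind.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Converts inline emphasis in one NUL-terminated line to HTML.
   Hypothetical syntax: "**" toggles <strong>, a single "*" toggles <em>.
   One character of lookahead decides between the two; the parser never rewinds.
   Returns the number of bytes written (excluding the NUL), or -1 if cap is too small. */
static int inline_to_html(const char *s, char *out, size_t cap) {
    assert(s != NULL && out != NULL);
    if (cap == 0) return -1;
    bool in_em = false, in_strong = false;
    size_t o = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        char single[2] = { s[i], '\0' };
        const char *piece = single;              /* default: copy the character as-is */
        if (s[i] == '*' && s[i + 1] == '*') {
            /* Lookahead: reading s[i + 1] is safe because s[i] is not the NUL. */
            piece = in_strong ? "</strong>" : "<strong>";
            in_strong = !in_strong;
            i++;                                 /* consume the second '*' as well */
        } else if (s[i] == '*') {
            piece = in_em ? "</em>" : "<em>";
            in_em = !in_em;
        }
        size_t piece_len = strlen(piece);
        if (o + piece_len + 1 > cap) return -1;  /* keep room for the NUL */
        memcpy(out + o, piece, piece_len);
        o += piece_len;
    }
    out[o] = '\0';
    return (int)o;
}
```

For example, `a *b* **c**` becomes `a <em>b</em> <strong>c</strong>`. This sketch doesn't repair unmatched markers; an unclosed `*` simply leaves an `<em>` open.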
Benchmark time! I copied the input of my test post, which is supposedly almost 13 kilobytes in size, 1000 times and ran my generator on the result. 1000 posts is really not a lot, but realistically, it's more posts than I'll ever have.
...
Found file:
Path: .//1000/input.txt
Website Path: 1000
Title: Test Entry but this time with a very long title. How will it handle it?
Date: 2022.12.07
Converting all found files...
Parse and converting of "about" page...
Writing overview files...
________________________________________________________
Executed in  266.68 millis    fish           external
   usr time  137.05 millis  245.00 micros  136.81 millis
   sys time   84.82 millis   38.00 micros   84.79 millis
...under 300 ms? The power of really dumb C code combined with modern computers keeps surprising me.
It makes the all-too-familiar "why the fuck is this program so slow?" moment with other software even more frustrating.
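For a rough sense of scale: treating every copy as a full 13 KB (the post is "almost" that size, so this is an upper bound) and using the 266.68 ms wall-clock time, which also includes the "about" page and the overview files:

$$
\frac{1000 \times 13\,\text{KB}}{0.26668\,\text{s}} \approx \frac{13\,\text{MB}}{0.267\,\text{s}} \approx 49\,\text{MB/s},
\qquad
\frac{1000\ \text{posts}}{0.26668\,\text{s}} \approx 3750\ \text{posts/s}
$$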
The best part: this is single-core performance because, again, I did the simplest thing
possible. This should be trivially parallelizable (one task -> one thread), but that can wait
given the current performance.
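A hedged sketch of how that parallelization might look with POSIX threads. Instead of literally one thread per post, it swaps in a small fixed pool: each worker claims the next post index from a shared C11 atomic counter until none are left, which avoids spawning 1000 threads for 1000 posts. `Task` and `convert_post` are hypothetical stand-ins for whatever the generator really does per post, and `MAX_WORKERS` is an arbitrary limit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_WORKERS 64   /* arbitrary upper bound for this sketch */

/* Hypothetical per-post work item; the generator's real types are not shown. */
typedef struct {
    const char *input_path;
    int done;            /* set once the post has been converted */
} Task;

typedef struct {
    Task *tasks;
    size_t count;
    atomic_size_t next;  /* index of the next unclaimed task */
} Queue;

/* Hypothetical stand-in for the real parse-and-convert step. */
static void convert_post(Task *t) {
    assert(t->input_path != NULL);
    t->done = 1;
}

/* Each worker keeps claiming the next index until all tasks are taken. */
static void *worker(void *arg) {
    Queue *q = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&q->next, 1);
        if (i >= q->count) break;
        convert_post(&q->tasks[i]);  /* each index is claimed by exactly one thread */
    }
    return NULL;
}

/* Converts all tasks with up to nworkers threads. Returns 0 on success, -1 on bad input. */
static int convert_all(Task *tasks, size_t count, size_t nworkers) {
    if (nworkers == 0 || nworkers > MAX_WORKERS) return -1;
    Queue q = { .tasks = tasks, .count = count };
    atomic_init(&q.next, 0);

    pthread_t threads[MAX_WORKERS];
    size_t started = 0;
    while (started < nworkers &&
           pthread_create(&threads[started], NULL, worker, &q) == 0) {
        started++;
    }
    /* Any started worker drains the whole queue; if none started, do it here. */
    if (started == 0) worker(&q);
    for (size_t i = 0; i < started; i++) pthread_join(threads[i], NULL);
    return 0;
}
```

Compile with `-pthread`. This only stays simple as long as converting one post doesn't touch shared state; the overview pages would still be written after all workers have been joined.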
Ok, one last circle-jerk: How much code is it? I know this comparison is unfair, but let's first take a look at two other static website generators: Jekyll and Hugo. Obviously these two projects can do way, way, way more than my little program, but let me have this moment >:(
> cloc jekyll/
751 text files.
713 unique files.
75 files ignored.
github.com/AlDanial/cloc v 1.94 T=0.25 s (2843.5 files/s, 249147.5 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Markdown                        306           4925              0          19488
Ruby                            183           3615           2878          17167
Cucumber                         28            353             12           4522
YAML                             44            278            124           2203
SCSS                             18            433            235           2126
JavaScript                        5            115              6           1073
HTML                             74             63              9            966
Text                             11             32              0            888
Bourne Again Shell               13             47             42            222
JSON                              4              6              0            199
ERB                              13             56              0            168
CSS                               1             15             11             50
SVG                               3              0              0             32
XML                               1              3              0             29
Dockerfile                        1              7             22             26
CoffeeScript                      3              2              0             15
CSV                               1              0              0              3
PHP                               1              1              0              3
TOML                              1              0              0              2
XHTML                             1              0              0              1
Rmd                               1              0              1              0
--------------------------------------------------------------------------------
SUM:                            713           9951           3340          49183
--------------------------------------------------------------------------------
That's a lot of markdown. I suppose it's documentation? I can't be bothered
to find out, to be honest. Apparently cloc counts "Cucumber" as a language (it's the
Gherkin format used by the Cucumber testing tool), so TIL I guess. Counting Ruby and Cucumber, it's roughly 21.7k LoC.
That sounds pretty reasonable. What about Hugo?
> cloc hugo/
1646 text files.
1596 unique files.
394 files ignored.
github.com/AlDanial/cloc v 1.94 T=2.69 s (594.3 files/s, 89638.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Go                             747          25623          19050         119192
Markdown                       551          11904              0          34177
HTML                           128            252             45          11847
JSON                            11              0              0           6903
CSS                             29            529           1011           5436
SVG                             62              3              7           1640
TOML                            19            274             58           1439
YAML                            11             45             14            537
JavaScript                      15             33             58            192
XML                              7              2              0            147
CSV                              1              1              0            129
Bourne Shell                     8             31             12             51
Dockerfile                       1             14             10             21
Text                             4              1              0             12
SCSS                             1              1              0              6
Sass                             1              1              0              5
-------------------------------------------------------------------------------
SUM:                          1596          38714          20265         181734
-------------------------------------------------------------------------------
120k LoC of Go? Nani the fuck?
You know what? I don't even want to know. Let's move on to my project:
> cloc asswg/
29 text files.
23 unique files.
20 files ignored.
github.com/AlDanial/cloc v 1.94 T=0.02 s (963.2 files/s, 129576.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                5            307            112           1915
CSS                              1             53             29            225
C/C++ Header                     4             32              3            170
Text                             4             33              0            164
HTML                             6              0              0             24
Bourne Shell                     3              5              2             20
-------------------------------------------------------------------------------
SUM:                            23            430            146           2518
-------------------------------------------------------------------------------
That's not even 2,000 lines of C code (about 2,100 if you count the headers). And it does what I want. Cool.
While it would be interesting to compare the runtimes, I, for reasons stated above,
don't want to learn these other two projects. Too Bad!
Additionally, let's not forget: programming these things from scratch teaches you a ton. This time around it was parsing, and why you usually tokenize your input first. I never understood why I should spend time writing a tokenizer when I could just work on the input directly. Apparently, you can do that; it's just a bit more annoying. I learned this the hard way, but I wouldn't want it any other way.
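For anyone else who never saw the point of a tokenizer, here is a minimal sketch of what one might look like for a tiny, hypothetical markdown subset: `#`, `*`, backticks and newlines are tokens of their own, and everything else is text. The token kinds and `tokenize` are made up for illustration, not taken from the generator. The payoff is that the parser afterwards only looks at token kinds instead of re-scanning raw characters, and a run of plain text arrives as a single token.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token set for a small markdown subset. */
typedef enum { TOK_TEXT, TOK_HASH, TOK_STAR, TOK_BACKTICK, TOK_NEWLINE, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start;   /* points into the input; not NUL-terminated */
    size_t len;
} Token;

/* Splits src into tokens stored in out (capacity cap), always ending with TOK_EOF.
   Returns the number of tokens written, or 0 if cap is too small. */
static size_t tokenize(const char *src, Token *out, size_t cap) {
    assert(src != NULL && out != NULL);
    size_t n = 0;
    const char *p = src;
    while (*p != '\0') {
        if (n + 1 >= cap) return 0;          /* keep one slot free for TOK_EOF */
        Token t = { TOK_TEXT, p, 1 };
        switch (*p) {
        case '#':  t.kind = TOK_HASH;     break;
        case '*':  t.kind = TOK_STAR;     break;
        case '`':  t.kind = TOK_BACKTICK; break;
        case '\n': t.kind = TOK_NEWLINE;  break;
        default:
            /* Merge a run of ordinary characters into one TEXT token. */
            t.len = strcspn(p, "#*`\n");
            break;
        }
        out[n++] = t;
        p += t.len;
    }
    if (n >= cap) return 0;
    out[n++] = (Token){ TOK_EOF, p, 0 };
    return n;
}
```

With this, `# Hi *there*\n` turns into HASH, TEXT(" Hi "), STAR, TEXT("there"), STAR, NEWLINE, EOF, and deciding "is this a heading?" becomes a single comparison on the first token.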
And now I have my own static website generator: a piece of software that I actually use. That has to come with some bragging rights. If something doesn't work, that's completely my fault and not some weird unspecified behaviour [1]. And I can feel pretty proud of the things that do work. So I'll call this procrastination project a success.
[0]: The list of ramblings to be done really does fill itself.
[1]: I hope the irony of this sentence, given that this project is written in C, is not lost on anyone.