Lessons from Implementations
- Essay? Thoughts
Understanding your program’s scope
In software development, identifying the scope of a program is paramount. Take, for example, the task of counting words, lines, and characters within a file. This seemingly simple task forms the basis of the Unix wc command. We will talk about it for the majority of this article, as it is a good example of how one should approach software design. We’ll analyze wc.c and the two header files it uses.
Knowing When to Give Up
The program’s purpose is clear: to efficiently count lines, words, and characters in text files. Anything beyond that is out of scope, and understanding that is key. Equally crucial is recognizing when complexity serves no purpose. Enter GNU’s wc implementation, a sprawling 874-line behemoth. While it achieves the desired functionality, its verbosity and convoluted structure raise questions about maintainability and readability. In contrast, Sbase’s wc implementation achieves the same task with elegance and precision, all while maintaining a concise codebase. By embracing simplicity and clarity, Sbase’s wc embodies the principle of knowing when to give up on unnecessary complexity.
The Principle of Discretion
The comparison between GNU’s wc and Sbase’s wc underscores a fundamental truth in software development: just because you can implement a feature or utilize a complex algorithm doesn’t mean you should. Sbase’s wc demonstrates that adherence to simplicity and to the POSIX standard can lead to a solution that is not only functional but also elegant and maintainable.
Comparing Sbase’s wc with GNU’s wc
Now, let’s delve into the details of Sbase’s wc implementation and compare it with its GNU counterpart. Despite both aiming to achieve the same goal, their approaches and outcomes differ significantly.
In action: Embracing Simplicity and Efficiency
Sbase’s wc implementation is a testament to simplicity and efficiency. Leveraging the utf.h and util.h headers, it gets the job done in 122 lines (counting comments and whitespace). The global variables lflag, wflag, and cmode govern program output, while the counters tc, tl, and tw keep track of total characters, lines, and words, respectively. The output function (lines 12-35) prints the counts selected by the flags, using standard input/output operations (printf and putchar). The heart of the program is the wc function (lines 37-61), which reads from a file, counts lines, words, and characters using utilities from utf.h, and passes the results to output. Help messages are handled in the usage function (lines 63-67). Finally, the main function (lines 69-122) handles command-line argument processing, file operations, error handling, and printing the totals.
2   #include <string.h>
3
4   #include "utf.h"
5   #include "util.h"
6
7   static int lflag = 0;
8   static int wflag = 0;
9   static char cmode = 0;
10  static size_t tc = 0, tl = 0, tw = 0;
11
12  static void
13  output(const char *str, size_t nc, size_t nl, size_t nw)
14  {
15      int first = 1;
16
17      if (lflag) {
18          first = 0;
19          printf("%zu", nl);
20      }
21      if (wflag) {
22          if (!first)
23              putchar(' ');
24          first = 0;
25          printf("%zu", nw);
26      }
27      if (cmode) {
28          if (!first)
29              putchar(' ');
30          printf("%zu", nc);
31      }
32      if (str)
33          printf(" %s", str);
34      putchar('\n');
35  }
36
37  static void
38  wc(FILE *fp, const char *str)
39  {
40      int word = 0, rlen;
41      Rune c;
42      size_t nc = 0, nl = 0, nw = 0;
43
44      while ((rlen = efgetrune(&c, fp, str))) {
45          nc += (cmode == 'c') ? rlen : (c != Runeerror);
46          if (c == '\n')
47              nl++;
48          if (!isspacerune(c))
49              word = 1;
50          else if (word) {
51              word = 0;
52              nw++;
53          }
54      }
55      if (word)
56          nw++;
57      tc += nc;
58      tl += nl;
59      tw += nw;
60      output(str, nc, nl, nw);
61  }
62
63  static void
64  usage(void)
65  {
66      eprintf("usage: %s [-c | -m] [-lw] [file ...]\n", argv0);
67  }
68
69  int
70  main(int argc, char *argv[])
71  {
72      FILE *fp;
73      int many;
74      int ret = 0;
75
76      ARGBEGIN {
77      case 'c':
78          cmode = 'c';
79          break;
80      case 'm':
81          cmode = 'm';
82          break;
83      case 'l':
84          lflag = 1;
85          break;
86      case 'w':
87          wflag = 1;
88          break;
89      default:
90          usage();
91      } ARGEND
92
93      if (!lflag && !wflag && !cmode) {
94          cmode = 'c';
95          lflag = 1;
96          wflag = 1;
97      }
98
99      if (!argc) {
100         wc(stdin, NULL);
101     } else {
102         for (many = (argc > 1); *argv; argc--, argv++) {
103             if (!strcmp(*argv, "-")) {
104                 *argv = "<stdin>";
105                 fp = stdin;
106             } else if (!(fp = fopen(*argv, "r"))) {
107                 weprintf("fopen %s:", *argv);
108                 ret = 1;
109                 continue;
110             }
111             wc(fp, *argv);
112             if (fp != stdin && fshut(fp, *argv))
113                 ret = 1;
114         }
115         if (many)
116             output("total", tc, tl, tw);
117     }
118
119     ret |= fshut(stdin, "<stdin>") | fshut(stdout, "<stdout>");
120
121     return ret;
122 }
This file can be found here: https://git.suckless.org/sbase/file/wc.c.html
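If you want to experiment with the counting loop without pulling in Sbase’s utf.h and util.h, the same idea can be sketched in plain standard C. This is only an approximation under stated assumptions, not Sbase’s code: the struct counts type and the count_stream function are hypothetical names invented here, whitespace detection is ASCII-only (isspace instead of isspacerune), and characters are counted by skipping UTF-8 continuation bytes, which assumes the input is valid UTF-8.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for the results; not part of Sbase. */
struct counts {
    size_t bytes; /* roughly what -c reports */
    size_t chars; /* roughly what -m reports, assuming valid UTF-8 */
    size_t lines;
    size_t words;
};

/*
 * Count bytes, characters, lines and words in a single pass over fp.
 * Assumption: only ASCII whitespace separates words.
 */
static struct counts
count_stream(FILE *fp)
{
    struct counts n = {0, 0, 0, 0};
    int c, inword = 0;

    while ((c = getc(fp)) != EOF) {
        n.bytes++;
        /* UTF-8 continuation bytes have the form 10xxxxxx; skip them. */
        if ((c & 0xC0) != 0x80)
            n.chars++;
        if (c == '\n')
            n.lines++;
        if (!isspace(c)) {
            inword = 1;
        } else if (inword) {
            /* A word ends at the first whitespace after it. */
            inword = 0;
            n.words++;
        }
    }
    if (inword) /* input ended in the middle of a word */
        n.words++;
    return n;
}
```

The word logic is the same two-state machine as in the wc function of the listing: a flag remembers whether the previous character belonged to a word, and the counter only advances when a word ends.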
GNU’s Approach: “Features” at the Expense of Clarity
On the other hand, GNU’s wc implementation, while functional, succumbs to the perils of unnecessary complexity. Spanning 874 lines, it is a maze of code laden with intricacies and dependencies. It claims to offer additional functionality, but the reality is less compelling. Notably, it deviates from the POSIX.1-2008 specification by introducing three flags (let’s call them additions instead of features) whose utility is dubious. These additions lack practical significance, especially considering that other ‘coreutils’ already fulfill similar purposes more effectively. Furthermore, GNU’s wc falls short in readability and maintainability because of its convoluted structure and unnecessary complexity. Additionally, there is no discernible performance advantage to choosing GNU’s implementation over others. In essence, GNU’s approach raises questions about its adherence to standards and the practicality of its extensive codebase, without offering tangible benefits in return. Why is the code about seven times larger when there is no added benefit?
The Result: Readable Code vs. Technical Debt
In the end, the comparison between Sbase’s wc and GNU’s wc underscores a crucial dichotomy in software development. While both accomplish the same task, Sbase’s implementation shines with its readability, conciseness, and adherence to standards. GNU’s implementation works the same way, apart from deviating from the POSIX.1-2008 specification by adding three non-standard flags, yet it introduces unnecessary complexity and technical debt, compromising long-term maintainability and hackability. For software to be truly “Free”, “Libre”, and Open, it should be easy to understand and hack. It’s a perfect example of Jamie Zawinski’s Law of Software Envelopment.
Conclusion: Striving for Simplicity and Clarity, for the sake of the results.
As we reflect on the comparison between Sbase’s wc and GNU’s wc, the lesson is clear: simplicity and clarity should reign supreme in software development. By embracing minimalism, adhering to standards, and knowing when to give up on complexity, developers can create solutions that are not only functional but also elegant, efficient, and sustainable in the long run. Furthermore, this reinforces the case for modular programs. For example, if you are developing a window manager, you don’t need to reinvent the wheel and write your own ‘keyboard daemon’. If your own implementation would not add any value, the lowest-hanging fruit is the best option most of the time.
Make the obvious choices.
- Set a clear roadmap for development.
- For every major milestone, re-consider the roadmap and discard old things that don’t match the project’s current plans.
- Choose your battles. Your program deserves good libraries; don’t try to implement everything yourself when you know there is a better way.
- Try to tighten the scope of your program, or otherwise make it modular enough that you can build it with only a subset of features. (Be careful with this: some people go all out and end up with complex build systems, which make contributing stupidly hard.)
- Don’t mislead yourself with comments. The concept is simple: what are bugs? Bugs are unintended behavior, often from misunderstanding or misusing the language’s features. So if you have a comment that says “Increment counter before showing the laptop’s battery icon”, you could be misled by your own comment if your “old self” failed to implement the code correctly or to describe it accurately.
- Another argument against excessive commenting is that if your code requires vast amounts of comments, it’s because it is too complex to understand. Consider re-implementing that functionality in a simpler, more understandable way.
- Do you really need that comment that ChatGPT put in your code that says “// We set variable --red to the corresponding value”?
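The points about comments can be made concrete with a small sketch. Everything in it is hypothetical and invented for illustration: the enum, the function names, and the thresholds are not taken from any real program. The first function carries a comment that no longer matches what the code does; the second needs no comment at all, because the names and bounds say what happens.

```c
/* Hypothetical icon set; the names and thresholds below are made up. */
enum battery_icon { ICON_EMPTY, ICON_LOW, ICON_HALF, ICON_FULL };

/* Before: the comment describes something the code does not do. */
static int
icon_index_before(int pct)
{
    int i = pct / 25;
    /* Increment counter before showing the laptop's battery icon */
    return i; /* nothing is incremented, and 100% yields index 4 */
}

/* After: the intent lives in the names and bounds, not in a comment. */
static enum battery_icon
battery_icon_for(int percent)
{
    if (percent < 10)
        return ICON_EMPTY;
    if (percent < 40)
        return ICON_LOW;
    if (percent < 80)
        return ICON_HALF;
    return ICON_FULL;
}
```

A reader who trusts the comment in the first version will look for an increment that is not there and may miss the out-of-range index. In the second version there is nothing to drift out of sync.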
“One of my most productive days was throwing away 1,000 lines of code.”
— Ken Thompson, co-creator of the UNIX operating system and a pioneer in computer science, known for his contributions to the development of the C programming language and the creation of the B programming language.