Felix' Ramblings

2025.03.08
Notes On Automatic Dereference For The Dot-Operator In C

What Is This Post About?

In the programming language C, you can access the members of a struct by using the dot-operator:


#include <stdio.h>

// Struct definition
typedef struct my_type my_type;
struct my_type
{
	int MemberA;
	char MemberB;
};

// Accessing the members of a struct in some function
void PrintMyType_ByValue(my_type MyStructInstance)
{
	printf("my_type members: \n");
	printf("  MemberA: %d\n", MyStructInstance.MemberA);
	printf("  MemberB: %c\n", MyStructInstance.MemberB);
}

C is pretty explicit about a bunch of things. If you only have a pointer to the struct instance, you have to dereference it first in order to access the members. Alternatively, you can use the arrow-operator to dereference and access the members:


// Struct definition
typedef struct my_type my_type;
struct my_type
{
	int MemberA;
	char MemberB;
};

// Accessing the members of a struct in some function, by passing a pointer
void PrintMyType_ByPointer(my_type *MyStructInstance) 
{
	printf("my_type members: \n");
	
	// Explicit dereference and access
	printf("  MemberA: %d\n", (*MyStructInstance).MemberA);
	printf("  MemberB: %c\n", (*MyStructInstance).MemberB);
	
	// Alternatively: Use the arrow operator
	printf("  MemberA: %d\n", MyStructInstance->MemberA);
	printf("  MemberB: %c\n", MyStructInstance->MemberB);
}

The "Problem"

When I write "new code", e.g. a new function, this isn't really an issue. But if I refactor / restructure existing code, this gets pretty annoying pretty quickly.

Imagine you change a function to take the struct by pointer instead of by value (or vice versa). In addition to changing the function signature and the function calls, you now have to replace the appropriate . with -> (or vice versa). As soon as multiple structs are involved, a blanket "find and replace" within the function is no longer sufficient; instead you have to go over the compiler errors or vet each line individually.
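To make this concrete, here is a small sketch of such a refactor. The types `position` and `velocity` and the `Integrate_*` functions are made up for illustration; they are not from any real codebase. The point is only that after the change, one parameter needs `->` while the other still needs `.`, so a blanket replacement inside the function body would break it.

```c
// Hypothetical types, for illustration only
typedef struct position position;
struct position
{
	float X;
	float Y;
};

typedef struct velocity velocity;
struct velocity
{
	float X;
	float Y;
};

// Before the refactor: both structs are passed by value
position Integrate_Before(position Pos, velocity Vel, float DeltaTime)
{
	Pos.X += Vel.X * DeltaTime;
	Pos.Y += Vel.Y * DeltaTime;
	return Pos;
}

// After the refactor: Pos is now a pointer, Vel is still a value.
// Replacing every "." with "->" in this body would break the Vel accesses,
// so each member access has to be checked individually.
void Integrate_After(position *Pos, velocity Vel, float DeltaTime)
{
	Pos->X += Vel.X * DeltaTime;
	Pos->Y += Vel.Y * DeltaTime;
}
```

Both versions compute the same thing; the only difference in the body is which accesses use `.` and which use `->`.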

The "Solution"

At some point, you might begin to wonder: Couldn't you just make . work for both struct values and pointers to structs? In other words: Couldn't you make the dot-operator automatically dereference the pointer? Couldn't you modify the language / compiler such that the following "just compiles correctly":


void PrintMyType_Idea(my_type MyValue, my_type *MyPointer)
{
	// Use "." for struct value
	printf("  MemberA: %d\n", MyValue.MemberA);
	printf("  MemberB: %c\n", MyValue.MemberB);
	
	// Make "." behave like "->" for struct pointer
	printf("  MemberA: %d\n", MyPointer.MemberA);
	printf("  MemberB: %c\n", MyPointer.MemberB);
}

I even randomly stumbled upon this idea twice on the internet (Handmade Hero and Tsoding). Apparently, there might be some historic reasons why C was initially designed this way. Nowadays, however, the answer seems to be: You could absolutely change the language that way. Tsoding even modified the C compiler "TCC" to support this behavior within 45 minutes. This doesn't even break (functioning) existing code!
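The reason existing code keeps working can be sketched as follows: in standard C, the left operand of . has to be a struct or union, so applying . to a pointer is currently a compile error. The change would only give meaning to code that doesn't compile today. `ReadMemberA` is a hypothetical helper for this sketch, and the commented-out line is the one a conforming compiler would reject.

```c
typedef struct my_type my_type;
struct my_type
{
	int MemberA;
	char MemberB;
};

// Hypothetical helper, only here to illustrate the point
int ReadMemberA(my_type *MyPointer)
{
	// Valid in standard C today
	int ViaArrow = MyPointer->MemberA;

	// Not valid in standard C today: the left operand of "." must be a
	// struct or union, so a conforming compiler reports an error here.
	// With the proposed change, this line would mean the same as above.
	// int ViaDot = MyPointer.MemberA;

	return ViaArrow;
}
```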

So this got me wondering: Has no one made a C proposal for that?

The Proposal(s)

Good news: Someone has already put in the effort to make this an official proposal! Sadly, it seems like this feature will not be part of the C/C++ standard (at least in the foreseeable future).

I found two proposals for this feature: One for C, and one for C++. I couldn't find any discussion of the C proposal, but there have been discussions on the C++ proposal, which also include some context for C.

As of writing, the C++ proposal is almost 5 years old, but the developments are, to my surprise, pretty recent. The proposal has the document number P2142R1. My first search led to an open GitHub issue on cplusplus/papers. While the issue is still open, commenters mention that feedback was given to the author. In fact, another issue in sg22-c-cpp-standard-compatibility/sg-compatibility (for the same proposal) is linked. This one is closed, with the final comment stating:

"We do not think this paper is a good candidate for inclusion in C++."

The discussion of this proposal has been documented and is linked as well. While I initially thought that the rejection for C++ might not be a dealbreaker, the discussion / review / feedback paints a different picture. In it, you find statements like:

"Beginning programmers at my most recent company still had trouble recognizing the need for ->"

"The Linux kernel developers actively don't want this; -> heralds a potential cache miss and allowing . hides errors."

"I'll drop the paper; WG14 was interested if WG21 was." (Emphasis added by me; WG14 being the working group for C, while WG21 is the working group for C++)

Due to C++'s operator overloading, the issue of adding this automatic dereference becomes more complicated, as C++ allows custom definitions of the -> operator for your structs/classes, but not the . operator. Quite frankly, I don't really care too much about C++, but this rejection effectively meaning a rejection for the C proposal is quite the downer for me.

To my surprise, the argument of ease of refactoring didn't come up in the discussion at all (even though the argument is made in the proposal). I don't think that parity with other languages or potential confusion for beginner programmers make good pro/contra arguments in programming language design.

I also find the argument of potential cache-misses a bit weak. It's not wrong; but I find it hard to think of a scenario where I'm acutely aware of individual cache-misses, while not being aware of where my stuff is stored in memory. It's quite the opposite: Most of my code where I switch between passing-by-value and passing-by-pointer, thus leading to refactoring annoyances, is not code where I worry about cache-misses in the first place.

And finally, I just don't understand the argument that "allowing . hides errors". I assume that the author is imagining a scenario where one would e.g. switch from passing-by-value to passing-by-pointer, and the switch simply compiling would lead to missing null-pointer-checks or something like that? Personally, in this context, most of my annoyances stem from changing how I pass the data to functions. This requires me to change the function signature, at which point I'm automatically at the function definition, confronted with potential missing null-pointer-checks (at least as much as I am when changing . to ->).
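As a rough sketch of what I mean (the `SumMembers_*` functions are hypothetical, and returning 0 for a null pointer is an arbitrary choice for this example): the null-pointer-check belongs to the signature change, not to the choice between . and ->.

```c
#include <stddef.h>

typedef struct my_type my_type;
struct my_type
{
	int MemberA;
	char MemberB;
};

// By value: there is no pointer, so there is nothing to check
int SumMembers_ByValue(my_type Instance)
{
	return Instance.MemberA + Instance.MemberB;
}

// By pointer: the signature change is the moment where NULL becomes
// possible, regardless of which operator the body uses afterwards
int SumMembers_ByPointer(const my_type *Instance)
{
	if (Instance == NULL)
	{
		return 0; // Arbitrary fallback, chosen only for this sketch
	}
	return Instance->MemberA + Instance->MemberB;
}
```

Whether the two accesses in the second function are spelled with . or ->, the check at the top is needed either way.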

Conclusion

I'm sad that one of the few pain points I have when using C will not be alleviated, especially considering how viable and easy fixing this problem would be. Heck, you can even imagine emitting an optional compiler warning for the people who prefer sticking to ->, which would make this change as appealing to everyone as possible.

Too bad! Instead, all I got was this lousy blogpost, such that I can reference the proposal and its discussion / rejection in the future.

