A quick introduction:
If you want to program a graphical application for Linux, your primary choices are using either X11 or Wayland. According to Wikipedia, X11 had its first release in 1984. X11 follows a client-server model. I assume that's because the whole computational environment was very different back then. If you have a central server doing all of the heavy lifting, while the user only connects to it via a slim client / dumb terminal / whatever, having the desktop system follow a client-server model makes sense. In contrast, the Wayland protocol had its first release in 2008. That's also quite a while ago, but in the 2000s the computational environment was probably much closer to what we have today: a PC usually comes with a desktop / graphical interface, and the machine doesn't need to rely on an external server to do most of the computing.
So, over the years there has been a push to switch from X11 to Wayland. And, at least on a surface level, this makes sense to me: developers have probably learned a lot about the various requirements of desktops, so having a (mostly) clean cut for this new desktop environment seems promising. I have read claims stating that Wayland is inherently more secure than X11. And since Wayland isn't "outdated", the desktop can be designed with performance and modern use-cases in mind.
I am typing this on a desktop machine running sway, which is a Wayland compositor. There have definitely been the common hurdles, like desktop recording / sharing not working. But over time, these issues have been resolved - at least on my machine. Some years ago, I tried out both X11 and Wayland (back on Arch Linux, I think). And honestly, the sway installation was far easier than the i3/X11 one. This ease of installation, combined with Wayland supposedly being "the future of Linux desktops" and it supporting X11 applications via XWayland, made me stick with sway, even with its rough edges.
That was the story of me using Wayland. Now comes the development part - which has been a fucking nightmare.
For libraries to be used by other developers, I'm a big fan of the classic principle: make easy things easy, and make hard things possible.
If you just want to open a window and do some simple rendering with your GPU, raylib is a fantastic library. Here is an example application:
#include <raylib.h>

int
main(void)
{
    InitWindow(1280, 720, "Test Window");
    SetTargetFPS(144);

    while ( ! WindowShouldClose())
    {
        BeginDrawing();
        ClearBackground(RAYWHITE);
        EndDrawing();
    }

    CloseWindow();
    return 0;
}
Incredibly simple. Raylib is really good at covering the "make easy things easy" part. Ideally, you'd have an "upgrade path" where, using more complex code, you can handle more complicated edge-cases step-by-step [0].
When developing graphical Windows applications, you have to include Windows.h, do a few rather cryptic function calls to create and get window handles, and then you have your "main loop", where you work through a bunch of window events (mouse moves, keyboard input, window wants repaint, ...) and respond accordingly. It's considerably more complicated than raylib, so I see lots of potential for improvement.
This is the reason why I had high hopes for Wayland. Boy, was I wrong. Wayland does not care for the simple use-case at all. Getting any example application to work is so incredibly ridiculous that every second I program on Wayland, I yearn for the times I did Win32 programming.
Don't get me wrong: I'm not expecting that e.g. DPI-aware multi-monitor applications using several input devices, mixed refresh rates and hot-plugging of devices "just work". It's just that every single thing which I would expect to be reasonably simple, or to have helper functions of any kind, is incredibly painful at every step of the way.
I'm not posting code here, because my helper functions to open up an OpenGL Window and transform the whole Wayland insanity into a list of events is >1300 lines of code.
In Wayland, opening up a window roughly works as follows: you connect to the display, obtain the registry, and register callbacks on it. From then on, everything happens inside callbacks, driven by wl_display_roundtrip() & wl_display_dispatch(). And when I say everything, I do mean everything. Want to know about the connected monitors? Store every wl_output you got from the registry callback, then register all the callbacks on them. I fully blame Wayland being an Object Oriented Protocol for this.
The control flow is horrible. Have fun trying to predict what code executes in what order after your wl_display_roundtrip() & wl_display_dispatch() calls - and I still don't know what the difference between the two is, or in what order to call them. Initialization code for OpenGL is rather fragile, as some of these callback functions are called several times during initialization. If you fuck anything up during initialization, you likely just get no window while the program keeps running endlessly. And even if you do all of this callback bullshit, none of it is simple to use.
- Keyboard input? You need xkb, otherwise you don't know what character the key press is supposed to represent.
- Monitor information? Store the wl_output objects and register all the callbacks for each wl_output object.
- The refresh rate of the monitor your window is currently on? A callback tells you which wl_output is being used. Match that wl_output with your stored globals and the corresponding refresh rate.
- Server-side window decorations? You need the zxdg_toplevel_decoration global from the registry.

The most valuable resource is Wayland.app. It has an overview over the core protocol and the various protocol extensions.
You can see how fragmented this whole thing is. It's a complete mess.
Oh and I haven't even mentioned how you can use these extensions.
This is crucial, because you cannot open a window if you just use the core protocol. You used to be able to, but the "shell" which was used to display things, wl_shell, has been deprecated. I'm not kidding.
You are supposed to use XdgShell instead [1]. But if you look up the extension, you will probably only find an XML file - because the interface code is generated from it using wayland-scanner as follows:
wayland-scanner private-code < xdg-shell.xml > xdg_shell.c
wayland-scanner client-header < xdg-shell.xml > xdg_shell.h
The official Wayland documentation does not mention wayland-scanner, only that the code is generated from the XML. I'm also still not sure where you're supposed to get the XML files from. On Void Linux, there is a wayland-protocols package, which puts the XML files in /usr/share/wayland-protocols.
This is sheer insanity. There are so many obstacles you have to get over in order to produce just a blank window, and even if you get past that, the control flow of the application is fucking garbage. I have no idea why they didn't stick to a "main event loop" instead and just provide an easy way of dismissing events you don't care about.
To give you another glimpse: For some fucked up reason, opening up an OpenGL Window is easier than just drawing some pixels on the CPU. Yes, really. Here is, roughly, what you need to do for a software rendered application:
1. Create a WlSurface.
2. From the WlSurface, create an XdgSurface.
3. From the XdgSurface, create an XdgToplevel.
4. Create a shared memory file descriptor (ShmFD) using shm_open, size it using ftruncate, and map it using mmap.
5. From WlShm and the ShmFD, create a wl_shm_pool.
6. From the wl_shm_pool, create a WlBuffer.
7. Call wl_surface_commit on your WlSurface.
8. Run wl_display_dispatch() and wl_display_roundtrip().
9. In the XdgSurface.Configure-callback, attach the WlBuffer to the WlSurface. Damage and commit the WlSurface to indicate that you have stuff to be displayed.
10. In the XdgToplevel.Configure-callback, you need to keep track of whether your resolution changed. If so, you probably want to create another WlBuffer with new dimensions, and draw to it.
11. Register a wl_callback for the WlSurface to know when you should render the next frame. In the wl_callback handler, destroy the old wl_callback, register a new wl_callback, then draw, damage and commit.
I'm skipping over some details here, but you can see that this shit is fucking insane. Fuck anything up, and your application is less responsive, doesn't render at all or renders way too often, leaks memory, and so on and so forth.
Here's a list of stuff I randomly stumbled over, in no particular order.
If you use your GPU for rendering, you are probably presenting frames with eglSwapBuffers from EGL. If you activate VSync, eglSwapBuffers will automatically block until the frame is displayed, which is quite handy (no manual sleeping / refresh-rate querying required). But if you unplug / replug the monitor, eglSwapBuffers will block indefinitely.
Wayland, AFAIK, has no concept of primary monitors. For my personal application, I wanted to hardcode something using the right-most monitor. Wayland offers a geometry callback, which is supposed to return the x & y position within the global compositor space (and other info like physical dimensions, subpixel layouts, transforms, etc). Apparently several Wayland environments always return (0, 0) for the monitor position.
The Wayland protocol development is so slow that someone (from Valve?) started the Frog Protocols, with the sole purpose of being able to iterate much more quickly.
As far as I know, there is no standardized way of retrieving the current "desktop state", meaning retrieving a list of open windows, their positions, etc. I'm aware that Wayland is supposed to be "secure", but simply not offering this functionality introduces fragmentation and is likely more insecure than offering an API with some sort of permission system. On Sway, you can retrieve the "desktop state" using the sway-ipc-socket, which communicates via JSON (kill me). However, there is no way to query for applications / surfaces using the Wlr Layer Shell extension, which are used for workspace independent windows (e.g. status bar) [2].
Getting the Wayland clipboard to work is a nightmare. I just wanted to copy some text into the clipboard, but I gave up after a while. For text, the easiest hack I found was to start another process running wl-clipboard and pass the relevant arguments to that.
Hotplugging stuff doesn't necessarily work. The keys of my drawing tablet pad are not recognized by any application until I restart the programs. Judging from my experiments, Wayland doesn't seem to generate the relevant events to announce that the device has been plugged in, while unplugging does send the events that render the device invalid [3].
Xdg-Desktop-Portal is required to do any form of screen sharing. Thanks to Wayland deeming screen sharing / recording "out of scope", xdg-desktop-portal has been implemented by several compositors, leading to fragmentation. On Void Linux alone, several competing implementations are available.
While I was able to make parts of a Wayland Window transparent, I wasn't able to make it "click-through" / forward the events to the window behind it.
Setting mouse cursors is a massive pain in the ass. While I now know how to hide the cursor and how to display the "normal" cursor, I'm not sure how to handle other cursor types: you need to pass a "CursorType" string, and I legit couldn't find a list of all the valid strings [4].
As a user, using Wayland is nice.
Compared to X11, the internals of Wayland Compositors might also be a great upgrade, I don't know.
As a developer, I want to kill myself when working with Wayland code. The API of this "asynchronous object oriented protocol" is a fucking disaster. It's a huge downgrade compared to Win32 or X11 with Xlib. You cannot write a simple application with Wayland. Every part of it - the different extensions, the fact that you generate the API code from XML, and the entire control flow when interacting with Wayland - is utterly horrible. And this is the foundation all future Linux applications are supposedly meant to build upon.
[0]: I have no idea how well raylib supports that.
[1]: XDG Shell is used for "normal Desktop applications". There's also stuff like Wlr Layer Shell, which I use to display "workspace independent" stuff, like a status bar.
[2]: There is no protocol extension to query this (on sway). The socket doesn't return any info about the shells. The only reference I was able to find was some reddit post stating that there is no way to do this.
[3]: Side note: For several months, hot-plugging anything also just straight crashes OBS somewhere deep inside GTK library code.
[4]: There has been another protocol extension though, so maybe this will clear things up.