Felix' Ramblings
<< Stream of Thoughts: What Do I Like?
>> What If I Don't Improve Myself?

2023.02.09
I Hate: Audio On Linux

There's no fucking way "the year of the Linux desktop" will ever come given the current shitshow that is the Linux audio jungle.

Let's start with a quick overview of the audio systems involved: OSS, ALSA, PulseAudio, JACK and PipeWire.

The bottom layer is basically formed by OSS and ALSA. Apparently ALSA has replaced OSS, so ALSA is the thing actually making Linux play sounds at the end of the day. Consequently, ALSA is also the only thing available on a minimal Linux installation.

From what I understand, PulseAudio, JACK and PipeWire all act as middlemen to the user (and software) and work with ALSA under the hood. Ironically enough, this means that not only do they all implement the ALSA API for software, but also the APIs of one another. What a mess.

ALSA, by default, is not able to play two sounds at the same time. Just from the user perspective, this is fucking unusable garbage. Sure, you can configure some additional bullshit to make this work. But holy fuck the ALSA configuration files are fucking horrible. Configuring your output devices is fucking awful - which is something I had to do on any system using an HDMI connection, because I am lucky enough that my system always tries to output sound using my monitor (which does not have any speakers). Disabling audio modules requires you to do some modprobe bullshit, which most often does not really do the thing I want it to do, and this is entering the part of Linux where I don't have the slightest clue of how to apply changes without restarting my whole system, fucking everything up in the process. The only thing that is somewhat working is alsamixer. You can see and configure some sound channels (e.g. the "mic"-channel is for some god forsaken reason muted after the initial installation, requiring you to unmute it). You can globally tune the volume and it persists after boot. I appreciate AlsaMixer.

Here comes our saviour PulseAudio: By default, we can play multiple sounds at the same time. A blessing from the lord. We can now also adjust the volume levels of individual applications easily: Just pop open pavucontrol and work them sliders. Default audio devices can also be configured there. However, PulseAudio introduces its own set of additional problems. When I first used wine, the whole audio of my system became lower pitched because there was some mismatch between the sample rate of my audio interface and the sample rate PulseAudio decided to use. But you can configure PulseAudio as well - which is horrible. Not fucking horrible, so this is an improvement, but still horrible. Then there are also fun bugs like some application hogging ALSA, and then starting e.g. a browser that tries to initialize PulseAudio, which promptly shits itself because ALSA is hogged so PulseAudio cannot become the middleman. Just restarting the applications once usually works, but this is not fun at all.

I honestly can't comment on JACK. It's some audio implementation often used for more professional audio setups. While I assume it's for loops and lower latency stuff, it also has the reputation of being a bit more complicated to set up, so I don't want to find out if not required.

However, after I tried to fiddle with my audio setup a bit, I realized that I would need to do some additional configuration of ALSA and/or PulseAudio, so I decided to give PipeWire a try.

So far, it's been a wild ride of emotions. PipeWire, which is rather new, decided to handle both video and audio streams while aiming for low latency. That sounds complicated, but from a user perspective my first installation basically just worked. Although there is a caveat (for my specific distro): PipeWire uses a "session manager" - no fucking clue what that really does. PipeWire, by default, ships with the "PipeWire Media Session" manager - which is deprecated and should not be used anymore. Great. Instead, WirePlumber should be used. That installation just required me to execute four commands which seemed to just work.

Ok, celebrations? No. After upgrading, my whole system was simply quieter. I assumed it was some level setting, so I decided to look up how to adjust this. In an odd twist of events, I have seen people recommend configuring audio levels of PipeWire using pavucontrol, which PipeWire properly translates into its own configuration. This "just worked" after I set up WirePlumber (which is good).

Now, you can do this with WirePlumber as well. In particular, you can find commands like


$ wpctl set-mute @DEFAULT_AUDIO_SINK@ toggle
$ wpctl set-volume @DEFAULT_AUDIO_SINK@ 5%+
on the web. I wouldn't have guessed to use a WirePlumber tool, but alright. What leaves a bitter taste is how you were supposed to find this out:

$ man wpctl
man: No entry for wpctl in the manual.

$ wpctl -h
Usage:
  wpctl [OPTION] COMMAND [COMMAND_OPTIONS] - WirePlumber Control CLI

Commands:
  status
  get-volume ID
  inspect ID
  set-default ID
  set-volume ID VOL[%][-/+]
  set-mute ID 1|0|toggle
  set-profile ID INDEX
  clear-default [ID]

Help Options:
  -h, --help       Show help options

Pass -h after a command to see command-specific options


$ wpctl status
 Clients
 ...

Audio
 ...

Video
 ...

Settings
 ...
No mention of @DEFAULT_AUDIO_SINK@ whatsoever, so just stick to pavucontrol.
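
If you want to script this anyway, here is a minimal sketch for reading the level back instead of only setting it. It rests on two assumptions that the help output above does not confirm: that get-volume accepts the same @DEFAULT_AUDIO_SINK@ placeholder as set-volume, and that it prints a line such as "Volume: 0.40", optionally followed by "[MUTED]". The name volume_percent is a made-up helper, not part of WirePlumber.

```shell
#!/bin/sh
# Hypothetical helper: turn a line like "Volume: 0.40" or
# "Volume: 0.40 [MUTED]" (the format `wpctl get-volume` is assumed to
# print) into "40%" or "40% (muted)". Reads from stdin.
volume_percent() {
    awk '/^Volume:/ {
        pct = int($2 * 100 + 0.5)    # round to the nearest whole percent
        printf "%d%%%s\n", pct, ($3 == "[MUTED]" ? " (muted)" : "")
    }'
}

# Usage (needs a running PipeWire / WirePlumber session):
#   wpctl get-volume @DEFAULT_AUDIO_SINK@ | volume_percent
```

The function only parses text, so it can be tried out by piping a sample line into it without touching the audio setup.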

So, why video streams as well? Who the fuck uses that?

Without going too much into the details, there are two major display systems on Linux: X11 and Wayland. Wayland is the new and shiny one and provides compatibility with X11 through XWayland, so some distros have started migrating to it.

I gotta admit, setting up e.g. i3 (X11) was always a fucking pain in the ass and caused me trouble every single time, while sway (Wayland) has worked pretty much flawlessly for me. So props to that.

On the other hand:

So on sway, OBS can record individual windows (iirc), but recording the whole screen is not possible on its own. This also bricks Zoom or Discord screen sharing. Fucking marvellous.

PipeWire fixes that! With "xdg-desktop-portal" and PipeWire, OBS screen recording just works again, that was a pleasant surprise. Here comes the downer again though:

So far, all audio levels were fine, except for MPV, which is really quiet. Popping open pavucontrol reveals that its audio levels were just set too low, but at seemingly random intervals they would switch back. I can't even just turn MPV itself up, because that makes the audio clip while it still stays quieter. It turns out this is because, for some fucking reason, MPV decides to use the same audio levels as MPD, which I use for background music. Whenever I update the audio levels of MPD using mpc, a client of MPD, the audio levels I set for MPV in pavucontrol are updated to MPD's new levels as well.

The audio levels of MPD and MPV are equal. I would call this neat but I didn't ask for this and I can't get rid of it.

This shit is driving me crazy because it's just so fucking mildly inconvenient, and I can't find other people having this problem. It certainly does not fucking help that the documentation of PipeWire / WirePlumber is so fucking garbage. How do I even begin to tackle this problem? I thought using wpctl could help, because it's quite verbose:


$ wpctl status
PipeWire 'pipewire-0' [0.3.65, void@voidbox, cookie:1960366277]
  Clients:
      32. WirePlumber                         [0.3.65, void@voidbox, pid:1403]
      33. WirePlumber [export]                [0.3.65, void@voidbox, pid:1403]
      50. pipewire-pulse                      [0.3.65, void@voidbox, pid:1364]
      51. Music Player Daemon                 [0.3.65, void@voidbox, pid:1372]
      57. xdg-desktop-portal-wlr              [0.3.65, void@voidbox, pid:1673]
      64. Chromium input                      [0.3.65, void@voidbox, pid:30537]
      65. mpv                                 [0.3.65, void@voidbox, pid:9918]
      71. Firefox                             [0.3.65, void@voidbox, pid:31125]
      72. Steam                               [0.3.65, void@voidbox, pid:15222]
      73. Steam Voice Settings                [0.3.65, void@voidbox, pid:15222]
      74. Chromium input                      [0.3.65, void@voidbox, pid:15641]
      84. Firefox                             [0.3.65, void@voidbox, pid:31125]
     103. wpctl                               [0.3.65, void@voidbox, pid:10061]

...

Audio
  Streams:
      52. Music Player Daemon
           54. output_FL       > Scarlett 2i2 USB:playback_FL	[active]
           56. output_FR       > Scarlett 2i2 USB:playback_FR	[active]
      70. Firefox
           61. output_FR       > Scarlett 2i2 USB:playback_FR	[active]
           77. output_FL       > Scarlett 2i2 USB:playback_FL	[active]
      98. mpv
           86. output_FL       > Scarlett 2i2 USB:playback_FL	[active]
           95. output_FR       > Scarlett 2i2 USB:playback_FR	[active]

...

You can check both clients and streams, making this shit more complicated. Let's check for some overlap between the clients of MPD and MPV:


# MPV                                      | # MPD
$ wpctl inspect 65                         | $ wpctl inspect 51
id 65, type PipeWire:Interface:Client      | id 51, type PipeWire:Interface:Client
    application.language = "en_US.UTF-8"   |     application.icon-name = "mpd"
  * application.name = "mpv"               |     application.language = "en_US.UTF-8"
    application.process.binary = "mpv"     |   * application.name = "Music Player Daemon"
    application.process.host = "voidbox"   |     application.process.binary = "mpd"
    application.process.id = "9918"        |     application.process.host = "voidbox"
    application.process.session-id = "1"   |     application.process.id = "1372"
    application.process.user = "void"      |     application.process.session-id = "1"
    clock.power-of-two-quantum = "true"    |     application.process.user = "void"
    core.name = "pipewire-void-9918"       |     clock.power-of-two-quantum = "true"
    core.version = "0.3.65"                |     core.name = "pipewire-void-1372"
    cpu.max-align = "32"                   |     core.version = "0.3.65"
    default.clock.max-quantum = "2048"     |     cpu.max-align = "32"
    default.clock.min-quantum = "32"       |     default.clock.max-quantum = "2048"
    default.clock.quantum = "1024"         |     default.clock.min-quantum = "32"
    default.clock.quantum-limit = "8192"   |     default.clock.quantum = "1024"
    default.clock.rate = "48000"           |     default.clock.quantum-limit = "8192"
    default.video.height = "480"           |     default.clock.rate = "48000"
    default.video.rate.denom = "1"         |     default.video.height = "480"
    default.video.rate.num = "25"          |     default.video.rate.denom = "1"
    default.video.width = "640"            |     default.video.rate.num = "25"
    link.max-buffers = "64"                |     default.video.width = "640"
    log.level = "0"                        |     link.max-buffers = "64"
    mem.allow-mlock = "true"               |     log.level = "0"
    mem.warn-mlock = "false"               |     media.category = "Playback"
  * module.id = "2"                        |     media.class = "Stream/Output/Audio"
  * object.serial = "3458"                 |     media.name = "mpd"
  * pipewire.access = "unrestricted"       |     media.role = "Music"
  * pipewire.protocol = "protocol-native"  |     media.type = "Audio"
  * pipewire.sec.gid = "1000"              |     mem.allow-mlock = "true"
  * pipewire.sec.pid = "9918"              |     mem.warn-mlock = "false"
  * pipewire.sec.uid = "1000"              |   * module.id = "2"
    settings.check-quantum = "false"       |     node.autoconnect = "true"
    settings.check-rate = "false"          |     node.name = "mpd.PipeWire Sound Server"
    window.x11.display = ":0"              |     node.rate = "1/44100"
                                           |     node.want-driver = "true"
                                           |   * object.serial = "3451"
                                           |   * pipewire.access = "unrestricted"
                                           |   * pipewire.protocol = "protocol-native"
                                           |   * pipewire.sec.gid = "1000"
                                           |   * pipewire.sec.pid = "1372"
                                           |   * pipewire.sec.uid = "1000"
                                           |     settings.check-quantum = "false"
                                           |     settings.check-rate = "false"
                                           |     stream.is-live = "true"
                                           |     window.x11.display = ":0"
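
Eyeballing two dumps like these side by side is error-prone, so here is a minimal sketch of doing the comparison mechanically. It assumes each `wpctl inspect` output was saved to its own file (for example mpv.txt and mpd.txt, both hypothetical names) in the format shown above. The function names normalize_props, shared_props and shared_but_not are made up for this example.

```shell
#!/bin/sh
# Normalize one saved `wpctl inspect` dump: drop the "id N, type ..."
# header, strip indentation, the leading "* " marker and trailing
# whitespace, then sort so that comm(1) can compare the result.
normalize_props() {
    sed -e '/^id [0-9]/d' \
        -e 's/^[[:space:]]*//' \
        -e 's/^\*[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1" | sort
}

# Print every "key = value" line that appears in both dumps $1 and $2.
shared_props() {
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize_props "$1" > "$a"
    normalize_props "$2" > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}

# Of the properties shared by $1 and $2, print those NOT present in $3.
shared_but_not() {
    c=$(mktemp) || return 1
    normalize_props "$3" > "$c"
    shared_props "$1" "$2" | comm -23 - "$c"
    rm -f "$c"
}

# Usage (requires a running session to produce the dumps):
#   wpctl inspect 65 > mpv.txt
#   wpctl inspect 51 > mpd.txt
#   shared_props mpv.txt mpd.txt
```

Note that this matches on exact "key = value" pairs, so it also lists boring matches such as the hostname or the clock settings. Picking out the interesting ones remains manual work.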

The only somewhat interesting overlap I can find is:


application.process.session-id = "1"
* module.id = "2"
* pipewire.sec.gid = "1000"
* pipewire.sec.uid = "1000"
No clue what the * or any of these really mean. Let's cross-check with the Firefox client, whose audio seems to be independent of MPD / MPV:

# MPD / MPV                          | # Firefox
application.process.session-id = "1" | application.process.session-id = "1"
* module.id = "2"                    | * module.id = "2"
* pipewire.sec.gid = "1000"          | * pipewire.sec.gid = "1000"
* pipewire.sec.uid = "1000"          | * pipewire.sec.uid = "1000"
                                     |   
                                     | # New entry
                                     | client.api = "pipewire-pulse"
Great, that overlap was not fucking helpful, because Firefox, which does not have that issue, has the same overlap, so that's probably not it. Even worse, the only meaningful difference I see is that Firefox is using PulseAudio, so now I fear that all PipeWire-native clients are like this, FUCK. Let's check the MPV / MPD "Streams" for overlap:

# MPD                                                   |  # MPV
$ wpctl inspect 52                                      |  $ wpctl inspect 98
id 52, type PipeWire:Interface:Node                     |  id 98, type PipeWire:Interface:Node
    adapt.follower.spa-node = ""                        |      adapt.follower.spa-node = ""
    application.icon-name = "mpd"                       |      application.icon-name = "mpv"
  * application.name = "Music Player Daemon"            |      application.id = "mpv"
    audio.adapt.follower = ""                           |    * application.name = "mpv"
  * client.id = "51"                                    |      audio.adapt.follower = ""
    clock.quantum-limit = "8192"                        |    * client.id = "65"
  * factory.id = "7"                                    |      clock.quantum-limit = "8192"
    factory.mode = "split"                              |    * factory.id = "7"
    library.name = "audioconvert/libspa-audioconvert"   |      factory.mode = "split"
    media.artist = "Disasterpeace"                      |      library.name = "audioconvert/libspa-audioconvert"
  * media.category = "Playback"                         |    * media.category = "Playback"
  * media.class = "Stream/Output/Audio"                 |    * media.class = "Stream/Output/Audio"
    media.name = "Disasterpeace - Beyond"               |      media.name = "Pixel Land - mpv"
  * media.role = "Music"                                |    * media.role = "Music"
    media.title = "Beyond"                              |    * media.type = "Audio"
  * media.type = "Audio"                                |      node.always-process = "true"
    node.autoconnect = "true"                           |      node.autoconnect = "true"
  * node.name = "mpd.PipeWire Sound Server"             |    * node.description = "mpv"
    node.rate = "1/44100"                               |      node.latency = "960/48000"
    node.want-driver = "true"                           |    * node.name = "mpv"
    object.register = "false"                           |      node.rate = "1/48000"
  * object.serial = "3452"                              |      node.want-driver = "true"
    stream.is-live = "true"                             |      object.register = "false"
                                                        |    * object.serial = "3459"
                                                        |      stream.is-live = "true"
I'm noticing that factory.id is overlapping, checking against FireFox:

# MPV / MPD        | # Firefox
* factory.id = "7" | * factory.id = "6"

A lead! Ok, time to look up what the fuck a factory is doing in WirePlumber.

Ok, this didn't help, but maybe this is a PipeWire and not a WirePlumber thing?

Pretty bad, maybe the developer docs can help me out here [0]?

Not helpful at all. More... has to save my ass here.

Fuck you too I guess.

I have no idea how that shit even happens. Linking the sound levels of two different applications sounds like a pretty niche feature. Hey, this might be useful to some, the fuck do I know. But why the fuck is this the default behaviour I have? I'm pretty certain there is a one-liner somewhere to fix this. But...

It's audio, on a desktop machine. Why is audio causing so many problems? Why is this shit not solved?

The Fix (2023.04.10)

I did some more digging and finally found a workaround for this problem: Create a file .config/wireplumber/main.lua.d/51-mpv-fix.lua with the following content:


stream_defaults.rules = {
	{
		matches = {
			{
				{ "application.name", "matches", "mpv" },
			},
		},
		apply_properties = {
			["state.restore-props"] = false,
		},
	},
}
Restart PipeWire / WirePlumber and, finally, mpv starts at 100% volume (instead of copying the volume of MPD).

Apparently this has something to do with media roles? MPD and MPV share the same role when playing music, so adjusting the volume of MPD adjusts the volume of the role, which is used as the initial volume for every newly started application using that role.

I get the use case for this - but fucking hell, I don't want this. I should probably find a way to disable this mechanism completely, but I'm satisfied for now.
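
For repeatability, the workaround can be wrapped in a small script. This is only a sketch under stated assumptions: install_mpv_fix is a made-up function name, the optional argument is a hypothetical way to point it at a different config root for a dry run, and the rule itself is copied as-is from the fix above without further verification.

```shell
#!/bin/sh
# Hypothetical helper: write the mpv rule into the per-user WirePlumber
# config directory. $1 optionally overrides the config root
# (default: $HOME/.config). Prints the path of the file it wrote.
install_mpv_fix() {
    base="${1:-$HOME/.config}"
    dir="$base/wireplumber/main.lua.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/51-mpv-fix.lua" <<'EOF'
stream_defaults.rules = {
	{
		matches = {
			{
				{ "application.name", "matches", "mpv" },
			},
		},
		apply_properties = {
			["state.restore-props"] = false,
		},
	},
}
EOF
    printf '%s\n' "$dir/51-mpv-fix.lua"
}

# Usage:
#   install_mpv_fix              # writes below ~/.config
#   install_mpv_fix /tmp/dryrun  # writes below /tmp/dryrun for inspection
```

The script deliberately does not restart anything. How PipeWire and WirePlumber get restarted depends on the distro and on how they were started, so that step stays manual.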


[0]: You can tell that I'm grasping at straws at this point.