This blog post is a response to some very wrong things that were said during 27c3 in a talk about Desktop Linux. At the very beginning of the talk, multimedia abstraction/layering was chosen as an example of how things can go wrong, in particular how Phonon's GStreamer backend supposedly causes enormous overhead for file playback. Since the speaker apparently had little to no idea what is actually going on (and what was going on some 2 years ago, for that matter), I feel the need to explain how this magic actually works.

First off I need to explain a bit what Phonon's own architecture looks like. Phonon itself is an abstraction layer between an individual multimedia framework or library, such as GStreamer, VLC or QuickTime, and an application that uses multimedia, like Amarok. It uses some advanced Qt magic to separate those two things very clearly. The main goal of Phonon is to provide an intuitive, easy-to-use and stable API to applications, regardless of the platform they are executed on and the specifics of the framework used. To achieve this it has individual so-called backends, which wrap the underlying frameworks and map them to the Phonon API.

To the API consumer only a simplified media graph is exposed, with a MediaObject as the root and a number of outputs as leaves (such as an AudioOutput to an audio device or a VideoWidget for on-screen video playback). The underlying media graphs or pipelines of the frameworks can be a lot more complicated, though. In particular, they will usually also contain a module for tearing apart a video file into the actual video and audio data, then processing that data and sending it to an output. Which is roughly what GStreamer actually does. I hope that from this description you already get the idea that the way Phonon abstracts and the way GStreamer works map very closely, so Phonon mostly just needs to hide things from the API consumer.
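To make the frontend/backend split a bit more tangible, here is a tiny toy model in Python. It is not Phonon code and does not touch any real framework: every class and method name (`ToyBackend`, `ToyMediaObject`, `create_media_node` and so on) is made up for illustration, and the strings only hint at what a backend might build internally. The only point is that the application-side graph stays the same no matter which backend sits underneath.

```python
from abc import ABC, abstractmethod


class ToyBackend(ABC):
    """Hypothetical interface a framework wrapper would implement."""

    @abstractmethod
    def create_media_node(self) -> str:
        ...

    @abstractmethod
    def create_audio_node(self) -> str:
        ...


class ToyGStreamerBackend(ToyBackend):
    # The descriptions are illustrative strings, not real GStreamer objects.
    def create_media_node(self) -> str:
        return "pipeline built around a decodebin"

    def create_audio_node(self) -> str:
        return "audio bin: audioconvert, volume, audiosink"


class ToyOtherBackend(ToyBackend):
    # Placeholder for any other framework; its internals are left unspecified.
    def create_media_node(self) -> str:
        return "framework-specific player object"

    def create_audio_node(self) -> str:
        return "framework-specific audio output"


class ToyMediaObject:
    """Root of the simplified graph the application sees."""

    def __init__(self, backend: ToyBackend):
        self.native = backend.create_media_node()
        self.outputs = []


class ToyAudioOutput:
    """Leaf of the simplified graph the application sees."""

    def __init__(self, backend: ToyBackend):
        self.native = backend.create_audio_node()


def create_path(media: ToyMediaObject, output: ToyAudioOutput) -> None:
    # Connects root and leaf in the application-facing graph.
    media.outputs.append(output)


def build_player(backend: ToyBackend) -> ToyMediaObject:
    # Application code: identical for every backend.
    media = ToyMediaObject(backend)
    create_path(media, ToyAudioOutput(backend))
    return media
```

Calling `build_player(ToyGStreamerBackend())` or `build_player(ToyOtherBackend())` gives the application the same one-root, one-leaf graph; only the hidden `native` parts differ.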
For the GStreamer backend it looks like this: The first thing an API consumer creates is a MediaObject, which in turn lays out the general infrastructure for the Phonon graph. As I mentioned, Phonon really just needs to hide things in the case of the GStreamer backend, so what it actually does is create a basic GStreamer pipeline based on a decodebin. A decodebin is a sort of convenience GStreamer element that takes care of the better part of pipeline building. If you have worked with GStreamer before you might have used playbin, which is basically the same thing, just with more automation.

Next the API consumer adds a couple of outputs. For the sake of simplicity let's assume there is just an AudioOutput. An AudioOutput again creates a GStreamer bin, but this time for audio only. Most importantly it contains an audioconvert element, which transforms audio data between different formats. It also adds things like a volume control element (as obviously Phonon has to be able to control the volume of the output). Finally it attaches an audiosink to the bin, which is where the raw data actually gets stuffed into. This could be your ALSA device or possibly PulseAudio.

At this point the API consumer just creates a path between the MediaObject (i.e. the pipeline) and the AudioOutput (i.e. the audiobin) and loads some media. Well, not quite. The path creation at this point only happened in Phonon. The actual GStreamer pipeline is still not connected to our GStreamer audio bin. The reason is that up until now we do not know what particular format the media source actually has. It could be anything from Vorbis to MP3. So what happens is that once the media gets loaded and the pipeline gets data, it notifies Phonon of what GStreamer pads (sort of outputs) it has available. Phonon then tries to connect those pads to the audiobin, and if that succeeds we have audio. Hooray \o/
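The "path exists in Phonon first, the real link happens later" part is probably the least obvious bit, so here is a rough sketch of that ordering as a toy simulation in Python. Again, this is neither the backend's actual code nor the GStreamer API: `ToyPipeline`, `ToyAudioBin`, `on_pad_added`, `try_link` and the format strings are all invented for illustration, and the "audio/" prefix check is just an assumed stand-in for whatever compatibility check really happens when pads get connected.

```python
class ToyAudioBin:
    """Stand-in for the audio bin (audioconvert, volume, audiosink)."""

    def __init__(self):
        self.linked_caps = None  # nothing connected yet

    def try_link(self, caps: str) -> bool:
        # Assumption: accept the first pad that carries audio data.
        if self.linked_caps is None and caps.startswith("audio/"):
            self.linked_caps = caps
            return True
        return False


class ToyPipeline:
    """Stand-in for the decodebin-based pipeline."""

    def __init__(self):
        self._listeners = []

    def on_pad_added(self, callback) -> None:
        self._listeners.append(callback)

    def load(self, stream_caps) -> None:
        # Pretend the media was inspected: announce one pad per stream found.
        for caps in stream_caps:
            for callback in self._listeners:
                callback(caps)


class ToyMediaObject:
    """Stand-in for the MediaObject side of the backend."""

    def __init__(self):
        self.pipeline = ToyPipeline()
        self.paths = []  # paths known to the abstraction layer only
        self.pipeline.on_pad_added(self._pad_added)

    def create_path(self, audio_bin: ToyAudioBin) -> None:
        # Only bookkeeping happens here; no real link is made yet.
        self.paths.append(audio_bin)

    def _pad_added(self, caps: str) -> None:
        # Now the format is known, so try to connect the pad to an output.
        for audio_bin in self.paths:
            if audio_bin.try_link(caps):
                break


media = ToyMediaObject()
audio = ToyAudioBin()
media.create_path(audio)
linked_before_load = audio.linked_caps
media.pipeline.load(["video/x-raw", "audio/x-raw"])
linked_after_load = audio.linked_caps
```

Here `linked_before_load` is still empty even though the path was already created, and the audio bin only gets connected once `load()` announces an audio pad. The video pad is simply ignored in this sketch because no video output was added.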