Weston on Raspberry Pi

One of the platforms we’ve been working on for a while at Collabora is the Raspberry Pi. Obviously the $25 pricepoint makes it hugely appealing to a lot of people — including free software developers who up until now have managed to avoid the ~~agony~~joy we experience on a daily basis working on embedded and mobile platforms — but there are a couple of aspects which speak specifically to us as a company.

Firstly, we did quite a bit of work on OLPC through the years, which had a similar, very laudable, educational mission encouraging not just deep computer literacy in children, but also open source involvement. The Raspberry Pi has broadly the same aims, a very education-friendly pricepoint, and has seen huge success.

Less loftily, it’s a great example of a number of architectures we’ve been quietly working on for quite some time, where hugely powerful special-purpose (i.e. not OpenGL ES) graphics hardware goes nearly unused, in favour of heavily loading the less powerful CPU, or pushing everything through GL.

background etc

The Raspberry Pi has a Broadcom BCM2385 SoC in it, containing an extremely beefy (roughly set-top-box-grade) video, media and graphics processor called the VideoCore (somewhat akin to a display controller, GPU and DSP hybrid), and a … somewhat less beefy general-purpose ARMv6¹ CPU. The ARM side does everything you’d expect, whereas the VideoCore is a multi-functional beast, acting as the GPU for OpenGL ES, the display engine for outputs/overlays/etc, and also any general-purpose processing (e.g. accelerated JPEG decode).

In terms of how this looks from the ARM, the VideoCore exposes its display functionality through DispManX, an API for display control similar, in capability at least, to KMS. The DispManX exposes a number of output displays, each of which can have a number of planes (sometimes called overlays or sprites) which can each be in different colourspaces (think: video), scaled, alpha-blended, or variously stacked. No surprise there, as this is how most GPUs and display controllers look everywhere: from your phone, to your desktop, to your set-top box.

A recurring theme for us is how to properly use and expose these overlays. There’s a huge benefit in doing so over using GL: not only are they a lot faster, but they also have hugely better quality when doing colourspace conversion and scaling, and extra filters for much better image quality². There’s also a pretty strong power argument to be made; at one stage, measuring on a phone, we found a 20% runtime difference — from 4 hours to a bit over 5 — when watching videos using the overlay, compared to pumping them through GL ES. And that was without the zerocopy support we enjoy nowadays!

The X Video extension (Xv) exposes overlays in a very limiting and frustrating way, meaning they’re only really suitable for video, and even then aren’t capable of zerocopy. DRI2 video support was proposed to fix this, but so far TI’s OMAP is the only real deployment of this, and client support is very patchy.

Even then, this is only applicable to video. Under the X11 model, effectively the only way to do compositing is for an external process to render the entire screen as a single image, then pass that image to the X server to display. This is fine for GL, since that’s exactly what it’s built for, but it means you’re never going to get to use all your lovely 2D compositing hardware. Either that, or you do offload compositing inside the X server: something very difficult which almost no-one does, not only because it means no compositing. And no compositing means that your desktop is all back to 1995, with those attractive flickering backgrounds and jittery resizes. :(

i thought this was about rpi … ?

We’ve been working with the Raspberry Pi Foundation since last year, when we first brought up Weston on Raspberry Pi. At that stage, Weston was still very heavily GLES-based, but was able to opportunistically pull individual surfaces out into overlays if the conditions were right. This worked, and was all well and good, but was an awkward abuse of Weston’s internal model, and made RPi look rather unlike any other backend.

Fast forward to a couple of months ago, and Weston had come a long way along the road towards 1.1. A pretty vital piece for us was the renderer infrastructure: whereas Weston previously only had pluggable backends controlling the final display output backend (think: KMS, fbdev, RDP), it now gained a similar split for the renderers.

The original split was between the original GLES renderer and a purely-software Pixman renderer (which was still capable of everything the GLES renderer was!), but it was also a perfect fit for DispManX. So, the first thing Pekka did, and the base of all our work since, was to split the RPi backend into a backend and a renderer, allowing us to guarantee that we’d never fall back to GLES. This lets us feed the entire set of windows into the hardware, complete with stacking order, alpha blending, and 2D transforms, and have the VideoCore take care of everything, all fully synchronised and tear-free. Without using GL!

Pekka’s blog has more of the gory details on both the Weston internals, and how DispManX works.

shiny demos

Seemingly no-one can talk about open source graphics without getting all misty-eyed over wobbly windows; what good is straight tech with nothing to show for it? So, while Pekka worked on the renderer, Louis-Francis and myself knocked up a couple of demos to show off what we could do with 2D compositing alone.

The first, and most compelling, demo is in our case study, in which Louis-Francis drags some windows around under both X11 and Weston for the camera. X11 struggles badly, whereas Weston with its VideoCore backend keeps up like a champ.

Louis-Francis then added an optional fade layer, where all the background windows were dimmed, complete with a nice fading animation between foreground and background. This is done internally with one or two solid-colour surfaces, with varying alpha.

Last but not least, I added a reimplementation of GNOME3’s window overview mode. It is a little on the rudimentary side, and does certainly suffer from not having an established layout/packing framework to use, but does work, and I think provides a pretty good demonstration of the kinds of smooth and fluid animations that are totally possible without ever touching GLES. And the best part is that windows zoom around, scaled and alpha-blended, at 60fps, whereas X11 struggles to get to 10fps just moving an unscaled, opaque, window. Yeesh.

And then some of the details were just that: details. Pekka add a new background scaling mode which preserves aspect ratio, Louis-Francis and Pekka worked to make sure the memory usage was always as low as humanly possible, including an optional mode where we trade strict visual correctness for performance and memory usage, we fixed some bugs in the XWayland emulation, and some bugfixes.

sounds great; how do i get it?

The source is already winging its way upstream; currently, the renderer has been merged, but some of the effects are still oustanding. In the meantime, you can clone Pekka’s repository:

$ git clone git://git.collabora.co.uk/git/user/pq/weston
$ cd weston
$ git checkout origin/raspberrypi-dispmanx-wip

Or we’ve also got Raspbian packages, based on the stable 1.0.x rather than unreleased git:

# echo deb http://raspberrypi.collabora.com wheezy rpi >> /etc/apt/sources.list
# apt-get update
# apt-get install weston

tl;dr

A lot of hardware — particularly in the media/embedded/mobile space — ships with hugely powerful special-purpose graphics hardware, aside from the usual OpenGL ES. Up until now, this special-purpose hardware has gone unused, but for equally special-purpose hacks. The work we’ve done with Weston and Raspberry Pi, using only its dedicated 2D compositing hardware, is a fantastic example of what we can do with Wayland and these kinds of platforms in general, without using GL ES.

now convince your boss

Would you love us to shut up and take your money, but just need that little bit extra to persuade those humourless lot up in purchasing? Fear not: we have an entire case study for the work we’ve done on the Raspberry Pi, as well as a general graphics page. Raspberry Pi have a writeup themselves too. Or maybe you’re more impressed (ha) by press releases? Either way, our trained operators are standing by to take your call.

ARMv6 is the sixth iteration of the ARM architecture; the chip family is ARM11, which was the generation immediately pre-Cortex. Various ARM11s were used in, e.g., the Nokia N810, the iPhone 3G, and the Nintendo 3DS. If you’re confused by the ARM nomenclature, Wikipedia has a really good table. ^[return]
Fun side note: if you want zero-copy video through GL, thanks to the unbelievably harsh wording of GL_OES_EGL_image_external, then you’re only allowed to use linear or nearest filtering. Not even bilinear, and definitely nothing near what overlays can do. ^[return]