Planet Igalia

May 26, 2016

Manuel Rego

CSS Grid Layout and positioned items

As part of the work done by Igalia in the CSS Grid Layout implementation on Chromium/Blink and Safari/WebKit, we’ve been implementing the support for positioned items. Yeah, absolute positioning inside a grid. 😅

Probably the first idea is that come to your mind is that you don’t want to use positioned grid items, but maybe in some use cases it can be needed. The idea of this post is to explain how they work inside a grid container as they have some particularities.

Actually there’s not such a big difference compared to regular grid items. When the grid container is the containing block of the positioned items (e.g. using position: relative; on the grid container) they’re placed almost the same than regular grid items. But, there’re a few differences:

  • Positioned items don't stretch by default.
  • They don't use the implicit grid. They don't create implicit tracks.
  • They don't occupy cells regarding auto-placement feature.
  • autohas a special meaning when referring lines.

Let’s explain with more detail each of these features.

Positioned items shrink to fit

We’re used to regular items that stretch by default to fill their area. However, that’s not the case for positioned items, similar to what a positioned regular block does, they shrink to fit.

This is pretty easy to get, but a simple example will make it crystal clear:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 200px; grid-template-rows: 200px 100px;">
    <div style="grid-column: 1 / 3; grid-row: 1 / 3;"></div>
    <div style="position: absolute; grid-column: 1 / 3; grid-row: 1 / 3;"></div>

In this example we’ve a simple 2x2 grid. Both the regular item and the positioned one are placed with the same rules taking the whole grid. This defines the area for those items, which takes the 1st & 2nd rows and 1st & 2nd columns.

Positioned items shrink to fit Positioned items shrink to fit

The regular item stretches by default both horizontally and vertically, so it takes the whole size of the grid area. However, the positioned item shrink to fit and adapts its size to the contents.

For the examples in the next points I’m ignoring this difference, as I want to show the area that each positioned item takes. To get the same result than in the pictures, you’d need to set 100% width and height on the positioned items.

Positioned items and implicit grid

Positioned items don’t participate in the layout of the grid, neither they affect how items are placed.

You can place a regular item outside the explicit grid, and the grid will create the required tracks to accommodate the item. However, in the case of positioned items, you cannot even refer to lines in the implicit grid, they'll be treated as auto. Which means that you cannot place a positioned item in the implicit grid. they cannot create implicit tracks as they don't participate in the layout of the grid.

Let’s use an example to understand this better:

<div style="display: grid; position: relative;
            grid-template-columns: 200px 100px; grid-template-rows: 100px 100px;
            grid-auto-columns: 100px; grid-auto-rows: 100px;
            width: 300px; height: 200px;">
    <div style="position: absolute; grid-area: 4 / 4;"></div>

The example defines a 2x2 grid, but the positioned item is using grid-area: 4 / 4; so it tries to goes to the 4th row and 4th column. However the positioned items cannot create those implicit tracks. So it’s positioned like if it has auto, which in this case will take the whole explicit grid. auto has a special meaning in positioned items, it’ll be properly explained later.

Positioned items do not create implicit tracks Positioned items do not create implicit tracks

Imagine another example where regular items create implicit tracks:

<div style="display: grid; position: relative;
            grid-template-columns: 200px 100px; grid-template-rows: 100px 100px;
            grid-auto-columns: 100px; grid-auto-rows: 100px;
            width: 500px; height: 400px;">
    <div style="grid-area: 1 / 4;"></div>
    <div style="grid-area: 4 / 1;"></div>
    <div style="position: absolute; grid-area: 4 / 4;"></div>

In this case, the regular items will be creating the implicit tracks, making a 4x4 grid in total. Now the positioned item can be placed on the 4th row and 4th column, even if those columns are on the explicit grid.

Positioned items can be placed on the implicit grid Positioned items can be placed on the implicit grid

As you can see this part of the post has been modified, thanks to @fantasai for notifying me about the mistake.

Positioned items and placement algorithm

Again the positioned items do not affect the position of other items, as they don’t participate in the placement algorithm.

So, if you’ve a positioned item and you’re using auto-placement for some regular items, it’s expected that the positioned one overlaps the other. The positioned items are completely ignored during auto-placement.

Just showing a simple example to show this behavior:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 300px; grid-template-rows: 200px 200px;">
    <div style="grid-column: auto; grid-row: auto;"></div>
    <div style="grid-column: auto; grid-row: auto;"></div>
    <div style="grid-column: auto; grid-row: auto;"></div>
    <div style="position: absolute;
                grid-column: 2 / 3; grid-row: 1 / 2;"></div>

Here we’ve again a 2x2 grid, with 3 auto-placed regular items, and 1 absolutely positioned item. As you can see the positioned item is placed on the 1st row and 2nd column, but there’s an auto-placed item in that cell too, which is below the positioned one. This shows that the grid container doesn’t care about positioned items and it just ignores them when it has to place regular items.

Positioned items and placement algorithm Positioned items and placement algorithm

If all the children were not positioned, the last one would be placed in the given position (1st row and 2nd column), and the rest of them (auto-placed) will take the other cells, without overlapping.

Positioned items and auto lines

This is probably the biggest difference compared to regular grid items. If you don’t specify a line, it’s considered that you’re using auto, but auto is not resolved as span 1 like in regular items. For positioned items auto is resolved to the padding edge.

The specification introduces the concepts of the lines 0 and -0, despite how weird it can sound, it actually makes sense. The auto lines would be referencing to those 0 and -0 lines, that represent the padding edges of the grid container.

Again let’s use a few examples to explain this:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 200px; grid-template-rows: 200px 200px;
            padding: 50px 100px; width: 500px; height: 400px;">
    <div style="position: absolute;
                grid-column: 1 / auto; grid-row: 2 / auto;"></div>

Here we have a 2x2 grid container, which has some padding. The positioned item will be placed in the 2nd row and 1st column, but its area will take up to the padding edges (as the end line is auto in both axis).

Positioned items and auto lines Positioned items and auto lines

We could even place positioned grid items on the padding itself. For example using “grid-column: auto / 1;” the item would be on the left padding.

Positioned items using auto lines to be placed on the left padding Positioned items using auto lines to be placed on the left padding

Of course if the grid is wider and we’ve some free space on the content box, the items will take that space too. For example:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 200px; grid-template-rows: 200px 200px;
            padding: 50px 100px; width: 600px; height: 400px;">
    <div style="position: absolute;
                grid-column: 3 / auto; grid-row: 2 / 3;"></div>

Here the grid columns are 500px, but the grid container has 600px width. This means that we’ve 100px of free space in the grid content box. As you can see in the example, that space will be also used when the positioned items extend up to the padding edges.

Positioned items taking free space and right padding Positioned items taking free space and right padding


Of course you can use offsets to place your positioned items (left, right, top and bottom properties).

These offsets will apply inside the grid area defined for the positioned items, following the rules explained above.

Let’s use another example:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 200px; grid-template-rows: 200px 200px;
            padding: 50px 100px; width: 500px; height: 400px;">
    <div style="position: absolute;
                grid-column: 1 / auto; grid-row: 2 / auto;
                left: 90px; right: 70px; top: 80px; bottom: 60px;"></div>

Again a 2x2 grid container with some padding. The positioned item have some offsets which are applied inside its grid area.

Positioned items and offets Positioned items and offets


I’m not completely sure about how important is the support of positioned elements for web authors using Grid Layout. You’ll be the ones that have to tell if you really find use cases that need this. I hope this post helps to understand it better and make your minds about real-life scenarios where this might be useful.

The good news is that you can test this already in the most recent versions of some major browsers: Chrome Canary, Safari Technology Preview and Firefox. We hope that the 3 implementations are interoperable, but please let us know if you find any issue.

There’s one last thing missing: alignment support for positioned items. This hasn’t been implemented yet in any of the browsers, but the behavior will be pretty similar to the one you can already use with regular grid items. Hopefully, we’ll have time to add support for this in the coming months.

Igalia and Bloomberg working together to build a better web Igalia and Bloomberg working together to build a better web

Last but least, thanks to Bloomberg for supporting Igalia in the CSS Grid Layout implementation on Blink and WebKit.

May 26, 2016 10:00 PM

May 24, 2016

Alberto Garcia

I/O bursts with QEMU 2.6

QEMU 2.6 was released a few days ago. One new feature that I have been working on is the new way to configure I/O limits in disk drives to allow bursts and increase the responsiveness of the virtual machine. In this post I’ll try to explain how it works.

The basic settings

First I will summarize the basic settings that were already available in earlier versions of QEMU.

Two aspects of the disk I/O can be limited: the number of bytes per second and the number of operations per second (IOPS). For each one of them the user can set a global limit or separate limits for read and write operations. This gives us a total of six different parameters.

I/O limits can be set using the throttling.* parameters of -drive, or using the QMP block_set_io_throttle command. These are the names of the parameters for both cases:

-drive block_set_io_throttle
throttling.iops-total iops
throttling.iops-read iops_rd
throttling.iops-write iops_wr
throttling.bps-total bps
throttling.bps-read bps_rd
throttling.bps-write bps_wr

It is possible to set limits for both IOPS and bps at the same time, and for each case we can decide whether to have separate read and write limits or not, but if iops-total is set then neither iops-read nor iops-write can be set. The same applies to bps-total and bps-read/write.

The default value of these parameters is 0, and it means unlimited.

In its most basic usage, the user can add a drive to QEMU with a limit of, say, 100 IOPS with the following -drive line:

-drive file=hd0.qcow2,throttling.iops-total=100

We can do the same using QMP. In this case all these parameters are mandatory, so we must set to 0 the ones that we don’t want to limit:

   { "execute": "block_set_io_throttle",
     "arguments": {
        "device": "virtio0",
        "iops": 100,
        "iops_rd": 0,
        "iops_wr": 0,
        "bps": 0,
        "bps_rd": 0,
        "bps_wr": 0

I/O bursts

While the settings that we have just seen are enough to prevent the virtual machine from performing too much I/O, it can be useful to allow the user to exceed those limits occasionally. This way we can have a more responsive VM that is able to cope better with peaks of activity while keeping the average limits lower the rest of the time.

Starting from QEMU 2.6, it is possible to allow the user to do bursts of I/O for a configurable amount of time. A burst is an amount of I/O that can exceed the basic limit, and there are two parameters that control them: their length and the maximum amount of I/O they allow. These two can be configured separately for each one of the six basic parameters described in the previous section, but here we’ll use ‘iops-total’ as an example.

The I/O limit during bursts is set using ‘iops-total-max’, and the maximum length (in seconds) is set with ‘iops-total-max-length’. So if we want to configure a drive with a basic limit of 100 IOPS and allow bursts of 2000 IOPS for 60 seconds, we would do it like this (the line is split for clarity):

   -drive file=hd0.qcow2,

Or with QMP:

   { "execute": "block_set_io_throttle",
     "arguments": {
        "device": "virtio0",
        "iops": 100,
        "iops_rd": 0,
        "iops_wr": 0,
        "bps": 0,
        "bps_rd": 0,
        "bps_wr": 0,
        "iops_max": 2000,
        "iops_max_length": 60,

With this, the user can perform I/O on hd0.qcow2 at a rate of 2000 IOPS for 1 minute before it’s throttled down to 100 IOPS.

The user will be able to do bursts again if there’s a sufficiently long period of time with unused I/O (see below for details).

The default value for ‘iops-total-max’ is 0 and it means that bursts are not allowed. ‘iops-total-max-length’ can only be set if ‘iops-total-max’ is set as well, and its default value is 1 second.

Controlling the size of I/O operations

When applying IOPS limits all I/O operations are treated equally regardless of their size. This means that the user can take advantage of this in order to circumvent the limits and submit one huge I/O request instead of several smaller ones.

QEMU provides a setting called throttling.iops-size to prevent this from happening. This setting specifies the size (in bytes) of an I/O request for accounting purposes. Larger requests will be counted proportionally to this size.

For example, if iops-size is set to 4096 then an 8KB request will be counted as two, and a 6KB request will be counted as one and a half. This only applies to requests larger than iops-size: smaller requests will be always counted as one, no matter their size.

The default value of iops-size is 0 and it means that the size of the requests is never taken into account when applying IOPS limits.

Applying I/O limits to groups of disks

In all the examples so far we have seen how to apply limits to the I/O performed on individual drives, but QEMU allows grouping drives so they all share the same limits.

This feature is available since QEMU 2.4. Please refer to the post I wrote when it was published for more details.

The Leaky Bucket algorithm

I/O limits in QEMU are implemented using the leaky bucket algorithm (specifically the “Leaky bucket as a meter” variant).

This algorithm uses the analogy of a bucket that leaks water constantly. The water that gets into the bucket represents the I/O that has been performed, and no more I/O is allowed once the bucket is full.

To see the way this corresponds to the throttling parameters in QEMU, consider the following values:

  • Water leaks from the bucket at a rate of 100 IOPS.
  • Water can be added to the bucket at a rate of 2000 IOPS.
  • The size of the bucket is 2000 x 60 = 120000.
  • If iops-total-max is unset then the bucket size is 100.


The bucket is initially empty, therefore water can be added until it’s full at a rate of 2000 IOPS (the burst rate). Once the bucket is full we can only add as much water as it leaks, therefore the I/O rate is reduced to 100 IOPS. If we add less water than it leaks then the bucket will start to empty, allowing for bursts again.

Note that since water is leaking from the bucket even during bursts, it will take a bit more than 60 seconds at 2000 IOPS to fill it up. After those 60 seconds the bucket will have leaked 60 x 100 = 6000, allowing for 3 more seconds of I/O at 2000 IOPS.

Also, due to the way the algorithm works, longer burst can be done at a lower I/O rate, e.g. 1000 IOPS during 120 seconds.


As usual, my work in QEMU is sponsored by Outscale and has been made possible by Igalia and the help of the QEMU development team.


Enjoy QEMU 2.6!

by berto at May 24, 2016 11:47 AM

May 23, 2016

Igalia Compilers Team

Awaiting the future of JavaScript in V8

On the evening of Monday, May 16th, 2016, we have made history. We’ve landed the initial implementation of “Async Functions” in V8, the JavaScript runtime in use by the Google Chrome and Node.js. We do these things not because they are easy, but because they are hard. Because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one we are willing to accept. It is very exciting to see this, roughly 2 months of implementation, codereview and standards finangling/discussion to land. It is truly an honour.


To introduce you to Async Functions, it’s first necessary to understand two things: the status quo of async programming in JavaScript, as well as Generators (previously implemented by fellow Igalian Andy).


Async programming in JavaScript has historically been implemented by callbacks. `window.setTimeout(function toExecuteLaterOnceTimeHasPassed() {}, …)` being the common example. Callbacks on their own are not scalable: when numerous nested asynchronous operations are needed, code becomes extremely difficult to read and reason about. Abstraction libraries have been tacked on to improve this, including caolan’s async package, or Promise libraries such as Q. These abstractions simplify control flow management and data flow management, and are a massive improvement over plain Callbacks. But we can do better! For a more detailed look at Promises, have a look at the fantastic MDN article. Some great resources on why and how callbacks can lead to utter non-scalable disaster exist too, check out!


The second concept, Generators, allow a runtime to return from a function at an arbitrary line, and later re-enter that function at the following instruction, in order to continue execution. So right away you can imagine where this is going — we can continue execution of the same function, rather than writing a closure to continue execution in a new function. Async Functions rely on this same mechanism (and in fact, on the underlying Generators implementation), to achieve their goal, immensely simplifying non-trivial coordination of asynchronous operations.


As a simple example, lets compare the following two approaches:
function deployApplication() {
  return cleanDirectory(__DEPLOYMENT_DIR__).
      deps => Promise.all(
          dep => moveToDeploymentSite(
    then(() => compileSources(__SRC_DIR__,

The Promise boiler plate makes this preit harder to read and follow than it could be. And what happens if an error occurs? Do we want to add catch handlers to each link in the Promise chain? That will only make it even more difficult to follow, with error handling interleaved in difficult to read ways.

Lets refactor this using async functions:

async function deployApplication() {
    await cleanDIrectory(__DEPLOYMENT_DIR__);
    let dependencies = await fetchNpmDependencies();

    // *see below*
    for (let dep of dependencies) {
        await moveToDeploymentSite(

    await compileSources(__SRC_DIR__,
    return uploadToServer();


You’ll notice that the “moveToDeploymentSite” step is slightly different in the async function version, in that it completes each operation in a serial pipeline, rather than completing each operation in parallel, and continuing once finished. This is an unfortunate limitation of the async function specification, which will hopefully be improved on in the future.


In the meantime, it’s still possible to use the Promise API in async functions, as you can `await` any Promise, and continue execution after it is resolved. This grants compatibility with  numerous existing Web Platform APIs (such as `fetch()`), which is ultimately a good thing! Here’s an alternative implementation of this step, which performs the `moveToDeploymentSite()` bits in parallel, rather than serially:


await Promise.all(
  dep => moveToDeploymentSite(


Now, it’s clear from the let dependencies = await fetchNpmDependencies(); line that Promises are unwrapped automatically. What happens if the promise is rejected with an error, rather than resolved with a value? With try-catch blocks, we can catch rejected promise errors inside async functions! And if they are not caught, they will automatically return a rejected Promise from the async function.

function throwsError() { throw new Error("oops"); }

async function foo() { throwsError(); }

// will print the Error thrown in `throwsError`.

async function bar() {
  try {
    var value = await foo();
  } catch (error) {
    // Rejected Promise is unwrapped automatically, and
    // execution continues here, allowing us to recover
    // from the error! `error` is `new Error("oops!")`


There are also lots of convenient forms of async function declarations, which hopefully serve lots of interesting use-cases! You can concisely declare methods as asynchronous in Object literals and ES6 classes, by preceding the method name with the `async` keyword (without a preceding line terminator!)


class C {
  async doAsyncOperation() {
    // ...

var obj = {
  async getFacebookProfileAsynchronously() {
    /* ... */


These features allow us to write more idiomatic, easier to understand asynchronous control flow in our applications, and future extensions to the ECMAScript specification will enable even more idiomatic forms for writing complex algorithms, in a maintainable and readable fashion. We are very excited about this! There are numerous other resources on the web detailing async functions, their benefits, and perhaps ways they might be improved in the future. Some good ones include [this piece from Google’s Jake Archibald](, so give that a read for more details. It’s a few years old, but it holds up nicely!


So, now that you’ve seen the overview of the feature, you might be wondering how you can try it out, and when it will be available for use. For the next few weeks, it’s still too experimental even for the “Experimental Javascript” flag.  But if you are adventurous, you can try it already!  Fetch the latest Chrome Canary build, and start Chrome with the command-line-flag `–js-flags=”–harmony-async-await”`. We can’t make promises about the shipping timeline, but it could ship as early as Chrome 53 or Chrome 54, which will become stable in September or October.


We owe a shout out to Bloomberg, who have provided us with resources to improve the web platform that we love. Hopefully, we are providing their engineers with ways to write more maintainable, more performant, and more beautiful code. We hope to continue this working relationship in the future!


As well, shoutouts are owed to the Chromium team, who have assisted in reviewing the feature, verifying its stability, getting devtools integration working, and ultimately getting the code upstream. Terriffic! In addition, the WebKit team has also been very helpful, and hopefully we will see the feature land in JavaScriptCore in the not too distant future.

by compilers at May 23, 2016 02:08 PM

May 20, 2016

Víctor Jáquez

GStreamer Hackfest 2016

Yes, it happened again: the Gstreamer Spring Hackfest 2016! This time in the beautiful city of Thessaloniki. Thanks a lot, Vivia and Sebastian, for making it happen.

Thessaloniki’s promenade

My objective this time was to work with dma-buf support in gstreamer-vaapi. Though it is supported already, it needs a major clean up, and to extend its usage for downstream buffers (bugs 755072 and 765435).

In the way I learned that we need to update our internal API (called `libgstvaapi`), when handling dma-buf, to support mult-plane formats.

On the other hand, Nicolas Dufresne and I talked a bit about kmssink, libdrm and dma-buf. He managed to hack his Odroid U (Exynos3) to enable its V4L2 mem2mem video decoder and share buffers with kmssink. It was amazing. By the way, he promised me to write a blog post with the instructions to replicate his deed.

GStreamer Hackfest
Photo by Luis de Bethencourt

Finally, we had a preview of Edward Hervey‘s decodebin3. It is fun his test of switching the different audio streams in a media container (the different available dubbings) in every second or less. It was truly a multi-language audio!

In the meantime, we shared beers and meals, learning and laughing.

by vjaquez at May 20, 2016 10:32 AM

May 18, 2016

Antonio Gomes

[Chromium] content_shell running on Wayland desktop (Weston Compositor)

During my first weeks at Igalia, I got an interesting and promising task of Understand the status of Wayland support in Chromium upstream.

At first I could see clear signs that Wayland is being actively considered by the Chromium community:

  1. Ozone/Wayland project by Intel – which was my starting point as described later on.
  2. The meta bug “upstream wayland backend for ozone (20% project)“, which has some recent activity by the Chromium community.
  3. This comment in the chromium-dev mailing list (by a Googler):
  4. “(..) I ‘d recommend using ToT. Wayland support is a work-in-progress and newer trees will probably be better.”.

    Chromium’s DEPS file has “wayland” as one its core dependency (search for “wayland”).

Next step, naturally, was get my hands dirty, compiling and experimenting with it. I decided to start with content_shell.


Ubuntu 16.04 LTS, regular Gnome session (with X) and Weston Wayland compositor running (no ‘xwayland’ installed) to run Wayland apps.


component=static_library use_ash=1 use_aura=1 chromeos=0 use_ozone=1 ozone_platform_wayland=1 use_wayland_egl=1 ozone_auto_platforms=0 use_xkbcommon=1 clang=0 use_sysroot=0 linux_use_bundled_gold=0

(note: GYP was used for personal familiarity with it, but GN applies fine here).

Chromium version

Base SHA 5da650481 (as of 2016/May/17)

Initial results

As is, content_shell built fine, but hung at runtime upon launch, hitting a CHECK at

Analysis and Action

After understanding current Ozone/Wayland upstream support, compare designs/code against, I could start connecting some of the missing dots.

The following files were “ported” from

  • ui/views/widget/desktop_aura/ / h
  • ui/views/widget/desktop_aura/ / h
  • ui/views/widget/desktop_aura/ / h

And then, I implemented DesktopFactoryOzoneWayland class (ui/views/widget/ – it inherits from DesktopFactoryOzone, and implements the following pure virtual methods ::CreateWindowTreeHost and ::CreateDesktopScreen.

Initial result

After that, I could build and run content_shell with Weston Wayland Compositor (with no ‘xwayland’ installed). See a quick preview below.


As is, the UI process owns the the Wayland connection, and GPU process runs without GL support.

UI processes initializes Ozone by calling:

#0 ui::WaylandSurfaceFactory::WaylandSurfaceFactory
#1 ui::OzonePlatformWayland::InitializeUI
#2 ui::OzonePlatform::InitializeForUI();
#3 aura::Env::Init()
#4 aura::Env::CreateInstance()
#5 content::BrowserMainLoop::InitializeToolkit()
#X content::ContentMain()
On the other side, GPU process gets initialized by calling:
#0 ui::WaylandSurfaceFactory::WaylandSurfaceFactory
#1 ui::OzonePlatformWayland::InitializeGPU
#2 ui::OzonePlatform::InitializeForGPU();
#3 gfx::InitializeStaticGLBindings()
#4 gfx::GLSurface::InitializeOneOffImplementation()
#5 gfx::GLSurface::InitializeOneOff()
#6 content::GpuMain()

Differently from UI process, the GPU process call does not initialize OzonePlatformWayland::display_ and instead passes ‘nullptr’ to WaylandSurfaceFactory ctor.

Down the road on the GPU processes initialization WaylandSurfaceFactory::LoadEGLGLES2Bindings is called but bails out earlier explicitly because display_ is NIL.
Then, UI process falls back to software rendering (see call to WaylandSurfaceFactory::CreateCanvasForWidget).

Next step

  • So far I have experimented Ozone/Wayland support using “linux” as the target OS. As far as I can tell, most of the Ozone work upstream though has been focusing on “chromeos” builds instead (e.g. ozone/x11).
  • Hence the idea is to clean up the code and agree with Googlers / (Intel) people about how to best make use of this code.
  • It is being also discussed with some Googlers what the best way to tackle this lack of GL support is. Some real nice stuff are on the pipe here.


by agomes at May 18, 2016 08:21 PM

May 10, 2016

Sergio Villar

Automatizing the Grid

My Igalia colleagues and me have extensively reviewed how to create grids and how to position items inside the grid using different CSS properties. So far everything was more or less static. We declare the sizes of our columns/rows or define a set of grid areas and that’s it. Well, actually there is room for automatic stuff,  you can dynamically create new tracks just by adding items to positions outside the explicit grid. Furthermore the grid is able to auto-position items for you if you don’t really care much about the final destination.

Use Cases

But imagine the following use case. Let’s assume that you are designing the product catalog of your pretty nice web store. CSS Grid Layout is the obvious choice for such layout. Just set some columns and some rows and that’s mostly it. Thing is that your catalog is likely not static, but automatically generated from a query to some database where you store all that data. You cannot know a priori how many items you’re going to show (users normally can also filter results).

Grid already supports that use case. You can already define the number of columns (rows) and let the grid create as many rows (columns) as needed. Something like this:

    grid-template-columns: repeat(3, 100px);
    grid-auto-rows: 100px;
    grid-auto-flow: row;

Picture showing auto row creation


The Multicol Example

But you need more flexibility. You’re happy with the grid creating new tracks on demand in one axis, but you also want to have as many tracks as possible (depending on the available size) in the other axis. You’ve already designed web sites using CSS Multicol and you want your grid to behave like columns: 100px;

Note that in the case of multicol the specified column width is an “optimal” size, meaning that it could be enlarged/narrowed to fill the container once the number of columns is calculated.

So is it possible to tell grid to add as many tracks as needed to fill some available space? It was not but now we have…

Repeat to fill: auto-fill and auto-fit

The grid way to implement it is by using the recently added auto repeat syntax. The already known repeat() function accepts as a first argument two new keywords, auto-fill and auto-fit. Both of them will generate as many repetitions as needed to fill the available space with the specified tracks without overflowing. The only difference between them is that, auto-fit will additionally drop any empty track (meaning no items spanning through it) generated by repeat() after positioning grid items.

The use of these two new keywords has some limitations:

  • Tracks with intrinsic or flexible sizes cannot be combined with auto repetitions
  • There can be just one auto-repeat per axis at the most
  • repeat(auto-fill|fit,) accepts only one track size as second argument (all auto repeat tracks will have the same size)

Some examples of valid declarations:

grid-template-columns: repeat(auto-fill, minmax(200px, 1fr)) [last];
grid-template-rows: 100px repeat(auto-fit, 2em) repeat(10, minmax(15%, 1fr));
grid-template-columns: repeat(auto-fill, minmax(max-content, 300px));

And some examples of invalid ones:

grid-template-columns: min-content repeat(auto-fill, 25px) 10px;
grid-template-rows: repeat(auto-fit, 15px) repeat(auto-fill, minmax(10px, 100px);
grid-template-columns: repeat(auto-fill, min-content);

The Details

I mentioned that we cannot mix flexible and/or intrinsic tracks with auto repetitions, so why repeat(auto-fill, minmax(200px, 1fr)) is a valid declaration? Well, according to the grammar, the auto repeat syntax require something called a <fixed-size> track, which is basically a track which has a length (like 10px) or a percentage in either its min or max function. The following are all <fixed-size> tracks:

minmax(min-content, 200px)
minmax(5%, 1fr)

That length/percentage is the one used by the grid to compute the number of repetitions. You should also be aware of how the number of auto repeat tracks is computed. First of all you need a definite size on the axis where you want to use auto-repeat. If that is not the case then the number of repetitions will be 1.

The nice thing is that the definite size could be either the “normal” size (width: 250px), the max size (max-height: 15em) or even the min size (min-width: 200px). But there is an important difference, if either the size or max-size is definite, then the number of repetitions is the largest possible positive integer that does not cause the grid to overflow its grid container. Otherwise, if the min-size is definite, then the number of repetitions is the smallest possible positive integer that fulfills that minimum requirement.

For example, the following declaration will generate 4 rows:

height: 600px;
grid-template-rows: repeat(auto-fill, 140px);

But this one will generate 5:

min-height: 600px;
grid-template-rows: repeat(auto-fill, 140px);

I want to use it now!

Sure, no problem. I’ve just landed the support for auto-fill in both Blink and WebKit meaning that you’ll be able to test it (unprefixed) really soon in Chrome Canary and Safari Technology Preview (Firefox Nightly builds already support both auto-fill and auto-fit). Many thanks to our friends at Bloomberg for sponsoring this work. Enjoy!

by svillar at May 10, 2016 07:25 PM

May 02, 2016

Diego Pino

Network namespaces: IPv6 connectivity

In the last post I introduced network namespaces and showed a practical example on how to share IPv4 connectivity between a network namespace and a host. Before that post, I also wrote a short tutorial on how to set up an IPv6 tunnel using Hurricane Electric broker service. This kind of service allow us to get into the IPv6 realm using an IPv4 connection.

In this article I continue exploring network namespaces. Taking advantage of the work done in the aforementioned posts, I explain in this post how to share IPv6 connectivity between a host and a network namespace.

Let’s assume we already have a SIT tunnel (IPv6-in-IPv4 tunnel) enabled in our host and we’re able to ping an external IPv6 address. If you haven’t, I encourage you to check out set up an IPv6 tunnel.

$ ping6
PING 56 data bytes
64 bytes from icmp_seq=1 ttl=57 time=93.2 ms

I need to write an script which will create the network namespace and set it up accordingly. I call that script ns-ipv6. If the script works correctly, I should be able to ping an external IPv6 host from the namespace. Such script looks like this:

 1 #!/bin/bash
 3 set -x
 5 if [[ $EUID -ne 0 ]]; then
 6    echo "You must run this script as root."
 7    exit 1
 8 fi
10 VETH1_IPV6=fd00::1
11 VPEER1_IPV6=fd00::2
13 # Clean up.
14 ip netns del ns-ipv6 &>/dev/null
15 ip li del veth1 &> /dev/null
17 # Create network namespace.
18 ip netns add ns-ipv6
20 # Create veth pair.
21 ip li add name veth1 type veth peer name vpeer1
23 # Setup veth1 (host).
24 ip -6 addr add ${VETH1_IPV6}/64 dev veth1
25 ip li set dev veth1 up
27 # Setup vpeer1 (network namespace).
28 ip li set dev vpeer1 netns ns-ipv6
29 ip netns exec ns-ipv6 ip li set dev lo up
30 ip netns exec ns-ipv6 ip -6 addr add ${VPEER1_IPV6}/64 dev vpeer1
31 ip netns exec ns-ipv6 ip li set vpeer1 up
33 # Make vpeer1 default gw.
34 ip netns exec ns-ipv6 ip -6 route add default dev vpeer1 via ${VETH1_IPV6}
36 # NAT
37 sysctl -w net.ipv6.conf.all.forwarding=1
38 ip6tables -t nat --flush
39 ip6tables -t nat -A POSTROUTING -o he-ipv6 -j MASQUERADE
41 # Get into ns-ipv6.
42 ip netns exec ns-ipv6 /bin/bash --rcfile <(echo "PS1=\"ns-ipv6> \"")

It actually works:

$ sudo ./
ns-ipv6> ping6 -c 1 2a00:1450:4003:801::200e
PING 2a00:1450:4003:801::200e(2a00:1450:4003:801::200e) 56 data bytes
64 bytes from 2a00:1450:4003:801::200e: icmp_seq=1 ttl=54 time=83.7 ms

Let’s take a deeper look on how it works.


The script creates a veth pair to communicate the network namespace with the host. Each virtual interface is assigned an IPv6 address in the ‘fd00::0/64’ network space (Lines 21 and 27). This type of address is known as ULA or Unique Local Address. ULAs are the IPv6 counterpart of IPv4 private addresses.

Before continuing, a brief reminder on how IPv6 addresses work:

An IPv6 address is a 128-bit value represented as 8 blocks of 16-bit (8 x 16-bit = 128-bit). Blocks are separated by a colon (‘:’). Unlike IPv4 addresses, block values are written in hexadecimal. Since each block is a 16-bit value, they can be written in hexadecimal as 4-digit numbers. Leading zeros in each block can be ommitted. On the same hand, when several consecutive block values are zero they can be ommitted too. In that case two colons (‘::’) are written instead, meaning everything in between is nil. For instance, the address ‘fd00::1’ is the short form of the much longer address ‘fd00:0000:0000:0000:0000:0000:0000:0001’.

RFC 4193 (section 3) describes Unique Local Addresses format as:

| 7 bits |1|  40 bits   |  16 bits  |          64 bits           |
| Prefix |L| Global ID  | Subnet ID |        Interface ID        |

- Prefix is always fc00::/7.
- L:
  * set to 1, for local assigned addresses (block _fd00::/8_).
  * set to 0, not defined yet (block _fc00::/8_).
- Global ID: 40-bit global identifier used to create a globally unique prefix.
- Subnet ID: 16-bit Subnet ID is an identifier of a subnet within the site.
- Interface ID: 64-bit Interface ID.

The RFC reserves the IPv6 address block ‘fc00::/7’ for ULAs. It divides this address in two subnetworks: ‘fc00::/8’ and ‘fd00::/8’. The use of the ‘fc00::/8’ block has not been defined yet, while the ‘fd00:/8’ block is used for IPv6 local assigned addresses (private addresses).

The address ‘fd63:b1f4:7268:d970::1’ is an example of a valid ULA. It starts by the ‘fd’ prefix followed by an unique Global ID (‘63:b1f4:7268’) and a Subnet ID (‘d970’), leaving 64 bits for the Interface ID (‘::1’). I recommend the page Private IPv6 address range to obtain valid random ULAs.

ULAs are not routable in the global Internet. They are meant to be used inside local networks and that’s precisely the reason why they exist.

NAT on IPv6

Lines 34-36 activate IPv6 forwarding and IP Masquerade on the source address. However, this solution is not optimal.

The Hurricane Electric tunnel broker service lends us a ‘::0/64’ block, with 2^64 - 2 valid hosts. NAT, Network Address Translation, grants a host in a private network external connectivity via a proxy that lends the private host its address. This is the most common use case of NAT, known as Source NAT. Besides IP addresses, NAT translates port numbers too and that’s why it’s sometimes referred as NAPT (Network Address and Port Translation). NAT has been an important technology for optimizing the use of IPv4 address space, although it has its costs too.

The original goal of IPv6 was solving the problem of IP address exhaustation. Mechanisms such as NAT are not needed because the IPv6 address space is so big that every host could have an unique address, reachable from another end of the network. Actually, IPv6 brings back the original point-to-point design of the IPv4 Internet, before private addresses and NAT were introduced.

So let’s try to get rid of NAT66 (IPv6-to-IPv6 translation) by:

  • Using global IPv6 addresses.
  • Removing MASQUERADING.

The new script is available as a gist here: There’s some tricky bits that are worth explaining:

First thing, is to replace the ULAs by IPv6 addresses which belong to /64 block leased by Hurricane Electric:


When setting up the interfaces, the host side should add a more restricted routing rule for the other end of the veth pair. The reason is that all addresses belong to the same network. If from the host side a packet needs to get to the network namespace side, it would be routed through the IPv6 tunnel unless there’s a more restricted rule.

# Setup veth1 (host).
ip -6 addr add ${VETH1_IPV6} dev veth1
ip -6 route add ${VPEER1_IPV6}/128 dev veth1

Lastly, NAT66 can be removed but IP forwarding is still necessary as the host acts as a router.

# Enable IPv6 forwarding.
systctl net.ipv6.conf.all.forwarding=1

When a packet arrives from the network namespace into the host, the destination address of the packet doesn’t match any of the interfaces of the host. If IP forwarding were disabled, the packet will simply be dropped. However, since IP forwarding is enabled, non-delivered packets get forwarded through the host’s default gateway reaching their destination, hopefully.

After these changes, the script still works:

ns-ipv6-no-nat> ping6 2a00:1450:4004:801::200e
PING 2a00:1450:4004:801::200e(2a00:1450:4004:801::200e) 56 data bytes
64 bytes from 2a00:1450:4004:801::200e: icmp_seq=1 ttl=56 time=86.0 ms

DNS resolution

In the line above I’m pinging an IPv6 address directly (this address is actually What happens if I try to ping a host name instead?

ns-ipv6> ping6
unknown host

When I ping from the network namespace, the /etc/resolv.conf file is queried to obtain a DNS nameserver address.



This address is an IPv4 address, but the network namespace has IPv6 connectivity only. It cannot reach any host in the IPv4 realm. However, DNS resolution works in the host since the host has either IPv6 and IPv4 connectivity. It is necessary to add a DNS server with an IPv6 address. Luckily, Google has DNS servers available in the IPv6 realm too.

nameserver 2001:4860:4860::8888

Now I should be able to ping from the network namespace:

ns-ipv6> ping6 -c 1
PING 56 data bytes
64 bytes from icmp_seq=1 ttl=57 time=85.7 ms

--- ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 85.702/85.702/85.702/0.000 ms

Wrapping up

After all these changes we end up with a script that:

  • Uses Hurricane Electric’s IPv6 network addresses, instead of ULAs.
  • Doesn’t do NAT66 to provide external IPv6 connectivity to the network namespace.

It has been a lot of fun writing out this post, it helped me to understand many things better. I definitely encourage everyone interested to run some of the scripts above and try out IPv6, if you haven’t yet. The network namespace part is not fundamental but it makes it more interesting.

Lastly, I’d like to thank my colleague Carlos López for his unvaluable help as well as the StackOverflow community which helped me to figure out the script that gets rid of NAT66.

May 02, 2016 10:00 AM

April 29, 2016

Javier Muñoz

Scalable placement of replicated data in Ceph

One of the most interesting and powerful features in Ceph is the way how it computes the placement and storage of a growing amount of data at hyperscale.

This computation avoids the need to look up data locations in a central directory in order to allow nodes to be added or removed, moving as few objects as possible while still maintaining balance across new cluster configurations.

Ceph and the challenges of data placement at scale

Take into consideration the traditional standard storage array in the industry. It has the usual two controller units (for redundancy) and a lot of disk trays. Those controllers connect with a storage area network (SAN) in order to provide storage to clients (servers mainly).

The disk trays are connected to the storage controllers and all storage clients use the disks through those controllers. Scaling up the capacity and performance of the array requires adding/handling more disks to the controllers.

As expected, raising the order of scale the controllers become a bottleneck. They will get overloaded in some moment. The usual solution for this bottleneck is buying a new pair of controllers with a new array of disks and moving some load onto the new hardware.

This solution is expensive and not very flexible. It can work in the terabyte scale but it doesn't fit really well with the new Cloud industry demmanding elasticity and extreme flexibility on the petabyte/exabyte scale.

The Ceph approach to cope with these challenges is the design and implementation of a new scale-out distributed storage system based on commodity hardware communicating over a regular TCP/IP network.

This approach enables a fast and more affordable non-disruptive operational process on demmand while supporting the new petabyte scale with a very high performance. This kind of hyper scalable storage fits really well with the Cloud models.

The way how Ceph enables hyperscale storage is avoiding any kind of centralized coordination via autonomous and smart storage nodes. In this scale the design of the cluster software determines how many nodes the cluster can handle. If the cluster scales out to a hundred of nodes we will be working around the petabytes and millions of I/O operations per second (IOPS)

Another point to consider in this context is the inclusion of the new failure domains coming from the scale-out architecture. In this configuration each node is a new failure domain. This is usually handled by the software coordinating the cluster via copies of all data on at least two nodes. This approach is known as replica-based data protection.

Ceph is able to enable hyperscale storage via CRUSH, a hash-based algorithm for calculating how and where to store and retrieve data in a distributed-object storage cluster.

This major challenge of 'data placement at scale', and how it is designed and implemented, has a huge impact on the scalability and performance of massive storage solutions.

A "bird's-eye view" in the Ceph's data placement

In order to explore CRUSH we need to understand concepts such as pools, placement groups, OSDs and so on. If you are not familiar with the Ceph architecture I would suggest reading one of my previous posts where all this information is summarised together with pointers to the official documentation.

From now on I will consider you are familiar with the general concepts and we will keep the focus on CRUSH and related components.

Ceph uses CRUSH to determine how to store and retrieve data by computing data storage locations. Those data are handled as objects of variable size by Ceph in a simple flat namespace. On top of these native objects, Ceph build the rest of abstractions such as blocks, file systems or S3 buckets.

In this point we can reformulate the data placement problem how an object-to-osd mapping problem without loss of generality.

In Ceph, the objects belong to pools and pools are comprised of placement groups. Each placement group maps to a list of OSDs. This is the critical path you need to understand.

The pool is the way how Ceph divides the global storage. This division or partition is the abstraction used to define the resilience (number of replicas, etc.), the number of placement groups, the CRUSH ruleset, the ownership and so on. You can consider this abstraction as the right place to define the configuration of your policies, so each pool handles its own number of replicas, number of placement groups, etc.

The placement group is the abstraction used by Ceph to map objects to OSDs in a dynamic way. You can consider it as the placement or distribution unit in Ceph.

So how we go from objects to OSDs via pools and placements groups? It is straight.

In Ceph one object will be stored in a concrete pool so the pool identifier (a number) and the name of the object are used to uniquely identify the object in the system.

Those two values, the pool id and the name of the object, are used to get a placement group via hashing.

When a pool is created it is assigned a number of placement groups (PGs). One object is always stored in a concrete pool so the pool identifier (a number) and the name of the object is used to uniquely identify each object in the system.

With the pool identifier and the hashed name of the object, Ceph will compute the hash modulo the number of PGs to retrieve the dynamic list of OSDs.

In detail, the steps to compute for one placement group for the object named 'tree' in the pool 'images' (pool id 7) with 65536 placement groups would be...

  1. Hash the object name : hash('tree') = 0xA062B8CF
  2. Calculates the hash modulo the number of PGs : 0xA062B8CF % 65536 = 0xB8CF
  3. Get the pool id : 'images' = 7
  4. Prepends the pool id to 0xB8CF to get the placement group: 7.B8CF

Ceph uses this new placement group (7.B8CF) together with the cluster map and the placement rules to get the dynamic list of OSDs...

  • CRUSH('7.B8CF') = [4, 19, 3]

The size of this list is the number of replicas configured in the pool. The first OSD in the list is the primary, the next one is the secondary and so on.

Understanding CRUSH

The CRUSH (Controlled Replication Under Scalable Hashing) algorithm determines how to store and retrieve data by computing data storage locations. As mentioned in the previous lines, Ceph uses this approach to overcome the data placement at scale.

Under the hood, the algorithm requires knowing how the cluster storage is organized (device locations, hierarchies and so on). All this information is defined in the CRUSH map.

The roles and responsibilities of the CRUSH map are the following:

  • Together with a ruleset for each hierarchy, the map determines how Ceph stores data.

  • It contains at least one hierarchy of nodes and leaves.

  • The nodes of a hierarchy are called 'buckets'. Those buckets are defined by their type.

  • The data objects are distributed among the storage devices according to a per-device weight value, approximating an uniform probability distribution.

  • The hierarchies are arbitraries. They are defined according to your own needs but the leaf nodes always represent the OSDs, and each leaf belong to one node or bucket.

Let's see one example of default hierarchy...

$ ceph osd tree
-1 0.16672 root default                                      
-2 0.05557     host node-1                                   
 0 0.02779         osd.0        up  1.00000          1.00000 
 1 0.02779         osd.1        up  1.00000          1.00000 
-3 0.05557     host node-2                                   
 2 0.02779         osd.2        up  1.00000          1.00000 
 3 0.02779         osd.3        up  1.00000          1.00000 
-4 0.05557     host node-3                                   
 4 0.02779         osd.4        up  1.00000          1.00000 
 5 0.02779         osd.5        up  1.00000          1.00000 

In this case the bucket hierarchy has six leaf buckets (osd 0-5), three host buckets (node 1-3) and one root node (default)

Having a look in the decompiled map we get the following content...

$ ceph osd getcrushmap -o compiled-crush-map.bin
got crush map from osdmap epoch 65
$ crushtool -d compiled-crush-map.bin -o decompiled-crush-map.txt
$ cat decompiled-crush-map.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host node-1 {
id -20 # do not change unnecessarily
# weight 0.056
alg straw
hash 0 # rjenkins1
item osd.0 weight 0.028
item osd.1 weight 0.028
host node-2 {
id -32 # do not change unnecessarily
# weight 0.056
alg straw
hash 0 # rjenkins1
item osd.2 weight 0.028
item osd.3 weight 0.028
host node-3 {
id -48 # do not change unnecessarily
# weight 0.056
alg straw
hash 0 # rjenkins1
item osd.4 weight 0.028
item osd.5 weight 0.028
root default {
id -10 # do not change unnecessarily
# weight 0.167
alg straw
hash 0 # rjenkins1
item node-1 weight 0.056
item node-2 weight 0.056
item node-3 weight 0.056

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit

# end crush map

As you can see the bucket declaration requires specifying its type, an unique name, weight and hash algorithm.

[bucket-type] [bucket-name] {
   id [a unique negative numeric id]
   weight [the relative capacity/capability of the item(s)]
   alg [uniform, list, tree, straw, straw2]
   hash [the hash type]
   item [item-name] weight [weight]

The kinds of buckets (uniform, list, tree, straw2, etc) represent internal (non-leaf) nodes in the cluster hierarchies. Those buckets are based on different internal data structures and utilize different functions for pseudo-random choosing nested items.

The typical example is the uniform bucket, in this case all selected items are restricted in that they must contain items that are all of the same weight.

The hash type represent the hash algorithm used in the functions associated with the different kinds of buckets.

Finally, the CRUSH rules define how a Ceph client and OSDs select buckets.

rule  {
   type [ replicated | raid4]
   step take 
   step [choose|chooseleaf] [firstn|indep]  
   step emit

The Jenkins hash function

We mentioned Ceph, and CRUSH, use a hash function as part of their logic but we didn't comment anything about this function yet.

This function is known as the Jenkins hash function, a hash function for hash table lookups. One paper covering the technical details on this hash function is available here.

The paper presents fast and reliable hash functions for table lookup using 32-bit or 64-bit arithmetic together with a framework for evaluating hash functions.

In Ceph, the Jenkins function is not only used in CRUSH as part of the replicas selection. It is used along the Ceph codebase when some hashing requirement is needed.

As Jenkins comments on his paper, these hashes work equally well on all types of inputs, including text, numbers, compressed data, counting sequences, and sparse bit arrays.

I would highlight the following points related to the hash function design. They are relevant to understand the hashing code while sequencing, masking, etc. in Ceph/CRUSH the different input values.

  1. If the hash value needs to be smaller that 32 (64) bits, you can mask the high bits.
  2. The hash functions work best if the size of the hash table is a power of 2.
  3. If the hash table has more than 232 (264) entries, this can be handled by calling the hash function twice with different initial initvals then concatenating the results.
  4. If the key consists of multiple strings, the strings can be hashed sequentially, passing in the hash value from the previous string as the initval for the next.
  5. Hashing a key with different initial initvals produces independent hash values.

Ceph and data placement in practice

Let's locate one object...

$ ceph health
$ ceph osd tree
-1 0.16672 root default                                      
-2 0.05557     host node-1                                   
 0 0.02779         osd.0        up  1.00000          1.00000 
 1 0.02779         osd.1        up  1.00000          1.00000 
-3 0.05557     host node-2                                   
 2 0.02779         osd.2        up  1.00000          1.00000 
 3 0.02779         osd.3        up  1.00000          1.00000 
-4 0.05557     host node-3                                   
 4 0.02779         osd.4        up  1.00000          1.00000 
 5 0.02779         osd.5        up  1.00000          1.00000 
$ ceph osd pool create mypool 256 256 replicated
pool 'mypool' created
$ ceph osd lspools
0 rbd,1 mypool,
$ ceph osd dump | grep mypool
pool 1 'mypool' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 256 pgp_num 256 last_change 59 flags hashpspool stripe_width 0
$ dd if=/dev/zero of=myobject bs=25M count=1 2> /dev/null
$ ls -sh myobject
25M myobject
$ rados -p mypool put myobject myobject
$ rados -p mypool stat myobject
mypool/myobject mtime 2016-04-30 10:12:10.000000, size 26214400
$ rados -p mypool ls
$ ceph osd map mypool myobject
osdmap e60 pool 'mypool' (1) object 'myobject' -> pg 1.5da41c62 (1.62)
-> up ([4,2,1], p4) acting ([4,2,1], p4)

It is mapping the object 'myobject' in the pool 'mypool' to pg 1.5da41c62 (1.62) and OSDs 4, 2 and 1

Let's see how it balances/replicates the object when OSDs go down...

$ ceph osd dump | grep mypool
pool 1 'mypool' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 256 pgp_num 256 last_change 59 flags hashpspool stripe_width 0
$ ceph osd map mypool myobject
osdmap e60 pool 'mypool' (1) object 'myobject' -> pg 1.5da41c62 (1.62)
-> up ([4,2,1], p4) acting ([4,2,1], p4)
$ ceph osd tree
-1 0.16672 root default                                      
-2 0.05557     host node-1                                   
 0 0.02779         osd.0        up  1.00000          1.00000 
 1 0.02779         osd.1        up  1.00000          1.00000 
-3 0.05557     host node-2                                   
 2 0.02779         osd.2      down  1.00000          1.00000 
 3 0.02779         osd.3        up  1.00000          1.00000 
-4 0.05557     host node-3                                   
 4 0.02779         osd.4        up  1.00000          1.00000 
 5 0.02779         osd.5        up  1.00000          1.00000 
$ ceph osd map mypool myobject
osdmap e66 pool 'mypool' (1) object 'myobject' -> pg 1.5da41c62 (1.62)
-> up ([4,1], p4) acting ([4,1], p4)
$ ceph osd tree
-1 0.16672 root default                                      
-2 0.05557     host node-1                                   
 0 0.02779         osd.0        up  1.00000          1.00000 
 1 0.02779         osd.1      down  1.00000          1.00000 
-3 0.05557     host node-2                                   
 2 0.02779         osd.2      down  1.00000          1.00000 
 3 0.02779         osd.3        up  1.00000          1.00000 
-4 0.05557     host node-3                                   
 4 0.02779         osd.4        up  1.00000          1.00000 
 5 0.02779         osd.5        up  1.00000          1.00000 
$ ceph osd map mypool myobject
osdmap e68 pool 'mypool' (1) object 'myobject' -> pg 1.5da41c62 (1.62)
-> up ([4], p4) acting ([4], p4)
$ ceph osd map mypool myobject
osdmap e96 pool 'mypool' (1) object 'myobject' -> pg 1.5da41c62 (1.62)
-> up ([4,3], p4) acting ([4,3], p4)
$ ceph osd tree
-1 0.16672 root default                                      
-2 0.05557     host node-1                                   
 0 0.02779         osd.0        up  1.00000          1.00000 
 1 0.02779         osd.1        up  1.00000          1.00000 
-3 0.05557     host node-2                                   
 2 0.02779         osd.2      down        0          1.00000 
 3 0.02779         osd.3        up  1.00000          1.00000 
-4 0.05557     host node-3                                   
 4 0.02779         osd.4        up  1.00000          1.00000 
 5 0.02779         osd.5        up  1.00000          1.00000
$ ceph osd map mypool myobject
osdmap e103 pool 'mypool' (1) object 'myobject' -> pg 1.5da41c62 (1.62)
-> up ([4,1,3], p4) acting ([4,1,3], p4)
$ ceph osd tree
-1 0.16672 root default                                      
-2 0.05557     host node-1                                   
 0 0.02779         osd.0        up  1.00000          1.00000 
 1 0.02779         osd.1        up  1.00000          1.00000 
-3 0.05557     host node-2                                   
 2 0.02779         osd.2        up  1.00000          1.00000 
 3 0.02779         osd.3        up  1.00000          1.00000 
-4 0.05557     host node-3                                   
 4 0.02779         osd.4        up  1.00000          1.00000 
 5 0.02779         osd.5        up  1.00000          1.00000
$ ceph osd map mypool myobject
osdmap e108 pool 'mypool' (1) object 'myobject' -> pg 1.5da41c62 (1.62)
-> up ([4,2,1], p4) acting ([4,2,1], p4)
$ ceph osd tree
-1 0.16672 root default                                      
-2 0.05557     host node-1                                   
 0 0.02779         osd.0        up  1.00000          1.00000 
 1 0.02779         osd.1        up  1.00000          1.00000 
-3 0.05557     host node-2                                   
 2 0.02779         osd.2        up  1.00000          1.00000 
 3 0.02779         osd.3        up  1.00000          1.00000 
-4 0.05557     host node-3                                   
 4 0.02779         osd.4      down  1.00000          1.00000 
 5 0.02779         osd.5        up  1.00000          1.00000 
$ ceph osd map mypool myobject
osdmap e110 pool 'mypool' (1) object 'myobject' -> pg 1.5da41c62 (1.62)
-> up ([2,1], p2) acting ([2,1], p2)
$ ceph osd tree
-1 0.16672 root default                                      
-2 0.05557     host node-1                                   
 0 0.02779         osd.0        up  1.00000          1.00000 
 1 0.02779         osd.1        up  1.00000          1.00000 
-3 0.05557     host node-2                                   
 2 0.02779         osd.2        up  1.00000          1.00000 
 3 0.02779         osd.3        up  1.00000          1.00000 
-4 0.05557     host node-3                                   
 4 0.02779         osd.4        up  1.00000          1.00000 
 5 0.02779         osd.5        up  1.00000          1.00000 
$ ceph osd map mypool myobject
osdmap e115 pool 'mypool' (1) object 'myobject' -> pg 1.5da41c62 (1.62)
-> up ([4,2,1], p4) acting ([4,2,1], p4)

It works as expected. It uses three replicas with a minimal size of two replicas.


Along this post some challenges of data placement at hyperscale were introduced together with the Ceph approach (smart storage nodes, CRUSH algorithm, Jenkins hash function...) to address them.

Some practical examples to illustrate the mapping and data placement path are also available in the previous lines. They were tested on the new Ceph Jewel release.

As usual, I would say the primary reference to understand the current Ceph data placement is the source code. I would suggest to read the Sage's thesis (6.2, 6.3 and 6.4 sections) to know more on the roots of the current solution too. These sections cover reliable autonomic storage, performance and scalability. Beyond of these references you might also find useful the official documentation.

If you are looking for some kind of support related to development, design, deployment, etc. in Ceph or you would love to see some new feature in the next releases. Feel free to contact me!

by Javier at April 29, 2016 10:00 PM

April 26, 2016

Adrián Pérez

Identifying Layer-7 packet flows in SnabbWall

Spring is here already, the snow has melted a while ago, and it looks like a good time to write a bit about network traffic flows, as promised in my previous post about ljndpi. Why so? Well, looking at network traffic and grouping it into logical streams between two endpoints is something that needs to be done for SnabbWall, a suite of Snabb applications which implement a Layer-7 analyzer and firewall which Igalia is developing with sponsorship from the NLnet Foundation.


(For those interested in following only my Snabb-related posts, there are separate feeds you can use: RSS, Atom.)

Going With the Flow

Any sequence of related network packets between two hosts can be a network traffic flow. But not quite so: the exact definition may vary, depending on the level at which we are working. For example, an ISP may want to consider all packets between the pair of hosts —regardless of their contents— as part of the same flow in order to account for transferred data in metered connections, but for SnabbWall we want “application-level” traffic flows. That is: all packets generated (or received) by the same application should be classified into the same flow.

But that can get tricky, because even if we looked only at TCP traffic from one application, it is not possible just map a single connection to one flow. Take FTP for example: in active mode it uses a control connection, plus an additional data connection, and both should be considered part of the same flow because both belong to the same application. On the other side of the spectrum are web browsers like the one you are probably using to read this article: it will load the HTML using one connection, and then other related content (CSS, JavaScript, images) needed to display the web page.

In SnabbWall, the assignment of packets to flows is done based on the following fields from the packet:

  • 802.1Q VLAN tag.
  • Source and destination IP addresses.
  • Source and destination port numbers.

The VLAN tag is there to force classifying packets with the same source and destination but in different logical networks in separate packet flows. As for port numbers, in practice these are only extracted from packets when the upper layer protocol is UDP or TCP. There are other protocols which use port numbers, but they are deliberately left out (for now) because either nDPI does not support them, or they are not widely adopted (SCTP comes to mind).

Some Implementation Details

Determining the flow to which packets belong is an important task which is performed for each single packet scanned. Even before packet contents are inspected, they have to be classified.

Handling flows is split in two in the SnabbWall packet scanner: a generic implementation inspects packets to extract the fields above (VLAN tag, addresses, ports) and calculates a unique flow key from them, while backend-specific code inspects the contents of the packet and identifies the application for a flow of packets. Once the generic part of the code has calculated a key, it can be used to keep tables which associate additional data to each flow. While SnabbWall has only one backend which at the moment which uses nDPI, this split makes it easier to add others in the future.

For efficiency —both in terms of memory and CPU usage— flow keys are represented using a C struct. The following snippet shows the one for IPv4 packets, with a similar one where the address fields are 16 bytes wide being used for IPv6 packets:

ffi.cdef [[
   struct swall_flow_key_ipv4 {
      uint16_t vlan_id;
      uint8_t  __pad;
      uint8_t  ip_proto;
      uint8_t  lo_addr[4];
      uint8_t  hi_addr[4];
      uint16_t lo_port;
      uint16_t hi_port;
   } __attribute__((packed));

local flow_key_ipv4 = ffi.metatype("struct swall_flow_key_ipv4", {
   __index = {
      hash = make_cdata_hash_function(ffi.sizeof("struct swall_flow_key_ipv4")),

The struct is laid out with an extra byte of padding, to ensure that its size is a multiple of 4. Why so? The hash function (borrowed from the lib.ctable module) used for flow keys works on inputs with sizes multiple of 4 bytes because calculations are done in a word-by-word basis. In Lua the hash value for userdata values is their memory address, which makes them all different to each other: defining our own hashing function allows using the hash values as keys into tables, instead of the flow key itself. Let's see how this works with the following snippet, which counts per-flow packets:

local flows = {}
while not ended do
   local key = key_from_packet(read_packet())
   if flows[key:hash()] then
      flows[key:hash()].num_packets = flows[key:hash()].num_packets + 1
      flows[key:hash()] = { key = key, num_packets = 1 }

If we used the keys themselves instead of key:hash() for indexing the flows table, this wouldn't work because the userdata for the new key is created for each packet being processed, which means that keys with the same content created for different packets would have different hash values (their address in memory). On the other hand, the :hash() method always returns the same value keys with the same contents.

Highs and Lows

You may be wondering why our flow key struct has its members named lo_addr, hi_addr, lo_port and hi_port. It turns out that in packets which belong to the same application travel between two hosts in both directions. Let's consider the following:

  • Host A, with address
  • Host B, with address
  • A web browser from A connects (using randomly assigned port 10205) to host B, which has an HTTP server running in port 80.

The sequence of packets observed will go like this:

# Source IP Destination IP Source Port Destination Port
1 10205 80
2 80 10205
3 10205 80

If the flow key fields would be src_addr, dst_addr and so on, the first and second packets would be classified in separate flows — but they belong in the same one! This is sidestepped by sorting the addresses and ports of each packet when calculating its flow key. For the example connection above, all packets involved have as the “low IP address” (lo_addr), as the “high IP address” (hi_addr), 80 as the “low port” (lo_port), and 10205 as the “high port” (hi_port) — effectively classifying all the packets into the same flow.

This translates into some minor annoyance in the nDPI scanner backend because nDPI expects us to pass a pair of identifiers for the source and destination hosts for each packet inspected. Not a big deal, though.

Flow(er Power)

Something we have to do for IPv6 packets is traversing the chain of extension headers to get to the upper-layer protocol and extract port numbers from TCP and UDP packets. There can be any number of extension headers, and while in practice they should never be lots, this makes the amount of work needed to derive a flow key from a packet is not constant.

The good news is that RFC 6437 specifies using the 20-bit flow label field of the fixed IPv6 header in a way that, combined with the source and destination addresses, they uniquely identify the flow of the packet. This is all rainbows and ponies, but in practice the current behaviour would still be needed: the specification considers that an all-zeroes value indicates “packets that have not been labeled”. Which means that it is still needed to use the source and destination ports as fallback. What is even worse: while forbidden by the specification, flow labels can mutate while packets are en-route without any means of verifying that the change was made. Also, it is allowed to assign a new flow label to an unlabeled packet when packets are being forwarded. Nevertheless, using the flow label may be interesting to be used instead of the port numbers when the upper layer protocol is neither TCP nor UDP. Due to the limited usefulness, using IPv6 flow labels remains unimplemented for now, but I have not discarded adding support later on.

Something Else

Alongside with the packet scanner, I have implemented the L7Spy application, and the snabb wall command during this phase of the SnabbWall project. Expect another post soon about them!

by aperez ( at April 26, 2016 08:34 PM

April 15, 2016

Frédéric Wang

OpenType MATH in HarfBuzz


  • Work is in progress to add OpenType MATH support in HarfBuzz and will be instrumental for many math rendering engines relying on that library, including browsers.

  • For stretchy operators, an efficient way to determine the required number of glyphs and their overlaps has been implemented and is described here.

In the context of Igalia browser team effort to implement MathML support using TeX rules and OpenType features, I have started implementation of OpenType MATH support in HarfBuzz. This table from the OpenType standard is made of three subtables:

  • The MathConstants table, which contains layout constants. For example, the thickness of the fraction bar of ab\frac{a}{b}.

  • The MathGlyphInfo table, which contains glyph properties. For instance, the italic correction indicating how slanted an integral is e.g. to properly place the subscript in ∫D\displaystyle\displaystyle\int_{D}.

  • The MathVariants table, which provides larger size variants for a base glyph or data to build a glyph assembly. For example, either a larger parenthesis or a assembly of U+239B, U+239C, U+239D to write something like:


Code to parse this table was added to Gecko and WebKit two years ago. The existing code to build glyph assembly in these Web engines was adapted to use the MathVariants data instead of only private tables. However, as we will see below the MathVariants data to build glyph assembly is more general, with arbitrary number of glyphs or with additional constraints on glyph overlaps. Also there are various fallback mechanisms for old fonts and other bugs that I think we could get rid of when we move to OpenType MATH fonts only.

In order to add MathML support in Blink, it is very easy to import the OpenType MATH parsing code from WebKit. However, after discussions with some Google developers, it seems that the best option is to directly add support for this table in HarfBuzz. Since this library is used by Gecko, by WebKit (at least the GTK port) and by many other applications such as Servo, XeTeX or LibreOffice it make senses to share the implementation to improve math rendering everywhere.

The idea for HarfBuzz is to add an API to

  1. 1.

    Expose data from the MathConstants and MathGlyphInfo.

  2. 2.

    Shape stretchy operators to some target size with the help of the MathVariants.

It is then up to a higher-level math rendering engine (e.g. TeX or MathML rendering engines) to beautifully display mathematical formulas using this API. The design choice for exposing MathConstants and MathGlyphInfo is almost obvious from the reading of the MATH table specification. The choice for the shaping API is a bit more complex and discussions is still in progress. For example because we want to accept stretching after glyph-level mirroring (e.g. to draw RTL clockwise integrals) we should accept any glyph and not just an input Unicode strings as it is the case for other HarfBuzz shaping functions. This shaping also depends on a stretching direction (horizontal/vertical) or on a target size (and Gecko even currently has various ways to approximate that target size). Finally, we should also have a way to expose italic correction for a glyph assembly or to approximate preferred width for Web rendering engines.

As I mentioned at the beginning, the data and algorithm to build glyph assembly is the most complex part of the OpenType MATH and deserves a special interest. The idea is that you have a list of n≥1n\geq 1 glyphs available to build the assembly. For each 0≤i≤n-10\leq i\leq n-1, the glyph gig_{i} has advance aia_{i} in the stretch direction. Each gig_{i} has straight connector part at its start (of length sis_{i}) and at its end (of length eie_{i}) so that we can align the glyphs on the stretch axis and glue them together. Also, some of the glyphs are “extenders” which means that they can be repeated 0, 1 or more times to make the assembly as large as possible. Finally, the end/start connectors of consecutive glyphs must overlap by at least a fixed value omino_{\mathrm{min}} to avoid gaps at some resolutions but of course without exceeding the length of the corresponding connectors. This gives some flexibility to adjust the size of the assembly and get closer to the target size tt.










Figure 1: Two adjacent glyphs in an assembly

To ensure that the width/height is distributed equally and the symmetry of the shape is preserved, the MATH table specification suggests the following iterative algorithm to determine the number of extenders and the connector overlaps to reach a minimal target size tt:

  1. 1.

    Assemble all parts by overlapping connectors by maximum amount, and removing all extenders. This gives the smallest possible result.

  2. 2.

    Determine how much extra width/height can be distributed into all connections between neighboring parts. If that is enough to achieve the size goal, extend each connection equally by changing overlaps of connectors to finish the job.

  3. 3.

    If all connections have been extended to minimum overlap and further growth is needed, add one of each extender, and repeat the process from the first step.

We note that at each step, each extender is repeated the same number of times r≥0r\geq 0. So if IExtI_{\mathrm{Ext}} (respectively INonExtI_{\mathrm{NonExt}}) is the set of indices 0≤i≤n-10\leq i\leq n-1 such that gig_{i} is an extender (respectively is not an extender) we have ri=rr_{i}=r (respectively ri=1r_{i}=1). The size we can reach at step rr is at most the one obtained with the minimal connector overlap omino_{\mathrm{min}} that is

∑i=0N-1(∑j=1riai-omin)+omin=(∑i∈INonExtai-omin)+(∑i∈IExtr⁢(ai-omin))+omin\sum_{i=0}^{N-1}\left(\sum_{j=1}^{r_{i}}{a_{i}-o_{\mathrm{min}}}\right)+o_{ \mathrm{min}}=\left(\sum_{i\in I_{\mathrm{NonExt}}}{a_{i}-o_{\mathrm{min}}} \right)+\left(\sum_{i\in I_{\mathrm{Ext}}}r{(a_{i}-o_{\mathrm{min}})}\right)+o% _{\mathrm{min}}

We let NExt=|IExt|N_{\mathrm{Ext}}={|I_{\mathrm{Ext}}|} and NNonExt=|INonExt|N_{\mathrm{NonExt}}={|I_{\mathrm{NonExt}}|} be the number of extenders and non-extenders. We also let SExt=∑i∈IExtaiS_{\mathrm{Ext}}=\sum_{i\in I_{\mathrm{Ext}}}a_{i} and SNonExt=∑i∈INonExtaiS_{\mathrm{NonExt}}=\sum_{i\in I_{\mathrm{NonExt}}}a_{i} be the sum of advances for extenders and non-extenders. If we want the advance of the glyph assembly to reach the minimal size tt then

SNonExt-omin⁢(NNonExt-1)+r⁢(SExt-omin⁢NExt)≥t{S_{\mathrm{NonExt}}-o_{\mathrm{min}}\left(N_{\mathrm{NonExt}}-1\right)}+{r% \left(S_{\mathrm{Ext}}-o_{\mathrm{min}}N_{\mathrm{Ext}}\right)}\geq t

We can assume 0" display="inline">SExt-omin⁢NExt>0S_{\mathrm{Ext}}-o_{\mathrm{min}}N_{\mathrm{Ext}}>0 or otherwise we would have the extreme case where the overlap takes at least the full advance of each extender. Then we obtain

r≥rmin=max⁡(0,⌈t-SNonExt+omin⁢(NNonExt-1)SExt-omin⁢NExt⌉)r\geq r_{\mathrm{min}}=\max\left(0,\left\lceil\frac{t-{S_{\mathrm{NonExt}}+o_{ \mathrm{min}}\left(N_{\mathrm{NonExt}}-1\right)}}{S_{\mathrm{Ext}}-o_{\mathrm{ min}}N_{\mathrm{Ext}}}\right\rceil\right)

This provides a first simplification of the algorithm sketched in the MATH table specification: Directly start iteration at step rminr_{\mathrm{min}}. Note that at each step we start at possibly different maximum overlaps and decrease all of them by a same value. It is not clear what to do when one of the overlap reaches omino_{\mathrm{min}} while others can still be decreased. However, the sketched algorithm says all the connectors should reach minimum overlap before the next increment of rr, which means the target size will indeed be reached at step rminr_{\mathrm{min}}.

One possible interpretation is to stop overlap decreasing for the adjacent connectors that reached minimum overlap and to continue uniform decreasing for the others until all the connectors reach minimum overlap. In that case we may lose equal distribution or symmetry. In practice, this should probably not matter much. So we propose instead the dual option which should behave more or less the same in most cases: Start with all overlaps set to omino_{\mathrm{min}} and increase them evenly to reach a same value oo. By the same reasoning as above we want the inequality

SNonExt-o⁢(NNonExt-1)+rmin⁢(SExt-o⁢NExt)≥t{S_{\mathrm{NonExt}}-o\left(N_{\mathrm{NonExt}}-1\right)}+{r_{\mathrm{min}} \left(S_{\mathrm{Ext}}-oN_{\mathrm{Ext}}\right)}\geq t

which can be rewritten

SNonExt+rmin⁢SExt-o⁢(NNonExt+rmin⁢NExt-1)≥tS_{\mathrm{NonExt}}+r_{\mathrm{min}}S_{\mathrm{Ext}}-{o\left(N_{\mathrm{NonExt% }}+{r_{\mathrm{min}}N_{\mathrm{Ext}}}-1\right)}\geq t

We note that N=NNonExt+rmin⁢NExtN=N_{\mathrm{NonExt}}+{r_{\mathrm{min}}N_{\mathrm{Ext}}} is just the exact number of glyphs used in the assembly. If there is only a single glyph, then the overlap value is irrelevant so we can assume NNonExt+r⁢NExt-1=N-1≥1N_{\mathrm{NonExt}}+{rN_{\mathrm{Ext}}}-1=N-1\geq 1. This provides the greatest theorical value for the overlap oo:

omin≤o≤omaxtheorical=SNonExt+rmin⁢SExt-tNNonExt+rmin⁢NExt-1o_{\mathrm{min}}\leq o\leq o_{\mathrm{max}}^{\mathrm{theorical}}=\frac{S_{ \mathrm{NonExt}}+r_{\mathrm{min}}S_{\mathrm{Ext}}-t}{N_{\mathrm{NonExt}}+{r_{ \mathrm{min}}N_{\mathrm{Ext}}}-1}

Of course, we also have to take into account the limit imposed by the start and end connector lengths. So omaxo_{\mathrm{max}} must also be at most min⁡(ei,si+1)\min{(e_{i},s_{i+1})} for 0≤i≤n-20\leq i\leq n-2. But if rmin≥2r_{\mathrm{min}}\geq 2 then extender copies are connected and so omaxo_{\mathrm{max}} must also be at most min⁡(ei,si)\min{(e_{i},s_{i})} for i∈IExti\in I_{\mathrm{Ext}}. To summarize, omaxo_{\mathrm{max}} is the minimum of omaxtheoricalo_{\mathrm{max}}^{\mathrm{theorical}}, of eie_{i} for 0≤i≤n-20\leq i\leq n-2, of sis_{i} 1≤i≤n-11\leq i\leq n-1 and possibly of e0e_{0} (if 0∈IExt0\in I_{\mathrm{Ext}}) and of of sn-1s_{n-1} (if n-1∈IExt{n-1}\in I_{\mathrm{Ext}}).

With the algorithm described above NExtN_{\mathrm{Ext}}, NNonExtN_{\mathrm{NonExt}}, SExtS_{\mathrm{Ext}}, SNonExtS_{\mathrm{NonExt}} and rminr_{\mathrm{min}} and omaxo_{\mathrm{max}} can all be obtained using simple loops on the glyphs gig_{i} and so the complexity is O⁢(n)O(n). In practice nn is small: For existing fonts, assemblies are made of at most three non-extenders and two extenders that is n≤5n\leq 5 (incidentally, Gecko and WebKit do not currently support larger values of nn). This means that all the operations described above can be considered to have constant complexity. This is much better than a naive implementation of the iterative algorithm sketched in the OpenType MATH table specification which seems to require at worst

∑r=0rmin-1NNonExt+r⁢NExt=NNonExt⁢rmin+rmin⁢(rmin-1)2⁢NExt=O⁢(n×rmin2)\sum_{r=0}^{r_{\mathrm{min}}-1}{N_{\mathrm{NonExt}}+rN_{\mathrm{Ext}}}=N_{ \mathrm{NonExt}}r_{\mathrm{min}}+\frac{r_{\mathrm{min}}\left(r_{\mathrm{min}}-% 1\right)}{2}N_{\mathrm{Ext}}={O(n\times r_{\mathrm{min}}^{2})}

and at least Ω⁢(rmin)\Omega(r_{\mathrm{min}}).

One of issue is that the number of extender repetitions rminr_{\mathrm{min}} and the number of glyphs in the assembly NN can become arbitrary large since the target size tt can take large values e.g. if one writes \underbrace{\hspace{65535em}} in LaTeX. The improvement proposed here does not solve that issue since setting the coordinates of each glyph in the assembly and painting them require Θ⁢(N)\Theta(N) operations as well as (in the case of HarfBuzz) a glyph buffer of size NN. However, such large stretchy operators do not happen in real-life mathematical formulas. Hence to avoid possible hangs in Web engines a solution is to impose a maximum limit NmaxN_{\mathrm{max}} for the number of glyph in the assembly so that the complexity is limited by the size of the DOM tree. Currently, the proposal for HarfBuzz is Nmax=128N_{\mathrm{max}}=128. This means that if each assembly glyph is 1em large you won’t be able to draw stretchy operators of size more than 128em, which sounds a quite reasonable bound. With the above proposal, rminr_{\mathrm{min}} and so NN can be determined very quickly and the cases N≥NmaxN\geq N_{\mathrm{max}} rejected, so that we avoid losing time with such edge cases…

Finally, because in our proposal we use the same overlap oo everywhere an alternative for HarfBuzz would be to set the output buffer size to nn (i.e. ignore r-1r-1 copies of each extender and only keep the first one). This will leave gaps that the client can fix by repeating extenders as long as oo is also provided. Then HarfBuzz math shaping can be done with a complexity in time and space of just O⁢(n)O(n) and it will be up to the client to optimize or limit the painting of extenders for large values of NN…

April 15, 2016 10:00 PM

April 10, 2016

Diego Pino

Network namespaces

Namespaces and cgroups are two of the main kernel technologies most of the new trend on software containerization (think Docker) rides on. To put it simple, cgroups are a metering and limiting mechanism, they control how much of a system resource (CPU, memory) you can use. On the other hand, namespaces limit what you can see. Thanks to namespaces processes have their own view of the system’s resources.

The Linux kernel provides 6 types of namespaces: pid, net, mnt, uts, ipc and user. For instance, a process inside a pid namespace only sees processes in the same namespace. Thanks to the mnt namespace, it’s possible to attach a process to its own filesystem (like chroot). In this article I focus only in network namespaces.

If you have grasped the concept of namespaces you may have at this point an intuitive idea of what a network namespace might offer. Network namespaces provide a brand-new network stack for all the processes within the namespace. That includes network interfaces, routing tables and iptables rules.

Network namespaces

From the system’s point of view, when creating a new process via clone() syscall, passing the flag CLONE_NEWNET will create a brand-new network namespace into the new process. From the user perspective, we simply use the tool ip (package is iproute2) to create a new persistent network namespace:

$ ip netns add ns1

This command will create a new network namespace called ns1. When the namespace is created, the ip command adds a bind mount point for it under /var/run/netns. This allows the namespace to persist even if there’s no process attached to it. To list the namespaces available in the system:

$ ls /var/run/netns

Or via ip:

$ ip netns

As previously said, a network namespace contains its own network resources: interfaces, routing tables, etc. Let’s add a loopback interface to ns1:

1 $ ip netns exec ns1 ip link set dev lo up
2 $ ip netns exec ns1 ping
3 PING ( 56(84) bytes of data.
4 64 bytes from icmp_seq=1 ttl=64 time=0.115 ms
  • Line 1 brings up the loopback interface inside the network namespace ns1.
  • Line 2 executes the command ping inside the network namespace.

An alternative syntax to bring up the loopback interface could be:

$ ip netns exec ns1 ifconfig lo up

However, I tend to use the command ip as it has become the preferred networking tool in Linux, obsoleting the old but more familiar commands ifconfig, route, etc. Notice that ip requires root privileges, so run it as root or prepend sudo.

A network namespace has its own routing table too:

$ ip netns exec ns1 ip route show

Which at this point returns nothing as we haven’t add any routing table rule yet. Generally speaking, any command run within a network namespace is prepend by the prologue:

$ ip netns exec <network-namespace>

A practical example

One of the consequences of network namespaces is that only one interface could be assigned to a namespace at a time. If the root namespace owns eth0, which provides access to the external world, only programs within the root namespace could reach the Internet. The solution is to communicate a namespace with the root namespace via a veth pair. A veth pair works like a patch cable, connecting two sides. It consists of two virtual interfaces, one of them is assigned to the root network namespace, while the other lives within a network namespace. Setting up their IP addresses and routing rules accordingly, plus enabling NAT in the host side, will be enough to provide Internet access to the network namespace.

Additionally, I feel like I need to make a clarification at this point. I’ve read in several articles about network namespaces that physical device interfaces can only live in the root namespace. At least that’s not the case with my current kernel (Linux 3.13). I can assign eth0 to a namespace other than the root and when setting it up properly have Internet access from the namespace. However, the limitation of one interface living only in one single namespace at a time still applies, and that’s a reason powerful enough to need connecting network namespace via a veth pair.

To start with, let’s create a new network namespace called ns1.

# Remove namespace if it exists.
ip netns del ns1 &>/dev/null

# Create namespace
ip netns add ns1

Next, create a veth pair. Interface v-eth1 will remain inside the root network namespace, while its peer, v-peer1, will be moved to the ns1 namespace.

# Create veth link.
ip link add v-eth1 type veth peer name v-peer1

# Add peer-1 to NS.
ip link set v-peer1 netns ns1

Next, setup IPv4 addresses for both interfaces and bring them up.

# Setup IP address of v-eth1.
ip addr add dev v-eth1
ip link set v-eth1 up

# Setup IP address of v-peer1.
ip netns exec ns1 ip addr add dev v-peer1
ip netns exec ns1 ip link set v-peer1 up
ip netns exec ns1 ip link set lo up

Additionally I brought up the loopback interface inside ns1.

Now it’s necessary we make all external traffic leaving ns1 to go through v-eth1.

ip netns exec ns1 ip route add default via

However this won’t be enough. As with any host sharing Its internet connection, it’s necessary to enable IPv4 forwarding in the host and enable masquerading.

# Share internet access between host and NS.

# Enable IP-forwarding.
echo 1 > /proc/sys/net/ipv4/ip_forward

# Flush forward rules, policy DROP by default.
iptables -P FORWARD DROP
iptables -F FORWARD

# Flush nat rules.
iptables -t nat -F

# Enable masquerading of
iptables -t nat -A POSTROUTING -s -o eth0 -j MASQUERADE

# Allow forwarding between eth0 and v-eth1.
iptables -A FORWARD -i eth0 -o v-eth1 -j ACCEPT
iptables -A FORWARD -o eth0 -i v-eth1 -j ACCEPT

If everything went fine, it would be possible to ping an external host from ns1.

$ ip netns exec ns1 ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=50 time=48.5 ms
64 bytes from icmp_seq=2 ttl=50 time=50.8 ms

This is how the routing table inside ns1 would look like after the setup:

$ ip netns exec ns1 ip route sh
default via dev v-peer1 dev v-peer1  proto kernel  scope link  src 

Prepending the ip netns exec prologue for every command to run from the namespace might be a bit tedious. Once the most basic features inside the namespace are setup, a more interesting possibility is to run a bash shell and attach it to the network namespace:

$ ip netns exec ns1 /bin/bash --rcfile <(echo "PS1=\"namespace ns1> \"")
namespace ns1> ping
PING ( 56(84) bytes of data.
64 bytes from ( icmp_seq=1 ttl=58 time=17.6 ms

Type exit to leave end the bash process and leave the network namespace.


Network namespaces, as well as other containerization technologies provided by the Linux kernel, are a lightweight mechanism for resource isolation. Processes attached to a network namespace see their own network stack, while not interfering with the rest of the system’s network stack.

Network namespaces are easy to use too. A similar network-level isolation could have been set up using a VM. However, that seems a much more expensive solution in terms of system resources and time investment to build up such environment. If you only need process isolation at the networking level, network namespaces are definitively something to consider.

The full script is available as a GitHub gist at:

More readings

April 10, 2016 09:30 PM

April 06, 2016

Víctor Jáquez

gstreamer-vaapi 1.8: the codec split

On march 23th GStreamer 1.8 was released, along with all its bundled modules, and, of course, one of those modules is gstreamer-vaapi.

Let us talk about this gstreamer-vaapi release, since there are several sweets!

First thing to notice is that the encoders have been renamed. Before they followed the pattern vaapiencode_{codec}, now they follow the pattern vaapi{codec}enc. The purpose of this change is twofold: to fix the plugins gtk-docs and to keep the usual element names in GStreamer. The conversion table is this:

Old New
vaapiencode_h264 vaapih264enc
vaapiencode_h265 vaapih265enc
vaapiencode_mpeg2 vaapimpeg2enc
vaapiencode_jpeg vaapijpegenc
vaapiencode_vp8 vaapivp8enc

But those were not the only name changes, we also have split the vaapidecode. Now we have a vaapijpegdec, which only decodes JPEG images, while keeping the old vaapidecode for video decoding. Also, vaapijpegdec was demoted to a marginal rank, because there are some problems in the Intel VA driver (which is the only one which supports JPEG decoding right now).

Note that in future releases, all the decoders will be split by codec, just as we did the JPEG decoder now; but first, we need to modify vaapidecodebin to choose a decoder in run-time based on the negotiated caps.

Please, update your scripts and code accordingly.

There are a ton of enhancements and optimizations too. Let me enumerate some of them: Vineeth TM fixed several memory leaks, and some compilations issues; Tim enabled vaapisink to send unhandled keyboard or mouse events to the application, making the usage of apps like gst-play-1.0 or apps based on GstPlayer be more natural; Thiago fixed the h264/h265 parsers, meanwhile Sree fixed the vp9 and the h265 ones too; Scott also fixed the h265 parser; et cetera. As may you see, H265/HEVC parser has been very active lately, it is the new thing!

I have to thank Sebastian Dröge, he did all the release work and also fixed a couple compilation issues.

This is the short log summary since 1.6:

 2  Lim Siew Hoon
 1  Scott D Phillips
 8  Sebastian Dröge
 3  Sreerenj Balachandran
 5  Thiago Santos
 1  Tim-Philipp Müller
 8  Vineeth TM
16  Víctor Manuel Jáquez Leal

by vjaquez at April 06, 2016 10:47 AM

March 31, 2016

Jacobo Aragunde

A new PhpReport for 2016

It’s been three years without any new release of our venerable time tracking tool, PhpReport. It doesn’t mean the project has been still during all this time; despite the slower development pace, you will find more than 80 patches in the repository since the last release, which account for 45 fixes or small features in the project.

It’s time to gather them all in a new release, PhpReport 2.16. These are the highlights:

CSV export

csv export button

All reports have now an option to export data to CSV, so they can be imported into other software. This was made with spreadsheets in mind, and I can confirm it works perfectly with our dear LibreOffice Calc.

Quick-access buttons

Quick access date form

Most reports got quick-access buttons added for the most common time periods: current and previous week, current month and year, etc.

Smarter “copy from yesterday”

The “copy from date” feature allows to copy tasks from a certain date into the current one. Its main use case is to copy everything from your previous work day, because you probably have been doing the same work, more or less during the same timetable… You can conveniently copy the tasks and just add some tweaks. The default date to copy from used to be the previous day, but you don’t usually want to copy your tasks from Sunday to Monday! Now it defaults to the previous date you have worked.


Addressed several slightly annoying behaviors to make the application more enjoyable to use. For example, when you create a project, the grid will scroll automatically to the newly added project so you can keep editing the project attributes. A similar thing happens when you have some row selected in the accumulated hours report and you load a new set of dates: the previously selected row will be kept selected and visible. Opening the project details from a report now pops up a new window so your report remains untouched when you go back to it. Inexplicably small grids now use all the screen space, and we increased the consistency among the different edition screens.

Moved to GitHub

The project sources, releases and the information previously available in the project website has been moved to GitHub, so any potential contributors will find a familiar environment if they happen to be interested in PhpReport.


I’ve bumped the version number from 2.1 straight to 2.16. The intention is to make one release per year, to guarantee some predictability regarding how and when the changes in the repository will arrive to users. You can expect PhpReport 2.17 releasing in March next year; it may look like a long time but I think it’s reasonable for a project with a low level of activity.

Looking for help!

Igalia has recently announced open positions in our Coding Experience program, and one of them is aimed to work in PhpReport. We have many ideas to improve this tool to make it more productive and powerful. Check the conditions of the program in the website if you are interested!

by Jacobo Aragunde Pérez at March 31, 2016 07:39 PM

Michael Catanzaro

Positive progress on WebKitGTK+ security updates

I previously reported that, although WebKitGTK+ releases regular upstream security updates, most Linux distributions are not taking the updates. At the time, only Arch Linux and Fedora were reliably releasing our security updates. So I’m quite pleased that openSUSE recently released a WebKitGTK+ security update, and then Mageia did too. Gentoo currently has an update in the works. It remains to be seen if these distros regularly follow up on updates (expect a follow-up post on this in a few months), but, optimistically, you now have several independent distros to choose from to get an updated version WebKitGTK+, plus any distros that regularly receive updates directly from these distros.

Unfortunately, not all is well yet. It’s still not safe to use WebKitGTK+ on the latest releases of Debian or Ubuntu, or on derivatives like Linux Mint, elementary OS, or Raspbian. (Raspbian is notable because it uses an ancient, insecure version of Epiphany as its default web browser, and Raspberry Pis are kind of popular.)

And of course, no distribution has been able to get rid of old, insecure WebKitGTK+ 2.4 compatibility packages, so many applications on distributions that do provide security updates for modern WebKitGTK+ will still be insecure. (Don’t be fooled by the recent WebKitGTK+ 2.4.10 update; it contains only a few security fixes that were easy to backport, and was spurred by the need to add GTK+ 3.20 compatibility. It is still not safe to use.) Nor have distributions managed to remove QtWebKit, which is also old and insecure. You still need to check individual applications to see if they are running safe versions of WebKit.

But at least there are now several distros providing WebKitGTK+ security updates. That’s good.

Special thanks to Apple and to my colleagues at Igalia for their work on the security advisories that motivate these updates.

by Michael Catanzaro at March 31, 2016 03:00 AM

Epiphany 3.20

So, what’s new in Epiphany 3.20?

First off: overlay scrollbars. Because web sites have the ability to style their scrollbars (which you’ve probably noticed on Google sites), WebKit embedders cannot use a normal GtkScrolledWindow to display content; instead, WebKit has to paint the scrollbars itself. Hence, when overlay scrollbars appeared in GTK+ 3.16, WebKit applications were left out. Carlos García Campos spent some time to work on this, and the result speaks for itself (if you fullscreen this video to see it properly):

Overlay scrollbars did not actually require any changes in Epiphany itself — all applications using an up-to-date version of WebKit will immediately benefit — but I mention it here as it’s one of the most noticeable changes. Read about other WebKit improvements, like the new Faster Than Light FTL/B3 JavaScript compilation tier, on Carlos’s blog.

Next up, there is a new downloads manager, also by Carlos García Campos. This replaces the old downloads bar that used to appear at the bottom of the screen:

Screenshot of the new downloads manager in Epiphany 3.20.

I flipped the switch in Epiphany to enable WebGL:

If you watched that video in fullscreen, you might have noticed that page is marked as insecure, even though it doesn’t use HTTPS. Like most browsers, we used to have several confusing security states. Pages with mixed content received a security warning that all users ignored, but pages with no security at all received no such warning. That’s pretty dumb, which is why Firefox and Chrome have been talking about changing this for a year or so now. I went ahead and implemented it. We now have exactly two security states: secure and insecure. If your page loads any content not over HTTPS, it will be marked as insecure. The vast majority of pages will be displayed as insecure, but it’s no less than such sites deserve. I’m not concerned at all about “warning fatigue,” because users are not generally expected to take any action on seeing these warnings. In the future, we will take this further, and use the insecure indicator for sites that use SHA-1 certificates.

Moving on. By popular request, I exposed the previously-hidden setting to disable session restore in the preferences dialog, as “Remember previous tabs on startup:”

Screenshot of the preferences dialog, with the new

Meanwhile, Carlos worked in both WebKit and Epiphany to greatly improve session restoration. Previously, Epiphany would save the URLs of the pages loaded in each tab, and when started it would load each URL in a new tab, but you wouldn’t have any history for those tabs, for example, and the state of the tab would otherwise be lost. Carlos worked on serializing the WebKit session state and exposing it in the WebKitGTK+ API, allowing us to restore full back/forward history for each tab, plus details like your scroll position on each tab. Thanks to Carlos, we also now make use of this functionality when reopening closed tabs, so your reopened tab will have a full back/forward list of history, and also when opening new tabs, so the new tab will inherit the history of the tab it was opened from (a feature that we had in the past, but lost when we switched to WebKit2).

Interestingly, we found the session restoration was at first too good: it would restore the page really exactly as you last viewed it, without refreshing the content at all. This means that if, for example, you were viewing a page in Bugzilla, then when starting the browser, you would miss any new comments from the last time you loaded the page until you refresh the page manually. This is actually the current behavior in Safari; it’s desirable on iOS to make the browser launch instantly, but questionable for desktop Safari. Carlos decided to always refresh the page content when restoring the session for WebKitGTK+.

Last, and perhaps least, there’s a new empty state displayed for new users, developed by Lorenzo Tilve and polished up by me, so that we don’t greet new users with a completely empty overview (where your most-visited sites are normally displayed):

Empty State

That, plus a bundle of the usual bugfixes, significant code cleanups, and internal architectual improvements (e.g. I converted the communication between the UI process and the web process extension to use private D-Bus connections instead of the session bus). The best things have not changed: it still starts up about 5-20 times faster than Firefox in my unscientific testing; I expect you’ll find similar results.


by Michael Catanzaro at March 31, 2016 01:54 AM

March 24, 2016

Andy Wingo

a simple (local) solution to the pay gap

International Working Women's Day was earlier this month, a day that reminds the world how far it has yet to go to achieve just treatment of women in the workplace. Obviously there are many fronts on which to fight to dismantle patriarchy, and also cissexism, and also transphobia, and also racism, and sometimes it gets a bit overwhelming just to think of a world where people treat each other right.

Against this backdrop, it's surprising that some policies are rarely mentioned by people working on social change. This article is about one of them -- a simple local change that can eliminate the pay gap across all axes of unfair privilege.


OK here it is: just pay everyone in a company the same hourly wage.

That's it!

on simple, on easy

But, you say, that's impossible!

Rich Hickey has this famous talk where he describes one thing as simple and the other as easy. In his narrative, simple is good but hard, and easy is bad but, you know, easy. I enjoy this talk because it's easy (hah!) to just call one thing simple and the other easy and it's codewords for good and bad, and you come across as having the facile prestidigitatory wisdom of a Malcolm Gladwell.

As far as simple, the substance of equal pay is as simple as it gets. And as far as practical implementation goes, it only needs buy-in from one person: your boss could do it tomorrow.

But, you say, a real business would never do this! This is getting closer to the real issues, but not there yet. There are plenty of instances of real businesses that do this. Incidentally, mine is one of them! I do not intend this to be an advertisement for my company, but I have to mention this early because society does its best to implant inside our brains the ideas that certain ideas are possible and certain others are not.

But, you say, this would be terrible for business! Here I think we are almost there. There's a question underneath, if we can manage to phrase it in a more scientific way -- I mean, the ideal sense in which science is a practice of humankind in which we use our limited powers to seek truth, with hypotheses but without prejudice. It might sound a bit pompous to invoke capital-S Science here, but I think few conversations of this kind try to honestly even consider existence proofs in the form of already-existing histories (like the company I work for), much less an unbiased study of the implications of modelling the future on those histories.

Let's assume that you and I want to work for justice, and in this more perfect world, men and women and nonbinary people will have equal pay for equal work, as will all people that lie on all axes of privilege that currently operate in society. If you are with me up to here: great. If not, we don't share a premise so it's not much use to go farther. You can probably skip to the next article in your reading list.

So, then, the questions: first of all, would a flat equal wage within a company actually help people in marginalized groups? What changes would happen to a company if it enacted a flat wage tomorrow? What are its limitations? How could this change come about?

would it help?

Let's take the most basic question first. How would this measure affect people in marginalized groups?

Let us assume that salaries are distributed inversely: the higher salaries are made by fewer people. A lower salary corresponds to more people. So firstly, we are in a situation where the median salary is less than the mean: that if we switched to pay everyone the mean, then most people would see an increase in their salary.

Assuming that marginalized people were evenly placed in a company, that would mean that most would benefit. But we know that is not the case: "marginalized" is the operative term. People are categorized at a lower point than their abilities; people's climb of the organizational hierarchy (and to higher salaries) is hindered by harassment, by undervalued diversity work, and by external structural factors, like institutionalized racism or the burden of having to go through a gender transition. So probably, even if a company touts equal pay within job classifications, the job classifications themselves unfairly put marginalized people lower than white dudes like me. So, proportionally marginalized people would benefit from an equal wage more than most.

Already this plan is looking pretty good: more money going to marginalized people is a necessary step to bootstrap a more just world.

All that said, many (but not most) people from marginalized groups will earn more than the mean. What for them? Some will decide that paying for a more just company as a whole is worth a salary reduction. (Incidentally, this applies to everyone: everyone has their price for justice. It might be 0.1%, it might be 5%, it might be 50%.)

Some, though, will decide it is not worth paying. They will go work elsewhere, probably for even more money (changing jobs being the best general way to advance your salary). I don't blame marginalized folks for getting all they can: more power to them.

From what I can tell, things are looking especially good for marginalized people under a local equal-wage initiative. Not perfect, not in all cases, but generally better.

won't someone think of the dudes

I don't believe in value as a zero-sum proposition: there are many ways in which a more fair world could be more productive, too. But in the short term, a balance sheet must balance. Salary increases in the bottom will come from salary decreases from the top, and the dudebro is top in tech.

We should first note that many and possibly most white men will see their wages increase under a flat-wage scheme, as most people earn below the mean.

Secondly, some men will be willing to pay for justice in the form of equal pay for equal work. An eloquent sales pitch as to what they are buying will help.

Some men would like to pay but have other obligations that a "mean" salary just can't even. Welp, there are lots of jobs out there. We'll give you a glowing recommendation :)

Finally there will be dudes that are fine with the pay gap. Maybe they have some sort of techno-libertarian justification? Oh well! They will find other jobs. As someone who cares about justice, you don't really want to work with these people anyway. Call it "bad culture fit", and treat it as a great policy to improve the composition of your organization.

an aside: what are we here for anyway?

A frequent objection to workplace change comes in the form of a pandering explanation of what companies are for, that corporations are legally obligated to always proceed along the the most profitable path.

I always find it extraordinarily ignorant to hear this parroted by people in tech: it's literally part of the CS canon to learn about the limitations of hill-climbing as an optimization strategy. But on the other hand, I do understand; the power of just-so neoliberal narrative is immense, filling your mind with pat explanations, cooling off your brain into a poorly annealed solid mass.

The funny thing about corporate determinism that it's not even true. Folks who say this have rarely run companies, otherwise they should know better. Loads of corporate decisions are made with a most tenuous link on profitability, and some that probably even go against the profit interest. It's always easy to go in a known-profitable direction, but that doesn't mean it's the only way to go, nor that all the profitable directions are known.

Sometimes this question is framed in the language of "what MyDesignCo really cares about is good design; we're worried about how this measure might affect our output". I respect this question more, because it's more materialist (you can actually answer the question!), but I disagree with the premise. I don't think any company really cares about the product in a significant way. Take the design company as an example. What do you want on your tombstone: "She made good advertisements"??? Don't get me wrong, I like my craft, and I enjoy practicing it with my colleagues. But if on my tombstone they wrote "He worked for justice", and also if there were a heaven, I would be p OK with that. What I'm saying is, you start a company, you have an initial idea, you pivot, whatever, it doesn't matter in the end. What matters is you relationship with life on the planet, and that is the criteria you should use to evaluate what you do.

Beyond all that -- it's amazing how much wrong you can wrap up in a snarky hacker news one-liner -- beyond all that, the concern begs the question by assuming that a flat-wage arrangement is less profitable. People will mention any down-side they can but never an up-side.

possible flat-wage up-sides from a corporate perspective

With that in mind, let's consider some ways that a flat wage can actually improve the commercial fate of a company.

A company with a flat wage already has a marketing point that they can use to attract people that care about this sort of thing. It can make your company stand out from the crowd and attract good people.

The people you attract will know you're doing the flat-wage thing, and so will be predisposed to want to work together. This can increase productivity. It also eliminates some material sources of conflict between different roles in an organization. You would still need "human resources" people but they would need to spend less time on mitigating the natural money-based conflicts that exist in other organizations.

Another positive side relates to the ability of the company to make collective sacrifices. For example a company that is going through harder times can collectively decide not to raise wages or even to lower them, rather than fire people. Obviously this outcome depends on the degree to which people feel responsible for the organization, which is incomplete without a feeling of collective self-management as in a cooperative, but even in a hierarchical organization these effects can be felt.

Incidentally a feeling of "investment" in the organization is another plus. When you work in a company in which compensation depends on random factors that you can't see, you always wonder if you're being cheated out of your true value. If everyone is being paid the same you know that everyone's interest in improving company revenue is aligned with their own salary interest -- you can't gain by screwing someone else over.

limitations of a flat wage at improving justice

All that said, paying all workers/partners/employees the same hourly wage is not a panacea for justice. It won't dismantle patriarchy overnight. It won't stop domestic violence, and it won't stop the cops from killing people of color. It won't stop microagressions or harassment in the workplace, and in some ways if there are feelings of resentment, it could even exacerbate them. It won't arrest attrition of marginalized people from the tech industry, and it won't fix hiring. Enacting the policy in a company won't fix the industry as a whole, even if all companies enacted it, as you would still have different wages at different companies. It won't fix the situation outside of the tech industry; a particularly egregious example being that in almost all places, cleaning staff are hired via subcontracts and not as employees. And finally, it won't resolve class conflict at work: the owner still owns. There are still pressures on the owner to keep the whole balance sheet secret, even if the human resources side of things is transparent.

All that said, these are mainly ways in which an equal wage policy is incomplete. A step in the right direction, on a justice level, but incomplete. In practice though the objections you get will be less related to justice and more commercial in nature. Let's take a look at some of them.

commercial challenges to a flat wage

Having everyone paid the same makes it extraordinarily difficult to hire people that are used to being paid on commission, like sales people. Sales people drive Rolexes and wear Mercedes. It is very, very tough to hire good sales people on salary. At my work we have had some limited success hiring, and some success growing technical folks into sales roles, but this compensation package will hinder your efforts to build and/or keep your sales team.

On the other hand, having the same compensation between sales and engineering does eliminate some of the usual sales-vs-product conflicts of interest.

Another point it that if you institute a flat-wage policy, you will expect to lose some fraction of your highly-skilled workers, as many of these are more highly paid. There are again some mitigations but it's still a reality. Perhaps more perniciously, you will have greater difficulties hiring senior people: you literally can't get into a bidding war with a competitor over a potential hire.

On the flip side, a flat salary can make it difficult to hire more junior positions. There are many theories here but I think that a company is healthy when it has a mix of experiences, that senior folks and junior folks bring different things to the table. But if your flat wage is higher than the standard junior wage, then your potential junior hires are now competing against more senior people -- internally it will be hard to keep a balance between different experiences.

Indeed junior workers that you already have are now competing at their wage level with potential hires that might be more qualified in some way. An unscrupulous management could fire those junior staff members and replace them with more senior candidates. An equal wage policy does not solve internal class conflicts; you need to have equal ownership and some form of workplace democracy for that.

You could sort people into pay grades, but in many ways this would formalize injustice. Marginalized people are by definition not equally distributed across pay grades.

Having a flat wage also removes a standard form of motivation, that your wage is always rising as you get older. It could be that after 5 years in a job, maybe your wages went up because the company's revenues went up, but they're still the same as a new hire's -- how do you feel about that? It's a tough question. I think an ever-rising wage has a lot of negative aspects, including decreasing the employability of older workers, but it's deeply rooted in tech culture at least.

Another point is motivation of people within the same cadre. Some people are motivated by bonuses, by performing relatively well compared to their peers. This wouldn't be an option in an organization with a purely flat wage. Does it matter? I do not know.

work with me tho

As the prophet Pratchett said, "against one perfect moment, the centuries beat in vain". There are some definite advantages to a flat wage within a company: it's concrete, it can be immediately enacted, it solves some immediate problems in a local way. Its commercial impact is unclear, but the force of narrative can bowl over many concerns in that department: what's important is to do the right thing. Everybody knows that!

As far as implementation, I see three-and-a-half ways this could happen in a company.

The first is that equal pay could be a founding principle of the company. This was mostly the case in the company I work for (and operate, and co-own equally with the other 40 or so partners). I wasn't a founder of the company, and the precise set of principles and policies has changed over the 15 years of the company's life, but it's more obvious for this arrangement to continue from a beginning than to change from the normal pay situation.

The second is, the change could come from the top down. Some CEOs get random brain waves and this happens. In this case, the change is super-easy to make: you proclaim the thing and it's done. As a person who has had to deal with cash-flow and payroll and balance sheets, I can tell you that this considerably simplifies HR from a management perspective.

The third is via collective action. This only works if workers are able to organize and can be convinced to be interested in justice in this specific way. In some companies, a worker's body might simply be able to negotiate this with management -- e.g., we try it out for 6 months and see. In most others you'd probably need to unionize and strike.

Finally, if this practice were more wider-spread in a sector, it could be that it just becomes "best practice" in some way -- that company management could be shamed into doing it, or it could just be the way things are done.


Many of these points are probably best enacted in the context of a worker-owned cooperative, where you can do away with the worker-owner conflict at the same time. But still, they are worth thinking of in a broader context, and worth evaluating in the degree to which they work for (or against) justice in the workplace. But enough blathering from me today :) Happy hacking!

by Andy Wingo at March 24, 2016 09:49 PM

March 22, 2016

Carlos García Campos

WebKitGTK+ 2.12

We did it again, the Igalia WebKit team is pleased to announce a new stable release of WebKitGTK+, with a bunch of bugs fixed, some new API bits and many other improvements. I’m going to talk here about some of the most important changes, but as usual you have more information in the NEWS file.


FTL JIT is a JavaScriptCore optimizing compiler that was developed using LLVM to do low-level optimizations. It’s been used by the Mac port since 2014 but we hadn’t been able to use it because it required some patches for LLVM to work on x86-64 that were not included in any official LLVM release, and there were also some crashes that only happened in Linux. At the beginning of this release cycle we already had LLVM 3.7 with all the required patches and the crashes had been fixed as well, so we finally enabled FTL for the GTK+ port. But in the middle of the release cycle Apple surprised us announcing that they had the new FTL B3 backend ready. B3 replaces LLVM and it’s entirely developed inside WebKit, so it doesn’t require any external dependency. JavaScriptCore developers quickly managed to make B3 work on Linux based ports and we decided to switch to B3 as soon as possible to avoid making a new release with LLVM to remove it in the next one. I’m not going to enter into the technical details of FTL and B3, because they are very well documented and it’s probably too boring for most of the people, the key point is that it improves the overall JavaScript performance in terms of speed.

Persistent GLib main loop sources

Another performance improvement introduced in WebKitGTK+ 2.12 has to do with main loop sources. WebKitGTK+ makes an extensive use the GLib main loop, it has its own RunLoop abstraction on top of GLib main loop that is used by all secondary processes and most of the secondary threads as well, scheduling main loop sources to send tasks between threads. JavaScript timers, animations, multimedia, the garbage collector, and many other features are based on scheduling main loop sources. In most of the cases we are actually scheduling the same callback all the time, but creating and destroying the GSource each time. We realized that creating and destroying main loop sources caused an overhead with an important impact in the performance. In WebKitGTK+ 2.12 all main loop sources were replaced by persistent sources, which are normal GSources that are never destroyed (unless they are not going to be scheduled anymore). We simply use the GSource ready time to make them active/inactive when we want to schedule/stop them.

Overlay scrollbars

GNOME designers have requested us to implement overlay scrollbars since they were introduced in GTK+, because WebKitGTK+ based applications didn’t look consistent with all other GTK+ applications. Since WebKit2, the web view is no longer a GtkScrollable, but it’s scrollable by itself using native scrollbars appearance or the one defined in the CSS. This means we have our own scrollbars implementation that we try to render as close as possible to the native ones, and that’s why it took us so long to find the time to implement overlay scrollbars. But WebKitGTK+ 2.12 finally implements them and are, of course, enabled by default. There’s no API to disable them, but we honor the GTK_OVERLAY_SCROLLING environment variable, so they can be disabled at runtime.

But the appearance was not the only thing that made our scrollbars inconsistent with the rest of the GTK+ applications, we also had a different behavior regarding the actions performed for mouse buttons, and some other bugs that are all fixed in 2.12.

The NetworkProcess is now mandatory

The network process was introduced in WebKitGTK+ since version 2.4 to be able to use multiple web processes. We had two different paths for loading resources depending on the process model being used. When using the shared secondary process model, resources were loaded by the web process directly, while when using the multiple web process model, the web processes sent the requests to the network process for being loaded. The maintenance of this two different paths was not easy, with some bugs happening only when using one model or the other, and also the network process gained features like the disk cache that were not available in the web process. In WebKitGTK+ 2.12 the non network process path has been removed, and the shared single process model has become the multiple web process model with a limit of 1. In practice it means that a single web process is still used, but the network happens in the network process.

NPAPI plugins in Wayland

I read it in many bug reports and mailing lists that NPAPI plugins will not be supported in wayland, so things like will not work. That’s not entirely true. NPAPI plugins can be windowed or windowless. Windowed plugins are those that use their own native window for rendering and handling events, implemented in X11 based systems using XEmbed protocol. Since Wayland doesn’t support XEmbed and doesn’t provide an alternative either, it’s true that windowed plugins will not be supported in Wayland. Windowless plugins don’t require any native window, they use the browser window for rendering and events are handled by the browser as well, using X11 drawable and X events in X11 based systems. So, it’s also true that windowless plugins having a UI will not be supported by Wayland either. However, not all windowless plugins have a UI, and there’s nothing X11 specific in the rest of the NPAPI plugins API, so there’s no reason why those can’t work in Wayland. And that’s exactly the case of, for example. In WebKitGTK+ 2.12 the X11 implementation of NPAPI plugins has been factored out, leaving the rest of the API implementation common and available to any window system used. That made it possible to support windowless NPAPI plugins with no UI in Wayland, and any other non X11 system, of course.


And as usual we have completed our API with some new additions:


by carlos garcia campos at March 22, 2016 10:36 AM

March 14, 2016

Javier Muñoz

Requester Pays Bucket goes upstream in Ceph

The last Requester Pays Buckets patches went upstream in Ceph some days ago. This new feature is available in the master branch now, and it will be part of the next Ceph Jewel release.

In S3, this feature is used to configure buckets in such a way that the user who request the contents will pay transfer fee.

Along this post I will introduce the feature in order to know how this concept maps to Ceph and how it works under the hood.

Understanding the feature

The Requester Pays Buckets feature originates in the Amazon S3 storage. It is part of the Amazon business model related to the Cloud storage.

In S3, the bucket owners pay for all Amazon S3 storage and data transfer costs associated with their buckets. This approach makes sense to cover the use cases where the bucket owners use the service to host/consume the content and/or they want to share the content with some authenticated users.

On the other hand, a relevant number of use cases use S3 to share a huge amount of content requiring some kind of option to balance the costs among the different content consumers in that bucket. This option is known as 'Requester Pays Buckets'

Bear in mind this feature becomes critical when the content is fairly popular and the bucket owner have many requests. This is the typical use case among global content distributors. In this case, the transfer fees may become a major issue.

Mapping the feature to Ceph

As mentioned, this feature comes from the S3 storage where it is used to balance costs related to data transfer in buckets. When enabled the requester pays for the data transfer and the request although the bucket owner still pays for the data storage.

The S3 algorithm also charges the bucket owner for the request under the following conditions:

  • The requester doesn't tag the request as 'Requester Pays Bucket'
  • The request authentication fails
  • The request is anonymous
  • The request is a SOAP request

Ceph does not implement any billing and account management service oriented to charge users so the feature can not be ported as it is.

In this point we made the decision to implement the mechanisms behind of this feature but keeping out the billing policies of Ceph. This way you can find the proper support to reproduce the original Amazon S3 billing behaviour although you will be free to wrap this support with different and more flexible billing policies if needed.

To keep the compatibility with the tools in the S3 ecosystem, the S3 interface of this feature is in place. The Boto library was used to test the proper behaviour.

In the backend, the usage logging was extended to accommodate the new feature. Now the usage logging records are not displayed by the bucket owner. They are listed by the bucket user where this user may be the owner or not.

This change required a new way to index the records in the usage logging although it doesn't break the compatibility with the previous Ceph versions. Bear in mind the old records are displayed in the new format.

There are two notable differences between the S3 and the RGW S3 algorithms. The S3 algorithm charges the bucket owner for the request if the requester doesn't tag the request as 'Requester Pays Bucket' or the request authentication fails. In the case of RGW S3 both cases are logged under the requester instead of the owner.


The Ceph RGW S3 REST API was extended to support the following use cases:

Ceph RGW S3 REST API implements the same behaviour and semantics as S3. It is needed to support the S3 tooling ecosystem in a transparent way.

Some examples with Python and Boto

You can use the following Python scripts to set, retrieve and download objects using the Requester Pays Buckets feature in Ceph. Those examples require Boto.

The usage log

The usage log is the place where the Requester Pays Bucket information is aggregated. There are three fields related with the feature ('user', 'owner' and 'payer').

The 'user' (or 'requester') is the client credential accessing the bucket content.

The 'owner' is the client credential creating the bucket.

The 'payer' is the client credential paying the data transfer. If this field doesn't exist the 'owner' is the client credential paying the data transfer.

The new virtual error buckets

One virtual error bucket is a new abstraction to log the usage on non existent resources (404 Not Found). All virtual error buckets have the same name ('-').

Having a look in the 'ops' and 'successful_ops' fields under the virtual error buckets, you will see the second one is always zero.

Each user has its own virtual error bucket to collect 404 errors. Ceph will add a virtual error bucket with the first 404 error available. The virtual error buckets live in the usage logging only.

Wrap up

With this new feature in place, Ceph implements the required support to know in detail who is accessing the RGW S3 buckets (owners vs authenticated users)

The feature brings in new ways to understand and track the bucket content in massive use cases where the costs may be assigned to different users.

The new usage logging also contains more detailed information to be used in regular reports to customers ('owner vs payer' categories, ops on virtual error buckets, etc)


My work in Ceph is sponsored by Outscale and has been made possible by Igalia and the invaluable help of the Ceph development team. Thank you guys!

by Javier at March 14, 2016 11:00 PM

March 13, 2016

Michael Catanzaro

Do you trust this application?

Much of the software you use is riddled with security vulnerabilities. Anyone who reads Matthew Garrett knows that most proprietary software is a lost cause. Some Linux advocates claim that free software is more secure than proprietary software, but it’s an open secret that tons of popular desktop Linux applications have many known, unfixed vulnerabilities. I rarely see anybody discuss this, as if it’s taboo, but it’s been obvious to me for a long time.

Usually vulnerabilities go unreported simply because nobody cares to look. Here’s an easy game: pick any application that makes HTTP connections — anything stuck on an old version of WebKit is a good place to start — and look for the following basic vulnerabilities:

  • Failure to use TLS when required (GNOME Music, GNOME Weather; note these are the only apps I mention here that do not use WebKit). This means the application has no security.
  • Failure to perform TLS certificate verification (Shotwell and Pantheon Photos). This means the application has no security against active attackers.
  • Failure to perform TLS certificate verification on subresources (Midori and XombreroLiferea). As sites usually send JavaScript in subresources, this means active attackers can get total control of the page by changing the script, without being detected (update: provided JavaScript is enabled). (Regrettably, Epiphany prior to 3.14.0 was also affected by this issue.)
  • Failure to perform TLS certificate verification before sending HTTP headers (private Midori bugBanshee). This leaks secure cookies, usually allowing attackers full access to your user account on a website. It also leaks the page you’re visiting, which HTTPS is supposed to keep private. (Update: Regrettably, Epiphany prior to 3.14.0 was affected by this issue. Also, the WebKit 2 API in WebKitGTK+ prior to 2.6.6, CVE-2015-2330.)

Except where noted, the latest release of all of the applications listed above are still vulnerable at the time of this writing, even though almost all of these bugs were reported long ago. With the exception of Shotwell, nobody has fixed any of these issues. Perhaps nobody working on the project cares to fix it, or perhaps nobody working on the project has the time or expertise to fix it, or perhaps nobody is working on the project anymore at all. This is all common in free software.

In the case of Shotwell, the issue has been fixed in git, but it might never be released because nobody works on Shotwell anymore. I informed distributors of the Shotwell vulnerability three months ago via the GNOME distributor list, our official mechanism for communicating with distributions, and advised them to update to a git snapshot. Most distributions ignored it. This is completely typical; to my knowledge, the stable releases of all Linux distributions except Fedora are still vulnerable.

If you want to play the above game, it should be very easy for you to add to my list by checking only popular desktop software. A good place to start would be to check if Liferea or Xombrero (supposedly a security-focused browser) perform TLS certificate verification before sending HTTP headers, or if Banshee performs verification on subresources, on the principle that vulnerable applications probably have other related vulnerabilities. (I did not bother to check.)

On a related note, many applications use insecure dependencies. Tons of popular GTK+ applications are stuck on an old, deprecated version of WebKitGTK+, for example. Many popular KDE applications use QtWebKit, which is old and deprecated. These deprecated versions of WebKit suffer from well over 100 remote code execution vulnerabilities fixed upstream that will probably never be backported. (100 is a lowball estimate; I would be unsurprised if the real number for QtWebKit was much, much higher.)

I do not claim that proprietary software is generally more secure than free software, because that is absolutely not true. Proprietary software vendors, including big name corporations that you might think would know better, are still churning out consumer products based on QtWebKit, for example. (This is unethical, but most proprietary software vendors do not care about security.) Not that it matters too much, as proprietary software vendors rarely provide comprehensive security updates anyway. (If your Android phone still gets updates, guess what: they’re superficial.) A few prominent proprietary software vendors really do care about security and do good work to keep their users safe, but they are rare exceptions, not the rule.

It’s a shame we’re not able to do better with free software.

by Michael Catanzaro at March 13, 2016 01:39 AM

March 12, 2016

Michael Catanzaro

Do you trust this website?

TLS certificate validation errors are much less common on today’s Internet than they used to be, but you can still expect to run into them from time to time. Thanks to a decade of poor user interface decisions by web browsers (only very recently fixed in major browsers), users do not understand TLS and think it’s OK to bypass certificate warnings if they trust the site in question.

This is completely backwards. You should only bypass the warning if you do not trust the site.

The TLS certificate does not exist to state that the site is somehow trustworthy. It exists only to state that the site is the site you think it is: to ensure there is no man in the middle (MITM) attacker. If you are visiting and get a certificate validation error, that means that even though your browser is displaying the URL, there’s zero reason to believe you’re really visiting rather than an attack site. Your browser can tell the difference, and it’s warning you. (More often, the site is just broken, or “misconfigured” if you want to be generous, but you and your browser have no way to know that.)

If you do not trust the site in question (e.g. you do not have any user account on the site), then there is not actually any harm in bypassing the warning. You don’t trust the site, so you do not care if a MITM is changing the page, recording your passwords, sending fake data to the site in your name, or whatever else.

But if you do trust the site, this error is cause to freak out and not continue, because it gives you have strong reason to believe there is a MITM attacker. Once you click continue, you should assume the MITM has total control over your interaction with the trusted website.

I will pick on Midori for an example of how bad design can confuse users:

The button label reads
The button label reads “Trust this website,” but it should read “I do not trust this website.”

As you can see from the label, Midori has this very wrong. Users are misled into continuing if they trust the website: the very situation in which it is unsafe to continue.

Firefox and Chrome handle this much better nowadays, but not perfectly. Firefox says “Your connection is not secure” while Chrome says “Your connection is not private.” It would be better to say: “This doesn’t look like the real”

by Michael Catanzaro at March 12, 2016 10:55 PM

February 29, 2016

Javier Muñoz

AWS Signature Version 4 goes upstream in Ceph

The first stable AWS4 implementation in Ceph went upstream some days ago. It is now available in the master branch and it will ship with the next Ceph release Jewel as planned.

I will use this blog post to talk about this new feature shipping in Ceph Jewel and the current effort by Outscale and Igalia to raise the level of compatibility between the Ceph RGW S3 and Amazon S3 interfaces.

In detail, I will describe the signing process in AWS4, how it works in Ceph RGW, the current coverage and the next steps in the pipeline around this authentication algorithm.

S3 request authentication algorithms

If you are not familiar with request authentication, regions, endpoints, credential scopes, etc. in Amazon S3 you could want to read one of my last posts about this stuff. It offers a simple and quick overview of this stuff while introducing the concepts and terms I will use in this blog post. A more long and low-level technical reading is available in the AWS documentation too. I will use this last one to drive/compare the implementations in an reasonable level for everybody.

Amazon S3 provides storage through web services interfaces (REST, SOAP and BitTorrent). By the way, Ceph RGW implements a compatible S3 REST interface in order to be interoperable with the Amazon S3 REST ecosystem (tools, libraries, third-party services and so on).

This S3 REST interface works over the Hypertext Transfer Protocol (HTTP) with the same HTTP verbs (GET, POST, PUT, DELETE, etc) that web browsers use to retrieve web pages and to send data to remote servers.

There are two kind of Amazon S3 RESTful interactions: authenticated and anonymous. The way to implement request authentication is signing these requests or interactions using an authentication algorithm. In the Amazon signing process' public specification there are two authentication algorithms currently in use: AWS2 and AWS4.

AWS2 and AWS4 in Ceph

The new Signature Version 4 (AWS4) is the current AWS signing protocol. It improves the previous Signature Version 2 (AWS2) significantly. Take into consideration these algorithm strenghts of AWS4 over AWS2

  • To sign a message, the signer use a signing key that is derived from her secret access key rather than using the secret access key itself
  • The signer derives the signing key from the credential scope, which means that she doesn't need to include the key itself in the request
  • Each signing task requires to use the credential scope

The benefits of using AWS4 in Ceph are clear:

  • Verification of the identify of the requester via access key ID and secret access key
  • Request tampering prevention while the request is in transit
  • Replay attacks protection within 15 minutes of the timestamp in the request

Authentication methods

The signing process can express authentication information by using one of the following methods:

  • HTTP Authorization header. The most common method of authenticating. The signature calculations vary depending on the method you choose to transfer the request payload; 'transfer payload in a single chunk' vs 'transfer payload in multiple chunks (chunked upload)'
  • Query string parameters. It uses a query string to express a request entirely in an URL together with the authorization information. This type of URL is also known as a presigned URL.

The current Ceph AWS4 implementation supports all authentication methods but transfering payload in multiple chunks (chunked upload). It is in the pipeline though.

Lacking chunked upload does not impact the Ceph RGW performance. The server side always use a streaming-hash approach to compute the signature.

Computing a Signature

The idea behind of computing a signature is using a cryptographic hash function over the request, and then use the hash value, some other values from the request, and a secret access key to create a signed hash. That is the signature.

Depending on the kind of authentication method used and the concrete request the algorithm requires different inputs. As one example to illustrate one of the authentication paths we can explore the required steps to craft a signature in the HTTP Authorization header case.

As you can see it computes a canonical request, a string to sign and a signing key as part of the process.

The final signature is the result of hashing the signing key and the string to sign. The keyed-hash message authentication code used along the signature computation is HMAC-SHA256

Default configuration in Ceph Jewel

Ceph Jewel is planned to ship with AWS2 and AWS4 enabled by default. You will not need to configure any extra switch to authenticate with AWS2 or AWS4.

Region constraints

In Amazon S3 the region enforces the allowed authentication algorithms.

In the case of Ceph RGW the code doesn't implement any kind of constraint related to the region names.

The next steps in the pipeline

The chunked upload feature to transfer the payload in multiple chunks is part of the pipeline definitely.

Some kind of integration with zones/regions to provide 'signature binding' could make sense too. It would help to enforce auth policies and so on.


My work in Ceph is sponsored by Outscale and has been made possible by Igalia and the invaluable help of the Ceph development team. Thank you guys!

by Javier at February 29, 2016 11:00 PM

February 26, 2016

Xabier Rodríguez Calvar

Über latest Media Source Extensions improvements in WebKit with GStreamer

In this post I am going to talk about the implementation of the Media Source Extensions (known as MSE) in the WebKit ports that use GStreamer. These ports are WebKitGTK+, WebKitEFL and WebKitForWayland, though only the latter has the latest work-in-progress implementation. Of course we hope to upstream WebKitForWayland soon and with it, this backend for MSE and the one for EME.

My colleague Enrique at Igalia wrote a post about this about a week ago. I recommend you read it before continuing with mine to understand the general picture and the some of the issues that I managed to fix on that implementation. Come on, go and read it, I’ll wait.

One of the challenges here is something a bit unnatural in the GStreamer world. We have to process the stream information and then make some metadata available to the JavaScript app before playing instead of just pushing everything to a playing pipeline and being happy. For this we created the AppendPipeline, which processes the data and extracts that information and keeps it under control for the playback later.

The idea of the our AppendPipeline is to put a data stream into it and get it processed at the other side. It has an appsrc, a demuxer (qtdemux currently

) and an appsink to pick up the processed data. Something tricky of the spec is that when you append data into the SourceBuffer, that operation has to block it and prevent with errors any other append operation while the current is ongoing, and when it finishes, signal it. Our main issue with this is that the the appends can contain any amount of data from headers and buffers to only headers or just partial headers. Basically, the information can be partial.

First I’ll present again Enrique’s AppendPipeline internal state diagram:

First let me explain the easiest case, which is headers and buffers being appended. As soon as the process is triggered, we move from Not started to Ongoing, then as the headers are processed we get the pads at the demuxer and begin to receive buffers, which makes us move to Sampling. Then we have to detect that the operation has ended and move to Last sample and then again to Not started. If we have received only headers we will not move to Sampling cause we will not receive any buffers but we still have to detect this situation and be able to move to Data starve and then again to Not started.

Our first approach was using two different timeouts, one to detect that we should move from Ongoing to Data starve if we did not receive any buffer and another to move from Sampling to Last sample if we stopped receiving buffers. This solution worked but it was a bit racy and we tried to find a less error prone solution.

We tried then to use custom downstream events injected from the source and at the moment they were received at the sink we could move from Sampling to Last sample or if only headers were injected, the pads were created and we could move from Ongoing to Data starve. It took some time and several iterations to fine tune this but we managed to solve almost all cases but one, which was receiving only partial headers and no buffers.

If the demuxer received partial headers and no buffers it stalled and we were not receiving any pads or any event at the output so we could not tell when the append operation had ended. Tim-Philipp gave me the idea of using the need-data signal on the source that would be fired when the demuxer ran out of useful data. I realized then that the events were not needed anymore and that we could handle all with that signal.

The need-signal is fired sometimes when the pipeline is linked and also when the the demuxer finishes processing data, regardless the stream contains partial headers, complete headers or headers and buffers. It works perfectly once we are able to disregard that first signal we receive sometimes. To solve that we just ensure that at least one buffer left the appsrc with a pad probe so if we receive the signal before any buffer was detected at the probe, it shall be disregarded to consider that the append has finished. Otherwise, if we have seen already a buffer at the probe we can consider already than any need-data signal means that the processing has ended and we can tell the JavaScript app that the append process has ended.

Both need-data signal and probe information come in GStreamer internal threads so we could use mutexes to overcome any race conditions. We thought though that deferring the operations to the main thread through the pipeline bus was a better idea that would create less issues with race conditions or deadlocks.

To finish I prefer to give some good news about performance. We use mainly the YouTube conformance tests to ensure our implementation works and I can proudly say that these changes reduced the time of execution in half!

That’s all folks!

by calvaris at February 26, 2016 12:30 PM

February 25, 2016

Javier Muñoz

Ceph, a free unified distributed storage system

Over the last few months I have been working in Ceph, a free unified distributed storage system, in order to implement some missing features in RADOS gateway, help some customers with Ceph clusters in production and fixing bugs.

This effort is part of my daily work here in Igalia working in upstream projects. As you could know, Igalia works in the Cloud arena providing services on development, deployment and orchestration around interesting open projects.

Together with Ceph (storage) we are also working upstream in Qemu (compute) and Snabb (networking). All these projects are in the core to create private and public clouds with Open Source.

My goal with this first post is introducing Ceph in a simple and easy way to understand this marvelous piece of software. I will cover the design and main innovations in Ceph together with its architecture, major use cases and relationship with OpenStack (a well-known free and open-source software platform for cloud computing).

Understanding Ceph

Ceph is an object storage based free software storage platform that stores data on a single distributed computer cluster. I would say this definition catches the essence of Ceph perfectly. It is also the foundation to understand its innovations, the architecture and the performance/scalability factors in Ceph.

Let's start with the object storage. The object storage is a storage architecture that manages data as objects, as opposed to other storage architectures like file systems which manage data as a file hierarchy and block storage which manages data as blocks within sectors and tracks. Each object typically includes the data, a variable amount of metadata, and a globally unique identifier.

On top of this object storage, Ceph provides a block interface (RBD), an object interface (RGW) and a filesystem interface (CephFS).

If we add a smart cluster approach in the previous design we will have a reliable object storage service that can scales to many thousands of devices. This reliable object storage service is known as RADOS (Reliable Autonomic Distributed Object Storage) in the current Ceph implementation.

But what is a 'smart cluster approach' here? At the petabyte and exabyte scale, systems are necessarily dynamic. They are built incrementally, they grow and contract with the deployment of new storage and decommissioning of old devices, devices fail and recover on a continous basis, and large amounts of data are created and destroyed. RADOS takes care of a consistent view of the data distribution and consistent read and write access to data objects.

RADOS also provides storage nodes with complete knowledge of the distribution of data in the systems, devices can act semi-autonomously using peer-to-peer like protocols to self-manage data replication, participate in failure detection and respond to device failures and the resulting changes in the distribution of data by replicating or migrating data objects.

If we consider the minimal configuration together with the basic components needed to set up a RADOS system, we will have a set of object storage daemons (OSDs) and a small group of monitors (MONs) reponsible for managing OSD cluster membership.

In Ceph this OSD cluster membership requires a cluster map. This cluster map specifies cluster membership, device state and the mapping of data objects to devices. The data distribution is specified first by mapping objects to placemente groups (PGs) and then mapping each PG onto a set of devices. The algorithm taking care of these steps is known as CRUSH (Controlled, Scalable, Decentralized Placement of Replicated Data)

With this information in mind we may consider two major innovations in Ceph RADOS:

  • The CRUSH algorithm. The way how Ceph clients and Ceph OSD daemons compute information (hashing function) about object location instead of having to depend on a central lookup table
  • Smart daemons. The Ceph's OSD daemons and Ceph clients are cluster aware. This enables OSDs interact directly with other OSDs and MONs. Ceph clients interacts with OSDs directly.

Both items add significant intelligence in the solution to avoid bottlenecks and, at the same time, pursue hyperscale at the petabyte and exabyte scale.

In this point we should have enough information to understand the raw Ceph architecture. Let's have a look in the usual block diagram for Ceph:

  • RGW. A web services gateway for object storage, compatible with S3 and Swift
  • RBD. A reliable, fully distributed block device with cloud platform integration
  • CEPHFS. A distributed file system with POSIX semantics and scale-out metatadata management
  • LIBRADOS. A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
  • RADOS. A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

Mapping out the major components involved under the hood and their interactions makes it still possible getting a more detailed version of this architecture:

The OpenStack basics

Although this is an introduction post in Ceph I will describe OpenStack and its relationship with Ceph briefly. It will be useful later.

Ceph may be used alone but some of its most interesting use cases take place as part of OpenStack. A quick overview on OpenStack will be useful to understand how the OpenStack and Ceph components work together to provide reliable and scalable storage.

The current stable release for OpenStack is 'Liberty' and it includes 17 components (compute, image services, object store, etc). All those components have well-known code names (Nova, Glance, Swift, etc)

The next picture catches a very high level abstraction for OpenStack:

As you can see, Glance (VM image manager) and Cinder (block storage) are two core services in the solution.

We mentioned the previous picture shows a simple view of OpenStack. A more accurate diagram together with the relationships among the services is available in the next picture for 'Folsom', a previous release (2012)

While OpenStack evolves and include new services, this 'Folsom' picture should be good enough to introduce the services related to storage and the level of complexity of OpenStack.

So the storage services in place are Swift (object store service), Glance (image service) and Cinder (block storage service).

Those services work in tandem to cover the general and specific requirements for storage in OpenStack.

Using Ceph in OpenStack

The main integration points between OpenStack and Ceph are the object and block device interfaces.

The RADOS gateway (RGW) and the RADOS block device (RBD) interfaces are used to provide the required storage to 5 services (Keystone, Swift, Cinder, Glance and Nova)

It is worth mentioning the compute service (Nova) interfaces the RBD layer via a hypervisor. An open source hypervisor working like a charm with Ceph is Qemu/KVM. It uses librbd and librados.

Other component to mention in the stack is libvirt. OpenStack uses libvirt to configure Qemu/KVM properly.

Ceph RBD dominates the choice for Cinder drivers currently, as stated in the sixth public survey of OpenStack users (page 31)

The physical deployment of Ceph and OpenStack

Setting up and operating a reliable and scalable storage cluster is always demanding. It requires a careful planning along many different aspects. Some of these critical decisions are related to the cluster capacity (RAM, disks, number of nodes, use profiles, etc)

Although it is always possible going with your own custom configuration some hardware providers offer several standard configurations.

As a random and arbitrary example, we can have a look in the HPE Helion portfolio. This set of solutions is a mix of open-source software and integrated systems for enterprise cloud computing.

The next picture shows the physical space required and how it compares to the different logical components in the architecture.

The new and old use cases

The production of data is expanding at an astonishing pace. Two major drivers in this rapid growth of global data are the analog-to-digital switch (software is everywhere) and the rapid increase in data generation by individuals and companies.

The new use cases related to storage nowadays are radically different of the previous ones a few years ago. These new use cases are all about storing and retrieving unstructured data like photos, videos and social media in massive scale. All this stuff requires real-time analitycs and reporting together with efficient processing.

To get these requirements together, some companies are extending/migrating their current datacenters to support software-defined approaches. As consecuence, those new datacenters leverage virtualization concepts such as abstraction, pooling, and automation to all of the data center’s resources and services to achieve IT as a service. In this vision all elements of the infrastructure (compute, storage, networking and security) are virtualized and delivered as a service.

In this context, we can identify some new and well-known use cases along the next 5 different categories. The original classification is used by the RedHat Storage team. Take into consideration I am merging Cloud infrastructure and Virtualzation here.

  • Big data analytics. Storing, integrating, and analyzing data at petabyte scale
  • Cloud infrastructure and Virtualization. Virtual machine storage and storage for tenant applications (Swift/S3 API)
  • Rich media. Massive scalability and cost containment (scaling out with commodity hardware)
  • File sync and share. Secure mobility, collaboration and the need for anytime, anywhere access to files
  • Archival data. Agile, scalable, cost-effective and flexible unified storage (objects, blocks and file systems)

Ceph is used to support all these use cases in production with great results.

Pushing Ceph to the limit

Some folks in the CERN IT Department are pushing Ceph to the limit. They use Ceph as part of an OpenStack deployment and I have to say the numbers are great.

The solution is a large distributed OpenStack infrastructure with around 10,000 VMs and 100,000 CPU cores (1000 Cinder volumes and 1500 Glance images). The Cloud is predominantly used for physics data analysis but they also reported on a long tail of conventional IT services and user-managed application VMs.

If you want to know more on this Ceph cluster operated by CERN, I would recommend to watch this video at Vancouver Summit 2015.

In brief, and beyond of the great insights shared along the talk, the current Ceph version scales out to 10 PB. In that scale it just works. Over that threshold, it requires extra configuration adjustments.


I told you! This piece of software is marvelous!

I plan to add new blog entries to cover some of the new features implemented in the previous months. They are upstream code now so you will be able to enjoy them in Jewel!

If you are looking for some kind of support related to development, design, deployment, etc. in Ceph or you would love to see some new feature in the next releases. Feel free to contact me!

by Javier at February 25, 2016 11:00 PM

February 24, 2016

Manuel Rego

Igalia Coding Experience on Web Engines

In Igalia we’re looking for people to join the Igalia Coding Experience program. Basically we’re opening positions for internships in different fields related to some of our teams like multimedia, compilers, networking or web platform. The main purpose is to give students and recent graduates an initial taste of coding in industry. Where you could work together with several Igalia hackers on different free software projects.

I’m part of the web platform team where we work on different tasks related to the core of several web engines. Apart from our work on CSS Grid Layout in Blink and WebKit, that you probably know if you follow my blog, Igalia has been working on other topics like:

In our team we’re looking for a student willing to help on some of this topics. Probably, the final work might be somehow related to CSS Grid Layout where we’ve a bunch of peripheral tasks that would be really useful. Some ideas off the top of my head:

This is not meant to be an exhaustive list, but just some examples so you can realize the type of tasks you’ll be doing. Of course, depending on the profile of the selected person we’ll choose the task that fits better.

If you’re interested in this internship or any other from the rest of the teams, you can find all the details and conditions in our website. We’re a company spread all around the globe, with igalians in different countries and timezones (from Seoul to San Francisco). And, of course, these internships are remote friendly.

On top of that, Igalia is hiring too, just in case you already have some experience and are looking for a job. Again you can find all the information at

Last but not least, Igalia welcomes everyone and encourages applications from members of underrepresented groups in the free software community. We’re aiming to keep a diverse environment in our company.

February 24, 2016 11:00 PM

February 19, 2016

Michael Catanzaro

WebKitGTK+ Gets Security Updates

My recent blog post On WebKit Security Updates has attracted some not-unexpected attention. Since I knew poorly-chosen words could harm the image of the WebKit project, I prefaced that blog post with a disclaimer which I hoped few would miss:

WebKitGTK+ releases regular security updates upstream. It is safe to use so long as you apply the updates.

We have a stable branch that receives only bug fixes for six months, from which we release regular updates including security fixes. This is is comparable to industry standards (consider nine months of support for a Firefox ESR, or less than two months of support for a Chromium release). It is hardly WebKit’s fault that most distributions regularly release security updates for Firefox and Chromium, but not for WebKit.

I reject the notion that we should provide a branch with security fixes and no other bug fixes. Withholding bug fixes is unfair to users, and nobody expects Firefox or Chromium to do this. This feels like a double standard to me.

I also reject the notion that WebKit is too risky to update because it is not a leaf package. I provided a solution to this (carrying separate -stable and -secure packages) in my previous blog post for distributions that are very concerned about unexpected regressions. I don’t think it’s necessary, but it is not exactly rocket science.

I strongly disagree with conclusions that you should stop using WebKit wholesale. You should, however, verify that the version of WebKit offered by your distribution and used in your application is secure. That means (a) ensuring your distribution provides the most-recent stable release (currently 2.10.6 or 2.10.7 are both fine), and (b) ensuring your application is using that release rather than 2.4.x, which will also be packaged by your distribution and is used by most applications. For web browsers, check the Internet for well-known security flaws.

If your distribution is not providing a safe version of WebKit, consider switching to one that does and applying pressure on distributions that irresponsibly ship insecure versions of WebKit. I call on Ubuntu, Debian, openSUSE, and other distributions to follow the lead of Fedora and Arch Linux in providing stable WebKit updates to all users, not just testing branch users.

by Michael Catanzaro at February 19, 2016 09:04 PM

February 18, 2016

Enrique Ocaña

Improving Media Source Extensions on WebKit ports based on GStreamer

During 2014 I started to become interested on how GStreamer was used in WebKit to play media content and how it was related to Media Source Extensions (MSE). Along 2015, my company Igalia strenghtened its cooperation with Metrological to enhance the multimedia support in their customized version of WebKitForWayland, the web platform they use for their products for the set-top box market. This was an opportunity to do really interesting things in the multimedia field on a really nice hardware platform: Raspberry Pi.

What are Media Source Extensions?

Normal URL playback in the <video> tag works by configuring the platform player (GStreamer in our case) with a source HTTP URL, so it behaves much like any other external player, downloading the content and showing it in a window. Special cases such as Dynamic Adaptive Streaming over HTTP (DASH) are automatically handled by the player, which becomes more complex. At the same time, the JavaScript code in the webpage has no way to know what’s happening with the quality changes in the stream.

The MSE specification lets the authors move the responsibility to the JavaScript side in that kind of scenarios. A Blob object (Blob URL) can be configured to get its data from a MediaSource object. The MediaSource object can instantiate SourceBuffer objects. Video and Audio elements in the webpage can be configured with those Blob URLs. With this setup, JavaScript can manually feed binary data to the player by appending it to the SourceBuffer objects. The data is buffered and the playback time ranges generated by the data are accessible to JavaScript. The web page (and not the player) has now the control on the data being buffered, its quality, codec and procedence.  Now it’s even possible to synthesize the media data programmatically if needed, opening the door to media editors and media effects coded in JavaScript.


MSE is being adopted by the main content broadcasters on the Internet. It’s required by YouTube for its dedicated interface for TV-like devices and they even have an MSE conformance test suite that hardware manufacturers wanting to get certified for that platform must pass.

MSE architecture in WebKit

WebKit is a multiplatform framework with an end user API layer (WebKit2), an internal layer common to all platforms (WebCore) and particular implementations for each platform (GObject + GStreamer, in our case). Google and Apple have done a great work bringing MSE to WebKit. They have led the effort to implement the common WebCore abstractions needed to support MSE, such as MediaSource, SourceBuffer, MediaPlayer and the integration with HTMLMediaElement (video tag). They have also provided generic platform interfaces (MediaPlayerPrivateInterface, MediaSourcePrivate, SourceBufferPrivate) a working platform implementation for Mac OS X and a mock platform for testing.


The main contributions to the platform implementation for ports using GStreamer for media playback were done by Stephane Jadaud and Sebastian Dröge on bugs #99065 (initial implementation with hardcoded SourceBuffers for audio and video), #139441 (multiple SourceBuffers) and #140078 (support for tracks, more containers and encoding formats). This last patch hasn’t still been merged in trunk, but I used it as the starting point of the work to be done.

GStreamer, unlike other media frameworks, is strongly based on the concept of pipeline: the data traverses a series of linked elements (sources, demuxers, decoders, sinks) which process it in stages. At a given point in time, different pieces of data are in the pipeline at the same time in varying degrees of processing stages. In the case of MSE, a special WebKitMediaSrc GStreamer element is used as the data source in the pipeline and also serves as interface with the upper MSE layer, acting as client of MediaSource and SourceBuffer. WebKitMediaSrc is spawned by GstPlayBin (a container which manages everything automatically inside) when an MSE SourceBuffer is added to the MediaSource. The MediaSource is linked with the MediaPlayer, which has MediaPlayerPrivateGStreamer as private platform implementation. In the design we were using at that time, WebKitMediaSrc was responsible for demuxing the data appended on each SourceBuffer into several streams (I’ve never seen more than one stream per SourceBuffer, though) and for reporting the statistics and the samples themselves to the upper layer according to the MSE specs. To do that, the WebKitMediaSrc encapsulated an appsrc, a demuxer and a parser per source. The remaining pipeline elements after WebKitMediaSrc were in charge of decoding and playback.

Processing appends with GStreamer

The MSE implementation in Chromium uses a chunk demuxer to parse (demux) the data appended to the SourceBuffers. It keeps the parsing state and provides a self-contained way to perform the demuxing. Reusing that Chromium code would have been the easiest solution. However, GStreamer is a powerful media framework and we strongly believe that the demuxing stage can be done using GStreamer as part of the pipeline.

Because of the way GStreamer works, it’s easy to know when an element outputs new data but there’s no easy way to know when it has finished processing its input without discontinuing the flow with with End Of Stream (EOS) and effectively resetting the element. One simple approach that works is to use timeouts. If the demuxer doesn’t produce any output after a given time, we consider that the append has produced all the MediaSamples it could and therefore has finished. Two different timeouts were used: one to detect when appends that produce no samples have finished (noDataToDecodeTimeout) and another to detect when no more samples are coming (lastSampleToDecodeTimeout). The former needs to be longer than the latter.

Another technical challenge was to perform append processing when the pipeline isn’t playing. While playback doesn’t start, the pipeline just prerolls (is filled with the available data until the first frame can be rendered on the screen) and then pauses there until the continuous playback can start. However, the MSE spec expects the appended data to be completely processed and delivered to the upper MSE layer first, and then it’s up to JavaScript to decide if the playback on screen must start or not. The solution was to add intermediate queue elements with a very big capacity to force a preroll stage long enough for the probes in the demuxer source (output) pads to “see” all the samples pass beyond the demuxer. This was how the pipeline looked like at that time (see also the full dump):


While focusing on making the YouTube 2015 tests pass on our Raspberry Pi 1, we realized that the generated buffered ranges had strange micro-holes (eg: [0, 4.9998]; [5.0003, 10.0]) and that was confusing the tests. Definitely, there were differences of interpretation between ChunkDemuxer and qtdemux, but this is a minor problem which can be solved by adding some extra time ranges that fill the holes. All these changes got the append feature in good shape and the we could start watching videos more or less reliably on YouTube TV for the first time.

Basic seek support

Let’s focus on some real use case for a moment. The JavaScript code can be appending video data in the [20, 25] range, audio data in the [30, 35] range (because the [20, 30] range was appended before) and we’re still playing the [0, 5] range. Our previous design let the media buffers leave the demuxer and enter in the decoder without control. This worked nice for sequential playback, but was not compatible with non-linear playback (seeks). Feeding the decoder with video data for [0, 5] plus [20, 25] causes a big pause (while the timeline traverses [5, 20]) followed by a bunch of decoding errors (the decoder needs sequential data to work).

One possible improvement to support non-linear playback is to implement buffer stealing and buffer reinjecting at the demuxer output, so the buffers never go past that point without control. A probe steals the buffers, encapsulates them inside MediaSamples, pumps them to the upper MSE layer for storage and range reporting, and finally drops them at the GStreamer level. The buffers can be later reinjected by the enqueueSample() method when JavaScript decides to start the playback in the target position. The flushAndEnqueueNonDisplayingSamples() method reinjects auxiliary samples from before the target position just to help keeping the decoder sane and with the right internal state when the useful samples are inserted. You can see the dropping and reinjection points in the updated diagram:


The synchronization issues of managing several independent timelines at once must also be had into account. Each of the ongoing append and playback operations happen in their own timeline, but the pipeline is designed to be configured for a common playback segment. The playback state (READY, PAUSED, PLAYING), the flushes needed by the seek operation and the prerolls also affect all the pipeline elements. This problem can be minimized by manipulating the segments by hand to accomodate the different timings and by getting the help of very large queues to sustain the processing in the demuxer, even when the pipeline is still in pause. These changes can solve the issues and get the “47. Seek” test working, but YouTube TV is more demanding and requires a more structured design.

Divide and conquer

At this point we decided to simplify MediaPlayerPrivateGStreamer and refactor all the MSE logic into a new subclass called MediaPlayerPrivateGStreamerMSE. After that, the unified pipeline was split into N append pipelines (one per SourceBuffer) and one playback pipeline. This change solved the synchronization issues and splitted a complex problem into two simpler ones. The AppendPipeline class, visible only to the MSE private player, is in charge of managing all the append logic. There’s one instance for each of the N append pipelines.

Each append pipeline is created by hand. It contains an appsrc (to feed data into it), a typefinder, a qtdemuxer, optionally a decoder (in case we want to suport Encrypted Media Extensions too), and an appsink (to pick parsed data). In my willing to simplify, I removed the support for all formats except ISO MP4, the only one really needed for YouTube. The other containers could be reintroduced in the future.


The playback pipeline is what remains of the old unified pipeline, but simpler. It’s still based on playbin, and the main difference is that the WebKitMediaSrc is now simpler. It consists of N sources (one per SourceBuffer) composed by an appsrc (to feed buffered samples), a parser block and the src pads. Uridecodebin is in charge of instantiating it, like before. The PlaybackPipeline class was created to take care of some of the management logic.


The AppendPipeline class manages the callback forwarding between threads, using asserts to strongly enforce the access to WebCore MSE classes from the main thread. AtomicString and all the classes inheriting from RefCounted (instead of ThreadSafeRefCounted) can’t be safely managed from different threads. This includes most of the classes used in the MSE implementation. However, the demuxer probes and other callbacks sometimes happen in the streaming thread of the corresponding element, not in the main thread, so that’s why call forwarding must be done.

AppendPipeline also uses an internal state machine to manage the different stages of the append operation and all the actions relevant for each stage (starting/stopping the timeouts, process the samples, finish the appends and manage SourceBuffer aborts).


Seek support for the real world

With this new design, the use case of a typical seek works like this (very simplified):

  1. The video may be being currently played at some position (buffered, of course).
  2. The JavaScript code appends data for the new target position to each of the video/audio SourceBuffers. Each AppendPipeline processes the data and JavaScript is aware of the new buffered ranges.
  3. JavaScript seeks to the new position. This ends up calling the seek() and doSeek() methods.
  4. MediaPlayerPrivateGStreamerMSE instructs WebKitMediaSrc to stop accepting more samples until further notice and to prepare the seek (reset the seek-data and need-data counters). The player private performs the real GStreamer seek in the playback pipeline and leaves the rest of the seek pending for when WebKitMediaSrc is ready.
  5. The GStreamer seek causes some changes in the pipeline and eventually all the appsrc in WebKitMediaSrc emit the seek-data and need-data events. Then WebKitMediaSrc notifies the player private that it’s ready to accept samples for the target position and needs data. MediaSource is notified here to seek and this triggers the enqueuing of the new data (non displaying samples and visible ones).
  6. The pending seek at player private level which was pending from step 4 continues, giving permission to WebKitMediaSrc to accept samples again.
  7. Seek is completed. The samples enqueued in step 5 flow now through the playback pipeline and the user can see the video from the target position.

That was just the typical case, but more complex scenarios are also supported. This includes multiple seeks (pressing the forward/backward button several times), seeks to buffered areas (the easiest ones) and to unbuffered areas (where the seek sequence needs to wait until the data for the target area is appended and buffered).

Close cooperation from qtdemux is also required in order to get accurate presentation timestamps (PTS) for the processed media. We detected a special case when appending data much forward in the media stream during a seek. Qtdemux kept generating sequential presentation timestamps, completely ignoring the TFDT atom, which tells where the timestamps of the new data block must start. I had to add a new “always-honor-tfdt” attribute to qtdemux to solve that problem.

With all these changes the YouTube 2015 and 2016 tests are green for us and YouTube TV is completely functional on a Raspberry Pi 2.

Upstreaming the code during Web Engines Hackfest 2015

All this work is currently in the Metrological WebKitForWayland repository, but it could be a great upstream contribution. Last December I was invited to the Web Engines Hackfest 2015, an event hosted in Igalia premises in A Coruña (Spain). I attended with the intention of starting the upstreaming process of our MSE implementation for GStreamer, so other ports such as WebKitGTK+ and WebKitEFL could also benefit from it. Thanks a lot to our sponsors for making it possible.

At the end of the hackfest I managed to have something that builds in a private branch. I’m currently devoting some time to work on the regressions in the YouTube 2016 tests, clean unrelated EME stuff and adapt the code to the style guidelines. Eventually, I’m going to submit the patch for review on bugzilla. There are some topics that I’d like to discuss with other engineers as part of this process, such as the interpretation of the spec regarding how the ReadyState is computed.

In parallel to the upstreaming process, our plans for the future include getting rid of the append timeouts by finding a better alternative, improving append performance and testing seek even more thoroughly with other real use cases. In the long term we should add support for appendStream() and increase the set of supported media containers and codecs at least to webm and vp8.

Let’s keep hacking!

by eocanha at February 18, 2016 08:10 PM

February 17, 2016

Manuel Rego

CSS Grid Layout from the inside out (HTML5DevConf 2015)

As I announced in a previous blog post I was attending HTML5DevConf past October, where I gave a talk about CSS Grid Layout. The video of my talk is now online, so I thought it would be a good moment to remember the days in sunny San Francisco.

My Talk

First of all, thanks to the organization for giving me the chance to speak about CSS Grid Layout during the conference.

My talk was in one of the rooms of the Contemporary Jewish Museum. The room was full and people got really excited about CSS Grid Layout. This is not a big surprise as I have the same feeling every time someone speaks about it in any event. Thanks for attending my talk and for the nice feedback. 😊

During the talk we reviewed the main features provided by Grid Layout spec with live coding examples. Then I also explained the work that the browser has to do to render a grid. And finally we checked the current status on the different browser engines.

The slides have been available on my blog since the conference’s date, but now the video has been published in YouTube. If you missed it, you’ve the opportunity to watch it now. I hope you enjoy it!

Video of my talk: CSS Grid Layout from the inside out

Conference Highlights

HTML5DevConf is a huge conference, lots of people around, several talks at the same time in different venues, etc.

The JavaScript language, as expected, was one of the hot topics. There were some nice talks explaining the new stuff from ECMAScript 6 (ES6), and even talking about some stuff from ES7. For example, they were explaining things like arrow functions that were implemented in Chromium/V8 by some of my colleagues at Igalia.

Jamund Ferguson talking about arrow functions during his talk Jamund Ferguson talking about arrow functions during his talk

However, I’d like to highlight the talk by Parashuram Narasimhan called Measuring web Perf ? Lets write an app for that !!. During the talk he explained a tool called browser-perf which automates performance measurements in web pages. For example, you can test scrolling on a page and get the FPS and other performance parameters from DevOps tools. The cool thing is that you can very easily create a bot to monitor this and check if you’re suffering any regression. If you’re interested you can watch the video too.

San Francisco

Traveling to California is a long trip, but it really paid off. The weather was perfect and I had time for some sightseeing (as you can see in my flickr album). San Francisco is a lovely city and, despite of the hills, is a neat place for biking and walking. On top of that, my fellow igalian Martin Robinson was really kind and showed me some nice spots in the city. Big thanks Martin!

Sun and waves at Ocean Beach Sun and waves at Ocean Beach

After spending my whole life in front of the Atlantic Ocean, I couldn’t resist it so I went to Ocean Beach in order to try (and taste) the Pacific Ocean for the first time.


Overall, it was an incredible experience in a wonderful city, I fell in love with San Francisco. 😍

Regarding the conference itself, it was nice to make an update of my presentation at CSS Conf 2015. In addition it was useful to follow the cutting edge stuff around the web platform.

Hopefully I’ll have the possibility to keep showing you all the awesome features provided by Grid Layout in some other conferences during 2016. Fingers crossed!

Igalia and Bloomberg working together to build a better web Igalia and Bloomberg working together to build a better web

Lastly, as you probably know all the work around CSS Grid Layout in Blink and WebKit is the result of a collaboration between Bloomberg and Igalia. Thanks folks for your support.

February 17, 2016 11:00 PM

February 11, 2016

Manuel Rego

Subgrids thinking out loud

Subgrids has become a hot topic on the past weeks, mostly due to an article by Eric Meyer called “Subgrids Considered Essential”, that shows some examples where subgrids can be really useful. Which is somehow based on a previous post by fantasai on the same topic. Also Rachel Andrew has talked about subgrids on her blog and her last ebook about Grid Layout (which is quite handy to get an introduction to grid syntax and main features).

As you probably know, Igalia has been working on the CSS Grid Layout implementation on Chromium/Blink and Safari/WebKit for a while. As implementors we thought it would be nice to share our feedback on this topic, so this week the team working on Grid Layout (Sergio, Javi and myself) arranged a meeting to review subgrids feature. These are our initial thoughts about it and a first draft proposal about a possible implementation.

State of art

Subgrids feature allows us to create nested grids that share lines with the grid container. We just define the size of the tracks on the main grid container, and the subgrid use those sizes to layout its items.

In a grid, only direct children are grid items and can be positioned on the grid. We can mark one of those items as subgrid, and then the subgrid’s children will be like regular items for the main grid. This is pretty hard to explain with words so we’ll follow up with some examples in the next section.

The feature has been marked as “at-risk” in the specification since January 2014 Which means that it could be moved to the next level (or version) of the spec.

Current situation is that none of the grid implementations (nor any web engine, neither any polyfill) support subgrids yet. Mostly because it’s a pretty complex feature and implementors have been focused on other parts of the spec.

Use cases

In order to understand what subgrids are, let’s explain a few use cases. The canonical example for subgrids feature is a regular form defined using a list:


Ideally we’d like to define a grid with the labels on the first column and the inputs on the second one. If we define the <ul> element as a grid container, we’ll need to define the <li> as subgrids to be able to place the labels and inputs as desired. Without subgrids we’d need to remove the list (probably replacing <ul> by <div> and removing <li> tags) flattening our HTML and losing the semantic structure, which is really bad from the accessibility point of view.

With subgrids support we could use the following CSS:

ul {
    display: grid;
    grid-template-columns: auto 1fr;

li {
    display: grid;
    grid: subgrid;

label {
    grid-column: 1;

input {
    grid-column: 2;

Example of form using subgrid Example of form using subgrid

For this particular purpose, we could be using display: contents; (which is only supported in Firefox so far) instead of subgrids. Just applying display: contents; to the <li> elements we’ll have the very same result.

On the other hand, another of the subgrids examples that we can find out there is a catalog of shop items. Imagine that those products are composed by title, picture and description.

The HTML of each item would be something like this:

<div class="product">
    <img src="product.jpg" />
    <p>Long description...</p>

And imagine that we use the following CSS:

body {
    display: grid;
    grid-template-columns: 1fr 1fr 1fr;

.product {
    display: grid;
    grid: subgrid;

h1 {
    grid-row: 2;

img {
    grid-row: 1;

p {
    grid-row: 3;

The goal is that each product appears in a subgrid of 3 rows, that way the titles and descriptions are horizontally aligned. For that purpose we’re defining a main grid of 3 columns, and a subgrid for each product with the 3 rows.

Example of catalog using subgrid Example of catalog using subgrid

As we can see even if the title has 2 lines, or independently of the lines in the description, the items are properly aligned.

This cannot be achieved with display: contents;, because of it’ll add all the products in a single row (actually 3 rows in the grid). Because we’re setting the row explicitly to the <h1> (row 2), <img> (row 1) and <p> (row 3) elements. Without subgrids, those rows cannot be automatically translated to rows 5, 4 and 6, when the products have filled the 3 columns defined on the main grid.

Main issues

This and next section are very technical and should be discussed with the CSS Working Group (CSS WG), but we thought a blog post would be better to show examples than a simple mail to www-style.

We’ve been reading the spec and found some potential issues thinking about how to implement subgrids. These are some of them in no specific order.

Margin, border and padding

Probably the most important issue is the problem with margin, border and padding on the subgrid. And how they affect to the grid track sizing algorithm.

Let’s try to use a simple example to explain it, imagine a 3x2 grid where we’ve a subgrid taking the first row completely.

<div style="display: grid; grid-template-columns: auto auto auto; grid-template-rows: auto auto;">
    <div style="display: grid; grid: subgrid; grid-column: 1 / 4; grid-row: 1 / 2;">
      <div style="grid-column: 1;">Item 1</div>
      <div style="grid-column: 2;">Item 2</div>
      <div style="grid-column: 3;">Item 3</div>

Imagine that each item’s width is 100px, so the columns will be 100px width each one.

However, if we set for example the padding on the subgrid (e.g. padding: 0px 50px;), we’ll have to do some extra operations to calculate the size of the tracks:

  • First column, as it’s on the left edge of the subgrid we’ll be adding 50px to the size computation. Resulting on a 150px track.
  • Second column is not affected by the padding, so it keeps being 100px.
  • Third column, is in a similar situation than the first one because it’s on the right edge of the subgrid. So the final size is 150px.

Example of subgrid with padding Example of subgrid with padding

The problem here is that the grid track sizing algorithm works directly with the grid items. So it’d need to know if the item is in one of the edges of the subgrid in order to use the margin, border and/or padding to compute the size of the track.

On a first sight this might seem doable, but if we think on items spanning several tracks and nested subgrids things become really complex. For example, imagine that we’ve a nested subgrid that takes 2nd and 3rd columns on the first row with padding: 0px 25px;.

Example of nested subgrids with padding Example of nested subgrids with padding

For each column of the main grid container, we’d need to take into account the different paddings of the subgrids. So for each item we’d need to know not only if it’s on the edge of the subgrid but also check the ancestors.

On a related note, having scrollbars on the subgrids have similar issues related to track sizing.

Implicit tracks on the subgrid

The issue here is what happens when we get rid of explicit tracks on the subgrid. According to the spec, the items in this situation should use implicit tracks of the subgrid, and not take tracks from the main grid. Which means that they’re not aligned with the main grid (they don’t share lines with it).

Let’s use an example to illustrate this issue:

<div style="display: grid; grid-template-columns: 100px 100px 100px; grid-template-rows: 100px 100px;">
    <div style="display: grid; grid: subgrid; grid-column: 1 / 3; grid-row: 2 / 3;">
      <div style="grid-column: 1;">si1</div>
      <div style="grid-column: 2;">si2</div>
      <div style="grid-column: 3;">si3</div>
      <div style="grid-column: 4;">si4</div>

Example of subgrid with implicit tracks Example of subgrid with implicit tracks

As we can see the subgrid items si3 and si4 are placed in implicit tracks of the subgrid. Those tracks has no relation with the main grid container, literally from the spec: “they effectively extend in a third dimension”. We cannot think in a use case for this particular topic, and it seems really weird as the main goal of the subgrid is to share the lines with the main grid container.

Besides, imagine a different example of an auto-sized column, and some items from a subgrid in that column that span in and out the explicit subgrid. It’d be really complex from the grid track sizing algorithm point of view, to determine the size of that auto-sized column of the main grid container.

Automatic grid span

In the previous examples we were defining subgrids explicitly setting both the grid position (the initial column and row for the subgrid) and the grid span (how many cells the subgrid takes in each direction).

However, we can define a subgrid just setting the grid position with an automatic grid span. According to the spec, the implicit tracks on the subgrid will determine the span.

Again hopefully one example can help to get the idea:

<div style="display: grid; grid-template-columns: 100px 100px 100px; grid-template-rows: 100px 100px;">
    <div style="display: grid; grid: subgrid; grid-column: 2; grid-row: 1;">
      <div style="grid-column: 1;">si1</div>
      <div style="grid-column: 2;">si2</div>
      <div style="grid-column: 3;">si3</div>

The subgrid’s grid position is set to the 2nd column and the 1st row. Then the first 2 items (si1 and si2) are placed in the tracks from the main grid (2nd and 3rd column respectively). But the last item (si3) get rid of tracks, so it’s added on an extra implicit track on the subgrid unrelated to the main grid container. Because the subgrid doesn’t create implicit tracks on the main grid,

Example of subgrid with automatic grid span Example of subgrid with automatic grid span

This seems quite complex to manage, specially if the subgrid cannot create implicit tracks on the main grid container. Also, if we combine it with auto-placement (so we don’t set the grid position), finding the right position and span might be tricky.

On top of that, imagine that one of the items in the subgrid is positioned using grid-column: -1;. As the subgrid doesn’t have a grid span yet, probably it’d be placed on the first column of the subgrid, when we’d be expecting it to be on the last one.

Subgrid only in one axis

The subgrid keyword can be set in both properties defining the grid (grid-template-columns and grid-template-rows) or only in one of them. If we set it only in one of them, we’re creating a subgrid only in one direction.

Let’s see one example:

<div style="display: grid; grid-template-columns: 150px 150px; grid-template-rows: 150px 150px;">
    <div style="display: grid; grid-template-columns: subgrid; grid-column: 1 / 3; grid-row: 1 / 2;">
      <div style="grid-column: 1;">si1</div>
      <div style="grid-column: 2;">si2</div>
      <div style="grid-column: 1;">si3</div>
      <div style="grid-column: 2;">si4</div>

In this case we’re defining a subgrid only for the columns direction, so in order to calculate the size of the items in the 1st and 2nd columns it use the track sizes defined in the main grid container.

However, for the rows we’re not setting anything, so it uses auto-sized rows. This means that the rows will behave like in a regular nested grid, completely independent of the parent grid container.

Example of subgrid only in one axis Example of subgrid only in one axis

Probably this is feasible to implement, but again it seems a weird feature, we might need some good use cases to decide about it.

Descendants navigation

Another issue with subgrids is that we can nest them as much as we want. So, in order to determine the size of a track in the grid container, we need to traverse not only the direct children, but also all the descendants. This can have a huge performance impact if nested subgrids are abused.

Regarding this point, probably we’ve to simply live with it and navigate all the descendants being aware of the performance penalty.

Grid gaps

Probably grid gaps properties should be ignored in the subgrids. We want the items to be aligned with the main grid container, so having different gaps seems really strange.

This is not in the spec, but we guess that the reason is that grid gaps were added after subgrids, so that part of the spec was not updated.


There are other things that we didn’t explore in detail. For example:

  • Named grid lines from the main grid container used in the subgrids.
  • Positioned items in the subgrids.
  • Subgrids with orthogonal flows.

And we’re sure there are more things that we’re missing at this moment.

Draft proposal

As we can see there are a bunch of potential issues, of course they need a deeper analysis and further discussions, but the complexity of the whole feature seems not small at all.

For this reason, we might want to think in a simpler proposal. François Remy has already sent one to the CSS WG mailing list.

We’ve been thinking in another idea, trying to fulfill the use cases described previously in this post, and avoid the issues as much as possible. This is a pre-alpha proposal, so take it with a grain of salt.

The ideas would be to have a limited support of subgrids:

  • Make the subgrid pretty similar to display: contents;, so we cannot set margin, border or padding on it. That will simplify the calculations needed to determine the size of the tracks and the items.

    Main difference with display: contents; is that the subgrid will allow to translate positions depending on the final placement on the main grid container. Like in the use case of the shop catalog.

  • Remove implicit tracks on the subgrid, so we need to always set a grid span (how many columns and rows the subgrid takes) beforehand.

    Items trying to be placed on the implicit grid of the subgrid, would end up on the first cell of the subgrid. For items spanning outside the explicit grid, the span would be truncated at the end edge of the subgrid.

  • It seems clear that items in the subgrids need to be positioned before the items in the main grid container. And once all the positions are resolved, we could try to run the grid track sizing algorithm as usual.

  • Probably on an initial approach only allowing subgrids in both axis might be enough.

With this proposal the use cases explained in the first section would be covered, we’d need to be more specific on the CSS, but they’ll work.

We’d lack the chance to set CSS properties on the subgrid itself, but we could add an extra item on the subgrid that spans the whole subrid and paint a border, background or whatever is needed. Not an ideal solution anyway.


This post is mostly a braindump from the Igalia team working on CSS Grid Layout, after one day discussing about this part of the spec. We just want to share it with the rest of the web community, the CSS WG and anyone interested on the topic. In order to help to move forward the decision regarding what do to with subgrids in the CSS Grid Layout specification.

Lots of questions about subgrids are still hanging in the air:

  • Defer them to level 2?
  • Simplify them and keep them on level 1?
  • Keep them as they are and wait for subgrids before shipping CSS Grid Layout?

We think there aren’t good answers to these questions right now, hopefully we all together will find the right solution and find a final solution that makes happy all the involved parties. 😄

Last but not least, there’s going to be a CSS Grid Layout Workshop in New York City by the end of the month organized by one of the spec editors (fantasai). My colleague Sergio, who is the maintainer of Grid Layout code in Blink and WebKit will be participating in this event. Hopefully this and other topics will be clarified there, and the specification could move to Last Call Working Draft (LCWD) towards Candidate Recommendation (CR)!

Igalia and Bloomberg working together to build a better web Igalia and Bloomberg working together to build a better web

Finally, we’d like to thank again Bloomberg for supporting Igalia in the development of CSS Grid Layout.


February 11, 2016 11:00 PM

February 09, 2016

Víctor Jáquez

GStreamer VA-API under the umbrella of GStreamer

We have a new GStreamer VA-API release: 1.6.0!

Wait a minute“, you might say, “weren’t the last release 0.7?“, and you will be correct; but something big has happened: GStreamer VA-API is now part of the official GStreamer project!

And this means a couple changes.

First of all, the official repository of the project has been moved, now it’s co-hosted along with the rest of the GStreamer components, in (fdo):

Anonymous Git repository: git://

Developer Git repository: ssh://

Web Git gateway:

Second, the bug tracking has also moved, since now GStreamer VA-API is now a component of GStreamer, the new bugs must be filled here:

And the bug list is here:

What will happen with the old bugs?” you will ask. Well, we will move them as soon as we have reviewed all them.

The third change, as you already noticed, is the version scheme. Now we are following the GStreamer version numbering to avoid confusion and to simplify our development. Hence, this release, 1.6.0, supports the current GStreamer stable version (1.6.3), and our current development branch is the 1.7.x. The future releases will follow the GStreamer version schemes and dependencies.

Sweet! but, what’s new in this release?“. The answer is, not much. Really. Most of the changes are related with the coupling with the upstream processes (autoconf setup, documentation, etc.). Perhaps the most remarkable thing is the removal of the support libraries (libgstvaapi-*.so) used by the vaapi plugin: now they are compiled as one static library and linked to the GStreamer’s plugin. Also, the custom parsers were removed, and the plugin and elements documentation got better shape.

At code level, we had to push a huge indentation commit, in order to align with the GStreamer code style. This commit artificially kills the blame history, but it was our better option.

I ought to say that those were not the only changes at code level, Michael Olbrich fixed a missing frame release in the Wayland backend. And Sree, as usual, fixed a bunch of hardcore stuff. But specially I want to thank Tim-Philipp Müller, for helping us along the upstreaming process. And obviously to the Intel Open Source Technology Center, for let this happen.

Here’s the git’s short log summary since the last 0.7.0 release:

 1  Joel Holdsworth
 1  Michael Olbrich
 9  Sreerenj Balachandran
 4  Tim-Philipp Müller
42  Víctor Manuel Jáquez Leal

By the way, Igalia is hiring!

by vjaquez at February 09, 2016 12:32 PM

February 08, 2016

Andy Wingo

a lambda is not (necessarily) a closure


Greets, folks! Check it out: Guile had a whole track devoted to it at FOSDEM this year. OK, so it was only half a day, but there were like a dozen talks! And the room was full all the morning! And -- get this -- I had nothing to do with its organization! I think we can credit the Guix project with the recent surge of interest in Guile; fully half the talks were from people excited about using Guix to solve their problems. Thanks very, very much to Pjotr Prins for organizing the lovely event.

I gave a talk on how the Guile 2.2 compiler and virtual machine could change the way people program. Happily, the video recording came out OK! Video below (or here if that doesn't work), and slides here.

The time was super-limited though and I wasn't able to go into the detail that I'd like. So, dear readers, here we are, with a deeper look on lambda representation in Guile.

a lambda is not (necessarily) a closure

What is this?

(lambda (a b) (+ a b))

If you answer, "it's a lambda expression", you're right! You're also right if you say it's a function -- I mean, lambda makes a function, right? There are lots of things that you could say that would be right, including silly things like "twenty-two characters set in an awkward typeface".

But if you said "it's a closure" -- well you're right in general I guess, like on a semantic what-does-it-mean level, but as far as how Guile represents this thing at run-time, hoo boy are there a number of possibilities, and a closure is just one of them. This article dives into the possibilities, with the goal being to help you update your mental model of "how much do things cost".

In Guile, a lambda expression can be one of the following things at run-time:

  1. Gone

  2. Inlined

  3. Contified

  4. Code pointer

  5. Closure

Let's look into these one-by-one.

lambda: gone

If Guile can prove that a lambda expression is never reached, it won't be present at run-time. The main way this happens is via partial evaluation, but later passes can do this too. In the most basic example, consider the lambda bound to f by this let expression.

(let ((f (lambda ()

Guile has an ,optimize command that can be run at the REPL to show the effect of partial evaluation on your code. These days it's a bit out of date in a way -- it can't show what CPS-based optimization will do to your code -- but for our purposes here it will transform the expression to the following code:

(let ((f (lambda ()
=> 42

So the lambda is gone, big whoop. The interesting thing though is that this happens concurrently with other things that partial evaluation does, so the lambda goes away in this expression too:

(let ((launch? #f)
      (f (lambda ()
  (if launch? (f) 'just-kidding))
=> 'just-kidding

lambda: inlined

The other trick that partial evaluation can do with lambda expressions is inlining. Re-taking the example above, if we change launch? to #t, the branch folds the other way and the application (f) inlines:

(let ((launch? #t)
      (f (lambda ()
  (if launch? (f) 'just-kidding))
=> (let ((launch? #t)
         (f (lambda ()
     (if #t (f) 'just-kidding))
=> (let ((launch? #t)
         (f (lambda ()
=> (let ((launch? #t)
         (f (lambda ()
     ((lambda () (launch-the-missiles!))))
=> (let ((launch? #t)
         (f (lambda ()
=> (launch-the-missiles!)

Here again the lambda is gone, but not because it was unreachable, but because it was inlined into its use. I showed some intermediate steps as well, just so you get a feel about how partial evaluation works. The inlining step is illustrated by the fourth transformation, where the lambda application went away, replaced by its body.

Partial evaluation can also unroll many kinds of recursion:

(letrec ((lp (lambda (n)
               (if (zero? n)
                   (+ n (lp (1- n)))))))
  (lp 5))
=> 15

The partial evaluator in Guile 2.2 is more or less unchanged from the one in Guile 2.0, so you get these benefits on old Guile as well. Building a good intuition as to what the partial evaluator will do is important if you want to get the best performance out of Guile. Use the ,optimize command at the REPL to see the effects of partial evaluation on any given expression.

lambda: contified

So, here we step into the unknown, in the sense that from here on out, these optimizations are new in Guile 2.2. Unfortunately, they can be hard to see as they aren't really representable in terms of source-to-source transformations over Scheme programs. Consider this program:

(define (count-down n)
  (define loop
    (lambda (n out)
      (let ((out (cons n out)))
        (if (zero? n)
            (loop (1- n) out)))))
  (loop n '()))

It's a little loop that builds a list of integers. The lambda in this loop, bound to loop, will be contified into the body of count-down.

To see that this is the case, we have to use a new tool, ,disassemble (abbreviated ,x). This takes a procedure and prints its bytecode. It can be hard to understand, so I'm going to just point out some "shapes" of disassembly that you can recognize.

> ,x count-down
Disassembly of #<procedure count-down (n)> at #x9775a8:

  10    (cons 2 1 2)
  11    (br-if-u64-=-scm 0 1 #f 5) ;; -> L2
  14    (sub/immediate 1 1 1)
  15    (br -5)                    ;; -> L1

I've snipped the disassembly to the interesting part. The first thing to notice is that there's just one procedure here: only one time that ,x prints "Disassembly of ...". That means that the lambda was eliminated somehow, either because it was dead or inlined, as described above, or because it was contified. It wasn't dead; we can see that from looking at the ,optimize output, which doesn't significantly change the term. It wasn't inlined either; again, ,optimize can show you this, but consider that because partial evaluation can't determine when the loop would terminate, it won't find a point at which it can stop unrolling the loop. (In practice what happens though is that it tries, hits an effort or code growth limit, then aborts the inlining attempt.)

However, what we see in the disassembly is the body of the loop: we cons something onto a list (the cons), check if a two numbers are equal (br-if-u64-=-scm), and if they are we jump out of the loop (L2). Otherwise we subtract 1 from a number (sub/immediate) and loop (br to L1). That is the loop. So what happened?

Well, if inlining is copying, then contification is rewiring. Guile's compiler was able to see that although it couldn't inline the loop function, it could see all of loop's callers, and that loop always returned to the same "place". (Another way to say this is that loop is always called with the same continuation.) The compiler was then able to incorporate the body of loop into count-down, rewiring calls to loop to continue to loop's beginning, and rewriting returns from loop to proceed to the continuation of the loop call.

a digression on language

These words like "contification" and "continuation" might be unfamiliar to you, and I sympathize. If you know of a better explanation of contification, I welcome any links you might have. The name itself comes from a particular formulation of the intermediate language used in Guile, the so-called "CPS" language. In this language, you convert a program to make it so it never returns: instead, each sub-expression passes its values to its continuation via a tail call. Each continuation is expressed as a lambda expression. See this article for an intro to CPS and how it relates to things you might know like SSA.

Transforming a program into CPS explodes it into a bunch of little lambdas: every subexpression gets its own. You would think this would be a step backwards, if your goal is to eliminate closures in some way. However it's possible to syntactically distinguish between lambda expressions which are only ever used as continuations and those that are used as values. Let's call the former kind of lambda a cont and the latter a function. A cont-lambda can be represented at run-time as a label -- indeed, the disassembly above shows this. It turns out that all lambda expressions introduced by the CPS transformation are conts. Conts form a first-order flow graph, and are basically the same as SSA basic blocks. If you're interested in this kind of thing, see Andrew Kennedy's great paper, Compiling with Continuations, Continued, and see also CPS soup for more on how this evolved in Guile 2.2.

I say all this to give you a vocabulary. Functions that are present in the source program start life as being treated as function-lambdas. Contification takes function-lambda values and turns then into cont-lambda labels, if it can. That's where the name "contification" comes from. For more on contification, see MLton's page on its contification pass, linking to the original paper that introduces the concept.

and we're back

Contification incorporates the body of a function into the flow graph of its caller. Unlike inlining, contification is always an optimization: it never causes code growth, and it enables other optimizations by exposing first-order control flow. (It's easier for the compiler to reason about first-order loops than it is to reason about control flow between higher-order functions.)

Contification is a reliable optimization. If a function's callers are always visible to the compiler, and the function is always called with the same continuation, it will be contified. These are two fairly simple conditions that you can cultivate your instincts to detect and construct.

Contification can also apply to mutually recursive functions, if as a group they are all always called with the same continuation. It's also an iterative process, in the sense that contifying one set of functions can expose enough first-order control flow that more contification opportunities become apparent.

It can take a while to get a feel for when this optimization applies. You have to have a feel for what a continuation is, and what it means for a function's callers to all be visible to the compiler. However, once you do internalize these conditions, contification is something you can expect Guile's compiler to do to your code.

lambda: code pointer

The next representation a lambda might have at run-time is as a code pointer. In this case, the function fails the conditions for contification, but we still avoid allocating a closure.

Here's a little example to illustrate the case.

(define (thing)
  (define (log what)
    (format #t "Very important log message: ~a\n" what))
  (log "ohai")
  (log "kittens")
  (log "donkeys"))

In this example, log is called with three different continuations, so it's not eligible for contification. Unfortunately, this example won't illustrate anything for us because the log function is so small that partial evaluation will succeed in inlining it. (You could determine this for yourself by using ,optimize.) So let's make it bigger, to fool the inliner:

(define (thing)
  (define (log what)
    (format #t "Very important log message: ~a\n" what)
    ;; If `log' is too short, it will be inlined.  Make it bigger.
    (format #t "Did I ever tell you about my chickens\n")
    (format #t "I was going to name one Donkey\n")
    (format #t "I always wanted a donkey\n")
    (format #t "In the end we called her Raveonette\n")
    (format #t "Donkey is not a great name for a chicken\n")
    (newline) (newline) (newline) (newline) (newline))
  (log "ohai")
  (log "kittens")
  (log "donkeys"))

Now if we disassembly it, we do get disassembly for two different functions:

,x thing
Disassembly of #<procedure thing ()> at #x97d704:

Disassembly of log at #x97d754:

So, good. We defeated the inliner. Let's look closer at the disassembly of the outer function.

,x thing
Disassembly of #<procedure thing ()> at #x97d704:
  12    (call-label 3 2 8)              ;; log at #x97d754

Here we see that instead of the generic call instruction, we have the specific call-label instruction which calls a procedure whose code is at a known offset from the calling function.

call-label is indeed a cheaper call than the full call instruction that has to check that the callee is actually a function and so on. But that's not the real optimization here. If all callers of a function are known -- and by this time, you're starting to catch the pattern, I think -- if all callers are known, then the procedure does not need to exist as a value at run-time.

This affords a number of optimization opportunities. Theoretically there are many -- all call sites can be specialized to the specific callee. The callee can have an optimized calling convention that doesn't have anything to do with the generic convention. Effect analysis can understand the side effects and dependencies of the callee in a more precise way. The compiler can consider unboxing some arguments and return values, if it finds that useful.

In Guile though, there's only one real optimization that we do, and that is related to free variables. Currently in Guile, all procedures have a uniform calling convention, in which the procedure being called (the callee) is itself passed as the zeroeth argument, and then the arguments follow on the stack. The function being called accesses its free variables through that zeroeth argument. If however there is no need for the procedure to be represented as a value, we are free to specialize that zeroeth argument.

So, consider a well-known procedure like log above. (By "well-known", we mean that all of log's callers are known.) Since log doesn't actually have any lexically bound free variables, we can just pass in anything as argument zero when invoking it. In practice we pass #f, because it happens to be an easy value to make.

(Why isn't format treated as a free variable in log? Because there is special support from the linker for lazily initializing the locations of variables imported from other modules or defined at the top level instead of within a lexical contour. In short: only variables that are (a) used within the lambda and (b) defined within a let or similar count towards a lambda's free variables.)

For a well-known procedure with only one free variable, we can pass in that free variable as the zeroeth argument. Internally to the function, we rewrite references to that free variable to reference argument 0 instead. This is a neat hack because we can have a lambda with a free variable but which results in no allocation at run-time.

Likewise if there are two free variables -- and this is starting to sound like Alice's restaurant, isn't it -- well we do have to pass in their values to the procedure, but we don't have to build an actual closure object with a tag and a code pointer and all. Pairs happen to be small and have some fast paths in Guile, so we use that. References to the free variables get internally rewritten to be car or cdr of argument 0.

For three or more free variables, we do the same, but with a vector.

For a final trick, a set of mutually recursive procedures whose callers are all known can share the object that collects their free variables. We collect the union of the free variables of all of the procedures, and pack them into a specialized representation as above.

Note that for well-known procedures, all variables that are free in the lambda are also free in the caller; that's why the 1-free-variable substitution works. The lambda is bound in a scope that dominates its callers, but its free variables dominate the lambda so they dominate the callers too. For that reason in this case we could choose to do lambda lifting instead, with no penalty: instead of bundling up the free variables in a heap object, we could pass them as arguments. Dybvig claims this is not a great idea because it increases register pressure. That could be true, but I haven't seen the numbers. Anyway, we do the flat closure thing, so we pack the free vars into data.

All these ideas came pretty much straight from the great Optimizing Closures in O(0) Time by Andrew Keep et al.

lambda: closure

OK! So you have a lambda whose callees are not all visible to the compiler. You need to reify the procedure as a value. That reified procedure-as-value is a closure: an object with a tag, a code pointer, and an array of free variables.

Of course, if the procedure has no free variables, you just have the tag and the code pointer... and because Scheme is semantically squirrely when it comes to the result of (eqv? (lambda () 10) (lambda () 10)) (it's unspecified: lambda expressions don't have identity), we can statically allocate the closure in the binary, as a constant.

Otherwise we do allocate the heap object.

Note however that if a group of mutually recursive procedures has just one entry that is not "well-known", then that procedure clique can share one closure object.

lambda: it's complicated

In summary, a lambda is an abstraction that has many concrete representations. Guile will choose the cheapest representation that it can. If you need to eke out even more performance from your program, having a good mental model of how the abstract maps to the concrete will help you know where to focus your efforts, and what changes might be helpful. Good luck, and happy hacking!

by Andy Wingo at February 08, 2016 10:12 AM

Claudio Saavedra

Mon 2016/Feb/08

About a year ago, Igalia was approached by the people working on printing-related technologies in HP to see whether we could give them a hand in their ongoing effort to improve the printing experience in the web. They had been working for a while in extensions for popular web browsers that would allow users, for example, to distill a web page from cruft and ads and format its relevant contents in a way that would be pleasant to read in print. While these extensions were working fine, they were interested in exploring the possibility of adding this feature to popular browsers, so that users wouldn't need to be bothered with installing extensions to have an improved printing experience.

That's how Alex, Martin, and me spent a few months exploring the Chromium project and its printing architecture. Soon enough we found out that the Chromium developers had been working already on a feature that would allow pages to be removed from cruft and presented in a sort of reader mode, at least in mobile versions of the browser. This is achieved through a module called dom distiller, which basically has the ability to traverse the DOM tree of a web page and return a clean DOM tree with only the important contents of the page. This module is based on the algorithms and heuristics in a project called boilerpipe with some of it also coming from the now popular Readability. Our goal, then, was to integrate the DOM distiller with the modules in Chromium that take care of generating the document that is then sent to both the print preview and the printing service, as well as making this feature available in the printing UI.

After a couple of months of work and thanks to the kind code reviews of the folks at Google, we got the feature landed in Chromium's repository. For a while, though, it remained hidden behind a runtime flag, as the Chromium team needed to make sure that things would work well enough in all fronts before making it available to all users. Fast-forward to last week, when I found out by chance that the runtime flag has been flipped and the Simplify page printing option has been available in Chromium and Chrome for a while now, and it has even reached the stable releases. The reader mode feature in Chromium seems to remain hidden behind a runtime flag, I think, which is interesting considering that this was the original motivation behind the dom distiller.

As a side note, it is worth mentioning that the collaboration with HP was pretty neat and it's a good example of the ways in which Igalia can help organizations to improve the web experience of users. From the standards that define the web to the browsers that people use in their everyday life, there are plenty of areas in which work needs to be done to make the web a more pleasant place, for web developers and users alike. If your organization relies on the web to reach its users, or to enable them to make use of your technologies, chances are that there are areas in which their experience can be improved and that's one of the things we love doing.

February 08, 2016 09:01 AM

February 04, 2016

Andy Wingo

guile compiler tasks

Hey! We released Guile 2.1.2, including the unboxing work, and we fixed the slow bootstrap problem by shipping pre-built bootstraps in tarballs. A pretty OK solution in my opinion; check it out!

future work

At this point I think I'm happy with Guile's compiler and VM, enough for now. There is a lot more work to do but it's a good point at which to release a stable series. There will probably be a number of additional pre-releases, but not any more significant compiler/VM work that must be done before a release.

However, I was talking with Guilers at FOSDEM last weekend and we realized that although we do a pretty good job at communicating the haps in compiler-land, we don't do a good job at sharing a roadmap or making it possible for other folks to join the hack. And indeed, it's been difficult to do so while things were changing so much: I had to get things right in my head before joining in the confusion of other people's heads.

In that spirit I'd like to share a list of improvements that it would be nice to make at some point. If you take one of these tasks, be my guest: find me on IRC (wingo on freenode) and let me know, and I'll help as I am able. You need to be somewhat independent; I'm not offering a proper mentoring or anything, more like office hours or something, where you come with the problem you are having and I commiserate and give context/background/advice as I am able.

So with that out of the way, here's a huge list of stuff! Following this, more details on each one.

  1. stripping binaries

  2. full source in binaries

  3. cps in in binaries

  4. linking multiple modules together

  5. linking a single executable

  6. instruction explosion

  7. elisp optimizations

  8. prompt removal

  9. basic register allocation

  10. optimal register allocation

  11. unboxed record fields

  12. textual CPS

  13. avoiding arity checks

  14. unboxed calls and returns

  15. module-level inlining

  16. cross-module inlining

As a bonus, in the end I'll give some notes on native compilation. But first, the hacks!

stripping binaries

Guile uses ELF as its object file format, and currently includes source location information as DWARF data. On space-constrained devices this might be too much. Your task: add a hack to the linker that can strip existing binaries. Read Ian Lance Taylor's linker articles for more background, if you don't know things about linkers yet.

full source in binaries

Wouldn't it be nice if the ELF files that Guile generates actually included the source as well as the line numbers? We could do that, in a separate strippable ELF section. This point is like the reverse of the previous point :)

cps in in binaries

We could also include the CPS IR in ELF files too. This would enable some kinds of link-time optimization and cross-module inlining. You'd need to define a binary format for CPS, like LLVM bitcode or so. Neat stuff :)

linking multiple modules together

Currently in Guile, just about every module is a separate .go file. Loading a module will cause a few stat calls and some seeks and reads and all that. Wouldn't it be nice if you could link together all the .go files that were commonly used into one object? Again this is a linker hack, but it needs support from the run-time as well: when the run-time goes to load a file, it should first check in a registry if that file has been logically provided by some other file. We'd be able to de-duplicate constant data from various modules. However there is an initialization phase when loading a .go file which effectively performs all the relocations needed by constants that need a fix-up at load-time; see the ELF article I linked to above for more. For some uses, it would be OK to produce one relocation/initialization procedure. For others, if you expected to only load a fraction of the modules in a .go file, it would be a lose on startup time,
so you would probably need to support lazy relocation when a module is first loaded.

Anyway, your task would be to write a linker hack that loads a bunch of .go files, finds the relocations in them, de-duplicates the constants, and writes out a combined .go file that includes a table of files contained in it. Good luck :) This hack would work great for Emacs, where it's effectively a form of unexec that doesn't actually rely on unexec.

linking a single executable

In the previous task, you could end up with the small guile binary that links to libguile (or your binary linking to libguile), and then a .go file containing all the modules you are interestd in. It sure would be nice to be able to link those together into just one binary, or at least to link the .go into the Guile binary. If the Guile is statically linked itself, you would have a statically linked application. If it's dynamically linked, it would remain dynamically linked. Again, a linker hack, but one that could provide a nicer way to distribute Guile binaries.

instruction explosion

Now we get more to the compiler side of things. Currently in Guile's VM there are instructions like vector-ref. This is a little silly: there are also instructions to branch on the type of an object (br-if-tc7 in this case), to get the vector's length, and to do a branching integer comparison. Really we should replace vector-ref with a combination of these test-and-branches, with real control flow in the function, and then the actual ref should use some more primitive unchecked memory reference instruction. Optimization could end up hoisting everything but the primitive unchecked memory reference, while preserving safety, which would be a win. But probably in most cases optimization wouldn't manage to do
this, which would be a lose overall because you have more instruction dispatch.

Well, this transformation is something we need for native compilation anyway. I would accept a patch to do this kind of transformation on the master branch, after version 2.2.0 has forked. In theory this would remove most all high level instructions from the VM, making the bytecode closer to a virtual CPU, and likewise making it easier for the compiler to emit native code as it's working at a lower level.

elisp optimizations

Guile implements Emacs Lisp, and does so well. However it hasn't been the focus of a lot of optimization. Emacs has a lot of stuff going on on its side, and so have we, so we haven't managed to replace the Elisp interpreter in Emacs with one written in Guile, though Robin Templeton has brought us a long way forward. We need someone to do both the integration work but also to poke the compiler and make sure it's a clear win.

prompt removal

It's pretty natural to use delimited continuations when compiling some kind of construct that includes a break statement to Guile, whether that compiler is part of Elisp or just implemented as a Scheme macro. But, many instances of prompts can be contified, resulting in no overhead at run-time. Read up on contification and contify the hell out of some prompts!

basic register allocation

Guile usually tries its best to be safe-for-space: only the data which might be used in the future of a program is kept alive, and the rest is available for garbage collection. Notably, this applies to function arguments, temporaries, and lexical variables: if a value is dead, the GC can collect it and re-use its space. However this isn't always what you want. Sometimes you might want to have all variables that are in scope to be available, for better debugging. Your task would be to implement a "slot allocator" (which is really register allocation) that keeps values alive in the parts of the programs that they dominate.

optimal register allocation

On the other hand, our slot allocator -- which is basically register allocation, but for stack slots -- isn't so great. It does OK but you can often end up shuffling values in a loop, which is the worst. Your task would be to implement a proper register allocator: puzzle-solving, graph-coloring, iterative coalescing, something that really tries to do a good job. Good luck!

unboxed record fields

Guile's "structs", on which records are implemented, support unboxed values, but these values are untyped, not really integrated with the record layer, and always boxed in the VM. Your task would be to design a language facility that allows us to declare records with typed fields, and to store unboxed values in those fields, and to cause access to their values to emit boxing/unboxing instructions around them. The optimizer will get rid of those boxing/unboxing instructions if it can. Good luck!

textual CPS

The CPS language is key to all compiler work in Guile, but it doesn't have a nice textual form like LLVM IR does. Design one, and implement a parser and an unparser!

avoiding arity checks

If you know the procedure you are calling, like if it's lexically visible, then if you are calling it with the right number of arguments you can skip past the argument check and instead do a call-label directly into the body. Would be pretty neat!

unboxed calls and returns

Likewise if a function's callers are all known, it might be able to unbox its arguments or return value, if that's a good idea. Tricky! You could start with a type inference pass or so, and maybe that could produce some good debugging feedback too.

module-level inlining

Guile currently doesn't inline anything that's not lexically visible. Unfortunately this restriction extends to top-level definitions in a module: they are treated as mutable and so never inlined/optimized/etc. Probably we need to change the semantics here such that a module can be compiled as a unit, and all values which are never mutated can be assumed to be constant. Probably you also want a knob to turn off this behavior, but really you can always re-compile and re-load a module as a whole if re-loading a function at run-time doesn't work because it was inlined. Anyway. Some semantic work here, but some peval work as well. Be careful!

cross-module inlining

Likewise Guile currently doesn't inline definitions from other modules. However for small functions this really hurts. Guile should probably serialize tree-il for small definitions in .go files, and allow peval to speculatively inline imported definitions. This is related to the previous point and has some semantic implications.

bobobobobobonus! native compilation

Thinking realistically, native compilation is the next step. We have the object file format, cool. We will need the ability to call out from machine code in .go files to run-time functions, so we need to enhance the linker, possibly even with things like PLT/GOT sections to avoid dirtying too many pages. We need to lower the CPS even further, to get closer to some kind of machine model, then go specific, with an assembler for each architecture. The priority in the beginning will be simplicity and minimal complexity; good codegen will come later. This is obviously the most attractive thing but it's also the most tricky, design-wise. I want to do at least part of this, so though you can't have it all, you are welcome to help :)

That's it for now. I'll amend the post with more things as and when I think of them. Comments welcome too, as always. Happy hacking!

by Andy Wingo at February 04, 2016 09:38 PM

Claudio Saavedra

Thu 2016/Feb/04

We've opened a few positions for developers in the fields of multimedia, networking, and compilers. I could say a lot about why working in Igalia is way different to working on your average tech-company or start-up, but I think the way it's summarized in the announcements is pretty good. Have a look at them if you are curious and don't hesitate to apply!

February 04, 2016 12:53 PM

February 03, 2016

Michael Catanzaro

On Subresource Certificate Validation

Ryan Castellucci has a quick read on subresource certificate validation. It is accurate; I fixed this shortly after joining Igalia. (Update: This was actually in response to a bug report from him.) Run his test to see if your browser is vulnerable.

Epiphany, Xombrero, Opera Mini and Midori […] were loading subresources, such as scripts, from HTTPS servers without doing proper certificate validation. […] Unfortunately Xombrero and Midori are still vulnerable. Xombrero seems to be dead, and I’ve gotten no response from them. I’ve been in touch with Midori, but they say they don’t have the resources to fix it, since it would require rewriting large portions of the code base in order to be able to use the fixed webkit.

I reported this to the Midori developers in late 2014 (private bug). It’s hard to understate how bad this is: it makes HTTPS completely worthless, because an attacker can silently modify JavaScript loaded via subresources.

This is actually a unique case in that it’s a security problem that was fixed only thanks to the great API break, which has otherwise been the cause of many security problems. Thanks to the API break, we were able to make the new API secure by default without breaking any existing applications. (But this does no good for applications unable to upgrade.)

(A note to folks who read Ryan’s post: most mainstream browsers do silently block invalid certificates, but Safari will warn instead. I’m not sure which behavior I prefer.)

by Michael Catanzaro at February 03, 2016 08:36 PM

February 01, 2016

Michael Catanzaro

On WebKit Security Updates

Linux distributions have a problem with WebKit security.

Major desktop browsers push automatic security updates directly to users on a regular basis, so most users don’t have to worry about security updates. But Linux users are dependent on their distributions to release updates. Apple fixed over 100 vulnerabilities in WebKit last year, so getting updates out to users is critical.

This is the story of how that process has gone wrong for WebKit.

Before we get started, a few disclaimers. I want to be crystal clear about these points:

  1. This post does not apply to WebKit as used in Apple products. Apple products receive regular security updates.
  2. WebKitGTK+ releases regular security updates upstream. It is safe to use so long as you apply the updates.
  3. The opinions expressed in this post are my own, not my employer’s, and not the WebKit project’s.

Browser Security in a Nutshell

Web engines are full of security vulnerabilities, like buffer overflows, null pointer dereferences, and use-after-frees. The details don’t matter; what’s important is that skilled attackers can turn these vulnerabilities into exploits, using carefully-crafted HTML to gain total control of your user account on your computer (or your phone). They can then install malware, read all the files in your home directory, use your computer in a botnet to attack websites, and do basically whatever they want with it.

If the web engine is sandboxed, then a second type of attack, called a sandbox escape, is needed. This makes it dramatically more difficult to exploit vulnerabilities. Chromium has a top-class Linux sandbox. WebKit does have a Linux sandbox, but it’s not any good, so it’s (rightly) disabled by default. Firefox does not have a sandbox due to major architectural limitations (which Mozilla is working on).

For this blog post, it’s enough to know that attackers use crafted input to exploit vulnerabilities to gain control of your computer. This is why it’s not a good idea to browse to dodgy web pages. It also explains how a malicious email can gain control of your computer. Modern email clients render HTML mail using web engines, so malicious emails exploit many of the same vulnerabilities that a malicious web page might. This is one reason why good email clients block all images by default: image rendering, like HTML rendering, is full of security vulnerabilities. (Another reason is that images hosted remotely can be used to determine when you read the email, violating your privacy.)

WebKit Ports

To understand WebKit security, you have to understand the concept of WebKit ports, because different ports handle security updates differently.

While most code in WebKit is cross-platform, there’s a large amount of platform-specific code as well, to improve the user and developer experience in different environments. Different “ports” run different platform-specific code. This is why two WebKit-based browsers, say, Safari and Epiphany (GNOME Web), can display the same page slightly differently: they’re using different WebKit ports.

Currently, the WebKit project consists of six different ports: one for Mac, one for iOS, two for Windows (Apple Windows and WinCairo), and two for Linux (WebKitGTK+ and WebKitEFL). There are some downstream ports as well; unlike the aforementioned ports, downstream ports are, well, downstream, and not part of the WebKit project. The only one that matters for Linux users is QtWebKit.

If you use Safari, you’re using the Mac or iOS port. These ports get frequent security updates from Apple to plug vulnerabilities, which users receive via regular updates.

Everything else is broken.

Since WebKit is not a system library on Windows, Windows applications must bundle WebKit, so each application using WebKit must be updated individually, and updates are completely dependent on the application developers. iTunes, which uses the Apple Windows port, does get regular updates from Apple, but beyond that, I suspect most applications never get any security updates. This is a predictable result, the natural consequence of environments that require bundling libraries.

(This explains why iOS developers are required to use the system WebKit rather than bundling their own: Apple knows that app developers will not provide security updates on their own, so this policy ensures every iOS application rendering HTML gets regular WebKit security updates. Even Firefox and Chrome on iOS are required to use the system WebKit; they’re hardly really Firefox or Chrome at all.)

The same scenario applies to the WinCairo port, except this port does not have releases or security updates. Whereas the Apple ports have stable branches with security updates, with WinCairo, companies take a snapshot of WebKit trunk, make their own changes, and ship products with that. Who’s using WinCairo? Probably lots of companies; the biggest one I’m aware of uses a WinCairo-based port in its AAA video games. It’s safe to assume few to no companies are handling security backports for their downstream WinCairo branches.

Now, on to the Linux ports. WebKitEFL is the WebKit port for the Enlightenment Foundation Libraries. It’s not going to be found in mainstream Linux distributions; it’s mostly used in embedded devices produced by one major vendor. If you know anything at all about the internet of things, you know these devices never get security updates, or if they do, the updates are superficial (updating only some vulnerable components and not others), or end a couple months after the product is purchased. WebKitEFL does not bother with pretense here: like WinCairo, it has never had security updates. And again, it’s safe to assume few to no companies are handling security backports for their downstream branches.

None of the above ports matter for most Linux users. The ports available on mainstream Linux distributions are QtWebKit and WebKitGTK+. Most of this blog will focus on WebKitGTK+, since that’s the port I work on, and the port that matters most to most of the people who are reading this blog, but QtWebKit is widely-used and deserves some attention first.

It’s broken, too.


QtWebKit is the WebKit port used by Qt software, most notably KDE. Some cherry-picked examples of popular applications using QtWebKit are Amarok, Calligra, KDevelop, KMail, Kontact, KTorrent, Quassel, Rekonq, and Tomahawk. QtWebKit provides an excellent Qt API, so in the past it’s been the clear best web engine to use for Qt applications.

After Google forked WebKit, the QtWebKit developers announced they were switching to work on QtWebEngine, which is based on Chromium, instead. This quickly led to the removal of QtWebKit from the WebKit project. This was good for the developers of other WebKit ports, since lots of Qt-specific code was removed, but it was terrible for KDE and other QtWebKit users. QtWebKit is still maintained in Qt and is getting some backports, but from a quick check of their git repository it’s obvious that it’s not receiving many security updates. This is hardly unexpected; QtWebKit is now years behind upstream, so providing security updates would be very difficult. There’s not much hope left for QtWebKit; these applications have hundreds of known vulnerabilities that will never be fixed. Applications should port to QtWebEngine, but for many applications this may not be easy or even possible.

Update: As pointed out in the comments, there is some effort to update QtWebKit. I was aware of this and in retrospect should have mentioned this in the original version of this article, because it is relevant. Keep an eye out for this; I am not confident it will make its way into upstream Qt, but if it does, this problem could be solved.


WebKitGTK+ is the port used by GTK+ software. It’s most strongly associated with its flagship browser, Epiphany, but it’s also used in other places. Some of the more notable users include Anjuta, Banshee, Bijiben (GNOME Notes), Devhelp, Empathy, Evolution, Geany, Geary, GIMP, gitg, GNOME Builder, GNOME Documents, GNOME Initial Setup, GNOME Online Accounts, GnuCash, gThumb, Liferea, Midori, Rhythmbox, Shotwell, Sushi, and Yelp (GNOME Help). In short, it’s kind of important, not only for GNOME but also for Ubuntu and Elementary. Just as QtWebKit used to be the web engine for choice for Qt applications, WebKitGTK+ is the clear choice for GTK+ applications due to its nice GObject APIs.

Historically, WebKitGTK+ has not had security updates. Of course, we released updates with security fixes, but not with CVE identifiers, which is how software developers track security issues; as far as distributors are concerned, without a CVE identifier, there is no security issue, and so, with a few exceptions, distributions did not release our updates to users. For many applications, this is not so bad, but for high-risk applications like web browsers and email clients, it’s a huge problem.

So, we’re trying to improve. Early last year, my colleagues put together our first real security advisory with CVE identifiers; the hope was that this would encourage distributors to take our updates. This required data provided by Apple to WebKit security team members on which bugs correspond to which CVEs, allowing the correlation of Bugzilla IDs to Subversion revisions to determine in which WebKitGTK+ release an issue has been fixed. That data is critical, because without it, there’s no way to know if an issue has been fixed in a particular release or not. After we released this first advisory, Apple stopped providing the data; this was probably just a coincidence due to some unrelated internal changes at Apple, but it certainly threw a wrench in our plans for further security advisories.

This changed in November, when I had the pleasure of attending the WebKit Contributors Meeting at Apple’s headquarters, where I was finally able meet many of the developers I had interacted with online. At the event, I gave a presentation on our predicament, and asked Apple to give us information on which Bugzilla bugs correspond to which CVEs. Apple kindly provided the necessary data a few weeks later.

During the Web Engines Hackfest, a yearly event that occurs at Igalia’s office in A Coruña, my colleagues used this data to put together WebKitGTK+ Security Advisory WSA-2015-0002, a list of over 130 vulnerabilities disclosed since the first advisory. (The Web Engines Hackfest was sponsored by Igalia, my employer, and by our friends at Collabora. I’m supposed to include their logos here to advertise how cool it is that they support the hackfest, but given all the doom and gloom in this post, I decided perhaps they would perhaps prefer not to have their logos attached to it.)

Note that 130 vulnerabilities is an overcount, as it includes some issues that are specific to the Apple ports. (In the future, we’ll try to filter these out.) Only one of the issues — a serious error in the networking backend shared by WebKitGTK+ and WebKitEFL — resided in platform-specific code; the rest of the issues affecting WebKitGTK+ were all cross-platform issues. This is probably partly because the trickiest code is cross-platform code, and partly because security researchers focus on Apple’s ports.

Anyway, we posted WSA-2015-0002 to the oss-security mailing list to make sure distributors would notice, crossed our fingers, and hoped that distributors would take the advisory seriously. That was one month ago.

Distribution Updates

There are basically three different approaches distributions can take to software updates. The first approach is to update to the latest stable upstream version as soon as, or shortly after, it’s released. This is the strategy employed by Arch Linux. Arch does not provide any security support per se; it’s not necessary, so long as upstream projects release real updates for security problems and not simply patches. Accordingly, Arch almost always has the latest version of WebKitGTK+.

The second main approach, used by Fedora, is to provide only stable release updates. This is more cautious, reflecting that big updates can break things, so they should only occur when upgrading to a new version of the operating system. For instance, Fedora 22 shipped with WebKitGTK+ 2.8, so it would release updates to new 2.8.x versions, but not to WebKitGTK+ 2.10.x versions.

The third approach, followed by most distributions, is to take version upgrades only rarely, or not at all. For smaller distributions this may be an issue of manpower, but for major distributions it’s a matter of avoiding regressions in stable releases. Holding back on version updates actually works well for most software. When security problems arise, distribution maintainers for major distributions backport fixes and release updates. The problem is that this not feasible for web engines; due to the huge volume of vulnerabilities that need fixed, security issues can only practically be handled upstream.

So what’s happened since WSA-2015-0002 was released? Did it convince distributions to take WebKitGTK+ security seriously? Hardly. Fedora is the only distribution that has made any changes in response to WSA-2015-0002, and that’s because I’m one of the Fedora maintainers. (I’m pleased to announce that we have a 2.10.7 update headed to both Fedora 23 and Fedora 22 right now. In the future, we plan to release the latest stable version of WebKitGTK+ as an update to all supported versions of Fedora shortly after it’s released upstream.)


Ubuntu releases WebKitGTK+ updates somewhat inconsistently. For instance, Ubuntu 14.04 came with WebKitGTK+ 2.4.0. 2.4.8 is available via updates, but even though 2.4.9 was released upstream over eight months ago, it has not yet been released as an update for Ubuntu 14.04.

By comparison, Ubuntu 15.10 (the latest release) shipped with WebKitGTK+ 2.8.5, which has never been updated; it’s affected by about 40 vulnerabilities fixed in the latest upstream release. Ubuntu organizes its software into various repositories, and provides security support only to software in the main repository. This version of WebKitGTK+ is in Ubuntu’s “universe” repository, not in main, so it is excluded from security support. Ubuntu users might be surprised to learn that a large portion of Ubuntu software is in universe and therefore excluded from security support; this is in contrast to almost all other distributions, which typically provide security updates for all the software they ship.

I’m calling out Ubuntu here not because it is specially-negligent, but simply because it is our biggest distributor. It’s not doing any worse than most of our other distributors.


Debian provides WebKit updates to users running unstable, and to testing except during freeze periods, but not to released version of Debian. Debian is unique in that it has a formal policy on WebKit updates. Here it is, reproduced in full:

Debian 8 includes several browser engines which are affected by a steady stream of security vulnerabilities. The high rate of vulnerabilities and partial lack of upstream support in the form of long term branches make it very difficult to support these browsers with backported security fixes. Additionally, library interdependencies make it impossible to update to newer upstream releases. Therefore, browsers built upon the webkit, qtwebkit and khtml engines are included in Jessie, but not covered by security support. These browsers should not be used against untrusted websites.

For general web browser use we recommend Iceweasel or Chromium.

Chromium – while built upon the Webkit codebase – is a leaf package, which will be kept up-to-date by rebuilding the current Chromium releases for stable. Iceweasel and Icedove will also be kept up-to-date by rebuilding the current ESR releases for stable.

(Iceweasel and Icedove are Debian’s de-branded versions of Firefox and Thunderbird, the product of an old trademark spat with Mozilla.)

Debian is correct that we do not provide long term support branches, as it would be very difficult to backport security fixes. But it is not correct that “library interdependencies make it impossible to update to newer upstream releases.” This might have been true in the past, but for several years now, we have avoided requiring new versions of libraries whenever it would cause problems for distributions, and — with one big exception that I will discuss below — we ensure that each release maintains both API and ABI compatibility. (Distribution maintainers should feel free to get in touch if we accidentally introduce some compatibility issue for your distribution; if you’re having trouble taking our updates, we want to help. I recently worked with openSUSE to make sure WebKitGTK+ can still be compiled with GCC 4.8, for example.)

The risk in releasing updates is that WebKitGTK+ is not a leaf package: a bad update could break some application. This seems to me like a good reason for application maintainers to carefully test the updates, rather than a reason to withhold security updates from users, but it’s true there is some risk here. One possible solution would be to have two different WebKitGTK+ packages, say, webkitgtk-secure, which would receive updates and be used by high-risk software like web browsers and email clients, and a second webkitgtk-stable package that would not receive updates to reduce regression potential.

Recommended Distributions

We regularly receive bug reports from users with very old versions of WebKit, who trust their distributors to handle security for them and might not even realize they are running ancient, unsafe versions of WebKit. I strongly recommend using a distribution that releases WebKitGTK+ updates shortly after they’re released upstream. That is currently only Arch and Fedora. (You can also safely use WebKitGTK+ in Debian testing — except during its long freeze periods — and Debian unstable, and maybe also in openSUSE Tumbleweed, and (update) also in Gentoo testing. Just be aware that the stable releases of these distributions are currently not receiving our security updates.) I would like to add more distributions to this list, but I’m currently not aware of any more that qualify.

The Great API Break

So, if only distributions would ship the latest release of WebKitGTK+, then everything would be good, right? Nope, because of a large API change that occurred two and a half years ago, called WebKit2.

WebKit (an API layer within the WebKit project) and WebKit2 are two separate APIs around WebCore. WebCore is the portion of the WebKit project that Google forked into Blink; it’s too low-level to be used directly by applications, so it’s wrapped by the nicer WebKit and WebKit2 APIs. The difference between the WebKit and WebKit2 APIs is that WebKit2 splits work into multiple secondary processes. Asides from the UI process, an application will have one or many separate web processes (for the actual page rendering), possibly a separate network process, and possibly a database process for IndexedDB. This is good for security, because it allows the secondary processes to be sandboxed: the web process is the one that’s likely to be compromised first, so it should not have the ability to access the filesystem or the network. (Remember, though, that there is no Linux sandbox yet, so this is currently only a theoretical benefit.) The other main benefit is robustness. If a web site crashes the renderer, only a single web process crashes (corresponding to one tab in Epiphany), not the entire browser. UI process crashes are comparatively rare.

Intermission: Certificate Verification

Another advantage provided by the API change is the opportunity to handle HTTPS connections more securely. In the original WebKitGTK+ API, applications must handle certificate verification on their own. This was a serious mistake; predictably, applications performed no verification at all, or did so improperly. For instance, take this Shotwell bug which is not fixed in any released version of Shotwell, or this Banshee bug which is still open. Probably many more applications are affected, because I have not done a comprehensive check. The new API is secure by default; applications can ignore verification errors, but only if they go out of their way to do so.

Remember that even though WebKitGTK+ 2.4.9 was released upstream over eight months ago, Ubuntu 14.04 is still on 2.4.8? It’s worth mentioning that 2.4.9 contains the fix for that serious networking backend issue I mentioned earlier (CVE-2015-2330). The bug is that TLS certificate verification was not performed until an HTTP response was received from the server; it’s supposed to be performed before sending an HTTP request, to prevent secure cookies from leaking. This is a disaster, as attackers can easily use it to get your session cookie and then control your user account on most websites. (Credit to Ross Lagerwall for reporting that issue.) We reported this separately to oss-security due to its severity, but that was not enough to convince distributions to update. But most applications in Ubuntu 14.04, including Epiphany and Midori, would not even benefit from this fix, because the change only affects WebKit2; remember, there’s no certificate verification in the original WebKitGTK+ API. (Modern versions of Epiphany do use WebKit2, but not the old version included in Ubuntu 14.04.) Old versions of Epiphany and Midori load pages even if certificate verification fails; the verification result is only used to change the status of a security indicator, basically giving up your session cookies to attackers.

Removing WebKit1

WebKit2 has been around for Mac and iOS for longer, but the first stable release for WebKitGTK+ was the appropriately-versioned WebKitGTK+ 2.0, in March 2013. This release actually contained three different APIs: webkitgtk-1.0, webkitgtk-3.0, and webkit2gtk-3.0. webkitgtk-1.0 was the original API, used by GTK+ 2 applications. webkitgtk-3.0 was the same thing for GTK+ 3 applications, and webkit2gtk-3.0 was the new WebKit2 API, available only for GTK+ 3 applications.

Maybe it should have remained that way.

But, since the original API was a maintenance burden and not as stable or robust as WebKit2, it was deleted after the WebKitGTK+ 2.4 release in March 2014. Applications had had a full year to upgrade; surely that was long enough, right? The original WebKit API layer is still maintained for the Mac, iOS, and Windows ports, but the GTK+ API for it is long gone. WebKitGTK+ 2.6 (September 2014) was released with only one API, webkit2gtk-4.0, which was basically the same as webkit2gtk-3.0 except for a couple small fixes; most applications were able to upgrade by simply changing the version number. Since then, we have maintained API and ABI compatibility for webkit2gtk-4.0, and intend to do so indefinitely, hopefully until GTK+ 4.0.

A lot of good that does for applications using the API that was removed.

WebKit2 Adoption

While upgrading to the WebKit2 API will be easy for most applications (it took me ten minutes to upgrade GNOME Initial Setup), for many others it will be a significant challenge. Since rendering occurs out of process in WebKit2, the DOM API can only be accessed by means of a shared object injected into the web process. For applications that perform only a small amount of DOM manipulation, this is a minor inconvenience compared to the old API. For applications that use extensive DOM manipulation — the email clients Evolution and Geary, for instance — it’s not just an inconvenience, but a major undertaking to upgrade to the new API. Worse, some applications (including both Geary and Evolution) placed GTK+ widgets inside the web view; this is no longer possible, so such widgets need to be rewritten using HTML5. Say nothing of applications like GIMP and Geany that are stuck on GTK+ 2. They first have to upgrade to GTK+ 3 before they can consider upgrading to modern WebKitGTK+. GIMP is working on a GTK+ 3 port anyway (GIMP uses WebKitGTK+ for its help browser), but many applications like Geany (the IDE, not to be confused with Geary) are content to remain on GTK+ 2 forever. Such applications are out of luck.

As you might expect, most applications are still using the old API. How does this work if it was already deleted? Distributions maintain separate packages, one for old WebKitGTK+ 2.4, and one for modern WebKitGTK+. WebKitGTK+ 2.4 has not had any updates since last May, and the last real comprehensive security update was over one year ago. Since then, almost 130 vulnerabilities have been fixed in newer versions of WebKitGTK+. But since distributions continue to ship the old version, few applications are even thinking about upgrading. In the case of the email clients, the Evolution developers are hoping to upgrade later this year, but Geary is completely dead upstream and probably will never be upgraded. How comfortable are you with using an email client that has now had no security updates for a year?

(It’s possible there might be a further 2.4 release, because WebKitGTK+ 2.4 is incompatible with GTK+ 3.20, but maybe not, and if there is, it certainly will not include many security fixes.)

Fixing Things

How do we fix this? Well, for applications using modern WebKitGTK+, it’s a simple problem: distributions simply have to start taking our security updates.

For applications stuck on WebKitGTK+ 2.4, I see a few different options:

  1. We could attempt to provide security backports to WebKitGTK+ 2.4. This would be very time consuming and therefore very expensive, so count this out.
  2. We could resurrect the original webkitgtk-1.0 and webkitgtk-3.0 APIs. Again, this is not likely to happen; it would be a lot of work to restore them, and they were removed to reduce maintenance burden in the first place. (I can’t help but feel that removing them may have been a mistake, but my colleagues reasonably disagree.)
  3. Major distributions could remove the old WebKitGTK+ compatibility packages. That will force applications to upgrade, but many will not have the manpower to do so: good applications will be lost. This is probably the only realistic way to fix the security problem, but it’s a very unfortunate one. (But don’t forget about QtWebKit. QtWebKit is based on an even older version of WebKit than WebKitGTK+ 2.4. It doesn’t make much sense to allow one insecure version of WebKit but not another.)

Or, a far more likely possibility: we could do nothing, and keep using insecure software.

by Michael Catanzaro at February 01, 2016 10:40 PM

Xabier Rodríguez Calvar

Web Engines Hackfest according to me

And once again, in December we celebrated the hackfest. This year happened between Dec 7-9 at the Igalia premises and the scope was much broader than WebKitGTK+, that’s why it was renamed as Web Engines Hackfest. We wanted to gather people working on all open source web engines and we succeeded as we had people working on WebKit, Chromium/Blink and Servo.

The edition before this I was working with Youenn Fablet (from Canon) on the Streams API implementation in WebKit and we spent our time on the same thing again. We have to say that things are much more mature now. During the hackfest we spent our time in fixing the JavaScriptCore built-ins inside WebCore and we advanced on the automatic importation of the specification web platform tests, which are based on our prior test implementation. Since now they are managed there, it does not make sense to maintain them inside WebKit too, we just import them. I must say that our implementation is fairly complete since we support the current version of the spec and have almost all tests passing, including ReadableStream, WritableStream and the built-in strategy classes. What is missing now is making Streams work together with other APIs, such as Media Source Extensions, Fetch or XMLHttpRequest.

There were some talks during the hackfest and we did not want to be less, so we had our own about Streams. You can enjoy it here:

You can see all hackfest talks in this YouTube playlist. The ones I liked most were the ones by Michael Catanzaro about HTTP security, which is always interesting given the current clumsy political movements against cryptography and the one by Dominik Röttsches about font rendering. It is really amazing what a browser has to do just to get some letters painted on the screen (and look good).

As usual, the environment was amazing and we had a great time, including the traditional Street Fighter‘s match, where Gustavo found a worthy challenger in Changseok :)

Of course, I would like to thank Collabora and Igalia for sponsoring the event!

And by the way, quite shortly after that, I became a WebKit reviewer!

by calvaris at February 01, 2016 11:06 AM

January 31, 2016

Manuel Rego

Deep Dive into Grid Layout Placement

During the last months as part of my work in Igalia I’ve been focused on finishing the new/missing bits of the CSS Grid Layout Blink’s implementation related to items placement. In summary the task was mostly related to 2 things:

  • Support implicit grid before explicit. So the grid can add tracks on demand not only on the growing direction (usually right/bottom) but also on the opposite one.

  • Fix unknown named grid lines resolution. This is the case when an item is placed in a line called “foo”, but no lines with that name exist in the grid.

These might seem quite simple tasks, but they implied quite a lot of changes in the underlying implementation. I ended up refactoring all the code to resolve grid positions in order to complete them. I even wrote a document explaining the whole plan and all the technical details, where you can find links to all the related patches.

Now that we’ve finished this, it’s time to explain how you can use it. Although my colleague Sergio already wrote about this in 2014, the specification has changed since that time, so I think it’s better to try to explain the whole thing from scratch. This post is a kind of summary with examples of Placing Grid Items” section of the CSS Grid Layout spec.

Grid lines

This is probably one of the most important concepts of the grid layout spec. The grid lines are the ones dividing horizontally and vertically a grid. And they’re actually numbered, starting at 1.

Let’s use a 3x2 grid as example to explain how this works:

<div style="display: grid;
            grid-template-columns: 300px 200px 100px;
            grid-template-rows: 100px 50px;">

Numbered grid lines example Numbered grid lines example

Grid placement properties

In order to position the items inside the grid container, you need to use the grid placement properties. These properties are:

  • grid-column-start: Set the first vertical line for the item.
  • grid-column-end: Set the last vertical line for the item.
  • grid-row-start: Set the first horizontal line for the item.
  • grid-row-end: Set the last horizontal line for the item.

With these properties you define the area where the grid item will be placed. In order to do that, you use the line numbers.

The initial value for these properties is auto, which makes possible that items are automatically placed looking for empty cells inside the grid. For more information about this, please review a previous post on the matter.

On top of that, there’re some handy shorthands:

  • grid-column: Shorthand for grid-column-start and grid-column-end properties.
  • grid-row: Shorthand for grid-row-start and grid-row-end properties.
  • grid-area: Shorthand to set the 4 placement properties in just one declaration.

So, imagine that you add the following item in the previous grid:

<div style="grid-column-start: 2; grid-column-end: 4;
  grid-row-start: 1; grid-row-end 2;">

Probably easier to read if you use some shorthands:

<div style="grid-column: 2 / 4; grid-row: 1 / 2;"></div>

This means that the grid item will be placed taking the 2nd and 3rd columns in the first row.

Place item using line numbers example Place item using line numbers example

Cell spanning

Previous item was spanning 2 columns (2nd and 3rd ones) referencing the lines. You could do the same using the span keyword, together with the number of cells you want to span.

So, you could place the item in the very same position using:

<div style="grid-column: 2 / span 2; grid-row: 1;"></div>

Place item using span example Place item using span example

Note that here you’re not setting the end line for the row. This means that grid-row-end takes a value of auto. In this case auto defaults to a span of one.

Negative line numbers

So far we’ve only seen positive numbers, but lines have also negative indexes. Negative numbers allow you to reference the lines starting from the end of the grid.

Following with the same example, you could place the item again in the same position using the negative indexes:

<div style="grid-column: -3 / -1; grid-row: -3 / -2;"></div>

Place item using negative line numbers example Place item using negative line numbers example

This might be really useful in some situations. For example, if you want to be sure that the item is in the last column, independently of the number of tracks, you’ll just need to set: grid-column-end: -1;.

Named grid lines

Not only that, but you can also name the grid lines, so you don’t need to remember the specific number to reference to them.

Let’s modify the definition of the grid, keeping the size of tracks but adding names to the lines:

<div style="display: grid;
            grid-template-columns: [start] 300px [main] 200px [aside] 100px [end];
            grid-template-rows: [top] 100px [middle] 50px [bottom];">

And again if we want to place the item in the same position, we just need to reference the name of the lines:

<div style="grid-column: main / end; grid-row: top / middle;"></div>

Place item using line names example Place item using line names example

One line can have several names, you just need to set them in the definition: grid-template-rows: [top start] 100px [middle center] 50px [bottom end];.

Also the names of the lines can be repeated. To reference them you’ve to use a number that can be again positive or negative. Let’s use a different example to showcase this:

<div style="display: grid;
            grid-template-columns: [col] 200px [gap] 50px [col] 200px [gap] 50px [col] 200px [gap];">

And imagine that you place some items like this:

<div style="grid-column: col 2;"></div>
<div style="grid-column: col 1 / gap 2;"></div>
<div style="grid-column: col -1;"></div>

Place items with repeated named grid lines example Place items with repeated named grid lines example

And of course, you can span to a named grid line:

<div style="grid-column: col 2 / span 2 gap;"></div>

Place item spanning to named grid line example Place item spanning to named grid line example

Grid areas

Better still, you can define grid areas and place items directly on them. You have to use the grid-template-areas property to put names to the different areas in your grid. And you could use the grid-area shorthand directly to place the items.

Let’s use a bigger grid (5x4) to show an example:

<div style="display: grid;
            grid-template-columns: 100px 100px 100px 100px 100px;
            grid-template-rows: 100px 100px 100px 100px;
                'title  title  title  title  aside'
                'nav    main   main   main   aside'
                'nav    main   main   main   aside'
                'footer footer footer footer footer';">

And position one item in each of the areas:

<div style="grid-area: title;"></div>
<div style="grid-area: nav;"></div>
<div style="grid-area: main;"></div>
<div style="grid-area: aside;"></div>
<div style="grid-area: footer;"></div>

Place items inside grid areas example Place items inside grid areas example

Grid areas & Named grid lines

One interesting thing about areas and placement is that grid areas create implicit names for the grid lines surrounding them. These implicit names use the “-start” and “-end” suffixes. And you can reference those lines when placing an item, instead of using the whole area.

E.g. the “title” area from previous example creates 4 implicit names for the lines (2 in each axis):

  • Left line: “title-start
  • Right line: “title-end
  • Top line: “title-start
  • Bottom line: “title-end

Following with that example you could place an item using the implicit names:

<div style="grid-column: main-start / aside-end;
            grid-row: title-start / nav-end;"></div>

Place item using implicit names from grid areas example Place item using implicit names from grid area

And the same can happen the other way around. If you name lines using those suffixes, you’re actually creating implicit areas. So, we could just create the very same grid using named grid lines:

<div style="display: grid;
            grid-template-columns: [title-start nav-start footer-start] 100px [main-start nav-end] 100px 100px 100px [aside-start title-end main-end] 100px [aside-end footer-end];
            grid-template-rows: [title-start aside-start] 100px [nav-start main-start title-end] 100px 100px [footer-start nav-end main-end aside-end] 100px [footer-end];">

All the examples of items positioned in this section will be exactly in the same place with this new grid.

Implicit grid

With the grid definition properties (grid-template-columns, grid-template-rows and grid-template-areas) you determine the explicit number of tracks (columns and rows) in your grid. However, grid spec allows you to place items outside of the explicit grid. In order to support that, implicit tracks are created automatically, the size of these tracks is controlled by grid-auto-columns and grid-auto-rows properties. In the following examples I’ll use red color to highlight the implicit lines.

This time let’s use a simple 2x1 grid:

<div style="display: grid;
            grid-template-columns: 200px 200px;
            grid-auto-columns: 100px;">

And imagine that you place an item in the 5th column (grid-column: 5;). As the grid only has 2 columns, 3 implicit columns will be added in order to position the item.

Implicit grid example Implicit grid example

Again you can also create implicit tracks with items that span several cells. For example, if an item takes 3 columns starting on the 2nd one (grid-column: 2 / span 3);

Implicit grid with span example Implicit grid with span example

Originally, the implicit tracks could only be added at the end. But now it’s possible to add implicit tracks before the explicit grid. For example, if you place an item using grid-column: -5; it’ll add 2 columns on the left and it’ll be placed in the -2nd column.

Implicit grid before explicit example Implicit grid before explicit example

Implicit grid & Named grid lines

But this is not the only way to create implicit tracks, you can also create them if you use undefined named grid lines. This is more a way to show mistakes on the user CSS than a feature itself, but maybe someone finds it useful. The idea is that all the lines in the implicit grid will take any random name you might need to position an item.

The basic example is placing items referencing a nonexistent line called “foo”. For example you will create 3 implicit columns (1 before and 2 after the explicit grid) with the following items:

<div style="grid-column: foo;"></div>
<div style="grid-column: 2 / span 2 foo;"></div>
<div style="grid-column: -1 foo;"></div>

Implicit grid using undefined named grid lines example Implicit grid using undefined named grid lines example

Note that the simplest example grid-column: foo is being placed in the 4th column (adding an extra empty column just after the explicit grid). This is because first line that is considered to be called “foo” is the first implicit line (line 4), so last line of the grid (line 3) is not included.

Also, the last item grid-column: -1 foo is placed on the -1th column (maybe you was not expecting that). This is because of you start looking for a line named “foo” from the edge of the explicit grid. So, you ignore lines -1, -2 and -3 (as they’re not called “foo”) and consider line -4 (first line on the implicit grid) to have that name.

This is a bit trickier if the line actually exists, as you’ve to count it too in order to place the item. Specially it’s complex if you’re using span to a named grid line, but there’re not enough lines. In that case only implicit lines in the search direction are considered to have that name.

Again, hopefully an example can help to understand this. Let’s add a name to the middle line in the previous example:

<div style="display: grid;
            grid-template-columns: 200px [middle] 200px;
            grid-auto-columns: 100px;">

And now, let’s use place a few items referencing that “middle” line:

<div style="grid-column: 3 middle;"></div>
<div style="grid-column: span 2 middle / 5;"></div>
<div style="grid-column: -4 / span middle;"></div>

Implicit grid using more named grid lines than available example Implicit grid using more named grid lines than available example

The strange case here is grid-column: span 2 middle / 5;, as you can see it takes from -1th column to 4th column (both included). The item ends at line 5, and it has to span 2 lines called “middle” to find the start. You could think that it should count line 4 and line 2, but, as explained before, you have to start counting lines from the edge of the explicit grid. So you actually count line 2 and then you’ve to consider the implicit lines on the left to find the start position (line -4).

Special cases

Grid placement properties have a few special situations that are resolved by the conflict handling section of the spec.

For example, if you place an item where the end line is before than the start line, both lines are swapped. Thus, something like grid-column: 5 / 2; would become grid-column: 2 / 5;.

Another situation is the one in which you have span in both the start and end positions. The span for the end position is discarded. So, grid-column: span 2 / span 3; would become grid-column: span 2;. Which will use the grid placement algorithm to find an empty area (of 2 columns in this particular example) to position itself.

Last one is the case when you only have a span to named grid line. In that case, it’s replaced by span 1. E.g. grid-column: span foo; will become grid-column: span 1;.

Placement special cases example Placement special cases example


If you have read that far it seems you’re really interested on CSS Grid Layout. The main conclusion is that the specification is really flexible regarding how to place items on the grid. As you can see there’re quite a lot of different ways to position an item in the very same place. Probably each person will get used to a few of them and just forget about the rest.

IMHO the basic stuff (line indexes, spans, line names and areas) is quite straightforward. And the implicit grid is not that hard either. Then negative indexes bring some fun. However undefined named grid lines behavior is really tricky (hopefully not something you should care about anyway). But it’s true that I’m biased as I’ve been dealing with this for a long time.

Last, I thought it would be nice to finish this with a big example which uses most of the things described in this post. If you got it right and don’t get confused at all you’re mastering grid layout placement. Congrats! 😄

This is the definition of the grid container used in this example:

<div style="display: grid;
            grid-template-columns: [left] 200px [main-start] 100px [center] 100px 100px [main-end] 50px 50px [right];
            grid-template-rows: [top title-start] 50px [title-end main-start] 200px [main-end center] 150px 100px [bottom];
            grid-auto-columns: 50px;
            grid-auto-rows: 50px;">

And in the next picture you can see how different items will be placed.

Placement big example Placement big example

You can try it live in our examples repository:


As commented on the introduction Blink’s implementation should support now (Chrome 50+) all these different placement possibilities. Igalia has been working on this implementation and we’re porting it now to WebKit too.

On the other side, Gecko has already support for it too, in this case developed by Mozilla.

Igalia and Bloomberg working together to build a better web Igalia and Bloomberg working together to build a better web

Finally, as usual I’d like to highlight one more time that all this work has been done as part of a collaboration between Igalia and Bloomberg. Big thanks for your support!


January 31, 2016 11:00 PM