Planet Igalia

May 29, 2015

Samuel Iglesias

piglit (III): How to write GLSL shader tests

In earlier posts I talked about how to install piglit on your system and run it for the first time, and about how to tailor your piglit run. I also gave a talk at FOSDEM 2015 about how to test your OpenGL drivers using Free Software, where I explained how to run piglit and dEQP. This post and the next one introduce the topic of creating new tests for piglit, which this series has not covered before.

Brief introduction

Before we start talking about how to create a GLSL test, let me explain a couple of things about how piglit organizes its tests.

The first thing is that all the tests are defined inside the tests/ directory.

  • The ones related to a specific spec (such as an OpenGL extension, a GL version or a GLSL version) are placed inside the tests/spec subdirectory.
  • Shader tests are usually defined inside the tests/shaders/ directory, but not always: if they are specific to an OpenGL extension, they will be in its subdirectory (see the first bullet).
  • Tests specific to other APIs: tests/egl, tests/cl, etc.
  • Etc.

Take your time browsing all these directories to figure out the best place for your tests or, if you are looking for a specific test, where it is likely to be.

The second thing is that binary tests must be defined inside tests/all.py to be executed in a full piglit run. This is not the case for GLSL compiler tests or shader_runner tests, as we are going to see in this post.

The most common language to write these tests is C, but there are a couple of alternatives if you are interested in writing GLSL shader tests. Let's start with a couple of examples of GLSL shader tests, as that is the easiest way to contribute new tests to piglit.

GLSL compiler test

When creating a GLSL compiler test, most of the time you can avoid writing a C program with all its bells and whistles. In fact, it is pretty straightforward if you just want to check that a GLSL shader compiles successfully or not, according to your expectations.

Usually, the rule of thumb is to keep GLSL tests simple and focused on detecting failure (or success) in shader compilation. For example, suppose you want to check that defining a shader storage buffer block is only possible if the ARB_shader_storage_buffer_object extension is enabled in the driver.

#version 120
#extension GL_ARB_shader_storage_buffer_object: disable

buffer ssbo {
    vec4 a;
};

void foo(void) {
}

Once you have written the GLSL shader, save it to a file with a name like extension-disabled-shader-storage-block.frag. There are several suffixes you can use depending on the GLSL shader type: .vert for vertex shaders, .geom for geometry shaders, .frag for fragment shaders. Notice that the name itself summarizes what the test does, so other people can tell what it checks without opening the file.

There is a piglit binary called glslparser that will pick up this shader and compile it, checking for errors. This binary needs some parameters to know how to run the test and what the expected result is, so we provide them by adding the following content at the top of the shader:

// [config]
// expect_result: fail
// glsl_version: 1.20
// require_extensions: GL_ARB_shader_storage_buffer_object
// [end config]

There we write down that we expect the test to fail, the minimum GLSL version required to run the test, and the required extensions. In this case we need GLSL 1.20 and the GL_ARB_shader_storage_buffer_object extension. The complete test file looks like this:

// [config]
// expect_result: fail
// glsl_version: 1.20
// require_extensions: GL_ARB_shader_storage_buffer_object
// [end config]

#version 120
#extension GL_ARB_shader_storage_buffer_object: disable

buffer ssbo {
    vec4 a;
};

void foo(void) {
}

Once you have it, save it in the proper place. The directory is named after the extension the test exercises, usually with a subdirectory called compiler because we are testing GLSL shader compilation; for this example: tests/spec/arb_shader_storage_buffer_object/compiler.

And that’s it. The next time you run piglit with the tests/all.py profile, a script will find this test, parse it and execute glslparser with the provided configuration, checking whether the result of compiling your shader is the expected one.

You can execute it manually by running the following command:

$ bin/glslparser tests/spec/arb_shader_storage_buffer_object/compiler/extension-disabled-shader-storage-block.frag fail 1.20 GL_ARB_shader_storage_buffer_object

As you can see, the last arguments come from the config we defined in the test file; here we simply pass them by hand.

It is also possible to link a GLSL shader and look for errors by using check_link with true or false, depending on whether you expect the test to succeed or fail at link time:

// check_link: true

Nevertheless, it only links one shader. If you want to link more than one shader together, I recommend using another type of piglit test (see the next section).
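
For instance, a config section for a shader that is expected to compile and link successfully could look like this (just a sketch; the version and extension requirements are hypothetical and should match whatever your shader actually needs):

// [config]
// expect_result: pass
// glsl_version: 1.20
// check_link: true
// [end config]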

As you can see, it is easy to write new GLSL shader tests to verify specific bits of a specification: valid/invalid keywords, initialization, special cases explained in the spec, etc.

GLSL shader linker test

Sometimes compiling/linking only one shader is not enough; there are cases where you want to compile and link different but related shaders: a vertex and a fragment shader, possibly with a geometry shader between them, etc. In the previous example, we saw that a GLSL compiler test does not link different shaders (actually there is only one). When there are several GLSL shaders, you need another type of piglit test: a shader_runner test.

As usual, you first start writing the GLSL shaders. In some cases, you can substitute a shader with its pass-through counterpart to avoid writing “useless” shaders when there is no user-provided interface dependency.

Let’s start with an example of a pass-through vertex shader:

# ARB_gpu_shader5 spec says:
#   "If an implementation supports <N> vertex streams, the individual
#   streams are numbered 0 through <N>-1"
#
# This test verifies that a link error occurs if EmitStreamVertex()
# is called with a stream value which is negative.

[require]
GLSL >= 1.50
GL_ARB_gpu_shader5

[vertex shader passthrough]

[geometry shader]

#extension GL_ARB_gpu_shader5 : enable

layout(points) in;
layout(points, max_vertices=3) out;

void main()
{
    gl_Position = vec4(1.0, 1.0, 1.0, 1.0);
    EmitStreamVertex(-1);
    EndStreamPrimitive(-1);
}

[fragment shader]

out vec4 color;

void main()
{
  color = vec4(0.0, 1.0, 0.0, 1.0);
}

[test]
link error

The file starts with a comment describing what the test does and which spec (or part of it) it is checking.

The next step is very similar to the previous example: specify the minimum GL and/or GLSL version required and the extensions needed to execute the test. This information goes in the [require] section.

Then the GLSL shader definitions start: the vertex shader is pass-through because it has no user-defined interface dependency with the next stage (the geometry shader in this case). After it come the geometry and fragment shader definitions and, finally, the test commands to run. In this case, we expect the test to fail at link time.

However, you are not limited to link checking; there are other commands available for the test commands section (combined in a sketch after this list):

    • Load uniform values
      • uniform <type> <name> <value>

uniform vec4 color 0.0 0.5 0.0 0.0

    • Draw rectangles
      • draw rect <coordinates>

draw rect -1 -1 2 2

    • Probe a pixel color value and compare it to an expected value
      • probe {rgb, rgba}

probe rgb 1 1 0.0 1.0 0.0

    • Probe all pixel values.
      • probe all {rgb,rgba}

probe all rgba 0.0 1.0 0.0 0.0
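
Putting several of these commands together, a minimal [test] section might look like the sketch below. It assumes a fragment shader that simply writes the color uniform to its output, so the whole framebuffer should end up with that value:

[test]
uniform vec4 color 0.0 1.0 0.0 1.0
draw rect -1 -1 2 2
probe all rgba 0.0 1.0 0.0 1.0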

Or even more complex commands:

    • Load data into a vertex array object and render primitives using that vertex array data.

[vertex data]
vertex/float/3
 1.0  1.0  1.0
-1.0  1.0  1.0
-1.0 -1.0  1.0
 1.0 -1.0  1.0

[test]
draw arrays GL_TRIANGLE_FAN 0 4

    •  Relative probe pixel color

relative probe rgb (.5, .5) (0.0, 1.0, 0.0)

    • And much more:

[fragment program]
!!ARBfp1.0
OPTION ARB_fragment_program_shadow;
TXP result.color, fragment.texcoord[0], texture[0], SHADOWRECT;
END

[test]
texture shadowRect 0 (32, 32)
texparameter Rect depth_mode luminance
texparameter Rect compare_func greater

Just browse the other tests to see whether any of them do something similar to what you want to test, and copy the interesting bits!

Save the file with a name that briefly describes what it does (this example's filename is stream-negative-value.shader_test) and place it in the corresponding subdirectory. For this example, the proper place is the linker subdirectory of the GL_ARB_gpu_shader5 test directory, because this is a linker test: tests/spec/arb_gpu_shader5/linker/

In order to execute this test with the tests/all.py profile, you don't need to do anything else. A script will find the test file and call the shader_runner binary with the filename as a parameter.

In case you want to run it by yourself, the command is easy:

$ bin/shader_runner tests/spec/arb_gpu_shader5/linker/stream-negative-value.shader_test

The next post will cover the creation of GL binary tests within the piglit framework. I hope this post and the next one help you contribute to piglit!

by Samuel Iglesias at May 29, 2015 09:49 AM

May 20, 2015

Iago Toral

Bringing ARB_shader_storage_buffer_object to Mesa and i965

In the last few weeks I have been working together with my colleague Samuel on bringing support for ARB_shader_storage_buffer_object, an OpenGL 4.3 feature, to Mesa and the Intel i965 driver, so I figured I would write a bit about what this brings to OpenGL/GLSL users. If you are interested, read on.

Introducing Shader Storage Buffer Objects

This extension introduces the concept of shader storage buffer objects (SSBOs), which is a new type of OpenGL buffer. SSBOs allow GL clients to create buffers that shaders can then map to variables (known as buffer variables) via interface blocks. If you are familiar with Uniform Buffer Objects (UBOs), SSBOs are pretty similar but:

  • They are read/write, unlike UBOs, which are read-only.
  • They allow a number of atomic operations on them.
  • They allow an optional unsized array at the bottom of their definitions.

Since SSBOs are read/write, they create a bidirectional channel of communication between the GPU and CPU spaces: the GL application can set the value of shader variables by writing to a regular OpenGL buffer, but the shader can also update the values stored in that buffer by assigning values to them in the shader code, making the changes visible to the GL application. This is a major difference with UBOs.

In a parallel environment such as a GPU where we can have multiple shader instances running simultaneously (processing multiple vertices or fragments from a specific rendering call) we should be careful when we use SSBOs. Since all these instances will be simultaneously accessing the same buffer there are implications to consider relative to the order of reads and writes. The spec does not make many guarantees about the order in which these take place, other than ensuring that the order of reads and writes within a specific execution of a shader is preserved. Thus, it is up to the graphics developer to ensure, for example, that each execution of a fragment or vertex shader writes to a different offset into the underlying buffer, or that writes to the same offset always write the same value. Otherwise the results would be undefined, since they would depend on the order in which writes and reads from different instances happen in a particular execution.

The spec also allows the use of memoryBarrier() from shader code and glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) from a GL application to add sync points. These ensure that all memory accesses to buffer variables issued before the barrier are completely executed before moving on.
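
For example, on the application side, a barrier between two draw calls ensures that the second pass sees the SSBO writes performed by the first (a minimal sketch; num_points is a hypothetical variable):

/* First pass: shaders write to buffer variables in the SSBO. */
glDrawArrays(GL_POINTS, 0, num_points);

/* Make those writes visible to the shader reads that follow. */
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

/* Second pass: shaders can now safely read the updated values. */
glDrawArrays(GL_POINTS, 0, num_points);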

Another tool for developers to deal with concurrent accesses is atomic operations. The spec introduces a number of new atomic memory functions for use with buffer variables: atomicAdd, atomicMin, atomicMax, atomicAnd, atomicOr, atomicXor, atomicExchange (atomic assignment to a buffer variable), atomicCompSwap (atomic conditional assignment to a buffer variable).

The optional unsized array at the bottom of an SSBO definition can be used to push a dynamic number of entries to the underlying buffer storage, up to the total size of the buffer allocated by the GL application.

Using shader storage buffer objects (GLSL)

Okay, so how do we use SSBOs? We will introduce this through an example: we will use a buffer to record information about the fragments processed by the fragment shader. Specifically, we will group fragments according to their X coordinate (by computing an index from the coordinate using a modulo operation). We will then record how many fragments are assigned to a particular index, the first fragment to be assigned to a given index, the last fragment assigned to a given index, the total number of fragments processed and the complete list of fragments processed.

To store all this information we will use the SSBO definition below:

layout(std140, binding=0) buffer SSBOBlock {
   vec4 first[8];     // first fragment coordinates assigned to index
   vec4 last[8];      // last fragment coordinates assigned to index
   int counter[8];    // number of fragments assigned to index
   int total;         // number of fragments processed
   vec4 fragments[];  // coordinates of all fragments processed
};

Notice the use of the keyword buffer to tell the compiler that this is a shader storage buffer object. Also notice that we have included an unsized array called fragments[]; there can be only one of these in an SSBO definition, and if there is one, it has to be the last field defined.

In this case we are using the std140 layout mode, which imposes certain alignment rules on the buffer variables within the SSBO, as in the case of UBOs. These alignment rules may help the driver implement read/write operations more efficiently, since the underlying GPU hardware can usually read and write faster from and to aligned addresses. The downside of std140 is that because of these alignment rules we waste some memory, and we need to know the alignment rules on the GL side if we want to read/write from/to the buffer. Specifically for SSBOs, the specification introduces a new layout mode: std430, which removes these alignment restrictions, allowing for more efficient memory usage, but possibly at the expense of some performance.
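
For comparison, this is a sketch of how the same block could be declared with std430; unlike std140, the elements of the int array are tightly packed instead of being padded out to 16 bytes:

layout(std430, binding=0) buffer SSBOBlock {
   vec4 first[8];     // first fragment coordinates assigned to index
   vec4 last[8];      // last fragment coordinates assigned to index
   int counter[8];    // tightly packed: 4-byte array stride
   int total;         // number of fragments processed
   vec4 fragments[];  // vec4 elements are still 16-byte aligned
};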

The binding keyword, just like in the case of UBOs, is used to select the buffer that we will be reading from and writing to when accessing these variables from the shader code. It is the application's responsibility to bind the right buffer to the binding point we specify in the shader code.

So with that done, the shader can read from and write to these variables as we see fit, but we should be aware of the fact that multiple instances of the shader could be reading from and writing to them simultaneously. Let’s look at the fragment shader that stores the information we want into the SSBO:

void main() {
   int index = int(mod(gl_FragCoord.x, 8));

   int i = atomicAdd(counter[index], 1);
   if (i == 0)
      first[index] = gl_FragCoord;
   else
      last[index] = gl_FragCoord;

   i = atomicAdd(total, 1);
   fragments[i] = gl_FragCoord;
}

The first line computes an index into our integer array buffer variable by using gl_FragCoord. Notice that different fragments could get the same index. Next we increase counter[index] by one. Since we know that different fragments may do this at the same time, we use an atomic operation to make sure that we don't lose any increments.

Notice that if two fragments can write to the same index, reading the value of counter[index] after the atomicAdd can lead to different results. For example, if two fragments have already executed the atomicAdd, and assuming that counter[index] is initialized to 0, then both would read counter[index] == 2, however, if only one of the fragments has executed the atomic operation by the time it reads counter[index] it would read a value of 1, while the other fragment would read a value of 2 when it reaches that point in the shader execution. Since our shader intends to record the coordinates of the first fragment that writes to counter[index], that won’t work for us. Instead, we use the return value of the atomic operation (which returns the value that the buffer variable had right before changing it) and we write to first[index] only when that value was 0. Because we use the atomic operation to read the previous value of counter[index], only one fragment will read a value of 0, and that will be the fragment that first executed the atomic operation.

If this is not the first fragment assigned to that index, we write to last[index] instead. Again, multiple fragments assigned to the same index could do this simultaneously, but that is okay here, because we only care about the last write. Also notice that it is possible that different executions of the same rendering command produce different values of first[] and last[].

The remaining instructions unconditionally push the fragment coordinates to the unsized array. We use the buffer variable total to keep track of the next free index into the unsized array fragments[]: each fragment atomically increases total before writing to the unsized array. Notice that, once again, we have to be careful when reading the value of total to make sure that each fragment reads a different value and that we never have two fragments write to the same entry.

Using shader storage buffer objects (GL)

On the side of the GL application, we need to create the buffer, bind it to the appropriate binding point and initialize it. We do this as usual, only that we use the new GL_SHADER_STORAGE_BUFFER target:

typedef struct {
   float first[8*4];      // vec4[8]
   float last[8*4];       // vec4[8]
   int counter[8*4];      // int[8] padded as per std140
   int total;             // int
   int pad[3];            // padding: as per std140 rules
   char fragments[1024];  // up to 1024 bytes of unsized array
} SSBO;

SSBO data;

(...)

memset(&data, 0, sizeof(SSBO));

GLuint buf;
glGenBuffers(1, &buf);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, buf);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(SSBO), &data, GL_DYNAMIC_DRAW);

The code creates a buffer, binds it to binding point 0 of GL_SHADER_STORAGE_BUFFER (the same binding point we declared in the shader) and initializes the buffer data to 0. Notice that because we are using std140 we have to be aware of the alignment rules at work. We could have used std430 instead to avoid this.

Since we have 1024 bytes for the fragments[] unsized array and we are pushing a vec4 (16 bytes) worth of data to it with every fragment we process then we have enough room for 64 fragments. It is the developer’s responsibility to ensure that this limit is not surpassed, otherwise we would write beyond the allocated space for our buffer and the results would be undefined.

The next step is to do some rendering so we get our shaders to work. That would trigger the execution of our fragment shader for each fragment produced, which will generate writes into our buffer for each buffer variable the shader code writes to. After rendering, we can map the buffer and read its contents from the GL application as usual:

SSBO *ptr = (SSBO *) glMapNamedBuffer(buf, GL_READ_ONLY);

/* List of fragments recorded in the unsized array */
printf("%d fragments recorded:\n", ptr->total);
float *coords = (float *) ptr->fragments;
for (int i = 0; i < ptr->total; i++, coords +=4) {
   printf("Fragment %d: (%.1f, %.1f, %.1f, %.1f)\n",
          i, coords[0], coords[1], coords[2], coords[3]);
}

/* First fragment for each index used */
for (int i = 0; i < 8; i++) {
   if (ptr->counter[i*4] > 0)
      printf("First fragment for index %d: (%.1f, %.1f, %.1f, %.1f)\n",
             i, ptr->first[i*4], ptr->first[i*4+1],
             ptr->first[i*4+2], ptr->first[i*4+3]);
}

/* Last fragment for each index used */
for (int i = 0; i < 8; i++) {
   if (ptr->counter[i*4] > 1)
      printf("Last fragment for index %d: (%.1f, %.1f, %.1f, %.1f)\n",
             i, ptr->last[i*4], ptr->last[i*4+1],
             ptr->last[i*4+2], ptr->last[i*4+3]);
   else if (ptr->counter[i*4] == 1)
      printf("Last fragment for index %d: (%.1f, %.1f, %.1f, %.1f)\n",
             i, ptr->first[i*4], ptr->first[i*4+1],
             ptr->first[i*4+2], ptr->first[i*4+3]);
}

/* Fragment counts for each index */
for (int i = 0; i < 8; i++) {
   if (ptr->counter[i*4] > 0)
      printf("Fragment count at index %d: %d\n", i, ptr->counter[i*4]);
}
glUnmapNamedBuffer(buf);

I get this result for an execution where I am drawing a handful of points:

4 fragments recorded:
Fragment 0: (199.5, 150.5, 0.5, 1.0)
Fragment 1: (39.5, 150.5, 0.5, 1.0)
Fragment 2: (79.5, 150.5, 0.5, 1.0)
Fragment 3: (139.5, 150.5, 0.5, 1.0)

First fragment for index 3: (139.5, 150.5, 0.5, 1.0)
First fragment for index 7: (39.5, 150.5, 0.5, 1.0)

Last fragment for index 3: (139.5, 150.5, 0.5, 1.0)
Last fragment for index 7: (79.5, 150.5, 0.5, 1.0)

Fragment count at index 3: 1
Fragment count at index 7: 3

It recorded 4 fragments that the shader mapped to indices 3 and 7. Multiple fragments were assigned to index 7, but we could handle that gracefully by using the corresponding atomic functions. Different executions of the same program will produce the same 4 fragments and map them to the same indices, but the first and last fragments recorded for index 7 can change between executions.

Also notice that the first fragment recorded in the unsized array (fragments[0]) is not the first fragment recorded for index 7 (fragments[1]). That means that the shader instance that produced fragments[0] got to the unsized array code first, but the instance that produced fragments[1] beat it in the race to execute the code handling the assignment to the first/last arrays, making it clear that we cannot make any assumptions about the execution order of reads and writes coming from different instances of the same shader execution.

So that’s it. The patches are now on the mesa-dev mailing list undergoing review and will hopefully land soon, so look forward to it! Also, if you have any interesting uses for this new feature, let me know in the comments.

by Iago Toral at May 20, 2015 08:57 AM

May 11, 2015

Jacobo Aragunde

Speaking at Protocols Plugfest 2015

In a couple of hours, I will go to the airport and head for Zaragoza, where I'm taking part in the Protocols Plugfest 2015 happening this week.

Protocols Plugfest 2015 logo

Igalia will contribute a talk about LibreOffice interoperability with ECM solutions, especially SharePoint, through the CMIS protocol.

See you there!

EDIT: The plugfest is over! Thanks to everyone attending. By the way, I couldn't help making some changes to the slides the night before the talk; I've updated the link above.

by Jacobo Aragunde Pérez at May 11, 2015 08:03 AM

April 28, 2015

Claudio Saavedra

Tue 2015/Apr/28

A follow-up to my last post. As I was writing it, someone was packaging Linux 4.0 for Debian. I fetched it from the experimental distribution today, and everything that was broken with the X1 Carbon now works (that is, the Bluetooth keyboard, trackpad button events, and 3G/4G USB modem networking). WEP128 authentication still doesn't work, but you shouldn't be using it anyway because aircrack and so on and so on.

So there you have it, just upgrade your kernel and enjoy a functional laptop. I will still take the opportunity to publicly shame Lenovo for the annoying noise coming out of the speakers every once in a while. Bad Lenovo, very bad.

April 28, 2015 08:04 AM

April 27, 2015

Iago Toral

Free access to Valve-produced games on Steam for Mesa contributors

Just like they did for Debian developers before, this is Valve's way of saying thanks and giving something back to the community. This is great news for all Mesa contributors: now we can play some great Valve games for free, and we can also have an easier time looking into bug reports for them, which also works great for Valve, closing a perfect circle :)

by Iago Toral at April 27, 2015 11:34 AM

April 21, 2015

Claudio Saavedra

Tue 2015/Apr/21

Igalia got me a Lenovo X1 Carbon, third generation. I decided to install Debian on it without really considering that the imminent release of Debian Jessie would get in the way. After a few weeks of tinkering, these are a few notes on what works (with a little help) and what doesn't (yet).

What works (with a little help):

  • Graphics acceleration: Initially X was using llvmpipe and software rasterization. This laptop has Intel Broadwell graphics and the support for it has not been without issues recently. I installed libdrm-intel1, xserver-xorg-video-intel, and the 3.19 kernel from experimental, and that fixed it.

  • I also got a OneLink Pro Dock, which I use to connect two external displays to the laptop. For whatever reason, these were not detected properly with Jessie's 3.16 kernel. Upgrading to 3.19 fixed this too. I should have used the preinstalled Windows to upgrade its firmware, by the way, but by the time I realized, the Windows partitions were long gone.

  • But upgrading to 3.19 broke both wireless and bluetooth, as this kernel version needs newer binary firmware blobs. These are not yet packaged in Debian, but until they are you can fetch them from the web. The files needed are ibt-hw-37.8.10-fw-1.10.3.11.e.bseq for bluetooth and iwlwifi-7265-12.ucode for the wireless. There is a bug about it in the Debian bugtracker somewhere.

  • Intel's Rapid Start Technology. Just follow Matthew Garrett's advice and create a large enough partition with the appropriate type.

What doesn't work yet:

  • My Bluetooth keyboard. There are disconnects at random intervals that make it pretty much useless. This is reported in the Debian bugtracker but there have not been any responses yet. I packaged the latest BlueZ release and installed it locally, but that didn't really help, so I'm guessing that the issue is in the kernel. It is possible that my package is broken, though, as I had to rebase some Debian patches and remove others. As a side note, I had forgotten how nice quilt can be for this.

  • The trackpad buttons. Some people suggest switching the driver but then Synaptics won't work. So there's that. I think that the 4.0 kernel has the fixes needed, but last I checked there was no package yet and I don't feel like compiling a kernel. Compiling browsers the whole day is already enough for me, so I'll wait.

  • Using a Nokia N9 as a USB mobile broadband modem or the integrated Sierra 4G modem. The former works in my Fedora laptop, and in Debian both seem to be detected correctly, but journalctl reports some oddities, like:

    Apr 20 21:27:11 patanjali ModemManager[560]: <info>  ModemManager (version 1.4.4) starting in system bus...
    Apr 20 21:27:13 patanjali ModemManager[560]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:14.0/usb3/3-3/3-3.1/3-3.1.3': not supported by any plugin
    Apr 20 21:27:13 patanjali ModemManager[560]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:19.0': not supported by any plugin
    Apr 20 21:27:13 patanjali ModemManager[560]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:1c.1/0000:04:00.0': not supported by any plugin
    Apr 20 21:27:20 patanjali ModemManager[560]: <info>  Creating modem with plugin 'Generic' and '2' ports
    Apr 20 21:27:20 patanjali ModemManager[560]: <info>  Modem for device at '/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4' successfully created
    Apr 20 21:27:20 patanjali ModemManager[560]: <warn>  Modem couldn't be initialized: couldn't load current capabilities: Failed to determine modem capabilities.
    Apr 21 10:01:15 patanjali ModemManager[560]: <info>  (net/usb0): released by modem /sys/devices/pci0000:00/0000:00:14.0/usb2/2-4
    Apr 21 10:01:15 patanjali ModemManager[560]: <info>  (tty/ttyACM0): released by modem /sys/devices/pci0000:00/0000:00:14.0/usb2/2-4
    Apr 21 10:01:30 patanjali ModemManager[560]: <info>  Creating modem with plugin 'Generic' and '2' ports
    Apr 21 10:01:30 patanjali ModemManager[560]: <info>  Modem for device at '/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4' successfully created
    Apr 21 10:01:30 patanjali ModemManager[560]: <warn>  Modem couldn't be initialized: couldn't load current capabilities: Failed to determine modem capabilities.

    I upgraded to experimental's ModemManager, without any improvement. I haven't yet figured out what this could be, although I have only used NetworkManager to try to connect.

  • The (terrible) WEP128 authentication in the Nokia N9 wireless hotspot application. As neither the USB modem nor the 4G one is working yet, using the wireless hotspot is the only alternative for the afternoons outside my home office. Not sure why it won't connect (again, only tested with NetworkManager), but at this point I'm starting to be more pragmatic about being able to use this laptop at all. Leaving the hotspot open was the only alternative. I know.

I know I should be a good citizen and add at least some of this information to ThinkWiki, but hey, at least I wrote it down somewhere.

April 21, 2015 07:10 AM

March 30, 2015

Manuel Rego

Web Engines Hackfest 2015: Save the dates!

This is a short note to announce the dates of the Web Engines Hackfest 2015, which will happen next winter at the Igalia Headquarters in A Coruña (Spain), from Monday, 7th December, to Wednesday, 9th December.

After all the great work and collaboration that happened during last year's edition, with hackers from all parts of the Web Platform community (Chromium/Blink, WebKit, Gecko, Servo, JSC, V8, SpiderMonkey, etc.), Igalia is really excited to host this great event again.

Web Engines Hackfest 2014 (picture by Adrián Pérez)

We're still finalizing the last details and will be sending the invitations in the coming weeks. However, do not hesitate to send us an invitation request if you would like to come at the end of the year.

To not miss any update, follow @webhackfest on Twitter. For more details, visit the official webpage http://www.webengineshackfest.org/.

March 30, 2015 10:00 PM

March 23, 2015

Carlos García Campos

WebKitGTK+ 2.8.0

We are excited and proud to announce WebKitGTK+ 2.8.0, your favorite web rendering engine, now faster, even more stable and with a bunch of new features and improvements.

Gestures

Touch support is one of the most important features that had been missing since WebKitGTK+ 2.0.0. Thanks to the GTK+ gestures API, it's now more pleasant to use a WebKitWebView on a touch screen. For now only the basic gestures are implemented: pan (for scrolling by dragging from any point of the WebView), tap (handling clicks with the finger) and zoom (for zooming in/out with two fingers). We plan to add more touch enhancements like kinetic scrolling, overshoot feedback animation, text selection, long press, etc. in future versions.

HTML5 Notifications


Notifications are transparently supported by WebKitGTK+ now, using libnotify by default. The default implementation can be overridden by applications to use their own notifications system, or simply to disable notifications.

WebView background color

There's new API now to set the base background color of a WebKitWebView. The given color is used to fill the web view before the actual contents are rendered. This will not have any visible effect if the web page contents set a background color, of course. If the web view parent window has an RGBA visual, we can even have transparent colors.
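
A minimal sketch of how this can be used, assuming a WebKitWebView pointer named web_view (check the 2.8 API reference for the exact details):

/* Fill the web view with a fully transparent color before rendering. */
GdkRGBA transparent = { 1.0, 1.0, 1.0, 0.0 };
webkit_web_view_set_background_color (web_view, &transparent);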


A new WebKitSnapshotOptions flag has also been added to be able to take web view snapshots over a transparent surface, instead of filling the surface with the default background color (opaque white).

User script messages

The communication between the UI process and the Web Extensions is something that we have always left to the users, so that everybody can use their own IPC mechanism. Epiphany and most apps use D-Bus for this, and it works perfectly. However, D-Bus is often too much for simple cases where there are only a few messages sent from the Web Extension to the UI process. User script messages make these cases a lot easier to implement and can be used from JavaScript code or using the GObject DOM bindings.

Let’s see how it works with a very simple example:

In the UI process, we register a script message handler using the WebKitUserContentManager and connect to the “script-message-received” signal for the given handler:

webkit_user_content_manager_register_script_message_handler (user_content, 
                                                             "foo");
g_signal_connect (user_content, "script-message-received::foo",
                  G_CALLBACK (foo_message_received_cb), NULL);

Script messages are received in the UI process as a WebKitJavascriptResult:

static void
foo_message_received_cb (WebKitUserContentManager *manager,
                         WebKitJavascriptResult *message,
                         gpointer user_data)
{
        char *message_str;

        message_str = get_js_result_as_string (message);
        g_print ("Script message received for handler foo: %s\n", message_str);
        g_free (message_str);
}

Sending a message from the web process to the UI process using JavaScript is very easy:

window.webkit.messageHandlers.foo.postMessage("bar");

That will send the message “bar” to the registered foo script message handler. It's not limited to strings: we can pass any JavaScript value to postMessage() that can be serialized. There's also a convenient API to send script messages in the GObject DOM bindings API:

webkit_dom_dom_window_webkit_message_handlers_post_message (dom_window, 
                                                            "foo", "bar");

Who is playing audio?

WebKitWebView now has a boolean read-only property, is-playing-audio, that is set to TRUE when the web view is playing audio (even if it's a video) and to FALSE when the audio is stopped. Browsers can use this to provide visual feedback about which tab is playing audio; Epiphany already does that :-)
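
A browser could track the property with something along these lines (a sketch, assuming a web_view variable):

static void
is_playing_audio_changed_cb (WebKitWebView *web_view,
                             GParamSpec *pspec,
                             gpointer user_data)
{
        /* Update the tab's visual feedback based on the current state. */
        gboolean playing = webkit_web_view_is_playing_audio (web_view);
        g_print ("Web view is %s audio\n", playing ? "playing" : "not playing");
}

g_signal_connect (web_view, "notify::is-playing-audio",
                  G_CALLBACK (is_playing_audio_changed_cb), NULL);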


HTML5 color input

The color input element is now supported by default: instead of rendering a text field to manually input the color as a hexadecimal color code, WebKit now renders a color button that, when clicked, shows a GTK+ color chooser dialog. As usual, the public API allows overriding the default implementation to use your own color chooser. MiniBrowser uses a popover, for example.


APNG

APNG (Animated PNG) is a PNG extension for creating animated PNGs, similar to GIF but much better, supporting 24-bit images and transparency. Since 2.8, WebKitGTK+ can render APNG files. You can check how it works with the Mozilla demos.


SSL

The POODLE vulnerability fix introduced compatibility problems with some websites when establishing the SSL connection. Those problems were actually server-side issues, with servers incorrectly banning SSL 3.0 record packet versions, but they could be worked around in WebKitGTK+.

WebKitGTK+ already provided a WebKitWebView signal to notify about TLS errors when loading, but only for the connection of the main resource in the main frame. However, it's still possible that subresources fail due to TLS errors when using a connection different from the main resource one. WebKitGTK+ 2.8 gained the WebKitWebResource::failed-with-tls-errors signal to notify when a subresource load fails because of an invalid certificate.

Ciphersuites based on RC4 are now disallowed when performing TLS negotiation, because RC4 is no longer considered secure.

Performance: bmalloc and concurrent JIT

bmalloc is a new memory allocator added to WebKit to replace TCMalloc. Apple had already used it in the Mac and iOS ports for some time with very good results, but it needed some tweaks to work on Linux. WebKitGTK+ 2.8 now also uses bmalloc, which drastically improves overall performance.

Concurrent JIT was not enabled in the GTK+ (and EFL) ports for no apparent reason. Enabling it also had an amazing impact on performance.

Both performance improvements were very noticeable in the performance bot.

The first jump on 11th Feb corresponds to the bmalloc switch, while the other jump on 25th Feb is when concurrent JIT was enabled.

Plans for 2.10

WebKitGTK+ 2.8 is an awesome release, but the plans for 2.10 are quite promising.

  • More security: mixed content for most resource types will be blocked by default. New API will be provided for managing mixed content.
  • Sandboxing: seccomp filters will be used in the different secondary processes.
  • More performance: FTL will be enabled in JavaScriptCore by default.
  • Even more performance: this time in the graphics side, by using the threaded compositor.
  • Blocking plugins API: new API to provide full control over the plugin load process, allowing applications to block/unblock plugins individually.
  • Implementation of the Database process: to bring back IndexedDB support.
  • Editing API: full editing API to allow using a WebView in editable mode with all editing capabilities.

by carlos garcia campos at March 23, 2015 11:56 AM

March 18, 2015

Víctor Jáquez

GStreamer Hackfest 2015

Last weekend was the GStreamer Hackfest in Staines, UK, at Samsung's premises; Samsung also sponsored the dinners and lunches. Special thanks to Luis de Bethencourt, the almighty organizer!

My main purpose was to sip one or two pints with the GStreamer folks and, secondarily, to talk about gstreamer-vaapi, WebKitGTK+ and the new OpenGL/ES support in gst-plugins-bad.


About gstreamer-vaapi, there were a couple of questions about some problems seen downstream (in distributions' stable releases), and I was happy to announce that they are mostly fixed upstream. On the other hand, Sebastian Dröge was worried about the existing support for GStreamer 0.10, and I answered him that its removal is already in the pipeline. He looked pleased.

Related to gstreamer-vaapi and the new GstGL, we tested and merged a patch for GLES2/EGL, so now it is possible to render VA-API decoded video through glimagesink with (nearly) zero-copy. Sadly, this is not currently possible using GLX. Along the way I found a silly bug that came from a previous patch of mine and fixed it; we also fixed another small bug in the GL uploader.

In the WebKitGTK+ realm, I worked on a new functionality: sharing the browser's OpenGL context and display with the GStreamer pipeline. With it, we could add GL filters into the pipeline. But honour to whom honour is due: this patch is a split of a previous patch by Philippe Normand. The ultimate goal is to ditch the custom video sink in WebKit and reuse glimagesink, with its new off-screen rendering feature.

Finally, on Sunday afternoon, I walked around Richmond, which is beautiful.


Thanks to Igalia, Intel and all the sponsors that made the hackfest and my attendance possible.

by vjaquez at March 18, 2015 06:36 PM

March 17, 2015

Jacobo Aragunde

Creating new document providers in LibreOffice for Android

We recently completed our tasks for The Document Foundation regarding the Android document browser; nonetheless, we had a pending topic regarding the documentation of our work: write and publish a guide to extend the cloud storage integration. This blog post covers how to integrate new cloud solutions using the framework for cloud storage we have implemented.

Writing a document provider

Document Provider class diagram

We use the name “document providers” for the sets of classes that implement support for a given storage solution. A document provider consists of two classes implementing the IDocumentProvider and IFile interfaces. Both contain extensive in-code documentation of the operations to help anybody implementing them.

The IDocumentProvider interface provides some general operations about the provider, intended as the entry point to the service: getRootDirectory() provides a pointer to the root of the service, while createFromUri() is required to restore the state of the document browser.

The IFile interface is an abstraction of the java File class, with many similar operations. Those operations will be used by the document browser to print information about the files, browse the directories and open the final documents.
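
As a rough sketch of the shape of a new provider (the method names come from the interfaces described above, but the exact signatures are approximate; check the in-code documentation for the real ones):

public class MyCloudProvider implements IDocumentProvider {
    public MyCloudProvider(Context context) {
        // Hypothetical: set up connection details for the service.
    }

    @Override
    public IFile getRootDirectory() {
        // Return an IFile pointing to the root of the service.
        throw new RuntimeException("Not implemented yet");
    }

    @Override
    public IFile createFromUri(URI uri) {
        // Recreate an IFile from a URI to restore the browser state.
        throw new RuntimeException("Not implemented yet");
    }
}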

Once those classes have been implemented, the new provider must be linked with the rest of the application by making some modifications to the DocumentProviderFactory class. Modifying the initialize() method to add a new instance of the provider to the providers[] array should be enough:

    // initialize document providers list
    instance.providers = new IDocumentProvider[3];
    instance.providers[0] = new LocalDocumentsDirectoryProvider();
    instance.providers[1] = new LocalDocumentsProvider();
    instance.providers[2] = new OwnCloudProvider(context);

At this point, your provider should appear in the drawer that pops up with a swipe gesture from the left of the screen.

LibreOffice for Android, provider selection

You are encouraged to create the classes for your document provider in a separate package inside org.libreoffice.storage. Your operations may throw a RuntimeException in case of error; it will be caught by the UI activity and the message inside the exception will be shown, so make sure you internationalize the strings using the standard Android API. You can always take a look at the existing providers and use them as examples, especially OwnCloudProvider, which is the most complex one but still quite manageable.

Making use of application settings

If you are implementing a generic provider for some cloud service, it is quite likely that you need some input from the user, such as a login name or a password. For that reason we have added an activity for settings related to document providers.

To add your settings to that screen, modify the file res/xml/documentprovider_preferences.xml and add a new PreferenceCategory containing your own preferences. The android:key attribute will allow you to use the preference from your code; you may want to add that preference string as a constant in DocumentProviderSettingsActivity.

At this point, you will be able to use the preferences in your DocumentProvider using the standard Android API. Take OwnCloudProvider as an example:

    public OwnCloudProvider(Context context) {
        ...
        // read preferences
        SharedPreferences preferences = PreferenceManager.getDefaultSharedPreferences(context);
        serverUrl = preferences.getString(
                DocumentProviderSettingsActivity.KEY_PREF_OWNCLOUD_SERVER, "");

Finally, we have added a way for providers to be notified when settings change; otherwise, the application would have to be restarted for the changes to take effect. Your provider must implement OnSharedPreferenceChangeListener, which brings the onSharedPreferenceChanged() method. That method will be called whenever any preference is changed; you can check for the ones you are interested in using the key parameter, and make the required changes to the internal state of your provider.
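
For example, a provider could react to a settings change with something along these lines (a sketch based on the OwnCloudProvider preference key shown above):

    @Override
    public void onSharedPreferenceChanged(SharedPreferences preferences,
            String key) {
        if (key.equals(DocumentProviderSettingsActivity.KEY_PREF_OWNCLOUD_SERVER)) {
            // Refresh the internal state with the new server URL.
            serverUrl = preferences.getString(key, "");
        }
    }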

Preference listener class diagram

Future

This effort has been one more step towards building a full-featured version of LibreOffice for Android, and there are improvements we can, or even must, make in the future:

  • As work on the LibreOffice Android editor progresses, we will have to add a save operation to the document providers that takes care of uploading the new version of a file to the cloud.
  • It would be a good idea to implement an add-on mechanism to install the document providers. That way we would not add unnecessary weight to the main package, and plugins could be independently distributed.

That’s all for now; you can try the ownCloud provider by building the feature/owncloud-provider-for-android branch yourself, or wait for it to be merged. We hope to see other members of the community taking advantage of this framework to provide new services soon.

by Jacobo Aragunde Pérez at March 17, 2015 11:54 AM

March 12, 2015

Michael Catanzaro

Stop using RC4

A follow-up to my previous post: in response to my letter, NIST is going to increase the CVSS score of CVE-2013-2566 (RC4) to match CVE-2011-3389 (BEAST). Yay!

In other news, WebKitGTK+ 2.8 has full support for RFC 7465. That’s a fancy way of saying that we will no longer negotiate RC4 connections and you will now be unable to access the small minority of HTTPS sites that offer nothing but RC4. Hopefully other browsers will follow along sooner rather than later. In particular, Firefox nightly has stopped negotiating RC4 except for a few whitelisted sites: I would very much like to see that whitelist removed. Internet Explorer has stopped negotiating RC4 except when it performs voluntary protocol version fallback. It would be great to see a firmer stance from Mozilla and Microsoft, and some action from Google and Apple.

by Michael Catanzaro at March 12, 2015 08:37 PM

March 09, 2015

Javier Fernández

Content Distribution in CSS Grid Layout

It's been a while since Igalia and Bloomberg started to implement the Box Alignment specification for the CSS Grid Layout model. A few weeks ago we reached an important milestone on our roadmap by landing in Blink trunk the last patches implementing the Content Distribution properties: align-content and justify-content.

Quoting the CSS Box Alignment document, the content distribution properties are defined as follows:

Aligns the contents of the box as a whole along the box’s inline/row/main axis.
The alignment container is the grid container’s content box. The alignment subjects are the grid tracks.

The CSS syntax of these recently added properties gives an idea of how powerful and flexible they are for grid layout definitions, allowing every possible combination of alignment values:

auto | <baseline-position> | <content-distribution> || [ <overflow-position>? && <content-position> ]

It's worth mentioning that Baseline Alignment is not yet implemented, nor are the <content-distribution> values for Distribution Alignment. However, in the latter case I already have a quite promising draft implementation which, eventually, was very useful to start a discussion inside the W3C community about allowing these alignment values for grid. In previous versions of the specification it was stated that all <content-distribution> values should use their <content-position> fallback values for grid containers. I'm glad that the decision to allow them was finally made, because I think <content-distribution> values are really useful for defining fancier grid layouts. I'll talk about this soon in a new post, as I think it deserves a detailed description with proper examples.

Last but not least, as with Self Alignment, these properties allow using overflow keywords to define how to handle the grid's content overflow. This works in the same way for Content Distribution, as we'll see later with some examples.

Aligning the grid

When there is available space in the grid container block, it's always useful to have a way to control how we want to use that space and how we want our grid to behave in it. The container's size might change (fullscreen mode, for instance), or we might have to deal with a content-sized grid whose content changes size. There are many possibilities, so I'll leave this issue to the user's/designer's imagination and focus on very simple examples to illustrate the concept.

For now, let’s consider this case to understand what you can do with the different <content-alignment> values in a grid layout.

.grid {
    grid: 50px 50px / 100px 100px;
    position: relative;
    width: 200px;
    height: 300px;
}
.fixedSize {
    width: 20px;
    height: 40px;
}
<div class="grid">
   <div class="fixedSize" style="grid-column: 1; grid-row: 1; background: violet;"></div>
   <div style="grid-column: 1; grid-row: 2; background: yellow;"></div>
   <div style="grid-column: 2; grid-row: 1; background: green;"></div>
   <div class="fixedSize" style="grid-column: 2; grid-row: 2; background: red;"></div>
</div>

We are defining a 2×2 grid with 100px-wide columns and 50px-tall rows, where we are going to place 4 items, one in each cell. Notice that the items at (1,1) and (2,2) have a fixed size of 20×40 pixels, while the other 2 are auto-sized, so they will be stretched to fill their corresponding grid cell (if you don't know why, reading the previous post might help). Also, bear in mind that both the align-content and justify-content properties have start as their initial value for grids.


Controlling the grid overflow

When the grid content's size exceeds its container's dimensions there is a risk of data loss. Some examples of this scenario are center or end alignment from the viewport's edges: all the content overflowing the viewport's area can't be reached, hence we lose that data. In order to prevent this issue, the Box Alignment specification defines the safe overflow mode, which basically forces the start alignment value for the property handling the dimension where the overflow is detected.

Using the same CSS and HTML code as in the example above, we can easily define cases where this data loss issue (red arrows) is clearly noticeable just by modifying the height or width to cause top or left overflow, respectively.
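
Following the syntax described earlier, a sketch of how the safe keyword would be used to fall back to start alignment in the axis where the overflow happens:

.grid {
    align-content: safe center;
    justify-content: safe center;
}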


There are other situations where Content Alignment and Overflow interact in different ways, using margins, padding and/or borders, and even combining different writing-modes and flow directions. The effect of the alignment values varies considerably depending on those factors, but I think you now have a clear idea of how to use these new properties in a grid layout.

Current status and next steps

With the grid support for the align-content and justify-content CSS properties in Blink, we've got most of the Box Alignment specification covered. As mentioned before, only Baseline Alignment is still pending implementation in Chromium browsers. I have to admit that there are also some bugs and wrong behavior with certain CSS combinations, especially regarding orthogonal flows, but we are working on that right now and I hope to land the patches in trunk soon.

For the time being, let’s consider the following table as the current implementation status of the Box Alignment specification for the Grid Layout model in WebKit (Safari/Epiphany) and Blink (Chrome/Chromium/Opera) based browsers:

(table: Box Alignment implementation status per property in WebKit- and Blink-based browsers)

The lack of progress in the implementation of the Box Alignment specification in the WebKit web engine is disappointing. I've been stuck for quite some time trying to upgrade the CSS properties to the latest version of the spec, mainly due to design and performance issues. I'll discuss with the WebKit hackers the best approach to solve this, so I can bring the Grid Layout implementation to the same level as in the Blink web engine.

Igalia and Bloomberg will continue working on the implementation of the CSS Grid Layout specification, and among my short/mid-term challenges is completing the Box Alignment support. This goal includes the following tasks:

  • Fixing bugs and completing the orthogonal flows support.
  • Implementing the Baseline Alignment features
  • Completing the Content Distribution Alignment with the <content-distribution> values
  • Implementing the Box Alignment spec in WebKit

Igalia and Bloomberg working to build a better web platform

by jfernandez at March 09, 2015 01:58 PM

March 06, 2015

Iago Toral

An introduction to Mesa’s GLSL compiler (II)

Recap

My previous post served as an initial look into Mesa’s GLSL compiler, where we discussed the Mesa IR, which is a core aspect of the compiler. In this post I’ll introduce another relevant aspect: IR lowering.

IR lowering

There are multiple lowering passes implemented in Mesa (check src/glsl/lower_*.cpp for a complete list) but they all share a common denominator: their purpose is to re-write certain constructs in the IR so that they better fit the underlying GPU hardware.

In this post we will look into the lower_instructions.cpp lowering pass, which rewrites expression operations that may not be supported directly by GPU hardware with different implementations.

The lowering process involves traversing the IR, identifying the instructions we want to lower and modifying the IR accordingly, which fits well into the visitor pattern strategy discussed in my previous post. In this case, expression lowering is handled by the lower_instructions_visitor class, which implements the lowering pass in the visit_leave() method for ir_expression nodes.

The hierarchical visitor class, which serves as the base class for most visitors in Mesa, defines visit() methods for leaf nodes in the IR tree, and visit_leave()/visit_enter() methods for non-leaf nodes. This way, when traversing intermediary nodes in the IR we can decide to take action as soon as we enter them or when we are about to leave them.
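
As a sketch of the pattern (simplified from the real ir_hierarchical_visitor interface in src/glsl/ir_hierarchical_visitor.h):

class my_lowering_visitor : public ir_hierarchical_visitor {
public:
   /* Called when entering an expression node, before its children
    * have been visited.
    */
   virtual ir_visitor_status visit_enter(ir_expression *ir)
   {
      return visit_continue;
   }

   /* Called when leaving an expression node, after all its children
    * have been visited. This is where a lowering pass would rewrite
    * the expression.
    */
   virtual ir_visitor_status visit_leave(ir_expression *ir)
   {
      return visit_continue;
   }
};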

In the case of our lower_instructions_visitor class, the visit_leave() method implementation is a large switch() statement with all the operators that it can lower.

The code in this file lowers common scenarios that are expected to be useful for most GPU drivers, but individual drivers can still select which of these lowering passes they want to use. For this purpose, hardware drivers call lower_instructions() with the set of lowering passes to enable. For example, the Intel i965 driver does:

const int bitfield_insert = brw->gen >= 7
                            ? BITFIELD_INSERT_TO_BFM_BFI
                            : 0;
lower_instructions(shader->base.ir,
                   MOD_TO_FLOOR |
                   DIV_TO_MUL_RCP |
                   SUB_TO_ADD_NEG |
                   EXP_TO_EXP2 |
                   LOG_TO_LOG2 |
                   bitfield_insert |
                   LDEXP_TO_ARITH);

Notice how in the case of Intel GPUs, one of the lowering passes is conditionally selected depending on the hardware involved. In this case, brw->gen >= 7 selects GPU generations since IvyBridge.

Let’s have a look at the implementation of some of these lowering passes. For example, SUB_TO_ADD_NEG is a very simple one that transforms subtractions into negative additions:

void
lower_instructions_visitor::sub_to_add_neg(ir_expression *ir)
{
   ir->operation = ir_binop_add;
   ir->operands[1] =
      new(ir) ir_expression(ir_unop_neg, ir->operands[1]->type,
                            ir->operands[1], NULL);
   this->progress = true;
}

As we can see, the lowering pass simply changes the operator used by the ir_expression node and negates the second operand using the unary negate operator (ir_unop_neg), thus converting the original a = b - c into a = b + (-c).

Of course, if a driver does not have native support for the subtraction operation, it could still do this when it processes the IR to produce native code, but this way Mesa is saving driver developers that work. Also, some lowering passes may enable optimization passes after the lowering that drivers might miss otherwise.

Let’s see a more complex example: MOD_TO_FLOOR. In this case the lowering pass provides an implementation of ir_binop_mod (modulo) for GPUs that don’t have a native modulo operation.

The modulo operation takes two operands (op0, op1) and implements the C equivalent of ‘op0 % op1’, that is, it computes the remainder of the division of op0 by op1. To achieve this, the lowering pass rewrites the modulo operation as mod(op0, op1) = op0 - op1 * floor(op0 / op1), which requires only multiplication, division and subtraction. This is the implementation:

ir_variable *x = new(ir) ir_variable(ir->operands[0]->type, "mod_x",
                                     ir_var_temporary);
ir_variable *y = new(ir) ir_variable(ir->operands[1]->type, "mod_y",
                                     ir_var_temporary);
this->base_ir->insert_before(x);
this->base_ir->insert_before(y);

ir_assignment *const assign_x =
   new(ir) ir_assignment(new(ir) ir_dereference_variable(x),
                         ir->operands[0], NULL);
ir_assignment *const assign_y =
   new(ir) ir_assignment(new(ir) ir_dereference_variable(y),
                         ir->operands[1], NULL);

this->base_ir->insert_before(assign_x);
this->base_ir->insert_before(assign_y);

ir_expression *const div_expr =
   new(ir) ir_expression(ir_binop_div, x->type,
                         new(ir) ir_dereference_variable(x),
                         new(ir) ir_dereference_variable(y));

/* Don't generate new IR that would need to be lowered in an additional
 * pass.
 */
if (lowering(DIV_TO_MUL_RCP) && (ir->type->is_float() ||
    ir->type->is_double()))
   div_to_mul_rcp(div_expr);

ir_expression *const floor_expr =
   new(ir) ir_expression(ir_unop_floor, x->type, div_expr);

if (lowering(DOPS_TO_DFRAC) && ir->type->is_double())
   dfloor_to_dfrac(floor_expr);

ir_expression *const mul_expr =
   new(ir) ir_expression(ir_binop_mul,
                         new(ir) ir_dereference_variable(y),
                         floor_expr);

ir->operation = ir_binop_sub;
ir->operands[0] = new(ir) ir_dereference_variable(x);
ir->operands[1] = mul_expr;
this->progress = true;

Notice how the first thing this does is to assign the operands to variables. The reason for this is a bit tricky: since we are going to implement ir_binop_mod as op0 - op1 * floor(op0 / op1), we will need to refer to the IR nodes op0 and op1 twice in the tree. However, we can’t just do that directly, because that would mean that we have the same node (that is, the same pointer) linked from two different places in the IR expression tree. That is, we want to have this tree:

                             sub
                           /     \
                        op0       mult
                                 /    \
                              op1     floor
                                        |
                                       div
                                      /   \
                                   op0     op1

Instead of this other tree:


                             sub
                           /     \
                           |      mult
                           |     /   \
                           |   floor  |
                           |     |    |
                           |    div   |
                           |   /   \  |
                            op0     op1   

This second version of the tree is problematic. For example, let’s say that a hypothetical optimization pass detects that op1 is a constant integer with value 1, and realizes that in this case div(op0, op1) == op0. When doing that optimization, our div subtree is removed, and with that, op1 could be removed too (and possibly freed), leaving the other reference to that operand in the IR pointing to an invalid memory location… we have just corrupted our IR:

                             sub
                           /     \
                           |      mult
                           |     /    \
                           |   floor   op1 [invalid pointer reference]
                           |     |
                           |    /
                           |   /
                            op0 

Instead, what we want to do here is to clone the nodes each time we need a new reference to them in the IR. All IR nodes have a clone() method for this purpose. However, in this particular case, cloning the nodes creates a new problem: op0 and op1 are ir_expression nodes so, for example, op0 could be the expression a + b * c, and cloning the expression would produce suboptimal code where the expression gets replicated. At best, this will lead to slower compilation times due to optimization passes needing to detect and fix that; at worst, it would go undetected by the optimizer and lead to worse performance, since we would compute the value of the expression multiple times:

                              sub
                           /        \
                         add         mult
                        /   \       /    \
                      a     mult  op1     floor
                            /   \          |
                           b     c        div
                                         /   \
                                      add     op1
                                     /   \
                                    a    mult
                                        /    \
                                        b     c

The solution to this problem is to assign the expression to a variable, then dereference that variable (i.e., read its value) wherever we need. Thus, the implementation defines two variables (x, y), assigns op0 and op1 to them and creates new dereference nodes wherever we need to access the value of the op0 and op1 expressions:

                       =               =
                     /   \           /   \
                    x     op0       y     op1


                             sub
                           /     \
                         *x       mult
                                 /    \
                               *y     floor
                                        |
                                       div
                                      /   \
                                    *x     *y

In the diagram above, each variable dereference is marked with an ‘*’, and each one is a new IR node (so the two appearances of ‘*x’ are distinct IR nodes, representing two different reads of the same variable). With this solution we only evaluate the op0 and op1 expressions once (when they get assigned to the corresponding variables) and we never refer to the same IR node twice from different places (since each variable dereference is a new IR node).

Now that we know why we assign these two variables, let’s continue looking at the code of the lowering pass:

In the next step we implement op0 / op1 using an ir_binop_div expression. To speed up compilation, if the driver has the DIV_TO_MUL_RCP lowering pass enabled, which transforms a / b into a * (1 / b) (where 1 / b could be a native instruction), we immediately execute the lowering pass for that expression. If we didn’t do this here, the resulting IR would contain a division operation that might have to be lowered in a later pass, making the compilation process slower.

The next step uses an ir_unop_floor expression to compute floor(op0 / op1), and again, tests if this operation should be lowered too, which might be the case if the type of the operands is a 64-bit double instead of a regular 32-bit float, since GPUs may only have a native floor instruction for 32-bit floats.

Next, we multiply the result by op1 to get op1 * floor(op0 / op1).

Now we only need to subtract this from op0, which would be the root IR node for this expression. Since we want the new IR subtree spawning from this root node to replace the old implementation, we directly edit the IR node we are lowering to replace the ir_binop_mod operator with ir_binop_sub, place a dereference to the variable holding op0 in the first operand and link the expression holding op1 * floor(op0 / op1) as the second operand, effectively attaching our new implementation in place of the old version. This is how the original and lowered IRs look:

Original IR:

[prev inst] -> mod -> [next inst]
              /   \            
           op0     op1         

Lowered IR:

[prev inst] -> var x -> var y ->   =   ->   =   ->   sub   -> [next inst]
                                  / \      / \      /   \
                                 x  op0   y  op1  *x     mult
                                                        /    \
                                                      *y      floor
                                                                |
                                                               div
                                                              /   \
                                                            *x     *y

Finally, we set progress to true to let the compiler know that we have transformed the IR and that, as a consequence, we have introduced new nodes that may be subject to further lowering passes, so the caller can run a new round of passes. For example, the subtraction we just added may be lowered again to a negative addition as we have seen before.
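
To give an idea of how this is used, driver code typically runs its lowering and optimization passes in a loop until a full iteration makes no further progress. A hedged sketch of that pattern (variable names and the pass list are illustrative):

bool progress;
do {
   progress = false;
   /* lower_instructions() returns true if it changed the IR */
   progress |= lower_instructions(shader->ir, what_to_lower);
   /* ... other lowering/optimization passes would be chained
    * here, OR-ing their progress into the same flag ... */
} while (progress);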

Coming up next

Now that we have learned about lowering passes we can also discuss optimization passes, which are very similar: they are also built on Mesa’s visitor infrastructure and transform the Mesa IR in much the same way.

by Iago Toral at March 06, 2015 02:06 PM

Samuel Iglesias

My FOSDEM 2015 talk’s slides and video

As I said in a previous post, this year I attended FOSDEM to give a talk about how to test our OpenGL drivers with Free Software. The slides are available online and the video recording has recently been released.

Last but not least, I would like to thank Igalia for sponsoring my trip :-)

Igalia logo

by Samuel Iglesias at March 06, 2015 08:25 AM

March 05, 2015

Jacobo Aragunde

New features in LibreOffice for Android document browser

The Document Foundation recently assigned one of the packages of the Android tender to Igalia; in particular, the one about cloud storage and email sharing. Our proposal comprised the following tasks:

  • Integrate the “share” feature of the Android framework to be able to send documents by email, Bluetooth or any other means provided by the system.
  • Provide the means for the community to develop integration of cloud storage solutions.
  • Implement ownCloud integration as an example of how to integrate other cloud solutions.
  • Extensive documentation of the process to integrate more cloud solutions.

The work is complete and the patches are available in the repository; most of them are already merged in master, while ownCloud support lives in a different branch for now.

Sharing documents

The Android-provided share feature allows users to send a document not only through email but also through Bluetooth or any other method available, depending on the software installed on your device.

We have made this feature available to users through a context menu in the document browser, which pops up after a long press on a document.

Context menu in Android document browser

Share from the Android document browser

Support for cloud storage solutions

This task consisted of creating an interface for integrating any cloud storage solution. The first step was abstracting the code that made direct access to the file system, so it could be replaced by different implementations of storage services, which from now on we will call document providers.

Afterwards, we created two document providers for local storage: one to access the internal storage of the device and another one to conveniently access the Documents directory inside that storage. These two simple providers served as a test for the UI to switch between them; we used the Android drawer widget, which pops up with a swipe gesture from the left of the screen.

Side drawer in Android document browser

All the operations in the Android document browser were being performed in the same thread. Besides being suboptimal, this is actually forbidden by the development framework, which does not allow network code to run in the main thread of the application. The next step for us was isolating the code that might need network access when interacting with a cloud provider, and running it in separate threads.

ownCloud document provider

At that point, we had everything in place to write the code to access an ownCloud server. We did it with the help of an Android library provided by ownCloud developers.

There was still another task, though: any cloud service will likely need some configuration from the user, such as login credentials. We had to implement a preferences screen to enter these settings and do the proper wiring so the provider can listen for any changes in them.

ownCloud settings screen

Documentation

To help other developers write new document providers, we have tried to document the new code in detail, especially those interfaces that must be implemented to create new providers. Besides, we will soon publish a document here explaining how to extend the cloud storage integration.

That’s all for now; to try the ownCloud provider you will have to build the feature/owncloud-provider-for-android branch yourself, while you will find the share feature in the packages already available in the Play Store or F-Droid. Hope you enjoy it!

by Jacobo Aragunde Pérez at March 05, 2015 03:46 PM

March 04, 2015

Michael Catanzaro

Security and Privacy Roadmap for Epiphany and WebKitGTK+

I’ve laid out some informal thoughts on where we should be heading with regards to new security and privacy features in Epiphany. It’s in the form of a list of features we really ought to have. (That is, it’s a wishlist.) Most of these features would be implemented in WebKitGTK+, so other applications using WebKitGTK+ would benefit as well.

There’s certainly no shortage of work to be done, so except for a couple of items on the list, this is not a list of things you should expect to be implemented soon. Comments welcome on the wiki or on this blog. Volunteers especially welcome! Most of the tasks on the list would make for great GSoC projects (but I’m not accepting more applicants this year: prospective students should find another mentor who’s interested in one of the tasks).

The list will also be used to help assign one or more bounties using some of the money we raised in our 2013 security and privacy campaign.

by Michael Catanzaro at March 04, 2015 06:24 AM

March 03, 2015

Iago Toral

An introduction to Mesa’s GLSL compiler (I)

Recap

In my last post I explained that modern 3D pipelines are programmable and how this has impacted graphics drivers. In the following posts we will go deeper into this aspect by looking at different parts of Mesa’s GLSL compiler. Specifically, this post will cover the GLSL parser, the Mesa IR and built-in variables and functions.

The GLSL parser

The job of the parser is to process the shader source code string provided via glShaderSource and transform it into a suitable binary representation that is stored in RAM and can be efficiently processed by other parts of the compiler in later stages.

The parser consists of a set of Lex/Yacc rules to process the incoming shader source. The lexer (glsl_lexer.ll) takes care of tokenizing the source code and the parser (glsl_parser.yy) adds meaning to the stream of tokens identified in the lexer stage.

Just like C or C++, GLSL includes a pre-processor that goes through the shader source code before the main parser kicks in. Mesa’s implementation of the GLSL pre-processor lives in src/glsl/glcpp and is also based on Lex/Yacc rules.

The output of the parser is an Abstract Syntax Tree (AST): an in-memory binary representation of the shader source code. The nodes that make up this tree are defined in src/glsl/ast.h.

For someone familiar with all the Lex/Yacc stuff, the parser implementation in Mesa should feel familiar enough.

The next step takes care of converting from the AST to a different representation that is better suited for the kind of operations that drivers will have to do with it. This new representation, called the IR (Intermediate Representation), is usually referenced in Mesa as Mesa IR, GLSL IR or simply HIR.

The AST to Mesa IR conversion is driven by the code in src/glsl/ast_to_hir.cpp.

Mesa IR

The Mesa IR is the main data structure used in the compiler. Most of the work that the compiler does can be summarized as:

  • Optimizations in the IR
  • Modifications in the IR for better/easier integration with GPU hardware
  • Linking multiple shaders (multiple IR instances) into a single program
  • Generating native assembly code for the target GPU from the IR

As we can see, the Mesa IR is at the core of all the work that the compiler has to do, so understanding how it is set up is necessary to work on this part of Mesa.

The nodes in the Mesa IR tree are defined in src/glsl/ir.h. Let’s have a look at the most important ones:

At the top of the class hierarchy for the IR nodes we have exec_node, which is Mesa’s way of linking independent instructions together in a list to make a program. This means that each instruction has previous and next pointers to the instructions that come before and after it respectively. So ir_instruction, the base class for all nodes in the tree, inherits from exec_node.
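
As a minimal sketch of what this enables (the function name here is made up; foreach_in_list is the traversal macro Mesa defines in src/glsl/list.h), walking the top-level instructions of a shader is just a matter of following those links:

/* Count the top-level IR instructions of a shader by walking the
 * exec_node links of the list. */
static unsigned
count_instructions(exec_list *instructions)
{
   unsigned count = 0;
   foreach_in_list(ir_instruction, inst, instructions) {
      (void) inst; /* only the count matters here */
      count++;
   }
   return count;
}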

Another important node is ir_rvalue, which is the base class used to represent expressions. Generally, anything that can go on the right side of an assignment is an ir_rvalue. Subclasses of ir_rvalue include: ir_expression, used to represent all kinds of unary, binary or ternary operations (the supported operators are defined in the ir_expression_operation enumeration); ir_texture, which is used to represent texture operations like a texture lookup; ir_swizzle, which is used for swizzling values in vectors; the ir_dereference nodes, used to access the values stored in variables, arrays, structs, etc.; and ir_constant, used to represent constants of all basic types (bool, float, integer, etc.).

We also have ir_variable, which represents variables in the shader code. Notice that the definition of ir_variable is quite large… in fact, this is by far the node with the most impact on the memory footprint of the compiler when compiling shaders in large games/applications. Also notice that the IR differentiates between variables and variable dereferences (the act of looking into a variable’s value), which are represented as an ir_rvalue.

Similarly, the IR also defines nodes for other language constructs like ir_loop, ir_if, ir_assignment, etc.
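
For illustration, here is a hedged sketch of how an if statement could be built programmatically (mem_ctx, condition and then_branch_assignment are assumed to already exist; the node layout follows src/glsl/ir.h, where ir_if holds a then_instructions and an else_instructions list):

/* Build "if (condition) { then_branch_assignment; }" and append
 * it to the instruction stream. */
ir_if *iff = new(mem_ctx) ir_if(condition);
iff->then_instructions.push_tail(then_branch_assignment);
instructions->push_tail(iff);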

Debugging the IR is not easy, since the representation of a shader program in IR nodes can be quite complex to traverse and inspect with a debugger. To help with this, Mesa provides a way to print the IR in a human-readable text format. We can enable this with the environment variable MESA_GLSL=dump, which will instruct Mesa to print both the original shader source code and its IR representation. For example:

$ MESA_GLSL=dump ./test_program

GLSL source for vertex shader 1:
#version 140
#extension GL_ARB_explicit_attrib_location : enable

layout(location = 0) in vec3 inVertexPosition;
layout(location = 1) in vec3 inVertexColor;

uniform mat4 MVP;
smooth out vec3 out0;

void main()
{
  gl_Position = MVP * vec4(inVertexPosition, 1);
  out0 = inVertexColor;
}

GLSL IR for shader 1:
(
(declare (sys ) int gl_InstanceID)
(declare (sys ) int gl_VertexID)
(declare (shader_out ) (array float 0) gl_ClipDistance)
(declare (shader_out ) float gl_PointSize)
(declare (shader_out ) vec4 gl_Position)
(declare (uniform ) (array vec4 56) gl_CurrentAttribFragMESA)
(declare (uniform ) (array vec4 33) gl_CurrentAttribVertMESA)
(declare (uniform ) gl_DepthRangeParameters gl_DepthRange)
(declare (uniform ) int gl_NumSamples)
(declare () int gl_MaxVaryingComponents)
(declare () int gl_MaxClipDistances)
(declare () int gl_MaxFragmentUniformComponents)
(declare () int gl_MaxVaryingFloats)
(declare () int gl_MaxVertexUniformComponents)
(declare () int gl_MaxDrawBuffers)
(declare () int gl_MaxTextureImageUnits)
(declare () int gl_MaxCombinedTextureImageUnits)
(declare () int gl_MaxVertexTextureImageUnits)
(declare () int gl_MaxVertexAttribs)
(declare (shader_in ) vec3 inVertexPosition)
(declare (shader_in ) vec3 inVertexColor)
(declare (uniform ) mat4 MVP)
(declare (shader_out smooth) vec3 out0)
(function main
  (signature void
    (parameters
    )
    (
      (declare (temporary ) vec4 vec_ctor)
      (assign  (w) (var_ref vec_ctor)  (constant float (1.000000)) ) 
      (assign  (xyz) (var_ref vec_ctor)  (var_ref inVertexPosition) ) 
      (assign  (xyzw) (var_ref gl_Position)
            (expression vec4 * (var_ref MVP) (var_ref vec_ctor) ) ) 
      (assign  (xyz) (var_ref out0)  (var_ref inVertexColor) ) 
    ))
)
)

Notice, however, that the IR representation we get is not the one that is produced by the parser. As we will see later, that initial IR will be modified in multiple ways by Mesa, for example by different kinds of optimizations, so the IR that we see is the result of all these processing passes over the original IR. Mesa refers to this post-processed version of the IR as LIR (low-level IR) and to the initial version produced by the parser as HIR (high-level IR). If we want to print the HIR (or any intermediate version of the IR as it transforms into the final LIR), we can edit the compiler and add calls to _mesa_print_ir as needed.
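
For example, something along these lines at an arbitrary point in the compiler would dump the IR as it looks at that moment (a sketch, assuming the shader’s IR list and a _mesa_glsl_parse_state pointer named state are in scope):

/* Print the current IR list to stdout in the same textual format
 * used by MESA_GLSL=dump. */
_mesa_print_ir(stdout, shader->ir, state);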

Traversing the Mesa IR

We mentioned before that some of the compiler’s work (a big part of it, in fact) has to do with optimizations and modifications of the IR. This means that the compiler needs to traverse the IR tree and identify subtrees that are relevant to these kinds of operations. To achieve this, Mesa uses the visitor design pattern.

Basically, the idea is that we have a visitor object that can traverse the IR tree and we can define the behavior we want to execute when it finds specific nodes.

For instance, there is a very simple example of this in src/glsl/linker.cpp: find_deref_visitor, which detects if a variable is ever read. This involves traversing the IR, identifying ir_dereference_variable nodes (the ones where a variable’s value is accessed) and checking if the name of that variable matches the one we are looking for. Here is the visitor class definition:

/**
 * Visitor that determines whether or not a variable is ever read.
 */
class find_deref_visitor : public ir_hierarchical_visitor {
public:
   find_deref_visitor(const char *name)
      : name(name), found(false)
   {
      /* empty */
   }

   virtual ir_visitor_status visit(ir_dereference_variable *ir)
   {
      if (strcmp(this->name, ir->var->name) == 0) {
         this->found = true;
         return visit_stop;
      }

      return visit_continue;
   }

   bool variable_found() const
   {
      return this->found;
   }

private:
   const char *name;       /**< Find reads of a variable with this name. */
   bool found;             /**< Was a read of the variable found? */
};

And this is how we use it, for example to check if the shader code ever reads gl_Vertex:

find_deref_visitor find("gl_Vertex");
find.run(sh->ir);
if (find.variable_found()) {
  (...)
}

Most optimization and lowering passes in Mesa are implemented as visitors and follow a similar idea. We will look at examples of these in a later post.

Built-in variables and functions

GLSL defines a set of built-in variables (with the ‘gl_’ prefix) for each shader stage, which Mesa injects into the shader code automatically. If you look at the example where we used MESA_GLSL=dump to obtain the generated Mesa IR, you can see some of these variables.

Mesa implements support for built-in variables in _mesa_glsl_initialize_variables(), defined in src/glsl/builtin_variables.cpp.

Notice that some of these variables are common to all shader stages, while some are specific to particular stages or available only in specific versions of GLSL.

Depending on the type of variable, Mesa or the hardware driver may be able to provide the value immediately (for example for variables holding constant values like gl_MaxVertexAttribs or gl_MaxDrawBuffers). Otherwise, the driver will probably have to fetch (or generate) the value for the variable from the hardware at program run-time by generating native code that is added to the user program. For example, a geometry shader that uses gl_PrimitiveID will need that variable updated for each primitive processed by the Geometry Shader unit in a draw call. To achieve this, a driver might have to generate native code that fetches the current primitive ID value from the hardware and stores it in the register that provides the storage for the gl_PrimitiveID variable before the user code is executed.

The GLSL language also defines a number of built-in functions that must be provided by implementors, like texture(), mix(), or dot(), to name a few examples. The entry point in Mesa’s GLSL compiler for built-in functions is src/glsl/builtin_functions.cpp.

The method builtin_builder::create_builtins() takes care of registering built-in functions, and just like with built-in variables, not all functions are always available: some functions may only be available in certain shading units, others may only be available in certain GLSL versions, etc. For that purpose, each built-in function is registered with a predicate that can be used to test if that function is at all available in a specific scenario.

Built-in functions are registered by calling the add_function() method, which registers all versions of a specific function, for example mix() for float, vec2, vec3, vec4, etc. Each of these versions has its own availability predicate. For instance, mix() is always available for float arguments, but using it with integers requires GLSL 1.30 and the EXT_shader_integer_mix extension.
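
An availability predicate is just a function that inspects the parse state; as a hedged sketch (the function name here is made up, but real predicates of this shape live in src/glsl/builtin_functions.cpp):

/* Hypothetical predicate: integer mix() needs GLSL 1.30 plus the
 * EXT_shader_integer_mix extension enabled in the shader. */
static bool
integer_mix_available(const _mesa_glsl_parse_state *state)
{
   return state->is_version(130, 0) &&
          state->EXT_shader_integer_mix_enable;
}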

Besides the availability predicate, add_function() also takes an ir_function_signature, which tells Mesa about the specific signature of the function being registered. Notice that when Mesa creates signatures for the functions it also defines the function body. For example, the following code snippet defines the signature for modf():

ir_function_signature *
builtin_builder::_modf(builtin_available_predicate avail,
                       const glsl_type *type)
{
   ir_variable *x = in_var(type, "x");
   ir_variable *i = out_var(type, "i");
   MAKE_SIG(type, avail, 2, x, i);

   ir_variable *t = body.make_temp(type, "t");
   body.emit(assign(t, expr(ir_unop_trunc, x)));
   body.emit(assign(i, t));
   body.emit(ret(sub(x, t)));

   return sig;
}

GLSL’s modf() splits a number into its integer and fractional parts. It assigns the integer part to an output parameter and the function return value is the fractional part.

The signature we see above defines an input parameter ‘x’ of type ‘type’ (the number we want to split), an output parameter ‘i’ of the same type (which will hold the integer part of ‘x’) and a return value of type ‘type’ (the fractional part).

The function implementation is based on the existence of the unary operator ir_unop_trunc, which can take a number and extract its integer part. Then it computes the fractional part by subtracting that from the original number.

When the modf() built-in function is used, the call will be expanded to include this IR code, which will later be transformed into native code for the GPU by the corresponding hardware driver. In this case, it means that the hardware driver is expected to provide an implementation of the ir_unop_trunc operator, which in the case of the Intel i965 driver is implemented as a single hardware instruction (see brw_vec4_visitor.cpp or brw_fs_visitor.cpp in src/mesa/drivers/dri/i965).

In some cases, the implementation of a built-in function can’t be defined at the IR level. In this case the implementation simply emits an ad-hoc IR node that drivers can identify and expand appropriately. An example of this is EmitVertex() in a geometry shader. This is not really a function call in the traditional sense, but a way to signal the driver that we have defined all the attributes of a vertex and it is time to “push” that vertex into the current primitive. The meaning of “pushing the vertex” is something that can’t be defined at the IR level because it will be different for each driver/hardware. Because of that, the built-in function simply injects an IR node ir_emit_vertex that drivers can identify and implement properly when the time comes. In the case of the Intel code, pushing a vertex involves a number of steps that are very intertwined with the hardware, but it basically amounts to generating native code that implements the behavior that the hardware expects for that to happen. If you are curious, the implementation of this in the i965 driver code can be found in brw_vec4_gs_visitor.cpp, in the visit() method that takes an ir_emit_vertex IR node as parameter.

Coming up next

In this post we discussed the parser, which is the entry point for the compiler, and introduced the Mesa IR, the main data structure. In the following posts we will delve deeper into the GLSL compiler implementation. Specifically, we will look into the lowering and optimization passes as well as the linking process and the hooks for hardware drivers that deal with native code generation.

by Iago Toral at March 03, 2015 08:28 AM

March 02, 2015

Lorenzo Tilve

Fosdem 2015

brussels

This is as good an excuse as any other to bring the blog back to life.

It has been a long, long time since I last wrote here, and I have been doing quite different things since then. As part of the Igalia browsers team, I have been working on WebKit related projects (mostly WebKitGTK+ and Epiphany), contributing to Chromium and Blink, and dealing with the coverage of the Media Source Extensions specification.

Now, going back to the reason for this post: it was my first year both at FOSDEM and in Brussels. I had an idea about the size of the event, but it was indeed quite impressive.

This year we had four talks from Igalians there, sharing the state of a few things that some of our teams have been working on. In this case, we were talking about OpenGL conformance validation, our improvements on performance and testing with Pflua, and the progress of LibreOffice for Android.

In particular the Igalia talks at FOSDEM 2015 were the following ones:

The four of them were really cool, and they had a lot of people attending (well done mates :D )

Also, after digesting the huge FOSDEM schedule, my plan was to get the most out of the profiling and multimedia related tracks, plus a few other talks that caught my attention.

Saturday

On top of my colleagues’ presentations, and apart from the introductory talk by Karen Sandler, I could attend Valgrind Integration in the Eclipse IDE, What is wrong with Operating Systems, Tuning Valgrind for your Workload, GStreamer in the living room and in outer space, How to start hacking on Valgrind by example, and How to record all TV. Creating a 30 channel DVR.

There were some useful tricks in the Valgrind track and a nice overview on GStreamer portability. It was also curious to find RMS in the exhibition area. After a long conference day, it was Delirium Tremens time.

Sunday

My schedule for Sunday was Ubuntu on phones and beyond, Mobile == Web, Kodi mediacenter (XBMC) past, present and future, It’s not a bug, it’s an environment problem, Servo (the parallel web browser) and YOU!, and Living on Mars: A Beginner’s Guide.

I found particularly interesting the one from Stormy Peters on how limited the understanding of the web is going to be for the next billion new users, due to the devices they will use to access it. I had also planned to attend the one by Habib Virji (whom I met at the 2014 Web Engines Hackfest), but his talk on Web Security was full.

The closing keynote on the Mars One project was also curious. It would be fun if they manage to solve the (apparently still quite complex) open problems and this generation can witness a Mars colony.

To conclude, I think I probably rushed too much, cramming an excessive amount of talks into 2 days, but it was great in any case to get info and fresh updates on the state of a lot of projects, meet some old friends, and breathe some (literally) fresh air.

by lorenzo tilve at March 02, 2015 05:12 PM

February 24, 2015

Manuel Rego

Grid Auto-Placement Is Ready

Back in June I wrote my first post about the CSS Grid Layout automatic placement feature, where I talked about finishing the support for this feature “soon”. In the end it took a bit longer, but this month Igalia has finally completed the implementation and you can already start to test it in both Blink and WebKit.

A little bit of history

Basic support for the auto-placement feature had been working since 2013. Last summer I added support for spanning auto-positioned items. However, the different packing modes of the grid item placement algorithm (aka the auto-placement algorithm) were not implemented yet.

These packing modes were sparse, dense and stack; the initially implemented behavior was dense. I implemented sparse without too much trouble; however, when implementing stack I had some doubts that I shared with the CSS Working Group. This ended up in some discussions and, finally, the removal of the stack mode. So you can forget what I explained about it in my previous post.

Final syntax

After some back and forth, it seems that we now have a definitive syntax for the grid-auto-flow property.

This property determines two different things:

  • Direction: row (by default) or column.
    Defines the direction in which the grid is going to grow if needed to insert the auto-placed items.
  • Mode: sparse (if omitted) or dense.
    Depending on the packing mode the algorithm will try to fill (dense) or not (sparse) all the holes in the grid while inserting the auto-placed items.

So, you can use different combinations of these keywords (row, column and dense) to determine the desired behavior. Examples of some valid declarations:

grid-auto-flow: column;
grid-auto-flow: dense;
grid-auto-flow: row dense;
grid-auto-flow: dense column;

Let’s use an example to explain this better. Imagine the following 3x3 grid:

<div style="grid-template-rows: 50px 50px 50px; grid-template-columns: 100px 100px 100px;">
    <div style="grid-row: span 2; grid-column: 2;">item 1</div>
    <div style="grid-column: span 2;">item 2</div>
    <div>item 3</div>
    <div>item 4</div>
</div>

Depending on the value of grid-auto-flow property, the grid items will be placed in different positions as you can see in the next picture.

grid-auto-flow values example

Grid item placement algorithm

I’ve been talking about this algorithm for a while already. It describes how the items should be placed into the grid. Let’s use a new example that will help to understand better how the algorithm works:

<div style="grid-template-rows: repeat(4, 50px); grid-template-columns: repeat(5, 50px);">
    <div style="grid-row: 1; grid-column: 2;">i1</div>
    <div style="grid-row: 2; grid-column: 1;">i2</div>
    <div style="grid-row: 1; grid-column: span 2;">i3</div>
    <div style="grid-row: 1;">i4</div>
    <div style="grid-row: 2;">i5</div>
    <div style="grid-column: 2;">i6</div>
    <div>i7</div>
    <div style="grid-column: 3;">i8</div>
    <div>i9</div>
</div>

For items with definite positions (i1 and i2) you don’t need to calculate anything: simply place them in the positions defined by the grid placement properties and that’s all. For auto-placed items, however, the following algorithm explains how to convert the automatic positions into actual ones.

A grid item’s location is determined by its row and column coordinates and the number of tracks it spans in each direction. So you might have 3 types of auto-placed items:

  • Both coordinates are auto: i7 and i9.
  • Major axis position is auto: i6 and i8.
  • Minor axis position is auto: i3, i4 and i5.

Note: Major axis refers to the direction determined by grid-auto-flow property, and minor axis to the opposite one. For example in “grid-auto-flow: column;” the major axis is column and the minor axis is row.

Let’s describe briefly the main steps of the algorithm (considering that the direction determined by grid-auto-flow is row):

  1. First, all the non-auto-positioned items (the ones with a definite position) should be placed in the grid.

    Example:

    • i1 (grid-row: 1; grid-column: 2;): 1st row and 2nd column.
    • i2 (grid-row: 2; grid-column: 1;): 2nd row and 1st column.

    Grid item placement algorithm: Step 1

  2. The next step is to place the auto-positioned items whose major axis position is not auto, so they’re locked to a given row.

    Here we have different behaviors depending on the packing mode.

    • sparse: Look for the first empty grid area in this row where the item fits and it’s past any item previously placed by this step in the same row.

      Example:

      • i3 (grid-row: 1; grid-column: span 2;): 1st row and 3rd-4th columns.
      • i4 (grid-row: 1;): 1st row and 5th column.
      • i5 (grid-row: 2;): 2nd row and 2nd column.

      Grid item placement algorithm: Step 2 (sparse)

    • dense: Look for the first empty grid area in this row where the item fits (without caring about the previous items).

      Example:

      • i3 (grid-row: 1; grid-column: span 2;): 1st row and 3rd-4th columns.
      • i4 (grid-row: 1;): 1st row and 1st column.
      • i5 (grid-row: 2;): 2nd row and 2nd column.

      Grid item placement algorithm: Step 2 (dense)

  3. Finally the rest of auto-placed items are positioned.

    Again, the behavior depends on the packing mode, and the description is pretty similar to the one in the previous step, but without the row constraint.

    • sparse: Look for the first empty area where the item fits and it’s past any item previously placed by this step. This means that we start looking for the empty area from the position of the last item placed.

      Example:

      • i6 (grid-column: 2;): 3rd row and 2nd column.
      • i7: 3rd row and 3rd column.
      • i8 (grid-column: 3;): 4th row and 3rd column.
      • i9: 4th row and 4th column.

      Grid item placement algorithm: Step 3 (sparse)

    • dense: Look for the first empty area where the item fits. Starting always to look from the beginning of the grid.

      Example:

      • i6 (grid-column: 2;): 3rd row and 2nd column.
      • i7: 1st row and 5th column.
      • i8 (grid-column: 3;): 2nd row and 3rd column.
      • i9: 2nd row and 4th column.

      Grid item placement algorithm: Step 3 (dense)

Implementation details

Probably most of you won’t be interested in the details related to the implementation of this feature in Blink and WebKit. So, feel free to skip this point and move to the next one.

You can check the meta-bugs in Blink and WebKit to see all the patches involved in this feature. The code is almost the same in both projects and the main methods involved in the grid auto-placement are:

  • RenderGrid::placeItemsOnGrid() (Blink & WebKit): In charge of placing the non-auto-positioned items, covering step 1 explained before. It then calls the next two methods with the auto-positioned items, depending on their type.

  • RenderGrid::placeSpecifiedMajorAxisItemsOnGrid() (Blink & WebKit): This method processes the auto-placed items where only the minor axis position is auto, so they’re locked to a given row/column, which corresponds to step 2.

    Note that in the sparse packing mode we need to keep a cursor for each row/column to fulfill the condition that items be placed past those previously positioned in the same row/column (see the sketch after this list).

  • RenderGrid::placeAutoMajorAxisItemsOnGrid() (Blink & WebKit): And this last method is the one placing the auto-positioned items where the major axis position or both coordinates are auto.

    In this case, it uses the auto-placement cursor in order to implement the sparse behavior.

It’s also important to mention the RenderGrid::GridIterator class (Blink & WebKit), which is responsible for finding the empty grid areas big enough to fit the auto-placed grid items.

Note: RenderGrid has just been renamed to LayoutGrid in Blink.
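
To give a feeling for what this search looks like, here is a simplified, hypothetical C++ sketch (not the actual Blink/WebKit code) of the sparse case for an item locked to a row: scan from the row’s auto-placement cursor for a run of empty cells wide enough for the item, and leave the cursor past the placed item so later items in the same row land after it.

#include <vector>

typedef std::vector<std::vector<bool> > OccupancyGrid; // [row][column]

// Returns the first column at or after 'cursor' where 'span'
// consecutive cells of 'row' are empty, or -1 if the grid would
// need to grow. Advances 'cursor' past the chosen area, which is
// what produces the sparse "past any previously placed item"
// behavior described in step 2 of the algorithm.
static int findColumnForRowLockedItem(const OccupancyGrid& occupied,
                                      size_t row, size_t span,
                                      size_t& cursor)
{
    const std::vector<bool>& cells = occupied[row];
    for (size_t column = cursor; column + span <= cells.size(); ++column) {
        bool fits = true;
        for (size_t i = 0; i < span; ++i) {
            if (cells[column + i]) {
                fits = false;
                break;
            }
        }
        if (fits) {
            cursor = column + span;
            return static_cast<int>(column);
        }
    }
    return -1;
}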

Peculiarities

On one side, the sparse packing mode is intended to preserve the DOM order of the grid items. However, this is only true if, for all the items, the major axis coordinate (or both coordinates) are auto. Otherwise, the DOM order cannot be guaranteed.

This is very easy to understand: if you have some fixed elements, they’re going to be placed in their given positions, and the algorithm cannot maintain the ordering.

Let’s use a very simple example to show it:

<div style="display: grid;">
    <div style="grid-row: 1; grid-column: span 2;">item 1</div>
    <div>item 2</div>
    <div style="grid-row: 1; grid-column: 2;">item 3</div>
</div>

The elements will be positioned as follows:

  • item 1: 1st row and 3rd-4th columns.
  • item 2: 1st row and 1st column.
  • item 3: 1st row and 2nd column.

Grid item placement algorithm ordering

On the other hand, there is another issue regarding sparse, this time in the second step of the algorithm (the one related to items locked to a given row/column), where the items are placed past any other item previously positioned in that step in the same row/column.

Let’s take a look at the following example:

<div style="display: grid;">
    <div style="grid-row: 1; grid-column: 1;">item 1</div>
    <div style="grid-row: 1 / span 2;">item 2</div>
    <div style="grid-row: 2">item 3</div>
</div>

According to the algorithm these items will be placed at:

  • item 1: 1st row and 1st column.
  • item 2: 1st-2nd row and 2nd column.
  • item 3: 2nd row and 1st column.

Grid item placement algorithm sparse fixed row/column

However, for the case of item 3 we might think that it should be placed in the 2nd row and the 3rd column, because item 2 is actually placed in the 1st and 2nd rows. But, for the sake of simplicity, this isn’t the behavior described in the spec.

Use cases

So far this has been too abstract, so let’s try to think of some real cases where the auto-placement feature might be handy.

The canonical example is how to format a form. You can check the example in my previous post, which puts labels in the first column and inputs in the second one, or the example in the spec, which uses a slightly more complex form.

Another example could be how to format a definition list. Imagine that we have the following list:

<dl>
    <dt>Cat</dt>
    <dd>A carnivorous mammal (Felis catus) long domesticated as a pet and for catching rats and mice.</dd>
    <dt>Crocodile</dt>
    <dd>Any of several large carnivorous thick-skinned long-bodied aquatic reptiles (family Crocodylidae) of tropical and subtropical waters; broadly: crocodilian.</dd>
    <dt>Donkey</dt>
    <dd>The domestic ass (Equus asinus).</dd>
</dl>

In order to show it using a grid layout, you would just need the following CSS lines:

dl {
    display: grid;
    grid-template-columns: 10px auto 10px 1fr 10px;
}

dt {
    grid-column: 2;
    width: max-content;
    align-self: center;
    justify-self: end;
}

dd {
    grid-column: 4;
    margin: 0px;
}

The result will be a 2-column layout where the terms go in the first column, right-aligned and vertically centered, and the definitions in the second column, with a 10px gutter wrapping the columns.

Definition list formatted using grid auto-placement feature

Conclusion

Grid automatic placement has just been finished and you can already start to play with it in Chrome Canary (enabling the flag “Enable experimental Web Platform features”) or WebKit Nightly Builds. We think that it’s a nice and pretty powerful feature, and we encourage front-end web developers to test it (like the rest of CSS Grid Layout). The best part is that the spec is still receiving its last changes, and you’ll have the chance to provide feedback to the CSS Working Group if you miss something.

In order to make things easier to test I’ve adapted the demo to test the grid item placement algorithm, adding the option to specify the grid-auto-flow property too: http://igalia.github.io/css-grid-layout/autoplacement.html

Igalia and Bloomberg working together to build a better web

All this work has been done as part of the collaboration between Igalia and Bloomberg. We’ll be happy to receive your feedback about this feature. Stay tuned to get the latest news regarding CSS Grid Layout development!

February 24, 2015 11:00 PM