Planet Igalia

June 29, 2015

Manuel Rego

CSS Grid Layout is just around the corner (CSSConf US 2015)

Coming back to real life after a wonderful week in New York City is not that easy, but here we are on the other side of the pond, writing about CSS Grid Layout again.

First, kudos to Bocoup for organizing CSSConf US 2015, especially to Adam Sontag and the rest of the conference staff. You were really supportive during the whole week. And the videos with live transcripts were available just a few days after the conference, awesome job! The only issue was the internet connection, which was really flaky.

So, yeah, I attended CSSConf this year, but not only that: I was also speaking about CSS Grid Layout, and the video of my talk is already online together with the slides.

During the talk I described the basic concepts, syntax and features of CSS Grid with different live coding examples. Then I tried to explain the main tasks that the browser has to do in order to render a grid and gave some tips about grid performance. Finally, we reviewed browser adoption and the status of the Chromium/Blink and Safari/WebKit implementations that Igalia is doing.

CSS Grid Layout is just around the corner talk sketchnotes by Susan

The feedback about my talk was incredibly positive and everybody seemed really excited about what CSS Grid Layout can bring to the web platform. Big thanks to you all!

Of course, there were other great talks at CSSConf, as you can check in the videos. Off the top of my head, I loved the one by Lea Verou, an impressive talk as usual where she even released a polyfill for conic gradients on the stage. SVG and animations had two nice talks by Chris Coyier and Sarah Drasner. PostCSS and inline styles were also hot topics. Responsive (and responsible!) images, Fun.css and CSS? WTF! were also great (and I'm probably forgetting some others).

Last, on Thursday’s night we attended BrooklynJS which had a great panel discussing about CSS. The inline styles vs stylesheets topic became hot, as projects like React are moving people away from stylesheets. Chris Coyier (one of the panelists and also speaker at CSSConf) wrote a nice post past week giving a good overview of this topic. Also The Four Fives were amazing!

On top of that, as part of the collaboration between Igalia and Bloomberg, I was visiting their fancy office in Manhattan. I had a great time there talking about grids with several people from the team. They really believe that CSS Grid Layout will change the future of the web, benefiting lots of people in different use cases, and hopefully helping to alleviate performance issues in complex scenarios.

Igalia and Bloomberg working together to build a better web

Looking forward to the next opportunity to talk about CSS Grid Layout. We'll keep up the hard work to make it a reality as soon as possible!

June 29, 2015 10:00 PM

June 24, 2015

Javier Fernández

Performance analysis of Grid Layout

Now that we have a quite complete implementation of the CSS Grid Layout specification, it's time to take care of performance analysis and optimizations. In this essay, which is the first of a series of posts about performance, I'll briefly introduce how to use the Blink (Chrome) and WebKit (Safari) performance analysis tools, describe some of the most interesting cases I've seen during my work on the implementation of this spec and, finally, present a basic case comparing the Flexbox and Grid layout models, which I'd like to evolve and analyze further in the coming months.

Performance analysis tools

Both the WebKit and Blink projects provide several useful and easy-to-use (Python) scripts to run a set of test cases, take different measurements and perform some early analysis. They were written before the fork, which is why the related documentation can be found on WebKit's trac, but both engines still use them, for the time being.

Tools/Scripts/run-perf-tests
Tools/Scripts/webkitpy/performance_tests/

There is a wide set of performance tests under the PerformanceTests folder in Blink's/WebKit's root directory; even though both engines share a substantial number of tests, there are some differences.

(blink’s root directory) $ ls PerformanceTests/
Bindings BlinkGC Canvas CSS DOM Dromaeo Events inspector Layout Mutation OWNERS Parser resources ShadowDOM Skipped SunSpider SVG XMLHttpRequest XSSAuditor

The Chromium project has introduced a new performance tool, called Telemetry, which, in addition to running the above-mentioned tests, is designed to execute more complex cases, like running specific page sets or benchmarking against a prerecorded session (WebPageReplay). It's also possible to send patches to the performance try bots directly from the gclient or git (depot_tools) command line. There is quite a lot of information available in the following links:

Regarding profiling tools, it’s possible both in Webkit and Blink to use the –profiler option when running the performance tests so we can collect profiling data. However, while WebKit recommends perf for linux, Google’s Blink engine provides some alternatives.

CSS Grid Layout performance tests and current status

While implementing a new browser feature, it's not easy to measure performance while the code evolves so much and so quickly and, what's worse, to catch regressions introduced by new logic. When the feature's syntax changes or there is missing or incomplete functionality, it's not always possible to establish a well-defined baseline for performance. It's also a tough decision to determine which use cases we should care about; obviously the faster the better, but adding performance optimizations usually complicates the code, may affect its robustness and can lead to unexpected and, even worse, hard-to-find bugs.

At the time of this writing, we had 3 basic performance tests: auto-sized, fixed-sized and stretch.

Why have we selected those use cases to measure and keep track of performance regressions? First of all, note that auto-sizing is one of the most expensive branches inside the grid track sizing algorithm, so we are really interested both in improving it and in keeping track of regressions on this code path.

body {
    display: grid;
    grid-template-rows: repeat(100, auto);
    grid-template-columns: repeat(20, auto);
}
.gridItem {
    height: 200px;
    width: 200px;
}

On the other hand, fixed-sized is the easiest/fastest path of the algorithm, so besides the importance of avoiding regressions (when possible), it’s also a good case to compare with auto-sized.

body {
    display: grid;
    grid-template-rows: repeat(100, 200px);
    grid-template-columns: repeat(20, 200px);
}
.gridItem {
    height: 200px;
    width: 200px;
}

Finally, a stretching use case was added because stretch is the default alignment value for grid items and the two test cases already described use fixed-size items, hence no stretching happens (even though items fill the whole grid cell area). Given that I implemented CSS Box Alignment support for grid, I was conscious of how expensive the stretching logic is, so I considered it an important use case to analyze and optimize as much as possible. Actually, I've already introduced several optimizations because the early implementation was quite slow, around 40% slower than using any other basic alignment value (start, end, center). We will talk more about this later, when we analyze a case comparing Flexbox and Grid layout performance.

body {
    display: grid;
    grid-template-rows: repeat(100, 200px);
    grid-template-columns: repeat(20, 200px);
}
.gridItem {
    height: auto;
    width: auto;
}

The basic HTML body of these 3 tests is quite simple because we want to analyze the performance of very specific parts of the Grid Layout logic, in order to detect regressions in sensitive code paths. We'd like to eventually have some real use cases to analyze and create many more performance tests, but the Chrome performance platform is definitely not the place to do so. The following graphs show the performance evolution during 2015 for the 3 tests we have defined so far.

grid-performance-overview

Note that the yellow trace shows data taken from a reference build, so we can discount temporary glitches on the machine running the performance tests of the target build, whose results are shown in the blue trace; this reference trace is also useful for detecting invalid regression alerts.

Why is performance so different for these cases?

The 3 tests we have for Grid Layout use runs/second values as a way to measure performance; this is the preferred method for both the WebKit and Blink engines because we can detect regressions with relatively small tests. It's possible, though, to do other kinds of measurements. Looking at the graphs above, we can extract the following data:

  • auto-sized grid: around 650 runs/sec
  • fixed-sized grid: around 1400 runs/sec
  • fixed-sized stretched grid: around 1250 runs/sec

Before analyzing the possible causes of the performance drop in each case, I defined some additional tests to stress these 3 cases even more, so we can see how grid size affects the obtained results. I defined 20 tests for each case, each one with a different number of grid items, from 10×10 up to 200×200 grids. I ran those tests on my own laptop, so let's take the absolute numbers with a grain of salt, although the differences between these 3 scenarios should be consistent. The table below shows some numeric results of this experiment.

grid-fixed-VS-auto-VS-stretch
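For reference, each of these stress tests is essentially a page that builds an N×N grid with the corresponding CSS and measures how many times per second the engine can lay it out. The real tests rely on the runner helpers shipped under PerformanceTests; the snippet below is just a rough standalone sketch of the idea (buildGrid and measureRunsPerSecond are made-up names, and layout is forced by reading offsetHeight):

// Rough standalone sketch, not the actual harness code: build an NxN grid
// and count how many forced layouts fit in a given time window.
function buildGrid(n, trackSize) {
    var container = document.createElement('div');
    container.style.display = 'grid';
    container.style.gridTemplateRows = 'repeat(' + n + ', ' + trackSize + ')';
    container.style.gridTemplateColumns = 'repeat(' + n + ', ' + trackSize + ')';
    for (var i = 0; i < n * n; i++) {
        var item = document.createElement('div');
        item.style.width = '100px';
        item.style.height = '100px';
        container.appendChild(item);
    }
    document.body.appendChild(container);
    return container;
}

function measureRunsPerSecond(container, seconds) {
    var runs = 0;
    var start = performance.now();
    while (performance.now() - start < seconds * 1000) {
        // Invalidate and force a synchronous layout of the grid.
        container.style.width = (runs % 2) ? '99%' : '100%';
        void container.offsetHeight;
        runs++;
    }
    return runs / seconds;
}

var grid = buildGrid(100, 'auto'); // 'auto' vs. '200px' selects the auto-sized or fixed-sized path
console.log(measureRunsPerSecond(grid, 5) + ' runs/sec');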

First of all, recall that these 3 tests produce the same web visualization, consisting of grids with NxN items of 100px each. The only difference is the grid layout strategy used to produce such a result: auto-sizing, fixed-sizing and stretching. So now, focusing on the previous table's data, we can evaluate the cost, in terms of layout performance, of using auto-sized tracks to define the grid (which may be the only solution for certain cases). The performance drop even grows with the number of grid items, but we can conclude that it stabilizes at around 60%. On the other hand, stretching is also slower but, unlike the auto-sized case, its performance drop does not depend much on grid size; it stays more or less constant at around 15%.

grid-performance-graphs-2

Impact of auto-sized tracks in layout performance

Basically, the track sizing algorithm can be described in the following 4 steps:

  1. Initialize per Grid track variables.
  2. Resolve content-based TrackSizingFunctions.
  3. Grow all Grid tracks in GridTracks from their baseSize up to their growthLimit value until freeSpace is exhausted.
  4. Grow all Grid tracks having a fraction as the MaxTrackSizingFunction.

These steps are executed twice: a first cycle to determine the column tracks' sizes and another cycle to set the row tracks' sizes, which may depend on the grid's width. When using just fixed-sized tracks, as in the very simple case we are testing, the only computation required to determine the grid's size is completing step 1 and determining the free available space based on the specified fixed-size values of each track.

// 1. Initialize per Grid track variables.
for (size_t i = 0; i < tracks.size(); ++i) {
    GridTrack& track = tracks[i];
    GridTrackSize trackSize = gridTrackSize(direction, i);
    const GridLength& minTrackBreadth = trackSize.minTrackBreadth();
    const GridLength& maxTrackBreadth = trackSize.maxTrackBreadth();
 
    track.setBaseSize(computeUsedBreadthOfMinLength(direction, minTrackBreadth));
    track.setGrowthLimit(computeUsedBreadthOfMaxLength(direction, maxTrackBreadth, track.baseSize()));
 
    if (trackSize.isContentSized())
        sizingData.contentSizedTracksIndex.append(i);
    if (trackSize.maxTrackBreadth().isFlex())
        flexibleSizedTracksIndex.append(i);
}
for (const auto& track: tracks) {
    freeSpace -= track.baseSize();
}

Focusing now on the auto-sized scenario, we will have the overhead of resolving content-sized functions for all the grid items.

// 2. Resolve content-based TrackSizingFunctions.
if (!sizingData.contentSizedTracksIndex.isEmpty())
    resolveContentBasedTrackSizingFunctions(direction, sizingData);

I didn’t add source code of resolveContentBasedTrackSizingFunctions because it’s quite complex, but basically it implies a cost proportional to the number of grid tracks (minimum of 2x), in order to determine minContent and maxContent values for each grid item. It might imply additional computation overhead when using spanning items; it would require to sort them based on their spanning value and iterate over them again to resolve their content-sized functions.

Some issues may be interesting to analyze in the future:

  • How much does each content-sized track cost?
  • What is the performance impact of using flexible-sized tracks? Would it be the worst-case scenario? Considering that it requires following all four steps of the track sizing algorithm, it likely would be.
  • What are the performance implications of using spanning items?

Why is stretching such a performance drain?

This is an interesting issue, given that stretch is the default value for both Grid and Flexbox items. Actually, it’s the root cause of why Grid beats Flexbox in terms of layout performance for the cases when stretch alignment is used. As I’ll explain later, Flexbox doesn’t have the optimizations I’ve implemented for Grid Layout.

The stretching logic takes place during the grid container's layout operations, after all tracks have their sizes precisely determined and we have properly computed all grid tracks' positions relative to the grid container. It happens before the alignment logic is executed because stretching may imply changing some grid items' sizes, hence they will be marked for layout (if they weren't already).

Obviously, stretching only takes place when the corresponding Self Alignment properties (align-self, justify-self) have either auto or stretch as their value, but there are other conditions that must be fulfilled to trigger this operation:

  • box’s computed width/height (as appropriate to the axis) is auto.
  • neither of its margins (in the appropriate axis) are auto
  • still respecting the constraints imposed by min-height/min-width/max-height/max-width

In that scenario, stretching logic implies the following operations:

LayoutUnit stretchedLogicalHeight = availableAlignmentSpaceForChildBeforeStretching(gridAreaBreadthForChild, child);
LayoutUnit desiredLogicalHeight = child.constrainLogicalHeightByMinMax(stretchedLogicalHeight, -1);
 
bool childNeedsRelayout = desiredLogicalHeight != child.logicalHeight();
if (childNeedsRelayout || !child.hasOverrideLogicalContentHeight())
    child.setOverrideLogicalContentHeight(desiredLogicalHeight - child.borderAndPaddingLogicalHeight());
if (childNeedsRelayout) {
    child.setLogicalHeight(0);
    child.setNeedsLayout();
}
 
LayoutUnit LayoutGrid::availableAlignmentSpaceForChildBeforeStretching(LayoutUnit gridAreaBreadthForChild, const LayoutBox& child) const
{
    LayoutUnit childMarginLogicalHeight = marginLogicalHeightForChild(child);
 
    // Because we want to avoid multiple layouts, stretching logic might be performed before
    // children are laid out, so we can't use the child cached values. Hence, we need to
    // compute margins in order to determine the available height before stretching.
    if (childMarginLogicalHeight == 0)
        childMarginLogicalHeight = computeMarginLogicalHeightForChild(child);
 
    return gridAreaBreadthForChild - childMarginLogicalHeight;
}

In addition to the extra layout required when changing a grid item's size, computing the available space for stretching adds extra overhead, especially if we have to compute the grid item's margins because some layout operations are still incomplete.

Given that the grid container relies on generic block layout operations to determine the stretched width, this specific logic is only executed to determine the stretched height. Hence the performance drop is alleviated, compared with the auto-sized tracks scenario.

Grid VS Flexbox layout performance

One of the main goals of the CSS Grid Layout specification is to complement the Flexbox layout model for 2 dimensions. It's to be expected that creating grid designs with Flexbox will be less efficient than using a layout model specifically designed for these cases, not only regarding CSS syntax, but also regarding layout performance.

However, I think it’s interesting to measure Grid Layout performance in 1-dimensional cases, usually managed using Flexbox, so we can have comparable scenarios to evaluate both models. In this post I’ll start with such cases, using a very simple one in this occasion. I’d like to get more complex examples in future posts, the ones more usual in Flexbox based designs.

So, let’s consider the following simple test case:

<div class="className">
   <div class="i1">Item 1</div> 
   <div class="i2">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</div>
   <div class="i3">Item 3 longer</div>
</div>

I evaluated the simple HTML example above with both Flexbox and Grid layouts to measure performance. I used a CPU profiler to figure out where the bottlenecks are for each model, trying to explain where the differences come from. So, I defined 2 CSS classes, one for each layout model, as follows:

.flex {
    background-color: silver;
    display: flex;
    height: 100px;
    align-items: start;
}
.grid {
    background-color: silver;
    display: grid;
    grid-template-columns: 100px 1fr auto;
    grid-template-rows: 100px;
    align-items: start;
    justify-items: start;
}
.i1 { 
    background-color: cyan;
    flex-basis: 100px; 
}
.i2 { 
    background-color: magenta;
    flex: 1; 
}
.i3 { 
    background-color: yellow; 
}

Given that there is no concept of rows in Flexbox, I evaluated the performance of 100 up to 2000 grid or flex containers, creating 20 tests to be run inside the Chrome performance framework described at the beginning of this post. You can check out the resources and a script to generate them in our github examples repo.
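Just to give an idea of what that script does, a minimal Node.js sketch could look like the following (the file names and stylesheet path are made up for illustration; the real generator and resources live in the repo linked above):

// Hypothetical sketch of the test generator: it writes one HTML file per
// container count (100, 200, ..., 2000) for each layout model.
const fs = require('fs');

function testPage(className, containerCount) {
    let body = '';
    for (let i = 0; i < containerCount; i++) {
        body += '<div class="' + className + '">' +
                '<div class="i1">Item 1</div>' +
                '<div class="i2">Lorem ipsum dolor sit amet...</div>' +
                '<div class="i3">Item 3 longer</div>' +
                '</div>\n';
    }
    return '<!DOCTYPE html>\n' +
           '<link rel="stylesheet" href="resources/style.css">\n' +
           '<body>\n' + body + '</body>\n';
}

for (const className of ['flex', 'grid']) {
    for (let n = 100; n <= 2000; n += 100) {
        fs.writeFileSync(className + '-' + n + '.html', testPage(className, n));
    }
}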

flexVSgrid

When comparing both layout models in terms of layout times, we clearly see that Grid Layout beats Flexbox when using the default values for the CSS properties controlling layout and alignment, which is stretch for these containers. As explained before, the stretching logic adds an important computation overhead which, as we can now see in the numeric table above, weighs more heavily on Flexbox than on Grid.

Looking at the plot of the differences in layout time, we see that for the default case, Grid's performance improvement stabilizes at around 7%. However, when we avoid the stretching logic, for instance by using any other alignment value, Grid's layout performance is considerably worse than Flexbox's for this test case, around 15% slower. This makes sense, as this test case is the ideal one for Flexbox, while a bit artificial for Grid; using a single grid with N rows improves performance considerably, getting much better numbers than Flexbox, but we will look at these cases in future analyses.

Grid Layout's better results for the default case (stretch) are explained by the several optimizations I implemented for Grid. Flexbox should probably do the same, as stretch is the default value and it could affect many sites using this layout model in their designs.

Thanks to Bloomberg for sponsoring this work, as part of the efforts that Igalia has been doing all these years pursuing a better and more open web.

Igalia & Bloomberg logos

by jfernandez at June 24, 2015 12:03 PM

June 18, 2015

Andy Wingo

arrow functions coming to chrome 45!

It's been a long time coming, but I just flipped the bit in V8 that will ship arrow functions in Chrome 45! Woo hoo!

You probably know, but arrow functions are a new way to write functions in JavaScript. They look like this:

// Two arguments, body implicitly returned.
(x, y) => x + y

// With just one argument, no parentheses needed.
x => x * 2

// Body can have braces too; in that case use "return".
x => { return x * 2 }

Relative to the other kind of function that is written like function (x) { return x * 2 }, arrow functions don't define this or arguments in their bodies, instead capturing these values from the environment. There are a couple of other minor differences, too, but instead of writing about them here I'll just point to the great article by Jason Orendorff of the SpiderMonkey team.
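A tiny example of that difference (my own sketch, not taken from Jason's article):

// The arrow function reads `this` from the enclosing method, while the
// classic callback gets its own `this` (undefined or the global object).
var counter = {
  count: 0,
  incrementLater: function () {
    setTimeout(() => { this.count++; }, 100);        // increments counter.count
    setTimeout(function () { this.count++; }, 100);  // does NOT touch counter.count
  }
};
counter.incrementLater();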

Arrow functions are part of the JavaScript language standard that was called "ECMAScript 6" or ES6, and I guess you could still call it that. It seems like a silly thing for the committee to do to throw away all their branding like that but they decided to rename it ECMAScript 2015, which I'm sure is a link that the pedants are glad I have included. The upshot is that the standard is now final, gold master, etched in stone, which from an implementor's perspective is a relief. You can practically feel the anxiety ebbing away by the happy rate at which commits bubble out of source repositories and into shipping browsers, free from the fear that some spec change will force the hack-stream to change course.

From the V8 side, our arrow function implementation has also been a long time coming. My colleague Adrián Pérez did the first half of the work, and I picked up on the back end of things. It seems like such a small feature and in many ways it is, but still it took a long time. Now I know that my readers are a bunch of nerds and many of you like implementing languages, so you might appreciate these nargish points.

One of the first bits is that arrow functions are hard to parse. Consider, this is a valid JavaScript expression:

(x,y)

It's a "comma expression" that will evaluate x then y and its result will be the result of evaluating y. But add an arrow on after the end and you get not an expression but a formal parameter list:

(x,y)=>x+y

Now you might think, well OK, when you see an arrow, rewind the input stream and parse in "arrow function mode". Indeed that would be fine, but not in combination with some additional ES6 features, optional and destructuring arguments. Optional arguments look like this:

(x=42)=>x

The =42 part is the expression that will be evaluated to give x a value, if the function is called with no arguments. Note that this bit is still under implementation in V8 so you can't try it in your browser. An optional argument initializer is an expression and not a value, so you can also have:

(x=(x)=>42)=>x

Combined, this makes rewinding the token stream a proposition of exponential complexity, which is a no-go for a production JavaScript parser. Parsers are on the hot path for page-load times and no browser vendor wants to introduce a pathological case into their page load.

Instead, V8 does something I hadn't seen before. It keeps an open mind about whether something is a comma expression or a formal parameter list of an arrow function, and only makes a decision when it sees the => (or not). As it parses, V8 records places that it would signal an error for either a parameter list or for an expression, and then when that superimposed wave function collapses it checks that the production is valid, signalling the appropriate error if not. I thought this was a really neat trick, so if you're into that thing see expression classifier to see those details.

The other thing that's tricky about arrow functions is the this binding. In JavaScript, this is basically a hidden parameter passed to a function when it is called. Calling a function like o.f() passes the value of o to f as its this parameter. If instead f() is called directly, like with no dot before the call, then undefined is passed as this. Also for sloppy-mode functions, if the passed this value isn't an object, then the global object instead is assigned to this. Finally outside a function, this is bound to the global object.

OK, I know all of you know these things. Thing is, you always have a this, and although it's like a variable it's not a valid variable name, and before ES6 nothing could capture its value, because each function has its own this value. Perhaps you see where I'm going with this (ahem) now. Arrow functions introduce a function scope that doesn't have a this value, and that indeed might capture some other scope's this value, forcing it to be context-allocated. Other parts of ES6 can actually force assignment to this, like a super call, and that assignment can actually come from within an arrow function. Zounds! A simple concept, but there was a lot of incidental complexity in V8 around the implementation. Between Adrián and myself it took like three months to fix this usage in V8 to always just go through the (possibly context-allocated) variable, and there are still probably some devtools bugs to find in the upcoming weeks.
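To make that concrete, here is the kind of code (a made-up example, not from the V8 sources) that forces a method's this to be context-allocated, because an arrow function closes over it:

// `delayed`'s `this` can no longer live only in a register or stack slot:
// it has to be stored in the function's context so the arrow closure,
// which may run much later, can still read it.
var logger = {
  prefix: '[log] ',
  delayed: function (message) {
    setTimeout(() => console.log(this.prefix + message), 0);
  }
};
logger.delayed('hello');  // eventually prints "[log] hello"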

Performance-wise, arrow functions are just like functions. They should be just as fast as if you wrote them with function. So use them with joy, use them with abandon, use them judiciously -- however you decide you use them, don't let perf influence your decision one way or the other.

That's about it! Like all of my JS engine work over the past couple years, this hacking was sponsored by fabulous folks over at Bloomberg, so big ups to them. From me and Adrián at Igalia, until next time! We leave you to puzzle out what this bit of JavaScript evaluates to:

(({},{},({},{})=>({},{}))=>(({},{})=>({},{}),{},{}))({},{})

Happy hacking!

by Andy Wingo at June 18, 2015 04:41 PM

June 12, 2015

Samuel Iglesias

piglit (V): how to contribute to piglit and table of contents

The last post and the one before were about how to create your own piglit tests. Before those, I wrote an introduction to piglit and how to launch a tailored piglit run (more details about these last two topics in my FOSDEM 2015 talk).

Now it’s time to talk about how to contribute to piglit.

How to contribute to piglit

Once you want to contribute something to piglit, you need to generate the patches and send them to the mailing list for review. They are usually created with git format-patch and sent with the git send-email command (if you need help with git, there are a lot of tutorials). Remember to rebase your branch against an up-to-date master before creating the patches, so no merge conflicts will appear if the reviewers want to apply them locally.

Whether you have some patches ready to be submitted or you have questions about piglit, subscribe to piglit@lists.freedesktop.org and send them there.

Most piglit developers work in other areas (such as OpenGL driver development!) which means that the review process of piglit patches could take some time, so be patient and wait.

If after some time (something like one or two weeks) there is no answer about your patches, you can send a reminder saying that a review is pending. If you have commit rights and the patch is trivial (or you are very confident that it is right), you can even push it to the repository's master branch after that time. Piglit is not as strict as other projects in this regard; however, do not abuse this rule.

Once you have a good track record of contributions, or other contributors tell you to do so, you can ask for commit rights to the piglit repository by following these instructions. And don't hesitate to review patches from others!

Table of contents

This is the list of my piglit related posts:

  1. Piglit, an open-source test suite for OpenGL implementations
  2. piglit (II): How to launch a tailored piglit run
  3. piglit (III): How to write GLSL shader tests
  4. piglit (IV): How to write binary test programs
  5. piglit (V): how to contribute to piglit and table of contents

Plus my FOSDEM 2015 talk.

Thanks for following this short introduction to piglit. Happy hacking!

by Samuel Iglesias at June 12, 2015 08:07 AM

June 11, 2015

Samuel Iglesias

piglit (IV): How to write binary test programs

Last post I talked about how to develop GLSL shader tests and how to add them to piglit. This is a nice way to develop simple tests but sometimes you need to do something more complex. For that case, piglit can run binary test programs.

Introduction

Binary test programs in piglit are written in C, taking advantage of the piglit framework to facilitate test development (there is no main() function, no need to set up GLX (or WGL or EGL or whatever), no need to manually check the available extensions or call glXSwapBuffers() or similar functions…); however, you can still use the whole OpenGL C-language API.

Simple example

The piglit framework is mostly undocumented but easy to understand once you start reading some existing tests. So I will start with a simple example explaining what a typical binary test looks like and then show you a real example.

/*
 * Copyright © 2015 Intel Corporation
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 */

/** @file test-example.c
 *
 * This test shows an skeleton for creating new tests.
 */

#include "piglit-util-gl.h"

PIGLIT_GL_TEST_CONFIG_BEGIN
    config.window_width = 100;
    config.window_height = 100;
    config.supports_gl_compat_version = 10;
    config.supports_gl_core_version = 31;
    config.window_visual = PIGLIT_GL_VISUAL_DOUBLE | PIGLIT_GL_VISUAL_RGBA;

PIGLIT_GL_TEST_CONFIG_END

static const char vs_pass_thru_text[] =
    "#version 330\n"
    "\n"
    "in vec4 piglit_vertex;\n"
    "\n"
    "void main() {\n"
    "   gl_Position = piglit_vertex;\n"
    "}\n";

static const char fs_source[] =
    "#version 330\n"
    "\n"
    "out vec4 color;\n" /* fragment outputs must be declared in GLSL 3.30 core */
    "\n"
    "void main() {\n"
    "   color = vec4(1.0, 0.0, 0.0, 1.0);\n"
    "}\n";

GLuint prog;

void
piglit_init(int argc, char **argv)
{
    bool pass = true;

    /* piglit_require_extension("GL_ARB_shader_storage_buffer_object"); */

    prog = piglit_build_simple_program(vs_pass_thru_text, fs_source);

    glUseProgram(prog);

    glClearColor(0, 0, 0, 0);

    /* <-- OpenGL commands to be done --> */

    glViewport(0, 0, piglit_width, piglit_height);

    /* piglit_draw_* commands can go into piglit_display() too */
    piglit_draw_rect(-1, -1, 2, 2);

    if (!piglit_check_gl_error(GL_NO_ERROR))
       pass = false;

    piglit_report_result(pass ? PIGLIT_PASS : PIGLIT_FAIL);
}

enum piglit_result piglit_display(void)
{
    /* <-- OpenGL drawing commands, if needed --> */

    /* UNREACHED */
    return PIGLIT_FAIL;
}

As you see in this example, there are four different parts:

  1. License header and description of the test
    • The license details should be included in each source file. There is one agreed upon by most contributors and it's an MIT license assigning the copyright to Intel Corporation. More information in the COPYING file.
    • It includes a brief description of what the test does.

    /*
     * Copyright © 2015 Intel Corporation
     *
     * Permission is hereby granted, free of charge, to any person obtaining a
     * copy of this software and associated documentation files (the "Software"),
     * to deal in the Software without restriction, including without limitation
     * the rights to use, copy, modify, merge, publish, distribute, sublicense,
     * and/or sell copies of the Software, and to permit persons to whom the
     * Software is furnished to do so, subject to the following conditions:
     *
     * The above copyright notice and this permission notice (including the next
     * paragraph) shall be included in all copies or substantial portions of the
     * Software.
     *
     * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
     * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
     * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
     * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
     * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
     * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
     * DEALINGS IN THE SOFTWARE.
     */
    
    /** @file test-example.c
     *
     * This test shows an skeleton for creating new tests.
     */

  2. Piglit setup. This is needed to check if a test can be executed by a given driver (minimum supported GL version), or to create a window of a specific size, or even to define if we want double buffering.
PIGLIT_GL_TEST_CONFIG_BEGIN
    config.window_width = 100;
    config.window_height = 100;
    config.supports_gl_compat_version = 10;
    config.supports_gl_core_version = 31;
    config.window_visual = PIGLIT_GL_VISUAL_DOUBLE | PIGLIT_GL_VISUAL_RGBA;

PIGLIT_GL_TEST_CONFIG_END

  3. piglit_init(). This is the function that is going to be called to configure the test itself. Some tests implement all their code inside piglit_init() because they don't need to draw anything (or don't need to update the drawing frame by frame). In any case, you usually put the following code here:
    • Check for needed extensions.
    • Check for limits or maximum values of different variables like GL_MAX_SHADER_STORAGE_BLOCKS, GL_UNIFORM_BLOCK_SIZE, etc.
    • Setup constant data, upload it.
    • All the initialization setup you need for drawing commands: compile and link shaders, set clear color, etc.

    void
    piglit_init(int argc, char **argv)
    {
        bool pass = true;
    
        /* piglit_require_extension("GL_ARB_shader_storage_buffer_object"); */
    
        prog = piglit_build_simple_program(vs_pass_thru_text, fs_source);
    
        glUseProgram(prog);
    
        glClearColor(0, 0, 0, 0);
    
        /* <-- OpenGL commands to be done --> */
    
        glViewport(0, 0, piglit_width, piglit_height);
    
        /* piglit_draw_* commands can go into piglit_display() too */
        piglit_draw_rect(-1, -1, 2, 2);
    
        if (!piglit_check_gl_error(GL_NO_ERROR))
           pass = false;
    
        piglit_report_result(pass ? PIGLIT_PASS : PIGLIT_FAIL);
    }

  4. piglit_display(). This is the function that is going to be executed periodically to update each frame of the rendered window. In some tests you will find it almost empty (it just returns PIGLIT_FAIL) because it is not needed by the test program.

    enum piglit_result piglit_display(void)
    {
        /* <-- OpenGL drawing commands, if needed --> */
    
        /* UNREACHED */
        return PIGLIT_FAIL;
    }

    Notice that you are free to add any helper functions you need, like in any other C program, but the aforementioned parts are required by piglit.

    Piglit API

    Piglit's API provides a lot of functions to be used by the test program. They are usually often-used helpers that substitute one or several OpenGL function calls and the code that accompanies them.

    The available functions are listed in the piglit-util-gl.h file, which must be included in every binary test's source code.

    /*
     * Copyright (c) The Piglit project 2007
     *
     * Permission is hereby granted, free of charge, to any person obtaining a
     * copy of this software and associated documentation files (the "Software"),
     * to deal in the Software without restriction, including without limitation
     * on the rights to use, copy, modify, merge, publish, distribute, sub
     * license, and/or sell copies of the Software, and to permit persons to whom
     * the Software is furnished to do so, subject to the following conditions:
     *
     * The above copyright notice and this permission notice (including the next
     * paragraph) shall be included in all copies or substantial portions of the
     * Software.
     *
     * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
     * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
     * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.  IN NO EVENT SHALL
     * VA LINUX SYSTEM, IBM AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
     * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
     * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
     * USE OR OTHER DEALINGS IN THE SOFTWARE.
     */
    
    #pragma once
    #ifndef __PIGLIT_UTIL_GL_H__
    #define __PIGLIT_UTIL_GL_H__
    
    #ifdef __cplusplus
    extern "C" {
    #endif
    
    #include "piglit-util.h"
    
    #include <piglit/gl_wrap.h>
    #include <piglit/glut_wrap.h>
    
    #define piglit_get_proc_address(x) piglit_dispatch_resolve_function(x)
    
    #include "piglit-framework-gl.h"
    #include "piglit-shader.h"
    
    extern const uint8_t fdo_bitmap[];
    extern const unsigned int fdo_bitmap_width;
    extern const unsigned int fdo_bitmap_height;
    
    extern bool piglit_is_core_profile;
    
    /**
     * Determine if the API is OpenGL ES.
     */
    bool piglit_is_gles(void);
    
    /**
     * \brief Get version of OpenGL or OpenGL ES API.
     *
     * Returned version is multiplied by 10 to make it an integer.  So for
     * example, if the GL version is 2.1, the return value is 21.
     */
    int piglit_get_gl_version(void);
    
    /**
     * \precondition name is not null
     */
    bool piglit_is_extension_supported(const char *name);
    
    /**
     * reinitialize the supported extension List.
     */
    void piglit_gl_reinitialize_extensions();
    
    /**
     * \brief Convert a GL error to a string.
     *
     * For example, given GL_INVALID_ENUM, return "GL_INVALID_ENUM".
     *
     * Return "(unrecognized error)" if the enum is not recognized.
     */
    const char* piglit_get_gl_error_name(GLenum error);
    
    /**
     * \brief Convert a GL enum to a string.
     *
     * For example, given GL_INVALID_ENUM, return "GL_INVALID_ENUM".
     *
     * Return "(unrecognized enum)" if the enum is not recognized.
     */
    const char *piglit_get_gl_enum_name(GLenum param);
    
    /**
     * \brief Convert a string to a GL enum.
     *
     * For example, given "GL_INVALID_ENUM", return GL_INVALID_ENUM.
     *
     * abort() if the string is not recognized.
     */
    GLenum piglit_get_gl_enum_from_name(const char *name);
    
    /**
     * \brief Convert a GL primitive type enum value to a string.
     *
     * For example, given GL_POLYGON, return "GL_POLYGON".
     * We don't use piglit_get_gl_enum_name() for this because there are
     * other enums which alias the prim type enums (ex: GL_POINTS = GL_NONE);
     *
     * Return "(unrecognized enum)" if the enum is not recognized.
     */
    const char *piglit_get_prim_name(GLenum prim);
    
    
    /**
     * \brief Check for unexpected GL errors.
     *
     * If glGetError() returns an error other than \c expected_error, then
     * print a diagnostic and return GL_FALSE.  Otherwise return GL_TRUE.
     */
    GLboolean
    piglit_check_gl_error_(GLenum expected_error, const char *file, unsigned line);
    
    #define piglit_check_gl_error(expected) \
     piglit_check_gl_error_((expected), __FILE__, __LINE__)
    
    /**
     * \brief Drain all GL errors.
     *
     * Repeatly call glGetError and discard errors until it returns GL_NO_ERROR.
     */
    void piglit_reset_gl_error(void);
    
    void piglit_require_gl_version(int required_version_times_10);
    void piglit_require_extension(const char *name);
    void piglit_require_not_extension(const char *name);
    unsigned piglit_num_components(GLenum base_format);
    bool piglit_get_luminance_intensity_bits(GLenum internalformat, int *bits);
    int piglit_probe_pixel_rgb_silent(int x, int y, const float* expected, float *out_probe);
    int piglit_probe_pixel_rgba_silent(int x, int y, const float* expected, float *out_probe);
    int piglit_probe_pixel_rgb(int x, int y, const float* expected);
    int piglit_probe_pixel_rgba(int x, int y, const float* expected);
    int piglit_probe_rect_r_ubyte(int x, int y, int w, int h, GLubyte expected);
    int piglit_probe_rect_rgb(int x, int y, int w, int h, const float* expected);
    int piglit_probe_rect_rgb_silent(int x, int y, int w, int h, const float *expected);
    int piglit_probe_rect_rgba(int x, int y, int w, int h, const float* expected);
    int piglit_probe_rect_rgba_int(int x, int y, int w, int h, const int* expected);
    int piglit_probe_rect_rgba_uint(int x, int y, int w, int h, const unsigned int* expected);
    void piglit_compute_probe_tolerance(GLenum format, float *tolerance);
    int piglit_compare_images_color(int x, int y, int w, int h, int num_components,
                    const float *tolerance,
                    const float *expected_image,
                    const float *observed_image);
    int piglit_probe_image_color(int x, int y, int w, int h, GLenum format, const float *image);
    int piglit_probe_image_rgb(int x, int y, int w, int h, const float *image);
    int piglit_probe_image_rgba(int x, int y, int w, int h, const float *image);
    int piglit_compare_images_ubyte(int x, int y, int w, int h,
                    const GLubyte *expected_image,
                    const GLubyte *observed_image);
    int piglit_probe_image_stencil(int x, int y, int w, int h, const GLubyte *image);
    int piglit_probe_image_ubyte(int x, int y, int w, int h, GLenum format,
                     const GLubyte *image);
    int piglit_probe_texel_rect_rgb(int target, int level, int x, int y,
                    int w, int h, const float *expected);
    int piglit_probe_texel_rgb(int target, int level, int x, int y,
                   const float* expected);
    int piglit_probe_texel_rect_rgba(int target, int level, int x, int y,
                     int w, int h, const float *expected);
    int piglit_probe_texel_rgba(int target, int level, int x, int y,
                    const float* expected);
    int piglit_probe_texel_volume_rgba(int target, int level, int x, int y, int z,
                     int w, int h, int d, const float *expected);
    int piglit_probe_pixel_depth(int x, int y, float expected);
    int piglit_probe_rect_depth(int x, int y, int w, int h, float expected);
    int piglit_probe_pixel_stencil(int x, int y, unsigned expected);
    int piglit_probe_rect_stencil(int x, int y, int w, int h, unsigned expected);
    int piglit_probe_rect_halves_equal_rgba(int x, int y, int w, int h);
    
    bool piglit_probe_buffer(GLuint buf, GLenum target, const char *label,
                 unsigned n, unsigned num_components,
                 const float *expected);
    
    int piglit_use_fragment_program(void);
    int piglit_use_vertex_program(void);
    void piglit_require_fragment_program(void);
    void piglit_require_vertex_program(void);
    GLuint piglit_compile_program(GLenum target, const char* text);
    GLvoid piglit_draw_triangle(float x1, float y1, float x2, float y2,
                    float x3, float y3);
    GLvoid piglit_draw_triangle_z(float z, float x1, float y1, float x2, float y2,
                      float x3, float y3);
    GLvoid piglit_draw_rect_custom(float x, float y, float w, float h,
                       bool use_patches);
    GLvoid piglit_draw_rect(float x, float y, float w, float h);
    GLvoid piglit_draw_rect_z(float z, float x, float y, float w, float h);
    GLvoid piglit_draw_rect_tex(float x, float y, float w, float h,
                                float tx, float ty, float tw, float th);
    GLvoid piglit_draw_rect_back(float x, float y, float w, float h);
    void piglit_draw_rect_from_arrays(const void *verts, const void *tex,
                      bool use_patches);
    
    unsigned short piglit_half_from_float(float val);
    
    void piglit_escape_exit_key(unsigned char key, int x, int y);
    
    void piglit_gen_ortho_projection(double left, double right, double bottom,
                     double top, double near_val, double far_val,
                     GLboolean push);
    void piglit_ortho_projection(int w, int h, GLboolean push);
    void piglit_frustum_projection(GLboolean push, double l, double r, double b,
                       double t, double n, double f);
    void piglit_gen_ortho_uniform(GLint location, double left, double right,
                      double bottom, double top, double near_val,
                      double far_val);
    void piglit_ortho_uniform(GLint location, int w, int h);
    
    GLuint piglit_checkerboard_texture(GLuint tex, unsigned level,
        unsigned width, unsigned height,
        unsigned horiz_square_size, unsigned vert_square_size,
        const float *black, const float *white);
    GLuint piglit_miptree_texture(void);
    GLfloat *piglit_rgbw_image(GLenum internalFormat, int w, int h,
                               GLboolean alpha, GLenum basetype);
    GLubyte *piglit_rgbw_image_ubyte(int w, int h, GLboolean alpha);
    GLuint piglit_rgbw_texture(GLenum internalFormat, int w, int h, GLboolean mip,
                GLboolean alpha, GLenum basetype);
    GLuint piglit_depth_texture(GLenum target, GLenum format, int w, int h, int d, GLboolean mip);
    GLuint piglit_array_texture(GLenum target, GLenum format, int w, int h, int d, GLboolean mip);
    GLuint piglit_multisample_texture(GLenum target, GLenum tex,
                      GLenum internalFormat,
                      unsigned width, unsigned height,
                      unsigned depth, unsigned samples,
                      GLenum format, GLenum type, void *data);
    extern float piglit_tolerance[4];
    void piglit_set_tolerance_for_bits(int rbits, int gbits, int bbits, int abits);
    extern void piglit_require_transform_feedback(void);
    
    bool
    piglit_get_compressed_block_size(GLenum format,
                     unsigned *bw, unsigned *bh, unsigned *bytes);
    
    unsigned
    piglit_compressed_image_size(GLenum format, unsigned width, unsigned height);
    
    unsigned
    piglit_compressed_pixel_offset(GLenum format, unsigned width,
                       unsigned x, unsigned y);
    
    void
    piglit_visualize_image(float *img, GLenum base_internal_format,
                   int image_width, int image_height,
                   int image_count, bool rhs);
    
    float piglit_srgb_to_linear(float x);
    float piglit_linear_to_srgb(float x);
    
    extern GLfloat cube_face_texcoords[6][4][3];
    extern const char *cube_face_names[6];
    extern const GLenum cube_face_targets[6];
    
    /**
     * Common vertex program code to perform a model-view-project matrix transform
     */
    #define PIGLIT_VERTEX_PROGRAM_MVP_TRANSFORM        \
        "ATTRIB iPos = vertex.position;\n"      \
        "OUTPUT oPos = result.position;\n"      \
        "PARAM  mvp[4] = { state.matrix.mvp };\n"   \
        "DP4    oPos.x, mvp[0], iPos;\n"        \
        "DP4    oPos.y, mvp[1], iPos;\n"        \
        "DP4    oPos.z, mvp[2], iPos;\n"        \
        "DP4    oPos.w, mvp[3], iPos;\n"
    
    /**
     * Handle to a generic fragment program that passes the input color to output
     *
     * \note
     * Either \c piglit_use_fragment_program or \c piglit_require_fragment_program
     * must be called before using this program handle.
     */
    extern GLint piglit_ARBfp_pass_through;
    
    static const GLint PIGLIT_ATTRIB_POS = 0;
    static const GLint PIGLIT_ATTRIB_TEX = 1;
    
    /**
     * Given a GLSL version number, return the lowest-numbered GL version
     * that is guaranteed to support it.
     */
    unsigned
    required_gl_version_from_glsl_version(unsigned glsl_version);
    
    
    #ifdef __cplusplus
    } /* end extern "C" */
    #endif
    
    #endif /* __PIGLIT_UTIL_GL_H__ */

    Most functions are undocumented, although there are a lot of examples of how to use them in other piglit tests. Furthermore, once you know which function you need, it is usually straightforward to learn how to call it.

    Just to mention a few: you can request extensions (piglit_require_extension()) or a GL version (piglit_require_gl_version()), compile a program (piglit_compile_program()), draw a rectangle or triangle, read pixel values and compare them with a list of expected values, check for GL errors, etc.

    piglit-shader.h has all the shader-related functions: compile a shader, link a simple program, build (i.e. compile and link) a simple program with vertex and fragment shaders, require a specific GLSL version, etc.

    /*
     * Copyright (c) The Piglit project 2007
     *
     * Permission is hereby granted, free of charge, to any person obtaining a
     * copy of this software and associated documentation files (the "Software"),
     * to deal in the Software without restriction, including without limitation
     * on the rights to use, copy, modify, merge, publish, distribute, sub
     * license, and/or sell copies of the Software, and to permit persons to whom
     * the Software is furnished to do so, subject to the following conditions:
     *
     * The above copyright notice and this permission notice (including the next
     * paragraph) shall be included in all copies or substantial portions of the
     * Software.
     *
     * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
     * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
     * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.  IN NO EVENT SHALL
     * VA LINUX SYSTEM, IBM AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
     * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
     * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
     * USE OR OTHER DEALINGS IN THE SOFTWARE.
     */
    
    #pragma once
    
    /**
     * Null parameters are ignored.
     *
     * \param es Is it GLSL ES?
     */
    void piglit_get_glsl_version(bool *es, int* major, int* minor);
    
    GLuint piglit_compile_shader(GLenum target, const char *filename);
    GLuint piglit_compile_shader_text_nothrow(GLenum target, const char *text);
    GLuint piglit_compile_shader_text(GLenum target, const char *text);
    GLboolean piglit_link_check_status(GLint prog);
    GLboolean piglit_link_check_status_quiet(GLint prog);
    GLint piglit_link_simple_program(GLint vs, GLint fs);
    GLint piglit_build_simple_program(const char *vs_source, const char *fs_source);
    GLuint piglit_build_simple_program_unlinked(const char *vs_source,
                            const char *fs_source);
    GLint piglit_link_simple_program_multiple_shaders(GLint shader1, ...);
    GLint piglit_build_simple_program_unlinked_multiple_shaders_v(GLenum target1,
                                     const char*source1,
                                     va_list ap);
    GLint piglit_build_simple_program_unlinked_multiple_shaders(GLenum target1,
                                   const char *source1,
                                   ...);
    GLint piglit_build_simple_program_multiple_shaders(GLenum target1,
                              const char *source1,
                              ...);
    
    extern GLboolean piglit_program_pipeline_check_status(GLuint pipeline);
    extern GLboolean piglit_program_pipeline_check_status_quiet(GLuint pipeline);
    
    /**
     * Require a specific version of GLSL.
     *
     * \param version Integer version, for example 130
     */
    extern void piglit_require_GLSL_version(int version);
    /** Require any version of GLSL */
    extern void piglit_require_GLSL(void);
    extern void piglit_require_fragment_shader(void);
    extern void piglit_require_vertex_shader(void);

    There are more header files such as piglit-glx-util.h, piglit-matrix.h, piglit-util-egl.h, etc.

    Usually you only need to include piglit-util-gl.h in your source code; however, I recommend browsing through tests/util/ to find out all the available functions that piglit provides.

    Example

    A complete example of what a piglit binary test looks like is the ARB_uniform_buffer_object rendering test.

    /*
     * Copyright (c) 2014 VMware, Inc.
     *
     * Permission is hereby granted, free of charge, to any person obtaining a
     * copy of this software and associated documentation files (the "Software"),
     * to deal in the Software without restriction, including without limitation
     * the rights to use, copy, modify, merge, publish, distribute, sublicense,
     * and/or sell copies of the Software, and to permit persons to whom the
     * Software is furnished to do so, subject to the following conditions:
     *
     * The above copyright notice and this permission notice (including the next
     * paragraph) shall be included in all copies or substantial portions of the
     * Software.
     *
     * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
     * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
     * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
     * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
     * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
     * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
     * DEALINGS IN THE SOFTWARE.
     */
    
    /** @file rendering.c
     *
     * Test rendering with UBOs.  We draw four squares with different positions,
     * sizes, rotations and colors where those parameters come from UBOs.
     */
    
    #include "piglit-util-gl.h"
    
    PIGLIT_GL_TEST_CONFIG_BEGIN
    
        config.supports_gl_compat_version = 20;
        config.window_visual = PIGLIT_GL_VISUAL_DOUBLE | PIGLIT_GL_VISUAL_RGBA;
    
    PIGLIT_GL_TEST_CONFIG_END
    
    static const char vert_shader_text[] =
        "#extension GL_ARB_uniform_buffer_object : require\n"
        "\n"
        "layout(std140) uniform;\n"
        "uniform ub_pos_size { vec2 pos; float size; };\n"
        "uniform ub_rot {float rotation; };\n"
        "\n"
        "void main()\n"
        "{\n"
        "   mat2 m;\n"
        "   m[0][0] = m[1][1] = cos(rotation); \n"
        "   m[0][1] = sin(rotation); \n"
        "   m[1][0] = -m[0][1]; \n"
        "   gl_Position.xy = m * gl_Vertex.xy * vec2(size) + pos;\n"
        "   gl_Position.zw = vec2(0, 1);\n"
        "}\n";
    
    static const char frag_shader_text[] =
        "#extension GL_ARB_uniform_buffer_object : require\n"
        "\n"
        "layout(std140) uniform;\n"
        "uniform ub_color { vec4 color; float color_scale; };\n"
        "\n"
        "void main()\n"
        "{\n"
        "   gl_FragColor = color * color_scale;\n"
        "}\n";
    
    #define NUM_SQUARES 4
    #define NUM_UBOS 3
    
    /* Square positions and sizes */
    static const float pos_size[NUM_SQUARES][3] = {
        { -0.5, -0.5, 0.1 },
        {  0.5, -0.5, 0.2 },
        { -0.5, 0.5, 0.3 },
        {  0.5, 0.5, 0.4 }
    };
    
    /* Square color and color_scales */
    static const float color[NUM_SQUARES][8] = {
        { 2.0, 0.0, 0.0, 1.0,   0.50, 0.0, 0.0, 0.0 },
        { 0.0, 4.0, 0.0, 1.0,   0.25, 0.0, 0.0, 0.0 },
        { 0.0, 0.0, 5.0, 1.0,   0.20, 0.0, 0.0, 0.0 },
        { 0.2, 0.2, 0.2, 0.2,   5.00, 0.0, 0.0, 0.0 }
    };
    
    /* Square rotations */
    static const float rotation[NUM_SQUARES] = {
        0.0,
        0.1,
        0.2,
        0.3
    };
    
    static GLuint prog;
    static GLuint buffers[NUM_UBOS];
    static GLint alignment;
    static bool test_buffer_offset = false;
    
    
    static void
    setup_ubos(void)
    {
        static const char *names[NUM_UBOS] = {
            "ub_pos_size",
            "ub_color",
            "ub_rot"
        };
        static GLubyte zeros[1000] = {0};
        int i;
    
        glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &alignment);
        printf("GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT = %d\n", alignment);
    
        if (test_buffer_offset) {
            printf("Testing buffer offset %d\n", alignment);
        }
        else {
            /* we use alignment as the offset */
            alignment = 0;
        }
    
        glGenBuffers(NUM_UBOS, buffers);
    
        for (i = 0; i < NUM_UBOS; i++) {
            GLint index, size;
    
            /* query UBO index */
            index = glGetUniformBlockIndex(prog, names[i]);
    
            /* query UBO size */
            glGetActiveUniformBlockiv(prog, index,
                          GL_UNIFORM_BLOCK_DATA_SIZE, &size);
    
            printf("UBO %s: index = %d, size = %d\n",
                   names[i], index, size);
    
            /* Allocate UBO */
            /* XXX for some reason, this test doesn't work at all with
             * nvidia if we pass NULL instead of zeros here.  The UBO data
             * is set/overwritten in the piglit_display() function so this
             * really shouldn't matter.
             */
            glBindBuffer(GL_UNIFORM_BUFFER, buffers[i]);
            glBufferData(GL_UNIFORM_BUFFER, size + alignment,
                                 zeros, GL_DYNAMIC_DRAW);
    
            /* Attach UBO */
            glBindBufferRange(GL_UNIFORM_BUFFER, i, buffers[i],
                      alignment,  /* offset */
                      size);
            glUniformBlockBinding(prog, index, i);
    
            if (!piglit_check_gl_error(GL_NO_ERROR))
                piglit_report_result(PIGLIT_FAIL);
        }
    }
    
    
    void
    piglit_init(int argc, char **argv)
    {
        piglit_require_extension("GL_ARB_uniform_buffer_object");
    
        if (argc > 1 && strcmp(argv[1], "offset") == 0) {
            test_buffer_offset = true;
        }
    
        prog = piglit_build_simple_program(vert_shader_text, frag_shader_text);
        assert(prog);
        glUseProgram(prog);
    
        setup_ubos();
    
        glClearColor(0.2, 0.2, 0.2, 0.2);
    }
    
    
    static bool
    probe(int x, int y, int color_index)
    {
        float expected[4];
    
        /* mul color by color_scale */
        expected[0] = color[color_index][0] * color[color_index][4];
        expected[1] = color[color_index][1] * color[color_index][4];
        expected[2] = color[color_index][2] * color[color_index][4];
        expected[3] = color[color_index][3] * color[color_index][4];
    
        return piglit_probe_pixel_rgba(x, y, expected);
    }
    
    
    enum piglit_result
    piglit_display(void)
    {
        bool pass = true;
        int x0 = piglit_width / 4;
        int x1 = piglit_width * 3 / 4;
        int y0 = piglit_height / 4;
        int y1 = piglit_height * 3 / 4;
        int i;
    
        glViewport(0, 0, piglit_width, piglit_height);
    
        glClear(GL_COLOR_BUFFER_BIT);
    
        for (i = 0; i < NUM_SQUARES; i++) {
            /* Load UBO data, at offset=alignment */
            glBindBuffer(GL_UNIFORM_BUFFER, buffers[0]);
            glBufferSubData(GL_UNIFORM_BUFFER, alignment, sizeof(pos_size[0]),
                    pos_size[i]);
            glBindBuffer(GL_UNIFORM_BUFFER, buffers[1]);
            glBufferSubData(GL_UNIFORM_BUFFER, alignment, sizeof(color[0]),
                    color[i]);
            glBindBuffer(GL_UNIFORM_BUFFER, buffers[2]);
            glBufferSubData(GL_UNIFORM_BUFFER, alignment, sizeof(rotation[0]),
                    &rotation[i]);
    
            if (!piglit_check_gl_error(GL_NO_ERROR))
                return PIGLIT_FAIL;
    
            piglit_draw_rect(-1, -1, 2, 2);
        }
    
        pass = probe(x0, y0, 0) && pass;
        pass = probe(x1, y0, 1) && pass;
        pass = probe(x0, y1, 2) && pass;
        pass = probe(x1, y1, 3) && pass;
    
        piglit_present_results();
    
        return pass ? PIGLIT_PASS : PIGLIT_FAIL;
    }

The source starts with the license header and a comment describing what the test does. Following those is the config setup for piglit: it requests a double buffer, an RGBA pixel format and GL compat version 2.0.

Furthermore, this test defines GLSL shaders, the global data and helper functions like any other OpenGL C program. Notice that setup_ubos() includes calls to the OpenGL API but also calls to piglit_check_gl_error() and piglit_report_result(), which are used to check for any OpenGL error and to tell piglit that there was a failure, respectively.

Following the structure introduced before, piglit_init() indicates that the ARB_uniform_buffer_object extension is required, builds the program with the aforementioned shaders, sets up the uniform buffer objects and sets the clear color.

Finally, piglit_display() is where the relevant content is placed. Among other things, it loads the UBOs' data, draws a rectangle and checks that the rendered pixels have the expected values. Depending on the result of that check, it will report to piglit that the test was a success or a failure.

    How to add a binary test to piglit

    How to build it

    Now that you have written the source code of your test, it’s time to learn how to build it. As we have seen in an earlier post, piglit uses cmake for generating binaries.

    First you need to check if cmake tracks the directory where your source code is. If you create a new extension subdirectory under tests/spec, modify this CMakeLists.txt file and add yours.

    Once you have done that, cmake needs to know that there are binaries to build and where to get its source code. For doing that, create a CMakeLists.txt file in your test’s directory with the following content:

    piglit_include_target_api()

If you have binaries inside subdirectories of that one, you can add them to the same CMakeLists.txt file you are creating:

    add_subdirectory (compiler)
    add_subdirectory (execution)
    add_subdirectory (linker)
    piglit_include_target_api()

But this is not enough to compile your test, as we have not specified which binary it is and where to get its source code. We do that by creating a new file, CMakeLists.gl.txt, with content similar to this one:

    include_directories(
        ${GLEXT_INCLUDE_DIR}
        ${OPENGL_INCLUDE_PATH}
    )
    
    link_libraries (
        piglitutil_${piglit_target_api}
        ${OPENGL_gl_LIBRARY}
    )
    
    piglit_add_executable (arb_shader_storage_buffer_object-minmax minmax.c)
    
    # vim: ft=cmake:

As you see, first we declare where to find the needed headers and libraries. Then we define the binary name (arb_shader_storage_buffer_object-minmax) and its source code file (minmax.c).

And that’s it. If everything is fine, next time you run cmake . && make in piglit’s directory, piglit will build the test and place it inside the bin/ directory.

    Example in piglit repository

Let’s review a real example of how a test was added for a new extension in piglit (this commit). As you see, it added the tests/spec/arb_robustness/ subdirectory to tests/spec/CMakeLists.txt, created tests/spec/arb_robustness/CMakeLists.txt to tell cmake to track this directory, added tests/spec/arb_robustness/CMakeLists.gl.txt to compile the binary test, and added its source code file tests/spec/arb_robustness/draw-vbo-bounds.c.

    tests/spec/CMakeLists.txt | 1 +
     tests/spec/arb_robustness/CMakeLists.gl.txt | 19 +++++++++++++++++++
     tests/spec/arb_robustness/CMakeLists.txt | 1 +
     tests/spec/arb_robustness/draw-vbo-bounds.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     4 files changed, 226 insertions(+)

If you run the git log tests/spec/<dir> command in other directories, you will find similar commits.

    How to run it with all.py profile

Once you have successfully built the binary test program you can run it standalone. However, it’s better to add it to the tests/all.py profile so that it is executed automatically in a full piglit run.

Open tests/all.py and look for your extension name, then add your test by looking at how other tests are defined and adapting it to your case. In case you are developing tests for a new extension, you have to add more lines to tests/all.py. For example, this is how it was done for the ARB_shader_storage_buffer_object extension:

    with profile.group_manager(
            PiglitGLTest,
            grouptools.join('spec', 'arb_shader_storage_buffer_object')) as g:
        g(['arb_shader_storage_buffer_object-minmax'], 'minmax')

These lines do several things: they indicate under which extension you will find the results, which binaries to run, and which short name will appear in the summaries for each binary.

ARB_shader_storage_buffer_object’s minmax test result in HTML summary

    Conclusions

This post and the last one were about how to create your own piglit tests. I hope you find them useful and I will be glad to see new contributors because of them :-)

The next post will cover how to send your contributions to piglit and will include a table of contents so you can easily jump to any post of this series.

    by Samuel Iglesias at June 11, 2015 02:02 PM

    June 02, 2015

    Manuel Rego

    Grid and the City

I’m really glad to announce that my talk “CSS Grid Layout is just around the corner” has been accepted at CSSConf US 2015 (18-19 June). Thanks to the organizers for selecting my proposal, it’s a pleasure to be among all these great speakers. BTW, if you haven’t grabbed your ticket yet, you can use the following promo code when checking out to save some money: MR200

    I’m part of the Igalia Web Platform team, and I’m currently working on the implementation of the CSS Grid Layout W3C spec on Blink and WebKit. So, I’m kind of an “exotic” profile in a conference like CSSConf, as I’m not working on frontend. However, I’ll try to bring the implementor perspective to the table, explaining some internals about how grid works. I’ll also introduce the basic syntax to be able to start playing with it.

My talk abstract from CSSConf website

CSSConf this year is happening in New York City and the venue, Caroline’s on Broadway, is in the heart of Manhattan. So, I’ll take the opportunity to pay a visit to our friends at Bloomberg, with whom we collaborate in the development of CSS Grid Layout. In addition, BrooklynJS is organized on the evening of June 18th, and as part of the ticket for CSSConf, we’ll have the chance to attend this event too.

From the personal side, this will be my first time in NYC, exciting times ahead! Feel free to ping me if you want to talk about grid, the web, Igalia or simply do some sightseeing, as I’ll be arriving on the night of June 15th.

Igalia and Bloomberg working together to build a better web

    As you might guess, I’m very excited about this crazy week, full of events and new experiences. I’m sure I’ll meet lots of great people and I’ll do my best to convince the world about the goodness of grid and make them feel how awesome it is. Exciting times ahead!

    June 02, 2015 10:00 PM

    June 01, 2015

    Javier Fernández

    Distributing tracks along Grid Layout container

In my last post I introduced the concept of Content Distribution alignment and how it affects Grid Layout implementation. At that time, it was possible to use all the <content-position> values to select the grid tracks’ position inside a grid container, moving them across the available space. However, it wasn’t until recently that users could distribute grid tracks along such available space, literally adding gaps in between or even stretching them.

In this post I’ll describe how each <content-distribution> value affects tracks in a Grid Layout, their position and size, using different grid structures (e.g. number of tracks, span).

    Let’s start analyzing the new Content Distribution alignment syntax defined in the CSS Box Alignment specification:

    auto | <baseline-position> | <content-distribution> || [ <overflow-position>? && <content-position> ]

In case a <content-distribution> value can’t be applied, its associated fallback value should be used instead. However, this CSS syntax allows users to specify a preferred fallback value:

    If both a <content-distribution> and <content-position> are given, the <content-position> provides an explicit fallback alignment.

    Before going into each value, I think it’s a good idea to refresh the concepts of alignment container and alignment subject and how they apply in the context of Grid Layout:

    The alignment container is the grid container’s content box. The alignment subjects are the grid tracks.

    The different <content-distribution> values that can be used for align-content and justify-content CSS properties are defined as follows:

• space-between: The alignment subjects are evenly distributed in the alignment container. Default fallback: start.
• space-around: The alignment subjects are evenly distributed in the alignment container, with a half-size space on either end. Default fallback: center.
• space-evenly: The alignment subjects are evenly distributed in the alignment container, with a full-size space on either end. Default fallback: center.
• stretch: Any auto-sized alignment subjects have their size increased equally (not proportionally) so that the combined size exactly fills the alignment container. Default fallback: start.

The picture below describes how these values behave depending on the number of grid tracks; for simplicity I only use the justify-content property, so tracks are distributed along the inline (row) axis. In the next examples we will see how both properties work together using more complex grid definitions.

    content-distribution-1

    Effect of different Content Distribution values on Grid Layout. Click on the Image to evaluate the behavior when using different number of tracks.

The previous examples were defined with grid items filling grid areas of just 1×1 tracks, which makes distribution pretty simple and easy to predict. But thanks to the flexibility of the Grid Layout syntax we can define irregular grids, for instance using the grid-template-areas property, as in the next example.

    align-content-and span-4

Basic example of how to apply the different values and their effect on an irregular grid design.

Since Content Distribution alignment considers grid tracks as the alignment subject, distributing tracks along the available space may have the consequence of modifying the dimensions of grid areas defined by more than one track. The following picture shows the result of the code above and provides an excellent example of how powerful the Content Alignment effect on a Grid Layout is.

    These use cases can be obtained from Igalia’s Grid Layout examples repository, so anybody can play with different grid designs and alignment values combinations. They are also available at our codepen repository.

    Grid Layout behind the scene

Now I’d like to explain a bit of what I had to implement in the browsers’ core to get these new features done; just some small pieces of source code, the ones I considered most illustrative, to give an idea of what implementing new behavior in browsers implies.

As you might already know from my previous posts, the CSS Box Alignment specification was born to generalize Flexbox’s alignment behavior so that it can be used for grid and even regular blocks. Several new properties were added, like justify-items and justify-self, and the CSS syntax has changed considerably. Especially noteworthy is how the Content Distribution alignment properties have changed from their initial Flexbox definition. They now support complex values like ‘space-between true’, ‘space-around start’, or even ‘stretch center safe’. This makes it possible to express more information than with the previous simple keyword form, although it requires new CSS parsing logic in browsers.

    More complex CSS parsing

Since both the align-content and justify-content properties accept multiple optional keywords, I needed to completely re-implement their parsing logic. I’m happy to announce that it recently landed in WebKit’s trunk too, so now both web engines support the new CSS syntax.

Due to the complex values defined for these CSS properties, a new CSSValue-derived class, named CSSContentDistributionValue, was defined to hold all the Content Alignment data. This data is then converted to something meaningful for the style logic using the StyleBuilderConverter class. This is the preferred method in both the WebKit and Blink engines, and it just needs to be declared in the CSSPropertyNames.in and CSSProperties.in template files, respectively.

    align-content initial=initialContentAlignment, converter=convertContentAlignmentData
    justify-content initial=initialContentAlignment, converter=convertContentAlignmentData

The StyleBuilderConverter logic is pretty simple thanks to these two new data structures, as can be seen in the following excerpt of source code:

StyleContentAlignmentData StyleBuilderConverter::convertContentAlignmentData(StyleResolverState&, CSSValue* value)
{
    StyleContentAlignmentData alignmentData = ComputedStyle::initialContentAlignment();
    CSSContentDistributionValue* contentValue = toCSSContentDistributionValue(value);
    if (contentValue->distribution()->getValueID() != CSSValueInvalid)
        alignmentData.setDistribution(*contentValue->distribution());
    if (contentValue->position()->getValueID() != CSSValueInvalid)
        alignmentData.setPosition(*contentValue->position());
    if (contentValue->overflow()->getValueID() != CSSValueInvalid)
        alignmentData.setOverflow(*contentValue->overflow());
    return alignmentData;
}

The StyleContentAlignmentData class was defined to simplify how we manage these complex values, so that we can handle these properties as if they had an atomic value. This approach allows a more efficient and robust way of detecting and managing style changes in these properties.

    New Layout operations

Once this new CSS syntax is correctly parsed and a LayoutStyle instance is generated according to the user-defined CSS style rules, I needed to modify Flexbox’s layout code to adapt it to the new data structures, ensuring browser backward compatibility and passing all the Layout and Unit tests. I implemented this logic from scratch for Grid Layout, so I had the opportunity to introduce several performance optimizations to avoid unnecessary layouts and repaints. This area is pretty interesting and I’ll talk about it soon in a new post.

One interesting aspect of Content Distribution alignment is that it might take part in the track sizing algorithm. As explained in my previous post about Self Alignment, the stretch value increases the alignment subject’s size to fill its alignment container’s available space. This is also the case for Content Alignment, but considering tracks as the alignment subject. However, there is another, less obvious case where <content-distribution> values may influence the track sizing resolution, or perhaps better said, the grid area sizing.

Let’s consider this example of a grid where certain areas use more than one track:

    grid-template-areas: "a a b"
                         "c d b"
    grid-auto-columns: 20px;
    grid-auto-rows: 40px;
    width: 150px;
    height: 300px;

The example above defines a grid with 3 column tracks of 20px and 2 row tracks of 40px, which would be laid out as shown in the following diagram:

    content-distribution-spans

Grid Layout with areas filling more than one track. Click on the picture to evaluate the effect of each value on the grid area size.

This has interesting implementation implications because, in certain cases, in order to determine a grid item’s logical height we need its logical width to be resolved first. The track sizing algorithm uses the children’s grid area size to determine the grid cell’s logical height; hence, given that the alignment logic needs track sizes to be already resolved, it may imply a re-layout of the grid items whose size could be affected by the used <content-distribution> value. The following source code shows how I handle this scenario:

LayoutUnit LayoutGrid::gridAreaBreadthForChild(const LayoutBox& child, GridTrackSizingDirection direction, const Vector<GridTrack>& tracks) const
{
    const GridCoordinate& coordinate = cachedGridCoordinate(child);
    const GridSpan& span = (direction == ForColumns) ? coordinate.columns : coordinate.rows;
    const Vector<LayoutUnit>& trackPositions = (direction == ForColumns) ? m_columnPositions : m_rowPositions;
    if (span.resolvedFinalPosition.toInt() < trackPositions.size()) {
        LayoutUnit startOfTrack = trackPositions[span.resolvedInitialPosition.toInt()];
        LayoutUnit endOfTrack = trackPositions[span.resolvedFinalPosition.toInt()];
        return endOfTrack - startOfTrack + tracks[span.resolvedFinalPosition.toInt()].baseSize();
    }
    LayoutUnit gridAreaBreadth = 0;
    for (GridSpan::iterator trackPosition = span.begin(); trackPosition != span.end(); ++trackPosition)
        gridAreaBreadth += tracks[trackPosition.toInt()].baseSize();

    return gridAreaBreadth;
}
The code above will return different results, in the cases mentioned before, depending on whether it’s run during track sizing or after applying the alignment logic. This will likely require a new layout of the whole grid, or at least of the affected grid items, which probably has a negative impact on performance.

    Current status and next steps

I’d like to finish this post with a snapshot of the current situation and the challenges for the next months, as I’ve been regularly doing in my recent posts.

Unlike in previous reports, this time I’ve got good news regarding the reduction of implementation gaps between the two web engines we are focusing our efforts on, WebKit and Blink. The following table describes the current situation:

    alignment-status

    The table above indicates that several milestones were reached since the last report, although there are still some pending issues:

    • I’ve completed the implementation in WebKit of the parsing logic for the new Box Alignment properties: align-items and align-self.
    • As a side effect, I’ve also upgraded the ones already present because of Flexbox to the latest CSS3 Box Alignment specification.
• WebKit now has full support for Default and Self Alignment for Grid Layout, including overflow handling.
• Blink now has full support for Content Distribution alignment, which was missing the <content-distribution> values.
    • WebKit’s Grid Layout implementation still misses support for Content Distribution alignment.
• Baseline Alignment is still missing in both web engines.

In addition to the above-mentioned pending issues, our roadmap includes the following tasks as part of my to-do list for the next months:

• Even though there is support for different writing modes and flow directions, there are still some issues with orthogonal flows. I’ve already got some promising patches, but they still have to be reviewed by Blink and WebKit engineers.
• Optimizations of style and repaint invalidations triggered by changes to the alignment properties. As mentioned before, this is a very interesting topic which I’ll elaborate on further in future posts.
• Performance analysis of relevant Grid Layout use cases, which hopefully will lead to optimization proposals.

    All this work and many other contributions to Grid Layout for WebKit and Blink web engines are the result of the collaboration between Bloomberg and Igalia to implement this W3C specification.

    Igalia & Bloomberg logos

    by jfernandez at June 01, 2015 07:32 PM

    May 29, 2015

    Samuel Iglesias

    piglit (III): How to write GLSL shader tests

In earlier posts I talked about how to install piglit on your system and run it for the first time, and how to tailor your piglit run. I also gave a talk at FOSDEM 2015 about how to test your OpenGL drivers using Free Software, where I explained how to run piglit and dEQP. This post and the next one are going to introduce the topic of creating new tests for piglit, as this was not covered before in this post series.

    Brief introduction

Before starting to talk about how to create a GLSL test, let me explain a couple of things about how piglit organizes its tests.

The first is that all the tests are defined inside the tests/ directory.

• The ones related to a specific spec (like an OpenGL extension, GL version, GLSL version, etc.) are placed inside the tests/spec subdirectory.
• Shader tests are usually defined inside the tests/shaders/ directory, but not always. If they are specific to an OpenGL extension, they will be in its subdirectory (see the first bullet).
    • Tests specific to other APIs: tests/egl, tests/cl, etc.
    • Etc.

    Take your time browsing all these directories and figure out the best place for your tests or, if you are looking for a specific test, where it is likely placed.

The second thing is that binary tests should be defined inside tests/all.py to be executed in a full piglit run. This is not the case for GLSL tests or shader runner tests, as we are going to see in this post.

The most common language to write these tests is C, but there are a couple of alternatives if you are interested in writing GLSL shader tests. Let’s start with a couple of examples of GLSL shader tests, as they are the easiest way to contribute new tests to piglit.

    GLSL compiler test

When creating a GLSL compiler test, most of the time you can avoid writing a C program with all those bells and whistles. Actually, it is pretty straightforward if you just want to check whether a GLSL shader compiles successfully or not according to your expectations.

Usually, the rule of thumb is to keep GLSL tests simple and focused on detecting failures (or successes) in shader compilation. For example, you want to check that defining a shader storage buffer block is only possible if the ARB_shader_storage_buffer_object extension is enabled in the driver.

    #version 120
    #extension GL_ARB_shader_storage_buffer_object: disable
    
    buffer ssbo {
        vec4 a;
    };
    
    void foo(void) {
    }

Once you write the GLSL shader, save it to a file named something like extension-disabled-shader-storage-block.frag. There are several suffixes that you can use depending on the GLSL shader type: .vert for vertex shaders, .geom for geometry shaders, .frag for fragment shaders. Notice that the name itself is a summary of what the test does, so other people can figure out what it does without opening the file.

There is a piglit binary called glslparser that is going to pick up this shader and compile it, checking for errors. This binary needs some parameters to know how to run the test and what the expected result is, so we provide them by adding the following content at the top of the shader:

    // [config]
    // expect_result: fail
    // glsl_version: 1.20
    // require_extensions: GL_ARB_shader_storage_buffer_object
    // [end config]

There we write down that we expect the test to fail, the minimum supported GLSL version to run this test, and the required extensions. In this case we need GLSL version 1.20 and the GL_ARB_shader_storage_buffer_object extension.

    // [config]
    // expect_result: fail
    // glsl_version: 1.20
    // require_extensions: GL_ARB_shader_storage_buffer_object
    // [end config]
    
    #version 120
    #extension GL_ARB_shader_storage_buffer_object: disable
    
    buffer ssbo {
        vec4 a;
    };
    
    void foo(void) {
    }

Once you have it, you should save it in the proper place. The directory will match the name of the extension it is going to test; it is usually saved in a subdirectory called compiler because we are testing GLSL shader compilation. For this example: tests/spec/arb_shader_storage_buffer_object/compiler.

And that’s it. Next time you run piglit with the tests/all.py profile, a script will find this test, parse it and execute glslparser with the provided configuration, checking whether the result of your shader is the expected one or not.

    You can execute it manually by running the following command:

    $ bin/glslparser tests/spec/arb_shader_storage_buffer_object/compiler/extension-disabled-shader-storage-block.frag fail 1.20 GL_ARB_shader_storage_buffer_object

As you can see, the last arguments come from the config we defined in the test file, but here we added them by hand.

It is also possible to link a GLSL shader and look for errors by using check_link with true or false, depending on whether you expect the test to succeed or fail at link time:

    // check_link: true

Nevertheless, it only links one shader. If you want to link more than one shader, I recommend using another type of piglit test (see the next section).

    As you see, you can easily write new GLSL shader tests to verify some bits of a specification: valid/invalid keywords, initialization, special cases explained in the spec, etc.

    GLSL shader linker test

Sometimes compiling/linking only one shader is not enough; there are cases where you want to compile and link different but related shaders: a vertex and a fragment shader, a geometry shader in between them, etc. In the previous example, we saw that the GLSL shader test does not link different shaders (actually there is only one). When there are several GLSL shaders, you need to use another type of piglit test: shader_runner tests.

As usual, you start by writing the GLSL shaders. In some cases, you can substitute one shader with its pass-through counterpart to avoid writing “useless” shaders when there is no user-provided interface dependency.

Let’s start with an example that uses a pass-through vertex shader:

# ARB_gpu_shader5 spec says:
#   "If an implementation supports <N> vertex streams, the individual
#   streams are numbered 0 through <N>-1"
    #
    # This test verifies that a link error occurs if EmitStreamVertex()
    # is called with a stream value which is negative.
    
    [require]
    GLSL >= 1.50
    GL_ARB_gpu_shader5
    
    [vertex shader passthrough]
    
    [geometry shader]
    
    #extension GL_ARB_gpu_shader5 : enable
    
    layout(points) in;
    layout(points, max_vertices=3) out;
    
    void main()
    {
        gl_Position = vec4(1.0, 1.0, 1.0, 1.0);
        EmitStreamVertex(-1);
        EndStreamPrimitive(-1);
    }
    
    [fragment shader]
    
    out vec4 color;
    
    void main()
    {
      color = vec4(0.0, 1.0, 0.0, 1.0);
    }
    
    [test]
    link error

The file starts with a comment describing what the test does and which spec (or part of it) it is checking.

The next step is very similar to the previous example: specify the minimum GL and/or GLSL version required and the extensions needed for the test execution. This information is written in the [require] section.

Then the different GLSL shader definitions start: the vertex shader is pass-through as it has no user-defined interface dependency on the next stage (the geometry shader in this case). After it, the file defines the geometry and fragment shaders and finally the test commands to run. In this case, we expect the test to fail at link time.

However, you are not limited to link checking; there are other commands available to run in the test commands part:

      • Load uniform values
        • uniform <type> <name> <value>

    uniform vec4 color 0.0 0.5 0.0 0.0

      • Draw rectangles
        • draw rect <coordinates>

    draw rect -1 -1 2 2

      • Probe a pixel color value and compare it to an expected value
        • probe {rgb, rgba}

    probe rgb 1 1 0.0 1.0 0.0

      • Probe all pixel values.
        • probe all {rgb,rgba}

    probe all rgba 0.0 1.0 0.0 0.0

    Or even more complex commands:

      • Load data into a vertex array object and render primitives using that vertex array data.

    [vertex data]
    vertex/float/3
     1.0  1.0  1.0
    -1.0  1.0  1.0
    -1.0 -1.0  1.0
     1.0 -1.0  1.0
    
    [test]
    draw arrays GL_TRIANGLE_FAN 0 4

      •  Relative probe pixel color

    relative probe rgb (.5, .5) (0.0, 1.0, 0.0)

      • And much more:

    [fragment program]
    !!ARBfp1.0
    OPTION ARB_fragment_program_shadow;
    TXP result.color, fragment.texcoord[0], texture[0], SHADOWRECT;
    END
    
    [test]
    texture shadowRect 0 (32, 32)
    texparameter Rect depth_mode luminance
    texparameter Rect compare_func greater

    Just check other tests to figure out if they are doing something like what you want to do in your test and copy the interesting bits!

Save the file with a good name that briefly describes what it does (this example’s filename is stream-negative-value.shader_test) and place it in the corresponding subdirectory. For this example, the proper place is under the GL_ARB_gpu_shader5 test directory, in the linker subdirectory, because this is a linker test: tests/spec/arb_gpu_shader5/linker/

In order to execute this test with the tests/all.py profile, you don’t need to do anything else. A script will find this test file and call the shader_runner binary with the filename as a parameter.

    In case you want to run it by yourself, the command is easy:

    $ bin/shader_runner tests/spec/arb_gpu_shader5/linker/stream-negative-value.shader_test

The next post will cover the creation of GL binary tests within the piglit framework. I hope this post and the next one will help you to contribute to piglit!

    by Samuel Iglesias at May 29, 2015 09:49 AM

    May 20, 2015

    Iago Toral

    Bringing ARB_shader_storage_buffer_object to Mesa and i965

    In the last weeks I have been working together with my colleague Samuel on bringing support for ARB_shader_storage_buffer_object, an OpenGL 4.3 feature, to Mesa and the Intel i965 driver, so I figured I would write a bit on what this brings to OpenGL/GLSL users. If you are interested, read on.

    Introducing Shader Storage Buffer Objects

    This extension introduces the concept of shader storage buffer objects (SSBOs), which is a new type of OpenGL buffer. SSBOs allow GL clients to create buffers that shaders can then map to variables (known as buffer variables) via interface blocks. If you are familiar with Uniform Buffer Objects (UBOs), SSBOs are pretty similar but:

    • They are read/write, unlike UBOs, which are read-only.
    • They allow a number of atomic operations on them.
    • They allow an optional unsized array at the bottom of their definitions.

    Since SSBOs are read/write, they create a bidirectional channel of communication between the GPU and CPU spaces: the GL application can set the value of shader variables by writing to a regular OpenGL buffer, but the shader can also update the values stored in that buffer by assigning values to them in the shader code, making the changes visible to the GL application. This is a major difference with UBOs.

    In a parallel environment such as a GPU where we can have multiple shader instances running simultaneously (processing multiple vertices or fragments from a specific rendering call) we should be careful when we use SSBOs. Since all these instances will be simultaneously accessing the same buffer there are implications to consider relative to the order of reads and writes. The spec does not make many guarantees about the order in which these take place, other than ensuring that the order of reads and writes within a specific execution of a shader is preserved. Thus, it is up to the graphics developer to ensure, for example, that each execution of a fragment or vertex shader writes to a different offset into the underlying buffer, or that writes to the same offset always write the same value. Otherwise the results would be undefined, since they would depend on the order in which writes and reads from different instances happen in a particular execution.

The spec also allows the use of memoryBarrier() from shader code and glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) from a GL application to add sync points. These ensure that all memory accesses to buffer variables issued before the barrier are completely executed before moving on.
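As a rough sketch of how a GL application could use such a sync point between a rendering pass that writes to an SSBO and a later pass that reads from it (write_prog and read_prog are just hypothetical program handles for illustration):

/* First pass: the shaders in write_prog write to the SSBO. */
glUseProgram(write_prog);
glDrawArrays(GL_TRIANGLES, 0, 6);

/* Make the SSBO writes issued above visible to shader reads
 * performed after this point. */
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

/* Second pass: the shaders in read_prog read the values written
 * by the first pass. */
glUseProgram(read_prog);
glDrawArrays(GL_TRIANGLES, 0, 6);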

    Another tool for developers to deal with concurrent accesses is atomic operations. The spec introduces a number of new atomic memory functions for use with buffer variables: atomicAdd, atomicMin, atomicMax, atomicAnd, atomicOr, atomicXor, atomicExchange (atomic assignment to a buffer variable), atomicCompSwap (atomic conditional assignment to a buffer variable).

    The optional unsized array at the bottom of an SSBO definition can be used to push a dynamic number of entries to the underlying buffer storage, up to the total size of the buffer allocated by the GL application.

    Using shader storage buffer objects (GLSL)

    Okay, so how do we use SSBOs? We will introduce this through an example: we will use a buffer to record information about the fragments processed by the fragment shader. Specifically, we will group fragments according to their X coordinate (by computing an index from the coordinate using a modulo operation). We will then record how many fragments are assigned to a particular index, the first fragment to be assigned to a given index, the last fragment assigned to a given index, the total number of fragments processed and the complete list of fragments processed.

    To store all this information we will use the SSBO definition below:

    layout(std140, binding=0) buffer SSBOBlock {
       vec4 first[8];     // first fragment coordinates assigned to index
       vec4 last[8];      // last fragment coordinates assigned to index
       int counter[8];    // number of fragments assigned to index
       int total;         // number of fragments processed
       vec4 fragments[];  // coordinates of all fragments processed
    };
    

Notice the use of the keyword buffer to tell the compiler that this is a shader storage buffer object. Also notice that we have included an unsized array called fragments[]; there can only be one of these in an SSBO definition, and if there is one, it has to be the last field defined.

    In this case we are using std140 layout mode, which imposes certain alignment rules for the buffer variables within the SSBO, like in the case of UBOs. These alignment rules may help the driver implement read/write operations more efficiently since the underlying GPU hardware can usually read and write faster from and to aligned addresses. The downside of std140 is that because of these alignment rules we also waste some memory and we need to know the alignment rules on the GL side if we want to read/write from/to the buffer. Specifically for SSBOs, the specification introduces a new layout mode: std430, which removes these alignment restrictions, allowing for a more efficient memory usage implementation, but possibly at the expense of some performance impact.

The binding keyword, just like in the case of UBOs, is used to select the buffer that we will be reading from and writing to when accessing these variables from the shader code. It is the application’s responsibility to bind the right buffer to the binding point we specify in the shader code.

    So with that done, the shader can read from and write to these variables as we see fit, but we should be aware of the fact that multiple instances of the shader could be reading from and writing to them simultaneously. Let’s look at the fragment shader that stores the information we want into the SSBO:

    void main() {
       int index = int(mod(gl_FragCoord.x, 8));
    
       int i = atomicAdd(counter[index], 1);
       if (i == 0)
          first[index] = gl_FragCoord;
       else
          last[index] = gl_FragCoord;
    
       i = atomicAdd(total, 1);
       fragments[i] = gl_FragCoord;
    }
    

The first line computes an index into our integer array buffer variable by using gl_FragCoord. Notice that different fragments could get the same index. Next we increase counter[index] by one. Since we know that different fragments can do this at the same time, we use an atomic operation to make sure that we don’t lose any increments.

    Notice that if two fragments can write to the same index, reading the value of counter[index] after the atomicAdd can lead to different results. For example, if two fragments have already executed the atomicAdd, and assuming that counter[index] is initialized to 0, then both would read counter[index] == 2, however, if only one of the fragments has executed the atomic operation by the time it reads counter[index] it would read a value of 1, while the other fragment would read a value of 2 when it reaches that point in the shader execution. Since our shader intends to record the coordinates of the first fragment that writes to counter[index], that won’t work for us. Instead, we use the return value of the atomic operation (which returns the value that the buffer variable had right before changing it) and we write to first[index] only when that value was 0. Because we use the atomic operation to read the previous value of counter[index], only one fragment will read a value of 0, and that will be the fragment that first executed the atomic operation.

If this is not the first fragment assigned to that index, we write to last[index] instead. Again, multiple fragments assigned to the same index could do this simultaneously, but that is okay here, because we only care about the last write. Also notice that it is possible that different executions of the same rendering command produce different values of first[] and last[].

The remaining instructions unconditionally push the fragment coordinates to the unsized array. We keep, in the buffer variable total, the last index of the unsized array fragments[] that we have written to. Each fragment will atomically increase total before writing to the unsized array. Notice that, once again, we have to be careful when reading the value of total to make sure that each fragment reads a different value and we never have two fragments write to the same entry.

    Using shader storage buffer objects (GL)

    On the side of the GL application, we need to create the buffer, bind it to the appropriate binding point and initialize it. We do this as usual, only that we use the new GL_SHADER_STORAGE_BUFFER target:

    typedef struct {
       float first[8*4];      // vec4[8]
       float last[8*4];       // vec4[8]
       int counter[8*4];      // int[8] padded as per std140
       int total;             // int
       int pad[3];            // padding: as per std140 rules
       char fragments[1024];  // up to 1024 bytes of unsized array
    } SSBO;
    
    SSBO data;
    
    (...)
    
    memset(&data, 0, sizeof(SSBO));
    
    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, buf);
    glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(SSBO), &data, GL_DYNAMIC_DRAW);
    

The code creates a buffer, binds it to binding point 0 of GL_SHADER_STORAGE_BUFFER (the same one we bound our shader block to) and initializes the buffer data to 0. Notice that because we are using std140 we have to be aware of the alignment rules at work. We could have used std430 instead to avoid this.

    Since we have 1024 bytes for the fragments[] unsized array and we are pushing a vec4 (16 bytes) worth of data to it with every fragment we process then we have enough room for 64 fragments. It is the developer’s responsibility to ensure that this limit is not surpassed, otherwise we would write beyond the allocated space for our buffer and the results would be undefined.

    The next step is to do some rendering so we get our shaders to work. That would trigger the execution of our fragment shader for each fragment produced, which will generate writes into our buffer for each buffer variable the shader code writes to. After rendering, we can map the buffer and read its contents from the GL application as usual:

    SSBO *ptr = (SSBO *) glMapNamedBuffer(buf, GL_READ_ONLY);
    
    /* List of fragments recorded in the unsized array */
    printf("%d fragments recorded:\n", ptr->total);
    float *coords = (float *) ptr->fragments;
    for (int i = 0; i < ptr->total; i++, coords +=4) {
       printf("Fragment %d: (%.1f, %.1f, %.1f, %.1f)\n",
              i, coords[0], coords[1], coords[2], coords[3]);
    }
    
    /* First fragment for each index used */
    for (int i = 0; i < 8; i++) {
       if (ptr->counter[i*4] > 0)
          printf("First fragment for index %d: (%.1f, %.1f, %.1f, %.1f)\n",
                 i, ptr->first[i*4], ptr->first[i*4+1],
                 ptr->first[i*4+2], ptr->first[i*4+3]);
    }
    
    /* Last fragment for each index used */
    for (int i = 0; i < 8; i++) {
       if (ptr->counter[i*4] > 1)
          printf("Last fragment for index %d: (%.1f, %.1f, %.1f, %.1f)\n",
                 i, ptr->last[i*4], ptr->last[i*4+1],
                 ptr->last[i*4+2], ptr->last[i*4+3]);
       else if (ptr->counter[i*4] == 1)
          printf("Last fragment for index %d: (%.1f, %.1f, %.1f, %.1f)\n",
                 i, ptr->first[i*4], ptr->first[i*4+1],
                 ptr->first[i*4+2], ptr->first[i*4+3]);
    }
    
    /* Fragment counts for each index */
    for (int i = 0; i < 8; i++) {
       if (ptr->counter[i*4] > 0)
          printf("Fragment count at index %d: %d\n", i, ptr->counter[i*4]);
    }
    glUnmapNamedBuffer(buf);
    

    I get this result for an execution where I am drawing a handful of points:

    4 fragments recorded:
    Fragment 0: (199.5, 150.5, 0.5, 1.0)
    Fragment 1: (39.5, 150.5, 0.5, 1.0)
    Fragment 2: (79.5, 150.5, 0.5, 1.0)
    Fragment 3: (139.5, 150.5, 0.5, 1.0)
    
    First fragment for index 3: (139.5, 150.5, 0.5, 1.0)
    First fragment for index 7: (39.5, 150.5, 0.5, 1.0)
    
    Last fragment for index 3: (139.5, 150.5, 0.5, 1.0)
    Last fragment for index 7: (79.5, 150.5, 0.5, 1.0)
    
    Fragment count at index 3: 1
    Fragment count at index 7: 3
    

It recorded 4 fragments that the shader mapped to indices 3 and 7. Multiple fragments were assigned to index 7 but we could handle that gracefully by using the corresponding atomic functions. Different executions of the same program will produce the same 4 fragments and map them to the same indices, but the first and last fragments recorded for index 7 can change between executions.

    Also notice that the first fragment we recorded in the unsized array (fragments[0]) is not the first fragment recorded for index 7 (fragments[1]). That means that the execution of fragments[0] got first to the unsized array addition code, but the execution of fragments[1] beat it in the race to execute the code that handled the assignment to the first/last arrays, making clear that we cannot make any assumptions regarding the execution order of reads and writes coming from different instances of the same shader execution.

    So that’s it, the patches are now in the mesa-dev mailing list undergoing review and will hopefully land soon, so look forward to it! Also, if you have any interesting uses for this new feature, let me know in the comments.

    by Iago Toral at May 20, 2015 08:57 AM

    May 11, 2015

    Jacobo Aragunde

    Speaking at Protocols Plugfest 2015

In a couple of hours, I will go to the airport and head for Zaragoza, where I’m taking part in the Protocols Plugfest 2015 happening this week.

    Protocols Plugfest 2015 logo

Igalia will contribute a talk about LibreOffice interoperability with ECM solutions, especially SharePoint, through the CMIS protocol.

    See you there!

EDIT: The plugfest is over! Thanks to everyone who attended. By the way, I couldn’t help making some changes to the slides the night before the talk; I’ve updated the link above.

    by Jacobo Aragunde Pérez at May 11, 2015 08:03 AM

    April 28, 2015

    Claudio Saavedra

    Tue 2015/Apr/28

A follow-up to my last post. As I was writing it, someone was packaging Linux 4.0 for Debian. I fetched it from the experimental distribution today and everything that was broken with the X1 Carbon now works (that is, the Bluetooth keyboard, trackpad button events, and 3G/4G USB modem networking). The WEP128 authentication still doesn't work, but you shouldn't be using it anyway because aircrack and so on and so on.

So there you have it, just upgrade your kernel and enjoy a functional laptop. I will still take the opportunity to publicly shame Lenovo for the annoying noise coming out of the speakers every once in a while. Bad Lenovo, very bad.

    April 28, 2015 08:04 AM

    April 27, 2015

    Iago Toral

    Free access to Valve-produced games on Steam for Mesa contributors

Just like they did for Debian developers before, it is Valve’s way of saying thanks and giving something back to the community. This is great news for all Mesa contributors: now we can play some great Valve games for free, and we can also have an easier time looking into bug reports for them, which also works great for Valve, closing a perfect circle :)

    by Iago Toral at April 27, 2015 11:34 AM

    April 21, 2015

    Claudio Saavedra

    Tue 2015/Apr/21

    Igalia got me a Lenovo X1 Carbon, third generation. I decided to install Debian on it without really considering that the imminent release of Debian Jessie would get in the way. After a few weeks of tinkering, these are a few notes on what works (with a little help) and what doesn't (yet).

    What works (with a little help):

    • Graphics acceleration: Initially X was using llvmpipe and software rasterization. This laptop has Intel Broadwell graphics and the support for it has not been without issues recently. I installed libdrm-intel1, xserver-xorg-video-intel, and the 3.19 kernel from experimental, and that fixed it.

• I also got a OneLink Pro Dock, which I use to connect two external displays to the laptop. For whatever reason, these were not detected properly with Jessie's 3.16 kernel. Upgrading to 3.19 fixed this too. I should have used the preinstalled Windows to upgrade its firmware, by the way, but by the time I realized, the Windows partitions were long gone.

    • But upgrading to 3.19 broke both wireless and bluetooth, as with this kernel version newer binary firmware blobs are needed. These are not yet packaged in Debian, but until then you can fetch them from the web. The files needed are ibt-hw-37.8.10-fw-1.10.3.11.e.bseq for bluetooth and iwlwifi-7265-12.ucode for the wireless. There is a bug about it in the Debian bugtracker somewhere.

    • Intel's Rapid Start Technology. Just follow Matthew Garrett's advice and create a large enough partition with the appropriate type.

    What doesn't work yet:

    • My Bluetooth keyboard. There are disconnects at random intervals that make it pretty much useless. This is reported in the Debian bugtracker but there have not been any responses yet. I packaged the latest BlueZ release and installed it locally, but that didn't really help, so I'm guessing that the issue is in the kernel. It is possible that my package is broken, though, as I had to rebase some Debian patches and remove others. As a side note, I had forgotten how nice quilt can be for this.

    • The trackpad buttons. Some people suggest switching the driver but then Synaptics won't work. So there's that. I think that the 4.0 kernel has the fixes needed, but last I checked there was no package yet and I don't feel like compiling a kernel. Compiling browsers the whole day is already enough for me, so I'll wait.

    • Using a Nokia N9 as a USB mobile broadband modem or the integrated Sierra 4G modem. The former works in my Fedora laptop, and in Debian both seem to be detected correctly, but journalctl reports some oddities, like:

      Apr 20 21:27:11 patanjali ModemManager[560]: <info>  ModemManager (version 1.4.4) starting in system bus...
      Apr 20 21:27:13 patanjali ModemManager[560]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:14.0/usb3/3-3/3-3.1/3-3.1.3': not supported by any plugin
      Apr 20 21:27:13 patanjali ModemManager[560]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:19.0': not supported by any plugin
      Apr 20 21:27:13 patanjali ModemManager[560]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:1c.1/0000:04:00.0': not supported by any plugin
      Apr 20 21:27:20 patanjali ModemManager[560]: <info>  Creating modem with plugin 'Generic' and '2' ports
      Apr 20 21:27:20 patanjali ModemManager[560]: <info>  Modem for device at '/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4' successfully created
      Apr 20 21:27:20 patanjali ModemManager[560]: <warn>  Modem couldn't be initialized: couldn't load current capabilities: Failed to determine modem capabilities.
      Apr 21 10:01:15 patanjali ModemManager[560]: <info>  (net/usb0): released by modem /sys/devices/pci0000:00/0000:00:14.0/usb2/2-4
      Apr 21 10:01:15 patanjali ModemManager[560]: <info>  (tty/ttyACM0): released by modem /sys/devices/pci0000:00/0000:00:14.0/usb2/2-4
      Apr 21 10:01:30 patanjali ModemManager[560]: <info>  Creating modem with plugin 'Generic' and '2' ports
      Apr 21 10:01:30 patanjali ModemManager[560]: <info>  Modem for device at '/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4' successfully created
      Apr 21 10:01:30 patanjali ModemManager[560]: <warn>  Modem couldn't be initialized: couldn't load current capabilities: Failed to determine modem capabilities.

I upgraded to experimental's ModemManager, without any improvement. I haven't yet figured out what this could be, although I have only used NetworkManager to try to connect.

    • The (terrible) WEP128 authentication in the Nokia N9 wireless hotspot application. As neither the USB modem nor the 4G one are working yet, using the wireless hotspot is the only alternative for the afternoons outside my home office. Not sure why it won't connect (again, only tested with NetworkManager), but at this point I'm starting to be more pragmatic about being able to use this laptop at all. Leaving the hotspot open was the only alternative. I know.

    I know I should be a good citizen and add at least some of this information to ThinkWiki, but hey, at least I wrote it down somewhere.

    April 21, 2015 07:10 AM

    March 30, 2015

    Manuel Rego

    Web Engines Hackfest 2015: Save the dates!

This is a short note to announce the dates of the Web Engines Hackfest 2015, which will happen next winter at the Igalia headquarters in A Coruña (Spain), from Monday, 7th December, to Wednesday, 9th December.

After all the great work and collaboration that happened during last year’s edition, with hackers from all parts of the Web Platform community (Chromium/Blink, WebKit, Gecko, Servo, JSC, V8, SpiderMonkey, etc.), Igalia is really excited to host this great event again.

Web Engines Hackfest 2014 (picture by Adrián Pérez)

We’re still finalizing the last details and will be sending the invitations in the coming weeks. However, do not hesitate to send us an invitation request if you are willing to come at the end of the year.

Do not miss any updates by following @webhackfest on Twitter. For more details, visit the official webpage: http://www.webengineshackfest.org/.

    March 30, 2015 10:00 PM

    March 23, 2015

    Carlos García Campos

    WebKitGTK+ 2.8.0

We are excited and proud to announce WebKitGTK+ 2.8.0, your favorite web rendering engine, now faster, even more stable and with a bunch of new features and improvements.

    Gestures

Touch support is one of the most important features that had been missing since WebKitGTK+ 2.0.0. Thanks to the GTK+ gestures API, it’s now more pleasant to use a WebKitWebView on a touch screen. For now only the basic gestures are implemented: pan (for scrolling by dragging from any point of the WebView), tap (handling clicks with the finger) and zoom (for zooming in/out with two fingers). We plan to add more touch enhancements like kinetic scrolling, overshoot feedback animation, text selection, long press, etc. in future versions.

    HTML5 Notifications

    notifications

Notifications are transparently supported by WebKitGTK+ now, using libnotify by default. The default implementation can be overridden by applications to use their own notification system, or simply to disable notifications.
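For example, a minimal sketch of how an application could take over notification handling (the callback name and the decision to simply print the notification are illustrative assumptions, not part of any real application):

static gboolean
show_notification_cb (WebKitWebView      *web_view,
                      WebKitNotification *notification,
                      gpointer            user_data)
{
    /* Show the notification with the application's own system instead of
     * the default libnotify-based implementation. Here we just print it. */
    g_print ("Notification: %s - %s\n",
             webkit_notification_get_title (notification),
             webkit_notification_get_body (notification));

    /* Returning TRUE tells WebKit that the notification has been handled. */
    return TRUE;
}

g_signal_connect (web_view, "show-notification",
                  G_CALLBACK (show_notification_cb), NULL);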

    WebView background color

    There’s new API now to set the base background color of a WebKitWebView. The given color is used to fill the web view before the actual contents are rendered. This will not have any visible effect if the web page contents set a background color, of course. If the web view's parent window has an RGBA visual, we can even have transparent colors.
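
    As a rough sketch of how an application could use this, assuming the webkit_web_view_set_background_color() API added in this release:

    GdkRGBA transparent = { 1.0, 1.0, 1.0, 0.0 };
    
    /* Transparent colors only take effect if the web view's parent
     * window has an RGBA visual. */
    webkit_web_view_set_background_color (web_view, &transparent);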

    webkitgtk-2.8-bgcolor

    A new WebKitSnapshotOptions flag has also been added to be able to take web view snapshots over a transparent surface, instead of filling the surface with the default background color (opaque white).

    User script messages

    The communication between the UI process and the Web Extensions is something that we have always left to the users, so that everybody can use their own IPC mechanism. Epiphany and most of the apps use D-Bus for this, and it works perfectly. However, D-Bus is often too much for simple cases where there are only a few  messages sent from the Web Extension to the UI process. User script messages make these cases a lot easier to implement and can be used from JavaScript code or using the GObject DOM bindings.

    Let’s see how it works with a very simple example:

    In the UI process, we register a script message handler using the WebKitUserContentManager and connect to the “script-message-received” signal for the given handler:

    webkit_user_content_manager_register_script_message_handler (user_content, 
                                                                 "foo");
    g_signal_connect (user_content, "script-message-received::foo",
                      G_CALLBACK (foo_message_received_cb), NULL);
    

    Script messages are received in the UI process as a WebKitJavascriptResult:

    static void
    foo_message_received_cb (WebKitUserContentManager *manager,
                             WebKitJavascriptResult *message,
                             gpointer user_data)
    {
            char *message_str;
    
            /* get_js_result_as_string() is a helper defined elsewhere that
             * extracts the string value from the WebKitJavascriptResult. */
            message_str = get_js_result_as_string (message);
            g_print ("Script message received for handler foo: %s\n", message_str);
            g_free (message_str);
    }
    

    Sending a message from the web process to the UI process using JavaScript is very easy:

    window.webkit.messageHandlers.foo.postMessage("bar");
    

    That will send the message “bar” to the registered foo script message handler. It’s not limited to strings: we can pass any JavaScript value to postMessage() that can be serialized. There’s also a convenient API to send script messages in the GObject DOM bindings:

    webkit_dom_dom_window_webkit_message_handlers_post_message (dom_window, 
                                                                "foo", "bar");
    

     

    Who is playing audio?

    WebKitWebView now has a boolean read-only property, is-playing-audio, that is set to TRUE when the web view is playing audio (even if it’s a video) and to FALSE when the audio is stopped. Browsers can use this to provide visual feedback about which tab is playing audio; Epiphany already does that :-)
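
    A browser could track the property with something along these lines (a sketch using the standard GObject notify signal and the webkit_web_view_is_playing_audio() getter):

    static void
    playing_audio_changed_cb (WebKitWebView *web_view,
                              GParamSpec    *pspec,
                              gpointer       user_data)
    {
            /* Update the tab's audio indicator accordingly. */
            if (webkit_web_view_is_playing_audio (web_view))
                    g_print ("This view is now playing audio\n");
            else
                    g_print ("This view stopped playing audio\n");
    }
    
    g_signal_connect (web_view, "notify::is-playing-audio",
                      G_CALLBACK (playing_audio_changed_cb), NULL);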

    ephy-is-playing-audio

    HTML5 color input

    The color input element is now supported by default, so instead of rendering a text field to manually input the color as a hexadecimal color code, WebKit now renders a color button that, when clicked, shows a GTK+ color chooser dialog. As usual, the public API allows overriding the default implementation to use your own color chooser. MiniBrowser uses a popover, for example.

    mb-color-input-popover

    APNG

    APNG (Animated PNG) is a PNG extension for creating animated PNGs, similar to GIF but much better, supporting 24-bit images and transparency. Since 2.8 WebKitGTK+ can render APNG files. You can check how it works with the Mozilla demos.

    webkitgtk-2.8-apng

    SSL

    The POODLE vulnerability fix introduced compatibility problems with some websites when establishing the SSL connection. Those problems were actually server-side issues, with servers incorrectly banning SSL 3.0 record packet versions, but they could be worked around in WebKitGTK+.

    WebKitGTK+ already provided a WebKitWebView signal to notify about TLS errors when loading, but only for the connection of the main resource in the main frame. However, it’s still possible that subresources fail due to TLS errors, when using a connection different from the main resource one. WebKitGTK+ 2.8 gained the WebKitWebResource::failed-with-tls-errors signal to get notified when a subresource load fails because of an invalid certificate.
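
    A sketch of how an application could listen for these failures; the callback signature is an assumption based on the 2.8 API documentation, so double-check it before relying on it:

    static void
    tls_errors_cb (WebKitWebResource    *resource,
                   GTlsCertificate      *certificate,
                   GTlsCertificateFlags  errors,
                   gpointer              user_data)
    {
            g_print ("Subresource %s failed with TLS errors\n",
                     webkit_web_resource_get_uri (resource));
    }
    
    static void
    resource_load_started_cb (WebKitWebView     *web_view,
                              WebKitWebResource *resource,
                              WebKitURIRequest  *request,
                              gpointer           user_data)
    {
            /* Watch every subresource for TLS failures. */
            g_signal_connect (resource, "failed-with-tls-errors",
                              G_CALLBACK (tls_errors_cb), NULL);
    }
    
    g_signal_connect (web_view, "resource-load-started",
                      G_CALLBACK (resource_load_started_cb), NULL);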

    Ciphersuites based on RC4 are now disallowed when performing TLS negotiation, because RC4 is no longer considered secure.

    Performance: bmalloc and concurrent JIT

    bmalloc is a new memory allocator added to WebKit to replace TCMalloc. Apple had already used it in the Mac and iOS ports for some time with very good results, but it needed some tweaks to work on Linux. WebKitGTK+ 2.8 now also uses bmalloc which drastically improved the overall performance.

    Concurrent JIT was not enabled in the GTK+ (and EFL) ports for no apparent reason. Enabling it also had an amazing impact on performance.

    Both performance improvements were very noticeable in the performance bot:

    webkitgtk-2.8-perf

     

    The first jump on 11th Feb corresponds to the bmalloc switch, while the other jump on 25th Feb is when concurrent JIT was enabled.

    Plans for 2.10

    WebKitGTK+ 2.8 is an awesome release, but the plans for 2.10 are quite promising.

    • More security: mixed content for most resource types will be blocked by default. New API will be provided for managing mixed content.
    • Sandboxing: seccomp filters will be used in the different secondary processes.
    • More performance: FTL will be enabled in JavaScriptCore by default.
    • Even more performance: this time in the graphics side, by using the threaded compositor.
    • Blocking plugins API: new API to provide full control over the plugin loading process, allowing plugins to be blocked/unblocked individually.
    • Implementation of the Database process: to bring back IndexedDB support.
    • Editing API: full editing API to allow using a WebView in editable mode with all editing capabilities.

    by carlos garcia campos at March 23, 2015 11:56 AM

    March 18, 2015

    Víctor Jáquez

    GStreamer Hackfest 2015

    Last weekend was the GStreamer Hackfest in Staines, UK, at Samsung’s premises; Samsung also sponsored the dinners and the lunches. Special thanks to Luis de Bethencourt, the almighty organizer!

    My main purpose was to sip one or two pints with the GStreamer folks and, secondarily, to talk about gstreamer-vaapi, WebKitGTK+ and the new OpenGL/ES support in gst-plugins-bad.

    15030008

    About gstreamer-vaapi, there were a couple of questions about some problems seen downstream (stable releases in distributions), which I was happy to announce are mostly fixed upstream. On the other hand, Sebastian Dröge was worried about the existing support for GStreamer 0.10 and I answered him that its removal is already in the pipeline. He looked pleased.

    Related to gstreamer-vaapi and the new GstGL, we tested and merged a patch for GLES2/EGL, so now it is possible to render VA-API decoded video through glimagesink with (nearly) zero-copy. Sadly, this is not currently possible using GLX. Along the way I found a silly bug that came from a previous patch of mine and fixed it; we also fixed another small bug in the gluploader.

    In the WebKitGTK+ realm, I worked on a new functionality: sharing the OpenGL context and the display of the browser with the GStreamer pipeline. With it, we could add GL filters into the pipeline. But honour to whom honour is due: this patch is a split of a previous patch done by Philippe Normand. The ultimate goal is to ditch the custom video sink in WebKit and reuse glimagesink, with its new off-screen rendering feature.

    Finally, on Sunday afternoon, I walked around Richmond, which is beautiful.

    15030009

    Thanks to Igalia, Intel and all the sponsors that made the hackfest and my attendance possible.

    by vjaquez at March 18, 2015 06:36 PM

    March 17, 2015

    Jacobo Aragunde

    Creating new document providers in LibreOffice for Android

    We recently completed our tasks for The Document Foundation regarding the Android document browser; nonetheless, we had a pending topic regarding the documentation of our work: writing and publishing a guide to extending the cloud storage integration. This blog post covers how to integrate new cloud solutions using the cloud storage framework we have implemented.

    Writing a document provider

    Document Provider class diagram

    We use the name “document providers” for the sets of classes that implement support for a particular storage solution. A document provider consists of two classes implementing the IDocumentProvider and IFile interfaces. Both contain extensive in-code documentation of the operations to help anybody implementing them.

    The IDocumentProvider interface provides some general operations about the provider, intended to provide a starting point for the service. getRootDirectory() returns a pointer to the root of the service, while createFromUri() is required to restore the state of the document browser.

    The IFile interface is an abstraction of the Java File class, with many similar operations. Those operations will be used by the document browser to show information about the files, browse the directories and open the final documents.

    Once those classes have been implemented, the new provider must be linked with the rest of the application by making some modifications to DocumentProviderFactory class. Touching the initialize() method to add a new instance of the provider to the providers[] array should be enough:

        // initialize document providers list
        instance.providers = new IDocumentProvider[3];
        instance.providers[0] = new LocalDocumentsDirectoryProvider();
        instance.providers[1] = new LocalDocumentsProvider();
        instance.providers[2] = new OwnCloudProvider(context);
    

    At this point, your provider should appear in the drawer that pops up with a swipe gesture from the left of the screen.

    LibreOffice for Android, provider selection

    You are encouraged to create the classes for your document provider in a separate package inside org.libreoffice.storage. Your operations may throw a RuntimeException in case of error; it will be captured by the UI activity and the message inside the exception will be shown, so make sure you internationalize the strings using the standard Android API. You can always take a look at the existing providers and use them as examples, especially OwnCloudProvider, which is the most complex one but still quite manageable.

    Making use of application settings

    If you are implementing a generic provider for some cloud service, it is quite likely that you need some input from the user, like a login name or a password. For that reason we have added an activity for configuration related to document providers.

    To add your settings to that screen, modify the file res/xml/documentprovider_preferences.xml and add a new PreferenceCategory that contains your own preferences. The android:key attribute will allow you to use the preference from your code; you may want to add that preference string as a constant in DocumentProviderSettingsActivity.

    At this point, you will be able to use the preferences in your DocumentProvider using the standard Android API. Take OwnCloudProvider as an example:

        public OwnCloudProvider(Context context) {
            ...
            // read preferences
            SharedPreferences preferences = PreferenceManager.getDefaultSharedPreferences(context);
            serverUrl = preferences.getString(
                    DocumentProviderSettingsActivity.KEY_PREF_OWNCLOUD_SERVER, "");
            ...
        }
    

    Finally, we have added a way for providers to check if settings have changed; otherwise, the application would have to be restarted for the changes to take effect. Your provider must implement the OnSharedPreferenceChangeListener interface, which brings the onSharedPreferenceChanged() method. That method will be called whenever any preference is changed; you can check just the ones you are interested in using the key parameter, and make the required changes to the internal state of your provider.
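
    For example, OwnCloudProvider could react to a change of the server URL setting with something like the following simplified sketch (the real code may differ):

        @Override
        public void onSharedPreferenceChanged(SharedPreferences preferences, String key) {
            // Only react to the settings this provider cares about.
            if (DocumentProviderSettingsActivity.KEY_PREF_OWNCLOUD_SERVER.equals(key)) {
                serverUrl = preferences.getString(key, "");
                // Recreate the ownCloud client here so the next operation uses the new URL.
            }
        }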

    Preference listener class diagram

    Future

    This effort has been one more step towards building a full-featured version of LibreOffice for Android, and there are improvements we can, or even must, make in the future:

    • As the work on the LibreOffice Android editor progresses, we will have to add a save operation to the document providers that takes care of uploading the new version of the file to the cloud.
    • It would be a good idea to implement an add-on mechanism to install the document providers. That way we would not add unnecessary weight to the main package, and plugins could be independently distributed.

    That’s all for now; you can try the ownCloud provider by building the feature/owncloud-provider-for-android branch yourself, or wait for it to be merged. We hope to see other members of the community taking advantage of this framework to provide new services soon.

    by Jacobo Aragunde Pérez at March 17, 2015 11:54 AM

    March 12, 2015

    Michael Catanzaro

    Stop using RC4

    A follow up of my previous post: in response to my letter, NIST is going to increase the CVSS score of CVE-2013-2566 (RC4) to match CVE-2011-3389 (BEAST). Yay!

    In other news, WebKitGTK+ 2.8 has full support for RFC 7465. That’s a fancy way of saying that we will no longer negotiate RC4 connections and you will now be unable to access the small minority of HTTPS sites that offer nothing but RC4. Hopefully other browsers will follow along sooner rather than later. In particular, Firefox nightly has stopped negotiating RC4 except for a few whitelisted sites: I would very much like to see that whitelist removed. Internet Explorer has stopped negotiating RC4 except when it performs voluntary protocol version fallback. It would be great to see a firmer stance from Mozilla and Microsoft, and some action from Google and Apple.

    by Michael Catanzaro at March 12, 2015 08:37 PM

    March 09, 2015

    Javier Fernández

    Content Distribution in CSS Grid Layout

    It’s been a while since Igalia and Bloomberg started to implement the Box Alignment specification for the CSS Grid Layout model. Some weeks ago we reached an important milestone on our roadmap by landing in Blink trunk the last patches implementing the Content Distribution properties: align-content and justify-content.

    Quoting the CSS Box Alignment document, the content distribution properties are defined as follows:

    Aligns the contents of the box as a whole along the box’s inline/row/main axis.
    The alignment container is the grid container’s content box. The alignment subjects are the grid tracks.

    The CSS syntax of these recently added properties gives an idea of how powerful and flexible they are for grid layout definitions, allowing every possible combination of alignment values:

    auto | <baseline-position> | <content-distribution> || [ <overflow-position>? && <content-position> ]

    It’s worth mentioning that neither Baseline Alignment nor the <content-distribution> values for Distribution Alignment are implemented yet. However, in the latter case I already have a quite promising draft implementation which, eventually, has been very useful to start a discussion inside the W3C community about allowing these alignment values for grid. In previous versions of the specification it was stated that all <content-distribution> values should use their <content-position> fallback values for grid containers. I’m glad that such a decision was finally made, because I think that <content-distribution> values are really useful for defining fancier grid layouts. I’ll talk about this soon in a new post, as I consider it deserves a detailed description with proper examples.

    Last, but not least, as with Self Alignment, it allows using overflow keywords to define how we want to handle the grid’s content overflow. It works the same way for Content Distribution, as we’ll see later with some examples.

    Aligning the grid

    When there is available space in the grid container block, it’s always useful to have a way to control how we want to use that space and how we want our grid to behave in it. It might happen that the container’s size changes (e.g. fullscreen mode), or we may have to deal with a content-sized grid whose content changes size. There are many possibilities, so I’ll leave this to the user/designer’s imagination and focus on very simple examples to illustrate the concept.

    For now, let’s consider this case to understand what you can do with the different <content-alignment> values in a grid layout.

    .grid {
        grid: 50px 50px / 100px 100px;
        position: relative;
        width: 200px;
        height: 300px;
    }
    .fixedSize {
        width: 20px;
        height: 40px;
    }
    <div class="grid">
       <div class="fixedSize" style="grid-column: 1; grid-row: 1; background: violet;"></div>
       <div style="grid-column: 1; grid-row: 2; background: yellow;"></div>
       <div style="grid-column: 2; grid-row: 1; background: green;"></div>
       <div class="fixedSize" style="grid-column: 2; grid-row: 2; background: red;"></div>
    </div>

    We are defining a 2×2 grid with 100×50 pixel cells where we are going to place 4 items, one in each cell. Notice that the items at (1,1) and (2,2) have a fixed size of 20×40 pixels, while the other two are auto-sized, so they will be stretched to fill their corresponding grid cell (if you don’t know why, reading the previous post might help). Also, bear in mind that both the align-content and justify-content properties have start as their initial value for grids.
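
    For instance, in this example only the block axis has free space left (300px of height minus two 50px rows), so changing align-content moves the rows within those 200px. A quick sketch using <content-position> values, since the <content-distribution> ones are not implemented yet:

    .grid {
        /* Try, for instance, one of these instead of the default "start": */
        align-content: center;  /* rows grouped in the middle of the free space */
        /* align-content: end;     rows pushed against the end edge of the container */
    }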

    ContentAlignment

    Controlling the grid overflow

    When the grid content’s size exceeds its container’s dimensions there is a risk of data loss. Some examples of this scenario are center or end alignment from the viewport’s edges: the content overflowing the viewport’s area can’t be reached, hence we lose that data. In order to prevent this issue the Box Alignment specification defines the safe overflow mode, which basically forces a start alignment value for the property handling the dimension where the overflow is detected.

    Using the same CSS and HTML code as in the example above, we can easily define cases where this data loss issue (red colored arrows) is clearly noticeable, just by modifying the height or width to cause top or left overflow, respectively.

    Content-Alignment-Overflow1
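
    In terms of syntax, requesting the safe behaviour is just a matter of adding the overflow keyword before the positional value. A sketch following the <overflow-position> syntax quoted earlier (assuming the safe keyword from the current spec draft):

    .grid {
        height: 60px;               /* smaller than the two 50px rows, so the content overflows */
        align-content: safe center; /* falls back to start alignment, avoiding the data loss */
    }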

    There are other situations where Content Alignment and Overflow interact differently, using margins, padding and/or borders and even combining different writing modes and flow directions. The effect of the alignment values varies considerably depending on those factors, but I think you now have a clear idea of how to use these new properties in a grid layout.

    Current status and next steps

    With the grid support for the align-content and justify-content CSS properties in Blink we’ve got most of the Box Alignment specification covered. As mentioned before, only Baseline Alignment is still pending to be implemented in Chromium browsers. I have to admit that there are also some bugs and wrong behavior with certain CSS combinations, especially regarding orthogonal flows, but we are working on that right now and I hope to land the patches in trunk soon.

    For the time being, let’s consider the following table as the current implementation status of the Box Alignment specification for the Grid Layout model in WebKit (Safari/Epiphany) and Blink (Chrome/Chromium/Opera) based browsers:

    align-grid-support-1

    The lack of progress in the implementation of the Box Alignment specification in the WebKit web engine is disappointing. I’ve been stuck for quite a long time trying to upgrade the CSS properties to the latest version of the spec, mainly due to design and performance issues. I’ll discuss with the WebKit hackers the best approach to solve this issue so I can bring the Grid Layout implementation to the same level as in the Blink web engine.

    Igalia and Bloomberg will continue working on the implementation of the CSS Grid Layout specification, and among my short/mid-term challenges is completing the Box Alignment support. This goal includes the following tasks:

    • Fixing bugs and completing the orthogonal flows support.
    • Implementing the Baseline Alignment features.
    • Completing the Content Distribution Alignment with the <content-distribution> values.
    • Implementing the Box Alignment spec in WebKit.
    Igalia & Bloomberg logos

    Igalia and Bloomberg working to build a better web platform

    by jfernandez at March 09, 2015 01:58 PM

    March 06, 2015

    Iago Toral

    An introduction to Mesa’s GLSL compiler (II)

    Recap

    My previous post served as an initial look into Mesa’s GLSL compiler, where we discussed the Mesa IR, which is a core aspect of the compiler. In this post I’ll introduce another relevant aspect: IR lowering.

    IR lowering

    There are multiple lowering passes implemented in Mesa (check src/glsl/lower_*.cpp for a complete list) but they all share a common denominator: their purpose is to rewrite certain constructs in the IR so they better fit the underlying GPU hardware.

    In this post we will look into the lower_instructions.cpp lowering pass, which rewrites expression operations that may not be supported directly by GPU hardware with different implementations.

    The lowering process involves traversing the IR, identifying the instructions we want to lower and modifying the IR accordingly, which fits well into the visitor pattern strategy discussed in my previous post. In this case, expression lowering is handled by the lower_instructions_visitor class, which implements the lowering pass in the visit_leave() method for ir_expression nodes.

    The hierarchical visitor class, which serves as the base class for most visitors in Mesa, defines visit() methods for leaf nodes in the IR tree, and visit_leave()/visit_enter() methods for non-leaf nodes. This way, when traversing intermediary nodes in the IR we can decide to take action as soon as we enter them or when we are about to leave them.

    In the case of our lower_instructions_visitor class, the visit_leave() method implementation is a large switch() statement with all the operators that it can lower.

    The code in this file lowers common scenarios that are expected to be useful for most GPU drivers, but individual drivers can still select which of these lowering passes they want to use. For this purpose, hardware drivers create instances of the lower_instructions class passing the list of lowering passes to enable. For example, the Intel i965 driver does:

    const int bitfield_insert = brw->gen >= 7
                                ? BITFIELD_INSERT_TO_BFM_BFI
                                : 0;
    lower_instructions(shader->base.ir,
                       MOD_TO_FLOOR |
                       DIV_TO_MUL_RCP |
                       SUB_TO_ADD_NEG |
                       EXP_TO_EXP2 |
                       LOG_TO_LOG2 |
                       bitfield_insert |
                       LDEXP_TO_ARITH);
    

    Notice how in the case of Intel GPUs, one of the lowering passes is conditionally selected depending on the hardware involved. In this case, brw->gen >= 7 selects GPU generations since IvyBridge.

    Let’s have a look at the implementation of some of these lowering passes. For example, SUB_TO_ADD_NEG is a very simple one that transforms subtractions into negative additions:

    void
    lower_instructions_visitor::sub_to_add_neg(ir_expression *ir)
    {
       ir->operation = ir_binop_add;
       ir->operands[1] =
          new(ir) ir_expression(ir_unop_neg, ir->operands[1]->type,
                                ir->operands[1], NULL);
       this->progress = true;
    }
    

    As we can see, the lowering pass simply changes the operator used by the ir_expression node and negates the second operand using the unary negation operator (ir_unop_neg), thus converting the original a = b - c into a = b + (-c).

    Of course, if a driver does not have native support for the subtraction operation, it could still do this when it processes the IR to produce native code, but this way Mesa is saving driver developers that work. Also, some lowering passes may enable optimization passes after the lowering that drivers might miss otherwise.

    Let’s see a more complex example: MOD_TO_FLOOR. In this case the lowering pass provides an implementation of ir_binop_mod (modulo) for GPUs that don’t have a native modulo operation.

    The modulo operation takes two operands (op0, op1) and implements the C equivalent of ‘op0 % op1’, that is, it computes the remainder of the division of op0 by op1. To achieve this the lowering pass breaks the modulo operation down as mod(op0, op1) = op0 - op1 * floor(op0 / op1), which requires only multiplication, division and subtraction. This is the implementation:

    ir_variable *x = new(ir) ir_variable(ir->operands[0]->type, "mod_x",
                                         ir_var_temporary);
    ir_variable *y = new(ir) ir_variable(ir->operands[1]->type, "mod_y",
                                         ir_var_temporary);
    this->base_ir->insert_before(x);
    this->base_ir->insert_before(y);
    
    ir_assignment *const assign_x =
       new(ir) ir_assignment(new(ir) ir_dereference_variable(x),
                             ir->operands[0], NULL);
    ir_assignment *const assign_y =
       new(ir) ir_assignment(new(ir) ir_dereference_variable(y),
                             ir->operands[1], NULL);
    
    this->base_ir->insert_before(assign_x);
    this->base_ir->insert_before(assign_y);
    
    ir_expression *const div_expr =
       new(ir) ir_expression(ir_binop_div, x->type,
                             new(ir) ir_dereference_variable(x),
                             new(ir) ir_dereference_variable(y));
    
    /* Don't generate new IR that would need to be lowered in an additional
     * pass.
     */
    if (lowering(DIV_TO_MUL_RCP) && (ir->type->is_float() ||
        ir->type->is_double()))
       div_to_mul_rcp(div_expr);
    
    ir_expression *const floor_expr =
       new(ir) ir_expression(ir_unop_floor, x->type, div_expr);
    
    if (lowering(DOPS_TO_DFRAC) && ir->type->is_double())
       dfloor_to_dfrac(floor_expr);
    
    ir_expression *const mul_expr =
       new(ir) ir_expression(ir_binop_mul,
                             new(ir) ir_dereference_variable(y),
                             floor_expr);
    
    ir->operation = ir_binop_sub;
    ir->operands[0] = new(ir) ir_dereference_variable(x);
    ir->operands[1] = mul_expr;
    this->progress = true;
    

    Notice how the first thing this does is to assign the operands to variables. The reason for this is a bit tricky: since we are going to implement ir_binop_mod as op0 - op1 * floor(op0 / op1), we will need to refer to the IR nodes op0 and op1 twice in the tree. However, we can’t just do that directly, for that would mean that we have the same node (that is, the same pointer) linked from two different places in the IR expression tree. That is, we want to have this tree:

                                 sub
                               /     \
                            op0       mult
                                     /    \
                                  op1     floor
                                            |
                                           div
                                          /   \
                                       op0     op1
    

    Instead of this other tree:

    
                                 sub
                               /     \
                               |      mult
                               |     /   \
                               |   floor  |
                               |     |    |
                               |    div   |
                               |   /   \  |
                                op0     op1   
    

    This second version of the tree is problematic. For example, let’s say that a hypothetical optimization pass detects that op1 is a constant integer with value 1, and realizes that in this case div(op0, op1) == op0. When doing that optimization, our div subtree is removed, and with that, op1 could be removed too (and possibly freed), leaving the other reference to that operand in the IR pointing to an invalid memory location… we have just corrupted our IR:

                                 sub
                               /     \
                               |      mult
                               |     /    \
                               |   floor   op1 [invalid pointer reference]
                               |     |
                               |    /
                               |   /
                                op0 
    

    Instead, what we want to do here is to clone the nodes each time we need a new reference to them in the IR. All IR nodes have a clone() method for this purpose. However, in this particular case, cloning the nodes creates a new problem: op0 and op1 are ir_expression nodes so, for example, op0 could be the expression a + b * c, and cloning the expression would produce suboptimal code where the expression gets replicated. This, at best, will lead to slower compilation times, due to optimization passes needing to detect and fix that; at worst, it would go undetected by the optimizer and lead to worse performance, where we compute the value of the expression multiple times:

                                  sub
                               /        \
                             add         mult
                            /   \       /    \
                          a     mult  op1     floor
                                /   \          |
                               b     c        div
                                             /   \
                                          add     op1
                                         /   \
                                        a    mult
                                            /    \
                                            b     c
    

    The solution to this problem is to assign the expression to a variable, then dereference that variable (i.e., read its value) wherever we need. Thus, the implementation defines two variables (x, y), assigns op0 and op1 to them and creates new dereference nodes wherever we need to access the value of the op0 and op1 expressions:

                           =               =
                         /   \           /   \
                        x     op0       y     op1
    
    
                                 sub
                               /     \
                             *x       mult
                                     /    \
                                   *y     floor
                                            |
                                           div
                                          /   \
                                        *x     *y
    

    In the diagram above, each variable dereference is marked with an ‘*’, and each one is a new IR node (so both appearances of ‘*x’ refer to different IR nodes, both representing two different reads of the same variable). With this solution we only evaluate the op0 and op1 expressions once (when they get assigned to the corresponding variables) and we never refer to the same IR node twice from different places (since each variable dereference is a new IR node).

    Now that we know why we assign these two variables, let’s continue looking at the code of the lowering pass:

    In the next step we implement op0 / op1 using an ir_binop_div expression. To speed up compilation, if the driver has the DIV_TO_MUL_RCP lowering pass enabled, which transforms a / b into a * (1 / b) (where 1 / b could be a native instruction), we immediately execute the lowering pass for that expression. If we didn’t do this here, the resulting IR would contain a division operation that might have to be lowered in a later pass, making the compilation process slower.

    The next step uses an ir_unop_floor expression to compute floor(op0/op1), and again, tests if this operation should be lowered too, which might be the case if the type of the operands is a 64-bit double instead of a regular 32-bit float, since GPUs may only have a native floor instruction for 32-bit floats.

    Next, we multiply the result by op1 to get op1 * floor(op0 / op1).

    Now we only need to subtract this from op0, which would be the root IR node for this expression. Since we want the new IR subtree spawning from this root node to replace the old implementation, we directly edit the IR node we are lowering: we replace the ir_binop_mod operator with ir_binop_sub, put a dereference to op0 (through the x variable) in the first operand and link the expression holding op1 * floor(op0 / op1) in the second operand, effectively attaching our new implementation in place of the old version. This is how the original and lowered IRs look:

    Original IR:

    [prev inst] -> mod -> [next inst]
                  /   \            
               op0     op1         
    

    Lowered IR:

    [prev inst] -> var x -> var y ->   =   ->   =   ->   sub   -> [next inst]
                                      / \      / \      /   \
                                     x  op0   y  op1  *x     mult
                                                            /    \
                                                          *y      floor
                                                                    |
                                                                   div
                                                                  /   \
                                                                *x     *y
    

    Finally, we set progress to true to let the compiler know that we have modified the IR and that, as a consequence, we have introduced new nodes that may be subject to further lowering passes, so it can run a new pass. For example, the subtraction we just added may be lowered again into a negative addition, as we have seen before.

    Coming up next

    Now that we learnt about lowering passes we can also discuss optimization passes, which are very similar since they are also based on the visitor implementation in Mesa and also transform the Mesa IR in a similar way.

    by Iago Toral at March 06, 2015 02:06 PM

    Samuel Iglesias

    My FOSDEM 2015 talk’s slides and video

    As I said in a previous post, this year I attended FOSDEM to give a talk about how to test our OpenGL drivers with Free Software. The slides are available online and the video recording has recently been released.

    Last but not least, I would like to thank Igalia for sponsoring my trip :-)

    Igalia logo

    by Samuel Iglesias at March 06, 2015 08:25 AM

    March 05, 2015

    Jacobo Aragunde

    New features in LibreOffice for Android document browser

    The Document Foundation recently assigned one of the packages of the Android tender to Igalia; in particular, the one about cloud storage and email sharing. Our proposal comprised the following tasks:

    • Integrate the “share” feature of the Android framework to be able to send documents by email, bluetooth or any other means provided by the system.
    • Provide the means for the community to develop integration of cloud storage solutions.
    • Implement ownCloud integration as an example of how to integrate other cloud solutions.
    • Extensive documentation of the process to integrate more cloud solutions.

    The work is complete and the patches are available in the repository; most of them are already merged in master, while ownCloud support lives in a different branch for now.

    Sharing documents

    The Android-provided share feature allows sending a document not only through email but also through Bluetooth or any other available method, depending on the software installed on your device.

    We have made this feature available to users through a context menu in the document browser, which pops up after a long press on a document.
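
    Under the hood, sharing boils down to firing a standard ACTION_SEND intent; roughly something like the following sketch (not necessarily the exact code used in the document browser, and documentFile here is just a placeholder for the file being shared):

        Intent intent = new Intent(Intent.ACTION_SEND);
        intent.setType("application/vnd.oasis.opendocument.text");
        intent.putExtra(Intent.EXTRA_STREAM, Uri.fromFile(documentFile));
        // Let the user pick email, Bluetooth or any other app able to handle the share.
        startActivity(Intent.createChooser(intent, "Share document"));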

    Context menu in Android document browser

    Share from the Android document browser

    Support for cloud storage solutions

    This task consisted of creating an interface to develop integration with any cloud storage solution. The first step was abstracting the code that accessed the file system directly, so it could be replaced by the different implementations of storage services, which from now on will be referred to as document providers.

    Afterwards, we created two document providers for local storage: one to access the internal storage of the device and another one to conveniently access the Documents directory inside the storage. These two simple providers served as a test for the UI to switch between them; we used the Android drawer widget, which pops up with a swipe gesture from the left of the screen.

    Side drawer in Android document browser

    All the operations in the Android document browser were being performed in the same thread. Besides being suboptimal, this is something the development framework actually forbids: network code cannot run in the main thread of the application. The next step for us was isolating the code that might need network access when interacting with a cloud provider, and running it in separate threads.

    ownCloud document provider

    At that point, we had everything in place to write the code to access an ownCloud server. We did it with the help of an Android library provided by ownCloud developers.

    There was still another task, though; any cloud service will likely need some configuration from the user, such as login credentials. We had to implement a preferences screen to enter these settings and do the proper wiring for the provider to be able to listen for any changes to them.

    ownCloud settings screen

    Documentation

    To help other developers write new document providers, we have tried to document the new code in detail, especially those interfaces that must be implemented to create new document providers. Besides, we will soon publish a document here explaining how to extend the cloud storage integration.

    That’s all for now; to try the ownCloud provider you will have to build the feature/owncloud-provider-for-android branch yourself, while you will find the share feature in the packages already available in the Play Store or F-Droid. Hope you enjoy it!

    by Jacobo Aragunde Pérez at March 05, 2015 03:46 PM