Planet Igalia

September 15, 2014

Iago Toral

Setting up a development environment for Mesa

Recap

In my previous post I provided an overview of the Mesa source tree and identified some of its main modules.

Since we are on that subject I thought it would make sense to give a few tips on how to set up a development environment for Mesa too, so here I go.

Development environment

Mesa is mostly written in a combination of C and C++, uses autotools for its build system and Git for version control, so it should be a fairly familiar environment for many people. I am not going to explain how to build autotools projects here, there is plenty of documentation available on that subject, so instead I will focus on the specifics of Mesa.

First we need to check out the source code. If you do not have a developer account, do an anonymous checkout:

# git clone git://anongit.freedesktop.org/git/mesa/mesa

If you do have a developer account do this instead:

# git clone git+ssh://username@git.freedesktop.org/git/mesa/mesa

Next, we will have to deal with dependencies. This should not be too hard though: Mesa sits fairly low in the software stack, so it does not have many dependencies, and the ones it does have expose fairly stable APIs that do not change too often. Typically, you should be able to build Mesa on a recent distribution as long as you keep it up to date. For reference, as of now I can build Mesa on my Ubuntu 14.04 without any problems.
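
If your distribution already packages Mesa (most do), a quick way to pull in most of the build dependencies is to ask the package manager for the build dependencies of the distribution's own Mesa package. On Debian/Ubuntu, assuming you have deb-src entries enabled in your sources list, that would be something like:

# apt-get build-dep mesa

After that, only the dependencies for drivers or features that your distribution does not enable should be left to install by hand.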

In any case, the actual dependencies you will need to get may vary depending on the drivers you want to build, the target platform and the features you want to enable. For example, the R300 Gallium driver requires LLVM, but the Intel i965 driver doesn’t.

Notice, however, that if you are hacking on features that require specific builds of the XServer, Wayland/Weston or similar components, the required setup will be more complex, since you would probably need to include these other projects in the mix, together with their respective dependencies.

Configuring the source tree

Here I will mention some of the Mesa-specific options that I have found most useful in my time with Mesa:

--enable-debug: This is necessary, at least, to get assertions to work, and you want that while you are developing. Mesa and the drivers have assertions in many places to make sure that new code does not break certain assumptions or violate hardware constraints, so you really want to make sure these are active while you are developing. This option also adds "-g -O0" to enable debug support.

--with-dri-drivers: This is the list of classic Mesa DRI drivers you want to build. If you know you will only hack on the i965 driver, for example, then building other drivers will only slow down your builds.

--with-gallium-drivers: This is the list of Gallium drivers you want to build. Again, if you are hacking on the classic DRI i965 driver you are probably not interested in building any Gallium drivers.

Notice that if you are working on the Mesa framework layer, that is, the bits shared by all drivers, instead of the internals of a specific driver, you will probably want to include more drivers in the build to make sure that they keep building after your changes.

--with-egl-platforms: This is the list of supported platforms. As with the options above, you probably only want to build Mesa for the platform or platforms you are working on.

Besides using a combination of these options, you probably also want to set your CFLAGS and CXXFLAGS (remember that Mesa uses both C and C++). I, for one, like to pass "-g3", for example.
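
Putting it all together, a configure line for someone who only cares about the classic i965 driver on X11/DRM could look something like the following. This is just an illustration: the install prefix and the exact set of option values are assumptions you should adapt to your own setup.

# ./autogen.sh --prefix=/home/user/mesa-install \
               --enable-debug \
               --with-dri-drivers=i965 \
               --with-gallium-drivers= \
               --with-egl-platforms=x11,drm \
               CFLAGS="-g3" CXXFLAGS="-g3"

Passing an empty value to --with-gallium-drivers should leave you with no Gallium drivers at all, which keeps build times down when you only work on a classic driver.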

Using your built version of Mesa

Once you have built Mesa you can type 'make install' to install the libraries and drivers. You have probably configured the build (via the --prefix option) to install to a safe location that does not conflict with your distribution's installation of Mesa, so the remaining problem is telling your OpenGL programs that they should use this version of Mesa instead of the one provided by your distro.

You will have to adjust a couple of environment variables for this:

LIBGL_DRIVERS_PATH: Set this to the path where your built drivers have been installed. This will tell Mesa’s loader to look for the drivers here.

LD_LIBRARY_PATH: Set this to the path where your Mesa libraries have been installed. This will make it so that OpenGL programs load your recently built libGL.so rather than your system’s.
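
For example, assuming Mesa was installed into /home/user/mesa-install (a made-up prefix, adjust it to wherever you pointed --prefix), running glxgears against the freshly built libraries could look like this:

# export LIBGL_DRIVERS_PATH=/home/user/mesa-install/lib/dri
# export LD_LIBRARY_PATH=/home/user/mesa-install/lib
# glxgears

The exact subdirectory where the DRI drivers end up may vary between systems, so double-check where your build actually installed them.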

For more tips I'd suggest reading this short thread on the Mesa mailing list, in which some Mesa developers discuss their development environment setups.

Coming up next

In the next post I will provide an introduction to modern 3D graphics hardware. After all, the job of the graphics driver is all about programming the hardware, so having a basic understanding of how it works is a requirement if you want to do any meaningful driver development.

by Iago Toral at September 15, 2014 02:44 PM

September 14, 2014

Claudio Saavedra

Sun 2014/Sep/14

You can try to disguise it in any way you want, but at the end of the day what we have is a boys' club that suddenly cannot invest all of its money into toys for the boys' amusement but now also needs to spend it leveling the field for the girls to be able to play too. Less money for the toys the boys like, surely that upsets them -- after all, boys were having so much fun so far and now that fun is being taken away.

The fact that the fun in this case happens to be of a socially necessary technological nature (a free desktop, a free software stack, whatever you want to call it) doesn't make this any different. If you are objecting to OPW and your argument is that it hinders the technological advance of the GNOME project, well, admit it -- isn't the fact that you enjoy technology at heart (i.e., you are the one having fun) one of the main reasons you're saying this?

Male chauvinism can take a thousand forms, and many of those forms are so well hidden and ingrained into our culture that they are terribly difficult to see, especially if you're a man and not the target of it. Once we are confronted with any of these forms, this might even give us a terrible headache -- we are faced with something we didn't even know existed -- and it can take a tremendous effort to accept that they're here. But, frankly, that effort is long overdue, and many of us will refuse to be around those not wanting to make it.

September 14, 2014 03:52 PM

September 11, 2014

Javier Muñoz

Pflua and high performance packet filtering

Time to write another post! This time I will comment on one of our most recent projects here at Igalia: a high performance packet filtering toolkit written in Lua.

Several weeks ago I received a phone call from Juan. Andy was looking for a mate ready to jump into a new opportunity related to high performance networking, hypervisors, packet filtering and LuaJIT. Hey! This mix sounded great, so I joined Andy and we went ahead.

Six weeks later, and with Diego joining the project too, a first implementation (Pflua) of the libpcap packet filtering language (pflang) went live, together with the corresponding testing and benchmarking code (Pflua-bench).

During those weeks I hacked on the bindings/FFI implementation, performance/benchmarking, testing and kernel-space to user-space code adaptation (the Linux BPF JIT wrapped as a dynamic library!). With this post I will share a quick overview of the project and the relevant links to explore it in detail.

As mentioned, Pflua implements the libpcap packet filtering language, which we refer to as 'pflang' for short. Pflua is a high performance packet filtering toolkit implemented in LuaJIT (a tracing compiler for the Lua language). Together with Pflua we also developed Pflua-bench, a benchmarking suite for pflang implementations.

Pflua and Pflua-bench were developed for Snabb GmbH, the company behind the Snabb Switch network appliance toolkit. You can read about this project or get in touch with Luke and the other Snabb hackers in the snabb-devel forum. They are working on very interesting and challenging use cases where virtualization and Software Defined Networking (SDN) are pulling more and more networking into servers, while at the same time user-space networking software is out-performing kernel-space software.

At this point you may be interested in the inner technical details of Pflua and Pflua-bench. If so, I would recommend reading the latest post by my colleague Andy, who introduces the project from a great compiler hacker's perspective. If you are in a hurry, I would highlight the following points:

  • Pflua implements two compilation pipelines or execution engines. It is able to generate Lua code starting either from a pflang expression or from Berkeley Packet Filter (BPF) bytecode. With the first engine you get great flexibility to craft complex, expert filters. Moreover, the resulting Lua filters are free from some of BPF's limitations and constraints, such as extra bounds checks or converting to host byte order.
  • Pflua-bench compares 5 pflang implementations: the user-space BPF interpreter from libpcap (libpcap), the old Linux kernel-space BPF compiler (linux-bpf), the new Linux kernel-space BPF compiler (linux-ebpf), BPF bytecode cross-compiled to Lua (bpf) and pflang compiled directly to Lua (Pflua). You can see our benchmarking results and comparative analysis here.
  • Pflua seems to be an acceptable implementation of pflang and, in many circumstances, Pflua is the fastest pflang implementation by a long shot.

As mentioned, Pflua was developed for Snabb GmbH around an open source virtualized Ethernet networking stack, and it has the potential to become a high performance packet filtering toolkit for SDN solutions (in both the forwarding and control planes).

We are incubating this project in Igalia. Feel free to follow the development and drop us a mail if you want to support this project or you are just using it!

by Javier at September 11, 2014 10:00 PM

September 08, 2014

Javier Fernández

Box Alignment and Grid Layout

As some of my readers already know, Igalia and Bloomberg are collaborating on the implementation of the Grid Layout specification for the Blink/Chromium and WebKit web engines. As part of this assignment, I had the opportunity to review and contribute to the implementation of another feature I consider quite useful for the web: the CSS Box Alignment Module (level 3).

The Box Alignment specification was designed to generalize the alignment behavior of boxes within their containers, which is nowadays defined across multiple specifications. Several layout models are affected by this new specification: block, table, flex and grid. This post is about how it affects the Grid Layout implementation.

I think it is a good idea to begin my exposition with a brief introduction of some concepts related to alignment and CSS Writing Modes, which I consider quite relevant for understanding the implications of this specification for the Grid Layout implementation and, more importantly, for realizing its potential.

Examples are mandatory when analyzing W3C specifications; personally, I can't see all the angles and implications of a feature described in a specification without proper examples, both visual and in source code.

Finally, I'd like to conclude my article from a development angle, describing some interesting implementation details and technical challenges I faced while working on both the Blink and WebKit web engines. Also, and perhaps more interestingly, the ones I haven't solved yet and am still working on. As always, comments and feedback are really welcome.

Introduction to Box Alignment and Writing-Modes

From the CSS Box Alignment specification:

features of CSS relating to the alignment of boxes within their containers in the various CSS box layout models: block layout, table layout, flex layout, and grid layout.

From the CSS Writing Modes specification:

CSS support for various international writing modes, such as left-to-right (e.g. Latin or Indic), right-to-left (e.g. Hebrew or Arabic), bidirectional (e.g. mixed Latin and Arabic) and vertical (e.g. Asian scripts).

In order to get a better understanding of alignment, some abstract dimensional and directional terms should be explained and taken into account. I'm going to briefly describe the ones I consider most relevant for my exposition; more detailed definitions can be found in the Abstract Box Terminology section of the specification.

There are three sets of directional terms in CSS:

  • physical – Interpreted relative to the page, independent of writing mode. The physical directions are left, right, top, and bottom
  • flow-relative – Interpreted relative to the flow of content. The flow-relative directions are start and end, or block-start, block-end, inline-start, and inline-end if the dimension is also ambiguous.
  • line-relative – Interpreted relative to the orientation of the line box. The line-relative directions are line-left, line-right, line-over, and line-under.

The abstract dimensions are defined below:

  • block dimension – The dimension perpendicular to the flow of text within a line, i.e. the vertical dimension in horizontal writing modes, and the horizontal dimension in vertical writing modes.
  • inline dimension – The dimension parallel to the flow of text within a line, i.e. the horizontal dimension in horizontal writing modes, and the vertical dimension in vertical writing modes.
  • block axis – The axis in the block dimension, i.e. the vertical axis in horizontal writing modes and the horizontal axis in vertical writing modes.
  • inline axis - The axis in the inline dimension, i.e. the horizontal axis in horizontal writing modes and the vertical axis in vertical writing modes.
  • extent or logical height - A measurement in the block dimension: refers to the physical height (vertical dimension) in horizontal writing modes, and to the physical width (horizontal dimension) in vertical writing modes.
  • measure or logical width - A measurement in the inline dimension: refers to the physical width (horizontal dimension) in horizontal writing modes, and to the physical height (vertical dimension) in vertical writing modes. (The term measure derives from its use in typography.)

Then, there are flow-relative and line-relative directions. For the time being, I’ll consider only flow-relative directions terms since they are more relevant for discussing alignment issues.

  • block-start - The side that comes earlier in the block progression, as determined by the writing-mode property: the physical top in horizontal-tb mode, the right in vertical-rl, and the left in vertical-lr.
  • block-end - The side opposite block-start.
  • inline-start - The side from which text of the inline base direction would start. For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side.
  • inline-end - The side opposite inline-start.

writing-modes

So now that we have defined the box edges and flow direction concepts, we can review how they are used when defining the alignment properties and values inside a Grid Layout, which can be defined along two axes:

  • which dimension they apply to (inline vs. stacking)
  • whether they control the position of the box within its parent, or the box’s content within itself.

alignment-properties

Regarding the alignment values, there are two concepts that are important to understand:

  • alignment subject - The alignment subject is the thing or things being aligned by the property. For justify-self and align-self, the alignment subject is the margin box of the box the property is set on. For justify-content and align-content, the alignment subject is defined by the layout mode.
  • alignment container - The alignment container is the rectangle that the alignment subject is aligned within. This is defined by the layout mode, but is usually the alignment subject’s containing block.

Also, there are several kind of alignment behaviors:

  • Positional Alignment - specify a position for an alignment subject with respect to its alignment container.
  • Baseline Alignment - form of positional alignment that aligns multiple alignment subjects within a shared alignment context (such as cells within a row or column) by matching up their alignment baselines.
  • Distributed Alignment - used by justify-content and align-content to distribute the items in the alignment subject evenly between the start and end edges of the alignment container.
  • Overflow Alignment - when the alignment subject is larger than the alignment container, it will overflow. To help combat this problem, an overflow alignment mode can be explicitly specified.

At the time of this writing, only Positional Alignment is implemented so I’ll focus on those values in the rest of the article. I’m still working on implementing the specification, though, so there will be time to talk about the other values in future posts.

  • center - Centers the alignment subject within its alignment container.
  • start - Aligns the alignment subject to be flush with the alignment container’s start edge.
  • end - Aligns the alignment subject to be flush with the alignment container’s end edge.
  • self-start - Aligns the alignment subject to be flush with the edge of the alignment container corresponding to the alignment subject’s start side. If the writing modes of the alignment subject and the alignment container are orthogonal, this value computes to start.
  • self-end - Aligns the alignment subject to be flush with the edge of the alignment container corresponding to the alignment subject’s end side. If the writing modes of the alignment subject and the alignment container are orthogonal, this value computes to end.
  • left - Aligns the alignment subject to be flush with the alignment container’s line-left edge. If the property’s axis is not parallel with the inline axis, this value computes to start.
  • right - Aligns the alignment subject to be flush with the alignment container’s line-right edge. If the property’s axis is not parallel with the inline axis, this value computes to start.

So, after this introduction and with all these concepts in mind, it's now time to get hands-on with the Grid Layout implementation of the Box Alignment specification. As mentioned before, I'll try to use as many examples as possible.

Aligning items inside a Grid Layout

Before going into details with source code and examples, I'd like to summarize most of the concepts described above with some pretty simple diagrams:

2×2 Grid Layout (LTR)

grid-alignment-ltr

2×2 Grid Layout (RTL)

grid-alignment-rtl

The diagram below illustrates how items are placed inside the grid using different writing modes:

grid-writing-modes

At this point, some real examples would help to understand how the CSS alignment properties work on Grid Layout and why they are so important to get all the potential behind this new layout model.

Let’s consider this basic stylesheet which will be used in the examples from now on:

<style>
  .grid {
      grid-auto-columns: 100px;
      grid-auto-rows: 200px;
      width: -webkit-fit-content;
      margin-bottom: 20px;
  }
   .item {
      width: 20px;
      height: 40px;
  }
   .content {
      width: 10px;
      height: 20px;
      background: white;
  }
   .verticalRL {
      -webkit-writing-mode: vertical-rl;
  }
   .verticalLR {
      -webkit-writing-mode: vertical-lr;
  }
   .horizontalBT {
      -webkit-writing-mode: horizontal-bt;
  }
   .directionRTL {
      direction: rtl;
  }
</style>

The item style will be used for the grid items, while the content style will be applied to the elements placed inside each grid item. There are also writing-mode related styles, which will be useful later to experiment with different flow and text directions.

In the first example we will center all the cells content so we can have a fully aligned grid, which is particularly interesting for many web applications.

<div class="grid" style="align-items: center; 
                         justify-items: center">
  <div class="cell row1-column1">
    <div class="item"></div>
  </div>
  <div class="cell row1-column2">
    <div class="item"></div>
  </div>
  <div class="cell row2-column1">
    <div class="item"></div>
  </div>
  <div class="cell row2-column2">
    <div class="item"></div>
  </div>
</div>
grid-alignment-example1

In the next example we will illustrate how to use all the Positional Alignment values so we can place nine items in the same grid cell.

 
<div class="grid">
  <div class="cell row1-column1"
     style="align-self: start; justify-self: start;">
    <div class="item"></div>
  </div>
  <div class="cell row1-column1"
     style="align-self: center; justify-self: start;">
    <div class="item"></div>
  </div>
  <div class="cell row1-column1"
     style="align-self: end; justify-self: start;">
    <div class="item"></div>
  </div>
  <div class="cell row1-column1"
     style="align-self: start; justify-self: center;">
    <div class="item"></div>
  </div>
  <div class="cell row1-column1"
     style="align-self: center; justify-self: center;">
    <div class="item"></div>
  </div>
  <div class="cell row1-column1"
     style="align-self: end; justify-self: center;">
    <div class="item"></div>
  </div>
  <div class="cell row1-column1"
     style="align-self: start; justify-self: end;">
    <div class="item"></div>
  </div>
  <div class="cell row1-column1"
     style="align-self: center; justify-self: end;">
    <div class="item"></div>
  </div>
  <div class="cell row1-column1"
     style="align-self: end; justify-self: end;">
    <div class="item"></div>
  </div>
</div>
grid-alignment-example2

Let's start playing with the inline and block-flow directions and see how they affect the different Positional Alignment values. I'll start with the inline direction, which affects only the justify-xxx set of CSS properties.

<div class="grid" style="align-items: self-start; justify-items: self-start">
  <div class="cell row1-column1">
    <div class="item"></div>
  </div>
  <div class="cell row1-column2">
    <div class="item"></div>
  </div>
  <div class="cell row2-column1">
    <div class="item"></div>
  </div>
  <div class="cell row2-column2">
    <div class="item"></div>
  </div>
</div>
Direction LTR Direction RTL
grid-alignment-example3 grid-alignment-example4

The writing-mode CSS property applies to the block-flow direction, hence the align-xxx properties are the ones affected. In this case, orthogonal writing-modes can be specified in the HTML source code; however, these use cases are not yet fully supported by the current implementation of Grid Layout.

<div class="grid"
      style="align-items: self-start; 
             justify-items: self-start">
  <div class="cell row1-column1">
    <div class="item"></div>
  </div>
  <div class="cell row1-column2">
    <div class="item"></div>
  </div>
  <div class="cell row2-column1">
    <div class="item"></div>
  </div>
  <div class="cell row2-column2">
    <div class="item"></div>
  </div>
</div>
grid-alignment-example3
Vertical LR Vertical RL
grid-alignment-example5 grid-alignment-example6

Technical challenges, accomplished and to be faced

Implementing the Box Alignment specification has been a long task and there is still quite a lot of work ahead for both the WebKit and Blink/Chromium web engines. Perhaps one of the most tedious issues was the definition of a couple of new CSS properties, justify-self and justify-items, which required touching several core components, from the CSS parser to the style builder and resolver and, finally, the rendering.

Another important technical challenge comes from the fact that the Box Alignment properties already present in both web engines were implemented as part of the Flexible Box specification. As mentioned before in this post, the Box Alignment specification aims to generalize the alignment behavior for several layout models, hence these properties are not tied to the Flexible Box implementation anymore; this led to several technical issues, as I'll explain later.

The patch implemented for issue 333423005 is a good example of the files to touch and the logic to add in order to implement a new CSS property in Blink/Chromium. There is similar work to be done in the WebKit web engine; at the time of this writing the similarities are still big, even though some parts have changed considerably, like the CSS parsing and style builder logic. As an example, see the patch implemented in bug 134419.

The following code is quite descriptive of the nature of the CSS Box Alignment properties and how they are applied during the style cascade:

void StyleAdjuster::adjustStyleForAlignment(RenderStyle& style, const RenderStyle& parentStyle)
{
    bool isFlexOrGrid = style.isDisplayFlexibleOrGridBox();
    bool absolutePositioned = style.position() == AbsolutePosition;
 
    // If the inherited value of justify-items includes the legacy keyword, 'auto'
    // computes to the inherited value.
    // Otherwise, auto computes to:
    //  - 'stretch' for flex containers and grid containers.
    //  - 'start' for everything else.
    if (style.justifyItems() == ItemPositionAuto) {
        if (parentStyle.justifyItemsPositionType() == LegacyPosition) {
            style.setJustifyItems(parentStyle.justifyItems());
            style.setJustifyItemsPositionType(parentStyle.justifyItemsPositionType());
        } else {
            style.setJustifyItems(isFlexOrGrid ? ItemPositionStretch : ItemPositionStart);
        }
    }
 
    // The 'auto' keyword computes to 'stretch' on absolutely-positioned elements,
    // and to the computed value of justify-items on the parent (minus
    // any 'legacy' keywords) on all other boxes (to be resolved during the layout).
    if ((style.justifySelf() == ItemPositionAuto) && absolutePositioned)
        style.setJustifySelf(ItemPositionStretch);
 
    // The 'auto' keyword computes to:
    //  - 'stretch' for flex containers and grid containers,
    //  - 'start' for everything else.
    if (style.alignItems() == ItemPositionAuto)
        style.setAlignItems(isFlexOrGrid ? ItemPositionStretch : ItemPositionStart);
 
    // The 'auto' keyword computes to 'stretch' on absolutely-positioned elements,
    // and to the computed value of align-items on the parent (minus
    // any 'legacy' keywords) on all other boxes (to be resolved during the layout).
    if ((style.alignSelf() == ItemPositionAuto) && absolutePositioned)
        style.setAlignSelf(ItemPositionStretch);
}

The WebKit web engine implements the same logic in the StyleResolver class; the StyleAdjuster class is just a helper class defined in the Blink/Chromium engine to assist the StyleResolver logic during the style cascade in order to make some final adjustments.

Issue 297483005 implements the align-self CSS property support in Grid Layout; the following code, extracted from that patch, is a good example of how alignment interacts with the grid tracks.

LayoutUnit RenderGrid::rowPositionForChild(const RenderBox* child) const
{
    bool hasOrthogonalWritingMode = child->isHorizontalWritingMode() != isHorizontalWritingMode();
    ItemPosition alignSelf = resolveAlignment(style(), child->style());
 
    switch (alignSelf) {
    case ItemPositionSelfStart:
        // If orthogonal writing-modes, this computes to 'Start'.
        // FIXME: grid track sizing and positioning does not support orthogonal modes yet.
        if (hasOrthogonalWritingMode)
            return startOfRowForChild(child);
 
        // self-start is based on the child's block axis direction. That's why we need to check against the grid container's block flow.
        if (child->style()->writingMode() != style()->writingMode())
            return endOfRowForChild(child);
 
        return startOfRowForChild(child);
    case ItemPositionSelfEnd:
        // If orthogonal writing-modes, this computes to 'End'.
        // FIXME: grid track sizing and positioning does not support orthogonal modes yet.
        if (hasOrthogonalWritingMode)
            return endOfRowForChild(child);
 
        // self-end is based on the child's block axis direction. That's why we need to check against the grid container's block flow.
        if (child->style()->writingMode() != style()->writingMode())
            return startOfRowForChild(child);
 
        return endOfRowForChild(child);
 
    case ItemPositionLeft:
        // orthogonal modes make property and inline axes to be parallel, but in any case
        // this is always equivalent to 'Start'.
        //
        // self-align's axis is never parallel to the inline axis, except in orthogonal
        // writing-mode, so this is equivalent to 'Start'.
        return startOfRowForChild(child);
 
    case ItemPositionRight:
        // orthogonal modes make property and inline axes to be parallel.
        // FIXME: grid track sizing and positioning does not support orthogonal modes yet.
        if (hasOrthogonalWritingMode)
            return endOfRowForChild(child);
 
        // self-align's axis is never parallel to the inline axis, except in orthogonal
        // writing-mode, so this is equivalent to 'Start'.
        return startOfRowForChild(child);
 
    case ItemPositionCenter:
        return centeredRowPositionForChild(child);
        // Only used in flex layout, for other layout, it's equivalent to 'Start'.
    case ItemPositionFlexStart:
    case ItemPositionStart:
        return startOfRowForChild(child);
        // Only used in flex layout, for other layout, it's equivalent to 'End'.
    case ItemPositionFlexEnd:
    case ItemPositionEnd:
        return endOfRowForChild(child);
    case ItemPositionStretch:
        // FIXME: Implement the Stretch value. For now, we always start align the child.
        return startOfRowForChild(child);
    case ItemPositionBaseline:
    case ItemPositionLastBaseline:
        // FIXME: Implement the ItemPositionBaseline value. For now, we always start align the child.
        return startOfRowForChild(child);
    case ItemPositionAuto:
        break;
    }
 
    ASSERT_NOT_REACHED();
    return 0;
}

The resolveAlignment function call deserves a special mention, since it leads to the open issues I'm still working on. The Box Alignment specification states that the auto values must be resolved to either stretch or start, depending on the kind of element. This is theoretically performed during the style cascade, so it shouldn't be necessary to resolve it again at the rendering stage. The code is pretty simple:

static ItemPosition resolveAlignment(const RenderStyle* parentStyle, const RenderStyle* childStyle)
{
    ItemPosition align = childStyle->alignSelf();
    // The auto keyword computes to the parent's align-items computed value, or to "stretch", if not set or "auto".
    if (align == ItemPositionAuto)
        align = (parentStyle->alignItems() == ItemPositionAuto) ? ItemPositionStretch : parentStyle->alignItems();
    return align;
}

The RenderFlexibleBox implementation has to define similar logic and, more importantly, the default value of all the Box Alignment properties has been changed to auto, instead of stretch as stated in the Flexible Box specification.

To make things even more complicated, many HTML elements are rendered by RenderFlexibleBox objects as an implementation decision, without the corresponding display value set to indicate that assumption. This causes many issues and layout test failures, since the resolved value for auto depends on the kind of element, which is defined by its display property value. Additionally, there are also problems with the anonymous render objects added to the tree in certain implementations.

Both WebKit and Blink/Chromium are affected by these issues. MathML is a good example in the WebKit engine, since most of its render objects are implemented using a RenderFlexibleBox; it also assigns and manipulates the align-{self, items} properties during layout. The RenderFullScreen object is a source of problems for the Blink/Chromium web engine in this regard; it uses a RenderFlexibleBox because of its stretch default behavior, which is not the case anymore according to the Box Alignment specification.

I'm still working on these issues in both web engines; this issue tries to address part of the problems in Blink/Chromium, and there is a similar bug in the WebKit engine with similar challenges.

Another pending issue present in both web engines is the lack of support for different writing-modes. Even though the Grid Layout logic is prepared to support them, it's still buggy and for certain combinations it does not produce the expected outcome.

I'd like to finish this post by pointing out that anybody can follow the progress of the Box Alignment spec implementation for Grid Layout by tracking these bugs in whichever web engine they are more interested in:

  • Blink/Chromium
    • bug 249451: [CSS Grid Layout] Implement row-axis Alignment
    • bug 376823: [CSS Grid Layout] Implement column-axis Alignment
  • WebKit
    • bug 133224 – [meta] [CSS Grid Layout] Implement column-axis Alignment
    • bug 133222 – [meta] [CSS Grid Layout] Implement row-axis Alignment

This work wouldn't be possible without the support of Bloomberg and Igalia, who are committed to providing a better web platform for developers.

Igalia & Bloomberg logos

Igalia and Bloomberg working to build a better web platform

by jfernandez at September 08, 2014 01:48 PM

Eduardo Lima Mitev

Drawing Web content with OpenGL (ES 3.0) instanced rendering

This is a follow-up article about my ongoing research on Web content rendering using aggressive batching and merging of draw operations, together with OpenGL (ES 3.0) instanced rendering.

In a previous post, I discussed how relying on the Web engine's layer tree to figure out non-overlapping content (layers) of a Web page would (theoretically) allow an OpenGL based rasterizer to ignore the order of the drawing operations. This would allow the rasterizer to group together drawing of similar geometry and submit it efficiently to the GPU using instanced rendering.

I also presented some basic examples and comparisons of this technique with Skia, a popular 2D rasterizer, giving some hints on how much we can accelerate rendering if the overhead of the OpenGL API calls is reduced by using the instanced rendering technique.

However, this idea remained to be validated for real cases and on real hardware, especially because of the complexity and pressure imposed on shader programs, which now become responsible for de-referencing the attributes of each batched geometry and rendering it correctly.

Also, there are potential API changes in the rasterizer that could make this technique impractical to implement in any existing Web engine without significant changes in the rendering process.

To try to keep this article short and focused, today I want to talk only about my latest experiments rendering some fairly complex Web elements using this technique, and leave the discussion about performance to future entries.

Everything is a rectangle

As mentioned in my previous article, almost everything in a Web page can be rendered with a rectangle primitive.

Web pages are mostly character glyphs, which today’s rasterizers normally draw by texture mapping a pre-rendered image of the glyph onto a rectangular area. Then you have boxes, images, shadows, lines, etc; which can all be drawn with a rectangle with the correct layout, transformation and/or texturing.

Primitives that are not rectangles are mostly seen in the element’s border specification, where you have borders with radius, and different styles: double, dotted, grooved, etc. There is a rich set of primitives coming from the combination of features in the borders spec alone.

There is also the Canvas 2D and SVG APIs, which are created specifically for arbitrary 2D content. The technique I’m discussing here purposely ignores these APIs and focuses on accelerating the rest.

In practice, however, these non-rectangular geometries account for just a tiny fraction of the typical rendering of a Web page, which allows me to effectively call them “exceptions”.

The approach I’m currently following assumes everything in a Web page is a rectangle, and all non-rectangular geometry is treated as exceptions and handled differently on shader code.

This means I no longer need to ignore the ordering problem since I always batch a rectangle for every single draw operation, and then render all rectangles in order. This introduces a dramatic change compared to the previous approach I discussed. Now I can (partially) implement this technique without changing the API of existing rasterizers. I say “partially” because to take full advantage of the performance gain, some API changes would be desired.

Drawing non-rectangular geometry using rectangles

So, how do we deal with these exceptions? Remember that we want to draw only with rectangles so that no operation could ever break our batch, if we want to take full advantage of the instanced rendering acceleration.

There are 3 ways of rendering non-rectangular geometry using rectangles:

  • 1. Using a geometry shader:

    This is the most elegant solution, and it looks like it was designed for this case. But since geometry shaders aren't yet widely deployed, I will not put much emphasis on this option here, although we need to follow its evolution closely.

  • 2. Degenerating rectangles:

    This is basically to turn a rectangle into a triangle by degenerating one of its vertices. Then, with a set of degenerated rectangles one could draw any arbitrary geometry as we do today with triangles.

  • 3. Drawing geometry in the fragment shader:

    This sounds like a bad idea, and it is definitely a bad idea! However, given the small and limited set of cases that we need to consider, it can be feasible.

I'm currently experimenting with 3). You might ask why, since it looks like the worst option. The reason is that going for 2), degenerating rectangles, seems overkill at this point, lacking a deeper understanding of exactly what non-rectangular geometry we will ever need. Implementing generic rectangle degeneration just for a tiny set of cases would have been a bad initial choice and a waste of time.

So I decided to explore first the option of drawing these exceptions in the fragment shader and see how far I could go in terms of shader code complexity and performance (un)loss.

Next, I will show some examples of simple Web features rendered this way.

Experiments

The setup:

While my previous screen-casts were run on my work laptop with a powerful Haswell GPU, one of my goals was to focus on mobile devices, so I started developing on an Arndale board I happen to have around. Details of the exact setup are out of scope for now; I will just mention that the board is running a Linaro distribution with the official Mali T604 drivers by ARM.

My Arndale board

The following is a video I assembled to show the different examples running on the Arndale board (and on my laptop at the same time). This time I had to record with an external camera instead of screen-casting to avoid interfering with the performance, so please bear with my camera-in-hand video recording skills.



This video file is also available on Vimeo.

I won't talk about performance now, since I plan to cover that in future deliveries. Suffice it to say that the performance is pretty good, comparable to my laptop in most of the examples. Also, there are a lot of simple, known optimizations that I have not done yet because I'm focusing on validating the method first.

One important thing to note is that when drawing is done in a fragment shader, you cannot benefit from multi-sampling anti-aliasing (MSAA), since sampling occurs at an earlier stage. Hence, you have to implement anti-aliasing yourself. In this case, I implemented a simple distance-to-edge linear anti-aliasing, and to my surprise, the end result is much better than the 8-sample MSAA I was trying on my Haswell laptop before, and it is also faster.

On a related note, I have found that MSAA does not give me much when rendering character glyphs (the majority of the content), since they already come anti-aliased from FreeType2, while MSAA slows down the rendering of the entire scene for every single frame.

I continue to dump the code from this research into a personal repository on GitHub. Go take a look if you are interested in the prototyping of these experiments.

Conclusions and next steps

There is one important conclusion coming out of these experiments: the fact that the rasterizer is stateless makes it very inefficient to modify a single element in the scene.

By stateless I mean that it does not keep semantic information about the elements being drawn. For example, let's say I draw a rectangle in one frame, and in the next frame I want to draw the same rectangle somewhere else on the canvas. I already have a batch with all the elements of the scene happily stored in a vertex buffer object in GPU memory, and the rectangle in question is there somewhere. If I could keep the offset where that rectangle is in the batch, I could modify its attributes without having to drop and re-submit the entire buffer.

The solution: Moving to a scene graph. Web engines already implement a scene graph but at a higher level. Here I’m talking about a scene graph in the rasterizer itself, where nodes keep the offset of their attributes in the batch (layout, transformation, color, etc); and when you modify any of these attributes, only the deltas are uploaded to the GPU, rather than the whole batch.
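
To make the idea more concrete, here is a minimal sketch of what such a node update could look like with the OpenGL ES 3.0 API. This is not code from my prototype: the RectAttribs layout, the names and the assumption that all instance attributes live in a single buffer are made up for illustration. The point is simply that a node that remembers its byte offset can be updated with glBufferSubData without re-submitting the whole batch.

/* Illustrative sketch only: attribute layout and names are hypothetical. */
#include <GLES3/gl3.h>

typedef struct {
    float layout[4];     /* x, y, width, height */
    float color[4];      /* RGBA */
} RectAttribs;

typedef struct {
    GLintptr offset;     /* byte offset of this node's attributes in the instance VBO */
    RectAttribs attribs; /* CPU-side copy of the attributes */
} RectNode;

/* Upload only the delta for one node; the rest of the batch stays untouched. */
static void update_rect_node(GLuint instance_vbo, RectNode *node,
                             const RectAttribs *new_attribs)
{
    node->attribs = *new_attribs;
    glBindBuffer(GL_ARRAY_BUFFER, instance_vbo);
    glBufferSubData(GL_ARRAY_BUFFER, node->offset,
                    sizeof(RectAttribs), &node->attribs);
}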

I believe a scene graph approach has the potential to open a whole new set of opportunities for acceleration, especially for transitions, animations, and scrolling.

And that’s exciting!

Apart from this, I also want to:

  • Benchmark! Set up a platform for reliable benchmarking and performance comparison with Skia/Cairo.
  • Take a subset of this technique and test it in Skia, behind the current API.
  • Validate the case of drawing drop shadows and multi-step gradient backgrounds.
  • Test in other different OpenGL ES 3.0 implementations (and more devices!).

Let us not forget the fight we are fighting: Web applications must be as fast as native. I truly think we can do it.

by elima at September 08, 2014 01:16 PM

Iago Toral

An eagle eye view into the Mesa source tree

Recap

My last post introduced Mesa’s loader as the module that takes care of auto-selecting the right driver for our hardware. If the loader fails to find a suitable hardware driver it will fall back to a software driver, but we can also force this situation ourselves, which may come in handy in some scenarios. We also took a quick look at the glxinfo tool that we can use to query the capabilities and features exposed by the selected driver.

Today's topic is a quick overview of the Mesa source code tree, which will help us identify the parts of the code that are relevant to our interests, depending on the driver and/or the feature we intend to work on.

Browsing the source code

First off, there is already some documentation on this topic available on the Mesa 3D website, which is a good place to start. Since that already gives some insight into what goes into each part of the repository, I'll focus on complementing that information with a little more detail on some of the most important parts I have interacted with so far:

  • In src/egl/ we have the implementation of the EGL standard. If you are working on EGL-specific features, tracking down an EGL-specific problem or you are simply curious about how EGL links into the GL implementation, this is the place you want to visit. This includes the EGL implementations for the X11, DRM and Wayland platforms.
  • In src/glx/ we have the OpenGL bits relating specifically to X11 platforms, known as GLX. So if you are working on the GLX layer, this is the place to go. Here there is all the stuff that takes care of interacting with the XServer, the client-side DRI implementation, etc.
  • src/glsl/ contains a critical aspect of Mesa: the GLSL compiler used by all Mesa drivers. It includes a GLSL parser, the definition of the Mesa IR, also referred to as GLSL IR, used to represent shader programs internally, the shader linker and various optimization passes that operate on the Mesa IR. The resulting Mesa IR produced by the GLSL compiler is then consumed by the various drivers which transform it into native GPU code that can be loaded and run in the hardware.
  • src/mesa/main/ contains the core Mesa elements. This includes hardware-independent views of core objects like textures, buffers, vertex array objects, the OpenGL context, etc as well as basic infrastructure, like linked lists.
  • src/mesa/drivers/ contains the actual classic drivers (not Gallium). DRI drivers in particular go into src/mesa/drivers/dri. For example the Intel i965 driver goes into src/mesa/drivers/dri/i965. The code here is, for the most part, very specific to the underlying hardware platforms.
  • src/mesa/swrast*/ and src/mesa/tnl*/ provide software implementations for things like rasterization or vertex transforms. Used by some software drivers and also by some hardware drivers to implement certain features for which they don’t have hardware support or for which hardware support is not yet available in the driver. For example, the i965 driver implements operations on the accumulation and selection buffers in software via these modules.
  • src/mesa/vbo/ is another important module. Across its various versions, OpenGL has specified many ways in which a program can tell OpenGL about its vertex data, from using functions of the glVertex*() family inside glBegin()/glEnd() blocks, to things like vertex arrays, vertex array objects, display lists, etc… The drivers, however, do not need to deal with all this: Mesa makes it so that they always receive their vertex data as a collection of vertex arrays, significantly reducing complexity on the side of the driver implementer. This is the module that takes care of managing all this, so no matter what type of drawing your GL program is doing or how it specifies its vertex data, it will always go through this module before it reaches the driver.
  • src/loader/, as we have seen in my previous post, contains the Mesa driver loader, which provides the logic necessary to decide which Mesa driver is the right one to use for a specific hardware so that Mesa’s libGL.so can auto-select the right driver when loaded.
  • src/gallium/ contains the Gallium3D framework implementation. If, like me, you only work on a classic driver, you don’t need to care about the contents of this at all. If you are working on Gallium drivers however, this is the place where you will find the various Gallium drivers in development (inside src/gallium/drivers/), like the various Gallium ATI/AMD drivers, Nouveau or the LLVM based software driver (llvmpipe) and the Gallium state trackers.

So with this in mind, one should have enough information to know where to start looking for something specific:

  • If we are interested in how vertex data provided to OpenGL is manipulated and uploaded to the GPU, the vbo module is probably the right place to look.
  • If we are looking to work on a specific aspect of a concrete hardware driver, we should go to the corresponding directory in src/mesa/drivers/ if it is a classic driver, or src/gallium/drivers if it is a Gallium driver.
  • If we want to know about how Mesa, the framework, abstracts various OpenGL concepts like textures, vertex array objects, shader programs, etc. we should look into src/mesa/main/.
  • If we are interested in the platform specific support, be it EGL or GLX, we want to look into src/egl or src/glx.
  • If we are interested in the GLSL implementation, which involves anything from the compiler to the intermediary IR and the various optimization passes, we need to look into src/glsl/.

Coming up next

So now that we have an eagle eye view of the contents of the Mesa repository, let's see how we can prepare a development environment so we can start hacking on some stuff. I'll cover this in my next post.

by Iago Toral at September 08, 2014 11:59 AM

September 04, 2014

Iago Toral

Driver loading and querying in Mesa

Recap

In my previous post I explained that Mesa is a framework for OpenGL driver development. As such, it provides code that can be reused by multiple driver implementations. This code is, of course, hardware agnostic, but frees driver developers from doing a significant part of the work. The framework also provides hooks for developers to add the bits of code that deal with the actual hardware. This design allows multiple drivers to co-exist and share a significant amount of code.

I also explained that among the various drivers that Mesa provides, we can find both hardware drivers that take advantage of a specific GPU and software drivers that are implemented entirely in software (so they run on the CPU and do not depend on a specific GPU). The latter are obviously slower but, as I discussed, they may come in handy in some scenarios.

Driver selection

So, Mesa provides multiple drivers, but how does it select the one that fits the requirements of a specific system?

You have probably noticed that Mesa is deployed in multiple packages. In my Ubuntu system, the one that deploys the DRI drivers is libgl1-mesa-dri:amd64. If you check its contents you will see that this package installs OpenGL drivers for various GPUs:

# dpkg -L libgl1-mesa-dri:amd64 
(...)
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_r600.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_nouveau.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_vmwgfx.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_r300.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_swrast.so
/usr/lib/x86_64-linux-gnu/dri/i915_dri.so
/usr/lib/x86_64-linux-gnu/dri/i965_dri.so
/usr/lib/x86_64-linux-gnu/dri/r200_dri.so
/usr/lib/x86_64-linux-gnu/dri/r600_dri.so
/usr/lib/x86_64-linux-gnu/dri/radeon_dri.so
/usr/lib/x86_64-linux-gnu/dri/r300_dri.so
/usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
/usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
/usr/lib/x86_64-linux-gnu/dri/nouveau_vieux_dri.so
/usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
/usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
(...)

Since I have a relatively recent Intel GPU, the driver I need is the one provided in i965_dri.so. So how do we tell Mesa that this is the one we need? Well, the answer is that we don't: Mesa is smart enough to know which driver is the right one for our GPU, and selects it automatically when you load libGL.so. The part of Mesa that takes care of this is called the 'loader'.

You can, however, point Mesa to look for suitable drivers in a specific directory other than the default, or force it to use a software driver using various environment variables.
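
For example, the LIBGL_DRIVERS_PATH environment variable tells the loader where to look for the driver binaries, which comes in handy when you want to test a locally built Mesa without installing it system-wide (the path below is just an illustration):

# LIBGL_DRIVERS_PATH=/home/user/mesa-install/lib/dri glxgears

We will see how to force a software driver a bit further below.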

What driver is Mesa actually loading?

If you want to know exactly what driver Mesa is loading, you can instruct it to dump this (and other) information to stderr via the LIBGL_DEBUG environment variable:

# LIBGL_DEBUG=verbose glxgears 
libGL: screen 0 does not appear to be DRI3 capable
libGL: pci id for fd 4: 8086:0126, driver i965
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/i965_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/i965_dri.so

So we see that Mesa checks the existing hardware and realizes that the i965 driver is the one to use, so it first attempts to load the TLS version of that driver and, since I don’t have the TLS version, falls back to the normal version, which I do have.

The code in src/loader/loader.c (loader_get_driver_for_fd) is the one responsible for detecting the right driver to use (i965 in my case). This receives a device fd as an input parameter, acquired previously by calling DRI2Connect() as part of the DRI bring-up process. Then the actual driver file is loaded in glx/dri_common.c (driOpenDriver).

We can also obtain a more descriptive indication of the driver we are loading by using the glxinfo program that comes with the mesa-utils package:

# glxinfo | grep -i "opengl renderer"
OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Mobile 

This tells me that I am using the Intel hardware driver, and it also shares information related with the specific Intel GPU I have (SandyBridge).

Forcing a software driver

I have mentioned that having software drivers available comes in handy at times, but how do we tell the loader to use them? Mesa provides an environment variable that we can set for this purpose, so switching between a hardware driver and a software one is very easy to do:

# LIBGL_DEBUG=verbose LIBGL_ALWAYS_SOFTWARE=1 glxgears 
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so

As we can see, setting LIBGL_ALWAYS_SOFTWARE will make the loader select a software driver (swrast).

If I force a software driver and call glxinfo like I did before, this is what I get:

# LIBGL_ALWAYS_SOFTWARE=1 glxinfo | grep -i "opengl renderer"
OpenGL renderer string: Software Rasterizer

So it is clear that I am using a software driver in this case.

Querying the driver for OpenGL features

The glxinfo program also comes in handy to obtain information about the specific OpenGL features implemented by the driver. If you want to check if the Mesa driver for your hardware implements a specific OpenGL extension you can inspect the output of glxinfo and look for that extension:

# glxinfo | grep GL_ARB_texture_multisample

You can also ask glxinfo to include hardware limits for certain OpenGL features by including the -l switch. For example:

# glxinfo -l | grep GL_MAX_TEXTURE_SIZE
GL_MAX_TEXTURE_SIZE = 8192

Coming up next

In my next posts I will cover the directory structure of the Mesa repository, identifying its main modules, which should give Mesa newcomers some guidance as to where to look when they need to find the code that deals with something specific. We will then discuss how modern 3D hardware has changed the way GPU drivers are developed and explain how a modern 3D graphics pipeline works, which should pave the way to start looking into the real guts of Mesa: the implementation of shaders.

by Iago Toral at September 04, 2014 11:43 AM

September 02, 2014

Andy Wingo

high-performance packet filtering with pflua

Greets! I'm delighted to be able to announce the release of Pflua, a high-performance packet filtering toolkit written in Lua.

Pflua implements the well-known libpcap packet filtering language, which we call pflang for short.

Unlike other packet filtering toolkits, which tend to use the libpcap library to compile pflang expressions to bytecode to be run by the kernel, Pflua is a completely new implementation of pflang.

why lua?

At this point, regular readers are asking themselves why this Schemer is hacking on a Lua project. The truth is that I've always been looking for an excuse to play with the LuaJIT high-performance Lua implementation.

LuaJIT is a tracing compiler, which is different from other JIT systems I have worked on in the past. Among other characteristics, tracing compilers only emit machine code for branches that are taken at run-time. Tracing seems a particularly appropriate strategy for the packet filtering use case, as you end up with linear machine code that reflects the shape of actual network traffic. This has the potential to be much faster than anything static compilation techniques can produce.

The other reason for using Lua was because it was an excuse to hack with Luke Gorrie, who for the past couple years has been building the Snabb Switch network appliance toolkit, also written in Lua. A common deployment environment for Snabb is within the host virtual machine of a virtualized server, with Snabb having CPU affinity and complete control over a high-performance 10Gbit NIC, which it then routes to guest VMs. The administrator of such an environment might want to apply filters on the kinds of traffic passing into and out of the guests. To this end, we plan on integrating Pflua into Snabb so as to provide a pleasant, expressive, high-performance filtering facility.

Given its high performance, it is also reasonable to deploy Pflua on gateway routers and load-balancers, within virtualized networking appliances.

implementation

Pflua compiles pflang expressions to Lua source code, which is then optimized at run-time to native machine code.

There are actually two compilation pipelines in Pflua. The main one is fairly traditional. First, a custom parser produces a high-level AST of a pflang filter expression. This AST is lowered to a primitive AST, with a limited set of operators and ways in which they can be combined. This representation is then exhaustively optimized, folding constants and tests, inferring ranges of expressions and packet offset values, hoisting assertions that post-dominate success continuations, etc. Finally, we residualize Lua source code, performing common subexpression elimination as we go.

For example, if we compile the simple pflang expression ip or ip6 with the default compilation pipeline, we get this Lua source code:

return function(P,length)
   if not (length >= 14) then return false end
   do
      local v1 = ffi.cast("uint16_t*", P+12)[0]
      if v1 == 8 then return true end
      do
         do return v1 == 56710 end
      end
   end
end

The other compilation pipeline starts with bytecode for the Berkeley packet filter VM. Pflua can load up the libpcap library and use it to compile a pflang expression to BPF. In any case, whether you start from raw BPF or from a pflang expression, the BPF is compiled directly to Lua source code, which LuaJIT can gnaw on as it pleases. Compiling ip or ip6 with this pipeline results in the following Lua code:

return function (P, length)
   local A = 0
   if 14 > length then return 0 end
   A = bit.bor(bit.lshift(P[12], 8), P[12+1])
   if (A==2048) then goto L2 end
   if not (A==34525) then goto L3 end
   ::L2::
   do return 65535 end
   ::L3::
   do return 0 end
   error("end of bpf")
end

We like the independence and optimization capabilities afforded by the native pflang pipeline. Pflua can hoist and eliminate bounds checks, whereas BPF is obligated to check that every packet access is valid. Also, Pflua can work on data in network byte order, whereas BPF must convert to host byte order. Both of these restrictions apply not only to Pflua's BPF pipeline, but also to all other implementations that use BPF (for example the interpreter in libpcap, as well as the JIT compilers in the BSD and Linux kernels).

However, though Pflua does a good job in implementing pflang, it is inevitable that there may be bugs or differences of implementation relative to what libpcap does. For that reason, the libpcap-to-bytecode pipeline can be a useful alternative in some cases.

performance

When Pflua hits the sweet spots of the LuaJIT compiler, performance screams.


(full image, analysis)

This synthetic benchmark runs over a packet capture of a ping flood between two machines and compares the following pflang implementations:

  1. libpcap: The user-space BPF interpreter from libpcap

  2. linux-bpf: The old Linux kernel-space BPF compiler from 2011. We have adapted this library to work as a loadable user-space module (source)

  3. linux-ebpf: The new Linux kernel-space BPF compiler from 2014, also adapted to user-space (source)

  4. bpf-lua: BPF bytecodes, cross-compiled to Lua by Pflua.

  5. pflua: Pflang compiled directly to Lua by Pflua.

To benchmark a pflang implementation, we use the implementation to run a set of pflang expressions over saved packet captures. The result is a corresponding set of benchmark scores measured in millions of packets per second (MPPS). The first set of results is thrown away as a warmup. After warmup, the run is repeated 50 times within the same process to get multiple result sets. Each run checks that the filter matches the expected number of packets, to verify that each implementation does the same thing, and also to ensure that the loop is not dead.

In all cases the same Lua program is used to drive the benchmark. We have tested a native C loop when driving libpcap and gotten similar results, so we consider that the LuaJIT interface to C is not a performance bottleneck. See the pflua-bench project for more on the benchmarking procedure and a more detailed analysis.

The graph above shows that Pflua can stream in packets from memory and run some simple pflang filters over them at close to the memory bandwidth of this machine (100 Gbit/s). Because all of the filters are actually faster than the accept-all case, probably due to work causing prefetching, we actually don't know how fast the filters themselves can run. In any case, in this ideal situation, we're running at a handful of nanoseconds per packet. Good times!


(full image, analysis)

It's impossible to make real-world tests right now, especially since we're running over packet captures and not within a network switch. However, we can get more realistic. In the above test, we run a few filters over a packet capture from wingolog.org, which mostly operates as a web server. Here we see again that Pflua beats all of the competition. Oddly, the new Linux JIT appears to fare marginally worse than the old one. I don't know why that would be.

Sadly, though, the last tests aren't running at that amazing flat-out speed we were seeing before. I spent days figuring out why that is, and that's part of the subject of my last section here.

on lua, on luajit

I implement programming languages for a living. That doesn't mean I know everything there is to know about everything, or that everything I think I know is actually true -- in particular, I was quite ignorant about trace compilers, as I had never worked with one, and I hardly knew anything about Lua at all. With all of those caveats, here are some ignorant first impressions of Lua and LuaJIT.

LuaJIT has a ridiculously fast startup time. It also compiles really quickly: under a minute. Neither of these should be important but they feel important. Of course, LuaJIT is not written in Lua, so it doesn't have the bootstrap challenges that Guile has; but still, a fast compilation is refreshing.

LuaJIT's FFI is great. Five stars, would program again.

As a compilation target, Lua is OK. On the plus side, it has goto and efficient bit operations over 32-bit numbers. However, and this is a huge downer, the result range of bit operations is the signed int32 range, not the unsigned range. This means that bit.band(0xffffffff, x) might be negative. No one in the history of programming has ever wanted this. There are sensible meanings for negative results to bit operations, but only if an argument was negative. Grr. Otherwise, Lua shares the same concerns as other languages whose numbers are defined as 64-bit doubles.

Sometimes people get upset that Lua starts its indexes (in "arrays" or strings) with 1 instead of 0. It's foreign to me, so it's sometimes a challenge, but it can work as well as anything else. The problem comes in when working with the LuaJIT FFI, which starts indexes with 0, leading me to make errors as I forget which kind of object I am working on.

As a language to implement compilers, Lua desperately misses a pattern matching facility. Otherwise, a number of small gripes but no big ones; tables and closures abound, which leads to relatively terse code.

Finally, how well does trace compilation work for this task? I offer the following graph.


(full image, analysis)

Here the tests are paired. The first test of a pair, for example the leftmost portrange 0-6000, will match most packets. The second test of a pair, for example the second-from-the-left portrange 0-5, will reject all packets. The generated Lua code will be very similar, except for some constants being different. See portrange-0-6000.md for an example.

The Pflua performance of these filters is very different: the one that matches is slower than the one that doesn't, even though in most cases the non-matching filter will have to do more work. For example, a non-matching filter probably checks both src and dst ports, whereas a successful one might not need to check the dst.

It hurts to see Pflua's performance be worse than the Linux JIT compilers, and even worse than libpcap at times. I scratched my head for a long time about this. The Lua code is fine, and actually looks much like the BPF code. I had taken a look at the generated assembly code for previous traces and it looked fine -- some things were not as good as they should be (e.g. a fair bit of conversions between integers and doubles, where these traces have no doubles), but things were OK. What changed?

Well. I captured the traces for portrange 0-6000 to a file, and dove in. Trace 66 contains the inner loop. It's interesting to see that there's a lot of dynamic checks in the beginning of the trace, although the loop itself is not bad (scroll down to see the word LOOP:), though with the double conversions I mentioned before.

It seems that trace 66 was captured for a packet whose src port was within range. Later, we end up compiling a second trace if the src port check fails: trace 67. The trace starts off with an absurd amount of loads and dynamic checks -- to a similar degree as trace 66, even though trace 66 dominates trace 67. It seems that there is a big penalty for transferring from one trace to another, even though they are both compiled.

Finally, once trace 67 is done -- and recall that all it has to do is check the destination port, and then update the counters from the inner loop -- it jumps back to the top of trace 66 instead of the top of the loop, repeating all of the dynamic checks in trace 66! I can only think this is a current deficiency of LuaJIT, and not a problem with trace compilation in general, although the amount of state transfer points to a lack of global analysis that you would get in a method JIT. I'm sure that values are being transferred that are actually dead.

This explains the good performance for the match-nothing cases: the first trace that gets compiled residualizes the loop expecting that all tests fail, and so only matching cases or variations incur the trace transfer-and-re-loop cost.

It could be that the Lua code that Pflua residualizes is in some way not idiomatic or not performant; tips in that regard are appreciated.

conclusion

I was going to pass some possible slogans by our marketing department, but we don't really have one, so I pass them on to you and you can tell me what you think:

  • "Pflua: A Totally Adequate Pflang Implementation"

  • "Pflua: Sometimes Amazing Performance!!!!1!!"

  • "Pflua: Organic Artisanal Network Packet Filtering"

Pflua was written by Igalians Diego Pino, Javier Muñoz, and myself for Snabb Gmbh, fine purveyors of high-performance networking solutions. If you are interested in getting Pflua in a Snabb context, we'd be happy to talk; drop a note to the snabb-devel forum. For Pflua in other contexts, file an issue or drop me a mail at wingo@igalia.com. Happy hackings with Pflua, the totally adequate pflang implementation!

by Andy Wingo at September 02, 2014 10:15 AM

August 27, 2014

Jacobo Aragunde

Speaking in the next LibreOffice conference

I’m happy to announce that I will be taking part in the 2014 edition of the LibreOffice Conference as a speaker. I’ll give an overview of the status of accessibility in our favorite productivity suite. Starting with an introduction to accessibility support and how applications are supposed to implement it, we will then look at the particular case of LibreOffice: which accessibility backends are implemented and how the architecture is designed to support multiple backends while maximizing code reuse.

The conference program looks hot too, and this time I’m particularly interested in hearing about the success cases that will be presented there, looking for ideas and lessons to apply to new deployments.

Igalia is one of the sponsors of the conference, taking our commitment to the LibreOffice project a step further. The company will also be sponsoring my flight and stay in Bern.

Last but not least, it will be great to meet the community members again, and to get to know those I haven’t yet met at previous conferences or hackfests. Looking forward to seeing you in Bern!

Igalia & LibreOffice

by Jacobo Aragunde Pérez at August 27, 2014 10:00 AM

August 25, 2014

Andy Wingo

revisiting common subexpression elimination in guile

A couple years ago I wrote about a common subexpression pass that I implemented in Guile 2.0.

To recap, Guile 2.0 has a global, interprocedural common subexpression elimination (CSE) pass.

In the context of compiler optimizations, "global" means that it works across basic block boundaries. Basic blocks are simple, linear segments of code without control-flow joins or branches. Working only within basic blocks is called "local". Working across basic blocks requires some form of understanding of how values can flow within the blocks, for example flow analysis.

"Interprocedural" means that Guile 2.0's CSE operates across closure boundaries. Guile 2.0's CSE is "context-insensitive", in the sense that any possible effect of a function is considered to occur at all call sites; there are newer CSE passes in the literature that separate effects of different call sites ("context-sensitive"), but that's not a Guile 2.0 thing. Being interprocedural was necessary for Guile 2.0, as its intermediate language could not represent (e.g.) loops directly.

The conclusion of my previous article was that although CSE could do cool things, in Guile 2.0 it was ultimately limited by the language that it operated on. Because the Tree-IL direct-style intermediate language didn't define order of evaluation, didn't give names to intermediate values, didn't have a way of explicitly representing loops and other kinds of first-order control flow, and couldn't precisely specify effects, the results, well, could have been better.

I know you all have been waiting for the last 27 months for an update, probably forgoing meaningful social interaction in the meantime because what if I posted a followup while you were gone? Be at ease, fictitious readers, because that day has finally come.

CSE over CPS

The upcoming Guile 2.2 has a more expressive language for the optimizer to work on, called continuation-passing style (CPS). CPS explicitly names all intermediate values and control-flow points, and can integrate nested functions into first-order control-flow via "contification". At the same time, the Guile 2.2 virtual machine no longer penalizes named values, which was another weak point of CSE in Guile 2.0. Additionally, the CPS intermediate language enables more fine-grained effects analysis.

All of these points mean that CSE has the possibility to work better in Guile 2.2 than in Guile 2.0, and indeed it does. The shape of the algorithm is a bit different, though, and I thought some compiler nerds might be interested in the details. I'll follow up in the next section with some things that the new CSE pass can do that the old one couldn't.

So, by way of comparison, the old CSE pass was a once-through depth-first visit of the nested expression tree. As the visit proceeded, the pass built up an "environment" of available expressions -- for example, that (car a) was evaluated and bound to b, and so on. This environment could be consulted to see if an expression was already present in the environment. If so, the environment would be traversed from most-recently-added to the found expression, to see if any intervening expression invalidated the result. Control-flow joins would cause recomputation of the environment, so that it only held valid values.

This simple strategy works for nested expressions without complex control-flow. CPS, on the other hand, can have loops and other control flow that Tree-IL cannot express, so for it to build up a set of "available expressions" requires a full-on flow analysis. So that's what the pass does: a flow analysis over the labelled expressions in a function to compute the set of "available expressions" for each label. A labelled expression a is available at label b if a dominates b, and no intervening expression could have invalidated the results. An expression invalidates a result if it may write to a memory location that the result may have read. The code, such as it is, may be found here.

Once you have the set of available expressions for a function, you can proceed to the elimination phase. First, you start by creating an "eliminated variable" map, which initially maps each variable to itself, and an "equivalent expressions" table, which maps "keys" to a set of labels and bound variables. Then you visit each expression in a function, again in topologically sorted order. For each expression, you compute a "key", which is some unique representation of an expression that can be compared by structural equality. Keys that compare as equal are equivalent, and are subject to elimination.

For example, consider a call to the add primitive with variables labelled b and c as arguments. Imagine that b maps to a in the eliminated variable table. The expression as a whole would then have a key representation as the list (primcall add a c). If this key is present in the equivalent expression table, you check to see if any of the equivalent labels is available at the current label. If so, hurrah! You mark the outputs of the current label as being replaced by the outputs of the equivalent label. Otherwise you add the key to the equivalent table, associated with the current label.

This simple algorithm is enough to recursively eliminate common subexpressions. Sometimes the recursive aspect (i.e. noticing that b should be replaced by a), along with the creation of a common key, causes the technique to be called global value numbering (GVN), but CSE seems a better name to me.

The algorithm as outlined above eliminates expressions that bind values. However not all expressions do that; some are used as control-flow branches. For this reason, Guile also computes a "truthy table" with another flow analysis pass. This analysis computes, for each program point, the set of branches that have been taken to reach it. In the elimination phase, if a branch is reached that is equivalent to a previously taken branch, we consult the truthy table to see which continuation the previous branch may have taken. If it can be proven to have taken just one of the legs, the test is elided and replaced with a direct jump.

A few things to note before moving on. First, the "compute an analysis, then transform the function" sequence is quite common in this sort of problem. It leads to some challenges regarding space for the analysis; my last article deals with these in more detail.

Secondly, the rewriting phase assumes that a value that is available may be substituted, and that the result would be a proper CPS term. This isn't always the case; see the discussion at the end of the article on CSE in Guile 2.0 about CPS, SSA, dominators, and scope. In essence, the scope tree doesn't necessarily reflect the dominator tree, so not all transformations you might like to make are syntactically valid. In Guile 2.2's CSE pass, we work around the issue by concurrently rewriting the scope tree to reflect the dominator tree. It's something I am seeing more and more and it gives me some pause as to the suitability of CPS as an intermediate language.

Also, consider the clobbering part of analysis, where e.g. an expression that writes a value to memory has to invalidate previously read values. Currently this is implemented by traversing all available expressions. This is suboptimal and could be quadratic in the end. A better solution is to compute a dependency graph for expressions, which links together operations on the same regions of memory; see LLVM's memory dependency analysis for an idea of how to do this.

Finally, note that this algorithm is global but intraprocedural, meaning that it doesn't propagate values across closure boundaries. It's possible to extend it to be interprocedural, though it's less necessary in the presence of contification.

scalar replacement via fabricated expressions

Let's say you get to an expression at label L, (cons a b). It binds a result c. You determine you haven't seen it before, so you add (primcall cons a b) → L, c to your equivalent expressions set. Cool. We won't be able to replace a future instance of (cons a b) with c, because that doesn't preserve object identity of the newly allocated memory, but it's definitely a cool fact, yo.

What if we add an additional mapping to the table, (car c) → L, a? That way any expression at which L is available would replace (car c) with a, which would be pretty neat. To do so, you would have to add the &read effect to the cons call's effects analysis, but since the cons wasn't really up for elimination anyway it's all good.

Similarly, for (set-car! c d) we can add a mapping of (car c) → d. Again we have to add the &read effect to the set-car, but that's OK too because the write invalidated previous reads anyway.
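
To make the effect concrete, here is a small before/after sketch in Scheme-like source notation (the pass actually operates on CPS terms, not on source, and f here is just a hypothetical consumer of the value):

   ;; before: c is freshly allocated, then its car is read
   (let ((c (cons a b)))
     (f (car c)))

   ;; after: the fabricated (car c) → a mapping lets CSE substitute a directly
   (let ((c (cons a b)))
     (f a))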

The same sort of transformation holds for other kinds of memory that Guile knows how to allocate and mutate. Taken together, they form a sort of store-to-load forwarding and scalar replacement that can entirely eliminate certain allocations, and many accesses as well. To actually eliminate the allocations requires a bit more work, but that will be the subject of the next article.

future work

So, that's CSE in Guile 2.2. It works pretty well. In the future I think it's probably worth considering an abstract heap-style analysis of effects; in the end, the precision of CSE is limited to how precisely we can model the effects of expressions.

The trick of using CSE to implement scalar replacement is something I haven't seen elsewhere, though I doubt that it is novel. To fully remove the intermediate allocations needs a couple more tricks, which I will write about in my next nargy dispatch. Until then, happy hacking!

by Andy Wingo at August 25, 2014 09:48 AM

August 18, 2014

Andy Wingo

on gnu and on hackers

Greetings, gentle hackfolk. 'Tis a lovely waning light as I write this here in Munich, Munich the green, Munich full of dogs and bikes, Munich the summer-fresh.

Last weekend was the GNU hackers meeting up in Garching, a village a few metro stops north of town. Garching is full of quiet backs and fruit trees and small gardens bursting with blooms and beans, as if an eddy of Christopher Alexander whirled out and settled into this unlikely place. My French suburb could learn a thing or ten. We walked back from the hack each day, ate stolen apples and corn, and schemed the nights away.

The program of GHM this year was great. It started off with a bang, as GNUnet hackers Julian Kirsch and Christian Grothoff broke the story that the Five-Eyes countries (US, UK, Canada, Australia, NZ) regularly port-scan the entire internet, looking for vulnerabilities. They then proceed to exploit those vulnerabilities, in regular hack-a-thons, trying to own as many boxes in as many countries as they can. They then use them as launchpads for attacks and for exfiltration of information from other networks.

The presentation that broke this news also proposed a workaround based on port-knocking, Knock. Knock embeds the hash of a pre-shared key with some other information into the 32-bit initial sequence number of a TCP connection. Unlike previous incarnations of port-knocking, Knock also authenticates the first n payload bytes, so that the connection isn't vulnerable to hijacking (e.g. via GCHQ "quantum injection", where well-placed evil routers race the true destination server to provide the first response packet of a connection). Leaking the pwn-the-internet documents with Laura Poitras at the same time as the Knock unveiling was a pretty slick move!

I was also really impressed by Christian's presentation on the GNU name system. GNS is a replacement for DNS whose naming structure mirrors our social naming structure. For example, www.alice.gnu would be my friend Alice, and www.alice.bob.gnu would be Alice's friend Bob. With some integration, it can work on normal desktops and mobile devices. There are lots more details, so check gnunet.org/gns for more information.

Of course, a new naming system does need some operating system support. In this regard Ludovic Courtès' update on Guix was particularly impressive. Guix is a Nix-like system whose goal is reproducible, user-controlled GNU/Linux systems. A couple years ago I didn't think much of it, but now it's actually booting on raw hardware, not just under virtualization, and things seem to be rolling forth as if on rails. Guix manages to be technically innovative at the same time as being GNU-centered, so it can play a unique role in propagating GNU work like GNS.

and yet.

But now, as the dark clouds race above and the light truly goes, we arrive to the point I really wanted to discuss. GNU has a terrible problem with gender balance, and with diversity in general. Of about 70 attendees at this recent GHM, only two were women. We talk the talk about empowering users and working for freedom but, to a first approximation, it's really just a bunch of dudes that think the exact same things.

There are many reasons for this, of course. Some people like to focus on what's called the "pipeline problem" -- that there aren't as many women coming out of computer science programs as men. While true, the proportion of women CS graduates is much higher than the proportion of women at GHM events, so something must be happening in between. And indeed, the attrition rates of women in the tech industry are higher than that of men -- often because we men make it a needlessly unpleasant place for women to be. Sometimes it's even dangerous. The incidence of sexual harassment and assault in tech, especially at events, is something terrible. Scroll down in that linked page to June, July, and August 2014, and ask yourself whether that's OK. (Hint: hell no.)

And so you would think that people who consider themselves to be working for some abstract liberatory principle, as GNU is, would be happy to take a stand against this kind of asshaberdashery. There you would be wrong. Voilà a timeline of an incident.

timeline

March 2014
Someone at the FSF asks a GHM organizer to add an anti-harassment policy to GHM. The organizer does so and puts one on the web page, copying the text from Libreplanet's web site. The policy posted is:

Offensive or overly explicit sexual language or imagery is inappropriate during the event, including presentations.

Participants violating these rules may be sanctioned or expelled from the meeting at the discretion of the organizers.

Harassment includes offensive comments related to gender, sexual orientation, disability, appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, harassing photography or recording, persistent disruption of talks or other events, repeated unsolicited physical contact, or sexual attention.

Monday, 11 August 2014
The first mention of the policy is made on the mailing list, in a mail with details about the event that also includes the line:

An anti-harrasment policy applies at GHM: http://gnu.org/ghm/policy.html

Monday, 11 August 2014
A speaker writes the list to say:

Since I do not desire to be denounced, prosecuted and finally sanctioned or expelled from the event (especially considering the physical pain and inconvenience of attending due to my very recent accident) I withdraw my intention to lecture "Introducing GNU Posh" at the GHM, as it is not compliant with the policy described in the page above.

Please remove the talk from the official schedule. Thanks.

PS: for those interested, I may perform the talk off-event in case we find a suitable place, we will see..

The resulting thread goes totes clownshoes and rages on up until the event itself.
Friday, 15 August 2014
Sheepish looks between people that flamed each other over the internet. Hallway-track discussion starts up quickly though. Disagreeing people are not rational enough to have a conversation though (myself included).
Saturday, 16 August 2014
In the morning and lunch break, people start to really discuss the issues (spontaneously). It turns out that the original mail that sparked the thread was based, to an extent, on a misunderstanding: that "offensive or overly explicit sexual language or imagery" was parsed (by a few non-native English speakers) as "offensive language or ...", which people thought was too broad. Once this misunderstanding was removed, there were still people that thought that any policy at all was unneeded, and others that were concerned that someone could say something without intending offense, but then be kicked out of the event. Arguments back and forth. Some people wonder why others can be hurt by "just words". Some discussion of rape culture as continuum between physical violence and cultural tropes. One of the presentations after lunch is by a GNU hacker. He starts his talk by stating his hope that he won't be seen as "offensive or part of rape culture or something". His microphone wasn't on, so once he gets it on he repeats the joke. I stomp out, slam the door, and tweet a few angry things. Later in the evening the presenter and I discuss the issue. He apologizes to me.
Sunday, 17 August 2014
A closed meeting for GNU maintainers to round up the state of GNU and GHM. No women present. After dealing with a number of banalities, we finally broach the topic of the harassment policy. More opposition to the policy Sunday than Saturday lunch. Eventually a proposal is made to replace "offensive" with "disparaging", and people start to consent to that. We run out of time and have to leave; resolution unclear.
Monday, 18 August 2014
GHM organizer updates the policy to remove the words "offensive or" from the beginning of the harassment policy.

I think anyone involved would agree on this timeline.

thoughts

The problems seen over the last week with this anti-harassment policy are entirely to do with the men. It was a man who decided that he should withdraw his presentation because he found it offensive that he could be perceived as offensive. It was men who willfully misread the policy, comparing it to such examples as "I should have the right to offend Microsoft supporters", "if I say the wrong word I will go to jail", and who raised the ignorant, vacuous spectre of "political correctness" to argue that they should be able to say what they want, in a GNU conference, no matter who they hurt, no matter what the effects. That they are able to argue this position from a status-quo perspective is the definition of privilege.

Now, there is ignorance, and there is malice. Both must be opposed, but the former may find a cure. Although I didn't begin my contribution to the discussion in the smoothest way, linking to an amusing article on the alchemy of intent that is probably misunderstood, it ended up that one of the main points was about intent. I know Ralph (say) and Ralph is a great person and so how could it be that anything Ralph would say could be a slur? You know he wouldn't mean it like that!

To that, we of course have to say that as GNU grows, not everyone knows that Ralph is a great person. In the end what would it mean for someone to be anti-racist but who says racist things all the time? You would have to call them racist, right? Or if you just said something one time, but refused to own up to your error, and instead persisted in repeating a really racist slur -- you would be racist right? But I know you... But the thing that you said...

But then to be honest I wonder sometimes. If someone repeats a joke trivializing rape culture, after making sure that the microphone is picking up his words -- I mean, that's a misogynist action, right? Put aside the question of whether the person is, in essence, misogynist or not. They are doing misogynist things. How do I know that this person isn't going to do it again, private apology or not?

And how do I know that this community isn't going to permit it again? That remark was made to a room of 40 dudes or so. Not one woman was present. Although there was some discussion afterwards, if people left because of the phrase, it was only two or three. How can we then say that GNU is not a misogynist community -- is not a community that tolerates misogyny?

Given all of this, what do you expect? Do you expect to grow GNU into a larger organization in the future, rich and strong and diverse? If that's not your goal, you are not my colleague. And if it is your goal, why do you put up with this kind of behavior?

The discussion on intent and offense seems to have had its effect in the removal of "offensive or" from the anti-harassment policy language. I think it's terrible, though -- if you don't trust someone who says they were offended by sexual language or imagery, why would you trust them when they report sexual harassment or assault? I can only imagine this leading to some sort of argument where the person who has had the courage to report such an incident finds himself or herself in a public witness box, justifying that the incident was offensive. "I'm sorry my words offended you, but that was not my intent, and anyway the words were not offensive." Lolnope.

There were so many other wrong things about this -- a suggestion that we the GNU cabal (lamentably, a bunch of dudes) should form a committee to make the wording less threatening to us; that we're just friends anyway; that illegal things are illegal anyway... it's as if the Code of Conduct FAQ that Ashe Dryden assembled were a Bingo card and we all lost.

Finally I don't think I trusted the organizers enough with this policy. Both organizers expressed skepticism about the policy in such terms that if I personally hadn't won the privilege lottery (white male "western" hetero already-established GNU maintainer) I wouldn't feel comfortable bringing up a concern to them.

In the future I will not be attending any conferences without strong, consciously applied codes of conduct, and I enjoin you to do the same.

conclusion


Propagandhi, "Refusing to Be a Man", Less Talk, More Rock (1996)

There is no conclusion yet -- working for the better world we all know is possible is a process, as people and as a community.

To outsiders, to outsiders everywhere, please keep up the vocal criticisms. Thank you for telling your story.

To insiders, to insiders everywhere, this is your problem. The problem is you. Own it.

by Andy Wingo at August 18, 2014 09:07 PM

August 08, 2014

Iago Toral

Diving into Mesa

Recap

In my last post I gave a quick introduction to the Linux graphics stack. There I explained how what we call a graphics driver in Linux is actually a combination of three different drivers:

  • the user space X server DDX driver, which handles 2D graphics.
  • the user space 3D OpenGL driver, that can be provided by Mesa.
  • the kernel space DRM driver.

Now that we know where Mesa fits let’s have a more detailed look into it.

DRI drivers and non-DRI drivers

As explained, Mesa handles 3D graphics by providing an implementation of the OpenGL API. Mesa OpenGL drivers are usually called DRI drivers too. Remember that, after all, the DRI architecture was brought to life precisely to enable efficient implementation of OpenGL drivers in Linux and, as I introduced in my previous post, DRI/DRM are the building blocks of the OpenGL drivers in Mesa.

There are other implementations of the OpenGL API available too. Hardware vendors that provide drivers for Linux will provide their own implementation of the OpenGL API, usually in the form of a binary blob. For example, if you have an NVIDIA GPU and install NVIDIA’s proprietary driver this will install its own libGL.so.

Notice that it is possible to create graphics drivers that do not follow the DRI architecture in Linux. For example, the NVIDIA proprietary driver installs a Kernel module that implements similar functionality to DRM but with a different API that has been designed by NVIDIA, and obviously, their corresponding user space drivers (DDX and OpenGL) will use this API instead of DRM to communicate with the NVIDIA kernel space driver.

Mesa, the framework

You have probably noticed that when I talk about Mesa I usually say ‘drivers’, in plural. That is because Mesa itself is not really a driver, but a project that hosts multiple drivers (that is, multiple implementations of the OpenGL API).

Indeed, Mesa is best seen as a framework for OpenGL implementors that provides abstractions and code that can be shared by multiple drivers. Obviously, there are many aspects of an OpenGL implementation that are independent of the underlying hardware, so these can be abstracted and reused.

For example, if you are familiar with OpenGL you know it provides a state based API. This means that many API calls do not have an immediate effect: they only modify the values of certain variables in the driver and do not require pushing these new values to the hardware immediately. Indeed, that will usually happen later, when we actually render something by calling glDrawArrays() or a similar API: it is at that point that the driver will configure the 3D pipeline for rendering according to all the state set by the previous API calls. Since these API calls do not interact with the hardware, their implementation can be shared by multiple drivers; then each driver, in its implementation of glDrawArrays(), can fetch the values stored in this state and translate them into something meaningful for the hardware at hand.
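
To illustrate that pattern from the application's point of view, here is a minimal sketch using standard OpenGL calls (vbo, width, height and vertex_count are assumed to have been set up elsewhere):

/* These calls only record state inside the driver; nothing is
 * pushed to the GPU yet. */
glEnable(GL_DEPTH_TEST);
glViewport(0, 0, width, height);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(0);

/* Only here does the driver translate the accumulated state into
 * commands for the hardware and trigger the actual rendering. */
glDrawArrays(GL_TRIANGLES, 0, vertex_count);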

As such, Mesa provides abstractions for many things and even complete implementations for multiple OpenGL APIs that do not require interaction with the hardware, at least not immediate interaction.

Mesa also defines hooks for the parts where drivers may need to do hardware specific stuff, for example in the implementation of glDrawArrays().

Looking into glDrawArrays()

Let’s see an example of these hooks into a hardware driver by inspecting the stacktrace produced from a call to glDrawArrays() inside Mesa. In this case, I am using the Mesa Intel DRI driver and I am calling glDrawArrays() from a function named render() in my program. This is the relevant part of the stacktrace:

brw_upload_state () at brw_state_upload.c:651
brw_try_draw_prims () at brw_draw.c:483
brw_draw_prims () at brw_draw.c:578
vbo_draw_arrays () at vbo/vbo_exec_array.c:667
vbo_exec_DrawArrays () at vbo/vbo_exec_array.c:819
render () at main.cpp:363

Notice that glDrawArrays() is actually vbo_exec_DrawArrays(). What is interesting about this stack is that vbo_exec_DrawArrays() and vbo_draw_arrays() are hardware independent and reused by many drivers inside Mesa. If you don’t have an Intel GPU like me, but use another Mesa driver, your backtrace should be similar. These generic functions would usually do things like checking for API use errors, reformatting inputs in a way that is more appropriate for later processing, or fetching additional information from the current state that will be needed to implement the actual operation in the hardware.

At some point, however, we need to do the actual rendering, which involves configuring the hardware pipeline according to the command we are issuing and the relevant state we have set in prior API calls. In the stacktrace above this starts with brw_draw_prims(). This function is part of the Intel DRI driver; it is the hook where the Intel driver does the stuff required to configure the Intel GPU for drawing and, as you can see, it will later call something named brw_upload_state(), which will upload a bunch of state to the hardware to do exactly this, like configuring the various shader stages required by the current program, etc.

Registering driver hooks

In future posts we will discuss how the driver configures the pipeline in more detail, but for now let’s just see how the Intel driver registers its hook for the glDrawArrays() call. If we look at the stacktrace, and knowing that brw_draw_prims() is the hook into the Intel driver, we can just inspect how it is called from vbo_draw_arrays():

static void
vbo_draw_arrays(struct gl_context *ctx, GLenum mode, GLint start,
                GLsizei count, GLuint numInstances, GLuint baseInstance)
{
   struct vbo_context *vbo = vbo_context(ctx);
   (...)
   vbo->draw_prims(ctx, prim, 1, NULL, GL_TRUE, start, start + count - 1,
                   NULL, NULL);
   (...)
}

So the hook is draw_prims() inside vbo_context. Doing some trivial searches in the source code we can see that this hook is setup in brw_draw_init() like this:

void brw_draw_init( struct brw_context *brw )
{
   struct vbo_context *vbo = vbo_context(ctx);
   (...)
   /* Register our drawing function:
    */
   vbo->draw_prims = brw_draw_prims;
   (...)
}

Let’s put a breakpoint there and see when Mesa calls into that:

brw_draw_init () at brw_draw.c:583
brwCreateContext () at brw_context.c:767
driCreateContextAttribs () at dri_util.c:435
dri2_create_context_attribs () at dri2_glx.c:318
glXCreateContextAttribsARB () at create_context.c:78
setupOpenGLContext () at main.cpp:411
init () at main.cpp:419
main () at main.cpp:477

So there it is, Mesa (unsurprisingly) calls into the Intel DRI driver when we setup the OpenGL context and it is there when the driver will register various hooks, including the one for drawing primitives.

We could do a similar thing to see how the driver registers its hook for context creation. We will see that the Intel driver (as well as other drivers in Mesa) assigns a global variable with the hooks it needs, like this:

static const struct __DriverAPIRec brw_driver_api = {
   .InitScreen           = intelInitScreen2,
   .DestroyScreen        = intelDestroyScreen,
   .CreateContext        = brwCreateContext,
   .DestroyContext       = intelDestroyContext,
   .CreateBuffer         = intelCreateBuffer,
   .DestroyBuffer        = intelDestroyBuffer,
   .MakeCurrent          = intelMakeCurrent,
   .UnbindContext        = intelUnbindContext,
   .AllocateBuffer       = intelAllocateBuffer,
   .ReleaseBuffer        = intelReleaseBuffer
};

PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
{
   globalDriverAPI = &brw_driver_api;

   return brw_driver_extensions;
}

This global is then used throughout the DRI implementation in Mesa to call into the hardware driver as needed.

We can see that there are two types of hooks, then: the ones needed to link the driver into the DRI implementation (which are the main entry points of the driver in Mesa), and the hooks the driver adds for tasks related to the hardware implementation of OpenGL bits, typically registered at context creation time.

In order to write a new DRI driver one would only have to write implementations for all these hooks; the rest is already implemented in Mesa and reused across multiple drivers.
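
As a rough, hypothetical sketch (the foo* names are made up; the structure simply mirrors the i965 code quoted above), a new driver would essentially provide something along these lines:

/* Entry points that link the driver into Mesa's DRI implementation. */
static const struct __DriverAPIRec foo_driver_api = {
   .InitScreen    = fooInitScreen,
   .CreateContext = fooCreateContext,
   /* ... the remaining entries, as in the i965 table above ... */
};

PUBLIC const __DRIextension **__driDriverGetExtensions_foo(void)
{
   globalDriverAPI = &foo_driver_api;
   return foo_driver_extensions;
}

/* Hardware hooks registered at context creation time, following the
 * pattern of brw_draw_init() shown earlier (ctx derived from the
 * driver context, elided here as in the snippets above). */
void foo_draw_init(struct foo_context *foo)
{
   struct vbo_context *vbo = vbo_context(ctx);
   vbo->draw_prims = foo_draw_prims;
   (...)
}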

Gallium3D, a framework inside a framework

Currently, we can split Mesa DRI drivers in two kinds: the classic drivers (not based on the Gallium3D framework) and the new Gallium drivers.

Gallium3D is part of Mesa and attempts to make 3D driver development easier and more practical than it was before. For example, classic Mesa drivers are tightly coupled with OpenGL, which means that implementing support for other APIs (like Direct3D) would pretty much require writing a completely new implementation/driver. This is addressed by the Gallium3D framework by providing an API that exposes hardware functions as present in modern GPUs rather than focusing on a specific API like OpenGL.

Other benefits of Gallium include, for example, support for various Operating Systems by separating the part of the driver that relies on specific aspects of the underlying OS.

In the last years we have seen a lot of drivers moving to the Gallium infrastructure, including nouveau (the open source driver for NVIDIA GPUs), various radeon drivers, some software drivers (swrast, llvmpipe) and more.


Gallium3D driver model (image via wikipedia)

Although there were some efforts to port the Intel driver to Gallium in the past, development of the Intel Gallium drivers (i915g and i965g) is stalled now as far as I know. Intel is focusing on the classic version of the drivers instead. This is probably because it would take a large amount of time and effort to bring the current classic driver to Gallium with the same features and stability that it has in its current classic form for many generations of Intel GPUs. Also, there is a lot of work going on to add support for new OpenGL features to the driver at the moment, which seems to be the priority right now.

Gallium and LLVM

As we will see in more detail in future posts, writing a modern GPU driver involves a lot of native code generation and optimization. Also, OpenGL includes the OpenGL Shading Language (GLSL), which directly requires having a GLSL compiler available in the driver too.

It is no wonder then that Mesa developers thought that it would make sense to reuse existing compiler infrastructure rather than building and using their own: enter LLVM.

By introducing LLVM into the mix, Mesa developers expect to bring new and better optimizations to shaders and produce better native code, which is critical to performance.

This would also make it possible to eliminate a lot of code from Mesa and/or the drivers. Indeed, Mesa has its own complete implementation of a GLSL compiler, which includes a GLSL parser, compiler and linker as well as a number of optimizations, both for abstract representations of the code, in Mesa, and for the actual native code for a specific GPU, in the hardware driver.

The way that Gallium plugs in LLVM is simple: Mesa parses GLSL and produces an LLVM intermediate representation of the shader code that it can then pass to LLVM, which will take care of the optimization. The role of hardware drivers in this scenario is limited to providing LLVM backends that describe their respective GPUs (instruction set, registers, constraints, etc.) so that LLVM knows how it can do its work for the target GPU.

Hardware and Software drivers

Even today I see people who believe that Mesa is just a software implementation of OpenGL. If you have read my posts so far it should be clear that this is not true: Mesa provides multiple implementations (drivers) of OpenGL, most of these are hardware accelerated drivers but Mesa also provides software drivers.

Software drivers are useful for various reasons:

  • For developing and testing purposes, when you want to take the hardware out of the equation. From this point of view, a software implementation can provide a reference for expected behavior that is not tied to or constrained by any particular hardware. For example, if you have an OpenGL program that does not work correctly we can run it with the software driver (see the example right after this list): if it works fine then we know the problem is in the hardware driver, otherwise we can suspect that the problem is in the application itself.
  • To allow execution of OpenGL in systems that lack 3D hardware drivers. It would obviously be slow, but in some scenarios it could be sufficient and it is definitely better than not having any 3D support at all.
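
For quick experiments of this kind, Mesa honours the LIBGL_ALWAYS_SOFTWARE environment variable, which forces a software driver for a single run. A minimal example (my-gl-program stands for whatever OpenGL application you want to test):

# LIBGL_ALWAYS_SOFTWARE=1 glxinfo | grep "OpenGL renderer"
# LIBGL_ALWAYS_SOFTWARE=1 ./my-gl-program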

I initially intended to cover more stuff in this post, but it is already getting long enough so let’s stop here for now. In the next post we will discuss how we can check and change the driver in use by Mesa, for example to switch between a software and hardware driver, and we will then start looking into Mesa’s source code and introduce its main modules.

by Iago Toral at August 08, 2014 10:31 AM

August 06, 2014

Carlos García Campos

GTK+ 3 Plugins in WebKitGTK+ and Evince Browser Plugin

GTK+ 3 plugins in WebKitGTK+

The WebKit2 GTK+ API has always been GTK+ 3 only, but WebKitGTK+ still had a hard dependency on GTK+ 2 because of the plugin process. Some popular browser plugins like Flash or Java use GTK+ 2 unconditionally (and it seems they are not going to be ported to GTK+ 3, at least not in the short term). These plugins stopped working in Epiphany when it switched to GTK+ 3 and started to work again when Epiphany moved to WebKit2.

To support GTK+ 2 plugins we had to build the plugin process with GTK+ 2, but also some parts of WebCore and WebKit2 (the ones depending on GTK+ and used by the plugin process) were built twice. As a result we had a WebKitPluginProcess binary of ~40MB that was always used for all the plugins. This kind of made sense, since there were no plugins using GTK+ 3, and the GTK+ 2 dependency was harmless for plugins not using GTK+ at all. However, we realized we were making a rule for the exception, since most of the plugins don’t even use GTK+, and there weren’t plugins using GTK+ 3 because they were not supported by any browser (kind of a chicken-and-egg problem).

Since WebKitGTK+ 2.5.1 we have two binaries for the plugin process: WebKitPluginProcess2, which is exactly the same 40MB binary using GTK+ 2 that we have always had, but that now is only used to load plugins using GTK+ 2; and WebKitPluginProcess, a 7.4K binary that is now used by default for everything except loading plugins that use GTK+ 2. And since it links to GTK+ 3, it might load plugins using GTK+ 3 as well. Another side effect is that now we can make GTK+ 2 optional: in that case WebKitPluginProcess2 wouldn’t be built and only plugins using GTK+ 2 wouldn’t be supported.

Evince Browser Plugin

For a long time, we have maintained that PDF documents shouldn’t be opened inside the browser, but downloaded and then opened by the default document viewer. But then the GNOME design team came up with new mockups for Epiphany where everything was integrated in the browser, including PDF documents. It’s something all the major browsers do nowadays, using different approaches though (a custom PDF plugin inside the web engine, JavaScript libraries, etc.).

At the WebKitGTK+ hackfest in 2012 we started to think about how to implement the integrated document reading in Epiphany based on the design mockups. We quickly discarded the idea of implementing it as a NPAPI plugin, because that would mean we had to use a very old Evince version using GTK+ 2. We can’t implement it inside WebKit using libevince because it’s a GPL library, so the first approach was to implement it inside Epiphany using libevince. I wrote a first patch, mostly a proof-of-concept hack, that added a new view widget based on EvView to be used instead of a WebView when a document supported by Evince was requested. This approach has a lot of limitations, since it only works when the main resource is a document, but not for documents embedded in a HTML page or an iframe, and it has a lot of integration problems that make it quite difficult to maintain inside Epiphany. All of these issues would be solved by implementing it as a NPAPI plugin and it wouldn’t require any change in Epiphany. Now that WebKitGTK+ supports GTK+ 3 plugins, there’s no reason not to do so.

Epiphany Evince Plugin

Thanks to a project in Igalia I’ve been able to work on it, and today I’ve landed an initial implementation of the browser plugin to Evince git master. It’s only a first implementation (written in C++11) with the basic features (page navigation, view modes, zoom and printing), and a very simple UI that needs to be updated to match the mockups. It can be disabled at compile time like all other frontends inside Evince (thumbnailer, previewer, nautilus properties page).

Epiphany embedded PDF document Epiphany standalone PDF document

Another advantage of being a NPAPI plugin is that it’s scriptable so that you can control the viewer using JavaScript.

Epiphany scriptable PDF

And you can pass initial parameters (like current page, zoom level, view mode, etc.) from the HTML tag.

<object data="test.pdf" type="application/pdf" width="600" height="300" 
                currentPage="2" zoomMode="fit-page" continuous="false">
  The pdf could not be rendered.
</object>

You can even hide the default toolbar and build your own one using HTML and JavaScript.

by carlos garcia campos at August 06, 2014 10:45 AM

August 05, 2014

Víctor Jáquez

GUADEC 2014

The last Friday 25 of July, National Day of Galicia, started very early because I had to travel to Strasbourg, official seat of the European Parliament, not for any political duty, but for the GNOME Users and Developers European Conference, the GUADEC!

My last GUADEC was in The Hague, in 2010, though in 2012, when it was hosted in Coruña, I attended a couple of talks. Nonetheless, it had been a long time since I had met the community, and it was a pleasure to meet them again.

My biggest impression was the number of attendees. I remember the times in Turkey or in Gran Canaria where hundreds packed the auditoriums and halls. Nowadays the audience was smaller, but that is a good thing, since now you can easily get in touch with the core developers who drive and move the project.

We, Igalia, as sponsors, had a banner in the main room and a table in a corridor. Here is a picture of Juan to prove it:


Juan at the Igalia’s booth.

Also I ran across Emmanuele Bassi, who was setting up a booth to show off the Endless Mobile OS, based on GNOME 3. The people at GUADEC welcomed with enthusiasm the user experience it provides and the purpose of the project. Personally, I love it. If you don’t know the project, you should visit their web site.

The first talk I attended was the classic GStreamer update by Sebastian Dröge and Tim Müller. They talked about the new features in GStreamer 1.4. Neat stuff in there. I like the new pace of GStreamer, compared with the old stagnated evolution of the 0.10 series.

Afterwards, Jim Hall gave us a keynote about Usability in GNOME. I really enjoyed that talk. He studied the usability of several GNOME applications such as Nautilus (aka Files), GEdit, Epiphany (aka Web), etc., as part of his Master's research. It was a pleasure to hear that Epiphany is regarded as having good usability.

After lunch I was in the main room hearing Sylvain Le Bon about sustainable business models for free software. He talked about crowd funding, community management and related stuff.

The next talk was Christian Hergert's, about his project GOM, an object mapper from GObjects to SQLite, which is used in Grilo to prevent SQL injection in some plugins that use SQLite.

Later on, Marina Zhurakhinskaya gave us one of the best talks of the GUADEC: How to be an ally to women in tech. I encourage you to download the slides and read them. There I learned about the unicorn law and the impostor syndrome.

The day closed with the GNOME Foundation’s teams reports.

Sunday came and I arrived at the venue for the second keynote: Should We Teach The Robot To Kill by Nathan Willis. In his particular style, Nathan presented a general survey of GNU/Linux in the automotive industry.

Next, one of main talks from Igalia: Web 3.12: a browser to make us proud, presented by Edu. It was fairly good. Edu showed us the latest development in WebKitGTK+ and Epiphany (aka Web). There were quite a few questions at the end of the talk. Epiphany nowadays is actively used by a lot of people in the community.

Afterwards, Zeeshan presented his GNOME Boxes, a user interface for running virtual machines. Later on Alberto Ruiz showed us Fleet Commander, a web application to handle large desktop deployments.

And we took our classic group photo:


Group photo

That Sunday closed with the interns' lightning talks. Cool stuff is being cooked by them.

On Monday I was in the venue when Emmanuele Bassi talked to us about GSK, the GTK+ Scene Graph Kit, his new project, which uses as a starting point the lessons learned in Clutter. Its objective is to have a scene graph library fully integrated in GTK+.

After the lunch and the second part of the Foundation’s Annual General Meeting, Benjamin Otte gave an amusing talk about the CSS implementation in GTK+. Later, Jasper St. Pierre talked about the Wayland support in GNOME.

When the coffee break ended, the almighty Žan Doberšek gave the other talk from Igalia: Wayland support in WebKit2GTK+.

On the last day of GUADEC, I attended Bastien Nocera’s talk, Hardware integration, the GNOME way, where he reviewed the history of his contributions to GNOME related to hardware integration and the goal of nicely supporting most of the hardware in GNOME, like compasses, gyroscopes, et cetera.

Afterwards, Owen Taylor talked to us about GNOME’s continuous integration performance testing, which aims to know exactly why one release of GNOME is faster or slower than the last.

And the third keynote came: Matthew Garrett talked to us about his experiences with the GNOME community and his vision about where it should go: to enhance the privacy and security of the users, something that many GNOMErs are excited about, such as Federico Mena.

Later on, David King talked about his plans for Cheese, the webcam application, turning it into a DBus service, using the current development of kdbus to sandbox the interaction with the hardware.

Afterwards Christian Hergert talked to us about his plans for Builder, a new IDE for GNOME. Promising stuff, but we will see how it goes. Christian said that he is going to spend a full year working on this project.

The GUADEC ended with the lightning talks, where I enjoyed one about the problems around the current encryption and security tools.

Finally, the next GUADEC host was unveiled: the Sweden Conspiracy: Gothenburg!

by vjaquez at August 05, 2014 11:46 AM

August 01, 2014

Jacobo Aragunde

Tales of LibreOffice interoperability: the missing files

I’m welcoming the release of LibreOffice 4.3.0 in the name of Igalia with the last post in this series. This time we will talk about the preservation of embeddings in OOXML text documents, and devote some lines to the support of Structured Document Tags.

Embedded content in documents

Our goal this time was the preservation of embedded content in OOXML text documents, as a first step towards full support, like we did with other features. The insertion of new embeddings in .docx documents, or the editing of existing ones, will have to come later.

An embedded document usually consists of two files: one of them is a preview picture to be shown in the parent document, and the other one is the actual embedded document. In the case of a spreadsheet embedded in a text document, which is the most common case, you will find these two files in the document:

/word/media/image1.emf  
/word/embeddings/Microsoft_Excel_Worksheet1.xlsx 

With the corresponding entries in the relations file:

<Relationship Id="rId2"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/package"
  Target="embeddings/oleObject1.xlsx" />
<Relationship Id="rId3"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
  Target="media/image1.emf" />

The relevant bits in the document.xml file are below. Notice that a w:object consists of one shape, which is filled with data from an image file, and the OLE object itself, linked to the embedded spreadsheet. Also notice the ProgID attribute, which defines the program, document type and version:

<w:object>
  <v:shape id="ole_rId2"
  style="width:362.25pt;height:146.25pt" o:ole="">
    <v:imagedata r:id="rId3" o:title="" />
  </v:shape>
  <o:OLEObject Type="Embed" ProgID="Excel.Sheet.12"
  ShapeID="ole_rId2" DrawAspect="Content"
  ObjectID="_570182397" r:id="rId2" />
</w:object>

There is one more element that allows the embedded file to be properly detected by Word: a content type definition in the content types file:

<Override PartName="/word/embeddings/oleObject1.xlsx"
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" />

As you can see there are three elements that determine the kind of embedding we are dealing with, and Word requires the right combination of the three of them:

  • The properties in the o:OLEObject tag in document.xml
  • The ContentType for the file defined in [Content_Types].xml
  • The Type of the Relationship defined in document.xml.rels

The most convenient way to achieve our goal was using the grab bag technique to store the ProgID attribute of the object, and infer the correct content type and relation type. Some examples:

  • An object with ProgID Excel.Sheet.12 is an OOXML spreadsheet. Its media type must be application/vnd.openxmlformats-officedocument.spreadsheetml.sheet and the relation type is http://schemas.openxmlformats.org/officeDocument/2006/relationships/package.
  • If the ProgID is Excel.Sheet.8, this is an old Office spreadsheet. Now the media type must be application/vnd.ms-excel and the relation type http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject.
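
To make the mapping above concrete, here is a minimal, illustrative sketch in C of how a table-driven lookup from ProgID to media type and relation type could look. The two ProgID strings and their media/relation types come from the examples above; everything else (the struct, the function name and the fallback behaviour) is made up for illustration and is not the actual LibreOffice code.

#include <stdio.h>
#include <string.h>

/* Illustrative only: map an embedding's ProgID to the media type and
 * relation type that Word expects, as described above. */
struct embed_mapping {
    const char *prog_id;
    const char *media_type;
    const char *relation_type;
};

static const struct embed_mapping mappings[] = {
    { "Excel.Sheet.12",
      "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
      "http://schemas.openxmlformats.org/officeDocument/2006/relationships/package" },
    { "Excel.Sheet.8",
      "application/vnd.ms-excel",
      "http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject" },
};

static const struct embed_mapping *
lookup_embed_mapping (const char *prog_id)
{
    for (size_t i = 0; i < sizeof (mappings) / sizeof (mappings[0]); i++)
        if (strcmp (mappings[i].prog_id, prog_id) == 0)
            return &mappings[i];
    return NULL; /* unknown ProgID: the embedding would not be preserved */
}

int main (void)
{
    const struct embed_mapping *m = lookup_embed_mapping ("Excel.Sheet.12");
    if (m)
        printf ("media type: %s\nrelation type: %s\n",
                m->media_type, m->relation_type);
    return 0;
}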

If you detect a particular type of embedding in your documents that isn’t being preserved, drop us a line in the Bugzilla. A patch to add new relations of this kind should be quick and easy.

Bonus track: Structured Document Tags

Structured Document Tags (SDTs) are a family of document objects that contain form-like controls, citations, tables of contents or bibliography tables, among many others. This variety of uses means that they can live inside a paragraph, or they can be a high-level element that contains several paragraphs and even shapes, which of course is tricky to implement.

For 4.3.0 we have worked on some of these tags, and we can say we properly implemented the import and export of combo, date and check boxes. We also wrote some code to preserve generic SDTs, and now most of the tags are preserved, although there are formatting issues. The proper way to support every kind of SDT is translating them to the equivalent objects in LibreOffice on import and translating them back to SDTs on export, but that will require time and work. Any volunteers? ;)

Wrap-up

Despite the 6-month development cycles, I feel like the development of the 4.3 line started a long time ago, and I may have forgotten to write about some little feature or fix… Anyway, it’s time to close this batch of blog posts about interoperability features, all of them developed by Igalia and sponsored by CloudOn.

Enjoy our shiny new LibreOffice, and happy hacking!

by Jacobo Aragunde Pérez at August 01, 2014 11:10 AM

Carlos García Campos

WebKitGTK+ 2.5.1: Good bye WebKit1

WebKitGTK+ 2.5.1 is the first version of this release cycle. It comes very late mainly due to the regressions introduced by the switch to CMake and the problems we found after removing WebKit1 from the tree. It also includes some new features that I’ll talk about in other posts, probably when 2.6.0 is released. In this post I’ll only focus on the breaking changes introduced in this release, in order to help everybody adapt their applications to the API changes if needed.

Wait, but why breaking the API?

Since the release of WebKitGTK+ 2.0 the WebKit1 API has been considered deprecated and in maintenance mode. The new WebKit2 API is quite complete and stable now, so the plan for WebKitGTK+ 2.6 was to remove WebKit1, leaving it alive, but still in maintenance mode, in the 2.4 branch. After removing the code from trunk we realized that newer versions of WebKitGTK+ that are WebKit2 only should be parallel installable with older versions of WebKitGTK+ that also include WebKit1. After some discussions trying to find the best solution, we reached the conclusion that we had to bump the binary version. But then I thought, since we were going to force everybody to recompile, why not take advantage of it to introduce some small (but necessary) API changes that in most cases will not affect users anyway? And then I started to review the API and propose some changes. I also wanted to make sure all API changes were introduced in the first unstable release, so that users only have to adapt their applications once, and that’s the main reason why the release has taken so long.

Binary version bump

The new binary version is 4.0, so to use this new release you need to update your build system to look for the webkit2gtk-4.0 pkg-config file.

GObject DOM Bindings

The GObject DOM bindings API situation was actually the main reason for breaking the API. The problem was that the code for the DOM bindings is generated automatically from the IDL files. This means that every time a new IDL file was added to the build system, we ended up exposing a new class in our public API without even noticing. The same happened when an API-incompatible change was introduced in an IDL file, for example to update it to the current standard. We added a script to our build bots to warn us when that happened, and then we had to manually deprecate the existing API and add exceptions to the code generator. This was a lot of work just to keep backwards compatibility of an API nobody was using. Most people actually use only 5-10% of the DOM bindings API.

Since WebKitGTK+ 2.5.1 the GObject DOM bindings API is split into stable and unstable parts. The stable part contains the most commonly used API that is unlikely to change. We will keep maintaining backwards compatibility of this part of the API. The rest of the API is considered unstable and might change at any time; you can still use it, but at your own risk. We thought this solution was better than just removing the unstable API. There are two kinds of unstable API:

  • Classes that are considered unstable: the entire class is considered unstable. The header is not included in the main webkitdom.h header, so to use them you have to include the header file explicitly.
  • Unstable symbols of stable classes: a method or constant in a stable class that is considered unstable. In this case the header file is included by the main webkitdom.h header, but it doesn’t contain any unstable symbols; they are included in a new header WebKitDOMClassNameUnstable.h that also needs to be included explicitly.

In both cases you need to define WEBKIT_DOM_USE_UNSTABLE_API before including the headers:

#define WEBKIT_DOM_USE_UNSTABLE_API
#include <webkitdom/WebKitDOMHTMLMediaElement.h>
#include <webkitdom/WebKitDOMElementUnstable.h>

WebKit2 GTK+ API

The API changes in the WebKit2 GTK+ API could have been avoided by deprecating symbols and adding new ones, but since we were going to break the API anyway, and the affected symbols are not that commonly used, we thought it was worth it.

  • WebKitWebView::create: the signal now receives a WebKitNavigationAction parameter containing information about the navigation action that triggered the event. So now you can know the type of event (if it was a link clicked, a form submitted, etc.), the mouse button and keyboard modifiers, the URI request or even if it was a user gesture. This information is very useful to implement a popup blocker, for example.
    /* before */
    static WebKitWebView *
    web_view_created_cb (WebKitWebView *web_view,
                         gpointer       user_data)
    
    /* after */
    static WebKitWebView *
    web_view_created_cb (WebKitWebView          *web_view,
                         WebKitNavigationAction *navigation_action,
                         gpointer                user_data)
  • WebKitWebViewGroup has been removed. This class was only introduced to add the user stylesheets API, since most people actually use the default web view group. The grouping of pages inside WebKit2 is something that will eventually be removed, in favor of users creating the groups they need. The user stylesheets API has been moved to a new class WebKitUserContentManager that will also be extended to support user scripts. The settings can still be handled directly with the WebKitWebView API, so if you want a group of web views to share the same settings you can simply call webkit_web_view_set_settings() for all of them, passing the same WebKitSettings object.
    /* before */
    WebKitWebViewGroup *group = webkit_web_view_get_group (web_view);
    
    webkit_web_view_group_add_user_style_sheet (group, 
                                                buffer, 
                                                NULL, /* base URI */
                                                NULL, /* whitelist */
                                                NULL, /* blacklist */
                                                WEBKIT_INJECTED_CONTENT_FRAMES_ALL);
    
    /* after */
    WebKitUserContentManager *user_content;
    WebKitUserStyleSheet     *style_sheet;
    
    style_sheet = webkit_user_style_sheet_new (buffer,
                                               WEBKIT_USER_CONTENT_INJECT_ALL_FRAMES,
                                               WEBKIT_USER_STYLE_LEVEL_USER,
                                               NULL, /* whitelist */
                                               NULL /* blacklist */);
    user_content = webkit_web_view_get_user_content_manager (web_view);
    webkit_user_content_manager_add_style_sheet (user_content, style_sheet);
    webkit_user_style_sheet_unref (style_sheet);
  • WebKitCertificateInfo has been removed. This was supposed to be a convenient way of handling TLS certificates, but when trying to use it in a real case, it ended up being inconvenient. The WebKitWebView::load-failed-with-tls-errors signal now receives a GTlsCertificate and TlsCertificateFlags, and webkit_web_context_allow_tls_certificate_for_host() receives a GTlsCertificate.
    /* before */
    static gboolean
    load_failed_with_tls_errors_cb (WebKitWebView         *web_view,
                                    WebKitCertificateInfo *info,
                                    const gchar           *host,
                                    gpointer               user_data)
    {
      WebKitWebContext *context = webkit_web_view_get_context (web_view);
      GTlsCertificate *certificate = webkit_certificate_info_get_tls_certificate (info);
      GTlsCertificateFlags errors = webkit_certificate_info_get_tls_errors (info);
    
      if (add_exception_for_error (host, errors))
        webkit_web_context_allow_tls_certificate_for_host (context, info, host);
    }
    
    /* after */
    static gboolean
    load_failed_with_tls_errors_cb (WebKitWebView       *web_view,
                                    GTlsCertificate     *certificate,
                                    GTlsCertificateFlags errors,
                                    const gchar         *host,
                                    gpointer             user_data)
    {
      WebKitWebContext *context = webkit_web_view_get_context (web_view);
    
      if (add_exception_for_error (host, errors))
        webkit_web_context_allow_tls_certificate_for_host (context, certificate, host);
    }
  • View mode API: The view source mode was removed from WebCore, and the API was already marked as deprecated. Since it’s very unlikely we will add more view modes, we just removed the API. There’s no replacement for this, but it could be easily implemented either using an external window with a GtkSourceView, or embedded into a WebKitWebView by using a custom URI scheme and a JavaScript library for syntax highlighting; a rough sketch of the latter follows this list.
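
As a purely illustrative sketch of that last suggestion: a custom URI scheme handler could be registered like this. The scheme name ("mysource") and the hard-coded content are made up for the example; webkit_web_context_register_uri_scheme() and webkit_uri_scheme_request_finish() are existing WebKit2 GTK+ API, and a real handler would fetch, escape and highlight the actual page source.

#include <string.h>
#include <webkit2/webkit2.h>

/* Illustrative handler for a hypothetical "mysource" URI scheme. */
static void
my_source_scheme_cb (WebKitURISchemeRequest *request,
                     gpointer                user_data)
{
  const gchar *html =
    "<html><body><pre>&lt;p&gt;page source would go here&lt;/p&gt;</pre></body></html>";
  GInputStream *stream =
    g_memory_input_stream_new_from_data (html, strlen (html), NULL);

  webkit_uri_scheme_request_finish (request, stream, strlen (html), "text/html");
  g_object_unref (stream);
}

static void
register_my_source_scheme (WebKitWebContext *context)
{
  webkit_web_context_register_uri_scheme (context, "mysource",
                                          my_source_scheme_cb,
                                          NULL, NULL);
}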

CMake

Since version 2.5.1 WebKitGTK+ uses CMake instead of autotools as its build system. The equivalent to configure, make and make install now would be something like this:

$ cd webkitgtk-2.5.1
$ cmake -DCMAKE_INSTALL_PREFIX= -DCMAKE_INSTALL_LIBDIR=lib -DPORT=GTK \
-DCMAKE_BUILD_TYPE=Release .
$ make
(enjoy the summer in the meantime)
# make install

Help!

Sure, we are available as usual in the #webkitgtk+ IRC channel at FreeNode and our mailing list webkit-gtk@lists.webkit.org.

by carlos garcia campos at August 01, 2014 10:27 AM

July 30, 2014

Carlos García Campos

Evince Hackfest

The Evince hackfest took place last week, from the 23rd to the 25th of July, in Strasbourg. Yes, 3 days only, but very productive in my opinion. I’ll summarize all the cool stuff we worked on.

HiDPI

This work was initially started by Owen, and then Germán kept the patches up to date with Evince git master. I reviewed all the pending patches and updated the thumbnails one, and the result is that Evince doesn’t look blurry on HiDPI screens any more.

Evince running with GDK_SCALE=2

Recent View

This was a GSoC project from 2013, but the patch provided by the student was never in an “upstreamable” state. Again Germán, who always loved this feature, took care of the patch, addressing my review comments. At the beginning of the hackfest most of the work had already been done; we only needed a few more review iterations during the hackfest to finally push this feature to master. The idea is to show the list of recent documents as an icon view with thumbnails and document metadata. This view is loaded when Evince is launched without any document, replacing the useless empty window we had before. It also replaces the recent documents submenu in the gear menu.

Evince Recent View

UI improvements

The move to the header bar recently made the toolbar look a bit cluttered, mainly because the title might use a lot of space. We discussed several ideas to improve the header bar and implemented some of them:

Evince header bar improvements

 

Juanjo Marín also wrote a patch to change the default zoom mode to “Automatic”, since several people commented that the current “Fit Width” mode doesn’t look good on screens with higher resolutions. The patch is still awaiting review.

Annotations

Giselle and Anuj, our GSoC students this year, worked on their projects to improve the annotations support in both Evince and poppler.

    • Anuj wrote some patches to add support for Free Text annotations to the poppler glib API. After a couple of review iterations and discussions about the API, the patches are now in bugzilla waiting for a final review (I hope to find the time soon).
    • Giselle was focused on adding support for highlight annotations to Evince, since poppler already has all the required API for this. The patches are not yet ready, but they look really promising.

 

Caret navigation and accessibility

Joanie and API continued improving the Evince a11y support and fixing some remaining issues from the FoG project. Antía fought with the caret navigation implementation again, implementing some missing key bindings and fixing other issues.

Comics backend

Juanjo Marín focused on the comics backend, working on a patch to use libarchive to uncompress the documents instead of spawning external command line tools.

Gestures

I started to review the gestures branch during the hackfest. The patches looked clean and simple, but since I was not familiar with the new GTK+ touch API and I didn’t have a touch screen to try it out either, I decided to wait until after the hackfest and see it in action on garnacho’s laptop during GUADEC. Carlos explained to me how the touch API works in GTK+ and I could check that it actually works great. The code doesn’t affect normal use with the mouse, so the branch will be merged into master soon.

Evince hackfest dinner

And of course not everything was hacking

THANKS!

Many thanks to Alexandre Franke for the local organization, everything worked perfectly. Of course thanks to the GNOME Foundation for sponsoring the GSoC students, Giselle and Anuj, and Igalia for sponsoring all the Igalians attending the hackfest. Thanks also to Epitech for allowing us to do the hackfest there before the GUADEC.

Igalia S.L., GNOME Foundation, EPITECH

by carlos garcia campos at July 30, 2014 03:37 PM

July 29, 2014

Iago Toral

A brief introduction to the Linux graphics stack

This post attempts to be a brief and simple introduction to the Linux graphics stack. I will focus on giving enough context to understand the role that Mesa and 3D drivers in general play in the stack, and leave it to follow-up posts to dive deeper into the guts of Mesa in general and the Intel DRI driver specifically.

A bit of history

In order to understand some of the particularities of the current graphics stack it is important to understand how it had to adapt to new challenges throughout the years.

You see, nowadays things are significantly more complex than they used to be, but in the early times there was only a single piece of software that had direct access to the graphics hardware: the X server. This approach made the graphics stack simpler because it didn’t need to synchronize access to the graphics hardware between multiple clients.

In these early days applications would do all their drawing indirectly, through the X server. By using Xlib they would send rendering commands over the X11 protocol that the X server would receive, process and translate to actual hardware commands on the other side of a socket. Notice that this “translation” is the job of a driver: it takes a bunch of hardware agnostic rendering commands as its input and translates them into hardware commands as expected by the targeted GPU.

Since the X server was the only piece of software that could talk to the graphics hardware by design, these drivers were written specifically for it, became modules of the X server itself and an integral part of its architecture. These userspace drivers are called DDX drivers in X server argot and their role in the graphics stack is to support 2D operations as exported by Xlib and required by the X server implementation.


DDX drivers in the X server (image via wikipedia)

In my Ubuntu system, for example, the DDX driver for my Intel GPU comes via the xserver-xorg-video-intel package and there are similar packages for other GPU vendors.

3D graphics

The above covers 2D graphics as that is what the X server used to be all about. However, the arrival of 3D graphics hardware changed the scenario significantly, as we will see now.

In Linux, 3D graphics is implemented via OpenGL, so people expected an implementation of this standard that would take advantage of the fancy new 3D hardware, that is, a hardware accelerated libGL.so. However, in a system where only the X server was allowed to access the graphics hardware we could not have a libGL.so that talked directly to the 3D hardware. Instead, the solution was to provide an implementation of OpenGL that would send OpenGL commands to the X server through an extension of the X11 protocol and let the X server translate these into actual hardware commands as it had been doing for 2D commands before.

We call this Indirect Rendering, since applications do not send rendering commands directly to the graphics hardware, and instead, render indirectly through the X server.


OpenGL with Indirect Rendering (image via wikipedia)

Unfortunately, developers would soon realize that this solution was not sufficient for intensive 3D applications, such as games, that need to render large amounts of 3D primitives while maintaining high frame rates. The problem was clear: wrapping OpenGL calls in the X11 protocol was not a valid solution.

In order to achieve good performance in 3D applications we needed them to access the hardware directly, and that would require rethinking a large chunk of the graphics stack.

Enter Direct Rendering Infrastructure (DRI)

Direct Rendering Infrastructure is the new architecture that allows X clients to talk to the graphics hardware directly. Implementing DRI required changes to various parts of the graphics stack including the X server, the kernel and various client libraries.

Although the term DRI usually refers to the complete architecture, it is often also used to refer only to the specific part of it that involves the interaction of applications with the X server, so be aware of this dual meaning when you read about this stuff on the Internet.

Another important part of DRI is the Direct Rendering Manager (DRM). This is the kernel side of the DRI architecture. Here, the kernel handles sensitive aspects like hardware locking, access synchronization, video memory and more. DRM also provides userspace with an API that it can use to submit commands and data in a format that is adequate for modern GPUs, which effectively allows userspace to communicate with the graphics hardware.
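
As a small, self-contained illustration of that userspace-facing API (only the mode-setting part of it, not command submission), the sketch below uses libdrm to open the first DRM device node and print how many connectors, CRTCs and framebuffers the driver exposes. This is just one assumed example of talking to DRM from userspace; Mesa’s use of DRM is of course far more involved. Build with something like: gcc demo.c $(pkg-config --cflags --libs libdrm).

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int main (void)
{
   /* The first DRM device node; the exact path may differ on your system. */
   int fd = open ("/dev/dri/card0", O_RDWR);
   if (fd < 0) {
      perror ("open");
      return 1;
   }

   /* Ask the kernel DRM driver for its mode-setting resources. */
   drmModeRes *resources = drmModeGetResources (fd);
   if (resources) {
      printf ("connectors: %d, crtcs: %d, framebuffers: %d\n",
              resources->count_connectors, resources->count_crtcs,
              resources->count_fbs);
      drmModeFreeResources (resources);
   }

   close (fd);
   return 0;
}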

Notice that many of these things have to be done specifically for the target hardware so there are different DRM drivers for each GPU. In my Ubuntu system the DRM module for my Intel GPU is provided via the libdrm-intel1:amd64 package.


OpenGL with Direct Rendering (image via wikipedia)

DRI/DRM provide the building blocks that enable userspace applications to access the graphics hardware directly in an efficient and safe manner, but in order to use OpenGL we need another piece of software that, using the infrastructure provided by DRI/DRM, implements the OpenGL API while respecting the X server requirements.

Enter Mesa

Mesa is a free software implementation of the OpenGL specification, and as such, it provides a libGL.so, which OpenGL based programs can use to output 3D graphics in Linux. Mesa can provide accelerated 3D graphics by taking advantage of the DRI architecture to gain direct access to the underlying graphics hardware in its implementation of the OpenGL API.

When our 3D application runs in an X11 environment it will output its graphics to a surface (window) allocated by the X server. Notice, however, that with DRI this will happen without intervention of the X server, so naturally there is some synchronization to do between the two, since the X server still owns the window Mesa is rendering to and is the one in charge of displaying its contents on the screen. This synchronization between the OpenGL application and the X server is part of DRI. Mesa’s implementation of GLX (the extension of the OpenGL specification that addresses the X11 platform) uses DRI to talk to the X server and accomplish this.

Mesa also has to use DRM for many things. Communication with the graphics hardware happens by sending commands (for example “draw a triangle”) and data (for example the vertex coordinates of the triangle, their color attributes, normals, etc). This process usually involves allocating a bunch of buffers in the graphics hardware where all these commands and data are copied so that the GPU can access them and do its work. This is enabled by the DRM driver, which is the one piece that takes care of managing video memory and which offers APIs to userspace (Mesa in this case) to do this for the specific target hardware. DRM is also required whenever we need to allocate and manage video memory in Mesa, so things like creating textures, uploading data to textures, allocating color, depth or stencil buffers, etc. all require using the DRM APIs for the target hardware.


OpenGL/Mesa in the context of 3D Linux games (image via wikipedia)

What’s next?

Hopefully I have managed to explain the role of Mesa in the Linux graphics stack and how it works together with the Direct Rendering Infrastructure to enable efficient 3D graphics via OpenGL. In the next post we will cover Mesa in more detail: we will see that it is actually a framework where multiple OpenGL drivers live together, including both hardware and software variants; we will also have a look at its directory structure, identify its main modules, introduce the Gallium framework and more.

by Iago Toral at July 29, 2014 02:51 PM

July 24, 2014

Víctor Jáquez

See you at GUADEC!

Woah! It has been a long time since I attended a GUADEC, besides the one in Coruña, of course. And I am very excited about it!

My plan is to chill with other GNOME developers and talk about the future of GNOME OS.

I’m going to GUADEC 2014

by vjaquez at July 24, 2014 10:03 AM

Juan A. Suárez

Another year, another GUADEC

It’s 2014, and like previous years:

guadec-2014-badge-large

This time I won’t give any talk; I’ll just relax, enjoy talks from others, and enjoy Strasbourg.

And what is more important, meet those hackers you interact with frequently, and maybe share some beers.

So if you go there, and you want to have a nice chat with me, or talk about the Grilo project, don’t hesitate to do it. Igalia, which is kindly sponsoring my attendance, will have a place there during the core days, so you will likely find me around, or you can ask anyone there for me.

Enjoy!

by Juan A. Suárez at July 24, 2014 08:30 AM

July 19, 2014

Philippe Normand

Moving to Pelican

Time for a change! Almost 10 years ago I started to hack on a blog engine with two friends; it was called Alinea and it powered this website for a long time. Back then, hacking on your own blog engine was the prerequisite to hosting your blog :) But nowadays people just use WordPress or similar platforms, if they still have a blog at all. Alinea fell into oblivion as I didn’t have the time and motivation to maintain it.

Moving to Pelican was quite easy: since I’ve been writing content in ReST on the previous blog, I only had to pull data from the database and hack pelican_import.py a bit :)

Now that this website looks almost modern, I’ll hopefully start to blog again. Expect at least news about the WebKit work I still enjoy doing at Igalia.

by Philippe Normand at July 19, 2014 02:13 PM

July 18, 2014

Iago Toral

A tour around the world of Mesa and Linux graphics drivers

Some time ago I decided to focus my work at Igalia on the graphics stack. As a result of this I had the chance to participate in a couple of very interesting projects, like implementing Wayland support in WebKitGtk+ (a topic I have visited in this blog a number of times) and, lately, working on graphics drivers for Linux in the Mesa framework.

The graphics stack in Linux is complex, and it is not always easy to find information and technical documentation that can aid beginners in their first steps. This is usually a very demanding domain: the brave individuals who decide to put their energy into it usually have their hands full hacking on the code, and they don’t have that much room for documenting what they do in a way that is particularly accessible to newcomers.

As I mentioned above, I have been hacking on Mesa lately (particularly on the Intel i965 driver) and so far it has been a lot of fun, probably the most exciting work I have done at Igalia in all these years, but it is also certainly challenging, requiring me to learn a lot of new things and sometimes fairly complex stuff.

Getting involved in this is no easy endeavor; the learning curve is steep because the kind of work you do here is probably unlike anything you have done before: for starters it requires a decent understanding of OpenGL and the capacity to understand OpenGL specifications and what they mean in the context of the driver; you also need to have a general understanding of how modern 3D-capable GPUs work; and finally, you have to dig deeper and understand how the specific GPU that your driver targets works and what role the driver needs to play to make that hardware work as intended. And that’s not all of it: a driver may need to support multiple generations of GPUs which sometimes can be significantly different from each other, requiring driver developers to write and merge multiple code paths that handle these differences. You can imagine the maintenance burden and extra complexity that comes from this.

Finally, we should also consider the fact that graphics drivers are among the most critical pieces of code you can probably have in a system, they need to be performant and stable for all supported hardware generations, which adds to the overall complexity.

All this stuff can be a bit overwhelming in the beginning for those who attempt to take their first steps in this world, but I believe that this initial steep learning curve can be smoothed out by introducing some of the most important concepts in a way that is oriented specifically to new developers. The rest will still not be an easy task: it requires hard work, some passion, a willingness to learn and a lot of attention to detail, but I think anyone passionate enough should be able to get into it with enough dedication.

I had to go through all this process myself lately, so I figured I am in a very good position to try to address this problem, and that’s why I decided to write a series of posts to introduce people to the world of Mesa and 3D graphics drivers, with a focus on OpenGL and Intel GPUs, which is the area where I am currently developing my work. Although I’ll focus on Intel hardware I believe that many of the concepts that I will be introducing here are general enough to be useful also to people interested in other GPUs. I’ll try to be clear about when I am introducing general concepts and when I am discussing Intel specific stuff.

My next post, which will be the first in this series, will serve as an introduction to the Linux graphics stack and Linux graphics drivers. We will discuss exactly what Mesa brings to the table and what we mean when we talk about graphics drivers in Linux. I think that should put us on the right track to start looking into the internals of Mesa.

So that’s it, if you are interested in learning more about Linux graphics and specifically Mesa and 3D graphics drivers, stay tuned! I’ll try my best to post regularly and often.

by Iago Toral at July 18, 2014 09:41 AM

July 10, 2014

Jacobo Aragunde

Hot accessibility for LibreOffice 4.3.0

The first release candidates for LibreOffice 4.3.0 are already bouncing around the internet and, besides great new features like the ones I’ve been explaining in my latest posts, they come with a set of fixes to ease the life of screen reader users and developers alike. This is, once again, a part of the accessibility work we do at Igalia.

Back in April, when the Gran Canaria hackfest took place, I started working on bug #71556. The problem was that typing on a document triggered a lot of unnecessary text-attributes-changed events. These events had a variety of origins, but most of them were caused by modifications in the internal text attribute rsid, used for change tracking and with no relevance per se for the user. The bug is not completely fixed, but addressing the problem with the rsid attribute allowed us to get rid of the most annoying part of it; now LibreOffice only sends one unnecessary event when we type the first character in a new paragraph, and not with every keystroke.

Bug #71558 is also related to the same kind of events; in this case, text-attributes-changed was not being triggered when a word became marked as misspelled. Actually, the spell-checking status is not internally treated as a text attribute, and because of that there were no events indicating its change. The patch explicitly raises the event, which lets the accessibility code check the status of the text attributes and find out the spelling mistake. While I was working on this issue, I also detected a weird behavior when checking the text attributes through the Python API; it turned out to be a bug in the bridge between AT-SPI and ATK, which I reported and fixed too.

A triaging session took me to bug #74681; the main issue reported there had already been fixed for a while, and only small bits regarding missing accessible names in some buttons remained. I fixed that by allowing toolbar buttons to use their tooltip text as the accessible name when it is not explicitly set, and now the paragraph properties panel is fully accessible.

Finally, I resumed the work I had been doing in relation to ATK roles at bug #39944 and detected wrong mappings for the LibreOffice EDIT_BAR, EMBEDDED_OBJECT and HYPER_LINK accessible roles. I fixed them and opened a ticket in the ATK bugzilla to create ATK roles for the five cases that were still registering custom roles. Once that ticket is managed, hopefully we will be able to close bug #75191 too, which is related to the deprecation of atk_role_register.

These fixes are added on top of the ones coded in the February hackfest, making 4.3.0 the most accessible LibreOffice so far… Until the next version arrives, of course!

by Jacobo Aragunde Pérez at July 10, 2014 10:55 AM

July 04, 2014

Eduardo Lima Mitev

A possibly faster approach to OpenGL rasterization of 2D Web content

Even though it has been a while since my last entry on this blog, I have been quite busy. During most of last year I brought my modest contributions into an awesome startup that you have probably heard of by now, helping them integrate GNOME technologies into their products. I was lucky to join their team at an early stage of the development and participate in key discussions.

But more on that project on future entries.

Today I want to talk about things that keep me busy these days, and are of course related to Web engines. Specifically, I want to talk about 2D rasterization and the process of putting pixels on the screen as fast as possible (aka, the 60 frames-per-second holy grail). I want to discuss an idea that has the potential to significantly increase the performance of page rendering by utilizing modern GPU capabilities, OpenGL, and a bit of help from Web engines’ magic.

This is a technical article, so if you are not very familiar with 2D rasterization, OpenGL or how Web engines draw stuff, I recommend you take some time off and read about it. It is truly a world of wonders (and sometimes pain).

Instanced rendering

The core of the idea is based on instanced rendering. It is a fairly well known technique, available in core OpenGL since 3.1 and OpenGL ES since 3.0, and earlier through the GL_EXT_draw_instanced extension.

To draw geometry with OpenGL, one normally submits a primitive to the rendering pipeline. The primitive consists of a collection of vertices, and a number of attributes for each vertex. Traditionally, you could only submit one primitive at a time.

With instanced rendering, it is possible to send several “instances” of the same primitive (the same collection of vertices and attributes) on a single call. This dramatically reduces the overhead of pipeline state changes and gives the GPU driver a better chance at optimizing rendering of instances of a particular geometry.

Hence, it is generally a common practice for OpenGL applications to group rendering of similar geometry into batches, and submit them to the pipeline all at once as instances. This technique is commonly known as batching and merging.

Skia, the 2D rasterizer used by the Chromium and Android projects, and Cairo, a popular 2D rasterizer backing many projects such as GNOME and previous versions of Mozilla Firefox, both have, to some extent, support for some sort of instanced rendering in their respective GL backends.

Telling instances apart

Ok, it is possible to draw a bunch of primitives at once, but how can we make them look different? A way of customizing individual instances is necessary, otherwise they will all render on top of one another. Not very useful.

There are two ways of submitting per-instance information: one is by adding a “divisor” to the buffers containing vertex or attribute information (VBOs), which will tell the pipeline to use the divided chunks as per-instance information instead of per-vertex. glVertexAttribDivisor is used in this case.
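
For illustration, a minimal sketch of this first approach might look like the following (plain OpenGL, independent of any rasterizer discussed here): a per-instance color attribute is sourced from its own VBO, the divisor tells the pipeline to advance it once per instance rather than once per vertex, and a single instanced draw call emits all the quads. The buffer contents, attribute locations and the use of libepoxy as a GL loader are assumptions made just for the example.

/* Illustrative sketch: one quad drawn N times with a per-instance color.
 * Assumes a GL 3.3 (or ARB_instanced_arrays) context and a shader program
 * with attribute 0 bound to the vertex position and attribute 1 to the
 * instance color. */
#include <epoxy/gl.h>

#define NUM_INSTANCES 1024

static void
draw_instanced_quads (GLuint position_vbo, const float *instance_colors)
{
   GLuint color_vbo;

   /* Per-vertex data: a unit quad as a triangle strip. */
   glBindBuffer (GL_ARRAY_BUFFER, position_vbo);
   glVertexAttribPointer (0, 2, GL_FLOAT, GL_FALSE, 0, (void *) 0);
   glEnableVertexAttribArray (0);

   /* Per-instance data: one RGBA color per instance. */
   glGenBuffers (1, &color_vbo);
   glBindBuffer (GL_ARRAY_BUFFER, color_vbo);
   glBufferData (GL_ARRAY_BUFFER, NUM_INSTANCES * 4 * sizeof (float),
                 instance_colors, GL_STREAM_DRAW);
   glVertexAttribPointer (1, 4, GL_FLOAT, GL_FALSE, 0, (void *) 0);
   glEnableVertexAttribArray (1);

   /* A divisor of 1 makes attribute 1 advance once per instance. */
   glVertexAttribDivisor (1, 1);

   /* 4 vertices per quad, NUM_INSTANCES instances in a single call. */
   glDrawArraysInstanced (GL_TRIANGLE_STRIP, 0, 4, NUM_INSTANCES);
}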

The other way is to upload the per-instance information to a buffer texture (or any texture for that matter) and fetch the information for the corresponding instance by sampling, using the new gl_InstanceID variable available in shader code as the key. This variable increases for each instance of the geometry being rendered (as opposed to per vertex, for which you have gl_VertexID).

So, a quick recap so far. We are able to draw several instances of the same geometry at once, very efficiently, and are able to upload arbitrary data to customize each of these instances at will.

But wait, there are caveats.

The ordering problem

So, let’s say we can now group together all drawing operations that involve the same primitive (rectangle, line, circle, etc). What happens if we draw (say) a filled rectangle, then a circle on top, and then another rectangle on top of the circle?

Following the simple grouping rule, what will happen is that the two rectangles will be grouped together and drawn first in one call, then the circle. This will not render the expected result, since the circle will end up lying on top of the rectangle that was drawn after it.

This problem is commonly known as “ordering”, and it clearly breaks our otherwise super-performing batching and merging case.

So, in scenes that involve lots of overlapping geometry, the grouping is limited to contents that do not overlap, if we want to preserve the right order of operations.

In practice, it means that we first need to separate the content in layers, then group the same primitives within a single layer, and finally submit the batches from each layer in the right order.

But guess what? Browser engines already do exactly that. Engines build a layer tree (among several other trees) with the information contained in the HTML and CSS content (layout, styling, transformations, etc), where the content is separated into render nodes whose contents do not normally overlap. The actual process is much more complicated than that, but this simplification is enough to illustrate the idea.

Now, what if?

First, for the sake of presenting an idea, let’s ignore the 2D context of a canvas element for now. More on that later. Let’s focus on most of the web sites out there.

If we look at the number of primitives typically used by the rendering of a page, they boil down to a handful. Essentially, they are:

  • A rectangle: for almost all HTML elements, which are boxes. And character glyphs! which are normally rendered ahead of time and cached in a texture layout, then texture-mapped onto a rectangle. And images!, which are also texture-mapped onto rectangles.
  • A thin line: for thin (<=1 pixel) borders, inset/outset effect, hr, etc. Thicker borders can be drawn as thin rectangles.
  • A round corner: the quarter of a circle, filled or stroked, used to implement rounded rectangles (hello border-radius).
  • A circle: for bulleted listings, radio buttons, etc. Arguably, these can be rendered using Unicode characters, so there is no need for specific geometry.

Let’s stay with these. There are other cases that I will purposely ignore, like a rounded rectangle with different thicknesses on two consecutive borders.

Then we have, for each of these primitives, an evolutionary-like variety of background styles (imaged, colored, repeated, gradient, etc); transformations (rotation, translation, scaling, etc); border styles (again imaged, colored, with different thickness, etc), shadow and blurring effects, and so on.

With a working texture cache, we have a potentially good chance at aggressively grouping together drawing of these primitives, like rectangles for example, for all text glyphs, boxes and images.

So, what if we could submit to a smart shader all the information that describes and tells apart these grouped instances? Is it possible to efficiently pack and then re-interpret in a shader all the styling and transformation complexities of today's CSS-styled HTML elements?

A new approach

Existing 2D rasterizers used in Web engines (at least Skia and Cairo, whose source code is available to me) are general purpose drawing libraries. That means they should render deterministically for any kind of application, not only Web engines. Specifically, they need to avoid the ordering problem explained above, where the result of a set of overlapped drawing operations is different if you change their order.

There are several reasons why modern Web engines use general purpose 2D rasterizers (as opposed to rasterizers written specifically for the needs of Web content rendering). One clear reason is that they existed before (in the case of Cairo at least) as generic 2D graphics libraries, and were later used for Web rendering. Another reason is that the implementation of the Canvas 2D spec requires a general purpose 2D API, because that's what it is. And there is a clear benefit in reusing your beautifully optimized Canvas 2D implementation to draw the rest of the Web contents. Also, these libraries evolved from a pixmap (image) backed rendering target into libraries exploiting the hardware acceleration of GPU cards. Both libraries now feature an OpenGL(ES) backend that is somehow forced to comply with the previously existing API and behaviors.

But that is sub-optimal for Web engines that simply want to draw non-overlapping content into layers, then draw the layers in order. And even though batching and merging do occur in the GL backends today, it is apparently far from optimal as we will see later.


So, if we completely ignore the ordering problem for the case of Web engines drawing already layered nodes onto an OpenGL based render target, we might be able to aggressively group together potentially all the operations that share the same primitive.

This is of course if, as mentioned above, we are able to describe the particularities of each instance of these primitives, hand them down to a smart shader for rendering, then do all that efficiently so that the performance gained in batching is not lost by uploading tons of instance information to the GPU or running heavy shader code.

It is unclear (to me) whether this is at all possible. That is why this approach is just an idea, yet lacking validation for the real world. But it is research that could potentially boost the performance of Web content rendering.

It shares some similarities with (and was partially inspired by) the way Android does font rendering.

A proof of concept

So, I set out to write a proof of concept trying to validate or discard the idea as quickly as possible. The purpose is to write the minimum code that would allow meaningful comparison between this approach and existing rasterizers (Skia being my first target for this), for specific use cases that are relevant to generic Web content rendering (not Canvas 2D).

My proof of concept is being developed at: https://github.com/elima/glr

So far, it just provides a few primitives: rectangle and rounded corner, allowing for 3 basic drawing operations: rectangle (filled or stroked), rounded rectangle (only filled) and character glyph (not text, just single characters).

Then each element drawn can be transformed (scaled, rotated and/or translated), laid out on the canvas (top, left, width and height), and has a color or texture.

Anti-aliasing is achieved by multisampling with 8 samples per pixel. Character glyphs are not anti-aliased; that was too complex to put in a proof of concept, and it is a problem already solved by others anyway. I used the simplest possible path to put a pre-cached glyph on the screen, and for that wrote a super naive texture cache and used FreeType2 for rasterizing the glyphs.
The idea of including chars was to explore whether text glyphs, which account for most of a typical Web page’s content, could be batched together with all the other drawings that use a rectangle primitive.

Note should be taken that this proof of concept is not intended to become a new library. It is just a vehicle to validate an idea by providing the minimum implementation needed to test its limits. Eventually, all this would have to be implemented in existing libraries. I just happen to be very fluent in GLib and C :), which lets me prototype fast.

Comparisons

Before we jump into FPS excitement, let’s clarify that any comparison here should be taken with a grain of salt. 2D rasterizers are complex libraries that do a lot of non-trivial things like anti-aliasing, sub-pixel alignment, color space conversion, and adaptation to the specifics of the underlying hardware/driver/GL-version combos, to name just a few.

Thus, any comparison should be put in the context of what code paths are being selected, what rendering operations are being grouped, and when and why they aren't; how many GL operations are submitted to the pipeline to render the same scene, etc.

I have included 3 initial examples that try to illustrate how batching and merging of "compatible" draws (sharing the same underlying primitive) improves performance when ordering is ignored, while at the same time each element can have its own color, layout and transformation. For each example, I have written a Skia counterpart that tries to render exactly the same, to the extent possible, for the sake of comparison.

The data below corresponds to runs on my laptop, a Thinkpad T440p running Debian GNU/Linux with an integrated Intel GPU (4th gen); the OpenGL driver is provided by Mesa 10.2.1.

I used apitrace to look at what GL commands are actually sent to the driver.

Let’s start with the RectsAndText example. It basically draws a lot of alternating filled rectangles and character glyphs, each with its own color, transformation and layout. In the screencast below, both examples (Skia and glr) are running at the same time. This of course does not reflect real performance since both compete for GPU resources, but I decided to record it this way because the improvement is much more noticeable. The frames per second decrease proportionally for both examples when they run at the same time, so it remains relevant for comparison.

The window on the left corresponds to the Skia example, and the one on the right to glr. The same goes for all screencasts below.



This video file is also available for download.

The difference is considerable. Skia performs at an average of 6-7 FPS while the new approach gives around 40 FPS. That’s a 5x-6x improvement, not bad. Also, notice that CPU usage is considerably higher in the case of Skia.

The interesting thing here is that in the case of glr, all elements are batched together (both rectangles and chars), so only one drawing operation is actually submitted to the pipeline, as you can see in the available apitrace dump. A trace for the corresponding Skia example is also available.



apitrace output for RectAndText glr example



apitrace output for RectAndText Skia example

The next example is Rects, which is similar but renders only rectangles, alternating between filled and stroked. The interesting bit is that in the case of glr, each style of rectangle is drawn onto a different layer, each layer operating on its own separate thread, demonstrating that parallelism is now possible during batching.



This video file is available for download.

In this example, the performance difference is even higher. glr is around 8x faster. Again, apitrace traces for the glr example and the Skia version are available. This time glr submits a total of 2 instanced drawing operations, one for filled rects and one for stroked.



apitrace output for Rects glr example



apitrace output for Rects Skia example

The last example draws several layers of non-overlapping rounded rectangles. As with previous examples, every element is given a unique layout, color and transformation. This example tries to illustrate that because batching operates only at layer level, the more layers you have the less you benefit from this technique. In this particular example, the gap is reduced considerably. In fact it looks like Skia is faster by a few FPS, but that is actually not true: when both examples are run together, Skia is faster, but when run separately, the glr example is faster (though not by much). I’m still figuring this out.



This video file is available for download.

And the traces for the glr example and the Skia example.



apitrace output for Layers glr example



apitrace output for Layers Skia example


If you are curious about the implementation, take a look at where most of the magic happens: GlrContext, GlrCanvas and GlrBatch objects, and the vertex shader. The rest of the code is mostly API and glue to provide a coherent way to use this approach. Specifically, an abstract concept of "layer" is introduced. The workflow goes this way:

  • For initialization, a context, a rendering target and a canvas object are created. This is similar to how other 2D libraries work.
  • In the rendering loop and for each frame, the canvas is first notified that a new frame will be rendered.
  • Then any number of layer objects are created and attached to the canvas. The drawing API works against a layer (instead of a canvas), and will group and batch all the drawing operations in internal commands. When drawing to a layer finishes, the layer is notified that it is ready.
  • Finally, the canvas is requested to finish the frame, right before swapping buffers. This call will wait for all the attached layers to finish (blocking if needed). Once all complete, the canvas will take the batched commands from each layer, in the order they were attached, and submit them to the pipeline for rendering.

One thing to remark is that layers are self-contained stateful objects, and can survive frames without needing to redraw.

Other benefits

One by-product of the fact that layers cache drawing operations in internal commands (which in turn use locally allocated buffers) is that layers now become data-parallel. This is a term rarely used in the context of OpenGL because, as you probably know, the way its API is designed makes it a giant state machine, making any parallelization impractical.

With this approach, layers can be drawn in separate threads (or fully moved to OpenCL), which can bring extra performance if there are several complex layers that need drawing at the same time.

Another potential extra benefit comes from the fact that the canvas renders to a target that is actually a framebuffer backed by a multisample texture. This means we can use any previously rendered frame as a texture, the same way it currently works in both Chromium and WebKit, where layers are texture-mapped and then composited into the final scene.

So, we have the flexibility that, if a particular layer is too complex or slow to draw, we can attach it alone to a canvas, render it, and use the texture as with the current model. But, if we are short on texture memory, it is possible to keep commands batched in layers and render them on every frame. This is kind of similar to what Chromium does, recording draw operations into an SkPicture and then replaying them when needed.

Future work

This is an approach that needs validation for a number of real world use cases before it can be even considered for testing on a Web engine. It is key to explore how complex information (for example, multi-step gradient backgrounds, or complex border styling with rounded rectangles) can be passed to the shaders and rendered correctly and efficiently. Also, there are shadows and blurring effects, all parametrized to cover the most creative use cases, that also need verification against this model.

Basically, we need to understand the limits of the approach by trying to implement modern W3C specs, selecting the most complex features first.

Other important priorities are:

  • Understand how much workload can be imposed on shaders side before the gained performance starts to degrade.
  • Test on OpenGL-ES and constrained GPU on embedded ARM, to detect the minimum requirements.
  • Figure out how to implement a mid-frame flushing mechanism when texture cache exhausts or command buffers get too large. This is not trivial, since to flush a layer (that is possibly running in a separate thread) it has to be blocked, then the canvas has to wait for all layers below it to finish and then execute their commands, then signal the blocked layer to continue.
  • Test how scrolling would behave if previously batched layers are drawn for every frame, instead of using current scrolling techniques that rely on rolling big textures or moving several tiles up and down. These techniques impose either great pressure on texture memory, or a lot of complexity on tile management (or both), especially in the context of new super-high resolution screens.

Conclusions and final words

I have tried to detail an idea that, although not new, I believe has not been explored in full in the context of Web engines. It relies on two essential hypotheses:

  • That it is possible to batch not only geometry, but the complex attributes of arbitrarily styled HTML elements, and render that geometry as instances using shader code.
  • That it is safe to ignore the ordering of draw operations during the rasterization phase, and lean on the Web engine’s layer tree to solve overlapping.

Modern GPUs and OpenGL APIs have great potential for optimizing 2D rasterization but, as happens most of the time, there is no one solution that fits all. Instead, each particular application and use case requires a different set of strategies and trade-offs for optimum performance.

This approach, even if valid for a sufficient number of use cases, is unlikely to be faster than existing approaches for all test cases, much less replace them. This is pretty clear in the case of Canvas 2D, for example, which will continue to require a general purpose rasterizer. But if there is a sufficient number of use cases that would benefit from this approach to some degree, then maintaining one code path that enables it will already be a win.

Finally, I want to thank Samsung SRA for partially sponsoring the time I dedicated to pursuing this idea, and also Igalia and the Igalians, who are always there to back me up and help me move forward.

Now, is there anyone interested in helping me explore this idea further?

by elima at July 04, 2014 08:28 AM

July 01, 2014

Andy Wingo

flow analysis in guile

Greets, and welcome back to the solipsism! I've been wandering the wilderness with my Guile hackings lately, but I'm finally ready to come back to civilization. Hopefully you will enjoy my harvest of forest fruit.

Today's article is about flow analysis and data structures. Ready? Let's rock!

flow analysis

Many things that a compiler would like to know can be phrased as a question of the form, "What do I know about the data flowing through this particular program point?" Some things you might want to know are:

  1. The set of variables that must be live.

  2. The set of variables that happen to be live. This is the same as (1) except it includes variables that aren't needed but haven't been clobbered by anything.

  3. The set of expressions whose results are still valid (i.e., haven't been clobbered by anything else).

  4. An upper and lower bound on the range of numeric variables.

Et cetera. I'll talk about specific instances of flow analysis problems in the future, but today's article is a bit more general.

The first thing to note about these questions is that they don't necessarily need or have unique answers. If GCC decides that it can't prove anything about the ranges of integers in your program, it's not the end of the world -- it just won't be able to do some optimizations that it would like to do.

At the same time, there are answers that are better and worse than others, and answers that are just invalid. Consider a function of the form:

int f():
  int a = 1
  int b = 2
  int c = a + b
  int d = b + c
  ...
  int z = x + y
  return z

In this function, there are 27 different expressions, including the return, and 27 different program points. (You can think of a program point as a labelled sub-expression. In this example none of the expressions have sub-expressions.) If we number the program points in order from 0 to 26, we will have a program that first executes expression 0 (int a = 1), then 1, and so on to the end.

Let's plot some possible solutions to the live variable flow-analysis problem for this program.

Here we see two solutions to the problem (in light and dark blue), along with a space of invalid solutions (in red). The Y axis corresponds to the variables of the program, starting with a on the bottom and finishing with z on the top.

For example, consider position 4 in the program, corresponding to int e = c + d. It is marked in the graph with a vertical bar. After position 4, the only values that are used in the rest of the program are d and e. These are the variables that are contained within the light-blue area. It wouldn't be invalid to consider a, b, and c to be live also, but it also wouldn't be as efficient to allocate space and reason about values that won't contribute to the answer. The dark blue space holds those values that may harmlessly be considered to be live, but which actually aren't live.

It would, however, be invalid to consider the variable f to be live after position 4, because it hasn't been defined yet. This area of the variable space is represented in red on the graph.

Of course, the space of all possible solutions isn't possible to represent nicely on a two-dimensional graph; we're only able to show two with colors, and that not very well as they overlap. This difficulty cuts close to the heart of the data-flow problem: that it ultimately requires computing a two-dimensional answer, which necessarily takes time and space O(n²) in program size.

Or does it?

classical flow analysis frameworks

The classical way to do flow analysis is to iterate a set of data-flow equations over a finite lattice until you reach a fixed point.

That's a pithy sentence that deserves some unpacking. If you're already comfortable with what it means, you can skip a couple sections.

Still here? Cool, me too. Let's take a simple example of sign analysis. The problem is to determine, for the integer variables of a program, at every point in the program, which ones may be negative (-), which ones may be zero (0), and which may be positive (+). All of these are conceptually bit-flags.
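
As a tiny illustration (my own sketch, not code from this article), the three flags can be packed into a small integer bitmask:

# Sketch (mine): the three sign flags packed into a bitmask, so each
# variable's abstract sign value is an integer between 0 and 7.
NEG, ZERO, POS = 1, 2, 4       # the -, 0 and + flags
UNKNOWN = NEG | ZERO | POS     # -0+ : nothing is known about the sign

def may_be_negative(value):
    return bool(value & NEG)

def join(a, b):
    # Combining what we know along two paths: union of the flags.
    return a | b

assert join(POS, ZERO) == ZERO | POS   # a value that may be 0 or +
assert may_be_negative(UNKNOWN)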

For example, in this program:

int f(int x):
 L0:  while (x >= 0)
 L1:    int y = x - 1
 L2:    x = y
 L3:  return x

We can assign the flags -0+ to the argument x as the initial state that flows into L0, because we don't know what it is ahead of time, and it is the only variable in scope. We start by representing the initial state of the solution as a set of sets of state values:

state := {L0: {x: -0+}, L1: Ø, L2: Ø, L3: Ø}

In this notation, Ø indicates a program point that hasn't been visited yet.

Now we iterate through all labels in the program, propagating state to their successors. Here is where the specific problem being solved "hooks in" to the generic classical flow analysis framework: before propagating to a successor, a flow equation transforms the state that flows into a program point to a state that flows out, to the particular successor. In this case we could imagine equations like this:

visit_test(expr, in, true_successor, false_successor):
  if expr matches "if var >= 0":
    # On the true branch, var is not negative.
    propagate(in + {var: in[var] - -}, true_successor)
    # On the false branch, var is not zero and not positive.
    propagate(in + {var: in[var] - 0+}, false_successor)
  else if ...

visit_expr(expr, in, successor):
  if expr matches "left = right - 1":
    if in[right] has +:
      if in[right] has 0:
        # Subtracting one from a non-negative arg may be negative.
        propagate(in + {left: in[right] + -}, successor)
      else
        # Subtracting one from a positive arg may be 0.
        propagate(in + {left: in[right] + 0}, successor)
    else:
      # Subtracting one from a nonpositive arg will be negative.
      propagate(in + {left: -}, successor)
  else if expr matches "left = right":
    propagate(in + {left: in[right]}, successor)
  ...

The meat of classical data-flow analysis is the meet operation:

propagate(out, successor):
  if state[successor] is Ø:
    state[successor] = out
  else
    state[successor] = meet(out, state[successor])

# A version of meet for sign analysis
meet(out, in):
  return intersect_vars_and_union_values(out, in)

Let's run this algorithm by hand over the example program. Starting from the initial state, we propagate the L0→L1 and L0→L3 edges:

visit_test("if x <= 0", {x: -0+}, L1, L3)
→ propagate({x: 0+}, L1)
→ state[L1] = {x: 0+}
→ propagate({x: -}, L3)
→ state[L3] = {x: -}

Neat. Let's keep going. The successor of L1 is L2:

visit_expr("y = x - 1", {x: 0+}, L2)
→ propagate({x: 0+, y: -0+}, L2)
→ state[L2] = {x: 0+, y: -0+}

L2→L0 is a back-edge, returning to the top of the loop:

visit_expr("x = y", {x: 0+, y: -0+}, L0)
→ propagate({x: -0+, y: -0+}, L0)
→ state[L0] = meet({x: -0+, y: -0+}, state[L0])
→ state[L0] = meet({x: -0+, y: -0+}, {x: -0+})
→ state[L0] = {x: -0+}

Finally, L3 has no successors, so we're done with this iteration. The final state is:

{L0: {x: -0+},
 L1: {x: 0+},
 L2: {x: 0+, y: -0+},
 L3: {x: -}}

which indeed corresponds with what we would know intuitively.

fixed points and lattices

Each of the steps in our example flow analysis was deterministic: the result was calculated from the inputs and nothing else. However the backwards branch in the loop, L2→L0, could have changed inputs that were used by the previous L0→L1 and L0→L3 forward edges. So what we really should do is iterate the calculation to a fixed point: start it over again, and run it until the state doesn't change any more.
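
A minimal sketch of that outer loop (my own illustration; labels, successors and transfer are stand-ins for the program points, the CFG edges and the visit_test/visit_expr equations above):

# Sketch of the outer fixed-point loop.  transfer(label, succ, state)
# plays the role of visit_test/visit_expr: it maps the state flowing into
# `label` to the state flowing out towards one successor `succ`.
def solve(labels, successors, transfer, meet, initial_state):
    state = dict(initial_state)          # e.g. {L0: {...}, L1: None, ...}
    changed = True
    while changed:                       # re-run until nothing changes
        changed = False
        for label in labels:
            if state[label] is None:     # Ø: not visited yet
                continue
            for succ in successors(label):
                out = transfer(label, succ, state[label])
                new = out if state[succ] is None else meet(out, state[succ])
                if new != state[succ]:
                    state[succ] = new
                    changed = True
    return state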

It's easy to see in this case that running it again won't end up modifying the state. But do we know that in all cases? How do we know that iteration would terminate at all? It turns out that a few simple conditions are sufficient.

The first thing to ensure is that the state space being explored is finite. Here we can see this is the case, because there are only so many ways you can combine -, 0, and +. Each one may be present or not, and so we have 2^n = 2^3 = 8 possible states. The elements of the state array will be a set with at most one entry for each variable, so the whole state space is finite. That at least ensures that an answer exists.

Next, the "meet" operation has to be commutative, associative, and idempotent. The above example used intersect_vars_and_union_values. We intersect vars because it only makes sense to talk about a variable at a program point if the variable dominates the program point. It didn't make sense to propagate y on the L2→L0 branch, for example. It's usually a good idea to model a data-flow problem using sets, as set union and intersection operations fulfill these commutative, associative, and distributive requirements.

Finally, the state being modelled should have a partial order, and functions that add information along control-flow edges -- above, visit_test and visit_expr -- should preserve this partial ordering. That is to say, visit_test and visit_expr should be monotonic. This means that no matter on what control paths data propagates, we keep building towards an answer with more information, making forward progress. This condition is also easily fulfilled with sets, or more generally with any lattice. (A lattice is nothing more than a data type that fulfills these conditions.)

Iterating the data-flow equations until the state stops changing will find a fixed point of the lattice. Whether you find the greatest or least fixed point is another question; I can't help linking to Paul Khuong's old article on Québécois student union strikes for a lovely discussion.

Another question is, how many iterations are necessary to reach a fixed point? I would first note that although in our walk-through we iterated in forward order (L0, L1, L2, L3), we could have visited nodes in any order and the answer would be the same. I'll cut to the chase and say that if:

  1. you represent your state with bitvectors

  2. the control-flow graph is reducible (has only natural loops)

  3. the meet operation on values is bitvector union or intersection

  4. you visit the program points in topologically sorted order

If these conditions are fulfilled, then you will reach a fixed point after LC + 2 iterations, where LC is the "loop-connectedness number" of your graph. You can ensure (1), (3), and (4) by construction. (Reverse post-order numbering is an easy way to fulfill (4).) (2) can be ensured by using programming languages without goto (a for loop is always a natural loop) but can be violated by optimizing compilers (for example, via contification).
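
For what it's worth, reverse post-order numbering can be computed with a straightforward depth-first traversal; a small sketch of my own:

# Sketch: reverse post-order of a CFG, given its entry label and a
# `successors` function.  Visiting program points in this order is an
# easy way to fulfill condition (4).
def reverse_post_order(entry, successors):
    seen, post_order = set(), []
    def dfs(label):
        seen.add(label)
        for succ in successors(label):
            if succ not in seen:
                dfs(succ)
        post_order.append(label)
    dfs(entry)
    return list(reversed(post_order))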

Loop connectedness is roughly equivalent to the maximum nesting level of loops in the program, which has experimentally been determined to rarely exceed 3. Therefore in practice, data-flow analysis requires a number of steps that is O(n * 5) = O(n) in program size.

For more information on data-flow analysis, including full proofs and references, see Carl Offner's excellent, excellent manuscript "Notes on Graph Algorithms used in Optimizing Compilers". I don't know of any better free resource than that. Thanks, Carl!

an aside: the kCFA algorithms

I just finished describing what I called "classical" data-flow analysis. By that I mean to say that people have been doing it since the 1970s, which is classical enough as far as our industry goes. However with the rise of functional languages in the 1980s, it became unclear how to apply classical data-flow analysis on a language like Scheme. Let's hear it from the horse's mouth:

This brings us to the summer of 1984. The mission was to build the world's most highly-optimising Scheme compiler. We wanted to compete with C and Fortran. The new system was T3, and the compiler was to be called Orbit. We all arrived at WRL and split up responsibility for the compiler. Norman was going to do the assembler. Philbin was going to handle the runtime (as I recall). Jonathan was project leader and (I think) wrote the linker. Kranz was to do the back end. Kelsey, the front end. I had passed the previous semester at CMU becoming an expert on data-flow analysis, a topic on which I completely grooved. All hot compilers do DFA. It is necessary for all the really cool optimisations, like loop-invariant hoisting, global register allocation, global common subexpression elimination, copy propagation, induction-variable elimination. I knew that no Scheme or Lisp compiler had ever provided these hot optimisations. I was burning to make it happen. I had been writing 3D graphics code in T, and really wanted my floating-point matrix multiplies to get the full suite of DFA optimisation. Build a DFA module for T, and we would certainly distinguish ourselves from the pack. So when we divided up the compiler, I told everyone else to back off and loudly claimed DFA for my own. Fine, everyone said. You do the DFA module. Lamping signed up to do it with me.

Lamping and I spent the rest of the summer failing. Taking trips to the Stanford library to look up papers. Hashing things out on white boards. Staring into space. Writing little bits of experimental code. Failing. Finding out *why* no one had ever provided DFA optimisation for Scheme. In short, the fundamental item the classical data-flow analysis algorithms need to operate is not available in a Scheme program. It was really depressing. I was making more money than I'd ever made in my life ($600/week). I was working with *great* guys on a cool project. I had never been to California before, so I was discovering San Francisco, my favorite city in the US and second-favorite city in the world. Silicon Valley in 1984 was beautiful, not like the crowded strip-mall/highway hell hole it is today. Every day was perfect and beautiful when I biked into work. I got involved with a gorgeous redhead. And every day, I went in to WRL, failed for 8 hours, then went home.

It was not a good summer.

At the end of the summer, I slunk back to CMU with my tail between my legs, having contributed not one line of code to Orbit.

Olin Shivers, A history of T

It took him another 7 years, but Shivers stuck with it, and in the end came out with the family of algorithms known as k-CFA. Instead of focusing on loops, which Scheme doesn't have syntactically, Shivers used continuation-passing style to ruthlessly simplify Scheme into a dialect consisting of not much more than function calls, and focused his attention on function calls. The resulting family of flow algorithms can solve flow equations even in the presence of higher-order functions -- a contribution to computer science born out of necessity, failure, and stubbornness.

With all those words, you'd think that I'd be itching to use k-CFA in Guile, and I'm not. Unfortunately even the simplest, least expressive version (0-CFA) is O(n²); 1-CFA is exponential. I don't have time for that. Instead, Guile is able to use classical DFA because it syntactically distinguishes labelled continuations and functions, and contifies functions to continuations where possible, which makes the Scheme DFA problem exactly the same as in any other language.

n times what?

Now that we have established that the number of visit operations is O(n), it remains to be seen what the individual complexity of a visit operation is in order to determine the total complexity. The naïve thing is just to use bitvectors, with each of the bitvectors having as many entries as the program has variables, times however many bits we are using.

This leads to O(|L|*|V|) space and time complexity, where |L| is the number of program points (labels) and |V| is the number of variables. As the number of variables is generally proportional to the size of the program, we can approximate this as O(n²).

In practice, this means that we can use data-flow analysis on programs up to about 10000 labels in size. Sign analysis on a 10000-label function would require 10000² * 3 / 8 bytes = 37.5 MB of memory, which is already a bit hefty. It gets worse if you need to represent more information. I was recently doing some flow-sensitive type and range inference, storing 12 bytes per variable per program point; for a 10000-label function, that's more than a gigabyte of memory. Badness.

shared tails

Although it was the type inference case that motivated this investigation, sign inference is similar and simpler, so let's go with that. The visit_expr and visit_test functions above are only ever going to add information about the variables that are used in or defined by an expression; in practice this is a small finite number. What if we chose a representation of state that could exploit this fact by only adding O(1) amounts of data, sharing a common tail with preceding expressions?

If we draw a control-flow graph for the sign analysis program, we get something like:

The goal is to create a data structure that looks like the dominator tree. For "normal" control-flow edges -- those whose destination has only one predecessor -- we can avoid the "meet" operations, and just copy the predecessor's out set to the successor's in set. We then define "meet" as an adjoin operation that effectively conses the new information onto a shared tail, if it wasn't there already. The first iteration through the CFG will initialize the shared tail of a given control-flow join to the set of variables flowing into the join's dominator. Subsequent information will adjoin (cons) on new incoming values. In this case the resulting data structure ends up looking like:

Here the italic references like L1 indicate shared structure, and the tuples annotating the edges represent additional information flow, beyond that information that was already present in the successor's dominator.

Of course, you can implement this with linked lists and it will work fine. The problem there will be lookup speed -- when your visit operation (visit_expr or visit_test) goes to look up the sign of a variable, or the same happens via the meet operation, you get O(n) lookup penalties. Anticipating this, I implemented this with a version of Phil Bagwell's vhashes, which promise O(log n) variable lookup. See Guile's documentation, or Bagwell's excellent paper.

Note that you can't remove items from sets once they have been added in a shared-tail flow analysis; to keep the meet function monotonic, you have to instead insert tombstone entries. Not so nice, but it is what it is.
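
To make the idea more concrete, here is a toy sketch (mine, much simpler than Guile's actual representation) of states as immutable cons lists whose tails are shared between program points:

# A state is either None (empty) or a tuple (var, value, tail); the tail
# is shared, so adjoining a fact costs O(1) extra memory, while lookup
# pays the O(n) penalty mentioned above.
def lookup(state, var):
    while state is not None:
        key, value, tail = state
        if key == var:
            return value
        state = tail
    return None

def adjoin(state, var, value):
    # Cons the new fact onto the shared tail if it isn't there already;
    # removing facts would need tombstones, as noted above.
    return state if lookup(state, var) == value else (var, value, state)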

A shared-tail flow analysis consumes only O(1) additional memory per node, leading to O(n) space complexity. I have some measured space and time graphs below that show this experimentally as well.

space and time

Unfortunately, lookup time on branchy vhashes is really terrible: O(log n) in the best case, and O(n) at worst. This is especially acute because there is no easy way to do unions or intersections on vhashes -- you end up having to compute the unshared heads of the two vhashes you are merging, and looking up elements in one in the other... I could go on, but I think you believe me when I say it gets complicated and slow. It's possible to beat a bitvector approach in time for relatively "big" problems like type analysis, but for common subexpression elimination where I was just storing a bit per expression, it was tough to beat the speed of bitvectors.

I started looking for another solution, and in the end came on a compromise that I am much happier with, and again it's Phil Bagwell to the rescue. Instead of relying on vhashes that explicitly share state, I use Clojure-style persistent sparse bit-sets and bit-maps that share state opportunistically.

Guile's intset module implements a bitvector as a functional tree whose branches are vectors and whose leaves are fixnums. Each leaf represents one range of 32 integers, and each branch on top of it increases the range by a factor of 8. Branches can be sparse, so not all integers in the range of an intset need leaves.

As you would expect, adjoining an element onto such a tree is O(log n). Intersecting is much faster than vhashes though, as intsets partition the key space into power-of-two blocks. Intsets try hard to share state, so that if your adjoin would return the same value, the result is the same object, at the same address. This allows sub-trees to be compared for equality via pointer comparison, which is a great fast-path for intersection and union.
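
A toy, fixed-depth sketch of the idea (my own, far simpler than the real intset): leaves are 32-bit masks and a single branch level of 8 children covers the range 0–255.

# adjoin returns the *same* tuple when the bit was already set, so equal
# sub-trees can later be detected by identity, which is the fast path
# that union exploits below.
EMPTY = (0,) * 8                         # 8 leaves of 32 bits each

def adjoin(branch, i):
    leaf_index, bit = divmod(i, 32)
    leaf = branch[leaf_index]
    new_leaf = leaf | (1 << bit)
    if new_leaf == leaf:
        return branch                    # nothing changed: share the object
    children = list(branch)
    children[leaf_index] = new_leaf
    return tuple(children)

def union(a, b):
    if a is b:                           # pointer-equality fast path
        return a
    return tuple(x | y for x, y in zip(a, b))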

Likewise, Guile's new intmap module allows the association of larger values with integer keys.

science! fetch your white coats and lab books!

I had the chance to actually test the system with all three of these data structures, so I compiled one of Guile's bigger files and recorded the memory used and time taken when solving a 1-bit flow analysis problem. This file has around 600 functions, many of them small nested functions, many of them macro-generated, some of them quite loopy, and one big loopless one (6000 labels) to do the initialization.

First, a plot of how many bytes are consumed per label while solving this 1-bit DFA.

Note that the X axis is on a log scale.

The first thing that pops out at me from these graphs is that the per-label overhead of the vhash approach is indeed constant. This is a somewhat surprising result for me; I thought that iterated convergence would make this overhead depend on the size of the program being compiled.

Secondly we see that the bitvector approach, while quadratic in overall program size, is still smaller until we get to about 1000 labels. It's hard to beat the constant factor for packed bitvectors! Note that I restricted the Y range, and the sizes for the bitvector approach are off the charts for N > 1024.

The intset size is, as we expected, asymptotically worse than vhashes, but overall not bad. It stays on the chart at least. Surprisingly, intsets are often better than vhashes for small functions, where we can avoid allocating branches at all -- note the "shelves" in the intset memory usage, at 32 and 256 entries, respectively, corresponding to the sizes that require additional levels in the tree. Things keep on rising with n, but sublinearly (again, recall that the X axis is on a log scale).

Next, a plot of how many nanoseconds it takes per label to solve the DFA equation above.

Here we see, as expected, intsets roundly beating vhashes for all n greater than about 200 or so, and showing sublinear dependence on program size.

The good results for vhashes for the largest program are because the largest program in this file doesn't have any loops, and hardly any branching either. It's the best case for vhashes: all appends and no branches. Unfortunately, it's not the normal case.

It's not quite fair to compare intsets to bitvectors, as Guile's bitvectors are implemented in C and intsets are implemented in Scheme, which runs on a bytecode VM right now. But still, the results aren't bad, with intsets even managing to beat bitvectors for the biggest function. The gains there probably pay for the earlier losses.

This is a good result, considering that the goal was to reduce the space complexity of the algorithm. The 1-bit case is also the hardest case; when the state size grows, as in type inference, the gains of using structure-sharing trees grow accordingly.

conclusion

Let's wrap up this word-slog. A few things to note.

Before all this DFA work in Guile, I had very little appreciation of the dangers of N-squared complexity. I mean, sometimes I had to think about it, but not often, especially if your constant factors are low, or so I thought. But I got burned by it; hopefully the next time, if any, will be a long time coming.

I was happily, pleasantly surprised at the expressiveness and power of Bagwell/Clojure-style persistent data structures when applied to the kinds of problems that I work on. Space-sharing can make a fundamental difference to the characteristics of an algorithm, and Bagwell's data structures can do that well. Intsets simplified my implementations because I didn't have to reason much about space-sharing on my own -- finding the right shared tail for vhashes is, as I said, an unmitigated mess.

Finally I would close by saying that I was happy to fail in such interesting (to me) ways. It has been a pleasant investigation and I hope I have been able to convey some of the feeling of it. If you want to see what the resulting algorithm looks like in practice, see compute-truthy-expressions.

Until next time, happy hacking!

by Andy Wingo at July 01, 2014 08:00 AM

June 26, 2014

Jacobo Aragunde

Tales of LibreOffice interoperability: shape effects

We continue introducing features that will be part of the 4.3.0 release of LibreOffice, which is coming soon. After having worked on the preservation of color in shapes, we worked on the different effects that can be applied to shapes and bitmaps.

There are three types of effects that are managed separately in the DrawingML specification.

General shape effects

Examples of these effects are inner or outer shadows, reflections, glow… They can be applied both to vector shapes and bitmaps, and several of them can be applied at the same time.

Shape effects sample

These effects are indicated with the a:effectLst tag inside the shape properties tag spPr. I won’t explain their specification in detail because you can find a very good description on this website, but you can get an idea by taking a look at the following example where three effects are applied: glow, inner shadow and reflection:

<a:effectLst>
  <a:glow rad="63500">
    <a:schemeClr val="accent2">
      <a:satMod val="175000" />
      <a:alpha val="40000" />
    </a:schemeClr>
  </a:glow>
  <a:innerShdw blurRad="63500" dist="50800"
  dir="2700000">
    <a:prstClr val="black">
      <a:alpha val="50000" />
    </a:prstClr>
  </a:innerShdw>
  <a:reflection blurRad="6350" stA="52000"
  endA="300" endPos="35000" dir="5400000"
  sy="-100000" algn="bl"
  rotWithShape="0" />
</a:effectLst>

Notice that some effects only have attributes, while others contain color specifications as child elements, like the ones explained in the previous post.

3D effects

Shapes and bitmaps can be transformed into 3D objects and get lighting and camera modifications applied to them.

Shape 3D effects sample

These effects are basically controlled by two children tags of spPr. One of them is a:scene3d and controls the camera and lighting, and the other one is a:sp3d, which controls the transformation of the shape into a 3D object, adding extrusion, bevels and a material effect to the surface. On the same website I linked before, you can read a description of the scene3d and sp3d tags and their children. Find an example of their combined use below:

<a:scene3d>
  <a:camera prst="perspectiveRelaxedModerately"
  zoom="150000">
    <a:rot lat="19490639" lon="0"
    rev="12900001" />
  </a:camera>
  <a:lightRig rig="threePt" dir="t">
    <a:rot lat="0" lon="0" rev="4800000" />
  </a:lightRig>
</a:scene3d>
<a:sp3d z="488950" extrusionH="63500"
prstMaterial="metal">
  <a:bevelT w="165100" prst="coolSlant" />
  <a:extrusionClr>
    <a:schemeClr val="tx2" />
  </a:extrusionClr>
</a:sp3d>

Artistic effects

Effects from the last category act like the filters found in image manipulation programs (blur, grain or background removal, among others), and that’s why they can only be applied to bitmaps. This is actually not a part of the DrawingML spec but an extension over it.

There is an important difference with other filters; these ones come pre-calculated in the document. The bitmap linked by the DrawingML shape already comes with the effect applied, and the effect specification links a second bitmap that contains the original picture so the effect can be undone. This second bitmap is saved in the relatively new lossless Windows Media Photo format.

Writer screenshot showing artistic effects

Check the following example of a blip-filled shape; the actual filling comes from the file linked as rId6, while the effect definition links to rId7, which is a copy of the original image before the filter was applied:

<a:blip r:embed="rId6" cstate="print">
  <a:extLst>
    <a:ext uri="{BEBA8EAE-BF5A-486C-A8C5-ECC9F3942E4B}">
      <a14:imgProps xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main">
        <a14:imgLayer r:embed="rId7">
          <a14:imgEffect>
            <a14:artisticLightScreen trans="10000" gridSize="6" />
          </a14:imgEffect>
        </a14:imgLayer>
      </a14:imgProps>
    </a:ext>
  </a:extLst>
</a:blip>

These are the relations between the ids and the files contained in the document, as specified at document.xml.rels:

<Relationship Id="rId6"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
  Target="media/image2.png" />
<Relationship Id="rId7"
  Type="http://schemas.microsoft.com/office/2007/relationships/hdphoto"
  Target="media/hdphoto1.wdp" />

The funny thing about this approach is that LO was able to render these effects with no effort, even though the program was not aware of the effect parameters or the original bitmap, and these were being lost on save.

Preservation

We use the grab bag technique again to save all the tags and attributes related to the effects as a hidden property that will be used later in the export phase to re-build the effect definitions. In the case of artistic effects, we additionally need to make sure that the original bitmap is preserved; LibreOffice doesn’t support the Windows Media Photo format yet, but we can keep the raw stream of data and output it to a properly named file in the exported document. A small cache table is maintained by the exporter code to prevent the same original file from being saved more than once when two or more pictures apply effects to the same image.

We have finished with the improvements related to shapes and pictures, but there are a few interoperability features not yet mentioned which will be covered in a future post. Like the current and previous ones, they were developed by Igalia and sponsored by CloudOn.

Happy hacking!

by Jacobo Aragunde Pérez at June 26, 2014 12:15 PM

June 19, 2014

Manuel Rego

CSS Grid Layout Automatic Placement

In his last post, my mate Sergio explained the different syntaxes to position elements inside a grid. Now it is time to talk about the automatic placement feature, how it works, and to show some examples of its potential.

Auto-placement

The concept is defined in the specification:

Grid items that aren’t explicitly placed are automatically placed into an unoccupied space in the grid container.

So, let’s start with a simple example to show how it works. Imagine that you have the following 2×2 grid:

<div style="display: grid;">
    <div style="grid-row: 1; grid-column: 1 / span 2;">first</div>
    <div style="grid-row: 2; grid-column: 2;">second</div>
    <div>auto</div>
</div>

The first item uses the 2 columns of the 1st row, and the second item is placed in the 2nd row, 2nd column. That leaves an empty space in the 2nd row, 1st column, where the auto item is placed.

Simple example where the auto-positioned item is placed into the 2nd row 1st column

Of course you can do more complex things, like setting a specific position in one dimension (row or column) and leaving the other as auto, requesting more than one cell for the item, etc., as you can see in the following example of a 3×3 grid:

<div style="display: grid;">
    <div style="grid-column: span 2;">item 1</div>
    <div style="grid-column: 3;">item 2</div>
    <div style="grid-row: span 2;">item 3</div>
    <div style="grid-row: span 2; grid-column: span 2;">item 4</div>
    <div style="grid-row: 2;">item 5</div>
</div>
Example animation of how different auto-positioned items are placed in the grid

Let’s analyze why elements are positioned as you can see in the animation above:

  • item 1: This is the first element placed; it needs 2 columns and is placed in the 1st row, 1st-2nd columns, as they’re empty.
  • item 2: This item only specifies the column, so it’s placed in the 1st row, 3rd column, as that’s the first empty cell in the 3rd column.
  • item 3: In this case the item needs 2 rows, and it’s placed in the 2nd-3rd rows, 1st column.
  • item 4: Here the item needs 2 rows and 2 columns; we’re lucky as there are some empty cells, so it’s placed in the 2nd-3rd rows and 2nd-3rd columns.
  • item 5: This item needs to be placed in the 2nd row; as there are no empty cells in that row, it’ll be placed in a new column outside the current grid, in the 2nd row, 4th column. Imagine that this item were completely auto (we remove the “grid-row: 2;” style): then it’d be placed in a new row outside the current grid, in the 4th row, 1st column.
    Why in a new row and not in a new column? That’s managed by the grid-auto-flow property that’ll be explained below.

grid-auto-flow property

Again from the spec:

The grid-auto-flow property controls the direction in which the search for unoccupied space takes place, and whether rows or columns are added as needed to accommodate the content.

The simpler (and easier to understand) values for this property are row (the default) and column. These values define the direction that the auto-placement algorithm should follow to place the auto-positioned items. For example, if the value is row, the algorithm will try to fill each row before jumping to the next one, adding new rows as necessary.

Also, this property allows combining these values (row and column) with other keywords (which cannot be used together):

  • dense: By default the auto-placement algorithm is sparse, which means that if placing an auto item leaves some empty cells in the grid (because the item doesn’t fit), they won’t be used anymore. If dense is used, these holes will be used if smaller items come up later.
  • stack: In this case the auto-placement algorithm looks for the first empty cell (as if it were adding a 1×1 item). Then it places all the auto-positioned items in that cell (independently of their size), stacked atop one another.

As usual it’s easier to understand with a HTML example:

<div style="display: grid;">
    <div style="grid-row: 1; grid-column: 2;">item 1</div>
    <div style="grid-column: span 2;">item 2</div>
    <div>item 3</div>
</div>
Example grid with different values for grid-auto-flow property

In the image you can check how the auto-positioned elements (item 2 and item 3) are placed depending on the different values of grid-auto-flow (a small code sketch of the sparse case follows this list):

  • sparse (grid-auto-flow: row;):
    • item 2: It looks for an empty area of 2 columns, so it skips the cell in the 1st row, 1st column, leaving it as an empty space. The item is placed in the 2nd row, 1st-2nd columns.
    • item 3: It looks for the next empty space and adds the item in the 3rd row, 1st column. This leaves a hole in the 1st row, 1st column.
  • dense (grid-auto-flow: row dense;):
    • item 2: This is placed exactly the same as in the sparse case. The item is placed in the 2nd row, 1st-2nd columns.
    • item 3: It looks for an empty cell and adds the item in the 1st row, 1st column, as it’s still empty (item 2 didn’t use it because it doesn’t fit there).
  • stack (grid-auto-flow: stack row;):
    • item 2: It looks for the first empty cell, and finds the 1st row, 1st column. All the auto-positioned items will be placed there. So item 2 is placed in the 1st row, 1st-2nd columns (even if it takes more than one cell, the first empty cell keeps being used as the reference).
    • item 3: Again, this is added in the 1st row, 1st column, over item 2.
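
To make the sparse behaviour more tangible, here is a rough sketch (my own, greatly simplified with respect to the real algorithm in the spec) of row-flow sparse placement for auto-positioned items:

# Items are (name, row_span, col_span) in document order; `occupied` holds
# the (row, column) cells already taken by explicitly placed items.  The
# cursor only moves forward, so skipped cells stay empty (sparse).
def place_sparse(items, columns, occupied=()):
    occupied = set(occupied)
    cursor_row, cursor_col, placements = 0, 0, {}
    for name, row_span, col_span in items:
        row, col = cursor_row, cursor_col
        while True:
            if col + col_span > columns:           # wrap to the next row
                row, col = row + 1, 0
                continue
            cells = {(row + r, col + c)
                     for r in range(row_span) for c in range(col_span)}
            if cells & occupied:                   # doesn't fit: keep scanning
                col += 1
                continue
            occupied |= cells
            placements[name] = (row, col)
            cursor_row, cursor_col = row, col
            break
    return placements

# Mirrors the sparse case above: a 2-column grid where an explicitly placed
# item already occupies the 1st row, 2nd column (cell (0, 1), 0-indexed).
print(place_sparse([("item 2", 1, 2), ("item 3", 1, 1)], 2, {(0, 1)}))
# -> {'item 2': (1, 0), 'item 3': (2, 0)}, i.e. 2nd and 3rd rows, 1st column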

Possibilities

As you can imagine, this brings a lot of power to the CSS Grid Layout spec. You won’t always need to place your items explicitly in the grid; you can simply pass the responsibility of finding the best position to the grid itself.

The typical example is to think of a regular form:

<form>
    <label>Name</label><input type="text" />
    <label>Mail</label><input type="text" />
    <label>Comments</label><textarea></textarea>
    <label>Accept policy</label><input type="checkbox" />
    <div id="buttons">
        <button>Accept</button>
        <button>Cancel</button>
    </div>
</form>

You could just apply the following CSS and you’d have a formatted form “for free”:

    form     { display: grid;           }
    label    { grid-column: 1;          }
    input    { grid-column: 2;          }
    #buttons { grid-column: 1 / span 2; }

The result will be something like the following picture:

Formatting regular form using grid auto-placement feature

Wrap-up

We’ve created a small demo (as part of our grid examples repository) that will allow you to play with the auto-placement feature and understand it better:

http://igalia.github.io/css-grid-layout/autoplacement.html

As you have read throughout the post, the auto-placement algorithm has been mentioned several times. This algorithm and the implementation details will be explained in a subsequent post.

If you’re wondering about current support in Blink and WebKit, the basis of auto-placement feature is already working on both engines. In Blink span support in auto-positioned items has recently landed and in WebKit the patch is ready to be integrated.
However, the new grid-auto-flow syntax (sparse, dense and stack) is still on the way (follow the Blink and WebKit bugs if you’re interested).
As you can see, we’ve been working hard on implementing the missing bits of this feature and we hope it would be fully supported soon in both Blink and WebKit. :-)

Finally, we’d like to thank again Bloomberg for sponsoring our work around CSS Grid Layout.

Igalia & Bloomberg logos

Igalia and Bloomberg working together to build a better web

by Manuel Rego Casasnovas at June 19, 2014 03:36 PM

June 15, 2014

Javier Muñoz

Visit to INTECO's Cyber-Security Headquarters

Several weeks ago, I was invited by INTECO to attend the seminar 'Secure Coding in C and C++'. The event took place at INTECO's Cyber-Security headquarters in León, Spain.

It was a great coincidence because Robert, the person teaching this seminar, and I are part of the teams in SEI and Igalia collaborating on browser security.

As you may know, the mission of the National Institute for Communication Technologies (INTECO), located in León (Spain), is to strengthen cybersecurity, trust, and the protection of privacy with respect to services offered within the information society, providing value to the public, businesses, the Spanish Government, the Spanish academic and research network, the information technology sector and strategic sectors in general. It is a huge responsibility though.

This seminar was included in the Program on Excellence in Cybersecurity (PECS), which aims to find and promote talent in cybersecurity. The seminar focused on producing secure programs and designs using the C and C++ languages. It ran for four days, providing a detailed explanation of common programming errors in C and C++.

Robert organized the material covering the following topics: string management, dynamic memory management, integer security, formatted output and file I/O. The lessons interleaved theory, practical exercises, mapping of issues to the CERT coding standards, and debate around the flaws and vulnerabilities handled.

Robert is a person with long experience in Information Security. He is currently the Secure Coding Technical Manager in the CERT Program of Carnegie Mellon's Software Engineering Institute. I enjoyed a great time talking to him about the state of the art in Information Security, avoidable software defects and vulnerabilities.

Finally, I would like to thank INTECO for hosting and organizing this event. They did a great job with the materials, infrastructure and organization. Congrats guys!

by Javier at June 15, 2014 10:00 PM

Collaborating with the Carnegie Mellon Software Engineering Institute on browser security

These last weeks I was really busy here at Igalia. We were hacking on Chromium/Blink broadly, attending BlinkOn 2, holding our Assembly, enjoying one of our summits and so on. On top of all these things, we started to collaborate with the Carnegie Mellon Software Engineering Institute (SEI) on browser security too. Great news!

On the other hand, Robert commented on the SEI's blog about the importance of keeping secure coding practices in mind to prevent vulnerabilities while coding, and how the CERT Secure Coding Initiative at the SEI is supporting this approach with completed standards for C and Java. By the way, coding standards for C++, Perl and other languages are under development too.

CERT coding standards are valuable resources for the programmer taking care of Information Security. As Robert highlights, secure coding standards itemize coding errors that are the root causes of current software vulnerabilities, prioritizing them by severity, likelihood of exploitation, and remediation costs. Each rule in the standard includes examples of insecure code, as well as secure alternative implementations.

If you are interested in our collaboration with SEI and the research project to evaluate the costs of producing a CERT-conforming implementation of the Chromium browser, you should not skip his post. It also introduces the other lines of work and the colleagues collaborating in this project.

by Javier at June 15, 2014 10:00 PM

June 13, 2014

Carlos Alberto López Pérez

WebKitGTK+ Performance Bot!

Lately I have been working on the WebKitGTK+ build bots that Igalia maintains.

Some of the improvements we achieved are described below.

The most interesting one is the performance bot. You can check the numbers at http://perf.webkit.org. Those numbers are not valid for comparing performance between the different WebKit ports (EFL/GTK+/Mac), because each of these performance bots runs on different hardware. The idea is to track performance between revisions, and identify the commits that have a performance impact.

The performance bot we deployed is not very powerful (an i5 661), but it does the job. I took extra care to ensure that external factors don’t affect the performance results of the tests on the machine. This is a summary of those measures:

  • I disabled cron and every other daemon.
  • I gave the performance test processes very high priority (nice -19, ionice -c1).
  • I disabled Hyper-Threading: while the tests are run in sequential order, the JavaScript engine (JSC) or other parts of WebKit can be parallelized. HT is great if you want to squeeze the maximum performance out of your machine, but it can become a problem when you care about consistent performance results (or when you care about real time).
  • By default the performance tests are run inside Xvfb, and this is not great if you are interested in the most realistic scenario possible. So I patched WebKit to allow running the tests on the native X display.

These are some details of the bot we deployed:

  • The machine is an Intel Core i5 661 with 8GB of RAM.
  • As explained, we run the tests on a real X server:
    $ glxinfo | grep "render.*:"
    direct rendering: Yes
    OpenGL renderer string: Mesa DRI Intel(R) Ironlake Desktop 
    
  • The OS where the tests are run is Debian Jessie.
  • The DE is GNOME 3: I disabled every power-saving feature of GNOME (including the screensaver). Since the window of the browser that runs the tests is positioned at the top-left of the screen, I installed the Activities Configurator extension, which allows hiding the activities top bar and disabling the hot upper-left corner.

If you want to run the performance tests on your machine, this is as easy as:

Tools/Scripts/run-perf-tests --platform=gtk

If you want to run them on your X display, simply export the variable USE_NATIVE_XDISPLAY before:

export USE_NATIVE_XDISPLAY=1

If you don’t want to run the full suite (takes no less than 2 hours), you can specify the test names (look on the PerformanceTests directory for the names).

For example, to run the Speedometer test (DoYouEvenBench was renamed to Speedometer):

Tools/Scripts/run-perf-tests --platform=gtk Speedometer/Full.html

Check the WebKit wiki for further information and keep an eye on http://perf.webkit.org: if you notice any performance regression, please let us know by opening a bug.

by clopez at June 13, 2014 01:33 PM

June 05, 2014

Jacobo Aragunde

Tales of LibreOffice interoperability: shape theme colors and styles

This is the latest chapter of the series about interoperability features that will be part of LibreOffice 4.3, brought to you by Igalia and sponsored by CloudOn, like the previous ones.

Last week we explained our work with theme color preservation for fonts and paragraphs, and now we move to a closely related topic: the preservation of theme colors and styles on shapes.

XML definitions

DrawingML includes new conventions for naming colors, broader than the ones in the initial OOXML spec as used for paragraphs and fonts. Such color definitions can be used in the context of shape properties or style definitions to define fillings, line colors, gradients…

In the first place, English color names (red, black, etc.) can be used besides RGB values and theme color names; and for any color, a set of transformations can be defined, similar to the tint and shade properties in font and paragraph colors, but with more options. An important transformation is alpha, which defines the transparency of that color.

Find below one example of each color tag (for color names, RGB values or theme colors), with or without transformations:

<!-- color by name, with an alpha transform -->
<a:prstClr val="black">
  <a:alpha val="50000"/>
</a:prstClr>

<!-- color by RGB value, no transforms -->
<a:srgbClr val="FF0000"/>

<!-- theme color, with two transforms -->
<a:schemeClr val="accent1">
  <a:satMod val="175000"/>
  <a:alpha val="40000"/>
</a:schemeClr>

Shapes in DrawingML receive their attributes from two different sources: the style definitions and the shape’s own properties. The former act as a fallback for the latter; attributes from style definitions will only be applied if they are not overwritten by shape properties. For example, in case a shape contains both a filling definition among its properties and a filling style, the filling definition prevails.

The theme file defines a list of filling and line styles so they can be used by shapes. This is an example of an area filling style which adds a couple of color transformations:

<a:fillStyleLst>
  <a:solidFill>
    <a:schemeClr val="phClr">
      <a:tint val="15000"/>
      <a:satMod val="350000"/>
    </a:schemeClr>
  </a:solidFill>
  <!-- more fill style definitions -->
</a:fillStyleLst>

Notice it’s not using any particular color, phClr is an entry parameter of sorts for a color that will be indicated in the shape style definition. The style definition can add some transformations or even define a gradient fill based on that color.

A line style definition can be a bit more complex, because it has more format options: width, dash pattern, single or double… These options are available for shape properties too.

<a:lnStyleLst>
  <a:ln w="9525" cap="flat" cmpd="sng" algn="ctr">
    <a:solidFill>
      <a:schemeClr val="phClr"/>
    </a:solidFill>
    <a:prstDash val="solid"/>
  </a:ln>
  <!-- more line style definitions -->
</a:lnStyleLst>

This is how a shape from the document would be assigned to the first filling style and the second line style, specifying "accent1" and "accent2" colors as parameters:

<wps:style>
  <a:lnRef idx="2"> <!-- use the second element from fillStyleLst -->
    <a:schemeClr val="accent1"/>
  </a:lnRef>
  <a:fillRef idx="1"> <!-- use the first element from lnStyleLst -->
    <a:schemeClr val="accent2"/>
  </a:fillRef>
  ...
</wps:style>

Finally, this is how a shape can overwrite the style-defined attributes for its filling and line. The following XML chunk defines a solid filling for the area using the "accent3" color and a solid color for the shape line using "accent4":

<wps:spPr> <!-- shape properties tag -->
  ...
  <a:solidFill> <!-- area filling -->
    <a:schemeClr val="accent3"/>
  </a:solidFill>
  <a:ln> <!-- line properties -->
    <a:solidFill>
      <a:schemeClr val="accent4"/>
    </a:solidFill>
  </a:ln>
  ...
</wps:spPr>

The importer code

The duty of the importer code is, in general, applying the style definitions, merging them with the shape properties, and transforming the properties in a way that matches LibreOffice’s internal data model.

Applying the style definitions means getting the corresponding styles from the theme file and applying them to the specific colors indicated in the shape wps:style block. This results in a set of properties that has to be merged with the shape-specific ones, and this is done so the latter have priority, as explained before.

All properties must be transformed to match the LibreOffice data model, especially colors because of their complexity. Theme colors are translated to their RGB value, checking the theme definition to match the color name with its value. As for color transformations, the importer applies all the transformations to the original color to get a final RGB value. The only exception is the alpha transformation, which is converted into a transparency value to be stored in the fill properties object (alpha and transparency are complementary; after homogenizing their units, alpha = 1 – transparency).

LibreOffice doesn’t have equivalent concepts to theme colors and color transformations other than alpha, and doesn’t have shape styles either. To prevent information loss, we will use a hidden property in the Shape object called the interop grab bag to fill it with any properties we need to rebuild this information on export:

  • To preserve a theme color, we need to store its name and the complete list of transformations. We will also need the RGB value of the color with all the transformations applied, so we can compare it with the final color to know if the user has changed it. We do it both for line and filling colors.
  • For preset and RGB colors, the loss of the transformations is not important so we don’t consider this special case.
  • To preserve the style definitions, we store the idx attribute of each and their color parameter, with all transformations if necessary. We also store the RGB value of the color including transformations because we will need it too.

The exporter code

The normal behavior of the exporter is translating every LibreOffice shape property to DrawingML to write it to the document. In the case of colors, that means using an a:srgbClr tag to save it in RGB format and optionally adding an alpha transformation calculated from the shape transparency value.

The preservation of shape styles and area and line colors increases the complexity a bit. The shape grab bag must be checked for interoperability information saved in the import phase, which should be saved back to the document under certain circumstances.

If there are any style definitions in the shape grab bag, they are always written back to the document; we need no additional verification because style definitions act as a fallback, as we already explained. On the other hand, the case of shape properties has some complexity, because we must perform several checks before deciding which area and line information we want to save (a rough sketch of this decision chain follows the list):

  • Does the final color match the original one? If it doesn’t, it means the user has changed it while editing, so the new one must be saved and the information in the grab bag discarded.
  • If the original and final colors match, is there any theme color information in the grab bag? If there is, we must write it to the shape properties.
  • If the original and final colors match and there is no theme color information in the grab bag, we must compare the color with the style color; if they match, it means the shape was using the style color so we must not write a shape property that would overwrite it.
  • Otherwise, the shape was using a custom color different from the style color and must be written to the shape properties.
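
A rough sketch of this decision chain in pseudo-code (my own simplification; all names are invented for illustration, the real exporter is C++ code inside LibreOffice):

# Hypothetical names throughout: grab_bag is the hidden interop property,
# style_color is the color resolved from the shape style fallback.
def export_fill_color(shape, grab_bag, style_color):
    if shape["fillColor"] != grab_bag.get("originalColor"):
        # The user changed the color while editing: discard the grab bag info.
        return ("srgbClr", shape["fillColor"])
    if "themeColor" in grab_bag:
        # Unchanged, and its theme origin is known: restore name + transforms.
        return ("schemeClr", grab_bag["themeColor"], grab_bag.get("transforms", []))
    if shape["fillColor"] == style_color:
        # The color came from the shape style: write nothing, keep the fallback.
        return None
    # A custom color different from the style color: write it out explicitly.
    return ("srgbClr", shape["fillColor"])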

Bonus track: multiple color gradients

While I was working on this topic, I noticed that DrawingML allows specifying gradient fillings with any number of steps, unlike previous MS Office documents (and LibreOffice ones) where you could only use two. This is an example of a gradient with three steps:

<wps:spPr>
  ...
  <a:gradFill>
    <a:gsLst>
      <a:gs pos="0">
        <a:srgbClr val="ffff00" />
      </a:gs>
      <a:gs pos="50000">
        <a:srgbClr val="ffff33">
          <a:alpha val="20000" />
        </a:srgbClr>
      </a:gs>
      <a:gs pos="100000">
        <a:srgbClr val="ff0000" />
      </a:gs>
    </a:gsLst>
    <a:lin ang="5400000" />
  </a:gradFill>
  ...
</wps:spPr>

Every a:gs tag indicates a point in the gradient extension with the pos attribute, measured as a percentage, and the color of that point. The system should calculate the gradient between the colors of two consecutive points.

LibreOffice approximates those gradients with more than two steps using only two colors when it imports the document. To prevent information loss, we use the grab bag again to store the complete gradient definition. In an analogous way, we store the original, approximated gradient information to be able to compare it with the gradient information at the moment of the save operation.

We can identify three situations in the exporter:

  • The original and final gradients don’t match: it means the user has changed it during edition, so the new one must be saved and the information in the grab bag discarded.
  • The original and final gradients match and there is a full gradient definition stored in the grab bag: we must write that gradient definition to the shape properties.
  • The original and final gradients match and there isn’t any gradient definition stored in the grab bag: the gradient definition comes from the shape style and in that case we must not write the shape property that would, again, overwrite it.

One more thing or two

As you can see, this feature relies on the hidden interop grab bag property, which contains more and more information as we increase its use to preserve unsupported properties. We hope to reduce its weight as those properties become natively supported. I would also like to mention that we have complemented our work with a set of unit tests that will help detect regressions in the future.

We haven’t yet finished with shapes; next post will be related to the preservation of different kinds of shape effects. It will be ready soon!

by Jacobo Aragunde Pérez at June 05, 2014 12:00 PM

May 28, 2014

Jacobo Aragunde

Tales of LibreOffice interoperability: font and paragraph colors

Last week, the LibreOffice community branched version 4.3 in preparation for the next release in July. We have already introduced one of the new interoperability features that will be part of that release in the previous post in this series, and now we will continue by explaining how OOXML theme colors will be preserved in LibreOffice.

Theme colors have a prominent place in the color palette in the latest Microsoft Office versions, as you can see:
Word color selector

They consist of a palette of ten colors, each one of them with five more variations. When a user changes the document theme, a new palette of colors will be loaded and any objects in the document that used one of those colors will be updated with the corresponding one from the new palette.

In the XML document, a theme color is identified by a name, and its variations are implemented with two attributes: tint to lighten the color and shade to darken it.

This is an example of a theme color applied to the characters in a text run – a chunk of text with the same properties. In the run properties (rPr) we find the following tag (reference):

<w:rPr>
  <w:color w:val="E5B8B7" w:themeColor="accent2" w:themeTint="66"/>
</w:rPr>

Theme color for character

This is the theme color named accent2, lightened by the themeTint value of 66 (hexadecimal). The w:val property indicates the RGB code of the final color; it’s ignored if w:themeColor is present.

The preservation of this tag is simple:

  • On import, store w:themeColor, w:themeTint and w:themeShade properties as hidden properties in the text run.
  • Store the original color too!
  • On export, check if the original color has changed:
    • if it did, write only the new color in the w:val property.
    • if it didn’t, write the stored w:themeColor, w:themeTint and w:themeShade properties.

Now, an example of paragraph shade with theme colors (reference):

<w:pPr>
  <w:shd w:val="thinDiagStripe"
         w:color="215868" w:themeColor="accent5" w:themeShade="80"
         w:fill="DBE5F1" w:themeFill="accent1" w:themeFillTint="33" />
</w:pPr>

Paragraph filling with pattern

Paragraph shades can specify two colors, one for the background and another one for the pattern if present, with the type of pattern specified by the w:val attribute. The set of attributes w:color, w:themeColor, w:themeShade and w:themeTint works like in the case of font colors, specifying the RGB code, theme color name, shade and tint modifiers for the pattern; w:fill, w:themeFill, w:themeFillTint and w:themeFillShade do the same for the background color.

The preservation of these properties works analogously; they are also stored as a hidden property of the text on import, and saved back on export in case the user hasn’t changed the fill color while editing the document with LibreOffice.

The w:shd tag can also be used in table or table cell properties with the same meaning. And speaking of tables, we also made sure that the table style property was preserved:

<w:tblPr>
  <w:tblStyle w:val="Tablaconcuadrcula" />
  ...
</w:tblPr>

We will continue next week with more shiny features brought to you by Igalia and CloudOn. Stay tuned!

by Jacobo Aragunde Pérez at May 28, 2014 03:05 PM

May 23, 2014

Claudio Saavedra

Sat 2014/May/24

I am a GNOME developer and I do not share Philip Van Hoof's views. He doesn't represent me.

May 23, 2014 11:44 PM