Planet Igalia

April 14, 2014

Jacobo Aragunde

Tales of LibreOffice interoperability: theme fonts

This is the first post in a series that explains the work that we’ve been doing lately in LibreOffice at Igalia, which hopefully will be part of the future 4.3 release.

Document themes are one of the features of the Microsoft Office suite that LibreOffice doesn’t support yet. While that support arrives, which is hard work that includes extending the ODF standard, we are focusing on preserving the theme information when a document coming from Office is edited in LibreOffice and saved back to OOXML.

First of all, we needed to preserve the theme files, which are stored under /word/theme/ inside the document package. LibreOffice was able to read and parse the theme definition file to assign colors and fonts to the document elements, but the file was discarded after that and not preserved on export. Miguel and Andrés took care of that part of the job while working on the preservation of Smart-Art information, and it’s already present in the 4.2 release.

Then it was my turn to identify and preserve the theme-related attributes in the document, starting with font attributes. From the user’s point of view, there are two types of theme fonts in Word: Heading, named major internally, and Body, named minor. Besides this distinction, there are three types of fonts that users can manage separately: Latin, Asian and Complex Script (for Hebrew, Arabic, etc.). Users can set either one specific font, or a major or minor theme font, for each of the three types.

Word 2010 font selector

What actual font this mess translates to depends on several things:

  • Type of characters: the application decides whether a portion of text is written in Latin, Asian or CS by checking the range of characters it contains, and uses the font that the user set for that type.
  • The document language: the theme file can define one font per language script in addition to a fallback font per type:
            <!-- default fonts per type -->
            <a:latin typeface="Trebuchet MS" />
            <a:ea typeface="" />
            <a:cs typeface="" />
            <!-- language-specific fonts -->
            <a:font script="Hans" typeface="宋体" />
            <a:font script="Hebr" typeface="Arial" />
  • The default language: the document settings file can define one default language for each font type:
    <w:themeFontLang w:val="en-US" w:eastAsia="zh-CN"
                     w:bidi="he-IL" />

How does it all work together? With the above definition of the language settings, the default language for CS text is Hebrew (he-IL), and the theme defines Arial as the minor font for that language. For Latin languages, it will be Trebuchet MS. This behaviour was new in LibreOffice and we had to implement it ourselves. Notice the mixed use of different naming conventions for languages and regions, which requires conversions.
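The resolution logic described above can be sketched roughly like this. This is a simplified illustration with made-up names and data structures, not LibreOffice’s actual code; it only models the example theme and settings shown above:

```python
# Simplified sketch of minor (Body) theme font resolution.
# Theme fonts per type: a fallback typeface plus optional per-script fonts,
# mirroring the <a:latin>/<a:ea>/<a:cs> and <a:font script=...> elements.
minor_fonts = {
    "latin": {"default": "Trebuchet MS", "scripts": {}},
    "ea": {"default": "", "scripts": {"Hans": "宋体"}},
    "cs": {"default": "", "scripts": {"Hebr": "Arial"}},
}

# Default document languages per font type, from w:themeFontLang, and a
# (made-up, partial) conversion table from language tags to script codes.
theme_font_lang = {"latin": "en-US", "ea": "zh-CN", "cs": "he-IL"}
lang_to_script = {"en-US": "Latn", "zh-CN": "Hans", "he-IL": "Hebr"}

def resolve_minor_font(font_type):
    """Pick the theme font for a given type ('latin', 'ea' or 'cs')."""
    entry = minor_fonts[font_type]
    script = lang_to_script[theme_font_lang[font_type]]
    # Prefer the language-specific font, fall back to the per-type default.
    return entry["scripts"].get(script) or entry["default"]

print(resolve_minor_font("cs"))     # Arial (he-IL maps to the Hebr script)
print(resolve_minor_font("latin"))  # Trebuchet MS (the fallback font)
```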

LibreOffice font selector

My work on theme attributes comprises the preservation of font and shape colors too, but this post is already long enough; that will be part of the next chapter. Let me finish with a thank you to CloudOn for funding this development, and to Adam Fyne for his work detecting and triaging these theme preservation issues.

by Jacobo Aragunde Pérez at April 14, 2014 11:42 AM

April 11, 2014

Javier Fernández

New shorthand properties for CSS Grid Layout

I’ve been working for a while on the implementation of the CSS Grid Layout standard for the WebKit and Blink web engines. It’s a complex specification, like most of them, so I enjoyed a lot decrypting all the angles behind the language used to define the different CSS properties, their usage and limits, exceptions and so on. It’s fair to start by thanking the WebKit reviewers and Blink owners for their patience and support reviewing patches. It’s also worth mentioning that the Editor’s Draft is still a live document with frequent changes and active discussions on the www-style mailing list, which is very active and supportive in solving doubts and addressing suggestions from the hackers working on the implementation.

Before continuing, I’d strongly recommend reading the previous posts of my colleagues Manuel and Sergio to understand the basic concepts of CSS Grid Layout and its main features and advantages for the web.

I had the chance to land several patches in WebKit and Blink that improved the current implementation of the standard, both fixing bugs and adapting it to the latest syntax changes introduced in the spec, but perhaps the most noticeable improvements so far are the new grid-template and grid shorthands added recently.

The “grid-template” shorthand

Quoting the CSS Grid Layout specs:

The grid-template property is a shorthand for setting grid-template-columns, grid-template-rows, and grid-template-areas in a single declaration. It has several distinct syntax forms:

none | subgrid | <‘grid-template-columns’> / <‘grid-template-rows’> | [<’track-list’>/ ]? [<’line-names’>? <’string’> <’track-size’>?]+

It’s always easier if we have some examples at hand:

grid-template: auto 1fr auto / auto 1fr;
grid-template: 10px / "a" 15px;
grid-template: 10px / (head1) "a" 15px (tail1)
                      (head2) "b" 20px (tail2);
grid-template: (first) 10px repeat(2, (nav nav2) 15px) /       "a b c" 100px (nav)
                                                        (nav2) "d e f" 25px  (nav)
                                                        (nav2) "g h i" 25px  (last);

It’s important to notice that the subgrid functionality is under discussion to be postponed to level 2 of the specification, hence it was not implemented in the shorthand either, for the time being. This decision had the support of the IE and Chromium browsers; Mozilla partially agrees, though with some doubts.

There was something special in the implementation of this shorthand property. Usually, the CSS property parsing methods are implemented in a straightforward way, avoiding unnecessary or duplicated operations over the parsed value list. However, due to the ambiguity of the shorthand syntax, it’s not clear which form the expression belongs to until reaching the <string> clause. In order to reuse the <grid-template-{row, column}> parsing function, it was necessary to allow rewinding the parsedValue list when it was detected that the wrong form was being processed.
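To make the rewinding idea concrete, here is a toy sketch over a heavily simplified grammar. This is an illustration only, not WebKit’s actual C++ parser; the token format and both grammar forms are made up for the example:

```python
# Toy sketch of "rewindable" shorthand parsing (illustration only).
# Simplified grammar, vaguely inspired by the shorthand forms:
#   form 1:  <tracks> "/" <tracks>                 e.g. 10px / 15px
#   form 2:  <tracks> "/" [<string> <size>?]+      e.g. 10px / "a" 15px
# The ambiguity: after the slash we don't know which form we have until
# we either consume all tokens or hit a string.

def parse_tracks(tokens, pos):
    """Consume track sizes (simplified: tokens ending in 'px' or 'fr')."""
    out = []
    while pos < len(tokens) and tokens[pos][-2:] in ("px", "fr"):
        out.append(tokens[pos])
        pos += 1
    return out, pos

def parse_shorthand(tokens):
    cols, pos = parse_tracks(tokens, 0)
    assert tokens[pos] == "/", "this simplified grammar always expects a slash"
    pos += 1
    saved = pos                      # remember the position before committing
    rows, pos = parse_tracks(tokens, pos)
    if pos == len(tokens):           # <columns> / <rows> form matched fully
        return {"columns": cols, "rows": rows}
    pos = saved                      # rewind: it was the string form after all
    areas, rows = [], []
    while pos < len(tokens):
        areas.append(tokens[pos].strip('"'))          # the row's area names
        pos += 1
        if pos < len(tokens) and not tokens[pos].startswith('"'):
            rows.append(tokens[pos])                  # explicit row size
            pos += 1
        else:
            rows.append("auto")                       # default row size
    return {"columns": cols, "rows": rows, "areas": areas}

print(parse_shorthand(["10px", "/", '"a"', "15px"]))
# {'columns': ['10px'], 'rows': ['15px'], 'areas': ['a']}
```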

Another remarkable implementation detail was the change in the gridLineName parsing function, required to join the adjoining line names of the last and first columns (nav and nav2 in the example). See below the longhand equivalence of the last case in the previous example:

grid-template-columns: (first) 10px repeat(2, (nav nav2) 15px);
grid-template-rows: 100px (nav nav2) 25px (nav nav2) 25px (last);
grid-template-areas: "a b c" 
                     "d e f"
                     "g h i";


The “grid” shorthand

Quoting the CSS Grid Layout specs:

The grid property is a shorthand that sets all of the explicit grid properties (grid-template-rows, grid-template-columns, and grid-template-areas) as well as all the implicit grid properties (grid-auto-rows, grid-auto-columns, and grid-auto-flow) in a single declaration.

<‘grid-template’> | [<‘grid-auto-flow’> [<‘grid-auto-columns’>[/ <‘grid-auto-rows’>]?]?]

Even though the shorthand sets both implicit and explicit grid properties, only either the implicit or the explicit grid properties can be specified; the missing properties will be set to their initial values. Now let’s see some examples:

grid: 15px / 10px;
grid: row 10px;
grid: column 10px / 20px;
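As an illustration of what the shorthand expands to, here is my reading of the longhand equivalence of the last example, based on the draft at the time of writing; take it as a sketch rather than as authoritative:

```css
/* grid: column 10px / 20px; */
grid-auto-flow: column;
grid-auto-columns: 10px;
grid-auto-rows: 20px;
/* the explicit grid properties are reset to their initial values */
grid-template-columns: none;
grid-template-rows: none;
grid-template-areas: none;
```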

The “grid” shorthand is the recommended mechanism even to define just the explicit grid, unless web authors are interested in cascading the implicit grid properties separately.

Current status and next steps

Both properties landed in Blink trunk recently (revisions 170552 and 171143) and they are waiting for the final review in WebKit; hopefully they will land soon. There are enough layout tests to cover the most common cases, but perhaps some additional ones might be added in the future. As mentioned, there are certain ambiguities in both shorthands’ syntax, and it’s also important to watch the www-style mailing list for changes that might require modifying the implementation, hence adding the proper test cases.

With the implementation of these two new shorthands, the property implementation tasks are almost complete. We are now working on fixing bugs and implementing the alignment features. There is quite an important gap between the Blink and WebKit implementations, but we are porting patches as soon as possible, since we think it’s important to keep both implementations in sync.

I’ll attend the WebKit Contributors Meeting next week, so perhaps I can speed up the landing of the patches for the shorthand properties. My main goal, though, will be to gather feedback from the WebKit community about the status of the CSS Grid Layout implementation: what features they miss the most, which bugs should have more priority, and to share our future plans at Igalia with them.

All this work was possible thanks to the collaboration between Igalia and Bloomberg. We are both working hard to help and promote the wide adoption of this standard, which will be shipped soon in IE and hopefully also in Chromium. We are also following the efforts Mozilla is making, which give us the impression that most browsers’ interest in this standard is quite high.

by jfernandez at April 11, 2014 08:34 PM

Claudio Saavedra

Fri 2014/Apr/11

Igalia has opened internship positions for students, to work on tasks around WebKitGTK+. This is a great opportunity for you to become acquainted with the WebKit project, GNOME, their communities, and also to learn about our company.

Many Igalians started their journey in the company by doing an internship during their study years and eventually came back to join us full-time, so if you are interested in joining us at some point but are still in the middle of your studies, this is a great opportunity for us to get to know you, and for you to have a wonderful summer.

More details in the official announcement. Applications are due on May 8th. Do not hesitate to apply!

April 11, 2014 11:35 AM

March 31, 2014

Sergio Villar

Adventures in the Grid

Hi there, fellow readers. Today I’m starting a mini-series of posts to talk a little bit about the work I’ve been lately doing at Igalia around WebKit and Blink web engines. I’ve been involved in the implementation of a new standard called CSS Grid Layout in both engines. My mate rego has already talked about that, so take a look at his post if you need to know more about the basics. Read it? Great, let’s move on.

The new grid layout

I’d like to stress that grid layout is not (only) about aligning elements in columns and rows (after all, you can already do that with tables or complicated layouts using positioned and/or floating blocks, right?). What’s interesting is that a grid layout doesn’t have content structure. What does that mean? As the spec says:

unlike tables, grid layout doesn’t have content structure, and thus enables a wide variety of layouts not possible with tables. For example, the children of a grid container can position themselves such that they overlap and layer similar to positioned elements.

In addition, it provides very fine-grained control of the layout of the elements of the grid (generally with the help of media queries) based on device form factors, orientation, available space, etc.

Today, I’m going to focus specifically on how to position elements inside a grid, and in particular how to do that using named grid lines and named grid areas. Going too fast? No problem, let’s recap a bit.

Positioning grid items

In CSS grid layout a grid is defined by a set of intersecting horizontal and vertical lines which divide the space of the grid into areas in which to place grid items. Those grid lines are numbered starting from 1. Based on a declaration like:

	grid-template-rows: 50px 1fr 50px;
	grid-template-columns: 150px 1fr;

we would get the following

Grid lines defining a grid with 3 rows and 2 columns

So in order to position an item in a grid, there is a very straightforward way of doing it which consists of specifying the grid line start and end positions for either columns, rows or both. Given the above grid, and supposing that you want to place some item in the rightmost area at the bottom, you’d just need to specify it the following way:

	grid-row: 3 / 4;
	grid-column: 2 / 3;

Easy, huh? Your item will span from column lines 2 to 3 and from row lines 3 to 4. There are tons of different ways to specify the position and the span of your grid items, so many that perhaps they deserve a separate blog post (just check this section of the specs if you need the details now).

Placing items using named grid lines

Despite being easy and even intuitive (at least for us developers), this way of placing grid items using grid line numbers might not be optimal for web authors. With large grids, knowing the exact grid line our item should span to can become an increasingly difficult task. So, why not give the grid lines meaningful names? What about giving authors the possibility of assigning one name (or several) to each grid line so they wouldn’t have to remember/compute the exact index of each grid line?

Naming grid lines is pretty easy. Authors just need to specify the names they want to assign to each line (yeah “names” in plural) in the grid definition by putting them into parentheses surrounding the definitions of the grid track breadths. Something like:

	grid-template-rows: (first start) 100px 40px (middle) 1fr 250px (last);

Based on this definition, the following declarations are totally equivalent:

	grid-row: first / middle;
	grid-row: 1 / middle;
	grid-row: first / 3;
	grid-row: 1 / 3;
	grid-row: start / middle;
	grid-row: start / 3;

I won’t enter into discussing which one is the most descriptive or the best (there are some more); just choose the one you prefer.

Placing items using named areas

Although things like named grid lines ease the design task, authors don’t normally foresee the layout of a web page as a matrix of positions; instead, they normally divide the available space into “areas” in their minds, areas like the footer, the sidebar, the header, etc. So why not take advantage of that and allow web authors to specify those grid areas?

For example, let’s imagine that we’re designing a website with a header on top, a footer at the bottom, a sidebar with links on the left and the important content on the right. We can easily (and very visually, I’d say) define these four grid areas using the CSS Grid Layout properties. It’d be something like this:

	grid-template-areas: "  .     header  header"
	                     "sidebar content content"
	                     "sidebar content content"
	                     "  .     footer  footer";

There you go, a 4×3 matrix with 4 different named grid areas. Some remarks:

  • The dot “.” means unnamed area
  • “header” will occupy columns 2 and 3 in the first row
  • “footer” will occupy columns 2 and 3 in the last row
  • “sidebar” will span from 2nd to 3rd row in the first column
  • “content” will fill columns 2 to 3 in rows 2 to 3

That’s all. Based on that definition, we can use the grid-area property to place our elements. We just need to specify the grid area in which to place them and the grid will automatically position them based on their size and the available free space.
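For instance, assuming elements that match the layout we just described (the selectors here are just an example), placing each of them takes a single declaration:

```css
header  { grid-area: header; }
nav     { grid-area: sidebar; }
article { grid-area: content; }
footer  { grid-area: footer; }
```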

Placing items using grid area’s implicit grid line names

Hmm, not enough for you? You said you still need something extra? So you want the simplicity of grid areas and the flexibility of named grid lines? No problem: the spec defines what are called implicit named grid lines, generated by each defined grid area. What does that mean? Basically that for any grid area you define, let’s call it “A”, the engine will automatically generate “A-start” and “A-end” grid line names for columns and rows, so you can use them to position any grid item as if they were explicitly defined by you.

If we take the grid defined above with grid-template-areas, it’s perfectly valid to position a grid item using the following declarations:

	grid-column: header-start / 3;
	grid-row: 2 / sidebar-end;

which are equivalent to

	grid-column: 2 / 3;
	grid-row: 2 / 4;

As you might have guessed, you can mix all three of these methods to position grid items.

How is it implemented?

What I have described should work fine in the latest WebKit nightlies after I landed this. Regarding Blink, there are some missing bits, already addressed here, that will likely land soon. Obviously this is not a matter of a single change :) , we’re building all this on top of many other changes in WebKit and also in Blink.

In my next post I’ll talk about the implementation details and also about the differences between WebKit and Blink as we have followed slightly different paths to implement this. Stay tuned!


Many thanks for the nice reviews I got from both WebKit reviewers and Blink owners. Also I’d explicitly like to thank the awesome guys at Bloomberg for sponsoring this work.

by svillar at March 31, 2014 03:20 PM

March 25, 2014

Jacobo Aragunde

Document Freedom Day 2014 @ A Coruña

I’m honored to be invited to talk about the LibreOffice project as part of the celebration of Document Freedom Day at the University of A Coruña. It’s being organized by GPUL and the other speakers will be Chema Casanova and Juan Antonio Añel.

Come and join us tomorrow, 26th of March, from 18:00 on :)


by Jacobo Aragunde Pérez at March 25, 2014 11:08 AM

March 24, 2014

Carlos García Campos

WebKitGTK+ 2.4.0: the multiprocess made easy

Yes, we did it again, we have just released WebKitGTK+ 2.4.0, another major stable release with a lot of bug fixes, some new features and more complete API.

Multiple Web Processes

This is the most important new feature included in this release, and the one we have spent most of the release cycle on. It all started during the WebKitGTK+ hackfest, when a team of around 10 people worked together to implement the base of the multi-process support. And at the very end of the release cycle we have been able to turn it on by default in Epiphany.

DOM touch events support

WebKitWebView now processes the touch events happening in the widget to notify the DOM, making modern websites that use the DOM touch API work properly. Carlos Garnacho has made a screencast to show it in action.

Plugins cache

When the first page containing plugins was loaded, the UI process was blocked for some time while the plugins were scanned. This was one of the most annoying bugs introduced in WebKitGTK+ 2.0. Plugins are synchronously scanned on demand during the page load, and that’s something that can’t be avoided, but WebKitGTK+ 2.4 uses a persistent cache to store information about all plugins, so that plugins are only scanned the first time or when they change.


As always a huge thanks to all the contributors that make this possible, and very specially in this release to the sponsors of the WebKitGTK+ hackfest 2013 (Igalia and the GNOME Foundation).

WebKit1 deprecation

There’s one last thing I would like to mention. Even though the WebKit1 API has been deprecated since we released WebKitGTK+ 2.0, we have kept shipping both APIs in our tarball releases. A decision hasn’t been made yet, but this is probably the last release including the WebKit1 API; we have plans to remove the WebKit1 code from trunk and move all the build bots to run only WebKit2 tests. We encourage everybody to port their applications to WebKit2, submitting bug reports if anything prevents the switch, and of course we are happy to help on IRC, the mailing list, etc.

by carlos garcia campos at March 24, 2014 07:26 PM

Víctor Jáquez


I spent a week in Munich. I went there for two reasons: to attend the Web & TV workshop organized by the W3C, and to hack along with the gst-gang at the GStreamer Hackfest 2014. All of this was sponsored by Igalia, my company.

I arrived in Munich on Tuesday evening, and when I reached the Marienplatz metro station, I ran into a crowd of Bayern Munich fans, chanting songs about the glory of their team, huddling and dancing. And a lot of police officers surrounding the tracks.

The workshop was organized by the W3C Web and TV Interest Group, and intended to spark discussions around how to integrate and standardize TV technologies and the Web.

The first slide of the workshop


On Wednesday morning, the workshop began. People from Espial and Samsung talked about HbbTV, and Japanese broadcasters talked about their Hybridcast. Both technologies seek to enhance the television experience using Internet protocols, the former for Europe and the latter for Japan. Also, broadcasters from China showed their approach using ad-hoc technologies. I have to say that Hybridcast awed me.


Afterwards, most of the workshop revolved around the problem of the companion device. People showed their solutions and proposals, in particular about device discovery, data sharing and control. All the solutions relied on WebSockets and WebRTC for data sharing between devices.


During the panels, I very much enjoyed Jon Piesing’s participation, in particular his slide summarizing the specifications used by HbbTV V2. It’s like juggling specs!

Specifications used by HbbTV V2


Broadcasters are very interested in Encrypted Media Extensions and media APIs, for example the specification for a Tuner API. There’s also a lot of expectation about metadata handling and defining common TV ontologies.

Finally, there were a couple of talks about miscellaneous technologies surrounding IPTV broadcasting.

The second stage of my visit to Bavaria’s capital was the GStreamer Hackfest. It took place in the Google offices, near Marienplatz.

Christian Schaller has written a very good summary of what happened during the hackfest. On my side, I worked with Nicolas Dufresne on the v4l2 video converter for the Exynos 4, which is a piece required for hardware-accelerated decoding on that platform using the v4l2 video decoder.

by vjaquez at March 24, 2014 12:45 PM

March 20, 2014

Andrés Gómez

Side tabs in Empathy

Going quickly to the interesting part.

If you happen to use Ubuntu Saucy 13.10 and would like to have side tabs in Empathy, just run the following command:

$ sudo add-apt-repository ppa:tanty/ppa

If, in addition to using Ubuntu Saucy 13.10, you are also using the GNOME3 Team’s PPA, you will need to run the following command:

$ sudo add-apt-repository ppa:tanty/gnome3

Finally, update your repositories, upgrade empathy and set the proper configuration:

$ sudo apt-get update && sudo apt-get install empathy
$ gsettings set org.gnome.Empathy.conversation tab-position 'left'

After this, you can just open the chat window in a newly started Empathy instance and you should see something like this:

Side tabs in Empathy by ::Tanty::
Side tabs in Empathy, a screenshot by ::Tanty:: on Flickr.


I’m a long-time user of Jabber and Empathy. I use it for everyday communications and, in Igalia, we have several internal rooms in which we coordinate ourselves. Because of the number of rooms I am in on a regular basis, Empathy’s chat window is unable to display the tabs for each of them in the top bar of the conversations window.

This forces me either to split them into different windows or to navigate among them every now and then to check if there is any interesting update. Quite annoying :) .

Some time ago, #586145 was filed requesting the possibility of having the chat room tabs not only displayed on top but also in other positions, specially in the side.

Hence, I decided to take the existing patch and perform some small changes to the work done by Neil Roberts in order to be able to have these side tabs.

With this new feature, you can change the position of the tabs just by changing a setting, as the position property is bound to it. If you want to set the tabs at ‘top’, ‘left’, ‘bottom’ or ‘right’, you should run, respectively:

$ gsettings set org.gnome.Empathy.conversation tab-position 'top'
$ gsettings set org.gnome.Empathy.conversation tab-position 'left'
$ gsettings set org.gnome.Empathy.conversation tab-position 'bottom'
$ gsettings set org.gnome.Empathy.conversation tab-position 'right'

Now, I’ve uploaded a new version of the patch and I’m waiting for it to pass the review process and land.

This is a tiny enhancement on top of the great work that several GNOME developers have done in Empathy over the years. However, it is really making a difference for me, so I’ve decided to share it quickly in case someone else finds it useful, since it will take a while to reach the main distributions. Hence, I’ve ported it to the Empathy version I’m using in the Ubuntu Saucy 13.10 running on my desktop.

If you want to give it a try, just follow the instructions I’ve written at the beginning of this post.

Final notes

In addition to Empathy, you will be able to find in my PPAs:

  • A working (and custom) version of the faulty official icecc package with patches fixing LP#1182491.
  • A custom version of webkitgtk with patches fixing WK#115650 which will speed up opening new tabs in Web.


by tanty at March 20, 2014 11:55 AM

March 19, 2014

Andrés Gómez

Quickly publishing in your Ubuntu PPA

This is more of a notepad for myself with quick instructions on how to upload a (usually patched) package to my own PPAs.

Patching an existing package

The first thing is to download the sources of the package from the repository providing the buggy binary package installed on my system.

For example, when patching webkitgtk, if my installed package is from a vanilla Ubuntu release, I only have to check that I have the source from the official Ubuntu repositories. However, if my installed package is from another PPA, I will have to check that I have the source from it or, if not, download the needed packages manually. Let’s assume my installed package comes from the GNOME3 Team Ubuntu PPA:

$ cd ~/ppa/ && mkdir -p webkitgtk/gnome3 && cd webkitgtk/gnome3
$ apt-get source webkitgtk
Reading package lists... Done
Building dependency tree
Reading state information... Done
NOTICE: 'webkitgtk' packaging is maintained in the 'Git' version control system at:
Need to get 9,440 kB of source archives.
Get:1 saucy/main webkitgtk 2.3.2-1ubuntu6~saucy1 (tar) [9,353 kB]
Get:2 saucy/main webkitgtk 2.3.2-1ubuntu6~saucy1 (diff) [82.2 kB]
Get:3 saucy/main webkitgtk 2.3.2-1ubuntu6~saucy1 (dsc) [4,577 B]
Fetched 9,440 kB in 3s (2,769 kB/s)
gpgv: Signature made Sun 22 Dec 2013 01:34:25 AM EET using RSA key ID 153ACABA
gpgv: Can't check signature: public key not found
dpkg-source: warning: failed to verify signature on ./webkitgtk_2.3.2-1ubuntu6~saucy1.dsc
dpkg-source: info: extracting webkitgtk in webkitgtk-2.3.2
dpkg-source: info: unpacking webkitgtk_2.3.2.orig.tar.xz
dpkg-source: info: unpacking webkitgtk_2.3.2-1ubuntu6~saucy1.debian.tar.gz
dpkg-source: info: applying 02_notebook_scroll.patch
dpkg-source: info: applying aarch64.patch
dpkg-source: info: applying fix-ppc.diff
dpkg-source: info: applying fix-aarch64.patch
dpkg-source: info: applying remove-use-lockfree-threadsaferefcounted.patch
dpkg-source: info: applying no-jit-build-failure.patch
dpkg-source: info: applying 0001-GTK-Fails-to-build-with-freetype-2.5.1.patch
dpkg-source: info: applying disable-jit-harder.patch
dpkg-source: info: applying fix-llint-c-loop.patch
dpkg-source: info: applying fix-armv7.patch
dpkg-source: info: applying bugzilla_clear_surface.patch
dpkg-source: info: applying ppc64el.patch
$ cd webkitgtk-2.3.2

Just in case, something I like to do is to add the code from the downloaded package to a local git repository:

$ git init
$ git add *
$ git commit -m "Initial commit"

Then, it is time to apply the needed changes to the source code. This is where git comes in handy, in case these changes are not trivial and actually need some more work. When we are done with the changes, we have to add them to the Debian package as an additional patch on top of the original source. We use dpkg-source for this:

$ dpkg-source --commit
dpkg-source: info: local changes detected, the modified files are:
Enter the desired patch name: 0001-GTK-WK2-Blocks-when-fetching-plugins-information.patch

We enter the patch name and the description of the changes:

$ cat debian/patches/0001-GTK-WK2-Blocks-when-fetching-plugins-information.patch
Description: [GTK][WK2] Blocks when fetching plugins information

Patch by Carlos Garcia Campos.
Reviewed by Gustavo Noronha Silva.
Use a persistent cache to store the plugins metadata to avoid
having to load all the plugins everytime a plugin is used for the
first time.
* Shared/Plugins/Netscape/NetscapePluginModule.h:
* Shared/Plugins/Netscape/x11/NetscapePluginModuleX11.cpp:
(WebKit::NetscapePluginModule::parseMIMEDescription): Make this
method public.
(WebKit::NetscapePluginModule::buildMIMEDescription): Added this
helper to build the MIME description string.
* UIProcess/Plugins/gtk/PluginInfoCache.cpp: Added.
* UIProcess/Plugins/gtk/PluginInfoCache.h: Added.
* UIProcess/Plugins/unix/PluginInfoStoreUnix.cpp:
(WebKit::PluginInfoStore::getPluginInfo): Check first if we have
metadata of the plugin in the cache and update the cache if we
loaded the plugin to get its metadata.

Finally, we modify the release information, adding or increasing the non-maintainer digit. For example, in this case the downloaded source version was 2.3.2-1ubuntu6~saucy1, so I’m setting 2.3.2-1ubuntu6~saucy1.1. Also, remember to provide the proper distribution name, or to modify it when writing down the log of the changes; in this case, we are using saucy. Check also that you are using the proper email for the log. In my PPAs I use my personal one:

$ DEBEMAIL="" dch -n -D saucy
$ cat debian/changelog
webkitgtk (2.3.2-1ubuntu6~saucy1.1) saucy; urgency=low

* Fixes #115650:
- debian/patches/0001-GTK-WK2-Blocks-when-fetching-plugins-information.patch

-- Andres Gomez <> Wed, 19 Mar 2014 14:26:19 +0200

With this, we are done modifying the source of the package.

Building the source package

We just have to take into account that, when you have more than one GPG key available, signing the package will fail during the process, as in:

$ debuild -S -rfakeroot
Finished running lintian.
Now signing changes and any dsc files...
signfile webkitgtk_2.3.2-1ubuntu6~saucy1.1.dsc Andres Gomez <>
gpg: skipped "Andres Gomez <>": secret key not available
gpg: /tmp/debsign.vhpVY32w/webkitgtk_2.3.2-1ubuntu6~saucy1.1.dsc: clearsign failed: secret key not available
debsign: gpg error occurred! Aborting....
debuild: fatal error at line 1280:
running debsign failed

Hence, you have to provide the key id to use with the -k parameter.

In addition, if the sources used for the package are not coming from one of the official Ubuntu repositories, you will also need to provide the sources when uploading to the PPA. For this, you have to pass the -sa parameter. In this example, as we are taking the source from the GNOME3 Team Ubuntu PPA, we pass this parameter as in:

$ debuild -S -sa -rfakeroot -k3FEA1034

While for other packages, which we modify directly from the sources of the official packages provided by Ubuntu, we just use:

$ debuild -S -rfakeroot -k3FEA1034

Optional local build

A local build is not strictly necessary, but it will tell you whether your changes break the compilation of the package.

The best way of doing a reliable local build is using pbuilder.

When using pbuilder, we have to be sure that we are using the proper packages, not only from Ubuntu’s official repositories but also from the PPAs our target PPA depends on, and from our own PPA itself.

I’ve already created the tarballs with the chroot distributions for my own PPAs. However, in order to show an example, we would use a line like the following to create a new tarball for my gnome3 PPA, which depends on my ppa PPA and also on GNOME3 Team’s gnome3 PPA:

$ sudo pbuilder --create --distribution saucy --mirror "" --othermirror "deb saucy restricted universe multiverse|deb saucy-updates main restricted universe multiverse|deb saucy-proposed main restricted universe multiverse|deb saucy-security main restricted universe multiverse|deb saucy partner|deb saucy main|deb saucy main" --basetgz /<path_to_base_pbuilder>/saucy-gnome3.tgz --buildplace /<path_to_base_pbuilder>/build --aptcache "/<path_to_base_pbuilder>/aptcache/"

I make use of <path_to_base_pbuilder> because, by default, everything is done under /var and I do not always have enough space there.

Once created, and following our example, we would be building our package for the target gnome3 PPA as follows:

$ sudo pbuilder --build --mirror "" --othermirror "deb saucy main restricted universe multiverse|deb saucy-updates main restricted universe multiverse|deb saucy-proposed main restricted universe multiverse|deb saucy-security main restricted universe multiverse|deb saucy partner|deb saucy main|deb saucy main" --basetgz /<path_to_base_pbuilder>/saucy-gnome3.tgz --buildplace /<path_to_base_pbuilder>/build --aptcache "/<path_to_base_pbuilder>/aptcache/" ../webkitgtk_2.3.2-1ubuntu6~saucy1.1.dsc

Now, it is just a matter of waiting and checking the results.

Uploading to your PPA

The final step is uploading the package with the new changes to your PPA.

I actually have one sandbox PPA for each stable PPA. The sandboxes are not intended for general users; they let me play with the changes until I feel they are stable enough to be published in the stable PPAs. Hence, I have 4 PPAs:

  • ppa: Where I keep changes from official Ubuntu packages that are useful to me.
  • ppa-next: Not intended for general users. Where I keep unstable packages with the changes that I will move to the ppa one once I feel they are stable enough.
  • gnome3: Where I keep changes on packages whose source has been obtained from the GNOME3 Team PPA.
  • gnome3-next: Not intended for general users. Where I keep unstable packages with the changes that I will move to the gnome3 one once I feel they are stable enough.

With this, during the first cycles of development I will be uploading the changes to my unstable PPAs before uploading them to the stable ones. For this example, I would upload first to gnome3-next:

$ dput ppa:tanty/gnome3-next ../webkitgtk_2.3.2-1ubuntu6~saucy1.1_source.changes

Once I’m happy enough, I upload the changes to the stable PPA:

$ dput -f ppa:tanty/gnome3 ../webkitgtk_2.3.2-1ubuntu6~saucy1.1_source.changes

The -f flag avoids the error that is triggered when there is already a log file from a previous dput upload of a given “.changes” package.

With this, you only have to wait for the package to be built by the PPA builders, update your repositories and upgrade:

$ sudo apt-get update && sudo apt-get upgrade

Enjoy your newly patched package!

by tanty at March 19, 2014 11:57 PM

March 17, 2014

Andy Wingo

stack overflow

Good morning, gentle hackers. Today's article is about stack representation, how stack representations affect programs, what it means to run out of stack, and that kind of thing. I've been struggling with the issue for a while now in Guile and finally came to a nice solution. But I'm getting ahead of myself; read on for some background on the issue, and details on what Guile 2.2 will do.

stack limits

Every time a program makes a call that is not a tail call, it pushes a new frame onto the stack. Returning a value from a function pops the top frame off the stack. Stack frames take up memory, and as nobody has an infinite amount of memory, deep recursion could cause your program to run out of memory. Running out of stack memory is called stack overflow.

Most languages have a terrible stack overflow story. For example, in C, if you use too much stack, your program will exhibit "undefined behavior". If you are lucky, it will crash your program; if you are unlucky, it could crash your car. It's especially bad in C, as you neither know ahead of time how much stack your functions use, nor the stack limit imposed by the user's system, and the stack limit is often quite small relative to the total memory size.

Things are better, but not much better, in managed languages like Python. Stack overflow is usually assumed to throw an exception (though I couldn't find the specification for this), but actually making that happen is tricky enough that simple programs can cause Python to abort and dump core. And still, like C, Python and most dynamic languages still have a fixed stack size limit that is usually much smaller than the heap.
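That fixed limit is easy to observe from a Python prompt; a small probe (modern Pythons signal overflow with RecursionError):

```python
import sys

def probe(n=0):
    """Recurse until the interpreter refuses, and report how deep we got."""
    try:
        return probe(n + 1)
    except RecursionError:
        return n

print(sys.getrecursionlimit())             # a fixed frame count, often 1000
print(probe() <= sys.getrecursionlimit())  # True: nowhere near heap-sized
```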

Arbitrary stack limits would have an unfortunate effect on Guile programs. For example, the following implementation of the inner loop of map is clean and elegant:

(define (map f l)
  (if (pair? l)
      (cons (f (car l))
            (map f (cdr l)))
      '()))

However, if there were a stack limit, that would limit the size of lists that can be processed with this map. Eventually, you would have to rewrite it to use iteration with an accumulator:

(define (map f l)
  (let lp ((l l) (out '()))
    (if (pair? l)
        (lp (cdr l) (cons (f (car l)) out))
        (reverse out))))

This second version is sadly not as clear, and it also allocates twice as much heap memory (once to build the list in reverse, and then again to reverse the list). You would be tempted to use the destructive linear-update reverse! to save memory and time, but then your code would not be continuation-safe -- if f returned again after the map had finished, it would see an out list that had already been reversed. (If you're interested, you might like this little Scheme quiz.) The recursive map has none of these problems.
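To make the "allocates twice" point concrete outside of Scheme, here is the same pair of shapes sketched in Python, with cons cells modeled as (head, tail) tuples and None as the empty list:

```python
# Cons cells modeled as (head, tail) tuples, None as the empty list.
def map_rec(f, l):
    # recursive shape: one cons per element, one stack frame per element
    return (f(l[0]), map_rec(f, l[1])) if l else None

def map_acc(f, l):
    out = None
    while l:                 # build the result in reverse order...
        out = (f(l[0]), out)
        l = l[1]
    rev = None
    while out:               # ...then reverse it: a second pass of conses
        rev = (out[0], rev)
        out = out[1]
    return rev

nums = (1, (2, (3, None)))
print(map_rec(lambda x: x * x, nums))  # → (1, (4, (9, None)))
print(map_acc(lambda x: x * x, nums) == map_rec(lambda x: x * x, nums))  # → True
```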

a solution?

Guile 2.2 will have no stack limit for Scheme code.

When a thread makes its first Guile call, a small stack is allocated -- just one page of memory. Whenever that memory limit would be reached, Guile arranges to grow the stack by a factor of two.
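The growth policy itself is the classic amortized-doubling trick; a toy sketch (not Guile's actual code):

```python
# A toy sketch of the grow-by-doubling policy (not Guile's actual code):
# start with one "page" of slots and double whenever the limit is reached,
# so n pushes cost only O(log n) grow operations.
class GrowableStack:
    def __init__(self, page=1024):
        self.slots = [None] * page
        self.top = 0

    def push(self, value):
        if self.top == len(self.slots):                  # limit reached:
            self.slots.extend([None] * len(self.slots))  # double the stack
        self.slots[self.top] = value
        self.top += 1

s = GrowableStack()
for i in range(3000):
    s.push(i)
print(len(s.slots))  # → 4096 (grew 1024 → 2048 → 4096)
```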

Ideally, stack growth happens via mremap, and ideally at the same address in memory, but it might happen via mmap or even malloc of another memory block. If the stack moves to a different address, we fix up the frame pointers. Recall that right now Guile runs on a virtual machine, so this is a stack just for Scheme programs; we'll talk about the OS stack later on.

Being able to relocate the stack was not an issue for Guile, as we already needed it to implement delimited continuations. However, relocation on stack overflow did cause some tricky bugs in the VM, as relocation could happen at more places. In the end it was OK. Each stack frame in Guile has a fixed size, and includes space to make any nested calls; check my earlier article on the Guile 2.2 VM for more. The entry point of a function handles allocation of space for the function's local variables, and that's basically the only point the stack can overflow. The few things that did need to point into the stack were changed to be an offset from the stack base instead of a raw pointer.

Even when you grow a stack by a factor of 2, that doesn't mean you immediately take up twice as much memory. Operating systems usually commit memory to a process on a page-by-page granularity, which is usually around 4 kilobytes. Once accessed, this memory is always a part of your process's memory footprint. However, Guile mitigates this memory usage here; because it has to check for stack overflow anyway, it records a "high-water mark" stack usage since the last garbage collection. When garbage collection happens, Guile arranges to return the unused part of the stack to the operating system (using MADV_DONTNEED), but without causing the stack to shrink. In this way, the stack can grow to consume up to all memory available to the Guile process, and when the recursive computation eventually finishes, that stack memory is returned to the system.

You might wonder, why not just allocate enormous stacks, relying on the kernel to page them in lazily as needed? The biggest part of the answer is that we need to still be able to target 32-bit platforms, and this isn't a viable strategy there. Even on 64-bit, whatever limit you choose is still a limit. If you choose 4 GB, what if you want to map over a larger list? It's admittedly extreme, given Guile's current GC, but not unthinkable. Basically, your stack should be able to grow as big as your heap could grow. The failure mode for the huge-stack case is also pretty bad; instead of getting a failure to grow your stack, which you can handle with an exception, you get a segfault as the system can't page in enough memory.

The other common strategy is "segmented stacks", but the above link covers the downsides of that in Go and Rust. It would also complicate the multiple-value return convention in Guile, where currently multiple values might temporarily overrun the receiver's stack frame.

exceptional situations

Of course, it's still possible to run out of stack memory. Usually this happens because of a program bug that results in unbounded recursion, as in:

(define (faulty-map f l)
  (if (pair? l)
      (cons (f (car l)) (faulty-map f l))
      '()))

Did you spot the bug? The recursive call to faulty-map recursed on l, not (cdr l). Running this program would cause Guile to use up all memory in your system, and eventually Guile would fail to grow the stack. At that point you have a problem: Guile needs to raise an exception to unwind the stack and return memory to the system, but the user might have throw handlers in place that want to run before the stack is unwound, and we don't have any stack in which to run them.

Therefore in this case, Guile throws an unwind-only exception that does not run pre-unwind handlers. Because this is such an odd case, Guile prints out a message on the console, in case the user was expecting to be able to get a backtrace from any pre-unwind handler.

runaway recursion

Still, this failure mode is not so nice. If you are running an environment in which you are interactively building a program while it is running, such as at a REPL, you might want to impose an artificial stack limit on the part of your program that you are building to detect accidental runaway recursion. For that purpose, there is call-with-stack-overflow-handler. You run it like this:

(call-with-stack-overflow-handler 10000
  (lambda ()              ; body
    (faulty-map (lambda (x) x) '(1 2 3)))
  (lambda ()              ; handler
    (error "Stack overflow!")))

→ ERROR: Stack overflow

The body procedure is called in an environment in which the stack limit has been reduced to some number of words (10000, in the above example). If the limit is reached, the handler procedure will be invoked in the dynamic environment of the error. For the extent of the call to the handler, the stack limit and handler are restored to the values that were in place when call-with-stack-overflow-handler was called.
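Python's sys.setrecursionlimit allows a loose analog of this pattern. This is a sketch only, with none of Guile's pre-unwind subtleties, and not any real Guile or Python API for stack handlers:

```python
import sys

# A loose Python analog of call-with-stack-overflow-handler, built on
# sys.setrecursionlimit -- a sketch only, with none of Guile's
# pre-unwind subtleties.
def with_stack_limit(limit, body, handler):
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return body()
    except RecursionError:
        sys.setrecursionlimit(old)  # restore room so the handler can run
        return handler()
    finally:
        sys.setrecursionlimit(old)

def runaway(n=0):
    return runaway(n + 1)  # unbounded recursion, like faulty-map

print(with_stack_limit(500, runaway, lambda: "Stack overflow!"))  # → Stack overflow!
```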

Unlike the unwind-only exception that is thrown if Guile is unable to grow its stack, any exception thrown by a stack overflow handler might invoke pre-unwind handlers. Indeed, the stack overflow handler is itself a pre-unwind handler of sorts. If the code imposing the stack limit wants to protect itself against malicious pre-unwind handlers from the inner thunk, it should abort to a prompt of its own making instead of throwing an exception that might be caught by the inner thunk. (Overflow on unwind via inner dynamic-wind is not a problem, as the unwind handlers are run with the inner stack limit.)

Usually, the handler should raise an exception or abort to an outer prompt. However if handler does return, it should return a number of additional words of stack space to grant to the inner environment. A stack overflow handler may only ever "credit" the inner thunk with stack space that was available when the handler was instated. When Guile first starts, there is no stack limit in place, so the outer handler may allow the inner thunk an arbitrary amount of space, but any nested stack overflow handler will not be able to consume more than its limit.

I really, really like Racket's notes on iteration and recursion, but treating stack memory just like any other kind of memory isn't always what you want. It doesn't make sense to throw an exception on an out-of-memory error, but it does make sense to do so on stack overflow -- and you might want to do some debugging in the context of the exception to figure out what exactly ran away. It's easy to attribute blame for stack memory use, but it's not so easy for heap memory. And throwing an exception will solve the problem of too much stack usage, but it might not solve runaway memory usage. I prefer the additional complexity of having stack overflow handlers, as it better reflects the essential complexity of resource use.

os stack usage

It is also possible for Guile to run out of space on the "C stack" -- the stack that is allocated to your program by the operating system. If you call a primitive procedure which then calls a Scheme procedure in a loop, you will consume C stack space. Guile tries to detect excessive consumption of C stack space, throwing an error when you have hit 80% of the process' available stack (as allocated by the operating system), or 160 kilowords in the absence of a strict limit.

For example, looping through call-with-vm, a primitive that calls a thunk, gives us the following:

(use-modules (system vm vm))

(let lp () (call-with-vm lp))

→ ERROR: Stack overflow

Unfortunately, that's all the information we get. Overrunning the C stack will throw an unwind-only exception, because it's not safe to do very much when you are close to the C stack limit.

If you get an error like this, you can either try rewriting your code to use less stack space, or you can increase Guile's internal C stack limit. Unfortunately this is a case in which the existence of a limit affects how you would write your programs. The best thing is to have your code avoid consuming so much OS stack, by avoiding loops through C trampolines.

I don't know what will happen when Guile starts to do native compilation. Obviously we can't relocate the C stack, so lazy stack growth and relocation isn't a viable strategy if we want to share the C and Scheme stacks. Still, we need to be able to relocate stack segments for delimited continuations, so perhaps there will still be two stacks, even with native C compilation. We will see.

Well, that's all the things about stacks. Until next time, happy recursing!

by Andy Wingo at March 17, 2014 11:40 AM

March 13, 2014

Manuel Rego

Welcome CSS Grid Layout

Igalia has been working on the implementation of CSS Grid Layout since last year (more below). This year I’ve had the chance to join the team and help move this great spec forward.

State of the Art

Maybe, you’ve never heard before about CSS Grid Layout, but for sure you know part of the history of design in the web platform.

  • In the beginning, people tried to make cool designs using the <table> tag, which completely breaks the separation between content (HTML) and presentation (CSS), apart from the ugly code and poor results achieved with this approach.
  • Then people started to use <div> for everything, floating blocks here and there. This can become a terrible mess when trying to build flexible layouts that adapt themselves to the screen size.
  • Now lots of CSS frameworks are emerging (960 Grid, Blueprint, Bootstrap, Foundation, Skeleton, …). They make it possible to create responsive designs (very useful in the mobile era), most of them based on the grid concept. The problem is that each framework has its own syntax and needs lots of CSS code to provide the expected behavior.

To solve all these issues W3C has defined a new powerful and flexible spec that will make web designer’s life happier.

CSS Grid Layout

“allows authors to easily define complex, responsive 2-dimensional layouts”

By Tab Atkins Jr. (Google) at CSS WG Blog.

CSS Grid Layout provides a mechanism to divide the available space in rows and columns with a set of predictable sizing behaviors. It defines a set of grid areas where designers can precisely place the elements of a web page.

CSS Grid Layout example

CSS Grid Layout example

The Grid Layout spec can be used to intelligently reflow elements within a web page, optimizing locations and sizes depending on the device the page is rendered on, creating responsive designs.


So, let’s imagine that you have the following HTML and you want to layout it like in the picture above:

<div class="grid">
    <div class="title">Title</div>
    <div class="menu">Menu</div>
    <div class="main">Main</div>
    <div class="footer">Footer</div>
</div>

This will be the CSS syntax required to do the magic:

.grid {
    display: grid;
    grid-template-columns: 200px 1fr;
    grid-template-rows: 100px 1fr auto;
}
.title { grid-column: 1; grid-row: 1; }
.menu { grid-column: 1; grid-row: 2 / span 2; }
.main { grid-column: 2; grid-row: 1 / span 2; }
.footer { grid-column: 2; grid-row: 3; }


  • display: grid: Defines a grid container.
  • grid-template-columns and grid-template-rows: Specify the track breadths.
  • grid-column and grid-row: Determine a grid item’s size and location within the grid.

Flexible tracks (the ones defined with 1fr in the previous example) will grow or shrink and automatically adapt themselves to the viewport size. This makes it possible to create very customized grids, with flexible breadths that depend on the contents or the available space.
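The intuition behind fr units can be sketched with a simplified model: fixed tracks take their size first, and the leftover space is split by fr weights. This ignores most of the spec's actual track-sizing algorithm (content-based sizing, auto tracks, minimum sizes):

```python
# A simplified model (not the spec's full track-sizing algorithm) of how
# fixed and flexible (fr) tracks divide the available space: fixed tracks
# take their size first, and the leftover is split by fr weights.
def resolve_tracks(tracks, available):
    fixed = sum(size for unit, size in tracks if unit == 'px')
    frs = sum(size for unit, size in tracks if unit == 'fr')
    leftover = max(available - fixed, 0)
    return [size if unit == 'px' else leftover * size / frs
            for unit, size in tracks]

# grid-template-columns: 200px 1fr; in an 800px-wide container:
print(resolve_tracks([('px', 200), ('fr', 1)], 800))  # → [200, 600.0]
```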

We’ve created a repository with some examples and demos; feel free to play with them to get a better understanding of the spec’s capabilities:

Implementation status

Despite still being a Working Draft (WD) spec, implementations are appearing in the main web engines:

  • IE/Trident first shipped an implementation in IE10.
  • Google started an implementation in WebKit by the end of 2011. Igalia has been working on this implementation since mid-2013, collaborating with Google in Chromium/Blink and maintaining an up-to-date implementation in Safari/WebKit.
  • Mozilla started working on an implementation in Firefox/Gecko this year.


Finally, we’d like to thank Bloomberg for sponsoring our work around CSS Grid Layout on Blink and WebKit, together with the implementation of ECMAScript 6 (ES6) features in the V8 and SpiderMonkey JavaScript engines.

Igalia & Bloomberg logos

Igalia and Bloomberg working together to build a better web

As a related note, my mates Juanjo and Xavi were talking about CSS Grid Layout and CSS Regions in the last Mobile World Congress at Barcelona as part of the W3C booth. Check the slides in case you missed it (demos and videos of this post are extracted from them).

Igalia will keep working hard to evolve the implementation of CSS Grid Layout spec both in Blink and WebKit. We’ll try to keep you informed about our progress, stay tuned!

by Manuel Rego Casasnovas at March 13, 2014 03:38 PM

March 07, 2014

Andy Wingo

es6 generator and array comprehensions in spidermonkey

Good news, everyone: ES6 generator and array comprehensions just landed in SpiderMonkey!

Let's take a quick look at what comprehensions are, then talk about what just landed in SpiderMonkey and when you'll see it in a Firefox release. Thanks to Bloomberg for sponsoring this work.

comprendes, mendes

Comprehensions are syntactic sugar for iteration. Unlike for-of, which processes its body for side effects, an array comprehension processes its body for its values, collecting them into a new array. Like this:

// Before (by hand)
var foo = (function(){
             var result = [];
             for (var x of y)
               result.push(x*x);
             return result;
           })();

// Before (assuming y has a map() method)
var foo = y.map(function(x) { return x*x });

// After
var foo = [for (x of y) x*x];

As you can see, array comprehensions are quite handy. They're expressions, not statements, and so their result can be passed directly to whatever code needs it. This can make your program more clear, because you aren't forced to give names to intermediate values, like result. At the same time, they work on any iterable, so you can use them on more kinds of data than just arrays. Because array comprehensions don't make a new closure, you can access arguments and this and even yield from within the comprehension tail.

Generator comprehensions are also nifty, but for a different reason. Let's look at an example first:

// Before
var bar = (function*(){ for (var x of y) yield y })();

// After
var bar = (for (x of y) y);

As you can see the syntactic win here isn't that much, compared to just writing out the function* and invoking it. The real advantage of generator comprehensions is their similarity to array comprehensions, and that often you can replace an array comprehension by a generator comprehension. That way you never need to build the complete list of values in memory -- you get laziness for free, just by swapping out those square brackets for the comforting warmth of parentheses.

Both kinds of comprehension can contain multiple levels of iteration, with embedded conditionals as you like. You can do [for (x of y) for (z of x) if (z % 2) z + 1] and all kinds of related niftiness. Comprehensions are almost always more concise than map and filter, with the added advantage that they are usually more efficient.
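ES6 comprehensions were closely modeled on Python's, where the eager/lazy duality is the familiar list-comprehension versus generator-expression distinction; the same nested example reads almost identically:

```python
y = [[1, 2, 3], [4, 5]]

# eager, like [for (x of y) for (z of x) if (z % 2) z + 1]
nested = [z + 1 for x in y for z in x if z % 2]
print(nested)  # → [2, 4, 6]

# lazy, like the parenthesized generator comprehension: nothing runs yet
lazy = (z + 1 for x in y for z in x if z % 2)
print(list(lazy))  # → [2, 4, 6]
```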

what happened

SpiderMonkey has had comprehensions for a while now, but only as a non-standard language extension you have to opt into. Now that comprehensions are in the draft ES6 specification, we can expose them to the web as a whole, by default.

Of course, the comprehensions that ES6 specified aren't quite the same as the ones that were in SM. The obvious difference is that SM's legacy comprehensions were written the other way around: [x for (x of y)] instead of the new [for (x of y) x]. There were also a number of other minor differences, which I'll list here for posterity:

  • ES6 comprehensions create one scope per "for" node -- not one for the comprehension as a whole.

  • ES6 comprehensions can have multiple "if" components, which may be followed by other "for" or "if" components.

  • ES6 comprehensions should make a fresh binding on each iteration of a "for", although Firefox currently doesn't do this (bug 449811). Incidentally, for-of in Firefox has this same problem.

  • ES6 comprehensions only do for-of iteration, not for-in iteration.

  • ES6 generator comprehensions always need parentheses around them. (The parentheses were optional in some cases for SM's old generator comprehensions.)

  • ES6 generator comprehensions are ES6 generators (returning {value, done} objects), not legacy generators (StopIteration).
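The fresh-binding point above is the same stale-closure pitfall Python programmers know from loops; for comparison:

```python
# One binding shared across iterations: every closure sees the final value.
stale = [lambda: x for x in range(3)]
print([f() for f in stale])  # → [2, 2, 2]

# A default argument captures a fresh value per iteration instead.
fresh = [lambda x=x: x for x in range(3)]
print([f() for f in fresh])  # → [0, 1, 2]
```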

I should note in particular that the harmony wiki is out of date, as the feature has moved into the spec proper: array comprehensions, generator comprehensions.

For another fine article on ES6 generators, check out Ariya Hidayat's piece on comprehensions from about a year ago.

So, ES6 comprehensions just landed in SpiderMonkey today, which means it should be part of Firefox 30, which should reach "beta" in April and become a stable release in June. You can try it out tomorrow if you use a nightly build, provided it doesn't cause some crazy breakage tonight. As of this writing, Firefox will be the first browser to ship ES6 array and generator comprehensions.


I had a Monday of despair: hacking at random on something that didn't solve my problem. But I had a Tuesday morning of pleasure, when I realized that my Monday's flounderings could be cooked into a delicious mid-week bisque; the hack was obvious and small and would benefit the web as a whole. (Wednesday was for polish and Thursday on another bug, and Friday on a wild parser-to-OSR-to-assembly-and-back nailbiter; but in the end all is swell.)

Thanks again to Bloomberg for this opportunity to build out the web platform, and to Mozilla for their quality browser wares (and even better community of hackers).

This has been an Igalia joint. Until next time!

by Andy Wingo at March 07, 2014 09:11 PM

March 06, 2014

Juan A. Suárez

Yum Search Extended

Hi again! Let me tell you something. I’ve been a Fedora user for several releases now, probably since Fedora 13 or 14.

Before that, I was using Ubuntu, but decided to switch to Fedora for several reasons that are not worth explaining here. In any case, after switching to Fedora there was something that I missed quite a lot: the aptitude package manager. aptitude is a deb package manager, similar to apt. What I really like about aptitude is its flexibility when searching for packages.

While apt or yum let you specify the search term, they just return all the packages matching the search text; they don’t let you choose where to search. Do you want to get only packages that are not installed? Or do you just remember that the package had python in the name, plus part of the description? With aptitude this is not a problem, as it lets you write such search expressions.

Though search in yum is not as flexible, it has, as far as I know, a nice feature: it allows plugins to implement new functionality. So several months ago I wrote a plugin to mimic aptitude’s search flexibility: yum searchex (search extended).

It is worth saying that I didn’t want to imitate the full aptitude functionality; only those features that I really missed from Ubuntu.

The basic idea is to specify, for each term, where to search. This is done by prefixing the text with ~ and a letter that indicates where to search. In some cases the text to search for is not needed; for instance, to search only in the list of installed packages, we would use ~i.

The full list of the available options can be found in the project forge.

As an example is worth a thousand words, let’s search for a package that we know contains python in its name, is not installed, and, as we remember, has something to do with KDE:

yum searchex ~apython~dKDE
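yum plugins are written in Python, and tokenizing such an expression is straightforward. Here is a hypothetical sketch of the idea; the real plugin's parsing, and what each letter means, are documented in the project forge:

```python
import re

# A hypothetical tokenizer for searchex-style expressions such as
# "~apython~dKDE": each "~" plus a letter selects where to search, followed
# by the (optional) text to search for in that field.
def parse_searchex(expr):
    return [(m.group(1), m.group(2)) for m in re.finditer(r'~(\w)([^~]*)', expr)]

print(parse_searchex('~apython~dKDE'))  # → [('a', 'python'), ('d', 'KDE')]
print(parse_searchex('~i'))             # → [('i', '')]
```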

Hope this plugin is as useful for you as it is for me!

© jasuarez for Words from the inside, 2014.

by jasuarez at March 06, 2014 09:45 AM

February 28, 2014

Iago Toral

Epiphany + WebKitGTK/WebKit2 + Wayland + Accelerated Compositing

In my previous post I shared that I had managed to get a basic implementation of WebKitGTK+WebKit2 to work under Wayland. I also discussed some of the pieces that were still missing, the most important of which was support for multiple views, that is, having the possibility to run multiple browser windows/tabs that render accelerated content simultaneously.


In the last weeks I have continued making progress and I am happy to say that I have finally implemented support for this too, proof in the video below:


Support for multiple views required implementing an extension to the Wayland protocol so that we can map widgets to their corresponding Wayland surfaces in our nested compositor; this is needed to know which surface provides the graphics for which widget. Thanks to Pekka Paalanen for introducing me to the world of Wayland extensions!


My work also uncovered a number of bugs in WebKitGTK that were hidden by the fact that we always use a sharing context for all our GL contexts. In Wayland, however, my colleague Zan Dobersek is working on implementing support for the sharing context separately and our patches still need to be merged together, so I have been working all this time without a sharing context, and that uncovered these bugs that show up when we deal with multiple views (and hence multiple GL contexts). I am still working on fixing these, but in any case merging my work with Zan’s should be enough to prevent them from actually producing any harm, just like in X11. Actually, one of these bugs is the one behind the rendering issues I mentioned in my last post when clicking on the browser’s back button.


One more thing worth mentioning: I needed a full browser to test multiple browser windows and tabs, so that also led me to test all my work with Epiphany/Web, which I had not done yet (so far I had restricted myself to work only with WebKit’s MiniBrowser), that is of course the browser I use in the video above.


If you are interested in following progress more closely or want to look at the patches that enable Accelerated Compositing for WebKitGTK/WebKit2 under Wayland, here is the bug.


Finally, I would like to thank my sponsor, Igalia, for supporting this work since its inception.

by itoral at February 28, 2014 11:20 AM

February 27, 2014

Xabier Rodríguez Calvar

New media controls in WebKitGtk+ (reloaded)

In December we organized, as usual, the WebKitGTK+ hackfest at the Igalia premises in A Coruña, and as usual it was an awesome opportunity to meet the rest of the team. For more information about the progress made during the hackfest, you can have a look at KaL’s post.

As part of the hackfest I decided to take a task that would take some time, so that I could focus, and I went for rewriting once again the WebKitGTK+ multimedia controls. People who just read this post will wonder why I say again: the reason is that last year we completely redesigned the multimedia controls. This time I have not redesigned them (well, a bit) but rewritten them in JavaScript, as the Apple guys had done before.

To get the job done, the first step was bundling the JavaScript code and activating the codepath to use those controls. I used the Apple controls as a template, so you can imagine that the first result was a non-working monster that at some points resembled the Safari multimedia controls. At that point I could do two things, forking or inheriting. I decided to go with inheritance because it keeps the spirit of WebKit (and almost all Free Software projects) of sharing as much code as possible, and because forking later is easier than merging. Then step by step I kept redefining JavaScript methods and tweaking some stuff in the C++ and CSS code to recreate the user experience that we had so far.

Some of the non-aesthetic changes are the following:

  • Focus rings are now managed from CSS instead of C++.
  • Tests got new fixes, rebaselines and more love.
  • CMake support for the new controls.
  • Load captions icon from theme.
  • Load and hide elements handled now with CSS (and JavaScript).

The captions icon problem was interesting because I found out that the one we were using was “user-invisible-symbolic”, hardcoded directly in the CSS code. I changed it to be loaded from the theme, but that raised the issue that the icon uses the wrong metaphor, even though it looks nice for captions. I filed a GNOME bug (and another WebKit bug to follow this up) so that a new icon can be created for captions/subtitles with the correct metaphor.

And which are the controls aesthetic changes?

  • Show a very subtle gradient when the elements are focused or active to improve the accessibility support (which won’t be complete until bug 117857 is fixed).
  • Volume slider rolls up and down with a nice animation.
  • Some other elements are not shown when they are not needed.
  • Captions menu shows up with both click and mouse hover for coherence with the volume slider.
  • Captions menu is also animated the same way as the volume slider.
  • Captions menu was properly centered.
  • Captions menu style was changed to make it more similar to the rest of the controls (fonts, margins…)
  • Volume slider shows below the media element when it is too close to the page top and it cannot be shown on it. This was a regression that I introduced with the first rewrite, happy to have it fixed now.

As I already said, the aesthetic differences with the former C++ controls are not a big deal unless you compare them with the original controls:

Starting point

To appreciate the new controls I cannot just show a screenshot, because the nicest things are the animations. Therefore a video is needed (and if you have WebKit compiled you can experience them yourself):

Of course, I want to thank our hackfest sponsors, as this work was possible thanks to them:

Igalia GNOME Foundation

by calvaris at February 27, 2014 12:04 PM

February 25, 2014

Jacobo Aragunde

LibreOffice UX hackfest: for an accessible user experience

Accessibility is one of the areas of interest for us at Igalia, that’s why I traveled to Brussels some weeks ago to take part in the LibreOffice UX hackfest organized by The Document Foundation and kindly hosted by Betacowork. My colleague Joanmarie Diggs, maintainer of Orca, had identified a set of bugs that will be instrumental in improving the experience with the screen reader; as she keeps saying, “Orca can only provide access to what is made accessible.”

Before diving into the specifics, let me introduce the accessibility layers of LibreOffice first. Due to its multi-platform nature, LibreOffice provides its own Visual Components Library (VCL), with a graphical toolkit in the fashion of GTK+, Qt, etc. It serves as an operating system abstraction, and it is mapped to system-specific libraries whenever possible. When it comes to accessibility, there is a set of internal accessibility APIs that can be translated into ATK objects and functions in the same way it can be translated to IAccessible2 API in Windows.

The first thing I tackled was implementing and improving the accessible roles exposed by LibreOffice. Better role support means smarter client applications: for example, an object registering itself as a spreadsheet document allows the screen reader to make some assumptions about the structure of that document and present its contents more efficiently, and in a more tailored fashion, than it could for a generic document. And navigation by heading only works if headings claim to be headings rather than paragraphs.

Bug #39944 is about adding support for new(ish) ATK roles which LibreOffice wasn’t using yet. Checking the list of existing internal roles we find out that some of the new roles fit properly in the existing API, for example group box and comment internal roles match with ATK’s grouping and comment. But other roles had to be added from scratch. This was the case for the new, specific roles for spreadsheet, text and presentation documents that I added to LibreOffice API and linked to their ATK relatives.

Bug #35105 was a trickier one, because it looked more simple than it was… The problem as reported was that headings in text were not exposed using the proper heading role. But as I dug deeper, I discovered a much more complicated issue: paragraphs that could become headings, and the opposite, as a result of editing the document. LibreOffice lacked support for accessibility objects changing their roles, so I added a new event to the API to notify role changes, and connected that event with atk_object_set_role.

I also spent quite some time working on bug #35107. Again, it looked like a simple role-support bug, but I had learned not to trust the first impression. To debug this one, I first had to learn about the ATK hypertext interface and hyperlink objects. Having done so, I was able to verify that LibreOffice’s existing hypertext implementation was working properly. The same didn’t hold true for the hyperlink; in particular, the implementation of hyper_link_get_object in LibreOffice was wrong. A patch is still pending because it requires involving the third member of the ATK hyper-family, AtkHyperlinkImpl, of which there is no trace in LibreOffice (yet) and thus has to be implemented from scratch. Anyway, it was an interesting debugging and documentation-reading exercise.

In addition to the above, I spent some time triaging existing accessibility bug reports filed by GNOME developers and users and closing those which had already been fixed. The work of code analysis also led me to think of some additional improvements which can be made in LibreOffice’s ATK support. I’ve filed them for now and will begin working on them soon.

I want to thank The Document Foundation for having invited me to attend this hackfest, and all the other participants. It was great to meet all the people I know from the IRC in person.

Happy hacking!

Flavors from Brussels

by Jacobo Aragunde Pérez at February 25, 2014 08:52 AM

February 18, 2014

Andy Wingo

compost, a leaf function compiler for guile

What's that out by the woodshed? It's a steaming pile -- it's full of bugs -- it's compost, a leaf function compiler for Guile!

Around this time last year, a few of us cooked up some hack-dishes to bring to a potluck for Guile 2.0's release anniversary. Last year, mine was a little OpenGL particle demo.

That demo was neat but it couldn't be as big as I would have liked it to be because it was too slow. So, this year when the potluck season rolled around again I sat down to make a little compiler for the subset of Scheme that you see in inner numeric loops -- bytevector access, arithmetic, and loops.

The result is compost. Compost compiles inner loops into native x86-64 machine code that operates on unboxed values.

As you would imagine, compost-compiled code is a lot faster than code interpreted by Guile's bytecode interpreter. I go from being able to compute and render 5K particles at 60 fps up to 400K particles or so -- an 80-fold improvement. That's swell but it gets sweller. The real advantage is that with fast inner loops, I can solve more complicated problems.

Like this one!

(Videos on the internet are a surprisingly irritating problem. Maybe because it's not a cat? Check for various other versions of 1300-bodies-512x416 if that doesn't play for you.)

Last year's demo hard-coded a gravitational attractor at (0, 0, 0). This one has no hard-coded attractor -- instead, each particle attracts every other particle. This so-called n-body simulation is an n-squared problem, so you need to be really efficient with the primitives to scale up, and even then the limit approaches quickly.

With compost, I can get to about 1650 particles at 60 frames per second, using 700% CPU on this 4-core 2-thread-per-core i7-3770 machine, including display with the free software radeon drivers. Without compost -- that is to say, just with Guile's bytecode virtual machine -- I max out at something more like 120 particles, and only 200% CPU.

The rest of this post describes how compost works. If compilers aren't your thing, replace the rest of the words with cat noises.

meow meow meow meow meow meow meow meow

The interface to compost is of course a macro, define/compost. Here's a little loop to multiply two vectors into a third, element-wise:

(use-modules (compost syntax) (rnrs bytevectors))
(define/compost (multiply-vectors (dst bytevector?)
                                  (a bytevector?)
                                  (b bytevector?)
                                  (start exact-integer?)
                                  (end exact-integer?))
  (let lp ((n start))
    (define (f32-ref bv n)
      (bytevector-ieee-single-native-ref bv (* n 4)))
    (define (f32-set! bv n val)
      (bytevector-ieee-single-native-set! bv (* n 4) val))
    (when (< n end)
      (f32-set! dst n (* (f32-ref a n) (f32-ref b n)))
      (lp (1+ n)))))

It's low-level but that's how we roll. If you evaluate this form and all goes well, it prints out something like this at run-time:

;;; loading /home/wingo/.cache/guile/compost/

This indicates that compost compiled your code into a shared object at macro-expansion time, and then printed out that message when it loaded it at runtime. If composting succeeds, compost writes out the compiled code into a loadable shared object (.so file). It then residualizes a call to dlopen to load that file at run-time, followed by code to look up the multiply-vectors symbol and create a foreign function. If composting fails, it prints out a warning and falls back to normal Scheme (by residualizing a plain lambda).

In the beginning of the article, I called compost a "leaf function compiler". Composted functions should be "leaf functions" -- they shouldn't call other functions. This restriction applies only to the low-level code, however. The first step in composting is to run the function through Guile's normal source-to-source optimizers, resulting in a CPS term. The upshot is that you can use many kinds of abstraction inside the function, like the little f32-ref/f32-set! helpers above, but in the end Guile should have inlined or contified them all away. It's a restriction, but hey, this is just a little hack.

Let's look at some assembly. We could get disassembly just by calling objdump -d /home/wingo/.cache/guile/compost/, but let's do it a different way. Let's put that code into a file, say "/tmp/qux.scm", and add on this code at the end:

(define size #e1e8) ;; 100 million
(define f32v (make-f32vector size 2.0))
(multiply-vectors f32v f32v f32v 0 size)

OK. Now we run Guile under GDB:

$ gdb --args guile /tmp/qux.scm 
(gdb) b 'multiply-vectors'
Function "multiply-vectors" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 ('multiply-vectors') pending.
(gdb) r
Starting program: /opt/guile/bin/guile /tmp/qux.scm
[New Thread 0x7ffff604b700 (LWP 13729)]
;;; loading /home/wingo/.cache/guile/compost/

Breakpoint 1, 0x00007ffff5322000 in multiply-vectors () from /home/wingo/.cache/guile/compost/
(gdb) step
multiply-vectors () at /tmp/qux.scm:12
12	    (when (< n end)

Word! GDB knows about the symbol, multiply-vectors. That's top. We are able to step into it, and it prints Scheme code!

Both of these swell things are because compost residualizes its compiled code as ELF files, and actually re-uses Guile's linker. The ELF that we generate can be loaded by dlopen, and its symbol tables and DWARF debugging information are known to GDB.

(In my last article on ELF I mentioned that Guile had no plans to use the system dynamic library loader (dlopen). That's still true; Guile has its own loader. I used the system loader in this place, though, just because I thought it was a neat hack.)

We can tell GDB to disassemble the next line:

(gdb) set disassemble-next-line on
(gdb) step
9	      (bytevector-ieee-single-native-ref bv (* n 4)))
=> 0x00007ffff532201d <multiply-vectors+29>:	4c 0f af c9	imul   %rcx,%r9
13	      (f32-set! dst n (* (f32-ref a n) (f32-ref b n)))
=> 0x00007ffff532203b <multiply-vectors+59>:	f2 0f 59 c1	mulsd  %xmm1,%xmm0
   0x00007ffff532203f <multiply-vectors+63>:	49 b9 04 00 00 00 00 00 00 00	movabs $0x4,%r9

GDB does OK with these things, but it doesn't have special support for Scheme, and really you would want column pointers, not just lines. That data is in the DWARF but it's not processed by GDB. Anyway here's the disassembly:

(gdb) disassemble
Dump of assembler code for function multiply-vectors:
   0x00007ffff5322000 <+0>:	push   %rbx
   0x00007ffff5322001 <+1>:	push   %rbp
   0x00007ffff5322002 <+2>:	push   %r12
   0x00007ffff5322004 <+4>:	push   %r13
   0x00007ffff5322006 <+6>:	push   %r14
   0x00007ffff5322008 <+8>:	push   %r15
   0x00007ffff532200a <+10>:	cmp    %r8,%rcx
   0x00007ffff532200d <+13>:	jge    0x7ffff5322060 <multiply-vectors+96>
   0x00007ffff5322013 <+19>:	movabs $0x4,%r9
   0x00007ffff532201d <+29>:	imul   %rcx,%r9
   0x00007ffff5322021 <+33>:	cvtss2sd (%rsi,%r9,1),%xmm0
   0x00007ffff5322027 <+39>:	movabs $0x4,%r9
   0x00007ffff5322031 <+49>:	imul   %rcx,%r9
   0x00007ffff5322035 <+53>:	cvtss2sd (%rdx,%r9,1),%xmm1
=> 0x00007ffff532203b <+59>:	mulsd  %xmm1,%xmm0
   0x00007ffff532203f <+63>:	movabs $0x4,%r9
   0x00007ffff5322049 <+73>:	imul   %rcx,%r9
   0x00007ffff532204d <+77>:	cvtsd2ss %xmm0,%xmm15
   0x00007ffff5322052 <+82>:	movss  %xmm15,(%rdi,%r9,1)
   0x00007ffff5322058 <+88>:	inc    %rcx
   0x00007ffff532205b <+91>:	jmpq   0x7ffff532200a <multiply-vectors+10>
   0x00007ffff5322060 <+96>:	movabs $0x804,%rdi
   0x00007ffff532206a <+106>:	mov    %rdi,%rax
   0x00007ffff532206d <+109>:	pop    %r15
   0x00007ffff532206f <+111>:	pop    %r14
   0x00007ffff5322071 <+113>:	pop    %r13
   0x00007ffff5322073 <+115>:	pop    %r12
   0x00007ffff5322075 <+117>:	pop    %rbp
   0x00007ffff5322076 <+118>:	pop    %rbx
   0x00007ffff5322077 <+119>:	retq   
End of assembler dump.

Now if you know assembly, this is pretty lame stuff -- it saves registers it doesn't use, it multiplies instead of adds to get the bytevector indexes, it loads constants many times, etc. It's a proof of concept. Sure beats heap-allocated floating-point numbers, though.

safety and semantics

Compost's goal is to match Guile's semantics, while processing values with native machine operations. This means that it needs to assign concrete types and representations to all values in the function. To do this, it uses the preconditions, return types from primitive operations, and types from constant literals to infer other types in the function. If it succeeds, it then chooses representations (like "double-precision floating point") and assigns values to registers. If the types don't check out, or something is unsupported, compilation bails and runtime will fall back on Guile's normal execution engine.

There are a couple of caveats, however.

One is that compost assumes that small integers do not promote to bignums. We could remove this assumption with better range analysis. Compost does do some other analysis, like sign analysis to prove that the result of sqrt is real. Complex numbers will cause compost to bail.

Compost also doesn't check bytevector bounds at run-time. This is terrible. I didn't do it though because to do this nicely you need to separate the bytevector object into two variables: the pointer to the contents and the length. Both should be register-allocated separately, and range analysis would be nice too. Oh well!

Finally, compost is really limited in terms of the operations it supports. In fact, if register allocation would spill on the function, it bails entirely :)

the source

If it's your thing, have fun over on yon gitorious. Compost needs Guile from git, and the demos need Figl, the GL library. For me this project is an ephemeral thing; a trial for future work, with some utility now, but not something I really want to support. Still, if it's useful to you, have at it.


I woke up this morning at 5 thinking about the universe, and how friggin big it is. I couldn't go back to sleep. All these particles swirling and interacting in parallel, billions upon zillions, careening around, somehow knowing what forces are on them, each somehow a part of the other. I studied physics and I never really boggled at the universe in the way I did this morning thinking about this n-body simulation. Strange stuff.

I remember back in college when I was losing my catholicism and I would be at a concert or a dance show or something and I would think "what is this, this is just particles vibrating, bodies moving in the nothing; there is no meaning here." It was a thought of despair and unmooring. I don't suffer any more over it, but mostly because I don't think about it any more. I still don't believe in some omniscient skydude or skylady, but if I did, I know he or she has got a matrix somewhere with every particle's position and velocity.

Swirl on, friends, and until next time, happy hacking.

by Andy Wingo at February 18, 2014 08:21 PM

February 17, 2014

Adrián Pérez

A tale of multiple processes in WebKitGTK+

One of the goals we defined during the 2013 edition of the WebKitGTK+ hackfest —among others— was enabling the Network Process, which my colleague Carlos already blogged about right after the hackfest, and with that in place, we would be then able to support having one Web Process for each WebKitWebView. Fast forward one month and a half to version 2.3.5: this is the first release of WebKitGTK+ shipping with support for the Network Process and multiple Web Processes.

This feature enables applications which use multiple views to be much, much more robust. Previously, one single Web Process was shared among all the web views in an application, meaning that rogue web content could stall the Web Process —or even worse, make it crash— and all the views in the application would suddenly stop working. All of them. Translating this to a familiar use of WebKitGTK+: each tab in your favorite Web browser will be completely kaputt and rendered unusable as soon as just one of the loaded web pages causes havoc. With multiple Web Processes though, only the web view (read: browser tab) causing trouble would stop working, and the rest will keep running completely unaffected by it.

More processes, Igor!

We have not only enabled support for multiple processes: as a bonus, it is optional. That is good news if your application is fine with the shared Web Process mode for all its web views: that remains the default setting. Simple applications that deal with content known to be safe, like Yelp or Devhelp, do not need any changes.

Applications can change the process model with a single API call:

int main (int argc, char **argv)
{
  webkit_web_context_set_process_model (
      webkit_web_context_get_default (),
      WEBKIT_PROCESS_MODEL_MULTIPLE_SECONDARY_PROCESSES);

  /* The rest of the application code, unmodified */
}
The result of a single function call

Setting the process model must be done as early as possible in the application code or, more precisely, before any other API call that would cause a Web Process to be spawned. That is: before creating a web view, and also before any other method call that would make WebKitGTK+ reach for the network. This is a hard requirement, and not doing as advised will make your application crash. You have been warned.

Taking one step back

While having separate Web Processes is great and all, it is reasonable to wonder whether that should be the one and only option… Is there some mid-term approach that would be more reasonable? Moreover, how does memory usage fare with the process galore we have just unleashed? Is WebKitGTK+ going to cause an involuntary DoS attack on systems with scarce memory?

Fry also wonders

As it turns out, there are a number of situations in which having exactly one Web Process for each WebKitWebView is not the best option:

  • Pages which open multiple browser windows may want to use JavaScript to manipulate the state of one window from the other, or open links in a named window created by specifying the target attribute. Sharing the state may be needed in those cases.
  • Applications may want to decide under which circumstances to create new processes. For example, a web browser could implement a mode in which each different domain is assigned a Web Process, and all the views displaying content from the same domain use the same process.
  • When memory runs low, it would be interesting to make web views start sharing processes, to make better use of the memory.

Therefore, Carlos has added a new function in the API to allow application developers to create “related” web views, which will share a Web Process:

GtkWidget *view, *related_view, *unrelated_view;

void create_views ()
{
  /* Create two views which share a Web Process */
  view = webkit_web_view_new ();
  related_view =
    webkit_web_view_new_with_related_view (WEBKIT_WEB_VIEW (view));

  /* This view will spawn a new Web Process */
  unrelated_view = webkit_web_view_new ();
}

While this does not directly solve the case in which WebKitGTK+ itself limits the number of processes being used —a feature that may come in an upcoming release—, it covers the most interesting use case without breaking the API: once again, existing applications that want to stay in the single-process world do not need any changes to their code.

Added complexity: the case for Web Extensions

Unfortunately, some assumptions in WebKitGTK+ no longer hold true now that it is possible to have multiple Web Processes. Even though we have tried to make it as easy as possible for applications to switch to multiprocess mode, it is very likely that applications using Web Extensions will need to be updated to handle the fact that each Web Process will load and instantiate the Web Extensions.

Because things are never like that in real life

A common pattern to establish a communication channel between a Web Process and an application using WebKitGTK+ is for a Web Extension to expose a known unique name on D-Bus, and have the application wait for that name to appear on the bus and connect to it. Unfortunately, it is no good having multiple instances of a Web Extension trying to register the same name on the bus—names are unique!

One option we briefly considered was letting the application know the process identifier of the Web Process associated with a web view, which it could use to generate a unique name. Instead, we introduced a way for applications to pass arbitrary data to Web Extensions before they are initialized, and for extensions to retrieve that data during their initialization: this way the application can generate a unique identifier and pass it to the extensions without exposing a low-level detail like a process identifier in the public API. Plus, being allowed to pass arbitrary data is a much more generic solution: it avoids the need for ugly hacks, and in cases where Web Extensions do not need to pass information back to the application it avoids needing an additional communication channel.

Using this new facility, the application can set any data that can be represented as a GVariant to be passed to Web Extensions on initialization:


int main (int argc, char **argv)
{
  WebKitWebContext *context = webkit_web_context_get_default ();
  GVariant *data = get_data_for_web_extensions ();

  /* WEB_EXTENSIONS_DIRECTORY: path to the directory with the extensions */
  webkit_web_context_set_web_extensions_directory (
      context, WEB_EXTENSIONS_DIRECTORY);
  webkit_web_context_set_web_extensions_initialization_user_data (
      context, data);

  GtkWidget *view = webkit_web_view_new ();

  /* ... */
}

In the code for the Web Extensions, the name and signature of the initialization function have to be changed to accept an additional argument with the user data:

void webkit_web_extension_initialize_with_user_data (
    WebKitWebExtension *extension, GVariant *user_data)
{
  /* Initialize the extension, using “user_data” */
}

But there is still one use case that this mechanism alone does not cover: passing different user data to each instance of a Web Extension running in a separate Web Process. This is the reason for the new initialize-web-extensions signal. This signal is emitted right before spawning a Web Process that will instantiate new instances of the Web Extensions, and its callback function is the right moment to set the data that will be passed to the Web Extensions on initialization:


static void
initialize_web_extensions (WebKitWebContext *context, gpointer user_data)
{
  /* Web Extensions get a different ID for each Web Process */
  static guint32 unique_id = 0;

  /* WEB_EXTENSIONS_DIRECTORY: path to the directory with the extensions */
  webkit_web_context_set_web_extensions_directory (
      context, WEB_EXTENSIONS_DIRECTORY);
  webkit_web_context_set_web_extensions_initialization_user_data (
      context, g_variant_new_uint32 (unique_id++));
}

int main (int argc, char **argv)
{
  g_signal_connect (webkit_web_context_get_default (),
                    "initialize-web-extensions",
                    G_CALLBACK (initialize_web_extensions),
                    NULL);

  GtkWidget *view = webkit_web_view_new ();

  /* ... */
}

With this final bit, there is not just a way of knowing when a Web Process is about to be spawned: it is also the right moment to set the data for the initialization of the Web Extensions.

Check list

To change the process model:

  • Use webkit_web_context_set_process_model() to set the process model.

To pass initialization data to Web Extensions:

  • Use webkit_web_context_set_web_extensions_initialization_user_data() to pass arbitrary data to Web Extensions.
  • Rename the webkit_web_extension_initialize() function in Web Extensions to webkit_web_extension_initialize_with_user_data(), in order to receive the initialization user data.

To make Web Extensions play well with multiple processes:

  • Connect to the initialize-web-extensions signal, and set there the initialization user data, so each Web Process can receive different data.

Some final words

Having all those pieces in WebKitGTK+ means that support for multiple processes (one per tab) in the GNOME Web browser is happening. Right now the needed changes are already in the Git repository, and at the Igalia browsers team we are making steady progress towards having multiprocess support in Web as an opt-in setting for the next release of GNOME. Brave souls can build Web (and WebKitGTK+) themselves and enable the multiprocess mode using gsettings:

gsettings set org.gnome.Epiphany \
    process-model one-secondary-process-per-web-view

Last, but not least, I want to mention that all this is possible thanks to all the restless developers who devoted time to making multiprocess support in WebKitGTK+ happen after we kickstarted the work during the WebKitGTK+ hackfest.

Happy times!

by aperez ( at February 17, 2014 09:45 PM

January 29, 2014

Manuel Rego

Performance of selection with CSS Regions in WebKit and Blink (Part II – perf profiler)

After the initial post introducing this topic and describing the Performance Tests (perftests), now it is time to explain how to analyze performance issues with a profiler in order to improve the code.

“Manual” measurements

First of all, you might think of adding some measurements to the source code, trying to find the possible bottlenecks in an application. For example, you could use the following lines inside WebKit/Blink in order to measure the time required to execute a given function:
#include <time.h> // for clock_gettime()

timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
printf("%lld.%.9ld\n", (long long)ts.tv_sec, ts.tv_nsec);
// Call to the function you want to measure.
timespec ts2;
clock_gettime(CLOCK_REALTIME, &ts2);
printf("%lld.%.9ld\n", (long long)ts2.tv_sec, ts2.tv_nsec);
// Borrow from the seconds field when the nanoseconds difference is negative.
if (ts2.tv_nsec < ts.tv_nsec) {
    ts2.tv_sec -= 1;
    ts2.tv_nsec += 1000000000L;
}
printf("Diff: %lld.%.9ld\n", (long long)(ts2.tv_sec - ts.tv_sec), ts2.tv_nsec - ts.tv_nsec);

Actually this is a pretty bad idea for a number of reasons:

  • Printing times directly with printf() is wrong, as you are adding I/O that will spoil the measurements.
  • You have to modify the source code for every statement you want to measure, which can be really hard in large projects like WebKit/Blink.
  • There are better tools out there, called profilers, explicitly designed for this very purpose; let’s use them.

Using perf profiler

In this case we are going to talk about how to use perf in a GNU/Linux environment to analyze the performance of the WebKitGTK+ port. You can follow these steps:

  • Install it. This depends on your distro, but it should be simple. For example, in Debian: apt-get install linux-tools
  • Run perf record to get the data. This will create a file called Here you have different options:
    • Call the application directly; perf will follow the child processes, so it works properly in the WebKit2 architecture with multiple processes: perf record -g <app>
    • Connect perf to an already running process, for example the WebProcess: perf record -p <process-pid> -g
  • Use perf report to analyze the data gathered by perf record. Simply use the following command (where -i <file-name> is optional, as by default it reads the file): perf report -i <file-name>

Regarding how to collect the data, in WebKitGTK+ you also have the option of generating the perf data files while running the perftests, just by adding some arguments to the run-perf-tests script: Tools/Scripts/run-perf-tests --platform=gtk --debug --profile

This will create a perf data file under the WebKitBuild/Debug/layout-test-results/ folder.

Analyze profile session

perf report gives you the list of methods that have been running for the most time, with the percentage of time spent in each of them. You can then get the full backtraces from the different places a method is called from, and see how many times it was invoked from each trace.

Let's use a concrete example to illustrate it. This is perf report output obtained from the WebProcess while doing a big selection in a page with CSS Regions:

- 6.26% lt-WebKitWebPro [.] WebCore::LayoutUnit::rawValue() const
- WebCore::LayoutUnit::rawValue() const
- 33.10% WebCore::operator+(WebCore::LayoutUnit const&, WebCore::LayoutUnit const&)
- 39.54% WebCore::LayoutRect::maxX() const
- 83.75% WebCore::LayoutRect::intersect(WebCore::LayoutRect const&)
- 98.69% WebCore::RenderRegion::repaintFlowThreadContentRectangle(WebCore::LayoutRect const&, bool, WebCore::LayoutRect const&, WebCore::LayoutRect const&, WebCore::LayoutPoint const&) const
WebCore::RenderRegion::repaintFlowThreadContent(WebCore::LayoutRect const&, bool) const
WebCore::RenderFlowThread::repaintRectangleInRegions(WebCore::LayoutRect const&, bool) const
- WebCore::RenderObject::repaintUsingContainer(WebCore::RenderLayerModelObject const*, WebCore::IntRect const&, bool) const
- 75.49% WebCore::RenderSelectionInfo::repaint()
WebCore::RenderView::setSelection(WebCore::RenderObject*, int, WebCore::RenderObject*, int, WebCore::RenderView::SelectionRepaintMode)

As you can see, 6.26% of the time is spent in the LayoutUnit::rawValue() method, which is used in lots of places. 33.10% of that time it is called from WebCore::operator+(), which is also quite generic. We should keep going deeper into the call graph until we reach some methods that are interesting for our particular case.

In this case, the selection starts in RenderView::setSelection(), so we should investigate further the methods called from there. Of course, in order to do that you need some understanding of the code you are moving through, or you will end up completely lost.

Improve source code

Thanks to this data, I realized that each call to RenderSelectionInfo::repaint() uses RenderObject::containerForRepaint(), which most of the time returns the parent RenderNamedFlowThread for all its children in the render tree.

This means that for every element under a RenderNamedFlowThread, the method RenderFlowThread::repaintRectangleInRegions() is called. Taking a look at this method, it loops over all the regions forcing a repaint. So if you have 1000 regions, even if you are just selecting in one of them, a repaint is executed in all the rest of the regions too.

So, I've provided a patch that repaints only the affected regions, which means around 12%, 18% and 73% improvements in the Layout/RegionsSelection.html, Layout/RegionsSelectAllMixedContent.html and Layout/RegionsExtendingSelectionMixedContent.html perftests respectively.

Doing tests with more regions using this example we got the following results:

Regions   Without patch (ms)   With patch (ms)   Improvement
100       923                  338               63%
150       2712                 727               73%
200       5952                 1285              78%
500       81731                7868              90%

As expected, results are better as we increase the number of regions.


This was just one specific example in order to explain how to use the available tools, trying to provide the required context to understand them properly.

For sure, there's still plenty of work to be done to improve the performance of selection with CSS Regions. Nonetheless, we still have to settle on a final implementation for selection in CSS Regions before going on with the optimization efforts; as explained in my previous post, we're on the way to fix it.

This work has been done inside the collaboration between Adobe and Igalia around CSS Regions.

In Igalia we have great experience in performance optimization for CSS standards like CSS Flexible Box Layout, CSS Grid Layout and CSS Regions. Please don't hesitate to contact us if you need some help around these topics.

by Manuel Rego Casasnovas at January 29, 2014 01:10 PM

January 23, 2014

Manuel Rego

Approach to fix selection with CSS Regions (WebKitGTK+ hackfest late wrap-up)

As you probably know if you have been following this blog, we’ve been working for a while in selection interaction with CSS Regions inside a collaboration between Adobe and Igalia.

First of all, let’s contextualize this post.

WebKitGTK+ Hackfest 2013

Back in December the 5th WebKitGTK+ Hackfest was held at the Igalia offices in A Coruña (Spain). During that week Mihnea, Javi and myself were working together trying to find a good solution for selection in CSS Regions.

As Javi has already explained in his last post, our initial implementation to fix the issues of selection in documents with CSS Regions was discarded due to several problems.

At that point, the main goal was to find an approach that would make selection in CSS Regions spec compliant and that could be validated by the editing experts in order to get it integrated upstream.

Current problem

According to the editing spec, selection is DOM-based. If you select some content in a page with CSS Regions, that content is retrieved properly from the DOM tree. In a selection, the retrieved content and the highlighted content should match. However, this is not happening with CSS Regions.

CSS Regions introduce a divergence between the DOM and render trees. With CSS Regions, some nodes of the DOM tree (called content nodes, the ones that have the flow-into property) are moved from the “regular” render tree (under RenderHTML) to the RenderFlowThread (a sibling of RenderHTML).

During a selection we have start and end positions in the DOM tree that are mapped into positions in the render tree. Selection algorithms traverse the render tree from start to end, which causes some issues with CSS Regions in several situations. Let's use the following diagram to explain some of them.

Diagram showing DOM and render trees together with visual display of a CSS Regions example

Let’s imagine we perform a selection from content-1 to content-2.

  • DOM selection: Returns the following nodes: content-1, source-1 and content-2
  • Render tree selection: We start and end the selection outside the flow thread. However, source-1, which is selected in the DOM tree, is under the RenderFlowThread and is not processed by the algorithm while traversing the render tree from start to end. Thus, it's not highlighted.
  • Issue: source-1 is not highlighted.
    Example of current and expected behaviors for selection in CSS Regions


    After some discussion and lots of diagrams representing the DOM and render trees on paper, we came up with 2 ideas:

    • Subtrees approach: The idea is to split the render tree in subtrees calculating the start and end positions of the selection for each of them. Then it will use current selection methods to perform the selection inside each subtree.
    • DOM order approach: In this case we would use DOM order to traverse the render tree. Also when looking for the container of a node, we will follow the DOM tree too if it's a content node.
    Handwritten diagrams representing DOM and render trees

    During the hackfest we implemented 2 quick prototypes to check whether the previous ideas would solve the problems. In both cases the results were satisfactory.

    However, the DOM order idea seems more complicated, as some elements in the DOM tree might not be present in the render tree. On top of that, we would have to carefully review the different methods involved in the selection and modify how they traverse the tree (or look for their container) in some cases. This would imply several changes in the most complex parts of the selection code.

    At first sight the subtrees approach is more promising as it looks simpler and cleaner. After talking with Ryosuke Niwa and David Hyatt on IRC, it was confirmed as the preferred approach and likely valid for integration into WebKit. In addition, it seems that more performance optimizations could be done with this approach. So, let's explain in depth how this solution would work.

    Subtrees approach

    All selection algorithms are designed to work on a single render tree under RenderView. Actually, RenderHTML is the top element of the render tree where the selection state is set (see RenderBoxModelObject::setSelectionState()).

    Inside this render tree the start object always comes before the end object. But that's not always true in the case of CSS Regions, as you can see in the diagram above. For example, if you start the selection inside the source-1 element (depending on its position inside the DOM tree), the start position could be under the RenderFlowThread and the end under the RenderHTML.

    So, this solution will split the render tree into subtrees whose root objects would be:

    • On one hand, the RenderHTML (for the "regular" render tree).
    • On the other hand, all the RenderFlowThreads (there will be 1 per flow thread in the page).

    Then the total number of subtrees will be 1 (the RenderHTML subtree) + the number of flow threads.

    For each subtree we'll calculate the start and end positions (start will always come before end). And then we'll call the selection algorithms for each subtree using those positions.

    In the previous example we'll have 2 subtrees:

    • RenderHTML subtree.
    • RenderFlowThread (flow-1) subtree.

    Let's calculate start and end positions for each subtree taking into account the example above:

    • RenderHTML subtree:
      • start: content-1, offset: start offset set by user's selection.
      • end: content-2, offset: end offset set by user's selection.
    • RenderFlowThread (flow-1) subtree:
      • start: source-1, offset: start of source-1 object.
      • end: source-1, offset: end of source-1 object.
    Subtrees approach DOM and render trees selection example

    Using these subtrees, the RenderView::setSelection() method will be called for each of them. Each subtree's root object will need to store information about the start and end positions (together with their offsets).
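The partitioning just described can be sketched, very roughly, as follows. SelectionSubtree, buildSubtrees and the string-based positions are invented names for illustration; they are not WebKit classes.

```cpp
#include <string>
#include <vector>

// Invented, simplified model of the subtrees approach.
struct SubtreePosition {
    std::string renderObject;  // e.g. "content-1" or "source-1"
    int offset;
};

struct SelectionSubtree {
    std::string root;          // "RenderHTML" or a RenderFlowThread
    SubtreePosition start;     // invariant: start comes before end
    SubtreePosition end;       // inside this subtree
};

// 1 subtree for the "regular" render tree + 1 per flow thread in the page.
std::vector<SelectionSubtree> buildSubtrees(const std::vector<std::string>& flowThreads) {
    std::vector<SelectionSubtree> subtrees;
    subtrees.push_back({"RenderHTML", {"", 0}, {"", 0}});
    for (const std::string& thread : flowThreads)
        subtrees.push_back({thread, {"", 0}, {"", 0}});
    return subtrees;
}
```

For the example above this would produce 2 subtrees, and the start/end pairs listed (content-1/content-2 and source-1/source-1) would then be stored on each subtree.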


    This is an important milestone for the work we have been doing lately, and it gives us a clear path to make progress in this field and eventually fix the selection in CSS Regions.

    Of course, there's still quite a lot of work to do. We'll improve the current prototype to turn it into a real patch to be uploaded to WebKit's Bugzilla and get it integrated after the review iterations. Then, as usual, we'll try to get it integrated into Blink too.

    Finally, thanks to The GNOME Foundation and Igalia for sponsoring the WebKitGTK+ Hackfest, and also to Mihnea for attending and helping us move this topic forward by sharing his experience with the CSS Regions standard.

    Exciting times ahead, stay tuned! Happy hacking. :-)

    Sponsored by GNOME Foundation logo

    Igalia logo

    by Manuel Rego Casasnovas at January 23, 2014 01:11 PM

    January 22, 2014

    Jacobo Aragunde

    Flying to Brussels for some hacking

    Just some lines to say I’ll be flying to Brussels with some of my fellow Igalians to attend FOSDEM 2014 next week. It will be a wonderful opportunity to meet in person the members of the communities we are involved in, learning and, of course, having fun.

    FOSDEM 2014 logo

    But my main interest is taking part in the LibreOffice UX hackfest that will happen afterwards. My plans there include hacking on LibreOffice accessibility, one of the areas of interest for us in Igalia.

    LibreOffice hackfest logo

    Hope to see you there!

    by Jacobo Aragunde Pérez at January 22, 2014 01:33 PM

    January 21, 2014

    Javier Fernández

    Improving selection in CSS Regions

    I would like to introduce in this post the main problems we have detected in the selection implementation of two of the most important web engines, Blink and WebKit. I’ve already described some of these issues, particularly for CSSRegions, but now I’ll go a bit further, analyzing them and also introducing one of the proposals we have been working on at Igalia as part of the collaboration we have with Adobe.

    Selection is a DOM Tree matter

    At Igalia, we have been investigating how to adapt the selection to the new layout models which provide more complex ways of visualizing the content defined in the DOM Tree. Let’s consider the following basic example using CSSRegions layout:


    Figure 1: base case

    In the last post about this issue we have identified 4 main problems with selection in CSSRegions:

    • Selection direction
    • Highlighting and content mismatch
    • Incorrect block gaps filling
    • Clear the selection

    I’ll describe some examples where these issues are present, where are the root causes and how they can be solved or, at least, improved. I’ll try as well to explain how the Selection works, starting from the mouse events the end user generates to perform a new selection, how those are mapped into a DOM Tree range and finally, how the rendering process produces the highlighting of the selected elements.


    Figure 2: Highlighting and selected content mismatch

    I’ll use this first example (Figure 2) to briefly describe how the Selection is implemented and how all the involved components interact to generate both, the selected DOM Range and the corresponding highlighting by the RenderTree. Obviously the end user selects contents from the Visualized elements, in this case the content of two regular blocks (no regions involved). The mouse events are translated to VisiblePosition instances (Start and End)  in the DOM Tree using the positionForPoint method. Such VisiblePositions are then mapped into the corresponding RenderObjects in the Render Tree; these objects are the ones used to traverse the tree in the RenderView::setSelection method and mark the appropriated elements with one of the following flags: SelectionNone, SelectionStart, SelectionInside, SelectionEnd, SelectionBoth. These flags are also very important in the block gaps filling algorithm, implemented basically in the RenderBlock::selectionGaps method.

    The algorithm implemented in the RenderView::setSelection method can be described, very simplified, as the following steps:

    • gathering information (RenderSelectionInfo and RenderBlockSelectionInfo) about the old selection.
    • clearing the old selection (basically marking all the elements as SelectionNone).
    • updating the flags of the elements of the new selection.
    • gathering information about the new selection.
    • repainting the old objects which might have changed.
    • painting the new selected objects.
    • repainting the old blocks which might have changed.
    • painting the new selected blocks.
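A toy model of the clear and update steps above can make the flags concrete. These are invented names, not WebKit code: a pre-order list of object names stands in for the RenderTree, and a map stands in for the state stored on each RenderObject.

```cpp
#include <map>
#include <string>
#include <vector>

// Simplified model of the selection flags used by RenderView::setSelection.
enum SelectionState { SelectionNone, SelectionStart, SelectionInside,
                      SelectionEnd, SelectionBoth };

std::map<std::string, SelectionState>
setSelectionFlags(const std::vector<std::string>& preOrder,
                  const std::string& start, const std::string& end) {
    std::map<std::string, SelectionState> states;
    // Clear step: every object goes back to SelectionNone.
    for (const std::string& object : preOrder)
        states[object] = SelectionNone;
    // Update step: flag objects while traversing from start to end.
    bool inside = false;
    for (const std::string& object : preOrder) {
        if (object == start && object == end) { states[object] = SelectionBoth; break; }
        if (object == start) { states[object] = SelectionStart; inside = true; continue; }
        if (object == end) { states[object] = SelectionEnd; break; }
        if (inside)
            states[object] = SelectionInside;
    }
    return states;
}
```

Note that if Start or End is never reached during the pre-order walk, as can happen with CSS Regions, the flags end up wrong; that is exactly the kind of problem the next paragraph describes.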

    The algorithm traverses the RenderTree from Start to End using the RenderObject::nextInPreOrder function. Here is where the clear operation issues can appear: if not all objects can be reached by the pre-order traversal, the clear operation does not work properly. That’s why we introduced a way to traverse the tree backwards (r155058), looking for elements which might be unreachable. One of the causes of this issue is a selection direction change.

    This first example shows the highlighting and content mismatch issue, since the DOM Range considers the source (flow-into) element, while it is not highlighted by the RenderTree.

    The next example considers selection involving both regions and regular blocks, and also introduces an interesting selection topic: selection direction.


    Figure 3: Incorrect block gaps filling

    As you can see in the diagram (Figure 3), the user selected content upwards. In most cases the selection direction is not used at all, so Start must always be above the End VisiblePosition in the DOM Tree. The VisibleSelection ensures that, but in this case, because of how the Source (flow-into) is defined according to the CSSRegions specification and where it was placed in the HTML code, the original Start and End positions are not flipped. We will talk more about this in the next example. However, the RenderObject associated to a DOM element with a flow-into property is located in the render tree under the RenderFlowThread object, which itself is placed at the end of the normal render tree, thus causing the start render object to be below the end render object. This fact causes the highlighted content to be exactly the opposite of what the user actually selected.

    This example also illustrates the issue of incorrect block gaps filling, since the element with the id content-1 is considered a block gap to be filled. This happens because of the changes introduced in the already mentioned revision r155058: since the element with id content-2 and the body are flagged as SelectionEnd, the intermediate elements are considered as block gaps to be filled.

    At this point it is quite clear that the way the Render Tree is traversed is very important to produce a coherent selection rendering; notice that in this case highlighting and selected content match perfectly. The idea of using the Region DIV as the container of the portion of the Source DIV content which is rendered inside such region looks like a promising approach. The next example will go further into this idea.


    Figure 4: Selection direction

    In this example (Figure 4) the Start and End VisiblePosition instances have to be flipped by the generated VisibleSelection, since the DIV with the id content-1 is above the original Start element defined by the end user. Flipping both positions makes the corresponding Start and End RenderObject instances consistent; that’s why there is no selection direction issue in this case. However, because of the position of the End element as a child of the RenderFlowThread, the RenderElement with the id content-2 is selected, which, while being seamless from the user experience point of view, does not match the selected content retrieved from the DOM Range.

    The solution: Regions as containers

    At this point it is clear that the selection direction issues are one of the most important sources of problems for selection with CSSRegions. The current algorithms, RenderView::setSelection and RenderBlock::selectionGaps, require traversing the RenderTree downwards from start to end. In fact, this is especially important for the block gaps filling algorithm.

    It’s important to notice that the divergence of the DOM and Render trees when using CSSRegions comes from how these two concepts, the flow-into DOM element and the RenderFlowThread object, are managed and placed in each trees. Why not just using the region elements for the selection algorithms and considering both flow-into and RFT as “meta-elements” where the selected content is extracted from ?

    Considering the steps defined previously for the selection algorithm, the regions-as-containers approach could be described as follows:

    • Case 1: Start and Stop elements, either both or none, are children of the RenderFlowThread.
      • The current RenderView::setSelection algorithm works fine.
    • Case 2: Only Start is child of the RenderFlowThread.
      • First, determining the RenderElements range [RegionStart, RegionEnd] in the RenderFlowThread associated to the RenderRegion the Start element is rendered by.
      • Then, applying the current algorithm to the range of elements [Start, RegionEnd]
      • Finally, applying the current algorithm from the NextInPreOrder element of the RenderRegion until the Stop, as usual.
    • Case 3: Only Stop is child of the RenderFlowThread.
      • First, applying the current algorithm from the Start element to the RenderRegion the Stop element is rendered by.
      • Then, determining the RenderElements range [RegionStart, RegionEnd] in the RenderFlowThread associated to the RenderRegion the Stop element is rendered by.
      • Finally, applying the current algorithm to the range of elements [RegionStart, Stop]
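The three-way dispatch above can be condensed into a tiny classifier. This is a hypothetical helper for illustration; whether Start/Stop hang under a RenderFlowThread is passed in as a boolean for simplicity.

```cpp
// Invented helper mirroring the three cases listed above.
enum class RegionSelectionCase {
    BothOrNoneInThread,  // case 1: current algorithm works as-is
    OnlyStartInThread,   // case 2: split at the Start element's region
    OnlyStopInThread     // case 3: split at the Stop element's region
};

RegionSelectionCase classify(bool startInThread, bool stopInThread) {
    if (startInThread == stopInThread)
        return RegionSelectionCase::BothOrNoneInThread;
    return startInThread ? RegionSelectionCase::OnlyStartInThread
                         : RegionSelectionCase::OnlyStopInThread;
}
```

The real algorithm would then run the sequence of steps listed for the returned case.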

    Determining the selection direction, at the VisibleSelection level, is also affected by the structure of the RenderTree; even though the editing module in both WebKit and Blink also uses rendering info for certain operations, this is perhaps one of the weakest points of this approach. Let’s use one of the examples defined before to illustrate this situation.


    Figure 5: Block gaps filling issues solved

    While traversing the RenderTree, once a RenderRegion is reached its corresponding range of RenderObjects is determined in the RenderFlowThread. The entire range will be traversed for the blocks flagged as SelectionInside. For the ones flagged as SelectionStart or SelectionEnd, the steps previously defined are applied.

    The key to this new approach is that traversal is always downwards, from the Start to the End, which also solves the block gaps related issues.

    Let’s considering now a more complex example (Figure 6), with several regions between a number of regular blocks. selection is more natural with this approach, coherent with what the user expects and also matching the DOM Tree range for most of the cases. This is, however, the biggest drawback of this approach, since it does not follow completely the Editing/Selection specs. I’ll talk  more about this in the last lines of this post.


    Figure 6: Selection direction issues solved

    The following video showcases our proposal in the WebKit MiniBrowser testing application, using a real HTML example based on the Figure 6 diagram.

    Even though selection is more natural and coherent, as I already mentioned, it does not completely follow the Editing/Selection specs. As I stated at the beginning of this post, selection is a DOM matter, defined by a Range of elements in the DOM Tree. This very simple case (Figure 7) is enough to describe this issue:


    Figure 7: Regions as containers NON specs compliant

    The regions-as-containers approach does not consider the Source (flow-into) elements as actual DOM elements, so they will never be part of the selection. This breaks the Editing/Selection specification, since those are regular DOM elements as defined in the CSS Regions standard. This approach was our first try and perhaps too ambitious, trying to provide a good user experience for selection with CSSRegions while remaining spec compliant at the same time. Even though it was a good experience, we can conclude that the problem is too complex and requires a different strategy.

    We had the opportunity to introduce and discuss our proposal during the last WebKitGTK+ Hackfest, where Rego, Mihnea and me had the chance to work hand in hand, carefully analyzing and digesting this proposal. Even though we finally discarded it, we were able to design other approaches, some of which are really promising. Rego will introduce some of them shortly in a blog post. Hence, I can’t end this post without thanking Igalia and the GNOME Foundation for sponsoring such a productive and useful hackfest.

    by jfernandez at January 21, 2014 11:27 PM

    January 19, 2014

    Andy Wingo

    elf in guile

    Good evening, gentle hackfolk!

    Today I'd like to wrap up my three-part series of articles on what's new in Guile 2.2's compiler and runtime. I talked about the virtual machine a couple months ago, and the compiler internals just last Sunday. Today's article is about the object file format.

    Sounds boring, right? Well, probably for most humans. But hackers, Guile compiles to ELF. Pretty rad, amirite? I thought so too. Read on, nerdy readers, read on!

    object files

    So let's consider the problem: Guile compiles to bytecode for a custom virtual machine. In the future we want to do native compilation. In both cases we'll need to write bytes out to disk in some format, and be able to load that code back into Guile. What should be the format for those bytes? For our purposes, a good object file format has a number of characteristics:

    • Above all else, it should be very cheap to load a compiled file.

    • It should be possible to statically allocate constants in the file. For example, a bytevector literal in source code can be emitted directly into the object file.

    • The compiled file should enable maximum code and data sharing between different processes.

    • The compiled file should contain debugging information, such as line numbers, but that information should be separated from the code itself. It should be possible to strip debugging information if space is tight.

    These characteristics are not specific to Scheme. So why not just steal from the C and C++ people? They use the flexible ELF object file format, and so does Guile.

    Note that although Guile uses ELF on all platforms, we do not use platform support for ELF. Guile implements its own linker and loader. The advantage of using ELF is not sharing code, but sharing ideas. ELF is simply a well-designed object file format.

    An ELF file has two meta-tables describing its contents. The first meta-table is for the loader, and is called the program table or sometimes the segment table. The program table divides the file into big chunks that should be treated differently by the loader. Mostly the difference between these segments is their permissions.

    Typically all segments of an ELF file are marked as read-only, except the part that represents modifiable static data or static data that needs load-time initialization. Loading an ELF file is as simple as mmapping the thing into memory with read-only permissions, then using the segment table to mark a small sub-region of the file as writable. This writable section is typically added to the root set of the garbage collector as well.

    The other meta-table in an ELF file is the section table. Whereas the program table divides an ELF file into big chunks for the loader, the section table specifies small sections for use by introspective tools like debuggers or the like. One segment (program table entry) typically contains many sections. There may be sections outside of any segment, as well.
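To make the two meta-tables tangible, here is a small, illustrative reader of their locations and entry counts straight from the ELF64 header (field offsets per the ELF64 specification). This is a sketch, not Guile's actual loader, and it assumes the file and the host share endianness.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Where the two meta-tables of an ELF64 file live, read from the header.
struct ElfTables {
    uint64_t programTableOffset;  // e_phoff: start of the program (segment) table
    uint16_t programEntryCount;   // e_phnum: number of segments
    uint64_t sectionTableOffset;  // e_shoff: start of the section table
    uint16_t sectionEntryCount;   // e_shnum: number of sections
};

bool readElfTables(const std::vector<uint8_t>& file, ElfTables& out) {
    static const uint8_t magic[4] = {0x7f, 'E', 'L', 'F'};
    if (file.size() < 64 || std::memcmp(file.data(), magic, 4) != 0)
        return false;  // too short or not an ELF file
    std::memcpy(&out.programTableOffset, file.data() + 0x20, 8);  // e_phoff
    std::memcpy(&out.sectionTableOffset, file.data() + 0x28, 8);  // e_shoff
    std::memcpy(&out.programEntryCount, file.data() + 0x38, 2);   // e_phnum
    std::memcpy(&out.sectionEntryCount, file.data() + 0x3c, 2);   // e_shnum
    return true;
}
```

A loader only needs e_phoff/e_phnum to mmap and protect the segments; debuggers and introspective tools walk the section table at e_shoff/e_shnum instead.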

    I know, this is a bit dry. So I made you a picture; or rather, a program that makes pictures. Here's an example of one of the pictures:

    What we see here is a page map of assembler.go, a module in Guile's compiler. The top shows each section in a different color, in the place it appears in the file. Each horizontal line is one four-kilobyte page.

    As you can see, some pages contain multiple sections or parts of sections; that is the case for pages that have sections with the same permissions (read-only versus read-write), and which don't have special alignment requirements. About halfway down, after the small red .dynamic section, there is a gap, indicating that the next section, .data, needs to start on a separate page -- in this case because it is writable.

    This page map shows the sections of the ELF file. (There are only three segments; the first read-only part, a small "dynamic" segment holding the .dynamic section, used when the file is loaded; and the final read-write section. You can't see this from the visualization, but actually everything after .data is in no segment at all -- because it's not strictly necessary at run-time. That's the ELF convention.)

    (Why is this file so big, you ask? It's a complicated answer. Part of it is because much of the assembler is itself a generated program; it uses Scheme procedural macros to define emit-foo procedures for each kind of instruction, based on information extracted from the VM code (link). Guile doesn't do phasing, so these macros are residualized into the object file, and because of datum->syntax, macros have a lot of associated constant literals. That probably explains the huge size of .data, as syntax objects contain vectors and lists and symbols needing run-time relocation. I would have to check, but I think .rtl-text is big mostly because of a number of dynamic type checks in this particular module that the optimizer is (rightly) unable to elide. Alack. Surely both of these can be fixed eventually.)

    But enough of problems. Did I mention that I made a program? And how! I think this may be the only blog post you have ever read that has a form in it, but that's how I am rolling today. Select an ELF file on your system, click submit, and you can have your very own page map. It works on any ELF file, not just Guile's files, so you can send yours or whatever.

    If you don't see a form above, your blog reader must have stripped it out. Click through to the blog post itself, or go visit the tool directly. And of course the program is written in Guile -- the file upload handling, the ELF parsing, and the pixel munging (using Cairo and a homebrew charting library).

    Anyway, here's another example, this time of a small file:

    Here we see that this file has the same sections as our earlier bloated example, only they are smaller.

    Allow me to point out some details.

    Firstly, the .rtl-text section holds the bytecode. Usually this would be called .text, but I didn't want to mess with people's expectations, and the new VM used to be called the "RTL" VM. (No longer, that was a silly name.)

    The .data section holds data that needs initialization, or which may be modified at runtime. .rodata holds statically allocated data that needs no run-time initialization, and which therefore can be shared between processes. The initializations themselves, like relocations, are compiled to a procedure in the .rtl-text section, linked to from the dynamic section.

    And that's all the sections! Except for the introspection and debugging-related sections, that is. These sections are used by the Guile runtime to be able to answer questions like "what is the name of this function?" or "what is the source position corresponding to bytecode position X?" or "what is the documentation string for this function?" or, well, I think you get the point. None of this is usually needed at runtime, and because it is all allocated at the end of the file, that means that usually none of it is ever paged into memory.

    Note that for some of this metadata we use the standard DWARF format. I think we are one of the few dynamic language runtimes that does this nifty thing.

    Also, all read-only data in these ELF files is ripe for sharing between processes, paging out to disk if under memory pressure, etc. For example, if I look on the smaps file for one of the two web processes running on this server, I see:

    Size:                132 kB
    Rss:                 124 kB
    Pss:                  62 kB

    meaning that for this particular file, almost all of it is not only shareable but being shared. Good times.

    Finally, all of this has a positive impact on start-up time. While I can get Guile 2.0 to start up in 11 milliseconds, with Guile 2.2 I am down to 8 milliseconds. Likewise guile -c '(sleep 100)' in Guile 2.0 uses 3144 kB of private dirty memory, compared to 1852 kB with Guile 2.2. There's still improvements to be made, but things are going well.

    Well, again I find myself rambling. Check out the little ELF mapper tool I have above and let me know of any curious results. Do send your questions as well; though I've been derelict at responding, who knows, new year, new leaf right? Until next time, happy hacking.

    by Andy Wingo at January 19, 2014 07:55 PM

    January 17, 2014

    Iago Toral

    WebKitGTK Wayland: Initial support for WebKit2 and Accelerated Compositing

    Quick Recap

    In my last post on the subject I explained how during the last WebKitGTK hackfest my colleague Eduardo Lima and I got a working GTK application that made use of the nested compositor design we need in WebKitGTK to get WebKit2 to work under Wayland and how the next steps would involve developing a similar solution inside WebKitGTK.


    Current Status

    During the last 2 weeks I have been working on this, and today I got Accelerated Compositing to work under Wayland and WebKit2. There are still a lot of rough edges, of course, since this milestone is mostly a prototype that only covers the basics. Its purpose was solely to investigate how the nested compositor approach would work in WebKitGTK to support the Accelerated Compositing code path. Still, this is an important milestone: Accelerated Compositing and WebKit2 were the biggest missing pieces to bring Wayland support to WebKitGTK, and even if this is only a prototype it provides a solution for these two aspects, which is great news.


    To Do List

    There are probably a lot of things that need more work to convert this prototype into a proper solution. I have not tested it thoroughly but here is a quick list of things that I already know need more work:

    • Support for multiple windows and tabs (the prototype only supports one tab in one window)
    • For some pages the first composition can be very slow (as in taking >5 seconds). This problem only happens the first time the page is loaded, but it does not happen when  reloading the same page (the demo video below shows this)
    • Rendering of text selections does not seem to work
    • There are rendering artifacts when going back using the browser’s back button to a previously visited page that activates the Accelerated Compositing code path. If the page is reloaded things go back to normal though
    • There are some style rendering issues I have not looked into yet, might be on the side of GTK though
    • All this was tested in a Wayland environment inside an X session, so it can be that some things still need to be fixed for this to work in a pure Wayland environment (with no X server running).
    • Ideally we would like a solution that can make run-time decisions about the underlying platform (X or Wayland) so that we don’t have to build WebKitGTK specifically for one or the other. This is particularly important now that adoption of Wayland is still low. My prototype, however, only supports Wayland at the moment and would require more work to select between X and Wayland at run-time.

    And there is probably a lot more to do that we will find out once we start testing this more seriously.



    So here is a small video demoing the prototype. The demo uses WebKit's MiniBrowser (WebKit2) to load 3 pages that activate the Accelerated Compositing code path in WebKitGTK. The browser is restarted for every page load only because it made it easier for me to record the video. You will see that for some pages the first time the page is composited it takes a long time, which is one of the issues I mentioned above. The demo also shows how this is not the case when the page is reloaded:



    Next Steps

    Once I reached this milestone I think we should start moving things to get this  upstream as soon as possible: the current implementation provides the basics for Wayland support in WebKit2 and it would allow other interested developers to step in and help with testing, completing and fixing this initial implementation. I am sure there is still a lot of work to do for a fully operational Wayland port of WebKitGTK, so the more people who can contribute to this, the better.


    I presume that upstreaming my code will still require a significant effort: my current implementation is a bit too hackish right now so there will be a lot of cleanups to do and a lot of back and forth with upstream maintainers to get the code in position to be merged, so the sooner we start the better. I also need to rebase my code against up-to-date versions of WebKitGTK and Wayland since I froze my development environment during the last WebKitGTK hackfest.


    So that’s it. It is always good to reach milestones and I am happy to have reached this one in particular. If you are excited about WebKitGTK and Wayland I hope you enjoyed the news as much as I enjoyed working on it!


    I would like to thank Igalia for sponsoring my work on this, as well as all the other Igalians who helped me in the process; it would not have been possible without their support!

    by itoral at January 17, 2014 12:34 PM

    January 14, 2014

    Jacobo Aragunde

    A somehow late recap on 2013

    First of all let me wish you a happy new year, dear reader! These kinds of posts are supposed to be published in the last days of the year, but I needed some holidays before getting back to full steam.

    That’s because 2013 has been quite a ride from the professional point of view. I have never touched so many different technologies and frameworks in such a short time period:

    • I started the year Introducing KiCad, one of the best free software EDA suites. I have been contributing to KiCad for some months and even gave a talk to the students at the Facultade de Informática da Coruña.
    • PhpReport got a new release in the first months of 2013.
    • I was also Playing with HTML5 in my spare time to develop a full, albeit short, arcade game.
    • I have helped my fellow Igalian Juan A. Suárez to debug the XML meta-plugin for Grilo, which will allow the development of simpler Grilo plugins using an XML based language. We hope to release it (and blog about it!) in the months to come.
    • I also took some steps into Android application development.
    • And finally, I joined the LibreOffice community where I have been happily contributing to the support of OOXML standard and helping to detect and correct accessibility problems. I’ve already been granted commit permission on the main repo and invited to join ESC meetings. I feel really honored and grateful about that.


    Another milestone of 2013 is that Igalia turned 12, and we celebrated it as part of the last Igalia summit. We spent the weekend in a really beautiful location in Galicia where we were able to relax, eat, drink, compete in the first official Igalia gymkhana, and even had time to sit down together to discuss how to face the future, spread our values and keep enjoying what we do.

    I am really happy to show how Igalia keeps moving forward in its commitment to Free Software. In the last year we joined a number of consortia and associations and sponsored FLOSS-related events all around the world. I would like to highlight our membership in the W3C, where we will help shape the Web of the future, our partnership with Adobe to implement new web standards, our sponsorship of multiple Linux Foundation events, and our support for The Document Foundation.

    2014 is ahead and it will bring new challenges for sure; my only wish is to have the energy to face them with passion!

    by Jacobo Aragunde Pérez at January 14, 2014 09:14 AM

    January 12, 2014

    Andy Wingo

    a continuation-passing style intermediate language for guile

    Happy new year's, hackfolk!

    A few weeks ago I wrote about the upcoming Guile 2.2 release, and specifically about its new register virtual machine. Today I'd like to burn some electrons on another new part in Guile 2.2, its intermediate language.

    To recap, we switched from a stack machine to a register machine because, among other reasons, register machines can consume and produce named intermediate results in fewer instructions than stack machines, and that makes things faster.

    To take full advantage of this new capability, it is appropriate to switch at the same time from the direct-style intermediate language (IL) that we had to an IL that names all intermediate values. This lets us effectively reason about each subexpression that goes into a computation, for example in common subexpression elimination.

    As far as intermediate languages go, basically there are two choices these days: something SSA-based, or something CPS-based. I wrote an article on SSA, ANF and CPS a few years ago; if you aren't familiar with these or are feeling a little rusty, I suggest you go and take a look.

    In Guile I chose a continuation-passing style language. I still don't know if I made the right choice. I'll go ahead and describe Guile's IL and then follow up with some reflections. The description below is abbreviated from a more complete version in Guile's manual.

    guile's cps language

    Guile's CPS language is composed of terms, expressions, and continuations. It was heavily inspired by Andrew Kennedy's Compiling with Continuations, Continued paper.

    A term can either evaluate an expression and pass the resulting values to some continuation, or it can declare local continuations and contain a sub-term in the scope of those continuations.

    $continue k src exp
    Evaluate the expression exp and pass the resulting values (if any) to the continuation labelled k.
    $letk conts body
    Bind conts in the scope of the sub-term body. The continuations are mutually recursive.

    Additionally, the early stages of CPS allow for a set of mutually recursive functions to be declared as a term via a $letrec term. A later pass will attempt to transform the functions declared in a $letrec into local continuations. Any remaining functions are later lowered to $fun expressions. More on "contification" later.

    Here is an inventory of the kinds of expressions in Guile's CPS language. Recall that all expressions are wrapped in a $continue term which specifies their continuation.

    $void
    Continue with the unspecified value.
    $const val
    Continue with the constant value val.
    $prim name
    Continue with the procedure that implements the primitive operation named by name.
    $fun src meta free body
    Continue with a procedure. body is the $kentry $cont of the procedure entry. free is a list of free variables accessed by the procedure. Early CPS uses an empty list for free; only after closure conversion is it correctly populated.
    $call proc args
    Call proc with the arguments args, and pass all values to the continuation. proc and the elements of the args list should all be variable names. The continuation identified by the term's k should be a $kreceive or a $ktail instance.
    $primcall name args
    Perform the primitive operation identified by name, a well-known symbol, passing it the arguments args, and pass all resulting values to the continuation.
    $values args
    Pass the values named by the list args to the continuation.
    $prompt escape? tag handler
    Push a prompt on the stack identified by the variable name tag and continue with zero values. If the body aborts to this prompt, control will proceed at the continuation labelled handler, which should be a $kreceive continuation. Prompts are later popped by pop-prompt primcalls.

    The remaining element of the CPS language in Guile is the continuation. In CPS, all continuations have unique labels. Since this aspect is common to all continuation types, all continuations are contained in a $cont instance:

    $cont k cont
    Declare a continuation labelled k. All references to the continuation will use this label.

    The most common kind of continuation binds some number of values, and then evaluates a sub-term. $kargs is this kind of simple lambda.

    $kargs names syms body
    Bind the incoming values to the variables syms, with original names names, and then evaluate the sub-term body.

    Variables (e.g., the syms of a $kargs) should be globally unique. To bind the result of an expression to a variable and then use that variable, you would continue from the expression to a $kargs that declares one variable. The bound value would then be available for use within the body of the $kargs.

    $kif kt kf
    Receive one value. If it is a true value, branch to the continuation labelled kt, passing no values; otherwise, branch to kf.

    Non-tail function calls should continue to a $kreceive continuation in order to adapt the returned values to their uses in the calling function, if any.

    $kreceive arity k
    Receive values from a function return. Parse them according to arity, and then proceed with the parsed values to the $kargs continuation labelled k.

    $arity is a helper data structure used by $kreceive and also by $kclause, described below.

    $arity req opt rest kw allow-other-keys?
    A data type declaring an arity. See Guile's manual for details.

    Additionally, there are three specific kinds of continuations that can only be declared at function entries.

    $kentry self tail clauses
    Declare a function entry. self is a variable bound to the procedure being called, and which may be used for self-references. tail declares the $cont wrapping the $ktail for this function, corresponding to the function's tail continuation. clauses is a list of $kclause $cont instances.
    $ktail
    A tail continuation.
    $kclause arity cont
    A clause of a function with a given arity. Applications of a function with a compatible set of actual arguments will continue to cont, a $kargs $cont instance representing the clause body.
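    To make the pieces above concrete, here is a schematic sketch of how a one-clause function like (lambda (x) (add1 x)) fits together. This is my own illustration, not taken verbatim from Guile's manual: the labels kentry, kclause and kbody are made up, and src/meta stand in for the real source and metadata fields.

        ($fun src meta ()                    ; free list empty before closure conversion
          ($cont kentry
            ($kentry self
              ($cont ktail ($ktail))
              (($cont kclause
                 ($kclause ($arity (x) () #f () #f)
                   ($cont kbody
                     ($kargs (x) (x)
                       ($continue ktail src
                         ($primcall add1 (x)))))))))))

    Every control-flow point carries a label, and the body's $primcall continues directly to the function's tail continuation, which is how a return is expressed in this CPS.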


    Before starting Guile's compiler rewrite, I had no real-world experience with CPS-based systems. I had worked with a few SSA-based systems, and a few more direct-style systems. I had most experience with the previous direct-style system that Guile had, but never had to seriously design another kind of IL, so basically I was ignorant. It shows, I think; but time will tell if it came out OK anyway. At this point I am cautiously optimistic.

    As far as fitness for purpose goes, the CPS IL works in the sense that it is part of a self-hosting compiler. I'll say no more on that point other than to mention that it has explicit support for a number of Guile semantic features: multiple-value returns; optional, rest, and keyword arguments; cheap delimited continuations; Guile-native constant literals.

    Why not ANF instead? If you recall from my SSA and CPS article, I mentioned that ANF is basically CPS with fewer labels. It tries to eliminate "administrative" continuations, whereas Guile's CPS labels everything. There is no short-hand let form.

    ANF proponents tout its parsimony as a strength, but I do not understand this argument. I like having labels for everything. In CPS, I have as many labels as there are expressions, plus a few for continuations that don't contain terms. I use them directly in the bytecode compiler; the compiler never has to generate a fresh label, as they are part of the CPS itself.

    More importantly, labelling every control-flow point allows me to reason precisely about control flow. For example, if a function is always called with the same continuation, it can be incorporated in the flow graph of the calling function. This is called "contification". It is not the same thing as inlining, as it works for sets of recursive functions as well, and never increases code size. This is a crucial transformation for a Scheme compiler to make, as it turns function calls into gotos, and self-function calls into loop back-edges.
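    As a rough illustration of what contification buys (my own example, not from the original post), consider a self-recursive loop:

        ;; `lp' is only ever called with count-down's own return continuation,
        ;; so it can be contified: the calls to `lp' become gotos and the
        ;; self-call becomes a loop back-edge, with no closure allocated.
        (define (count-down n)
          (let lp ((i n))
            (if (zero? i)
                'done
                (lp (- i 1)))))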

    Guile's previous compiler did a weak form of contification, but because we didn't have names for all control points it was gnarly and I was afraid to make it any better. Now its contifier is optimal. See Fluet and Weeks' Contification using Dominators and Kennedy's CWCC, for more on contification.

    One more point in favor of labelling all continuations. Many transformations can best be cast as a two-phase process, in which you first compute a set of transformations to perform, and then you apply them. Dead-code elimination works this way; first you find the least fixed-point of live expressions, and then you residualize only those expressions. Without names, how are you going to refer to an expression in the first phase? The ubiquitous, thorough labelling that CPS provides does not have this problem.

    So I am happy with CPS, relative to ANF. But what about SSA? In my previous article, I asked SSA proponents to imagine returning a block from a block. Of course it doesn't make any sense; SSA is a first-order language. But modern CPS is also first-order, is the thing! Modern CPS distinguishes "continuations" syntactically from functions, which is exactly the same as SSA's distinction between basic blocks and functions. CPS and SSA really are the same on this level.

    The fundamental CPS versus SSA difference is, as Stephen Weeks noted a decade ago, one of data structures: do you group your expressions into basic blocks stored in a vector (SSA), or do you nest them into a scope tree (CPS)? It's not clear that I made the correct choice.

    In practice with Guile's CPS you end up building graphs on the side that describe some aspect of your term. For example you can build a reverse-post-ordered control flow analysis that linearizes your continuations, and assigns them numbers. Then you can compute a bitvector for each continuation representing each one's reachable continuations. Then you can use this reachability analysis to determine the extent of a prompt's body, for example.

    But this analysis is all on the side and not really facilitated by the structure of CPS itself; the CPS facilities that it uses are the globally unique continuation and value names of the CPS, and the control-flow links. Once the transformation is made, all of the analysis is thrown away.

    Although this seems wasteful, the SSA approach of values having "implicit" names by their positions in a vector (instead of explicit ephemeral name-to-index mappings) is terrifying to me. Any graph transformation could renumber things, or leave holes, or cause vectors to expand... dunno. Perhaps I am too shy of the mutation foot-gun. I find comfort in CPS's minimalism.

    One surprise I have found is that I haven't needed to do any dominator-based analysis in any of the paltry CPS optimizations I have made so far. I expect to do so once I start optimizing loops; here we see the cultural difference with SSA I guess, loops being the dominant object of study there. On the other hand I have had to solve flow equations on a few occasions, which was somewhat surprising, though enjoyable.

    The optimizations I have currently implemented for CPS are fairly basic. Contification was tricky. One thing I did recently was to make all non-tail $call nodes require $kreceive continuations; if, as in the common case, extra values were unused, that was reflected in an unused rest argument. This required a number of optimizations to clean up and remove the extra rest arguments for other kinds of source expressions: dead-code elimination, the typical beta/eta reduction, and some code generation changes. It was worth it though, and now with the optimization passes things are faster than they were before.

    Well, I find that I am rambling now. I know this is a lot of detail, but I hope that it helps some future compiler hacker understand more about intermediate language tradeoffs. I have been happy with CPS, but I'll report back if anything changes :) And if you are actually hacking on Guile, check the in-progress manual for all the specifics.

    Happy hacking to all, and to all a good hack!

    by Andy Wingo at January 12, 2014 09:58 PM

    December 31, 2013

    Víctor Jáquez

    Boosting WebKitGTK+ compilation for armhf with icecream

    Some time ago I needed to jump into the fix-compile-test loop for WebKitGTK+, but on the armhf architecture, speaking in Debian/Ubuntu terms.

    For those who don’t know, WebKitGTK+ is huge, it is humongous, and it takes a lot of resources to compile. At first glance, it seemed impossible to even try to compile it natively on my hardware, which, by the way, is an Odroid-X2. So I set up a cross-compilation environment.

    And I failed. I could not cross-compile the master branch of WebKitGTK+ using a bootstrapped Debian as the root file system. It is supposed to be the other way around, but the whole multiarch thing turned my good old cross-compilation setup (based on scratchbox2) into a bloody hell. Long story short, I gave up and took the idea of native builds more seriously. Besides, Ubuntu and Debian do full native builds of their distributions for armhf, not to mention that the Odroid-X2 has enough power to give it a try.

    It is worth mentioning that I could not use Yocto/OE or Buildroot, though I would have loved to, because the target was a distribution based on Debian Sid/Ubuntu, and I could not afford a chroot environment only for WebKitGTK+.

    With a lot of patience I was able to compile, on the Odroid, a minimalist configuration of WebKitGTK+ without symbols. As expected, it took ages (less than 3 hours, if I remember correctly).

    Quickly an idea popped up in the office: to use distcc. I grabbed as many ARMv7-based boards as I could find: another Odroid-X2, a couple of Pandaboards, an Arndale board, and an IFC6410, and installed a distcc compilation setup on them.

    And yes, the compilation time went down, but not that much, though I don’t remember by how much.

    Many of my colleagues at the office migrated from distcc to icecream. In particular, Juan A. Suárez told me about his experiments with icecc and his Raspberry Pi. I decided to give it a shot.

    Icecream allows cross-compilation because the scheduler can deliver the tool-chain required by the requester to the compilation host.

    First, you should have one or several cross tool-chains, one for each compilation tuple. In this case we will have only one: compiling on x86_64, generating code for armhf. Luckily, Emdebian provides it out of the box. Nevertheless, you could obtain it by any other means, such as crosstool.

    Second, you need the icecc-create-env script to create the tarball that the scheduler will distribute to the compilation host.

        $ /usr/lib/icecc/icecc-create-env \
              --gcc /usr/bin/arm-linux-gnueabihf-gcc-4.7 \
                    /usr/bin/arm-linux-gnueabihf-g++-4.7

    The output of this script is an archive file containing all the files necessary to set up the compiler environment. By default, the file will have a random unique name like “ddaea39ca1a7c88522b185eca04da2d8.tar.bz2”. You will need to rename it to something more expressive.

    Third, copy the generated archive file to the board where your code, in this case WebKitGTK+, will be compiled and linked.

    For the purpose of this text, I assume that the board already has the icecc daemon installed and configured. Besides, I use ccache too. Hence my environment variables look more or less like this:

    CCACHE_DIR=/mnt/hd/.ccache # /mnt/hd is a mounted hard disk through USB.
    PATH=/usr/lib/ccache:..    # where Debian sets the compiler's symbolic links

    Finally, the last bit of magic is the environment variable ICECC_VERSION. This variable needs to have this pattern:

        <native_archive_file>,<platform>:<cross_archive_file>=<target>
    Where <native_archive_file> is the archive file with the native tool-chain, <platform> is the host hardware architecture, <cross_archive_file> is the archive file with the cross tool-chain, and <target> is the target architecture of the cross tool-chain.
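    As an illustration of that pattern, a cross setup from x86_64 to armhf could be exported like this. The archive paths below are hypothetical, not the ones from my actual setup; substitute the tarballs you generated with icecc-create-env.

```shell
# Hypothetical archive locations; replace with your own renamed tarballs.
export ICECC_VERSION=/srv/icecc/armhf-native.tar.gz,x86_64:/srv/icecc/x86_64-to-armhf-cross.tar.gz=armhf
```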

    In my case, the cross part is not needed because I’m doing native compilation on armhf. Hence, my ICECC_VERSION environment variable simply contains the path to the native archive file.


    And that’s it! Now I’m using the big iron available in the office, reducing the time of a clean compilation to less than an hour.

    As a final word, I expect that this compilation time will be reduced a bit more with the new CMake build infrastructure in WebKitGTK+.

    by vjaquez at December 31, 2013 01:48 PM

    December 19, 2013

    Javier Muñoz

    On Technology of Controls for Accelerators and Detectors

    As a result of our work in the Kernel and Virtualization team here at Igalia, Samuel and I were invited to take part in the first conference on control system technologies used by High Energy Physics facilities. This event was hosted at the National Center of Scientific Research NCSR DEMOKRITOS, the biggest and most acclaimed research center in Greece.

    After our talk, titled Driving and virtualizing control systems: the Open Source approach used in WhiteRabbit, we joined the round table to discuss the future of controls for accelerators and detectors. It was great to see how open hardware is making its way into this community.

    During the discussion, we shared our experience on topics such as the different open source approaches and business models based on community or academia/industry interactions around big science. We noticed a genuine interest in the open way versus other, more traditional, ways of proceeding.

    CERN’s servers host the content for this event, and it is online now. Feel free to download our slides and the talk’s abstract to learn how big science players leverage Linux and QEMU/KVM to raise the quality of their hardware and software, accelerating innovation and gaining additional contributors for their projects.

    by Javier at December 19, 2013 11:00 PM

    Enrique Ocaña

    Using schroot to have a stable and transplantable development environment

    Over the last months I’ve been working on several projects, switched laptops and reinstalled my main distribution several times. Having to replicate the development environment time after time and reinstall the compiler, tools, editor and all sorts of dependencies that stain a desktop distribution is a major nuisance. In the worst case, things won’t work as before due to incompatibilities in the new distribution. And what about bringing new developers into a complex project with a lot of dependencies that are difficult to track? The new team member can spend days trying to replicate the development environment of the rest of the team.

    Fortunately, I learnt to use a tool that solves all these problems: schroot. Now I have one chroot for each of the main projects I work on. I can “transplant” it from one distribution or computer to the next, give it to a newly arrived developer so that they can start hacking on the project in minutes, or just archive it when I’m no longer in the project. It’s true that I’m spending more disk space with this approach but, from my point of view, the advantages are worth it. Moreover, working this way you start from a “bare” distribution, and if your project’s build system is missing some dependency (because it “should” be present on a desktop distribution) you will notice.

    Making it work on a Debian based distribution is as easy as:

     sudo apt-get install schroot

    After that, you have to initialize the chroot(s) to use. Each chroot can be based on its own distribution. For instance, to create a chroot based on Ubuntu Quantal 64-bit to hack on WebKit (and named after it), you initialize it using debootstrap:

     sudo debootstrap --arch amd64 quantal /srv/chroot/webkit

    Then you have to configure schroot to find it by editing /etc/schroot/schroot.conf and adding the new entry. There are several flavours of chroot configuration, but my favourite is directory-based:

      [webkit]
      description=Quantal for compiling Webkit
      type=directory
      directory=/srv/chroot/webkit
      # replace with your own user name
      users=youruser

    If you want to have an independent home dir in the chroot, comment out the corresponding line in /etc/schroot/default/fstab (and probably in desktop/fstab too). If you’re going to use WebKit2, you need to enable (uncomment) shared memory support. The resulting fstab should look like this:

      /proc           /proc           none    rw,bind         0       0
      /sys            /sys            none    rw,bind         0       0
      /dev            /dev            none    rw,bind         0       0
      /dev/pts        /dev/pts        none    rw,bind         0       0
      #/home          /home           none    rw,bind         0       0
      /tmp            /tmp            none    rw,bind         0       0
      #/run           /run            none    rw,bind         0       0
      #/run/lock      /run/lock       none    rw,bind         0       0
      /dev/shm        /dev/shm        none    rw,bind         0       0
      /run/shm        /run/shm        none    rw,bind         0       0

    Now, to enter in the chroot:

      schroot -c webkit

    Schroot has an automatic session management system that allows creating, reusing and killing sessions, but it can be a bit uncomfortable at the beginning. If you find yourself with a lot of zombie sessions, just kill them all:

      schroot -e --all-sessions

    As I got tired of automatic and zombie sessions, I now use a tiny script in $HOME/bin to create, reuse and kill them on demand. Feel free to customize it for your needs if you find it useful.
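    My actual script is not reproduced here, but a minimal sketch of the idea (my own reconstruction, using schroot’s standard session flags -b, -n, -r and -e; the chr helper name is made up) could look like this:

```shell
# Minimal sketch of an on-demand schroot session helper.
# Usage: chr webkit [command...]   or   chr webkit --kill
chr() {
    chroot="$1"; name="sess-$1"; shift
    if [ "$1" = "--kill" ]; then
        schroot -e -c "$name"              # end (kill) the named session
        return
    fi
    # begin the session only if it does not already exist, then run inside it
    schroot --list --all-sessions 2>/dev/null | grep -q "$name" \
        || schroot -b -c "$chroot" -n "$name"
    schroot -r -c "$name" -- "$@"          # run a command (or a login shell)
}
```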

    For more info, here are the original sources I used to learn about schroot:

    UPDATE: 2014-02-03

    If you want to enjoy graphics acceleration (and probably other features), the chroot needs to have the same acceleration packages as the host. In my case (Nvidia card), I needed this in the chroot to avoid a big black rectangle when running Chromium/Blink:

      apt-get install nvidia-319

    by admin at December 19, 2013 12:57 PM

    December 18, 2013

    Andy Wingo

    optimizing let in spidermonkey

    Peoples! Firefox now optimizes let-bound variables!

    What does this mean, you ask? Well, as you nerdy wingolog readers probably know, the new ECMAScript 6 standard is coming soon. ES6 has new facilities that make JavaScript more expressive. At the same time, though, many ES6 features stress parts of JavaScript engines that are currently ignored.

    One of these areas is lexical scope. Until now, with two exceptions, all local variables in JavaScript have been "hoisted" to the function level. (The two exceptions are the exception in a catch clause, and the name of a named function expression.) Therefore JS engines have rightly focused on compiling one method at a time, falling back to slower execution strategies when lexical scopes are present.

    The presence of let in ES6 changes this. It was a hip saying a couple years ago that "let is the new var", but in reality no one uses let -- not only because engines didn't yet implement let without feature flags, if they implemented it at all, but because let was slow. Because let-bound variables were on the fallback path, they were unoptimized.

    But that is changing. Firefox now optimizes many kinds of let-bound variables, making "let is the new var" much closer to being an actual thing. V8 has made some excellent steps in similar directions as well. I'll focus on the Firefox bit here as that's what I've been working on, then go on to mention the work that the excellent V8 hackers have done.

    implementing scope in javascript

    JavaScript the language is defined in terms of a "scope chain", with respect to which local variables are looked up. Each link in the chain binds some set of variables. Lookup of a name traverses the chain from tip to tail looking for a binding.

    Usually it's possible to know at compile-time where a binding will be in the chain. For example, you can know that the "x" reference in the returned function here:

    function () {
      var x = 3;
      return function () { return x; };
    }

    will be one link up in the scope chain, and will be stored in the first named slot in that link. This is a classic implementation of lexical scope, and it plays very well with the semantics of JavaScript as specified. All engines do something like this.

    Another thing to note in this case is that we don't need to store the name for "x" anywhere, as we can see through all of its uses at compile-time. In practice however, as JavaScript functions are typically compiled lazily on first call, we do store the association "x -> slot 0" in the outer function's environment.

    (Could you just propagate the constant, you ask? Of course you could. But since JS compilation typically occurs one function at a time, lazily, no engine does this initially. Of course later when the optimizing compiler kicks in, it does get propagated, but that usually doesn't avoid the scope chain node allocation.)

    Let's take another case.

    function foo() { var x = 3; return x; }

    In this case we know that the "x" reference can be found in the top link of the chain, in the first slot. Semantically speaking, that is. No JS engine implements things this way. Instead we take advantage of noticing that "x" cannot be captured by any other function. Therefore we are free to assign "x" to a slot in a stack frame, and refer to it by index directly -- without allocating a scope chain node on the heap. All engines do this.

    And of course later if this function is hot, the constant exists only in a register, probably inlined to its caller. For this reason scope hasn't had a lot of work put into it. The unit of optimization -- the function -- is the same as the unit of scope. If the implementation details of scope get costly, we take advantage of run-time optimization to speculatively remove allocations, loads, and stores.

    OK. What if you have some variables captured, and some not captured?

    function foo() { var x = 3; var y = 4; return function () { return x; } }

    Here "x" is captured, but "y" is not. Usually engines will allocate "x" on the scope chain, and "y" as a local. (JavaScriptCore is an interesting exception; it uses a strategy I haven't seen elsewhere called "lazy tear-off". Basically all variables are on the stack for the dynamic extent of the scope, whether they are captured or not. If needed, potentially lazily, an object is pushed on the scope chain that aliases the stack slots. If a scope is pushed on the scope chain, when the scope ends the current values are "torn off" the stack and copied to the heap, and the slots pointer of the scope object updated to point to the heap.)

    I digress. "x" on the scope chain, "y" not. (I regress: why allocate "y" at all, isn't it dead? Yes it is dead. Optimizing compilers kill it. Baseline compilers don't bother.) So typically access to "x" goes through the scope chain, and access to "y" does not.

    Calculating the set of variables that can be captured is one of the few static analyses done by JS baseline compilers. The other ones are whether we are in strict mode or not, whether nested scopes contain "with" or "eval" (thus forcing all locals to be on the scope chain), and whether the "arguments" object is used or not. There may be more, but that is the lowest common denominator of efficiency.

    implementing let

    let is part of ES6, and all browsers will eventually implement it.

    Let me give you an insight as to the mindset of a browser maker. What a browser maker does when they see a feature is first to try to ignore it -- all features have a cost and not all of them pay for themselves. Next you try to do the minimum correct thing -- the thing that passes all the test suites, but imposes a minimal burden on the rest of the system. Usually this means that the feature probably works, except for the corner cases which users will file bugs for, but it is slow. Finally when there is either an internal use the browser maker cares about (Google, Mozilla to an extent) or you want to heat up a benchmark war (everyone else), you start to optimize.

    The state of let was until recently between "ignore" and "slow". "Slow" means different things to different browsers.

    The engines that have started to implement let are V8 (Chrome) and SpiderMonkey (Firefox). In Chrome, until recently using let in a function was an easy way of preventing that function from ever being optimized by Crankshaft. Good times? When writing this article I was going to claim this was still the case, but I see that in fact this is happily not true. You can Crankshaft a function that has let bindings. I believe, with a cursory glance, that locals that are not captured are still allocated on a scope chain -- but perhaps I am misreading things.

    Firefox, on the other hand, would not Ion-compile any function with a let. You would think that this would not be the case, given that Firefox has had let for many years. If you thought this, you misunderstand how browser development works. Basically let me lay it out for you:

    class struggle : marxism :: benchmarks : browsers

    Get it? Benchmarks are the motor of browser history. If this bloviation has any effect, please let it be to inspire you to go and make benchmarks!

    Anyway, Firefox. No Ion for let. The reason why you would avoid optimizing these things is clear -- breaking the "optimization unit == scope unit" equivalence has a cost. However the time has come.

    What I have done is to recognize let scopes that have no captured locals. This class of let scope will now be optimized by Ion. So for example in this case:

    function sumto(n) {
      let sum = 0;
      for (let i=0; i<n; i++)
        sum += i;
      return sum;
    }

    Previously this would allocate a block scope around the "for". (Incidentally, Firefox borks block scoping in this case; each iteration should produce a fresh binding. But this is historical, will be fixed, and I digress.) The operations that created and pushed block scopes were not optimized. Avoiding pushing and popping scopes from the scope chain avoids this limitation, and thus we have awesome speed!
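    The fresh-binding semantics are visible with closures: under correct ES6 behavior, each iteration's i is a distinct binding, so each closure captures a different value.

```javascript
// Under ES6 semantics each iteration of a for-let loop gets a fresh
// binding of i, so each closure captures a different i.
let fns = [];
for (let i = 0; i < 3; i++)
  fns.push(function () { return i; });
console.log(fns.map(function (f) { return f(); })); // 0, 1, 2
```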

    The effect of simply turning on Ion cannot be denied. In this case, the runtime of a sum up to 1e9 goes from 8.8 seconds (!) to 0.8 seconds. Obviously it's a little micro-benchmark but that's how I'm rolling today. The biggest optimization is to stop deoptimization. And here's where history cranks up: V8 takes 7.7 seconds on this loop. Gentlefolk, start your engines!
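    For the curious, a rough harness for the micro-benchmark; the absolute numbers will of course vary by engine, engine version, and machine.

```javascript
// Rough reproduction of the sum-to-1e9 micro-benchmark from the text.
function sumto(n) {
  let sum = 0;
  for (let i = 0; i < n; i++)
    sum += i;
  return sum;
}
let start = Date.now();
let result = sumto(1e9);
console.log(result, (Date.now() - start) / 1000, "seconds");
```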

    At the same time, in Firefox we are still able to reify a parallel chain of "debug scopes" as needed that corresponds to the scope chain as the specification would see it. This was the most challenging part of optimizing block scope in Firefox -- not the actual optimization, which was trivial, but optimization while retaining introspection and debuggability.

    future work

    Unhappily, scopes with let-bound variables that are captured by nested scopes ("aliased", in spidermonkey parlance) are not yet optimized. So not only do they cause scope chain allocation, but they also don't benefit from Ion. Boo. Bug 942810.

    I should also mention that the let supported by Firefox is not the let specified in ES6. In ES6 there is thing called a "temporal dead zone" whereby it is invalid to access the value of a let before its initialization. This is like Scheme's "letrec restriction", and has to be enforced dynamically for similar reasons. V8 does a great job in actually implementing this, and Firefox should do it soon.
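    A small illustration of the temporal dead zone, as a spec-conformant engine must enforce it: reading a let binding before its initializer has run throws a ReferenceError, even though the binding is in scope for the whole block.

```javascript
// Reading a let binding before its initializer has run is an error,
// even though the binding is in scope for the entire function body.
function probeTDZ() {
  try {
    return x;                 // x is in the temporal dead zone here
  } catch (e) {
    return e instanceof ReferenceError;
  }
  let x = 1;                  // initializer; never reached
}
console.log(probeTDZ()); // true
```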

    Of course, it's not clear to me how let can actually be deployed without breaking the tubes. I think it probably can somehow but I haven't checked the latest TC39 cogitations in that regard.

    twisted paths

    It's been a super-strange end-of-year. I was sure I would be optimizing SpiderMonkey generators and moving on to other things, but I just got caught up with stuff -- the for-of bits took approximately forever and then the amount of state carried in SpiderMonkey stack frames filled me with despair. So it was that this hack, while assisting me in that goal, was not actually a planned thing.

    See, SpiderMonkey used to actually reserve a stack slot for the "block chain". No, they aren't using your browser to mine for Bitcoins, though that would be a neat hack. The "block chain" was a simulation of the spec-mandated scope chain. Of course this might not reflect the actual, optimized implementation, though one would still want to map the two to each other for debugging purposes. It was a mess.

    The resulting changeset to fix this ended up so large that it took weeks to land. Well, live and learn, right? I remember Michael Starzinger telling me the same thing about V8 -- you just have to keep your patches small, as small as possible, and always working. Words to the wise indeed.

    happy days

    But in the end we at least have some juice from this citric fruit. This has been an Igalia joint. Thanks very much to Mozilla's Luke Wagner for suffering through the reviews.

    Thanks especially to Bloomberg for making this work possible; you folks are swell. Forward ES6!

    by Andy Wingo at December 18, 2013 08:00 PM