Planet Igalia

July 31, 2020

Alejandro Piñeiro

v3dv status update 2020-07-31

Iago talked recently about the work done testing and supporting well-known applications, like the Vulkan ports of Quake 1, Quake 2 and Quake 3. Let’s go back here to the latest news on feature and bugfixing work.

Pipeline cache

Pipeline cache objects allow the result of pipeline construction to be reused. Usually (and specifically on our implementation) that means caching compiled shaders. Reuse between pipeline creations during the same application run is achieved by passing the same pipeline cache object when creating multiple pipelines. Reuse across runs of an application is achieved by retrieving pipeline cache contents in one run of an application, saving the contents, and using them to preinitialize a pipeline cache on a subsequent run.

Note that a pipeline cache may not improve the performance of an application once it starts to render. This is because application developers are encouraged to create all the pipelines in advance, to avoid any hiccup during rendering. In that situation the pipeline cache helps to reduce load times. In any case, creating pipelines during rendering is not always avoidable. When it happens, the pipeline cache reduces the hiccup, as a cache hit is far faster than a shader recompilation.

One specific detail about our implementation is that internally we keep a default pipeline cache, used if the user doesn’t provide a pipeline cache when creating a pipeline, and also to cache the custom shaders we use for internal operations. This allowed us to simplify our code, discarding some custom caches that were already implemented.
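To make the two kinds of reuse concrete, here is a minimal sketch in Python. It is a toy model rather than driver code: the names PipelineCache, create_pipeline, compile_shader and DEFAULT_CACHE are hypothetical, and hashing the shader source with SHA-1 is an assumption made only for the example.

```python
import hashlib
import json


class PipelineCache:
    """Toy model of a pipeline cache: maps a shader hash to a compiled result."""

    def __init__(self, initial_data=None):
        # Pre-initialize from contents saved by a previous run, if provided.
        self.entries = json.loads(initial_data) if initial_data else {}
        self.hits = 0
        self.misses = 0

    def get_data(self):
        """Serialize the contents so the application can store them between runs."""
        return json.dumps(self.entries)


def compile_shader(source):
    # Placeholder for the expensive compilation step (hypothetical).
    return "compiled:" + source


# Fallback used when the caller does not pass a cache (assumption of this sketch).
DEFAULT_CACHE = PipelineCache()


def create_pipeline(shader_source, cache=None):
    cache = cache if cache is not None else DEFAULT_CACHE
    key = hashlib.sha1(shader_source.encode()).hexdigest()
    if key in cache.entries:
        cache.hits += 1
    else:
        cache.misses += 1
        cache.entries[key] = compile_shader(shader_source)
    return cache.entries[key]


# Reuse within one run: the second creation is a cache hit.
first_run = PipelineCache()
create_pipeline("void main() {}", first_run)
create_pipeline("void main() {}", first_run)

# Reuse across runs: save the contents, then pre-initialize a new cache.
saved = first_run.get_data()
second_run = PipelineCache(saved)
create_pipeline("void main() {}", second_run)

# No cache passed: the default cache is used instead.
create_pipeline("void main() {}")
```

In this model the second run never compiles the shader, which is the load-time benefit described above.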

Uniform/storage texel buffer

Uniform texel buffers define a tightly-packed 1-dimensional linear array of texels, with texels going through format conversion when read in a shader in the same way as they are for an image. They are mostly equivalent to OpenGL buffer textures, so you can see them as textures backed by a VkBuffer (through a VkBufferView). With uniform texel buffers you can only do a formatted load.

Storage texel buffers are the equivalent concept, but applied to images instead of textures. Unlike uniform texel buffers, they can also be written to in the same way as for storage images.
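As an illustration of what a formatted load and store mean, the following sketch treats a plain byte array as a tightly packed array of texels. It assumes an 8-bit-per-channel normalized RGBA layout; the function names are hypothetical and the code models the concept on the CPU, not how the hardware does it.

```python
import struct


def load_rgba8_unorm(buffer, index):
    """Formatted load: read texel `index` and convert each byte to a float in [0, 1]."""
    r, g, b, a = struct.unpack_from("4B", buffer, index * 4)
    return tuple(channel / 255.0 for channel in (r, g, b, a))


def store_rgba8_unorm(buffer, index, color):
    """Formatted store: clamp to [0, 1], convert to bytes and write texel `index`."""
    texel = [round(max(0.0, min(1.0, channel)) * 255) for channel in color]
    struct.pack_into("4B", buffer, index * 4, *texel)


# Two texels, 4 bytes each, stored back to back.
data = bytearray([255, 0, 0, 255, 0, 128, 0, 255])

first_texel = load_rgba8_unorm(data, 0)           # the uniform texel buffer case
store_rgba8_unorm(data, 1, (0.0, 0.0, 1.0, 1.0))  # the storage texel buffer case
second_texel = load_rgba8_unorm(data, 1)
```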


Multisampling

Multisampling is a technique that reduces aliasing artifacts on images by sampling pixel coverage at multiple subpixel locations and then averaging subpixel samples to produce a final color value for each pixel. We have already started working on this feature, and included some patches on the development branch, but it is still a work in progress. Having said that, it is enough to get Sascha Willems’s basic multisampling demo working:

Sascha Willems multisampling demo run on rpi4
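The averaging step mentioned above can be sketched in a few lines of Python. This is only a model of the resolve arithmetic for one pixel; the function name and the 4-sample setup are assumptions for the example.

```python
def resolve_pixel(samples):
    """Average the per-sample colors of one pixel into its final color."""
    count = len(samples)
    return tuple(sum(channel) / count for channel in zip(*samples))


# A pixel on a triangle edge: 3 of its 4 samples are covered by a red
# triangle, the remaining one still holds the black background.
edge_pixel = [(1.0, 0.0, 0.0)] * 3 + [(0.0, 0.0, 0.0)]
resolved = resolve_pixel(edge_pixel)
```

The edge pixel ends up with 75% red, which is what softens the staircase effect along the edge.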


Again, in addition to work on specific features, we also spent some time fixing specific driver bugs, using failing Vulkan CTS tests as reference. So let’s take a look at some screenshots of Sascha Willems’s demos that are now working:

Sascha Willems deferred demo run on rpi4

Sascha Willems texture array demo run on rpi4

Sascha Willems Compute N-Body demo run on rpi4


We plan to work on supporting the following features next:

  • Robust access
  • Multisample (finish it)

Previous updates

Just in case you missed any of the updates of the Vulkan driver so far:

Vulkan raspberry pi first triangle
Vulkan update now with added source code
v3dv status update 2020-07-01
V3DV Vulkan driver update: VkQuake1-3 now working

by infapi00 at July 31, 2020 08:23 AM

July 27, 2020

Jacobo Aragunde

The trip of a key press event in Chromium accessibility

It’s amazing to think about how much computing goes into something as simple as a keystroke that we just take for granted. Recently, I was fixing a bug related to accessibility key events, and to do this, first I had to understand the complex trip that these events take when they arrive at the browser – from the X server until they reach the accessibility system.

Let me start from the beginning. I’m working on the accessibility of the Chromium browser on Linux. The bug was #1042864: key strokes happening on native dialogs, like open and save dialogs, were not reported to the screen reader. The issue also affects Electron-based software; one important example is Visual Studio Code.

A cake with many layers

Linux accessibility is composed of many layers, and this fact stands out when working on a complex piece of software like Chromium which comes with its own UI toolkit.

Currently, the most straightforward way to build accessible applications for the Linux desktop is using and implementing the hooks provided by the ATK (Accessibility Toolkit) API. GTK+ widgets already do so, and applications using them will be accessible by default, but more complex software that implements custom widgets or its own toolkit will need to provide the code for the ATK entry points and emit the corresponding events. That’s the case of Chromium!

The screen reader or any assistive technology (AT) has to get information and listen for events happening in any software running on the system, to transform them into something meaningful for its users, like speech or braille.

AT-SPI is the glue between these two ends: it runs at system level, observing what’s going on with applications and keeping a global registry of the accessible objects; it receives updates from applications, which translate local ATK objects into global AT-SPI objects; and it receives queries from ATs, then pings them when events of interest to them happen. It uses D-Bus for inter-process communication (IPC).

You can learn more about accessibility, in general and also in the web platform, in this great interview with my colleague Martin Robinson.

The trip of a keypress event

Let’s say we are building an AT in Python, making use of the pyatspi2 library. We want to listen for keypress events, so we run registerKeystrokeListener to register a callback function.
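A minimal sketch of such an AT could look like the following. It assumes pyatspi2 is installed and an AT-SPI session is running; the helper names describe_key_event, on_key and main are hypothetical, and the event attributes used (event_string, hw_code, modifiers) are the ones I expect on pyatspi key events, so treat them as assumptions to check against your installed version.

```python
def describe_key_event(event):
    """Format the fields of a key event that this sketch cares about."""
    return f"key={event.event_string!r} code={event.hw_code} modifiers={event.modifiers}"


def on_key(event):
    """Callback run for every key press; returning False leaves the event unconsumed."""
    print(describe_key_event(event))
    return False


def main():
    # Imported here so the helpers above can be used without AT-SPI available.
    import pyatspi

    pyatspi.Registry.registerKeystrokeListener(
        on_key,
        mask=pyatspi.allModifiers(),
        kind=(pyatspi.KEY_PRESSED_EVENT,),
    )
    # Blocks and dispatches events until the registry is stopped.
    pyatspi.Registry.start()
```

Calling main() from a desktop session should print one line per key press in accessible applications.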

Pyatspi2 actually wraps AT-SPI’s function atspi_register_keystroke_listener, which eventually calls the remote method RegisterKeystrokeListener via D-Bus. The actual D-Bus remote calls happen at dbind.c.

We have jumped from our AT to the AT-SPI service, via IPC. The DeviceEventController interface provides the remote method mentioned above, and the actual code implementing it is in impl_register_keystroke_listener. Then, the function is added to a list of listeners for key events, in spi_controller_register_device_listener; these listeners in the list will be notified when an event happens, in spi_controller_notify_keylisteners.

AT-SPI will sit there, waiting for the events from applications to arrive. They will come over through D-Bus, as they are from different processes: the entry point for any D-Bus message in the AT-SPI core is handle_dec_method_from_idle. One of the operations, NotifyListenersSync, will run impl_notify_listeners_sync which will later call the function spi_controller_notify_keylisteners we just mentioned, and run all the registered listeners.

Who will call the remote method NotifyListenersSync? Applications will have to do it, if they want to be accessible. They could implement this themselves, but they are likely using a wrapper library. In the case of GTK+, there is at-spi2-atk, which bridges ATK signals with the at-spi2-core D-Bus interfaces so applications don’t have to know them.

The bridge sports its own callback, named spi_atk_bridge_key_listener, and eventually calls the NotifyListenersSync method via D-Bus, at Accessibility_DeviceEventController_NotifyListenersSync. The callback was registered as an ATK key event listener in spi_atk_register_event_listeners: the function atk_add_key_event_listener, which is part of the ATK API, registers the key event listener, and it internally makes use of the AtkUtil struct as defined in atkutil.h.

AtkUtil functions are not implemented by the ATK library; instead, toolkits must provide them. GTK+ does it in _gtk_accessibility_override_atk_util; in the case of Chromium, the functions that populate the AtkUtil struct are defined in atk_util_auralinux_class_init. The particular function that registers the key event listener in Chromium is AtkUtilAuraLinuxAddKeyEventListener: it gets added to a list, which will later be called when an Atk key event is processed in Chromium, at HandleAtkKeyEvent.
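The whole round trip can be summarized with a small model. The classes below are hypothetical stand-ins, not the real AT-SPI or Chromium code: DeviceEventController plays the role of the AT-SPI service, ToolkitBridge plays the role of the application side (toolkit plus ATK bridge), and the D-Bus hops are replaced by direct method calls.

```python
class DeviceEventController:
    """Stand-in for the AT-SPI service: keeps key listeners and notifies them."""

    def __init__(self):
        self.key_listeners = []

    def register_keystroke_listener(self, listener):
        # Reached by the AT through RegisterKeystrokeListener in the real stack.
        self.key_listeners.append(listener)

    def notify_listeners_sync(self, key_event):
        # Run every registered listener and report whether any consumed the event.
        results = [listener(key_event) for listener in self.key_listeners]
        return any(results)


class ToolkitBridge:
    """Stand-in for the application side that forwards its key events."""

    def __init__(self, controller):
        self.controller = controller

    def handle_key_event(self, key_event):
        # In the real stack this hop is the NotifyListenersSync D-Bus call.
        return self.controller.notify_listeners_sync(key_event)


received = []
controller = DeviceEventController()
# The AT registers a listener that records the event and does not consume it.
controller.register_keystroke_listener(lambda event: received.append(event) or False)

bridge = ToolkitBridge(controller)
consumed = bridge.handle_key_event("a")
```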

Are we there yet?

There are more pieces of software involved in this issue: Chromium will receive key press events from the X server, involving another IPC connection and API layer, and there’s all the browser code managing them, where the solution was actually implemented. I will cover those parts, and the actual solution, in a future post.

Meanwhile, I hope you enjoyed reading this, and happy hacking!

by Jacobo Aragunde Pérez at July 27, 2020 03:01 PM

July 23, 2020

Iago Toral

V3DV Vulkan driver update: VkQuake1-3 now working

A few weeks ago we shared an update regarding the status of the Vulkan driver for the Raspberry Pi 4. In that post I mentioned that the plan was to focus on completing the feature set for Vulkan 1.0 and then moving on to conformance and bugfixing work before attempting to run actual games and applications. A bit later my colleague Alejandro shared another update detailing some of our recent feature work.

We have been making good progress so far and at this point we are getting close to having a complete Vulkan 1.0 implementation. I believe the main pending features for that are pipeline caches, which Alejandro is currently working on, texel buffers, multisampling support and robust buffer access, so in the last few weeks I decided to take a break from feature development and try to get some Vulkan games running with our driver and use them to guide some initial performance work.

I decided to work with all 3 vkQuake games since they run on Linux, the source code is available (which makes things a lot easier to debug) and they seemed to be using a subset of the Vulkan API we already supported. For vkQuake we needed compute shaders and input attachments that we implemented recently, and for vkQuake3 we needed a couple of optional Vulkan features which I implemented recently to get it running without having to modify the game code. So all these games are now running on the Raspberry Pi 4 with the V3DV driver. At the same time, our friend Salva from Pi Labs has also been testing the PPSSPP emulator using Vulkan and reporting that some games seem to be working already, which has been great to hear.

I was particularly interested in getting vkQuake3 to work because the project includes both the Vulkan and the original OpenGL renderers, which was great for comparing performance between both APIs. VkQuake3 comes with a GL1 and a GL2 renderer, with the GL1 renderer being the fastest of the two by a large margin (apparently the GL2 renderer has additional rendering features that make it much slower). I think the Vulkan renderer is based on the GL1 renderer (although I have not actually checked) so I figured it would make the most reasonable comparison, and in our tests we found the Vulkan version to be up to 60% faster. Of course, it could be argued that GL1 is a pretty old API and that the difference with a more modern GL or GLES renderer might be less significant, but it is still a good sign.

To finish the post, here are some pics of the games:



vkQuake3 OpenGL 1 renderer

vkQuake3 Vulkan renderer

A couple of final notes:
* Note that the Vulkan renderer for vkQuake3 is much darker, but that is just how the renderer operates and not a driver issue. We observed the same behavior on Intel GPUs.
* A note for those interested in trying vkQuake3: we noticed that exterior levels have broken sky rendering. I hope we will get to fix that soon.

by Iago Toral at July 23, 2020 11:29 AM

July 13, 2020

Frédéric Wang

Igalia's contribution to the Mozilla project and Open Prioritization

Like many web platform developers and Firefox users, I believe Mozilla’s mission is instrumental for a better Internet. In a recent Igalia chat about the Web Ecosystem Health, participants made the usual observation regarding the important role played by Mozilla on the one hand and its limited development resources and Firefox’s small usage share on the other hand. In this blog post, I’d like to explain an experimental idea we are launching at Igalia to try and make browser development better match the interests of the web developer and user community.

Igalia’s contribution to browser repositories

As mentioned in the past in this blog, Igalia has contributed to different parts of Firefox such as multimedia (e.g. <video> support), layout (e.g. Stylo, WebRender, CSS, MathML), scripts (e.g. BigInt, WebAssembly) or accessibility (e.g. ARIA). But is it enough?

Although commit count is an imperfect metric, it is also one of the easiest to obtain. Let’s take a look at how Igalia’s commits to the repositories of the Chromium (chromium, v8), Mozilla (mozilla-central, servo, servo-web-render) and WebKit projects were distributed last year:

Pie chart showing the distribution of Igalia's contributions to browser repositories in 2019 (~5200 commits): Chromium (~73%), Mozilla (~4%) and WebKit (~23%).

As you can see, in absolute value Igalia contributed roughly 3/4 to Chromium, 1/4 to WebKit, with a small remaining amount to Mozilla. This is not surprising since Igalia is a consulting company and our work depends on the importance of browsers in the market, where Chromium dominates and WebKit is also well positioned on iOS devices and embedded systems.

This suggests a different way to measure our contribution by considering, for each project, the percentage relative to the total amount of commits:

Bar graph showing, for each project, the percentage of Igalia's commits in 2019 relative to the total amount of the project. From left to right: Chromium (~3.96%), Mozilla (~0.43%) and WebKit (~10.92%).

In the WebKit project, where ~80% of the contributions were made by Apple, Igalia was second with ~10% of the total. In the Chromium project, the huge Google team made more than 90% of the contributions and many more companies are involved, but Igalia was second with about 4% of the total. In the Mozilla project, Mozilla is also doing ~90% of the contributions but Igalia only had ~0.5% of the total. Interestingly, the second contributing organization was… the community of unidentified addresses! Of course, this shows the importance of volunteers in the Mozilla project, where a great effort is made to encourage participation.

Open Prioritization

From the commit count, it’s clear Igalia is not contributing as much to the Mozilla project as to the Chromium or WebKit projects. But this is expected and is just reflecting the priority set by large companies. The solid base of Firefox users as well as the large number of volunteer contributors show that the Mozilla project is nevertheless still attractive for many people. Could we turn this into browser development that is not funded by advertising or selling devices?

Another related question is whether the internet can really be shaped by the global community, as defended by Mozilla’s mission. Is the web doomed to be controlled by big corporations doing technology “evangelism” or lobbying at standardization committees? Are there prioritization issues that can be addressed by moving to a more collective decision process?

At Igalia, we internally try and follow a more democratic organization and, at our level, intend to make the world a better place. Today, we are launching a new Open Prioritization experiment to verify whether crowdfunding could be a way to influence how browser development is prioritized. Below is a short (5 min) introductory video:

I strongly recommend that you take a look at the proposed projects and read the FAQ to understand how this is going to work. But remember this is an experiment, so we are starting with a few ideas that we selected and tasks that are relatively small. We know there are tons of user reports in bug trackers and suggestions of standards, but we are not going to solve everything in one day!

If the process is successful, we can consider generalizing this approach, but we need to test it first, check what works and what doesn’t, consider whether it is worth pursuing, analyze how it can be improved, etc.

Two Crowdfunding Tasks for Firefox

Representation of the CIELAB color space (top view) by Holger Everding, under CC-SA 4.0.

As explained in the previous paragraph, we are starting with small tasks. For Firefox, we selected the following ones:

  • CSS lab() colors. This is about giving web developers a way to express colors using the CIELAB color space, which better approximates human perception. My colleague Brian Kardell wrote a blog post with more details. Some investigations have been made by Apple and Google. Let’s see what we can do for Firefox!

  • SVG path d attribute. This is about expressing SVG paths using the corresponding CSS syntax, for example <path style="d: path('M0,0 L10,10,...')">. This will likely involve a refactoring to use the same parser for both SVG and CSS paths. It’s a small feature but part of a more general convergence effort between SVG and CSS that Igalia has been involved in.


Is this crowd-funded experiment going to work? Can this approach solve the prioritization problems or at least help a bit? How can we improve that idea in the future?…

There are many open questions but we will only be able to answer them if we have enough people participating. I’ll personally pledge for the two Firefox projects and I invite you to at least take a look and decide whether there is something there that is interesting for you. Let’s try and see!

July 13, 2020 10:45 AM

Oriol Brufau

Open Prioritization for implementing selector list argument of :not() in Chrome

CSS :not() selector

As you probably know, CSS selectors are patterns that can be used to apply a group of style declarations to multiple elements. For example, if you want to select all elements with the class foo, you can use the selector .foo or [class~=foo]. But sometimes, instead of styling the elements that have a specific characteristic, we want to exclude these and style the other ones.

For example, some developers prefer the border box sizing model rather than the content box one, so they use

* { box-sizing: border-box }

But then, imagine there is an image like

<img src="img.png" width="100" height="50" style="padding: 5px" />

The size attributes will include the padding, so the resulting content area will be 90x40. Not only will the image be downscaled, possibly producing artifacts, but it will also be stretched: the aspect ratio becomes 2.25 instead of the natural 2.
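The arithmetic behind those numbers can be checked with a short sketch. The function name is hypothetical, and it assumes the same padding on all four sides and no border.

```python
def content_box(width, height, padding, box_sizing):
    """Return the size of the content area for the given sizing model."""
    if box_sizing == "border-box":
        # The specified size includes the padding on both sides of each axis.
        return width - 2 * padding, height - 2 * padding
    # content-box: the specified size is the content area itself.
    return width, height


border_w, border_h = content_box(100, 50, 5, "border-box")
content_w, content_h = content_box(100, 50, 5, "content-box")

border_ratio = border_w / border_h      # aspect ratio with border-box
content_ratio = content_w / content_h   # natural aspect ratio
```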

The above shows that excluding images can be a good idea. In CSS2, the solution was simply using a more specific selector to undo the general one:

* { box-sizing: border-box }
img { box-sizing: content-box }

However, in more complex cases this can get tedious, and the reset value may not be that obvious. Therefore, the Selectors Level 3 specification introduced the :not() pseudo-class. It accepts a selector argument, and matches all elements that do not match the argument. The example above can then be

:not(img) { box-sizing: border-box }

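There were some limitations though. The selector argument had to be a simple selector, that is, one of:

  • Type selector
  • Universal selector
  • Attribute selector
  • Class selector
  • ID selector
  • Pseudo-class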

Additionally, nesting negations like :not(:not(...)) was also invalid.

:not() with selector list argument

Selectors level 4 has made :not() more flexible: it now accepts any selector list as the argument.

In particular, it allows a compound selector argument, like :not(ul[reversed]), which matches all elements except the ones that both are ul and have the attribute reversed.

Another example of what you can do in level 4 is using a complex selector, that is, a sequence of compound selectors separated by combinators. For instance, :not(div > p) matches all elements except the p elements which have a div parent.

And finally, an example with a selector list could be :not(div, p), matching all elements except the ones that are either a div or a p.

Is that completely new behavior? Well, using De Morgan’s laws, the selectors above can be transformed into others which were valid in level 3:

Level 4 new syntax    Level 3 alternative
:not(ul[reversed])    :not(ul), :not([reversed])
:not(div > p)         :not(p), :not(div) > p, p:root
:not(div, p)          :not(div):not(p)
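These equivalences can be checked mechanically on a small model. In the sketch below an element is reduced to a hypothetical tuple (tag, has the reversed attribute, tag of the parent or None for the root), and each selector is hand-translated into a Python predicate; this is an assumption-laden model, not a selector engine.

```python
from itertools import product

TAGS = ["ul", "div", "p", "span"]

# Every combination of tag, reversed attribute and parent (None means root).
ELEMENTS = list(product(TAGS, [False, True], TAGS + [None]))


def level4_compound(tag, is_reversed, parent):
    """:not(ul[reversed])"""
    return not (tag == "ul" and is_reversed)


def level3_compound(tag, is_reversed, parent):
    """:not(ul), :not([reversed])"""
    return tag != "ul" or not is_reversed


def level4_complex(tag, is_reversed, parent):
    """:not(div > p)"""
    return not (tag == "p" and parent == "div")


def level3_complex(tag, is_reversed, parent):
    """:not(p), :not(div) > p, p:root"""
    not_p = tag != "p"
    p_with_non_div_parent = tag == "p" and parent is not None and parent != "div"
    p_root = tag == "p" and parent is None
    return not_p or p_with_non_div_parent or p_root


def level4_list(tag, is_reversed, parent):
    """:not(div, p)"""
    return tag not in ("div", "p")


def level3_list(tag, is_reversed, parent):
    """:not(div):not(p)"""
    return tag != "div" and tag != "p"


def same_matches(first, second):
    """True if both predicates match exactly the same elements of the model."""
    return all(first(*element) == second(*element) for element in ELEMENTS)
```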

However, the specificities are not completely equivalent. The specificity of a selector is a triple of 3 natural numbers (A,B,C), where in simple cases A is the number of ID selectors in the whole selector, B is the number of attribute or class selectors and pseudo-classes, and C is the number of type selectors and pseudo-elements. The specificity is one of the criteria used to solve conflicts when different CSS rules set the same property on the same element: the winning declaration will be the one with the most specific selector, when comparing specificities in lexicographical order.

The specificity of a :not() pseudo-class is the greatest among the specificities of the complex selectors in the argument, while the specificity of a selector list is the greatest among the complex selectors that match.

For example, for :not(ul[reversed]), the specificity is the same as for ul[reversed], i.e. (0,1,1). However, the specificity of :not(ul), :not([reversed]) can either be:

  • If only :not(ul) matches: (0,0,1).
  • If only :not([reversed]) matches: (0,1,0).
  • If both match, the maximum: (0,1,0).
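The three cases can be reproduced with a short sketch. Specificities are modeled as Python tuples, which already compare in lexicographical order; the helper names are hypothetical and only the simple cases described above are covered.

```python
UL = (0, 0, 1)             # type selector ul
REVERSED_ATTR = (0, 1, 0)  # attribute selector [reversed]


def add(first, second):
    """Specificity of a compound selector: component-wise sum of its parts."""
    return tuple(a + b for a, b in zip(first, second))


def not_specificity(argument_specificities):
    """Specificity of :not(): the greatest among its argument's complex selectors."""
    return max(argument_specificities)


# Level 4: :not(ul[reversed])
level4 = not_specificity([add(UL, REVERSED_ATTR)])


def level3(not_ul_matches, not_reversed_matches):
    """Level 3 list :not(ul), :not([reversed]): greatest among the branches that match.

    At least one branch must match, otherwise the list does not match at all.
    """
    candidates = []
    if not_ul_matches:
        candidates.append(not_specificity([UL]))
    if not_reversed_matches:
        candidates.append(not_specificity([REVERSED_ATTR]))
    return max(candidates)
```

So in this model the level 4 selector always carries (0,1,1), while the level 3 alternative never exceeds (0,1,0).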

Additionally, the level 3 alternatives can become cumbersome when used as part of a bigger selector. For instance, :not(.a1.a2) :not(.b1.b2) :not(.c1.c2) would become

:not(.a1) :not(.b1) :not(.c1), :not(.a1) :not(.b1) :not(.c2),
:not(.a1) :not(.b2) :not(.c1), :not(.a1) :not(.b2) :not(.c2),
:not(.a2) :not(.b1) :not(.c1), :not(.a2) :not(.b1) :not(.c2),
:not(.a2) :not(.b2) :not(.c1), :not(.a2) :not(.b2) :not(.c2)

Note the combinatorial explosion! It could be avoided using :is() or :where(), but they are also new level 4 additions.
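To see how quickly this grows, the expansion can be generated programmatically. This is a sketch with a hypothetical helper that only handles the shape used in the example: a chain of :not() with compound class selectors joined by descendant combinators.

```python
from itertools import product


def expand_to_level3(compounds):
    """Expand :not(.x1.x2) :not(.y1.y2) ... into the equivalent level 3 list.

    `compounds` holds one list of class names per :not() in the chain.
    By De Morgan, each :not() contributes one class per alternative.
    """
    return [
        " ".join(f":not(.{name})" for name in choice)
        for choice in product(*compounds)
    ]


selectors = expand_to_level3([["a1", "a2"], ["b1", "b2"], ["c1", "c2"]])
selector_list = ",\n".join(selectors)
```

With n chained :not() of k classes each, the list has k to the power of n entries: 8 here, and 81 for four negations of three classes each.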

Moreover, not all :not() selectors have a finite alternative in level 3. Consider :not(div p), that is, all elements which either are not p or don’t have any div ancestor. The problem is that we can’t directly enforce a constraint over all ancestors, instead we need something like

:not(p), p:root,
:root:not(div) > p,
:root:not(div) > :not(div) > p,
:root:not(div) > :not(div) > :not(div) > p,
/* ... */

and if a priori the DOM tree can be arbitrarily deep, we don’t know when to end the selector.
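A sketch that generates the alternative up to a chosen depth makes the problem visible: the output keeps growing with the depth we decide to support. The function name and the max_depth parameter are hypothetical.

```python
def not_div_p_alternative(max_depth):
    """Level 3 approximation of :not(div p) for p elements with up to max_depth ancestors."""
    parts = [":not(p)", "p:root"]
    for depth in range(1, max_depth + 1):
        # One :not(div) per ancestor, starting at the root, then the p itself.
        chain = [":root:not(div)"] + [":not(div)"] * (depth - 1) + ["p"]
        parts.append(" > ".join(chain))
    return parts


approximation = not_div_p_alternative(3)
```

Any p nested deeper than max_depth is not covered by the generated list, so no finite value is correct for every document.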

It’s for all these reasons that allowing selector lists in :not() is such a nice addition!

Browser support and Igalia’s Open Prioritization

If the new capabilities of :not() with a selector list sound cool to you, you might want to start using it right now! However, while major browsers have supported the level 3 version for a long time, only WebKit supports the level 4 one; Chrome and Firefox don’t.

But here is where Igalia comes into play! We are happy to announce Open Prioritization, an experiment for crowdfunding web platform features.

Historically, which features are implemented sooner and which ones are delayed has been decided by browser vendors, and big companies that can fund the work. But wouldn’t it be great if web developers could also have a say in this prioritization?

At Igalia we have selected a few tasks that we think the community might be interested in. During a first stage, anybody can pledge their desired amount of money for their preferred choices. The first feature that reaches the goal will be selected for a 2nd stage, in which the actual funding will happen. And once funded, Igalia will do the implementation work. Note: don’t worry if you pledge for an option which is not selected; your money is only deducted at the funding stage. See the FAQ for more details.

One of the features that we offer is precisely implementing :not() with selector lists in Chrome. If you like the idea, I invite you to pledge here!

Open Prioritization by Igalia. An experiment in crowd-funding prioritization.

by Oriol Brufau at July 13, 2020 10:45 AM

Brian Kardell

Open Prioritization and Advocacy

I love the Web, and I want it to thrive. It is an incredible open commons that makes so much of modern life possible, and better. If you're reading this, chances are pretty good that the web has even enabled your career. Mine too! But there's also pretty good odds that you think it could be better. I agree! Let me tell you about an experiment at visibly making it better, together.

The web is 30 years old. In that time we've learned a lot about the problem space involved. I write a lot about the challenges we've faced, and am a regular advocate for the idea that we can do better. I believe that when we lean on pragmatic answers and include more people as directly as possible, we all win. The commons wins. That's why I am so excited about our new experiment...

Open Prioritization, by Igalia: An experiment in crowdfunding prioritization

If you're a fan of The ShopTalk show, I first publicly discussed the idea there back in April. But, as you might have noticed, the world has been full of snags in 2020 and things got a little delayed. But finally, here we are!

There will be lots of posts describing what it's about; it's a big idea. But, in a very few words: Here are six concrete web features that we can crowdfund together toward advancing actual work. Here is a means of having a tangible voice in how we advance this commons that we all share. Below is a short (5 min) video that describes it and answers questions submitted by some friends we talked to about it early...

More details are available on the Open Prioritization page, and that's where you can pledge too. But before you pledge anything, I'd like to offer some thoughts on it.

I think that we'll see that prioritization is hard. I think we'll see that we probably don't all automatically agree on which thing is most worthy of prioritization.

And that's ok... In fact, it's really interesting.

Which of the 6 features is best to support...

Everything is best. It depends who you ask. Part of the purpose of this experiment is to show that groups can find new ways to advocate and work together to decide to do concrete things. So let me attempt to do just that and get the ball rolling...

Let me tell you how I prioritized, and why - and advocate for the ones I think are the best choices. You can judge for yourself whether that makes sense and make your own decisions, or share your own advocacy.

First off, note that you have a lot of expressive power here: I pledged something to everything because I really believe in the idea. I think that changing the status quo and advancing any of these in a new way is a positive result. What I did is vary how much I helped each one along the way to its goal. My minimum was $5, my max was $40, but that could just as well be $1 or $5. If you pledge $1, that is $1 worth of funding of the commons that didn't exist before and lots of $1's can get it done too.

The implementation projects I pledged less to...
  • Selector list support for :not() in Chromium

    I pledged $5 (1/9600th of the goal) to supporting selector lists in Chromium. Chrome already spends way more than anyone else, and is by far the most dominant browser. Generally speaking, I am more interested in helping to level things out. Almost as importantly to me: It's only a second implementation - it doesn't achieve universality in the way some others do. Those are more valuable to me. A second implementation is better than just a first, and it's a necessary step along the way, for sure. However, it still leaves me without for a while. In a very tangible way, the last implementation is worth a lot. Finally, while it offers something toward specificity issues, mostly, it's just sugar. Would I like it? Sure, but it feels like there's not a lot of "bang for the buck".

  • CSS lab() colors in Firefox

    I gave $5 toward CSS's lab colors in Firefox too (the same degree toward the goal as :not). This was a tough one, I'm not going to lie. It is potentially very valuable, but it is the kind of investment you only fully realize at the end of several stages. This is, in fact, one of the reasons it's been slow to get traction. We have to get this to get colors level 5. We want this in design tools. But the real value is only when we have all of the next steps, even across the platform (like in canvas too). It would also appear that this is not yet a "cross the finish line" investment.

  • CSS d (SVG path) support in Firefox

    I also pledged $5 toward supporting SVG (d) path in CSS in Firefox (but that is 1/2600th of the goal). It seems less important than others to me on a number of counts. It also doesn't help us cross a finish line: Once this is done, it still isn't in WebKit. That said, it is very good bang for the buck and I think we could get it done.

  • CSS contain in WebKit

    I pledged $10 toward supporting containment in WebKit (1/7100th of the goal). Again, this is tough. It is a last mile - only WebKit doesn't support it. However, the price tag is much bigger too, and my big challenge here is that I don't know what the payoff really looks like in practice. It can improve performance, which is great, but for me - it's extra. It can be used in conjunction with ResizeObserver for some pretty good approximations of container queries, which is great too. But it isn't 100% clear that it is a necessary element for container queries. In fact, in our proposal and experiments so far, it isn't necessary for a whole bunch of use cases. But... it's uncertain. If it turns out to be critical, and we haven't invested in it, it's only that much longer until we finally arrive. So... Sigh... That's a tough one.

My top choices: inert and focus-visible

This leads me to my top choices: :focus-visible (I pledged $40, or 1/875th of the goal) and inert ($40, or 1/1200th of the goal), both in WebKit.

Why these? Well, because I think they are the most valuable in a lot of ways. Of course, I'm biased: I worked on the design and championing of both of these features with my friends Alice Boxhall and Rob Dodson, including writing and popularizing polyfills and iterating on the details with practical feedback and experience under our belts. And now, here we are a few years later and these are last miles - only one browser doesn't have them implemented or isn't implementing yet. We can change that.

I did this work on my own time, as a developer, before I came to work for Igalia because there were real problems my teams were facing every day. Worse, they were right at the intersection of UX, DX and accessibility so the pains were very real. So, let me tell you about what they are and why I think they're worth giving that last push.


inert

There are a whole bunch of design patterns that require you to manage a whole bunch of other properties en masse: Make them unclickable, make their text unselectable, hide them from screen readers via ARIA, take them out of sequential focus temporarily. Probably the easiest example of this is when a dialog is open - the rest of the page is 'there' but kind of 'inert'. In fact, inert was added as a term to the HTML specification when <dialog> was spec'ed. But this isn't the only time when you want this behavior and, in fact, it is useful in a number of other use cases - but it's hard enough that even a lot of common UI libraries get it wrong.

inert exposes the ability to say "the stuff inside this element should be all of those things" in a simple way via a reflecting attribute - that is, you can set it in markup as an attribute, or via DOM property (which updates the attribute).

You can read more about it, including other use cases and details on the WICG explainer (the related pull request to HTML itself is pending).


Research shows that way too many people disable the focus indicator, at the cost of accessibility. Well, there's kind of a rational explanation for why this happens so much, and that is that CSS's :focus pseudo-class is just plain broken.

From very early days, browsers have used various heuristics to make the native focus ring valuable and, at the same time, unobtrusive. There are times when an element gets focus where showing the indicator just feels bad and there is no particularly good reason for it to be there. The use cases are not all the same: If you click on a password field in a busy environment, for example, it's very important that you understand that your typed characters will land in the obscured field -- so you will always get a focus indicator. If you click on a button, on the other hand, and it gets a focus indicator, that might be disorienting... But if you keyboard navigate to that button, then you need to see the indicator.

Unfortunately, designs frequently don't "work" with only the native indicator, and the trouble is that traditionally the only tool that designers have to style the focus indicator is :focus, and it doesn't care about any of that. The net result is that, unfortunately, attempting to use :focus to make the focus indicator work better for your site results in unfamiliar and disorienting UX for everyone. Given only bad choices, the answer that too many make is the easy one: Simply disable it.

:focus-visible, on the other hand, only matches if the indicator would natively be shown, taking into account all of the browser UX research and preventing this disconnect. It is, in our experience, provably easier to get right, and more successful.

What about you?

What about you, will you support something? Will you help advocate for the priority of something specific? I (and we at Igalia at large) would love to hear your thoughts - share them on social media with the hashtag #openprioritization so that we can try to collect them.

July 13, 2020 04:00 AM

July 12, 2020

Manuel Rego

Open Prioritization and CSS Containment

Igalia is a major contributor to all the open source web rendering engines (Blink, Gecko, Servo and WebKit). We have been making different kinds of contributions for years, which has led us to have an important position in the different communities. This allows us to help our customers to solve their problems through upstream contributions that also benefit the whole web community.

Implementing a feature in a rendering engine (or in several) might look very simple at first sight, but contributing it upstream can take a while depending on the standardization status, the related bugs, the browser architecture, and many other factors. You can find examples of things implemented by Igalia in the past on my previous blog posts, and you will realize how much work is behind some of those features.

There’s a common thing everywhere: people usually get really angry because that bug they reported years ago is still not fixed in a given browser. That can be for a variety of reasons, and not simply because the developers of that browser are very lazy and not paying attention to that particular bug. In many cases the answer to why that hasn’t been solved yet is pretty simple: priorities. Different companies and individuals contributing to the projects have their own interests and priorities; they prioritize the different issues and tasks and put the focus and effort on the ones that have a higher priority for them. A possible solution for that, now that major browsers are all open source, would be to look for a consulting company like Igalia that can fix that bug for you; but you as an individual, or even as a company, may not have the budget to make that happen.

What would happen if we allowed several parties to contribute together to the development of some features? That would make it possible for both individuals and organizations that don’t have the power to implement them alone to contribute their piece of the cake in order to add support for those features on the web platform.

Open Prioritization

Igalia is launching Open Prioritization, a crowd-funding campaign for the web platform. We believe this can open the door to many different people and organizations to prioritize the development of some features on the different web engines. Initially we have defined 6 tasks that can be found on the website, together with a FAQ explaining all the details of the campaign. 🚀

Let’s hope we can make this happen. If this is a success and some of these items get funded and implemented, probably there’ll be more in the future, including new things or ideas that you can share with us.

Open Prioritization by Igalia. An experiment in crowd-funding prioritization.

One of the tasks of the Open Prioritization campaign we’re starting this week is about adding CSS Containment support in WebKit, and we have experience working on that in Chromium.

Why CSS Containment in WebKit?

Briefly speaking, CSS Containment is a standard focused on improving the rendering performance of web pages. It allows authors to isolate DOM subtrees from the rest of the document, so any change that happens on the “contained” subtree doesn’t affect anything outside that element.

This is the spec behind the contain property, that can have a few values defining the “type of containment”: layout, paint, size and style. I’m not going to go deeper into this and I’ll refer to my introductory post or my CSSconf EU talk if you’re interested in getting more details about this specification.
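To make the property a little more concrete, here is a minimal sketch of applying it from script only when the engine reports support. The helper names applyContainment and defaultSupports are hypothetical; CSS.supports() and element.style.contain are the standard entry points this sketch relies on:

```javascript
// The four containment types named in the text.
const CONTAINMENT_TYPES = ['layout', 'paint', 'size', 'style'];

// Falls back to "unsupported" when there is no CSS global (e.g. outside a browser).
function defaultSupports(property, value) {
    return typeof CSS !== 'undefined' && CSS.supports(property, value);
}

// Hypothetical helper: builds a `contain` value from the requested types and
// applies it only when the engine supports it. Returns whether it was applied.
function applyContainment(element, types, supports = defaultSupports) {
    const unknown = types.filter(type => !CONTAINMENT_TYPES.includes(type));
    if (unknown.length > 0)
        throw new RangeError(`Unknown containment type: ${unknown.join(', ')}`);

    const value = types.join(' ');
    if (!supports('contain', value))
        return false;

    element.style.contain = value;
    return true;
}
```

The supports parameter exists only so the check can be swapped out in tests; in a page you would call applyContainment(element, ['layout', 'paint']) and let it use CSS.supports().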

So why do we think this is important? Currently we have an issue with CSS Containment: it’s supported in Chromium and Firefox (except style containment) but not in WebKit. This might not be a big deal as it’s a performance-oriented feature, so if you don’t have support you’ll simply have worse performance and that’s all. But that’s not completely true, as the different types of containment have some restrictions that apply to the contained element (e.g. layout containment makes the element become the containing block of positioned descendants), which might cause interoperability issues if you start to use the contain property in your websites.

The main goal of this task would be to add CSS Containment support in WebKit, at least to the level that it’s spec compliant with the other implementations, and if time permits to implement some optimizations based on it. Once we have interoperability you can start using it without any concern in your web pages, as the behavior won’t change between the different browsers and you might get some perf improvements (that will vary depending on each browser implementation).

In addition, this will allow WebKit to implement further optimizations thanks to the information that the web authors provide through the contain property. On top of that, this initial support is a requirement in order to implement new features that are based on it, like the new CSS properties content-visibility and contain-intrinsic-size, which are part of the Display Locking feature.

If you think this is an important feature for you, please go ahead and make your pledge so it can get prioritized and implemented in WebKit upstream.

Really looking forward to seeing how this Open Prioritization campaign goes in the coming weeks. 🤞

July 12, 2020 10:00 PM

July 10, 2020

Víctor Jáquez

New VA-API H.264 decoder in gst-plugins-bad

Recently, a new H.264 decoder, using VA-API, was merged in gst-plugins-bad.

Why another VA-based H.264 decoder if there is already gstreamer-vaapi?

As usual, an historical perspective may give some clues.

It started when Seungha Yang implemented the GStreamer decoders for Windows using DXVA2 and D3D11 APIs.

Perhaps we need to take one step back and explain what stateless decoders are.

Video decoders are magic and opaque boxes where we push encoded frames, and later we’ll pop full decoded frames in raw format. This is how OpenMAX and V4L2 decoders work, for example.

Internally we can imagine those magic and opaque boxes have two main operations:

  • Codec state handling
  • Signal processing like Fourier-related transformations (such as DCT), entropy coding, etc. (DSP, in general)

The codec state handling basically extracts, from the stream, the frame’s parameters and its compressed data, so the DSP algorithms can decode the frames. Codec state handling can be done with generic CPUs, while DSP algorithms are massively improved through specific purpose processors.

These video decoders are known as stateful decoders, and usually they are distributed through binary and closed blobs.

Soon, silicon vendors realized they could offload the burden of state handling to third-party user-space libraries, releasing what are known as stateless decoders. With them, your code not only has to push frames into the opaque box, but now it must also handle the codec specifics to provide all the parameters and references for each frame. VAAPI and DXVA2 are examples of those stateless decoders.

Returning to Seungha’s implementation, in order to get their DXVA2/D3D11 decoders, they also needed a state handler library for each codec. And Seungha wrote that library!

Initially they wanted to reuse the state handling in gstreamer-vaapi, which works pretty well, but its internal library, from the GStreamer perspective, is over-engineered: it is impossible to rip out only the state handling without importing all its data types. Which is kind of sad.

Later, Nicolas Dufresne realized that this library could be re-used by other GStreamer plugins, because more stateless decoders are now available, particularly V4L2 stateless, in which he is interested. Nicolas moved Seungha’s code into a library in gst-plugins-bad.

Currently, libgstcodecs provides state handling of H.264, H.265, VP8 and VP9.

Let’s return to our original question: Why another VA-based H.264 decoder if there is already one in gstreamer-vaapi?

The quick answer is «to pay my technical debt».

As we already mentioned, gstreamer-vaapi is big and over-engineered, though we have been simplifying the internal libraries; in particular, He Junyan has done a lot of work replacing the internal base class, GstVaapiObject, with GstObject or GstMiniObject. Also, this kind of project, where there’s a lot of untouched code, carries a lot of cargo cult decisions.

So I took the libgstcodecs opportunity to write a simple, thin and lean H.264 decoder, using new VA API calls (vaExportSurfaceHandle(), for example) and learning from other implementations, such as FFmpeg and ChromeOS. This exercise allowed me to identify where the dusty spots in gstreamer-vaapi are and how they should be fixed (and we have been doing it since then!).

Also, this opportunity led me to learn a bit more about the H.264 specification since I implemented the reference picture list handling, and fixed a small bug in Chromium.

Now, let me be crystal clear: GStreamer VA-API is not going anywhere. It is, right now, one of the most feature-complete implementations using VA-API, even with its integration issues, and we are working on them. In particular, Intel folks are working hard on a new AV1 decoder, enhancing encoders and adding new video post-processing features.

But, this new vah264dec is an experimental VA-API decoder, which aims towards a tight integration with GStreamer, oriented to provide a good experience in most of the common use cases and to enhance the common libgstcodecs library shared with other stateless decoders, looking to avoid Intel specific nuances.

These are the main characteristics and plans of this new decoder:

  • It uses, by default, a DRM connection to VA display, avoiding the troubles of choosing X11 or Wayland.
    • It uses the first found DRM device as VA display
    • In the future, users will be able to provide their custom VA display through the pipeline’s context.
  • It requires libva >= 1.6
  • No multiview/stereo profiles, nor interlaced streams, because libgstcodecs doesn’t handle them yet
  • It is incompatible with gstreamer-vaapi: mixing elements might lead to problems.
  • Even if memory:VAMemory is exposed, it is not handled yet by any other element.
    • Users will get VASurfaces via mapping as GstGL does with textures.
  • Caps templates are generated dynamically by querying VAAPI
  • YV12 and I420 are added for system memory caps because they seem to be supported by all the drivers when downloading frames onto main memory, as they are used by xvimagesink and others, avoiding color conversion.
  • Decoding surfaces aren’t bound to context, so they can grow beyond the DPB size, allowing smooth reverse playback.
  • There isn’t yet error handling and recovery.
  • The element is supposed to spawn if different renderD nodes with VA-API driver support are found (like gstv4l2), but it hasn’t been tested yet.

Now you may be asking how do I use vah264dec?

Currently vah264dec has NONE rank, which means that it will never be autoplugged, but you can use the trick of the environment variable GST_PLUGIN_FEATURE_RANK:

$ GST_PLUGIN_FEATURE_RANK=vah264dec:259 gst-play-1.0 ~/video.mp4

And that’s it!


by vjaquez at July 10, 2020 06:15 PM

July 08, 2020

Ricardo García

VK_EXT_extended_dynamic_state released for Vulkan

A few days ago, the VK_EXT_extended_dynamic_state extension for Vulkan was released and included for the first time as part of Vulkan 1.2.145. This is a pretty interesting extension that makes Vulkan pipelines more flexible and practical for many use cases. At Igalia, I was involved in getting this extension out the door as the author of its VK-GL-CTS tests and, in a very minor capacity, by reviewing the spec text and contributing a couple of small fixes to it.

Vulkan pipelines

The purpose of this Vulkan extension is to make Vulkan pipelines less rigid by allowing them to have certain values set dynamically when you use the pipeline instead of those values being set in stone when creating the pipeline. For those less familiar with Vulkan, Vulkan pipelines are one of the most “heavy” objects in the API. Vulkan typically has compute and graphics pipelines. For this extension, we’ll be talking about graphics pipelines. A pipeline object, when created, contains a lot of information about what the GPU needs to do when rendering a scene or part of a scene, like how triangle vertices need to be read from memory, the number of textures, buffers and images that will be used, parameters for color blending operations, depth and stencil tests, multisample antialiasing, viewports, etc.

Vulkan, being a low-overhead API that tries to help you squeeze as much performance as possible out of a GPU, wants you to specify all that information in advance so implementations (GPU plus driver) have higher chances of optimizing the process, both at pipeline creation time and at runtime. Every time you “bind a pipeline” (i.e. setting it as the active pipeline for future commands) you’re telling the implementation how everything should work, which is usually followed by commands telling the GPU to draw lots of geometry using the previous parameters.

Creating a pipeline may also involve compiling shaders to native GPU instructions. Shaders are “small” programs that run on the GPU when the rendering process reaches a programmable stage. When a GPU is drawing anything, the drawing process is divided in stages. Each stage takes a number of inputs both directly from the previous stage and as external resources (buffers, textures, etc), and produces a number of outputs to be directly consumed by the next stage or as side effects in external resources. Some of those stages are fixed and some are programmable with user-provided shader programs. When these shaders are not so small, compiling and optimizing them to native GPU instructions takes some time. Usually not a very long time, but every millisecond counts when you only have 16 of them to draw the next frame in order to achieve 60 frames per second. Stuff like this is what drove the creation of the ACO shader compiler for the Mesa RADV driver and it’s also why some drivers hash shader contents and use a shader cache to check if that exact shader has been compiled before. It’s also why Vulkan wants you to create pipelines in advance if possible. Otherwise, if you realize you need a new pipeline in the middle of preparing the next frame in an action game, the pipeline creation process may make the game stutter at that point due to the extra processing time needed.

Vulkan gives you several possibilities to alleviate the problem. You can create every pipeline you may need in advance. This is one of the most effective approaches but may involve a good number of pipelines due to the different possible combinations of pipeline parameters you may want to use. Say you want to vary 7 different parameters independently from each other with two possible values each. That means you have to create 128 different pipelines and manage them in your application. Another option is using a pipeline cache that will speed up creation of pipelines identical or similar to other ones created in the past. This lets you focus only on the pipeline variants you need at a given point in time. Finally, Vulkan gives you the possibility of changing a few pipeline parameters on the fly instead of giving them fixed values at pipeline creation time. This is the dynamic state inside the pipeline.
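The arithmetic in that paragraph can be checked with a short sketch that enumerates every combination of two-valued parameters. The parameter names are hypothetical and chosen only for illustration; nothing here calls into Vulkan:

```javascript
// Enumerate every on/off combination of N independent two-valued parameters.
function enumerateVariants(parameterNames) {
    const count = 2 ** parameterNames.length;
    const variants = [];
    for (let mask = 0; mask < count; mask++) {
        const variant = {};
        parameterNames.forEach((name, bit) => {
            // Bit `bit` of the mask decides the value of this parameter.
            variant[name] = Boolean(mask & (1 << bit));
        });
        variants.push(variant);
    }
    return variants;
}

// Hypothetical parameter names, for illustration only.
const params = ['cullMode', 'frontFace', 'depthTest', 'depthWrite',
                'stencilTest', 'blending', 'wireframe'];

const allStatic = enumerateVariants(params);
console.log(allStatic.length);  // 128

// If the first two parameters become dynamic state, only the
// remaining five still multiply the number of pipelines.
const twoDynamic = enumerateVariants(params.slice(2));
console.log(twoDynamic.length);  // 32
```

Every parameter moved out of the static set halves the number of pipeline objects the application has to create and manage.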

Dynamic state and VK_EXT_extended_dynamic_state

Dynamic state helps in addition to anything I mentioned before. It makes your application logic easier by not having to deal with so many different variations and reduces the total number of times you may have to create a new pipeline, which may decrease initialization time, pipeline cache sizes and access, state changes and game stuttering. VK_EXT_extended_dynamic_state, when available and as its name implies, extends the number of pipeline elements that can be part of that dynamic state. It adds states like the culling mode, front face, primitive topology, viewport with count, scissor with count (previously, viewports and scissors could be changed dynamically but not their counts), vertex input binding stride, depth test activation and writes, depth comparison operation, depth bounds activation and stencil test activation and operations. That’s a pretty large set of new dynamic elements.

The obvious question that follows is if using so many dynamic elements decreases performance, in the sense that it may reduce the optimization opportunities the implementation may have because some details about the pipeline are not known in advance. The answer is that this really depends on the implementation. For example, in some implementations the culling mode or front face may be set in a register before drawing operations and there’s no practical difference between setting it when the pipeline is bound to be used or dynamically before a large set of drawing commands are used.

I’ve measured the impact of enabling every new dynamic state in a simple GPU-bound Vulkan program that displays a rotating model on screen and I haven’t noticed any performance impact with the NVIDIA proprietary driver and a GTX 1070 card, but your mileage may vary. As usual, measure before deploying.

VK_EXT_extended_dynamic_state can also help when Vulkan is used as the backend to implement other higher level APIs which are not as rigid as Vulkan itself and in which some drawing parameters can be changed on the fly, being up to the driver to implement those changes as efficiently as possible. We’re talking about OpenGL, or DirectX up to version 11. As you can imagine, it’s an interesting extension for projects like DXVK and it can help improve the state of Linux gaming through Wine and Proton.

Origins of VK_EXT_extended_dynamic_state

The story about how this extension came to be is also interesting. It all started as a reaction to an “angry” tweet by Eric Lengyel in which he lamented that he had to create two separate pipelines just to change the front face or winding order of triangles when rendering a reflection. That prompted Piers Daniell from NVIDIA to start a multivendor effort inside Khronos that resulted in VK_EXT_extended_dynamic_state. As you can read in the extension summary, several companies were involved: AMD, Arm, Broadcom, Google, Imagination, Intel, NVIDIA, and Valve.

For that reason, this extension is also one of the many success stories from the Khronos Group, a forum in which hardware and software vendors, big and small, participate designing and standardizing cross-platform solutions for the graphics industry. Many different points of view are taken into account when designing those solutions. If you look at the member list you’ll see plenty of known logos from hardware manufacturers and software developers, including companies making widely available game engines.

In this case an angry tweet was enough to spark an effort, but that’s not the ideal situation. You can propose specification improvements, extensions or new ideas using the Vulkan Docs repository. An issue could be enough and, for small changes, a pull request can be even better.

July 08, 2020 09:02 PM

Mario Sanchez Prada

​Chromium now migrated to the new C++ Mojo types

At the end of the last year I wrote a long blog post summarizing the main work I was involved with as part of Igalia’s Chromium team. In it I mentioned that a big chunk of my time was spent working on the migration to the new C++ Mojo types across the entire codebase of Chromium, in the context of the Onion Soup 2.0 project.

For those of you who don’t know what Mojo is about, there is extensive information about it in Chromium’s documentation, but for the sake of this post, let’s simplify things and say that Mojo is a modern replacement to Chromium’s legacy IPC APIs which enables a better, simpler and more direct way of communication among all of Chromium’s different processes.

One interesting thing about this conversion is that, even though Mojo was already “the new thing” compared to Chromium’s legacy IPC APIs, the original Mojo API presented a few problems that could only be fixed with a newer API. This is the main reason that motivated this migration, since the new Mojo API fixed those issues by providing less confusing and less error-prone types, as well as additional checks that would force your code to be safer than before, and all this done in a binary compatible way. Please check out the Mojo Bindings Conversion Cheatsheet for more details on what exactly those conversions would be about.

Another interesting aspect of this conversion is that, unfortunately, it wouldn’t be as easy as running a “search & replace” operation since in most cases deeper changes would need to be done to make sure that the migration would break neither existing tests nor production code. This is the reason why we often had to write bigger refactorings than what one would have anticipated for some of those migrations, or why sometimes some patches took a bit longer to get landed as they would span way too much across multiple directories, making the merging process extra challenging.

Now combine all this with the fact that we were confronted with about 5000 instances of the old types in the Chromium codebase when we started, spanning across nearly every single subdirectory of the project, and you’ll probably understand why this was a massive feat that would take quite some time to tackle.

Turns out, though, that after just 6 months since we started working on this and more than 1100 patches landed upstream, our team managed to have nearly all the existing uses of the old APIs migrated to the new ones, reaching a point where, by the end of December 2019, we had completed 99.21% of the entire migration! That is, we basically had almost everything migrated back then and the only part we were missing was the migration of //components/arc, as I already announced in this blog back in December and in the chromium-mojo mailing list.

Progress of migrations to the new Mojo syntax by December 2019

This was good news indeed. But the fact that we didn’t manage to reach 100% was still a bit of a pain point because, as Kentaro Hara mentioned in the chromium-mojo mailing list yesterday, “finishing 100% is very important because refactoring projects that started but didn’t finish leave a lot of tech debt in the code base”. And surely we didn’t want to leave the project unfinished, so we kept collaborating with the Chromium community in order to finish the job.

The main problem with //components/arc was that, as explained in the bug where we tracked that particular subtask, we couldn’t migrate it yet because the external libchrome repository was still relying on the old types! Thus, even though almost nothing else in Chromium was using them at that point, migrating those .mojom files under //components/arc to the new types would basically break libchrome, which wouldn’t have a recent enough version of Mojo to understand them (and no, according to the people collaborating with us on this effort at that particular moment, getting Mojo updated to a new version in libchrome was not really a possibility).

So, in order to fix this situation, we collaborated closely with the people maintaining the libchrome repository (which is external to Chromium’s repository and was still relying on the old Mojo types) to get the remaining migration, inside //components/arc, unblocked. And after a few months doing some small changes here and there to provide the libchrome folks with the tools they’d need to allow them to proceed with the migration, they could finally integrate the necessary changes that would ultimately allow us to complete the task.

Once this important piece of the puzzle was in place, all that was left was for my colleague Abhijeet to land the CL that would migrate most of //components/arc to the new types (a CL which had been put on hold for about 6 months!), and then to land a few CLs more on top to make sure we did get rid of any trace of old types that might still be in the codebase (special kudos to my colleague Gyuyoung, who wrote most of those final CLs).

Progress of migrations to the new Mojo syntax by July 2020

After all this effort, which would sit on top of all the amazing work that my team had already done in the second half of 2019, we finally reached the point where we are today, when we can proudly and loudly announce that the migration of the old C++ Mojo types to the new ones is finally complete! Please feel free to check out the details on the spreadsheet tracking this effort.

So please join me in celebrating this important milestone for the Chromium project and enjoy the new codebase free of the old Mojo types. It’s been difficult but it definitely pays off to see it completed, something which wouldn’t have been possible without all the people who contributed along the way with comments, patches, reviews and any other type of feedback. Thank you all! 👌 🍻

Last, while the main topic of this post is to celebrate the unblocking of these last migrations we had left since December 2019, I’d like to finish acknowledging the work of all my colleagues from Igalia who worked along with me on this task since we started, one year ago. That is, Abhijeet, Antonio, Gyuyoung, Henrique, Julie and Shin.

Now if you’ll excuse me, we need to get back to working on the Onion Soup 2.0 project because we’re not done yet: at the moment we’re mostly focused on converting remote calls using Chromium’s legacy IPC to Mojo (see the status report by Dave Tapuska) and helping finish Onion Soup’ing the remaining directories under //content/renderer (see the status report by Kentaro Hara), so there’s no time to waste. But those migrations will be material for another post, of course.

by mario at July 08, 2020 08:55 AM

Philip Chimento

Sculpture of one of Salvador Dalí's melting pocket watches draped over a tree branch.

It’s been a long time since I last blogged. In the interim I started a new job at Igalia as a JavaScript Engine Developer on the compilers team, and attended FOSDEM in Brussels several million years ago in early February back when “getting on a plane and traveling to a different country” was still a reasonable thing to do.

In this blog post I would like to present Temporal, a proposal to add modern and comprehensive handling of dates and times to the JavaScript language. This has been the project I’m working on at Igalia, as sponsored by Bloomberg. I’ve been working on it for the last 6 months, joining several of my coworkers in a cross-company group of talented people who have already been working on it for several years.

This is the kind of timekeeping you get with the old JavaScript Date… (Public domain photograph by Julo)

I already collaborated on a blog post about Temporal, “Dates and Times in JavaScript”, so I won’t repeat all that here, but all the explanation you really need is that Temporal is a modern replacement for the Date object in JavaScript, which is terrible. You may also want to read “Fixing JavaScript Date”, a two-part series providing further background, by Maggie Pint, one of the originators of Temporal.

How Temporal can be useful in GNOME

I’m aware that this blog is mostly read by the GNOME community. That’s why in this blog post I want to talk especially about how a large piece of desktop software like GNOME is affected by JavaScript Date being so terrible.

Of course most improvements to the JavaScript language are driven by the needs of the web.1 But a few months ago this merge request caught my eye, fixing a bug that made the date displayed in GNOME wrong by a full 1,900 years! The difference between Date.getYear() not doing what you expect (and Date.getFullYear() doing it instead) is one of the really awful parts of JavaScript Date. In this case if there had been a better API without evil traps, the mistake might not have been made in the first place, and it wouldn’t have come down to a last-minute code freeze break.
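For anyone who hasn’t run into this trap, a small sketch of the difference (the date itself is arbitrary):

```javascript
// Months are zero-based in the Date constructor, so 6 is July.
const date = new Date(2020, 6, 8);

// getYear() is a legacy method that returns the year minus 1900.
console.log(date.getYear());      // 120

// getFullYear() returns the year you actually expect.
console.log(date.getFullYear());  // 2020

// The gap between the two is exactly the 1,900 years mentioned above.
console.log(date.getFullYear() - date.getYear());  // 1900
```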

In the group working on the Temporal proposal we are seeking feedback from people who are willing to try out the Temporal API, so that we can find out if there are any parts that don’t meet people’s needs and change them before we try to move the proposal to Stage 3 of the TC39 process. Since I think GNOME Shell and GNOME Weather, and possibly other apps, might benefit from using this API when it becomes part of JavaScript in the future, I’d be interested in finding out what we in the GNOME community need from the Temporal API.

It seems to me the best way to do this would be to make a port of GNOME Shell and/or GNOME Weather to the experimental Temporal API, and see what issues come up. Unfortunately, it would defeat the purpose for me to do this myself, since I am already overly familiar with Temporal and by now its shortcomings are squarely in my blind spot! So instead I’ll offer my help and guidance to anyone who wants to try this out. Please get in touch with me if you are interested.

How to try it out

Since Temporal is of course not yet a built-in object in JavaScript, to try it out we will need to import a polyfill. We have published a polyfill which is experimental only, for the purpose of trying out the API and integrating it with existing code. Here’s a link to the API documentation.

The polyfill is primarily published as an NPM library, but we can get it to work with GJS quite easily. Here’s how I did it.

First I cloned the tc39/proposal-temporal repo, and ran npm install and npm run build in it. This generates a file called polyfill/script.js which you can copy into your code, into a place in your imports path so that the importer can find it. Then you can import Temporal:

const {Temporal} = imports.temporal.temporal;

Note that the API is not stable, so only use this to try out the API and give feedback! Don’t actually include it in your code. We have every intention of changing the API, maybe even drastically, based on feedback that we receive.

Once you have tried it out, the easiest way to tell us about your findings is to complete the survey, but do also open an issue in the bug tracker if you have something specific.

Intl, or how to stop doing _("%B %-d %Y")

While I was browsing through GNOME Shell bug reports to find ones related to JavaScript Date, I found several such as gnome-shell#2293 where the translated format strings lag behind the release while translators figure out how to translate cryptic strings such as "%B %-d %Y" for their locales. By doing our own translations, we are actually creating the conditions to receive these kinds of bug reports in the first place. Translations for these kinds of formats that respect the formatting rules for each locale are already built into JavaScript engines nowadays, in the Intl API via libicu, and we could take advantage of these translations to take some pressure off of our translators.

In fact, we could do this right now already, no need to wait for the Temporal proposal to be adopted into JavaScript and subsequently make it into GJS. We already have everything we need in GNOME 3.36. With Intl, the function I linked above would become:

_updateTitle() {
    const locale = getCachedLocale();
    const timeSpanDay = GLib.TIME_SPAN_DAY / 1000;
    const now = new Date();
    const rtf = new Intl.RelativeTimeFormat(locale, {numeric: 'auto'});

    if (this._startDate <= now && now <= this._endDate)
        this._title.text = rtf.format(0, 'day');
    else if (this._endDate < now && now - this._endDate < timeSpanDay)
        this._title.text = rtf.format(-1, 'day');
    else if (this._startDate > now && this._startDate - now < timeSpanDay)
        this._title.text = rtf.format(1, 'day');
    else if (this._startDate.getFullYear() === now.getFullYear())
        this._title.text = this._startDate.toLocaleString(locale, {month: 'long', day: 'numeric'});
    else
        this._title.text = this._startDate.toLocaleString(locale, {year: 'numeric', month: 'long', day: 'numeric'});
}

(Note, this presumes a function getCachedLocale() which determines the correct locale for Intl by looking at the LC_TIME, LC_ALL, etc. environment variables. If GNOME apps wanted to move to Intl generally, I think it might be worth adding such a function to GJS’s Gettext module.)
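
As a rough idea of what such a helper might do, here is a minimal sketch. The name localeFromEnvironment() and the fallback to 'en-US' are assumptions made for illustration, not an existing GJS API. The environment is passed in as a plain object so the lookup could be fed from GLib.getenv() in GJS.

```javascript
// Hypothetical helper, not part of GJS: picks the POSIX locale that applies
// to date and time formatting and converts it to a BCP 47 tag for Intl.
function localeFromEnvironment(env) {
    // POSIX precedence: LC_ALL overrides LC_TIME, which overrides LANG
    const posixLocale = env.LC_ALL || env.LC_TIME || env.LANG || 'C';

    // Drop the codeset and modifier: 'nl_NL.UTF-8@euro' -> 'nl_NL'.
    // (A real implementation would map some modifiers to script subtags.)
    const [base] = posixLocale.split(/[.@]/);

    // The C and POSIX locales have no BCP 47 equivalent. Falling back to
    // US English is an assumption of this sketch.
    if (base === 'C' || base === 'POSIX' || base === '')
        return 'en-US';

    return base.replace('_', '-');
}

const exampleLocale = localeFromEnvironment({LC_TIME: 'en_GB.UTF-8', LANG: 'en_US.UTF-8'});
const exampleFormat = new Intl.RelativeTimeFormat(exampleLocale, {numeric: 'auto'});
console.log(exampleLocale, exampleFormat.format(-1, 'day'));
```

A cached variant would simply compute this once at startup and return the stored value afterwards.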

Whereas in the future with Temporal, it would be even simpler and clearer, and I couldn’t resist rewriting that method! We wouldn’t need to store a start Date at 00:00 and end Date at 23:59.999 which is really just a workaround for the fact that we are talking here about a date without a time component, that is purely a calendar day. Temporal covers this use case out of the box:

_updateTitle() {
    const locale = getCachedLocale();
    const today =;

    let {days} = today.difference(this._date);
    if (days <= 1) {
        const rtf = new Intl.RelativeTimeFormat(locale, {numeric: 'auto'});
        // Note: if this negation seems a bit unwieldy, be aware that we are
        // considering revising the API to allow negative-valued durations
        days =, this._date) < 0 ? days : -days;
        this._title.text = rtf.format(days, 'day');
    } else {
        const options = {month: 'long', day: 'numeric'};
        if (today.year !== this._date.year)
            options.year = 'numeric';

        this._title.text = this._date.toLocaleString(locale, options);
    }
}

Calendar systems

One exciting thing about Temporal is that it will support non-Gregorian calendars. If you are a GNOME user or developer who uses a non-Gregorian calendar, or develops code for users who do, then please get in touch with me! In the group of people developing Temporal everyone uses the Gregorian calendar, so we have a knowledge gap about what users of other calendars need. We’d like to try to close this gap by talking to people.

A Final Note

In the past months I’ve not been much in the mood to write blog posts. My mind has been occupied worrying about the health of my family, friends, and myself; feeling fury and shame at the inequalities of our society that, frankly, the pandemic has made harder to fool ourselves into forgetting if it doesn’t affect us directly; and fury at our governments that perpetuate these problems and resist meaningful attempts at reform.

With all that’s going on in the world, blogging about technical achievements feels a bit ridiculous and inconsequential, but, well, I’m writing this, and you’re reading this, and here we are. So keep in mind there are other important things too. Be safe, be kind, but don’t forget to stay furious after the dust settles.

[1] One motivation for why some are eagerly awaiting Temporal as part of the JavaScript language, as opposed to a library, is that it would be built into the browser. The most popular library for fixing the deficiencies of Date, moment.js, can mean an extra download of 20–100 kb, depending on whether you include all locales and support for time zones. This adds up to quite a lot of wasted data if you are downloading this on a large number of the websites you visit, but this specifically doesn’t affect GNOME.

by Philip Chimento at July 08, 2020 12:43 AM

July 06, 2020

Asumu Takikawa

Shipping WebAssembly's BigInt/I64 conversion in Firefox

Hello folks. Today I’m excited to share some work I’ve been hacking on recently in Firefox’s WebAssembly (AKA Wasm) engine.

The tl;dr summary: starting in Firefox 78 (released June 30, 2020), you will be able to write WebAssembly functions that pass 64-bit integers to JavaScript (where they will turn into BigInts) and vice versa.

(see Lin Clark’s excellent Cartoon Intro to WebAssembly if you’re not familiar with WebAssembly)

Wasm comes with a JavaScript API that allows a Wasm program to interact with JavaScript, by exchanging values through mechanisms such as function calls between the languages, globals, or shared linear memory. The initial MVP release of Wasm came with four built-in types: i32, i64, f32, and f64 (32-bit and 64-bit integers and 32-bit and 64-bit floats, respectively).

All but one of these types could be used to talk with JavaScript programs from the start by converting to a Number appropriately. But i64s were disallowed.

Concretely, a simple example program (with a single function that returns an i64) like this one:

(module
  (func (export "f") (result i64)
    (i64.const 42)))

would produce an error like this when you would try to run it from JS:

> WebAssembly.instantiateStreaming(fetch('../out/main.wasm')).
then(obj => console.log(obj.instance.exports.f())).catch(console.error);

[error]: TypeError: cannot pass i64 to or from JS

(you can try this out in WebAssembly Studio: select a wat project and put the above code in the .wat and .js files respectively)

In Firefox 78, you won’t get an error. Instead, the JS caller will receive a BigInt value 42n and can continue its computation.

One of the practical ramifications of this limitation in previous releases was that tools that produce Wasm code would have to legalize it by transforming the code to accept or produce multiple 32-bit integers instead. For example, a function with a signature (param i64) would be translated to one with (param i32 i32). This can increase compiled code size and complicate the JS logic in your program.
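
To illustrate what the JS side of a legalized function has to deal with, here is a minimal sketch. The helper names and the low-word-first ordering are assumptions made for illustration. It uses BigInt purely to show the bit layout, which of course was not an option back when legalization was the only choice.

```javascript
// Splits a signed 64-bit value into the two 32-bit halves that a legalized
// (param i32 i32) signature would receive. Low word first is an assumption.
function splitI64(value) {
    const bits = BigInt.asUintN(64, value);
    const low = Number(BigInt.asIntN(32, bits));
    const high = Number(BigInt.asIntN(32, bits >> 32n));
    return [low, high];
}

// Reassembles the signed 64-bit value from the two 32-bit halves.
function joinI64(low, high) {
    const highBits = BigInt.asUintN(32, BigInt(high)) << 32n;
    const lowBits = BigInt.asUintN(32, BigInt(low));
    return BigInt.asIntN(64, highBits | lowBits);
}

const [lowWord, highWord] = splitI64(2n ** 32n + 7n);
console.log(lowWord, highWord, joinI64(lowWord, highWord));
```

Every call across the boundary needs this kind of glue on both sides, which is where the extra code size and complexity come from.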

The reason that the limitation existed was that until relatively recently, there was no good way to represent a 64-bit integer value in JS; the Number type in JS doesn’t allow you to represent the full range of a 64-bit integer.

That’s all changed since BigInt (arbitrary-length integers) became part of the JS standard and started shipping in browsers. In Firefox, it’s been available since 2019 after my colleagues at Igalia shipped it (see Andy Wingo’s blog post about that).

With the now implemented “JavaScript BigInt to WebAssembly i64 integration” proposal, i64 values from Wasm get converted to BigInts on their way out to JS as you saw in the earlier example. On the other hand, JS code can send a BigInt (e.g., as an argument to a Wasm function) to Wasm and it will get converted to an i64. In the case that a BigInt’s value exceeds the i64 range, it is fit into the range with a modulo operation.
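
The wrapping behavior can be previewed in plain JS without any Wasm module. This is a small sketch that assumes the conversion behaves like the standard BigInt.asIntN(64, ...) function. The helper name toI64() is made up for the example.

```javascript
// Models what happens to a BigInt argument on its way into an i64 parameter:
// the value is reduced modulo 2^64 and reinterpreted as a signed integer.
function toI64(value) {
    return BigInt.asIntN(64, value);
}

console.log(toI64(42n));             // in range, so it is unchanged
console.log(toI64(2n ** 63n));       // one past the maximum, wraps to the minimum
console.log(toI64(2n ** 64n + 5n));  // wraps all the way around to 5n
```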

Support for this proposal has already landed in tools like Emscripten as well (you have to pass a WASM_BIGINT flag to use it).

It was possible to fill in this gap in the JS API thanks to a long line of work on BigInt in JS standards and engines. Some of my colleagues at Igalia such as Dan Ehrenberg and Caio Lima have been pushing forward a lot of the work on BigInt, which we talked about a little bit in a recent blog post.

Much of the BigInt/i64 interoperation work in Firefox was originally done by Sven Sauleau, who championed the spec proposal along with Dan Ehrenberg. Ms2ger from Igalia also worked on the spec text and tests. Many thanks to the engineers at Mozilla (Lars Hansen, Benjamin Bouvier, and André Bargull) who helped review the work and provided bug fixes.

Finally, our work on WebAssembly at Igalia is sponsored by Tech at Bloomberg. Thanks to them for making this work on the web platform possible! They have been a great partner in Igalia’s mission to help build an expressive and open web platform for everyone.

by Asumu Takikawa at July 06, 2020 07:00 PM

July 05, 2020

Frédéric Wang

Contributions to Web Platform Interoperability (First Half of 2020)

Note: This blog post was co-authored by AMP and Igalia teams.

Web developers continue to face challenges with web interoperability issues and a lack of implementation of important features. As an open-source project, the AMP Project can help represent developers and aid in addressing these challenges. In the last few years, we have partnered with Igalia to collaborate on helping advance predictability and interoperability among browsers. Reaching the standards and the degree of interoperability that we want can be a long process. New features frequently require experimentation to get things rolling and course corrections along the way. Then, as more implementations and users begin exploring the space, doing really interesting things and finding issues at the edges, we continue to advance interoperability.

Both AMP and Igalia are very pleased to have been able to play important roles at all stages of this process and help drive things forward. During the first half of this year, here’s what we’ve been up to…

Default Aspect Ratio of Images

In our previous blog post we mentioned our experiment to implement the intrinsic size attribute in WebKit. Although this was a useful prototype for standardization discussions, in the end there was a consensus to switch to an alternative approach. This new approach addresses the same use case without the need for a new attribute. The idea is pretty simple: use the specified width and height attributes of an image to determine the default aspect ratio. If additional CSS is used, e.g. “width: 100%; height: auto;”, browsers can then compute the final size of the image without waiting for it to be downloaded. This avoids any relayout that could cause a bad user experience. This was implemented in Firefox and Chromium and we did the same in WebKit. We implemented this under a flag which is currently on by default in Safari Tech Preview and the latest iOS 14 beta.
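
The arithmetic behind this is simple enough to sketch. The function below is only an illustration of the idea, not browser code, and its name and parameters are made up for the example.

```javascript
// Computes the height that can be reserved for an image before it loads,
// given the width/height attributes from the markup and the width that CSS
// layout resolved (for example through "width: 100%; height: auto;").
function reservedHeight(attrWidth, attrHeight, layoutWidth) {
    if (!(attrWidth > 0) || !(attrHeight > 0))
        return null;  // no usable ratio: the size is only known after download
    return layoutWidth * attrHeight / attrWidth;
}

// <img width="800" height="600" style="width: 100%; height: auto;">
// laid out in a container that is 400px wide:
console.log(reservedHeight(800, 600, 400));  // 300
```

Because the box is already 300px tall when the image data arrives, the content below it does not jump.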

Scrolling

We continued our efforts to enhance scroll features. In WebKit, we began with scroll-behavior, which provides the ability to do smooth scrolling. Based on our previous patch, it has landed and is guarded by an experimental flag “CSSOM View Smooth Scrolling” which is disabled by default. Smooth scrolling currently has a generic platform-independent implementation controlled by a timer in the web process, and we continue working on a more efficient alternative relying on the native iOS UI interfaces to perform scrolling.

We have also started to work on overscroll and overscroll customization, especially for the scrollend event. The scrollend event, as you might expect, is fired when the scroll is finished, but it lacked interoperability and required some additional tests. We added web platform tests for programmatic scroll and user scroll including scrollbar, dragging selection and keyboard scrolling. With these in place, we are now working on a patch in WebKit which supports scrollend for programmatic scroll and Mac user scroll.

On the Chrome side, we continue working on the standard scroll values in non-default writing modes. This is an interesting set of challenges surrounding the scroll API and how it works with writing modes which was previously not entirely interoperable or well defined. Gaining interoperability requires changes, and we have to be sure that those changes are safe. Our current changes are implemented and guarded by a runtime flag “CSSOM View Scroll Coordinates”. With the help of Google engineers, we are trying to collect user data to decide whether it is safe to enable it by default.

Another minor interoperability fix that we were involved in was to ensure that the scrolling attribute of frames recognizes values “noscroll” or “off”. That was already the case in Firefox and this is now the case in Chromium and WebKit too.

Intersection and Resize Observers

As mentioned in our previous blog post, we drove the implementation of IntersectionObserver (enabled in iOS 12.2) and ResizeObserver (enabled in iOS 14 beta) in WebKit. We have made a few enhancements to these useful developer APIs this year.

Users reported difficulties observing the root of an inner iframe, and the specification was modified to accept an explicit document as a root parameter. This was implemented in Chromium and we implemented the same change in WebKit and Firefox. It is currently available in Safari Tech Preview, iOS 14 beta and Firefox 75.

A bug was also reported with ResizeObserver incorrectly computing size for non-default zoom levels, which was in particular causing a bug on Twitter feeds. We landed a patch last April and the fix is available in the latest Safari Tech Preview and iOS 14 beta.

Resource Loading

Another thing that we have been concerned with is how we can give more control and power to authors to more effectively tell the browser how to manage the loading of resources and improve performance.

The work that we started in 2019 on lazy loading has matured a lot along with the specification.

The lazy image loading implementation in WebKit therefore passes the related WPT tests and is functional and comparable to the Firefox and Chrome implementations. However, as you might expect, as we compare uses and implementation notes it becomes apparent that determining the moment when the lazy image load should start is not defined well enough. Before this can be enabled in releases some more work has to be done on improving that. The related frame lazy loading work has not started yet since the specification is not in place.

We also added an implementation for stale-while-revalidate. The stale-while-revalidate Cache-Control directive allows a grace period in which the browser is permitted to serve a stale asset while the browser is checking for a newer version. This is useful for non-critical resources where some degree of staleness is acceptable, like fonts. The feature has been enabled recently in WebKit trunk, but it is still disabled in the latest iOS 14 beta.
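
As an illustration of the grace period, here is a minimal sketch of how a cache might classify a stored response. It is a simplified model with made-up function names, not the WebKit implementation, and it only looks at the max-age and stale-while-revalidate directives.

```javascript
// Parses a Cache-Control header value such as
// 'max-age=600, stale-while-revalidate=30' into an object of directives.
function parseCacheControl(header) {
    const directives = {};
    for (const part of header.split(',')) {
        const [name, value] = part.trim().split('=');
        directives[name.toLowerCase()] = value === undefined ? true : Number(value);
    }
    return directives;
}

// Classifies a cached response by its age in seconds:
//   'fresh'                  - serve from cache, no request needed
//   'stale-while-revalidate' - serve the stale copy now, refresh in background
//   'stale'                  - grace period is over, revalidate before serving
function cacheState(header, ageSeconds) {
    const directives = parseCacheControl(header);
    const maxAge = directives['max-age'] || 0;
    const grace = directives['stale-while-revalidate'] || 0;

    if (ageSeconds < maxAge)
        return 'fresh';
    if (ageSeconds < maxAge + grace)
        return 'stale-while-revalidate';
    return 'stale';
}

const fontHeader = 'max-age=600, stale-while-revalidate=30';
console.log(cacheState(fontHeader, 100), cacheState(fontHeader, 610), cacheState(fontHeader, 700));
```

With this header, a font that is 610 seconds old is still shown immediately while a fresh copy is fetched in the background.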

Contributions were made to improve prefetching in WebKit taking into account its cache partitioning mechanism. Before this work can be enabled some more patches have to be landed and possibly specified (for example, prenavigate) in more detail. Finally, various general Fetch improvements have been done, improving the fetch WPT score. Examples are:

What’s next

There is still a lot to do in scrolling and resource loading improvements and we will continue to focus on the features mentioned such as scrollend event, overscroll behavior and scroll behavior, lazy loading, stale-while-revalidate and prefetching.

As a continuation of the work done for aspect ratio calculation of images, we will consider the more general CSS aspect-ratio property. Performance metrics such as the ones provided by the Web Vitals project are also critical for web developers to ensure that their websites provide a good user experience and we are willing to investigate support for these in Safari.

We love doing this work to improve the platform and we’re happy to be able to collaborate in ways that contribute to bettering the web commons for all of us.

July 05, 2020 10:00 PM

July 03, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 1: Introduction

For a while now, Igalia’s graphics team has been working on the OpenGL extensions that provide the mechanisms for OpenGL and Vulkan interoperability in the Intel iris (gallium3d) driver that is part of mesa. As there were no conformance tests (CTS) for this extension, and we needed to test it, we have written (and … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 1: Introduction

by hikiko at July 03, 2020 11:39 AM

Martin Robinson

CSS Painting Order

How does a browser determine what order to paint content in? A first guess might be that browsers will paint content in the order that it is specified in the DOM, which for an HTML page is the order it appears in the source code of the page.

We can construct a simple example showing that two divs overlap in this order. We overlap two divs by giving one of them a negative top margin.

    .box {
        width: 8ex;
        height: 8ex;
        padding: 0.2ex;
        color: white;
        font-weight: bold;
        text-align: right;
    }

    .blue { background: #99DDFF; }

    /* The second div has a negative top margin so that it overlaps
       with the first (blue) div. Also, shift it over to the right
       slightly. */
    .green {
        background: #44BB99;
        margin-left: 3ex;
        margin-top: -6ex;
    }

<div class="blue box">1</div>
<div class="green box">2</div>

It seems that our guess was a pretty good guess! I’m sure some of you are saying, “Hold on! What about z-index?” You’re right. Using the z-index property, we can override the normal painting order used by the browser. We give the green div a z-index and make it relatively positioned, because z-index only works on positioned elements. We also add a yellow child of the green div to see how this affects children. Finally, let’s start labeling each div with its z-index.

    .yellow {
        margin-left: 3ex;
        background: #EEDD88;
    }

<div class="blue box">0</div>
<div class="green box" style="position: relative; z-index: -1;">-1
    <div class="yellow box">-1</div>
</div>

In this example, the green div is painted before the blue div, even though it comes later in the source code. We can see that the z-index affects the div itself and also the yellow child div. What if we want to now paint the yellow nested child on top of everything by giving it a large positive z-index?

<div class="blue box">0</div>
<div class="green box" style="position: relative; z-index: -1;">-1
    <div class="yellow box" style="position: relative; z-index: 1000;">1000</div>
</div>

Wait! What’s going on here? The blue div has no z-index specified, which should mean that the value used for its z-index is zero. The z-index of our nested yellow child is 1000, yet this div is still painted underneath. Why isn’t the nested child painted on top of the blue div as we might expect?

At this point, we have to buy the classic joke “CSS IS AWESOME” mug, fill it up with coffee, and read the entirety of the CSS2 specification. Suddenly, we understand that the answer is that our divs are forming something called stacking contexts.

The Stacking Context

We determined exactly what was going on when we arrived at Appendix E: Elaborate description of Stacking Contexts. Thankfully, we made a stupidly big cup of coffee since all the good information is apparently stuffed in the appendices. Appendix E gives us a peek at the algorithm that browsers use to determine the painting order of content on the page, including what sorts of properties affect this painting order. It turns out that our early guesses were mostly correct: things generally stack according to the order in the DOM and active z-indices. Sometimes though, certain CSS properties applied to elements trigger the creation of a stacking context which might affect painting order in a way we don’t expect.

We learn from Appendix E that a stacking context is an atomically painted collection of page items. What does this mean? To put it simply, it means that things inside a stacking context are painted together, as a unit, and that items outside the stacking context will never be painted between them. Having an active z-index is one of the situations in CSS which triggers the creation of a stacking context. Is there a way we can adjust our example above so that the third element belongs to the same stacking context as the first two elements? The answer is that we must remove it from the stacking context created by the second element.

<div class="blue box">0</div>
<div class="yellow box" style="position: relative; z-index: 1000; margin-top: -5ex">1000</div>
<div class="green box" style="position: relative; z-index: -1; margin-left: 6ex;">-1</div>

Now the yellow div is a sibling of the blue and the green and is painted on top of both of them, even though it now comes second in the source.
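
The atomic painting rule can be captured in a toy model. This is a deliberately simplified sketch and not the full Appendix E algorithm: it assumes that any node with a zIndex forms a stacking context, and that any node without one takes part in its nearest ancestor’s context at level 0. All the names are made up for the example.

```javascript
// Returns the names of the nodes in the order they would be painted,
// bottom-most first.
function paintOrder(contextRoot, painted = []) {
    painted.push(contextRoot.name);

    // Gather everything that participates directly in this stacking context,
    // without descending into nested stacking contexts.
    const layers = [];
    (function collect(node) {
        for (const child of node.children || []) {
            if (child.zIndex !== undefined) {
                layers.push({node: child, level: child.zIndex, atomic: true});
            } else {
                layers.push({node: child, level: 0, atomic: false});
                collect(child);
            }
        }
    })(contextRoot);

    // Array.prototype.sort is stable, so ties keep their DOM order.
    layers.sort((a, b) => a.level - b.level);

    for (const {node, atomic} of layers) {
        if (atomic)
            paintOrder(node, painted);  // the whole context is painted as a unit
        else
            painted.push(node.name);
    }
    return painted;
}

// The yellow div nested inside the green stacking context
const nested = {name: 'page', children: [
    {name: 'blue'},
    {name: 'green', zIndex: -1, children: [
        {name: 'yellow', zIndex: 1000},
    ]},
]};

// The yellow div as a sibling of the blue and green divs
const siblings = {name: 'page', children: [
    {name: 'blue'},
    {name: 'yellow', zIndex: 1000},
    {name: 'green', zIndex: -1},
]};

console.log(paintOrder(nested).join(' < '));    // page < green < yellow < blue
console.log(paintOrder(siblings).join(' < '));  // page < green < blue < yellow
```

In the nested case the yellow z-index of 1000 only competes inside the green context, so the blue div still ends up on top of it.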

It’s clear that stacking contexts can impose strong limitations on the order our elements are painted, so it’d be great to know when we are triggering them. Whether or not a particular CSS feature triggers the creation of a new stacking context is defined with that feature, which means the information is spread throughout quite a few specifications. Helpfully, MDN has a great list of situations where elements create a stacking context. Some notable examples are elements with an active z-index, position: fixed and position: sticky elements, and elements with a transform or perspective.

Surprising Details

I’m going to level with you. While the stacking context might be a bit confusing at first, for a browser implementor it makes things a lot simpler. The stacking context is a handy abstraction over a chunk of the layout tree which can be processed atomically. In fact, it would be nice if more things created stacking contexts. Rereading the list above you may notice some unusual exceptions. Some of these exceptions are not “on purpose,” but were just arbitrary decisions made a long time ago.

For me, one of the most surprising exceptions to stacking context creation is overflow: scroll. We know that setting scroll for the overflow property causes all contents that extend past the padding edge of a box to be hidden within a scrollable area. What does it mean that they do not trigger the creation of a stacking context? It means that content outside of a scrollable area can intersect content inside of it. All it takes is a little bit of work to see this in action:

    .scroll-area {
        overflow: scroll;
        border: 3px solid salmon;
        width: 18ex;
        height: 15ex;
        margin-left: 2ex;
    }

    .scroll-area .vertical-bar {
        position: relative; /* We give each bar position: relative so that they can have z-indices. */
        float: left;
        height: 50ex;
        width: 4ex;

        /* A striped background that shows scrolling motion. */
        background: repeating-linear-gradient(
            salmon 0px, salmon 10px,
            orange 10px, orange 20px
        );
    }

    /* Even bars will be on top of the yellow horizontal bar due to having a greater z-index. */
    .scroll-area .vertical-bar:nth-child(even) {
        z-index: 4;
        opacity: 0.9;
    }

    .yellow-horizontal-bar {
        margin-top: -10ex;
        margin-bottom: 10ex;
        max-width: 30ex;
        background: #EEDD88;

        /* Raise the horizontal bar above the scrollbar of the scrolling area. */
        position: relative;
        z-index: 2;
    }

<!-- A scroll area with four vertical bars. -->
<div class="scroll-area">
    <div class="vertical-bar"></div>
    <div class="vertical-bar"></div>
    <div class="vertical-bar"></div>
    <div class="vertical-bar"></div>
</div>

<!-- A div that will thread in between the vertical bars of the scroll area above. -->
<div class="yellow-horizontal-bar">~~JUST PASSING THROUGH~~</div>

Using the power of web design, we’ve managed to wedge the final div between the contents of the scroll frame. Half of the scrolling content is on top of the interloper and half is underneath. This probably renders in a surprising way with the interposed div on top of the scrolling area’s scrollbar (if it has one). You can imagine what kind of headaches this causes for the implementation of scrollable areas in browser engines, because the children of a particular scroll area might be spread throughout the layout tree. There’s no guarantee that it has any kind of recursive encapsulation.

CSS’s rules often have a reasoned origin, but some of them are just arbitrary implementation decisions made roughly 20 years ago without the benefit of hindsight. Rough edges like this stacking context exception might seldom come into play, but the web is huge and has collected years of content. There are potentially thousands of pages relying on this behavior such as lists of 2003’s furriest angora rabbits or memorials to someone’s weird obsession with curb cuts. The architects of the web have chosen not to break those galleries of gorgeous lagomorphs and have instead opted for maximizing long-term web compatibility.

Breaking the Rules

Earlier, I wrote that nothing from outside a stacking context can be painted in between a stacking context’s contents. Is that really, really, really true though? CSS is so huge, there must be at least one exception, right? I now have a concrete answer to this question and that answer is “maybe.” CSS is full of big hammers and one of the biggest hammers (this is foreshadowing for a future post) is CSS transformations. This makes sense. Stacking contexts are all about enforcing order amidst the chaos of the z-axis, which is the one that extends straight from your heart into your screen. Transformed elements can traverse this dimension allowing for snazzy flipbook effects and also requiring web browsers to gradually become full 3D compositors. Surely if it’s possible to break this rule we can do it with 3D CSS transformations.

Let’s take a modified version of one of our examples above. Here we have three boxes. The last two are inside of a div with a z-index of -2, which means that they are both inside a single stacking context that stacks underneath the first box.

    .salmon {
        background: salmon;
        margin-top: -5ex;
        margin-left: 4ex;
    }

<div class="blue box">0</div>
<div style="position: relative; z-index: -2;">
    <div class="green box">-2</div>
    <div class="salmon box">-2</div>
</div>

Now we make two modifications to this example. First, we wrap the example in a new div with a transform-style of preserve-3d, which will position all children in 3d space. Finally, we push one of the divs with z-index of -2 out of the screen using a 3d translation.

<div style="transform-style: preserve-3d;">
    <div class="blue box">0</div>
    <div style="position: relative; z-index: -2;">
        <div class="green box">-2</div>
        <div class="salmon box" style="transform: translateZ(50px);">-2</div>
    </div>
</div>

It’s possible that your browser might not render this in the same way, but in Chrome the div with z-index of 0 is rendered in between two divs within the same stacking context both with z-index of -2.

We broke the cardinal rule of the stacking context. Take that, architects of the web! Is this exercise useful at all? Almost certainly not. I hope it was sufficiently weird though!


Hopefully I’ll be back soon to talk about the implementation of this wonderful nonsense in Servo. I want to thank Frédéric Wang for input on this post and also Mozilla for allowing me to hack on this as part of my work for Igalia. Servo is a really great way to get involved in browser development. It’s also written in Rust, which is a language that can help you become a better programmer simply by learning it, so check it out. Thanks for reading!

July 03, 2020 04:00 AM

July 02, 2020

Philippe Normand

Web-augmented graphics overlay broadcasting with WPE and GStreamer

Graphics overlays are everywhere nowadays in the live video broadcasting industry. In this post I introduce a new demo relying on GStreamer and WPEWebKit to deliver low-latency web-augmented video broadcasts.

Readers of this blog might remember a few posts about WPEWebKit and a GStreamer element we at Igalia worked on. In December 2018 I introduced GstWPE and a few months later blogged about a proof-of-concept application I wrote for it. So, learning from this first iteration, I wrote another demo!

The first demo was already quite cool, but had a few down-sides:

  1. It works only on desktop (running in a Wayland compositor). The Wayland compositor dependency can be a burden in some cases. Ideally we could imagine GstWPE applications running “in the cloud”, on machines without GPU, bare metal.
  2. While it was cool to stream to Twitch, Youtube and the like, these platforms currently can ingest only RTMP streams. That means the latency introduced can be quite significant, depending on the network conditions of course, but even in ideal conditions the latency was between one and two seconds. This is not great, in the world we live in.

To address the first point, WPE founding engineer Žan Doberšek enabled software rasterizing support in WPE and its FDO backend. This is great because it allows WPE to run on machines without a GPU (like continuous integration builders, test bots) but also “in the cloud” where machines with a GPU are less affordable than bare metal! Following up, I enabled this feature in GstWPE. The source element caps template now has video/x-raw, in addition to video/x-raw(memory:GLMemory). To force swrast, you need to set the LIBGL_ALWAYS_SOFTWARE=true environment variable. The downside of swrast is that you need a good CPU. Of course it depends on the video resolution and framerate you want to target.

On the latency front, I decided to switch from RTMP to WebRTC! This W3C spec isn’t only about video chat! With WebRTC, sub-second live one-to-many broadcasting can be achieved without much effort, given you have a good SFU. For this demo I chose Janus, because its APIs are well documented, and it’s a cool project! I’m not sure it would scale very well in large deployments, but for my modest use-case, it fits very well.

Janus has a plugin called video-room which allows multiple participants to chat. But then imagine a participant only publishing its video stream and multiple “clients” connecting to that room, without sharing any video or audio stream: one-to-many broadcasting. As it turns out, GStreamer applications can already connect to this video-room plugin using GstWebRTC! A demo was developed by tobiasfriden and saket424 in Python, and it recently moved to the gst-examples repository. As I kind of prefer to use Rust nowadays (whenever I can anyway), I ported this demo to Rust, and it was upstreamed in gst-examples as well. This specific demo streams the video test pattern to a Janus instance.

Adapting this Janus demo was then quite trivial. By relying on a video mixer approach similar to the one I used for the first GstWPE demo, I had a GstWPE-powered WebView streaming to Janus.

The next step was the actual graphics overlays infrastructure. In the first GstWPE demo I had a basic GTK UI that allowed editing the overlays on-the-fly. This can’t be used for this new demo, because I wanted to use it headless. After doing some research I found a really nice NodeJS app on GitHub, developed by Luke Moscrop, who’s actually one of the main developers of the Brave BBC project. The Roses CasparCG Graphics app was developed in the context of the Lancaster University Students’ Union TV Station. It starts a web-server on port 3000 with two main entry points:

  • An admin web-UI (in /admin/) for creating and managing overlays, like sports score boards, info banners, and so on.
  • The target overlay page (in the root location of the server), which is a web-page without predetermined background, displaying the overlays with HTML, CSS and JS. This web-page is meant to be fed to CasparCG (or GstWPE :))

After making a few tweaks in this NodeJS app, I can now:

  1. Start the NodeJS app, load the admin UI in a browser and enable some overlays
  2. Start my native Rust GStreamer/WPE application, which:
    • connects to the overlay web-server
    • mixes a live video source (a webcam, for instance) with the WPE-powered overlay
    • encodes the video stream to H.264, VP8 or VP9
    • sends the encoded RTP stream using WebRTC to a Janus server
  3. Let “consumer” clients connect to Janus with their browser, in order to see the resulting live broadcast.

(If the video doesn’t display, here is the YouTube link.)

This is pretty cool and fun, as my colleague Brian Kardell mentions in the video. Working on this new version gave me more ideas for the next one. And very recently the audio rendering protocol was merged in WPEBackend-FDO! That means even more use-cases are now unlocked for GstWPE.

This demo’s source code is hosted on GitHub. Feel free to open issues there, I am always interested in getting feedback, good or bad!

GstWPE is maintained upstream in GStreamer and relies heavily on WPEWebKit and its FDO backend. Don’t hesitate to contact us if you have specific requirements or issues with these projects :)

by Philippe Normand at July 02, 2020 01:00 PM

July 01, 2020

Alejandro Piñeiro

v3dv status update 2020-07-01

About three weeks ago there was a big announcement about the update of the status of the Vulkan effort for the Raspberry Pi 4. Now the source code is public. Taking into account the interest that it got, and that now the driver is more usable, we will try to post status updates more regularly. Let’s talk about what’s happened since then.

Input Attachments

Input attachment is one of the main sub-features for Vulkan multipass, and we’ve gained support for it since the announcement. In Vulkan, multipass is more tightly supported by the API. Renderpasses can have multiple subpasses. These can have dependencies between each other, and each subpass defines a subset of “attachments”. One attachment that is easy to understand is the color attachment: this is where a given subpass writes a given color. Another, the input attachment, is an attachment that was updated in a previous subpass (for example, it was the color attachment of that previous subpass), and that you get as an input in following subpasses. From the shader POV, you interact with it as a texture, with some restrictions. One important restriction is that you can only read the input attachment at the current pixel location. The main reason for this restriction is that on tile-based GPUs (like rpi4) all primitives are batched on tiles and fragment processing is done one tile at a time. In general, if you can live with those restrictions, Vulkan multipass and input attachment will provide better performance than traditional multipass solutions.

If you are interested in reading more details on this, you can check out ARM’s very nice presentation “Vulkan Multipass mobile deferred done right”, or Sascha Willems’ post “Vulkan input attachments and sub passes”. The latter also includes information about how to use them and code snippets of one of his demos. For reference, this is how the input attachment demos look on the rpi4:

Sascha Willems inputattachment demos run on rpi4

Compute Shader

Given that this was one of the most requested features after the last update, we expect that this will likely be the most popular news from this post: Compute shaders are now supported.

Compute shaders give applications the ability to perform non-graphics related tasks on the GPU, outside the normal rendering pipeline. For example they don’t have vertices as input, or fragments as output. They can still be used for massively parallel GPGPU algorithms. For example, this demo from Sascha Willems uses a compute shader to simulate cloth:

Sascha Willems Compute Cloth demos run on rpi4

Storage Image

Storage Image is another recent addition. It is a descriptor type that represents an image view, and supports unfiltered loads, stores, and atomics in a shader. In most other ways it is really similar to the well-known OpenGL concept of texture. Storage images are really common with compute shaders. Compute shaders cannot directly render any image, so if they need an image, it is likely that they will update it. In fact the two Sascha Willems demos using storage images also require compute shader support:

Sascha Willems compute shader demos run on rpi4

Sascha Willems compute raytracing demo run on rpi4


Right now our main focus for the driver is working on features, targeting a compliant Vulkan 1.0 driver. Having said so, now that we both support a good range of features and can run non-basic applications, we have devoted some time to analyze if there were clear points where we could improve the performance. Among these we implemented:
1. A buffer object (BO) cache: internally we allocate and free buffer objects really often for basically the same tasks, so there is a constant need for buffers of the same size. Each such allocation/free requires a DRM call, so we implemented a BO cache (based on the existing one for the OpenGL driver) so that freed BOs are added to a cache, and reused if a new BO is allocated with the same size.
2. New code paths for buffer to image copies.
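
To make the BO cache idea more concrete, here is a minimal sketch in Python. It is only an illustration of the "reuse freed buffers of the same size" strategy described above, not the driver's code: the class names, the counter and the sizes are all made up for the example, and a real cache would also need to bound its memory usage.

```python
from collections import defaultdict


class BufferObject:
    """Stand-in for a GPU buffer object (hypothetical; a real one wraps a DRM handle)."""

    def __init__(self, size):
        self.size = size


class BOCache:
    """Size-keyed free list: freed buffers are parked and handed out again."""

    def __init__(self):
        self._free = defaultdict(list)  # size -> list of idle buffers
        self.kernel_allocations = 0     # counts simulated DRM allocation calls

    def alloc(self, size):
        bucket = self._free[size]
        if bucket:
            return bucket.pop()          # cache hit: no kernel round trip
        self.kernel_allocations += 1     # cache miss: simulate the DRM call
        return BufferObject(size)

    def free(self, bo):
        self._free[bo.size].append(bo)   # keep the buffer for later reuse


cache = BOCache()
first = cache.alloc(4096)
cache.free(first)
second = cache.alloc(4096)   # same size as the freed buffer, so it is reused
third = cache.alloc(8192)    # different size, so a new allocation is needed
```

In this toy run only two "kernel" allocations happen for three requests, which is the kind of saving the cache is after.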


In addition to work on specific features, we also spent some time fixing specific driver bugs, using failing Vulkan CTS tests as reference. Thanks to that work, Sascha Willems’ radial blur demo is now properly rendering, even though we didn’t focus specifically on working on that demo:

Sascha Willems radial blur demo run on rpi4


Now that the driver supports a good range of features and we are able to test more applications and run more Vulkan CTS Tests with all the needed features implemented, we plan to focus some efforts towards bugfixing for a while.

We also plan to start to work on implementing the support for Pipeline Cache, which allows the result of pipeline construction to be reused between pipelines and between runs of an application.

by infapi00 at July 01, 2020 05:26 PM

June 30, 2020

Enrique Ocaña

Developing on WebKitGTK with Qt Creator 4.12.2

After the latest migration of WebKitGTK test bots to use the new SDK based on Flatpak, the old development environment based on jhbuild became deprecated. It can still be used with export WEBKIT_JHBUILD=1, but support for this way of working will gradually fade out.

I used to work on a chroot because I love the advantages of having an isolated and self-contained environment, but an issue in the way bubblewrap manages mountpoints basically made it impossible to use the new SDK from a chroot. It was time for me to update my development environment to the new ages and have it working in my main Kubuntu 18.04 distro.

My main goal was to have a comfortable IDE that follows standard GUI conventions (that is, no emacs or vim) and has code indexing features that (more or less) work with the WebKit codebase. Qt Creator was providing all that to me in the old chroot environment thanks to some configuration tricks by Alicia, so it should be good for the new one.

I preferred to use the Qt Creator 4.12.2 offline installer for Linux, so I can download exactly the same version in the future in case I need it, but other platforms and versions are also available.

The WebKit source code can be downloaded as always using git:

git clone

It’s useful to add WebKit/Tools/Scripts and WebKit/Tools/gtk to your PATH, as well as any other custom tools you may have. You can customize your $HOME/.bashrc for that, but I prefer to have an environment script to be sourced from the current shell when I want to enter into my development environment (by running webkit). If you’re going to use it too, remember to adjust the paths used there to your needs.

Even if you have a pretty recent distro, it’s still interesting to have the latest Flatpak tools. Add Alex Larsson’s PPA to your apt sources:

sudo add-apt-repository ppa:alexlarsson/flatpak

In order to ensure that your distro has all the packages that webkit requires and to install the WebKit SDK, you have to run these commands (I omit the full path). Downloading the Flatpak modules will take a while, but at least you won’t need to build everything from scratch. You will need to do this again from time to time, every time the WebKit base dependencies change:


Now just build WebKit and check that MiniBrowser works:

build-webkit --gtk
run-minibrowser --gtk

I have automated the previous steps as go full-rebuild and

This build process should have generated a WebKit/WebKitBuild/GTK/Release/compile_commands.json file with the right parameters and paths used to build each compilation unit in the project. This file can be leveraged by Qt Creator to get the right include paths and build flags after some preprocessing to translate the paths that make sense from inside Flatpak to paths that make sense from the perspective of your main distro. I wrote a script to take care of those transformations. It can be run manually or automatically when calling go full-rebuild or go update.

The WebKit way of managing includes is a bit weird. Most of the cpp files include config.h and, only after that, they include the header file related to the cpp file. Those header files depend on defines declared transitively when including config.h, but that file isn’t directly included by the header file. This breaks the intuitive rule of “headers should include any other header they depend on” and, among other things, completely confuses code indexers. So, in order to give the Qt Creator code indexer a hand, the script pre-includes WebKit.config for every file and includes config.h from it.
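
For readers who want to roll their own, the path translation part can be sketched in a few lines of Python. This is not the script mentioned above, just an illustration of the idea: the two prefixes are hypothetical placeholders (check what your own compile_commands.json contains), and the sketch only rewrites the standard directory, file, command and arguments fields of the compilation database with a plain string replacement.

```python
import json

# Hypothetical prefixes: where the checkout appears inside the sandbox and on the host.
SANDBOX_PREFIX = "/app/webkit"
HOST_PREFIX = "/home/user/work/webkit/WebKit"


def translate_entry(entry, sandbox=SANDBOX_PREFIX, host=HOST_PREFIX):
    """Return a copy of one compilation database entry with host-side paths."""
    translated = dict(entry)
    for key in ("directory", "file", "command"):
        if key in translated:
            translated[key] = translated[key].replace(sandbox, host)
    if "arguments" in translated:
        # Some generators emit a list of arguments instead of a command string.
        translated["arguments"] = [
            arg.replace(sandbox, host) for arg in translated["arguments"]
        ]
    return translated


def translate_database(text):
    """Translate every entry of a compile_commands.json document given as text."""
    entries = json.loads(text)
    return json.dumps([translate_entry(e) for e in entries], indent=2)


sample = json.dumps([{
    "directory": "/app/webkit/WebKitBuild/GTK/Release",
    "command": "clang++ -I/app/webkit/Source/WTF -c /app/webkit/Source/WTF/wtf/Foo.cpp",
    "file": "/app/webkit/Source/WTF/wtf/Foo.cpp",
}])
result = json.loads(translate_database(sample))
```

A plain prefix replacement is the simplest thing that could work. It does not handle the config.h pre-include trick described above, which would need extra handling on top.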

With all the needed pieces in place, it’s time to import the project into Qt Creator. To do that, click File → Open File or Project, and then select the compile_commands.json file that should have been generated in the WebKit main directory.

Now make sure that Qt Creator has the right plugins enabled in Help → About Plugins…. Specifically: GenericProjectManager, ClangCodeModel, ClassView, CppEditor, CppTools, ClangTools, TextEditor and LanguageClient (more on that later).

With this setup, after a brief initial indexing time, you will have support for features like Switch header/source (F4), Follow symbol under cursor (F2), shading of disabled if-endif blocks, auto variable type resolving and code outline. There are some oddities of compile_commands.json based projects, though. There are no compilation units in that file for header files, so indexing features for them only work sometimes. For instance, you can switch from a method implementation in the cpp file to its declaration in the header file, but not the opposite. Also, you won’t see all the source files under the Projects view, only the compilation units, which are often just a bunch of UnifiedSource-*.cpp files. That’s why I prefer to use the File System view.

Additional features like Open Type Hierarchy (Ctrl+Shift+T) and Find References to Symbol Under Cursor (Ctrl+Shift+U) are only available when a Language Client for Language Server Protocol is configured. Fortunately, the new WebKit SDK comes with the ccls C/C++/Objective-C language server included. To configure it, open Tools → Options… → Language Client and add a new item with the following properties:

  • Name: ccls
  • Language: *.c;*.cpp;*.h
  • Startup behaviour: Always On
  • Executable: /home/enrique/work/webkit/WebKit/Tools/Scripts/webkit-flatpak
  • Arguments: --gtk -c ccls --index=/home/enrique/work/webkit/WebKit

Some “LanguageClient ccls: Unexpectedly finished. Restarting in 5 seconds.” errors will appear in the General Messages panel after configuring the language client and every time you launch Qt Creator. It’s just ccls taking its time to index the whole source code. It’s “normal”, don’t worry about it. Things will get stable and start to work after some minutes.

Due to the way the Locator file indexer works in Qt Creator, it can become confused, run out of memory and die if it finds cycles in the project file tree. This is common when using Flatpak and running the MiniBrowser or the tests, since /proc and other large filesystems are accessible from inside WebKit/WebKitBuild. To avoid that, open Tools → Options… → Environment → Locator and set Refresh interval to 0 min.

I also prefer to call my own custom build and run scripts (go and instead of letting Qt Creator build the project with the default builders and mess everything up. To do that, from the Projects mode (Ctrl+5), click on Build & Run → Desktop → Build and edit the build configuration to be like this:

  • Build directory: /home/enrique/work/webkit/WebKit
  • Add build step → Custom process step
    • Command: go (no absolute path because I have it in my PATH)
    • Arguments:
    • Working directory: /home/enrique/work/webkit/WebKit

Then, for Build & Run → Desktop → Run, use these options:

  • Deployment: No deploy steps
  • Run:
    • Run configuration: Custom Executable → Add
      • Executable:
      • Command line arguments:
      • Working directory:

With this configuration you can build the project with Ctrl+B and run it with Ctrl+R.

I think I’m not forgetting anything more regarding environment setup. With the instructions in this post you can end up with a pretty complete IDE. Here’s a screenshot of it working in its full glory:

Anyway, to be honest, nothing will ever reach the level of code indexing features I got with Eclipse some years ago. I could find usages of a variable/attribute and know where it was being read, written or read-written. Unfortunately, that environment stopped working for me long ago, so Qt Creator has been the best I’ve managed to get for a while.

Properly configured web based indexers such as the Searchfox instance configured in Igalia can also be useful alternatives to a local setup, although they lack features such as type hierarchy.

I hope you’ve found this post useful in case you try to set up an environment similar to the one described here. Enjoy!

by eocanha at June 30, 2020 03:47 PM

June 23, 2020

Igalia Compilers Team

Dates and Times in JavaScript

tl;dr: We are looking for feedback on the Temporal proposal. Try out the polyfill, and complete the survey; but don’t use it in production yet!

JavaScript Date is broken in ways that cannot be fixed without breaking the web. As the story goes, it was included in the original 10-day JavaScript engine hack and based on java.util.Date, which itself was deprecated in 1997 due to being a terrible API and replaced with a better one. The result has been that, for all of JavaScript’s history, the built-in Date has remained very hard to work with directly.

Starting a few years ago, a proposal has been developing, to add a new globally available object to JavaScript, Temporal. Temporal is a robust and modern API for working with dates, times, and timestamps, and also makes it easy to do things that were hard or impossible with Date, like converting dates between time zones, adding and subtracting while accounting for daylight saving time, working with date-only or time-only data, and even handling dates in non-Gregorian calendars. Although Temporal has “just works” defaults, it also provides fine-grained opt-in control of overflows, interpreting ambiguous times, and other corner cases. For more on the history of the proposal, and why it’s not possible to fix Date itself, read Maggie Pint’s two-part blog post “Fixing JavaScript Date”.

For examples of the power of Temporal, check out the cookbook. Many of these examples would be difficult to do with legacy Date, particularly the ones involving time zones. (We would have put an example in this post, but the code might soon become stale, for reasons which will hopefully become clear!)

This proposal is currently at Stage 2 in TC39’s proposal process, and we[1] are hoping to move it along to Stage 3 soon.[2] We have been working on the feature set of Temporal and the API for a long time, and we believe it’s full-featured and that the API is reasonable. You don’t design good APIs solely on the drawing board, however, so it’s time to put it to the test and let the JavaScript developer community try it out and see whether what we’ve come up with meets people’s needs.

It is still early enough that we can make drastic changes to the API if we find we need to, based on the feedback that we get. So please, try it out and let us know!

How to Try Temporal

If you just want to try Temporal out casually, with an interactive prompt, that’s easy! Visit the API documentation in your browser. On any of the documentation or cookbook pages, you can open your browser console and Temporal will be already loaded, ready for you to try out the examples. Or you can try it out on RunKit.

Or, maybe you are interested in a bit more in-depth evaluation, like building a small test project using Temporal. We know this takes up people’s valuable project time, but it’s also the best way that we can get the most valuable feedback, so we’d really appreciate this! We have released a polyfill for the Temporal API on npm. You can use it in your project with npm install --save proposal-temporal, and import it in your project with const { Temporal } = require('proposal-temporal');.

However, don’t use the polyfill in production applications! The proposal is still at Stage 2, and the polyfill has a 0.x version, so that should make it clear that the API is subject to change, and we do intend to keep changing it when we get feedback from you!

How to Give Feedback

We would love to hear from you about your experiences with Temporal! Once you’ve tried it, we have a short survey for you to fill out. If you feel comfortable doing so, please leave us your contact information, since we might want to ask some follow-up questions.

Please also open an issue on our issue tracker if you have some suggestion! We welcome suggestions whether or not you filled out the survey. You can also browse the feedback that’s already been given in the issue tracker, and give it a thumbs-up if you agree or thumbs-down if you disagree.

Thanks for participating if you can! All the feedback that we receive now will help us make the right decisions as the proposal moves along to Stage 3 and Temporal eventually appears in your browser.

[1] “We” in this post means the Temporal champions group, a group of TC39 delegates and interested people. As you may guess from where this blog post is hosted, it includes members of Igalia’s Compilers team, but this was written on behalf of the Temporal champions.

[2] Read the TC39 process document for more information on what these stages mean. tl;dr: Stage 2 is the time to give feedback on the proposal that can still be incorporated even if it requires drastic changes. Stage 3 is when the proposal remains stable except for serious problems discovered during implementation in browsers.

by compilers at June 23, 2020 06:11 PM

June 16, 2020

Víctor Jáquez

WebKit Flatpak SDK and gst-build

This post is an annex of Phil’s Introducing the WebKit Flatpak SDK. Please make sure to read it, if you haven’t already.

Recapitulating, nowadays WebKitGtk/WPE developers (and their CI infrastructure) are moving towards a Flatpak-based environment for their workflow. This Flatpak-based environment, or Flatpak SDK for short, can be visualized as a sandboxed software container, which bundles all the dependencies required to compile, run and debug WebKitGtk/WPE.

In day-to-day work, this approach removes the need to compile the world in order to obtain reproducible builds, improving the development and testing workflow.

But what if you are also involved in the development of one dependency?

This is the case of Igalia’s multimedia team where, besides developing the multimedia features for WebKitGtk and WPE, we also participate in the GStreamer development, the framework used for multimedia.

Because of this, in our workflow we usually need to build WebKit with a fix, hack or new feature in GStreamer. Is it possible to add our custom GStreamer build to Flatpak without messing up its own GStreamer setup? Yes, it’s possible.

gst-build is a set of scripts in Python which clone the GStreamer repositories, compile them and set up an uninstalled environment. This uninstalled environment allows a transient usage of the compiled framework from its build tree, avoiding installation and any further mess in our system.

The WebKit scripts that wrap Flatpak operations are also capable of handling the gst-build scripts to build GStreamer inside the container and, when running WebKit’s artifacts, the scripts enable the mentioned uninstalled environment, overriding Flatpak’s GStreamer.

How do we unveil all this magic?

First of all, set up a gst-build installation as documented. This installation is where the GStreamer plumbing is done.

Later, gst-build operations through WebKit compilation scripts are enabled when the environment variable GST_BUILD_PATH is exported. This variable should point to the directory where the gst-build tree is placed.

And that’s all!

But let’s put these words in actual commands. The following workflow assumes that the WebKit repository is cloned in ~/WebKit and the gst-build tree is in ~/gst-build (please, excuse my bashisms).

Compiling WebKitGtk with symbols, using LLVM as toolchain (this command will also compile GStreamer):

$ cd ~/WebKit
$ CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/build-webkit --gtk --debug

Running the generated minibrowser (remember that GST_BUILD_PATH is required again for correct linking):

$ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/run-minibrowser --gtk --debug

Running media layout tests:

$ GST_BUILD_PATH=/home/vjaquez/gst-build ./Tools/Scripts/run-webkit-tests --gtk --debug media

But wait! There’s more...

What if I want to parametrize the GStreamer compilation? Say, I would like to enable a GStreamer module or disable the build of a specific element.

gst-build, like the rest of the GStreamer modules, uses the Meson build system, so it’s possible to pass arguments to Meson through the environment variable GST_BUILD_ARGS.

For example, I would like to enable gstreamer-vaapi 😇

$ cd ~/WebKit
$ CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build GST_BUILD_ARGS="-Dvaapi=enabled" Tools/Scripts/build-webkit --gtk --debug
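
If you drive these builds from a script rather than typing the variables each time, the same setup can be sketched in Python. The helper names below are hypothetical and only wrap what the commands above already do. Joining several Meson options with spaces is my assumption; the examples in this post only ever pass one.

```python
import os


def gst_build_env(gst_build_path, meson_args=None, base_env=None):
    """Return an environment dict with the gst-build variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["GST_BUILD_PATH"] = os.path.expanduser(gst_build_path)
    if meson_args:
        # Assumption: several Meson options can be given separated by spaces.
        env["GST_BUILD_ARGS"] = " ".join(meson_args)
    return env


def build_command(debug=True):
    """Command line for the WebKitGTK build, relative to the WebKit checkout."""
    cmd = ["Tools/Scripts/build-webkit", "--gtk"]
    if debug:
        cmd.append("--debug")
    return cmd


# base_env={} keeps the example deterministic; leave it out to inherit os.environ.
env = gst_build_env("/home/vjaquez/gst-build", ["-Dvaapi=enabled"], base_env={})
cmd = build_command()
# To actually run it from the WebKit checkout (assumed to be ~/WebKit):
# subprocess.run(cmd, env=env, cwd=os.path.expanduser("~/WebKit"), check=True)
```

The same environment can then be reused for run-minibrowser and run-webkit-tests, since GST_BUILD_PATH has to be present there too.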

by vjaquez at June 16, 2020 11:49 AM

June 13, 2020

Philippe Normand

Setting up Debian containers on Fedora Silverblue

After almost 20 years using Debian, I am trying something different, Fedora Silverblue. However for work I still need to use Debian/Ubuntu from time to time. In this post I am explaining the steps to set up Debian containers on Silverblue.

By default Silverblue comes with Toolbox, which integrates perfectly with the OS. It’s a shell script which actually relies on Podman, an alternative to Docker. And it works really well with Fedora images! However I ran into various issues when I wanted to set up Debian containers pulled from

$ toolbox create -c sid --image
Image required to create toolbox container.
Download (500MB)? [y/N]: y
Created container: sid
Enter with: toolbox enter --container sid
$ toolbox enter -c sid
toolbox: failed to start container sid

This should work, but doesn’t, because Toolbox has specific image requirements. After some digging into alternate Toolbox implementations and quite a bit of experimentation, I found a Toolbox pull-request adding support for Debian containers. Unfortunately this pull-request hasn’t been merged yet, perhaps because the Toolbox developers are busy with the rewrite of Toolbox in Go. Scrolling down the comments, Martin Pitt provides some details and links to the approach he’s taken to achieve the same goal. By following his instructions, I finally managed to have working Debian containers in Silverblue. Many thanks to him! Here’s the run-down:

  1. Download the build-debian-toolbox script:
$ wget;a=blob_plain;f=build-debian-toolbox;hb=HEAD
$ mkdir -p ~/bin
$ mv build-debian-toolbox ~/bin
$ chmod +x ~/bin/build-debian-toolbox
  2. Modify it a little. Here I had to:
    • change the shell shebang to /bin/bash
    • update toolbox call sites to ~/bin/toolbox
  3. Make sure ~/bin/ is first in your shell $PATH
  4. Download the patched toolbox script from this pull-request and:
    • Move it to ~/bin/
    • Again, switch to the /bin/bash shebang in that script

Then run build-debian-toolbox. By default it will create a Debian Sid container, unless you provide overrides as command-line arguments. For instance, to create an Ubuntu bionic container you can run:

$ build-debian-toolbox bionic ubuntu
$ toolbox run -c bionic cat /etc/os-release
VERSION="18.04.4 LTS (Bionic Beaver)"
PRETTY_NAME="Ubuntu 18.04.4 LTS"

That’s it! Hopefully the Toolbox Go rewrite will support this out of the box without requiring third-party tweaks. Thanks again to Martin for his work on this topic.

by Philippe Normand at June 13, 2020 11:50 AM

June 11, 2020

Philippe Normand

Introducing the WebKit Flatpak SDK

Working on a web-engine often requires a complex build infrastructure. This post documents our transition from JHBuild to Flatpak for the WebKitGTK and WPEWebKit development builds.

For the last 10 years, WebKitGTK has been relying on a custom JHBuild moduleset to handle its dependencies and (try to) ensure a reproducible test environment for the build bots. When WPEWebKit was upstreamed several years ago, a similar approach was used. We ended up with two slightly different modulesets to maintain. The biggest problem with that is that we still depend on the host OS for some dependencies. Another set of scripts was then written to install those, depending on which distro the host is running… This is a bit unfortunate. There are more issues with JHBuild: when a moduleset is updated, the bots wipe all the resulting builds and start a new build from scratch! Every WebKitGTK and WPE developer is strongly advised to use this setup, so everybody ends up building the dependencies.

In 2018, my colleague Thibault Saunier worked on a new approach, based on Flatpak. WebKit could be built as a Flatpak app relying on the GNOME SDK. This experiment was quite interesting but didn’t work for several reasons:

  • Developers were not really aware of this new approach
  • The GNOME SDK OSTree commits we were pinning were being removed from the upstream repo periodically, triggering issues on our side
  • Developers still had to build all the WebKit “app” dependencies

In late 2019 I started having a look at this, with a different workflow. I started to experiment with a custom SDK, based on the Freedesktop SDK. The goal was to distribute this to WebKit developers, who wouldn’t need to worry as much about build dependencies anymore and just focus on building the damn web-engine.

I first learned about flatpak-builder, built an SDK with that, and started playing with it for WebKit builds. I was almost happy with that, but soon realized flatpak-builder is cool for app packaging, but for big SDKs, it doesn’t really work out. Flatpak-builder builds a single recipe at a time and if I want to update one, everything below in the manifest is rebuilt. As the journey went on, I found Buildstream, which is actually used by the FDO and GNOME folks nowadays. After converting my flatpak-builder manifest to Buildstream I finally achieved happiness. Buildstream is a bit like Yocto, Buildroot and all the similar NIH sysroot builders. It was one more thing to learn, but worth it.

The SDK build definitions are hosted in WebKit’s repository. The resulting Flatpak images are hosted on Igalia’s server and locally installed in a custom Flatpak UserDir so as to not interfere with the rest of the host OS Flatpak apps. The usual mix of python, perl, ruby WebKit scripts rely on the SDK to perform their job of building WebKit, running the various test suites (API tests, layout tests, JS tests, etc). All you need to do is:

  1. clone the WebKit git repo
  2. run Tools/Scripts/update-webkit-flatpak
  3. run Tools/Scripts/build-webkit --gtk
  4. run build artefacts, like the MiniBrowser, Tools/Scripts/run-minibrowser --gtk

Under the hood, the SDK will be installed in WebKitBuild/UserFlatpak and transparently used by the various build scripts. The sandbox is started using a flatpak run call which bind-mounts the WebKit checkout and build directory. This is great! We finally have a unified workflow, not depending on specific host distros. We can easily update toolchains, package handy debug tools like rr, LSP tooling such as ccls, etc and let the lazy developers actually focus on development, rather than tooling infrastructure.

Another nice tool we now support is sccache. By deploying a “cloud” of builders on our Igalia servers we can now achieve improved build times. The SDK generates a custom sccache config based on the toolchains it includes (currently GCC 9.3.0 and clang 8). You can optionally provide an authentication token and you’re set. Access to the Igalia build cloud is restricted to Igalians, but folks who have their own sccache infra can easily set it up for WebKit. In my home office where I used to wait more than 20 minutes for a build to complete, I can now have a build done in around 12 minutes. Once we have Redis-powered cloud storage we might reach even better results. This is great because the buildbots should be able to keep the Redis cache warm enough. Also developers who can rely on this won’t need powerful build machines anymore: a standard workstation or laptop should be sufficient because the heavy C++ compilation jobs happen in the “cloud”.

If you don’t like sccache, we still support IceCC of course. The main difference is that IceCC works better on local networks.

Hacking on WebKit often involves hacking on its dependencies, such as GTK, GStreamer, libsoup and so on. With JHBuild it was fairly easy to vendor patches in the moduleset. With Buildstream the process is a bit more complicated, but actually not too bad! As we depend on the FDO SDK we can easily “patch” the junction file, and adding new dependencies from scratch is also quite easy. Many thanks to the Buildstream developers for the hard work invested in the project!

In conclusion, all our WebKitGTK and WPEWebKit bots are now using the SDK. JHBuild remains available for now, but on an opt-in basis. The goal is to gently migrate most of our developers to this new setup, and eventually JHBuild will be phased out. We still have some tasks pending, but we have made very good progress on improving the WebKit developer workflow. A few more things are now easier to achieve with this new setup. Stay tuned for more! Thanks for reading!

by Philippe Normand at June 11, 2020 12:50 PM

Diego Pino

Renderization of Conic gradients

The CSS Images Module Level 4 introduced a new type of gradient: conic-gradient. Until then, there were only two other types of gradients available on the Web: linear-gradient and radial-gradient.

The first browser to ship conic-gradient support was Google Chrome, around March 2018. A few months later, in September 2018, the feature became available in Safari. Firefox has been missing support until now, although an implementation is on the way and will ship soon. In the case of WebKitGTK (Epiphany) and WPE (Web Platform for Embedded), support landed in October 2019; I implemented it as part of my work at Igalia. The feature has been officially available in WebKitGTK and WPE since version 2.28 (March 2020).

Before native browser support, conic-gradient was available as a JavaScript polyfill created by Lea Verou.

Gradients in the Web

Generally speaking, a gradient is a smooth transition of colors defined by two or more stop-colors. In the case of a linear gradient, this transition is defined by a straight line (which may or may not be at an angle).

div.linear-gradient {
  width: 400px;
  height: 100px;
  background: linear-gradient(to right, red, yellow, lime, aqua, blue, magenta, red);
}
Linear gradient

In the case of a radial gradient, the transition is defined by a center and a radius. Colors expand evenly in all directions from the center of the circle to outside.

div.radial-gradient {
  width: 300px;
  height: 300px;
  border-radius: 50%;
  background: radial-gradient(red, yellow, lime, aqua, blue, magenta, red);
}
Radial gradient

A conical gradient, although also defined by a center and a radius, isn’t the same as a radial gradient. In a conical gradient colors spin around the circle.

div.conic-gradient {
  width: 300px;
  height: 300px;
  border-radius: 50%;
  background: conic-gradient(red, yellow, lime, aqua, blue, magenta, red);
}
Conic gradient

Implementation in WebKitGTK and WPE

At the time of implementing support in WebKitGTK and WPE, the feature had already shipped in Safari. That meant WebKit already had support for parsing the conic-gradient specification as defined in CSS Images Module Level 4 and the data structures to store relevant information were already created. The only piece missing in WebKitGTK and WPE was painting.

Safari relies on the CoreGraphics library for many of its painting operations, and CoreGraphics has a primitive for conic gradient painting (CGContextDrawConicGradient). Something similar happens in Google Chrome, although in this case the graphics library underneath is Skia (CreateTwoPointConicalGradient). WebKitGTK and WPE use Cairo for many of their graphical operations. In the case of linear and radial gradients, there’s native support in Cairo. However, there isn’t a function for conical gradient painting. This doesn’t mean Cairo cannot be used to paint conical gradients; it just means that it is a little bit more complicated.

Mesh gradients

The Cairo documentation states that it is possible to paint a conical gradient using a mesh gradient. A mesh gradient is defined by a set of colors and control points. The most basic type of mesh gradient is a Gouraud-shading triangle mesh.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 100, 100);
cairo_mesh_pattern_line_to (pattern, 130, 130);
cairo_mesh_pattern_line_to (pattern, 130,  70);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0);
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0);
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1);

cairo_mesh_pattern_end_patch (pattern);
Gouraud-shaded triangle mesh

A more sophisticated kind of mesh gradient patch is the Coons patch. A Coons patch is a quadrilateral defined by 4 cubic Bézier curves and 4 colors, one for each vertex. A Bézier curve is defined by 4 points, and consecutive curves share their end points, so we have a total of 12 control points (and 4 colors) in a Coons patch.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 45, 12);
cairo_mesh_pattern_curve_to(pattern, 69, 24, 173, -15, 115, 50);
cairo_mesh_pattern_curve_to(pattern, 127, 66, 174, 47, 148, 104);
cairo_mesh_pattern_curve_to(pattern, 65, 58, 70, 69, 18, 103);
cairo_mesh_pattern_curve_to(pattern, 42, 43, 63, 45, 45, 12);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0); // green
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1); // blue
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow

cairo_mesh_pattern_end_patch (pattern);
Coons patch gradient

A Coons patch comes in very handy for painting a conical gradient. Consider the first quadrant of a circle: such a quadrant can easily be defined with a Bézier curve.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 0, 200);
cairo_mesh_pattern_line_to (pattern, 0, 0);
cairo_mesh_pattern_curve_to (pattern, 133, 0, 200, 133, 200, 200);
cairo_mesh_pattern_line_to (pattern, 0, 200);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0); // green
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1); // blue
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow

cairo_mesh_pattern_end_patch (pattern);
Coons patch of the first quadrant of a circle

If we simply use two colors instead, the final result more closely resembles a conical gradient.

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 1, 1, 0); // yellow
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow
Coons patch of the first quadrant of a circle (2 colors)

Repeat this step three more times, with a few more stop colors, and you have a nice conical gradient.

A conic gradient made by composing mesh patches

Bézier curve as arcs

At this point the difficulty of painting a conical gradient has been reduced to calculating the shape of the Bézier curve of each mesh patch.

Computing the starting and ending points is straightforward; however, calculating the position of the other two control points of the Bézier curve is quite a bit harder.

Bézier curve approximation to a circle quadrant

Mozillian Michiel Kamermans (pomax) has a beautifully written essay on Bézier curves. The section “Circles and cubic Bézier curves” of that essay discusses how to approximate a circular arc with a Bézier curve. The case of a circular quadrant is particularly interesting because it allows painting a circle with 4 Bézier curves with minimal error. In the case of the quadrant above, the values for each point would be the following:

S = (0, r), CP1 = (0.552 * r, r), CP2 = (r, 0.552 * r), E = (r, 0) 

Even though in its most basic form a conic gradient is defined by one starting and one ending color, painting the circle with only two Bézier curves does not work well, because a single Bézier curve is not a good approximation of a semicircle (check the interactive examples of pomax’s Bézier curve essay). In that case, the conic gradient is split into four Coons patches with the middle colors interpolated.
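
As a rough illustration of that split, the following Python sketch computes the corner colors of the four quadrant patches for a two-color gradient. It assumes plain linear interpolation of the RGB components, which the post does not spell out, and the function names are hypothetical rather than taken from the WebKit code.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples (components in 0..1)."""
    return tuple(s + (e - s) * t for s, e in zip(start, end))


def quadrant_colors(start, end):
    """Return (from_color, to_color) pairs for four 90-degree patches."""
    # Colors at 0, 1/4, 1/2, 3/4 and 1 of the full turn.
    stops = [lerp_color(start, end, i / 4) for i in range(5)]
    return list(zip(stops[:-1], stops[1:]))


red, yellow = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)
for from_color, to_color in quadrant_colors(red, yellow):
    print(from_color, "->", to_color)
```

Each pair would feed the two "start" corners and the two "end" corners of one Coons patch, in the same way the two-color quadrant example above uses red and yellow.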

Also, in cases where there are more than 4 colors, each Coons patch will be smaller than a quadrant. A general formula is needed that can compute the control points for each section of the circle, given an angle and a radius. After some math, the following formula can be inferred (check the section “Circles and cubic Bézier curves” in pomax’s essay):

cp1 = {
   x: cx + (r * cos(angleStart)) - f * (r * sin(angleStart)),
   y: cy + (r * sin(angleStart)) + f * (r * cos(angleStart))
}
cp2 = {
   x: cx + (r * cos(angleEnd)) + f * (r * sin(angleEnd)),
   y: cy + (r * sin(angleEnd)) - f * (r * cos(angleEnd))
}

where f is a variable computed as:

f = 4 * tan((angleEnd - angleStart) / 4) / 3;

For a 90-degree angle the value of f is approximately 0.552. Thus, if the quadrant above had a radius of 100px, the values of the control points would be CP1(155.2, 0) and CP2(200, 44.8), taking the top-left corner as point (0, 0) and the circle centered at (100, 100), so that the arc runs from (100, 0) to (200, 100).
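
The formula can be checked numerically with a short Python sketch. The function name is hypothetical, angles are assumed to be in radians, and the circle center (100, 100) is inferred from the control point values quoted above rather than stated in the post.

```python
import math


def arc_control_points(cx, cy, r, angle_start, angle_end):
    """Control points of a cubic Bézier approximating a circular arc.

    Angles are in radians; the origin is the top-left corner and
    y grows downwards, as in the examples of the post.
    """
    f = 4 * math.tan((angle_end - angle_start) / 4) / 3
    cp1 = (
        cx + r * math.cos(angle_start) - f * r * math.sin(angle_start),
        cy + r * math.sin(angle_start) + f * r * math.cos(angle_start),
    )
    cp2 = (
        cx + r * math.cos(angle_end) + f * r * math.sin(angle_end),
        cy + r * math.sin(angle_end) - f * r * math.cos(angle_end),
    )
    return cp1, cp2


# Arc from (100, 0) to (200, 100) on a circle centered at (100, 100).
cp1, cp2 = arc_control_points(100, 100, 100, -math.pi / 2, 0)
print([round(v, 1) for v in cp1 + cp2])
```

With these inputs the rounded values agree with CP1(155.2, 0) and CP2(200, 44.8). Calling the function with smaller angle spans gives the control points for patches narrower than a quadrant.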

And that’s basically all that is needed. The formula above allows us to compute the arc of a circular sector as a Bézier curve, which, when set up as a Coons patch, creates a section of a conical gradient. Adding several Coons patches together creates the final conical gradient.

Wrapping up

It has been a long time since conic gradients for the Web were first drafted. For instance, the current bug in Firefox’s Bugzilla was created by Lea Verou five years ago. Fortunately, browsers have started shipping native support, and conical gradients have been available in Chrome and Safari for two years. In this post I discussed the implementation, mainly the rendering, of conic gradients in WebKitGTK and WPE. Since both ports are WebKit-based, they can leverage the implementation efforts led by Apple when bringing support for this feature to Safari. With Firefox shipping conic gradient support soon, this feature will be safe to use in the Web Platform.

June 11, 2020 12:00 AM

June 10, 2020

Philippe Normand

WebKitGTK and WPE now supporting videos in the img tag

Using videos in the <img> HTML tag can lead to more responsive web-page loads in most cases. Colin Bendell blogged about this topic, make sure to read his post on the cloudinary website. As it turns out, this feature has been supported for more than 2 years in Safari, but only recently the WebKitGTK and WPEWebKit ports caught up. Read on for the crunchy details or skip to the end of the post if you want to try this new feature.

As WebKitGTK and WPEWebKit already heavily use GStreamer for their multimedia backends, it was natural for us to also use GStreamer to provide the video ImageDecoder implementation.

The preliminary step is to hook our new decoder into the MIMETypeRegistry and into the ImageDecoder.cpp platform-agnostic module. This is where the main decoder branches out to platform-specific backends. Then we need to add a new class implementing WebKit’s ImageDecoder virtual interface.

First you need to implement supportsMediaType(). For this method we already had all the code in place. WebKit scans the GStreamer plugin registry and depending on the plugins available on the target platform, a mime-type cache is built, by the RegistryScanner. Our new image decoder just needs to hook into this component (exposed as singleton) so that we can be sure that the decoder will be used only for media types supported by GStreamer.

The next most important parts of the decoder are its constructor and the setData() method. setData() is where the decoder receives encoded data, as a SharedBuffer, and performs the decoding. Because this method is synchronously called from a secondary thread, we run the GStreamer pipeline there, until the video has been decoded entirely. Our pipeline relies on the GStreamer decodebin element and WebKit’s internal video sink. For the time being hardware-accelerated decoding is not supported. We should soon be able to fix this issue though. Once all samples have been received by the sink, the decoder notifies its caller using a callback. The caller then knows it can request decoded frames.

Last, the decoder needs to provide decoded frames! This is implemented using the createFrameImageAtIndex() method. Our decoder implementation keeps an internal Map of the decoded samples. We sub-classed the MediaSampleGStreamer to provide an image() method, which returns the Cairo surface representing the decoded frame. Again, here we don’t support GL textures yet. Some more infrastructure work is likely going to be needed in that area of our WebKit ports.

Our implementation of the video ImageDecoder lives in ImageDecoderGStreamer, which will be shipped in WebKitGTK and WPEWebKit 2.30, around September/October. But what if you want to try this already? Well, building WebKit can be a tedious task. We’ve been hard at work at Igalia to make this a bit easier using a new Flatpak SDK. So, you can either try this feature in Epiphany Tech Preview or (surprise!) with our new tooling, which allows downloading and running nightly binaries from the upstream WebKit build bots:

$ wget
$ chmod +x webkit-flatpak-run-nightly
$ python3 webkit-flatpak-run-nightly MiniBrowser

This script locally installs our new Flatpak-based developer SDK in ~/.cache/wk-nightly and then downloads a zip archive of the build artefacts from servers recently brought up by my colleague Carlos Alberto Lopez Perez, many thanks to him :). The downloaded zip file is unpacked in /tmp and kept around in case you want to run this again without re-downloading the build archive. Flatpak is then used to run the binaries inside a sandbox! This is a nice way to run the bleeding edge of the web-engine, without having to build it or install any distro package.

Implementing new features in WebKit is one of the many areas of expertise we are involved in at Igalia. Our multimedia team is always on the lookout to help folks in their projects involving either GStreamer or WebKit or both! Don’t hesitate to reach out.

by Philippe Normand at June 10, 2020 07:00 AM

June 09, 2020

Alejandro Piñeiro

v3dv: quick guide to build and run some demos

Just today a status update on the Vulkan effort for the Raspberry Pi 4 was published, including the news that we are moving the development of the driver to an open repository. As it is quite likely that some people will be interested in testing it, even though it is far from complete, here is a quick guide to compile it and get some demos running.


So let’s start by installing some dependencies. My personal recipe, which I use every time I configure a new machine to work on Mesa, is the following one (sorry if some unneeded dependencies slipped in):

sudo apt-get install libxcb-randr0-dev libxrandr-dev \
        libxcb-xinerama0-dev libxinerama-dev libxcursor-dev \
        libxcb-cursor-dev libxkbcommon-dev xutils-dev \
        libpthread-stubs0-dev libpciaccess-dev \
        libffi-dev x11proto-xext-dev libxcb1-dev libxcb-*dev \
        bison flex libssl-dev libgnutls28-dev x11proto-dri2-dev \
        x11proto-dri3-dev libx11-dev libxcb-glx0-dev \
        libx11-xcb-dev libxext-dev libxdamage-dev libxfixes-dev \
        libva-dev x11proto-randr-dev x11proto-present-dev \
        libclc-dev libelf-dev git build-essential mesa-utils \
        libvulkan-dev ninja-build libvulkan1 python-mako \
        libdrm-dev libxshmfence-dev libxxf86vm-dev

Most Raspbian libraries are recent enough, but some of them have been updated during the past months, so just in case, don’t forget to update:

$ sudo apt-get update
$ sudo apt-get upgrade

Additionally, you will need to install meson. Mesa has recently bumped the meson version it requires, so the version packaged in Raspbian is not recent enough. There is the option to build meson from the tarball (meson-0.52.0 here), but by far the easiest way to get a recent meson version is using pip3:

$ pip3 install meson

2020-07-04 update

It seems that some people had problems if they had installed meson with apt-get on their system, as the build would pick up the older meson version first. They were able to fix that by doing this:

$ sudo apt-get remove meson
$ pip3 install --user meson

Download and build v3dv

This is the simplest recipe to build v3dv:

$ git clone mesa
$ cd mesa
$ git checkout wip/igalia/v3dv
$ meson --prefix /home/pi/local-install --libdir lib -Dplatforms=x11,drm -Dvulkan-drivers=broadcom -Ddri-drivers= -Dgallium-drivers=v3d,kmsro,vc4 -Dbuildtype=debug _build
$ ninja -C _build
$ ninja -C _build install

This builds and installs a debug version of v3dv in a local directory. You could choose a release build, or any other directory. The recipe also builds the OpenGL driver, just in case anyone wants to compare, but if you are only interested in the Vulkan driver, that is not mandatory.

Run some Vulkan demos

Now, the easiest way to ensure that a Vulkan program finds the driver is to set the following envvar:

export VK_ICD_FILENAMES=/home/pi/local-install/share/vulkan/icd.d/broadcom_icd.armv7l.json

That envvar is used by the Vulkan loader (installed as one of the dependencies listed before) to know which library to load. This also means that you don’t need to use LD_PRELOAD, LD_LIBRARY_PATH or similar.
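
If a demo fails to find the driver, a quick sanity check is to look at what the manifest actually points to. The following Python sketch is only an illustration: it assumes the usual Vulkan loader ICD manifest layout (a JSON file with an "ICD" object containing a "library_path" entry), and the helper name is hypothetical.

```python
import json
import os


def icd_library_paths(env_value):
    """Return the driver library named by each ICD manifest in env_value."""
    libraries = {}
    # On Linux the variable may hold several manifests separated by ':'.
    for manifest in filter(None, env_value.split(os.pathsep)):
        with open(manifest, encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumed layout: {"ICD": {"library_path": "...", ...}}
        libraries[manifest] = data["ICD"]["library_path"]
    return libraries


if __name__ == "__main__":
    value = os.environ.get("VK_ICD_FILENAMES", "")
    if not value:
        print("VK_ICD_FILENAMES is not set")
    for manifest, library in icd_library_paths(value).items():
        print(f"{manifest} -> {library}")
```

With the export above in place, the library printed should be the v3dv library installed under /home/pi/local-install.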

So what Vulkan programs are working? For example, several of the Sascha Willems Vulkan demos. To make things easier for everybody, here is another quick recipe for getting them built:

$ sudo apt-get install libassimp-dev
$ git clone --recursive  sascha-willems
$ cd sascha-willems
$ mkdir build; cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug  ..
$ make

Update 2020-08-03: When the post was originally written, some demos didn’t need extra assets. Recently the fonts were moved there, so you now always need to fetch the assets:

$ cd ..
$ python3

So in order to see a really familiar demo:

$ cd build/bin
$ ./gears

And one slightly more complex:


As mentioned, not all the demos work. Here is a list of some that we tested and that seem to work:
* distancefieldfonts
* descriptorsets
* dynamicuniformbuffer
* gears
* gltfscene
* imgui
* indirectdraw
* occlusionquery
* parallaxmapping
* pbrbasic
* pbribl
* pbrtexture
* pushconstants
* scenerendering
* shadowmapping
* shadowmappingcascade
* specializationconstants
* sphericalenvmapping
* stencilbuffer
* textoverlay
* texture
* texture3d
* texturecubemap
* triangle
* vulkanscene

Update : rpiMike in the comments, and some people privately, have pointed out some errors in the post. Thanks! And sorry for the inconvenience.

Update 2 : Mike Hooper pointed out more issues on GitLab

by infapi00 at June 09, 2020 09:33 AM

June 08, 2020

Jacobo Aragunde

Dialog accessibility in Chromium

In recent weeks I’ve been identifying and fixing several issues related to accessibility on dialogs (called “bubbles” in the code base), especially but not limited to the Linux platform.

It all started with the “Restore pages” dialog that appears when restarting after a browser crash. ATs, like screen readers, were not being notified about the presence of that dialog because it used an incorrect role, which made it impossible for a blind user to find it except by chance, tabbing through the application.

While I was working on that, I detected more issues related to this and other dialogs, so I started reporting and fixing them individually. They also led me to an existing meta-bug related to the “restore pages” dialog and accessibility… In the end, this is what I accomplished:

For the original issue with ATs not being able to report the “restore pages” dialog, a quick solution was to make this subwindow use the appropriate ATK role, “alert”, and implement some code in Orca, the screen reader, to detect an alert on a newly created browser window.

Now that users are notified of the presence of that dialog, it would be great to provide them with a way to focus it directly. Two related hotkeys are available in Chromium: F6 to rotate the pane focus, which should focus dialogs first if they are present, and Alt+Shift+A to specifically focus a dialog; but they did not work with that dialog. I fixed this problem, making the hotkey handler code look for dialogs anchored to the menu icon, where the “restore pages” dialog is located. This problem was affecting all platforms, so it’s been a big gain!

Testing the existing hotkey code led me to trying out other kinds of dialogs, and I noticed that permission dialogs (like “a website wants to know your location”) had similar problems: they were not notified and not affected by hotkeys. I fixed both things by making sure that the dialogs have the proper role, that the alert events are properly managed by the Linux accessibility backend, and checking the browser omnibar for anchored dialogs when the focus hotkeys are used.

I detected similar problems in the “store password” bubble; the Orca screen reader was unable to announce it, because it didn’t have the expected role, nor did it emit the proper events. Changing the role of the bubble was enough to activate the code that triggered the events, fixing both problems at the same time.

Finally, working with dialogs and alerts made us reconsider their role mappings for ATK, which we decided to modify to better match Chromium and ARIA roles.

There are more enhancements in the backlog. For example, we will try to minimize the number of redundant alert events, or come up with a more general solution to decide the role of a dialog, which is also causing similar accessibility problems on Windows (e.g. with the “restore pages” dialog).

Thanks a lot to everyone who helped land these patches, especially the Googlers who provided their feedback on reviews!

by Jacobo Aragunde Pérez at June 08, 2020 04:30 PM

June 07, 2020

Ricardo García

Visualizing images from VK-GL-CTS test results

When running OpenGL or Vulkan tests normally from VK-GL-CTS, the test suite executable will usually produce a file named TestResults.qpa containing test results in XML format. Sometimes either by default or when a test fails, this file contains output images obtained typically from the test output itself and maybe a reference image the former is being compared to. In addition, sometimes an error mask is provided so implementors can easily detect cases of pixels falling outside the expected result range.

These images are normally converted to PNGs using 32 bits per pixel (RGBA8) and represented in the test log as a string of Base64-encoded binary data. Representing the result image in that format can be an approximation exercise when the original format or type is very different (like R32G32B32A32_SFLOAT or a 3D image), but in some situations the result image can be faithfully represented in that chunk of PNG-encoded data.

To view those images, the VK-GL-CTS README file mentioned the possibility of using the Cherry tool. Given that it requires setting up a web server and running tests in a special way, some people would typically rely on external tools or scripts instead, like the base64 command line tool included in GNU coreutils, which can encode and decode Base64 content.

My Igalia colleague Eduardo Lima, however, had another idea. Most people have a tool on their systems which is capable of directly displaying Base64-encoded PNG data: a web browser. With a little bit of JavaScript, he had created a self-contained, single-page web app you could use to view images by pasting the PNG-encoded version of those images directly. The tool created <img> elements on the page from the pasted text, and used data:image/png;base64,<Base64-encoded data> as the image src property. It also allowed comparing two images side by side and was, in general, incredibly handy.

Alejandro Piñeiro, currently working on the Raspberry Pi Vulkan driver also at Igalia, suggested improving the existing tool by allowing it to read TestResults.qpa files directly in order to reduce friction. I’m far from a web programmer and I’m sorry for the many times I have probably sinned along the way, but I took his suggestions and my own pet peeves with the existing tool and implemented an improved version of it. I submitted my new tool for review and inclusion as part of VK-GL-CTS and I’m happy to say it landed not long ago. If you use VK-GL-CTS, do not hesitate to open qpa_image_viewer.html from the scripts subdirectory in your web browser and give it a go when visualizing test results.

I have also uploaded a copy of the tool to my own personal space at Igalia. Feel free to bookmark it and use it when needed. It’s just a few KB in size. As I mentioned before, it’s self-contained and standalone, so everything happens locally in your browser. You can read its source code as it’s also relatively small. You can open TestResults.qpa files directly from your system and you can also paste chunks of text containing <Image> elements. It accumulates images in the images section at the bottom, i.e., every time you tell it to process a chunk of text or process a new file, it will add the images it finds to the images section. To ease identifying problems, it also includes a built-in zoom tool. Images have the image-rendering: pixelated CSS property, which is supported in Chromium and WebKit-based browsers (sadly, not Firefox for now), making the zooming process use the nearest-neighbor approximation instead of a linear interpolation, so pixels are represented as faithfully as possible when scaling.
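
For those who prefer scripting, the same idea can be sketched in a few lines of Python. This is not how qpa_image_viewer.html is implemented; it is a minimal sketch that assumes each image is stored as an <Image ...>Base64 PNG data</Image> element with an optional Name attribute, and the function name is hypothetical.

```python
import base64
import re

# Assumed shape of an image entry: <Image Name="..." ...>Base64 PNG data</Image>
# The \b keeps wrapper elements such as <ImageSet> from matching.
IMAGE_RE = re.compile(r'<Image\b([^>]*)>(.*?)</Image>', re.DOTALL)
NAME_RE = re.compile(r'Name="([^"]*)"')


def extract_images(qpa_text):
    """Yield (name, data_uri, png_bytes) for every <Image> element found."""
    for attributes, payload in IMAGE_RE.findall(qpa_text):
        encoded = "".join(payload.split())  # drop line breaks inside the payload
        name_match = NAME_RE.search(attributes)
        name = name_match.group(1) if name_match else ""
        yield name, "data:image/png;base64," + encoded, base64.b64decode(encoded)


# Tiny made-up sample: the payload is just the 8-byte PNG signature.
sample = '<Image Name="Result" Width="1" Height="1">iVBORw0KGgo=</Image>'
for name, uri, png in extract_images(sample):
    print(name, uri, png[:4])
```

The data URI is the same string the web tool assigns to the src property of an <img> element, while the decoded bytes could be written to a .png file instead.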

June 07, 2020 01:59 PM

June 05, 2020

Igalia Compilers Team

What we do at Igalia’s Compiler Team

Compilers for the web

At Igalia, our development teams have included a team specializing in compilers since around 2012. Since most tech companies don’t work on compilers or even more generally on programming language implementation, you might be wondering “What does a compilers team even do?”. This blog post will try to explain, as well as highlight some of our recent work.

While many companies that work on compilers own or maintain their own programming language (e.g., Google and Go, Apple and Swift, Mozilla and Rust) or a domain-specific compiler or language, Igalia is a little bit different.

Since we are a consulting company, our compiler team instead helps maintain and improve existing free software/open source programming language implementations, with a focus on languages for the web. In other words, we help improve JavaScript engines and, more recently, WebAssembly (Wasm) runtimes.

To actually do the work, Igalia has grown a compilers team of developers from a variety of backgrounds. Some of our developers came into the job from a career in industry, and others from a research or academic setting. Our developers are contributors to a variety of non-web languages as well, including functional programming languages and scripting languages.

Our recent work

Given our team’s diverse backgrounds, we are able to work not only on compiler implementations (which includes compilation, testing, maintenance, and so on) but also on the standardization process for language features. To be more specific, here are some examples of projects we’re working on, split into several areas:

  • Maintenance: We work on the maintenance of JS engines to make sure they work well on platforms that our customers care about. For example, we maintain the support for 32-bit architectures in JavaScriptCore (WebKit’s JS engine). This is especially important to us because WebKit is used on billions of embedded devices and we are the maintainers of WPE, the official WebKit port for embedded systems.
    • This involves things like making sure that CI continues to pass on platforms like ARMv7 and MIPS, and also making sure that JS engine performance is good on these platforms.
    • Recently, some of our developers have been sharing their knowledge about JSC development in several blog posts. [1], [2], [3]
  • JS feature development & standardization: We also work on implementing features proposed by the web platform community in all of the major JS engines, and we work on standardizing features as participants in TC39.
    • Recently we have been doing a lot of work around class fields and private methods in multiple browsers.
    • We’re also involved in the work on the Temporal proposal for better date/time management in JS.
    • Another example of our recent work in standardization is the BigInt feature, which is now part of the language specification for JS. Igalians led work on both the specification and also its implementation in browsers. [1], [2] We are currently working on integrating BigInts with WebAssembly as well.
  • WebAssembly: In the last year, we have gotten more involved in helping to improve Wasm, the new low-level compiler target language for the web (so that you can write C/C++/etc. code that will run on the web).

In the future, we’ll continue to periodically put pointers to our recent compilers work on this blog, so please follow along!

If you think you might be interested in helping to expand the web platform as a customer, don’t hesitate to get in touch!

by compilers at June 05, 2020 06:28 PM

Paulo Matos

JSC: what are my options?

Compilers tend to be large pieces of software that provide an enormous number of options. We take a quick look at how to find what JavaScriptCore (JSC) provides.


by Paulo Matos at June 05, 2020 12:23 PM

June 03, 2020

Andy Wingo

a baseline compiler for guile

Greets, my peeps! Today's article is on a new compiler for Guile. I made things better by making things worse!

The new compiler is a "baseline compiler", in the spirit of what modern web browsers use to get things running quickly. It is a very simple compiler whose goal is speed of compilation, not speed of generated code.

Honestly I didn't think Guile needed such a thing. Guile's distribution model isn't like the web, where every page you visit requires the browser to compile fresh hot mess; in Guile I thought it would be reasonable for someone to compile once and run many times. I was never happy with compile latency but I thought it was inevitable and anyway amortized over time. Turns out I was wrong on both points!

The straw that broke the camel's back was Guix, which defines the graph of all installable packages in an operating system using Scheme code. Lately it has been apparent that when you update the set of available packages via a "guix pull", Guix would spend too much time compiling the Scheme modules that contain the package graph.

The funny thing is that it's not important that the package definitions be optimized; they just need to be compiled in a basic way so that they are quick to load. This is the essential use-case for a baseline compiler: instead of trying to make an optimizing compiler go fast by turning off all the optimizations, just write a different compiler that goes from a high-level intermediate representation straight to code.

So that's what I did!

it don't do much

The baseline compiler skips any kind of flow analysis: there's no closure optimization, no contification, no unboxing of tagged numbers, no type inference, no control-flow optimizations, and so on. The only whole-program analysis that is done is a basic free-variables analysis so that closures can capture variables, as well as assignment conversion. Otherwise the baseline compiler just does a traversal over programs as terms of a simple tree intermediate language, emitting bytecode as it goes.
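
To give a flavor of what "a traversal that emits code as it goes" means, here is a toy sketch in Python. It is not Guile's compiler: the tuple-based tree language, the stack-machine instructions and every name in it are made up for illustration, and Guile's real intermediate language and bytecode are different.

```python
def compile_expr(expr, code):
    """Emit stack-machine instructions for expr in a single traversal."""
    if isinstance(expr, (int, float)):
        code.append(("const", expr))
    elif isinstance(expr, str):
        code.append(("load", expr))      # variable reference
    else:
        op, *args = expr
        for arg in args:                 # compile operands left to right
            compile_expr(arg, code)
        code.append(("call", op, len(args)))
    return code


def run(code, env):
    """Tiny interpreter, only here to check the emitted code."""
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    stack = []
    for instr in code:
        if instr[0] == "const":
            stack.append(instr[1])
        elif instr[0] == "load":
            stack.append(env[instr[1]])
        else:
            _, op, argc = instr
            args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(ops[op](*args))
    return stack.pop()


code = compile_expr(("+", "x", ("*", 2, 3)), [])
print(code)
print(run(code, {"x": 1}))
```

There is no analysis pass anywhere: each node is visited once and its instructions are appended immediately, which is what keeps this style of compiler fast.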

Interestingly the quality of the code produced at optimization level -O0 is pretty much the same.

This graph shows generated code performance of the CPS compiler relative to the new baseline compiler, at optimization level 0. Bars below the line mean the CPS compiler produces slower code. Bars above mean CPS makes faster code. You can click and zoom in for details. Note that the Y axis is logarithmic.

The tests in which -O0 CPS wins are mostly because the CPS-based compiler does a robust closure optimization pass that reduces allocation rate.

At optimization level -O1, which adds partial evaluation over the high-level tree intermediate language and support for inlining "primitive calls" like + and so on, I am not sure why CPS peels out in the lead. No additional important optimizations are enabled in CPS at that level. That's probably something to look into.

Note that the baseline of this graph is optimization level -O1, with the new baseline compiler.

But as I mentioned, I didn't write the baseline compiler to produce fast code; I wrote it to produce code fast. So does it actually go fast?

Well, against the -O0 and -O1 configurations of the CPS compiler, it does excellently:

Here you can see comparisons between what will be Guile 3.0.3's -O0 and -O1, compared against their equivalents in 3.0.2. (In 3.0.2 the -O1 equivalent is actually -O1 -Oresolve-primitives, if you are following along at home.) What you can see is that at these optimization levels, for these 8 files, the baseline compiler is around 4 times as fast.

If we compare to Guile 3.0.3's default -O2 optimization level, or -O3, we see bigger disparities:

Which is to say that Guile's baseline compiler runs at about 10x the speed of its optimizing compiler, which incidentally is similar to what I found for WebAssembly compilers a while back.

Also of note is that -O0 and -O1 take essentially the same time, with -O1 often taking less time than -O0. This is because partial evaluation can make the program smaller, at a cost of being less straightforward to debug.

Similarly, -O3 usually takes less time than -O2. This is because -O3 is allowed to assume top-level bindings that aren't exported from a module can be transformed to lexical bindings, which are more available for contification and inlining, which usually leads to smaller programs; it is a similar debugging/performance tradeoff to the -O0/-O1 case.

But what does one gain when choosing to spend 10 times more on compilation? Here I have a gnarly graph that plots performance on some microbenchmarks for all the different optimization levels.

Like I said, it's gnarly, but the summary is that -O1 typically gets you a factor of 2 or 4 over -O0, and -O2 often gets you another factor of 2 above that. -O3 is mostly the same as -O2 except in magical circumstances like the mbrot case, where it adds an extra 16x or so over -O2.

worse is better

I haven't seen the numbers yet of this new compiler in Guix, but I hope it can have a good impact. Already in Guile itself though I've seen a couple interesting advantages.

One is that because it produces code faster, Guile's bootstrap from source can take less time. There is also a felicitous feedback effect in that because the baseline compiler is much smaller than the CPS compiler, it takes less time to macro-expand, which reduces bootstrap time (as bootstrap has to pay the cost of expanding the compiler, until the compiler is compiled).

The second fortunate result is that now I can use the baseline compiler as an oracle for the CPS compiler, when I'm working on new optimizations. There's nothing worse than suspecting that your compiler miscompiled itself, after all, and having a second compiler helps keep me sane.

stay safe, friends

The code, you ask? Voici.

Although this work has been ongoing throughout the past month, I need to add some words on the now before leaving you: there is a kind of cognitive dissonance between nerding out on compilers in the comfort of my home, rain pounding on the patio, and at the same time the world on righteous fire. I hope it is clear to everyone by now that the US police are an essentially racist institution: they harass, maim, and murder Black people at much higher rates than whites. My heart is with the protestors. Godspeed to you all, from afar. At the same time, all my non-Black readers should reflect on the ways they participate in systems that support white supremacy, and on strategies to tear them down. I know I will be. Stay safe, wear eye protection, and until next time: peace.

by Andy Wingo at June 03, 2020 08:39 PM

May 26, 2020

Brian Kardell

Web Engine Diversity and Ecosystem Health

For many years, we've seen lots of blog posts and conversation on the topic of "browser engine diversity". I'd like to offer a slightly tilted view of this based instead on "the health of the ecosystem", and explain why I think this is a more valuable measure and way to discuss these topics.

Back in January, Jeremy Keith compared the complexity of browser engine diversity to political systems...

If you have hundreds of different political parties, that’s not ideal. But if you only have one political party, that’s very bad indeed!

I like this analogy because I think he's right in even more ways than he intended. It's almost self-evident: 1 is too few, a hundred is too many - it's just chaos and noise. But what is the ideal number? This question dogs us a lot, in part because it is not only unanswerable, it's actually kind of a red herring. I think there's something to this that leads to an important takeaway about how we think about the problem...

Numbers and goals...

The interesting part about this analogy is that the simple number of political parties isn't actually a very meaningful measure: Parties can differ a lot, or they can differ a little. 6 parties that barely disagree is quite a different thing than 3 that disagree substantially. Further, it's not simply a matter of giving voice to every dissenting opinion that matters either: There are political parties formed on ideas of hate, for example. Adding those doesn't add good things. In short, what really matters is the goal of those numbers.

In the case of the political analogy, the real aim is about what makes for a healthy, effective and just model of governance. Similarly on the topic of "browser engine diversity", I believe that the real goal is "What makes for a healthy and effective ecosystem?" and that the numbers we frequently talk about (and how) can easily lead us into perhaps the wrong takeaways.

Older engines and diversity

It is fairly common that discussions of browser diversity point to the Web's own history for examples of "why engine diversity matters": IE vs Netscape are often held up. IE's dominance and ability to dictate the market (or even kill it in this case) was overcome by diversity. This is true, but very incomplete: There's a lot more beneath the surface...

Consider, for example, the fact that for a time there were only two mainstream browsers - and the very dominant one (IE) was both entirely proprietary and based on a single proprietary OS.

Consider how different this could have been if IE had been open source and properly licensed. Even if Microsoft didn't want it to run on other operating systems, someone else could have stepped in and filled that gap. If Microsoft had walked away from the Web, or worse, gone belly up, someone could have forked and taken over the project. And if, for any reason at all, the primary maintainer could not prioritize something, other people and organizations could have invested more directly in advancements.

A while later, Opera's Presto was very interesting, even multi-OS - but ultimately similarly proprietary.

So, while we had greater 'diversity', these numbers alone aren't a great measure of the overall health of the ecosystem.

The most healthy, open ecosystem ever.

This is in stark contrast to the situation today: In important ways, we are a more diverse, efficient and healthier ecosystem with the three multi-OS, open source engines we have left (Blink, Gecko and WebKit) than when we had more engines but were dominated by projects that weren't that at all.

A lot of people seem to want to suggest that there aren't really 3 today because Blink was born from a fork of WebKit, or because they think that WebKit is somehow "Just for iOS".

There are certainly 3 entirely different JavaScript engines, and 3 entirely different architectures. There are some bits that remain similar, but the truth is that this is neither new nor easily solvable. Web engines today are so astonishingly complex (measured in tens of millions of lines of code) that we only get new ones through evolution. For the most part, this has been true for a long time. Even Netscape was a "take 2" from many of the same minds from Mosaic. Mozilla was created from a rework of Netscape. Even IE was born by licensing the Mosaic code. WebKit was born by forking KHTML. In other words: The critical investment required to create a full-blown compliant browser from nothing would be mind-boggling if everything were stopped today - and it never stops.

Today, the 3 that remain are all multi-OS.

"...Wait, even WebKit?" - you, just now.

Yes! I'm glad you asked! The GNOME flagship Epiphany/Web browser for Linux is WebKit based -- and billions of devices are based on Embedded WebKit. PS4 is a fun example of something lots of people might be familiar with -- not only the Web browser on your PS4, but in fact lots of the UI itself is made with Web content running in a WebKit based browser. But chances are pretty good that you encounter embedded WebKit all the time: On cable boxes, smart TVs and appliances, kiosks, car and airplane infotainment systems, digital signage and so on.

This is possible because WebKit (and all of the engines today) are open source. Because of this, they all receive investments more widely than the org that maintains them, and that's expanding in important ways that are very good for the health of the ecosystem.

Igalia, where I work, is for example able to work on all of the browsers and help expand investment in advancing the commons. The list of things we have helped advance, and of who has helped fund that, is growing all the time: From recent things in JavaScript like class fields and Big Integer, to CSS features like CSS Grid, or features in HTML like lazy loading - the benefits of this are clear.

This also leads to lots of interesting new opportunities. For example, we are the creators/maintainers of WPE, the official WebKit Port for Embedded that powers tons of those devices mentioned above. This matters a lot because a rising tide lifts all boats in the commons. You can see an example of this in that Igalia is advancing work on SVG2 and hardware acceleration for SVG in WebKit - a topic which has failed to get the prioritization for years, but is especially critical for lots of embedded use cases. If you're interested in this, as well as some interesting history of SVG, WebKit, HTML/CSS and embedded browsers, have a listen to this.

Opportunities to do better...

There is another interesting thing here that is always left out of these discussions but I think is considerably interesting in this reframing. A funny little quirk of history is that while we say there are 3 rendering engines, that's only partially true.

More specifically, there are 3 modern browser rendering engines, and a bunch of other rendering engines that aren't that.

Amazon, for example, has a renderer that is used for ebooks and printing. PrinceXML has a renderer used for print. Antenna House too. And there are more still.

For the most part, these are proprietary and what they do and don't support isn't necessarily clear cut. Their support for modern standards is pretty ragged and the gaps are only likely to grow faster. This is a shame as there are lots of potentially useful things for these industries, like Houdini, which are unlikely to ever exist for them.

This is a topic I'd love to see discussed more for several reasons - but mainly they center on the fact that it is just harder for us to move forward together if things are too fragmented. Instead of growing together and a rising tide lifting all boats - the boats are kind of all in entirely different bodies of water.

I would love to see some effort to resolve this. Imagine if these vendors came into the fray and either invested in basing their work on modern engines (hopefully various), or pulled together to create something new. That might be the one practical way we could arrive at an actual viable 4th engine somehow.

Why now is a good time to discuss it

Most of these engines also contain some things that were developed in standards organizations but which no browser supports... That's kind of why they were created. However, there are new opportunities to align...

Web engine architectures are like a lot of engineering projects - they grow, they get crufty, you get locked into certain boundaries that you know you'd like to escape. However, until you do, you can't really afford to tackle certain kinds of problems. That's a big part of why browser engines today don't do a lot of the things these vendors needed to fill their use cases. Browser architectures have traditionally not been well-prepared for the problems of print or ebooks, and that's part of why we find ourselves in the current situation.

However, for many years, there has been much rework and rearchitecture of layout engines aimed at solving precisely the fundamental sorts of things that prevented those sorts of things from being considered.

Stir in that most of our office suites and similar tools are now web based, and there are considerable incentives to figure out things like good print and pagination.

More open prioritization

But it's not just the projects being open source that matters, it's also that this affords us new opportunities for standardization itself.

Ultimately, it is the calculus of prioritization which impacts our ability to get things done almost more than anything else. Historically, lots of companies get together and, effectively, ask that a very few companies do the work. They are big organizations with big investments, to be sure - but they are all constrained by hard limits. The gauntlet of issues surrounding our ability to prioritize everything from attention to actual implementation has stymied many a topic.

But in this new age, companies like Igalia are showing that there is potentially a whole new game here: Where centralized, concrete investments from outside that small group can lift up the commons, benefit everyone and help drive progress.

In other words: Things are better and healthier because we continue to find better ways to work together. And when we do, everyone does better.

May 26, 2020 04:00 AM

May 14, 2020

Mario Sanchez Prada

The Web Platform Tests project

Web Browsers and Test Driven Development

Working on Web browsers development is not an easy feat but if there’s something I’m personally very grateful for when it comes to collaborating with this kind of software projects, it is their testing infrastructure and the peace of mind that it provides me with when making changes on a daily basis.

To help you understand the size of these projects, they involve millions of lines of code (Chromium is ~25 million lines of code, followed closely by Firefox and WebKit) and around 200-300 new patches landing every day. Try to imagine, for one second, how we could make changes if we didn’t have such testing infrastructure. It would basically be utter and complete chaos and, more especially, it would mean extremely buggy Web browsers, broken implementations of the Web Platform and tens (hundreds?) of new bugs and crashes piling up every day… not a good thing at all for Web browsers, which are these days some of the most widely used applications (and not just ‘the thing you use to browse the Web’).

The Chromium Trybots in action

Now, there are all different types of tests that Web engines run automatically on a regular basis: Unit tests for checking that APIs work as expected, platform-specific tests to make sure that your software runs correctly in different environments, performance tests to help browsers stay fast without increasing their memory footprint too much… and then, of course, there are the tests to make sure that the Web engines at the core of these projects implement the Web Platform correctly according to the numerous standards and specifications available.

And it’s here that I would like to draw your attention with this post because, when it comes to this last kind of test (what we call “Web tests” or “layout tests”), each Web engine used to rely entirely on its own set of Web tests to make sure that it implemented the many different specifications correctly.

Clearly, there was some room for improvement here. It would be wonderful if we could have an engine-independent set of tests to test that a given implementation of the Web Platform works as expected, wouldn’t it? We could use that across different engines to make sure not only that they work as expected, but also that they behave in exactly the same way, and therefore give Web developers confidence that they can rely on the different specifications without having to implement engine-specific quirks.

Enter the Web Platform Tests project

The good news is that just such an ideal thing exists. It’s called the Web Platform Tests project. As it is concisely described on its official site:

“The web-platform-tests project is a cross-browser test suite for the Web-platform stack. Writing tests in a way that allows them to be run in all browsers gives browser projects confidence that they are shipping software which is compatible with other implementations, and that later implementations will be compatible with their implementations.”

I’d recommend visiting its website if you’re interested in the topic, watching the “Introduction to the web-platform-tests” video or even glancing at the git repository containing all the tests. There you can also find specific information such as how to run WPTs or how to write them. You can also have a look at the dashboard to get a sense of what tests exist and how some of the main browsers are doing.

In short: I think it would be safe to say that this project is critical to the health of the whole Web Platform, and ultimately to Web developers. What’s very, very surprising is how long it took to get to where it is, since it came into being only about halfway into the history of the Web (there were earlier testing efforts at the W3C, but none that focused on automated & shared testing). But regardless of that, this is an interesting challenge: Filling in all of the missing unified tests, while new things are being added all the time!

Luckily, this was a challenge that did indeed take off, and all the major Web engines can now proudly say that they are regularly running about 36500 of these engine-independent Web tests (providing ~1.7 million sub-tests in total), with all the engines showing a pass rate between 91% and 98%. See the numbers below, as extracted from today’s WPT data:

Browser              Pass      Total     Pass rate
Chrome 84            1680105   1714711   97.98%
Edge 84              1669977   1714195   97.42%
Firefox 78           1640985   1698418   96.62%
Safari 105 preview   1543625   1695743   91.03%
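
For anyone who wants to recompute these percentages from the raw counts, the arithmetic is simply passes divided by total sub-tests, rounded to two decimals. A small C++ sketch follows; the function name `PassRatePercent` is just an illustrative choice and is not part of any WPT tooling.

```cpp
#include <cassert>
#include <cmath>

// Pass rate as a percentage, rounded to two decimal places.
double PassRatePercent(long long pass, long long total) {
  assert(total > 0 && pass >= 0 && pass <= total);
  double pct = 100.0 * static_cast<double>(pass) / static_cast<double>(total);
  return std::round(pct * 100.0) / 100.0;
}
```

For instance, `PassRatePercent(1680105, 1714711)` corresponds to the Chrome 84 column, and the other three columns can be checked the same way.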

And here at Igalia, we’ve recently had the opportunity to work on this for a little while and so I’d like to write a bit about that…

Upstreaming Chromium’s tests during the Coronavirus Outbreak

As you all know, we’re in the middle of an unprecedented world-wide crisis that is affecting everyone in one way or another. One particular consequence of it in the context of the Chromium project is that Chromium releases were paused for a while. On top of this, some constraints on what could be landed upstream were put in place to guarantee quality and stability of the Chromium platform during this strange period we’re going through these days.

These particular constraints impacted my team in that we couldn’t really keep working on the tasks we were working on up to that point, in the context of the Chromium project. Our involvement with the Blink Onion Soup 2.0 project usually requires the landing of relatively large refactors, and these kinds of changes were forbidden for the time being.

Fortunately, we found an opportunity to collaborate in the meantime with the Web Platform Tests project by analyzing and trying to upstream many of the existing Chromium-specific tests that haven’t yet been unified. This is important because tests exist for widely used specifications, but if they aren’t in Web Platform Tests, their utility and benefits are limited to Chromium. If done well, this would mean that all of the tests that we managed to upstream would be immediately available for everyone else too. Firefox and WebKit-based browsers would not only be able to identify missing features and bugs, but also be provided with an extra set of tests to check that they were implementing these features correctly, and interoperably.

The WPT Dashboard

It was an interesting challenge considering that we had to switch very quickly from writing C++ code around the IPC layers of Chromium to analyzing, migrating and upstreaming Web tests from the huge pool of Chromium tests. We focused mainly on CSS Grid Layout, Flexbox, Masking and Filters related tests… but I think the results were quite good in the end:

As of today, I’m happy to report that, during the ~4 weeks we worked on this, my team migrated 240 Chromium-specific Web tests to the Web Platform Tests’ upstream repository, helping increase test coverage in other Web engines and thus helping towards improving interoperability among browsers:

  • CSS Flexbox: 89 tests migrated
  • CSS Filters: 44 tests migrated
  • CSS Masking: 13 tests migrated
  • CSS Grid Layout: 94 tests migrated

But there is more to this than just numbers. Ultimately, as I said before, these migrations should help identify missing features and bugs in other Web engines, and that was precisely the case here. You can easily see this by checking the list of automatically created bugs in Firefox’s bugzilla, as well as some of the bugs filed in WebKit’s bugzilla during the time we worked on this.

…and note that this doesn’t even include the additional 96 Chromium-specific tests that we analyzed but determined were not yet eligible for migrating to WPT (normally because they relied on some internal Chromium API or non-standard behaviour), which would require further work to get them upstreamed. But that was a bit out of scope for those few weeks we could work on this, so we decided to focus on upstreaming the rest of the tests instead.

Personally, I think this was a big win for the Web Platform and I’m very proud and happy to have had an opportunity to contribute to it during these dark times we’re living through, as part of my job at Igalia. Now I’m back to working on the Blink Onion Soup 2.0 project, which I think I should write about too, but that’s a topic for a different blog post.

Credit where credit is due

I wouldn’t want to finish off this blog post without acknowledging all the different contributors who tirelessly worked on this effort to help improve the Web Platform by providing the WPT project with this many more tests, so here it is:

From the Igalia side, my whole team was the one which took on this challenge, that is: Abhijeet, Antonio, Gyuyoung, Henrique, Julie, Shin and myself. Kudos everyone!

And from the reviewing side, many people chimed in but I’d like to thank in particular the following persons, who were deeply involved with the whole effort from beginning to end regardless of their affiliation: Christian Biesinger, David Grogan, Robert Ma, Stephen Chenney, Fredrik Söderquist, Manuel Rego Casasnovas and Javier Fernandez. Many thanks to all of you!

Take care and stay safe!

by mario at May 14, 2020 09:07 AM

May 11, 2020

Gyuyoung Kim

How Chromium Got its Mojo?

Chromium IPC

Chromium has a multi-process architecture to become more secure and robust, like modern operating systems, which means that Chromium has a lot of processes communicating with each other: for example, the renderer process, browser process, GPU process, utility process, and so on. Those processes have been communicating using IPC [1].

Why is Mojo needed?

As a long-term intent, the Chromium team wanted to refactor Chromium into a large set of smaller services. To achieve that, they considered the questions below [3]:

  • Which services do we bring up?
  • How can we isolate these services to improve security and stability?
  • Which binary features can we ship?

They learned much from using the legacy Chromium IPC and maintaining Chromium dependencies over the past years. They felt a more robust messaging layer could allow them to integrate a large number of components without link-time interdependencies, as well as help to build more and better features, faster, and with much less cost to users. That is why the Chromium team began building the Mojo communication framework.

From the performance perspective, Mojo is 3 times faster than the old IPC in Chrome and involves ⅓ less context switching [3]. Also, we can remove unnecessary layers, like the content/renderer layer, to communicate between different processes, because the code generated from the Mojom IDL lets us easily connect interface clients and implementations across arbitrary inter-process boundaries. Lastly, Mojo is a collection of runtime libraries providing a platform-agnostic abstraction of common IPC primitives, so we can build higher-level bindings APIs to simplify messaging for developers writing C++, Java, or JavaScript.

Status of migrating legacy IPC to Mojo

Igalia has been working on the migration in earnest this year, but hundreds of legacy IPC messages still remain in Chromium. The chart below shows the progress of migrating legacy IPC to Mojo [4].

Mojo Terminology

Let’s briefly take a look at the key terminology before starting the migration.

  • Message Pipe: A pair of endpoints, either of which may be transferred over another message pipe. Because we bootstrap a primordial message pipe between the browser process and each child process, either end of a newly created pipe can ultimately be sent to any process, and the two ends will still be able to talk to each other seamlessly and exclusively. We don’t need to use routing IDs anymore. Each endpoint has a queue of incoming messages.
  • Mojom file: Defines interfaces, which are strongly-typed collections of messages. Each interface message is roughly analogous to a single prototype message.
  • Remote: Used to send messages described by the interface.
  • Receiver: Used to receive the interface messages sent by Remote.
  • PendingRemote: Typed container to hold the other end of a Receiver’s pipe.
  • PendingReceiver: Typed container to hold the other end of a Remote’s pipe.
  • AssociatedRemote/Receiver: Similar to a Remote and a Receiver, but they run multiple interfaces over a single message pipe while preserving message order, because the AssociatedRemote/Receiver was implemented by using the IPC::Channel used by legacy IPC messages.
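
To make the relationship between a pipe and its two ends a bit more tangible, here is a toy model in plain C++. To be clear, none of these types exist in Mojo: `ToyPipeState`, `ToyRemote`, `ToyReceiver` and `CreateToyPipe` are hypothetical names, and the real bindings are typed, asynchronous and work across processes. The sketch only illustrates the idea that a Remote writes into a queue that exactly one Receiver reads from.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Toy model only: these types are hypothetical and are not the Mojo API.
struct ToyPipeState {
  std::deque<std::string> incoming;  // messages waiting for the receiver
};

// Sending end: appends messages to the shared queue.
class ToyRemote {
 public:
  explicit ToyRemote(std::shared_ptr<ToyPipeState> s) : state_(std::move(s)) {}
  void Send(const std::string& msg) { state_->incoming.push_back(msg); }

 private:
  std::shared_ptr<ToyPipeState> state_;
};

// Receiving end: reads messages in the order they were sent.
class ToyReceiver {
 public:
  explicit ToyReceiver(std::shared_ptr<ToyPipeState> s)
      : state_(std::move(s)) {}
  bool HasMessage() const { return !state_->incoming.empty(); }
  std::string Read() {
    assert(HasMessage());
    std::string m = state_->incoming.front();
    state_->incoming.pop_front();
    return m;
  }

 private:
  std::shared_ptr<ToyPipeState> state_;
};

// Creates both ends of one pipe; either end can then be handed elsewhere.
std::pair<ToyRemote, ToyReceiver> CreateToyPipe() {
  auto state = std::make_shared<ToyPipeState>();
  return {ToyRemote(state), ToyReceiver(state)};
}
```

In this model there is no routing ID anywhere: holding one end of the pipe is what determines where a message goes, which is the property the Message Pipe entry above describes.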

Example of migrating a legacy IPC to Mojo

In the following example, we migrate WebTestHostMsg_SimulateWebNotificationClose to illustrate the conversion from legacy IPC to Mojo.

The existing WebTestHostMsg_SimulateWebNotificationClose IPC

  1. Message definition
     File: content/shell/common/web_test/web_test_messages.h

       IPC_MESSAGE_ROUTED2(WebTestHostMsg_SimulateWebNotificationClose,
                           std::string /*title*/, bool /*by_user*/)

  2. Send the message in the renderer
     File: content/shell/renderer/web_test/

       void BlinkTestRunner::SimulateWebNotificationClose(
           const std::string& title, bool by_user) {
         Send(new WebTestHostMsg_SimulateWebNotificationClose(
             routing_id(), title, by_user));
       }

  3. Receive the message in the browser
     File: content/shell/browser/web_test/

       bool WebTestMessageFilter::OnMessageReceived(
           const IPC::Message& message) {
         bool handled = true;
         IPC_BEGIN_MESSAGE_MAP(WebTestMessageFilter, message)
           ...
       }

  4. Call the handler in the browser
     File: content/shell/browser/web_test/

       void WebTestMessageFilter::OnSimulateWebNotificationClose(
           const std::string& title, bool by_user) {
             SimulateClose(title, by_user);
       }

    Call flow after migrating the legacy IPC to Mojo

    We begin to migrate WebTestHostMsg_SimulateWebNotificationClose to WebTestClient interface from here. First, let’s see an overall call flow through simple diagrams. [5]

    1. The WebTestClientImpl factory method is called, passing the WebTestClientImpl PendingReceiver along to the Receiver.
    2. The Receiver takes ownership of the WebTestClientImpl PendingReceiver’s pipe endpoint and begins to watch it for incoming messages. The pipe is readable immediately, so a task is scheduled to read the pending SimulateWebNotificationClose message from the pipe as soon as possible.
    3. The message is read and deserialized, which makes the Receiver invoke the WebTestClientImpl::SimulateWebNotificationClose() implementation on its bound WebTestClientImpl.

    Migrate the legacy IPC to Mojo

    1. Write a mojom file
       File: content/shell/common/web_test/web_test.mojom

         module content.mojom;

         // Web test messages sent from the renderer process to the
         // browser.
         interface WebTestClient {
           // Simulates closing a titled web notification depending on the user
           // click.
           //   - |title|: the title of the notification.
           //   - |by_user|: whether the user clicks the notification.
           SimulateWebNotificationClose(string title, bool by_user);
         };

    2. Add the mojom file to a proper GN target
       File: content/shell/

         mojom("web_test_common_mojom") {
           sources = [
             ...
           ]
         }

    3. Implement the interface (header)
       File: content/shell/browser/web_test/web_test_client_impl.h

         #include "content/shell/common/web_test.mojom.h"

         class WebTestClientImpl : public mojom::WebTestClient {
          public:
           WebTestClientImpl() = default;
           ~WebTestClientImpl() override = default;

           WebTestClientImpl(const WebTestClientImpl&) = delete;
           WebTestClientImpl& operator=(const WebTestClientImpl&) = delete;

           static void Create(
               mojo::PendingReceiver<mojom::WebTestClient> receiver);

           // WebTestClient implementation.
           void SimulateWebNotificationClose(const std::string& title,
                                             bool by_user) override;
         };

    4. Implement the interface (source)
       File: content/shell/browser/web_test/

         void WebTestClientImpl::SimulateWebNotificationClose(
             const std::string& title, bool by_user) {
               SimulateClose(title, by_user);
         }

    5. Create an interface pipe
       File: content/shell/renderer/web_test/blink_test_runner.h

         mojo::AssociatedRemote<mojom::WebTestClient>&
             GetWebTestClientRemote();

       File: content/shell/renderer/web_test/

         mojo::AssociatedRemote<mojom::WebTestClient>&
         BlinkTestRunner::GetWebTestClientRemote() {
           if (!web_test_client_remote_) {
             ...
           }
           return web_test_client_remote_;
         }

    6. Register the WebTest interface
       File: content/shell/browser/web_test/

         void WebTestContentBrowserClient::ExposeInterfacesToRenderer(...) {
           ...
         }

         void WebTestContentBrowserClient::BindWebTestController(
             int render_process_id,
             StoragePartition* partition,
             ... receiver) {
           ...
         }

    7. Call an interface message in the renderer
       File: content/shell/renderer/web_test/

         void BlinkTestRunner::SimulateWebNotificationClose(
             const std::string& title, bool by_user) {
           GetWebTestClientRemote()->
               SimulateWebNotificationClose(title, by_user);
         }

    8. Receive the incoming message in the browser
       File: content/shell/browser/web_test/

         void WebTestClientImpl::SimulateWebNotificationClose(
             const std::string& title, bool by_user) {
               SimulateClose(title, by_user);
         }

    Appendix: A case study of Regression

    There were a lot of flaky web test failures after finishing the migration of WebTestHostMsg to Mojo. The failures were caused by using ‘Remote’ instead of ‘AssociatedRemote’ for WebTestClient interface in the BlinkTestRunner class. Because BlinkTestRunner was using the WebTestControlHost interface for ‘PrintMessage’ as an ‘AssociatedRemote’. But, ‘Remote’ used by WebTestClient didn’t guarantee the message order between ‘PrintMessage’ and ‘InitiateCaptureDump’ message implemented by different interfaces(WebTestControlHost vs. WebTestClient). Thus, tests had often finished before receiving all logs. The actual results could be different from the expected results.

    Replacing Remote with AssociatedRemote for the WebTestClient interface solved the flaky test issues.
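
    The ordering problem can be illustrated with a toy model in C. This is only a sketch and does not use the Mojo API: the Pipe type, the helper functions and the polling order are all hypothetical. It assumes that each pipe is FIFO on its own, and that a receiver may service independent pipes in any order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MSGS 4

typedef enum { MSG_NONE = 0, MSG_PRINT_MESSAGE, MSG_INITIATE_CAPTURE_DUMP } Msg;

/* A toy FIFO pipe: messages on the same pipe arrive in send order. */
typedef struct {
    Msg msgs[MAX_MSGS];
    size_t head, tail;
} Pipe;

void pipe_send(Pipe *p, Msg msg) {
    assert(p->tail < MAX_MSGS);
    p->msgs[p->tail++] = msg;
}

/* Returns the next message, or MSG_NONE when the pipe is empty. */
Msg pipe_recv(Pipe *p) {
    return p->head < p->tail ? p->msgs[p->head++] : MSG_NONE;
}

/* Two independent pipes: nothing orders one pipe against the other.
   Here the receiver happens to poll the client pipe first, so the dump
   request overtakes the log message that was sent before it. */
Msg first_delivered_separate(void) {
    Pipe control_host = {0}, client = {0};
    pipe_send(&control_host, MSG_PRINT_MESSAGE);
    pipe_send(&client, MSG_INITIATE_CAPTURE_DUMP);
    Msg m = pipe_recv(&client);
    return m != MSG_NONE ? m : pipe_recv(&control_host);
}

/* One shared pipe (the analogue of associated interfaces): FIFO order
   holds across both interfaces. */
Msg first_delivered_shared(void) {
    Pipe shared = {0};
    pipe_send(&shared, MSG_PRINT_MESSAGE);
    pipe_send(&shared, MSG_INITIATE_CAPTURE_DUMP);
    return pipe_recv(&shared);
}
```

    In this model first_delivered_separate() yields the dump request first, while first_delivered_shared() yields the log message first, which is the behaviour the tests needed.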

    [1] Inter-process Communication (IPC)
    [2] Mojo in Chromium
    [3] Mojo & Servicification Performance Notes
    [4] Chrome IPC legacy Conversion Status
    [5] Convert Legacy IPC to Mojo



    by gyuyoung at May 11, 2020 07:46 AM

    April 29, 2020

    Brian Kardell

    Interactive Elements: A Strange Game

    Note from the author...

    My posts frequently (like this one) have a 'theme' and tend to use a number of images for visual flourish. Personally, I like it that way, I find it more engaging and I prefer for people to read it that way. However, for users on a metered or slow connection, downloading unnecessary images is, well, unnecessary, potentially costly and kind of rude. Just to be polite to my users, I offer the ability for you to opt out of 'optional' images if the total size of viewing the page would exceed a budget I have currently defined as 200k...

    It can seem especially frustrating to a lot of developers that we don't have more elements "out of the box" as part of HTML. I mean: Where are the tabs? How do we not have tabs by now? It seems like almost every week this comes up in some fashion or other. In this piece I'll talk a little about why, how to see this for what it is, how we're trying to move forward - and also present an interesting experiment for your consideration and feedback that may help inform some of these efforts. If you want, you can skip right to the 'new idea/experiment', but I think this background is pretty helpful in understanding it, so please, come back and read it...

    Let's have a look at HTML so far: ~10 of the elements are the 'boilerplate stuff': <html>, <body>, <head> and metadata stuff - <link>, <meta>, <base>, <style>, <script>, <title>. There are a few elements for linking and embedding other content and multimedia, and a few with special powers like <template> and <slot>. And then the rest: ~30 are "weak semantics about text" that aren't interactive and don't have significant (any) special meaning necessarily and are primarily just things with default CSS stylesheet rules. About 15 are "weak semantics about text, but meaningfully weak" - stuff like lists. ~15 are sectioning or landmarks. 11 more are about tables - a 2 dimensional relationship of text. 14 are about forms. And then two are... well... something else. We'll come back to that. And ~30 are deprecated.

    The route to all of these elements was a little different - and they have different costs associated with them, and offer different value. One thing that's interesting to note is that the majority of them are what Dave Rupert refers to as "Spicy Divs". That is, there's not much to them - they aren't complex or interactive, and they don't bring a ton of real world value. That's not to say they are without value, but ultimately we spend a lot of time and bandwidth in standards debating what often amount to silly things because of it. <main> is a kind of a good example. Nothing wrong with it as an element, it's great, in fact - it just has a long and complex backstory that ultimately cost a lot relative to its real world value and proven use. Tons of time defining and debating <address>, which isn't really that meaningful to browsers, and then lots of evangelism telling people they're using it wrong - only in the end to eventually have to admit that it ultimately means the thing people used it as.

    The second thing to notice is that there are very few interactive elements. Further, most of the ones we usually talk about are about forms. And that leads us to this: Interactive, form-related elements on the web are... tricky.

    The form-related game

    On the one hand, there's a lot to like about native, interactive form controls. In theory, they can bring a simple declarative form with all kinds of goodness in terms of platform integration, portability and centralized work on accessibility and UX.


    ...primary goal has not yet been achieved
    > What is the primary goal?
    To win the game.

    More simply: Despite a lot of work, and huge investments, people still gravitate toward custom implementations. That's not good for anybody.

    Why are native form controls hard?

    Well... So many reasons, but to start with, the one that they could really do without is that they got off to a tricky start by design.

    <input type="WOPR">

    Most of the form-related controls are created based on <input type="...">. You've probably heard someone suggest that this is great because in cases where it isn't supported, simply providing the text field is fine. However, this is incomplete and we've been paying the price for it for a long time.

    > People sometimes make... mistakes.

    Did you ever notice how the JavaScript API surface of input works differently based on the type? If the type is number and you type gibberish that looks remotely 'potentially numbery' like '12312e314124', then .value will be '' (empty string). Or, the checkbox type has a checked property, but that means that input type=number has that property too... What does it do? Anything!? Or, have you ever noticed how .selectionStart and .selectionEnd actually would initially throw if you tried to use them on a numeric input type?! (fixed by Simon Pieters). If not, go have a watch of Monica Dinculescu's <input> I ♡ you, but you're bringing me down -- a 45 minute talk about just this.

    Ultimately, many of these problems stem from us accepting the illusion presented that a date picker is a kind of text input.

    Falken pointing to NORAD screens
    Stephen Falken: Uh, uh, General, what you see on these screens up here is a fantasy; a computer-enhanced hallucination. Those blips are not real missiles. They're phantoms.

    Realistically, a date picker isn't a text input. Perhaps at an HTTP level, or even a database level, it's just a value - but controls are about how to populate values, and that's quite a different thing. It's not an is-a relationship: A date picker is a completely different complex control with potentially lots of complex interactions: Different things it needs to communicate, lots of sub-interactions to manage, and so on.

    All of this is why you might hear people suggest "LSP" or "Favor composition over inheritance" as a better alternative.

    Acceptable Losses?

    This is exacerbated by the fact that at the end of the day, developers are left with some pretty unappealing choices: Something is going to be sacrificed.

    Color pickers provide a great example of how this plays out in practice: Will developers building the sorts of tools that use a color picker allow a simple input field as the fallback until every browser ships something acceptable? Seems like not much of a choice really.

    Sadly the story for 'polyfilling' an element also still isn't great: It's really just 'replacing it with something else entirely'. So, at the end of the day, you've got to go and find something that is a good quality stand-in anyway. Finally, since literally everything in your code relies on understanding the DOM, and that stand-in will have an entirely different DOM, you now have two entirely different interactive trees to worry about. Yikes.

    Further, this situation isn't necessarily short lived, historically. The reality is that lacking universal support can be the case for a long time: Native support for <input type="color"> was added to the last major browser... Checks notes... last year.

    Even after all of that... Guess what? It still isn't acceptable for a lot of people because it isn't styleable -- and in fact, it has entirely different UI and features everywhere. While this can occasionally be very useful on some kinds of devices, it makes things like providing helpful documentation hard. Worse, even the native ones weren't super keyboard accessible either.

    So... That's a lot of barriers and disincentives.

    Conversely, a custom solution can solve all of these problems now... So it shouldn't actually be surprising that developers often choose a custom solution: There are no 'acceptable losses' for developers here - all of the choices are somehow bad.

    Learn, dammit.

    Now, let's talk about how we get out of this mess.

    You'll be happy to know that there are efforts to both learn and improve existing native interactive elements. If you haven't seen Greg Whitworth and Nicole Sullivan's talk HTML Isn't Done from Chrome Dev Summit last year, I would highly recommend it.

    There's also been a heck of a lot of work to make it possible for authors to define their own interactive elements that work just like the native ones. The idea is to empower developers to explore the space and try to help find the really sweet spots that strike all of the right balances.

    It's worth pointing out why this is useful - it's not just an academic exercise that is trying to push responsibility to developers. It's important because the truth is: We don't actually know how. We need to be able to throw lots of things at the wall and see what sticks. The trouble is, currently (historically) failure is really slow and really expensive -- and even our most educated 'guesses' at improvement are still just that: Guesses. We haven't proved we have this figured out yet.

    We need something like the WOPR for playing out possible answers without doing harm: failing, and learning - that's what the custom elements stuff is all about.
    Shall we play a (different) game?

    Ok, but hold on.... Let's talk about non-form related interactive controls now. Realistically, we've come up with really only 1 strictly generic non-form-related UI control in 30 years: <details>. That "other" one in my initial list. Wow... And we've had that done for like 10 years, right? Let's see, it's been supported universally since... checks notes... Jan 14, 2020. Waaaaat?

    But... Again, it has most of the same kinds of general problems. In the most recent HTTP Almanac data, it was the 102nd most used HTML element... Can you even name 101 other HTML elements? It has styling problems. It's misunderstood. It was hard to polyfill well without just creating other problems.

    So... This is terrible.

    Our continued problems in this space are part of the reason there is hesitancy to take up entirely new things... Like, tabs.

    Back in 2015, I managed to get a number of people in Web standards (especially a11y) interested in working on a possible new element proposal for HTML which would have given not only the tabset but some other things as well. Determined not to be doomed by the same sorts of stylability problems, we set out to define things that would become shadow parts and themes and custom properties and so on.

    But, in many ways, it wasn't great. Recently, this has got me to thinking: What if this isn't even the right game we're playing? What if for some elements at least - non-form associated ones - the right lesson to take away is that this is an unnecessary exercise in futility?

    Maybe we need a different kind of experiment: One that just says "Strange game. The only winning move is not to play".

    How about a nice game of chess?

    So, here's the basic idea: What if we just created a resilient pattern focused on mainly function and meaning -- and hardly at all on the UI aspect. What if instead of inventing complex new ways to strike the balance of preventing authors from styling too much, we mostly just acknowledged the fact that they want to, and... let them.

    What if, like with <video> you could kind of just take control of the UI, but... maybe without throwing it away entirely.

    I spoke with Greg Whitworth about this, who has been investigating how to improve the existing controls on the Web (see also Can we please style select?).

    In fact, he completely agreed with the overall premise and went on to describe how...

    The core mission of Open UI, a new W3C community group, is to document the anatomies, behaviors and states of components and controls from across frameworks. By doing this, we can ensure that the key pieces that folks don't want to re-create, they don't have to. We've got a few different ideas that are beginning to take shape but changes like this are like peeling an onion. If it's a new control/component then it would be a bit more straight forward but the most painful controls we've found have been on the web since the 90s. So we'll be gathering telemetry and ensuring that any modifications we ultimately propose are web compatible.

    In our conversation we discussed how things like tabs seem like an interesting target where this kind of thing might just work really well. The particular shape of the tabs problem lends itself, I think, to easily trying something completely different and entirely without past baggage.

    So, in that light here's a demo/link to just such an experiment which gives us... tabs! - and which we think has some nice qualities:

    • It is a declarative decorator over otherwise good but non-interactive semantics. That means there's no mess of differing interaction issues here caused by complex inheritance. It's composition.
    • It doesn't change your light DOM, and it barely uses Shadow DOM. There's no "two trees" problem, and there are no secrets or new challenges. Go ahead, use CSS and JavaScript.
    • It adds the functionality: roving focus, keyboard handling, accessibility stuff and a simple container for you. That's it, really.
    • If you really want to opt in to some simple 'default styles' and like the 'minor tweaks only' approach, it has an attribute that enables 'native' look and feel, with a bit less flexibility, but which might be enough for some people. It just starts with the assumption that that's not the case.

    So... that's it. It could for sure use more work, but WDYT? Would you try it out? Let us know your thoughts. Such experiments will be useful in informing work and directions for efforts like Open UI and possible future standards.

    Will it help? To be honest, I don't know, but maybe the right way to learn involves trying something really different, and I'm open to anything...

    April 29, 2020 04:00 AM

    April 14, 2020

    Andy Wingo

    understanding webassembly code generation throughput

    Greets! Today's article looks at browser WebAssembly implementations from a compiler throughput point of view. As I wrote in my article on Firefox's WebAssembly baseline compiler, web browsers have multiple wasm compilers: some that produce code fast, and some that produce fast code. Implementors are willing to pay the cost of having multiple compilers in order to satisfy these conflicting needs. So how well do they do their jobs? Why bother?

    In this article, I'm going to take the simple path and just look at code generation throughput on a single chosen WebAssembly module. Think of it as X-ray diffraction to expose aspects of the inner structure of the WebAssembly implementations in SpiderMonkey (Firefox), V8 (Chrome), and JavaScriptCore (Safari).

    experimental setup

    As a workload, I am going to use a version of the "Zen Garden" demo. This is a 40-megabyte game engine and rendering demo, originally released for other platforms, and compiled to WebAssembly a couple years later. Unfortunately the original URL for the demo was disabled at some point in late 2019, so it no longer has a home on the web. A bit of a weird situation and I am not clear on licensing either. In any case I have a version downloaded, and have hacked out a minimal set of "imports" that the WebAssembly module needs from the host to allow the module to compile and link when run from a JavaScript shell, without requiring WebGL and similar facilities. So the benchmark is just to instantiate a WebAssembly module from the 40-megabyte byte array and see how long it takes. It would be better if I had more test cases (and would be happy to add them to the comparison!) but this is a start.

    I start by benchmarking the various WebAssembly implementations, firstly in their standard configuration and then setting special run-time flags to measure the performance of the component compilers. I run these tests on the core-rich machine that I use for browser development (2 Xeon Silver 4114 CPUs for a total of 40 logical cores). The default-configuration numbers are therefore not indicative of performance on a low-end Android phone, but we can use them to extract aspects of the different implementations.

    Since I'm interested in compiler throughput, I'm not particularly concerned about how well a compiler will use all 40 cores. Therefore when testing the specific compilers I will set implementation-specific flags to disable parallelism in the compiler and GC: --single-threaded on V8, --no-threads on SpiderMonkey, and --useConcurrentGC=false --useConcurrentJIT=false on JSC. To further restrict any threads that the implementation might decide to spawn, I'll bind these to a single core on my machine using taskset -c 4. Otherwise the machine is in its normal configuration (nothing else significant running, all cores available for scheduling, turbo boost enabled).

    I'll express results in nanoseconds per WebAssembly code byte. Of the 40 megabytes or so in the Zen Garden demo, only 23 891 164 bytes are actually function code; the rest is mostly static data (textures and so on). So I'll divide the total time by this code byte count.
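
    As a small illustration of that metric, here is a sketch in C of the division described above. The function name and the 1.2-second figure in the note below are hypothetical, chosen only for illustration; the code byte count is the one given in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Function-code bytes in the Zen Garden module, as stated in the text. */
#define ZEN_GARDEN_CODE_BYTES 23891164ULL

/* Convert a measured instantiation time (in seconds) into nanoseconds
   per WebAssembly code byte. */
double ns_per_code_byte(double elapsed_seconds, uint64_t code_bytes) {
    assert(code_bytes > 0);
    return elapsed_seconds * 1e9 / (double)code_bytes;
}
```

    For example, a hypothetical run that took 1.2 seconds would work out to roughly 50 ns per code byte with ns_per_code_byte(1.2, ZEN_GARDEN_CODE_BYTES).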

    I tested V8 at git revision 0961376575206, SpiderMonkey at hg revision 8ec2329bef74, and JavaScriptCore at subversion revision 259633. The benchmarks can be run using just a shell; see the pull request. I timed how long it took to instantiate the Zen Garden demo, ensuring that a basic export was callable. I collected results from 20 separate runs, sleeping a second between them. The bars in the charts below show the median times, with a histogram overlay of all results.

    results & analysis

    We can see some interesting results in this graph. Note that the Y axis is logarithmic. The "concurrent tiering" results in the graph correspond to the default configurations (no special flags, no taskset, all cores available).

    The first interesting conclusions that pop out for me concern JavaScriptCore, which is the only implementation to have a baseline interpreter (run using --useWasmLLInt=true --useBBQJIT=false --useOMGJIT=false). JSC's WebAssembly interpreter is actually structured as a compiler that generates custom WebAssembly-specific bytecode, which is then run by a custom interpreter built using the same infrastructure as JSC's JavaScript interpreter (the LLInt). Directly interpreting WebAssembly might be possible as a low-latency implementation technique, but since you need to validate the WebAssembly anyway and eventually tier up to an optimizing compiler, apparently it made sense to emit fresh bytecode.

    The part of JSC that generates baseline interpreter code runs slower than SpiderMonkey's baseline compiler, so one is tempted to wonder why JSC bothers to go the interpreter route; but then we recall that on iOS, we can't generate machine code in some contexts, so the LLInt does appear to address a need.

    One interesting feature of the LLInt is that it allows tier-up to the optimizing compiler directly from loops, which neither V8 nor SpiderMonkey support currently. Failure to tier up can be quite confusing for users, so good on JSC hackers for implementing this.

    Finally, while baseline interpreter code generation throughput handily beats V8's baseline compiler, it would seem that something in JavaScriptCore is not adequately taking advantage of multiple cores; if one core compiles at 51ns/byte, why do 40 cores only do 41ns/byte? It could be my tests are misconfigured, or it could be that there's a nice speed boost to be found somewhere in JSC.

    JavaScriptCore's baseline compiler (run using --useWasmLLInt=false --useBBQJIT=true --useOMGJIT=false) runs much more slowly than SpiderMonkey's or V8's baseline compiler, which I think can be attributed to the fact that it builds a graph of basic blocks instead of doing a one-pass compile. To me these results validate SpiderMonkey's and V8's choices, looking strictly from a latency perspective.

    I don't have graphs for code generation throughput of JavaScriptCore's optimizing compiler (run using --useWasmLLInt=false --useBBQJIT=false --useOMGJIT=true); it turns out that JSC wants one of the lower tiers to be present, and will only tier up from the LLInt or from BBQ. Oh well!

    V8 and SpiderMonkey, on the other hand, are much of the same shape. Both implement a streaming baseline compiler and an optimizing compiler; for V8, we get these via --liftoff --no-wasm-tier-up or --no-liftoff, respectively, and for SpiderMonkey it's --wasm-compiler=baseline or --wasm-compiler=ion.

    Here we should conclude directly that SpiderMonkey generates code around twice as fast as V8 does, in both tiers. SpiderMonkey can generate machine code faster even than JavaScriptCore can generate bytecode, and optimized machine code faster than JSC can make baseline machine code. It's a very impressive result!

    Another conclusion concerns the efficacy of tiering: for both V8 and SpiderMonkey, their baseline compilers run more than 10 times as fast as the optimizing compiler, and the same ratio holds between JavaScriptCore's baseline interpreter and compiler.

    Finally, it would seem that the current cross-implementation benchmark for lowest-tier code generation throughput on a desktop machine would then be around 50 ns per WebAssembly code byte for a single core, which corresponds to receiving code over the wire at somewhere around 160 megabits per second (Mbps). If we add in concurrency and manage to farm out compilation tasks well, we can obviously double or triple that bitrate. Optimizing compilers run at least an order of magnitude slower. We can conclude that to the desktop end user, WebAssembly compilation time is indistinguishable from download time for the lowest tier. The optimizing tier is noticeably slower though, running more around 10-15 Mbps per core, so time-to-tier-up is still a concern for faster networks.
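
    The bitrate conversion above is plain arithmetic. A minimal sketch in C, with a hypothetical function name, might look like this:

```c
#include <assert.h>

/* Convert a compilation cost (ns per code byte) into the network bitrate,
   in megabits per second, at which bytes would arrive exactly as fast as
   they can be compiled. */
double ns_per_byte_to_mbps(double ns_per_byte) {
    assert(ns_per_byte > 0.0);
    double bytes_per_second = 1e9 / ns_per_byte;
    return bytes_per_second * 8.0 / 1e6;
}
```

    With 50 ns per byte this gives 160 Mbps, matching the figure above; a compiler ten times slower, at 500 ns per byte, would correspond to 16 Mbps.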

    Going back to the question posed at the start of the article: yes, tiering shows a clear benefit in terms of WebAssembly compilation latency, letting users interact with web sites sooner. So that's that. Happy hacking and until next time!

    by Andy Wingo at April 14, 2020 08:59 AM

    April 08, 2020

    Andy Wingo

    multi-value webassembly in firefox: a binary interface

    Hey hey hey! Hope everyone is staying safe at home in these weird times. Today I have a final dispatch on the implementation of the multi-value feature for WebAssembly in Firefox. Last week I wrote about multi-value in blocks; this week I cover function calls.

    on the boundaries between things

    In my article on Firefox's baseline compiler, I mentioned that all WebAssembly engines in web browsers treat the function as the unit of compilation. This facilitates streaming, parallel compilation of WebAssembly modules, by farming out compilation of individual functions to worker threads. It also allows for easy tier-up from quick-and-dirty code generated by the low-latency baseline compiler to the faster code produced by the optimizing compiler.

    There are some interesting Conway's Law implications of this choice. One is that division of compilation tasks becomes an opportunity for division of human labor; there is a whole team working on the experimental Cranelift compiler that could replace the optimizing tier, and in my hackings on Firefox I have had minimal interaction with them. To my detriment, of course; they are fine people doing interesting things. But the code boundary means that we don't need to communicate as we work on different parts of the same system.

    Boundaries are where places touch, and sometimes for fluid crossing we have to consider boundaries as places in their own right. Functions compiled with the baseline compiler, with Ion (the production optimizing compiler), and with Cranelift (the experimental optimizing compiler) are all able to call each other because they actively maintain a common boundary, a binary interface (ABI). (Incidentally the A originally stands for "application", essentially reflecting division of labor between groups of people making different components of a software system; Conway's Law again.) Let's look closer at this boundary-place, with an eye to how it changes with multi-value.

    what's in an ABI?

    Among other things, an ABI specifies a calling convention: which arguments go in registers, which on the stack, how the stack values are represented, how results are returned to the callers, which registers are preserved over calls, and so on. Intra-WebAssembly calls are a closed world, so we can design a custom ABI if we like; that's what V8 does. Sometimes WebAssembly may call functions from the run-time, though, and so it may be useful to be closer to the C++ ABI on that platform (the "native" ABI); that's what Firefox does. (Incidentally here I think Firefox is probably leaving a bit of performance on the table on Windows by using the inefficient native ABI that only allows four register parameters. I haven't measured though so perhaps it doesn't matter.) Using something closer to the native ABI makes debugging easier as well, as native debugger tools can apply more easily.

    One thing that most native ABIs have in common is that they are really only optimized for a single result. This reflects their heritage as artifacts from a world built with C and C++ compilers, where there isn't a concept of a function with more than one result. If multiple results are required, they are represented instead as arguments, typically as pointers to memory somewhere. Consider the AMD64 SysV ABI, used on Unix-derived systems, which carefully specifies how to pass arbitrary numbers of arbitrary-sized data structures to a function (§3.2.3), while only specifying what to do for a single return value. If the return value is too big for registers, the ABI specifies that a pointer to result memory be passed as an argument instead.

    So in a multi-result WebAssembly world, what are we to do? How should a function return multiple results to its caller? Let's assume that there are some finite number of general-purpose and floating-point registers devoted to return values, and that if the return values will fit into those registers, then that's where they go. The problem is then to determine which results will go there, and if there are remaining results that don't fit, then we have to put them in memory. The ABI should indicate how to address that memory.

    When looking into a design, I considered three possibilities.

    first thought: stack results precede stack arguments

    When a function needs some of its arguments passed on the stack, it doesn't receive a pointer to those arguments; rather, the arguments are placed at a well-known offset to the stack pointer.

    We could do the same thing with stack results, either reserving space deeper on the stack than stack arguments, or closer to the stack pointer. With the advent of tail calls, it would make more sense to place them deeper on the stack. Like this:

    The diagram above shows the ordering of stack arguments as implemented by Firefox's WebAssembly compilers: later arguments are deeper (farther from the stack pointer). It's an arbitrary choice that happens to match up with what the native ABIs do, as it was easier to re-use bits of the already-existing optimizing compiler that way. (Native ABIs use this stack argument ordering because of sloppiness in a version of C from before I was born. If you were starting over from scratch, probably you wouldn't do things this way.)

    Stack result order does matter to the baseline compiler, though. It's easier if the stack results are placed in the same order in which they would be pushed on the virtual stack, so that when the function completes, the results can just be memmove'd down into place (if needed). The same concern dictates another aspect of our ABI: unlike calls, registers are allocated to the last results rather than the first results. This is to make it easy to preserve stack invariant (1) from the previous article.

    At first I thought this was the obvious option, but I ran into problems. It turns out that stack arguments are fundamentally unlike stack results in some important ways.

    While a stack argument is logically consumed by a call, a stack result starts life with a call. As such, if you reserve space for stack results just by decrementing the stack pointer before a call, probably you will need to load the results eagerly into registers thereafter or shuffle them into other positions to be able to free the allocated stack space.

    Eager shuffling is busy-work that should be avoided if possible. It's hard to avoid in the baseline compiler. For example, a call to a function with 10 arguments will consume 10 values from the temporary stack; any results will be pushed on after removing argument values from the stack. If there are any stack results, it's almost impossible to avoid a post-call memmove, to move stack results to where they should be before the 10 argument values were pushed on (and probably spilled). So the baseline compiler case is not optimal.

    However, things get gnarlier with the Ion optimizing compiler. Like many other optimizing compilers, Ion is designed to compute the necessary stack frame size ahead of time, and to never move the stack pointer during an activation. The only exception is for pushing on any needed stack arguments for nested calls (which are popped directly after the nested call). So in that case, assuming there are a number of multi-value calls in a stack frame, we'll be shuffling in the optimizing compiler as well. Not great.

    Besides the need to shuffle, stack arguments and stack results differ as regards ownership and garbage collection. A callee "owns" the memory for its stack arguments; it is responsible for them. The caller can't assume anything about the contents of that memory after a call, especially if the WebAssembly implementation supports tail calls (a whole 'nother blog post, that). If the values being passed are just bits, that's one thing, but with the reference types proposal, some result values may be managed by the garbage collector. The callee is responsible for making stack arguments visible to the garbage collector; the caller is responsible for the results. The caller will need to emit metadata to allow the garbage collector to see stack result references. For this reason, a stack result actually starts life just before a call, because it can become initialized at any point and thus needs to be traced during the entire callee activation. Not all callers can easily add garbage collection roots for writable stack slots, so the need to place stack results in a fixed position complicates calling multi-value WebAssembly functions in some cases (e.g. from C++).

    second thought: pointers to individual stack results

    Surely there are more well-trodden solutions to the multiple-result problem. If we encoded a multi-value return in C, how would we do it? Consider a function in C that has three 64-bit integer results. The idiomatic way to encode it would be to have one of the results be the return value of the function, and the two others to be passed "by reference":

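    #include <stdint.h>

    /* Returns one result directly; the other two are written through pointers. */
    int64_t foo(int64_t* a, int64_t* b) {
      *a = 1;
      *b = 2;
      return 3;
    }

    void call_foo(void) {
      int64_t a, b, c;
      c = foo(&a, &b);
    }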

    This program shows us a possibility for encoding WebAssembly's multiple return values: pass an additional argument for each stack result, pointing to the location to which to write the stack result. Like this:

    The result pointers are normal arguments, subject to normal argument allocation. In the above example, given that there are already stack arguments, they will probably be passed on the stack, but in many cases the stack result pointers may be passed in registers.

    The result locations themselves don't even need to be on the stack, though they certainly will be in intra-WebAssembly calls. However the ability to write to any memory is a useful form of flexibility when e.g. calling into WebAssembly from C++.

    The advantage of this approach is that we eliminate post-call shuffles, at least in optimizing compilers. But, having to make an argument for each stack result, each of which might itself become a stack argument, seems a bit offensive. I thought we might be able to do a little better.

    third thought: stack result area, passed as pointer

    Given that stack results are going to be written to memory, it doesn't really matter where they will be written, from the perspective of the optimizing compiler at least. What if we allocated them all in a block and just passed one pointer to the block? Like this:

    Here there's just one additional argument, no matter how many stack results. While we're at it, we can specify that the layout of the stack results should be the same as how they would be written to the baseline stack, to make the baseline compiler's job easier.
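
    To make the contrast with the pointer-per-result version concrete, here is a minimal C sketch of the single-pointer idea. It is an illustration only, not SpiderMonkey's actual ABI: the `stack_results` struct, its field names, and the choice of one register result are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the stack result area for a function with
   three i64 results: one comes back in a register, and the other
   two are written to memory. */
struct stack_results {
  int64_t r0;
  int64_t r1;
};

/* The callee takes one extra pointer argument, however many stack
   results it has, and writes them all before returning. */
int64_t foo(struct stack_results *results) {
  assert(results != NULL);
  results->r0 = 1;
  results->r1 = 2;
  return 3; /* the register result */
}

/* The caller reserves the whole area as one block in its own frame
   and passes its address; nothing has to be shuffled afterwards. */
int64_t call_foo(void) {
  struct stack_results area;
  int64_t c = foo(&area);
  return area.r0 + area.r1 + c;
}
```

    The cost of this shape is that the results have to be contiguous, which is exactly the constraint that made life hard in the optimizing compiler.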

    As I started implementation with the baseline compiler, I chose this third approach, essentially because I was already allocating space for the results in a block in this way by bumping the stack pointer.

    When I got to the optimizing compiler, however, it was quite difficult to convince Ion to allocate an area on the stack of the right shape.

    Looking back on it now, I am not sure that I made the right choice. The thing is, the IonMonkey compiler started life as an optimizing compiler for JavaScript. It can represent unboxed values, which is how it came to be used as a compiler for asm.js and later WebAssembly, and it does a good job on them. However it has never had to represent aggregate data structures like a C++ class, so it didn't have support for spilling arbitrary-sized data to the stack. It took a while staring at the register allocator to convince it to allocate arbitrary-sized stack regions, and then to allocate component scalar values out of those regions. If I had just asked the register allocator to give me one appropriate-sized stack slot for each scalar, and hacked out the ability to pass separate pointers to the stack slots to WebAssembly calls with stack results, then I would have had an easier time of it, and perhaps stack slot allocation could be more dense because multiple results wouldn't need to be allocated contiguously.

    As it is, I did manage to hack it in, and I think in a way that doesn't regress. I added a layer over an argument type vector that adds a synthetic stack results pointer argument, if the function returns stack results; iterating over this type with ABIArgIter will allocate a stack result area pointer, either as a register argument or a stack argument. In the optimizing compiler, I added a kind of value allocation corresponding to a variable-sized stack area (using pointer tagging again!), and extended the register allocator to allocate LStackArea and the component stack results. Interestingly, I had to add a kind of definition that starts life on the stack; previously all Ion results started life in registers and were only spilled if needed.
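
    The "layer over an argument type vector" can be pictured roughly as follows. This is a hedged sketch in C, not the SpiderMonkey code: every name here (`val_type`, `arg_types_with_results_ptr`, `abi_arg_count`, `abi_arg_type`) is hypothetical, and both the one-register-result limit and the placement of the synthetic pointer after the real arguments are assumptions of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value types; TYPE_RESULT_AREA_PTR is the synthetic one. */
enum val_type { TYPE_I32, TYPE_I64, TYPE_F64, TYPE_RESULT_AREA_PTR };

/* Assumption for this sketch: only one result fits in a register. */
enum { MAX_REGISTER_RESULTS = 1 };

/* A view over a function's declared argument types that also knows
   how many results the function returns. */
struct arg_types_with_results_ptr {
  const enum val_type *args;
  size_t arg_count;
  size_t result_count;
};

static bool has_stack_results(const struct arg_types_with_results_ptr *t) {
  return t->result_count > MAX_REGISTER_RESULTS;
}

/* Number of arguments the ABI argument allocator will see. */
size_t abi_arg_count(const struct arg_types_with_results_ptr *t) {
  return t->arg_count + (has_stack_results(t) ? 1 : 0);
}

/* Type of the i-th ABI argument; in this sketch the synthetic
   pointer is placed after the declared arguments. */
enum val_type abi_arg_type(const struct arg_types_with_results_ptr *t,
                           size_t i) {
  assert(i < abi_arg_count(t));
  return i < t->arg_count ? t->args[i] : TYPE_RESULT_AREA_PTR;
}
```

    Because the synthetic pointer looks like any other argument to the allocator, it lands in a register or on the stack by the usual rules, with no special case.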

    In the end, a function will capture the incoming stack result area argument, either as a normal SSA value (for Ion) or stored to a stack slot (baseline), and when returning will write stack results to that pointer as appropriate. Passing in a pointer as an argument did make it relatively easy to implement calls from WebAssembly to and from C++. Getting the variable-shape result area to be known to the garbage collector for C++-to-WebAssembly calls was simple in the end, but took me a while to figure out.

    Finally I was a bit exhausted from multi-value work and ready to walk away from the "JS API", the bit that allows multi-value WebAssembly functions to be called from JavaScript (they return an array) or for a JavaScript function to return multiple values to WebAssembly (via an iterable) -- but then when I got to thinking about this blog post I preferred to implement the feature rather than document its lack. Avoidance-of-document-driven development: it's a thing!

    towards deployment

    As I said in the last article, the multi-value feature is about improved code generation and also making a more capable base for expressing further developments in the WebAssembly language.

    As far as code generation goes, things are progressing but it is still early days. Thomas Lively has implemented support in LLVM for emitting return of C++ aggregates via multiple results, which is enabled via the -experimental-multivalue-abi cc1 flag. Thomas has also been implementing multi-value support in the binaryen WebAssembly toolchain component, used by the emscripten C++-to-WebAssembly toolchain. I think it will be a few months though before everything lands in a way that end users can take advantage of.

    On the specification side, the multi-value feature has been at phase 4 since January, which basically means things are all done there.

    Implementation-wise, V8 has had experimental support since 2017 or so, and the feature was staged last fall, although V8 doesn't yet support multi-value in their baseline compiler. WebKit also landed support last fall.

    Unlike V8 and SpiderMonkey, JavaScriptCore (the JS and wasm engine in WebKit) actually implements a WebAssembly interpreter as their solution to the one-pass streaming compilation problem. Then on the compiler side, there are two tiers that both operate on basic block graphs (OMG and BBQ; I just puked a little in my mouth typing that). This strategy makes the compiler implementation quite straightforward. It's also an interesting design point because JavaScriptCore's garbage collector scans the stack conservatively; there's no need for the compiler to do bookkeeping on the GC's behalf, which I'm sure was a relief to the hacker. Anyway, multi-value in WebKit is done too.

    The new thing of course is that finally, in Firefox, the feature is now fully implemented (woo) and enabled by default on Nightly builds (woo!). I did that! It took me a while! Perhaps too long? Anyway it's done. Thanks again to Bloomberg for supporting this work; large ups to y'all for helping the web move forward.

    See you next time with a more general article rounding up compile-time benchmarks on a variety of WebAssembly implementations. Until then, happy hacking!

    by Andy Wingo at April 08, 2020 09:02 AM