Planet Igalia

July 03, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 1: Introduction

It’s been a while that Igalia’s graphics team had been working on the OpenGL extensions that provide the mechanisms for OpenGL and Vulkan interoperability in the Intel iris (gallium3d) driver that is part of mesa. As there were no conformance tests (CTS) for this extension, and we needed to test it, we have written (and … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 1: Introduction

by hikiko at July 03, 2020 11:39 AM

July 02, 2020

Philippe Normand

Web-augmented graphics overlay broadcasting with WPE and GStreamer

Graphics overlays are everywhere nowadays in the live video broadcasting industry. In this post I introduce a new demo relying on GStreamer and WPEWebKit to deliver low-latency web-augmented video broadcasts.

Readers of this blog might remember a few posts about WPEWebKit and a GStreamer element we at Igalia worked on. In december 2018 I introduced GstWPE and a few months later blogged about a proof-of-concept application I wrote for it. So, learning from this first iteration, I wrote another demo!

The first demo was already quite cool, but had a few down-sides:

  1. It works only on desktop (running in a Wayland compositor). The Wayland compositor dependency can be a burden in some cases. Ideally we could imaginge GstWPE applications running “in the cloud”, on machines without GPU, bare metal.
  2. While it was cool to stream to Twitch, Youtube and the like, these platforms currently can ingest only RTMP streams. That means the latency introduced can be quite significant, depending on the network conditions of course, but even in ideal conditions the latency was between one and 2 seconds. This is not great, in the world we live in.

To address the first point, WPE founding engineer, Žan Doberšek enabled software rasterizing support in WPE and its FDO backend. This is great because it allows WPE to run on machines without GPU (like continuous integration builders, test bots) but also “in the cloud” where machines with GPU are less affordable than bare metal! Following up, I enabled this feature in GstWPE. The source element caps template now has video/x-raw, in addition to video/x-raw(memory:GLMemory). To force swrast, you need to set the LIBGL_ALWAYS_SOFTWARE=true environment variable. The downside of swrast is that you need a good CPU. Of course it depends on the video resolution and framerate you want to target.

On the latency front, I decided to switch from RTMP to WebRTC! This W3C spec isn’t only about video chat! With WebRTC, sub-second live one-to-many broadcasting can be achieved, without much efforts, given you have a good SFU. For this demo I chose Janus, because its APIs are well documented, and it’s a cool project! I’m not sure it would scale very well in large deployments, but for my modest use-case, it fits very well.

Janus has a plugin called video-room which allows multiple participants to chat. But then imagine a participant only publishing its video stream and multiple “clients” connecting to that room, without sharing any video or audio stream, one-to-many broadcasting. As it turns out, GStreamer applications can already connect to this video-room plugin using GstWebRTC! A demo was developed by tobiasfriden and saket424 in Python, it recently moved to the gst-examples repository. As I kind of prefer to use Rust nowadays (whenever I can anyway) I ported this demo to Rust, it was upstreamed in gst-examples as well. This specific demo streams the video test pattern to a Janus instance.

Adapting this Janus demo was then quite trivial. By relying on a similar video mixer approach I used for the first GstWPE demo, I had a GstWPE-powered WebView streaming to Janus.

The next step was the actual graphics overlays infrastructure. In the first GstWPE demo I had a basic GTK UI allowing to edit the overlays on-the-fly. This can’t be used for this new demo, because I wanted to use it headless. After doing some research I found a really nice NodeJS app on Github, it was developed by Luke Moscrop, who’s actually one of the main developers of the Brave BBC project. The Roses CasparCG Graphics was developed in the context of the Lancaster University Students’ Union TV Station, this app starts a web-server on port 3000 with two main entry points:

  • An admin web-UI (in /admin/ allowing to create and manage overlays, like sports score boards, info banners, and so on.
  • The target overlay page (in the root location of the server), which is a web-page without predetermined background, displaying the overlays with HTML, CSS and JS. This web-page is meant to be fed to CasparCG (or GstWPE :))

After making a few tweaks in this NodeJS app, I can now:

  1. Start the NodeJS app, load the admin UI in a browser and enable some overlays
  2. Start my native Rust GStreamer/WPE application, which:
    • connects to the overlay web-server
    • mixes a live video source (webcam for instances) with the WPE-powered overlay
    • encodes the video stream to H.264, VP8 or VP9
    • sends the encoded RTP stream using WebRTC to a Janus server
  3. Let “consumer” clients connect to Janus with their browser, in order to see the resulting live broadcast.

(If the video doesn’t display, here is the Youtube link.)

This is pretty cool and fun, as my colleague Brian Kardell mentions in the video. Working on this new version gave me more ideas for the next one. And very recently the audio rendering protocol was merged in WPEBackend-FDO! That means even more use-cases are now unlocked for GstWPE.

This demo’s source code is hosted on Github. Feel free to open issues there, I am always interested in getting feedback, good or bad!

GstWPE is maintained upstream in GStreamer and relies heavily on WPEWebKit and its FDO backend. Don’t hesitate to contact us if you have specific requirements or issues with these projects :)

by Philippe Normand at July 02, 2020 01:00 PM

July 01, 2020

Alejandro Piñeiro

v3dv status update 2020-07-01

About three weeks ago there was a big announcement about the update of the status of the Vulkan effort for the Raspberry Pi 4. Now the source code is public. Taking into account the interest that it got, and that now the driver is more usable, we will try to post status updates more regularly. Let’s talk about what’s happened since then.

Input Attachments

Input attachment is one of the main sub-features for Vulkan multipass, and we’ve gained support since the announcement. On Vulkan the support for multipass is more tightly supported by the API. Renderpasses can have multiple subpasses. These can have dependencies between each other, and each subpass define a subset of “attachments”. One attachment that is easy to understand is the color attachment: This is where a given subpass writes a given color. Another, input attachment, is an attachment that was updated in a previous subpass (for example, it was the color attachment on such previous subpass), and you get as a input on following subpasses. From the shader POV, you interact with it as a texture, with some restrictions. One important restriction is that you can only read the input attachment at the current pixel location. The main reason for this restriction is because on tile-based GPUs (like rpi4) all primitives are batched on tiles and fragment processing is rendered one tile at a time. In general, if you can live with those restrictions, Vulkan multipass and input attachment will provide better performance than traditional multipass solutions.

If you are interested in reading more details on this, you can check out ARM’s very nice presentation “Vulkan Multipass mobile deferred done right”, or Sascha Willems’ post “Vulkan input attachments and sub passes”. The latter also includes information about how to use them and code snippets of one of his demos. For reference, this is how the input attachment demos looks on the rpi4:

Compute Shader

Given that this was one of the most requested features after the last update, we expect that this will be likely be the most popular news from this post: Compute shaders are now supported.

Compute shaders give applications the ability to perform non-graphics related tasks on the GPU, outside the normal rendering pipeline. For example they don’t have vertices as input, or fragments as output. They can still be used for massivelly parallel GPGPU algorithms. For example, this demo from Sascha Willems uses a compute shader to simulate cloth:

Storage Image

Storage Image is another recent addition. It is a descriptor type that represents an image view, and supports unfiltered loads, stores, and atomics in a shader. It is really similar in most other ways to the well-known OpenGL concept of texture. They are really common with compute shaders. Compute shaders will not render (they can’t) directly any image, and it is likely that if they need an image, they will update it. In fact the two Sascha Willem demos using storage images also require compute shader support:


Right now our main focus for the driver is working on features, targetting a compliant Vulkan 1.0 driver. Having said so, now that we both support a good range of features and can run non-basic applications, we have devoted some time to analyze if there were clear points where we could improve the performance. Among these we implemented:
1. A buffer object (BO) cache: internally we are allocating and freeing really often buffer objects for basically the same tasks, so there are a constant need of buffers of the same size. Such allocation/free require a DRM call, so we implemented a BO cache (based on the existing for the OpenGL driver) so freed BOs would be added to a cache, and reused if a new BO is allocated with the same size.
2. New code paths for buffer to image copies.


In addition to work on specific features, we also spent some time fixing specific driver bugs, using failing Vulkan CTS tests as reference. Thanks to that work, the Sascha Willems’ radial blur demo is now properly rendering, even though we didn’t focus specifically on working on that demo:


Now that the driver supports a good range of features and we are able to test more applications and run more Vulkan CTS Tests with all the needed features implemented, we plan to focus some efforts towards bugfixing for a while.

We also plan to start to work on implementing the support for Pipeline Cache, which allows the result of pipeline construction to be reused between pipelines and between runs of an application.

by infapi00 at July 01, 2020 05:26 PM

June 30, 2020

Enrique Ocaña

Developing on WebKitGTK with Qt Creator 4.12.2

After the latest migration of WebKitGTK test bots to use the new SDK based on Flatpak, the old development environment based on jhbuild became deprecated. It can still be used with export WEBKIT_JHBUILD=1, though, but support for this way of working will gradually fade out.

I used to work on a chroot because I love the advantages of having an isolated and self-contained environment, but an issue in the way bubblewrap manages mountpoints basically made it impossible to use the new SDK from a chroot. It was time for me to update my development environment to the new ages and have it working in my main Kubuntu 18.04 distro.

My mail goal was to have a comfortable IDE that follows standard GUI conventions (that is, no emacs nor vim) and has code indexing features that (more or less) work with the WebKit codebase. Qt Creator was providing all that to me in the old chroot environment thanks to some configuration tricks by Alicia, so it should be good for the new one.

I preferred to use the Qt Creator 4.12.2 offline installer for Linux, so I can download exactly the same version in the future in case I need it, but other platforms and versions are also available.

The WebKit source code can be downloaded as always using git:

git clone

It’s useful to add WebKit/Tools/Scripts and WebKit/Tools/gtk to your PATH, as well as any other custom tools you may have. You can customize your $HOME/.bashrc for that, but I prefer to have an environment script to be sourced from the current shell when I want to enter into my development environment (by running webkit). If you’re going to use it too, remember to adjust to your needs the paths used there.

Even if you have a pretty recent distro, it’s still interesting to have the latests Flatpak tools. Add Alex Larsson’s PPA to your apt sources:

sudo add-apt-repository ppa:alexlarsson/flatpak

In order to ensure that your distro has all the packages that webkit requires and to install the WebKit SDK, you have to run these commands (I omit the full path). Downloading the Flatpak modules will take a while, but at least you won’t need to build everything from scratch. You will need to do this again from time to time, every time the WebKit base dependencies change:


Now just build WebKit and check that MiniBrowser works:

build-webkit --gtk
run-minibrowser --gtk

I have automated the previous steps as go full-rebuild and

This build process should have generated a WebKit/WebKitBuild/GTK/Release/compile_commands.json
file with the right parameters and paths used to build each compilation unit in the project. This file can be leveraged by Qt Creator to get the right include paths and build flags after some preprocessing to translate the paths that make sense from inside Flatpak to paths that make sense from the perspective of your main distro. I wrote to take care of those transformations. It can be run manually or automatically when calling go full-rebuild or go update.

The WebKit way of managing includes is a bit weird. Most of the cpp files include config.h and, only after that, they include the header file related to the cpp file. Those header files depend on defines declared transitively when including config.h, but that file isn’t directly included by the header file. This breaks the intuitive rule of “headers should include any other header they depend on” and, among other things, completely confuse code indexers. So, in order to give the Qt Creator code indexer a hand, the script pre-includes WebKit.config for every file and includes config.h from it.

With all the needed pieces in place, it’s time to import the project into Qt Creator. To do that, click File → Open File or Project, and then select the compile_commands.json file that should have generated in the WebKit main directory.

Now make sure that Qt Creator has the right plugins enabled in Help → About Plugins…. Specifically: GenericProjectManager, ClangCodeModel, ClassView, CppEditor, CppTools, ClangTools, TextEditor and LanguageClient (more on that later).

With this setup, after a brief initial indexing time, you will have support for features like Switch header/source (F4), Follow symbol under cursor (F2), shading of disabled if-endif blocks, auto variable type resolving and code outline. There are some oddities of compile_commands.json based projects, though. There are no compilation units in that file for header files, so indexing features for them only work sometimes. For instance, you can switch from a method implementation in the cpp file to its declaration in the header file, but not the opposite. Also, you won’t see all the source files under the Projects view, only the compilation units, which are often just a bunch of UnifiedSource-*.cpp files. That’s why I prefer to use the File System view.

Additional features like Open Type Hierarchy (Ctrl+Shift+T) and Find References to Symbol Under Cursor (Ctrl+Shift+U) are only available when a Language Client for Language Server Protocol is configured. Fortunately, the new WebKit SDK comes with the ccls C/C++/Objective-C language server included. To configure it, open Tools → Options… → Language Client and add a new item with the following properties:

  • Name: ccls
  • Language: *.c;.cpp;*.h
  • Startup behaviour: Always On
  • Executable: /home/enrique/work/webkit/WebKit/Tools/Scripts/webkit-flatpak
  • Arguments: --gtk -c ccls --index=/home/enrique/work/webkit/WebKit

Some “LanguageClient ccls: Unexpectedly finished. Restarting in 5 seconds.” errors will appear in the General Messages panel after configuring the language client and every time you launch Qt Creator. It’s just ccls taking its time to index the whole source code. It’s “normal”, don’t worry about it. Things will get stable and start to work after some minutes.

Due to the way the Locator file indexer works in Qt Creator, it can become confused, run out of memory and die if it finds cycles in the project file tree. This is common when using Flatpak and running the MiniBrowser or the tests, since /proc and other large filesystems are accessible from inside WebKit/WebKitBuild. To avoid that, open Tools → Options… → Environment → Locator and set Refresh interval to 0 min.

I also prefer to call my own custom build and run scripts (go and instead of letting Qt Creator build the project with the default builders and mess everything. To do that, from the Projects mode (Ctrl+5), click on Build & Run → Desktop → Build and edit the build configuration to be like this:

  • Build directory: /home/enrique/work/webkit/WebKit
  • Add build step → Custom process step
    • Command: go (no absolute route because I have it in my PATH)
    • Arguments:
    • Working directory: /home/enrique/work/webkit/WebKit

Then, for Build & Run → Desktop → Run, use these options:

  • Deployment: No deploy steps
  • Run:
    • Run configuration: Custom Executable → Add
      • Executable:
      • Command line arguments:
      • Working directory:

With these configuration you can build the project with Ctrl+B and run it with Ctrl+R.

I think I’m not forgetting anything more regarding environment setup. With the instructions in this post you can end up with a pretty complete IDE. Here’s a screenshot of it working in its full glory:

Anyway, to be honest, nothing will ever reach the level of code indexing features I got with Eclipse some years ago. I could find usages of a variable/attribute and know where it was being read, written or read-written. Unfortunately, that environment stopped working for me long ago, so Qt Creator has been the best I’ve managed to get for a while.

Properly configured web based indexers such as the Searchfox instance configured in Igalia can also be useful alternatives to a local setup, although they lack features such as type hierarchy.

I hope you’ve found this post useful in case you try to setup an environment similar to the one described here. Enjoy!

by eocanha at June 30, 2020 03:47 PM

June 23, 2020

Igalia Compilers Team

Dates and Times in JavaScript

JavaScript Date is broken in ways that cannot be fixed without breaking the web. As the story goes, it was included in the original 10-day JavaScript engine hack and based on java.util.Date, which itself was deprecated in 1997 due to being a terrible API and replaced with a better one. The result has been for all of JavaScript’s history, the built-in Date has remained very hard to work with directly.

Starting a few years ago, a proposal has been developing, to add a new globally available object to JavaScript, Temporal. Temporal is a robust and modern API for working with dates, times, and timestamps, and also makes it easy to do things that were hard or impossible with Date, like converting dates between time zones, adding and subtracting while accounting for daylight saving time, working with date-only or time-only data, and even handling dates in non-Gregorian calendars. Although Temporal has “just works” defaults, it also provides fine-grained opt-in control of overflows, interpreting ambiguous times, and other corner cases. For more on the history of the proposal, and why it’s not possible to fix Date itself, read Maggie Pint’s two-part blog post “Fixing JavaScript Date”.

For examples of the power of Temporal, check out the cookbook. Many of these examples would be difficult to do with legacy Date, particularly the ones involving time zones. (We would have put an example in this post, but the code might soon become stale, for reasons which will hopefully become clear!)

This proposal is currently at Stage 2 in TC39’s proposal process, and we1 are hoping to move it along to Stage 3 soon.2 We have been working on the feature set of Temporal and the API for a long time, and we believe it’s full-featured and that the API is reasonable. You don’t design good APIs solely on the drawing board, however, so it’s time to put it to the test and let the JavaScript developer community try it out and see whether what we’ve come up with meets people’s needs.

It is still early enough that we can make drastic changes to the API if we find we need to, based on the feedback that we get. So please, try it out and let us know!

How to Try Temporal

If you just want to try Temporal out casually, with an interactive prompt, that’s easy! Visit the API documentation in your browser. On any of the documentation or cookbook pages, you can open your browser console and Temporal will be already loaded, ready for you to try out the examples. Or you can try it out on RunKit.

Or, maybe you are interested in a bit more in-depth evaluation, like building a small test project using Temporal. We know this takes up people’s valuable project time, but it’s also the best way that we can get the most valuable feedback, so we’d really appreciate this! We have released a polyfill for the Temporal API on npm. You can use it in your project with npm install --save proposal-temporal, and import it in your project with const { Temporal, Intl } = require('proposal-temporal');.

However, don’t use the polyfill in production applications! The proposal is still at Stage 2, and the polyfill has an 0.x version, so that should make it clear that the API is subject to change, and we do intend to keep changing it when we get feedback from you!

How to Give Feedback

We would love to hear from you about your experiences with Temporal! Once you’ve tried it, we have a short survey for you to fill out. If you feel comfortable doing so, please leave us your contact information, since we might want to ask some follow up questions.

Please also open an issue on our issue tracker if you have some suggestion! We welcome suggestions whether or not you filled out the survey. You can also browse the feedback that’s already been given in the issue tracker, and give it a thumbs-up if you agree or thumbs-down if you disagree.

Thanks for participating if you can! All the feedback that we receive now will help us make the right decisions as the proposal moves along to Stage 3 and Temporal eventually appears in your browser.

[1] “We” in this post means the Temporal champions group, a group of TC39 delegates and interested people. As you may guess from where this blog post is hosted, it includes members of Igalia’s Compilers team, but this was written on behalf of the Temporal champions. ↩

[2] Read the TC39 process document for more information on what these stages mean. tl;dr: Stage 2 is the time to give feedback on the proposal that can still be incorporated even if it requires drastic changes. Stage 3 is when the proposal remains stable except for serious problems discovered during implementation in browsers. ↩

by compilers at June 23, 2020 06:11 PM

June 16, 2020

Víctor Jáquez

WebKit Flatpak SDK and gst-build

This post is an annex of Phil’s Introducing the WebKit Flatpak SDK. Please make sure to read it, if you haven’t already.

Recapitulating, nowadays WebKitGtk/WPE developers —and their CI infrastructure— are moving towards to Flatpak-based environment for their workflow. This Flatpak-based environment, or Flatpak SDK for short, can be visualized as a software sandboxed-container, which bundles all the dependencies required to compile, run and debug WebKitGtk/WPE.

In a day-by-day work, this approach removes the potential compilation of the world in order to obtain reproducible builds, improving the development and testing work flow.

But what if you are also involved in the development of one dependency?

This is the case of Igalia’s multimedia team where, besides developing the multimedia features for WebKitGtk and WPE, we also participate in the GStreamer development, the framework used for multimedia.

Because of this, in our workflow we usually need to build WebKit with a fix, hack or new feature in GStreamer. Is it possible to add in Flatpak our custom GStreamer build without messing its own GStreamer setup? Yes, it’s possible.

gst-build is a set of scripts in Python which clone GStreamer repositories, compile them and setup an uninstalled environment. This uninstalled environment allows a transient usage of the compiled framework from their build tree, avoiding installation and further mess up with our system.

The WebKit scripts that wraps Flatpak operations are also capable to handle the scripts of gst-build to build GStreamer inside the container, and, when running WebKit’s artifacts, the scripts enable the mentioned uninstalled environment, overloading Flatpak’s GStreamer.

How do we unveil all this magic?

First of all, setup a gst-build installation as it is documented. In this installation is were the GStreamer plumbing is done.

Later, gst-build operations through WebKit compilation scripts are enabled when the environment variable GST_BUILD_PATH is exported. This variable should point to the directory where the gst-build tree is placed.

And that’s all!

But let’s put these words in actual commands. The following workflow assumes that WebKit repository is cloned in ~/WebKit and the gst-build tree is in ~/gst-build (please, excuse my bashisms).

Compiling WebKitGtk with symbols, using LLVM as toolchain (this command will also compile GStreamer):

$ cd ~/WebKit
% CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/build-webkit --gtk --debug

Running the generated minibrowser (remind GST_BUILD_PATH is required again for a correct linking):

$ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/run-minibrowser --gtk --debug

Running media layout tests:

$ GST_BUILD_PATH=/home/vjaquez/gst-build ./Tools/Scripts/run-webkit-tests --gtk --debug media

But wait! There’s more...

What if you I want to parametrize the GStreamer compilation. To say, I would like to enable a GStreamer module or disable the built of a specific element.

gst-build, as the rest of GStreamer modules, uses meson build system, so it’s possible to pass arguments to meson through the environment variable GST_BUILD_ARGS.

For example, I would like to enable gstreamer-vaapi 😇

$ cd ~/WebKit
% CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build GST_BUILD_ARGS="-Dvaapi=enabled" Tools/Scripts/build-webkit --gtk --debug

by vjaquez at June 16, 2020 11:49 AM

June 13, 2020

Philippe Normand

Setting up Debian containers on Fedora Silverblue

After almost 20 years using Debian, I am trying something different, Fedora Silverblue. However for work I still need to use Debian/Ubuntu from time to time. In this post I am explaining the steps to setup Debian containers on Silverblue.

By default Silverblue comes with Toolbox which perfectly integrates the OS. It’s a shell script which actually relies on Podman, an alternative to Docker. And it works really well with Fedora images! However I ran into various issues when I wanted to setup Debian containers pulled from

$ toolbox create -c sid --image
Image required to create toolbox container.
Download (500MB)? [y/N]: y
Created container: sid
Enter with: toolbox enter --container sid
$ toolbox enter -c sid
toolbox: failed to start container sid

This should work, but doesn’t, because Toolbox expects specific Image requirements. After some digging into alternate Toolbox implementations and quite a bit of experimentation, I found a Toolbox pull-request adding support for Debian containers. Unfortunately this pull-request wasn’t merged yet, perhaps because the Toolbox developers are busy in the rewrite of Toolbox in Go. Scrolling down the comments, Martin Pitt provides some details and links to the approach he’s taken to achieve the same goal. Trying to follow his instructions, I finally managed to have working Debian containers in Silverblue. Many thanks to him! Here’s the run-down

  1. Download the build-debian-toolbox script:
$ wget;a=blob_plain;f=build-debian-toolbox;hb=HEAD
$ mkdir -p ~/bin
$ mv build-debian-toolbox ~/bin
$ chmod +x ~/bin/build-debian-toolbox
  1. Modify it a little. Here I had to:
    • change the shell shebang to /bin/bash
    • Update toolbox call sites to ~/bin/toolbox
  2. Make sure ~/bin/ is first in your shell $PATH
  3. Download the patched toolbox script from this pull-request and:
    • Move it to ~/bin/
    • Again, switch to the /bin/bash/ shebang in that script

Then run build-debian-toolbox. By default it will create a Debian Sid container, unless you provide override as command-line arguments, so for instance to create an Ubuntu bionic container you can:

$ build-debian-toolbox bionic ubuntu
$ toolbox run -c bionic cat /etc/os-release
VERSION="18.04.4 LTS (Bionic Beaver)"
PRETTY_NAME="Ubuntu 18.04.4 LTS"

That’s it! Hopefully the Toolbox Go rewrite will support this out of the box without requiring third-party tweaks. Thanks again to Martin for his work on this topic.

by Philippe Normand at June 13, 2020 11:50 AM

June 11, 2020

Philippe Normand

Introducing the WebKit Flatpak SDK

Working on a web-engine often requires a complex build infrastructure. This post documents our transition from JHBuild to Flatpak for the WebKitGTK and WPEWebKit development builds.

For the last 10 years, WebKitGTK has been relying on a custom JHBuild moduleset to handle its dependencies and (try to) ensure a reproducible test environment for the build bots. When WPEWebKit was upstreamed several years ago, a similar approach was used. We ended up with two slightly different modulesets to maintain. The biggest problem with that is that we still depend on the host OS for some dependencies. Another set of scripts was then written to install those, depending on which distro the host is running… This is a bit unfortunate. There are more issues with JHBuild, when a moduleset is updated, the bots wipe all the resulting builds and start a new build from scratch! Every WebKitGTK and WPE developer is strongly advised to use this setup, so everybody ends up building the dependencies.

In 2018, my colleague Thibault Saunier worked on a new approach, based on Flatpak. WebKit could be built as a Flatpak app relying on the GNOME SDK. This experiment was quite interesting but didn’t work for several reasons:

  • Developers were not really aware of this new approach
  • The GNOME SDK OSTree commits we were pinning we being removed from the upstream repo periodically, triggering issues on our side
  • Developers still had to build all the WebKit “app” dependencies

In late 2019 I started having a look at this, with a different workflow. I started to experiment with a custom SDK, based on the Freedesktop SDK. The goal was to distribute this to WebKit developers, who wouldn’t need to worry as much about build dependencies anymore and just focus on building the damn web-engine.

I first learned about flatpak-builder, built a SDK with that, and started playing with it for WebKit builds. I was almost happy with that, but soon realized flatpak-builder is cool for apps packaging, but for big SDKs, it doesn’t really work out. Flatpak-builder builds a single recipe at a time and if I want to update one, everything below in the manifest is rebuilt. As the journey goes on, I found Buildstream, which is actually used by the FDO and GNOME folks nowadays. After converting my flatpak-builder manifest to Buildstream I finally achieved happiness. Buildstream is bit like Yocto, Buildroot and all the similar NIH sysroot builders. It was one more thing to learn, but worth it.

The SDK build definitions are hosted in WebKit’s repository. The resulting Flatpak images are hosted on Igalia’s server and locally installed in a custom Flatpak UserDir so as to not interfere with the rest of the host OS Flatpak apps. The usual mix of python, perl, ruby WebKit scripts rely on the SDK to perform their job of building WebKit, running the various test suites (API tests, layout tests, JS tests, etc). All you need to do is:

  1. clone the WebKit git repo
  2. run Tools/Scripts/update-webkit-flatpak
  3. run Tools/Scripts/build-webkit —gtk
  4. run build artefacts, like the MiniBrowser, Tools/Scripts/run-minibrowser —gtk

Under the hood, the SDK will be installed in WebKitBuild/UserFlatpak and transparently used by the various build scripts. The sandbox is started using a flatpak run call which bind-mounts the WebKit checkout and build directory. This is great! We finally have a unified workflow, not depending on specific host distros. We can easily update toolchains, package handy debug tools like rr, LSP tooling such as ccls, etc and let the lazy developers actually focus on development, rather than tooling infrastructure.

Another nice tool we now support, is sccache. By deploying a “cloud” of builders in our Igalia servers we can now achieve improved build times. The SDK generates a custom sccache config based on the toolchains it includes (currently GCC 9.3.0 and clang 8). You can optionally provide an authentication token and you’re set. Access to the Igalia build cloud is restricted to Igalians, but folks who have their own sccache infra can easily set it up for WebKit. In my home office where I used to wait more than 20 minutes for a build to complete, I can now have a build done in around 12 minutes. Once we have Redis-powered cloud storage we might reach even better results. This is great because the buildbots should be able to keep the Redis cache warm enough. Also developers who can rely on this won’t need powerful build machines anymore, a standard workstation or laptop should be sufficient because the heavy C++ compilation jobs happen in the “cloud”.

If you don’t like sccache, we still support IceCC of course. The main difference is that IceCC works better on local networks.

Hacking on WebKit often involves hacking on its dependencies, such as GTK, GStreamer, libsoup and so on. With JHBuild it was fairly easy to vendor patches in the moduleset. With Buildstream the process is a bit more complicated, but actually not too bad! As we depend on the FDO SDK we can easily “patch” the junction file and also adding new dependencies from scratch is quite easy. Many thanks to the Buildstream developers for the hard work invested in the project!

As a conclusion, all our WebKitGTK and WPEWebKit bots are now using the SDK. JHBuild remains available for now, but on opt-in basis. The goal is to gently migrate most of our developers to this new setup and eventually JHBuild will be phased out. We still have some tasks pending, but we achieved very good progress on improving the WebKit developer workflow. A few more things are now easier to achieve with this new setup. Stay tuned for more! Thanks for reading!

by Philippe Normand at June 11, 2020 12:50 PM

Diego Pino

Renderization of Conic gradients

The CSS Images Module Level 4 introduced a new type of gradient: conic-gradient. Until then, there were only two other type of gradients available on the Web: linear-gradient and radial-gradient.

The first browser to ship conic-gradient support was Google Chrome, around March 2018. A few months after, September 2018, the feature was available in Safari. Firefox have been missing support until now, although an implementation is on the way and will ship soon. In the case of WebKitGTK (Epiphany) and WPE (Web Platform for Embedded), support landed in October 2019 which I implemented as part of my work at Igalia. The feature has been officially available in WebKitGTK and WPE since version 2.28 (March 2020).

Before native browser support, conic-gradient was available as a JavaScript polyfill created by Lea Verou.

Gradients in the Web

Generally speaking, a gradient is a smooth transition of colors defined by two or more stop-colors. In the case of a linear gradient, this transition is defined by a straight line (which might have and angle or not).

div.linear-gradient {
  width: 400px;
  height: 100px;
  background: linear-gradient(to right, red, yellow, lime, aqua, blue, magenta, red);
Linear gradient
Linear gradient

In the case of a radial gradient, the transition is defined by a center and a radius. Colors expand evenly in all directions from the center of the circle to outside.

div.radial-gradient {
  width: 300px;
  height: 300px;
  border-radius: 50%;
  background: radial-gradient(red, yellow, lime, aqua, blue, magenta, red);
Radial gradient
Radial gradient

A conical gradient, although also defined by a center and a radius, isn’t the same as a radial gradient. In a conical gradient colors spin around the circle.

div.conic-gradient {
  width: 300px;
  height: 300px;
  border-radius: 50%;
  background: conic-gradient(red, yellow, lime, aqua, blue, magenta, red);
Conic gradient
Conic gradient

Implementation in WebKitGTK and WPE

At the time of implementing support in WebKitGTK and WPE, the feature had already shipped in Safari. That meant WebKit already had support for parsing the conic-gradient specification as defined in CSS Images Module Level 4 and the data structures to store relevant information were already created. The only piece missing in WebKitGTK and WPE was painting.

Safari leverages many of its graphical painting operations on CoreGraphics library, which counts with a primitive for conic gradient painting (CGContextDrawConicGradient). Something similar happens in Google Chrome, although in this case the graphics library underneath is Skia (CreateTwoPointConicalGradient). WebKitGTK and WPE use Cairo for many of their graphical operations. In the case of linear and radial gradients, there’s native support in Cairo. However, there isn’t a function for conical gradient painting. This doesn’t mean Cairo cannot be used to paint conical gradients, it just means that is a little bit more complicated.

Mesh gradients

Cairo documentation states is possible to paint a conical gradient using a mesh gradient. A mesh gradient is defined by a set of colors and control points. The most basic type of mesh gradient is a Gouraud-shading triangle mesh.

cairo_mesh_pattern_begin_patch (pattern)

cairo_mesh_pattern_move_to (pattern, 100, 100);
cairo_mesh_pattern_line_to (pattern, 130, 130);
cairo_mesh_pattern_line_to (pattern, 130,  70);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0);
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0);
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1);

cairo_mesh_pattern_end_patch (pattern)
Gouraud-shaded triangle mesh
Gouraud-shaded triangle mesh

A more sophisticated patch of mesh gradient is a Coons patch. A Coons patch is a quadrilateral defined by 4 cubic Bézier curve and 4 colors, one for each vertex. A Bézier curve is defined by 4 points, so we have a total of 12 control points (and 4 colors) in a Coons patch.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 45, 12);
cairo_mesh_pattern_curve_to(pattern, 69, 24, 173, -15, 115, 50);
cairo_mesh_pattern_curve_to(pattern, 127, 66, 174, 47, 148, 104);
cairo_mesh_pattern_curve_to(pattern, 65, 58, 70, 69, 18, 103);
cairo_mesh_pattern_curve_to(pattern, 42, 43, 63, 45, 45, 12);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0); // green
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1); // blue
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow

cairo_mesh_pattern_end_patch (pattern);
Coons patch gradient
Coons patch gradient

A Coons patch comes very handy to paint a conical gradient. Consider the first quadrant of a circle, such quadrant can be easily defined with a Bézier curve.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 0, 200);
cairo_mesh_pattern_line_to (pattern, 0, 0);
cairo_mesh_pattern_curve_to (pattern, 133, 0, 200, 133, 200, 200);
cairo_mesh_pattern_line_to (pattern, 0, 200);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0); // green
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1); // blue
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow

Coons patch of the first quadrant of a circle
Coons patch of the first quadrant of a circle

If we just simply use two colors instead, the final result resembles more to how a conical gradient looks.

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 1, 1, 0); // yellow
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow
Coons patch of the first quadrant of a circle (2 colors)
Coons patch of the first quadrant of a circle (2 colors)

Repeat this step 3 times more, with a few more stop colors, and you have a nice conical gradient.

A conic gradient made by composing mesh patches
A conic gradient made by composing mesh patches

Bézier curve as arcs

At this point the difficulty of painting a conical gradient has been reduced to calculating the shape of the Bézier curve of each mesh patch.

Computing the starting and ending points is straight forward, however calculating the position of the other two control points of the Bézier curve is a bit much harder.

Bézier curve approximation to a circle quadrant
Bézier curve approximation to a circle quadrant

Mozillian Michiel Kamermans (pomax) has a beautifully written essay on Bézier curves. Section “Circles and cubic Bézier curves” of such essay discusses how to approximate a Bézier curve to an arc. The case of a circular quadrant is particularly interesting because it allows painting a circle with 4 Bézier curves with minimal error. In the case of the quadrant above the values for each point would be the following:

S = (0, r), CP1 = (0.552 * r, r), CP2 = (r, 0.552 * r), E = (r, 0) 

Even though on its most basic form a conic gradient is defined by one starting and one ending color, painting a circle with two Bézier curves is not a good approximation to a semicircle (check the interactive examples of pomax’s Bézier curve essay). In such case, the conic gradient is split into four Coon patches with middle colors interpolated.

Also, in cases were there are more than 4 colors, each Coons patch will be smaller than a quadrant. It’s necessary a general formula that can compute the control points for each section of the circle, given an angle and a radius. After some math, the following formula can be inferred (check section “Circle and cubic Bézier curves” in pomax’s essay):

cp1 = {
   x: cx + (r * cos(angleStart) - f * (r * sin(angleStart),
   y: cy + (r * sin(angleStart)) + f * (r * cos(angleStart))
cp2 = {
   x: cx + (r * cos(angleEnd)) + f * (r * sin(angleEnd)),
   y: cy + (r * sin(angleEnd)) - f * (r * cos(angleEnd))

where f is a variable computed as:

f = 4 * tan((angleEnd - angleStart) / 4) / 3;

For a 90 degrees angle the value of f is 0.552. Thus, if the quadrant above had a radius of 100px, the values of the control points would be: CP1(155.2, 0) and CP2(200, 44.8) (considering top corner left as point 0,0).

And that’s basically all that is needed. The formula above allows us to compute a circular sector as a Bézier line, which when setup as a Coons patch creates a section of a conical gradient. Adding several Coons patches together creates the final conical gradient.

Wrapping up

It has been a long time since conic gradients for the Web were first drafted. For instance, the current bug in Firefox’s Bugzilla was created by Lea Verou five years ago. Fortunately, browsers have started shipping native support and conical gradients have been available in Chrome and Safari since two years ago. In this post I discussed the implementation, mainly rendering, of conic gradients in WebKitGTK and WPE. And since both browsers are WebKit based, they can leverage on the implementation efforts led by Apple when bringing support of this feature to Safari. With Firefox shipping conic gradient support soon this feature will be safe to use in the Web Platform.

June 11, 2020 12:00 AM

June 10, 2020

Philippe Normand

WebKitGTK and WPE now supporting videos in the img tag

Using videos in the <img> HTML tag can lead to more responsive web-page loads in most cases. Colin Bendell blogged about this topic, make sure to read his post on the cloudinary website. As it turns out, this feature has been supported for more than 2 years in Safari, but only recently the WebKitGTK and WPEWebKit ports caught up. Read on for the crunchy details or skip to the end of the post if you want to try this new feature.

As WebKitGTK and WPEWebKit already heavily use GStreamer for their multimedia backends, it was natural for us to also use GStreamer to provide the video ImageDecoder implementation.

The preliminary step is to hook our new decoder into the MIMETypeRegistry and into the ImageDecoder.cpp platform-agnostic module. This is where the main decoder branches out to platform-specific backends. Then we need to add a new class implementing WebKit’s ImageDecoder virtual interface.

First you need to implement supportsMediaType(). For this method we already had all the code in place. WebKit scans the GStreamer plugin registry and depending on the plugins available on the target platform, a mime-type cache is built, by the RegistryScanner. Our new image decoder just needs to hook into this component (exposed as singleton) so that we can be sure that the decoder will be used only for media types supported by GStreamer.

The second most important places of the decoder are its constructor and the setData() method. This is the place where the decoder receives encoded data, as a SharedBuffer, and performs the decoding. Because this method is synchronously called from a secondary thread, we run the GStreamer pipeline there, until the video has been decoded entirely. Our pipeline relies on the GStreamer decodebin element and WebKit’s internal video sink. For the time being hardware-accelerated decoding is not supported. We should soon be able to fix this issue though. Once all samples have been received by the sink, the decoder notifies its caller using a callback. The caller then knows it can request decoded frames.

Last, the decoder needs to provide decoded frames! This is implemented using the createFrameImageAtIndex() method. Our decoder implementation keeps an internal Map of the decoded samples. We sub-classed the MediaSampleGStreamer to provide an image() method, which returns the Cairo surface representing the decoded frame. Again, here we don’t support GL textures yet. Some more infrastructure work is likely going to be needed in that area of our WebKit ports.

Our implementation of the video ImageDecoder lives in ImageDecoderGStreamer which will be shipped in WebKitGTK and WPEWebKit 2.30, around September/October. But what if you want to try this already? Well, building WebKit can be a tedious task. We’ve been hard a work at Igalia to make this a bit easier using a new Flatpak SDK. So, you can either try this feature in Epiphany Tech Preview or (surprise!) with our new tooling allowing to download and run nightly binaries from the upstream WebKit build bots:

$ wget
$ chmod +x webkit-flatpak-run-nightly
$ python3 webkit-flatpak-run-nightly MiniBrowser

This script locally installs our new Flatpak-based developer SDK in ~/.cache/wk-nightly and then downloads a zip archive of the build artefacts from servers recently brought up by my colleague Carlos Alberto Lopez Perez, many thanks to him :). The downloaded zip file is unpacked in /tmp and kept around in case you want to run this again without re-downloading the build archive. Flatpak is then used to run the binaries inside a sandbox! This is a nice way to run the bleeding edge of the web-engine, without having to build it or install any distro package.

Implementing new features in WebKit is one of the many expertize domains we are involved in at Igalia. Our multimedia team is always on the lookout to help folks in their projects involving either GStreamer or WebKit or both! Don’t hesitate to reach out.

by Philippe Normand at June 10, 2020 07:00 AM

June 09, 2020

Alejandro Piñeiro

v3dv: quick guide to build and run some demos

Just today it has published a status update of the Vulkan effort for the Raspberry Pi 4, including that we are moving the development of the driver to an open repository. As it is really likely that some people would be interested on testing it, even if it is not complete at all, here you can find a quick guide to compile it, and get some demos running.


So let’s start installing some dependencies. My personal recipe, that I use every time I configure a new machine to work on mesa is the following one (sorry if some extra unneeded dependencies slipped):

sudo apt-get install libxcb-randr0-dev libxrandr-dev \
        libxcb-xinerama0-dev libxinerama-dev libxcursor-dev \
        libxcb-cursor-dev libxkbcommon-dev xutils-dev \
        xutils-dev libpthread-stubs0-dev libpciaccess-dev \
        libffi-dev x11proto-xext-dev libxcb1-dev libxcb-*dev \
        bison flex libssl-dev libgnutls28-dev x11proto-dri2-dev \
        x11proto-dri3-dev libx11-dev libxcb-glx0-dev \
        libx11-xcb-dev libxext-dev libxdamage-dev libxfixes-dev \
        libva-dev x11proto-randr-dev x11proto-present-dev \
        libclc-dev libelf-dev git build-essential mesa-utils \
        libvulkan-dev ninja-build libvulkan1 python-mako \
        libdrm-dev libxshmfence-dev libxxf86vm-dev \

Most Raspian libraries are recent enough, but they have been updating some of then during the past months, so just in case, don’t forget to update:

$ sudo apt-get update
$ sudo apt-get upgrade

Additionally, you woud need to install meson. Mesa has just recently bumped up the version needed for meson, so Raspbian version is not enough. There is the option to build meson from the tarball (meson-0.52.0 here), but by far, the easier way to get a recent meson version is using pip3:

$ pip3 install meson

2020-07-04 update

It seems that some people had problems if they have installed meson with apt-get on their system, as when building it would try the older meson version first. For those people, they were able to fix that doing this:

$ sudo apt-get remove meson
$ pip3 install --user meson

Download and build v3dv

This is the simpler recipe to build v3dv:

$ git clone mesa
$ cd mesa
$ git checkout wip/igalia/v3dv
$ meson --prefix /home/pi/local-install --libdir lib -Dplatforms=x11,drm -Dvulkan-drivers=broadcom -Ddri-drivers= -Dgallium-drivers=v3d,kmsro,vc4 -Dbuildtype=debug _build
$ ninja -C _build
$ ninja -C _build install

This builds and install a debug version of v3dv on a local directory. You could set a release build, or any other directory. The recipe is also building the OpenGL driver, just in case anyone want to compare, but if you are only interested on the vulkan driver, that is not mandatory.

Run some Vulkan demos

Now, the easiest way to ensure that a vulkan program founds the drivers is setting the following envvar:

export VK_ICD_FILENAMES=/home/pi/local-install/share/vulkan/icd.d/broadcom_icd.armv7l.json

That envvar is used by the Vulkan loader (installed as one of the dependencies listed before) to know which library load. This also means that you don’t need to use LD_PRELOAD, LD_LIBRARY_PATH or similar

So what Vulkan programs are working? For example several of the Sascha Willem Vulkan demos. To make things easier to everybody, here another quick recipe of how to get them build:

$ sudo apt-get install libassimp-dev
$ git clone --recursive  sascha-willems
$ cd sascha-willems
$ mkdir build; cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug  ..
$ make
$ cd bin
$ ./gears

If everything went well, doing that would get this familiar image:

If you want to a somewhat more eye candy demo, you would need to download some assets. So:

$ cd ../..
$ python3
$ cd build/bin

As mentioned, not all the demos works. But a list of some that we tested and seem to work:
* distancefieldfonts
* descriptorsets
* dynamicuniformbuffer
* gears
* gltfscene
* imgui
* indirectdraw
* occlusionquery
* parallaxmapping
* pbrbasic
* pbribl
* pbrtexture
* pushconstants
* scenerendering
* shadowmapping
* shadowmappingcascade
* specializationconstants
* sphericalenvmapping
* stencilbuffer
* textoverlay
* texture
* texture3d
* texturecubemap
* triangle
* vulkanscene

Update : rpiMike on the comments, and some people privately, have pointed some errors on the post. Thanks! And sorry for the inconvenience.

Update 2 : Mike Hooper pointed more issues on gitlab

by infapi00 at June 09, 2020 09:33 AM

June 08, 2020

Jacobo Aragunde

Dialog accessibility in Chromium

In the latest weeks I’ve been identifying and fixing several issues related to accessibility on dialogs (called “bubbles” in the code base), specially but not limited to the Linux platform.

It all started with the “Restore pages” dialog that appears when restarting after a browser crash. ATs, like screen readers, were not being notified about the presence of that dialog due to it using an incorrect role, which made it impossible for a blind user to find it out unless by chance, tabbing through the application.

While I was working on that, I detected more issues related to this and other dialogs, so I started reporting and fixing individually. They also led me to an existing meta-bug related to the “restore pages” dialog and accessibility… In the end, this is what I accomplished:

For the original issue with ATs not being able to report the “restore pages” dialog, a quick solution was to make this subwindow use the appropriate ATK role, “alert”, and implement some code in Orca, the screen reader, to detect an alert on a newly created browser window.

Now that users are notified of the presence of that dialog, it would be great to provide them with a way to focus it directly. Two related hotkeys are available in Chromium: F6 to rotate the pane focus, which should focus dialogs first if they are present, and Alt+Shift+A to specifically focus a dialog; but they did not work with that dialog. I fixed this problem, making the hotkey handler code look for dialogs anchored to the menu icon, where the “restore pages” dialog is located. This problem was affecting all platforms, so it’s been a big gain!

Testing the existing hotkey code led me to trying out other kinds of dialogs, and I noticed that permission dialogs (like “a website wants to know your location”) had similar problems: they were not notified and not affected by hotkeys. I fixed both things by making sure that the dialogs have the proper role, that the alert events are properly managed by the Linux accessibility backend, and checking the browser omnibar for anchored dialogs when the focus hotkeys are used.

I detected similar problems in the “store password” bubble; the Orca screen reader was unable to announce that, because it didn’t have the expected role nor it did emit the proper events. Changing the role of the bubble was enough to activate the code that triggered the events, fixing both problems at the same time.

Finally, working with dialogs and alerts made us reconsider their role mappings for ATK, which we decided to modify to better match Chromium and ARIA roles.

There are more enhancements in the backlog, for example, we will try to minimize the number of redundant alert events or come up with a more general solution to decide the role of the dialog, which is also causing similar problems to Windows accessibility (e.g. on the “restore pages” dialog).

Thanks a lot to everyone who helped land these patches, specially Googlers who provided their feedback on reviews!

by Jacobo Aragunde Pérez at June 08, 2020 04:30 PM

June 07, 2020

Ricardo García

Visualizing images from VK-GL-CTS test results

When running OpenGL or Vulkan tests normally from VK-GL-CTS, the test suite executable will usually produce a file named TestResults.qpa containing test results in XML format. Sometimes either by default or when a test fails, this file contains output images obtained typically from the test output itself and maybe a reference image the former is being compared to. In addition, sometimes an error mask is provided so implementors can easily detect cases of pixels falling outside the expected result range.

These images are normally converted to PNGs using 32-bits per pixel (RGBA8) and represented in the test log as a string of Base64-encoded binary data. Representing the result image in that format can be an approximation exercise when the original format or type is very different (like R32G32B32A32_SFLOAT or a 3D image), but in some situations the result image can be faithfully represented in that chunk of PNG-encoded data.

To view those images, the VK-GL-CTS README file mentioned the possibility of using the Cherry tool. Given that it requires setting up a web server and running tests in a special way, some people would typically rely on external tools or scripts instead, like the base64 command line tool included in GNU coreutils, which can encode and decode Base64 content.

My Igalia colleague Eduardo Lima, however, had another idea. Most people have a tool in their systems which is capable of directly displaying Base64-encoded PNG data: a web browser. With a little bit of Javascript, he had created a self-contained and single-page web app you could use to view images by pasting the PNG-encoded version of those images directly. The tool created <img> elements on the page from the pasted text, and used data:image/png;base64,<Base64-encoded data> as the image src property. It also allowed comparing two images side by side and was, in general, incredibly handy.

Alejandro Piñeiro, currently working on the Raspberry Pi Vulkan driver also at Igalia, suggested improving the existing tool by allowing it to read TestResults.qpa files directly in order to reduce friction. I’m far from a web programmer and I’m sorry for the many times I have probably sinned along the way, but I took his suggestions and my own pet peeves with the existing tool and implemented an improved version of it. I submitted my new tool for review and inclusion as part of VK-GL-CTS and I’m happy to say it landed not long ago. If you use VK-GL-CTS, do not hesitate to open qpa_image_viewer.html from the scripts subdirectory in your web browser and give it a go when visualizing test results.

I have also uploaded a copy of the tool to my own personal space at Igalia. Feel free to bookmark it and use it when needed. It’s just a few KB in size. As I mentioned before, it’s self-contained and standalone, so everything happens locally in your browser. You can read its source code as it’s also relatively small. You can open TestResults.qpa files directly from your system and you can also paste chunks of text containing <Image> elements. It accumulates images in the images section at the bottom, i.e., every time you tell it to process a chunk of text or process a new file, it will add the images it finds to the images section. To ease identifying problems, it also includes a built-in zoom tool. Images have the image-rendering: pixelated CSS property, which is supported in Chromium and WebKit-based browsers (sadly, not Firefox for now), making the zooming process use the nearest-neighbor approximation instead of a linear interpolation, so pixels are represented as faithfully as possible when scaling.

June 07, 2020 01:59 PM

June 05, 2020

Igalia Compilers Team

What we do at Igalia’s Compiler Team

Compilers for the web

At Igalia, our development teams have included a team specializing in compilers since around 2012. Since most tech companies don’t work on compilers or even more generally on programming language implementation, you might be wondering “What does a compilers team even do?”. This blog post will try to explain, as well as highlight some of our recent work.

While many companies who work on compilers own or maintain their own programming language (e.g., like Google and Go, Apple and Swift, Mozilla and Rust, etc.), domain-specific compiler or language, Igalia is a little bit different.

Since we are a consulting company, our compiler team instead helps maintain and improve existing free software/open source programming language implementations, with a focus on languages for the web. In other words, we help improve JavaScript engines and, more recently, WebAssembly (Wasm) runtimes.

To actually do the work, Igalia has grown a compilers team of developers from a variety of backgrounds. Some of our developers came into the job from a career in industry, and others from a research or academic setting. Our developers are contributors to a variety of non-web languages as well, including functional programming languages and scripting languages.

Our recent work

Given our team’s diverse backgrounds, we are able to work on not only compiler implementations (which includes compilation, testing, maintenance, and so on) but also in the standardization process for language features. To be more specific, here are some examples of projects we’re working on, split into several areas:

  • Maintenance: We work on the maintenance of JS engines to make sure they work well on platforms that our customers care about. For example, we maintain the support for 32-bit architectures in JavaScriptCore (WebKit’s JS engine). This is especially important to us because WebKit is used on billions of embedded devices and we are the maintainers of WPE, the official WebKit port for embdedded systems.
    • This involves things like making sure that CI continues to pass on platforms like ARMv7 and MIPS, and also making sure that JS engine performance is good on these platforms.
    • Recently, some of our developers have been sharing their knowledge about JSC development in several blog posts. [1], [2], [3]
  • JS feature development & standardization: We also work on implementing features proposed by the web platform community in all of the major JS engines, and we work on standardizing features as participants in TC39.
    • Recently we have been doing a lot of work around class fields and private methods in multiple browsers.
    • We’re also involved in the work on the Temporal proposal for better date/time management in JS.
    • Another example of our recent work in standardization is the BigInt feature, which is now part of the language specification for JS. Igalians led work on both the specification and also its implementation in browsers. [1], [2] We are currently working on integrating BigInts with WebAssembly as well.
  • WebAssembly: In the last year, we have gotten more involved in helping to improve Wasm, the new low-level compiler target language for the web (so that you can write C/C++/etc. code that will run on the web).

In the future, we’ll continue to periodically put pointers to our recent compilers work on this blog, so please follow along!

If you think you might be interested in helping to expand the web platform as a customer, don’t hesitate to get in touch!

by compilers at June 05, 2020 06:28 PM

Paulo Matos

JSC: what are my options?

Compilers tend to be large pieces of software that provide an enormous amount of options. We take a quick look at how to find what JavaScriptCore (JSC) provides.


by Paulo Matos at June 05, 2020 12:23 PM

June 03, 2020

Andy Wingo

a baseline compiler for guile

Greets, my peeps! Today's article is on a new compiler for Guile. I made things better by making things worse!

The new compiler is a "baseline compiler", in the spirit of what modern web browsers use to get things running quickly. It is a very simple compiler whose goal is speed of compilation, not speed of generated code.

Honestly I didn't think Guile needed such a thing. Guile's distribution model isn't like the web, where every page you visit requires the browser to compile fresh hot mess; in Guile I thought it would be reasonable for someone to compile once and run many times. I was never happy with compile latency but I thought it was inevitable and anyway amortized over time. Turns out I was wrong on both points!

The straw that broke the camel's back was Guix, which defines the graph of all installable packages in an operating system using Scheme code. Lately it has been apparent that when you update the set of available packages via a "guix pull", Guix would spend too much time compiling the Scheme modules that contain the package graph.

The funny thing is that it's not important that the package definitions be optimized; they just need to be compiled in a basic way so that they are quick to load. This is the essential use-case for a baseline compiler: instead of trying to make an optimizing compiler go fast by turning off all the optimizations, just write a different compiler that goes from a high-level intermediate representation straight to code.

So that's what I did!

it don't do much

The baseline compiler skips any kind of flow analysis: there's no closure optimization, no contification, no unboxing of tagged numbers, no type inference, no control-flow optimizations, and so on. The only whole-program analysis that is done is a basic free-variables analysis so that closures can capture variables, as well as assignment conversion. Otherwise the baseline compiler just does a traversal over programs as terms of a simple tree intermediate language, emitting bytecode as it goes.

Interestingly the quality of the code produced at optimization level -O0 is pretty much the same.

This graph shows generated code performance of the CPS compiler relative to new baseline compiler, at optimization level 0. Bars below the line mean the CPS compiler produces slower code. Bars above mean CPS makes faster code. You can click and zoom in for details. Note that the Y axis is logarithmic.

The tests in which -O0 CPS wins are mostly because the CPS-based compiler does a robust closure optimization pass that reduces allocation rate.

At optimization level -O1, which adds partial evaluation over the high-level tree intermediate language and support for inlining "primitive calls" like + and so on, I am not sure why CPS peels out in the lead. No additional important optimizations are enabled in CPS at that level. That's probably something to look into.

Note that the baseline of this graph is optimization level -O1, with the new baseline compiler.

But as I mentioned, I didn't write the baseline compiler to produce fast code; I wrote it to produce code fast. So does it actually go fast?

Well against the -O0 and -O1 configurations of the CPS compiler, it does excellently:

Here you can see comparisons between what will be Guile 3.0.3's -O0 and -O1, compared against their equivalents in 3.0.2. (In 3.0.2 the -O1 equivalent is actually -O1 -Oresolve-primitives, if you are following along at home.) What you can see is that at these optimization levels, for these 8 files, the baseline compiler is around 4 times as fast.

If we compare to Guile 3.0.3's default -O2 optimization level, or -O3, we see bigger disparities:

Which is to say that Guile's baseline compiler runs at about 10x the speed of its optimizing compiler, which incidentally is similar to what I found for WebAssembly compilers a while back.

Also of note is that -O0 and -O1 take essentially the same time, with -O1 often taking less time than -O0. This is because partial evaluation can make the program smaller, at a cost of being less straightforward to debug.

Similarly, -O3 usually takes less time than -O2. This is because -O3 is allowed to assume top-level bindings that aren't exported from a module can be transformed to lexical bindings, which are more available for contification and inlining, which usually leads to smaller programs; it is a similar debugging/performance tradeoff to the -O0/-O1 case.

But what does one gain when choosing to spend 10 times more on compilation? Here I have a gnarly graph that plots performance on some microbenchmarks for all the different optimization levels.

Like I said, it's gnarly, but the summary is that -O1 typically gets you a factor of 2 or 4 over -O0, and -O2 often gets you another factor of 2 above that. -O3 is mostly the same as -O2 except in magical circumstances like the mbrot case, where it adds an extra 16x or so over -O2.

worse is better

I haven't seen the numbers yet of this new compiler in Guix, but I hope it can have a good impact. Already in Guile itself though I've seen a couple interesting advantages.

One is that because it produces code faster, Guile's boostrap from source can take less time. There is also a felicitous feedback effect in that because the baseline compiler is much smaller than the CPS compiler, it takes less time to macro-expand, which reduces bootstrap time (as bootstrap has to pay the cost of expanding the compiler, until the compiler is compiled).

The second fortunate result is that now I can use the baseline compiler as an oracle for the CPS compiler, when I'm working on new optimizations. There's nothing worse than suspecting that your compiler miscompiled itself, after all, and having a second compiler helps keep me sane.

stay safe, friends

The code, you ask? Voici.

Although this work has been ongoing throughout the past month, I need to add some words on the now before leaving you: there is a kind of cognitive dissonance between nerding out on compilers in the comfort of my home, rain pounding on the patio, and at the same time the world on righteous fire. I hope it is clear to everyone by now that the US police are an essentially racist institution: they harass, maim, and murder Black people at much higher rates than whites. My heart is with the protestors. Godspeed to you all, from afar. At the same time, all my non-Black readers should reflect on the ways they participate in systems that support white supremacy, and on strategies to tear them down. I know I will be. Stay safe, wear eye protection, and until next time: peace.

by Andy Wingo at June 03, 2020 08:39 PM

May 26, 2020

Brian Kardell

Web Engine Diversity and Ecosystem Health

Web Engine Diversity and Ecosystem Health

For many years, we've seen lots of blog posts and conversation on the topic of "browser engine diversity". I'd like to offer a slightly tilted view of this based instead on "the health of the ecosystem", and explain why I think this is more valuable measure and way to discuss these topics.

Back in January, Jeremy Keith compared the complexity of browser engine diversity to political systems...

If you have hundreds of different political parties, that’s not ideal. But if you only have one political party, that’s very bad indeed!

I like this analogy bat's almost self-evident: 1 is too few, a hundred is too many - it's just chaos and noise. But what is the ideal number? This answer dogs us a lot, in part because it is not only unanswerable, it's actually kind of a red herring. I think there's something to this that leads to an important takeaway about how we think about the problem...

Numbers and goals...

The interesting part about this analogy is that the simple number of political parties isn't actually a very meaningful measure: Parties can differ a lot, or they can differ a little. 6 parties that barely disagree is quite a different thing than 3 that disagree substantially. Further, it's not simply a matter of giving voice to every dissenting opinion that matters either: There are political parties formed on ideas of hate, for example. Adding those doesn't add good things. In short, what really matters is the goal of those numbers.

In the case of the political analogy, the real aim is about what makes for a healthy, effective and just model of governance. Similarly on the topic of "browser engine diversity", I believe that the real goal is "What makes for a healthy and effective ecosystem?" and that the numbers we frequently talk about (and how) can easily lead us into perhaps the wrong takeaways.

Older engines and diversity

It is fairly common that discussions of browser diversity point to the Web's own history for examples of "why engine diversity matters": IE vs Netscape are often held up. IE's dominance and ability to dictate the market (or even kill it in this case) was overcome by diversity. This is true, but very incomplete: There's a lot more beneath the surface...

Consider, for example, the fact that for a time there were only two mainstream browsers - and the very dominant one (IE) was both entirely proprietary and based on a single proprietary OS.

Consider how different this could have been, if IE had been open source and properly licensed. Even if Microsoft didn't want it to run on other operating systems, someone else could step in and fill that gap. If Microsoft walked away from the Web, or worse, went belly up, someone could fork and take over the project. If, for any reason at all, the primary maintainer cannot prioritize - other people and organization could more directly invest in advancements.

A while later, Opera's Presto was very interesting, even multi-OS - but ultimately similarly proprietary.

So, while we had greater 'diverity', these numbers alone aren't a great measure of the overall health of the ecosystem.

The most healthy, open ecosystem ever.

This is in stark contrast to the situation today: In important ways, we are a more diverse, efficient and healthier ecosystem with the three multi-os, open source engines we have left than when we had had more and were dominated by projects that weren't that at all.

A lot of people seem to want to suggest that there aren't really 3 today because blink was born from a fork of WebKit, or because they think that WebKit is somehow "Just for iOS".

There are certainly 3 entirely different JavaScript engines, and 3 entirely different architectures. There are some bits that remain similar, but the truth is that that this is neither new nor easily solvable. Web engines today are so astonishingly complex (measured in tens of millions of lines of code) that we only get new ones through evolution. For the most part, this has been true for a long time. Even Netscape was a "take 2" from many of the same minds from Mosaic. Mozilla was created from a rework of Netscape. Even IE was born by licensing the Mosaic code. WebKit was born by forking KHTML. In other words: The critical investment required to create a full-blown compliant browser from nothing would be mind-boggling if everything were stopped today - and it never stops.

Today, the 3 that remain are all multi-OS.

"...Wait, even WebKit?" - you, just now.

Yes! I'm glad you asked! The GNOME flagship Epiphany/Web browser for Linux is WebKit based -- and billions of devices are based on Embedded WebKit. PS4 is a fun example of something lots of people might be familliar with -- not only the Web browser on your PS4, but in fact lots of the UI itself is made with Web content running in a WebKit based browser. But chances are pretty good that you encounter emdedded WebKit all the time: On cable boxes, smart TVs and appliances, kiosks, car and airplane infotainment systems, digital signage and so on.

This is possible because WebKit (and all of the engines today) are open source. Because of this, they all receive investments more widely than the org that maintains them, and that's expanding in important ways that are very good for the health of the ecosystem.

Igalia, where I work, for example are able to work on all of the browsers and help expand investment in advancing the commons. The list of things we have helped advance, and who has helped fund that is growing all the time: From recent things in JavaScript like class fields and Big Integer, to CSS features like CSS Grid, or features in HTML like lazy loading - the benefits of this are clear.

This also leads to lots of interesting new opportunities. For example, we are the creators/maintainers of WPE, the official WebKit Port for Embedded that powers tons of those devices mentioned above. This matters a lot because a rising tide lifts all boats in the commons. You can see an example of this in that Igalia is advancing work on SVG2 and hardware acceleration for SVG in WebKit - a topic which has failed to get the prioritization for years, but is especially critical for lots of embedded use cases. If you're interested in this, as well as some interesting history of SVG, WebKit, HTML/CSS and embedded browsers, have a listen to this.

Opportunities to do better...

There is another interesting thing here that is always left out of these discussions but I think is considerably interesting in this reframing. A funny little quirk of history is that while we say there are 3 rendering engines, that's only partially true.

More specifically, there are 3 modern browser rendering engines, and a bunch of other rendering engines that aren't that.

Amazon, for example, has a renderer that is used for ebooks and printing. PrinceXML has a renderer used for print. Antenna house too. And there are more still.

For the most part, these are proprietary and what they do and don't support isn't necessarily clear cut. Their support for modern standards is pretty ragged and the gaps are only likely to grow faster. This is a shame as there are lots of potentially useful things for these industries, like Houdini which are unlikely to ever exist for them.

This is a topic I'd love to see discussed more for several reasons - but mainy they center on the fact that it is just harder for us to move forward together if things are too fragmented. Instead of growing together and a rising tide lifting all boats - the boats are kind of all in entirely different bodies of water.

I would love to see some effort to resolve this. Imagine if these vendors came into the fray and either invested in basing their work on modern engines (hopefully various), or pulled together to create something new. That might be the one practical way we could arrive at an actual viable 4th engine somehow.

Why now is a good time to discuss it

Most of these engines also contain some things that were developed in standards organizations but which no browser supports... That's kind of why they were created. However, there are new opportunities to align...

Web engine architectures are like a lot of engineering projects - they grow, they get crufty, you get locked into certain boundaries that you know you'd like to escape. However,until you do you can't really afford to tackle certain kinds of problems. That's a big part of why browser engines today don't do a lot of the things you needed to fill their use cases. Browser architectures have traditionally, generally not been well-prepared for the problems of print or ebooks, and that's part of why we find ourselves in the current situation.

However, for many years, there has been much rework and rearchitecture of layout engines aimed at solving precisely the fundamental sorts of things that prevented those sorts of things from being considered.

Stir in that most of our office suites and things are now web based and there's considerable incentives to figuring out things like good print and pagination.

More open prioritization

But it's not just the projects being open source that matters, it's also that this affords us new opportunities for standardization itself.

Ultimately, it is the calculus of prioritization which impacts our ability to get things done almost more than anything else. Historically, lots of companies get together and, effectively, ask that a very few companies do the work. They are big organizations with big investments, to be sure - but they are all constrained by hard limits. The gauntlet of issues surrounding our ability to prioritize everything from attention to actual implementation have stymied many a topic.

But in this new age, companies like Igalia are showing that there is potentially a whole new game here: Where centralized, concrete investments from outside that small group can lift up the commons, benefit everyone and help drive progress.

In other words: Things are better and healthier because we continue to find better ways to work together. And when we do, everyone does better.

May 26, 2020 04:00 AM

May 24, 2020

Ricardo García

Replacing Vim with Visual Studio Code

For the better part of the last 20 years or so I’ve been a heavy Vim user and I’ve been using Vim as my main development environment. However, around a year ago I tried Visual Studio Code and I’ve been using it continuously ever since, tweaking some minor bits here and there. While I still use Vim as my main plain text editor (as a matter of fact I’m typing this post using Vim), when it comes to writing code either in a small or large code base, I tend to use VSC. Note most projects I work on are based on C/C++.

The reason for the change is simple: I just believe VSC is a better development environment. It’s not about the way the editor looks or about bells and whistles. VSC, simply put, has all the features most people expect to have in a modern IDE preconfigured and more or less working out of the box, or at least accessible with a few simple clicks and adjustments. It’s true you can configure Vim to give you a similar experience to VSC but it requires learning a respectable amount of new key combinations that you need to get used to (and slowly incorporate into your muscle memory) and, in addition, you need to add and manage every bit of new configuration yourself.

For example, if you search on the web for a list of useful Vim plugins for coders you’ll find plenty of articles and posts like this one from Alex Hunt recommending many things I was using in Vim, like fzf to quickly find files by name in the tree, lightline, a multiple cursors plugin, NerdTree, EditorConfig support, etc. Those are definitely great but VSC has all of that out of the box without having to do anything, save for installing a couple of extensions with a few mouse clicks, and I don’t need to use Vundle and remember to keep my plugins updated.

Obviously, not everything is perfect in the world of VSC. Thanks to Vim’s cindent, I remember having a more seamless experience indenting code, almost requiring no input from my side. In VSC I find myself hitting the Tab key or Shift+Tab key combination more frequently than I used to. Or, for example, Vim’s macros and pattern substitution commands were more powerful for automating text replacement compared to multiple cursors, but the reality is that multiple cursors are easier to use and good enough for 90% of the cases.

Strengths of Visual Studio Code

I’ve already mentioned a better experience out of the box and, in general, being easier to use. Not long ago, this “Making Emacs popular again” article was circulating around and reached the front page in Hacker News. It’s about an internal discussion in the Emacs community regarding how to make Emacs more attractive to newcomers and pointing to a Stack Overflow survey that revealed Visual Studio Code was chosen as the main development environment by more than 50% of respondents. Both in and Hacker News, some people were surprised about how fast, in a matter of a few years, Visual Studio Code was offering a better out-of-the-box experience than Emacs, which has been out there for more than 40 years, but I don’t think that’s a fair comparison.

If you’ve ever had to work writing code for Windows or know someone who does, you probably know or have heard that Visual Studio, the original proprietary tool, is a great development environment. You can blame Microsoft for many things, but definitely not for forgetting to put man-hours and resources behind its development tools. Visual Studio has been praised for years for being powerful, smart and having excellent debugging facilities. So Visual Studio Code is not a new project coming out of nowhere. It’s a bunch of people from Microsoft trying to replicate and refine that experience using a multiplatform interface based on Electron, something that’s possible today but wouldn’t have been possible several years ago.

While doing that, they have embraced or created open specifications and standards to ease interoperability. For example, while Visual Studio is known for its “project approach” in C/C++, where you typically create a “solution” that may contain one or more configured “projects”, Visual Studio Code is way more relaxed. Coupled with Microsoft’s proprietary C/C++ plugin (more on that later), it will happily launch in a directory containing the source code for a C/C++ project and find a compile_commands.json file in there (a standard originating in the LLVM project). It will start indexing source code files and extracting the code structure, and it will be ready to assist you writing new code in seconds or a few minutes if the project is very large (you can, of course, start writing code immediately but its smart code completion features may not be fully available until it finishes indexing the project). It launches fast and caches important data and editor state, so the second day you will be able to continue where you left off immediately.

For its smart “IntelliSense” features it relies on an open standard called “Language Server Protocol” that originated precisely in Visual Studio Code. Those smart features aim to give you all the information you need to write code while requiring the absolute minimum from you. For example, lets suppose you’re writing a new statement that calls a function. If VSC can locate where that function is declared, it will display a tooltip with its prototype (or a list of prototypes to choose if the function is overloaded) and will let you easily see the function arguments and their types, highlighting which one is next as you fill the list. If you wrote that function and you happened to include a comment above the function explaining what it does (just a comment, no special Doxygen-style formatting is required), it will display it too as part of the tooltip. Of course, when typing a period or an arrow to call an object method, it will also list the available methods and so on.

Also worth mentioning are the C/C++ debugging facilities provided by the proprietary C/C++ plugin. Sometimes it’s a joy to be able to so easily browse local variables, STL data structures or to get the value of variables merely by hovering the mouse cursor over them while reviewing the code (values will be revealed in a tooltip while in debugging mode), to name a few things it does right. For these tasks, VSC typically relies on GDB or LLDB in their “machine interface” modes on Linux, so it’s again leveraging interoperability.

In addition, it’s an extensible IDE with an add-ons market similar to the ones for Chrome and Firefox which has allowed a community to grow around the project. Finally, the editor itself is fully available as real open source but let me clarify that a bit before closing this post.

Visual Studio Code license

Like I said, the core of the editor is available under a real open source license. However, when you install Visual Studio Code using one of its official repositories (like the one available for Fedora), you will be downloading a binary that contains a few extra things and is itself released under a Freeware (not Free Software) license.

In that regard, you can choose to use binaries provided by the VSCodium project. VSCodium is to VSC what Chromium is to Chrome (hence, I believe, the name of the project). If you use VSCodium, a few extensions and plugins that are available for VSC will not be available in Codium. For example, Microsoft’s official C/C++ plugin, which is not released under a Free Software license either. Its source code is available but the binary releases you get from the add-ons market are also released under a Freeware license. Specifically, its license prohibits using it in anything else than VSC as you can read in the first numbered point.

This otherwise excellent C/C++ extension provides two key aspects of the VSC experience: “IntelliSense” and the debugging interface. As I mentioned before, code completion and assistance uses the language server protocol under the hood. This matches what, for example, YouCompleteMe uses for Vim. As far as free software implementations of a language server go, for C/C++ you typically have clangd (YCM uses this one) and also ccls. Both have a VSC extension in its market under a proper FOSS license. I have tried both and I think ccls is better as of the time I’m writing this. ccls is, in my humble opinion, 95% there compared to Microsoft’s extension. It provides 90% of what Microsoft gives you and 5% more details that are missing from their proprietary extension, which is pretty cool.

The project wiki for ccls has some neat instructions about how to use it for code completion while relying on Microsoft’s extension for the debugger, which made me wonder if there’s any FOSS extension out there providing a nice debugger interface that can replace the one by Microsoft. So far I’ve found Native Debug. It might be good enough for you but I’ve found it to be a bit lacking in some aspects. At Igalia we currently do not have the resources needed to develop a better debugging extension or helping improve that one. However, I truly believe someone with deeper pockets and an interest in the free software ecosystem could and should fund an initiative to get the debugging facilities in VSC to the next level, releasing the result under a FOSS license. Coupled with clangd or ccls, that would make the C/C++ proprietary extension not needed and we could rely on an entirely free software solution.

May 24, 2020 01:04 PM

May 14, 2020

Mario Sanchez Prada

The Web Platform Tests project

Web Browsers and Test Driven Development

Working on Web browsers development is not an easy feat but if there’s something I’m personally very grateful for when it comes to collaborating with this kind of software projects, it is their testing infrastructure and the peace of mind that it provides me with when making changes on a daily basis.

To help you understand the size of these projects, they involve millions of lines of code (Chromium is ~25 million lines of code, followed closely by Firefox and WebKit) and around 200-300 new patches landing everyday. Try to imagine, for one second, how we could make changes if we didn’t have such testing infrastructure. It would basically be utter and complete chao​s and, more especially, it would mean extremely buggy Web browsers, broken implementations of the Web Platform and tens (hundreds?) of new bugs and crashes piling up every day… not a good thing at all for Web browsers, which are these days some of the most widely used applications (and not just ‘the thing you use to browse the Web’).

The Chromium Trybots in action
The Chromium Trybots in action

Now, there are all different types of tests that Web engines run automatically on a regular basis: Unit tests for checking that APIs work as expected, platform-specific tests to make sure that your software runs correctly in different environments, performance tests to help browsers keep being fast and without increasing too much their memory footprint… and then, of course, there are the tests to make sure that the Web engines at the core of these projects implement the Web Platform correctly according to the numerous standards and specifications available.

And it’s here where I would like to bring your attention with this post because, when it comes to these last kind of tests (what we call “Web tests” or “layout tests”), each Web engine used to rely entirely on their own set of Web tests to make sure that they implemented the many different specifications correctly.

Clearly, there was some room for improvement here. It would be wonderful if we could have an engine-independent set of tests to test that a given implementation of the Web Platform works as expected, wouldn’t it? We could use that across different engines to make sure not only that they work as expected, but also that they also behave exactly in the same way, and therefore give Web developers confidence on that they can rely on the different specifications without having to implement engine-specific quirks.

Enter the Web Platform Tests project

Good news is that just such an ideal thing exists. It’s called the Web Platform Tests project. As it is concisely described in it’s official site:

“The web-platform-tests project is a cross-browser test suite for the Web-platform stack. Writing tests in a way that allows them to be run in all browsers gives browser projects confidence that they are shipping software which is compatible with other implementations, and that later implementations will be compatible with their implementations.”

I’d recommend visiting its website if you’re interested in the topic, watching the “Introduction to the web-platform-tests” video or even glance at the git repository containing all the tests here. Here, you can also find specific information such as how to run WPTs or how to write them. Also, you can have a look as well at the dashboard to get a sense of what tests exists and how some of the main browsers are doing.

In short: I think it would be safe to say that this project is critical to the health of the whole Web Platform, and ultimately to Web developers. What’s very, very surprising is how long it took to get to where it is, since it came into being only about halfway into the history of the Web (there were earlier testing efforts at the W3C, but none that focused on automated & shared testing). But regardless of that, this is an interesting challenge: Filling in all of the missing unified tests, while new things are being added all the time!

Luckily, this was a challenge that did indeed took off and all the major Web engines can now proudly say that they are regularly running about 36500 of these Web engine-independent tests (providing ~1.7 million sub-tests in total), and all the engines are showing off a pass rate between 91% and 98%. See the numbers below, as extracted from today’s WPT data:

Chrome 84 Edge 84 Firefox 78 Safari 105 preview
Pass Total Pass Total Pass Total Pass Total
1680105 1714711 1669977 1714195 1640985 1698418 1543625 1695743
Pass rate: 97.98% Pass rate: 97.42% Pass rate: 96.62% Pass rate: 91.03%

And here at Igalia, we’ve recently had the opportunity to work on this for a little while and so I’d like to write a bit about that…

Upstreaming Chromium’s tests during the Coronavirus Outbreak

As you all know, we’re in the middle of an unprecedented world-wide crisis that is affecting everyone in one way or another. One particular consequence of it in the context of the Chromium project is that Chromium releases were paused for a while. On top of this, some constraints on what could be landed upstream were put in place to guarantee quality and stability of the Chromium platform during this strange period we’re going through these days.

These particular constraints impacted my team in that we couldn’t really keep working on the tasks we were working on up to that point, in the context of the Chromium project. Our involvement with the Blink Onion Soup 2.0 project usually requires the landing of relatively large refactors, and these kind of changes were forbidden for the time being.

Fortunately, we found an opportunity to collaborate in the meantime with the Web Platform Tests project by analyzing and trying to upstream many of the existing Chromium-specific tests that haven’t yet been unified. This is important because tests exist for widely used specifications, but if they aren’t in Web Platform Tests, their utility and benefits are limited to Chromium. If done well, this would mean that all of the tests that we managed to upstream would be immediately available for everyone else too. Firefox and WebKit-based browsers would not only be able to identify missing features and bugs, but also be provided with an extra set of tests to check that they were implementing these features correctly, and interoperably.

The WPT Dashboard
The WPT Dashboard

It was an interesting challenge considering that we had to switch very quickly from writing C++ code around the IPC layers of Chromium to analyzing, migrating and upstreaming Web tests from the huge pool of Chromium tests. We focused mainly on CSS Grid Layout, Flexbox, Masking and Filters related tests… but I think the results were quite good in the end:

As of today, I’m happy to report that, during the ~4 weeks we worked on this my team migrated 240 Chromium-specific Web tests to the Web Platform Tests’ upstream repository, helping increase test coverage in other Web Engines and thus helping towards improving interoperability among browsers:

  • CSS Flexbox: 89 tests migrated
  • CSS Filters: 44 tests migrated
  • CSS Masking: 13 tests migrated
  • CSS Grid Layout: 94 tests migrated

But there is more to this than just numbers. Ultimately, as I said before, these migrations should help identifying missing features and bugs in other Web engines, and that was precisely the case here. You can easily see this by checking the list of automatically created bugs in Firefox’s bugzilla, as well as some of the bugs filed in WebKit’s bugzilla during the time we worked on this.

…and note that this doesn’t even include the additional 96 Chromium-specific tests that we analyzed but determined were not yet eligible for migrating to WPT (normally because they relied on some internal Chromium API or non-standard behaviour), which would require further work to get them upstreamed. But that was a bit out of scope for those few weeks we could work on this, so we decided to focus on upstreaming the rest of tests instead.

Personally, I think this was a big win for the Web Platform and I’m very proud and happy to have had an opportunity to have contributed to it during these dark times we’re living, as part of my job at Igalia. Now I’m back to working on the Blink Onion Soup 2.0 project, where I think I should write about too, but that’s a topic for a different blog post.

Credit where credit is due

IgaliaI wouldn’t want to finish off this blog post without acknowledging all the different contributors who tirelessly worked on this effort to help improve the Web Platform by providing the WPT project with these many tests more, so here it is:

From the Igalia side, my whole team was the one which took on this challenge, that is: Abhijeet, Antonio, Gyuyoung, Henrique, Julie, Shin and myself. Kudos everyone!

And from the reviewing side, many people chimed in but I’d like to thank in particular the following persons, who were deeply involved with the whole effort from beginning to end regardless of their affiliation: Christian Biesinger, David Grogan, Robert Ma, Stephen Chenney, Fredrik Söderquist, Manuel Rego Casasnovas and Javier Fernandez. Many thanks to all of you!

Take care and stay safe!

by mario at May 14, 2020 09:07 AM

May 11, 2020

Gyuyoung Kim

How Chromium Got its Mojo?

Chromium IPC

Chromium has a multi-process architecture to become more secure and robust like modern operating systems, and it means that Chromium has a lot of processes communicating with each other. For example, renderer process, browser process, GPU process, utility process, and so on. Those processes have been communicating using IPC [1].

Why is Mojo needed?

As a long-term intent, the Chromium team wanted to refactor Chromium into a large set of smaller services. To achieve that, they had considered below questions [3]

  • Which services we bring up?
  • How can we isolate these services to improve security and stability?
  • Which binary features can we ship?

They learned much from using the legacy Chromium IPC and maintaining Chromium dependencies over the past years. They felt a more robust messaging layer could allow them to integrate a large number of components without link-time interdependencies as well as help to build more and better features, faster, and with much less cost to users. So, that’s why Chromium team begins to make the Mojo communication framework.

From the performance perspective, Mojo is 3 times faster than IPC, and ⅓ less context switching compared to the old IPC in Chrome [3]. Also, we can remove unnecessary layers like content/renderer layer to communicate between different processes. Because combined with generated code from the Mojom IDL, we can easily connect interface clients and implementations across arbitrary inter-process boundaries. Lastly, Mojo is a collection of runtime libraries providing a platform-agnostic abstraction of common IPC primitives. So, we can build higher-level bindings APIs to simplify messaging for developers writing C++, Java, Javascript.

Status of migrating legacy IPC to Mojo

Igalia has been working on the migration since this year in earnest. But, hundreds of IPCs still remain in Chromium. The below chart shows the progress of migrating legacy IPC to Mojo [4].

Mojo Terminology

Let’s take a look at the key terminology before starting the migration briefly.

  • Message Pipe: A pair of endpoints and either endpoint may be transferred over another message pipe. Because we bootstrap a primordial message pipe between the browser process and each child process, eventually this means that a new pipe we create ultimately sends either end to any process, and the two ends will still be able to talk to each other seamlessly and exclusively. We don’t need to use routing ID anymore. Each point has a queue of incoming messages.
  • Mojom file: Define interfaces, which are strongly-typed collections of messages. Each interface message is roughly analogous to a single prototype message
  • Remote: Used to send messages described by the interface.
  • Receiver: Used to receive the interface messages sent by Remote.
  • PendingRemote: Typed container to hold the other end of a Receiver’s pipe.
  • PendingReceiver: Typed container to hold the other end of a Remote’s pipe.
  • AssociatedRemote/Receiver: Similar to a Remote and a Receiver. But, they run on multiple interfaces over a single message pipe while preserving message order, because the AssociatedRemote/Receiver was implemented by using the IPC::Channel used by legacy IPC messages.

Example of migrating a legacy IPC to Mojo

In the following example, we migrate WebTestHostMsg_SimulateWebNotificationClose to illustrate the conversion from legacy IPC to Mojo.

The existing WebTestHostMsg_SimulateWebNotificationClose IPC

  1. Message definition
    File: content/shell/common/web_test/web_test_messages.h
                    std::string /*title*/,  bool /*by_user*/)
  • Send the message in the renderer
    File: content/shell/renderer/web_test/
  • void BlinkTestRunner::SimulateWebNotificationClose(
        const std::string& title, bool by_user) {
      Send(new WebTestHostMsg_SimulateWebNotificationClose(
        routing_id(), title, by_user));
  • Receive the message in the browser
    File: content/shell/browser/web_test/
  • bool WebTestMessageFilter::OnMessageReceived(
        const IPC::Message& message) {
      bool handled = true;
      IPC_BEGIN_MESSAGE_MAP(WebTestMessageFilter, message)
  • Call the handler in the browser
    File: content/shell/browser/web_test/
  • void WebTestMessageFilter::OnSimulateWebNotificationClose(
        const std::string& title, bool by_user) {
          SimulateClose(title, by_user);

    Call flow after migrating the legacy IPC to Mojo

    We begin to migrate WebTestHostMsg_SimulateWebNotificationClose to WebTestClient interface from here. First, let’s see an overall call flow through simple diagrams. [5]

    1. The WebTestClientImpl factory method is called with passing the WebTestClientImpl PendingReceiver along to the Receiver.
    2. The receiver takes ownership of the WebTestClientImpl PendingReceiver’s pipe endpoint and begins to watch it for incoming messages. The pipe is readable immediately, so a task is scheduled to read the pending SimulateWebNotificationClose message from the pipe as soon as possible.
    3. The WebTestClientImpl message is read and deserialized, then, it will make the Receiver to invoke the WebTestClientImpl::SimulateWebNotificationClose() implementation on its bound WebTestClientImpl.

    Migrate the legacy IPC to Mojo

    1. Write a mojom file
      File: content/shell/common/web_test/web_test.mojom
    module content.mojom;
    // Web test messages sent from the renderer process to the
    // browser. 
    interface WebTestClient {
      // Simulates closing a titled web notification depending on the user
      // click.
      //   - |title|: the title of the notification.
      //   - |by_user|: whether the user clicks the notification.
      SimulateWebNotificationClose(string title, bool by_user);
  • Add the mojom file to a proper GN target.
    File: content/shell/
  • mojom("web_test_common_mojom") {
      sources = [
  • Implement the interface files
    File: content/shell/browser/web_test/web_test_client_impl.h
  • #include "content/shell/common/web_test.mojom.h"
    class WebTestClientImpl : public mojom::WebTestClient {
      WebTestClientImpl() = default;
      ~WebTestClientImpl() override = default;
      WebTestClientImpl(const WebTestClientImpl&) = delete;
      WebTestClientImpl& operator=(const WebTestClientImpl&) = delete;
      static void Create(
          mojo::PendingReceiver<mojom::WebTestClient> receiver);
      // WebTestClient implementation.
     void SimulateWebNotificationClose(const std::string& title,
                             bool by_user) override;
  • Implement the interface files
    File: content/shell/browser/web_test/
  • void WebTestClientImpl::SimulateWebNotificationClose(
        const std::string& title, bool by_user) {
            SimulateClose(title, by_user);
  • Creating an interface pipe
    File: content/shell/renderer/web_test/blink_test_runner.h
  • mojo::AssociatedRemote<mojom::WebTestClient>&

    File: content/shell/renderer/web_test/

    BlinkTestRunner::GetWebTestClientRemote() {
      if (!web_test_client_remote_) {
       return web_test_client_remote_;
  • Register the WebTest interface
    File: content/shell/browser/web_test/
  • void WebTestContentBrowserClient::ExposeInterfacesToRenderer {
     void WebTestContentBrowserClient::BindWebTestController(
         int render_process_id,
         StoragePartition* partition,
               receiver) {
  • Call an interface message in the renderer
    File: content/shell/renderer/web_test/
  • void BlinkTestRunner::SimulateWebNotificationClose(
     const std::string& title, bool by_user) {
         SimulateWebNotificationClose(title, by_user);
  • Receive the incoming message in the browser
    File: content/shell/browser/web_test/
  • void WebTestClientImpl::SimulateWebNotificationClose(
     const std::string& title, bool by_user) {
         SimulateClose(title, by_user);

    Appendix: A case study of Regression

    There were a lot of flaky web test failures after finishing the migration of WebTestHostMsg to Mojo. The failures were caused by using ‘Remote’ instead of ‘AssociatedRemote’ for WebTestClient interface in the BlinkTestRunner class. Because BlinkTestRunner was using the WebTestControlHost interface for ‘PrintMessage’ as an ‘AssociatedRemote’. But, ‘Remote’ used by WebTestClient didn’t guarantee the message order between ‘PrintMessage’ and ‘InitiateCaptureDump’ message implemented by different interfaces(WebTestControlHost vs. WebTestClient). Thus, tests had often finished before receiving all logs. The actual results could be different from the expected results.

    Changing Remote with AssociatedRemote for the WebTestClient interface solved the flaky test issues.

    [1] Inter-process Communication (IPC)
    [2] Mojo in Chromium
    [3] Mojo & Servicification Performance Notes
    [4] Chrome IPC legacy Conversion Status
    [5] Convert Legacy IPC to Mojo



    by gyuyoung at May 11, 2020 07:46 AM

    May 03, 2020

    Ricardo García

    Upgraded to Fedora 32 and some local mail tips

    Fedora 32 was released a few days ago and I have already upgraded my home computers to it. The upgrade process was as short and smooth as always, so congratulations to everyone involved and also congratulations to everyone behind RPM Fusion, who had their Fedora 32 repositories ready from day 1. From my particular point of view this upgrade required adjusting a few things after installing and some of them are worth mentioning.

    mDNS breaks after upgrading

    As explained in the “Common F32 Bugs” wiki page, there’s a bug present if upgrading from Fedora 31 to Fedora 32 that will not happen when installing the system from scratch. If I recall correctly, two packages will fight to modify /etc/nsswitch.conf and that will probably result in mDNS being removed from that file as an option to resolve host names. It will not make much of a difference to many people but some others may experience problems because of it. For example, in my case the networked work printer stopped “working” in the sense that CUPS wasn’t able to properly contact the printer, which led to confusing situations.

    This upgrade bug, as far as I know, was already present when upgrading from Fedora 30 to Fedora 31. I recommend you to take a look at /etc/nsswitch.conf just after upgrading and adding the mDNS options if not present, as described in the wiki.

    EarlyOOM is now enabled by default in Fedora

    EarlyOOM is a nice thing to have that will try to kill offending processes if you are about to run out of memory, but before actually running out of it so your system will not become unresponsive. It’s one of those relatively simple solutions to a complex problem that will work well for 90% of the cases out there. It prioritizes killing web browsers and other programs known to use a lot of memory while preserving programs that are considered essential for a workstation session if they’re running.

    If you didn’t have the package installed before, you may be missing it after upgrading, so it’s worth taking a look. You could check if the package is installed with rpm -qa | grep earlyoom. Once installed, you can verify it’s running with sudo systemctl status earlyoom. If not, you can enable the unit with sudo systemctl enable earlyoom and start it with sudo systemctl start earlyoom.

    Periodic TRIM is now enabled by default

    Before Fedora 32, I used to mount all my filesystems with the discard mount option. This is called continuous TRIM, but it turns out to have some limitations, like only being able to trim blocks that change from used to free and having some potential or theoretical security problems with disk encryption. In this situation, as mentioned in the Arch Linux wiki, both Debian and Red Hat recommend to use periodic trimming if possible and the circumstances allow it. Periodic trimming is achieved by running fstrim from time to time, a utility provided by the util-linux package, which can be used via a systemd unit and timer file that are also provided there.

    If you were using the discard mount option before, you can remove it from /etc/fstab and the default filesystem mount options, if present, and enable the fstrim.timer unit if not enabled.

    Some Python 2 packages were purged

    With Python 2.x reaching its end-of-life, Fedora has completed the essential purge of Python 2 packages. Many people will not be too affected by this as most outstanding Python packages have been ported to Python 3 or have supported Python 3 for a long time now, but a few of them will be lost. In my case, the getmail package was removed and I was using it as part of my mail forwarding script that you can see here, to receive mail both locally and forwarded to my main FastMail account when a problem happens in my system and a daemon wants to send me an email message (think smartd).

    Fortunately, Postfix allows configuring several destinations in ~/.forward, one per line, so I changed my mail forwarding script to skip local mail delivery and added an extra line in my ~/.forward file, like this:


    This will keep both delivery destinations.

    Bonus: local mail tip

    Related to this, if a daemon generates email output before the network is completely available, the local delivery part will work but the mail forwarding part may not, so I consider it essential to be notified of local mail delivery. As you may know, many shells like Bash will allow you to set a mail spool file to be checked from time to time using the MAIL and MAILCHECK environment variables, the latter being used to specify the checking period. See man bash for more details. However, when a new mail message is detected, this will only print “You have new mail” before the prompt is displayed. If the command you just ran was a bit verbose it’s easy to miss the line, and it will not be printed again until you log in or a new mail message arrives.

    In my opinion, a superior solution is to check for local mail availability every time the prompt is displayed, and to put something in the prompt itself when local mail is available. This way, the notification will stay in the prompt as long as needed and it’s not as easy to miss, specially if you use colors. You will probably want to use an mbox spool in this case, and remove every message from it after reading them (either deleting them like I do or moving read messages to another mail box).

    To achieve what I explained above, I set MAIL to my local mail spool in mbox format (/var/spool/mail/mbox) and MAILCHECK to zero to disable periodic checks using bash. Then, I set PS1 like this:

    PS1="$( printf '\001\033[93m\002$( test -s "$MAIL" && printf "[mail] " )\001\033[96m\002\\u@\h:\001\033[95m\002\w\001\033[92m\002\$\001\033[0m\002 ' )"
    export PS1

    As you can see, I use printf as a more portable and shell-independent echo alternative to generate escape sequences for the prompt colors. The important part is that, as show above, I run a command in the prompt by generating a literal “$(…​)” in it. This command checks if the mail spool exists and is not empty, printing [mail] before the prompt in a different color when it does. Due to the “$(…​)” sequence being included literally in the PS1 value, this is run every time the prompt is displayed. The notification about pending mail stays with the prompt until I read and delete the message, making it much harder to miss.

    May 03, 2020 02:32 PM

    May 01, 2020

    Ricardo García

    My first tablet is an Apple iPad (7th gen)

    Not long ago, just before the coronavirus pandemic started, I bought my first tablet ever. I had used tablets many times before but never actually owned one. However, my wife and I decided it was a good idea to get one to use at home, for several reasons. It could be used in long trips by the kids to watch movies, for example, and it was a practical device to quickly browse the web when using our computers or mobile phones turned out to be a bit cumbersome. More recently, we changed ISPs and got rid of the ISP-provided set-top-box that allowed us to play Netflix content on our non-smart TV. The tablet could be used in combination with our Chromecast for the same purpose.

    Because this tablet would belong to our “home” instead of a particular individual and would only have to serve some basic needs, I started looking at cheap Android tablets around the 200 euros mark. Contrary to mobile phones, where you can get many decent phones at that price point, most Android tablets I found were pretty bad devices, in my humble opinion, either lacking memory, CPU or GPU power or having pretty basic low resolution screens. Also, for a security conscious person like myself, they all implied switching tablets after around probably no more than 3 years. Android manufacturers will stop giving you security updates for a given device after that.

    Not owning any other Apple device, it was by pure chance that I took a look at the Apple store and discovered the normal iPad (not the iPad Pro, of course, but also not the iPad Mini) was the cheapest Apple tablet, costing around 380 euros. Moreover, it was heavily discounted on Amazon and it was available, at the time, for 350 euros including a non-official case (which works perfectly fine, by the way). As you can check in Wikipedia’s iPad page, iPads are also typically supported for around 5 years or even a bit more. I reasoned it made more sense to get an iPad for 350 euros than an Android tablet for around 200. 350 euros in 5 years is 70 euros a year, which is the same as 210 euros in 3 years. Not only it costs basically the same per year, but you’re also getting a superior device in the first place. Battery life is better, the screen is better, etc. Not having to change devices so often is also good from an ecological perspective (not as good as not having one in the first place, of course) and it’s also less of a mental burden.

    The final detail that made me choose an iPad was talking to some other family members who had one and who showed it to me. Some of those devices were over 5 years old and they still ran perfectly fine, were responsive and had decent battery life. It was only then that I started to really understand the conclusions in some online reviews I had been reading, which indicated the best cheap tablet was a second-hand iPad. So in the end I went for it and bought one.

    Having said that, when buying tablets, phones and smart devices in general (something I rarely do) I always have the feeling I’m choosing the lesser evil, and this is no different situation. People who know me well will recall I’m not precisely an Apple fan. I hate the fact that they’re selling you a programmable computing device that you do not really own as such because, if you want to create a program for it, you need an Apple computer, a developer license and, even then, you cannot even directly load your program on the device like you can do with Android or any classic computer. It has to go through their gatekeepers. Ridiculous, in my opinion, and annoying. Fortunately for me, in this case I’m treating the tablet almost like an appliance in a similar fashion to my phone: it’s a device with a known relatively short lifespan that allows me to have access to a number of applications and services, and I always keep that in my mind.

    Some weeks after buying it, the pandemic happened and the iPad is basically paying for itself. We use it every single day to cast content from Netflix, Amazon Prime Video and Disney+ to the living room TV and have basically “discovered” FaceTime, which we use to talk to other family members. The tablet has a larger and much more comfortable screen for videoconferencing compared to a phone, and it can be easily carried around the house compared to a laptop. It’s a pretty good compromise.

    In the smart TV aspect this, again, is a lose-lose situation that implies choosing the lesser evil. I trust the combination of an iPad plus a Chromecast (from Google, nonetheless!) more than I trust TV apps provided by manufacturers and connecting my TV to the Internet. I also think the equivalent iPad apps are easier to use and generally work better. Chromecasts will probably be supported for longer than TV apps, if anything because their purpose and interfaces are simpler and every major content provider supports casting to them, while the market share for iPads, specially in the US, almost guarantees Netflix, HBO, Disney and Amazon, to name a few, will make their apps available for them for as long as I will use mine.

    May 01, 2020 08:40 PM

    April 29, 2020

    Brian Kardell

    Interactive Elements: A Strange Game

    Note from the author...

    My posts frequently (like this one) have a 'theme' and tend to use a number of images for visual flourish. Personally, I like it that way, I find it more engaging and I prefer for people to read it that way. However, for users on a metered or slow connection, downloading unnecessary images is, well, unnecessary, potentially costly and kind of rude. Just to be polite to my users, I offer the ability for you to opt out of 'optional' images if the total size of viewing the page would exceed a budget I have currently defined as 200k...

    Interactive Elements: A Strange Game

    It can seem especially frustrating to a lot of developers that we don't have more elements "out of the box" as part of HTML. I mean: Where are the tabs? How do we not have tabs by now? It seems like almost every week this comes up in some fashion or other. In this piece I'll talk a little about why, how to see this for what it is, how we're trying to move forward - and also present an interesting experiment for your consideration and feedback that may help inform some of these efforts. If you want, you can skip right to the 'new idea/experiment', but I think this background is pretty helpful in understanding it, so please, come back and read it...

    Let's have a look at HTML so far: ~10 of the elements are the 'boilerplate stuff': <html>, <body>, <head> and metadata stuff - <link>, <meta>, <base>, <style>, <script>, <title>. There's a few elements for linking and embedding other content and multimedia, and a few with special powers like <template> and <slot>. And then the rest: ~30 are "weak semantics about text" that aren't interactive and don't have signficant (any) special meaning necessarily and are primarily just things with default CSS stylesheet rules. About 15 are "weak semantics about text, but meaningfully weak" - stuff like lists. ~15 are sectioning or landmarks. 11 more are about tables - a 2 dimensional relationship of text. 14 are about forms. And then two are... well... something else. We'll come back to that. And ~30 are deprecated.

    The route to all of these elements was a little different - and they have different costs associated with them, and offer different value. One thing that's interesting to note is that the majority of them are what Dave Rupert refers to as "Spicy Divs". That is, there's not much to them - they aren't complex or interactive, and they don't bring a ton of real world value. That's not to say they are without value, but ultimately we spend a lot of time and bandwidth in standards debating what often amount to silly things because of it. <main> is a kind of a good example. Nothing wrong with it as an element, it's great, in fact - it just has a long and complex backstory that ultimately cost a lot relative to its real world value and proven use. Tons of time defining and debating <address>, which isn't really that meaningful to browsers, and then lots of evangelism telling people they're using it wrong - only in the end to eventually have to admit that it ultimately means the thing people used it as.

    The second thing to notice is that there there are very few interactive elements. Further, most of the ones we usually talk about are about forms. And that lead us to this: Interactive, form-related elements on the web are... tricky.

    The form-related game

    On the one hand, there's a lot to like about native, interactive form controls. In theory, they can bring a simple declarative form with all kinds of goodness in terms of platform integration, portability and centralized work on accessibility and UX.


    ...primary goal has not yet been achieved
    > What is the primary goal?
    To win the game.

    More simply: Despite a lot of work, and huge investments, people still gravitate toward custom implementations. That's not good for anybody.

    Why are native form controls hard?

    Well... So many reasons, but to start with, the one that they could really do without is that they got off to a tricky start by design.

    <input type="WOPR">

    Most of the form-related controls are created based on <input type-"...">. You've probably heard someone suggest that this is great because in cases where it isn't supported, simply providing the text field is fine. However, this is incomplete and we've been paying the price for it for a long time.

    > People sometimes make... mistakes.

    Did you ever notice how the JavaScript API surface of input works differently based on the type? If the type is number and you type gibberish that looks remotely 'potentially numbery' like '12312e314124', then .value will be '' (empty string). Or, the checkbox type has a checked property, but that means that input type=number has that property too... What does it do? Anything!? Or, have you ever noticed how .selectionStart and .selectionEnd actually would initially throw if you tried to use them on a numeric input type?! (fixed by Simon Pieters). If not, go have a watch of Monica Dinculescu's <input> I ♡ you, but you're bringing me down -- a 45 minute talk about just this.

    Ultimately, many of these problems stem from us accepting the illusion presented that a date picker is a kind of text input.

    Faulken pointing to NORAD screens
    Stephen Falken: Uh, uh, General, what you see on these screens up here is a fantasy; a computer-enhanced hallucination. Those blips are not real missiles. They're phantoms.

    Realistically, a date picker isn't a text input. Perhaps at an HTTP level, or even a database level, it's just a value - but controls are about how to populate values, and that's quite a different thing. It's not an is-a relationship: A date picker is a completely different complex control with potentially lots of complex interactions: Different things it needs to communicate, lots of sub-iteractions to manage, and so on.

    All of this is why you might hear people suggest "LSP" or "Favor composition over inheritance" as a better alternative.

    Acceptable Losses?

    This is exacerbated by the fact that at the end of the day, developers are left with some pretty unappealing choices: Something is going to be sacrificed.

    Color pickers provide a great example of how this plays out in practice: Will developers building the sorts of tools that use a color picker allow a simple input field as the fallback until every browser ships something acceptable? Seems like not much of a choice really.

    Sadly the story for 'polyfilling' an element also still isn't great: It's really just 'replacing it with something else entirely'. So, at the end of the day, you've got to go and find something that is a good quality stand in anyway. Finally, since literally everything your code relies on understanding the DOM, and that stand in will have entirely different DOM, you now have two entirely different interactive trees to worry about. Yikes.

    Further, this situation isn't necessarily short lived, historically. The reality is that lacking universal support can be the case for a long time: Native support for <input type="color"> was added to the last major browser... Checks notes... last year.

    Even after all of that... Guess what? It still isn't acceptable for a lot of people because it isn't styleable -- and in fact, it has entirely different UI and features everywhere. While this can occasionally be very useful on some kinds of devices, it makes things like providing helpful documentation hard. Worse, even the native ones weren't super keyboard accessible either.

    So... That's a lot of barriers and disincentives.

    Conversely, a custom solution can solve all of these problems now... So it shouldn't actually be surprising that developers often choose a custom solution: There are no 'acceptable losses' for developers here - all of the choices are somehow bad.

    Learn, dammit.

    Now, let's talk about how we get out of this mess.

    You'll be happy to know that there are efforts to both learn and improve existing native interactive elements. If you haven't seen Greg Whitworth and Nicole Sullivan's talk HTML Isn't Done from Chrome Dev Summit last year, I would highly recommend it (below).

    There's also been a heck of a lot of work to make it possible for authors to define their own interactive elements that work just like the native ones. The idea is to empower developers to explore the space and try to help find the really sweet spots that strike all of the right balances.

    It's worth pointing out why this is useful - it's not just an academic exercise that is trying to push responsibility to developers. It's important because the truth is: We don't actually know how. We need to be able to throw lots of things at the wall and see what sticks. The trouble is, currently (historically) failure is really slow and really expensive -- and even our most educated 'guesses' at improvement are still just that: Guesses. We haven't proved we have this figured out yet.

    We need something like the WOPR for playing out possible answers without doing harm: failing, and learning - that's what the custom elements stuff is all about.
    Shall we play a (different) game?

    Ok, but hold on.... Let's talk about non-form related interactive controls now. Realistically, we've come up with really only 1 strictly generic non-form-related UI control in 30 years: <details>. That "other" one in my intial list. Wow... And we've had that done for like 10 years, right? Let's see, it's been supported universally since... checks notes... Jan 14, 2020. Waaaaat?

    But... Again, it has most of the same kinds of general problems. In the most recent HTTP Almanac data, it was the 102nd most used HTML element... Can you even name 101 other HTML elements? It has styling problems. It's misunderstood. It was hard to polyfill well without just creating other problems.

    So... This is terrible.

    Our continued problems in this space are part of the reason there is hesitancy to take up entirely new things... Like, tabs.

    Back in 2015, I managed to get a number of people in Web standards (especially a11y) interested in working on a possible new element proposal for HTML which would have given not only the tabset but some other things as well. Determined not to be doomed by the same sorts of stylability problems, we set out to define things that would become shadow parts and themes and custom properties and so on.

    But, in many ways, it wasn't great. Recently, this has got me to thinking: What if this isn't even the right game we're playing? What if for some elements at least - non-form associated ones - the right lesson to take away is that this is an unnecessary exercise in futility?

    Maybe we need a different kind of experiment: One that just says "Strange game. The only winning move is not to play".

    How about a nice game of chess?

    So, here's the basic idea: What if we just created a resilient pattern focused on mainly function and meaning -- and hardly at all on the UI aspect. What if instead of inventing complex new ways to strike the balance of preventing authors from styling too much, we mostly just acknowledged the fact that they want to, and... let them.

    What if, like with <video> you could kind of just take control of the UI, but... maybe without throwing it away entirely.

    I spoke with Greg Whitworth about this, who has been investigating how to improve the existing controls on the Web (see also Can we please style select?.

    In fact, he completely agreed with the overall premise and went on to described how...

    The core mission of Open UI, a new W3C community group, is to document the anatomies, behaviors and states of components and controls from across frameworks. By doing this, we can ensure that the key pieces that folks don't want to re-create, they don't have to. We've got a few different ideas that are beginning to take shape but changes like this are like peeling an onion. If it's a new control/component then it would be a bit more straight forward but the most painful controls we've found have been on the web since the 90s. So we'll be gathering telemetry and ensuring that any modifications we ultimately propose are web compatible.

    In our conversation we discussed how things like tabs seem like an interesting target where this kind of thing might just work really well. The particular shape of the tabs problem lends itself, I think, to easily trying something completely different and entirely without past baggage.

    So, in that light here's a demo/link to just such an experiment which gives us... tabs! - and which we think has some nice qualities:

    • It is a declarative decorator over otherwise good but non-interactive semantics. That means there's no miss of differing interaction issues here caused by complex inheritance. It's composition.
    • It doesn't change your light DOM, and it barely uses Shadow DOM. There's no "two trees" problem, and there's not secrets or new challenges. Go ahead, use CSS and JavaScript.
    • It adds the functionality: roving focus, keyboard handling, accessibility stuff and a simple container for you. That's it, really.
    • If you really want to opt in to some simple 'default styles' and like the 'minor tweaksonly' approach, it has an attribute that enables 'native' look and feel, with a bit less flexibility, but which might be enough for some people. It just starts with the assumption that that's not the case.

    So... that's it. It could for sure use more work, but WDYT? Would you try it out? Let us know your thoughts? Such experiments will be useful in informing work and directions for efforts like Open-UI and possible future standards.

    Will it help? To be honest, I don't know, but maybe the right way to learn involves trying something really different, and I'm open to anything...

    April 29, 2020 04:00 AM

    April 19, 2020

    Ricardo García

    Using borg for backups

    In the last few days I’ve replaced almost all my custom backup scripts with a single one based on Borg. Borg is based on Attic and has been around for a few years. If you didn’t know about it, I highly recommend you look into it.

    How to make proper backups

    When you read a bit about how to properly create data backups you quickly learn a few things. The first one is that a mirror is not a proper backup. If you have a replica of your data somewhere else and update it frequently so both data sets are exactly the same, you’re not protecting yourself from inadvertently deleting or corrupting what you have. Obviously if you remove something by accident and realize what you’ve done in that very moment, you may be able to get your data back from the mirror. But if you don’t realize about it until a few days later, when you may have already synchronized your mirror, you’re out of luck. So lesson number one is that backups need to preserve history to some degree, like a version control system does.

    The second lesson is that you’ll want to keep backups in one, two or more locations, but you probably want at least one of those locations to be off-site. That is, if you backup your home directory to a second computer in your house, even if you monitor the health status of your primary and backup systems, you’re risking to lose all your data if, say, a fire breaks out in the house.

    Given those lessons, one common scheme for creating backups without taking up too much disk space is to use a system that loses granularity over time. For example, you could create a backup archive (say, a tarball) once a week (or daily, if you prefer, but I will use a weekly scheme for this example) and you could preserve the last 4 or 5 archives to cover the whole past month. In addition, for the previous 11 months you preserve one archive for each one and, in addition, for the previous 9 years you preserve at least an archive for each year. With this scheme, using only 24 or 25 archives you can recover up to 10 years of data history if you need to do so, with recent archives providing finer-grained information deltas.

    As a matter of fact, before moving to borg I had a relatively simple set of scripts to follow a scheme similar to the one described above.

    How to use borg

    Borg basically simplifies all of the above so it only takes a few lines in a script to achieve those goals. Borg itself is based on two concepts: repositories and archives. Imagine your backups are simple tarballs which are stored in a known directory somewhere, probably using the backup date as part of the tarball name. With borg, it’s just the same thing with somewhat different names: the directory is called repository and tarballs are called archives (tar comes from tape archive, so it’s not even that different).

    The trick here is that borg gives you a couple additional things for free: inside a repository, archives are encrypted. There are a few different possibilities to manage encryption in a repository, and the most common one is probably using a passphrase chosen at repository creation time. Second, data inside a repository is compressed and deduplicated. Deduplication means your data is split into chunks and borg tries to avoid storing the same chunk twice. When you create a new archive a given week, chances are it can be stored using a fraction of the total space the files take on disk because only new and modified chunks need to be stored. In my case, I’m storing 265GB worth of data using just around 20GB on disk thanks to compression and deduplication.

    Once you have created a repository, you can add archives to it. Creating a new archive is like creating a tarball: you need to tell borg what directories to store and a name for the archive, which can easily include the current date. Much like rsync and other similar tools, borg has options to indicate exclusion patterns to avoid storing everything under a given directory. Once your new archive is created, you’re almost done.

    The final and optional step is to prune archives you no longer need from the repository. Remember: you probably want to have a backup scheme with varying granularity like I described before. You can just tell borg the desired granularity and it will find out if there’s any archive no longer worth including, removing it from the repo.

    Other important details

    borg can work both with local repositories and remote repositories if they are accessible using SSH, in a similar way to rsync. If the remote machine also has borg installed, borg can leverage its remote counterpart to operate more efficiently. Encryption is done on the machine creating the archives, so the remote server never sees unencrypted data and an attacker accessing the remote server disk cannot see files stored there.

    There are several ways to access borg archives to recover data. My personal favorite is using borg’s mount command. It will mount, under a given directory, either a repository or a particular archive, which you can then browse as any other local file system. This is similar to using sshfs.

    Also, shout out to the guys at, who offer Unix-friendly disks in the cloud accessible using SSH. They have a special price for borg users with somewhat competitive pricing. It’s not as cheap as the likes of Dropbox, Google Drive or OneDrive, but neither of those have such a flexible price scheme, where you can pay exactly for the storage space you want, and neither of those let you access the disk using standard Unix tools.

    An example from my case

    As an example, I’m pasting below my custom borg script. I use it to create backups of my home directory. In it, REPO_PATH has been edited out but it contains an SSH-like URL with the remote location of my borg repository. Feel free to use it as a reference. Note after creating a new archive and pruning old archives I go one step ahead and verify the status of the borg repository as an additional measure, even if I monitor the health of all my local drives using smartd.

    #!/usr/bin/env bash
    read -s -p "borg passphrase: " BORG_PASSPHRASE
    printf '\nCreating backup...\n'
    REPO_PATH=<repo location edited out>
    borg create                                         \
        --stats                                         \
        --show-rc                                       \
        --compression zstd                              \
        --exclude-caches                                \
        --exclude "$HOME"/Downloads                     \
        --exclude "$HOME"/.gvfs                         \
        --exclude "$HOME"/.mozilla                      \
        --exclude "$HOME"/.thunderbird                  \
        --exclude "$HOME"/.thumbnails                   \
        --exclude "$HOME"/.cache                        \
        --exclude "$HOME"/.local/share/Trash            \
        --exclude "$HOME"/mnt                           \
        "$REPO_PATH::{utcnow}"                          \
    borg prune                                          \
        --list                                          \
        --show-rc                                       \
        --keep-weekly 4                                 \
        --keep-monthly 12                               \
        --keep-yearly 10                                \
    borg check                                          \
        -p                                              \
        --show-rc                                       \
    GLOBAL_RC="$( printf '%s\n%s\n%s\n' $CREATE_RC $PRUNE_RC $CHECK_RC | sort -n | tail -1 )"
    exit "$GLOBAL_RC"

    To import my backup history into borg I used a custom script based on bind mounts and the --timestamp option for borg’s create command. Borg’s documentation is pretty detailed, so do not hesitate to look for answers there.

    April 19, 2020 01:18 PM

    April 14, 2020

    Andy Wingo

    understanding webassembly code generation throughput

    Greets! Today's article looks at browser WebAssembly implementations from a compiler throughput point of view. As I wrote in my article on Firefox's WebAssembly baseline compiler, web browsers have multiple wasm compilers: some that produce code fast, and some that produce fast code. Implementors are willing to pay the cost of having multiple compilers in order to satisfy these conflicting needs. So how well do they do their jobs? Why bother?

    In this article, I'm going to take the simple path and just look at code generation throughput on a single chosen WebAssembly module. Think of it as X-ray diffraction to expose aspects of the inner structure of the WebAssembly implementations in SpiderMonkey (Firefox), V8 (Chrome), and JavaScriptCore (Safari).

    experimental setup

    As a workload, I am going to use a version of the "Zen Garden" demo. This is a 40-megabyte game engine and rendering demo, originally released for other platforms, and compiled to WebAssembly a couple years later. Unfortunately the original URL for the demo was disabled at some point in late 2019, so it no longer has a home on the web. A bit of a weird situation and I am not clear on licensing either. In any case I have a version downloaded, and have hacked out a minimal set of "imports" that the WebAssembly module needs from the host to allow the module to compile and link when run from a JavaScript shell, without requiring WebGL and similar facilities. So the benchmark is just to instantiate a WebAssembly module from the 40-megabyte byte array and see how long it takes. It would be better if I had more test cases (and would be happy to add them to the comparison!) but this is a start.

    I start by benchmarking the various WebAssembly implementations, firstly in their standard configuration and then setting special run-time flags to measure the performance of the component compilers. I run these tests on the core-rich machine that I use for browser development (2 Xeon Silver 4114 CPUs for a total of 40 logical cores). The default-configuration numbers are therefore not indicative of performance on a low-end Android phone, but we can use them to extract aspects of the different implementations.

    Since I'm interested in compiler throughput, I'm not particularly concerned about how well a compiler will use all 40 cores. Therefore when testing the specific compilers I will set implementation-specific flags to disable parallelism in the compiler and GC: --single-threaded on V8, --no-threads on SpiderMonkey, and --useConcurrentGC=false --useConcurrentJIT=false on JSC. To further restrict any threads that the implementation might decide to spawn, I'll bind these to a single core on my machine using taskset -c 4. Otherwise the machine is in its normal configuration (nothing else significant running, all cores available for scheduling, turbo boost enabled).

    I'll express results in nanoseconds per WebAssembly code byte. Of the 40 megabytes or so in the Zen Garden demo, only 23 891 164 bytes are actually function code; the rest is mostly static data (textures and so on). So I'll divide the total time by this code byte count.

    I tested V8 at git revision 0961376575206, SpiderMonkey at hg revision 8ec2329bef74, and JavaScriptCore at subversion revision 259633. The benchmarks can be run using just a shell; see the pull request. I timed how long it took to instantiate the Zen Garden demo, ensuring that a basic export was callable. I collected results from 20 separate runs, sleeping a second between them. The bars in the charts below show the median times, with a histogram overlay of all results.

    results & analysis

    We can see some interesting results in this graph. Note that the Y axis is logarithmic. The "concurrent tiering" results in the graph correspond to the default configurations (no special flags, no taskset, all cores available).

    The first interesting conclusions that pop out for me concern JavaScriptCore, which is the only implementation to have a baseline interpreter (run using --useWasmLLInt=true --useBBQJIT=false --useOMGJIT=false). JSC's WebAssembly interpreter is actually structured as a compiler that generates custom WebAssembly-specific bytecode, which is then run by a custom interpreter built using the same infrastructure as JSC's JavaScript interpreter (the LLInt). Directly interpreting WebAssembly might be possible as a low-latency implementation technique, but since you need to validate the WebAssembly anyway and eventually tier up to an optimizing compiler, apparently it made sense to emit fresh bytecode.

    The part of JSC that generates baseline interpreter code runs slower than SpiderMonkey's baseline compiler, so one is tempted to wonder why JSC bothers to go the interpreter route; but then we recall that on iOS, we can't generate machine code in some contexts, so the LLInt does appear to address a need.

    One interesting feature of the LLInt is that it allows tier-up to the optimizing compiler directly from loops, which neither V8 nor SpiderMonkey support currently. Failure to tier up can be quite confusing for users, so good on JSC hackers for implementing this.

    Finally, while baseline interpreter code generation throughput handily beats V8's baseline compiler, it would seem that something in JavaScriptCore is not adequately taking advantage of multiple cores; if one core compiles at 51ns/byte, why do 40 cores only do 41ns/byte? It could be my tests are misconfigured, or it could be that there's a nice speed boost to be found somewhere in JSC.

    JavaScriptCore's baseline compiler (run using --useWasmLLInt=false --useBBQJIT=true --useOMGJIT=false) runs much more slowly than SpiderMonkey's or V8's baseline compiler, which I think can be attributed to the fact that it builds a graph of basic blocks instead of doing a one-pass compile. To me these results validate SpiderMonkey's and V8's choices, looking strictly from a latency perspective.

    I don't have graphs for code generation throughput of JavaSCriptCore's optimizing compiler (run using --useWasmLLInt=false --useBBQJIT=false --useOMGJIT=true); it turns out that JSC wants one of the lower tiers to be present, and will only tier up from the LLInt or from BBQ. Oh well!

    V8 and SpiderMonkey, on the other hand, are much of the same shape. Both implement a streaming baseline compiler and an optimizing compiler; for V8, we get these via --liftoff --no-wasm-tier-up or --no-liftoff, respectively, and for SpiderMonkey it's --wasm-compiler=baseline or --wasm-compiler=ion.

    Here we should conclude directly that SpiderMonkey generates code around twice as fast as V8 does, in both tiers. SpiderMonkey can generate machine code faster even than JavaScriptCore can generate bytecode, and optimized machine code faster than JSC can make baseline machine code. It's a very impressive result!

    Another conclusion concerns the efficacy of tiering: for both V8 and SpiderMonkey, their baseline compilers run more than 10 times as fast as the optimizing compiler, and the same ratio holds between JavaScriptCore's baseline interpreter and compiler.

    Finally, it would seem that the current cross-implementation benchmark for lowest-tier code generation throughput on a desktop machine would then be around 50 ns per WebAssembly code byte for a single core, which corresponds to receiving code over the wire at somewhere around 160 megabits per second (Mbps). If we add in concurrency and manage to farm out compilation tasks well, we can obviously double or triple that bitrate. Optimizing compilers run at least an order of magnitude slower. We can conclude that to the desktop end user, WebAssembly compilation time is indistinguishable from download time for the lowest tier. The optimizing tier is noticeably slower though, running more around 10-15 Mbps per core, so time-to-tier-up is still a concern for faster networks.

    Going back to the question posed at the start of the article: yes, tiering shows a clear benefit in terms of WebAssembly compilation latency, letting users interact with web sites sooner. So that's that. Happy hacking and until next time!

    by Andy Wingo at April 14, 2020 08:59 AM

    April 08, 2020

    Andy Wingo

    multi-value webassembly in firefox: a binary interface

    Hey hey hey! Hope everyone is staying safe at home in these weird times. Today I have a final dispatch on the implementation of the multi-value feature for WebAssembly in Firefox. Last week I wrote about multi-value in blocks; this week I cover function calls.

    on the boundaries between things

    In my article on Firefox's baseline compiler, I mentioned that all WebAssembly engines in web browsers treat the function as the unit of compilation. This facilitates streaming, parallel compilation of WebAssembly modules, by farming out compilation of individual functions to worker threads. It also allows for easy tier-up from quick-and-dirty code generated by the low-latency baseline compiler to the faster code produced by the optimizing compiler.

    There are some interesting Conway's Law implications of this choice. One is that division of compilation tasks becomes an opportunity for division of human labor; there is a whole team working on the experimental Cranelift compiler that could replace the optimizing tier, and in my hackings on Firefox I have had minimal interaction with them. To my detriment, of course; they are fine people doing interesting things. But the code boundary means that we don't need to communicate as we work on different parts of the same system.

    Boundaries are where places touch, and sometimes for fluid crossing we have to consider boundaries as places in their own right. Functions compiled with the baseline compiler, with Ion (the production optimizing compiler), and with Cranelift (the experimental optimizing compiler) are all able to call each other because they actively maintain a common boundary, a binary interface (ABI). (Incidentally the A originally stands for "application", essentially reflecting division of labor between groups of people making different components of a software system; Conway's Law again.) Let's look closer at this boundary-place, with an eye to how it changes with multi-value.

    what's in an ABI?

    Among other things, an ABI specifies a calling convention: which arguments go in registers, which on the stack, how the stack values are represented, how results are returned to the callers, which registers are preserved over calls, and so on. Intra-WebAssembly calls are a closed world, so we can design a custom ABI if we like; that's what V8 does. Sometimes WebAssembly may call functions from the run-time, though, and so it may be useful to be closer to the C++ ABI on that platform (the "native" ABI); that's what Firefox does. (Incidentally here I think Firefox is probably leaving a bit of performance on the table on Windows by using the inefficient native ABI that only allows four register parameters. I haven't measured though so perhaps it doesn't matter.) Using something closer to the native ABI makes debugging easier as well, as native debugger tools can apply more easily.

    One thing that most native ABIs have in common is that they are really only optimized for a single result. This reflects their heritage as artifacts from a world built with C and C++ compilers, where there isn't a concept of a function with more than one result. If multiple results are required, they are represented instead as arguments, typically as pointers to memory somewhere. Consider the AMD64 SysV ABI, used on Unix-derived systems, which carefully specifies how to pass arbitrary numbers of arbitrary-sized data structures to a function (§3.2.3), while only specifying what to do for a single return value. If the return value is too big for registers, the ABI specifies that a pointer to result memory be passed as an argument instead.

    So in a multi-result WebAssembly world, what are we to do? How should a function return multiple results to its caller? Let's assume that there are some finite number of general-purpose and floating-point registers devoted to return values, and that if the return values will fit into those registers, then that's where they go. The problem is then to determine which results will go there, and if there are remaining results that don't fit, then we have to put them in memory. The ABI should indicate how to address that memory.

    When looking into a design, I considered three possibilities.

    first thought: stack results precede stack arguments

    When a function needs some of its arguments passed on the stack, it doesn't receive a pointer to those arguments; rather, the arguments are placed at a well-known offset to the stack pointer.

    We could do the same thing with stack results, either reserving space deeper on the stack than stack arguments, or closer to the stack pointer. With the advent of tail calls, it would make more sense to place them deeper on the stack. Like this:

    The diagram above shows the ordering of stack arguments as implemented by Firefox's WebAssembly compilers: later arguments are deeper (farther from the stack pointer). It's an arbitrary choice that happens to match up with what the native ABIs do, as it was easier to re-use bits of the already-existing optimizing compiler that way. (Native ABIs use this stack argument ordering because of sloppiness in a version of C from before I was born. If you were starting over from scratch, probably you wouldn't do things this way.)

    Stack result order does matter to the baseline compiler, though. It's easier if the stack results are placed in the same order in which they would be pushed on the virtual stack, so that when the function completes, the results can just be memmove'd down into place (if needed). The same concern dictates another aspect of our ABI: unlike calls, registers are allocated to the last results rather than the first results. This is to make it easy to preserve stack invariant (1) from the previous article.

    At first I thought this was the obvious option, but I ran into problems. It turns out that stack arguments are fundamentally unlike stack results in some important ways.

    While a stack argument is logically consumed by a call, a stack result starts life with a call. As such, if you reserve space for stack results just by decrementing the stack pointer before a call, probably you will need to load the results eagerly into registers thereafter or shuffle them into other positions to be able to free the allocated stack space.

    Eager shuffling is busy-work that should be avoided if possible. It's hard to avoid in the baseline compiler. For example, a call to a function with 10 arguments will consume 10 values from the temporary stack; any results will be pushed on after removing argument values from the stack. If there any stack results, it's almost impossible to avoid a post-call memmove, to move stack results to where they should be before the 10 argument values were pushed on (and probably spilled). So the baseline compiler case is not optimal.

    However, things get gnarlier with the Ion optimizing compiler. Like many other optimizing compilers, Ion is designed to compute the necessary stack frame size ahead of time, and to never move the stack pointer during an activation. The only exception is for pushing on any needed stack arguments for nested calls (which are popped directly after the nested call). So in that case, assuming there are a number of multi-value calls in a stack frame, we'll be shuffling in the optimizing compiler as well. Not great.

    Besides the need to shuffle, stack arguments and stack results differ as regards ownership and garbage collection. A callee "owns" the memory for its stack arguments; it is responsible for them. The caller can't assume anything about the contents of that memory after a call, especially if the WebAssembly implementation supports tail calls (a whole 'nother blog post, that). If the values being passed are just bits, that's one thing, but with the reference types proposal, some result values may be managed by the garbage collector. The callee is responsible for making stack arguments visible to the garbage collector; the caller is responsible for the results. The caller will need to emit metadata to allow the garbage collector to see stack result references. For this reason, a stack result actually starts life just before a call, because it can become initialized at any point and thus needs to be traced during the entire callee activation. Not all callers can easily add garbage collection roots for writable stack slots, so the need to place stack results in a fixed position complicates calling multi-value WebAssembly functions in some cases (e.g. from C++).

    second thought: pointers to individual stack results

    Surely there are more well-trodden solutions to the multiple-result problem. If we encoded a multi-value return in C, how would we do it? Consider a function in C that has three 64-bit integer results. The idiomatic way to encode it would be to have one of the results be the return value of the function, and the two others to be passed "by reference":

    int64_t foo(int64_t* a, int64_t* b) {
      *a = 1;
      *b = 2;
      return 3;
    void call_foo(void) {
      int64 a, b, c;
      c = foo(&a, &b);

    This program shows us a possibility for encoding WebAssembly's multiple return values: pass an additional argument for each stack result, pointing to the location to which to write the stack result. Like this:

    The result pointers are normal arguments, subject to normal argument allocation. In the above example, given that there are already stack arguments, they will probably be passed on the stack, but in many cases the stack result pointers may be passed in registers.

    The result locations themselves don't even need to be on the stack, though they certainly will be in intra-WebAssembly calls. However the ability to write to any memory is a useful form of flexibility when e.g. calling into WebAssembly from C++.

    The advantage of this approach is that we eliminate post-call shuffles, at least in optimizing compilers. But, having to make an argument for each stack result, each of which might itself become a stack argument, seems a bit offensive. I thought we might be able to do a little better.

    third thought: stack result area, passed as pointer

    Given that stack results are going to be written to memory, it doesn't really matter where they will be written, from the perspective of the optimizing compiler at least. What if we allocated them all in a block and just passed one pointer to the block? Like this:

    Here there's just one additional argument, no matter how many stack results. While we're at it, we can specify that the layout of the stack arguments should be the same as how they would be written to the baseline stack, to make the baseline compiler's job easier.

    As I started implementation with the baseline compiler, I chose this third approach, essentially because I was already allocating space for the results in a block in this way by bumping the stack pointer.

    When I got to the optimizing compiler, however, it was quite difficult to convince Ion to allocate an area on the stack of the right shape.

    Looking back on it now, I am not sure that I made the right choice. The thing is, the IonMonkey compiler started life as an optimizing compiler for JavaScript. It can represent unboxed values, which is how it came to be used as a compiler for asm.js and later WebAssembly, and it does a good job on them. However it has never had to represent aggregate data structures like a C++ class, so it didn't have support for spilling arbitrary-sized data to the stack. It took a while staring at the register allocator to convince it to allocate arbitrary-sized stack regions, and then to allocate component scalar values out of those regions. If I had just asked the register allocator to give me one appropriate-sized stack slot for each scalar, and hacked out the ability to pass separate pointers to the stack slots to WebAssembly calls with stack results, then I would have had an easier time of it, and perhaps stack slot allocation could be more dense because multiple results wouldn't need to be allocated contiguously.

    As it is, I did manage to hack it in, and I think in a way that doesn't regress. I added a layer over an argument type vector that adds a synthetic stack results pointer argument, if the function returns stack results; iterating over this type with ABIArgIter will allocate a stack result area pointer, either as a register argument or a stack argument. In the optimizing compiler, I added add a kind of value allocation coresponding to a variable-sized stack area, (using pointer tagging again!), and extended the register allocator to allocate LStackArea, and the component stack results. Interestingly, I had to add a kind of definition that starts life on the stack; previously all Ion results started life in registers and were only spilled if needed.

    In the end, a function will capture the incoming stack result area argument, either as a normal SSA value (for Ion) or stored to a stack slot (baseline), and when returning will write stack results to that pointer as appropriate. Passing in a pointer as an argument did make it relatively easy to implement calls from WebAssembly to and from C++, getting the variable-shape result area to be known to the garbage collector for C++-to-WebAssembly calls was simple in the end but took me a while to figure out.

    Finally I was a bit exhausted from multi-value work and ready to walk away from the "JS API", the bit that allows multi-value WebAssembly functions to be called from JavaScript (they return an array) or for a JavaScript function to return multiple values to WebAssembly (via an iterable) -- but then when I got to thinking about this blog post I preferred to implement the feature rather than document its lack. Avoidance-of-document-driven development: it's a thing!

    towards deployment

    As I said in the last article, the multi-value feature is about improved code generation and also making a more capable base for expressing further developments in the WebAssembly language.

    As far as code generation goes, things are progressing but it is still early days. Thomas Lively has implemented support in LLVM for emitting return of C++ aggregates via multiple results, which is enabled via the -experimental-multivalue-abi cc1 flag. Thomas has also been implementing multi-value support in the binaryen WebAssembly toolchain component, used by the emscripten C++-to-WebAssembly toolchain. I think it will be a few months though before everything lands in a way that end users can take advantage of.

    On the specification side, the multi-value feature is now at phase 4 since January, which basically means things are all done there.

    Implementation-wise, V8 has had experimental support since 2017 or so, and the feature was staged last fall, although V8 doesn't yet support multi-value in their baseline compiler. WebKit also landed support last fall.

    Unlike V8 and SpiderMonkey, JavaScriptCore (the JS and wasm engine in WebKit) actually implements a WebAssembly interpreter as their solution to the one-pass streaming compilation problem. Then on the compiler side, there are two tiers that both operate on basic block graphs (OMG and BBQ; I just puked a little in my mouth typing that). This strategy makes the compiler implementation quite straightforward. It's also an interesting design point because JavaScriptCore's garbage collector scans the stack conservatively; there's no need for the compiler to do bookkeeping on the GC's behalf, which I'm sure was a relief to the hacker. Anyway, multi-value in WebKit is done too.

    The new thing of course is that finally, in Firefox, the feature is now fully implemented (woo) and enabled by default on Nightly builds (woo!). I did that! It took me a while! Perhaps too long? Anyway it's done. Thanks again to Bloomberg for supporting this work; large ups to y'all for helping the web move forward.

    See you next time with a more general article rounding up compile-time benchmarks on a variety of WebAssembly implementations. Until then, happy hacking!

    by Andy Wingo at April 08, 2020 09:02 AM

    April 03, 2020

    Andy Wingo

    multi-value webassembly in firefox: from 1 to n

    Greetings, hackers! Today I'd like to write about something I worked on recently: implementation of the multi-value future feature of WebAssembly in Firefox, as sponsored by Bloomberg.

    In the "minimum viable product" version of WebAssembly published in 2018, there were a few artificial restrictions placed on the language. Functions could only return a single value; if a function would naturally return two values, it would have to return at least one of them by writing to memory. Loops couldn't take parameters; any loop state variables had to be stored to and loaded from indexed local variables at each iteration. Similarly, any block that would naturally return more than one result would also have to do so via locals.

    This restruction is lifted with the multi-value proposal. Function types now map from result type to result type, where a result type is a sequence of value types. That is to say, just as functions can take multiple arguments, they can return multiple results. Similarly, with the multi-value proposal, block types are now the same as function types: loops and blocks can take arguments and return any number of results. This change improves the expressiveness of WebAssembly as a compilation target; a C++ program compiled to multi-value WebAssembly can be encoded in fewer bytes than before. Multi-value also establishes a base for other language extensions. For example, the exception handling proposal builds on multi-value to pass multiple values to catch blocks.

    So, that's multi-value. You would think that relaxing a restriction would be easy, but you'd be wrong! This task took me 5 months and had a number of interesting gnarly bits. This article is part one of two about interesting aspects of implementing multi-value in Firefox, specifically focussing on blocks. We'll talk about multi-value function calls next week.

    multi-value in blocks

    In the last article, I presented the basic structure of Firefox's WebAssembly support: there is a baseline compiler optimized for low latency and an optimizing compiler optimized for throughput. (There is also Cranelift, a new experimental compiler that may replace the current implementation of the optimizing compiler; but that doesn't affect the basic structure.)

    The optimizing compiler applies traditional compiler techniques: SSA graph construction, where values flow into and out of graphs using the usual defs-dominate-uses relationship. The only control-flow joins are loop entry and (possibly) block exit, so the addition of loop parameters means in multi-value there are some new phi variables in that case, and the expansion of block result count from [0,1] to [0,n] means that you may have more block exit phi variables. But these compilers are built to handle these situations; you just build the SSA and let the optimizing compiler go to town.

    The problem comes in the baseline compiler.

    from 1 to n

    Recall that the baseline compiler is optimized for compiler speed, not compiled speed. If there are only ever going to be 0 or 1 result from a block, for example, the baseline compiler's internal data structures will use something like a Maybe<ValType> to represent that block result.

    If you then need to expand this to hold a vector of values, the naïve approach of using a Vector<ValType> would mean heap allocation and indirection, and thus would regress the baseline compiler.

    In this case, and in many other similar cases, the solution is to use value tagging to represent 0 or 1 value type directly in a word, and the general case by linking out to an external vector. As block types are function types, they actually appear as function types in the WebAssembly type section, so they are already parsed; the BlockType in that case can just refer out to already-allocated memory.

    In fact this value-tagging pattern applies all over the place. (The jit/ links above are for the optimizing compiler, but they relate to function calls; will write about that next week.) I have a bit of pause about value tagging, in that it's gnarly complexity and I didn't measure the speed of alternative implementations, but it was a useful migration strategy: value tagging minimizes performance risk to existing specialized use cases while adding support for new general cases. Gnarly it is, then.

    control-flow joins

    I didn't mention it in the last article, but there are two important invariants regarding stack discipline in the baseline compiler. Recall that there's a virtual stack, and that some elements of the virtual stack might be present on the machine stack. There are four kinds of virtual stack entry: register, constant, local, and spilled. Locals indicate local variable reads and are mostly like registers in practice; when registers spill to the stack, locals do too. (Why spill to the temporary stack instead of leaving the value in the local variable slot? Because locals are mutable. A local.get captures a local variable value at its point of execution. If future code changes the local variable value, you wouldn't want the captured value to change.)

    Digressing, the stack invariants:

    1. Spilled values precede registers and locals on the virtual stack. If u and v are virtual stack entries and u is older than v, then if u is in a register or is a local, then v is not spilled.

    2. Older values precede newer values on the machine stack. Again for u and v, if they are both spilled, then u will be farther from the stack pointer than v.

    There are five fundamental stack operations in the baseline compiler; let's examine them to see how the invariants are guaranteed. Recall that before multi-value, targets of non-local exits (e.g. of the br instruction) could only receive 0 or 1 value; if there is a value, it's passed in a well-known register (e.g. %rax or %xmm0). (On 32-bit machines, 64-bit values use a well-known pair of registers.)

    Results of WebAssembly operations never push spilled values, neither onto the virtual nor the machine stack. v is either a register, a constant, or a reference to a local. Thus we guarantee both (1) and (2).
    pop() -> v
    Doesn't affect older stack entries, so (1) is preserved. If the newest stack entry is spilled, you know that it is closest to the stack pointer, so you can pop it by first loading it to a register and then incrementing the stack pointer; this preserves (2). Therefore if it is later pushed on the stack again, it will not be as a spilled value, preserving (1).
    When spilling the virtual stack to the machine stack, you first traverse stack entries from new to old to see how far you need to spill. Once you get to a virtual stack entry that's already on the stack, you know that everything older has already been spilled, because of (1), so you switch to iterating back towards the new end of the stack, pushing registers and locals onto the machine stack and updating their virtual stack entries to be spilled along the way. This iteration order preserves (2). Note that because known constants never need to be on the machine stack, they can be interspersed with any other value on the virtual stack.
    return(height, v)
    This is the stack operation corresponding to a block exit (local or nonlocal). We drop items from the virtual and machine stack until the stack height is height. In WebAssembly 1.0, if the target continuation takes a value, then the jump passes a value also; in that case, before popping the stack, v is placed in a well-known register appropriate to the value type. Note however that v is not pushed on the virtual stack at the return point. Popping the virtual stack preserves (1), because a stack and its prefix have the same invariants; popping the machine stack also preserves (2).
    Whereas return operations happen at block exits, capture operations happen at the target of block exits (the continuation). If no value is passed to the continuation, a capture is a no-op. If a value is passed, it's in a register, so we just push that register onto the virtual stack. Both invariants are obviously preserved.

    Note that a value passed to a continuation via return() has a brief instant in which it has no name -- it's not on the virtual stack -- but only a location -- it's in a well-known place. capture() then gives that floating value a name.

    Relatedly, there is another invariant, that the allocation of old values on block entry is the same as their allocation on block exit, so that all predecessors of the block exit flow all values via the same places. This is preserved by spilling on block entry. It's a big hammer, but effective.

    So, given all this, how do we pass multiple values via return()? We don't have unlimited registers, so the %rax strategy isn't going to work.

    The answer for the baseline compiler is informed by our lean into the stack machine principle. Multi-value returns are allocated in such a way that a capture() can push them onto the virtual stack. Because spilled values must precede registers, we therefore allocate older results on the stack, and put the last result in a register (or register pair for i64 on 32-bit platforms). Note that it's possible in theory to allocate multiple results to registers; we'll touch on this next week.

    Therefore the implementation of return(height, is straightforward: we first pop register results, then spill the remaining virtual stack items, then shuffle stack results down towards height. This should result in a memmove of contiguous stack results towards the frame pointer. However because const values aren't present on the machine stack, depending on the stack height difference, it may mean a split between moving some values toward the frame pointer and some towards the stack pointer, then filling in by spilling constants. It's gnarly, but it is what it is. Note that the links to the return and capture implementations above are to the post-multi-value world, so you can see all the details there.

    that's it!

    In summary, the hard part of multi-value blocks was reworking internal compiler data structures to be able to represent multi-value block types, and then figuring out the low-level stack manipulations in the baseline compiler. The optimizing compiler on the other hand was pretty easy.

    When it comes to calls though, that's another story. We'll get to that one next week. Thanks again to Bloomberg for supporting this work; I'm really delighted that Igalia and Bloomberg have been working together for a long time (coming on 10 years now!) to push the web platform forward. A special thanks also to Mozilla's Lars Hansen for his patience reviewing these patches. Until next week, then, stay at home & happy hacking!

    by Andy Wingo at April 03, 2020 10:56 AM

    March 25, 2020

    Andy Wingo

    firefox's low-latency webassembly compiler

    Good day!

    Today I'd like to write a bit about the WebAssembly baseline compiler in Firefox.

    background: throughput and latency

    WebAssembly, as you know, is a virtual machine that is present in web browsers like Firefox. An important initial goal for WebAssembly was to be a good target for compiling programs written in C or C++. You can visit a web page that includes a program written in C++ and compiled to WebAssembly, and that WebAssembly module will be downloaded onto your computer and run by the web browser.

    A good virtual machine for C and C++ has to be fast. The throughput of a program compiled to WebAssembly (the amount of work it can get done per unit time) should be approximately the same as its throughput when compiled to "native" code (x86-64, ARMv7, etc.). WebAssembly meets this goal by defining an instruction set that consists of similar operations to those directly supported by CPUs; WebAssembly implementations use optimizing compilers to translate this portable instruction set into native code.

    There is another dimension of fast, though: not just work per unit time, but also time until first work is produced. If you want to go play Doom 3 on the web, you care about frames per second but also time to first frame. Therefore, WebAssembly was designed not just for high throughput but also for low latency. This focus on low-latency compilation expresses itself in two ways: binary size and binary layout.

    On the size front, WebAssembly is optimized to encode small files, reducing download time. One way in which this happens is to use a variable-length encoding anywhere an instruction needs to specify an integer. In the usual case where, for example, there are fewer than 128 local variables, this means that a local.get instruction can refer to a local variable using just one byte. Another strategy is that WebAssembly programs target a stack machine, reducing the need for the instruction stream to explicitly load operands or store results. Note that size optimization only goes so far: it's assumed that the bytes of the encoded module will be compressed by gzip or some other algorithm, so sub-byte entropy coding is out of scope.

    On the layout side, the WebAssembly binary encoding is sorted by design: definitions come before uses. For example, there is a section of type definitions that occurs early in a WebAssembly module. Any use of a declared type can only come after the definition. In the case of functions which are of course mutually recursive, function type declarations come before the actual definitions. In theory this allows web browsers to take a one-pass, streaming approach to compilation, starting to compile as functions arrive and before download is complete.

    implementation strategies

    The goals of high throughput and low latency conflict with each other. To get best throughput, a compiler needs to spend time on code motion, register allocation, and instruction selection; to get low latency, that's exactly what a compiler should not do. Web browsers therefore take a two-pronged approach: they have a compiler optimized for throughput, and a compiler optimized for latency. As a WebAssembly file is being downloaded, it is first compiled by the quick-and-dirty low-latency compiler, with the goal of producing machine code as soon as possible. After that "baseline" compiler has run, the "optimizing" compiler works in the background to produce high-throughput code. The optimizing compiler can take more time because it runs on a separate thread. When the optimizing compiler is done, it replaces the baseline code. (The actual heuristics about whether to do baseline + optimizing ("tiering") or just to go straight to the optimizing compiler are a bit hairy, but this is a summary.)

    This article is about the WebAssembly baseline compiler in Firefox. It's a surprising bit of code and I learned a few things from it.

    design questions

    Knowing what you know about the goals and design of WebAssembly, how would you implement a low-latency compiler?

    It's a question worth thinking about so I will give you a bit of space in which to do so.




    After spending a lot of time in Firefox's WebAssembly baseline compiler, I have extracted the following principles:

    1. The function is the unit of compilation

    2. One pass, and one pass only

    3. Lean into the stack machine

    4. No noodling!

    In the remainder of this article we'll look into these individual points. Note, although I have done a good bit of hacking on this compiler, its design and original implementation comes mainly from Mozilla hacker Lars Hansen, who also currently maintains it. All errors of exegesis are mine, of course!

    the function is the unit of compilation

    As we mentioned, in the binary encoding of a WebAssembly module, all definitions needed by any function come before all function definitions. This naturally leads to a partition between two phases of bytestream parsing: an initial serial phase that collects the set of global type definitions, annotations as to which functions are imported and exported, and so on, and a subsequent phase that compiles individual functions in an essentially independent manner.

    The advantage of this approach is that compiling functions is a natural task unit of parallelism. If the user has a machine with 8 virtual cores, the web browser can keep one or two cores for the browser itself and farm out WebAssembly compilation tasks to the rest. The result is that the compiled code is available sooner.

    Taking functions to be the unit of compilation also allows for an easy "tier-up" mechanism: after the baseline compiler is done, the optimizing compiler can take more time to produce better code, and when it is done, it can swap out the results on a per-function level. All function calls from the baseline compiler go through a jump table indirection, to allow for tier-up. In SpiderMonkey there is no mechanism currently to tier down; if you need to debug WebAssembly code, you need to refresh the page, causing the wasm code to be compiled in debugging mode. For the record, SpiderMonkey can only tier up at function calls (it doesn't do OSR).

    This simple approach does have some down-sides, in that it leaves intraprocedural optimizations on the table (inlining, contification, custom calling conventions, speculative optimizations). This is mitigated in two ways, the most obvious being that LLVM or whatever produced the WebAssembly has ideally already done whatever inlining might be fruitful. The second is that WebAssembly is designed for predictable performance. In JavaScript, an implementation needs to do run-time type feedback and speculative optimizations to get good performance, but the result is that it can be hard to understand why a program is fast or slow. The designers and implementers of WebAssembly in browsers all had first-hand experience with JavaScript virtual machines, and actively wanted to avoid unpredictable performance in WebAssembly. Therefore there is currently a kind of détente among the various browser vendors, that everyone has agreed that they won't do speculative inlining -- yet, anyway. Who knows what will happen in the future, though.

    Digressing, the summary here is that the baseline compiler receives an individual function body as input, and generates code just for that function.

    one pass, and one pass only

    The WebAssembly baseline compiler makes one pass through the bytecode of a function. Nowhere in all of this are we going to build an abstract syntax tree or a graph of basic blocks. Let's follow through how that works.

    Firstly, emitFunction simply emits a prologue, then the body, then an epilogue. emitBody is basically a big loop that consumes opcodes from the instruction stream, dispatching to opcode-specific code emitters (e.g. emitAddI32).

    The opcode-specific code emitters are also responsible for validating their arguments; for example, emitAddI32 is wrapped in an assertion that there are two i32 values on the stack. This validation logic is shared by a templatized codestream iterator so that it can be re-used by the optimizing compiler, as well as by the publicly-exposed WebAssembly.validate function.

    A corollary of this approach is that machine code is emitted in bytestream order; if the WebAssembly instruction stream has an i32.add followed by a i32.sub, then the machine code will have an addl followed by a subl.

    WebAssembly has a syntactically limited form of non-local control flow; it's not goto. Instead, instructions are contained in a tree of nested control blocks, and control can only exit nonlocally to a containing control block. There are three kinds of control blocks: jumping to a block or an if will continue at the end of the block, whereas jumping to a loop will continue at its beginning. In either case, as the compiler keeps a stack of nested control blocks, it has the set of valid jump targets and can use the usual assembler logic to patch forward jump addresses when the compiler gets to the block exit.

    lean into the stack machine

    This is the interesting bit! So, WebAssembly instructions target a stack machine. That is to say, there's an abstract stack onto which evaluating i32.const 32 pushes a value, and if followed by i32.const 10 there would then be i32(32) | i32(10) on the stack (where new elements are added on the right). A subsequent i32.add would pop the two values off, and push on the result, leaving the stack as i32(42). There is also a fixed set of local variables, declared at the beginning of the function.

    The easiest thing that a compiler can do, then, when faced with a stack machine, is to emit code for a stack machine: as values are pushed on the abstract stack, emit code that pushes them on the machine stack.

    The downside of this approach is that you emit a fair amount of code to do read and write values from the stack. Machine instructions generally take arguments from registers and write results to registers; going to memory is a bit superfluous. We're willing to accept suboptimal code generation for this quick-and-dirty compiler, but isn't there something smarter we can do for ephemeral intermediate values?

    Turns out -- yes! The baseline compiler keeps an abstract value stack as it compiles. For example, compiling i32.const 32 pushes nothing on the machine stack: it just adds a ConstI32 node to the value stack. When an instruction needs an operand that turns out to be a ConstI32, it can either encode the operand as an immediate argument or load it into a register.

    Say we are evaluating the i32.add discussed above. After the add, where does the result go? For the baseline compiler, the answer is always "in a register" via pushing a new RegisterI32 entry on the value stack. The baseline compiler includes a stupid register allocator that spills the value stack to the machine stack if no register is available, updating value stack entries from e.g. RegisterI32 to MemI32. Note, a ConstI32 never needs to be spilled: its value can always be reloaded as an immediate.

    The end result is that the baseline compiler avoids lots of stack store and load code generation, which speeds up the compiler, and happens to make faster code as well.

    Note that there is one limitation, currently: control-flow joins can have multiple predecessors and can pass a value (in the current WebAssembly specification), so the allocation of that value needs to be agreed-upon by all predecessors. As in this code:

    (func $f (param $arg i32) (result i32)
      (block $b (result i32)
        (i32.const 0)
        (local.get $arg)
        (br_if $b) ;; return 0 from $b if $arg is zero
        (i32.const 1))) ;; otherwise return 1
    ;; result of block implicitly returned

    When the br_if branches to the block end, where should it put the result value? The baseline compiler effectively punts on this question and just puts it in a well-known register (e.g., $rax on x86-64). Results for block exits are the only place where WebAssembly has "phi" variables, and the baseline compiler allocates all integer phi variables to the same register. A hack, but there we are.

    no noodling!

    When I started to hack on the baseline compiler, I did a lot of code reading, and eventually came on code like this:

    void BaseCompiler::emitAddI32() {
      int32_t c;
      if (popConstI32(&c)) {
        RegI32 r = popI32();
        masm.add32(Imm32(c), r);
      } else {
        RegI32 r, rs;
        pop2xI32(&r, &rs);
        masm.add32(rs, r);

    I said to myself, this is silly, why are we only emitting the add-immediate code if the constant is on top of the stack? What if instead the constant was the deeper of the two operands, why do we then load the constant into a register? I asked on the chat channel if it would be OK if I improved codegen here and got a response I was not expecting: no noodling!

    The reason is, performance of baseline-compiled code essentially doesn't matter. Obviously let's not pessimize things but the reason there's a baseline compiler is to emit code quickly. If we start to add more code to the baseline compiler, the compiler itself will slow down.

    For that reason, changes are only accepted to the baseline compiler if they are necessary for some reason, or if they improve latency as measured using some real-world benchmark (time-to-first-frame on Doom 3, for example).

    This to me was a real eye-opener: a compiler optimized not for the quality of the code that it generates, but rather for how fast it can produce the code. I had seen this in action before but this example really brought it home to me.

    The focus on compiler throughput rather than compiled-code throughput makes it pretty gnarly to hack on the baseline compiler -- care has to be taken when adding new features not to significantly regress the old. It is much more like hacking on a production JavaScript parser than your traditional SSA-based compiler.

    that's a wrap!

    So that's the WebAssembly baseline compiler in SpiderMonkey / Firefox. Until the next time, happy hacking!

    by Andy Wingo at March 25, 2020 04:29 PM

    March 16, 2020

    Víctor Jáquez

    Review of the Igalia Multimedia team Activities (2019/H2)

    This blog post is a review of the various activities the Igalia Multimedia team was involved along the second half of 2019.

    Here are the previous 2018/H2 and 2019/H1 reports.


    Succinctly, GstWPE is a GStreamer plugin which allows to render web-pages as a video stream where it frames are GL textures.

    Phil, its main author, wrote a blog post explaning at detail what is GstWPE and its possible use-cases. He wrote a demo too, which grabs and previews a live stream from a webcam session and blends it with an overlay from wpesrc, which displays HTML content. This composited live stream can be broadcasted through YouTube or Twitch.

    These concepts are better explained by Phil himself in the following lighting talk, presented at the last GStreamer Conference in Lyon:

    Video Editing

    After implementing a deep integration of the GStreamer Editing Services (a.k.a GES) into Pixar’s OpenTimelineIO during the first half of 2019, we decided to implement an important missing feature for the professional video editing industry: nested timelines.

    Toward that goal, Thibault worked with the GSoC student Swayamjeet Swain to implement a flexible API to support nested timelines in GES. This means that users of GES can now decouple each scene into different projects when editing long videos. This work is going to be released in the upcoming GStreamer 1.18 version.

    Henry Wilkes also implemented the support for nested timeline in OpenTimelineIO making GES integration one of the most advanced one as you can see on that table:

    Single Track of Clips ✔ ✔ ✔ ✔ ✔ W-O ✔ ✔
    Multiple Video Tracks ✔ ✖ ✔ ✔ ✔ W-O ✔ ✔
    Audio Tracks & Clips ✔ ✔ ✔ ✔ ✔ W-O ✔ ✔
    Gap/Filler ✔ ✔ ✔ ✔ ✔ ✔ ✖ ✔
    Markers ✔ ✔ ✔ ✔ ✖ N/A ✖ ✔
    Nesting ✔ ✖ ✔ ✔ ✔ W-O ✔ ✔
    Transitions ✔ ✔ ✖ ✖ ✔ W-O ✖ ✔
    Audio/Video Effects ✖ ✖ ✖ ✖ ✖ N/A ✖ ✔
    Linear Speed Effects ✔ ✔ ✖ ✖ R-O ✖ ✖ ✖
    Fancy Speed Effects ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖
    Color Decision List ✔ ✔ ✖ ✖ ✖ ✖ N/A ✖

    Along these lines, Thibault delivered a 15 minutes talk, also in the GStreamer Conference 2019:

    After detecting a few regressions and issues in GStreamer, related to frame accuracy, we decided to make sure that we can seek in a perfectly frame accurate way using GStreamer and the GStreamer Editing Services. In order to ensure that, an extensive integration testsuite has been developed, mostly targeting most important container formats and codecs (namely mxf, quicktime, h264, h265, prores, jpeg) and issues have been fixed in different places. On top of that, new APIs are being added to GES to allow expressing times in frame number instead of nanoseconds. This work is still ongoing but should be merged in time for GStreamer 1.18.

    GStreamer Validate Flow

    GstValidate has been turning into one of the most important GStreamer testing tools to check that elements behave as they are supposed to do in the framework.

    Along with our MSE work, we found that other way to specify tests, related with produced buffers and events through specific pads, was needed. Thus, Alicia developed a new plugin for GstValidate: Validate Flow.

    Alicia gave an informative 30 minutes talk about GstValidate and the new plugin in the last GStreamer Conference too:

    GStreamer VAAPI

    Most of the work along the second half of 2019 were maintenance tasks and code reviews.

    We worked mainly on memory restrictions per backend driver, and we reviewed a big refactor: internal encoders now use GstObject, instead of the custom GstVaapiObject. Also we reviewed patches for new features such as video rotation and cropping in vaapipostproc.

    Servo multimedia

    Last year we worked integrating media playing in Servo. We finally delivered hardware accelerated video playback in Linux and Android. We worked also for Windows and Mac ports but they were not finished. As natural, most of the work were in servo/media crate, pushing code and reviewing contributions. The major tasks were to rewrite the media player example and the internal source element looking to handle the download playbin‘s flag properly.

    We also added WebGL integration support with <video> elements, thus webpages can use video frames as WebGL textures.

    Finally we explored how to isolate the multimedia processing in a dedicated thread or process, but that task remains pending.

    WebKit Media Source Extension

    We did a lot of downstream and upstream bug fixing and patch review, both in WebKit and GStreamer, for our MSE GStreamer-based backend.

    Along this line we improved WebKitMediaSource to use playbin3 but also compatibility with older GStreamer versions was added.

    WebKit WebRTC

    Most of the work in this area were maintenance and fix regressions uncovered by the layout tests. Besides, the support for the Rasberry Pi was improved by handling encoded streams from v4l2 video sources, with some explorations with Minnowboard on top of that.


    GStreamer Conference

    Igalia was Gold sponsor this last GStreamer Conference held in Lyon, France.

    All team attended and five talks were delivered. Only Thibault presented, besides the video editing one which we already referred, another two more: One about GstTranscoder API and the other about the new documentation infrastructure based in Hotdoc:

    We also had a productive hackfest, after the conference, where we worked on AV1 Rust decoder, HLS Rust demuxer, hardware decoder flag in playbin, and other stuff.

    Linaro Connect

    Phil attended the Linaro Connect conference in San Diego, USA. He delivered a talk about WPE/Multimedia which you can enjoy here:


    Charlie attended Demuxed, in San Francisco. The conference is heavily focused on streaming and codec engineering and validation. Sadly there are not much interest in GStreamer, as the main focus is on FFmpeg.


    Phil and I attended the last RustFest in Barcelona. Basically we went to meet with the Rust community and we attended the “WebRTC with GStreamer-rs” workshop presented by Sebastian Dröge.

    by vjaquez at March 16, 2020 03:20 PM

    March 09, 2020

    Jacobo Aragunde

    Mapping the input method implementations in Chromium

    This is an overview of the different input method implementations in Chromium, centered in Linux platforms.

    By Rime-devel – 小狼毫输入法界面 / Interface of Weasel Input Method, GPL, Link

    Native IME

    InputMethodAuraLinux is the central piece here. It implements two important interfaces and triggers the construction of the platform-specific bits. This is the structure for the Wayland/Ozone backend:

    The InputMethodAuraLinux triggers the construction of a specific LinuxInputMethodContext through the Factory. In the case above, it would be an instance of WaylandInputMethodContext. It will save a reference to the Context while, at the same time, set itself as the delegate for that Context object. The set of methods of both classes is pretty similar, because they are expected to work together: the Context object, in the parts related to the platform, and InputMethodAuraLinux in the parts related to the rest of Chromium. It will interact with other system classes, mainly through the singleton object IMEBridge.

    The X11 backend currently involves slightly different classes. It still has the InputMethodAuraLinux as a center piece, but uses different implementations for the Context and the ContextFactory: the multi-purpose class LinuxUI also acts as a factory. It’s expected to adapt to the Ozone structure at some point in the future.

    ChromeOS and IME extension API

    Chromium provides an extension API to allow the implementation of IMEs exclusively with web technologies. It shares code with the ChromeOS implementation.

    The piece glueing the native infrastructure and the ChromeOS/Extension API implementations is the IMEEngineHandlerInterface, which has hooks to manage relevant events like focus in or out, process key events, etc. The singleton class IMEBridge can set an EngineHandler, which is an implementation of the aforementioned interface, and in the case that such implementation exists, it will receive those events from Chromium whenever they happen.

    There is one base implementation of IMEEngineHandlerInterface called InputMethodEngineBase. It is extended by chromeos::InputMethodEngine, for the ChromeOS implementation, and by input_method::InputMethodEngine, for the IME extension API implementation.

    The setup of the IME extension API contains the following pieces:

    • observers to the IME bridge
    • an InputIMEEventRouter that contains an instance of the InputMethodEngine class
    • a factory to associate instances of InputIMEEventRouter with browser profiles.
    • An InputImeAPI class that acts as several kinds of observers.
    • the actual code that backs every JavaScript IME operation, which makes use of the classes mentioned above and is contained in individual classes that implement UIThreadExtensionFunction for every operation.

    The IME API operates in two directions: when one of the Javascript operations is directly called, it will trigger the native code associated to it, which would invoke any relevant Chromium code; when a relevant event happens in the system, the IMEBridgeObserver is notified and the associated JavaScript callback will be triggered.

    by Jacobo Aragunde Pérez at March 09, 2020 05:00 PM

    Brian Kardell

    Making Sure Content Lives On...

    Making Sure Content Lives On...

    In which I describe a problem, share a possible solution that you can use today, and, hopefully start a conversation...

    (If you already know the problem, and you're looking for the solution, check out the documentation over on GitHub)

    There's a series of problems that I experience literally all the time: Link rot and the disappearance of our history. I have written posts on blogging platforms that no longer exist. I have written guest posts for sites that no longer exist. I frequently wish that that content still existed - not only still existed, but was findable again. I have researched hard and bookmarked or linked to obscure but important history which, you guessed it: no longer exists. This is tremendously frustrating, but also just really sad (more on this later)...

    This is one thing that really interests me about a potentially future decentralized web - we could do better here. At the same time, we often have to deal with the Web we have now - not the one we wish existed. So, let's look the state of this today?

    Archiving today

    Luckily, when content disappears, I can often turn to the Internet Archive and the Wayback Machine to find content. But the truth is that they've got a very tough challenge. Identifing all the things that need indexing is really hard. If you have a relatively small blog or site, especially, it can be hard to find. Then, even if it gets found, how do they know when you've got some new content? So, the net result is that no matter how good a job they do - a lot of really interesting stuff can wind up not being findable today there either.

    History itself makes this problem even harder: It's very unpredictable. Often things are considerably more interesting in only in hindsight. Things also tend to seem a lot more stable in the short term than they are in reality in the long arc of history: Even giant "too big to fail", semi-monopolies ultimately shift, burn out or even just break their history. One of the wisest things I ever heard was an older engineer who told me if they could impart one things to students it was this - all the paradigms that you think are forever will shift. Once upon a time IBM ruled the world. Motorola was absolutely dominant. Once upon a time, SourceForge was the shit. Then GitHub. Once upon a time Netscape won the interwebs, and then they lost it to Microsoft who won it even bigger. Then Microsoft lost it again.

    It's ok if you don't share my desire, but I personally, want to be sure my content continues to exist, and people can find it via old links despite all of these uncertain futures. Today, the best chance of that is in the Internet Archive. So, how can I be more confident that it does?

    Maybe... Just ask?

    Interestingly, a lot of these problems get hella simpler if you just ask.

    For a really long time now (as long as I can remember the Wayback Machine existing), you can go to, enter a URL under the heading 'Save page now' and click the button.

    A notification model is way, way simpler than somehow trying to monitor and guess - and we use this same basic pattern for several things in engineering. It's quite success at keeping things efficient.

    The trouble here is, I think, that it is currently manual. It's unlikely that I'm going to do that every time I add a new blog post, even thought that's exactly what I want to do.

    Maybe we can fix that?

    Asking by default...

    Recently, this got me to thinking... What if it were really easy to incorporate this into whatever blog software you use today? Think about it: RSS continues to thrive, in part, because you hardly have to know about it - it's so comparatively easy (entirely free in some cases in that it is just part of the software you get). If you could incorporate this into your system in somewhere between 0 and 30 minutes, would you?

    Every now and then I have a thought like this that seems so clear to me that I figure "surely this must exist? Why do I not know about it?" So, I asked around on social media. Surprisingly, I could find no one who was aware of such a thing. What about publicly exposing an API? Do they? Maybe? Kinda? Could we make it easier?

    Brendan Eich cc'ed Jason Scott from the Internet Archive into the conversation. Jason cc'ed in Mark Graham, the Director for the Wayback Machine. Mark got me access to a service and set up this little experiement to try it out.

    But how to get there?

    One of the challenges here is how to make something 'easy' when there are so many blogging systems and site hosting things. Submitting a URL automatically, even, is technically not hard - but fitting that into each system is a little more work. Then, each of these uses different frameworks, languages, package managers and so on. I don't really have time to build out 100 different 'easy' solutions just to start a conversation. In fact, I want this for myself, and currently my setup is very custom - it's all stuff I wrote that works for me but would be more or less useful to someone else.

    But then I had a thought: There's actually lots of good stuff in the Web we have for tackling this problem.

    Existing, higher level hooks

    HTTP is already a nice, high-level API that doesn't really care about how you built your site, but it might not be automatically obvious or convenient regarding how to plug into it. We already have lots of infrastructure too... As I said, almost everyone has an RSS feed and almost everyone has some kind of 'publish' process. Maybe I could make it easy for huge swaths of people by just tapping into some big patterns and existing machinery. What I really wanted, I thought, was just a flexible and easy to integrate service. It doesn't have to block your build, it isn't absolutely critical, it's just "hey... I just published a thing."

    So, I decided to try this..

    I created an HTTP service with a Cloudflare worker. Cloudflare workers are interesting because they're deployed at the edge, they sleep when youre not using them and you can process up to 100k requests per-day for free. Since they kind of spin up as needed, handling errors for this kind of thing is easy too - you aren't going to pollute the system somehow. In fact, they seem kind of ideally suited for this kind of thing.

    This seems very good actually because blogs, especially personal blogs, tend to churn out content "slowly". That is, 100k requests per day seems like plenty for a few orders of magnitude more blogs than that. So... Cool.

    Basically - you send an HTTP Post to this service with the content-type `application/json` and provide some stuff in the body. But what you provide can differ to make integration very easy regardless of what kind of setup you have. It allows, effectively 3 'shapes' that I've put together that should fit nicely and easily into whatever system you already have today and be easy to integrate. For my own blog, hosted on gh pages, it took ~5 minutes. If you're interested in trying it out or seeing what I came up with check out the experiment itself, try it out.

    I think it should be pretty easy and flexible, but I don't know, wdyt? Are you willing to try it? Is this an interesting problem? Should the Wayback maybe offer something like this itself? How could it be better?

    March 09, 2020 04:00 AM