Planet Igalia

September 02, 2021

Byungwoo Lee

CSS Selectors :has()

Selector? Combinator? Subject?

As described in the Selectors Level 4 spec, a selector represents a particular pattern of element(s) in a tree structure. We can select specific elements in a tree structure by matching the pattern to the tree.

Generally, this pattern involves two distinct concepts: first, a means to express conditions to be tested on an element itself (simple selectors or a compound selector); second, a means to express conditions on the relationship between two elements (combinators).

And the subject of a selector is any element matched by the selector.

The limits of subjects, so far

When you have a reference element in a DOM tree, you can select other elements with a CSS selector.

In a generic tree structure, an element can have 4-way relationships to other elements.

  • an element is an ancestor of another element.
  • an element is a previous sibling of another element.
  • an element is a next sibling of another element.
  • an element is a descendant of another element.

CSS Selectors, to date, have only allowed the last 2 (‘is a next sibling of’ and ‘is a descendant of’).

So in the CSS world, Thor can say “I am Thor, son of Odin” like this: Odin > Thor. But there has been no way for Darth Vader to tell Luke, “I’m your father”.

At least, these are the limits of what has been implemented and is shipping in every browser to date. However, :has() in the CSS Selectors spec provides the expression: DarthVader:has(> Luke)

The reason for the limitation is mainly about efficiency.

The primary use of selectors has always been in CSS itself. Pages often have 500-2000 CSS rules and slightly more elements in them. Selectors act as filters in the process of applying style rules to elements. If we have 2000 CSS rules for 2000 elements, matching could be done at least 2,000 times, and in the worst case (in theory) 4,000,000 times (2,000 rules × 2,000 elements). In the browser, the tree is changing constantly - even a static document is rapidly mutated (built) as it is parsed - and we try to render all of this incrementally and at 60 fps. In summary, selector matching is performed very frequently in performance-critical processes, so it must be designed and implemented to meet very high performance requirements. One effective way to do that is to keep the problem simple by limiting what has to be handled.

In the tree structure, checking a descendant relationship is more efficient than checking an ancestor relationship because an element has only one parent, but it can have multiple children.

<div id=parent>
  <div id=subject>
    <div id=child1></div>
    <div id=child2></div>
    <div id=child10></div>
subject.matches('#parent > :scope');
// matches   : Are you a child of #parent ?
// #subject  : Yes, my parent is #parent.

subject.matches(':has(> #child10)');
// matches   : Are you a parent of #child10 ?
// #subject  : Wait a second, I have to lookup all my children.
//             Yes, #child10 is one of my children.
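The difference in cost between the two directions can be sketched outside the browser. The snippet below is only an illustration: it uses a hypothetical plain-object tree (the helpers node, isChildOf and hasChild are made up for this sketch, not DOM APIs) and counts how many elements each check has to look at.

```javascript
// Hypothetical plain-object tree standing in for the DOM snippet above.
function node(id, children = []) {
  const n = { id, parent: null, children };
  for (const c of children) c.parent = n;
  return n;
}

const child1 = node('child1');
const child2 = node('child2');
const child10 = node('child10');
const subject = node('subject', [child1, child2, child10]);
const parent = node('parent', [subject]);

// '#parent > :scope': a single look at the parent pointer.
function isChildOf(el, parentId) {
  return { matched: el.parent !== null && el.parent.id === parentId, checks: 1 };
}

// ':has(> #child10)': may have to inspect every child.
function hasChild(el, childId) {
  let checks = 0;
  for (const c of el.children) {
    checks++;
    if (c.id === childId) return { matched: true, checks };
  }
  return { matched: false, checks };
}

const up = isChildOf(subject, 'parent');
const down = hasChild(subject, 'child10');
console.log(up.checks, down.checks);
```

The upward check always costs one comparison, while the downward check grows with the number of children.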

By removing one of the two opposite directions, we can always place the subject of a selector to the right, no matter how complex the selector is.

  • ancestor subject
    -> subject is a descendant of ancestor
  • previous_sibling ~ subject
    -> subject is a next sibling of previous_sibling
  • previous_sibling ~ ancestor subject
    -> subject is a descendant of ancestor, which is a next sibling of previous_sibling

With this limitation, we can get the advantages of having simple data structures and simple matching sequences.

A > B + C { color: red; }
'A > B + C' can be parsed as a list of selector/combinator pairs.
  {selector: 'C', combinator: '+'},
  {selector: 'B', combinator: '>'},
  {selector: 'A', combinator: null}
<A>       <!-- 3. match 'A' and apply style to C if matched-->
  <B></B> <!-- 2. match 'B' and move to parent if matched-->
  <C></C> <!-- 1. match 'C' and move to previous if matched-->
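The right-to-left walk over that list can be sketched as follows. This is a minimal illustration under stated assumptions, not the style engine's code: elements are hypothetical plain objects built by a made-up el() helper, a compound selector is just a tag name, and only the '>' and '+' combinators are handled (the descendant and '~' combinators would need backtracking, which is omitted).

```javascript
// Hypothetical minimal element model: tag name, parent, previous sibling.
function el(tag, children = []) {
  const n = { tag, parent: null, prev: null, children };
  children.forEach((c, i) => {
    c.parent = n;
    c.prev = i > 0 ? children[i - 1] : null;
  });
  return n;
}

// 'A > B + C' stored right-to-left, as in the list above.
const parsed = [
  { selector: 'C', combinator: '+' },
  { selector: 'B', combinator: '>' },
  { selector: 'A', combinator: null },
];

function matches(element, parts) {
  let current = element;
  for (const { selector, combinator } of parts) {
    if (current === null || current.tag !== selector) return false;
    if (combinator === '+') current = current.prev;        // step to previous sibling
    else if (combinator === '>') current = current.parent; // step to parent
    // combinator === null: leftmost compound, nowhere left to walk
  }
  return true;
}

const b = el('B');
const c = el('C');
const a = el('A', [b, c]);

console.log(matches(c, parsed), matches(b, parsed));
```

Matching always starts at the candidate subject and only ever moves to a previous sibling or a parent, which is why the data structure can stay a flat list.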

:has() allows you to select subjects at any position

With combinators, we can only select downward (descendants, next siblings or descendants of next siblings) from a reference element. But there are many other elements that we can select if the other two relationships, ancestors and previous siblings, are supported.

<div>               <!-- ? -->
  <div></div>         <!-- ? -->
<div>               <!-- ? -->
  <div>               <!-- ? -->
    <div></div>         <!-- ? -->
  <div id=reference>  <!-- #reference -->
    <div></div>         <!-- #reference > div -->
  <div>               <!-- #reference + div -->
    <div></div>         <!-- #reference + div > div -->
<div>               <!-- ? -->
  <div></div>         <!-- ? -->

:has() provides a way to select upward (ancestors, previous siblings, previous siblings of ancestors) from a reference element.

<div>               <!-- div:has(+ div > #reference) -->
  <div></div>         <!-- ? -->
<div>               <!-- div:has(> #reference) -->
  <div>               <!-- div:has(+ #reference) -->
    <div></div>         <!-- ? -->
  <div id=reference>  <!-- #reference -->
    <div></div>         <!-- #reference > div -->
  <div>               <!-- #reference + div -->
    <div></div>         <!-- #reference + div > div -->
<div>               <!-- ? -->
  <div></div>         <!-- ? -->

And with some simple combinations, we can select all elements around the reference element.

<div>               <!-- div:has(+ div > #reference) -->
  <div></div>         <!-- div:has(+ div > #reference) > div -->
<div>               <!-- div:has(> #reference) -->
  <div>               <!-- div:has(+ #reference) -->
    <div></div>         <!-- div:has(+ #reference) > div -->
  <div id=reference>  <!-- #reference -->
    <div></div>         <!-- #reference > div -->
  <div>               <!-- #reference + div -->
    <div></div>         <!-- #reference + div > div -->
<div>               <!-- div:has(> #reference) + div -->
  <div></div>         <!-- div:has(> #reference) + div > div -->

What is the problem with :has() ?

As you might already know, this pseudo class has been delayed for a long time despite the constant interest.

There are many complex situations that make things difficult when we try to support :has().

  • There are many, many complex cases of selector combinations.
  • Those cases are handled in the selector matching operations and style invalidation operations in the style engine.
  • Selector matching and style invalidation operations are critical to performance.
  • The style engine is carefully designed and highly optimized based on the existing two relationships (is a descendant of, is a next sibling of).
  • Each Browser engine has its own design and optimization for those operations.

In this context, :has() provides the other two relationships (is a parent of, is a previous sibling of), and problems and concerns start from this.

When we meet a complex and difficult problem, the first strategy we can take is to break it down into smaller ones. For :has(), we can divide the problems according to the CSS selector profiles.

Problems of the :has() matching operation

The :has() matching operation inherently involves the descendant lookup overhead described previously. This is an unavoidable overhead we have to take on when we want to use :has() functionality.

In some cases, :has() matching can be O(n²) because of duplicated argument matching operations. When we call document.querySelectorAll('A:has(B)') on the DOM <A><A><A><A><A><A><A><A><A><A><B>, there can be unnecessary argument selector matching because the descendant traversal can occur for every element A. If so, the number of argument matching operations can be 55 (=10+9+8+7+6+5+4+3+2+1) without any optimization, whereas 10 is optimal for this case.
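The 55-versus-10 figures can be reproduced with a small counting sketch. It runs on hypothetical plain objects rather than the DOM, and the single-pass strategy shown (test each descendant once, then mark its ancestors) is just one possible way to avoid the duplication, not necessarily what the prototype does.

```javascript
// Build the chain <A><A>...<A><B> with `depth` nested A elements.
function buildChain(depth) {
  let node = { tag: 'B', parent: null, children: [] };
  for (let i = 0; i < depth; i++) {
    const a = { tag: 'A', parent: null, children: [node] };
    node.parent = a;
    node = a;
  }
  return node; // outermost A
}

function descendants(root) {
  const out = [];
  const stack = [...root.children];
  while (stack.length > 0) {
    const n = stack.pop();
    out.push(n);
    stack.push(...n.children);
  }
  return out;
}

// Naive: every A walks its own subtree and tests the argument 'B' each time.
function naiveCount(root) {
  let argumentMatches = 0;
  let subjects = 0;
  for (const candidate of [root, ...descendants(root)]) {
    if (candidate.tag !== 'A') continue;
    let found = false;
    for (const d of descendants(candidate)) {
      argumentMatches++;
      if (d.tag === 'B') found = true;
    }
    if (found) subjects++;
  }
  return { argumentMatches, subjects };
}

// Single pass: test each descendant once, then mark all of its ancestors.
function singlePassCount(root) {
  let argumentMatches = 0;
  const hasB = new Set();
  for (const d of descendants(root)) {
    argumentMatches++;
    if (d.tag !== 'B') continue;
    for (let p = d.parent; p !== null; p = p.parent) hasB.add(p);
  }
  const subjects = [root, ...descendants(root)].filter(
    (n) => n.tag === 'A' && hasB.has(n)
  ).length;
  return { argumentMatches, subjects };
}

const chain = buildChain(10);
console.log(naiveCount(chain), singlePassCount(chain));
```

Both strategies find the same ten subjects; only the number of argument matching operations differs.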

There can be more complex cases involving shadow tree boundary crossing.

Problems of the :has() Style invalidation

In a nutshell, the style engine tries to invalidate styles of elements that are possibly affected by a DOM mutation. It has long been designed and highly optimized based on the assumption that any possibly affected element is the changed element itself or is downward from it.

.mutation .subject { color: red; }
<div>          <!-- classList.toggle('mutation') affect .subject -->
  <div class="subject"></div>       <!-- .subject is in downward -->

But :has() invalidation is different because the possibly affected element is upward of the changed element (an ancestor, rather than a descendant).

.subject:has(.mutation) { color: red; }
<div class="subject">                 <!-- .subject is in upward -->
  <div></div>  <!-- classList.toggle('mutation') affect .subject -->
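The two directions can be contrasted with a small sketch. It is only an illustration on hypothetical plain objects (node, affectedDownward and affectedUpward are made-up helpers), and it ignores everything a real style engine does to narrow down the candidates.

```javascript
function node(classes, children = []) {
  const n = { classes: new Set(classes), parent: null, children };
  for (const c of children) c.parent = n;
  return n;
}

// '.mutation .subject': affected elements are below the changed element.
function affectedDownward(changed, subjectClass) {
  const out = [];
  const stack = [...changed.children];
  while (stack.length > 0) {
    const n = stack.pop();
    if (n.classes.has(subjectClass)) out.push(n);
    stack.push(...n.children);
  }
  return out;
}

// '.subject:has(.mutation)': affected elements are above the changed element.
function affectedUpward(changed, subjectClass) {
  const out = [];
  for (let p = changed.parent; p !== null; p = p.parent) {
    if (p.classes.has(subjectClass)) out.push(p);
  }
  return out;
}

// First snippet: the changed element is the parent of .subject.
const leaf = node(['subject']);
const top = node([], [leaf]);

// Second snippet: the changed element is a child of .subject.
const inner = node([]);
const outer = node(['subject'], [inner]);

console.log(affectedDownward(top, 'subject').length, affectedUpward(inner, 'subject').length);
```

A downward-only search starting at the changed element in the second snippet finds nothing, which is the assumption that :has() breaks.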

In some cases, a change can affect elements in both the upward and downward directions.

.subject1:has(:is(.mutation1 .something)) { color: red; }
.something:has(.mutation2) .subject2 { color: red; }
<div class="subject1">              <!-- .subject1 is in upward -->
  <div>     <!-- classList.toggle('mutation1') affect .subject1 -->
    <div class="subject1">        <!-- .subject1 is in downward -->
      <div class="something"></div>
<div class="something">
  <div class="subject2">            <!-- .subject2 is in upward -->
    <div>   <!-- classList.toggle('mutation2') affect .subject2 -->
      <div class="subject2"></div><!-- .subject2 is in downward -->

Actually, a change can affect everywhere.

:has(~ .mutation) .subject { color: red; }
:has(.mutation) ~ .subject { color: red; }
    <div class="subject">        <!-- not in upward or downward -->
  <div></div> <!-- classList.toggle('mutation') affect .subject -->
<div class="subject"></div>      <!-- not in upward or downward -->

The expansion of the invalidation traversal scope (from the downward sub-tree to the entire tree) can cause performance degradation. And the violation of the basic assumptions of the invalidation logic (finding a subject from the entire tree instead of finding it from downward) can cause performance degradation and can increase implementation complexity or maintenance overhead, because it will be hard or impossible for the existing invalidation logic to support :has() invalidation as it is.

(There are many more details about :has() invalidation, and those will be covered later.)

What is the current status of :has() ?

Thanks to funding from eye/o, the :has() prototyping in the Chromium project was started by Igalia after some investigations.

(You can get rich background about this from the post “Can I :has()?” by Brian Kardell.)

Prototyping is still underway, but here is our progress so far.

  • Chromium
    • Landed CLs to support :has() selector matching (3 CLs)
    • Bug fix (2 CLs)
    • Add experimental feature flag for :has() in snapshot profile (1 CL)
  • WPT (web platform test)
    • Add tests (3 Pull requests)
  • CSS working group drafts

:has() in snapshot profile

As for :has() in the snapshot profile: as of now, Chrome Dev (version 94, released on Aug 19) supports all the :has() functionality except some cases involving shadow tree boundary crossing.

You can try :has() with the JavaScript APIs (querySelectorAll, querySelector, matches, closest) in the snapshot profile after enabling the runtime flag: enable-experimental-web-platform-features.
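Since support depends on a flag, a script may want to check for it before relying on it. The sketch below is one possible way to do so, based on the fact that the selector APIs throw a SyntaxError for a selector they cannot parse. The function name supportsHas and the fake document objects are made up for this example.

```javascript
// Returns true if `doc.querySelector` accepts a :has() selector.
// `doc` is expected to be a Document; anything else counts as "unsupported".
function supportsHas(doc) {
  if (!doc || typeof doc.querySelector !== 'function') return false;
  try {
    doc.querySelector(':has(*)');
    return true;
  } catch (e) {
    // Unsupported selectors make the selector APIs throw.
    return false;
  }
}

// Fake documents so the sketch also runs outside a browser.
const withoutHas = { querySelector() { throw new SyntaxError('not a valid selector'); } };
const withHas = { querySelector() { return null; } };

console.log(supportsHas(withoutHas), supportsHas(withHas));

// In a page: supportsHas(document)
```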


You can also enable it with the command-line flag: CSSPseudoHasInSnapshotProfile.

$ google-chrome-unstable \

:has() in both (snapshot/live) profile

You can enable :has() in both profiles with the command-line flag: CSSPseudoHas.

$ google-chrome-unstable --enable-blink-features=CSSPseudoHas

Support for :has() in the live profile is still in progress. When you enable :has() with this flag, you can see that style rules with :has() are working only at loading time. The style will not be recalculated after DOM changes.


by Byungwoo's Blog at September 02, 2021 03:00 PM

August 31, 2021

Juan A. Suárez

Implementing Performance Counters in V3D driver

Let me talk here about how we implemented the support for performance counters in the Mesa V3D driver, the OpenGL driver used by the Raspberry Pi 4. For reference, the implementation is very similar to the one already available (not done by me, by the way) for VC4, the OpenGL driver for the Raspberry Pi 3 and prior devices, which is also part of Mesa. If you are already familiar with how this is implemented in VC4, then this will mostly be a refresher.

First of all, what are these performance counters? Most processors nowadays contain some hardware facilities to get measurements about what is happening inside the processor. And of course graphics processors aren’t different. In this case, the graphics chips used by Raspberry Pi devices (manufactured by Broadcom) can record a bunch of different graphics-related parameters: how many quads are passing or failing depth/stencil tests, how many clock cycles are spent on doing vertex/fragment shading, hits/misses in the GPU cache, and many other values. In fact, with the V3D driver it is possible to measure around 87 different parameters, and up to 32 of them simultaneously. Quite a few fewer in VC4, though. But still a lot.

On a hardware level, using these counters is just a matter of writing and reading some GPU registers. First, write the registers to select what we want to measure, then a few more to start to measure, and finally read other registers containing the results. But of course, much like we don’t expect users to write GPU assembly code, we don’t expect users to write registers in the GPU directly. Moreover, even the Mesa drivers such as V3D can’t interact directly with the hardware; rather, this is done through the kernel, the one that can use the hardware directly, through the DRM subsystem in the kernel. For the case of V3D (and same applies to VC4, and in general to any other driver), we have a driver in user-space (whether the OpenGL driver, V3D, or the Vulkan driver, V3DV), and a kernel driver in the kernel-space, unsurprisingly also called V3D. The user-space driver is in charge of translating all the commands and options created with the OpenGL API or other API to batches of commands to be executed by the GPU, which are submitted to the kernel driver as DRM jobs. The kernel does the proper actions to send these to the GPU to execute them, including touching the proper registers. Thus, if we want to implement support for the performance counters, we need to modify the code in two places: the kernel and the (user-space) driver.

Implementation in the kernel

Here we need to think about how to deal with the GPU and the registers to make the performance counters work, as well as the API we provide to user-space to use them. As mentioned before, the approach we are following here is the same as the one used in the VC4 driver: performance counter monitors. That is, the user-space driver creates one or more monitors, specifying for each monitor what counters it is interested in (up to 32 simultaneously, the hardware limit). The kernel returns a unique identifier for each monitor, which can be used later to do the measurement, query the results, and finally destroy it when done.

In this case, there isn’t an explicit start/stop of the measurement. Rather, every time the driver wants to measure a job, it includes the identifier of the monitor it wants to use for that job, if any. Before submitting a job to the GPU, the kernel checks if the job has a monitor identifier attached. If so, then it needs to check if the previous job executed by the GPU was also using the same monitor identifier, in which case it doesn’t need to do anything other than send the job to the GPU, as the performance counters required are already enabled. If the monitor is different, then it needs first to read the current counter values (through the proper GPU registers), add them to the current monitor, stop the measurement, configure the counters for the new monitor, start the measurement again, and finally submit the new job to the GPU. In this process, if it turns out there wasn’t a monitor under execution before, then it only needs to execute the last steps.
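The switching logic described above can be sketched as a small simulation. This is not the kernel code: the GPU is a fake object that only logs what it is asked to do, all names (createFakeGpu, createScheduler, submit) are hypothetical, and the handling of a job with no monitor (the active monitor is stopped first) is an assumption made here so that unrelated jobs are not counted.

```javascript
// Fake GPU that records operations instead of touching registers.
function createFakeGpu() {
  const log = [];
  return {
    log,
    configure(counters) { log.push(`configure:${counters.join(',')}`); },
    start() { log.push('start'); },
    stop() { log.push('stop'); },
    readCounters() { log.push('read'); return [1, 1]; }, // made-up values
    run(job) { log.push(`run:${job.name}`); },
  };
}

function createScheduler(gpu) {
  let active = null; // monitor whose counters are currently enabled, if any

  return function submit(job) {
    const next = job.monitor || null;
    if (next !== active) {
      if (active !== null) {
        // Add the values read so far to the outgoing monitor, then stop.
        gpu.readCounters().forEach((v, i) => { active.values[i] += v; });
        gpu.stop();
        active = null;
      }
      if (next !== null) {
        gpu.configure(next.counters);
        gpu.start();
        active = next;
      }
    }
    gpu.run(job);
  };
}

const gpu = createFakeGpu();
const submit = createScheduler(gpu);
const m1 = { counters: ['a', 'b'], values: [0, 0] };
const m2 = { counters: ['c', 'd'], values: [0, 0] };

submit({ name: 'j1', monitor: m1 });
submit({ name: 'j2', monitor: m1 }); // same monitor: nothing to reconfigure
submit({ name: 'j3', monitor: m2 }); // different monitor: read, stop, configure, start

console.log(gpu.log.join(' '));
```

Keeping the accumulated values inside each monitor object is what prevents the counters of different applications from mixing.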

The reason to do all this is that multiple applications can be executing at the same time, some using (different) performance counters, and most of them probably not using performance counters at all. But the performance counter values of one application shouldn’t affect any other application so we need to make sure we don’t mix up the counters between applications. Keeping the values in their respective monitors helps to accomplish this. There is still a small requirement in the user-space driver to help with accomplishing this, but in general, this is how we avoid the mixing.

If you want to take a look at the full implementation, it is available in a single commit.

Implementation in the driver

Once we have a way to create and manage the monitors, using them in the driver is quite easy: as mentioned before, we only need to create a monitor with the counters we are interested in and attach it to the job to be submitted to the kernel. In order to make things easier, we keep a mirror-like version of the monitor inside the driver.

This approach is adequate when you are developing the driver, and you can add code directly to it to check performance. But what about the final user, who is writing an OpenGL application and wants to check how to improve its performance, or find any bottleneck in it? We want the user to have a way to use OpenGL for this.

Fortunately, there is in fact a way to do this through OpenGL: the GL_AMD_performance_monitor extension. This OpenGL extension provides an API to query what counters the hardware supports, to create monitors, to start and stop them, and to retrieve the values. It looks very similar to what we have described so far, except for an important difference: the user needs to start and stop the monitors explicitly. We will explain later why this is necessary. But the key point here is that when we start a monitor, this means that from that moment on, until stopping it, any job created and submitted to the kernel will have the identifier of that monitor attached. This implies that only one monitor can be enabled in the application at the same time. But this isn’t a problem, as this restriction is part of the extension.

Our driver does not implement this API directly, but through “queries”, which are used then by the Gallium subsystem in Mesa to implement the extension. For reference, the V3D driver (as well as the VC4) is implemented as part of the Gallium subsystem. The Gallium part basically handles all the hardware-independent OpenGL functionality, and just requires the driver hook functions to be implemented by the driver. If the driver implements the proper functions, then Gallium exposes the right extension (in this case, the GL_AMD_performance_monitor extension).

For our case, it requires the driver to implement functions to return which counters are available, to create or destroy a query (in this case, the query is the same as the monitor), start and stop the query, and once it is finished, to get the results back.

At this point, I would like to explain a bit better what it implies to stop the monitor and get the results back. As explained earlier, stopping the monitor or query means that from that moment on, any new job submitted to the kernel (and thus to the GPU) won’t contain a performance monitor identifier attached, and hence won’t be measured. But it is important to know that the driver submits jobs to the kernel at its own pace, and these aren’t executed immediately; the GPU needs time to execute the jobs, and so the kernel puts the arriving jobs in a queue, to be submitted to the GPU. This means that when the user stops the monitor, there could still be jobs in the queue that haven’t been executed yet and are thus pending to be measured.

And how do we know that the jobs have been executed by the GPU? The hook function that implements getting the query results has a “wait” parameter, which tells whether the function needs to wait for all the pending jobs to be executed (and thus measured) or not. If it doesn’t need to wait but there are pending jobs, then it just returns, telling the caller this fact. This allows the caller to do other work in the meantime and query again later, instead of blocking until all the jobs have been executed. This is implemented through sync objects. Every time a job is sent to the kernel, there’s a sync object (a fence) that is used to signal when the job has finished executing. This is mainly used to have a way to synchronize the jobs. In our case, when the user finalizes the query we save the fence for the last submitted job, and we use it to know when this last job has been executed.
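The behaviour of the “wait” parameter can be sketched as follows. This is only a model of the idea, not the Gallium hook itself: the fence is a hypothetical object whose wait() simply pretends the job has finished, and getQueryResult, lastJobFence and readValues are names invented for this example.

```javascript
// Hypothetical fence: `signaled` becomes true once the GPU has finished the job.
function createFence() {
  return {
    signaled: false,
    signal() { this.signaled = true; },
    // Stand-in for a blocking wait; a real one would sleep until signalled.
    wait() { this.signaled = true; },
  };
}

function getQueryResult(query, wait) {
  const fence = query.lastJobFence;
  if (!fence.signaled) {
    // Jobs are still pending: either report that, or block until they finish.
    if (!wait) return { ready: false, values: null };
    fence.wait();
  }
  return { ready: true, values: query.readValues() };
}

const query = {
  lastJobFence: createFence(), // saved when the query was ended
  readValues() { return [42]; }, // made-up counter value
};

const early = getQueryResult(query, false); // does not block
const final = getQueryResult(query, true);  // waits for the last job

console.log(early.ready, final.ready, final.values);
```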

There are quite a few details I’m not covering here. If you are interested though, you can take a look at the merge request.

Gallium HUD

So far we have seen how the performance counters are implemented, and how to use them. In all the cases it requires writing code to create the monitor/query, start/stop it, and query back the results, either in the driver itself or in the application through the GL_AMD_performance_monitor extension [1].

But what if we want to get some general measurements without adding code to the application or the driver? Fortunately, there is an environment variable, GALLIUM_HUD, that, when set correctly, will show some graphs with the measured counters on top of the application.

Using it is very easy: set it to help to learn how to use it, as well as to get a list of the available counters for the current hardware.

As an example:

$ env GALLIUM_HUD=L2T-CLE-reads,TLB-quads-passing-z-and-stencil-test,QPU-total-active-clk-cycles-vertex-coord-shading scorched3d

You will see:

Performance Counters in Scorched 3D

Bear in mind that to be able to use this you will need a kernel that supports performance counters for V3D. At the moment of writing this, no kernel has been released yet with this support. If you don’t want to wait for it, you can download the patch, apply it to your Raspberry Pi kernel (it has been tested in the 5.12 branch), build and install it.

  1. All this is for the case of using OpenGL; if your application uses Vulkan, there are other similar extensions, which are not yet implemented in our V3DV driver at the moment of writing this post. 

August 31, 2021 10:00 PM

August 27, 2021

Qiuyi Zhang (Joyee)

Building V8 on an M1 MacBook

I’ve recently got an M1 MacBook and played around with it a bit. It seems many open source projects still haven’t added MacOS with ARM64

August 27, 2021 03:52 PM

On deps/v8 in Node.js

I recently ran into a V8 test failure that only showed up in the V8 fork of Node.js but not in the upstream. Here I’ll write down my

August 27, 2021 03:52 PM

Tips and Tricks for Node.js Core Development and Debugging

I thought about writing some guides on this topic in the nodejs/node repo, but it’s easier to throw whatever tricks I personally use on

August 27, 2021 03:52 PM

August 20, 2021

Brian Kardell

Experimenting with :has()

Back in May, I wrote Can I :has()?. In that piece, I discussed the :has() pseudo-class and the practical reasons it's been hard to advance. Today I'll give you some updates on advancing :has() efforts in Chromium, and how you can play with it today.

In my previous piece I explained that Igalia had been working to help move these discussions along by doing the research that has been difficult for vendors to prioritize (funded by eyeo) and that we believe that we'd gotten somewhere: We'd done a lot of research, developed a prototype in a custom build of Chromium and had provided what we believed were good proofs for discussion. The day that I wrote that last piece, we were filing an intent to prototype in Chromium.

Today, I'd like to give some updates on those efforts...

Where things stand in Chromium, as of yesterday

As you may or may not know, the process for shipping new features in Chromium is pretty involved and careful. There are several 'intent' steps, many reviews along the way, and many channels (canary, dev, beta, stable). Atop this are also things which launch with command line flags, runtime feature flags, origin trials and finch flags.

Effectively, things get more serious and certain, and as that happens we want to expand the reach of these things by making it easier for more developers to experiment with it.


For a while now our up-streaming efforts have allowed you to pass command line flags to enable some support in early channels. Either


The former adds support for the use of the :has() pseudo class in the JavaScript selector APIs ('the snapshot/static profile'), and the latter enables support in CSS stylesheets too.

These ways still work, but it's obviously a lot more friction than most developers will take the time to learn, figure out, and try. Most of us don't launch from a command line.

New Advancements!

As things have gotten more stable and serious, we're moving along and making some things easier...

As of the dev channel release 94.0.4606.12 (yesterday), enabling support in the JavaScript selector APIs is now as simple as enabling the experimental web platform features runtime flag. Chances are, a number of readers already have this flag flipped, so low friction indeed!

Support in the JavaScript APIs has always involved far fewer unknowns and challenges, but what's held us from adding support there first has always been a desire to prevent splitting and a lack of ability to answer questions about whether the main, live CSS profile could be supported, what limits it would need and so on. We feel like we have a much better grip on many of these questions now and so things are moving along a bit.

We hope that this encourages more people to try it out and provide feedback, open bugs, or just add encouragement. Let us know if you do!

Much more at Ad Blocker Dev Summit 2021

I'm also happy to note that I'll be speaking, along with my colleague Byungwoo Lee and eyeo's @shwetank and @WebReflection at Ad Blocker Dev Summit 2021 on October 21. Looking forward to being able to provide a lot more information there on the history, technical challenges, process, use cases and impacts! Hope to see you there!

August 20, 2021 04:00 AM

August 13, 2021

Qiuyi Zhang (Joyee)

My 2019

It’s that time of the year again! I did not manage to write a recap about my 2018, so I’ll include some reflection about that year in

August 13, 2021 06:21 PM

Uncaught exceptions in Node.js

In this post, I’ll jot down some notes that I took when refactoring the uncaught exception handling routines in Node.js. Hopefully it

August 13, 2021 06:21 PM

My 2017

I decided to write a recap of my 2017 because looking back, it was a very important year to me.

August 13, 2021 06:21 PM

New Blog

I’ve been thinking about starting a new blog for a while now. So here it is.

Not sure if I am going to write about tech here.

August 13, 2021 06:21 PM

August 11, 2021

Danylo Piliaiev

Testing Vulkan drivers with games that cannot run on the target device

Here I’m playing “Spelunky 2” on my laptop and simultaneously replaying the same Vulkan calls on an ARM board with Adreno GPU running the open source Turnip Vulkan driver. Hint: it’s an x64 Windows game that doesn’t run on ARM.

The bottom right is the game I’m playing on my laptop, the top left is GFXReconstruct immediately replaying Vulkan calls from the game on ARM board.

How is it done? And why would it be useful for debugging? Read below!

Debugging issues a driver faces with real-world applications requires the ability to capture and replay graphics API calls. However, for mobile GPUs it becomes even more challenging, since for a Vulkan driver the main “source” of real-world workloads is x86-64 apps that run via Wine + DXVK, mainly games which were made for desktop x86-64 Windows and do not run on ARM. Efforts are being made to run these apps on ARM but it is still work-in-progress. And we want to test the drivers NOW.

The obvious solution would be to run those applications on an x86-64 machine, capturing all Vulkan calls, and then replay those calls on a second machine where we cannot run the app. This way it would be possible to test the driver even without running the application directly on it.

The main trouble is that Vulkan calls made on one GPU + driver combo are not generally compatible with another GPU + driver combo, sometimes even for one GPU vendor. There are different memory capabilities (VkPhysicalDeviceMemoryProperties), different memory requirements for buffers and images, different extensions available, and different optional features supported. It is easier with OpenGL but there are also some incompatibilities there.

There are two open-source vendor-agnostic tools for capturing Vulkan calls: RenderDoc (captures single frame) and GFXReconstruct (captures multiple frames). RenderDoc at the moment isn’t suitable for the task of capturing applications on desktop GPUs and replaying on mobile because it doesn’t translate memory type and requirements (see issue #814). GFXReconstruct on the other hand has the necessary features for this.

I’ll show a couple of tricks with GFXReconstruct I’m using to test things on Turnip.

Capturing with GFXReconstruct

At this point you either have the application itself or, if it doesn’t use Vulkan, a trace of its calls that could be translated to Vulkan. There are detailed instructions on how to use GFXReconstruct to capture a trace on a desktop OS. However, there are no clear instructions on how to do this on Android (see issue #534). Fortunately, there is a guide in Android’s documentation:

Android how-to
For Android 9 you should copy layers to the application which will be traced
For Android 10+ it's easier to copy them to com.lunarg.gfxreconstruct.replay
You should have userdebug build of Android or probably rooted Android

# Push GFXReconstruct layer to the device
adb push /sdcard/

# Since there is no APK for capture layer,
# copy the layer to e.g. folder of com.lunarg.gfxreconstruct.replay
adb shell run-as com.lunarg.gfxreconstruct.replay cp /sdcard/ .

# Enable layers
adb shell settings put global enable_gpu_debug_layers 1

# Specify target application
adb shell settings put global gpu_debug_app <package_name>

# Specify layer list (from top to bottom)
adb shell settings put global gpu_debug_layers VK_LAYER_LUNARG_gfxreconstruct

# Specify packages to search for layers
adb shell settings put global gpu_debug_layer_app com.lunarg.gfxreconstruct.replay

If the target application doesn’t have rights to write into external storage, you should change where the capture file is created:

adb shell "setprop debug.gfxrecon.capture_file '/data/data/<target_app_folder>/files/'"

However, when trying to replay the trace you captured on another GPU, it will most likely result in an error:

[gfxrecon] FATAL - API call vkCreateDevice returned error value VK_ERROR_EXTENSION_NOT_PRESENT that does not match the result from the capture file: VK_SUCCESS.  Replay cannot continue.
Replay has encountered a fatal error and cannot continue: the specified extension does not exist

Or other errors/crashes. Fortunately, we can limit the capabilities of the desktop GPU with VK_LAYER_LUNARG_device_simulation.

When simulating another GPU, VK_LAYER_LUNARG_device_simulation should be told to intersect the capabilities of both GPUs, making the capture compatible with both of them. This can be achieved with recently added environment variables:


The whitelist name is rather confusing because it essentially means “intersection”.

One would also need to get a JSON file which describes the target GPU capabilities. This can be done by running:
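What “intersection” means here can be illustrated with extension lists. The sketch below is not how the layer is implemented: the two lists are invented for the example (the extension names are real, the device contents are not), and intersectExtensions is a hypothetical helper.

```javascript
// Keep only what both the capture GPU and the target GPU support.
function intersectExtensions(captureGpu, targetGpu) {
  const target = new Set(targetGpu);
  return captureGpu.filter((name) => target.has(name));
}

// Made-up capability lists for a desktop (capture) and a mobile (replay) GPU.
const desktop = ['VK_KHR_swapchain', 'VK_KHR_maintenance1', 'VK_EXT_transform_feedback'];
const mobile = ['VK_KHR_swapchain', 'VK_KHR_maintenance1'];

const usable = intersectExtensions(desktop, mobile);
console.log(usable);
```

An application that only sees the intersected list during capture cannot request something the replay device lacks, which is what avoids errors such as VK_ERROR_EXTENSION_NOT_PRESENT.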

vulkaninfo -j &> <device_name>.json

The final command to capture a trace would be:

VK_LAYER_PATH=<path/to/device-simulation-layer>:<path/to/gfxreconstruct-layer> \
VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_gfxreconstruct:VK_LAYER_LUNARG_device_simulation \
VK_DEVSIM_FILENAME=<device_name>.json \

Replaying with GFXReconstruct

gfxrecon-replay -m rebind --skip-failed-allocations <trace_name>.gfxr
  • -m Enable memory translation for replay on GPUs with memory types that are not compatible with the capture GPU’s
    • rebind Change memory allocation behavior based on resource usage and replay memory properties. Resources may be bound to different allocations with different offsets.
  • --skip-failed-allocations skip vkAllocateMemory, vkAllocateCommandBuffers, and vkAllocateDescriptorSets calls that failed during capture

Without these options replay would fail.

Now you could easily test any app/game on your ARM board, if you have enough RAM =) I even successfully ran a capture of “Metro Exodus” on Turnip.

But what if I want to test something that requires interactivity?

Or you don’t want to save a huge trace on disk, which could grow to tens of gigabytes if the application runs for a considerable amount of time.

During recording, GFXReconstruct just appends calls to a file; there are no additional post-processing steps. Given that, the next logical step is to skip writing to disk and send the Vulkan calls over the network!

This would allow us to interact with the application and immediately see the results on another device with a different GPU. And so I hacked together crude support for over-the-network replay.

The only difference from ordinary tracing is that now, instead of a file, we have to specify the network address of the target device:

VK_LAYER_PATH=<path/to/device-simulation-layer>:<path/to/gfxreconstruct-layer> \

And on the target device:

while true; do gfxrecon-replay -m rebind --sfa ":<port>"; done

Why while true? It is common for DXVK to call vkCreateInstance several times, leading to the creation of several traces. When replaying over the network, we therefore want gfxrecon-replay to restart immediately when one trace ends, so that it is ready for the next one.

You may want to bring the FPS down to match the capabilities of the lower-power GPU in order to prevent constant hiccups. This can be done either with libstrangle or with mangohud:

  • stranglevk -f 10
  • MANGOHUD_CONFIG=fps_limit=10 mangohud

You have seen the result at the start of the post.

by Danylo Piliaiev at August 11, 2021 09:00 PM

August 10, 2021

Iago Toral

An update on feature progress for V3DV

I’ve been silent here for quite some time, so here is a quick summary of some of the new functionality we have been exposing in V3DV, the Vulkan driver for Raspberry Pi 4, over the last few months:

  • VK_KHR_bind_memory2
  • VK_KHR_copy_commands2
  • VK_KHR_dedicated_allocation
  • VK_KHR_descriptor_update_template
  • VK_KHR_device_group
  • VK_KHR_device_group_creation
  • VK_KHR_external_fence
  • VK_KHR_external_fence_capabilities
  • VK_KHR_external_fence_fd
  • VK_KHR_external_semaphore
  • VK_KHR_external_semaphore_capabilities
  • VK_KHR_external_semaphore_fd
  • VK_KHR_get_display_properties2
  • VK_KHR_get_memory_requirements2
  • VK_KHR_get_surface_capabilities2
  • VK_KHR_image_format_list
  • VK_KHR_incremental_present
  • VK_KHR_maintenance2
  • VK_KHR_maintenance3
  • VK_KHR_multiview
  • VK_KHR_relaxed_block_layout
  • VK_KHR_sampler_mirror_clamp_to_edge
  • VK_KHR_storage_buffer_storage_class
  • VK_KHR_uniform_buffer_standard_layout
  • VK_KHR_variable_pointers
  • VK_EXT_custom_border_color
  • VK_EXT_external_memory_dma_buf
  • VK_EXT_index_type_uint8
  • VK_EXT_physical_device_drm

Besides that list of extensions, we have also added basic support for Vulkan subgroups (this is a Vulkan 1.1 feature) and Geometry Shaders (we use this to implement multiview).

I think we now meet most (if not all) of the Vulkan 1.1 mandatory feature requirements, but we still need to check this properly and we also need to start doing Vulkan 1.1 CTS runs and fix test failures. In any case, the bottom line is that Vulkan 1.1 should be fairly close now.

by Iago Toral at August 10, 2021 08:10 AM

August 07, 2021

Enrique Ocaña

Beyond Google Bookmarks

I was a happy user of Delicious for many years until the service closed. Then I moved my links to Google Bookmarks, which offered basically the same functionality (at least for my needs): link storage with title, tags and comments. I’ve carefully tagged and filed more than 2500 links since I started, and I’ve learnt to appreciate the usefulness of searching by tag to find again some precious information that was valuable to me in the past.

Google Bookmarks is a very old and simple service that “just works”. Sometimes it looked as if Google had just forgotten about it and let it run for years without anybody noticing… until now. It’s closing in September 2021.

I didn’t want to lose all my links, still need a link database searchable by tags and don’t want to be locked-in again in a similar service that might close in some years, so I wrote my own super-simple alternative to it. It’s called bs, sort of bookmark search.

The usage can’t be simpler, just add the tag you want to look for and it will print a list of links that have that tag:

$ bs webassembly
  title = Canvas filled three ways: JS, WebAssembly and WebGL | Compile 
    url = 
   tags = canvas,graphics,html5,wasm,webassembly,webgl 
   date = 2020-02-18 16:48:56 
comment =  
  title = Compiling to WebAssembly: It’s Happening! ★ Mozilla Hacks – the Web developer blog 
    url = 
   tags = asm.js,asmjs,emscripten,llvm,toolchain,web,webassembly 
   date = 2015-12-18 09:14:35 
comment = 

If you call the tool without parameters, it will prompt for the data needed to insert a new link, or to edit it if the entered URL matches a preexisting one:

$ bs 
title: Canvas filled three ways: JS, WebAssembly and WebGL | Compile 
tags: canvas,graphics,html5,wasm,webassembly,webgl 

The data is stored in an SQLite database and I’ve written some JavaScript snippets to import the Delicious exported bookmarks file and the Google Bookmarks exported bookmarks file. Those snippets are meant to be copy-pasted into the JavaScript console of your browser while you have the exported bookmarks HTML file open in it. They’ll generate SQL statements that will populate the database for the first time with your preexisting data.
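For readers curious about what a tag-searchable SQLite store can look like, here is a small Python sketch. It is not the actual bs code: the table layout, the function names and the example.org URLs are assumptions made for this example. It only shows the two behaviours described above, searching by exact tag and overwriting an entry when the URL already exists.

```python
import sqlite3


def open_db(path=":memory:"):
    # Hypothetical schema: one row per link, tags kept as a comma-separated string.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        " url TEXT PRIMARY KEY, title TEXT, tags TEXT, date TEXT, comment TEXT)"
    )
    return db


def save_link(db, url, title, tags, date, comment=""):
    # Insert a new link, or replace the entry that already has this URL.
    db.execute(
        "INSERT OR REPLACE INTO links (url, title, tags, date, comment)"
        " VALUES (?, ?, ?, ?, ?)",
        (url, title, ",".join(tags), date, comment),
    )
    db.commit()


def search_by_tag(db, tag):
    # Wrap both sides in commas so "web" does not match "webassembly".
    rows = db.execute(
        "SELECT title FROM links"
        " WHERE instr(',' || tags || ',', ',' || ? || ',') > 0"
        " ORDER BY date DESC",
        (tag,),
    )
    return [title for (title,) in rows]


db = open_db()
save_link(db, "https://example.org/canvas", "Canvas filled three ways",
          ["canvas", "wasm", "webassembly", "webgl"], "2020-02-18 16:48:56")
save_link(db, "https://example.org/compiling", "Compiling to WebAssembly",
          ["asmjs", "web", "webassembly"], "2015-12-18 09:14:35")
print(search_by_tag(db, "webassembly"))
```

Keeping the tags in a single column is the simplest thing that works for a personal tool; a separate tags table would be the more conventional choice if the database grew much larger.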

For now the tool doesn’t allow deleting bookmarks (I haven’t had the need yet) and I still need to find a way to simplify its usage through the browser with a bookmarklet to ease adding new bookmarks automatically. But that’s a task for another day. For now, it’s enough to know that my bookmarks are safe.


[UPDATE: 2020-09-08]

I’ve now coded an alternate variant of the database client that can be hosted on any web server with PHP and SQLite3. The bookmarks can now be managed from a browser in a centralized way, in a similar fashion as you could before with Google Bookmarks and Delicious. As you can see in the screenshot, the style resembles Google Bookmarks in some way.

You can easily create a quick search / search engine link in Firefox and Chrome (I use “d” as keyword, a tradition from the Delicious days, so that if I type “d debug” in the browser search bar it will look for that tag in the bookmark search page). Also, the 🔖 button opens a popup that shows a bookmarklet code that you can add to your browser bookmark bar. When you click on that bookmarklet, the edit page prefilled with the current page info is opened, so you can insert or edit a new entry.

There’s a trick to use the bookmarklet on Android Chrome: Use a rare enough name for the bookmarklet (I used “+ Bookmark 🔖”). Then, when you want to add the current page to the webapp, just start typing “+ book”… in the search bar and the saved bookmarklet link will appear as an autocomplete option. Click on it and that’s it.


by eocanha at August 07, 2021 12:29 PM

August 05, 2021

Chris Lord

OffscreenCanvas update

Hold up, a blog post before a year’s up? I’d best slow down, don’t want to over-strain myself 🙂 So, a year ago, OffscreenCanvas was starting to become usable but was missing some key features, such as asynchronous updates and text-related functions. I’m pleased to say that, at least for Linux, it’s been complete for quite a while now! It’s still going to be a while, I think, before this is a truly usable feature in every browser. Gecko support is still forthcoming, support for non-Linux WebKit is still off by default and I find it can be a little unstable in Chrome… But the potential is huge, and there are now double the number of independent, mostly-complete implementations that prove it’s a workable concept.

Something I find I’m guilty of, and I think that a lot of systems programmers tend to be guilty of, is working on a feature but not using that feature. With that in mind, I’ve been spending some time in the last couple of weeks to try and bring together demos and information on the various features that the WebKit team at Igalia has been working on. To that end, I’ve written a little OffscreenCanvas demo. It should work in any browser, but is a bit pointless if you don’t have OffscreenCanvas, so maybe spin up Chrome or a canary build of Epiphany.

OffscreenCanvas fractal renderer demo, running in GNOME Web Canary

Those of us old-skool computer types probably remember running fractal renderers back on their old home computers, whatever they may have been (PC for me, but I’ve seen similar demos on Amigas, C64s, Amstrad CPCs, etc.) They would take minutes to render a whole screen. Of course, with today’s computing power, they are much faster to render, but they still aren’t cheap by any stretch of the imagination. We’re talking 100s of millions of operations to render a full-HD frame. Running on the CPU on a single thread, this is still something that isn’t really real-time, at least implemented naively in JavaScript. This makes it a nice demonstration of what OffscreenCanvas, and really, Worker threads allow you to do without too much fuss.
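As a rough back-of-the-envelope check of that figure, here is the arithmetic in Python. The iteration count per pixel is an assumption picked for illustration; the real cost depends on the fractal, the zoom level and the iteration limit.

```python
WIDTH, HEIGHT = 1920, 1080      # full-HD frame
ITERATIONS_PER_PIXEL = 100      # assumed average, not measured

pixels = WIDTH * HEIGHT
iterations_per_frame = pixels * ITERATIONS_PER_PIXEL

print(f"{pixels:,} pixels")
print(f"{iterations_per_frame:,} inner-loop iterations per frame")
```

Each inner-loop iteration is itself several multiplications and additions, so even a modest iteration limit lands in the hundreds of millions of operations per frame.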

The demo, for which you can look at my awful code, splits that rendering into 64 tiles and gives each tile to the first available Worker in a pool of rendering threads (different parts of the fractal are much more expensive to render than others, so it makes sense to use a work queue, rather than just shoot them all off distributed evenly amongst however many Workers you’re using). Toggle one of the animation options (palette cycling looks nice) and you’ll get a frame-rate counter in the top-right, where you can see the impact on performance that adding Workers can have. In Chrome, I can hit 60fps on this 40-core Xeon machine, rendering at 1080p. Just using a single worker, I barely reach 1fps (my frame-rates aren’t quite as good in WebKit, I expect because of some extra copying – there are some low-hanging fruit around OffscreenCanvas/ImageBitmap and serialisation when it comes to optimisation). If you don’t have an OffscreenCanvas-capable browser (or a monster PC), I’ve recorded a little demonstration too.
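The scheduling idea is easy to sketch outside the browser. The following Python stand-in is not the demo’s JavaScript: it uses a thread pool in place of Workers, a tiny 64×64 image, and names and constants chosen for this example. What it shares with the demo is the shape of the solution: 64 tiles go into a queue and each idle worker picks up the next one, so expensive tiles don’t leave other workers waiting.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 64          # tiny image so the sketch runs quickly
TILES_PER_SIDE = 8              # 8 x 8 = 64 tiles, as in the demo
MAX_ITER = 50


def mandelbrot_iterations(cx, cy):
    """Number of iterations before the point escapes, capped at MAX_ITER."""
    x = y = 0.0
    for i in range(MAX_ITER):
        if x * x + y * y > 4.0:
            return i
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return MAX_ITER


def render_tile(tile_index):
    """Render one tile and return its pixels as {(px, py): iterations}."""
    tile_w = WIDTH // TILES_PER_SIDE
    tile_h = HEIGHT // TILES_PER_SIDE
    x0 = (tile_index % TILES_PER_SIDE) * tile_w
    y0 = (tile_index // TILES_PER_SIDE) * tile_h
    pixels = {}
    for py in range(y0, y0 + tile_h):
        for px in range(x0, x0 + tile_w):
            cx = -2.0 + 3.0 * px / WIDTH
            cy = -1.5 + 3.0 * py / HEIGHT
            pixels[(px, py)] = mandelbrot_iterations(cx, cy)
    return pixels


def render(worker_count):
    """Queue all tiles; each idle worker takes the next tile from the queue."""
    image = {}
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        for tile_pixels in pool.map(render_tile, range(TILES_PER_SIDE ** 2)):
            image.update(tile_pixels)
    return image


image = render(worker_count=4)
print(len(image), "pixels rendered")
```

Note that CPython threads won’t give the speed-up that Workers do for CPU-bound code, so this only illustrates how the tiles are handed out, not the performance gain.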

The important thing in this demo is not so much that we can render fractals fast (this is probably much, much faster to do using WebGL and shaders), but how easy it is to massively speed up a naive implementation with relatively little thought. Google Maps is great, but even on this machine I can get it to occasionally chug and hitch – OffscreenCanvas would allow this to be entirely fluid with no hitches. This becomes even more important on less powerful machines. It’s a neat technology and one I’m pleased to have had the opportunity to work on. I look forward to seeing it used in the wild in the future.

by Chris Lord at August 05, 2021 03:33 PM

August 02, 2021

Philippe Normand

Introducing the GNOME Web Canary flavor

Today I am happy to unveil GNOME Web Canary which aims to provide bleeding edge, most likely very unstable builds of Epiphany, depending on daily builds of the WebKitGTK development version. Read on to know more about this.

Until recently the GNOME Web browser was available for end-users in two flavors. The primary, stable release provides the vanilla experience of the upstream Web browser. It is shipped as part of the GNOME release cycle and in distros. The second flavor, called Tech Preview, is oriented towards early testers of GNOME Web. It is available as a Flatpak, included in the GNOME nightly repo. The builds represent the current state of the GNOME Web master branch; the WebKitGTK version it links to is the one provided by the GNOME nightly runtime.

Tech Preview is great for users testing the latest development of GNOME Web, but what if you want to test features that are not yet shipped in any WebKitGTK version? Or what if you are a GNOME Web developer and you want to implement new features in Web that depend on an API that has not yet been released in WebKitGTK?

Historically, the answer was simply “you can build WebKitGTK yourself“. However, this requires some knowledge and a good build machine (or a lot of patience). Even as WebKit developer builds have become easier to produce thanks to the Flatpak SDK we provide, you would still need to somehow make Epiphany detect your local build of WebKit. Other browsers offer nightly or “Canary” builds which don’t have such requirements. This is exactly what Epiphany Canary aims to do! Without building WebKit yourself!

A brief interlude about the term: Canary typically refers to highly unstable builds of a project. They are named after a sentinel species: canary birds were taken into mines to warn coal miners of the presence of carbon monoxide. For instance, Chrome has been providing Canary builds of its browser for a long time. These builds are useful because they allow early testing by end-users, and hence potentially early detection of bugs that might not have been detected by the usual automated test harness that buildbots and CI systems run.

To similar ends, a new build profile and icon were added in Epiphany, along with a new Flatpak manifest. Everything is now nicely integrated in the Epiphany project CI. WebKit builds are already done for every upstream commit using the WebKit Buildbot. As those builds are made with the WebKit Flatpak SDK, they can be reused elsewhere (x86_64 is the only arch supported for now) as long as the WebKit Flatpak platform runtime is being used as well. Build artifacts are saved, compressed, and uploaded to a web server kindly hosted and provided by Igalia. The GNOME Web CI now has a new job, called canary, that generates a build manifest that installs WebKitGTK build artifacts in the build sandbox, that can be detected during the Epiphany Flatpak build. The resulting Flatpak bundle can be downloaded and locally installed. The runtime environment is the one provided by the WebKit SDK though, so not exactly the same as the one provided by GNOME Nightly.

Back to the two main use-cases, and who would want to use this:

  • You are a GNOME Web developer looking for CI coverage of some shiny new WebKitGTK API you want to use from GNOME Web. Every new merge request on the GNOME Web Gitlab repo now produces installable Canary bundles, that can be used to test the code changes being submitted for review. This bundle is not automatically updated though, it’s good only for one-off testing.
  • You are an early tester of GNOME Web, looking for the bleeding-edge versions of both GNOME Web and WebKitGTK. You can install Canary using the provided Flatpakref. Every commit on the GNOME Web master branch produces an update of Canary, which users can get through the usual flatpak update or through their flatpak-enabled app-store.


Due to an issue in the Flatpakref file, the WebKit SDK flatpak remote is not automatically added during the installation of GNOME Web Canary. So it needs to be manually added before attempting to install the flatpakref:

$ flatpak --user remote-add --if-not-exists webkit
$ flatpak --user install

As you can see in the screenshot below, the GNOME Web branding is clearly modified compared to the other flavors of the application. The updated logo, kindly provided by Tobias Bernard, has some yellow tones and the Tech Preview stripes. Also the careful reader will notice the reported WebKitGTK version in the screenshot is a development build of SVN revision r280382. Users are strongly advised to add this information to bug reports.

As WebKit developers we are always interested in getting users’ feedback. I hope this new flavor of GNOME Web will be useful for both GNOME and WebKitGTK communities. Many thanks to Igalia for sponsoring WebKitGTK build artifacts hosting and some of the work time I spent on this side project. Also thanks to Michael Catanzaro, Alexander Mikhaylenko and Jordan Petridis for the reviews in Gitlab.

by Philippe Normand at August 02, 2021 05:15 PM

July 22, 2021

Mario Sanchez Prada

Igalia and the Chromium project

A couple of months ago I had the pleasure of speaking at the 43rd International Conference on Software Engineering (aka ICSE 2021), in the context of its “Spanish Industry Case Studies” track. We were invited to give a high level overview of the Chromium project and how Igalia contributes to it upstream.

This was an unusual chance to speak at a forum other than the usual conferences I attend, so I welcomed this as a double opportunity to explain the project to people less familiar with Chromium than those attending events such as BlinkOn or the Web Engines Hackfest, as well as to spread some awareness of our work there.

Contributing to Chromium is something we’ve been doing for quite a few years already, but I think it’s fair to say that in the past 2-3 years we have intensified our contributions to the project even more and diversified the areas that we contribute to, something I’ve tried to reflect in this talk in no more than 25 minutes (quite a challenge!). Actually, it’s precisely because of this amount of contributions that we’re currently the 2nd biggest non-Google contributor to the project in number of commits, and among the Top 5 contributors by team size (see a highlight on this from BlinkOn 14’s keynote). For a small consultancy company such as ours, it’s certainly something to feel proud of.

With all this in mind, I organized the talk into 2 main parts: First a general introduction to the Chromium project and then a summary of the main upstream work that we at Igalia have contributed recently to it. I focused on the past year and a half, since that seemed like a good balance that allowed me to highlight the most important bits without adding too much information. And from what I can tell based on the feedback received so far, it seems the end result has been helpful and useful for some people without prior knowledge to understand things such as the differences between Chromium and Chrome, what ChromiumOS is and how our work on several different fronts (e.g. CSS, Accessibility, Ozone/X11/Wayland, MathML, Interoperability…) fits into the picture.

Obviously, the more technically inclined you are, and the more you know about the project, the more you’ll understand the different bits of information condensed into this talk, but my main point here is that you shouldn’t need any of that to be able to follow it, or at least that was my intention (but please let me know in the comments if you have any feedback). Here you have it:

You can watch the talk online (24:05 min) on our YouTube channel, as well as grab the original slide deck as a PDF in case you also want it for references, or to check the many links I included with pointers for further information and also for reference to the different sources used.

Last, I don’t want to finish this post without once again thanking the organizers for the invitation and for running the event, and in particular Andrés-Leonardo Martínez-Ortiz and Javier Provecho for taking care of the specific details involved with the “Spanish Industry Case Studies” track.

Thank you all

by mario at July 22, 2021 02:16 PM

July 21, 2021

Ricardo García

Debugging shaders in Vulkan using printf

Debugging programs using printf statements is not a technique that everybody appreciates. However, it can be quite useful and sometimes necessary depending on the situation. My past work on air traffic control software involved using several forms of printf debugging many times. The distributed and time-sensitive nature of the system being studied made it inconvenient or simply impossible to reproduce some issues and situations if one of the processes was stalled while it was being debugged.

In the context of Vulkan and graphics in general, printf debugging can be useful to see what shader programs are doing, but some people may not be aware it’s possible to “print” values from shaders. In Vulkan, shader programs are normally created in a high level language like GLSL or HLSL and then compiled to SPIR-V, which is then passed down to the driver and compiled to the GPU’s native instruction set. That final binary, many times outside the control of user applications, runs in a quite closed and highly parallel environment without many options to observe what’s happening and without text input and output facilities. Fortunately, tools like glslang can generate some debug information when compiling shaders to SPIR-V and other tools like Nsight can use that information to let you debug shaders being run.

Still, being able to print the values of different expressions inside a shader can be an easy way to debug issues. With the arrival of Ray Tracing, this is even more useful than before. In ray tracing pipelines, the shaders being executed and resources being used are chosen based on the scene geometry, the origin and the direction of the ray being traced. printf debugging can let you see where you are and what you’re using. So how do you print values from shaders?

Vulkan’s debug printf is implemented as part of the Validation Layers and the general procedure is well documented. If you were to implement this kind of mechanism yourself, you’d likely use a storage buffer to save the different values you want to print while shader invocations are running and, later, you’d go over the contents of that buffer and print the associated message with each value or values. And that is, essentially, what debug printf does but in a very convenient and automated way so that you don’t have to deal with the gory details and corner cases.
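The do-it-yourself variant described above can be sketched in a few lines of Python. This is only a CPU-side simulation of the idea, not how the Validation Layers implement it: the list stands in for a storage buffer, and the function names and the format string are invented for this example.

```python
# Format strings stay on the host; the "shader" only records an id and raw values.
FORMATS = {0: "pixel (%d, %d) depth = %f"}


def shader_invocation(buffer, x, y):
    """Stand-in for one shader invocation writing a record to a storage buffer."""
    depth = (x + y) / 10.0
    buffer.append((0, (x, y, depth)))


def flush(buffer):
    """Host side: after the work finishes, turn the recorded values into messages."""
    lines = [FORMATS[format_id] % values for format_id, values in buffer]
    buffer.clear()
    return lines


storage_buffer = []
for x, y in [(1, 2), (3, 4)]:
    shader_invocation(storage_buffer, x, y)

for line in flush(storage_buffer):
    print(line)
```

On a real GPU the invocations run in parallel, so the buffer writes need atomic offsets and bounds checks, which is part of the gory detail that debug printf takes care of for you.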

In a GLSL shader, simply:

  1. Enable the GL_EXT_debug_printf extension.

  2. Sprinkle your code with debugPrintfEXT() calls.

  3. Use the Vulkan Configurator that’s part of the SDK or manually edit vk_layer_settings.txt for your app enabling VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT.

  4. Normally, disable other validation features so as not to get too much output.

  5. Take a look at the debug report or debug utils info messages containing printf results, or set printf_to_stdout to true so printf messages are sent to stdout directly.

You can find an example shader in the validation layers test code. The debug printf feature has helped me a lot in the past, so I wanted to make sure it’s widely known and used.

Due to the observer effect, you may end up in situations where your code works correctly when enabling debug printf but incorrectly without it. This may be due to multiple reasons but one of the main ones I’ve encountered is improper synchronization. When debug printf is used, the layers use additional synchronization primitives to sync the contents of auxiliary buffers, which can mask synchronization bugs present in the app.

Finally, RenderDoc 1.14, released at the end of May, also supports Vulkan’s shader printf statements and will let you take a look at the print statements produced during a draw call. Furthermore, the print statements don’t have to be present in the original shader. You can also use the shader edit system to insert them on the fly and use them to debug the results of a particular shader invocation. Isn’t that awesome? Great work by Baldur Karlsson as always.

PS: As a happy coincidence, just yesterday LunarG published a white paper on Vulkan’s debug printf with additional information on this excellent feature. Be sure to check it out!

July 21, 2021 06:42 AM

July 20, 2021

Oriol Brufau

Improved raster transforms with non-uniform scales


CSS Transforms Level 1 introduced 2D transforms, which can be specified using the transform property. For example, they can be used to rotate or scale an element:

  • transform: none
  • transform: rotate(45deg)
  • transform: scale(1, 0.5)

CSS Transforms Level 2 extends that feature to allow transforms in 3D space, for example:

  • transform: rotate3d(1, 1, 1, 45deg)
  • transform: scale3d(1, 0.5, 2)

Typically, using 3D transforms forces the element into its own rendering layer. This is sometimes desired by authors, since it can improve performance if for example the element is moving around.

Therefore, identity transformations in the Z axis, like scale3d(X, Y, 1) instead of scale(X, Y), are sometimes used to opt in to this behavior. This trick works in Chromium, but note it’s not compliant with the spec.


Forcing an element to be rasterized in its own layer can have some disadvantages.

For example, Chromium used to rasterize it using a single float scale. When the transform had different X and Y scale components, Chromium just picked the bigger one, clamped by 5 times the smaller one (to avoid memory problems). And then it used this raster scale for both axes, producing suboptimal results.
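The old rule is simple enough to write down. This is a Python paraphrase of the behaviour described above, not Chromium’s actual C++ code; the function name and the max_ratio parameter are made up for this example.

```python
def legacy_raster_scale(scale_x, scale_y, max_ratio=5.0):
    """Single raster scale: the bigger component, clamped to 5x the smaller one."""
    bigger = max(scale_x, scale_y)
    smaller = min(scale_x, scale_y)
    return min(bigger, max_ratio * smaller)


# scale(1, 0.5): both axes are rasterized at 1, so Y is rasterized at twice
# the scale it is finally displayed at.
print(legacy_raster_scale(1.0, 0.5))   # 1.0

# scale(1, 10): the clamp kicks in, so Y is rasterized at 5 and then stretched.
print(legacy_raster_scale(1.0, 10.0))  # 5.0
```

Either way, at least one axis ends up rasterized at a scale that doesn’t match the transform, which is where the suboptimal results come from.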

Also, Chromium only uses LCD text antialiasing when the internal raster scale matches the actual X and Y scales in the transform. Therefore, non-uniform scales prevented the nicer LCD antialiasing.

And unrelated to non-uniform scales, if the transformed element doesn’t have an opaque background, LCD antialiasing is not used either, since Chromium needs to know the color behind the text.

The last problem remains unsolved, but I fixed the other two in Chromium 92, which has been released today.

Thanks to Bloomberg for sponsoring Igalia to do it!


The main patch that addressed both problems was But LCD text antialiasing was still not used because I made a mistake 😳, which I fixed in

Basically, it was a matter of changing AxisTransform2d and PictureLayerImpl to store a 2D scale rather than a single float. I used gfx::Vector2dF, which is like a pair of floats with some nice methods to clamp by a minimum or maximum, scale both floats by the same factor, etc.

I kept most tiling logic as it was, just taking the maximum component of the gfx::Vector2dF as the “scale key”. However, different 2D scales can have the same key, for example by dynamically changing scale3d(1, 5, 1) into scale3d(5, 1, 1), both with a scale key of 5. Therefore, when finding if there already was a tiling with the desired scale key, I made sure to check the 2D scales, and recreate the tiling if they were different.
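Here is the lookup logic as a small Python sketch. It is a simplified model, not the Chromium code: the Tiling class, the dictionary of tilings and the function names are hypothetical, and a tuple stands in for gfx::Vector2dF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tiling:
    raster_scale: tuple  # (x, y) scale


def scale_key(raster_scale):
    """The key is just the maximum component of the 2D scale."""
    return max(raster_scale)


def find_or_recreate_tiling(tilings, wanted_scale):
    """Reuse a tiling only if both the key and the full 2D scale match."""
    key = scale_key(wanted_scale)
    existing = tilings.get(key)
    if existing is not None and existing.raster_scale == wanted_scale:
        return existing, False
    tilings[key] = Tiling(wanted_scale)  # same key, different scale: recreate
    return tilings[key], True


tilings = {}
print(find_or_recreate_tiling(tilings, (1.0, 5.0)))  # created
print(find_or_recreate_tiling(tilings, (1.0, 5.0)))  # reused
print(find_or_recreate_tiling(tilings, (5.0, 1.0)))  # key 5.0 again, but recreated
```

Without the extra comparison, the third call would hand back a tiling rasterized for (1, 5) to a layer that now needs (5, 1).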

This is an example of how it used to look in Chromium:

This is how it looked when internally using 2D scales:

And finally, with LCD text antialiasing:

For reference, this is how your browser renders it (live example):

Lorem ipsum

Comparing the 1st and 2nd images, using 2D scales clearly improved the text, which was hard to read due to missing some thin parts of the glyphs, and also note the border diagonals in the corners look less jagged.

At first glance it may be hard to notice the difference between the 2nd and 3rd images, so you can compare the text antialiasing in these magnified images:

At the top, the edges of the glyphs simply use grayscale antialiasing, while at the bottom, the LCD antialiasing uses some colored pixels.


While my patch improved the common basic cases, Chromium will still fall back to a 1D scale in these cases:

  • Directly composited images
  • Animations
  • Heads up layers
  • Scrollbar layers
  • Mirror layers
  • When the layer has perspective

Some of them may be addressed in the future; this is tracked in bug 1196414.

For example, this live example uses a CSS animation so it still looks wrong in Chromium 92:

Lorem ipsum

I actually started a patch to address animations, and it seemed to work well in simple cases, but it could be wrong when ancestors had additional transforms. Handling that properly would have required more complexity, and it wasn’t clear that it was worth it.

Therefore, I don’t plan to continue working on these edge cases, but if you are affected by them, you can star bug 1196414 and provide a good testcase. This may help increase the priority of the bug!

by Oriol Brufau at July 20, 2021 09:30 PM

July 15, 2021

Ricardo García

Linking deqp-vk much faster thanks to lld

Some days ago my Igalia colleague Adrián Pérez pointed us to mold, a new drop-in replacement for existing Unix linkers created by the original author of LLVM lld. While mold is pretty new and does not aim to be 100% compatible with GNU ld, GNU gold or LLVM lld (at least as of the time I’m writing this), I noticed the benchmark table in its README file also painted a pretty picture about the performance of lld, if inferior to that of mold.

In my job at Igalia I work most of the time on VK-GL-CTS, Vulkan and OpenGL’s Conformance Test Suite, which contains thousands of tests for OpenGL and Vulkan. These tests are provided by different executable files and the Vulkan tests on which I’m focused are contained in a binary called deqp-vk. When built with debug information, deqp-vk can be quite large. A recent build, for example, takes up 369 MB on my drive. But the worst part is that linking the binary typically takes around 25 seconds on my work laptop.

$ time --target deqp-vk
  [6/6] Linking CXX executable external/vulkancts/modules/vulkan/deqp-vk

  real    0m25.137s
  user    0m22.280s
  sys     0m3.440s

I had never paid much attention to the linker before, always relying on the default choice in Fedora or any other distribution. However, I decided to install lld, which has an official package, and gave it a try. You Will Not Believe What Happened Next.

$ time --target deqp-vk
  [6/6] Linking CXX executable external/vulkancts/modules/vulkan/deqp-vk

  real    0m2.622s
  user    0m5.456s
  sys     0m1.764s

lld is capable of correctly linking deqp-vk in 1/10th of the time the default linker (GNU ld) takes to do the same job. If you want to try lld yourself you have several options. Ideally, you’d be able to run update-alternatives --set ld /usr/bin/lld as root but that option is notably not available in Fedora. There was a proposal to make that work but it never materialized, so it cannot be made the default system-wide linker.
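To put a number on “1/10th of the time”, this small Python snippet parses the real and user times from the two runs above and computes the ratios. The parse_time helper is written for this example and only handles the 0m25.137s format printed by the shell’s time.

```python
import re


def parse_time(text):
    """Convert a value such as '0m25.137s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


ld_real, ld_user = parse_time("0m25.137s"), parse_time("0m22.280s")
lld_real, lld_user = parse_time("0m2.622s"), parse_time("0m5.456s")

print(f"wall-clock speed-up: {ld_real / lld_real:.1f}x")
print(f"CPU time ratio:      {ld_user / lld_user:.1f}x")
```

The wall-clock speed-up is larger than the CPU time ratio, and lld’s user time exceeds its real time, which is consistent with lld spreading its work over several threads.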

However, depending on the build system used by a particular project, there should be a way to make it use lld instead of /usr/bin/ld. For example, VK-GL-CTS uses CMake, which invokes the compiler to link executable files, instead of calling the linker directly, which would be unusual. Both GCC and Clang can be passed -fuse-ld=lld as a command line option to use lld instead of the default linker. That flag should be added to CMake’s CMAKE_EXE_LINKER_FLAGS variable, either by reconfiguring an existing project with, for example, ccmake, or by adding the flag to the LDFLAGS environment variable before running CMake on a build directory for the first time.

I’m looking forward to using the mold linker and its multithreading capabilities in the future. In the meantime, I’m very happy to have checked out lld. It’s not that usual for a simple tooling change such as this one to give me such a clear advantage.

July 15, 2021 04:33 PM

July 09, 2021

Víctor Jáquez

Video decoding in GStreamer with Vulkan

Warning: Vulkan video is still work in progress, from specification to available drivers and applications. Do not use it for production software just yet.


Vulkan is a cross-platform Application Programming Interface (API), backed by the Khronos Group, aimed at graphics developers for a wide range of different tasks. The interface is described by a common specification, and it is implemented by different drivers, usually provided by GPU vendors and Mesa.

One way to visualize Vulkan, at first glance, is like a low-level OpenGL API, but better described and easier to extend. Even more, it is possible to implement OpenGL on top of Vulkan. And, as far as I am told by my peers in Igalia, Vulkan drivers are easier and cleaner to implement than OpenGL ones.

A couple of years ago, a technical specification group (TSG) inside the Vulkan Working Group proposed the integration of hardware-accelerated video compression and decompression into the Vulkan API. In April 2021, the newly formed Vulkan Video TSG published an introduction to the specification. Please do not hesitate to read it. It’s quite good.

Matthew Waters worked on a GStreamer plugin using Vulkan, mainly for uploading, composing and rendering frames. Later, he developed a library mapping Vulkan objects to GStreamer. This work was key for what I am presenting here. In 2019, during the last GStreamer Conference, Matthew delivered a talk about his work. Make sure to watch it, it’s worth it.

Other key components for this effort were the base classes for decoders and the bitstream parsing libraries in GStreamer, jointly developed by Intel, Centricular, Collabora and Igalia. Both libraries allow using APIs for stateless video decoding and encoding within the GStreamer framework, such as Vulkan Video, VAAPI, D3D11, and so on.

When the graphics team in Igalia told us about the Vulkan Video TSG, we decided to explore the specification. Therefore, Igalia decided to sponsor part of my time to craft a GStreamer element to decode H.264 streams using these new Vulkan extensions.


As stated at the beginning of this text, this development has to be considered unstable and the APIs may change without further notice.

Right now, the only Vulkan driver that offers these extensions is the beta NVIDIA driver. You would need, at least, version 455.50.12 for Linux, but it would be better to grab the latest one. And, of course, I only tested this on Linux. I would like to thank NVIDIA for their Vk Video samples. Their test application drove my work.

Finally, this work assumes the use of the main development branch of GStreamer, because the base classes for decoders are quite recent. Naturally, you can use gst-build for an efficient upstream workflow.

Work done

This work basically consists of two new objects inside the GstVulkan code:

  • GstVulkanDeviceDecoder: a GStreamer object in GstVulkan library, inherited from GstVulkanDevice, which enables VK_KHR_video_queue and VK_KHR_video_decode_queue extensions. Its purpose is to handle codec-agnostic operations.
  • vulkanh264dec: a GStreamer element, inherited from GstH264Decoder, which tries to instantiate a GstVulkanDeviceDecoder and use it by composition. It is in charge of the codec-specific operations, such as matching the parsed structures. On its source pad it outputs frames with the memory:VulkanImage feature, in NV12 color format.

    So far this pipeline works without errors:

    $ gst-launch-1.0 filesrc ! parsebin ! vulkanh264dec ! fakesink

    As you might see, the pipeline does not use vulkansink to render frames. This is because the Vulkan format output by the driver’s decoder device is VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, which is NV12 crammed in a single image, while for GstVulkan a NV12 frame is a buffer with two images, one per component. So the current color conversion in GstVulkan does not support this Vulkan format. That is future work, among other things.
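To make the NV12 layout more concrete, here is a minimal Python sketch of the two planes that make up a 4:2:0 frame: a full-resolution luma plane followed by a half-resolution plane of interleaved chroma samples. The helper name `nv12_layout` is hypothetical, and the sketch assumes tightly packed 8-bit planes with no row padding, which real drivers do not guarantee.

```python
def nv12_layout(width: int, height: int) -> dict:
    """Plane sizes and offsets (in bytes) of a tightly packed 8-bit NV12 frame.

    Assumption: no row stride or alignment padding is applied.
    """
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling needs even dimensions")
    # Plane 0: one luma (Y) byte per pixel.
    y_size = width * height
    # Plane 1: one interleaved Cb/Cr byte pair per 2x2 block of pixels.
    uv_size = (width // 2) * (height // 2) * 2
    return {
        "y_offset": 0,
        "y_size": y_size,
        "uv_offset": y_size,
        "uv_size": uv_size,
        "total": y_size + uv_size,
    }


layout = nv12_layout(1920, 1080)
print(layout)
```

Whether both planes live in one image (as in VK_FORMAT_G8_B8R8_2PLANE_420_UNORM) or in two separate images (as GstVulkan currently expects) does not change these sizes, only how the memory is described to the API.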

    You can find the merge request for this work in GStreamer’s Gitlab.

    Future work

    As was mentioned before, it is required to fully support VK_FORMAT_G8_B8R8_2PLANE_420_UNORM format in GstVulkan. That requires thinking about how to keep backwards compatibility. Later, an implementation of the sampler to convert this format to RGB will be needed, so that decoded frames can be rendered by vulkansink.

    Also, before implementing any new feature, the code and its abstractions will need to be cleaned up, since currently the division between codec-specific and codec-agnostic code is not strict, and it must be fixed.

    Another important cleanup task is to enhance the way the Vulkan headers are handled. Since the required header files for the video extensions are beta, they are not expected to be available in the system, so temporarily I had to add those headers as part of the GstVulkan library.

    Then it will be possible to implement the H.265 decoder, since the NVIDIA driver also supports it.

    Later on, it will be nice to start thinking about encoders. But this requires extending support for stateless encoders in GStreamer, something I want to do for the new VAAPI plugin too.

    Thanks for bearing with me, and thanks to Igalia for sponsoring this work.

    by vjaquez at July 09, 2021 05:38 PM

    July 06, 2021

    Igalia Compilers Team

    JS Nation Talk: “How to Outsmart Time: Building Futuristic JavaScript Apps Using Temporal”

    Recently Compilers Team member Ujjwal Sharma gave a talk at the JS Nation 2021 conference about the Temporal proposal. Check out the recording here:

    The talk goes over how to use Temporal’s new date & time API in real world programming, and also how Temporal interacts with other APIs such as JS Intl.

    We’ve written about Temporal previously on this blog and our other teammates have also written about how Temporal might be useful for the GNOME desktop.

    If you’re interested in an audio-format deep-dive about Temporal, also check out the Igalia Chats episode on the topic.

    by Compilers Team at July 06, 2021 06:09 PM

    July 04, 2021

    Brian Kardell

    Tabs in HTML?

    Please help us evaluate an idea!

    I've fallen a bit behind on my podcast consumption (thanks to being very busy), but I was recently catching up during a drive and was pleased to hear Dave Rupert and Chris Coyier having a good discussion on the Shop Talk Show episode 466 (time jumped to the start) about some work we've been doing and on which we're looking for more opinions and thoughts. I think Dave did a bang-up job explaining it, but just trying to imagine it all might be a little difficult, so I wanted to share something a little more concrete, provide some context around it, and reiterate that "we'd like (need) your help in evaluating this idea".

    Show me something...

    Imagine that you could have something that worked... kind of like this (insert giant hand-wavy motions): You write some good old HTML that looks just like the HTML you write today, but you are able to express that you want it to sometimes be treated with different interaction affordances - just like the way scroll panes work on the Web today...

    A running demo showing sections that gain tabset or collapse affordances as the screen resizes.

    You should be able to load the demo in this video yourself in any browser...

    Show me the demo

    If you view the source of that page, you'll see there is just a single element that we have cleverly named <spicy-sections>.

    Please check it out: It's very easy to try. Note that the actual syntax or way to express the association with interactive affordances is entirely up in the air. This custom element isn't a proposal itself. There are many efforts happening in parallel discussing precisely how this should (and shouldn't or can and can't) work, but the custom element should help you explore the ideas.

    Play with it. Build something useful. Ask questions, show us your uses, give us feedback about what you like or don't. This will help us shape good and successful approaches and inform actual proposals. Question #1 is on the crux of the idea itself.

    If the answer to "how do I get a native tabset in the browser?" involved using 0...1 'new' elements (perhaps one to identify content which could fit these models) and some CSS (or CSS-like) means of expressing 'when' - would you call that a win? Do you get it? Do you love it? Do you hate it? What are your questions?

    A little background

    There are lots of efforts around adding new elements to HTML - many of which are being coordinated through an effort in WICG called "Open UI". This effort involves browser implementers, developers, UI toolkit makers, etc., and I think that's great. I really want the web to have a better set of tools for making basic UI and I hope that Open UI can show us a better way forward than we've done in the past.

    To me, this means involving more people. It means meeting developers closer to where they are and giving them something useful to evaluate, to tighten the feedback loop and make sure we're able to course correct. But it also means that we should be producing things out in the open along the way that allow anyone to see how we even got to here. Proposals shouldn't lack a back-story or explanation, or seem like they appeared out of nowhere. There shouldn't be any things "you just have to trust us on".

    So, that's what we're trying to do: A group of us have been looking at 'tabs' - and there is a reason we're asking. Chances are, I think, this won't match your first ideas about how the browser would get native support for tabs. It didn't match my own, in fact. I've built and used a lot of tabs over the last 20+ years, and never thought of it this way either. So, I wanted to add some context and explanation for the curious...

    Step 1: Define Tabs?

    I realize this sounds almost silly. In fact, when I asked the question "can we define 'tabs'" a friend replied to me...

    "You know what tabs are, Brian".

    I mean... You use them every day, on every OS. Everybody knows they exist in every toolbox. All that's left is to "just pave the cowpaths!" But when you get right down to it, it's a lot more complicated than that.

    Different "tabset" implementations have different features and different limits. All of them have changed and evolved over time. Today, most UI toolboxes have multiple APIs for creating things that users would just call "tabs" - and there are reasons for that. Sometimes these APIs are even the basis of things a user might not call tabs! In many cases, the APIs themselves aren't called "tabs" (or anything close to that). So, if we want to discuss these things (any components, really), we need to begin with a survey and lay down some clear definitions and goals. We laid this all out in this research identifying the parts and features in the landscape.

    An interesting distinction

    One thing which falls out of this is an interesting distinction between 2 broad "kinds of tabset-like controls":

    • One kind is actually a window manager. The tabs at the top of the browser you're looking at right now are windows that happen to be arranged visually as groups that look like tabs.
    • Another kind manages exclusive display and focus management patterns of what are, effectively, sections within the same document that happen to look like tabs.

    As end-users we probably don't think about this much, but it is a pretty important distinction actually and there are differences we understand commonly. In fact, so many expectations about both the shape of the UI and user interactions flow from this. Text searching happens in a window, not across windows, for example. Windows can be "dragged out" and displayed as, well, windows. The keyboard interactions and accessibility roles of windows are expected to be... windows. And so on.

    For our purposes, we have chosen to focus primarily on the latter kind, which is prevalent in UI kits for the Web.

    Markup and APIs

    Again, it feels like it shouldn't be difficult to pave these paths into markup. However, even in systems without markup, the shape of APIs varies pretty considerably. Translating this to markup adds a new set of challenges. There isn't a clear/self evident mapping to DOM/markup at all - and whatever decisions we make somehow dictate something about how APIs should work on the web.

    To illustrate this, we have also collected a bunch of research showing all sorts of variants over the years and various dissected pros and cons of each.

    Some things we thought were important...

    We thought it was worth thinking about progressive enhancement. Support for new features generally rolls out unevenly (we only got the last support for summary/details about a year ago, for example). This means that for potentially a long time, some browsers may not have support for native tabs.

    But that's only part of the story: When you consider "other browsers" - things like embedded devices which update more slowly still, or search engines or reader modes... What happens to content?

    Further, we'd like to test out any theory with a custom element (as above), and the script can fail to download. In fact, these things seem to dovetail nicely. We'd like the content to be "good" even if the script for some reason doesn't execute.

    This fed into roads we didn't go down..


    One popular approach involves putting the tab's "label" as an attribute. That has a lot of really nice qualities, but it falls down on a number of other points. Without support (including all of those cases above), users would be left with information loss and a wall of run-on and unlabelled content. This also has pretty extreme limitations on what can be put into a tab label. No additional elements means no ruby text, for example, or interesting icon treatments or stylistic markup or structured text of any kind. So, we suggest that attributes for labels are not our first choice.

    TOC Style?

    Another very popular set of solutions involves "table of contents (TOC) style" markup. These draw a parallel which equates tab labels to items in a table of contents. In fact, over the years this sort of pattern has been popular with progressive enhancement enthusiasts because you can use a list of links. The markup itself even "looks like tabs" already. It's probably surprising then that this isn't our first choice - why?

    The short answer is that there are some fundamental flaws in this analogy which have impacts. Tables of contents are an enhancement of headings, not a replacement for them. That is, they are generally built based on headings that label content and simply repeated/reflected earlier. Without these already in place, the un-enhanced content is just a wall of unbroken text without labels. It's almost exactly the same issue as attribute style. While it's possible to repeat the headings too, why repeat yourself? If you have the heading, you can build the TOC, but not vice-versa.

    Similarly, the argument that "the markup looks like tabs already" is less than perfect. As our research shows, tabset labels can exist along any axis, or even around the circumference of a circle. If you look at it just slightly differently, they can perhaps even be interleaved in the content ('responsive tabs' do this, and at one point ARIA's "single select accordions" were also tabs).

    So, we suggest TOC-style isn't our first choice either.

    A tabset element?

    All of this helps shine light on some interesting questions and led to my post Design Affordance Controls, which talks about why the way we approach this matters. It holds up flaws in the inherent control-specific-ness of <summary> and <details> and contrasts this with the fact that we don't have a <scrollpane> element - but rather employ those affordances when they make sense.

    If a thing's fundamental, primary "nature" is just "good sections", don't we often want to change our minds on presentation based on things like media or design? It's worth considering.

    So, it's not that we shouldn't have an element specifically for tabs as much as "isn't this maybe more useful"?

    What do you think?

    Please let us know what you think about the ideas here! If you could provide "just good content" and have pretty much full stylistic control, and use CSS to express what kind of "showey/hidey" control it should be/when... How would you feel about that? Do you "get it"? Do you "buy the arguments"? Or are we just barking up the wrong tree? Your feedback will help inform how we discuss or advocate for things next in OpenUI to move things forward.

    Mentions do not imply endorsements, but many thanks to the folks who proofread this post, met along the way, helped research, had discussions, and did work. Very special thanks to @jon_neal, @TerribleMia and @davatron5000 for many thoughtful discussions.

    July 04, 2021 04:00 AM

    July 01, 2021

    Felipe Erias

    Towards richer colors on the Web


    This blog post is based on my talk at the BlinkOn 14 conference (May 2021). You can watch the talk here:

    Towards richer colors in Blink (BlinkOn 14)

    And the slides are available here: Towards richer colors in Blink - slides.

    This article will talk about the ongoing efforts to specify richer colors on the Web platform, plus some ideas about directions for future development on Blink/Chromium.


    The study of color brings together ideas from physics (how light works), biology (how our eyes see), computing, and more. There is a long and rich history following the desire to be able to use richer materials and colors when creating visual art, and the same is true of the Web today.

    A color space is a way to describe and organize colors so they can be identified and reproduced with accuracy. Some color spaces are more or less arbitrary (e.g. the Pantone collection) but the ones that we will focus on are based on detailed mathematical descriptions.

    These color spaces consist of a mathematical color model that specifies how colors are described (i.e. as tuples of numbers) and a precise description of how those components are to be interpreted.

    The range of colors that a hardware display is able to show is called its gamut. When we want to show an image that uses a larger color space than this gamut, its colors will have to be mapped to the ones that can be actually displayed: this process is called gamut mapping.

    Essentially, the colors in the original image are “squeezed” so they can be displayed by the device. This process can be rather complex, because we want the image being displayed to preserve as much of the intent of the original as possible.

    When we talk about software, we say that an application is color managed when it is aware of the different color spaces used by its source media and is able to use that information when deciding how that media should be displayed on the screen.

    Traditionally, the Web has been built on top of the sRGB color space (created in 1996), which describes colors with an RGB color model (red, green and blue) plus a non-linear transfer function that links the numerical value of each component with the intensity of the corresponding primary color.
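As a minimal sketch of that non-linear transfer function, the following Python snippet converts a single sRGB component (in the 0 to 1 range) to linear light and back, using the piecewise curve from the sRGB standard. The function names are only illustrative.

```python
def srgb_to_linear(c: float) -> float:
    """Decode one gamma-encoded sRGB component (0..1) to linear light."""
    if c <= 0.04045:
        return c / 12.92  # short linear segment near black
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(l: float) -> float:
    """Encode one linear-light component (0..1) back to sRGB."""
    if l <= 0.0031308:
        return 12.92 * l
    return 1.055 * l ** (1 / 2.4) - 0.055


# An encoded value of 0.5 is much darker than "half intensity".
half = srgb_to_linear(0.5)
print(round(half, 3))
```

A component value of 0.5 corresponds to only about 21% of the maximum light intensity, which is why arithmetic on encoded values does not behave like arithmetic on light.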

    There are many other color spaces. The graph below represents the chromaticity of the CIE XYZ color space, which was specifically designed to cover all colors that an average human can see.


    Source: WikiMedia

    From that large map of colors within human perception, we can identify those that fall within the sRGB color space.


    Source: WikiMedia

    As you can see, there are many colors that we can perceive but that cannot be described by sRGB!

    Learn more: Color: From Hexcodes to Eyeballs (Jamie Wong)

    (Note: these graphs are a useful tool to visualize and compare different gamuts but sometimes can be a bit confusing, because they use colors that we can obviously see but then tell us that some of the colors represented by them are outside the gamut that our device can display.)

    Colors on the Web

    The sRGB color space gained popularity because it was well suited to be displayed by the CRT monitors that were common at the time. CSS includes plenty of functions and shortcuts to define colors in the sRGB space, for example:

    rgb(218, 112, 214)
    rgba(211, 65, 0, .8)
    hsl(177, 70%, 41%)

    See also: Color CSS data type

    As technology has improved over time, nowadays many devices are able to display colors that go beyond the sRGB color space. On the Web platform there is increasing interest in adding support for wider color gamuts to different elements.

    Learn more: Unlocking Colors (Brian Kardell), LCH colors in CSS: what, why, and how? (Lea Verou)

    Several JavaScript libraries already provide a lot of functionality for manipulating colors (but are limited by the limits of what can be displayed by the browser).

    See: Color JS, D3 d3-interpolate, chroma JS.

    The major Web browsers offer different levels of support for color management and access to wider gamuts.

    This article will focus specifically on adding support on Blink and Chromium for richer colors in elements defined in HTML and CSS.

    CSS Color

    The reference specification for richer colors on the Web is the CSS Color Module elaborated by the CSS Working Group. CSS Color Module 4 describes most of the changes discussed here and CSS Color Module 5 will bring additional functionality.

    There is also a Color on the Web community group at the W3C that, among other things, organises a workshop on wide color gamut for the Web.

    In 2020 there was also a very interesting discussion at the W3C’s Technical Architecture Group about how having colors outside of the sRGB gamut opened up questions about interoperability between the different elements of the platform, as well as interesting observations around how to support calculations for improved color contrast and accessibility.

    (Note: this list does not pretend to be exhaustive and it intentionally leaves aside the many groups working on standards beyond CSS and beyond the Web in general.)

    The CSS Color spec, among other things:

    • extends the color() function to let the author explicitly indicate the desired color space of a color, including those with a wide gamut;
    • defines the lab() and lch() functions to specify colors in the CIE LAB colorspace;
    • provides detailed control over how interpolation happens, as well as many other features;
    • contains a reference implementation for the operations described in it.

    So why is this a big deal?

    Display more colors

    First, using only sRGB limits the range of colors that can be displayed. Many modern monitors have a wider gamut than sRGB, often close to another standard called Display-P3.

    Here you can see both of those spaces over the same graph that we saw before:

    [Figure: the sRGB and Display-P3 gamuts]

    The Display-P3 space is about one third larger than sRGB. This means that from CSS we have no access to roughly one third of the colors that modern monitors can display.

    Learn more: Why DCI-P3 is the New Standard of Color Gamut?

    See also: Wide-gamut color on the web

    This is another way of visualizing the same idea, where the white line in each case represents the boundary between what can be described by sRGB and what is within Display-P3.

    [Figure: outline of the sRGB boundary within Display-P3]

    As you can see, the colors that fall within the Display-P3 space but outside of sRGB are the most intense and vivid.

    Learn more: Wide Gamut Color in CSS with Display-P3 (WebKit)

    When a Web browser is not able to display a color because of hardware and/or software limitations, it will use instead the closest one of the colors that it can display.

    Let’s see an example of this. The image on the left below is a uniform red square in the sRGB gamut. The image on the right is slightly different, as it actually uses two different shades of red: one that is within the sRGB gamut and another that is outside of it. On sRGB displays, both colors are painted the same and the result is a uniform red square, just like the first image. However, on a system that can display wide-gamut colors, both shades of red will be painted differently and you will be able to see a faint WebKit logo inside the square.

    [Figures: sRGB color example (left) and wide-gamut color example (right)]

    Source and more information: Comparison between normal and wide-gamut images (WebKit).

    Furthermore, there are color spaces that are even larger than Display-P3; for now, they are mostly reserved for professional equipment and applications, but it is likely that some of them will become popular in their turn.

    Adding wider color spaces to the Web is as much about supporting what widely available hardware can do today as it is about setting us on the path to support what it will do in the future.

    Consistent and predictable colors

    Secondly, another limitation of sRGB on the Web is that it is not perceptually uniform: the same numeric amount of change in a value does not cause similar changes in the colors that we perceive.

    We can see this clearly with HSL, which is an alternate way to express the same sRGB colors in terms of hue, saturation, and lightness. Let’s see some examples.

    Here 20 degrees in hue are the difference between orange and yellow:

    HSL(30, 100%, 50%)
    HSL(50, 100%, 50%)

    Whereas here that same step produces very similar blues:

    HSL(230, 100%, 50%)
    HSL(250, 100%, 50%)

    Changing the lightness value may also change the saturation that we perceive (even when its numerical value stays the same).

    HSL(0, 90%, 40%)
    HSL(0, 90%, 80%)

    And colors with the same saturation and lightness values can be perceived very differently because of their hue:

    HSL(250, 100%, 50%)
    HSL(60, 100%, 50%)

    Learn more: Color spaces for human beings

    This means that in general sRGB (and by extension HSL) cannot be used to accurately adjust lightness, saturation or hue, to find complementary colors, to calculate the perceived contrast between two colors, etc.
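One way to put a number on this is to compare two HSL colors that share the same saturation and lightness values but differ in hue. The sketch below uses Python's standard `colorsys` module and the usual relative luminance weights for linear sRGB. Relative luminance is only a rough stand-in for perceived lightness, and the helper names are illustrative.

```python
import colorsys


def srgb_to_linear(c: float) -> float:
    """Decode one gamma-encoded sRGB component (0..1) to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(h_deg: float, s: float, l: float) -> float:
    """Relative luminance (0..1) of an HSL color given in degrees and fractions."""
    # Note: colorsys takes its arguments in hue, lightness, saturation order.
    r, g, b = colorsys.hls_to_rgb(h_deg / 360.0, l, s)
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))


blue = relative_luminance(250, 1.0, 0.5)    # HSL(250, 100%, 50%)
yellow = relative_luminance(60, 1.0, 0.5)   # HSL(60, 100%, 50%)
print(f"blue: {blue:.3f}, yellow: {yellow:.3f}")
```

Both colors claim a lightness of 50%, yet the yellow emits roughly an order of magnitude more light than the blue.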

    One of the new capabilities in the CSS Color spec is the ability to use color spaces where the same numerical change in one of the values brings a similar perceived change, like the LCH color space.

    LCH is based on the CIE LAB color space and defines colors according to their Lightness, Chroma, and Hue.

    In the LCH color space, the same numerical changes in a value bring about similar and predictable changes in the colors that we perceive without affecting the other characteristics.

    Changes in lightness:

    LCH(45% 60 60)
    LCH(60% 60 60)
    LCH(75% 60 60)

    Changes in chroma (or “amount of color”):

    LCH(50% 10 319)
    LCH(50% 60 319)
    LCH(50% 110 319)

    Changes in hue:

    LCH(50% 70 35)
    LCH(50% 70 135)
    LCH(50% 70 280)
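The relationship between LCH and CIE LAB is a change of coordinates: chroma and hue are the polar form of LAB's a and b axes. A minimal Python sketch of the conversion in both directions, with illustrative function names:

```python
import math


def lch_to_lab(l: float, c: float, h_deg: float) -> tuple:
    """Convert LCH (hue in degrees) to LAB; lightness is unchanged."""
    h = math.radians(h_deg)
    return (l, c * math.cos(h), c * math.sin(h))


def lab_to_lch(l: float, a: float, b: float) -> tuple:
    """Convert LAB to LCH, normalizing the hue to the 0..360 range."""
    c = math.hypot(a, b)
    h_deg = math.degrees(math.atan2(b, a)) % 360.0
    return (l, c, h_deg)


lab = lch_to_lab(50, 70, 135)
back = lab_to_lch(*lab)
print(lab, back)
```

Because lightness is carried over untouched and hue is just an angle, changing one of the three values leaves the other two alone, which is the property the swatches above illustrate.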

    Learn more: LCH colour picker

    See also: Perceptually uniform color spaces

    Interpolation and more

    Since color spaces represent and organize colors differently, the path to reach one color from another is not the same on different spaces. This means that there are many possible ways to interpolate between two colors to create e.g. a gradient. For example:

    [Figure: interpolation examples]

    Try it out: Color JS - interpolation

    The CSS Color spec will provide more control over interpolation on additional color spaces. This is just one example where adding richer color capabilities to the Web dramatically broadens the range of tools available to authors when creating their sites.
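As a rough illustration of how the path between two colors depends on the space, this Python sketch computes the midpoint between red and green twice: once by averaging the gamma-encoded sRGB values directly, and once by averaging in linear light and encoding the result back. Linear-light sRGB is used here only because it is simple to compute; it is one of several possible interpolation spaces.

```python
def srgb_to_linear(c: float) -> float:
    """Decode one gamma-encoded sRGB component (0..1) to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(l: float) -> float:
    """Encode one linear-light component (0..1) back to sRGB."""
    if l <= 0.0031308:
        return 12.92 * l
    return 1.055 * l ** (1 / 2.4) - 0.055


def mix(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))


red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

# Midpoint computed directly on the encoded values.
mid_encoded = mix(red, green, 0.5)

# Midpoint computed in linear light, then encoded for display.
lin_mid = mix([srgb_to_linear(c) for c in red],
              [srgb_to_linear(c) for c in green], 0.5)
mid_linear = tuple(linear_to_srgb(c) for c in lin_mid)

print(mid_encoded, mid_linear)
```

The first midpoint is a dark olive (0.5, 0.5, 0), while the second is a noticeably brighter yellow, so the same two endpoints produce visibly different gradients.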

    Learn more: Interpolation on CSS Color 4, Mixing colors on CSS Color 5.

    Color in Chromium

    Now let’s talk about Chromium. As you know, it is the Free Software portion of the Chrome and Edge Web browsers.

    The Web engine inside of it is called Blink and it implements the Web Platform standards that describe how to turn Web content into pixels on the screen.

    Blink itself is a fork of WebKit, which is the Web engine used by Safari and others.

    Render pipeline

    Blink basically creates a rendering pipeline that takes Web sources as input (pages, stylesheets, and so on).

    It parses them, applies styles, defines geometry, arranges the content into layers and tiles, paints those and sends them over to be displayed.

    This job of actually painting those pixels is carried out by a multiplatform graphics library called Skia.

    [Figure: the Blink rendering pipeline]

    Learn more: Life of a Pixel (Chromium team).

    Richer colors

    In Chromium, there is already some support for color management, @media queries (gamut), color profiles (tags) in images, and so on.

    Learn more about embedded color profiles in images: Digital-Image Color Spaces (Jeffrey Friedl).

    There is now also an intent to experiment with additional color spaces for canvas, WebGL and WebGPU.

    See: Color managing canvas contents, intent to ship.

    However, there isn’t yet support for using richer color spaces with individual Web elements like we have seen in the previous section.

    Within Blink, CSS colors are parsed and stored into a small structure with just 32 bits: that means 8 bits per RGB color channel (plus 8 more for transparency).

    See: color.h (Chromium).

    These colors are eventually handed over to the Skia library to carry out the actual drawing. Skia then uses its own similar 32-bit format.
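To see what 8 bits per channel implies, here is a small Python sketch that packs floating-point components into a single 32-bit integer and back. It assumes the ARGB channel order and a simple clamp-and-round policy; the helper names are hypothetical and are not taken from Blink or Skia.

```python
def pack_argb(r: float, g: float, b: float, a: float = 1.0) -> int:
    """Pack four 0..1 components into a 32-bit ARGB integer (assumed layout)."""
    def to_byte(x: float) -> int:
        # Out-of-range values are clamped; precision beyond 1/255 is lost.
        return max(0, min(255, round(x * 255)))
    return (to_byte(a) << 24) | (to_byte(r) << 16) | (to_byte(g) << 8) | to_byte(b)


def unpack_argb(value: int) -> tuple:
    """Return (r, g, b, a) as 0..1 floats from a 32-bit ARGB integer."""
    a = (value >> 24) & 0xFF
    r = (value >> 16) & 0xFF
    g = (value >> 8) & 0xFF
    b = value & 0xFF
    return (r / 255, g / 255, b / 255, a / 255)


# Two slightly different reds collapse into the same stored value.
same = pack_argb(0.501, 0.0, 0.0) == pack_argb(0.502, 0.0, 0.0)
# A component beyond the sRGB range cannot be represented at all.
clamped = pack_argb(1.2, 0.0, 0.0)
print(same, hex(clamped))
```

Both limitations matter for richer colors: nearby values are merged, and anything outside the 0 to 1 range of the stored color space is clipped before it ever reaches the painting code.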

    See: SkColor in SkColor.h (Chromium).

    We can show this on the previous diagram. A Web page specifies a color in sRGB which is stored in a 32-bit format and passed across the rendering pipeline until it reaches Skia, where it is converted to a similar format, rasterized, and displayed.

    [Figure: colors across the Blink rendering pipeline]

    This means that throughout Blink’s rendering pipeline colors are represented using only 32 bits, and this limits the precision and the richness of the colors that can be used and displayed in websites by Chromium.

    Some ideas from WebKit

    Blink started as a fork of WebKit in 2013 and although they have evolved in different ways, we can still look at WebKit to get some inspiration for storing and manipulating high-precision colors.

    Without getting into too much detail, WebKit supports a high precision representation of colors that stores four float values plus a colorspace. LAB is one of those spaces that may be used to define colors in WebKit.

    Learn more: Improving Color on the Web, Wide Gamut Color in CSS with Display-P3 (WebKit).

    See also: Color.h, ColorComponents.h and ColorSpace.h (WebKit).

    Having this support for higher precision colors has already made it possible to implement several color features in WebKit, for example:

    An important difference is that WebKit uses the platform’s graphic libraries directly (e.g. CoreGraphics on Mac) whereas Chromium uses Skia across different platforms. Support for displaying colors beyond the sRGB gamut may not be available on all platforms.

    High precision colors in Skia

    Interestingly, Skia does not have the same limits in color precision and range as Blink does.

    Internally, it has a format for high-precision colors that holds four float values, and it is also able to take color spaces into account.

    See: SkRGBA4f and SkColor4f in SkColor.h.

    Much of the Skia API is already able to take as input a colorspace and one or more high precision colors defined in it. Skia is also able to convert between source and destination color spaces, so colors can be manipulated with flexibility before being adapted to be displayed on concrete hardware.

    In Skia, a color space is defined by a transfer function and a gamut. See: SkColorSpace.h.

    So, Skia is able to paint richer colors on hardware that supports them.

    This means that, if we managed to get that rich color information defined in the Web sources at the beginning of the pipeline all the way to Skia at the end of the pipeline, we would be able to paint those colors correctly on the screen :)

    However, two more things would be needed in order to implement the full functionality of the CSS Color specs. First, Skia’s representation of high-precision colors still uses the RGBA structure, so out of the box Skia does not support other formats like LAB or LCH.

    Secondly, as we have seen, the CSS Color spec provides ways to specify the interpolation colorspace for gradients, transitions, etc. Blink relies on Skia for this interpolation, but Skia does not provide fine-grained control: Skia will always use the colorspace where the source colors have been defined, and does not support interpolating in a different space.

    These two limitations point towards the need for an additional layer between the Blink painting code and Skia that is able to translate the richer color information into formats that Skia can understand and use to display those colors on the screen.
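
    To illustrate why the choice of interpolation space matters, here is a minimal sketch that mixes two sRGB colors in two different ways: directly on the gamma-encoded values, and in linear-light sRGB. It is not Blink or Skia code; the helper names (mixInSrgb, mixInLinearSrgb) are made up for this example, and only the standard sRGB transfer function is assumed.

```javascript
// sRGB transfer functions (gamma-encoded value <-> linear light), channels in [0, 1].
function srgbToLinear(c) {
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}
function linearToSrgb(l) {
  return l <= 0.0031308 ? 12.92 * l : 1.055 * Math.pow(l, 1 / 2.4) - 0.055;
}

// Plain linear interpolation between two numbers.
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// Interpolate directly on the encoded sRGB values.
function mixInSrgb(c1, c2, t) {
  return c1.map((v, i) => lerp(v, c2[i], t));
}

// Hypothetical "extra layer": convert to linear light, interpolate, convert back.
function mixInLinearSrgb(c1, c2, t) {
  return c1.map((v, i) =>
    linearToSrgb(lerp(srgbToLinear(v), srgbToLinear(c2[i]), t)));
}

const black = [0, 0, 0];
const white = [1, 1, 1];
const midEncoded = mixInSrgb(black, white, 0.5);
const midLinear = mixInLinearSrgb(black, white, 0.5);
console.log(midEncoded, midLinear);
```

    The midpoint of the same two colors comes out visibly lighter when the interpolation happens in linear light, which is why a gradient needs to know the space it is supposed to interpolate in.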


    As a very broad summary, the first step to support wider, richer color gamuts in Blink is to parse the CSS code using those new features.

    Those wide gamut colors and their colorspaces need to be stored in a high-precision format that can be used throughout Blink’s rendering pipeline.

    At the end of the pipeline, this information needs to be translated so Skia can paint those colors correctly on the target hardware. For this, we will also need more fine-grained control over interpolation and probably other changes.

    This work is not straightforward because it would touch a lot of different components, and it might also have an impact on memory, on performance, on how paint information is recorded and used, etc.

    For Web authors it is important that these features are available at the same time, so they can rely on the new functionality provided by the CSS Color spec.

    In Closing

    I hope that with this you got a better understanding of the value of adding richer colors to the Web and the scope of the work that would be needed to do so in Chromium.

    These are some steps in the long road to increase the expressivity of the web platform and to widen the range of tools that are available to authors when creating the Web.

    Thank you very much for reading.

    by Felipe Erias at July 01, 2021 12:00 AM

    June 28, 2021

    Miguel A. Gómez

    Unresponsive web processes in WPE and WebKitGTK

    In case you’re not familiar with WebKit’s multiprocess model, allow me to explain some basics: In WebKitGTK and WPE we currently have three main processes which collaborate in order to ultimately render the web pages (this is subject to change, probably with the addition of new processes):

    1. The UI process: this is the process of the application that instantiates the web view. For example, Cog when using WPE or Epiphany when using WebKitGTK. This process is usually in charge of interacting with the user/application and sending events and requests to the other processes.
    2. The network process: the goal of this process is to perform network requests for the different web processes and make the results available to them.
    3. The web process: this is the process that really performs the heavy lifting. It’s in charge of requesting the resources required by the page (through the network process), creating the DOM tree, executing the JS code, rendering the content, responding to user events, etc.

    In a simple case, we would have a UI process, a web process and a network process working together to render a page. There are situations where (due to security reasons, performance, build options, etc.) the same UI process can be using more than a single web process at the same time, while these share the network process. I won’t go into details about all the possibilities here because that’s beyond the goal of this post (but it’s a good idea for a future post!). If you want more information related to this you can search for topics like process prewarm, process swap on navigation or service workers running in dedicated processes. In any case, at any given moment, one of those web processes is always the main one performing all the tasks mentioned before, and the others are helpers for concrete tasks, so the situation is equivalent to the simpler case of a single UI process, a web process and a network process.

    Developers know that processes have the bad habit of getting hung sometimes, and that can happen here as well to any of the processes. Unfortunately, in many cases that’s probably due to some kind of bug in the code executed by the processes and we can’t do much more about that than fixing the bug. But there’s a special kind of block that affects the web process only, and it’s related to the execution of JS code. Consider this simple JS script for example:

    setTimeout(function() {
        while(true) { }
    }, 1000);

    This will create an infinite loop during the JS execution after one second, blocking the main thread of the web process, and taking 100% of the execution time. The main thread, besides executing the JS code, is also in charge of processing incoming events or messages from the UI process, performing the layout, initiating redraws of the page, etc. But as it’s locked in the JS loop, it won’t be able to perform any of those tasks, causing the web process to become unresponsive to user input or API calls that come through the UI process. There are other situations that could make the process unresponsive, but the execution of faulty JS is probably the most common one.

    WebKit has some internal tools to detect these situations, which mainly consist of timeouts that mark the process as unresponsive when some expected reply is not received in time from the web process. But on WebKitGTK and WPE there wasn’t a way to notify the application that’s using the web view that the web process had become unresponsive, or a way to recover from this situation (other than closing the application or the tab). This is fixed in trunk for both ports by adding two components to the WebKitWebView class:

    1. A new property named is-web-process-responsive: this property will signal responsiveness changes of the web process. The application using the web view can connect to the notifications of the object property as with any other GObject class, and can also get its value using the new webkit_web_view_get_is_web_process_responsive() API method.
    2. A new API method webkit_web_view_terminate_web_process() that can be used to kill the web process when it becomes unresponsive (but will also work with responsive ones). Any load performed after this call will spawn a new and shiny web process. When this method is used to kill a web process, the web view will emit the signal web-process-terminated with WEBKIT_WEB_PROCESS_TERMINATED_BY_API as the termination reason.

    With these changes, the browser can now connect to change notifications of the is-web-process-responsive property. If the web process becomes unresponsive, the browser can launch a dialog informing the user of the problem, and ask whether to wait or kill the process. If the user chooses to kill the process, the browser can use webkit_web_view_terminate_web_process() to kill the problematic process and then reload (or load a new page) to go back to a functional state.
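
    The flow described above can be sketched in plain JS with a mock web view. This is only an illustration of the decision logic, not the WebKit API: MockWebView, installWatchdog and their method names are hypothetical stand-ins that merely mirror the C functions and the property mentioned above, and a real application would use the GObject API from C (or a language binding) instead.

```javascript
// Pure-JS stand-in for a web view. NOT the WebKit API: the method names only
// mirror the C functions and the property described in the post.
class MockWebView {
  constructor() {
    this.responsive = true;
    this.listeners = [];
    this.terminationReason = null;
  }
  // Stands in for connecting to the property change notification.
  onResponsivenessChanged(callback) {
    this.listeners.push(callback);
  }
  getIsWebProcessResponsive() {
    return this.responsive;
  }
  // Helper for the simulation: pretend WebKit flipped the property.
  setResponsive(value) {
    if (value === this.responsive) return;
    this.responsive = value;
    this.listeners.forEach((cb) => cb(this));
  }
  terminateWebProcess() {
    this.terminationReason = "TERMINATED_BY_API";
  }
}

// Hypothetical browser-side logic: ask the user, then kill and reload if requested.
function installWatchdog(view, askUser, reload) {
  view.onResponsivenessChanged((v) => {
    if (v.getIsWebProcessResponsive()) return; // became responsive again
    if (askUser() === "kill") {
      v.terminateWebProcess();
      reload();
    }
  });
}

const view = new MockWebView();
let reloads = 0;
installWatchdog(view, () => "kill", () => { reloads += 1; });
view.setResponsive(false); // simulate the web process hanging
console.log(view.terminationReason, reloads);
```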

    These changes, among other wonderful features we’re developing at Igalia, will be available in the upcoming 2.34 release of WebKitGTK and WPE. I hope they are useful for you 🙂

    Happy coding!

    by magomez at June 28, 2021 02:15 PM

    Samuel Iglesias

    My experience in esLibre 2021

    This year, I decided to participate as a speaker in the esLibre 2021 conference. esLibre is a Spanish free software conference that covers a lot of different topics related to open-source projects, from the technical point of view to their social impact.

    This year the conference had talks about game development with Godot, KDE, LibreOffice, Free Software in Universities among many others. Check out the program.

    esLibre 2021

    This was my first time participating in this conference and I enjoyed it a lot. Huge applause to the organization team for the huge work they put into organizing this edition, for helping out the speakers with different testing days, and for kindly answering every question from me and other attendees. They did a superb job!

    My talk was an introduction to Mesa where I covered things like where Mesa sits in the open-source graphics stack, a summary of what it does, the drivers implemented in Mesa, how our community is organized and how to contribute to it. If you know Spanish, you can check it out here (PDF). But in case you want an English version of it, this talk is very similar to the one I gave at Ubucon Europe 2018.

    My esLibre talk was recorded as well! I’ll update this post with the link to the recording once it is publicly available.

    Enjoy it!

    Introduction to Mesa

    June 28, 2021 08:20 AM

    June 24, 2021

    Gyuyoung Kim

    The progress of the legacy IPC migration in Chromium

    Recall of the legacy IPCs migration

    As you might know, Chromium has a multi-process architecture that involves many processes communicating with one another through IPC (Inter-Process Communication). However, IPC has traditionally used an approach that the Chromium project feels is not the best fit for achieving the project’s goals in terms of security, stability, and integration of a large number of components. Igalia has been working on many aspects of changing this. One very substantial effort has been converting the legacy named pipe IPC implementation to the Mojo framework. Mojo itself is approximately 3 times faster than the legacy IPC and involves one-third less context switching. With Mojo we can remove unnecessary layers, such as the //content/renderer layer, when communicating between different processes. Besides, we can easily connect interface clients and implementations across arbitrary inter-process boundaries. Thus, replacing the legacy IPC with Mojo is one of the goals for Onion Soup 2.0 [1], and it’s a prerequisite for further Onion Souping and refactoring. Uses of the legacy IPC were spread very widely and were blocking several high-impact projects, including BackForwardCache, multiple Blink isolates, and RenderDocument. There were about 450 legacy IPCs in January 2020: approximately 264 in the //content layer and 194 in other areas.

    The current Progress

    Igalia has been working in earnest on converting the legacy IPCs to Mojo since 2020. During the last year we mainly focused on converting the legacy IPCs in //content: 293 messages have been migrated and 3 remain, so 98% has been done since August 2019. The conversion in the //content layer was almost complete at the end of March 2021, and the only 3 remaining IPC messages are related to the Gin Java Bridge. Recently we also started working on other areas, for example Printing, Extensions, Android WebView, and so on. There, 49 messages have been migrated and 150 still remain, so 24% has been done since August 2019.
    We have been working on migrating the legacy IPC to Mojo in other modules since BlinkOn 13. Since then, all IPCs in the Android WebView, Media, and Printing modules have been successfully migrated to Mojo, and Igalia is now working on the conversion in Extensions. Besides that, the Java legacy IPC conversion for the communication between the C++ native and Java layers on Android has been on hold for a while because there are some issues with the WebView API in the content layer.
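
    As a quick check of the percentages quoted above, here is a small worked example. The helper name migrationProgress is made up for this sketch, and rounding down to a whole percentage is an assumption chosen because it reproduces the 98% and 24% figures.

```javascript
// Progress = migrated / (migrated + remaining), rounded down to a whole percent.
// Rounding down is an assumption made to match the figures in the post.
function migrationProgress(migrated, remaining) {
  const total = migrated + remaining;
  return { total, percent: Math.floor((migrated / total) * 100) };
}

const content = migrationProgress(293, 3);     // //content layer
const otherAreas = migrationProgress(49, 150); // Printing, Extensions, etc.
console.log(content, otherAreas);
```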
    The graph of the progress of the legacy IPC migration [2]
    The table of the progress of the legacy IPC migration [2]
    I shared this progress in a lightning talk at BlinkOn 14. You can find the video and the slides at the links.


    [1] OnionSoup 2.0
    [2] Migration to Legacy IPC messages

    by gyuyoung at June 24, 2021 03:23 AM

    June 21, 2021

    Ricardo García

    VK_EXT_multi_draw released for Vulkan

    The Khronos Group has released today a new version of the Vulkan specification that includes the VK_EXT_multi_draw extension. This new extension has been championed by Mike Blumenkrantz, contracted by Valve to work on Zink, an OpenGL implementation that’s part of Mesa and runs on top of Vulkan. Mike has been working very hard to make OpenGL-on-Vulkan performant and better, and came up with this extension to close an existing gap between the two APIs. As part of the ongoing collaboration between Igalia and Valve, I had the chance to participate in the release process by reviewing the specification text in depth, providing feedback and fixes, and writing a set of CTS tests to check conformance for drivers implementing the extension. As you can see in the contributors list, VK_EXT_multi_draw had input and feedback from more vendors. Special mention to Jason Ekstrand from Intel, who provided an initial review of the text, and Piers Daniell from NVIDIA, who was also involved since the early stages.

    Thanks to VK_EXT_multi_draw, Vulkan will have equivalents to the glMultiDrawArrays and glMultiDrawElements functions from OpenGL. They’re called vkCmdDrawMultiEXT and vkCmdDrawMultiIndexedEXT. These two new functions allow recording a batch of draw commands in a command buffer using a single call, and they can be used in situations where an application would be recording a high number of draws without changing state. Although Vulkan already had mechanisms that allowed applications to record batches of draw commands in the form of indirect draws, these need the array of draw parameters to reside in a GPU-accessible buffer. VK_EXT_multi_draw, on the other hand, lets applications provide arrays of draw parameters using CPU memory.

    vkCmdDrawMultiEXT is essentially equivalent to calling vkCmdDraw multiple times in a row, and vkCmdDrawMultiIndexedEXT does the same for vkCmdDrawIndexed. To improve application performance and reduce CPU overhead, Vulkan drivers are allowed and encouraged to omit checks for API function arguments provided by applications (these correctness checks are provided by the Vulkan Validation Layers mainly during application development), and thanks to mechanisms like primary and secondary command buffers, Vulkan makes it possible to prepare sequences of commands for the GPU to execute using multiple threads and CPU cores. In this situation, you may be wondering how much of an improvement the new functions provide apart from saving a few microseconds processing some function calls. In other words, what’s the practical difference between calling vkCmdDraw a thousand times and batching a thousand draws using vkCmdDrawMultiEXT?

    The answer is that most of the overhead of recording a draw command doesn’t come from having to call a function, but from the checks the implementation has to run when recording the command. These checks may not be related to correctness, but to additional actions and options that may need to be taken depending on the state of the command buffer at the moment the draw command is recorded. For example, see the calls to radv_before_draw when RADV processes a draw command (note: RADV is Mesa’s super nice free software Vulkan driver for AMD cards). These checks only need to run once when using the new functions. In benchmark-like scenarios using real drivers, Mike has been able to verify that, while the overhead varies per driver and some of them are lightweight and have minimal overhead, some mainstream drivers can double their draw call processing rate when using VK_EXT_multi_draw.
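
    The difference can be pictured with a toy model that simply counts how often the per-command state checks run. This is not Vulkan or driver code: ToyCommandBuffer and its methods are hypothetical, and real drivers structure this work in their own ways.

```javascript
// Toy model of command recording. NOT the Vulkan API or any real driver.
class ToyCommandBuffer {
  constructor() {
    this.stateChecks = 0; // how many times the pre-draw checks ran
    this.draws = 0;       // how many draws were recorded
  }
  // Stand-in for driver-side work done before recording a draw.
  beforeDraw() {
    this.stateChecks += 1;
  }
  // One draw per call: the checks run every time.
  draw(params) {
    this.beforeDraw();
    this.draws += 1;
  }
  // A batch of draws in one call: the checks run once for the whole batch.
  drawMulti(paramsArray) {
    this.beforeDraw();
    this.draws += paramsArray.length;
  }
}

const drawParams = Array.from({ length: 1000 }, (_, i) => ({ firstVertex: i * 3, vertexCount: 3 }));

const oneByOne = new ToyCommandBuffer();
drawParams.forEach((p) => oneByOne.draw(p));

const batched = new ToyCommandBuffer();
batched.drawMulti(drawParams);

console.log(oneByOne.stateChecks, batched.stateChecks);
```

    Both command buffers end up with the same thousand draws recorded, but the batched one paid for the state checks only once.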

    Mike has work-in-progress implementations for Mesa’s ANV and RADV drivers (the Vulkan drivers for Intel and AMD GPUs, respectively) which pass conformance and will hopefully land soon in Mesa’s main branch, and more drivers are expected to ship support for the extension in the near future.

    June 21, 2021 06:30 AM

    June 08, 2021

    Eric Meyer

    Back in the CSSWG

    As you might have noticed, I recently wrote about how I got started with CSS a quarter century ago,  what I’ve seen change over that long span of time, and the role testing has played in both of those things.

    After all, CSS tests are most of how I got onto the Cascading Style Sheets & Formatting Properties Working Group (as it was known then) back in the late 1990s.  After I’d finished creating tests for nearly all of CSS, I wrote the chair of the CSS&FP WG, Chris Lilley, about it.  The conversation went something like, “Hey, I have all these tests I’ve created, would the WG or browser makers be at all interested in using them?”  To which the answer was a resounding yes.

    Not too much later, I made some pithy-snarky comment on www-style about how only the Cool Kids on the WG knew what was going on with something or other, and I wasn’t one of them, pout pout.  At which point Chris emailed me to say something like, “We have this role called Invited Expert; how would you like to be one?”  To which the answer was a resounding (if slightly stunned) yes.

    I came aboard with a lot of things in mind, but the main thing was to merge my test suite with some other tests and input from smart folks to create the very first official W3C test suite.  Of any kind, not just for CSS.  It was announced alongside the promotion of CSS2 to Recommendation status in December 1998.

    I stayed an Invited Expert for a few years, but around 2003 I withdrew from the group for lack of time and input, and for the last 17-some years, that’s how it’s stayed.  Until now, that is: as of yesterday, I’ve rejoined the CSS Working Group, this time as an official Member, one of several representing Igalia.  And fittingly, Chris Lilley was the first to welcome me back.

    I’m returning to take back up the mantle I carried the first time around: testing CSS.  I intend to focus on creating Web Platform Test entries demonstrating new CSS features, clarifying changes to existing specifications, and filling in areas of CSS that are under-tested.  Maybe even to draft tests for things the WG is debating, to explore what a given proposal would mean in terms of real-world rendering.

    My thanks to Igalia for enabling my return to the CSS WG, as well as supporting my contributions yet to come.  And many thanks to the WG for a warm welcome.  I have every hope that I’ll be able to once more help CSS grow and improve in my own vaguely unique way.

    Have something to say to all that? You can add a comment to the post, or email Eric directly.

    by Eric Meyer at June 08, 2021 01:43 PM

    June 06, 2021

    Manuel Rego

    :focus-visible in WebKit - May 2021

    And again this is a new report about the work around :focus-visible in WebKit, you can check the previous ones at:

    As you might already know, this work is part of the Open Prioritization campaign by Igalia that has been funded by a lot of people. Thank you all for your support!

    The high-level summary is that the implementation in WebKit can be considered complete, and all the :focus-visible patches have been included in the latest Safari Technology Preview (125) as an experimental feature. Moreover, Igalia has been in conversations with Apple trying to find a way to enable the feature by default at some point.

    Implementation details

    As I’ve just mentioned, the implementation was finished by the end of April, and no more patches have landed since then. It passes most of the WPT tests. There are still some minor differences here and there (like some input types matching or not matching :focus-visible), but those issues have been considered fine, as they depend on browser-specific behavior.

    You can test this feature in Safari Technology Preview (since release 125) by enabling the runtime flag in the menu (Develop > Experimental Features > :focus-visible pseudo-class). Please play with it and report any issue you might find.

    Debate time

    During the last patch reviews more Apple engineers got interested in the feature, and there were a bunch of discussions about whether it would (or should) change the default behavior in WebKit, and how.

    So let’s start from the beginning: what is :focus-visible? A broad description is that it matches whenever the browser would natively show a focus ring. The typical example is buttons: in general, when people click a button they don’t expect to see a focus ring, and for that reason most browsers haven’t shown one for years. When an element is focused, browsers use some internal heuristics to decide whether or not to show a focus ring.

    However, buttons in Safari are different from other browsers, because Safari follows the Mac platform conventions. Buttons are not click focusable in Safari (though you can still focus them via keyboard with Option + Tab). As they don’t receive focus on click, they don’t even match :focus, so they never show a focus ring on mouse interactions. This behavior tries to mimic what happens on the Mac platform, but there are still some differences. The Mac platform standard, for example, allows you to be editing an input, click on a button, and keep editing the input, as the focus is still there. However, that’s not exactly what happens in Safari either: when you click the button, even if it doesn’t get the focus, the focus is gone from the input, so you cannot just continue editing it as you can on the platform. On top of that, an invisible navigation caret moves to that button on click, and further keyboard navigation starts from there. So it’s kind of similar to the platform, but with some nuances.

    This is only part of the problem: the web is full of things that are focusable, like <div tabindex="0"> elements. These elements have always matched (and still match) :focus by default, and have usually shown a focus ring when focused via mouse click. Web authors generally want to hide the focus ring when clicking on <div tabindex="0"> elements, and that’s why the current :focus-visible implementations don’t match in this case. Chrome and Firefox are using :focus-visible in the User Agent (UA) style sheet, so they don’t show a focus ring when clicking on such elements. However, Apple has expressed some concerns here that it might change the default focus indicator behavior in a way that might differ from their platform philosophy, and thus needs more review.

    During these conversations an idea showed up as a potential solution. What if we show a focus ring when users click on a generic <div tabindex="0">, but we don’t if that element has some specific role, e.g. <div tabindex="0" role="button">? This would give web authors the possibility to get their desired behavior by just adding a role to those elements.

    This would make <div tabindex="0" role="button"> work similarly to regular buttons on Mac, but there’s still one difference: those elements will still get the focus, so some use cases might get broken. James Craig came up with a scenario in which a user is scrolling the page with the spacebar, then they click on a <div tabindex="0" role="button">, and if they press the spacebar again, that wouldn’t keep scrolling the page anymore. And the user won’t know exactly why, as they haven’t seen any focus ring after the click (note that with the current :focus-visible implementation, the user will start to see a focus ring on that <div tabindex="0" role="button"> after pressing the spacebar).

    In that discussion James has shared an idea to add a new CSS property (or it could be an HTML attribute) that marks an element so it cannot receive focus via mouse click. That would make it possible for buttons in other browsers to work like they do in Safari, or for a <div tabindex="0"> to work like a Mac button too. However, this would be something new that would need to get implemented in all browsers, not just WebKit, and that would need to get discussed and agreed with the web community.

    In the same issue, Brian Kardell is proposing some alternatives, for example having some special parameter like :focus-visible(platform) (syntax to be defined) that could behave differently in Safari than in other browsers, so Safari can use it in the UA style sheet, while :focus-visible alone would work the same in all browsers.

    As you see there’s not a clear solution to all this discussion yet, but we’re following it closely and providing our feedback to try to reach some final proposal that makes everyone happy.

    Some numbers

    Let’s do a final review of the total numbers (as nothing has changed in May):

    • 26 PRs merged in WPT.
    • 27 patches landed in WebKit.
    • 9 patches landed in Chromium.
    • 2 PRs merged in CSS specs.
    • 1 PR merged in HTML spec.

    Wrapping up

    :focus-visible has been added to WebKit thanks to the support from many individual people and organizations that made it happen through the Open Prioritization experiment by Igalia. Once more big thanks to you all! 🙏

    In addition, the WPT test suite has been improved, now counting ~40 tests for this feature. Also, in January neither Firefox nor Chrome was using :focus-visible in the UA style sheet; however, they both use it there nowadays. Thus, doing the implementation in WebKit has helped to move this feature forward in different places.

    There is still the ongoing discussion about when or how this could be enabled by default in WebKit and eventually shipped in Safari. That conversation is moving and we hope there’ll be some kind of positive resolution so this feature can be enjoyed by web authors in all the browser engines. Igalia will keep being on top of the topic and pushing things forward to make it happen.

    Finally, thanks to everyone who has helped in the different conversations, reviews, etc. during these months.

    June 06, 2021 10:00 PM

    June 02, 2021

    Eric Meyer

    Ancestors and Descendants

    After my post the other day about how I got started with CSS 25 years ago, I found myself reflecting on just how far CSS itself has come over all those years.  We went from a multi-year agony of incompatible layout models to the tipping point of April 2017, when four major Grid implementations shipped in as many weeks, and were very nearly 100% consistent with each other.  I expressed delight and astonishment at the time, but it still, to this day, amazes me.  Because that’s not what it was like when I started out.  At all.

    I know it’s still fashionable to complain about how CSS is all janky and weird and unapproachable, but child, the wrinkles of today are a sunny park stroll compared to the jagged icebound cliff we faced at the dawn of CSS.  Just a few examples, from waaaaay back in the day:

    • In the initial CSS implementation by Netscape Navigator 4, padding was sometimes a void.  What I mean is, you could give an element a background color, and you could set a border, but if you added any padding, in some situations it wouldn’t take on the background color, allowing the background of the parent element to show through.  Today, we can recreate that effect like so:
      border: 3px solid red;
      padding: 0.5em;
      background-color: cornflowerblue;
      background-clip: content-box;

      Padding as a void.

      But we didn’t have background-clip in those days, and backgrounds weren’t supposed to act like that.  It was just a bug that got fixed a few versions later. (It was easier to get browsers to fix bugs in those days, because the web was a lot smaller, and so were the stakes.)  Until that happened, if you wanted a box with border, background, padding, and content in Navigator, you wrapped a <div> inside another <div>, then applied the border and background to the outer and the padding (or a margin, at that point it didn’t matter) to the inner.
    • In another early Navigator 4 version, pica math was inverted: Instead of 12 points per pica, it was set to 12 picas per point — so 12pt equated to 144pc instead of 1pc.  Oops.
    • Navigator 4’s handling of color values was another fun bit of bizarreness.  It would try to parse any string as if it were hexadecimal, but it did so in this weird way that meant if you declared color: inherit it would render in, as one person put it, “monkey-vomit green”.
    • Internet Explorer for Windows started out by only tiling background images down and to the right.  Which was fine if you left the origin image in the top left corner, but as soon as you moved it with background-position, the top and left sides of the element just… wouldn’t have any background.  Sort of like Navigator’s padding void!
    • At one point, IE/Win (as we called it then) just flat out refused to implement background-attachment: fixed.  I asked someone on that team point blank if they’d ever do it, and got just laughter and then, “Ah no.” (Eventually they relented, opening the door for me to create complexspiral and complexspiral distorted.)
    • For that matter, IE/Win didn’t inherit font sizes into tables.  Which would be annoying even today, but in the era of still needing tables to do page-level layout, it was a real problem.
    • IE/Win had so many layout bugs, there were whole sites dedicated to cataloging and explaining them.  Some readers will remember, and probably shudder to do so, the Three-Pixel Text Jog, the Phantom Box Bug, the Peekaboo Bug, and more.  Or, for that matter, hasLayout/zoom.
    • And perhaps most famous of all, Netscape and Opera implemented the W3C box model (2021 equivalent: box-sizing: content-box) while Microsoft implemented an alternative model (2021 equivalent: box-sizing: border-box), which meant apparently simple CSS meant to size elements would yield different results in different browsers.  Possibly vastly different, depending on the size of the padding and so on.  Which model is more sensible or intuitive doesn’t actually matter here: the inconsistency literally threatened the survival of CSS itself.  Neither side was willing to change to match the other — “we have customers!” was the cry — and nobody could agree on a set of new properties to replace height and width.  It took the invention of DOCTYPE switching to rescue CSS from the deadlock, which in turn helped set the stage for layout-behavior properties like box-sizing.
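
    To make the box model split in the last item concrete, here is a minimal sketch that computes how wide a box ends up under each model.  The function name renderedWidth and its parameter shape are made up for this example, and it assumes equal padding and border on both sides, with all values in pixels.

```javascript
// Compute content width and total (border-edge) width for a declared `width`.
// Assumes the same padding and border on the left and right, all in px.
function renderedWidth({ width, padding, border, boxSizing }) {
  const extras = 2 * padding + 2 * border;
  if (boxSizing === "border-box") {
    // `width` covers content + padding + border; content cannot go below zero.
    return { content: Math.max(0, width - extras), total: Math.max(width, extras) };
  }
  // content-box: `width` is the content only, padding and border are added on top.
  return { content: width, total: width + extras };
}

const declared = { width: 200, padding: 20, border: 5 };
const w3cModel = renderedWidth({ ...declared, boxSizing: "content-box" });
const altModel = renderedWidth({ ...declared, boxSizing: "border-box" });
console.log(w3cModel, altModel);
```

    With these numbers, the same declaration yields a box 250 pixels wide under one model and 200 under the other, which is exactly the kind of mismatch authors ran into.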

    I could go on.  I didn’t even touch on Opera’s bugs, for example.  There was just so much that was wrong.  Enough so that in a fantastic bit of code aikido, Tantek turned browsers’ parsing bugs against them, redirecting those failures into ways to conditionally deliver specific CSS rules to the browsers that needed them.  A non-JS, non-DOCTYPE form of browser sniffing, if you like — one of the earliest progenitors of feature queries.

    I said DOCTYPE switching saved CSS, and that’s true, but it’s not the whole truth.  So did the Web Standards Project, WaSP for short.  A group of volunteers, sick of the chaotic landscape of browser incompatibilities (some intentional) and the extra time and cost of dealing with them, who made the case to developers, browser makers, and the tech press that there was a better way, one where browsers were compatible on the basics like W3C specifications, and could compete on other features.  It was a long, wearying, sometimes frustrating, often derided campaign, but it worked.

    The state of the web today, with its vast capability and wide compatibility, owes a great deal to the WaSP and its allies within browser teams.  I remember the time that someone working on a browser — I won’t say which one, or who it was — called me to discuss the way the WaSP was treating their browser. “I want you to be tougher on us,” they said, surprising the hell out of me. “If we can point to outside groups taking us to task for falling short, we can make the case internally to get more resources.”  That was when I fully grasped that corporations aren’t monoliths, and formulated my version of Hanlon’s Razor: “Never ascribe to malice that which is adequately explained by resource constraints.”

    The original Acid Test.

    In order to back up what we said when we took browsers to task, we needed test cases.  This not only gave the CSS1 Test Suite a place of importance, but also the tests the WaSP’s CSS Action Committee (aka the CSS Samurai) devised.  The most famous of these is the first CSS Acid Test, which was added to the CSS1 Test Suite and was even used as an Easter egg in Internet Explorer 5 for Macintosh.

    The need for testing, whether acid or basic, lives on in the Web Platform Tests, or WPT for short.  These tests form a vital link in the development of the web.  They allow specification authors to create reference results for the rules in those specifications, and they allow browser makers to see if the code they’re writing yields the correct results.  Sometimes, an implementation fails a test and the implementor can’t figure out why, which leads to a discussion with the authors of the specification, and that can lead to clarifications of the specification, or to fixing flawed tests, or even to both.  Realize just how harmonious browser support for HTML and CSS is these days, and know that WPT deserves a big part of the credit for that harmony.

    As much as the Web Standards Project set us on the right path, the Web Platform Tests keep us on that path.  And I can’t lie, I feel like the WPT is to the CSS1 Test Suite much like feature queries are to those old CSS parser hacks.  The latter are much greater and more powerful than the former, but there’s an evolutionary line that connects them.  Forerunners and inheritors.  Ancestors and descendants.

    It’s been a real privilege to be present as CSS first emerged, to watch as it’s developed into the powerhouse it is today, and to be a part of that story — a story that is, I believe, far from over.  There are still many ways for CSS to develop, and still so many things we have yet to discover in its feature set.  It’s still an entrancing language, and I hope I get to be entranced for another 25 years.

    Thanks to Brian Kardell, Jenn Lukas, and Melanie Sumner for their input and suggestions.

    Have something to say to all that? You can add a comment to the post, or email Eric directly.

    by Eric Meyer at June 02, 2021 08:16 PM

    May 27, 2021

    Brian Kardell

    Stranger Than Fractions

    There's a new Math Working Group in the W3C (and I'm co-chairing). In this post, I'll share some information on that, why I really hope your organizations will join, as well as some personal reflections.

    Life is weird. If I could travel back in time and explain my life to a younger me, I couldn't even count the number of things that younger me would have just absolutely scoffed at in disbelief. Here's another one to add to the list: I'm co-chairing a new W3C Working Group focused on Math on the Web.

    I'm not going to offer all of the reasons this would be surprising to a younger me, but suffice to say it's a pretty long list. Even a me of only 2-3 years ago would probably be pretty incredulous. See, I'd never really given math on the web much thought until then. The thing that really brought it to my attention was that a company I knew to be full of some pretty smart people (Igalia, where I now work) was suddenly talking about how to add MathML to Chromium, and why this is a thing we should do. It came up before the W3C Technical Architecture Group and was getting some larger discussion around the interwebs. Particularly, there were connections to the Extensible Web Manifesto. I felt kind of compelled to really think about it and write something thoughtful about it myself. So in January 2019 (I didn't work for Igalia then) I wrote Harold Crick and the Web Platform.

    Based on this and some other observations that I was having about what an important role I thought Igalia could play in so many fundamentally important issues, I applied there (here). Since then I've tried to help "right the ship" and get math onto the web and on a stable footing that is integrated with the platform. I participated in the CG where we worked out MathML-Core, attempting to do just that. I helped write some tests, open (and resolve!) issues in a number of standards about how we integrate, draft a bit of spec, open implementation bugs (and ship changes in all browsers!), explain why this work is important from a significant number of angles (not the least of which is that it is societally important) in blog posts and talks (I won't link them all because there's already a lot of links here), prioritize work, draft an explainer, work through a TAG review, draft the new Working Group Charter and gain support for it (I'm very pleased to say that every browser vendor supported its creation - and Chrome was even the first one, if anyone has doubts).

    A few weeks after the charter was approved, I was asked to sign on as a co-chair to lead the MathML-Core portion (the bit that goes in browsers). Last week I was officially added and "approved by the director" as co-chair.

    Now for... you know... lots more important work as we try to reach a really great state of affairs.

    We can only really hope to do that, though, with help and good, diverse (from many angles) participation in the Working Group. If you're a W3C member, consider getting involved yourself. If not, still please comment on issues and review things. Importantly: Put aside any math phobias, doubts or pre-conceived notions. Even if your present-self is a little (or even a lot) incredulous at the idea that you can really help. Believe me, I get it. But that's wrong. Help and participation from people with backgrounds across the platform aren't only very welcome, they're necessary: There's a lot to do to ensure that the platform is as sensible and consistent as possible. Many disciplines need to coordinate to make sure that things stay on track, make sense and that important aspects don't get left behind.

    We'll be starting up the MathML-Core meetings soon (end of June or early July, TBD soon), focusing on actually moving some of this through the standards process and beginning to work together to answer remaining questions and make sure we're driving toward really good, interoperable, well-integrated math on the Web.

    We can do this.

    May 27, 2021 04:00 AM

    May 25, 2021

    Eric Meyer

    25 Years of CSS

    It was the morning of Tuesday, May 7th and I was sitting in the Ambroisie conference room of the CNIT in Paris, France having my mind repeatedly blown by an up-and-coming web technology called “Cascading Style Sheets”, 25 years ago this month.

    I’d been the Webmaster at Case Western Reserve University for just over two years at that point, and although I was aware of table-driven layout, I’d resisted using it for the main campus site.  All those table tags just felt… wrong.  Icky.  And yet, I could readily see how not using tables hampered my layout options.  I’d been holding out for something better, but increasingly unsure how much longer I could wait.

    Having successfully talked the university into paying my way to Paris to attend WWW5, partly by having a paper accepted for presentation, I was now sitting in the W3C track of the conference, seeing examples of CSS working in a browser, and it just felt… right.  When I saw a single word turned a rich blue and 100-point size with just a single element and a few simple rules, I was utterly hooked.  I still remember the buzzing tingle of excitement that encircled my head as I felt like I was seeing a real shift in the web’s power, a major leap forward, and exactly what I’d been holding out for.

    Page 4, HTML 3.2.

    Looking back at my hand-written notes (laptops were heavy, bulky, battery-poor, and expensive in those days, so I didn’t bother taking one with me) from the conference, which I still have, I find a lot that interests me.  HTTP 1.1 and HTML 3.2 were announced, or at least explained in detail, at that conference.  I took several notes on the brand-new <OBJECT> element and wrote “CENTER is in!”, which I think was an expression of excitement.  Ah, to be so young and foolish again.

    There are other tidbits: a claim that “standards will trail innovation” — something that I feel has really only happened in the past decade or so — and that “Math has moved to ActiveMath”, the latter of which is a term I freely admit I not only forgot, but still can’t recall in any way whatsoever.

    My first impressions of CSS, split for no clear reason across two pages.

    But I did record that CSS had about 35 properties, and that you could associate it with markup using <LINK REL=STYLESHEET>, <STYLE>…</STYLE>, or <H1 STYLE="…">.  There’s a question — “Gradient backgrounds?” — that I can’t remember any longer if it was a note to myself to check later, or something that was floated as a possibility during the talk.  I did take notes on image backgrounds, text spacing, indents (which I managed to misspell), and more.

    What I didn’t know at the time was that CSS was still largely vaporware.  Implementations were coming, sure, but the demos I’d seen were very narrowly chosen and browser support was minimal at best, not to mention wildly inconsistent.  I didn’t discover any of this until I got back home and started experimenting with the language.  With a printed copy of the CSS1 specification next to me, I kept trying things that seemed like they should work, and they didn’t.  It didn’t matter if I was using the market-dominating behemoth that was Netscape Navigator or the scrappy, fringe-niche new kid Internet Explorer: very little seemed to line up with the specification, and almost nothing worked consistently across the browsers.

    So I started creating little test pages, tackling a single property on each page with one test per value (or value type), each just a simple assertion of what should be rendered along with a copy of the CSS used on the page.  Over time, my completionist streak drove me to expand this smattering of tests to cover everything in CSS1, and the perfectionist in me put in the effort to make it easy to navigate.  That way, when a new browser version came out, I could run it through the whole suite of tests and see what had changed and make note of it.

    Eventually, those tests became the CSS1 Test Suite, and the way it looks today is pretty much how I built it.  Some tests were expanded, revised, and added, plus it eventually all got poured into a basic test harness that I think someone else wrote, but most of the tests — and the overall visual design — were my work, color-blindness insensitivity and all.  Those tests are basically what got me into the Working Group as an Invited Expert, way back in the day.

    Before that happened, though, with all those tests in hand, I was able to compile CSS browser support information into a big color-coded table, which I published on the CWRU web site (remember, I was Webmaster) and made freely available to all.  The support data was stored in a large FileMaker Pro database, with custom dropdown fields to enter the Y/N/P/B values and lots of fields for me to enter template fragments so that I could export to HTML.  That support chart eventually migrated to the late Web Review, where it came to be known as “the Mastergrid”, a term I find funny in retrospect because grid layout was still two decades in the future, and anyway, it was just a large and heavily styled data table.  Because I wasn’t against tables for tabular data.  I just didn’t like the idea of using them solely for layout purposes.

    You can see one of the later versions of Mastergrid in the Wayback Machine, with its heavily classed and yet still endearingly clumsy markup.  My work maintaining the Mastergrid, and articles I wrote for Web Review, led to my first book for O’Reilly (currently in its fourth edition), which led to my being asked to write other books and speak at conferences, which led to my deciding to co-found a conference… and a number of other things besides.

    And it all kicked off 25 years ago this month in a conference room in Paris, May 7th, 1996.  What a journey it’s been.  I wonder now, in the latter half of my life, what CSS — what the web itself — will look like in another 25 years.

    by Eric Meyer at May 25, 2021 03:30 PM

    Enrique Ocaña

    GStreamer WebKit debugging by using external tools (2/2)

    This is the last post of the series showing interesting debugging tools; I hope you have found it useful. Don’t miss the custom scripts at the bottom to process GStreamer logs, help you highlight the interesting parts and find the root cause of difficult bugs. Here are also the previous posts of the series:

    How to debug pkg-config

    When pkg-config finds the PKG_CONFIG_DEBUG_SPEW env var, it explains all the steps used to resolve the packages:

    PKG_CONFIG_DEBUG_SPEW=1 /usr/bin/pkg-config --libs x11

    This is useful to know why a particular package isn’t found and what the default values for PKG_CONFIG_PATH are when it’s not defined. For example:

    Adding directory '/usr/local/lib/x86_64-linux-gnu/pkgconfig' from PKG_CONFIG_PATH
    Adding directory '/usr/local/lib/pkgconfig' from PKG_CONFIG_PATH
    Adding directory '/usr/local/share/pkgconfig' from PKG_CONFIG_PATH
    Adding directory '/usr/lib/x86_64-linux-gnu/pkgconfig' from PKG_CONFIG_PATH
    Adding directory '/usr/lib/pkgconfig' from PKG_CONFIG_PATH
    Adding directory '/usr/share/pkgconfig' from PKG_CONFIG_PATH
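If you want to reuse those default directories programmatically, a minimal Python sketch could collect them from the debug output. It assumes the lines look exactly like the sample above; the function names are hypothetical and other pkg-config versions may word the message differently.

```python
import re
from typing import List

# Matches the debug lines shown above, e.g.
#   Adding directory '/usr/lib/pkgconfig' from PKG_CONFIG_PATH
ADDED_DIR = re.compile(r"^Adding directory '(.+)' from PKG_CONFIG_PATH$")


def default_search_dirs(debug_spew: str) -> List[str]:
    """Return the directories reported in the debug output, in order."""
    dirs = []
    for line in debug_spew.splitlines():
        match = ADDED_DIR.match(line.strip())
        if match:
            dirs.append(match.group(1))
    return dirs


def extend_pkg_config_path(custom: str, defaults: List[str]) -> str:
    """Append the default directories to a custom PKG_CONFIG_PATH, skipping duplicates."""
    result = [d for d in custom.split(":") if d]
    for directory in defaults:
        if directory not in result:
            result.append(directory)
    return ":".join(result)


SAMPLE = """\
Adding directory '/usr/local/lib/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/lib/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/share/pkgconfig' from PKG_CONFIG_PATH
"""
EXTENDED = extend_pkg_config_path("/opt/sysroot/usr/lib/pkgconfig", default_search_dirs(SAMPLE))
```

The `/opt/sysroot` prefix is only a placeholder for whatever custom path you have set.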

    If we have tuned PKG_CONFIG_PATH, maybe we also want to add the default paths. For example:

    export PKG_CONFIG_PATH=${SYSROOT}/usr/local/lib/pkgconfig:${SYSROOT}/usr/lib/pkgconfig
    # Add also the standard pkg-config paths to find libraries in the system
    # (the other default directories printed by the debug output can be appended the same way)
    export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/usr/local/lib/x86_64-linux-gnu/pkgconfig
    # This tells pkg-config where the "system" pkg-config dir is. This is useful when cross-compiling for another
    # architecture, to avoid pkg-config using the system .pc files and mixing host and target libraries
    export PKG_CONFIG_LIBDIR=${SYSROOT}/usr/lib
    # This could have been used for cross compiling:

    Man in the middle proxy for WebKit

    Sometimes it’s useful to use our own modified/unminified files with a 3rd party service we don’t control. Mitmproxy can be used as a man-in-the-middle proxy, but I haven’t tried it personally yet. What I have tried (with WPE) is this:

    1. Add an /etc/hosts entry to point the host serving the files we want to change to an IP address controlled by us.
    2. Configure a web server to provide the files in the expected path.
    3. Modify the ResourceRequestBase constructor to change the HTTPS requests to HTTP when the hostname matches the target:
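    ResourceRequestBase(const URL& url, ResourceRequestCachePolicy policy)
        : m_url(url)
        , m_timeoutInterval(s_defaultTimeoutInterval)
        , m_isAppBound(false)
    {
        // The hostname test (first operand of the condition) is missing from this listing.
        if (/* m_url's host matches the target hostname */
            && m_url.protocol().containsIgnoringASCIICase(String("https"))) {
            printf("### %s: URL %s detected, changing from https to http\n",
                __PRETTY_FUNCTION__, m_url.string().utf8().data());
            // The statement that switches m_url from https to http is also missing here.
        }
    }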

    💡 Pro tip: If you have to debug minified/obfuscated JavaScript code and don’t have a deobfuscated version to use in a man-in-the-middle fashion, use a JavaScript deobfuscator to recover meaningful variable names.

    Bandwidth control for a dependent device

    If your computer has a “shared internet connection” enabled in Network Manager and provides access to a dependent device, you can control the bandwidth offered to that device. This is useful to trigger quality changes on adaptive streaming videos from services out of your control.

    This can be done using tc, the Traffic Control tool from the Linux kernel. You can use this script to automate the process (edit it to suit your needs).

    Useful scripts to process GStreamer logs

    I use these scripts in my daily job to look for strange patterns in GStreamer logs that help me to find the cause of the bugs I’m debugging:

    • h: Highlights each expression in the command line in a different color.
    • mgrep: Greps (only) for the lines with the expressions in the command line and highlights each expression in a different color.
    • filter-time: Gets a subset of the log lines between a start and (optionally) an end GStreamer log timestamp.
    • highlight-threads: Highlights each thread in a GStreamer log with a different color. That way it’s easier to follow a thread with the naked eye.
    • remove-ansi-colors: Removes the color codes from a colored GStreamer log.
    • aha: ANSI-HTML-Adapter converts plain text with color codes to HTML, so you can share your GStreamer logs from a web server (eg: for bug discussion). Available in most distros.
    • gstbuffer-leak-analyzer: Analyzes a GStreamer log and shows unbalances in the creation/destruction of GstBuffer and GstMemory objects.
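The scripts themselves are not reproduced here, but the ideas behind filter-time and remove-ansi-colors can be sketched in a few lines of Python. This is not the author's implementation; it assumes each log line starts with a timestamp in the usual H:MM:SS.nnnnnnnnn form, and all function names and sample lines are made up for illustration.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumed GStreamer debug-log timestamp at the start of a line, e.g. "0:00:02.500000000".
TIMESTAMP = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d{9})\b")
ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")


def parse_timestamp(text: str) -> Optional[int]:
    """Return the leading timestamp in nanoseconds, or None if there is none."""
    match = TIMESTAMP.match(text)
    if match is None:
        return None
    hours, minutes, seconds, nanos = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1_000_000_000 + nanos


def filter_time(lines: Iterable[str], start: str, end: Optional[str] = None) -> Iterator[str]:
    """Yield the lines whose timestamp lies in [start, end]; `end` is optional."""
    start_ns = parse_timestamp(start)
    end_ns = parse_timestamp(end) if end is not None else None
    if start_ns is None or (end is not None and end_ns is None):
        raise ValueError("timestamps must look like 0:00:00.000000000")
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None or stamp < start_ns:
            continue  # lines without a timestamp are dropped in this sketch
        if end_ns is not None and stamp > end_ns:
            continue
        yield line


def remove_ansi_colors(line: str) -> str:
    """Strip ANSI color escape sequences from a log line."""
    return ANSI_COLOR.sub("", line)


# Made-up sample lines, only to show the shape of the input.
LOG = [
    "0:00:01.000000000  1234 0x1 INFO  cat a.c:1:f: first",
    "0:00:02.500000000  1234 0x1 DEBUG cat a.c:2:f: second",
    "0:00:04.000000000  1234 0x1 DEBUG cat a.c:3:f: third",
]
WINDOW = list(filter_time(LOG, "0:00:02.000000000", "0:00:03.000000000"))
```

Working in integer nanoseconds avoids floating-point rounding when comparing timestamps that differ only in the last digits.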

    by eocanha at May 25, 2021 06:00 AM

    May 18, 2021

    Enrique Ocaña

    GStreamer WebKit debugging by using external tools (1/2)

    In this new post series, I’ll show you how both existing and ad-hoc tools can be helpful to find the root cause of some problems. Here are also the older posts of this series in case you find them useful:

    Use strace to know which config/library files are used by a program

    If you’re going crazy because a program seems to ignore a config file it should be using, use strace to check which config files, libraries or other kinds of files the program is actually using. Use the grep rules you need to refine the search:

    $ strace -f -e trace=%file nano 2> >(grep 'nanorc')
    access("/etc/nanorc", R_OK)             = 0
    access("/usr/share/nano/javascript.nanorc", R_OK) = 0
    access("/usr/share/nano/gentoo.nanorc", R_OK) = 0
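When grep is not enough, the same output can be post-processed with a short script. The following Python sketch is an assumption-laden illustration (the function name and the third sample line are made up): it keeps only the calls that succeeded and returns the file paths they touched.

```python
import re
from typing import List

# One strace line: optional "[pid N] " prefix (printed with -f), syscall name,
# first quoted argument (the path) and the numeric return value.
STRACE_CALL = re.compile(
    r'^(?:\[pid\s+\d+\]\s+)?'
    r'(?P<syscall>\w+)\('
    r'[^"]*?"(?P<path>[^"]*)"'
    r'.*\)\s+=\s+(?P<ret>-?\d+)'
)


def files_used(strace_output: str, needle: str = "") -> List[str]:
    """Return paths from successful file syscalls whose path contains `needle`."""
    paths = []
    for line in strace_output.splitlines():
        match = STRACE_CALL.match(line.strip())
        if match is None:
            continue
        if int(match.group("ret")) < 0:
            continue  # the call failed, so the file was not actually used
        if needle in match.group("path"):
            paths.append(match.group("path"))
    return paths


SAMPLE = """\
access("/etc/nanorc", R_OK)             = 0
access("/usr/share/nano/javascript.nanorc", R_OK) = 0
access("/home/user/.nanorc", R_OK) = -1 ENOENT (No such file or directory)
"""
USED = files_used(SAMPLE, "nanorc")
```

Dropping failed calls is a design choice: a path the program merely probed for (and did not find) is usually not the config it ended up reading.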

    Know which process is killing another one

    First, try attaching strace to the process that is being killed, for example strace -e trace=signal -p 1234 (where 1234 is its PID).

    If that doesn’t work (eg: because it’s being killed with the uncatchable SIGKILL signal), then you can resort to modifying the kernel source code (signal.c) to log the calls to kill():

    SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
    {
        struct task_struct *tsk_p;
        /* ... original declarations of kill() ... */

        /* Log SIGKILL. The parentheses matter: == binds tighter than & in C. */
        if ((sig & 0x1F) == 9) {
            tsk_p = find_task_by_vpid(pid);
            if (tsk_p) {
                printk(KERN_DEBUG "Sig: %d from pid: %d (%s) to pid: %d (%s)\n",
                    sig, current->pid, current->comm, pid, tsk_p->comm);
            } else {
                printk(KERN_DEBUG "Sig: %d from pid: %d (%s) to pid: %d\n",
                    sig, current->pid, current->comm, pid);
            }
        }

        /* ... rest of the original kill() body, unchanged ... */
    }

    Wrap gcc/ld/make to tweak build parameters

    If you ever find yourself with little time in front of a stubborn build system and, no matter what you try, you can’t get the right flags to the compiler, think about putting something (a wrapper) between the build system and the compiler. Example for g++:

    #!/bin/bash
    main() {
        # Build up arg[] array with all options to be passed
        # to subcommand.
        i=0
        for opt in "$@"; do
            case "$opt" in
            -O2) ;; # Removes this option
            *)
                arg[i]="$opt" # Keeps the others
                i=$((i+1))
                ;;
            esac
        done
        EXTRA_FLAGS="-O0" # Adds extra option
        echo "g++ ${EXTRA_FLAGS} ${arg[@]}" # >> /tmp/build.log # Logs the command
        /usr/bin/ccache g++ ${EXTRA_FLAGS} "${arg[@]}" # Runs the command
    }
    main "$@"

    Make sure that the wrappers appear earlier than the real commands in your PATH.

    The make wrapper can also call remake instead. Remake is fully compatible with make but has features to help debugging compilation and makefile errors.

    Analyze the structure of MP4 data

    The ISOBMFF Box Structure Viewer online tool allows you to upload an MP4 file and explore its structure.

    by eocanha at May 18, 2021 06:00 AM

    May 17, 2021

    Delan Azabani

    Chromium spelling and grammar features

    Back in September, I wrote about my wonderful internship with Igalia’s web platform team. I’m thrilled to have since joined Igalia full-time, starting in the very last week of last year. My first project has been implementing the new CSS spelling and grammar features in Chromium. Life has been pretty hectic since Aria and I moved back to Perth, but more on that in another post. For now, let’s step back and review our progress.

    The squiggly lines that indicate possible spelling or grammar errors have been a staple of word processing on computers for decades. But on the web, these indicators are powered by the browser, which doesn’t always have the information needed to place and render them most appropriately. For example, authors might want to provide their own grammar checker (placement), or tweak colors to improve contrast (rendering).

    To address this, the CSS pseudo and text decoration specs have defined new pseudo-elements ::spelling-error and ::grammar-error, allowing authors to style those indicators, and new text-decoration-line values spelling-error and grammar-error, allowing authors to mark up their text with the same kind of decorations as native indicators.


    Current status

    I’ve sent an Intent to Prototype, as well as requests for positions from Mozilla and Apple.

    I’ve landed a patch that paves the way for ::spelling-error + ::grammar-error support internally, and I’m hopefully(!) around halfway done with implementing both the new painting rules and the new processing model.

    The spec updates, led by Florian Rivoal, were largely done by the end of 2017. Since this is the first implementation of both the features themselves and much of the underlying highlight specs, there were always going to be questions and rough edges to be clarified.

    Two issues were raised before we even started; I’ve since sent in another two, and I’ll need to raise at least two more by the time we’re done. I’ve also landed three WPT patches, including three new tests and fixes for countless more.


    In the course of my work on these features, I’ve already fixed at least two other bugs that weren’t of my own creation, and reported four more:

    1171741: Selecting text causes emphasis marks to be painted twice
    1172177: Erroneous viewport-size-dependent clipping of some text shadows
    1176649: text-shadow paints with incorrect offset for vertical scripts in vertical writing modes
    1180068: text-shadow erroneously paints over text proper in mixed upright/sideways fragments

    CJK CSS unification

    My colleague Rego noticed that the squiggly lines for spelling and grammar errors look slightly different to a naïve red or green wavy underline. How, why, and should we unify squiggly and wavy lines? Some further investigation revealed that the two kinds of decorations are drawn very differently with completely separate code paths.

    non-macOS (demo0)

    Left (bolder text): nearest wavy decorations.
    Right (lighter text): native squiggly lines.

    The case for unifying squiggly and wavy lines became a lot more complicated too. For example, our squiggly lines are actually dots on macOS. More specifically, they are round dots with an alpha gradient, matching the platform’s native controls. These details are beyond what can be expressed in terms of a dotted underline, so if we were to unify by making squiggly lines equivalent to such a decoration, we would lose that benefit.

    macOS (demo0)

    Left (bolder text): nearest dotted decorations.
    Right (lighter text): native squiggly lines.

    The spec doesn’t require that spelling-error and grammar-error lines be expressible in terms of other decoration lines, so unification won’t block shipping. I decided it would be best to revisit this once I landed some patches and familiarised myself with the code.

    Fifteen years in the making

    ::spelling-error and ::grammar-error are defined as highlight pseudo-elements, together with ::selection and ::target-text. The spec’s processing model and rendering rules are both very different to how ::selection (or ::target-text) has been implemented in any browser so far. Now that we’re implementing more than just the first couple of pseudos, we really ought to comply with the new spec, which complicates our job somewhat.

    I’ll talk about ::selection a fair bit below, because most of the spec discussion I found happened before the others were defined, going back as far as 2006. Highlight pseudos like ::selection are tricky because they aren’t tree-abiding: the selected parts of the document aren’t generally a child of any one element.

    But even then, how hard could it be?

    • What is ::selection? How does it interact with other pseudo-elements? Is it a singleton, or does each element have a ::selection pseudo-element? How do we reconcile the ::selection “tree”, if any, with the element tree?
    • Can child ::selection styles override parent ::selection styles? What about the child’s “real element” styles? How exactly do parent ::selection styles propagate to child ::selection styles? Do we use a tweaked cascade or tweaked inheritance?
    • What happens when authors specify ::selection styles that affect layout? What about styles that rely on how ::selection relates to the element tree, like outline or translucent background-color?
    • What happens when child ::selection styles specify only color or only background-color but not both? Does the other inherit as usual? If we want a special case tying these two properties together, how does it interact with other properties?
    • Does the ::selection background-color paint over text, or under it? What about “replaced” content like images? If we paint over text, do we need to make the author’s color translucent, and if so, how?
    • Is text in the ::selection color painted in addition to, or instead of, the same text in its original color? What about background-color?
    • Can the default UA stylesheet describe the platform’s ::selection style? How?
    • How naughty were browsers that implemented ::selection without a -vendor-prefix before it was standardised? Are vendor prefixes even a good idea?
    • Most importantly, how do we introduce a new processing model and rendering rules without breaking existing content?

    For answers to most of these questions, check out my notes5.

    By the time I started to understand the problem space, two weeks had passed.

    Pretty intense for my very first foray into www-style!

    Highlight painting

    The current spec isolates each highlight pseudo into an “overlay”, and allows each of them to have independent backgrounds, shadows, and other decorations.

    Like other browsers, Chromium implemented an older model, where matching ::selection rules are only used to change things like the text color and shadows (except for background-color, which has always been independent).

    But the closer I looked, the deeper the problems ran.

    Shadows and backgrounds

    everyone’s shadow code is complete made-up horseshit but mostly i blame the fact that someone decided to add ‘shadow’ to the (very small!) special list of styles ::selection could modify

    — Gankra, 2021

    I whipped up a quick demo3 with some backgrounds and shadows, and the result was… not good. “So the originating text shadow (yellow) paints over the ::selection background (grey), except when it paints under, and sometimes it even paints over the text (black)? Why is the ::selection shadow clipped to the ::selection background? What?”

    highlight-painting-001.html (based on demo3)

    Some of these were easier to fix than others. To fix backgrounds, we essentially push the code that paints the background waaaaay down NGTextFragmentPainter, so that it’s before painting the selected text but after pretty much everything else. We then fix shadows similarly, reordering the text paints from “before with shadows, after with shadows, selected with shadows” to an order that keeps shadows behind text.

    These initial fixes are now live in Chromium 90, but we still need to deal with the ::selection shadow clipping. What’s up with that?

    Shadow clipping

    The weird shadow clipping was a side effect of how we ensured that the ::selection text color changes exactly where the ::selection background starts:

    1. we clip out and paint the selected text in original color, then
    2. we clip (in) and paint the selected text in ::selection color.

    This is useful for both subtle reasons, like ink overflow…

    …and not so subtle reasons, like allowing the user to clearly and precisely select graphemes in ligature-heavy languages like Sorani. In this example, یلا is three letters (îla), but only two glyphs. This isn’t explicitly required by any spec, but it’s definitely intentional.

    If you use Chromium, you may notice that the ref for that demo appears to select more text. What we’re really doing with ::selection painting is pretending that ligatures are divisible into horizontal parts and guessing how wide each part is. Current font technology just doesn’t provide the metadata to do this more “correctly”.
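As a rough illustration of "pretending that ligatures are divisible", the sketch below slices one ligature glyph into equal-width horizontal parts, one per selectable unit. The even split is an assumption made for this example (the text only says the widths are guessed), the function names are hypothetical, and right-to-left ordering is ignored for simplicity.

```python
def ligature_parts(glyph_left, advance, part_count):
    """Split one ligature glyph into `part_count` equal-width horizontal slices.

    Returns a list of (left, right) x-coordinates, one pair per slice.
    """
    if part_count < 1:
        raise ValueError("part_count must be at least 1")
    edges = [glyph_left + advance * i / part_count for i in range(part_count + 1)]
    return list(zip(edges[:-1], edges[1:]))


def selection_clip(glyph_left, advance, part_count, first, last):
    """Horizontal clip range covering slices `first`..`last` (inclusive)."""
    parts = ligature_parts(glyph_left, advance, part_count)
    return (parts[first][0], parts[last][1])


# A hypothetical two-letter ligature drawn as one glyph that is 24 units wide.
SLICES = ligature_parts(0, 24, 2)
SECOND_LETTER = selection_clip(0, 24, 2, 1, 1)
```

Selecting only the second letter then means clipping the ::selection paint to the second slice, even though the font itself has no notion of where one letter ends inside the glyph.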

    Firefox always allows splitting ligature styles, including with real elements, and there are at least two good arguments in favour of this approach. Chromium has (reasonably) decided that while the technique is ok for ::selection, perhaps even desirable, it’s not the way to go for ordinary markup.

    But anyway, back to the point at hand. text-shadow means “paint the text again, under the text proper, with these colors and offsets”. We want to clip the ::selection shadow for the same reasons we clip the text proper in ::selection color, but the coordinates need to be offset for each shadow. That we don’t is the bug here.

    When painting the ::selection shadow (blue), we need to clip the canvas to the dotted line, but we were actually clipping to the solid line.

    Consensus seems to be that not doing so is undesirable, and in theory, fixing this would be straightforward, but in practice… 😵‍💫

    The first confounding factor was that NGTextFragmentPainter and NGTextPainter were… a tangled mess. Even the owners weren’t sure this was the most helpful architecture:

    // TODO(layout-dev): Does this distinction make sense?
    class CORE_EXPORT NGTextPainter : public TextPainterBase { /* ... */ }

    Years of typographical features have been duct-taped on without a systemic approach to managing complexity, including decorations, shadows, ellipses, background clipping, RTL text, vertical text, ruby text, emphasis marks, print rendering, drag-and-drop rendering, selections, highlights, “markers”, and SVG features like stroke and fill.

    A third of the logic was in Text­Painter­Base, so good luck not breaking legacy. Shadows were painted with a now-deprecated Skia feature called a Draw­Looper, which allows you to repeat a procedure a bunch of times with different tweaks, such as canvas transformations and color changes. It’s almost specifically designed for shadows, but it’s technically possible to repeat procedures that have nothing to do with drawing text.

    // SkCanvas* canvas;
    // SkPaint paint;
    // SkScalar x, y;
    // sk_sp<SkTextBlob> blob;
    // sk_sp<SkDrawLooper> looper;
    looper->apply(canvas, paint, [&blob, x, y](SkCanvas* c, const SkPaint& p) {
        // procedure to be looped
        c->drawTextBlob(blob, x, y, p);
    });

    My solution was based on the observation that loopers draw offset shadows by “moving” the canvas with a transform before each iteration, but transforming the canvas only affects subsequent operations. We were clipping the canvas once, before running the looper, but if we could somehow reclip the canvas after each transform, the clip region would “move” together with each shadow, and we wouldn’t even need to change the coordinates!
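    To make the ordering argument concrete, here is a minimal sketch in Python. ToyCanvas is a hypothetical stand-in for a stateful canvas, not Skia's or cc's API, and it models only a translation and a single clip. It shows why the same selection coordinates produce a clip that "moves" with the shadow when the clip is applied after the transform.

```python
class ToyCanvas:
    """Hypothetical stateful canvas: tracks a translation and one clip rect."""

    def __init__(self):
        self.tx = 0
        self.ty = 0
        self.clip = None  # (left, top, right, bottom) in device coordinates

    def translate(self, dx, dy):
        # Only affects operations issued after this call.
        self.tx += dx
        self.ty += dy

    def clip_rect(self, rect):
        # The clip is resolved against the transform in effect right now.
        left, top, right, bottom = rect
        self.clip = (left + self.tx, top + self.ty,
                     right + self.tx, bottom + self.ty)


selection = (10, 0, 30, 20)   # assumed selection rect
shadow_offset = (3, 3)        # assumed shadow offset


def clip_once_then_loop():
    canvas = ToyCanvas()
    canvas.clip_rect(selection)        # old: clip before the looper runs
    canvas.translate(*shadow_offset)   # looper moves the canvas for the shadow
    return canvas.clip


def reclip_inside_loop():
    canvas = ToyCanvas()
    canvas.translate(*shadow_offset)   # looper moves the canvas first
    canvas.clip_rect(selection)        # new: clip again after the transform
    return canvas.clip
```

    In this toy model the first function leaves the clip at the unshifted selection rect, while the second yields a clip shifted by the shadow offset, even though both pass the same selection coordinates.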

    I prototyped a fix that seemed to handle everything I threw at it, and informed by the challenges that involved, I also refactored out the code for selections, highlights, and markers. Stephen and I decided that adding clipping as a fixed function to Draw­Looper made more sense than adding it to the procedure. At the time, this was true.

    The prototype made my most complex test case (at the time) pass, with the exception of ink overflow color, which was a limitation of my ref (both renderings are acceptable).

    I then took a couple weeks off to move to Perth.

    Vertical vertigo

    “Wait… isn’t the original purpose of vertical writing modes, you know, vertical scripts? I wonder if those work as well as horizontal scripts being rotated sideways…”

    “…what? Let’s see what they look like without my patch…”


    Left: vertical script in vertical-rl, with patch.
    Right: same test case, without patch.

    Notice how the shadows are offset in the wrong direction. They should be painted southeast of the text proper, but were being painted northeast.

    When painting a text fragment with a vertical writing-mode, we rotate the canvas by 90° cw (or ccw for sideways-lr). This is good for horizontal scripts like Latin or Sorani, because they usually need to be painted sideways.

    Except when text-orientation is upright, which overrides the usual behaviour.

    But for vertical scripts like Han, we usually need to keep the canvas unrotated. A single text fragment can contain text in multiple scripts, so we actually achieve this by rotating the canvas back for the parts in vertical scripts.

    Except when text-orientation is sideways, which overrides the usual behaviour.

    Note that the way text-orientation is defined means that none of its values are actually supposed to affect the rendering of vertical-only scripts like Mongolian. I would suggest not thinking about this too hard.

    So far so good right?

    This is what we were doing when painting text with vertical scripts and shadows (example limited to a single script and single shadow for simplicity):

    1. Let space be our original “physical” coordinate space
    2. Let offset be the shadow’s offset in space
    3. Let selection be the selection rect coordinates in space
    4. Vertical writing mode, so rotate canvas by 90°, yielding space′
    5. Let offset′ be the result of mapping offset into space′
    6. Let selection′ be the result of mapping selection into space′
    7. Old: clip the canvas to selection′
    8. Configure a Draw­Looper that will:
      • move the canvas by offset′
      • New: clip the canvas to selection′
      • draw the text for the shadow
    9. Vertical script, so rotate canvas back by 90°, yielding space″
    10. Run the Draw­Looper, which carries out the steps above

    The looper is told to move and clip the canvas to offset′ and selection′, which are coordinates in space′, but when it eventually tries to do that, the canvas is in space″.

    offset′ being in the wrong space is why shadows have always been painted in the wrong place for vertical scripts. By reordering the clip to selection′ so it happens after the rotation to space″, we were now clipping the canvas to the wrong coordinates, which in turn made the text invisible in our demo6!
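    The steps above can be sketched with plain vectors. This is an illustration in Python under simplifying assumptions (y grows downward, the rotation is exactly 90° clockwise, and space″ coincides with the original space); the function names are made up for this example and are not Chromium code.

```python
def rotated_to_physical(v):
    """Map a vector from a canvas rotated 90° cw (space′) to physical space."""
    x, y = v
    return (-y, x)


def physical_to_rotated(v):
    """Inverse mapping: express a physical vector in space′ coordinates."""
    x, y = v
    return (y, -x)


def direction(v):
    """Compass direction of a vector in y-down physical coordinates."""
    x, y = v
    return ("south" if y > 0 else "north") + ("east" if x > 0 else "west")


offset = (2, 2)                              # shadow offset in space: southeast
offset_prime = physical_to_rotated(offset)   # step 5: offset′ in space′

# If the looper ran while the canvas was still in space′, offset′ would
# map back to the intended physical offset.
intended = rotated_to_physical(offset_prime)

# But the looper runs after the canvas is rotated back (space″), so the
# numbers in offset′ are interpreted as if they were physical coordinates.
actual = offset_prime
```

    Here direction(intended) is southeast and direction(actual) is northeast, which matches the misplaced shadows in the screenshot. The same mismatch applies to selection′ once the clip is moved into the looper.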


    Fixing this again proved harder than it seemed on the surface, because text painting in Chromium involves the coordination of four components: paint, shaping, cc, and Skia.

    In paint, the text painters are given a “fragment” of text to be painted in a given style. They know the writing mode, because that’s part of the style, but they know very little about the text itself. The first rotation (for the vertical writing mode) happens here, and we configure the Draw­Looper here (except for its procedure, which we pass in shaping).

    In shaping, we find the best glyphs for each character, and determine what scripts the text fragment is made of, then split the text into “blobs”. The second rotation (for the vertical script) happens here, and we throw in a skew transform too if the text we’re painting is oblique (or fake italic, which is again known only to shaping).

    In cc, we expose a Skia-like API that can either dispatch to Skia immediately or collect operations into a queue for later. Draw­Looper is in the process of being moved here, because the Skia maintainers don’t want it.

    Skia provides a stateful canvas, which more or less creates visible output.

    With each canvas transform, existing coordinates need to be remapped into the new space before they can be used again, and we were doing them imperatively in two different components. Worse still, while layout (ng) — the phase that happens before paint — uses the type system to enforce correct handling of coordinates (e.g. Physical­Offset, Logical­Rect), the same is not true for paint onwards.

    Everything is in Physical­Rect and friends, often erroneously, or in “untyped” coordinates like Float­Rect or Sk­Rect. In one case, a Physical­Offset is used in both physical and non-physical (rotated for writing-mode) spaces, to refer to two different points at different corners of the text. Here… let me illustrate.

    When painting horizontal text in vertical-rl, we rotate the canvas 90° cw around A so that the text’s left descent corner lands on B. The left ascent corner moves from B to C.

    That single variable was used to intentionally refer to both B and C at different times in a function, because the coordinates for B in space happen to be numerically the same as those for C in space′. aaaa­aaaA­AAAA­AAAA­AAAA-


    To be fair, each of these flaws has a reasonable explanation.

    Layout is a confusing place where we constantly need to deal with different coordinate spaces, so ideally we would iron everything out so that paint can work purely in physical space. Half the point of types like Logical­Rect is to provide getters and setters for concepts like “inline start” and “block end”.

    For most of the things we paint, this is ok, even desirable. Rects like ::selection backgrounds must be painted in physical space, so we can round the coordinates to integers for crisp edges. Text is the only exception: the history of computer typography means that vertical text is, to some extent, seen internally as rotated horizontal text.

    Draw­Looper is handy for painting shadows, and it might[citation needed] even reduce serialisation overhead in cc. But the way we currently configure them, baking coordinates into them before shaping, makes it even harder to handle vertical text correctly.

    Last but not least, Chromium’s pre-standard text painting order was “all rects for highlights and markers first, then all texts”. This made the imperative canvas rotations almost acceptable, if you ignore the shadow bugs, because we didn’t need to rotate the canvas back and forth nearly as many times.

    Once I moved to Perth, I spent over three weeks trying to find a systemic solution to these problems, but I just wasn’t getting anywhere meaningful. In the interests of working a bit more breadth-first and avoiding burnout, I’ve shelved highlight painting for now.

    Processing model

    Let’s return to how computed styles for highlight selectors should work.

    The consensus was that parent ::selection styles should somehow propagate to the ::selection styles of their children, so authors can use their existing CSS skills to define both general ::selection styles and more specific styles under certain elements. This was unlike all existing implementations, where the only selector that worked the way you would expect was ::selection, that is to say, *::selection.

    At first, that “somehow” was by tweaking the cascade to take parent ::selection rules into account. Emilio raised performance concerns with this, so the spec was changed, instead tweaking inheritance to make ::selection styles inherit from parent ::selection styles (and never from originating or “real” elements).

    This is what I’m working on now. I’ve got a patch that gets most of the way, first by fixing inherit, then by fixing unset, then with a couple more fixes for styles where the cascade doesn’t yield any value, but there are still a few kinks ahead:

    • impl work has raised at least three questions that need CSSWG clarification;
    • we need to optimise it, maybe more than before, to avoid perf regressions;
    • we still need to check if style invalidation works correctly; and
    • we probably want new devtools features to visualise highlight inheritance.

    Stay tuned!

    Beyond my colleagues at Igalia, special thanks go to Stephen, Rune, Koji (Google), and Emilio (Mozilla) for putting up with all of my questions, not to mention Florian and fantasai from the CSSWG, plus Gankra (Mozilla) for her writing about text rendering, which has proved both inspiring and reassuring.

    May 17, 2021 10:30 AM

    May 13, 2021

    Eric Meyer

    Adding Pandoc Arguments in BBEdit

    Thanks to the long and winding history of my blog, I write posts in Markdown in BBEdit, export them to HTML, and paste the resulting HTML into WordPress. I do it that way because switching WordPress over to auto-parsing Markdown in posts causes problems with rendering the markup of some posts I wrote 15-20 years ago, and finding and fixing every instance is a lengthy project for which I do not have the time right now.

    (And I don’t use the block editor because whenever I use it to edit an old post, the markup in those posts get mangled so much that it makes me want to hurl. This is as much the fault of my weird idiosyncratic bespoke-ancient setup as of WordPress itself, but it’s still super annoying and so I avoid it entirely.)

    Anyway, the point here is that I write Markdown in BBEdit, and export it from there. This works okay, but there have always been things missing, like a way to easily add attributes to elements like my code blocks. BBEdit’s default Markdown exporter, CommonMark, sort of supports that, except it doesn’t appear to give me control over the class names: telling it I want a class value of css on a preformatted block means I get a class value of language-css instead. Also it drops that class value on the code element it inserts into the pre element, instead of attaching it directly to the pre element. Not good, unless I start using Prism, which I may one day but am not yet.

    Pandoc, another exporter you can use in BBEdit, offers much more robust and yet simple element attribute attachment: you put {.class #id} or whatever at the beginning of any element, and you get those things attached directly to the element. But by default, it also wraps elements around, and adds attributes to, the pre element, apparently in anticipation of some other kind of syntax highlighting.

    I spent an hour reading the Pandoc man page (just kidding, I was actually skimming, that’s the only way I could possibly get through all that in an hour) and found the --no-highlight option. Perfect! So I dropped into Preferences > Languages > Language-specific settings:Markdown > Markdown, set the “Markdown processor” dropdown to “Custom”, and filled in the following:

    Command pandoc
    Arguments --no-highlight

    Done and done. I get a more powerful flavor of Markdown in an editor I know and love. It’s not perfect — I still have to manually tweak table markup by hand, for example — but it’s covering probably 95% of my use cases for writing blog posts.

    Now all I need to do is find a Pandoc Markdown option or extensions or whatever that keeps it from collapsing the whitespace between elements in its HTML output, and I’ll be well and truly satisfied.

    Have something to say to all that? You can add a comment to the post, or email Eric directly.

    by Eric Meyer at May 13, 2021 07:08 PM

    Andy Wingo

    cross-module inlining in guile

    Greetings, hackers of spaceship Earth! Today's missive is about cross-module inlining in Guile.

    a bit of history

    Back in the day... what am I saying? I always start these posts with loads of context. Probably you know it all already. 10 years ago, Guile's partial evaluation pass extended the macro-writer's bill of rights to Schemers of the Guile persuasion. This pass makes local function definitions free in many cases: if they should be inlined and constant-folded, you are confident that they will be. peval lets you write clear programs with well-factored code and still have good optimization.

    The peval pass did have a limitation, though, which wasn't its fault. In Guile, modules have historically been a first-order concept: modules are a kind of object with a hash table inside, which you build by mutating. I speak crassly but that's how it is. In such a world, it's hard to reason about top-level bindings: what module do they belong to? Could they ever be overridden? When you have a free reference to a, and there's a top-level definition of a in the current compilation unit, is that the a that's being referenced, or could it be something else? Could the binding be mutated in the future?

    During the Guile 2.0 and 2.2 cycles, we punted on this question. But for 3.0, we added the notion of declarative modules. For these modules, bindings which are defined once in a module and which are not mutated in the compilation unit are declarative bindings, which can be reasoned about lexically. We actually translate them to a form of letrec*, which then enables inlining via peval, contification, and closure optimization -- in descending order of preference.

    The upshot is that with Guile 3.0, top-level bindings are no longer optimization barriers, in the case of declarative modules, which are compatible enough with historic semantics and usage that they are on by default.

    However, module boundaries have still been an optimization barrier. Take (srfi srfi-1), a little utility library on lists. One definition in the library is xcons, which is cons with arguments reversed. It's literally (lambda (cdr car) (cons car cdr)). But does the compiler know that? Would it know that (car (xcons x y)) is the same as y? Until now, no, because no part of the optimizer will look into bindings from outside the compilation unit.

    mr compiler, tear down this wall

    But no longer! Guile can now inline across module boundaries -- in some circumstances. This feature will be part of a future Guile 3.0.8.

    There are actually two parts of this. One is the compiler can identify a set of "inlinable" values from a declarative module. An inlinable value is a small copyable expression. A copyable expression has no identity (it isn't a fresh allocation), and doesn't reference any module-private binding. Notably, lambda expressions can be copyable, depending on what they reference. The compiler then extends the module definition that's residualized in the compiled file to include a little procedure that, when passed a name, will return the Tree-IL representation of that binding. The design of that was a little tricky; we want to avoid overhead when using the module outside of the compiler, even relocations. See compute-encoding in that module for details.

    With all of that, we can call ((module-inlinable-exports (resolve-interface '(srfi srfi-1))) 'xcons) and get back the Tree-IL equivalent of (lambda (cdr car) (cons car cdr)). Neat!

    The other half of the facility is the actual inlining. Here we lean on peval again, causing <module-ref> forms to trigger an attempt to copy the term from the imported module to the residual expression, limited by the same effort counter as the rest of peval.

    The end result is that we can be absolutely sure that constants in imported declarative modules will inline into their uses, and fairly sure that "small" procedures will inline too.
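    As a rough illustration of the two halves working together, here is a toy model in Python. It is not Guile's peval or Tree-IL: terms are nested tuples, INLINABLE is a hypothetical table standing in for a module's inlinable exports, and the only folding rule is car/cdr of cons applied to side-effect-free variable references.

```python
# Hypothetical table of inlinable exports: name -> (lambda params body).
INLINABLE = {"xcons": ("lambda", ("cdr", "car"), ("cons", "car", "cdr"))}


def substitute(term, env):
    """Replace parameter references in term; operator names are left alone."""
    if isinstance(term, str):
        return env.get(term, term)
    head, *args = term
    return (head, *(substitute(a, env) for a in args))


def simplify(term):
    if isinstance(term, str):
        return term
    head, *args = term
    args = [simplify(a) for a in args]
    # Copy the imported procedure's body in place of the call.
    if head in INLINABLE:
        _, params, body = INLINABLE[head]
        return simplify(substitute(body, dict(zip(params, args))))
    # Fold (car (cons a b)) to a and (cdr (cons a b)) to b.
    # Only valid here because the toy arguments are plain variables.
    if (head in ("car", "cdr") and len(args) == 1
            and isinstance(args[0], tuple) and args[0][0] == "cons"):
        return args[0][1] if head == "car" else args[0][2]
    return (head, *args)


result = simplify(("car", ("xcons", "x", "y")))
```

    With the table above, result is "y": once the body of xcons is visible across the module boundary, (car (xcons x y)) reduces the way a local definition would.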

    caveat: compiled imported modules

    There are a few caveats about this facility, and they are sufficiently sharp that I should probably fix them some day. The first one is that for an imported module to expose inlinable definitions, the imported module needs to have been compiled already, not loaded from source. When you load a module from source using the interpreter instead of compiling it first, the pipeline is optimized for minimizing the latency between when you ask for the module and when it is available. There's no time to analyze the module to determine which exports are inlinable and so the module exposes no inlinable exports.

    This caveat is mitigated by automatic compilation, enabled by default, which will compile imported modules as needed.

    It could also be fixed for modules by topologically ordering the module compilation sequence; this would allow some parallelism in the build but less than before, though for module graphs with cycles (these exist!) you'd still have some weirdness.

    caveat: abi fragility

    Before Guile supported cross-module inlining, there was only explicit inlining across modules in Guile, facilitated by macros. If you write a module that has a define-inlinable export and you think about its ABI, then you know to consider any definition referenced by the inlinable export, and you know by experience that its code may be copied into other compilation units. Guile doesn't automatically recompile a dependent module when a macro that it uses changes, currently anyway. Admittedly this situation leans more on culture than on tools, which could be improved.

    However, with automatically inlinable exports, this changes. Any definition in a module could be inlined into its uses in other modules. This may alter the ABI of a module in unexpected ways: you think that module C depends on module B, but after inlining it may depend on module A as well. Updating module B might not update the inlined copies of values from B into C -- as in the case of define-inlinable, but less lexically apparent.

    At higher optimization levels, even private definitions in a module can be inlinable; these may be referenced if an exported macro from the module expands to a term that references a module-private variable, or if an inlinable exported binding references the private binding. But these optimization levels are off by default precisely because I fear the bugs.

    Probably what this cries out for is some more sensible dependency tracking in build systems, but that is a topic for another day.

    caveat: reproducibility

    When you make a fresh checkout of Guile from git and build it, the build proceeds in the following way.

    Firstly, we build libguile, the run-time library implemented in C.

    Then we compile a "core" subset of Scheme files at optimization level -O1. This subset should include the evaluator, reader, macro expander, basic run-time, and compilers. (There is a bootstrap evaluator, reader, and macro expander in C, to start this process.) Say we have source files S0, S1, S2 and so on; generally speaking, these files correspond to Guile modules M0, M1, M2 etc. This first build produces compiled files C0, C1, C2, and so on. When compiling a file S2 implementing module M2, which happens to import M1 and M0, it may be M1 and M0 are provided by compiled files C1 and C0, or possibly they are loaded from the source files S1 and S0, or C1 and S0, or S1 and C0.

    The bootstrap build uses make for parallelism, with each compile process starting afresh, importing all the modules that comprise the compiler and then using them to compile the target file. As the build proceeds, more and more module imports can be "serviced" by compiled files instead of source files, making the build go faster and faster. However this introduces system-specific nondeterminism as to the set of compiled files available when compiling any other file. This strategy works because it doesn't really matter whether module M1 is provided by compiled file C1 or source file S1; the compiler and the interpreter implement the same language.

    Once the compiler is compiled at optimization level -O1, Guile then uses that freshly built compiler to build everything at -O2. We do it in this way because building some files at -O1 then all files at -O2 takes less time than going straight to -O2. If this sounds weird, that's because it is.

    The resulting build is reproducible... mostly. There is a bug in which some unique identifiers generated as part of the implementation of macros can be non-reproducible in some cases; disabling parallel builds seems to solve the problem. The issue is that gensym (or equivalent) might be called a different number of times depending on whether you are loading a compiled module or need to read and macro-expand it. The resulting compiled files are equivalent under alpha-renaming but not bit-identical. This is a bug to fix.

    Anyway, at optimization level -O1, Guile will record inlinable definitions. At -O2, Guile will actually try to do cross-module inlining. We run into two issues when compiling Guile; one is if we are in the -O2 phase, and we compile a module M which uses module N, and N is not in the set of "core" modules. In that case depending on parallelism and compile order, N may be loaded from source, in which case it has no inlinable exports, or from a compiled file, in which case it does. This is not a great situation for the reliability of this optimization. I think probably in Guile we will switch so that all modules are compiled at -O1 before compiling at -O2.

    The second issue is more subtle: inlinable bindings are recorded after optimization of the Tree-IL. This is more optimal than recording inlinable bindings before optimization, as a term that is not inlinable due to its size in its initial form may become small enough to inline after optimization. However, at -O2, optimization includes cross-module inlining! A term that is inlinable at -O1 may become not inlinable at -O2 because it gets slightly larger, or vice-versa: terms that are too large at -O1 could shrink at -O2. We don't even have a guarantee that we will reach a fixed point even if we repeatedly recompile all inputs at -O2, because we allow non-shrinking optimizations.

    I think this probably calls for a topological ordering of module compilation inside Guile and perhaps in other modules. That would at least give us reproducibility, provided we avoid the feedback loop of keeping around -O2 files compiled from a previous round, even if they are "up to date" (their corresponding source file didn't change).
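    A topological ordering of this kind can be sketched with Python's standard graphlib module (Python 3.9 or later). The import graph below is hypothetical, reusing the M0, M1, M2 names from earlier, and the sketch says nothing about how Guile's build would actually implement it.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical import graph: each module maps to the modules it imports.
imports = {"M2": {"M0", "M1"}, "M1": {"M0"}, "M0": set()}


def compile_order(graph):
    """Return an order in which every module follows the modules it imports."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph):
    """Cyclic module graphs have no such order; report them instead."""
    try:
        compile_order(graph)
    except CycleError:
        return True
    return False


order = compile_order(imports)
```

    For this graph the only valid order is M0, M1, M2, so every import can be serviced by an already compiled file. A graph with a cycle raises CycleError, which is where the remaining weirdness would have to be handled by some other policy.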

    and for what?

    People who have worked on inliners will know what I mean when I say that a good inliner is like a combine harvester: ruthlessly efficient, a qualitative improvement compared to not having one, but there is a pointy end with lots of whirling blades and it's important to stop at the end of the row. You do develop a sense of what will and won't inline, and I think Dybvig's "Macro writer's bill of rights" encompasses this sense. Luckily people don't lose fingers or limbs to inliners, but inliners can maim expectations, and cross-module inlining more so.

    Still, what it buys us is the freedom to be abstract. I can define a module like:

    (define-module (elf)
      #:export (ET_NONE ET_REL ET_EXEC ET_DYN ET_CORE))
    (define ET_NONE		0)		; No file type
    (define ET_REL		1)		; Relocatable file
    (define ET_EXEC		2)		; Executable file
    (define ET_DYN		3)		; Shared object file
    (define ET_CORE		4)		; Core file

    And if a module uses my (elf) module and references ET_DYN, I know that the module boundary doesn't prevent the value from being inlined as a constant (and possibly unboxed, etc).

    I took a look and on our usual microbenchmark suite, cross-module inlining doesn't make a difference. But that's both a historical oddity and a bug: firstly that the benchmark suite comes from an old Scheme world that didn't have modules, and so won't benefit from cross-module inlining. Secondly, Scheme definitions from the "default" environment that aren't explicitly recognized as primitives aren't inlined, as the (guile) module isn't declarative. (Probably we should fix the latter at some point.)

    But still, I'm really excited about this change! Guile developers use modules heavily and have been stepping around this optimization boundary for years. I count 100 direct uses of define-inlinable in Guile, a number of them inside macros, and many of these are to explicitly hack around the optimization barrier. I really look forward to seeing if we can remove some of these over time, to go back to plain old define and just trust the compiler to do what's needed.

    by the numbers

    I ran a quick analysis of the modules included in Guile to see what the impact was. Of the 321 files that define modules, 318 of them are declarative, and 88 contain inlinable exports (27% of total). Of the 6519 total bindings exported by declarative modules, 566 of those are inlinable (8.7%). Of the inlinable exports, 388 (69%) are functions (lambda expressions), 156 (28%) are constants, and 22 (4%) are "primitives" referenced by value and not by name, meaning definitions like (define head car) (instead of re-exporting car as head).

    On the use side, 90 declarative modules import inlinable bindings (29%), resulting in about 1178 total attempts to copy inlinable bindings. 902 of those attempts are to copy a lambda expression in operator position, which means that peval will attempt to inline their code. 46 of these attempts fail, perhaps due to size or effort constraints. 191 other attempts end up inlining constant values. 20 inlining attempts fail, perhaps because a lambda is used for a value. Somehow, 19 copied inlinable values get elided because they are evaluated only for their side effects, probably to clean up let-bound values that become unused due to copy propagation.

    All in all, an interesting endeavor, and one to improve on going forward. Thanks for reading, and catch you next time!

    by Andy Wingo at May 13, 2021 11:25 AM

    Brian Kardell

    Can I :has()

    As you might know, my company (Igalia) works on all of the web engines and we contribute a lot. I'm very proud of all of the things we're able to do to improve both the features of the web platform, and the overall health of this commons. I'm especially pleased when this lets us tackle historically hard problems. A very incomplete list of things with some historical challenges that we've helped move in important ways the past few years would include: CSS Grid, MathML, JavaScript Class features, hardware accelerated SVGs and Container Queries. Today I'll be telling you about another one we're working on.

    Today we're filing an intent to prototype, tackling yet another historically hard problem for the web: The :has() selector. In this post, I'd like to explain what this intent means, as well as why it matters, where it comes from and why I am very excited about it.

    :has() for the unfamiliar

    When you write a CSS rule, the last (right-most) bit of the selector is the thing that you're styling. We call that the "subject" of the rule. Most people writing CSS have, at some point, found themselves wanting to style something based on what is inside it. You might have heard people talk about wanting "a parent selector" or an "ancestor combinator". :has() is that - basically.

    /* style an .x that contains a .y descendant - not the .y */ 
    .x:has(.y) { ... }
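    To spell out the semantics, here is a toy model in Python of what matching .x:has(.y) means on a tree. The Node class and helper names are made up for this illustration, and walking every descendant like this is the naive reading of the selector, not how any browser engine implements or would implement it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical element: a set of class names plus child nodes."""
    classes: set
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every node below the given node, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches_x_has_y(node):
    """True if node matches .x:has(.y): the subject is the .x, not the .y."""
    return "x" in node.classes and any(
        "y" in d.classes for d in descendants(node)
    )


inner = Node({"y"})
outer = Node({"x"}, [Node(set(), [inner])])
lonely = Node({"x"})
```

    Here outer matches because a .y sits somewhere inside it, while lonely and inner do not. Note that the answer for outer depends on nodes below it, which is the part that makes the selector expensive to keep up to date.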

    The long history of postponed :has

    The basic reasons to desire such powers are pretty obvious. Powerful selection ability greatly enables a real separation of concerns. This fact wasn't lost on anyone. XPath allowed it, and CSS specifications since the late 1990's have tried.

    In fact, a lot of people learned about :has() first through jQuery. That's because when John Resig wrote it he wanted it to support all of the "new CSS 3 selectors" - and :has() was one of them. It was in the spec, so jQuery supported the :has() selector pseudo. The trouble, of course, is that no one actually knows what will gain implementations and reach recommendation at the start of the process - and :has() didn't, and was postponed again to Selectors Level 4. The first draft of Selectors Level 4 was published in 2011. It is highly starred, was among the top requested features in Microsoft's old User Voice system, and every year remains among the top 2-3 most requested features. Every so often someone (frequently me) brings it up again in the CSS Working Group.

    Why the hold up?

    Primarily, even without :has() it's pretty hard to live up to the performance guarantees of CSS, where everything continues to evaluate and render "live" at 60fps. If you think, mathematically, about just how much work is conceptually involved in applying hundreds or thousands of rules as the DOM changes (including as it is parsing), it's quite a feat as is.

    Engines have figured out how to optimize this based on clever patterns and observations that avoid the work that is conceptually necessary - and a lot of that is sort of based on these subject invariants that :has() would appear to throw to the wind.

    At the same time, there are plenty of aspects of this problem that are considerably easier than others. CSS engines for print, for example, support :has() because they don't need to run at 60fps. DOM APIs like querySelector() / querySelectorAll() / matches() also check at a very specific point in time in a completely different manner - it's very doable there, as jQuery showed.

    There are limits that we could potentially place on this selector that might help a little. Or, there are things like :focus-within or :empty which seem similar, but internally, are very specifically easier - but very incomplete.

    And so, for the last decade this comes up once or twice a year as we try to find some way forward. In the end, we ultimately sort of go around in circles: It's impossible to make decisions and progress while everything is in limbo. We don't really know what the options are, and it's hard for anyone to take up trying to imagine a way forward. As we've seen with some other things, like Container Queries, this is also somewhat of a vicious cycle: The more it comes up, and the more it is discussed without ultimate progress of any real kind, the less likely it is that anyone actually wants to talk about it again.

    We need prototyping, exploration, data we can point to and more concrete things we can discuss - but the longer it goes on, the more hopeless it looks and the less likely anyone is to do it.

    eyeo, Igalia and prototypes

    Igalia works on all of the web engines, and with lots of consumers of those engines to expand investment in this wonderful commons. eyeo makes a number of products like the Adblock Browser and Adblock Plus. While some sites can offer workarounds that employ additional classes for intentionally styling these sorts of things, plenty of other things cannot. Lots of very useful things (reader mode, ad blockers, conformance checker plugins and search are just some examples) rely on selectors and heuristics about trees of markup that they didn't create. They have a definite separation of concerns and thus observe these sorts of shortcomings very acutely. Having no native solutions for some of these hard problems forces everyone to find their own ways to deal with them, and all of those ways have different performance characteristics and different edge cases, and all of them require additional JavaScript. That's not good for anyone. So, eyeo approached us about sponsoring work, research and prototyping on some things - among them :has(). Can we somehow get past these impasses and make progress on this one, and make things better for the entire community? What might that look like? Can it conceivably work in the main 60fps CSS? If not, can we provide some research and data that allows other paths like support in the JavaScript DOM methods, or a static profile? Let's be sure to include all of the important uses of selectors.

    For the past little while, Igalia engineers have been looking into this problem. We've been having discussions with Chromium developers, looking into Firefox and WebKit codebases and doing some initial prototypes and tests to really get our heads around it. Through this, we've provided lots of data about performance of what we have already and where we believe challenges and possibilities lie. We've begun sketching out an explainer with all of our design notes and questions linked up - so it's all there in the open for people to review as we attempt to open this discussion.

    Today's intent: What it means

    The meaning of "intents" has occasionally been difficult for the larger community to understand in the same way, so I wanted to take a moment to suggest how to interpret it: With today's intent, we're simply stating that we feel we have gathered enough information and data on this to share it for wider review and productive discussion. We believe that the data suggests that it seems at least plausible to carry on with discussions around supporting a (partially limited) form of :has() in the main, live CSS. We would like for data, designs and limits to be discussed fairly concretely. We would also like to carry forward with additional, concrete implementation prototyping and continue to help sort out a path forward.

    May 13, 2021 04:00 AM

    May 11, 2021

    Enrique Ocaña

    GStreamer WebKit debugging by instrumenting source code (3/3)

    This is the last post in the instrumenting source code series. I hope you find the tricks below as useful as the previous ones.

    In this post I show some more useful debugging tricks. Don’t forget to have a look at the other posts of the series:

    Finding memory leaks in a RefCounted subclass

    The source code shown below must be placed in the .h where the class to be debugged is defined. It’s written in a way that doesn’t need to rebuild RefCounted.h, so it saves a lot of build time. It logs all refs, unrefs and adoptPtrs, so that any anomaly in the refcounting can be traced and investigated later. To use it, just make your class inherit from LoggedRefCounted instead of RefCounted.

    Example output:

    void WTF::adopted(WTF::LoggedRefCounted<T>*) [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1
    void WTF::adopted(WTF::LoggedRefCounted<T>*) [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1
    ^^^ Two adopts, this is not good.
    void WTF::LoggedRefCounted<T>::ref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1 --> ...
    void WTF::LoggedRefCounted<T>::ref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount ... --> 2
    void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 2 --> ...
    void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount ... --> 1
    void WTF::adopted(WTF::LoggedRefCounted<T>*) [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1
    void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1 --> ...
    void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1 --> ...
    ^^^ Two recursive derefs, not good either.
    #include "Logging.h"
    namespace WTF {
    // Drop-in base class: logs every ref()/deref() and forwards to RefCounted<T>.
    template<typename T> class LoggedRefCounted : public WTF::RefCounted<T> {
    public:
        void ref() {
            printf("%s: this=%p, refCount %d --> ...\n", __PRETTY_FUNCTION__, this, WTF::RefCounted<T>::refCount()); fflush(stdout);
            WTF::RefCounted<T>::ref();
            printf("%s: this=%p, refCount ... --> %d\n", __PRETTY_FUNCTION__, this, WTF::RefCounted<T>::refCount()); fflush(stdout);
        }
        void deref() {
            // Read the count first: the base deref() may destroy this object.
            const int before = WTF::RefCounted<T>::refCount();
            printf("%s: this=%p, refCount %d --> ...\n", __PRETTY_FUNCTION__, this, before); fflush(stdout);
            WTF::RefCounted<T>::deref();
            printf("%s: this=%p, refCount ... --> %d\n", __PRETTY_FUNCTION__, this, before - 1); fflush(stdout);
        }
    protected:
        LoggedRefCounted() { }
        ~LoggedRefCounted() { }
    };
    // Logs adoptions, then forwards to the regular WTF adopted() hook.
    template<typename T> inline void adopted(WTF::LoggedRefCounted<T>* object)
    {
        printf("%s: this=%p, refCount %d\n", __PRETTY_FUNCTION__, object, (object)?object->refCount():0); fflush(stdout);
        adopted(static_cast<RefCountedBase*>(object));
    }
    } // Namespace WTF
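
    To try the pattern outside a WebKit build, here is a minimal, self-contained sketch. ToyRefCounted is a stand-in written for this example (it is not WTF::RefCounted), and LoggedToyRefCounted and ToyMediaClient are likewise hypothetical names. It relies on the logging layer shadowing the non-virtual ref()/deref(), which only takes effect when the calls are made through the derived type.

```cpp
#include <cstdio>

// Toy stand-in for WTF::RefCounted<T>; it is NOT the WebKit class.
// Objects start with a count of 1 and delete themselves when it reaches 0.
template<typename T> class ToyRefCounted {
public:
    void ref() { ++m_refCount; }
    void deref()
    {
        if (--m_refCount == 0)
            delete static_cast<T*>(this);
    }
    int refCount() const { return m_refCount; }

protected:
    ToyRefCounted() = default;
    ~ToyRefCounted() = default;

private:
    int m_refCount { 1 };
};

// Logging layer: shadows ref()/deref(), logs, then forwards to the base.
template<typename T> class LoggedToyRefCounted : public ToyRefCounted<T> {
public:
    void ref()
    {
        std::printf("ref:   this=%p, refCount %d --> ...\n", static_cast<const void*>(this), this->refCount());
        ToyRefCounted<T>::ref();
        std::printf("ref:   this=%p, refCount ... --> %d\n", static_cast<const void*>(this), this->refCount());
    }

    void deref()
    {
        // Read the count and address first: the base deref() may delete this object.
        const int before = this->refCount();
        const void* self = this;
        std::printf("deref: this=%p, refCount %d --> ...\n", self, before);
        ToyRefCounted<T>::deref();
        std::printf("deref: this=%p, refCount ... --> %d\n", self, before - 1);
    }

protected:
    LoggedToyRefCounted() = default;
    ~LoggedToyRefCounted() = default;
};

// Hypothetical class under investigation: the only change is its base class.
class ToyMediaClient : public LoggedToyRefCounted<ToyMediaClient> {
public:
    static ToyMediaClient* create() { return new ToyMediaClient; }
    static int liveInstances; // lets a caller check that the object was destroyed
    ~ToyMediaClient() { --liveInstances; }

private:
    ToyMediaClient() { ++liveInstances; }
};

int ToyMediaClient::liveInstances = 0;
```

    Calling create(), then ref() once and deref() twice, logs each transition and ends with liveInstances back at 0. An unbalanced sequence shows up in the log as a count that never returns to zero, or as a deref on an object that is already gone.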

    Pause WebProcess on launch

    WebProcessMainGtk and WebProcessMainWPE will sleep for 30 seconds if a special environment variable is defined:


    It only works #if ENABLE(DEVELOPER_MODE), so you might want to remove those ifdefs if you’re building in Release mode.
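
    The general shape of such a hook can be sketched as follows. This is an illustration using only the C++ standard library rather than the actual WebKit code, and EXAMPLE_PAUSE_PROCESS_ON_LAUNCH is a made-up variable name, since the real one is not reproduced above. launchPause and pauseOnLaunchIfRequested are hypothetical helpers as well.

```cpp
#include <chrono>
#include <cstdlib>
#include <thread>

// Made-up variable name for this sketch; not the one WebKit checks.
constexpr const char* kPauseVariable = "EXAMPLE_PAUSE_PROCESS_ON_LAUNCH";

// Returns how long to pause: 30 seconds when the variable is defined, else 0.
std::chrono::seconds launchPause()
{
    return std::getenv(kPauseVariable) ? std::chrono::seconds(30) : std::chrono::seconds(0);
}

// Meant to be called first thing in the process entry point.
void pauseOnLaunchIfRequested()
{
    const auto pause = launchPause();
    if (pause.count() > 0)
        std::this_thread::sleep_for(pause);
}
```

    Sleeping at the very start of the process leaves a window in which a debugger such as gdb can be attached to the new process before it does any real work.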

    Log tracers

    In big pipelines (e.g. playbin) it can be very hard to find what element is replying to a query or handling an event. Even using gdb can be extremely tedious due to the very high level of recursion. My coworker Alicia commented that using log tracers is more helpful in this case.

    GST_TRACERS=log enables additional GST_TRACE() calls all across GStreamer. The following example logs entries and exits into the query function.


    The names of the logging categories are somewhat inconsistent:

    • log (the log tracer itself)
    • GST_PADS
    • query
    • bin

    The log tracer code is in subprojects/gstreamer/plugins/tracers/gstlog.c.

    by eocanha at May 11, 2021 06:00 AM

    May 10, 2021

    Fernando Jiménez

    WPE WebKit for Android

    WPE WebKit is the official WebKit port for embedded and low-consumption computer devices. It has been designed from the ground up with performance, small footprint, accelerated content rendering, and simplicity of deployment in mind.

    It brings the excellence of the WebKit engine to countless platforms and target devices, serving as a base for systems and environments that primarily or completely rely on web platform technologies to build their interfaces.

    WPE WebKit’s architecture allows for inclusion in a variety of use cases and applications. It can be custom embedded into an existing application, or it can run as a standalone web runtime under a variety of presentation systems, from platform-specific display managers to existing window management protocols like Wayland or X11.

    Today, we (Igalia) are happy to announce initial support of WPE for Android.

    This effort was initiated back in 2017 by my colleague Žan Doberšek, who fully implemented a WPE backend for Android along with the required pieces to get rendering and basic input working. The work was paused for quite some time until the beginning of this year, when I joined Igalia and took over his work. Since then, I have been heads down working on it, trying to make it more usable thanks to Cerbero and a WebView based Java API.

    How it looks

    A picture is worth a thousand words. This is how it currently looks running on an Android phone:

    As you can see, we have enough basic functionality to implement a simple multi-tab web browser with progress reporting, navigation controls and IME support.

    Support is not limited to mobile devices though. Thanks to the wide range of architectures and devices that support Android, we can now run WPE WebKit on an even wider set of devices, such as a pair of XR glasses. This is a video of a port of Firefox Reality using WPEView instead of GeckoView:

    Building blocks

    Cerbero build system

    WPE WebKit has a very long list of dependencies. Cross compiling all these dependencies manually can be quite cumbersome, so in order to ease the development process I focused my first weeks of work on setting up a more usable build system. We decided to use Cerbero, GStreamer’s cross compilation system, which already had recipes - this is how Cerbero names its build scripts - for many of the required dependencies. I wrote all the missing Cerbero recipes and integrated it into WPE Android’s build system, to the point that building everything requires a single python3 scripts/ --build command.

    For now the only supported architecture is arm64. There are plans to support other architectures soon.

    WPEView API

    WPEView wraps the WPE WebKit browser engine in a reusable Android API. WPEView serves a similar purpose to Android’s built-in WebView and tries to mimic its API, aiming to be an easy-to-use drop-in replacement with extended functionality.

    Setting up WPEView in your Android application is fairly simple.

    First, add the WPEView widget to your Activity layout


    And next, wire it in your Activity implementation to start using the API, for example, to load a URL:

    override fun onCreate(savedInstanceState: Bundle?) {
        var browser = findViewById(

    To get a better sense of how to use WPEView, check the code of the MiniBrowser demo in the examples folder.

    Process model

    In order to safeguard the rest of the system and to allow the application to remain responsive even if the user loads a web page that loops infinitely or otherwise hangs, the modern incarnation of WebKit uses a multi-process architecture. Each web page is loaded in its own WebProcess. Multiple WebProcesses can share a browsing session, which lives in a shared NetworkProcess. In addition to handling all network accesses, this process is also responsible for managing the disk cache and Web APIs that allow websites to store structured data, such as the Web Storage API and the IndexedDB API.

    Given that Android forbids the fork syscall on non-rooted devices, we cannot directly spawn child processes. Instead, we use Android Services to host the logic of WebKit’s auxiliary processes. The life cycle of all WebKit’s auxiliary processes is managed by WebKit itself. The Android layer only proxies requests to spawn and terminate these processes/services.

    In addition to the multi-process architecture, modern WebKit versions introduce the PSON model (Process Swap On Navigation) which aims to improve security by creating an independent WebProcess for each security origin. This is currently disabled for WPE Android, although partial support is already in place.

    Browser and Pages

    The central piece of WPE Android is the Browser top level Singleton object. This is roughly the equivalent of WebKit’s UIProcess. Among other duties it:

    • Manages the creation and destruction of Page instances.
    • Funnels WPEView API calls to the appropriate Page instance.
    • Manages the Android Services equivalent to WebKit’s auxiliary processes (Web and Network processes).
    • Hosts the UIProcess thread where the WebKitWebContext instance lives and where the main loop is run.

    A Page roughly corresponds to a tab in a regular browser UI. There is a 1:1 relationship between WPEView and Page. Each Page instance has its own gfx.View and WebKitWebView instances associated.

    WPE Backend

    The common interface between WPEWebKit and its rendering backends is provided by libwpe. WPEBackend-android is our Android-oriented implementation of the libwpe API, bridging the gap between the WebKit architecture and the internal composition structure on one side and the Android system on the other.


    gfx.View is an extension of android.opengl.GLSurfaceView living in the UI Process. It manages the life cycle of a Surface Texture, which is some sort of buffer consumer, that is handed off to the Web Process through Android’s IPC mechanisms, where the actual rendering happens.

    It is also in charge of relaying input events to the internal WebKit input-methods.

    This part is currently being significantly changed by Žan to use Native Hardware Buffers.

    Future work

    There are still plenty of things to do, and we have a growing list of issues in the main repository. The next steps will be towards extending support for other architectures - so far only arm64 is supported. Multimedia support is also on the list of immediate plans, along with the big rendering engine refactor that Žan is working on.

    Try it yourself

    If you want to try the current prototype, you can follow the instructions in the README of the main repo.

    We welcome contributions of all kinds. Give it a try and file issues as you encounter them. And if you feel encouraged enough, send us patches!


    • I would like to thank Igalia for giving me the time and space to work on this project.
    • Huge thanks to Žan Doberšek for his amazing work and continuous guidance.
    • Kudos to Philippe Normand and Thibault Saunier for their recommendations and support around Cerbero.
    • Many thanks to Imanol Fernández for his contributions so far and for the VR demo.

    May 10, 2021 12:00 AM