Planet Igalia

March 25, 2025

José Dapena

trace-chrome: easy remote tracing of Chromium

As part of my performance analysis work for LGE webOS, I often had to capture Chrome Traces from an embedded device. So, to make it convenient, I wrote a simple command line helper to obtain the traces remotely, named trace-chrome.

In this blog post I will explain why it is useful, and how to use it.

TL;DR #

If you want to read directly about the tool, jump to the how to use section.

Tracing Chromium remotely #

Chromium provides an infrastructure for capturing static tracing data, based on Perfetto. In this blog post I am not going through its architecture or implementation, but will focus on how we instruct a trace capture to start and stop, and how to then fetch the results.

Chrome/Chromium provides user interfaces for capturing and analyzing traces locally. This can be done by opening a tab and pointing it to the chrome://tracing URL.

The tracing capture UI is quite powerful, and completely implemented with web technologies. This has a downside, though: running the capture UI introduces significant overhead on several resources (CPU, memory, GPU, …).

This overhead may be even more significant when tracing Chromium, or any other Chromium-based web runtime, on an embedded device, where we have CPU, storage and memory constraints.

Chromium does a great job of minimizing the overhead, by postponing the trace processing as much as possible and providing a minimal UI while the capture is ongoing. But it may still be too much.

How to avoid this problem?

  • The capture UI should not run on the system we are tracing. We can run the UI on a different computer to capture the trace.
  • The same applies to storage: we want it to happen on a different computer.

The solution for both is tracing remotely: both the user interface for controlling the recording and the storage of the recording happen on a different computer.

Setting up remote tracing support #

First, some nomenclature I will use:

  • Target device: the one that runs the Chromium web runtime instance we are going to trace.
  • Host device: the device that runs the tracing UI, used to configure, start and stop the recording, and to explore the tracing results.

OK, now we know we want to remotely trace the Chromium instance running on the target device. How can we do that? First, we need to connect our tracing tools running on the host to the Chromium instance on the target device.

This is done using the remote debugging port: a multi-purpose HTTP port provided by Chromium. This port is not only used for tracing; it also offers access to the Chrome Developer Tools.

The Chromium remote debugging port is disabled by default, but it can be enabled using the command line switch --remote-debugging-port=PORT in the target Chromium instance. This will open an HTTP port on the localhost interface that can be used to connect.
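
For example, on the target device (the binary name varies per platform and product; the port here matches the one used in the examples later in this post):

chromium --remote-debugging-port=9999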

Why localhost? Because this interface does not provide any authentication or encryption, so it is unsafe. It is the user's responsibility to provide some security (e.g. by setting up an SSH tunnel between the host and the target device to connect to the remote debugging port).

Capturing traces with chrome://inspect #

The Chromium browser provides a solution for tracing remotely: just open the URL chrome://inspect on the host device, which shows the remote targets user interface.

First, the checkbox for Discover network targets needs to be set.

Then press the Configure… button to set the list of IP addresses and ports where we expect the target remote debugging ports to be.

Do not forget to add to the list the endpoint that is accessible from the host device. E.g., in the case of an SSH tunnel from the host device to the target device port, it needs to be the host side of the tunnel.
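
As a sketch of that setup (assuming the target runs Chromium with --remote-debugging-port=9999 and is reachable over SSH as user@target-device, both hypothetical names), the tunnel can be created from the host with:

ssh -L 10101:localhost:9999 user@target-device

After this, connecting to localhost:10101 on the host reaches the remote debugging port on the target.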

If we set up the host side of the tunnel at port 10101, the target instance will show up in the list of remote targets.

Then, just pressing the trace link will show the Chromium tracing UI, but connected to the target device Chromium instance.

Capturing traces with trace-chrome #

Over the last 8 years, I have often been involved in exploring the performance of Chromium on embedded devices, specifically for the LGE webOS web stack. In this problem space, the Chromium tracing capabilities are handy, providing a developer-oriented view of different metrics, including the time spent running known operations in specific threads.

At that time I did not know about chrome://inspect, so I did not have an easy way to collect Chromium traces from a different machine. This is important, as one performance analysis principle is that collecting the information should be as lightweight as possible. Running the tracing UI in the same Chromium instance that is being analyzed goes against that principle.

The solution? I wrote a very simple NodeJS script that allows capturing a Chromium trace from the command line.

This is convenient for several reasons:

  • No need to launch the full tracing UI.
  • As the UI is completely detached from the capturing step, and no additional step is needed to record the trace to a file, we are not affected by the instability of the tracing UI when handling the captured trace (not a problem usually, but it happens).
  • It is easier to repeat tests for specific tracing categories, instead of manually enabling them in the tracing UI.

The script just provides an easy-to-use command line interface on top of the already existing chrome-remote-interface NodeJS module.

The project is open source, and available at github.com/jdapena/trace-chrome.

How to use trace-chrome #

Now, the instructions to use trace-chrome. The tool depends on having a working NodeJS environment on the host.

Installation #

First, clone the GitHub repository on the host:

git clone https://github.com/jdapena/trace-chrome

Then, install the dependencies. To do this, you need to have a working NodeJS environment.

cd trace-chrome
npm install

Running #

Now it is possible to try the tool. To get the command line help just run:

$ bin/trace-chrome --help
Usage: trace-chrome [options]

Options:
-H, --host <host> Remote debugging protocol host (default: "localhost")
-p, --port <port> Remote debugging protocol port (default: "9876")
-s, --showcategories Show categories
-O, --output <path> Output file (default: "")
-c, --categories <categories> Set categories (default: "")
-e, --excludecategories <categories> Exclude categories (default: "")
--systrace Enable systrace
--memory_dump_mode <mode> Memory dump mode (default: "")
--memory_dump_interval <interval_in_ms> Memory dump interval in ms (default: 2000)
--dump_memory_at_stop
-h, --help display help for command

To connect to the remote debugging port of a running Chromium instance, the --host and --port parameters need to be used. In the examples I am going to use the port 9999 and the host localhost.

Warning

Note that, in this case, the parameter --host refers to the network address of the remote debugging port access point. It is not referring to the host machine where we run the script.

Getting the tracing categories #

First, to check which tracing categories are available, we can use the option --showcategories:

bin/trace-chrome --host localhost --port 9999 --showcategories

We will obtain a list like this:

AccountFetcherService
Blob
CacheStorage
Calculators
CameraStream
...

Recording a session #

Now, the most important step: recording a Chromium trace. To do this, we will provide a list of categories (parameter --categories), and a file path to record the trace (parameter --output):

bin/trace-chrome --host localhost --port 9999 \
--categories "blink,cc,gpu,renderer.scheduler,sequence_manager,v8,toplevel,viz" \
--output js_and_rendering.json

This will start recording. To stop recording, just press <Ctrl>+C, and the trace will be transferred and stored to the provided file path.

Tip

Which categories to use? Good presets for certain problem scopes can be obtained from Chrome. Just open chrome://tracing, press the Record button, and play with the predefined settings. At the bottom you will see the list of categories to pass for each of them.

Opening the trace file #

Now that the tracing file has been obtained, it can be opened from Chrome or Chromium running on the host: load the chrome://tracing URL in a tab and press the Load button.

Tip

The traces are completely standalone, so they can be loaded on any other computer without any additional artifacts. This is useful, as those traces can be shared among developers or uploaded to a ticket tracker.

But, if you want to do that, do not forget to compress the trace first with gzip to make it smaller. chrome://tracing can open compressed traces directly.
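
For example, to compress the trace captured in the previous step:

gzip js_and_rendering.json

This produces js_and_rendering.json.gz, which chrome://tracing can load as-is.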

Capturing memory infra dumps #

The script also supports periodic recording from the memory-infra system. This captures periodic dumps of the state of memory, with specific instrumentation in several categories.

To use it, add the category disabled-by-default-memory-infra, and pass the following parameters to configure the capture:

  • --memory_dump_mode <background|light|detailed>: level of detail. background is designed to have almost no impact on execution, running very fast. light mode shows a few entries, while detailed is unlimited, and provides the most complete information.
  • --memory_dump_interval: the interval in milliseconds between snapshots.

Using npx #

For convenience, it is also possible to use trace-chrome with npx. It will install the script and its dependencies in the NPM cache, and run from there:

npx jdapena/trace-chrome --help

Examples #

  1. Record a trace of the categories for the Web Developer mode in the Chrome Tracing UI:
bin/trace-chrome --host HOST --port PORT \
--categories "blink,cc,netlog,renderer.scheduler,sequence_manager,toplevel,v8" \
--output web_developer.json
  2. Record memory infrastructure snapshots every 10 seconds:
bin/trace-chrome --host HOST --port PORT \
--categories "disabled-by-default-memory-infra" --memory_dump_mode detailed \
--memory_dump_interval 10000 --output memory_infra.json

Wrapping up #

trace-chrome is a very simple tool, just providing a convenient command line interface for interacting with remote Chromium instances. It is especially useful for tracing embedded devices.

It has been useful for me for years, on a number of platforms, from Windows to Linux, from desktop to low-end devices.

Try it!


March 25, 2025 12:00 AM

March 24, 2025

Igalia WebKit Team

WebKit Igalia Periodical #18

Update on what happened in WebKit in the week from March 17 to March 24.

Cross-Port 🐱

Limited the amount of data stored for certain elements of WebKitWebViewSessionState. This results in memory savings, and avoids oddly large objects which resulted in web view state being restored slowly.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Reduced parsing overhead in incoming WebRTC video streams by reducing excessive tag events at startup and by avoiding the plugging of parser elements for already-parsed streams.

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

Fixed an integer overflow when using wasm/gc on 32-bit platforms.

Graphics 🖼️

Landed a change that fixes a few scenarios where the damage was not generated on layer property changes.

Releases 📦️

WebKitGTK 2.48.0 and WPE WebKit 2.48.0 have been released. While they may not look as exciting as the 2.46 series, which introduced the use of Skia for painting, they nevertheless include half a year of improvements. This development cycle focused on reworking internals, which brings modest performance improvements for all kinds of devices, but most importantly cleanups which will enable further improvements going forward.

For those who need longer to integrate newer releases, which we know can be a longer process for embedded device distributors, we have also published WPE WebKit 2.46.7 with a few stability and security fixes.

Accompanying these releases there is security advisory WSA-2025-0002 (GTK, WPE), which covers the solved security issues. Crucially, all three contain the fix for an issue known to be exploited in the wild, and therefore we strongly encourage updating.

As usual, bug reports are always welcome at the WebKit Bugzilla.

libsoup 3.6.5 has been released with bug fixes and security improvements.

That’s all for this week!

by Unknown at March 24, 2025 07:50 PM

March 20, 2025

Jesse Alama

Leaning In! 2025 from the organizer’s point of view


Leaning In! 2025 has come and gone. How did it go?

The inspiration for doing Leaning In! came from the tutorial at BOBKonf 2024 by Joachim Breitner and David Christiansen. The tutorial room was full; in fact, it was overfull and not everyone who wanted to attend could attend. I’d kept my eye on Lean from its earliest days but lost the thread for a long time. The image I had of Lean came from its version 1 and 2 days, when the project was still closely aligned with the aims of homotopy type theory. I didn’t know about Lean version 3. So when I opened my eyes and woke up, I was in the current era of Lean (version 4), with a great language, humongous standard library, and pretty sensible tooling. I was on board right away. As an organizer of Racketfest, I had some experience putting together (small) conferences, so I thought I’d give it a go with Lean.

I announced the conference a few months ago, so there wasn’t all that much time to find speakers and plan. Still, we had 33 people in the room. When I first started planning the workshop, I thought there’d only be 10-12 people. This was my first time organizing a Lean workshop of any sort, so my initial expectations were very modest. I booked a fairly small room at Spielfeld for that. After some encouragement from Joachim, who politely suggested that 10-12 might be a bit too small, I requested a somewhat larger room, for up to 20 people. But as registrations kept coming in, I needed to renegotiate with Spielfeld. Ultimately, they put us in their largest room (a more appropriately sized room exists but had already been booked). The room we were in was somewhat too big, but I’m glad we had the space.

Lean is a delightful mix of program verification and mathematics formalization. That was reflected in the program. We had three talks that, I’d say, were definitely more in the computer science camp. With Lean, it’s not so clear at times. Lukas’s talk was motivated by some applications coming from computer science, but the topic makes sense on its own and could have been taken up by a mathematician. The opening talk, Recursive definitions, by Joachim Breitner, was about the internals of Lean itself, so I think it doesn’t count as a talk on formalizing mathematics. But it sort of was, in the sense that it was about the logic in the Lean kernel. It was computer science-y, but it wasn’t really about using Lean; it was more about better understanding how Lean works under the hood.

It is clear that mathematics formalization in Lean is very much ready for research level mathematics. The mathematics library is very well developed, and Lean is fast enough, with good enough tooling, to enable mathematicians to do serious stuff. We are light years past noodling about the Peano axioms or How do I formalize a group?. I have a gut feeling that we may be approaching a point in the near future where Lean might become a common way of doing mathematics.

What didn’t go so well

The part of the event that probably didn’t go quite as I had planned was the Proof Clinic in the afternoon. The intention of the proof clinic was to take advantage of the fact that many of us had come to Berlin to meet face-to-face, and there were several experts in the room. Let’s work together! If there’s anything you’re stuck on, let’s talk about it and make some progress, today. Think of it as a sort of micro-unconference (just one hour long) within a workshop.

That sounds good, but I didn’t prepare the attendees well enough. I only started adding topics to the list of potential discussion items in the morning, and I was the only one adding them. Privately, I had a few discussion items in my back pocket, but they were intended just to get the conversation going. My idea was that once we prime the pump, we’ll have all sorts of things to talk about.

That’s not quite what happened. We did, ultimately, discuss a few interesting things but it took a while for us to warm up. Also, doing the proof clinic as a single large group might not have been the best idea. Perhaps we should have split up into groups and tried to work together that way.

I also learned that several attendees don’t use Zulip, so my assumption that Zulip is the one and only way for people to communicate about Lean wasn’t quite right. I could have communicated better with attendees in advance to make sure that we coordinate discussion in Zulip, instead of simply assuming that, of course, everyone is there.

The future

Will there be another edition of Leaning In!?

Yes, I think so. It’s a lot of work to organize a conference (and there’s always more to do, even when you know that there’s a lot!). But the community benefits are clear. Stay tuned!

March 20, 2025 08:43 AM

March 19, 2025

Jesse Alama

Announcing decimal128: JavaScript implementation of Decimal128


I’m happy to announce decimal128.js, an NPM package I made for simulating IEEE 754 Decimal128 numbers in JavaScript.

(This is my first NPM package. I made it in TypeScript; it’s my first go at the language.)

What?

Decimal128 is an IEEE standard for floating-point decimal numbers. These numbers aren’t the binary floating-point numbers that you know and love (?), but decimal numbers. You know, the kind we learn about before we’re even ten years old. In the binary world, things like 0.1 + 0.2 aren’t exactly equal to 0.3, and calculations like 0.7 * 1.05 don’t work out to exactly 0.735. These kinds of numbers are what we use when doing all sorts of everyday calculations, especially those having to do with money.

Decimal128 encodes decimal numbers into 128 bits. It is a fixed-width encoding, unlike arbitrary-precision numbers, which, of course, require an arbitrary amount of space. The encoding can represent numbers with up to 34 significant digits and an exponent of −6143 to 6144. That is a truly vast amount of space if one keeps the intended use cases involving human-readable and -writable numbers (read: money) in mind.

Why?

I’m working on extending the JavaScript language with decimal numbers (proposal-decimal). One of the design decisions that has to be made there is whether to implement arbitrary-precision decimal numbers or to implement some kind of approximation thereof, with Decimal128 being the main contender. As far as I could tell, there was no implementation of Decimal128 in JavaScript, so I made one.

The intention isn’t to support the full Decimal128 standard, nor should one expect to achieve the performance that, say, a C/C++ library would give you, in userland JavaScript. (To say nothing of having machine-native decimal instructions, which is truly exotic.) The intention is to give JavaScript developers something that genuinely strives to approximate Decimal128 for JS programs.

In particular, the hope is that this library offers the JS community a chance to get a feel for what Decimal128 might be like.

How to use

Just do

$ npm install decimal128

and start using the provided Decimal128 class.
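
As a rough sketch of what usage can look like (the exact method names, such as add and toString here, are assumptions on my part; check the package documentation for the real API):

import { Decimal128 } from "decimal128";

const a = new Decimal128("0.1");
const b = new Decimal128("0.2");

// Hypothetical arithmetic call; the point is that the result is exactly 0.3.
console.log(a.add(b).toString()); // "0.3"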

Issues?

If you find any bugs or would like to request a feature, just open an issue and I’ll get on it.

March 19, 2025 10:11 AM

The decimals around us: Cataloging support for decimal numbers


Decimal numbers are a data type that aims to exactly represent decimal numbers. Some programmers may not know, or fully realize, that, in most programming languages, the numbers that you enter look like decimal numbers but internally are represented as binary—that is, base-2—floating-point numbers. Things that are totally simple for us, such as 0.1, simply cannot be represented exactly in binary. The decimal data type—whatever its stripe or flavor—aims to remedy this by giving us a way of representing and working with decimal numbers, not binary approximations thereof. (Wikipedia has more.)

To help with my work on adding decimals to JavaScript, I've gone through a list of popular programming languages, taken from the 2022 StackOverflow developer survey. What follows is a brief summary of where these languages stand regarding decimals. The intention is to keep things simple. The purpose is:

  1. If a language does have decimals, say so;
  2. If a language does not have decimals, but at least one third-party library exists, mention it and link to it. If a discussion is underway to add decimals to the language, link to that discussion.

There is no intention to filter out any language in particular; I'm just working with a slice of the languages found in the StackOverflow list linked to earlier. If a language does not have decimals, there may well be multiple third-party decimal libraries. I'm not aware of all libraries, so if I have linked to a minor library and neglected to link to a more high-profile one, please let me know. More importantly, if I have misrepresented the basic fact of whether decimals exist at all in a language, send mail.

C

C does not have decimals. But they're working on it! The C23 (as in, 2023) standard proposes to add new fixed bit-width data types (32, 64, and 128 bits) for these numbers.

C#

C# has decimals in its underlying .NET subsystem. (For the same reason, decimals also exist in Visual Basic.)

C++

C++ does not have decimals. But—like C—they're working on it!

Dart

Dart does not have decimals. But a third-party library exists.

Go

Go does not have decimals, but a third-party library exists.

Java

Java has decimals.

JavaScript

JavaScript does not have decimals. We're working on it!

Kotlin

Kotlin does not have decimals. But, in a way, it does: since Kotlin is running on the JVM, one can get decimals by using Java's built-in support.

PHP

PHP does not have decimals. An extension exists and at least one third-party library exists.

Python

Python has decimals.

Ruby

Ruby has decimals. Despite that, there is some third-party work to improve the built-in support.

Rust

Rust does not have decimals, but a crate exists.

SQL

SQL has decimals (it is the DECIMAL data type). (Here is the documentation for, e.g., PostgreSQL, and here is the documentation for MySQL.)

Swift

Swift has decimals.

TypeScript

TypeScript does not have decimals. However, if decimals get added to JavaScript (see above), TypeScript will probably inherit decimals, eventually.

March 19, 2025 10:10 AM

Here’s how to unbreak floating-point math in JavaScript


Because computers are limited, they work in a finite range of numbers, namely, those that can be represented straightforwardly as fixed-length (usually 32 or 64) sequences of bits. If you’ve only got 32 or 64 bits, it’s clear that there are only so many numbers you can represent, whether we’re talking about integers or decimals. For integers, it’s clear that there’s a way to exactly represent mathematical integers (within the finite domain permitted by 32 or 64 bits). For decimals, we have to deal with the limits imposed by having only a fixed number of bits: most decimal numbers cannot be exactly represented. This leads to headaches in all sorts of contexts where decimals arise, such as finance, science, engineering, and machine learning.

It has to do with our use of base 10 and the computer’s use of base 2. Math strikes again! Exactness of decimal numbers isn’t an abstruse, edge case-y problem that some mathematicians thought up to poke fun at programmers and engineers who aren’t blessed to work in an infinite domain. Consider a simple example. Fire up your favorite JavaScript engine and evaluate this:

1 + 2 === 3

You should get true. Duh. But take that example and work it with decimals:

0.1 + 0.2 === 0.3

You’ll get false.

How can that be? Is floating-point math broken in JavaScript? Short answer: yes, it is. But if it’s any consolation, it’s not just JavaScript that’s broken in this regard. You’ll get the same result in all sorts of other languages. This isn’t wat. This is the unavoidable burden we programmers bear when dealing with decimal numbers on machines with limited precision.

Maybe you’re thinking OK, but if that’s right, how in the world do decimal numbers get handled at all? Think of all the financial applications out there that must be doing the wrong thing countless times a day. You’re quite right! One way of getting around oddities like the one above is by always rounding. Another is by handling decimal numbers as strings (sequences of digits). You would then define operations such as addition, multiplication, and equality by doing elementary school math, digit by digit (or, rather, character by character).
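
To make the string-based idea a bit more concrete, here is a toy sketch (not how any real library does it, and only handling non-negative inputs) that adds two decimal strings exactly by scaling them to integers first:

// Add two non-negative decimal strings exactly, using BigInt for the integer math.
function addDecimalStrings(a, b) {
  const [ai, af = ""] = a.split(".");
  const [bi, bf = ""] = b.split(".");
  const scale = Math.max(af.length, bf.length);
  const toScaled = (i, f) => BigInt(i + f.padEnd(scale, "0"));
  const sum = (toScaled(ai, af) + toScaled(bi, bf)).toString().padStart(scale + 1, "0");
  return scale === 0 ? sum : `${sum.slice(0, -scale)}.${sum.slice(-scale)}`;
}

addDecimalStrings("0.1", "0.2"); // "0.3"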

So what to do?

Numbers in JavaScript are supposed to be IEEE 754 floating-point numbers. A consequence of this is, effectively, that 0.1 + 0.2 will never be 0.3 (in the sense of the === operator in JavaScript). So what can be done?

There’s an npm library out there, decimal.js, that provides support for arbitrary precision decimals. There are probably other libraries out there that have similar or equivalent functionality.

As you might imagine, the issue under discussion is old. There are workarounds using a library.

But what about extending the language of JavaScript so that the equation does get validated? Can we make JavaScript work with decimals correctly, without using a library?

Yes, we can.

Aside: Huge integers

It’s worth thinking about a similar issue that also arises from the finiteness of our machines: arbitrarily large integers in JavaScript. Out of the box, JavaScript didn’t support extremely large integers. You’ve got 32-bit or (more likely) 64-bit signed integers. But even though that’s a big range, it’s still, of course, limited. BigInt, a proposal to extend JS with precisely this kind of thing, reached Stage 4 in 2019, so it should be available in pretty much every JavaScript engine you can find. Go ahead and fire up Node or open your browser’s inspector and plug in the number of nanoseconds since the Big Bang:

13_787_000_000_000n // years
* 365n              // days
* 24n               // hours
* 60n               // minutes
* 60n               // seconds
* 1000n             // milliseconds
* 1000n             // microseconds
* 1000n             // nanoseconds

(Not a scientician. May not be true. Not intended to be a factual claim.)

Adding big decimals to the language

OK, enough about big integers. What about adding support for arbitrary precision decimals in JavaScript? Or, at least, high-precision decimals? As we see above, we don’t even need to wrack our brains trying to think of complicated scenarios where a ton of digits after the decimal point are needed. Just look at 0.1 + 0.2 = 0.3. That’s pretty low-precision, and it still doesn’t work. Is there anything analogous to BigInt for non-integer decimal numbers? No, not as a library; we already discussed that. Can we add it to the language, so that, out of the box—with no third-party library—we can work with decimals?

The answer is yes. Work is proceeding on this matter, but things remain somewhat unsettled. The relevant proposal is BigDecimal. I’ll be working on this for a while. I want to get big decimals into JavaScript. There are all sorts of issues to resolve, but they’re definitely resolvable. We have experience with arbitrary precision arithmetic in other languages. It can be done.

So yes, floating-point math is broken in JavaScript, but help is on the way. You’ll see more from me here as I tackle this interesting problem; stay tuned!

March 19, 2025 10:10 AM

Binary floats can let us down! When close enough isn't enough


If you've played Monopoly, you'll know about the Bank Error in Your Favor card in the Community Chest. Remember this?

Card from the game Monopoly: Bank error in your favor!

A bank error in your favor? Sweet! But what if the bank makes an error in its favor? Surely that's just as possible, right?

I'm here to tell you that if you're doing everyday financial calculations—nothing fancy, but involving money that you care about—and you're using binary floating-point numbers, then something might be going wrong. Let's see how binary floating-point numbers might yield bank errors in your favor—or the bank's.

In a wonderful paper on decimal floating-point numbers, Mike Cowlishaw gives an example involving the cost of a phone call.

Here's how you can reproduce that in JavaScript:

(1.05 * 0.7).toPrecision(2);
// 0.73

Some programmers might not be aware of this, but many are. By pointing this out I'm not trying to be a smartypants who knows something you don't. For me, this example illustrates just how common this sort of error might be.

For programmers who are aware of the issue, one typical approach to dealing with it is this: never work with sub-units of a currency. (Some currencies don't have this issue. If that's you and your problem domain, you can kick back and be glad that you don't need to engage in the following sorts of headaches.) For instance, when working with US dollars or euros, this approach mandates that one never works with dollars (or euros) and cents, but only with cents. In this setting, dollars exist only as an abstraction on top of cents. As far as possible, calculations never use floats. But if a floating-point number threatens to come up, some form of rounding is used.
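
As a rough illustration of that approach (the numbers simply mirror the phone call example above), keep the money in integer cents and the rate in integer basis points, so the arithmetic stays in integers until one final, explicit rounding step:

const priceCents = 70;          // $0.70
const rateBasisPoints = 10500;  // 105.00%, i.e. the price plus a 5% tax

const totalCents = Math.round((priceCents * rateBasisPoints) / 10000);
console.log(totalCents); // 74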

Another approach for a programmer is to delegate financial calculations to an external system, such as a relational database, that natively supports proper decimal calculations. One difficulty is that even if one delegates these calculations to an external system, if one lets a floating-point value flow into your program, even a value that can be trusted, it may become tainted just by being imported into a language that doesn't properly support decimals. If, for instance, the result of a calculation done in, say, Postgres, is exactly 0.1, and that flows into your JavaScript program as a number, it's possible that you'll be dealing with a contaminated value. For instance:

(0.1).toPrecision(25)
// 0.1000000000000000055511151

This example, admittedly, requires quite a lot of decimals (19!) before the ugly reality of the situation rears its head. The reality is that 0.1 does not, and cannot, have an exact representation in binary. The earlier example with the cost of a phone call is there to raise your awareness of the possibility that one doesn't need to go 19 decimal places before one starts to see some weirdness showing up.

There are all sorts of examples of this. It's exceedingly rare for a decimal number to have an exact representation in binary. Of the numbers 0.1, 0.2, …, 0.9, only 0.5 can be exactly represented in binary.

Next time you look at a bank statement, or a bill where some tax is calculated, I invite you to ask how that was calculated. Are they using decimals, or floats? Is it correct?

I'm working on the decimal proposal for TC39 to try to work out what it might be like to add proper decimal numbers to JavaScript. There are a few very interesting degrees of freedom in the design space (such as the precise datatype to be used to represent these kinds of numbers), but I'm optimistic that a reasonable path forward exists, that consensus between JS programmers and JS engine implementors can be found, and eventually implemented. If you're interested in these issues, check out the README in the proposal and get in touch!

March 19, 2025 10:09 AM

Getting started with Lean 4, your next programming language


I had the pleasure of learning about Lean 4 with David Christiansen and Joachim Breitner at their tutorial at BOBKonf 2024. I’m planning on doing a couple of formalizations with Lean and would love to share what I learn as a total newbie, working on macOS.

Needed tools

I’m on macOS and use Homebrew extensively. My simple go-to approach to finding new software is to do brew search lean. This revealed lean and also surfaced elan. Running brew info lean showed me that that package (at the time I write this) installs Lean 3. But I know, out-of-band, that Lean 4 is what I want to work with. Running brew info elan looked better, but the output reminds me that (1) the information is for the elan-init package, not the elan cask, and (2) elan-init conflicts with both elan and the aforementioned lean. Yikes! This strikes me as a potential problem for the community, because I think Lean 3, though it still works, is presumably not where new Lean development should be taking place. Perhaps the Homebrew formula for Lean should be renamed to lean3, and a new lean4 package should be made available. I’m not sure. The situation seems less than ideal, but in short, I have been successful with the elan-init package.

After installing elan-init, you’ll have the elan tool available in your shell. elan is the tool used for maintaining different versions of Lean, similar to nvm in the Node.js world or pyenv for Python.

Setting up a blank package

When I did the Lean 4 tutorial at BOB, I worked entirely within VS Code and created a new standalone package using some in-editor functionality. At the command line, I use lake init to manually create a new Lean package. At first, I made the mistake of running this command, assuming it would create a new directory for me and set up any configuration and boilerplate code there. I was surprised to find, instead, that lake init sets things up in the current directory, in addition to creating a subdirectory and populating it. Using lake --help, I read about the lake new command, which does what I had in mind. So I might suggest using lake new rather than lake init.
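
For example, to create the foobar package whose contents are shown below:

lake new foobar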

What’s in the new directory? Doing tree foobar reveals

foobar
├── Foobar
│   └── Basic.lean
├── Foobar.lean
├── Main.lean
├── lakefile.lean
└── lean-toolchain

Taking a look there, I see four .lean files. Here’s what they contain:

Main.lean
import «Foobar»

def main : IO Unit :=
  IO.println s!"Hello, {hello}!"
Foobar.lean
-- This module serves as the root of the `Foobar` library.
-- Import modules here that should be built as part of the library.
import «Foobar».Basic
Foobar/Basic.lean
def hello := "world"
lakefile.lean
import Lake
open Lake DSL

package «foobar» where
  -- add package configuration options here

lean_lib «Foobar» where
  -- add library configuration options here

@[default_target]
lean_exe «foobar» where
  root := `Main

It looks like there’s a little module structure here, and a reference to the identifier hello, defined in Foobar/Basic.lean and made available via Foobar.lean. I’m not going to touch lakefile.lean for now; as a newbie, it looks scary enough that I think I’ll just stick to things like Basic.lean.

There’s also an automatically created .git there, not shown in the directory output above.

Now what?

Now that you’ve got Lean 4 installed and set up a package, you’re ready to dive in to one of the official tutorials. The one I’m working through is David’s Functional Programming in Lean. There’s all sorts of additional things to learn, such as all the different lake commands. Enjoy!

March 19, 2025 10:09 AM

Announcing a polyfill for the TC39 decimal proposal


I’m happy to announce that the decimal proposal—a proposed extension of JavaScript to support decimal numbers—is now available as an NPM package called proposal-decimal!

(Actually, it has been available for some time, made available not long after we decided to pursue IEEE 754 Decimal128 as a data model for the decimal proposal rather than some alternatives. The old package was—and still is—available under a different name—decimal128—but I’ll be sunsetting that package in favor of the new one announced here. If you’ve been using decimal128, you can continue to use it, but you’ll probably want to switch to proposal-decimal.)

To use proposal-decimal in your project, install the NPM package. If you’re looking to use this code in Node.js or other JS engines that support ESM, you'll want to import the code like this:

import { Decimal128 } from 'proposal-decimal';
const x = new Decimal128("0.1");
// etc.

For use in a browser, the file dist/Decimal128.mjs contains the Decimal128 class and all its internal dependencies in a single file. Use it like this:

<script type="module">
import { Decimal128 } from 'path/to/Decimal128.mjs';
const x = new Decimal128("0.1");
// keep rocking decimals!
</script>

The intention of this polyfill is to track the spec text for the decimal proposal. I cannot recommend this package for production use just yet, but it is usable and I’d love to hear any experience reports you may have. We’re aiming to be as faithful as possible to the spec, so we don’t aim to be blazingly fast. That said, please do report any wild deviations in performance compared to other decimal libraries for JS as an issue. Any crashes or incorrect results should likewise be reported as an issue.

Enjoy!

March 19, 2025 10:09 AM

March 18, 2025

Manuel Rego

Two nice new additions by Igalia in the latest Safari Technology Preview.

March 18, 2025 12:00 AM

March 17, 2025

Igalia WebKit Team

WebKit Igalia Periodical #17

Update on what happened in WebKit in the week from March 10 to March 17.

Cross-Port 🐱

Web Platform 🌐

Updated button activation behaviour and type property reflection with command and commandfor. Also aligned popovertarget behaviour with latest specification.

Fixed reflection of command IDL property.

Implemented the trusted-types-eval keyword for the script-src CSP directive.

Implemented accessibility handling for command buttons.

Enabled command and commandfor in preview.

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

Fixed an integer overflow in JSC (only happens on 32-bit systems with lots of RAM).

Graphics 🖼️

Fixed theming issues in WPE/WebKitGTK with vertical writing-modes.

That’s all for this week!

by Unknown at March 17, 2025 08:31 PM

March 12, 2025

Brian Kardell

11ty Math


Markdown, LaTeX and ASCII - who writes HTML?

Very nearly none of the content that we encounter online is entirely hand authored from opening doctype to closing HTML tag - it's assembled. We have layouts, and includes and templates and so on. And, most of the actual content that we produce and consume is written in some more familiar or easier to write shorthand. For most of the people reading this, it's probably mostly markdown.

But did you know that lots of the places where you use markdown, like GitHub and GitLab, and Visual Studio, support embedding mathematical expressions written in LaTeX surrounded by $ (inline) or $$ (block)? Those are then transformed for you into rendered math with MathML.

It got me thinking that we should have a similarly standard, easy setup for 11ty. It would be a huge win to process it on the server: MathML will render natively, fast, without FOUC. It will be accessible, styleable, and will scale appropriately with text size and zoom and so on.

The super interesting thing to note about most of the tools where you can use markdown is that so many of them are built on common infrastructure: markdown-it. The architectural pattern of markdown-it allows people to write plugins, and if you're looking to match those above, you can do it pretty easily with the @mdit/plugin-katex plugin:

/* eleventy.config.js */
const markdownIt = require("markdown-it");

module.exports = async function (eleventyConfig) {

  const { katex } = (await import("@mdit/plugin-katex"));
  eleventyConfig.setLibrary(
  	"md", 
  	markdownIt().use(katex, {output: "mathml"})
  );

}

And... That's it. Now you can embed LaTeX math in your markdown just as you can in those other places and it will do the work of generating fast, native, accessible, and styleable MathML...

Some math  $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ whee.

Yields...

<p>Some math <math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><msup><mi>x</mi><mn>2</mn></msup><msup><mi>a</mi><mn>2</mn></msup></mfrac><mo>+</mo><mfrac><msup><mi>y</mi><mn>2</mn></msup><msup><mi>b</mi><mn>2</mn></msup></mfrac><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1</annotation></semantics></math> whee.</p>

Which your browser renders as...

Some math inline x²/a² + y²/b² = 1 whee.

Surrounding the math part alone with $$ instead yields block math, which renders as..

x²/a² + y²/b² = 1

Mathematical fonts are critical for rendering. We almost have universally good default math fonts, but some operating systems (eyes Android disapprovingly) still don't. However, like me, you can include some CSS to help. On this page I've included <link rel="stylesheet" href="https://fred-wang.github.io/MathFonts/STIX/mathfonts.css" />. You can read a lot more about this on MDN.

AsciiMath Math / Mathup

While LaTeX is by far the most common way that people author math, there are people who prefer AsciiMath. Rúnar Berg Baugsson Sigríðarson (@runarberg) has a nice explanation as to why they prefer to not use TeX.

Their mathup package seems pretty nice (an AsciiMath dialect more than just AsciiMath), and there is also a corresponding markdown-it-math package which is similarly easy to use...

/* eleventy.config.js */
const markdownIt = require("markdown-it");

module.exports = async function (eleventyConfig) {
  const markdownItMath = 
  	(await import('markdown-it-math')).default;

  eleventyConfig.setLibrary(
    "md", 
    markdownIt().use(markdownItMath)
  );
}

Then, you can embed AsciiMath in your markdown, fenced by the same $ or $$ and it will generate some nice MathML. For example...

$$
e = sum_(n=0)^oo 1/n!
$$

Will be transformed at build time and render in your browser as...

e = ∑_(n=0)^∞ 1/n!

Make sure you get markdown-it-math 5.0.0-rc.0 or above or this won't work. You might also consider including their stylesheet.
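
For example, to install that exact release (or swap in a newer one):

npm install markdown-it-math@5.0.0-rc.0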

markdown-it-math also supports a nice pattern for easily integrating other engines for embedded transformations like Ron Kok's Temml or Fred Wang's TeXZilla.

Unicode Math

There is also Unicode Math, which Murray Sargent III developed and had integrated into all of the Microsoft products. It's pretty nifty too if you ask me. This repo has a nice comparison of the three.

Unfortunately there is no npm module for it (yet), so for now, unfortunately that remains an open wish.

So, that's it. Enjoy. Mathify your static sites.

Before you go... The work of Math rendering in browsers is severely under funded. Almost none of the effort or funding over the last 30 years to make this possible has come from browser vendors, but rather from individual contributors and those willing to help fund them. If you appreciate the importance of this work, please consider helping to support the work with a small monthly donation, and please help us to publicly lobby implementers to invest in it.

March 12, 2025 04:00 AM

Víctor Jáquez

Using pre-commit in GStreamer

Recently, GStreamer development integrated the usage of pre-commit. pre-commit is a Git hook script that chains different linters, checkers, validators, formatters, etc., which are executed at git commit time. This script is written in Python. And there's another GStreamer utility in Python: hotdoc.

The challenge is that Debian doesn't allow installing Python packages through pip; they have to be installed as Debian packages or inside virtual environments, such as venv.

So, instead of activating a virtual environment when I work in GStreamer, let’s just use direnv to activate it automatically.

Here’s a screencast of what I did to setup a Python virtual environment, within direnv, and installing pre-commit, hotdoc and gst-indent-1.0.

UPDATE: Tim told me that with pipx we can do the same without the venv hassle.

https://mastodon.social/@tp_muller@fosstodon.org/114150178786863565

March 12, 2025 12:00 AM

March 11, 2025

Ricardo García

Device-Generated Commands at Vulkanised 2025

A month ago I attended Vulkanised 2025 in Cambridge, UK, to present a talk about Device-Generated Commands in Vulkan. The event was organized by Khronos and took place in the Arm Cambridge office. The talk I presented was similar to the one from XDC 2024, but instead of being a lightning 5-minute talk, I had 25-30 minutes to present and I could expand the contents to contain proper explanations of almost all major DGC concepts that appear in the spec.

I attended the event together with my Igalia colleagues Lucas Fryzek and Stéphane Cerveau, who presented about lavapipe and Vulkan Video, respectively. We had a fun time in Cambridge and I can sincerely recommend attending the event to any Vulkan enthusiasts out there. It allows you to meet Khronos members and people working on both the specification and drivers, as well as many other Vulkan users from a wide variety of backgrounds.

The recordings for all sessions are now publicly available, and the one for my talk can be found embedded below. For those of you preferring slides and text, I’m also providing a transcription of my presentation together with slide screenshots further down.

In addition, at the end of the video there’s a small Q&A section but I’ve always found it challenging to answer questions properly on the fly and with limited time. For this reason, instead of transcribing the Q&A section literally, I’ve taken the liberty of writing down the questions and providing better answers in written form, and I’ve also included an extra question that I got in the hallways as bonus content. You can find the Q&A section right after the embedded video.

Vulkanised 2025 recording

Video: Vulkanised 2025: Device-Generated Commands in Vulkan

Questions and answers with longer explanations

Question: can you give an example of when it’s beneficial to use Device-Generated Commands?

There are two main use cases where DGC would improve performance: on the one hand, many times game engines use compute pre-passes to analyze the scene they want to draw and prepare some data for that scene. This includes maybe deciding LOD levels, discarding content, etc. After that compute pre-pass, results would need to be analyzed from the CPU in some way. This implies a stall: the output from that compute pre-pass needs to be transferred to the CPU so the CPU can use it to record the right drawing commands, or maybe you do this compute pre-pass during the previous frame and it contains data that is slightly out of date. With DGC, this compute dispatch (or set of compute dispatches) could generate the drawing commands directly, so you don’t stall or you can use more precise data. You also save some memory bandwidth because you don’t need to copy the compute results to host-visible memory.

On the other hand, sometimes scenes contain so much detail and geometry that recording all the draw calls from the CPU takes a nontrivial amount of time, even if you distribute this draw call recording among different threads. With DGC, the GPU itself can generate these draw calls, so potentially it saves you a lot of CPU time.

Question: as the extension makes heavy use of buffer device addresses, what are the challenges for tools like GFXReconstruct when used to record and replay traces that use DGC?

The extension makes use of buffer device addresses for two separate things. First, it uses them to pass some buffer information to different API functions, instead of passing buffer handles, offsets and sizes. This is not different from other APIs that existed before. The VK_KHR_buffer_device_address extension contains APIs like vkGetBufferOpaqueCaptureAddressKHR, vkGetDeviceMemoryOpaqueCaptureAddressKHR that are designed to take care of those cases and make it possible to record and replay those traces. Contrary to VK_KHR_ray_tracing_pipeline, which has a feature to indicate if you can capture and replay shader group handles (fundamental for capture and replay when using ray tracing), DGC does not have any specific feature for capture-replay. DGC does not add any new problem from that point of view.

Second, the data for some commands that is stored in the DGC buffer sometimes includes device addresses. This is the case for the index buffer bind command, the vertex buffer bind command, indirect draws with count (double indirection here) and ray tracing command. But, again, the addresses in those commands are buffer device addresses. That does not add new challenges for capture and replay compared to what we already had.

Question: what is the deal with the last token being the one that dispatches work?

One minor detail from DGC, that’s important to remember, is that, by default, DGC respects the order in which sequences appear in the DGC buffer and the state used for those sequences. If you have a DGC buffer that dispatches multiple draws, you know the state that is used precisely for each draw: it’s the state that was recorded before the execute-generated-commands call, plus the small changes that a particular sequence modifies like push constant values or vertex and index buffer binds, for example. In addition, you know precisely the order of those draws: executing the DGC buffer is equivalent, by default, to recording those commands in a regular command buffer from the CPU, in the same order they appear in the DGC buffer.

However, when you create an indirect commands layout you can indicate that the sequences in the buffer may run in an undefined order (this is VK_INDIRECT_COMMANDS_LAYOUT_USAGE_UNORDERED_SEQUENCES_BIT_EXT). If the sequences could dispatch work and then change state, we would have a logical problem: what do those state changes affect? The sequence that is executed right after the current one? Which one is that? We would not know the state used for each draw. Forcing the work-dispatching command to be the last one is much easier to reason about and is also logically tight.

Naturally, if you have a series of draws on the CPU where, for some of them, you change some small bits of state (e.g. like disabling the depth or stencil tests) you cannot do that in a single DGC sequence. For those cases, you need to batch your sequences in groups with the same state (and use multiple DGC buffers) or you could use regular draws for parts of the scene and DGC for the rest.

Question from the hallway: do you know what drivers do exactly at preprocessing time that is so important for performance?

Most GPU drivers these days have a kernel side and a userspace side. The kernel driver does a lot of things like talking to the hardware, managing different types of memory and buffers, talking to the display controller, etc. The kernel driver normally also has facilities to receive a command list from userspace and send it to the GPU.

These command lists are particular for each GPU vendor and model. The packets that form them control different aspects of the GPU. For example (this is completely made-up), maybe one GPU has a particular packet to modify depth buffer and test parameters, and another packet for the stencil test and its parameters, while another GPU from another vendor has a single packet that controls both. There may be another packet that dispatches draw work of all kinds and is flexible to accommodate the different draw commands that are available on Vulkan.

The Vulkan userspace driver translates Vulkan command buffer contents to these GPU-specific command lists. In many drivers, the preprocessing step in DGC takes the command buffer state, combines it with the DGC buffer contents and generates a final command list for the GPU, storing that final command list in the preprocess buffer. Once the preprocess buffer is ready, executing the DGC commands is only a matter of sending that command list to the GPU.

Talk slides and transcription

Slide 1: Device-Generated Commands in Vulkan title

Hello, everyone! I’m Ricardo from Igalia and I’m going to talk about device-generated commands in Vulkan.

Slide 2: About Me

First, some bits about me. I have been part of the graphics team at Igalia since 2019. For those that don’t know us, Igalia is a small consultancy company specialized in open source, and my colleagues in the graphics team work on things such as Mesa drivers, Linux kernel drivers, compositors… that kind of thing. In my particular case the focus of my work is contributing to the Vulkan Conformance Test Suite and I do that as part of a collaboration between Igalia and Valve that has been going on for a number of years now. Just to highlight a couple of things, I’m the main author of the tests for the mesh shading extension and device-generated commands that we are talking about today.

Slide 3: What are Device-Generated Commands

So what are device-generated commands? So basically it’s a new extension, a new functionality, that allows a driver to read command sequences from a regular buffer: something like, for example, a storage buffer, instead of the usual regular command buffers that you use. The contents of the DGC buffer could be filled from the GPU itself. This is what saves you the round trip to the CPU and, that way, you can improve the GPU-driven rendering process in your application. It’s like one step ahead of indirect draws and dispatches, and one step behind work graphs. And it’s also interesting because device-generated commands provide a better foundation for translating DX12. If you have a translation layer that implements DX12 on top of Vulkan like, for example, Proton, and you want to implement ExecuteIndirect, you can do that much more easily with device generated commands. This is important for Proton, which Valve uses to run games on the Steam Deck, i.e. Windows games on top of Linux.

Slide 4: Naïve CPU-based Approach

If we set aside Vulkan for a moment, and we stop thinking about GPUs and such, and you want to come up with a naive CPU-based way of running commands from a storage buffer, how do you do that? Well, one immediate solution we can think of is: first of all, I’m going to assign a token, an identifier, to each of the commands I want to run, and I’m going to store that token in the buffer first. Then, depending on what the command is, I want to store more information.

For example, if we have a sequence like we see here in the slide where we have a push constant command followed by dispatch, I’m going to store the token for the push constants command first, then I’m going to store some information that I need for the push constants command, like the pipeline layout, the stage flags, the offset and the size. Then, after that, depending on the size that I said I need, I am going to store the data for the command, which is the push constant values themselves. And then, after that, I’m done with it, and I store the token for the dispatch, and then the dispatch size, and that’s it.
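
To make that encoding concrete, here is a small illustrative C sketch of the naive approach (this is my own illustration, not from the talk, and the struct and token names are made up):

#include <stdint.h>
#include <string.h>
#include <vulkan/vulkan.h>

/* Hypothetical tokens for the naive CPU-side encoding described above. */
enum naive_token { TOKEN_PUSH_CONSTANTS = 1, TOKEN_DISPATCH = 2 };

/* Hypothetical per-command information stored right after the token. */
struct push_constants_info {
    VkPipelineLayout   layout;
    VkShaderStageFlags stages;
    uint32_t           offset;
    uint32_t           size;
};

static size_t encode_sequence(uint8_t *buf,
                              const struct push_constants_info *info,
                              const void *push_values,
                              const uint32_t dispatch_size[3])
{
    size_t off = 0;

    /* Push constants: token, command information, then the values themselves. */
    buf[off++] = TOKEN_PUSH_CONSTANTS;
    memcpy(buf + off, info, sizeof(*info));
    off += sizeof(*info);
    memcpy(buf + off, push_values, info->size);
    off += info->size;

    /* Dispatch: token, then the dispatch size. */
    buf[off++] = TOKEN_DISPATCH;
    memcpy(buf + off, dispatch_size, 3 * sizeof(uint32_t));
    off += 3 * sizeof(uint32_t);

    return off;
}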

But this doesn’t really work: this is not how GPUs work. A GPU would have a hard time running commands from a buffer if we store them this way. And this is not how Vulkan works because in Vulkan you want to provide as much information as possible in advance and you want to make things run in parallel as much as possible, and take advantage of the GPU.

Slide 5: VK_EXT_device_generated_commands

So what do we do in Vulkan? In Vulkan, and in the Vulkan VK_EXT_device_generated_commands extension, we have this central concept, which is called the Indirect Commands Layout. This is the main thing, and if you want to remember just one thing about device generated commands, you can remember this one.

The indirect commands layout is basically like a template for a short sequence of commands. The way you build this template is using the tokens and the command information that we saw colored red and green in the previous slide, and you build that in advance and pass that in advance so that, in the end, in the command buffer itself, in the buffer that you’re filling with commands, you don’t need to store that information. You just store the data for each command. That’s how you make it work.

And the result of this is that with the commands layout, that I said is a template for a short sequence of commands (and by short I mean a handful of them like just three, four or five commands, maybe 10), the DGC buffer can be pretty large, but it does not contain a random sequence of commands where you don’t know what comes next. You can think about it as divided into small chunks that the specification calls sequences, and you get a large number of sequences stored in the buffer but all of them follow this template, this commands layout. In the example we had, push constant followed by dispatch, the contents of the buffer would be push constant values, dispatch size, push content values, dispatch size, many times repeated.
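
As an illustration (again mine, not from the talk), one sequence of that push-constant-plus-dispatch layout could look like this in memory, and the DGC buffer is simply many of these packed one after another:

/* Illustrative only: the real layout is defined by the indirect commands
 * layout (token offsets and stride), not by a C struct. */
struct example_sequence {
    uint32_t push_constant_values[4]; /* data for the push constant token */
    uint32_t dispatch_x;              /* data for the dispatch token */
    uint32_t dispatch_y;
    uint32_t dispatch_z;
};

/* DGC buffer contents: sequence 0, sequence 1, ..., sequence N-1,
 * each one starting "stride" bytes after the previous one. */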

Slide 6: Restricted Command Selection

The second thing that Vulkan does to be able to make this work is that we limit a lot what you can do with device-generated commands. There are a lot of things you cannot do. In fact, the only things you can do are the ones that are present in this slide.

You have some things like, for example, update push constants, you can bind index buffers, vertex buffers, and you can draw in different ways, using mesh shading maybe, you can dispatch compute work and you can dispatch raytracing work, and that’s it. You also need to check which features the driver supports, because maybe the driver only supports device-generated commands for compute or ray tracing or graphics. But you notice you cannot do things like start render passes or insert barriers or bind descriptor sets or that kind of thing. No, you cannot do that. You can only do these things.

Slide 7: Indirect Commands Layout

This indirect commands layout, which is the backbone of the extension, specifies, as I said, the layout for each sequence in the buffer, and it has additional restrictions. The first one is that it must specify exactly one token that dispatches some kind of work, and it must be the last token in the sequence. You cannot have a sequence that dispatches graphics work twice, or that dispatches compute work twice, or that dispatches compute first and then draws, or something like that. No, you can only do one thing with each DGC buffer and each commands layout, and it has to be the last one in the sequence.

And one interesting thing that also Vulkan allows you to do, that DX12 doesn’t let you do, is that it allows you (on some drivers, you need to check the properties for this) to choose which shaders you want to use for each sequence. This is a restricted version of the bind pipeline command in Vulkan. You cannot choose arbitrary pipelines and you cannot change arbitrary states but you can switch shaders. For example, if you want to use a different fragment shader for each of the draws in the sequence, you can do that. This is pretty powerful.

Slide 8: Indirect Commands Layout creation structure

How do you create one of those indirect commands layouts? Well, with one of those typical Vulkan calls to create an object, passing one of the CreateInfo structures that are always present in Vulkan.

And, as you can see, you have to pass the shader stages that will be used, that will be active, while you draw or execute those indirect commands. You have to pass the pipeline layout, and you have to pass an indirect stride. The stride is the number of bytes for each sequence, from the start of one sequence to the next one. And the most important information, of course, is the list of tokens: an array of tokens that you pass as the token count and then the pointer to the first element.

Slide 9: Indirect Commands Layout layout token struct

Now, each of those tokens contains a bit of information, and the most important one is the type, of course. Then you can also pass an offset that tells you how many bytes into the sequence the data for that command starts. Together with the stride, this means you don’t need to pack the data for those commands together. If you want to include some padding, because it’s convenient or something, you can do that.

Slide 10: Indirect Commands Layout token data

And then there’s also the token data, which allows you to pass the information that I was painting in green in other slides: information needed to run the command with some extra parameters. Only a few tokens, a few commands, need that. Depending on which command it is, you have to fill one of the pointers in the union, but most commands don’t need this kind of information. Knowing which command it is, you already know you are going to find some fixed data in the buffer, and you just read that and process it.
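
Putting slides 8 to 10 together, a rough C sketch of creating such a layout could look like the code below. I am writing the structure, field and enum names for VK_EXT_device_generated_commands from memory, so treat this as a sketch and double-check every name against the Vulkan registry; pipelineLayout and device are assumed to exist already.

/* Two tokens: push constants first, then the compute dispatch. */
VkIndirectCommandsLayoutTokenEXT tokens[2] = {0};

tokens[0].sType  = VK_STRUCTURE_TYPE_INDIRECT_COMMANDS_LAYOUT_TOKEN_EXT;
tokens[0].type   = VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_EXT;
tokens[0].offset = 0;  /* push constant values at the start of each sequence */
/* tokens[0].data would point to the push constant range information
 * (the "green" data from the slides). */

tokens[1].sType  = VK_STRUCTURE_TYPE_INDIRECT_COMMANDS_LAYOUT_TOKEN_EXT;
tokens[1].type   = VK_INDIRECT_COMMANDS_TOKEN_TYPE_DISPATCH_EXT;
tokens[1].offset = 16; /* dispatch size stored right after 16 bytes of push constants */

VkIndirectCommandsLayoutCreateInfoEXT layoutInfo = {0};
layoutInfo.sType          = VK_STRUCTURE_TYPE_INDIRECT_COMMANDS_LAYOUT_CREATE_INFO_EXT;
layoutInfo.shaderStages   = VK_SHADER_STAGE_COMPUTE_BIT;
layoutInfo.pipelineLayout = pipelineLayout;            /* assumed to exist already */
layoutInfo.indirectStride = 16 + 3 * sizeof(uint32_t); /* bytes from one sequence to the next */
layoutInfo.tokenCount     = 2;
layoutInfo.pTokens        = tokens;

VkIndirectCommandsLayoutEXT cmdsLayout;
vkCreateIndirectCommandsLayoutEXT(device, &layoutInfo, NULL, &cmdsLayout);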

Slide 11: Indirect Execution Sets

One thing that is interesting, like I said, is the ability to switch shaders and to choose which shaders are going to be used for each of those individual sequences. Some form of pipeline switching, or restricted pipeline switching. To do that you have to create something that is called Indirect Execution Sets.

Each of these execution sets is like a group or an array, if you want to think about it like that, of pipelines: similar pipelines or shader objects. They have to share something in common, which is that all of the state in the pipeline has to be identical, basically. Only the shaders can change.

When you create these execution sets and you start adding pipelines or shaders to them, you assign an index to each pipeline in the set. Then, you pass this execution set beforehand, before executing the commands, so that the driver knows which set of pipelines you are going to use. And then, in the DGC buffer, when you have this pipeline token, you only have to store the index of the pipeline that you want to use. You create the execution set with 20 pipelines and you pass an index for the pipeline that you want to use for each draw, for each dispatch, or whatever.

Slide 12: Indirect Execution Sets creation structures

The way to create the execution sets is the one you see here, where we have, again, one of those CreateInfo structures. There, we have to indicate the type, which is pipelines or shader objects. Depending on that, you have to fill one of the pointers from the union on the top right here.

If we focus on pipelines because it’s easier on the bottom left, you have to pass the maximum pipeline count that you’re going to store in the set and an initial pipeline. The initial pipeline is what is going to set the template that all pipelines in the set are going to conform to. They all have to share essentially the same state as the initial pipeline and then you can change the shaders. With shader objects, it’s basically the same, but you have to pass more information for the shader objects, like the descriptor set layouts used by each stage, push-constant information…​ but it’s essentially the same.

Slide 13: Indirect Execution Sets update instructions

Once you have that execution set created, you can use those two functions (vkUpdateIndirectExecutionSetPipelineEXT and vkUpdateIndirectExecutionSetShaderEXT) to update and add pipelines to that execution set. You need to take into account that you have to pass a couple of special creation flags to the pipelines, or the shader objects, to tell the driver that you may use those inside an execution set because the driver may need to do something special for them. And one additional restriction that we have is that if you use an execution set token in your sequences, it must appear only once and it must be the first one in the sequence.
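
A heavily hedged sketch of the execution set path follows; structure and field names are written from memory, so verify them against the extension spec before relying on this, and initialPipeline, anotherPipeline and device are assumed to exist already.

/* Create an execution set that can hold up to 20 similar pipelines. */
VkIndirectExecutionSetPipelineInfoEXT pipelineInfo = {0};
pipelineInfo.sType            = VK_STRUCTURE_TYPE_INDIRECT_EXECUTION_SET_PIPELINE_INFO_EXT;
pipelineInfo.initialPipeline  = initialPipeline; /* sets the state "template" for the set */
pipelineInfo.maxPipelineCount = 20;

VkIndirectExecutionSetCreateInfoEXT setInfo = {0};
setInfo.sType              = VK_STRUCTURE_TYPE_INDIRECT_EXECUTION_SET_CREATE_INFO_EXT;
setInfo.type               = VK_INDIRECT_EXECUTION_SET_INFO_TYPE_PIPELINES_EXT;
setInfo.info.pPipelineInfo = &pipelineInfo;

VkIndirectExecutionSetEXT execSet;
vkCreateIndirectExecutionSetEXT(device, &setInfo, NULL, &execSet);

/* Store another pipeline at index 5 in the set; the DGC buffer then only
 * needs to contain that index in its execution set token. */
VkWriteIndirectExecutionSetPipelineEXT write = {0};
write.sType    = VK_STRUCTURE_TYPE_WRITE_INDIRECT_EXECUTION_SET_PIPELINE_EXT;
write.index    = 5;
write.pipeline = anotherPipeline; /* created with the special creation flag mentioned above */

vkUpdateIndirectExecutionSetPipelineEXT(device, execSet, 1, &write);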

Slide 14: Recap So Far

The recap, so far, is that the DGC buffer is divided into small chunks that we call sequences. Each sequence follows a template that we call the Indirect Commands Layout. Each sequence must dispatch work exactly once, and you may be able to switch the set of shaders used with each sequence with an Indirect Execution Set.

Slide 15: Executing Work with DGC

How do we go about actually telling Vulkan to execute the contents of a specific buffer? Well, before executing the contents of the DGC buffer the application needs to have bound all the needed state to run those commands. That includes descriptor sets, initial push constant values, initial shader state, initial pipeline state. Even if you are going to use an Execution Set to switch shaders later, you have to specify some kind of initial shader state.

Slide 16: Executing Work with DGC function call and info structure

Once you have that, you can call this vkCmdExecuteGeneratedCommands. You bind all the state into your regular command buffer and then you record this command to tell the driver: at this point, execute the contents of this buffer. As you can see, you typically pass a regular command buffer as the first argument. Then there’s some kind of boolean value called isPreprocessed, which is kind of confusing because it’s the first time it appears and you don’t know what it is about, but we will talk about it in a minute. And then you pass a relatively larger structure containing information about what to execute.

In that GeneratedCommandsInfo structure, you need to pass again the shader stages that will be used. You have to pass the handle for the Execution Set, if you’re going to use one (if not you can use the null handle). Of course, the indirect commands layout, which is the central piece here. And then you pass the information about the buffer that you want to execute, which is the indirect address and the indirect address size as the buffer size. We are using buffer device address to pass information.

And then we have something again mentioning some kind of preprocessing thing, which is really weird: preprocess address and preprocess size which looks like a buffer of some kind (we will talk about it later). You have to pass the maximum number of sequences that you are going to execute. Optionally, you can also pass a buffer address for an actual counter of sequences. And the last thing that you need is the max draw count, but you can forget about that if you are not dispatching work using draw-with-count tokens as it only applies there. If not, you leave it as zero and it should work.
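
As a sketch of how those pieces fit together when recording (member names taken from the fields just described; exact spellings should still be checked against the spec, and cmdsLayout, dgcBufferAddress and the other handles are assumed to exist), the call could look like this:

VkGeneratedCommandsInfoEXT genInfo = {0};
genInfo.sType                  = VK_STRUCTURE_TYPE_GENERATED_COMMANDS_INFO_EXT;
genInfo.shaderStages           = VK_SHADER_STAGE_COMPUTE_BIT;
genInfo.indirectExecutionSet   = VK_NULL_HANDLE;    /* only needed when switching shaders */
genInfo.indirectCommandsLayout = cmdsLayout;        /* the central piece */
genInfo.indirectAddress        = dgcBufferAddress;  /* buffer device address of the DGC buffer */
genInfo.indirectAddressSize    = dgcBufferSize;
genInfo.preprocessAddress      = preprocessAddress; /* scratch buffer, explained below */
genInfo.preprocessSize         = preprocessSize;
genInfo.maxSequenceCount       = maxSequences;
genInfo.sequenceCountAddress   = 0;                 /* optional GPU-side sequence counter */
genInfo.maxDrawCount           = 0;                 /* only for draw-with-count tokens */

/* isPreprocessed = VK_FALSE: let the driver preprocess implicitly at execution time. */
vkCmdExecuteGeneratedCommandsEXT(cmdBuf, VK_FALSE, &genInfo);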

Slide 17: Executing Work with DGC preprocessing fields highlight

We have a couple of things here that we haven’t talked about yet, which are the preprocessing things. Starting from the bottom, that preprocess address and size give us a hint that there may be a pre-processing step going on. Some kind of thing that the driver may need to do before actually executing the commands, and we need to pass information about the buffer there.

The boolean value that we pass to the command ExecuteGeneratedCommands tells us that the pre-processing step may have happened before so it may be possible to explicitly do that pre-processing instead of letting the driver do that at execution time. Let’s take a look at that in more detail.

Slide 18: Preprocess Buffer

First of all, what is the pre-process buffer? The pre-process buffer is auxiliary space, a scratch buffer, because some drivers need to take a look at what the command sequence looks like before actually starting to execute things. They need to go over the sequence first and they need to write a few things down just to be able to properly do the job later, to execute those commands.

Once you have the commands layout and you have the maximum number of sequences that you are going to execute, you can call this vkGetGeneratedCommandMemoryRequirementsEXT and the driver is going to tell you how much space it needs. Then, you can create a buffer, you can allocate the space for that, you need to pass a special new buffer usage flag (VK_BUFFER_USAGE_2_PREPROCESS_BUFFER_BIT_EXT) and, once you have that buffer, you pass the address and you pass a size in the previous structure.

Slide 19: Explicit Preprocessing

Now the second thing is that we have the possibility of doing this preprocessing step explicitly. Explicit pre-processing is something that’s optional, but you probably want to do it if you care about performance, because it’s the key to performance with some drivers.

When you use explicit pre-processing you don’t want to (1) record the state, (2) call this vkPreProcessGeneratedCommandsEXT and (3) call vkExecuteGeneratedCommandsEXT. That is what implicit pre-processing does so this doesn’t give you anything if you do it this way.

This is designed so that, if you want to do explicit pre-processing, you’re going to probably want to use a separate command buffer for pre-processing. You want to batch pre-processing calls together and submit them all together to keep the GPU busy and to give you the performance that you want. While you submit the pre-processing steps you may be still preparing the rest of the command buffers to enqueue the next batch of work. That’s the key to doing pre-processing optimally.

You need to decide beforehand if you are going to use explicit pre-processing or not because, if you’re going to use explicit preprocessing, you need to pass a flag when you create the commands layout, and then you have to call the function to preprocess generated commands. If you don’t pass that flag, you cannot call the preprocessing function, so it’s an all or nothing. You have to decide, and you do what you want.

Slide 20: Explicit Preprocessing (continued)

One thing that is important to note is that preprocessing needs to see the same state, the same contents of the input buffers, as execution, so it can run properly.

The video contains a cut here because the presentation laptop ran out of battery.

Slide 21: Explicit Preprocessing (continued) state command buffer

If the pre-processing step needs to have the same state as the execution, you need to have bound the same pipeline state, the same shaders, the same descriptor sets, the same contents. I said that explicit pre-processing is normally used using a separate command buffer that we submit before actual execution. You have a small problem to solve, which is that you would need to record state twice: once on the pre-process command buffer, so that the pre-process step knows everything, and once on the execution, the regular command buffer, when you call execute. That would be annoying.

Instead of that, the pre-process generated commands function takes an argument that is a state command buffer and the specification tells you: this is a command buffer that needs to be in the recording state, and the pre-process step is going to read the state from it. This is the first time, and I think the only time in the specification, that something like this is done. You may be puzzled about what this is exactly: how do you use this and how do we pass this?

Slide 22: Explicit Preprocessing (continued) state command buffer ergonomics

I just wanted to get this slide out to tell you: if you’re going to use explicit pre-processing, the ergonomic way of using it and how we thought about using the processing step is like you see in this slide. You take your main command buffer and you record all the state first and, just before calling execute-generated-commands, the regular command buffer contains all the state that you want and that preprocess needs. You stop there for a moment and then you prepare your separate preprocessing command buffer passing the main one as an argument to the preprocess call, and then you continue recording commands in your regular command buffer. That’s the ergonomic way of using it.

Slide 23: Synchronization from filling the DGC buffer to reading from it

You do need some synchronization at some steps. The main one is that, if you generate the contents of the DGC buffer from the GPU itself, you’re going to need some synchronization: writes to that buffer need to be synchronized with something else that comes later, which is executing or reading those commands from the buffer.

Depending on whether you use explicit preprocessing, you synchronize those writes either with the new command-preprocess pipeline stage and its preprocess-read access, or with regular device-generated-commands execution, which is considered part of the existing draw-indirect stage, using indirect-command-read access.

Slide 24: Synchronization (continued) from explicit preprocessing to execution

If you use explicit pre-processing you need to make sure that writes to the pre-process buffer happen before you start reading from that. So you use these just here (VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_EXT, VK_ACCESS_COMMAND_PREPROCESS_WRITE_BIT_EXT) to synchronize processing with execution (VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT, VK_ACCESS_INDIRECT_COMMAND_READ_BIT) if you use explicit preprocessing.
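
As a sketch (using the classic vkCmdPipelineBarrier call and the stage and access bits listed above), that barrier between explicit preprocessing and execution could be recorded like this:

VkMemoryBarrier barrier = {0};
barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_COMMAND_PREPROCESS_WRITE_BIT_EXT;
barrier.dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;

vkCmdPipelineBarrier(cmdBuf,
                     VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_EXT, /* writes from preprocessing */
                     VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,          /* reads at execution time */
                     0,           /* no dependency flags */
                     1, &barrier, /* one global memory barrier */
                     0, NULL,     /* no buffer memory barriers */
                     0, NULL);    /* no image memory barriers */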

Slide 25: Quick How-To

The quick how-to: I just wanted to get this slide out for those wanting a reference that says exactly what you need to do. All the steps that I mentioned here about creating the commands layout, the execution set, allocating the preprocess buffer, etc. This is the basic how-to.

Slide 26: Thanks for watching!

And that’s it. Thanks for watching! Questions?

March 11, 2025 04:30 PM

Stéphane Cerveau

Vulkanised 2025

Vulkanised

UK calling #

It had been a long time since I had seen this beautiful grey sky, roast beef on Sunday, and large but fully packed pubs whenever there is a football or a rugby game (the rose team has been lucky this year, grrr).

It was a delightful journey in the UK, starting with my family visiting London and packing a lot (yes, a lot…) of sights into a very short amount of time. But we managed to fit everything in. We saw the Changing of the Guard, the Thames tide from a boat, Harry Potter gift shops, and the beautiful Arsenal stadium with its legendary pitch, one of the best in England.

That was our last attraction in London, and then it was time for my family to head back home via Stansted, and for me to go to Cambridge and its legendary university.

To start the journey in Cambridge, I first got some rest at the hotel on Monday to be ready for the hail of information I would get during the conference. This year, Vulkanised took place on Arm’s campus; they kindly hosted the event, providing everything we needed to feel at home and comfortable.

On the first day, we started with an introduction from Ralph Potter, the Vulkan Working Group Chair at Khronos, who introduced the new 1.4 release and all the extensions coming along with it, including “Vulkan Video”. Then we could start the conference with my favorite topic, decoding video content with Vulkan Video. And the game was on! There was a presentation every 30 minutes, with a break every 3 presentations, including a neat one from my Igalia colleague Ricardo García about Device-Generated Commands in Vulkan. It took a lot of mental energy to keep up with all the topics, as each presentation was more interesting than the last. During the breaks, we had time to relax with good coffee, delicious cookies, and nice conversations.

The first day ended with a tooling demonstration from LunarG, helping us all to understand and tame the Vulkan beast. The beast is ours now!

As I was not in the best shape due to a bug I had caught on Sunday, I decided to play it safe and went back to the hotel right after a nice Indian meal. I had to prepare myself for the next day, when I would present “Vulkan Video is Open: Application Showcase”.

Vulkan Video is Open: Application showcase ! #

First, Srinath Kumarapuram from Nvidia gave a presentation about the new extensions made available during 2024 by the Vulkan Video TSG. It started with a brief timeline of the video extensions, from the initial h26x decoding to the latest VP9 decode coming this year, including the 2024 extensions such as the AV1 codec. Then he presented more specific extensions such as VK_KHR_video_encode_quantization_map and VK_KHR_video_maintenance2, released during 2024, and VK_KHR_video_encode_intra_refresh, coming in 2025. He mentioned that the Vulkan toolbox now completely supports Vulkan Video, including the Validation Layers, Vulkan Profiles, vulkaninfo and GFXReconstruct.

After some deserved applause for a neat presentation, it was my time to be on stage.

During this presentation I focused on the open source ecosystem around Vulkan Video. Indeed, Vulkan Video ships with a sample app which is totally open, along with the regular Conformance Test Suite. But that’s not all! Two major frameworks now ship with Vulkan Video support: GStreamer and FFmpeg.

Before that, I started by talking about Mesa, the open graphics library. This library, which is totally open, provides drivers which support the Vulkan Video extensions and allow applications to run Vulkan Video decode or encode. The 3 major chip vendors are now supported. It started in 2022 with RADV, a userspace driver that implements the Vulkan API on most modern AMD GPUs. This driver supports all the Vulkan Video extensions except the latest ones, such as VK_KHR_video_encode_quantization_map or VK_KHR_video_maintenance2, but they should be implemented sometime in 2025. Intel GPUs are now supported with the ANV driver, which also supports the common video extensions such as the H264/5 and AV1 codecs. The last driver to gain support, at the end of 2024, was NVK, a Vulkan driver for NVIDIA GPUs, where several of the Vulkan Video extensions were introduced. This driver is still experimental, but it’s possible to decode H264 and H265 content with it, as with the proprietary driver. This completes the offering of the main GPUs on the market.

Then I moved to the applications, including GStreamer, FFmpeg and Vulkan-Video-Samples. In addition to the extensions supported in 2025, we talked mainly about decode conformance using Fluster. To compare all the implementations, including the driver, the version and the framework, a spreadsheet can be found here. In this spreadsheet we summarize the 3 supported codecs (H264, H265 and AV1) with their associated test suites and compare their implementations using Vulkan Video (or not; see the results for VAAPI with GStreamer). GStreamer, my favorite playground, has been able to decode H264 and H265 since 1.24 and recently got support for AV1, but the merge request is still under review. It passes more than 80% of the H264 test vectors in JVT-AVC_V1 and 85% of the H265 test vectors in JCT-VC-HEVC_V1. FFmpeg offers better figures, passing 90% of the tests. It supports all the available codecs, including all of the encoders as well. And finally, Vulkan-Video-Samples is the app that you want to use to support all codecs for both encode and decode, but it’s currently missing support for Mesa drivers when it comes to using the Fluster decode tests…

Vulkanised on the 3rd day #

During the 3rd day, we had interesting talks as well demonstrating the power of Vulkan, from Blender, a free and open-source 3D computer graphics software tool progressively switching to Vulkan, to the implementation of a 3D game engine in Rust, or compute shaders in astronomy. My other Igalia colleague, Lucas Fryzek, also had a presentation on Lavapipe, Mesa’s software renderer for Vulkan, which allows you to have a hardware-free implementation of Vulkan and to validate extensions in a simpler way. Finally, we finished this prolific and dense conference with Android and its close collaboration with Vulkan.

If you are interested in 3D graphics, I encourage you to attend future Vulkanised editions, which are full of passionate people. And if you cannot attend, you can still watch the presentations online.

If you are interested in the Vulkan Video presentation I gave, you can catch up with the video here:

Or follow our Igalia live blog post on Vulkan Video:

https://blogs.igalia.com/vjaquez/vulkan-video-status/

As usual, if you would like to learn more about Vulkan, GStreamer or any other open multimedia framework, please feel free to contact us!

March 11, 2025 12:00 AM

March 10, 2025

Igalia WebKit Team

WebKit Igalia Periodical #16

Update on what happened in WebKit in the week from March 3 to March 10.

Cross-Port 🐱

Web Platform 🌐

Forced styling to field-sizing: fixed when an input element is auto filled, and added support for changing field-sizing dynamically.

Fixed an issue where the imperative popover APIs didn't take into account the source parameter for focus behavior.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Fixed YouTube breakage on videos with advertisements. The fix prevents scrolling to the comments section when the videos are fullscreened, but having working video playback was considered more important for now.

Graphics 🖼️

Fixed re-layout issues for form controls with the experimental field-sizing implementation.

Landed a change that improves the quality of damage rectangles and reduces the amount of painting done in the compositor in some simple scenarios.

Introduced a hybrid threaded rendering mode, scheduling tasks to both the CPU and GPU worker pools. By default we use CPU-affine rendering on WPE, and GPU-affine rendering on the GTK port, saturating the CPU/GPU worker pool first, before switching to the GPU/CPU.

Infrastructure 🏗️

We have recently enabled automatic nightly runs of WPT tests with WPE for the Web Platform Tests (WPT) dashboard. If you click on the “Edit” button on the wpt.fyi dashboard, there is now an option to select WPE.

For example, one may compare the results for WPE to other browsers or examine the differences between the WPE and GTK ports.

These nightly runs now happen daily on the TaskCluster CI sponsored by Mozilla (thanks to James Graham!). If you want to run WPT tests with WPE WebKit locally, there are instructions in the WPT documentation.

That’s all for this week!

by Unknown at March 10, 2025 10:59 PM

March 07, 2025

Andy Wingo

whippet lab notebook: untagged mallocs, bis

Earlier this week I took an inventory of how Guile uses the Boehm-Demers-Weiser (BDW) garbage collector, with the goal of making sure that I had replacements for all uses lined up in Whippet. I categorized the uses into seven broad categories, and I was mostly satisfied that I have replacements for all except the last: I didn’t know what to do with untagged allocations: those that contain arbitrary data, possibly full of pointers to other objects, and which don’t have a header that we can use to inspect on their type.

But now I do! Today’s note is about how we can support untagged allocations of a few different kinds in Whippet’s mostly-marking collector.

inside and outside

Why bother supporting untagged allocations at all? Well, if I had my way, I wouldn’t; I would just slog through Guile and fix all uses to be tagged. There are only a finite number of use sites and I could get to them all in a month or so.

The problem comes for uses of scm_gc_malloc from outside libguile itself, in C extensions and embedding programs. These users are loath to adapt to any kind of change, and garbage-collection-related changes are the worst. So, somehow, we need to support these users if we are not to break the Guile community.

on intent

The problem with scm_gc_malloc, though, is that it is missing an expression of intent, notably as regards tagging. You can use it to allocate an object that has a tag and thus can be traced precisely, or you can use it to allocate, well, anything else. I think we will have to add an API for the tagged case and assume that anything that goes through scm_gc_malloc is requesting an untagged, conservatively-scanned block of memory. Similarly for scm_gc_malloc_pointerless: you could be allocating a tagged object that happens to not contain pointers, or you could be allocating an untagged array of whatever. A new API is needed there too for pointerless untagged allocations.

on data

Recall that the mostly-marking collector can be built in a number of different ways: it can support conservative and/or precise roots, it can trace the heap precisely or conservatively, it can be generational or not, and the collector can use multiple threads during pauses or not. Consider a basic configuration with precise roots. You can make tagged pointerless allocations just fine: the trace function for that tag is just trivial. You would like to extend the collector with the ability to make untagged pointerless allocations, for raw data. How to do this?

Consider first that when the collector goes to trace an object, it can’t use bits inside the object to discriminate between the tagged and untagged cases. Fortunately though the main space of the mostly-marking collector has one metadata byte for each 16 bytes of payload. Of those 8 bits, 3 are used for the mark (five different states, allowing for future concurrent tracing), two for the precise field-logging write barrier, one to indicate whether the object is pinned or not, and one to indicate the end of the object, so that we can determine object bounds just by scanning the metadata byte array. That leaves 1 bit, and we can use it to indicate untagged pointerless allocations. Hooray!
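
As an illustration (the bit positions here are my own invention, not necessarily the ones Whippet actually uses), the metadata byte could be carved up like this:

/* One metadata byte per 16 bytes of payload; bit assignments are illustrative. */
#define MD_MARK_MASK            0x07 /* 3 bits: five mark states, with room for concurrent tracing */
#define MD_LOGGED_0             0x08 /* 2 bits: precise field-logging write barrier */
#define MD_LOGGED_1             0x10
#define MD_PINNED               0x20 /* object must not move */
#define MD_END                  0x40 /* last granule of the object */
#define MD_UNTAGGED_POINTERLESS 0x80 /* the newly claimed bit: raw data, nothing to trace */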

However there is a wrinkle: when Whippet decides that it should evacuate an object, it tracks the evacuation state in the object itself; the embedder has to provide an implementation of a little state machine, allowing the collector to detect whether an object is forwarded or not, to claim an object for forwarding, to commit a forwarding pointer, and so on. We can’t do that for raw data, because all bit states belong to the object, not the collector or the embedder. So, we have to set the “pinned” bit on the object, indicating that these objects can’t move.

We could in theory manage the forwarding state in the metadata byte, but we don’t have the bits to do that currently; maybe some day. For now, untagged pointerless allocations are pinned.

on slop

You might also want to support untagged allocations that contain pointers to other GC-managed objects. In this case you would want these untagged allocations to be scanned conservatively. We can do this, but if we do, it will pin all objects.

Thing is, conservative stack roots is a kind of a sweet spot in language run-time design. You get to avoid constraining your compiler, you avoid a class of bugs related to rooting, but you can still support compaction of the heap.

How is this, you ask? Well, consider that you can move any object for which we can precisely enumerate the incoming references. This is trivially the case for precise roots and precise tracing. For conservative roots, we don’t know whether a given edge is really an object reference or not, so we have to conservatively avoid moving those objects. But once you are done tracing conservative edges, any live object that hasn’t yet been traced is fair game for evacuation, because none of its predecessors have yet been visited.

But once you add conservatively-traced objects back into the mix, you don’t know when you are done tracing conservative edges; you could always discover another conservatively-traced object later in the trace, so you have to pin everything.

The good news, though, is that we have gained an easier migration path. I can now shove Whippet into Guile and get it running even before I have removed untagged allocations. Once I have done so, I will be able to allow for compaction / evacuation; things only get better from here.

Also as a side benefit, the mostly-marking collector’s heap-conservative configurations are now faster, because we have metadata attached to objects which allows tracing to skip known-pointerless objects. This regains an optimization that BDW has long had via its GC_malloc_atomic, used in Guile since time out of mind.

fin

With support for untagged allocations, I think I am finally ready to start getting Whippet into Guile itself. Happy hacking, and see you on the other side!

by Andy Wingo at March 07, 2025 01:47 PM

March 04, 2025

Abhijeet Kandalkar

Testing WebXR on Windows

WebXR on Windows

It started with my need to debug Chromium’s implementation of OpenXR. I wanted to understand how Chromium interfaces with OpenXR APIs. However, I noticed that only the Android and Windows ports of Chromium currently support OpenXR bindings. Since I needed to debug a desktop implementation, Windows was the only viable option. Additionally, I did not have access to a physical XR device, so I explored whether a simulator or emulator environment could be used to test WebXR support for websites.

Understanding WebXR and OpenXR

Before diving into implementation details, it’s useful to understand what WebXR and OpenXR are and how they differ.

WebXR is a web standard that enables immersive experiences, such as Virtual Reality (VR) and Augmented Reality (AR), in web browsers. It allows developers to create XR content using JavaScript and run it directly in a browser without requiring platform-specific applications.

OpenXR is a cross-platform API standard developed by the Khronos Group, designed to unify access to different XR hardware and software. It provides a common interface for VR and AR devices, ensuring interoperability across different platforms and vendors.

The key difference is that WebXR is a high-level API used by web applications to access XR experiences, whereas OpenXR is a low-level API used by platforms and engines to communicate with XR hardware. WebXR implementations, such as the one in Chromium, use OpenXR as the backend to interact with different XR runtimes.

Chromium OpenXR Implementation

Chromium’s OpenXR implementation, which interacts with the platform-specific OpenXR runtime, is located in the device/vr/ directory. WebXR code interacts with this device/vr/ OpenXR implementation, which abstracts WebXR features across multiple platforms.

WebXR ---> device/vr/ ---> OpenXR API ---> OpenXR runtime

Installing OpenXR Runtime

To run OpenXR on Windows, you need to install an OpenXR runtime. You can download and install OpenXR Tools for Windows Mixed Reality from the Microsoft App Store:

OpenXR Tools for Windows Mixed Reality

If it is not available on your machine, you can enable it from the OpenXR Runtime tab in the application.

Installing Microsoft Mixed Reality Simulator

To set up a simulated environment for WebXR testing, follow these steps:

  1. Install Mixed Reality Portal from the Microsoft App Store.
  2. Follow the official Microsoft guide on enabling the Mixed Reality simulator: Using the Windows Mixed Reality Simulator

If you encounter hardware compatibility errors, refer to the troubleshooting steps in the guide below.

https://www.thewindowsclub.com/disable-hardware-requirement-checks-for-mixed-reality-portal

Connecting Chromium to OpenXR Implementation

Chromium provides a flag to select the OpenXR implementation.

  1. Open Chrome and navigate to:
    chrome://flags/#webxr-runtime
    
  2. Set the flag to OpenXR.

This enables Chromium to use the OpenXR runtime for WebXR applications.

Launch WebVR application

Launch Chromium and open: https://immersive-web.github.io/webxr-samples/immersive-vr-session.html

output

CallStack

When we call navigator.xr.requestSession("immersive-vr"); from JavaScript, the call stack below gets triggered.

callstack

Conclusions

With this setup, you can explore and debug WebXR applications on Windows even without a physical VR headset. The combination of Chromium’s OpenXR implementation and Microsoft’s Mixed Reality Simulator provides a practical way to test WebXR features and interactions.

If you’re interested in further experimenting, try developing a simple WebXR scene to validate your setup! Additionally, we plan to post more about Chromium’s architecture on OpenXR and will link those posts here once they are ready.

March 04, 2025 06:30 PM

Andy Wingo

whippet lab notebook: on untagged mallocs

Salutations, populations. Today’s note is more of a work-in-progress than usual; I have been finally starting to look at getting Whippet into Guile, and there are some open questions.

inventory

I started by taking a look at how Guile uses the Boehm-Demers-Weiser collector‘s API, to make sure I had all my bases covered for an eventual switch to something that was not BDW. I think I have a good overview now, and have divided the parts of BDW-GC used by Guile into seven categories.

implicit uses

Firstly there are the ways in which Guile’s run-time and compiler depend on BDW-GC’s behavior, without actually using BDW-GC’s API. By this I mean principally that we assume that any reference to a GC-managed object from any thread’s stack will keep that object alive. The same goes for references originating in global variables, or static data segments more generally. Additionally, we rely on GC objects not to move: references to GC-managed objects in registers or stacks are valid across a GC boundary, even if those references are outside the GC-traced graph: all objects are pinned.

Some of these “uses” are internal to Guile’s implementation itself, and thus amenable to being changed, albeit with some effort. However some escape into the wild via Guile’s API, or, as in this case, as implicit behaviors; these are hard to change or evolve, which is why I am putting my hopes on Whippet’s mostly-marking collector, which allows for conservative roots.

defensive uses

Then there are the uses of BDW-GC’s API, not to accomplish a task, but to protect the mutator from the collector: GC_call_with_alloc_lock, explicitly enabling or disabling GC, calls to sigmask that take BDW-GC’s use of POSIX signals into account, and so on. BDW-GC can stop any thread at any time, between any two instructions; for most users this is anodyne, but if ever you use weak references, things start to get really gnarly.

Of course a new collector would have its own constraints, but switching to cooperative instead of pre-emptive safepoints would be a welcome relief from this mess. On the other hand, we will require client code to explicitly mark their threads as inactive during calls in more cases, to ensure that all threads can promptly reach safepoints at all times. Swings and roundabouts?

precise tracing

Did you know that the Boehm collector allows for precise tracing? It does! It’s slow and truly gnarly, but when you need precision, precise tracing is nice to have. (This is the GC_new_kind interface.) Guile uses it to mark Scheme stacks, allowing it to avoid treating unboxed locals as roots. When it loads compiled files, Guile also adds some slices of the mapped files to the root set. These interfaces will need to change a bit in a switch to Whippet but are ultimately internal, so that’s fine.

What is not fine is that Guile allows C users to hook into precise tracing, notably via scm_smob_set_mark. This is not only the wrong interface, not allowing for copying collection, but these functions are just truly gnarly. I don’t know what to do with them yet; are our external users ready to forgo this interface entirely? We have been working on them over time, but I am not sure.

reachability

Weak references, weak maps of various kinds: the implementation of these in terms of BDW’s API is incredibly gnarly and ultimately unsatisfying. We will be able to replace all of these with ephemerons and tables of ephemerons, which are natively supported by Whippet. The same goes with finalizers.

The same goes for constructs built on top of finalizers, such as guardians; we’ll get to reimplement these on top of nice Whippet-supplied primitives. Whippet allows for resuscitation of finalized objects, so all is good here.

misc

There is a long list of miscellanea: the interfaces to explicitly trigger GC, to get statistics, to control the number of marker threads, to initialize the GC; these will change, but all uses are internal, making it not a terribly big deal.

I should mention one API concern, which is that BDW’s state is all implicit. For example, when you go to allocate, you don’t pass the API a handle which you have obtained for your thread, and which might hold some thread-local freelists; BDW will instead load thread-local variables in its API. That’s not as efficient as it could be and Whippet goes the explicit route, so there is some additional plumbing to do.

Finally I should mention the true miscellaneous BDW-GC function: GC_free. Guile exposes it via an API, scm_gc_free. It was already vestigial and we should just remove it, as it has no sensible semantics or implementation.

allocation

That brings me to what I wanted to write about today, but am going to have to finish tomorrow: the actual allocation routines. BDW-GC provides two, essentially: GC_malloc and GC_malloc_atomic. The difference is that “atomic” allocations don’t refer to other GC-managed objects, and as such are well-suited to raw data. Otherwise you can think of atomic allocations as a pure optimization, given that BDW-GC mostly traces conservatively anyway.

From the perspective of a user of BDW-GC looking to switch away, there are two broad categories of allocations, tagged and untagged.

Tagged objects have attached metadata bits allowing their type to be inspected by the user later on. This is the happy path! We’ll be able to write a gc_trace_object function that takes any object, does a switch on, say, some bits in the first word, dispatching to type-specific tracing code. As long as the object is sufficiently initialized by the time the next safepoint comes around, we’re good, and given cooperative safepoints, the compiler should be able to ensure this invariant.
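
As a sketch of what such a function could look like (illustrative only; the tags, layouts and helper types here are made up, not Guile’s actual representation):

#include <stddef.h>
#include <stdint.h>

enum { TAG_PAIR = 1, TAG_VECTOR = 2 };             /* made-up tag values */

struct pair   { uintptr_t tag; void *car, *cdr; };
struct vector { uintptr_t tag; size_t len; void *items[]; };

typedef void (*visit_edge)(void **edge, void *state);

static void gc_trace_object(void *obj, visit_edge visit, void *state) {
  switch (*(uintptr_t *)obj & 0x7) {               /* low three bits hold the tag */
  case TAG_PAIR: {
    struct pair *p = obj;
    visit(&p->car, state);
    visit(&p->cdr, state);
    break;
  }
  case TAG_VECTOR: {
    struct vector *v = obj;
    for (size_t i = 0; i < v->len; i++)
      visit(&v->items[i], state);
    break;
  }
  default:                                         /* pointerless or unknown: nothing to trace */
    break;
  }
}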

Then there are untagged allocations. Generally speaking, these are of two kinds: temporary and auxiliary. An example of a temporary allocation would be growable storage used by a C run-time routine, perhaps as an unbounded-sized alternative to alloca. Guile uses these a fair amount, as they compose well with non-local control flow as occurring for example in exception handling.

An auxiliary allocation on the other hand might be a data structure only referred to by the internals of a tagged object, but which itself never escapes to Scheme, so you never need to inquire about its type; it’s convenient to have the lifetimes of these values managed by the GC, and when desired to have the GC automatically trace their contents. Some of these should just be folded into the allocations of the tagged objects themselves, to avoid pointer-chasing. Others are harder to change, notably for mutable objects. And the trouble is that for external users of scm_gc_malloc, I fear that we won’t be able to migrate them over, as we don’t know whether they are making tagged mallocs or not.

what is to be done?

One conventional way to handle untagged allocations is to manage to fit your data into other tagged data structures; V8 does this in many places with instances of FixedArray, for example, and Guile should do more of this. Otherwise, you make new tagged data types. In either case, all auxiliary data should be tagged.

I think there may be an alternative, which would be just to support the equivalent of untagged GC_malloc and GC_malloc_atomic; but for that, I am out of time today, so type at y’all tomorrow. Happy hacking!

by Andy Wingo at March 04, 2025 03:42 PM

March 03, 2025

Igalia WebKit Team

WebKit Igalia Periodical #15

Update on what happened in WebKit in the week from February 19 to March 3.

Cross-Port 🐱

Web Platform 🌐

Implemented support for setting returnValue for <dialog> with Invoker Commands.

After fixing an issue with Trusted Types when doing attribute mutation within the default callback, and implementing performance improvements for Trusted Types enforcement, the Trusted Types implementation is now considered stable and has been enabled by default.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Landed one fix which, along with previous patches, solved the webKitMediaSrcStreamFlush() crash reported in bug #260455.

Unfortunately, in some pages where the crash previously occurred, now a different blank video bug has been revealed. The cause of this bug is known, but fixing it would cause performance regressions in pages with many video elements. Work is ongoing to find a better solution for both.

The initial support of MP4-muxed WebVTT in-band text tracks is about to be merged, which will bring this MSE feature to the ports using GStreamer. Text tracks for the macOS port of WebKit only landed two weeks ago and we expect there will be issues to iron out in WebKit ports, multiplatform code and even potentially in spec work—we are already aware of a few potential ones.

Note that out-of-band text tracks are well supported in MSE across browsers and commonly used. On the other hand, no browsers currently ship with in-band text track support in MSE at this point.

Support for MediaStreamTrack.configurationchange events was added, along with related improvements in the GStreamer PipeWire plugin. This will allow WebRTC applications to seamlessly handle default audio/video capture changes.

Graphics 🖼️

Continued improving the support for handling graphics damage:

  • Added support for validating damage rectangles in Layout Tests.

  • Landed a change that adds layout tests covering the damage propagation feature.

  • Landed a change that fixes damage rectangles on layer resize operations.

  • Landed a change that improves damage rectangles produced by scrolling so that they are clipped to the parent container.

The number of threads used for painting with the GPU has been slightly tweaked, which brings a measurable performance improvement in all kinds of devices with four or more processor cores.

Releases 📦️

The stable branch for the upcoming 2.48.x stable release series of the GTK and WPE ports has been created. The first preview releases from this branch are WebKitGTK 2.47.90 and WPE WebKit 2.47.90. People willing to report issues and help with stabilization are encouraged to test them and report issues in Bugzilla.

Community & Events 🤝

Published a blog post that presents an opinionated approach to the work with textual logs obtained from WebKit and GStreamer.

That’s all for this week!

by Unknown at March 03, 2025 02:15 PM

February 28, 2025

Javier Fernández

Can I use Secure Curves in the Web Platform?

Long story short, yes, it’s possible to use the Ed25519 and X25519 algorithms through the Web Cryptography API exposed by the major browser engines: Blink (Chrome, Edge, Brave, …), WebKit (Safari) and Gecko (Firefox).

However, despite the hard work during the last year we haven’t been able to ship Ed25519 in Chrome. It’s still available behind the Experimental Web Platform Features runtime flag. In this post, I will explain the current blockers and plans for the future regarding this feature.

Although disappointed about the current status of Ed25519 in Chrome, I’m very satisfied to see the results of our efforts, with the implementation now moving forward in other major web engines and shipping by default for both Ed25519 [1] and X25519 [2].

Finally, I want to remark on the work done to improve interoperability, which is a relevant debt this feature has carried for the last few years, to ensure applications can reliably use the Curve25519 features in any of the major browsers.

Context

Before analyzing the current status and blockers, it’s important to understand why browser support for this feature matters, why merging the WICG draft into the Web Cryptography API specification is key.

I’ve already written about this in my last post so I’m not going to elaborate too much, but it’s important to describe some of the advantages for the Web Platform users this API has over the current alternatives.

The Ed25519 algorithm for EdDSA signing and the X25519 function for key-agreement offer stronger security and better performance than other algorithms. For instance, the RSA keys are explicitly banned from new features like Web Transport. The smaller key size (32 bytes) and EdDSA signatures (64 bytes) provide advantages in terms of transmission rates, especially in distributed systems and peer-to-peer communications.

The lack of a browser API to use the Curve25519 algorithms has forced web authors to rely on external components, either JS libraries or WASM-compiled ones, which implies a security risk. This situation is especially sad, considering that browsers already have support for these algorithms as part of the TLS 1.3 implementation; it’s just not exposed to web authors.

Web Platform Feature Development Timeline: Key Milestones

To get an idea of what the time-frame and effort required to develop a web feature like this looks like, let’s consider the following milestones:

  1. 2020 – Jan
    • Secure Curves on WebCrypto API repository created in the W3C WICG
    • Intent-To-Prototype request for Blink, sent by Qingsi Wang
    • TAG review request
    • Standard position request for Gecko about X25519
  2. 2020 – Feb
    • Initial implementation by Qingsi Wang of the Ed25519 algorithm for Blink (abandoned)
  3. 2020 – Sep
    • Intent-To-Prototype request of Ed25519 and X25519 for Blink, sent by Javier Fernandez (Igalia)
    • Design document
    • Standard position request for WebKit about Secure Curves in WebKit
  4. 2022 – October
    • Initial implementation by Javier Fernandez (Igalia) of the Ed25519 algorithm for Blink
  5. 2023 – Jan
    • Initial Implementation by Angela Izquierdo (Apple) of the Ed25519 algorithm for WebKit (Apple port)
  6. 2023 – Feb
  7. 2023 – Mar
    • Initial implementation by Javier Fernandez (Igalia) of the X25519 algorithm for Blink
  8. 2023 – Aug
    • Initial implementation by Javier Fernandez (Igalia) of the Ed25519 algorithm for WebKit (GTK+ port)
    • Initial implementation by Javier Fernandez (Igalia) of the X25519 algorithm for WebKit (GTK+ port)
  9. 2023 – Sep
    • Initial implementation by Javier Fernandez (Igalia) of the X25519 algorithm for WebKit (Apple port)
    • Safari STP 178 release shipped with Ed25519
  10. 2024 – March
    • Initial implementation by Anna Weine (Mozilla) for Gecko of the Ed25519 algorithm
  11. 2024 – Jun
    • Initial implementation by Anna Weine (Mozilla) for Gecko of the X25519 algorithm
  12. 2024 – Sep
    • Firefox 130 shipped both Ed25519 and X25519 enabled by default
  13. 2024 – Nov
    • Intent-to-Ship request of X25519 for Blink by Javier Fernandez (Igalia)
  14. 2024 – Dec
    • The Ed25519 and X25519 algorithms are integrated into the Web Cryptography API draft
    • The X25519 algorithm is enabled by default in WebKit (all ports)
  15. 2025 – Jan
    • Safari STP 211 release shipped with X25519
  16. 2025 – Feb
    • Chrome 133 release shipped with X25519 enabled by default

This is a good example of a third-party actor, not affiliated with a browser vendor, investing time and money to change the priorities of the companies behind the main browsers in order to bring an important feature to the Web Platform. It’s been a large effort over 2 years, reaching agreement between 3 different browsers, the spec editor and contributors, and the W3C Web App Sec WG, which manages the Web Cryptography API specification.

It’s worth mentioning that a large part of the time has been invested in increasing the testing coverage in the Web Platform Tests’ WebCryptoAPI test suite and improving the interoperability between the three main web engines. This effort involved filing bugs against the corresponding browsers, discussing in the Secure Curves WICG the best approach to address each interop issue, and writing tests to ensure we don’t regress in the future.

Unfortunately, a few of these interop issues are the reason why the Ed25519 algorithm has not shipped when I expected; I’ll elaborate more on this later in the post.

Current implementation status of the Curve25519 algorithms

The following table provides a high-level overview of the support of the Secure Curve25519 features in some of the main browsers:

If we want to take a broader look at the implementation status of these algorithms, there is a nice table in the issue #20 at the WICG repository:

Interoperability

As I commented before, most of the work done during this year was focused on improving the spec and increasing the test coverage by the WPT suite. The main goal of these efforts was improving the interoperability of the feature between the three main browsers. It’s interesting to compare the results with the data shown in my previous post.

Test results for the generateKey method:

Current

Before

Test results for the deriveBits and deriveKey methods:

Current

Before

Tests results for the importKey and exportKey methods:

Current

Before

Tests results for the sign and verify methods:

Current

Before

Tests results for the wrap and unwrap methods:

Current

Before

Why Chrome didn’t ship the Ed25519 feature ?

The blockers that prevented the Intent-To-Ship request of the Ed25519 are primarily these 2 issues in the spec:

  • Use of randomize EdDSA signatures (issue #28)
  • Rejection of any invalid and small-order points (issue #27)

There are other minor disagreements regarding the Ed25519 specification, like the removal of the “alg” member in the JWK format in the import/export operations. There is a bug 40074061 in Chrome to implement this spec change, but apparently it has not been agreed on by Chrome, and now there is not enough support to proceed. Firefox already implements the specified behavior and there is a similar bug report in WebKit (bug 262613) where it seems there is support to implement the behavior change. However, I’d rather avoid introducing an interop issue and delay the implementation until there is more consensus regarding the spec.

The issue about the use of randomized EdDSA signatures comes from the fact that WebKit’s CryptoKit, the underlying cryptography component of the WebKit engine, follows that approach in its implementation of the Ed25519 signing operation. It has always been in the spirit of the Secure Curves spec to rely completely on the corresponding official RFCs. In the case of the Ed25519 algorithm, it refers to RFC 8032, which states the deterministic nature of Ed25519 signatures. However, the CFRG is currently discussing the issue and there is a proposal to define an Ed25519-bis with randomized signatures.

The small-order issue is more complex, it seems. The spec states clearly that any invalid or small-order point should be rejected during the Verify operation. This behavior is based on the RFC 8032 mentioned before. Ideally, the underlying cryptography library should take care of performing these checks, and this has been the approach followed by the three main browsers in the implementation of the whole API; in the case of Chrome, this cryptography library is BoringSSL. The main problem here is that there are differences in how the cryptography libraries implement these checks, and BoringSSL is not an exception. The WPTs I have implemented to cover these cases also show interop issues between the three main engines. I’ve filed bug 697, but it was marked as low priority. The alternative would be to implement the additional checks in the WebCrypto implementation, but Chrome is not very positive about this approach.

The small-order checks have been a request from Mozilla since the initial standards-position request submitted a long time ago. This has been stated again in PR #362, and Apple has expressed positive opinions about it as well, so I believe this is going to be the selected approach.

Conclusions

I believe that the addition of Secure Curves to the Web Cryptography specification is a great achievement for the Web Platform and brings a very powerful feature to web developers. This is especially true in the realm of a decentralized web, where Content Addressing, which is based on cryptographic hashes, is a key concept. Browsers exposing APIs for Ed25519 and X25519 will offer a big advantage to decentralized web applications. I want to thank Daniel Huigens, editor of the Web Cryptography API specification, for his huge effort to address all the spec issues filed over the years, always driving discussions with the aim of reaching consensus-based resolutions, despite how frustrating the process of developing a feature like this can sometimes be until it’s shipped in a browser.

The implementation in the three major browsers is a clear sign of the stability of these features and ensures they will be maintained properly in the future. This includes the effort to keep the three implementations interoperable, which is crucial for a healthy Web Platform ecosystem and, ultimately, for web authors. The fact that we couldn’t ship Ed25519 in Chrome is the only negative aspect of the work done this year, but I believe it will be resolved soon.

At Igalia we expect to continue working on the Web Cryptography API specification in 2025, at least until the blocker issues mentioned above are addressed and we can send the Intent-To-Ship request. We also hope to find opportunities to contribute to the spec by adding new algorithms and to carry out maintenance work; ensuring an interoperable Web Platform has always been a priority for Igalia and an important factor when evaluating the projects we take on as a company.

by jfernandez at February 28, 2025 03:24 PM

February 27, 2025

Alex Bradbury

ccache for LLVM builds across multiple directories

Problem description

If you're regularly rebuilding a large project like LLVM, you almost certainly want to be using ccache. Incremental builds are helpful, but it's quite common to be swapping between different commit IDs and it's very handy to have the build complete relatively quickly without needing any explicit thought on your side. Enabling ccache with LLVM's CMake build system is trivial, but out of the box you will suffer from cache misses if building llvm-project in different build directories, even for an identical commit and identical build settings (and even for identical source directories, i.e. without even considering separate checkouts or separate git work trees):

mkdir -p exp && cd exp
ccache -zC # Clear old ccache
git clone --reference-if-able ~/llvm-project https://github.com/llvm/llvm-project/
cd llvm-project
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=ON \
  -DLLVM_TARGETS_TO_BUILD="X86" \
  -DCMAKE_{C,CXX}_COMPILER_LAUNCHER=ccache \
  -B build/a \
  -S llvm
cmake --build build/a
echo "@@@@@@@@@@ Stats after building build/a @@@@@@@@@@"
ccache -s
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=ON \
  -DLLVM_TARGETS_TO_BUILD="X86" \
  -DCMAKE_{C,CXX}_COMPILER_LAUNCHER=ccache \
  -B build/b \
  -S llvm
cmake --build build/b
echo "@@@@@@@@@@ Stats after building build/b @@@@@@@@@@"
ccache -s

We see essentially no cache hits:

@@@@@@@@@@ Stats after building build/a @@@@@@@@@@
Cacheable calls:    2252 /   2253 (99.96%)
  Hits:                0 /   2252 ( 0.00%)
    Direct:            0
    Preprocessed:      0
  Misses:           2252 /   2252 (100.0%)
Uncacheable calls:     1 /   2253 ( 0.04%)
Local storage:
  Cache size (GiB):  0.2 / 1024.0 ( 0.02%)
  Cleanups:          256
  Hits:                0 /   2252 ( 0.00%)
  Misses:           2252 /   2252 (100.0%)

@@@@@@@@@@ Stats after building build/b @@@@@@@@@@
Cacheable calls:    4504 /   4506 (99.96%)
  Hits:               71 /   4504 ( 1.58%)
    Direct:            0 /     71 ( 0.00%)
    Preprocessed:     71 /     71 (100.0%)
  Misses:           4433 /   4504 (98.42%)
Uncacheable calls:     2 /   4506 ( 0.04%)
Local storage:
  Cache size (GiB):  0.5 / 1024.0 ( 0.04%)
  Cleanups:          256
  Hits:               71 /   4504 ( 1.58%)
  Misses:           4433 /   4504 (98.42%)

Let's take a look at a build command to check what's going on:

$ ninja -C build/b -t commands lib/Support/CMakeFiles/LLVMSupport.dir/APInt.cpp.o
ccache /usr/bin/clang++ -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/asb/exp/llvm-project/build/b/lib/Support -I/home/asb/exp/llvm-project/llvm/lib/Support -I/home/asb/exp/llvm-project/build/b/include -I/home/asb/exp/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/APInt.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/APInt.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/APInt.cpp.o -c /home/asb/exp/llvm-project/llvm/lib/Support/APInt.cpp

We can see that as LLVM generates header files, it has absolute directories specified in -I within the build directory, which of course differs for build a and build b above, causing a cache miss. Even if there was a workaround for the generated headers, we'd still fail to get cache hits if building from different llvm-project checkouts or worktrees.

Solution

Unsurprisingly, this is a common problem with ccache and it has good documentation on the solution. It advises:

  • Setting the base_dir ccache option to enable ccache's rewriting of absolute to relative paths for any path with that prefix.
  • Setting the absolute_paths_in_stderr ccache option in order to rewrite relative paths in stderr output to absolute (thus avoiding confusing error messages).
    • I have to admit that when trialling this by forcing an error in a header with # error "forced error", I'm not sure I see a difference in Clang's output when attempting to build LLVM.
  • If compiling with -g, use the -fdebug-prefix-map option.

The ccache changes can be actioned by:

ccache --set-config base_dir=/home
ccache --set-config absolute_paths_in_stderr=true

The -fdebug-prefix-map option can be enabled by setting -DLLVM_USE_RELATIVE_PATHS_IN_DEBUG_INFO=ON in your CMake invocation.

Testing the solution

Repeating the cmake and ccache invocations from earlier, we see that build/b had almost a 100% hit rate:

@@@@@@@@@@ Stats after building build/a @@@@@@@@@@
Cacheable calls:    2252 /   2253 (99.96%)
  Hits:                0 /   2252 ( 0.00%)
    Direct:            0
    Preprocessed:      0
  Misses:           2252 /   2252 (100.0%)
Uncacheable calls:     1 /   2253 ( 0.04%)
Local storage:
  Cache size (GiB):  0.2 / 1024.0 ( 0.02%)
  Cleanups:          256
  Hits:                0 /   2252 ( 0.00%)
  Misses:           2252 /   2252 (100.0%)

@@@@@@@@@@ Stats after building build/b @@@@@@@@@@
Cacheable calls:    4504 /   4506 (99.96%)
  Hits:             2251 /   4504 (49.98%)
    Direct:         2251 /   2251 (100.0%)
    Preprocessed:      0 /   2251 ( 0.00%)
  Misses:           2253 /   4504 (50.02%)
Uncacheable calls:     2 /   4506 ( 0.04%)
Local storage:
  Cache size (GiB):  0.2 / 1024.0 ( 0.02%)
  Cleanups:          256
  Hits:             2251 /   4504 (49.98%)
  Misses:           2253 /   4504 (50.02%)

And additionally building llvm from a new worktree (i.e. a different absolute path to the source directory):

git worktree add -b main-wt1 wt1 && cd wt1
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=ON \
  -DLLVM_TARGETS_TO_BUILD="X86" \
  -DCMAKE_{C,CXX}_COMPILER_LAUNCHER=ccache \
  -B build/c \
  -S llvm
cmake --build build/c
echo "@@@@@@@@@@ Stats after building build/c @@@@@@@@@@"
ccache -s

Which results in the following stats (i.e., another close to 100% hit rate):

@@@@@@@@@@ Stats after building build/c @@@@@@@@@@
Cacheable calls:    6756 /   6759 (99.96%)
  Hits:             4502 /   6756 (66.64%)
    Direct:         4502 /   4502 (100.0%)
    Preprocessed:      0 /   4502 ( 0.00%)
  Misses:           2254 /   6756 (33.36%)
Uncacheable calls:     3 /   6759 ( 0.04%)
Local storage:
  Cache size (GiB):  0.2 / 1024.0 ( 0.02%)
  Cleanups:          256
  Hits:             4502 /   6756 (66.64%)
  Misses:           2254 /   6756 (33.36%)

And building with -DLLVM_USE_RELATIVE_PATHS_IN_DEBUG_INFO=ON and -DCMAKE_BUILD_TYPE=Debug:

cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=ON \
  -DLLVM_TARGETS_TO_BUILD="X86" \
  -DCMAKE_{C,CXX}_COMPILER_LAUNCHER=ccache \
  -DLLVM_USE_RELATIVE_PATHS_IN_DEBUG_INFO=ON \
  -B build/debug_a \
  -S llvm
cmake --build build/debug_a
echo "@@@@@@@@@@ Stats after building build/debug_a @@@@@@@@@@"
ccache -s
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=ON \
  -DLLVM_TARGETS_TO_BUILD="X86" \
  -DCMAKE_{C,CXX}_COMPILER_LAUNCHER=ccache \
  -DLLVM_USE_RELATIVE_PATHS_IN_DEBUG_INFO=ON \
  -B build/debug_b \
  -S llvm
cmake --build build/debug_b
echo "@@@@@@@@@@ Stats after building build/debug_b @@@@@@@@@@"
ccache -s

This results in no hits for debug_a and close to 100% hits for debug_b as expected:

@@@@@@@@@@ Stats after building build/debug_a @@@@@@@@@@
Cacheable calls:    9008 /   9012 (99.96%)
  Hits:             4502 /   9008 (49.98%)
    Direct:         4502 /   4502 (100.0%)
    Preprocessed:      0 /   4502 ( 0.00%)
  Misses:           4506 /   9008 (50.02%)
Uncacheable calls:     4 /   9012 ( 0.04%)
Local storage:
  Cache size (GiB):  3.1 / 1024.0 ( 0.31%)
  Cleanups:          256
  Hits:             4502 /   9008 (49.98%)
  Misses:           4506 /   9008 (50.02%)

@@@@@@@@@@ Stats after building build/debug_b @@@@@@@@@@
Cacheable calls:    11260 /  11265 (99.96%)
  Hits:              6753 /  11260 (59.97%)
    Direct:          6753 /   6753 (100.0%)
    Preprocessed:       0 /   6753 ( 0.00%)
  Misses:            4507 /  11260 (40.03%)
Uncacheable calls:      5 /  11265 ( 0.04%)
Local storage:
  Cache size (GiB):   3.2 / 1024.0 ( 0.31%)
  Cleanups:           256
  Hits:              6753 /  11260 (59.97%)
  Misses:            4507 /  11260 (40.03%)

Limitations

Rewriting paths to relative works in most cases, but you'll still experience cache misses if the location of your build directory relative to the source directory differs. This might happen if you compile directly in build/ in one checkout, but in build/foo in another, or if compiling outside of the llvm-project source tree altogether in one case, but within it (e.g. in build/) in another.

This is normally pretty easy to avoid, but is worth being aware of. For instance, I find it helpful on the LLVM buildbots I administer to be able to rapidly reproduce a previous build using ccache, but the default source vs build directory layout used during CI is different from what I normally use in day-to-day development.

Other helpful options

I was going to advertise inode_cache = true, but I see this has been enabled by default since I last looked. Otherwise, file_clone = true (docs) makes sense for my case, where I'm on a filesystem with reflink support (XFS) and have plenty of space.
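
If you want to set it explicitly, the same --set-config mechanism used above should work (assuming a ccache version that supports the option):

ccache --set-config file_clone=true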


Article changelog
  • 2025-02-27: Initial publication date.

February 27, 2025 12:00 PM

February 26, 2025

Pawel Lampe

Working with WebKit and GStreamer logs in Emacs.

WebKit has grown into a massive codebase over the years. To make developers’ lives easier, it offers various subsystems and integrations. One such subsystem is the logging subsystem, which records textual logs describing the execution of the engine’s internal parts.

The logging subsystem in WebKit (as in any computer system) is usually used for both debugging and educational purposes. As WebKit is a widely-used piece of software that runs on everything from desktop-class devices down to low-end embedded devices, it’s not uncommon for logging to be the only way of debugging when various limiting factors come into play. Such limiting factors don’t have to be purely technical - it may also be that the software runs on a restricted system where direct debugging is not allowed.

Requirements for efficient work with textual logs #

Regardless of the reasons why logging is used, once the set of logs is produced, one can work with it according to the particular need. From my experience, efficient work with textual logs requires a tool with the following capabilities:

  1. Ability to search for a particular substring or regular expression.
  2. Ability to filter text lines according to the substring or regular expressions.
  3. Ability to highlight particular substrings.
  4. Ability to mark certain lines for separate examination (with extra notes if possible).
  5. Ability to save and restore the current state of work.

While all text editors should be able to provide requirement 1, requirements 2-5 are usually more tricky and text editors won’t support them out of the box. Fortunately, any modern extensible text editor should be able to support requirements 2-5 after some extra configuration.

Setting up Emacs to work with logs #

Throughout the following sections, I use Emacs, the classic “extensible, customizable, free/libre text editor”, to showcase how it can be set up and used to meet the above criteria and to make work with logs a gentle experience.

Emacs, just like any other text editor, provides the support for requirement 1 from the previous section out of the box.

loccur - the minor mode for text filtering #

To support requirement 2, Emacs requires some extra mode. My recommendation for that is loccur - a minor mode that acts just like the classic grep *nix utility, yet directly in the editor. The benefit of this mode (over e.g. occur) is that it works in-place. Therefore it’s very ergonomic and - as I’ll show later - it works well in conjunction with the bookmarking mode.

Installation of loccur is very simple and can be done from within the built-in package manager:

M-x package-install RET loccur RET

With loccur installed, one can immediately start using it by calling M-x loccur RET <regex> RET. The figure below depicts the example of filtering: loccur - the minor mode for text filtering.

highlight-symbol - the package with utility functions for text highlighting #

To support requirement 3, Emacs also requires the installation of an extra module. In that case my recommendation is highlight-symbol, a simple set of functions that enables basic text-fragment highlighting on the fly.

Installation of this module is also very simple and boils down to:

M-x package-install RET highlight-symbol RET

With the above, it’s very easy to get results like in the figure below (highlight-symbol - the package with utility functions for text highlighting), just by moving the cursor around and using C-c h to toggle the highlight of the text at the current cursor position.

bm - the package with utility functions for buffer lines bookmarking #

Finally, to support requirements 4 and 5, Emacs requires one last extra package. This time my recommendation is bm, quite a powerful set of utilities for text bookmarking.

In this case, installation is also very simple and is all about:

M-x package-install RET bm RET

In a nutshell, the bm package brings some visual capabilities like in the figure below: bm - the package with utility functions for buffer lines bookmarking as well as non-visual capabilities that will be discussed in further sections.

The final configuration #

Once all the necessary modules are installed, it’s worth spending some time on configuration. With just a few simple tweaks it’s possible to make working with logs simple and easily reproducible.

To avoid influencing other workflows, I recommend attaching as much configuration as possible to a particular major mode and setting that mode as the default for files with certain extensions. The configuration below uses the major mode called text-mode for working with logs and associates all files with the .log suffix with it. Moreover, the most critical commands of the modes installed in the previous sections are bound to key shortcuts. The last thing is to enable truncating the lines ((set-default 'truncate-lines t)) and highlighting the line the cursor is currently on ((hl-line-mode 1)).

(add-to-list 'auto-mode-alist '("\\.log\\'" . text-mode))
(add-hook 'text-mode-hook
          (lambda ()
            (define-key text-mode-map (kbd "C-c t") 'bm-toggle)
            (define-key text-mode-map (kbd "C-c n") 'bm-next)
            (define-key text-mode-map (kbd "C-c p") 'bm-previous)
            (define-key text-mode-map (kbd "C-c h") 'highlight-symbol)
            (define-key text-mode-map (kbd "C-c C-c") 'highlight-symbol-next)
            (set-default 'truncate-lines t)
            (hl-line-mode 1)))

WebKit logs case study #

To show what the Emacs workflow looks like with the above configuration and modules, some logs are required first. It’s very easy to get some logs out of WebKit, and I’ll grab some GStreamer logs along the way as well. For that, I’ll build the WebKit GTK port from the latest revision of the WebKit repository. To make the build process easier, I’ll use the WebKit container SDK.

Here’s the build command:

./Tools/Scripts/build-webkit --gtk --debug --cmakeargs="-DENABLE_JOURNALD_LOG=OFF"

The above command disables the ENABLE_JOURNALD_LOG build option so that logs are printed to stderr. This will result in the WebKit and GStreamer logs being bundled together as intended.

Once the build is ready, one can load any URL to get the logs. I’ve chosen a YouTube conformance test suite from 2021 and selected test case “39. PlaybackRateChange” to get some interesting entries from multimedia-related subsystems:

export GST_DEBUG_NO_COLOR=1
export GST_DEBUG=4,webkit*:7
export WEBKIT_DEBUG=Layout,Media=debug,Events=debug
export URL='https://ytlr-cert.appspot.com/2021/main.html'
./Tools/Scripts/run-minibrowser --gtk --debug --features=+LogsPageMessagesToSystemConsole "${URL}" &> log.log

The commands above reveal some interesting aspects of how to get certain logs. First of all, the commands above specify a few environment variables:

  • GST_DEBUG=4,webkit*:7 - to enable GStreamer logs of level INFO (for all categories) and of level TRACE for the webkit* categories
  • GST_DEBUG_NO_COLOR=1 - to disable coloring of GStreamer logs
  • WEBKIT_DEBUG=Layout,Media=debug,Events=debug - to enable WebKit logs for a few interesting channels.

Moreover, the runtime preference LogsPageMessagesToSystemConsole is enabled to log console output logged by JavaScript code.

The workflow #

Once the logs are collected, one can open them using Emacs and start making sense out of them by gradually exploring the flow of execution. In the below exercise, I intend to understand what happened from the multimedia perspective during the execution of the test case “39. PlaybackRateChange”.

The first step is usually to find the most critical lines that roughly mark the area in the file where the interesting things happen. In this case I propose using M-x loccur RET CONSOLE LOG RET to check what console logs the application itself printed. Once some lines are filtered, one can use the bm-toggle command (C-c t) to mark some lines for later examination (highlighted in orange): Effect of filtering and marking some console logs.

For practicing purposes, I propose exiting the filtered view with M-x loccur RET and trying again to see what events the browser was dispatching, e.g. using M-x loccur RET node 0x7535d70700b0 VIDEO RET: Effect of filtering and marking some video node events.

In general, the combination of loccur and substring/regexp searches should be very convenient to quickly explore various types of logs along with marking them for later. In case of very important log lines, one can additionally use bm-bookmark-annotate command to add extra notes for later.

Once some interesting log lines are marked, the most basic thing to do is to jump between them using bm-next (C-c n) and bm-previous (C-c p). However, the true power of the bm mode comes with the use of M-x bm-show RET to get a view containing only the lines marked with bm-toggle (originally highlighted in orange): Effect of invoking bm-show.

This view is especially useful as it shows only the lines deliberately marked using bm-toggle and allows one to quickly jump to them in the original file. Moreover, the lines are displayed in the order they appear in the original file, so it’s very easy to see the unified flow of the system and start making sense of the data presented. What’s even more interesting, the view also contains the line numbers from the original file as well as any manually added annotations. The line numbers are especially useful as they may be used to resume the work after ending the Emacs session - which I’ll describe further in this section.

When the *bm-bookmarks* view is rendered, the only problem left is that the lines are hard to read, as they are displayed in a single color. To overcome that problem one can use the commands from the highlight-symbol package via the C-c h shortcut defined in the configuration. The result of highlighting some strings is depicted in the figure below: Highlighting strings in bm-show.

With some colors added, it’s much easier to read the logs and focus on essential parts.

Saving and resuming the session #

On some rare occasions it may happen that it’s necessary to close the Emacs session even though the work with a certain log file is not done and needs to be resumed later. For that, a simple trick is to open the current set of bookmarks with M-x bm-show RET and then save that buffer to a file. Personally, I just create a file with the same name as the log file but with a .bm suffix - so for log.log it’s log.log.bm.

Once the session is resumed, it is enough to open the log.log and log.log.bm files side by side and create a simple ad-hoc macro that uses the line numbers from log.log.bm to mark the corresponding lines again in the log.log file: Resuming the session

As shown in the above gif, within a few seconds all the marks are applied in the buffer with log.log file and the work can resume from that point i.e. one can jump around using bm, add new marks etc.

Summary #

Although the above approach may not be ideal for everybody, I find it fairly ergonomic, smooth, and covering all the requirements I identified earlier. I’m certain that editors other than Emacs can be set up to allow the same or very similar flow, yet any particular configurations are left for the reader to explore.

February 26, 2025 12:00 AM

February 25, 2025

Manuel Rego

Nice blog post by my colleague Frédéric Wang about some issues when writing Web Platform Tests https://frederic-wang.fr//2025/02/21/five-testharness-dot-js-mistakes/

February 25, 2025 12:00 AM

February 21, 2025

Frédéric Wang

Web standards implementation internship

Igalia is looking for students for its software development “internships” 1

Brief description:

  • Contributing to free software.
  • Fully remote work.
  • 450 hours spread over 3 or 6 months.
  • Compensation of €7,000.
  • Mentoring by an Igalia engineer.
  • Fighting professional discrimination in the IT sector.

Every year, I run the “Web standards implementation” internship, which consists of modifying browsers (Chromium, Firefox, Safari…) to improve their support for Web technologies (HTML, CSS, DOM…). In particular, this involves studying the corresponding specifications and writing conformance tests. Note that this is not a Web development internship but a C++ development one.

For a concrete example, I invite you to read My internship with Igalia 1 by my colleague Delan Azabani. In recent years, in addition to instant-messaging communication, I have set up weekly video calls, which have proven quite effective in helping interns progress.

flyer A6

I started French Sign Language (LSF) classes a few months ago and have attended several IVT shows, notably « Parle plus fort ! », which humorously depicts the difficulties Deaf people face at work. This year, I am considering taking on a Deaf intern in order to contribute to a better integration of Deaf people in the workplace. I think it will also be a positive experience for my company and for myself.

Desired profile:

  • Computer science student at bachelor’s or master’s level.
  • Based in the Paris region (to make mentoring easier).
  • Able to read and write English (and to communicate in LSF).
  • Interested in Web technologies.
  • Knowledge of C/C++ development.

If you are interested, applications are open here until April 4, 2025.

  1. Igalia’s “Coding Experience” program does not necessarily correspond to an internship in the French sense of the term. If you would like to make it an official internship agreement (stage conventionné), mention it when applying and we can find a solution.

February 21, 2025 11:00 PM

February 20, 2025

Frédéric Wang

Five testharness.js mistakes

Introduction

I recently led a small training session at Igalia where I proposed to find mistakes in five small testharness.js tests I wrote. These mistakes are based on actual issues I found in official web platform tests, or on mistakes I made myself in the past while writing tests, so I believe they would be useful to know. The feedback from my teammates was quite positive, with very good participation and many ideas. They suggested I write a blog post about it, so here it is.

Please read the tests carefully and try to find the mistakes before looking at the proposed fixes…

1. Multiple tests in one loop

We often need to perform identical assertions for a set of similar objects. A good practice is to split such checks into multiple test() calls, so that it’s easier to figure out which of the objects are causing failures. Below, I’m testing the reflected autoplay attribute on the <audio> and <video> elements. What small mistake did I make?

<!DOCTYPE html>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
  ["audio", "video"].forEach(tagName => {
    test(function() {
      let element = document.createElement(tagName);
      assert_equals(element.autoplay, false, "initial value");
      element.setAttribute("autoplay", "autoplay");
      assert_equals(element.autoplay, true, "after setting attribute");
      element.removeAttribute("autoplay");
      assert_equals(element.autoplay, false, "after removing attribute");
    }, "Basic test for HTMLMediaElement.autoplay.");
  });
</script>

Proposed fix

Each loop iteration creates one test, but they all have the name "Basic test for HTMLMediaElement.autoplay.". Because this name identifies the test in various places (e.g. failure expectations), it must be unique to be useful. These tests will even cause a “Harness status: Error” with the message “duplicate test name”.

One way to solve that is to move the loop iteration into the test(), which will fix the error but won’t help you with fine-grained failure reports. We can instead use a different description for each iteration:

       assert_equals(element.autoplay, true, "after setting attribute");
       element.removeAttribute("autoplay");
       assert_equals(element.autoplay, false, "after removing attribute");
-    }, "Basic test for HTMLMediaElement.autoplay.");
+    }, `Basic test for HTMLMediaElement.autoplay (${tagName} element).`);
   });
 </script>

2. Cleanup between tests

Sometimes, it is convenient to reuse objects (e.g. DOM elements) for several test() calls, and some cleanup may be necessary. For instance, in the following test, I’m checking that setting the class attribute via setAttribute() or setAttributeNS() is properly reflected on the className property. However, I must clear the className at the end of the first test(), so that we can really catch the failure in the second test() if, for example, setAttributeNS() does not modify the className because of an implementation bug. What’s wrong with this approach?

<!DOCTYPE html>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<div id="element"></div>
<script>
  test(function() {
    element.setAttribute("class", "myClass");
    assert_equals(element.className, "myClass");
    element.className = "";
  }, "Setting the class attribute via setAttribute().");

  test(function() {
    element.setAttributeNS(null, "class", "myClass");
    assert_equals(element.className, "myClass");
    element.className = "";
  }, "Setting the class attribute via setAttributeNS().");
</script>

Proposed fix

In general, it is difficult to guarantee that a final cleanup is executed. In this particular case, for example, if the assert_equals() fails because of bad browser implementation, then an exception is thrown and the rest of the function is not executed.

Fortunately, testharness.js provides a better way to perform cleanup after a test execution:

-  test(function() {
+  function resetClassName() { element.className = ""; }
+
+  test(function(t) {
+    t.add_cleanup(resetClassName);
     element.setAttribute("class", "myClass");
     assert_equals(element.className, "myClass");
-    element.className = "";
   }, "Setting the class attribute via setAttribute().");

-  test(function() {
+  test(function(t) {
+    t.add_cleanup(resetClassName);
     element.setAttributeNS(null, "class", "myClass");
     assert_equals(element.className, "myClass");
-    element.className = "";
   }, "Setting the class attribute via setAttributeNS().");

3. Checking whether an exception is thrown

Another very frequent test pattern involves checking whether a Web API throws an exception. Here, I’m trying to use DOMParser.parseFromString() to parse a small MathML document. The HTML spec says that it should throw a TypeError if one specifies the MathML MIME type. The second test() asserts that the rest of the try branch is not executed and that the correct exception type is found in the catch branch. Is this approach correct? Can the test be rewritten in a better way?

<!DOCTYPE html>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
  const parser = new DOMParser();
  const mathmlSource = `<?xml version="1.0"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mfrac>
    <mn>1</mn>
    <mi>x</mi>
  </mfrac>
</math>`;

  test(function() {
    let doc = parser.parseFromString(mathmlSource, "application/xml");
    assert_equals(doc.documentElement.tagName, "math");
  }, "DOMParser's parseFromString() accepts application/xml");

  test(function() {
    try {
      parser.parseFromString(mathmlSource, "application/mathml+xml");
      assert_unreached();
    } catch(e) {
      assert_true(e instanceof TypeError);
    }
  }, "DOMParser's parseFromString() rejects application/mathml+xml");
</script>

Proposed fix

If the assert_unreached() is executed because of an implementation bug with parseFromString(), then the assertion will actually throw an exception. That exception won’t be a TypeError, so the test will still fail because of the assert_true(), but the failure report will look a bit confusing.

In this situation, it’s better to use the assert_throws_* APIs:

   test(function() {
-    try {
-      parser.parseFromString(mathmlSource, "application/mathml+xml");
-      assert_unreached();
-    } catch(e) {
-      assert_true(e instanceof TypeError);
-    }
+    assert_throws_js(TypeError,
+        _ => parser.parseFromString(mathmlSource, "application/mathml+xml"));
   }, "DOMParser's parseFromString() rejects application/mathml+xml");

4. Waiting for an event listener to be called

The following test verifies a very basic feature: clicking a button triggers the registered event listener. We use the (asynchronous) testdriver API test_driver.click() to emulate that user click, and a promise_test() call to wait for the click event listener to be called. The test may time out if there is something wrong in the browser implementation, but do you see a risk for flaky failures?

Note: the testdriver API only works when running tests automatically. If you run the test manually, you need to click the button yourself.

<!DOCTYPE html>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script src="/resources/testdriver.js"></script>
<script src="/resources/testdriver-vendor.js"></script>
<button id="button">Click me to run manually</button>
<script>
  promise_test(function() {
    test_driver.click(button);
    return new Promise(resolve => {
      button.addEventListener("click", resolve);
    });
  }, "Clicking the button triggers registered click event handler.");
</script>

Proposed fix

The problem I wanted to show here is that we are sending the click event before actually registering the listener. The test would likely still work, because test_driver.click() is asynchronous and communication to the test automation scripts is slow, whereas registering the event is synchronous.

But rather than making this kind of assumption, which poses a risk of flaky failures as well as making the test hard to read, I prefer to just move the statement that triggers the event into the Promise, after the listener registration:

 <button id="button">Click me to run manually</button>
 <script>
   promise_test(function() {
-    test_driver.click(button);
     return new Promise(resolve => {
       button.addEventListener("click", resolve);
+      test_driver.click(button);
     });
   }, "Clicking the button triggers registered click event handler.");
 </script>

My colleagues also pointed out that if the promise returned by test_driver.click() fails, then a “Harness status: Error” could actually be reported with “Unhandled rejection”. We can add a catch to handle this case:

 <button id="button">Click me to run manually</button>
 <script>
   promise_test(function() {
-    return new Promise(resolve => {
+    return new Promise((resolve, reject) => {
       button.addEventListener("click", resolve);
-      test_driver.click(button);
+      test_driver.click(button).catch(reject);
     });
   }, "Clicking the button triggers registered click event handler.");
 </script>

5. Dealing with asynchronous resources

It’s very common to deal with asynchronous resources in web platform tests. The following test case verifies the behavior of a frame with lazy loading: it is initially outside the viewport (so not loaded) and then scrolled into the viewport (which should trigger its load). The actual loading of the frame is tested via the window name of /common/window-name-setter.html (should be “spices”). Again, this test may time out if there is something wrong in the browser implementation, but can you see a way to make the test a bit more robust?

Side question: the <div id="log"></div> and add_cleanup() are not really necessary for this test to work, so what’s the point of using them? Can you think of one?

<!DOCTYPE html>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<style>
  #lazyframe { margin-top: 10000px; }
</style>
<div id="log"></div>
<iframe id="lazyframe" loading="lazy"
        src="/common/window-name-setter.html"></iframe>
<script>
  promise_test(function() {
    return new Promise(resolve => {
      window.addEventListener("load", () => {
        assert_not_equals(lazyframe.contentWindow.name, "spices");
        resolve();
      });
    });
  }, "lazy frame not loaded after page load");
  promise_test(t => {
    t.add_cleanup(_ => window.scrollTo(0, 0));
    return new Promise(resolve => {
      lazyframe.addEventListener('load', () => {
        assert_equals(lazyframe.contentWindow.name, "spices");
        resolve();
      });
      lazyframe.scrollIntoView();
    });
  }, "lazy frame loaded after appearing in the viewport");
</script>

Proposed fix

This is similar to what we discussed in the previous tests. If the assert_equals() in the listener fails, then an exception is thrown, but it won’t be caught by the testharness.js framework. A “Harness status: Error” is reported, but the test will only complete after the timeout. This can slow down test execution, especially if this pattern is repeated for several tests.

To make sure we report the failure immediately in that case, we can instead reject the promise if the equality does not hold, or even better, place the assert_equals() check after the promise resolution:

 <script>
   promise_test(function() {
     return new Promise(resolve => {
-      window.addEventListener("load", () => {
-        assert_not_equals(lazyframe.contentWindow.name, "spices");
-        resolve();
-      });
-    });
+      window.addEventListener("load", resolve);
+    }).then(_ => {
+      assert_not_equals(lazyframe.contentWindow.name, "spices");
+    });
   }, "lazy frame not loaded after page load");
   promise_test(t => {
     t.add_cleanup(_ => window.scrollTo(0, 0));
     return new Promise(resolve => {
-      lazyframe.addEventListener('load', () => {
-        assert_equals(lazyframe.contentWindow.name, "spices");
-        resolve();
-      });
+      lazyframe.addEventListener('load', resolve);
       lazyframe.scrollIntoView();
+    }).then(_ => {
+      assert_equals(lazyframe.contentWindow.name, "spices");
     });
   }, "lazy frame loaded after appearing in the viewport");
 </script>

Regarding the side question, if you run the test by opening the page in the browser, then the report will be appended at the bottom of the page by default. But lazyframe has a very large height, and the page may end up scrolled to some other location. An explicit <div id="log"> ensures the report is inserted inside that div at the top of the page, while the add_cleanup() ensures that we scroll back to that location after test execution.

February 20, 2025 11:00 PM

Maksim Sisov

Bisecting Electron with Electron/Fiddle.

  1. Electron Fiddle
  2. Electron Releases

Recently, I have been working on an issue in Electron which required bisecting to find the exact version of Electron where the regression happened.

A quick search did not reveal any guides, but as my research progressed, I found one interesting commit - feat: add Bisect helper.

Electron Fiddle

Fiddle is an Electron playground that allows developers to experiment with Electron APIs. It has a quick startup template, which you can change as you wish. You can save a fiddle locally or as a GitHub Gist, which can be shared with anyone by simply entering the Gist URL in the address bar.

Moreover, you can choose what Electron version you wish to use - from stable to nightly releases.

Fiddle

Electron Releases

You can run a fiddle using any version of Electron you wish - stable, beta, or nightly. You can even run a fiddle with obsolete versions, which is super handy when comparing behaviour between different versions.

An option to choose the version of Electron can be found in the top-left corner of the Fiddle window.

Once pressed, you can use filter to choose any Electron version you wish.

However, you may not find beta or nightly versions in the filter. For that, go to Settings (the gear icon to the left of the filter), then Electron, and select the desired channels.

Now, you can access all the available Electron versions and try any of them on the fly.

I hope this small guide helps you to triage your Electron problems :)))

by Maksim Sisov (msisov@igalia.com) at February 20, 2025 12:00 AM

February 19, 2025

Nick Yamane

Chromium Ozone/Wayland: The Last Mile Stretch

Hey there! I’m glad to finally start paying my blogging debt :) as this is something I’ve been planning to do for quite some time now. To get the ball rolling, I’ve shared some bits about me in my very first blog post Olá Mundo.

In this article, I’m going to walk through what we’ve been working on since last year in the Chromium Ozone/Wayland project, which I’ve been involved in (directly or indirectly) since I joined Igalia back in 2018.

by nickdiego@igalia.com (Nick Yamane) at February 19, 2025 01:00 PM

Manuel Rego

Announcing the Web Engines Hackfest 2025

Igalia is arranging the twelfth annual Web Engines Hackfest, which will be held on Monday 2nd June through Wednesday 4th June. As usual, this is a hybrid event, at Palexco in A Coruña (Galicia, Spain) as well as remotely.

Registration is now open:

Picture of sunset in A Coruña from June 2024 at Riazor beach where the whole Orzán bay can be seen.
Sunset in A Coruña from Riazor beach (June 2024)

The Web Engines Hackfest is an event where folks working on various parts of the web platform gather for a few days to share knowledge and discuss a variety of topics. These topics include web standards, browser engines, JavaScript engines, and all the related technology around them. Last year, we had eight talks (watch them on YouTube) and 15 breakout sessions (read them on GitHub).

A wide range of experts with a diverse set of profiles and skills attend each year, so if you are working on the web platform, this event is a great opportunity to chat with people that are both developing the standards and working on the implementations. We’re really grateful for all the people that regularly join us for this event; you are the ones that make this event something useful and interesting for everyone! 🙏

Really enjoying Web Engines Hackfest by @igalia once again. Recommended for everyone interested in web technology.

— Anne van Kesteren (@annevk@hachyderm.io) June 05, 2024

The breakout sessions are probably the most interesting part of the event. Many different topics are discussed there, from high-level issues like how to build a new feature for the web platform, to lower-level efforts like the new WebKit SDK container. Together with the hallway discussions and impromptu meetings, the Web Engines Hackfest is an invaluable experience.

Talk by Stephanie Stimac about Sustainable Futures in the Web Engines Hackfest 2024

Big shout-out to Igalia for organising the Web Engines Hackfest every year since 2014, as well as the original WebKitGTK+ Hackfest starting in 2009. The event has grown and we’re now close to 100 onsite participants with representation from all major browser vendors. If your organization is interested in helping make this event possible, please contact us regarding our sponsorship options.

See you all in June!

February 19, 2025 07:55 AM

Stephen Chenney

Canvas Localization Support

The Web must support all users, with a multitude of languages and writing styles. HTTP supports localization through the Content-Language representation header, while HTML defines the lang and dir attributes. Yet the canvas element has no direct way of specifying the language for text drawing, nor good support for controlling the writing direction. Such localization controls the choice of glyphs from a given font and the bidi ordering.

Canvas Localization Today #

In browsers at the time of writing, a <canvas> element in a web page uses the HTML lang attribute of the element for glyph selection, and uses the CSS direction property for the text direction. Note that in all browsers the CSS direction property will have the HTML dir value unless otherwise set. In addition, there is an explicit direction attribute on the <canvas> element’s CanvasRenderingContext2D to control the text drawing direction. That attribute accepts the special "inherit" value, the default, by which the direction is taken from the element.
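
As a quick illustration (a sketch, not from the original article), the context-level attribute can be set directly; 'ltr', 'rtl' and the default 'inherit' are the accepted values:

// Sketch: explicit text direction on a 2D canvas context.
const ctx = document.querySelector('canvas').getContext('2d');
ctx.direction = 'rtl';       // force right-to-left text drawing
ctx.fillText('مرحبا', 100, 20);
ctx.direction = 'inherit';   // back to the element/CSS-derived direction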

This example demonstrates the current behavior for language in canvas text.

<div>
  <h3>Canvas lang="en"</h3>
  <canvas id="en" width="200px" height="60px" lang="en"></canvas>
</div>
<div>
  <h3>Canvas lang="tr"</h3>
  <canvas id="tr" width="200px" height="60px" lang="tr"></canvas>
</div>
<script>
  let en_context = document.getElementById('en').getContext('2d');
  let tr_context = document.getElementById('tr').getContext('2d');

  function drawText(context) {
    context.font = '20px Lato-Medium';
    context.fillStyle = 'black';
    context.fillText('finish', 50, 20);
  }

  function run_tests() {
    drawText(en_context);
    drawText(tr_context);
  };

  // See the example for the code to load the font
</script>

The language comes from the lang attribute on the canvas element, and here it controls the use of ligatures.

OffscreenCanvas provides no such localization support. The "inherit" text direction is always left-to-right and the language is always the default browser language. As a result, in many locales text content will be drawn differently in offscreen vs. DOM canvases. This is a significant problem given offscreen canvas is the preferred method for high performance applications.

The new canvas lang attribute #

The first step to improving localization is giving authors explicit control over the content language for canvas text metrics and drawing. The HTML WHATWG standards group has agreed to add the lang attribute to CanvasRenderingContext2D and OffscreenCanvasRenderingContext2D. The attribute takes any value that an HTML lang attribute can take, or the "inherit" value specifying that the language be taken from the <canvas> element.

This example shows the simplest use of the new attribute:

<script>
  function run_test(language_string) {
    let context = document.getElementById(language_string).getContext('2d');
    context.lang = language_string;
    context.font = '20px Lato-Medium';
    context.fillStyle = 'black';
    context.fillText('finish', 50, 20);
  }

  let latoMediumFontFace = new FontFace(
    // Lato-Medium is a font with language specific ligatures.
    'Lato-Medium',
    'url(../fonts/Lato-Medium.ttf)'
  );

  latoMediumFontFace.load().then((font) => {
    document.fonts.add(font);
    run_test('en');
    run_test('tr');
  });
</script>

The line context.lang = language_string; sets the language for text to the given string, which can be any valid BCP 47 locale. In this case we use generic English (“en”) and Turkish (“tr”).

The default value for the lang attribute is "inherit" whereby the text uses the language of the canvas. The first example works as it always has, where the canvas text inherits the language of the element.

Offscreen canvas, however, may now use the lang attribute to localize the language. This was not possible before.

  • The language may be set explicitly by setting the lang attribute in the offscreen context.

  • The language may be inherited from the most intuitive source, such as a canvas element that the offscreen was transferred from (the placeholder canvas) or the document in which the Offscreen canvas is created.

Let’s explore these situations, starting with the explicit setting.

offscreen_ctx.lang = 'tr';
offscreen_ctx.font = '20px Lato-Medium';
offscreen_ctx.fillStyle = 'black';
offscreen_ctx.fillText('finish', 50, 20);

The language for the offscreen context is explicitly set to Turkish, and the font uses that language to render appropriate glyphs (ligatures or not, in this case).

When the offscreen is created from a canvas element, through the transferControlToOffscreen() method, the language is taken from the canvas element. Here is an example:

<!DOCTYPE html>
<meta charset="utf-8">
<div>
  <h3>Canvas lang="tr"</h3>
  <canvas lang="tr" id="tr" width="200px" height="60px"></canvas>
</div>
<script>
  let context = document.getElementById("tr").transferControlToOffscreen().getContext("2d");

  // The default value for lang is "inherit". The canvas that transferred this
  // offscreen context has a lang="tr" attribute, so the language used for text
  // is "tr".
  context.font = '20px Lato-Medium';
  context.fillStyle = 'black';
  context.fillText('finish', 50, 20);
</script>

The value used for "inherit" is captured when control is transferred. If the language on the original canvas element is subsequently changed, the language used for "inherit" on the offscreen does not change.
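
A minimal sketch (not from the original article) of that capture-at-transfer behavior:

// The "inherit" language is captured when control is transferred.
const canvasEl = document.getElementById('tr');                  // <canvas lang="tr">
const offCtx = canvasEl.transferControlToOffscreen().getContext('2d');
canvasEl.setAttribute('lang', 'en');                             // changed after the transfer...
offCtx.font = '20px Lato-Medium';
offCtx.fillText('finish', 50, 20);                               // ...but the captured "tr" is still used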

When the offscreen canvas is created directly in script, there is no canvas element to use, so "inherit" takes the value from the document element for the realm the script is running in. Here’s an example:

<!DOCTYPE html>
<html lang="tr">
<meta charset="utf-8">
<div>
  <h3>Canvas lang="tr"</h3>
  <canvas id="tr" width="200px" height="60px"></canvas>
</div>
<script>
  // This is the output canvas.
  var canvas = document.getElementById('tr');
  var bitmap_ctx = canvas.getContext('bitmaprenderer');

  // A newly constructed offscreen canvas.
  // Note the lang attribute on the document element, in this case "tr",
  // is stored in the offscreen when it is created.
  var offscreen = new OffscreenCanvas(canvas.width, canvas.height);
  var offscreen_ctx = offscreen.getContext('2d');

  offscreen_ctx.font = '20px Lato-Medium';
  offscreen_ctx.fillStyle = 'black';
  offscreen_ctx.fillText('finish', 50, 20);

  // Transfer the OffscreenCanvas image to the output canvas.
  const bitmap = offscreen.transferToImageBitmap();
  bitmap_ctx.transferFromImageBitmap(bitmap);
</script>
</html>

Finally, when the offscreen canvas is transferred to a worker, the worker will use the offscreen’s "inherit" language for its own "inherit" value. This is a complete example, but the key snippet is:

// Create a canvas to serve as the placeholder for the offscreen worker.
const placeholder_canvas = document.createElement('canvas');
placeholder_canvas.setAttribute('width', '200');
placeholder_canvas.setAttribute('height', '60');
// Set the `lang` attribute on the placeholder canvas`
placeholder_canvas.setAttribute('lang', 'tr');

// Create the offscreen from the placeholder. The offscreen will get the
// language from this placeholder, in this case 'tr'.
const offscreen = placeholder_canvas.transferControlToOffscreen();

// Create the worker and transfer the offscreen to it. The language is
// transferred along with the offscreen, so content rendered in the
// offscreen will use 'tr' for the language (unless the `lang` is
// set explicitly on the context in the worker).
const worker = new Worker('canvas-lang-inherit-worker.js');
worker.postMessage({canvas: offscreen}, [offscreen]);
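
For completeness, here is a sketch of what the worker side might look like (the actual canvas-lang-inherit-worker.js from the linked example may differ):

// canvas-lang-inherit-worker.js (illustrative sketch)
self.onmessage = (event) => {
  const offscreen = event.data.canvas;
  const ctx = offscreen.getContext('2d');
  // No explicit ctx.lang is set, so the default "inherit" applies and the
  // language captured from the placeholder canvas ('tr') is used.
  ctx.font = '20px Lato-Medium';
  ctx.fillStyle = 'black';
  ctx.fillText('finish', 50, 20);
};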

The lang canvas text attribute is implemented in Chrome 135 and later when “Experimental Web Platform Features” is enabled or the command line flag "--enable-blink-features=CanvasTextLang" is used.

Fixing direction for offscreen canvas #

The problem of direction = "inherit" for OffscreenCanvas is solved along the same lines as lang: Capture the inherited value at creation from the placeholder <canvas> element or the document hosting the script that created the offscreen canvas.

All of the examples above for lang work equally well for direction, where the HTML element attribute is dir.

Thanks #

Canvas lang attribute support in Chromium was implemented by Igalia S.L. funded by Bloomberg L.P.

February 19, 2025 12:00 AM

February 18, 2025

Igalia WebKit Team

WebKit Igalia Periodical #14

Update on what happened in WebKit in the week from February 11 to February 18.

Cross-Port 🐱

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Several multimedia-related optimizations landed this week: memory allocations were reduced in the video rendering code path and in the MediaStreamAudioDestinationNode implementation.

On the WebRTC front, the GstWebRTC backend has gained support for stats rate limiting, and the send/receive buffer sizes were adjusted.

The GStreamer video frame converter, used to show video frames in the canvas or WebGL, has been fixed to use the right GL context and now supports DMA-BUF video frames, too.

Support was added for the eotf additional MIME type parameter when checking for supported multimedia content types. This is required by some streaming sites, such as YouTube TV.

The GStreamer-based WebCodecs AudioData::copyTo() is now spec compliant, handling all possible format conversions defined in the WebCodecs specification.

Infrastructure 🏗️

The WebKit ports that use CMake as their build system—as is the case for the GTK and WPE ones—now enable C++ library assertions by default, when building against GNU libstdc++ or LLVM's libc++. This adds lightweight runtime checks to a number of C++ library facilities, mostly aimed at detecting out-of-bounds memory access, and does not have a measurable performance impact on typical browsing workloads.

A number of Linux distributions were already enabling these assertions as part of their security hardening efforts (e.g. Fedora or Gentoo), and they do help find actual bugs. As a matter of fact, a number of issues had to be fixed so that all the WebKit API and layout tests would pass with the assertions enabled before the patch could be merged! Going forward, this will prevent accidentally introducing new bugs due to wrong usage of the C++ standard library.

That’s all for this week!

by Unknown at February 18, 2025 11:34 AM

Maksim Sisov

Unmasking Hidden Floating-Point Errors in Chromium’s Ozone/Wayland.

  1. The IEEE 754 Standard and Floating-Point Imprecision
  2. How the libwayland Update Exposed the Issue
  3. Chromium’s Targeted Fix: RoundToNearestThousandth

Floating-point arithmetic under the IEEE 754 standard introduces subtle precision errors due to the inherent limitations of representing decimal numbers in binary form. In graphical applications like Chromium, even tiny discrepancies can manifest as rendering inconsistencies. Recent updates in libwayland’s conversion routines have uncovered such a floating-point precision issue in Chromium’s Ozone/Wayland layer—one that was previously masked by legacy conversion behavior.

The IEEE 754 Standard and Floating-Point Imprecision

The IEEE 754 standard governs how floating-point numbers are represented and manipulated in most modern computing systems. Because many decimal values (for example, 0.2) cannot be represented exactly in a binary format limited to 32 or 64 bits, they are actually stored as approximations.

This limited precision means that arithmetic operations can introduce small errors. In many cases, these errors are negligible, but in precision-sensitive environments like graphical rendering, even minor inaccuracies can lead to visible artifacts.

The standard has three basic components:

  • The sign of the mantissa - in other words, the sign bit.
  • The biased exponent - this field encodes both positive and negative exponents by adding a fixed bias to the actual exponent, thereby converting it into a non-negative stored value.
  • The normalised mantissa - this component represents the significant digits of a number in scientific or floating-point notation. In normalized form, the mantissa is adjusted so that only one nonzero digit - specifically, a single “1” - appears immediately to the left of the radix point.

Consider the number 0.2. Its binary representation is an infinitely repeating fraction:

0.2 = 0.0011 0011 0011 … (in binary)

Note that the pattern of 0011 repeats indefinitely.

If 0.2 is represented in IEEE 754 single-precision (32-bit) form, it takes the following form:

0 01111100 10011001100110011001101

As a result, the repeating nature of these binary representations leads to approximation and precision challenges when dealing with floating-point arithmetics.

0.1 + 0.2 != 0.3
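
A quick way to see this in practice (JavaScript numbers are IEEE 754 doubles):

console.log(0.1 + 0.2);           // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3);   // false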

How the libwayland Update Exposed the Issue

Libwayland historically provided a utility to convert integer values into a fixed-point format (wl_fixed) for use in various graphical calculations. However, it carried an unintended side effect: an implicit rounding of the result. This rounding acted as a “safety net,” smoothing over the small imprecisions that naturally occur with floating-point arithmetic.

An update to libwayland refined the conversion routine by making it more precise. The function responsible for converting floating-point values to fixed-point values — commonly used in operations like wl_fixed_to_double — no longer introduced the implicit rounding that was part of the old implementation. With this improved precision, the floating-point discrepancies, previously hidden by the rounding effect, began to surface in the Chromium Ozone/Wayland layer, causing subtle yet noticeable differences in viewport calculations.

Chromium’s Targeted Fix: RoundToNearestThousandth

To restore consistency in graphical rendering following the libwayland update, we needed to address the newly exposed precision error. In commit 6187087, the solution was to reintroduce rounding explicitly within the Ozone/Wayland layer. It was decided to implement a utility function named RoundToNearestThousandth to round the viewport source value to the nearest thousandth. This explicit rounding step effectively recreates the smoothing effect that the old implicit conversion provided, ensuring that minor floating-point errors do not result in visible inconsistencies.
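
As a purely illustrative sketch (the actual helper in commit 6187087 may differ in details), rounding a value to the nearest thousandth can be written as:

#include <cmath>

// Illustrative sketch only; not the exact helper from the Chromium commit.
double RoundToNearestThousandth(double value) {
  return std::round(value * 1000.0) / 1000.0;
}

Applied to a viewport source value such as 0.30000000000000004, this collapses it back to 0.3, recreating the smoothing that the old implicit conversion used to provide.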

By aligning the rounding behavior of the Ozone/Wayland component with that of Chromium’s cc/viz module, the fix ensures consistent and reliable rendering across the browser’s graphical stack. This deliberate approach to handling floating-point imprecision highlights the importance of explicit error management in complex software systems.

by Maksim Sisov (msisov@igalia.com) at February 18, 2025 12:00 AM

Igalia Compilers Team

Sharing SpiderMonkey tests with the world

At Igalia, we believe in building open and interoperable platforms, and we’ve found that for such platforms to succeed, it is essential to combine a clear standard with an extensive test suite. We’ve worked on a number of such test suites, ranging from the official JavaScript conformance test suite test262 and web-platform-tests, which covers the rest of the web platform, to the Khronos Vulkan and OpenGL CTS. Our experience has consistently shown that the existence of a thorough test suite that is easy for implementers to run helps significantly to get different products to interoperate in the real world.

An important way to maximize the coverage of such a test suite is to encourage implementers to share the tests they are writing as part of their implementation work. In the first place, it is helpful to share new tests – a great success story here is the web-platform-tests project, which has two-way synchronization tooling for the major browsers. This allows developers to write tests directly within their own project, which are then shared more or less automatically. However, for mature platforms, especially when the platform and implementations are older than the centralised test suite, there is often a large backlog of existing tests. We would love to see more of these tests made available.

During 2024, we looked in particular at contributing such backlog tests to test262. We identified SpiderMonkey’s non262 suite as a potential source of interesting tests for this purpose. This test suite is interesting for several reasons.

In the first place, it is part of the larger jstests suite, which also contains the upstream test262 tests (hence the name). This meant we did not expect architectural issues to crop up when upstreaming those tests. Second, as a test suite built by JavaScript engine implementers, it seemed likely to contain tests for edge cases and requirements that changed over time. It is not uncommon for several implementers to be caught out by the same issue, so such tests are more likely to find bugs in other engines as well.

During our investigation, we discovered that our friends at Bocoup had a similar idea back in 2017, and they had created a Python script to transform parts of the jstests suite to work within the test262 test suite, which we gratefully reused. However, some issues quickly came to light: it had been written for Python 2, and its unit test had not been enabled in continuous integration, so it needed some work to be of use in 2024. Once that was done, we discovered that the script had been used as part of a mostly manual process to submit specifically curated tests, and it could not cope with the diversity and complexity of the whole test suite without significant up-front work to put the tests into a shape that it could deal with. We suspect that this is part of the reason that their project did not bear all that much fruit in the end, so we decided that our approach needed to maximize the number of shared tests for the effort expended.

After getting the script into shape, we set to work on a batch export of the non262 test suite. In order to verify the quality of the exported tests, we ran the tests against both SpiderMonkey and V8 using the upstream tooling. This process revealed several issues—some anticipated, others more unexpected. For example, a large number of tests used helper functions that were only available in SpiderMonkey, which we either needed to provide in test262 or automatically translate to an existing helper function. Other APIs, for example those testing specific details of the garbage collection implementation, could not be reproduced at all.

Besides that, a fair number of tests relied on specifics of the SpiderMonkey implementation that were not guaranteed by the standard – such as testing exact values of the error.message property or the handling of particular bit patterns in typed arrays. Some tests also turned out not to be testing what they were meant to test or covered a previous version of the specification or a not-yet-finalized proposal. Depending on the situation, we improved the tests and added APIs to make it possible to share them, or we skipped exporting the offending tests.

We also discovered some issues in the test262-harness tool that we used to run the tests upstream, notably around tests using JavaScript modules and the [[IsHTMLDDA]] internal slot (used to specify the web compatibility requirements around document.all). Also, it turned out that the mechanism to include helper libraries into tests was not fully specified, which led to some combinations of helpers working in some test runners but not others. We have started the process to clarify the documentation on this point.

As part of this project so far, we landed about 1600 new tests into test262 and filed 10 bugs (some covering failures in multiple tests), of which half have been fixed by the engine maintainers. Several failures also were the result of bugs that had been filed earlier but hadn’t been fixed yet. Also, Mozilla has decided to remove the exported tests from their own repository, and to use the centralised copies instead.

In terms of future work for this particular project, we’re expecting to investigate if we can share some more of the tests that are currently skipped. Separately we’re interested in looking into a fully automated two-way synchronization system; this would significantly ease the cooperation of engine developers and project maintainers on a unified test suite, though the engineering effort would be commensurate. We’ll also continue investigating if we can identify other test suites that can benefit from a similar treatment.

We would like to thank Bloomberg for sponsoring this work and the Mozilla community, in particular Dan Minor, for their indispensable help and code reviews.

February 18, 2025 12:00 AM

February 17, 2025

Max Ihlenfeldt

Storing and viewing local test results in ResultDB

Get the same nice graphical view of your local test results as for CQ tests!

February 17, 2025 12:00 AM

February 16, 2025

Qiuyi Zhang (Joyee)

Building Node.js on Windows with clang-cl

Recently Node.js started to support building with clang-cl on Windows. I happened to have the chance to try it out this week and while it still needs some fixups in my case, it’s mostly working very well now. Here are some notes about this

February 16, 2025 01:21 AM

February 14, 2025

Andy Wingo

tracepoints: gnarly but worth it

Hey all, quick post today to mention that I added tracing support to the Whippet GC library. If the support library for LTTng is available when Whippet is compiled, Whippet embedders can visualize the GC process. Like this!

Screenshot of perfetto showing a generational PCC trace

Click above for a full-scale screenshot of the Perfetto trace explorer processing the nboyer microbenchmark with the parallel copying collector on a 2.5x heap. Of course no image will have all the information; the nice thing about trace visualizers like this is that you can zoom in to sub-microsecond spans to see exactly what is happening, and have nice mouseovers and clicky-clickies. Fun times!

on adding tracepoints

Adding tracepoints to a library is not too hard in the end. You need to pull in the lttng-ust library, which has a pkg-config file. You need to declare your tracepoints in one of your header files. Then you have a minimal C file that includes the header, to generate the code needed to emit tracepoints.

Annoyingly, this header file you write needs to be in one of the -I directories; it can’t be just in the source directory, because lttng includes it seven times (!!) using computed includes (!!!) and because the LTTng file header that does all the computed including isn’t in your directory, GCC won’t find it. It’s pretty ugly. Ugliest part, I would say. But, grit your teeth, because it’s worth it.
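
For the curious, the header boils down to the standard lttng-ust boilerplate. Here is a rough sketch (provider and event names invented for illustration; this is not Whippet’s actual header):

/* gc-tracepoints.h -- illustrative sketch, not Whippet's real header. */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER my_gc

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "gc-tracepoints.h"

#if !defined(GC_TRACEPOINTS_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define GC_TRACEPOINTS_H

#include <lttng/tracepoint.h>

/* One event: a minor collection starting, with the mutator count attached. */
TRACEPOINT_EVENT(my_gc, minor_gc_begin,
                 TP_ARGS(int, mutator_count),
                 TP_FIELDS(ctf_integer(int, mutator_count, mutator_count)))

#endif /* GC_TRACEPOINTS_H */

#include <lttng/tracepoint-event.h>

The accompanying minimal C file just defines TRACEPOINT_CREATE_PROBES and TRACEPOINT_DEFINE before including the header, and call sites then emit events with tracepoint(my_gc, minor_gc_begin, n).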

Finally you pepper your source with tracepoints, which probably you wrap in some macro so that you don’t have to require LTTng, and so you can switch to other tracepoint libraries, and so on.

using the thing

I wrote up a little guide for Whippet users about how to actually get traces. It’s not as easy as perf record, which I think is an error. Another ugly point. Buck up, though, you are so close to graphs!

By which I mean, so close to having to write a Python script to make graphs! Because LTTng writes its logs in so-called Common Trace Format, which as you might guess is not very common. I have a colleague who swears by it, that for him it is the lowest-overhead system, and indeed in my case it has no measurable overhead when trace data is not being collected, but his group uses custom scripts to convert the CTF data that he collects to... GTKWave (?!?!?!!).

In my case I wanted to use Perfetto’s UI, so I found a script to convert from CTF to the JSON-based tracing format that Chrome profiling used to use. But, it uses an old version of Babeltrace that wasn’t available on my system, so I had to write a new script (!!?!?!?!!), probably the most Python I have written in the last 20 years.

is it worth it?

Yes. God I love blinkenlights. As long as it’s low-maintenance going forward, I am satisfied with the tradeoffs. Even the fact that I had to write a script to process the logs isn’t so bad, because it let me get nice nested events, which most stock tracing tools don’t allow you to do.

I fixed a small performance bug because of it – a worker thread was spinning waiting for a pool to terminate instead of helping out. A win, and one that never would have shown up on a sampling profiler either. I suspect that as I add more tracepoints, more bugs will be found and fixed.

fin

I think the only thing that would be better is if tracepoints were a part of Linux system ABIs – that there would be header files to emit tracepoint metadata in all binaries, that you wouldn’t have to link to any library, and the actual tracing tools would be intermediated by that ABI in such a way that you wouldn’t depend on those tools at build-time or distribution-time. But until then, I will take what I can get. Happy tracing!

by Andy Wingo at February 14, 2025 01:32 PM

February 12, 2025

Max Ihlenfeldt

Implementing fallback tab dragging for Wayland in Chromium

Fallback tab dragging shipped in Chromium 133! Let's take a look at the problem it solves, how it works, and why it took so long to ship.

February 12, 2025 12:00 AM

February 11, 2025

Igalia WebKit Team

WebKit Igalia Periodical #13

Update on what happened in WebKit in the week from February 3 to February 11.

Cross-Port 🐱

Fixed an assertion crash in the remote Web Inspector when its resources contain a UTF-8 “non-breaking space” character.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Media playback now supports choosing the output audio device on a per element basis, using the setSinkId() API. This also added the support needed for enumerating audio outputs, which is needed by Web applications to obtain the identifiers of the available devices. Typical usage includes allowing the user to choose the audio output used in WebRTC-based conference calls.

For now feature flags ExposeSpeakers, ExposeSpeakersWithoutMicrophone, and PerElementSpeakerSelection need to be enabled for testing.

Set the proper playbin flags which are needed to properly use OpenMAX on the Raspberry Pi.

Graphics 🖼️

Landed a change that adds a visualization for damage rectangles, controlled by the WEBKIT_SHOW_DAMAGE environment variable. This highlights areas damaged during rendering of every frame—as long as damage propagation is enabled.

Screenshot of a web page showing the “Poster Circle” CSS transforms demo, with red rectangles overlapping the areas that have been last re-rendered

Releases 📦️

Stable releases of WebKitGTK 2.46.6 and WPE WebKit 2.46.6 are now available. These come along with the first security advisory of the year (WSA-2025-0001: GTK, WPE): they contain mainly security fixes, and everybody is advised to update.

The unstable release train continues as well, with WebKitGTK 2.47.4 and WPE WebKit 2.47.4 available for testing. These are snapshots of the current development status, and while expected to work there may be rough edges—if you encounter any issue, reports at the WebKit Bugzilla are always welcome.

The recently released libwpe 1.16.1 accidentally introduced an ABI break, which has been corrected in libwpe 1.16.2. There are no other changes, and the latter should be preferred.

That’s all for this week!

by Unknown at February 11, 2025 10:43 AM

February 10, 2025

Andy Wingo

whippet at fosdem

Hey all, the video of my FOSDEM talk on Whippet is up:

Slides here, if that’s your thing.

I ended the talk with some puzzling results around generational collection, which prompted yesterday’s post. I don’t have a firm answer yet. Or rather, perhaps for the splay benchmark, it is to be expected that a generational GC is not great; but there are other benchmarks that also show suboptimal throughput in generational configurations. Surely it is some tuning issue; I’ll be looking into it.

Happy hacking!

by Andy Wingo at February 10, 2025 09:01 PM

Manuel Rego

Solving Cross-root ARIA Issues in Shadow DOM

This blog post is to announce that Igalia has gotten a grant from the NLnet Foundation to work on solving cross-root ARIA issues in Shadow DOM. My colleague Alice Boxhall, who has been working on sorting out this issue for several years, is doing the work related to this grant, with support from other Igalians.

The Problem #

Shadow DOM has some issues that prevent it from being used in some situations if you want to have an accessible application. This has been identified by the Web Components Working Group as one of the top priority issues that need to be sorted out.

Briefly speaking, there are mainly two different problems when you want to reference elements for ARIA attributes across shadow root boundaries.

The first issue is that you cannot reference things outside the Shadow DOM. Imagine you have a custom element (#customButton) which contains a native button in its Shadow DOM, and you want to associate the internal button with a label (#label) which lives outside, in the light DOM.

<label id="label">Label</label>
<custom-button id="customButton">
<template shadowrootmode="open">
<div>foo</div>
<button aria-labelledby="label">Button</button>
<div>bar</div>
</template>
</custom-button>

And the second problem is that you cannot reference things inside a Shadow DOM from the outside. Imagine the opposite situation where you have a custom label (#customLabel) with a native label in its Shadow DOM that you want to reference from a button (#button) in the light DOM.

<custom-label id="customLabel">
<template shadowrootmode="open">
<div>foo</div>
<label>Label</label>
<div>bar</div>
</template>
</custom-label>
<button id="button" aria-labelledby="customLabel">Button</button>

This is a huge issue for web components because they cannot use Shadow DOM, which they would like to use for its encapsulation properties, if they want to provide an accessible experience to users. For that reason many web components libraries don’t use Shadow DOM yet and have to rely on workarounds or custom polyfills.

If you want to know more about the problem, Alice goes deep into the topic in her blog post How Shadow DOM and accessibility are in conflict.

Prior Art #

The Accessibility Object Model (AOM) effort was started several years ago with a wider scope: it tried to solve many different things, including the problems described in this blog post. At that time Alice was at Google and Alex Surkov at Mozilla, and both were part of this effort. Coincidentally, they are both now at Igalia, where together with Joanmarie Diggs and Valerie Young they form a dream team of accessibility experts.

Even though the full problem hasn’t been sorted out yet, there has been some progress with the Element Reflection feature, which allows ARIA relationship attributes to be reflected as element references. With this, users can specify them without the need to assign globally unique ID attributes to each element. This feature has been implemented in Chromium and WebKit by Igalia. So instead of doing something like:

<button id="button" aria-describedby="description">Button</button>
<div id="description">Description of the button.</div>

You could specify it like:

<button id="button">Button</button>
<div id="description">Description of the button.</div>
<script>
button.ariaDescribedByElements = [description];
</script>

Coming back to Shadow DOM, this feature also enables authors to specify ARIA relationships pointing to things outside the Shadow DOM (the first kind of problem described in the previous section); however, it doesn’t allow referencing elements inside another Shadow DOM (the second problem). Anyway, let’s see an example of how this solves the first issue:

<label id="label">Label</label>
<custom-button id="customButton"></custom-button>
<script>
const shadowRoot = customButton.attachShadow({mode: "open"});

const foo = document.createElement("div");
foo.textContent = "foo";
shadowRoot.appendChild(foo);

const button = document.createElement("button");
button.textContent = "Button";
/* Here is where we reference the outer label from the button inside the Shadow DOM. */
button.ariaLabelledByElements = [label];
shadowRoot.appendChild(button);

const bar = document.createElement("div");
bar.textContent = "bar";
shadowRoot.appendChild(bar);
</script>

Apart from Element Reflection, which only solves part of the issues, there have been other ideas about how to solve these problems: initially the Cross-root ARIA Delegation proposal by Leo Balter at Salesforce, then a different one called Cross-root ARIA Reflection by Westbrook Johnson at Adobe, and finally the Reference Target for Cross-root ARIA proposal by Ben Howell at Microsoft.

Again if you want to learn more about the different nuances of the previous proposals you can revisit Alice’s blog post.

The Proposed Solution: Reference Target #

At this point the most promising proposal is the Reference Target one. This proposal allows web authors to use Shadow DOM without breaking the accessibility of their web applications. The proposal is still in flux and it’s currently being prototyped in Chromium and WebKit. Anyway, as an example, this is the kind of API shape that would solve the second problem described in the initial section, where we reference a label (#actualLabel) inside the Shadow DOM from a button (#button) in the light DOM.

<custom-label id="customLabel">
<template shadowrootmode="open"
shadowrootreferencetarget="actualLabel">

<div>foo</div>
<label id="actualLabel">Label</label>
<div>bar</div>
</template>
</custom-label>
<button id="button" aria-labelledby="customLabel">Button</button>

The Grant #

As part of this grant we’ll work on all the process to get the Reference Target proposal ready to be shipped in the web rendering engines. Some of the tasks that will be done during this project include work in different fronts:

  • Proposal: Help with the work on the proposal identifying issues, missing bits, design solutions, providing improvements, keeping it up to date as the project evolves.
  • Specification: Write the specification text, discuss and review it with the appropriate working groups, improve it based on gathered feedback and implementation experience, and resolve issues identified in the standards bodies.
  • Implementation: Prototype implementation in WebKit to verify the proposal, upstream changes to WebKit, fix bugs on the implementation, adapt it to spec modifications.
  • Testing: Analyze current WPT tests, add new ones to increase coverage, validate implementations, keep them up-to-date as things evolve.
  • Outreach: Blog posts explaining the evolution of the project and participation in events with other standards folks to have the proper discussions to move the proposal and specification forward.

NLnet Foundation logo

We’re really grateful that NLnet has trusted us with this project, and we really hope it will allow us to fix an outstanding accessibility issue in the web platform that has been around for far too long already. At the same time, it’s a bit sad that the European Union, through the NGI funds, is the one sponsoring this project, when it will have a very important impact for several big fish that are part of the Web Components WG.

If you want to follow the evolution of this project, I’d suggest you follow Alice’s blog, where she’ll be updating us on the progress of the different tasks.

February 10, 2025 12:00 AM

February 09, 2025

Andy Wingo

baffled by generational garbage collection

Usually in this space I like to share interesting things that I find out; you might call it a research-epistle-publish loop. Today, though, I come not with answers, but with questions, or rather one question, but with fractal surface area: what is the value proposition of generational garbage collection?

hypothesis

The conventional wisdom is encapsulated in a 2004 Blackburn, Cheng, and McKinley paper, “Myths and Realities: The Performance Impact of Garbage Collection”, which compares whole-heap mark-sweep and copying collectors to their generational counterparts, using the Jikes RVM as a test harness. (It also examines a generational reference-counting collector, which is an interesting predecessor to the 2022 LXR work by Zhao, Blackburn, and McKinley.)

The paper finds that generational collectors spend less time than their whole-heap counterparts for a given task. This is mainly due to less time spent collecting, because generational collectors avoid tracing/copying work for older objects that mostly stay in the same place in the live object graph.

The paper also notes an improvement for mutator time under generational GC, but only for the generational mark-sweep collector, which it attributes to the locality and allocation speed benefit of bump-pointer allocation in the nursery. However for copying collectors, generational GC tends to slow down the mutator, probably because of the write barrier, but in the end lower collector times still led to lower total times.

So, I expected generational collectors to always exhibit lower wall-clock times than whole-heap collectors.

test workbench

In whippet, I have a garbage collector with an abstract API that specializes at compile-time to the mutator’s object and root-set representation and to the collector’s allocator, write barrier, and other interfaces. I embed it in whiffle, a simple Scheme-to-C compiler that can run some small multi-threaded benchmarks, for example the classic Gabriel benchmarks. We can then test those benchmarks against different collectors, mutator (thread) counts, and heap sizes. I expect that the generational parallel copying collector takes less time than the whole-heap parallel copying collector.

results?

So, I ran some benchmarks. Take the splay-tree benchmark, derived from Octane’s splay.js. I have a port to Scheme, and the results are... not good!

In this graph the “pcc” series is the whole-heap copying collector, and “generational-pcc” is the generational counterpart, with a nursery sized such that after each collection, its size is 2 MB times the number of active mutator threads in the last cycle. So, for this test with eight threads, on my 8-core Ryzen 7 7840U laptop, the nursery is 16MB including the copy reserve, which happens to be the same size as the L3 on this CPU. New objects are kept in the nursery one cycle before being promoted to the old generation.

There are also results for “mmc” and “generational-mmc” collectors, which use an Immix-derived algorithm that allows for bump-pointer allocation but which doesn’t require a copy reserve. There, the generational collectors use a sticky mark-bit algorithm, which has very different performance characteristics as promotion is in-place, and the nursery is as large as the available heap size.

The salient point is that at all heap sizes, and for these two very different configurations (mmc and pcc), generational collection takes more time than whole-heap collection. It’s not just the splay benchmark either; I see the same thing for the very different nboyer benchmark. What is the deal?

I am honestly quite perplexed by this state of affairs. I wish I had a narrative to tie this together, but in lieu of that, voici some propositions and observations.

“generational collection is good because bump-pointer allocation”

Sometimes people say that the reason generational collection is good is because you get bump-pointer allocation, which has better locality and allocation speed. This is misattribution: it’s bump-pointer allocators that have these benefits. You can have them in whole-heap copying collectors, or you can have them in whole-heap mark-compact or immix collectors that bump-pointer allocate into the holes. Or, true, you can have them in generational collectors with a copying nursery but a freelist-based mark-sweep allocator. But also you can have generational collectors without bump-pointer allocation, for free-list sticky-mark-bit collectors. To simplify this panorama to “generational collectors have good allocators” is incorrect.
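
For reference, a bump-pointer allocator is just a pointer marching through a contiguous region. A sketch, not Whippet’s code:

#include <stddef.h>

/* Sketch of a bump-pointer allocator; names and layout are illustrative. */
struct bump_region {
  char *hp;     /* next free byte */
  char *limit;  /* end of the region */
};

static inline void *bump_alloc(struct bump_region *r, size_t bytes) {
  if (r->hp + bytes > r->limit)
    return NULL;        /* out of space: time to collect or grab a new block */
  void *obj = r->hp;
  r->hp += bytes;       /* "bump" past the newly allocated object */
  return obj;
}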

“generational collection lowers pause times”

It’s true, generational GC does lower median pause times:

But because a major collection is usually slightly more work under generational GC than in a whole-heap system, because of e.g. the need to reset remembered sets, the maximum pauses are just as big and even a little bigger:

I am not even sure that it is meaningful to compare median pause times between generational and non-generational collectors, given that the former perform possibly orders of magnitude more collections than the latter.

Doing fewer whole-heap traces is good, though, and in the ideal case, the less frequent major traces under generational collectors allows time for concurrent tracing, which is the true mitigation for long pause times.

is it whiffle?

Could it be that the test harness I am using is in some way unrepresentative? I don’t have more than one test harness for Whippet yet. I will start work on a second Whippet embedder within the next few weeks, so perhaps we will have an answer there. Still, there is ample time spent in GC pauses in these benchmarks, so surely as a GC workload Whiffle has some utility.

One reason that Whiffle might be unrepresentative is that it is an ahead-of-time compiler, whereas nursery addresses are assigned at run-time. Whippet exposes the necessary information to allow a just-in-time compiler to specialize write barriers, for example the inline check that the field being mutated is not in the nursery, and an AOT compiler can’t encode this as an immediate. But it seems a small detail.

Also, Whiffle doesn’t do much compiler-side work to elide write barriers. Could the cost of write barriers be over-represented in Whiffle, relative to a production language run-time?

Relatedly, Whiffle is just a baseline compiler. It does some partial evaluation but no CFG-level optimization, no contification, no nice closure conversion, no specialization, and so on: is it not representative because it is not an optimizing compiler?

is it something about the nursery size?

How big should the nursery be? I have no idea.

As a thought experiment, consider the case of a 1 kilobyte nursery. It is probably too small to allow the time for objects to die young, so the survival rate at each minor collection would be high. Above a certain survival rate, generational GC is probably a lose, because your program violates the weak generational hypothesis: it introduces a needless copy for all survivors, and a synchronization for each minor GC.

On the other hand, a 1 GB nursery is probably not great either. It is plenty large enough to allow objects to die young, but the number of survivor objects in a space that large is such that pause times would not be very low, which is one of the things you would like in generational GC. Also, you lose out on locality: a significant fraction of the objects you traverse are probably out of cache and might even incur TLB misses.

So there is probably a happy medium somewhere. My instinct is that for a copying nursery, you want to make it about as big as L3 cache, which on my 8-core laptop is 16 megabytes. Systems are different sizes though; in Whippet my current heuristic is to reserve 2 MB of nursery per core that was active in the previous cycle, so if only 4 threads are allocating, you would have a 8 MB nursery. Is this good? I don’t know.

is it something about the benchmarks?

I don’t have a very large set of benchmarks that run on Whiffle, and they might not be representative. I mean, they are microbenchmarks.

One question I had was about heap sizes. If a benchmark’s maximum heap size fits in L3, which is the case for some of them, then probably generational GC is a wash, because whole-heap collection stays in cache. When I am looking at benchmarks that evaluate generational GC, I make sure to choose those that exceed L3 size by a good factor, for example the 8-mutator splay benchmark in which minimum heap size peaks at 300 MB, or the 8-mutator nboyer-5 which peaks at 1.6 GB.

But then, should nursery size scale with total heap size? I don’t know!

Incidentally, the way that I scale these benchmarks to multiple mutators is a bit odd: they are serial benchmarks, and I just run some number of threads at a time, and scale the heap size accordingly, assuming that the minimum size when there are 4 threads is four times the minimum size when there is just one thread. However, multithreaded programs are unreliable, in the sense that there is no heap size under which they fail and above which they succeed; I quote:

"Consider 10 threads each of which has a local object graph that is usually 10 MB but briefly 100MB when calculating: usually when GC happens, total live object size is 10×10MB=100MB, but sometimes as much as 1 GB; there is a minimum heap size for which the program sometimes works, but also a minimum heap size at which it always works."

is it the write barrier?

A generational collector partitions objects into old and new sets, and a minor collection starts by visiting all old-to-new edges, called the “remembered set”. As the program runs, mutations to old objects might introduce new old-to-new edges. To maintain the remembered set in a generational collector, the mutator invokes write barriers: little bits of code that run when you mutate a field in an object. This is overhead relative to non-generational configurations, where the mutator doesn’t have to invoke collector code when it sets fields.
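
To make that concrete, here is a hedged sketch of a precise field-logging barrier; the helper predicates are hypothetical stand-ins for collector internals, not Whippet’s actual API:

/* Sketch of a precise field-logging write barrier; helper names invented. */
struct gc_object;
extern int  object_is_old(struct gc_object *obj);
extern int  object_is_young(struct gc_object *obj);
extern int  field_is_logged(struct gc_object **field);
extern void log_field(struct gc_object **field);   /* add to remembered set */

static inline void write_field(struct gc_object *obj,
                               struct gc_object **field,
                               struct gc_object *value) {
  *field = value;
  /* Only stores that create an old-to-new edge need to be remembered,
     so that the next minor collection can trace through them. */
  if (object_is_old(obj) && value && object_is_young(value) &&
      !field_is_logged(field))
    log_field(field);
}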

So, could it be that Whippet’s write barriers or remembered set are somehow so inefficient that my tests are unrepresentative of the state of the art?

I used to use card-marking barriers, but I started to suspect they cause too much overhead during minor GC and introduced too much cache contention. I switched to precise field-logging barriers some months back for Whippet’s Immix-derived space, and we use the same kind of barrier in the generational copying (pcc) collector. I think this is state of the art. I need to see if I can find a configuration that allows me to measure the overhead of these barriers, independently of other components of a generational collector.

is it something about the generational mechanism?

A few months ago, my only generational collector used the sticky mark-bit algorithm, which is an unconventional configuration: its nursery is not contiguous, non-moving, and can be as large as the heap. This is part of the reason that I implemented generational support for the parallel copying collector, to have a different and more conventional collector to compare against. But generational collection loses on some of these benchmarks in both places!

is it something about collecting more often?

On one benchmark which repeatedly constructs some trees and then verifies them, I was seeing terrible results for generational GC, which I realized were because of cooperative safepoints: generational GC collects more often, so it requires that all threads reach safepoints more often, and the non-allocating verification phase wasn’t emitting any safepoints. I had to change the compiler to emit safepoints at regular intervals (in my case, on function entry), and it sped up the generational collector by a significant amount.

This is one instance of a general observation, which is that any work that doesn’t depend on survivor size in a GC pause is more expensive with a generational collector, which runs more collections. Synchronization can be a cost. I had one bug in which tracing ephemerons did work proportional to the size of the whole heap, instead of the nursery; I had to specifically add generational support for the way Whippet deals with ephemerons during a collection to reduce this cost.

is it something about collection frequency?

Looking deeper at the data, I have partial answers for the splay benchmark, and they are annoying :)

Splay doesn’t actually allocate all that much garbage. At a 2.5x heap, the stock parallel MMC collector (in-place, sticky mark bit) collects... one time. That’s all. Same for the generational MMC collector, because the first collection is always major. So at 2.5x we would expect the generational collector to be slightly slower. The benchmark is simply not very good – or perhaps the most generous interpretation is that it represents tasks that allocate 40 MB or so of long-lived data and not much garbage on top.

Also at 2.5x heap, the whole-heap copying collector runs 9 times, and the generational copying collector does 293 minor collections and... 9 major collections. We are not reducing the number of major GCs. It means either the nursery is too small, so objects aren’t dying young when they could, or the benchmark itself doesn’t conform to the weak generational hypothesis.

At a 1.5x heap, the copying collector doesn’t have enough space to run. For MMC, the non-generational variant collects 7 times, and generational MMC times out. Timing out indicates a bug, I think. Annoying!

I tend to think that if I get results and there were fewer than, like, 5 major collections for a whole-heap collector, that indicates that the benchmark is probably inapplicable at that heap size, and I should somehow surface these anomalies in my analysis scripts.

collecting more often redux

Doing a similar exercise for nboyer at 2.5x heap with 8 threads (4GB for 1.6GB live data), I see that pcc did 20 major collections, whereas generational pcc lowered that to 8 major collections and 3471 minor collections. Could it be that there are still too many fixed costs associated with synchronizing for global stop-the-world minor collections? I am going to have to add some fine-grained tracing to find out.

conclusion?

I just don’t know! I want to believe that generational collection was an out-and-out win, but I haven’t yet been able to prove it is true.

I do have some homework to do. I need to find a way to test the overhead of my write barrier – probably using the MMC collector and making it only do major collections. I need to fix generational-mmc for splay and a 1.5x heap. And I need to do some fine-grained performance analysis for minor collections in large heaps.

Enough for today. Feedback / reactions very welcome. Thanks for reading and happy hacking!

by Andy Wingo at February 09, 2025 12:46 PM

February 06, 2025

Brian Kardell

Link in Bio

Link in Bio

Embracing an IndieWeb thing...

Over the years, I've been there as things have come and gone on the interwebs. I had accounts on AIM, MySpace, Blogger, FaceBook, WordPress, Google Plus, Twitter, Instagram, and plenty more. On each one, I wind up with a profile where I want to link people to other places I am online - and those places don't always make it easy to do that. So, something short and memorable that you could even type if you had to is ideal - like a handle: @briankardell or @bkardell or, in the end: bkardell.com is pretty great.

Back in 2016, some folks felt the same frustration and instead created this Linktree thing. But... It's just like, a freemium thing that gives you the most basic single page website ever? Yes, and today they have 50 million users and are valued at $1.3 billion - that's a value of $26/user. I mean, I found that amazing.

But, here's the thing: After a while I realized that there's something to it. Not the business model, specifically, but the link in bio idea. The link in bio is generally not a great website on its own, but it's a page where people can pretty quickly navigate to the thing they're looking for without other noise or fluff. Which is something that a home page often isn't. So, I've learned to really appreciate them, actually.

Back in about 2020, some IndieWeb folks began thinking about, criticizing and brainstorming around it too. They began writing about Link in Bio, and why it might be useful to have your own version. There are several examples of people doing something similar.

Recently a bunch of things smashed together in my mind. First, on ShopTalk, I heard Chris and Dave talking about "slash pages" (pages right off the root domain with well known names). Second, I've been working on social media plans - getting away from some platforms and moving to others and thinking about these problems. An IndieWeb style `/links` page that adds rel="me" links is a really nice, simple way to achieve a whole lot of things if we'd all adopt it - not the least of which is that it's an evergreen link, almost as simple as a handle, to where people can find you should you choose to leave...

So, starting with me, and Igalia, you can find our links at bkardell.com/links and igalia.com/links respectively.

A screenshot of Igalia's post with a sunset and birds flying away, linking to our /links page

February 06, 2025 05:00 AM

February 05, 2025

Alberto Garcia

Keeping your system-wide configuration files intact after updating SteamOS

Introduction

If you use SteamOS and you like to install third-party tools or modify the system-wide configuration some of your changes might be lost after an OS update. Read on for details on why this happens and what to do about it.


As you all know SteamOS uses an immutable root filesystem and users are not expected to modify it because all changes are lost after an OS update.

However this does not include configuration files: the /etc directory is not part of the root filesystem itself. Instead, it’s a writable overlay and all modifications are actually stored under /var (together with all the usual contents that go in that filesystem such as logs, cached data, etc).

/etc contains important data that is specific to that particular machine like the configuration of known network connections, the password of the main user and the SSH keys. This configuration needs to be kept after an OS update so the system can keep working as expected. However the update process also needs to make sure that other changes to /etc don’t conflict with whatever is available in the new version of the OS, and there have been issues due to some modifications unexpectedly persisting after a system update.

SteamOS 3.6 introduced a new mechanism to decide what to keep after an OS update, and the system now keeps a list of configuration files that are allowed to be kept in the new version. The idea is that only the modifications that are known to be important for the correct operation of the system are applied, and everything else is discarded1.

However, many users want to be able to keep additional configuration files after an OS update, either because the changes are important for them or because those files are needed for some third-party tool that they have installed. Fortunately the system provides a way to do that, and users (or developers of third-party tools) can add a configuration file to /etc/atomic-update.conf.d, listing the additional files that need to be kept.

There is an example in /etc/atomic-update.conf.d/example-additional-keep-list.conf that shows what this configuration looks like.

Sample configuration file for the SteamOS updater

Developers who are targeting SteamOS can also use this same method to make sure that their configuration files survive OS updates. As an example of an actual third-party project that makes use of this mechanism you can have a look at the DeterminateSystems Nix installer:

https://github.com/DeterminateSystems/nix-installer/blob/v0.34.0/src/planner/steam_deck.rs#L273

As usual, if you encounter issues with this or any other part of the system you can check the SteamOS issue tracker. Enjoy!


  1. A copy is actually kept under /etc/previous to give the user the chance to recover files if necessary, and up to five previous snapshots are kept under /var/lib/steamos-atomupd/etc_backup ↩

by berto at February 05, 2025 04:13 PM

February 04, 2025

José Dapena

Maintaining Chromium downstream: keeping it small

Maintaining a downstream of Chromium is hard, because of the speed at which upstream moves and how hard it is to keep our downstream up to date.

A critical aspect is how big what we build on top of Chromium is: in other words, the size of our downstream. In this blog post I will review how to measure it, and the impact it has on the costs of maintaining a downstream.

Maintaining Chromium downstream series #

Last year, I started a series of blog posts about the challenges, the organization and the implementation details of maintaining a project that is a downstream of Chromium. This is the third blog post in the series.

The previous posts were:

  • Why downstream?: why is it necessary to create downstream forks of Chromium? And why use Chromium in particular?
  • Update strategies: when to update? Is it better to merge or rebase? How can automation help?

Measuring the size of a downstream #

But, first… What do I mean by the size of the downstream? I am interested in a definition that can be used as a metric, something we can measure and track. A number that allows us to know whether the downstream is increasing or decreasing, and to measure whether a change has an impact on it.

The rough idea is: the bigger the downstream is, the more complex it is to maintain it. I will provide a few metrics that can be used for this purpose.

Delta #

The most obvious metric is the delta, the difference between upstream Chromium and the downstream. For this, and assuming the downstream uses Git, the definition I use is essentially the result of this command:

git diff --shortstat BASELINE..DOWNSTREAM

BASELINE is a commit reference that represents the pure upstream repository status our downstream is based on (our baseline). DOWNSTREAM is the commit reference we want to compare the baseline to.

As a recommendation, it is useful to maintain tags or branches in our downstream repository that strictly represent the baseline. This way we can use diff tools to examine our delta more easily.

This command is going to return 3 values:

  • The number of files that have changed.
  • The number of lines that were added.
  • The number of lines that were removed.

We will be mostly interested in tracking the number of lines added and removed.

This definition is interesting as it gives an idea of the amount of lines of code that we need to maintain. It may not reflect the full amount, as some files are maintained outside the Chromium repository; aggregating this with other repositories that are changed or added to the build could be useful.

One interesting thing with this approach is also that we can measure the delta of specific paths in the repository. For example, if we want to measure the delta of the content/ path, it is just as easy as doing:

git diff --shortstat BASELINE..DOWNSTREAM content/

Modifying delta #

The regular delta definition we considered has a problem. All the line changes have the same weight. But, when we update our baseline, a big part of the complexity comes from the conflicts found when rebasing or merging.

So, I am introducing a new definition. Modifying delta: the changes between the baseline and the downstream that affect upstream lines. In this case, we ignore completely any file added only by the downstream, as that is not going to create conflicts.

In Git, we can use filters for that purpose:

git diff --diff-filter=CDMR --shortstat BASELINE..DOWNSTREAM

This will only account for these changes:

  • M: changes affecting existing files.
  • R: files that were renamed.
  • C: files that were copied.
  • D: files that were deleted.

So, these numbers are going to more accurately represent which parts of our delta can conflict with the changes coming from upstream when we rebase or merge.

Tracking the modifying delta, and reorganizing the project to reduce it, is usually a good strategy for reducing maintenance costs.

diffstat #

An issue with the Git diff stats is that they represent modified lines as a block of removed lines plus a block of added lines.

Fortunately, we can use another tool: diffstat will make a best effort to identify which lines are actually modified. It can be easily installed in your distribution of choice (e.g. the diffstat package in Debian/Ubuntu/Red Hat).

This behavior is enabled with the parameter -m:

git diff ...parameters... | diffstat -m

This is the kind of output that is generated. On top of the typical + and - we see the ! for the lines that have been detected to be modified.

$ git show | diffstat -m
paint/timing/container_timing.cc | 5 ++++!
paint/timing/container_timing.h | 1 +
timing/performance_container_timing.cc | 20 ++++++++++++++++++!!
timing/performance_container_timing.h | 5 +++++
timing/performance_container_timing.idl | 1 +
timing/window_performance.cc | 4 ++!!
timing/window_performance.h | 1 +
7 files changed, 32 insertions(+), 5 modifications(!)

Coloring is also available, with the parameter -C.

Using diffstat gives a more accurate insight of both the total delta and the modifying delta.

Tracking deltas #

Now that we have the tools to produce these numbers, we can track them over time to know whether our downstream is growing or shrinking.

That can also be used to measure the impact of different strategies or changes on the downstream maintenance complexity.

Other metric ideas #

But deltas are not the only tool to measure the complexity, especially regarding the effort of maintaining a downstream.

I can enumerate just a few ideas that provide insight into different problems:

  • Frequency of rebase/merge conflicts per path.
  • Frequency of undetected build issues.
  • Frequency and complexity of the regressions, weighed by the size of the patches fixing them.

Relevant changes for tracking a downstream #

Let’s focus now on other factors, not always easy to measure, that matter when we maintain a downstream project.

What we build on top of Chromium #

The complexity of a downstream, especially the part measured by the regular delta, is heavily impacted by what is built on top of Chromium.

A full web browser is usually bigger, because it includes the required user experience and many components that make up what we nowadays consider a browser: history, bookmarks, user profiles, secrets management…

An application runtime for hybrid applications may just have minimal wrappers for integrating a web view, but then maybe a complex set of components for easing the integration with a native toolkit or a specific programming language.

How much do you build on top of Chrome?

  • Browsers are usually bigger pieces than runtimes.
  • Hybrid application runtimes may have a big part related to toolkit or other components.

What we depend on #

For maintenance complexity, as important as what we build on top is the set of boundaries and dependencies:

  • How many upstream components are we using?
  • What kind of APIs are provided?
  • Are they stable or changing often?

These questions are especially relevant, as Chromium does not really provide any guarantee about the stability, or even availability, of existing components.

That said, some layers provided by Chromium change less often than others. Some examples:

  • The Content API provides the basics of the web platform and the Chromium process model, so it is quite useful for hybrid application runtimes. It has been changing over the last years, in part because of the Onion Soup refactorings (https://mariospr.org/category/blink-onion-soup/). Still, there are always examples of how to adapt to those changes in //content/shell and //chrome.
  • Chromium provides a number of reusable components at //components that may be useful for different downstreams.
  • Then, for building a full web browser, it may be tempting to directly use //chrome and modify it for the specific downstream user experience. This means a higher modifying delta. But, as the upstream Chrome browser UI also changes often and heavily, the frequency of conflicts increases as well.

Wrapping up #

In this post I reviewed different ways to measure the downstream size, and how what we build impacts the complexity of maintenance.

Understanding and tracking our downstream allows us to implement strategies to keep things under control. It also allows us to better understand the cost of a specific feature or an implementation approach.

In the next post in this series, I will write about how the upstream Chromium community helps the downstreams.

February 04, 2025 12:00 AM

February 03, 2025

Igalia WebKit Team

WebKit Igalia Periodical #12

Update on what happened in WebKit in the week from January 27 to February 3.

Cross-Port 🐱

The documentation now has a section on how to use the Web Inspector remotely. This makes information on this topic easier to find, as it was previously scattered around a few different locations.

Jamie continues her Coding Experience work around bringing WebExtensions to the WebKitGTK port. A good part of this involves porting functionality from Objective-C, which only the Apple WebKit ports would use, into C++ code that all ports may use. The latest in this saga was WebExtensionStorageSQLiteStore.

Web Platform 🌐

The experimental support for Invoker Commands has been updated to match latest spec changes.

WPE and WebKitGTK now have support for the Cookie Store API.

Implemented experimental support for the CloseWatcher API.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

The GStreamer WebRTC backend can now recycle inactive senders and support for inactive receivers was also improved. With these changes, support for screen sharing over WebRTC is now more reliable.

On the playback front, a bug in the silent video automatic pause optimization was fixed, the root cause for certain VP9 videos sometimes appearing as empty was found to be in GStreamer, and there is ongoing effort to solve racy crashes when flushing MSE streams.

WebKitGTK 🖥️

Support for WebDriver BiDi has been enabled in WebKitGTK as an experimental feature.

That’s all for this week!

by Unknown at February 03, 2025 10:33 PM

January 30, 2025

Vivienne Watermeier

Debugging GStreamer applications using Nushell

Nushell is a new shell (get it?) in development since 2019. Where other shells like bash and zsh treat all data as raw text, nu instead provides a type system for all data flowing through its pipelines, with many commands inspired by functional languages to manipulate that data. The examples on their homepage and in the README.md demonstrate this well, and I recommend taking a quick look if you’re not familiar with the language.

I have been getting familiar with Nu for a few months now, and found it a lot more approachable and user-friendly than traditional shells, and particularly helpful for exploring logs.

Where to find documentation #

I won’t go over all the commands I use in detail, so if anything is ever unclear, have a look at the Command Reference. The most relevant categories for our use case are probably Strings and Filters. From inside nushell, you can also use help some_cmd or some_cmd --help, or help commands for a full table of commands that can be manipulated and searched like any other table in nu. And for debugging a pipeline, describe is a very useful command that describes the type of its input.

Set-up for analyzing GStreamer logs #

First of all, we need some custom commands to parse the raw logs into a nu table. Luckily, nushell provides a parse command for exactly this use case, and we can define this regex to use with it:

let gst_regex = ([
  '(?<time>[0-9.:]+) +'
  '(?<pid>\w+) +'
  '(?<thread>\w+) +'
  '(?<level>\w+) +'
  '(?<category>[\w-]+) +'
  '(?<file>[\w.-]+)?:'
  '(?<line>\w+):'
  '(?<function>[\w()~-]+)?:'
  '(?<object><[^>]*>)? +'
  '(?<msg>.*)$'
] | str join)

(I use a simple pipeline here to split the string over multiple lines for better readability, it just concatenates the list elements.)

Let’s run a simple pipeline to get some logs to play around with:
GST_DEBUG=*:DEBUG GST_DEBUG_FILE=sample.log gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink

For parsing the file, we need to be careful to remove any ansi escapes, and split the input into lines. On top of that, we will also store the result to a variable for ease of use:
let gst_log = open sample.log | ansi strip | lines | parse --regex $gst_regex

You can also define a custom command for this, which would look something like:

def "from gst logs" []: string -> table {
  $in | ansi strip | lines | parse --regex ([
    '(?<time>[0-9.:]+) +'
    '(?<pid>\w+) +'
    '(?<thread>\w+) +'
    '(?<level>\w+) +'
    '(?<category>[\w-]+) +'
    '(?<file>[\w.-]+)?:'
    '(?<line>\w+):'
    '(?<function>[\w()~-]+)?:'
    '(?<object><[^>]*>)? +'
    '(?<msg>.*)$'
  ] | str join)
}

Define it directly on the command line, or place it in your configuration files. Either way, use the command like this:
let gst_log = open sample.log | from gst logs

Some basic commands for working with the parsed data #

If you take a look at a few lines of the table, it should look something like this:
$gst_log | skip 10 | take 10

╭────┬────────────────────┬───────┬────────────┬────────┬─────────────────────┬────────────────┬───────┬──────────────────────────────┬─────────────┬───────────────────────────────────────────────╮
│  # │        time        │  pid  │   thread   │ level  │      category       │      file      │ line  │           function           │   object    │                      msg                      │
├────┼────────────────────┼───────┼────────────┼────────┼─────────────────────┼────────────────┼───────┼──────────────────────────────┼─────────────┼───────────────────────────────────────────────┤
│  0 │ 0:00:00.003607288  │ 5161  │ 0x1ceba80  │ DEBUG  │ GST_ELEMENT_PADS    │ gstelement.c   │ 315   │ gst_element_base_class_init  │             │ type GstBin : factory (nil)                   │
│  1 │ 0:00:00.003927025  │ 5161  │ 0x1ceba80  │ INFO   │ GST_INIT            │ gstcontext.c   │ 86    │ _priv_gst_context_initialize │             │ init contexts                                 │
│  2 │ 0:00:00.004117399  │ 5161  │ 0x1ceba80  │ INFO   │ GST_PLUGIN_LOADING  │ gstplugin.c    │ 328   │ _priv_gst_plugin_initialize  │             │ registering 0 static plugins                  │
│  3 │ 0:00:00.004164980  │ 5161  │ 0x1ceba80  │ DEBUG  │ GST_REGISTRY        │ gstregistry.c  │ 592   │ gst_registry_add_feature     │ <registry0> │ adding feature 0x1d08c70 (bin)                │
│  4 │ 0:00:00.004176720  │ 5161  │ 0x1ceba80  │ DEBUG  │ GST_REFCOUNTING     │ gstobject.c    │ 710   │ gst_object_set_parent        │ <bin>       │ set parent (ref and sink)                     │
│  5 │ 0:00:00.004197201  │ 5161  │ 0x1ceba80  │ DEBUG  │ GST_ELEMENT_PADS    │ gstelement.c   │ 315   │ gst_element_base_class_init  │             │ type GstPipeline : factory 0x1d09310          │
│  6 │ 0:00:00.004243022  │ 5161  │ 0x1ceba80  │ DEBUG  │ GST_REGISTRY        │ gstregistry.c  │ 592   │ gst_registry_add_feature     │ <registry0> │ adding feature 0x1d09310 (pipeline)           │
│  7 │ 0:00:00.004254252  │ 5161  │ 0x1ceba80  │ DEBUG  │ GST_REFCOUNTING     │ gstobject.c    │ 710   │ gst_object_set_parent        │ <pipeline>  │ set parent (ref and sink)                     │
│  8 │ 0:00:00.004265272  │ 5161  │ 0x1ceba80  │ INFO   │ GST_PLUGIN_LOADING  │ gstplugin.c    │ 236   │ gst_plugin_register_static   │             │ registered static plugin "staticelements"     │
│  9 │ 0:00:00.004276813  │ 5161  │ 0x1ceba80  │ DEBUG  │ GST_REGISTRY        │ gstregistry.c  │ 476   │ gst_registry_add_plugin      │ <registry0> │ adding plugin 0x1d084d0 for filename "(NULL)" │
╰────┴────────────────────┴───────┴────────────┴────────┴─────────────────────┴────────────────┴───────┴──────────────────────────────┴─────────────┴───────────────────────────────────────────────╯

skip and take do exactly what they say on the tin: skip removes the first N rows, and take keeps only the first N rows. I use them here to keep the examples short.


To ignore columns, use reject:
$gst_log | skip 10 | take 5 | reject time pid thread

╭───┬───────┬────────────────────┬───────────────┬──────┬──────────────────────────────┬─────────────┬────────────────────────────────╮
│ # │ level │      category      │     file      │ line │           function           │   object    │              msg               │
├───┼───────┼────────────────────┼───────────────┼──────┼──────────────────────────────┼─────────────┼────────────────────────────────┤
│ 0 │ DEBUG │ GST_ELEMENT_PADS   │ gstelement.c  │ 315  │ gst_element_base_class_init  │             │ type GstBin : factory (nil)    │
│ 1 │ INFO  │ GST_INIT           │ gstcontext.c  │ 86   │ _priv_gst_context_initialize │             │ init contexts                  │
│ 2 │ INFO  │ GST_PLUGIN_LOADING │ gstplugin.c   │ 328  │ _priv_gst_plugin_initialize  │             │ registering 0 static plugins   │
│ 3 │ DEBUG │ GST_REGISTRY       │ gstregistry.c │ 592  │ gst_registry_add_feature     │ <registry0> │ adding feature 0x1d08c70 (bin) │
│ 4 │ DEBUG │ GST_REFCOUNTING    │ gstobject.c   │ 710  │ gst_object_set_parent        │ <bin>       │ set parent (ref and sink)      │
╰───┴───────┴────────────────────┴───────────────┴──────┴──────────────────────────────┴─────────────┴────────────────────────────────╯

Or its counterpart, select, which is also useful for reordering columns:
$gst_log | skip 10 | take 5 | select msg category level

╭───┬────────────────────────────────┬────────────────────┬───────╮
│ # │              msg               │      category      │ level │
├───┼────────────────────────────────┼────────────────────┼───────┤
│ 0 │ type GstBin : factory (nil)    │ GST_ELEMENT_PADS   │ DEBUG │
│ 1 │ init contexts                  │ GST_INIT           │ INFO  │
│ 2 │ registering 0 static plugins   │ GST_PLUGIN_LOADING │ INFO  │
│ 3 │ adding feature 0x1d08c70 (bin) │ GST_REGISTRY       │ DEBUG │
│ 4 │ set parent (ref and sink)      │ GST_REFCOUNTING    │ DEBUG │
╰───┴────────────────────────────────┴────────────────────┴───────╯

Meanwhile, get returns a single column as a list, which can for example be used with uniq to get a list of all objects in the log:
$gst_log | get object | uniq | take 5

╭───┬──────────────╮
│ 0 │              │
│ 1 │ <registry0>  │
│ 2 │ <bin>        │
│ 3 │ <pipeline>   │
│ 4 │ <capsfilter> │
╰───┴──────────────╯

Filtering rows by different criteria works really well with where.
$gst_log | where thread in ['0x7f467c000b90' '0x232fefa0'] and category == GST_STATES | take 5

╭────┬────────────────────┬───────┬─────────────────┬────────┬─────────────┬──────────┬──────┬───────────────────────┬──────────────────┬───────────────────────────────────────────────────────────╮
│  # │        time        │  pid  │     thread      │ level  │  category   │   file   │ line │       function        │      object      │                            msg                            │
├────┼────────────────────┼───────┼─────────────────┼────────┼─────────────┼──────────┼──────┼───────────────────────┼──────────────────┼───────────────────────────────────────────────────────────┤
│  0 │ 0:00:01.318390245  │ 5158  │ 0x7f467c000b90  │ DEBUG  │ GST_STATES  │ gstbin.c │ 1957 │ bin_element_is_sink   │ <autovideosink0> │ child autovideosink0-actual-sink-xvimage is sink          │
│  1 │ 0:00:01.318523898  │ 5158  │ 0x7f467c000b90  │ DEBUG  │ GST_STATES  │ gstbin.c │ 1957 │ bin_element_is_sink   │ <pipeline0>      │ child autovideosink0 is sink                              │
│  2 │ 0:00:01.318558109  │ 5158  │ 0x7f467c000b90  │ DEBUG  │ GST_STATES  │ gstbin.c │ 1957 │ bin_element_is_sink   │ <pipeline0>      │ child videoconvert0 is not sink                           │
│  3 │ 0:00:01.318569169  │ 5158  │ 0x7f467c000b90  │ DEBUG  │ GST_STATES  │ gstbin.c │ 1957 │ bin_element_is_sink   │ <pipeline0>      │ child videotestsrc0 is not sink                           │
│  4 │ 0:00:01.338298058  │ 5158  │ 0x7f467c000b90  │ INFO   │ GST_STATES  │ gstbin.c │ 3408 │ bin_handle_async_done │ <autovideosink0> │ committing state from READY to PAUSED, old pending PAUSED │
╰────┴────────────────────┴───────┴─────────────────┴────────┴─────────────┴──────────┴──────┴───────────────────────┴──────────────────┴───────────────────────────────────────────────────────────╯

It provides special shorthands called row conditions - have a look at the reference for more examples.
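
For instance, to keep only the more severe entries, a row condition can filter on the level column directly; a minimal example (WARN and ERROR being standard GStreamer debug levels):
$gst_log | where level in ['WARN' 'ERROR'] | select time category object msg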


Of course, get and where can also be combined:
$gst_log | get category | uniq | where $it starts-with GST | take 5

╭───┬────────────────────╮
│ 0 │ GST_REGISTRY       │
│ 1 │ GST_INIT           │
│ 2 │ GST_MEMORY         │
│ 3 │ GST_ELEMENT_PADS   │
│ 4 │ GST_PLUGIN_LOADING │
╰───┴────────────────────╯

And if you need to merge multiple logs, I recommend using sort-by time. This could look like
let gst_log = (open sample.log) + (open other.log) | from gst logs | sort-by time

Interactively exploring logs #

While there are many other useful commands, there is one more command I find incredibly useful: explore. It is essentially the nushell equivalent to less, and while it is still quite rough around the edges, I’ve been using it all the time, mostly for its interactive REPL.

First, just pipe the parsed log into explore:
$gst_log | explore

Now, using the :try command opens its REPL. Enter any pipeline at the top, and you will be able to explore its output below:

[Screenshot: example of using the explore command and its REPL. The top of the window shows the current command, with the resulting data underneath as a table.]

Switch between the command line and the pager using Tab, and while focused on the pager, search forwards or backwards using / and ?, or enter :help for explanations. Also have a look at the documentation on explore in the Nushell Book.

January 30, 2025 12:00 AM

January 29, 2025

Max Ihlenfeldt

Manually triggering Swarming tasks

Let's take a closer look at the different parts working together in the background to make Swarming work!

January 29, 2025 12:00 AM

January 27, 2025

Igalia WebKit Team

WebKit Igalia Periodical #11

Update on what happened in WebKit in the week from January 20 to January 27.

Cross-Port 🐱

GLib 2.70 will be required starting with the upcoming 2.48 stable releases. This made it possible to remove some code that is no longer needed.

Fixed unlimited memory consumption when playing regular video while using the Web Inspector.

Sped up reading of large messages sent by the Web Inspector.

Web Platform 🌐

Implemented support for dialog.requestClose().

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Fixed the assertion error "pipeline and player states are not synchronized" related to muted video playback in the presence of scroll. Work is ongoing regarding other bugs reproduced with the same video, some of them related to scroll and some likely independent.

Fixed lost initial audio samples played using WebAudio on 32-bit Raspberry Pi devices, by preventing the OpenMAX subsystem from entering standby mode.

Graphics 🖼️

Landed a change that fixes damage propagation of 3D-transformed layers.

Fixed a regression when visiting any web page that makes use of accelerated ImageBuffers (e.g. canvas) while CPU rendering is used. We were unconditionally creating OpenGL fences, even in CPU rendering mode, and tried to wait for completion in a worker thread that had no OpenGL context (due to CPU rendering). This is an illegal operation in EGL, and it fired an assertion, crashing the WebProcess.

Releases 📦️

Despite the work on the WPE Platform API, we continue to maintain the “classic” stack based on libwpe. Thus, we have released libwpe 1.16.1 with the small—but important—addition of support for representing analog button inputs for devices capable of reporting varying amounts of pressure.

That’s all for this week!

by Unknown at January 27, 2025 05:56 PM

January 21, 2025

Qiuyi Zhang (Joyee)

Executable loading and startup performance on macOS

Recently, I fixed a macOS-specific startup performance regression in Node.js after an extensive investigation. Along the way, I learned a lot about tools for macOS and Node

January 21, 2025 12:22 AM

January 20, 2025

Igalia WebKit Team

WebKit Igalia Periodical #10

Update on what happened in WebKit in the week from January 13 to January 20.

Cross-Port 🐱

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

The JavaScriptCore GLib API has gained support for creating Promise objects. This allows integrating asynchronous functionality more ergonomically when interfacing between native code and JavaScript.

Graphics 🖼️

Elements with outlines inside scrolling containers now render their outlines properly.

Landed a change that adds multiple fixes to the damage propagation functionality in scenarios such as:

  • Layers with custom transforms.

  • Pages with custom viewport scale.

  • Dynamic layer size changes.

  • Scrollbar layers.

Landed a change that improves damage propagation in terms of animations handling.

Landed a change that prevents any kind of damage propagation when the feature is disabled at runtime using its corresponding flag. Before that, even though the functionality was runtime-disabled, some memory was still used and unneeded calculations were still being done.

WPE WebKit 📟

WPE Platform API 🧩

New, modern platform API that supersedes usage of libwpe and WPE backends.

Drag gesture threshold, and key repeat delay/interval are now handled through the WPESettings API instead of using hardcoded values. While defaults typically work well, being able to tweak them for certain setups without rebuilding WPE is a welcome addition.

Sylvia has also improved the WPE Platform DRM/KMS backend to pick the default output device scaling factor using WPESettings.

Infrastructure 🏗️

libsoup has been added to Google's OSS-Fuzz program to help find security issues.

That’s all for this week!

by Unknown at January 20, 2025 10:32 PM

André Almeida

Linux 6.13, I WANT A GUITAR PEDAL

Just as 2025 is starting, we got a new Linux release in mid January, tagged as 6.13. In the spirit of holidays, Linus Torvalds even announced during 6.13-rc6 that he would be building and raffling a guitar pedal for a random kernel developer!

As usual, this release comes with a pack of exciting news done by the kernel community:

  • This release has two important improvements for task scheduling: lazy preemption and proxy execution. The goal with lazy preemption is to find a better balance between throughput and response time. A secondary goal is being able to make it the preferred non-realtime scheduling policy for most cases. Tasks that really need a reschedule in a hurry will use the older TIF_NEED_RESCHED flag. Preliminary work for proxy execution was merged, which will let us avoid priority-inversion scenarios when using real-time tasks with deadline scheduling, for use cases such as Android.

  • New important Rust abstractions arrived, such as VFS data structures and interfaces, and also abstractions for misc devices.

  • Lightweight guard pages: guard pages are used to raise a fatal signal when accessed. This feature had the drawback of having a heavy performance impact, but in this new release the flag MADV_GUARD_INSTALL was added for the madvise() syscall, offering a lightweight way to guard pages.

To know more about the community improvements, check out the summary made by Kernel Newbies.

Now let’s highlight the contributions made by Igalians for this release.

Case-insensitive support for tmpfs

Case sensitivity has been a traditional difference between Linux distros and MS Windows, with the most popular filesystems being on opposite sides: while ext4 is case sensitive, NTFS is case insensitive. This difference proved to be challenging when Windows apps, mainly games, started to be a common use case for Linux distros (thanks to Wine!). For instance, games running through Steam’s Proton would expect that the paths assets/player.png and assets/PLAYER.PNG point to the same file, but this is not the case in ext4. To avoid doing workarounds in userspace, ext4 has had support for casefolding since Linux 5.2.

Now, tmpfs joins the group of filesystems with case-insensitive support. This is particularly useful for running games inside containers, like the combination of Wine + Flatpak. In such scenarios, the container shares a subset of the host filesystem with the application, mounting it using tmpfs. To keep the filesystem consistent, with the same expectations of the host filesystem about the mounted one, if the host filesystem is case-insensitive we can do the same thing for the container filesystem too. You can read more about the use case in the patchset cover letter.

While the container frameworks work on implementing proper support for this feature, you can already play with it and try it yourself:

$ mount -t tmpfs -o casefold fs_name /mytmpfs
$ cd /mytmpfs # case-sensitive by default, we still need to enable it
$ mkdir a
$ touch a; touch A
$ ls
A  a
$ mkdir B; cd b
cd: The directory 'b' does not exist
$ # now let's create a case-insensitive dir
$ mkdir case_dir
$ chattr +F case_dir
$ cd case_dir
$ touch a; touch A
$ ls
a
$ mkdir B; cd b
$ pwd
/home/user/mytmpfs/case_dir/B

V3D Super Pages support

As part of Igalia’s effort for enhancing the graphics stack for Raspberry Pi, the V3D DRM driver now has support for Super Pages, improving performance and making memory usage more efficient for Raspberry Pi 4 and 5. Using Linux 6.13, the driver will enable the MMU to allocate not only the default 4KB pages, but also 64KB “Big Pages” and 1MB “Super Pages”.

To measure the difference that Super Pages made to the performance, a series of benchmarks were used, and the highlights are:

  • +8.36% of FPS boost for Warzone 2100 in RPi4
  • +3.62% of FPS boost for Quake 2 in RPi5
  • 10% time reduction for the Mesa CI job v3dv-rpi5-vk-full:arm64
  • Aether SX2 emulator is more fluid to play

You can read a detailed post about this, with all benchmark results, in Maíra’s blog post, including a super cool PlayStation 2 emulation showcase!

New transparent_hugepage_shmem= command-line parameter

Igalia contributed new kernel command-line parameters to improve the configuration of multi-size Transparent Huge Pages (mTHP) for shmem. These parameters, transparent_hugepage_shmem= and thp_shmem=, enable more flexible and fine-grained control over the allocation of huge pages when using shmem.

The transparent_hugepage_shmem= parameter allows users to set a global default huge page allocation policy for the internal shmem mount. This is particularly valuable for DRM GPU drivers. Just like CPU architectures, GPUs can also take advantage of huge pages, but this is possible only if DRM GEM objects are backed by huge pages.

Since GEM uses shmem to allocate anonymous pageable memory, having control over the default huge page allocation policy allows for the exploration of huge pages use on GPUs that rely on GEM objects backed by shmem.

In addition, the thp_shmem= parameter provides fine-grained control over the default huge page allocation policy for specific huge page sizes.

By configuring page sizes and policies of huge-page allocations for the internal shmem mount, these changes complement the V3D Super Pages feature, as we can now tailor the size of the huge pages to the needs of our GPUs.
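
As a rough illustration only (the policy value below is an assumption based on the values accepted by the existing shmem_enabled sysfs knob; the authoritative syntax for both parameters is in the kernel's Transparent Hugepage admin-guide documentation), a boot line selecting size-fitting shmem huge pages could look like:

transparent_hugepage_shmem=within_size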

DRM and AMDGPU improvements

As usual in Linux releases, this one collects a list of improvements made by our team in DRM and AMDGPU driver from the last cycle.

Cosmic (the desktop environment behind Pop! OS) users discovered some bugs in the AMD display driver regarding the handling of overlay planes. These issues were pre-existing and came to light with the introduction of cursor overlay mode. They were causing page faults and divide errors. We debugged the issue together with reporters and proposed a set of solutions that were ultimately accepted by AMD developers in time for this release.

In addition, we worked with AMD developers to migrate the driver-specific handling of EDID data to the DRM common code, using drm_edid opaque objects to avoid handling raw EDID data. The first phase was incorporated and allowed the inclusion of new functionality to get EDID from ACPI. However, some dependencies between the AMD Linux-dependent and OS-agnostic components were left to be resolved in future iterations. This means that the next steps will focus on removing the legacy way of handling this data.

Also in the AMD driver, we fixed an out-of-bounds memory write, fixed a warning on a boot regression, and exposed special GPU memory pools via the common DRM fdinfo framework.

In the DRM scheduler code, we added some missing locking, removed a couple of re-lock cycles for slightly reduced command submission overheads and clarified the internal documentation.

In the common dma-fence code, we fixed one memory leak on the failure path and one significant runtime memory leak caused by incorrect merging of fences. The latter was found by the community and was manifesting itself as a system out of memory condition after a few hours of gameplay.

sched_ext

sched_ext landed in kernel 6.12 to enable the efficient development of BPF-based custom schedulers. During the 6.13 development cycle, the sched_ext community has made efforts to harden the code to make it more reliable, and to clean up the BPF APIs and documentation for clarity.

Igalia has contributed to hardening the sched_ext core code. We fixed the incorrect use of the scheduler run queue lock, especially during initialization and finalization of the BPF scheduler. Also, we fixed missing RCU lock protections when the sched_ext core selects a proper CPU for a task. Without these fixes, the sched_ext core could, in the worst case, crash or raise a kernel oops message.

Other Contributions & Fixes

syzkaller, a kernel fuzzer, has been an important instrument to find kernel bugs. With the help of KASAN, a memory error detector, and syzbot, numerous such bugs have been reported and fixed.

Igalians have contributed to such fixes around a lot of subsystems (like media, network, etc), helping reduce the number of open bugs.

Check the complete list of Igalia’s contributions for the 6.13 release

Authored (70)

André Almeida

Changwoo Min

Christian Gmeiner

Guilherme G. Piccoli

Maíra Canal

Melissa Wen

Thadeu Lima de Souza Cascardo

Tvrtko Ursulin

Reviewed (41)

André Almeida

Christian Gmeiner

Iago Toral Quiroga

Jose Maria Casanova Crespo

Juan A. Suarez

Maíra Canal

Tvrtko Ursulin

Tested (1)

Christian Gmeiner

Acked (5)

Changwoo Min

Maíra Canal

Maintainer SoB (6)

Maíra Canal

January 20, 2025 12:00 AM

January 15, 2025

Ziran Sun

Igalia CSR – 2024 in Review

2024 was another busy year for Igalia CSR. In the past 12 months, Igalia has been continuing the traditional effort on the Non-Governmental Organizations (NGOs), Reforestation, and Social Investment projects. We added a new NGO to the list and started a couple of new Social Investment projects. The CSR commission has also been looking at creating guidance on how to create and organize a cooperative based on our experience and exploring new communication channels. And we are excited about our first CSR podcast!

First CSR Podcast

In July 2024 Igalia published the first CSR podcast, thanks to Paulo Matos, Eric Meyer, and Brian Kardell!

The podcast discusses Igalia’s flat structure and why we believe that CSR is interesting for Igalia. It also covers Igalia’s approach and perspective on our social responsibilities, the projects we have, Igalia’s approach and conscience, the impact of CSR, and our vision for the future.

If interested, check out Igalia Chats: Social Responsibility At Igalia.

0.7% and NGOs

Since 2007 Igalia has been donating 0.7% of our income annually to a list of NGOs proposed by the Igalians. Working with these partners, Igalia continued the effort in a wide range of areas including development aid and humanitarian action, health, functional disabilities, ecology and animal welfare, transparency, and information, etc.

These organizations reported regularly to the commission on finance, progress, and outcomes of the dedicated projects. Most projects have been progressing nicely and steadily in 2024. Here we’d like to talk about a couple of new NGO projects we recently added.

Degen Foundation

The Degen Foundation is a small private foundation, based in A Coruña that has been working for more than ten years on neurodegenerative diseases. The Foundation was born as Foundation “Curemos el Parkinson” in 2015 when its founder and president, Alberto Amil, was diagnosed with a particularly severe and complex version of Parkinson’s Disease.

Igalia started its collaboration with the Degen Foundation in 2023, mainly engaged in the development of the first phase of the Degen Community platform, a virtual meeting and emotional support point for patients. Studies consistently show that emotional support is as crucial as clinical support for neurodegenerative disease patients. The Degen Community platform aims to provide emotional support via a pack of tools/apps. The platform will also act as an information portal publishing relevant and up-to-date information for patients and carers. The platform has been under design, and volunteers have been sourced to collaborate on content, etc. The organization plans to launch the platform in 2025.

Heyva

In 2024, we introduced a new NGO, Heyva Sor a Kurdistanê, to Igalia’s NGO list. Heyva Sor a Kurdistanê is a humanitarian aid organization established to assist people in the harsh conditions of the ongoing war in Kurdistan. The organization conducts relief efforts on fundamental needs such as food, health, shelter, and education. They have been providing continuous assistance and promoting solidarity, sacrifice, and mutual support in society since 1993. The organization has become a beacon of hope for the population in Kurdistan.

Emergency Project – Floods in Valencia

Storm DANA, which hit the Valencian territory in October 2024, has had a particular impact on Horta Sud, a region that has been devastated by the catastrophe.

The CSR Commission responded quickly to this emergency incident. After collecting the votes from Igalians, the commission decided to allocate the remaining undistributed NGO donation budget to aid Horta Sud in rebuilding their community. The first donation was made via Fundació Horta Sud and the second contribution via Cerai. Both Fundació Horta Sud and Cerai are local organizations working in the affected area and they were proposed by our colleague Jordi Mallach. We also bought a nice drawing by Mariscal, a well-known Valencian artist.

Social Investments

This year we started two new social investments: Extension of the Yoff Library project and Biomans Project. Meanwhile, after preparation was completed in 2023, UNICEF’s Casitas Infantiles project started on time.

Casitas Infantiles (Children’s Small Houses in Cuba)

In Cuba, state educational centers only care for around 19% of children between 1 and 6 years old. Casitas Infantiles was proposed by UNICEF to Igalia to help provide children with “Children’s Small Houses”, a concept of using adapted premises in workplaces, companies, and cooperatives as shelters for children’s education. This solution has been applied over the years in several provinces. It has proved to work well and has been in high demand recently. After collecting feedback and thoughts from Igalians, the CSR commission decided to support this for a period of 24 months, targeting the setup of 28 small houses to accommodate 947 children.

The project started in March 2024. We received reports in June and December detailing the first 16 small houses selected, resource acquisition and distribution, and training activities carried out for 186 educational agents and 856 parents or childminders to raise awareness of positive methods of education and parenting. Workshops and training were also carried out to raise awareness of the opening and continuity of children’s houses in key sectors.

Extension of the Yoff Library Project

This is an extension of our Library in Yoff project.

This project progressed as planned. The construction work (phase 5) was completed. An on-site visit in June carried out the training action (phase 6) and the furniture and bibliography sourcing operations (phase 7). A follow-up on-site visit in November brought back some lovely videos showing how the library looks and works today, and the positive feedback from the locals.

The extension project was to support completing the library with a few final bits, including kitchen extension, school furniture renovation, and computer and network equipment. It’s great to see the impact the library has on the local community.

Biomans Project

Biomans is a circular economy project that focuses its activity on the sustainable use of residual wood for its conversion into wood biomass for heating. The goal of the project is to promote green and inclusive employment in rural Galicia for people at risk of social exclusion, mainly those with intellectual disabilities.

The AMICOS Association initiated the project and has acquired a plot of land as the site for a factory and training unit to develop the activity. Igalia’s donation will be used for the construction of the factory.

Reforestation

Igalia started the Reforestation project in 2019. Partnering with Galnus, the Reforestation project focuses on conserving and expanding native, old-growth forests to capture and store carbon emissions over the long term.

Check on our blog, Igalia on Reforestation, for the projects carried out in the past few years.

In 2024, Galnus proposed ROIS III to Igalia. ROIS III is an extension of the project we are running at the Rois community land. The additional area to work in this project is around 1 hectare, adjacent to the 4 hectares we have already been working on. This would mean that we are building a new native forest of over 5 hectares. Funding for this extension work was in place in November and we shall hear more about this in 2025.

The other proposal from Galnus in 2024 was A Coruña Urban Forest project.

The concept of the urban forest project is to create an urban forest in the surroundings of “Parque de Bens”. This project would become a model of public-private collaboration, encouraging the participation of other companies and public institutions in the development of environmental and social projects. It also incorporates a new model of green infrastructure, different from the usual parks and green areas, which require high maintenance and have low natural interest.

This is an exciting proposal. It’s different from our past and existing reforestation projects. After some discussions and feasibility studies, the commission decided to take a step forward and this proposal has now moved to the agreement handling stage.

Looking forward to 2025

With some exciting project proposals received from the Igalians for 2025, we are looking forward to another good year!

by zsun at January 15, 2025 11:00 AM

January 13, 2025

Igalia WebKit Team

WebKit Igalia Periodical #9

Update on what happened in WebKit in the week from December 31, 2024 to January 13, 2025.

Cross-Port 🐱

Web Platform 🌐

Landed a fix to the experimental Trusted Types implementation for certain event handler content attributes not being protected even though they are sinks.

Landed a fix to experimental Trusted Types implementation where the SVGScriptElement.className property was being protected even though it's not a sink.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Support for the H.264 “constrained-high” and “high” profiles was improved in the GStreamer WebRTC backend.

The GStreamer WebRTC backend now has basic support for network conditions simulation, that will be useful to improve error recovery and packet loss coping mechanisms.

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

JSC got a fix for a tricky garbage-collection issue.

Graphics 🖼️

Landed a change that enables testing the "damage propagation" functionality. This is a first step in a series of fixes and improvements that should stabilize that feature.

Damage propagation passes extra information that describes the viewport areas that have visually changed since the last frame across different graphics subsystems. This allows the WebKit compositor and the system compositor to reduce the amount of painting being done thus reducing usage of resources (CPU, GPU, and memory bus). This is especially helpful on constrained, embedded platforms.

WebKitGTK 🖥️

A patch landed to add metadata (title and creation/modification date) to PDF documents generated for printing.

The “suspended” toplevel state is now handled in GTK port to pause rendering when web views are fully obscured.

Jamie Murphy is doing a Coding Experience focused on adding support for WebExtensions. After porting a number of Objective-C classes to C++, to allow using them in all WebKit ports, she has recently made the code build on Linux, and started adding new public API to expose the functionality to GTK applications that embed web views. There is still plenty of work to do, but this is great progress nevertheless.

WPE WebKit 📟

Sylvia Li, who is also doing a Coding Experience, has updated WPEView so it will pick its default configuration values using the recently added WPESettings API.

That’s all for this week!

by Unknown at January 13, 2025 01:36 PM

Andy Wingo

an annoying failure mode of copying nurseries

I just found a funny failure mode in the Whippet garbage collector and thought readers might be amused.

Say you have a semi-space nursery and a semi-space old generation. Both are block-structured. You are allocating live data, say, a long linked list. Allocation fills the nursery, which triggers a minor GC, which decides to keep everything in the nursery another round, because that’s policy: Whippet gives new objects another cycle in which to potentially become unreachable.

This causes a funny situation!

Consider that the first minor GC doesn’t actually free anything. But, like, nothing: it’s impossible to allocate anything in the nursery after collection, so you run another minor GC, which promotes everything, and you’re back to the initial situation, wash rinse repeat. Copying generational GC is strictly a pessimization in this case, with the additional insult that it doesn’t preserve object allocation order.

Consider also that because copying collectors with block-structured heaps are unreliable, any one of your minor GCs might require more blocks after GC than before. Unlike in the case of a major GC in which this essentially indicates out-of-memory, either because of a mutator bug or because the user didn’t give the program enough heap, for minor GC this is just what we expect when allocating a long linked list.

Therefore we either need to allow a minor GC to allocate fresh blocks – very annoying, and we have to give them back at some point to prevent the nursery from growing over time – or we need to maintain some kind of margin, corresponding to the maximum amount of fragmentation. Or, or, we allow evacuation to fail in a minor GC, in which case we fall back to promotion.

Anyway, I am annoyed and amused and I thought others might share in one or the other of these feelings. Good day and happy hacking!

by Andy Wingo at January 13, 2025 09:59 AM

Igalia Compilers Team

Igalia's Compilers Team in 2024

2024 marked another year of exciting developments and accomplishments for Igalia's Compilers team packed with milestones, breakthroughs, and a fair share of long debugging sessions. From advancing JavaScript standards, improving LLVM RISC-V performance, to diving deep into Vulkan and FEX emulation, we did it all.

From shipping require(esm) in Node.js to porting LLVM’s libc to RISC-V, and enabling WebAssembly’s highest optimization tier in JavaScriptCore, last year was nothing short of transformative. So, grab a coffee (or your preferred debugging beverage), and let’s take a look back at the milestones, challenges, and just plain cool stuff we've been up to last year.

JavaScript Standards #

We secured a few significant wins last year when it comes to JavaScript standards. First up, we got Import attributes (alongside JSON modules) to Stage 4. Import attributes allow customizing how modules are imported. For example, in all JavaScript environments you'll be able to natively import JSON files using

import myData from "./data" with { type: "json" };

Not far behind, the Intl.DurationFormat proposal also reached Stage 4. Intl.DurationFormat provides a built-in way to format durations (e.g., days, hours, minutes) in a locale-sensitive manner, enhancing internationalization support.
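
For a quick illustration (the locale and options here are just an example), formatting a duration looks roughly like this:

new Intl.DurationFormat("en", { style: "long" }).format({ hours: 1, minutes: 46, seconds: 40 });
// "1 hour, 46 minutes, 40 seconds"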

We also advanced ShadowRealm, the JavaScript API that allows you to execute code in a fresh and isolated environment, to Stage 2.7, making significant progress in resolving the questions about which web APIs should be included. We addressed open issues related to HTML integration and ensured comprehensive WPT coverage.

We didn't stop there though. We implemented MessageFormat 2.0 in ICU4C; you can read more about it in this blog post.

We also continued working on AsyncContext, an API that would let you persist state across awaits and other ways of running code asynchronously. The main blocker for Stage 2.7 is figuring out how it should interact with web APIs, and events in particular, and we have made a lot of progress in that area.
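
To give a feel for the API shape, here is a sketch based on the current proposal (it may still change before advancing; requestId and doSomething are illustrative names): a value stored in an AsyncContext.Variable stays visible after an await.

const requestId = new AsyncContext.Variable();
requestId.run("req-42", async () => {
  await doSomething();
  requestId.get(); // still "req-42" after the await
});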

Meanwhile, the source map specification got a major update, with the publication of ECMA-426. This revamped spec, developed alongside Bloomberg, brings much-needed precision and new features like ignoreList, all aimed at improving interoperability.

We also spent time finishing Temporal, the modern date and time API for JavaScript—responding to feedback, refining the API, and reducing binary size. After clearing those hurdles, we moved forward with Test262 coverage and WebKit implementation.

Speaking of Test262, our team continued our co-stewardship of this project that ensures compatibility between JavaScript implementations across browsers and runtimes, thanks to support from the Sovereign Tech Fund. We worked on tests for everything from resizable ArrayBuffers to deferred imports, keeping JavaScript tests both thorough and up to date. To boost Test262 coverage, we successfully ported the first batch of SpiderMonkey's non-262 test suite to Test262. This initiative resulted in the addition of approximately 1,600 new tests, helping to expand and strengthen the testing framework. We would like to thank Bloomberg for supporting this work.

The decimal proposal started the year in Stage 1 and remains so, but it has gone through a number of iterative refinements after being presented at the TC39 plenary.

It was a productive year, and we’re excited to keep pushing these and more proposals forward.

Node.js #

In 2024, we introduced several key enhancements in Node.js.

We kicked things off by adding initial support for CPPGC-based wrapper management, which helps make the C++/JS cross-heap references visible to the garbage collector, reduces the risk of memory leaks and use-after-frees, and improves garbage collection performance.

Node.js contains a significant amount of JavaScript internals, which are precompiled and preloaded into a custom V8 startup snapshot for faster startup. However, embedding these snapshots and code caches introduced reproducibility issues in Node.js executables. In 2024, we made the built-in snapshot and code cache reproducible, which is a major milestone in making the Node.js executables reproducible.

To help user applications start up faster, we also shipped support for on-disk compilation cache for user modules. Using this feature, TypeScript made their CLI start up ~2.5x faster, for example.

One of the most impactful pieces of work we did in 2024 was implementing and shipping require(esm), which is set to accelerate ECMAScript Modules (ESM) adoption in the Node.js ecosystem: package maintainers can now ship ESM directly without having to choose between setting up dual shipping or losing reach, and many frameworks/tools can load user code as ESM directly instead of doing hacky ESM -> CJS conversions, which tend to be bug-prone, or outright rejecting ESM code. Additionally, we landed module.registerHooks() to help the ecosystem migrate away from depending on CJS loader internals and to improve the state of ESM customization.
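
As a minimal sketch of what this unlocks (file names are illustrative), a CommonJS file can now load an ES module synchronously and receives its module namespace object:

// dep.mjs
export const answer = 42;

// main.cjs
const { answer } = require("./dep.mjs");
console.log(answer); // 42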

We also shipped a bunch of other smaller semver-minor features throughout 2024, such as support for embedded assets in single executable applications, crypto.hash() for more efficient one-off hashing, and v8.queryObjects() for memory leak investigation, to name a few.

Apart from project work, we also co-organized the Node.js collaboration summit in Bloomberg's London office, and worked on Node.js's Bluesky content automation for a more transparent and collaborative social media presence of the project.

You can learn more about the new module loading features from our talk at ViteConf Remote, and about require(esm) from our NodeConf EU talk.

JavaScriptCore #

In JavaScriptCore, we've ported BBQJIT, the first WebAssembly optimizing tier, to 32 bits. It should be a solid improvement over the previous fast-and-reasonably-performant tier (BBQ) for most workloads. The previous incarnation of this tier generated the low-level Air IR; BBQJIT generates machine code more directly, which means JSC can tier up to it faster.

We're also very close to enabling (likely this month) the highest optimizing tier (called "OMG") for WebAssembly on 32-bits. OMG generates code in the B3 IR, for which JSC implements many more optimizations. B3 then gets lowered to Air and finally to machine code. OMG can increase peak performance for many workloads, at the cost of more time spent on compilation. This has been a year-long effort by multiple people.

V8 #

In V8, we introduced a new entity called Isolate Groups to break the 4 GB limit of pointer compression. It should help V8 embedders like Node.js, Deno, and others allocate more isolates per process. We also supported multi-cage mode for the newly added sandbox feature of V8. You can read more about this in the blog post.

LLVM #

In LLVM's RISC-V backend, we added full scalable vectorization support for the BF16 vector extensions zvfbfmin and zvfbfwma. This means that code like the following C snippet:

void f(float * restrict dst, __bf16 * restrict a, __bf16 * restrict b, int n) {
  for (int i = 0; i < n; i++)
    dst[i] += ((float)a[i] * (float)b[i]);
}

Now gets efficiently vectorized into assembly like this:

	vsetvli	t4, zero, e16, m1, ta, ma
.LBB0_4:
	vl1re16.v	v8, (t3)
	vl1re16.v	v9, (t2)
	vl2re32.v	v10, (t1)
	vfwmaccbf16.vv	v10, v8, v9
	vs2r.v	v10, (t1)
	add	t3, t3, a4
	add	t2, t2, a4
	sub	t0, t0, a6
	add	t1, t1, a7
	bnez	t0, .LBB0_4

On top of that, we’ve made significant strides in overall performance last year. Here's a bar plot showing the improvements in performance from LLVM 17 last November to now.

Note: This accomplishment is the result of the combined efforts of many developers, including those at Igalia!

[Bar graph: RISC-V performance improvements from LLVM 17 to the current release]

We also ported most of LLVM’s libc to rv32 and rv64 in September (~91% of functions enabled). We presented the results at the 2024 LLVM Developers' Meeting; you can watch the video of the talk to learn more about this.

[Pie chart: share of LLVM libc functions enabled on rv32/rv64]

Shader compilation (Mesa IR3) and dynamic binary translation (FEX-Emu) #

  • Shader compilation: In shader compilation we've been busy improving the ir3 compiler backend for the freedreno/turnip drivers for Adreno GPUs in Mesa. Some of the highlights include:

  • Dynamic Binary Translation

    • In 2024, Igalia had the exciting opportunity to contribute to FEX (https://fex-emu.com/), marking our first year working on the project. Last year, our primary focus was improving the x87 FPU emulation. While we worked on several pull requests with targeted optimizations, we also took on a few larger tasks that made a significant impact:

      • Introducing a new x87 stack optimization pass was one of our major contributions. You can dive deeper into the details of it in the blog post and explore the work itself in the pull request.

      • Another key feature we added was explicit mode switching between MMX and x87 modes; details can be found in the pull request.

      • We also focused on SVE optimization for x87 load/store operations. The details of this work can be found in the pull request here.

As we look ahead, we are excited to continue driving the evolution of these technologies while collaborating with our amazing partners and communities.

January 13, 2025 12:00 AM

January 09, 2025

Andy Wingo

ephemerons vs generations in whippet

Happy new year, hackfolk! Today, a note about ephemerons. I thought I was done with them, but it seems they are not done with me. The question at hand is, how do we efficiently and correctly implement ephemerons in a generational collector? Whippet‘s answer turns out to be simple but subtle.

on oracles

The deal is, I want to be able to evaluate different collector constructions and configurations, and for that I need a performance oracle: a known point in performance space-time against which to compare the unknowns. For example, I want to know how a sticky mark-bit approach to generational collection does relative to the conventional state of the art. To do that, I need to build a conventional system to compare against! If I manage to do a good job building the conventional evacuating nursery, it will have similar performance characteristics as other nurseries in other unlike systems, and thus I can use it as a point of comparison, even to systems I haven’t personally run myself.

So I am adapting the parallel copying collector I described last July to have generational support: a copying (evacuating) young space and a copying old space. Ideally then I’ll be able to build a collector with a copying young space (nursery) but a mostly-marking nofl old space.

notes on a copying nursery

A copying nursery has different operational characteristics than a sticky-mark-bit nursery, in a few ways. One is that a sticky mark-bit nursery will promote all survivors at each minor collection, leaving the nursery empty when mutators restart. This has the pathology that objects allocated just before a minor GC aren’t given a chance to “die young”: a sticky-mark-bit GC over-promotes.

Contrast that to a copying nursery, which can decide to promote a survivor or leave it in the young generation. In Whippet the current strategy for the parallel-copying nursery I am working on is to keep freshly allocated objects around for another collection, and only promote them if they are live at the next collection. We can do this with a cheap per-block flag, set if the block has any survivors, which is the case if it was allocated into as part of evacuation during minor GC. This gives objects enough time to die young while not imposing much cost in the way of recording per-object ages.

Recall that during a GC, all inbound edges from outside the graph being traced must be part of the root set. For a minor collection where we just trace the nursery, that root set must include all old-to-new edges, which are maintained in a data structure called the remembered set. Whereas for a sticky-mark-bit collector the remembered set will be empty after each minor GC, for a copying collector this may not be the case. An existing old-to-new remembered edge may be unnecessary, because the target object was promoted; we will clear these old-to-old links at some point. (In practice this is done either in bulk during a major GC, or the next time the remembered set is visited during the root-tracing phase of a minor GC.) Or we could have a new-to-new edge which was not in the remembered set before, but now because the source of the edge was promoted, we must adjoin this old-to-new edge to the remembered set.

To preserve the invariant that all edges into the nursery are part of the roots, we have to pay special attention to this latter kind of edge: we could (should?) remove old-to-promoted edges from the remembered set, but we must add promoted-to-survivor edges. The field tracer has to have specific logic that applies to promoted objects during a minor GC to make the necessary remembered set mutations.

other object kinds

In Whippet, “small” objects (less than 8 kilobytes or so) are allocated into block-structured spaces, and large objects have their own space which is managed differently. Notably, large objects are never moved. There is generational support, but it is currently like the sticky-mark-bit approach: any survivor is promoted. Probably we should change this at some point, at least for collectors that don’t eagerly promote all objects during minor collections.

finalizers?

Finalizers keep their target objects alive until the finalizer is run, which effectively makes each finalizer part of the root set. Ideally we would have a separate finalizer table for young and old objects, but currently Whippet just has one table, which we always fully traverse at the end of a collection. This effectively adds the finalizer table to the remembered set. This is too much work—there is no need to visit finalizers for old objects in a minor GC—but it’s not incorrect.

ephemerons

So what about ephemerons? Recall that an ephemeron is an object E×K⇒V in which there is an edge from E to V if and only if both E and K are live. Implementing this conjunction is surprisingly gnarly; you really want to discover live ephemerons while tracing rather than maintaining a global registry as we do with finalizers. Whippet’s algorithm is derived from what SpiderMonkey does, but extended to be parallel.

The question is, how do we implement ephemeron-reachability while also preserving the invariant that all old-to-new edges are part of the remembered set?

For Whippet, the answer turns out to be simple: an ephemeron E is never older than its K or V, by construction, and we never promote E without also promoting (if necessary) K and V. (Ensuring this second property is somewhat delicate.) In this way you never have an old E and a young K or V, so no edge from an ephemeron need ever go into the remembered set. We still need to run the ephemeron tracing algorithm for any ephemerons discovered as part of a minor collection, but we don’t need to fiddle with the remembered set. Phew!

conclusion

As long as all promoted objects are older than all survivors, and all ephemerons are younger than the objects referred to by their key and value edges, Whippet’s parallel ephemeron tracing algorithm will efficiently and correctly trace ephemeron edges in a generational collector. This applies trivially for a sticky-mark-bit collector, which always promotes and has no survivors, but it also holds for a copying nursery that allows for survivors after a minor GC, as long as all survivors are younger than all promoted objects.

Until next time, happy hacking in 2025!

by Andy Wingo at January 09, 2025 10:15 AM

January 08, 2025

Eric Meyer

CSS Wish List 2025

Back in 2023, I belatedly jumped on the bandwagon of people posting their CSS wish lists for the coming year.  This year I’m doing all that again, less belatedly! (I didn’t do it last year because I couldn’t even.  Get it?)

I started this post by looking at what I wished for a couple of years ago, and a small handful of my wishes came true:

Note that by “came true”, I mean “reached at least Baseline Newly Available”, not “reached Baseline Universal”; that latter status comes over time.  And more :has() isn’t really a feature you can track, but I do see more people sharing cool :has() tricks and techniques these days, so I’ll take that as a positive signal.

A couple more of my 2023 wishes are on the cusp of coming true:

Those are both in the process of rolling out, and look set to reach Baseline Newly Available before the year is done.  I hope.

That leaves the other half of the 2023 list, none of which has seen much movement.  So those will be the basis of this year’s list, with some new additions.

Hanging punctuation

WebKit has been the sole implementor of this very nice typographic touch for almost a decade now.  The lack of any support by Blink and Gecko is now starting to verge on feeling faintly ridiculous.

Margin and line box trimming

Trim off the leading block margin on the first child in an element, or the trailing block margin of the last child, so they don’t stick out of the element and mess with margin collapsing.  Same thing with block margins on the first and last line boxes in an element.  And then, be able to do similar things with the inline margins of elements and line boxes!  All these things could be ours.

Stroked text

We can already fake text stroking with text-shadow and paint-order, at least in SVG.  I’d love to have a text-stroke property that can be applied to HTML, SVG, and MathML text.  And XML text and any text that CSS is able to style.  It should be at least as powerful as SVG stroking, if not more so.

Expanded attr() support

This has seen some movement specification-wise, but last I checked, no implementation promises or immediate plans.  Here’s what I want to be able to do:

td {width: attr(data-size em, 1px);}

<td data-size="5">…</td>

The latest Values and Units module describes this, so fingers crossed it starts to gain some momentum.

Exclusions

Yes, I still want CSS Exclusions, a lot.  They would make some layout hacks a lot less hacky, and open the door for really cool new hacks, by letting you just mark an element as creating a flow exclusion for the content of other elements.  Position an image across two columns of text and set it to exclude, and the text of those columns will flow around or past it like it was a float.  This remains one of the big missing pieces of CSS layout, in my view.  Linked flow regions is another.

Masonry layout

This one is a bit stalled because the basic approach still hasn’t been decided.  Is it part of CSS Grid or its own display type?  It’s a tough call.  There are persuasive arguments for both.  I myself keep flip-flopping on which one I prefer.

Designers want this.  Implementors want this.  In some ways, that’s what makes it so difficult to pick the final syntax and approach: because everyone wants this, everyone wants to make the exactly perfect right choices for now, for the future, and for ease of teaching new developers.  That’s very, very hard.

Grid track and gap styles

Yeah, I still want a Grid equivalent of column-rule, except more full-featured and powerful.  Ideally this would be combined with a way to select individual grid tracks, something like:

.gallery {display: grid;}
.gallery:col-track(4) {gap-rule: 2px solid red;}

…in order to just put a gap rule on that particular column.  I say that would be ideal because then I could push for a way to set the gap value for individual tracks, something like:

.gallery {gap: 1em 2em;}
.gallery:row-track(2) {gap: 2em 0.5em;}

…to change the leading and trailing gaps on just that row.

Custom media queries

This was listed as “Media query variables” in 2023.  With these, you could define a breakpoint set like so:

@custom-media --sm (inline-size <= 25rem);
@custom-media --md (25rem < inline-size <= 50rem);
@custom-media --lg (50rem < inline-size);

body {margin-inline: auto;}
@media (--sm) {body {inline-size: auto;}}
@media (--md) {body {inline-size: min(90vw, 40em);}}
@media (--lg) {body {inline-size: min(90vw, 55em);}}

In other words, you can use custom media queries as much as you want throughout your CSS, but change their definitions in just one place.  It’s CSS variables, but for media queries!  Let’s do it.

Unprefix all the things

Since we decided to abandon vendor prefixing in favor of feature flags, I want to see anything that’s still prefixed get unprefixed, in all browsers.  Keep the support for the prefixed versions, sure, I don’t care, just let us write the property and value names without the prefixes, please and thank you.

Grab bag

I still would like a way to indicate when a shorthand property is meant for logical rather than physical directions, a way to apply a style sheet to a single element, the ability to add or subtract values from a shorthand without having to rewrite the whole thing, and styles that cross resource boundaries.  They’re all in the 2023 post.

Okay, that’s my list.  What’s yours?


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at January 08, 2025 12:45 PM

January 02, 2025

Orko Garai

What is an Input Method Editor?

I’ve been working on chromium input method editor integration for linux wayland at Igalia over the past several months, and I thought I’d share some insights I’ve gained along the way and some highlights from my work.

This is the first in a series of blog posts about input method editors, or IME in short. Here I will try to explain what an IME really is at a high level before diving deeper into some of the technical details of IME support in linux and chromium in upcoming posts.

January 02, 2025 06:29 PM

December 30, 2024

Igalia WebKit Team

WebKit Igalia Periodical #8

Update on what happened in WebKit in the week from December 23 to December 30.

Community & Events 🤝

Published an article on CSS Anchor Positioning. It discusses the current status of the support across browsers, Igalia's contributions to WebKit's implementation, and the predictions for the future.

That’s all for this week!

by Unknown at December 30, 2024 04:39 PM

December 27, 2024

Pawel Lampe

Contributing to CSS Anchor Positioning in WebKit.

CSS Anchor Positioning is a novel CSS specification module that allows positioned elements to size and position themselves relative to one or more anchor elements anywhere on the web page. In simpler terms, it is a new web platform API that simplifies advanced relative-positioning scenarios such as tooltips, menus, popups, etc.

CSS Anchor Positioning in practice #

To better understand the true power it brings, let’s consider a non-trivial layout presented in Figure 1:

Non-trivial layout.

In the past, creating a context menu with position: fixed and positioned relative to the button required doing positioning-related calculations manually. The more complex the layout, the more complex the situation. For example, if the table in the above example was in a scrollable container, the position of the context menu would have to be updated manually on every scroll event.

With the CSS Anchor Positioning the solution to the above problem becomes trivial and requires 2 parts:

  • The <button> element must be marked as an anchor element by adding anchor-name: --some-name.
  • The context menu element must position itself using the anchor() function: left: anchor(--some-name right); top: anchor(--some-name bottom).
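
Putting those two parts together, a minimal sketch could look like this (the .context-menu class name is just for illustration):

button {
  anchor-name: --some-name;
}
.context-menu {
  position: fixed;
  left: anchor(--some-name right);
  top: anchor(--some-name bottom);
}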

The above is enough for the web engine to understand that the context menu element’s left and top must be positioned to the anchor element’s right and bottom. With that, the web engine can carry out the job under the hood, so the result is as in Figure 2:

Non-trivial layout with anchor-positioned context menu.

As the above demonstrates, even with a few simple API pieces, it’s now possible to address very complex scenarios in a very elegant fashion from the web developer’s perspective. Moreover, CSS Anchor Positioning offers even more than that. There are numerous articles with great examples such as this MDN article, this css-tricks article, or this chrome blog post, but the long story short is that both positioning and sizing elements relative to anchors are now very simple.
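
For instance, sizing relative to an anchor goes through the anchor-size() function; a rough sketch (with the same placeholder names as before) could be:

/* Make the menu at least as wide as the button it is anchored to */
.context-menu {
  min-width: anchor-size(--some-name width);
}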

Implementation status across web engines #

The first draft of the specification was published in early 2023, which in the web engines field is not that long ago. Therefore - as one can imagine - not all the major web engines support it yet. The first (and so far the only) web engine to support CSS Anchor Positioning was Chromium (see the introduction blog post) - hence the information on caniuse.com. However, despite what the WPT results page shows, the other web engines are currently implementing it (see the meta bug for Gecko and the bug list for WebKit). The lack of progress on the WPT results page is due to the feature not yet being enabled by default in those engines.

Implementation status in WebKit #

From the commits visible publicly, one can deduce that the work on CSS Anchor Positioning in WebKit was started by Apple in early 2024. The implementation was initiated by adding a core part - support for anchor-name, position-anchor, and anchor(). Those two properties and the anchor() function are enough to start using the feature in real-world scenarios as well as in more sophisticated WPT tests.

The work on the above was finished by the end of Q3 2024, and then - in Q4 2024 - the work significantly intensified. Parsing/computing support was added for numerous properties and functions, and a lot of new functionality and bug fixes landed afterwards. One can expect a few more things to land by the end of the year, even if there’s not much time left.

Overall, the implementation is in progress and far from being done, but it can already be tested in many real-world scenarios. This can be done using custom WebKit builds (on various OSes) or using Safari Technology Preview on Mac. The precondition for testing is, however, that the runtime preference called CSSAnchorPositioning is enabled.

My contributions #

Since CSS Anchor Positioning in WebKit is still a work in progress, and since the demand for the set of features this module brings is high, I’ve been privileged to contribute a little to the implementation myself. My work so far has focused on the parts of the API that allow creating menu-like elements that become visible on demand.

The first challenge with the above was to fix various problems related to toggling the visibility status of such elements - most notably a crash and a broken layout.

The obvious first step towards addressing these was to isolate minimal scenarios that reproduce them. In the process, I created some test cases and added them to WPT. With tests in place, I imported them into WebKit’s source tree and proceeded with the actual bug fixing. The result was a fix for the crash and a fix for the broken layout. With those in place, the visibility of menu-like elements can now be changed without any problems.

The second challenge was about the missing features that allow automatic alignment to the anchor. In a nutshell, to get an alignment like the one in Figure 3:

Non-trivial layout with centered anchor-positioned context menu.

there are two possibilities: the position-area property, or the anchor-center value for self-alignment.

At first, I wasn’t aware of anchor-center, and hence I started initial work towards supporting position-area. Once I became aware of it, however, I switched my focus to implementing anchor-center and left the above for Apple to continue - so as not to block them. So far, both the initial and core parts of the anchor-center implementation have landed, which means the basic support is in place.
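
As a rough idea of what that basic support enables (again with placeholder names, mirroring the earlier sketch):

/* Center the context menu over the anchor along the inline axis */
.context-menu {
  position: fixed;
  position-anchor: --some-name;
  top: anchor(--some-name bottom);
  justify-self: anchor-center;
}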

Despite the anchor-center layout tests passing, I’ve already discovered some problems, and I anticipate more may appear once the testing intensifies.

To address the above, I’ll be focusing on adding extra WPT coverage along with fixing the problems one by one. The key is to make sure that at the end of the day, all the unexpected problems are covered with WPT test cases. This way, other web engines will also benefit.

The future #

With WebKit’s implementation of CSS Anchor Positioning in its current shape, the work can proceed very much in parallel. Assuming that Apple keeps working on it at the same pace as they did for the past few months, I wouldn’t be surprised if CSS Anchor Positioning were pretty much done by the end of 2025. If the implementation in Gecko doesn’t stall, I think one can also expect a lot of activity around testing in WPT. With that, the quality of the implementations across web engines should improve, and eventually (perhaps in 2026?) CSS Anchor Positioning should reach full interoperability.

December 27, 2024 12:00 AM