Planet Igalia

September 29, 2022

Clayton Craft

8 cubic feet

This is the average amount of natural gas burned by my water heater to heat about 1 hot shower's worth of water. Yes, I am now able to monitor resource consumption at home. Watch out!

Sorry to the rest of the world, who use the superior metric system, but gas is sold here in "therms", which are much easier to convert to feet³ than to meters³...

Anyway, I recently added a few improvements to my home to help monitor energy and water consumption. Thank you to the companies / municipalities that provide my water and natural gas service for 1) installing wireless meters and 2) leaving them unencrypted (more on that in a bit!). For measuring electricity, I sprang for an IoTaWatt, with split-core current transformers installed in/under the breaker panel. The whole setup is "managed" by Home Assistant.

Monitoring the utility wireless meters for water and natural gas consumption was surprisingly easy with an SDR (I used one from rtl-sdr). I've heard that in other countries, the messages these wireless meters broadcast in every direction are encrypted. At my location, for whatever reasons (or lack thereof), they are not. So not only can I easily receive usage info from the meters measuring my service, but I'm also able to receive messages from quite a few other meters located around my home... at neighbors' houses, across the street, whatever. This particular water meter includes some bits to indicate whether it detects a "leak condition", where water is being drained away without interruption for some large number of hours. While trying to identify my meter in the swarm of meters all shouting around me, I noticed that some neighbor has a water leak. I had no way to identify who they were, since (luckily) the meters aren't broadcasting any obvious personally identifiable info. The "leak" went away after another day, before I was able to call the water provider. If it was a legitimate water leak that lasted many days or weeks, that would be really useful info to have if you were the one with the leak.

I haven't had this level of sensing set up for very long, but I can't help noticing the real cost of certain things we do regularly. Boiling eggs on the stove takes about 0.5kWh of electricity, for example. 0.5kWh is about the amount of energy that 5 adult humans produce over 1 hour, assuming the Wikipedia estimate of 100 watts per adult. That may or may not sound like much, but it adds up. What I can't see (yet?) is what it took my electricity provider to generate that 0.5kWh of electricity.

It's a little nuts to think about, and I'm not sure if this is going to heavily influence my/my family's behavior moving forward. I think it may... because even after a small number of days it's really hard to ignore some of the usage patterns that are emerging. I also can't help but wonder whether others would act differently if they could see a more accurate real cost of doing things.

September 29, 2022 12:00 AM

September 27, 2022

Manuel Rego

TPAC 2022

A few weeks ago the W3C organized a new edition of TPAC; this time it was a hybrid event. I had the chance to attend onsite in Vancouver (Canada).

This is the second conference for me this year after the Web Engines Hackfest, and the first one that involved travelling. It was awesome to meet so many people in real life again, some of them face to face for the first time. It was also great spending some days with the other Igalians attending (Andreu Botella, Brian Kardell and Valerie Young).

This is my third TPAC and I’ve seen an evolution since my first one back in 2016 in Lisbon. At that time Igalia was starting to be known in some specific groups, mostly the ARIA and CSS Working Groups, but many people didn’t know us at all. Nowadays lots of people know us, and Igalia is mentioned every now and then in conversations, in presentations, etc. When someone asks where you work and you reply Igalia, they only have good words about the work we do. “You should hire Igalia to implement that” is a sentence we heard several times during the whole week. It’s clear how we’ve grown into the ecosystem and how we can have an important impact on the evolution of the web platform. These days many companies understand the benefits of contributing to the web platform, bringing new priorities to the table. And Igalia is ready to help push features forward in the different standards bodies and browser engines.

Igalia sponsored TPAC at the bronze level and also contributed to the inclusion fund.

The rest of this blog post covers some brief highlights related to my participation in the event.

Web Developers Meetup

On Tuesday evening there was the developer meetup, with 4 talks and some demos. The talks were very nice; recordings will be published soon, but meanwhile you can check the slides on the meetup website. It's worth highlighting that Igalia is somehow involved in topics related to all four presentations, some of which are probably better known than others. Here are some examples so you can connect the dots:

Wolvic logo

On top of the talks, Igalia was showing a demo of Wolvic, the open source browser for WebXR devices. A bunch of people asked questions and got interested in it, and everyone liked the stickers with the Wolvic logo.

Igalia also sponsored this meetup.

CSS

Lots of amazing things have been happening in CSS recently, like :has() and Container Queries. But these folks are always thinking about the next steps, and during the CSSWG meeting there were discussions about some new proposals:

  • CSS Anchoring: This is about positioning things like pop-ups relative to other elements. The goal is to simplify the current situation so it can be done with a few lines of CSS instead of having to deal with JavaScript code manually.
  • CSS Toggles: This is a proposal to properly add the toggle concept to CSS, similar to what people have been doing with the “Checkbox Hack”.

Picture of the CSSWG meeting by Rossen Atanassov

During the breaks there were lots of informal discussions about the line-clamp property and how to unprefix it. There are different proposals for how to address this issue; let's hope some sort of agreement is reached and this can be implemented soon.

Accessibility and Shadow DOM

Shadow DOM is cool but also quite complex, and right now the accessibility story with Shadow DOM is mostly broken. The Accessibility Object Model (AOM) proposals aim to solve some of these issues.

At TPAC a bunch of folks interested in AOM arranged an informal meeting, followed by a number of conversations around some of these topics. On one side, there's the ARIA reflection proposal that is being worked on in different browsers, which will allow setting ARIA attributes via JavaScript. In addition, there were lots of discussions around the Cross-root ARIA Delegation proposal, and its counterpart Cross-root ARIA Reflection, thinking about the best way to solve these kinds of issues. We now have some kind of agreement on an initial proposal design that will need to be discussed with the different groups involved. Let's hope we're on the right path to finding a proper solution that helps make Shadow DOM more accessible.

Maps on the Web

This is one of those problems that don't have a proper solution on the web platform. Igalia has been in conversations around this for a while; for example, you can check this talk by Brian Kardell at a W3C workshop. This year Bocoup, together with Natural Resources Canada, did research on the topic, defining a long-term roadmap for adding support for maps on the Web.

There were a few sessions at TPAC about maps in which Igalia participated. Things are still at an early stage, but some of the features described in the roadmap are very interesting and have a broader scope than just maps (for example, pan and zoom would be very useful for other, more generic use cases too). Let's see how all this evolves in the future.

Wrap-up

And that's all from my side after a really great week in the nice city of Vancouver. As usual in this kind of event, the most valuable part was meeting people and having lots of informal conversations all over the place. The hybrid setup was really nice, but those face-to-face conversations are still something different from what you can do attending remotely.

See you all in the next editions!

September 27, 2022 10:00 PM

September 23, 2022

André Almeida

futex2 at Linux Plumbers Conference 2022

In-person conferences are finally back! After two years of remote conferences, the kernel development community got together in Dublin, Ireland, to discuss current problems that need collaboration to be solved. As in past editions, I took the opportunity to discuss futex2, a project I'm deeply involved in. futex2 is a project to solve issues found in the current futex interface. This year's session was about NUMA awareness in futex.

September 23, 2022 12:00 AM

September 21, 2022

Alex Bradbury

What's new for RISC-V in LLVM 15

LLVM 15.0.0 was released around two weeks ago now, and I wanted to highlight some of the RISC-V-specific changes or improvements that were introduced, going into a little more detail than I was able to in the release notes.

In case you're not familiar with LLVM's release schedule, it's worth noting that there are two major LLVM releases a year (i.e. one roughly every 6 months) and these are timed releases as opposed to being cut when a pre-agreed set of feature targets have been met. We're very fortunate to benefit from an active and growing set of contributors working on RISC-V support in LLVM projects, who are responsible for the work I describe below - thank you! I coordinate biweekly sync-up calls for RISC-V LLVM contributors, so if you're working in this area please consider dropping in.

Linker relaxation

Linker relaxation is a mechanism for allowing the linker to optimise code sequences at link time. A code sequence to jump to a symbol might conservatively take two instructions, but once the target address is known at link-time it might be small enough to fit in the immediate of a single instruction, meaning the other can be deleted. Because a linker performing relaxation may delete bytes (rather than just patching them), offsets including those for jumps within a function may be changed. To allow this to happen without breaking program semantics, even local branches that might typically be resolved by the assembler must be emitted as a relocation when linker relaxation is enabled. See the description in the RISC-V psABI or Palmer Dabbelt's blog post on linker relaxation for more background.

Although LLVM has supported codegen for linker relaxation for a long time, LLD (the LLVM linker) has until now lacked support for processing these relaxations. Relaxation is primarily an optimisation, but processing of R_RISCV_ALIGN (the alignment relocation) is necessary for correctness when linker relaxation is enabled, meaning it's not possible to link such object files correctly without at least some minimal support. Fangrui Song implemented support for R_RISCV_ALIGN/R_RISCV_CALL/R_RISCV_CALL_PLT/R_RISCV_TPREL_* relocations in LLVM 15 and wrote up a blog post with more implementation details, which is a major step in bringing us to parity with the GCC/binutils toolchain.

Optimisations

As with any release, there have been a large number of codegen improvements, both target-independent and target-dependent. One addition to highlight in the RISC-V backend is the new RISCVCodeGenPrepare pass. This is the latest piece of a long-running campaign (largely led by Craig Topper) to improve code generation related to sign/zero extensions on RV64. CodeGenPrepare is a target-independent pass that performs some late-stage transformations to the input ahead of lowering to SelectionDAG. The RISC-V specific version looks for opportunities to replace a zero-extension to i64 with a sign-extension (which is cheaper).

Another new pass that may be of interest is RISCVMakeCompressible (contributed by Lewis Revill and Craig Blackmore). Rather than trying to improve generated code performance, this is solely focused on reducing code size, and may increase the static instruction count in order to do so (which is why it's currently only enabled at the -Oz optimisation level). It looks for cases where an instruction has been selected that can't be represented by one of the compressed (16-bit as opposed to 32-bit wide) instruction forms, for instance because the register isn't one of the registers addressable from the compressed instruction, or because the offset is out of range. It will then look for opportunities to transform the input to make the instructions compressible. Grabbing two examples from the header comment of the pass:

; 'zero' register not addressable in compressed store.
                 =>   li a1, 0
sw zero, 0(a0)   =>   c.sw a1, 0(a0)
sw zero, 8(a0)   =>   c.sw a1, 8(a0)
sw zero, 4(a0)   =>   c.sw a1, 4(a0)
sw zero, 24(a0)  =>   c.sw a1, 24(a0) 

and

; compressed stores support limited offsets
lui a2, 983065     =>   lui a2, 983065 
                   =>   addi  a3, a2, -256
sw  a1, -236(a2)   =>   c.sw  a1, 20(a3)
sw  a1, -240(a2)   =>   c.sw  a1, 16(a3)
sw  a1, -244(a2)   =>   c.sw  a1, 12(a3)
sw  a1, -248(a2)   =>   c.sw  a1, 8(a3)
sw  a1, -252(a2)   =>   c.sw  a1, 4(a3)
sw  a0, -256(a2)   =>   c.sw  a0, 0(a3)

There's a whole range of other backend codegen improvements, including additions to existing RISC-V specific passes, but unfortunately it's not feasible to enumerate them all.

One improvement to note from the Clang frontend is that the C intrinsics for the RISC-V Vector extension are now lazily generated, avoiding the need to parse a huge pre-generated header file and improving compile times.

Support for new instruction set extensions

A batch of new instruction set extensions were ratified at the end of last year (see also the recently ratified extension list). LLVM 14 already featured a number of these (with the vector and ratified bit manipulation extensions no longer being marked as experimental). In LLVM 15 we were able to fill in some of the gaps, adding support for additional ratified extensions as well as some new experimental extensions.

In particular:

  • Assembler and disassembler support for the Zdinx, Zfinx, Zhinx, and Zhinxmin extensions. Cores that implement these extensions store double/single/half precision floating point values in the integer register file (GPRs) as opposed to having a separate floating-point register file (FPRs).
    • The instructions defined in the conventional floating-point extensions instead operate on the general-purpose registers, and instructions that become redundant (namely those that involve moving values between FPRs and GPRs) are removed.
    • Cores might implement these extensions rather than the conventional floating-point in order to reduce the amount of architectural state that is needed, reducing area and context-switch cost. The downside is of course that register pressure for the GPRs will be increased.
    • Codegen for these extensions is not yet supported (i.e. the extensions are only supported for assembly input or inline assembly). A patch to provide this support is under review though.
  • Assembler and disassembler support for the Zicbom, Zicbop, and Zicboz extensions. These cache management operation (CMO) extensions add new instructions for invalidating, cleaning, and flushing cache blocks (Zicbom), zeroing cache blocks (Zicboz), and prefetching cache blocks (Zicbop).
    • These operations aren't currently exposed via C intrinsics, but these will be added once the appropriate naming has been agreed.
    • One of the questions raised during implementation was about the preferred textual format for the operands. Specifically, whether it should be e.g. cbo.clean (a0)/cbo.clean 0(a0) to match the format used for other memory operations, or cbo.clean a0 as was used in an early binutils patch. We were able to agree between the CMO working group, LLVM, and GCC developers on the former approach.
  • Assembler, disassembler, and codegen support for the Zmmul extension. This extension is just a subset of the 'M' extension providing just the multiplication instructions without the division instructions.
  • Assembler and disassembler support for the additional CSRs (control and status registers) and instructions introduced by the hypervisor and Svinval additions to the privileged architecture specification. Svinval provides fine-grained address-translation cache invalidation and fencing, while the hypervisor extension provides support for efficiently virtualising the supervisor-level architecture (used to implement KVM for RISC-V).
  • Assembler and disassembler support for the Zihintpause extension. This adds the pause instruction intended for use as a hint within spin-wait loops.
    • Zihintpause was actually the first extension to go through RISC-V International's fast-track architecture extension process back in early 2021. We were clearly slow to add it to LLVM, but are trying to keep a closer eye on ratified extensions going forwards.
  • Support was added for the not yet ratified Zvfh extension, providing support for half precision floating point values in RISC-V vectors.
    • Unlike the extensions listed above, support for Zvfh is experimental. This is a status we use within the RISC-V backend for extensions that are not yet ratified and may change from release to release with no guarantees on backwards compatibility. Enabling support for such extensions requires passing -menable-experimental-extensions to Clang and specifying the extension's version when listing it in the -march string.

It's not present in LLVM 15, but LLVM 16 onwards will feature a user guide for the RISC-V target summarising the level of support for each extension (huge thanks to Philip Reames for kicking off this effort).

Other changes

In case I haven't said it enough times, there are far more interesting changes than I could reasonably cover. Apologies if I've missed your favourite new feature or improvement. In particular, I've said relatively little about RISC-V Vector support. There's been a long series of improvements and correctness fixes in the LLVM 15 development window, after RVV was made non-experimental in LLVM 14, and there's much more to come in LLVM 16 (e.g. scalable vectorisation becoming enabled by default).

September 21, 2022 12:00 PM

September 20, 2022

Danylo Piliaiev

🎉 Turnip now exposes Vulkan 1.3 🎉

RB3 development board with Adreno 630 GPU

This is a major milestone for a driver developed without any hardware documentation.

The last major roadblocks were VK_KHR_dynamic_rendering and, to a much lesser extent, VK_EXT_inline_uniform_block. Huge props to Connor Abbott for implementing them both!

Screenshot of mesamatrix.net showing that Turnip has 100% of features required for Vulkan 1.3

VK_KHR_dynamic_rendering was an especially nasty extension to implement on tiling GPUs because dynamic rendering allows splitting a render pass between several command buffers.

Desktop GPUs have no issues with this. They can just record and execute commands in the same order they are submitted, without any additional post-processing. Desktop GPUs don't have render passes internally; to them, a render pass is just a sequence of commands.

On the other hand, tiling GPUs have an internal concept of a render pass: they do binning of the whole render pass geometry first, load part of the framebuffer into tile memory, execute all render pass commands, store the framebuffer contents back into main memory, then repeat load_framebuffer -> execute_renderpass -> store_framebuffer for all tiles. In Turnip the required glue code is created at the end of a render pass, but when the render pass is split across several command buffers its whole contents are known only at submit time. Therefore we have to stitch the final render pass together right there.

What’s next?

Implementing Vulkan 1.3 was necessary to support the latest DXVK (Direct3D 9-11 translation layer). VK_KHR_dynamic_rendering itself was also necessary for the latest VKD3D (Direct3D 12 translation layer).

For now my plan is:

  • Continue implementing new extensions for DXVK, VKD3D, and Zink as they come out.
  • Focus more on performance.
  • Improvements to driver debug tooling so it works better with internal and external debugging utilities.

by Danylo Piliaiev at September 20, 2022 09:00 PM

September 13, 2022

Alberto Garcia

Adding software to the Steam Deck with systemd-sysext

Yakuake on SteamOS

Introduction: an immutable OS

The Steam Deck runs SteamOS, a single-user operating system based on Arch Linux. Although derived from a standard package-based distro, the OS in the Steam Deck is immutable and system updates replace the contents of the root filesystem atomically instead of using the package manager.

An immutable OS makes the system more stable and its updates less error-prone, but users cannot install additional packages to add more software. This is not a problem for most users since they are only going to run Steam and its games (which are stored in the home partition). Nevertheless, the OS also has a desktop mode which provides a standard Linux desktop experience, and here it makes sense to be able to install more software.

How to do that, though? It is possible for the user to become root, make the root filesystem read-write and install additional software there, but any changes will be gone after the next OS update. Modifying the rootfs can also be dangerous if the user is not careful.

Ways to add additional software

The simplest and safest way to install additional software is with Flatpak, and that’s the method recommended in the Steam Deck Desktop FAQ. Flatpak is already installed and integrated in the system via the Discover app so I won’t go into more details here.

However, while Flatpak works great for desktop applications not every piece of software is currently available, and Flatpak is also not designed for other types of programs like system services or command-line tools.

Fortunately there are several ways to add software to the Steam Deck without touching the root filesystem, each one with different pros and cons. I will probably talk about some of them in the future, but in this post I’m going to focus on one that is already available in the system: systemd-sysext.

About systemd-sysext

This is a tool included in recent versions of systemd and it is designed to add additional files (in the form of system extensions) to an otherwise immutable root filesystem. Each one of these extensions contains a set of files. When extensions are enabled (aka “merged”) those files will appear on the root filesystem using overlayfs. From then on the user can open and run them normally as if they had been installed with a package manager. Merged extensions are seamlessly integrated with the rest of the OS.

Since extensions are just collections of files they can be used to add new applications but also other things like system services, development tools, language packs, etc.

Creating an extension: yakuake

I’m using yakuake as an example for this tutorial since the extension is very easy to create, it is an application that some users are demanding and is not easy to distribute with Flatpak.

So let’s create a yakuake extension. Here are the steps:

1) Create a directory and unpack the files there:

$ mkdir yakuake
$ wget https://steamdeck-packages.steamos.cloud/archlinux-mirror/extra/os/x86_64/yakuake-21.12.1-1-x86_64.pkg.tar.zst
$ tar -C yakuake -xaf yakuake-*.tar.zst usr

2) Create a file called extension-release.NAME under usr/lib/extension-release.d with the fields ID and VERSION_ID taken from the Steam Deck’s /etc/os-release file.

$ mkdir -p yakuake/usr/lib/extension-release.d/
$ echo ID=steamos > yakuake/usr/lib/extension-release.d/extension-release.yakuake
$ echo VERSION_ID=3.3.1 >> yakuake/usr/lib/extension-release.d/extension-release.yakuake

3) Create an image file with the contents of the extension:

$ mksquashfs yakuake yakuake.raw

That’s it! The extension is ready.

A couple of important things: image files must have the .raw suffix and, despite the name, they can contain any filesystem that the OS can mount. In this example I used SquashFS but other alternatives like EroFS or ext4 are equally valid.

NOTE: systemd-sysext can also use extensions from plain directories (i.e. skipping the mksquashfs part). Unfortunately we cannot use them in our case because overlayfs does not work with the casefold feature that is enabled on the Steam Deck.

Using the extension

Once the extension is created you simply need to copy it to a place where systemd-systext can find it. There are several places where they can be installed (see the manual for a list) but due to the Deck’s partition layout and the potentially large size of some extensions it probably makes more sense to store them in the home partition and create a link from one of the supported locations (/var/lib/extensions in this example):

(deck@steamdeck ~)$ mkdir extensions
(deck@steamdeck ~)$ scp user@host:/path/to/yakuake.raw extensions/
(deck@steamdeck ~)$ sudo ln -s $PWD/extensions /var/lib/extensions

Once the extension is installed in that directory you only need to enable and start systemd-sysext:

(deck@steamdeck ~)$ sudo systemctl enable systemd-sysext
(deck@steamdeck ~)$ sudo systemctl start systemd-sysext

After this, if everything went fine you should be able to see (and run) /usr/bin/yakuake. The files should remain there from now on, even if you reboot the device. You can see which extensions are enabled with this command:

$ systemd-sysext status
HIERARCHY EXTENSIONS SINCE
/opt      none       -
/usr      yakuake    Tue 2022-09-13 18:21:53 CEST

If you add or remove extensions from the directory then a simple “systemd-sysext refresh” is enough to apply the changes.

Unfortunately, and unlike distro packages, extensions don’t have any kind of post-installation hooks or triggers, so in the case of Yakuake you probably won’t see an entry in the KDE application menu immediately after enabling the extension. You can solve that by running kbuildsycoca5 once from the command line.

Limitations and caveats

Using systemd extensions is generally very easy but there are some things that you need to take into account:

  1. Using extensions is easy (you put them in the directory and voilà!). However, creating extensions is not necessarily always easy. To begin with, any libraries, files, etc., that your extensions may need should be either present in the root filesystem or provided by the extension itself. You may need to combine files from different sources or packages into a single extension, or compile them yourself.
  2. In particular, if the extension contains binaries they should probably come from the Steam Deck repository or they should be built to work with those packages. If you need to build your own binaries then having a SteamOS virtual machine can be handy. There you can install all development files and also test that everything works as expected. One could also create a Steam Deck SDK extension with all the necessary files to develop directly on the Deck 🙂
  3. Extensions are not distribution packages, they don’t have dependency information and therefore they should be self-contained. They also lack triggers and other features available in packages. For desktop applications I still recommend using a system like Flatpak when possible.
  4. Extensions are tied to a particular version of the OS and, as explained above, the ID and VERSION_ID of each extension must match the values from /etc/os-release. If the fields don’t match then the extension will be ignored. This is to be expected because there’s no guarantee that a particular extension is going to work with a different version of the OS. This can happen after a system update. In the best case one simply needs to update the extension’s VERSION_ID, but in some cases it might be necessary to create the extension again with different/updated files.
  5. Extensions only install files in /usr and /opt. Any other file in the image will be ignored. This can be a problem if a particular piece of software needs files in other directories.
  6. When extensions are enabled the /usr and /opt directories become read-only because they are now part of an overlayfs. They will remain read-only even if you run steamos-readonly disable !!. If you really want to make the rootfs read-write you need to disable the extensions (systemd-sysext unmerge) first.
  7. Unlike Flatpak or Podman (including toolbox / distrobox), this is (by design) not meant to isolate the contents of the extension from the rest of the system, so you should be careful with what you’re installing. On the other hand, this lack of isolation makes systemd-sysext better suited to some use cases than those container-based systems.

Conclusion

systemd extensions are an easy way to add software (or data files) to the immutable OS of the Steam Deck in a way that is seamlessly integrated with the rest of the system. Creating them can be more or less easy depending on the case, but using them is extremely simple. Extensions are not packages, and systemd-sysext is not a package manager or a general-purpose tool to solve all problems, but if you are aware of its limitations it can be a practical tool. It is also possible to share extensions with other users, but here the usual warning against installing binaries from untrusted sources applies. Use with caution, and enjoy!

by berto at September 13, 2022 06:00 PM

September 12, 2022

Eric Meyer

Nuclear Targeted Footnotes

One of the more interesting design challenges of The Effects of Nuclear Weapons was the fact that, like many technical texts, it has footnotes.  Not a huge number, and in fact one chapter has none at all, but they couldn’t be ignored.  And I didn’t want them to be inline between paragraphs or stuck into the middle of the text.

This was actually a case where Chris and I decided to depart a bit from the print layout, because in print a chapter has many pages, but online it has a single page.  So we turned the footnotes into endnotes, and collected them all near the end of each chapter.

Originally I had thought about putting footnotes off to one side in desktop views, such as in the right-hand grid gutter.  After playing with some rough prototypes, I realized this wasn’t going to go the way I wanted it to, and would likely make life difficult in a variety of display sizes between the “big desktop monitor” and “mobile device” realms.  I don’t know, maybe I gave up too easily, but Chris and I had already decided that endnotes were an acceptable adaptation and I decided to roll with that.

So here’s how the footnotes work.  First off, in the main-body text, a footnote marker is wrapped in a <sup> element and is a link that points at a named anchor in the endnotes. (I may go back and replace all the superscript elements with styled <mark> elements, but for now, they’re superscript elements.)  Here’s an example from the beginning of Chapter I, which also has a cross-reference link in it, classed as such even though we don’t actually style them any differently than other links.

This is true for a conventional “high explosive,” such as TNT, as well as for a nuclear (or atomic) explosion,<sup><a href="#fnote01">1</a></sup> although the energy is produced in quite different ways (<a href="#§1.11" class="xref">§ 1.11</a>).

Then, down near the end of the document, there’s a section that contains an ordered list.  Inside that list are the endnotes, which are in part marked up like this:

<li id="fnote01"><sup>1</sup> The terms “nuclear” and “atomic” may be used interchangeably so far as weapons, explosions, and energy are concerned, but “nuclear” is preferred for the reason given in <a href="#§1.11" class="xref">§ 1.11</a>.

The list item markers are switched off with CSS, and superscripted numbers stand in their place.  I do it that way because the footnote numbers are important to the content, but also have specific presentation demands that are difficult  —  nay, impossible — to pull off with normal markers, like raising them superscript-style. (List markers are only affected by a very limited set of properties.)

In order to get the footnote text to align along the start (left) edge of their content and have the numbers hang off the side, I elected to use the old negative-text-indent-positive-padding trick:

.endnotes li {
	padding-inline-start: 0.75em;
	text-indent: -0.75em;
}

That works great as long as there are never any double-digit footnote numbers, which was indeed the case… until Chapter VIII.  Dang it.

So, for any footnote number above 9, I needed a different set of values for the indent-padding trick, and I didn’t feel like adding in a bunch of greater-than-nine classes. Following-sibling combinator to the rescue!

.endnotes li:nth-of-type(9) ~ li {
	margin-inline-start: -0.33em;
	padding-inline-start: 1.1em;
	text-indent: -1.1em;
}

The extra negative start margin is necessary solely to get the text in the list items to align horizontally, though unnecessary if you don’t care about that sort of thing.

Okay, so the endnotes looked right when seen in their list, but I needed a way to get back to the referring paragraph after reading a footnote.  Thus, some “backjump” links got added to each footnote, pointing back to the paragraph that referred to them.

<span class="backjump">[ref. <a href="#§1.01">§ 1.01</a>]</span>

With that, a reader can click/tap a footnote number to jump to the corresponding footnote, then click/tap the reference link to get back to where they started.  Which is fine, as far as it goes, but that idea of having footnotes appear in context hadn’t left me.  I decided I’d make them happen, one way or another.

(Throughout all this, I wished more than once the HTML 3.0 proposal for <fn> had gone somewhere other than the dustbin of history and the industry’s collective memory hole.  Ah, well.)

I was thinking I’d need some kind of JavaScript thing to swap element nodes around when it occurred to me that clicking a footnote number would make the corresponding footnote list item a target, and if an element is a target, it can be styled using the :target pseudo-class.  Making it appear in context could be a simple matter of positioning it in the viewport, rather than with relation to the document.  And so:

.endnotes li:target {
	position: fixed;
	bottom: 0;
	padding-block: 2em 4em;
	padding-inline: 2em;
	margin-inline: -2em 0;
	border-top: 1px solid;
	background: #FFF;
	box-shadow: 0 0 3em 3em #FFF;
	max-width: 45em;
}

That is to say, when an endnote list item is targeted, it’s fixedly positioned against the bottom of the viewport and given some padding and background and a top border and a box shadow, so it has a bit of a halo above it that sets it apart from the content it’s overlaying.  It actually looks pretty sweet, if I do say so myself, and allows the reader to see footnotes without having to jump back and forth on the page.  Now all I needed was a way to make the footnote go away.

Again I thought about going the JavaScript route, but I’m trying to keep to the Web’s slower pace layers as much as possible in this project for maximum compatibility over time and technology.  Thus, every footnote gets a “close this” link right after the backjump link, marked up like this:

<a href="#fnclosed" class="close">X</a></li>

(I realize that probably looks a little weird, but hang in there and hopefully I can clear it up in the next few paragraphs.)

So every footnote ends with two links, one to jump to the paragraph (or heading) that referred to it, which is unnecessary when the footnote has popped up due to user interaction; and then, one to make the footnote go away, which is unnecessary when looking at the list of footnotes at the end of the chapter.  It was time to juggle display and visibility values to make each appear only when necessary.

.endnotes li .close {
	display: none;
	visibility: hidden;
}
.endnotes li:target .close {
	display: block;
	visibility: visible;
}
.endnotes li:target .backjump {
	display: none;
	visibility: hidden;
}

Thus, the “close this” links are hidden by default, and revealed when the list item is targeted and thus pops up.  By contrast, the backjump links are shown by default, and hidden when the list item is targeted.

As it now stands, this approach has some upsides and some downsides.  One upside is that, since a URL with an identifier fragment is distinct from the URL of the page itself, you can dismiss a popped-up footnote with the browser’s Back button.  On kind of the same hand, though, one downside is that since a URL with an identifier fragment is distinct from the URL of the page itself, if you consistently use the “close this” link to dismiss a popped-up footnote, the browser history gets cluttered with the opened and closed states of various footnotes.

This is bad because you can get partway through a chapter, look at a few footnotes, and then decide you want to go back one page by hitting the Back button, at which point you discover you have to go back through all those footnote states in the history before you actually go back one page.

I feel like this is a thing I can (probably should) address by layering progressively-enhancing JavaScript over top of all this, but I’m still not quite sure how best to go about it.  Should I add event handlers and such so the fragment-identifier stuff is suppressed and the URL never actually changes?  Should I add listeners that will silently rewrite the browser history as needed to avoid this?  Ya got me.  Suggestions or pointers to live examples of solutions to similar problems are welcomed in the comments below.
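
One rough sketch of that second idea, just to make it concrete (this is purely illustrative, not what the site currently does, and the selectors assume the markup shown above): intercept clicks on the footnote and “close this” links and navigate with location.replace(), which still performs a real fragment navigation — so the :target styling keeps working — but swaps the current history entry instead of pushing a new one.

// Illustrative only: footnote and "close this" clicks replace the current
// history entry rather than adding a new one. The trade-off is that the Back
// button no longer dismisses a popped-up footnote.
document.addEventListener("click", (ev) => {
  const link = ev.target.closest('sup > a[href^="#fnote"], .endnotes .close');
  if (!link) return;
  ev.preventDefault();
  location.replace(link.getAttribute("href"));
});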

Less crucially, the way the footnote just appears and disappears bugs me a little, because it’s easy to miss if you aren’t looking in the right place.  My first thought was that it would be nice to have the footnote unfurl from the bottom of the page, but it’s basically impossible (so far as I can tell) to animate the height of an element from 0 to auto.  You also can’t animate something like bottom: calc(-1 * calculated-height) to 0 because there is no CSS keyword (so far as I know) that returns the calculated height of an element.  And you can’t really animate from top: 100vh to bottom: 0 because animations are of a property’s values, not across properties.

I’m currently considering a quick animation from something like bottom: -50em to 0, going on the assumption that no footnote will ever be more than 50 em tall, regardless of the display environment.  But that means short footnotes will slide in later than tall footnotes, and probably appear to move faster.  Maybe that’s okay?  Maybe I should do more of a fade-and-scale-in thing instead, which will be visually consistent regardless of footnote size.  Or I could have them 3D-pivot up from the bottom edge of the viewport!  Or maybe this is another place to layer a little JS on top.
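
If I do end up layering a little JS on top, one minimal sketch (the selector and timing here are just placeholders) would be to measure the targeted footnote and slide it in with the Web Animations API, which sidesteps the unknown-height problem entirely:

// Sketch only: when the fragment changes, slide the targeted footnote up over
// its own measured height, so the travel distance always matches its size.
window.addEventListener("hashchange", () => {
  const note = document.querySelector(".endnotes li:target");
  if (!note) return;
  note.animate(
    [{ transform: `translateY(${note.offsetHeight}px)` }, { transform: "none" }],
    { duration: 200, easing: "ease-out" }
  );
});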

Or maybe I’ve overlooked something that will let me unfurl the way I first envisioned with just HTML and CSS, a clever new technique I’ve missed or an old solution I’ve forgotten.  As before, comments with suggestions are welcome.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at September 12, 2022 03:16 PM

September 10, 2022

Alex Bradbury

simple-reload

Summary

simple-reload provides a straight-forward (~30 lines of JS and zero server-side requirements) way of reloading a web page as it is iteratively developed or modified. Once activated, a page will be reloaded whenever it regains focus.

If you encounter any problems, please file an issue on the simple-reload GitHub repository.

Trade-offs

Mature solutions like LiveReload are available, which make a different set of trade-offs. Please read this section carefully to determine if simple-reload makes sense for you:

Cons:

  • Reload won't take place until the page is focused by the user, which requires manual interaction.
    • This is significantly less burdensome if using focus follows pointer in your window manager.
  • Reloads will occur even if there were no changes.
    • With e.g. LiveReload, a reload only happens when the server indicates there has been a change. This may be a big advantage for stateful pages or pages with lots of forms.

Pros:

  • Tiny, easy to modify implementation.
  • Can dynamically enable/disable reloading on a per-tab basis.
    • This can be helpful to keep a fixed revision of a page in one tab to compare against.
  • No server-side requirements.
    • So works even from file://.
    • Makes it easier if using with a remote server (e.g. no need to worry about exposing a port as for LiveReload).

Code

<script type="module">
// Set to true to enable reloading from first load.
const enableByDefault = false;
// Firefox triggers blur/focus events when resizing, so we ignore a focus
// following a blur within 200ms (assumed to be generated by resizing rather
// than human interaction).
let blurTimeStamp = null;
function focusListener(ev) {
  if (ev.timeStamp - blurTimeStamp >= 200) {
    location.reload();
  }
}
function blurListener(ev) {
  if (blurTimeStamp === null) {
    window.addEventListener("focus", focusListener);
  }
  blurTimeStamp = ev.timeStamp;
}
function deactivate() {
  sessionStorage.removeItem("simple-reload");
  window.removeEventListener("focus", focusListener);
  window.removeEventListener("blur", blurListener);
  document.title = document.title.replace(/^\u27F3 /, "");
  window.addEventListener("dblclick", activate, { once: true });
}
function activate() {
  sessionStorage.setItem("simple-reload", "activated");
  location.reload();
}
if (enableByDefault || sessionStorage.getItem("simple-reload") == "activated") {
  document.title = "\u27F3 " + document.title;
  sessionStorage.setItem("simple-reload", "activated");
  window.addEventListener("blur", blurListener);
  window.addEventListener("dblclick", deactivate, { once: true });
} else {
  window.addEventListener("dblclick", activate);
}
</script>

Usage

Paste the above code into the <head> or <body> of an HTML file. You can then enable the reload behaviour by double-clicking on a page (double-click again to disable it). The title is prefixed with ⟳ while reload-on-focus is enabled. If you'd like reload-on-focus enabled by default, just flip the enableByDefault variable to true. You could either modify whatever you're using to emit HTML to include this code when in development mode, or configure your web server of choice to inject it for you.

The enabled/disabled state of the reload logic is scoped to the current tab, so will be maintained if navigating to different pages within the same domain in that tab.

In terms of licensing, the implementation is so straight-forward it hardly feels copyrightable. Please consider it public domain, or MIT if you're more comfortable with an explicit license.

Implementation notes

  • In case it's not clear, "blur" events mentioned above are events fired when an element is no longer in focus.
  • As noted in the code comment, Firefox (under dwm in Linux at least) seems to trigger blur+focus events when resizing the window using a keybinding while Chrome doesn't. Being able to remove the logic to deal with this issue would be a good additional simplification.

Article changelog
  • 2022-09-17: Added link to simple-reload GitHub repo.
  • 2022-09-11: Removed incorrect statement about sessionStorage usage being redundant if enableByDefault = true (it does mean disabling reloading will persist).
  • 2022-09-10: Initial publication date.

September 10, 2022 12:00 PM

Muxup implementation notes

This article contains a few notes on various implementation decisions made when creating the Muxup website. They're intended primarily as a reference for myself, but some parts may be of wider interest. See about for more information about things like site structure.

Site generation

I ended up writing my own script for generating the site pages from a tree of Markdown files. Zola seems like an excellent option, but as I had a very specific idea on how I wanted pages to be represented in source form and the format and structure of the output, writing my own was the easier option (see build.py). Plus, yak-shaving is fun.

I opted to use mistletoe for Markdown parsing. I found a few bugs in the traverse helper function when implementing some transformations on the generated AST, but upstream was very responsive about reviewing and merging my PRs. The main wart I've found is that parsing and rendering aren't fully separated, although this doesn't pose any real practical concern for my use case and will hopefully be fixed in the future. mistune also seemed promising, but has some conformance issues.

One goal was to keep everything possible in standard Markdown format. This means, for instance, avoiding custom frontmatter entries or link formats if the same information could be extracted from the file. Therefore:

  • There is no title frontmatter entry - the title is extracted by grabbing the first level 1 heading from the Markdown AST (and erroring if it isn't present).
  • All internal links are written as [foo](/relative/to/root/foo.md). The generator will error if the file can't be found, and will otherwise translate the target to refer to the appropriate permalink.
    • This has the additional advantage that links are still usable if viewing the Markdown on GitHub, which can be handy if reviewing a previous revision of an article.
  • Article changelogs are just a standard Markdown list under the "Article changelog" heading, which is checked and transformed at the Markdown AST level in order to produce the desired output (emitting the list using a <details> and <summary>).

All CSS was written through the usual mix of experimentation (see simple-reload for the page reloading solution I used to aid iterative development) and learning from the CSS used by other sites.

Randomly generated title highlights

The main visual element throughout the site is the randomised, roughly drawn highlight used for the site name and article headings. This takes some inspiration from the RoughNotation library (see also the author's description of the algorithms used), but uses my own implementation that's much more tightly linked to my use case.

The core logic for drawing these highlights is based around drawing the SVG path:

  • Determine the position and size of the text element to be highlighted (keeping in mind it might be described by multiple rectangles if split across several lines).
  • For each rectangle, draw a bezier curve starting from the middle of the left hand side through to the right hand side, with its two control points at between 20-40% and 40-80% of the width.
  • Apply randomised offsets in the x and y directions for every point.
    • The range of the randomised offsets should depend on the length of the text. Generally speaking, smaller offsets should be used for shorter pieces of text.

This logic is implemented in preparePath and reproduced below (with the knowledge that offset(delta) is a helper that returns a random number between -delta and delta, hopefully it's clear how this relates to the logic described above):

function preparePath(hlInfo) {
  const parentRect = hlInfo.svg.getBoundingClientRect();
  const rects = hlInfo.hlEl.getClientRects();
  let pathD = "";
  for (const rect of rects) {
    const x = rect.x - parentRect.x, y = rect.y - parentRect.y,
          w = rect.width, h = rect.height;
    const mid_y = y + h / 2;
    let maxOff = w < 75 ? 3 : w < 300 ? 6 : 8;
    const divergePoint = .2 + .2 * Math.random();
    pathD = `${pathD}
      M${x+offset(maxOff)} ${mid_y+offset(maxOff)}
      C${x+w*divergePoint+offset(maxOff)} ${mid_y+offset(maxOff)},
       ${x+2*w*divergePoint+offset(maxOff)} ${mid_y+offset(maxOff)}
       ${x+w+offset(maxOff)} ${mid_y+offset(maxOff)}`;
  }
  hlInfo.nextPathD = pathD;
  hlInfo.strokeWidth = 0.85*rects[0].height;
}

I took some care to avoid forced layout/reflow by batching reads and writes of the DOM into separate phases when drawing the initial set of highlights for the page, which is why this function generates the path but doesn't modify the SVG directly. Separate logic adds handlers to highlighted links so the highlight is continuously redrawn while hovered (I liked the visual effect).
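
As a hypothetical illustration of the "write" phase (this helper isn't the site's actual code, and it assumes each hlInfo.svg contains a single <path> element), applying the pre-computed paths in one batch might look like:

// Hypothetical write phase: after preparePath() has filled in nextPathD and
// strokeWidth for every highlight, apply them all at once so DOM reads and
// writes don't interleave and force extra layout work.
function applyPaths(hlInfos) {
  for (const hlInfo of hlInfos) {
    const path = hlInfo.svg.querySelector("path");
    path.setAttribute("d", hlInfo.nextPathD);
    path.setAttribute("stroke-width", hlInfo.strokeWidth);
  }
}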

Minification and optimisation

I primarily targeted the low-hanging fruit here, and avoided adding in too many dependencies (e.g. separate HTML and CSS minifiers) during the build. muxup.com is a very lightweight site - the main extravagance I've allowed myself is the use of a webfont (Nunito) but that only weighs in at ~36KiB (the variable width version converted to woff2 and subsetted using pyftsubset from fontTools). As the CSS and JS payload is so small, it's inlined into each page.

The required JS for the article pages and the home page is assembled and then minified using terser. This reduces the size of the JS required for the page you're reading from 5077 bytes to 2620 bytes uncompressed (1450 bytes to 991 bytes if compressing the result with Brotli, though in practice the impact will be a bit different when compressing the JS together with the rest of the page + CSS). When first published, the page you are reading (including inlined CSS and JS) was ~27.7KiB uncompressed (7.7KiB Brotli compressed), which compares to 14.4KiB for its source Markdown (4.9KiB Brotli compressed).
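
For reference, a minimal sketch of driving terser from a Node script (the actual build invocation may differ; assembledJs is a placeholder for the concatenated page JS):

// Sketch: minify an assembled JS string with terser's API.
import { minify } from "terser";

const assembledJs = "/* concatenated page JS goes here */";
const result = await minify(assembledJs, { module: true });
console.log(`${result.code.length} bytes after minification`);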

Each page contains an embedded stylesheet with a conservative approximation of the minimal CSS needed. The total amount of CSS is small enough that it's easy to manually split between the CSS that is common across the site, the CSS only needed for the home page, the CSS common to all articles, and then other CSS rules that may or may not be needed depending on the page content. For the latter case, CSS snippets to include are gated on matching a given string (e.g. <kbd for use of the <kbd> tag). For my particular use case, this is more straightforward and faster than e.g. relying on PurgeCSS as part of the build process.

The final trick on the frontend is prefetching. Upon hovering on an internal link, it will be fetched (see the logic at the end of common.js for the implementation approach), meaning that in the common case any link you click will already have been loaded and cached. More complex approaches could be used to e.g. hook the mouse click event and directly update the DOM using the retrieved data. But this would require work to provide UI feedback during the load and the incremental benefit over prefetching to prime the cache seems small.
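
The general shape of that prefetching logic looks something like the following sketch (the real implementation in common.js differs in detail):

// Sketch of hover-triggered prefetching: the first time the pointer moves over
// an internal link, add a <link rel="prefetch"> hint so a later click is
// likely served from the cache.
const prefetched = new Set();
document.addEventListener("mouseover", (ev) => {
  const link = ev.target.closest('a[href^="/"]');
  if (!link || prefetched.has(link.href)) return;
  prefetched.add(link.href);
  const hint = document.createElement("link");
  hint.rel = "prefetch";
  hint.href = link.href;
  document.head.appendChild(hint);
});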

Serving using Caddy

I had a few goals in setting up Caddy to serve this site:

  • Enable new and shiny things like HTTP3 and serving Brotli compressed content.
    • Both are very widely supported on the browser side (and Brotli really isn't "new" any more), but require non-standard modules or custom builds on Nginx.
  • Redirect www.muxup.com/* URLs to muxup.com/*.
  • HTTP request methods other than GET or HEAD should return error 405.
  • Set appropriate Cache-Control headers in order to avoid unnecessarily re-fetching content. Set shorter lifetimes for served .html and 308 redirects vs other assets. Leave 404 responses with no Cache-Control header.
  • Avoid serving the same content at multiple URLs (unless explicitly asked for) and don't expose the internal filenames of content served via a different canonical URL. Also, prefer URLs without a trailing slash, but ensure not to issue a redirect if the target file doesn't exist. This means (for example):
    • muxup.com/about/ should redirect to muxup.com/about
    • muxup.com/2022q3////muxup-implementation-notes should 404 or redirect.
    • muxup.com/about/./././ should 404 or redirect
    • muxup.com/index.html should 404.
    • muxup.com/index.html.br (referring to the precompressed brotli file) should 404.
    • muxup.com/non-existing-path/ should 404.
    • If there is a directory foo and a foo.html at the same level, serve foo.html for GET /foo (and redirect to it for GET /foo/).
  • Never try to serve */index.html or similar (except in the special case of GET /).

Perhaps because my requirements were so specific, this turned out to be a little more involved than I expected. If seeking to understand the Caddyfile format and Caddy configuration in general, I'd strongly recommend getting a good understanding of the key ideas by reading Caddyfile concepts, understanding the order in which directives are handled by default, and learning how you might control the execution order of directives using the route directive or use the handle directive to specify groups of directives in a mutually exclusive fashion based on different matchers. The Composing in the Caddyfile article provides a good discussion of these options.

Ultimately, I wrote a quick test script for the desired properties and came up with the following Caddyfile that meets almost all goals:

{
	servers {
		protocol {
			experimental_http3
		}
	}
	email asb@asbradbury.org
}

(muxup_file_server) {
	file_server {
		index ""
		precompressed br
		disable_canonical_uris
	}
}
www.muxup.com {
	redir https://muxup.com{uri} 308
	header Cache-Control "max-age=2592000, stale-while-revalidate=2592000"
}
muxup.com {
	root * /var/www/muxup.com/htdocs
	encode gzip
	log {
		output file /var/log/caddy/muxup.com.access.log
	}
	header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"

	vars short_cache_control "max-age=3600"
	vars long_cache_control "max-age=2592000, stale-while-revalidate=2592000"

	@method_isnt_GET_or_HEAD not method GET HEAD
	@path_is_suffixed_with_html_or_br path *.html *.html/ *.br *.br/
	@path_or_html_suffixed_path_exists file {path}.html {path}
	@html_suffixed_path_exists file {path}.html
	@path_or_html_suffixed_path_doesnt_exist not file {path}.html {path}
	@path_is_root path /
	@path_has_trailing_slash path_regexp ^/(.*)/$

	handle @method_isnt_GET_or_HEAD {
		error 405
	}
	handle @path_is_suffixed_with_html_or_br {
		error 404
	}
	handle @path_has_trailing_slash {
		route {
			uri strip_suffix /
			header @path_or_html_suffixed_path_exists Cache-Control "{vars.long_cache_control}"
			redir @path_or_html_suffixed_path_exists {path} 308
			error @path_or_html_suffixed_path_doesnt_exist 404
		}
	}
	handle @path_is_root {
		rewrite index.html
		header Cache-Control "{vars.short_cache_control}"
		import muxup_file_server
	}
	handle @html_suffixed_path_exists {
		rewrite {path}.html
		header Cache-Control "{vars.short_cache_control}"
		import muxup_file_server
	}
	handle * {
		header Cache-Control "{vars.long_cache_control}"
		import muxup_file_server
	}
	handle_errors {
		header -Cache-Control
		respond "{err.status_code} {err.status_text}"
	}
}

A few notes on the above:

  • It isn't currently possible to match URLs with // due to the canonicalisation Caddy performs, but 2.6.0, which includes PR #4948, hopefully provides a solution, and will hopefully allow matching /./ too.
  • HTTP3 should be enabled by default in Caddy 2.6.0.
  • Surprisingly, you need to explicitly opt in to enabling gzip compression (see this discussion with the author of Caddy about that choice).
  • The combination of try_files and file_server provides a large chunk of the basics, but something like the above handlers is needed to get the precise desired behaviour for things like redirects, *.html and *.br etc.
  • route is needed within handle @path_has_trailing_slash because the default execution order of directives has uri occurring some time after redir and error.
  • Caddy doesn't support dynamically brotli compressing responses, so the precompressed br option of file_server is used to serve pre-compressed files (as prepared by the deploy script).
  • I've asked for advice on improving the above Caddyfile on the Caddy Discourse.

Analytics

The simplest possible solution - don't have any.

Last but by no means least is the randomly selected doodle at the bottom of each page. I select images from the Quick, Draw! dataset and export them to SVG to be randomly selected on each page load. A rather scrappy script contains logic to generate SVGs from the dataset's NDJSON format and a simple Flask application that allows selecting desired images from randomly displayed batches from each dataset.

With examples such as O'Reilly's beautiful engravings of animals on their book covers there's a well established tradition of animal illustrations on technical content - and what better way to honour that tradition than with a hastily drawn doodle by a random person on the internet that spins when your mouse hovers over it?


Article changelog
  • 2022-09-11: Add HSTS, tweak no-www redirect, and reject HTTP methods other than GET or HEAD in Caddyfile. Also link to thread requesting suggestions for this Caddyfile on Caddy's Discourse.
  • 2022-09-10: Initial publication date.

September 10, 2022 12:00 PM

September 02, 2022

Ricardo García

VK_EXT_mesh_shader finally released

Vulkan 1.3.226 was released yesterday and it finally includes the cross-vendor VK_EXT_mesh_shader extension. This has definitely been an important moment for me. As part of my job at Igalia and our collaboration with Valve, I had the chance to work reviewing this extension in depth and writing thousands of CTS tests for it. You’ll notice I’m listed as one of the extension contributors. Hopefully, the new tests will be released to the public soon as part of the open source VK-GL-CTS Khronos project.

During this multi-month journey I had the chance to work closely with several vendors working on adding support for this extension in multiple drivers, including NVIDIA (special shout-out to Christoph Kubisch, Patrick Mours, Pankaj Mistry and Piers Daniell among others), Intel (thanks Marcin Ślusarz for finding and reporting many test bugs) and, of course, Valve. Working for the latter, Timur Kristóf provided an implementation for RADV and reported to me dozens of bugs, test ideas, suggestions and improvements. Do not miss his blog post series about mesh shaders and how they’re implemented on RDNA2 hardware. Timur’s implementation will be used in your Linux system if you have a capable AMD GPU and, of course, the Steam Deck.

The extension has been developed with DX12 compatibility in mind. It’s possible to use mesh shading from Vulkan natively and it also allows future titles using DX12 mesh shading to be properly run on top of VKD3D-Proton and enjoyed on Linux, if possible, from day one. It’s hard to provide a summary of the added functionality and what mesh shaders are about in a short blog post like this one, so I’ll refer you to external documentation sources, starting with the Vulkan mesh shading post on the Khronos Blog. Both Timur and I have submitted a couple of talks to XDC 2022, which have been accepted and will give you a primer on mesh shading as well as some more information on the RADV implementation. Do not miss the event in Minneapolis, or enjoy it remotely while it’s being livestreamed in October.

September 02, 2022 08:18 PM

September 01, 2022

Delan Azabani

Meet the CSS highlight pseudos

A year and a half ago, I was asked to help upstream a Chromium patch allowing authors to recolor spelling and grammar errors in CSS. At the time, I didn’t realise that this was part of a far more ambitious effort to reimagine spelling errors, grammar errors, text selections, and more as a coherent system that didn’t yet exist as such in any browser. That system is known as the highlight pseudos, and this post will focus on the design of said system and its consequences for authors.

This is the third part of a series (part one, part two) about Igalia’s work towards making the CSS highlight pseudos a reality.



What are they?

CSS has four highlight pseudos and an open set of author-defined custom highlight pseudos. They have their roots in ::selection, which was a rudimentary and non-standard, but widely supported, way of styling text and images selected by the user.

The built-in highlights are ::selection for user-selected content, ::target-text for linking to text fragments, ::spelling-error for misspelled words, and ::grammar-error for text with grammar errors, while the custom highlights are known as ::highlight(x) where x is the author-defined highlight name.

Can I use them?

::selection has long been supported by all of the major browsers, and ::target-text shipped in Chromium 89. But for most of that time, no browser had yet implemented the more robust highlight pseudo system in the CSS pseudo spec.

::highlight() and the custom highlight API shipped in Chromium 105, thanks to the work by members1 of the Microsoft Edge team. They are also available in Safari 14.1 (including iOS 14.5) as an experimental feature (Highlight API). You can enable that feature in the Develop menu, or for iOS, under Settings > Safari > Advanced.

Safari’s support currently has a couple of quirks, as of TP 152. Range is not supported for custom highlights yet, only StaticRange, and the Highlight constructor has a bug where it requires passing exactly one range, ignoring any additional arguments. To create a Highlight with no ranges, first create one with a dummy range, then call the clear or delete methods.
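
A sketch of that workaround (the highlight name “example” is arbitrary):

// Safari TP needs exactly one range argument, so seed the Highlight
// with a throwaway StaticRange and then empty it.
const dummy = new StaticRange({
    startContainer: document.body, startOffset: 0,
    endContainer: document.body, endOffset: 0,
});
const highlight = new Highlight(dummy);
highlight.clear();                      // setlike, so clear() empties it
CSS.highlights.set("example", highlight);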

Chromium 105 also implements the vast majority of the new highlight pseudo system. This includes highlight overlay painting, which was enabled for all highlight pseudos, and highlight inheritance, which was enabled for ::highlight() only.

Chromium 108 includes ::spelling-error and ::grammar-error as an experimental feature, together with the new ‘text-decoration-line’ values ‘spelling-error’ and ‘grammar-error’. You can enable these features at

chrome://flags/#enable-experimental-web-platform-features

Chromium’s support also currently has some bugs, as of r1041796. Notably, highlights don’t yet work under ::first-line and ::first-letter2, ‘text-shadow’ is not yet enabled for ::highlight(), computedStyleMap results are wrong for ‘currentColor’, and highlights that split ligatures (e.g. for complex scripts) only render accurately in ::selection2.

Click the table below to see if your browser supports these features.

                                        Chromium    Safari        Firefox
Custom highlights                       105         14.1*         ?
  • ::highlight()                       105         14.1*         ?
  • CSSOM API                           105         14.1* (ab)    ?
::spelling-error                        108*        ?             ?
Highlight overlay painting              105         ?             ?
Highlight inheritance (::selection)     ?           ?             ?
Highlight inheritance (::highlight)     105         ?             ?
  • * = experimental (can be enabled in UI)
  • S = ::highlight() unsupported in querySelector
  • C = CSS.highlights missing or setlike (older API from 2020)
  • H = new Highlight() missing
  • a = StaticRange only (no support for Range)
  • b = new Highlight() requires exactly one range argument

How do I use them?

While you can write rules for highlight pseudos that target all elements, as was commonly done for pre-standard ::selection, selecting specific elements can be more powerful, allowing descendants to cleanly override highlight styles.

the fox jumps over the dog
(the quick fox, mind you)
<style>
    :root::selection {
        color: white;
        background-color: black;
    }
    aside::selection {
        background-color: darkred;
    }
</style>
<body>
    <p>the fox jumps over the dog
    <aside>
        (the <sup>quick</sup> fox, mind you)
    </aside>
</body>

Previously the same code would yield…

the fox jumps over the dog
(the quick fox, mind you)

(in older browsers)

Notice how none of the text is white on black, because there are always other elements (body, p, aside, sup) between the root and the text.

…unless you also selected the descendants of :root and aside:

:root::selection,
:root *::selection
/* (or just ::selection) */ {
    color: white;
    background-color: black;
}
aside::selection,
aside *::selection {
    background-color: green;
}

Note that a bare ::selection rule still means *::selection, and like any universal rule, it can interfere with inheritance when mixed with non-universal highlight rules.

the fox jumps over the dog
(the quick fox, mind you)
<style>
    ::selection {
        color: white;
        background-color: black;
    }
    aside::selection {
        background-color: darkred;
    }
</style>
<body>
    <p>the fox jumps over the dog
    <aside>
        (the <sup>quick</sup> fox, mind you)
    </aside>
</body>

sup::selection would have inherited ‘darkred’ from aside::selection, but the universal ::selection rule matches it directly, so it becomes black.

::selection is primarily controlled by user input, though pages can both read and write the active ranges via the Selection API with getSelection().

::target-text is activated by navigating to a URL ending in a fragment directive, which has its own syntax embedded in the #fragment. For example:

  • #foo:~:text=bar targets #foo and highlights the first occurrence of “bar”
  • #:~:text=the,dog highlights the first range of text from “the” to “dog”

::spelling-error and ::grammar-error are controlled by the user’s spell checker, which is only used where the user can input text, such as with textarea or contenteditable, subject to the spellcheck attribute (which also affects grammar checking). For privacy reasons, pages can’t read the active ranges of these highlights, despite being visible to the user.

::highlight() is controlled via the Highlight API with CSS.highlights. CSS.highlights is a maplike object, which means the interface is the same as a Map of strings (highlight names) to Highlight objects. Highlight objects, in turn, are setlike objects, which you can use like a Set of Range or StaticRange objects.

Hello, world!
<style>
    ::highlight(foo) { background: yellow; }
</style>
<script>
    const foo = new Highlight;
    CSS.highlights.set("foo", foo); // maplike

    const range = new Range;
    range.setStart(document.body.firstChild, 0);
    range.setEnd(document.body.firstChild, 5);
    foo.add(range); // setlike
</script>
<body>Hello, world!</body>

You can use getComputedStyle() to query resolved highlight styles under a particular element. Regardless of which parts (if any) are highlighted, the styles returned are as if the given highlight is active and all other highlights are inactive.

<style>
    ::selection { background: #00FF00; }
    ::highlight(foo) { background: #FF00FF; }
</style>
<script>
    getSelection().removeAllRanges();
    getSelection().selectAllChildren(document.body);

    const style = getComputedStyle(document.body, "::highlight(foo)");
    console.log(style.backgroundColor);
</script>
<body>Hello, world!</body>

This code always prints “rgb(255, 0, 255)”, even though only ::selection is active.

How do they work?

Highlight pseudos are defined as pseudo-elements, but they actually have very little in common with other pseudo-elements like ::before and ::first-line.

Unlike other pseudos, they generate highlight overlays, not boxes, and these overlays are like layers over the original content. Where text is highlighted, a highlight overlay can add backgrounds and text shadows, while the text proper and any other decorations are “lifted” to the very top.

(Interactive demo: the text “quikc brown fox” repeated with its highlight overlays stacked layer by layer over the original content.)

You can think of highlight pseudos as innermost pseudo-elements that always exist at the bottom of any tree of elements and other pseudos, but unlike other pseudos, they don’t inherit their styles from that element tree.

Instead each highlight pseudo forms its own inheritance tree, parallel to the element tree. This means body::selection inherits from html::selection, not from ‘body’ itself.


At this point, you can probably see that the highlight pseudos are quite different from the rest of CSS, but there are also several special cases and rules needed to make them a coherent system.

For the typical appearance of spelling and grammar errors, highlight pseudos need to be able to add their own decorations, and they need to be able to leave the underlying foreground color unchanged. Highlight inheritance happens separately from the element tree, so we need some way to refer to the underlying foreground color.

That escape hatch is to set ‘color’ itself to ‘currentColor’, which is the default if nothing in the highlight tree sets ‘color’.

quick → quikc
quick → quikc
:root::spelling-error {
    /* color: currentColor; */
    text-decoration: red wavy underline;
}

This is a bit of a special case within a special case.

You see, ‘currentColor’ is usually defined as “the computed value of ‘color’”, but the way I like to think of it is “don’t change the foreground color”, and most color-valued properties like ‘text-decoration-color’ default to this value.

For ‘color’ itself that wouldn’t make sense, so we instead define ‘color:currentColor’ as equivalent to ‘color:inherit’, which still fits that mental model. But for highlights, that definition would no longer fit, so we redefine it as being the ‘color’ of the next active highlight below.

To make highlight inheritance actually useful for ‘text-decoration’ and ‘background-color’, all properties are inherited in highlight styles, even those that are not usually inherited.

quick fox
<style>
    aside::selection {
        background-color: yellow;
    }
</style>
<aside>
    <sup>quick</sup> fox
</aside>

This would conflict with the usual rules3 for decorating boxes, because descendants would get two decorations, one propagated and one inherited. We resolved this by making decorations added by highlights not propagate to any descendants.

quick fox
quick fox
quick fox
quikc fxo
<style>
    .blue {
        text-decoration: blue underline;
    }
    :root::spelling-error {
        text-decoration: red wavy underline;
    }
</style>
<div class="blue">
    <sup>quick</sup> fox
</div>
<div contenteditable spellcheck lang="en">
    <sup>quikc</sup> fxo
</div>

The blue decoration propagates to the sup element from the decorating box, so there should be a single line at the normal baseline. On the other hand, the spelling decoration is inherited by sup::spelling-error, so there should be separate lines for “quikc” and “fxo” at their respective baselines.

Unstyled highlight pseudos generally don’t change the appearance of the original content, so the default ‘color’ and ‘background-color’ in highlights are ‘currentColor’ and ‘transparent’ respectively, the latter being the property’s initial value. But two highlight pseudos, ::selection and ::target-text, have UA default foreground and background colors.

For compatibility with ::selection in older browsers, the UA default ‘color’ and ‘background-color’ (e.g. white on blue) is only used if neither were set by the author. This rule is known as paired cascade, and for consistency it also applies to ::target-text.

default on default plus more text
+
p { color: rebeccapurple; }
::selection { background: yellow; }
=currentColor on yellow plus more text

It’s common for selected text to almost invert the original text colors, turning black on white into white on blue, for example. To guarantee that the original decorations remain as legible as the text when highlighted, which is especially important for decorations with semantic meaning (e.g. line-through), originating decorations are recolored to the highlight ‘color’. This doesn’t apply to decorations added by highlights though, because that would break the typical appearance of spelling and grammar errors.

do not buy bread
do not buy bread
<style>
    del {
        text-decoration: darkred line-through;
    }
    ::selection {
        color: white;
        background: darkblue;
    }
</style>
<div>
    do <del>not</del> buy bread
</div>

This line-through decoration becomes white like the rest of the text when selected, even though it was explicitly set to ‘darkred’ in the original content.

The default style rules for highlight pseudos might look something like this. Notice the new ‘spelling-error’ and ‘grammar-error’ decorations, which authors can use to imitate native spelling and grammar errors.

:root::selection { background-color: Highlight; color: HighlightText; }
:root::target-text { background-color: Mark; color: MarkText; }
:root::spelling-error { text-decoration: spelling-error; }
:root::grammar-error { text-decoration: grammar-error; }

This doesn’t completely describe ::selection and ::target-text, due to paired cascade.


The way the highlight pseudos have been designed naturally leads to some limitations.

Gotchas

Removing decorations and shadows

Older browsers with ::selection tend to treat it purely as a way to change the original content’s styles, including text shadows and other decorations. Some tutorial content has even been written to that effect:

One of the most helpful uses for ::selection is turning off a text-shadow during selection. A text-shadow can clash with the selection’s background color and make the text difficult to read. Set text-shadow: none; to make text clear and easy to read during selection.

Under the spec, highlight pseudos can no longer remove or really change the original content’s decorations and shadows. Setting these properties in highlight pseudos to values other than ‘none’ adds decorations and shadows to the overlays when they are active.

del {
    text-decoration: line-through;
    text-shadow: 2px 2px red;
}
::highlight(undelete) {
    text-decoration: none;
    text-shadow: none;
}

This code means that ::highlight(undelete) adds no decorations or shadows, not that it removes the line-through and red shadow when del is highlighted.

While the new :has() selector might appear to offer a solution to this problem, pseudo-element selectors are not allowed in :has(), at least not yet.

del:has(::highlight(undelete)) {
    text-decoration: none;
    text-shadow: none;
}

This code does not work.

Removing shadows that might clash with highlight backgrounds (as suggested in the tutorial above) will no longer be as necessary anyway, since highlight backgrounds now paint on top of the original text shadows.

(Screenshots: the heavily text-shadowed “Faultlore” heading shown selected, comparing older browsers with the new highlight painting behaviour.)

If you still want to ensure those shadows don’t clash with highlights in older browsers, you can set ‘text-shadow’ to ‘none’, which is harmless in newer browsers.

::selection { text-shadow: none; }

This rule might be helpful for older browsers, but note that like any universal rule, it can interfere with inheritance of ‘text-shadow’ when combined with more specific rules.

As for line decorations, if you’re really determined, you can work around this limitation by using ‘-webkit-text-fill-color’, a standard property (believe it or not) that controls the foreground fill color of text4.

::highlight(undelete) {
    color: transparent;
    -webkit-text-fill-color: CanvasText;
}

This hack hides any original decorations (in visual media), because those decorations are recolored to the highlight ‘color’, but it might change the text color too.

Fun fact: because of ‘-webkit-text-fill-color’ and its stroke-related siblings, it isn’t always possible for highlight pseudos to avoid changing the foreground colors of text, at least not without out-of-band knowledge of what those colors are.

the quick fox
the quikc fox
p { color: blue; }
em {
    -webkit-text-fill-color: yellow;
    -webkit-text-stroke: 1px green;
}
:root::spelling-error {
    /* default styles */
    color: currentColor;
    -webkit-text-fill-color: currentColor;
    -webkit-text-stroke-color: currentColor;
    text-decoration: spelling-error;
}
em::spelling-error {
    /* styles needed to preserve text colors */
    -webkit-text-fill-color: yellow;
    -webkit-text-stroke: 1px green;
}

When a word in em is misspelled, it will become blue like the rest of p, unless the fill and stroke properties are set in ::spelling-error accordingly.

Accessing global constants

Highlight pseudos also don’t automatically have access to custom properties set in the element tree, which can make things tricky if you have a design system that exposes a color palette via custom properties on :root.

:root {
    --primary: #420420;
    --secondary: #C0FFEE;
    --accent: #663399;
}
::selection {
    background: var(--accent);
    color: var(--secondary);
}

This code does not work.

You can work around this by adding selectors for the necessary highlight pseudos to the rule defining the constants, or if the necessary highlight pseudos are unknown, by rewriting each constant as a custom @property rule.

:root, :root::selection {
    --primary: #420420;
    --secondary: #C0FFEE;
    --accent: #663399;
}
@property --primary {
    initial-value: #420420;
    syntax: "*"; inherits: false;
}
@property --secondary {
    initial-value: #C0FFEE;
    syntax: "*"; inherits: false;
}
@property --accent {
    initial-value: #663399;
    syntax: "*"; inherits: false;
}

Spec issues

While the design of the highlight pseudos has mostly settled, there are still some unresolved issues to watch out for.

  • how to use spelling and grammar decorations with the UA default colors (#7522)
  • values of non-applicable properties, e.g. ‘text-shadow’ with em units (#7591)
  • the meaning of underline- and emphasis-related properties in highlights (#7101)
  • whether ‘-webkit-text-fill-color’ and friends are allowed in highlights (#7580)
  • some browsers “tweak” the colors or alphas set in highlight styles (#6853)
  • how the highlight pseudos are supposed to interact with SVG (svgwg#894)

What now?

The highlight pseudos are a radical departure from older browsers with ::selection, and have some significant differences with CSS as we know it. Now that we have some experimental support, we want your help to play around with these features and help us make them as useful and ergonomic as possible before they’re set in stone.

Special thanks to Rego, Brian, Eric (Igalia), Florian, fantasai (CSSWG), Emilio (Mozilla), and Dan for their work in shaping the highlight pseudos (and this post). We would also like to thank Bloomberg for sponsoring this work.


  1. Dan, Fernando, Sanket, Luis, Bo, and anyone else I missed. 

  2. See this demo for more details. 

  3. CSSWG discussion also found that decorating box semantics are undesirable for decorations added by highlights anyway. 

  4. This is actually the case everywhere the WHATWG compat spec applies, at all times. If you think about it, the only reason why setting ‘color’ to ‘red’ makes your text red is because ‘-webkit-text-fill-color’ defaults to ‘currentColor’. 

September 01, 2022 03:00 PM

Andy Wingo

new month, new brainworm

Today, a brainworm! I had a thought a few days ago and can't get it out of my head, so I need to pass it on to another host.

So, imagine a world in which there is a drive to build a kind of Kubernetes on top of WebAssembly. Kubernetes nodes are generally containers, associated with additional metadata indicating their place in overall system topology (network connections and so on). (I am not a Kubernetes specialist, as you can see; corrections welcome.) Now in a WebAssembly cloud, the nodes would be components, probably also with additional topological metadata. VC-backed companies will duke it out for dominance of the WebAssembly cloud space, and in a couple years we will probably emerge with an open source project that has become a de-facto standard (though it might be dominated by one or two players).

In this world, Kubernetes and Spiffy-Wasm-Cloud will coexist. One of the success factors for Kubernetes was that you can just put your old database binary inside a container: it's the same ABI as when you run your database in a virtual machine, or on (so-called!) bare metal. The means of composition are TCP and UDP network connections between containers, possibly facilitated by some kind of network fabric. In contrast, in Spiffy-Wasm-Cloud we aren't starting from the kernel ABI, with processes and such: instead there's WASI, which is more of a kind of specialized and limited libc. You can't just drop in your database binary, you have to write code to get it to conform to the new interfaces.

One consequence of this situation is that I expect WASI and the component model to develop a rich network API, to allow WebAssembly components to interoperate not just with end-users but also other (micro-)services running in the same cloud. Likewise there is room here for a company to develop some complicated network fabrics for linking these things together.

However, WebAssembly-to-WebAssembly links are better expressed via typed functional interfaces; it's more expressive and can be faster. Not only can you end up having fine-grained composition that looks more like lightweight Erlang processes, you can also string together components in a pipeline with communications overhead approaching that of a simple function call. Relative to Kubernetes, there are potential 10x-100x improvements to be had, in throughput and in memory footprint, at least in some cases. It's the promise of this kind of improvement that can drive investment in this area, and eventually adoption.

But, you still have some legacy things running in containers. What to do? Well... Maybe recompile them to WebAssembly? That's my brain-worm.

A container is a file system image containing executable files and data. Starting with the executable files, they are in machine code, generally x64, and interoperate with system libraries and the run-time via an ABI. You could compile them to WebAssembly instead. You could interpret them as data, or JIT-compile them as webvm does, or directly compile them to WebAssembly. This is the sort of thing you hire Fabrice Bellard to do ;) Then you have the filesystem. Let's assume it is stateless: any change to the filesystem at runtime doesn't need to be preserved. (I understand this is a goal, though I could be wrong.) So you could put the filesystem in memory, as some kind of addressable data structure, and you make the libc interface access that data structure. It's something like the microkernel approach. And then you translate whatever topological connectivity metadata you had for Kubernetes to your Spiffy-Wasm-Cloud's format.

Anyway in the end you have a WebAssembly module and some metadata, and you can run it in your WebAssembly cloud. Or on the more basic level, you have a container and you can now run it on any machine with a WebAssembly implementation, even on other architectures (coucou RISC-V!).

Anyway, that's the tweet. Have fun, whoever gets to work on this :)

by Andy Wingo at September 01, 2022 10:12 AM

August 31, 2022

Qiuyi Zhang (Joyee)

Building V8 on an M1 MacBook

I’ve recently got an M1 MacBook and played around with it a bit. It seems many open source projects still haven’t added MacOS with ARM64

August 31, 2022 04:00 PM

August 29, 2022

Lauro Moura

Using Breakpad to generate crash dumps with WPE WebKit

Introduction and BreakPad overview

Breakpad is a tool from Google that helps generate crash reports. From its description:

Breakpad is a library and tool suite that allows you to distribute an application to users with compiler-provided debugging information removed, record crashes in compact "minidump" files, send them back to your server, and produce C and C++ stack traces from these minidumps. Breakpad can also write minidumps on request for programs that have not crashed.

It works by stripping the debug information from the executable and saving it into "symbol files." When a crash occurs or upon request, the Breakpad client library generates the crash information in these "minidumps." The Breakpad minidump processor combines these files with the symbol files and generates a human-readable stack trace. The following picture, also from Breakpad's documentation, describes this process:

Breakpad overview

In WPE, Breakpad support was added initially for the downstream 2.28 branch by Vivek Arumugam and backported upstream. It'll be available in the soon-to-be-released 2.38 version, and the WebKit Flatpak SDK has bundled the Breakpad client library since late May 2022.

Enabling Breakpad in WebKit

As a developer feature, Breakpad support is disabled by default but can be enabled by passing -DENABLE_BREAKPAD=1 to cmake when building WebKit. Optionally, you can also set -DBREAKPAD_MINIDUMP_DIR=<some path> to hardcode the path used by Breakpad to save the minidumps. If not set during build time, BREAKPAD_MINIDUMP_DIR must be set as an environment variable pointing to a valid directory when running the application. If defined, this variable also overrides the path defined during the build.
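
For example, when building through WebKit's helper scripts, this might look something like the following (the exact flags and paths will depend on your setup):

$ Tools/Scripts/build-webkit --wpe --release \
    --cmakeargs="-DENABLE_BREAKPAD=1 -DBREAKPAD_MINIDUMP_DIR=/home/lauro/minidumps"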

Generating the symbols

To generate the symbol files, Breakpad provides the dump_syms tool. It takes a path to the executable/library and dumps to stdout the symbol information.

Once generated, the symbol files must be laid out in a specific tree structure so minidump_stackwalk can find them when merging with the crash information. The folder containing the symbol files must match a hash code generated for that specific binary. For example, in the case of the libWPEWebKit:

$ dump_syms WebKitBuild/WPE/Release/lib/libWPEWebKit-1.1.so > libWPEWebKit-1.1.so.0.sym
$ head -n 1 libWPEWebKit-1.1.so.0.sym
MODULE Linux x86_64 A2DA230C159B97DC00000000000000000 libWPEWebKit-1.1.so.0
$ mkdir -p ./symbols/libWPEWebKit-1.1.so.0/A2DA230C159B97DC00000000000000000
$ cp libWPEWebKit-1.1.so.0.sym ./symbols/libWPEWebKit-1.1.so.0/A2DA230C159B97DC00000000000000000/

Generating the crash log

Besides the symbol files, we need a minidump file with the stack information, which can be generated in two ways. First, by asking Breakpad to create it. The other way is, well, when the application crashes :)

To generate a minidump manually, you can either call google_breakpad::WriteMiniDump(path, callback, context) or send one of the crashing signals Breakpad recognizes. The former is helpful to generate the dumps programmatically at specific points, while the signal approach might be helpful to inspect hanging processes. These are the signals Breakpad handles as crashing ones:

  • SIGSEGV
  • SIGABRT
  • SIGFPE
  • SIGILL (Note: this is for illegal instruction, not the ordinary SIGKILL)
  • SIGBUS
  • SIGTRAP

Now, first we must run Cog:

$ BREAKPAD_MINIDUMP_DIR=/home/lauro/minidumps ./Tools/Scripts/run-minibrowser --wpe --release https://www.wpewebkit.org

Crashing the WebProcess using SIGTRAP:

$ ps aux | grep WebProcess
<SOME-PID> ... /app/webkit/.../WebProcess
$ kill -TRAP <SOME-PID>
$ ls /home/lauro/minidumps
5c2d93f2-6e9f-48cf-6f3972ac-b3619fa9.dmp
$ file ~/minidumps/5c2d93f2-6e9f-48cf-6f3972ac-b3619fa9.dmp
/home/lauro/minidumps/5c2d93f2-6e9f-48cf-6f3972ac-b3619fa9.dmp: Mini DuMP crash report, 13 streams, Thu Aug 25 20:29:11 2022, 0 type

Note: In the current form, WebKit supports Breakpad dumps in the WebProcess and NetworkProcess, which WebKit spawns itself. The developer must manually add support for it in the UIProcess (the browser/application using WebKit). The exception handler should be installed as early as possible, and many programs might do some initialization before initializing WebKit itself.
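
For reference, installing the handler early in a Linux UIProcess might look roughly like this sketch (the dump directory, includes, and error handling are illustrative):

#include "client/linux/handler/exception_handler.h"
#include <cstdio>

static bool dumpCallback(const google_breakpad::MinidumpDescriptor& descriptor,
                         void* /*context*/, bool succeeded)
{
    // Called once the minidump has been written (or writing has failed).
    fprintf(stderr, "Minidump written to %s\n", descriptor.path());
    return succeeded;
}

int main(int argc, char** argv)
{
    // Install the handler before initializing anything else so early crashes are caught.
    google_breakpad::MinidumpDescriptor descriptor("/home/lauro/minidumps");
    google_breakpad::ExceptionHandler handler(descriptor, /*filter*/ nullptr,
                                              dumpCallback, /*context*/ nullptr,
                                              /*install_handler*/ true, /*server_fd*/ -1);

    // ... initialize WebKit, create the web view, run the main loop ...
    return 0;
}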

Translating the crash log

Once we have a .dmp crash log and the symbol files, we can use minidump_stackwalk to show the human-readable crash log:

$ minidump_stackwalk ~/minidumps/5c2d93f2-6e9f-48cf-6f3972ac-b3619fa9.dmp ./symbols/
<snip long header>
Operating system: Linux
                  0.0.0 Linux 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64
CPU: amd64
     family 23 model 113 stepping 0
     1 CPU

GPU: UNKNOWN

Crash reason:  SIGTRAP
Crash address: 0x3e800000000
Process uptime: not available

Thread 0 (crashed)
 0  libc.so.6 + 0xf71fd
    rax = 0xfffffffffffffffc   rdx = 0x0000000000000090
    rcx = 0x00007f6ae47d61fd   rbx = 0x00007f6ae4f0c2e0
    rsi = 0x0000000000000001   rdi = 0x000056187adbf6d0
    rbp = 0x00007fffa9268f10   rsp = 0x00007fffa9268ef0
     r8 = 0x0000000000000000    r9 = 0x00007f6ae4fdc2c0
    r10 = 0x00007fffa9333080   r11 = 0x0000000000000293
    r12 = 0x0000000000000001   r13 = 0x00007fffa9268f34
    r14 = 0x0000000000000090   r15 = 0x000056187ade7aa0
    rip = 0x00007f6ae47d61fd
    Found by: given as instruction pointer in context
 1  libglib-2.0.so.0 + 0x585ce
    rsp = 0x00007fffa9268f20   rip = 0x00007f6ae4efc5ce
    Found by: stack scanning
 2  libglib-2.0.so.0 + 0x58943
    rsp = 0x00007fffa9268f80   rip = 0x00007f6ae4efc943
    Found by: stack scanning
 3  libWPEWebKit-1.1.so.0!WTF::RunLoop::run() + 0x120
    rsp = 0x00007fffa9268fa0   rip = 0x00007f6ae8923180
    Found by: stack scanning
 4  libWPEWebKit-1.1.so.0!WebKit::WebProcessMain(int, char**) + 0x11e
    rbx = 0x000056187a5428d0   rbp = 0x0000000000000003
    rsp = 0x00007fffa9268fd0   r12 = 0x00007fffa9269148
    rip = 0x00007f6ae74719fe
    Found by: call frame info
<snip remaining trace>

Final words

This article briefly overviews enabling and using Breakpad to generate crash dumps. In a future article, we'll cover using Breakpad to get crashdumps while running WPE on embedded boards like RaspberryPis.

o/

by Lauro Moura at August 29, 2022 03:57 AM

August 23, 2022

Andy Wingo

accessing webassembly reference-typed arrays from c++

The WebAssembly garbage collection proposal is coming soonish (really!) and will extend WebAssembly with the capability to create and access arrays whose memory is automatically managed by the host. As long as some system component has a reference to an array, it will be kept alive, and as soon as nobody references it any more, it becomes "garbage" and is thus eligible for collection.

(In a way it's funny to define the proposal this way, in terms of what happens to garbage objects that by definition aren't part of the program's future any more; really the interesting thing is the new things you can do with live data, defining new data types and representing them outside of linear memory and passing them between components without copying. But "extensible-arrays-structs-and-other-data-types" just isn't as catchy as "GC". Anyway, I digress!)

One potential use case for garbage-collected arrays is for passing large buffers between parts of a WebAssembly system. For example, a webcam driver could produce a stream of frames as reference-typed arrays of bytes, and then pass them by reference to a sandboxed WebAssembly instance to, I don't know, identify cats in the images or something. You get the idea. Reference-typed arrays let you avoid copying large video frames.

A lot of image-processing code is written in C++ or Rust. With WebAssembly 1.0, you just have linear memory and no reference-typed values, which works well for these languages that like to think of memory as having a single address space. But once you get reference-typed arrays in the mix, you effectively have multiple address spaces: you can't address the contents of the array using a normal pointer, as you might be able to do if you mmap'd the buffer into a program's address space. So what do you do?

reference-typed values are special

The broader question of C++ and GC-managed arrays is, well, too broad for today. The set of array types is infinite, because it's not just arrays of i32, it's also arrays of arrays of i32, and arrays of those, and arrays of records, and so on.

So let's limit the question to just arrays of i8, to see if we can make some progress. So imagine a C function that takes an array of i8:

void process(array_of_i8 array) {
  // ?
}

If you know WebAssembly, there's a clear translation of the sort of code that we want:

(func (param $array (ref (array i8)))
  ; operate on local 0
  )

The WebAssembly function will have an array as a parameter. But, here we start to run into more problems with the LLVM toolchain that we use to compile C and other languages to WebAssembly. When the C front-end of LLVM (clang) compiles a function to the LLVM middle-end's intermediate representation (IR), it models all local variables (including function parameters) as mutable memory locations created with alloca. Later optimizations might turn these memory locations back to SSA variables and thence to registers or stack slots. But, a reference-typed value has no bit representation, and it can't be stored to linear memory: there is no alloca that can hold it.

Incidentally this problem is not isolated to future extensions to WebAssembly; the externref and funcref data types that landed in WebAssembly 2.0 and in all browsers are also reference types that can't be written to main memory. Similarly, the table data type, which is also part of shipping WebAssembly, is not dissimilar to GC-managed arrays, except that tables are statically allocated at compile-time.

At Igalia, my colleagues Paulo Matos and Alex Bradbury have been hard at work to solve this gnarly problem and finally expose reference-typed values to C. The full details and final vision are probably a bit too much for this article, but some bits on the mechanism will help.

Firstly, note that LLVM has a fairly traditional breakdown between front-end (clang), middle-end ("the IR layer"), and back-end ("the MC layer"). The back-end can be quite target-specific, and though it can be annoying, we've managed to get fairly good support for reference types there.

In the IR layer, we are currently representing GC-managed values as opaque pointers into non-default, non-integral address spaces. LLVM attaches an address space (an integer less than 2^24 or so) to each pointer, mostly for OpenCL and GPU sorts of use-cases, and we abuse this to prevent LLVM from doing much reasoning about these values.

This is a bit of a theme, incidentally: get the IR layer to avoid assuming anything about reference-typed values. We're fighting the system, in a way. As another example, because LLVM is really oriented towards lowering high-level constructs to low-level machine operations, it doesn't necessarily preserve types attached to pointers on the IR layer. Whereas for WebAssembly, we need exactly that: we reify types when we write out WebAssembly object files, and we need LLVM to pass some types through from front-end to back-end unmolested. We've had to change tack a number of times to get a good way to preserve data from front-end to back-end, and probably will have to do so again before we end up with a final design.

Finally on the front-end we need to generate an alloca in different address spaces depending on the type being allocated. And because reference-typed arrays can't be stored to main memory, there are semantic restrictions as to how they can be used, which need to be enforced by clang. Fortunately, this set of restrictions is similar enough to what is imposed by the ARM C Language Extensions (ACLE) for scalable vector (SVE) values, which also don't have a known bit representation at compile-time, so we can piggy-back on those. This patch hasn't landed yet, but who knows, it might land soon; in the mean-time we are going to run ahead of upstream a bit to show how you might define and use an array type definition. Further tacks here are also expected, as we try to thread the needle between exposing these features to users and not imposing too much of a burden on clang maintenance.

accessing array contents

All this is a bit basic, though; it just gives you enough to have a local variable or a function parameter of a reference-valued type. Let's continue our example:

void process(array_of_i8 array) {
  uint32_t sum = 0;
  for (size_t idx = 0; idx < __builtin_wasm_array_length(array); idx++)
    sum += (uint8_t)__builtin_wasm_array_ref_i8(array, idx);
  // ...
}

The most basic way to extend C to access these otherwise opaque values is to expose some builtins, say __builtin_wasm_array_length and so on. Probably you need different intrinsics for each scalar array element type (i8, i16, and so on), and one for arrays which return reference-typed values. We'll talk about arrays of references another day, but focusing on the i8 case, the C builtin then lowers to a dedicated LLVM intrinsic, which passes through the middle layer unscathed.

In C++ I think we can provide some nicer syntax which preserves the syntactic illusion of array access.
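
Purely as a hypothetical sketch (no such wrapper exists yet, and whether one can even hold a reference-typed value is subject to the same restrictions mentioned above), the kind of usage we might aim for is:

// wasm_i8_array is an invented wrapper type; size() and operator[] would
// lower to __builtin_wasm_array_length and __builtin_wasm_array_ref_i8.
uint32_t sum(wasm_i8_array array) {
  uint32_t total = 0;
  for (size_t idx = 0; idx < array.size(); idx++)
    total += array[idx];
  return total;
}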

I think this is going to be sufficient as an MVP, but there's one caveat: SIMD. You can indeed have an array of i128 values, but you can only access that array's elements as i128; worse, you can't load multiple data from an i8 array as i128 or even i32.

Compare this to the memory control proposal, which instead proposes to map buffers to non-default memories. In WebAssembly, you can in theory (and perhaps soon in practice) have multiple memories. The easiest way I can see on the toolchain side is to use the address space feature in clang:

void process(uint8_t *array __attribute__((address_space(42))),
             size_t len) {
  uint32_t sum = 0;
  for (size_t idx = 0; idx < len; idx++)
    sum += array[idx];
  // ...
}

How exactly to plumb the mapping between address spaces which can only be specified by number from the front-end to the back-end is a little gnarly; really you'd like to declare the set of address spaces that a compilation unit uses symbolically, and then have the linker produce a final allocation of memory indices. But I digress, it's clear that with this solution we can use SIMD instructions to load multiple bytes from memory at a time, so it's a winner with respect to accessing GC arrays.

Or is it? Perhaps there could be SIMD extensions for packed GC arrays. I think it makes sense, but it's a fair amount of (admittedly somewhat mechanical) specification and implementation work.

& future

In some future bloggies we'll talk about how we will declare new reference types: first some basics, then some more integrated visions for reference types and C++. Lots going on, and this is just a brain-dump of the current state of things; thoughts are very much welcome.

by Andy Wingo at August 23, 2022 10:35 AM

August 18, 2022

Andy Wingo

just-in-time code generation within webassembly

Just-in-time (JIT) code generation is an important tactic when implementing a programming language. Generating code at run-time allows a program to specialize itself against the specific data it is run against. For a program that implements a programming language, that specialization is with respect to the program being run, and possibly with respect to the data that program uses.

The way this typically works is that the program generates bytes for the instruction set of the machine it's running on, and then transfers control to those instructions.

Usually the program has to put its generated code in memory that is specially marked as executable. However, this capability is missing in WebAssembly. How, then, to do just-in-time compilation in WebAssembly?

webassembly as a harvard architecture

In a von Neumann machine, like the ones that you are probably reading this on, code and data share an address space. There's only one kind of pointer, and it can point to anything: the bytes that implement the sin function, the number 42, the characters in "biscuits", or anything at all. WebAssembly is different in that its code is not addressable at run-time. Functions in a WebAssembly module are numbered sequentially from 0, and the WebAssembly call instruction takes the callee as an immediate parameter.

So, to add code to a WebAssembly program, somehow you'd have to augment the program with more functions. Let's assume we will make that possible somehow -- that your WebAssembly module that had N functions will now have N+1 functions, and with function N being the new one your program generated. How would we call it? Given that the call instructions hard-code the callee, the existing functions 0 to N-1 won't call it.

Here the answer is call_indirect. A bit of a reminder, this instruction takes the callee as an operand, not an immediate parameter, allowing it to choose the callee function at run-time. The callee operand is an index into a table of functions. Conventionally, table 0 is called the indirect function table as it contains an entry for each function which might ever be the target of an indirect call.

With this in mind, our problem has two parts, then: (1) how to augment a WebAssembly module with a new function, and (2) how to get the original module to call the new code.

late linking of auxiliary webassembly modules

The key idea here is that to add code, the main program should generate a new WebAssembly module containing that code. Then we run a linking phase to actually bring that new code to life and make it available.

System linkers like ld typically require a complete set of symbols and relocations to resolve inter-archive references. However when performing a late link of JIT-generated code, we can take a short-cut: the main program can embed memory addresses directly into the code it generates. Therefore the generated module would import memory from the main module. All references from the generated code to the main module can be directly embedded in this way.

The generated module would also import the indirect function table from the main module. (We would ensure that the main module exports its memory and indirect function table via the toolchain.) When the main module makes the generated module, it also embeds a special patch function in the generated module. This function would add the new functions to the main module's indirect function table, and perform any relocations onto the main module's memory. All references from the main module to generated functions are installed via the patch function.

We plan on two implementations of late linking, but both share the fundamental mechanism of a generated WebAssembly module with a patch function.

dynamic linking via the run-time

One implementation of a linker is for the main module to cause the run-time to dynamically instantiate a new WebAssembly module. The run-time would provide the memory and indirect function table from the main module as imports when instantiating the generated module.

The advantage of dynamic linking is that it can update a live WebAssembly module without any need for re-instantiation or special run-time checkpointing support.

In the context of the web, JIT compilation can be triggered by the WebAssembly module in question, by calling out to functionality from JavaScript, or we can use a "pull-based" model to allow the JavaScript host to poll the WebAssembly instance for any pending JIT code.
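
On the web, the host side of that link step could be as small as the following sketch (readGeneratedModule is a hypothetical helper that copies the generated module's bytes out of linear memory; the import names follow the usual LLVM toolchain conventions):

// Instantiate the generated module against the main module's memory and
// indirect function table, then run its patch function.
const bytes = readGeneratedModule(main);
const { instance } = await WebAssembly.instantiate(bytes, {
  env: {
    memory: main.exports.memory,
    __indirect_function_table: main.exports.__indirect_function_table,
  },
});
instance.exports.patch();   // installs the new functions into the shared table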

For WASI deployments, you need a capability from the host. Either you import a module that provides run-time JIT capability, or you rely on the host to poll you for data.

static linking via wizer

Another idea is to build on Wizer's ability to take a snapshot of a WebAssembly module. You could extend Wizer to also be able to augment a module with new code. In this role, Wizer is effectively a late linker, linking in a new archive to an existing object.

Wizer already needs the ability to instantiate a WebAssembly module and to run its code. Causing Wizer to ask the module if it has any generated auxiliary module that should be instantiated, patched, and incorporated into the main module should not be a huge deal. Wizer can already run the patch function, to perform relocations to patch in access to the new functions. After having done that, Wizer (or some other tool) would need to snapshot the module, as usual, but also adding in the extra code.

As a technical detail, in the simplest case in which code is generated in units of functions which don't directly call each other, this is as simple as appending the functions to the code section and then appending the generated element segments to the main module's element segment, updating each appended function reference by adding the number of functions that were in the main module before the new code was concatenated.

late linking appears to be async codegen

From the perspective of a main program, WebAssembly JIT code generation via late linking appears the same as asynchronous code generation.

For example, take the C program:

struct Value;
struct Func {
  struct Expr *body;
  void *jitCode;
};

void recordJitCandidate(struct Func *func);
uint8_t* flushJitCode(); // Call to actually generate JIT code.

struct Value* interpretCall(struct Expr *body,
                            struct Value *arg);

struct Value* call(struct Func *func,
                   struct Value* val) {
  if (func->jitCode) {
    struct Value* (*f)(struct Value*) = (struct Value* (*)(struct Value*))func->jitCode;
    return f(val);
  } else {
    recordJitCandidate(func);
    return interpretCall(func->body, val);
  }
}

Here the C program allows for the possibility of JIT code generation: there is a slot in a Func instance to fill in with a code pointer. If this program generates code for a given Func, it won't be able to fill in the pointer -- it can't add new code to the image. But, it could tell Wizer to do so, and Wizer could snapshot the program, link in the new function, and patch &func->jitCode. From the program's perspective, it's as if the code becomes available asynchronously.

demo!

So many words, right? Let's see some code! As a sketch for other JIT compiler work, I implemented a little Scheme interpreter and JIT compiler, targeting WebAssembly. See interp.cc for the source. You compile it like this:

$ /opt/wasi-sdk/bin/clang++ -O2 -Wall \
   -mexec-model=reactor \
   -Wl,--growable-table \
   -Wl,--export-table \
   -DLIBRARY=1 \
   -fno-exceptions \
   interp.cc -o interplib.wasm

Here we are compiling with WASI SDK. I have version 14.

The -mexec-model=reactor argument means that this WASI module isn't just a run-once thing, after which its state is torn down; rather it's a multiple-entry component.

The two -Wl, options tell the linker to export the indirect function table, and to allow the indirect function table to be augmented by the JIT module.

The -DLIBRARY=1 is used by interp.cc; you can actually run and debug it natively but that's just for development. We're instead compiling to wasm and running with a WASI environment, giving us fprintf and other debugging niceties.

The -fno-exceptions is because WASI doesn't support exceptions currently. Also we don't need them.

WASI is mainly for non-browser use cases, but this module does so little that it doesn't need much from WASI and I can just polyfill it in browser JavaScript. So that's what we have here:

(The original post embeds an interactive wasm-jit demo at this point; if it doesn't load, see the wasm-jit web page for more information.)

Each time you enter a Scheme expression, it will be parsed into an internal tree-like intermediate language. You can then run a recursive interpreter over that tree by pressing the "Evaluate" button. Press it a number of times; you should get the same result each time.

As the interpreter runs, it records any closures that it created. The Func instances attached to the closures have a slot for a C++ function pointer, which is initially NULL. Function pointers in WebAssembly are indexes into the indirect function table; the first slot is kept empty so that calling a NULL pointer (a pointer with value 0) causes an error. If the interpreter gets to a closure call and the closure's function's JIT code pointer is NULL, it will interpret the closure's body. Otherwise it will call the function pointer.

If you then press the "JIT" button above, the module will assemble a fresh WebAssembly module containing JIT code for the closures that it saw at run-time. Obviously that's just one heuristic: you could be more eager or more lazy; this is just a detail.

Although the particular JIT compiler isn't of much interest in itself (the point is to see JIT code generation at all), it's nice to see that the fibonacci example gets a good speedup; try it yourself, and try it on different browsers if you can. Neat stuff!

not just the web

I was wondering how to get something like this working in a non-webby environment and it turns out that the Python interface to wasmtime is just the thing. I wrote a little interp.py harness that can do the same thing that we can do on the web; just run as `python3 interp.py`, after having `pip3 install wasmtime`:

$ python3 interp.py
...
Calling eval(0x11eb0) 5 times took 1.716s.
Calling jitModule()
jitModule result: <wasmtime._module.Module object at 0x7f2bef0821c0>
Instantiating and patching in JIT module
... 
Calling eval(0x11eb0) 5 times took 1.161s.

Interestingly it would appear that the performance of wasmtime's code (0.232s/invocation) is somewhat better than both SpiderMonkey (0.392s) and V8 (0.729s).
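If you want a feel for what such a harness involves, here is a hedged, much-simplified sketch using the wasmtime Python bindings. The wasmtime setup is standard; the interpreter's own exports (eval, jitModule) and the patching step are only described in comments, since their exact signatures are specific to the demo.

```python
from wasmtime import Engine, Linker, Module, Store, WasiConfig

engine = Engine()
store = Store(engine)
store.set_wasi(WasiConfig())          # provides fprintf and other WASI niceties

linker = Linker(engine)
linker.define_wasi()

module = Module.from_file(engine, "interplib.wasm")
instance = linker.instantiate(store, module)
exports = instance.exports(store)

exports["_initialize"](store)         # reactor-style module: run its constructors

# From here a harness like interp.py would:
#   1. call the interpreter's exported eval entry point a few times, letting
#      it record JIT candidates;
#   2. call jitModule() to obtain the generated auxiliary module;
#   3. compile that module, instantiate it with the main module's memory and
#      indirect function table as imports, and run its patch function so the
#      interpreter's function pointers point at the fresh code.
```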

reflections

This work is just a proof of concept, but it's a step in a particular direction. As part of previous work with Fastly, we enabled the SpiderMonkey JavaScript engine to run on top of WebAssembly. When combined with pre-initialization via Wizer, you end up with a system that can start in microseconds: fast enough to instantiate a fresh, shared-nothing module on every HTTP request, for example.

The SpiderMonkey-on-WASI work left out JIT compilation, though, because, you know, WebAssembly doesn't support JIT compilation. JavaScript code actually ran via the C++ bytecode interpreter. But as we just found out, actually you can compile the bytecode: just-in-time, but at a different time-scale. What if you took a SpiderMonkey interpreter, pre-generated WebAssembly code for a user's JavaScript file, and then combined them into a single freeze-dried WebAssembly module via Wizer? You get the benefits of fast startup while also getting decent baseline performance. There are many engineering considerations here, but as part of work sponsored by Shopify, we have made good progress in this regard; details in another missive.

I think a kind of "offline JIT" has a lot of value for deployment environments like Shopify's and Fastly's, and you don't have to limit yourself to "total" optimizations: you can still collect and incorporate type feedback, and you get the benefit of taking advantage of adaptive optimization without having to actually run the JIT compiler at run-time.

But if we think of more traditional "online JIT" use cases, it's clear that relying on host JIT capabilities, while a good MVP, is not optimal. For one, you would like to be able to freely emit direct calls from generated code to existing code, instead of having to call indirectly or via imports. I think it still might make sense to have a language run-time express its generated code in the form of a WebAssembly module, though really you might want native support for compiling that code (asynchronously) from within WebAssembly itself, without calling out to a run-time. Most people I have talked to that work on WebAssembly implementations in JS engines believe that a JIT proposal will come some day, but it's good to know that we don't have to wait for it to start generating code and taking advantage of it.

& out

If you want to play around with the demo, do take a look at the wasm-jit Github project; it's fun stuff. Happy hacking, and until next time!

by Andy Wingo at August 18, 2022 03:36 PM

August 15, 2022

Eric Meyer

Table Column Alignment with Variable Transforms

One of the bigger challenges of recreating The Effects of Nuclear Weapons for the Web was its tables.  It was easy enough to turn tab-separated text and numbers into table markup, but the column alignment almost broke me.

To illustrate what I mean, here are just a few examples of columns that had to be aligned.

A few of the many tables in the book and their fascinating column alignments.  (Hover/focus this figure to start a cyclic animation fading some alignment lines in and out. Sorry if that doesn’t work for you, mobile readers.)

At first I naïvely thought, “No worries, I can right- or left-align most of these columns and figure out the rest later.”  But then I looked at the centered column headings, and how the column contents were essentially centered on the headings while having their own internal horizontal alignment logic, and realized all my dreams of simple fixes were naught but ashes.

My next thought was to put blank spacer columns between the columns of visible content, since table layout doesn’t honor the gap property, and then set a fixed width for various columns.  I really didn’t like all the empty-cell spam that would require, even with liberal application of the rowspan attribute, and it felt overly fragile  —  any shifts in font face (say, on an older or niche system) might cause layout upset within the visible columns, such as wrapping content that shouldn’t be wrapped or content overlapping other content.  I felt like there was a better answer.

I also thought about segregating every number and symbol (including decimal separators) into separate columns, like this:

<tr>
  <th>Neutrinos from fission products</th>
  <td>10</td> 
  <td></td>
  <td></td>
</tr>
<tr class="total">
  <th>Total energy per fission</th>
  <td>200</td>
  <td>±</td>
  <td>6</td>
</tr>

Then I contemplated what that would do to screen readers and the document structure in general, and after the nausea subsided, I decided to look elsewhere.

It was at that point I thought about using spacer <span>s.  Like, anywhere I needed some space next to text in order to move it to one side or the other, I’d throw in something like one of these:

<span class="spacer"></span>
<span style="display: inline; width: 2ch;"></span>

Again, the markup spam repulsed me, but there was the kernel of an idea in there… and when I combined it with the truism “CSS doesn’t care what you expect elements to look or act like”, I’d hit upon my solution.

Let’s return to Table 1.43, which I used as an illustration in the announcement post.  It’s shown here in its not-aligned and aligned states, with borders added to the table-cell elements.

Table 1.43 before and after the cells are shifted to make their contents visually align.

This is exactly the same table, only with cells shifted to one side or another in the second case.  To make this happen, I first set up a series of CSS rules:

figure.table .lp1 {transform: translateX(0.5ch);}
figure.table .lp2 {transform: translateX(1ch);}
figure.table .lp3 {transform: translateX(1.5ch);}
figure.table .lp4 {transform: translateX(2ch);}
figure.table .lp5 {transform: translateX(2.5ch);}

figure.table .rp1 {transform: translateX(-0.5ch);}
figure.table .rp2 {transform: translateX(-1ch);}

For a given class, the table cell is translated along the X axis by the declared number of ch units.  Yes, that means the table cells sharing a column no longer actually sit in the column.  No, I don’t care — and neither, as I said, does CSS.

I chose the labels lp and rp for “left pad” and “right pad”, in part as a callback to the left-pad debacle of yore even though it has basically nothing to do with what I’m doing here.  (Many of my class names are private jokes to myself.  We take our pleasures where we can.)  The number in each class name represents the number of “characters” to pad, which here increment by half-ch measures.  Since I was trying to move things by characters, using the unit that looks like it’s a character measure (even though it really isn’t) made sense to me.

With those rules set up, I could add simple classes to table cells that needed to be shifted, like so:

<td class="lp3">5 ± 0.5</td>

<td class="rp2">10</td>

That was most of the solution, but it turned out to not be quite enough.  See, things like decimal places and commas aren’t as wide as the numbers surrounding them, and sometimes that was enough to prevent a specific cell from being able to line up with the rest of its column.  There were also situations where the data cells could all be aligned with each other, but were unacceptably offset from the column header, which was nearly always centered.

So I decided to calc() the crap out of this to add the flexibility a custom property can provide.  First, I set a sitewide variable:

body {
	--offset: 0ch;
}

I then added that variable to the various transforms:

figure.table .lp1 {transform: translateX(calc(0.5ch + var(--offset)));}
figure.table .lp2 {transform: translateX(calc(1ch   + var(--offset)));}
figure.table .lp3 {transform: translateX(calc(1.5ch + var(--offset)));}
figure.table .lp4 {transform: translateX(calc(2ch   + var(--offset)));}
figure.table .lp5 {transform: translateX(calc(2.5ch + var(--offset)));}

figure.table .rp1 {transform: translateX(calc(-0.5ch + var(--offset)));}
figure.table .rp2 {transform: translateX(calc(-1ch   + var(--offset)));}

Why use a variable at all?  Because it allows me to define offsets specific to a given table, or even specific to certain table cells within a table.  Consider the styles embedded along with Table 3.66:

#tbl3-66 tbody tr:first-child td:nth-child(1),
#tbl3-66 tbody td:nth-child(7) {
	--offset: 0.25ch;
}
#tbl3-66 tbody td:nth-child(4) {
	--offset: 0.1ch;	
}

Yeah. The first cell of the first row and the seventh cell of every row in the table body needed to be shoved over an extra quarter-ch, and the fourth cell in every table-body row (under the heading “Sp”) got a tenth-ch nudge.  You can judge the results for yourself.

So, in the end, I needed only sprinkle class names around table markup where needed, and add a little extra offset via a custom property that I could scope to exactly where needed.  Sure, the whole setup is hackier than a panel of professional political pundits, but it works, and to my mind, it beats the alternatives.

I’d have been a lot happier if I could have aligned some of the columns on a specific character.  I think I still would have needed the left- and right-pad approach, but there were a lot of columns where I could have reduced or eliminated all the classes.  A quarter-century ago, HTML 4 had this capability, in that you could write:

<COLGROUP>
	<COL>
	<COL>
	<COL align="±">
</COLGROUP>

CSS2 was also given this power via text-align, where you could give it a string value in order to specify horizontal alignment.

But browsers never really supported these features, even if some of them do still have bugs open on the issue.  (I chuckle aridly every time I go there and see “Opened 24 years ago” a few lines above “Status: NEW”.)  I know it’s not top of anybody’s wish list, but I wouldn’t mind seeing that capability return, somehow. Maybe as something that could be used in Grid column tracks as well as table columns.

I also found myself really pining for the ability to use attr() here, which would have allowed me to drop the classes and use data-* attributes on the table cells to say how far to shift them.  I could even have dropped the offset variable.  Instead, it could have looked something like this:

<td data-pad="3.25">5 ± 0.5</td>

<td data-pad="-1.9">10</td>

figure.table *[data-pad] {transform: translateX(attr(data-pad,'ch'));}

Alas, attr() is confined to the content property, and the idea of letting it be used more widely remains unrealized.

Anyway, that was my journey into recreating mid-20th-Century table column alignment on the Web.  It’s true that sufficiently old browsers won’t get the fancy alignment due to not supporting custom properties or calc(), but the data will all still be there.  It just won’t have the very specific column alignment, that’s all.  Hooray for progressive enhancement!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at August 15, 2022 02:41 PM

August 10, 2022

Javier Fernández

New Custom Handlers component for Chrome

The HTML Standard section on Custom Handlers describes the procedure to register custom protocol handlers for URLs with specific schemes. When a URL is followed, the protocol dictates which part of the system handles it by consulting registries. In some cases, handling may fall to the OS level and launch a desktop application (eg. mailto:// may launch a native application), or the browser may redirect the request to a different HTTP(S) URL (eg. mailto:// can go to Gmail).

There are multiple ways to invoke the registerProtocolHandler method from the HTML API. The most common is through Browser Extensions, or add-ons, where the user must install third-party software to add or modify a browser feature. However, this approach puts the responsibility on users to ensure the extension is secure and respects their privacy. Ideally, downstream browsers could ship built-in protocol handlers that take this load off of users, but this was previously difficult.

Over the last few years, Igalia and Protocol Labs have been working on improving support for this HTML API in several of the main web engines, with the long-term goal of making the IPFS protocol a first-class citizen in the most relevant browsers.

In this post I’m going to explain the most recent improvements, especially in Chrome, and some side advantages for Chromium embedders that we got on the way.

Componentization of the Custom Handler logic

The first challenge faced by Chromium-based browsers wishing to add support for a new protocol was the architecture. Most of the logic and relevant codebase lived in the //chrome layer. Thus, any attempt to change this behavior would require forking and patching the //chrome layer directly; the design simply wasn't meant to let Chromium embedders extend it with new features.

The Chromium project provides a public Content API that allows embedders to reuse a great part of the browser's common logic and, if needed, to implement specific behavior through specialization of these APIs. This would seem to be the ideal way of extending the browser's capabilities to handle new protocols. However, accessing //chrome from any other layer (eg. //content or //components) is forbidden and constitutes a layering violation.

After some discussion with Chrome engineers (special thanks to Kuniko Yasuda and Colin Blundell) we decided to move the Protocol Handlers logic to the //components layer and create a new Custom Handlers component.

The following diagram provides a high-level overview of the architectural change:

Custom Handlers logic moved to the //component layer
Changes in the Chrome’s component design

The new component makes the Protocol Handler logic Chrome independent, bringing us the possibility of using the Content Public API to introduce the mentioned behavior changes, as we’ll see later in this post.

Only the registry’s Delegate and the Factory classes will remain in the //chrome layer.

The Delegate class was defined to provide OS integration, like setting as default browser, which needs to access the underlying operating systems interfaces. It also deals with other browser-specific logic, like Profiles. With the new componentized design, embedders may provide their own Delegate implementations with different strategies to interact with the operating system’s desktop or even support additional flavors.

On the other hand, the Factory (in addition to creating the appropriate Delegate instance) provides a BrowserContext that allows the registry to operate under Incognito mode.

Let's now take a deeper look into the component, trying to understand the responsibilities of each class. The following class diagram illustrates the multi-process architecture of the Custom Handler logic:

Class diagram of the registerProtocolHandler logic in Chrome's multi-process architecture
Multi-process architecture class diagram

The ProtocolHandlerRegistry class has been modified to allow operating without user-preferences storage; this is particularly useful for moving most of the browser tests to the new component and avoiding the dependency on the prefs and user_prefs components. In this scenario the registered handlers are stored in memory only.

It’s worth mentioning that as part of this architectural change, I’ve applied some code cleanup and refactoring, following the policies dictated in the base::Value Code Health task-force. Some examples:

The ProtocolHandlerThrottle implements URL redirection with the handler provided by the registry. The ChromeContentBrowserClient and ShellBrowserContentClient classes use it to implement the CreateURLLoaderThrottles method of the ContentBrowserClient interface.

The specialized request is responsible for handling the permission prompt dialog response, except in the content-shell implementations, which bypass the Permission APIs (more on this later). I've also been working on a new design that relies on the PermissionContextBase::CreatePermissionRequest interface, so that we could create our own custom request and avoid the direct call to the PermissionRequestManager. Unfortunately this still needs some additional support to deal with the specific parameters that RegisterProtocolHandlerPermissionRequest needs when creating new instances, but I won't expand on this here, as it deserves its own post.

Finally, I'd like to note that the new component provides a simple registry factory, designed mainly for testing purposes. It's basically the same as the one implemented in the //chrome layer, except for some limitations of the new component:

  • No incognito mode
  • Use of a mocking Delegate (no actual OS interaction)

Single-Point security checks

I also wanted to apply some refactoring to improve the inter-process consistency and increase the amount of code shared between the renderer and browser processes.

One of the most relevant parts of this refactoring was the Custom Handlers' normalization process, which ranges from URL syntax validation to the security and privacy checks. I've landed some patches (CL#3507332, CL#3692750) introducing functions, shared between the renderer and browser processes, that implement this normalization procedure.

Refactoring to implement a single-point security and privacy checks
Security and privacy logic refactoring

The code has been moved to blink/common so that it can also be invoked by the new Custom Handlers component described before. Both WebContentsImpl (run by the Browser process) and NavigatorContentUtils (run by the Renderer process) rely on this common code now.

Additionally, I have moved all the security and privacy checks from the Browser class, in Chrome, to the WebContentsImpl class. This implies that any embedder that implements the WebContentsDelegate interface shouldn't need to worry about these checks, and can assume that any ProtocolHandler instance it receives is valid.

Conclusion

In the introduction I mentioned that the main goal of this refactoring was to decouple the Custom Handlers logic from Chrome's specific codebase; this allows us to modify or even extend the implementation of the Custom Handlers APIs.

The componentization of some parts of the Chrome’s codebase has many advantages, as it’s described in the Browser Components design document. Additionally, a new Custom Handlers component would provide other interesting advantages on different fronts.

One of the most interesting use cases is the definition of Web Platform Tests for the Custom Handlers APIs. This goal has 2 main challenges to address:

  • Testing Automation
  • Content Shell support

Finally, we would like this new component to be useful for Chrome embedders, and even for browsers built directly on top of a full-featured Chrome, when implementing new Custom Handlers-related features. This line of work should eventually be useful for improving support for distributed protocols like IPFS, as an intermediate step towards a native implementation of those protocols in the browser.

by jfernandez at August 10, 2022 10:19 AM

August 09, 2022

Eric Meyer

Recreating “The Effects of Nuclear Weapons” for the Web

In my previous post, I wrote about a way to center elements based on their content, without forcing the element to be a specific width, while preserving the interior text alignment.  In this post, I’d like to talk about why I developed that technique.

Near the beginning of this year, fellow Web nerd and nuclear history buff Chris Griffith mentioned a project to put an entire book online: The Effects of Nuclear Weapons by Samuel Glasstone and Philip J. Dolan, specifically the third (1977) edition.  Like Chris, I own a physical copy of this book, and in fact, the information and tools therein were critical to the creation of HYDEsim, way back in the Aughts.  I acquired it while in pursuit of my degree in History, for which I studied the Cold War and the policy effects of the nuclear arms race, from the first bombers to the Strategic Defense Initiative.

I was immediately intrigued by the idea and volunteered my technical services, which Chris accepted.  So we started taking the OCR output of a PDF scan of the book, cleaning up the myriad errors, re-typing the bits the OCR mangled too badly to just clean up, structuring it all with HTML, converting figures to PNGs and photos to JPGs, and styling the whole thing for publication, working after hours and in odd down times to bring this historical document to the Web in a widely accessible form.  The result of all that work is now online.

That linked page is the best example of the technique I wrote about in the aforementioned previous post: as a Table of Contents, none of the lines actually get long enough to wrap.  Rather than figuring out the exact length of the longest line and centering based on that, I just let CSS do the work for me.

There were a number of other things I invented (probably re-invented) as we progressed.  Footnotes appear at the bottom of pages when the footnote number is activated through the use of the :target pseudo-class and some fixed positioning.  It’s not completely where I wanted it to be, but I think the rest will require JS to pull off, and my aim was to keep the scripting to an absolute minimum.

LaTeX and MathJax made writing and rendering this sort of thing very easy.

I couldn’t keep the scripting to zero, because we decided early on to use MathJax for the many formulas and other mathematical expressions found throughout the text.  I’d never written LaTeX before, and was very quickly impressed by how compact and yet powerful the syntax is.

Over time, I do hope to replace the MathJax-parsed LaTeX with raw MathML for both accessibility and project-weight reasons, but as of this writing, Chromium lacks even halfway-decent MathML support, so we went with the more widely-supported solution.  (My colleague Frédéric Wang at Igalia is pushing hard to fix this sorry state of affairs in Chromium, so I do have hopes for a migration to MathML… some day.)

The figures (as distinct from the photos) throughout the text presented an interesting challenge.  To look at them, you’d think SVG would be the ideal image format. Had they come as vector images, I’d agree, but they’re raster scans.  I tried recreating one or two in hand-crafted SVG and quickly determined the effort to create each was significant, and really only worked for the figures that weren’t charts, graphs, or other presentations of data.  For anything that was a chart or graph, the risk of introducing inaccuracies was too high, and again, each would have required an inordinate amount of effort to get even close to correct.  That’s particularly true considering that without knowing what font face was being used for the text labels in the figures, they’d have to be recreated with paths or polygons or whatever, driving the cost-to-recreate astronomically higher.

So I made the figures PNGs that are mostly transparent, except for the places where there was ink on the paper.  After any necessary straightening and some imperfection cleanup in Acorn, I then ran the PNGs through the color-index optimization process I wrote about back in 2020, which got them down to an average of 75 kilobytes each, ranging from 443KB down to 7KB.

At the 11th hour, still secretly hoping for a magic win, I ran them all through svgco.de to see if we could get automated savings.  Of the 161 figures, exactly eight of them were made smaller, which is not a huge surprise, given the source material.  So, I saved those eight for possible future updates and plowed ahead with the optimized PNGs.  Will I return to this again in the future?  Probably.  It bugs me that the figures could be better, and yet aren’t.

It also bugs me that we didn’t get all of the figures and photos fully described in alt text.  I did write up alternative text for the figures in Chapter I, and a few of the photos have semi-decent captions, but this was something we didn’t see all the way through, and like I say, that bugs me.  If it also bugs you, please feel free to fork the repository and submit a pull request with good alt text.  Or, if you prefer, you could open an issue and include your suggested alt text that way.  By the image, by the section, by the chapter: whatever you can contribute would be appreciated.

Those image captions, by the way?  In the printed text, they’re laid out as a label (e.g., “Figure 1.02”) and then the caption text follows.  But when the text wraps, it doesn’t wrap below the label.  Instead, it wraps in its own self-contained block instead, with the text fully justified except for the last line, which is centered.  Centered!  So I set up the markup and CSS like this:

<figure>
	<img src="…" alt="…" loading="lazy">
	<figcaption>
		<span>Figure 1.02.</span> <span>Effects of a nuclear explosion.</span>
	</figcaption>
</figure>
figure figcaption {
	display: grid;
	grid-template-columns: max-content auto;
	gap: 0.75em;
	justify-content: center;
	text-align: justify;
	text-align-last: center;
}

Oh CSS Grid, how I adore thee.  And you too, CSS box alignment.  You made this little bit of historical recreation so easy, it felt like cheating.

Look at the way it’s all supposed to line up on the ± and one number doesn’t even have a ± and that decimal is just hanging out there in space like it’s no big deal.  LOOK AT IT.

Some other things weren’t easy.  The data tables, for example, have a tendency to align columns on the decimal place, even when most but not all of the numbers are integers.  Long, long ago, it was proposed that text-align be allowed a string value, something like text-align: '.', which you could then apply to a table column and have everything line up on that character.  For a variety of reasons, this was never implemented, a fact which frosts my windows to this day.  In general, I mean, though particularly so for this project.  The lack of it made keeping the presentation historically accurate a right pain, one I may get around to writing about, if I ever overcome my shame.  [Editor’s note: he overcame that shame.]

There are two things about the book that we deliberately chose not to faithfully recreate.  The first is the font face.  My best guess is that the book was typeset using something from the Century family, possibly Century Schoolbook (the New version of which was a particular favorite of mine in college).  The very-widely-installed Cambria seems fairly similar, at least to my admittedly untrained eye, and furthermore was designed specifically for screen media, so I went with body text styling no more complicated than this:

body {
	font: 1em/1.35 Cambria, Times, serif;
	hyphens: auto;
}

I suppose I could have tracked down a free version of Century and used it as a custom font, but I couldn’t justify the performance cost in both download and rendering speed to myself and any future readers.  And the result really did seem close enough to the original to accept.

The second thing we didn’t recreate is the printed-page layout, which is two-column.  That sort of layout can work very well on the book page; it almost always stinks on a Web page.  Thus, the content of the book is rendered online in a single column.  The exceptions are the chapter-ending Bibliography sections and the book’s Index, both of which contain content compact and granular enough that we could get away with the original layout.

There’s a lot more I could say about how this style or that pattern came about, and maybe someday I will, but for now let me leave you with this: all these decisions are subject to change, and open to input.  If you come up with a superior markup scheme for any of the bits of the book, we’re happy to look at pull requests or issues, and to act on them.  It is, as we say in our preface to the online edition, a living project.

We also hope that, by laying bare the grim reality of these horrific weapons, we can contribute in some small way to making them a dead and buried technology.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at August 09, 2022 05:28 PM

August 07, 2022

Andy Wingo

coarse or lazy?

sweeping, coarse and lazy

One of the things that had perplexed me about the Immix collector was how to effectively defragment the heap via evacuation while keeping just 2-3% of space as free blocks for an evacuation reserve. The original Immix paper states:

To evacuate the object, the collector uses the same allocator as the mutator, continuing allocation right where the mutator left off. Once it exhausts any unused recyclable blocks, it uses any completely free blocks. By default, immix sets aside a small number of free blocks that it never returns to the global allocator and only ever uses for evacuating. This headroom eases defragmentation and is counted against immix's overall heap budget. By default immix reserves 2.5% of the heap as compaction headroom, but [...] is fairly insensitive to values ranging between 1 and 3%.

To Immix, a "recyclable" block is partially full: it contains surviving data from a previous collection, but also some holes in which to allocate. But when would you have recyclable blocks at evacuation-time? Evacuation occurs as part of collection. Collection usually occurs when there's no more memory in which to allocate. At that point any recyclable block would have been allocated into already, and won't become recyclable again until the next trace of the heap identifies the block's surviving data. Of course after the next trace they could become "empty", if no object survives, or "full", if all lines have survivor objects.

In general, after a full allocation cycle, you don't know much about the heap. If you could easily know where the live data and the holes were, a garbage collector's job would be much easier :) Any algorithm that starts from the assumption that you know where the holes are can't be used before a heap trace. So, I was not sure what the Immix paper means here about allocating into recyclable blocks.

Thinking on it again, I realized that Immix might trigger collection early sometimes, before it has exhausted the previous cycle's set of blocks in which to allocate. As we discussed earlier, there is a case in which you might want to trigger an early compaction: when a large object allocator runs out of blocks to decommission from the immix space. And if one evacuating collection didn't yield enough free blocks, you might trigger the next one early, reserving some recyclable and empty blocks as evacuation targets.

when do you know what you know: lazy and eager

Consider a basic question, such as "how many bytes in the heap are used by live objects". In general you don't know! Indeed you often never know precisely. For example, concurrent collectors often have some amount of "floating garbage" which is unreachable data but which survives across a collection. And of course you don't know the difference between floating garbage and precious data: if you did, you would have collected the garbage.

Even the idea of "when" is tricky in systems that allow parallel mutator threads. Unless the program has a total ordering of mutations of the object graph, there's no one timeline with respect to which you can measure the heap. Still, Immix is a stop-the-world collector, and since such collectors synchronously trace the heap while mutators are stopped, these are times when you can exactly compute properties about the heap.

Let's retake the question of measuring live bytes. For an evacuating semi-space, knowing the number of live bytes after a collection is trivial: all survivors are packed into to-space. But for a mark-sweep space, you would have to compute this information. You could compute it at mark-time, while tracing the graph, but doing so takes time, which means delaying the time at which mutators can start again.

Alternately, for a mark-sweep collector, you can compute free bytes at sweep-time. This is the phase in which you go through the whole heap and return any space that wasn't marked in the last collection to the allocator, allowing it to be used for fresh allocations. This is the point in the garbage collection cycle in which you can answer questions such as "what is the set of recyclable blocks": you know what is garbage and you know what is not.

Though you could sweep during the stop-the-world pause, you don't have to; sweeping only touches dead objects, so it is correct to allow mutators to continue and then sweep as the mutators run. There are two general strategies: spawn a thread that sweeps as fast as it can (concurrent sweeping), or make mutators sweep as needed, just before they allocate (lazy sweeping). But this introduces a lag between when you know and what you know—your count of total live heap bytes describes a time in the past, not the present, because mutators have moved on since then.

For most collectors with a sweep phase, deciding between eager (during the stop-the-world phase) and deferred (concurrent or lazy) sweeping is very easy. You don't immediately need the information that sweeping allows you to compute; it's quite sufficient to wait until the next cycle. Moving work out of the stop-the-world phase is a win for mutator responsiveness (latency). Usually people implement lazy sweeping, as it is naturally incremental with the mutator, naturally parallel for parallel mutators, and any sweeping overhead due to cache misses can be mitigated by immediately using swept space for allocation. The case for concurrent sweeping is less clear to me, but if you have cores that would otherwise be idle, sure.
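As a hedged illustration (mine, not taken from any particular collector), here is roughly what lazy sweeping looks like on the allocation path: the mutator sweeps blocks on demand, just before allocating into them.

```python
class Block:
    """A coarse model of a heap block: how much of it was marked live, and
    whether the mutator has swept it since the last collection."""
    SIZE = 32 * 1024

    def __init__(self, live_bytes_after_mark):
        self.live = live_bytes_after_mark
        self.swept = False
        self.free = 0

    def sweep(self):
        # Return unmarked space to the allocator; details (building a free
        # list of holes, and so on) are elided.
        self.free = Block.SIZE - self.live
        self.swept = True

def lazy_allocate(size, blocks):
    for block in blocks:
        if not block.swept:
            block.sweep()            # sweeping happens on the allocation path
        if block.free >= size:
            block.free -= size
            return block
    return None                      # out of blocks: time to collect (or grow)
```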

eager coarse sweeping

Immix is interesting in that it chooses to sweep eagerly, during the stop-the-world phase. Instead of sweeping irregularly-sized objects, however, it sweeps over its "line mark" array: one byte for each 128-byte "line" in the mark space. For 32 kB blocks, there will be 256 bytes per block, and line mark bytes in each 4 MB slab of the heap are packed contiguously. Therefore you get relatively good locality, but this just mitigates a cost that other collectors don't have to pay. So what does eager marking over these coarse 128-byte regions buy Immix?
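Concretely, the eager coarse sweep amounts to a pass like the following sketch (a simplification of mine, using the figures above: one mark byte per 128-byte line, 256 lines per 32 kB block).

```python
LINES_PER_BLOCK = 256   # 32 kB block / 128-byte lines

def classify_block(line_marks):
    """Classify one block from its line-mark bytes: 0 means the line had no
    survivors in the last trace, non-zero means it did."""
    marked = sum(1 for mark in line_marks if mark != 0)
    if marked == 0:
        return "empty"        # immediately reusable, e.g. by the large object space
    if marked == LINES_PER_BLOCK:
        return "full"
    return "recyclable"       # has holes to allocate or evacuate into

# Example: a block with survivors in only its first two lines is recyclable.
assert classify_block([1, 1] + [0] * 254) == "recyclable"
```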

Firstly, eager sweeping buys you eager identification of empty blocks. If your large object space needs to steal blocks from the mark space, but the mark space doesn't have enough empties, it can just trigger collection and then it knows if enough blocks are available. If no blocks are available, you can grow the heap or signal out-of-memory. If the lospace (large object space) runs out of blocks before the mark space has used all recyclable blocks, that's no problem: evacuation can move the survivors of fragmented blocks into these recyclable blocks, which have also already been identified by the eager coarse sweep.

Without eager empty block identification, if the lospace runs out of blocks, firstly you don't know how many empty blocks the mark space has. Sweeping is a kind of wavefront that moves through the whole heap; empty blocks behind the wavefront will be identified, but those ahead of the wavefront will not. Such a lospace allocation would then have to either wait for a concurrent sweeper to advance, or perform some lazy sweeping work. The expected latency of a lospace allocation would thus be higher, without eager identification of empty blocks.

Secondly, eager sweeping might reduce allocation overhead for mutators. If allocation just has to identify holes and not compute information or decide on what to do with a block, maybe it go brr? Not sure.

lines, lines, lines

The original Immix paper also notes a relative insensitivity of the collector to line size: 64 or 256 bytes could have worked just as well. This was a somewhat surprising result to me but I think I didn't appreciate all the roles that lines play in Immix.

Obviously line size affects the worst-case fragmentation, though this is mitigated by evacuation (which evacuates objects, not lines). This I got from the paper. In this case, smaller lines are better.

Line size affects allocation-time overhead for mutators, though which way I don't know: scanning for holes will be easier with fewer lines in a block, but smaller lines would contain more free space and thus result in fewer collections. I can only imagine though that with smaller line sizes, average hole size would decrease and thus medium-sized allocations would be harder to service. Something of a wash, perhaps.

However if we ask ourselves the thought experiment, why not just have 16-byte lines? How crazy would that be? I think the impediment to having such a precise line size would mainly be Immix's eager sweep, as a fine-grained traversal of the heap would process much more data and incur possibly-unacceptable pause time overheads. But, in such a design you would do away with some other downsides of coarse-grained lines: a side table of mark bytes would make the line mark table redundant, and you eliminate much possible "dark matter" hidden by internal fragmentation in lines. You'd need to defer sweeping. But then you lose eager identification of empty blocks, and perhaps also the ability to evacuate into recyclable blocks. What would such a system look like?

Readers that have gotten this far will be pleased to hear that I have made some investigations in this area. But, this post is already long, so let's revisit this in another dispatch. Until then, happy allocations in all regions.

by Andy Wingo at August 07, 2022 09:44 AM

August 04, 2022

Igalia Compilers Team

Igalia’s Compilers Team in 2022H1

As we enter the second half of 2022, we’d like to provide a summary (necessarily highly condensed and selective!) of what we’ve been up to recently, providing some insight into the breadth of technical challenges our team of over 20 compiler engineers has been tackling.

Low-level JS / JSC on 32-bit systems

We have continued to maintain support for 32-bit systems (mainly ARMv7, but also MIPS) in JavaScriptCore (JSC). The work involves continuous tracking of upstream development to prevent regressions as well as the development of new features:

  • A major milestone for this has been the completion of support for WebAssembly in the Low-Level Interpreter (LLInt) for ARMv7. The MIPS support is mostly complete.
  • Developed an initial prototype of concurrent compilation in the DFG tier for 32-bit systems, and the results are promising. The work continues, and we expect to upstream it in 2022H2.
  • Code reduction and optimizations: we upstreamed several code reductions and optimizations for 32-bit systems (mainly ARMv7): 25% size reduction in DFGOSRExit blocks, 24% in baseline JIT on JetStream2 and 25% code size reduction from porting EXTRA_CTI_THUNKS.
  • Improved our hardware testing infrastructure with more MIPS and faster ARMv7 hardware for the buildbots running in the EWS (Early Warning System), which allows for a smaller response time for regressions.
  • Deployed two fuzzing bots that test JSC 24/7. The bots have already found a few issues upstream that we reported to Apple. The bugs that affect 64-bit systems were fixed by the team at Apple, while we are responsible for fixing the ones affecting 32-bit systems. We expect to work on them in 2022H2.
  • Added logic to transparently re-run failing JSC tests (on 32-bit platforms) and declare them a pass if they’re simply flaky, as long as the flakiness does not rise above a threshold. This means fewer false alerts for developers submitting patches to the EWS and for the people doing QA work. Naturally, the flakiness information is stored in the WebKit resultsdb and visualized at results.webkit.org.

JS and standards

Another aspect of our work is our contribution to the JavaScript standards effort, through involvement in the TC39 standards body, direct contribution to standards proposals, and implementation of those proposals in the major JS engines.

  • Further coverage of the Temporal spec in the Test262 conformance suite, as well as various specification updates. See this blog post for insight into some of the challenges tackled by Temporal.
  • Performance improvements for JS class features in V8, such as faster initialisations.
  • Work towards supporting snapshots in node.js, including activities such as fixing support for V8 startup snapshots in the presence of class field initializers.
  • Collaborating with others on the “types as comments” proposal for JS, successfully reaching stage 1 in the TC39 process.
  • Implementing ShadowRealm support in WebKit

WebAssembly

WebAssembly is a low-level compilation target for the web, which we have contributed to in terms of specification proposals, LLVM toolchain modifications, implementation work in the JS engines, and working with customers on use cases both on the server and in web browsers.

Some highlights from the last 6 months include:

  • Creation of a proposal for reference-typed strings in WebAssembly to ensure efficient operability with languages like JavaScript. We also landed patches to implement this proposal in V8.
  • Prototyping Just-In-Time (JIT) compilation within WebAssembly.
  • Working to implement support for WebAssembly GC types in Clang and LLVM (with one important use case being efficient and leak-free sharing of object graphs between JS and C++ compiled to Wasm).
  • Implementation of support for GC types in WebKit’s implementation of WebAssembly.

Events

With in-person meetups becoming possible again, Igalians have been talking on a range of topics – Multi-core Javascript (BeJS), TC39 (JS Nation), RISC-V LLVM (Cambridge RISC-V meetup), and more.

We’ve also had opportunities for much-needed face to face time within the team, with many of the compilers team meeting in Brussels in May, and for the company-wide summit held in A Coruña in June. These events provided a great opportunity to discuss current technical challenges, strategy, and ideas for the future, knowledge sharing, and of course socialising.

Team growth

Our team has grown further this year, being joined by:

  • Nicolò Ribaudo – a core maintainer of BabelJS, continuing work on that project after joining Igalia in June as well as contributing to work on JS modules.
  • Aditi Singh – previously worked with the team through the Coding Experience program, joining full time in March focusing on the Temporal project.
  • Alex Bradbury – a long-time LLVM developer who joined in March and is focusing on WebAssembly and RISC-V work in Clang/LLVM.

We’re keen to continue to grow the team and actively hiring, so if you think you might be interested in working in any of the areas discussed above, please apply here.

More about Igalia

If you’re keen to learn more about how we work at Igalia, a recent article at The New Stack provides a fantastic overview and includes comments from a number of customers who have supported the work described in this post.

by Compilers Team at August 04, 2022 07:30 AM

André Almeida

Keeping a project bisectable

People write code. Test coverage is never enough. Some angry contributor will disable the CI. And we all write bugs. But that's OK, it's part of the job. Programming is hard, and sometimes we may miss a corner case, forget that numbers overflow, and all the other strange things that computers can do. One easy thing we can do to help the poor developer who needs to find what change in the code stopped their printer from working properly is to keep the project bisectable.

August 04, 2022 12:00 AM

July 24, 2022

Danylo Piliaiev

July 2022 Turnip Status Update

Steady progress has been made since :tada: Turnip became Vulkan 1.1 conformant :tada:. We now support GL 4.6 via Zink, have implemented a lot of extensions, and are close to Vulkan 1.3 conformance.

Support for real-world games is also looking good; here is a video of an Adreno 660 rendering “The Witcher 3”, “The Talos Principle”, and “OMD2”:

All of them run at a reasonable frame rate. However, there was a bit of “cheating” involved: only “The Talos Principle” was fully running on the development board (via box64); the other two games were rendered in real time on the Adreno GPU but ran on an x86-64 laptop, with their Vulkan commands streamed to the dev board. You can read about this method in my post “Testing Vulkan drivers with games that cannot run on the target device”.

The video was captured directly on the device via OBS with obs-vkcapture, which worked surprisingly well after fighting through a bunch of issues caused by the lack of a binary package for it and a somewhat dated Ubuntu installation.

Zink (GL over Vulkan)

A number of extensions required for Zink to support higher GL versions were implemented. As of now Turnip supports OpenGL 4.6 via Zink, and while not yet conformant, only a handful of GL CTS tests are failing. For perspective, Freedreno (our GL driver for Adreno) supports only OpenGL 3.3.

For Zink adventures and profound post titles check out Mike Blumenkrantz’s awesome blog supergoodcode.com

If you are interested in the Zink-over-Turnip bring-up in particular, you should read:

Low Resolution Z improvements

A major improvement to the low-resolution Z (LRZ) optimization was recently made in Turnip; read about it in my previous post: LRZ on Adreno GPUs

Extensions

Anyway, since the last update Turnip supports many more extensions (in no particular order):

What about Vulkan conformance?

Screenshot of a mesamatrix.net website which shows how many extensions left for Turnip to implement to be Vulkan 1.3 conformant
From mesamatrix.net/#Vulkan1.3

For Vulkan 1.3 conformance there are only a few extensions left to implement. The only major ones are VK_KHR_dynamic_rendering and VK_EXT_inline_uniform_block. VK_KHR_dynamic_rendering is currently being reviewed, and the foundation for VK_EXT_inline_uniform_block was recently merged.

That’s all for today!

by Danylo Piliaiev at July 24, 2022 09:00 PM

July 20, 2022

Andy Wingo

unintentional concurrency

Good evening, gentle hackfolk. Last time we talked about heuristics for when you might want to compact a heap. Compacting garbage collection is nice and tidy and appeals to our orderly instincts, and it enables heap shrinking and reallocation of pages to large object spaces and it can reduce fragmentation: all very good things. But evacuation is more expensive than just marking objects in place, and so a production garbage collector will usually just mark objects in place, and only compact or evacuate when needed.

Today's post is more details!

dedication

Just because it's been, oh, a couple decades, I would like to reintroduce a term I learned from Marnanel years ago on advogato, a nerdy group blog kind of a site. As I recall, there is a word that originates in the Oxbridge social environment, "narg", from "Not A Real Gentleman", and which therefore denotes things that not-real-gentlemen do: nerd out about anything that's not, like, fox-hunting or golf; or generally spending time on something not because it will advance you in conventional hierarchies but because you just can't help it, because you love it, because it is just your thing. Anyway, in the spirit of pursuits that are really not What One Does With One's Time, this post is dedicated to the word "nargery".

side note, bis: immix-style evacuation versus mark-compact

In my last post I described Immix-style evacuation, and noted that it might take a few cycles to fully compact the heap, and that it has a few pathologies: the heap might never reach full compaction, and that Immix might run out of free blocks in which to evacuate.

With these disadvantages, why bother? Why not just do a single mark-compact pass and be done? I implicitly asked this question last time but didn't really answer it.

For some people the answer will be: yep, yebo, mark-compact is the right answer. And yet, there are a few reasons that one might choose to evacuate a fraction of the heap instead of compacting it all at once.

The first reason is object pinning. Mark-compact systems assume that all objects can be moved; you can't usefully relax this assumption. Most algorithms "slide" objects down to lower addresses, squeezing out the holes, and therefore every live object's address needs to be available to use when sliding down other objects with higher addresses. And yet, it would be nice sometimes to prevent an object from being moved. This is the case, for example, when you grant a foreign interface (e.g. a C function) access to a buffer: if garbage collection happens while in that foreign interface, it would be nice to be able to prevent garbage collection from moving the object out from under the C function's feet.

Another reason to want to pin an object is because of conservative root-finding. Guile currently uses the Boehm-Demers-Weiser collector, which conservatively scans the stack and data segments for anything that looks like a pointer to the heap. The garbage collector can't update such a global root in response to compaction, because you can't be sure that a given word is a pointer and not just an integer with an inconvenient value. In short, objects referenced by conservative roots need to be pinned. I would like to support precise roots at some point but part of my interest in Immix is to allow Guile to move to a better GC algorithm, without necessarily requiring precise enumeration of GC roots. Optimistic partial evacuation allows for the possibility that any given evacuation might fail, which makes it appropriate for conservative root-finding.

Finally, as moving objects has a cost, it's reasonable to want to only incur that cost for the part of the heap that needs it. In any given heap, there will likely be some data that stays live across a series of collections, and which, once compacted, can't be profitably moved for many cycles. Focussing evacuation on only the part of the heap with the lowest survival rates avoids wasting time on copies that don't result in additional compaction.

(I should admit one thing: sliding mark-compact compaction preserves allocation order, whereas evacuation does not. The memory layout of sliding compaction is more optimal than evacuation.)

multi-cycle evacuation

Say a mutator runs out of memory, and therefore invokes the collector. The collector decides for whatever reason that we should evacuate at least part of the heap instead of marking in place. How much of the heap can we evacuate? The answer depends primarily on how many free blocks you have reserved for evacuation. These are known-empty blocks that haven't been allocated into by the last cycle. If you don't have any, you can't evacuate! So probably you should keep some around, even when performing in-place collections. The Immix papers suggest 2% and that works for me too.

Then you evacuate some blocks. Hopefully the result is that after this collection cycle, you have more free blocks. But you haven't compacted the heap, at least probably not on the first try: not into 2% of total space. Therefore you tell the mutator to put any empty blocks it finds as a result of lazy sweeping during the next cycle onto the evacuation target list, and then the next cycle you have more blocks to evacuate into, and more and more and so on until after some number of cycles you fall below some overall heap fragmentation low-watermark target, at which point you can switch back to marking in place.

I don't know how this works in practice! In my test setup, which triggers compaction at 10% fragmentation and continues until it drops below 5%, it's rare that it takes more than 3 cycles of evacuation until the heap drops to effectively 0% fragmentation. Of course I had to introduce fragmented allocation patterns into the microbenchmarks to even cause evacuation to happen at all. I look forward to some day soon testing with real applications.
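The policy, in sketch form (the watermarks are just the ones from my test setup, not universal constants):

```python
HIGH_WATERMARK = 0.10   # start evacuating above 10% fragmentation
LOW_WATERMARK = 0.05    # keep evacuating until we drop below 5%

def should_evacuate(fragmentation, currently_evacuating):
    # Classic hysteresis: once evacuation has started, keep going (reserving
    # newly-found empty blocks as targets) until the low watermark is reached.
    if currently_evacuating:
        return fragmentation > LOW_WATERMARK
    return fragmentation > HIGH_WATERMARK
```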

concurrency

Just as a terminological note, in the world of garbage collectors, "parallel" refers to multiple threads being used by a garbage collector. Parallelism within a collector is essentially an implementation detail; when the world is stopped for collection, the mutator (the user program) generally doesn't care if the collector uses 1 thread or 15. On the other hand, "concurrent" means the collector and the mutator running at the same time.

Different parts of the collector can be concurrent with the mutator: for example, sweeping, marking, or evacuation. Concurrent sweeping is just a detail, because it just visits dead objects. Concurrent marking is interesting, because it can significantly reduce stop-the-world pauses by performing most of the computation while the mutator is running. It's tricky, as you might imagine; the collector traverses the object graph while the mutator is, you know, mutating it. But there are standard techniques to make this work. Concurrent evacuation is a nightmare. It's not that you can't implement it; you can. But it's very very hard to get an overall performance win from concurrent evacuation/copying.

So if you are looking for a good bargain in the marketplace of garbage collector algorithms, it would seem that you need to avoid concurrent copying/evacuation. It's an expensive product that would seem to not buy you very much.

All that is just a prelude to an observation that there is a funny source of concurrency even in some systems that don't see themselves as concurrent: mutator threads marking their own roots. To recall, when you stop the world for a garbage collection, all mutator threads have to somehow notice the request to stop, reach a safepoint, and then stop. Then the collector traces the roots from all mutators and everything they reference, transitively. Then you let the threads go again. Thing is, once you get more than a thread or four, stopping threads can take time. You'd be tempted to just have threads notice that they need to stop, then traverse their own stacks at their own safepoint to find their roots, then stop. But, this introduces concurrency between root-tracing and other mutators that might not have seen the request to stop. For marking, this concurrency can be fine: you are just setting mark bits, not mutating the roots. You might need to add an additional mark pattern that can be distinguished from marked-last-time and marked-the-time-before-but-dead-now, but that's a detail. Fine.
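
The reason this is safe for marking is worth spelling out: a thread's root traversal only sets mark bits, so even if another mutator is still running and using the same objects, nothing it can observe through the heap changes. A sketch of what each thread might do at its own safepoint (the mark-byte layout and helper names are assumptions for illustration):

#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

extern uint8_t *mark_byte_for(void *obj);  /* assumed: find obj's mark byte */
extern uint8_t current_mark_epoch;         /* rotating pattern, distinguishes old marks */

/* Runs at a mutator's safepoint, possibly while other mutators are still
   running: atomically setting a mark byte mutates no object the others use. */
static void mark_own_roots(void **roots, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (!roots[i])
      continue;
    atomic_store_explicit((_Atomic uint8_t *)mark_byte_for(roots[i]),
                          current_mark_epoch, memory_order_relaxed);
  }
}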

But if you instead start an evacuating collection, the gates of hell open wide and toothy maws and horns fill your vision. One thread could be stopping and evacuating the objects referenced by its roots, while another hasn't noticed the request to stop and is happily using the same objects: chaos! You are trying to make a minor optimization to move some work out of the stop-the-world phase but instead everything falls apart.

Anyway, this whole article was really to get here and note that you can't do ragged-stops with evacuation without supporting full concurrent evacuation. Otherwise, you need to postpone root traversal until all threads are stopped. Perhaps this is another argument that evacuation is expensive, relative to marking in place. In practice I haven't seen the ragged-stop effect making so much of a difference, but perhaps that is because evacuation is infrequent in my test cases.

Zokay? Zokay. Welp, this evening's nargery was indeed nargy. Happy hacking to all collectors out there, and until next time.

by Andy Wingo at July 20, 2022 09:26 PM

Víctor Jáquez

Gamepad in WPEWebkit

This is the brief story of the Gamepad implementation in WPEWebKit.

It started with an early development done by Eugene Mutavchi (kudos!). Later, by the end of 2021, I retook those patches and discussed them with my fellow igalian Adrián, and we decided to come up with a slightly different approach.

Before going into the details, let’s quickly review the WPE architecture:

  1. cog library — it’s a shell library that simplifies the task of writing a WPE browser from scratch, by providing common functionality and helper APIs.
  2. WebKit library — that’s the web engine that, given a URI and other inputs, returns, among other outputs, graphics buffers with the rendered page.
  3. WPE library — it’s the API that bridges cog (1) (or any other browser application) and WebKit (2).
  4. WPE backend — its main duty is to provide graphics buffers to WebKit, buffers supported by the hardware, the operating system, windowing system, etc.

Eugene’s implementation has code in WebKit (implementing the gamepad support for WPE port); code in WPE library with an API to communicate WebKit’s gamepad and WPE backend, which provided a custom implementation of gamepad, reading directly the event in the Linux device. Almost everything was there, but there were some issues:

  • WPE backend is mainly designed as a set of protocols, similar to Wayland, to deal with graphic buffers or audio buffers, but not with input events. The cog library is the place where input events, such as keyboard events, are handled and injected into WebKit.
  • The gamepad handling in a WPE backend was ad-hoc and low level, reading events directly from Linux devices. This approach is problematic since there are plenty of gamepads on the market and each has its own axes and buttons, so remapping them to the standard mapping is required. To overcome this issue and many others, there’s a GNOME library: libmanette, which is already used by the WebKitGTK port (see the sketch below).
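
To give an idea of what the libmanette side looks like, here is a rough sketch of a provider that listens for devices and button presses. The signal and function names are from the libmanette 0.2 API as I remember it, so treat them as approximate and check the real headers:

/* Rough sketch of consuming gamepad input through libmanette, the approach
   the cog implementation takes; names are approximate. */
#include <libmanette.h>

static void on_button_press(ManetteDevice *device, ManetteEvent *event, gpointer user_data) {
  guint16 button;
  if (manette_event_get_button(event, &button)) {
    g_message("button %u pressed on %s", button, manette_device_get_name(device));
    /* here cog would translate the event to the WPE gamepad API */
  }
}

static void on_device_connected(ManetteMonitor *monitor, ManetteDevice *device, gpointer user_data) {
  g_message("gamepad connected: %s", manette_device_get_name(device));
  g_signal_connect(device, "button-press-event", G_CALLBACK(on_button_press), NULL);
}

int main(void) {
  ManetteMonitor *monitor = manette_monitor_new();
  g_signal_connect(monitor, "device-connected", G_CALLBACK(on_device_connected), NULL);
  g_main_loop_run(g_main_loop_new(NULL, FALSE));  /* libmanette delivers events from the main loop */
  return 0;
}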

Today’s status of the gamepad support is that it works but it’s not yet fully upstreamed.

  • merged libwpe pull request.
  • cog pull request — there are two implementations: none and libmanette. None is just a dummy implementation which will ignore any request for a gamepad provider; it’s used if libmanette is not available, or if the available libwpe doesn’t have gamepad support.
  • WebKit pull request.

To prove to you all that it works, my exhibit A is this video, where I play Asteroids on a Raspberry Pi 4 (64 bits):

The image was built with Buildroot, using its master branch (from a week ago) with a bunch of modifications, such as adding libmanette, a kernel patch for my gamepad device, kernel 5.15.55 and its corresponding firmware, etc.

by vjaquez at July 20, 2022 10:08 AM

July 19, 2022

Manuel Rego

Some highlights of the Web Engines Hackfest 2022

Last month Igalia arranged a new edition of the Web Engines Hackfest in A Coruña (Galicia, Spain), where we brought together more than 70 people working on the web platform during 2 days, with lots of discussions and conversations around different features.

This was my first onsite event since “before times”, it was amazing seeing people for real after such a long time, meeting again some old colleagues and also a bunch of people for the first time. Being an organizer of the event meant that they were very busy days for me, but it looks like people were happy with the result and enjoyed the event quite a lot.

This is a brief post about my personal highlights during the event.

Talks

During the hackfest we had an afternoon with 5 talks; they were live streamed on YouTube so people could follow them remotely and also ask questions through the event matrix channel.

Leo Balter’s Talk

  • Leo Balter talked about how Salesforce participates on the web platform as partner, working with browsers and web standards.
    I really liked this talk, because it explains how companies that use the web platform can collaborate and have a direct impact on the evolution of the web. And there are many ways to do that: reporting bugs that affect your company, explaining use cases that are important to you and the things you miss from the platform, providing feedback about different features, looking for partners to fix outstanding issues or add support for new stuff, etc.
    Igalia has been showing during the last decade that there’s a way to have an impact on the web platform outside of the big companies and browser vendors. Thanks to our position on the different communities, we can help companies to push features they’re interested in and that would benefit the entire web platform in the future.
  • Dominik Röttsches gave a talk about COLRv1 fonts, giving details on the Chromium implementation and the different open-source software components involved.
    This new font format allows you to do really amazing things, and Dominik showed how to create a Galician emoji font with popular things like the Tower of Hercules or Polbo á feira, including some early demos of variable COLRv1 and the beginnings of the first Galician emoji font.
  • Daniel Minor explained the work done in Gecko and SpiderMonkey to refactor the internationalization system.
    Very interesting talk with lots of information and details about internationalization, going deep on text segmentation and how it works on different languages, and also introducing the ICU4X project.
  • Ada Rose Cannon did a great introduction to WebXR and Augmented Reality.
    Despite not being onsite, this was an awesome talk and the video was actually a very immersive experience. Ada explained many concepts and features around WebXR and Augmented Reality with a bunch of cool examples and demos.
  • Thomas Steiner talked about Project Fugu APIs that have been implemented in Chromium.
    Using the Web Engines Hackfest logo as example, he explained different new capabilities that Project Fugu is adding to the web through a real application called SVGcode.

It was a great set of talks, and you can now watch them all on YouTube. We hope you enjoy them if you haven’t had the chance to watch them yet.

CSS & Interop 2022

On the CSS breakout session we talked about all the new big features that are arriving in browsers these days, Container Queries and :has being probably the most notable examples: features that people have been requesting since the early days and that are shipping in browsers this year.

Apart from that, we talked about the Interop 2022 effort: how the target areas to improve interoperability are defined, and how much work some of them imply.

MathML & Fonts

Frédéric Wang did a nice presentation about MathML and all the work that has been done in recent years. The feature is close to shipping in Chromium (modulo finding some solution regarding printing, or waiting for LayoutNG to be ready to print), which will be a huge step forward for MathML, making it a feature supported in all the major browser engines.

Related to the MathML work there was some discussion around fonts, particularly OpenType MATH fonts; you can read Fred’s post for more details. There is some good news regarding this topic: the new macOS version includes STIX Two Math installed by default, and there are ongoing conversations to get some OpenType MATH font by default in Android too.

MathML Breakout Session

Accessibility & AOM

Valerie Young, who has recently started acting as co-chair of the ARIA Working Group, was leading a session around accessibility where we talked about ARIA and related things like AOM.

The Accessibility Object Model (AOM) is an effort that involves a lot of different things. In this session we talked about ARIA Attribute Reflection and the issues making accessible custom elements that use Shadow DOM, and that proposals like Cross-root ARIA Delegation are trying to solve.

Accessibility Breakout Session

Acknowledgements

To close this post I’d like to say thank you to everyone that participated in the Web Engines Hackfest; without your presence this event wouldn’t make any sense. I’d also like to thank the speakers for the great talks and the time devoted to working on them for this event. As usual, big thanks to the sponsors, Arm, Google and Igalia, for making this event possible once more. And thanks again to Igalia for letting me be part of the event organization.

Web Engines Hackfest 2022 Sponsors - Host & Organizer: Igalia. Gold Sponsors: Arm, Google and Igalia. Other Sponsors: Arm (Lunch sponsor)

July 19, 2022 10:00 PM

July 18, 2022

Víctor Jáquez

GstVA H.264 encoder, compositor and JPEG decoder

There are, right now, three new GstVA elements merged in main: vah264enc, vacompositor and vajpegdec.

Just to recap, GstVA is a GStreamer plugin in gst-plugins-bad (yes, we agree it’s not a great name anymore), to differentiate it from gstreamer-vaapi. Both plugins use libva to access stateless video processing operations; the main difference is, precisely, how the stream’s state is handled: while GstVA uses GStreamer libraries shared with other hardware-accelerated plugins (such as d3d11 and v4l2codecs), gstreamer-vaapi uses an internal, tightly coupled and convoluted library.

Also, note that right now (release 1.20) GstVA elements are ranked NONE, while gstreamer-vaapi ones are mostly PRIMARY+1.

Back to the three new elements in GstVA, the most complex one is vah264enc, written almost completely by He Junyan, from Intel. For it, He had to write an H.264 bitwriter which is, to a certain extent, the opposite of the H.264 parser: it constructs the bitstream buffer from H.264 structures such as PPS, SPS, slice header, etc. This API is part of libgstcodecparsers, ready to be reused by other plugins or applications. Currently vah264enc is fairly complete and functional, dealing with profiles and rate controls, among other parameters. It still has rough spots, but we’re working on them. And He Junyan is restless: he already has in the pipeline a common encoder class along with HEVC and AV1 encoders.
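
Since GstVA elements are ranked NONE in 1.20, autoplugging won’t pick them, so you have to request vah264enc explicitly. A minimal, hedged C example (the pipeline string is illustrative and only works if your driver actually exposes the element):

/* Sketch: encode a test pattern with vah264enc. Requires GStreamer 1.20+ with
   gst-plugins-bad built with VA support; the exact pipeline is illustrative. */
#include <gst/gst.h>

int main(int argc, char **argv) {
  gst_init(&argc, &argv);

  /* Explicitly request vah264enc; autoplugging won't, because of its NONE rank. */
  GError *error = NULL;
  GstElement *pipeline = gst_parse_launch(
      "videotestsrc num-buffers=300 ! video/x-raw,width=1280,height=720 "
      "! vah264enc ! h264parse ! mp4mux ! filesink location=out.mp4", &error);
  if (!pipeline) {
    g_printerr("failed to build pipeline: %s\n", error->message);
    return 1;
  }

  gst_element_set_state(pipeline, GST_STATE_PLAYING);

  /* Block until the stream finishes or an error is posted on the bus. */
  GstBus *bus = gst_element_get_bus(pipeline);
  GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
                                               GST_MESSAGE_EOS | GST_MESSAGE_ERROR);
  if (msg)
    gst_message_unref(msg);

  gst_element_set_state(pipeline, GST_STATE_NULL);
  gst_object_unref(bus);
  gst_object_unref(pipeline);
  return 0;
}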

The second element is vacompositor, written by Artie Eoff. It’s the replacement for vaapioverlay in gstreamer-vaapi. The compositor suffix is preferred to follow the name of the primary (software-based) video mixing element: compositor, successor of videomixer. See this discussion for further details. The purpose of this element is to compose a single video stream from multiple video streams. It works with Intel’s media-driver, supporting the alpha channel (in other words, a custom degree of transparency), and it also works with AMD Mesa Gallium, but without the alpha channel.

The last, but not least, element is vajpegdec, which I worked on. The main issue was not the decoder itself, but jpegparse, which didn’t signal the image caps required by the hardware-accelerated decoders. For instance, VA only decodes images with SOF marker 0 (Baseline DCT). It wasn’t needed before because the main and only consumer of the parser was jpegdec, which deals with any type of JPEG image. Long story short, we revamped jpegparse and now it signals the SOF marker, color space (YUV, RGB, etc.) and chroma subsampling (if it has a YUV color space), along with comments and EXIF-like metadata as pipeline tags. Thus vajpegdec will expose in its caps template the color spaces and chroma subsamplings supported by the driver. For example, Intel supports (more or less) the RGB color space, while AMD Mesa Gallium doesn’t.

And that’s all for now. Thanks.

by vjaquez at July 18, 2022 06:27 PM

July 12, 2022

Danylo Piliaiev

Low-resolution-Z on Adreno GPUs

Table of Contents

What is LRZ?

Citing official Adreno documentation:

[A Low Resolution Z (LRZ)] pass is also referred to as draw order independent depth rejection. During the binning pass, a low resolution Z-buffer is constructed, and can reject LRZ-tile wide contributions to boost binning performance. This LRZ is then used during the rendering pass to reject pixels efficiently before testing against the full resolution Z-buffer.

My colleague Samuel Iglesias did the initial reverse-engineering of this feature; for its in-depth overview you could read his great blog post Low Resolution Z Buffer support on Turnip.

Here are a few excerpts from that post describing what LRZ is:

To understand better how LRZ works, we need to talk a bit about tiled-based rendering. This is a way of rendering based on subdividing the framebuffer in tiles and rendering each tile separately.

The binning pass processes the geometry of the scene and records in a table on which tiles a primitive will be rendered. By doing this, the HW only needs to render the primitives that affect a specific tile when is processed.

The rendering pass gets the rasterized primitives and executes all the fragment related processes of the pipeline. Once it finishes, the resolve pass starts.

Where is LRZ used then? Well, in both binning and rendering passes. In the binning pass, it is possible to store the depth value of each vertex of the geometries of the scene in a buffer as the HW has that data available. That is the depth buffer used internally for LRZ. It has lower resolution as too much detail is not needed, which helps to save bandwidth while transferring its contents to system memory.

Thanks to LRZ, the rendering pass is only executed on the fragments that are going to be visible at the end.

LRZ brings a couple of things on the table that makes it interesting. One is that applications don’t need to reorder their primitives before submission to be more efficient, that is done by the HW with LRZ automatically.

Now, a year later, I returned to this feature to make some important improvements, for nitty-gritty details you could dive into Mesa MR#16251 “tu: Overhaul LRZ, implement on-GPU dir tracking and LRZ fast-clear”. There I implemented on-GPU LRZ direction tracking, LRZ reuse between renderpasses, and fast-clear of LRZ.

In this post I want to give practical advice, based on things I learnt while reverse-engineering this feature, on how to help the driver enable LRZ. Some of it may be self-evident, some is already written in the official docs, and some cannot be found there. It should be applicable to Vulkan, GLES, and likely Direct3D.

Do not change the direction of depth comparisons

Or rather, when writing depth, do not change the direction of depth comparisons. If the depth comparison direction is changed while writing into the depth buffer, LRZ has to be disabled.

Why? Because if the depth comparison direction is GREATER, LRZ stores the lowest depth value of a block of pixels; if the direction is LESS, it stores the highest value of the block. So if the direction is changed, the LRZ value becomes wrong for the new direction.

A few examples:

  • :thumbsup: Going from VK_COMPARE_OP_GREATER -> VK_COMPARE_OP_GREATER_OR_EQUAL is good;
  • :x: Going from VK_COMPARE_OP_GREATER -> VK_COMPARE_OP_LESS is bad;
  • :neutral_face: Going from VK_COMPARE_OP_GREATER with depth write -> VK_COMPARE_OP_LESS without depth write is ok;
    • LRZ would just be temporarily disabled for the VK_COMPARE_OP_LESS draw calls.

The rules could be summarized as:

  • Changing depth write direction disables LRZ;
  • For draw calls with a different direction but without depth write, LRZ is temporarily disabled;
  • VK_COMPARE_OP_GREATER and VK_COMPARE_OP_GREATER_OR_EQUAL have the same direction;
  • VK_COMPARE_OP_LESS and VK_COMPARE_OP_LESS_OR_EQUAL have the same direction;
  • VK_COMPARE_OP_EQUAL and VK_COMPARE_OP_NEVER don’t have a direction, so LRZ is temporarily disabled;
    • Surprise, your VK_COMPARE_OP_EQUAL compares don’t benefit from LRZ;
  • VK_COMPARE_OP_ALWAYS and VK_COMPARE_OP_NOT_EQUAL either temporarily or completely disable LRZ, depending on whether depth is written.
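
In Vulkan terms, keeping to these rules mostly means keeping depthCompareOp in the same direction across the pipelines that touch depth in a renderpass. A hedged sketch of two compatible pipeline states:

/* Sketch: two pipelines that stay in the GREATER "direction", so the driver
   can keep LRZ enabled. Values are illustrative, not a complete pipeline. */
#include <vulkan/vulkan.h>

VkPipelineDepthStencilStateCreateInfo opaque_pass_depth_state(void) {
  VkPipelineDepthStencilStateCreateInfo ds = {0};
  ds.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
  ds.depthTestEnable = VK_TRUE;
  ds.depthWriteEnable = VK_TRUE;                       /* writes depth...            */
  ds.depthCompareOp = VK_COMPARE_OP_GREATER;           /* ...in the GREATER direction */
  return ds;
}

VkPipelineDepthStencilStateCreateInfo decal_pass_depth_state(void) {
  VkPipelineDepthStencilStateCreateInfo ds = {0};
  ds.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
  ds.depthTestEnable = VK_TRUE;
  ds.depthWriteEnable = VK_FALSE;                      /* test-only pass...           */
  ds.depthCompareOp = VK_COMPARE_OP_GREATER_OR_EQUAL;  /* ...same direction, LRZ stays on */
  return ds;
}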

Simple rules for fragment shader

Do not write depth

This obviously makes the resulting depth value unpredictable, so LRZ has to be completely disabled.

Note that the output values of a manually written depth can be bounded by a conservative depth modifier; for GLSL this is achieved by the GL_ARB_conservative_depth extension, like this:

layout (depth_greater) out float gl_FragDepth;

However, Turnip at the moment does not consider this hint, and it is unknown if Qualcomm’s proprietary driver does.

Do not use Blending/Logic OPs/colorWriteMask

All of them make the new fragment value depend on the old fragment value. LRZ is temporarily disabled in this case.

Do not have side-effects in fragment shaders

Writing to SSBOs, images, … from a fragment shader forces late Z, so it is incompatible with LRZ. At the moment Turnip completely disables LRZ when a shader has such side effects.

Do not discard fragments

Discarding fragments moves the decision of whether a fragment contributes to the depth buffer to the time of fragment shader execution. LRZ is temporarily disabled in this case.

LRZ in secondary command buffers and dynamic rendering

TLDR: Since Snapdragon 865 (Adreno 650), LRZ is supported in secondary command buffers.

TLDR: LRZ would work with VK_KHR_dynamic_rendering, but you’d like to avoid using this extension because it isn’t nice to tilers.


Official docs state that LRZ is disabled with “Use of secondary command buffers (Vulkan)”, and on another page that “Snapdragon 865 and newer will not disable LRZ based on this criteria”.

Why?

Because up to Snapdragon 865, tracking of the direction is done on the CPU, meaning that the LRZ direction is kept in an internal renderpass object, updated and checked without any GPU involvement.

But starting from Snapdragon 865, the direction can be tracked on the GPU, which allows the driver not to know the previous LRZ direction during command buffer construction. Therefore secondary command buffers can now use LRZ!


Recently Vulkan 1.3 came out and mandated the support of VK_KHR_dynamic_rendering. It gets rid of the complicated VkRenderpass and VkFramebuffer setup, but much more exciting is a simpler way to construct renderpasses in parallel (with the VK_RENDERING_SUSPENDING_BIT / VK_RENDERING_RESUMING_BIT flags).

VK_KHR_dynamic_rendering poses a similar challenge for LRZ as secondary command buffers and has the same solution.

Reusing LRZ between renderpasses

TLDR: Since Snapdragon 865 (Adreno 650), LRZ works if you store depth in one renderpass and load it later, provided the depth image isn’t changed in between.


Another major improvement brought by Snapdragon 865 is the possibility to reuse LRZ state between renderpasses.

The on-GPU direction tracking is part of the equation here; another part is the tracking of the depth view being used. A depth image has a single LRZ buffer which corresponds to a single array layer + single mip level of the image. So if a view with a different array layer or mip level is used, the LRZ state can’t be reused and will be invalidated.

With the above knowledge, here are the conditions under which the LRZ state can be reused:

  • Depth attachment was stored (STORE_OP_STORE) at the end of some past renderpass;
  • The same depth attachment with the same depth view settings is being loaded (not cleared) in the current renderpass;
  • There were no changes in the underlying depth image, meaning there was no vkCmdBlitImage*, vkCmdCopyBufferToImage*, or vkCmdCopyImage*. Otherwise LRZ state would be invalidated;
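
Translated into VkAttachmentDescription terms, the reuse-friendly pattern looks roughly like this (a sketch, not a complete renderpass setup; stencil ops and layouts are illustrative):

/* Sketch: store depth at the end of the first renderpass and load (not clear)
   the same attachment in the next one, so the LRZ state can be carried over on
   Snapdragon 865+, assuming the depth image isn't touched in between. */
#include <vulkan/vulkan.h>

VkAttachmentDescription depth_attachment_first_pass(VkFormat depth_format) {
  VkAttachmentDescription a = {0};
  a.format = depth_format;
  a.samples = VK_SAMPLE_COUNT_1_BIT;
  a.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
  a.storeOp = VK_ATTACHMENT_STORE_OP_STORE;            /* keep depth (and LRZ state) around */
  a.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
  a.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
  return a;
}

VkAttachmentDescription depth_attachment_second_pass(VkFormat depth_format) {
  VkAttachmentDescription a = {0};
  a.format = depth_format;
  a.samples = VK_SAMPLE_COUNT_1_BIT;
  a.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;               /* load, don't clear: LRZ can be reused */
  a.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
  a.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
  a.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
  return a;
}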

Misc notes:

  • LRZ state is saved per depth image, so you don’t lose the state if you have several renderpasses with different depth attachments;
  • vkCmdClearAttachments + LOAD_OP_LOAD is just equal to LOAD_OP_CLEAR.

Conclusion

While there are many rules listed above - it all boils down to keeping things simple in the main renderpass(es) and not being too clever.

by Danylo Piliaiev at July 12, 2022 09:00 PM

July 07, 2022

Brian Kardell

Where Browsers Come From

Where Browsers Come From

In this post, I’ll talk about the evolution of the economics around browsers, and what we should think about as we move forward.

The very first web browsers were rather small, single-person affairs. The source for Tim Berners-Lee’s original WorldWideWeb (which included a WYSIWYG editor!) was roughly a third larger than the unminified build of jQuery. Tim even abstracted the HTTP bits so you could build your own browser more easily, or a server (or both/all in one). For the most part, we can say these were created in academia.

There was at least some assumption at the time that like most other available software, someone could monetize a good one as a product. In fact, before building his own, Tim famously tried to give the idea away to a company doing just that.

For a while we flirted with that idea in various forms, creating companies “around” browsers. These companies largely never “just sold the browser” but involved lots of experimenting with ways to keep browsers “free” for individuals by subsidizing them in other ways: corporate sales, licenses, support, server sales, and so on. With this concerted effort, and money, it was soon impractical for a one-person affair to keep up or compete. Instead, the competition pretty quickly became between Netscape and Microsoft.

Microsoft, on the other hand, subsidized the browser through products people were already buying. They were already spending money to add similar web features to many of those products, so why not just centralize it? They eventually just shipped it with the OS.

When it became clear that Netscape’s model was losing, the people there who really cared about the web had a real problem: Now what? Unless someone could do something radical, Microsoft’s way would be the only way.

And then, two interesting things happened…

In 1998, as a result, Mozilla was spun off from Netscape and Open Source in the modern sense was born. The browser known as Firefox wouldn’t even launch its 1.0 version until 2004.

At literally the same time, Google (the search engine) was developing and getting users. It existed for 6 years before it launched its IPO – also in 2004. The web had indexes and search engines before Google, and had some clear winners. But, it would be an understatement to say that Google was game-changing. It just absolutely nailed search with some totally new ideas. They made it simple and accurate.

Google was banking on what really good search meant to the Web: It made it practical and easy. The easier they made it, the more people turned to the web for answers, and so on. That meant people would just be searching, literally all the time - and each of those searches came with ad opportunities!

You can see just how important this is to getting around on the web today. We’ve replaced the URL bar with the Wonder Bar (integrated search). If it doesn’t look like a URL, it goes to a search engine - automatically… But it wasn’t always like that.

Just before Firefox 1.0, Mozilla signed a landmark “default search” deal which would pay for 85% of their total budget. It wasn’t the intent for that to be the only means forever. In fact, they tried to expand into product ideas too. However, this dependency has generally only increased, recently accounting for up to 95% of the total Mozilla budget. Over time, this model would become the dominant approach.

Also, by the release of Firefox 1.0 its codebase was up to 2.1 million lines of code.

How we pay for the web

Step back and look at the growth in the size and complexity of the 3 remaining engines over time…

browser ~lines of code
OG WWW 13,000
Firefox 1.0 2,100,000
Chromium 2022 34,900,821

Today, the cost to develop and maintain a browser is measured in the many hundreds of millions of dollars per year – each. And here’s the thing…

Most of the actual, practical revenue that has made the whole thing possible ever since is in a large sense driven by the model of monetizing browsers through default search.

In some cases this is pretty direct and easy to point to. In other cases, it’s a little obscured. However, I ask you to consider that companies (no matter how profitable) don’t voluntarily commit to bleeding hundreds of millions of dollars per year (and rising), forever. Something has to make it more economically viable - and today that thing is lots and lots of default search revenue.

Without it, the ledgers would suddenly go red, but default search deals ensure the books aren’t being drained of massive piles of money. In fact, they’re generating revenue.

It’s tempting to think “Well, so? It seems to be working pretty well? This is the way”.

…But, is it? Is it the only way? Should it always be the only way?

I kind of think not.

Starting a conversation about our future

We’ve never really just sold the browser, we’ve always subsidized browser development in other ways. I’m definitely not suggesting that users should have to pay to download and use a web browser, but I do think it’s useful to realize that that doesn’t exactly make browsers free either. Instead, it spreads out costs some other way that obscures those details from the conversation. From that angle, it might be a useful thought exercise to consider: What does the browser commons cost, per-user, per-year?

Based on what we know of team sizes, scales and budgets, several people seem to have arrived at a similar ballpark figure: somewhere around 2 billion dollars per year. Those are current costs to maintain all the current engines and make/keep them competitive. I expect that for many this will induce a kind of “sticker shock” and it will be tempting to think “wow, that’s inefficient, maybe we don’t need 3 engines after all.” However, let’s dig in a little further.

The web currently has about 5 billion users. So, a hypothetical per-user, per-year cost is less than 50 cents per year. As I said earlier, we’ve never required that, and we shouldn’t. We spread costs and subsidize them. But, let’s continue our thought exercise a bit further: If the wealthiest 1/5th of users instead paid $2 per year, that would still meet costs. If we keep going ‘up’, at $10 per year it would take a number of users comparable to the population of Brazil. Going further, if the wealthiest 1,000 companies equally subsidized the web, that would cost them $2 million each.

All of this is just to illustrate a few pretty simple things: The more we spread it out, the less it costs anyone.

Today, the search-ads-based model spreads those costs to everyone who advertises - which, it turns out, is a pretty large population with money. Fractions of fractions of the revenue generated from this actually make it back down to cover those enormous browser costs today.

They do this in exchange for certain things: At a minimum, this is in exchange for showing you lots of ads. However, many advertisers also think it’s valuable for them to know as much as possible about you in the process. And, well, at some level it’s worth keeping in mind that currently, they’re paying. If we want to change any aspect of that, we’ve got to reckon with that fact.

But there are lots of opportunities to spread out those costs beyond that, aren’t there?

Downstream

Open Source has a lot of great qualities. Among them, it lets ideas smash together and new projects can get spun up which wouldn’t otherwise be economically possible.

It’s important to realize that the engines/open source browser projects are all cost. It’s the actual deployed browsers that are built with them that are currently monetizable in this way at all. But there are lots of other things built with these projects too, many of which have also become lucritive projects.

The current open source model doesn’t require them to give anything back. That’s not to say none do - some do (substantially even), but it’s entirely voluntary and many don’t at all. It just so happens that the flagship browsers who sponsor an engine/open source browser project have so many users that they can lose quite a lot to people only taking without upsetting this delicate balance.

You’ll note that while I said those flagship browsers not only pay for development, they generate revenue with default search - there’s still only a tiny few organizations on the planet willing to maintain an engine/browser. A big part of that is that that’s only possible if you are a flagship browser.

In my view, we need to work on that on several fronts. First, we need to acknowledge it’s a problem and begin trying to change the conversation. Thankfully, we’re seeing lots of things developing here, like Open Collective, and we’re starting to see people increasingly pointing that out. A few weeks ago, Nicholas C. Zakas (@slicknet) wrote Sponsoring dependencies: The next step in open source sustainability which is worth a read. However, no matter how we make money trickle through the system, we also need to find new ways of making it appealing to put it in in the first place. Making business investment into the commons tax-deductible would probably go a long way.

Ads

Another interesting part of that is that many of those who make those lucrative products but don’t give back do then advertise them. One is a very easy pitch (ads), the other is hard to justify even at a fraction of the cost. So, advertising can tap into completely different departments in the same company, for different reasons. I think there’s something important in that distinction that we should look at too.

For example, “plentiful ads” isn’t even the only ads model! Conversely, some kinds of ad opportunities are worth a lot more because they aren’t plentiful. They are exclusive. The Super Bowl, for example, lasts only a few hours and raises more money than Firefox’s current annual budget in that time. It’s worth noting that even with its currently reduced marketshare, more people use Firefox than watch the Super Bowl! (Google search makes about 2 orders of magnitude more in a quarter than that, btw). Maybe there’s something there worth thinking about.

But… search?

You might have noticed a huge flaw in all of this discussion: It’s been focused entirely on how the browser commons is funded by search, but it’s left off the fact that even if we funded browsers a completely different way - they still need to search, and that is where all the money currently is. Attacking only one end of this would sort of make it worse.

Yeah, this is kind of the perfect mousetrap we’ve built for ourselves.

That original search deal was so killer because they realized just how fundamental integrated search would be to just using the web and … I hate to point it out again, but that is also not free. In fact, Google’s search engine costs a lot more than their web browser engine… And someone has to pay for that too, and it is again, the central money spigot.

I’m genuinely not sure how to address that. It’s harder to call search engines we have today “the commons”, though search integration is clearly fundamental to it. That seems much more challenging to tackle. It is something of an optimization problem and involves businesses and incentives and users making choices about things I think most people are not even especially well equipped to understand.

The one thing I can say for sure though is: If we don’t talk about problems, or even begin to agree they might be problems, it’s pretty hard to improve them. So, let’s start doing that.

July 07, 2022 04:00 AM

July 06, 2022

Clayton Craft

Deleting stuff is fun and helpful!

Debugging problems during software development can be frustrating, even if it's immensely rewarding once you eventually figure it out. This is especially true when reproducing the problem seemingly requires some complex dance through tons of other parts of the project. Frustration is further exacerbated when the language used is one you're still trying to master.

There's one technique I've learned through the School of Hard Knocks to deal with problems that are overly complex in one form or another: simplify it

What do I mean? I mean start removing / commenting out code that is seemingly unrelated. Try to simplify the input into the function or whatever into the simplest form that causes the problem to happen. Pull out portions of code that you think are related to the problem and build a new program in an attempt to reproduce the problem with some skeleton of the original code base / configuration. With source control, you're not really at risk of "losing" anything, so go wild with removing things. Your goal is to recreate the problem at hand with as simple of a program (and input!) as possible.

Why would you want to do this? I'm glad you asked!

Easier to spot the root cause

Oftentimes in the process of simplifying it, I notice the cause of the problem. "Oops, I'm pretty sure I didn't actually mean to do that." - me, a lot.

Easier for comparisons

If the problem magically goes away after simplifying it, then I have a valuable starting point for diffing the broken program against. If it's still not obvious, then I start re-adding components (with attempts to reproduce in between additions) until I can cause the problem to happen again.

Easier to ask for help

If I simplify it and the problem still happens, it's much easier to share a simplified version when asking for help. Somewhat unsurprisingly, people are generally more willing to help if they don't have to spend half a day studying the entire codebase to get a handle on how things are supposed to work. Allowing them to also reproduce the problem more easily can only help.

Easier to turn into a test

Simplifying the code to reproduce the problem can include removing unrelated functionality and trying to mock out unrelated dependencies. This is like... 90-something percent of the way towards writing a test. So once the issue is discovered and resolved, I already have a test ready to go that can confirm the fix and help guard against it happening again in the future! Win!

The next time you're baffled while debugging by some weird behavior, just start deleting stuff! It's fun! And it'll probably help!

July 06, 2022 12:00 AM

July 05, 2022

Alberto Garcia

Running the Steam Deck’s OS in a virtual machine using QEMU

SteamOS desktop

Introduction

The Steam Deck is a handheld gaming computer that runs a Linux-based operating system called SteamOS. The machine comes with SteamOS 3 (code name “holo”), which is in turn based on Arch Linux.

Although there is no SteamOS 3 installer for a generic PC (yet), it is very easy to install on a virtual machine using QEMU. This post explains how to do it.

The goal of this VM is not to play games (you can already install Steam on your computer after all) but to use SteamOS in desktop mode. The Gamescope mode (the console-like interface you normally see when you use the machine) requires additional development to make it work with QEMU and will not work with these instructions.

A SteamOS VM can be useful for debugging, development, and generally playing and tinkering with the OS without risking breaking the Steam Deck.

Running the SteamOS desktop in a virtual machine only requires QEMU and the OVMF UEFI firmware and should work in any relatively recent distribution. In this post I’m using QEMU directly, but you can also use virt-manager or some other tool if you prefer, we’re emulating a standard x86_64 machine here.

General concepts

SteamOS is a single-user operating system and it uses an A/B partition scheme, which means that there are two sets of partitions and two copies of the operating system. The root filesystem is read-only and system updates happen on the partition set that is not active. This allows for safer updates, among other things.

There is one single /home partition, shared by both partition sets. It contains the games, user files, and anything that the user wants to install there.

Although the user can trivially become root, make the root filesystem read-write and install or change anything (the pacman package manager is available), this is not recommended because

  • it increases the chances of breaking the OS, and
  • any changes will disappear with the next OS update.

A simple way for the user to install additional software that survives OS updates and doesn’t touch the root filesystem is Flatpak. It comes preinstalled with the OS and is integrated with the KDE Discover app.

Preparing all the necessary files

The first thing that we need is the installer. For that we have to download the Steam Deck recovery image from here: https://store.steampowered.com/steamos/download/?ver=steamdeck&snr=

Once the file has been downloaded, we can uncompress it and we’ll get a raw disk image called steamdeck-recovery-4.img (the number may vary).

Note that the recovery image is already SteamOS (just not the most up-to-date version). If you simply want to have a quick look you can play a bit with it and skip the installation step. In this case I recommend that you extend the image before using it, for example with ‘truncate -s 64G steamdeck-recovery-4.img‘ or, better, create a qcow2 overlay file and leave the original raw image unmodified: ‘qemu-img create -f qcow2 -F raw -b steamdeck-recovery-4.img steamdeck-recovery-extended.qcow2 64G‘

But here we want to perform the actual installation, so we need a destination image. Let’s create one:

$ qemu-img create -f qcow2 steamos.qcow2 64G

Installing SteamOS

Now that we have all files we can start the virtual machine:

$ qemu-system-x86_64 -enable-kvm -smp cores=4 -m 8G \
    -device usb-ehci -device usb-tablet \
    -device intel-hda -device hda-duplex \
    -device VGA,xres=1280,yres=800 \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/ovmf/OVMF.fd \
    -drive if=virtio,file=steamdeck-recovery-4.img,driver=raw \
    -device nvme,drive=drive0,serial=badbeef \
    -drive if=none,id=drive0,file=steamos.qcow2

Note that we’re emulating an NVMe drive for steamos.qcow2 because that’s what the installer script expects. This is not strictly necessary but it makes things a bit easier. If you don’t want to do that you’ll have to edit ~/tools/repair_device.sh and change DISK and DISK_SUFFIX.

SteamOS installer shortcuts

Once the system has booted we’ll see a KDE Plasma session with a few tools on the desktop. If we select “Reimage Steam Deck” and click “Proceed” on the confirmation dialog then SteamOS will be installed on the destination drive. This process should not take a long time.

Now, once the operation finishes a new confirmation dialog will ask if we want to reboot the Steam Deck, but here we have to choose “Cancel”. We cannot use the new image yet because it would try to boot into the Gamescope session, which won’t work, so we need to change the default desktop session.

SteamOS comes with a helper script that allows us to enter a chroot after automatically mounting all SteamOS partitions, so let’s open a Konsole and make the Plasma session the default one in both partition sets:

$ sudo steamos-chroot --disk /dev/nvme0n1 --partset A
# steamos-readonly disable
# echo '[Autologin]' > /etc/sddm.conf.d/zz-steamos-autologin.conf
# echo 'Session=plasma.desktop' >> /etc/sddm.conf.d/zz-steamos-autologin.conf
# steamos-readonly enable
# exit

$ sudo steamos-chroot --disk /dev/nvme0n1 --partset B
# steamos-readonly disable
# echo '[Autologin]' > /etc/sddm.conf.d/zz-steamos-autologin.conf
# echo 'Session=plasma.desktop' >> /etc/sddm.conf.d/zz-steamos-autologin.conf
# steamos-readonly enable
# exit

After this we can shut down the virtual machine. Our new SteamOS drive is ready to be used. We can discard the recovery image now if we want.

Booting SteamOS and first steps

To boot SteamOS we can use a QEMU line similar to the one used during the installation. This time we’re not emulating an NVMe drive because it’s no longer necessary.

$ cp /usr/share/OVMF/OVMF_VARS.fd .
$ qemu-system-x86_64 -enable-kvm -smp cores=4 -m 8G \
   -device usb-ehci -device usb-tablet \
   -device intel-hda -device hda-duplex \
   -device VGA,xres=1280,yres=800 \
   -drive if=pflash,format=raw,readonly=on,file=/usr/share/ovmf/OVMF.fd \
   -drive if=pflash,format=raw,file=OVMF_VARS.fd \
   -drive if=virtio,file=steamos.qcow2 \
   -device virtio-net-pci,netdev=net0 \
   -netdev user,id=net0,hostfwd=tcp::2222-:22

(the last two lines redirect tcp port 2222 to port 22 of the guest to be able to SSH into the VM. If you don’t want to do that you can omit them)

If everything went fine, you should see KDE Plasma again, this time with a desktop icon to launch Steam and another one to “Return to Gaming Mode” (which we should not use because it won’t work). See the screenshot that opens this post.

Congratulations, you’re running SteamOS now. Here are some things that you probably want to do:

  • (optional) Change the keyboard layout in the system settings (the default one is US English)
  • Set the password for the deck user: run ‘passwd‘ on a terminal
  • Enable / start the SSH server: ‘sudo systemctl enable sshd‘ and/or ‘sudo systemctl start sshd‘.
  • SSH into the machine: ‘ssh -p 2222 deck@localhost‘

Updating the OS to the latest version

The Steam Deck recovery image doesn’t install the most recent version of SteamOS, so now we should probably do a software update.

  • First of all ensure that you’re giving enough RAM to the VM (in my examples I run QEMU with -m 8G). The OS update might fail if you use less.
  • (optional) Change the OS branch if you want to try the beta release: ‘sudo steamos-select-branch beta‘ (or main, if you want the bleeding edge)
  • Check the currently installed version in /etc/os-release (see the BUILD_ID variable)
  • Check the available version: ‘steamos-update check‘
  • Download and install the software update: ‘steamos-update‘

Note: if the last step fails after reaching 100% with a post-install handler error then go to Connections in the system settings, rename Wired Connection 1 to something else (anything, the name doesn’t matter), click Apply and run steamos-update again. This works around a bug in the update process. Recent images fix this and this workaround is not necessary with them.

As we did with the recovery image, before rebooting we should ensure that the new update boots into the Plasma session, otherwise it won’t work:

$ sudo steamos-chroot --partset other
# steamos-readonly disable
# echo '[Autologin]' > /etc/sddm.conf.d/zz-steamos-autologin.conf
# echo 'Session=plasma.desktop' >> /etc/sddm.conf.d/zz-steamos-autologin.conf
# steamos-readonly enable
# exit

After this we can restart the system.

If everything went fine we should be running the latest SteamOS release. Enjoy!

Reporting bugs

SteamOS is under active development. If you find problems or want to request improvements please go to the SteamOS community tracker.

Edit 06 Jul 2022: Small fixes, mention how to install the OS without using NVMe.

by berto at July 05, 2022 07:11 PM

July 01, 2022

Claudio Saavedra

Fri 2022/Jul/01

I wrote a technical overview of the WebKit WPE project for the WPE WebKit blog, for those interested in WPE as a potential solution to the problem of browsers in embedded devices.

This article begins a series of technical write-ups on the architecture of WPE, and we hope to publish further articles during the rest of the year breaking down different components of WebKit, including graphics and other subsystems, which will surely be of great help for those interested in getting more familiar with WebKit and its internals.

July 01, 2022 10:39 AM

June 29, 2022

Patrick Griffis

WebExtension Support in Epiphany

I’m excited to help bring WebExtensions to Epiphany (GNOME Web) thanks to investment from my employer Igalia. In this post, I’ll go over a summary of how extensions work and give details on what Epiphany supports.

Web browsers have supported extensions in some form for decades. They allow the creation of features that would otherwise be part of a browser but can be authored and experimented with more easily. They’ve helped develop and popularize ideas like ad blocking, password management, and reader modes. Sometimes, as in very popular cases like these, browsers themselves then begin trying to apply lessons upstream.

Toward universal support

For most of this history, web extensions have used incompatible browser-specific APIs. This began to change in 2015 with Firefox adopting an API similar to Chrome’s. In 2020, Safari also followed suit. We now have the foundations of an ecosystem-wide solution.

“The foundations of” is an important thing to understand: There are still plenty of existing extensions built with browser-specific APIs and this doesn’t magically make them all portable. It does, however, provide a way towards making portable extensions. In some cases, existing extensions might just need some porting. In other cases, they may utilize features that aren’t entirely universal yet (or, may never be).

Bringing Extensions to Epiphany

With version 43.alpha Epiphany users can begin to take advantage of some of the same powerful and portable extensions described above. Note that there are quite a few APIs that power this and with this release we’ve covered a meaningful segment of them but not all (details below). Over time our API coverage and interoperability will continue to grow.

What WebExtensions can do: Technical Details

At a high level, WebExtensions allow a private privileged web page to run in the browser. This is an invisible Background Page that has access to a browser JavaScript API. This API, given permission, can interact with browser tabs, cookies, downloads, bookmarks, and more.

Along with the invisible background page, it gives a few options to show a UI to the user. One such method is a Browser Action which is shown as a button in the browser’s toolbar that can popup an HTML view for the user to interact with. Another is an Options Page dedicated to configuring the extension.

Lastly, an extension can inject JavaScript directly into any website it has permissions for via Content Scripts. These scripts are given full access to the DOM of any web page they run in. However, content scripts don’t have access to the majority of the browser API; but, along with the above pages, they have the ability to send and receive custom JSON messages to all pages within an extension.

Example usage

For a real-world example, I use Bitwarden as my password manager, so I’ll give a simplified description of how it roughly functions. Firstly, there is a Background Page that does account management for your user. It has a Popup that the user can trigger to interface with your account, passwords, and options. Finally, it also injects Content Scripts into every website you open.

The Content Script can detect all input fields and then wait for a message to autofill information into them. The Popup can request the details of the active tab and, upon you selecting an account, send a message to the Content Script to fill this information. This flow does function in Epiphany now but there are still some issues to iron out for Bitwarden.

Epiphany’s current support

Epiphany 43.alpha supports the basic structure described above. We are currently modeling our behavior after Firefox’s ManifestV2 API which includes compatibility with Chrome extensions where possible. Supporting ManifestV3 is planned alongside V2 in the future.

As of today, we support the majority of:

  • alarms - Scheduling of events to trigger at specific dates or times.
  • commands - Keyboard shortcuts.
  • cookies - Management and querying of browser cookies.
  • downloads - Ability to start and manage downloads.
  • menus - Creation of context menu items.
  • notifications - Ability to show desktop notifications.
  • storage - Storage of extension private settings.
  • tabs - Control and monitoring of browser tabs, including creating, closing, etc.
  • windows - Control and monitoring of browser windows.

A notable missing API is webRequest which is commonly used by blocking extensions such as uBlock Origin or Privacy Badger. I would like to implement this API at some point however it requires WebKitGTK improvements.

For specific API details please see Epiphany’s documentation.

What this means today is that users of Epiphany can write powerful extensions using a well-documented and commonly used format and API. What this does not mean is that most extensions for other browsers will just work out of the box, at least not yet. Cross-browser extensions are possible but they will have to only require the subset of APIs and behaviors Epiphany currently supports.

How to install extensions

This support is still considered experimental so do understand this may lead to crashes or other unwanted behavior. Also please report issues you find to Epiphany rather than to extensions.

You can install the development release and test it like so:

flatpak remote-add --if-not-exists gnome-nightly https://nightly.gnome.org/gnome-nightly.flatpakrepo
flatpak install gnome-nightly org.gnome.Epiphany.Devel
flatpak run --command=gsettings org.gnome.Epiphany.Devel set org.gnome.Epiphany.web:/org/gnome/epiphany/web/ enable-webextensions true

You will now see Extensions in Epiphany’s menu and if you run it from the terminal it will print out any message logged by extensions for debugging. You can download extensions most easily from Mozilla’s website.

June 29, 2022 04:00 AM

June 24, 2022

Tim Chevalier

on impostor syndrome, or: worry dies last

According to Sedgwick, it was just this kind of interchange that fueled her emotional re-education. She came to see that the quickness of her mind was actually holding back her progress, because she expected emotional change to be as easy to master as a new theory: “It’s hard to recognize that your whole being, your soul doesn’t move at the speed of your cognition,” she told me. “That it could take you a year to really know something that you intellectually believe in a second.” She learned “how not to feel ashamed of the amount of time things take, or the recalcitrance of emotional or personal change.”

Maria Russo, “The reeducation of a queer theorist”, 1999

My colleague Ioanna Dimitriou told me “worry dies last”, and it made me remember this passage from an interview with Eve Kosofsky Sedgwick.

It’s especially common in fields where people’s work is constantly under review by talented peers, such as academia or Open Source Software, or taking on a new job.

Geek Feminism Wiki, “Impostor Syndrome”

At the end of 2012/beginning of 2013 I wrote a four-part blog post about my experiences with impostor syndrome. That led to me getting invited to speak on episode 113 of the “Ruby Rogues” podcast, which was dedicated to impostor syndrome. (Unfortunately, from what I can tell, their web site is gone.)

Since then, my thinking about impostor syndrome has changed.

“Impostor syndrome” is an entirely rational behavior for folks who do get called impostors (ie. many underrepresented people). It’s part coping mechanism, part just listening to the feedback you’re getting….

We call it “impostor syndrome”, but we’re not sick. The real sickness is an industry that calls itself a meritocracy but over and over and over fails to actually reward merit.

This is fixable. It will take doing the work of rooting out bias in all its forms, at all levels – and critically, in who gets chosen to level up. So let’s get to work.

Leigh Honeywell, “Impostor Syndrome”, 2016

I agree with everything Leigh wrote here. Impostor syndrome, like any response to past or ongoing trauma, is not a pathology. It’s a reasonable adaptation to an environment that places stresses on your mind and body that exhaust your resources for coping with those demands. I wrote a broader post about this point in 2016, called “Stop Moralizing About Personality Traits”.

Acceptance is the first step towards change. By now, I’ve spent over a decade consciously reckoning with the repercussions of growing up and into young adulthood without emotional support, both on the micro-level (family and intimate relationships) and the macro-level (being a perennial outsider with no home on either side of a variety of social borders: for example, those of gender, sexuality, disability, culture, and nationality). When I started my current job last year, I wasn’t over it. That made it unnecessarily hard to get started and put up a wall between me and any number of people who might have offered help if they’d only known what I was going through. I’m still not over it.

To recognize, and name as a problem, the extent to which my personality has been shaped by unfair social circumstances: that was one step. Contrary to my acculturation as an engineer, the next step is not “fix the problem”. In fact, there is no patch you can simply apply to your own inner operating system, because all of your conscious thoughts run in user space. Maybe you can attach a debugger to your own kernel, but some changes can’t be made to a running program without a cold reboot. I don’t recommend trying that at home.

Learning to identify impostor syndrome (or, as you might call it, “dysfunctional environment syndrome”; or, generalizing, “complex trauma” or “structural violence”) is one step, but a bug report isn’t the same thing as a passing regression test. As with free software, improvement has to come a little bit at a time, from many different contributors; there are few successful projects with a single maintainer.

I am ashamed of the amount of time things take, of looking like a senior professional on the outside as long as my peers don’t know (or aren’t thinking about) how I’ve never had a single job in tech for more than two years, about what it was like for me to move from job to job never picking up enough momentum to accomplish anything that felt real to me. I wonder whether they think I’ve got it all figured out, which I don’t, but it often feels easier to just let people think that and suffer in silence. Learning to live with trauma requires trusting relationships; you can’t do it on your own. But the trauma itself makes it difficult to impossible to trust and to enter into genuine relationships.

I am not exaggerating when I say that my career has been traumatic for me; it has both echoed much older traumas and created entirely new ones. That’s a big part of why I had to share how I felt about finally meeting more of my co-workers in person. I’m 41 years old and I feel like I should be better at this by now. I’m not. But I’ll keep trying, if it takes a lifetime.

by Tim Chevalier at June 24, 2022 12:52 PM

June 23, 2022

Tim Chevalier

we belong

I am about 70% robot and 30% extremely sentimental and emotional person, generally in series rather than in parallel. But last week’s Igalia summit was a tidal wave of feelings, unexpected but completely welcome. Some of those feelings are ones I’ve already shared with the exact people who need to know, but there are some that I need to share with the Internet. I am hoping I’m not the only one who feels this way, though I don’t think I am.

A lot of us are new and this was our first summit. Meeting 60 or 70 people who were previously 2D faces on a screen for half an hour a week, at best, was intense. I was told by others that reuniting with long-time friends/colleagues/comrades/whatever words you want to use (and it’s hard to find the exact right one for a workplace like this) who they hadn’t seen since pre-pandemic was intense as well.

For me, there was more to it. I doubt I’m alone in this either, but it might explain why I’m feeling so strongly.

I tried to quit tech in 2015. I couldn’t afford to in the end, and I went to Google. They fired me for (allegedly) discriminating against white men, in late 2017. I decided it was time to quit again. I became an EMT and then a patient care coordinator, and applied to nursing schools. I got rejected. I decided I didn’t want to try again because I had learned that unless I became a physician, working in health care would never give me the respect I need. Unfortunately, I have an ego. I like to think that I balance it out with empathy more than some people in tech do, but it’s still there.

I got a DM in 2018 from some guy I knew from Twitter asking if I wanted to apply to Igalia, and I waited three years to take him up on it. Now I’m here.

Getting started wasn’t easy. The two weeks working from the office before the summit wasn’t easy either. But it all fell away sometime between Wednesday and Friday of last week, and quite unexpectedly, I decided I’m moving to Europe as soon as I can, probably to A Coruña (where Igalia’s headquarters is) at first but who knows where life will take me next? Listing all the reasons would take too long. But: I found a safe space, one where I felt welcome, accepted, like I belonged. It’s not a perfect one; I stood up during one of the meetings and expressed my pain at the dissonance between the comfort I feel here and the knowledge that most of the faces in the room were white and most belonged to men. I want to say we’re working on it, but our responsibility is to finish the work, not to feel good that we’ve started it. That’s true for writing code to deliver to a customer, and it’s true for achieving fairness.

I am old enough now to accommodate multiple conflicting truths. My desire to improve the unfairness, and to get other people to open their hearts enough to risk all-consuming rage at just how unfair things can be, those things coexist with my joy about finding a group of such consistently caring, thoughtful, and justice-minded people — ones who don’t seem to mind having me around.

I’m normally severely allergic to words like “love” and “family” in a corporate context. As an early childhood trauma survivor, these words are fraught, and I would rather keep things at work a bit more chill. At the same time, when I heard Igalians use these words during the summit to talk about our collective, it didn’t feel as menacing as it usually does. Maybe the right word to use here — the thing that we really mean when we generalize the words “love” and “family” because we’ve been taught (incorrectly) that it can only come from our lovers or parents — is “safety”. Safety is one of the most underrated concepts there is. Feeling safe means believing that you can rely on the people around you, that they know where you’re coming from or else if they don’t, that they’re willing to try to find out, that they’re willing to be changed by what happens if they do find out. I came in apprehensive. But in little ways and in big ways, I found safe people, not just one or two but a lot.

I could say more, but if I did, I might never stop. To channel the teenaged energy that I’m feeling right now (partly due to reconnecting with that version of myself who loved computers and longed to find other people who did too), I’ll include some songs that convey how I feel about this week. I don’t know if this will ring true for anyone else, but I have to try.

Allette Brooks, “Silicon Valley Rebel”

We lean her bike along the office floor
They never know what to expect shaved into the back of her head when she walks in the door
And she says ‘I don’t believe in working like that for a company
It’s not like they care about you
It’s not like they care about me’

Please don’t leave us here alone in this silicon hell, oh
Life would be so unbearable without your rebel yell...

Vienna Teng, “Level Up”

Call it any name you need
Call it your 2.0, your rebirth, whatever –
So long as you can feel it all
So long as all your doors are flung wide
Call it your day number 1 in the rest of forever

If you are afraid, give more
If you are alive, give more now
Everybody here has seams and scars

Namoli Brennet “We Belong”

Here's to all the tough girls
And here's to all the sensitive boys
We Belong
Here's to all the rejects
And here's to all the misfits
We Belong

Here's to all the brains and the geeks
And here's to all the made up freaks, yeah
We Belong

And when the same old voices say
That we'd be better off running away
We belong, We belong anyway

The Nields, “Easy People”

You let me be who I want to be

Bob Franke, “Thanksgiving Eve”

What can you do with your days
But work and hope
Let your dreams bind your work to your play

And most of all, the Mountain Goats, “Color in Your Cheeks”

They came in by the dozens, walking or crawling
Some were bright-eyed, some were dead on their feet
And they came from Zimbabwe, or from Soviet Georgia
East Saint Louis, or from Paris, or they lived across the street
But they came, and when they finally made it here
It was the least that we could do to make our welcome clear

Come on in
We haven't slept for weeks
Drink some of this
This'll put color in your cheeks

This is a different kind of post from the ones I was originally planning to do on this blog. And I never thought I’d be talking about my job this way. Life comes at you fast. To return to the Allette Brooks lyric: it’s because at a co-op, there’s no “they” that’s separate from “you” and “me”. It’s just “you” and “me”, and we care about each other. It turns out that safe spaces and cooperative structure aren’t just political ideas that happen to correlate — in a company, neither can exist without the other. It’s not a safe space if you can get fired over one person’s petty grievance, like being reminded that white men don’t understand everything. Inversely, cooperative structure can’t work without deep trust. Trust is hard to scale, and as Igalia grows I worry about what will happen (doubtless, the people who were here when it was a tenth of its size have a different view). There is no guarantee of success, but I want to be one of the ones to try.

And we’re hiring. I know how hard it is to try again when you’ve been humiliated, betrayed, and disappointed at work before, something that’s more common than not in tech when you don’t look like, sound like, or feel like everybody else. I’m here because somebody believed in me. I’m glad both that they did and that I was able to return that leap of faith and believe that I truly was believed in. And I would like to pass along the favor to you. Of course, to do that, I have to get to know you a little bit first. As long as I continue to have some time, I want to talk to people in groups that are systematically underrepresented in tech who may be intrigued by what I wrote here but aren’t sure if they’re good enough. My email is tjc at (obvious domain name) and the jobs page is at https://www.igalia.com/jobs. Even if you don’t see a technical role there that exactly fits you, please don’t let that stop you from reaching out; what matters most is willingness to learn and to tolerate the sometimes-painful, always-rewarding process of creating something together with mutual consent rather than coercion.

by Tim Chevalier at June 23, 2022 09:33 PM

June 22, 2022

Brian Kardell

Achievement Unlocked: Intent to Mathify

Achievement Unlocked: Intent to Mathify

We are about to reach a very unique and honestly pretty epic moment in standards history, and most people probably won’t even notice. It’s also personally meaningful to me, so I’d like to tell you about it and why it’s worth celebrating…

Igalia just filed an Intent to Ship support for MathML-Core in Chromium. If you’re not familiar with what an intent to ship means: the Blink release process involves several stages before something gets to ship. An intent to ship marks the beginning of the final step, which allows the feature to just work by default in stable releases.

I know, I know. Many website developers will think “that doesn’t help me much and I have all these other problems which I would love to solve instead”. But… that’s kind of the problem and why this is momentous in several ways.

MathML has a wildly interesting history. At some level, the need for being able to display mathematical text seems like it would have been obvious from the web’s start at CERN, right? It was! In fact, support for rendering some math existed in CERN’s experimental browser in 1993. Graphics too. It’s unsurprising then that SVG and MathML were among the first active working groups at the W3C when it was established. MathML was, in fact, the first XML-oriented standard that the W3C ever published, with its first recommendation coming in April 1998. For reference, that’s over a year before HTML 4.01 reached REC… During the “HTML5” split, it, along with SVG, was specially integrated into the new, very well-defined parser.

So why is it suddenly “news”?

Well… It is complicated, and I wrote one of my personal favorite pieces explaining it all at the beginning of 2019, before I came to work at Igalia. Really, I think it is enjoyable and you should read it, but the TL;DR version is that standards implementation, and its prioritization, is voluntary. In the same way that many developers are focused on shopping and selling and animations and layout and better modularization and… lots of other problems - there were just more appealing things on implementers’ plates which would appease a wider audience. There are millions of equations in Wikipedia alone, but we don’t think about it so much because we’ve developed plenty of clever (and often not-so-clever) workarounds to get by “until the last one lands”.

And so progress was slower than normal. Volunteers did an amazing amount of the actual implementation work in many cases. And then, just as it looked like we were about to cross a finish line, Chrome forked WebKit and decided that - for now - they were going to rip out the newly landed MathML support, which had some problems, to make it easier for them to refactor major parts of the engine. And then the way we do standards changed. We got more rigorous over the years. Basically - the story just kept getting worse. It was almost like we were going backwards for math on the web.

By 2018 or so, it was looking like the ship could not be righted. Igalia was presented with many arguments about how hard the problem was, and the scale of actually righting the ship. It was more than just 1 more implementation, which would already have been a huge effort - it was about re-establishing a working group, specifying all of the previously unspecified things, in a way that fit the platform (coordinating with many other working groups and WHATWG on various details), going through a review with the W3C Technical Architecture Group, and so on.

But, here we are.

Setting this right isn’t just historically unique in that way either: I’m pretty sure it’s safe to say that no non-browser organization has ever landed something of this scale before - nor has it been done with the aid of such varied funding sources. While Igalia wound up footing the lion’s share of the bill ourselves, we also had financial support at stages from NISO and the Alfred P. Sloan Foundation, APS Physics, and Pearson, and a small collection of donors that included $75k from two people. Really, in many ways, that effort was the precursor to our whole Open Prioritization idea.

Personal notes

For me, I have a lot of personal connections to this that make it meaningful. As I said in Harold Crick and the Web Platform, I might be partially responsible for its delays.

Little did he know, he’d get deeply involved in helping solve that problem.

When I came to Igalia, it was one of my first projects.

I helped fix some things. The first Web Platform Test I ever contributed was for MathML. I think the first WHATWG PR I ever sent was on MathML. The first BlinkOn Lightning Talk I ever did was about - you guessed it, MathML. The first W3C Charter I ever helped write? That’s right: About MathML. The first actual Working Group I’ve ever chaired (well, co-chaired) is about Math. The first explainer I ever wrote myself was about MathML. The first podcast I ever hosted was on… Guess what? MathML. And so on.

And here’s the thing: I am perhaps the least mathematically minded person you will ever meet. But we did the right thing. A good thing. And we did it a good way, and for good reasons.

A few episodes later on our podcast we had Rick Byers from Chrome and Rossen Atanassov from Microsoft on the show, and Rick brought up MathML. Both of them were hugely impressed and supportive of it even back then - Rick said

I fully expect MathML to ship at some point and it’ll ship in Chrome… Even though from Google’s business perspective, it probably wouldn’t have been a good return on investment for us to do it… I’m thrilled that Igalia was able to do it.

By then, there was a global pandemic underway and Rossen pointed out…

I’m looking forward to it. Look, I… I’m a huge supporter of having native math in the engines, MathML… And having the ability to, at the end of the day, increase the edu market, which will benefit the most out of it. Especially, you know, having been through one full semester of having a middle school student at home, and having her do all of her work through online tools… Having better edu support will go a long way. So, thank you on behalf of all of the students, future students that will benefit

And… yeah, that’s kind of it, right? There’s this huge thing that has no obvious ROI for browsers, that doesn’t have every developer in the world screaming for it, but it’s really great for society and kind of important to serve this niche because it underpins the ability of students, physicists, mathematicians, etc. to communicate with text.

Mission Accomplished Progress

Anyway… Wow. This is huge, I think, in so many ways.

I’m gonna go raise a glass to everyone who helped achieve this astonishingly huge thing.

I think it says so much about our ability to do things together, and about the promise this holds for the ecosystem and how it could work.

I sort of wish I could just say “Mission Accomplished” but the truth is that this is a beginning, not an end. It means we have really good support for parsing and interoperably rendering (assuming some math-capable fonts) tons and tons of math, and a spec for how it needs to integrate with the whole rest of the platform, but only 1 implementation of that bit. Now we have to align the other implementations the same way so that we really have just One Platform and all of it can move forward together and no part gets left behind again.

Then, beyond that is MathML-Core Level 2. Level 1 draws a box around what was practically able to be aligned in the first pass - but it leaves a few hard problems on the table which really need solving. Many of these have some partial solutions in the other 2 browsers already but are hard to specify and integrate.

I have a lot of faith that we can reach both of those goals, but to do it takes investment. I really hope that reaching this milestone helps convince organizations and vendors to contribute toward reaching them. I’d encourage everyone to give the larger ecosystem some thought and consider how we can support good efforts - even maybe directly.

Help us support this work through Open Prioritization / Open Collective

If you'd like to understand the argument for this better, my friend and colleague at Igalia Eric Meyer presented an excellent talk when we announced it...

June 22, 2022 04:00 AM

June 21, 2022

Ziran Sun

My first time attending Igalia Summit

With employees distributed across more than 20 countries and most working remotely, Igalia holds summits twice a year to give employees opportunities to meet up face-to-face. A summit normally runs over the course of a week with code camps, team building and recreational activities. This year’s summer summit was held between the 15th and 19th of June in A Coruña, Galicia.

I joined Igalia at the beginning of 2020. Because of the COVID-19 pandemic, this was my first time attending an Igalia Summit. Due to my personal schedule I only managed to stay in A Coruña for three nights. The overall experience was great and I thoroughly enjoyed it!

Getting to know A Coruña

Igalia’s HQ is based in A Coruña in Galicia, Spain. A beautiful white-sand beach is just a few yards away from the hotel where we stayed. I took a barefoot stroll along the beach one morning. The beach itself was reasonably quiet. There were a few people swimming in the shallow part of the sea. The weather that morning was perfect, with warm sunshine and occasional cool, gentle breezes.

The smell of the sea in the air and people wandering around relaxedly in the evenings somehow brought a familiar feeling and reminded me of my home town.

On Wednesday I joined a guided visit to A Coruña set in 1906. It was a very interesting walk around the historic part of the city. The tour guide, Suso Martín, presented an amazing one-man show, walking us through A Coruña‘s history back to the 12th century, its historic buildings, religion and the romances associated with the city.

On Thursday evening we went for a guided tour of MEGA, Mundo Estrella Galicia. En route we passed the office that Igalia used to occupy in its early days. According to some Igalians who worked there, it was a small office and there were just over 10 Igalians in those days. Today Igalia has over 100 employees, and last year we celebrated our 20th anniversary in Open Source.

The highlights of the MEGA tour for me were watching the production line at work and tasting the beers. We were spoiled with beers and local cheeses.

Meeting other Igalians

Since I joined Igalia, all meetings had happened online due to the pandemic. I was glad that I could finally meet other Igalians “physically”. During the summit I had chances to chat with other Igalians at the code camp, during meals and on guided tours. It was pleasant. Playing board games together after an evening meal (I’d call it a “night meal”) was great fun. Some Igalians can be very funny and witty. I had quite a few “tearful laughter” moments. It was very enjoyable.

During the team code camp, I had the chance to spend more time with my teammates. The technical meetings on both days were very engaging and effective. I was very happy to see Javi Fernández in person. Last year I was involved in CSS Grid compatibility work (one of the 5 key areas for the Compat2021 effort that Igalia was involved in). I personally had never touched CSS Grid layout at the beginning of the assignment. The web platform team in Igalia, though, has very good knowledge and working experience in this area. My teammates Manuel Rego, Javi Fernández and Sergio Villar were among the pioneer developers in this field. To ensure the success of the task, the team created a safe environment by providing me with constant guidance and support. Specifically, I had Javi’s help throughout the whole task. Javi has been an amazing technical mentor with great expertise and patience. We had numerous calls for technical discussions. The team also had a couple of debugging sessions with me for some very tricky bugs.

The team meal on Wednesday evening was very nice. Delicious food, great companions and nice fruity beers – what could go wrong? Well, we did walk back to the hotel in the rain…

Thanks

The event was very well organized and productive. I really appreciate the hard work and great effort those Igalians put in to make it happen. Just want to say – a big THANK YOU! I’m glad that I managed to make the trip. Traveling from the UK is not straightforward, but I’d say it’s well worth it.

by zsun at June 21, 2022 01:02 PM

Andy Wingo

an optimistic evacuation of my wordhoard

Good morning, mallocators. Last time we talked about how to split available memory between a block-structured main space and a large object space. Given a fixed heap size, making a new large object allocation will steal available pages from the block-structured space by finding empty blocks and temporarily returning them to the operating system.

Today I'd like to talk more about nothing, or rather, why might you want nothing rather than something. Given an Immix heap, why would you want it organized in such a way that live data is packed into some blocks, leaving other blocks completely free? How bad would it be if instead the live data were spread all over the heap? When might it be a good idea to try to compact the heap? Ideally we'd like to be able to translate the answers to these questions into heuristics that can inform the GC when compaction/evacuation would be a good idea.

lospace and the void

Let's start with one of the more obvious points: large object allocation. With a fixed-size heap, you can't allocate new large objects if you don't have empty blocks in your paged space (the Immix space, for example) that you can return to the OS. To obtain these free blocks, you have four options.

  1. You can continue lazy sweeping of recycled blocks, to see if you find an empty block. This is a bit time-consuming, though.

  2. Otherwise, you can trigger a regular non-moving GC, which might free up blocks in the Immix space but which is also likely to free up large objects, which would result in fresh empty blocks.

  3. You can trigger a compacting or evacuating collection. Immix can't actually compact the heap all in one go, so you would preferentially select evacuation-candidate blocks by choosing the blocks with the least live data (as measured at the last GC), hoping that little data will need to be evacuated.

  4. Finally, for environments in which the heap is growable, you could just grow the heap instead. In this case you would configure the system to target a heap size multiplier rather than a heap size, which would scale the heap to be e.g. twice the size of the live data, as measured at the last collection.

If you have a growable heap, I think you will rarely choose to compact rather than grow the heap: you will either collect or grow. Under constant allocation rate, the rate of empty blocks being reclaimed from freed lospace objects will be equal to the rate at which they are needed, so if collection doesn't produce any, then that means your live data set is increasing and so growing is a good option. Anyway let's put growable heaps aside, as heap-growth heuristics are a separate gnarly problem.

The question becomes, when should large object allocation force a compaction? Absent growable heaps, the answer is clear: when allocating a large object fails because there are no empty pages, but the statistics show that there is actually ample free memory. Good! We have one heuristic, and one with an optimum: you could compact in other situations but from the point of view of lospace, waiting until allocation failure is the most efficient.
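
Here's a rough sketch of what that decision could look like in C. The struct layout, field names, and the slack factor are my own assumptions for illustration, not code from any real collector; the point is just that the check combines "no empty blocks" with "plenty of free bytes according to the last GC's statistics":

    /* Hypothetical sketch: decide whether a failed large-object allocation
     * should trigger an evacuating collection.  Assumes per-block metadata
     * recording the bytes found live at the last GC. */
    #include <stddef.h>
    #include <stdbool.h>

    struct block_summary {
      size_t live_bytes_at_last_gc;   /* 0 means the block looked empty */
      bool empty;                     /* no live data; returnable to the OS */
    };

    struct heap_stats {
      size_t block_size;
      size_t nblocks;
      struct block_summary *blocks;
    };

    /* Free bytes implied by the last GC's marking, even if not contiguous. */
    static size_t estimated_free_bytes (const struct heap_stats *h) {
      size_t free_bytes = 0;
      for (size_t i = 0; i < h->nblocks; i++)
        free_bytes += h->block_size - h->blocks[i].live_bytes_at_last_gc;
      return free_bytes;
    }

    static size_t empty_block_count (const struct heap_stats *h) {
      size_t n = 0;
      for (size_t i = 0; i < h->nblocks; i++)
        if (h->blocks[i].empty) n++;
      return n;
    }

    /* Compact only when allocation failed (no empty blocks to steal) but the
     * statistics say ample free memory is hiding in fragmented blocks. */
    static bool should_evacuate_for_lospace (const struct heap_stats *h,
                                             size_t bytes_needed) {
      return empty_block_count (h) == 0
          && estimated_free_bytes (h) >= 2 * bytes_needed;  /* slack factor is a guess */
    }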

shrinkage

Moving on, another use of empty blocks is when shrinking the heap. The collector might decide that it's a good idea to return some memory to the operating system. For example, I enjoyed this recent paper on heuristics for optimum heap size, which advocates that you size the heap in proportion to the square root of the allocation rate, and that as a consequence, when/if the application reaches a dormant state, it should promptly return memory to the OS.

Here, we have a similar heuristic for when to evacuate: when we would like to release memory to the OS but we have no empty blocks, we should compact. We use the same evacuation candidate selection approach as before, also, aiming for maximum empty block yield.

fragmentation

What if you go to allocate a medium object, say 4kB, but there is no hole that's 4kB or larger? In that case, your heap is fragmented. The smaller your heap size, the more likely this is to happen. We should compact the heap to make the maximum hole size larger.

side note: compaction via partial evacuation

The evacuation strategy of Immix is... optimistic. A mark-compact collector will compact the whole heap, but Immix will only be able to evacuate a fraction of it.

It's worth dwelling on this a bit. As described in the paper, Immix reserves around 2-3% of overall space for evacuation overhead. Let's say you decide to evacuate: you start with 2-3% of blocks being empty (the target blocks), and choose a corresponding set of candidate blocks for evacuation (the source blocks). Since Immix is a one-pass collector, it doesn't know how much data is live when it starts collecting. It may not know that the blocks that it is evacuating will fit into the target space. As specified in the original paper, if the target space fills up, Immix will mark in place instead of evacuating; an evacuation candidate block with marked-in-place objects would then be non-empty at the end of collection.

In fact if you choose a set of evacuation candidates hoping to maximize your empty block yield, based on an estimate of live data instead of limiting to only the number of target blocks, I think it's possible to actually fill the targets before the source blocks empty, leaving you with no empty blocks at the end! (This can happen due to inaccurate live data estimations, or via internal fragmentation with the block size.) The only way to avoid this is to never select more evacuation candidate blocks than you have in target blocks. If you are lucky, you won't have to use all of the target blocks, and so at the end you will end up with more free blocks than not, so a subsequent evacuation will be more effective. The defragmentation result in that case would still be pretty good, but the yield in free blocks is not great.
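
To make the conservative variant concrete, candidate selection might look something like the following toy C function. The names are invented and the per-block live-byte counts from the previous collection are assumed to be available; it caps the candidate count at the number of reserved target blocks and also stops once the targets' byte budget is pledged:

    /* Toy candidate selection: evacuate from the blocks with the least live
     * data, but never pledge more candidates (or more live bytes) than the
     * reserved target blocks can absorb. */
    #include <stddef.h>
    #include <stdlib.h>

    struct evac_candidate {
      void *block;
      size_t live_bytes_at_last_gc;
    };

    static int by_live_bytes (const void *a, const void *b) {
      const struct evac_candidate *x = a, *y = b;
      return (x->live_bytes_at_last_gc > y->live_bytes_at_last_gc)
           - (x->live_bytes_at_last_gc < y->live_bytes_at_last_gc);
    }

    /* Sorts `summaries` so the emptiest blocks come first, and returns how
       many of them should be selected as evacuation candidates. */
    static size_t
    choose_evacuation_candidates (struct evac_candidate *summaries, size_t nblocks,
                                  size_t target_blocks, size_t block_size)
    {
      qsort (summaries, nblocks, sizeof summaries[0], by_live_bytes);

      size_t budget = target_blocks * block_size;   /* what the targets can hold */
      size_t n = 0;
      for (; n < nblocks && n < target_blocks; n++) {   /* conservative cap */
        if (summaries[n].live_bytes_at_last_gc > budget)
          break;
        budget -= summaries[n].live_bytes_at_last_gc;
      }
      return n;
    }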

In a production garbage collector I would still be tempted to be optimistic and select more evacuation candidate blocks than available empty target blocks, because it will require fewer rounds to compact the whole heap, if that's what you wanted to do. It would be a relatively rare occurrence to start an evacuation cycle. If you ran out of space while evacuating, in a production GC I would just temporarily commission some overhead blocks for evacuation and release them promptly after evacuation is complete. If you have a small heap multiplier in your Immix space, occasional partial evacuation in a long-running process would probably reach a steady state with blocks being either full or empty. Fragmented blocks would represent newer objects and evacuation would periodically sediment these into longer-lived dense blocks.

mutator throughput

Finally, the shape of the heap has its inverse in the shape of the holes into which the mutator can allocate. It's most efficient for the mutator if the heap has as few holes as possible: ideally just one large hole per block, which is the limit case of an empty block.

The opposite extreme would be having every other "line" (in Immix terms) be used, so that free space is spread across the heap in a vast spray of one-line holes. Even if fragmentation is not a problem, perhaps because the application only allocates objects that pack neatly into lines, having to stutter all the time to look for holes is overhead for the mutator. Also, the result is that contemporaneous allocations are more likely to be placed farther apart in memory, leading to more cache misses when accessing data. Together, allocator overhead and access overhead lead to lower mutator throughput.

When would this situation get so bad as to trigger compaction? Here I have no idea. There is no clear maximum. If compaction were free, we would compact all the time. But it's not; there's a tradeoff between the cost of compaction and mutator throughput.

I think here I would punt. If the heap is being actively resized based on allocation rate, we'll hit the other heuristics first, and so we won't need to trigger evacuation/compaction based on mutator overhead. You could measure this, though, in terms of average or median hole size, or average or maximum number of holes per block. Since evacuation is partial, all you need to do is to identify some "bad" blocks and then perhaps evacuation becomes attractive.

gc pause

Welp, that's some thoughts on when to trigger evacuation in Immix. Next time, we'll talk about some engineering aspects of evacuation. Until then, happy consing!

by Andy Wingo at June 21, 2022 12:21 PM

Carlos García Campos

Thread safety support in libsoup3

In libsoup2 there’s some thread safety support that allows sending messages from a thread other than the one where the session was created. There are other APIs that can be used concurrently too, like accessing some of the session properties, and others that aren’t thread safe at all. It’s not clear what’s thread safe, and even sending a message is not fully thread safe either, depending on the session features involved. However, several applications rely on the thread safety support and have always worked surprisingly well.

In libsoup3 we decided to remove the (broken) thread safety support and only allow using the API from the same thread where the session was created. This simplified the code and made it easier to add the HTTP/2 implementation. Note that HTTP/2 supports multiple requests over the same TCP connection, which is a lot more efficient than starting multiple requests from several threads in parallel.

When apps started to be ported to libsoup3, those that relied on the thread safety support ended up being a pain to port. Major refactorings were required to either stop using the sync API from secondary threads, or to move all the soup usage to the same secondary thread. We managed to make it work in several modules like gstreamer and gvfs, but others like evolution required a lot more work. The extra work was definitely worth it and resulted in much better and more efficient code. But we also understand that porting an application to a new version of a dependency is not a top priority task for maintainers.

So, in order to help with the migration to libsoup3, we decided to add thread safety support to libsoup3 again, but this time trying to cover all the APIs involved in sending a message and documenting what’s expected to be thread safe. Also, since we didn’t remove the sync APIs, it’s expected that we support sending messages synchronously from secondary threads. We still encourage using only the async APIs from a single thread, because that’s the most efficient way, especially for HTTP/2 requests, but apps currently using threads can be easily ported first and then refactored later.

The thread safety support in libsoup3 is expected to cover only one use case: sending messages. All other APIs, including accessing session properties, are not thread safe and can only be used from the thread where the session is created.

There are a few important things to consider when using multiple threads in libsoup3:

  • In the case of HTTP/2, two messages for the same host sent from different threads will not use the same connection, so the advantage of HTTP/2 multiplexing is lost.
  • Only the API to send messages can be called concurrently from multiple threads. So, in case of using multiple threads, you must configure the session (setting network properties, features, etc.) from the thread it was created and before any request is made.
  • All signals associated with a message (SoupSession::request-queued, SoupSession::request-unqueued, and all SoupMessage signals) are emitted from the thread that started the request, and all the IO will happen there too.
  • The session can be created in any thread, but all session APIs except the methods to send messages must be called from the thread where the session was created.
  • To use the async API from a thread different than the one where the session was created, the thread must have a thread default main context where the async callbacks are dispatched.
  • The sync API doesn’t need any main context at all.
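
Putting those rules together, a minimal sketch of the supported use case (sending a message from a worker thread with the sync API) could look like this. The URL is just a placeholder and error handling is kept to a minimum:

    /* Sketch: send a request synchronously from a worker thread with
     * libsoup3.  The session is created and configured on the main thread;
     * only soup_session_send() is called from the worker. */
    #include <libsoup/soup.h>

    static gpointer
    worker_thread (gpointer user_data)
    {
      SoupSession *session = SOUP_SESSION (user_data); /* created on the main thread */
      SoupMessage *msg = soup_message_new ("GET", "https://example.org/"); /* placeholder URL */
      GError *error = NULL;

      /* Sending is the only operation expected to be thread safe; signals and
         IO for this message happen on this thread. */
      GInputStream *stream = soup_session_send (session, msg, NULL, &error);
      if (!stream) {
        g_warning ("Request failed: %s", error->message);
        g_clear_error (&error);
      } else {
        g_object_unref (stream);
      }

      g_object_unref (msg);
      return NULL;
    }

    int
    main (void)
    {
      SoupSession *session = soup_session_new (); /* configure features here, before any request */
      GThread *thread = g_thread_new ("soup-worker", worker_thread, session);
      g_thread_join (thread);
      g_object_unref (session);
      return 0;
    }

The async API can also be used from a secondary thread, but, as noted in the list above, that thread then needs its own thread-default main context for the callbacks to be dispatched.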

by carlos garcia campos at June 21, 2022 07:36 AM

June 20, 2022

Tiago Vignatti

Short blog post from Madrid's bus

This post was inspired by “Short blog post from Madrid’s hotel room” from my colleague Frédéric Wang. You should really check his post instead! To Fred: thanks for the feedback and review here. I’m in for football in the next Summit, alright? :-) This week, I finally went to A Coruña for the Web Engines Hackfest and internal company meetings. These were my first on-site events since the COVID-19 pandemic. After two years of non-super-exciting virtual conferences I was so glad to finally be able to meet with colleagues and other people from the Web.

by Tiago Vignatti at June 20, 2022 10:34 PM

Andy Wingo

blocks and pages and large objects

Good day! In a recent dispatch we talked about the fundamental garbage collection algorithms, also introducing the Immix mark-region collector. Immix mostly leaves objects in place but can move objects if it thinks it would be profitable. But when would it decide that this is a good idea? Are there cases in which it is necessary?

I promised to answer those questions in a followup article, but I didn't say which followup :) Before I get there, I want to talk about paged spaces.

enter the multispace

We mentioned that Immix divides the heap into blocks (32kB or so), and that no object can span multiple blocks. "Large" objects -- defined by Immix to be more than 8kB -- go to a separate "large object space", or "lospace" for short.

Though the implementation of a large object space is relatively simple, I found that it has some points that are quite subtle. Probably the most important of these points relates to heap size. Consider that if you just had one space, implemented using mark-compact maybe, then the procedure to allocate a 16 kB object would go:

  1. Try to bump the allocation pointer by 16kB. Is it still within range? If so we are done.

  2. Otherwise, collect garbage and try again. If after GC there isn't enough space, the allocation fails.

In step (2), collecting garbage could decide to grow or shrink the heap. However when evaluating collector algorithms, you generally want to avoid dynamically-sized heaps.

cheatery

Here is where I need to make an embarrassing admission. In my role as co-maintainer of the Guile programming language implementation, I have long noodled around with benchmarks, comparing Guile to Chez, Chicken, and other implementations. It's good fun. However, I only realized recently that I had a magic knob that I could turn to win more benchmarks: simply make the heap bigger. Make it start bigger, make it grow faster, whatever it takes. For a program that does its work in some fixed amount of total allocation, a bigger heap will require fewer collections, and therefore generally take less time. (Some amount of collection may be good for performance as it improves locality, but this is a marginal factor.)

Of course I didn't really go wild with this knob but it now makes me doubt all benchmarks I have ever seen: are we really using benchmarks to select for fast implementations, or are we in fact selecting for implementations with cheeky heap size heuristics? Consider even any of the common allocation-heavy JavaScript benchmarks, DeltaBlue or Earley or the like; to win these benchmarks, web browsers are incentivised to have large heaps. In the real world, though, a more parsimonious policy might be more appreciated by users.

Java people have known this for quite some time, and are therefore used to fixing the heap size while running benchmarks. For example, people will measure the minimum amount of memory that can allow a benchmark to run, and then configure the heap to be a constant multiplier of this minimum size. The MMTK garbage collector toolkit can't even grow the heap at all currently: it's an important feature for production garbage collectors, but as they are just now migrating out of the research phase, heap growth (and shrinking) hasn't yet been a priority.

lospace

So now consider a garbage collector that has two spaces: an Immix space for allocations of 8kB and below, and a large object space for, well, larger objects. How do you divide the available memory between the two spaces? Could the balance between immix and lospace change at run-time? If you never had large objects, would you be wasting space at all? Conversely is there a strategy that can also work for only large objects?

Perhaps the answer is obvious to you, but it wasn't to me. After much reading of the MMTK source code and pondering, here is what I understand the state of the art to be.

  1. Arrange for your main space -- Immix, mark-sweep, whatever -- to be block-structured, and able to dynamically decommission or recommission blocks, perhaps via MADV_DONTNEED. This works if the blocks are even multiples of the underlying OS page size.

  2. Keep a counter of however many bytes the lospace currently has.

  3. When you go to allocate a large object, increment the lospace byte counter, and then round up to the number of blocks to decommission from the main paged space. If this is more than are currently decommissioned, find some empty blocks and decommission them (there's a sketch of this bookkeeping just after the list).

  4. If no empty blocks were found, collect, and try again. If the second try doesn't work, then the allocation fails.

  5. Now that the paged space has shrunk, lospace can allocate. You can use the system malloc, but probably better to use mmap, so that if these objects are collected, you can just MADV_DONTNEED them and keep them around for later re-use.

  6. After GC runs, explicitly return the memory for any object in lospace that wasn't visited when the object graph was traversed. Decrement the lospace byte counter and possibly return some empty blocks to the paged space.
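
Here's a hypothetical sketch of steps 2 through 5 in C. The helper names, the struct layout, and the use of madvise(MADV_DONTNEED) for decommissioning are my own assumptions for illustration, not any collector's real API:

    /* Hypothetical sketch of the lospace/paged-space bookkeeping. */
    #include <stddef.h>
    #include <stdbool.h>
    #include <sys/mman.h>

    #define BLOCK_SIZE (32 * 1024)

    struct heap {
      size_t lospace_bytes;   /* step 2: bytes currently allocated in the lospace */
      size_t decommissioned;  /* blocks currently returned to the OS */
      /* ... block metadata elided ... */
    };

    /* Assumed helpers, stubbed so the sketch stands alone. */
    static void *find_empty_block (struct heap *h) { (void) h; return NULL; }
    static void collect (struct heap *h) { (void) h; }

    /* Returns true if the paged space shrank enough for the lospace to mmap
       `bytes` worth of large-object storage. */
    static bool
    lospace_reserve (struct heap *h, size_t bytes)
    {
      h->lospace_bytes += bytes;                          /* step 3 */
      size_t needed = (h->lospace_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;

      for (int attempt = 0; attempt < 2; attempt++) {     /* step 4: collect, retry once */
        while (h->decommissioned < needed) {
          void *block = find_empty_block (h);
          if (!block)
            break;
          madvise (block, BLOCK_SIZE, MADV_DONTNEED);     /* decommission (step 1) */
          h->decommissioned++;
        }
        if (h->decommissioned >= needed)
          return true;                                    /* step 5: lospace may allocate */
        if (attempt == 0)
          collect (h);
      }
      h->lospace_bytes -= bytes;                          /* both tries failed */
      return false;
    }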

There are some interesting aspects about this strategy. One is, the memory that you return to the OS doesn't need to be contiguous. When allocating a 50 MB object, you don't have to find 50 MB of contiguous free space, because any set of blocks that adds up to 50 MB will do.

Another aspect is that this adaptive strategy can work for any ratio of large to non-large objects. The user doesn't have to manually set the sizes of the various spaces.

This strategy does assume that address space is larger than heap size, but only by a factor of 2 (modulo fragmentation for the large object space). Therefore our risk of running afoul of user resource limits and kernel overcommit heuristics is low.

The one underspecified part of this algorithm is... did you see it? "Find some empty blocks". If the main paged space does lazy sweeping -- only scanning a block for holes right before the block will be used for allocation -- then after a collection we don't actually know very much about the heap, and notably, we don't know what blocks are empty. (We could know it, of course, but it would take time; you could traverse the line mark arrays for all blocks while the world is stopped, but this increases pause time. The original Immix collector does this, however.) In the system I've been working on, instead I have it so that if a mutator finds an empty block, it puts it on a separate list, and then takes another block, only allocating into empty blocks once all blocks are swept. If the lospace needs blocks, it sweeps eagerly until it finds enough empty blocks, throwing away any nonempty blocks. This causes the next collection to happen sooner, but that's not a terrible thing; this only occurs when rebalancing lospace versus paged-space size, because if you have a constant allocation rate on the lospace side, you will also have a complementary rate of production of empty blocks by GC, as they are recommissioned when lospace objects are reclaimed.

What if your main paged space has ample space for allocating a large object, but there are no empty blocks, because live objects are equally peppered around all blocks? In that case, often the application would be best served by growing the heap, but maybe not. In any case in a strict-heap-size environment, we need a solution.

But for that... let's pick up another day. Until then, happy hacking!

by Andy Wingo at June 20, 2022 02:59 PM

June 17, 2022

Frédéric Wang

Short blog post from Madrid's hotel room

This week, I finally went back to A Coruña for the Web Engines Hackfest and internal company meetings. These were my first on-site events since the COVID-19 pandemic. After two years of non-super-exciting virtual conferences I was so glad to finally be able to meet with colleagues and other people from the Web.

Igalia has grown considerably and I finally got to know many new hires in person. Obviously, some people were still not able to travel despite the effort we put into establishing strong sanitary measures. Nevertheless, our infrastructure has also improved a lot and we were able to provide remote communication during these events, in order to give people a chance to attend and participate!

Work on the Madrid–Galicia high-speed rail line was finally completed last December, meaning one can now travel on fast trains between Paris - Barcelona - Madrid - A Coruña. This takes about a day and a half though and, because I’m voting in the legislative elections in France, I had to shorten my stay a bit and miss some nice social activities 😥… That’s a pity, but I’m looking forward to participating more next time!

Finally on the technical side, my main contribution was to present our upcoming plan to ship MathML in Chromium. The summary is that we are happy with this first implementation and will send the intent-to-ship next week. There are minor issues to address, but the consensus from the conversations we had with other attendees (including folks from Google and Mozilla) is that they should not be a blocker and can be refined depending on the feedback from API owners. So let’s do it and see what happens…

There is definitely a lot more to write and nice pictures to share, but it’s starting to be late here and I have a train back to Paris tomorrow. 😉

June 17, 2022 12:00 PM

June 16, 2022

Iago Toral

V3DV Vulkan 1.2 status

A quick update on my latest activities around V3DV: I’ve been focusing on getting the driver ready for Vulkan 1.2 conformance, which mostly involved fixing a few CTS tests of the kind that would only fail occasionally; these are always fun :). I think we have fixed all the issues now and we are ready to submit conformance to Khronos; my colleague Alejandro Piñeiro is now working on that.

by Iago Toral at June 16, 2022 09:21 AM

June 15, 2022

Andy Wingo

defragmentation

Good morning, hackers! Been a while. It used to be that I had long blocks of uninterrupted time to think and work on projects. Now I have two kids; the longest such time-blocks are on trains (too infrequent, but it happens) and in a less effective but more frequent fashion, after the kids are sleeping. As I start writing this, I'm in an airport waiting for a delayed flight -- my first since the pandemic -- so we can consider this to be the former case.

It is perhaps out of mechanical sympathy that I have been using my reclaimed time to noodle on a garbage collector. Managing space and managing time have similar concerns: how to do much with little, efficiently packing different-sized allocations into a finite resource.

I have been itching to write a GC for years, but the proximate event that pushed me over the edge was reading about the Immix collection algorithm a few months ago.

on fundamentals

Immix is a "mark-region" collection algorithm. I say "algorithm" rather than "collector" because it's more like a strategy or something that you have to put into practice by making a concrete collector, the other fundamental algorithms being copying/evacuation, mark-sweep, and mark-compact.

To build a collector, you might combine a number of spaces that use different strategies. A common choice would be to have a semi-space copying young generation, a mark-sweep old space, and maybe a treadmill large object space (a kind of copying collector, logically; more on that later). Then you have heuristics that determine what object goes where, when.

On the engineering side, there's quite a number of choices to make there too: probably you make some parts of your collector parallel, maybe the collector and the mutator (the user program) can run concurrently, and so on. Things get complicated, but the fundamental algorithms are relatively simple, and present interesting fundamental tradeoffs.


figure 1 from the immix paper

For example, mark-compact is most parsimonious regarding space usage -- for a given program, a garbage collector using a mark-compact algorithm will require less memory than one that uses mark-sweep. However, mark-compact algorithms all require at least two passes over the heap: one to identify live objects (mark), and at least one to relocate them (compact). This makes them less efficient in terms of overall program throughput and can also increase latency (GC pause times).

Copying or evacuating spaces can be more CPU-efficient than mark-compact spaces, as reclaiming memory avoids traversing the heap twice; a copying space copies objects as it traverses the live object graph instead of after the traversal (mark phase) is complete. However, a copying space's minimum heap size is quite high, and it only reaches competitive efficiencies at large heap sizes. For example, if your program needs 100 MB of space for its live data, a semi-space copying collector will need at least 200 MB of space in the heap (a 2x multiplier, we say), and will only run efficiently at something more like 4-5x. It's a reasonable tradeoff to make for small spaces such as nurseries, but as a mature space, it's so memory-hungry that users will be unhappy if you make it responsible for a large portion of your memory.

Finally, mark-sweep is quite efficient in terms of program throughput, because like copying it traverses the heap in just one pass, and because it leaves objects in place instead of moving them. But! Unlike the other two fundamental algorithms, mark-sweep leaves the heap in a fragmented state: instead of having all live objects packed into a contiguous block, memory is interspersed with live objects and free space. So the collector can run quickly but the allocator stops and stutters as it accesses disparate regions of memory.

allocators

Collectors are paired with allocators. For mark-compact and copying/evacuation, the allocator consists of a pointer to free space and a limit. Objects are allocated by bumping the allocation pointer, a fast operation that also preserves locality between contemporaneous allocations, improving overall program throughput. But for mark-sweep, we run into a problem: say you go to allocate a 1 kilobyte byte array, do you actually have space for that?

Generally speaking, mark-sweep allocators solve this problem via freelist allocation: the allocator has an array of lists of free objects, one for each "size class" (say 2 words, 3 words, and so on up to 16 words, then more sparsely up to the largest allocatable size maybe), and services allocations from the appropriate size class's freelist. This prevents the 1 kB free space that we need from being "used up" by a 16-byte allocation that could just as well have gone elsewhere. However, freelists prevent objects allocated around the same time from being deterministically placed in nearby memory locations. This increases variance and decreases overall throughput, both for the allocation operations themselves and for pointer-chasing in the course of the program's execution.
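
Here's a toy segregated-freelist sketch in C, just to show the shape; the layout and names are mine and it's nothing like a real allocator:

    /* Toy segregated freelist: one singly-linked list of free cells per size
     * class, measured in words. */
    #include <stddef.h>

    #define WORD sizeof (void *)
    #define MAX_WORDS 16             /* dense size classes up to 16 words */

    struct free_cell { struct free_cell *next; };

    static struct free_cell *freelists[MAX_WORDS + 1];

    static size_t size_class (size_t bytes) {
      size_t words = (bytes + WORD - 1) / WORD;
      return words < 2 ? 2 : words;  /* need room for the link at minimum */
    }

    static void *freelist_alloc (size_t bytes) {
      size_t klass = size_class (bytes);
      if (klass > MAX_WORDS)
        return NULL;                 /* larger sizes handled elsewhere */
      struct free_cell *cell = freelists[klass];
      if (!cell)
        return NULL;                 /* caller would sweep or collect */
      freelists[klass] = cell->next; /* pop from this size class only */
      return cell;
    }

    static void freelist_free (void *ptr, size_t bytes) {
      size_t klass = size_class (bytes);
      struct free_cell *cell = ptr;
      cell->next = freelists[klass];
      freelists[klass] = cell;
    }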

Also, in a mark-sweep collector, we can still reach a situation where there is enough space on the heap for an allocation, but that free space is broken up into too many pieces: the heap is fragmented. For this reason, many systems that perform mark-sweep collection can choose to compact, if heuristics show it might be profitable. Because the usual strategy is mark-sweep, though, they still use freelist allocation.

on immix and mark-region

Mark-region collectors are like mark-sweep collectors, except that they do bump-pointer allocation into the holes between survivor objects.

Sounds simple, right? To my mind, though, the fundamental challenge in implementing a mark-region collector is how to handle fragmentation. Let's take a look at how Immix solves this problem.


part of figure 2 from the immix paper

Firstly, Immix partitions the heap into blocks, which might be 32 kB in size or so. No object can span a block. Block size should be chosen to be a nice power-of-two multiple of the system page size, not so small that common object allocations wouldn't fit. "Large" objects -- greater than 8 kB, for Immix -- go to a separate space that is managed in a different way.

Within a block, Immix divides space into lines -- maybe 128 bytes long. Objects can span lines. Any line that does not contain (a part of) an object that survived the previous collection is part of a hole. A hole is a contiguous span of free lines in a block.

On the allocation side, Immix does bump-pointer allocation into holes. If a mutator doesn't have a hole currently, it scans the current block (obtaining one if needed) for the next hole, via a side-table of per-line mark bits: one bit per line. Lines without the mark are in holes. Scanning for holes is fairly cheap, because the line size is not too small. Note, there are also per-object mark bits as well; just because you've marked a line doesn't mean that you've traced all objects on that line.

Allocating into a hole has good expected performance as well, as it's bump-pointer, and the minimum size isn't tiny. In the worst case of a hole consisting of a single line, you have 128 bytes to work with. This size is large enough for the majority of objects, given that most objects are small.
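
As a rough illustration (the sizes are the paper's ballpark numbers; the code shape is my own guess, not Immix's actual implementation), hole scanning over a per-line mark byte array might look like:

    /* Toy hole scan for a mark-region block: find the next run of unmarked
     * lines at or after `start`, reporting its extent via out-parameters. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    #define LINE_SIZE  128
    #define BLOCK_SIZE (32 * 1024)
    #define LINES_PER_BLOCK (BLOCK_SIZE / LINE_SIZE)

    struct block {
      uint8_t line_marks[LINES_PER_BLOCK]; /* nonzero: line holds survivor data */
      uint8_t data[BLOCK_SIZE];
    };

    /* Returns true and sets [*hole_start, *hole_end) in line units if a hole
       exists at or after `start`; otherwise returns false. */
    static bool
    find_next_hole (const struct block *b, size_t start,
                    size_t *hole_start, size_t *hole_end)
    {
      size_t i = start;
      while (i < LINES_PER_BLOCK && b->line_marks[i])
        i++;                               /* skip marked lines */
      if (i == LINES_PER_BLOCK)
        return false;                      /* no hole left; take another block */
      *hole_start = i;
      while (i < LINES_PER_BLOCK && !b->line_marks[i])
        i++;                               /* extend over the free run */
      *hole_end = i;
      return true;                         /* bump-pointer allocate within [*hole_start, *hole_end) */
    }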

mitigating fragmentation

Immix still has some challenges regarding fragmentation. There is some loss in which a single (piece of an) object can keep a line marked, wasting any free space on that line. Also, when an object can't fit into a hole, any space left in that hole is lost, at least until the next collection. This loss could also occur for the next hole, and the next and the next and so on until Immix finds a hole that's big enough. In a mark-sweep collector with lazy sweeping, these free extents could instead be placed on freelists and used when needed, but in Immix there is no such facility (by design).

One mitigation for fragmentation risks is "overflow allocation": when allocating an object larger than a line (a medium object), and Immix can't find a hole before the end of the block, Immix allocates into a completely free block. So actually mutator threads allocate into two blocks at a time: one for small objects and medium objects if possible, and the other for medium objects when necessary.

Another mitigation is that large objects are allocated into their own space, so an Immix space will never be used for objects larger than, say, 8kB.

The other mitigation is that Immix can choose to evacuate instead of mark. How does this work? Is it worth it?

stw

This question about the practical tradeoffs involving evacuation is the one I wanted to pose when I started this article; I have gotten to the point of implementing this part of Immix and I have some doubts. But, this article is long enough, and my plane is about to land, so let's revisit this on my return flight. Until then, see you later, allocators!

by Andy Wingo at June 15, 2022 12:47 PM

June 10, 2022

Alex Surkov

Automated accessibility testing

Accessibility Testing

Intro

I think it’s fair to say that every application developer needs to take care of accessibility at some point. Indeed, even if you use an accessible toolkit to create an application, the app isn’t always accessible, simply because a combination of accessible parts is not always accessible itself. In reality, many things can go wrong. Without care, the app may become completely inaccessible as complexity increases.

If you’re a web developer, then you’re (hopefully) already familiar with WCAG, ARIA and other cool stuff that helps to address accessibility issues in web apps. If you are a platform developer, such as on Android or iOS, then you probably know a bazillion tricks to keep mobile apps accessible. You might also know how to run the app under screen readers and other assistive technology software to ensure everything goes smoothly and works as expected. And you might already have started thinking of how you can use automated testing to cover all the accessibility caveats, to avoid regressions and to reduce the overhead of manual testing.

So let’s talk about automated accessibility testing. There’s no universal solution that would embrace each and every platform and every single case. However, there are many existing approaches; some are better than others, and some of them are fairly solid. Having said that, looking at the diversity of the existing solutions and realizing how much manual testing people still do (AAM specs, such as the ARIA or HTML accessibility API mappings, are a great example of this), I think it’d be amazing to systemize the existing techniques and come up with a better, universal solution that would cover the majority of cases. I hope this post can help to find the right way to do this.

What to test

First things come first: what is the scope of automated accessibility testing, i.e. what exactly do we want to test?

Web

Without a doubt, the web is vast and complex and plays a significant role in our lives. It surely has to be accessible. But what is an accessible web, exactly?

The main conductors on the web are web browsers. They make up the web platform by providing all of the tiny building blocks, such as HTML elements to create web content or ARIA attributes to define semantics. Browsers deliver web content to users by rendering it on a screen and expose its semantics to assistive technologies such as screen readers. All these blocks must be accessible; in particular, that means browsers are responsible for exposing all building blocks to assistive technologies correctly. It’s very tempting to say that if a browser is doing the job well, then a web author cannot go wrong by using these blocks (provided, of course, that the web author is not doing anything strange on purpose). It sounds about right and it works nicely in theory. But in practice, accessibility issues suddenly pop up as the complexity of a web app goes up.

Browsers

Browsers are certainly the major use case in web accessibility testing and I’d like to get them covered, simply because the web cannot be accessible without accessible browsers. Also, they already do a decent job on accessibility testing, and we can learn a lot from them. Indeed, they’re all stuffed with accessibility testing solutions, and each has its own test harness for automating the process. Their systems could be unified and adjusted to a broader range of uses.

Web apps

The web applications are the second piece of the web puzzle. They also must be covered.

Web apps are made up of small and accessible building blocks (if the browser does a good job). However, as I previously noted, it is not sufficient to have individually accessible parts; it is necessary that the combinations of those parts be accessible as well. It may sound quite obvious, since QA and end-to-end testing weren’t invented yesterday, but this is something overlooked quite often. So this is one more use case on the table: the overall integrated web application’s accessibility.

Platform

Although the web is vital, there’s a sizable market of desktop and mobile applications which also need accessibility testing. Some desktop/mobile apps use embedded browsers under the hood as a rendering engine, which brings them into the web scope. But speaking generally, desktop/mobile apps are not the web.

Having said that, it’s worth noting that browsers and desktop/mobile applications coexist in the same environment, and they use the very same platform accessibility APIs to expose content to assistive technologies. This means web and desktop/mobile applications have more in common than people usually tend to think. I’m from the browser world and I keep my focus on web accessibility, as many of you probably do, but let’s keep desktop and mobile applications in mind as one more use case. We will have them covered as well.

How to test

The next question to address is how to test accessibility, or how to ensure that an application is accessible. You could test your entire application with a screen reader or a magnifier on your own, and that would be pretty trustworthy, but what exactly should be tested when it comes to automated testing?

Unit testing

Unit testing is the first level of automated testing, and accessibility is no exception. It allows you to perform very low-level testing, such as testing individual C++ classes or JS modules, or testing internal things that are never exposed and can never be reached through any public API because they are purely under-the-hood. As a result, unit testing is a critical component of automated testing.

However, it has nothing to do with accessibility specifically, and there’s nothing here to generalize or systematize for the benefit of accessibility. It’s simply something that all systems must have.

Accessibility APIs mappings

When it comes to automated accessibility testing, the first and probably the best place to start is testing the accessibility APIs. Accessibility APIs are the universal language that any accessible application can be expressed in: if the accessibility properties of the UI look about right, then there’s a good chance the app is accessible. This is, by the way, the most common strategy in accessibility testing in web browsers today.

To give an example of how accessibility APIs testing can be used in practice, you can think of the AAM web specifications. These specs define how ARIA attributes or HTML/SVG elements should be mapped to the accessibility APIs. It is a somewhat restrictive example of the capabilities of accessibility APIs testing, because the AAM specs use only a few things peculiar to accessibility APIs (for example, they miss key things like accessible relations, hierarchies or actions), but it gives a good sense of what kind of things can be tested at the accessibility APIs level.
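
For instance, Core-AAM gives per-platform role mappings that a test can assert directly. The sketch below is hand-written to illustrate the idea; the role names are my recollection of the Core-AAM mapping for role="checkbox" and should be double-checked against the spec before being used as real expectations:

    // Illustrative expectation table for <div role="checkbox">,
    // loosely based on the Core-AAM role mappings (verify against the spec).
    const roleExpectations = {
      atspi: "ROLE_CHECK_BOX",          // Linux (ATK/AT-SPI)
      msaa:  "ROLE_SYSTEM_CHECKBUTTON", // Windows (MSAA/IA2)
      ax:    "AXCheckBox",              // macOS (NSAccessibility)
    };
    // A harness would create the element, ask its platform driver for the
    // exposed role, and compare it with the entry for the current platform.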

Accessibility APIs are complex by nature. They differ from platform to platform, but they share many traits in common, which is not surprising at all, because they are all designed to expose the same thing: the semantics of the user interface.

Here are the main categories you can find in many accessibility APIs.

  • States and properties: examples include the focused or selected states on a menu item, or the accessible name/description properties.
  • Methods or interfaces allow retrieving complex information about accessible elements, for example information about list or grid controls.
  • The relation concept is a key one. It defines how accessible elements relate to each other. In particular they are used to navigate content and/or to get extra information about elements such as labels for a control or headers for a table cell.
  • Accessible trees are a special case of accessible relations. However they are often defined as a separate entity. The accessible tree is a hierarchical representation of the content, and the ATs frequently rely on it.
  • As a rule of thumb there is also special support for text and selection.
  • There are also accessible actions, for example clicking a button or expanding/collapsing a dropdown.
  • Accessible events are used to notify the assistive technologies about app changes.

All of this makes a fairly typical, if not comprehensive, list of what can (or should) be tested when it comes to platform accessibility APIs testing.
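
To make the list above concrete, here is a rough sketch of the kind of checks such a test could perform. The queryAccessible() helper and the property names are hypothetical stand-ins; real harnesses expose the same information through their own APIs:

    // Hypothetical helper returning a platform-neutral view of an element.
    const acc = queryAccessible("#save-button");

    console.assert(acc.role === "button");                  // role (property)
    console.assert(acc.name === "Save");                    // accessible name
    console.assert(acc.states.includes("focusable"));       // state
    console.assert(acc.relations("labelledby").length > 0); // relation
    console.assert(acc.actions.includes("press"));          // accessible action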

Accessibility APIs testing is fairly low-level, which may or may not be a good thing depending on the use case. I would say it’s nearly perfect for testing browsers, for example how they expose HTML elements and such. It should also be the right choice for different kinds of toolkits that provide you with a set of building blocks, for example extra controls on top of HTML5 elements. I’m not quite confident that this type of testing is exactly what web developers look for when it comes to webapp accessibility testing, because it’s fairly low-level and requires understanding of under-the-hood things, but they can certainly benefit from checking accessibility properties on certain elements or structures, for example web components.

Assistive Technologies testing

Assistive Technologies (AT) testing makes up another layer of automated testing. Unlike accessibility APIs testing, this is a great way to ensure your app is spoken exactly as you want by different screen readers, or that it gets zoomed properly by screen magnifiers. Any application, including both browsers and web apps, can benefit from integration AT testing by running individual control elements or separate UI blocks through the ATs.

AT testing can also be used for end-to-end testing. However, as with any end-to-end testing, it cannot be comprehensive because of the steep rise in testing scenarios as the app’s complexity increases. Having said that, it can certainly be helpful to check the most crucial scenarios, and it makes a really great addition to integration-level accessibility APIs testing.

Testing flow

Testing a single HTML page to check an accessible tree and/or its relevant accessibility properties makes a nice pattern of atomic testing, quite similar to unit testing. However, the reality is that not every use case can be reduced to a simple static page, which makes such testing quite restrictive. The real world requires testing accessibility in dynamic scenarios across a full spectrum of cases, such as clicking a button and checking what happens next.

This brings in another important piece of the puzzle: a testing flow, or in other words, a way to control an application so you can test its dynamics and query accessibility properties at the right time.

A typical test flow can be described in three simple steps:

  • Query-n-check: test accessibility properties against expectations.
  • Action: change a value or trigger an action. This represents the dynamics part and describes how a test interacts with an application.
  • On-hold: wait for an event to hold the test execution until the app gets to the right state.

To summarize, there are two key pieces we need for a testing flow. First, the ability to trigger an action: this can be emulating user actions, triggering accessible actions or changing accessible properties, anything that allows the test to operate and control the application. Second, we need the ability to hold the test execution until certain criteria are met. That could be an accessible event, the presence of certain accessible properties on an element, or waiting until a certain text is visible; whatever pauses the test execution until it’s good to proceed.
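
Put together, a single dynamic test could look roughly like the sketch below. All the helper names (queryAccessible, waitForA11yEvent, the "press" action) are hypothetical stand-ins for whatever a concrete harness provides:

    // 1. Query-n-check: the menu button starts out collapsed.
    let button = queryAccessible("#menu-button");
    console.assert(button.states.includes("collapsed"));

    // 2. Action: trigger the accessible "press" action (or emulate a click),
    //    having first armed the wait for the resulting event.
    const expanded = waitForA11yEvent("state-changed:expanded", "#menu-button");
    button.doAction("press");

    // 3. On-hold: wait for the event, then check the new state.
    await expanded;
    button = queryAccessible("#menu-button");
    console.assert(button.states.includes("expanded"));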

What we have now

Let’s take a look at what’s out there already. I don’t know much about desktop or mobile applications, since many of them are not open source, so there’s no chance to sneak a peek. But I think it’s a good idea to start from the web, and from the web browsers in particular, which have made some really good progress on accessibility testing.

AOM

The first thing I would like to mention is AOM. It’s not the browsers themselves, but it’s something closely related. AOM stands for Accessibility Object Model. This is an attempt to expose accessibility properties in browsers in a cross-platform way. Similarly to platform accessibility APIs, AOM defines roles, properties, relations and an accessible tree. You can think of AOM as the web accessibility API, so an AOM with all the typical accessibility features could make a great platform for cross-browser accessibility testing.

Certain things, like platform-dependent mappings, are admittedly not a good fit for AOM. For example, the AAM specs, the accessibility API mappings which define how to expose the web to the platform accessibility APIs, are not a great match for AOM testing: despite the conceptual similarities between platform APIs, they differ in the details. However, if we only care about the logical side, AOM suits nicely. For example, the ARIA name computation algorithms make a great match for AOM testing, and AOM can be used to test the relations between a label and a control; in that case we don’t worry about the platform-specific name of the relation, the only thing we want to test is that the right relations are exposed.

Sadly, this is not something that has been implemented. AOM is a long-term vision, and the main focus so far has been on ARIA reflection, which is prototyped in a number of browsers by now. But the ARIA reflection spec is just a DOM reflection of ARIA attributes, and thus has somewhat lower testing capabilities: for example, HTML elements that don’t have a proper ARIA mapping cannot be tested via ARIA reflection, nor can accessibility events.

So AOM is something that has great testing potential but it’s not yet implemented or even specified.

ATTA

ATTA (Accessible Technology Test Adapter API) was the answer to manual accessibility testing for the ARIA and HTML AAM specifications. Roughly speaking, ATTA defines a protocol used to describe the expected accessibility properties for a given HTML code snippet. The expectations are sent to the ATTA server, which queries an ATTA driver for the accessible tree of the given HTML snippet and then checks it against the given expectations.

ATTA is integrated into the WPT test suite, and it could make a great solution for web accessibility APIs testing, if it worked. There are implementations of ATTA drivers for IAccessible2, UI Automation and ATK, but apparently none of them ever reached ready-to-use status.

So ATTA has a bunch of worthy ideas, like built-in WPT integration and a modular system which lets you plug in drivers for various platform APIs, but sadly it was never finished and there is no longer any active work on it.

Browsers

Let’s take a look at the browsers themselves. Browsers have quite a long history of accessibility support, and they’ve made great progress on accessibility testing.

Firefox

In Firefox all testing is done in JavaScript. It’s worth noting that Gecko’s accessibility core is a massive system that includes practically every feature of any desktop accessibility API, and Gecko has a fairly mature set of cross-platform accessibility interfaces. Because the platform implementations are thin wrappers around the accessibility core, if you test something on the cross-platform layer in Gecko, you can be confident that it will work well on all platforms. Gecko does, however, expose native NSAccessibility objects to JavaScript the same way it does for cross-platform testing. It works nicely and allows one to poke all NSAccessibility attributes and parameterized attributes, as well as listen to the accessibility events. This approach is not portable as is, because it relies on a somewhat ancient Netscape-era technology. It could be adapted to work through WebIDL if you wanted to make it portable to other browsers, though it would still be fairly useless outside the browser world. Nevertheless, it is certainly good to get inspired by.

Here’s an example of a Gecko accessibility test. It represents a typical scenario for accessibility testing: you get a property, call an action, wait for an event, and then make sure the property value was adjusted properly. You can imagine that the cross-platform tests are quite similar to this one.

Gecko implements its own test suite, which provides a number of util functions such as ok() or is() that are responsible for handling and reporting successes or failures. This is the kind of testing system where the test expectations are listed explicitly in the test body; in other words, the test says what to test and what the expectations are. As a direct consequence, if you need to change expectations, you have to adjust the test manually. It’s quite a typical testing system though.
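
For a sense of what that looks like, here is a simplified sketch from memory. Helpers like addAccessibleTask, findAccessibleChildByID, waitForEvent and testStates come from Gecko’s accessibility test harness, but treat the exact signatures below as approximate rather than authoritative:

    addAccessibleTask(
      `<button id="b">Save</button>`,
      async function (browser, accDoc) {
        const button = findAccessibleChildByID(accDoc, "b");
        is(button.name, "Save", "button exposes the right accessible name");

        // Trigger an action and hold the test until the event arrives.
        const focused = waitForEvent(EVENT_FOCUS, "b");
        button.takeFocus();
        await focused;

        testStates(button, STATE_FOCUSED);
      }
    );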

WebKit/Safari

I think it’s fair to say that WebKit, being the engine behind the popular Safari browser, has decent NSAccessibility protocol support, but it also supports ATK and MSAA. The WebKit test suite is rather straightforward and implements its testing capabilities in a cross-platform style: it exposes a helper object to the DOM, and you can query platform-dependent accessible properties/methods from JavaScript. It’s quite similar to what Firefox does.

The test suite itself is organized quite differently from Firefox’s though, which is not surprising. A WebKit test generates output which is compared against expectations stored in files. The test expectations are also listed in the test body, which keeps the approach close to Gecko’s.

Similar to Gecko, WebKit also supports event testing; here’s an example of a typical test. It might look bulky for the simple thing it does, but all of it can be wrapped in a nice promise-based wrapper which makes the test more readable. The most important thing here is that WebKit also supports testing all parts of a typical accessibility API.
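
For comparison, WebKit layout tests talk to a global accessibilityController object injected by the test runner. A minimal check, again approximate and from memory (debug() is one of WebKit’s js-test helpers), looks something like this:

    // Only available inside the WebKit test runner, not in regular pages.
    if (window.accessibilityController) {
      const axButton = accessibilityController.accessibleElementById("b");
      // Whatever the test prints is compared against the -expected.txt file.
      debug("Role: " + axButton.role);
      debug("Title: " + axButton.title);
    }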

Chromium

Beyond low-level C++ unit testing, Chromium relies on platform accessibility APIs testing. It is quite similar to Firefox or WebKit, with one key difference.

Chromium can dump an accessibility tree with the relevant accessibility properties, or record accessibility events, and then compare the result to expectation files. The main gotcha here is that these tests can be rebaselined easily, unlike other kinds of tests. If something changes at the API level, for example if an accessible tree is spec’d out differently, or new properties are added or old ones are removed, then all you need to do is rerun the test and capture its output, which becomes the new expectations. It’s as if you take a snapshot that becomes the new standard, and all following runs are matched against it.

Here’s a typical example of a tree test with mac and win expectation files.

Chromium also offers basic scripting capabilities. Those are mainly mac-targeted though, and thus scoped to NSAccessibility protocol testing. However, they allow testing all bits of the API, including methods, attributes, parameterized attributes, actions and events.
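
To give an idea of the format, an expectation file is a plain-text dump of the tree, with the nesting level encoded as a run of '+' characters. The snippet below is illustrative rather than copied from the Chromium tree, so the exact attribute names may differ:

    AXWebArea
    ++AXGroup
    ++++AXStaticText AXValue='Hello world'
    ++AXButton AXTitle='Save'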

What would make a great test suite?

Let’s pull things together. Accessibility exists within the context of a platform, which glues an application and the assistive technology together via accessibility APIs. We want a platform-wide testing solution that tests platform accessibility APIs. It should be possible to test a variety of things such as accessible trees, properties, methods and events.

The solution should not be limited to just web browsers. It should be capable of covering web applications running in web browsers. We’d also like to test any application running on the system.

Multiple platforms should be supported such as AT-SPI/ATK on Linux, MSAA/IA2 and UIA on Windows and NSAccessibility protocol on Mac. The solution has to be extensible in case we need to support other platforms in future.

Test flow control should be supported out of the box, such as interacting with an app and then waiting for an accessible event. Another example of such interactions would be flow-control directives allowing communication with the assistive technologies. Getting those covered will allow writing end-to-end tests.

Last but not least, easy test rebaselining is a key feature to have. If you ever need to change a test’s expectations, you just rerun the test and record its output; this happens more often than you probably think. Such a test suite lets you adjust expectations with about zero effort.

Chromium accessibility tools

Chromium has decent accessibility tools to inspect a platform accessibility tree and to listen to accessibility events. They are available on all major platforms and can be easily ported to others, essentially to any platform Chromium runs on. They are capable of inspecting any application on the system, including web applications running in a web browser. All major desktop APIs are supported as well, namely AT-SPI on Linux, MSAA/IA2 and UIA on Windows and the NSAccessibility protocol on Mac.

In Chromium these tools are integrated into a test harness to perform platform accessibility testing. The test suite supports test flow instructions and a rebaselining mechanism.

The tools can be beefed up with testing capabilities to become a new test harness, or be easily integrated into existing test suites or CI/CD systems.

The tools are not perfect and are not feature-complete, but they have great potential. Since they are capable of providing all the must-have testing features discussed above, all they need is some love, I think. The tools are open source and anyone can contribute. However, having them as an integral part of the Chromium project makes them look inherently tied to that project, and it doesn’t make life easy for new contributors. It’s possible, however, that eventually the tools could get a new home. If new contributors bring fresh vision and new use cases to shape the tools’ features, this could be a great start for a new open source project that embraces those innovations and hopefully solves the long-standing problem of automated accessibility testing.

Is it worth taking a shot?

Are you ready to join the efforts?

by Alexander Surkov at June 10, 2022 12:50 PM

June 06, 2022

Alejandro Piñeiro

Playing with the rpi4 CPU/GPU frequencies

In recent days I have been testing how modifying the default CPU and GPU frequencies on the rpi4 increases the performance of our reference Vulkan applications. By default Raspbian uses 1500MHz and 500MHz respectively, but with good heat dissipation (a good fan, the rpi400 heat spreader, etc.) you can play a little with those values.
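
For reference, on Raspberry Pi OS these clocks are set in /boot/config.txt. The snippet below just mirrors the values used later in this post; whether you also need over_voltage, and how far you can push things safely, depends on your particular board and cooling, so treat it as an example rather than a recommendation:

    # /boot/config.txt on a Raspberry Pi 4 -- example values only
    arm_freq=1800    # CPU clock, default 1500 MHz
    gpu_freq=750     # GPU/V3D clock, default 500 MHz
    #over_voltage=4  # may be needed for stability at higher clocks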

One of the tools we usually use to check performance changes is gfxreconstruct. This tool allows you to record all the Vulkan calls during the execution of an application, and then replay the captured file. So we have traces of several applications, and we use them to test any hypothetical performance improvement, or to verify that a change doesn’t cause a performance drop.
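
For anyone wanting to reproduce this kind of measurement, the usual workflow is to capture a trace with the GFXReconstruct Vulkan layer and then replay it, roughly like the commands below; the application name is just a placeholder, and the exact environment variables and options are documented in the gfxreconstruct project:

    # Capture: run the application with the GFXReconstruct layer enabled
    VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_gfxreconstruct \
    GFXRECON_CAPTURE_FILE=suntemple.gfxr \
    ./SunTemple

    # Replay the captured Vulkan stream later, e.g. after changing the clocks
    gfxrecon-replay suntemple.gfxr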

So, let’s see what we get if we increase the CPU/GPU frequencies, focusing on the Unreal Engine 4 demos, which are the most shader-intensive:

Unreal Engine 4 demos FPS chart

As expected, with higher clock speeds we see a good performance boost of ~10 FPS in several of these demos.

Some could wonder why the increase in CPU frequency has so little impact. As mentioned, we didn’t get those values from the real applications, but from gfxreconstruct traces, which only capture the Vulkan calls. So during those replays there are no tasks like collision detection, user input, etc. that are usually handled on the CPU. Also, as mentioned, all the Unreal Engine 4 demos use really complex shaders, so the bottleneck there is the GPU.

Let’s move on from the cold numbers and test the real applications. Let’s start with the Unreal Engine 4 SunTemple demo, using the default CPU/GPU frequencies (1500/500):

Even if it runs fairly smoothly most of the time at ~24 FPS, there are some places where it dips below 18 FPS. Let’s see what happens when we increase the CPU/GPU frequencies to 1800/750:

Now the demo runs at ~34 FPS most of the time, and the worst dip is ~24 FPS. It is a lot smoother than before.

Here is another example with the Unreal Engine 4 Shooter demo, already increasing the CPU/GPU frequencies:

Here the FPS never dips below 34 FPS, staying at ~40 FPS most of the time.

It has been about a year and a half since we announced a Vulkan 1.0 driver for the Raspberry Pi 4, and since then we have made significant performance improvements, mostly around our compiler stack, which have notably improved some of these demos. In some cases (like the Unreal Engine 4 Shooter demo) we got a 50%-60% improvement (if you want to know more about the compiler work, you can read the details here).

In this post we can see how, on top of that work and by taking advantage of increased CPU and GPU frequencies, we can really start to get reasonable framerates in more demanding demos. Even if this is still at low resolutions (all the demos in this post were running at 640×480), it is still great to see this on a Raspberry Pi.

by infapi00 at June 06, 2022 09:58 AM