Planet Igalia

June 06, 2024

Víctor Jáquez

GStreamer Vulkan Operation API

Two weeks ago the GStreamer Spring Hackfest took place in Thessaloniki, Greece. I had a great time. I hacked a bit on VA, Vulkan and my toy, planet-rs, but mostly I ate delicious Greek food ☻. A big thanks to our hosts: Vivia, Jordan and Sebastian!

GStreamer Spring Hackfest
2024
First day of the GStreamer Spring Hackfest 2024 - https://floss.social/@gstreamer/112511912596084571

And now, writing this supposed small note, I recalled that I have in my to-do list an item to write a comment about GstVulkanOperation, an addition to GstVulkan API which helps with the synchronization of operations on frames, in order to enable Vulkan Video.

Originally, GstVulkan API didn’t provide almost any synchronization operation, beside fences, and that appeared to be enough for elements, since they do simple Vulkan operations. Nonetheless, as soon as we enabled VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT feature, which reports resource access conflicts due to missing or incorrect synchronization operations between action [*], a sea of hazard operation warnings drowned us [*].

Hazard operations are a sequence of read/write commands in a memory area, such as an image, that might be re-ordered, or racy even.

Why are those hazard operations reported by the Vulkan Validation Layer, if the programmer pushes the commands to execute in queue in order? Why is explicit synchronization required? Because, as the great blog post from Hans-Kristian Arntzen, Yet another blog explaining Vulkan synchronization, (make sure you read it!) states:

[…] all commands in a queue execute out of order. Reordering may happen across command buffers and even vkQueueSubmits

In order to explain how synchronization is done in Vulkan, allow me to yank a couple definitions stated by the specification:

Commands are instructions that are recorded in a device’s queue. There are four types of commands: action, state, synchronization and indirection. Synchronization commands impose ordering constraints on action commands, by introducing explicit execution and memory dependencies.

Operation is an arbitrary amount of commands recorded in a device’s queue.

Since the driver can reorder commands (perhaps for better performance, dunno), we need to send explicit synchronization commands to the device’s queue to enforce a specific sequence of action commands.

Nevertheless, Vulkan doesn’t offer fine-grained dependencies between individual operations. Instead, dependencies are expressed as a relation of two elements, where each element is composed by the intersection of scope and operation. A scope is a concept in the specification that, in practical terms, can be either pipeline stage (for execution dependencies), or both pipeline stage and memory access type (for memory dependencies).

First let’s review execution dependencies through pipeline stages:

Every command submitted to a device’s queue goes through a sequence of steps known as pipeline stages. This sequence of steps is one of the very few implicit ordering guarantees that Vulkan has. Draw calls, copy commands, compute dispatches, all go through certain sequential stages, which amount of stages to cover depends on the specific command and the current command buffer state.

In order to visualize an abstract execution dependency let’s imagine two compute operations and the first must happen before the second.

Operation 1
Sync command
Operation 2
  1. The programmer has to specify the Sync command in terms of two scopes (Scope 1 and Scope 2), in this execution dependency case, two pipeline stages.
  2. The driver generates an intersection between commands in Operation 1 and Scope 1 defined as Scoped operation 1. The intersection contains all the commands in Operation 1 that go through up to the pipeline stage defined in Scope 1. The same is done with Operation 2 and Scope 2 generating Scoped operation 2.
  3. Finally, we got an execution dependency that guarantees that Scoped operation 1 happens before Scoped operation 2.

Now let’s talk about memory dependencies:

First we need to understand the concepts of memory availability and visibility. Their formal definition in Vulkan are a bit hard to grasp since they come from the Vulkan memory model, which is intended to abstract all the ways of how hardware access memory. Perhaps we could say that availability is the operation that assures the existence of the required memory; while visibility is the operation that assures it’s possible to read/write the data in that memory area.

Memory dependencies are limited the Operation 1 that be done before memory availability and Operation 2 that have to be done after its visibility.

But again, there’s no fine-grained way to declare that memory dependency. Instead, there are memory access types, which are functions used by descriptor types, or functions for pipeline stage to access memory, and they are used as access scopes.

All in all, if a synchronization command defining a memory dependency between two operations, it’s composed by the intersection of between each command and a pipeline stage, intersected with the memory access type associated with the memory processed by those commands.

Now that the concepts are more or less explained we could see those concepts expressed in code. The synchronization command for execution and memory dependencies is defined by VkDependencyInfoKHR. And it contains a set of barrier arrays, for memory, buffers and images. Barriers express the relation of dependency between two operations. For example, Image barriers use VkImageMemoryBarrier2 which contain the mask for source pipeline stage (to define Scoped operation 1), and the mask for the destination pipeline stage (to define Scoped operation 2); the mask for source memory access type and the mask for the destination memory access to define access scopes; and also layout transformation declaration.

A Vulkan synchronization example from Vulkan Documentation wiki:

vkCmdDraw(...);

... // First render pass teardown etc.

VkImageMemoryBarrier2KHR imageMemoryBarrier = {
...
.srcStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT_KHR,
.dstStageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT_KHR,
.dstAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT_KHR,
.oldLayout = VK_IMAGE_LAYOUT_READ_ONLY_OPTIMAL,
.newLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL
/* .image and .subresourceRange should identify image subresource accessed */};

VkDependencyInfoKHR dependencyInfo = {
...
1, // imageMemoryBarrierCount
&imageMemoryBarrier, // pImageMemoryBarriers
...
}

vkCmdPipelineBarrier2KHR(commandBuffer, &dependencyInfo);

... // Second render pass setup etc.

vkCmdDraw(...);

First draw samples a texture in the fragment shader. Second draw writes to that texture as a color attachment.

This is a Write-After-Read (WAR) hazard, which you would usually only need an execution dependency for - meaning you wouldn’t need to supply any memory barriers. In this case you still need a memory barrier to do a layout transition though, but you don’t need any access types in the src access mask. The layout transition itself is considered a write operation though, so you do need the destination access mask to be correct - or there would be a Write-After-Write (WAW) hazard between the layout transition and the color attachment write.

Other explicit synchronization mechanisms, along with barriers, are semaphores and fences. Semaphores are a synchronization primitive that can be used to insert a dependency between operations without notifying the host; while fences are a synchronization primitive that can be used to insert a dependency from a queue to the host. Semaphores and fences are expressed in the VkSubmitInfo2KHR structure.

As a preliminary conclusion, synchronization in Vulkan is hard and a helper API would be very helpful. Inspired by FFmpeg work done by Lynne, I added GstVulkanOperation object helper to GStreamer Vulkan API.

GstVulkanOperation object helper aims to represent an operation in the sense of the Vulkan specification mentioned before. It owns a command buffer as public member where external commands can be pushed to the associated device’s queue.

It has a set of methods:

Internally, GstVulkanOperation contains two arrays:

  1. The array of dependency frames, which are the set of frames, each representing an operation, which will hold dependency relationships with other dependency frames.

    gst_vulkan_operation_add_dependency_frame appends frames to this array.

    When calling gst_vulkan_operation_end the frame’s barrier state for each frame in the array is updated.

    Also, each dependency frame creates a timeline semaphore, which will be signaled when a command, associated with the frame, is executed in the device’s queue.

  2. The array of barriers, which contains a list of synchronization commands. gst_vulkan_operation_add_frame_barrier fills and appends a VkImageMemoryBarrier2KHR associated with a frame, which can be in the array of dependency frames.

Here’s a generic view of video decoding example:

gst_vulkan_operation_begin (priv->exec, ...);

cmd_buf = priv->exec->cmd_buf->cmd;

gst_vulkan_operation_add_dependency_frame (exec, out,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR);

/* assume a engine where out frames can be used for DPB frames, */
/* so a barrier for layout transition is required */
gst_vulkan_operation_add_frame_barrier (exec, out,
VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR,
VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR, NULL);

for (i = 0; i < dpb_size; i++) {
gst_vulkan_operation_add_dependency_frame (exec, dpb_frame,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR);
}

barriers = gst_vulkan_operation_retrieve_image_barriers (exec);
vkCmdPipelineBarrier2 (cmd_buf, &(VkDependencyInfo) {
...
.pImageMemoryBarriers = barriers->data,
.imageMemoryBarrierCount = barriers->len,
});
g_array_unref (barriers);

vkCmdBeginVideoCodingKHR (cmd_buf, &decode_start);
vkCmdDecodeVideoKHR (cmd_buf, &decode_info);
vkCmdEndVideoCodingKHR (cmd_buf, &decode_end);

gst_vulkan_operation_end (exec, ...);

Here, just one memory barrier is required for memory layout transition, but semaphores are required to signal when an output frame and its DPB frames are processed, and later, the output frame can be used as a DPB frame. Otherwise, the output frame might not be fully reconstructed with it’s used as DPB for the next output frame, generating only noise.

And that’s all. Thank you.

June 06, 2024 12:00 AM

June 05, 2024

Alberto Garcia

More ways to install software in SteamOS: Distrobox and Nix

Introduction

In my previous post I talked about how to use systemd-sysext to add software to the Steam Deck without modifying the root filesystem. In this post I will give a brief overview of two additional methods.

Distrobox

distrobox is a tool that uses containers to create a mutable environment on top of your OS.

Distrobox running in SteamOS

With distrobox you can open a terminal with your favorite Linux distro inside, with full access to the package manager and the ability to install additional software. Containers created by distrobox are integrated with the system so apps running inside have normal access to the user’s home directory and the Wayland/X11 session.

Since these containers are not stored in the root filesystem they can survive an OS update and continue to work fine. For this reason they are particularly suited to systems with an immutable root filesystem such as Silverblue, Endless OS or SteamOS.

Starting from SteamOS 3.5 the system comes with distrobox (and podman) preinstalled and it can be used right out of the box without having to do any previous setup.

For example, in order to create a Debian bookworm container simply open a terminal and run this:

$ distrobox create -i debian:bookworm debbox

Here debian:bookworm is the image that this container is created from (debian is the name and bookworm is the tag, see the list of supported tags here) and debbox is the name that is given to this new container.

Once the container is created you can enter it:

$ distrobox enter debbox

Or from the ‘Debian’ entry in the desktop menu -> Lost & Found.

Once inside the container you can run your Debian commands normally:

$ sudo apt update
$ sudo apt install vim-gtk3

Nix

Nix is a package manager for Linux and other Unix-like systems. It has the property that it can be installed alongside the official package manager of any distribution, allowing the user to add software without affecting the rest of the system.

Nix running in SteamOS

Nix installs everything under the /nix directory, and packages are made available to the user through a new entry in the PATH and a ~/.nix-profile symlink stored in the home directory.

Nix is more things, including the basis of the NixOS operating system. Explaning Nix in more detail is beyond the scope of this blog post, but for SteamOS users these are perhaps its most interesting properties:

  • Nix is self-contained: all packages and their dependencies are installed under /nix.
  • Unlike software installed with pacman, Nix survives OS updates.
  • Unlike podman / distrobox, Nix does not create any containers. All packages have normal access to the rest of the system, just like native SteamOS packages.
  • Nix has a very large collection of packages, here is a search engine: https://search.nixos.org/packages

The only thing that Nix needs from SteamOS is help to set up the /nix directory so its contents are not stored in the root filesystem. This is already happening starting from SteamOS 3.5 so you can install Nix right away in single-user mode:

$ sudo chown deck:deck /nix
$ wget https://nixos.org/nix/install
$ sh ./install --no-daemon

This installs Nix and adds a line to ~/.bash_profile to set up the necessary environment variables. After that you can log in again and start using it. Here’s a very simple example (refer to the official documentation for more details):

# Install and run Midnight Commander
$ nix-env -iA nixpkgs.mc
$ mc

# List installed packages
$ nix-env -q
mc-4.8.31
nix-2.21.1

# Uninstall Midnight Commander
$ nix-env -e mc-4.8.31

What we have seen so far is how to install Nix in single-user mode, which is the simplest one and probably good enough for a single-user machine like the Steam Deck. The Nix project however recommends a multi-user installation, see here for the reasons.

Unfortunately the official multi-user installer does not work out of the box on the Steam Deck yet, but if you want to go the multi-user way you can use the Determinate Systems installer: https://github.com/DeterminateSystems/nix-installer

Conclusion

Distrobox and Nix are useful tools and they give SteamOS users the ability to add additional software to the system without having to modify the base operating system.

While for graphical applications the recommended way to install third-party software is still Flatpak, Distrobox and Nix give the user additional flexibility and are particularly useful for installing command-line utilities and other system tools.

by berto at June 05, 2024 03:53 PM

May 27, 2024

Andy Wingo

cps in hoot

Good morning good morning! Today I have another article on the Hoot Scheme-to-Wasm compiler, this time on Hoot’s use of the continuation-passing-style (CPS) transformation.

calls calls calls

So, just a bit of context to start out: Hoot is a Guile, Guile is a Scheme, Scheme is a Lisp, one with “proper tail calls”: function calls are either in tail position, syntactically, in which case they are tail calls, or they are not in tail position, in which they are non-tail calls. A non-tail call suspends the calling function, putting the rest of it (the continuation) on some sort of stack, and will resume when the callee returns. Because non-tail calls push their continuation on a stack, we can call them push calls.

(define (f)
  ;; A push call to g, binding its first return value.
  (define x (g))
  ;; A tail call to h.
  (h x))

Usually the problem in implementing Scheme on other language run-times comes in tail calls, but WebAssembly supports them natively (except on JSC / Safari; should be coming at some point though). Hoot’s problem is the reverse: how to implement push calls?

The issue might seem trivial but it is not. Let me illustrate briefly by describing what Guile does natively (not compiled to WebAssembly). Firstly, note that I am discussing residual push calls, by which I mean to say that the optimizer might remove a push call in the source program via inlining: we are looking at those push calls that survive the optimizer. Secondly, note that native Guile manages its own stack instead of using the stack given to it by the OS; this allows for push-call recursion without arbitrary limits. It also lets Guile capture stack slices and rewind them, which is the fundamental building block we use to implement exception handling, Fibers and other forms of lightweight concurrency.

The straightforward function call will have an artificially limited total recursion depth in most WebAssembly implementations, meaning that many idiomatic uses of Guile will throw exceptions. Unpleasant, but perhaps we could stomach this tradeoff. The greater challenge is how to slice the stack. That I am aware of, there are three possible implementation strategies.

generic slicing

One possibility is that the platform provides a generic, powerful stack-capture primitive, which is what Guile does. The good news is that one day, the WebAssembly stack-switching proposal should provide this too. And in the meantime, the so-called JS Promise Integration (JSPI) proposal gets close: if you enter Wasm from JS via a function marked as async, and you call out to JavaScript to a function marked as async (i.e. returning a promise), then on that nested Wasm-to-JS call, the engine will suspend the continuation and resume it only when the returned promise settles (i.e. completes with a value or an exception). Each entry from JS to Wasm via an async function allocates a fresh stack, so I understand you can have multiple pending promises, and thus multiple wasm coroutines in progress. It gets a little gnarly if you want to control when you wait, for example if you might want to wait on multiple promises; in that case you might not actually mark promise-returning functions as async, and instead import an async-marked async function waitFor(p) { return await p} or so, allowing you to use Promise.race and friends. The main problem though is that JSPI is only for JavaScript. Also, its stack sizes are even smaller than the the default stack size.

instrumented slicing

So much for generic solutions. There is another option, to still use push calls from the target machine (WebAssembly), but to transform each function to allow it to suspend and resume. This is what I think of as Joe Marshall’s stack trick (also see §4.2 of the associated paper). The idea is that although there is no primitive to read the whole stack, each frame can access its own state. If you insert a try/catch around each push call, the catch handler can access local state for activations of that function. You can slice a stack by throwing a SaveContinuation exception, in which each frame’s catch handler saves its state and re-throws. And if we want to avoid exceptions, we can use checked returns as Asyncify does.

I never understood, though, how you resume a frame. The Generalized Stack Inspection paper would seem to indicate that you need the transformation to introduce a function to run “the rest of the frame” at each push call, which becomes the Invoke virtual method on the reified frame object. To avoid code duplication you would have to make normal execution flow run these Invoke snippets as well, and that might undo much of the advantages. I understand the implementation that Joe Marshall was working on was an interpreter, though, which bounds the number of sites needing such a transformation.

cps transformation

The third option is a continuation-passing-style transformation. A CPS transform results in a program whose procedures “return” by tail-calling their “continuations”, which themselves are procedures. Taking our previous example, a naïve CPS transformation would reify the following program:

(define (f' k)
  (g' (lambda (x) (h' k x))))

Here f' (“f-prime”) receives its continuation as an argument. We call g', for whose continuation argument we pass a closure. That closure is the return continuation of g, binding a name to its result, and then tail-calls h with respect to f. We know their continuations are the same because it is the same binding, k.

Unfortunately we can’t really slice abitrary ranges of a stack with the naïve CPS transformation: we can only capture the entire continuation, and can’t really inspect its structure. There is also no way to compose a captured continuation with the current continuation. And, in a naïve transformation, we would be constantly creating lots of heap allocation for these continuation closures; a push call effectively pushes a frame onto the heap as a closure, as we did above for g'.

There is also the question of when to perform the CPS transform; most optimizing compilers would like a large first-order graph to work on, which is out of step with the way CPS transformation breaks functions into many parts. Still, there is a nugget of wisdom here. What if we preserve the conventional compiler IR for most of the pipeline, and only perform the CPS transformation at the end? In that way we can have nice SSA-style optimizations. And, for return continuations of push calls, what if instead of allocating a closure, we save the continuation data on an explicit stack. As Andrew Kennedy notes, closures introduced by the CPS transform follow a stack discipline, so this seems promising; we would have:

(define (f'' k)
  (push! k)
  (push! h'')
  (g'' (lambda (x)
         (define h'' (pop!))
         (define k (pop!))
         (h'' k x))))

The explicit stack allows for generic slicing, which makes it a win for implementing delimited continuations.

hoot and cps

Hoot takes the CPS transformation approach with stack-allocated return closures. In fact, Hoot goes a little farther, too far probably:

(define (f''')
  (push! k)
  (push! h''')
  (push! (lambda (x)
           (define h'' (pop!))
           (define k (pop!))
           (h'' k x)))
  (g'''))

Here instead of passing the continuation as an argument, we pass it on the stack of saved values. Returning pops off from that stack; for example, (lambda () 42) would transform as (lambda () ((pop!) 42)). But some day I should go back and fix it to pass the continuation as an argument, to avoid excess stack traffic for leaf function calls.

There are some gnarly details though, which I know you are here for!

splits

For our function f, we had to break it into two pieces: the part before the push-call to g and the part after. If we had two successive push-calls, we would instead split into three parts. In general, each push-call introduces a split; let us use the term tails for the components produced by a split. (You could also call them continuations.) How many tails will a function have? Well, one for the entry, one for each push call, and one any time control-flow merges between two tails. This is a fixpoint problem, given that the input IR is a graph. (There is also some special logic for call-with-prompt but that is too much detail for even this post.)

where to save the variables

Guile is a dynamically-typed language, having a uniform SCM representation for every value. However in the compiler and run-time we can often unbox some values, generally as u64/s64/f64 values, but also raw pointers of some specific types, some GC-managed and some not. In native Guile, we can just splat all of these data members into 64-bit stack slots and rely on the compiler to emit stack maps to determine whether a given slot is a double or a tagged heap object reference or what. In WebAssembly though there is no sum type, and no place we can put either a u64 or a (ref eq) value. So we have not one stack but three (!) stacks: one for numeric values, implemented using a Wasm memory; one for (ref eq) values, using a table; and one for return continuations, because the func type hierarchy is disjoin from eq. It’s.... it’s gross? It’s gross.

what variables to save

Before a push-call, you save any local variables that will be live after the call. This is also a flow analysis problem. You can leave off constants, and instead reify them anew in the tail continuation.

I realized, though, that we have some pessimality related to stacked continuations. Consider:

(define (q x)
  (define y (f))
  (define z (f))
  (+ x y z))

Hoot’s CPS transform produces something like:

(define (q0 x)
  (save! x)
  (save! q1)
  (f))

(define (q1 y)
  (restore! x)
  (save! x)
  (save! y)
  (save! q2)
  (f))

(define (q2 z)
  (restore! x)
  (restore! y)
  ((pop!) (+ x y z)))

So q0 saved x, fine, indeed we need it later. But q1 didn’t need to restore x uselessly, only to save it again on q2‘s behalf. Really we should be applying a stack discipline for saved data within a function. Given that the source IR is a graph, this means another flow analysis problem, one that I haven’t thought about how to solve yet. I am not even sure if there is a solution in the literature, given that the SSA-like flow graphs plus tail calls / CPS is a somewhat niche combination.

calling conventions

The continuations introduced by CPS transformation have associated calling conventions: return continuations may have the generic varargs type, or the compiler may have concluded they have a fixed arity that doesn’t need checking. In any case, for a return, you call the return continuation with the returned values, and the return point then restores any live-in variables that were previously saved. But for a merge between tails, you can arrange to take the live-in variables directly as parameters; it is a direct call to a known continuation, rather than an indirect call to an unknown call site.

cps soup?

Guile’s intermediate representation is called CPS soup, and you might wonder what relationship that CPS has to this CPS. The answer is not much. The continuations in CPS soup are first-order; a term in one function cannot continue to a continuation in another function. (Inlining and contification can merge graphs from different functions, but the principle is the same.)

It might help to explain that it is the same relationship as it would be if Guile represented programs using SSA: the Hoot CPS transform runs at the back-end of Guile’s compilation pipeline, where closures representations have already been made explicit. The IR is still direct-style, just that syntactically speaking, every call in a transformed program is a tail call. We had to introduce save and restore primitives to implement the saved variable stack, and some other tweaks, but generally speaking, the Hoot CPS transform ensures the run-time all-tail-calls property rather than altering the compile-time language; a transformed program is still CPS soup.

fin

Did we actually make the right call in going for a CPS transformation?

I don’t have good performance numbers at the moment, but from what I can see, the overhead introduced by CPS transformation can impose some penalties, even 10x penalties in some cases. But some results are quite good, improving over native Guile, so I can’t be categorical.

But really the question is, is the performance acceptable for the functionality, and there I think the answer is more clear: we have a port of Fibers that I am sure Spritely colleagues will be writing more about soon, we have good integration with JavaScript promises while not relying on JSPI or Asyncify or anything else, and we haven’t had to compromise in significant ways regarding the source language. So, for now, I am satisfied, and looking forward to experimenting with the stack slicing proposal as it becomes available.

Until next time, happy hooting!

by Andy Wingo at May 27, 2024 12:36 PM

May 24, 2024

Andy Wingo

hoot's wasm toolkit

Good morning! Today we continue our dive into the Hoot Scheme-to-WebAssembly compiler. Instead of talking about Scheme, let’s focus on WebAssembly, specifically the set of tools that we have built in Hoot to wrangle Wasm. I am peddling a thesis: if you compile to Wasm, probably you should write a low-level Wasm toolchain as well.

(Incidentally, some of this material was taken from a presentation I gave to the Wasm standardization organization back in October, which I think I haven’t shared yet in this space, so if you want some more context, have at it.)

naming things

Compilers are all about names: definitions of globals, types, local variables, and so on. An intermediate representation in a compiler is a graph of definitions and uses in which the edges are names, and the set of possible names is generally unbounded; compilers make more names when they see fit, for example when copying a subgraph via inlining, and remove names if they determine that a control or data-flow edge is not necessary. Having an unlimited set of names facilitates the graph transformation work that is the essence of a compiler.

Machines, though, generally deal with addresses, not names; one of the jobs of the compiler back-end is to tabulate the various names in a compilation unit, assigning them to addresses, for example when laying out an ELF binary. Some uses may refer to names from outside the current compilation unit, as when you use a function from the C library. The linker intervenes at the back-end to splice in definitions for dangling uses and applies the final assignment of names to addresses.

When targetting Wasm, consider what kinds of graph transformations you would like to make. You would probably like for the compiler to emit calls to functions from a low-level run-time library written in wasm. Those functions are probably going to pull in some additional definitions, such as globals, types, exception tags, and so on. Then once you have your full graph, you might want to lower it, somehow: for example, you choose to use the stringref string representation, but browsers don’t currently support it; you run a post-pass to lower to UTF-8 arrays, but then all your strings are not constant, meaning they can’t be used as global initializers; so you run another post-pass to initialize globals in order from the start function. You might want to make other global optimizations as well, for example to turn references to named locals into unnamed stack operands (not yet working :).

Anyway what I am getting at is that you need a representation for Wasm in your compiler, and that representation needs to be fairly complete. At the very minimum, you need a facility to transform that in-memory representation to the standard WebAssembly text format, which allows you to use a third-party assembler and linker such as Binaryen’s wasm-opt. But since you have to have the in-memory representation for your own back-end purposes, probably you also implement the names-to-addresses mapping that will allow you to output binary WebAssembly also. Also it could be that Binaryen doesn’t support something you want to do; for example Hoot uses block parameters, which are supported fine in browsers but not in Binaryen.

(I exaggerate a little; Binaryen is a more reasonable choice now than it was before the GC proposal was stabilised. But it has been useful to be able to control Hoot’s output, for example as the exception-handling proposal has evolved.)

one thing leads to another

Once you have a textual and binary writer, and an in-memory representation, perhaps you want to be able to read binaries as well; and perhaps you want to be able to read text. Reading the text format is a little annoying, but I had implemented it already in JavaScript a few years ago; and porting it to Scheme was a no-brainer, allowing me to easily author the run-time Wasm library as text.

And so now you have the beginnings of a full toolchain, built just out of necessity: reading, writing, in-memory construction and transformation. But how are you going to test the output? Are you going to require a browser? That’s gross. Node? Sure, we have to check against production Wasm engines, and that’s probably the easiest path to take; still, would be nice if this were optional. Wasmtime? But that doesn’t do GC.

No, of course not, you are a dirty little compilers developer, you are just going to implement a little wasm interpreter, aren’t you. Of course you are. That way you can build nice debugging tools to help you understand when things go wrong. Hoot’s interpreter doesn’t pretend to be high-performance—it is not—but it is simple and it just works. Massive kudos to Spritely hacker David Thompson for implementing this. I think implementing a Wasm VM also had the pleasant side effect that David is now a Wasm expert; implementation is the best way to learn.

Finally, one more benefit of having a Wasm toolchain as part of the compiler: %inline-wasm. In my example from last time, I had this snippet that makes a new bytevector:

(%inline-wasm
 '(func (param $len i32) (param $init i32)
    (result (ref eq))
    (struct.new
     $mutable-bytevector
     (i32.const 0)
     (array.new $raw-bytevector
                (local.get $init)
                (local.get $len))))
 len init)

%inline-wasm takes a literal as its first argument, which should parse as a Wasm function. Parsing guarantees that the wasm is syntactically valid, and allows the arity of the wasm to become apparent: we just read off the function’s type. Knowing the number of parameters and results is one thing, but we can do better, in that we also know their type, which we use for intentional types, requiring in this case that the parameters be exact integers which get wrapped to the signed i32 range. The resulting term is spliced into the CPS graph, can be analyzed for its side effects, and ultimately when written to the binary we replace each local reference in the Wasm with a reference of the appropriate local variable. All this is possible because we have the tools to work on Wasm itself.

fin

Hoot’s Wasm toolchain is about 10K lines of code, and is fairly complete. I think it pays off for Hoot. If you are building a compiler targetting Wasm, consider budgetting for a 10K SLOC Wasm toolchain; you won’t regret it.

Next time, an article on Hoot’s use of CPS. Until then, happy hacking!

by Andy Wingo at May 24, 2024 10:37 AM

Víctor Jáquez

GStreamer Hackfest 2024

Last weeks were a bit hectic. First, with a couple friends we biked the southwest of the Netherlands for almost a week. The next week, the last one, I attended the 2024 Display Next Hackfest

This week was Igalia’s Assembly meetings, and next week, along with other colleagues, I’ll be in Thessaloniki for the GStreamer Spring Hackfest

I’m happy to meet again friends from the GStreamer community and talk and move things forward related with Vulkan, VA-API, KMS, video codecs, etc.

May 24, 2024 12:00 AM

May 23, 2024

Patrick Griffis

Introducing the WebKit Container SDK

Developing WebKitGTK and WPE has always had challenges such as the amount of dependencies or it’s fairly complex C++ codebase which not all compiler versions handle well. To help with this we’ve made a new SDK to make it easier.

Current Solutions

There have always been multiple ways to build WebKit and its dependencies on your host however this was never a great developer experience. Only very specific hosts could be “supported”, you often had to build a large number of dependencies, and the end result wasn’t very reproducable for others.

The current solution used by default is a Flatpak based one. This was a big improvement for ease of use and excellent for reproducablity but it introduced many challenges doing development work. As it has a strict sandbox and provides read-only runtimes it was difficult to use complex tooling/IDEs or develop third party libraries in it.

The new SDK tries to take a middle ground between those two alternatives, isolating itself from the host to be somewhat reproducable, yet being a mutable environment to be flexible enough for a wide range of tools and workflows.

The WebKit Container SDK

At the core it is an Ubuntu OCI image with all of the dependencies and tooling needed to work on WebKit. On top of this we added some scripts to run/manage these containers with podman and aid in developing inside of the container. It’s intention is to be as simple as possible and not change traditional development workflows.

You can find the SDK and follow the quickstart guide on our GitHub: https://github.com/Igalia/webkit-container-sdk

The main requirements is that this only works on Linux with podman 4.0+ installed. For example Ubuntu 23.10+.

In the most simple case, once you clone https://github.com/Igalia/webkit-container-sdk.git, using the SDK can be a few commands:

source /your/path/to/webkit-container-sdk/register-sdk-on-host.sh
wkdev-create --create-home
wkdev-enter

From there you can use WebKit’s build scripts (./Tools/Scripts/build-webkit --gtk) or CMake. As mentioned before it is an Ubuntu installation so you can easily install your favorite tools directly like VSCode. We even provide a wkdev-setup-vscode script to automate that.

Advanced Usage

Disposibility

A workflow that some developers may not be familiar with is making use of entirely disposable development environments. Since these are isolated containers you can easily make two. This allows you to do work in parallel that would interfere with eachother while not worrying about it as well as being able to get back to a known good state easily:

wkdev-create --name=playground1
wkdev-create --name=playground2

podman rm playground1 # You would stop first if running.
wkdev-enter --name=playground2

Working on Dependencies

An important part of WebKit development is working on the dependencies of WebKit rather than itself, either for debugging or for new features. This can be difficult or error-prone with previous solutions. In order to make this easier we use a project called JHBuild which isn’t new but works well with containers and is a simple solution to work on our core dependencies.

Here is an example workflow working on GLib:

wkdev-create --name=glib
wkdev-enter --name=glib

# This will clone glib main, build, and install it for us. 
jhbuild build glib

# At this point you could simply test if a bug was fixed in a different versin of glib.
# We can also modify and debug glib directly. All of the projects are cloned into ~/checkout.
cd ~/checkout/glib

# Modify the source however you wish then install your new version.
jhbuild make

Remember that containers are isoated from each other so you can even have two terminals open with different builds of glib. This can also be used to test projects like Epiphany against your build of WebKit if you install it into the JHBUILD_PREFIX.

To Be Continued

In the next blog post I’ll document how to use VSCode inside of the SDK for debugging and development.

May 23, 2024 04:00 AM

May 22, 2024

Frédéric Wang

Flygskam and o caminho da Corunha

Prolegomenon

Early next June, I’m traveling from Paris to A Coruña for the Web Engines Hackfest and other internal Igalia events. In recent years I’ve done it by train, as previously mentioned. Some colleagues at Igalia were curious about it so I decided to write this blog post, investigating possible ways to do it from various European places. I wish this can also motivate more people at Igalia and beyond, who are still hesistant to give up alternatives with heavier carbon footprint.

In addition to various trip planners, I’ve also used this nice map from Wikipedia, which gives a good overview of high-speed rail in Europe:

I also sought advice from Nicolò Ribaudo who is quite familiar with train traveling and provided useful recommendations. In particular, he mentioned the ÖBB trip planner, which seems quite efficient combining trains from multiple operators.

I’ve focused on big european cities (with airports) that are close to Spain but this is definitely not exhaustive. There is probably a lot more to discuss beyong trip planning, but hopefully that can be a good starting point.

Paris as a departure or connection

Based on my experience traveling from Paris, I found these direct trains between big cities:

  • Renfe offers several AVE and Alvia trains traveling every days between A Coruña and Madrid, with a duration 3h30-4h 1. There are two important thing to note:
    1. These local trains are only avalailble for sell maybe 2 months in advance at best.
    2. These trains are connecting to Madrid Chamartín which is maybe 30 minutes away from Madrid Atocha by the Madrid metro.
  • Renfe also proposes even more options (say one each half hour during the day) between Barcelona and Madrid Atocha, with a duration of 2h30-3h 2.
  • The SNCF proposes two or three direct trains during the day between Paris Gare de Lyon and Barcelona, with a duration of 6h30-7h. If you are coming from Paris and want to to take a train to Madrid, you will likely have to cross the station and pass some x-ray baggage scanner, so be sure to keep enough time for the connection.
  • The Eurostar offers several options during the day to connect Paris with cities below. They connect to Gare du Nord or Gare de l’Est, which are very close to each other but ~30 minutes away from Gare de Lyon by public transport.

Personally, I’m doing this one-day-and-half trip (inbound trip is similar):

  1. Take the train from Paris (9:42 AM) to Barcelona (4:33 PM).
  2. Keep enough time for the Barcelona Sants connection.
  3. Take a train from Barcelona to Madrid in the evening.
  4. Stay one night in Madrid.
  5. Take a train from Madrid to A Coruña in the morning.

From London, Amsterdam, Brussels, Berlin one could instead do a two-days trip:

  1. Travel to Paris in the morning 3.
  2. Keep enough time for the Paris connection.
  3. Take the train from Paris (2:42PM) to Barcelona (9:27PM).
  4. Stay one night in Barcelona.
  5. Travel from Barcelona to Madrid Atocha.
  6. Keep enough time for the Madrid Metro connection.
  7. Travel from Madrid Chamartín to A Coruña.

I also looked at the trip with the minimum number of connections to go to Barcelona from big cities in Switzerland and a a similar traject is possible. See later for alternatives.

Finally, Nicolò mentioned that ÖBB recently started running a night train from Berlin to Paris, which you can probably use to do a similar trip as mine.

Estimate of CO2 emissions

In order to estimate CO2 emission for the trips suggested in the previous section, I compiled information from different sources:

There would be a lot to discuss about the methodology but the goal here is only to give a rough idea. Anyway, below is an estimate of kilograms of CO2 per passenger for some of the train trips previously mentioned:

  Eurostar SNCF Ecopassenger 4 Ecotree
Berlin ↔ Cologne - - 17-19 1-3
Cologne ↔ Paris 5.2 5.2 7.4 1-3
London ↔ Paris 2.4 2.4 1.7 1-3
Brussels ↔ Paris 1.6 1.8 1.8 1-2
Amsterdam ↔ Paris 2.6 2.9 9.3-9.5 1-3
Paris ↔ Barcelona - 3.8 - 1-6
Barcelona ↔ Madrid - - 17-20 1-6
Madrid ↔ A Coruña - - - 1-3

The best/worst cases are compiled into the following table and compared with a flight to A Coruña (with a connection by Madrid) as calculated by Ecotree. If we follow these data, a train from Berlin, London, Paris, Brussels and Amsterdam won’t at worst be around 50kg of CO2 per passenger and represent at least a reduction of around 90% compared to using a plane:

Departure Flight by Madrid (Ecotree) Train 5 CO2 reduction
Berlin 448 6-55.4 ≥87%
London 388 4-32 ≥91%
Brussels 396 4-30.8 ≥92%
Amsterdam 396 4-38.5 ≥90%
Paris 368 3-29 ≥92%
Madrid 244 1-3 ≥98%

More cities and trains

Disclaimer: This section is essentially based on information provided by Nicolò and what I found on internet.

In addition to the SNCF train previously mentioned, Renfe proposes two trains that are quite important for traveling from France to Spain:

  • One train between Lyon and Barcelona per day (duration ~5h)
  • One train between Marseille and Madrid per day (duration ~8h)

From Switzerland, you can pass by the south of France using more trains/connections to arrive faster in Barcelone than what was previously proposed. For example, taking a local train to go from Genève to Lyon followed by the Lyon to Barcelona train mentioned above can be done in around 7h. From Zurich, with connection at Genève and Lyon, it takes around 10h as opposed to 12h for a single connection in Paris.

From the Netherlands, Belgium or Germany it makes sense to consider the night trains from Paris to the border with Spain. Those trains do not have a fixed schedule, but vary depending on the weekday. Most of them arrive in Portbou and from there you can take a regional train to Barcelona. Some of them arrive to Latour-de-Carol, and from there it’s three hours on a regional train to Barcelona. In any case, you’ll be early enough at the border so that it’s possible to then arrive to A Coruña in the afternoon or evening. Rarely the night train arrives at the border on the west coast, and continuing from there to A Coruña with the regional trains that go on the northern coast might be a good experience.

From Belgium it’s also possible to take a TGV from Brussels to Lyon, and then from there take the train to Barcelona. This avoids a stressful connection in Paris, where you need to move between the two stations Gare du Nord and Gare de Lyon.

From Italy, the main trouble is to deal with connections. The Turin–Lyon high-speed railway contruction may help in the future but it’s not running yet. The alternatives are either to go on the coast through Genova-Ventimiglia-Nice, or with the Eurocity from Milan to Switzerland finally get to France.

From Portugal, which is so close geographically and culturally to Galicia, we could think there should be an easy way to travel to A Coruña. But apparently neither Comboios de Portugal nor Renfe provides any direct train:

  • From Porto, the ÖBB trip planner suggests connection at Vigo-Guixar for a total duration of 6h30. As a comparison, the website of the Web Engines Hackfest indicates a 3-hours trip by car and a 5-hours trip by bus.
  • Between Libon and Porto, Comboios de Portugal seems to propose trains taking 2h30-3h. You can combine that with the other trains to do the trip to A Coruña with one night at Porto or Vigo.

Last but not least, A Coruña railway station is a one-quarter walk away from the Igalia office and a three-quarter walks away from Palexco (Web Engines Hackfest’s venue). This is more convenient than the A Coruña airport which is around 10 km away from A Coruña.

Notes and references

  • Flygskam: anti-flying social movement started in Sweden, with the aim of reducing the environmental impact of aviation.
  • O Caminho de Santiago: The Way of St. James, a network of pilgrims’ ways leading to Santiago de Compostela (whose railway station you will likely stop at if you take a train to A Coruña).
  1. Incidentally, people travelling from far away are unlikely to find a direct flight to the A Coruña airport. In that case, using local trains from bigger airports like Madrid or Porto may be a better option. 

  2. Direct train between A Coruña and Barcelona seems to be rarer, slower and no longer available as night train. So a connection or night stay in Madrid seems the best option. 

  3. Deutsche Bahn offers a lot of Berlin to Cologne trains per day with a duration of 4h-4h30, including early/late trains or night trains, that you can combine with the Cologne-Paris Eurostar. Deutsche Bahn also offers (non-direct) ICE trains to go from Paris to Berlin. 

  4. Ecopassenger gives information per train, so I’m provided some lower/upper bounds based on different trains. 

  5. Based on the previous data from Eurostar, SNCF, Ecopassenger, Ecotree, trying to find the lowest/highest sum for each individual segment. 

May 22, 2024 10:00 PM

Andy Wingo

growing a bootie

Following on last week’s egregious discussion of the Hoot Scheme-to-WebAssembly compiler bootie, today I would like to examine another axis of boot, which is a kind of rebased branch of history: not the hack as it happened, but the logic inside the hack, the structure of the built thing, the history as it might have been. Instead of describing the layers of shims and props that we used while discovering what were building, let’s look at how we would build Hoot again, if we had to.

I think many readers of this blog will have seen Growing a Language, a talk / performance art piece in which Guy L. Steele—I once mentioned to him that Guy L. was one of the back-justifications for the name Guile; he did not take it well—in which Steele takes the set of monosyllabic words as primitives and builds up a tower of terms on top, bootstrapping a language as he goes. I just watched it again and I think it holds up, probably well enough to forgive the superfluous presence of the gender binary in the intro; ideas were different in the 1900s.

It is in the sense of that talk that I would like to look at growing a Hoot: how Hoot defines nouns and verbs in terms of smaller, more primitive terms: terms in terms of terms.

(hoot features) features (hoot primitives) primitives (ice-9 match) match (ice-9 match):s->(hoot primitives):n (hoot eq) eq (ice-9 match):s->(hoot eq):n (hoot pairs) pairs (ice-9 match):s->(hoot pairs):n (hoot vectors) vectors (ice-9 match):s->(hoot vectors):n (hoot equal) equal (ice-9 match):s->(hoot equal):n (hoot lists) lists (ice-9 match):s->(hoot lists):n (hoot errors) errors (ice-9 match):s->(hoot errors):n (hoot numbers) numbers (ice-9 match):s->(hoot numbers):n (fibers scheduler) scheduler (hoot ffi) ffi (fibers scheduler):s->(hoot ffi):n (guile) (guile) (fibers scheduler):s->(guile):n (fibers channels) channels (fibers channels):s->(ice-9 match):n (fibers waiter-queue) waiter-queue (fibers channels):s->(fibers waiter-queue):n (fibers operations) operations (fibers channels):s->(fibers operations):n (fibers channels):s->(guile):n (srfi srfi-9) srfi-9 (fibers channels):s->(srfi srfi-9):n (fibers waiter-queue):s->(ice-9 match):n (fibers waiter-queue):s->(fibers operations):n (fibers waiter-queue):s->(guile):n (fibers waiter-queue):s->(srfi srfi-9):n (fibers promises) promises (fibers promises):s->(fibers operations):n (hoot exceptions) exceptions (fibers promises):s->(hoot exceptions):n (fibers promises):s->(hoot ffi):n (fibers promises):s->(guile):n (fibers conditions) conditions (fibers conditions):s->(ice-9 match):n (fibers conditions):s->(fibers waiter-queue):n (fibers conditions):s->(fibers operations):n (fibers conditions):s->(guile):n (fibers conditions):s->(srfi srfi-9):n (fibers timers) timers (fibers timers):s->(fibers scheduler):n (fibers timers):s->(fibers operations):n (scheme time) time (fibers timers):s->(scheme time):n (fibers timers):s->(guile):n (fibers operations):s->(ice-9 match):n (fibers operations):s->(fibers scheduler):n (hoot boxes) boxes (fibers operations):s->(hoot boxes):n (fibers operations):s->(guile):n (fibers operations):s->(srfi srfi-9):n (hoot eq):s->(hoot primitives):n (hoot syntax) syntax (hoot eq):s->(hoot syntax):n (hoot strings) strings (hoot strings):s->(hoot primitives):n (hoot strings):s->(hoot eq):n (hoot strings):s->(hoot pairs):n (hoot bytevectors) bytevectors (hoot strings):s->(hoot bytevectors):n (hoot strings):s->(hoot lists):n (hoot bitwise) bitwise (hoot strings):s->(hoot bitwise):n (hoot char) char (hoot strings):s->(hoot char):n (hoot strings):s->(hoot errors):n (hoot strings):s->(hoot numbers):n (hoot match) match (hoot strings):s->(hoot match):n (hoot pairs):s->(hoot primitives):n (hoot bitvectors) bitvectors (hoot bitvectors):s->(hoot primitives):n (hoot bitvectors):s->(hoot bitwise):n (hoot bitvectors):s->(hoot errors):n (hoot bitvectors):s->(hoot match):n (hoot vectors):s->(hoot primitives):n (hoot vectors):s->(hoot pairs):n (hoot vectors):s->(hoot lists):n (hoot vectors):s->(hoot errors):n (hoot vectors):s->(hoot numbers):n (hoot vectors):s->(hoot match):n (hoot equal):s->(hoot primitives):n (hoot equal):s->(hoot eq):n (hoot equal):s->(hoot strings):n (hoot equal):s->(hoot pairs):n (hoot equal):s->(hoot bitvectors):n (hoot equal):s->(hoot vectors):n (hoot records) records (hoot equal):s->(hoot records):n (hoot equal):s->(hoot bytevectors):n (hoot not) not (hoot equal):s->(hoot not):n (hoot values) values (hoot equal):s->(hoot values):n (hoot hashtables) hashtables (hoot equal):s->(hoot hashtables):n (hoot equal):s->(hoot numbers):n (hoot equal):s->(hoot boxes):n (hoot equal):s->(hoot match):n (hoot exceptions):s->(hoot features):n (hoot exceptions):s->(hoot primitives):n (hoot exceptions):s->(hoot pairs):n (hoot exceptions):s->(hoot records):n (hoot exceptions):s->(hoot lists):n (hoot exceptions):s->(hoot syntax):n (hoot exceptions):s->(hoot errors):n (hoot exceptions):s->(hoot match):n (hoot cond-expand) cond-expand (hoot exceptions):s->(hoot cond-expand):n (hoot parameters) parameters (hoot parameters):s->(hoot primitives):n (hoot fluids) fluids (hoot parameters):s->(hoot fluids):n (hoot parameters):s->(hoot errors):n (hoot parameters):s->(hoot cond-expand):n (hoot records):s->(hoot primitives):n (hoot records):s->(hoot eq):n (hoot records):s->(hoot pairs):n (hoot records):s->(hoot vectors):n (hoot symbols) symbols (hoot records):s->(hoot symbols):n (hoot records):s->(hoot lists):n (hoot records):s->(hoot values):n (hoot records):s->(hoot bitwise):n (hoot records):s->(hoot errors):n (hoot ports) ports (hoot records):s->(hoot ports):n (hoot records):s->(hoot numbers):n (hoot records):s->(hoot match):n (hoot keywords) keywords (hoot records):s->(hoot keywords):n (hoot records):s->(hoot cond-expand):n (hoot dynamic-wind) dynamic-wind (hoot dynamic-wind):s->(hoot primitives):n (hoot dynamic-wind):s->(hoot syntax):n (hoot bytevectors):s->(hoot primitives):n (hoot bytevectors):s->(hoot bitwise):n (hoot bytevectors):s->(hoot errors):n (hoot bytevectors):s->(hoot match):n (hoot error-handling) error-handling (hoot error-handling):s->(hoot primitives):n (hoot error-handling):s->(hoot pairs):n (hoot error-handling):s->(hoot exceptions):n (hoot write) write (hoot error-handling):s->(hoot write):n (hoot control) control (hoot error-handling):s->(hoot control):n (hoot error-handling):s->(hoot fluids):n (hoot error-handling):s->(hoot errors):n (hoot error-handling):s->(hoot ports):n (hoot error-handling):s->(hoot numbers):n (hoot error-handling):s->(hoot match):n (hoot error-handling):s->(hoot cond-expand):n (hoot ffi):s->(hoot primitives):n (hoot ffi):s->(hoot strings):n (hoot ffi):s->(hoot pairs):n (hoot procedures) procedures (hoot ffi):s->(hoot procedures):n (hoot ffi):s->(hoot lists):n (hoot ffi):s->(hoot not):n (hoot ffi):s->(hoot errors):n (hoot ffi):s->(hoot numbers):n (hoot ffi):s->(hoot cond-expand):n (hoot debug) debug (hoot debug):s->(hoot primitives):n (hoot debug):s->(hoot match):n (hoot symbols):s->(hoot primitives):n (hoot symbols):s->(hoot errors):n (hoot assoc) assoc (hoot assoc):s->(hoot primitives):n (hoot assoc):s->(hoot eq):n (hoot assoc):s->(hoot pairs):n (hoot assoc):s->(hoot equal):n (hoot assoc):s->(hoot lists):n (hoot assoc):s->(hoot not):n (hoot procedures):s->(hoot primitives):n (hoot procedures):s->(hoot syntax):n (hoot write):s->(hoot primitives):n (hoot write):s->(hoot eq):n (hoot write):s->(hoot strings):n (hoot write):s->(hoot pairs):n (hoot write):s->(hoot bitvectors):n (hoot write):s->(hoot vectors):n (hoot write):s->(hoot records):n (hoot write):s->(hoot bytevectors):n (hoot write):s->(hoot symbols):n (hoot write):s->(hoot procedures):n (hoot write):s->(hoot bitwise):n (hoot write):s->(hoot char):n (hoot write):s->(hoot errors):n (hoot write):s->(hoot ports):n (hoot write):s->(hoot numbers):n (hoot write):s->(hoot keywords):n (hoot lists):s->(hoot primitives):n (hoot lists):s->(hoot pairs):n (hoot lists):s->(hoot values):n (hoot lists):s->(hoot numbers):n (hoot lists):s->(hoot match):n (hoot lists):s->(hoot cond-expand):n (hoot not):s->(hoot syntax):n (hoot syntax):s->(hoot primitives):n (hoot values):s->(hoot primitives):n (hoot values):s->(hoot syntax):n (hoot control):s->(hoot primitives):n (hoot control):s->(hoot parameters):n (hoot control):s->(hoot values):n (hoot control):s->(hoot cond-expand):n (hoot bitwise):s->(hoot primitives):n (hoot char):s->(hoot primitives):n (hoot char):s->(hoot bitvectors):n (hoot char):s->(hoot bitwise):n (hoot char):s->(hoot errors):n (hoot char):s->(hoot match):n (hoot dynamic-states) dynamic-states (hoot dynamic-states):s->(hoot primitives):n (hoot dynamic-states):s->(hoot vectors):n (hoot dynamic-states):s->(hoot debug):n (hoot dynamic-states):s->(hoot lists):n (hoot dynamic-states):s->(hoot values):n (hoot dynamic-states):s->(hoot errors):n (hoot dynamic-states):s->(hoot numbers):n (hoot dynamic-states):s->(hoot match):n (hoot read) read (hoot read):s->(hoot primitives):n (hoot read):s->(hoot eq):n (hoot read):s->(hoot strings):n (hoot read):s->(hoot pairs):n (hoot read):s->(hoot bitvectors):n (hoot read):s->(hoot vectors):n (hoot read):s->(hoot exceptions):n (hoot read):s->(hoot symbols):n (hoot read):s->(hoot lists):n (hoot read):s->(hoot not):n (hoot read):s->(hoot values):n (hoot read):s->(hoot char):n (hoot read):s->(hoot errors):n (hoot read):s->(hoot ports):n (hoot read):s->(hoot numbers):n (hoot read):s->(hoot match):n (hoot read):s->(hoot keywords):n (hoot hashtables):s->(hoot primitives):n (hoot hashtables):s->(hoot eq):n (hoot hashtables):s->(hoot pairs):n (hoot hashtables):s->(hoot vectors):n (hoot hashtables):s->(hoot procedures):n (hoot hashtables):s->(hoot lists):n (hoot hashtables):s->(hoot values):n (hoot hashtables):s->(hoot bitwise):n (hoot hashtables):s->(hoot errors):n (hoot hashtables):s->(hoot numbers):n (hoot fluids):s->(hoot primitives):n (hoot fluids):s->(hoot cond-expand):n (hoot errors):s->(hoot primitives):n (hoot atomics) atomics (hoot atomics):s->(hoot primitives):n (hoot ports):s->(hoot primitives):n (hoot ports):s->(hoot eq):n (hoot ports):s->(hoot strings):n (hoot ports):s->(hoot pairs):n (hoot ports):s->(hoot vectors):n (hoot ports):s->(hoot parameters):n (hoot ports):s->(hoot bytevectors):n (hoot ports):s->(hoot procedures):n (hoot ports):s->(hoot lists):n (hoot ports):s->(hoot not):n (hoot ports):s->(hoot values):n (hoot ports):s->(hoot bitwise):n (hoot ports):s->(hoot char):n (hoot ports):s->(hoot errors):n (hoot ports):s->(hoot numbers):n (hoot ports):s->(hoot boxes):n (hoot ports):s->(hoot match):n (hoot ports):s->(hoot cond-expand):n (hoot numbers):s->(hoot primitives):n (hoot numbers):s->(hoot eq):n (hoot numbers):s->(hoot not):n (hoot numbers):s->(hoot values):n (hoot numbers):s->(hoot bitwise):n (hoot numbers):s->(hoot errors):n (hoot numbers):s->(hoot match):n (hoot boxes):s->(hoot primitives):n (hoot match):s->(hoot primitives):n (hoot match):s->(hoot errors):n (hoot keywords):s->(hoot primitives):n (hoot cond-expand):s->(hoot features):n (hoot cond-expand):s->(hoot primitives):n (scheme lazy) lazy (scheme lazy):s->(hoot primitives):n (scheme lazy):s->(hoot records):n (scheme lazy):s->(hoot match):n (scheme base) base (scheme lazy):s->(scheme base):n (scheme load) load (scheme load):s->(hoot primitives):n (scheme load):s->(hoot errors):n (scheme load):s->(scheme base):n (scheme complex) complex (scheme complex):s->(hoot numbers):n (scheme time):s->(hoot primitives):n (scheme time):s->(scheme base):n (scheme file) file (scheme file):s->(hoot primitives):n (scheme file):s->(hoot errors):n (scheme file):s->(hoot ports):n (scheme file):s->(hoot match):n (scheme file):s->(scheme base):n (scheme write) write (scheme write):s->(hoot write):n (scheme eval) eval (scheme eval):s->(hoot errors):n (scheme eval):s->(scheme base):n (scheme inexact) inexact (scheme inexact):s->(hoot primitives):n (scheme inexact):s->(hoot numbers):n (scheme char) char (scheme char):s->(hoot primitives):n (scheme char):s->(hoot bitwise):n (scheme char):s->(hoot char):n (scheme char):s->(hoot numbers):n (scheme char):s->(scheme base):n (scheme process-context) process-context (scheme process-context):s->(hoot primitives):n (scheme process-context):s->(hoot errors):n (scheme process-context):s->(scheme base):n (scheme cxr) cxr (scheme cxr):s->(hoot pairs):n (scheme read) read (scheme read):s->(hoot read):n (scheme base):s->(hoot features):n (scheme base):s->(hoot primitives):n (scheme base):s->(hoot eq):n (scheme base):s->(hoot strings):n (scheme base):s->(hoot pairs):n (scheme base):s->(hoot vectors):n (scheme base):s->(hoot equal):n (scheme base):s->(hoot exceptions):n (scheme base):s->(hoot parameters):n (scheme base):s->(hoot dynamic-wind):n (scheme base):s->(hoot bytevectors):n (scheme base):s->(hoot error-handling):n (scheme base):s->(hoot symbols):n (scheme base):s->(hoot assoc):n (scheme base):s->(hoot procedures):n (scheme base):s->(hoot write):n (scheme base):s->(hoot lists):n (scheme base):s->(hoot not):n (scheme base):s->(hoot syntax):n (scheme base):s->(hoot values):n (scheme base):s->(hoot control):n (scheme base):s->(hoot char):n (scheme base):s->(hoot read):n (scheme base):s->(hoot errors):n (scheme base):s->(hoot ports):n (scheme base):s->(hoot numbers):n (scheme base):s->(hoot match):n (scheme base):s->(hoot cond-expand):n (scheme base):s->(srfi srfi-9):n (scheme repl) repl (scheme repl):s->(hoot errors):n (scheme repl):s->(scheme base):n (scheme r5rs) r5rs (scheme r5rs):s->(scheme lazy):n (scheme r5rs):s->(scheme load):n (scheme r5rs):s->(scheme complex):n (scheme r5rs):s->(scheme file):n (scheme r5rs):s->(scheme write):n (scheme r5rs):s->(scheme eval):n (scheme r5rs):s->(scheme inexact):n (scheme r5rs):s->(scheme char):n (scheme r5rs):s->(scheme process-context):n (scheme r5rs):s->(scheme cxr):n (scheme r5rs):s->(scheme read):n (scheme r5rs):s->(scheme base):n (scheme r5rs):s->(scheme repl):n (scheme case-lambda) case-lambda (scheme case-lambda):s->(hoot primitives):n (fibers) (fibers) (fibers):s->(fibers scheduler):n (fibers):s->(guile):n (guile):s->(hoot features):n (guile):s->(hoot primitives):n (guile):s->(ice-9 match):n (guile):s->(hoot eq):n (guile):s->(hoot strings):n (guile):s->(hoot pairs):n (guile):s->(hoot bitvectors):n (guile):s->(hoot vectors):n (guile):s->(hoot equal):n (guile):s->(hoot exceptions):n (guile):s->(hoot parameters):n (guile):s->(hoot dynamic-wind):n (guile):s->(hoot bytevectors):n (guile):s->(hoot error-handling):n (guile):s->(hoot symbols):n (guile):s->(hoot assoc):n (guile):s->(hoot procedures):n (guile):s->(hoot write):n (guile):s->(hoot lists):n (guile):s->(hoot not):n (guile):s->(hoot syntax):n (guile):s->(hoot values):n (guile):s->(hoot control):n (guile):s->(hoot bitwise):n (guile):s->(hoot char):n (guile):s->(hoot dynamic-states):n (guile):s->(hoot read):n (guile):s->(hoot fluids):n (guile):s->(hoot errors):n (guile):s->(hoot ports):n (guile):s->(hoot numbers):n (guile):s->(hoot boxes):n (guile):s->(hoot keywords):n (guile):s->(hoot cond-expand):n (guile):s->(scheme lazy):n (guile):s->(scheme time):n (guile):s->(scheme file):n (guile):s->(scheme char):n (guile):s->(scheme process-context):n (guile):s->(scheme base):n (guile):s->(srfi srfi-9):n (srfi srfi-9):s->(hoot primitives):n (srfi srfi-9):s->(hoot records):n

If you are reading this on the web, you should see above a graph of dependencies among the 50 or so libraries that are shipped as part of Hoot. (Somehow I doubt that a feed reader will plumb through the inline SVG, but who knows.) It’s a bit of a mess, but still I think it’s a useful illustration of a number of properties of how the Hoot language is grown from small to large. Click on any box to visit the source code for that module.

the root of the boot

Firstly, let us note that the graph is not a forest: it is a single tree. There is no module that does not depend (possibly indirectly) on (hoot primitives). This is because there are no capabilities that Hoot libraries can access without importing them, and the only way into the Hootosphere from outside is via the definitions in the primitives module.

So what are these definitions, you might ask? Well, these are the “well-known” bindings, for example + for which the compiler might have some special understanding, the sort of binding that gets translated to a primitive operation at the compiler IR level. They are used in careful ways by the modules that use (hoot primitives) to ensure that their uses are all open-coded by the compiler. (“Open coding” is inlining. But inlining to me implies that the whole implementation is inlined, with no slow-path callouts, whereas open coding implies to me that it’s the compiler that knows what the op does and may or may not inline the actual asm.)

But, (hoot primitives) also exposes some other definitions, for example define and let and lambda and all that. Scheme doesn’t have keywords in the sense that Python has def and with and such: there is no privileged way to associate a name with its meaning. It is in this sense that it is impossible to avoid (hoot primitives): the most simple (define x 42) depends on the lexical meaning of define, which is provided by the primitives module.

Syntax definitions are an expander construct; they are not present at run-time. Using a syntax definition causes the expander to invoke code, and the expander runs on the host system, which is Guile and not WebAssembly. So, syntax definitions belong to the host. This goes also for some first-order definitions such as syntax->datum and so on, which are only used in syntax expanders; these definitions are plumbed through (hoot primitives), but can only ever be used by macro definitions, which run on the meta-level.

(Is this too heavy? Allow me to lighten the mood: when I was 22 or so and working in Namibia, I somehow got an advance copy of Notes from the Metalevel. I was working on algorithmic music synthesis, and my chief strategy was knocking hubris together with itself, as one does. I sent the author a bunch of uninvited corrections to his book. I think it was completely unwelcome! Anyway, moral of the story, at 22 you get a free pass to do whatever you want, and come to think of it, now that I am 44 I think I should get some kind of hubris loyalty award or something.)

powerful primitives

So, there are expand-time primitives and run-time primitives. The expander knows about expand-time primitives and the compiler knows about run-time primitives. One particularly powerful primitive is %inline-wasm, which takes an inline snippet of WebAssembly as an s-expression and applies it to a number of arguments passed at run-time. Consider make-bytevector:

(define* (make-bytevector len #:optional (init 0))
  (%inline-wasm
   '(func (param $len i32) (param $init i32)
      (result (ref eq))
      (struct.new
       $mutable-bytevector
       (i32.const 0)
       (array.new $raw-bytevector
                  (local.get $init)
                  (local.get $len))))
   len init))

We have an inline snippet of wasm that makes a $mutable-bytevector. It passes 0 as the hash field, meaning that the hashq of this value will be lazily initialized, and the contents are a new array of a given size and initial value. Inputs will be unboxed to the appropriate type (two i32s in this case), and likewise with outputs; here we produce the universal (ref eq) representation.

The nice thing about %inline-wasm is that the compiler didn’t have to be taught about make-bytevector: this definition suffices, because %inline-wasm can access a number of lower-level capabilities.

dual denotations

But as we learned in my notes on whole-program compilation, any run-time definition is available at compile-time, if it is reachable from a syntax transformer. So this definition above isn’t quite sufficient; we can’t call make-bytevector as part of a procedural macro, which we might want to do. What we need instead is to provide one definition when residualizing wasm at run-time, and another when loading a module at expand-time.

In Hoot we do this with cond-expand, where we expand to %inline-wasm when targetting Hoot, and... what, precisely, at expand-time? Really we need to make a Guile bytevector, so in this sort of case, we end up having to include a run-time make-bytevector definition in the (hoot primitives) module. This happens whereever we end up using %inline-wasm.

building to guile

Returning to our graph, we see that there is a red-colored block for Hoot modules, a teal-colored layer on top for those modules that are defined by R7RS, a few oddballs, and then (guile) and Fibers built on top. The (guile) module provides a shim that implements Guile’s own default set of bindings, allowing Guile modules to be loaded on a Hoot system. (guile) is layered on top of the low-level Hoot libraries, and out of convenience, on top of the various R7RS libraries as well, because it was easiest to remember what was where in R7RS than our ad-hoc nest of Hoot internal libraries.

Having (guile) lets Guile hackers build on Hoot. It’s still incomplete but I think eventually it will be capital-G Good. Even for a library that needed more porting like Fibers (Hoot has no threads so much of the parallel concurrent ML implementation can be simplified, and we use an event loop from the Wasm run-time instead of an epoll-based scheduler), it was still pleasant to be able to use define-module and keyword arguments and all of that.

next layers

I mentioned that this tower of terms is incomplete, and so that is one of the next work items for Hoot: complete support for Guile’s run-time library. At that point we’d probably want to merge it into Guile, but that is another topic.

But let’s leave that for another day; until then, happy hacking!

by Andy Wingo at May 22, 2024 08:16 AM

May 21, 2024

Brian Kardell

Improving the WPT Dashboard

Improving the WPT Dashboard

Thoughts on things I'd like to see as part of the WPT dashboard.

In my last post I dug into the data behind wpt.fyi's Browser Specific Failures chart (below) and the site's general reporting capabilities.

I suggested that linking to data on specifically what failed (at least what is queryable) would probably be really helpful (perhaps some kind of "understanding this chart" link as well).

While these aren't part of the design today, I think this is mainly because the primary audience of that chart was originally mainly the vendors themselves. It was intended to allow for certain simple kinds of tracking, planning and prioritization. For example "Let's set a goal to not let the failure exceed such and such threshold" or "Let's aim to lower the failures by X this quarter". It wasn't critical to link to the tests because the audience knew how to interrogate the data - the purpose was just to get a quantifiable number you can easily report on.

But, now we see this chart shared a lot and it's pretty clear that people are curious so we should probably adjust it for the wider audience.

Additionally though, that's also only a single view of the data, and I'd like to argue that we could some other improvements too.

Prioritization

BSF made the observation that if we can identify a test that fails in only 1 browser, then that browser's team can easily prioritize something that has significant impact. That browser is the boat anchor holding things back. Except, it's not quite that cut and dry in reality.

Real management of software projects is hard. I think that anyone who's worked on software projects can relate to this, at least a bit, if we take some time to consider all of the things that go into choosing how to apply our limited resources. Obviously, not all failures are equal - especially when we're talking about projects which are a quarter of a century old. The reality is that all of that decisioning and prioritization is happening independently across different organizations, with different views on the web, different budgets, different legacy challenges, etc.

That's where I think there are some things to learn from Interop.

What I learned from Interop Is...

If you think about it, Interop is about trying to achieve thematically, basically the same thing as BSF: Make more things "green across the board". But it is a very different thing than BSF.

I've really learned a lot from the process of helping organize Interop every year about why this takes so long to happen naturally. There are so many limits and signals and opinions. One of the things we do as part of the process is to take all of the submissions and independently order them in terms of what we think are their priorities. There are 6 organizations doing that: Apple, Bocoup, Google, Igalia, Microsoft and Mozilla. How many do you think chose the same #1? The answer is 0.

It really highlights how waiting for all of the stars to align by chance winds up often being a painfully slow process and full of problems.

However, a huge part of interop is dedicated to dealing with the stuff BSF doesn't really consider - aligning agreement on: 1. what features are most important 2. which tests regarding those are valid/important 3. are all the spec questions really answered? 4. is this actually (hopefully) achievable in the next year?

In that, I believe it has been extremely successful in creating way more "green across the board" than anything else. I think this is true beyond even what is officially part of Interop, because we're all able to kind of discuss and see where others are probably going to invest in work because things that were important for them didn't make the cut.

In a way, each year is sort of like doing what we used to do with "CSS2" and "HTML4"... Creating a more focused discussion that is the goal floor, not the ceiling.

It's not enough... Sure, I believe this gives us much better results by helping alignment. I think this is obvious given how rapid and smoothly we've found so much high-quality alignment in recent years. However, there's something I want stress in all of this: Choosing what to prioritize is also inherently choosing what to collectively deprioritize. It is inevitable because at the end of the day there is just too much.

The only real solution to this problem is wider investment in the platform and, ultimately, almost certainly, changing how we fund it.

Alignment vs Passing

Interop also showed us that a simple, individual pass/fail can be incomplete and misleading. If 3 browsers reach a point of passing 50% of measured tests, the number of tests that pass in all browsers might actually be 0, as illustrated in the table below...

Chrome Firefox WebKit
Lots of tests pass, but not even one passes universally!

In fact, here's a real world example of exactly this kind of misleading view in a set of SVG tests. If we look at the numbers across the bottom:

  • chrome: 166 / 191
  • edge: 166 / 191
  • firefox: 175 / 191
  • safari: 132 / 191

It's not terrible if you're only looking at those numbers. But, if you scroll down through that table you'll see that there are ragged failures all over that. In fact, only 52 of 189 are "green across the board"!

We can only realistically solve by having a more holistic view and working together. BSF is just the slice that is theoretically actionable individually, not everything that matters.

What about a focus on Universally Passing?

In the Interop project we track the difference above as its own data point: The Interop number, and we put it as a separate column in the test tables:

a table containing a column for each individual browser scores on different features, and a column for the number that pass in all
The interop column reports how many tests pass on all of the tracked browsers

Similarly, we track it over time:

A graph showing scores of each browser over time as well as an "interop line"

Could we learn something from this? Wouldn't something like that be great to have in general?

For example, in the wpt.fyi tables? Now, it couldn't look just like that because those numbers are all in percentages, and this only really works because the interop process carefully sets a governance process for defining/agreeing to what the tests are. It would be enough to add a column to the table in the same form, something like this:

That might help us uncover situations like the SVG one above and present opportunites like interop for us to collectively decide to try to address that.

Similarly, we could track it over time. Sort of the opposite of BSF. We want to see the simple number of subtests passing in browsers and it should always be going up (even as new tests are added, no existing ones should stop passing - those are just more opportunities to go up). Further, ideally the Universally Passing number shouldn't ever be drawing significantly further away from that over time or we're making less of the platform universal. That is, you could see, over time when we are cooperating better, and when we are not.

We do better when we are. In my mind, that's an explicit goal, and this would be a view into it.

May 21, 2024 04:00 AM

May 20, 2024

Frédéric Wang

Time travel debugging of WebKit with rr

Introduction

rr is a debugging tool for Linux that was originally developed by Mozilla for Firefox. It has long been adopted by Igalia and other web platform developers for Chromium and WebKit too. Back in 2019, there were breakout sessions on this topic at the Web Engines Hackfest and BlinkOn.

For WebKitGTK, the Flatpak SDK provides a copy of rr, but recently I was unable to use the instructions on trac.webkit.org. Fortunately, my colleague Adrián Pérez suggested using a direct build without flatpak or the bubblewrap sandbox, and that indeed solved my problem. I thought it might be interesting to share this information with others, so I decided to write this blog post.

Disclaimer: The build instructions below may be imperfect, will likely become outdated, and are in any case not a replacement for the official ones for WebKitGTK development. Use them at your own risk!

CMake configuration

The approach that worked for me was thus to perform a direct build from my system. I came up with the following configuration step:

cmake -S. -BWebKitBuild/Release \
   -DCMAKE_BUILD_TYPE=Release \
   -DCMAKE_INSTALL_PREFIX=$HOME/WebKit/WebKitBuild/install/Release \
   -GNinja -DPORT=GTK -DENABLE_BUBBLEWRAP_SANDBOX=OFF \
   -DDEVELOPER_MODE=ON -DDEVELOPER_MODE_FATAL_WARNINGS=OFF \
   -DENABLE_TOOLS=ON -DENABLE_LAYOUT_TESTS=ON

where:

  • The -B option specifies the build directory, which is traditionnaly called WebKitBuild/ for the WebKit project.
  • CMAKE_BUILD_TYPE specifies the build type, e.g. optimized release builds (Release, corresponding to --release for the offical script) or debug builds with assertions (Debug, corresponding to --debug) 1.
  • CMAKE_INSTALL_PREFIX specifies the installation directory, which I place inside WebKitBuild/install/ 2.
  • The -G option specifies the build system generator. I used Ninja, which is the default for the offical script too.
  • -DPORT=GTK is for building WebKitGTK. I haven’t tested rr with other Linux ports.
  • -DENABLE_BUBBLEWRAP_SANDBOX=OFF was suggested by Adrián. The bubblewrap sandbox probably does not make sense without flatpak, so it should be safe to disable it anyway.
  • I extracted the other -D flags from the official script, trying to stay as close as possible to what it provides for WebKit development (being able to run layout tests, building the Tools/, ignoring fatal warnings, etc).

Needless to say, the advantage of using flatpak is that it automatically downloads and install all the required dependencies. But if you do your own build, you need to figure out what they are and perform the setup manually. Generally, this is straightforward using your distribution’s package manager, but there can be some tricky exceptions 3.

While we are still at the configuration step, I believe it’s worth sharing two more tricks for WebKit developers:

  • You can use -DENABLE_SANITIZERS=address to produce Asan builds or builds with other sanitizers.
  • You can use -DCMAKE_CXX_FLAGS="-DENABLE_TREE_DEBUGGING" in release builds if you want to get access to the tree debugging functions (ShowRenderTree and the like). This flag is turned on by default for debug builds.

Building and running WebKit

Once the configure step is successful, you can build and install WebKit using the following CMake command 2.

cmake --build WebKitBuild/Release --target install

When that operation completes, you should be able to run MiniBrowser with the following command:

LD_LIBRARY_PATH=WebKitBuild/install/Release/lib ./WebKitBuild/Release/bin/MiniBrowser

For WebKitTestRunner, some extra environment variables are necessary 2:

TEST_RUNNER_INJECTED_BUNDLE_FILENAME=$HOME/WebKit/WebKitBuild/Release/lib/libTestRunnerInjectedBundle.so LD_LIBRARY_PATH=WebKitBuild/install/Release/lib ./WebKitBuild/Release/bin/WebKitTestRunner filename.html

You can also use the official scripts, Tools/Script/run-minibrowser and Tools/Script/run-webkit-tests. They expect some particular paths, but a quick workaround is to use a symbolic link:

ln -s $HOME/WebKit/WebKitBuild $HOME/WebKit/WebKitBuild/GTK

Using rr for WebKit debugging

rr is generally easily installable from your distribution’s package manager. However, as stated on the project wiki page:

Support for the latest hardware and kernel features may require building rr from Github master.

Indeed, using the source has always worked best for me to avoid mysterious execution failures when starting the recording 4.

If you are not familiar with rr, I strongly invite you to take a look at the overview on the project home page or at some of the references I mentioned in the introduction. In any case, the first step is to record a trace by passing the program and arguments to rr. For example, to record a trace for MiniBrowser:

LD_LIBRARY_PATH=WebKitBuild/install/Debug/lib rr ./WebKitBuild/Debug/bin/MiniBrowser https://www.igalia.com/

After the program exits, you can replay the recorded trace as many times as you want. For hard-to-reproduce bugs (e.g. non-deterministic issues or involving a lot of manual steps), that means you only need to be able to record and reproduce the bug once and then can just focus on debugging. You can even turn off your machine after hours of exhausting debugging, then continue the effort later when you have more time and energy! The trace is played in a deterministic way, always using the same timing and pointer addresses. You can use most gdb commands (to run the program, interrupt it, and inspect data), but the real power comes from new commands to perform reverse execution!

Before coming to that, let’s explain how to handle programs with multiple processes, which is the case for WebKit and modern browsers in general. After you recorded a trace, you can display the pids of all recorded processes using the rr ps command. For example, we can see in the following output that the MiniBrowser process (pid 24103) actually forked three child processes, including the Network Process (pid 24113) and the Web Process (24116):

PID     PPID    EXIT    CMD
24103   --      0       ./WebKitBuild/Debug/bin/MiniBrowser https://www.igalia.com/
24113   24103   -9      ./WebKitBuild/Debug/bin/WebKitNetworkProcess 7 12
24115   24103   1       (forked without exec)
24116   24103   -9      ./WebKitBuild/Debug/bin/WebKitWebProcess 15 15

Here is a small debugging session similar to the single-process example from Chromium Chronicle #13 5. We use the option -p 24116 to attach the debugger to the Web Process and -e to start debugging from where it exited:

rr replay -p 24116 -e
(rr) break RenderFlexibleBox::layoutBlock
(rr) rc # Run back to the last layout call
Thread 2 hit Breakpoint 1, WebCore::RenderFlexibleBox::layoutBlock (this=0x7f66699cc400, relayoutChildren=false) at /home/fred/src-obj/WebKit/Source/WebCore/rendering/RenderFlexibleBox.cpp:420
(rr) # Inspect anything you want here. To find the previous Layout call on this object:
(rr) cond 1 this == 0x7f66699cc400
(rr) rc
Thread 2 hit Breakpoint 1, WebCore::RenderFlexibleBox::layoutBlock (this=0x7f66699cc400, relayoutChildren=false) at /home/fred/src-obj/WebKit/Source/WebCore/rendering/RenderFlexibleBox.cpp:420
420     {
(rr) delete 1
(rr) watch -l m_style.m_nonInheritedFlags.effectiveDisplay # Or find the last time the effective display was changed
Thread 4 hit Hardware watchpoint 2: -location m_style.m_nonInheritedFlags.effectiveDisplay

Old value = 16
New value = 0
0x00007f6685234f39 in WebCore::RenderStyle::RenderStyle (this=0x7f66699cc4a8) at /home/fred/src-obj/WebKit/Source/WebCore/rendering/style/RenderStyle.cpp:176
176     RenderStyle::RenderStyle(RenderStyle&&) = default;

rc is an abbreviation for reverse-continue and continues execution backward. Similarly, you can use reverse-next, reverse-step and reverse-finish commands, or their abbreviations. Notice that the watchpoint change is naturally reversed compared to normal execution: the old value (sixteen) is the one after intialization, while the new value (zero) is the one before initialization!

Restarting playback from a known point in time

rr also has a concept of “event” and associates a number to each event it records. They can be obtained by the when command, or printed to the standard output using the -M option. To elaborate a bit more, suppose you add the following printf in RenderFlexibleBox::layoutBlock:

@@ -423,6 +423,8 @@ void RenderFlexibleBox::layoutBlock(bool relayoutChildren, LayoutUnit)
     if (!relayoutChildren && simplifiedLayout())
         return;

+    printf("this=%p\n", this);
+

After building, recording and replaying again, the output should look like this:

$ rr -M replay -p 70285 -e # replay with the new PID of the web process.
...
[rr 70285 57408]this=0x7f742203fa00
[rr 70285 57423]this=0x7f742203fc80
[rr 70285 57425]this=0x7f7422040200
...

Each printed output is now annotated with two numbers in bracket: a PID and an event number. So in order to restart from when an interesting output happened (let’s say [rr 70285 57425]this=0x7f7422040200), you can now execute run 57425 from the debugging session, or equivalently:

rr replay -p 70285 -g 57425

Older traces and parallel debugging

Another interesting thing to know is that traces are stored in ~/.local/share/rr/ and you can always specify an older trace to the rr command e.g. rr ps ~/.local/share/rr/MiniBrowser-0. Be aware that the executable image must not change, but you can use rr pack to be able to run old traces after a rebuild, or even to copy traces to another machine.

To be honest, most the time I’m just using the latest trace. However, one thing I’ve sometimes found useful is what I would call the “parallel debugging” technique. Basically, I’m recording one trace for a testcase that exhibits the bug and another one for a very similar testcase (e.g. with one CSS property difference) that behaves correctly. Then I replay the two traces side by side, comparing them to understand where the issue comes from and what can be done to fix it.

The usage documentation also provides further tips, but this should be enough to get you started with time travel debugging in WebKit!

  1. RelWithDebInfo build type (which yields an optimized release build with debug symbols) might also be interesting to consider in some situations, e.g. debugging bugs that reproduce in release builds but not in debug builds. 

  2. Using an installation directory might not be necessary, but without that, I had trouble making the whole thing work properly (wrong libraries loaded or libraries not found).  2 3

  3. In my case, I chose the easiest path to disable some features, namely -DUSE_JPEGXL=OFF, -DUSE_LIBBACKTRACE=OFF, and -DUSE_GSTREAMER_TRANSCODER=OFF

  4. Incidentally, you are likely to get an error saying that perf_event_paranoid is required to be at most 1, which you can force using sudo sysctl kernel.perf_event_paranoid=1

  5. The equivalent example would probably have been to watch for the previous style change with watch -l m_style, but this was exceeding my hardware watchpoint limit, so I narrowed it down to a smaller observation scope. 

May 20, 2024 10:00 PM

May 16, 2024

Andy Wingo

on hoot, on boot

I realized recently that I haven’t been writing much about the Hoot Scheme-to-WebAssembly compiler. Upon reflection, I have been too conscious of its limitations to give it verbal tribute, preferring to spend each marginal hour fixing bugs and filling in features rather than publicising progress.

In the last month or so, though, Hoot has gotten to a point that pleases me. Not to the point where I would say “accept no substitutes” by any means, but good already for some things, and worth writing about.

So let’s start today by talking about bootie. Boot, I mean! The boot, the boot, the boot of Hoot.

hoot boot: temporal tunnel

The first axis of boot is time. In the beginning, there was nary a toot, and now, through boot, there is Hoot.

The first boot of Hoot was on paper. Christine Lemmer-Webber had asked me, ages ago, what I thought Guile should do about the web. After thinking a bit, I concluded that it would be best to avoid compromises when building an in-browser Guile: if you have to pollute Guile to match what JavaScript offers, you might as well program in JavaScript. JS is cute of course, but Guile is a bit different in some interesting ways, the most important of which is control: delimited continuations, multiple values, tail calls, dynamic binding, threads, and all that. If Guile’s web bootie doesn’t pack all the funk in its trunk, probably it’s just junk.

So I wrote up a plan something to which I attributed the name tailification. In retrospect, this is simply a specific flavor of a continuation-passing-style (CPS) transmutation, late in the compiler pipeline. I’ll elocute more in a future dispatch. I did end up writing the tailification pass back then; I could have continued to target JS, but it was sufficiently annoying and I didn’t prosecute. It sat around unused for a few years, until Christine’s irresistable charisma managed to conjure some resources for Hoot.

In the meantime, the GC extension for WebAssembly shipped (woot woot!), and to boot Hoot, I filled in the missing piece: a backend for Guile’s compiler that tailified and then translated primitive operations to snippets of WebAssembly.

It was, well, hirsute, but cute and it did compute, so we continued to boot. From this root we grew a small run-time library, written in raw WebAssembly, used for slow-paths for the various primitive operations that are part of Guile’s compiler back-end. We filled out Guile primcalls, in minute commits, growing the WebAssembly runtime library and toolchain as we went.

Eventually we started constituting facilities defined in terms of those primitives, via a Scheme prelude that was prepended to all programs, within a nested lexical environment. It was never our intention though to drown the user’s programs in a sea of predefined bindings, as if the ultimate program were but a vestigial inhabitant of the lexical lake—don’t dilute the newt!, we would often say [ed: we did not]— so eventually when the prelude became unmanageable, we finally figured out how to do whole-program compilation of a set of modules.

Then followed a long month in which I would uproot the loot from the boot: take each binding from the prelude and reattribute it into an appropriate module. User code could import all the modules that suit, as long as they were known to Hoot, but no others; it was only until we added the ability for users to programmatically consitute an environment from their modules that Hoot became a language implementation of any repute.

Which brings us to the work of the last month, about which I cannot be mute. When you have existing Guile code that you want to distribute via the web, Hoot required you transmute its module definitions into the more precise R6RS syntax. Precise, meaning that R6RS modules are static, in a way that Guile modules, at least in absolute terms, are not: Guile programs can use first-class accessors on the module systems to pull out bindings. This is yet another example of what I impute as the original sin of 1990s language development, that modules are just mutable hash maps. You see it in Python, for example: because you don’t know for sure to what values global names are bound, it is easy for any discussion of what a particular piece of code means to end in dispute.

The question is, though, are the semantics of name binding in a language fixed and absolute? Once your language is booted, are its aspects definitively attributed? I think some perfection, in the sense of becoming more perfect or more like the thing you should be, is something to salute. Anyway, in Guile it would be coherent with Scheme’s lexical binding heritage to restitute some certainty as to the meanings of names, at least in a default compilation node. Lexical binding is, after all, the foundation of the Macro Writer’s Statute of Rights. Of course if you are making a build for development purposes, not to distribute, then you might prefer a build that marks all bindings as dynamic. Otherwise I think it’s reasonable to require the user to explicitly indicate which definitions are denotations, and which constitute locations.

Hoot therefore now includes an implementation of the static semantics of Guile’s define-module: it can load Guile modules directly, and as a tribute, it also has an implementation of the ambient (guile) module that constitutes the lexical soup of modules that aren’t #:pure. (I agree, it would be better if all modules were explicit about the language they are written in—their imported bindings and so on—but there is an existing corpus to accomodate; the point is moot.)

The astute reader (whom I salute!) will note that we have a full boot: Hoot is a Guile. Not an implementation to substitute the original, but more of an alternate route to the same destination. So, probably we should scoot the two implementations together, to knock their boots, so to speak, merging the offshoot Hoot into Guile itself.

But do I circumlocute: I can only plead a case of acute Hoot. Tomorrow, we elocute on a second axis of boot. Until then, happy compute!

by Andy Wingo at May 16, 2024 08:01 PM

May 15, 2024

Manuel A. Fernandez Montecelo

WiFi back-ends in SteamOS 3.6

As of NetworkManager v1.46.0 (and a very long while before that, since 1.12 released in 2018), there are two wifi.backend's supported: wpa_supplicant (default), and iwd (experimental).

iwd is still considered experimental after these years, see the list of issues with iwd backend, and specially this one.

In SteamOS, however, the default is iwd. It works relatively well in most scenarios; and the main advantage is that, when waking up the device (think of Steam Deck returning from sleep, "suspended to RAM" or S3 sleeping state), it regains network connectivity much faster -- typically 1-2 seconds vs. 5+ for wpa_supplicant.

However, if for some reason iwd doesn't work properly in your case, you can give wpa_supplicant a try.

With steamos-wifi-set-backend command

In 3.6, the back-end can be changed via command.

3.6 has been available in Main channel since late March and in Beta/Preview for several days, for example 20240509.100.

If you haven't done this before, to be able to run it, one must be able to either log in via SSH (not available out of the box), or to switch to desktop-mode, open a terminal and be able to type commands.

Assuming that one wants to change from the default iwd to wpa_supplicant, the command is:

steamos-wifi-set-backend wpa_supplicant

The script tries to ensure than things are in place in terms of some systemd units being stopped and others brought up as appropriate. After a few seconds, if it's able to connect correctly to the WiFi network, the device should get connectivity (e.g. ping works). It is not necessary to enter again the preferred networks and passwords, NetworkManager feeds the back-ends with the necessary config and credentials.

It is possible to switch back and forth between back-ends by using alternatively iwd and wpa_supplicant as many times as desired, and without restarting the system.

The current back-end as per configuration can be obtained with:

steamos-wifi-set-backend --check

And finally, this is the output of changing to wpa_supplicant and back to iwd (i.e., the output that you should expect):

(deck@steamdeck ~)$ steamos-wifi-set-backend --check
iwd

(deck@steamdeck ~)$ steamos-wifi-set-backend wpa_supplicant
INFO: switching back-end to 'wpa_supplicant'
INFO: stopping old back-end service and restarting NetworkManager,
      networking will be disrupted (hopefully only momentary blip, max 10 seconds)...
INFO: restarting done
INFO: checking status of services ...
      (sleeping for 2 seconds to catch-up with state)
INFO: status OK

(deck@steamdeck ~)$ steamos-wifi-set-backend --check
wpa_supplicant

(deck@steamdeck ~)$ ping -c 3 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=16.4 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=56 time=24.8 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=56 time=16.5 ms

--- 1.1.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 16.370/19.195/24.755/3.931 ms

(deck@steamdeck ~)$ steamos-wifi-set-backend iwd
INFO: switching back-end to 'iwd'
INFO: stopping old back-end service and restarting NetworkManager,
      networking will be disrupted (hopefully only momentary blip, max 10 seconds)...
INFO: restarting done
INFO: checking status of services ...
      (sleeping for 2 seconds to catch-up with state)
INFO: status OK

(deck@steamdeck ~)$ steamos-wifi-set-backend --check
iwd

(deck@steamdeck ~)$ ping -c 3 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=16.0 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=56 time=16.5 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=56 time=21.3 ms

--- 1.1.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 15.991/17.940/21.295/2.382 ms

By hand

⚠️ IMPORTANT ⚠️ : Please do this only if you know what you are doing, you don't fear minor breakage and you are confident that you can recover from the situation if something goes wrong. If unsure, better wait until the channel that you run supports the commands to handle this automatically.

If you are running a channel that still does not have this command, e.g. Stable at the time of writing this, you can achieve the same effect by executing the instructions run by this script by hand.

Note, however, that you might easily loose network connectivity if you make some mistake or something goes wrong. So be prepared to restore things by hand by switching to desktop-mode and having a terminal ready.

Now, the actual instructions. In /etc/NetworkManager/conf.d/wifi_backend.conf, change the string of wifi.backend, for example comment out the default one with iwd and add a line using wpa_supplicant as value:

[device]
#wifi.backend=iwd
wifi.backend=wpa_supplicant
wifi.iwd.autoconnect=yes

⚠️ IMPORTANT ⚠️ : Leave the other lines alone. If you don't have experience with the format you can easily cause the file to not be parseable, and then networking could stop working entirely.

After that, things get more complicated depending on the hardware on which the system runs.

If the device is the SteamDeck, there is a difference between the original LCD and the newer OLED model. With the OLED model, when you switch away from iwd, typically the network device is destroyed, so it's necessary to re-create it. This happens automatically when booting the system, so the easiest way if the network doesn't come up after 15 seconds after the following instructions, is to just restart the whole system.

If you're running SteamOS on models of the SteamDeck that don't exist at the time of writing this, or on other types of computers, the situation can be similar to the OLED or even more complex, depending on the WiFi hardware and drivers. So, again, tread carefully.

In the simplest scenario, LCD model, execute the following commands as root (or prepending sudo), with OLD_BACKEND being either iwd or wpa_supplicant, the one that you want to migrate away from:

systemctl stop NetworkManager
systemctl disable --now OLD_BACKEND
systemctl restart NetworkManager

If you have the OLED model and switching from wpa_supplicant to iwd, that should be enough as well.

But if you have the OLED model and you changed from iwd to wpa_supplicant, as explained above, your system will not be able to connect to the network, and the easiest is to restart the system.

However, if you want to try to solve the problem by hand without restarting, then execute these commands as root/sudo:

systemctl stop NetworkManager
systemctl disable --now OLD_BACKEND

if ! iw dev wlan0 info &>/dev/null; then iw phy phy0 interface add wlan0 type station; fi

systemctl restart NetworkManager

In any case, if after trying to change the back-end by hand and rebooting it's still not being able to connect, perhaps the best thing to do is to restore the previous configuration and restart the system again, and wait until your channel is upgraded to include this command.

Change back-ends via UI?

In the future, it is conceivable that the UI allows to change WiFi backends with the help of this command or by other means, but this is not possible at the moment.

by Manuel A. Fernandez Montecelo at May 15, 2024 09:21 AM

May 13, 2024

Andy Wingo

partitioning pitfalls for generational collectors

You might have heard of long-pole problems: when you are packing a tent, its bag needs to be at least as big as the longest pole. (This is my preferred etymology; there are others.) In garbage collection, the long pole is the longest chain of object references; there is no way you can algorithmically speed up pointer-chasing of (say) a long linked-list.

As a GC author, some of your standard tools can mitigate the long-pole problem, and some don’t apply.

Parallelism doesn’t help: a baby takes 9 months, no matter how many people you assign to the problem. You need to visit each node in the chain, one after the other, and having multiple workers available to process those nodes does not get us there any faster. Parallelism does help in general, of course, but doesn’t help for long poles.

You can apply concurrency: instead of stopping the user’s program (the mutator) to enumerate live objects, you trace while the mutator is running. This can happen on mutator threads, interleaved with allocation, via what is known as incremental tracing. Or it can happen in dedicated tracing threads, which is what people usually mean when they refer to concurrent tracing. Though it does impose some coordination overhead on the mutator, the pause-time benefits are such that most production garbage collectors do trace concurrently with the mutator, if they have the development resources to manage the complexity.

Then there is partitioning: instead of tracing the whole graph all the time, try to just trace part of it and just reclaim memory for that part. This bounds the size of a long pole—it can’t be bigger than the partition you trace—and so tracing a graph subset should reduce total pause time.

The usual way to partition is generational, in which the main program allocates into a nursery, and objects that survive a few collections then get promoted into old space. But there may be other partitions, for example to put “large objects” (for example, bigger than a few virtual memory pages) in their own section, to be managed with their own algorithm.

partitions, take one

And that brings me to today’s article: generational partitioning has a failure mode which manifests itself as spurious out-of-memory. For example, in V8, running this small JavaScript program that repeatedly allocates a 1-megabyte buffer grows to consume all memory in the system and eventually panics:

while (1) new Uint8Array(1e6);

This is a funny result, to say the least. Let’s dig in a bit to see why.

First off, note that allocating a 1-megabyte Uint8Array makes a large object, because it is bigger than half a V8 page, which is 256 kB on most systems There is a backing store for the array that will get allocated into the large object space, and then the JS object wrapper that gets allocated in the young generation of the regular object space.

Let’s imagine the heap consists of the nursery, the old space, and a large object space (lospace, for short). (See the next section for a refinement of this model.) In the loop, we cause allocation in the nursery and the lospace. When the nursery starts to get full, we run a minor collection, to trace the newly allocated part of the object graph, possibly promoting some objects to the old generation.

Promoting an object adds to the byte count in the old generation. You could say that promoting causes allocation in the old-gen, though it might happen just on an accounting level if an object is promoted in place. In V8, old-gen allocation is subject to a limit, and that limit is set by a gnarly series of heuristics. Exceeding the limit will cause a major GC, in which the whole heap is traced.

So, minor collections cause major collections via promotion. But what if a minor collection never promotes? This can happen in request-oriented workflows, in which a request comes in, you handle it in JS, write out the result, and then handle the next. If the nursery is at least as large as the memory allocated in one request, no object will ever survive more than one collection. But in our new Uint8Array(1e6) example above, we are allocating to newspace and the lospace. If we never cause promotion, we will always collect newspace but never trigger the full GC that would also collect the large object space, triggering this out-of-memory phenomenon.

partitions, take two

In general, the phenomenon we are looking for is nursery allocations that cause non-nursery allocations, where those non-nursery allocations will not themselves bring about a major GC.

In our example above, it was a typed array object with an associated lospace backing store, assuming the lospace allocation wouldn’t eventually trigger GC. But V8’s heap is not exactly like our model, for one because it actually has separate nursery and old-generation lospaces, and for two because allocation to the old-generation lospace does count towards the old-gen allocation limit. And yet, that simple does still blow through all memory. So what is the deal?

The answer is that V8’s heap now has around two dozen spaces, and allocations to some of those spaces escape the limits and allocation counters. In this case, V8’s sandbox makes the link between the JS typed array object and its backing store pass through a table of indirections. At each allocation, we make an object in the nursery, allocate a backing store in the nursery lospace, and then allocate a new entry in the external pointer table, storing the index of that entry in the JS object. When the object needs to get its backing store, it dereferences the corresponding entry in the external pointer table.

Our tight loop above would therefore cause an allocation (in the external pointer table) that itself would not hasten a major collection.

Seen this way, one solution immediately presents itself: maybe we should find a way to make external pointer table entry allocations trigger a GC based on some heuristic, perhaps the old-gen allocated bytes counter. But then you have to actually trigger GC, and this is annoying and not always possible, as EPT entries can be allocated in response to a pointer write. V8 hacker Michael Lippautz summed up the issue in a comment that can be summarized as “it’s gnarly”.

In the end, it would be better if new-space allocations did not cause old-space allocations. We should separate the external pointer table (EPT) into old and new spaces. Because there is at most one “host” object that is associated with any EPT entry—if it’s zero objects, the entry is dead—then the heuristics that apply to host objects will do the right thing with regards to EPT entries.

A couple weeks ago, I landed a patch to do this. It was much easier said than done; the patch went through upwards of 40 revisions, as it took me a while to understand the delicate interactions between the concurrent and parallel parts of the collector, which were themselves evolving as I was working on the patch.

The challenge mostly came from the fact that V8 has two nursery implementations that operate in different ways.

The semi-space nursery (the scavenger) is a parallel stop-the-world collector, which is relatively straightforward... except that there can be a concurrent/incremental major trace in progress while the scavenger runs (a so-called interleaved minor GC). External pointer table entries have mark bits, which the scavenger collector will want to use to determine which entries are in use and which can be reclaimed, but the concurrent major marker will also want to use those mark bits. In the end we decided that if a major mark is in progress, an interleaved collection will neither mark nor sweep the external pointer table nursery; a major collection will finish soon, and reclaiming dead EPT entries will be its responsibility.

The minor mark-sweep nursery does not run concurrently with a major concurrent/incremental trace, which is something of a relief, but it does run concurrently with the mutator. When we promote a page, we also need to promote all EPT entries for objects on that page. To keep track of which objects have external pointers, we had to add a new remembered set, built up during a minor trace, and cleared when the page is swept (also lazily / concurrently). Promoting a page iterates through that set, evacuating EPT entries to the old space. This is additional work during the stop-the-world pause, but hopefully it is not too bad.

To be honest I don’t have the performance numbers for this one. It rides the train this week to Chromium 126 though, and hopefully it fixes this problem in a robust way.

partitions, take three

The V8 sandboxing effort has sprouted a number of indirect tables: external pointers, external buffers, trusted objects, and code objects. Also recently to better integrate with compressed pointers, there is also a table for V8-to-managed-C++ (Oilpan) references. I expect the Oilpan reference table will soon grow a nursery along the same lines as the regular external pointer table.

In general, if you have a generational partition of your main object space, it would seem that you almost always need a generational partition of every other space. Otherwise either you cause new allocations to occur in what is effectively an old space, perhaps needlessly hastening a major GC, or you forget to track allocations in that space, leading to a memory leak. If the generational hypothesis applies for a wrapper object, it probably also applies for any auxiliary allocations as well.

fin

I have a few cleanups remaining in this area but I am relieved to have landed this patch, and pleased to have spent time under the hood of V8’s collector. Many many thanks to Samuel Groß, Michael Lippautz, and Dominik Inführ for their review and patience. Until the next cycle, happy allocating!

by Andy Wingo at May 13, 2024 01:03 PM

May 12, 2024

Alex Bradbury

Notes from the Carbon panel session at EuroLLVM 2024

Last month I had the pleasure of attending EuroLLVM which featured a panel session on the Carbon programming language. It was recorded and of course we all know automated transcription can be stunningly accurate these days, yet I still took fairly extensive notes throughout. I often take notes (or near transcriptions) as I find it helps me process. I'm not sure whether I'm adding any value by writing this up, or just contributing entropy but here it goes. You should of course assume factual mistakes or odd comments are errors in my transcription or interpretation, and if you keep an eye on the LLVM YouTube channel you should find the session recording uploaded there in the coming months.

First, a bit of background on the session, Carbon, and my interest in it. Carbon is a programming language started by Google aiming to be used in many cases where you would currently use C++ (billed as "an experimental successor to C++"). The Carbon README gives a great description of its goals and purpose, but one point I'll highlight is that ease of interoperability with C++ is a key design goal and constraint. The project recognises that this limits the options for the language to such a degree that it explicitly recommends that if you are able to make use of a modern language like Go, Swift, Kotlin, or Rust, then you should. Carbon is intended to be there for when you need that C++ interoperability. Of course, (and as mentioned during the panel) there are parallel efforts to improve Rust's ability to interoperate with C++.

Whatever other benefits the Carbon project is able to deliver, I think there's huge value in the artefacts being produced as part of the design and decision making process so far. This has definitely been true of languages like Swift and especially Rust, where going back through the development history there's masses of discussion on difficult decisions e.g. (in the case of Rust) green threads, the removal of typestate, or internal vs external iterators. The Swift Evolution and Rust Internals forums are still a great read, but obviously there's little ability to revisit fundamental decisions at this point. I try to follow Carbon's development due to a combination of its openness, the fact it's early stage enough to be making big decisions (e.g. lambdas in Carbon), and also because they're exploring different design trade-offs than those other languages (e.g. the Carbon approach to definition-checked generics). I follow because I'm a fan of programming language design discussion, not necessarily because of the language itself, which may or may not turn out to be something I'm excited to program in. As a programming language development lurker I like to imagine I'm learning something by osmosis if nothing else - but perhaps it's just a displacement activity...

If you want to keep up with Carbon, then check out their recently started newsletter, issues/PRs/discussions on GitHub and there's a lot going on on Discord too (I find group chat distracting so don't really follow there). The current Carbon roadmap is of course worth a read too. In case I wasn't clear enough already, I have no affiliation to the project. From where I'm standing, I'm impressed by the way it's being designed and developed out in the open, more than (rightly or wrongly) you might assume given the Google origin.

Notes from the panel session

The panel "Carbon: An experiment in different tradeoffs" took place on April 11th at EuroLLVM 2024. It was moderated by Hana Dusíková and the panel members were Chandler Carruth, Jon Ross-Perkins, and Richard Smith. Rather than go through each question in the order they were asked I've tried to group together questions I saw as thematically linked. Anything in quotes should be a close approximation of what was said, but I can't guarantee I didn't introduce errors.

Background and basics of Carbon

This wasn't actually the first question, but I think a good starting point is 'how would you sell Carbon to a C++ user?'. Per Chandler, "we can provide a better language, tooling, and ecosystem without needing to leave everything you built up in C++ (both existing code and the training you had). The cost of leveraging carbon should have as simple a ramp as possible. Sometimes you need performance, we can give more advanced tools to help you get the most performance from your code. In other cases, it's security and we'll be able to offer more tools here than C++. It's having the ability to unlock improvements in the language without having to walk away from your investments in C++, that of course isn't going anywhere."

When is it going to be done? Chandler: "When it's ready! We're trying to put our roadmap out there publicly where we can. It's a long term project. Our goal for this year is to get the toolchain caught up with the language design, get into building practical C++ interop this year. There are many unknowns in terms of how far we'll get this year, but next year I think you'll see a toolchain that works and you can do interesting stuff with and evaluate in a more concrete context."

As for the size of the team and how many companies are contributing, the conclusion was that Google is definitely the main backer right now but there are others starting to take a look. There are probably about 5-10 people active on Carbon in any different week, but it varies so this can be a different set of people from week to week.

Given we've established that Google are still the main backer, one line of questioning was about what Google see in it and how management were convinced to back it. Richard commented "I think we're always interested in exploring new possibilities for how to deal with our existing C++ codebase, which is a hugely valuable asset for us. That includes both looking at how we can keep the C++ language itself happy and our tools for it being good and maintainable, but also places we might take it in the future. For a long time we've been talking about if we can build something that works better for our use case than existing tools, and had an opportunity to explore that and went with it."

A fun question that followed later aimed to establish the favourite thing about Carbon from each of the panel members (of course, it's nice that much of this motivation overlaps with the reasons for my own interest!):

  • Jon: For me it's unlocking the ability to improve the developer experience. Improving the language syntax, making things more understandable. Also C++ is very slow to compile and we're hoping for 1/3rd of the compile time for a typical file vs Clang.
  • Richard: from a language design perspective, being able to go through and systematically evaluate every decision and look at them again using things that have been learned since then.
  • Chandler: It's not so much the systematic review for me. We're building this with an eye to C++, and to integrate with C++ need a fairly big complex language (generics, inheritance, overloading etc etc). But when we look at this stuff, we keep finding principled and small/focused designs that we can use to underpin the sort of language functionality that is in C++ but fits together in a coherent and principled way. And we write this down. I think a lot of people don't realise that Carbon is writing down all the design as we go rather than having some mostly complete starting point and then making changes to that using something like an RFC process.

Finally, (for this section) there was also some intriguing discussion about Carbon's design tradeoffs. This is covered mostly elsewhere, but I think Chandler's answer focusing on the philosophy of Carbon rather than technical implementation details fits well here: "One of the philosophical goals of our language design is we don't try to solve hard problems in the compiler. [In other languages] there's often a big problem and the compiler has to solve it, freeing the programmer from worrying about it. Instead we provide the syntax and ask the programmer to tell us. e.g. for type inference, you could imagine having Hindley-Milner type inference. That's a hard problem, even if it makes ergonomics better in some cases. So we use a simpler and more direct type system. You see this crop up all over the language design."

Carbon governance

The first question related to organisation of the project and its governance was about what has been done to make it easier for contributors to get involved. My interpretation of the responses is that although they've had some success in attracting new contributors, the feeling is that there's more that could be done here. e.g. from Jon "Things like maintaining the project documentation, writing a start guide on how to build. We're making some different infrastructure choices, e.g. bazel, but in general we're trying to use things people are familiar with: GitHub, GH actions etc. There's probably still room for improvement. Listening to the LLVM newcomers session a couple of days ago gave me some ideas. Right now there's probably a lot of challenge learning how the different model works in Carbon."

The governance model for Carbon in general was something Chandler had a lot to say about:

  • On getting it in place early: "I was keen we start off with it, seeing challenges with C++ or even LLVM where as a project gets big the governance model that worked well at the beginning might not work so well and changing it at that point can be really hard. So I wanted to set up something that could scale, and can be changed if needed.
  • Initial mistakes: "The initial model was terrible. It was very elaborate, a large core team of people that pulled in people that weren't even involved in the project. We'd have weekly/biweekly meetings, a complex consensus process. In some ways it worked, we were being intentional and it was clear there was a governance project. But its was so inefficient, so people tried to avoid it. So we revamped it and simplified it, distilling it to something a little closer to the benevolent dictator for life model with minimal patches. Three leads rather than one to add some redundancy, one top-level governance body, will eventually have a rotation there."
  • The model now: "It's not that we'll try to agree on everything - if a change is uncontroversial then any one of the leads can approve, which gives us 3x bandwidth. Except for the cases if there is controversy, then the three of us step back and discuss. I had my personal pet favourite feature of carbon overturned by this governance model. The model allows us to make decisions quickly and easily, and most decisions are reversible if they turn out to be wrong. So we don't try to optimise for getting to the best outcome, rather to minimise cost for getting to an outcome. We're still early - we have a plan for scaling but that's more in the future."

Further questioning led to additional points being made about the governance model, such as the fact there are now several community members unaffiliated to Google with commit privileges, that the project goals are documented which helps contributors understand if a proposal is likely to be aligned with Carbon or not, and that these goals are potentially changeable if people come in with good arguments as to why they should be updated.

Carbon vs other languages

Naturally, there were multiple questions related to Carbon vs Rust. In terms of high-level comparisons, the panel members were keen to point out that Rust is a great language and is mature, while Carbon remains at an early stage. There are commonalities in terms of the aim to provide modern type system features to systems programmers, but the focus on C++ interoperability as a goal driving the language design is a key differentiator.

Thoughts about Rust naturally lead to questions about Carbon's memory safety story, and whether Carbon plans to have something like Rust's borrow checker. Richard commented "I think it's possible to get something like that. We're not sure exactly what it will look like yet, we're planning to look at this more between the 0.1 and 0.2 milestone. Borrow checking is an interesting option to pursue but there are some others to explore."

Concerns were also raised about whether given that C++ interop is such a core goal of Carbon, it may have problems as C++ continues to evolve (perhaps with features that clash with features add in Carbon). Chandler's response was "As long as clang gets new C++ features, we'll get them. Similar to how swift's interop works, linking the toolchain and using that to understand the functionality. If C++ starts moving in ways addressing the safety concerns etc that would be fantastic. We could shut down carbon! I don't think that's very likely. In terms of functionality that would make interop not work well that's a bit of a concern, but of course if C++ adds something they need it to interop with existing C++ code so we face similar constraints. While Richard commented "C++ modules is a big one we need to keep an eye on. But at the moment as modules adoption hasn't taken off in a big way yet, we're still targeting header files." There was also a brief exchange about what if Carbon gets reflection before C++, with the hope that if it did happen it could help with the design process by giving another example for C++ to learn from (just as Carbon learns from Rust, Swift, Go, C++, and others).

Compiler implementation questions

EuroLLVM is of course a compilers conference, so there was a lot of curiousity about the implementation of the Carbon toolchain itself. In discussing trade-offs in the design, Jon leapt into a discussion of this "to go for SemIR vs a traditional AST. In Clang the language is represented by an AST, which is what you're typically taught about how compilers work. We have a semantic IR model where we produce lex tokens, then a parse tree (slightly different to an AST) an then it becomes SemIR. This is very efficient and fast to process, lowers to LLVM IR very easily, but it's a different model and a different way to think about writing a compiler. To the earlier point about newcomers, it's something people have to learn and a bit of a barrier because of that. We try to address it, but it is a trade-off." Jon since provided more insight on Carbon's implementation approach in a recent /r/ProgrammingLanguages thread.

Given what Carbon devs have learned about applying data oriented design to optimise the frontend (see talks like Modernizing Compiler Design for Carbon Toolchain), could the same ideas be applied to the backend? Chandler commented on the tradeoff he sees "By necessity with data-oriented design you have to specialise a lot. We think we can benefit from this a lot with the frontend at the cost of reusability. The trade-off might be different within LLVM."

When asked about whether the Carbon frontend may be reusable for other tools (reducing duplicated effort writing parsers for Carbon), Jon responded "I'm pretty confident at least through the parse tree it will be reusable. I'm more hesitant to make that claim through to SemIR. It may not be the best choice for everyone, and in some cases you may want more of an AST. But providing these tools as part of the language project like a formatter, migration tool [for language of API changes], something like clang-tidy, we're going to be taking on these costs ourselves which is going to incentivise us to find ways of amortising this and finding the right balance."


Article changelog
  • 2024-05-13: Various phrasing tweaks and fixed improper heading nesting.
  • 2024-05-12: Initial publication date.

May 12, 2024 12:00 PM

May 09, 2024

Brian Kardell

Brian Specific Failures

Brian Specific Failures

Through my work in Interop, I've been I've been learning stuff about WPT. Part of this has been driven by a desire to really understand the graph on the main page about browser specific failures (like: which tests?). You might think this is stuff I'd already know, but I didn't. So, I thought I'd share some of the things I learned along the way, just incase you didn't know either.

Did you know that wpt.fyi has lots of ways to query it? It does!

You can, for example, type nameOfBrowser: fail (or nameOfBrowser: pass) into the search box - and you can even string together several of them to ask interesting questions!

What tests fail in every browser?

It seemed like an interesting question: Are there tests that fail in every single browser?

A query for that might look like: chrome:fail firefox:fail edge:fail safari:fail (using the browsers that are part of Interop, for example). And, as of this writing, it tells me...

There are 2,724 tests (15,718 subtests) which fail in every single browser!

Wow. I didn't expect there to be zero, but that feels like kind of a lot of tests to fail in every browser, doesn't it? How can that even happen? To be totally honest, I don't know!

A chicken and egg problem?

The purpose of WPT is to test conformance to a standard, but this is... tricky. Tests can be used to discuss specs as they are being written - or highlight things that are yet to be implemented, and WPT doesn't have the same kind of guardrails as specs. Things can get committed there that aren't "real" or yet agreed upon. Frequently the first implementers and experimenters write tests. And, it's not uncommon that then things change - or sometimes the specs just never go anywhere. Maybe those tests sit in a limbo "hoping" to advance?

WPT asks that files of early things like this be placed in a /tentative directory or have a .tentative in the names of early things like that which don't have agreement.

Luckily thanks to the query API we can see which of the tests above fall into that category by adding is:tentative to our query: chrome:fail firefox:fail edge:fail safari:fail is:tentative). We can see that indeed

348 of the tests (1,911 subtests) that fail in every single browser are marked as tentative.

The query language supports negating parameters with an exclamation, so we can adjust the query (chrome:fail firefox:fail edge:fail safari:fail !is:tentative) and see...

2,372 tests (13,797 subtests) of the tests that fail in every single browser are not marked as tentative.

So, I guess .tentative doesn't explain most of it (or maybe some of those just didn't follow that guidance). What does? I don't know!

Poking around, I see there's even 1 ACID test that literally everyone fails. Maybe that's just a bad test at this point? Maybe all of the tests that match this query need reviewed? I don't know!

You can also query simply all(status:fail) which will show pretty much the same thing for whatever browsers you happen to be showing. I like the explicit version for introducing this though as it's very clear from the query itself which "all" I'm referring to about.

Are there any tests that only pass in one browser?

I also wondered: Hmm... If there are tests that fail in exactly one browser, and we've just shown there's a bunch that pass in none, how many are there that pass in exactly one? That's pretty interesting to look at too:

Engines and Browsers: Tricky

Often when we're showing results we're using the 3 flagship browsers as a proxy for engine capabilies. We do that a lot - like in the Browsers Specific Failures (BSF) graph at the top of wpt.fyi - but... It's an imperfect proxy.

For example, there are also tests that only fail in Edge. As of this writing 1,456 tests (1,727 subtests) of them.

Or, there's also tests that fail uniquely in WebKit GTK - 1,459 tests (1,740 subtests) of those.

Here's where there's a bit of a catch-22 though. BSF is really useful for the browser teams, so links like the above are actually very handy if you're on the Edge or GTK teams. But we can't add those to the BSF chart because it kind of really changes the meaning of the whole thing. What that kind of wants to be is "Engine Specific Failures", but that's not actually directly measurable.

Brian Specific Failures...

Below is an interesting table which links to query results that show actual tests that report as failing in only one of the three flagship browsers.

Browser Non-tentative Subtests which uniquely fail among these browsers
Chrome 1,345 tests (12,091 subtests)
Firefox 2,290 tests (16,791 subtests)
Safari 4,263 tests (14,867 subtests)

If that sounds like the data for the Browser Specific Failures (BSF) graph on the front page of wpt.fyi, yes. And no. I call this the "Brian Specific Failures" (BSF) table.

I think that this is about as close as we're probably going to get in the short term to a linkable set of tests that fail, if you'd like to explore them. The BSF graph is also, I believe, more disciminating than just "pass" and "fail" that we're showing here. Like, what do you do if a test times out? Are there intermediate or currently unknown statuses?

It was also kind of interesting, for me at least, while looking at the data, to realize just how different the situation looks depending on whether you are looking at tests, or subtests. Firefox fails the most subtests, but Safari fails the most tests. BSF "scores" subtests as decimal values of a whole test.

It's pretty complicated to pull any of this off actually. Things go wrong with tests and runs, and so on. I also realized that all of these scores are inherently a little imperfect.

For example, if you at the top of the page, it says (as I am writing this) there are 55,176 tests (1,866,346 subtests)

But if you look to the bottom of the screen, the table has a summary with the passing/total subtests for each browser. As I am writing this it says:

chrome: 1836812 / 1890027, edge: 1834777 / 1887771, firefox: 1786949 / 1868316, safari: 1787112 / 1879926

You might note that none of those lists the same number of subtests!

Anyway, it was an interesting rabbit hole to dive into! I have further thoughts on the WPT.fyi dashboard page which I'll share in another post, but I'd love to hear of any interesting queries that you come up with or questions you have. I probably don't know the answers, but I'll try to help find them!

May 09, 2024 04:00 AM

Igalia Compilers Team

MessageFormat 2.0: A domain-specific language?

code { word-break: normal; }

I recommend reading the prequel to this post, "MessageFormat 2.0: a new standard for translatable messages"; otherwise what follows won't make much sense!

This blog post represents the Igalia compilers team; all opinions expressed here are our own and may not reflect the opinions of other members of the MessageFormat Working Group or any other group.

What we've achieved #

In the previous post, I've just described the work done by the members of the MF2 Working Group to develop a specification for MF2, a process that spanned several years. When I joined the project in 2023, my task was to implement the spec and produce a back-end for the JavaScript Intl.MessageFormat proposal.

|-------------------------------|
| JavaScript code using the API |
|-------------------------------|
| browser engine                |
|-------------------------------|
| MF2 in ICU                    |
|-------------------------------|

For code running in a browser, the stack looks like this: web developers write JavaScript code that uses the API; their code runs in a JavaScript engine that's built into a Web browser; and that browser makes API calls to the ICU library in order to execute the JS code.

The bottom layer of the stack is what I've implemented as a tech preview. Since the major JavaScript engines are implemented in C++, I worked on ICU4C, the C++ version of ICU.

The middle level is not yet implemented. A polyfill makes it possible to try out the API in JavaScript now.

Understanding MF2 #

To understand the remainder of this post, it's useful to have read over a few more examples from the MF2 documentation. See:

Is MF2 a programming language? (And does it matter?) #

Different members of the working group have expressed different opinions on whether MF2 is a programming language:

  • Addison Phillips: "I think we made a choice (we can reconsider it if necessary, although I don't think we need to) that MF2 is not a resource format. I'm not sure if 'templating language' is the right characterization, but let's go with it for now." (comment on pull request 474, September 2023)
  • Stanisław Małolepszy: "...we’re neither resource format nor templating language. I tend to think we’re a storage format for variants." (comment on the same PR)
  • Mihai Niță: "For me, this is a DSL designed for i18n / l10n." (comment on PR 507)

MF2 lacks block structure and complex control flow: the .match construct can't be nested, and .local variable declarations are sequential rather than nested. Some people might expect a general-programming language to include these features.

But in implementing it, I have found that several issues arise that are familiar from my experience studying, and implementing, compilers and interpreters for more conventional programming languages.

I wrote in part 1 that when using either MF1 or MF2, messages can be separated from code -- but it turns out that maybe, messages are code! But they constitute code in a different programming language, which can be cleanly separated from the host programming language, just as with uninterpreted strings.

Abstract syntax #

As with a programming language, the syntax of MF2 is specified using a formal grammar, in this case using Augmented Backus-Naur Form. In most compilers and interpreters, a parser (that's either generated from a specification of the formal grammar, or written by hand to closely follow that grammar) reads the text of the program and generates an abstract syntax tree (AST), which is an internal representation of the program.

MF2 has a data model, which is like an AST. Users can either write messages in the text-based syntax (probably easiest for most people) or construct a data model directly using part of the API (useful for things like building external tools).

Parsers and ASTs are familiar concepts for programming language implementors, and writing the "front-end" for MF2 (the parser, and classes that define the data model) was not that different from writing a front-end for a programming language. Because it works on ASTs, the formatter itself (the part that does all the work once everything is parsed) also looks a lot like an interpreter. The "back-end", producing a formatted result, is currently quite simple since the end result is a flat string -- however, in the future, "formatting to parts" (necessary to easily support features like markup) will be supported.

For more details, see Eemeli Aro's FOSDEM 2024 talk on the data model.

Naming #

MF2 has local variable declarations, like the let construct in JavaScript and in some functional programming languages. That means that in implementing MF2, we've had to think about evaluation strategies (lazy versus eager evaluation), scope, and mutability. MF2 settled on a spec in which all local variables are immutable and no name shadowing is permitted except in a very limited way, with .input declarations. Free variables are permitted, and don't need to be declared explicitly: these correspond to runtime arguments provided to a message. Eager and lazy implementations can both conform to the spec, since variables are immutable, and lazy implementations are free to use call-by-need semantics (memoization). These design choices weren't obvious, and in some cases, inspired quite a lot of discussion.

Naming can have subtle consequences. In at least one case, the presence of .local declarations necessitates a dataflow analysis (albeit a simple one). The analysis is to check for "missing selector annotation" errors; the details are outside the scope of this post, but the point is just that error checking requires "looking through" a variable reference, possibly multiple chains of references, to the variable's definition. Sounds like a programming language!

Functions #

Placeholders in MF2 may be expressions with annotations. These expressions resemble function calls in a more conventional language. Unlike in most languages, functions cannot be implemented in MF2 itself, but must be implemented in a host language. For example, in the ICU4C implementation, the functions would either be written in C++ or (less likely) another language that has a foreign function interface with C++.

Functions can be built-in (required in the spec to be provided), or custom. Programmers using an implementation of the MF2 API can write their own custom functions that format data ("formatters"), that customize the behavior of the .match construct ("selectors"), or do both. Formatters and selectors have different interfaces.

The term "annotation" in the MF2 spec is suggestive. For example, a placeholder like {$x :number} can be read: "x should be formatted as a number." But custom formatters and selectors may also do much more complicated things, as long as they conform to their interfaces. An implementor could define a custom :squareroot function, for example, such that {$x :squareroot} is replaced with a formatted number whose value is the square root of the runtime value of $x. It's unclear how common use cases like this will be, but the generality of the custom function mechanism makes them possible.

Wherever there are functions, functional programmers think about types, explicit or implicit. MF2 is designed to use a single runtime type: it's unityped. The spec uses the term "resolved value" to refer to the type of its runtime values. Moreover, the spec tries to avoid constraining the structure of "resolved values", to allow the implementation as much freedom as possible. However, the implementation has to call functions written in a host language that may have a different type system. The challenge comes in defining the "function registry", which is the part of the MF2 spec that defines both the names and behavior of built-in functions, and how to specify custom functions.

Different host languages have different type systems. Specifying the interface between MF2 and externally-defined functions is tricky, since the goal is to be able to implement MF2 in any programming language.

While a language that can call foreign functions but not define its own functions is unusual, defining foreign function interfaces that bridge the gaps between disparate programming languages is a common task when designing a language. This also sounds like a programming language.

Pattern matching #

The .match construct in MF2 looks a lot like a case expression in Haskell, or perhaps a switch statement in C/C++, depending on your perspective. Unlike switch, .match allows multiple expressions to be examined in order to choose an alternative. And unlike case, .match only compares data against specific string literals or a wildcard symbol, rather than destructuring the expressions being scrutinized.

The ability provided by MF2 to customize pattern-matching is unusual. An implementation of a selector function takes a list of keys, one per variant, and returns a sorted list of keys in order of preference. The list is then used by the pattern matching algorithm in the formatter to pick the best-matching variant given that there are multiple selectors. An abstract example is:

.match {$x :X} {$y :Y}
A1 A2 {{1}}
B1 B2 {{2}}
C1 * {{3}}
* C2 {{4}}
* * {{5}}

In this example, the implementation of X would be called with the list of keys [A1, B1, C1] (the * key is not passed to the selector implementation) and returns a list of the same keys, arranged in any order. Likewise, the implementation of Y would be called with [A2, B2, C2].

I don't know of any existing programming languages with a pattern-matching construct like this one; usually, the comparison between values and patterns is based on structural equality and can't be abstracted over. But the ability to factor out the workings of pattern matching and swap in a new kind of matching (by defining a custom selector function) is a kind of abstraction that would be found in a general-purpose programming language. Of course, the goal here is not to match patterns in general but to select a grammatically correct variant depending on data that flows in at runtime.

But is it Turing-complete? #

MF2 has no explicit looping constructs or recursion, but it does have function calls. The details of how custom functions are realized are implementation-specific; typically, using a general-purpose programming language. That means that MF2 can invoke code that does arbitrary computation, but not express it. I think it would be fair to say that the combination of MF2 and a suitable registry of custom functions is Turing-complete, but MF2 itself is not Turing-complete. For example, imagine a custom function named eval that accepts an arbitrary JavaScript program as a string, and returns its output as a string. This is not how MF2 is intended to be used; the spec notes that "execution time SHOULD be limited" for invocations of custom functions. I'm not aware of any other languages whose Turing-completeness depends on their execution environment. (Though there is at least one lengthy discussion of whether CSS is Turing-complete.)

If custom functions were removed altogether and the registry of functions was limited to a small built-in set, MF2 would look much less like a programming language; its underlying "instruction set" would be much more limited.

Code versus data #

There is an old Saturday Night Live routine: “It’s a floor wax! It’s a dessert topping! It’s both!” XML is similar. “It’s a database! It’s a document! It’s both!” -- Philip Wadler, "XQuery: a typed functional language for querying XML" (2002)

The line between code and data isn't always clear, and the MF2 data model can be seen as a representation of the input data, rather than as an AST representing a program. Likewise, the syntax can be seen as a serialized format for representing the data, rather than as the syntax of a program.

There is also a "floor wax / dessert topping" dichotomy when it comes to functions. Is MF2 an active agent that calls functions, or does it define data passed to an API, whose implementation is what calls functions?

"Language" has multiple meanings in software: it can refer to a programming language, but the "L" in "HTML" and "XML" stands for "language". We would usually say that HTML and XML are markup languages, not programming languages, but even that point is debatable. After all, HTML embeds JavaScript code; the relationship between HTML and JavaScript resembles the relationship between MF2 and function implementations. Languages differ in how much computation they can directly express, but there are common aspects that unite different languages, like having a formal grammar.

I view MF2 as a domain-specific language for formatting messages, but another perspective is that it's a representation of data passed to an API. An API itself can be viewed as a domain-specific language: it provides verbs (functions or methods that can be called) and nouns (data structures that the functions can accept.)

Summing up #

As someone with a background in programming language design and implementation, I'm naturally inclined to see everything as a language design problem. A general-purpose programming language is complex, or at least can be used to solve complex problems. In contrast, those who have worked on the MF2 spec over the years have put in a lot of effort to make it as simple and focused on a narrow domain as possible.

One of the ways in which MF2 has veered towards programming language territory is the custom function mechanism, which was added to provide extensibility. The mechanism is general, but if there was a less general solution that still supported the desired range of use cases, these years of effort would have unearthed one. The presence of name binding (with all the complexities that brings up) and an unusual form of pattern matching also suggest to me that it's appropriate to consider MF2 a programming language, and to apply known programming language techniques to its design and implementation. This shows that programming language theory has interesting applications in internationalization, which is a new finding as far as I know.

What's left to do? #

The MessageFormat spec working group welcomes feedback during the tech preview period. User feedback on the MF2 spec and implementations will influence its future design. The current tech preview spec is part of LDML 45; the beta version, to be included in LDML 46, may include improvements suggested by users and implementors.

Igalia plans to continue collaboration to advance the Intl.MessageFormat proposal in TC39. Implementation in browser engines will be part of that process.

Acknowledgments #

Thanks to my colleagues Ben Allen, Philip Chimento, Brian Kardell, Eric Meyer, Asumu Takikawa, and Andy Wingo; and MF2 working group members Eemeli Aro, Elango Cheran, Richard Gibson, Mihai Niță, and Addison Phillips for their comments and suggestions on this post and its predecessor. Special thanks to my colleague Ujjwal Sharma, whose work I borrowed from in parts of the first post.

May 09, 2024 12:00 AM

May 07, 2024

Melissa Wen

Get Ready to 2024 Linux Display Next Hackfest in A Coruña!

We’re excited to announce the details of our upcoming 2024 Linux Display Next Hackfest in the beautiful city of A Coruña, Spain!

This year’s hackfest will be hosted by Igalia and will take place from May 14th to 16th. It will be a gathering of minds from a diverse range of companies and open source projects, all coming together to share, learn, and collaborate outside the traditional conference format.

Who’s Joining the Fun?

We’re excited to welcome participants from various backgrounds, including:

  • GPU hardware vendors;
  • Linux distributions;
  • Linux desktop environments and compositors;
  • Color experts, researchers and enthusiasts;

This diverse mix of backgrounds are represented by developers from several companies working on the Linux display stack: AMD, Arm, BlueSystems, Bootlin, Collabora, Google, GravityXR, Igalia, Intel, LittleCMS, Qualcomm, Raspberry Pi, RedHat, SUSE, and System76. It’ll ensure a dynamic exchange of perspectives and foster collaboration across the Linux Display community.

Please take a look at the list of participants for more info.

What’s on the Agenda?

The beauty of the hackfest is that the agenda is driven by participants! As this is a hybrid event, we decided to improve the experience for remote participants by creating a dedicated space for them to propose topics and some introductory talks in advance. From those inputs, we defined a schedule that reflects the collective interests of the group, but is still open for amendments and new proposals. Find the schedule details in the official event webpage.

Expect discussions on:

KMS Color/HDR
  • Proposal with new DRM object type:
    • Brief presentation of GPU-vendor features;
    • Status update of plane color management pipeline per vendor on Linux;
  • HDR/Color Use-cases:
    • HDR gainmap images and how should we think about HDR;
    • Google/ChromeOS GFX view about HDR/per-plane color management, VKMS and lessons learned;
  • Post-blending Color Pipeline.
  • Color/HDR testing/CI
    • VKMS status update;
    • Chamelium boards, video capture.
  • Wayland protocols
    • color-management protocol status update;
    • color-representation and video playback.
Display control
  • HDR signalling status update;
  • backlight status update;
  • EDID and DDC/CI.
Strategy for video and gaming use-cases
  • Multi-plane support in compositors
    • Underlay, overlay, or mixed strategy for video and gaming use-cases;
    • KMS Plane UAPI to simplify the plane arrangement problem;
    • Shared plane arrangement algorithm desired.
  • HDR video and hardware overlay
Frame timing and VRR
  • Frame timing:
    • Limitations of uAPI;
    • Current user space solutions;
    • Brainstorm better uAPI;
  • Cursor/overlay plane updates with VRR;
  • KMS commit and buffer-readiness deadlines;
Power Saving vs Color/Latency
  • ABM (adaptive backlight management);
  • PSR1 latencies;
  • Power optimization vs color accuracy/latency requirements.
Content-Adaptive Scaling & Sharpening
  • Content-Adaptive Scalers on display hardware;
  • New drm_colorop for content adaptive scaling;
  • Proprietary algorithms.
Display Mux
  • Laptop muxes for switching of the embedded panel between the integrated GPU and the discrete GPU;
  • Seamless/atomic hand-off between drivers on Linux desktops.
Real time scheduling & async KMS API
  • Potential benefits: lower latency input feedback, better VRR handling, buffer synchronization, etc.
  • Issues around “async” uAPI usage and async-call handling.

In-person, but also geographically-distributed event

This year Linux Display Next hackfest is a hybrid event, hosted onsite at the Igalia offices and available for remote attendance. In-person participants will find an environment for networking and brainstorming in our inspiring and collaborative office space. Additionally, A Coruña itself is a gem waiting to be explored, with stunning beaches, good food, and historical sites.

Semi-structured structure: how the 2024 Linux Display Next Hackfest will work

  • Agenda: Participants proposed the topics and talks for discussing in sessions.
  • Interactive Sessions: Discussions, workshops, introductory talks and brainstorming sessions lasting around 1h30. There is always a starting point for discussions and new ideas will emerge in real time.
  • Immersive experience: We will have coffee-breaks between sessions and lunch time at the office for all in-person participants. Lunches and coffee-breaks are sponsored by Igalia. This will keep us sharing knowledge and in continuous interaction.
  • Spaces for all group sizes: In-person participants will find different room sizes that match various group sizes at Igalia HQ. Besides that, there will be some devices for showcasing and real-time demonstrations.

Social Activities: building connections beyond the sessions

To make the most of your time in A Coruña, we’ll be organizing some social activities:

  • First-day Dinner: In-person participants will enjoy a Galician dinner on Tuesday, after a first day of intensive discussions in the hackfest.
  • Getting to know a little of A Coruña: Finding out a little about A Coruña and current local habits.

Participants of a guided tour in one of the sectors of the Museum of Estrella Galicia (MEGA). Source: mundoestrellagalicia.es

  • On Thursday afternoon, we will close the 2024 Linux Display Next hackfest with a guided tour of the Museum of Galicia’s favorite beer brand, Estrella Galicia. The guided tour covers the eight sectors of the museum and ends with beer pouring and tasting. After this experience, a transfer bus will take us to the Maria Pita square.
  • At Maria Pita square we will see the charm of some historical landmarks of A Coruña, explore the casual and vibrant style of the city center and taste local foods while chatting with friends.

Sponsorship

Igalia sponsors lunches and coffee-breaks on hackfest days, Tuesday’s dinner, and the social event on Thursday afternoon for in-person participants.

We can’t wait to welcome hackfest attendees to A Coruña! Stay tuned for further details and outcomes of this unconventional and unique experience.

May 07, 2024 02:33 PM

May 06, 2024

Igalia Compilers Team

MessageFormat 2.0: a new standard for translatable messages

code { word-break: normal; }
This blog post represents the Igalia compilers team; all opinions expressed here are our own and may not reflect the opinions of other members of the MessageFormat Working Group or any other group.

MessageFormat 2.0 is a new standard for expressing translatable messages in user interfaces, both in Web applications and other software. Last week, my work on implementing MessageFormat 2.0 in C++ was released in ICU 75, the latest release of the International Components for Unicode library.

As a compiler engineer, when I learned about MessageFormat 2.0, I began to see it as a programming language, albeit an unconventional one. The message formatter is analogous to an interpreter for a programming language. I was surprised to discover a programming language hiding in this unexpected place. Understanding my surprise requires some context: first of all, what "messages" are.

A story about message formatting #

Over the past 40 to 50 years, user interfaces (UIs) have grown increasingly complex. As interfaces have become more interactive and dynamic, the process of making them accessible in a multitude of natural languages has increased in complexity as well. Internationalization (i18n) refers to this general practice, while the process of localization (l10n) refers to the act of modifying a specific system for a specific natural language.

i18n history #

Localization of user interfaces (both command-line and graphical) began by translating strings embedded in code that implements UIs. Those strings are called "messages". For example, consider a text-based adventure game: messages like "It is pitch black. You are likely to be eaten by a grue." might appear anywhere in the code. As a slight improvement, the messages could all be stored in a separate "resource file" that is selected based on the user's locale. The ad hoc approaches to translating these messages and integrating them into code didn't scale well. In the late 1980s, C introduced a gettext() function into glibc, which was never standardized but was widely adopted. This function primarily provided string replacement functionality. While it was limited, it inspired the work that followed. Microsoft and Apple operating systems had more powerful i18n support during this time, but that's beyond the scope of this post.

The rise of the Web required increased flexibility. Initially, static documents and apps with mostly static content dominated. Java, PHP, and Ruby on Rails all brought more dynamic content to the web, and developers using these languages needed to tinker more with message formatting. Their applications needed to handle not just static strings, but messages that could be customized to dynamic content in a way that produced understandable and grammatically correct messages for the target audience. MessageFormat arose as one solution to this problem, implemented by Taligent (along with other i18n functionality).

The creators of Java had an interest in making Java the preferred language for implementing Web applications, so Sun Microsystems incorporated Taligent's work into Java in JDK 1.1 (1997). This version of MessageFormat supported pattern strings, formatting basic types, and choice format.

The MessageFormat implementation was then incorporated into ICU, the widely used library that implements internationalization functions. Initially, Taligent and IBM (which was one of Taligent's parent companies) worked with Sun to maintain parallel i18n implementations in ICU and in the JDK, but eventually the two implementations diverged. Moving at a faster pace, ICU added plural selection to MessageFormat and expanded other capabilities, based on feedback from developers and translators who had worked with the previous generations of localization tools. ICU also added a C++ port of this Java-based implementation; hence, the main ICU project includes ICU4J (implemented in Java) and ICU4C (implemented in C++ with some C APIs).

Since ICU MessageFormat was introduced, many much more complex Web UIs, largely based on JavaScript, have appeared. Complex programs can generate much more interesting content, which implies more complex localization. This necessitates a different model of message formatting. An update to MessageFormat has been in the works at least since 2019, and that work culminates with the recent release of the MessageFormat 2.0 specification. The goals are to provide more modularity and extensibility, and easier adaptation for different locales.

For brevity, we will refer to ICU MessageFormat, otherwise known simply as "MessageFormat", as "MF1" throughout. Likewise, "MF2" refers to MessageFormat 2.0.

Why message formatting? (Not just for translation) #

The work described in this post is motivated by the desire to make it as easy as possible to write software that is accessible to people regardless of the language(s) they use to communicate. But message formatting is needed even in user interfaces that only support a single natural language.

Consider this C code, where number is presumed to be an unsigned integer variable that's already been defined:

printf("You have %u files", number);

We've probably all used applications that informed us that we "have 1 files". While it's probably clear what that means, why should human readers have to mentally correct the grammatical errors of a computer? A programmer trying to improve things might write:

printf("You have %u file%s", number, number == 1 ? "" : "s");

which handles the case where n is 1. However, English is full of irregular plurals: consider "bunny" and "bunnies", "life" and "lives", or "mouse" and "mice". The code would be easier to read and maintain if it was easier to express "print the plural of 'file'", instead of a conditional expression that must be scrutinized for its meaning. While "printf" is short for "print formatted", its formatting is limited to simpler tasks like printing numbers as strings, not complex tasks like pluralizing "mouse".

This is just one example; another example is messages including ordinals, like "1st", "2nd", or "3rd", where the suffix (like "st") is determined in a complicated way from the number (consider "11th" versus "21st").

We've seen how messages can vary based on user input in non-local ways: the bug in "You have %u files" is that not only does the number of files vary, so does the word "files". So you might begin to see the value of a specialized notation for expressing such messages, which would make it easier to notice bugs like the "1 files" bug and even prevent them in the first place. That notation might even resemble a domain-specific language.

Why message formatting? #

Turning now to translation, you might be wondering why you would need a separate message formatter, instead of writing code in your programming language of choice that would look something like this (if your language of choice is C/C++):

// Supposing a char* "what" and int "planet" are already defined
printf("There was %s on planet %d", what, planet);

Here, the message is the first argument to printf and the other arguments are substituted into the message at runtime; no special message formatting functionality is used, beyond what the C standard library offers.

The problem is that if you want to translate the words in the message, translation may require changing the code, not just the text of the message. Suppose that in the target language, we needed to write the equivalent of:

printf("On planet %d, there was a %s", what, planet);

Oops! We've reordered the text, but not the other arguments to printf. When using a modern C/C++ compiler, this would be a compile-time error (since what is a string and not an integer). But it's easy to imagine similar cases where both arguments are strings, and a nonsense message would result. Message formatters are necessary not only for resilience to errors, but also for division of labor: people building systems want to decouple the tasks of programming and translation.

ICU MessageFormat #

In search of a better tool for the job, we turn to one of many tools for formatting messages: ICU MessageFormat (MF1). According to the ICU4C API documentation, "MessageFormat prepares strings for display to users, with optional arguments (variables/placeholders)".

The MF2 working group describes MF1 more succinctly as "An i18n-aware printf, basically."

Here's an expanded version of the previous printf example, expressed in MF1 (and taken from the "Usage Information" of the API docs):

"At {1,time,::jmm} on {1,date,::dMMMM}, there was {2} on planet {0,number}."

Typically, the task of creating a message string like this is done by a software engineer, perhaps one who specializes in localization. The work of translating the words in the message from one natural language to another is done by a translator, who is not assumed to also be a software developer.

The designers of MF1 made it clear in the syntax what content needs to be translated, and what doesn't. Any text occurring inside curly braces is non-translatable: for example, the string "number" in the last set of curly braces doesn't mean the word "number", but rather, is a directive that the value of argument 0 should be formatted as a number. This makes it easy for both developers and translators to work with a message. In MF1, a piece of text delimited by curly braces is called a "placeholder".

The numbers inside placeholders represent arguments provided at runtime. (Arguments can be specified either by number or name.) Conceptually, time, date, and number are formatting functions, which take an argument and an optional "style" (like ::jmm and ::dMMMM in this example). MF1 has a fixed set of these functions. So {1,time,::jmm} should be read as "Format argument 1 as a time value, using the format specified by the string ::jmm". (The details of how the format strings work aren't important for this explanation.) Since {2} has no formatting function specified, the value of argument 2 is formatted based on its type. (From context, we can suppose it has a string type, but we would need to look at the arguments to the message to know for sure.) The set of types that can be formatted in this way is fixed (users can't add new types of their own).

For brevity, I won't show how to provide the arguments that are substituted for the numbers 0, 1 and 2 in the message; the API documentation shows the C++ code to create those arguments.

MF1 addresses the reordering issue we noticed in the printf example. As long as they don't change what's inside the placeholders, a translator doesn't have to worry that their translation will disturb the relationship between placeholders and arguments. Likewise, a developer doesn't have to worry about manually changing the order of arguments to reflect a translation of the message. The placeholder {2} means the same thing regardless of where it appears in the message. (Using named rather than positional arguments would make this point even clearer.)

(The ICU User Guide contains more documentation on MF1. A tutorial by Mohammad Ashour also contains some useful information on MF1.)

Enter MessageFormat 2.0 #

To summarize a few of the shortcomings of MF1:

  • Lack of modularity: the set of formatters (mostly for numbers, dates, and times) is fixed. There's no way to add your own.
    • Like formatting, selection (choosing a different pattern based on the runtime contents of one or more arguments) is not customizable. It can only be done either based on plural form, or literal string matching. This makes it hard to express the grammatical structures of various human languages.
  • No separate data model: the abstract syntax is concrete syntax.
  • There is no way to declare a local variable so that the same piece of a message can be re-used without repeating it; all variables are external arguments.
  • Baggage: extending the existing spec with different syntax could change the meaning of existing messages, so a new spec is needed. (MF1 was not designed for forward-compatibility, and the new spec is backward-incompatible.)

MF2 represents the best efforts of a number of experts in the field to address these shortcomings and others.

I hope the brief history I gave shows how much work, by many tech workers, has gone into the problem of message formatting. Synthesizing the lessons of the past few decades into one standard has taken years, with seemingly small details provoking nuanced discussions. What follows may seem complex, but the complexity is inherent in the problem space that it addresses. The plethora of different competing tools to address the same set of problems is evidence for that.

"Inherent is Latin for not your fault." -- Rich Hickey, "Simple Made Easy"

A simple example #

Here's the same example in MF2 syntax:

At time {$time :datetime hour=numeric minute=|2-digit|}, on {$time :datetime day=numeric month=long},
there was {$what} on planet {$planet :number}

As in MF1, placeholders are enclosed in curly braces. Variables are prefixed with a $; since there are no local bindings for time, what, or planet, they are treated as external arguments. Although variable names are already delimited by curly braces, the $ prefix helps make it even clearer that variable names should not be translated.

Function names are now prefixed by a :, and options (like hour) are named. There can be multiple options, not just one as in MF1. This is a loose translation of the MF1 version, since in MF2, the skeleton option is not part of the alpha version of the spec. (It is possible to write a custom function that takes this option, and it may be added in the future as part of the built-in datetime function.) Instead, the :datetime formatter can take a variety of field options (shown in the example) or style options (not shown) The full options are specified in the Default Registry portion of the spec.

Literal strings that occur within placeholders, like 2-digit, are quoted. The MF2 syntax for quoting a literal is to enclose it in vertical bars (|). The vertical bars are optional in most cases, but are necessary for the literal 2-digit because it includes a hyphen. This syntax was chosen instead of literal quotation marks (") because some use cases for MF2 messages require embedding them in a file format that assigns meaning to quotation marks. Vertical bars are less commonly used in this way.

A shorter version of this example is:

At time {$time :time}, on {$time :date}, there was {$what} on planet {$planet :number}

:time formats the time portion of its operand with default options, and likewise for :date. Implementations may differ on how options are interpreted and their default values. Thus, the formatted output of this version may be slightly different from that of the first version.

A more complex example: selection and custom functions #

So far, the examples are single messages that contain placeholders that vary based on runtime input. But sometimes, the translatable text in a message also depends on runtime input. A common example is pluralization: in English, consider "You have one item" versus "You have 5 items." We can call these different strings "variants" of the same message.

While you can imagine code that selects the right variant with a conditional, that violates the separation between code and data that we previously discussed.

Another motivator for variants is grammatical case. English has a fairly simple case system (consider "She told me" versus "I told her"; the "she" of the first sentence changes to "her" when that pronoun is the object of the transitive verb "tell".) Some languages have much more complex case systems. For example, Russian has six grammatical cases; Basque, Estonian, and Finnish all have more than ten. The complexity of translating messages into and between languages like these is further motivation for organizing all the variants of a message together. The overall goal is to make messages as self-contained as possible, so that changing them doesn't require changing the code that manipulates them. MF2 makes that possible in more situations. While MF1 supports selection based on plural categories, MF2 also supports a general, customizable form of selection.

Here's a very simple example (necessarily simple since it's in English) of using custom selectors in MF2 to express grammatical case:

.match {$userName :hasCase}
vocative {{Hello, {$userName :person case=vocative}!}}
accusative {{Please welcome {$userName :person case=accusative}!}}
* {{Hello!}}

The keyword .match designates the beginning of a matcher.

How to read a matcher #

Although MF2 is declarative, you can read this example imperatively as follows:

  1. Apply :hasCase to the runtime value of $userName to get a string c representing a grammatical case.
  2. Compare c to each of the keys in the three variants ("vocative", "accusative", and the wildcard "*", which matches any string.)
  3. Take the matching variant v and format its pattern.

This example assumes that the runtime value of the argument $userName is a structure whose grammatical case can be determined. This example also assumes that custom :hasCase and :person functions have been defined (the details of how those functions are defined are outside the scope of this post). In this example, :person is a formatting function, like :datetime in the simpler example. :hasCase is a different kind of function: a selector function, which may extract a field from its argument or do arbitrarily complicated computation to determine a value to be matched against.

In general: a matcher includes one or more selectors and one or more variants. A variant includes a list of keys, whose length must equal the number of selectors in the matcher; and a single pattern.

In this example, the selector of the matcher is the placeholder {$userName :hasCase}. Selectors appear between the .match keyword and the beginning of the first variant. There are three variants, each of which has a single key. The strings delimited by double sets of curly braces are patterns, which in turn contain other placeholders. The selectors are used to select a pattern based on whether the runtime value of the selector matches one of the keys.

Formatting the selected pattern #

Supposing that :hasCase maps the value of $userName onto "vocative", the formatted pattern consists of the string "Hello, " concatenated with the result of formatting this placeholder:

{$userName :person case=vocative}

concatenated with the string "!". (This also supposes that we are formatting to a single string result; future versions of MF2 may also support formatting to a list of "parts", in which case the result strings would be returned in a more complicated data structure, not concatenated.)

You can read this placeholder like a function call with arguments: "Call the :person function on the value of $userName with an additional named argument, the name "case" bound to the string "vocative". We also elide the details of :person, but you can suppose it uses additional fields in $userName to format it as a personal name. Such fields might include a title, first name, and last name (incidentally, see "Falsehoods Programmers Believe About Names" for why formatting personal names is more complicated than you might think.)

Note that MF2 provides no constructs that mutate variables. Once a variable is bound, its value doesn't change. Moreover, built-in functions don't have side effects, so as long as custom functions are written in a reasonable way (that is, without side effects that cross the boundary between MF2 and function execution), MF2 has no side effects.

That means that $userName represents the same value when it appears in the selector and when it appears in any of the patterns. Conceptually, :hasCase returns a result that is used for selection; it doesn't change what the name $userName is bound to.

Abstraction over function implementations #

A developer could plug in an implementation of :hasCase that requires its argument to be an object or record that contains grammatical metadata, and then simply returns one of the fields of this object. Or we could plug in an implementation that can accept a string and uses a dictionary for the target language to guess its case. The message is structured the same way regardless, though to make it work as expected, the structure of the arguments must match the expectations of whatever functions consume them. Effectively, the message is parameterized over the meanings of its arguments and over the meanings of any custom functions it uses. This parameterization is a key feature of MF2.

Summing up #

This example couldn't be expressed in MF1, since MF1 has no custom functions. The checking for different grammatical cases would be done in the underlying programming language, with ad hoc code for selecting between the different strings. This would be error-prone, and would force a code change whenever a message changes (just as in the simple example shown previously). MF1 does have an equivalent of .match that supports a few specific kinds of matching, like plural matching. In MF2, the ability to write custom selector functions allows for much richer matching.

Further reading (or watching) #

For more background both on message formatting in general and on MF2, I recommend my teammate Ujjwal Sharma's talk at FOSDEM 2024, on which portions of this post are based. A recent MessageFormat 2 open house talk by Addison Phillips and Elango Cheran also provides some great context and motivation for why a new standard is needed.

You can also read a detailed argument in favor of a new standard by visiting the spec repository on GitHub: "Why MessageFormat needs a successor".

Igalia's work on MF2, or: Who are we and what are we doing here? #

So far, I've provided a very brief overview of the syntax and semantics of MF2. More examples will be provided via links in the next post.

A question I haven't answered is why this post is on the Igalia compilers team blog. I'm a member of the Igalia compilers team; together with Ujjwal Sharma, I have been collaborating with the MF2 working group, a subgroup of the Unicode CLDR technical committee. The working group is chaired by Addison Phillips and has many other members and contributors.

Part of our work on the compilers team has been to implement the MF2 specification as a C++ API in ICU. (Mihai Niță at Google, also a member of the working group, implemented the specification as a Java API in ICU.)

So why would the compilers team work on internationalization?

Part of our work as a team is to introduce and refine proposals for new JavaScript (JS) language features and work with TC39, the JS standards committee, to advance these proposals with the goal of inclusion in the official JS specification.

One such proposal that the compilers team has been involved with is the Intl.MessageFormat proposal.

The implementation of MessageFormat 2 in ICU provides support for browser engines (the major ones are implemented in C++) to implement this proposal. Prototype implementations are part of the TC39 proposal process.

But there's another reason: as I hinted at the beginning, the more I learned about MF2, the more I began to see it as a programming language, at least a domain-specific language. In the next post on this blog, I'll talk about the implementation work that Igalia has done on MF2 and reflect on that work from a programming language design perspective. Stay tuned!

May 06, 2024 12:00 AM

May 04, 2024

Brian Kardell

Known Knowns

Known Knowns

Fun with the DOM, the parser, illogical trees and "unknowns"...

HTML has this very tricky parser that does 'corrections' and things on the fly. So if you create a page like:

<table>
   Hello
   <td>
      Look below...
   </td>
</table>

And load it in your browser, what you'll actually get parsed as a tree will be

  • HTML
    • HEAD
    • BODY
      • #text: Hello
      • TABLE
        • TBODY
          • TR
            • TD
              • #text: Look below...
            • #text:

Things can literally be moved around, implied elements added and so on.

Illogical trees

But it's not impossible to create trees that are impossible to create with the parser itself, if you do it dynamically. With the DOM API, you can create whatever wild constructs you want: Paragraphs inside of paragraphs? Sure, why not.

Or, text that is a direct child of a table. Note this still renders the text in every browser engine.

You can even add children to 'void' elements that way too. Here's an interesting one: An <hr> with a text node child. Again, It renders the text in every browser engine (the color varies).

You can also put unknown elements into your markup and the text content is shown... By default, it is basically a <span>.

In most cases, HTML wants to show something... or at least leave CSS in control. For example, you can dynamically add children to a <script>. While that won't be shown by default, it's simply because the UA style sheet has script set to display: none;. If we change that, we can totally see it.

But this isn't universal: In some cases there are other renderers that are in control - mainly when it comes to form controls. But also, for example, if you switch the <hr> in the example above to a <br> it won't render the text. It doesn't generate a box that you can do anything with with CSS as far as I can tell, except make it display: none (useful if they’re in white-space: pre blocks to keep them from forcing open extra lines).

SVG

The HTML parser has fix ups for embedding SVG too, there are kind of integration points. But, in SVG you can have an unknown element too... For example:

<svg>
    <unknown>Test<unknown>
    <rect width="150" height="150" x="10" y="10" style="fill:blue;stroke:pink;stroke-width:5;opacity:0.5" />
</svg>

The unknown element won't render the text, nor additional SVG children inside it. For example:

<svg id=one width="300" height="170">
  <unknown><ellipse cx="120" cy="80" rx="100" ry="50" style="fill:yellow;stroke:green;stroke-width:3" /></unknown>
  <rect width="150" height="150" x="10" y="10" style="fill:blue;stroke:pink;stroke-width:5;opacity:0.5" />
</svg>

MathML

As you might expect, there are parser integrations for MathML too, and you can have an unknown elements in MathML too. In MathML, all elements (including unknown ones) generate a "math content box", but only token elements (like <mi>, <mo>, <mn>) render text. For example, the <math> element itself - if you try to put text in it, the text won't render, but it will still generate a box and other content.

<math>
   Not a token. Doesn't render.
   <mi>X</mi>
</math>

MathML has other elements too like <mrow> and <mspace> and <mphantom> which are just containers. Same story there, if you try to put text in them, the text won't render...

<math>
   <mrow>Not a token. Doesn't render.</mrow>
   <mi>X</mi>
</math>

But if you put the text inside a token element (like <mi>) inside that same <mrow>, then the text will render..

<math>
   <mrow><mi>ok</mi></mrow>
   <mi>X</mi>
</math>

In MathML, unknown elements are basically treated just like <mrow>. In the above examples, you could replace <mrow> with <unknown> and it'd be the same.

Unknown Unknowns

Ok, here's something you don't think about every day: Given this markup:

<unknown id=one>One</unknown>
<math>
  <unknown id=two>Two</unknown>
</math>
<svg>
  <unknown id=three>Three</unknown>
</svg>

One, Two and Three are three different kinds of unknowns!

console.log(one.namespaceURI, one.constructor.name)
// logs 'http://www.w3.org/1999/xhtml HTMLUnknownElement'

console.log(two.namespaceURI, two.constructor.name)
// logs 'http://www.w3.org/1998/Math/MathML MathMLElement'

console.log(three.namespaceURI, three.constructor.name)
// logs 'http://www.w3.org/2000/svg SVGElement`

In CSS, these can also (theoretically) be styled via namespaces. The following will only style the first of those:

@namespace html url(http://www.w3.org/1999/xhtml);
html|unknown { color: blue; }

Under-defined Unknowns

Remember how in the beginning we created nonsensical constructs dynamically? Well, we can do that here too. We can move an unknown MathMLElement right into HTML, or an unknown HTMLElement right into MathML, and so on - and it's not currently well-defined, universal, or consistent what actually happens here.

Here's an interesting example moves an unknown MathMLElement and a SVGElement into HTML, and an HTMLElement into MathML and so on

See the Pen Untitled by вкαя∂εℓℓ (@briankardell) on CodePen.

Here's how that renders in the various today:

Left to right: Chrome, Firefox, Safari all render differently

So, I guess we should probably fix that. I'll have to start creating some issues and tentative tests (feel free to beat me to it 😉)

Semi-related Rabbit Holes

Not specifically these issues, but related namespace stuff has caused a lot of problems that we're remedying as shown by a recent flurry of activity started by my colleague Luke Warlow to fix MathML-Core support in various libraries/frameworks:

Who's next? :)

May 04, 2024 04:00 AM

April 30, 2024

Enrique Ocaña

Dissecting GstSegments

During all these years using GStreamer, I’ve been having to deal with GstSegments in many situations. I’ve always have had an intuitive understanding of the meaning of each field, but never had the time to properly write a good reference explanation for myself, ready to be checked at those times when the task at hand stops being so intuitive and nuisances start being important. I used the notes I took during an interesting conversation with Alba and Alicia about those nuisances, during the GStreamer Hackfest in A Coruña, as the seed that evolved into this post.

But what are actually GstSegments? They are the structures that track the values needed to synchronize the playback of a region of interest in a media file.

GstSegments are used to coordinate the translation between Presentation Timestamps (PTS), supplied by the media, and Runtime.

PTS is the timestamp that specifies, in buffer time, when the frame must be displayed on screen. This buffer time concept (called buffer running-time in the docs) refers to the ideal time flow where rate isn’t being had into account.

Decode Timestamp (DTS) is the timestamp that specifies, in buffer time, when the frame must be supplied to the decoder. On decoders supporting P-frames (forward-predicted) and B-frames (bi-directionally predicted), the PTS of the frames reaching the decoder may not be monotonic, but the PTS of the frames reaching the sinks are (the decoder outputs monotonic PTSs).

Runtime (called clock running time in the docs) is the amount of physical time that the pipeline has been playing back. More specifically, the Runtime of a specific frame indicates the physical time that has passed or must pass until that frame is displayed on screen. It starts from zero.

Base time is the point when the Runtime starts with respect to the input timestamp in buffer time (PTS or DTS). It’s the Runtime of the PTS=0.

Start, stop, duration: Those fields are buffer timestamps that specify when the piece of media that is going to be played starts, stops and how long that portion of the media is (the absolute difference between start and stop, and I mean absolute because a segment being played backwards may have a higher start buffer timestamp than what its stop buffer timestamp is).

Position is like the Runtime, but in buffer time. This means that in a video being played back at 2x, Runtime would flow at 1x (it’s physical time after all, and reality goes at 1x pace) and Position would flow at 2x (the video moves twice as fast than physical time).

The Stream Time is the position in the stream. Not exactly the same concept as buffer time. When handling multiple streams, some of them can be offset with respect to each other, not starting to be played from the begining, or even can have loops (eg: repeating the same sound clip from PTS=100 until PTS=200 intefinitely). In this case of repeating, the Stream time would flow from PTS=100 to PTS=200 and then go back again to the start position of the sound clip (PTS=100). There’s a nice graphic in the docs illustrating this, so I won’t repeat it here.

Time is the base of Stream Time. It’s the Stream time of the PTS of the first frame being played. In our previous example of the repeating sound clip, it would be 100.

There are also concepts such as Rate and Applied Rate, but we didn’t get into them during the discussion that motivated this post.

So, for translating between Buffer Time (PTS, DTS) and Runtime, we would apply this formula:

Runtime = BufferTime * ( Rate * AppliedRate ) + BaseTime

And for translating between Buffer Time (PTS, DTS) and Stream Time, we would apply this other formula:

StreamTime = BufferTime * AppliedRate + Time

And that’s it. I hope these notes in the shape of a post serve me as reference in the future. Again, thanks to Alicia, and especially to Alba, for the valuable clarifications during the discussion we had that day in the Igalia office. This post wouldn’t have been possible without them.

by eocanha at April 30, 2024 06:00 AM

April 26, 2024

Gyuyoung Kim

Web Platform Test and Chromium

I’ve been working on tasks related to web platform tests since last year. In this blog post, I’ll introduce what web platform tests are and how the Chromium project incorporates these tests into its development process.

1. Introduction

The web-platform-tests project serves as a cross-browser test suite for the Web platform stack. By crafting tests that can run seamlessly across all browsers, browser projects gain assurance that their software aligns with other implementations. This confidence extends to future implementations, ensuring compatibility. Consequently, web authors and developers can trust the Web platform to fulfill its promise of seamless functionality across browsers and devices, eliminating the need for additional layers of abstraction to address gaps introduced by specification editors and implementors.

For your information, the Web Platform Test Community operates a dashboard that tracks the pass ratio of tests across major web browsers. The chart below displays the number of failing tests in major browsers over time, with Chrome showing the lowest failure rate.

[Caption] Test failures graph on major browsers
And, the below chart shows the interoperability of the web platform technology for 2023 among major browsers. The interoperability has been improving.

[Caption] Interoperability among the major browsers 2023

2. Test Suite Design

The majority of the test suite is made up of HTML pages that are designed to be loaded in a browser. These pages may either generate results programmatically or provide a set of steps for executing the test and obtaining the outcome. Overall, the tests are concise, cross-platform, and self-contained, making them easy to run in any browser.

2.1 Test Layout

Most primary directories within the repository are dedicated to tests associated with specific web standards. For W3C specifications, these directories typically use the short name of the spec, which is the name used for snapshot publications under the /TR/ path. For WHATWG specifications, the directories are usually named after the spec’s subdomain, omitting “.spec.whatwg.org” from the URL. Other specifications follow a logical naming convention.

The css/ directory contains test suites specifically designed for the CSS Working Group specifications.

Within each specification-specific directory, tests are organized in one of two common ways: a flat structure, sometimes used for shorter specifications, or a nested structure, where each subdirectory corresponds to the ID of a heading within the specification. The nested structure, which provides implicit metadata about the tested section of the specification based on its location in the filesystem, is preferred for larger specifications.

For example, tests related to “The History interface” in HTML can be found in html/browsers/history/the-history-interface/.

Many directories also include a file named META.yml, which may define properties such as:
  • spec: a link to the specification covered by the tests in the directory
  • suggested_reviewers: a list of GitHub usernames for individuals who are notified when pull requests modify files in the directory
Various resources that tests rely on are stored in common directories, including images, fonts, media, and general resources.

2.2 Test Types

Tests in this project employ various approaches to validate expected behavior, with classifications based on how expectations are expressed:

  1. Rendering Tests:
    • Reftests: Compare the graphical rendering of two (or more) web pages, asserting equality in their display (e.g., A.html and B.html must render identically). This can be done manually by users switching between tabs/windows or through automated scripts.
    • Visual Tests: Evaluate a page’s appearance, with results determined either by human observation or by comparing it with a saved screenshot specific to the user agent and platform.
  2. JavaScript Interface Tests (testharness.js tests):
    • Ensure that JavaScript interfaces behave as expected. Named after the JavaScript harness used to execute them.
  3. WebDriver Browser Automation Protocol Tests (wdspec tests):
    • Written in Python, these tests validate the WebDriver browser automation protocol.
  4. Manual Tests rely on a human to run them and determine their result.

3. Web Platform Test in Chromium

The Chromium project conducts web tests utilized by Blink to assess various components, encompassing, but not limited to, layout and rendering. Generally, these web tests entail loading pages in a test renderer (content_shell) and comparing the rendered output or JavaScript output with an expected output file. The Web Tests include “Web platform tests”(WPT) located at web_tests/external/wpt. Other directories are for Chrome-specific tests only. In the web_tests/external/wpt, there are around 50,000 web platform tests currently.

3.1 Web Tests in the Chromium development process

Every patch must pass all tests before it can be merged into the main source tree. In Gerrit, trybots execute a wide array of tests, including the Web Test, to ensure this. We can initiate these trybots with a specific patch, and each trybot will then run the scheduled tests using that patch.

[Caption] Trybots in Gerrit
For example, the linux-rel trybot runs the web tests in the blink_web_tests step as below,

[Caption] blink_web_tests in the linux-rel trybot

3.2 Internal sequence for running the Web Test in Chromium

Let’s take a look at how Chromium internally runs the web tests simply. Basically, there are 2 major components to run the web tests in Chromium. One is run_web_tests.py acting as a server. The other one is content_shell acting as a client that loads a passed test and returns the result. The input and output of content_shell are assumed to follow the run_web_tests protocol through pipes that connect stdin and stdout of run_web_tests.py and content_shell as below,

[Caption] Sequence how to execute a test between run_web_test.py and content_shell

4. In conclusion

We just explored what Web Tests are and how Chromium runs these tests for web platform testing. Web platform tests play a crucial role in ensuring interoperability among various web browsers and platforms. In the following blog post, I will share how did I support and implement to run of web tests on iOS for the Blink port.

References

by gyuyoung at April 26, 2024 08:48 AM

April 24, 2024

Stephen Chenney

CSS Custom Properties in Highlight Pseudos

The CSS highlight inheritance model describes the process for inheriting the CSSproperties of the various highlight pseudo elements:

  • ::selection controlling the appearance of selected content
  • ::spelling-error controlling the appearance of misspelled word markers
  • ::grammar-error controlling how grammar errors are marked
  • ::target-text controlling the appearance of the string matching a target-text URL
  • ::highlight defining the appearance of a named highlight, accessed via the CSS highlight API

The inheritance model is intended to generate more intuitive behavior for examples like the one below, where we would expect the second line to be entirely blue because the <em> is a child of the blue paragraph that typically would inherit properties from the paragraph:

<style>
::selection /* = *::selection (universal) */ {
color: lightgreen;
}
.blue::selection {
color: blue;
}
</style>
<p>Some <em>lightgreen</em> text</p>
<p class="blue">Some <em>lightgreen</em> text that one would expect to be blue</p>
<script>
range = new Range();
range.setStart(document.body, 0);
range.setEnd(document.body, 3)
document.getSelection().addRange(range);
</script>

Before this model was standardized, web developers achieved the same effect using CSS custom properties. The selection highlight could make use of a custom property defined on the originating element (the element that is selected). Being defined on the originating element tree, those custom properties were inherited, so a universal highlight would effectively inherit the properties of its parent. Here’s what it looks like:

<style>
:root {
--selection-color: lightgreen;
}
::selection /* = *::selection (universal) */ {
color: var(--selection-color);
}
.blue {
--selection-color: blue;
}
</style>
<p>Some <em>lightgreen</em> text</p>
<p class="blue">Some <em>lightgreen</em> text that is also blue</p>
<script>
range = new Range();
range.setStart(document.body, 0);
range.setEnd(document.body, 3)
document.getSelection().addRange(range);
</script>

In the example, all selection highlights use the value of --selection-color as the selection text color. The <em> element inherits the property value of blue from its parent <p> and hence has a blue highlight.

This approach to selection inheritance was promoted in Stack Overflow posts and other places, even in posts that did not discuss inheritance. The problem is that the new highlight inheritance model, as previously specified, broke this behavior in a way that required significant changes for web sites making use of the former approach. At the prompting of developer advocates, the CSS Working Group decided to change the specified behavior of custom properties with highlight pseudos to support the existing recommended approach.

Custom Properties for Highlights #

The updated behavior for custom properties in highlight pseudos is that highlight properties that have var(...) references take the property values from the originating element. In addition, custom properties defined in the highlight pseudo itself will be ignored. The example above works with this new approach, so content making use of it will not break when highlight inheritance is enabled in browsers.

Note that this explicitly conflicts with the previous guidance, which was to define the custom properties on the highlight pseudos, with only the root highlight inheriting custom properties from the document root element.

The change is implemented behind a flag in Chrome 126 and higher, as is expected to be enabled by default as early as Chrome 130. To see the effects right now, enable “Experimental Web Platform features” via chrome://flags in M126 Chrome (Canary at the time of writing).

A Unified Approach for Derived Properties in Highlights #

There are other situations in which properties are disallowed on highlight pseudos yet those properties are necessary to resolve variables. For example, lengths that use font based units, such as 0.5em cannot use font-size from a highlight pseudo because font-size is not allowed in highlights. Similarly container-based units or viewport units. The general rule is that the highlight will get whatever value it needs from the originating element, and that now includes custom property values.

April 24, 2024 12:00 AM

April 22, 2024

Brian Kardell

Mirror Effects

Mirror Effects

Today I had this thought about "AI" that felt more like a short blog post than a social media thread, so I thought I'd share it.

I love learning and thinking about weird indirect connections like how thing X made things Y and Z suddenly possible, and given some time these had secondary or tertiary impacts that were really unexpected.

There are so many possible examples: We didn't really expect the web and social media to give rise to the Arab Spring. Nor to such disinformation. Nor the 2016 American election (or several others similarly around the world). Maybe Carrier didn't expect that the invention of air conditioning would help reshape the American political map, but it did.

One of the things that came to my mind today was a thing I read about mirrors. Ian Mortimer explains how improvements to and the spread of mirrors radically changed lots of things. Here's a quote from the piece:

The very act of a person seeing himself in a mirror or being represented in a portrait as the center of attention encouraged him to think of himself in a different way. He began to see himself as unique. Previously the parameters of individual identity had been limited to an individual’s interaction with the people around him and the religious insights he had over the course of his life. Thus individuality as we understand it today did not exist; people only understood their identity in relation to groups—their household, their manor, their town or parish—and in relation to God... The Mirror Effect

It's pretty interesting to look back and observe all of the changes that this began to stir - from the way people literally lived (more privacy), to changes in the types of writing and art, and how we thought about fitting in to the larger world (and thus the societies we built).

So, today I had this random thought that I wonder what sorts of effects like these will come from all of the "AI" focus. That is, not the common/direct sorts of things we're talking about but the ones that maybe have very little to do with any actual technology even. Which things will some future us look back on in 20 or 100 years and say "huh, interesting".

In particular, the thing that brought this to mind is that I am suddenly seeing lots of more people suddenly having conversations like "What is consciousness though, really?" and "What is intelligence, really?". I don't mean from technologists, I just mean it seems to be causing lots more people to suddenly think about that kind of thing.

So it made me think: I wonder if we will see increased signups for philosophy courses? Or, sales of more books along these lines? Could that ultimately lead to another, similar sort of changing in how we collectively see ourselves? I wonder what effects this has in the long term on literature, film, or even science and government.

This isn't a "take" - it's not trying to be optimistic or pessimistic. It's more of a "huh... I hadn't thought much about that before, but it's kind of interesting to think about." Don't you think? Outside of the normal sorts of things we're obviously thinking about - what are some you could imagine?

As a kind of final interesting note: Stephen Johnson is a great storyteller, and his work is full of connections like these. If you want to read more, check out his books. Part of the reason that I mention him is that interestingly, it seems that he has recently gone to work with Google on a LM project about writing himself. I bet he's got some notes and ideas on this already too.

April 22, 2024 04:00 AM

April 16, 2024

Philippe Normand

From WebKit/GStreamer to rust-av, a journey on our stack’s layers

In this post I’ll try to document the journey starting from a WebKit issue and ending up improving third-party projects that WebKitGTK and WPEWebKit depend on.

I’ve been working on WebKit’s GStreamer backends for a while. Usually some new feature needed on WebKit side would trigger work on GStreamer. That’s quite common and healthy actually, by improving GStreamer (bug fixes or implementing new features) we make the whole stack stronger (hopefully). It’s not hard to imagine other web-engines, such as Servo for instance, leveraging fixes made in GStreamer in the context of WebKit use-cases.

Sometimes though we have to go deeper and this is what this post is about!

Since version 2.44, WebKitGTK and WPEWebKit ship with a WebCodecs backend. That backend leverages the wide range of GStreamer audio and video decoders/encoders to give low-level access to encoded (or decoded) audio/video frames to Web developers. I delivered a lightning talk at gst-conf 2023 about this topic.

There are still some issues to fix regarding performance and some W3C web platform tests are still failing. The AV1 decoding tests were flagged early on while I was working on WebCodecs, I didn’t have time back then to investigate the failures further, but a couple weeks ago I went back to those specific issues.

The WebKit layout tests harness is executed by various post-commit bots, on various platforms. The WebKitGTK and WPEWebKit bots run on Linux. The WebCodec tests for AV1 currently make use of the GStreamer av1enc and dav1ddec elements. We currently don’t run the tests using the modern and hardware-accelerated vaav1enc and vaav1dec elements because the bots don’t have compatible GPUs.

The decoding tests were failing, this one for instance (the ?av1 variant). In that test both encoding and decoding are tested, but decoding was failing, for a couple reasons. Rabbit hole starts here. After debugging this for a while, it was clear that the colorspace information was lost between the encoded chunks and the decoded frames. The decoded video frames didn’t have the expected colorimetry values.

The VideoDecoderGStreamer class basically takes encoded chunks and notifies decoded VideoFrameGStreamer objects to the upper layers (JS) in WebCore. A video frame is basically a GstSample (Buffer and Caps) and we have code in place to interpret the colorimetry parameters exposed in the sample caps and translate those to the various WebCore equivalents. So far so good, but the caps set on the dav1ddec elements didn’t have those informations! I thought the dav1ddec element could be fixed, “shouldn’t be that hard” and I knew that code because I wrote it in 2018 :)

So let’s fix the GStreamer dav1ddec element. It’s a video decoder written in Rust, relying on the dav1d-rs bindings of the popular C libdav1d library. The dav1ddec element basically feeds encoded chunks of data to dav1d using the dav1d-rs bindings. In return, the bindings provide the decoded frames using a Dav1dPicture Rust structure and the dav1ddec GStreamer element basically makes buffers and caps out of this decoded picture. The dav1d-rs bindings are quite minimal, we implemented API on a per-need basis so far, so it wasn’t very surprising that… colorimetry information for decoded pictures was not exposed! Rabbit hole goes one level deeper.

So let’s add colorimetry API in dav1d-rs. When working on (Rust) bindings of a C library, if you need to expose additional API the answer is quite often in the C headers of the library. Every Dav1dPicture has a Dav1dSequenceHeader, in which we can see a few interesting fields:

typedef struct Dav1dSequenceHeader {
...
    enum Dav1dColorPrimaries pri; ///< color primaries (av1)
    enum Dav1dTransferCharacteristics trc; ///< transfer characteristics (av1)
    enum Dav1dMatrixCoefficients mtrx; ///< matrix coefficients (av1)
    enum Dav1dChromaSamplePosition chr; ///< chroma sample position (av1)
    ...
    uint8_t color_range;
    ...
...
} Dav1dSequenceHeader;

After sharing a naive branch with rust-av co-maintainers Luca Barbato and Sebastian Dröge, I came up with a couple pull-requests that eventually were shipped in version 0.10.3 of dav1d-rs. I won’t deny matching primaries, transfer, matrix and chroma-site enum values to rust-avs Pixel enum was a bit challenging :P Anyway, with dav1d-rs fixed up, rabbit hole level goes up one level :)

Now with the needed dav1d-rs API, the GStreamer dav1ddec element could be fixed. Again, matching the various enum values to their GStreamer equivalent was an interesting exercise. The merge request was merged, but to this date it’s not shipped in a stable gst-plugins-rs release yet. There’s one more complication here, ABI broke between dav1d 1.2 and 1.4 versions. The dav1d-rs 0.10.3 release expects the latter. I’m not sure how we will cope with that in terms of gst-plugins-rs release versioning…

Anyway, WebKit’s runtime environment can be adapted to ship dav1d 1.4 and development version of the dav1ddec element, which is what was done in this pull request. The rabbit is getting out of his hole.

The WebCodec AV1 tests were finally fixed in WebKit, by this pull request. Beyond colorimetry handling a few more fixes were needed, but luckily those didn’t require any fixes outside of WebKit.

Wrapping up, if you’re still reading this post, I thank you for your patience. Working on inter-connected projects can look a bit daunting at times, but eventually the whole ecosystem benefits from cross-project collaborations like this one. Thanks to Luca and Sebastian for the help and reviews in dav1d-rs and the dav1ddec element. Thanks to my fellow Igalia colleagues for the WebKit reviews.

by Philippe Normand at April 16, 2024 08:14 PM

April 04, 2024

Jesse Alama

The decimals around us: Cataloging support for decimal numbers

A cat­a­log of sup­port for dec­i­mal num­bers in var­i­ous pro­gram­ming lan­guages

Dec­i­mals num­bers are a data type that aims to ex­act­ly rep­re­sent dec­i­mal num­bers. Some pro­gram­mers may not know, or ful­ly re­al­ize, that, in most pro­gram­ming lan­guages, the num­bers that you en­ter look like dec­i­mal num­bers but in­ter­nal­ly are rep­re­sent­ed as bi­na­ry—that is, base-2—float­ing-point num­bers. Things that are to­tal­ly sim­ple for us, such as 0.1, sim­ply can­not be rep­re­sent­ed ex­act­ly in bi­na­ry. The dec­i­mal data type—what­ev­er its stripe or fla­vor—aims to rem­e­dy this by giv­ing us a way of rep­re­sent­ing and work­ing with dec­i­mal num­bers, not bi­na­ry ap­prox­i­ma­tions there­of. (Wikipedia has more.)

To help with my work on adding dec­i­mals to JavaScript, I've gone through a list of pop­u­lar pro­gram­ming lan­guages, tak­en from the 2022 Stack­Over­flow de­vel­op­er sur­vey. What fol­lows is a brief sum­ma­ry of where these lan­guages stand re­gard­ing dec­i­mals. The in­ten­tion is to keep things sim­ple. The pur­pose is:

  1. If a lan­guage does have dec­i­mals, say so;
  2. If a lan­guage does not have dec­i­mals, but at least one third-par­ty li­brary ex­ists, men­tion it and link to it. If a dis­cus­sion is un­der­way to add dec­i­mals to the lan­guage, link to that dis­cus­sion.

There is no in­ten­tion to fil­ter out an lan­guage in par­tic­u­lar; I'm just work­ing with a slice of lan­guages found in in the Stack­Over­flow list linked to ear­li­er. If a lan­guage does not have dec­i­mals, there may well be mul­ti­ple third-part dec­i­mal li­braries. I'm not aware of all li­braries, so if I have linked to a mi­nor li­brary and ne­glect to link to a more high-pro­file one, please let me know. More im­por­tant­ly, if I have mis­rep­re­sent­ed the ba­sic fact of whether dec­i­mals ex­ists at all in a lan­guage, send mail.

C

C does not have dec­i­mals. But they're work­ing on it! The C23 stan­dard (as in, 2023) stan­dard pro­pos­es to add new fixed bit-width data types (32, 64, and 128) for these num­bers.

C#

C# has dec­i­mals in its un­der­ly­ing .NET sub­sys­tem. (For the same rea­son, dec­i­mals also ex­ist in Vi­su­al Ba­sic.)

C++

C++ does not have dec­i­mals. But—like C—they're work­ing on it!

Dart

Dart does not have dec­i­mals. But a third-par­ty li­brary ex­ists.

Go

Go does not have dec­i­mals, but a third-par­ty li­brary ex­ists.

Java

Java has dec­i­mals.

JavaScript

JavaScript does not have dec­i­mals. We're work­ing on it!

Kotlin

Kotlin does not have dec­i­mals. But, in a way, it does: since Kotlin is run­ning on the JVM, one can get dec­i­mals by us­ing Java's built-in sup­port.

PHP

PHP does not have dec­i­mals. An ex­ten­sion ex­ists and at least one third-par­ty li­brary ex­ists.

Python

Python has dec­i­mals.

Ruby

Ruby has dec­i­mals. De­spite that, there is some third-par­ty work to im­prove the built-in sup­port.

Rust

Rust does not have dec­i­mals, but a crate ex­ists.

SQL

SQL has dec­i­mals (it is the DECIMAL data type). (Here is the doc­u­men­ta­tion for, e.g., Post­greSQL, and here is the doc­u­men­ta­tion for MySQL.)

Swift

Swift has dec­i­mals

Type­Script

Type­Script does not have dec­i­mals. How­ev­er, if dec­i­mals get added to JavaScript (see above), Type­Script will prob­a­bly in­her­it dec­i­mals, even­tu­al­ly.

April 04, 2024 12:52 PM

Getting started with Lean 4, your next programming language

I had the plea­sure of learn­ing about Lean 4 with David Chris­tiansen and Joachim Bre­it­ner at their tu­to­r­i­al at BOBKonf 2024. I‘m plan­ning on do­ing a cou­ple of for­mal­iza­tions with Lean and would love to share what I learn as a to­tal new­bie, work­ing on ma­cOS.

Need­ed tools

I‘m on ma­cOS and use Home­brew ex­ten­sive­ly. My sim­ple go-to ap­proach to find­ing new soft­ware is to do brew search lean. This re­vealed lean as well as sur­face elan. Run­ning brew info lean showed me that that pack­age (at the time I write this) in­stalls Lean 3. But I know, out-of-band, that Lean 4 is what I want to work with. Run­ning brew info elan looked bet­ter, but the out­put re­minds me that (1) the in­for­ma­tion is for the elan-init pack­age, not the elan cask, and (2) elan-init con­flicts with both the elan and the afore­men­tioned lean. Yikes! This strikes me as a po­ten­tial prob­lem for the com­mu­ni­ty, be­cause I think Lean 3, though it still works, is pre­sum­ably not where new Lean de­vel­op­ment should be tak­ing place. Per­haps the Home­brew for­mu­la for Lean should be up­dat­ed called lean3, and a new lean4 pack­age should be made avail­able. I‘m not sure. The sit­u­a­tion seems less than ide­al, but in short, I have been suc­cess­ful with the elan-init pack­age.

Af­ter in­stalling elan-init, you‘ll have the elan tool avail­able in your shell. elan is the tool used for main­tain­ing dif­fer­ent ver­sions of Lean, sim­i­lar to nvm in the Node.js world or pyenv.

Set­ting up a blank pack­age

When I did the Lean 4 tu­to­r­i­al at BOB, I worked en­tire­ly with­in VS Code (…) and cre­at­ed a new stand­alone pack­age us­ing some in-ed­i­tor func­tion­al­i­ty. At the com­mand line, I use lake init to man­u­al­ly cre­ate a new Lean pack­age. At first, I made the mis­take of run­ning this com­mand, as­sum­ing it would cre­ate a new di­rec­to­ry for me and set up any con­fig­u­ra­tion and boil­er­plate code there. I was sur­prised to find, in­stead, that lake init sets things up in the cur­rent di­rec­to­ry, in ad­di­tion to cre­at­ing a sub­di­rec­to­ry and pop­u­lat­ing it. Us­ing lake --help, I read about the lake new com­mand, which does what I had in mind. So I might sug­gest us­ing lake new rather than lake init.

What‘s in the new di­rec­to­ry? Do­ing tree foobar re­veals

foobar ├── Foobar │   └── Basic.lean ├── Foobar.lean ├── Main.lean ├── lakefile.lean └── lean-toolchain

Tak­ing a look there, I see four .lean files. Here‘s what they con­tain:

Main.lean

import «Foobar» def main : IO Unit := IO.println s!"Hello, {hello}!"

Foobar.lean

-- This module serves as the root of the `Foobar` library. -- Import modules here that should be built as part of the library. import «Foobar».Basic

Foobar/Basic.lean

def hello := "world"

lakefile.lean

import Lake open Lake DSL package «foobar» where -- add package configuration options here lean_lib «Foobar» where -- add library configuration options here @[default_target] lean_exe «foobar» where root := `Main

It looks like there‘s a lit­tle mod­ule struc­ture here, and a ref­er­ence to the iden­ti­fi­er hello, de­fined in Foobar/Basic.lean and made avail­able via Foobar.lean. I’m not go­ing to touch lakefile.lean for now; as a new­bie, it looks scary enough that I think I’ll just stick to things like Basic.lean.

There‘s also an au­to­mat­i­cal­ly cre­at­ed .git there, not shown in the di­rec­to­ry out­put above.

Now what?

Now that you‘ve got Lean 4 in­stalled and set up a pack­age, you‘re ready to dive in to one of the of­fi­cial tu­to­ri­als. The one I‘m work­ing through is David‘s Func­tion­al Pro­gram­ming in Lean. There‘s all sorts of ad­di­tion­al things to learn, such as all the dif­fer­ent lake com­mands. En­joy!

April 04, 2024 01:48 AM

April 03, 2024

Brian Kardell

The Blessing of the Strings

The Blessing of the Strings

Trusted Types have been a proposal by Google for quite some time at this point, but it's currently getting a lot of attention and work in all browsers (Igalia is working on implementations in WebKit and Gecko, sponsored by Salesforce and Google, respectively). I've been looking at it a lot and thought it's probably something worth writing about.

The Trusted Types proposal is about preventing Cross-site scripting (XSS), and rides atop Content Security Policy (CSP) and allows website maintainers to say "require trusted-types". Once required, lots of the Web Platform's dangerous API surfaces ("sinks") which currently require a string will now require... well, a different type.

myElement.innerHTML (and a whole lot of other APIs) for example, would now require a TrustedHTML object instead of just a string.

You can think of TrustedHTML as an interface indicating that a string has been somehow specially "blessed" as safe... Sanitized.

the Holy Hand grenade scene from Monty Python's Holy Grail
And Saint Attila raised the string up on high, saying, 'O Lord, bless this thy string, that with it we may trust that it is free of XSS...' [ref].

Granting Blessings

The interesting thing about this is how one goes about blessing strings, and how this changes the dynamics of development and safety to protect from XSS.

To start with, there is a new global trustedTypes object (available in both window and workers) with a method called .createPolicy which can be used to create "policies" for blessing various kinds of input (createHTML, createScript, and createScriptURL). Trusted Types comes with the concept of a default policy, and the ability for you to register a specially named "default"...

//returns a policy, but you 
// don't really need to do anything 
// with the default one
trustedTypes.createPolicy(
    "default", 
    {
      createHTML: s => { 
          return DOMPurify.sanitize(s) 
      } 
    }
);

And now, the practical upshot is that all attempts to set HTML will be sanitized... So if there's some code that tries to do:

// if str contains
// `&lt;img src="no" onerror="<em>dangerous code</em>" &gt;`;
target.innerHTML =  str;

Then the onerror attribute will be automatically stripped (sanitized) before .innerHTML gets it.

Hey that's pretty cool!

one of the scenes where the castle guard is mocking arthur and his men
It's almost like you just put defenses around all that stuff and can just peer over the wall at would be attackers and make faces at them....

But wait... can't someone come along then and just create a more lenient policy called default?

No! That will throw an exception!

Also, you don't have to create a default. If you don't, and someone tries to use one of those methods to assign a string, it will throw.

The only thing this enforcement cares about is that it is one of these "blessed" types. Website administrators can also provide (in the header) the name of 1 or more policies which should be created.

Any attempts to define a policy not in that list will throw (it's a bit more complicated than that, see Name your Policy below). Let's imagine that in the header we specified that a policy named "sanitize" is allowed to be created.

Maybe you can see some of why that starts to get really interesting. In order to use any of those APIs (at all), you'd need access to a policy in order to bless the string. But because the policy which can do that blessing is a handle, it's up to you what code you give it to...

{
  const sanitizerPolicy = 
      trustedTypes.createPolicy(
        "sanitize",
        {
          createHTML: s => { 
            return DOMPurify.sanitize(s) 
        } 
  );


    // give someOtherModule access to a sanitization policy
    someOtherModule.init(sanitizerPolicy)

    // yetAnotherModule can't even sanitize, any use of those
    // APIs will throw
    yetAnotherModule.foo()
}

// Anything out here also doesn't have 
// access to a sanitization policy

What's interesting about this is that the thing doing the trusting on the client, is actually on the client as well - but the pattern ensures that this becomes a considerably more finite problem. It is much easier to audit whether the "trust" is warranted. That is, we can look at the above to see that there is only one policy and it only supports creating HTML. We can see that the trust there is placed in DOMPurify, and even that amount of trust is only provided to select modules.

Finally, most importantly: It is a pattern that is machine enforceable. Anything that tries to use any of those APIs without a blessed string (a Trusted Type) will fail... Unless you ask it not to.

Don't Throw, Just Help?

Shutting down all of those APIs after the fact is hard because all of those dangerous APIs are also really useful and therefore widely used. As I said earlier, auditing to find and understand all uses of them all is pretty difficult. Chances are pretty good that there might just be a lot more unsafe stuff floating around in your site than you expected.

Instead of Content-Security-Policy CSP headers, you can send Content-Security-Policy-Report-Only and include a directive that includes report-to /csp-violation-report-endpoint/ where /csp-violation-report-endpoint/ is an endpoint path (on the same origin). If set, whenever violations occur, browsers should send a request to report a violation to that endpoint (JSON formatted with lots of data).

The general idea is that it is then pretty easy to turn this on and monitor your site to discover where you might have some problems, and begin to work through them. This should be especially good for your QA environment. Just keep in mind that the report doesn't actually prevent the potentially bad things from happening, it just lets you know they exist.

Shouldn't there just be a standard santizer too?

Yes!! That is also a thing that is being worked on.

Name Your Policy

I'm not going to lie, I found CSP/headers to be both a little confusing to read and to figure out their relationships. You might see a header set up to report only....

Content-Security-Policy-Report-Only: report-uri /csp-violation-report-endpoint; default-src 'self'; require-trusted-types-for 'script'; trusted-types one two;

Believe it or not that's a fairly simple one. Basically though, you split it up on semi-colons and each of those is a directive. The directive has a name like "report-uri" followed by whitespace and then a list of values (potentially containing only 1) which are whitespace separated. There are also keyword values which are quoted.

So, the last two parts of this are about Trusted Types. The first, require-trusted-types-for is about what gets some kind of enforcement and really the only thing you can put there currently is the keyword 'script'. The second, trusted-types is about what policies can be created.

Note that I said "some kind of enforcement" because the above is "report only" which means those things will report, but not actually throw, while if we just change the name of the header from Content-Security-Policy-Report-Only to Content-Security-Policy lots of things might start throwing - which didn't greatly help my exploration. So, here's a little table that might help..

If the directives are... then...
(missing) You can create whatever policies you want (except duplicates), but they aren't enforced in any way.
require-trusted-types-for 'script'; You can create whatever policies you want (except duplicates), and they are enforced. All attempts to assign strings to those sinks will throw. This means if you create a policy named default, it will 'bless' strings through that automatically, but it also means anyone can create any policy to 'bless' strings too.
trusted-types You cannot create any policies whatsoever. Attempts to will throw.
trusted-types 'none' Same as with no value.
trusted-types a b You can call createPolicy with names 'a' and 'b' exactly once. Attempts to call with other names (including 'default'), or repeatedly will throw.
trusted-types default You can call createPolicy with names 'default' exactly once. Attempts to call with other names, or repeatedly will throw.
require-trusted-types-for 'script'; trusted-types a You can call createPolicy with names 'a' exactly once. Attempts to call with other names (including default), or repeatedly will throw. All attempts to assign strings to those sinks will throw unless they are 'blessed' from a function in a policy named 'a'

April 03, 2024 04:00 AM

April 02, 2024

Maíra Canal

Linux 6.8: AMD HDR and Raspberry Pi 5

The Linux kernel 6.8 came out on March 10th, 2024, bringing brand-new features and plenty of performance improvements on different subsystems. As part of Igalia, I’m happy to be an active part of many features that are released in this version, and today I’m going to review some of them.

Linux 6.8 is packed with a lot of great features, performance optimizations, and new hardware support. In this release, we can check the Intel Xe DRM driver experimentally, further support for AMD Zen 5 and other upcoming AMD hardware, initial support for the Qualcomm Snapdragon 8 Gen 3 SoC, the Imagination PowerVR DRM kernel driver, support for the Nintendo NSO controllers, and much more.

Igalia is widely known for its contributions to Web Platforms, Chromium, and Mesa. But, we also make significant contributions to the Linux kernel. This release shows some of the great work that Igalia is putting into the kernel and strengthens our desire to keep working with this great community.

Let’s take a deep dive into Igalia’s major contributions to the 6.8 release:

AMD HDR & Color Management

You may have seen the release of a new Steam Deck last year, the Steam Deck OLED. What you may not know is that Igalia helped bring this product to life by putting some effort into the AMD driver-specific color management properties implementation. Melissa Wen, together with Joshua Ashton (Valve), and Harry Wentland (AMD), implemented several driver-specific properties to allow Gamescope to manage color features provided by the AMD hardware to fit HDR content and improve gamers’ experience.

She has explained all features implemented in the AMD display kernel driver in two blog posts and a 2023 XDC talk:

Async Flip

André Almeida worked together with Simon Ser (SourceHut) to provide support for asynchronous page-flips in the atomic API. This feature targets users who want to present a new frame immediately, even if after missing a V-blank. This feature is particularly useful for applications with high frame rates, such as gaming.

Raspberry Pi 5

Raspberry Pi 5 was officially released on October 2023 and Igalia was ready to bring top-notch graphics support for it. Although we still can’t use the RPi 5 with the mainline kernel, it is superb to see some pieces coming upstream. Iago Toral worked on implementing all the kernel support needed for the V3D 7.1.x driver.

With the kernel patches, by the time the RPi 5 was released, it already included a fully 3.1 OpenGL ES and Vulkan 1.2 compliant driver implemented by Igalia.

GPU stats and CPU jobs for the Raspberry Pi 4/5

Apart from the release of the Raspberry Pi 5, Igalia is still working on improving the whole Raspberry Pi environment. I worked, together with José Maria “Chema” Casanova, implementing the support for GPU stats on the V3D driver. This means that RPi 4/5 users now can access the usage percentage of the GPU and they can access the statistics by process or globally.

I also worked, together with Melissa, implementing CPU jobs for the V3D driver. As the Broadcom GPU isn’t capable of performing some operations, the Vulkan driver uses the CPU to compensate for it. In order to avoid stalls in the job submission, now CPU jobs are part of the kernel and can be easily synchronized though with synchronization objects.

If you are curious about the CPU job implementation, you can check this blog post.

Other Contributions & Fixes

Sometimes we don’t contribute to a major feature in the release, however we can help improving documentation and sending fixes. André also contributed to this release by documenting the different AMD GPU reset methods, making it easier to understand by future users.

During Igalia’s efforts to improve the general users’ experience on the Steam Deck, Guilherme G. Piccoli noticed a message in the kernel log and readily provided a fix for this PCI issue.

Outside of the Steam Deck world, we can check some of Igalia’s work on the Qualcomm Adreno GPUs. Although most of our Adreno-related work is located at the user-space, Danylo Piliaiev sent a couple of kernel fixes to the msm driver, fixing some hangs and some CTS tests.

We also had contributions from our 2023 Igalia CE student, Nia Espera. Nia’s project was related to mobile Linux and she managed to write a couple of patches to the kernel in order to add support for the OnePlus 9 and OnePlus 9 Pro devices.

If you are a student interested in open-source and would like to have a first exposure to the professional world, check if we have openings for the Igalia Coding Experience. I was a CE student myself and being mentored by a Igalian was a incredible experience.

Check the complete list of Igalia’s contributions for the 6.8 release

Authored (57):

André Almeida (2)

Danylo Piliaiev (2)

Guilherme G. Piccoli (1)

Iago Toral Quiroga (4)

Maíra Canal (17)

Melissa Wen (27)

Nia Espera (4)

Signed-off-by (88):

André Almeida (4)

Danylo Piliaiev (2)

Guilherme G. Piccoli (1)

Iago Toral Quiroga (4)

Jose Maria Casanova Crespo (2)

Maíra Canal (28)

Melissa Wen (43)

Nia Espera (4)

Acked-by (4):

Jose Maria Casanova Crespo (2)

Maíra Canal (1)

Melissa Wen (1)

Reviewed-by (30):

André Almeida (1)

Christian Gmeiner (1)

Iago Toral Quiroga (20)

Maíra Canal (4)

Melissa Wen (4)

Tested-by (1):

Guilherme G. Piccoli (1)

April 02, 2024 11:00 AM

March 20, 2024

Jani Hautakangas

Bringing WebKit back to Android.

It’s been quite a while since the last blog post about WPE-Android, but that doesn’t mean WPE-Android hasn’t been in development. The focus has been on stabilizing the runtime and implementing the most crucial core features to make it easier to integrate new APIs into WPEView.

Main building blocks #

WPE-Android has three main building blocks:

  • Cerbero
    • Cross-platform build aggregator that is used to build WPE WebKit and all of its dependencies to Android.
  • WPEBackend-Android
    • Implements Android specific graphics buffer support and buffer sharing between WebKit UIProcess and WebProcess.
  • WPEView
    • Allows displaying web content in activity layout using WPEWebKit.

WPE-Android high-level design

What’s new #

The list of all work completed so far would be quite long, as there have been no official releases or public announcements, with the exception of the last blog post. Since that update, most of the efforts have been focused on ‘under the hood’ improvements, including enhancing stability, adding support for some core WebKit features, and making general improvements to the development infrastructure.

Here is a list of new features worth mentioning:

  • Based on WPE WebKit 2.42.1, the project was branched for the 2.42.x series. Future work will continue in the main branch for the next release.
  • Dropped support for 32-bit platforms (x86 and armv7). Only arm64-v8a and x86_64 are supported
  • Integration to Android main loop so that WPE WebKit GLib main loop is driven by Android main loop
  • Process-Swap On Navigation aka PSON
  • Added ASharedMemory support to WebKit SharedMemory
  • Hardware-accelerated multimedia playback
  • Fullscreen support
  • Cookies management
  • ANGLE based WebGL
  • Cross process fence insertion for composition synchronization with Surface Flinger
  • WebDriver support
  • GitHub Actions build bots
  • GitHub Actions WebDriver test bots

Demos #

WPEWebkit powered web view (WPEView) #

Demo uses WPE-Android MiniBrowser sample application to show basic web page loading and touch-based scrolling, usage of a simple cookie manager to clear the page’s date usage, and finally loads the popular “Aquarium sample” to show a smooth (60FPS) WebGL animation running thanks to HW acceleration support.

WebDriver #

Demo shows how to run WebDriver test with with emulator. Detailed instructions how to run WebDriver test can be found in README.md

Test requests a website through the Selenium remote webdriver. It then replaces the default behavior of window.alert on the requested page by injecting and executing a JavaScript snippet. After loading the page, it performs a click() action on the element that calls the alert. This results in the text ‘cheese’ being displayed right below the ‘click me’ link.

Test contains three building blocks:

  • WPE WebDriver running on emulator
  • HTTP server serving test.html web page
  • Selenium python test script executing the test

What’s next #

We have ambitious plans for the coming months regarding the development of WPE Android, focusing mainly on implementing additional features, stabilization, and performance improvements.

As for implementing new features, now that the integration with the WPE WebKit runtime has reached a more robust state, it’s time to start adding more of the APIs that are still missing in WPEView compared to other webviews on Android and to enable other web-facing features supported by WebKit. This effort, along with adding support for features like HTTP/2 and the remote Web Inspector, will be a major focus.

As for stabilization and performance, having WebDriver support will be very helpful as it will enable us to identify and fix issues promptly and thus help make WPE Android more stable and feature-complete. We would also like to focus on conformance testing compared to other web views on Android, which should help us prioritize our efforts.

The broader goal for this year is to develop WPE-Android into an easy-to-use platform for third-party projects, offering a compelling alternative to other webviews on the Android platform. We extend many thanks to the NLNet Foundation, whose support for WPE Android through a grant from the NGI Zero Core fund will be instrumental in helping us achieve this goal!

Try it yourself #

WPE-Android is still considered a prototype, and it’s a bit difficult to build. However, if you want to try it out, you can follow the instructions in the README.md file. Additionally, you can use the project’s issue tracker to report problems or suggest new features.

March 20, 2024 12:00 AM

March 17, 2024

Qiuyi Zhang (Joyee)

Memory leak regression testing with V8/Node.js, part 3 - heap iteration-based testing

In the previous blog post, I described the heap snapshot trick as an “abuse” of the heap snapshot API, because heap snapshots are not designed to interact with the finalizers run in the heap

March 17, 2024 07:28 PM

Memory leak regression testing with V8/Node.js, part 1 - memory usage-based testing

Like many other relatively big piece of software, Node.js is no stranger to memory leaks, and with them, fixes and regression tests

March 17, 2024 07:24 PM

Memory leak regression testing with V8/Node.js, part 2 - finalizer-based testing

In the previous blog post, I talked about how Node.js used memory usage measurement to test against memory leaks. Sometimes that’s good enough to provide valid tests

March 17, 2024 07:20 PM

March 13, 2024

Brian Kardell

How We Fund the Web Ecosystem

How We Fund the Web Ecosystem

On Tuesday (March 12th, 2024), Robin Berjon and Eric Meyer and I organized, led and scribed a session during W3C breakouts day about how we fund the web ecosystem…

Every year the W3C has a week long giant set of in-person meetings called TPAC. A nice feature of those meetings has always been “Breakouts Day” which is a day where people can propose sessions about pretty much anything and we try to organize a schedule around the ones that seem interesting to enough people.

This year, the W3C decided to try a second Breakouts Day that is not at the same time as TPAC, and was purely online.

Over the last several years, I’ve written several pieces about different aspects of the health of the web ecosystem and led a podcast series with quite a few episodes about that. In those pieces I’ve argued that while the web ecosystem has become the infrastructure for nearly everything, our models for funding and prioritization of the last 20 years have proven not only inadequate, and problematic, but ultimately fragile and cannot last. The only questions, I’ve argued, are how soon and what happens next. Are we ready for it? (hint: no).

So, I talked to a few people and we proposed this as a topic. It was well attended. We began with a short presentation (we made a very detailed outline together, but credit goes to Robin for the great slides).

We organized the presentation into sort of 2 parts. First I presented explaining the problems and why we believe this requires our attention and action. First we have to admit that we have a problem, right? And that this is a problem that we should be concerned with… If not us, who? If not now, when?

Then I outlined that there are many possible solutions and elements of solutions that we can discuss (or try), but all of them share some common elements:

  1. We need a way to take in common money, and a way to actually encourage money into the pot.
  2. We need a way to efficiently and fairly prioritize the money in the pot toward actual work.

I highlighted that there are existing things we can already try (and are trying), and that we should really start trying more.

After this, Robin presented a bigger possible vision we tried to lay out with lots of still fuzzy areas and questions - but effectively: We create an institution which is (through one of a few possibilities) able to compel participation into a system which enforces more sustainable (and fairer) characteristics which guarantee support for the infrastructure of the web.

You can get a very good idea of what was actually presented from the detailed outline that we shared too.

But all of this was only the initial short presentation which I think was only maybe 5-10 minutes. The rest was the point of the breakout: Actual discussion.

I think it was very positive actually. The main thing that impressed me is that there was seemingly no push back or questioning at at all in the premise. We agree with the fundamentals - that as I explained in Webrise, it’s fragile from this perspective, and we need to care about it.

Rick Beyers (from Google, but not speaking for Google) mentioned what they observed in Chromium contributions and that they also had concerns about diversity, both in terms of contributions to a single engine, and multiple engines. He also mentioned that Chromium was spinning up a new collective idea (not yet announced).

Just this morning, we helped launch the Servo collective. The timing is purely coincidental. I’d also note that in part of my presentation I mentioned exploring ways that governments can incentivize and forgot to mention that there have been interesting developments in some open source funding happening this way recently, and to note that the White House recently made a statement that Future Software Should Be Memory Safe. If anyone has a good ‘in’ at the White House please make the case that if you want to know a good place to invest to be sure a lot of future stuff is memory safe, it’s probably browsers - and especially the one written in Rust :). The collective would be happy to accept the White House’s check.

There were also some interesting questions about whether Web Monetization could be related to this, or is just a wholly separate problem, about how the advertising model is exceptionally progressive, and where other investment comes from currently.

Happily minutes are available if you’re interested - and we’ll be trying to organize some immediate discussions on where we go from here through this repo which also has a rough outline of how one solution might work.

March 13, 2024 04:00 AM

March 11, 2024

Brian Kardell

The Darkening

The Darkening

Some additions to my half-light library for styling Shadow DOM.

In case you haven't followed, a number of my recent posts have been thinking about how Shadow DOM falls short, and how we can can use the tools we have today to explore potential solutions and improvements. This has led to good feedback and quick iteration on a tiny library called half-light.

This library lets page authors selectively "push" styles down into shadow roots as adopted @layers. It plugs into CSS's media queries, and allows selectivity on both ends: Which styles and which specific shadow roots. I'm not going to recap the whole concept and interface here because it would just be regurgitating the stuff that's already in the link above. What I want to write about is the latest set of improvements to half-light.

no-light

An important point "feature" of half-light is that it doesn't require web components to be built in a special way in order to achieve this. Alternative/Previous ideas required elements to subclass a special element, for example, to say "Yes, I am styleable from above". However, I don't believe that that seems practical for reasons I explained in Lovely Trees. Effectively, it is very difficult to grow iterations on that approach naturally, and there are already so many wonderful components out there which a number of people say "I wish I could use, but alas I cannot provide some basic styling". This library just lets them do exactly that: Provide styling from the outside (with some caveats below), to see what it's like to actually live with that possibility. Do they love it? Ultimately regret it?

I don't believe that this is some kind of breach of contract to allow that kind of styling from the outside in open shadow roots. The simple fact that the library has to do hardly anything shows just how easy it is, technically, for any page author to do it already today. No "new powers" have been added.

It's harder to make this case with closed shadow roots. While it is entirely possible for page authors to take control of closed shadow roots too, they have to achieve this by changing their nature and, effectively saying "sorry, no your closed shadow roots will be open roots in my page". However much I am not a fan of the closed roots, I do think that someone who made their root closed gave a pretty strong opinion that you're not supposed to touch the inside, even if you think you want to.

What I hadn't considered is that open roots could exist inside of closed roots, and with my pattern you could still have styled them the same way. That felt wrong, so I fixed that by making the whole subtree "darkened". That is, half-light can't get any light past it.

I also made a check for an attribute (or property) called darkened which can achieve this for a light DOM as well. You set it on the shadow host. Thus, if you write myElement.darkened = true or <my-element darkened> it will prevent half-light from applying to the whole subtree. That trick can be used by both custom elements and page authors directly if they find it helpful. Similarly, it's benign otherwise, so component authors can start adding it if they want to and whether page authors happen to be using half-light or not doesn't matter.

Optimizations

The very first edition of half-light was a little greedy as to processing and probably was doing a little too much work. I didn't hear anyone say that they actually experienced a problem, but it was clearly not as efficient as it could be, so, I improved that generally.

But I also built half-light to be a little resilient to where in the head it was loaded, and to conveniently to work with dev tools. That means you can go ahead and live-change the CSS its aware of that and will propagate changes down into your components. This is achieved via a MutationObserver.

Now, in practice I don't think this is likely to be much of a problem in most cases. Once the document has settled down enough to start rendering, I'm not sure how heavy head mutations are these days - but it seems pretty reasonable to be able to say "you don't have to keep observing", so I added that ability too. If you include the disable-live-half-light attribute on the script tag that you use to include half-light, it will stop monitoring.

There's a practical upshot to that as well: This means it can also disengage some book keeping which technically leaks memory (this won't practically cause you issues on your blog or something, it's really mainly if you are doing a lot of dynamic stuff in a long-lived application).

Feedback and evolution

What I've appreciated most about this effort is the feedback and iteration. It's kind of amazing to look over such a brief period of time all of the evolution and improvements toward solving a problem. I hope that more and more people find it valuable to explore and let us know how it goes. Real world experimentation and feedback is so valuable toward ultimately developing a standard solution. Thanks to everyone who has reached out, filed issues, written a blog post, or discussed it on a podcast.

If you haven't already, please leave an emoji or a comment on this github issue to help me collect sentiment toward a solution like this in a central place.

March 11, 2024 04:00 AM

March 05, 2024

José Dapena

Maintaining downstreams of Chromium: why downstream?

Chromium, the web browser open source project Google Chrome is based on, can be considered nowadays the reference implementation of the web platform. As such, it is the first choice when implementing the web platform in a software platform or product.

Why is it like this? In this blog post I am going to introduce the topic, and then review the different reasons why a downstream Chromium is used in different projects.

A series of blog posts

This is the first of a series of blog posts, where I am going through several aspects and challenges of maintaining a downstream project using Chromium as its upstream project.

They will be mostly based on the discussions in two events. First, on The Web Engines Hackfest 2023 break out session with same title that I chaired in A Coruña. Then, on my BlinkOn 18 talk in November 2023, at Sunnyvale.

Some definitions

Before starting the discussion of the different aspects, let’s clarify how I will use several terms.

Repository vs. project

I am going to refer to a repository as a version controlled storage of code strictly. Then, a project (specifically, a software project) is the community of people that share goals and some kind of organization, to maintain one or several software products.

So, a project may use several repositories for their goals.

In this discussion I will talk about Chromium, an open source project that targets the implementation of the web platform user agent, a web browser, for different platforms. As such, it uses a number of repositories (src, v8 and more).

Downstream and upstream

I will use the downstream and upstream terms, referred to the relationship of different software projects version control repositories.

If there is a software project repository (typically open source), and a new repository is created that contains all or part of the original repository, then:

  • Upstream project will be the original repository.
  • Downstream project is the new repository.

It is important to highlight that different things can happen to the downstream repository:

  • This copy could be a one time event, so the downstream repository becomes an independent fork, and there may be no interest in tracking the upstream evolution. This happens often for abandoned repositories, where a different set of people start an independent project. But there could be other reasons.
  • There is the intent to track the upstream repository changes. So the downstream repository evolves as the upstream repository does too, but with some specific differences maintained on top of the original repository.

Why using Chromium?

Nowadays, web platform is a solid alternative for providing contents to the users. It allows modern user interfaces, based on well known standards, and integrate well with local and remote services. The gap between native applications and web contents has been reduced, so it is a good alternative quite often.

But, when integrating web contents, product integrators need an implementation of the web platform. It is no surprise that Chromium is the most used, for a number of reasons:

  • It is open source, with a license that allows to adapt it for new product needs.
  • It is well maintained and up to date. Even pushing through standardization for improving it continuously.
  • It is secure, both from architecture and maintenance model point of view.
  • It provides integration points to tailor the implementation to ones needs.
  • It supports the most popular software platforms (Windows, Android, Linux, …) for integrating new products.
  • On top of the web platform itself, it provides an implementation for many of the components required to build a modern web browser.

Still, there are other good alternate choices for integrating the web, as WebKit (specially WPE for the embedded use cases), or using the system-provided web components (Android or iOS web view, …).

Though, in this blog post I will focus on the Chromium case.

Why downstreaming Chromium?

But, why do different projects need to use downstream Chromium repositories?

The main simple reason the project needs a downstream repository is adding changes that are not upstream. This can be for a variety of reasons:

  • Downstream changes that are not allowed by upstream. I.e. because they will make upstream project harder to maintain, or it will not be tested often.
  • Downstream changes that downstream project does not want to add to upstream.

Let’s see some examples of changes of both types.

Hardware and OS adaptation

This is when downstream adds support for a hardware target or OS that is not originally supported in upstream Chromium project.

Chromium upstream provides an abstraction layer for that purpose named Ozone, that allows to adapt it to the OS, desktop environment, and system graphics compositor. But there are other abstraction layers for media acceleration, accessibility or input methods.

The Wayland protocol adaptation started as a downstream effort, as upstream Chromium did not intend to support Wayland at that time. Eventually it evolved into an upstream official Ozone backend.

An example? LGE webOS Chromium port.

Differentiation

The previous case mostly forces to have a downstream project or repository. But there are also some cases where this is intended. There is the will to have some features in the downstream repository and not in upstream, an intended differentiation.

Why would anybody want that? Some typical examples:

  • A set of features that the downstream project owners consider to make the project better in some way, and want them to be kept downstream. This can happen when a new browser is shipped, and it contains features that make the product offering different, and, in some ways, better, than upstream Chrome. That can be a different user experience, some security features, better privacy…
  • Adaptation to a different product brand. Each browser or browser-based product will want to have its specific brand instead of upstream Chromium brand.

Examples of this:

  • Brave browser, with completely different privacy and security choices.
  • ARC browser, with an innovative user experience.
  • Microsoft Edge, with tight Windows OS integration and corporate features.

Hybrid application runtimes

And one last interesting case: integrating the web platform for developing hybrid applications: those that mix parts of the user interface implemented in a native toolkit, and parts implemented using the web platform.

Though Chromium includes official support for hybrid applications for Android, with the Android Web View, other toolkits provide also web applications support, and the integration of those belong, in Chromium case, to downstream projects.

Examples?

What’s next?

In this blog post I presented different reasons why projects end up maintaining a downstream fork of Chromium.

In the next blog post I will present one of the main challenges when maintaining a downstream of Chromium: the different rebase and upgrade strategies.

by José Dapena Paz at March 05, 2024 03:54 PM

February 26, 2024

Andy Wingo

on the impossibility of composing finalizers and ffi

While poking the other day at making a Guile binding for Harfbuzz, I remembered why I don’t much do this any more: it is impossible to compose GC with explicit ownership.

Allow me to illustrate with an example. Harfbuzz has a concept of blobs, which are refcounted sequences of bytes. It uses these in a number of places, for example when loading OpenType fonts. You can get a peek at the blob’s contents back with hb_blob_get_data, which gives you a pointer and a length.

Say you are in LuaJIT. (To think that for a couple years, I wrote LuaJIT all day long; now I can hardly remember.) You get a blob from somewhere and want to get its data. You define a wrapper for hb_blob_get_data:

local hb = ffi.load("harfbuzz")
ffi.cdef [[
typedef struct hb_blob_t hb_blob_t;

const char *
hb_blob_get_data (hb_blob_t *blob, unsigned int *length);
]]

Presumably you then arrange to release LuaJIT’s reference on the blob when GC collects a Lua wrapper for a blob:

ffi.cdef [[
void hb_blob_destroy (hb_blob_t *blob);
]]

function adopt_blob(ptr)
  return ffi.gc(ptr, hb.hb_blob_destroy)
end

OK, so let’s say we get a blob from somewhere, and want to copy out its contents as a byte string.

function blob_contents(blob)
   local len_out = ffi.new('unsigned int')
   local contents = hb.hb_blob_get_data(blob, len_out)
   local len = len_out[0];
   return ffi.string(contents, len)
end

The thing is, this code is as correct as you can get it, but it’s not correct enough. In between the call to hb_blob_get_data and, well, anything else, GC could run, and if blob is not used in the future of the program execution (the continuation), then it could be collected, causing the hb_blob_destroy finalizer to release the last reference on the blob, freeing contents: we would then be accessing invalid memory.

Among GC implementors, it is a truth universally acknowledged that a program containing finalizers must be in want of a segfault. The semantics of LuaJIT do not prescribe when GC can happen and what values will be live, so the GC and the compiler are not constrained to extend the liveness of blob to, say, the entirety of its lexical scope. It is perfectly valid to collect blob after its last use, and so at some point a GC will evolve to do just that.

I chose LuaJIT not to pick on it, but rather because its FFI is very straightforward. All other languages with GC that I am aware of have this same issue. There are but two work-arounds, and neither are satisfactory: either develop a deep and correct knowledge of what the compiler and run-time will do for a given piece of code, and then pray that knowledge does not go out of date, or attempt to manually extend the lifetime of a finalizable object, and then pray the compiler and GC don’t learn new tricks to invalidate your trick.

This latter strategy takes the form of “remember-this” procedures that are designed to outsmart the compiler. They have mostly worked for the last few decades, but I wouldn’t bet on them in the future.

Another way to look at the problem is that once you have a system working—though, how would you know it’s correct?—then you either never update the compiler and run-time, or you become fast friends with whoever maintains your GC, and probably your compiler too.

For more on this topic, as always Hans Boehm has the first and last word; see for example the 2002 Destructors, finalizers, and synchronization. These considerations don’t really apply to destructors, which are used in languages with ownership and generally run synchronously.

Happy hacking, and be safe out there!

by Andy Wingo at February 26, 2024 10:05 AM

February 23, 2024

Manuel Rego

Servo at FOSDEM 2024

Following the trend started on my last blog post this is a blog post about Servo presence in a new event, this time FOSDEM 2024. This was my first time at FOSDEM, which is a special and different conference with lots of people and talks; it was quite an experience!

Picture of Rakhi during her Servo talk at FOSDEM with Tauri experiment slide in the background.
Servo talk by Rakhi Sharma at FOSDEM 2024

Embedding Servo in Rust projects #

My colleague Rakhi Sharma gave a great talk about Servo in the Rust devroom. The talk went deep into how Servo can be embedded by other Rust projects, showing the work on the Servo’s Minibrowser, and the latest experiments related to the integration with Tauri through their cross-platform WebView library called wry. It’s worth to highlight that the latter has been sponsored by NLnet Foundation, big thanks for your support! Attending FOSDEM was nice as we also met Daniel Thompson-Yvetot from Tauri, which we have been working with as part of this collaboration, and discussed about improvements and plans for the future.

Slides and video of the talk are already available online. Don’t miss it if you’re interested in the more recent news around Servo development and how to embedded it in other projects.

Servo at Linux Foundation Europe stand #

Servo is a Linux Foundation Europe project, and they have a stand at FOSDEM. This allowed us to show Servo to people passing by there and explain the ongoing development efforts on the project. There were also some nice Servo swag that people really liked. The stand gave us the chance to talk to different people about Servo. Most of them were very interested in the project revival and are hopeful about a promising future for the project.

Picture of Linux Foundation Europe stand at FOSDEM 2024 with different Servo swag (stickers, flyers, buttons, cups, caps, teddy bears). Also in the picture there is a screen showing a video of Servo rendering WebGPU content.
Servo swag at Linux Foundation Europe stand at FOSDEM 2024

As part of our attendance to FOSDEM we had the chance to meet some people we have been working with recently. We met Taym Haddadi one active Servo contributor, it was nice to talk to people contributing to the project and meet them in real life. In addition, we were also talking briefly with NLnet folks, they have one of the most popular stands with stickers of all the projects they fund. Furthermore, we also have the chance to talk to Outreachy folks, as we’re working into getting Servo back into Outreachy internship program this year.

Other Igalia talks at FOSDEM 2024 #

Finally, I’d like to highlight that Igalia presence at FOSDEM was much bigger than only Servo related stuff. There were a large groups of igalians participating in the event, and we have a bunch of talks there:

February 23, 2024 12:00 AM

February 20, 2024

Carlos García Campos

A Clarification About WebKit Switching to Skia

In the previous post I talked about the plans of the WebKit ports currently using Cairo to switch to Skia for 2D rendering. Apple ports don’t use Cairo, so they won’t be switching to Skia. I understand the post title was confusing, I’m sorry about that. The original post has been updated for clarity.

by carlos garcia campos at February 20, 2024 06:11 PM

Alex Bradbury

Clarifying instruction semantics with P-Code

I've recently had a need to step through quite a bit of disassembly for different architectures, and although some architectures have well-written ISA manuals it can be a bit jarring switching between very different assembly syntaxes (like "source, destination" for AT&T vs "destination, source" for just about everything else) or tedious looking up different ISA manuals to clarify the precise semantics. I've been using a very simple script to help convert an encoded instruction to a target-independent description of its semantics, and thought I may as well share it as well as some thoughts on its limitations.

instruction_to_pcode

The script is simplicity itself, thanks to the pypcode bindings to Ghidra's SLEIGH library which provides an interface to convert an input to the P-Code representation. Articles like this one provide an introduction and there's the reference manual in the Ghidra repo but it's probably easiest to just look at a few examples. P-Code is used as the basis of Ghidra's decompiler and provides a consistent human-readable description of the semantics of instructions for supported targets.

Here's an example aarch64 instruction:

$ ./instruction_to_pcode aarch64 b874c925
-- 0x0: ldr w5, [x9, w20, SXTW #0x0]
0) unique[0x5f80:8] = sext(w20)
1) unique[0x7200:8] = unique[0x5f80:8]
2) unique[0x7200:8] = unique[0x7200:8] << 0x0
3) unique[0x7580:8] = x9 + unique[0x7200:8]
4) unique[0x28b80:4] = *[ram]unique[0x7580:8]
5) x5 = zext(unique[0x28b80:4])

In the above you can see that the disassembly for the instruction is dumped, and then 5 P-Code instructions are printed showing the semantics. These P-Code instructions directly use the register names for architectural registers (as a reminder, AArch64 has 64-bit GPRs X0-X30 with the bottom halves acessible through W-W30). Intermediate state is stored in unique[addr:width] locations. So the above instruction sign-extends w20, adds to x9, and reads a 32-bit value from the resulting address, then zero-extends to 64-bits when storing into x5.

The output is somewhat more verbose for architectures with flag registers, e.g. cmpb $0x2f,-0x1(%r11) produces:

./instruction_to_pcode x86-64 --no-reverse-input "41 80 7b ff 2f"
-- 0x0: CMP byte ptr [R11 + -0x1],0x2f
0) unique[0x3100:8] = R11 + 0xffffffffffffffff
1) unique[0xbd80:1] = *[ram]unique[0x3100:8]
2) CF = unique[0xbd80:1] < 0x2f
3) unique[0xbd80:1] = *[ram]unique[0x3100:8]
4) OF = sborrow(unique[0xbd80:1], 0x2f)
5) unique[0xbd80:1] = *[ram]unique[0x3100:8]
6) unique[0x28e00:1] = unique[0xbd80:1] - 0x2f
7) SF = unique[0x28e00:1] s< 0x0
8) ZF = unique[0x28e00:1] == 0x0
9) unique[0x13180:1] = unique[0x28e00:1] & 0xff
10) unique[0x13200:1] = popcount(unique[0x13180:1])
11) unique[0x13280:1] = unique[0x13200:1] & 0x1
12) PF = unique[0x13280:1] == 0x0

But simple instructions that don't set flags do produce concise P-Code:

$ ./instruction_to_pcode riscv64 "9d2d"
-- 0x0: c.addw a0,a1
0) unique[0x15880:4] = a0 + a1
1) a0 = sext(unique[0x15880:4])

Other approaches

P-Code was an intermediate language I'd encountered before and of course benefits from having an easy to use Python wrapper and fairly good support for a range of ISAs in Ghidra. But there are lots of other options - angr (which uses Vex, taken from Valgrind) compares some options and there's more in this paper. Radare2 has ESIL, but while I'm sure you'd get used to it, it doesn't pass the readability test for me. The rev.ng project uses QEMU's TCG. This is an attractive approach because you benefit from more testing and ISA extension support for some targets vs P-Code (Ghidra support is lacking for RVV, bitmanip, and crypto extensions).

Another route would be to pull out the semantic definitions from a formal spec (like Sail) or even an easy to read simulator (e.g. Spike for RISC-V). But in both cases, definitions are written to minimise repetition to some degree, while when expanding the semantics we prefer explicitness, so would want to expand to a form that differs a bit from the Sail/Spike code as written.

February 20, 2024 12:00 PM

February 19, 2024

Carlos García Campos

WebKitGTK and WPEWebKit Switching to Skia for 2D Graphics Rendering

In recent years we have had an ongoing effort to improve graphics performance of the WebKit GTK and WPE ports. As a result of this we shipped features like threaded rendering, the DMA-BUF renderer, or proper vertical retrace synchronization (VSync). While these improvements have helped keep WebKit competitive, and even perform better than other engines in some scenarios, it has been clear for a while that we were reaching the limits of what can be achieved with a CPU based 2D renderer.

There was an attempt at making Cairo support GPU rendering, which did not work particularly well due to the library being designed around stateful operation based upon the PostScript model—resulting in a convenient and familiar API, great output quality, but hard to retarget and with some particularly slow corner cases. Meanwhile, other web engines have moved more work to the GPU, including 2D rendering, where many operations are considerably faster.

We checked all the available 2D rendering libraries we could find, but none of them met all our requirements, so we decided to try writing our own library. At the beginning it worked really well, with impressive results in performance even compared to other GPU based alternatives. However, it proved challenging to find the right balance between performance and rendering quality, so we decided to try other alternatives before continuing with its development. Our next option had always been Skia. The main reason why we didn’t choose Skia from the beginning was that it didn’t provide a public library with API stability that distros can package and we can use like most of our dependencies. It still wasn’t what we wanted, but now we have more experience in WebKit maintaining third party dependencies inside the source tree like ANGLE and libwebrtc, so it was no longer a blocker either.

In December 2023 we made the decision of giving Skia a try internally and see if it would be worth the effort of maintaining the project as a third party module inside WebKit. In just one month we had implemented enough features to be able to run all MotionMark tests. The results in the desktop were quite impressive, getting double the score of MotionMark global result. We still had to do more tests in embedded devices which are the actual target of WPE, but it was clear that, at least in the desktop, with this very initial implementation that was not even optimized (we kept our current architecture that is optimized for CPU rendering) we got much better results. We decided that Skia was the option, so we continued working on it and doing more tests in embedded devices. In the boards that we tried we also got better results than CPU rendering, but the difference was not so big, which means that with less powerful GPUs and with our current architecture designed for CPU rendering we were not that far from CPU rendering. That’s the reason why we managed to keep WPE competitive in embeeded devices, but Skia will not only bring performance improvements, it will also simplify the code and will allow us to implement new features . So, we had enough data already to make the final decision of going with Skia.

In February 2024 we reached a point in which our Skia internal branch was in an “upstreamable” state, so there was no reason to continue working privately. We met with several teams from Google, Sony, Apple and Red Hat to discuss with them about our intention to switch from Cairo to Skia, upstreaming what we had as soon as possible. We got really positive feedback from all of them, so we sent an email to the WebKit developers mailing list to make it public. And again we only got positive feedback, so we started to prepare the patches to import Skia into WebKit, add the CMake integration and the initial Skia implementation for the WPE port that already landed in main.

We will continue working on the Skia implementation in upstream WebKit, and we also have plans to change our architecture to better support the GPU rendering case in a more efficient way. We don’t have a deadline, it will be ready when we have implemented everything currently supported by Cairo, we don’t plan to switch with regressions. We are focused on the WPE port for now, but at some point we will start working on GTK too and other ports using cairo will eventually start getting Skia support as well.

by carlos garcia campos at February 19, 2024 01:27 PM

February 16, 2024

Melissa Wen

Keep an eye out: We are preparing the 2024 Linux Display Next Hackfest!

Igalia is preparing the 2024 Linux Display Next Hackfest and we are thrilled to announce that this year’s hackfest will take place from May 14th to 16th at our HQ in A Coruña, Spain.

This unconference-style event aims to bring together the most relevant players in the Linux display community to tackle current challenges and chart the future of the display stack.

Key goals for the hackfest include:

  • Releasing the power of collaboration: We’ll work to remove bottlenecks and pave the way for smoother, more performant displays.
  • Problem-solving powerhouse: Brainstorming sessions and collaborative coding will target issues like HDR, color management, variable refresh rates, and more.
  • Building on past commitments: Let’s solidify the progress made in recent years and push the boundaries even further.

The hackfest fosters an intimate and focused environment to brainstorm, hack, and design solutions alongside fellow display experts. Participants will dive into discussions, tinker with code, and contribute to shaping the future of the Linux display stack.

More details are available on the official website.

Stay tuned! Keep an eye out for more information, mark your calendars and start prepping your hacking gear.

February 16, 2024 05:25 PM

February 14, 2024

Lucas Fryzek

A Dive into Vulkanised 2024

Vulkanised sign at google’s office
Vulkanised sign at google’s office

Last week I had an exciting opportunity to attend the Vulkanised 2024 conference. For those of you not familar with the event, it is “The Premier Vulkan Developer Conference” hosted by the Vulkan working group from Khronos. With the excitement out of the way, I decided to write about some of the interesting information that came out of the conference.

A Few Presentations

My colleagues Iago, Stéphane, and Hyunjun each had the opportunity to present on some of their work into the wider Vulkan ecosystem.

Stéphane and Hyujun presenting
Stéphane and Hyujun presenting

Stéphane & Hyunjun presented “Implementing a Vulkan Video Encoder From Mesa to Streamer”. They jointly talked about the work they performed to implement the Vulkan video extensions in Intel’s ANV Mesa driver as well as in GStreamer. This was an interesting presentation because you got to see how the new Vulkan video extensions affected both driver developers implementing the extensions and application developers making use of the extensions for real time video decoding and encoding. Their presentation is available on vulkan.org.

Iago presenting
Iago presenting

Later my colleague Iago presented jointly with Faith Ekstrand (a well-known Linux graphic stack contributor from Collabora) on “8 Years of Open Drivers, including the State of Vulkan in Mesa”. They both talked about the current state of Vulkan in the open source driver ecosystem, and some of the benefits open source drivers have been able to take advantage of, like the common Vulkan runtime code and a shared compiler stack. You can check out their presentation for all the details.

Besides Igalia’s presentations, there were several more which I found interesting, with topics such as Vulkan developer tools, experiences of using Vulkan in real work applications, and even how to teach Vulkan to new developers. Here are some highlights for some of them.

Using Vulkan Synchronization Validation Effectively

John Zulauf had a presentation of the Vulkan synchronization validation layers that he has been working on. If you are not familiar with these, then you should really check them out. They work by tracking how resources are used inside Vulkan and providing error messages with some hints if you use a resource in a way where it is not synchronized properly. It can’t catch every error, but it’s a great tool in the toolbelt of Vulkan developers to make their lives easier when it comes to debugging synchronization issues. As John said in the presentation, synchronization in Vulkan is hard, and nearly every application he tested the layers on reveled a synchronization issue, no matter how simple it was. He can proudly say he is a vkQuake contributor now because of these layers.

6 Years of Teaching Vulkan with Example for Video Extensions

This was an interesting presentation from a professor at the university of Vienna about his experience teaching graphics as well as game development to students who may have little real programming experience. He covered the techniques he uses to make learning easier as well as resources that he uses. This would be a great presentation to check out if you’re trying to teach Vulkan to others.

Vulkan Synchronization Made Easy

Another presentation focused on Vulkan sync, but instead of debugging it, Grigory showed how his graphics library abstracts sync away from the user without implementing a render graph. He presented an interesting technique that is similar to how the sync validation layers work when it comes ensuring that resources are always synchronized before use. If you’re building your own engine in Vulkan, this is definitely something worth checking out.

Vulkan Video Encode API: A Deep Dive

Tony at Nvidia did a deep dive into the new Vulkan Video extensions, explaining a bit about how video codecs work, and also including a roadmap for future codec support in the video extensions. Especially interesting for us was that he made a nice call-out to Igalia and our work on Vulkan Video CTS and open source driver support on slide (6) :)

Thoughts on Vulkanised

Vulkanised is an interesting conference that gives you the intersection of people working on Vulkan drivers, game developers using Vulkan for their graphics backend, visual FX tool developers using Vulkan-based tools in their pipeline, industrial application developers using Vulkan for some embedded commercial systems, and general hobbyists who are just interested in Vulkan. As an example of some of these interesting audience members, I got to talk with a member of the Blender foundation about his work on the Vulkan backend to Blender.

Lastly the event was held at Google’s offices in Sunnyvale. Which I’m always happy to travel to, not just for the better weather (coming from Canada), but also for the amazing restaurants and food that’s in the Bay Area!

Great bay area food
Great bay area food

February 14, 2024 05:00 AM

February 13, 2024

Víctor Jáquez

GstVA library in GStreamer 1.22 and some new features in 1.24

I know, it’s old news, but still, I was pending to write about the GstVA library to clarify its purpose and scope.I didn’t want to have a library for GstVA, because to maintain a library is a though duty. I learnt it in the bad way with GStreamer-VAAPI. Therefore, I wanted GstVA as simple as possible, direct in its libva usage, and self-contained.

As far as I know, the main usage of GStreamer-VAAPI library was to have access to the VASurface from the GstBuffer, for example, with appsink. Since the beginning of GstVA, a mechanism to access the surface ID was provided, as `GStreamer OpenGL** offers access to the texture ID: through a flag when mapping a GstGL memory backed buffer. You can see how this mechanism is used in the one of the GstVA sample apps.

Nevertheless, later another use case appeared which demanded a library for GstVA: Other GStreamer elements needed to produce or consume VA surface backed buffers. Or to say it more concretely: they needed to use the GstVA allocator and buffer pool, and the mechanism to share the GstVA context along the pipeline too. The plugins with those VA related elements are msdk and qsv, when they operate on Linux. Both elements use Intel OneVPL, so they are basically competitors. The main difference is that the first is maintained by Intel, and has more features, specially for Linux, whilst the former in maintained by Seungha Yang, and it appears to be better tested in Windows.

These are the objects exposed by the GstVA library API:

GstVaDisplay #

GstVaDisplay represents a VADisplay, the interface between the application and the hardware accelerator. This class is abstract, and it’s supposed not to be instantiated. Instantiation has to go through it derived classes, such as

This class is shared among all the elements in the pipeline via GstContext, so all the plugged elements share the same connection the hardware accelerator. Unless the plugged element in the pipeline has a specific name, as in the case of multi GPU systems.

Let’s talk a bit about multi GPU systems. This is the gst-inspect-1.0 output of a system with an Intel and an AMD GPU, both with VA support:

$ gst-inspect-1.0 va
Plugin Details:
Name va
Description VA-API codecs plugin
Filename /home/igalia/vjaquez/gstreamer/build/subprojects/gst-plugins-bad/sys/va/libgstva.so
Version 1.23.1.1
License LGPL
Source module gst-plugins-bad
Documentation https://gstreamer.freedesktop.org/documentation/va/
Binary package GStreamer Bad Plug-ins git
Origin URL Unknown package origin

vaav1dec: VA-API AV1 Decoder in Intel(R) Gen Graphics
vacompositor: VA-API Video Compositor in Intel(R) Gen Graphics
vadeinterlace: VA-API Deinterlacer in Intel(R) Gen Graphics
vah264dec: VA-API H.264 Decoder in Intel(R) Gen Graphics
vah264lpenc: VA-API H.264 Low Power Encoder in Intel(R) Gen Graphics
vah265dec: VA-API H.265 Decoder in Intel(R) Gen Graphics
vah265lpenc: VA-API H.265 Low Power Encoder in Intel(R) Gen Graphics
vajpegdec: VA-API JPEG Decoder in Intel(R) Gen Graphics
vampeg2dec: VA-API Mpeg2 Decoder in Intel(R) Gen Graphics
vapostproc: VA-API Video Postprocessor in Intel(R) Gen Graphics
varenderD129av1dec: VA-API AV1 Decoder in AMD Radeon Graphics in renderD129
varenderD129av1enc: VA-API AV1 Encoder in AMD Radeon Graphics in renderD129
varenderD129compositor: VA-API Video Compositor in AMD Radeon Graphics in renderD129
varenderD129deinterlace: VA-API Deinterlacer in AMD Radeon Graphics in renderD129
varenderD129h264dec: VA-API H.264 Decoder in AMD Radeon Graphics in renderD129
varenderD129h264enc: VA-API H.264 Encoder in AMD Radeon Graphics in renderD129
varenderD129h265dec: VA-API H.265 Decoder in AMD Radeon Graphics in renderD129
varenderD129h265enc: VA-API H.265 Encoder in AMD Radeon Graphics in renderD129
varenderD129jpegdec: VA-API JPEG Decoder in AMD Radeon Graphics in renderD129
varenderD129postproc: VA-API Video Postprocessor in AMD Radeon Graphics in renderD129
varenderD129vp9dec: VA-API VP9 Decoder in AMD Radeon Graphics in renderD129
vavp9dec: VA-API VP9 Decoder in Intel(R) Gen Graphics

22 features:
+-- 22 elements

As you can observe, each card registers the supported element. The first GPU, the Intel one, doesn’t insert renderD129 after the va prefix, while the AMD Radeon, does. As you could imagine, renderD129 is the device name in /dev/dri:

$ ls /dev/dri
by-path card0 card1 renderD128 renderD129

The appended device name expresses the DRM device the elements use. And only after the second GPU the device name is appended. This allows to use the untagged elements either for the first GPU or to allow the usage of a wrapped display injected by the user application, such as VA X11 or Wayland connections.

Notice that sharing DMABuf-based buffers between different GPUs is theoretically possible, but not assured.

Keep in mind that, currently, nvidia-vaapi-driver is not supported by GstVA, and the driver will be ignored by the plugin register since 1.24.

VA allocators #

There are two types of memory allocators in GstVA:

  • VA surface allocator. It allocates a GstMemory that wraps a VASurfaceID, which represents a complete frame directly consumable by VA-based elements. Thus, a GstBuffer will hold one and only one GstMemory of this type. These buffers are the very same used by the system memory buffers, since the surfaces generally can map their content to CPU memory (unless they are encrypted, but GstVA haven’t been tested for that use case).
  • VA-DMABuf allocator. It descends from GstDmaBufAllocator, though it’s not an allocator in strict sense, since it only imports DMABufs to VA surfaces, and exports VA surfaces to DMABufs. Notice that a single VA surface can be exported as multiple DMABuf-backed GstMemories, and the contrary, a VA surface can be imported from multiple DMABufs.

Also, VA allocators offer a couple methods that can be useful for applications that use appsink, for example gst_va_buffer_get_surface and gst_va_buffer_peek_display.

One particularity of both allocators is that they keep an internal pool of the allocated VA surfaces, since the mapping of buffers/memories is not always 1:1, as in the case of DMABuf; to reuse surfaces even if the buffer pool ditches them, and to avoid surface leaks, keeping track of them all the time.

GstVaPool #

This class is only useful for other GStreamer elements capable of consume VA surfaces, so they could instantiate, configure and propose a VA buffer pool, but not for applications.

And that’s all for now. Thank you for bear with me up to here. But remember, the API of the library is unstable, and it can change any time. So, no promises are made :)

February 13, 2024 12:00 AM

February 12, 2024

Brian Kardell

StyleSheet Parfait

StyleSheet Parfait

In this post I'll talk about some interesting things (some people might pronounce this 'footguns') around adoptedStyleSheets, conversations and thoughts around open styling problems and @layer.

If you're not familiar with adopted stylesheets, they're a way that your shadow roots can share literal styesheet instances by reference. That's a cool idea, right? If you have 10 fancy-inputs on the same page, it makes no sense for each of them to have their own copy of the whole stylesheet.

It's fairly early days for this still (in standards and support terms, at least) and we'll put improvements on top of this, but for now it is a pretty basic JavaScript API: Every shadow root now has an .adoptedStyleSheets property, which is an array. You can push stylesheets onto it or just assign an array of stylesheets. Currently those stylesheets have to be instances created via the recently introduced constructor new CSSStyleSheet().

Cool.

Steve Orvell opened an issue suggesting that I make half-light use adopted stylesheets. Sure, why not. In practice what this really saves is mainly the parse time, since browsers are pretty good at optimizing this otherwise, but that's still important and, in fact, it made the code more concise as well.

Whoopsie

However, there is an important bit about adopted stylesheets that I forgot about when I implemented this initially (which is strange because I am on the record discussing it in CSSWG): adopted stylesheets are treated as if they come after any stylesheets in the (shadow) root.

Previously, half-light (and earlier experiments) took great care to put stylesheets from the outer page before any that the component itself. That seems right to me, and what the adopted stylesheets were doing now with adopted stylesheets seemed wrong...

Enter: Layers

A solution to this newly created problem that's fairly easy is to wrap the rules that are adopted with @layer. Then, if your component has a style element, the rules in there will, by default, win. And that's true even if the rules that the page author pushed in had higher specificity! That's a pretty nice improvement. If some code helps you understand, here's a pen that illustrates it all:

See the Pen adoptedstylesheets and layers by вкαя∂εℓℓ (@briankardell) on CodePen.

Layers... Like an ogre.

Eric and I recently did a podcast on the Open Styleable shadow roots topic with Mia. Some of the thoughts she shared, and later conversations that followed with Westbrook and Nolan, convinced me to try to explore how we could use layers in half-light.

It had me thinking that the way that adopted stylesheets and layers both work seems to allow that we could develop some kind of 'shadow styling protocol' here. Maybe the simplest way to do this is to just give the layer a well-known name: I called it --crossroot

Now, when a page author uses half-light to set a style like:

@media --crossroot { 
  h1 { ... }
}

This is adopted into shadow roots as:

@layer --crossroot { 
  h1 { ... }
}

That is a lower layer than anything in the default layer of a component's shadow root (both stylesheets or adopted stylesheets).

The maybe interesting part this adds is that it means that the component itself can consciously manage it's layers if it chooses to do so! For example..

this.shadowRoot.innerHTML = `
  <style>
    @layer base, --crossroot, main;
    ... 
    /* add rules to those layers  */
    ... 
  </style>
  ${some_html}
`

If that sounds a little confusing, it's really not too bad - what it means is that, going from least to most specific, rules would evaluate roughly like:

  1. User Agent styles.
  2. Page authored rules that inherit into the Shadow DOM.
  3. @layers (including --crossroot half-light provides) in the shadow
  4. Rules in style elements in the shadow that aren't in a layer.

Combining ideas...

There was also some unrelated feedback from Nolan that this approach was a non-starter for some use cases - like pushing Bootstrap down to all of the shadow roots. The previous shadow-boxing library would have supported that better. Luckily, though we've built this on Media Queries, so CSS has that pretty well figured out - all we have to do is add support to half-light to make Media Queries work in link and style tags (in the head) as well. Easy enough, and I agree that's a good improvement. So, now you can also write markup like this:

<link rel="stylesheet" href="../prism.css" media="screen, --crossroot"></link>

<!-- or to target shadows of specific elements, add a selector... -->
<link rel="stylesheet" href="../prism.css" media="screen, (--crossroot x-foo)"></link>

So... That's it, this is all in half-light now... And guess what? It's still only 95 lines of code. Thanks for all the feedback so far! So, wdyt? Don't forget to leave me an indication of how you're feeling about it with an emoji (and/or comment) on the Emoji sentiment or short comment issue.

Very special thanks to everyone who has commented and shared thoughts constructively along the way, even when they might not agree. If you actually voted in the emoji sentiment poll: ❤. Thanks especially to Mia who has been great to discuss/review and improve ideas with.

February 12, 2024 05:00 AM