Planet Igalia

September 05, 2025

Pawel Lampe

The problem of storing the damage

This article is a continuation of the series on damage propagation. While the previous article laid some foundation on the subject, this one discusses the cost (increased CPU and memory utilization) that the feature incurs, as this is highly dependent on design decisions and the implementation of the data structure used for storing damage information.

From the perspective of this article, the two key things worth remembering from the previous one are:

  • The damage propagation is an optional WPE/GTK WebKit feature that — when enabled — reduces the browser’s GPU utilization at the expense of increased CPU and memory utilization.
  • On the implementation level, the damage is almost always a collection of rectangles that cover the changed region.

The damage information #

Before diving into the problem and its solutions, it’s essential to understand basic properties of the damage information.

The damage nature #

As mentioned in the section about damage of the previous article, the damage information describes a region that changed and requires repainting. It was also pointed out that such a description is usually done via a collection of rectangles. Although sometimes it’s better to describe a region in a different way, the rectangles are a natural choice due to the very nature of the damage in the web engines that originates from the box model.

A more detailed description of the damage nature can be inferred from the Pipeline details section of the previous article. The bottom line is, in the end, the visual changes to the render tree yield the damage information in the form of rectangles. For the sake of clarity, such original rectangles may be referred to as raw damage.

In practice, the above means that it doesn’t matter whether, for example, a circle is drawn on a 2D canvas or the background color of some block element changes — ultimately, rectangles (raw damage) are always produced in the process.

Approximating the damage #

As the raw damage is a collection of rectangles describing a damaged region, the geometrical consequence is that there may be more than one set of rectangles describing the same region. This means that raw damage could be stored as a different set of rectangles and still precisely describe the original damaged region — e.g. when the raw damage contains more rectangles than necessary. An example of different approximations of simple raw damage is depicted in the image below:

Raw damage approximated multiple ways.

Changing the set of rectangles that describes the damaged region may be very tempting — especially when the size of the set could be reduced. However, the following consequences must be taken into account:

  • The damaged region could shrink if some damage information is lost, e.g. if too many rectangles are removed.
  • The damaged region could expand if some damage information is added, e.g. if too many or too big rectangles are added.

The first consequence may lead to visual glitches when repainting. The second one, however, causes no visual issues but degrades performance since a larger area (i.e. more pixels) must be repainted — typically increasing GPU usage. This means the damage information can be approximated as long as the trade-off between the extra repainted area and the degree of simplification in the underlying set of rectangles is acceptable.

The approximation mentioned above refers to the situation where the approximated damaged region covers the original damaged region entirely, i.e. not a single pixel of information is lost. In that sense, the approximation can only add extra information. Naturally, the smaller the extra area added to the original damaged region, the better.

The approximation quality can be referred to as damage resolution, which is:

  • low — when the extra area added to the original damaged region is significant,
  • high — when the extra area added to the original damaged region is small.

The examples of low (left) and high (right) damage resolutions are presented in the image below:

Various damage resolutions.

The problem #

Given the description of the damage properties presented in the sections above, it’s evident there’s a certain degree of flexibility when it comes to processing damage information. Such a situation is very fortunate in the context of storing the damage, as it gives some freedom in designing a proper data structure. However, before jumping into the actual solutions, it’s necessary to understand the problem end-to-end.

The scale #

The Pipeline details section of the previous article introduced two basic types of damage in the damage propagation pipeline:

  • layer damage — the damage tracked separately for each layer,
  • frame damage — the damage that aggregates individual layer damages and constitutes the final damage of a given frame.

Assuming there are L layers and there is some data structure called Damage that can store the damage information, it’s easy to notice that there may be L+1 instances of Damage present at the same time in the pipeline as the browser engine requires:

  • L Damage objects for storing layer damage,
  • 1 Damage object for storing frame damage.

As there may be a lot of layers in more complex web pages, the L+1 mentioned above may be a very big number.

The first consequence of the above is that the Damage data structure in general should store the damage information in a very compact way to avoid excessive memory usage when L+1 Damage objects are present at the same time.

The second consequence of the above is that the Damage data structure in general should be very performant, as each of the L+1 Damage objects may be involved in a considerable amount of processing when there are lots of updates across the web page (and hence huge numbers of damage rectangles).

To better understand the above consequences, it’s essential to examine the input and the output of such a hypothetical Damage data structure more thoroughly.

The input #

There are 2 kinds of Damage data structure input:

  • other Damage,
  • raw damage.

A Damage becomes an input of another Damage in some situations in the middle of the damage propagation pipeline, when broader damage is being assembled from smaller chunks of damage. What it consists of depends purely on the Damage implementation.

The raw damage, on the other hand, becomes an input of the Damage always at the very beginning of the damage propagation pipeline. In practice, it consists of a set of rectangles that are potentially overlapping, duplicated, or empty. Moreover, such a set is always as big as the set of changes causing visual impact. Therefore, in the worst case scenario such as drawing on a 2D canvas, the number of rectangles may be enormous.

Given the above, it’s clear that the hypothetical Damage data structure should support 2 distinct input operations in the most performant way possible:

  • add(Damage),
  • add(Rectangle).

The output #

When it comes to the Damage data structure output, there are 2 possibilities:

  • other Damage,
  • the platform API.

A Damage becomes the output of another Damage on each Damage-to-Damage append described in the subsection above.

The platform API, on the other hand, becomes the output of Damage at the very end of the pipeline e.g. when the platform API consumes the frame damage (as described in the pipeline details section of the previous article). In this situation, what’s expected on the output technically depends on the particular platform API. However, in practice, all platforms supporting damage propagation require a set of rectangles that describe the damaged region. Such a set of rectangles is fed into the platforms via APIs by simply iterating the rectangles describing the damaged region and transforming them to whatever data structure the particular API expects.

The natural consequence of the above is that the hypothetical Damage data structure should support the following output operation — also in the most performant way possible:

  • forEachRectangle(...).
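
Putting the two input operations and the output operation together, a minimal C++ sketch of such a Damage interface could look as follows. The Rectangle type and all names below are illustrative assumptions, not WebKit’s actual API:

#include <functional>

// Illustrative types only -- not WebKit's actual API.
struct Rectangle {
    int x { 0 };
    int y { 0 };
    int width { 0 };
    int height { 0 };
};

class Damage {
public:
    virtual ~Damage() = default;

    // Input operations: raw damage at the beginning of the pipeline and
    // Damage-to-Damage appends in the middle of it.
    virtual void add(const Rectangle&) = 0;
    virtual void add(const Damage&) = 0;

    // Output operation: used at the end of the pipeline, e.g. to feed the
    // rectangles into a platform API.
    virtual void forEachRectangle(const std::function<void(const Rectangle&)>& callback) const = 0;
};

The implementations discussed below can be thought of as different strategies behind such an interface, each trading performance, memory footprint, and damage resolution differently.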

The problem statement #

Given all the above perspectives, the problem of designing the Damage data structure can be summarized as storing the input damage information to be accessed (iterated) later in a way that:

  1. the performance of operations for adding and iterating rectangles is maximal (performance),
  2. the memory footprint of the data structure is minimal (memory footprint),
  3. the stored region covers the original region and has an area as close to it as possible (damage resolution).

With the problem formulated this way, it’s obvious that this is a multi-criteria optimization problem with 3 criteria:

  1. performance (maximize),
  2. memory footprint (minimize),
  3. damage resolution (maximize).

Damage data structure implementations #

Given the problem of storing damage defined as above, it’s possible to propose various ways of solving it by implementing a Damage data structure. Before diving into details, however, it’s important to emphasize that the weights of criteria may be different depending on the situation. Therefore, before deciding how to design the Damage data structure, one should consider the following questions:

  • What is the proportion between the power of GPU and CPU in the devices I’m targeting?
  • What are the memory constraints of the devices I’m targeting?
  • What are the cache sizes on the devices I’m targeting?
  • What is the balance between GPU and CPU usage in the applications I’m going to optimize for?
    • Are they more rendering-oriented (e.g. using WebGL, Canvas 2D, animations etc.)?
    • Are they more computing-oriented (frequent layouts, a lot of JavaScript processing etc.)?

Although answering the above usually points toward a specific implementation, in practice the answers are often unknown, and hence the implementation should be as generic as possible. In practice, it means the implementation should not optimize with a strong focus on just one criterion. However, as there’s no silver-bullet solution, it’s worth exploring multiple quasi-generic solutions that have been researched as part of Igalia’s work on damage propagation, and which are the following:

  • Damage storing all input rects,
  • Bounding box Damage,
  • Damage using WebKit’s Region,
  • R-Tree Damage,
  • Grid-based Damage.

All of the above implementations are evaluated against the 3 criteria in the following way:

  1. Performance
    • by specifying the time complexity of the add(Rectangle) operation, as add(Damage) can be transformed into a series of add(Rectangle) operations,
    • by specifying the time complexity of forEachRectangle(...) operation.
  2. Memory footprint
    • by specifying the space complexity of Damage data structure.
  3. Damage resolution
    • by subjectively specifying the damage resolution.

Damage storing all input rects #

The most natural — yet very naive — Damage implementation is one that wraps a simple collection (such as a vector) of rectangles and hence stores the raw damage in its original form. In that case, the evaluation is as simple as evaluating the underlying data structure.

Assuming a vector data structure and O(1) amortized time complexity of insertion, the evaluation of such a Damage implementation is:

  1. Performance
    • insertion is O(1) ✅
    • iteration is O(N) ❌
  2. Memory footprint
    • O(N) ❌
  3. Damage resolution
    • perfect

Despite being trivial to implement, this approach is heavily skewed towards the damage resolution criterion. Essentially, the damage quality is the best possible, at the expense of very poor performance and a substantial memory footprint. This is because the number of input rects N can be very large, which makes the linear complexities unacceptable.

The other problem with this solution is that it performs no filtering and hence may store a lot of redundant rectangles. While empty rectangles can be filtered out in O(1), filtering out duplicates and some of the overlaps (one rectangle completely containing the other) would make insertion O(N). Naturally, such filtering would lead to a smaller memory footprint and faster iteration in practice; however, the complexities would not change.
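
For illustration, a minimal sketch of such a vector-backed implementation (building on the illustrative interface sketched earlier, with the O(1) empty-rectangle filtering mentioned above) could look like this:

#include <vector>

class AllRectsDamage final : public Damage {
public:
    void add(const Rectangle& rect) override
    {
        // Filtering out empty rectangles is the only filtering that stays O(1).
        if (rect.width <= 0 || rect.height <= 0)
            return;
        m_rects.push_back(rect); // Amortized O(1) insertion.
    }

    void add(const Damage& other) override
    {
        other.forEachRectangle([this](const Rectangle& rect) { add(rect); });
    }

    void forEachRectangle(const std::function<void(const Rectangle&)>& callback) const override
    {
        for (const auto& rect : m_rects) // O(N) iteration.
            callback(rect);
    }

private:
    std::vector<Rectangle> m_rects; // O(N) memory.
};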

Bounding box Damage #

The second simplest Damage implementation one can possibly imagine is one that stores just a single rectangle: the minimum bounding rectangle (bounding box) of all the damage rectangles added into the data structure. The minimum bounding rectangle — as the name suggests — is the smallest rectangle that can fit all the input rectangles inside. This is well demonstrated in the picture below:

Bounding box.
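
Staying with the illustrative interface from the earlier sketches, a minimal bounding-box Damage could look like this:

#include <algorithm>

// Returns the minimum bounding rectangle of two rectangles; an O(1) operation.
Rectangle uniteRectangles(const Rectangle& a, const Rectangle& b)
{
    const int maxX = std::max(a.x + a.width, b.x + b.width);
    const int maxY = std::max(a.y + a.height, b.y + b.height);
    Rectangle result;
    result.x = std::min(a.x, b.x);
    result.y = std::min(a.y, b.y);
    result.width = maxX - result.x;
    result.height = maxY - result.y;
    return result;
}

class BoundingBoxDamage final : public Damage {
public:
    void add(const Rectangle& rect) override
    {
        if (rect.width <= 0 || rect.height <= 0)
            return;
        m_boundingBox = m_empty ? rect : uniteRectangles(m_boundingBox, rect);
        m_empty = false;
    }

    void add(const Damage& other) override
    {
        other.forEachRectangle([this](const Rectangle& rect) { add(rect); });
    }

    void forEachRectangle(const std::function<void(const Rectangle&)>& callback) const override
    {
        if (!m_empty)
            callback(m_boundingBox); // Exactly one rectangle: O(1) iteration, O(1) memory.
    }

private:
    Rectangle m_boundingBox;
    bool m_empty { true };
};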

As this implementation stores just a single rectangle, and as the operation of taking the bounding box of two rectangles is O(1), the evaluation is as follows:

  1. Performance
    • insertion is O(1) ✅
    • iteration is O(1) ✅
  2. Memory footprint
    • O(1) ✅
  3. Damage resolution
    • usually low ⚠️

Contrary to the Damage storing all input rects, this solution yields perfect performance and memory footprint at the expense of low damage resolution. However, in practice, the damage resolution of this solution is not always low. More specifically:

  • in the optimistic cases (raw damage clustered), the area of the bounding box is close to the area of the raw damage inside,
  • in the average cases, the approximation of the damaged region suffers from covering significant areas that were not damaged,
  • in the worst cases (small damage rectangles at opposite ends of a viewport diagonal), the approximation is very poor, and it may be as bad as covering the whole viewport.

As this solution requires minimal overhead while still providing a relatively useful damage approximation, in practice it is the baseline solution used in:

  • Chromium,
  • Firefox,
  • WPE and GTK WebKit when the UnifyDamagedRegions runtime preference is enabled, which means it’s used in GTK WebKit by default.

Damage using WebKit’s Region #

When it comes to more sophisticated Damage implementations, the simplest approach in the case of WebKit is to wrap a data structure already implemented in WebCore called Region. Its purpose is just as the name suggests — to store a region. More specifically, it’s meant to store the rectangles describing a region in a way that is efficient both for storage and for access, so as to take advantage of scanline coherence during rasterization. The key characteristic of the data structure is that it stores rectangles without overlaps. This is achieved by storing y-sorted lists of x-sorted, non-overlapping rectangles. Another important property is that, due to the specific internal representation, the number of integers stored per rectangle is usually smaller than 4. There are also some other properties that are, however, not very useful in the context of storing the damage. More details on the data structure itself can be found in J. E. Steinhart’s paper from 1991 titled SCANLINE COHERENT SHAPE ALGEBRA, published as part of the Graphics Gems II book.

The Damage implementation wrapping the Region was actually used by the GTK and WPE ports as a first, more sophisticated alternative to the bounding box Damage. Just as expected, it provided better damage resolution in some cases; however, it suffered from effectively degrading to a more expensive variant of the bounding box Damage in the majority of situations.

The above was inevitable as the implementation was falling back to bounding box Damage when the Region’s internal representation was getting too complex. In essence, this was addressing the Region’s biggest problem, which is that it can effectively store N² rectangles in the worst case due to the way it splits rectangles for storing purposes. More specifically, as the Region stores ledges and spans, each insertion of a new rectangle may lead to splitting O(N) existing rectangles. Such a situation is depicted in the image below, where 3 rectangles are being split into 9:

WebKit's Region storing method.

Putting the above fallback mechanism aside, the evaluation of Damage being a simple wrapper on top of Region is the following:

  1. Performance
    • insertion is O(log N) ✅
    • iteration is O(N²) ❌
  2. Memory footprint
    • O(N²) ❌
  3. Damage resolution
    • perfect

With the fallback added, the evaluation is technically the same as for the bounding box Damage once N exceeds the fallback point, yet with extra overhead. At the same time, for smaller N, the above evaluation didn’t really matter much, as in such cases the performance, memory footprint, and damage resolution were all very good.

Although this solution (with a fallback) yielded very good results for some simple scenarios (when N was small enough), it was not sustainable in the long run, as it did not address the majority of use cases, where it was actually a bit slower than the bounding box Damage while the results were similar.

R-Tree Damage #

In the pursuit of more sophisticated Damage implementations, one can think of wrapping/adapting data structures similar to quadtrees, KD-trees etc. However, in most of such cases, a lot of unnecessary overhead is added as the data structures partition the space so that, in the end, the input is stored without overlaps. As overlaps are not necessarily a problem for storing damage information, the list of candidate data structures can be narrowed down to the most performant data structures allowing overlaps. One of the most interesting of such options is the R-Tree.

In short, an R-Tree (rectangle tree) is a tree data structure that allows storing multiple entries (rectangles) in a single node. While the leaf nodes of such a tree store the original rectangles inserted into the data structure, each of the intermediate nodes stores the bounding box (minimum bounding rectangle, MBR) of its children. As the tree is balanced, the above means that with every next tree level from the top, the list of rectangles (either bounding boxes or original ones) gets bigger and more detailed. An example of an R-tree is depicted in Figure 5 of the Object Trajectory Analysis in Video Indexing and Retrieval Applications paper:

An example R-Tree structure.

The above perfectly shows the differences between the rectangles on various levels and can also visually suggest some ideas when it comes to adapting such a data structure into Damage:

  1. The first possibility is to make Damage a simple wrapper of R-Tree that would just build the tree and allow the Damage consumer to pick the desired damage resolution when iterating. Such an approach is possible because having the full R-Tree allows the iteration code to limit iteration to a certain level of the tree or to various levels from separate branches. The latter allows Damage to offer a particularly interesting API where the forEachRectangle(...) function could accept a parameter specifying how many rectangles (at most) are expected to be iterated.
  2. The other possibility is to make Damage an adaptation of R-Tree that conditionally prunes the tree while constructing it, so that it does not grow too much, thus maintaining a certain maximum height and hence a certain damage quality.

Regardless of the approach, the R-Tree construction also allows one to implement a simple filtering mechanism that eliminates input rectangles being duplicated or contained by existing rectangles on the fly. However, such a filtering is not very effective as it can only consider a limited set of rectangles i.e. the ones encountered during traversal required by insertion.
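
To illustrate the first of the two ideas above, the sketch below shows a depth-limited iteration; the node layout is a deliberately simplified assumption (reusing the illustrative Rectangle type from the earlier sketches) and is not tied to any particular R-Tree implementation:

#include <functional>
#include <memory>
#include <vector>

// Simplified, illustrative node layout.
struct RTreeNode {
    Rectangle boundingBox; // MBR of the children, or the stored rectangle in a leaf entry.
    std::vector<std::unique_ptr<RTreeNode>> children; // Empty for leaf entries.
};

// Emits either the bounding boxes found at maxDepth or the original (leaf)
// rectangles, whichever comes first on a given branch.
void forEachRectangleUpToDepth(const RTreeNode& node, unsigned maxDepth,
    const std::function<void(const Rectangle&)>& callback)
{
    if (!maxDepth || node.children.empty()) {
        callback(node.boundingBox);
        return;
    }
    for (const auto& child : node.children)
        forEachRectangleUpToDepth(*child, maxDepth - 1, callback);
}

With such a traversal, the deeper the allowed depth, the more (and smaller) rectangles are reported, which directly corresponds to a higher damage resolution.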

Damage as a simple R-Tree wrapper

Although this option may be considered very interesting, in practice, storing all the input rectangles in the R-Tree means storing N rectangles along with the overhead of a tree structure. In the worst-case scenario (node size of 2), the number of nodes in the tree may be as big as O(N), thus adding a lot of overhead required to maintain the tree structure. This fact alone gives this solution an unacceptable memory footprint. The other problem with this idea is that, in practice, the damage resolution selection is usually done once — during browser startup. Therefore, the ability to select the damage resolution at runtime brings no benefits while introducing unnecessary overhead.

The evaluation of the above is the following:

  1. Performance
    • insertion is O(log_M N), where M is the node size ✅
    • iteration is O(K) where K is a parameter and 0≤K≤N ✅
  2. Memory footprint
    • O(N) ❌
  3. Damage resolution
    • low to high

Damage as an R-Tree adaptation with pruning

Considering the problems the previous idea has, the option with pruning seems to address all of them:

  • the memory footprint can be controlled by specifying at which level of the tree the pruning should happen,
  • the damage resolution (level of the tree where pruning happens) can be picked on the implementation level (compile time), thus allowing some extra implementation tricks if necessary.

While it’s true that the above problems do not exist with this approach, the option with pruning — unfortunately — brings new problems that need to be considered. As a matter of fact, all the new problems it brings originate from the fact that each pruning operation leads to a loss of information and hence to tree deterioration over time.

Before actually introducing those new problems, it’s worth understanding more about how insertions work in the R-Tree.

When a rectangle is inserted into the R-Tree, the first step is to find a proper position for the new record (see the ChooseLeaf algorithm from Guttman1984). When the target node is found, there are two possibilities:

  1. adding the new rectangle to the target node does not cause overflow,
  2. adding the new rectangle to the target node causes overflow.

If no overflow happens, the new rectangle is just added to the target node. However, if overflow happens, i.e. the number of rectangles in the node exceeds the limit, the node splitting algorithm is invoked (see the SplitNode algorithm from Guttman1984) and the changes are propagated up the tree (see the AdjustTree algorithm from Guttman1984).

The node splitting, along with adjusting the tree, are very important steps within insertion as those algorithms are the ones that are responsible for shaping and balancing the tree. For example, when all the nodes in the tree are full and the new rectangle is being added, the node splitting will effectively be executed for some leaf node and all its ancestors, including root. It means that the tree will grow and possibly, its structure will change significantly.

Due to the above mechanics of R-Tree, it can be reasonably asserted that the tree structure becomes better as a function of node splits. With that, the first problem of the tree pruning becomes obvious: tree pruning on insertion limits the amount of node splits (due to smaller node splits cascades) and hence limits the quality of the tree structure. The second problem — also related to node splits — is that with all the information lost due to pruning (as pruning is the same as removing a subtree and inserting its bounding box into the tree) each node split is less effective as the leaf rectangles themselves are getting bigger and bigger due to them becoming bounding boxes of bounding boxes (…) of the original rectangles.

The above problems become more visible in practice when the R-tree input rectangles tend to be sorted. In general, one of the R-Tree’s problems is that its structure tends to be biased when the input rectangles are sorted. Although further insertions usually fix the structure of the biased tree, this only happens to some degree, as some tree nodes may never get split again. When pruning happens and the input is sorted (or partially sorted), fixing the biased tree is much harder and sometimes even impossible. This can be well explained with an example where a lot of rectangles from the same area are inserted into the tree. With the number of such rectangles being big enough, a lot of pruning will happen, and hence a lot of rectangles will be lost and replaced by larger bounding boxes. Then, if a series of new insertions starts inserting rectangles from a different area that is partially close to the original one, the new rectangles may end up being siblings of those large bounding boxes instead of the original rectangles that could have been clustered within nodes in a much more reasonable way.

Given the above problems, the evaluation of the whole idea of Damage being the adaptation of R-Tree with pruning is the following:

  1. Performance
    • insertion is O(log_M K), where M is the node size, K is a parameter, and 0<K≤N ✅
    • iteration is O(K) ✅
  2. Memory footprint
    • O(K) ✅
  3. Damage resolution
    • low to medium ⚠️

Although the above evaluation looks reasonable, in practice it’s very hard to pick a proper pruning strategy. When the tree is allowed to be taller, the damage resolution is usually better, but the increased memory footprint, logarithmic insertions, and increased iteration time combined pose a significant problem. On the other hand, when the tree is shorter, the damage resolution tends to be low enough not to justify using an R-Tree.

Grid-based Damage #

The last of the more sophisticated Damage implementations uses some ideas from the R-Tree and forms a very strict, flat structure. In short, the idea is to take some rectangular part of a plane and divide it into cells, thus forming a grid with C columns and R rows. Given such a division, each cell of the grid stores at most one rectangle, which is effectively a bounding box of the rectangles matched to that cell. The overview of the approach is presented in the image below:

Grid-based Damage creation process.

As the above situation is very straightforward, one may wonder what would happen if a rectangle spanned multiple cells, i.e. how the matching algorithm would work in that case.

Before diving into the matching, it’s important to note that, from the algorithmic perspective, the matching is very important as it accounts for the majority of operations when inserting a new rectangle into the Damage data structure. This is because, once the matched cell is known, the remaining part of insertion is just about taking the bounding box of the existing rectangle stored in the cell and the new rectangle, which has O(1) time complexity.

As for the matching itself, it can be done in various ways:

  • it can be done using strategies known from R-Tree, such as matching a new rectangle into the cell where the bounding box enlargement would be the smallest etc.,
  • it can be done by maximizing the overlap between the new rectangle and the given cell,
  • it can be done by matching the new rectangle’s center (or corner) into the proper cell,
  • etc.

The above matching strategies fall into 2 categories:

  • O(CR) matching algorithms that compare a new rectangle against existing cells while looking for the best match,
  • O(1) matching algorithms that calculate the target cell using a single formula.

Due to the nature of matching, the O(CR) strategies eventually lead to smaller bounding boxes stored in the Damage and hence to better damage resolution as compared to the O(1) algorithms. However, as the practical experiments show, the difference in damage resolution is not big enough to justify O(CR) time complexity over O(1). More specifically, the difference in damage resolution is usually unnoticeable, while the difference between O(CR) and O(1) insertion complexity is major, as the insertion is the most critical operation of the Damage data structure.

Due to the above, the matching method that has proven to be the most practical is matching the new rectangle’s center to the proper cell. It has O(1) time complexity, as it requires just a few arithmetic operations to calculate the center of the incoming rectangle and match it to the proper cell (see the implementation in WebKit). An example of such matching is presented in the image below:

Matching rectangles to proper cells.
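
A minimal sketch of such center-based matching is shown below. It assumes the grid covers a known rectangle (e.g. the viewport) divided into columns x rows equal cells, and it reuses the illustrative Rectangle type from the earlier sketches:

#include <algorithm>
#include <cstddef>

// Maps a rectangle to a grid cell index using its center point; an O(1) operation.
// Assumes gridRect is at least columns x rows pixels large.
size_t cellIndexForRectangle(const Rectangle& rect, const Rectangle& gridRect,
    size_t columns, size_t rows)
{
    const int centerX = rect.x + rect.width / 2;
    const int centerY = rect.y + rect.height / 2;
    const int cellWidth = gridRect.width / static_cast<int>(columns);
    const int cellHeight = gridRect.height / static_cast<int>(rows);

    // Clamp so that rectangles whose center falls outside the grid map to a border cell.
    const int column = std::clamp((centerX - gridRect.x) / cellWidth, 0, static_cast<int>(columns) - 1);
    const int row = std::clamp((centerY - gridRect.y) / cellHeight, 0, static_cast<int>(rows) - 1);
    return static_cast<size_t>(row) * columns + static_cast<size_t>(column);
}

Once the cell index is known, insertion boils down to replacing the rectangle stored in that cell with the bounding box of the stored rectangle and the incoming one, so the whole add(Rectangle) operation stays O(1).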

The overall evaluation of the grid-based Damage constructed the way described in the above paragraphs is as follows:

  1. performance
    • insertion is O(1) ✅
    • iteration is O(CR) ✅
  2. memory footprint
    • O(CR) ✅
  3. damage resolution
    • low to high (depending on the CR) ✅

Clearly, the fundamentals of the grid-based Damage are strong, but the data structure is heavily dependent on CR. The good news is that, in practice, even a fairly small grid such as 8x4 (CR=32) yields a high damage resolution. This means that this Damage implementation is a great alternative to the bounding box Damage as, even with a very small performance and memory footprint overhead, it yields much better damage resolution.

Moreover, the grid-based Damage implementation gives an opportunity for very handy optimizations that improve memory footprint, performance (iteration), and damage resolution further.

As the grid dimensions are given a priori, one can imagine that, intrinsically, the data structure needs to allocate a fixed-size array of rectangles with CR entries to store the cell bounding boxes.

One possibility for improvement in such a situation (assuming a small CR) is to use a vector along with a bitset so that only non-empty cells are stored in the vector.

The other possibility (again, assuming a small CR) is to not use a grid-based approach at all as long as the number of rectangles inserted so far does not exceed CR. In other words, the data structure can allocate an empty vector of rectangles upon initialization and then just append new rectangles to it as long as the insertion does not extend the vector beyond CR entries. In such a case, when CR is e.g. 32, up to 32 rectangles can be stored in their original form. If at some point the data structure detects that it would need to store 33 rectangles, it switches internally to the grid-based approach, thus always storing at most 32 cell rectangles. Also, note that in such a case, the first improvement (with the bitset) can still be used.

Summarizing the above, both improvements can be combined, and they allow the data structure to have a limited, small memory footprint, good performance, and perfect damage resolution as long as there are not too many damage rectangles. And if the number of input rectangles exceeds the limit, the data structure can still fall back to the grid-based approach and maintain very good results. In practice, situations where the input damage rectangles do not exceed CR (e.g. 32) are very common, and hence the above improvements are very important.
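
The sketch below shows how the second improvement could be wired up; it reuses the illustrative Rectangle type, the uniteRectangles() helper, and the cellIndexForRectangle() function from the earlier sketches, and is only meant to show the switching logic:

#include <cstddef>
#include <vector>

class HybridGridDamage {
public:
    HybridGridDamage(const Rectangle& gridRect, size_t columns, size_t rows)
        : m_gridRect(gridRect)
        , m_columns(columns)
        , m_rows(rows)
    {
    }

    void add(const Rectangle& rect)
    {
        if (rect.width <= 0 || rect.height <= 0)
            return;
        if (!m_usingGrid) {
            if (m_rects.size() < m_columns * m_rows) {
                m_rects.push_back(rect); // Keep the original rectangle while the limit allows it.
                return;
            }
            switchToGrid(); // Limit exceeded: redistribute the stored rectangles into cells.
        }
        addToCell(rect);
    }

private:
    void switchToGrid()
    {
        std::vector<Rectangle> original = std::move(m_rects);
        m_cells.assign(m_columns * m_rows, Rectangle { });
        m_usingGrid = true;
        for (const auto& rect : original)
            addToCell(rect);
    }

    void addToCell(const Rectangle& rect)
    {
        Rectangle& cell = m_cells[cellIndexForRectangle(rect, m_gridRect, m_columns, m_rows)];
        cell = (cell.width <= 0 || cell.height <= 0) ? rect : uniteRectangles(cell, rect);
    }

    Rectangle m_gridRect;
    size_t m_columns { 0 };
    size_t m_rows { 0 };
    bool m_usingGrid { false };
    std::vector<Rectangle> m_rects; // Original rectangles, used while their count stays within C*R.
    std::vector<Rectangle> m_cells; // Per-cell bounding boxes, used after switching to the grid.
};

The first improvement (the bitset) could be layered on top of this by tracking which cells are non-empty, so that iteration and storage only touch those cells.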

Overall, the grid-based approach with the above improvements has proven to be the best solution for all the embedded devices tried so far, and therefore such a Damage implementation is the baseline solution used in WPE and GTK WebKit when the UnifyDamagedRegions runtime preference is not enabled — which means it is used by default in WPE WebKit.

Conclusions #

The previous sections demonstrated various approaches to implementing the Damage data structure meant to store damage information. A summary of the results is presented in the table below:

Implementation | Insertion | Iteration | Memory | Overlaps | Resolution
Bounding box | O(1) ✅ | O(1) ✅ | O(1) ✅ | No | usually low ⚠️
Grid-based | O(1) ✅ | O(CR) ✅ | O(CR) ✅ | Yes | low to high (depending on the CR)
R-Tree (with pruning) | O(log_M K) ✅ | O(K) ✅ | O(K) ✅ | Yes | low to medium ⚠️
R-Tree (without pruning) | O(log_M N) ✅ | O(K) ✅ | O(N) ❌ | Yes | low to high
All rects | O(1) ✅ | O(N) ❌ | O(N) ❌ | Yes | perfect
Region | O(log N) ✅ | O(N²) ❌ | O(N²) ❌ | No | perfect

While all the solutions have various pros and cons, the Bounding box and Grid-based Damage implementations are the most lightweight and hence are most useful in generic use cases.

On typical embedded devices — where CPUs are quite powerful compared to GPUs — both of the above solutions are acceptable, so the final choice can be determined based on the actual use case. If the actual web application often yields clustered damage information, the Bounding box Damage implementation should be preferred. Otherwise (the majority of use cases), the Grid-based Damage implementation will work better.

On the other hand, on desktop-class devices — where CPUs are far less powerful than GPUs — the only acceptable solution is the Bounding box Damage, as it has minimal overhead while it still provides some decent damage resolution.

The above are the reasons for the default Damage implementations used by desktop-oriented GTK WebKit port (Bounding box Damage) and embedded-device-oriented WPE WebKit (Grid-based Damage).

When it comes to non-generic situations such as unusual hardware, specific applications etc. it’s always recommended to do a proper evaluation to determine which solution is the best fit. Also, the Damage implementations other than the two mentioned above should not be ruled out, as in some exotic cases, they may give much better results.

September 05, 2025 12:00 AM

September 04, 2025

Ricardo Cañuelo Navarro

First steps with Zephyr

In previous installments of this post series about Zephyr we had an initial introduction to it, and then we went through a basic example application that showcased some of its features. If you didn't read those, I heartily recommend you go through them before continuing with this one. If you did already, welcome back. In this post we'll see how to add support for a new device in Zephyr.

As we've been doing so far, we'll use a Raspberry Pi Pico 2W for our experiments. As of today (September 2nd, 2025), most of the devices in the RP2350 SoC are already supported, but there are still some peripherals that aren't. One of them is the inter-processor mailbox that allows both ARM Cortex-M33 cores1 to communicate and synchronize with each other. This opens some interesting possibilities, since the SoC contains two cores but only one is supported in Zephyr due to the architectural characteristics of this type of SoC2. It'd be nice to be able to use that second core for other things: a bare-metal application, a second Zephyr instance or something else, and the way to start the second core involves the use of the inter-processor mailbox.

Throughout the post we will reference our main source material for this task: the RP2350 datasheet, so make sure to keep it at hand.

The inter-processor mailbox peripheral

The processor subsystem block in the RP2350 contains a Single-cycle IO subsystem (SIO) that defines a set of peripherals that require low-latency and deterministic access from the processors. One of these peripherals is a pair of inter-processor FIFOs that allow passing data, messages or events between the two cores (section 3.1.5 in [1]).

The implementation and programmer's model for these is very simple:

  • A mailbox is a pair of FIFOs that are 32 bits wide and four entries deep.
  • One of the FIFOs can only be written by Core 0 and read by Core 1; the other can only be written by Core 1 and read by Core 0.
  • The SIO block has an IRQ output for each core to notify the core that it has received data in its FIFO. This interrupt is mapped to the same IRQ number (25) on each core.
  • A core can write to its outgoing FIFO as long as it's not full.

That's basically it3. The mailbox writing, reading, setup and status check are done through an also simple register interface that's thoroughly described in sections 3.1.5 and 3.1.11 of the datasheet.

The typical use case scenario of this peripheral may be an application distributed in two different computing entities (one in each core) cooperating and communicating with each other: one core running the main application logic in an OS while the other performs computations triggered and specified by the former. For instance, a modem/bridge device that runs the PHY logic in one core and a bridge loop in the other as a bare metal program, piping packets between network interfaces and a shared memory. The mailbox is one of the peripherals that make it possible for these independent cores to talk to each other.

But, as I mentioned earlier, in the RP2350 the mailbox has another key use case: after reset, Core 1 remains asleep until woken by Core 0. The process to wake up and run Core 1 involves both cores going through a state machine coordinated by passing messages over the mailbox (see [1], section 5.3).

Inter-processor mailbox support in Zephyr

NOTE: Not to be confused with mailbox objects in the kernel.

Zephyr has more than one API that fits this type of hardware: there's the MBOX interface, which models a generic multi-channel mailbox that can be used for signalling and messaging, and the IPM interface, which seems a bit more specific and higher-level, in the sense that it provides an API that's further away from the hardware. For this particular case, our driver could use either of these, but, as an exercise, I'm choosing to use the generic MBOX interface, which we can then use as a backend for the zephyr,mbox-ipm driver (a thin IPM API wrapper over an MBOX driver) so we can use the peripheral with the IPM API for free. This is also a simple example of driver composition.

The MBOX API defines functions to send a message, configure the device and check its status, register a callback handler for incoming messages and get the number of channels. That's what we need to implement, but first let's start with the basic foundation for the driver: defining the hardware.

Hardware definition

As we know, Zephyr uses device tree definitions extensively to configure the hardware and to query hardware details and parameters from the drivers, so the first thing we'll do is to model the peripheral into the device tree of the SoC.

In this case, the mailbox peripheral is part of the SIO block, which isn't defined in the device tree, so we'll start by adding this block as a placeholder for the mailbox and leave it there in case anyone needs to add support for any of the other SIO peripherals in the future. We only need to define its address range mapping according to the info in the datasheet:


sio: sio@d0000000 {
	compatible = "raspberrypi,pico-sio";
	reg = <0xd0000000 DT_SIZE_K(80)>;
};

We also need to define a minimal device tree binding for it, which can be extended later as needed (dts/bindings/misc/raspberrypi,pico-sio.yaml):


description: Raspberry Pi Pico SIO

compatible: "raspberrypi,pico-sio"

include: base.yaml

Now we can define the mailbox as a peripheral inside the SIO block. We'll create a device tree binding for it that will be based on the mailbox-controller binding and that we can extend as needed. To define the mailbox device, we only need to specify the IRQ number it uses, a name for the interrupt and the number of "items" (channels) to expect in a mailbox specifier, i.e. when we reference the device in another part of the device tree through a phandle. In this case we won't need any channel specification, since a CPU core only handles one mailbox channel:


sio: sio@d0000000 {
	compatible = "raspberrypi,pico-sio";
	reg = <0xd0000000 DT_SIZE_K(80)>;

	mbox: mbox {
		compatible = "raspberrypi,pico-mbox";
		interrupts = <25 RPI_PICO_DEFAULT_IRQ_PRIORITY>;
		interrupt-names = "mbox0";
		fifo-depth = <4>;
		#mbox-cells = <0>;
		status = "okay";
	};
};

The binding looks like this:


description: Raspberry Pi Pico interprocessor mailbox

compatible: "raspberrypi,pico-mbox"

include: [base.yaml, mailbox-controller.yaml]

properties:
  fifo-depth:
    type: int
    description: number of entries that the mailbox FIFO can hold
    required: true

Driver set up and code

Now that we have defined the hardware in the device tree, we can start writing the driver. We'll put the source code next to the rest of the mailbox drivers, in drivers/mbox/mbox_rpi_pico.c, and we'll create a Kconfig file for it (drivers/mbox/Kconfig.rpi_pico) to define a custom config option that will let us enable or disable the driver in our firmware build:


config MBOX_RPI_PICO
	bool "Inter-processor mailbox driver for the RP2350/RP2040 SoCs"
	default y
	depends on DT_HAS_RASPBERRYPI_PICO_MBOX_ENABLED
	help
	  Raspberry Pi Pico mailbox driver based on the RP2350/RP2040
	  inter-processor FIFOs.

Now, to make the build system aware of our driver, we need to add it to the appropriate CMakeLists.txt file (drivers/mbox/CMakeLists.txt):


zephyr_library_sources_ifdef(CONFIG_MBOX_RPI_PICO   mbox_rpi_pico.c)

And source our new Kconfig file in the main Kconfig for mailbox drivers:


source "drivers/mbox/Kconfig.rpi_pico"

Finally, we're ready to write the driver. The work here can basically be divided into three parts: the driver structure setup according to the MBOX API, the scaffolding needed to have our driver correctly plugged into the device tree definitions by the build system (according to the Zephyr device model), and the actual interfacing with the hardware. We'll skip over most of the hardware-specific details, though, and focus on the driver structure.

First, we will create a device object using one of the macros of the Device Model API. There are many ways to do this, but, in rough terms, what these macros do is to create the object from a device tree node identifier and set it up for boot time initialization. As part of the object attributes, we provide things like an init function, a pointer to the device's private data if needed, the device initialization level and a pointer to the device's API structure. It's fairly common to use DEVICE_DT_INST_DEFINE() for this and loop over the different instances of the device in the SoC with a macro like DT_INST_FOREACH_STATUS_OKAY(), so we'll use it here as well, even if we have only one instance to initialize:


DEVICE_DT_INST_DEFINE(
	0,
	rpi_pico_mbox_init,
	NULL,
	&rpi_pico_mbox_data,
	NULL,
	POST_KERNEL,
	CONFIG_MBOX_INIT_PRIORITY,
	&rpi_pico_mbox_driver_api);

Note that this macro requires the driver's compatible string to be specified by defining the DT_DRV_COMPAT macro:


#define DT_DRV_COMPAT raspberrypi_pico_mbox

In the device's API struct, we define the functions the driver will use to implement the API primitives. In this case:


static DEVICE_API(mbox, rpi_pico_mbox_driver_api) = {
	.send = rpi_pico_mbox_send,
	.register_callback = rpi_pico_mbox_register_callback,
	.mtu_get = rpi_pico_mbox_mtu_get,
	.max_channels_get = rpi_pico_mbox_max_channels_get,
	.set_enabled = rpi_pico_mbox_set_enabled,
};

The init function, rpi_pico_mbox_init(), referenced in the DEVICE_DT_INST_DEFINE() macro call above, simply needs to set the device in a known state and initialize the interrupt handler appropriately (but we're not enabling interrupts yet):


#define MAILBOX_DEV_NAME mbox0

static int rpi_pico_mbox_init(const struct device *dev)
{
	ARG_UNUSED(dev);

	LOG_DBG("Initial FIFO status: 0x%x", sio_hw->fifo_st);
	LOG_DBG("FIFO depth: %d", DT_INST_PROP(0, fifo_depth));

	/* Disable the device interrupt. */
	irq_disable(DT_INST_IRQ_BY_NAME(0, MAILBOX_DEV_NAME, irq));

	/* Set the device in a stable state. */
	fifo_drain();
	fifo_clear_status();
	LOG_DBG("FIFO status after setup: 0x%x", sio_hw->fifo_st);

	/* Initialize the interrupt handler. */
	IRQ_CONNECT(DT_INST_IRQ_BY_NAME(0, MAILBOX_DEV_NAME, irq),
		DT_INST_IRQ_BY_NAME(0, MAILBOX_DEV_NAME, priority),
		rpi_pico_mbox_isr, DEVICE_DT_INST_GET(0), 0);

	return 0;
}

Where rpi_pico_mbox_isr() is the interrupt handler.

The implementation of the MBOX API functions is really simple. For the send function, we need to check that the FIFO isn't full, that the message to send has the appropriate size and then write it in the FIFO:


static int rpi_pico_mbox_send(const struct device *dev,
			uint32_t channel, const struct mbox_msg *msg)
{
	ARG_UNUSED(dev);
	ARG_UNUSED(channel);

	if (!fifo_write_ready()) {
		return -EBUSY;
	}
	if (msg->size > MAILBOX_MBOX_SIZE) {
		return -EMSGSIZE;
	}
	LOG_DBG("CPU %d: send IP data: %d", sio_hw->cpuid, *((int *)msg->data));
	sio_hw->fifo_wr = *((uint32_t *)(msg->data));
	sev();

	return 0;
}

Note that the API lets us pass a channel parameter to the call, but we don't need it.

The mtu_get and max_channels_get calls are trivial: for the first one we simply need to return the maximum message size we can write to the FIFO (4 bytes), for the second we'll always return 1 channel:


#define MAILBOX_MBOX_SIZE sizeof(uint32_t)

static int rpi_pico_mbox_mtu_get(const struct device *dev)
{
	ARG_UNUSED(dev);

	return MAILBOX_MBOX_SIZE;
}

static uint32_t rpi_pico_mbox_max_channels_get(const struct device *dev)
{
	ARG_UNUSED(dev);

	/* Only one channel per CPU supported. */
	return 1;
}

The function to implement the set_enabled call will just enable or disable the mailbox interrupt depending on a parameter:


static int rpi_pico_mbox_set_enabled(const struct device *dev,
				uint32_t channel, bool enable)
{
	ARG_UNUSED(dev);
	ARG_UNUSED(channel);

	if (enable) {
		irq_enable(DT_INST_IRQ_BY_NAME(0, MAILBOX_DEV_NAME, irq));
	} else {
		irq_disable(DT_INST_IRQ_BY_NAME(0, MAILBOX_DEV_NAME, irq));
	}

	return 0;
}

Finally, the function for the register_callback call will store a pointer to a callback function for processing incoming messages in the device's private data struct:


struct rpi_pico_mailbox_data {
	const struct device *dev;
	mbox_callback_t cb;
	void *user_data;
};

static int rpi_pico_mbox_register_callback(const struct device *dev,
					uint32_t channel,
					mbox_callback_t cb,
					void *user_data)
{
	ARG_UNUSED(channel);

	struct rpi_pico_mailbox_data *data = dev->data;
	uint32_t key;

	key = irq_lock();
	data->cb = cb;
	data->user_data = user_data;
	irq_unlock(key);

	return 0;
}

Once interrupts are enabled, the interrupt handler will call that callback every time this core receives anything from the other one:


static void rpi_pico_mbox_isr(const struct device *dev)
{
	struct rpi_pico_mailbox_data *data = dev->data;

	/*
	 * Ignore the interrupt if it was triggered by anything that's
	 * not a FIFO receive event.
	 *
	 * NOTE: the interrupt seems to be triggered when it's first
	 * enabled even when the FIFO is empty.
	 */
	if (!fifo_read_valid()) {
		LOG_DBG("Interrupt received on empty FIFO: ignored.");
		return;
	}

	if (data->cb != NULL) {
		uint32_t d = sio_hw->fifo_rd;
		struct mbox_msg msg = {
			.data = &d,
			.size = sizeof(d)};
		data->cb(dev, 0, data->user_data, &msg);
	}
	fifo_drain();
}

The fifo_*() functions scattered over the code are helper functions that access the memory-mapped device registers. This is, of course, completely hardware-specific. For example:


/*
 * Returns true if the read FIFO has data available, ie. sent by the
 * other core. Returns false otherwise.
 */
static inline bool fifo_read_valid(void)
{
	return sio_hw->fifo_st & SIO_FIFO_ST_VLD_BITS;
}

/*
 * Discard any data in the read FIFO.
 */
static inline void fifo_drain(void)
{
	while (fifo_read_valid()) {
		(void)sio_hw->fifo_rd;
	}
}

Done, we should now be able to build and use the driver if we enable the CONFIG_MBOX config option in our firmware build.

Using the driver as an IPM backend

As I mentioned earlier, Zephyr provides a more convenient API for inter-processor messaging based on this type of devices. Fortunately, one of the drivers that implement that API is a generic wrapper over an MBOX API driver like this one, so we can use our driver as a backend for the zephyr,mbox-ipm driver simply by adding a new device to the device tree:


ipc: ipc {
	compatible = "zephyr,mbox-ipm";
	mboxes = <&mbox>, <&mbox>;
	mbox-names = "tx", "rx";
	status = "okay";
};

This defines an IPM device that takes two existing mailbox channels and uses them for receiving and sending data. Note that, since our mailbox only has one channel from the point of view of each core, both "rx" and "tx" channels point to the same mailbox, which implements the send and receive primitives appropriately.

Testing the driver

If we did everything right, now we should be able to signal events and send data from one core to another. That'd require both cores to be running, and, at boot time, only Core 0 is. So let's see if we can get Core 1 to run, which is, in fact, the most basic test of the mailbox we can do.

To do that in the easiest way possible, we can go back to the most basic sample program there is, the blinky sample program, which, in this board, should print a periodic message through the UART:


*** Booting Zephyr OS build v4.2.0-1643-g31c9e2ca8903 ***
LED state: OFF
LED state: ON
LED state: OFF
LED state: ON
...

To wake up Core 1, we need to send a sequence of inputs from Core 0 using the mailbox and check at each step in the sequence that Core 1 received and acknowledged the data by sending it back. The data we need to send is (all 4-byte words):

  • 0.
  • 0.
  • 1.
  • A pointer to the vector table for Core 1.
  • Core 1 stack pointer.
  • Core 1 initial program counter (ie. a pointer to its entry function).

in that order.

To send the data from Core 0 we need to instantiate an IPM device, which we'll first reference in the device tree through a chosen property pointing to the IPM node we created before:


/ {
	chosen {
		zephyr,ipc = &ipc;
	};

Once we enable the IPM driver in the firmware configuration (CONFIG_IPM=y), we can use the device like this:


static const struct device *const ipm_handle =
	DEVICE_DT_GET(DT_CHOSEN(zephyr_ipc));

int main(void)
{
	...

	if (!device_is_ready(ipm_handle)) {
		printf("IPM device is not ready\n");
		return 0;
	}

To send data we use ipm_send(), to receive data we'll register a callback that will be called every time Core 1 sends anything. In order to process the sequence handshake one step at a time we can use a message queue to send the received data from the IPM callback to the main thread:


K_MSGQ_DEFINE(ip_msgq, sizeof(int), 4, 1);

static void platform_ipm_callback(const struct device *dev, void *context,
				  uint32_t id, volatile void *data)
{
	printf("Message received from mbox %d: 0x%0x\n", id, *(int *)data);
	k_msgq_put(&ip_msgq, (const void *)data, K_NO_WAIT);
}

int main(void)
{
	...

	ipm_register_callback(ipm_handle, platform_ipm_callback, NULL);
	ret = ipm_set_enabled(ipm_handle, 1);
	if (ret) {
		printf("ipm_set_enabled failed\n");
		return 0;
	}

The last elements to add are the actual Core 1 code, as well as its stack and vector table. For the code, we can use a basic infinite loop that will send a message to Core 0 every now and then:


static inline void busy_wait(int loops)
{
	int i;

	for (i = 0; i < loops; i++)
		__asm__ volatile("nop");
}

#include <hardware/structs/sio.h>
static void core1_entry()
{
	int i = 0;

	while (1) {
		busy_wait(20000000);
		sio_hw->fifo_wr = i++;
	}
}

For the stack, we can just allocate a chunk of memory (it won't be used anyway) and for the vector table we can do the same and use an empty dummy table (because it won't be used either):


#define CORE1_STACK_SIZE 256
char core1_stack[CORE1_STACK_SIZE];
uint32_t vector_table[16];

And the code to handle the handshake would look like this:


void start_core1(void)
{
	uint32_t cmd[] = {
		0, 0, 1,
		(uintptr_t)vector_table,
		(uintptr_t)&core1_stack[CORE1_STACK_SIZE - 1],
		(uintptr_t)core1_entry};

	int i = 0;
	while (i < sizeof(cmd) / sizeof(cmd[0])) {
		int recv;

		printf("Sending to Core 1: 0x%0x (i = %d)\n", cmd[i], i);
		ipm_send(ipm_handle, 0, 0, &cmd[i], sizeof (cmd[i]));
		k_msgq_get(&ip_msgq, &recv, K_FOREVER);
		printf("Data received: 0x%0x\n", recv);
		i = cmd[i] == recv ? i + 1 : 0;
	}
}

You can find the complete example here.

So, finally we can build the example and check if Core 1 comes to life:


west build -p always -b rpi_pico2/rp2350a/m33 zephyr/samples/basic/blinky_two_cores
west flash -r uf2

Here's the UART output:


*** Booting Zephyr OS build v4.2.0-1643-g31c9e2ca8903 ***
Sending to Core 1: 0x0 (i = 0)
Message received from mbox 0: 0x0
Data received: 0x0
Sending to Core 1: 0x0 (i = 1)
Message received from mbox 0: 0x0
Data received: 0x0
Sending to Core 1: 0x1 (i = 2)
Message received from mbox 0: 0x1
Data received: 0x1
Sending to Core 1: 0x20000220 (i = 3)
Message received from mbox 0: 0x20000220
Data received: 0x20000220
Sending to Core 1: 0x200003f7 (i = 4)
Message received from mbox 0: 0x200003f7
Data received: 0x200003f7
Sending to Core 1: 0x10000905 (i = 5)
Message received from mbox 0: 0x10000905
Data received: 0x10000905
LED state: OFF
Message received from mbox 0: 0x0
LED state: ON
Message received from mbox 0: 0x1
Message received from mbox 0: 0x2
LED state: OFF
Message received from mbox 0: 0x3
Message received from mbox 0: 0x4
LED state: ON
Message received from mbox 0: 0x5
Message received from mbox 0: 0x6
LED state: OFF
Message received from mbox 0: 0x7
Message received from mbox 0: 0x8

That's it! We just added support for a new device and we "unlocked" a new functionality for this board. I'll probably take a break from Zephyr experiments for a while, so I don't know if there'll be a part IV of this series anytime soon. In any case, I hope you enjoyed it and found it useful. Happy hacking!

References

1: Or both Hazard3 RISC-V cores, but we won't get into that.

2: Zephyr supports SMP, but the ARM Cortex-M33 configuration in the RP2350 isn't built for symmetric multi-processing. Both cores are independent and have no cache coherence, for instance. Since these cores are meant for small embedded devices rather than powerful computing devices, the existence of multiple cores is meant to allow different independent applications (or OSs) running in parallel, cooperating and sharing the hardware.

3: There's an additional instance of the mailbox with its own interrupt as part of the non-secure SIO block (see [1], section 3.1.1), but we won't get into that either.

by rcn at September 04, 2025 12:00 PM

September 01, 2025

Igalia WebKit Team

WebKit Igalia Periodical #36

Update on what happened in WebKit in the week from August 25 to September 1.

The rewrite of the WebXR support continues, as do improvements when building for Android, along with smaller fixes in multimedia and standards compliance.

Cross-Port 🐱

The WebXR implementation has gained input through OpenXR, including support for the hand interaction—useful for devices which only support hand-tracking—and the generic simple profile. This was soon followed by the addition of support for the Hand Input module.

Aligned the SVGStyleElement type and media attributes with HTMLStyleElement's.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Support for FFMpeg GStreamer audio decoders was re-introduced because the alternative decoders making use of FDK-AAC might not be available in some distributions and Flatpak runtimes.

Graphics 🖼️

Usage of fences has been introduced to control frame submission of rendered WebXR content when using OpenXR. This approach avoids blocking in the renderer process waiting for frames to be completed, resulting in slightly increased performance.

Loading a font from a collection will now iterate until finding the correct one. This solved a few font rendering issues.

WPE WebKit 📟

WPE Platform API 🧩

New, modern platform API that supersedes usage of libwpe and WPE backends.

Changed WPEPlatform to be built as part of the libWPEWebKit library. This avoids duplicating some code in different libraries, brings in a small reduction in used space, and simplifies installation for packagers. Note that the wpe-platform-2.0 module is still provided, and applications that consume the WPEPlatform API must still check and use it.

WPE Android 🤖

Adaptation of WPE WebKit targeting the Android operating system.

Support for sharing AHardwareBuffer handles across processes is now available. This lays out the foundation to use graphics memory directly across different WebKit subsystems later on, making some code paths more efficient, and paves the way towards enabling the WPEPlatform API on Android.

The MediaSession API has been disabled when building for Android. The existing implementation would attempt to use the MPRIS D-Bus interface, which does not work on Android.

That’s all for this week!

by Igalia WebKit Team at September 01, 2025 09:02 PM

August 31, 2025

Luis Henriques

Making the filesystem-wide cache invalidation lightspeed in FUSE

One interesting aspect of FUSE user-space file systems is that caching can be handled at the kernel level. For example, if an application reads data from a file that happens to be on a FUSE file system, the kernel will keep that data in the page cache so that later, if that data is requested again, it will be readily available, without the need for the kernel to request it again from the FUSE server. But the kernel also caches other file system data. For example, it keeps track of metadata (file size, timestamps, etc) that may allow it to also reply to a stat(2) system call without requesting it from user-space.

On the other hand, a FUSE server has a mechanism to ask the kernel to forget everything related to an inode or to a dentry that the kernel already knows about. This is a very useful mechanism, particularly for a networked file system.

Imagine a network file system mounted in two different hosts, rocinante and rucio. Both hosts will read data from the same file, and this data will be cached locally. This is represented in the figure below, on the left. Now, if that file is deleted from the rucio host (same figure, on the right), rocinante will need to be notified about this deletion1. This is needed so that the locally cached data in the rocinante host can also be removed. In addition, if this is a FUSE file system, the FUSE server will need to ask the kernel to forget everything about the deleted file.

Network File System Caching

Notifying the kernel to forget everything about a file system inode or dentry can be easily done from a FUSE server using the FUSE_NOTIFY_INVAL_INODE and FUSE_NOTIFY_INVAL_ENTRY operations. Or, if the server is implemented using libfuse, by using the APIs fuse_lowlevel_notify_inval_inode() and fuse_lowlevel_notify_inval_entry(). Easy.
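
For illustration, a call into these APIs from a low-level libfuse server could look roughly like the sketch below. This is a minimal sketch assuming a valid struct fuse_session handle; the function and variable names are made up for the illustration and error handling is omitted.

#define FUSE_USE_VERSION 34
#include <fuse_lowlevel.h>
#include <string.h>

static void forget_deleted_file(struct fuse_session *se, fuse_ino_t ino,
                                fuse_ino_t parent, const char *name)
{
	/* Invalidate the kernel's cached data and attributes for the inode
	 * (offset 0, length 0 means the whole file). */
	fuse_lowlevel_notify_inval_inode(se, ino, 0, 0);

	/* Invalidate the cached dentry "name" under the "parent" directory. */
	fuse_lowlevel_notify_inval_entry(se, parent, name, strlen(name));
}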

But what if the FUSE file system needs to notify the kernel to forget about all the files in the file system? Well, the FUSE server would simply need to walk through all those inodes and notify the kernel, one by one. Tedious and time consuming. And most likely racy.

Asking the kernel to forget everything about all the files may sound like an odd thing to do, but there are cases where this is needed. For example, the CernVM File System does exactly this. This is a read-only file system, which was developed to distribute software across virtual machines. Clients will then mount the file system and cache data/meta-data locally. Changes to the file system may happen only on a Release Manager Machine, a specific server where the file system will be mounted in read/write mode. When this Release Manager is done with all the changes, they can all be merged and published atomically, as a new revision of the file system. Only then are the clients able to access this new revision, but all the data (and meta-data) they have cached locally will need to be invalidated.

And this is where a new mechanism that has just been merged into mainline kernel v6.16 comes in handy: a single operation that will ask the kernel to invalidate all dentries for a specific FUSE connection that the kernel knows about. After trying a few different approaches, I've implemented this mechanism for a project at Igalia by adding the new FUSE_NOTIFY_INC_EPOCH operation. This operation can be used from libfuse through fuse_lowlevel_notify_increment_epoch()2.
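
A heavily simplified sketch of how a server could use this, assuming the new libfuse helper takes the session handle like the other low-level notification functions (the helper is merged in libfuse but, as noted in the footnotes, not yet in any release, so the exact signature may still differ):

/* After publishing a new file system revision, one call is enough to
 * invalidate every dentry the kernel has cached for this connection. */
static int publish_new_revision(struct fuse_session *se)
{
	/* ... make the new revision visible to the server ... */

	return fuse_lowlevel_notify_increment_epoch(se);
}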

In a nutshell, every dentry (or directory entry) can have a time-to-live value associated with it; after this time has expired, it will need to be revalidated. This revalidation will happen when the kernel VFS layer does a file name look-up and finds a dentry cached (i.e. a dentry that has been looked-up before).
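
As a point of reference, in a low-level libfuse server this time-to-live is the entry_timeout value the server sets when replying to a lookup request. A minimal sketch, using the same headers as the snippet above (the helper name and timeout values are just illustrative):

static void reply_lookup(fuse_req_t req, fuse_ino_t ino, const struct stat *attr)
{
	struct fuse_entry_param e = {0};

	e.ino = ino;
	e.attr = *attr;
	e.entry_timeout = 1.0;	/* dentry considered valid for 1 second */
	e.attr_timeout = 1.0;	/* cached attributes valid for 1 second */

	fuse_reply_entry(req, &e);
}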

With this commit, the concept of an epoch was introduced: a FUSE server connection to the kernel has an epoch value, and every new dentry created also gets an epoch, initialised to the same value as the connection's. What the new FUSE_NOTIFY_INC_EPOCH operation does is simply increment the connection epoch value. Later, when the VFS is performing a look-up and finds a dentry cached, it will execute the FUSE callback function to revalidate it. At this point, FUSE will see that the dentry's epoch value is outdated and invalidate it.

Now, what's missing is an extra mechanism to periodically check for any dentries that need to be invalidated so that invalid dentries don't hang around for too long after the epoch is incremented. And that's exactly what's currently under discussion upstream. Hopefully it will shortly get into a state where it can be merged too.

Footnotes:

1

Obviously, this is a very simplistic description – it all depends on the actual file system design and implementation details, and specifically on the communication protocol being used to synchronise the different servers/clients across the network. In CephFS, for example, the clients get notified through its own Ceph-specific protocol, by having its Metadata Servers (MDS) revoke 'capabilities' that have been previously given to the clients, forcing them to request the data again if needed.

2

Note, however, that although this extra API has already been merged into libfuse, no release containing it has been made yet.

August 31, 2025 11:00 PM

August 25, 2025

Javier Fernández

Ed25519 Support Lands in Chrome: What It Means for Developers and the Web.

Introduction

Chrome M137 is the first stable version shipping the Ed25519 feature enabled by default, joining Safari and Firefox in their support. This is the last milestone of a three-year collaboration between Protocol Labs (who initiated the project), the IPFS Foundation, Open Impact Foundation, and WebTransitions.org.

In this post I’m going to analyze the impact this feature will have on the IPFS ecosystem and how a specific need in a niche area has become a mechanism to fund the web. Let’s start with a brief introduction of the feature and why it is important for the web platform.

Ed25519 key pairs have become a standard in many web applications, and the IPFS protocol adopted them as the default some time ago. From the outset, Ed25519 public keys have also served as primary identifiers in systems like dat/hypercore and SSB. Projects within this technology ecosystem tend to favor Ed25519 keys due to their smaller size and the potential for faster cryptographic operations compared to RSA keys.

In the context of Web Applications, the browser is responsible for dealing with the establishment of a secure connection to a remote server, which necessarily entails signature verification. There are two possible approaches for application developers:

  1. using the browser’s native cryptographic primitives, via the Web Cryptography API
  2. bundling crypto libraries into the application itself (eg as JS source files or WebAssembly binaries).

Developers have often faced a difficult choice between using RSA keys, which are natively supported in browsers, and the Ed25519 keys they generally prefer, since using the latter required relying on external libraries until M137. These external software components introduce potential security risks if compromised, for which the developer and the application could be held responsible. In most cases, it’s desirable for private keys to be non-extractable in order to prevent attacks from malicious scripts or browser extensions, something that cannot be guaranteed with JS/WASM-based implementations that “bundle in” cryptography capabilities. Additionally, “user-space” implementations like these are necessarily vulnerable to supply chain attacks out of the developer’s control, further increasing the risk and liability surface.

The work Igalia has been doing in recent years to contribute to the implementation of the Curve25519-based algorithms in the 3 main browsers (Chrome, Firefox and Safari) made it possible to promote Ed25519 and X25519 from the Web Incubation Group to the official W3C Web Cryptography API specification. This is a key milestone for the IPFS development community, since it guarantees a stable API to native cryptography primitives in the browser, allowing simpler, more secure, and more robust applications. Additionally, not having to bundle in cryptography means less code to maintain and fewer surfaces to secure over time.

Impact the entire Web Platform – and that’s a huge win

As already mentioned, Secure Curves like Ed25519 and X25519 play an important role in the cryptographic related logic of dApps. However, what makes this project particularly interesting is that it targets the commons – the Web Platform itself. The effort to fund and implement a key advantage for a niche area has the potential to positively impact  the entire Web Platform – and that’s a huge win.

There are several projects that will benefit from a native implementation of the Curve25519 algorithms in the browser’s Web Cryptography API.

Proton services

Proton offers services like Proton Mail, Proton Drive, Proton Calendar and Proton Wallet, which use Elliptic Curve Cryptography (ECC) based on Curve25519 by default. Their web applications make use of the browser’s Web Cryptography API when available. The work we have done to implement and ship the Ed25519 and X25519 algorithms in the 3 main browser engines allows users of these services to rely on the browser’s native implementation of their choice, leading to improved performance and security. It’s worth mentioning that Proton is also contributing to the Web Cryptography API via the work Daniel Huigens is doing as spec editor.

Matrix/Riot

The Matrix instant messaging web application uses Ed25519 for its device identity and cryptographic signing operations. These are implemented by the matrix-sdk-crypto Rust component, which is shared by both the web and native clients. This unified crypto engine is compiled to WebAssembly and integrated into the web client via the JavaScript SDK. Although theoretically, the web client could eventually use the browser’s Web Crypto API to implement the Ed25519-related operations, it might not be the right approach for now. The messaging app also requires other low-level cryptographic primitives that are not yet available in the Web API. Continued evolution of  the Web Crypto API, with more algorithms and low level operations, is a key factor in increasing adoption of the API.

Signal

The Signal Protocol is well known for its robust end-to-end encrypted messaging capabilities, and the use of Ed25519 and X25519 is an important piece of its security model. The Signal web client, which is implemented as an Electron application, is based on the Signal Protocol, which relies on these algorithms. The cryptographic layer is implemented in the libsignal internal component, and it is used by all Signal clients. The point is that, as an Electron app, the web client may be able to take advantage of Chrome’s Web Crypto API; however, as with the Matrix web client, the specific requirement of these messaging applications, along with some limitations of the Web API, might be reasons to rule out this approach for the time being.

Use of Ed25519 and X25519 in the IPFS ecosystem

Developing web features implies a considerable effort in terms of time and money. Contributing to a better and more complete Web Platform is an admirable goal, but it does not justify the investment if it does not address a specific need. In this section I’m going to analyze the impact of this feature in some projects in the IPFS ecosystem.

Libp2p

According to the spec, implementations MUST support Ed25519. The js-libp2p implementation for the JS APIs exposed by the browser provides a libp2p-crypto library that depends on the WebCrypto API, so it doesn’t require building third-party crypto components. The upstream work to replace the Ed25519 operations with Web Crypto alternatives has also shown benefits in terms of performance; see the PR 3100 for details. Backward compatibility with the JavaScript based implementation, provided via @noble/curves, is guaranteed though. 

There are several projects that depend on libp2p that would benefit from the use of the Web Cryptography API to implement their Ed25519 operations: 

  • Helia – a pure JavaScript implementation of the IPFS protocol capable of running in a browser or a Node.js server.
  • Peergos — a decentralised protocol and open-source platform for storage, social media and applications.
  • Lodestar – an Ethereum consensus client written in JS/TypeScript.
  • HOPR – a privacy-preserving network protocol for messaging.
  • Peerbit – a decentralized database framework with built-in encryption.
  • Topology – a decentralized network infrastructure tooling suite.

Helia

The Secure Curves are widely implemented in the main JavaScript engines, so now that the main browsers offer support in their stable releases, Helia developers can be fairly confident in relying on the Web Cryptography API implementation. The eventual removal of the @noble/curves dependency to implement the Ed25519 operations is going to positively impact  the Helia project, for the reasons already explained. However, Helia depends on @libp2p/webrtc for the implementation of the WebRTC transport layer. This package depends on @peculiar/x509, probably for the X509 certificate creation and verification, and also on @peculiar/webcrypto. The latter is a WebCrypto API polyfill that probably would be worth removing, given that most of the JS engines already provide a native implementation.

Lodestar

This project heavily depends on js-libp2p to implement its real-time peer-to-peer network stack (Discv5, GossipSub, Req/Resp and Noise). Its modular design enables it to operate as a decentralized Ethereum client for the libp2p applications ecosystem. It’s a good example because it doesn’t use Ed25519 for the implementation of its node identity; instead it’s based on secp256k1. However, Lodestar’s libp2p-based handshake uses the Noise protocol, which itself uses X25519 (Curve25519) for the Diffie–Hellman key exchange to establish a secure channel between peers. The Web Cryptography API provides operations for this key-sharing algorithm, and it has also been shipped in the stable releases of the 3 main browsers.

Peergos

This is an interesting example; unlike Helia, it’s implemented in Java, so it uses a custom libp2p implementation (in Java) built around jvm-libp2p, a native Java libp2p stack, and integrates cryptographic primitives on the JVM. It uses the Ed25519 operations for key generation, signatures, and identity purposes, but provides its own implementation as part of its cryptographic layer. Back in July 2024 they integrated a WebCryptoAPI-based implementation, so that it’s used when supported in the browser. As a technology targeting the Web Platform, it’d be an interesting move to eventually get rid of the custom Ed25519 implementation and rely on the browser’s Web Cryptography API instead, either through the libp2p-crypto component or its own cryptographic layer.

Other decentralized technologies

The implementation of the Curve25519 related algorithms in the browser’s Web Cryptography API has had an impact that goes beyond the IPFS community, as it has been widely used in many other technologies across the decentralized web. 

In this section I’m going to describe a few examples of relevant projects that are – or could potentially be – getting rid of third-party libraries to implement their Ed25519 and X25519 operations, relying on the native implementation provided by the browser.

Phantom wallet

Phantom was built specifically for Solana and designed to interact with Solana-based applications. Solana uses Ed25519 keys for identity and transaction signing, so Phantom generates and manages these keys within the browser or mobile device. This ensures that all operations (signing, message verification, address derivation) conform to Solana’s cryptographic standards. This integration comes from the official Solana JavaScript SDK: @solana/web3.js. In recent versions of the SDK, the Ed25519 operations use the native Crypto API if it is available, but it still provides a polyfill implemented with @noble/ed25519. According to the npm registry, the polyfill has a bundle size of 405 kB unpacked (minimized around 100 – 150 kB).

Making the Case for the WebCrypto API

In the previous sections we have discussed several web projects where the Ed25519 and X25519 algorithms are a fundamental piece of their cryptographic layer. The variety of solutions adopted to provide an implementation of the cryptographic primitives, such as those for identity and signing, has been remarkable.

  • @noble/curves – A high-security, easily auditable set of contained cryptographic libraries: zero or minimal dependencies, highly readable TypeScript / JS code, PGP-signed releases, and transparent NPM builds.
  • TweetNaCL.js – Port of TweetNaCl / NaCl to JavaScript for modern browsers and Node.js. Public domain.
  • Web Cryptography polyfills
  • Custom SDK implementations
    • matrix-sdk-crypto –  A no-network-IO implementation of a state machine that handles end-to-end encryption for Matrix clients. Compiled to WebAssembly and integrated into the web client via the JavaScript SDK.
    • Bouncy Castle Crypto – The Bouncy Castle Crypto package is a Java implementation of cryptographic algorithms. 
    • libsignal – signal-crypto provides cryptographic primitives such as AES-GCM; it uses RustCrypto’s implementations where possible.

As mentioned before, some web apps have strong cross-platform requirements, or simply the Web Crypto API is not flexible enough or lacks support for new algorithms (eg, Matrix and Signal). However, for the projects where the Web Platform is a key use case, the Web API offers way more advantages in terms of performance, bundle size, security and stability. 

The Web Crypto API is supported in the main JavaScript engines (Node.js, Deno) and the main web browsers (Safari, Chrome and Firefox). This full coverage of the Web Platform ensures high levels of interoperability and stability for both users and developers of web apps. Additionally, with the recent milestone announced in this post – the shipment of Secure Curves support in the latest stable Chrome release – the availability of these algorithms across the Web Platform is also remarkable.

Investment and prioritization

The work to implement and ship the Ed25519 and X25519 algorithms in the 3 main browser engines has been a long path. It took a few years of stabilizing the WICG document, prototyping, increasing the test coverage in the WPT repository, and incorporating both algorithms into the W3C official draft of the Web Cryptography API specification. Only after this final step could the shipment procedure be completed, with Chrome 137 being the last browser to ship the feature enabled by default.

This experience shows another example of a non-browser vendor pushing forward a feature that benefits the whole Web Platform, even if it initially covers a short-term need of a niche community, in this case the dWeb ecosystem. It’s also worth noting how the prioritization of this kind of web feature works in the Web Platform development cycle, and more specifically the browser feature shipment process. The browsers have their own budgets and priorities, and it’s the responsibility of the Web Platform consumers to invest in the features they consider critical for their user base. Without non-browser vendors pushing forward features, the Web Platform would evolve at a much slower pace.

The shortcomings of the Web Crypto API for some projects have been noted in this post; they prevent a wider adoption of the API and keep those projects relying on third-party cryptography components instead. The API needs to incorporate new algorithms, as has been recently discussed in the Web App Sec Working Group meetings; there is an issue to collect ideas for new algorithms and a WICG draft. Some of the proposed algorithms include post-quantum secure and modern cryptographic algorithms like ML-KEM, ML-DSA, and ChaCha20-Poly1305.

Recap and next steps

The Web Cryptography API specification has had a difficult journey in the last few years, as was explained in a previous post, when the Web Cryptography WG was dismantled. Browsers gave very low priority to this API until Daniel Huigens (Proton) took over the responsibility and became the only editor of the spec. The implementation progress has been almost exclusively targeting standalone JS engines until this project by Igalia was launched 3 years ago. 

The incorporation of Ed25519 and X25519 into the official W3C draft, along with default support in all three major web browsers, has brought this feature to the forefront of web application development where a cryptographic layer is required. 

The use of the Web API provides several advantages to the web authors:

  • Performance – Generally more performant implementation of the cryptographic operations, including a reduced bundle size for the web app.
  • Security – Reduced attack surface, including protection against JS timing attacks and memory disclosure via JS inspection; no supply-chain vulnerabilities.
  • Stability and Interoperability – Standardized and stable API, long-term maintenance by the browsers’ development teams. 

Streams and WebCrypto

This is a “decade problem”, as it’s noticeable from the WebCrypto issue 73, which now has some fresh traction thanks to a recent proposal by WinterTC, a technical committee focused on web-interoperable server runtimes; there is also an alternate proposal from the WebCrypto WG. It’s still unclear how much support to expect from implementors, especially the three main browser vendors, but Chrome has at least expressed strong opposition to the idea of a streaming encryption / decryption. However, there is a clear indication of support for hashing of streams, which is perhaps the use case from which IPFS developers would benefit most. Streams would have a big impact on many IPFS related use cases, especially in combination with better support for BLAKE3 which would be a major step forward.

by jfernandez at August 25, 2025 09:31 PM

Igalia WebKit Team

WebKit Igalia Periodical #35

Update on what happened in WebKit in the week from August 18 to August 25.

This week brings continued improvements on the WebXR front, more layout tests passing, support for CSS's generic font family for math, improvements in the graphics stack, and an Igalia Chat episode!

Cross-Port 🐱

Align the experimental CommandEvent with recent specification changes. This should finalise the implementation ready to enable by default.

The WebXR implementation has gained support to funnel usage permission requests through the public API for immersive sessions. Note that this is a basic implementation, and fine-grained control of requested session capabilities may be added at a later time.

The GTK MiniBrowser has been updated to handle WebKitXRPermissionRequest accordingly.

Implemented rawKeyDown and rawKeyUp in WKTestRunner for WPEWebKit and WebKitGTK, which made more than 300 layout tests pass.

Enable command and commandfor attributes in stable. These are part of the invoker commands API for buttons.

Graphics 🖼️

The CSS font-family: math generic font family is now supported in WebKit. This is part of the CSS Fonts Level 4 specification.

The WebXR implementation has gained the ability to use GBM graphics buffers as a fallback, which allows usage with drivers that do not provide the EGL_MESA_image_dma_buf_export extension, yet use GBM for buffer allocation.

The WebXR render loop has been simplified by using a work queue and offloading the session handling to the render thread.

Community & Events 🤝

Early this month, a new episode of Igalia Chat titled "Get Down With the WebKit" was released, where Brian Kardell and Eric Meyer talk with Igalia's Alejandro (Alex) Garcia about the WebKit project and Igalia's WPE port.

That’s all for this week!

by Igalia WebKit Team at August 25, 2025 07:42 PM

August 22, 2025

Eric Meyer

No, Google Did Not Unilaterally Decide to Kill XSLT

It’s uncommon, but not unheard of, for a GitHub issue to spark an uproar.  That happened over the past month or so as the WHATWG (Web Hypertext Application Technology Working Group, which I still say should have called themselves a Task Force instead) issue “Should we remove XSLT from the web platform?” was opened, debated, and eventually locked once the comment thread started spiraling into personal attacks.  Other discussions have since opened, such as a counterproposal to update XSLT in the web platform, thankfully with (thus far) much less heat.

If you’re new to the term, XSLT (Extensible Stylesheet Language Transformations) is an XML language that lets you transform one document tree structure into another.  If you’ve ever heard of people styling their RSS and/or Atom feeds to look nice in the browser, they were using some amount of XSLT to turn the RSS/Atom into HTML, which they could then CSS into prettiness.

This is not the only use case for XSLT, not by a long shot, but it does illustrate the sort of thing XSLT is good for.  So why remove it, and who got this flame train rolling in the first place?

Before I start, I want to note that in this post, I won’t be commenting on whether or not XSLT support should be dropped from browsers or not.  I’m also not going to be systematically addressing the various reactions I’ve seen to all this.  I have my own biases around this — some of them in direct conflict with each other! — but my focus here will be on what’s happened so far and what might lie ahead.

Also, Brian and I talked with Liam Quin about all this, if you’d rather hear a conversation than read a blog post.

As a very quick background, various people have proposed removing XSLT support from browsers a few times over the quarter-century-plus since support first landed.  It was discussed in both the early and mid-2010s, for example.  At this point, browsers all more or less support XSLT 1.0, whereas the latest version of XSLT is 3.0.  I believe they all do so with C++ code, which is therefore not memory-safe, that is baked into the code base rather than supported via some kind of plugged-in library, like Firefox using PDF.js to support PDFs in the browser.

Anyway, back on August 1st, Mason Freed of Google opened issue #11523 on WHATWG’s HTML repository, asking if XSLT should be removed from browsers and giving a condensed set of reasons why it might be a good idea.  He also included a WASM-based polyfill he’d written to provide XSLT support, should browsers remove it, and opened “Investigate deprecation and removal of XSLT” in the Chromium bug tracker.

“So it’s already been decided and we just have to bend over and take the changes our Googlish overlords have decreed!” many people shouted.  It’s not hard to see where they got that impression, given some of the things Google has done over the years, but that’s not what’s happening here.  Not at this point.  I’d like to set some records straight, as an outside observer of both Google and the issue itself.

First of all, while Mason was the one to open the issue, this was done because the idea was raised in a periodic WHATNOT meeting (call), where someone at Mozilla was actually the one to bring it up, after it had come up in various conversations over the previous few months.  After Mason opened the issue, members of the Mozilla and WebKit teams expressed (tentative, mostly) support for the idea of exploring this removal.  Basically, none of the vendors are particularly keen on keeping native XSLT support in their codebases, particularly after security flaws were found in XSLT implementations.

This isn’t the first time they’ve all agreed it might be nice to slim their codebases down a little by removing something that doesn’t get a lot of use (relatively speaking), and it won’t be the last.  I bet they’ve all talked at some point about how nice it would be to remove BMP support.

Mason mentioned that they didn’t have resources to put toward updating their XSLT code, and got widely derided for it. “Google has trillions of dollars!” people hooted.  Google has trillions of dollars.  The Chrome team very much does not.  They probably get, at best, a tiny fraction of one percent of those dollars.  Whether Google should give the Chrome team more money is essentially irrelevant, because that’s not in the Chrome team’s control.  They have what they have, in terms of head count and time, and have to decide how those entirely finite resources are best spent.

(I will once again invoke my late-1900s formulation of Hanlon’s Razor: Never attribute to malice that which can be more adequately explained by resource constraints.)

Second of all, the issue was opened to start a discussion and gather feedback as the first stage of a multi-step process, one that could easily run for years.  Google, as I assume is true for other browser makers, has a pretty comprehensive method for working out whether removing a given feature is tenable or not.  Brian and I talked with Rick Byers about it a while back, and I was impressed by both how many things have been removed, and what they do to make sure they’re removing the right things.

Here’s one (by no means the only!) way they could go about this:

  1. Set up a switch that allows XSLT to be disabled.
  2. In the next release of Chrome, use the switch to disable XSLT in one percent of all Chrome downloads.
  3. See if any bug reports come in about it.  If so, investigate further and adjust as necessary if the problems are not actually about XSLT.
  4. If not, up the percentage of XSLT-disabled downloads a little bit at a time over a number of releases.  If no bugs are reported as the percentage of XSLT-disabled users trends toward 100%, then prepare to remove it entirely.
  5. If, on the other hand, it becomes clear that removing XSLT will be a widely breaking change  —  where “widely” can still mean a very tiny portion of their total user base — then XSLT can be re-enabled for all users as soon as possible, and the discussion taken back up with this new information in hand.

Again, that is just one of several approaches Google could take, and it’s a lot simpler than what they would most likely actually do, but it’s roughly what they default to, as I understand it.  The process is slow and deliberate, building up a picture of actual use and user experience.

Third of all, opening a bug that includes a pull request of code changes isn’t a declaration of countdown to merge, it’s a way of making crystal clear (to those who can read the codebase) exactly what the proposal would entail.  It’s basically a requirement for the process of making a decision to start, because it sets the exact parameters of what’s being decided on.

That said, as a result of all this, I now strongly believe that every proposed-removal issue should point to the process and where the issue stands in it. (And write down the process if it hasn’t been already.) This isn’t for the issue’s intended audience, which was other people within WHATWG who are familiar with the usual process and each other, but for cases of context escape, like happened here.  If a removal discussion is going to be held in public, then it should assume the general public will see it and provide enough context for the general public to understand the actual nature of the discussion.  In the absence of that context, the nature of the discussion will be assumed, and every assumption will be different.

There is one thing that we should all keep in mind, which is that “remove from the web platform” really means “remove from browsers”.  Even if this proposal goes through, XSLT could still be used server-side.  You could use libraries that support XSLT versions more recent than 1.0, even!  Thus, XML could still be turned into HTML, just not in the client via native support, though JS or WASM polyfills, or even add-on extensions, would still be an option.  Is that good or bad?  Like everything else in our field, the answer is “it depends”.

Just in case your eyes glazed over and you quickly skimmed to see if there was a TL;DR, here it is:

The discussion was opened by a Google employee in response to interest from multiple browser vendors in removing built-in XSLT, following a process that is opaque to most outsiders.  It’s a first step in a multi-step evaluation process that can take years to complete, and whose outcome is not predetermined.  Tempers flared and the initial discussion was locked; the conversation continues elsewhere.  There are good reasons to drop native XSLT support in browsers, and also good reasons to keep or update it, but XSLT is not itself at risk.

 

Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at August 22, 2025 04:22 PM

August 20, 2025

Eric Meyer

To Infinity… But Not Beyond!

Previously on meyerweb, I explored ways to do strange things with the infinity keyword in CSS calculation functions.  There were some great comments on that post, by the way; you should definitely go give them a read.  Anyway, in this post, I’ll be doing the same thing, but with different properties!

When last we met, I’d just finished up messing with font sizes and line heights, and that made me think about other text properties that accept lengths, like those that indent text or increase the space between words and letters.  You know, like these:

div:nth-of-type(1) {text-indent: calc(infinity * 1ch);}
div:nth-of-type(2) {word-spacing: calc(infinity * 1ch);}
div:nth-of-type(3) {letter-spacing: calc(infinity * 1ch);}
<div>I have some text and I cannot lie!</div>
<div>I have some text and I cannot lie!</div>
<div>I have some text and I cannot lie!</div>

According to Frederic Goudy, I am now the sort of man who would steal an infinite number of sheep.  Which is untrue, because, I mean, where would I put them?

Consistency across Firefox, Chrome, and Safari

Visually, these all came to exactly the same result, textually speaking, with just very small (probably line-height-related) variances in element height.  All get very large horizontal overflow scrolling, yet scrolling out to the end of that overflow reveals no letterforms at all; I assume they’re sat just offscreen when you reach the end of the scroll region.  I particularly like how the “I” in the first <div> disappears because the first line has been indented a few million (or a few hundred undecillion) pixels, and then the rest of the text is wrapped onto the second line.  And in the third <div>, we can check for line-leading steganography!

When you ask for the computed values, though, that’s when things get weird.

Text property results: computed value for…

Browser              text-indent      word-spacing     letter-spacing
Safari               33554428px       33554428px       33554428px
Chrome               33554400px       3.40282e+38px    33554400px
Firefox (Nightly)    3.40282e+38px    3.40282e+38px    3.40282e+38px

Safari and Firefox are at least internally consistent, if many orders of magnitude apart from each other.  Chrome… I don’t even know what to say.  Maybe pick a lane?

I have to admit that by this point in my experimentation, I was getting a little bored of infinite pixel lengths.  What about infinite unitless numbers, like line-height or  —  even better  —  z-index?

div {
	position: absolute;
}
div:nth-of-type(1) {
	top: 10%;
	left: 1em;
	z-index: calc(infinity + 1);
}
div:nth-of-type(2) {
	top: 20%;
	left: 2em;
	z-index: calc(infinity);
}
div:nth-of-type(3) {
	top: 30%;
	left: 3em;
	z-index: 32767;
}
<div>I’m really high!</div>
<div>I’m really high!</div>
<div>I’m really high!</div>
The result you get in any of Firefox, Chrome, or Safari

It turns out that in CSS you can go to infinity, but not beyond, because the computed values were the same regardless of whether the calc() value was infinity or infinity + 1.

z-index values
Browser Computed value
Safari 2147483647
Chrome 2147483647
Firefox (Nightly) 2147483647

Thus, the first two <div>s were a long way above the third, but were themselves drawn with the later-painted <div> on top of the first.  This is because in positioning, if overlapping elements have the same z-index value, the one that comes later in the DOM gets painted on top of any that come before it.

This does also mean you can have a finite value beat infinity.  If you change the previous CSS like so:

div:nth-of-type(3) {
	top: 30%;
	left: 3em;
	z-index: 2147483647;
}

…then the third <div> is painted atop the other two, because they all have the same computed value.  And no, increasing the finite value to a value equal to 2,147,483,648 or higher doesn’t change things, because the computed value of anything in that range is still 2147483647.

The results here led me to an assumption that browsers (or at least the coding languages used to write them) use a system where any “infinity” that has multiplication, addition, or subtraction done to it just returns “infinite”.  So if you try to double Infinity, you get back Infinity (or Infinite or Inf or whatever symbol is being used to represent the concept of the infinite).  Maybe that’s entry-level knowledge for your average computer science major, but I was only one of those briefly and I don’t think it was covered in the assembler course that convinced me to find another major.

Looking across all those years back to my time in university got me thinking about infinite spans of time, so I decided to see just how long I could get an animation to run.

div {
	animation-name: shift;
	animation-duration: calc(infinity * 1s);
}
@keyframes shift {
	from {
		transform: translateX(0px);
	}
	to {
		transform: translateX(100px);
	}
}
<div>I’m timely!</div>

The results were truly something to behold, at least in the cases where beholding was possible.  Here’s what I got for the computed animation-duration value in each browser’s web inspector Computed Values tab or subtab:

animation-duration values

Browser              Computed value     As years
Safari               🤷🏽
Chrome               1.79769e+308s      5.7004376e+300
Firefox (Nightly)    3.40282e+38s       1.07902714e+31

Those are… very long durations.  In Firefox, the <div> will finish the animation in just a tiny bit over ten nonillion (ten quadrillion quadrillion) years.  That’s roughly ten times as long as it will take for nearly all the matter in the known Universe to have been swallowed by supermassive galactic black holes.

In Chrome, on the other hand, completing the animation will take an incomprehensibly longer amount of time than our current highest estimate for the amount of time it will take for all the protons and neutrons in the observable Universe to decay into radiation, assuming protons actually decay. (Source: Wikipedia’s Timeline of the far future.)

“Okay, but what about Safari?” you may be asking.  Well, there’s no way as yet to find out, because while Safari loads and renders the page like usual, the page then becomes essentially unresponsive.  Not the browser, just the page itself.  This includes not redrawing or moving the scrollbar gutters when the window is resized, or showing useful information in the Web Inspector.  I’ve already filed a bug, so hopefully one day we’ll find out whether its temporal limitations are the same as Chrome’s or not.

It should also be noted that it doesn’t matter whether you supply 1s or 1ms as the thing to multiply with infinity: you get the same result either way.  This makes some sense, because any finite number times infinity is still infinity.  Well, sort of.  But also yes.

So what happens if you divide a finite amount by infinity?  In browsers, you very consistently get nothing!

div {
	animation-name: shift;
	animation-duration: calc(100000000000000000000000s / infinity);
}

(Any finite number could be used there, so I decided to type 1 and then hold the 0 key for a second or two, and use the resulting large number.)

Division-by-infinity results
Browser Computed value
Safari 0
Chrome 0
Firefox (Nightly) 0

Honestly, seeing that kind of cross-browser harmony… that was soothing.

And so we come full circle, from something that yielded consistent results to something else that yields consistent results.  Sometimes, it’s the little wins that count the most.

Just not infinitely.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at August 20, 2025 02:49 PM

Jasmine Tang

The hitchhiker's guide to LLVM debugging tools

Jasmine introduces to beginners some debugging tools that benefit their LLVM development.

August 20, 2025 12:00 AM

August 18, 2025

Igalia WebKit Team

WebKit Igalia Periodical #34

Update on what happened in WebKit in the week from August 11 to August 18.

This week we saw updates in WebXR support, better support for changing audio outputs, enabling of the GLib API when building the JSCOnly port, improvements to damage propagation, WPE platform enhancements, and more!

Cross-Port 🐱

Complementing our WebXR efforts, it is now possible to query whether a WebView is in immersive mode and request to leave immersive mode.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Switching audio outputs has been changed to use gst_device_reconfigure_element() instead of relying on knowledge about how different GStreamer sink elements handle the choice of output device. Note that audio output selection support is still in development and disabled by default; the ExposeSpeakers, ExposeSpeakersWithoutMicrophone, and PerElementSpeakerSelection feature flags may be toggled to test it.

While most systems use PipeWire or PulseAudio these days, some systems may need a fix for the corresponding ALSA and OSS elements, which has been already merged in GStreamer.

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

Support for enabling the GLib API when building the JSCOnly port has been added.

Graphics 🖼️

With the #44192 PR landed, the damage propagation feature is now able to propagate damage from accelerated 2D canvas.

WPE WebKit 📟

Fixed a minor bug in WPE: pressing the Esc key inside a dialog now closes the dialog. See PR #49265.

WPE Platform API 🧩

New, modern platform API that supersedes usage of libwpe and WPE backends.

The WPEPlatform DRM backend can now report available input devices. This is mainly used to support Interaction Media Features queries, which allow web sites to better adapt to the available input devices.

That’s all for this week!

by Igalia WebKit Team at August 18, 2025 07:21 PM

Vivienne Watermeier

CEA-608 captions in Media Source Extensions with webkitgtk

Recently, I have been working on webkitgtk support for in-band text tracks in Media Source Extensions, so far just for WebVTT in MP4. Eventually, I noticed a page that seemed to be using a CEA-608 track - most likely unintentionally, not expecting it to be handled - so I decided to take a look how that might work. Take a look at the resulting PR here: https://github.com/WebKit/WebKit/pull/47763

Now, if you’re not already familiar with subtitle and captioning formats, particularly CEA-608, you might assume they must be straightforward compared to audio and video. After all, it’s just a bit of text and some timestamps, right?

However, even WebVTT as a text-based format already provides lots of un- or poorly supported features that don’t mesh well with MSE - for details on those open questions, take a look at Alicia’s session on the topic: https://github.com/w3c/breakouts-day-2025/issues/14

Quick introduction to CEA-608 #

CEA-608, also known as line 21 captions, is responsible for encoding captions as a fixed-bitrate stream of byte pairs in an analog NTSC broadcast. As the name suggests, they are transmitted during the vertical blanking period, on line 21 (and line 284, for the second field) - imagine this as the mostly blank area “above” the visible image. This provides space for up to 4 channels of captioning, plus some additional metadata about the programming, though due to the very limited bandwidth, these capabilities were rarely used to their full extent.

While digital broadcasts provide captioning defined by its successor standard CEA-708, this newer format still provides the option to embed 608 byte pairs. This is still quite common, and is enabled by later standards defining a digital encoding, known as Caption Distribution Packets.

These are also what enables CEA-608 tracks in MP4.

Current issues, and where to go in the future #

The main issue I’ve encountered in trying to make CEA-608 work in an MSE context lies in its origin as a fixed-bitrate stream - there is no concept of cues, no defined start or end, just one continuous stream.

As WebKit internally understands only WebVTT cues, we rely on GStreamer’s cea608tott element for the conversion to WebVTT. Essentially, this element needs to create cues with well-defined timestamps, which works well enough if we have the entire stream present on disk.

However, when 608 is present as a track in an MSE stream, how do we tell if the “current” cue is continued in the next SourceBuffer? Currently, cea608tott will just wait for more data, and emit another cue once it encounters a line break, or its current line buffer fills up, but this also means the final cue will be swallowed, because there will never be “more data” to allow for that decision.

The solution would be to always cut off cues at SourceBuffer boundaries, so cues might appear to be split awkwardly to the viewer. Overall, this conversion to VTT won’t reproduce the captions as they were intended to be viewed, at least not currently. In particular, roll-up mode can’t easily be emulated using WebVTT.

The other issue is that I’ve assumed for the current patch that CEA-608 captions will be present as a separate MP4 track, while in practice they’re usually injected into the video stream, which will be harder to handle well.

Finally, there is the risk of breaking existing websites that might have unintentionally left CEA-608 captions in, and don’t handle a surprise duplicate text track well.

Takeaway #

While this patch only provides experimental support so far, I feel this has given me valuable insight into how inband text tracks can work with various formats aside from just WebVTT. Ironically, CEA-608 even avoids some of WebVTT’s issues - there are no gaps or overlapping cues to worry about, for example.

Either way, I’m looking forward to improving on WebVTT’s pain points, and maybe adding other formats eventually!

August 18, 2025 12:00 AM

August 15, 2025

Tiago Vignatti

From Chromium to Community (2025)

In the first week of June (2025), our team at Igalia held our regular meeting about Chromium.

We talked about our technical projects, but also about where the Chromium project is heading, given all the investments going to AI, and this interesting initiative from the Linux Foundation to fund open development of Chromium.

We also held our annual Igalia meeting, filled with many special moments — one of them being when Valerie, who had previously shared how Igalia is co-operatively managed, spoke about her personal journey and involvement with other cooperatives.

by Author at August 15, 2025 02:28 PM

August 13, 2025

Ricardo Cañuelo Navarro

First steps with Zephyr (II)

In the previous post we set up a Zephyr development environment and checked that we could build applications for multiple different targets. In this one we'll work on a sample application that we can use to showcase a few Zephyr features and as a template for other applications with a similar workflow.

We'll simulate a real work scenario and develop a firmware for a hardware board (in this example it'll be a Raspberry Pi Pico 2W) and we'll set up a development workflow that supports the native_sim target, so we can do most of the programming and software prototyping in a simulated environment without having to rely on the hardware. When developing for new hardware, it's a common practice that the software teams need to start working on firmware and drivers before the hardware is available, so the initial stages of software development for new silicon and boards are often tested on software or hardware emulators. Then, after the prototyping is done, we can deploy and test the firmware on the real board. We'll see how we can do a simple behavioral model of some of the devices we'll use in the final hardware setup and how we can leverage this workflow to unit-test and refine the firmware.

This post is a walkthrough of the whole application. You can find the code here.

Application description

The application we'll build and run on the Raspberry Pi Pico 2W will basically just listen for a button press. When the button is pressed the app will enqueue some work to be done by a processing thread and the result will be published via I2C for a controller to request. At the same time, it will configure two serial consoles, one for message logging and another one for a command shell that can be used for testing and debugging.

These are the main features we'll cover with this experiment:

  • Support for multiple targets.
  • Target-specific build and hardware configuration.
  • Logging.
  • Multiple console output.
  • Zephyr shell with custom commands.
  • Device emulation.
  • GPIO handling.
  • I2C target handling.
  • Thread synchronization and message-passing.
  • Deferred work (bottom halves).

Hardware setup

Besides the target board and the development machine, we'll be using a Linux-based development board that we can use to communicate with the Zephyr board via I2C. Anything will do here, I used a very old Raspberry Pi Model B that I had lying around.

The only additional peripheral we'll need is a physical button connected to a couple of board pins. If we don't have any, a jumper cable and a steady hand will also work. Optionally, to take full advantage of the two serial ports, a USB - TTL UART converter will be useful. Here's what the full setup looks like:

   +--------------------------+
   |                          |    Eth
   |      Raspberry Pi        |---------------+
   |                          |               |
   +--------------------------+               |
      6    5   3                              |
      |    |   |                              |
      |   I2C I2C       /                     |
     GND  SCL SDA    __/ __                   |
      |    |   |    |     GND                 |
      |    |   |    |      |                  |
      18   7   6    4     38                  |
   +--------------------------+            +-------------+
   |                          |    USB     | Development |
   |   Raspberry Pi Pico 2W   |------------|   machine   |
   |                          |            +-------------+
   +--------------------------+                |
          13      12      11                   |
           |      |       |                    |
          GND   UART1    UART1                 |
           |     RX       TX                   |
           |      |       |                    |
          +-----------------+     USB          |
          |  USB - UART TTL |------------------+
          |    converter    |
          +-----------------+

For additional info on how to set up the Linux-based Raspberry Pi, see the appendix at the end.

Setting up the application files

Before we start coding we need to know how we'll structure the application. There are certain conventions and file structure that the build system expects to find under certain scenarios. This is how we'll structure the application (test_rpi):


test_rpi
├── boards
│   ├── native_sim_64.conf
│   ├── native_sim_64.overlay
│   ├── rpi_pico2_rp2350a_m33.conf
│   └── rpi_pico2_rp2350a_m33.overlay
├── CMakeLists.txt
├── Kconfig
├── prj.conf
├── README.rst
└── src
    ├── common.h
    ├── emul.c
    ├── main.c
    └── processing.c

Some of the files there we already know from the previous post: CMakeLists.txt and prj.conf. All the application code will be in the src directory, and we can structure it as we want as long as we tell the build system about the files we want to compile. For this application, the main code will be in main.c, processing.c will contain the code of the processing thread, and emul.c will keep everything related to the device emulation for the native_sim target and will be compiled only when we build for that target. We describe this to the build system through the contents of CMakeLists.txt:


cmake_minimum_required(VERSION 3.20.0)

find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(test_rpi)

target_sources(app PRIVATE src/main.c src/processing.c)
target_sources_ifdef(CONFIG_BOARD_NATIVE_SIM app PRIVATE src/emul.c)

In prj.conf we'll put the general Zephyr configuration options for this application. Note that inside the boards directory there are two additional .conf files. These are target-specific options that will be merged to the common ones in prj.conf depending on the target we choose to build for.
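
As an illustration, a prj.conf for an application like this one could enable the subsystems used throughout this post (the exact contents live in the repository linked above, so take this as a sketch rather than the real file):

# General options shared by all targets
CONFIG_GPIO=y
CONFIG_I2C=y
CONFIG_I2C_TARGET=y
CONFIG_LOG=y
CONFIG_SHELL=y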

Normally, most of the options we'll put in the .conf files will already be defined, but we can also define application-specific config options that we can later reference in the .conf files and the code. We can define them in the application-specific Kconfig file. The build system will pick it up as the main Kconfig file if it exists. For this application we'll define one additional config option that we'll use to configure the log level for the program, so this is what Kconfig will look like:


config TEST_RPI_LOG_LEVEL
	int "Default log level for test_rpi"
	default 4

source "Kconfig.zephyr"

Here we're simply prepending a config option before all the rest of the main Zephyr Kconfig file. We'll see how to use this option later.
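
As a quick preview, a custom symbol like this one is consumed just like any built-in option. A minimal sketch, assuming the log module is named test_rpi (the actual module names used in the code may differ):

#include <zephyr/logging/log.h>

/* Register this module's logger using the log level from our Kconfig
 * option (default 4, i.e. debug). */
LOG_MODULE_REGISTER(test_rpi, CONFIG_TEST_RPI_LOG_LEVEL);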

Finally, the boards directory also contains target-specific overlay files. These are regular device tree overlays which are normally used to configure the hardware. More about that in a while.

Main application architecture

The application flow is structured in two main threads: the main thread and an additional processing thread that does its work separately. The main thread runs the application entry point (the main() function) and does all the software and device setup. Normally it doesn't need to do anything more: we can use it to start other threads and have them do the rest of the work while the main thread sits idle, but in this case we're also doing some work with it instead of creating an additional thread for that. Regarding the processing thread, we can think of it as "application code" that runs on its own and provides a simple interface to interact with the rest of the system1.

Once the main thread has finished the initialization process (creating threads, setting up callbacks, configuring devices, etc.), it sits in an infinite loop waiting for messages in a message queue. These messages are sent by the processing thread, which also runs in a loop waiting for messages in another queue. The messages to the processing thread are sent, as a result of a button press, by the registered GPIO ISR callback (actually, by the bottom half triggered by it and run by a workqueue thread). Ignoring the I2C part for now, this is what the application flow looks like:


    Main thread    Processing thread    Workqueue thread     GPIO ISR
        |                  |                    |                |
        |                  |                    |<--------------| |
        |                  |<------------------| |           (1) |
        |                 | |               (2) |                |
        |<----------------| |                   |                |
       | |             (3) |                    |                |
        |                  |                    |                |

Once the button press is detected, the GPIO ISR calls a callback we registered in the main setup code. The callback defers the work (1) through a workqueue (we'll see why later), which sends some data to the processing thread (2). The data it'll send is just an integer: the current uptime in seconds. The processing thread will then do some processing using that data (convert it to a string) and will send the processed data to the main thread (3). Let's take a look at the code that does all this.

Thread creation

As we mentioned, the main thread will be responsible for, among other tasks, spawning other threads. In our example it will create only one additional thread.


#include <zephyr/kernel.h>

#define THREAD_STACKSIZE	2048
#define THREAD_PRIORITY		10

K_THREAD_STACK_DEFINE(processing_stack, THREAD_STACKSIZE);
struct k_thread processing_thread;

int main(void)
{
	[...]

	/* Thread initialization */
	k_thread_create(&processing_thread, processing_stack,
			THREAD_STACKSIZE, data_process,
			&in_msgq, &out_msgq, NULL,
			THREAD_PRIORITY, 0, K_FOREVER);
	k_thread_name_set(&processing_thread, "processing");
	k_thread_start(&processing_thread);

We'll see what the data_process() function does in a while. For now, notice we're passing two message queues, one for input and one for output, as parameters for that function. These will be used as the interface to connect the processing thread to the rest of the firmware.
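
The message queues themselves don't appear in this excerpt. A minimal sketch of how they could be defined with K_MSGQ_DEFINE(name, msg_size, max_msgs, align), assuming the input messages are 32-bit uptime values and the output messages are short strings (the sizes here are illustrative, not necessarily the ones used in the repository):

/* Raw data (uptime in seconds) going into the processing thread */
K_MSGQ_DEFINE(in_msgq, sizeof(unsigned int), 4, 4);

/* Processed results (a small string) coming back to the main thread */
#define PROCESSED_MSG_SIZE	16
K_MSGQ_DEFINE(out_msgq, PROCESSED_MSG_SIZE, 4, 4);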

GPIO handling

Zephyr's device tree support greatly simplifies device handling and makes it really easy to parameterize and handle device operations in an abstract way. In this example, we define and reference the GPIO for the button in our setup using a platform-independent device tree node:


#define ZEPHYR_USER_NODE DT_PATH(zephyr_user)
const struct gpio_dt_spec button = GPIO_DT_SPEC_GET_OR(
	ZEPHYR_USER_NODE, button_gpios, {0});

This looks for a "button-gpios" property in the "zephyr,user" node in the device tree of the target platform and initializes a gpio_dt_spec structure containing the GPIO pin information defined in the device tree. Note that this initialization and the check for the "zephyr,user" node are static and happen at compile time, so if the node isn't found the error will be caught by the build process.

This is how the node is defined for the Raspberry Pi Pico 2W:


/ {

[...]

	zephyr,user {
		button-gpios = <&gpio0 2 (GPIO_ACTIVE_LOW | GPIO_PULL_UP)>;
	};
};

This defines the GPIO to be used as GPIO 2 from bank 0; it'll be set up with an internal pull-up resistor and will be active-low. See the device tree GPIO API for details on the specification format. On the board, that GPIO is routed to pin 4.


Now we'll use the GPIO API to configure the GPIO as defined and to add a callback that will run when the button is pressed:


if (!gpio_is_ready_dt(&button)) {
	LOG_ERR("Error: button device %s is not ready",
	       button.port->name);
	return 0;
}
ret = gpio_pin_configure_dt(&button, GPIO_INPUT);
if (ret != 0) {
	LOG_ERR("Error %d: failed to configure %s pin %d",
	       ret, button.port->name, button.pin);
	return 0;
}
ret = gpio_pin_interrupt_configure_dt(&button,
                                      GPIO_INT_EDGE_TO_ACTIVE);
if (ret != 0) {
	LOG_ERR("Error %d: failed to configure interrupt on %s pin %d",
		ret, button.port->name, button.pin);
	return 0;
}
gpio_init_callback(&button_cb_data, button_pressed, BIT(button.pin));
gpio_add_callback(button.port, &button_cb_data);

We're configuring the pin as an input and then enabling interrupts for it when it transitions to its logical "active" level. In this case, since we defined it as active-low, the interrupt will be triggered when the pin transitions from the stable pulled-up voltage to ground.

Finally, we're initializing and adding a callback function that will be called by the ISR when it detects that this GPIO goes active. We'll use this callback to start an action from a user event. The specific interrupt handling is done by the target-specific device driver2 and we don't have to worry about it; our code can remain device-independent.

NOTE: The callback we'll define is meant as a simple exercise for illustrative purposes. Zephyr provides an input subsystem to handle cases like this properly.

What we want to do in the callback is to send a message to the processing thread. The communication input channel to the thread is the in_msgq message queue, and the data we'll send is a simple 32-bit integer with the number of uptime seconds. But before doing that, we'll first de-bounce the button press using a simple idea: scheduling the message delivery on a workqueue thread after a short delay:


/*
 * Deferred irq work triggered by the GPIO IRQ callback
 * (button_pressed). This should run some time after the ISR, at which
 * point the button press should be stable after the initial bouncing.
 *
 * Checks the button status and sends the current system uptime in
 * seconds through in_msgq if the button is still pressed.
 */
static void debounce_expired(struct k_work *work)
{
	unsigned int data = k_uptime_seconds();
	ARG_UNUSED(work);

	if (gpio_pin_get_dt(&button))
		k_msgq_put(&in_msgq, &data, K_NO_WAIT);
}

static K_WORK_DELAYABLE_DEFINE(debounce_work, debounce_expired);

/*
 * Callback function for the button GPIO IRQ.
 * De-bounces the button press by scheduling the processing into a
 * workqueue.
 */
void button_pressed(const struct device *dev, struct gpio_callback *cb,
		    uint32_t pins)
{
	k_work_reschedule(&debounce_work, K_MSEC(30));
}

That way, every unwanted oscillation will cause a re-scheduling of the message delivery (replacing any prior scheduling). debounce_expired will eventually read the GPIO status and send the message.

Thread synchronization and messaging

As I mentioned earlier, the interface with the processing thread consists of two message queues, one for input and one for output. These are defined statically with the K_MSGQ_DEFINE macro:


#define PROC_MSG_SIZE		8

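/* K_MSGQ_DEFINE(name, msg_size, max_msgs, align) */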
K_MSGQ_DEFINE(in_msgq, sizeof(int), 1, 1);
K_MSGQ_DEFINE(out_msgq, PROC_MSG_SIZE, 1, 1);

Both queues have space to hold only one message each. For the input queue (the one we'll use to send messages to the processing thread), each message will be one 32-bit integer. The messages of the output queue (the one the processing thread will use to send messages) are 8 bytes long.

Once the main thread is done initializing everything, it'll stay in an infinite loop waiting for messages from the processing thread. The processing thread will also run a loop waiting for incoming messages in the input queue, which are sent by the button callback, as we saw earlier. The message queues are therefore used both for transferring data and for synchronization. Since the code running in the processing thread is so small, I'll paste it here in its entirety:


static char data_out[PROC_MSG_SIZE];

/*
 * Receives a message on the message queue passed in p1, does some
 * processing on the data received and sends a response on the message
 * queue passed in p2.
 */
void data_process(void *p1, void *p2, void *p3)
{
	struct k_msgq *inq = p1;
	struct k_msgq *outq = p2;
	ARG_UNUSED(p3);

	while (1) {
		unsigned int data;

		k_msgq_get(inq, &data, K_FOREVER);
		LOG_DBG("Received: %d", data);

		/* Data processing: convert integer to string */
		snprintf(data_out, sizeof(data_out), "%d", data);

		k_msgq_put(outq, data_out, K_NO_WAIT);
	}
}

I2C target implementation

Now that we have a way to interact with the program by inputting an external event (a button press), we'll add a way for it to communicate with the outside world: we're going to turn our device into an I2C target that will listen for command requests from a controller and send data back to it. In our setup, the controller will be a Linux-based Raspberry Pi; see the diagram in the Hardware setup section above for details on how the boards are connected.

In order to define an I2C target we first need a suitable device defined in the device tree. To abstract the actual target-dependent device, we'll define and use an alias for it that we can redefine for every supported target. For instance, for the Raspberry Pi Pico 2W we define this alias in its device tree overlay:


/ {
	[...]

	aliases {
                i2ctarget = &i2c0;
	};

Where i2c0 is originally defined like this:


i2c0: i2c@40090000 {
	compatible = "raspberrypi,pico-i2c", "snps,designware-i2c";
	#address-cells = <1>;
	#size-cells = <0>;
	reg = <0x40090000 DT_SIZE_K(4)>;
	resets = <&reset RPI_PICO_RESETS_RESET_I2C0>;
	clocks = <&clocks RPI_PICO_CLKID_CLK_SYS>;
	interrupts = <36 RPI_PICO_DEFAULT_IRQ_PRIORITY>;
	interrupt-names = "i2c0";
	status = "disabled";
};

and then enabled:


&i2c0 {
	clock-frequency = <I2C_BITRATE_STANDARD>;
	status = "okay";
	pinctrl-0 = <&i2c0_default>;
	pinctrl-names = "default";
};

So now in the code we can reference the i2ctarget alias to load the device info and initialize it:


/*
 * Get I2C device configuration from the devicetree i2ctarget alias.
 * Check node availability at build time.
 */
#define I2C_NODE	DT_ALIAS(i2ctarget)
#if !DT_NODE_HAS_STATUS_OKAY(I2C_NODE)
#error "Unsupported board: i2ctarget devicetree alias is not defined"
#endif
const struct device *i2c_target = DEVICE_DT_GET(I2C_NODE);

To register the device as a target, we'll use the i2c_target_register() function, which takes the loaded device tree device and an I2C target configuration (struct i2c_target_config) containing the I2C address we choose for it and a set of callbacks for all the possible events. It's in these callbacks where we'll define the target's functionality:


#define I2C_ADDR		0x60

[...]

static struct i2c_target_callbacks target_callbacks = {
	.write_requested = write_requested_cb,
	.write_received = write_received_cb,
	.read_requested = read_requested_cb,
	.read_processed = read_processed_cb,
	.stop = stop_cb,
};

[...]

int main(void)
{
	struct i2c_target_config target_cfg = {
		.address = I2C_ADDR,
		.callbacks = &target_callbacks,
	};

	if (i2c_target_register(i2c_target, &target_cfg) < 0) {
		LOG_ERR("Failed to register target");
		return -1;
	}

Each of those callbacks will be called in response to an event started by the controller. Depending on how we want to define the target, we'll need to code the callbacks to react appropriately to the controller requests. For this application we'll define a register that the controller can read to get a timestamp (the firmware uptime in seconds) from the last time the button was pressed. The number will be received as an 8-byte ASCII string.

If the controller is the Linux-based Raspberry Pi, we can use the i2c-tools to poll the target and read from it:


# Scan the I2C bus:
$ i2cdetect -y 0
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: 60 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- --

# I2C bus 0: issue command 0 (read uptime) on device 0x60:
# - Send byte 0 to device with address 0x60
# - Read back 8 bytes
$ i2ctransfer -y 0 w1@0x60 0 r8
0x36 0x33 0x00 0x00 0x00 0x00 0x00 0x00

We basically want the device to react when the controller sends a write request (to select the register and prepare the data), when it sends a read request (to send the data bytes back to the controller) and when it sends a stop condition.

To handle the data to be sent, the I2C callback functions manage an internal buffer that holds the string data to send to the controller. We load this buffer with the contents of a source buffer that's updated every time the main thread receives data from the processing thread (a double-buffer scheme). Then, during an I2C transfer, we walk the internal buffer, sending one byte to the controller for each read request. When the transfer finishes or is aborted, we rewind the buffer index to prepare for the next transfer:


typedef enum {
	I2C_REG_UPTIME,
	I2C_REG_NOT_SUPPORTED,

	I2C_REG_DEFAULT = I2C_REG_UPTIME
} i2c_register_t;

/* I2C data structures */
static char i2cbuffer[PROC_MSG_SIZE];
static int i2cidx = -1;
static i2c_register_t i2creg = I2C_REG_DEFAULT;

[...]

/*
 * Callback called on a write request from the controller.
 */
int write_requested_cb(struct i2c_target_config *config)
{
	LOG_DBG("I2C WRITE start");
	return 0;
}

/*
 * Callback called when a byte was received on an ongoing write request
 * from the controller.
 */
int write_received_cb(struct i2c_target_config *config, uint8_t val)
{
	LOG_DBG("I2C WRITE: 0x%02x", val);
	i2creg = val;
	if (val == I2C_REG_UPTIME)
		i2cidx = -1;

	return 0;
}

/*
 * Callback called on a read request from the controller.
 * If it's a first read, load the output buffer contents from the
 * current contents of the source data buffer (str_data).
 *
 * The data byte sent to the controller is pointed to by val.
 * Returns:
 *   0 if there's additional data to send
 *   -ENOMEM if the byte sent is the end of the data transfer
 *   -EIO if the selected register isn't supported
 */
int read_requested_cb(struct i2c_target_config *config, uint8_t *val)
{
	if (i2creg != I2C_REG_UPTIME)
		return -EIO;

	LOG_DBG("I2C READ started. i2cidx: %d", i2cidx);
	if (i2cidx < 0) {
		/* Copy source buffer to the i2c output buffer */
		k_mutex_lock(&str_data_mutex, K_FOREVER);
		strncpy(i2cbuffer, str_data, PROC_MSG_SIZE);
		k_mutex_unlock(&str_data_mutex);
	}
	i2cidx++;
	if (i2cidx == PROC_MSG_SIZE) {
		i2cidx = -1;
		return -ENOMEM;
	}
	*val = i2cbuffer[i2cidx];
	LOG_DBG("I2C READ send: 0x%02x", *val);

	return 0;
}

/*
 * Callback called on a continued read request from the
 * controller. We're implementing repeated start semantics, so this will
 * always return -ENOMEM to signal that a new START request is needed.
 */
int read_processed_cb(struct i2c_target_config *config, uint8_t *val)
{
	LOG_DBG("I2C READ continued");
	return -ENOMEM;
}

/*
 * Callback called on a stop request from the controller. Rewinds the
 * index of the i2c data buffer to prepare for the next send.
 */
int stop_cb(struct i2c_target_config *config)
{
	i2cidx = -1;
	LOG_DBG("I2C STOP");
	return 0;
}

int main(void)
{
	[...]

	while (1) {
		char buffer[PROC_MSG_SIZE];

		k_msgq_get(&out_msgq, buffer, K_FOREVER);
		LOG_DBG("Received: %s", buffer);
		k_mutex_lock(&str_data_mutex, K_FOREVER);
		strncpy(str_data, buffer, PROC_MSG_SIZE);
		k_mutex_unlock(&str_data_mutex);
	}
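
The source buffer str_data and its mutex are shared between this loop and the I2C read callback, but their definitions don't appear in the excerpts above. A minimal sketch of what they could look like (the names come from the code; the exact declarations are an assumption):


/* Assumed definitions: source string buffer written by the main thread and
 * read by the I2C callbacks, protected by a mutex. */
static char str_data[PROC_MSG_SIZE];
K_MUTEX_DEFINE(str_data_mutex);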

Device emulation

The application logic is done at this point, and we were careful to write it in a platform-agnostic way. As mentioned earlier, all the target-specific details are abstracted away by the device tree and the Zephyr APIs. Although we're developing with a real deployment board in mind, it's very useful to be able to develop and test using a behavioral model of the hardware: one that we can program to behave as closely to the real hardware as we need, and that we can run on our development machine without the cost and restrictions of the real hardware.

To do this, we'll rely on the native_sim board3, which implements the core OS services on top of a POSIX compatibility layer, and we'll add code to simulate the button press and the I2C requests.

Emulating a button press

We'll use the gpio_emul driver as a base for our emulated button. The native_sim device tree already defines an emulated GPIO bank for this:


gpio0: gpio_emul {
	status = "okay";
	compatible = "zephyr,gpio-emul";
	rising-edge;
	falling-edge;
	high-level;
	low-level;
	gpio-controller;
	#gpio-cells = <2>;
};

So we can define the GPIO to use for our button in the native_sim board overlay:


/ {
	[...]

	zephyr,user {
		button-gpios = <&gpio0 0 GPIO_ACTIVE_HIGH>;
	};
};

We'll model the button press as a four-phase event consisting of an initial status change caused by the press, then a semi-random rebound phase, then a phase of signal stabilization after the rebounds stop, and finally a button release. Using the gpio_emul API it'll look like this:


/*
 * Emulates a button press with bouncing.
 */
static void button_press(void)
{
	const struct device *dev = device_get_binding(button.port->name);
	int n_bounces = sys_rand8_get() % 10;
	int state = 1;
	int i;

	/* Press */
	gpio_emul_input_set(dev, 0, state);
	/* Bouncing */
	for (i = 0; i < n_bounces; i++) {
		state = state ? 0: 1;
		k_busy_wait(1000 * (sys_rand8_get() % 10));
		gpio_emul_input_set(dev, 0, state);
	}
	/* Stabilization */
	gpio_emul_input_set(dev, 0, 1);
	k_busy_wait(100000);
	/* Release */
	gpio_emul_input_set(dev, 0, 0);
}

The driver will take care of checking if the state changes need to raise interrupts, depending on the GPIO configuration, and will trigger the registered callback that we defined earlier.

Emulating an I2C controller

As with the button emulator, we'll rely on an existing emulated device driver for this: i2c_emul. Again, the device tree for the target already defines the node we need:


i2c0: i2c@100 {
	status = "okay";
	compatible = "zephyr,i2c-emul-controller";
	clock-frequency = <I2C_BITRATE_STANDARD>;
	#address-cells = <1>;
	#size-cells = <0>;
	#forward-cells = <1>;
	reg = <0x100 4>;
};

So we can define a machine-independent alias that we can reference in the code:


/ {
	aliases {
		i2ctarget = &i2c0;
	};

The events we need to emulate are the requests sent by the controller: READ start, WRITE start and STOP. We can define these based on the i2c_transfer() API function which will, in this case, use the i2c_emul driver implementation to simulate the transfer. As in the GPIO emulation case, this will trigger the appropriate callbacks. The implementation of our controller requests looks like this:


/*
 * A real controller may want to continue reading after the first
 * received byte. We're implementing repeated-start semantics so we'll
 * only be sending one byte per transfer, but we need to allocate space
 * for an extra byte to process the possible additional read request.
 */
static uint8_t emul_read_buf[2];

/*
 * Emulates a single I2C READ START request from a controller.
 */
static uint8_t *i2c_emul_read(void)
{
	struct i2c_msg msg;
	int ret;

	msg.buf = emul_read_buf;
	msg.len = sizeof(emul_read_buf);
	msg.flags = I2C_MSG_RESTART | I2C_MSG_READ;
	ret = i2c_transfer(i2c_target, &msg, 1, I2C_ADDR);
	if (ret == -EIO)
		return NULL;

	return emul_read_buf;
}

static void i2c_emul_write(uint8_t *data, int len)
{
	struct i2c_msg msg;

	/*
	 * NOTE: It's not explicitly said anywhere that msg.buf can be
	 * NULL even if msg.len is 0. The behavior may be
	 * driver-specific and prone to change so we're being safe here
	 * by using a 1-byte buffer.
	 */
	msg.buf = data;
	msg.len = len;
	msg.flags = I2C_MSG_WRITE;
	i2c_transfer(i2c_target, &msg, 1, I2C_ADDR);
}

/*
 * Emulates an explicit I2C STOP sent from a controller.
 */
static void i2c_emul_stop(void)
{
	struct i2c_msg msg;
	uint8_t buf = 0;

	/*
	 * NOTE: It's not explicitly said anywhere that msg.buf can be
	 * NULL even if msg.len is 0. The behavior may be
	 * driver-specific and prone to change so we're being safe here
	 * by using a 1-byte buffer.
	 */
	msg.buf = &buf;
	msg.len = 0;
	msg.flags = I2C_MSG_WRITE | I2C_MSG_STOP;
	i2c_transfer(i2c_target, &msg, 1, I2C_ADDR);
}

Now we can define a complete request for an "uptime read" operation in terms of these primitives:


/*
 * Emulates an I2C "UPTIME" command request from a controller using
 * repeated start.
 */
static void i2c_emul_uptime(const struct shell *sh, size_t argc, char **argv)
{
	uint8_t buffer[PROC_MSG_SIZE] = {0};
	i2c_register_t reg = I2C_REG_UPTIME;
	int i;

	i2c_emul_write((uint8_t *)&reg, 1);
	for (i = 0; i < PROC_MSG_SIZE; i++) {
		uint8_t *b = i2c_emul_read();
		if (b == NULL)
			break;
		buffer[i] = *b;
	}
	i2c_emul_stop();

	if (i == PROC_MSG_SIZE) {
		shell_print(sh, "%s", buffer);
	} else {
		shell_print(sh, "Transfer error");
	}
}

Ok, so now that we have implemented all the emulated operations, we need a way to trigger them in the emulated environment. The Zephyr shell is tremendously useful for cases like this.

Shell commands

The shell module in Zephyr has a lot of useful features that we can use for debugging. It's quite extensive and talking about it in detail is out of the scope of this post, but I'll show how simple it is to add a few custom commands to trigger the button presses and the I2C controller requests from a console. In fact, for our purposes, the whole thing is as simple as this:


SHELL_CMD_REGISTER(buttonpress, NULL, "Simulates a button press", button_press);
SHELL_CMD_REGISTER(i2cread, NULL, "Simulates an I2C read request", i2c_emul_read);
SHELL_CMD_REGISTER(i2cuptime, NULL, "Simulates an I2C uptime request", i2c_emul_uptime);
SHELL_CMD_REGISTER(i2cstop, NULL, "Simulates an I2C stop request", i2c_emul_stop);
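
One way to keep these registrations out of the hardware builds is a preprocessor guard. This is only a sketch that assumes the native_sim board's Kconfig symbol is CONFIG_BOARD_NATIVE_SIM; the post doesn't show how this is actually wired up:


#if defined(CONFIG_BOARD_NATIVE_SIM)
/* Emulation-only shell commands, compiled out for real hardware builds */
SHELL_CMD_REGISTER(buttonpress, NULL, "Simulates a button press", button_press);
/* ... the i2c* commands and the emulation helpers guarded the same way ... */
#endif

Another option is to keep the emulation helpers in a separate source file and add it to the build conditionally from CMakeLists.txt.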

We'll enable these commands only when building for the native_sim board. With the configuration provided, once we run the application we'll have the log output in stdout and the shell UART connected to a pseudotty, so we can access it in a separate terminal and run these commands while we see the output in the terminal where we ran the application:


$ ./build/zephyr/zephyr.exe
WARNING: Using a test - not safe - entropy source
uart connected to pseudotty: /dev/pts/16
*** Booting Zephyr OS build v4.1.0-6569-gf4a0beb2b7b1 ***

# In another terminal
$ screen /dev/pts/16

uart:~$
uart:~$ help
Please press the <Tab> button to see all available commands.
You can also use the <Tab> button to prompt or auto-complete all commands or its subcommands.
You can try to call commands with <-h> or <--help> parameter for more information.

Shell supports following meta-keys:
  Ctrl + (a key from: abcdefklnpuw)
  Alt  + (a key from: bf)
Please refer to shell documentation for more details.

Available commands:
  buttonpress  : Simulates a button press
  clear        : Clear screen.
  device       : Device commands
  devmem       : Read/write physical memory
                 Usage:
                 Read memory at address with optional width:
                 devmem <address> [<width>]
                 Write memory at address with mandatory width and value:
                 devmem <address> <width> <value>
  help         : Prints the help message.
  history      : Command history.
  i2cread      : Simulates an I2C read request
  i2cstop      : Simulates an I2C stop request
  i2cuptime    : Simulates an I2C uptime request
  kernel       : Kernel commands
  rem          : Ignore lines beginning with 'rem '
  resize       : Console gets terminal screen size or assumes default in case
                 the readout fails. It must be executed after each terminal
                 width change to ensure correct text display.
  retval       : Print return value of most recent command
  shell        : Useful, not Unix-like shell commands.

To simulate a button press (ie. capture the current uptime):


uart:~$ buttonpress

And the log output should print the enabled debug messages:


[00:00:06.300,000] <dbg> test_rpi: data_process: Received: 6
[00:00:06.300,000] <dbg> test_rpi: main: Received: 6

If we now simulate an I2C uptime command request we should get the captured uptime as a string:


uart:~$ i2cuptime 
6

We can check the log to see how the I2C callbacks ran:


[00:01:29.400,000] <dbg> test_rpi: write_requested_cb: I2C WRITE start
[00:01:29.400,000] <dbg> test_rpi: write_received_cb: I2C WRITE: 0x00
[00:01:29.400,000] <dbg> test_rpi: stop_cb: I2C STOP
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ started. i2cidx: -1
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ send: 0x36
[00:01:29.400,000] <dbg> test_rpi: read_processed_cb: I2C READ continued
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ started. i2cidx: 0
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ send: 0x00
[00:01:29.400,000] <dbg> test_rpi: read_processed_cb: I2C READ continued
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ started. i2cidx: 1
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ send: 0x00
[00:01:29.400,000] <dbg> test_rpi: read_processed_cb: I2C READ continued
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ started. i2cidx: 2
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ send: 0x00
[00:01:29.400,000] <dbg> test_rpi: read_processed_cb: I2C READ continued
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ started. i2cidx: 3
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ send: 0x00
[00:01:29.400,000] <dbg> test_rpi: read_processed_cb: I2C READ continued
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ started. i2cidx: 4
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ send: 0x00
[00:01:29.400,000] <dbg> test_rpi: read_processed_cb: I2C READ continued
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ started. i2cidx: 5
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ send: 0x00
[00:01:29.400,000] <dbg> test_rpi: read_processed_cb: I2C READ continued
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ started. i2cidx: 6
[00:01:29.400,000] <dbg> test_rpi: read_requested_cb: I2C READ send: 0x00
[00:01:29.400,000] <dbg> test_rpi: read_processed_cb: I2C READ continued
[00:01:29.400,000] <dbg> test_rpi: stop_cb: I2C STOP

Appendix: Linux set up on the Raspberry Pi

This is the process I followed to set up a Linux system on a Raspberry Pi (a very old model 1 B). There are plenty of instructions for this on the Web, and you can probably just pick a pre-packaged and pre-configured Raspberry Pi OS and be done with it faster, so I'm adding this here for completeness and because I want finer-grained control over what I put into it.

The only hardware requirement is an SD card with two partitions: a small (~50MB) FAT32 boot partition and the rest of the space for the rootfs partition, which I formatted as ext4. The boot partition should contain a specific set of configuration files and binary blobs, as well as the kernel that we'll build and the appropriate device tree binary. See the official docs for more information on the boot partition contents and this repo for the binary blobs. For this board, the minimum files needed are:

  • bootcode.bin: the second-stage bootloader, loaded by the first-stage bootloader in the BCM2835 ROM. Run by the GPU.
  • start.elf: GPU firmware, starts the ARM CPU.
  • fixup.dat: needed by start.elf. Used to configure the SDRAM.
  • kernel.img: this is the kernel image we'll build.
  • dtb files and overlays.

And, optional but highly recommended:

  • config.txt: bootloader configuration.
  • cmdline.txt: kernel command-line parameters.

In practice, pretty much all Linux setups will also have these files. For our case we'll need to add one additional config entry to the config.txt file in order to enable the I2C bus:


dtparam=i2c_arm=on

Once we have the boot partition populated with the basic required files (minus the kernel and dtb files), the two main ingredients we need to build now are the kernel image and the root filesystem.

Building a Linux kernel for the Raspberry Pi

Main reference: Raspberry Pi docs

There's nothing non-standard about how we'll generate this kernel image, so you can search the Web for references on how the process works if you need to. The only thing to take into account is that we'll pick the Raspberry Pi kernel instead of a vanilla mainline kernel. I also recommend getting the arm-linux-gnueabi cross-toolchain from kernel.org.

After installing the toolchain and cloning the repo, we just have to run the usual commands to configure the kernel, build the image, the device tree binaries and the modules, and install the modules into a specific directory. But first we'll add some extra config options:


cd kernel_dir
KERNEL=kernel
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- bcmrpi_defconfig

We'll need to add at least built-in ext4 support so that the kernel can mount the rootfs, and I2C support for our experiments, so we need to edit .config and add these:


CONFIG_EXT4_FS=y
CONFIG_I2C=y

And run the olddefconfig target. Then we can proceed with the rest of the build steps:


make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- olddefconfig
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- zImage modules dtbs -j$(nproc)
mkdir modules
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- INSTALL_MOD_PATH=./modules modules_install

Now we need to copy the kernel and the dtbs to the boot partition of the sd card:


cp arch/arm/boot/zImage /path_to_boot_partition_mountpoint/kernel.img
cp arch/arm/boot/dts/broadcom/*.dtb /path_to_boot_partition_mountpoint
mkdir /path_to_boot_partition_mountpoint/overlays
cp arch/arm/boot/dts/overlays/*.dtb /path_to_boot_partition_mountpoint/overlays

(we really only need the dtb for this particular board, but anyway).

Setting up a Debian rootfs

There are many ways to do this, but I normally use the classic debootstrap to build Debian root filesystems. Since I don't always know ahead of time which packages I'll need to install, the strategy I follow is to build a minimal image with the bare minimum requirements and then boot it either on a virtual machine or on the final target and do the rest of the installation and setup there. So for the initial setup I'll only include the openssh-server package:


mkdir bookworm_armel_raspi
sudo debootstrap --arch armel --include=openssh-server bookworm \
        bookworm_armel_raspi http://deb.debian.org/debian

# Remove the root password
sudo sed -i '/^root/ { s/:x:/::/ }' bookworm_armel_raspi/etc/passwd

# Create a pair of ssh keys and install them to allow passwordless
# ssh logins
cd ~/.ssh
ssh-keygen -f raspi
sudo mkdir bookworm_armel_raspi/root/.ssh
cat raspi.pub | sudo tee bookworm_armel_raspi/root/.ssh/authorized_keys

Now we'll copy the kernel modules to the rootfs. From the kernel directory, and based on the build instructions above:


cd kernel_dir
sudo cp -fr modules/lib/modules /path_to_rootfs_mountpoint/lib

If your distro provides qemu static binaries (eg. Debian: qemu-user-static), it's a good idea to copy the qemu binary to the rootfs so we can mount it locally and run apt-get on it:


sudo cp /usr/bin/qemu-arm-static bookworm_armel_raspi/usr/bin

Otherwise, we can boot a kernel on qemu and load the rootfs there to continue the installation. Next we'll create and populate the filesystem image; then we can boot it on qemu for additional tweaks or dump it into the rootfs partition of the SD card:


# Make rootfs image
fallocate -l 2G bookworm_armel_raspi.img
sudo mkfs -t ext4 bookworm_armel_raspi.img
sudo mkdir /mnt/rootfs
sudo mount -o loop bookworm_armel_raspi.img /mnt/rootfs/
sudo cp -a bookworm_armel_raspi/* /mnt/rootfs/
sudo umount /mnt/rootfs

To copy the rootfs to the SD card:


sudo dd if=bookworm_armel_raspi.img of=/dev/sda2 bs=4M

(Replace /dev/sda2 with the SD card rootfs partition on your system).

At this point, if we need to do any extra configuration steps we can either:

  • Mount the SD card and make the changes there.
  • Boot the filesystem image in qemu with a suitable kernel and make the changes in a live system, then dump the changes into the SD card again.
  • Boot the board and make the changes there directly. For this we'll need to access the board serial console through its UART pins.

Here are some of the changes I made. First, network configuration. I'm setting up a dedicated point-to-point Ethernet link between the development machine (a Linux laptop) and the Raspberry Pi, with fixed IPs. That means I'll use a separate subnet for this minimal LAN and that the laptop will forward traffic between the Ethernet NIC and the WLAN interface that's connected to the Internet. In the rootfs I added a file (/etc/systemd/network/20-wired.network) with the following contents:


[Match]
Name=en*

[Network]
Address=192.168.2.101/24
Gateway=192.168.2.100
DNS=1.1.1.1

Where 192.168.2.101 is the address of the board NIC and 192.168.2.100 is that of the Ethernet NIC in my laptop. Then, assuming we have access to the serial console of the board and we logged in as root, we need to enable systemd-networkd:


systemctl enable systemd-networkd

Additionally, we need to edit the ssh server configuration to allow login as root. We can do this by setting PermitRootLogin yes in /etc/ssh/sshd_config.

On the development machine, I configured traffic forwarding to the WLAN interface:


sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o <wlan_interface> -j MASQUERADE

Once all the configuration is done we should be able to log in as root via ssh:


ssh -i ~/.ssh/raspi root@192.168.2.101

In order to issue I2C requests to the Zephyr board, we'll need to load the i2c-dev module at boot time and install the i2c-tools in the Raspberry Pi:


apt-get install i2c-tools
echo "ic2-dev" >> /etc/modules

1: Note that in this case the thread is a regular kernel thread and runs in the same memory space as the rest of the code, so there's no memory protection. See the User Mode page in the docs for more details.

2: As a reference, for the Raspberry Pi Pico 2W, this is where the ISR is registered for enabled GPIO devices, and this is the ISR that checks the pin status and triggers the registered callbacks.

3: native_sim_64 in my setup.

by rcn at August 13, 2025 12:00 PM

August 11, 2025

Igalia WebKit Team

WebKit Igalia Periodical #33

Update on what happened in WebKit in the week from July 29 to August 11.

This update covers two weeks, including a deluge of releases and graphics work.

Cross-Port 🐱

Graphics 🖼️

CSS animations with a cubic-bezier timing function are now correctly rendered.

The rewrite of the WebXR support continued making steady progress, and is getting closer to being able to render content again.

WPE WebKit 📟

The WPE port gained basic undo support in text inputs.

WPE Android 🤖

Adaptation of WPE WebKit targeting the Android operating system.

WPE-Android has been updated to use WebKit 2.48.5. Of particular interest for development on Android is the support for using the system logd service, which can be configured using system properties. For example, the following will enable logging all warnings:

adb shell setprop debug.log.WPEWebKit all
adb shell setprop log.tag.WPEWebKit WARN

Updated prebuilt packages are also available in the Central repository.

Releases 📦️

Stable releases of WebKitGTK 2.48.5 and WPE WebKit 2.48.5 are now available. These include the fixes and improvements from the corresponding 2.48.4 ones, and additionally solve a number of security issues. Advisory WSA-2025-0005 (GTK, WPE) covers the included security patches.

WebKitGTK 2.49.3 and WPE WebKit 2.49.4 have been released, intended to test out upcoming features and improvements. As usual, issue reports are welcome in Bugzilla, and are particularly important now to stabilize the newly created branch for the upcoming 2.50.x series.

Ruby was re-added to the GNOME SDK, thanks to Michael Catanzaro and Jordan Petridis. So we're happy to report that the WebKitGTK nightly builds for GNOME Web Canary are now fixed and Canary updates were resumed.

That’s all for this week!

by Igalia WebKit Team at August 11, 2025 09:07 PM

August 07, 2025

Andy Wingo

whippet hacklog: adding freelists to the no-freelist space

August greetings, comrades! Today I want to bookend some recent work on my Immix-inspired garbage collector: firstly, an idea with muddled results, then a slog through heuristics.

the big idea

My mostly-marking collector’s main space is called the “nofl space”. Its name comes from its historical evolution from mark-sweep to mark-region: instead of sweeping unused memory to freelists and allocating from those freelists, sweeping is interleaved with allocation; “nofl” means “no free-list”. As it finds holes, the collector bump-pointer allocates into those holes. If an allocation doesn’t fit into the current hole, the collector sweeps some more to find the next hole, possibly fetching another block. Space for holes that are too small is effectively wasted as fragmentation; mutators will try again after the next GC. Blocks with lots of holes will be chosen for opportunistic evacuation, which is the heap defragmentation mechanism.

Hole-too-small fragmentation has bothered me, because it presents a potential pathology. You don’t know how a GC will be used or what the user’s allocation pattern will be; if it is a mix of medium (say, a kilobyte) and small (say, 16 bytes) allocations, one could imagine a medium allocation having to sweep over lots of holes, discarding them in the process, which hastens the next collection. Seems wasteful, especially for non-moving configurations.

So I had a thought: why not collect those holes into a size-segregated freelist? We just cleared the hole, the memory is core-local, and we might as well. Then before fetching a new block, the allocator slow-path can see if it can service an allocation from the second-chance freelist of holes. This decreases locality a bit, but maybe it’s worth it.
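
To make the mechanism concrete, here is a generic sketch of a size-segregated freelist of holes in plain C. This is only an illustration of the idea as described, not Whippet's actual code; the bucket count, the size-class mapping and the hole layout are all assumptions.


#include <stddef.h>

#define HOLE_SIZE_CLASSES 8                 /* assumed bucket count */

struct hole {
	struct hole *next;
	size_t bytes;                       /* usable bytes in the hole */
};

static struct hole *second_chance[HOLE_SIZE_CLASSES];

/* Map a byte count to a power-of-two size class, capped at the last bucket. */
static size_t size_class(size_t bytes)
{
	size_t c = 0;
	while ((16u << c) < bytes && c + 1 < HOLE_SIZE_CLASSES)
		c++;
	return c;
}

/* Sweeping found a hole too small for the current request: remember it.
 * Assumes a hole is at least sizeof(struct hole) bytes. */
static void remember_hole(void *addr, size_t bytes)
{
	struct hole *h = addr;              /* reuse the hole's own memory */
	h->bytes = bytes;
	size_t c = size_class(bytes);
	h->next = second_chance[c];
	second_chance[c] = h;
}

/* Allocation slow path: check the second-chance freelists before sweeping
 * further or fetching a fresh block. */
static void *second_chance_alloc(size_t bytes)
{
	for (size_t c = size_class(bytes); c < HOLE_SIZE_CLASSES; c++) {
		struct hole *h = second_chance[c];
		if (h && h->bytes >= bytes) {
			second_chance[c] = h->next;
			return h;
		}
	}
	return NULL;                        /* fall back to the normal path */
}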

Thing is, I implemented it, and I don’t know if it’s worth it! It seems to interfere with evacuation, in that the blocks that would otherwise be most profitable to evacuate, because they contain many holes, are instead filled up with junk due to second-chance allocation from the freelist. I need to do more measurements, but I think my big-brained idea is a bit of a wash, at least if evacuation is enabled.

heap growth

When running the new collector in Guile, we have a performance oracle in the form of BDW: it had better be faster for Guile to compile a Scheme file with the new nofl-based collector than with BDW. In this use case we have an additional degree of freedom, in that unlike the lab tests of nofl vs BDW, we don’t impose a fixed heap size, and instead allow heuristics to determine the growth.

BDW’s built-in heap growth heuristics are very opaque. You give it a heap multiplier, but as a divisor truncated to an integer. It’s very imprecise. Additionally, there are nonlinearities: BDW is relatively more generous for smaller heaps, because it attempts to model and amortize tracing cost, and there are some fixed costs (thread sizes, static data sizes) that don’t depend on live data size.

Thing is, BDW’s heuristics work pretty well. For example, I had a process that ended with a heap of about 60M, for a peak live data size of 25M or so. If I ran my collector with a fixed heap multiplier, it wouldn’t do as well as BDW, because it collected much more frequently when the heap was smaller.

I ended up switching from the primitive “size the heap as a multiple of live data” strategy to live data plus a square root factor; this is like what Racket ended up doing in its simple implementation of MemBalancer. (I do have a proper implementation of MemBalancer, with time measurement and shrinking and all, but I haven’t put it through its paces yet.) With this fix I can meet BDW’s performance for my Guile-compiling-Guile-with-growable-heap workload. It would be nice to exceed BDW of course!
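
The post doesn't give the exact formula, but the shape of such a heuristic is roughly the following sketch; the constant and the units are assumptions, not the collector's actual code.


#include <math.h>
#include <stddef.h>

/* Heap target = live data plus a square-root term: proportionally more
 * generous for small heaps, with the extra headroom becoming negligible
 * relative to the live size as the heap grows. */
static size_t heap_target(size_t live_bytes, double c)
{
	return live_bytes + (size_t)(c * sqrt((double)live_bytes));
}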

parallel worklist tweaks

Previously, in parallel configurations, trace workers would each have a Chase-Lev deque to which they could publish objects needing tracing. Any worker could steal an object from the top of a worker’s public deque. Also, each worker had a local, unsynchronized FIFO worklist, some 1000 entries in length; when this worklist filled up, the worker would publish its contents.

There is a pathology for this kind of setup, in which one worker can end up with a lot of work that it never publishes. For example, if there are 100 long singly-linked lists on the heap, and the worker happens to have them all on its local FIFO, then perhaps they never get published, because the FIFO never overflows; you end up not parallelising. This seems to be the case in one microbenchmark. I switched to not have local worklists at all; perhaps this was not the right thing, but who knows. Will poke in future.

a hilarious bug

Sometimes you need to know whether a given address is in an object managed by the garbage collector. For the nofl space it’s pretty easy, as we have big slabs of memory; bisecting over the array of slabs is fast. But for large objects whose memory comes from the kernel, we don’t have that. (Yes, you can reserve a big ol’ region with PROT_NONE and such, and then allocate into that region; I don’t do that currently.)

Previously I had a splay tree for lookup. Splay trees are great but not so amenable to concurrent access, and parallel marking is one place where we need to do this lookup. So I prepare a sorted array before marking, and then bisect over that array.

Except a funny thing happened: I switched the bisect routine to return the start address if an address is in a region. Suddenly, weird failures started happening randomly. Turns out, in some places I was testing if bisection succeeded with an int; if the region happened to be 32-bit-aligned, then the nonzero 64-bit uintptr_t got truncated to its low 32 bits, which were zero. Yes, crusty reader, Rust would have caught this!
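
As an illustration of the failure mode (not the collector's actual code): on a 64-bit target, storing the looked-up start address in an int keeps only the low 32 bits, so a region start whose low 32 bits are zero makes the test fail.


#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uintptr_t region_start = 0x100000000ULL; /* low 32 bits are zero */
	int found = region_start;                /* truncated on typical 64-bit ABIs */

	printf("%d\n", found != 0);              /* prints 0: looks like "not found" */
	return 0;
}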

fin

I want this new collector to work. Getting the growth heuristic good enough is a step forward. I am annoyed that second-chance allocation didn’t work out as well as I had hoped; perhaps I will find some time this fall to give a proper evaluation. In any case, thanks for reading, and hack at you later!

by Andy Wingo at August 07, 2025 03:02 PM

Ricardo Cañuelo Navarro

First steps with Zephyr

I recently started playing around with Zephyr, reading about it and doing some experiments, and I figured I'd rather jot down my impressions and findings so that the me in the future, who'll have no recollection of ever doing this, can come back to it as a reference. And if it's helpful for anybody else, that's a nice bonus.

It's been a really long time since I last dove into embedded programming for low-powered hardware and things have changed quite a bit, positively, both in terms of hardware availability for professionals and hobbyists and in the software options. Back in the day, most of the open source embedded OSs1 I tried felt like toy operating systems: enough for simple applications but not really suitable for more complex systems (eg. not having a proper preemptive scheduler is a serious limitation). On the proprietary side things looked better and there were many more options but, of course, those weren't freely available.

Nowadays, Zephyr has filled that gap in the open source embedded OSs field2, even becoming the de facto OS to use, something like a "Linux for embedded": it feels like a full-fledged OS, it's feature-rich, flexible and scalable, it has enormous traction in embedded, it's widely supported by many of the big names in the industry and it has plenty of available documentation, resources and a thriving community. Currently, if you need to pick an OS for embedded platforms, unless you're targeting very minimal hardware (8/16-bit microcontrollers), it's a no-brainer.

Noteworthy features

One of the most interesting qualities of Zephyr is its flexibility: the base system is lean and has a small footprint, and at the same time it's easy to grow a Zephyr-based firmware for more complex applications thanks to the variety of supported features. These are some of them:

  • Feature-rich kernel core services: for a small operating system, the amount of core services available is quite remarkable. Most of the usual tools for general application development are there: thread-based runtime with preemptive and cooperative scheduling, multiple synchronization and IPC mechanisms, basic memory management functions, asynchronous and event-based programming support, task management, etc.
  • SMP support.
  • Extensive core library: including common data structures, shell support and a POSIX compatibility layer.
  • Out-of-the-box hardware support for a large number of boards.
  • Logging and tracing: simple but capable facilities with support for different backends, easy to adapt to the hardware and application needs.
  • Native simulation target and device emulation: allows building applications as native binaries that can run on the development platform for prototyping and debugging purposes.
  • Device tree support for hardware description and configuration.
  • Configurable scheduler.
  • Memory protection support and usermode applications on supported architectures.
  • Powerful and easy to use build tool.

Find more information and details in the Zephyr online documentation.

Getting started

Now let's move on and get some actual hands on experience with Zephyr. The first thing we'll do is to set up a basic development environment so we can start writing some experiments and testing them. It's a good idea to keep a browser tab open on the Zephyr docs, so we can reference them when needed or search for more detailed info.

Development environment setup

The development environment is set up and contained within a python venv. The Zephyr project provides the west command line tool to carry out all the setup and build steps.

The basic tool requirements in Linux are CMake, Python3 and the device tree compiler. Assuming they are installed and available, we can then set up a development environment like this:


python3 -m venv zephyrproject/.venv
. zephyrproject/.venv/bin/activate

# Now inside the venv

pip install west
west init zephyrproject
cd zephyrproject
west update

west zephyr-export
west packages pip --install
        

Some basic nomenclature: the zephyrproject directory is known as a west "workspace". Inside it, the zephyr directory contains the repo of Zephyr itself.

The next step is to install the Zephyr SDK, ie. the toolchains and other host tools. I found this step a bit troublesome and it could have better defaults. By default it will install all the available SDKs (many of which we won't need) and then all the host tools (which we may not need either). Also, in my setup, the script that installs the host tools fails with a buffer overflow, so instead of relying on it to install the host tools (in my case I only needed qemu) I installed them myself. This has some drawbacks: we might be missing some features that are in the custom qemu binaries provided by the SDK, and west won't be able to run our apps on qemu automatically, so we'll have to do that ourselves. Not ideal, but not a dealbreaker either: I could figure it out and run qemu myself just fine.

So I recommend installing the SDK interactively, so we can select the toolchains we want and whether we want to install the host tools or not (in my case I didn't):


cd zephyr
west sdk install -i
        

For the initial tests I'm targeting riscv64 on qemu; we'll pick up other targets later. In my case, since the host tools installation failed on my setup, I needed to provide qemu-system-riscv64 myself; you probably won't have to do that.

Now, to see if everything is set up correctly, we can try to build the simplest example program there is: samples/hello_world. To build it for qemu_riscv64 we can use west like this:


west build -p always -b qemu_riscv64 samples/hello_world
        

Where -p always tells west to do a pristine build, ie. build everything every time. We may not strictly need that, but for now it's a safe flag to use.

Then, to run the app in qemu, the standard way is to do west build -t run, but if we didn't install the Zephyr host tools we'll need to run qemu ourselves:


qemu-system-riscv64 -nographic -machine virt -bios none -m 256 -net none \
    -pidfile qemu.pid -chardev stdio,id=con,mux=on -serial chardev:con \
    -mon chardev=con,mode=readline -icount shift=6,align=off,sleep=off \
    -rtc clock=vm \
    -kernel zephyr/build/zephyr/zephyr.elf

*** Booting Zephyr OS build v4.1.0-6569-gf4a0beb2b7b1 ***
Hello World! qemu_riscv64/qemu_virt_riscv64
        

Architecture-specific note: we're calling qemu-system-riscv64 with -bios none to prevent qemu from loading OpenSBI into address 0x80000000. Zephyr doesn't need OpenSBI and is itself loaded at that address, which is where qemu-riscv's ZSBL jumps to3.

Starting a new application

The Zephyr Example Application repo contains an example application that we can use as a reference for a workspace application (ie. an application that lives in the `zephyrproject` workspace we created earlier). However, I didn't have a good experience with it: according to the docs, we can simply clone the example application repo into an existing workspace, but that doesn't seem to work, and it looks like the docs are wrong about that. So I recommend starting from scratch, or taking the example applications in the zephyr/samples directory as templates as needed.

To create a new application, we simply have to make a directory for it in the workspace dir and write a minimum set of required files:


.
├── CMakeLists.txt
├── prj.conf
├── README.rst
└── src
    └── main.c
        

CMakeLists.txt contains the required instructions for CMake to find and build the sources (only main.c in this example):


cmake_minimum_required(VERSION 3.20.0)

find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(test_app)

target_sources(app PRIVATE src/main.c)
        

where test_app is the name of the application. prj.conf is meant to contain application-specific config options and will be empty for now. README.rst is optional.

Assuming the code in main.c is correct, we can then build the application for a specific target with:


west build -p always -b <target> <app_name>
        

where <app_name> is the directory containing the application files listed above. Note that west uses CMake under the hood, so the build will be based on whatever build system CMake uses (apparently, ninja by default), which means many of these operations can also be done at a lower level using the underlying build system commands (not recommended).

Building for different targets

Zephyr supports building applications for different target types or abstractions. While the end goal will normally be to have a firmware running on a SoC, for debugging purposes, for testing, or simply to carry out most of the development without relying on hardware, we can target qemu to run the application in an emulated environment, or we can even build the app as a native binary to run on the development machine.

The differences between targets can be abstracted through proper use of APIs and device tree definitions so, in theory, the same application (with certain limitations) can be seamlessly built for different targets without modifications, and the build process takes care of doing the right thing depending on the target.

As an example, let's build and run the hello_world sample program in three different targets with different architectures: native_sim (x86_64 with emulated devices), qemu (Risc-V64 with full system emulation) and a real board, a Raspberry Pi Pico 2W (ARM Cortex-M33).

Before starting, let's clean up any previous builds:


west build -t clean
        

Now, to build and run the application as a native binary:


west build -p always -b native_sim/native/64 zephyr/samples/hello_world
[... omitted build output]

./build/zephyr/zephyr.exe 
*** Booting Zephyr OS build v4.1.0-6569-gf4a0beb2b7b1 ***
Hello World! native_sim/native/64
        

For Risc-V64 on qemu:


west build -t clean
west build -p always -b qemu_riscv64 zephyr/samples/hello_world
[... omitted build output]

west build -t run
*** Booting Zephyr OS build v4.1.0-6569-gf4a0beb2b7b1 ***
Hello World! qemu_riscv64/qemu_virt_riscv64
        

For the Raspberry Pi Pico 2W:


west build -t clean
west build -p always -b rpi_pico2/rp2350a/m33 zephyr/samples/hello_world
[... omitted build output]

west flash -r uf2
        

In this case, flashing and checking the console output are board-specific steps. Assuming the flashing process worked, if we connect to the board UART0, we can see the output message:


*** Booting Zephyr OS build v4.1.0-6569-gf4a0beb2b7b1 ***
Hello World! rpi_pico2/rp2350a/m33
        

Note that the application prints that line like this:


#include <stdio.h>

int main(void)
{
	printf("Hello World! %s\n", CONFIG_BOARD_TARGET);

	return 0;
}
        

The output of printf will be sent through the target's zephyr,console device, whatever that is defined to be in its device tree. So, for native_sim:


/ {
[...]
	chosen {
		zephyr,console = &uart0;
[...]
	uart0: uart {
		status = "okay";
		compatible = "zephyr,native-pty-uart";
		/* Dummy current-speed entry to comply with serial
		 * DTS binding
		 */
		current-speed = <0>;
	};
        

Which will eventually print to stdout (see drivers/console/posix_arch_console.c and scripts/native_simulator/native/src/nsi_trace.c). For qemu_riscv64:


/ {
	chosen {
		zephyr,console = &uart0;
[...]

&uart0 {
	status = "okay";
};
        

and from virt-riscv.dtsi:


uart0: uart@10000000 {
	interrupts = < 0x0a 1 >;
	interrupt-parent = < &plic >;
	clock-frequency = < 0x384000 >;
	reg = < 0x10000000 0x100 >;
	compatible = "ns16550";
	reg-shift = < 0 >;
};
        

For the Raspberry Pi Pico 2W:


/ {
	chosen {
[...]
		zephyr,console = &uart0;

[...]

&uart0 {
	current-speed = <115200>;
	status = "okay";
	pinctrl-0 = <&uart0_default>;
	pinctrl-names = "default";
};
        

and from rp2350.dtsi:


uart0: uart@40070000 {
	compatible = "raspberrypi,pico-uart", "arm,pl011";
	reg = <0x40070000 DT_SIZE_K(4)>;
	clocks = <&clocks RPI_PICO_CLKID_CLK_PERI>;
	resets = <&reset RPI_PICO_RESETS_RESET_UART0>;
	interrupts = <33 RPI_PICO_DEFAULT_IRQ_PRIORITY>;
	interrupt-names = "uart0";
	status = "disabled";
};
        

This shows we can easily build our applications using hardware abstractions and have them working on different platforms using the same code and build environment.

What's next?

Now that we're set up and ready to work, we can start doing more interesting things. In a follow-up post I'll show a concrete example of an application that showcases most of the features listed above.

1: Most of them are generally labelled as RTOSs, although the "RT" there is used rather loosely.

2: ThreadX is now an option too, having become open source recently. It brings certain features that are more common in proprietary systems, such as security certifications, and it looks like it was designed in a more focused way. In contrast, it lacks the ecosystem and other perks of open source projects (ease of adoption, rapid community-based growth).

3: https://popovicu.com/posts/risc-v-sbi-and-full-boot-process/.

by rcn at August 07, 2025 12:00 PM

Eric Meyer

Infinite Pixels

I was on one of my rounds of social media trawling, just seeing what was floating through the aether, when I came across a toot by Andy P that said:

Fun #css trick:

width: calc(infinity * 1px);
height: calc(infinity * 1px);

…and I immediately thought, This is a perfect outer-limits probe! By which I mean, if I hand a browser values that are effectively infinite by way of the infinity keyword, it will necessarily end up clamping to something finite, thus revealing how far it’s able or willing to go for that property.

The first thing I did was exactly what Andy proposed, with a few extras to zero out box model extras:

div {
	width: calc(infinity * 1px);  
	height: calc(infinity * 1px);
	margin: 0;
	padding: 0; }
<body>
   <div>I’m huge!</div>
</body>

Then I loaded the (fully valid HTML 5) test page in Firefox Nightly, Chrome stable, and Safari stable, all on macOS, and things pretty immediately got weird:

Element Size Results

Browser             Computed value       Layout value
Safari              33,554,428           33,554,428
Chrome              33,554,400           33,554,400
Firefox (Nightly)   19.2 / 17,895,700    19.2 / 8,947,840 †

† height / width

Chrome and Safari both get very close to 2^25-1 (33,554,431), with Safari backing off from that by just 3 pixels, and Chrome by 31.  I can’t even hazard a guess as to why this sort of value would be limited in that way; if there was a period of time where 24-bit values were in vogue, I must have missed it.  I assume this is somehow rooted in the pre-Blink-fork codebase, but who knows. (Seriously, who knows?  I want to talk to you.)

But the faint whiff of oddness there has nothing on what’s happening in Firefox.  First off, the computed height is 19.2px, which is the height of a line of text at default font size and line height.  If I explicitly gave it line-height: 1, the height of the <div> changes to 16px.  All this is despite my assigning a height of infinite pixels!  Which, to be fair, is not really possible to do, but does it make sense to just drop it on the floor rather than clamp to an upper bound?

Even if that can somehow be said to make sense, it only happens with height.  The computed width value is, as indicated, nearly 17.9 million, which is not the content width and is also nowhere close to any power of two.  But the actual layout width, according to the diagram in the Layout tab, is just over 8.9 million pixels; or, put another way, one-half of 17,895,700 minus 10.

This frankly makes my brain hurt.  I would truly love to understand the reasons for any of these oddities.  If you know from whence they arise, please, please leave a comment!  The more detail, the better.  I also accept trackbacks from blog posts if you want to get extra-detailed.

For the sake of my aching skullmeats, I almost called a halt there, but I decided to see what happened with font sizes.

div {
	width: calc(infinity * 1px);  
	height: calc(infinity * 1px);
	margin: 0;
	padding: 0;
	font-size: calc(infinity * 1px); }

My skullmeats did not thank me for this, because once again, things got… interesting.

Font Size Results

Browser             Computed value    Layout value
Safari              100,000           100,000
Chrome              10,000            10,000
Firefox (Nightly)   3.40282e38        2,400 / 17,895,700 †

† line-height values of normal / 1

Safari and Chrome have pretty clearly set hard limits, with Safari’s an order of magnitude larger than Chrome’s.  I get it: what are the odds of someone wanting their text to be any larger than, say, a viewport height, let alone ten or 100 times that height?  What intrigues me is the nature of the limits, which are so clearly base-ten numbers that someone typed in at some point, rather than being limited by setting a register size or variable length or something that would have coughed up a power of two.

And speaking of powers of two… ah, Firefox.  Your idiosyncrasy continues.  The computed value is a 32-bit single-precision floating-point number.  It doesn’t get used in any of the actual rendering, but that’s what it is.  Instead, the actual font size of the text, as judged by the Box Model diagram on the Layout tab, is… 2,400 pixels.

Except, I can’t say that’s the actual actual font size being used: I suspect the actual value is 2,000 with a line height of 1.2, which is generally what normal line heights are in browsers. “So why didn’t you just set line-height: 1 to verify that, genius?” I hear you asking.  I did!  And that’s when the layout height of the <div> bloomed to just over 8.9 million pixels, like it probably should have in the previous test!  And all the same stuff happened when I moved the styles from the <div> to the <body>!

I’ve started writing at least three different hypotheses for why this happens, and stopped halfway through each because each hypothesis self-evidently fell apart as I was writing it.  Maybe if I give my whimpering neurons a rest, I could come up with something.  Maybe not.  All I know is, I’d be much happier if someone just explained it to me; bonus points if their name is Clarissa.

Since setting line heights opened the door to madness in font sizing, I thought I’d try setting line-height to infinite pixels and see what came out.  This time, things were (relatively speaking) more sane.
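
The declaration under test here is presumably just the line-height analogue of the earlier snippets:

div {
	line-height: calc(infinity * 1px);
}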

Line Height Results

| Browser           | Computed value | Layout value |
|-------------------|----------------|--------------|
| Safari            | 33,554,428     | 33,554,428   |
| Chrome            | 33,554,400     | 33,554,400   |
| Firefox (Nightly) | 17,895,700     | 8,947,840    |

Essentially, the results were the same as what happened with element widths in the first example: Safari and Chrome were very close to 2²⁵ − 1, and Firefox had its thing of a strange computed value and a rendering size not quite half the computed value.

I’m sure there’s a fair bit more to investigate about infinite-pixel values, or about infinite values in general, but I’m going to leave this here because my gray matter needs a rest and possibly a pressure washing.  Still, if you have ideas for infinitely fun things to jam into browser engines and see what comes out, let me know.  I’m already wondering what kind of shenanigans, other than in z-index, I can get up to with calc(-infinity)



by Eric Meyer at August 07, 2025 11:30 AM

August 03, 2025

Emmanuele Bassi

Governance in GNOME

How do things happen in GNOME?

Things happen in GNOME? Could have fooled me, right?

Of course, things happen in GNOME. After all, we have been releasing every six months, on the dot, for nearly 25 years. Assuming we’re not constantly re-releasing the same source files, we have to conclude that things change inside each project that makes up GNOME, and thus things happen that involve more than one project.

So let’s roll back a bit.

GNOME’s original sin

We all know Havoc Pennington’s essay on preferences; it’s one of GNOME’s foundational texts, and we refer to it pretty much constantly both inside and outside the contributors community. It has guided our decisions and taste for over 20 years. As far as foundational texts go, though, it applies to design philosophy, not to project governance.

When talking about the inception and technical direction of the GNOME project there are really two foundational texts that describe the goals of GNOME, as well as the mechanisms that are employed to achieve those goals.

The first one is, of course, Miguel’s announcement of the GNOME project itself, sent to the GTK, Guile, and (for good measure) the KDE mailing lists:

We will try to reuse the existing code for GNU programs as much as possible, while adhering to the guidelines of the project. Putting nice and consistent user interfaces over all-time favorites will be one of the projects. — Miguel de Icaza, “The GNOME Desktop project.” announcement email

Once again, everyone related to the GNOME project is (or should be) familiar with this text.

The second foundational text is not as familiar, outside of the core group of people that were around at the time. I am referring to Derek Glidden’s description of the differences between GNOME and KDE, written five years after the inception of the project. I isolated a small fragment of it:

Development strategies are generally determined by whatever light show happens to be going on at the moment, when one of the developers will leap up and scream “I WANT IT TO LOOK JUST LIKE THAT” and then straight-arm his laptop against the wall in an hallucinogenic frenzy before vomiting copiously, passing out and falling face-down in the middle of the dance floor. — Derek Glidden, GNOME vs KDE

What both texts have in common is subtle, but explains the origin of the project. You may not notice it immediately, but once you see it you can’t unsee it: it’s the over-reliance on personal projects and taste, to be sublimated into a shared vision. A “bottom up” approach, with “nice and consistent user interfaces” bolted on top of “all-time favorites”, with zero indication of how those nice and consistent UIs would work on extant code bases, all driven by somebody with a vision—drug-induced or otherwise—who decides to lead the project towards its implementation.

It’s been nearly 30 years, but GNOME still works that way.

Sure, we’ve had a HIG for 25 years, and the shared development resources that the project provides tend to mask this, to the point that everyone outside the project assumes that all people with access to the GNOME commit bit work on the whole project, as a single unit. If you are here, listening to (or reading) this, you know it’s not true. In fact, it is so comically removed from the lived experience of everyone involved in the project that we generally joke about it.

Herding cats and vectors sum

During my first GUADEC, back in 2005, I saw a great slide from Seth Nickell, one of the original GNOME designers. It showed GNOME contributors represented as a jumble of vectors going in all directions, cancelling each other out; the occasional movement in the project was the result of somebody pulling or pushing harder in their direction.

Of course, this is not the exclusive province of GNOME: you could take most complex free and open source software projects and draw a similar diagram. I contend, though, that when it comes to GNOME this is not emergent behaviour but behaviour baked into the project from its very inception: a loosey-goosey collection of cats, herded together by whoever shows up with “a vision”, but, also, a collection of loosely coupled projects. Over the years we tried to put to rest the notion that GNOME is a box of LEGO, meant to be assembled by distributors and users in the way they like most; while our software stack has graduated from the “thrown together at the last minute” quality of its first decade, our community is still very much following that very same model. The only reason it seems to work is that we have a few people maintaining a lot of components.

On maintainers

I am a software nerd, and one of the side effects of this terminal condition is that I like optimisation problems. Optimising software is inherently boring, though, so I end up trying to optimise processes and people. The fundamental truth of process optimisation, just like software, is to avoid unnecessary work—which, in some cases, means optimising away the people involved.

I am afraid I will have to be blunt, here, so I am going to ask for your forgiveness in advance.

Let’s say you are a maintainer inside a community of maintainers. Dealing with people is hard, and the lord forbid you talk to other people about what you’re doing, what they are doing, and what you can do together, so you only have a few options available.

The first one is: you carve out your niche. You start, or take over, a project, or an aspect of a project, and you try very hard to make yourself indispensable, so that everything ends up passing through you, and everyone has to defer to your taste, opinion, or edict.

Another option: API design is opinionated, and reflects the thoughts of the person behind it. By designing platform API, you try to replicate your thoughts, taste, and opinions in the minds of the people using it, like the eggs of a parasitic wasp; because if everybody thinks like you, then there won’t be conflicts, and you won’t have to deal with details, like “how to make this application work”, or “how to share functionality”; or, you know, having to develop a theory of mind for relating to other people.

Another option: you try to reimplement the entirety of a platform by yourself. You start a bunch of projects, which require starting a bunch of dependencies, which require refactoring a bunch of libraries, which ends up cascading into half of the stack. Of course, since you’re by yourself, you end up with a consistent approach to everything. Everything is as it ought to be: fast, lean, efficient, a reflection of your taste, commitment, and ethos. You made everyone else redundant, which means people depend on you, but also nobody is interested in helping you out, because you are now taken for granted, on the one hand, and nobody is able to get a word in edgewise about what you made, on the other.

I purposefully did not name names, even though we can all recognise somebody in these examples. For instance, I recognise myself. I have been all of these examples, at one point or another over the past 20 years.

Painting a target on your back

But if this is what it looks like from within a project, what it looks like from the outside is even worse.

Once you start dragging other people along, you raise your visibility; people start learning your name, because you appear in the issue tracker, on Matrix/IRC, on Discourse and Planet GNOME. Youtubers and journalists start asking you questions about the project. Randos on web forums start associating you with everything GNOME does, or does not do; with features, design, and bugs. You become responsible for every decision, whether you are or not, and this leads to being the embodiment of all the evil the project does. You’ll get hate mail, you’ll be harassed, your words will be used against you and the project for ever and ever.

Burnout and you

Of course, that ends up burning people out; it would be absurd if it didn’t. Even in the best case possible, you’ll end up burning out just by reaching empathy fatigue, because everyone has access to you, and everyone has their own problems and bugs and features and wouldn’t it be great to solve every problem in the world? This is similar to burnout from working for non-profits, as opposed to the typical corporate burnout: you get into a feedback loop where you don’t want to distance yourself from the work you do, because the work you do gives meaning to yourself and to the people that use it; and yet working on it hurts you. It also empowers bad faith actors to hound you down to the ends of the earth, until you realise that turning sand into computers was a terrible mistake, and we should have torched the first personal computer on sight.

Governance

We want to have structure, so that people know what to expect and how to navigate the decision making process inside the project; we also want to avoid having a sacrificial lamb that takes on all the problems in the world on their shoulders until we burn them down to a cinder and they have to leave. We’re 28 years too late to have a benevolent dictator, self-appointed or otherwise, and we don’t want to have a public consultation every time we want to deal with a systemic feature. What do we do?

Examples

What do other projects have to teach us about governance? We are not the only complex free software project in existence, and it would be an appalling measure of narcissism to believe that we’re special in any way, shape or form.

Python

We should all know what a Python PEP is, but if you are not familiar with the process I strongly recommend going through it. It’s well documented, and pretty much the de facto standard for any complex free and open source project that has achieved escape velocity from a centralised figure in charge of the whole decision making process. The real achievement of the Python community is that it adopted this policy long before their centralised figure called it quits. The interesting thing about the PEP process is that it is used to codify the governance of the project itself; the PEP template is a PEP; teams are defined through PEPs; target platforms are defined through PEPs; deprecations are defined through PEPs; all project-wide processes are defined through PEPs.

Rust

Rust has a similar process for language, tooling, and standard library changes, called RFC. The RFC process is more lightweight on the formalities than Python’s PEPs, but it’s still very well defined. Rust, being a project that came into existence in a Post-PEP world, adopted the same type of process, and used it to codify teams, governance, and any and all project-wide processes.

Fedora

Fedora change proposals exist to discuss and document both self-contained changes (usually fairly uncontroversial, given that they are proposed by the owners of the module being changed) and system-wide changes. The main difference between them is that most of the elements of a system-wide change proposal are required, whereas for self-contained proposals they can be optional; for instance, a system-wide change must have a contingency plan, a way to test it, and the impact on documentation and release notes, whereas a self-contained change does not.

GNOME

Turns out that we once did have “GNOME Enhancement Proposals” (GEP), mainly modelled on Python’s PEPs, from 2002. If this comes as a surprise, that’s because they lasted for about a year, mainly because it was a reactionary process to try and funnel some of the large controversies of the 2.0 development cycle into a productive outlet that didn’t involve flames and people dramatically quitting the project. GEPs failed once the community fractured, and people started working in silos, either under their own direction or, more likely, under their management’s direction. What’s the point of discussing a project-wide change, when that change was going to be implemented by people already working together?

The GEP process mutated into the lightweight “module proposal” process, where people discussed adding and removing dependencies on the desktop development mailing list—something we also lost over the 2.x cycle, mainly because the amount of discussions over time tended towards zero. The people involved with the change knew what those modules brought to the release, and people unfamiliar with them were either giving out unsolicited advice, or were simply not reached by the desktop development mailing list. The discussions turned into external dependencies notifications, which also died out because apparently asking to compose an email to notify the release team that a new dependency was needed to build a core module was far too much of a bother for project maintainers.

The creation and failure of GEP and module proposals is both an indication of the need for structure inside GNOME, and how this need collides with the expectation that project maintainers have not just complete control over every aspect of their domain, but that they can also drag out the process until all the energy behind it has dissipated. Being in charge for the long run allows people to just run out the clock on everybody else.

Goals

So, what should be the goal of a proper technical governance model for the GNOME project?

Diffusing responsibilities

This should be goal zero of any attempt at structuring the technical governance of GNOME. We have too few people in too many critical positions. We can call it “efficiency”, we can call it “bus factor”, we can call it “bottleneck”, but the result is the same: the responsibility for anything is too concentrated. This is how you get conflict. This is how you get burnout. This is how you paralyse a whole project. By having too few people in positions of responsibility, we don’t have enough slack in the governance model; it’s an illusion of efficiency.

Responsibility is not something to hoard: it’s something to distribute.

Empowering the community

The community of contributors should be able to know when and how a decision is made; it should be able to know what to do once a decision is made. Right now, the process is opaque because it’s done inside a million different rooms, and, more importantly, it is not recorded for posterity. Random GitLab issues should not be the only place where people can be informed that some decision was taken.

Empowering individuals

Individuals should be able to contribute to a decision without necessarily becoming responsible for a whole project. That is daunting, and requires a measure of hubris that cannot be allowed to exist in a shared space. In a similar fashion, we should empower people that want to contribute to the project by reducing the amount of fluff coming from people who have zero stakes in it and are interested only in giving out an opinion on their perfectly spherical, frictionless desktop environment.

It is free and open source software, not free and open mic night down at the pub.

Actual decision making process

We say we work by rough consensus, but if a single person is responsible for multiple modules inside the project, we’re just deceiving ourselves. I should not be able to design something on my own, commit it to all projects I maintain, and then go home, regardless of whether what I designed is good or necessary.

Proposed GNOME Changes✝

✝ Name subject to change

PGCs

We have better tools than the ones the GEP process had at its disposal. We have better communication venues in 2025; we have better validation; we have better publishing mechanisms.

We can take a lightweight approach, with a well-defined process, and use it not for actual design or decision-making, but for discussion and documentation. If you are trying to design something and you use this process, you are by definition Doing It Wrong™. You should have a design ready, and a series of steps to achieve it, as part of a proposal. You should already know the projects involved, and already have an idea of the effort needed to make something happen.

Once you have a formal proposal, you present it to the various stakeholders, and iterate over it to improve it, clarify it, and amend it, until you have something that has a rough consensus among all the parties involved. Once that’s done, the proposal is now in effect, and people can refer to it during the implementation, and in the future. This way, we don’t have to ask people to remember a decision made six months, two years, ten years ago: it’s already available.

Editorial team

Proposals need to be valid in order to be presented to the community at large; that validation comes from an editorial team. The editors of the proposals are not there to evaluate their contents: they are there to ensure that the proposal is going through the expected steps, and that discussions related to it remain relevant and constrained within the accepted period and scope. They are there to steer the discussion, and avoid architecture astronauts parachuting into the issue tracker or Discourse to give their unwarranted opinion.

Once the proposal is open, the editorial team is responsible for its inclusion in the public website, and for keeping track of its state.

Steering group

The steering group is the final arbiter of a proposal. They are responsible for accepting it, or rejecting it, depending on the feedback from the various stakeholders. The steering group does not design or direct GNOME as a whole: they are the ones that ensure that communication between the parts happens in a meaningful manner, and that rough consensus is achieved.

The steering group is also, by design, not the release team: it is made of representatives from all the teams related to technical matters.

Is this enough?

Sadly, no.

Reviving a process for proposing changes in GNOME without addressing the shortcomings of its first iteration would inevitably lead to a repeat of its results.

We have better tooling, but the problem is still that we’re demanding that each project maintainer gets on board with a process that has no mechanism to enforce compliance.

Once again, the problem is that we have a bunch of fiefdoms that need to be opened up to ensure that more people can work on them.

Whither maintainers

In what was, in retrospect, possibly one of my least gracious and yet most prophetic moments on the desktop development mailing list, I once said that, if it were possible, I would have already replaced all GNOME maintainers with a shell script. Turns out that we did replace a lot of what maintainers used to do, and we used a large Python service to do that.

Individual maintainers should not exist in a complex project—for both the project’s and the contributors’ sake. They are inefficiency made manifest, a bottleneck, a point of contention in a distributed environment like GNOME. Luckily for us, we almost made them entirely redundant already! Thanks to the release service and CI pipelines, we don’t need a person spinning up a release archive and uploading it into a file server. We just need somebody to tag the source code repository, and anybody with the right permissions could do that.

We need people to review contributions; we need people to write release notes; we need people to triage the issue tracker; we need people to contribute features and bug fixes. None of those tasks require the “maintainer” role.

So, let’s get rid of maintainers once and for all. We can delegate the actual release tagging of core projects and applications to the GNOME release team; they are already releasing GNOME anyway, so what’s the point in having them wait every time for somebody else to do individual releases? All people need to do is to write down what changed in a release, and that should be part of a change itself; we have centralised release notes, and we can easily extract the list of bug fixes from the commit log. If you can ensure that a commit message is correct, you can also get in the habit of updating the NEWS file as part of a merge request.

Additional benefits of having all core releases done by a central authority are that we get people to update the release notes every time something changes; and that we can sign all releases with a GNOME key that downstreams can rely on.

Embracing special interest groups

But it’s still not enough.

Especially when it comes to the application development platform, we have already a bunch of components with an informal scheme of shared responsibility. Why not make that scheme official?

Let’s create the SDK special interest group; take all the developers for the base libraries that are part of GNOME—GLib, Pango, GTK, libadwaita—and formalise the group of people that currently does things like development, review, bug fixing, and documentation writing. Everyone in the group should feel empowered to work on all the projects that belong to that group. We already are, except we end up deferring to somebody that is usually too busy to cover every single module.

Other special interest groups should be formed around the desktop, the core applications, the development tools, the OS integration, the accessibility stack, the local search engine, the system settings.

Adding more people to these groups is not going to be complicated, or introduce instability, because the responsibility is now shared; we would not be taking somebody that is already overworked, or even potentially new to the community, and plopping them into the hot seat, ready for a burnout.

Each special interest group would have a representative in the steering group, alongside teams like documentation, design, and localisation, thus ensuring that each aspect of the project technical direction is included in any discussion. Each special interest group could also have additional sub-groups, like a web services group in the system settings group; or a networking group in the OS integration group.

What happens if I say no?

I get it. You like being in charge. You want to be the one calling the shots. You feel responsible for your project, and you don’t want other people to tell you what to do.

If this is how you feel, then there’s nothing wrong with parting ways with the GNOME project.

GNOME depends on a ton of projects hosted outside GNOME’s own infrastructure, and we communicate with people maintaining those projects every day. It’s 2025, not 1997: there’s no shortage of code hosting services in the world, we don’t need to have them all on GNOME infrastructure.

If you want to play with the other children, if you want to be part of GNOME, you get to play with a shared set of rules; and that means sharing all the toys, and not hoarding them for yourself.

Civil service

What we really want GNOME to be is a group of people working together. We already are, somewhat, but we can be better at it. We don’t want rule and design by committee, but we do need structure, and we need that structure to be based on expertise; to have distinct spheres of competence; to have continuity across time; and to be based on rules. We need something flexible, to take into account the needs of GNOME as a project, and be capable of growing in complexity so that nobody can be singled out, brigaded on, or burnt to a cinder on the sacrificial altar.

Our days of passing out in the middle of the dance floor are long gone. We might not all be old—actually, I’m fairly sure we aren’t—but GNOME has long ceased to be something we can throw together at the last minute just because somebody assumed the mantle of a protean ruler, and managed to involve themselves with every single project until they became the literal embodiment of an autocratic force capable of dragging everybody else towards a goal, until they burn out and have to leave for their own sake.

We can do better than this. We must do better.

To sum up

Stop releasing individual projects, and let the release team do it when needed.

Create teams to manage areas of interest, instead of single projects.

Create a steering group from representatives of those teams.

Every change that affects one or more teams has to be discussed and documented in a public setting among contributors, and then published for future reference.

None of this should be controversial because, outside of the publishing bit, it’s how we are already doing things. This proposal aims at making it official so that people can actually rely on it, instead of having to divine the process out of thin air.


The next steps

We’re close to the GNOME 49 release, now that GUADEC 2025 has ended, so people are busy working on tagging releases, fixing bugs, and the work on the release notes has started. Nevertheless, we can already start planning for an implementation of a new governance model for GNOME for the next cycle.

First of all, we need to create teams and special interest groups. We don’t have a formal process for that, so this is also a great chance to introduce the change proposal process as a mechanism for structuring the community, just like the Python and Rust communities do. Teams will need their own space for discussing issues and sharing the load. The first team I’d like to start is an “introspection and language bindings” group, for all bindings hosted on GNOME infrastructure; it would act as a point of reference for all decisions involving projects that consume the GNOME software development platform through its machine-readable ABI description. Another group I’d like to create is an editorial group for the developer and user documentation; documentation benefits from a consistent editorial voice, while the process of writing documentation should be open to everybody in the community.

A very real issue that was raised during GUADEC is bootstrapping the steering committee: who gets to be on it, what the committee’s remit is, and how it works. There are options, but if we want the steering committee to be a representation of the technical expertise of the GNOME community, it also has to be established by the very same community; in this sense, the board of directors, as representatives of the community, could work on defining the powers and composition of this committee.

There are many more issues we are going to face, but I think we can start from these and evaluate our own version of a technical governance model that works for GNOME, and that can grow with the project. In the next couple of weeks I’ll start publishing drafts for team governance and the power/composition/procedure of the steering committee, mainly for iteration and comments.

by ebassi at August 03, 2025 07:48 PM

July 28, 2025

Igalia WebKit Team

WebKit Igalia Periodical #32

Update on what happened in WebKit in the week from July 21 to July 28.

This week the trickle of improvements to the graphics stack continues with more font handling work and tuning of the damage information; plus, the WPEPlatform Wayland backend gets server-side decorations with some compositors.

Cross-Port 🐱

Graphics 🖼️

The font-variant-emoji CSS property is now enabled by default in the GTK and WPE ports.

Font synthesis properties (synthetic bold/italic) are now correctly handled, so that fonts are rendered bold or italic even when the font itself does not provide these variants.

A few minor improvements to the damage propagation feature have landed.

The screen device scaling factor in use is now shown in the webkit://gpu internal information page.

WPE WebKit 📟

WPE Platform API 🧩

New, modern platform API that supersedes usage of libwpe and WPE backends.

The Wayland backend included with WPEPlatform has been taught how to request server-side decorations using the XDG Decoration protocol. This means that compositors that support the protocol will provide window frames and title bars for WPEToplevel instances. While this is a welcome quality of life improvement in many cases, window decorations will not be shown on Weston and Mutter (used by GNOME Shell among others), as they do not support the protocol at the moment.

WPE MiniBrowser, showing server-side decorations with the Labwc compositor

That’s all for this week!

by Igalia WebKit Team at July 28, 2025 09:04 PM

July 25, 2025

Víctor Jáquez

Summer updates

Somehow I internalized that my duty as a software programmer was to silently work on a piece of code as if it were a magnum opus, until it’s finished, and then release it to the world with no need of explanations, because it should speak for itself. In other words, I tend to consider my work as a form of art, and myself as an artist. But I’m not. There’s no magnum opus and there will never be one. I’m rather a craftsman, in the sense of Richard Sennett: somebody who cares about their craft, making small, quick but thoughtful and clean changes, here and there, hoping that they will be useful to someone, now and in the future. And those little efforts need to be exposed openly, in spaces such as this one and social media, as if I were a bazaar merchant.

This reflection invites me to add another task to my duties as a software programmer: a periodical exposition of the work done. And this is the first attempt to forge a (monthly) discipline in that direction, not in the sense of bragging, or of looking to overprice a product (in the sense of commodity fetishism), but to build bridges with those that might find those pieces of software useful.

Let’s start.

GStreamer YUV4MPEG2 encoder and decoder #

We have been working lately on video encoding, and we wanted an easy way to test our work, using common samples such as those shared in Derf’s collection. They are in a file format known as YUV4MPEG2, or more commonly y4m, because of their file name extension.

YUV4MPEG2 is a simple file format designed to hold uncompressed frames of YUV video, formatted as YCbCr 4:2:0, YCbCr 4:2:2 or YCbCr 4:4:4 data for the purpose of encoding. Instead of using raw YUV streams, where the frame size and color format have to be provided out-of-band, these metadata are embedded in the file.
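
For illustration, a y4m file is just a plain-text header line describing the stream, followed by the frames, each one introduced by a FRAME marker. A header might look roughly like this (the values below are made up):

YUV4MPEG2 W1280 H720 F30:1 Ip A1:1 C420
FRAME
…raw YCbCr planes of the first frame…
FRAME
…raw YCbCr planes of the second frame…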

There were already GStreamer elements for encoding and decoding y4m streams, but y4mdec was in gst-plugins-bad while y4menc was in gst-plugins-good.

Our first task was to fix and improve y4menc [!8654], and then move y4mdec to gst-plugins-good [!8719], which implied rewriting the element and adding unit tests, while also adding more features such as handling more color formats.
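
As an illustration (not taken from the merge requests above), a quick round trip with the two elements can be done with gst-launch-1.0:

$ gst-launch-1.0 videotestsrc num-buffers=150 ! video/x-raw,format=I420 ! y4menc ! filesink location=test.y4m
$ gst-launch-1.0 filesrc location=test.y4m ! y4mdec ! videoconvert ! autovideosink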

Soothe — video encoders testing framework #

Heavily inspired by Fluster, a testing framework written in Python for decoder conformance, we are sketching Soothe, a script that aims to be a testing framework for video encoders, using VMAF, a perceptual video quality assessment algorithm.

GStreamer Vulkan H.264 encoder #

This is the reason for the efforts described above: vulkanh264enc, an H.264 encoder using the Vulkan Video extension [!7197].

One interesting part of this task was to propose a base class for hardware-accelerated H.264 encoders, based on vah264enc, the GStreamer VA-API H.264 encoder. We talked about this base class at the GStreamer Conference 2024.

Now the H.264 encoder is merged, and it will be part of the future GStreamer 1.28 release.

Removal of GStreamer-VAAPI subproject #

We’re very grateful to GStreamer-VAAPI. When its maintenance was handed over to us, after a few months we got the privilege to merge it as an official GStreamer subproject.

Now GStreamer-VAAPI functionality has been replaced with the VA plugin in gst-plugins-bad. Still, it isn’t a fully featured replacement [#3947], but it’s complete and stable enough to be widely deployed. As Tim said at the GStreamer Conference 2024: it just works.

So, the GStreamer-VAAPI subproject has been removed from the main branch of the git repository [!9200], and its GitLab project archived.

Vulkan Video Status page #

We believe that the Vulkan Video extension will be one of the main APIs for video encoding, decoding and processing. Igalia participates in the Vulkan Video Technical Sub Group (TSG) and helps with the Conformance Test Suite (CTS).

Vulkan Video extension is big and constantly updated. In order to keep track of it we maintain a web page with the latest news about the specification, proprietary drivers, open source drivers and open source applications, along with articles and talks about it.

https://vulkan-video-status.igalia.com

GStreamer Planet #

Last but not least, GStreamer Planet has been updated and overhauled.

Given that the old Planet script, written in Python 2, is unmaintained, we worked on a new one in Rust: planet-rs. It internally uses tera for templates, feed-rs for feed parsing, and reqwest for HTTP handling. The planet is generated using Gitlab scheduled CI pipelines.

https://gstreamer.freedesktop.org/planet

July 25, 2025 12:00 AM

July 21, 2025

Igalia WebKit Team

WebKit Igalia Periodical #31

Update on what happened in WebKit in the week from July 14 to July 21.

In this week we had a fix for the libsoup-based resource loader on platforms without the shared-mime-info package installed, a fix for SQLite usage in WebKit, ongoing work on the GStreamer-based WebRTC implementation including better encryption for its default DTLS certificate and removal of a dependency, and an update on the status of GNOME Web Canary version.

Cross-Port 🐱

ResourceLoader delegates local resource loading (e.g. gresources) to ResourceLoaderSoup, which in turn uses g_content_type_guess to identify their content type. In platforms where shared-mime-info is not available, this fails silently and reports "text/plain", breaking things such as PDFjs.

A patch was submitted to use MIMETypeRegistry to get the MIME type of these local resources, falling back to g_content_type_guess when that fails, making internal resource loading more resilient.

Fixed "PRAGMA incrementalVacuum" for SQLite, which is used to reclaim freed filesystem space.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Most web engines migrated from a default DTLS certificate signed with an RSA key to an ECDSA P-256 key almost a decade ago. GstWebRTC is now also signing its default DTLS certificate with that private key format. This improves compatibility with various SFUs, the Jitsi Video Bridge among them.

Work is on-going in GStreamer, adding support for getting the currently selected ICE candidates pair and a new webrtcbin signal to close the connection.

The WebKit GStreamer MediaRecorder backend no longer depends on GstTranscoder.

WPE WebKit 📟

WPE Android 🤖

Adaptation of WPE WebKit targeting the Android operating system.

Changed libpsl to include built-in public-suffix data when building WPE for Android. Among other duties, having this working correctly is important for site isolation, resource loading, and cookie handling.

Releases 📦️

The GNOME Web Canary build has been stale for several weeks, since the GNOME nightly SDK was updated to freedesktop SDK 25.08beta which no longer ships one of the WebKitGTK build dependencies (Ruby). We will do our best to get the builds back to a working state, soon hopefully.

That’s all for this week!

by Igalia WebKit Team at July 21, 2025 07:24 PM

July 15, 2025

Alberto Garcia

Converting QEMU qcow2 images directly to stdout

Introduction

Some months ago, my colleague Madeeha Javed and I wrote a tool to convert QEMU disk images into qcow2, writing the result directly to stdout.

This tool is called qcow2-to-stdout.py and can be used for example to create a new image and pipe it through gzip and/or send it directly over the network without having to write it to disk first.

This program is included in the QEMU repository: https://github.com/qemu/qemu/blob/master/scripts/qcow2-to-stdout.py

If you simply want to use it then all you need to do is have a look at these examples:

$ qcow2-to-stdout.py source.raw > dest.qcow2
$ qcow2-to-stdout.py -f dmg source.dmg | gzip > dest.qcow2.gz

If you’re interested in the technical details, read on.

A closer look under the hood

QEMU uses disk images to store the contents of the VM’s hard drive. Images are often in qcow2, QEMU’s native format, although a variety of other formats and protocols are also supported.

I have written in detail about the qcow2 format in the past (for example, here and here), but the general idea is very easy to understand: the virtual drive is divided into clusters of a certain size (64 KB by default), and only the clusters containing non-zero data need to be physically present in the qcow2 image. So what we have is essentially a collection of data clusters and a set of tables that map guest clusters (what the VM sees) to host clusters (what the qcow2 file actually stores).

A qcow2 file is a collection of data clusters plus some metadata to map them to what the guest VM sees.

qemu-img is a powerful and versatile tool that can be used to create, modify and convert disk images. It has many different options, but one question that sometimes arises is whether it can use stdin or stdout instead of regular files when converting images.

The short answer is that this is not possible in general. qemu-img convert works by checking the (virtual) size of the source image, creating a destination image of that same size and finally copying all the data from start to finish.

Reading a qcow2 image from stdin doesn’t work because data and metadata blocks can come in any arbitrary order, so it’s perfectly possible that the information that we need in order to start writing the destination image is at the end of the input data¹.

Writing a qcow2 image to stdout doesn’t work either because we need to know in advance the complete list of clusters from the source image that contain non-zero data (this is essential because it affects the destination file’s metadata). However, if we do have that information then writing a new image directly to stdout is technically possible.

The bad news is that qemu-img won’t help us here: it uses the same I/O code as the rest of QEMU. This generic approach makes total sense because it’s simple, versatile and is valid for any kind of source and destination image that QEMU supports. However, it needs random access to both images.

If we want to write a qcow2 file directly to stdout we need new code written specifically for this purpose, and since it cannot reuse the logic present in the QEMU code this was written as a separate tool (a Python script).

The process itself goes like this:

  • Read the source image from start to finish in order to determine which clusters contain non-zero data. These are the only clusters that need to be present in the new image.
  • Write to stdout all the metadata structures of the new image. This is now possible because after the previous step we know how much data we have and where it is located.
  • Read the source image again and copy the clusters with non-zero data to stdout.

Images created with this program always have the same layout: header, refcount tables and blocks, L1 and L2 tables, and finally all data clusters.
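
As an illustration of the first of those steps, here is a minimal sketch (not the actual qcow2-to-stdout.py code) that finds which clusters of a raw image contain non-zero data:

import sys

CLUSTER_SIZE = 64 * 1024  # qcow2 default cluster size

def nonzero_clusters(path, cluster_size=CLUSTER_SIZE):
    """Return the indexes of the clusters that contain non-zero data."""
    clusters = []
    with open(path, "rb") as f:
        idx = 0
        while True:
            data = f.read(cluster_size)
            if not data:
                break
            if any(data):  # at least one byte in the cluster is not zero
                clusters.append(idx)
            idx += 1
    return clusters

if __name__ == "__main__":
    print(nonzero_clusters(sys.argv[1]))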

One problem here is that, while QEMU can read many different image formats, qcow2-to-stdout.py is an independent tool that does not share any of the code and therefore can only read raw files. The solution here is to use qemu-storage-daemon. This program is part of QEMU and it can use FUSE to export any file that QEMU can read as a raw file. The usage of qemu-storage-daemon is handled automatically and the user only needs to specify the format of the source file:

$ qcow2-to-stdout.py -f dmg source.dmg > dest.qcow2

qcow2-to-stdout.py can only create basic qcow2 files and does not support features like compression or encryption. However, a few parameters can be adjusted, like the cluster size (-c), the width of the reference count entries (-r) and whether the new image is created with the input as an external data file (-d and -R).

And this is all, I hope that you find this tool useful and this post informative. Enjoy!

Acknowledgments

This work has been developed by Igalia and sponsored by Outscale, a Dassault Systèmes brand.

Logos of Igalia and Outscale

¹ This problem would not happen if the input data was in raw format but in this case we would not know the size in advance.

by berto at July 15, 2025 05:17 PM

July 14, 2025

Igalia WebKit Team

WebKit Igalia Periodical #30

Update on what happened in WebKit in the week from July 7 to July 14.

This week saw a fix for IPv6 scope-ids in DNS responses, frame pointers re-enabled in JSC developer builds, and a significant improvement to emoji fonts selection.

Cross-Port 🐱

Fixed support for IPv6 scope-ids in DNS responses.

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

Developer builds of JSC now default to having frame pointers, to allow for more useful backtraces.

Graphics 🖼️

Improved the selection of emoji fonts to follow the spec more closely, and ensure the choice is honored while iterating over fallback fonts.

This work has been done in preparation to enable the support for the new font-variant-emoji CSS property down the line.

That’s all for this week!

by Igalia WebKit Team at July 14, 2025 08:01 PM

July 11, 2025

Manuel Rego

Playing with the new caret CSS properties

This is a brief blog post about some experiments playing with the new caret-animation and caret-shape CSS properties.

Current status #

It’s been a while since Igalia worked on adding support for the caret-color property in Chromium/Blink (see my blog post from 2017), and more recently we have also been working on more properties to customize the insertion caret (see Stephen Chenney’s blog post from October last year).

Since then things have progressed and now caret-animation is shipping in Chromium since version 139 and caret-shape is being developed. So you can already start playing with these properties by enabling the experimental web platform features in Chromium since version 140.0.7288 (chrome://flags#enable-experimental-web-platform-features).

Some examples #

caret-shape syntax is pretty simple:

caret-shape: auto | bar | block | underscore

The initial value is auto, which means the browser can determine the shape of the caret to follow platform conventions in different situations; however, so far this always uses a bar caret (|). Then you can decide to use either a block (▮) or underscore (_) caret, which might be useful and give a nice touch to some kinds of applications, like a code editor.

Next you can see a very simple example which modifies the value of the caret-shape property so you can see how it works.

Screencast of the different caret-shape possible values

As you might have noticed, we’re only using the caret-shape: block property and not setting any particular color for it; in order to ensure the characters are still visible, the current Chromium implementation adds transparency to the block caret.

Let’s now combine the three CSS caret properties in a single example. Imagine we want a more fancy insertion caret that uses the block shape but blinks between two colors. To achieve something like this we have to use caret-color and also caret-animation so we can control how the caret is animated and change the color through CSS animations.

The source code of the following example is quite simple:

textarea {
  color: white;
  background: black;
  caret-shape: block;
  caret-animation: manual;
  animation: caret-block 2s step-end infinite;
}

@keyframes caret-block {
  0% { caret-color: #00d2ff; }
  50% { caret-color: #ffa6b9; }
}

As you can see, we’re using caret-shape: block to define that we want a block insertion caret, and also caret-animation: manual, which makes the browser stop animating the caret. Thus we have to use our own animation that modifies caret-color to switch colors.

Screencast of a block caret that blinks between two colors

Similar to that, you can create a rainbow caret with a fancier animation 🌈.
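
Such an animation could look roughly like this (a sketch, not necessarily the exact code behind the screencast below):

textarea {
  color: white;
  background: black;
  caret-shape: block;
  caret-animation: manual;
  animation: caret-rainbow 3s linear infinite;
}

@keyframes caret-rainbow {
  0% { caret-color: red; }
  20% { caret-color: orange; }
  40% { caret-color: yellow; }
  60% { caret-color: green; }
  80% { caret-color: blue; }
  100% { caret-color: red; }
}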

Screencast of a block caret that uses a rainbow animation to change colors

Or a caret that switches between block and underscore shapes.

Screencast of a caret that switches between block and underscore shapes

These are just some quick examples of how to use these new properties. You can start experimenting with them already, though caret-shape is still in the oven and its implementation is in active development. Remember that if you want to play with the linked examples you have to enable the experimental web platform features flag (via chrome://flags#enable-experimental-web-platform-features or passing --enable-experimental-web-platform-features).

Thanks to my colleagues Stephen Chenney and Ziran Sun, who have been working on the implementation of these features, and to Bloomberg for sponsoring this work as part of the ongoing collaboration with Igalia to improve the web platform.

Igalia logo Bloomberg logo
Igalia and Bloomberg working together to build a better web

July 11, 2025 12:00 AM

July 08, 2025

Andy Wingo

guile lab notebook: on the move!

Hey, a quick update, then a little story. The big news is that I got Guile wired to a moving garbage collector!

Specifically, this is the mostly-moving collector with conservative stack scanning. Most collections will be marked in place. When the collector wants to compact, it will scan ambiguous roots in the beginning of the collection cycle, marking objects referenced by such roots in place. Then the collector will select some blocks for evacuation, and when visiting an object in those blocks, it will try to copy the object to one of the evacuation target blocks that are held in reserve. If the collector runs out of space in the evacuation reserve, it falls back to marking in place.

Given that the collector has to cope with failed evacuations, it is easy to give it the ability to pin any object in place. This proved useful when making the needed modifications to Guile: for example, when we copy a stack slice containing ambiguous references to a heap-allocated continuation, we eagerly traverse that stack to pin the referents of those ambiguous edges. Also, whenever the address of an object is taken and exposed to Scheme, we pin that object. This happens frequently for identity hashes (hashq).

Anyway, the bulk of the work here was a pile of refactors to Guile to allow a centralized scm_trace_object function to be written, exposing some object representation details to the internal object-tracing function definition while not exposing them to the user in the form of API or ABI.

bugs

I found quite a few bugs. Not many of them were in Whippet, but some were, and a few are still there; Guile exercises a GC more than my test workbench is able to. Today I’d like to write about a funny one that I haven’t fixed yet.

So, small objects in this garbage collector are managed by a Nofl space. During a collection, each pointer-containing reachable object is traced by a global user-supplied tracing procedure. That tracing procedure should call a collector-supplied inline function on each of the object’s fields. Obviously the procedure needs a way to distinguish between different kinds of objects, to trace them appropriately; in Guile, we use the low bits of the initial word of heap objects for this purpose.

Object marks are stored in a side table in associated 4-MB aligned slabs, with one mark byte per granule (16 bytes). 4 MB is 0x400000, so for an object at address A, its slab base is at A & ~0x3fffff, and the mark byte is offset by (A & 0x3fffff) >> 4. When the tracer sees an edge into a block scheduled for evacuation, it first checks the mark byte to see if it’s already marked in place; in that case there’s nothing to do. Otherwise it will try to evacuate the object, which proceeds as follows...

But before you read, consider that there are a number of threads which all try to make progress on the worklist of outstanding objects needing tracing (the grey objects). The mutator threads are paused; though we will probably add concurrent tracing at some point, we are unlikely to implement concurrent evacuation. But it could be that two GC threads try to process two different edges to the same evacuatable object at the same time, and we need to do so correctly!

With that caveat out of the way, the implementation is here. The user has to supply an annoyingly-large state machine to manage the storage for the forwarding word; Guile’s is here. Basically, a thread will try to claim the object by swapping in a busy value (-1) for the initial word. If that worked, it will allocate space for the object. If that failed, it first marks the object in place, then restores the first word. Otherwise it installs a forwarding pointer in the first word of the object’s old location, which has a specific tag in its low 3 bits allowing forwarded objects to be distinguished from other kinds of object.

I don’t know how to prove this kind of operation correct, and probably I should learn how to do so. I think it’s right, though, in the sense that either the object gets marked in place or evacuated, all edges get updated to the tospace locations, and the thread that shades the object grey (and no other thread) will enqueue the object for further tracing (via its new location if it was evacuated).

But there is an invisible bug, and one that is the reason for me writing these words :) Whichever thread manages to shade the object from white to grey will enqueue it on its grey worklist. Let’s say the object is on a block to be evacuated, but evacuation fails, and the object gets marked in place. But concurrently, another thread goes to do the same; it turns out there is a timeline in which thread A has marked the object and published it to a worklist for tracing, but thread B has briefly swapped out the object’s first word with the busy value before realizing the object was marked. The object might then be traced with its initial word stompled, which is totally invalid.

What’s the fix? I do not know. Probably I need to manage the state machine within the side array of mark bytes, and not split between the two places (mark byte and in-object). Anyway, I thought that readers of this web log might enjoy a look in the window of this clown car.

next?

The obvious question is, how does it perform? Basically I don’t know yet; I haven’t done enough testing, and some of the heuristics need tweaking. As it is, it appears to be a net improvement over the non-moving configuration and a marginal improvement over BDW, but which currently has more variance. I am deliberately imprecise here because I have been more focused on correctness than performance; measuring properly takes time, and as you can see from the story above, there are still a couple correctness issues. I will be sure to let folks know when I have something. Until then, happy hacking!

by Andy Wingo at July 08, 2025 02:28 PM

July 07, 2025

Igalia WebKit Team

WebKit Igalia Periodical #29

Update on what happened in WebKit in the week from June 30 to July 7.

Improvements to Sysprof and related dependencies, WebKit's usage of std::variant replaced by mpark::variant, major WebXR overhauling, and support for the logd service on Android, are all part of this week's bundle of updates.

Cross-Port 🐱

The WebXR support in the GTK and WPE WebKit ports has been ripped out in preparation for an overhaul that will make it better fit WebKit's multi-process architecture.

The new implementation, still based on OpenXR, is being re-added piecewise, starting with the foundational support code to coordinate XR content inside the Web engine. Next, starting and stopping immersive sessions were brought back, along with a basic render loop.

Note these are the first steps on this effort, and there is still plenty to do before WebXR experiences work again.

Changed usage of std::variant in favor of an alternative implementation based on mpark::variant, which reduces the size of the built WebKit library—currently saves slightly over a megabyte for release builds.

WPE WebKit 📟

WPE Android 🤖

Adaptation of WPE WebKit targeting the Android operating system.

Logging support is being improved to submit entries to the logd service on Android, and also to configure logging using a system property. This makes debugging and troubleshooting issues on Android more manageable, and is particularly welcome to develop WebKit itself.

While working on this feature, the definition of logging channels was simplified, too.

Community & Events 🤝

WebKit on Linux integrates with Sysprof and reports a plethora of marks. As we reported more and more information to Sysprof, we eventually pushed its internals to their limit! To help with that, we're adding a new feature to Sysprof: hiding marks from view.

This required diving a little deeper into the stack, and adding a new feature to a dependency as well.

That’s all for this week!

by Igalia WebKit Team at July 07, 2025 08:49 PM

July 06, 2025

Jasmine Tang

Jasmine's first time in llvm land and her dotfiles

Jasmine reports on her first 3 weeks at Igalia and her dotfiles modification

July 06, 2025 12:00 AM

July 03, 2025

Igalia Compilers Team

Summary of the May 2025 TC39 plenary

Introduction #

Hello everyone! As we have with the last bunch of meetings, we're excited to tell you about all the new discussions taking place in TC39 meetings and how we try to contribute to them. However, this specific meeting holds an even more special place in our hearts, since Igalia had the privilege of organising it at our headquarters in A Coruña, Galicia. It was an absolute honor to host all the amazing delegates in our home city. We would like to thank everyone involved and look forward to hosting it again!

Let's delve together into some of the most exciting updates.

You can also read the full agenda and the meeting minutes on GitHub.

Progress Report: Stage 4 Proposals #

Array.fromAsync for stage 4 #

Array.from, which takes a synchronous iterable and dumps it into a new array, is one of Array's most frequently used built-in methods, especially for unit tests or CLI interfaces. However, there was no way to do the equivalent with an asynchronous iterator. Array.fromAsync solves this problem, being to Array.from as for await is to for. This proposal has now been shipping in all JS engines for at least a year (which means it's Baseline 2024), and it has been highly requested by developers.
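
For example (a minimal sketch, to be run in a module or another context where await is allowed):

async function* generate() {
  yield 1;
  yield 2;
  yield 3;
}

// Array.from() cannot consume an async iterable, but Array.fromAsync() can:
const numbers = await Array.fromAsync(generate());
console.log(numbers); // [1, 2, 3]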

From a bureaucratic point of view, however, the proposal was never really stage 3. In September 2022 it advanced to stage 3 with the condition that all three of the ECMAScript spec editors signed off on the spec text; and the editors requested that a pull request be opened against the spec with the actual changes. However, this PR was not opened until recently. So in this TC39 meeting, the proposal advanced to stage 4, conditional on the editors actually reviewing it.

  • Presenter(s): J. S. Choi

Explicit Resource Management for Stage 4 #

The Explicit Resource Management proposal introduces implicit cleanup callbacks for objects based on lexical scope. This is enabled through the new using x = declaration:

{
  using myFile = open(fileURL);
  const someBytes = myFile.read();

  // myFile will be automatically closed, and the
  // associated resources released, here at the
  // end of the block.
}

The proposal is now shipped in Chrome, Node.js and Deno, and it's behind a flag in Firefox. As such, Ron Buckton asked for (and obtained!) consensus to approve it for Stage 4 during the meeting.

Similarly to Array.fromAsync, it's not quite Stage 4 yet, as there is still something missing before including it in the ECMAScript standard: test262 tests need to be merged, and the ECMAScript spec editors need to approve the proposed specification text.

  • Presenter(s): Ron Buckton

Error.isError for stage 4 #

The Error.isError(objectToCheck) method provides a reliable way to check whether a given value is a real instance of Error. This proposal was originally presented by Jordan Harband in 2015, to address concerns about it being impossible to detect whether a given JavaScript value is actually an error object or not (did you know that you can throw anything, including numbers and booleans!?). It finally became part of the ECMAScript standard during this meeting.
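A small example of why instanceof alone is not a reliable check, and what the new method returns (semantics as described in the proposal):

const realError = new TypeError("boom");
const fakeError = Object.setPrototypeOf({ message: "boom" }, Error.prototype);

realError instanceof Error; // true
fakeError instanceof Error; // also true, even though it is not a real Error

Error.isError(realError);   // true
Error.isError(fakeError);   // false: it lacks the internal error "brand"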

  • Presenter(s): Jordan Harband

Adding Intl.Locale#variants #

Intl.Locale objects represent Unicode Locale identifiers; i.e., a combination of language, script, region, and preferences for things like collation or calendar type.

For example, de-DE-1901-u-co-phonebk means "the German language as spoken in Germany with the traditional German orthography from 1901, using the phonebook collation". They are composed of a language optionally followed by:

  • a script (i.e. an alphabet)
  • a region
  • one or more variants (such as "the traditional German orthography from 1901")
  • a list of additional modifiers (such as collation)

Intl.Locale objects already had accessors for querying multiple properties of the underlying locale, but were missing one for the variants due to an oversight. The committee reached consensus on also exposing them in the same way.
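In code, using the example locale from above (the variants accessor is the new addition; the other accessors already existed):

const locale = new Intl.Locale("de-DE-1901-u-co-phonebk");

locale.language;  // "de"
locale.region;    // "DE"
locale.collation; // "phonebk"
locale.variants;  // "1901" (multiple variants would be dash-separated)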

  • Presenter(s): Richard Gibson

Progress Report: Stage 3 Proposals #

Intl.Locale Info Stage 3 update #

The Intl.Locale Info Stage 3 proposal allows JavaScript applications to query some metadata specific to individual locales. For example, it's useful to answer the question: "what days are considered weekend in the ms-BN locale?".

The committee reached consensus on a change regarding information about text direction: in some locales text is written left-to-right, in others it's right-to-left, and for some of them it's unknown. The proposal now returns undefined for unknown directions, rather than falling back to left-to-right.
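A short sketch, assuming the getter methods from the current proposal draft (the ms-BN weekend value comes from CLDR data and is shown only as an illustration):

const msBN = new Intl.Locale("ms-BN");
msBN.getWeekInfo().weekend;  // e.g. [5, 7], i.e. Friday and Sunday

const en = new Intl.Locale("en");
en.getTextInfo().direction;  // "ltr"
// With the change agreed at this meeting, a locale with an unknown
// direction reports undefined instead of falling back to "ltr".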

  • Presenter(s): Shane F. Carr

Temporal status update igalia logo #

Our colleague Philip Chimento presented a regular status update on Temporal, the upcoming proposal for better date and time support in JS. The biggest news is that Temporal is now available in the latest Firefox release! The Ladybird, Graal, and Boa JS engines all have mostly-complete implementations. The committee agreed to make a minor change to the proposal, to the interpretation of the seconds (:00) component of UTC offsets in strings. (Did you know that there has been a time zone that shifted its UTC offset by just 20 seconds?)

  • Presenter(s): Philip Chimento

Immutable ArrayBuffer update #

The Immutable ArrayBuffer proposal allows creating ArrayBuffers in JS from read-only data, and in some cases allows zero-copy optimizations. After last time, the champions hoped they could get the tests ready for this plenary and ask for stage 3, but they did not manage to finish that on time. However, they did make a very robust testing plan, which should make this proposal "the most well-tested part of the standard library that we've seen thus far". The champions will ask to advance to stage 3 once all of the tests outlined in the plan have been written.

  • Presenter(s): Peter Hoddie, Richard Gibson

Progress Report: Stage 2.7 Proposals #

Iterator Sequencing update #

The iterator sequencing Stage 2.7 proposal introduces a new Iterator.concat method that takes a list of iterators and returns an iterator yielding all of their elements. It's the iterator equivalent of Array.prototype.concat, except that it's a static method.
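A minimal sketch of the proposed API:

const a = [1, 2].values();
const b = new Set([3, 4]).values();

const combined = Iterator.concat(a, b);
[...combined]; // [1, 2, 3, 4]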

Michael Ficarra, the proposal's champion, was originally planning to ask for consensus on advancing the proposal to Stage 3: test262 tests had been written, and on paper the proposal was ready. However, that was not possible, because the committee discussed some changes about re-using "iterator result" objects that require changes to the proposal itself (i.e. should Iterator.concat(x).next() return the same object as x.next(), or should it re-create it?).

  • Presenter(s): Michael Ficarra

Progress Report: Stage 2 Proposals #

Iterator Chunking update #

The iterator chunking Stage 2 proposal introduces two new Iterator.prototype.* methods: chunks(size), which splits the iterator into non-overlapping chunks, and windows(size), which generates overlapping chunks offset by 1 element:

[1, 2, 3, 4].values().chunks(2);  // yields [1, 2] and [3, 4]
[1, 2, 3, 4].values().windows(2); // yields [1, 2], [2, 3] and [3, 4]

The proposal champion was planning to ask for Stage 2.7, but that was not possible due to some changes to the .windows behaviour requested by the committee: what should happen when requesting windows of size n out of an iterator that has fewer than n elements? We considered multiple options:

  1. Do not yield any array, as it's impossible to create a window of size n
  2. Yield an array with some padding (undefined?) at the end to get it to the expected length
  3. Yield an array with fewer than n elements

The committee concluded that there are valid use cases both for (1) and for (3). As such, the proposal will be updated to split .windows() into two separate methods.

  • Presenter(s): Michael Ficarra

AsyncContext web integration brainstorming igalia logo #

AsyncContext is a proposal that allows having state persisted across async flows of control -- like thread-local storage, but for asynchronicity in JS. The champions of the proposal believe this context should not only flow through await, but also through setTimeout and other web features, such as APIs (like xhr.send()) that asynchronously fire events. However, the proposal was stalled due to concerns from browser engineers about its implementation complexity.
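A rough sketch of the proposed API, with the setTimeout call illustrating exactly the kind of web integration under discussion (doSomeWork is a hypothetical helper):

const requestId = new AsyncContext.Variable();

function handleRequest(id) {
  requestId.run(id, async () => {
    await doSomeWork();
    requestId.get(); // still `id`: the value flows through await

    setTimeout(() => {
      // Whether the value is still visible here is the web-integration
      // question debated in this session.
      requestId.get();
    }, 100);
  });
}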

In this TC39 session, we brainstormed about removing some of the integration points with web APIs: in particular, context propagation through events caused asynchronously. This would work fine for web frameworks, but not for tracing tools, which is the other main use case for AsyncContext in the web. It was pointed out that if the context isn't propagated implicitly through events, developers using tracing libraries might be forced to snapshot contexts even when they're not needed, which would lead to userland memory leaks. In general, the room seemed to agree that the context should be propagated through events, at the very least in the cases in which this is feasible to implement.

This TC39 discussion didn't do much to move the proposal along, and we weren't expecting it to -- browser representatives in TC39 are mostly engineers working on the core JS engines (such as SpiderMonkey or V8), while the concerns were coming from engineers working on web APIs. However, the week after this TC39 plenary, Igalia organized the Web Engines Hackfest, also in A Coruña, where we could resume this conversation with the relevant people in the room. As a result, we've had positive discussions with Mozilla engineers about a possible path forward for the proposal that would propagate the context through events, analyzing in more detail the complexity of some specific APIs where we expect the propagation to be more involved.

Math.clamp for Stage 2 #

The Math.clamp proposal adds a method to clamp a numeric value between two endpoints of a range. This proposal reached stage 1 last February, and in this plenary we discussed and resolved some of the open issues it had:

  • One of them was whether the method should be a static method Math.clamp(min, value, max), or whether it should be a method on Number.prototype so you could do value.clamp(min, max). We opted for the latter, since in the former the order of the arguments might not be clear.
  • Another was whether the proposal should support BigInt as well. Since we're making clamp a method of Number, we opted to only support the JS number type. A follow-up proposal might add this on BigInt.prototype as well.
  • Finally, there was some discussion about whether clamp should throw an exception if min is not lower or equal to max; and in particular, how this should work with positive and negative zeros. The committee agreed that this can be decided during Stage 2.

With this, the Math.clamp (or rather, Number.prototype.clamp) proposal advanced to stage 2. The champion was originally hoping to get to Stage 2.7, but they ended up not proposing it due to the pending planned changes to the proposed specification text.
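A purely hypothetical usage sketch of the prototype-based shape the committee preferred (the exact API may still change during Stage 2):

(150).clamp(0, 100); // 100
(-20).clamp(0, 100); // 0
(42).clamp(0, 100);  // 42

// Roughly equivalent to the common manual idiom:
Math.min(Math.max(42, 0), 100); // 42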

  • Presenter(s): Oliver Medhurst

Seeded PRNG for Stage 2 #

As it stands, JavaScript's built-in functionality for generating (pseudo-)random numbers does not accept a seed, a piece of data that anchors the generation of random numbers at a fixed place, ensuring that repeated calls to Math.random, for example, produce a fixed sequence of values. There are various use cases for such numbers, such as testing (how can I lock down the behavior of a function that calls Math.random for testing purposes if I don't know what it will produce?). This proposal seeks to add a new top-level Object, Random, that will permit seeding of random number generation. It was generally well received and advanced to stage 2.
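The exact API shape is still being worked out, but a purely illustrative sketch of the idea could look like this (class and method names are placeholders):

const prng = new Random.Seeded(42);
prng.random(); // some value in [0, 1)
prng.random(); // the next value in the sequence

// Creating another generator with the same seed reproduces the exact
// same sequence, which is what makes tests deterministic.
const replay = new Random.Seeded(42);
replay.random(); // identical to the first call above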

  • Presenter(s): Tab Atkins-Bittner

Progress Report: Stage 1 Proposals #

More random functions for stage 1 #

Tab Atkins-Bittner, who presented the Seeded PRNG proposal, continued in a similar vein with "More random functions". The idea is to settle on a set of functions that frequently arise in all sorts of settings, such as shuffling an array, generating a random number in an interval, generating a random boolean, and so on. There are a lot of fun ideas that can be imagined here, and the committee was happy to advance this proposal to stage 1 for further exploration.

  • Presenter(s): Tab Atkins-Bittner

Keep trailing zeros in Intl.NumberFormat and Intl.PluralRules for Stage 1 #

Eemeli Aro of Mozilla proposed a neat bugfix for two parts of JavaScript's internationalization API that handle numbers. At the moment, when a digit string such as "123.456" is given to the Intl.PluralRules and Intl.NumberFormat APIs, the string is converted to a Number. This is generally fine, but what about digit strings that contain trailing zeroes, such as "123.4560"? At the moment, that trailing zero gets removed and cannot be recovered. Eemeli suggests that we keep such digits: they make a difference when formatting numbers and when using them to pluralize words, such as "1.0 stars". This proposal advanced to stage 1, with the understanding that some work needs to be done to clarify how some already-existing options in the NumberFormat and PluralRules APIs are to be understood when handling such strings.
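A small sketch of the behaviour being fixed (today the digit string is coerced to a Number before the plural rules see it):

// Today: "1.0" becomes the Number 1, so the trailing zero is lost and
// English plural rules pick "one", as in "1 star".
new Intl.PluralRules("en").select("1.0"); // currently "one"

// With the proposal, the trailing zero would be preserved, so the value
// would pluralize the way it reads to humans: "1.0 stars".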

  • Presenter(s): Eemeli Aro

Decimal Stage 1 update igalia logo #

We shared the latest developments on the Decimal proposal and its potential integration with Intl, focusing on the concept of amounts. These are lightweight wrapper classes designed to pair a decimal number with an integer "precision", representing either the number of significant digits or the number of fractional digits, depending on context. The discussion was a natural follow-on to the earlier discussion of keeping trailing zeroes in Intl.NumberFormat and Intl.PluralRules. In discussions about decimal, we floated the idea of a string-based version of amounts, as opposed to one backed by a decimal, but this was a new, work-in-progress idea. It seems that the committee is generally happy with the underlying decimal proposal but not yet convinced about the need for a notion of an amount, at least as it was presented. Decimal stays at stage 1.

  • Presenter(s): Jesse Alama

Comparisons to Stage 1 #

Many JS environments today provide some sort of assertion functions. (For example, console.assert, Node.js's node:assert module, the chai package on NPM.) The committee discussed a new proposal presented by Jacob Smith, Comparisons, which explores whether this kind of functionality should be part of the ECMAScript standard. The proposal reached stage 1, so the investigation and scoping will continue: should it cover rich equality comparisons, should there be some sort of test suite integration, should there be separate debug and production modes? These questions will be explored in future meetings.

  • Presenter(s): Jacob Smith

IDL for ECMAScript #

If you look at the specifications for HTML, the DOM, and other web platform features, you can't miss the Web IDL snippets in there. This IDL is used to describe all of the interfaces available in web browser JS environments, and how each function argument is processed and validated.

IDL does not only apply to the specifications! The IDL code is also copied directly into web browsers' code bases, sometimes with slight modifications, and used to generate C++ code.

Tooru Fujisawa (Arai) from Mozilla brought this proposal back to the committee after a long hiatus, and presented a vision of how the same thing might be done in the ECMAScript specification, gradually. This would lower maintenance costs for any JS engine, not just web browsers. However, the way that function arguments are generally handled differs sufficiently between web platform APIs and the ECMAScript specification that it wouldn't be possible to just use the same Web IDL directly.

Tooru presented some possible paths to squaring this circle: adding new annotations to the existing Web IDL or defining new syntax to support the ECMAScript style of operations.

  • Presenter(s): Tooru Fujisawa

Community Event #

After the meeting on Thursday, we co-organized a community event with the help of our local tech communities. With an agenda full of insightful and unique presentations, and a lively networking session afterwards over some snacks, we hope to have started some interesting conversations and piqued the interest of the local JavaScript developer community in these topics.

Conclusion #

The May 2025 plenary was packed with exciting progress across the JavaScript language and internationalization features. It was also a special moment for us at Igalia as proud hosts of the meeting in our hometown of A Coruña. We saw long-awaited proposals like Array.fromAsync, Error.isError, and Explicit Resource Management reach Stage 4, while others continued to evolve through thoughtful discussion and iteration.

We’ll continue sharing updates as the work evolves. Until then, thanks for reading, and see you at the next meeting!

July 03, 2025 12:00 AM

July 01, 2025

Igalia WebKit Team

WebKit Igalia Periodical #28

Update on what happened in WebKit in the week from June 24 to July 1.

This was a slow week, where the main highlights are the new development releases of WPE WebKit and WebKitGTK.

Cross-Port 🐱

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

Made some further progress bringing the 32-bit version of OMG closer to the 64-bit one.

Releases 📦️

WebKitGTK 2.49.3 and WPE WebKit 2.49.3 have been released. These are development snapshots intended to allow those interested to test the new features and improvements which will be part of the next stable release series. As usual, bug reports are welcome in the WebKit Bugzilla.

Community & Events 🤝

The video recording for the talk “Jumping Over the Garden Wall - WPE WebKit on Android” from this year's Web Engines Hackfest is now available for watching.

That’s all for this week!

by Igalia WebKit Team at July 01, 2025 01:29 PM

Brian Kardell

Web Bucks

Web Bucks

Back in September 2024 I wrote a piece about the history of attempts at standardizing some kind of Micropayments going back to the late 90s. Like a lot of things I write, it's the outcome of looking at history and background for things that I'm actively thinking about. An announcement the other day made me think that perhaps now is a good time for a follow up post.

As you probably already know if you're reading this, I write and think a lot about the health of the web ecosystem. We've even got a whole playlist of videos (lots of podcast episodes) on the topic on YouTube. Today, that's nearly all paid for, on all sides, by advertising. In several important respects, it's safe to say that the status quo is under many threats. In several ways it's also worth questioning if the status quo is even good.

When Ted Nelson first imagined Micropayments in the 1960s, he was imagining a fair economic model for digital publishing. We've had many ideas and proposals since then. Web Monetization is one idea which isn't dead yet. Its main ideas involve embedding a declarative link to a "payment pointer" (like a wallet address) where payments can be sent via Interledger. I say "sent", but "streamed" might be more accurate. Interledger is a novel idea which treats money as "packets" and routes small amounts around. Full disclosure: Igalia has been working on some prototype work in Chromium to help see what a native implementation would look like, what its architecture would be and what options this opens (or closes). Our work has been funded by the Interledger Foundation. It does not amount to an endorsement, and it does not mean something will ship. That said, it doesn't mean the opposite either.

You might know that Brave, another Chromium-based browser, has a system for creators too. In their model, publishers/creators sign up and verify their domain (or social accounts!), people browsing those sites with Brave keep track of that locally, and at the end of the month Brave can batch up and settle accounts of Basic Attention Tokens ("BAT"), which it can then pay out to creators in lump sums. As of the time of this writing, Brave has 88 million monthly active users (source) who could be paying its 1.67 million plus content creators and publishers (source).

Finally, in India, UPI offers most transactions free of charge and can also be used for micro payments - it's being used for roughly $240 billion USD per month worth of transactions!

But there's also some "adjacent" stuff that doesn't claim to be micro transactions but is somehow similar:

If you've ever used Microsoft's Bing search engine, they also give you "points" (I like to call them "Bing Bucks") which you can trade in for other stuff (the payment is going in a different direction!). There was also Scroll, years ago, which aimed to be a kind of universal service you could pay into to remove ads on many properties (it was bought by Twitter and shut down).

Enter: Offerwall

Just the other day, Google Ad Manager gave a new idea a potentially really significant boost. I think it's worth looking at: Offerwall. Offerwall lets sites provide a few different ways to monetize content, and lets users choose the one they prefer. For example, a publisher can set things up to allow reading their site in exchange for watching an ad (similar to YouTube's model). That's pretty interesting, but far more interesting to me is that it integrates with a third-party service called Supertab. Supertab lets people provide their own subscriptions - including a tiny fee for this page, or access to the site with some timed pass - 4 hours, 24 hours, a week, etc. It does this with pretty frictionless wallet integration and by 'pooling' the funds until it makes sense to do a real, regular transaction. Perhaps the easiest thing is to look at some of their own examples.

Offerwall also allows other integrations, so maybe we'll see some of these begin to come together somehow too.

It's a very interesting way to split the difference and address a few complaints of micro transaction critics, and of people generally skeptical that something like this could gain significant traction. More than that even, it seems to me that by integrating with Google Ad Manager it's got about as much advantage as anyone could get (the vast majority of ads are already served with Google Ad Manager, and this actually tries to expand that).

I'm very keen to see how this all plays out! What do you think will happen? Share your thoughts with me on social media.

July 01, 2025 04:00 AM

June 23, 2025

Igalia WebKit Team

WebKit Igalia Periodical #27

Update on what happened in WebKit in the week from June 16 to June 23.

This week saw a variety of fixes on multimedia, including GStreamer, a fix for JSC, and the addition of analog gamepad buttons support for WPE.

Cross-Port 🐱

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

WebRTC DTMF support was recently implemented in our GstWebRTC backend.

The WebCodecs VideoFrame copyTo() function now correctly handles odd-sized frames.

Multiple MediaRecorder-related improvements landed in main recently (1, 2, 3, 4), and also in GStreamer.

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

JSC saw some fixes in i31 reference types when using Wasm GC.

WPE WebKit 📟

WPE now has support for analog gamepad buttons when using libwpe. Since version 1.16.2 libwpe has the capability to handle analog gamepad button events, but the support on the WPE side was missing. It has now been added, and will be enabled when the appropriate versions of libwpe are used.

That’s all for this week!

by Igalia WebKit Team at June 23, 2025 07:58 PM

Alex Bradbury

Vendor-recommended LLM parameter quick reference

I've been kicking the tires on various LLMs lately, and like many have been quite taken by the pace of new releases, especially of models with weights distributed under open licenses, always with impressive benchmark results. I don't have local GPUs, so trialling different models necessarily requires using an external host. There are various configuration parameters you can set when sending a query that affect generation, and many vendors document recommended settings on the model card or in the associated documentation. For my own purposes I wanted to collect these together in one place, and also confirm in which cases common serving software like vLLM will use defaults provided alongside the model.

Main conclusions

  • If accessing a model via a hosted API you typically don't have much insight into their serving setup, so explicitly setting parameters client-side is probably your best bet if you want to try out a model and ensure any recommended parameters are applied to generation.
  • Although recent versions of vLLM will take preferred parameters from generation_config.json, not all models provide that file, or if they do, they may not include their documented recommendations in it.
  • Some model providers have very strong and clear recommendations about which parameters to set to which values, for others it's impossible to find any guidance one way or another (or even what sampling setup was used for their benchmark results).
  • Sadly there doesn't seem to be a good alternative to trawling through the model descriptions and associated documentation right now (though hopefully this page helps!).
  • Even if every model starts consistently setting preferred parameters in generation_config.json (and inference API providers respect this), and/or a standard like model.yaml is adopted containing these parameters, some attention may still be required if a model has different recommendations for different use cases / modes (as Qwen3 does).
  • And of course there's a non-conclusion on how much this really matters. I don't know. Clearly for some models it's deemed very important; for others it's not always clear whether it just doesn't matter much, or if the model producer has done a poor job of documenting it.

Overview of parameters

The parameters supported by vLLM are documented here, though not all are supported in the HTTP API provided by different vendors. For instance, the subset of parameters supported by models on Parasail (an inference API provider I've been kicking the tires on recently) is documented here. I cover just that subset below, with a minimal client-side example after the list:

  • temperature: controls the randomness of sampling of tokens. Lower values are more deterministic, higher values are more random. This is one of the parameters you'll see spoken about the most.
  • top_p: limits the tokens that are considered. If set to e.g. 0.5, only the most probable tokens whose cumulative probability doesn't exceed 50% are considered.
  • top_k: also limits the tokens that are considered, such that only the top k tokens are considered.
  • frequency_penalty: penalises new tokens based on their frequency in the generated text. It's possible to set a negative value to encourage repetition.
  • presence_penalty: penalises new tokens if they appear in the generated text so far. It's possible to set a negative value to encourage repetition.
  • repetition_penalty: This is documented as being a parameter that penalises new tokens based on whether they've appeared so far in the generated text or prompt.
    • Based on that description it's not totally obvious how it differs from the frequency or presence penalties, but given the description talks about values greater than 1 penalising repeated tokens and values less than 1 encouraging repeated tokens, we can infer this is applied as a multiplication rather than an addition.
    • We can confirm this implementation by tracing through where penalties are applied in vllm's sampler.py, which in turn calls the apply_penalties helper function. This confirms that the frequency and presence penalties are applied based only on the output, while the repetition penalty also takes the prompt into account. Following the call-stack down to an implementation of the repetition penalty shows that if the logit is positive, it divides by the penalty and otherwise multiplies by it.
    • This was a pointless sidequest as this is a vllm-specific parameter that none of the models I've seen has a specific recommendation for.
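To make these concrete, here is a minimal client-side sketch that sets the parameters explicitly on a request to an OpenAI-compatible chat completions endpoint. The URL, model name, and API key handling are placeholders, and top_k is a sampling extension accepted by e.g. vLLM's server rather than part of the core OpenAI API:

// Explicitly set sampling parameters instead of relying on server defaults.
const response = await fetch("https://example-provider/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${process.env.API_KEY}`, // placeholder
  },
  body: JSON.stringify({
    model: "some-model",                              // placeholder
    messages: [{ role: "user", content: "Hello!" }],
    temperature: 0.6,      // lower = more deterministic
    top_p: 0.95,           // nucleus sampling cutoff
    top_k: 20,             // extension accepted by e.g. vLLM's server
    presence_penalty: 0,
    frequency_penalty: 0,
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);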

Default vLLM behaviour

The above settings are typically exposed via the API, but what if you don't explicitly set them? vllm documents that it will by default apply settings from generation_config.json distributed with the model on HuggingFace if it exists (overriding its own defaults), but you can ignore generation_config.json to just use vllm's own defaults by setting --generation-config vllm when launching the server. This behaviour was introduced in a PR that landed in early March this year. We'll explore below which models actually have a generation_config.json with their recommended settings, but what about parameters not set in that file, or if that file isn't present? As far as I can see, that's where _DEFAULT_SAMPLING_PARAMS comes in and we get temperature=1.0 and repetition_penalty, top_p, top_k and min_p set to values that have no effect on the sampler.

Although Parasail use vllm for serving most (all?) of their hosted models, it's not clear if they're running with a configuration that allows defaults to be taken from generation_config.json. I'll update this post if that is clarified.

As all of these models are distributed with benchmark results front and center, it should be easy to at least find what settings were used for those results, even if there's no explicit recommendation on which parameters to use - right? Let's find out. I've decided to step through the models grouped by their level of openness.

Open weight and open dataset models

Open weight models

  • deepseek-ai/DeepSeek-V3-0324
    • Recommendation: temperature=0.3 (specified on model card)
    • generation_config.json with recommended parameters: No.
    • Notes:
      • This model card is what made me pay more attention to these parameters - DeepSeek went as far as to map a temperature of 1.0 via the API to their recommended 0.3 (temperatures between 0 and 1 are multiplied by 0.7, and they subtract 0.7 from temperatures between 1 and 2). So clearly they're keen to override clients that default to setting temperature=1.0.
      • There's no generation_config.json, and the V3 technical report indicates they used temperature=0.7 for some benchmarks. They also state "Benchmarks containing fewer than 1000 samples are tested multiple times using varying temperature settings to derive robust final results" (not totally clear if results are averaged, or the best result is taken). There's no recommendation I can see for other generation parameters, and to add some extra confusion, the DeepSeek API docs have a page on the temperature parameter with specific recommendations for different use cases, and it's not totally clear whether these apply equally to V3 (after its temperature scaling) and R1.
  • deepseek-ai/DeepSeek-R1-0528
    • Recommendation: temperature=0.6, top_p=0.95 (specified on model card)
    • generation_config.json with recommended parameters: Yes.
    • Notes: They report using temperature=0.6 and top_p=0.95 for their benchmarks (this is stated both on the model card and the paper) and state that temperature=0.6 is the value used for the web chatbot interface. They do have a generation_config.json that includes that setting.
  • ibm-granite/granite-3.3-8b-instruct
    • Recommendation: None.
    • generation_config.json with recommended parameters: File exists, sets no parameters.
  • microsoft/phi-4
  • microsoft/Phi-4-reasoning
    • Recommendation: temperature=0.8, top_k=50, top_p=0.95 (specified on model card)
    • generation_config.json with recommended parameters: Yes.
  • mistralai/Mistral-Small-3.2-24B-Instruct-2506
    • Recommendation: temperature=0.15. (specified on model card)
    • generation_config.json with recommended parameters: Yes
    • Recommends temperature=0.15 and includes this in generation_config.json.
    • Notes: I saw that one of Mistral's API methods for their hosted models returns the default_model_temperature. Executing curl --location "https://api.mistral.ai/v1/models" --header "Authorization: Bearer $MISTRAL_API_KEY" | jq -r '.data[] | "\(.name): \(.default_model_temperature)"' | sort gives some confusing results. The mistral-small-2506 version isn't yet available on the API, but the older mistral-small-2501 is, with a default temperature of 0.3 (differing from the recommendation on the model card). mistral-small-2503 has null for its default temperature. Go figure.
  • mistralai/Devstral-Small-2505
    • Recommendation: Unclear.
    • generation_config.json with recommended parameters: No.
    • Notes: This is a fine-tune of Mistral-Small-3.1. There is no explicit recommendation for temperature on the model card, but the example code does use temperature=0.15. However, this isn't set in generation_config.json (which doesn't set any default parameters) and Mistral's API indicates a default temperature of 0.0.
  • mistralai/Magistral-Small-2506
    • Recommendation: temperature=0.7, top_p=0.95 (specified on model card)
    • generation_config.json with recommended parameters: No (file exists, but parameters missing).
    • Notes: The model card has a very clear recommendation to use temperature=0.7 and top_p=0.95 and this default temperature is also reflected in Mistral's API mentioned above.
  • qwen3 family including Qwen/Qwen3-235B-A22B, Qwen/Qwen3-30B-A3B, Qwen/Qwen3-32B, and more.
    • Recommendation: temperature=0.6, top_p=0.95, top_k=20, min_p=0 for thinking mode, and for non-thinking mode temperature=0.7, top_p=0.8, top_k=20, min_p=0 (specified on model card)
    • generation_config.json with recommended parameters: Yes, e.g. for Qwen3-32B (uses the "thinking mode" recommendations). (All the ones I've checked have this at least).
    • Notes: Unlike many others, there is a very clear recommendation under the best practices section of each model card, which for all models in the family that I've checked makes the same recommendation. They also suggest setting the presence_penalty between 0 and 2 to reduce endless repetitions. The Qwen 3 technical report notes the same parameters but also states that for the non-thinking mode they set presence_penalty=1.5 and applied the same setting for thinking mode for the Creative Writing v3 and WritingBench benchmarks.
  • THUDM/GLM-4-32B-0414
    • Recommendation: None.
    • generation_config.json with recommended parameters: No (file exists, but parameters missing).
    • Notes: There's a request for recommended sampling parameters on the HuggingFace page but it's not had a response.
  • THUDM/GLM-Z1-32B-0414
    • Recommendation: temperature=0.6, top_p=0.95, top_k=40 and max_new_tokens=30000 (specified on model card).
    • generation_config.json with recommended parameters: No.

Weight available (non-open) models

  • google/gemma-3-27b-it
    • Recommendation: Allegedly temperature=1.0, top_k=64, top_p=0.96 (source).
    • generation_config.json with recommended parameters: Yes (temperature=1.0 should be the vllm default anyway, so it shouldn't matter that it isn't specified).
    • Notes: It was surprising to not see more clarity on this in the model card or technical report, neither of which have an explicit recommendation. As noted above, the generation_config.json does set top_k and top_p and the Unsloth folks apparently had confirmation from the Gemma team on recommended temperature though I couldn't find a public comment directly from the Gemma team.
  • meta-llama/Llama-4-Scout-17B-16E-Instruct
    • Recommendation: temperature=0.6, top_p=0.9 (source: generation_config.json).
    • generation_config.json with recommended parameters: Yes.
    • Notes: There was no discussion of recommended parameters in the model card itself. I accessed generation_config.json via a third-party mirror as providing name and DoB to view it on HuggingFace (as required by Llama's restrictive access policy) seems ridiculous.

model.yaml

As it happens, while writing this blog post I saw that Simon Willison had blogged about model.yaml. Model.yaml is an initiative from the LM Studio folks to provide a definition of a model and its sources that can be used with multiple local inference tools. This includes the ability to specify preset options for the model. It doesn't appear to be used by anyone else though, and looking at the LM Studio model catalog, taking qwen/qwen3-32b as an example: although the Qwen3 series has very strongly recommended default settings, its model.yaml only sets top_k and min_p, leaving temperature and top_p unset.


Article changelog
  • 2025-06-23: Initial publication date.

June 23, 2025 12:00 PM

Tvrtko Ursulin

Fair(er) DRM GPU scheduler

Introduction #

The DRM GPU scheduler is a shared Direct Rendering Manager (DRM) Linux Kernel level component used by a number of GPU drivers for managing job submissions from multiple rendering contexts to the hardware. Some of the basic functions it can provide are dependency resolving, timeout detection, and most importantly for this article, scheduling algorithms whose essential purpose is picking the next queued unit of work to execute once there is capacity on the GPU.

Different kernel drivers use the scheduler in slightly different ways - some simply need the dependency resolving and timeout detection part, with the actual scheduling happening in proprietary firmware, while others rely on the scheduler’s algorithms for choosing what to run next. The latter are what the work described here aims to improve.

More details about the other functionality provided by the scheduler, including some low level implementation details, are available in the generated kernel documentation repository[1].

Basic concepts and terminology #

Three DRM scheduler data structures (or objects) are relevant for this topic: the scheduler, scheduling entities and jobs.

First we have the scheduler itself, which usually corresponds to some hardware unit which can execute certain types of work. For example, the render engine is often a single hardware instance in a GPU and needs arbitration for multiple clients to be able to use it simultaneously.

Then there are scheduling entities, or in short entities, which broadly speaking correspond to userspace rendering contexts. Typically when a userspace client opens a render node, one such rendering context is created. Some drivers also allow userspace to create multiple contexts per open file.

Finally there are jobs which represent units of work submitted from userspace into the kernel. These are typically created as a result of userspace doing an ioctl(2) operation, which are specific to the driver in question.

Jobs are usually associated with entities, and entities are then executed by schedulers. Each scheduler instance will have a list of runnable entities (entities with at least one queued job) and when the GPU is available to execute something it will need to pick one of them.

Typically every userspace client will submit at least one such job per rendered frame and the desktop compositor may issue one or more to render the final screen image. Hence, on a busy graphical desktop, we can find dozens of active entities submitting multiple GPU jobs, sixty or more times per second.

The current scheduling algorithm #

In order to select the next entity to run, the scheduler defaults to the First In First Out (FIFO) mode of operation, where the selection criterion is the job submit time.

The FIFO algorithm in general has some well known disadvantages around the areas of fairness and latency, and because the selection criterion is based on job submit time, it also couples the selection with the CPU scheduler. This is not desirable because it creates an artificial coupling between different schedulers, different sets of tasks (CPU processes and GPU tasks), and different hardware blocks.

This is further amplified by the lack of guarantee that clients are submitting jobs with equal pacing (not all clients may be synchronised to the display refresh rate, or not all may be able to maintain it), the fact their per frame submissions may consist of unequal numbers of jobs, and last but not least the lack of preemption support. The latter is true both for the DRM scheduler itself and for many GPUs in their hardware capabilities.

Apart from uneven GPU time distribution, the end result of the FIFO algorithm picking the sub-optimal entity can be dropped frames and choppy rendering.

Round-robin backup algorithm #

Apart from the default FIFO scheduling algorithm, the scheduler also implements the round-robin (RR) strategy, which can be selected as an alternative at kernel boot time via a kernel argument. Round-robin, however, suffers from its own set of problems.

Whereas round-robin is typically considered a fair algorithm when used in systems with preemption support and ability to assign fixed execution quanta, in the context of GPU scheduling this fairness property does not hold. Here quanta are defined by userspace job submissions and, as mentioned before, the number of submitted jobs per rendered frame can also differ between different clients.

The final result can again be unfair distribution of GPU time and missed deadlines.

In fact, round-robin was the initial and only algorithm until FIFO was added to resolve some of these issues. More can be read in the relevant kernel commit. [2]

Priority starvation issues #

Another issue in the current scheduler design is the priority queues and the strict priority order of execution.

Priority queues serve the purpose of implementing support for entity priority, which usually maps to userspace constructs such as VK_EXT_global_priority and similar. If we look at the wording for this specific Vulkan extension, it is described like this[3]:

The driver implementation *will attempt* to skew hardware resource allocation in favour of the higher-priority task. Therefore, higher-priority work *may retain similar* latency and throughput characteristics even if the system is congested with lower priority work.

As emphasised, the wording gives implementations leeway to not be entirely strict, while the current scheduler implementation only executes lower priorities when the higher priority queues are all empty. This over-strictness can lead to complete starvation of the lower priorities.

Fair(er) algorithm #

To solve both the issue of the weak scheduling algorithm and the issue of priority starvation we tried an algorithm inspired by the Linux kernel’s original Completely Fair Scheduler (CFS)[4].

With this algorithm the next entity to run will be the one with the least virtual GPU time spent so far, where virtual GPU time is calculated from the real GPU time scaled by a factor based on the entity priority.
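Conceptually (the exact scaling factor used in the patch series may differ), the accounting can be sketched as

t_virtual = t_real * (w_base / w_entity)

so a higher-priority entity, having a larger weight w_entity, accumulates virtual GPU time more slowly and therefore keeps getting picked ahead of lower-priority entities with the same real GPU usage.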

Since the scheduler already manages a rbtree[5] of entities, sorted by the job submit timestamp, we were able to simply replace that timestamp with the calculated virtual GPU time.

When an entity has nothing more to run it gets removed from the tree and we store the delta between its virtual GPU time and the top of the queue. And when the entity re-enters the tree with a fresh submission, this delta is used to give it a new relative position considering the current head of the queue.

Because the scheduler does not currently track GPU time spent per entity, this is something that we needed to add to make this possible. It did not pose a significant challenge, however, apart from having a slight weakness where the up-to-date utilisation can lag slightly behind the actual numbers due to some DRM scheduler internal design choices. But that is a different and wider topic which is out of the intended scope for this write-up.

The virtual GPU time selection criteria largely decouples the scheduling decisions from job submission times, to an extent from submission patterns too, and allows for more fair GPU time distribution. With a caveat that it is still not entirely fair because, as mentioned before, neither the DRM scheduler nor many GPUs support preemption, which would be required for more fairness.

Solving the priority starvation #

Because priority is now consolidated into a single entity selection criteria we were also able to remove the per priority queues and eliminate priority based starvation. All entities are now in a single run queue, sorted by the virtual GPU time, and the relative distribution of GPU time between entities of different priorities is controlled by the scaling factor which converts the real GPU time into virtual GPU time.

Code base simplification #

Another benefit of being able to remove per priority run queues is a code base simplification. Going further than that, if we are able to establish that the fair scheduling algorithm has no regressions compared to FIFO and RR, we can also remove those two which further consolidates the scheduler. So far no regressions have indeed been identified.

Real world examples #

As a first example we set up three demanding graphical clients, one of which was set to run with low priority (VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT).

One client is the Unigine Heaven benchmark[6], which is simulating a game, while the other two are two instances of the deferredmultisampling Vulkan demo from Sascha Willems[7], modified to support running with a user-specified global priority. Those two are simulating very heavy GPU load running simultaneously with the game.

All tests are run on a Valve Steam Deck OLED with an AMD integrated GPU.

First we try the current FIFO based scheduler and we monitor the GPU utilisation using the gputop[8] tool. We can observe two things:

  1. That the distribution of GPU time between the normal priority clients is not equal.
  2. That the low priority client is not getting any GPU time.

FIFO scheduling uneven GPU distribution and low priority starvation

Switching to the CFS inspired (fair) scheduler the situation changes drastically:

  1. GPU time distribution between normal priority clients is much closer together.
  2. Low priority client is not starved, but receiving a small share of the GPU.

New scheduler even GPU distribution and no low priority starvation

Note that the absolute numbers are not static but represent a trend.

This proves that the new algorithm can make the low priority setting useful for running heavy GPU tasks in the background, similar to what can be done on the CPU side of things using nice(1) process priorities.

Synthetic tests #

Apart from experimenting with real world workloads, another functionality we implemented in the scope of this work is a collection of simulated workloads implemented as kernel unit tests based on the recently merged DRM scheduler mock scheduler unit test framework[9][10]. The idea behind those is to make it easy for developers to check for scheduling regressions when modifying the code, without the need to set up sometimes complicated testing environments.

Let us look at a few examples on how the new scheduler compares with FIFO when using those simulated workloads.

First an easy, albeit exaggerated, illustration of priority starvation improvements.

Solved low priority starvation

Here we have a normal priority client and a low priority client submitting many jobs asynchronously (only waiting for the submission to finish after having submitted the last job). We look at the number of outstanding jobs (queue depth - qd) on the Y axis and the passage of time on the X axis. With the FIFO scheduler (blue) we see that the low priority client is not making any progress whatsoever, until all submissions of the normal priority client have been completed. Switching to the CFS inspired scheduler (red) this improves dramatically, and we can see the low priority client making slow but steady progress from the start.

Second example is about fairness where two clients are of equal priority:

Fair GPU time distribution

Here the interesting observation is that the lines graphed for the new scheduler are much straighter. This means that the GPU time distribution is more equal, or fair, because the selection criteria are decoupled from the job submission time and instead based on each client’s GPU time utilisation.

For the final set of test workloads we will look at the rate of progress (aka frames per second, or fps) between different clients.

In both cases we have one client representing a heavy graphical load, and one representing an interactive, lightweight client. They are running in parallel, but we will only look at the interactive client in the graphs, because the goal is to see what frame rate the interactive client can achieve when competing for the GPU. In other words, we use that as a proxy for assessing the user experience of using the desktop while there is simultaneous heavy GPU usage from another client.

The interactive client is set up to spend 1ms of GPU time in every 10ms period, resulting in an effective GPU load of 10%.

First test is with a heavy client wanting to utilise 75% of the GPU by submitting three 2.5ms jobs back to back, repeating that cycle every 10ms.

Interactive client vs heavy load

We can see that the average frame rate the interactive client achieves with the new scheduler is much higher than under the current FIFO algorithm.

For the second test we made the heavy GPU load client even more demanding by making it want to completely monopolise the GPU. It is now submitting four 50ms jobs back to back, and only backing off for 1us before repeating the loop.

Interactive client vs very heavy load

Again the new scheduler is able to give significantly more GPU time to the interactive client compared to what FIFO is able to do.

Conclusions #

From all the above it appears that the experiment was successful. We were able to simplify the code base, solve the priority starvation and improve scheduling fairness and GPU time allocation for interactive clients. No scheduling regressions have been identified to date.

The complete patch series implementing these changes is available at[11].

Potential for further refinements #

Because this work has simplified the scheduler code base and introduced entity GPU time tracking, it also opens up possibilities for future experimentation with other modern algorithms. One example could be an EEVDF[12] inspired scheduler, given that this algorithm has recently improved upon the kernel’s CPU scheduler and looks potentially promising because it combines fairness and latency in one algorithm.

Connection with the DRM scheduling cgroup controller proposal #

Another interesting angle is that, as this work implements scheduling based on virtual GPU time, which as a reminder is calculated by scaling the real time by a factor based on entity priority, it can be tied really elegantly to the previously proposed DRM scheduling cgroup controller.

There we had group weights already, which can now be used when scaling the virtual time, leading to a simple but effective cgroup controller. This has already been prototyped[13], but more on that in a following blog post.

References #


  1. https://docs.kernel.org/gpu/drm-mm.html#gpu-scheduler ↩︎

  2. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.16-rc2&id=08fb97de03aa2205c6791301bd83a095abc1949c ↩︎

  3. https://registry.khronos.org/vulkan/specs/latest/man/html/VK_EXT_global_priority.html ↩︎

  4. https://en.wikipedia.org/wiki/Completely_Fair_Scheduler ↩︎

  5. https://en.wikipedia.org/wiki/Red–black_tree ↩︎

  6. https://benchmark.unigine.com/heaven ↩︎

  7. https://github.com/SaschaWillems/Vulkan ↩︎

  8. https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tools/gputop.c?ref_type=heads ↩︎

  9. https://gitlab.freedesktop.org/tursulin/kernel/-/commit/486bdcac6121cfc5356ab75641fc702e41324e27 ↩︎

  10. https://gitlab.freedesktop.org/tursulin/kernel/-/commit/50898d37f652b1f26e9dac225ecd86b3215a4558 ↩︎

  11. https://gitlab.freedesktop.org/tursulin/kernel/-/tree/drm-sched-cfs?ref_type=heads ↩︎

  12. https://lwn.net/Articles/925371/ ↩︎

  13. https://lore.kernel.org/dri-devel/20250502123256.50540-1-tvrtko.ursulin@igalia.com/ ↩︎

June 23, 2025 12:00 AM

June 16, 2025

Igalia WebKit Team

WebKit Igalia Periodical #26

Update on what happened in WebKit in the week from May 27 to June 16.

After a short hiatus coinciding with this year's edition of the Web Engines Hackfest, this issue covers a mixed bag of new API features, releases, multimedia, and graphics work.

Cross-Port 🐱

A new WebKitWebView::theme-color property has been added to the public API, along with a corresponding webkit_web_view_get_theme_color() getter. Its value follows that of the theme-color metadata attribute declared by pages loaded in the web view. Although applications may use the theme color in any way they see fit, the expectation is that it will be used to adapt their user interface (as in this example) to complement the Web content being displayed.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

The video capture pipeline has gained the ability to optionally rotate the input before encoding.

WebKitGTK 🖥️

Damage propagation has been toggled on for the GTK port: for now only a single rectangle is passed to the UI process, which is then used to let GTK know which part of a WebKitWebView has received changes since the last repaint. This is a first step to get the damage tracking code widely tested, with further improvements to be enabled later when considered appropriate.

WPE WebKit 📟

WPE Android 🤖

Adaptation of WPE WebKit targeting the Android operating system.

WPE-Android 0.2.0 has been released. The main change in this version is the update to WPE WebKit 2.48.3, which is the first that can be built for Android out of the box, without needing any additional patching. Thanks to this, we expect that the WPE WebKit version used will receive more frequent updates going forward. The prebuilt packages available at the Maven Central repository have been updated accordingly.

Releases 📦️

WebKitGTK 2.49.2 and WPE WebKit 2.49.2 have been released. These are development snapshots and are intended to let those interested test out upcoming features and improvements, and as usual issue reports are welcome in Bugzilla.

Community & Events 🤝

This year's Web Engines Hackfest had two WebKit-related sessions, and the slides are available already for the WPE-Android talk and the Multimedia in WebKit session. Video recordings will be available later on.

That’s all for this week!

by Igalia WebKit Team at June 16, 2025 09:44 PM

Martín Abente Lahaye

How to integrate systemd-sysupdate with your Yocto-based image

The Yocto project has well-established OS update mechanisms available via third-party layers. But, did you know that recent releases of Yocto already come with a simple update mechanism?

The goal of this blog post is to present an alternative that doesn’t require a third-party layer and explain how it can be integrated with your Yocto-based image.

systemd-sysupdate #

Enter systemd-sysupdate: a mechanism capable of automatically discovering, downloading, and installing A/B-style OS updates. In a nutshell, it provides:

  • Atomic updates for a collection of different resources (files, directories or partitions).
  • Updates from remote and local sources (HTTP/HTTPS and directories).
  • Parallel installed versions A/B/C/… style.
  • Relatively small footprint (~10 MiB or roughly a 5% increase in our demo image).
  • Basic features are available since systemd 251 (released in May 2022).
  • Optional built-in services for updating and rebooting.
  • Optional DBus interface for applications integration.
  • Optional grouping of resources to be enabled together as features.

Together with automatic boot assessment, systemd-boot, and other tools, we can turn this OS update mechanism into a comprehensive alternative for common scenarios.

Yocto integration #

sysupdate has been available with Yocto releases for a few years now but, in order to use it, a few steps are required:

  1. Identifying the OS resources that need to be updated.
  2. Versioning these resources and the OS.
  3. Enabling sysupdate and providing transfer files for each resource.
  4. Serving updates via a web server.

OS resources to update #

The resources that need to be updated will depend on how the distribution is set up. For this post we’re assuming the following:

  • An image based on the latest Poky release, using systemd and systemd-boot.
  • The kernel, command line, initramfs, and other boot-related files are provided via a Unified Kernel Image (UKI).
  • A single rootfs, using ext4.

A Yocto-based image like this can be described as follows:

kas-poky-demo.yml:

INIT_MANAGER = "systemd"
EFI_PROVIDER = "systemd-boot"
INITRAMFS_IMAGE = "core-image-minimal-initramfs"
QB_KERNEL_ROOT = ""
QB_DEFAULT_KERNEL = "none"
IMAGE_FSTYPES = "wic"
WKS_FILE = "core-image-demo.wks.in"

recipes-core/images/core-image-demo.bb:

SUMMARY = "A demo image with UKI support enabled"
LICENSE = "MIT"
UKI_CMDLINE = "rootwait root=PARTLABEL=rootfs console=${KERNEL_CONSOLE}"
inherit core-image uki

wic/core-image-demo.wks.in:

part /boot --ondisk sda --fstype vfat --part-name ESP --part-type c12a7328-f81f-11d2-ba4b-00a0c93ec93b --source bootimg-efi --sourceparams="loader=systemd-boot,install-kernel-into-boot-dir=false" --align 1024 --active --fixed-size 100M
part / --ondisk sda --fstype=ext4 --source rootfs --part-name rootfs --part-type 4f68bce3-e8cd-4db1-96e7-fbcaf984b709 --align 1024 --use-uuid --fixed-size 300M
bootloader --ptable gpt --timeout=5

Under this specific setup, a full OS update would consist of the following resources:

  • The UKI, a regular file to be updated under the /boot partition.
  • The rootfs, a partition that can be updated in its entirety.

As mentioned before, updating files and partitions is supported by sysupdate. So, we’re good.

Versioning resources and the OS #

In order for sysupdate to determine the current version of the OS, it looks for the os-release file and inspects it for an IMAGE_VERSION field. Therefore, the image version must be included.

Resources that require updating must also be versioned with the image version. Following our previous assumptions:

  • The UKI filename is suffixed with the image version (e.g., uki_0.efi where 0 is the image version).
  • The rootfs partition is also versioned by suffixing the image version in its partition name (e.g., rootfs_0 could be the initial name of the partition).

To implement these changes in your Yocto-based image, the following recipes should be added or overridden:

recipes-core/os-release/os-release.bbappend:

OS_RELEASE_FIELDS += " \
IMAGE_VERSION \
"

OS_RELEASE_UNQUOTED_FIELDS += " \
IMAGE_VERSION \
"

Note that the value of IMAGE_VERSION can be hardcoded, provided by the continuous integration pipeline or determined at build-time (e.g., the current date and time).
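For example, a minimal sketch (assuming the variable is set globally, e.g. in conf/local.conf or via your kas file) could derive it from BitBake's build timestamp, while still allowing a CI pipeline to override it:

# Default to the build timestamp; CI can override this weak assignment
IMAGE_VERSION ?= "${DATETIME}"
# Keep the timestamp out of task signatures to avoid needless rebuilds
IMAGE_VERSION[vardepsexclude] = "DATETIME"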

recipes-core/images/core-image-demo.bb:

-UKI_CMDLINE = "rootwait root=PARTLABEL=rootfs console=${KERNEL_CONSOLE}"
+UKI_FILENAME = "uki_${IMAGE_VERSION}.efi"
+UKI_CMDLINE = "rootwait root=PARTLABEL=rootfs_${IMAGE_VERSION} console=${KERNEL_CONSOLE}"

wic/core-image-demo.wks.in:

-part / --ondisk sda --fstype=ext4 --source rootfs --part-name rootfs --part-type 4f68bce3-e8cd-4db1-96e7-fbcaf984b709 --align 1024 --use-uuid --fixed-size 300M
+part / --ondisk sda --fstype=ext4 --source rootfs --part-name "rootfs_${IMAGE_VERSION}" --part-type 4f68bce3-e8cd-4db1-96e7-fbcaf984b709 --align 1024 --use-uuid --fixed-size 300M

In the above recipes, we’re adding the suffix to the UKI filename and partition name, and we’re also coupling our UKI directly to its corresponding rootfs partition.

Enabling systemd-sysupdate #

By default, sysupdate is disabled in Yocto’s systemd recipe and there are no “default” transfer files for sysupdate. Therefore you must:

  1. Override systemd build configuration options and dependencies.
  2. Write transfer files for each resource that needs to be updated.
  3. Extend the partitions kickstart file with an additional partition that must mirror the original rootfs partition. This is to support an A/B OS update scheme.

To implement these changes in your Yocto-based image, the following recipes should be added or modified:

recipes-core/systemd/systemd_%.bbappend:

EXTRA_OEMESON:append = " \
-Dfdisk=enabled \
-Dsysupdate=enabled \
-Dsysupdated=enabled \
"

SRC_URI += " \
file://60-rootfs.transfer \
file://70-kernel.transfer \
"

do_install:append() {
install -d ${D}${base_libdir}/sysupdate.d
install -m 0644 ${UNPACKDIR}/60-rootfs.transfer ${D}${base_libdir}/sysupdate.d/
install -m 0644 ${UNPACKDIR}/70-kernel.transfer ${D}${base_libdir}/sysupdate.d/
}

Note that some minor details are omitted from this snippet, but you can find the full source files down below.

recipes-core/systemd/systemd/60-rootfs.transfer:

[Transfer]
ProtectVersion=%A
Verify=no

[Source]
Type=url-file
Path=http://10.0.2.2:3333/
MatchPattern=rootfs_@v.ext4

[Target]
Type=partition
Path=auto
MatchPattern=rootfs_@v
MatchPartitionType=root
InstancesMax=2

recipes-core/systemd/systemd/70-kernel.transfer:

[Transfer]
ProtectVersion=%A
Verify=no

[Source]
Type=url-file
Path=http://10.0.2.2:3333/
MatchPattern=uki_@v.efi

[Target]
Type=regular-file
Path=/EFI/Linux
PathRelativeTo=boot
MatchPattern=uki_@v+@l-@d.efi uki_@v+@l.efi uki_@v.efi
Mode=0444
TriesLeft=3
TriesDone=0
InstancesMax=2

These transfer files define what exactly constitutes a full OS update. Each file contains the following sections:

  • The transfer section, which defines general properties of the transfer (e.g., the fact that the currently booted version, %A, is protected from being overwritten).
  • The source section, which defines where to look for updates for these resources (e.g., a specific URL with matching pattern).
  • The target section, which defines where these updated resources must go to (e.g., a partition that matches the naming pattern).

For more information about these section properties check the sysupdate.d documentation.

wic/core-image-demo.wks.in:

 part / --ondisk sda --fstype=ext4 --source rootfs --part-name "rootfs_${IMAGE_VERSION}" --part-type 4f68bce3-e8cd-4db1-96e7-fbcaf984b709 --align 1024 --use-uuid --fixed-size 300M
+part --ondisk sda --source empty --part-name "_empty" --part-type 4f68bce3-e8cd-4db1-96e7-fbcaf984b709 --align 1024 --use-uuid --fixed-size 300M

Note that the _empty partition name is sysupdate’s naming convention for the partition resource type.

Serving the updates #

Updates can be served locally via regular directories or remotely via a regular HTTP/HTTPS web server. For Over-the-air (OTA) updates, HTTP/HTTPS is the correct option. Any web server can be used.

ls -1 ./server/
rootfs_0.ext4
rootfs_1.ext4
SHA256SUMS
uki_0.efi
uki_1.efi

When using HTTP/HTTPS, sysupdate will request a SHA256SUMS checksum file. This file acts as the update server’s “manifest”, describing what updated resources are available.

sha256sum * > SHA256SUMS
python3 -m http.server 3333
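On the device side, the update flow can then be exercised by hand with the systemd-sysupdate command-line tool. A rough sketch (the binary location and the use of the optional update/reboot services may differ on your image):

# List known versions and check whether the server offers something newer
/usr/lib/systemd/systemd-sysupdate list
/usr/lib/systemd/systemd-sysupdate check-new
# Download and install the newest version into the inactive slot, then reboot into it
/usr/lib/systemd/systemd-sysupdate update
systemctl reboot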

Demo #

If you’re interested in seeing these steps in action, watch our presentation at Embedded Recipes 2025 from last May.

Recording from Embedded Recipes 2025

Demo source files #

The source files of the demo shown here and in the presentation are available on GitHub. Give it a try!

June 16, 2025 12:00 AM

June 13, 2025

Luis Henriques

FUSE over io_uring

Over the past few months I had the chance to spend some time looking at an interesting new FUSE feature. This feature, merged into the Linux kernel 6.14 release, has introduced the ability to perform the communication between the user-space server (or FUSE server) and the kernel using io_uring. This means that file systems implemented in user-space will get a performance improvement simply by enabling this new feature.

But let's start at the beginning:

What is FUSE?

Traditionally, file systems in *nix operating systems have been implemented within their (monolithic) kernels. From the BSDs to Linux, file systems were all developed in the kernel. Obviously, exceptions have existed since the beginning as well. Micro-kernels, for example, could be executed in ring0, while their file systems would run as servers with lower privilege levels. But these were the exceptions.

There are, however, several advantages in implementing them in user-space instead. Here are just a few of the most obvious ones:

  • It's probably easier to find people experienced in writing user-space code than kernel code.
  • It is easier, generally speaking, to develop, debug, and test user-space applications. Not because the kernel is necessarily more complex, but because the kernel development cycle is slower, requiring specialised tools and knowledge.
  • There are more tools and libraries available in user-space. It's way easier to just pick an existing compression library to add compression to your file system than to re-implement it in the kernel. Sure, nowadays the Linux kernel is already very rich in all sorts of library-like subsystems, but still.
  • Security, of course! Code in user-space can be isolated, while in the kernel it would be running in ring0.
  • And, obviously, porting a file system to a different operating system is much easier if it's written in user-space.

And this is where FUSE can help: FUSE is a framework that provides the necessary infrastructure to make it possible to implement file systems in user-space.

FUSE includes two main components: a kernel-space module, and a user-space server. The kernel-space fuse module is responsible for getting all the requests from the virtual file system layer (VFS) and redirecting them to the user-space FUSE server. The communication between the kernel and the FUSE server is done through the /dev/fuse device.

There's also a third optional component: libfuse. This is a user-space library that makes life easier for developers implementing a file system as it hides most of the details of the FUSE protocol used to communicate between user- and kernel-space.

The diagram below helps in understanding the interaction between all these components.

A FUSE diagram
FUSE diagram

As the diagram shows, when an application wants to execute an operation on a FUSE file system (for example, reading a few bytes from an open file), the workflow is as follows:

  1. The application executes a system call (e.g., read() to read data from an open file) and enters kernel space.
  2. The kernel VFS layer routes the operation to the appropriate file system implementation, the FUSE kernel module in this case. However, if the read() is done on a file that has been recently accessed, the data may already be in the page cache. In this case the VFS may serve the request directly and return the data immediately to the application without calling into the FUSE module.
  3. FUSE will create a new request to be sent to the user-space server, and queues it. At this point, the application performing the read() is blocked, waiting for the operation to complete.
  4. The user-space FUSE file system server gets the new request from /dev/fuse and starts processing it. This may include, for example, network communication in the case of a network file system.
  5. Once the request is processed, the user-space FUSE server writes the reply back into /dev/fuse.
  6. The FUSE kernel module will get that reply, return it to VFS and the user-space application will finally get its data.

As we can see, there are a lot of blocking operations and context switches between user- and kernel-space.

What's io_uring

io_uring is an API for performing asynchronous I/O, meant to replace, for example, the old POSIX AIO API (aio_read(), aio_write(), etc). io_uring can be used instead of read() and write(), but also for a lot of other I/O operations, such as fsync and poll, or even for network-related operations such as the socket sendmsg() and recvmsg(). An application using this interface will prepare a set of requests (Submission Queue Entries, or SQEs), add them to the Submission Queue Ring (SQR), and notify the kernel about these operations. The kernel will eventually pick up these entries, execute them, and add completion entries to the Completion Queue Ring (CQR). It's a simple producer-consumer model, as shown in the diagram below.

An io_uring diagram
io_uring diagram
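To make the producer-consumer model a bit more concrete, here is a minimal sketch using the liburing helper library (error handling omitted, build with -luring; this is unrelated to the FUSE code itself):

#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>

int main(void)
{
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    char buf[4096];

    io_uring_queue_init(8, &ring, 0);          /* create the SQ/CQ rings */

    int fd = open("/etc/hostname", O_RDONLY);
    sqe = io_uring_get_sqe(&ring);             /* grab a free SQE */
    io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);  /* describe the read */
    io_uring_submit(&ring);                    /* hand it to the kernel */

    io_uring_wait_cqe(&ring, &cqe);            /* wait for its completion */
    printf("read %d bytes\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);             /* mark the CQE as consumed */

    io_uring_queue_exit(&ring);
    return 0;
}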

What's FUSE over io_uring

As mentioned above, the usage of /dev/fuse for communication between the FUSE server and the kernel is one of the performance bottlenecks when using user-space file systems. Thus, replacing this mechanism by a block of memory (ring buffers) shared between the user-space server and the kernel was expected to result in performance improvements.

The implementation of FUSE over io_uring that was merged into the 6.14 kernel includes a set of SQR/CQR queues per CPU core and, even if not all the low-level FUSE operations are available through io_uring1, the performance improvements are quite visible. Note that, in the future, this design of having a set of rings per CPU may change and may become customisable. For example, it may be desirable to have a set of CPUs dedicated to doing I/O on a FUSE file system, keeping other CPUs for other purposes.

Using FUSE over io_uring

One awesome thing about the way this feature was implemented is that there is no need to add any specific support to the user-space server implementations: as long as the FUSE server uses libfuse, all the details are totally transparent to the server.

In order to use this new feature one simply needs to enable it through a fuse kernel module parameter, for example by doing:

echo 1 > /sys/module/fuse/parameters/enable_uring

And then, when a new FUSE file system is mounted, io_uring will be used. Note that the above command needs to be executed before the file system is mounted, otherwise it will keep using the traditional /dev/fuse device.

Unfortunately, as of today, the libfuse library support for this feature hasn't been released yet. Thus, it is necessary to compile a version of this library that is still under review. It can be obtained from the maintainer's git tree, branch uring.

After compiling this branch, it's easy to test io_uring using one of the passthrough file system examples distributed with the library. For example, one could use the following set of commands to mount a passthrough file system that uses io_uring:

echo 1 > /sys/module/fuse/parameters/enable_uring
cd <libfuse-build-dir>/examples
./passthrough_hp --uring --uring-q-depth=128 <src-dir> <mnt-dir>

The graphs below show the results of running some very basic read() and write() tests, using a simple setup with the passthrough_hp example file system. The workload was generated with fio, the standard I/O generator.

The graphs on the left are for read() operations and the ones on the right for write() operations; the top row is for buffered I/O and the bottom row for direct I/O.

All of them show the I/O bandwidth on the Y axis and the number of jobs (processes doing I/O) on the X axis. The test system had 8 CPUs, and the tests used 1, 2, 4 and 8 jobs. Also, different block sizes were used for each operation; in these graphs only the 4k and 32k block sizes are shown.
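For reference, the fio invocations for such a test would look roughly like this (a simplified sketch, not the exact job definitions used for these runs):

# Buffered sequential reads, 4k blocks, 4 jobs, against the FUSE mount point
fio --name=reads --directory=<mnt-dir> --rw=read --bs=4k --size=1G \
    --numjobs=4 --group_reporting
# The same test using direct I/O
fio --name=reads-dio --directory=<mnt-dir> --rw=read --bs=4k --size=1G \
    --numjobs=4 --direct=1 --group_reporting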

[Figures: buffered reads and buffered writes (top row); direct I/O reads and direct I/O writes (bottom row)]

The graphs clearly show that io_uring performance is better than when using the /dev/fuse device. For reads, the 4k block size io_uring tests are even better than the 32k tests with the traditional FUSE device. That doesn't happen for writes, but io_uring is still better.

Conclusion

To summarise, today it is already possible to improve the performance of FUSE file systems simply by explicitly enabling the io_uring communication between the kernel and the FUSE server. libfuse still needs to be manually compiled, but this should change very soon, once the library is released with support for this new feature. And this proves once again that user-space file systems are not necessarily "toy" file systems developed by "misguided" people.

Footnotes:

1

For example, /dev/fuse still needs to be used for the initial FUSE setup, for handling kernel INTERRUPT requests and for NOTIFY_* requests.

June 13, 2025 11:00 PM

June 11, 2025

Andy Wingo

whippet in guile hacklog: evacuation

Good evening, hackfolk. A quick note this evening to record a waypoint in my efforts to improve Guile’s memory manager.

So, I got Guile running on top of the Whippet API. This API can be implemented by a number of concrete garbage collector implementations. The implementation backed by the Boehm collector is fine, as expected. The implementation that uses the bump-pointer-allocation-into-holes strategy is less good. The minor reason is heap sizing heuristics; I still get it wrong about when to grow the heap and when not to do so. But the major reason is that non-moving Immix collectors appear to have pathological fragmentation characteristics.

Fragmentation, for our purposes, is memory under the control of the GC which was free after the previous collection, but which the current cycle failed to use for allocation. I have the feeling that for the non-moving Immix-family collector implementations, fragmentation is much higher than for size-segregated freelist-based mark-sweep collectors. For an allocation of, say, 1024 bytes, the collector might have to scan over many smaller holes until it finds a hole that is big enough. This wastes free memory. Fragmentation memory is not gone—it is still available for allocation!—but it won’t be allocatable until after the current cycle when we visit all holes again. In Immix, fragmentation wastes allocatable memory during a cycle, hastening collection and causing more frequent whole-heap traversals.

The value proposition of Immix is that if there is too much fragmentation, you can just go into evacuating mode, and probably improve things. I still buy it. However I don’t think that non-moving Immix is a winner. I still need to do more science to know for sure. I need to fix Guile to support the stack-conservative, heap-precise version of the Immix-family collector which will allow for evacuation.

So that’s where I’m at: a load of gnarly Guile refactors to allow for precise tracing of the heap. I probably have another couple weeks left until I can run some tests. Fingers crossed; we’ll see!

by Andy Wingo at June 11, 2025 08:56 PM

June 09, 2025

Olivier Tilloy

Embedded Recipes '25

Last month the Embedded Recipes conference was held in Nice, France. Igalia was sponsoring the event, and my colleague Martín and I were attending. In addition, we each delivered a talk to a highly technical and engaged audience.

My presentation, unlike most other talks, was a high-level overview of how Igalia engineers contribute to SteamOS to shape the future of gaming on Linux, through our contracting work with Valve. Having joined the project recently, I found this a challenge (the good kind): it allowed me to gain a much better understanding of what all my colleagues who work on SteamOS do, through the conversations I had with them while preparing the presentation. The talk was well received and the feedback I got was overall very positive, and it was followed up by several interesting conversations. I was apprehensive about questions from the audience, as most of the work I presented wasn’t mine, and indeed some of them had to remain unanswered.

Martín delivered a lightning talk on how to implement OTA updates with systemd-sysupdate on Yocto-based distributions. It was also well received, and followed up by conversations in the Yocto workshop that took place the following day.

I found the selection of presentations overall quite interesting and relevant, and there were plenty of opportunities for networking during lunch, coffee breaks that were splendidly supplied with croissants, fruit juice, cheese and coffee, and a dinner at a beach restaurant.

The mascot reference to a famous French surfer gave me a smile.

Embedded Recipes de Nice

Many thanks to Kevin and all the folks at BayLibre for a top-notch organization in a relaxed and beautiful setting, to fellow speakers for bringing us these talks, and to everyone I talked to in the hallway track for the enriching conversations.

See you all next year in sunny Nice!

June 09, 2025 12:00 AM

May 30, 2025

Igalia WebKit Team

WebKit at the Web Engines Hackfest 2025

The Web Engines Hackfest 2025 is kicking off next Monday in A Coruña and among all the interesting talks and sessions about different engines, there are a few that can be interesting to people involved one way or another with WebKitGTK and WPE:

All talks will be live streamed and a Jitsi Meet link will be available for those interested in participating remotely. You can find all the details at webengineshackfest.org.

by Igalia WebKit Team at May 30, 2025 04:21 PM

May 29, 2025

Víctor Jáquez

GStreamer 1.26 and Igalia

The release of GStreamer 1.26, last March, delivered new features, optimizations and improvements. Igalia played its role as a long-standing contributor, with 382 commits (194 merge requests) out of a total of 2666 commits merged in this release. This blog post takes a closer look at those contributions.

gst-devtools #

This module contains development and validation tools.

gst-dot-viewer #

gst-dot-viewer is a new web tool for real-time pipeline visualization. Our colleague, Thibault, wrote a blog post about its usage.
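As a minimal sketch of how it is typically fed (the exact viewer invocation is covered in Thibault's post), the tool consumes the .dot pipeline snapshots that GStreamer already knows how to dump:

# Have GStreamer write pipeline .dot snapshots on state changes
export GST_DEBUG_DUMP_DOT_DIR=/tmp/gst-dots
mkdir -p "$GST_DEBUG_DUMP_DOT_DIR"
gst-launch-1.0 videotestsrc num-buffers=120 ! videoconvert ! autovideosink
# Then point gst-dot-viewer at that directory to browse the pipelines live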

validate #

GstValidate is a tool to check if elements are behaving as expected.

  • Added support for HTTP Testing.
  • Scenario fixes, such as resetting pipelines on expected errors to avoid inconsistent states, improved error logging, and async action handling to prevent busy loops.

gst-editing-services #

GStreamer Editing Services is a library to simplify the creation of multimedia editing applications.

  • Enabled reverse playback, by adding a reverse property to nlesource for seamless backward clip playback.
  • Added internal tests for Non-Linear Engine elements.

gst-libav #

GStreamer Libav plug-in contains a set of many popular decoders and encoders using FFmpeg.

  • As part of the effort to support VVC/H.266 in GStreamer, the FFmpeg VVC/H.266 decoder was exposed.
  • Optimized framerate renegotiation in avviddec without decoder resets.
  • Mapped GST_VIDEO_FORMAT_GRAY10_LE16 format to FFmpeg’s equivalent.

gstreamer #

Core library.

  • Added a tracer for gst-dots-viewer.
  • Log tracer improvements, such as replacing integer codes with readable strings, tracking pads’ sticky events, and simplifying parameter handling.
  • On pads, don’t push sticky events in response to a FLUSH_STOP event.
  • On queue element, fixed missing notify signals for level changes.
  • Pipeline parser now logs bus error messages during pipeline construction.
  • Fixed gst_util_ceil_log2 utility function.

gst-plugins-base #

GStreamer Base Plugins is a well-groomed and well-maintained collection of plugins. It also contains helper libraries and base classes useful for writing elements.

  • audiorate: respect tolerance property to avoid unnecessary sample adjustments for minor gaps.
  • audioconvert: support reordering of unpositioned input channels.
  • videoconvertscale: improve aspect ratio handling.
  • glcolorconvert: added I422_10XX, I422_12XX, Y444_10XX, and Y444_16XX color formats, and fixed caps negotiation for DMABuf.
  • glvideomixer: handle mouse events.
  • pbutils: added VVC/H.266 codec support
  • encodebasebin: parser fixes.
  • oggdemux: fixed seek to the end of files.
  • rtp: fixed precision for UNIX timestamp.
  • sdp: enhanced debugging messages.
  • parsebin: improved caps negotiation.
  • decodebin3: added missing locks to prevent race conditions.
  • streamsynchronizer: improved documentation.

gst-plugins-good #

GStreamer Good Plugins is a set of plugins considered to have good quality code, correct functionality, and uses LGPL/LGPL+compatible licenses.

  • hlsdemux2: handle empty segments at the beginning of a stream.
  • qtmux and matroska: add support for VVC/H.266.
  • matroskademux: support seek with stop in push mode.
  • rtp: several fixes.
  • osxaudio: fixes.
  • videoflip: support Y444_16LE and Y444_16BE color formats.
  • vpx: enhance error and warning messages.

gst-plugins-bad #

GStreamer Bad Plug-ins is a set of plugins that aren’t up to par compared to the rest. They might be close to being good quality, but they’re missing something, be it a good code review, some documentation, a set of tests, etc.

  • dashsink: a lot of improvements and cleanups, such as unit tests, state and event management.
  • h266parse: enabled vvc1 and vvi1 stream formats, improved codec data parsing and negotiation, along with cleanups and fixes.
  • mpegtsmux and tsdemux: added support for VVC/H.266 codec.
  • vulkan:
    • Added compatibility for timeline semaphores and barriers.
    • Initial support of multiple GPU and dynamic element registering.
    • Vulkan image buffer pool improvements.
    • vulkanh264dec: support interlaced streams.
    • vulkanencoding: rate control and quality level adjustments, update SPS/PPS, support layered DPBs.
  • webrtcbin:
    • Resolved duplicate payload types in SDP offers with RTX and multiple codecs.
    • Transceivers are now created earlier during negotiation to avoid linkage issues.
    • Allow session level in setup attribute in SDP answer.
  • wpevideosrc:
    • code cleanups
    • cached SHM buffers are cleared after caps renegotiation.
    • handle latency queries and post progress messages on bus.
  • srtdec: fixes
  • jpegparse: handle avi1 tag for progressive images
  • va: improved encoder configuration when properties change at run-time, especially rate control.

May 29, 2025 12:00 AM

May 28, 2025

Igalia Compilers Team

Improvements to RISC-V vector code generation in LLVM


Earlier this month, Alex presented "Improvements to RISC-V vector code generation in LLVM" at the RISC-V Summit Europe in Paris. This blog post summarises that talk.

Title slide

Introduction #

So RISC-V, vectorisation, the complexities of the LLVM toolchain and just 15 minutes to cover it in front of an audience with varying specialisations. I was a little worried when first scoping this talk but the thing with compiler optimisations is that the objective is often pretty clear and easy to understand, even if the implementation can be challenging. I'm going to be exploiting that heavily in this talk by trying to focus on the high level objective and problems encountered.

RVV codegen development #

RVV codegen development

Where are we today in terms of the implementation and optimisation of RISC-V vector codegen? I'm oversimplifying the state of affairs here, but the list in the slide above isn't a bad mental model. Basic enablement is done, it's been validated to the point it's enabled by default, we've had a round of additional extension implementation, and a large portion of ongoing work is on performance analysis and tuning. I don't think I'll be surprising any of you if I say this is a huge task. We're never going to be "finished" in the sense that there's always more compiler performance tuning to be done, but there are certainly phases of catching the more obvious cases and then more of a long tail.

Improving RVV code generation #

Improving RVV code generation

What is the compiler trying to do here? There are multiple metrics, but typically we're focused primarily on performance of generated code. This isn't something we do at all costs -- in a general purpose compiler you can't for instance spend 10hrs optimising a particular input. So we need a lot of heuristics that help us arrive at a reasonable answer without exhaustively testing all possibilities.

The kind of considerations for the compiler during compilation includes:

  • Profitability. If you're transforming your code then for sure you want the new version to perform better than the old one! Given the complexity of the transformations from scalar to vector code and costs incurred by moving values between scalar and vector registers, it can be harder than you might think to figure out at the right time whether the vector route vs the scalar route might be better. You're typically estimating the cost of either choice before you've gone and actually applied a bunch of additional optimisations and transformations that might further alter the trade-off.
  • More specific to RISC-V vectors: the extension has been described before as effectively giving you wider-than-32-bit instructions, but with the excess encoded in control and status registers. If you're too naive about it, you risk switching the vtype CSR more often than necessary, adding unwanted overhead.
  • Spilling is when we store values to the stack and load them back later. Minimising this is a standard objective for any target, but the lack of callee-saved vector registers in the standard ABI poses a challenge, and, more subtly, the fact that we don't have immediate offsets for some vector instructions can put more pressure on scalar register allocation.
  • Or otherwise just ensuring that we're using the instructions available whenever we can. One of the questions I had was whether I'm going to be talking just about autovectorisation, or about vector codegen where it's explicit in the input (e.g. vector datatypes, intrinsics). I'd make the point that they're not fully independent; in fact, all these kinds of considerations are inter-related. The compiler's cost modelling may tell it vectorisation isn't profitable: sometimes that's true, sometimes the model isn't detailed enough, and sometimes it's only true for the compiler right now because it could be doing a better job of choosing instructions. If I solve the issue of suboptimal instruction selection, then it benefits both autovectorisation (as it's more likely to be profitable, or will be more profitable) and potentially the more explicit path (as explicit uses of vectors benefit from the improved lowering).

Just one final point of order I'll say once to avoid repeating myself again and again. I'm giving a summary of improvements made by all LLVM contributors across many companies, rather than just those by my colleagues at Igalia.

Non-power-of-two vectorization #

Non-power-of-two vectorization

The intuition behind both this improvement and the one on the next slide is actually exactly the same. Cast your minds back to 2015 or so when Krste was presenting the vector extension. Some details have changed, but if you look at the slides (or any RVV summary since then) you see code examples with simple minimal loops even for irregularly sized vectors or where the length of a vector isn't fixed at compile time. The headline is that the compiler now generates output that looks a lot more like that handwritten code that better exploits the features of RISC-V vector.

For non-power-of-two vectorisation, I'm talking about the case here where you have a fixed known-at-compile time length. In LLVM this is handled usually by what we call the SLP or Superword Level Parallelism vectorizer. It needed to be taught to handle non-power-of-two sizes like we support in RVV. Other SIMD ISAs don't have the notion of vl and so generating non-power-of-two vector types isn't as easy.

Non-power-of-two vectorization example

The example I show here has pixels with rgb values. Before it would do a very narrow two-wide vector operation then handle the one remaining item with scalar code. Now we directly operate on a 3-element vector.
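The slide's code isn't reproduced here, but a hypothetical sketch of that kind of per-pixel helper could look like this:

#include <stdint.h>

/* Brighten one RGB pixel: three 8-bit channels, i.e. a 3-element
   (non-power-of-two) group that the SLP vectorizer can now handle directly. */
void brighten_pixel(uint8_t rgb[3], uint8_t amount)
{
    for (int c = 0; c < 3; c++) {
        unsigned v = rgb[c] + amount;
        rgb[c] = v > 255 ? 255 : v;   /* saturate */
    }
}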

We are of course using simple code examples for illustration here. If you want to brighten an image as efficiently as possible sticking the per-pixel operation in a separate function like this perhaps isn't how you'd do it!

vl tail folding #

vl tail folding

Often when operating on a loop, you have an input of a certain length and you process it in chunks of some reasonable size. RISC-V vector gives us a lot more flexibility about doing this. If our input isn't an exact multiple of our vectorization factor ("chunk size") - which is the calculated vector length used per iteration - we can still process it in RVV using the same vector code path. For other architectures, as you can see in the old code, there is a vector loop which may then branch to a scalar version to handle any remainder (tail) elements. Now that's not necessary: LLVM's loop vectorizer can handle these cases properly and we get a single vectorised loop body. This results in performance improvements on benchmarks like x264, where the scalar tail is executed frequently, and improves code size even in cases where there is no direct performance impact.

vl tail folding example
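To make the shape of the problem concrete, here is a hypothetical loop of the kind affected (not taken from the slides). With vl tail folding the compiler emits a single vector loop whose final iteration simply uses a shorter vector length, instead of a separate scalar remainder loop:

#include <stddef.h>

/* n need not be a multiple of the vectorization factor */
void saxpy(float *y, const float *x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}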

libcall expansion #

libcall expansion

This one is a little bit simpler. It's common for the compiler to synthesise its own version of memcpy/memset when it sees it can generate a more specialised version based on information about alignment or size of the operands. Of course when the vector extension is available the compiler should be able to use it to implement these operations, and now it can.

libcall expansion example

This example shows how a small number of instructions expanded inline might be used to implement memcpy and memcmp. I also note there is a RISC-V vector specific consideration in favour of inlining operations in this case - as the standard calling convention doesn't have any callee-saved vector registers, avoiding the function call may avoid spilling vector registers.
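For instance, a fixed-size copy like the hypothetical function below is the sort of code that can now be expanded inline using vector loads and stores rather than a call into libc:

#include <string.h>

/* With the vector extension available, the compiler can expand this
   small, known-size copy inline instead of emitting a call to memcpy. */
void copy_header(char *dst, const char *src)
{
    memcpy(dst, src, 64);
}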

Newer RVV extensions #

Improving codegen for newer RVV extensions

Sometimes of course it's a matter of a new extension letting us do something we couldn't before. We need to teach the compiler how to select instructions in that case, and how to estimate the cost. Half precision and bf16 floating point is an interesting example where you introduce a small number of instructions for values of that type, but otherwise rely on widening to 32-bit. This is of course better than falling back to a libcall or scalarising to use Zfh instructions, but someone needs to put in the work to convince the compiler of that!

Loop vectorizer f32 widening

Other improvements #

Other improvements

The slide above has a sampling of other improvements. If you'd like to know more about the VL optimizer, my colleague's presentation at EuroLLVM earlier this year is now up on YouTube.

Another fun highlight is llvm-exegesis, this is a tool for detecting microarchitectural implementation details via probing, e.g. latency and throughput of different operations that will help you write a scheduling model. It now supports RVV which is a bit helpful for the one piece of RVV 1.0 hardware we have readily available, but should be a lot more helpful once more hardware reaches the market.

Results #

Results

So, it's time to show the numbers. Here I'm looking at execution time for SPEC CPU 2017 benchmarks (run using LLVM's harness) on a SpacemiT X60, compiled with the options mentioned above. As you can see, 12 out of 16 benchmarks improved by 5% or more, and 7 out of 16 by 10% or more. These are meaningful improvements, a bit under 9% geomean, comparing Clang as of March this year against Clang from 18 months prior.

There's more work going in as we speak, such as the optimisation work done by my colleague Mikhail and written up on the RISE blog. Benchmarking done for that work comparing Clang vs GCC showed today's LLVM is faster than GCC in 11 of the 16 tested SPEC benchmarks, slower in 3, and about equal for the other two.

Are we done? Goodness no! But we're making great progress. As I say in all of these presentations, even if you're not directly contributing compiler engineering resources, I really appreciate anyone able to contribute by reporting cases where they compile their code of interest and don't get the optimisation they expected. The more you can break it down and produce minimised examples the better, as it means us compiler engineers can spend more time writing compiler patches rather than doing workload analysis to figure out the next priority.

Testing #

Testing

Adding all these new optimisations is great, but we want to make sure the generated code works and continues to work as these new code generation features are iterated on. It's been really important to have CI coverage for some of these new features including when they're behind flags and not enabled by default. Thank you to RISE for supporting my work here, we have a nice dashboard providing an easy view of just the RISC-V builders.

Future work #

Future work

Here's some directions of potential future work or areas we're already looking. Regarding the default scheduling model, Mikhail's recent work on the Spacemit X60 scheduling model shows how having at least a basic scheduling model can have a big impact (partly as various code paths are pessimised in LLVM if you don't at least have something). Other backends like AArch64 pick a reasonable in-order core design on the basis that scheduling helps a lot for such designs, and it's not harmful for more aggressive OoO designs.

Thank you #

Thank you

To underline again, I've walked through progress made by a whole community of contributors, not just Igalia. That includes at least the companies mentioned above, but more as well. I really see upstream LLVM as a success story for cross-company collaboration within the RISC-V ecosystem. For sure it could be better; there are companies doing a lot with RISC-V who aren't doing much with the compiler they rely on, but a huge amount has been achieved by a contributor community that spans many RISC-V vendors. If you're working on the RISC-V backend downstream and looking to participate in the upstream community, we run biweekly contributor calls (details are in the RISC-V category on LLVM's Discourse), which may be a helpful way to get started.

Thank you for reading!

May 28, 2025 12:00 AM

May 26, 2025

Igalia WebKit Team

WebKit Igalia Periodical #25

Update on what happened in WebKit in the week from May 19 to May 26.

This week saw updates on the Android version of WPE, the introduction of a new mechanism to support memory-mappable buffers which can lead to better performance, a new gamepad API for WPE, and other improvements.

Cross-Port 🐱

Implemented support for the new 'request-close' command for dialog elements.

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

Added support for using the GDB JIT API when dynamically generating code in JSC.

Graphics 🖼️

Added support for memory-mappable GPU buffers. This mechanism allows allocating linear textures that can be used from OpenGL and memory-mapped into CPU-accessible memory. This makes it possible to update the pixel data directly, bypassing the usual glCopyTexSubImage2D logic that may introduce implicit synchronization, perform staging copies, etc. (driver-dependent).

WPE WebKit 📟

WPE Platform API 🧩

New, modern platform API that supersedes usage of libwpe and WPE backends.

Landed a patch to add a gamepads API to WPE Platform with an optional default implementation using libmanette.

WPE Android 🤖

Adaptation of WPE WebKit targeting the Android operating system.

WPE-Android has been updated to use WebKit 2.48.2. Updated packages will be available in the Central repository in the coming days.

The WPE-Android MiniBrowser no longer crashes when opening the “Settings” activity when the system-wide dark user interface mode is enabled.

That’s all for this week!

by Igalia WebKit Team at May 26, 2025 07:38 PM

André Almeida

Linux 6.15, DRM scheduler, wedged events, sched_ext and more

The Linux 6.15 has just been released, bringing a lot of new features:

  • nova-core, the “base” driver for the new NVIDIA GPU driver, written in Rust. The nova project will eventually replace the Nouveau driver for all GSP-based GPUs.
  • RISC-V gained support for some extensions: BFloat16 floating-point, Zaamo, Zalrsc and ZBKB.
  • The fwctl subsystem has been merged. This new family of drivers acts as a transport layer between userspace and complex firmware. To understand more about its controversies and how it got merged, check out this LWN article.
  • Support for MacBook touch bars, both as a DRM driver and input source.
  • Support for Adreno 623 GPU.

As always, I suggest to have a look at the Kernel Newbies summary. Now, let’s have a look at Igalia’s contributions.

DRM wedged events

In 3D graphics APIs such as Vulkan and OpenGL, there are mechanisms that applications can rely on to check whether the GPU has been reset (you can read more about this in the kernel documentation). However, there was no generic mechanism to inform userspace that a GPU reset has happened. This is useful because in some cases the reset affects not only the app involved in it, but the whole graphics stack, which then needs some action to recover, like doing a module rebind or even a bus reset to recover the hardware. For this release, we helped add a userspace event for this, so a daemon or the compositor can listen to it and trigger some recovery measure after the GPU has been reset. Read more in the kernel docs.
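As a rough way to observe these events from userspace (a sketch; the exact property names are documented in the kernel docs linked above), one can watch uevents on the DRM subsystem:

# Watch kernel uevents on the DRM subsystem; a wedged device shows up here
# with a property describing the suggested recovery method.
udevadm monitor --kernel --subsystem-match=drm --property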

DRM scheduler work

In the DRM scheduler area, in preparation for the future scheduling improvements, we worked on cleaning up the code base, better separation of the internal and external interfaces, and adding formal interfaces at places where individual drivers had too much knowledge of the scheduler internals.

General GPU/DRM stack

In the wider GPU stack area we optimised the most frequent dma-fence single fence merge operation to avoid memory allocations and array sorting. This should slightly reduce the CPU utilisation with workloads which use the DRM sync objects heavily, such as the modern composited desktops using Vulkan explicit sync.

Some releases ago, we helped to enable async page flips in the atomic DRM uAPI. So far, this feature was only enabled for the primary plane. In this release, we added a mechanism for the driver to decide which plane can perform async flips. We used this to enable overlay planes to do async flips in AMDGPU driver.

We also fixed a bug in the DRM fdinfo common layer which could cause use after free after driver unbind.

Intel Xe driver improvements

On the Intel GPU specific front we worked on adding better Alderlake-P support to the new Intel Xe driver by identifying and adding missing hardware workarounds, fixed the workaround application in general and also made some other smaller improvements.

sched_ext

When developing and optimizing a sched_ext-based scheduler, it is important to understand the interactions between the BPF scheduler and the in-kernel sched_ext core. If there is a mismatch between what the BPF scheduler developer expects and how the sched_ext core actually works, such a mismatch could often be the source of bugs or performance issues.

To address such a problem, we added a mechanism to count and report the internal events of the sched_ext core. This significantly improves the visibility of subtle edge cases, which might otherwise easily slip by. So far, eight events have been added, and the events can be monitored through a BPF program, sysfs, and a tracepoint.

A few less bugs

As usual, as part of our work on diverse projects, we keep an eye on automated test results to look for potential security and stability issues in different kernel areas. We’re happy to have contributed to making this release a bit more robust by fixing bugs in memory management, network (SCTP), ext4, suspend/resume and other subsystems.


This is the complete list of Igalia’s contributions for this release:

Authored (75)

André Almeida

Angelos Oikonomopoulos

Bhupesh

Changwoo Min

Gavin Guo

Guilherme G. Piccoli

Luis Henriques

Maíra Canal

Melissa Wen

Ricardo Cañuelo Navarro

Rodrigo Siqueira

Thadeu Lima de Souza Cascardo

Tvrtko Ursulin

Reviewed (30)

André Almeida

Christian Gmeiner

Iago Toral Quiroga

Jose Maria Casanova Crespo

Luis Henriques

Maíra Canal

Melissa Wen

Rodrigo Siqueira

Thadeu Lima de Souza Cascardo

Tvrtko Ursulin

Tested (2)

Changwoo Min

Guilherme G. Piccoli

Acked (12)

Changwoo Min

Maíra Canal

Tvrtko Ursulin

Maintainer SoB (2)

Maíra Canal

Tvrtko Ursulin

May 26, 2025 12:00 AM

May 22, 2025

Andy Wingo

whippet lab notebook: guile, heuristics, and heap growth

Greets all! Another brief note today. I have gotten Guile working with one of the Nofl-based collectors, specifically the one that scans all edges conservatively (heap-conservative-mmc / heap-conservative-parallel-mmc). Hurrah!

It was a pleasant surprise how easy it was to switch—from the user’s point of view, you just pass --with-gc=heap-conservative-parallel-mmc to Guile’s build (on the wip-whippet branch); when developing I also pass --with-gc-debug, and I had a couple bugs to fix—but, but, there are still some issues. Today’s note thinks through the ones related to heap sizing heuristics.

growable heaps

Whippet has three heap sizing strategies: fixed, growable, and adaptive (MemBalancer). The adaptive policy is the one I would like in the long term; it will grow the heap for processes with a high allocation rate, and shrink when they go idle. However I won’t really be able to test heap shrinking until I get precise tracing of heap edges, which will allow me to evacuate sparse blocks.

So for now, Guile uses the growable policy, which attempts to size the heap so it is at least as large as the live data size, times some multiplier. The multiplier currently defaults to 1.75×, but can be set on the command line via the GUILE_GC_OPTIONS environment variable. For example to set an initial heap size of 10 megabytes and a 4× multiplier, you would set GUILE_GC_OPTIONS=heap-size-multiplier=4,heap-size=10M.

Anyway, I have run into problems! The fundamental issue is fragmentation. Consider a 10MB growable heap with a 2× multiplier, consisting of a sequence of 16-byte objects followed by 16-byte holes. You go to allocate a 32-byte object. This is a small object (8192 bytes or less), and so it goes in the Nofl space. A Nofl mutator holds on to a block from the list of sweepable blocks, and will sequentially scan that block to find holes. However, each hole is only 16 bytes, so we can’t fit our 32-byte object: we finish with the current block, grab another one, repeat until no blocks are left and we cause GC. GC runs, and after collection we have an opportunity to grow the heap: but the heap size is already twice the live object size, so the heuristics say we’re all good, no resize needed, leading to the same sweep again, leading to a livelock.

I actually ran into this case during Guile’s bootstrap, while allocating a 7072-byte vector. So it’s a thing that needs fixing!

observations

The root of the problem is fragmentation. One way to solve the problem is to remove fragmentation; using a semi-space collector comprehensively resolves the issue, modulo any block-level fragmentation.

However, let’s say you have to live with fragmentation, for example because your heap has ambiguous edges that need to be traced conservatively. What can we do? Raising the heap multiplier is an effective mitigation, as it increases the average hole size, but for it to be a comprehensive solution in e.g. the case of 16-byte live objects equally interspersed with holes, you would need a multiplier of 512× to ensure that the largest 8192-byte “small” objects will find a hole. I could live with 2× or something, but 512× is too much.

We could consider changing the heap organization entirely. For example, most mark-sweep collectors (BDW-GC included) partition the heap into blocks whose allocations are of the same size, so you might have some blocks that only hold 16-byte allocations. It is theoretically possible to run into the same issue, though, if each block only has one live object, and the necessary multiplier that would “allow” for more empty blocks to be allocated is of the same order (256× for 4096-byte blocks each with a single 16-byte allocation, or even 4096× if your blocks are page-sized and you have 64kB pages).

My conclusion is that practically speaking, if you can’t deal with fragmentation, then it is impossible to just rely on a heap multiplier to size your heap. It is certainly an error to live-lock the process, hoping that some other thread mutates the graph in such a way to free up a suitable hole. At the same time, if you have configured your heap to be growable at run-time, it would be bad policy to fail an allocation, just because you calculated that the heap is big enough already.

It’s a shame, because we lose a mooring on reality: “how big will my heap get” becomes an unanswerable question because the heap might grow in response to fragmentation, which is not deterministic if there are threads around, and so we can’t reliably compare performance between different configurations. Ah well. If reliability is a goal, I think one needs to allow for evacuation, one way or another.

for nofl?

In this concrete case, I am still working on a solution. It’s going to be heuristic, which is a bit of a disappointment, but here we are.

My initial thought has two parts. Firstly, if the heap is growable but cannot defragment, then we need to reserve some empty blocks after each collection, even if reserving them would grow the heap beyond the configured heap size multiplier. In that way we will always be able to allocate into the Nofl space after a collection, because there will always be some empty blocks. How many empties? Who knows. Currently Nofl blocks are 64 kB, and the largest “small object” is 8kB. I’ll probably try some constant multiplier of the heap size.

The second thought is that searching through the entire heap for a hole is a silly way for the mutator to spend its time. Immix will reserve a block for overflow allocation: if a medium-sized allocation (more than 256B and less than 8192B) fails because no hole in the current block is big enough—note that Immix’s holes have 128B granularity—then the allocation goes to a dedicated overflow block, which is taken from the empty block set. This reduces fragmentation (holes which were not used for allocation because they were too small).

Nofl should probably do the same, but given its finer granularity, it might be better to sweep over a variable number of blocks, for example based on the logarithm of the allocation size; one could instead sweep over clz(min-size)–clz(size) blocks before taking from the empty block list, which would at least bound the sweeping work of any given allocation.

fin

Welp, just wanted to get this out of my head. So far, my experience with this Nofl-based heap configuration is mostly colored by live-locks, and otherwise its implementation of a growable heap sizing policy seems to be more tight-fisted regarding memory allocation than BDW-GC’s implementation. I am optimistic though that I will be able to get precise tracing sometime soon, as measured in development time; the problem as always is fragmentation, in that I don’t have a hole in my calendar at the moment. Until then, sweep on Wayne, cons on Garth, onwards and upwards!

by Andy Wingo at May 22, 2025 10:05 AM

May 21, 2025

Eric Meyer

Masonry, Item Flow, and… GULP?

There’s a layout type that web designers have been using for a long time now, and yet can’t be easily done with CSS: “masonry” layout, sometimes called “you know, like Pinterest does it” layout.  Masonry sits sort of halfway between flexbox and grid layout, which is a big part of why it’s been so hard to formalize.  There are those who think of it as an extension of flexbox, and others who think it’s an extension of grid, and both schools of thought have pretty solid cases.

So that’s been a lot of the discussion, which led to competing blog posts from Google (“Feedback needed: How should we define CSS masonry?”) and Apple (“Help us choose the final syntax for Masonry in CSS”).  Brian and I, with special guest star Rachel Andrew, did an Igalia Chats episode about the debate, which I think is a decent exploration of the pros and cons of each approach for anyone interested.

But then, maybe you don’t actually need to explore the two sides of the debate, because there’s a new proposal in town.  It’s currently being called Item Flow (which I can’t stop hearing sung by Eddie Vedder, please send help) and is explained in some detail in a blog post from the WebKit team.  The short summary is that it takes the flow and packing capabilities from flex and grid and puts them into their own set of properties, along with some new capabilities.

As an example, here’s a thing you can currently do with flexbox:

display: flex;
flex-wrap: wrap;
flex-direction: column;

If the current Item Flow proposals are taken as-is, you could get the same behavior with:

display: flex;
item-wrap: wrap;
item-direction: column;

…or, you could more compactly write it as:

display: flex;
item-flow: wrap column;

Now you might be thinking, okay, this just renames some flex properties to talk about items instead and you also get a shorthand property; big deal.  It actually is a big deal, though, because these item-* properties would apply in grid settings as well.  In other words, you would be able to say:

display: grid;
item-flow: wrap column;

Hold up.  Item wrapping… in grid?!?  Isn’t that just the same as what grid already does?  Which is an excellent question, and not one that’s actually settled.

However, let’s invert the wrapping in grid contexts to consider an example given in the WebKit article linked earlier, which is that you could specify a single row of grid items that equally divide up the row’s width to size themselves, like so:

display: grid;
grid-auto-columns: 1fr;
item-wrap: nowrap;

In that case, a row of five items would size each item to be one-fifth the width of the row, whereas a row of three items would have each item be one-third the row’s width.  That’s a new thing, and quite interesting to ponder.

The proposal includes the properties item-pack and item-slack, the latter of which makes me grin a little like J.R. “Bob” Dobbs but the former of which I find a lot more interesting.  Consider:

display: flex;
item-wrap: wrap;
item-pack: balance;

This would act with flex items much the way text-wrap: balance acts with words.  If you have six flex items of roughly equal size, they’ll balance between two rows to three-and-three rather than five-and-one.  Even if your flex items are of very different sizes, item-pack: balance would always automatically do its best to get the row lengths as close to equal as possible, whether that’s two rows, three rows, four rows, or however many rows.  Or columns!  This works just as well either way.

There are still debates to be had and details to be worked out, but this new direction does feel fairly promising to me.  It covers all of the current behaviors that flex and grid flowing already permit, plus it solves some longstanding gripes about each layout approach and while also opening some new doors.

The prime example of a new door is the aforementioned masonry layout.  In fact, the previous code example is essentially a true masonry layout (because it resembles the way irregular bricks are laid in a wall).  If we wanted that same behavior, only vertically like Pinterest does it, we could try:

display: flex;
item-direction: column;  /* could also be `flex-direction` */
item-wrap: wrap;         /* could also be `flex-wrap` */
item-pack: balance;

That would be harder to manage, though, since for most writing modes on the web, the width is constrained and the height is not.  In other words, to make that work with flexbox, we’d have to set an explicit height.  We also wouldn’t be able to nail down the number of columns.  Furthermore, that would cause the source order to flow down columns and then jump back to the top of the next column.  So, instead, maybe we’d be able to say:

display: grid;
grid-template-columns: repeat(3,1fr);
item-direction: row;
item-pack: dense balance;

If I’ve read the WebKit article correctly, that would allow Pinterest-style layout with the items actually going across the columns in terms of source order, but being laid out in packed columns (sometimes called “waterfall” layout, which is to say, “masonry” but rotated 90 degrees).

That said, it’s possible I’m wrong in some of the particulars here, and even if I’m not, the proposal is still very much in flux.  Even the property names could change, so values and behaviors are definitely up for debate.

As I pondered that last example, the waterfall/Pinterest layout, I thought: isn’t this visual result essentially what multicolumn layout does?  Not in terms of source order, since multicolumn elements run down one column before starting again at the top of the next.  But that seems an easy enough thing to recreate like so:

display: grid;
grid-template-columns: repeat(3,1fr);
item-direction: column;
item-pack: dense balance;

That’s a balanced set of three equally wide columns, just like in multicol.  I can use gap for the column gaps, so that’s handled.  I wouldn’t be able to set up column rules — at least, not right now, though that may be coming thanks to the Edge team’s gap decorations proposal.  But what I would be able to do, that I can’t now, is vary the width of my multiple columns.  Thus:

display: grid;
grid-template-columns: 60% 40%; /* or 3fr 2fr, idc */
item-direction: column;
item-pack: dense balance;

Is that useful?  I dunno!  It’s certainly not a thing we can do in CSS now, though, and if there’s one thing I’ve learned in the past almost three decades, it’s that a lot of great new ideas come out of adding new layout capabilities.

So, if you’ve made it this far, thanks for reading and I strongly encourage you to go read the WebKit team’s post if you haven’t already (it has more detail and a lovely summary matrix near the end) and think about what this could do for you, or what it looks like it might fall short of making possible for you.

As I’ve said, this feels promising to me, as it enables what we thought was a third layout mode (masonry/waterfall) by enriching and extending the layout modes we already have (flex/grid).  It also feels like this could eventually lead to a Grand Unified Layout Platform — a GULP, if you will — where we don’t even have to say whether a given layout’s display is flex or grid, but instead specify the exact behaviors we want using various item-* properties to get just the right ratio of flexible and grid-like qualities for a given situation.

…or, maybe, it’s already there.  It almost feels like it is, but I haven’t thought about it in enough detail yet to know if there are things it’s missing, and if so, what those might be.  All I can say is, my Web-Sense is tingling, so I’m definitely going to be digging more at this to see what might turn up.  I’d love to hear from all y’all in the comments about what you think!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at May 21, 2025 04:07 PM

May 20, 2025

Igalia Compilers Team

Summary of the April 2025 TC39 plenary

In April, many colleagues from Igalia participated in a TC39 meeting organized remotely to discuss proposed features for the JavaScript standard alongside delegates from various other organizations.

Let's delve together into some of the most exciting updates!

You can also read the full agenda and the meeting minutes on GitHub.

Progress Report: Stage 4 Proposals #

Add notation to Intl.PluralRules #

In 2020, the Intl.NumberFormat Unified API proposal added a plethora of new features to Intl.NumberFormat, including compact and other notations beyond the default "standard" one. It was planned that Intl.PluralRules would be updated to work with the notation option so that the two would complement each other. This normative change achieved that by adding a notation option to the PluralRules constructor.

Given the very small size of this Intl change, it didn't go through the staging process for proposals and was instead directly approved to be merged into the ECMA-402 specification.
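As a rough illustration (the notation values mirror those accepted by Intl.NumberFormat, and the exact plural categories returned depend on each locale's CLDR plural rules):

const standard = new Intl.PluralRules("fr");
const compact = new Intl.PluralRules("fr", { notation: "compact" });

// With compact notation, 1 000 000 is displayed as "1 M", so the plural
// category selected for it may differ from the one selected with the
// default "standard" notation.
console.log(standard.select(1_000_000));
console.log(compact.select(1_000_000));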

Progress Report: Stage 3 Proposals #

Temporal Stage 3 status update #

Our colleague Philip Chimento presented a regular status update on Temporal, the upcoming proposal for better date and time support in JS.

Firefox is at ~100% conformance with just a handful of open questions. The next most conformant implementation, in the Ladybird browser, dropped from 97% to 96% since February — not because they broke anything, but just because we added more tests for tricky cases in the meantime. GraalJS at 91% and Boa at 85% have been catching up.

Completing the Firefox implementation has raised a few interoperability questions which we plan to solve with the Intl Era and Month Code proposal soon.

Explicit Resource Management Stage 3 implementer feedback #

Dan Minor of Mozilla reported on a tricky case with the proposed using keyword for certain resources. The feature is essentially completely implemented in SpiderMonkey, but Dan highlighted an ambiguity about how the new using keyword behaves in switch statements. The committee, including implementers who have already shipped this stage 3 feature, agreed on the resolution of the issue that Dan suggested.
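For readers unfamiliar with the proposal, here is a minimal sketch of what using does in general (not the specific switch-statement case discussed at the meeting; openConfigFile is a hypothetical resource exposing a [Symbol.dispose]() method):

function readConfig() {
  using file = openConfigFile(); // hypothetical disposable resource
  return file.parse();
  // `file` is disposed automatically when the block is exited,
  // even if parse() throws.
}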

  • Champion: Ron Buckton

Changed behavior of Array.fromAsync (Stage 3) after spec PR #2600 #

The JavaScript iterator and async iterator protocols power all modern iteration methods in the language, from for of and for await of to the rest and spread operators, to the modern iterator helpers proposals...

One less-well-known part of these protocols, however, is the optional .throw() and .return() methods, which can be used to influence the iteration itself. In particular, .return() indicates to the iterator that the iteration is finished, so it can perform any cleanup actions. For example, this is called in for of/for await of when the iteration stops early (due to a break, for example).

When using for await of with a sync iterator/iterable, such as an array of promises, each value coming from the sync iterator is awaited. However, a bug was found recently where if one of those promises coming from the sync iterator rejects, the iteration would stop, but the original sync iterator's .return() method would never be called. (Note that in for of with sync iterators, .return() is always called after .next() throws).

In the January TC39 plenary we decided to make it so that such a rejection would close the original sync iterator. In this plenary, we decided that since Array.fromAsync (which is currently stage 3) uses the same underlying spec machinery, the change would also apply to that API.
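A sketch of the affected scenario, assuming an engine that already ships Array.fromAsync with the updated semantics (the iterable below is a plain sync iterator yielding promises):

const iterable = {
  [Symbol.iterator]() {
    return {
      next() {
        return { value: Promise.reject(new Error("boom")), done: false };
      },
      return() {
        // With the accepted change, this cleanup hook now also runs when
        // an awaited promise rejects, both in `for await...of` and in
        // Array.fromAsync.
        console.log("sync iterator closed");
        return { done: true };
      },
    };
  },
};

Array.fromAsync(iterable).catch((err) => console.log(err.message)); // "boom"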

Progress Report: Stage 2.7 Proposals #

Immutable ArrayBuffers missed Stage 3 #

The Immutable ArrayBuffer proposal allows creating ArrayBuffers in JS from read-only data, and in some cases allows zero-copy optimizations. After advancing to stage 2.7 last time, there is work underway to write conformance tests. The committee considered advancing the proposal to stage 3 conditionally on the tests being reviewed, but decided to defer that to the next meeting.

  • Champions: Mark S. Miller, Peter Hoddie, Richard Gibson, Jack-Works

Upsert to Stage 2.7 #

The notion of "upserting" a value into an object for a key addresses a common use case: set a value for a property on an object, but, if the object already has that property, update the value in some way. To use CRUD terminology, it's a fusion of inserting and updating. This proposal is proceeding nicely; it recently achieved stage 2, and achieved stage 2.7 at this plenary, since it has landed a number of test262 tests. This proposal is being worked on by Dan Minor with assistance from a number of students at the University of Bergen, illustrating a nice industry-academia collaboration.
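The pattern itself can already be written against today's Map API; the proposal adds a built-in for it (the exact method names are still settling, so this sketch deliberately uses a free-standing helper instead of the proposed API):

function upsert(map, key, insert, update) {
  // Insert a fresh value if the key is missing, otherwise update the
  // existing one, and return whatever ends up stored.
  const value = map.has(key) ? update(map.get(key)) : insert();
  map.set(key, value);
  return value;
}

const counts = new Map();
upsert(counts, "apple", () => 1, (n) => n + 1); // 1
upsert(counts, "apple", () => 1, (n) => n + 1); // 2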

  • Champion: Daniel Minor

Applying non-extensibility to private fields to Stage 2.7 #

JavaScript objects can be made non-extensible using Object.preventExtensions: the value of the properties of a non-extensible object can be changed, but you cannot add new properties to it.

"use strict";

let myObj = { x: 2, y: 3 };
Object.preventExtensions(myObj);
myObj.x = 5; // ok
myObj.z = 4; // error!

However, this only applies to public properties: you can still install new private fields on the object thanks to the "return an object from super()" trick.

class AddPrivateField extends function (x) { return x } {
  #foo = 2;
  static hasFoo(obj) { return #foo in obj; }
}

let myObj = { x: 2, y: 3 };
Object.preventExtensions(myObj);
AddPrivateField.hasFoo(myObj); // false
new AddPrivateField(myObj);
AddPrivateField.hasFoo(myObj); // true

This new proposal, which went all the way to Stage 2.7 in a single meeting, attempts to make new AddPrivateField(myObj) throw when myObj is non-extensible.

The V8 team is currently investigating the web compatibility of this change.

  • Champions: Mark Miller, Shu-yu Guo, Chip Morningstar, Erik Marks

Progress Report: Stage 2 Proposals #

Withdrawing Records and Tuples, replaced by Stage 1 composite keys #

Records and Tuples was a proposal to support composite primitive types, similar to objects and arrays, but deeply immutable and with recursive equality. They also had a syntax similar to objects and arrays, but prefixed by #:

const myRecord = #{ name: "Nic", company: "Igalia" };
typeof myRecord; // "record"
myRecord.name = 2; // error
myRecord === #{ name: "Nic", company: "Igalia" }; // true

The proposal reached stage 2 years ago, but then got stuck due to significant performance concerns from browsers:

  • changing the way === works would risk making every existing === usage a little bit slower
  • JavaScript developers were expecting === on these values to be fast, but in reality it would have required either a full traversal of the two records/tuples or complex interning mechanisms

Ashley Claymore, working at Bloomberg, presented a new simpler proposal that would solve one of the use cases of Records and Tuples: having Maps and Sets whose keys are composed of multiple values. The proposal introduces composites: some objects that Map and Set would handle specially for that purpose.

const myMap = new Map();
myMap.set(["foo", "bar"], 3);
myMap.has(["foo", "bar"]); // false, it's a different array with just the same contents

myMap.set(Composite({ 0: "hello", 1: "world" }), 4);
myMap.has(Composite({ 0: "hello", 1: "world" })); // true!

  • Champion: Ashley Claymore

AsyncContext Stage 2 updates #

AsyncContext is a proposal that allows storing state which is local to an async flow of control (roughly the async equivalent of thread-local storage in other languages), something that is currently impossible in browsers. We had previously opened a Mozilla standards position issue about AsyncContext, and it came back negative. One of the main issues they had is that AsyncContext has a niche use case: this feature would be mostly used by third-party libraries, especially for telemetry and instrumentation, rather than by most developers. And Mozilla reasoned that making those authors' lives slightly easier was not worth the additional complexity to the web platform.

However, we should have put more focus on the facts that AsyncContext would enable libraries to improve the UX for their users, and that AsyncContext is also incredibly useful in many front-end frameworks. Not having access to AsyncContext leads to confusing and hard-to-debug behavior in some frameworks, and forces other frameworks to transpile all user code. We interviewed the maintainers of a number of frameworks to see their use cases, which you can read here.

Mozilla was also worried about the potential for memory leaks, since in a previous version of this proposal, calling .addEventListener would store the current context (that is, a copy of the value for every single AsyncContext.Variable), which would only be released in the corresponding .removeEventListener call -- which almost never happens. As a response we changed our model so that .addEventListener would not store the context. (You can read more about the memory aspects of the proposal here.)

A related concern is developer complexity, because in a previous model some APIs and events used the "registration context" (for events, the context in which .addEventListener is called) while others used the "dispatch context" (for events, the context that directly caused the event). We explained that in our newer model, we always use the dispatch context, and that this model would match the context you'd get if the API was internally implemented in JS using promises -- but that for most APIs other than events, those two contexts are the same. (You can read more about the web integration of AsyncContext here.)

After the presentation, Mozilla still had concerns about how the web integration might end up being a large amount of work to implement, and it might still not be worth it, even when the use cases were clarified. They pointed out that the frameworks do have use cases for the core of the proposal, but that they don't seem to need the web integration.
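For readers who have not followed the proposal, here is a rough sketch of the core API that such libraries and frameworks would build on (the proposal is still at stage 2, so details may change; doSomeWork is a placeholder):

const requestId = new AsyncContext.Variable();

async function handleRequest(id) {
  await requestId.run(id, async () => {
    await doSomeWork();
    // The value set by .run() is still observable after the await,
    // without having to thread it through every function call.
    console.log("finishing request", requestId.get());
  });
}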

  • Champions: Andreu Botella, Chengzhong Wu, Justin Ridgewell

Intl Era Month Code Stage 2 Update #

In a post-Temporal JavaScript, non-Gregorian calendars can be used beyond just internationalization and with a much higher level of detail. Some of this work is relatively uncharted and therefore needs standardization. One of these small but highly significant details is the string IDs for eras and months in various calendars. This stage 2 update brought the committee up to speed on some of the design directions of the effort and explained the rationale behind certain tradeoffs, including favoring human-readable era codes and dropping the requirement that they be globally unique, as well as some of the challenges we have faced with standardizing and programmatically implementing Hijri calendars.

Deferred re-exports to Stage 2 #

Originally created as part of the import defer proposal, deferred re-exports allow, well... deferring re-export declarations.

The goal of the proposal is to reduce the cost of unused export ... from statements, as well as to provide a minimum baseline of tree-shaking behavior that every implementation must support and that can therefore be relied upon.

Consider this example:

// my-library/index.js
export { add, multiply } from "./arithmetic.js";
export { union, intersection } from "./sets.js";

If a consumer of my-library only needs the add function, they have two choices:

  • either import my-library's internal files, to only load my-library/arithmetic.js, or
  • import { add } from "./my-library", at the cost of unnecessarily loading and executing my-library/sets.js (which is not used!).

With deferred re-exports, my-library could mark its own export ... from statements as "free to ignore if unused":

// my-library/index.js
export defer { add, multiply } from "./arithmetic.js";
export defer { union, intersection } from "./sets.js";

Now, when users do import { add } from "./my-library.js", my-library/sets.js will not be loaded and executed: the decision whether it should actually be imported or not has been deferred to my-library's user, who decided to only import what was necessary for the add function.

  • Champion: Nicolò Ribaudo

Progress Report: Stage 1 Proposals #

Disposable AsyncContext.Variable to Stage 1 #

In the AsyncContext proposal, you can't set the value of an AsyncContext.Variable. Instead, you have the .run method, which takes a callback, runs it with the updated state, and restores the previous value before returning. This offers strong encapsulation, making sure that no mutations can be leaked out of the scope. However, this also adds inflexibility in some cases, such as when refactoring a scope inside a function.

The disposable AsyncContext.Variable proposal extends the AsyncContext proposal by adding a way to set a variable without entering a new function scope, which builds on top of the explicit resource management proposal and its using keyword:

const asyncVar = new AsyncContext.Variable();

function* gen() {
  // This code with `.run` would need heavy refactoring,
  // since you can't yield from an inner function scope.
  using _ = asyncVar.withValue(createSpan());
  yield computeResult();
  yield computeResult2();
  // The scope of `_` ends here, so `asyncVar` is restored
  // to its previous value.
}

One issue with this is that if the return value of .withValue is not used with a using declaration, the context will never be reset at the end of the scope; so when the current function returns, its caller will see an unexpected context (the context inside the function would leak to the outside). The strict enforcement of using proposal (currently stage 1) would prevent this from happening accidentally, but deliberately leaking the context would still be possible by calling Symbol.enter but not Symbol.dispose. (Note that context leaks are not memory leaks.)

The champions of this proposal explored how to deal with context leaks, and whether it's worth it, since preventing them would require changing the internal using machinery and would make composition of disposables non-intuitive. These leaks are not "unsafe" since you can only observe them with access to the same AsyncContext.Variable, but they are unexpected and hard to debug, and the champions do not know of any genuine use case for them.

The committee resolved on advancing this proposal to stage 1, indicating that it is worth spending time on, but the exact semantics and behaviors still need to be decided.

  • Champions: Chengzhong Wu, Luca Casonato, snek

Stage 1 update for decimal & measure: Amounts #

We presented the results of recent discussions in the overlap between the measure and decimal proposals having to do with what we call an Amount: a container for a number (a Decimal, a Number, a BigInt, a digit string) together with precision. The goal is to be able to represent a number that knows how precise it is. The presentation focused on how the notion of an Amount can solve the internationalization needs of the decimal proposal while, at the same time, serving as a building block on which the measure proposal can build by slotting in a unit (or currency). The committee was not quite convinced by this suggestion, but neither did they reject the idea. We have an active biweekly champions call dedicated to the topic of JS numerics, where we will iterate on these ideas and, in all likelihood, present them again to committee at the next TC39 plenary in May at Igalia headquarters in A Coruña. Stay tuned!

  • Champions: Jesse Alama, Jirka Maršík, Andrew Paprocki

Compare strings by code point #

String encoding in programming languages has come a long way since the Olden Times, when anything not 7-bit ASCII was implementation-defined. Now we have Unicode. 32 bits per character is a lot though, so there are various ways to encode Unicode strings that use less space. Common ones include UTF-8 and UTF-16.

You can tell that JavaScript encodes strings as UTF-16 by the fact that string indexing s[0] returns the first 2-byte code unit. Iterators, on the other hand, iterate through Unicode characters ("code points"). Explained in terms of pizza:

> '🍕'[0]  // code unit indexing
'\ud83c'
> '🍕'.length // length in 2-byte code units
2
> [...'🍕'][0] // code point indexing (by using iteration)
'🍕'
> [...'🍕'].length // length in code points
1

It's currently possible to compare JavaScript strings by code units (the < and > operators and the array sort() method) but there's no facility to compare strings by code points. It requires writing complicated code yourself. This is unfortunate for interoperability with non-JS software such as databases, where comparisons are almost always by code point. Additionally, the problem is unique to UTF-16 encoding: with UTF-8 it doesn't matter if you compare by unit or point, because the results are the same.

This is a completely new proposal and the committee decided to move it to stage 1. There's no proposed API yet, just a consensus to explore the problem space.

  • Champions: Mathieu Hofman, Mark S. Miller, Christopher Hiller

Don't Remember Panicking Stage 1 Update #

This proposal discusses a taxonomy of possible errors that can occur when a JavaScript host runs out of memory (OOM) or space (OOS). It generated much discussion about how much can be reasonably expected of a JS host, especially when under such pressure. This question is particularly important for JS engines that are, by design, working with rather limited memory and space, such as embedded devices. There was no request for stage advancement, so the proposal stays at stage 1. A wide variety of options and ways in which to specify JS engine behavior under these extreme conditions were presented, so we can expect the proposal champions to iterate on the feedback they received and come back to plenary with a more refined proposal.

  • Champions: Mark S. Miller, Peter Hoddie, Zbyszek Tenerowicz, Christopher Hiller

Enums for Stage 1 #

Enums have been a staple of TypeScript for a long time, providing a type that represents a finite domain of named constant values. The reason to propose enums in JavaScript after all this time is that some modes of compilation, such as the "type stripping" mode used by default in Node.js, can't support enums unless they're also part of JS.

enum Numbers {
  zero = 0,
  one = 1,
  two = 2,
  alsoTwo = two, // self-reference
  twoAgain = Numbers.two, // also self-reference
}

console.log(Numbers.zero); // 0

One notable difference with TS is that all members of the enum must have a provided initializer, since automatic numbering can easily cause accidental breaking changes. Having auto-initializers seems to be highly desirable, though, so some ways to extend the syntax to allow them are being considered.

  • Champion: Ron Buckton

May 20, 2025 12:00 AM

May 19, 2025

Igalia WebKit Team

WebKit Igalia Periodical #24

Update on what happened in WebKit in the week from May 12 to May 19.

This week focused on infrastructure improvements, new releases that include security fixes, and featured external projects that use the GTK and WPE ports.

Cross-Port 🐱

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Fixed a reference cycle in the mediastreamsrc element, which prevented its disposal.

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

Added an internal class that will be used to represent Temporal Duration objects in a way that allows for more precise calculations. This is not a user-visible change, but will enable future PRs to advance Temporal support in JSC towards completion.

WPE WebKit 📟

WPE Platform API 🧩

New, modern platform API that supersedes usage of libwpe and WPE backends.

Added an initial demo application to the GTK4 WPEPlatform implementation.

Screenshot of a Web browser application using a WPEPlatform backend based on GTK4

Releases 📦️

WebKitGTK 2.48.2 and WPE WebKit 2.48.2 have been released. These are paired with a security advisory (WSA-2025-0004: GTK, WPE), and therefore it is advised to update.

On top of security fixes, these releases also include correctness fixes, and support for CSS Overscroll Behaviour is now enabled by default.

Community & Events 🤝

GNOME Web has gained a preferences page that allows toggling WebKit features at run-time. Tech Preview builds of the browser will show the settings page by default, while in regular releases it is hidden and may be enabled with the following command:

gsettings set org.gnome.Epiphany.ui webkit-features-page true

This should allow frontend developers to test upcoming features more easily. Note that the settings for WebKit features are not persistent, and they will be reset to their default state on every launch.

Features page in the GNOME Web preferences dialog

Infrastructure 🏗️

Landed an improvement to error reporting in the script within WebKit that runs test262 JavaScript tests.

The WebKit Test Runner (WKTR) will no longer crash if invalid UTF-8 sequences are written to the standard error stream (e.g. from third-party libraries' debugging options).

Experimentation is ongoing to un-inline String::find(), which saves ~50 KiB of binary size worth of repeated implementations of the SIMD “find character in UTF-16” and “find character in UTF-32” algorithms. Notably, the algorithm for “find character in ASCII string” was not even part of the inlining.

Added the LLVM repository to the WebKit container SDK. Now it is possible to easily install Clang 20.x with wkdev-setup-default-clang --version=20.

Figured out that a performance bug related to jump threading optimization in Clang 18 resulted in a bottleneck adding up to five minutes of build time in the container SDK. This may be fixed by updating to Clang 20.x.

That’s all for this week!

by Igalia WebKit Team at May 19, 2025 09:10 PM

Melissa Wen

A Look at the Latest Linux KMS Color API Developments on AMD and Intel

This week, I reviewed the last available version of the Linux KMS Color API. Specifically, I explored the proposed API by Harry Wentland and Alex Hung (AMD), their implementation for the AMD display driver and tracked the parallel efforts of Uma Shankar and Chaitanya Kumar Borah (Intel) in bringing this plane color management to life. With this API in place, compositors will be able to provide better HDR support and advanced color management for Linux users.

To get a hands-on feel for the API’s potential, I developed a fork of drm_info compatible with the new color properties. This allowed me to visualize the display hardware color management capabilities being exposed. If you’re curious and want to peek behind the curtain, you can find my exploratory work on the drm_info/kms_color branch. The README there will guide you through the simple compilation and installation process.

Note: You will need to update libdrm to match the proposed API. You can find an updated version in my personal repository here. To avoid potential conflicts with your official libdrm installation, you can compile and install it in a local directory. Then, use the following command: export LD_LIBRARY_PATH="/usr/local/lib/"

In this post, I invite you to familiarize yourself with the new API that is about to be released. You can start by doing as I did below: just deploy a custom kernel with the necessary patches and visualize the interface with the help of drm_info. Or, better yet, if you are a userspace developer, you can start developing use cases by experimenting with it.

The more eyes the better.

KMS Color API on AMD

The great news is that AMD’s driver implementation for plane color operations is being developed right alongside their Linux KMS Color API proposal, so it’s easy to apply to your kernel branch and check it out. You can find details of their progress in AMD’s series.

I just needed to compile a custom kernel with this series applied, intentionally leaving out the AMD_PRIVATE_COLOR flag. The AMD_PRIVATE_COLOR flag guards driver-specific color plane properties, which experimentally expose hardware capabilities while we don’t have the generic KMS plane color management interface available.

If you don’t know or don’t remember the details of AMD driver specific color properties, you can learn more about this work in my blog posts [1] [2] [3]. As driver-specific color properties and KMS colorops are redundant, the driver only advertises one of them, as you can see in AMD workaround patch 24.

So, with the custom kernel image ready, I installed it on a system powered by AMD DCN3 hardware (i.e. my Steam Deck). Using my custom drm_info, I could clearly see the Plane Color Pipeline with eight color operations as below:

└───"COLOR_PIPELINE" (atomic): enum {Bypass, Color Pipeline 258} = Bypass
    ├───Bypass
    └───Color Pipeline 258
        ├───Color Operation 258
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D Curve
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"CURVE_1D_TYPE" (atomic): enum {sRGB EOTF, PQ 125 EOTF, BT.2020 Inverse OETF} = sRGB EOTF
        ├───Color Operation 263
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = Multiplier
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"MULTIPLIER" (atomic): range [0, UINT64_MAX] = 0
        ├───Color Operation 268
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 3x4 Matrix
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"DATA" (atomic): blob = 0
        ├───Color Operation 273
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D Curve
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"CURVE_1D_TYPE" (atomic): enum {sRGB Inverse EOTF, PQ 125 Inverse EOTF, BT.2020 OETF} = sRGB Inverse EOTF
        ├───Color Operation 278
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D LUT
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   ├───"SIZE" (atomic, immutable): range [0, UINT32_MAX] = 4096
        │   ├───"LUT1D_INTERPOLATION" (immutable): enum {Linear} = Linear
        │   └───"DATA" (atomic): blob = 0
        ├───Color Operation 285
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 3D LUT
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   ├───"SIZE" (atomic, immutable): range [0, UINT32_MAX] = 17
        │   ├───"LUT3D_INTERPOLATION" (immutable): enum {Tetrahedral} = Tetrahedral
        │   └───"DATA" (atomic): blob = 0
        ├───Color Operation 292
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D Curve
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"CURVE_1D_TYPE" (atomic): enum {sRGB EOTF, PQ 125 EOTF, BT.2020 Inverse OETF} = sRGB EOTF
        └───Color Operation 297
            ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D LUT
            ├───"BYPASS" (atomic): range [0, 1] = 1
            ├───"SIZE" (atomic, immutable): range [0, UINT32_MAX] = 4096
            ├───"LUT1D_INTERPOLATION" (immutable): enum {Linear} = Linear
            └───"DATA" (atomic): blob = 0

Note that Gamescope is currently using AMD driver-specific color properties implemented by me, Autumn Ashton and Harry Wentland. It doesn’t use this KMS Color API, and therefore COLOR_PIPELINE is set to Bypass. Once the API is accepted upstream, all users of the driver-specific API (including Gamescope) should switch to the KMS generic API, as this will be the official plane color management interface of the Linux kernel.

KMS Color API on Intel

On the Intel side, the driver implementation available upstream was built upon an earlier iteration of the API. This meant I had to apply a few tweaks to bring it in line with the latest specifications. You can explore their latest work here. For a more simplified handling, combining the V9 of the Linux Color API, Intel’s contributions, and my necessary adjustments, check out my dedicated branch.

I then compiled a kernel from this integrated branch and deployed it on a system featuring Intel TigerLake GT2 graphics. Running my custom drm_info revealed a Plane Color Pipeline with three color operations as follows:

├───"COLOR_PIPELINE" (atomic): enum {Bypass, Color Pipeline 480} = Bypass
│   ├───Bypass
│   └───Color Pipeline 480
│       ├───Color Operation 480
│       │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, 1D LUT Mult Seg, 3x3 Matrix, Multiplier, 3D LUT} = 1D LUT Mult Seg
│       │   ├───"BYPASS" (atomic): range [0, 1] = 1
│       │   ├───"HW_CAPS" (atomic, immutable): blob = 484
│       │   └───"DATA" (atomic): blob = 0
│       ├───Color Operation 487
│       │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, 1D LUT Mult Seg, 3x3 Matrix, Multiplier, 3D LUT} = 3x3 Matrix
│       │   ├───"BYPASS" (atomic): range [0, 1] = 1
│       │   └───"DATA" (atomic): blob = 0
│       └───Color Operation 492
│           ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, 1D LUT Mult Seg, 3x3 Matrix, Multiplier, 3D LUT} = 1D LUT Mult Seg
│           ├───"BYPASS" (atomic): range [0, 1] = 1
│           ├───"HW_CAPS" (atomic, immutable): blob = 496
│           └───"DATA" (atomic): blob = 0

Observe that Intel’s approach introduces additional properties like “HW_CAPS” at the color operation level, along with two new color operation types: 1D LUT with Multiple Segments and 3x3 Matrix. It’s important to remember that this implementation is based on an earlier stage of the KMS Color API and is awaiting review.

A Shout-Out to Those Who Made This Happen

I’m impressed by the solid implementation and clear direction of the V9 of the KMS Color API. It aligns with the many insightful discussions we’ve had over the past years. A huge thank you to Harry Wentland and Alex Hung for their dedication in bringing this to fruition!

Beyond their efforts, I deeply appreciate Uma and Chaitanya’s commitment to updating Intel’s driver implementation to align with the freshest version of the KMS Color API. The collaborative spirit of the AMD and Intel developers in sharing their color pipeline work upstream is invaluable. We’re now gaining a much clearer picture of the color capabilities embedded in modern display hardware, all thanks to their hard work, comprehensive documentation, and engaging discussions.

Finally, thanks to all the userspace developers, color science experts, and kernel developers from various vendors who actively participate in the upstream discussions, meetings, workshops, each iteration of this API, and the crucial code review process. I’m happy to be part of the final stages of this long kernel journey, but I know that when it comes to color, completing one step only unlocks new challenges.

Looking forward to meeting you at this year’s Linux Display Next hackfest, organized by AMD in Toronto, to further discuss HDR, advanced color management, and other display trends.

May 19, 2025 09:05 PM

Loïc Le Page

Have fun with Cam and Berry

Code repository: have-fun-with-cam-and-berry

The system configuration #

In this tutorial I’m using a Raspberry Pi 5 with a Camera Module 3. Be careful to use the right cable as the default white cable shipped with the camera is for older models of the Raspberry Pi.

Raspberry Pi 5

In order not to have to switch the keyboard, mouse, screen, or any cables between the device and the development machine, the idea is to do the whole development remotely. Obviously, you can also follow the whole tutorial by developing directly on the Raspberry Pi itself as, once configured, local or remote development is totally transparent.

In my own configuration I only have the Raspberry Pi connected to its power cable and to my local Wi-Fi network. I’m using Visual Studio Code with the Remote-SSH extension on the development machine. In reality the device may be located anywhere in the world, as Visual Studio Code uses an SSH tunnel to manage the remote connection in a secure way.

Basically, once Raspberry Pi OS is installed and the device connected to the network, you can install the needed development tools (clang or gcc, git, meson, ninja, etc.) and that’s all. Everything else is done from the development machine where you will install Visual Studio Code and the Remote-SSH extension. The first time the IDE connects to the device through SSH, it will automatically install the required tools. The detailed installation process is described here. Once the IDE is connected to the device you can choose which extensions to install locally on the device (like the C/C++ or Meson extensions).

Some useful tricks:

  • Append your public SSH key content (located by default in ~/.ssh/id_rsa.pub) to the device's ~/.ssh/authorized_keys file. It will allow you to connect to the device through SSH without having to enter a password each time.
  • Configure your SSH client (in the ~/.ssh/config file) to forward the SSH agent. It will allow you to securely use your local SSH keys to access remote git repositories from the remote device. A typical configuration block would be something like:
    Host berry             [the friendly name that will appear in Visual Studio Code]
      HostName berry.local [the device hostname or IP address]
      User cam             [the username used to access the device with ssh]
      ForwardAgent yes

With those simple tricks, just executing ssh berry is enough to connect to the device without any password, and then you can access any git repository locally just as if you were on the development machine itself.

You should also change the build directory name in the Meson extension configuration in Visual Studio Code, replacing the default builddir with just build: if you are not using IntelliSense but another extension like clangd, it will otherwise not find the compile_commands.json file automatically. To update it directly, add this entry to the ~/.config/Code/User/settings.json file:

{
    ...
    "mesonbuild.buildFolder": "build"
}

Basic project initialization #

Let’s create the basic project structure with a simple meson.build file with a dependency on libcamera:

project(
    'cam-and-berry',
    'cpp',
    version: '1.0',
    default_options: ['warning_level=3', 'werror=true', 'cpp_std=c++20'],
)

libcamera_dep = dependency('libcamera', required: true)

executable('cam-and-berry', 'main.cpp', dependencies: libcamera_dep)

And the basic main.cpp file with the libcamera initialization code:

#include <libcamera/libcamera.h>

using namespace libcamera;

int main()
{
    // Initialize the camera manager.
    auto camManager = std::make_unique<CameraManager>();
    camManager->start();

    return 0;
}

You can configure and build the project by calling:

meson setup build
ninja -C build

or by using the tools integrated into Visual Studio Code through the Meson extension.

In order to debug the executable inside the IDE, add a .vscode/launch.json file with this content:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug",
            "type": "cppdbg",
            "request": "launch",
            "program": "${workspaceFolder}/build/cam-and-berry",
            "cwd": "${workspaceFolder}",
            "stopAtEntry": false,
            "externalConsole": false,
            "MIMode": "gdb",
            "preLaunchTask": "Meson: Build all targets"
        }
    ]
}

Now, just pressing F5 will build the project and start the debug session on the device while being driven remotely from the development machine.

If everything has worked well so far, you should see the libcamera logs on stderr, something like:

[5:10:53.005657356] [4366] ERROR IPAModule ipa_module.cpp:171 Symbol ipaModuleInfo not found
[5:10:53.005916466] [4366] ERROR IPAModule ipa_module.cpp:291 v4l2-compat.so: IPA module has no valid info
[5:10:53.005942225] [4366]  INFO Camera camera_manager.cpp:327 libcamera v0.4.0+53-29156679
[5:10:53.013988595] [4371]  INFO RPI pisp.cpp:720 libpisp version v1.1.0 e7974a156008 27-01-2025 (21:50:51)
[5:10:53.035006731] [4371]  INFO RPI pisp.cpp:1179 Registered camera /base/axi/pcie@120000/rp1/i2c@88000/imx708@1a to CFE device /dev/media0 and ISP device /dev/media1 using PiSP variant BCM2712_D0

You can disable those logs by adding this line at the beginning of the main function:

logSetTarget(LoggingTargetNone);

List cameras information #

While running (after calling start()), the libcamera::CameraManager initializes and then keeps up to date a vector of libcamera::Camera instances, updating it each time a physical camera is connected to or removed from the system. In our case we can consider that the Camera Module 3 will always be present as it is connected to the Raspberry Pi’s internal connector.

We can list the available cameras at any moment by calling:

...

int main()
{
    ...

    // List cameras
    for (const auto& camera : camManager->cameras())
    {
        std::cout << "Camera found: " << camera->id() << std::endl;
    }

    return 0;
}

This should give an output like:

Camera found: /base/axi/pcie@120000/rp1/i2c@88000/imx708@1a

Each retrieved camera has a list of specific properties and controls (which can be different for every model of camera). This information can be listed using the camera properties() and controls() getters.

The idMap() getter in the libcamera::ControlList class returns a map associating each property ID with a property description defined in a libcamera::ControlId instance. It allows retrieving the property name and its global characteristics.

Using this information we can now have a complete description of the camera properties, available controls and their possible values:

...

// List cameras properties and controls
for (const auto& camera : camManager->cameras())
{
    std::cout << "Camera found: " << camera->id() << std::endl;

    auto& propertiesList = camera->properties();
    auto idMap = propertiesList.idMap();

    std::cout << "# Properties:" << std::endl;
    for (const auto& [id, value] : propertiesList)
    {
        auto property = idMap->at(id);
        std::cout << "  " << property->name() << "(" << id << ") = " << value.toString() << std::endl;
    }

    std::cout << "# Controls:" << std::endl;
    for (const auto& [control, info] : camera->controls())
    {
        std::cout << "  " << control->name() << " = " << info.toString() << std::endl;
    }
}

...

This should give an output like:

Camera found: /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a
# Properties:
  SystemDevices(10) = [ 20753, 20754, 20755, 20756, 20757, 20758, 20759, 20739, 20740, 20741, 20742 ]
  ScalerCropMaximum(8) = (0, 0)/0x0
  PixelArrayActiveAreas(7) = [ (16, 24)/4608x2592 ]
  PixelArraySize(5) = 4608x2592
  Rotation(2) = 180
  Location(1) = 2
  ColorFilterArrangement(10001) = 0
  UnitCellSize(4) = 1400x1400
  Model(3) = imx708
# Controls:
  AwbEnable = [false..true]
  AwbMode = [0..7]
  ColourTemperature = [100..100000]
  Saturation = [0.000000..32.000000]
  HdrMode = [0..4]
  AeMeteringMode = [0..3]
  Contrast = [0.000000..32.000000]
  AeEnable = [false..true]
  ColourGains = [0.000000..32.000000]
  SyncFrames = [1..1000000]
  ExposureValue = [-8.000000..8.000000]
  AeFlickerMode = [0..1]
  ExposureTime = [1..66666]
  AeExposureMode = [0..3]
  SyncMode = [0..2]
  Brightness = [-1.000000..1.000000]
  Sharpness = [0.000000..16.000000]
  NoiseReductionMode = [0..4]
  AeConstraintMode = [0..3]
  StatsOutputEnable = [false..true]
  ScalerCrop = [(0, 0)/0x0..(65535, 65535)/65535x65535]
  FrameDurationLimits = [33333..120000]
  CnnEnableInputTensor = [false..true]
  AfRange = [0..2]
  AfTrigger = [0..1]
  LensPosition = [0.000000..32.000000]
  AfWindows = [(0, 0)/0x0..(65535, 65535)/65535x65535]
  AnalogueGain = [1.000000..16.000000]
  AfPause = [0..2]
  AfMetering = [0..1]
  AfSpeed = [0..1]
  AfMode = [0..2]
  AeFlickerPeriod = [100..1000000]
  ScalerCrops = [(0, 0)/0x0..(65535, 65535)/65535x65535]

Video live stream #

We are now going to see how we can extract frames from the camera. The camera does not produce frames by itself; the extraction process works on demand: you first need to send a request to the camera to ask for a new frame.

The libcamera library provides a queue to process all those requests. So, basically, you need to create some requests and push them to this queue. When the camera is ready to take an image, it will pop the next request from the queue and fill its associated buffer with the image content. Once the image is ready, the camera sends a signal to the application to inform it that the request has been completed.

If you want to take a simple photo you only need to send one request, but if you want to display or stream some live video you will need to recycle and re-queue the requests once the corresponding frames have been processed. This is what we are going to do in the following code, as it will be easy to adapt it to take only one photo.

flowchart TB
    A(Acquire camera) --> B(Choose configuration)
    B --> C(Allocate buffers and requests)
    C --> D(Start camera)
    D --> E
    subgraph L [Frames extraction loop]
        E(Push request) -->|Frame produced| F(("Request completed
        callback"))
        F --> G(Process frame)
        G --> E
    end
    L --> H(Stop camera)
    H --> I(Free buffers and requests)
    I --> J(Release camera)

In all cases, there are some steps to follow before sending requests to the camera.

Acquire the camera for an exclusive usage #

Let’s consider that we have a camera available and we selected it during the earlier camera listing. Our selected camera is called selectedCamera and it’s a std::shared_ptr<Camera>.

We just have to call: selectedCamera->acquire(); to get exclusive access to this camera. When we have finished with it, we can release it by calling selectedCamera->release();.

Select a specific configuration #

Once the camera is acquired for exclusive access, we need to configure it. In particular, we need to choose the frame resolution and pixel format. This is done by creating a camera configuration that will be tweaked, validated and applied to the camera.

// Lock the selected camera and choose a configuration for video display.
selectedCamera->acquire();

auto camConfig = selectedCamera->generateConfiguration({StreamRole::Viewfinder});
if (camConfig->empty())
{
    std::cerr << "No suitable configuration found for the selected camera" << std::endl;
    return -2;
}

The libcamera::StreamRole allows pre-configuring the returned stream configurations depending on the intended usage: taking photos (in raw mode or not), capturing video for streaming or recording (which may provide encoded streams if the camera supports it), or capturing video for local display.

It returns the default camera configurations for each stream role required.

The default configuration returned may be tweaked with user values. Once modified, the configuration must be validated. The camera may refuse those changes or adjust them to fit the device limits. Once validated, the configuration is applied to the selected camera.

auto& streamConfig = camConfig->at(0);
std::cout << "Default camera configuration is: " << streamConfig.toString() << std::endl;

streamConfig.size.width = 1920;
streamConfig.size.height = 1080;
streamConfig.pixelFormat = formats::RGB888;

if (camConfig->validate() == CameraConfiguration::Invalid)
{
    std::cerr << "Invalid camera configuration" << std::endl;
    return -3;
}
std::cout << "Targeted camera configuration is: " << streamConfig.toString() << std::endl;

if (selectedCamera->configure(camConfig.get()) != 0)
{
    std::cerr << "Failed to update the camera configuration" << std::endl;
    return -4;
}
std::cout << "Camera configured successfully" << std::endl;

Allocate the buffers and requests for frames extraction #

The memory for the frame buffers and requests is held by the user. Indeed, the frame content itself is allocated through DMA buffers for which the libcamera::FrameBuffer instance holds the file descriptors.

The frame buffers are allocated through a libcamera::FrameBufferAllocator instance. When this instance is deleted, all buffers in the internal pool are also deleted, including the associated DMA buffers. So, the lifetime of the FrameBufferAllocator instance must be longer than the lifetime of all the requests associated with buffers from its internal pool.

The same FrameBufferAllocator instance is used to allocate buffer pools for the different streams from the same camera. In our case we are only using a single stream, so we will do the allocation only for this stream.

// Allocate the buffers pool used to fetch frames from the camera.
Stream* stream = streamConfig.stream();
auto frameAllocator = std::make_unique<FrameBufferAllocator>(selectedCamera);
if (frameAllocator->allocate(stream) < 0)
{
    std::cerr << "Failed to allocate buffers for the selected camera stream" << std::endl;
    return -5;
}

auto& buffersPool = frameAllocator->buffers(stream);
std::cout << "Camera stream has a pool of " << buffersPool.size() << " buffers" << std::endl;

Once the frame buffers are allocated, we can create the corresponding requests and associate each buffer with a request. When the camera receives a request, it fills the associated frame buffer with the next image content.

// Create the requests used to fetch the actual camera frames.
std::vector<std::unique_ptr<Request>> requests;
for (auto& buffer : buffersPool)
{
    auto request = selectedCamera->createRequest();
    if (!request)
    {
        std::cerr << "Failed to create a frame request for the selected camera" << std::endl;
        return -6;
    }

    if (request->addBuffer(stream, buffer.get()) != 0)
    {
        std::cerr << "Failed to add a buffer to the frame request" << std::endl;
        return -7;
    }

    requests.push_back(std::move(request));
}

If the camera supports multistream, additional buffers can be added to a single request (using libcamera::Request::addBuffer) to capture frames for the other streams. However, only one buffer per stream is allowed in the same request.

Frames extraction loop #

Now that we have a pool of requests, each one with its associated frame buffer, we can send them to the camera for processing. Each time the camera has finished with a request, by filling the associated buffer with the actual image, it calls a requestCompleted callback and then continues with the next request in the queue.

When we receive the requestCompleted signal, we can extract the image content from the request buffer and process it. Once the image processing is finished, we recycle the buffer and push the request into the queue again for the next frames. To take a single photo we would only need one buffer and one request, and we would queue this request only once.

// Connect the requests execution callback, it is called each time a frame
// has been produced by the camera.
selectedCamera->requestCompleted.connect(selectedCamera.get(), [&selectedCamera](Request* request) {
    if (request->status() == Request::RequestCancelled)
    {
        return;
    }

    // We can directly take the first request buffer as we are managing
    // only one stream. In case of multiple streams, we should iterate
    // over the BufferMap entries or access the buffer by stream pointer.
    auto buffer = request->buffers().begin()->second;
    auto& metadata = buffer->metadata();
    if (metadata.status == FrameMetadata::FrameSuccess)
    {
        // As we are using a RGB888 color format we have only one plane, but
        // in case of using a multiplanes color format (like YUV420) we
        // should iterate over all the planes.
        std::cout << "Frame #" << std::setw(2) << std::setfill('0') << metadata.sequence
                    << ": time=" << metadata.timestamp << "ns, size=" << metadata.planes().begin()->bytesused
                    << ", fd=" << buffer->planes().front().fd.get() << std::endl;
    }
    else
    {
        std::cerr << "Invalid frame received" << std::endl;
    }

    // Reuse the request buffer and re-queue the request.
    request->reuse(Request::ReuseBuffers);
    selectedCamera->queueRequest(request);
});

Before queueing the first request we need to start the camera, and we must stop it when we’ve finished with the frame extraction. The lifetime of all the requests pushed to the camera must be longer than this start/stop loop. Once the camera is stopped, we can delete the corresponding requests as they will not be used anymore.

This implies that the FrameBufferAllocator instance must also outlive this same start/stop loop. If you try to delete the requests vector or the frameAllocator instance before stopping the camera, you will naturally trigger a segmentation fault.

// Start the camera streaming loop and run it for a few seconds.
selectedCamera->start();
for (const auto& request : requests)
{
    selectedCamera->queueRequest(request.get());
}

std::this_thread::sleep_for(1500ms);
selectedCamera->stop();

At the end we clean up the resources. Here it is not strictly needed, as the destructors will do the job automatically. But if you were building a more complex architecture and needed to explicitly free up the resources, this would be the order to follow.

With the current code the only important point is to explicitly stop the camera before getting out of the main function (which implicitly triggers the destructor calls), otherwise the frameAllocator instance would be destroyed while the camera is still processing the associated requests, which would lead to a segmentation fault.

// Cleanup the resources. In fact those resources are automatically released
// when the corresponding destructors are called. The only compulsory call
// to make is selectedCamera->stop() as the camera streaming loop MUST be
// stopped before releasing the associated buffers pool.
frameAllocator.reset();
selectedCamera->release();
selectedCamera.reset();
camManager->stop();

If everything has worked well so far, you should see the following output:

Camera found: /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a
# Properties:
  SystemDevices(10) = [ 20753, 20754, 20755, 20756, 20757, 20758, 20759, 20739, 20740, 20741, 20742 ]
  ScalerCropMaximum(8) = (0, 0)/0x0
  PixelArrayActiveAreas(7) = [ (16, 24)/4608x2592 ]
  PixelArraySize(5) = 4608x2592
  Rotation(2) = 180
  Location(1) = 2
  ColorFilterArrangement(10001) = 0
  UnitCellSize(4) = 1400x1400
  Model(3) = imx708
# Controls:
  AwbEnable = [false..true]
  AwbMode = [0..7]
  ColourTemperature = [100..100000]
  Saturation = [0.000000..32.000000]
  HdrMode = [0..4]
  AeMeteringMode = [0..3]
  Contrast = [0.000000..32.000000]
  AeEnable = [false..true]
  ColourGains = [0.000000..32.000000]
  SyncFrames = [1..1000000]
  ExposureValue = [-8.000000..8.000000]
  AeFlickerMode = [0..1]
  ExposureTime = [1..66666]
  AeExposureMode = [0..3]
  SyncMode = [0..2]
  Brightness = [-1.000000..1.000000]
  Sharpness = [0.000000..16.000000]
  NoiseReductionMode = [0..4]
  AeConstraintMode = [0..3]
  StatsOutputEnable = [false..true]
  ScalerCrop = [(0, 0)/0x0..(65535, 65535)/65535x65535]
  FrameDurationLimits = [33333..120000]
  CnnEnableInputTensor = [false..true]
  AfRange = [0..2]
  AfTrigger = [0..1]
  LensPosition = [0.000000..32.000000]
  AfWindows = [(0, 0)/0x0..(65535, 65535)/65535x65535]
  AnalogueGain = [1.000000..16.000000]
  AfPause = [0..2]
  AfMetering = [0..1]
  AfSpeed = [0..1]
  AfMode = [0..2]
  AeFlickerPeriod = [100..1000000]
  ScalerCrops = [(0, 0)/0x0..(65535, 65535)/65535x65535]
Default camera configuration is: 800x600-XRGB8888
Targeted camera configuration is: 1920x1080-RGB888
Camera configured successfully
Camera stream has a pool of 4 buffers
Frame #07: time=9764218484000ns, size=6220800, fd=31
Frame #08: time=9764269486000ns, size=6220800, fd=32
Frame #09: time=9764329905000ns, size=6220800, fd=33
Frame #10: time=9764389544000ns, size=6220800, fd=34
Frame #11: time=9764449731000ns, size=6220800, fd=31
Frame #12: time=9764509971000ns, size=6220800, fd=32
Frame #13: time=9764570430000ns, size=6220800, fd=33
Frame #14: time=9764630542000ns, size=6220800, fd=34
...

You can download the full code of this part here or directly access the code repository.

Display the extracted images #

In this part, we are going to display the extracted frames using a small OpenGL ES application. This application will show a rotating cube with a metallic aspect displaying, on each face, the live video stream from the Raspberry Pi 5 camera with an orange/red shade, like in the following video:

For this, we need a little bit more code to initialize the window, the OpenGL context and manage the drawing. The full code is available at the code repository or you can download it here.

We are using the GLFW library to manage the EGL and OpenGL ES contexts and the GLM library to manage the 3D vectors and matrices. Those libraries are included as Meson wraps in the subprojects folder. So, just like with the previous code, to build the project you only need to execute:

meson setup build
ninja -C build

All the 3D rendering part is out of the scope of this tutorial, and the corresponding classes have been grouped in the src/rendering subfolder to keep the focus on the Camera and CameraTexture classes. If you are also interested in 3D rendering you can find a lot of interesting material on the Web and, in particular, Anton’s OpenGL 4 Tutorials or Learn OpenGL.

Camera <-> Renderer synchronization #

The Camera class is basically a wrapper of the code explained in the previous parts. In this case we are configuring the camera to use a pixel format aligned on 32 bits (XRGB8888) to be compatible with the hardware accelerated rendering.

// We need to choose a pixel format with a stride aligned on 32 bits to be
// compatible with the GLES renderer. We only need 2 buffers, while one
// buffer is used by the GLES renderer, the other one is filled by the
// camera next frame and then both buffers are swapped.
streamConfig.size.width = captureWidth;
streamConfig.size.height = captureHeight;
streamConfig.pixelFormat = libcamera::formats::XRGB8888;
streamConfig.bufferCount = 2;

We are also using 2 buffers as one buffer will be rendered on screen while the other buffer will receive the next camera frame, and then we’ll switch both buffers. We already know that when the requestCompleted signal is triggered, the corresponding buffer has finished being written with the next camera frame. This is our synchronization point to send this buffer to the rendering.

On the rendering side, we know that when the OpenGL buffers are swapped, the displayed image has been fully rendered. This is our synchronization point to recycle the buffer back to the camera capture loop.

A specific wrapping class, Camera::Frame, is used to exchange those buffers between the camera and the renderer. It is passed through a std::unique_ptr to ensure exclusive access from either the camera or the renderer. When the instance is destroyed, it automatically recycles the underlying buffer to make it available for the next camera frame.

When Camera::startCapturing is called, the camera starts producing frames continuously (like in the code from the previous parts). Each new frame replaces the previous one which is automatically recycled during its destruction:

void Camera::onRequestCompleted(libcamera::Request* request)
{
    if (request->status() == libcamera::Request::RequestCancelled)
    {
        return;
    }

    // We can directly take the first request buffer as we are managing
    // only one stream. In case of multiple streams, we should iterate
    // over the BufferMap entries or access the buffer by stream pointer.
    auto buffer = request->buffers().begin()->second;
    if (buffer->metadata().status == libcamera::FrameMetadata::FrameSuccess)
    {
        // As we are using a XRGB8888 color format we have only one plane, but
        // in case of using a multiplanes color format (like YUV420) we
        // should iterate over all the planes.
        std::unique_ptr<Frame> frame(new Frame(this, request, buffer->cookie()));

        std::lock_guard<std::mutex> lock(m_nextFrameMutex);
        m_nextFrame = std::move(frame);
    }
    else
    {
        // Reuse the request buffer and re-queue the request.
        request->reuse(libcamera::Request::ReuseBuffers);
        m_selectedCamera->queueRequest(request);
    }
}

Camera::Frame::~Frame()
{
    auto camera = m_camera.lock();
    if (camera && m_request)
    {
        m_request->reuse(libcamera::Request::ReuseBuffers);
        camera->m_selectedCamera->queueRequest(m_request);
    }
}

At any moment the renderer can fetch this frame to render it:

void onRender(double time) noexcept override
{
    if (m_camera)
    {
        // We fetch the next camera-produced frame that is ready to be drawn.
        // If no new frame is available, we just keep drawing the same frame.
        auto cameraFrame = m_camera->getNextFrame();
        if (cameraFrame)
        {
            // We need to keep a reference to the currently drawn frame so that
            // the Camera class does not recycle the underlying dma-buf while
            // the GLES renderer is still using it for drawing. It is the
            // Camera::Frame destructor which ensures proper synchronization:
            // when we reach this point, the previous m_currentCameraFrame has
            // been fully drawn (the GLES buffers swap occurred on the previous
            // onRender call). Replacing the unique_ptr destroys the previous
            // Camera::Frame, which triggers the recycling of its FrameBuffer
            // (for the next camera frame capture), while the new frame stays
            // locked for drawing until it is itself replaced.
            m_currentCameraFrame = std::move(cameraFrame);

            // We can directly fetch and bind the corresponding GLES
            // texture from the FrameBuffer cookie.
            auto textureIndex = m_currentCameraFrame->getCookie();
            m_textures[textureIndex]->bind();

            // The texture mix value only exists so the same shader can be used
            // with and without a camera frame. Now that we have a frame to
            // draw, we can show it.
            m_shader->setCameraTextureMix(1.0f);
        }
    }

    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    glm::mat4 modelMatrix =
        glm::rotate(glm::mat4(1.0f), 1.5f * static_cast<float>(time), glm::vec3(0.8f, 0.5f, 0.4f));
    m_shader->setModelMatrix(modelMatrix);

    m_cube->draw();
}

As we have only 2 buffers and access to each buffer is exclusive, the camera and renderer speeds adjust to each other. The underlying frame buffer is only recycled once its Camera::Frame is destroyed, which only happens when it is replaced by the next available frame.

N.B. The Camera::onRequestCompleted callback is called from a libcamera capturing thread, while AppRenderer::onRender is called on the application main thread. The call to libcamera::Camera::queueRequest is thread-safe, but access to the std::unique_ptr must be protected by a mutex so the frame can be safely handed over to the rendering thread.

std::unique_ptr<Camera::Frame> Camera::getNextFrame() noexcept
{
    std::lock_guard<std::mutex> lock(m_nextFrameMutex);
    return std::move(m_nextFrame);
}

Convert a dma-buf to a texture #

A dma-buf can be attached to an EGLImage thanks to the EXT_image_dma_buf_import EGL extension:

// Create an EGLImage from the camera FrameBuffer.
// In our case we are using a packed color format (XRGB8888), so we
// only need the first buffer plane. In case of using a multiplanar color
// format (like YUV420 for example), we would need to iterate over all the
// color planes in the buffer and fill the EGL_DMA_BUF_PLANE[i]_FD_EXT,
// EGL_DMA_BUF_PLANE[i]_OFFSET_EXT and EGL_DMA_BUF_PLANE[i]_PITCH_EXT for
// each plane.
const auto& plane = buffer.planes().front();

const EGLAttrib attrs[] = {EGL_WIDTH,
                           streamConfiguration.size.width,
                           EGL_HEIGHT,
                           streamConfiguration.size.height,
                           EGL_LINUX_DRM_FOURCC_EXT,
                           streamConfiguration.pixelFormat.fourcc(),
                           EGL_DMA_BUF_PLANE0_FD_EXT,
                           plane.fd.get(),
                           EGL_DMA_BUF_PLANE0_OFFSET_EXT,
                           (plane.offset != libcamera::FrameBuffer::Plane::kInvalidOffset) ? plane.offset : 0,
                           EGL_DMA_BUF_PLANE0_PITCH_EXT,
                           streamConfiguration.stride,
                           EGL_NONE};

EGLImage eglImage = eglCreateImage(eglDisplay, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT, nullptr, attrs);
if (!eglImage)
{
    return nullptr;
}

N.B. It is important to use a pixel format compatible with the rendering device, otherwise the eglCreateImage function will fail and eglGetError() will return EGL_BAD_MATCH.
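
If in doubt about which formats the rendering device can import, and assuming the EGL_EXT_image_dma_buf_import_modifiers extension is available, the accepted DRM fourcc codes can be enumerated up front. The helper below is only an illustrative sketch and is not part of the tutorial code:

#include <vector>

#include <EGL/egl.h>
#include <EGL/eglext.h>

// Enumerate the DRM fourcc formats that the EGL implementation can import
// from a dma-buf. Each returned EGLint can be compared against the value of
// streamConfig.pixelFormat.fourcc() chosen for the camera stream.
std::vector<EGLint> queryDmaBufFormats(EGLDisplay display)
{
    auto queryFormats = reinterpret_cast<PFNEGLQUERYDMABUFFORMATSEXTPROC>(
        eglGetProcAddress("eglQueryDmaBufFormatsEXT"));
    if (!queryFormats)
        return {};

    // First call retrieves the number of supported formats.
    EGLint count = 0;
    if (!queryFormats(display, 0, nullptr, &count) || count <= 0)
        return {};

    // Second call fills the actual list.
    std::vector<EGLint> formats(count);
    if (!queryFormats(display, count, formats.data(), &count))
        return {};

    formats.resize(count);
    return formats;
}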

Then, the EGLImage can be attached to an external OpenGL ES texture using the OES_EGL_image_external OpenGL extension:

// Create the GLES texture and attach the EGLImage to it.
glGenTextures(1, &texture->m_texture);
glBindTexture(GL_TEXTURE_EXTERNAL_OES, texture->m_texture);
glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, eglImage);
glBindTexture(GL_TEXTURE_EXTERNAL_OES, 0);

// Now that the EGLImage is attached to the texture, we can destroy it. The
// underlying dma-buf will be released when the texture is deleted.
eglDestroyImage(eglDisplay, eglImage);

The resulting texture can be used like any other texture by binding it to the GL_TEXTURE_EXTERNAL_OES target. The shader, however, needs to enable the same extension and use a specific sampler type for this external texture target:

#version 300 es
#extension GL_OES_EGL_image_external : require

precision mediump float;
....
uniform samplerExternalOES cameraTexture;
uniform float cameraTextureMix;
....
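
On the C++ side, the samplerExternalOES uniform is wired to a texture unit like any other sampler, except that the texture is bound to the GL_TEXTURE_EXTERNAL_OES target. The snippet below is a hedged sketch of what the Shader and CameraTexture helpers do internally; m_program and cameraTextureHandle are placeholder names:

// Illustrative sketch (placeholder names): bind the external sampler uniform
// to texture unit 0 and bind the camera texture to the external target.
glUseProgram(m_program);
glUniform1i(glGetUniformLocation(m_program, "cameraTexture"), 0);

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_EXTERNAL_OES, cameraTextureHandle);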

Although the dma-buf is wrapped in two layers (EGLImage and texture), its content is never copied nor transferred to system CPU memory (RAM). The very same memory, allocated in dedicated hardware memory, is used both to receive the camera frame content and to display it on screen, allowing the kernel to optimize the corresponding resources.

The libcamera library allocates the dma-bufs needed to store the captured frame content when libcamera::FrameBufferAllocator::allocate is called. So, we can create the corresponding external textures right after creating the Camera instance:

m_camera = Camera::create(m_width, m_height);
if (m_camera)
{
    // Create one texture per available camera buffer.
    for (const auto& request : m_camera->getRequests())
    {
        // We know that we are only using one stream and one buffer per
        // request. If we were using multiple streams at once, we
        // should iterate on the request BufferMap.
        auto [stream, buffer] = *request->buffers().begin();

        auto texture = CameraTexture::create(eglDisplay, stream->configuration(), *buffer);
        if (!texture)
        {
            std::cerr << "Failed to create a camera texture" << std::endl;

            m_textures.clear();
            m_camera.reset();
            m_shader.reset();
            m_cube.reset();

            return false;
        }

        // We are using the associated buffer cookie to store the
        // corresponding texture index in the internal vector. This way
        // it will be easy to fetch the right texture when a frame
        // buffer is ready to be drawn.
        m_textures.push_back(std::move(texture));
        buffer->setCookie(m_textures.size() - 1);
    }

    m_camera->startCapturing();
}

May 19, 2025 12:00 AM

May 16, 2025

Thibault Saunier

gst-dots-viewer: A New Tool for GStreamer Pipeline Visualization

We’re happy to have released gst-dots-viewer, a new development tool that makes it easier to visualize and debug GStreamer pipelines. This tool, included in GStreamer 1.26, provides a web-based interface for viewing pipeline graphs in real time as your application runs, and lets you easily request that all pipelines be dumped at any time.

What is gst-dots-viewer?

gst-dots-viewer is a server application that monitors a directory for .dot files generated by GStreamer’s pipeline visualization system and displays them in your web browser. It automatically updates the visualization whenever new .dot files are created, making it simpler to debug complex applications and understand the evolution of the pipelines at runtime.

Key Features

  • Real-time Updates: Watch your pipelines evolve as your application runs
  • Interactive Visualization:
    • Click nodes to highlight pipeline elements
    • Use Shift-Ctrl-scroll or w/s keys to zoom
    • Drag-scroll support for easy navigation
  • Easily deployable in cloud-based environments

How to Use It

  1. Start the viewer server:
    gst-dots-viewer
    
  2. Open your browser at http://localhost:3000
  3. Enable the dots tracer in your GStreamer application:
    GST_TRACERS=dots your-gstreamer-application
    

The web page will automatically update whenever new pipelines are dumped, and you will also be able to trigger a dump of all pipelines from the web page.

New Dots Tracer

As part of this release, we’ve also introduced a new dots tracer that replaces the previous manual approach to specify where to dump pipelines. The tracer can be activated simply by setting the GST_TRACERS=dots environment variable.

Interactive Pipeline Dumps

The dots tracer integrates with the pipeline-snapshot tracer to provide real-time pipeline visualization control. Through a WebSocket connection, the web interface allows you to trigger pipeline dumps. This means you can dump pipelines exactly when you need them during debugging or development, from your browser.

Future Improvements

We plan on adding more features; here is a list of possibilities:

  • Additional interactive features in the web interface
  • Enhanced visualization options
  • Integration with more GStreamer tracers to provide comprehensive debugging information. For example, we could integrate the newly released memory-tracer and queue-level tracers so as to plot graphs of memory usage at any time.

This could transform gst-dots-viewer into a more complete debugging and monitoring dashboard for GStreamer applications.

Demo

by thiblahute at May 16, 2025 09:35 AM

May 15, 2025

Andy Wingo

guile on whippet waypoint: goodbye, bdw-gc?

Hey all, just a lab notebook entry today. I’ve been working on the Whippet GC library for about three years now, learning a lot on the way. The goal has always been to replace Guile’s use of the Boehm-Demers-Weiser collector with something more modern and maintainable. Last year I finally got to the point that I felt Whippet was feature-complete, and taking into account the old adage about long arses and brief videos, I think that wasn’t too far off. I carved out some time this spring and for the last month have been integrating Whippet into Guile in anger, on the wip-whippet branch.

the haps

Well, today I removed the last direct usage of the BDW collector’s API by Guile! Instead, Guile uses Whippet’s API any time it needs to allocate an object, add or remove a thread from the active set, identify the set of roots for a collection, and so on. Most tracing is still conservative, but this will move to be more precise over time. I haven’t had the temerity to actually try one of the Nofl-based collectors yet, but that will come soon.

Code-wise, the initial import of Whippet added some 18K lines to Guile’s repository, as counted by git diff --stat, which includes documentation and other files. There was an unspeakable amount of autotomfoolery to get Whippet in Guile’s ancient build system. Changes to Whippet during the course of integration added another 500 lines or so. Integration of Whippet removed around 3K lines of C from Guile. It’s not a pure experiment, as my branch is also a major version bump and so has the freedom to refactor and simplify some things.

Things are better but not perfect. Notably, I switched to building weak hash tables in terms of buckets and chains where the links are ephemerons, which gives me concurrent lock-free reads and writes but not resizable tables. I would like to somehow resize these tables in response to GC, but haven’t wired it up yet.

Anyway, next waypoint will be trying out the version of Whippet’s Nofl-based mostly-marking collector that traces all heap edges conservatively. If that works... well if that works... I don’t dare to hope! We will see what we get when that happens. Until then, happy hacking!

by Andy Wingo at May 15, 2025 02:39 PM

Georges Stavracas

In celebration of accessibility

Accessibility in the free and open source world is somewhat of a sensitive topic.

Given the principles of free software, one would think it would be the best possible place to advocate for accessibility. After all, there’s a collection of ideologically motivated individuals trying to craft desktops to themselves and other fellow humans. And yet, when you look at the current state of accessibility on the Linux desktop, you couldn’t possibly call it good, not even sufficient.

It’s a tough situation that’s forcing people who need assistive technologies out of these spaces.

I think accessibility on the Linux desktop is in a particularly difficult position due to a combination of poor incentives and historical factors:

  • The dysfunctional state of accessibility on Linux makes it so that the people who need it the most cannot even contribute to it.
  • There is very little financial incentive for companies to invest in accessibility technologies. Often, and historically, companies invest just enough to tick some boxes on government checklists, then forget about it.
  • Volunteers, especially those who contribute for fun and self-enjoyment, often don’t go out of their way to make the particular projects they’re working on accessible, or to check whether their contributions regress the accessibility of the app.
  • The nature of accessibility makes it such that the “functional progression” is not linear. If only 50% of the stack is working, that’s practically a 0%. Accessibility requires almost every part of the stack to be functional for even the most basic use cases.
  • There’s almost nobody contributing to this area anymore. Expertise and domain knowledge are almost entirely lost.

In addition to that, I feel like work on accessibility is invisible, in the sense that most people are simply apathetic to the work and contributions done in this area. Maybe due to the dynamics of social media that often favor negative engagement? I don’t know. But it sure feels unrewarding. Compare:

(Screenshots of two Reddit threads, shown here for comparison.)

Now, I think if I stopped writing here, you dear reader might feel that the situation is mostly gloomy, maybe even get angry at it. However, against all odds, and fighting a fight that seems impossible, there are people working on accessibility. Often without any kind of reward, doing this out of principle. It’s just so easy to overlook their effort!

So as we prepare for the Global Accessibility Awareness Day, I thought it would be an excellent opportunity to highlight these fantastic contributors and their excellent work, and also to talk about some ongoing work on GNOME.

If you consider this kind of work important and relevant, and/or if you need accessibility features yourself, I urge you: please donate to the people mentioned here. Grab these people a coffee. Better yet, grab them a monthly coffee! Contributors who accept donations have a button beneath their avatars. Go help them.

Calendar

GNOME Calendar, the default calendaring app for GNOME, has been slowly but surely progressing towards being minimally accessible. This is mostly thanks to the amazing work of Hari Rana and Jeff Fortin Tam!

Hari recently wrote about it on Mastodon. In fixing one issue, Hari accidentally fixed at least two other issues. Jeff, as an exemplary product manager and co-maintainer, was the one who noticed and also blogged about these collateral fixes.

If you consider this kind of work important, please consider getting them a coffee!

Jeff Fortin Tam

@jfft

Elevado

Back when I was working on fixing accessibility on WebKitGTK, I found the lack of modern tools to inspect the AT-SPI bus a bit off-putting, so I wrote a little app to help me through. Didn’t think much of it, really.

But the project started getting some attention when Bilal Elmoussaoui contributed to it while testing some accessibility work in GNOME Shell. After that, Matthias Clasen – of GTK fame – and Claire – a new contributor! – started sending some nice patches around.

In preparation for the Global Accessibility Awareness Day we have made the first public release of Elevado! The project is evolving mostly without me these days, and it’s all thanks to these people.

Claire

@qwery

Bilal Elmoussaoui

@bilelmoussaoui

GTK

Of course, almost nothing I’ve mentioned so far would be possible if the toolkit itself didn’t have support for accessibility. Thanks to Emmanuele Bassi, GTK4 received an entirely new accessibility backend.

Over time, more people picked up on it, and continued improving it and filling in the gaps. Matthias Clasen and Emmanuele continue to review contributions and keep things moving.

One particular contributor is Lukáš Tyrychtr, who has implemented the Text interface of AT-SPI in GTK. Lukáš contributes to various other parts of the accessibility stack as well!

Emmanuele Bassi

@ebassi

Lukáš Tyrychtr

@tyrylu

Matthias Clasen

@matthiasc

Design

On the design side, one person in particular stands out for a series of contributions on the Accessibility panel of GNOME Settings: Sam Hewitt. Sam introduced the first mockups of this panel in GitLab, then kept on updating it. More recently, Sam introduced mockups for text-to-speech (okay technically these are in the System panel, but that’s in the accessibility mockups folder!).

Please join me in thanking Sam for these contributions!

Sam Hewitt

@snwh

Infrastructure

Having apps and toolkits exposing the proper amount of accessibility information is a necessary first step, but it would be useless if there was nothing to expose to.

Thanks to Mike Gorse and others, the AT-SPI project keeps on living. AT-SPI is the service that receives and manages the accessibility information from apps. It’s the heart of accessibility in the Linux desktop! As far as my knowledge about it goes, AT-SPI is really old, dating back to Sun days.

Samuel Thibault continues to maintain speech-dispatcher and Accerciser. Speech dispatcher is the de facto text-to-speech service for Linux as of now. Accerciser is a venerable tool to inspect AT-SPI trees.

Eitan Isaacson is shaking up the speech synthesis world with libspiel, a speech framework for the desktop. Orca has experimental support for it. Eitan is now working on a desktop portal so that sandboxed apps can benefit from speech synthesis seamlessly!

One of the most common screen readers for Linux is Orca. The Orca maintainers have been keeping it up and running for a very long time. Here I’d like to point out that we at Igalia significantly fund Orca development.

I would like to invite the community to share a thank you for all of them!

Eitan Isaacson

@eeejay

Mike Gorse

@mgorse

Samuel Thibault

@sthibaul

… and more!

I tried to reach out to everyone nominally mentioned in this blog post. Some people preferred not to be mentioned. And I’m pretty sure there are others involved in related projects whom I never got to learn about.

I guess what I’m trying to say is, this list is not exhaustive. There are more people involved. If you know some of them, please let me encourage you to pay them a tea, a lunch, a boat trip in Venice, whatever you feel like; or even just reach out to them and thank them for their work.

If you contribute or know someone who contributes to desktop accessibility, and wishes to be here, please let me know. Also, please let me know if this webpage itself is properly accessible!

A Look Into The Future

Shortly after I started to write this blog post, I thought to myself: “well, this is nice and all, but it isn’t exactly robust.” Hm. If only there were a more structured, reliable way to keep investing in this.

Coincidentally, at the same time, we were introduced to our new executive director Steven. With such a blast of an introduction, and seeing Steven hanging around in various rooms, I couldn’t resist asking about it. To my great surprise and joy, Steven swiftly responded to my inquiries and we started discussing some ideas!

Conversations are still ongoing, and I don’t want to create any sort of hype in case things end up not working, but… maaaaaaybe keep in mind that there might be an announcement soon!

Huge thanks to the people above, and to everyone who helped me write this blog post ♥


¹ – Jeff doesn’t accept donations for himself, but welcomes marketing-related business

by Georges Stavracas at May 15, 2025 12:49 PM

May 12, 2025

Igalia WebKit Team

WebKit Igalia Periodical #23

Update on what happened in WebKit in the week from May 5 to May 12.

This week saw one more feature enabled by default, additional support for tracking memory allocations, and continued work on multimedia and WebAssembly.

Cross-Port 🐱

The Media Capabilities API is now enabled by default. It was previously available as a run-time option in the WPE/WebKitGTK API (WebKitSettings:enable-media-capabilities), so this is just a default tweak.

Landed a change that integrates malloc heap breakdown functionality with non-Apple ports. It works similarly to Apple’s implementation, but for non-Apple ports the per-heap memory allocation statistics are, for now, periodically printed to stdout. In the future this functionality will be integrated with Sysprof.

Multimedia 🎥

GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.

Support for WebRTC RTP header extensions was improved: an RTP header extension for handling video orientation metadata was introduced, and several simulcast tests are now passing.

Progress is ongoing on resumable player suspension, which will eventually allow us to handle websites with lots of simultaneous media elements better in the GStreamer ports, but this is a complex task.

JavaScriptCore 🐟

The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.

The in-place Wasm interpreter (IPInt) port to 32-bits has seen some more work.

Fixed a bug in OMG caused by divergence with the 64-bit version. Further syncing is underway.

Releases 📦️

Michael Catanzaro has published a writeup on his blog about how the WebKitGTK API versions have changed over time.

Infrastructure 🏗️

Landed some improvements in the WebKit container SDK for Linux, particularly in error handling.

That’s all for this week!

by Igalia WebKit Team at May 12, 2025 07:29 PM

Alex Bradbury

suite-helper

In my work on RISC-V LLVM, I end up working with the llvm-test-suite a lot, especially as I put more effort into performance analysis, testing, and regression hunting. suite-helper is a Python script that helps with some of the repetitive tasks when setting up, building, and analysing LLVM test suite builds. (Worth noting for those who aren't LLVM regulars: llvm-test-suite is a separate repository from LLVM and includes execution tests and benchmarks, as opposed to the targeted unit tests included in the LLVM monorepo.)

Get it from GitHub.

Motivation

As always, it scratches an itch for me. The design target is to provide a starting point that is hopefully good enough for many use cases, but it's easy to modify (e.g. by editing the generated scripts or emitted command lines) if doing something that isn't directly supported.

The main motivation for putting this script together came from my habit of writing fairly detailed "lab notes" for most of my work. This typically includes a listing of commands run, but I've found such listings rather verbose and annoying to work with. This presented a good opportunity for factoring out common tasks into a script, resulting in suite-helper.

Functionality overview

suite-helper has the following subtools:

  • create
    • Check out llvm-test-suite to the given directory. Use the --reference argument to reference git objects from an existing local checkout.
  • add-config
    • Add a build configuration using either the "cross" or "native" template. See suite-helper add-config --help for a listing of available options. For a build configuration 'foo', a _rebuild-foo.sh file will be created that can be used to build it within the build.foo subdirectory.
  • status
    • Gives a listing of suite-helper managed build configurations that were detected, attempting to indicate if they are up to date or not (e.g. spotting if the hash of the compiler has changed).
  • run
    • Run the given build configuration using llvm-lit, with any additional options passed on to lit.
  • match-tool
    • A helper that is used by suite-helper reduce-ll but may be useful in your own reduction scripts. When looking at the generated assembly or the disassembly of an object file/binary for an area of interest, your natural inclination may well be to try to carefully craft logic to match something that has equivalent/similar properties. Credit to Philip Reames for underlining to me just how unreasonably effective it is to completely ignore that inclination and just write something that naively matches a precise or near-precise assembly sequence. The resulting IR might include some extraneous stuff, but it's a lot easier to cut down after this initial minimisation stage, and a lot of the time it's good enough. The match-tool helper takes a multiline sequence of glob patterns as its argument, and will attempt to find a match for them (a sequential set of lines) on stdin. It also normalises whitespace.
  • get-ll
    • Query ninja and process its output to try to produce and execute a compiler command that will emit a .ll file for the given input file (e.g. a .c file). This is a common first step for llvm-reduce, or for starting to inspect the compilation of a file with debug options enabled.
  • reduce-ll
    • For me, it's fairly common to want to produce a minimised .ll file that produces a certain assembly pattern, based on compiling a given source input. This subtool automates that process, using get-ll to retrieve the ll, then llvm-reduce and match-tool to match the assembly.

Usage example

suite-helper isn't intended to avoid the need to understand how to build the LLVM test suite using CMake and run it using lit; rather, it aims to streamline the flow. As such, a good starting point might be to work through some llvm-test-suite builds yourself and then look here to see whether anything makes your use case easier.

All of the notes above may seem rather abstract, so here is an example of using the helper while investigating some poorly canonicalised instructions and testing my work-in-progress patch to address them.

suite-helper create llvmts-redundancies --reference ~/llvm-test-suite

for CONFIG in baseline trial; do
  suite-helper add-config cross $CONFIG \
    --cc=~/llvm-project/build/$CONFIG/bin/clang \
    --target=riscv64-linux-gnu \
    --sysroot=~/rvsysroot \
    --cflags="-march=rva22u64 -save-temps=obj" \
    --spec2017-dir=~/cpu2017 \
    --extra-cmake-args="-DTEST_SUITE_COLLECT_CODE_SIZE=OFF -DTEST_SUITE_COLLECT_COMPILE_TIME=OFF"
  ./_rebuild-$CONFIG.sh
done

# Test suite builds are now available in build.baseline and build.trial, and
# can be compared with e.g. ./utils/tdiff.py.

# A separate script had found a suspect instruction sequence in sqlite3.c, so
# let's get a minimal reproducer.
suite-helper reduce build.baseline ./MultiSource/Applications/sqlite3/sqlite3.c \
  'add.uw  a0, zero, a2
   subw    a4, a4, zero' \
  --reduce-bin=~/llvm-project/build/baseline/bin/llvm-reduce \
  --llc-bin=~/llvm-project/build/baseline/bin/llc \
  --llc-args=-O3

The above produces the following reduced.ll:

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

define fastcc ptr @sqlite3BtreeDataFetch(ptr %pCur, ptr %pAmt, ptr %0, i16 %1, i32 %conv20.i, i1 %tobool.not.i) #0 {
entry:
  br i1 %tobool.not.i, label %if.else9.i, label %fetchPayload.exit

if.else9.i:                                       ; preds = %entry
  br label %fetchPayload.exit

fetchPayload.exit:                                ; preds = %if.else9.i, %entry
  %nKey.0.i = phi i32 [ %conv20.i, %if.else9.i ], [ 0, %entry ]
  %idx.ext16.i = zext i32 %nKey.0.i to i64
  %add.ptr17.i = getelementptr i8, ptr %0, i64 %idx.ext16.i
  %sub.i = sub i32 %conv20.i, %nKey.0.i
  store i32 %sub.i, ptr null, align 4
  ret ptr %add.ptr17.i
}

attributes #0 = { "target-features"="+b" }

Article changelog
  • 2025-05-12: Initial publication date.

May 12, 2025 12:00 PM