Update on what happened in WebKit in the week from September 8 to September 15.
The JavaScriptCore implementation of Temporal continues to be polished,
as does SVGAElement, and WPE and WebKitGTK accessibility tests can now
run (but they are not passing yet).
Cross-Port 🐱
Added support for the hreflang attribute on SVGAElement, which helps align it with HTMLAnchorElement.
An improvement in the harness code for A11y tests unblocked many tests that were marked as Timeout/Skip in the WPEWebKit and WebKitGTK ports. These tests are not passing yet, but they are at least running now.
JavaScriptCore 🐟
The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.
In the JavaScriptCore (JSC) implementation of Temporal, refactored the implementations of the difference operations (since and until) for the TemporalPlainTime type in order to match the spec. This enables further work on Temporal, which is being done incrementally.
Update on what happened in WebKit in the week from September 1 to September 8.
In this week's installment of the periodical, we have better spec compliance of
JavaScriptCore's implementation of Temporal, an improvement in how gamepad events
are handled, WPE WebKit now implements a helper class which allows test baselines
to be aligned with other ports, and finally, an update on recent work on Sysprof.
Cross-Port 🐱
Until now, unrecognized gamepads didn't emit button press or axis move events if they didn't map to the standard mapping layout defined by the W3C (https://www.w3.org/TR/gamepad/#remapping). Now we ensure that unrecognized gamepads always map to the standard layout, so events are always emitted when a button is pressed or an axis is moved.
JavaScriptCore 🐟
The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.
In the JavaScriptCore (JSC) implementation of Temporal, the compare() method on Temporal durations was modified to follow the spec, which increases the precision with which comparisons are made. This is another step towards a full spec-compliant implementation of Temporal in JSC.
WPE WebKit 📟
Added a WPE-specific implementation of the ImageAdapter helper class. This class allows loading image resources that until now were only shipped in WebKitGTK and other ports. This change has aligned many WPE-specific test baselines with those of the other WebKit ports, so the WPE-specific ones have now been removed.
Community & Events 🤝
Sysprof has received a variety of new features, improvements, and bugfixes, as part of the integration with WebKit. We continued pushing this front in the past 6 months! A few highlights:
An important bug with counters was fixed, and further integration was added to WebKit
It is now possible to hide marks from the waterfall view
Further work on the remote inspector integration, wkrictl, was done
Last year the WebKit project started to integrate its tracing routines with Sysprof. Since then, the feedback I’ve received about it is that it was a pretty big improvement in the development of the engine! Yay.
People started using Sysprof to gain insights into the internal states of WebKit, gather data on how long different operations took, and more. Eventually we started hitting some limitations in Sysprof, mostly in the UI itself, such as a lack of correlational and visualization features.
Earlier this year a rather interesting enhancement in Sysprof was added: it is now possible to filter the callgraph based on marks. What it means in practice is, it’s now possible to get statistically relevant data about what’s being executed during specific operations of the app.
In parallel to WebKit, recently Mesa merged a patch that integrates Mesa’s tracing routines with Sysprof. This brought data from yet another layer of the stack, and it truly enriches the profiling we can do on apps. We now have marks from the DRM vblank event, the compositor, GTK rendering, WebKit, Mesa, back to GTK, back to the compositor, and finally the composited frame submitted to the kernel. A truly full stack view of everything.
So, what’s the catch here? Well, if you’re an attentive reader, you may have noticed that the marks counter went from this last year:
To this, in March 2025:
And now, we’re at this number:
I do not jest when I say that this is a significant number! I mean, just look at this screenshot of a full view of marks:
Naturally, this is pushing Sysprof to its limits! The app is starting to struggle to handle such massive amounts of data. Having so much data also starts introducing noise in the marks – sometimes, for example, you don’t care about the Mesa marks, or the WebKit marks, or the GLib marks.
Hiding Marks
The most straightforward and impactful improvement that could be done, in light of what was explained above, was adding a way to hide certain marks and groups.
Sysprof heavily uses GListModels, as is trendy in GTK4 apps, so marks, catalogs, and groups are all considered lists containing lists containing items. So it felt natural to wrap these items in a new object with a visible property, and filter by this property, pretty straightforward.
Except it was not
Turns out, the filtering infrastructure in GTK4 did not support monitoring items for property changes. After talking to GTK developers, I learned that this was just a missing feature that nobody got to implementing. Sounded like a great opportunity to enhance the toolkit!
It took some wrestling, but it worked, the reviews were fantastic and now GtkFilterListModel has a new watch-items property. It only works when the filter supports monitoring, so unfortunately GtkCustomFilter doesn’t work here. The implementation is not exactly perfect, so further enhancements are always appreciated.
So behold! Sysprof can now filter marks out of the waterfall view:
Counters
Another area where we have lots of potential is counters. Sysprof supports tracking variables over time. This is super useful when you want to monitor, for example, CPU usage, I/O, network, and more.
Naturally, WebKit has quite a number of internal counters that would be lovely to have in Sysprof to do proper integrated analysis. So between last year and this year, that’s what I’ve worked on as well! Have a look:
Unfortunately it took a long time to land some of these contributions, because Sysprof seemed to be behaving erratically with counters. After months of fighting with it, I eventually figured out what was going on with the counters, and wrote the patch with probably my biggest commit message this year (beaten only by a few others, including a literal poem).
Wkrictl
WebKit also has a remote inspector, which has stats on JavaScript objects and whatnot. It needs to be enabled at build time, but it’s super useful when testing on embedded devices.
I’ve started working on a way to extract this data from the remote inspector and feed it into Sysprof as marks and counters. It’s called wkrictl. Have a look:
This is far from finished, but I hope to be able to integrate this when it’s more concrete and well developed.
Future Improvements
Over the course of a year, the WebKit project went from nothing to deep integration with Sysprof, and more recently this evolved into actual tooling built around this integration. This is awesome, and has helped my colleagues and other contributors to contribute to the project in ways that simply weren’t possible before.
There’s still *a lot* of work to do though, and it’s often the kind of work that will benefit everyone using Sysprof, not only WebKit. Here are a few examples:
Integrate JITDump symbol resolution, which allows profiling the JavaScript running on webpages. There’s ongoing work on this, but needs to be finished.
Per-PID marks and counters. Turns out, WebKit uses a multi-process architecture, so it would be better to redesign the marks and counters views to organize things by PID first, then groups, then catalogs.
A new timeline view. This is strictly speaking a condensed waterfall view, but it makes it more obvious the relationship between “inner” and “outer” marks.
Performance tuning in Sysprof and GTK. We’re dealing with orders of magnitude more data than we used to, and the app is starting to struggle to keep up with it.
Some of these tasks involve new user interfaces, so it would be absolutely lovely if Sysprof could get some design love from the design team. If anyone from the design team is reading this, we’d love to have your help.
Finally, after all this Sysprof work, Christian kindly invited me to help co-maintain the project, which I accepted. I don’t know how much time and energy I’ll be able to dedicate, but I’ll try and help however I can!
I’d like to thank Christian Hergert, Benjamin Otte, and Matthias Clasen for all the code reviews, for all the discussions and patience during the influx of patches.
This article is a continuation of the series on damage propagation. While the previous article laid some foundation on the subject, this one
discusses the cost (increased CPU and memory utilization) that the feature incurs, as this is highly dependent on design decisions and the implementation of the data structure used for storing damage information.
From the perspective of this article, the two key things worth remembering from the previous one are:
The damage propagation is an optional WPE/GTK WebKit feature that — when enabled — reduces the browser’s GPU utilization at the expense of increased CPU and memory utilization.
On the implementation level, the damage is almost always a collection of rectangles that cover the changed region.
As mentioned in the section about damage of the previous article,
the damage information describes a region that changed and requires repainting. It was also pointed out that such a description is usually done via a collection of rectangles. Although sometimes
it’s better to describe a region in a different way, the rectangles are a natural choice due to the very nature of the damage in the web engines that originates from the box model.
A more detailed description of the damage nature can be inferred from the Pipeline details section of the
previous article. The bottom line is, in the end, the visual changes to the render tree yield the damage information in the form of rectangles.
For the sake of clarity, such original rectangles may be referred to as raw damage.
In practice, the above means that it doesn’t matter whether, e.g. the circle is drawn on a 2D canvas or the background color of some block element changes — ultimately, the rectangles (raw damage) are always produced
in the process.
As the raw damage is a collection of rectangles describing a damaged region, the geometrical consequence is that there may be more than one set of rectangles describing the same region.
It means that raw damage could be stored by a different set of rectangles and still precisely describe the original damaged region — e.g. when raw damage contains more rectangles than necessary.
The example of different approximations of a simple raw damage is depicted in the image below:
Changing the set of rectangles that describes the damaged region may be very tempting — especially when the size of the set could be reduced. However, the following consequences must be taken into account:
The damaged region could shrink if some damage information were lost, e.g. if too many rectangles were removed.
The damaged region could expand if some damage information were added, e.g. if too many or too big rectangles were added.
The first consequence may lead to visual glitches when repainting. The second one, however, causes no visual issues but degrades performance since a larger area
(i.e. more pixels) must be repainted — typically increasing GPU usage. This means the damage information can be approximated as long as the trade-off between the extra repainted area and the degree of simplification
in the underlying set of rectangles is acceptable.
The approximation mentioned above means the situation where the approximated damaged region covers the original damaged region entirely i.e. not a single pixel of information is lost. In that sense, the
approximation can only add extra information. Naturally, the lower the extra area added to the original damaged region, the better.
The approximation quality can be referred to as damage resolution, which is:
low — when the extra area added to the original damaged region is significant,
high — when the extra area added to the original damaged region is small.
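One way to make the notion of damage resolution concrete is to compare the area of the approximation with the area of the original damaged region. The sketch below is only an illustration and is not taken from WebKit: rectangles are assumed to be `(x0, y0, x1, y1)` tuples, the approximation is assumed to cover the raw damage entirely, and the helper names `union_area` and `extra_area_ratio` are hypothetical.

```python
def union_area(rects):
    """Area covered by (x0, y0, x1, y1) rectangles, counting overlaps once."""
    xs = sorted({x for x0, _, x1, _ in rects for x in (x0, x1)})
    ys = sorted({y for _, y0, _, y1 in rects for y in (y0, y1)})
    area = 0
    # Coordinate compression: each elementary cell is either fully covered or not.
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            if any(x0 <= xa and xb <= x1 and y0 <= ya and yb <= y1
                   for x0, y0, x1, y1 in rects):
                area += (xb - xa) * (yb - ya)
    return area


def extra_area_ratio(raw, approximation):
    """Extra area added by the approximation, relative to the raw damage area."""
    raw_area = union_area(raw)
    return (union_area(approximation) - raw_area) / raw_area


raw = [(0, 0, 10, 10), (20, 0, 30, 10)]
low_resolution = [(0, 0, 30, 10)]  # a single bounding box over both rectangles
high_resolution = list(raw)        # the raw damage itself

print(extra_area_ratio(raw, low_resolution))   # 0.5
print(extra_area_ratio(raw, high_resolution))  # 0.0
```

With such a measure, a ratio close to zero corresponds to high damage resolution, while larger ratios correspond to lower resolution.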
The examples of low (left) and high (right) damage resolutions are presented in the image below:
Given the description of the damage properties presented in the sections above, it’s evident there’s a certain degree of flexibility when it comes to processing damage information. Such a situation is very fortunate in the
context of storing the damage, as it gives some freedom in designing a proper data structure. However, before jumping into the actual solutions, it’s necessary to understand the problem end-to-end.
In the damage propagation pipeline, two kinds of damage are distinguished:
layer damage — the damage tracked separately for each layer,
frame damage — the damage that aggregates individual layer damages and consists of the final damage of a given frame.
Assuming there are L layers and there is some data structure called Damage that can store the damage information, it’s easy to notice that there may be L+1 instances
of Damage present at the same time in the pipeline as the browser engine requires:
L Damage objects for storing layer damage,
1 Damage object for storing frame damage.
As there may be a lot of layers in more complex web pages, the L+1 mentioned above may be a very big number.
The first consequence of the above is that the Damage data structure in general should store the damage information in a very compact way to avoid excessive memory usage when L+1 Damage objects
are present at the same time.
The second consequence of the above is that the Damage data structure in general should be very performant, as each of the L+1 Damage objects may be involved in a considerable amount of processing when there are
lots of updates across the web page (and hence huge numbers of damage rectangles).
To better understand the above consequences, it’s essential to examine the input and the output of such a hypothetical Damage data structure more thoroughly.
The Damage becomes an input of other Damage in some situations, happening in the middle of the damage propagation pipeline when the broader damage is being assembled from smaller chunks of damage. What it consists
of depends purely on the Damage implementation.
The raw damage, on the other hand, becomes an input of the Damage always at the very beginning of the damage propagation pipeline. In practice, it consists of a set of rectangles that are potentially overlapping, duplicated, or empty. Moreover,
such a set is always as big as the set of changes causing visual impact. Therefore, in the worst case scenario such as drawing on a 2D canvas, the number of rectangles may be enormous.
Given the above, it’s clear that the hypothetical Damage data structure should support 2 distinct input operations in the most performant way possible:
add(Damage),
add(Rectangle).
When it comes to the Damage data structure output, there are 2 possibilities, either:
other Damage,
the platform API.
The Damage becomes the output of other Damage on each Damage-to-Damage append that was described in the subsection above.
The platform API, on the other hand, becomes the output of Damage at the very end of the pipeline e.g. when the platform API consumes the frame damage (as described in the
pipeline details section of the previous article).
In this situation, what’s expected on the output technically depends on the particular platform API. However, in practice, all platforms supporting damage propagation require a set of rectangles that describe the damaged region.
Such a set of rectangles is fed into the platforms via APIs by simply iterating the rectangles describing the damaged region and transforming them to whatever data structure the particular API expects.
The natural consequence of the above is that the hypothetical Damage data structure should support the following output operation, also in the most performant way possible:
forEachRectangle(...).
Given all the above perspectives, the problem of designing the Damage data structure can be summarized as storing the input damage information to be accessed (iterated) later in a way that:
the performance of operations for adding and iterating rectangles is maximal (performance),
the memory footprint of the data structure is minimal (memory footprint),
the stored region covers the original region and has the area as close to it as possible (damage resolution).
With the problem formulated this way, it’s obvious that this is a multi-criteria optimization problem with 3 criteria: performance, memory footprint, and damage resolution.
Given the problem of storing damage defined as above, it’s possible to propose various ways of solving it by implementing a Damage data structure. Before diving into details, however, it’s important to emphasize
that the weights of criteria may be different depending on the situation. Therefore, before deciding how to design the Damage data structure, one should consider the following questions:
What is the proportion between the power of GPU and CPU in the devices I’m targeting?
What are the memory constraints of the devices I’m targeting?
What are the cache sizes on the devices I’m targeting?
What is the balance between GPU and CPU usage in the applications I’m going to optimize for?
Are they more rendering-oriented (e.g. using WebGL, Canvas 2D, animations etc.)?
Are they more computing-oriented (frequent layouts, a lot of JavaScript processing etc.)?
Although answering the above usually points in the direction of a specific implementation, the answers are often unknown and hence the implementation should be as generic as possible. In practice,
it means the implementation should not optimize with a strong focus on just one criterion. However, as there’s no silver bullet solution, it’s worth exploring multiple, quasi-generic solutions that have been researched as
part of Igalia’s work on the damage propagation, and which are the following:
Damage storing all input rects,
Bounding box Damage,
Damage using WebKit’s Region,
R-Tree Damage,
Grid-based Damage.
All of the above implementations are evaluated against the 3 criteria in the following way:
Performance
by specifying the time complexity of the add(Rectangle) operation, as add(Damage) can be transformed into a series of add(Rectangle) operations,
by specifying the time complexity of the forEachRectangle(...) operation.
Memory footprint
by specifying the space complexity of Damage data structure.
Damage resolution
by describing the resulting damage resolution qualitatively (e.g. low, high, or perfect).
The most natural — yet very naive — Damage implementation is one that wraps a simple collection (such as vector) of rectangles and hence stores the raw damage in the original form.
In that case, the evaluation is as simple as evaluating the underlying data structure.
Assuming a vector data structure and O(1) amortized time complexity of insertion, the evaluation of such a Damage implementation is:
Performance
insertion is O(1) ✅
iteration is O(N) ❌
Memory footprint
O(N) ❌
Damage resolution
perfect ✅
Despite being trivial to implement, this approach is heavily skewed towards the damage resolution criterion. Essentially, the damage quality is the best possible, but at the expense of very poor
performance and a substantial memory footprint. This is because the number of input rects N can be very large, which makes the linear complexities unacceptable.
The other problem with this solution is that it performs no filtering and hence may store a lot of redundant rectangles. While the empty rectangles can be filtered out in O(1),
filtering out duplicates and some of the overlaps (one rectangle completely containing the other) would make insertion O(N). Naturally, such filtering
would lead to a smaller memory footprint and faster iteration in practice. However, the complexities of both would not change.
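The naive approach described above can be sketched as follows. This is a minimal illustration rather than WebKit’s actual code: the class name `RectListDamage`, the `(x0, y0, x1, y1)` tuple representation, and the `add_filtered` variant are all assumptions made for the example.

```python
class RectListDamage:
    """Hypothetical Damage that keeps every non-empty input rectangle."""

    def __init__(self):
        self._rects = []

    def add(self, rect):
        x0, y0, x1, y1 = rect
        if x1 <= x0 or y1 <= y0:
            return  # empty rectangles are dropped in O(1)
        self._rects.append(rect)  # amortized O(1)

    def add_filtered(self, rect):
        """Variant that also skips duplicated or contained rectangles: O(N)."""
        x0, y0, x1, y1 = rect
        if x1 <= x0 or y1 <= y0:
            return
        for ex0, ey0, ex1, ey1 in self._rects:
            if ex0 <= x0 and ey0 <= y0 and x1 <= ex1 and y1 <= ey1:
                return  # already covered by a stored rectangle
        self._rects.append(rect)

    def add_damage(self, other):
        # Damage-to-Damage append expressed as a series of rectangle appends.
        other.for_each_rectangle(self.add)

    def for_each_rectangle(self, callback):
        for rect in self._rects:  # O(N)
            callback(rect)


damage = RectListDamage()
for rect in [(0, 0, 10, 10), (5, 5, 5, 9), (0, 0, 10, 10)]:
    damage.add(rect)

stored = []
damage.for_each_rectangle(stored.append)
print(stored)  # [(0, 0, 10, 10), (0, 0, 10, 10)]
```

Note how the duplicate survives plain `add`, which is exactly the redundancy discussed above; `add_filtered` would drop it, at the price of a linear scan per insertion.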
The second simplest Damage implementation one can possibly imagine is the implementation that stores just a single rectangle, which is a minimum bounding rectangle (bounding box) of all the damage
rectangles that were added into the data structure. The minimum bounding rectangle — as the name suggests — is a minimal rectangle that can fit all the input rectangles inside. This is well demonstrated in the picture below:
As this implementation stores just a single rectangle, and as the operation of taking the bounding box of two rectangles is O(1), the evaluation is as follows:
Performance
insertion is O(1) ✅
iteration is O(1) ✅
Memory footprint
O(1) ✅
Damage resolution
usually low ⚠️
Contrary to the Damage storing all input rects, this solution yields perfect performance and memory footprint at the expense of low damage resolution. However,
in practice, the damage resolution of this solution is not always low. More specifically:
in the optimistic cases (raw damage clustered), the area of the bounding box is close to the area of the raw damage inside,
in the average cases, the approximation of the damaged region suffers from covering significant areas that were not damaged,
in the worst cases (small damage rectangles on the other ends of a viewport diagonal), the approximation is very poor, and it may be as bad as covering the whole viewport.
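A bounding box Damage can be sketched in a few lines. As before, this is only an illustration under assumptions: the class name `BoundingBoxDamage` and the `(x0, y0, x1, y1)` tuple representation are hypothetical and do not reflect any particular engine’s code.

```python
class BoundingBoxDamage:
    """Hypothetical Damage that stores only the minimum bounding rectangle."""

    def __init__(self):
        self._box = None

    def add(self, rect):
        x0, y0, x1, y1 = rect
        if x1 <= x0 or y1 <= y0:
            return  # ignore empty rectangles
        if self._box is None:
            self._box = rect
            return
        bx0, by0, bx1, by1 = self._box
        # Uniting two rectangles is O(1).
        self._box = (min(bx0, x0), min(by0, y0), max(bx1, x1), max(by1, y1))

    def for_each_rectangle(self, callback):
        if self._box is not None:
            callback(self._box)  # at most one rectangle: O(1)


# Worst case: two tiny rectangles at opposite corners of a 1920x1080 viewport.
damage = BoundingBoxDamage()
damage.add((0, 0, 10, 10))
damage.add((1910, 1070, 1920, 1080))

boxes = []
damage.for_each_rectangle(boxes.append)
print(boxes)  # [(0, 0, 1920, 1080)]
```

The example reproduces the worst case from the list above: 200 damaged pixels end up being reported as damage covering the whole viewport.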
As this solution requires a minimal overhead while still providing a relatively useful damage approximation, in practice, it is a baseline solution used in:
Chromium,
Firefox,
WPE and GTK WebKit when UnifyDamagedRegions runtime preference is enabled, which means it’s used in GTK WebKit by default.
When it comes to more sophisticated Damage implementations, the simplest approach in the case of WebKit is to wrap a data structure already implemented in WebCore called
Region. Its purpose
is just as the name suggests: to store a region. More specifically, it’s meant to store rectangles describing a region in a way that is efficient both for storage and for access, so as to take advantage
of scanline coherence during rasterization. The key characteristic of the data structure is that it stores rectangles without overlaps. This is achieved by storing y-sorted lists of x-sorted, non-overlapping
rectangles. Another important property is that due to the specific internal representation, the number of integers stored per rectangle is usually smaller than 4. The data structure has some other interesting properties as well,
but they are not relevant in the context of storing the damage. More details on the data structure itself can be found in J. E. Steinhart’s paper from 1991 titled
SCANLINE COHERENT SHAPE ALGEBRA
published as part of the Graphics Gems II book.
The Damage implementation that wraps the Region was actually used by the GTK and WPE ports as a first version of a more sophisticated alternative to the bounding box Damage. Just as expected,
it provided better damage resolution in some cases. However, it suffered from effectively degrading to a more expensive variant of the bounding box Damage in the majority of situations.
The above was inevitable as the implementation was falling back to bounding box Damage when the Region’s internal representation was getting too complex. In essence, it was addressing the Region’s biggest problem,
which is that it can effectively store N² rectangles in the worst case due to the way it splits rectangles for storing purposes. More specifically, as the Region stores ledges
and spans, each insertion of a new rectangle may lead to splitting O(N) existing rectangles. Such a situation is depicted in the image below, where 3 rectangles are being split
into 9:
Putting the above fallback mechanism aside, the evaluation of Damage being a simple wrapper on top of Region is the following:
Performance
insertion is O(log N) ✅
iteration is O(N²) ❌
Memory footprint
O(N²) ❌
Damage resolution
perfect ✅
With the fallback added, the evaluation is technically the same as for the bounding box Damage for N above the fallback point, yet with extra overhead. At the same time, for smaller N, the above evaluation
didn’t really matter much, as in such cases the performance, the memory footprint, and the damage resolution were all very good.
Although this solution (with a fallback) yielded very good results for some simple scenarios (when N was small enough), it was not sustainable in the long run, as it was not addressing the majority of use cases,
where it was actually a bit slower than bounding box Damage while the results were similar.
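The quadratic growth of a non-overlapping, banded representation can be reproduced with a simplified model. The sketch below is not WebCore’s Region: it only splits the input into horizontal bands at every distinct y-coordinate and merges overlapping x-spans within each band. The function names and the `(x0, y0, x1, y1)` tuple representation are assumptions made for the example.

```python
def banded_rectangles(rects):
    """Decompose (x0, y0, x1, y1) rectangles into non-overlapping banded ones."""
    ys = sorted({y for _, y0, _, y1 in rects for y in (y0, y1)})
    result = []
    for ya, yb in zip(ys, ys[1:]):
        # x-spans of all rectangles that cover this horizontal band
        spans = sorted((x0, x1) for x0, y0, x1, y1 in rects
                       if y0 <= ya and yb <= y1)
        merged = []
        for x0, x1 in spans:
            if merged and x0 <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], x1))
            else:
                merged.append((x0, x1))
        result.extend((x0, ya, x1, yb) for x0, x1 in merged)
    return result


def nested_bars(n):
    """n disjoint vertical bars, each shorter than the previous at both ends."""
    return [(2 * i, i, 2 * i + 1, 2 * n - i) for i in range(n)]


print(len(banded_rectangles(nested_bars(3))))   # 9
print(len(banded_rectangles(nested_bars(10))))  # 100
```

In this model, every input rectangle introduces new band boundaries that cut through the rectangles already stored, which is why N inputs can turn into N² stored rectangles.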
In the pursuit of more sophisticated Damage implementations, one can think of wrapping/adapting data structures similar to quadtrees, KD-trees etc. However, in most of such cases, a lot of unnecessary overhead is added
as the data structures partition the space so that, in the end, the input is stored without overlaps. As overlaps are not necessarily a problem for storing damage information, the list of candidate data structures
can be narrowed down to the most performant data structures allowing overlaps. One of the most interesting of such options is the R-Tree.
In short, R-Tree (rectangle tree) is a tree data structure that allows storing multiple entries (rectangles) in a single node. While the leaf nodes of such a tree store the original
rectangles inserted into the data structure, each of the intermediate nodes stores the bounding box (minimum bounding rectangle, MBR) of the children nodes. As the tree is balanced, the above means that with every next
tree level from the top, the list of rectangles (either bounding boxes or original ones) gets bigger and more detailed. An example of an R-Tree is depicted in Figure 5 of
the Object Trajectory Analysis in Video Indexing and Retrieval Applications paper:
The above perfectly shows the differences between the rectangles on various levels and can also visually suggest some ideas when it comes to adapting such a data structure into Damage:
The first possibility is to make Damage a simple wrapper of R-Tree that would just build the tree and allow the Damage consumer to pick the desired damage resolution on iteration attempt. Such an approach is possible
as having the full R-Tree allows the iteration code to limit iteration to a certain level of the tree or to various levels from separate branches. The latter allows Damage to offer a particularly interesting API where the
forEachRectangle(...) function could accept a parameter specifying how many rectangles (at most) are expected to be iterated.
The other possibility is to make Damage an adaptation of R-Tree that conditionally prunes the tree while constructing it not to let it grow too much, yet to maintain a certain height and hence certain damage quality.
Regardless of the approach, the R-Tree construction also allows one to implement a simple filtering mechanism that eliminates input rectangles being duplicated or contained by existing rectangles on the fly. However,
such a filtering is not very effective as it can only consider a limited set of rectangles i.e. the ones encountered during traversal required by insertion.
Damage as a simple R-Tree wrapper
Although this option may be considered very interesting, in practice, storing all the input rectangles in the R-Tree means storing N rectangles along with the overhead of a tree structure. In the worst case scenario
(node size of 2), the number of nodes in the tree may be as big as O(N), thus adding a lot of overhead required to maintain the tree structure. This fact alone makes this solution have an
unacceptable memory footprint. The other problem with this idea is that in practice,
the damage resolution selection is usually done once, during browser startup. Therefore, the ability to select damage resolution during runtime brings no benefits while introducing unnecessary overhead.
The evaluation of the above is the following:
Performance
insertion is O(log_M N) where M is the node size ✅
iteration is O(K) where K is a parameter and 0≤K≤N ✅
Memory footprint
O(N) ❌
Damage resolution
low to high ✅
Damage as an R-Tree adaptation with pruning
Considering the problems the previous idea has, the option with pruning seems to be addressing all the problems:
the memory footprint can be controlled by specifying at which level of the tree the pruning should happen,
the damage resolution (level of the tree where pruning happens) can be picked on the implementation level (compile time), thus allowing some extra implementation tricks if necessary.
While it’s true that the above problems do not exist in this approach, the option with pruning unfortunately brings new problems that need to be considered. As a matter of fact, all the new problems it brings
originate from the fact that each pruning operation leads to the loss of information and hence to the deterioration of the tree over time.
Before actually introducing those new problems, it’s worth understanding more about how insertions work in the R-Tree.
When a rectangle is inserted into the R-Tree, the first step is to find a proper position for the new record (see ChooseLeaf algorithm from Guttman1984). When the target node is
found, there are two possibilities:
adding the new rectangle to the target node does not cause overflow,
adding the new rectangle to the target node causes overflow.
If no overflow happens, the new rectangle is just added to the target node. However, if overflow happens i.e. the number of rectangles in the node exceeds the limit, the node splitting algorithm is invoked (see SplitNode
algorithm from Guttman1984) and the changes are propagated up the tree (see AdjustTree algorithm from Guttman1984).
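The core of the position search can be illustrated with the criterion ChooseLeaf applies at each level: pick the entry whose bounding box needs the least enlargement to include the new rectangle, resolving ties by the smallest area. The sketch below shows that single step only, not a full R-Tree; the function names and the `(x0, y0, x1, y1)` tuple representation are assumptions made for the example.

```python
def area(rect):
    x0, y0, x1, y1 = rect
    return (x1 - x0) * (y1 - y0)


def unite(a, b):
    """Minimum bounding rectangle of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(box, rect):
    """How much the area of box grows when it has to include rect."""
    return area(unite(box, rect)) - area(box)


def choose_entry(boxes, rect):
    """Index of the entry needing least enlargement; ties go to smallest area."""
    return min(range(len(boxes)),
               key=lambda i: (enlargement(boxes[i], rect), area(boxes[i])))


node_boxes = [(0, 0, 10, 10), (20, 0, 40, 10)]
new_rect = (11, 2, 13, 4)

print(enlargement(node_boxes[0], new_rect))  # 30
print(enlargement(node_boxes[1], new_rect))  # 90
print(choose_entry(node_boxes, new_rect))    # 0
```

Repeating this choice from the root down to a leaf yields the target node for the insertion.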
The node splitting, along with adjusting the tree, are very important steps within insertion as those algorithms are the ones that are responsible for shaping and balancing the tree. For example, when all the nodes in the tree are
full and the new rectangle is being added, the node splitting will effectively be executed for some leaf node and all its ancestors, including root. It means that the tree will grow and possibly, its structure will change significantly.
Due to the above mechanics of R-Tree, it can be reasonably asserted that the tree structure becomes better as a function of node splits. With that, the first problem of the tree pruning becomes obvious:
tree pruning on insertion limits the number of node splits (due to smaller cascades of node splits) and hence limits the quality of the tree structure. The second problem — also related to node splits — is that
with all the information lost due to pruning (as pruning is the same as removing a subtree and inserting its bounding box into the tree) each node split is less effective as the leaf rectangles themselves are
getting bigger and bigger due to them becoming bounding boxes of bounding boxes (…) of the original rectangles.
The above problems become more visible in practice when the R-Tree input rectangles tend to be sorted. In general, one of the R-Tree problems is that its structure tends to be biased when the input rectangles are sorted.
Although further insertions usually fix the structure of the biased tree, this only happens to some degree, as some tree nodes may not get split anymore. When pruning happens and the input is sorted (or partially sorted),
fixing the biased tree is much harder and sometimes even impossible. This can be explained with an example where a lot of rectangles from the same area are inserted into the tree. With the number of such rectangles
being big enough, a lot of pruning will happen and hence a lot of rectangles will be lost and replaced by larger bounding boxes. Then, if a series of new insertions starts adding rectangles from a different area which is
partially close to the original one, the new rectangles may end up being siblings of those large bounding boxes instead of the original rectangles that could be clustered within nodes in a much more reasonable way.
Given the above problems, the evaluation of the whole idea of Damage being the adaptation of R-Tree with pruning is the following:
Performance
insertion is O(log_M K), where M is the node size, K is a parameter, and 0 < K ≤ N ✅
iteration is O(K) ✅
Memory footprint
O(K) ✅
Damage resolution
low to medium ⚠️
Although the above evaluation looks reasonable, in practice, it’s very hard to pick the proper pruning strategy. When the tree is allowed to be taller, the damage resolution is usually better, but the increased memory footprint,
logarithmic insertions, and increased iteration time combined pose a significant problem. On the other hand, when the tree is shorter, the damage resolution tends to be low enough not to justify using R-Tree.
The last, more sophisticated Damage implementation, uses some ideas from R-Tree and forms a very strict, flat structure. In short, the idea is to take some rectangular part of a plane and divide it into cells,
thus forming a grid with C columns and R rows. Given such a division, each cell of the grid is meant to store at most one rectangle that effectively is a bounding box of the rectangles matched to
that cell. The overview of the approach is presented in the image below:
As the above situation is very straightforward, one may wonder what would happen if the rectangle spanned multiple cells, i.e. how the matching algorithm would work in that case.
Before diving into the matching, it’s important to note that from the algorithmic perspective, the matching is very important as it accounts for the majority of operations during new rectangle insertion into the Damage data structure.
That is because, once the matched cell is known, the remaining part of the insertion is just about taking the bounding box of the existing rectangle stored in the cell and the new rectangle, thus having
O(1) time complexity.
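The constant-time part of the insertion can be sketched as follows. This is an illustrative sketch only: the `(x, y, width, height)` tuple layout, the flat list of cells, and the `unite` and `insert_into_cell` names are assumptions made here, not the WebKit implementation.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def unite(a: Rect, b: Rect) -> Rect:
    """Return the bounding box of two rectangles."""
    left = min(a[0], b[0])
    top = min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (left, top, right - left, bottom - top)


def insert_into_cell(cells: List[Optional[Rect]], cell_index: int, rect: Rect) -> None:
    """Merge rect into an already matched cell; a fixed number of operations."""
    current = cells[cell_index]
    cells[cell_index] = rect if current is None else unite(current, rect)


cells: List[Optional[Rect]] = [None] * 4  # a tiny 2x2 grid
insert_into_cell(cells, 1, (0, 0, 10, 10))
insert_into_cell(cells, 1, (5, 5, 10, 10))
```

Whatever the grid size, the work done after matching stays the same: one read, one bounding box computation, and one write.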
As for the matching itself, it can be done in various ways:
it can be done using strategies known from R-Tree, such as matching a new rectangle into the cell where the bounding box enlargement would be the smallest etc.,
it can be done by maximizing the overlap between the new rectangle and the given cell,
it can be done by matching the new rectangle’s center (or corner) into the proper cell,
etc.
The above matching strategies fall into 2 categories:
O(CR) matching algorithms that compare a new rectangle against existing cells while looking for the best match,
O(1) matching algorithms that calculate the target cell using a single formula.
Due to the nature of matching, the O(CR) strategies eventually lead to smaller bounding boxes stored in the Damage and hence to better damage resolution as compared to the
O(1) algorithms. However, as the practical experiments show, the difference in damage resolution is not big enough to justify O(CR)
time complexity over O(1). More specifically, the difference in damage resolution is usually unnoticeable, while the difference between
O(CR) and O(1) insertion complexity is major, as the insertion is the most critical operation of the Damage data structure.
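To make the cost of the O(CR) category concrete, below is a minimal sketch of overlap-maximizing matching. It assumes `(x, y, width, height)` tuples and a plane whose dimensions are divisible by the grid dimensions; `overlap_area` and `match_by_overlap` are hypothetical names used only for this illustration.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def overlap_area(a: Rect, b: Rect) -> int:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    width = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    height = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def match_by_overlap(rect: Rect, plane_width: int, plane_height: int,
                     columns: int, rows: int) -> int:
    """Visit every cell and return the index of the one overlapping rect most."""
    cell_w = plane_width // columns   # assumes an exact division
    cell_h = plane_height // rows
    best_index, best_overlap = 0, -1
    for row in range(rows):
        for col in range(columns):
            cell = (col * cell_w, row * cell_h, cell_w, cell_h)
            overlap = overlap_area(rect, cell)
            if overlap > best_overlap:
                best_index, best_overlap = row * columns + col, overlap
    return best_index
```

The nested loop visits all C×R cells for every inserted rectangle, which is exactly the cost that the single-formula strategies avoid.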
Due to the above, the matching method that has proven to be the most practical is matching the new rectangle’s center into the proper cell. It has O(1) time complexity
as it requires just a few arithmetic operations to calculate the center of the incoming rectangle and to match it to the proper cell (see
the implementation in WebKit). The example of such matching is presented in the image below:
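In code, center-based matching could look roughly like the sketch below. It is not the WebKit implementation: the tuple layout, the integer arithmetic, the clamping of out-of-plane centers to the nearest edge cell, and the `cell_index_for_center` name are all assumptions made for illustration.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def cell_index_for_center(rect: Rect, plane_width: int, plane_height: int,
                          columns: int, rows: int) -> int:
    """Map the rectangle's center to a cell index with a few arithmetic operations."""
    x, y, width, height = rect
    center_x = x + width // 2
    center_y = y + height // 2
    # Scale the center into grid coordinates, then clamp to the grid so that
    # rectangles sticking out of the plane still land in an edge cell.
    col = min(columns - 1, max(0, center_x * columns // plane_width))
    row = min(rows - 1, max(0, center_y * rows // plane_height))
    return row * columns + col
```

For an 800x400 plane split into an 8x4 grid, a rectangle at (180, 50) sized 100x20 has its center at (230, 60) and therefore lands in column 2 of row 0, without any cell being inspected.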
The overall evaluation of the grid-based Damage constructed the way described in the above paragraphs is as follows:
performance
insertion is O(1) ✅
iteration is O(CR) ✅
memory footprint
O(CR) ✅
damage resolution
low to high (depending on the CR) ✅
Clearly, the fundamentals of the grid-based Damage are strong, but the data structure is heavily dependent on the CR. The good news is that, in practice, even a fairly small grid such as 8x4 (CR=32) yields a damage resolution that is high. It means that this Damage implementation is a great alternative to bounding box Damage as even with very small performance and memory footprint overhead, it yields much better damage resolution.
Moreover, the grid-based Damage implementation gives an opportunity for very handy optimizations that improve memory footprint, performance (iteration), and damage resolution further.
As the grid dimensions are given a-priori, one can imagine that intrinsically, the data structure needs to allocate a fixed-size array of rectangles with CR entries to store cell bounding boxes.
One possibility for improvement in such a situation (assuming small CR) is to use a vector along with bitset so that only non-empty cells are stored in the vector.
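A minimal sketch of the vector-plus-bitset idea follows. It assumes a plain integer used as the bitset and a list kept in cell order; the `SparseCells` class and its method names are hypothetical and only meant to show the bookkeeping.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def unite(a: Rect, b: Rect) -> Rect:
    """Return the bounding box of two rectangles."""
    left, top = min(a[0], b[0]), min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (left, top, right - left, bottom - top)


class SparseCells:
    """Stores only non-empty cells; bit i of `occupied` marks cell i as used."""

    def __init__(self) -> None:
        self.occupied = 0            # bitset with one bit per grid cell
        self.rects: List[Rect] = []  # rectangles of occupied cells, in cell order

    def _slot(self, cell: int) -> int:
        # Position in the vector = number of occupied cells before this one.
        return bin(self.occupied & ((1 << cell) - 1)).count("1")

    def add(self, cell: int, rect: Rect) -> None:
        slot = self._slot(cell)
        if self.occupied & (1 << cell):
            self.rects[slot] = unite(self.rects[slot], rect)
        else:
            self.rects.insert(slot, rect)
            self.occupied |= 1 << cell


sparse = SparseCells()
sparse.add(5, (50, 0, 10, 10))
sparse.add(2, (20, 0, 10, 10))
sparse.add(5, (55, 5, 10, 10))
```

With a small CR the bitset fits in a single machine word, and iteration only touches the cells that actually hold damage.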
The other possibility (again, assuming small CR) is to not use a grid-based approach at all as long as the number of rectangles inserted so far does not exceed CR.
In other words, the data structure can allocate an empty vector of rectangles upon initialization and then just append new rectangles to the vector as long as the insertion does not extend the vector beyond
CR entries. In such a case, when CR is e.g. 32, up to 32 rectangles can be stored in the original form. If at some point the data structure detects that it would need to
store 33 rectangles, it switches internally to a grid-based approach, thus always storing at most 32 rectangles for cells. Also, note that in such a case, the first improvement possibility (with bitset) can still be used.
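The switch from exact storage to the grid can be sketched as below. This is a simplified illustration under stated assumptions, not the WebKit data structure: the `HybridDamage` class, its method names, the tuple layout, and the use of center-based matching for the fallback are all choices made here for the example.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def unite(a: Rect, b: Rect) -> Rect:
    """Return the bounding box of two rectangles."""
    left, top = min(a[0], b[0]), min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (left, top, right - left, bottom - top)


class HybridDamage:
    """Keeps rectangles verbatim up to C*R entries, then falls back to a grid."""

    def __init__(self, plane_width: int, plane_height: int, columns: int, rows: int) -> None:
        self.plane = (plane_width, plane_height)
        self.columns, self.rows = columns, rows
        self.exact: List[Rect] = []
        self.cells: Optional[List[Optional[Rect]]] = None  # None until the switch

    def _cell_index(self, rect: Rect) -> int:
        x, y, w, h = rect
        col = min(self.columns - 1, max(0, (x + w // 2) * self.columns // self.plane[0]))
        row = min(self.rows - 1, max(0, (y + h // 2) * self.rows // self.plane[1]))
        return row * self.columns + col

    def _add_to_grid(self, rect: Rect) -> None:
        index = self._cell_index(rect)
        current = self.cells[index]
        self.cells[index] = rect if current is None else unite(current, rect)

    def add(self, rect: Rect) -> None:
        if self.cells is None:
            if len(self.exact) < self.columns * self.rows:
                self.exact.append(rect)  # still within the limit: keep as-is
                return
            # One rectangle too many: move everything stored so far into the grid.
            self.cells = [None] * (self.columns * self.rows)
            for stored in self.exact:
                self._add_to_grid(stored)
            self.exact = []
        self._add_to_grid(rect)

    def rects(self) -> List[Rect]:
        if self.cells is None:
            return list(self.exact)
        return [r for r in self.cells if r is not None]
```

In both modes the structure never reports more than C×R rectangles, so the memory bound holds while small damage sets keep their original precision.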
Summarizing the above, both improvements can be combined and they allow the data structure to have a limited, small memory footprint, good performance, and perfect damage resolution as long as there
are not too many damage rectangles. And if the number of input rectangles exceeds the limit, the data structure can still fall-back to a grid-based approach and maintain very good results. In practice, the situations
where the input damage rectangles are not exceeding CR (e.g. 32) are very common, and hence the above improvements are very important.
Overall, the grid-based approach with the above improvements has proven to be the best solution for all the embedded devices tried so far, and therefore, such a Damage implementation is a baseline solution used in
WPE and GTK WebKit when UnifyDamagedRegions runtime preference is not enabled — which means it works by default in WPE WebKit.
The former sections demonstrated various approaches to implementing the Damage data structure meant to store damage information. The summary of the results is presented in the table below:
While all the solutions have various pros and cons, the Bounding box and Grid-based Damage implementations are the most lightweight and hence are most useful in generic use cases.
On typical embedded devices, where CPUs are quite powerful compared to GPUs, both of the above solutions are acceptable, so the final choice can be determined based on the actual use case. If the actual web application
often yields clustered damage information, the Bounding box Damage implementation should be preferred. Otherwise (majority of use cases), the Grid-based Damage implementation will work better.
On the other hand, on desktop-class devices, where CPUs are far less powerful than GPUs, the only acceptable solution is Bounding box Damage as it has a minimal overhead while it still provides some
decent damage resolution.
The above are the reasons for the default Damage implementations used by the desktop-oriented GTK WebKit port (Bounding box Damage) and the embedded-device-oriented WPE WebKit (Grid-based Damage).
When it comes to non-generic situations such as unusual hardware, specific applications etc. it’s always recommended to do a proper evaluation to determine which solution is the best fit. Also, the Damage implementations
other than the two mentioned above should not be ruled out, as in some exotic cases, they may give much better results.
Update on what happened in WebKit in the week from August 25 to September 1.
The rewrite of the WebXR support continues, as do improvements
when building for Android, along with smaller fixes in multimedia
and standards compliance.
Cross-Port 🐱
The WebXR implementation has gained
input through OpenXR, including
support for the hand interaction—useful for devices which only support
hand-tracking—and the generic simple profile. This was soon followed by the
addition of support for the Hand
Input module.
Aligned the SVGStyleElement
type and media attributes with HTMLStyleElement's.
Multimedia 🎥
GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.
Support for FFmpeg GStreamer audio decoders was
re-introduced
because the alternative decoders making use of FDK-AAC might not be available
in some distributions and Flatpak runtimes.
Graphics 🖼️
Usage of fences has been introduced
to control frame submission of rendered WebXR content when using OpenXR. This
approach avoids blocking in the renderer process waiting for frames to be
completed, resulting in slightly increased performance.
New, modern platform API that supersedes usage of libwpe and WPE backends.
Changed WPEPlatform to be built as part of the libWPEWebKit
library. This avoids duplicating some
code in different libraries, brings in a small reduction in used space, and
simplifies installation for packagers. Note that the wpe-platform-2.0 module
is still provided, and applications that consume the WPEPlatform API must still
check and use it.
Adaptation of WPE WebKit targeting the Android operating system.
Support for sharing AHardwareBuffer handles across processes is now
available. This lays the
foundation to use graphics memory directly across different WebKit subsystems
later on, making some code paths more efficient, and paves the way towards
enabling the WPEPlatform API on Android.
Update on what happened in WebKit in the week from August 18 to August 25.
This week brings continued improvements on the WebXR front, more layout tests passing,
support for CSS's generic font family for math, improvements in the graphics
stack, and an Igalia Chat episode!
The WebXR implementation has gained support to funnel usage permission requests through the public API for immersive sessions. Note that this is a basic implementation, and fine-grained control of requested session capabilities may be added at a later time.
The CSS font-family: math generic font family is now supported in WebKit. This is part of the CSS Fonts Level 4 specification.
The WebXR implementation has gained the ability to use GBM graphics buffers as fallback, which allows usage with drivers that do not provide the EGL_MESA_image_dma_buf_export extension, yet use GBM for buffer allocation.
Early this month, a new episode of Igalia Chat titled "Get Down With the WebKit" was released, where Brian Kardell and Eric Meyer talk with Igalia's Alejandro (Alex) Garcia about the WebKit project and Igalia's WPE port.
Update on what happened in WebKit in the week from August 11 to August 18.
This week we saw updates in WebXR support, better support for changing audio outputs,
enabling of GLib API when building the JSCOnly port, improvements to damaging propagation,
WPE platform enhancements, and more!
GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.
Switching audio outputs now uses gst_device_reconfigure_element() instead of relying on knowledge about how different GStreamer sink elements handle the choice of output device. Note that audio output selection support is in development and disabled by default; the ExposeSpeakers, ExposeSpeakersWithoutMicrophone, and PerElementSpeakerSelection feature flags may be toggled to test it.
Recently, I have been working on WebKitGTK support for in-band text tracks in Media Source Extensions,
so far just for WebVTT in MP4. Eventually, I noticed a page that seemed to be
using a CEA-608 track - most likely unintentionally, not expecting it to be handled - so I decided
to take a look at how that might work. The resulting PR is here: https://github.com/WebKit/WebKit/pull/47763
Now, if you’re not already familiar with subtitle and captioning formats, particularly CEA-608,
you might assume they must be straightforward, compared to audio and video. After all, it’s
just a bit of text and some timestamps, right?
However, even WebVTT as a text-based format already provides lots of un- or poorly supported features that
don’t mesh well with MSE - for details on those open questions, take a look at Alicia’s session on the topic:
https://github.com/w3c/breakouts-day-2025/issues/14
CEA-608, also known as line 21 captions, is responsible for encoding captions as a fixed-bitrate
stream of byte pairs in an analog NTSC broadcast. As the name suggests, they are transmitted during
the vertical blanking period, on line 21 (and line 284, for the second field) - imagine this as the
mostly blank area “above” the visible image. This provides space for up to 4 channels of captioning,
plus some additional metadata about the programming, though due to the very limited bandwidth, these
capabilities were rarely used to their full extent.
While digital broadcasts provide captioning defined by its successor standard CEA-708,
this newer format still provides the option to embed 608 byte pairs.
This is still quite common, and is enabled by later standards defining a digital encoding,
known as Caption Distribution Packets.
These are also what enables CEA-608 tracks in MP4.
The main issue I’ve encountered in trying to make CEA-608 work in an MSE context lies in its origin
as a fixed-bitrate stream - there is no concept of cues, no defined start or end, just one continuous stream.
As WebKit internally understands only WebVTT cues, we rely on GStreamer’s cea608tott element
for the conversion to WebVTT. Essentially, this element needs to create cues with well-defined timestamps,
which works well enough if we have the entire stream present on disk.
However, when 608 is present as a track in an MSE stream, how do we tell if the “current” cue
is continued in the next SourceBuffer? Currently, cea608tott will just wait for more data,
and emit another cue once it encounters a line break, or its current line buffer fills up,
but this also means the final cue will be swallowed, because there will never be “more data”
to allow for that decision.
The solution would be to always cut off cues at SourceBuffer boundaries, so cues might appear
to be split awkwardly to the viewer.
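The trade-off can be modelled with a toy accumulator. This is purely a hypothetical illustration, not how cea608tott is implemented: the `CueAccumulator` class, its `push` and `end_of_append` methods, and the use of a newline as the only cue terminator are assumptions made here.

```python
from typing import List, Optional, Tuple

Cue = Tuple[float, float, str]  # (start, end, text)


class CueAccumulator:
    """Collects caption text and turns it into cues with start and end times."""

    def __init__(self) -> None:
        self.pending_text = ""
        self.pending_start: Optional[float] = None
        self.cues: List[Cue] = []

    def push(self, timestamp: float, text: str) -> None:
        """Feed decoded characters; a line break closes the current cue."""
        for ch in text:
            if ch == "\n":
                self._emit(timestamp)
            else:
                if self.pending_start is None:
                    self.pending_start = timestamp
                self.pending_text += ch

    def end_of_append(self, timestamp: float) -> None:
        """Force out whatever is pending when an appended chunk ends."""
        self._emit(timestamp)

    def _emit(self, end: float) -> None:
        if self.pending_start is not None:
            self.cues.append((self.pending_start, end, self.pending_text))
        self.pending_text = ""
        self.pending_start = None


acc = CueAccumulator()
acc.push(0.0, "HELLO")
acc.push(1.0, "\n")
acc.push(1.0, "WOR")
acc.end_of_append(2.0)   # the chunk ends in the middle of a word
acc.push(2.0, "LD")
acc.push(3.0, "\n")
```

Without the `end_of_append` call the text "WOR" would stay pending until more data arrives; with it, nothing is swallowed, but "WORLD" reaches the viewer as two separate cues.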
Overall, this conversion to VTT won’t reproduce the captions as they were intended to be viewed,
at least not currently. In particular, roll-up mode can’t easily be emulated using WebVTT.
The other issue is that I’ve assumed for the current patch that CEA-608 captions
will be present as a separate MP4 track, while in practice they’re usually injected
into the video stream, which will be harder to handle well.
Finally, there is the risk of breaking existing websites that might have unintentionally
left CEA-608 captions in, and don’t handle a surprise duplicate text track well.
While this patch only provides experimental support so far, I feel this has given
me valuable insight into how inband text tracks can work with various formats aside from
just WebVTT. Ironically, CEA-608 even avoids some of WebVTT’s issues - there are no gaps or
overlapping cues to worry about, for example.
Either way, I’m looking forward to improving on WebVTT’s pain points,
and maybe adding other formats eventually!
Adaptation of WPE WebKit targeting the Android operating system.
WPE-Android has been updated to use WebKit 2.48.5. Of particular interest for development on Android in this update is the support for using the system logd service, which can be configured using system properties. For example, the following will enable logging all warnings:
adb shell setprop debug.log.WPEWebKit all
adb shell setprop log.tag.WPEWebKit WARN
Stable releases of WebKitGTK 2.48.5 and WPE WebKit 2.48.5 are now available. These include the fixes and improvements from the corresponding 2.48.4 ones, and additionally solve a number of security issues. Advisory WSA-2025-0005 (GTK, WPE) covers the included security patches.
Ruby was re-added to the GNOME SDK, thanks to Michael Catanzaro and Jordan Petridis. So we're happy to report that the WebKitGTK nightly builds for GNOME Web Canary are now fixed and Canary updates were resumed.
Update on what happened in WebKit in the week from July 21 to July 28.
This week the trickle of improvements to the graphics stack continues with
more font handling improvements and tuning of damage information; plus the
WPEPlatform Wayland backend gets server-side decorations with some compositors.
Font synthesis properties (synthetic bold/italic) are now correctly
handled, so that fonts are rendered
bold or italic even when the font itself does not provide these variants.
A few minor improvements to the damage
propagation feature have landed.
The screen device scaling factor in use is now
shown in the webkit://gpu internal
information page.
WPE WebKit 📟
WPE Platform API 🧩
New, modern platform API that supersedes usage of libwpe and WPE backends.
The Wayland backend included with WPEPlatform has been taught how to request
server-side decorations using the XDG
Decoration protocol.
This means that compositors that support the protocol will provide window
frames and title bars for WPEToplevel instances. While this is a welcome
quality of life improvement in many cases, window decorations will not be shown
on Weston and Mutter (used by GNOME Shell among others), as they do not support
the protocol at the moment.
Update on what happened in WebKit in the week from July 14 to July 21.
In this week we had a fix for the libsoup-based resource loader on platforms
without the shared-mime-info package installed, a fix for SQLite usage in
WebKit, ongoing work on the GStreamer-based WebRTC implementation including
better encryption for its default DTLS certificate and removal of a dependency,
and an update on the status of GNOME Web Canary version.
Cross-Port 🐱
ResourceLoader delegates local resource loading (e.g. gresources) to ResourceLoaderSoup, which in turn uses g_content_type_guess to identify their content type. On platforms where shared-mime-info is not available, this fails silently and reports "text/plain", breaking things such as PDFjs.
A patch was submitted to use MIMETypeRegistry to get the MIME type of these local resources, falling back to g_content_type_guess when that fails, making internal resource loading more resilient.
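The lookup order described above (a registry first, content-based guessing only as a fallback) can be sketched in a few lines. This is not the WebKit patch: Python's standard `mimetypes` module merely stands in for MIMETypeRegistry, and `sniff_content` is a hypothetical stand-in for g_content_type_guess that recognizes a single signature.

```python
import mimetypes


def sniff_content(data: bytes) -> str:
    """Hypothetical content-based guess; only knows the PDF signature."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    return "text/plain"


def mime_type_for(path: str, data: bytes) -> str:
    """Ask the extension registry first, fall back to sniffing when it has no answer."""
    guessed, _encoding = mimetypes.guess_type(path)
    if guessed is not None:
        return guessed
    return sniff_content(data)
```

Because the registry lookup only depends on the file name, it keeps working on systems where the content-guessing database is missing.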
Fixed "PRAGMA incrementalVacuum" for SQLite, which is used to reclaim freed filesystem space.
Multimedia 🎥
GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.
Most web engines migrated from a default DTLS certificate signed with an RSA key to an ECDSA P-256 key almost a decade ago. GstWebRTC is now also signing its default DTLS certificate with that private key format. This improves compatibility with various SFUs, the Jitsi Video Bridge among them.
Adaptation of WPE WebKit targeting the Android operating system.
Changed libpsl to include built-in public-suffix data when building WPE for Android. Among other duties, having this working correctly is important for site isolation, resource loading, and cookie handling.
Releases 📦️
The GNOME Web Canary build has been stale for several weeks, since the GNOME nightly SDK was updated to freedesktop SDK 25.08beta, which no longer ships one of the WebKitGTK build dependencies (Ruby). We will do our best to get the builds back to a working state, hopefully soon.
Update on what happened in WebKit in the week from July 7 to July 14.
This week saw a fix for IPv6 scope-ids in DNS responses, frame pointers
re-enabled in JSC developer builds, and a significant improvement to
emoji fonts selection.
Update on what happened in WebKit in the week from June 30 to July 7.
Improvements to Sysprof and related dependencies, WebKit's usage of
std::variant replaced by mpark::variant, major WebXR overhauling,
and support for the logd service on Android, are all part of this
week's bundle of updates.
Cross-Port 🐱
The WebXR support in the GTK and WPE WebKit ports has been ripped out in preparation for an overhaul that will make it better fit WebKit's multi-process architecture.
Note these are the first steps in this effort, and there is still plenty to do before WebXR experiences work again.
Changed usage of std::variant in favor of an alternative implementation based on mpark::variant, which reduces the size of the built WebKit library—currently saves slightly over a megabyte for release builds.
Adaptation of WPE WebKit targeting the Android operating system.
Logging support is being improved to submit entries to the logd service on Android, and also to configure logging using a system property. This makes debugging and troubleshooting issues on Android more manageable, and is particularly welcome when developing WebKit itself.
While working on this feature, the definition of logging channels was simplified, too.
Community & Events 🤝
WebKit on Linux integrates with Sysprof and reports a plethora of marks. As we report more information to Sysprof, we eventually pushed Sysprof internals to its limit! To help with that, we're adding a new feature to Sysprof: hiding marks from view.
Update on what happened in WebKit in the week from June 24 to July 1.
This was a slow week, where the main highlights are new development
releases of WPE WebKit and WebKitGTK.
Cross-Port 🐱
JavaScriptCore 🐟
The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.
Made some further progress bringing the 32-bit version of OMG closer to the 64-bit one.
Releases 📦️
WebKitGTK 2.49.3 and WPE WebKit 2.49.3 have been released. These are development snapshots intended to allow those interested to test the new features and improvements which will be part of the next stable release series. As usual, bug reports are welcome in the WebKit Bugzilla.
Multiple MediaRecorder-related improvements landed in main recently (1, 2, 3, 4), and also in GStreamer.
JavaScriptCore 🐟
The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.
JSC saw some fixes in i31 reference types when using Wasm GC.
WPE WebKit 📟
WPE now has support for analog gamepad buttons when using libwpe. Since version 1.16.2 libwpe has the capability to handle analog gamepad button events, but the support on the WPE side was missing. It has now been added, and will be enabled when the appropriate versions of libwpe are used.
Update on what happened in WebKit in the week from May 27 to June 16.
After a short hiatus coinciding with this year's edition of the Web Engines
Hackfest, this issue covers a mixed bag of new API features, releases,
multimedia, and graphics work.
Cross-Port 🐱
A new WebKitWebView::theme-color property has
been added to the public API, along with a
corresponding webkit_web_view_get_theme_color() getter. Its value follows
that of the theme-color metadata
attribute
declared by pages loaded in the web view. Although applications may use the
theme color in any way they see fit, the expectation is that it will be used to
adapt their user interface (as in this
example) to
complement the Web content being displayed.
Multimedia 🎥
GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.
Damage propagation has been toggled for the GTK
port: for now only a single rectangle
is passed to the UI process, which then is used to let GTK know which part of a
WebKitWebView has received changes since the last repaint. This is a first
step to get damage tracking code widely tested, with further improvements to be
enabled later when considered appropriate.
Adaptation of WPE WebKit targeting the Android operating system.
WPE-Android 0.2.0
has been released. The main change in this version is the update to WPE WebKit
2.48.3, which is the first that can be built for Android out of the box,
without needing any additional patching. Thanks to this, we expect that the WPE
WebKit version used will receive more frequent updates going forward. The
prebuilt packages available at the Maven Central
repository
have been updated accordingly.
Releases 📦️
WebKitGTK
2.49.2 and
WPE WebKit 2.49.2 have
been released. These are development snapshots and are intended to let those
interested test out upcoming features and improvements, and as usual issue
reports are welcome in Bugzilla.
The Web Engines Hackfest 2025 is kicking off next Monday in A Coruña and among
all the interesting talks and
sessions about
different engines, there are a few that can be interesting to people involved
one way or another with WebKitGTK and WPE:
“Multimedia in
WebKit”, by Philippe
Normand (Tuesday 3rd at 12:00 CEST), will focus on the current status and
future plans for the multimedia stack in WebKit.
All talks will be live streamed and a Jitsi Meet link will be available for
those interested in participating remotely. You can find all the details at
webengineshackfest.org.
Update on what happened in WebKit in the week from May 19 to May 26.
This week saw updates on the Android version of WPE, the introduction
of a new mechanism to support memory-mappable buffers which can lead
to better performance, a new gamepad API to WPE, and other improvements.
Cross-Port 🐱
Implemented support for the new 'request-close' command for dialog elements.
JavaScriptCore 🐟
The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.
Added support for using the GDB JIT API when dynamically generating code in JSC.
Graphics 🖼️
Added support for memory-mappable GPU buffers. This mechanism allows allocating linear textures that can be used from OpenGL and memory-mapped into CPU-accessible memory. This makes it possible to update the pixel data directly, bypassing the usual glCopyTexSubImage2D logic that may introduce implicit synchronization, perform staging copies, etc. (driver-dependent).
WPE WebKit 📟
WPE Platform API 🧩
New, modern platform API that supersedes usage of libwpe and WPE backends.
Landed a patch to add a gamepads API to WPE Platform with an optional default implementation using libmanette.
Igalia worked with Savant Systems to bring a seamless, high-performance music experience to its smart home ecosystem. By enhancing WPE WebKit with critical backported patches, developing a custom Widevine CDM, and engineering a JavaScript D-Bus bridge, WPE WebKit was adapted to ensure robust and secure media playback directly within Savant’s platform.
Delivering a tightly integrated music experience in a smart home environment required overcoming significant technical challenges. To achieve this, WPE WebKit’s capabilities were streamlined to enable a fluid interface and reliable communication between the browser and the music process that powers a third-party music integration.
With deep expertise in browser technology and embedded systems, Igalia was able to help Savant implement a tailored WPE WebKit integration, optimizing performance while maintaining security and responsiveness. The result is a cutting-edge solution that enhances user experience and supports Savant’s commitment to innovation in smart home entertainment.