Planet Igalia

January 20, 2021

Sergio Villar

Flexbox Cats (a.k.a fixing images in flexbox)

In my previous post I discussed my most recent contributions to flexbox code in WebKit, mainly targeted at reducing the number of interoperability issues among the most popular browsers. The ultimate goal was, of course, to make the life of web developers easier. It got quite some attention (I loved Alan Stearns’ description of the post) so I decided to write another one, this time focused on the changes I recently landed in WebKit (Safari’s engine) to improve the handling of elements with aspect ratio inside flexbox, a.k.a. make images work inside flexbox. Some of them have already been released in Safari Technology Preview 118, so it’s now possible to help test them and provide early feedback.

(BTW if you wonder about the blog post title I couldn’t resist the temptation of writing “Flexbox Cats” which sounded really great after the previous “Flexbox Gaps”. After all, image support was added to the Web just to post pictures of 🐱, wasn’t it?)

As I did before, I think it’d be useful to review some of the more relevant changes with examples, so you can have one of those inspiring a-ha moments when you realize that the issue you just couldn’t figure out was actually a problem in the implementation.

What was done

Images as flex items in column flows

Web engines are in charge of taking an element tree and its accompanying CSS and creating a box tree from them. All of this relies on formatting contexts. Each formatting context has specific ideas about how layout behaves. Both flex and grid, for example, created new, interesting formatting contexts which allow them to size their children by shrinking and/or stretching them. But how all this works can vary. While there is “general” box code that is consulted by each formatting context, there are also special cases which require specialized overrides. Replaced elements (images, for example) should work a little differently in flex and grid containers. Consider this:

.flexbox {
    display: flex;
    flex-direction: column;
    height: 500px;
    justify-content: flex-start;
    align-items: flex-start;
}

.flexbox > * {
    flex: 1;
    min-width: 0;
    min-height: 0;
}

<div class="flexbox">
      <img src="cat1.jpg">
</div>

Ideally, the aspect ratio of the replaced element (the image, in the example) would be preserved as the flex context calculated its size in the relevant direction (column is the block direction/vertical in Western writing modes, for example)… But in WebKit, they weren’t. They are now.

Black and white cat by pixabay
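The arithmetic behind the fix can be sketched like this (a hypothetical illustration, not actual WebKit code; the function name and numbers are made up):

```python
def inline_size_for_block_size(intrinsic_w, intrinsic_h, used_block_size):
    """Resolve the inline (width) size of a replaced flex item from the
    block (height) size the flex algorithm imposed on it, preserving the
    intrinsic aspect ratio instead of falling back to the intrinsic width."""
    return used_block_size * (intrinsic_w / intrinsic_h)

# A 400x300 image flexed to 500px tall in a column flow: before the fix
# WebKit kept the intrinsic 400px width; after the fix the width follows
# the ratio.
print(inline_size_for_block_size(400, 300, 500))  # ≈ 666.67
```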

Images as flex items in row flows

This second issue is the mirror twin of the previous one. The same problem that existed for block sizes was also there for inline sizes: override inline sizes were not used to compute block sizes of items with an aspect ratio (again, the intrinsic inline size was used), and thus the aspect ratio of the image (replaced elements in general) was not preserved at all. An example of this issue:

.flexbox {
  display: flex;
  flex-direction: row;
  width: 500px;
  justify-content: flex-start;
  align-items: flex-start;
}

.flexbox > * {
  flex: 1;
  min-width: 0;
  min-height: 0;
}

<div class="flexbox">
    <img src="cat2.jpg">
</div>

Gray Cat by Gabriel Criçan

Images as flex items in auto-height flex containers

The two fixes above allowed us to “easily” fix this one, because we can now rely on the computations done by the replaced elements code to compute sizes for items with an aspect ratio even if they’re inside special formatting contexts such as grid or flex. This fix was precisely about delegating that computation to the replaced elements code instead of duplicating all the aspect-ratio machinery in the flexbox code. This fix apparently has the potential to be a game changer:
This is a key bug to fix so that Authors can use Flexbox as intended. At the moment, no one can use Flexbox in a DOM structure where images are flex children.

Jen Simmons in bug 209983

Also don’t miss the opportunity to check this visually appealing demo by Jen, which should work as expected now. For those of you without a WebKit-based browser, I’ve recorded a screencast so you can compare (all circles should be round).

Left: old WebKit. Right: new WebKit (tested using WebKitGTK)

Apart from the screencast, I’m also showcasing the issue with some actual code.

.flexbox {
    width: 500px;
    display: flex;
}

.flexbox > * {
    min-width: 0;
}

<div class="flexbox">
  <img style="flex: auto;" src="cat3.jpg">
</div>

Tabby Cat by Bekka Mongeau

Flexbox additional cases for definite sizes

This was likely the trickiest one. I remember having nightmares with all the definite/indefinite stuff back when I was implementing grid layout with other Igalia colleagues. The whole definite/indefinite sizes business, although sensible and relatively easy to understand, is actually a huge challenge for web engines, which were not really designed with it in mind. Laying out web content traditionally means taking a width as input to produce a height as output. However, formatting contexts like grid or flex make the whole picture much more complicated.
This particular issue was not a malfunction but something that was not implemented. Essentially, the flex spec defines some cases where indefinite sizes should be considered definite although the general rule considers them indefinite. For example, if a single-line flex container has a definite cross size, we can assume that flex items have a definite size in the cross axis which is indeed equal to the flex container’s inner cross size.
In the following example the flex item, the image, has height:auto (by default), which is an indefinite size. However, the flex container has a definite height (a fixed 300px). This means that when laying out the image, we can assume that its height is definite and equal to the height of the container. Having a definite height then allows you to properly compute the width using an aspect ratio.

.flexbox {
    display: flex;
    width: 0;
    height: 300px;
}

<div class="flexbox">
  <img src="cat4.png">
</div>

White and Black Cat With Blue Eyes by Thomas Svensson
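The rule can be sketched as follows (a hypothetical illustration of the spec logic, not WebKit's actual implementation; all names are made up):

```python
def item_cross_size(item_size, single_line, container_cross_definite,
                    container_inner_cross_size):
    """An auto (indefinite) cross size on a flex item is treated as
    definite when the container is single-line and its own cross size is
    definite; it then resolves to the container's inner cross size."""
    if item_size is None and single_line and container_cross_definite:
        return container_inner_cross_size
    return item_size

# The example above: height: auto (None) on the image, single-line
# container with a definite 300px height -> the item's height is treated
# as definite, 300px, and the width can then follow the aspect ratio.
print(item_cross_size(None, True, True, 300))  # 300
```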

Aspect ratio computations and box-sizing

This is a very common oversight in layout code. When dealing with layout bugs we (browser engineers) usually forget about box-sizing, because the standard box model is the truth and the whole truth and the sole truth in our minds. Jokes aside, in this case the aspect ratio was applied to the border box (content + border + padding) instead of to the content box, as it should be. The result was distorted images, because border and padding were altering the aspect ratio computations.

.flexbox {
  display: flex;
}

.flexbox > * {
  border-top: 150px solid blue;
  border-left: 30px solid orange;
  height: 300px;
  box-sizing: border-box;
}

<div class="flexbox">
  <img src="cat5.png"/>
</div>

Grayscale Photo of Long Fur Cat by Skyler Ewin
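In terms of arithmetic, the bug and the fix look roughly like this (a hypothetical sketch; the numbers match the example above but the function itself is made up):

```python
def content_box_extent(specified, border, padding, box_sizing):
    """Return the content-box size for a given specified size. The aspect
    ratio must be applied to this value, not to the border box."""
    if box_sizing == "border-box":
        # Borders and padding must be subtracted before applying the ratio.
        return specified - border - padding
    # With box-sizing: content-box the specified size already is the
    # content box.
    return specified

# height: 300px with border-top: 150px and box-sizing: border-box.
# For a square image (1:1 ratio) the width should come from the 150px
# content height, not the 300px border box (which is what the buggy
# code effectively used).
content_h = content_box_extent(300, 150, 0, "border-box")
print(content_h * 1)  # 150
```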


I mentioned this in the previous post but I’ll do it again here: having the web platform test suite has been an absolute game changer for web browser engineers. The tests have helped us in many ways, from easily allowing us to verify our implementations to acting as a safety net against potential regressions we might add while fixing issues in the engines. We no longer have to manually test stuff in different browsers to check how other developers have interpreted the specs. We now have the tests, period.
In this case, I’ve been using them in a different way: they have served me both as a guide, directing my efforts to reduce the flexbox interoperability issues, and as a nice metric to measure the progress of the task. Speaking of metrics, this work made WebKit-based browsers pass an additional 64 test cases from the WPT test suite, a very nice step forward for interoperability.
I’m attaching a screenshot with the current status of images as flex items from the WPT point of view. Each HTML file in the left column is a test, and each test performs multiple checks. For example, the image-as-flexitem-* ones run 19 different checks (use cases) each. Each column shows how many checks each browser successfully passes. A quarter ago Safari’s (WebKit’s) figures for most of them were 11/19 or 13/19, but the latest Tech Preview is passing all of them. Not bad, huh?
image-as-flexitem-* flexbox tests in WPT as of 2021/01/20


Again many thanks to the different awesome folks at Apple, Google and my beloved Igalia that helped me with very insightful reviews and strong support at all levels.
Also I am thankful to all the photographers from whom I borrowed their nice cat pictures (including the Brown and Black Cat on top by pixabay).

by svillar at January 20, 2021 09:45 AM

January 08, 2021

Samuel Iglesias

Vkrunner RPM packages available

VkRunner is a Vulkan shader tester based on Piglit’s shader_runner (I already talked about it in my blog). This tool is very helpful for creating simple Vulkan tests without writing hundreds of lines of code. In the Graphics Team at Igalia, we use it extensively to help us in the development of open-source Mesa drivers such as V3D and Turnip.

As a hobby project for last Christmas holiday season, I wrote the .spec file for VkRunner and uploaded it to Fedora’s Copr and OpenSUSE Build Service (OBS) for generating the respective RPM packages.

This was the first time I had created a package, and thanks to the documentation on how to create RPM packages, the process was simpler than I initially thought. If I find the time to read the Debian New Maintainers’ Guide, I will create a DEB package as well.

Anyway, if you have installed Fedora or OpenSUSE in your computer and you want to try VkRunner, just follow these steps:


  • Fedora:
$ sudo dnf copr enable samuelig/vkrunner
$ sudo dnf install vkrunner

  • OpenSUSE / SLE:
$ sudo zypper addrepo
$ sudo zypper refresh
$ sudo zypper install vkrunner

Enjoy it!

January 08, 2021 07:42 AM

December 31, 2020

Brian Kardell

TAG 2021

TAG 2021

The W3C is in the middle of a big, and arguably very difficult, election for the W3C Technical Architecture Group (aka TAG). The TAG is one of two small bodies within the W3C which are elected by the membership. If you're unfamiliar with this, I wrote this exceptionally brief Primer for Busy People. I'd like to tell you why I think this is a big election, why it is complex, what I (personally) think about it, and what you can do if you agree.

The current W3C TAG election is both important and complex for several reasons. It's big because 4 of the 6 elected seats are up for election and two exceptional members (Alice Boxhall from Google and David Baron from Mozilla) are unable to run for reelection. It's big because there are nine candidates and each brings different things to the table (and statements don't always capture enough). It's complex because of the voting system and participation.

Let me share thoughts on what I think would be a good result, and then I'll explain why that's hard to achieve and what I think we need to avoid.

A good result...

I believe the best result involves 3 candidates for sure (listed alphabetically): Lea Verou, Sangwhan Moon and Theresa O’Connor.

Let's start with the incumbents. I fully support re-electing both Theresa O’Connor and Sangwhan Moon and cannot imagine reasons not to. Both have a history in standards, have served well on TAG, have a diversity of knowledge, and are very reasonable and able to work well. Re-electing some good incumbents with these qualities is a practical advantage to the TAG as well, as they are already well immersed. These are easy choices.

Lea Verou is another easy choice for me. Lea brings a really diverse background, set of perspectives and skills to the table. She's worked for the W3C, she's a great communicator to developers (this is definitely a great skill in TAG whose outreach is important), she's worked with small teams, produced a number of popular libraries and helped drive some interesting standards. The OpenJS Foundation was pleased to nominate her, but Frontiers and several others were also supportive. Lea also deserves "high marks".

These 3 are also a happily diverse group.

This leaves one seat. There are 3 other candidates who I think would be good, for different reasons: Martin Thompson, Amy Guy and Jeffrey Yaskin. Each of them will bring something different to the table and if I am really honest, it is a hard choice. I wish we could seat all 3 of them, but we can't. At least in this election (that is, they can run again).

For brevity, I will not even attempt to make all the cases here, but I encourage you to read their statements and ask friends. Truth be told, I have a strong sense that "any mix of these 6 could be fine" and different mixes optimize for slightly different things. Also, as I will explain, there are some things that seem slightly more important to me than who I recommend is third best vs fourth or fifth...

TLDR; Turn out the vote

If you find yourself in agreement with me, more or less, I would suggest: place the 3 I mentioned above (Lea, Tess, Sangwhan) in at least your top 4 places, pick a fourth from my other list, and put them in whatever order you like.

I think there are many possible great slates for TAG in 2021, but they all involve Lea, Tess and Sangwhan. Please help support them and place them among your top 4 votes.

If you're a W3C member, your AC Representative votes for you -- tell them. Make sure they vote - the best result definitely depends on higher than average turnout. Surprisingly, about 75% of the membership doesn't normally vote. These elections are among the rare times when every member has an equal voice: a tiny 1-person member org has exactly the same voting power as every mega company.

If you're not a W3C member, you don't have a way to vote directly but you can publicly state your support and tag in member orgs or reach out to people you know who work for member orgs. Historically this has definitely helped - let's keep W3C working well!

STV, Turnout and Negative Preference

The W3C's election system uses STV, which stands for "single transferable vote". Single is the operative word: while you express 9 preferences in this election, only one of those will actually be counted to help someone win a seat. The counting system is (I think) rather complex, but the higher up on your list someone appears, the more likely that preference is to be the one that counts. Each vote that is counted counts as 1 vote in that candidate's favor.

Let me stress why this matters: STV optimizes for choosing diversity of opinion with demonstrably critical support. A passionate group supporting an 'issue' candidate will all place their candidate as the #1 choice - those are guaranteed to be counted.

Historically only about 100 of the W3C's 438 member organizations actually turn out in a good election. Let's imagine turnout is even lower in 2020 and it's only 70. This means that if a candidate reaches 18 votes (a little over 4% of the membership) they have a seat, no matter how the rest of us vote - even if everyone else had an actively negative preference for them.

Non-participation is an issue for all voting systems, but it seems like STV can really amplify outcomes which are undesirable here. The only solution to this problem is to increase turnout: increasing turnout raises the quota bar and helps ensure that this doesn't happen.
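For the curious, the quota math usually works like this (a sketch assuming the Droop quota, the most common STV threshold; the exact counting rules the W3C uses may differ, so treat the numbers as illustrative):

```python
import math

def droop_quota(valid_ballots, seats):
    """The Droop quota: the smallest whole number of votes such that at
    most `seats` candidates can reach it simultaneously."""
    return math.floor(valid_ballots / (seats + 1)) + 1

# With 4 seats in play, the quota shrinks as turnout drops, so a small
# but dedicated bloc ranking one candidate first reaches it more easily:
print(droop_quota(100, 4))  # 21
print(droop_quota(70, 4))   # 15
```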

Regardless of how you feel about any of the candidates, please help turn out the vote. The W3C works best when as many members vote as possible!

December 31, 2020 08:00 PM

December 22, 2020

Manuel Rego

2020 Recap

2020 is not a great year to do any kind of recap, but there have been some positive things happening in Igalia during this year. Next you can find a highlight of some of these things in no particular order.

CSS Working Group A Coruña F2F

The year couldn’t start better: in January Igalia hosted a CSS Working Group face-to-face meeting in our office in A Coruña (Galicia, Spain). Igalia has experience arranging other events in our office, but this was the first time that the CSSWG came here. It was an amazing week and I believe everyone enjoyed the visit to this corner of the world. 🌍

Brian Kardell from Igalia talked to everybody about Container Queries. This is one of the features that web authors have been asking for for ages, and Brian has been trying to push the topic forward and find some kind of solution (even if not 100% feature complete). During that week there were discussions about the relationship with other topics like Resize Observer or CSS Containment, and new ideas appeared too. Brian posted a blog post after the event explaining some of those ideas. Later my colleague Javi Fernández worked on an experiment that Brian mentioned in a recent post. The good news is that all these conversations managed to bring this topic back to life, and last November Google announced that they have started working on a Container Queries prototype in Chromium.

During the meeting Jen Simmons (at Mozilla at that time, now at Apple) presented some topics from Mozilla, including a detailed proposal for Masonry Layout based on Grid. This is something authors have also shown interest in, and Firefox already has a prototype implementation behind a runtime flag.

Apart from the three days full of meetings and interesting discussions, some of the CSSWG members participated in a local meetup giving 4 nice talks:

Finally, I remember some corridor conversations about the Mozilla layoffs that had happened just a few days before the event, but nobody could expect what was going to happen during the summer. It looks like 2020 has been a bad year for Mozilla in general and Servo in particular. 😢

Open Prioritization

This summer Igalia launched the Open Prioritization campaign, where we proposed a list of topics to be implemented in the different browser engines, and people supported them with different pledges; I wrote a blog post about it at the time.

Open Prioritization: :focus-visible in Safari/WebKit: $30.8K pledged out of $35K. Open Prioritization: :focus-visible in Safari/WebKit

This was a cool experiment, and it looks like a successful one, as :focus-visible in WebKit/Safari has been the winner. Igalia is currently collecting funds through Open Collective in order to start the implementation of :focus-visible in WebKit, you still have time to support it if you’re interested. If everything goes fine this should happen during the first quarter of 2021. 🚀

Igalia Chats

This actually started in late 2019, but it has been ongoing during the whole of 2020. Brian Kardell has been recording a podcast series about the web platform and some of its features with different people from the industry. The episodes have been getting more and more popular, and Brian was even asked to record one for the last BlinkOn edition.

So far 8 episodes of around one hour each have been published, with 13 different guests. More to come in 2021! If you are curious and want to know more, you can find them on the Igalia website or in your favourite podcasting platform.

Igalia contributions

This is not a comprehensive list but just some highlights of what Igalia has been doing in 2020 around CSS:

We’re working on a demo about these features, that we’ll be publishing next year.

In February Chromium published the requirements to become an API owner. Due to my involvement in the Blink project since the fork from WebKit back in 2013, I was nominated and became a Blink API Owner last March. 🥳

Yoav Weiss on the BlinkOn 13 Keynote announcing me as API owner

The API owners meet on a weekly basis to review the intent threads and discuss them; it’s an amazing learning experience to be part of this group. In my case, when reviewing intents I usually pay attention to things related to interoperability, like the status of the spec, test suites and other implementations. In addition, I have the support of all my awesome colleagues at Igalia who help me to play this role. Thank you all!

2021 and beyond…

Igalia keeps growing and a bunch of amazing folks will join us soon; in particular, Delan Azabani and Felipe Erias are already starting these days as part of the Web Platform team.

Open Prioritization should see its first successful project, as :focus-visible funding is advancing and it gets implemented in WebKit. We hope this can lead to new similar experiments in the future.

And I’m sure many other cool things will happen at Igalia next year, stay tuned!

December 22, 2020 11:00 PM

Brian Kardell

2020: The Good Parts

Note from the author...

My posts frequently (like this one) have a 'theme' and tend to use a number of images for visual flourish. Personally, I like it that way, I find it more engaging and I prefer for people to read it that way. However, for users on a metered or slow connection, downloading unnecessary images is, well, unnecessary, potentially costly and kind of rude. Just to be polite to my users, I offer the ability for you to opt out of 'optional' images if the total size of viewing the page would exceed a budget I have currently defined as 200k...

2020: The Good Parts

Each year, Igalians take a moment and look back on the year and assess what we've accomplished. Last year I wrote a wrap up for 2019 and hinted at being excited about some things in 2020 - I'd like to do the same this year.

Even in a "normal" year, making a list of things that you've actually accomplished can be a good exercise for your mental health. I do it once a month, in fact. It's easy to lose sight of anything beyond what you're thinking of in the moment and feel overwhelmed by the sheer volume. If I can be honest with you, since it's just between us: heading into this exercise always fills me with a sense of dread. It always seems like now is the time when you have to come to grips with how little you actually accomplished this month. But my experience is always quite the opposite: the simple act of even starting to create a list of things you actually did can give you a whole new perspective. Sometimes, usually maybe, I don't even finish the list because I hit a point where I say "Wow, actually, that's quite a lot" and feel quite a bit better.

But, 2020 is, of course, not a "normal year". It's more than fair to expect less of ourselves. So, when I sat down to write about what we accomplished, I was faced with this familiar sinking feeling -- and I had precisely the same reaction: Wow! We did a lot. So, let me share some highlights of Igalia's 2020: The Good Parts.

All the browsers

At Igalia, we are significant contributors to all of the browser engines (and several of the browsers that sit atop them too). There are a lot of ways you can look at just how much we do, and none of them are perfect, but commits are an easy, if fuzzy, measure of comparatively how much we did in the community. So, how much comparatively less did we do this year than last? The opposite, actually!

Igalia is again the #2 contributor to Chromium (Microsoft is coming up fast though). We are also again the #2 contributor to WebKit. Last year we raised some eyebrows by announcing that we had 11% of the total commits. This year: 15.5%! We are also up one place to #6 contributor in the mozilla-central repository and up three places to #4 in Servo! Keep in mind that #1 in all of these are the project owners (Google, Apple and Mozilla respectively).

We were huge contributors everywhere, but look at this: 15.5% of all WebKit Contributions in 2020!!

We worked on so many web features!

Some of the things we worked on are big or exciting things that everyone can appreciate and I want to highlight a little more here, but the list of features where we worked on at least one (sometimes two) implementations would be prohibitively long! Here is a very partial list of ones we worked on that I won't be highlighting.

  • Lazy loading
  • stale-while-revalidate
  • referrer-policy
  • Fixing bugs with XHR/fetch
  • Interop/Improvements to ResizeObserver/IntersectionObserver
  • Custom Properties performance
  • Text wrapping, line breaking and whitespace
  • Trailing ideograph spaces
  • Default aspect ratio from HTML Attributes
  • scroll snap
  • scroll-behavior
  • overscroll-behavior
  • scrollend event
  • Gamepad
  • PointerLock
  • list-style-type: <string>
  • ::marker
  • Logical inset/border/padding/margin

A few web feature highlights...

Here are just a few things that I'd like to say a little more about...

Container Queries

I am exceptionally pleased that we have been pivotal in moving the ball in conversations on container queries. Not only did our outreach and discussions last year change the temperature of the room, but we got a start on two proposals and actually had CSS Working Group discussion on both. I'm also really pleased that Igalia built a functional prototype for further review and discussion of our switch proposal and that we've been collaborating with Google and Miriam Suzanne, who have picked up where David Baron's proposal left off.

unicorns walking around in paradise
It's like we just found not one, but two mythical unicorns

I expect 2021 to be an exciting year of developments in this space where we get a lot more answers sorted out.


MathML

Two years ago, MathML fell into a strange place in web history and had an uncertain future in browsers. Igalia has led the charge in righting all of this. Along with the MathML Refresh Community Group, peer implementers and help from various standards groups, we now have MathML Core - a well defined spec, with tests, that defines the most meaningful parts of MathML and their relation to the web platform as interoperability targets. We've made a ton of progress in aligning support, describing how things fit, and invested a lot of time this year up-streaming work in Chromium. Some additional work remains for next year pending Layout NG work at Google, but it's looking better and better, and most of it is shipping behind the experimental web platform features flag today. We also helped create and advocate for a new W3C charter for math.

But let me share why I'm especially proud of it...

A professor in front of a chalkboard full of math with space scenes super-imposed

Because math is text, and a phenomenally important kind of text. The importance of being able to render formulae is really highlighted during a pandemic, when researchers of all kinds need to share information and students are learning from home. I'm super proud to be a part of this single action that I believe really is a leap in helping the Web realize its potential for these societally important issues.

SVG/CSS Alignment

At the end of last year's post I hinted about something we were working on. The practical upshots that people will more immediately relate to will be the abilities to do 3D transforms in SVG and hardware accelerate SVG.

These are long requested enhancements but also come with historical baggage, so it's been difficult for browsers to invest. It's a great illustration of why Igalia is great for the ecosystem. This work is getting investment priority because Igalia are the maintainers of WPE WebKit, the official WebKit port for embedded devices.

Software on embedded devices has a marriage of unique qualities that lots of controls and displays want to be SVG-based, but also have to deal with typically low end hardware, which usually still has a GPU. Thus, this problem for those devices is a few orders of magnitude more critical than it is elsewhere. However, our work will ultimately fund improvements for all WebKit browsers, which also incentivizes others to follow!


OffscreenCanvas

One thing we haven't talked about yet, but I can't wait to, is OffscreenCanvas. Apple originally pioneered the <canvas> element and it's super cool and useful for a lot of stuff. Unfortunately, it has historically been tied to the DOM, did its work on the main thread and couldn't be used in workers. This is terrible because many of the use cases it is really great for are really intense. This is a bad situation - luckily, we're working on it! Chris Lord has been working on OffscreenCanvas in WebKit and it looks great so far - everything except the text portions is done and I've been using it with great results.

OffscreenCanvas can be used in workers, and you can 'snap off' and transfer the context from a DOM rendered canvas to a worker too. So great! And guess why we're investing in it? You guessed it: Embedded.


WebXR

I mean, this is kinda huge, right? Igalia is investing to help move XR forward in WebKit - part of this is generalized for WebKit, and I think that is kind of awesome. Still early days and there's a lot to do, but this is pretty exciting to see developing and I'm proud that Igalia is helping make it happen!

Important JavaScript stuff!

We pushed and shipped public/private instance and static fields in JavaScriptCore (WebKit). Private methods are ongoing. We've really improved coordination with WebKit at large this year, and we're constantly improving the 32-bit JSC too. We're working on WebAssembly and numerous emerging TC39 proposals we're excited about: module blocks, decorators, bundling, Realms, decimal, records and tuples, Temporal, and lots of things for ECMA-402 (Internationalization) too!

Web Related, non-feature things

There's a lot of other things that we accomplished this year at Igalia which are pretty exciting too!

  • Open Prioritization! This year we ran a pilot experiment called "Open Prioritization" to start some big and complex discussions and attempt to find ways to give more people a voice in the prioritization of work on the commons. We partnered with Open Collective and while I can't say the first experiment was flawless, we learned a lot and are moving forward with a project picked and funded by the larger community as well as establishing a collective to continue to do this!

  • Our new podcast! This year we also launched a podcast. We've had great discussions on complex topics and had amazing guests, including one of the creators of CSS Håkon Wium Lie, people from several browser vendors past and present, people who helped drive the two historically special embeddable forms in HTML (MathML and SVG), and some developer and web friends. It's available via all of your favorite podcasting services, a playlist on our YouTube channel and on our website

  • IPFS This year we also began working with Protocol Labs to improve some things around protocol handlers - those are great for the web at large and it's interesting and exciting to see what is happening with things like IPFS!

  • Joined MDN PAB This year Igalia also joined the MDN Product Advisory Board, and we're looking forward to helping ensure that the vital resource that is MDN remains healthy!

  • WPE You might know that Igalia are the maintainers of a few of the official WebKit ports, and one of them is for embedded systems. I'm really pleased with all of the things that this has allowed us to help drive for WebKit and the larger Web Platform. However, embedded "browsers" were kind of a new topic to me when I began my work here, and they pose somewhat different challenges than the sorts I am used to. With embedded systems you typically build the OS specifically for the device. Sharing the same web tech commons is phenomenal, but for many developers like myself, questions about embedded were difficult to explore on my own as someone who rarely compiles a browser, much less an operating system! I'm really pleased with the progress we've made on this, making it more friendly, informative and relevant to people who might not already be experts, including making easy step-wise options available for people to explore. Someone with no experience can download a Raspbian-based OS with WPE WebKit on it and flash it right onto a Raspberry Pi just to explore. For a lot of pet projects, you can do a lot with that too. That's not super representative of a good embedded system in terms of performance and things, but it is very easy and it's maintained by us, so it's pretty up to date. A short step away, if you're pretty comfortable with a Linux shell and ssh, you can get a minimal install optimized for the Raspberry Pi 3 that you can flash right onto your Pi and that runs a Weston Wayland compositor. Finally, if you already kind of know what you're doing, we maintain Yocto recipes for developers to more easily build and maintain their real systems.

  • Vulkan! driver - You might know that Igalia does all kinds of stuff beyond just the Web; we work on all of the things that power the web too, kind of all the way down - so we have lots of areas of specialization. I think it's really cool that we partnered with Raspberry Pi to create a Vulkan driver in the Mesa graphics driver stack for the latest generation of Raspberry Pi, achieving conformance in less than a year and passing over 100k tests from Khronos' Conformance Test Suite since our initial announcement of drawing the first triangle!

Looking forward...???

So, what exciting things can we look forward to in 2021? Definitely, advancing all of the things above - and that's exciting enough. It's hard to know what to be most excited for, but I'm personally really looking forward to watching Open Prioritization grow, and to getting a really good view of, and very concrete progress on, the Container Queries issues. We've also got our eyes on some new things we'll be starting to talk about in the next year, so stay tuned on those too.

One that I'd like to mention, however, is tabs. Over the past year, Igalia has begun to get involved with efforts like OpenUI, and I've been talking to developers and peers at Igalia about tabs. I had some thoughts and ideas that I posted earlier this year. Just recently some actual work and collaboration has been done - getting a number of us with similar ideas together to sort out a useful custom element that we can test out, as well as working in OpenUI on aligning all of the stars we'll need to align as we attempt to approach something like standard tabs. It is very early days here, but we've gone from a vague blog post to some actual involvement and we're getting behind the idea - which is pretty exciting, and I can't wait to say more concrete things here!

December 22, 2020 05:00 AM

December 21, 2020

Oriol Brufau

CSS ::marker pseudo-element in Chromium


Did you know that CSS makes it possible to style list markers?

In the past, if you wanted to customize the bullets or numbers in a list, you would probably have to hide the native markers with list-style: none, and then add fake markers with ::before.

However, now you can just use the ::marker pseudo-element in order to style the native markers directly!
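As an illustration, here is how the two approaches compare (the class names are made up for this example):

```css
/* Before: hide the native marker and fake one with ::before */
ul.old { list-style: none; }
ul.old li::before { content: "★ "; color: green; }

/* Now: style the native marker directly */
ul.new li::marker { content: "★ "; color: green; }
```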

If you are not familiar with it, I suggest reading these articles first:

In this post, I will explain the deep technical details of how I implemented ::marker in Chromium.

Thanks to Bloomberg for sponsoring Igalia to do it!

Implementing list-style-type: <string>

Before starting working on ::marker itself, I decided to add support for string values in list-style-type. It seemed a quite useful feature for authors, and Firefox already shipped it in 2015. Also, it’s like a more limited version of content in ::marker, so it was a good introduction.

It was relatively straightforward to implement. I did it in a single patch, which landed in Chromium 79. Then I also ported it to WebKit; it's available since Safari Technology Preview 115.

<ol style="list-style-type: '★ '">
  <li>First item</li>
  <li>Second item</li>
</ol>


Parsing and computation

The interesting thing to mention is that list-style-type had been implemented with a keyword template, so its value would be internally stored using an enum, and it would benefit from the parser fast path for keywords. I didn’t want to lose that, so I followed the same approach as for display, which also used to be a single-keyword property, until Houdini extended its syntax with layout(<ident>).

Basically, I treated list-style-type as a partial keyword property. This means that it keeps the parser fast path for keyword values, but in case of failure it falls back to the normal parser, where I accepted a string value.

When a string is provided, the internal list-style-type value is set to a special EListStyleType::kString enum value, and the string is stored in an extra ListStyleStringValue field.


From a layout point of view, I had to modify both LayoutNG and legacy code. LayoutNG is a new layout engine for Chromium that has been designed for the needs of modern scalable web applications. It was released in Chrome 77 for block and inline layout, but some CSS features like multi-column haven’t been implemented in LayoutNG yet, so they force Chromium to use the old legacy engine.

It was mostly a matter of tweaking LayoutNGListItem (for LayoutNG) and LayoutListMarker (for legacy) in order to retrieve the string from ListStyleStringValue when the ListStyleType was EListStyleType::kString, and making sure to update the marker when ListStyleStringValue changed.

Also, string values are somewhat special because they don’t have a suffix, unlike numeric values that are suffixed with a dot and space (like 1. ), or symbolic values that get a trailing space (like ).

It’s noteworthy that until this point, markers didn’t have to care about mixed bidi. But now you can have things like list-style-type: "aال", that is: U+0061 a, U+0627 ا, U+0644 ل. Note that ا is written before ل, but since they are Arabic characters, ا appears at the right.

This is relevant because the marker is supposed to be isolated from the text in the list item, so in LayoutNG I had to set unicode-bidi: isolate to inside markers. It wasn’t necessary for outside markers since they are implemented as inline-blocks, which are already isolated.

In legacy layout, markers don’t actually have their text as a child, it’s just a paint-time effect. As such, no bidi reordering happens, and aال doesn’t render correctly:

<li style="list-style: 'aال - ' inside">text</li>

LayoutNG: screenshot vs. legacy: screenshot

At that point I decided to leave it this way, but this kind of problem in legacy layout would keep haunting me while implementing ::marker. Keep reading to learn the bloody details!

::marker parsing and computation

Here I started working on the actual ::marker pseudo-element. As a first step, in I recognized ::marker as a valid selector (behind a flag), added a usage counter, and defined a new PseudoId::kPseudoIdMarker to identify it in the node tree.

It’s important to note that list markers were still anonymous boxes, there was no actual ::marker pseudo-element, so kPseudoIdMarker wasn’t actually used yet.

Something that needs to be taken into account when using ::marker is that the layout model for outside positioning is not fully defined. Therefore, in order to prevent authors from relying on implementation-defined behaviors that may change in the future, the CSSWG decided to restrict which properties can actually be used on ::marker.

I implemented this restriction in, using a ValidPropertyFilter just like it was done for ::first-letter and ::cue. But note this was later refactored, and now whether a property applies to ::marker or not is specified in the property definition in css_properties.json5.

At this point, ::marker only allowed:

  • All font properties
  • Custom properties
  • color
  • content
  • direction
  • text-combine-upright
  • unicode-bidi

Using ::marker styles

At this point, ::marker was a valid selector, but list markers weren’t using ::marker styles. So in I just took these styles and assigned them to the markers.

This simple patch was the real deal, making Chromium’s implementation of ::marker match WebKit’s one, which shipped in 2017. When enabling the runtime flag, you could style markers:

::marker {
  color: green;
  font-weight: bold;
}


This landed in Chromium 80. So, how come I didn’t ship ::marker until 86?

The answer is that, while the basic functionality was working fine, I wanted to provide a full and solid implementation. And it was not yet the case, since content was not working, and markers were still anonymous boxes that just happened to get assigned the styles for ::marker pseudo-elements, but there were no actual ::marker pseudo-elements.

Support content in LayoutNG

Adding support for the content property was relatively easy in LayoutNG, since I could reuse the existing logic for ::before and ::after.

Roughly it was a matter of ignoring list-style-type and list-style-image in non-normal cases, and using the LayoutObject of the ContentData as the children. This was not possible in legacy, since LayoutListMarker can’t have children.

It may be worth it to summarize the different LayoutObject classes for list markers:

  • LayoutListMarker, based on LayoutBox, for legacy markers.
  • LayoutNGListMarker, based on LayoutNGBlockFlowMixin<LayoutBlockFlow>, for LayoutNG markers with an outside position.
  • LayoutNGInsideListMarker, based on LayoutInline, for LayoutNG markers with an inside position.

It’s important to note that non-normal markers were actual pseudo-elements, their LayoutNGListMarker or LayoutNGInsideListMarker were no longer anonymous, they had an originating PseudoElement in the node tree.

This means that I had to add logic for attaching, detaching and rebuilding kPseudoIdMarker pseudo-elements, add LayoutObjectFactory::CreateListMarker(), and make LayoutTreeBuilderTraversal and Node::PseudoAware* methods aware of ::marker.

Most of it was done in

Another problem that I had to address was that, until this point, both content: normal and content: none were considered to be synonymous, and were internally stored as nullptr.

However, unlike in ::before and ::after, normal and none have different behaviors in ::marker: the former decides the contents from the list-style properties, the latter prevents the ::marker from generating boxes.

Therefore, in I implemented content: none as a NoneContentData, and replaced the HasContent() helper function with the more specific ContentBehavesAsNormal() and ContentPreventsBoxGeneration().
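The difference between the three content values can be summarized like this (a minimal illustration; the selectors are made up for the example):

```css
li.a::marker { content: normal; } /* marker text derived from the list-style properties */
li.b::marker { content: "→ "; }   /* marker text comes from the content property */
li.c::marker { content: none; }   /* no marker box is generated at all */
```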

Default styles

According to the spec, markers needed to get assigned these styles in UA origin:

unicode-bidi: isolate;
font-variant-numeric: tabular-nums;

At this point, the ComputedStyle for a marker could be created in different ways:

  • If there was some ::marker selector, by running the cascade normally.
  • Otherwise, LayoutListMarker or LayoutNGListItem would create the style from scratch.

First, in I made all StyleResolver::PseudoStyleForElementInternal, LayoutListMarker::ListItemStyleDidChange and LayoutNGListItem::UpdateMarker set these UA rules.

Then in I made it so that markers would always run the cascade, unifying the logic in PseudoStyleForElementInternal. But this way of injecting the UA styles was a bit hacky and problematic.

So finally, in I implemented it in the proper way, using a real UA stylesheet. However, I took care of preventing that from triggering SetHasPseudoElementStyle, which would have defeated some optimizations.

Interestingly, these UA styles use a ::marker selector, but they also affect nested ::before::marker and ::after::marker pseudo-elements. That’s because I took advantage of a bug in the style resolver, so that I wouldn’t have to implement the nested ::marker selectors. The bug is disabled for non-UA styles.

LayoutNGListItem::UpdateMarker also had some style tweaks that I moved into the style adjuster instead of the UA sheet, because the exact styles depend on the marker:

  • Outside markers get display: inline-block, because they must be block containers.
  • Outside markers get white-space: pre, to prevent their trailing space from being trimmed.
  • Inside markers get some margins, depending on list-style-type.

I did that in and

Some fun: 99.9% performance regression

An implication of my work on the default marker styles was that the StyleType() became kPseudoIdMarker instead of kPseudoIdNone.

This made LayoutObject::PropagateStyleToAnonymousChildren() do more work, causing the flexbox_with_list_item perf test to worsen by 99.9%!

Performance graph

I fixed it in by returning early for markers with content: normal, which didn’t need that work anyway.

Once I completed the ::marker implementation, I tried reverting the fix, and then the test only worsened by 2-3%. So I guess the big regression was caused by the interaction of multiple factors, and the other factors were later fixed or avoided.

Developer tools

It was important for me to expose ::marker in the devtools just like a ::before or ::after. Not just because I thought it would be beneficial for authors, but also because it helped me a lot when implementing ::marker.

So first I made the Styles panel expose the ::marker styles when inspecting the originating list item (

Devtools ::marker styles

And then I made ::marker pseudo-elements inspectable in the Elements panel ( and

Devtools ::marker tree

However, note this only worked for actual ::marker pseudo-elements.

LayoutNG markers as real pseudo-elements

As previously stated, only non-normal markers were internally implemented as actual pseudo-elements; markers with content: normal were just anonymous boxes.

So normal markers wouldn’t appear in devtools, and would yield incorrect values in getComputedStyle:

getComputedStyle(listItem, "::marker").width; // "auto"

According to CSSOM that’s supposed to be the used width in pixels, but since there was no actual ::marker pseudo-element, it would just return the computed value: auto.

So in I implemented LayoutNG normal markers as real pseudo-elements. It’s a big patch, though mostly that’s because I had to update several test expectations.

Another advantage was that non-normal markers benefited from the much broader test coverage for normal ones. For example, some accessibility code was expecting markers to be anonymous; I noticed this thanks to existing tests with normal markers. Without this change I might have missed that non-normal ones weren’t handled properly.

And a nice side-effect that I wasn’t expecting was that the flexbox_with_list_item perf test improved by 30-40%. Nice!

It’s worth noting that until this point, pseudo-elements could only be originated by an element. However, ::before and ::after pseudo-elements can have display: list-item and thus have a nested marker.

Due to the lack of support for ::before::marker and ::after::marker selectors, I could previously assume that nested markers would have the initial content: normal, and thus be anonymous. But this was no longer the case, so in I added support for nested pseudo-elements. However, the style resolver is still not able to handle them properly, so nested selectors don’t work.

A consequence of implementing LayoutNG markers as pseudo-elements was that they became independent, they were no longer created and destroyed by LayoutNGListItem. But the common logic for LayoutNGListMarker and LayoutNGInsideListMarker was still in LayoutNGListItem, so this made it difficult to keep the marker information in sync. Therefore, in I moved the common logic into a new ListMarker class, and each LayoutNG marker class would own a ListMarker instance.

I also renamed LayoutNGListMarker to LayoutNGOutsideListMarker, since the old name was misleading.

Legacy markers as real pseudo-elements

Since I had already added the changes needed to implement all LayoutNG markers as pseudo-elements, I thought that doing the same for legacy markers would be easier.

But I was wrong! The thing is that legacy layout already had some bugs affecting markers, but they would only be triggered when dynamically updating the styles of the list item. And there aren’t many tests that do that, so they went unnoticed… until I tried my patch, which surfaced these issues in the initial layout, making some tests fail.

So first I had to fix bug 1048672, 1049633, and 1051114.

Then there was also bug 1051685, involving selections or drag-and-drop with pseudo-elements like ::before or ::after. So turning markers into pseudo-elements made them have the same problem, causing a test failure.

I could finally land my patch in, which also improved performance like in LayoutNG.

Animations & transitions

While I was still working on ::marker, the CSSWG decided to expand the list of allowed properties in order to include animations and transitions. I did so in

The tricky part was that only allowed properties could be animated. For example,

@keyframes anim {
  from { color: #c0c; background: #0cc }
  to   { color: #0cc; background: #c0c }
}
::marker {
  animation: anim 1s infinite alternate;
}
<ol><li>Non-animated text</li></ol>

Animated ::marker

Only the color of the marker is animated, not the background.

counter(list-item) inside <ol>

::before and ::after pseudo-elements already had the bug that, when referencing the list-item counter inside an <ol>, they would produce the wrong number, usually 1 unit greater.

Of course, ::marker inherited the same problem. And this was breaking one of the important use-cases, which is being able to customize the marker text with content.

For example,

::marker { content: "[" counter(list-item) "] " }

would start counting from 2 instead of 1:

::marker counter bug

Luckily, WebKit had already fixed this problem, so I could copy their solution. Unluckily, they mixed it with a big irrelevant refactoring, so I had to spend some time understanding which part was the actual fix. I ported it into Chromium in

Support content in legacy

The only missing thing to do was adding support for content in legacy layout. The problem was that LayoutListMarker can’t have children, so it’s not possible to just insert the layout object produced by the ContentData.

Then, my idea was replacing LayoutListMarker with two new classes:

  • LayoutOutsideListMarker, for markers with outside positioning.
  • LayoutInsideListMarker, for markers with inside positioning.

and they could share the ListMarker logic with LayoutNG markers.

However, when I started working on this, something big happened: the COVID-19 pandemic.


And Google decided to skip Chromium 82 due to the situation, which is relevant because, in order to be able to merge patches easily, they wanted to avoid big refactorings.

And a big refactoring is precisely what I needed! So I had to wait until Chromium 83 reached stable.

Also, Google engineers were not convinced by my proposal, because it would imply that legacy markers would use more memory and would be slower, since they would have children even with content: normal.

So I changed my strategy as such:

  • Keep LayoutListMarker for normal markers.
  • Add LayoutOutsideListMarker for non-normal outside markers.
  • Add LayoutInsideListMarker for non-normal inside markers.

This was done in this chain of CLs: 2109697, 2109771, 2110630, 2246514, 2252244, 2252041, 2252189, 2246573, 2258732.

::marker enabled by default

Finally the ::marker implementation was complete!

To summarize, list markers ended up implemented among 5 different layout classes:

  • LayoutListMarker
    • Used for normal markers in legacy layout.
    • Based on LayoutBox.
    • Can’t have children, doesn’t use ListMarker.
  • LayoutOutsideListMarker
    • Used for non-normal outside markers in legacy layout.
    • Based on LayoutBlockFlow, i.e. it’s a block container.
    • Has children, uses ListMarker to keep them updated.
  • LayoutInsideListMarker
    • Used for non-normal inside markers in legacy layout.
    • Based on LayoutInline, i.e. it’s an inline box.
    • Has children, uses ListMarker to keep them updated.
  • LayoutNGOutsideListMarker
    • Used for outside markers in LayoutNG.
    • Based on LayoutNGBlockFlowMixin<LayoutBlockFlow>, i.e. it’s a block container.
    • Has children, uses ListMarker to keep them updated.
  • LayoutNGInsideListMarker
    • Used for inside markers in LayoutNG.
    • Based on LayoutInline, i.e. it’s an inline box.
    • Has children, uses ListMarker to keep them updated.

So at this point I just landed to enable ::marker by default. This happened in Chromium 86.0.4198.0.

Allowing more properties

After shipping ::marker, I continued doing small tweaks in order to align the behavior with more recent CSSWG resolutions.

The first one was that, if you set text-transform on a list item or ancestor, the ::marker shouldn’t inherit it by default. For example,

<ol style="list-style-type: lower-alpha; text-transform: uppercase">

should have a lowercase a, not A:

::marker text-transform

Therefore, in I added text-transform: none to the ::marker UA rules, but also allowed authors to specify another value if they want to.

Then, the CSSWG also resolved that ::marker should allow inherited properties that apply to text which don’t depend on box geometry. And other properties, unless whitelisted, shouldn’t affect markers, even when inherited from an ancestor.

Therefore, I added support for some text and text decoration properties, and also for line-height. On the other hand, I blocked inheritance of text-indent and text-align.

That was done in CLs 791815, 2382750, 2388384, 2391242, 2396125, 2438413.

The outcome was that, in Chromium, ::marker accepts these properties:

  • Animation properties: animation-delay, animation-direction, animation-duration, animation-fill-mode, animation-iteration-count, animation-name, animation-play-state, animation-timeline, animation-timing-function

  • Transition properties: transition-delay, transition-duration, transition-property, transition-timing-function

  • Font properties: font-family, font-kerning, font-optical-sizing, font-size, font-size-adjust, font-stretch, font-style, font-variant-ligatures, font-variant-caps, font-variant-east-asian, font-variant-numeric, font-weight, font-feature-settings, font-variation-settings

  • Text properties: hyphens, letter-spacing, line-break, overflow-wrap, tab-size, text-transform, white-space, word-break, word-spacing

  • Text decoration properties: text-decoration-skip-ink, text-shadow, -webkit-text-emphasis-color, -webkit-text-emphasis-position, -webkit-text-emphasis-style

  • Writing mode properties: direction, text-combine-upright, unicode-bidi

  • Others: color, content, line-height

However, note that they may end up not having the desired effect in some cases:

  • The style adjuster forces white-space: pre in outside markers, so you can only customize white-space in inside ones.

  • text-combine-upright doesn’t work in pseudo-elements (bug 1060007). So setting it will only affect the computed style, and will also force legacy layout, but it won’t turn the marker text upright.

  • In legacy layout, the marker has no actual contents. So text properties, text decoration properties, unicode-bidi and line-height don’t work.

And this is the default UA stylesheet for markers:

::marker {
  unicode-bidi: isolate;
  font-variant-numeric: tabular-nums;
  text-transform: none;
  text-indent: 0 !important;
  text-align: start !important;
  text-align-last: start !important;
}

The final change, in, was the removal of the CSSMarkerPseudoElement runtime flag. Since 89.0.4358.0, it’s no longer possible to disable ::marker.


Implementing ::marker needed more than 100 patches in total, several refactorings, some existing bug fixes, and various CSSWG resolutions.

I also added lots of new WPT tests, in addition to the existing ones created by Apple and Mozilla. For every patch that had an observable improved behavior, I tried to cover it with a test. Most of them are in, though some are in css-lists, and others are Chromium-internal since they were testing non-standard behavior.

Note my work didn’t include ::before::marker and ::after::marker selectors, which haven’t been implemented in WebKit nor Firefox either. What remains to be done is making the selector parser handle nested pseudo-elements properly.

Also, I kept the disclosure triangle of a <summary> as a ::-webkit-details-marker, but since Chromium 89 it’s a ::marker as expected, thanks to Kent Tamura.

by Oriol Brufau at December 21, 2020 09:00 PM

December 03, 2020

Alberto Garcia

Subcluster allocation for qcow2 images

In previous blog posts I talked about QEMU’s qcow2 file format and how to make it faster. This post gives an overview of how the data is structured inside the image and how that affects performance, and this presentation at KVM Forum 2017 goes further into the topic.

This time I will talk about a new extension to the qcow2 format that seeks to improve its performance and reduce its memory requirements.

Let’s start by describing the problem.

Limitations of qcow2

One of the most important parameters when creating a new qcow2 image is the cluster size. Much like a filesystem’s block size, the qcow2 cluster size indicates the minimum unit of allocation. One difference however is that while filesystems tend to use small blocks (4 KB is a common size in ext4, ntfs or hfs+) the standard qcow2 cluster size is 64 KB. This adds some overhead because QEMU always needs to write complete clusters so it often ends up doing copy-on-write and writing to the qcow2 image more data than what the virtual machine requested. This gets worse if the image has a backing file because then QEMU needs to copy data from there, so a write request not only becomes larger but it also involves additional read requests from the backing file(s).

Because of that qcow2 images with larger cluster sizes tend to:

  • grow faster, wasting more disk space and duplicating data.
  • increase the amount of necessary I/O during cluster allocation,
    reducing the allocation performance.

Unfortunately, reducing the cluster size is in general not an option because it also has an impact on the amount of metadata used internally by qcow2 (reference counts, guest-to-host cluster mapping). Decreasing the cluster size increases the number of clusters and the amount of necessary metadata. This has a direct negative impact on I/O performance, which can be mitigated by caching it in RAM, therefore increasing the memory requirements (the aforementioned post covers this in more detail).

Subcluster allocation

The problems described in the previous section are well-known consequences of the design of the qcow2 format and they have been discussed over the years.

I have been working on a way to improve the situation and the work is now finished and available in QEMU 5.2 as a new extension to the qcow2 format called extended L2 entries.

The so-called L2 tables are used to map guest addresses to data clusters. With extended L2 entries we can store more information about the status of each data cluster, and this allows us to have allocation at the subcluster level.

The basic idea is that data clusters are now divided into 32 subclusters of the same size, and each one of them can be allocated separately. This allows combining the benefits of larger cluster sizes (less metadata and RAM requirements) with the benefits of smaller units of allocation (less copy-on-write, smaller images). If the subcluster size matches the block size of the filesystem used inside the virtual machine then we can eliminate the need for copy-on-write entirely.
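As a quick sanity check on the numbers (a sketch using shell arithmetic): since a data cluster is split into 32 subclusters, a 128 KB cluster gives 4 KB subclusters, which matches ext4's default block size inside the guest.

```shell
# subcluster size = cluster size / 32
# With 128 KB clusters each subcluster is 4 KB:
echo $((128 * 1024 / 32))   # prints 4096
```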

So with subcluster allocation we get:

  • Sixteen times less metadata per unit of allocation, greatly reducing the amount of necessary L2 cache.
  • Much faster I/O during allocation when the image has a backing file, up to 10-15 times more I/O operations per second for the same cluster size in my tests (see chart below).
  • Smaller images and less duplication of data.

This figure shows the average number of I/O operations per second that I get with 4KB random write requests to an empty 40GB image with a fully populated backing file.

I/O performance comparison between traditional and extended qcow2 images

Things to take into account:

  • The performance improvements described earlier happen during allocation. Writing to already allocated (sub)clusters won’t be any faster.
  • If the image does not have a backing file chances are that the allocation performance is equally fast, with or without extended L2 entries. This depends on the filesystem, so it should be tested before enabling this feature (but note that the other benefits mentioned above still apply).
  • Images with extended L2 entries are sparse, that is, they have holes and because of that their apparent size will be larger than the actual disk usage.
  • It is not recommended to enable this feature in compressed images, as compressed clusters cannot take advantage of any of the benefits.
  • Images with extended L2 entries cannot be read with older versions of QEMU.

How to use this?

Extended L2 entries are available starting from QEMU 5.2. Due to the nature of the changes it is unlikely that this feature will be backported to an earlier version of QEMU.

In order to test this you simply need to create an image with extended_l2=on, and you also probably want to use a larger cluster size (the default is 64 KB, remember that every cluster has 32 subclusters). Here is an example:

$ qemu-img create -f qcow2 -o extended_l2=on,cluster_size=128k img.qcow2 1T

And that’s all you need to do. Once the image is created all allocations will happen at the subcluster level.
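If you later want to double-check that the feature is enabled on an existing image, qemu-img info should report it under "Format specific information" (output abbreviated here; the exact fields may vary between QEMU versions):

```
$ qemu-img info img.qcow2
...
Format specific information:
    ...
    extended l2: true
```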

More information

This work was presented at the 2020 edition of the KVM Forum. Here is the video recording of the presentation, where I cover all this in more detail:

You can also find the slides here.


This work has been possible thanks to Outscale, who have been sponsoring Igalia and my work in QEMU.

Igalia and Outscale

And thanks of course to the rest of the QEMU development team for their feedback and help with this!

by berto at December 03, 2020 06:15 PM

November 29, 2020

Philippe Normand

Catching up on WebKit GStreamer WebAudio backends maintenance

Over the past few months the WebKit development team has been working on modernizing support for the WebAudio specification. This post highlights some of the changes that were recently merged, focusing on the GStreamer ports.

My fellow WebKit colleague, Chris Dumez, has been very active lately, updating the WebAudio implementation for the mac ports in order to comply with the latest changes of the specification. His contributions have been documented in the Safari Technology Preview release notes for version 113, version 114, version 115 and version 116. This is great for the WebKit project! Since the initial implementation landed around 2011, there wasn’t much activity and over the years our implementation started lagging behind other web engines in terms of features and spec compliance. So, many thanks Chris, I think you’re making a lot of WebAudio web developers very happy these days :)

The flip side of the coin is that some of these changes broke the GStreamer backends; as Chris is focusing mostly on the Apple ports, a few bugs slipped in, noticed by the CI test bots and dutifully gardened by our bot sheriffs. Those backends were upstreamed in 2012 and since then I didn’t devote much time to their maintenance, aside from casual bug-fixing.

One of the WebAudio features recently supported by WebKit is the Audio Worklet interface which allows applications to perform audio processing in a dedicated thread, thus relieving some pressure off the main thread and ensuring a glitch-free WebAudio rendering. I added support for this feature in r268579. Folks eager to test this can try the GTK nightly MiniBrowser with the demos:

$ wget
$ chmod +x webkit-flatpak-run-nightly
$ python3 webkit-flatpak-run-nightly --gtk MiniBrowser

For many years our AudioFileReader implementation was limited to mono and stereo audio layouts. This limitation was lifted in r269104, allowing for processing of up to 5.1 surround audio files in the AudioBufferSourceNode.

Our AudioDestination, used for audio playback, was only able to render stereo. It is now able to probe the GStreamer platform audio sink for the maximum number of channels it can handle, since r268727. Support for AudioContext getOutputTimestamp was hooked up in the GStreamer backend in r266109.

The WebAudio spec has a MediaStreamAudioDestinationNode for MediaStreams, allowing audio samples coming from the WebAudio pipeline to be fed to outgoing WebRTC streams. Since r269827 the GStreamer ports support this feature as well! Similarly, incoming WebRTC streams or capture devices can stream their audio samples to a WebAudio pipeline; this has been supported for a couple of years already, contributed by my colleague Thibault Saunier.

Our GStreamer FFTFrame implementation was broken for a few weeks, while Chris was landing various improvements for the platform-agnostic and mac-specific implementations. I finally fixed it in r267471.

This is only the tip of the iceberg. A few more patches were merged, including some security-related bug-fixes. As the Web Platform keeps growing, supporting more and more multimedia-related use-cases, we, at the Igalia Multimedia team, are committed to maintain our position as GStreamer experts in the WebKit community.

by Philippe Normand at November 29, 2020 12:45 PM

November 26, 2020

Víctor Jáquez

Notes on using Emacs (LSP/ccls) for WebKit

I used to regard myself as an austere programmer in terms of tooling: Emacs —with a plain configuration— and grep. This approach forces you to understand all the elements involved in a project.

Some time ago I had to code in Rust, so I needed to learn the language as fast as possible. I looked for packages in MELPA that could help me to be productive quickly. Obviously, I installed rust-mode, but I also found racer for auto-completion. I tried it out. It was messy to set up and unstable, but it helped me to code while learning. When I felt comfortable with the base code, I uninstalled it.

This year I returned to work on WebKit. The last time I contributed to it was around five years ago, but now in a different area (still in the multimedia stack). WebKit is huge, and because of C++, I found gtags rather limited. Out of curiosity I looked for something similar to racer but for C++, and I spent a while digging into it.

The solution consists in the integration of three MELPA packages:

  • lsp-mode: a client for Language Server Protocol for Emacs.
  • company-mode: a text completion framework.
  • ccls: a C/C++ language server. Besides, emacs-ccls adds more functionality to lsp-mode.

(I know, there's a simpler alternative to lsp-mode, but I haven't tried it yet).

First, let's explain what LSP is. It stands for Language Server Protocol, a protocol defined in terms of JSON-RPC messages exchanged between the editor and the language server. It was originally developed by Microsoft for Visual Studio Code; its purpose is to support auto-completion, finding a symbol's definition, showing early error markers, etc., inside the editor. Therefore, lsp-mode is an Emacs mode that communicates with different language servers over LSP and operates in Emacs accordingly.
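For illustration, a "go to definition" request travels over LSP as a JSON-RPC message shaped like the one below (a hand-written example, not captured from a real session; the file path and position are made up, and on the wire each message is preceded by a Content-Length header):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "textDocument/definition",
  "params": {
    "textDocument": { "uri": "file:///home/user/project/main.cpp" },
    "position": { "line": 42, "character": 17 }
  }
}
```

The server answers with the location (URI plus range) of the definition, which the editor then jumps to.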

In order to support the auto-completion use-case, lsp-mode uses company-mode. This Emacs mode is capable of creating a floating context menu at the position of the editing cursor.

The third part of the puzzle is, of course, the language server. There are language servers for many programming languages. For C and C++ there are two servers: clangd and ccls. The former uses the Clang compiler; the latter can use Clang, GCC or MSVC. Throughout this text ccls will be used, for reasons exposed later. In between, emacs-ccls leverages and extends the support for ccls in lsp-mode, though it's not mandatory.

In short, the basic .emacs configuration, using use-package, would have these lines:

(use-package company
  :config (global-company-mode 1))

(use-package lsp-mode
  :diminish "L"
  :init (setq lsp-keymap-prefix "C-l"
              lsp-enable-file-watchers nil
              lsp-enable-on-type-formatting nil
              lsp-enable-snippet nil)
  :hook (c-mode-common . lsp-deferred)
  :commands (lsp lsp-deferred))

(use-package ccls
  :init (setq ccls-sem-highlight-method 'font-lock)
  :hook ((c-mode c++-mode objc-mode) . (lambda () (require 'ccls) (lsp-deferred))))

The snippet first configures company-mode. It is enabled globally because, normally, it is a nice feature to have, even in non-coding buffers, such as this very one, for writing a blog post in markdown format. Diminish mode hides or abbreviates the mode description in the Emacs’ mode line.

Then comes lsp-mode. It's big and aims to do a lot of things, so basically we have to tell it to disable certain features: the file watcher, something not viable in massive projects such as WebKit; snippets (generic text templates), which I don't use; and on-type formatting, since lsp-mode tries to format the code as you type, and in my experience the code style is always detected wrongly, so I disabled that too. Finally, lsp-mode is launched when a buffer uses c-mode-common, shared by c++-mode too. lsp-mode is launched deferred, meaning it won't start up until the buffer is visible; this is important since we want to delay the ccls session creation until the buffer's .dir-locals.el file is processed, where it is configured for the specific project.

And lastly, the ccls configuration, hooked when c-mode, c++-mode or objc-mode are loaded up, in a deferred fashion (already explained).

It’s important to understand how ccls works in order to integrate it into our workflow for a specific project, since it might need to be configured using Emacs’ per-directory local variables.

We are living in a post-Makefile world (almost), and proof of that is ccls, which instead of a Makefile uses a compilation database: a record of the compile options used to build the files in a project. It’s commonly described in JSON and it’s generated automatically by build systems such as meson or cmake, and later consumed by ninja or ccls to execute the compilation. Bear in mind that ccls uses a cache, which can eat a couple of gigabytes of disk.
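As an illustration, a compilation database (conventionally a file named compile_commands.json) is just a JSON array with one entry per translation unit; the paths and flags below are made up, WebKit's real database is generated by its build system:

```json
[
  {
    "directory": "/app/webkit/WebKitBuild/Release",
    "command": "clang++ -std=c++17 -I../../Source/WTF -c ../../Source/WTF/wtf/ExampleFile.cpp -o ExampleFile.o",
    "file": "../../Source/WTF/wtf/ExampleFile.cpp"
  }
]
```

With this record, ccls can re-parse every file with exactly the flags the build used, which is what makes its index accurate.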

Now, let’s review the concrete details of using these features with WebKit. Let me assume that WebKit local repository is cloned in ~/WebKit.

As you may know, the cool way to compile WebKit is with flatpak. Flatpak adds an indirection in the compilation process, since it’s done in an isolated environment, above the native system. As a consequence, ccls has to be the one inside the Flatpak environment. In ~/.local/bin/webkit-ccls:

#!/bin/sh
set -eu
cd "$HOME/WebKit/"
exec Tools/Scripts/webkit-flatpak -c ccls "$@"

Basically the script calls ccls inside Flatpak, where it is available in the SDK. And this is why ccls instead of clangd: clangd is not provided there.

By default ccls assumes the compilation database is in the project’s root directory, but in our case, it’s not, thus it is required to configure the database directory for our WebKit setup. For it, as we already said, a .dir-locals.el file is used.

((c-mode . ((indent-tabs-mode . nil)
            (c-basic-offset . 4)))
 (c++-mode . ((indent-tabs-mode . nil)
              (c-basic-offset . 4)))
 (objc-mode . ((indent-tabs-mode . nil)
               (c-basic-offset . 4)))
 (change-log-mode . ((indent-tabs-mode . nil)))
 (nil . ((fill-column . 100)
         (ccls-executable . "/home/vjaquez/.local/bin/webkit-ccls")
         (ccls-initialization-options . (:compilationDatabaseDirectory "/app/webkit/WebKitBuild/Release"
                                         :cache (:directory ".ccls-cache")))
         (compile-command . "build-webkit --gtk --debug"))))

As you can notice, ccls-executable is defined here, though it’s not a safe local variable, and so is ccls-initialization-options, which is a safe local variable. It is important to notice that the compilation database directory is a path inside Flatpak, and that it always uses the Release path. I don’t understand why, but the Debug path didn’t work for me. This means that WebKit should be compiled as Release frequently, even if we only use the Debug type for coding (as you may see in my compile-command).

Update: Now we can explain why it’s important to configure lsp-mode as deferred: to avoid connecting to ccls before the .dir-locals.el file is processed.

And that’s all. Now I have early programming errors detection, auto-completion, and so on. I hope you find these notes helpful.

Update: Sadly, because of the Flatpak indirection, symbol definition lookup won’t work, because the file paths stored in the ccls cache are relative to Flatpak’s file system. For that I still rely on GNU Global and its Emacs mode.

by vjaquez at November 26, 2020 04:20 PM

November 22, 2020

Eleni Maria Stea

FOSSCOMM 2020, and a status update on EXT_external_objects(_fd) extensions [en, gr]

FOSSCOMM (Free and Open Source Software Communities Meeting) is a Greek conference aiming to promote the use of FOSS in Greece and to bring FOSS enthusiasts together. It is organized entirely by volunteers and universities and takes place in a different city each year. This year it was virtual as Greece is under lockdown, and …

by hikiko at November 22, 2020 06:11 PM

November 20, 2020

Paulo Matos

A tour of the for..of implementation for 32bits JSC

We look at the implementation of the for-of intrinsic in 32bit JSC (JavaScriptCore).


by Paulo Matos at November 20, 2020 02:00 PM

Maksim Sisov

Chrome/Chromium on Wayland: The Waylandification project.

It has been a long time since I wrote my last blog post, and since I wrote about something that I and my colleagues at Igalia have been working on for the past 4 years. I have been postponing writing this post, waiting until something big happened. Well, something big just happened…

If you already know what Ozone is, then I am happy to tell you that Chromium for Linux includes Ozone by default now and can be enabled with runtime command line flags. If you are interested in trying Chrome/Chromium with native Wayland support, you are encouraged to download Google Chrome for developers and try Ozone/Wayland by running the browser with the following command line flags: --enable-features=UseOzonePlatform --ozone-platform=wayland.

If you don’t know what Ozone is, here’s a brief explanation, before I talk about the history, status and design of this effort.

What is Ozone?

The very first thing that one may think about when they hear “Ozone” is the gas or a thin layer of the Earth’s atmosphere. Well… it is partly correct. In the case of Chromium, it is a platform abstraction layer.

I will not go into many details, but here is the description of that layer from Chromium’s documentation about Ozone –
“Ozone is a platform abstraction layer beneath Aura, Chromium’s platform independent windowing system, that is used for low level input and graphics. Once complete, the abstraction will support underlying systems ranging from embedded SoC targets to new X11-alternative window systems on Linux such as Wayland or Mir to bring up Aura Chromium by providing an implementation of the platform interface.”.
If you are interested in more details, you are welcome to read the project’s documentation at

The Summary of the Design of Ozone/Wayland

It has been a long time since Antonio Gomes started to work on this project. It started as a research project for our customer – Renesas Electronics, and was based on a former abstraction project with another clever name, “mus+ash” (pronounced “mustache”, you can read more about that here – Chromium, ozone, wayland and beyond).

Since that time, the project has been moved to downstream and back to upstream (because of some unknowns related to the “mus+ash”) and the design of Ozone integration has also been changed.

Currently, the Aura/Platform classes are injected into the Browser process and communicate directly with the underlying Ozone platforms including Wayland. In the browser process, Wayland creates a connection with a Wayland compositor, while in the GPU process, it only draws pixels into the created DMABUFs and neither receives events nor creates surfaces.

Migrating Away From X11-only Legacy Backend.

It is worth mentioning that Igalia has been working on both Ozone/X11 and Ozone/Wayland.

Since June 2020, we have been working on switching Ozone for Linux from needing to be chosen at compile time to being choosable at runtime. At the moment, one can try Ozone by running Chrome downloaded from the development channel with the --enable-features=UseOzonePlatform --ozone-platform=wayland/x11 runtime flags.

That approach allows us to gather a bigger audience of users willing to test the Ozone capabilities, and also to achieve better feature parity between non-Ozone/X11 and Ozone/X11/Wayland.

That is, most of the features and code paths are shared between the two implementations, and the paths that are not compatible with Ozone are being refactored at the moment.

Once all the incompatible paths are refactored (just a few of them remain) and all the available test suites are enabled on the Linux/Ozone bot, we will start what is known as a “finch trial”. This allows Ozone to be enabled by default for some percentage of users (about 10%). If the finch trial goes well, the percentage of users will be gradually increased to 100% and we will start removing the old non-Ozone/X11 implementation.

Wayland + Tab Drag

If you’ve been trying it out, you might have already noticed that Ozone/Wayland does not support the Tab Drag feature well enough. The problem is the lack of the protocol for this feature.

At the moment, my colleague Nick Diego is working on the definition of the protocol for tab drag and implementation of that in Chromium.

Unfortunately, Ozone will fall back to X11/XWayland for compositors that do not support the aforementioned protocol. However, as more and more compositors gain support for it, Chrome will whitelist those compositors.

I won’t go into the details of that effort in this blog post; I’ll just leave a link to the design document that Nick has created – Tab Dragging on Ozone/Wayland.


This blog post was rather a brief summary of the design, feature, and status of the project. Thank you for reading it. Hopefully, when we start a finch trial, I will write another blog telling you how it goes. Bye.

by msisov at November 20, 2020 11:28 AM

Brian Kardell

Open Prioritization First Experiment Wrap Up

Open Prioritization First Experiment Wrap Up

Earlier this year, Igalia launched an experiment called "Open Prioritization" which, effectively, lets us pull money together to prioritize work on a feature in web browsers. In this piece I'll talk about the outcome, lessons learned along the way and next steps.

Our Open Prioritization experiment was a pretty big idea. On the surface, it seems to be asking a pretty simple question: "Could we crowdfund some development work on browsers?" However, it was quite a bit more involved in its goals, because there is a lot more hiding behind that question than meets the eye, and most of it is kind of difficult to talk about in purely theoretical ways. I'll get into all of that in a minute, but let's start with the results...

One project advances: :focus-visible

We began the experiment with six possible things we would try to crowdfund, and I'm pleased to say that one will advance: :focus-visible in WebKit.

We are working with Open Collective on next steps, as this involves some decision making on how we manage future experiments, and a bigger idea too. However, very soon this will shift from a pledged collective, which just asked "would you financially support this project if it were to be offered?", to a proper way to collect funds for it. If you pledged, you will be contacted when it's ready, with information on how to fulfill your pledge. We will also write a post when that happens, as it's likely that at least some people will not come back and fulfill their pledge.

As soon as this begins and enough funds are available, it will enter our developers' work queue and, as staff frees up, they will shift to begin work on implementing this in WebKit!

We did it! We at Igalia would like to say a giant "thank you" for all of those who helped support these efforts in improving our commons.


Let's talk about some of those bigger ideas this experiment was aiming to look at, and lessons we learned along the way, in retrospect...

  • Resources are finite. Prioritization is hard. No matter how big the budget, resources are still finite and work has to be prioritized. Even with only a few choices on the table to choose from, it's not necessarily easy or plain what to choose because there isn't a "right" answer.

  • There are reasonable arguments for prioritizing different things. The two finalists both had strong arguments from different angles, but even a step back, at least some people chose to pledge to something else. Some thought that supporting SVG path syntax in CSS was the best choice, and pledged only to that one. Most of the candidates were final implementations, but this one wasn't. Some people thought that advancing new things that no one else seems to be advancing was the way to go. Others supported it because they thought it was really important to boost projects that help Mozilla. There just weren't enough people either seeing or agreeing with that weighting of things.

  • Cost is a factor. It's not an exclusive factor – the cheapest option by far (SVG Path in CSS/Mozilla) was eliminated early on. There are other reasons :focus-visible made some giant leaps too but, at the end of the day, the bar was also just lower. The second place project never actually managed to pull ahead, despite having more actual pledged dollars at one point.

  • Investing with uncertainty is especially hard. Just last week, Twitter exploded with excitement that Google was going to prototype some high level stuff with Container Queries. Fundamental to Google's intent is CSS containment in a single direction. CSS does not currently define containment in a single direction, but it does define the containment module where it would be defined. Our containment project was, in part, trying to lay some potential groundwork here. When we launched the project, I wrote about this: WebKit doesn't currently support the containment that is defined already and is a necessary prerequisite of any proposal involving that approach. The trouble is: we don't know if that will be the approach, and supporting it is a big task. Building a high level solution on the magic in our switch proposal, for example, doesn't require containment at all. Adding general containment support was the most expensive project on our list, by far. In fact, we could have done a couple of the others for that price. This makes the value proposition of that work very speculative. Despite being potentially critically valuable for the single biggest/longest ask in CSS history, that project didn't make the finals when we put it to the public either.

  • Some things are difficult to predict. Going into this, I didn't know what to expect. A single viral tweet and a mass of developers pitching in $1 or $2 could, in theory, have funded any of these in hours. While I didn't expect that, I did kind of expect some amount of funds in the end would be of that sort. Interestingly, that didn't happen. At all. Despite lots of effort trying to get lots of people to pledge very small amounts – even asking specifically, and making it possible to do with a tweet – very, very few did (literally one person on the winning project pledged less than five dollars). The most popular pledge was $20, with about a quarter of the pledges being over $50, and going up from there.

  • Matching funds are a really big deal. You can definitely see why fundraisers stress this. For the duration of this experiment, we saw periods of little actual movement, despite lots of tweets about it, likes and blog posts. There were a few giant leaps, and they all involved offers of matching dollars. Igalia ourselves, The A11Y Project and AMPHTML all had some offer of matching dollars that really seemed to inspire a lot more participation. The bigger the matching dollars available, the bigger the participation was.

  • Communication is hard. These might not have been the most ideal projects, in some respects. This last bullet is complicated enough that I'll give it its own section.

Lessons learned: Communication challenges

While I am tremendously happy that inert and :focus-visible were our finalists and both did very well, I am biased. I helped advocate for and specify these two features before I came to Igalia, working with some Googlers who also did the initial implementations. I also advocated for them to be included in the list of projects we offered. However, I failed to anticipate that the very reasons I did both of these would present challenges for the experiment, so I'd like to talk about that a bit...

Unfortunately, a confluence of things led to a lot of chatter and blog posts which were effectively saying something along the lines of "Developers shouldn't have to foot the bill because Apple doesn't care about accessibility and refuses to implement something. They don't care, and this is the proof: they are the last ones not to implement it." I wound up having a lot of conversations trying to correct the various misunderstandings here. That's not everyone else's fault, it's mine. I should have taken more time to communicate these things clearly, but for the record, nothing about this is really correct, so let me take the time to add some clarity for posterity...

  • On last implementations The second implementations only recently began or were completed in Firefox, and one of those was also done by Igalia. It seems really unfortunate, and not exactly fair, to suggest that being a few weeks/months behind, especially when the second implementation came from outside help, is really an indictment. It's not. As an example, in the winning project, Chromium shipped this by default in October 2020, and Firefox is right now pending a default release. Keep in mind that vendors don't have perfect insight into what is happening in other browsers, and even if they did, reallocating resources isn't something done on a whim: different browsers have different people with different skills and availability at any given point in time.

  • On refusal to implement This is 100% incorrect. I want to really stress this: Every item on our list comes from the list of things that are 'wants' from vendors themselves that need prioritization and are among the things they will be considering taking up next. If not funded here, it will definitely still get done - it's just impossible to say when really, and whatever priority they give it, they can't give to something else. This experiment gives us a more definite timeframe and frees them to spend that on implementing something else.

  • On web developers shouldn't have to foot the bill. Well, if you mean contributing dollars directly in crowdfunding in order to get the feature, we absolutely don't (see the above bullet). However, generally speaking, this was in fact part of the conversation we wanted to start. Make no mistake: you are paying today, indirectly – and the actual investment back into the commons is inefficient and non-guaranteed. It's wonderful that three organizations have seemed to foot the bill for decades, but starting a conversation about whether that model is sustainable is definitely part of the goal here.

  • On "Apple doesn't care about accessibility" This one makes me really sad, not only because I know it isn't true and it seems easy to show otherwise, but also because there are some really great people from Apple like James Craig who absolutely not only care very deeply but often help lead on important things.

  • On "it's wrong to crowdfund accessibility features" Unfortunately, it seems the very things that drew me to work on these in the first place wound up working against us a little: both inert and :focus-visible are interesting because they are "core features" of the platform that are useful to everyone. However, they are designed to sit at an intersection where they happily have really out-sized impact for accessibility. There are good polyfills for both of these which work well and somewhat reduce the degree of 'urgency'. I really thought that this made for a nice combination of interests/pains that might lead to good partnerships of investment where, yes, I imagined that perhaps some organizations interested in advancing the accessibility end of things, and who have historically contributed their labors, might see value in contributing more directly. Perhaps this wasn't as wise or obviously great as I imagined.

Wrapping up

All in all, in the end, despite some rocky communications, we are really encouraged by this first experiment. Thank you to everyone who pledged, boosted, blogged about the effort, etc. We're really looking forward to taking this much further next year, and we'd like to begin by asking you to share which specific projects you'd be interested in seeing or supporting in the future. Hit us up on @briankardell or @igalia.

November 20, 2020 05:00 AM

November 14, 2020

Eleni Maria Stea

A hack to display the Vulkan CTS tests output

Vulkan conformance tests for graphics drivers save their output images inside an XML file called TestResults.qpa. As binary outputs aren’t allowed, these output images (that would be saved as PNG otherwise) are encoded to text using Base64 and the result is printed between <Image></Image> XML tags. This is a problem sometimes, as external tools are …

by hikiko at November 14, 2020 03:20 AM

November 13, 2020

Alexander Dunaev

HiDPI support in Chromium for Wayland

It all started with this bug. The description sounded humble and harmless: the browser ignored some command line flag on Wayland. A screenshot was attached where it was clearly seen that Chromium (version 72 at that time, 2019 spring) did not respect the screen density and looked blurry on a HiDPI screen.

HiDPI literally means small pixels. It is hard to tell now what the first HiDPI screen was, but I assume that their wide recognition came around 2010 with Apple’s Retina displays. Ultra HD had been standardised in 2012, defining the minimum resolution and aspect ratio for what today is known informally as 4K—and 4K screens for laptops have pixels that are small enough to call it HiDPI. This Chromium issue, dated 2012, says that the Linux port lacks support for HiDPI while the Mac version has it already. On the other hand, HiDPI on Windows was tricky even in 2014.

‘That should be easy. Apparently it’s upscaled from low resolution. Wayland allows setting scale for the back buffers, likely you’ll have to add a single call somewhere in the window initialisation’, a colleague said.

Like many stories that begin this way, this turned out to be wrong. It was not so easy. Setting the buffer scale did the right thing indeed, but it was absolutely not enough. It turned out that support for HiDPI screens was entirely missing in our implementation of the Wayland client. On my way to the solution, I have found that scaling support in Wayland is non-trivial and sometimes confusing. Since I finished this work, I have been asked a few times about what happens there, so I thought that writing it all down in a post would be useful.


Modern desktop environments usually allow configuring the scale of the display at the global system level. This lets all standard controls and window decorations be sized proportionally. For applications that use those standard controls, this is a happy end: everything will be scaled automatically. Those which prefer doing everything themselves have to get the current scale from the environment and adjust their rendering. Chromium does exactly that: internally it has a so-called device scale factor. This factor is applied equally to all sizes and locations, and when rendering images and fonts. No other code ever has to bother: it works within this scaled coordinate system, known as device independent pixels, or DIP. The device scale factor can take fractional values like 1.5 but, because it is applied at the rendering stage, the result looks nice. The system scale is used as the default device scale factor, and the user can override it using the command line flag named --force-device-scale-factor. However, this is the very flag which did not work in the bug mentioned at the beginning of this story.

Note that for X11 the ‘natural’ scale is still physical pixels. Despite having the system-wide scale, the system talks to the application in pixels, not in DIP. It is the application that is responsible for handling the scale properly. If it does not, it will look perfectly sharp, but its details will be perhaps too small for the naked eye.

However, Wayland does it a bit differently. The system scale there is respected by the compositor when pasting buffers rendered by clients. So, if some application has no idea about the system scale and renders itself normally, the compositor will upscale it.  This is what originally happened to Chromium: it simply drew itself at 100%, and that image was then stretched by the system compositor. Remember that the Wayland way is giving a buffer to each application and then compositing the screen from those buffers, so this approach of upscaling buffers rendered by applications is natural. The picture below shows what that looks like. The screenshot is taken on a HiDPI display, so in order to see the difference better, you may want to see the full version (click the picture to open).

What Chromium looked like when it did not set its back buffer scale

Firefox (left) vs. Chromium (right)

How do Wayland clients support HiDPI then?

Level 1. Basic support

Each physical output device is represented at the Wayland level by an object named output. This object has a special integer property named buffer scale that tells literally how many physical pixels are used to represent the single logical pixel. The application’s back buffer has that property too. If scales do not match, Wayland will simply scale the raster image, thus emulating the ‘normal DPI’ device for the application that is not aware of any buffer scales.

The first thing the window is supposed to do is to check the buffer scale of the output that it currently resides at, and to set the same value to its back buffer scale. This will basically make the application use all available physical pixels: as the scales of the buffer and the output are the same, Wayland will not re-scale the image.

Back buffer scale is set but rendering is not aware of that

Chromium now renders sharp image but all details are half their normal size

The next thing is fixing the rendering so it would scale things to the right size.  Using the output buffer scale as default is a good choice: the result will be ‘normal size’.  For Chromium, this means simply setting the device scale factor to the output buffer scale.

Now Chromium looks right

All set now

The final bit is slightly trickier.  Wayland sends UI events in DIP, but expects the client to send surface bounds in physical pixels. That means that if we implement something like interactive resize of the window, we will also have to do some math to convert the units properly.
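The math itself is simple; a minimal sketch of the two conversions, with hypothetical helper names (the buffer scale comes from the current output, and fractional values only appear for Chromium's device scale factor, not Wayland's integer buffer scale):

```javascript
// DIP -> physical pixels: what the client reports to the compositor,
// e.g. as surface bounds during interactive resize.
function dipToPhysical(dip, scale) {
  return Math.round(dip * scale);
}

// Physical pixels -> DIP: the coordinate system UI events arrive in.
function physicalToDip(px, scale) {
  return px / scale;
}

// E.g. a window resized to 640x480 DIP on a scale-2 output must report
// 1280x960 px bounds back to the compositor.
```

Getting one of these directions wrong typically shows up as a window that jumps to half or double its size during resize on a HiDPI output.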

This is enough for the basic support.  The application will work well on a modern laptop with 4K display.  But what if more than a single display is connected, and they have different pixel density?

Level 2. Multiple displays

If there are several output devices present in the system, each one may have its own scale. This makes things more complicated, so a few improvements are needed.

First, the window wants to know that it has been moved to another device.  When that happens, the window will ask for the new buffer scale and update itself.

Second, there may be implementation-specific issues. For example, some Wayland servers initially put the new sub-surface (which is used for menus) onto the default output, even if its parent surface has been moved to another output.  This may cause weird changes of their scale during their initialisation.  In Chromium, we just made it so the sub-surface always takes its scale from the parent.

Level 3? Fractional scaling?

Not really. Fractional scaling is basically ‘non-even’ scales like 125%. The entire feature was somewhat controversial when it was announced, because of how rendering in Wayland is performed: a non-even scale inevitably uses raster operations, which make the image blurry. However, all that is transparent to the applications. Nothing new has been introduced at the level of the Wayland protocols.


Although this task was not as simple as we thought, in the end it turned out not to be too hard. Check the output scale, set the back buffer scale, scale the rendering, and translate physical pixels to DIP and vice versa at certain points. Pretty straightforward, and if you are trying to do something similar, I hope this post helps you.

The issue is that there are many implementations of Wayland servers out there, not all of them are consistent, and some of them have bugs. It is worth testing the solution on a few distinct Linux distributions and looking for discrepancies in behaviour.

Anyway, Chromium with native Wayland support has recently reached beta—and it supports HiDPI! There may be bugs too, but the basic support should work well. Try it, and let us know if something is not right.

Note: the Wayland support is so far experimental. To try it, you would need to launch chrome via the command line with two flags:

by adunaev at November 13, 2020 10:10 AM

November 08, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 10: Reusing a Vulkan stencil buffer from OpenGL

This is 10th post on OpenGL and Vulkan interoperability with EXT_external_objects and EXT_external_objects_fd. We’ll see the last use case I’ve written for Piglit to test the extensions implementation on various mesa drivers as part of my work for Igalia. In this test a stencil buffer is allocated and filled with a pattern by Vulkan and … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 10: Reusing a Vulkan stencil buffer from OpenGL

by hikiko at November 08, 2020 10:07 PM

November 05, 2020

Iago Toral

V3DV + Zink

During my presentation at the X Developers Conference I stated that we had been mostly using the Khronos Vulkan Conformance Test suite (aka Vulkan CTS) to validate our Vulkan driver for Raspberry Pi 4 (aka V3DV). While the CTS is an invaluable resource for driver testing and validation, it doesn’t exactly compare to actual real world applications, and so, I made the point that we should try to do more real world testing for the driver after completing initial Vulkan 1.0 support.

To be fair, we had been doing a little bit of this already when I worked on getting the Vulkan ports of all 3 Quake game classics to work with V3DV, which allowed us to identify and fix a few driver bugs during development. The good thing about these games is that we could get the source code and compile them natively for ARM platforms, so testing and debugging was very convenient.

Unfortunately, there are not many Vulkan applications and games like these that we can easily test and debug on a Raspberry Pi as of today, which posed a problem. One way to work around this limitation, suggested after my presentation at XDC, was to use Zink, the OpenGL-to-Vulkan layer in Mesa. Using Zink, we can take existing OpenGL applications that are currently available for Raspberry Pi and use them to test our Vulkan implementation a bit more thoroughly, expanding our options for testing while we wait for the Vulkan ecosystem on Raspberry Pi 4 to grow.

So last week I decided to get hands on with that. Zink requires a few things from the underlying Vulkan implementation depending on the OpenGL version targeted. Currently, Zink only targets desktop OpenGL versions, so that limits us to OpenGL 2.1, which is the maximum version of desktop OpenGL that Raspberry Pi 4 can support (we support up to OpenGL ES 3.1 though). For that desktop OpenGL version, Zink required a few optional Vulkan 1.0 features that we were missing in V3DV, namely:

  • Logic operations.
  • Alpha to one.
  • VK_KHR_maintenance1.

The first two were trivial: they were already implemented and we only had to expose them in the driver. Notably, when I was testing these features with the relevant CTS tests I found a bug in the alpha to one tests, so I proposed a fix to Khronos which is currently in review.

I also noticed that Zink implicitly requires support for timestamp queries, so I implemented that in V3DV as well, and then wrote a patch for Zink to handle this requirement better.

Finally, Zink doesn’t use Vulkan swapchains; instead, it creates presentable images directly, which was problematic for us because our platform needs to handle allocations for presentable images specially, so a patch for Zink was also required to address this.

As of the writing of this post, all this work has been merged in Mesa, and it enables Zink to run OpenGL 2.1 applications over V3DV on Raspberry Pi 4. Here are a few screenshots of Quake3 taken with the native OpenGL driver (V3D), with the native Vulkan driver (V3DV), and with Zink (over V3DV). There is a significant performance hit with Zink at present, although that is probably not unexpected at this stage; otherwise it seems to be rendering correctly, which is what we were really interested in seeing:

Quake3 Vulkan renderer (V3DV)

Quake3 OpenGL renderer (V3D)

Quake3 OpenGL renderer (Zink + V3DV)

Note: you’ll notice that the Vulkan screenshot is darker than the OpenGL versions. As I reported in another post, that is a feature of the Vulkan port of Quake3 and is unrelated to the driver.

Going forward, we expect to use Zink to test more applications and hopefully identify driver bugs that help us make V3DV better.

by Iago Toral at November 05, 2020 10:14 AM

Brian Kardell

All Them Switches: Responsive Elements and More

All Them Switches: Responsive Elements and More

In this post I'll talk about developments along the way to a 'responsive elements' proposal (aka container queries/element queries use cases) that I talked about earlier this year, a brief detour along the way, and finally, ask for your input on both...

I've been talking a lot this year about the web ecosystem as a commons, its health, and why I believe that diversifying investment in it is both important and productive1,2,3,4,5. Collectively, at Igalia, we believe this, and we choose to invest in the commons ourselves too. We try to apply our expertise toward helping solve hard problems that have been long stuck, trying to listen to developers and do things that we believe can help further what we think are valuable causes. I'd like to tell you the story of one of those efforts, which became two - and to enlist your input.

Advancing the Container Queries Cause

As you may recall, back in February I posted an article explaining that we had been working on this problem a bunch, sharing our thoughts and progress, and just letting people know that something is happening... People are listening, and trying. I also shared that our discussions prompted David Baron's work toward another possible path.

We wanted to present these together, so by late April we both made informal proposals to the CSS working group of what we'd like to explore. Ours was to begin with a switch() function in CSS, focused on slotting into the architecture of CSS in a way that allows us to solve currently impossible problems. If we can show that this works and make progress in all of the engines, sugaring an even higher-level expression becomes possible, and we deliver useful value fast too.

Neither the CSS Working Group nor anyone involved in any of the proposals is arguing that these are an either/or choice. We are pursuing options and answering questions, together. We all believe that working this problem from both ends has definite value in both the short and the long term, and the efforts are mutually supportive. We are also excited by Miriam Suzanne's recent work. They are almost certainly complementary and may even wind up helping each other with different cases.

Shortly after we presented our idea, Igalia also said that we would be willing to invest time to try to do some prototyping and implementation exploration and report back.

Demos and Updates

My colleague Javi Fernadez agreed to tackle initial implementation investigations with some down time he had. Initially, he made some really nice progress pretty quickly, coming up with a strategy, writing some low-level tests and getting them passing. But, then the world got very... you know... hectic.

However, I'm really happy to announce today that we have recently completed enough to share, and to say that we'll be able to take this experience back to the CSSWG pretty soon.

The following demos represent research and development. The implementation is limited, not yet standard, and was done for the purposes of investigation, discussion, and answering questions necessary for implementers. It is, nevertheless, real, functioning code.

A running demo in a build of Chromium of an image grid component designed independently from page layout, which uses the proposed CSS switch() function to declaratively, responsively change the grid-template-columns that it uses based on the size available to it.

Cool, right? Here's a short "lightning talk" style presentation on it with some more demos too (note that the bit of jank you see is dropped frames from my recording; rendering is very fluid, as in the version embedded above)...

So - I think this is pretty exciting... What do you think? Here are answers to some questions I know people have

Why a function and not an MQ or pseudo?
My post from February and the proposal explain that this is not an "instead of", but rather a "simpler and powerful step in breaking down the problem, which is luckily also a very useful step on its own". The things we ultimately want, and tend to discuss, are full of several hard problems, not just one. It's very hard to discuss numerous hypotheticals all at once, especially if they represent enormous amounts of work, and to imagine how they slot together in the existing CSS architecture. That's not to say we shouldn't still try that too; it's just that the path for one is more definite and known. Our proposal, we believe, neatly slots out a few of the hardest problems in CSS and provides an achievable answer we can make fast progress on in all engines, lessening the number of open questions. This could allow us both to take on higher-level sugar next and to fill that gap in user-land until we do. Breaking down problems like this is probably a thing you have done on your own engineering projects. It makes sense.
Why is it called inline available size?
The short answer is that that is accurately what it represents internally, and there are good arguments for it that I'll save for a more detailed post if this carries on. But don't get hung up on the name: we haven't begun bikeshedding the details of how you write the function, and it will change. In fact, these demos use a slightly different format than our proposal because it was easier to parse. Don't get hung up on that either.
Where can you use switch?
You can use anything anywhere, but it will only be valid and meaningful in certain places. The function will only provide an available-inline-size value to switch on in places that the CSS WG agrees to list. Sensibly, what you can say is that these will never include things that could create circularities because they are used in determining the available size. So, you won't be able to change the display, or the font, with a switch() that depends on available-inline-size, but anything that changes properties of a layout module or paint is probably fair game. CSS could make other switchable values available for other properties, though.
Why doesn't it use min-width/max-width style like media queries?
Media Queries L4 supports these examples, we just wanted to show you could. You could just as easily use min-width/max-width here!

Bonus Round: Switching gears...

Shortly after we made our switch proposal, my friend Jon Neal opened a github issue based on some twitter conversations. For the next week or two this thread was very busy with lots of function proposals that looked vaguely "switch-like". In fact, a number of them were also using the word "switch". From these, there are 3 ideas which seem interesting, look somewhat similar, but have some (very) importantly different characteristics, challenges and abilities. They are described below.


The first of these, nth-value(), is a function which lets a variable represent an index into a list of possible values. Its use would look like this:

.foo {
      /* property and custom property are illustrative */
      margin-left: nth-value(var(--n), 1em, 2em, 3em);
}


The second, cond(), is a function which allows you to pass pairs of math-like conditions and value associations, as well as a default value. The conditions are evaluated from left to right, and the value following the first condition to be true is used, or the default if none are. Its use would look like this:

.foo {
      /* property is illustrative */
      font-size: cond(
      (50vw < 400px) 2em, 
      (50vw < 800px) 1em, 
      0.5em);
}


This (our) proposal, switch(), is a function which works like cond() above, but can provide contextual information only available at appropriate stages in the lifecycle. In the case of layout properties, it has the same sorts of information available to it as a layout worklet, thus allowing you to do a lot of the things people want to do with "container queries", as in the example below (available-inline-size is the contextual value provided during layout). Its use would look like this:

/* (proposed syntax, to be bikeshed much.. note the demos use a less flexible/different/easier to implement syntax for now ) */ 
.foo {
      grid-template-columns: switch(
      (available-inline-size > 1024px) 1fr 4fr 1fr;
      (available-inline-size > 400px) 2fr 1fr;
      (available-inline-size > 100px) 1fr;
      default 1fr;
      );
}
As similar as these may seem, almost everything about them is concretely different. Each is parsed and follows very different paths around what can be resolved and when, as well as what you can do with them. nth-value(), Mozilla's Emilio Cobos suggested, should be extremely easy to implement because it reuses much of the existing infrastructure for CSS math functions. In fact, he went ahead and implemented it in Mozilla's code base to illustrate.

While things were too hectic to advance our own proposal for a while earlier this year, we did have enough time to look into that, and indeed, the nth-value() proposal was fairly simple to implement in Chromium too! In a very short time, without very sustained investment, we were able to create a complete patch that we could submit.

While nth-value() doesn't help advance the container queries use cases, we agree that it looks like a comparatively easy win for developers, and it might be worth having too.

So, we put it to you: Is it?

We would love your feedback on both of these things - are they things that you would like to see standards bodies and implementers pursue? We certainly are willing to implement a similar prototype for WebKit if necessary if developers are interested and it is standardized. Let us know what you think via @igalia or @briankardell!

November 05, 2020 05:00 AM

October 31, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 9: Reusing a Vulkan z buffer from OpenGL

In this 9th post on OpenGL and Vulkan interoperability on Linux with EXT_external_objects and EXT_external_objects_fd we are going to see another extensions use case where a Vulkan depth buffer is used to render a pattern with OpenGL. Like every other example use case described in these posts, it was implemented for Piglit as part of … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 9: Reusing a Vulkan z buffer from OpenGL

by hikiko at October 31, 2020 02:10 PM

October 30, 2020

Jacobo Aragunde

Event management in X11 Chromium

This is a follow-up of my previous post, where I was trying to fix the bug #1042864 in Chromium: key strokes happening on native dialogs, like open and save dialogs, were not reported to the screen reader.

After learning how accessibility tools (ATs) register listeners for key events, I found out the problem was not actually there; I had to investigate how events arrive from the X11 server to the browser, and how they are forwarded to the ATs.

Not this kind of event

Events arrive from the X server

If you are running Chromium on Linux with the X11 backend (most likely, as it is the default), the Chromium browser process receives key press events from the X server. Then, it finds out if the target of those events is one of its browser windows, and sends it to the proper Window object to be processed.

These are the classes involved in the first part of this process:

The interface PlatformEventSource represents an undetermined source of events coming from the platform, and a PlatformEventDispatcher is any object in the browser capable of managing those events, dispatching them to the actual webpage or UI element. These two classes are related: the PlatformEventSource keeps a list of dispatchers it will forward the event to, if they can manage it (CanDispatchEvent).

The X11EventSource class implements PlatformEventSource; in particular, it contains the code managing the events coming from an X11 server. It additionally keeps a list of XEventDispatcher objects, a class to manage X11 Event objects independently, which is not an implementation of PlatformEventDispatcher.

The X11Window class is the central piece, implementing both the PlatformEventDispatcher and XEventDispatcher interfaces in addition to extending the XWindow class. It has all the means required to find out whether it can dispatch an event, and to do it.

The main event processing loop looks like this:

  1. An event arrives at X11EventSource.

  2. X11EventSource loops through its list of XEventDispatchers and calls CheckCanDispatchNextPlatformEvent on each of them.

  3. The X11Window implementing that function checks whether the XWindow ID of the event target matches the ID of the XWindow represented by that object, and saves the XEvent object if affirmative.

  4. X11EventSource calls DispatchEvent as implemented by its parent class, PlatformEventSource.

  5. The PlatformEventSource loops through its list of PlatformEventDispatchers and calls CanDispatchEvent on each of them.

  6. The X11Window object, which had previously run CheckCanDispatchNextPlatformEvent, simply verifies whether the XEvent object was saved then, and considers that a confirmation that it can dispatch the event.

  7. When one of the dispatchers answers positively, it receives the event for processing in a call to DispatchEvent, implemented in X11Window.

  8. If it is a keyboard event, X11Window takes the steps required to send it to any ATs listening for it, which had previously registered via ATK.

  9. When X11Window finishes processing the event, it returns POST_DISPATCH_STOP_PROPAGATION, telling PlatformEventSource to stop looping through the rest of the dispatchers.

This is a sequence diagram summarizing this process:

Events leave to the ATs

As explained in the previous post, ATs can register callbacks for key press events, which ultimately call AtkUtilClass::add_key_event_listener. AtkUtilClass is a struct of function pointers; the actual implementation is provided by Chromium in the AtkUtilAuraLinux class, which keeps a list of those callbacks.

When an X11Window encounters an event that is targeting its own X Window, and it is a keyboard event, it calls X11ExtensionDelegate::OnAtkEvent(), which is actually implemented by the class DesktopWindowTreeHostLinux; it ultimately hands the event to the AtkUtilAuraLinux class and runs HandleAtkEvent(), which loops through, and runs, any listeners that may have been registered.

Native dialogs are different

Native dialogs are stand-alone windows in the X server, different from the browser window that invoked them, and the browser process doesn't wrap them in an X11Window object. That is considered unnecessary, because the windows for native dialogs talk to the X server and receive events from it directly.

They do belong to the browser process, though, which means that the browser will still receive events targeting the dialog windows. They will go through all the steps mentioned above only to eventually be dismissed, because there is no X11Window object in the browser matching the ID of the target window of the event.

Another consequence of dialog windows belonging to the browser process is that the AtkUtilClass struct points to Chromium's own implementation, and here comes the problem: the dialog is expected to manage its own events through GTK+ code, including the GTK+ implementation of AtkUtilClass, but Chromium overrode it. The key press listeners that ATs registered are kept in Chromium code, so the dialog cannot notify them.

Finally, fixing the problem

Chromium does receive the keyboard events targeted at the dialog windows, but it does nothing with them because the target of those events is not a browser window. That, though, gives us a leg up towards building a solution.

To fix the problem, I made Chromium X Windows manage the keyboard events addressed to the native dialogs in addition to their own. For that, I took advantage of the "transient" property, which indicates a dependency of one window on another: the dialog window had been set as transient for the browser window. In my first approach, I modified X11Window::CheckCanDispatchNextPlatformEvent() to verify whether the target of the event was a transient window of the browser X Window; in that case it would hand the event to X11ExtensionDelegate to be sent to the ATs, following the code path explained previously. It stopped processing at that point, because otherwise the browser window would have received key presses directed at the dialog.

The approach had one performance problem: I was calling the X server to check that property for every keystroke, and that call implied synchronous IPC. This was unacceptable! But it could be worked around: we could instead notify the corresponding X11Window object about the existence of the transient window when the dialog is created. This implies no IPC at all; we just store one new property in the X11Window object that can be checked locally when keyboard events are processed.

This is a link to the review process of the patch, if you are interested in its history. To sum up, in the final solution:

  1. Chromium creates the native dialog and calls XWindow::SetTransientWindow, setting that property in the corresponding browser X Window.

  2. When Chromium receives a keyboard event, it is captured by the X11Window object whose transient window property was set before.

  3. X11ExtensionDelegate::OnAtkEvent() is called for that event; no further processing of this event happens in Chromium.

  4. The native dialog code will also receive the event and manage the keystroke accordingly.

I hope you enjoyed this trip through Chromium event processing code. If you want to use the diagrams in this post, you may find their Dia source files in this link. Happy hacking!

by Jacobo Aragunde Pérez at October 30, 2020 05:00 PM

October 29, 2020

Claudio Saavedra

Thu 2020/Oct/29

In this line of work, we all stumble at least once upon a problem that turns out to be extremely elusive and very tricky to narrow down and solve. If we're lucky, we might have everything at our disposal to diagnose the problem, but sometimes that's not the case – and in embedded development it's often not the case. Add to the mix proprietary drivers, lack of debugging symbols, a bug that's very hard to reproduce under a controlled environment, and weeks in partial confinement due to a pandemic, and what you have is better described as a very long lucid nightmare. Thankfully, even the worst of nightmares end when morning comes, even if sometimes morning might be several days away. And when the fix to the problem is in an unimaginable place, the story is definitely one worth telling.

The problem

It all started with one of Igalia's customers deploying a WPE WebKit-based browser in their embedded devices. Their CI infrastructure had detected a problem that occurred when the browser was tasked with creating a new webview (in layman's terms, you can imagine that to be the same as opening a new tab in your browser). Occasionally, this view would never load, causing ongoing tests to fail. For some reason, the test failure had a reproducibility of ~75% in the CI environment, but during manual testing it would occur with less than 1% probability. For reasons that are beyond the scope of this post, the CI infrastructure was not reachable in a way that would allow access to running processes in order to diagnose the problem more easily. So with only logs at hand and less than 1/100 chances of reproducing the bug myself, I set out to debug this problem locally.


The first thing that became evident was that, whenever this bug occurred, the WebKit feature known as web extensions (an application-specific loadable module that is used to allow the program to have access to the internals of a web page, as well as to enable customizable communication with the process where the page contents are loaded – the web process) wouldn't work. The browser would wait forever for the web extension to load, and since that wouldn't happen, the expected page wouldn't load either. The first place to look into, then, is the web process, to try to understand what is preventing the web extension from loading. Enter here our good friend GDB, with less than spectacular results thanks to stripped libraries.

    #0  0x7500ab9c in poll () from target:/lib/
    #1  0x73c08c0c in ?? () from target:/usr/lib/
    #2  0x73c08d2c in ?? () from target:/usr/lib/
    #3  0x73c08e0c in ?? () from target:/usr/lib/
    #4  0x73bold6a8 in ?? () from target:/usr/lib/
    #5  0x75f84208 in ?? () from target:/usr/lib/
    #6  0x75fa0b7e in ?? () from target:/usr/lib/
    #7  0x7561eda2 in ?? () from target:/usr/lib/
    #8  0x755a176a in ?? () from target:/usr/lib/
    #9  0x753cd842 in ?? () from target:/usr/lib/
    #10 0x75451660 in ?? () from target:/usr/lib/
    #11 0x75452882 in ?? () from target:/usr/lib/
    #12 0x75452fa8 in ?? () from target:/usr/lib/
    #13 0x76b1de62 in ?? () from target:/usr/lib/
    #14 0x76b5a970 in ?? () from target:/usr/lib/
    #15 0x74bee44c in g_main_context_dispatch () from target:/usr/lib/
    #16 0x74bee808 in ?? () from target:/usr/lib/
    #17 0x74beeba8 in g_main_loop_run () from target:/usr/lib/
    #18 0x76b5b11c in ?? () from target:/usr/lib/
    #19 0x75622338 in ?? () from target:/usr/lib/
    #20 0x74f59b58 in __libc_start_main () from target:/lib/
    #21 0x0045d8d0 in _start ()

From all the threads in the web process, after much tinkering around, it slowly became clear that one of the places to look into is that poll() call. I will spare you the details related to what the other threads were doing; suffice it to say that whenever the browser hit the bug, there was a similar stack trace in one thread, going through libEGL to a call to poll() on top of the stack, that would never return. Unfortunately, a stripped EGL driver coming from a proprietary graphics vendor was a bit of a showstopper, as was the inability to have proper debugging symbols running inside the device (did you know that a non-stripped WebKit library binary with debugging symbols can easily get GDB and your device out of memory?). The best one could do to improve that was to use the gcore feature in GDB and extract a core from the device for post-mortem analysis. But for some reason, such a stack trace wouldn't show anything interesting below the poll() call to help understand what's being polled here. Did I say this was tricky?

What polls?

Because WebKit is a multiprocess web engine, system calls that signal, read, and write on sockets communicating with other processes are an everyday thing. Not knowing what a poll() call is doing, and who it is trying to listen to, is not very good. Because the call happens inside the EGL library, one can presume that it's graphics related, but there are still different possibilities, so finding out what this call is polling is a good idea.

A trick I learned while debugging this is that, in the absence of debugging symbols that would give a straightforward look into variables and parameters, one can examine the CPU registers and try to figure out from them what the parameters to function calls are. Let's do that with poll(). First, its signature.

    int poll(struct pollfd *fds, nfds_t nfds, int timeout);

Now, let's examine the registers.

    (gdb) f 0
    #0  0x7500ab9c in poll () from target:/lib/
    (gdb) info registers
    r0             0x7ea55e58	2124766808
    r1             0x1	1
    r2             0x64	100
    r3             0x0	0
    r4             0x0	0

Registers r0, r1, and r2 contain poll()'s three parameters. Because r1 is 1, we know that there is only one file descriptor being polled. fds is then a pointer to an array with one element. Where is that first element? Well, right there, in the memory pointed to directly by r0. What does struct pollfd look like?

    struct pollfd {
      int   fd;         /* file descriptor */
      short events;     /* requested events */
      short revents;    /* returned events */
    };

What we are interested in here is the contents of fd, the file descriptor that is being polled. Memory alignment is again on our side: we don't need any pointer arithmetic here. We can inspect the register r0 directly and find out what the value of fd is.

    (gdb) print *0x7ea55e58
    $3 = 8

So we now know that the EGL library is polling the file descriptor with identifier 8. But where does this file descriptor come from? What is on the other end? The /proc file system can be helpful here.

    # pidof WPEWebProcess
    1944 1196
    # ls -lh /proc/1944/fd/8
    lrwx------    1 x x      64 Oct 22 13:59 /proc/1944/fd/8 -> socket:[32166]

So we have a socket. What else can we find out about it? Turns out, not much without the unix_diag kernel module, which was not available on our device. But we are slowly getting closer. Time to call another good friend.

Where GDB fails, printf() triumphs

Something I have learned from many years of working on a project as large as WebKit is that debugging symbols can be very difficult to work with. To begin with, it takes ages to build WebKit with them. When cross-compiling, it's even worse. And then, very often the target device doesn't even have enough memory to load the symbols when debugging. So they can be pretty useless. It's then that just using fprintf() and logging useful information can simplify things. Since we know that it's at some point during the initialization of the web process that we end up stuck, and we also know that we're polling a file descriptor, let's find some early calls in the code of the web process and add some fprintf() calls with a bit of information, especially in those that might have something to do with EGL. What can we find out now?

    Oct 19 10:13:27.700335 WPEWebProcess[92]: Starting
    Oct 19 10:13:27.720575 WPEWebProcess[92]: Initializing WebProcess platform.
    Oct 19 10:13:27.727850 WPEWebProcess[92]: wpe_loader_init() done.
    Oct 19 10:13:27.729054 WPEWebProcess[92]: Initializing PlatformDisplayLibWPE (hostFD: 8).
    Oct 19 10:13:27.730166 WPEWebProcess[92]: egl backend created.
    Oct 19 10:13:27.741556 WPEWebProcess[92]: got native display.
    Oct 19 10:13:27.742565 WPEWebProcess[92]: initializeEGLDisplay() starting.

Two interesting findings from the fprintf()-powered logging here: first, it seems that file descriptor 8 is one known to libwpe (the general-purpose library that powers the WPE WebKit port). Second, the last EGL API call right before the web process hangs on poll() is a call to eglInitialize(). fprintf(), thanks for your service.

Number 8

We now know that file descriptor 8 comes from WPE and is not internal to the EGL library. libwpe gets this file descriptor from the UI process, as one of the many creation parameters that are passed via IPC to the nascent process in order to initialize it. It turns out that this file descriptor in particular, the so-called host client file descriptor, is the one that the freedesktop backend of libwpe, from here onwards WPEBackend-fdo, creates when a new client is set to connect to its Wayland display. In a nutshell, in the presence of a new client, a Wayland display is supposed to create a pair of connected sockets, create a new client on the display side, give it one of the file descriptors, and pass the other one to the client process. Because this will be useful later on, let's see how that is currently implemented in WPEBackend-fdo.

        int pair[2];
        if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, pair) < 0)
            return -1;
        int clientFd = dup(pair[1]);
        wl_client_create(m_display, pair[0]);

    The file descriptor we are tracking down is the client file descriptor, clientFd. So we now know what's going on in this socket: Wayland-specific communication. Let's enable Wayland debugging next, by running all relevant processes with WAYLAND_DEBUG=1. We'll get back to that code fragment later on.
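    To see the moving parts in isolation, here is a minimal, self-contained sketch (plain POSIX, not the actual WPEBackend-fdo code): a connected socket pair where writing on one end makes poll() report POLLIN on the other, just as the web process writing a Wayland request should produce a POLLIN event on the display side.

    ```c
    #define _GNU_SOURCE
    #include <poll.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int pair[2];
        /* Same call as in WPEBackend-fdo: a pair of connected Unix sockets. */
        if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, pair) < 0)
            return 1;

        /* The "client" end writes a request, much like the web process does
         * when the EGL driver asks for the Wayland registry. */
        const char request[] = "get_registry";
        write(pair[1], request, sizeof request);

        /* The "display" end polls its socket: POLLIN should be reported. */
        struct pollfd pfd = { .fd = pair[0], .events = POLLIN };
        int ready = poll(&pfd, 1, 1000);
        printf("ready=%d revents&POLLIN=%d\n", ready, !!(pfd.revents & POLLIN));

        close(pair[0]);
        close(pair[1]);
        return 0;
    }
    ```

    This is exactly the event the UI process was expected to notice and never acted on.
    
    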

    A Heisenbug is a Heisenbug is a Heisenbug

    Turns out that enabling Wayland debugging output for a few processes is enough to alter the state of the system in such a way that the bug does not happen at all when doing manual testing. Thankfully the CI's reproducibility is much higher, so after waiting overnight for the CI to run continuously until it hit the bug, we have logs. What do the logs say?

    WPEWebProcess[41]: initializeEGLDisplay() starting.
      -> wl_display@1.get_registry(new id wl_registry@2)
      -> wl_display@1.sync(new id wl_callback@3)

    So the EGL library is trying to fetch the Wayland registry and it's doing a wl_display_sync() call afterwards, which will block until the server responds. That's where the blocking poll() call comes from. So, it turns out, the problem is not necessarily on this end of the Wayland socket, but perhaps on the other side, that is, in the so-called UI process (the main browser process). Why is the Wayland display not replying?

    The loop

    Something that is worth mentioning before we move on is how the WPEBackend-fdo Wayland display integrates with the system. This display is a nested display, with each web view a client, while it is itself a client of the system's Wayland display. This can be a bit confusing if you're not very familiar with how Wayland works, but fortunately there is good documentation about Wayland elsewhere.

    The way that the Wayland display in the UI process of a WPE WebKit browser is integrated with the rest of the program, when it uses WPEBackend-fdo, is through the GLib main event loop. Wayland itself has an event loop implementation for servers, but for a GLib-powered application it can be useful to use GLib's and integrate Wayland's event processing with the different stages of the GLib main loop. That is precisely how WPEBackend-fdo is handling its clients' events. As discussed earlier, when a new client is created a pair of connected sockets are created and one end is given to Wayland to control communication with the client. GSourceFunc functions are used to integrate Wayland with the application main loop. In these functions, we make sure that whenever there are pending messages to be sent to clients, those are sent, and whenever any of the client sockets has pending data to be read, Wayland reads from them and dispatches the events that might be necessary in response to the incoming data. And here is where things start getting really strange, because after doing a bit of fprintf()-powered debugging inside the Wayland GSourceFuncs functions, it became clear that the Wayland events from the clients were never dispatched, because the dispatch() GSourceFunc was not being called, as if there was nothing coming from any Wayland client. But how is that possible, if we already know that the web process client is actually trying to get the Wayland registry?

    To move forward, one needs to understand how the GLib main loop works, in particular with Unix file descriptor sources. A very brief summary of this is that, during an iteration of the main loop, GLib will poll file descriptors to see if there are any interesting events to be reported back to their respective sources, in which case the sources will decide whether to trigger the dispatch() phase. A simple source might decide in its dispatch() method to directly read or write from/to the file descriptor; a Wayland display source (as in our case) will call wl_event_loop_dispatch() to do this for us. However, if the source doesn't find any interesting events, or if the source decides that it doesn't want to handle them, the dispatch() invocation will not happen. More on the GLib main event loop in its API documentation.
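    Those stages can be sketched with a reduced, hand-rolled model (illustrative C, not GLib's actual implementation): a source is a GPollFD-like record plus check() and dispatch() callbacks, and one iteration consists of polling, then asking each source's check() whether its dispatch() should run.

    ```c
    #include <poll.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    /* A reduced model of a GLib file-descriptor source: it polls one fd and
     * its check() decides whether dispatch() runs, based on revents. */
    struct source {
        struct pollfd pfd;
        bool (*check)(struct source *);
        void (*dispatch)(struct source *);
    };

    static bool check_revents(struct source *s) { return s->pfd.revents != 0; }

    static void dispatch_read(struct source *s)
    {
        char buf[64];
        ssize_t n = read(s->pfd.fd, buf, sizeof buf);
        printf("dispatched, read %zd bytes\n", n);
    }

    int main(void)
    {
        int pipefd[2];
        pipe(pipefd);
        write(pipefd[1], "hi", 2);

        struct source src = {
            .pfd = { .fd = pipefd[0], .events = POLLIN },
            .check = check_revents,
            .dispatch = dispatch_read,
        };

        /* One "main loop iteration": poll, then check, then dispatch. */
        poll(&src.pfd, 1, 0);
        if (src.check(&src))
            src.dispatch(&src);
        return 0;
    }
    ```

    The key detail, which will matter shortly, is that check() sees whatever ended up in revents: if that field is stale, the wrong decision is made.
    
    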

    So it seems that for some reason the dispatch() method is not being called. Does that mean that there are no interesting events to read from? Let's find out.

    System call tracing

    Here we resort to another helpful tool, strace. With strace we can try to figure out what is happening when the main loop polls file descriptors. The strace output is huge (because it easily takes over a hundred attempts to reproduce this), but we already know some of the calls that involve file descriptors from the code we looked at above, when the client is created. So we can use those calls as a starting point when searching through the several MBs of logs. Fast-forward to the relevant logs.

    socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [128, 130]) = 0
    dup(130)               = 131
    close(130)             = 0
    fcntl64(128, F_DUPFD_CLOEXEC, 0) = 130
    epoll_ctl(34, EPOLL_CTL_ADD, 130, {EPOLLIN, {u32=1639599928, u64=1639599928}}) = 0

    What we see there is, first, WPEBackend-fdo creating a new socket pair (128, 130) and then, when file descriptor 130 is passed to wl_client_create() to create a new client, Wayland adds that file descriptor to its epoll() instance for monitoring clients, which is referred to by file descriptor 34. This way, whenever there are events in file descriptor 130, we will hear about them in file descriptor 34.

    So what we would expect to see next is that, after the web process is spawned, when a Wayland client is created using the passed file descriptor and the EGL driver requests the Wayland registry from the display, there should be a POLLIN event coming in file descriptor 34 and, if the dispatch() call for the source was called, an epoll_wait() call on it, as that is what wl_event_loop_dispatch() would do when called from the source's dispatch() method. But what do we have instead?

    poll([{fd=30, events=POLLIN}, {fd=34, events=POLLIN}, {fd=59, events=POLLIN}, {fd=110, events=POLLIN}, {fd=114, events=POLLIN}, {fd=132, events=POLLIN}], 6, 0) = 1 ([{fd=34, revents=POLLIN}])
    recvmsg(30, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)

    strace can be a bit cryptic, so let's explain those two function calls. The first one is a poll in a series of file descriptors (including 30 and 34) for POLLIN events. The return value of that call tells us that there is a POLLIN event in file descriptor 34 (the Wayland display epoll() instance for clients). But unintuitively, the call right after is trying to read a message from socket 30 instead, which we know doesn't have any pending data at the moment, and consequently returns an error value with an errno of EAGAIN (Resource temporarily unavailable).

    Why is the GLib main loop triggering a read from 30 instead of 34? And who is 30?

    We can answer the latter question first. Breaking on a running UI process instance at the right time shows who is reading from the file descriptor 30:

    #1  0x70ae1394 in wl_os_recvmsg_cloexec (sockfd=30, msg=msg@entry=0x700fea54, flags=flags@entry=64)
    #2  0x70adf644 in wl_connection_read (connection=0x6f70b7e8)
    #3  0x70ade70c in read_events (display=0x6f709c90)
    #4  wl_display_read_events (display=0x6f709c90)
    #5  0x70277d98 in pwl_source_check (source=0x6f71cb80)
    #6  0x743f2140 in g_main_context_check (context=context@entry=0x2111978, max_priority=<optimized out>, fds=fds@entry=0x6165f718, n_fds=n_fds@entry=4)
    #7  0x743f277c in g_main_context_iterate (context=0x2111978, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    #8  0x743f2ba8 in g_main_loop_run (loop=0x20ece40)
    #9  0x00537b38 in ?? ()

    So it's also Wayland, but on a different level. This is the Wayland client source (remember that the browser is also a Wayland client?), which is installed by cog (a thin browser layer on top of WPE WebKit that makes writing browsers easier) to process, among others, input events coming from the parent Wayland display. Looking at the cog code, we can see that the wl_display_read_events() call happens only if GLib reports that there is a G_IO_IN (POLLIN) event in its file descriptor, but we already know that this is not the case, as per the strace output. So at this point we know that there are two things here that are not right:

    1. An FD source with a G_IO_IN condition is not being dispatched.
    2. An FD source without a G_IO_IN condition is being dispatched.

    Someone here is not telling the truth, and as a result the main loop is dispatching the wrong sources.

    The loop (part II)

    It is at this point that it would be a good idea to look at what exactly the GLib main loop is doing internally in each of its stages and how it tracks the sources and file descriptors that are polled and that need to be processed. Fortunately, debugging symbols for GLib are very small, so debugging this step by step inside the device is rather easy.

    Let's look at how the main loop decides which sources to dispatch, since for some reason it's dispatching the wrong ones. Dispatching happens in the g_main_dispatch() method. This method goes over a list of pending source dispatches and, after a few checks and setting the stage, the dispatch method for the source gets called. How is a source set as having a pending dispatch? This happens in g_main_context_check(), where the main loop checks the results of the polling done in this iteration and runs the check() method for sources that are not ready yet so that they can decide whether they are ready to be dispatched or not. Breaking into the Wayland display source, I know that the check() method is called. How does this method decide whether the source should be dispatched?

        [](GSource* base) -> gboolean
        {
            auto& source = *reinterpret_cast<EventSource*>(base);
            return !!source.pfd.revents;
        }

    In this lambda function we're returning TRUE or FALSE, depending on whether the revents field in the GPollFD structure has been filled during the polling stage of this iteration of the loop. A return value of TRUE indicates to the main loop that we want our source to be dispatched. From the strace output, we know that there is a POLLIN (or G_IO_IN) condition, but we also know that the main loop is not dispatching it. So let's look at what's in this GPollFD structure.

    For this, let's go back to g_main_context_check() and inspect the array of GPollFD structures that it received when called. What do we find?

    (gdb) print *fds
    $35 = {fd = 30, events = 1, revents = 0}
    (gdb) print *(fds+1)
    $36 = {fd = 34, events = 1, revents = 1}

    That's the result of the poll() call! So far so good. Now the method is supposed to update the polling records it keeps and uses when calling each of the sources' check() functions. What do these records hold?

    (gdb) print *pollrec->fd
    $45 = {fd = 19, events = 1, revents = 0}
    (gdb) print *(pollrec->next->fd)
    $47 = {fd = 30, events = 25, revents = 1}
    (gdb) print *(pollrec->next->next->fd)
    $49 = {fd = 34, events = 25, revents = 0}

    We're not interested in the first record quite yet, but clearly there's something odd here. The polling records are showing a different value in the revents fields for both 30 and 34. Are these records updated correctly? Let's look at the algorithm that is doing this update, because it will be relevant later on.

      pollrec = context->poll_records;
      i = 0;
      while (pollrec && i < n_fds)
        {
          while (pollrec && pollrec->fd->fd == fds[i].fd)
            {
              if (pollrec->priority <= max_priority)
                {
                  pollrec->fd->revents =
                    fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
                }
              pollrec = pollrec->next;
            }
          i++;
        }

    In simple words, what this algorithm is doing is to traverse simultaneously the polling records and the GPollFD array, updating the polling records revents with the results of polling. From reading how the pollrec linked list is built internally, it's possible to see that it's purposely sorted by increasing file descriptor identifier value. So the first item in the list will have the record for the lowest file descriptor identifier, and so on. The GPollFD array is also built in this way, allowing for a nice optimization: if more than one polling record – that is, more than one polling source – needs to poll the same file descriptor, this can be done at once. This is why this otherwise O(n^2) nested loop can actually be reduced to linear time.
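    The merge can be illustrated with a reduced model (illustrative C, not GLib's internal types; it also drops the G_IO_ERR | G_IO_HUP | G_IO_NVAL mask for brevity): a linked list of records sorted by fd, updated from the equally sorted results array.

    ```c
    #include <poll.h>
    #include <stdio.h>

    /* A reduced model of GLib's polling records: a linked list sorted by fd,
     * updated from the (equally sorted) array of results from poll(). */
    struct pollrec {
        struct pollfd *fd;
        struct pollrec *next;
        int priority;
    };

    static void update_records(struct pollrec *pollrec, struct pollfd *fds,
                               int n_fds, int max_priority)
    {
        int i = 0;
        while (pollrec && i < n_fds) {
            while (pollrec && pollrec->fd->fd == fds[i].fd) {
                if (pollrec->priority <= max_priority)
                    pollrec->fd->revents = fds[i].revents & pollrec->fd->events;
                pollrec = pollrec->next;
            }
            i++;
        }
    }

    int main(void)
    {
        /* Records for fds 30 and 34, both polled in this iteration. */
        struct pollfd rec_fds[2] = {
            { .fd = 30, .events = POLLIN }, { .fd = 34, .events = POLLIN },
        };
        struct pollrec r34 = { &rec_fds[1], NULL, -60 };
        struct pollrec r30 = { &rec_fds[0], &r34, -60 };

        /* poll() results: only fd 34 has data. */
        struct pollfd fds[2] = {
            { .fd = 30, .events = POLLIN, .revents = 0 },
            { .fd = 34, .events = POLLIN, .revents = POLLIN },
        };

        update_records(&r30, fds, 2, -60);
        printf("fd30 has POLLIN: %d, fd34 has POLLIN: %d\n",
               !!(rec_fds[0].revents & POLLIN), !!(rec_fds[1].revents & POLLIN));
        return 0;
    }
    ```

    As long as every record's file descriptor also appears in the array, the two sequences advance in lockstep and the update is correct.
    
    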

    One thing stands out here though: the linked list is only advanced when we find a match. Does this mean that we always have a match between polling records and the file descriptors that have just been polled? To answer that question we need to check how the array of GPollFD structures is filled. This is done in g_main_context_query(), as we hinted before. I'll spare you the details, and just focus on what seems relevant here: when is a poll record not used to fill a GPollFD?

      n_poll = 0;
      lastpollrec = NULL;
      for (pollrec = context->poll_records; pollrec; pollrec = pollrec->next)
        {
          if (pollrec->priority > max_priority)
            continue;

    Interesting! If a polling record belongs to a source whose priority is lower than the maximum priority that the current iteration is going to process, the polling record is skipped. Why is this?

    In simple terms, this happens because each iteration of the main loop finds out the highest priority among the sources that are ready in the prepare() stage, before polling, and then only those file descriptor sources with at least such a priority are polled. The idea behind this is to make sure that high-priority sources are processed first, and that no file descriptor sources with lower priority are polled in vain, as they shouldn't be dispatched in the current iteration.

    GDB tells me that the maximum priority in this iteration is -60. From an earlier GDB output, we also know that there's a source for file descriptor 19 with priority 0.

    (gdb) print *pollrec
    $44 = {fd = 0x7369c8, prev = 0x0, next = 0x6f701560, priority = 0}
    (gdb) print *pollrec->fd
    $45 = {fd = 19, events = 1, revents = 0}

    Since 19 is lower than 30 and 34, we know that this record is before theirs in the linked list (and as it happens, it's the first one in the list too). But we know that, because its priority is 0, it is too low to be added to the file descriptor array to be polled. Let's look at the loop again.

      pollrec = context->poll_records;
      i = 0;
      while (pollrec && i < n_fds)
        {
          while (pollrec && pollrec->fd->fd == fds[i].fd)
            {
              if (pollrec->priority <= max_priority)
                {
                  pollrec->fd->revents =
                    fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
                }
              pollrec = pollrec->next;
            }
          i++;
        }

    The first polling record was skipped when the GPollFD array was filled, so the condition pollrec && pollrec->fd->fd == fds[i].fd is never going to be satisfied, because 19 is not in the array. The innermost while() is never entered, and as such the pollrec list pointer never moves forward to the next record. So no polling record is updated here, even though we have fresh revents information from the polling results.

    What happens next should be easy to see. The check() methods for all polled sources are called with outdated revents. In the case of the source for file descriptor 30, we wrongly tell it there's a G_IO_IN condition, so it asks the main loop to dispatch it, triggering a wl_connection_read() call on a socket with no incoming data. For the source with file descriptor 34, we tell it that there's no incoming data and its dispatch() method is not invoked, even though on the other side of the socket we have a client waiting for data and blocking in the meantime. This explains what we see in the strace output above. If the source with file descriptor 19 continues to be ready and its priority unchanged, this situation repeats in every further iteration of the main loop, leading to a hang in the web process, which waits forever for the UI process to read its socket.
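    The stall is easy to reproduce with the same kind of reduced model (illustrative C, not GLib itself): put a record for fd 19 with priority 0 at the head of the list, set the cutting priority to -60 so that fd 19 is excluded from the polled array, and watch the update loop never advance past it.

    ```c
    #include <poll.h>
    #include <stdio.h>

    struct pollrec {
        struct pollfd *fd;
        struct pollrec *next;
        int priority;
    };

    /* The unfixed update loop from g_main_context_check(), reduced. */
    static void update_records(struct pollrec *pollrec, struct pollfd *fds,
                               int n_fds, int max_priority)
    {
        int i = 0;
        while (pollrec && i < n_fds) {
            while (pollrec && pollrec->fd->fd == fds[i].fd) {
                if (pollrec->priority <= max_priority)
                    pollrec->fd->revents = fds[i].revents & pollrec->fd->events;
                pollrec = pollrec->next;
            }
            i++;
        }
    }

    int main(void)
    {
        struct pollfd rec19 = { .fd = 19, .events = POLLIN };
        struct pollfd rec30 = { .fd = 30, .events = POLLIN };
        struct pollfd rec34 = { .fd = 34, .events = POLLIN };
        struct pollrec r34 = { &rec34, NULL, -60 };
        struct pollrec r30 = { &rec30, &r34, -60 };
        struct pollrec r19 = { &rec19, &r30, 0 };  /* priority too low to poll */

        /* fd 19 was skipped in g_main_context_query(), so only 30 and 34
         * appear in the polled array; only 34 has incoming data. */
        struct pollfd fds[2] = {
            { .fd = 30, .events = POLLIN, .revents = 0 },
            { .fd = 34, .events = POLLIN, .revents = POLLIN },
        };

        update_records(&r19, fds, 2, -60);
        /* 19 != 30: the inner loop never runs, pollrec never advances, and
         * fd 34's record keeps its stale revents. */
        printf("fd34 record has POLLIN: %d (expected 1)\n",
               !!(rec34.revents & POLLIN));
        return 0;
    }
    ```
    
    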

    The bug – explained

    I have been using GLib for a very long time, and I have only fixed a couple of minor bugs in it over the years. Very few actually, which is why it was very difficult for me to come to accept that I had found a bug in one of the most reliable and complex parts of the library. Impostor syndrome is a thing and it really gets in the way.

    But in a nutshell, the bug in the GLib main loop is that the very clever linear update of records is missing something very important: it should skip to the first matching polling record before attempting to update its revents. Without this, in the presence of a file descriptor source with the lowest file descriptor identifier and also a lower priority than the cutting priority in the current main loop iteration, revents in the polling records are not updated and therefore the wrong sources can be dispatched. The simplest patch to avoid this would look as follows.

       i = 0;
       while (pollrec && i < n_fds)
         {
    +      while (pollrec && pollrec->fd->fd != fds[i].fd)
    +        pollrec = pollrec->next;
    +
           while (pollrec && pollrec->fd->fd == fds[i].fd)
             {
               if (pollrec->priority <= max_priority)

    Once we find the first matching record, let's update all consecutive records that also match and need an update, then let's skip to the next record, rinse and repeat. With this two-line patch, the web process was finally unlocked, the EGL display initialized properly, the web extension and the web page were loaded, CI tests started passing again, and this exhausted developer could finally put his mind to rest.
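    Applying the same skip to the reduced model from before shows the update working again (illustrative C, not the actual GLib code): with the skip loop in place, the record for fd 19 no longer blocks the traversal and fd 34's record picks up its POLLIN.

    ```c
    #include <poll.h>
    #include <stdio.h>

    struct pollrec {
        struct pollfd *fd;
        struct pollrec *next;
        int priority;
    };

    /* The update loop with the fix: skip records whose fd was not polled. */
    static void update_records_fixed(struct pollrec *pollrec, struct pollfd *fds,
                                     int n_fds, int max_priority)
    {
        int i = 0;
        while (pollrec && i < n_fds) {
            /* The fix: advance to the first record matching the current fd. */
            while (pollrec && pollrec->fd->fd != fds[i].fd)
                pollrec = pollrec->next;
            while (pollrec && pollrec->fd->fd == fds[i].fd) {
                if (pollrec->priority <= max_priority)
                    pollrec->fd->revents = fds[i].revents & pollrec->fd->events;
                pollrec = pollrec->next;
            }
            i++;
        }
    }

    int main(void)
    {
        struct pollfd rec19 = { .fd = 19, .events = POLLIN };
        struct pollfd rec30 = { .fd = 30, .events = POLLIN };
        struct pollfd rec34 = { .fd = 34, .events = POLLIN };
        struct pollrec r34 = { &rec34, NULL, -60 };
        struct pollrec r30 = { &rec30, &r34, -60 };
        struct pollrec r19 = { &rec19, &r30, 0 };  /* not polled, now skipped */

        struct pollfd fds[2] = {
            { .fd = 30, .events = POLLIN, .revents = 0 },
            { .fd = 34, .events = POLLIN, .revents = POLLIN },
        };

        update_records_fixed(&r19, fds, 2, -60);
        printf("fd34 record has POLLIN: %d\n", !!(rec34.revents & POLLIN));
        return 0;
    }
    ```
    
    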

    A complete patch, including improvements to the code comments around this fascinating part of GLib and also a minimal test case reproducing the bug, has already been reviewed by the GLib maintainers and merged to both stable and development branches. I expect that at least some GLib sources will start being called in a different (but correct) order from now on, so keep an eye on your GLib sources. :-)

    Standing on the shoulders of giants

    At this point I should acknowledge that without the support from my colleagues in the WebKit team in Igalia, getting to the bottom of this problem would have probably been much harder and perhaps my sanity would have been at stake. I want to thank Adrián and Žan for their input on Wayland, debugging techniques, and for allowing me to bounce ideas and findings back and forth as I went deeper into this rabbit hole, helping me to step out of dead ends, reminding me to use tools outside of my everyday box, and ultimately, to be brave enough to doubt GLib's correctness, something that much more often than not I take for granted.

    Thanks also to Philip and Sebastian for their feedback and prompt code review!

    October 29, 2020 01:10 PM

    October 23, 2020

    Eleni Maria Stea

    [OpenGL and Vulkan Interoperability on Linux] Part 8: Using a Vulkan vertex buffer from OpenGL and then from Vulkan

    This is the 8th post on OpenGL and Vulkan Interoperability with EXT_external_objects and EXT_external_objects_fd where I explain some example use cases of the extensions I’ve implemented for Piglit as part of my work for Igalia. In this example, a Vulkan vertex buffer is created and filled with vertices and then it’s used to render the … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 8: Using a Vulkan vertex buffer from OpenGL and then from Vulkan

    by hikiko at October 23, 2020 05:00 PM

    October 18, 2020

    Eleni Maria Stea

    [OpenGL and Vulkan Interoperability on Linux] Part 7: Reusing a Vulkan vertex buffer from OpenGL

    This is the 7th post on OpenGL and Vulkan Interoperability with EXT_external_objects. It’s about another EXT_external_objects use case implemented for Piglit as part of my work for Igalia‘s graphics team. In this case a vertex buffer is allocated and filled with data from Vulkan and then it’s used from OpenGL to render a pattern on … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 7: Reusing a Vulkan vertex buffer from OpenGL

    by hikiko at October 18, 2020 04:47 PM

    [OpenGL and Vulkan Interoperability on Linux] Part 6: We should be able to reuse a Vulkan pixel buffer from OpenGL but not to overwrite it!

    This is another blog post on OpenGL and Vulkan Interoperability. It’s not really a description of a new use case as the Piglit test I am going to describe is quite similar to the previous example we’ve seen where we reused a Vulkan pixel buffer from OpenGL. This Piglit test was written because there’s an … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 6: We should be able to reuse a Vulkan pixel buffer from OpenGL but not to overwrite it!

    by hikiko at October 18, 2020 11:23 AM

    [OpenGL and Vulkan Interoperability on Linux] Part 5: A Vulkan pixel buffer is reused by OpenGL

    This is the 5th post of the OpenGL and Vulkan interoperability series where I describe some use cases for the EXT_external_objects and EXT_external_objects_fd extensions. These use cases have been implemented inside Piglit as part of my work for Igalia‘s graphics team using a Vulkan framework I’ve written for this purpose. And in this 5th post, … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 5: A Vulkan pixel buffer is reused by OpenGL

    by hikiko at October 18, 2020 10:02 AM

    October 17, 2020

    Eleni Maria Stea

    [OpenGL and Vulkan Interoperability on Linux] Part 4: Using OpenGL to overwrite Vulkan allocated textures.

    This is the 4th post on OpenGL and Vulkan Interoperability on Linux. The first one was an introduction to EXT_external_objects and EXT_external_objects_fd extensions, the second was describing a simple interoperability use case where a Vulkan allocated textured is filled by OpenGL, and the third was about a slightly more complex use case where a Vulkan … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 4: Using OpenGL to overwrite Vulkan allocated textures.

    by hikiko at October 17, 2020 08:06 AM

    October 16, 2020

    Enrique Ocaña

    Figuring out corrupt stacktraces on ARM

    If you’re developing C/C++ on embedded devices, you might already have stumbled upon a corrupt stacktrace like this when trying to debug with gdb:

    (gdb) bt 
    #0  0xb38e32c4 in pthread_getname_np () from /home/enrique/buildroot/output5/staging/lib/
    #1  0xb38e103c in __lll_timedlock_wait () from /home/enrique/buildroot/output5/staging/lib/ 
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    In these cases I usually give up on gdb and try to solve my problems by adding printf()s and resorting to other tools. However, there are times when you really, really need to know what is in that cursed stack.

    Subroutine calls on ARM devices work by setting the return address in the Link Register (LR), so the subroutine knows where to point the Program Counter (PC) register to. While not jumping into subroutines, the value of the LR register is saved in the stack (to be restored later, right before the current subroutine returns to the caller) and the register can be used for other tasks (LR is a “scratch register”). This means that the functions in the backtrace are actually there, in the stack, in the form of older saved LRs, waiting for us to get them.
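    The recovery idea can be sketched in a few lines of C (illustrative only; the stack words and the mapping range are modeled after this post's gdb dump and maps file): any 32-bit word on the stack that falls inside an executable (r-xp) mapping is a candidate saved LR, i.e. a candidate return address.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* One executable mapping, as read from /proc/pid/maps:
         * b54f2000-b6e1e000 r-xp ... /usr/lib/<some library> */
        const uint32_t text_start = 0xb54f2000, text_end = 0xb6e1e000;

        /* A few "stack words", modeled after the dump in this post. */
        const uint32_t stack_words[] = {
            0x0000821e,  /* small value: data, not a code address */
            0xb5fb0319,  /* inside the mapping: candidate saved LR */
            0x00000001,  /* data */
            0xb5937445,  /* inside the mapping: candidate saved LR */
        };

        for (unsigned i = 0; i < sizeof stack_words / sizeof *stack_words; i++) {
            uint32_t w = stack_words[i];
            if (w >= text_start && w < text_end)
                printf("candidate LR: %08x (offset %08x)\n", w, w - text_start);
        }
        return 0;
    }
    ```

    The shell scripts below do essentially this classification, only against the full dump and every executable mapping.
    
    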

    So, the first step would be to dump the memory contents of the stack, starting from the address pointed to by the Stack Pointer (SP). Let’s print the first 256 32-bit words and save them as a file from gdb:

    (gdb) set logging overwrite on
    (gdb) set logging file /tmp/bt.txt
    (gdb) set logging on
    Copying output to /tmp/bt.txt.
    (gdb) x/256wa $sp
    0xbe9772b0:     0x821e  0xb38e103d   0x1aef48   0xb1973df0
    0xbe9772c0:      0x73d  0xb38dc51f        0x0          0x1
    0xbe9772d0:   0x191d58    0x191da4   0x19f200   0xb31ae5ed
    0xbe977560: 0xb28c6000  0xbe9776b4        0x5      0x10871 <main(int, char**)>
    0xbe977570: 0xb6f93000  0xaaaaaaab 0xaf85fd4a   0xa36dbc17
    0xbe977580:      0x130         0x0    0x109b9 <__libc_csu_init> 0x0
    0xbe977690:        0x0         0x0    0x108cd <_start>  0x0
    0xbe9776a0:        0x0     0x108ed <_start+32>  0x10a19 <__libc_csu_fini> 0xb6f76969  
    (gdb) set logging off
    Done logging to /tmp/bt.txt.

    Gdb can already name some of the functions (like main()), but not all of them. At least not the ones most interesting for our purpose. We’ll have to look for them by hand.

    We first get the memory page mapping of the process (WebKit’s WebProcess in my case) by looking at /proc/pid/maps. I’m retrieving it from the device (named metro) via ssh and saving it to a local file. I’m only interested in the code pages, those with executable (‘x’) permissions:

    $ ssh metro 'cat /proc/$(ps axu | grep WebProcess | grep -v grep | { read _ P _ ; echo $P ; })/maps | grep " r.x. "' > /tmp/maps.txt

    The file looks like this:

    00010000-00011000 r-xp 00000000 103:04 2617      /usr/bin/WPEWebProcess
    b54f2000-b6e1e000 r-xp 00000000 103:04 1963      /usr/lib/ 
    b6f6b000-b6f82000 r-xp 00000000 00:02 816        /lib/ 
    be957000-be978000 rwxp 00000000 00:00 0          [stack] 
    be979000-be97a000 r-xp 00000000 00:00 0          [sigpage] 
    be97b000-be97c000 r-xp 00000000 00:00 0          [vdso] 
    ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

    Now we process the backtrace to remove address markers and have one word per line:

    $ cat /tmp/bt.txt | sed -e 's/^[^:]*://' -e 's/[<][^>]*[>]//g' | while read A B C D; do echo $A; echo $B; echo $C; echo $D; done | sed 's/^0x//' | while read P; do printf '%08x\n' "$((16#"$P"))"; done | sponge /tmp/bt.txt

    Then merge and sort both files, so the addresses in the stack appear below their corresponding mappings:

    $ cat /tmp/maps.txt /tmp/bt.txt | sort > /tmp/merged.txt

    Now we process the resulting file to get each address in the stack with its corresponding mapping:

    $ cat /tmp/merged.txt | while read LINE; do if [[ $LINE =~ - ]]; then MAPPING="$LINE"; else echo $LINE '-->' $MAPPING; fi; done | grep '/' | sed -E -e 's/([0-9a-f][0-9a-f]*)-([0-9a-f][0-9a-f]*)/\1 - \2/' > /tmp/mapped.txt

    Like this (address in the stack, page start (or base), page end, page permissions, executable file load offset (base offset), etc.):

    0001034c --> 00010000 - 00011000 r-xp 00000000 103:04 2617 /usr/bin/WPEWebProcess
    b550bfa4 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/ 
    b5937445 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/ 
    b5fb0319 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/

    The addr2line tool can give us the exact function an address belongs to, or even the function and source code line if the code has been built with symbols. But the addresses addr2line understands are internal offsets, not absolute memory addresses. We can convert the addresses in the stack to offsets with this expression:

    offset = address - page start + base offset
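    As a worked example of that formula (plain C; the numbers are taken from the mapped list above, where the executable file load offset happens to be 0):

    ```c
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* b5fb0319 --> b54f2000 - b6e1e000 r-xp 00000000 ... /usr/lib/... */
        uint32_t address = 0xb5fb0319;   /* word found in the stack */
        uint32_t page_start = 0xb54f2000; /* start of its r-xp mapping */
        uint32_t base_offset = 0x0;       /* file load offset of the mapping */

        /* offset = address - page start + base offset */
        printf("%08x\n", address - page_start + base_offset);
        return 0;
    }
    ```

    The printed value is what gets fed to addr2line for that address.
    
    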

    I’m using buildroot as my cross-build environment, so I need to pick the library files from the staging directory because those are the unstripped versions. The addr2line tool is the one from the buildroot cross-compiling toolchain. Written as a script:

    $ cat /tmp/mapped.txt | while read ADDR _ BASE _ END _ BASEOFFSET _ _ FILE; do OFFSET=$(printf "%08x\n" $((0x$ADDR - 0x$BASE + 0x$BASEOFFSET))); FILE=~/buildroot/output/staging/$FILE; if [[ -f $FILE ]]; then LINE=$(~/buildroot/output/host/usr/bin/arm-buildroot-linux-gnueabihf-addr2line -p -f -C -e $FILE $OFFSET); echo "$ADDR $LINE"; fi; done > /tmp/addr2line.txt

    Finally, we filter out the useless [??] entries:

    $ cat /tmp/bt.txt | while read DATA; do cat /tmp/addr2line.txt | grep "$DATA"; done | grep -v '[?][?]' > /tmp/fullbt.txt

    What remains is something very similar to what the real backtrace should have been if everything had originally worked as it should in gdb:

    b31ae5ed gst_pad_send_event_unchecked en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5571 
    b31a46c1 gst_debug_log en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstinfo.c:444 
    b31b7ead gst_pad_send_event en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5775 
    b666250d WebCore::AppendPipeline::injectProtectionEventIfPending() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/graphics/gstreamer/mse/AppendPipeline.cpp:1360 
    b657b411 WTF::GRefPtr<_GstEvent>::~GRefPtr() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/glib/GRefPtr.h:76 
    b5fb0319 WebCore::HTMLMediaElement::pendingActionTimerFired() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/html/HTMLMediaElement.cpp:1179 
    b61a524d WebCore::ThreadTimers::sharedTimerFiredInternal() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/ThreadTimers.cpp:120 
    b61a5291 WTF::Function<void ()>::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::{lambda()#1}>::call() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/Function.h:101 
    b6c809a3 operator() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:171 
    b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
    b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
    b2ad4223 g_main_context_dispatch en :? 
    b6c80601 WTF::{lambda(_GSource*, int (*)(void*), void*)#1}::_FUN(_GSource*, int (*)(void*), void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:40 
    b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
    b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
    b2adfc49 g_poll en :? 
    b2ad44b7 g_main_context_iterate.isra.29 en :? 
    b2ad477d g_main_loop_run en :? 
    b6c80de3 WTF::RunLoop::run() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:97 
    b6c654ed WTF::RunLoop::dispatch(WTF::Function<void ()>&&) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/RunLoop.cpp:128 
    b5937445 int WebKit::ChildProcessMain<WebKit::WebProcess, WebKit::WebProcessMain>(int, char**) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebKit/Shared/unix/ChildProcessMain.h:64 
    b27b2978 __bss_start en :?

    I hope you find this trick useful and the scripts handy in case you ever need to resort to examining the raw stack to get a meaningful backtrace.

    Happy debugging!

    by eocanha at October 16, 2020 06:07 PM

    October 15, 2020

    Andy Wingo

    on "binary security of webassembly"


    You may have seen an interesting paper cross your radar a couple months ago: Everything Old is New Again: Binary Security of WebAssembly, by Daniel Lehmann, Johannes Kinder and Michael Pradel. The paper makes some strong claims and I would like to share some thoughts on it.

    reader-response theory

    For context, I have been working on web browsers for the last 8 years or so, most recently on the JavaScript and WebAssembly engine in Firefox. My work mostly consists of implementing new features, which if you are familiar with software development translates as "writing bugs". Almost all of those bugs are security bugs, potentially causing Firefox to go from being an agent of the user to an agent of the Mossad, or of cryptocurrency thieves, or anything else.

    Mitigating browser bug flow takes a siege mentality. Web browsers treat all web pages and their corresponding CSS, media, JavaScript, and WebAssembly as hostile. We try to reason about global security properties, and translate those properties into invariants ensured at compile-time and run-time, for example to ensure that a web page from site A can't access cookies from site B.

    In this regard, WebAssembly has some of the strongest isolation invariants in the whole platform. A WebAssembly module has access to nothing, by default: neither functionality nor data. Even a module's memory is isolated from the rest of the browser, both by construction (that's just how WebAssembly is specified) and by run-time measures (given that pointers are 32 bits in today's WebAssembly, we generally reserve a multi-gigabyte region for a module's memory that can contain nothing else).

    All of this may seem obvious, but consider that a C++ program compiled to native code on a POSIX platform can use essentially everything that the person running it has access to: your SSH secrets, your email, all of your programs, and so on. That same program compiled to WebAssembly does not -- any capability it has must have been given to it by the person running the program. For POSIX-like programs, the WebAssembly community is working on a POSIX for the web that standardizes a limited-capability access to data and functionality from the world, and in web browsers, well of course the module has access only to the capabilities that the embedding web page gives to it. Mostly, as the JS run-time accompanying the WebAssembly is usually generated by emscripten, this set of capabilities is a function of the program itself.

    Of course, complex WebAssembly systems may contain multiple agents, acting on behalf of different parties. For example, a module might, through capabilities provided by the host, be able to ask flickr to delete a photo, but might also be able to crawl a URL for photos. Probably in this system, crawling a web page shouldn't be able to "trick" the WebAssembly module into deleting a photo. The C++ program compiled to WebAssembly could have a bug of course, in which case, you get to keep both pieces.

    I mention all of this because we who work on WebAssembly are proud of this work! It is a pleasure to design and build a platform for high-performance code that provides robust capabilities-based security properties.

    the new criticism

    Therefore it was with skepticism that I started reading the Lehmann et al paper. The paper focusses on WebAssembly itself, not any particular implementation thereof; what could be wrong about WebAssembly?

    I found the answer to be quite nuanced. To me, the paper shows three interesting things:

    1. Memory-safety bugs in C/C++ programs when compiled to WebAssembly can cause control-flow edges that were not present in the source program.

    2. Unexpected control-flow in a web browser can sometimes end up in a call to eval with the permissions of the web page, which is not good.

    3. It's easier in some ways to exploit bugs in a C/C++ program when compiled to WebAssembly than when compiled natively, because many common mitigations aren't used by the WebAssembly compiler toolchain.

    Firstly, let's discuss the control-flow point. Let's say that the program has a bug, and you have made an exploit to overwrite some memory location. What can you do with it? Well, consider indirect calls (call_indirect). This is what a compiler will emit for a vtable method call, or for a call to a function pointer. The possible targets for the indirect call are stored in a table, which is a side array of all possible call_indirect targets. The actual target is selected at run-time based on an index; WebAssembly function pointers are just indices into this table.

    So if a function loads an index into the indirect call table from memory, and some exploit can change this index, then you can cause a call site to change its callee. Although there is a run-time type check that occurs at the call_indirect site to ensure that the callee is called with the right type, many functions in a module can have compatible types and thus be callable without an error.
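    To make the mechanism concrete, here is a small native C sketch (the function names are mine, purely illustrative) of how an engine models wasm function pointers as indices into a call table, and why a single corrupted integer is enough to retarget a call between type-compatible functions:

    ```c
    #include <stdio.h>

    /* Illustrative native model of wasm indirect calls (names are mine).
     * Both functions share the type int(int), so the run-time type check
     * at a call_indirect site would accept either index. */
    static int log_access(int x) { return x; }        /* intended target */
    static int run_script(int x) { return x + 1000; } /* type-compatible, unintended */

    static int (*indirect_table[])(int) = { log_access, run_script };

    /* A "function pointer" as stored in linear memory: just an index. */
    struct handler { int fn_index; };

    static int dispatch(struct handler *h, int arg) {
        /* call_indirect: load the index from memory, call through the table. */
        return indirect_table[h->fn_index](arg);
    }

    int main(void) {
        struct handler h = { 0 };
        printf("%d\n", dispatch(&h, 1));  /* calls log_access: prints 1 */
        h.fn_index = 1;  /* one corrupted integer is all an exploit needs... */
        printf("%d\n", dispatch(&h, 1)); /* ...now calls run_script: prints 1001 */
        return 0;
    }
    ```

    Natively this is just an array lookup; in WebAssembly the table and the type check are managed by the engine, but the index itself still lives in attacker-reachable linear memory.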

    OK, so that's not great. But what can you do with it? Well it turns out that emscripten will sometimes provide JavaScript's eval to the WebAssembly module. Usually it will be called only with a static string, but anything can happen. If an attacker can redirect a call site to eval instead of one of the possible targets from the source code, you can (e.g.) send the user's cookies to an attacker-controlled server.

    There's a similar vulnerability regarding changing the operand to eval, instead. Strings are represented in linear memory as well, and there's no write protection on them, even if they are read-only data. If your write primitive can change the string being passed to eval, that's also a win for the attacker. More details in the paper.

    This observation brings us to the last point, which is that many basic mitigations in (e.g.) POSIX deployments aren't present in WebAssembly. There are no OS-level read-only protections for static data, and the compiler doesn't enforce this either. Also, WebAssembly programs have to bundle their own malloc, but the implementations provided by emscripten don't implement the "hardening" techniques. There is no address-space layout randomization, so exploits are deterministic. And so on.

    on mitigations

    It must be said that for most people working on WebAssembly, security "mitigations" are... unsatisfactory. They aren't necessary for memory-safe programs, and they can't prevent memory-unsafe programs from having unexpected behavior. Besides, we who work on WebAssembly are more focussed on the security properties of the WebAssembly program as embedded in its environment, but not on the program itself. Garbage in, garbage out, right?

    In that regard, I think that one answer to this paper is just "don't". Don't ship memory-unsafe programs, or if you do, don't give them eval capabilities. No general mitigation will make these programs safe. Writing your program in e.g. safe Rust is a comprehensive fix to this class of bug.

    But, we have to admit also that shipping programs written in C and C++ is a primary goal of WebAssembly, and that no matter how hard we try, some buggy programs will get shipped, and therefore that there is marginal value to including mitigations like read-only data or even address space randomization. We definitely need to work on getting control-flow integrity protections working well with the WebAssembly toolchain, probably via multi-table support (part of the reference types extension; my colleague Paulo Matos just landed a patch in this area). And certainly Emscripten should work towards minimizing the capabilities set exposed to WebAssembly by the generated JavaScript, notably by compiling away uses of eval by embind.

    Finally, I think that many of the problems identified by this paper will be comprehensively fixed in a different way by more "managed" languages. The problem is that C/C++ pointers are capabilities into all of undifferentiated linear memory. By contrast, handles to GC-managed objects are unforgeable: given object A, you can't get to object B except if object A references B. It would be great if we could bring some of the benefits of this more capability-based approach to in-memory objects to languages like C and C++; more on that in a future note, I think.


    In the end, despite my initial orneriness, I have to admit that the paper authors point out some interesting areas to work on. It's clear that there's more work to do. I was also relieved to find that my code is not at fault in this particular instance :) Onwards and upwards, and until next time, happy hacking!

    by Andy Wingo at October 15, 2020 10:29 AM

    October 13, 2020

    Andy Wingo

    malloc as a service

    Greetings, internet! Today I have the silliest of demos for you: malloc-as-a-service.

    loading walloc...

    JavaScript disabled, no walloc demo. See the walloc web page for more information.

    The above input box, if things managed to work, loads up a simple bare-bones malloc implementation, and exposes "malloc" and "free" bindings. But the neat thing is that it's built without emscripten: it's a standalone C file that compiles directly to WebAssembly, with no JavaScript run-time at all. I share it here because it might come in handy to people working on WebAssembly toolchains, and also because it was an amusing experience to build.


    The name of the allocator is "walloc", in which the w is for WebAssembly.

    Walloc was designed with the following priorities, in order:

    1. Standalone. No stdlib needed; no emscripten. Can be included in a project without pulling in anything else.

    2. Reasonable allocation speed and fragmentation/overhead.

    3. Small size, to minimize download time.

    4. Standard interface: a drop-in replacement for malloc.

    5. Single-threaded (currently, anyway).

    Emscripten includes a couple of good malloc implementations (dlmalloc and emmalloc) which probably you should use instead. But if you are really looking for a bare-bones malloc, walloc is fine.

    You can check out all the details at the walloc project page; a selection of salient bits are below.

    Firstly, to build walloc, it's just a straight-up compile:

    clang -DNDEBUG -Oz --target=wasm32 -nostdlib -c -o walloc.o walloc.c

    The resulting walloc.o is a conforming WebAssembly file on its own, but which also contains additional symbol table and relocation sections which allow wasm-ld to combine separate compilation units into a single final WebAssembly file. walloc.c on its own doesn't import or export anything, in the WebAssembly sense; to make bindings visible to JS, you need to add a little wrapper:

    typedef __SIZE_TYPE__ size_t;

    #define WASM_EXPORT(name) \
      __attribute__((export_name(#name))) \
      name

    // Declare these as coming from walloc.c.
    void *malloc(size_t size);
    void free(void *p);

    void* WASM_EXPORT(walloc)(size_t size) {
      return malloc(size);
    }

    void WASM_EXPORT(wfree)(void* ptr) {
      free(ptr);
    }
    If you compile that to exports.o and link via wasm-ld --no-entry --import-memory -o walloc.wasm exports.o walloc.o, you end up with the walloc.wasm used in the demo above. See your inspector for the URL.

    The resulting wasm file is about 2 kB (uncompressed).

    Walloc isn't the smallest allocator out there. A simple bump-pointer allocator that never frees is the fastest thing you can have. There is also an alternate allocator for Rust, wee_alloc, which is said to be smaller than walloc, though I think it is less space-efficient for small objects. But still, walloc is pretty small.

    implementation notes

    When a C program is compiled to WebAssembly, the resulting wasm module (usually) has associated linear memory. It can be linked in a way that the memory is created by the module when it's instantiated, or such that the module is given a memory by its host. The above example passed --import-memory to the linker, allowing the host to bound memory usage for the module instance.

    The linear memory has the usual data, stack, and heap segments. The data and stack are placed first. The heap starts at the &__heap_base symbol. (This symbol is computed and defined by the linker.) All bytes above &__heap_base can be used by the wasm program as it likes. So &__heap_base is the lower bound of memory managed by walloc.

                                                  memory growth ->
    | data and stack | alignment | walloc page | walloc page | ...
    ^ 0              ^ &__heap_base            ^ 64 kB aligned

    Interestingly, there are a few different orderings of data and stack used by different toolchains. It used to even be the case that the stack grew up. This diagram from the recent "Everything Old is New Again: Binary Security of WebAssembly" paper by Lehmann et al is a good summary:

    The sensible thing to prevent accidental overflow (underflow, really) is to have the stack grow down to 0, with data at higher addresses. But this can cause WebAssembly code that references data to take up more bytes, because addresses are written using variable-length "LEB" encodings that favor short offsets, so it isn't the default, right now at least.
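    The encoding cost is easy to quantify: ULEB128 packs 7 payload bits per byte, so offsets below 2^7 take one byte, below 2^14 two bytes, and so on. A small helper (my own, for illustration) computes the encoded length of an offset:

    ```c
    #include <stdint.h>

    /* Unsigned LEB128 carries 7 payload bits per byte, so low addresses
     * encode shorter -- the reason toolchains keep data at low addresses
     * even though stack-below-data would be the safer layout. */
    static unsigned uleb128_len(uint32_t value) {
        unsigned len = 0;
        do { len++; value >>= 7; } while (value != 0);
        return len;
    }
    ```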

    Anyway! The upper bound of memory managed by walloc is the total size of the memory, which is aligned on 64-kilobyte boundaries. (WebAssembly ensures this alignment.) Walloc manages memory in 64-kb pages as well. It starts with whatever memory is initially given to the module, and will expand the memory if it runs out. The host can specify a maximum memory size, in pages; if no more pages are available, walloc's malloc will simply return NULL; handling out-of-memory is up to the caller.

    Walloc has two allocation strategies: small and large objects.

    big bois

    A large object is more than 256 bytes.

    There is a global freelist of available large objects, each of which has a header indicating its size. When allocating, walloc does a best-fit search through that list.

    struct large_object {
      struct large_object *next;
      size_t size;
      char payload[0];
    };

    struct large_object* large_object_free_list;

    Large object allocations are rounded up to 256-byte boundaries, including the header.

    If there is no object on the freelist that can satisfy an allocation, walloc will expand the heap by the size of the allocation, or by half of the current walloc heap size, whichever is larger. The resulting page or pages form a large object that can satisfy the allocation.
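    As a rough restatement of that growth policy (my own sketch, not walloc's actual code), the number of fresh 64 kB pages to request works out to:

    ```c
    #include <stddef.h>

    #define WALLOC_PAGE_SIZE 0x10000u  /* the 64 kB WebAssembly page size */

    /* Hypothetical restatement of walloc's growth policy: when the
     * freelist can't satisfy a `size`-byte allocation, grow by the
     * allocation size or by half the current heap, whichever is larger,
     * rounded up to whole pages. */
    static size_t pages_to_grow(size_t size, size_t current_heap_bytes) {
        size_t half = current_heap_bytes / 2;
        size_t grow = size > half ? size : half;
        return (grow + WALLOC_PAGE_SIZE - 1) / WALLOC_PAGE_SIZE;
    }
    ```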

    If the best object on the freelist has more than a chunk of space on the end, it is split, and the tail put back on the freelist. A chunk is 256 bytes.
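    A minimal sketch of that best-fit walk (simplified; walloc's real implementation differs in its details) might look like this, returning the link to unsplice so the caller can then split any tail larger than a 256-byte chunk back onto the freelist:

    ```c
    #include <stddef.h>

    /* Simplified model of the large-object freelist (walloc's real
     * struct also carries the payload).  best_fit returns the address
     * of the link pointing at the smallest free object that can hold
     * `size` bytes, so the caller can unlink it; NULL means the heap
     * must grow instead. */
    struct large_object {
        struct large_object *next;
        size_t size;  /* payload bytes, a multiple of 256 */
    };

    static struct large_object **best_fit(struct large_object **list, size_t size) {
        struct large_object **best = NULL;
        for (struct large_object **p = list; *p != NULL; p = &(*p)->next)
            if ((*p)->size >= size && (best == NULL || (*p)->size < (*best)->size))
                best = p;
        return best;
    }
    ```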

    | page header | chunk 1 | chunk 2 | ... | chunk 255 |
    ^ +0          ^ +256    ^ +512                      ^ +64 kB

    As each page is 65536 bytes, and each chunk is 256 bytes, there are therefore 256 chunks in a page. The first chunk in a page that begins an allocated object, large or small, contains a header chunk. The page header has a byte for each of the 256 chunks in the page. The byte is 255 if the corresponding chunk starts a large object; otherwise the byte indicates the size class for packed small-object allocations (see below).

    | page header | large object 1    | large object 2 ...   |
    ^ +0          ^ +256    ^ +512                           ^ +64 kB

    When splitting large objects, we avoid starting a new large object on a page header chunk. A large object can only span where a page header chunk would be if it includes the entire page.

    Freeing a large object pushes it on the global freelist. We know a pointer is a large object by looking at the page header. We know the size of the allocation, because the large object header precedes the allocation. When the next large object allocation happens after a free, the freelist will be compacted by merging adjacent large objects.

    small fry

    Small objects are allocated from segregated freelists. The granule size is 8 bytes. Small object allocations are packed in a chunk of uniform allocation size. There are size classes for allocations of each size from 1 to 6 granules, then 8, 10, 16, and 32 granules; 10 sizes in all. An allocation of 12 granules, for example, will be satisfied from a 16-granule chunk. Each size class has its own free list.

    struct small_object_freelist {
      struct small_object_freelist *next;
    };

    struct small_object_freelist small_object_freelists[10];
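    Mapping a byte request onto one of those ten classes can be sketched like this (an illustrative helper, not walloc's actual code):

    ```c
    #include <stddef.h>

    #define GRANULE_SIZE 8u

    /* The ten small-object size classes, in granules (an illustrative
     * table walk; walloc's real code computes this differently). */
    static const unsigned small_classes[10] = { 1, 2, 3, 4, 5, 6, 8, 10, 16, 32 };

    /* Map a request in bytes to its size class in granules, or 0 if the
     * request is more than 256 bytes and must be a large object. */
    static unsigned small_size_class(size_t bytes) {
        size_t granules = (bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
        for (int i = 0; i < 10; i++)
            if (granules <= small_classes[i])
                return small_classes[i];
        return 0;
    }
    ```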

    When allocating, if there is nothing on the corresponding freelist, walloc will allocate a new large object, then change its chunk kind in the page header to the size class. It then goes through the fresh chunk, threading the objects through each other onto a free list.

    | page header | large object 1    | granules=4 | large object 2' ... |
    ^ +0          ^ +256    ^ +512    ^ +768       + +1024               ^ +64 kB

    In this example, we imagine that the 4-granules freelist was empty, and that the large object freelist contained only large object 2, running all the way to the end of the page. We allocated a new 4-granules chunk, splitting the first chunk off the large object, and pushing the newly trimmed large object back onto the large object freelist, updating the page header appropriately. We then thread the 4-granules (32-byte) allocations in the fresh chunk together (the chunk has room for 8 of them), treating them as if they were instances of struct freelist, pushing them onto the global freelist for 4-granules allocations.

               in fresh chunk, next link for object N points to object N+1
                                     |        |
    granules=4: | (padding, maybe) | object 0 | ... | object 7 |
                                   ^ 4-granule freelist now points here 

    The size classes were chosen so that any wasted space (padding) is less than the size class.

    Freeing a small object pushes it back on its size class's free list. Given a pointer, we know its size class by looking in the chunk kind in the page header.
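    Because pages are 64 kB aligned, both lookups are plain address arithmetic; a hypothetical sketch (the helper names are mine):

    ```c
    #include <stdint.h>

    #define PAGE_SIZE  0x10000u  /* 64 kB */
    #define CHUNK_SIZE 0x100u    /* 256 bytes */

    /* Hypothetical helpers (names mine): pages are 64 kB aligned, so a
     * mask recovers the page base, and the chunk index selects the byte
     * in the page header's kind array. */
    static uintptr_t page_base(uintptr_t addr) {
        return addr & ~(uintptr_t)(PAGE_SIZE - 1);
    }

    static unsigned chunk_index(uintptr_t addr) {
        return (unsigned)((addr & (PAGE_SIZE - 1)) / CHUNK_SIZE);
    }
    ```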

    and that's it

    Hey have fun with the thing! Let me know if you find it useful. Happy hacking and until next time!

    by Andy Wingo at October 13, 2020 01:34 PM

    October 01, 2020

    Sergio Villar

    Closing the gap (in flexbox 😇)

    Flexbox had a lot of early problems, but by mid-May 2020, when our story begins, both Firefox and Chromium had done a lot of work on improving things with this feature. WebKit, however, hadn’t caught up. Prioritizing the incredible amounts of work a web engine requires is difficult. The WebKit implementation was still passable for very many (most) cases of the core features, and it didn’t have problems that caused crashes or something that urgently demanded attention, so engineers dedicated their limited time toward other things. The net result, however, was that as this choice repeated many times, the comparative state of WebKit’s flexbox implementation had fallen behind pretty significantly.
    Web Platform Tests (WPT) is a huge ongoing effort from many people to come up with a very extensive list of tests that could help both spec editors and implementors to make sure we have great compatibility. In the case of flexbox, for example, there are currently 773 tests (2926 subtests) and WebKit was failing a good amount of them. This matters a lot because there are things that flexbox is ideal for, and it is exceptionally widely used. In mid-May, Igalia was contracted to improve things here, and in this post, I’ll explain and illustrate how we did that.

    The Challenge

    The main issues were (in no particular order):
    • min-width:auto and min-height:auto handling
    • Nested flexboxes in column flows
    • Flexboxes inside tables and vice versa
    • Percentages in heights with indefinite sizes
    • WebKit CI not running many WPT flexbox tests
    • and of course… lack of gap support in Flexbox
    Modifying Flexbox layout code is a challenge by itself. Tiny modifications in the source code could cause huge differences in the final layout. You might even have a patch that passes all the tests and regresses multiple popular web sites.
    The good news is that we were able to tackle most of those issues. Let’s review what changes you could eventually expect from future releases of Safari (note that Apple doesn’t disclose information about future products and/or releases) and the other WebKit-based browsers (like GNOME Web).

    Flexbox gaps 🥳🎉

    Probably one of the features in WebKit most awaited by web developers. It’s finally here, after Firefox and Chrome landed it not so long ago. The implementation was initially inspired by the one in Chrome but then diverged a bit in the final version of the patch. The important thing is that the behaviour should be the same; at the very least, all the WPT tests related to gaps now pass in WebKit trunk.
    <div style="display: flex; flex-wrap: wrap; gap: 1ch">
      <div style="background: magenta; color: white">Lorem</div>
      <div style="background: green; color: white">ipsum</div>
      <div style="background: orange; color: white">dolor</div>
      <div style="background: blue; color: white">sit</div>
      <div style="background: brown; color: white">amet</div>

    Tables as flex items

    Tables should obey the flex container sizing whenever they are flex items. As can be seen in the examples below, the tables’ layout code was kicking in and ignoring the constraints set by the flex container. Tables should do what the flex algorithm mandates and thus allow being stretched/squeezed as required.
    <div style="display:flex; width:100px; background:red;">
      <div style="display:table; width:10px; max-width:10px; height:100px; background:green;">
        <div style="width:100px; height:10px; background:green;"></div>

    Tables with items exceeding 100% of the available size

    This is the case of tables placed inside flex items. The automatic table layout algorithm was generating tables with unlimited widths when the sum of the sizes of their columns (expressed in percentages) exceeded 100%. It was impossible to fulfill the constraints set by the table and flexbox algorithms at the same time.
    <div style="display:flex; width:100px; height:100px; align-items:flex-start; background:green;">
      <div style="flex-grow:1; flex-shrink:0;">
        <table style="height:50px; background:green;" cellpadding="0" cellspacing="0">
            <td style="width:100%; background:green;"> </td>
            <td style="background:green;"> </td>
    Note how the table was growing indefinitely (I cropped the “Before” picture to fit in the post) to the right before the fix.

    Alignment in single-line flexboxes

    An interesting case. The code considered single-line flexboxes to be those where all the flex items were placed in a single line after computing the required space for them. Though sensible, that’s not what a single-line flexbox is: it’s a flex container with flex-wrap:nowrap. This means that a flex container with flex-wrap:wrap whose children do not need more than one flex line to be placed is not a single-line flex container from the spec’s POV (corollary: implementing specs is hard).
    <div style="display: flex; flex-wrap: wrap; align-content: flex-end; width: 425px; height: 70px; border: 2px solid black">
      <div style="height: 20px">This text should be at the bottom of its container</div>

    Percentages in flex items with indefinite sizes

    One of the trickiest ones. Although it didn’t involve a lot of code, it caused two serious regressions, in Youtube’s upload form and when viewing Twitter videos in fullscreen, which required some previous fixes and delayed the landing of this patch a bit. Note that this behaviour was really contentious from the pure specification POV, as there were many changes over time; defining a good behaviour is really complicated. Without going into too much detail, flexbox has a couple of cases where sizes are considered definite when they are theoretically indefinite. In this case we consider that if the flex container main size is definite then the post-flexing size of flex items is also treated as definite.
    <div style="display: flex; flex-direction: column; height: 150px; width: 150px; border: 2px solid black;">
        <div style="height: 50%; overflow: hidden;">
          <div style="width: 50px; height: 50px; background: green;"></div>
      <div style="flex: none; width: 50px; height: 50px; background: green;"></div>

    Hit testing with overlapping flex items

    There were some issues with pointer events passing through overlapping flex items (due to negative margins, for example). This was fixed by letting the hit testing code proceed in reverse (the opposite of painting) order-modified document order instead of using the raw order from the DOM.
    <div style="display:flex; border: 1px solid black; width: 300px;">
      <a style="width: 200px;" href="#">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua</a>
      <div style="margin-left: -200px; width: 130px; height: 50px; background: orange;"></div>
    In the “Before” case, hit testing bypassed the orange block, so the cursor showed a hand because it detected that it was hovering over a link. After the fix, the cursor is properly rendered as an arrow because the orange block covers the link underneath.

    Computing percentages with scrollbars

    In this case the issue was that, in order to compute percentages in heights, we were incorrectly using the size of the scrollbars too.
    <div style="display: inline-flex; height: 10em;">
      <div style="overflow-x: scroll;">
        <div style="width: 200px; height: 100%; background: green"></div>
    Note that in the “After” picture the horizontal scrollbar background is visible while in the “Before” the wrong height computation made the flex item overlap the scrollbar.

    Image items with specific sizes

    The flex layout algorithm needs the intrinsic sizes of the flex items to compute their sizes and the size of the flex container. Changes to those intrinsic sizes should trigger new layouts, and the code was not doing that.
    <!-- Just to showcase how the img below is not properly sized -->
    <div style="position: absolute; background-color: red; width: 50px; height: 50px; z-index: -1;"></div>
    <div style="display: flex; flex-direction: column; width: 100px; height: 5px;">
      <img style="width: 100px; height: 100px;" src="">

    Nested flexboxes with ‘min-height: auto’

    Another tricky one, again related to the handling of nested column flexboxes: as in the previous issue, the problem was that we were not supporting this case. For those wanting a deeper understanding of the issue, this bug was about implementing section 4.5 of the specs. This was one of the more complicated ones to fix; Edward Lorenz would love that part of the layout code, where the slightest change in one of those source code lines can trigger huge changes in the final rendering.
    <div style='display:flex; flex-direction: column; overflow-y: scroll; width: 250px; height: 250px; border: 1px solid black'>
      <div style='display:flex;'>
        <div style="width: 100px; background: blue"></div>
        <div style='width: 120px; background: orange'></div>
        <div style='width: 10px; background: yellow; height: 300px'></div>
    As it can be seen, in the “Before” picture the blue and orange blocks are sized differently to the yellow one. That’s fixed in the “After” picture.

    Percentages in quirks mode

    Another one affecting how percentages are computed in heights, but this one specific to quirks mode. We’re matching now Firefox, Chrome and pre-Chromium Edge, i.e., flexbox should not care much about quirks mode since it was invented many years after quirky browsers dominated the earth.
    <!DOCTYPE html PUBLIC>
    <div style="width: 100px; height: 50px;">
      <div style="display: flex; flex-direction: column; outline: 2px solid blue;">
        <div style="flex: 0 0 50%"></div>

    Percentages in ‘flex-basis’

    Percentages were generally working fine inside flex-basis; however, there was one particularly problematic case. It arose whenever that percentage was referring to, surprise, an indefinite height. And again, we’re talking about nested flexboxes with column flows. Indeed, definite/indefinite sizing is one of the toughest things to get right from the layout POV. In this particular case, the fix was to ignore the percentages and treat them as height: auto.
    <div style="display: flex; flex-direction: column; width: 200px;">
      <div style="flex-basis: 0%; height: 100px; background: red;">
        <div style="background: lime">Here's some text.</div>

    Flex containers inside STF tables

    Fixing a couple of test cases submitted by an anonymous Opera employee 8 (!) years ago. This is another case of competing layout contexts trying to do things their own way.
    <div style="display: table; background:red">
       <div style="display: flex; width: 0px">
          <p style="margin: 1em 1em;width: 50px">Text</p>
          <p style="margin: 1em 1em;width: 50px">Text</p>
          <p style="margin: 1em 1em;width: 50px">Text</p>
    After the fix the table is properly sized to 0px width and thus no red is seen.


    These examples are just some interesting ones I’ve chosen to highlight. In the end, almost 50 new flexbox tests are passing in WebKit that weren’t back in May! I wouldn’t like to forget the great job done by my colleague Carlos Lopez, who imported tons of WPT flexbox tests into the WebKit source tree. He also performed awesome triage work, which made my life a lot easier.
    Investing in interoperability is a huge deal for the web. It’s good for everyone, from spec authors to final users, including browser vendors, downstream ports and web authors. So if you care about the web, or your business orbits around web technologies, you should definitely promote and invest in interoperability.

    Implementing standards or fixing bugs in web engines is the kind of work we happily do at Igalia on a daily basis. We are the second largest contributor to both WebKit and Chrome/Blink, so if you have an annoying bug on a particular web engine (Gecko and Servo as well) that you want fixed, don’t hesitate to contact us; we’d be glad to help. Also, should you want to be part of a workers-owned cooperative with an assembly-based decision-making process and a strong focus on free software technologies, join us!


    Many thanks to WebKit reviewers from Apple and Igalia like Darin Adler, Manuel Rego, Javier Fernández or Daniel Bates who made the process really easy for me, always providing very nice feedback for the patches I submitted.
    I’m also really thankful to Googlers like Christian Biesinger, David Grogan and Stephen McGruer who worked on the very same things in Blink and/or provided very nice guidance and support when porting patches.

    by svillar at October 01, 2020 11:34 AM

    September 28, 2020

    Adrián Pérez

    Sunsetting NPAPI support in WebKitGTK (and WPE)

    1. Summary
    2. What is NPAPI?
    3. What is NPAPI used for?
    4. Why are NPAPI plug-ins being phased out?
    5. What are other browsers doing?
    6. Is WebKitGTK following suit?


    Here’s a tl;dr list of bullet points:

    • NPAPI is an old mechanism to extend the functionality of a web browser. It is time to let it go.
    • One year ago, WebKitGTK 2.26.0 removed support for NPAPI plug-ins which used GTK2, but the rest of plug-ins kept working.
    • WebKitGTK 2.30.x will be the last stable series with support for NPAPI plug-ins at all. Version 2.30.0 was released a couple of weeks ago.
    • WebKitGTK 2.32.0, due in March 2021, will be the first stable release to ship without support for NPAPI plug-ins.
    • We have already removed the relevant code from the WebKit repository.
    • While the WPE WebKit port allowed running windowless NPAPI plug-ins, this was never advertised nor supported by us.

    What is NPAPI?

    In 1995, Netscape Navigator 2.0 introduced a mechanism to extend the functionality of the web browser. That was NPAPI, short for Netscape Plugin Application Programming Interface. NPAPI allowed third parties to add support for new content types; for example, Future Splash (.spl files), which later became Flash (.swf).

    When a NPAPI plug-in is used to render content, the web browser carves a hole in the rectangular location where content handled by the plug-in will be placed, and hands off the rendering responsibility to the plug-in. This would end up being a recipe for trouble, as we will see later.

    What is NPAPI used for?

    A number of technologies have used NPAPI over the years for different purposes:

    • Displaying multimedia content using the Flash Player or Silverlight plug-ins.
    • Running rich Java™ applications in the browser.
    • Displaying documents in non-Web formats (PDF, DjVu) inside browser windows.
    • A number of questionable practices, like VPN client software using a browser plug‑in for configuration.

    Why are NPAPI plug-ins being phased out?

    The design of NPAPI makes the web browser give full responsibility to plug-ins: the browser has no control whatsoever over what plug-ins do to display content, which makes it hard to make them participate in styling and layout. More importantly, plug-ins are compiled, native code over which browser developers cannot exercise quality control, which resulted in a history of security incidents, crashes, and browser hangs.

    Today, Web browsers’ rendering engines can do a better job than plug-ins, more securely and efficiently. The Web platform is mature and there is no longer a place for blindly trusting third-party code to behave well. NPAPI is a 25-year-old technology showing its age: it has served its purpose, but it is no longer needed.

    The last nail in the coffin was Adobe’s 2017 announcement that the Flash plug-in would be discontinued in January 2021.

    What are other browsers doing?

    Glad that you asked! It turns out that all major browsers have plans to incrementally reduce how much NPAPI usage they allow, until they eventually remove it entirely.


    Let’s take a look at the Firefox roadmap first:

    Version Date Plug-in support changes
    47 June 2016 All plug-ins except Flash need the user to click on the element to activate them.
    52 March 2017 Only loads the Flash plug‑in by default.
    55 August 2017 Does not load the Flash plug‑in by default, instead it asks users to choose whether sites may use it.
    56 September 2017 On top of asking the user, Flash content can only be loaded from http:// and https:// URIs; the Android version completely removes plug‑in support. There is still an option to allow always running the Flash plug-in without asking.
    69 September 2019 The option to allow running the Flash plug-in without asking the user is gone.
    85 January 2021 Support for plug-ins is gone.
    Table: Firefox NPAPI plug-in roadmap.

    In conclusion, the Mozilla folks have been slowly boiling the frog for the last four years and will completely remove the support for NPAPI plug-ins coinciding with the Flash player reaching EOL status.

    Chromium / Chrome

    Here’s a timeline of the Chromium roadmap, merged with some highlights from their Flash Roadmap:

    Version Date Plug-in support changes
    ? Mid 2014 The interface to unblock running plug-ins is made more complicated, to discourage usage.
    ? January 2015 Plug-ins blocked by default, some popular ones allowed.
    42 April 2015 Support for plug-ins disabled by default, setting available in chrome://flags.
    45 September 2015 Support for NPAPI plug-ins is removed.
    55 December 2016 Browser does not advertise Flash support to web content, the user is asked whether to run the plug-in for sites that really need it.
    76 July 2019 Flash support is disabled by default, can still be enabled with a setting.
    88 January 2021 Flash support is removed.
    Table: Chromium NPAPI/Flash plug-in roadmap.

    Note that Chromium continued supporting Flash content even after it removed support for NPAPI in 2015: in a display of their acute NIH syndrome, Google came up with PPAPI, a replacement for NPAPI that was basically designed to support Flash. PPAPI is also used by Chromium’s built-in PDF viewer, which will nevertheless go away as well, coinciding with Flash reaching EOL.


    In the Apple camp, the story is much easier to tell:

    • Their handheld devices—iPhone, iPad, iPod Touch—never supported NPAPI plug-ins to begin with. Easy-peasy.
    • On desktop, Safari has required explicit approval from the user to allow running plug-ins since June 2016. The Flash plug-in has not been preinstalled in Mac OS since 2010, requiring users to manually install it.
    • NPAPI plug-in support will be removed from WebKit by the end of 2020.

    Is WebKitGTK following suit?

    Yes. In September 2019 WebKitGTK 2.26 removed support for NPAPI plug-ins which use GTK2. This included Flash, but the PPAPI version could still be used via freshplayerplugin.

    In March 2021, when the next stable release series is due, WebKitGTK 2.32 will remove the support for NPAPI plug-ins. This series will receive updates until September 2021.

    The above gives a full two years since we started restricting which plug-ins can be loaded before they stop working, which we reckon should be enough. At the time of writing this article, the support for plug-ins was already gone from the WebKit source for both the GTK and WPE ports.

    Yes, you read that right: WPE supported NPAPI plug-ins, but in a limited fashion: only windowless plug-ins worked. In practice, making NPAPI plug-ins work on Unix-like systems requires using the XEmbed protocol to allow them to place their rendered content overlaid on top of WebKit’s, but the WPE port does not use X11. Given that we never advertised nor officially supported NPAPI in the WPE port, we do not expect any trouble removing it.

    by aperez ( at September 28, 2020 10:00 PM

    September 27, 2020

    Ricardo García

    My participation in XDC 2020

    The 2020 X.Org Developers Conference took place from September 16th to September 18th. For the first time, due to the ongoing COVID-19 pandemic, it was a fully virtual event. While this meant that some interesting bits of the conference, like the hallway track, catching up in person with some people and doing some networking, were not entirely possible this time, I have to thank the organizers for their work in making the conference an almost flawless event. The conference was livestreamed directly to YouTube, which was the main way for attendees to watch the many different talks. freenode was used for the hallway track, with most discussions happening in the ##xdc2020 IRC channel. In addition, ##xdc2020-QA was used for attendees wanting to add questions or comments at the end of each talk.

    Igalia was a silver sponsor of the event and we also participated with 5 different talks, including one by yours truly.

    My talk about VK_EXT_extended_dynamic_state was based on my previous blog post, but it includes a more detailed explanation of the extension as well as more detailed comments and an explanation about how the extension was created. I took advantage of the possibility of using pre-recorded videos for the conference, as I didn’t fully trust my kids wouldn’t interrupt me in the middle of the talk. In the end I think it was a good idea and, from the presenter’s point of view, I also found that using a script and following it strictly (to some degree) prevented distractions and made the talk a bit shorter and more to the point, because I tend to beat around the bush when talking live. You can watch my talk in the embedded video below.

    [Embedded video: XDC 2020 | How the Vulkan VK_EXT_extended_dynamic_state extension came to be]

    Slides for the talk are also available and below you can find a transcript of the talk.

    <Title slide>

    Hello, my name is Ricardo García, I work at Igalia as part of its Graphics team and today I will be talking about the extended dynamic state Vulkan extension. At Igalia I was involved in creating CTS tests for this extension and also in reviewing the spec when writing those tests, in a very minor capacity. This extension is pretty simple and very useful, and the talk is divided in two parts. First I will talk about the extension itself and then I’ll reflect on a few bits about how this extension was created that I consider quite interesting.

    <Part 1>

    <Extension description slide>

    So, first, what does this extension do? Its documentation says:

    VK_EXT_extended_dynamic_state adds some more dynamic state to support applications that need to reduce the number of pipeline state objects they compile and bind.

    In other words, as you will see, it makes Vulkan pipeline objects more flexible and easier to use from the application point of view.

    <Pipeline diagram slide>

    So, to give you some context, this is [the] typical graphics pipeline representation in many APIs like OpenGL, DirectX or Vulkan. You’ve probably seen variations of this a million times. The pipeline is divided in stages, some of them fixed-function, some of them programmable with shaders. Each stage usually takes some data from the previous stage and produces data to be consumed by the next one, apart from using other external resources like buffers or textures or whatever. What’s the Vulkan approach to represent this process?

    <Creation structure slide>

    Vulkan wants you to specify almost every single aspect of the previous pipeline in advance by creating a graphics pipeline object that contains information about how every stage should work. And, once created, most of these pipeline parameters or configuration cannot be changed. As you can see here, this includes shader programs, how vertices are read and processed, depth and stencil tests, you name it. Pipeline objects are heavy objects in Vulkan and they are hard to create. Why does Vulkan want you to do that? The answer has always been this keyword: “optimization”. Giving all the information in advance gives more chances for every current or even future implementations to optimize how the pipeline works. It’s the safe choice. And, despite this, you can see there’s a pipeline creation parameter with information about dynamic state. These are things that can be changed when using the pipeline without having to create a separate and almost identical pipeline object.

    <New dynamic states slide>

    What the extension does should be pretty obvious now: it adds a bunch of additional elements that can be changed on the fly without creating additional pipelines. This includes things like primitive topology, front face vertex order, vertex stride, cull mode and more aspects of the depth and stencil tests, etc. A lot of things. Using them if needed means fewer pipeline objects, fewer pipeline cache accesses and simpler programs in general. As I said before, it makes Vulkan pipeline objects more flexible and easier to use from the application point of view, because more pipeline aspects can be changed on the fly when using these pipeline objects instead of having to create separate objects for each combination of parameters you may want to modify at runtime. This may make the application logic simpler and it can also help when Vulkan is used as the backend, for example, to implement higher level APIs that are not so rigid regarding pipelines. I know this extension is useful for some emulators and other API-translating projects.

    <New commands slide>

    Together with those it also introduces a new set of functions to change those parameters on the fly when recording commands that will use the pipeline state object.

    <Pipeline diagram slide>

    So, knowing that and going back to the graphics pipeline, the obvious question is: does this impact performance? Aren’t we reducing the number of optimization opportunities the implementation has if we use these additional dynamic states? In theory, yes. In practice, it depends on the implementation. Many GPUs and Vulkan drivers out there today have some pipeline aspects that are considered “dynamic” in the sense that they are easily changed on the fly without a perceptible impact in performance, while others are truly important for optimization. For example, take shaders. In Vulkan they’re provided as SPIR-V programs that need to be translated to GPU machine code and creating pipelines when the application starts makes it easy to compile shaders beforehand to avoid stuttering and frame timing issues later, for example. And not only that. As you create pipelines, you’re telling the implementation which shaders are used together. Say you have a vertex shader that outputs 4 parameters, and it’s used in a pipeline with a fragment shader that only uses the first 2. When creating the pipeline the implementation can decide to discard instructions that are only related to producing the 2 extra unused parameters in the vertex shader. But other things like, for example, changing the front face? That may be trivial without affecting performance.

    <Part 2>

    <Eric Lengyel tweet slide>

    Moving on to the second part, I wanted to talk about how this extension was created. It all started with an “angry” tweet by Eric Lengyel (sorry if I’m not pronouncing it correctly) who also happens to be the author of the previous diagram. He complained in Twitter that you couldn’t change the front face dynamically, which happens to be super useful for rendering reflections, and pointed to an OpenGL NVIDIA extension that allowed you to do exactly that.

    <Piers Daniell reply slide>

    This was noticed by Piers Daniell from NVIDIA, who created a proposal in Khronos. That proposal was discussed with other vendors (software and hardware) that chimed in on aspects that could be or should be made dynamic if possible, which resulted in the multi-vendor extension we have today.

    <RADV implementation slide>

    In fact, RADV was one of the first Vulkan implementations to support the extension thanks to the effort by Samuel Pitoiset.

    <Promoters of Khronos slide>

    This whole process got me thinking Khronos may sometimes be seen from the outside as this closed silo composed mainly of hardware vendors. Certainly, there are a lot of hardware vendors but if you take the list of promoter members you can see some fairly well-known software vendors as well, and API usability and adoption are important for both groups. There are many people in Khronos trying to make Vulkan easier to use even if we’re all aware that’s somewhat in conflict with providing a lower level API that should let you write performant applications.

    <Khronos Contributors slide>

    If you take a look at the long list of contributor members, that’s only shown partially here because it’s very long, you’ll notice a lot of actors from different backgrounds as well.

    <Vulkan-Docs repo slide>

    Moreover, while Khronos and its different Vulkan working groups are far from an open source project or community, I believe they’re certainly more open to contributions than what many people think. For example, the Vulkan spec is published in a GitHub repo with instructions to build it (the spec is written in AsciiDoc) and this repo is open for issues and pull requests. So, obviously, if you want to change major parts of Vulkan and how some aspects of the API work, you’re going to meet opposition and maybe you should be joining Khronos to discuss things internally with everyone involved in there. However, while an angry tweet was enough for this particular extension, if you’re not well-known you may want to create an issue instead, exposing your use case and maybe with other colleagues chiming in on details or supporting of your proposal. I know for a fact issues created in this public repo are discussed in periodic Khronos meetings. It may take some weeks if people are busy and there’s a lot of things on the table, but they’re going to end up being discussed, which is a very good thing I was happy to see, and I want to put emphasis on that. I would like Khronos to continue doing that and I would like more people to take advantage of the public repos from Khronos. I know the people involved in the Vulkan spec want to make the text as clear as possible. Maybe you think some paragraph is confusing, or there’s a missing link to another section that provides more context, or something absurd is allowed by the spec and should be forbidden. You can try a reasoned pull request for any of those. Obviously, no guarantees it will go in, but interesting in any case.

    <Blend state tweet slide>

    For example, in the Twitter thread I showed before, I tweeted a reply when the extension was published and, among a few retweets, likes and quoted replies I found this very interesting Tweet I’m showing you here, asking for the whole blend state to be made dynamic and indicating that would be game-changing for some developers and very interesting for web browsers. We all want our web browsers to leverage the power of the GPU as much as possible, right? So why not? I thought creating an issue in the public repo for this case could be interesting.

    <Dynamic blend state issue slide>

    And, in fact, it turns out someone had already created an issue about it, as you can see here.

    <Tom Olson reply slide>

    And in this case, in this issue, Tom Olson from ARM replied that the working group had been discussing it and it turns out in this particular case existing hardware doesn’t make it easy to make the blend state fully dynamic without possibly recompiling shaders under the hood and introducing unwanted complexity in the implementations, so it was rejected for now. But even if, in this case, the reply is negative, you can see what I was mentioning: the issue reached the working group, it was considered, discussed and the issue creator got a reply and feedback. And that’s what I wanted to show you.

    <Final slide>

    And that’s all. Thanks for listening! Any questions maybe?

    The talk was followed by a Q&A section moderated, in this case, by Martin Peres. In the text below RG stands for Ricardo Garcia and MP stands for Martin Peres.

    RG: OK…​ Hello everyone!

    MP: OK, so far we do not have any questions. Jason Ekstrand has a comment: "We (the Vulkan Working Group) has had many contributions to the spec".

    RG: Yeah, yeah, exactly. I mean, I don’t think it’s very well known but yeah, indeed, there are a lot of people who have already contributed issues, pull requests and there have been many external contributions already so these things should definitely continue and even happen more often.

    MP: OK, I’m gonna ask a question. So…​ how much do you think this is gonna help layering libraries like Zink because I assume, I mean, one of the big issues with Zink is that you need to have a lot of pipelines precompiled and…​ is this helping Zink?

    RG: I don’t know if it’s being used. I think I did a search yesterday to see if Zink was using the extension and I don’t remember if I found anything specific so maybe the Zink people can answer the question but, yeah, it should definitely help in those cases because OpenGL is not as strict as Vulkan regarding pipelines obviously. You can change more things on the fly and if the underlying Vulkan implementation supports extended dynamic state it should make it easier to emulate OpenGL on top of Vulkan. For example, I know it’s being used by VKD3D right now to emulate DirectX 12 and there’s a emulator, a few emulators out there which are using the extension because, you know, APIs for consoles are different and they can use this type of extensions to make code better.

    MP: Agree. Jason also has another comment saying there are even extensions in flight from the Mesa community for some windowing-system related stuff.

    RG: Yeah, I was happy to see yesterday…​ I think it was yesterday, well, here at this XDC that the present timing extension pull request is being handled right now on GitHub which I think is a very good thing. It’s a trend I would like to [see] continue because, well, I guess sometimes, you know, the discussions inside the Working Group and inside Khronos may involve IP or whatever so it’s better to have those discussions sometimes in private, but it is a good thing that maybe, you know, there are a few extensions that could be handled publicly in GitHub instead of the internal tools in Khronos. So, yeah, that’s a good thing and a trend I would like to see continue: extensions discussed in public.

    MP: Yeah, sounds very cool. OK, I think we do not have any question… other questions or comments so let’s say thank you very much and…

    RG: Thank you very much and let me congratulate you for…​ to the organizers for organizing XDC and…​ everyone, enjoy the rest of the day, thank you.

    MP: Thank you! See you in 13m 30s for the status of freedesktop.org’s GitLab cloud hosting.

    Regarding Zink, at the time I’m writing this, there’s an in-progress merge request for it to take advantage of the extension. Regarding the present timing extension, its pull request is at GitHub and you can also watch a short talk from Day One of the conference. I also mentioned the extension being used by VKD3D. I was specifically referring to the VKD3D-Proton fork.

    References used in the talk:

    September 27, 2020 01:25 PM

    Delan Azabani

    My internship with Igalia

    I was looking for a job late last year when I saw a tweet about a place called Igalia. The more I learned about them, the more interested I became, and before long I applied to join their Web Platform team. I didn’t have enough experience for a permanent position, but they did offer me a place in their Coding Experience program, which as far as I can tell is basically an internship, and I thoroughly enjoyed it. Here’s an overview of what I did and what I learned.



    Why Igalia?

    There’s a wide range of work I can do as a computer programmer, but the vast majority of it seems to be in closed-source web applications, as an employee with a limited voice in the decisions that affect my work.

    At the time, all of my work since I graduated had been exactly that, or in builds and releases for said applications. That was interesting enough for a while, but I wanted to make a bigger impact, work on something I actually cared about of my own volition, and ideally move towards getting paid to do systems programming.

    Igalia appeals to me, with their focus on open-source projects, systems programming, and standards work. Even better, as a field, the web platform has been my one true love, and building things on it is how I got into programming over 15 years ago. But what cements their place as my “dream job” is how they work: as a distributed worker’s cooperative.

    What I mean by “distributed” is that members can work from anywhere in the world, paid in a way that fairly adjusts for location, and in whatever setting they thrive in (such as home). This alone was huge, as someone who can’t sustainably work in an office five days a week, had to move 4000 km away from home to do so, and had just left an employer that was actively hostile to remote work.

    Andy Wingo (author of that tweet) offers some insight into the “worker’s cooperative” part in these three posts. Igalia’s rough goal here, as far as I can tell, is that everyone gets a voice in deciding what the collective works on and how (to the extent that those decisions affect them), equal ownership of the business, and equivalent pay modulo effort and cost of living. This appeals to me as an anarchist, but also as a worker that has often been on the receiving end of unethical work, poor working conditions, and lack of autonomy.


    One goal of my internship was to help the Web Platform team with their MathML work, but I was also there to familiarise myself with working on the web platform, and my first task was purely for the latter.

    Many parts of the web platform have case-insensitive keywords that control an API or language feature, like link@rel (the <link rel="..."> attribute), but thanks to Unicode, there’s more than one level of case-insensitivity. Unicode case-insensitivity won’t break backwards compatibility of web content over time, but to improve interoperability and simplify implementations, things like the HTML spec tend to explicitly call for ASCII case-insensitivity, at least for keywords that are nominally ASCII.

    That makes Blink’s widespread use of Unicode case-insensitivity in these situations a bug, and my job was to fix that bug, which sounds simple enough, until you realise that doing so is technically a breaking change. You see, there are already a couple of non-ASCII characters that can introduce esoteric ways to write many of those keywords.

    More importantly, the web platform is almost1 unique in that breaking existing content is, in general, not allowed. But this time a breaking change was unavoidable, like any time where an implementation is fixed to align with the standard, or some behaviour is standardised after incompatible implementations appear. There might be content out there that relies on something like <link rel="ſtylesheet"> because it worked on Chromium.
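The two levels of case-insensitivity can be sketched in a few lines of Python; the helper names here are mine, not Blink’s actual code:

```python
# Sketch of Unicode vs ASCII case-insensitive matching. ascii_ci_equal
# folds only A-Z; unicode_ci_equal uses full Unicode case folding.
def ascii_ci_equal(a: str, b: str) -> bool:
    fold = lambda s: "".join(
        chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s
    )
    return fold(a) == fold(b)

def unicode_ci_equal(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()

# U+017F LATIN SMALL LETTER LONG S ("ſ") case-folds to "s" under Unicode
# rules, so "ſtylesheet" matches "stylesheet" only with Unicode folding:
assert unicode_ci_equal("\u017ftylesheet", "stylesheet")
assert not ascii_ci_equal("\u017ftylesheet", "stylesheet")
assert ascii_ci_equal("STYLESHEET", "stylesheet")   # ASCII keywords still match
```

The last assertion is the point of the fix: for nominally ASCII keywords, ASCII folding is all that authors legitimately rely on.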

    There are a few ways to minimise the impact of these breaking changes, like adding analytics to browsers to count how many pages would be affected, or searching archives of web content, but in this case we decided the risk was low enough that I could simply fix the bug and write some tests.

    Fixing the bug

    It’s hard to get a usable LSP setup going for a project as big as a browser. I switched between ccls and clangd a bunch of times, but I never quite got either working too well. My main machine is also getting pretty long in the tooth, which made indexing take forever and updating my branches expensive. I considered writing an LSP client that would allow me to kick off an index on one of Igalia’s 128-thread build boxes without an editor, but I eventually settled on using Chromium Code Search to jump around and investigate things. Firefox similarly has Searchfox2, but WebKit doesn’t yet have a public counterpart3.

    I was looking for callers of three deprecated functions, but not all of them were relevant to the bug, and not all of those needed tests, and so on. To help me analyse and categorise all of the potential call sites, I wrote some pretty intricate regular expressions for Sublime Text 2. This one finds all callers of DeprecatedEqualIgnoringCase, with two arguments, where one of them is an ASCII literal that wouldn’t need new tests (skSK):

    (?<literal>"(?:(?=[ -~])[^"skSK]|(?&escape))*"){0}

    After my first patch, which I wrote by hand, I also used those to do the actual replacing, maintaining a huge analysis of all the cases that remained after my second patch.
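As a much-simplified illustration (nothing like the full PCRE above, which relied on named groups and subroutine calls), a single-purpose Python regex for the easiest category of call site might look like this; the function name is real, but the pattern is my sketch:

```python
import re

# Find calls of the deprecated comparison function whose second
# argument is a plain double-quoted string literal (no escapes).
CALLER = re.compile(
    r'DeprecatedEqualIgnoringCase\s*\(\s*'  # the deprecated function
    r'[^,]+,\s*'                            # first argument, up to the comma
    r'"([^"\\]*)"'                          # second argument: a bare literal
    r'\s*\)'
)

line = 'if (DeprecatedEqualIgnoringCase(rel, "stylesheet"))'
match = CALLER.search(line)
assert match is not None
assert match.group(1) == "stylesheet"

# Calls to other functions are not matched:
assert CALLER.search('EqualIgnoringASCIICase(rel, value)') is None
```

Real triage needs much more than this (escapes, macros, multi-line calls), which is exactly why the original expressions grew so intricate.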

    Writing some tests

    Each of the major engines has its own web content tests, and automated tests are strongly preferred over manual tests if at all possible. All of the tests I wrote were automated, and most were Web Platform Tests, which are especially cool because they’re a shared suite of web content tests that can be run on any browser. Chromium and Firefox even automatically upstream changes to their vendored WPT trees!

    Many of my tests were for values of HTML attributes whose invalid value default was a different state to the keyword’s state. In these cases, I didn’t even need to assert anything about the attribute’s actual behaviour! All I had to do was write a tag, read the attribute in JavaScript, and check if the value we get back corresponds to the intended feature (bad) or the invalid value default (good).
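That testing trick leans on how enumerated attributes are specified. A toy model of the idea, with a made-up keyword table rather than any real attribute’s:

```python
# Toy model of an enumerated attribute with an "invalid value default".
# The keyword table and default below are hypothetical.
KEYWORDS = {"yes": "yes", "no": "no"}
INVALID_VALUE_DEFAULT = "no"

def ascii_lowercase(s: str) -> str:
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)

def reflected_state(attr_value: str) -> str:
    # Keywords are matched ASCII case-insensitively; any other value
    # maps to the invalid value default.
    return KEYWORDS.get(ascii_lowercase(attr_value), INVALID_VALUE_DEFAULT)

assert reflected_state("YES") == "yes"       # ASCII case folding applies
assert reflected_state("bogus") == "no"      # invalid value default
# A correctly ASCII-case-insensitive engine must NOT accept "ye\u017f"
# ("yeſ", long s) as the keyword "yes":
assert reflected_state("ye\u017f") == "no"
```

So a test only has to set the attribute to a tricky spelling and check which state comes back; no behavioural assertions needed.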

    Some legacy HTML attributes are now specified in terms of CSS “presentational hints”, so I checked the results of getComputedStyle for those, but the coolest tests I learned to write were reftests. Very few web platform features guarantee that every user agent on every platform will render them identically down to the pixel, and over time, unrelated platform changes can affect a test’s expected rendering. Both of these things are ok, but they make it impractical for tests to compare web content against screenshots. Reftests consist of a test page that uses the feature being tested, and a reference page that should look the same without using the feature. The reference page is like a screenshot, but it’s subject to all of the same variables as the test page, such as font rendering.
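Stripped of browser specifics, the reftest idea is just an equality check between two renderings produced under identical conditions. A toy sketch, where the renderer and the “feature” (numeric character references) are invented for illustration:

```python
import re

def render(markup: str, platform_font: str = "DejaVu Sans"):
    # The one "feature" this fake renderer supports: numeric character
    # references like &#x41; for "A".
    text = re.sub(r"&#x([0-9A-Fa-f]+);",
                  lambda m: chr(int(m.group(1), 16)), markup)
    return (platform_font, text)

def reftest_passes(test_page: str, ref_page: str, **conditions) -> bool:
    # Both pages go through the same renderer with the same conditions,
    # so platform variables (the font here) cancel out.
    return render(test_page, **conditions) == render(ref_page, **conditions)

# Test page uses the feature; reference page looks identical without it,
# whichever font the platform happens to pick:
assert reftest_passes("&#x41;BC", "ABC")
assert reftest_passes("&#x41;BC", "ABC", platform_font="Helvetica")
```

A screenshot comparison would fail the moment the platform font changed; the reference page, being subject to the same variables, does not.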

    Ever heard of the Acid Tests? Acid2 is more or less a reftest, because it has a reference page that only uses a screenshot for the platform-independent parts. Acid1 uses a screenshot of the whole test, hence “except font rasterization and form widgets”.

    I had a lot of fun writing my two form-related tests, because I actually had to submit forms to observe those features’ behaviour. WPT has server-side testing infrastructure that can help with this, and for such tests, I would need to spin up the provided web server or run the finished product with wpt.live4. In both cases, I avoided the need for that with a <form method="GET"> that targets an iframe, plus a helper page that sends its query string back to the test page.
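The trick works because a GET form submission is nothing more than a query string appended to the action URL, so the helper page only needs to echo that string back. A sketch of the mechanics in plain Python (the page names are illustrative):

```python
from urllib.parse import urlencode, parse_qs

def submit_get_form(action: str, fields: dict) -> str:
    # What the browser does for <form method="GET">: serialise the
    # fields into the URL's query string.
    return f"{action}?{urlencode(fields)}"

def helper_page_echo(url: str) -> str:
    # What the helper page does: hand its query string back to the
    # test page (in web content, via location.search and postMessage).
    return url.split("?", 1)[1]

url = submit_get_form("helper.html", {"q": "värde", "n": "1"})
assert url == "helper.html?q=v%C3%A4rde&n=1"
assert parse_qs(helper_page_echo(url)) == {"q": ["värde"], "n": ["1"]}
```

This keeps the whole round trip observable from the test page without any server-side logic.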

    MathML tasks

    MathML was meant to be the native language for mathematics on the web, and that’s still true today, but two decades later, browser support still has a long way to go. There are several reasons for this, notably including the largely volunteer-driven development of MathML and its implementations, but over the last few years, Igalia has helped change that on three fronts: writing a Chromium implementation, improving the Firefox and WebKit implementations, and improving the specs themselves.

    MathML 3 was made a Recommendation in 2014, and like any spec, it has shortcomings that only subsequent experience could identify. Proposals by the MathML Refresh CG like MathML Core are trying to address them in a bunch of ways, like simplifying the spec, setting clearer expectations around rendering, and redefining features in terms of better-supported CSS constructs. My remaining tasks touched on some of these.


    Moving on to WebKit, my next task was to remove some dead code. Past versions of MathML specified a very complex <mstyle> element with its own inheritance system that’s incompatible with CSS, as well as several attributes that were rarely if ever used by authors; both are a burden on implementors.

    One of those attributes was mstyle@maxsize, which would serve as the default mo@maxsize instead of infinity. With the former removed from the spec, there was no longer a need for an explicit infinity value, so I removed the code for that.

    It turns out WebKit never got around to implementing mstyle@maxsize anyway, so there was no functional change.


    There’s a lot of MathML content that gets rendered like any other text, but stretchy and large operators are a bit more involved than just drawing a single glyph at a single size. A well-known example of a stretchy operator is square root notation, which consists of a radical (the squiggly part) and a vinculum (the overline part) that stretches to cover the expression being rooted.

    √(xy) = √x √y

    Traditionally this was achieved by knowing where the glyphs for the separate parts lived in each font, so we could stretch and draw them independently. Unicode assignments for stretchy operator parts helped, but that wasn’t enough to yield ideal rendering, because many fonts use Private Use Area characters for some operators, and ordinary fonts don’t give applications the necessary tools to control mathematical layout precisely.

    OpenType MATH tables eventually solved this problem, but that meant Firefox essentially had three code paths: one for OpenType MATH fonts, one with font-specific operator data, and one generic Unicode path for all other fonts. That second one adds a lot of complexity, and there was only one font left with its own operator data: STIXGeneral.

    The goal was ultimately to remove that code path, dropping support for the font. That sounded easy enough until we realised that STIXGeneral remains preinstalled on macOS, as the only stock mathematics font, to this day.

    My task here was to add a feature flag that disables the code path on nightly builds, and gather data around how many pages would be affected. The patch was straightforward, with one change to allow Document::WarnOnceAbout to work with parameterised l10n messages, and I wrote a cute little data URL test page for the warning messages.


    Turning the feature flag on broke a test though, and I couldn’t for the life of me reproduce it locally. Fred and I tried every possible strategy we could imagine short of interactively debugging CI, on and off for six weeks, but it looked like the flaky behaviour involved some sort of race against @font-face loading. Eventually we gave up and disabled the feature flag just for that test, and I landed my patch.

    padding + border + margin

    Another way to improve the relationship between MathML and CSS has been defining how existing CSS constructs from the HTML world, including the box model properties, apply to MathML content. In this case, the consensus was that these properties would “inflate” the content box as necessary, making the element occupy more space.

    Existing implementations in WebKit and Firefox didn’t really handle these properties at all, because doing so wasn’t in the spec, so the last task I had time for was to change that.

    A modern browser starts by parsing documents into an element tree, which is also exposed to authors as the DOM, but when it comes to rendering, that tree is converted to a layout tree, which represents the boxes to be drawn in a hierarchy of position/size influence. The layout tree consists of layout nodes (Chromium), renderer nodes (WebKit), or frame nodes (Firefox), but these all refer to the same concept.

    I started with Firefox and <mspace> because that was the only element that could not contain children. <mspace> represents, well, a space. It has attributes for width, height (height above the baseline), and depth (height below the baseline), each of which can be negative to bring surrounding elements closer together.

    I found the element’s frame node and noticed this method:

    void nsMathMLmspaceFrame::Reflow(nsPresContext* aPresContext,
                                     ReflowOutput& aDesiredSize,
                                     const ReflowInput& aReflowInput,
                                     nsReflowStatus& aStatus) {
      // [...]
      mBoundingMetrics = nsBoundingMetrics();
      mBoundingMetrics.width = mWidth;
      mBoundingMetrics.ascent = mHeight;
      mBoundingMetrics.descent = mDepth;
      mBoundingMetrics.leftBearing = 0;
      mBoundingMetrics.rightBearing = mBoundingMetrics.width;
      aDesiredSize.Width() = std::max(0, mBoundingMetrics.width);
      aDesiredSize.Height() = aDesiredSize.BlockStartAscent() + mDepth;
      // [...]
    }

    Reflow is the process of traversing the layout tree and figuring out the positions and sizes of all of its nodes, and in Firefox that involves a depth-first tree of nsIFrame::Reflow calls, starting from the initial containing block. An <mspace> frame never has children, so our reflow logic was more or less to take the three attributes, then return a ReflowOutput that tells the parent we need that much space.

    To handle padding and border, we add that to our desired size. “Physical” here means the nsMargin in terms of absolute directions like left and right, as opposed to the LogicalMargin in terms of flow-relative directions, which are aware of direction (LTR + RTL) and writing-mode (horizontal + vertical + sideways). We want to use LogicalMargin in most situations, but MathML Core is currently strictly horizontal-tb and sums of left and right are inherently direction-safe, so nsMargin was the way to go here.

    auto borderPadding = aReflowInput.ComputedPhysicalBorderPadding();
    aDesiredSize.Width() = std::max(0, mBoundingMetrics.width) + borderPadding.LeftRight();
    aDesiredSize.Height() = aDesiredSize.BlockStartAscent() + mDepth + borderPadding.TopBottom();

    That was enough to pass the <mspace> cases in the Web Platform Tests, but the test page I had put together to play around with my patch yielded both good news and bad news. Let’s look at the reference, which uses <div> elements and flexbox rather than MathML.

    The good news was that Firefox already drew borders, or at least border colours, even though the layout of them was all wrong.

    The bad news was that while my patch made each element look Bigger Than Before, the baselines were misaligned. More importantly, the <mspace> elements and even the whole <math> elements still overlapped each other… almost as if… their parents were unaware of how much space they needed when positioning them!

    I fixed the first two problems by adding the padding and border to the nsBoundingMetrics as well, because that controls the sizes and positions of MathML content. That left the overlapping of the <math> elements, because while they contain MathML content, they themselves are HTML content as far as their ancestors are concerned.

    auto borderPadding = aReflowInput.ComputedPhysicalBorderPadding();
    mBoundingMetrics.width = mWidth + borderPadding.LeftRight();
    mBoundingMetrics.ascent = mHeight + borderPadding.Side(eSideTop);
    mBoundingMetrics.descent = mDepth + borderPadding.Side(eSideBottom);

    It turns out that in Firefox, MathML frames also need to report their width to their parent via nsMathMLContainerFrame::MeasureForWidth. With the <mspace> counterpart updated, plus the WPT expectations files updated to mark the <mspace> test cases as passing, my patch was ready to land.

    /* virtual */
    nsresult nsMathMLmspaceFrame::MeasureForWidth(DrawTarget* aDrawTarget,
                                                  ReflowOutput& aDesiredSize) {
      // [...]
      auto offsets = IntrinsicISizeOffsets();
      mBoundingMetrics.width = mWidth + offsets.padding + offsets.border;
      // [...]
    }

    I also put together a test page (reference) for the interaction between negative mspace@width and padding, which more or less rendered as expected, but it potentially revealed a bug in the layout of <math> elements that are flex items. My guess is that flex items use a code path that clamps negative sizes to zero at some point, like we have to do in ReflowOutput, resulting in excess space for the item.

    Reftest for padding with negative mspace@width: reference page, without patch, with patch.

    Margins were trickier to implement because, with Firefox and MathML content at least, the positions of elements are the parent’s responsibility to calculate. I spent a very long time reading nsMathMLContainerFrame, which is the base implementation for most MathML parents, and eventually figured out where and how to handle margins. With a patch that updates RowChildFrameIterator and Place, and yet another test page (reference) that passed with my patch, we were close to having a template for the remaining MathML elements!

    Reftest for margin: reference page, without patch, with patch.

    You can see my approach over at D87594, but the patch needed reworking and I ran out of time before I could land it.


    This internship was incredibly valuable. While I was only able to finish the first trimester for mental health reasons, over the last nine months I’ve learned C++, learned how the web platform and browser engines work, gained ample experience reading specs, worked with countless people in the open-source community, and contributed to three major engines plus the Web Platform Tests.

    Were I able to continue, I would also look forward to (more) experience contributing to specs, and probably helping Igalia with their MathML in Chromium project. In any case, my time with the collective has only strengthened my desire to someday join full-time.

    Thanks to Caitlin for her advice and support, Eva and Javier and Pablo for getting me settled in so quickly, Manuel and Fred and Rob from the Web Platform team, and Yoav and Emilio for their help on the Chromium and Firefox parts of my work.

    1. Windows is the other major platform that does this. Check out The Old New Thing by Raymond Chen to learn more. 

    2. Searchfox more or less supersedes MXR and DXR. 

    3. Igalia has a Searchfox-based WebKit code browser, and I found it useful, but it’s not yet ready for public consumption. 

    4. See also, which tracks results of each test case across major browsers. 

    September 27, 2020 09:00 AM