In my previous post I discussed my most recent contributions to flexbox code in WebKit, mainly targeted at reducing the number of interoperability issues among the most popular browsers. The ultimate goal was, of course, to make the life of web developers easier. It got quite some attention (I loved Alan Stearns’ description of the post) so I decided to write another one, this time focused on the changes I recently landed in WebKit (Safari’s engine) to improve the handling of elements with aspect ratio inside flexbox, a.k.a. making images work inside flexbox. Some of them have already been released in Safari Technology Preview 118 so it’s now possible to help test them and provide early feedback.
(BTW if you wonder about the blog post title, I couldn’t resist the temptation of writing “Flexbox Cats”, which sounded really great after the previous “Flexbox Gaps”. After all, image support was added to the Web just to post pictures of cats, wasn’t it?)
Same as I did before, I think it’d be useful to review some of the most relevant changes with examples, so you can have one of those inspiring a-ha moments when you realize that the issue you just couldn’t figure out was actually a problem in the implementation.
What was done
Images as flex items in column flows
Web engines are in charge of taking an element tree and accompanying CSS and creating a box tree from this. All of this relies on formatting contexts. Each formatting context has specific ideas about how layout behaves. Both flex and grid, for example, created new, interesting formatting contexts which allow them to size their children by shrinking and/or stretching them. But how all this works can vary. While there is “general” box code that is consulted by each formatting context, there are also special cases which require specialized overrides. Replaced elements (images, for example) should work a little differently in flex and grid containers. Consider something along these lines (a hypothetical sketch):
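    <div style="display: flex; flex-direction: column; width: 500px;">
      <!-- The image is stretched to the container's 500px inline size;
           its block size should then follow from its natural aspect ratio. -->
      <img src="cat.jpg">
    </div>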
Ideally, the aspect ratio of the replaced element (the image, in the example) would be preserved as the flex context calculates its size in the relevant direction (column is the block direction/vertical in western writing modes, for example)… But in WebKit, it wasn’t. It is now.
This second issue is the mirror twin of the previous one: the same problem that existed for block sizes was also there for inline sizes. Override inline sizes were not used to compute block sizes of items with an aspect ratio (again, the intrinsic inline size was used) and thus the aspect ratio of the image (replaced elements in general) was not preserved at all. Here is the kind of example affected (again a hypothetical sketch):
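    <div style="display: flex; width: 300px;">
      <!-- The image flexes to fill the 300px row; its height should be
           computed from that override width via the aspect ratio,
           not from the intrinsic width. -->
      <img style="flex: 1;" src="cat.jpg">
    </div>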
Images as flex items in auto-height flex containers
The two fixes above allowed us to “easily” fix this one, because we can now rely on the computations done by the replaced elements code to compute sizes for items with aspect ratio even if they’re inside special formatting contexts such as grid or flex. This fix was precisely about delegating that computation to the replaced elements code instead of duplicating all the aspect-ratio machinery in the flexbox code. This fix apparently has the potential to be a game changer:
This is a key bug to fix so that Authors can use Flexbox as intended. At the moment, no one can use Flexbox in a DOM structure where images are flex children.
Jen Simmons in bug 209983
Also don’t miss the opportunity to check this visually appealing demo by Jen, which should work as expected now. For those of you without a WebKit-based browser, I’ve recorded a screencast so you can compare (all circles should be round).
Left: old WebKit. Right: new WebKit (tested using WebKitGtk)
Apart from the screencast, I’m also showcasing the issue with some actual code (sketched below with hypothetical markup):
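    <!-- The flex container has an auto height, which should be derived
         from the image, whose height in turn comes from its aspect ratio:
         a square image (a circle) should stay square. -->
    <div style="display: flex;">
      <img style="width: 100px;" src="circle.png">
    </div>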
This was likely the trickiest one. I remember having nightmares with all the definite/indefinite stuff back when I was implementing grid layout with other Igalia colleagues. The whole definite/indefinite size concept, although sensible and relatively easy to understand, is actually a huge challenge for web engines, which were not really designed with it in mind. Laying out web content traditionally means taking a width as input to produce a height as output. However, formatting contexts like grid or flex make the whole picture much more complicated.
This particular issue was not a malfunction but something that was not implemented. Essentially the flex specs define some cases where indefinite sizes should be considered as definite although the general rule considers them indefinite. For example, if a single-line flex container has a definite cross size we could assume that flex items have a definite size in the cross axis which is indeed equal to the flex container inner cross size.
In the following example the flex item, the image, has height:auto (by default) which is an indefinite size. However the flex container has a definite height (a fixed 300px). This means that when laying out the image, we could assume that its height is definite and equal to the height of the container. Having a definite height then allows you to properly compute the width using an aspect ratio.
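A sketch of that case (hypothetical markup):

    <!-- Single-line flex container with a definite 300px height: the
         image's height: auto can be treated as definite (the container's
         inner cross size), which allows computing the width from the
         aspect ratio. -->
    <div style="display: flex; height: 300px;">
      <img src="cat.jpg">
    </div>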
A very common oversight in layout code. When dealing with layout bugs, we (browser engineers) usually forget about box-sizing because the standard box model is the truth, the whole truth and the sole truth in our minds. Jokes aside, in this case the aspect ratio was applied to the border box (content + border + padding) instead of to the content box, as it should be. The result was distorted images, because border and padding were altering the aspect ratio computations. For example (hypothetical markup), the 20px of padding below was being wrongly included in the ratio computation:
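    <div style="display: flex; height: 300px;">
      <!-- The ratio must be applied to the content box, i.e. after
           subtracting the 20px of padding, not to the border box. -->
      <img style="box-sizing: border-box; padding: 20px;" src="cat.jpg">
    </div>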
I mentioned this in the previous post but I’ll do it again here: the Web Platform Tests suite has been an absolute game changer for web browser engineers. It has helped us in many ways, from easily allowing us to verify our implementations to acting as a safety net against potential regressions we might add while fixing issues in the engines. We no longer have to manually test stuff in different browsers to check how other developers have interpreted the specs. We now have the test, period.
In this case, I’ve been using them in a different way: they have served me both as a guide directing my efforts to reduce the flexbox interoperability issues, and as a nice metric to measure the progress of the task. Speaking of metrics, this work made WebKit-based browsers pass an additional 64 test cases from the WPT suite, a very nice step forward for interoperability.
I’m attaching a screenshot with the current status of images as flex items from the WPT point of view. Each html file on the left column is a test, and each test performs multiple checks. For example, the image-as-flexitem-* ones run 19 different checks (use cases) each. Each column shows how many checks each browser successfully passes. A quarter ago Safari’s (WebKit’s) figures for most of them were 11/19 or 13/19, but now the latest Tech Preview is passing all of them. Not bad, huh?
image-as-flexitem-* flexbox tests in WPT as of 2021/01/20
Acknowledgements
Again many thanks to the different awesome folks at Apple, Google and my beloved Igalia that helped me with very insightful reviews and strong support at all levels.
Also, I am thankful to all the photographers whose nice cat pictures I borrowed (including the Brown and Black Cat on top, by pixabay).
VkRunner is a Vulkan shader tester based on Piglit’s shader_runner (I already talked about it on my blog). This tool is very helpful for creating simple Vulkan tests without writing hundreds of lines of code. In the Graphics Team at Igalia, we use it extensively to help us with open-source driver development in Mesa, for example on the V3D and Turnip drivers.
This is the first time I have created a package, and thanks to the documentation on how to create RPM packages, the process was simpler than I initially thought. If I find the time to read the Debian New Maintainers’ Guide, I will create a DEB package as well.
Anyway, if you have installed Fedora or openSUSE on your computer and you want to try VkRunner, just follow these steps (sketched here for Fedora; the Copr repository name is my assumption):
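    # Enable the (assumed) Copr repository and install VkRunner
    sudo dnf copr enable samuelig/vkrunner
    sudo dnf install vkrunner

    # Run a shader test
    vkrunner my-test.shader_test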
The W3C is in the middle of a big, and arguably very difficult, election for the W3C Technical Architecture Group (aka TAG). The TAG is one of two small bodies within the W3C which are elected by membership. If you're unfamiliar with this, I wrote this exceptionally brief Primer for Busy People. I'd like to tell you why I think this is a big election, why it is complex, what I (personally) think about it, and what you can do if you agree.
The current W3C TAG election is both important and complex for several reasons. It's big because 4 of the 6 elected seats are up for election and two exceptional members (Alice Boxhall from Google and David Baron from Mozilla) are unable to run for reelection. It's big because there are nine candidates and each brings different things to the table (and statements don't always capture enough). It's complex because of the voting system and participation.
Let me share thoughts on what I think would be a good result, and then I'll explain why that's hard to achieve and what I think we need to avoid.
A good result...
I believe the best result involves 3 candidates for sure (listed alphabetically): Lea Verou, Sangwhan Moon and Theresa O’Connor.
Let's start with the incumbents. I fully support re-electing both Theresa O’Connor and Sangwhan Moon and cannot imagine reasons not to. Both have a history in standards, have served well on TAG, have a diversity of knowledge, and are very reasonable and able to work well. Re-electing good incumbents with these qualities is also a practical advantage to the TAG, as they are already well immersed. These are easy choices.
Lea Verou is another easy choice for me. Lea brings a really diverse background, set of perspectives and skills to the table. She's worked for the W3C, she's a great communicator to developers (this is definitely a great skill in TAG whose outreach is important), she's worked with small teams, produced a number of popular libraries and helped drive some interesting standards. The OpenJS Foundation was pleased to nominate her, but Frontiers and several others were also supportive. Lea also deserves "high marks".
These 3 are also a happily diverse group.
This leaves one seat. There are 3 other candidates who I think would be good, for different reasons: Martin Thomson, Amy Guy and Jeffrey Yasskin. Each of them would bring something different to the table and, if I am really honest, it is a hard choice. I wish we could seat all 3 of them, but we can't - at least not in this election (that is, they can run again).
For brevity, I will not even attempt to make all the cases here, but I encourage you to read their statements and ask friends. Truth be told, I have a strong sense that "any mix of these 6 could be fine" and different mixes optimize for slightly different things. Also, as I will explain, there are some things that seem slightly more important to me than whether my recommendation lands third vs fourth or fifth...
TL;DR: Turn out the vote
If you find yourself in agreement with me, more or less, I would suggest: place the 3 I mentioned above (Lea, Tess, Sangwhan) in at least your top 4 places, pick a fourth from my other list and put them in whatever order you like.
I think there are many possible great slates for TAG in 2021, but they all involve Lea, Tess and Sangwhan. Please help support them and place them among your top 4 votes.
If you're a W3C member, your AC Representative votes for you -- tell them. Make sure they vote - the best result definitely depends on higher than average turnout. Surprisingly, about 75% of the membership doesn't normally vote. These elections are among the rare votes where all members have an equal voice: a tiny 1-person member org has exactly the same voting power as every mega company.
If you're not a W3C member, you don't have a way to vote directly but you can publicly state your support and tag in member orgs or reach out to people you know who work for member orgs. Historically this has definitely helped - let's keep W3C working well!
STV, Turnout and Negative Preference
The W3C's election system uses STV, which stands for "single transferable vote". Single is the operative word: while you express 9 preferences in this election, only one of those will actually be counted to help someone win a seat. The counting system is (I think) rather complex, but the higher up on your list someone appears, the more likely it is that that preference is the one that counts. Each vote that is counted counts as 1 vote in that candidate's favor.
Let me stress why this matters: STV optimizes for choosing diversity of opinion with demonstrably critical support. A passionate group supporting an 'issue' candidate will all place their candidate as the #1 choice - those are guaranteed to be counted.
Historically only about 100 of the W3C's 438 member organizations actually turn out in a good election. Let's imagine turnout is even lower in 2020 and it's only 70. This means that if a candidate reaches 18 votes (a little over 4% of membership) they have a seat, no matter how the rest of us vote - even if everyone else had an actively negative preference for them.
Non-participation is an issue for all voting systems, but it seems like STV can really amplify outcomes which are undesirable here. The only solution to this problem is to increase turnout: increasing turnout raises the quota bar and helps ensure that this doesn't happen.
Regardless of how you feel about any of the candidates, please help turn out the vote. The W3C works best when as many members vote as possible!
2020 is not a great year to do any kind of recap, but there have been some positive things happening at Igalia during this year. Below you can find highlights of some of these things, in no particular order.
Fantastic @CSSWG meeting hosting by @igalia.@svgeesus and I are so incredibly grateful that they are providing childcare for us, for all three days of the meeting, on the same floor. ❤️ Setting the bar high!
Brian Kardell from Igalia was talking to everybody about Container Queries. This is one of the features that web authors have been asking for forever, and Brian was trying to push the topic forward and find some kind of solution (even if not 100% feature complete). That week there were discussions about the relationship with other topics like Resize Observer or CSS Containment, and new ideas appeared too. Brian posted a blog post after the event explaining some of those ideas. Later my colleague Javi Fernández worked on an experiment that Brian mentioned in a recent post. The good news is that all these conversations managed to bring this topic back to life, and last November Google announced that they have started working on a Container Queries prototype in Chromium.
This actually started in late 2019, but it has been ongoing during the whole of 2020. Brian Kardell has been recording a podcast series about the web platform and some of its features with different people from the industry. They have been getting more and more popular, and Brian was even asked to record one for the last BlinkOn edition.
So far 8 episodes of around one hour in length have been published, with 13 different guests. More to come in 2021! If you are curious and want to know more, you can find them on the Igalia website or on your favourite podcasting platform.
Igalia contributions
This is not a comprehensive list but just some highlights of what Igalia has been doing in 2020 around CSS:
The API owners met on a weekly basis to review the intent threads and discuss them; it’s an amazing learning experience to be part of this group. In my case, when reviewing intents I usually pay attention to things related to interoperability, like the status of the spec, test suites and other implementations. In addition, I have the support of all my awesome colleagues at Igalia who help me play this role, thank you all!
2021 and beyond…
Igalia keeps growing and a bunch of amazing folks will join us soon; in particular, Delan Azabani and Felipe Erias are already starting these days as part of the Web Platform team.
Open Prioritization should see its first successful project, as :focus-visible is advancing in funding and will get implemented in WebKit. We hope this can lead to new similar experiments in the future.
And I’m sure many other cool things will happen at Igalia next year, stay tuned!
My posts frequently (like this one) have a 'theme' and tend to use a number of images for visual flourish. Personally, I like it that way, I find it more engaging and I prefer for people to read it that way. However, for users on a metered or slow connection, downloading unnecessary images is, well, unnecessary, potentially costly and kind of rude. Just to be polite to my users, I offer the ability for you to opt out of 'optional' images if the total size of viewing the page would exceed a budget I have currently defined as 200k...
2020: The Good Parts
Each year, Igalians take a moment and look back on the year and assess what we've accomplished. Last year I wrote a wrap up for 2019 and hinted at being excited about some things in 2020 - I'd like to do the same this year.
Even in a "normal" year, making a list of things that you've actually accomplished can be a good exercise for your mental health. I do it once a month, in fact. It's easy to lose sight beyond what you're thinking of in the moment and feel overwhelmed by the sheer volume. If I can be honest with you, since it's just between us, heading into this exercise always fills me with a sense of dread. It always seems like now is the time when you have to come to grips with how little you actually accomplished this month. But, my experience is always quite the opposite: The simple act of even starting to create a list of things you actually did can give you a whole new perspective. Sometimes, usually maybe, I don't even finish the list because I hit a point where I say "Wow, actually, that's quite a lot" and feel quite a bit better.
But, 2020 is, of course, not a "normal year". It's more than fair to expect less of ourselves. So, when I sat down to write about what we accomplished, I was faced with this familiar sinking feeling -- and I had precisely the same reaction: Wow! We did a lot. So, let me share some highlights of Igalia's 2020: The Good Parts.
All the browsers
At Igalia, we are significant contributors to all of the browser engines (and several of the browsers that sit atop them too). There are a lot of ways you can look at just how much we do, and none of them are perfect, but commits are one easy, if fuzzy, measure of comparatively how much we did in the community. So, how much comparatively less did we do this year than last? The opposite, actually!
Igalia is again the #2 contributor to Chromium (Microsoft is coming up fast though). We are also again the #2 contributor to WebKit. Last year we raised some eyebrows by announcing that we had 11% of the total commits. This year: 15.5%! We are also up one place to #6 among contributors to the mozilla-central repository and up three places to #4 in Servo! Keep in mind that #1 in all of these are the project owners (Google, Apple and Mozilla respectively).
We were huge contributors everywhere, but look at this: 15.5% of all WebKit Contributions in 2020!!
We worked on so many web features!
Some of the things we worked on are big or exciting things that everyone can appreciate and I want to highlight a little more here, but the list of features where we worked on at least one (sometimes two) implementations would be prohibitively long! Here is a very partial list of ones we worked on that I won't be highlighting.
Lazy loading
stale-while-revalidate
referrer-policy
Fixing bugs with XHR/fetch
Interop/Improvements to ResizeObserver/IntersectionObserver
Custom Properties performance
Text wrapping, line breaking and whitespace
Trailing ideograph spaces
Default aspect ratio from HTML Attributes
scroll snap
scroll-behavior
overscroll-behavior
scrollend event
Gamepad
PointerLock
list-style-type: <string>
::marker
Logical inset/border/padding/margin
A few web feature highlights...
Here are just a few things that I'd like to say a little more about...
Container Queries
I am exceptionally pleased that we have been pivotal in moving the ball in conversations on container queries. Not only did our outreach and discussions last year change the temperature of the room, but we got a start on two proposals and actually had CSS Working Group discussion on both. I'm also really pleased that Igalia built a functional prototype for further review and discussion of our switch proposal and that we've been collaborating with Google and Miriam Suzanne, who have picked up where David Baron's proposal left off.
It's like we just found not one, but two mythical unicorns
I expect 2021 to be an exciting year of developments in this space where we get a lot more answers sorted out.
MathML
Two years ago, MathML fell into a strange place in web history and had an uncertain future in browsers. Igalia has led the charge in righting all of this. Along with the MathML Refresh Community Group, peer implementers and help from various standards groups we now have MathML-Core - a well defined spec, with tests that define the most meaningful parts of MathML and their relation to the web platform as interoperability targets. We've made a ton of progress in aligning support, describing how things fit, and invested a lot of time this year up-streaming work in Chromium. Some additional work remains for next year pending LayoutNG work at Google, but it's looking better and better and most of it is shipping behind the experimental web platform features flag today. We also helped create and advocate for a new W3C charter for math.
But let me share why I'm especially proud of it...
Because math is text, and a phenomenally important kind of text. The importance of being able to render formulae is really highlighted during a pandemic, where researchers of all kinds need to share information and students are learning from home. I'm super proud to be a part of this single action that I believe really is a leap in helping the Web realize its potential for these societally important issues.
SVG/CSS Alignment
At the end of last year's post I hinted about something we were working on. The practical upshots that people will more immediately relate to are the abilities to do 3D transforms in SVG and to hardware-accelerate SVG.
These are long requested enhancements but also come with historical baggage, so it's been difficult for browsers to invest. It's a great illustration of why Igalia is great for the ecosystem. This work is getting investment priority because Igalia are the maintainers of WPE WebKit, the official WebKit port for embedded devices.
Software on embedded devices marries unique qualities: lots of controls and displays want to be SVG-based, but also have to deal with typically low-end hardware, which usually still has a GPU. Thus, this problem is a few orders of magnitude more critical for those devices than it is elsewhere. However, our work will ultimately yield improvements for all WebKit browsers, which also incentivizes others to follow!
OffscreenCanvas
One thing we haven't talked about yet, but I can't wait to, is OffscreenCanvas. Apple originally pioneered the <canvas> element and it's super cool and useful for a lot of stuff. Unfortunately, it has historically been tied to the DOM, did its work on the main thread and couldn't be used in workers. This is terrible because many of the use cases it is really great for are really intense. This is a bad situation - luckily, we're working on it! Chris Lord has been working on OffscreenCanvas in WebKit and it looks great so far - everything except the text portions is done and I've been using it with great results.
OffscreenCanvas can be used in workers, and you can 'snap off' and transfer the context from a DOM rendered canvas to a worker too. So great! And guess why we're investing in it? You guessed it: Embedded.
WebXR
I mean, this is kinda huge right? Igalia is investing to help move XR forward in WebKit - part of this is generalized for WebKit, and I think that is kind of awesome. Still early days and there's a lot to do, but this is pretty exciting to see developing and I'm proud that Igalia is helping make it happen!
Important JavaScript stuff!
We pushed and shipped public/private instance and static fields in JavaScriptCore (WebKit). Private methods are ongoing. We've really improved coordination with WebKit at large this year, and we're constantly improving the 32bit JSC too. We're working on WebAssembly and numerous emerging TC39 specifications we're excited about: Module blocks, decorators, bundling, Realms, decimal, records and tuples, Temporal and lots of things for ECMA 402 (Internationalization) too!
Web Related, non-feature things
There's a lot of other things that we accomplished this year at Igalia which are pretty exciting too!
Open Prioritization! This year we ran a pilot experiment called "Open Prioritization" to start some big and complex discussions and attempt to find ways to give more people a voice in the prioritization of work on the commons. We partnered with Open Collective and while I can't say the first experiment was flawless, we learned a lot and are moving forward with a project picked and funded by the larger community as well as establishing a collective to continue to do this!
Our new podcast! This year we also launched a podcast. We've had great discussions on complex topics and had amazing guests, including one of the creators of CSS, Håkon Wium Lie, people from several browser vendors past and present, people who helped drive the two historically special embeddable forms in HTML (MathML and SVG), and some developer and web friends. It's available via all of your favorite podcasting services, a playlist on our YouTube channel, and on our website.
ipfs This year we also began working with Protocol Labs to improve some things around protocol handlers - those are great for the web at large and it's interesting and exciting to see what is happening with things like IPFS!
Joined MDN PAB This year Igalia also joined the MDN Product Advisory Board, and we're looking forward to helping ensure that the vital resource that is MDN remains healthy!
WPE You might know that Igalia are the maintainers of a few of the official WebKit ports, and one of them is for embedded systems. I'm really pleased with all of the things that this has allowed us to help drive for WebKit and the larger Web Platform. However, embedded "browsers" were kind of a new topic to me when I began my work here and they're somewhat different than the sorts of challenges I am used to. With embedded systems you typically build the OS specifically for the device. Sharing the same web tech commons is phenomenal, but for many developers like myself, my questions about embedded were difficult to explore on my own as someone who rarely compiles a browser, much less an operating system! I'm really pleased with the progress we've made on this, making wpewebkit.org more friendly, informative and relevant to people who might not already be experts at this, including making easy step-wise options available for people to explore. Someone with no experience can download a Raspbian-based OS with WPE WebKit on it and flash it right onto a Raspberry Pi just to explore. For a lot of pet projects, you can do a lot with that too. That's not super representative of a good embedded system in terms of performance and things, but it is very easy and it's maintained by us, so it's pretty up to date. A short step away, if you're pretty comfortable with a Linux shell and ssh, you can get a minimal install optimized for the Raspberry Pi 3 that you can flash right onto your Pi and that runs a Weston Wayland compositor. Finally, if you already kind of know what you're doing, we maintain Yocto recipes for developers to more easily build and maintain their real systems.
Vulkan driver!
You might know that Igalia does all kinds of stuff beyond just the Web; we work on all of the things that power the web too, and kind of all the way down - so we have lots of areas of specialization. I think it's really cool that we partnered with Raspberry Pi to create a Vulkan driver for the Mesa graphics driver stack for the latest generation of Raspberry Pi, achieving conformance in less than 1 year and passing over 100k tests from Khronos' Conformance Test Suite since our initial announcement of drawing the first triangle!
Looking forward...???
So, what exciting things can we look forward to in 2021? Definitely, advancing all of the things above - and that's exciting enough. It's hard to know what to be most excited for, but I'm personally really looking forward to watching Open Prioritization grow and to getting really good ideas and very concrete progress on Container Queries issues. We've also got our eyes on some new things we'll be starting to talk about in the next year, so stay tuned on those too.
One that I'd like to mention, however, is tabs. Over the past year, Igalia has begun to get involved with efforts like OpenUI and I've been talking to developers and peers at Igalia about tabs. I had some thoughts and ideas that I posted earlier this year. Just recently some actual work and collaboration has been done - getting a number of us with similar ideas together to sort out a useful custom element that we can test out, as well as working in OpenUI on aligning all of the stars we'll need to align as we attempt to approach something like standard tabs. It is very early days here, but we've gone from a vague blog post to some actual involvement and we're getting behind the idea - which is pretty exciting and I can't wait to say more concrete things here!
Did you know that CSS makes it possible to style list markers?
In the past, if you wanted to customize the bullets or numbers in a list,
you would probably have to hide the native markers with list-style: none,
and then add fake markers with ::before.
However, now you can just use the ::marker pseudo-element
in order to style the native markers directly!
If you are not familiar with it, I suggest reading these articles first:
In this post, I will explain the deep technical details of how I implemented
::marker in Chromium.
Thanks to Bloomberg for sponsoring Igalia to do it!
Implementing list-style-type: <string>
Before starting working on ::marker itself, I decided to add support for string values in list-style-type.
It seemed quite a useful feature for authors, and Firefox already shipped it in 2015.
Also, it’s like a more limited version of content in ::marker, so it was a good introduction.
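For example, a string value replaces the marker text entirely (a minimal sketch):

    ul {
      list-style-type: "★ ";
    }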
It was relatively straightforward to implement.
I did it in a single patch, https://crrev.com/704563,
which landed in Chromium 79.
Then I also ported it to WebKit; it’s available since Safari Technology Preview 115.
The interesting thing to mention is that list-style-type had been implemented with a keyword template,
so its value would be internally stored using an enum, and it would benefit from the parser fast path for keywords.
I didn’t want to lose that, so I followed the same approach as for display,
which also used to be a single-keyword property,
until Houdini extended its syntax with layout(<ident>).
Basically, I treated list-style-type as a partial keyword property.
This means that it keeps the parser fast path for keyword values,
but in case of failure it falls back to the normal parser, where I accepted a string value.
When a string is provided,
the internal list-style-type value is set to
a special EListStyleType::kString enum value,
and the string is stored in an extra ListStyleStringValue field.
Layout
From a layout point of view, I had to modify both LayoutNG and legacy code.
LayoutNG is a new layout engine for Chromium
that has been designed for the needs of modern scalable web applications.
It was released in Chrome 77 for block and inline layout,
but some CSS features like multi-column haven’t been implemented in LayoutNG yet,
so they force Chromium to use the old legacy engine.
It was mostly a matter of tweaking
LayoutNGListItem (for LayoutNG) and LayoutListMarker (for legacy)
in order to retrieve the string from ListStyleStringValue when
the ListStyleType was EListStyleType::kString,
and making sure to update the marker when ListStyleStringValue changed.
Also, string values are somewhat special because they don’t have a suffix,
unlike numeric values that are suffixed with a dot and space (like 1. ),
or symbolic values that get a trailing space (like ◾ ).
It’s noteworthy that until this point, markers didn’t have to care about mixed bidi.
But now you can have things like list-style-type: "aال", that is: U+0061 a, U+0627 ا, U+0644 ل.
Note that ا is written before ل, but since they are Arabic characters, ا appears at the right.
This is relevant because the marker is supposed to be isolated from the text in the list item,
so in LayoutNG I had to set unicode-bidi: isolate to inside markers.
It wasn’t necessary for outside markers since they are implemented as inline-blocks,
which are already isolated.
In legacy layout, markers don’t actually have their text as a child, it’s just a paint-time effect.
As such, no bidi reordering happens, and aال doesn’t render correctly:
<li style="list-style: 'aال - ' inside">text</li>
LayoutNG rendering vs. legacy rendering:
At that point I decided to leave it this way,
but this kind of problem in legacy layout would keep haunting me while implementing ::marker.
Keep reading for the bloody details!
::marker parsing and computation
Here I started working on the actual ::marker pseudo-element.
As a first step, in https://crrev.com/709390
I recognized ::marker as a valid selector (behind a flag), added a usage counter,
and defined a new PseudoId::kPseudoIdMarker to identify it in the node tree.
It’s important to note that list markers were still anonymous boxes,
there was no actual ::marker pseudo-element,
so kPseudoIdMarker wasn’t actually used yet.
Something that needs to be taken into account when using ::marker
is that the layout model for outside positioning is not fully defined.
Therefore, in order to prevent authors from relying on implementation-defined behaviors
that may change in the future, the CSSWG decided to restrict which properties
can actually be used on ::marker.
I implemented this restriction in https://crrev.com/710995,
using a ValidPropertyFilter just like it was done for ::first-letter and ::cue.
But note this was later refactored,
and now whether a property applies to ::marker or not
is specified in the property definition in css_properties.json5.
At this point, ::marker only allowed:
All font properties
Custom properties
color
content
direction
text-combine-upright
unicode-bidi
Using ::marker styles
At this point, ::marker was a valid selector,
but list markers weren’t using ::marker styles.
So in https://crrev.com/711883
I just took these styles and assigned them to the markers.
This simple patch was the real deal,
making Chromium’s implementation of ::marker match WebKit’s one,
which shipped in 2017.
When enabling the runtime flag, you could style markers:
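For instance, with a hypothetical rule like:

    ::marker {
      color: blue;
      font-size: 1.2em;
      font-weight: bold;
    }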
This landed in Chromium 80. So, how come I didn’t ship ::marker until 86?
The answer is that, while the basic functionality was working fine,
I wanted to provide a full and solid implementation.
And it was not yet the case, since content was not working,
and markers were still anonymous boxes that just happened to get assigned
the styles for ::marker pseudo-elements,
but there were no actual ::marker pseudo-elements.
Support content in LayoutNG
Adding support for the content property was relatively easy in LayoutNG,
since I could reuse the existing logic for ::before and ::after.
Roughly it was a matter of ignoring list-style-type and list-style-image
in non-normal cases, and using the LayoutObject of the ContentData
as the children. This was not possible in legacy, since LayoutListMarker
can’t have children.
It may be worth it to summarize the different LayoutObject classes for list markers:
LayoutListMarker, based on LayoutBox, for legacy markers.
LayoutNGListMarker, based on LayoutNGBlockFlowMixin<LayoutBlockFlow>,
for LayoutNG markers with an outside position.
LayoutNGInsideListMarker, based on LayoutInline,
for LayoutNG markers with an inside position.
It’s important to note that non-normal markers were actual pseudo-elements,
their LayoutNGListMarker or LayoutNGInsideListMarker were no longer anonymous,
they had an originating PseudoElement in the node tree.
This means that I had to add logic for attaching, detaching and rebuilding
kPseudoIdMarker pseudo-elements, add LayoutObjectFactory::CreateListMarker(),
and make LayoutTreeBuilderTraversal and Node::PseudoAware* methods be
aware of ::marker.
Another problem that I had to address was that, until this point,
both content: normal and content: none were considered to be synonymous,
and were internally stored as nullptr.
However, unlike in ::before and ::after,
normal and none have different behaviors in ::marker:
the former decides the contents from the list-style properties,
the latter prevents the ::marker from generating boxes.
Therefore, in https://crrev.com/732549
I implemented content: none as a NoneContentData,
and replaced the HasContent() helper function with the more specific
ContentBehavesAsNormal() and ContentPreventsBoxGeneration().
Default styles
According to the spec, markers needed to get assigned these styles in UA origin:
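(A sketch based on the spec at the time; treat the exact list as approximate.)

    ::marker {
      unicode-bidi: isolate;
      font-variant-numeric: tabular-nums;
    }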
At this point, the ComputedStyle for a marker could be created in
different ways:
If there was some ::marker selector, by running the cascade normally.
Otherwise, LayoutListMarker or LayoutNGListItem would create the
style from scratch.
First, in https://crrev.com/720875
I made all StyleResolver::PseudoStyleForElementInternal,
LayoutListMarker::ListItemStyleDidChange and LayoutNGListItem::UpdateMarker
set these UA rules.
Then in https://crrev.com/725913
I made it so that markers would always run the cascade,
unifying the logic in PseudoStyleForElementInternal.
But this way of injecting the UA styles was a bit hacky and problematic.
So finally, in https://crrev.com/779284
I implemented it in the proper way, using a real UA stylesheet.
However, I took care of preventing that from triggering SetHasPseudoElementStyle,
which would have defeated some optimizations.
Interestingly, these UA styles use a ::marker selector, but they also affect
nested ::before::marker and ::after::marker pseudo-elements.
That’s because I took advantage of a bug in the style resolver,
so that I wouldn’t have to implement the nested ::marker selectors.
The bug is disabled for non-UA styles.
LayoutNGListItem::UpdateMarker also had some style tweaks that I moved into
the style adjuster instead of the UA sheet,
because the exact styles depend on the marker:
Outside markers get display: inline-block, because they must be
block containers.
Outside markers get white-space: pre, to prevent their trailing space
from being trimmed.
Inside markers get some margins, depending on list-style-type.
An implication of my work on the default marker styles was that
the StyleType() became kPseudoIdMarker instead of kPseudoIdNone.
This made LayoutObject::PropagateStyleToAnonymousChildren() do more work,
causing the flexbox_with_list_item perf test to worsen by 99.9%!
I fixed it in https://crrev.com/722421
by returning early for markers with content: normal,
which didn’t need that work anyway.
Once I completed the ::marker implementation, I tried reverting the fix,
and then the test only worsened by 2-3%.
So I guess the big regression was caused by the interaction of multiple factors,
and the other factors were later fixed or avoided.
Developer tools
It was important for me to expose ::marker in the devtools
just like a ::before or ::after.
Not just because I thought it would be beneficial for authors,
but also because it helped me a lot when implementing ::marker.
So first I made the Styles panel expose the ::marker styles
when inspecting the originating list item (https://crrev.com/724094).
However, note this only worked for actual ::marker pseudo-elements.
LayoutNG markers as real pseudo-elements
As previously stated, only non-normal markers were internally implemented
as actual pseudo-elements, markers with content: normal were just
anonymous boxes.
So normal markers wouldn’t appear in devtools, and would yield
incorrect values in getComputedStyle:
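Something like this reconstructed snippet:

    const item = document.querySelector("li");
    // Query the used width of the list marker.
    getComputedStyle(item, "::marker").width;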
According to CSSOM
that’s supposed to be the used width in pixels,
but since there was no actual ::marker pseudo-element,
it would just return the computed value: auto.
So in https://crrev.com/731964
I implemented LayoutNG normal markers as real pseudo-elements.
It’s a big patch, though mostly that’s because I had to update
several test expectations.
Another advantage was that non-normal markers benefited from the much
vaster test coverage for normal ones.
For example, some accessibility code was expecting markers to be
anonymous; I noticed this thanks to existing tests with normal markers.
Without this change I might have missed that non-normal ones weren’t
handled properly.
And a nice side-effect that I wasn’t expecting was that
the flexbox_with_list_item perf test improved by 30-40%. Nice!
It’s worth noting that until this point,
pseudo-elements could only be originated by an element.
However, ::before and ::after pseudo-elements can have display: list-item
and thus have a nested marker.
Due to the lack of support for ::before::marker and ::after::marker selectors,
I could previously assume that nested markers would have the initial content: normal,
and thus be anonymous.
But this was no longer the case, so in https://crrev.com/730531
I added support for nested pseudo-elements.
However, the style resolver is still not able to handle them properly,
so nested selectors don’t work.
A consequence of implementing LayoutNG markers as pseudo-elements
was that they became independent,
they were no longer created and destroyed by LayoutNGListItem.
But the common logic for LayoutNGListMarker and LayoutNGInsideListMarker
was still in LayoutNGListItem, so this made it difficult to keep the
marker information in sync.
Therefore, in https://crrev.com/735167
I moved the common logic into a new ListMarker class,
and each LayoutNG marker class would own a ListMarker instance.
I also renamed LayoutNGListMarker to LayoutNGOutsideListMarker,
since the old name was misleading.
Legacy markers as real pseudo-elements
Since I had already added the changes needed to implement
all LayoutNG markers as pseudo-elements,
I thought that doing the same for legacy markers would be easier.
But I was wrong! The thing is that legacy layout already had some bugs
affecting markers, but they would only be triggered when dynamically
updating the styles of the list item.
But there aren’t many tests that do that, so they went unnoticed…
until I tried my patch, which surfaced these issues in the initial layout,
making some test fail.
Then there was also bug 1051685,
involving selections or drag-and-drop with pseudo-elements like ::before or ::after.
So turning markers into pseudo-elements made them have the same problem,
causing a test failure.
I could finally land my patch in https://crrev.com/745012,
which also improved performance like in LayoutNG.
Animations & transitions
While I was still working on ::marker,
the CSSWG decided to expand the list of allowed properties
in order to include animations and transitions.
I did so in https://crrev.com/753752.
The tricky part was that only allowed properties could be animated.
For example, consider this sketch of a hover transition
(my reconstruction, not the original demo):
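    li::marker {
      color: green;
      background: yellow; /* background is not an allowed ::marker property */
      transition: all 1s;
    }
    li:hover::marker {
      color: red;
      background: orange;
    }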
Only the color of the marker is animated, not the background.
counter(list-item) inside <ol>
::before and ::after pseudo-elements already had the bug that,
when referencing the list-item counter inside an <ol>,
they would produce the wrong number, usually 1 unit greater.
Of course, ::marker inherited the same problem.
And this was breaking one of the important use-cases,
which is being able to customize the marker text with content.
Luckily, WebKit had already fixed this problem, so I could copy their solution.
Unluckily, they mixed it with a big irrelevant refactoring,
so I had to spend some time understanding which part was the actual fix.
I ported it into Chromium in https://crrev.com/783493.
Support content in legacy
The only missing thing to do was adding support for content in legacy layout.
The problem was that LayoutListMarker can’t have children,
so it’s not possible to just insert the layout object produced by the ContentData.
Then, my idea was replacing LayoutListMarker with two new classes:
LayoutOutsideListMarker, for markers with outside positioning.
LayoutInsideListMarker, for markers with inside positioning.
and they could share the ListMarker logic with LayoutNG markers.
However, when I started working on this, something big happened:
the COVID-19 pandemic.
And Google decided to skip Chromium 82 due to the situation,
which is relevant because, in order to be able to merge patches easily,
they wanted to avoid big refactorings.
And a big refactoring is precisely what I needed!
So I had to wait until Chromium 83 reached stable.
Also, Google engineers were not convinced by my proposal,
because it would imply that legacy markers
would use more memory and would be slower,
since they would have children even with content: normal.
So I changed my strategy as such:
Keep LayoutListMarker for normal markers.
Add LayoutOutsideListMarker for non-normal outside markers.
Add LayoutInsideListMarker for non-normal inside markers.
Therefore, in https://crrev.com/791815
I added text-transform: none to the ::marker UA rules,
but also allowed authors to specify another value if they want to.
Then, the CSSWG also resolved that ::marker should allow
inherited properties that apply to text which don’t depend on box geometry.
And other properties, unless whitelisted, shouldn’t affect markers,
even when inherited from an ancestor.
Therefore, I added support for some text and text decoration properties,
and also for line-height.
On the other hand, I blocked inheritance of text-indent and text-align.
However, note that they may end up not having the desired effect in some cases:
The style adjuster forces white-space: pre in outside markers,
so you can only customize white-space in inside ones.
text-combine-upright doesn’t work in pseudo-elements
(bug 1060007).
So setting it will only affect the computed style,
and will also force legacy layout,
but it won’t turn the marker text upright.
In legacy layout, the marker has no actual contents.
So text properties, text decoration properties,
unicode-bidi and line-height don’t work.
And this is the default UA stylesheet for markers:
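(Reconstructed from the spec and the patches above; the exact rules may differ slightly.)

    ::marker {
      unicode-bidi: isolate;
      font-variant-numeric: tabular-nums;
      text-transform: none;
      text-indent: 0 !important;
      text-align: start !important;
      text-align-last: start !important;
    }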
The final change, in https://crrev.com/837424,
was the removal of the CSSMarkerPseudoElement runtime flag.
Since 89.0.4358.0, it’s no longer possible to disable ::marker.
Overview
Implementing ::marker needed more than 100 patches in total,
several refactorings, some existing bug fixes, and various CSSWG resolutions.
I also added lots of new WPT tests, in addition to the existing ones
created by Apple and Mozilla.
For every patch that had an observable improved behavior,
I tried to cover it with a test.
Most of them are in https://wpt.fyi/results/css/css-pseudo?q=marker,
though some are in css-lists, and others are Chromium-internal
since they were testing non-standard behavior.
Note my work didn’t include ::before::marker and ::after::marker selectors,
which haven’t been implemented in WebKit nor Firefox either.
What remains to be done is making the selector parser handle nested pseudo-elements properly.
Also, I kept the disclosure triangle of a <summary> as a ::-webkit-details-marker,
but since Chromium 89 it’s a ::marker as expected,
thanks to Kent Tamura.
In previous blog posts I talked about QEMU’s qcow2 file format and how to make it faster. This post gives an overview of how the data is structured inside the image and how that affects performance, and this presentation at KVM Forum 2017 goes further into the topic.
This time I will talk about a new extension to the qcow2 format that seeks to improve its performance and reduce its memory requirements.
Let’s start by describing the problem.
Limitations of qcow2
One of the most important parameters when creating a new qcow2 image is the cluster size. Much like a filesystem’s block size, the qcow2 cluster size indicates the minimum unit of allocation. One difference however is that while filesystems tend to use small blocks (4 KB is a common size in ext4, ntfs or hfs+) the standard qcow2 cluster size is 64 KB. This adds some overhead because QEMU always needs to write complete clusters so it often ends up doing copy-on-write and writing to the qcow2 image more data than what the virtual machine requested. This gets worse if the image has a backing file because then QEMU needs to copy data from there, so a write request not only becomes larger but it also involves additional read requests from the backing file(s).
Because of that qcow2 images with larger cluster sizes tend to:
grow faster, wasting more disk space and duplicating data.
increase the amount of necessary I/O during cluster allocation,
reducing the allocation performance.
Unfortunately, reducing the cluster size is in general not an option because it also has an impact on the amount of metadata used internally by qcow2 (reference counts, guest-to-host cluster mapping). Decreasing the cluster size increases the number of clusters and the amount of necessary metadata. This has a direct negative impact on I/O performance, which can be mitigated by caching it in RAM, therefore increasing the memory requirements (the aforementioned post covers this in more detail).
Subcluster allocation
The problems described in the previous section are well-known consequences of the design of the qcow2 format and they have been discussed over the years.
I have been working on a way to improve the situation and the work is now finished and available in QEMU 5.2 as a new extension to the qcow2 format called extended L2 entries.
The so-called L2 tables are used to map guest addresses to data clusters. With extended L2 entries we can store more information about the status of each data cluster, and this allows us to have allocation at the subcluster level.
The basic idea is that data clusters are now divided into 32 subclusters of the same size, and each one of them can be allocated separately. This allows combining the benefits of larger cluster sizes (less metadata and RAM requirements) with the benefits of smaller units of allocation (less copy-on-write, smaller images). If the subcluster size matches the block size of the filesystem used inside the virtual machine then we can eliminate the need for copy-on-write entirely.
So with subcluster allocation we get:
Sixteen times less metadata per unit of allocation, greatly reducing the amount of necessary L2 cache.
Much faster I/O during allocation when the image has a backing file, up to 10-15 times more I/O operations per second for the same cluster size in my tests (see chart below).
Smaller images and less duplication of data.
This figure shows the average number of I/O operations per second that I get with 4KB random write requests to an empty 40GB image with a fully populated backing file.
Things to take into account:
The performance improvements described earlier happen during allocation. Writing to already allocated (sub)clusters won’t be any faster.
If the image does not have a backing file chances are that the allocation performance is equally fast, with or without extended L2 entries. This depends on the filesystem, so it should be tested before enabling this feature (but note that the other benefits mentioned above still apply).
Images with extended L2 entries are sparse, that is, they have holes and because of that their apparent size will be larger than the actual disk usage.
It is not recommended to enable this feature in compressed images, as compressed clusters cannot take advantage of any of the benefits.
Images with extended L2 entries cannot be read with older versions of QEMU.
How to use this?
Extended L2 entries are available starting from QEMU 5.2. Due to the nature of the changes it is unlikely that this feature will be backported to an earlier version of QEMU.
In order to test this you simply need to create an image with extended_l2=on, and you also probably want to use a larger cluster size (the default is 64 KB, remember that every cluster has 32 subclusters). Here is an example:
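Something like this (the image name and size are placeholders; a 128 KB cluster divided into 32 gives 4 KB subclusters, matching common filesystem block sizes):

    qemu-img create -f qcow2 -o extended_l2=on,cluster_size=128k img.qcow2 1T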
Over the past few months the WebKit development team has been working on
modernizing support for the WebAudio specification. This post highlights some
of the changes that were recently merged, focusing on the GStreamer ports.
My fellow WebKit colleague, Chris Dumez, has been very active lately, updating
the WebAudio implementation for the mac ports in order to comply with the latest
changes of the specification. His contributions have been documented in the
Safari Technology Preview release notes for version 113, version 114,
version 115 and version 116. This is great for the WebKit project! Since
the initial implementation landed around 2011, there wasn’t much activity and
over the years our implementation started lagging behind other web engines in
terms of features and spec compliance. So, many thanks Chris, I think you’re
making a lot of WebAudio web developers very happy these days :)
The flip side of the coin is that some of these changes broke the GStreamer
backends: as Chris is focusing mostly on the Apple ports, a few bugs slipped in,
noticed by the CI test bots and dutifully gardened by our bot sheriffs. Those
backends were upstreamed in 2012 and since then I hadn’t devoted much time to
their maintenance, aside from casual bug-fixing.
One of the WebAudio features recently supported by WebKit is the Audio Worklet
interface which allows applications to perform audio processing in a dedicated
thread, thus taking some pressure off the main thread and ensuring a
glitch-free WebAudio rendering. I added support for this feature in r268579.
Folks eager to test this can try the GTK nightly MiniBrowser with the demos:
For many years our AudioFileReader implementation was limited to mono and stereo
audio layouts. This limitation was lifted in r269104, allowing for
processing of up to 5.1 surround audio files in the AudioBufferSourceNode.
Our AudioDestination, used for audio playback, was only able to render stereo.
It is now able to probe the GStreamer platform audio sink for the maximum number
of channels it can handle, since r268727. Support for AudioContext
getOutputTimestamp was hooked up in the GStreamer backend in r266109.
The WebAudio spec has a MediaStreamAudioDestinationNode for MediaStreams,
which allows feeding audio samples coming from the WebAudio pipeline to outgoing
WebRTC streams. Since r269827 the GStreamer ports now support this feature as
well! Similarly, incoming WebRTC streams or capture devices can stream their
audio samples to a WebAudio pipeline; this has been supported for a couple of years
already, contributed by my colleague Thibault Saunier.
Our GStreamer FFTFrame implementation was broken for a few weeks, while Chris
was landing various improvements for the platform-agnostic and mac-specific
implementations. I finally fixed it in r267471.
This is only the tip of the iceberg. A few more patches were merged, including
some security-related bug-fixes. As the Web Platform keeps growing, supporting
more and more multimedia-related use-cases, we, at the Igalia Multimedia team,
are committed to maintain our position as GStreamer experts in the WebKit community.
I used to regard myself as an austere programmer in terms of tooling: Emacs —with a plain configuration— and grep. This approach forces you to understand all the elements involved in a project.
Some time ago I had to code in Rust, so I needed to learn the language as fast as possible. I looked for packages in MELPA that could help me be productive quickly. Obviously, I installed rust-mode, but I also found racer for auto-completion. I tried it out. It was messy to set up and unstable, but it helped me code while learning. When I felt comfortable with the code base, I uninstalled it.
This year I returned to working on WebKit. The last time I contributed to it was around five years ago, but now I work in a different area (still in the multimedia stack). WebKit is huge, and because of C++, I found gtags rather limited. Out of curiosity I looked for something similar to racer but for C++, and I spent a while digging into it.
The solution consists of the integration of three MELPA packages:
lsp-mode: a client for Language Server Protocol for Emacs.
ccls: A C/C++ language server. Besides emacs-ccls adds more functionality to lsp-mode.
(I known, there’s a simpler alternative to lsp-mode, but I haven’t tried it yet).
First, let's explain what LSP is. It stands for Language Server Protocol, a protocol defined with JSON-RPC messages between the editor and the language server. It was originally developed by Microsoft for Visual Studio Code, and its purpose is to support auto-completion, finding a symbol's definition, showing early error markers, etc., inside the editor. Therefore, lsp-mode is an Emacs mode that communicates with different language servers via LSP and operates in Emacs accordingly.
In order to support the auto-completion use-case, lsp-mode uses company-mode. This Emacs mode is capable of creating a floating context menu where the editing cursor is placed.
The third part of the puzzle is, of course, the language server. There are language servers for different programming languages. For C and C++ there are two: clangd and ccls. The former uses the Clang compiler; the latter can use either Clang, GCC or MSVC. Throughout this text ccls is used, for reasons explained later. In between, emacs-ccls leverages and extends the support for ccls in lsp-mode, though it's not mandatory.
In short, the basic .emacs configuration, using use-package, would have these lines:
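Roughly, something along these lines (a sketch pieced together from the description that follows; the exact options may have differed):
(use-package company
  :diminish
  :config (global-company-mode 1))
(use-package lsp-mode
  :init (setq lsp-enable-file-watchers nil       ; file watching is not viable in WebKit
              lsp-enable-snippet nil             ; no generic text templates
              lsp-enable-on-type-formatting nil) ; style detection always guesses wrong
  :hook (c-mode-common . lsp-deferred)
  :commands (lsp lsp-deferred))
(use-package ccls
  :hook ((c-mode c++-mode) . (lambda () (require 'ccls) (lsp-deferred))))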
The snippet first configures company-mode. It is enabled globally because, normally, it is a nice feature to have, even in non-coding buffers, such as this very one, used for writing a blog post in markdown format. Diminish mode hides or abbreviates the mode description in the Emacs mode line.
Later comes lsp-mode. It's big and aims to do a lot of things, so basically we have to tell it to disable certain features: the file watcher, which is not viable in massive projects such as WebKit; snippets (generic text templates), which I don't use; and formatting the code while typing, since I don't know how the code style is figured out, but in my experience it's always detected wrong. lsp-mode is launched when a buffer uses c-mode-common, which c++-mode derives from as well. It is launched deferred, meaning it won't start until the buffer is visible; this is important since we want to delay the ccls session creation until the buffer's .dir-locals.el file is processed, where it is configured for the specific project.
And lastly comes the ccls configuration, hooked to c-mode and c++-mode in the same deferred fashion (already explained).
It's important to understand how ccls works in order to integrate it into the workflow of a specific project, since it might need to be configured using Emacs' per-directory local variables.
We are living in an (almost) post-Makefile world; proof of that is ccls which, instead of a Makefile, uses a compilation database: a record of the compile options used to build each file in a project. It's commonly described in JSON (typically a compile_commands.json file), generated automatically by build systems such as Meson or CMake, and later consumed by ninja to execute the compilation, or by tools such as ccls. Bear in mind that ccls uses a cache, which can eat a couple of gigabytes of disk.
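For illustration, a single (made-up) entry of such a database looks roughly like this:
[
  {
    "directory": "/home/user/WebKit/WebKitBuild/Release",
    "command": "c++ -std=c++2a -ISource/WTF -c Source/WTF/wtf/Logging.cpp",
    "file": "Source/WTF/wtf/Logging.cpp"
  }
]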
Now, let’s review the concrete details of using these features with WebKit. Let me assume that WebKit local repository is cloned in ~/WebKit.
As you may know, the cool way to compile WebKit is with flatpak. Flatpak adds an indirection in the compilation process, since it’s done in an isolated environment, above the native system. As a consequence, ccls has to be the one inside the Flatpak environment. In ~/.local/bin/webkit-ccls:
#!/bin/sh
# Run the ccls bundled in WebKit's Flatpak SDK instead of a native one.
set -eu
cd "$HOME/WebKit"
exec Tools/Scripts/webkit-flatpak -c ccls "$@"
Basically, the script calls ccls inside Flatpak, where it is available in the SDK. And this is why we use ccls instead of clangd: the latter is not provided.
By default ccls assumes the compilation database is in the project's root directory, but in our case it's not, thus it is required to configure the database directory for our WebKit setup. For that, as already said, a .dir-locals.el file is used:
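Something like the following (a sketch; the concrete paths are assumptions that depend on where WebKit is cloned and built):
((nil . ((ccls-executable . "/home/user/.local/bin/webkit-ccls") ; the script above
         (ccls-initialization-options
          . (:compilationDatabaseDirectory "/app/webkit/WebKitBuild/Release"))
         (compile-command . "Tools/Scripts/build-webkit --gtk --debug"))))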
As you can notice, ccls-executable is defined here, though it's not a safe local variable, along with ccls-initialization-options, which is a safe local variable. It is important to notice that the compilation database directory is a path inside Flatpak, and that I always use the Release path. I don't understand why, but the Debug path didn't work for me. This means that WebKit should be compiled as Release frequently, even if we only use the Debug configuration for coding (as you may see in my compile-command).
Update: Now we can explain why it’s important to configure lsp-mode as deferred: to avoid connections to ccls before processing the .dir-locals.el file.
And that's all. Now I have early detection of programming errors, auto-completion, and so on. I hope you find these notes helpful.
Update: Sadly, because of the Flatpak indirection, finding symbol definitions won't work, because the file paths stored in the ccls cache are relative to Flatpak's file system. For that I still rely on global and its Emacs mode.
FOSSCOMM (Free and Open Source Software Communities Meeting) is a Greek conference aiming to promote the use of FOSS in Greece and to bring FOSS enthusiasts together. It is organized entirely by volunteers and universities and takes place in a different city each year. This year it was virtual as Greece is under lockdown, and … Continue reading FOSSCOMM 2020, and a status update on EXT_external_objects(_fd) extensions [en, gr]
It has been a long time since I wrote my last blog post and since I wrote about something that I and my colleagues at Igalia have been working for the past 4 years. I have been postponing writing this post waiting until something big happens. Well, something big just happened…
If you already know what Ozone is, then I am happy to tell you that Chromium for Linux includes Ozone by default now and can be enabled with runtime command line flags. If you are interested in trying Chrome/Chromium with native Wayland support, you are encouraged to download Google Chrome for developers and try Ozone/Wayland by running the browser with the following command line flags: --enable-features=UseOzonePlatform --ozone-platform=wayland
If you don’t know what Ozone is, here’s a brief explanation, before I talk about the history, status and design of this effort.
What is Ozone?
The very first thing that one may think about when they hear “Ozone” is the gas or a thin layer of the Earth’s atmosphere. Well… it is partly correct. In the case of Chromium, it is a platform abstraction layer.
I will not go into many details, but here is the description of that layer from Chromium’s documentation about Ozone – “Ozone is a platform abstraction layer beneath Aura, Chromium’s platform independent windowing system, that is used for low level input and graphics. Once complete, the abstraction will support underlying systems ranging from embedded SoC targets to new X11-alternative window systems on Linux such as Wayland or Mir to bring up Aura Chromium by providing an implementation of the platform interface.”.
If you are interested in more details, you are welcome to read the project’s documentation at https://chromium.googlesource.com.
The Summary of the Design of Ozone/Wayland
It has been a long time since Antonio Gomes started to work on this project. It started as a research project for our customer – Renesas Electronics, and was based on a former abstraction project with another clever name, “mus+ash” (pronounced “mustache”, you can read more about that here – Chromium, ozone, wayland and beyond).
Since that time, the project has been moved to downstream and back to upstream (because of some unknowns related to the “mus+ash”) and the design of Ozone integration has also been changed.
Currently, the Aura/Platform classes are injected into the Browser process and communicate directly with the underlying Ozone platforms including Wayland. In the browser process, Wayland creates a connection with a Wayland compositor, while in the GPU process, it only draws pixels into the created DMABUFs and neither receives events nor creates surfaces.
Migrating Away From the X11-only Legacy Backend
It is worth mentioning that Igalia has been working on both Ozone/X11 and Ozone/Wayland.
Since June 2020, we have been working on making the choice of Ozone for Linux a runtime decision instead of a compile-time one. At the moment, one can try Ozone by running Chrome downloaded from the development channel with the '--enable-features=UseOzonePlatform --ozone-platform=wayland/x11' runtime flags.
That approach allows us to gather a bigger audience of users willing to test the Ozone capabilities, and also to achieve better feature parity between non-Ozone/X11 and Ozone/X11/Wayland. That is, most of the features and code paths are shared between the two implementations, and the paths that are not compatible with Ozone are being refactored at the moment.
Once all the incompatible paths are refactored (just a few of them remain) and all the available test suites are enabled on the Linux/Ozone bot, we will start what is known as a "finch trial". This allows Ozone to be enabled by default for some percentage of users (about 10%). If the finch trial goes well, the percentage of users will be gradually increased to 100% and we will start removing the old non-Ozone/X11 implementation.
Wayland + Tab Drag
If you’ve been trying it out, you might have already noticed that Ozone/Wayland does not support the Tab Drag feature well enough. The problem is the lack of the protocol for this feature.
At the moment, my colleague Nick Diego is working on the definition of the protocol for tab drag and implementation of that in Chromium.
Unfortunately, Ozone will fall back to x11/xwayland for compositors that do not support the aforementioned protocol. However, once more and more compositors support it, Chrome will whitelist those compositors.
I will not go into the details of that effort here in this blog post, but rather just leave a link to the design document that Nick has created: Tab Dragging on Ozone/Wayland.
Summary
This blog post was rather a brief summary of the design, feature, and status of the project. Thank you for reading it. Hopefully, when we start a finch trial, I will write another blog telling you how it goes. Bye.
Earlier this year, Igalia launched an experiment called "Open Prioritization" which, effectively, lets us pull money together to prioritize work on a feature in web browsers. In this piece I'll talk about the outcome, lessons learned along the way and next steps.
Our Open Prioritization experiment was a pretty big idea. On the surface, it seems to be asking a pretty simple question: "Could we crowdfund some development work on browsers?" However, it was quite a bit more involved in its goals, because there is a lot more hiding behind that question than meets the eye, and most of it is kind of difficult to talk about in purely theoretical ways. I'll get into all of that in a minute, but let's start with the results...
One project advances: :focus-visible
We began the experiment with six possible things we would try to crowdfund, and I'm pleased to say that one will advance: :focus-visible in WebKit.
We are working with Open Collective on next steps, as it involves some decision making on how we manage future experiments and a bigger idea too. However, very soon this will shift from a pledged collective, which just asked "would you financially support this project if it were to be offered?", to a proper way to collect funds for it. If you pledged, you will be contacted when it's ready, asking you to fulfill your pledge and explaining how. We will also write a post when that happens, as it's likely that at least some people will not come back and fulfill their pledge.
As soon as this begins and enough funds are available, it will enter our developers' work queue, and as staff frees up, they will shift to begin work on implementing this in WebKit!
We did it! We at Igalia would like to say a giant "thank you" for all of those who helped support these efforts in improving our commons.
Retrospective
Let's talk about some of those bigger ideas this experiment was aiming to look at, and lessons we learned along the way, in retrospect...
Resources are finite. Prioritization is hard. No matter how big the budget, resources are still finite and work has to be prioritized. Even with only a few choices on the table to choose from, it's not necessarily easy or plain what to choose because there isn't a "right" answer.
There are reasonable arguments for prioritizing different things. The two finalists both had strong arguments from different angles, but even a step back - at least some people chose to pledge to something else. Some thought that supporting SVG path syntax in CSS was the best choice, and they pledged only to that one. Most of the candidates were final implementations, but this one wasn't. Some people thought that advancing new things that no one else seems to be advancing was the way to go. Others supported it because they thought it was really important to boost projects that help Mozilla. There just weren't enough people either seeing or agreeing with that weighting of things.
Cost is a factor. It's not an exclusive factor - the cheapest option by far (SVG path syntax in CSS, in Mozilla) was eliminated early. There are other reasons :focus-visible made some giant leaps too, but at the end of the day its bar was also just lower. The second-place project never actually managed to pull ahead, despite having more actual pledged dollars at one point.
Investing with uncertainty is especially hard. Just last week, Twitter exploded with excitement that Google was going to prototype some high level stuff with Container Queries. Fundamental to Google's intent is CSS containment in a single direction. CSS does not currently define containment in a single direction, but it does define the containment module where it would be defined. Our containment project was, in part, trying to lay some potential groundwork here. When we launched the project, I wrote about this: WebKit doesn't currently support the containment that is defined already and is a necessary prerequisite of any proposal involving that approach. The trouble is: we don't know if that will be the approach, and supporting it is a big task. Building a high level solution on the magic in our switch proposal, for example, doesn't require containment at all. Adding general containment support was the most expensive project on our list, by far. In fact, we could have done a couple of them for that price. This makes the value proposition of that work very speculative. Despite being potentially critically valuable for the single biggest/longest ask in CSS history, that project didn't make the finals when we put it to the public either.
Some things are difficult to predict. Going into this, I didn't know what to expect. A single viral tweet and a mass of developers pitching in $1 or $2 could, in theory, have funded any of these in hours. While I didn't expect that, I did kind of expect some amount of funds in the end would be of that sort. Interestingly, that didn't happen. At all. Despite lots of efforts trying to get lots of people to pledge very small dollars, even asking specifically and making it possible to do with a tweet, very, very few did (literally 1 pledge on the winning project was less than five dollars). The most popular pledge was $20, with about a quarter of the pledges being over $50 and going up from there.
Matching funds are a really big deal. You can definitely see why fundraisers stress this. For the duration of this experiment, we saw periods of little actual movement, despite lots of tweets about it, likes and blog posts. There were a few giant leaps, and they all involved offers of matching dollars. Igalia ourselves, The A11Y Project and AMPHTML all had some offer of matching dollars that really seemed to inspire a lot more participation. The bigger the matching dollars available, the bigger the participation was.
Communication is hard. These might not have been the most ideal projects, in some respects. This last bullet is complicated enough that I'll give it its own section.
Lessons learned: Communication challenges
While I am tremendously happy that inert and :focus-visible were our finalists and both did very well, I am biased. I helped advocate for and specify these two features before I came to Igalia, working with some Googlers who also did the initial implementations. I also advocated for them to be included in the list of projects we offered. However, I failed to anticipate that the very reasons I did both of these would present challenges for the experiment, so I'd like to talk about that a bit...
Unfortunately a confluence of things led to a lot of chatter and blog posts which were effectively saying something along the lines of "Developers shouldn't have to foot the bill because Apple doesn't care about accessibility and refuses to implement something. They don't care, and this is proof - they are the last ones to not implement", and I wound up having a lot of conversations trying to correct the various misunderstandings here. That's not everyone else's fault, it's mine. I should have taken more time to communicate these things clearly, but for the record, nothing about this is really correct, so let me take the time to add the clarity for posterity...
On last implementations The second implementations only recently began or completed in Firefox, and one of those was also by Igalia. It seems really unfortunate and not exactly fair to suggest that being a few weeks/months behind, especially when the catch-up came from outside help, is really an indictment. It's not. As an example, in the winning project, Chromium shipped this by default in October 2020, while Firefox is right now pending a default release. Keep in mind that vendors don't have perfect insight into what is happening in other browsers, and even if they did, reallocating resources isn't a thing that is done on a whim: different browsers have different people with different skills and availability at any given point in time.
On refusal to implement This is 100% incorrect. I want to really stress this: Every item on our list comes from the list of things that are 'wants' from vendors themselves that need prioritization and are among the things they will be considering taking up next. If not funded here, it will definitely still get done - it's just impossible to say when really, and whatever priority they give it, they can't give to something else. This experiment gives us a more definite timeframe and frees them to spend that on implementing something else.
On web developers shouldn't have to foot the bill. Well, if you mean contributing dollars directly in crowdfunding in order to get the feature, we absolutely don't (see the above bullet). However, generally speaking, this was in fact part of the conversation we wanted to start. Make no mistake: you are paying today, indirectly - and the actual investment back into the commons is inefficient and non-guaranteed. It's wonderful that 3 organizations have seemed to foot the bill for decades, but starting a conversation about whether we should keep relying on that alone is definitely part of the goal here.
On "Apple doesn't care about accessibility" This one makes me really sad, not only because I know it isn't true and it seems easy to show otherwise, but also because there are some really great people from Apple like James Craig who absolutely not only care very deeply but often help lead on important things.
On "it's wrong to crowdfund accessibility features" Unfortunately, it seems the very things that drew me to work on these in the first place wound up working against us a little: both inert and :focus-visible are interesting because they are "core features" of the platform that are useful to everyone. However, they are designed to sit at an intersection where they happily have really out-sized impact for accessibility. There are good polyfills for both of these which work well and somewhat reduce the degree of 'urgency'. I really thought that this made for a nice combination of interests/pains that might lead to good partnerships of investment where, yes, I imagined that perhaps some organizations interested in advancing the accessibility end of things, and who have historically contributed their labors, might see value in contributing more directly. Perhaps this wasn't as wise or obviously great as I imagined.
Wrapping up
All in all, in the end, despite some rocky communications, we are really encouraged by this first experiment. Thank you to everyone who pledged, boosted, blogged about the effort, etc. We're really looking forward to taking this much further next year, and we'd like to begin by asking you to share which specific projects you'd be interested in seeing or supporting in the future. Hit us up on @briankardell or @igalia.
Vulkan conformance tests for graphics drivers save their output images inside an XML file called TestResults.qpa. As binary outputs aren’t allowed, these output images (that would be saved as PNG otherwise) are encoded to text using Base64 and the result is printed between <Image></Image> XML tags. This is a problem sometimes, as external tools are … Continue reading A hack to display the Vulkan CTS tests output
It all started with this bug. The description sounded humble and harmless: the browser ignored some command line flag on Wayland. A screenshot was attached where it was clearly seen that Chromium (version 72 at that time, 2019 spring) did not respect the screen density and looked blurry on a HiDPI screen.
HiDPI literally means small pixels. It is hard to tell now which was the first HiDPI screen, but I assume that their wide recognition came around 2010 with Apple's Retina displays. Ultra HD was standardised in 2012, defining the minimum resolution and aspect ratio for what today is known informally as 4K—and 4K screens for laptops have pixels that are small enough to call them HiDPI. This Chromium issue, dated 2012, says that the Linux port lacks support for HiDPI while the Mac version has it already. On the other hand, HiDPI on Windows was tricky even in 2014.
‘That should be easy. Apparently it’s upscaled from low resolution. Wayland allows setting scale for the back buffers, likely you’ll have to add a single call somewhere in the window initialisation’, a colleague said.
Like many stories that begin this way, this one turned out to be wrong. It was not so easy. Setting the buffer scale did the right thing indeed, but it was absolutely not enough. It turned out that support for HiDPI screens was entirely missing in our implementation of the Wayland client. On my way to the solution, I found that scaling support in Wayland is non-trivial and sometimes confusing. Since I finished this work, I have been asked a few times about what happens there, so I thought that writing it all down in a post would be useful.
Background
Modern desktop environments usually allow configuring the scale of the display at the global system level. This allows all standard controls and window decorations to be sized proportionally. For applications that use those standard controls, this is a happy end: everything will be scaled automatically. Those which prefer doing everything themselves have to get the current scale from the environment and adjust their rendering. Chromium does exactly that: inside, it has a so-called device scale factor. This factor is applied equally to all sizes and locations, and when rendering images and fonts. No other code ever has to bother. It works within this scaled coordinate system, known as device independent pixels, or DIP. The device scale factor can take fractional values like 1.5, but, because it is applied at the stage of rendering, the result looks nice. The system scale is used as the default device scale factor, and the user can override it using the command line flag named --force-device-scale-factor. However, this is the very flag which did not work in the bug mentioned in the beginning of this story.
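For instance, assuming a chromium binary in your PATH, forcing a 150% scale looks like this:
chromium --force-device-scale-factor=1.5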
Note that for X11 the 'natural' scale is still the physical pixels. Despite having the system-wide scale, the system talks to the application in pixels, not in DIP. It is the application that is responsible for handling the scale properly. If it does not, it will look perfectly sharp, but its details will perhaps be too small for the naked eye.
However, Wayland does it a bit differently. The system scale there is respected by the compositor when pasting buffers rendered by clients. So, if some application has no idea about the system scale and renders itself normally, the compositor will upscale it. This is what originally happened to Chromium: it simply drew itself at 100%, and that image was then stretched by the system compositor. Remember that the Wayland way is giving a buffer to each application and then compositing the screen from those buffers, so this approach of upscaling buffers rendered by applications is natural. The picture below shows what that looks like. The screenshot is taken on a HiDPI display, so in order to see the difference better, you may want to see the full version (click the picture to open).
Firefox (left) vs. Chromium (right)
How do Wayland clients support HiDPI then?
Level 1. Basic support
Each physical output device is represented at the Wayland level by an object named output. This object has a special integer property named buffer scale that literally tells how many physical pixels are used to represent a single logical pixel. The application's back buffer has that property too. If the scales do not match, Wayland will simply scale the raster image, thus emulating a 'normal DPI' device for the application that is not aware of any buffer scales.
The first thing the window is supposed to do is to check the buffer scale of the output that it currently resides on, and to set the same value for its back buffer scale. This will basically make the application use all available physical pixels: as the scales of the buffer and the output are the same, Wayland will not re-scale the image.
Chromium now renders sharp image but all details are half their normal size
The next thing is fixing the rendering so it would scale things to the right size. Using the output buffer scale as default is a good choice: the result will be ‘normal size’. For Chromium, this means simply setting the device scale factor to the output buffer scale.
All set now
The final bit is slightly trickier. Wayland sends UI events in DIP, but expects the client to send surface bounds in physical pixels. That means that if we implement something like interactive resize of the window, we will also have to do some math to convert the units properly.
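For illustration, a minimal sketch of those steps at the Wayland API level (struct window is hypothetical application state, not Chromium code):
#include <wayland-client.h>
struct window {
    struct wl_surface *surface;
    int32_t scale; /* the device scale factor, in Chromium's terms */
};
/* Part of a wl_output listener: the compositor tells us how many
 * physical pixels make up one logical pixel on this output. */
static void output_handle_scale(void *data, struct wl_output *output, int32_t factor)
{
    struct window *win = data;
    win->scale = factor;
    /* Match the back buffer scale so the compositor does not re-scale. */
    wl_surface_set_buffer_scale(win->surface, factor);
}
/* Surface bounds go back to the compositor in physical pixels, so DIP
 * values need converting on the way out. */
static int32_t dip_to_px(const struct window *win, int32_t dip)
{
    return dip * win->scale;
}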
This is enough for the basic support. The application will work well on a modern laptop with 4K display. But what if more than a single display is connected, and they have different pixel density?
Level 2. Multiple displays
If there are several output devices present in the system, each one may have its own scale. This makes things more complicated, so a few improvements are needed.
First, the window wants to know that it has been moved to another device. When that happens, the window will ask for the new buffer scale and update itself.
Second, there may be implementation-specific issues. For example, some Wayland servers initially put the new sub-surface (which is used for menus) onto the default output, even if its parent surface has been moved to another output. This may cause weird changes of their scale during their initialisation. In Chromium, we just made it so the sub-surface always takes its scale from the parent.
Level 3? Fractional scaling?
Not really. Fractional scaling basically means 'non-even' scales like 125%. The entire feature was somewhat controversial when it was announced, because of how rendering in Wayland is performed: a non-even scale inevitably involves raster operations which make the image blurry. However, all that is transparent to the applications. Nothing new has been introduced at the level of the Wayland protocols.
Conclusion
Although this task was not as simple as we thought, in the end it turned out to be not too hard. Check the output scale, set the back buffer scale, scale the rendering, and translate pixels to DIP and vice versa at certain points. Pretty straightforward, and if you are trying to do something related, I hope this post helps you.
The issue is that there are many implementations of Wayland servers out there, not all of them are consistent, and some of them have bugs. It is worth testing the solution on a few distinct Linux distributions and looking for discrepancies in behaviour.
Anyway, Chromium with native Wayland support has recently reached beta—and it supports HiDPI! There may be bugs too, but the basic support should work well. Try it, and let us know if something is not right.
Note: the Wayland support is so far experimental. To try it, you would need to launch chrome via the command line with two flags: --enable-features=UseOzonePlatform --ozone-platform=wayland
During my presentation at the X Developers Conference I stated that we had been mostly using the Khronos Vulkan Conformance Test suite (aka Vulkan CTS) to validate our Vulkan driver for Raspberry Pi 4 (aka V3DV). While the CTS is an invaluable resource for driver testing and validation, it doesn’t exactly compare to actual real world applications, and so, I made the point that we should try to do more real world testing for the driver after completing initial Vulkan 1.0 support.
To be fair, we had been doing a little bit of this already when I worked on getting the Vulkan ports of all 3 Quake game classics to work with V3DV, which allowed us to identify and fix a few driver bugs during development. The good thing about these games is that we could get the source code and compile them natively for ARM platforms, so testing and debugging was very convenient.
Unfortunately, there are not a plethora of Vulkan applications and games like these that we can easily test and debug on a Raspberry Pi as of today, which posed a problem. One way to work around this limitation that was suggested after my presentation at XDC was to use Zink, the OpenGL to Vulkan layer in Mesa. Using Zink, we can take existing OpenGL applications that are currently available for Raspberry Pi and use them to test our Vulkan implementation a bit more thoroughly, expanding our options for testing while we wait for the Vulkan ecosystem on Raspberry Pi 4 to grow.
So last week I decided to get hands-on with that. Zink requires a few things from the underlying Vulkan implementation depending on the OpenGL version targeted. Currently, Zink only targets desktop OpenGL versions, so that limits us to OpenGL 2.1, which is the maximum version of desktop OpenGL that Raspberry Pi 4 can support (we support up to OpenGL ES 3.1 though). For that desktop OpenGL version, Zink required a few optional Vulkan 1.0 features that we were missing in V3DV, namely:
Logic operations.
Alpha to one.
VK_KHR_maintenance1.
The first two were trivial: they were already implemented and we only had to expose them in the driver, as sketched below. Notably, when I was testing these features with the relevant CTS tests I found a bug in the alpha-to-one tests, so I proposed a fix to Khronos which is currently in review.
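For context, a hypothetical sketch (not the actual V3DV code) of what exposing the first two amounts to: they are plain VkPhysicalDeviceFeatures bits reported when the device is queried, while VK_KHR_maintenance1 is advertised in the device's extension list.
#include <vulkan/vulkan.h>
/* Illustrative only: report the optional features Zink needs. */
static void fill_supported_features(VkPhysicalDeviceFeatures *features)
{
    features->logicOp = VK_TRUE;    /* logic operations */
    features->alphaToOne = VK_TRUE; /* alpha to one */
}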
I also noticed that Zink was implicitly requiring support for timestamp queries, so I implemented that in V3DV and then also wrote a patch for Zink to handle this requirement better.
Finally, Zink doesn’t use Vulkan swapchains, instead it creates presentable images directly, which was problematic for us because our platform needs to handle allocations for presentable images specially, so a patch for Zink was also required to address this.
As of the writing of this post, all this work has been merged in Mesa and it enables Zink to run OpenGL 2.1 applications over V3DV on Raspberry Pi 4. Here are a few screenshots of Quake3 taken with the native OpenGL driver (V3D), with the native Vulkan driver (V3DV) and with Zink (over V3DV). There is a significant performance hit with Zink at present, although that is probably not too unexpected at this stage, but otherwise it seems to be rendering correctly, which is what we were really interested to see:
Quake3 Vulkan renderer (V3DV)
Quake3 OpenGL renderer (V3D)
Quake3 OpenGL renderer (Zink + V3DV)
Note: you’ll notice that the Vulkan screenshot is darker than the OpenGL versions. As I reported in another post, that is a feature of the Vulkan port of Quake3 and is unrelated to the driver.
Going forward, we expect to use Zink to test more applications and hopefully identify driver bugs that help us make V3DV better.
In this post I'll talk about developments along the way to a 'responsive elements' proposal (aka container queries/element queries use cases) that I talked about earlier this year, a brief detour along the way, and finally, ask for your input on both...
I've been talking a lot this year about the web ecosystem as a commons, its health, and why I believe that diversifying investment in it is both important and productive1,2,3,4,5. Collectively, at Igalia, we believe this and we choose to invest in the commons ourselves too. We try to apply our expertise toward helping solve hard problems that have been long stuck, trying to listen to developers and do things that we believe can help further what we think are valuable causes. I'd like to tell you the story of one of those efforts, which became two - and to enlist your input...
Advancing the Container Queries Cause
As you may recall, back in February I posted an article explaining that we had been working on this problem a bunch, sharing our thoughts and progress and just letting people know that something is happening... people are listening, and trying. I also shared that our discussions prompted David Baron's work toward another possible path.
We wanted to present these together, so by late April we both made informal proposals to the CSS working group of what we'd like to explore. Ours was to begin with a switch() function in CSS focused on slotting into the architecture of CSS in a way that allows us to solve currently impossible problems. If we can show that this works and progress all of the engines, the problem of sugaring an even higher level expression becomes possible, but we deliver useful values fast too.
Neither the CSS Working Group nor anyone involved in any of the proposals is arguing that these are an either/or choice here. We are pursuing options and answering questions, together. We all believe that working this problem from both ends has definite value in both the short and long term, and we are mutually supportive. We are also excited by Miriam Suzanne's recent work. They are almost certainly complementary and may even wind up helping each other with different cases.
Shortly after we presented our idea, Igalia also said that we would be willing to invest time to try to do some prototyping and implementation exploration and report back.
Demos and Updates
My colleague Javi Fernández agreed to tackle initial implementation investigations with some down time he had. Initially, he made some really nice progress pretty quickly, coming up with a strategy, writing some low-level tests and getting them passing. But then the world got very... you know... hectic.
However, I'm really happy to announce today that we have recently completed enough to share, and to say we'll be able to take this experience back to the CSSWG pretty soon.
The following demos represent research and development. The implementation is limited, not yet standard, and was done for the purposes of investigation, discussion, and answering questions necessary for implementers. It is, nevertheless, real, functioning code.
A running demo in a build of Chromium of an image grid component designed independently from page layout, which uses the proposed CSS switch() function to declaratively, responsively change the grid-template-columns that it uses based on the size available to it.
Cool, right? Here's a short "lightning talk" style presentation on it with some more demos too (note the bit of jank you see is dropped frames from my recording, rendering is very fluid as they are in the version embedded above)...
So - I think this is pretty exciting... What do you think? Here are answers to some questions I know people have
FAQ
Why a function and not an MQ or pseudo?
My post from Feb and the proposal explain that this is not an "instead of", but rather a "simpler and powerful step in breaking down the problem, which is luckily also a very useful step on its own". The things we ultimately want and tend to discuss are full of several hard problems, not just one. It's very hard to discuss numerous hypotheticals all at once, especially if they represent enormous amounts of work to imagine how they slot together in the existing CSS architecture. That's not to say we shouldn't still try that too, it's just that the path for one is more definite and known. Our proposal, we believe, neatly slots out a few of the hardest problems in CSS and provides an achievable answer we can make fast progress on in all engines, lessening the number of open questions. This could allow us to both take on higher level sugar next, and to fill that gap in user-land until we do. Breaking down problems like this is probably a thing you have done on your own engineering projects. It makes sense.
Why is it called available-inline-size?
The short answer is: because that is accurately what it actually represents internally, and there are good arguments for it that I'll save for a more detailed post if this carries on. But don't get hung up on that; we haven't begun bikeshedding details of how you write the function, and it will change. In fact, these demos use a slightly different format than our proposal because it was easier to parse. Don't get hung up on that either.
Where can you use switch?
You can use anything anywhere, but it will only be valid and meaningful in certain places. The function will only provide an available-inline-size value to switch() in places that the CSS WG agrees to list. Sensibly, what you can say is that these will never include things that could create circularities because they are used in determining the available size. So, you won't be able to change the display, or the font, with a switch() that depends on available-inline-size, but anything that changes properties of a layout module or paint is probably fair game. CSS could make other switchable values available for other properties though.
Why doesn't it use min-width/max-width style like media queries?
Media Queries Level 4 supports the range syntax shown in these examples; we just wanted to show that you could use it. You could just as easily use the min-width/max-width style here!
Bonus Round: Switching gears...
Shortly after we made our switch proposal, my friend Jon Neal opened a github issue based on some twitter conversations. For the next week or two this thread was very busy with lots of function proposals that looked vaguely "switch-like". In fact, a number of them were also using the word "switch". From these, there are 3 ideas which seem interesting, look somewhat similar, but have some (very) importantly different characteristics, challenges and abilities. They are described below.
nth-value()
This proposal is a function which lets a variable represent an index into a list of possible values. Its use would look something like this (a hypothetical sketch; the exact syntax was never finalized):
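/* (hypothetical sketch; the exact nth-value() syntax was still being discussed) */
.foo {
  --n: 2;
  /* picks the 2nd value from the list, i.e. "1fr 1fr" */
  grid-template-columns: nth-value(var(--n), 1fr; 1fr 1fr; 1fr 1fr 1fr);
}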
cond()
This proposal is a function which allows you to pass pairs of math-like conditions and value associations, as well as a default value. The conditions are evaluated from left to right and the value following the first condition to be true is used, or the default if none are. Its use would look something like this (again, a hypothetical sketch):
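/* (hypothetical sketch; condition/value pairs evaluated left to right, with a default) */
.foo {
  grid-template-columns: cond(
    (100vw > 1024px) 1fr 4fr 1fr;
    (100vw > 400px) 2fr 1fr;
    default 1fr
  );
}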
switch()
This (our) proposal is a function which works like cond() above, but can provide contextual information only available at appropriate stages in the lifecycle. In the case of layout properties, it would have the same sorts of information available to it as a layout worklet, thus allowing you to do a lot of the things people want to do with "container queries", as in the example below (available-inline-size is the contextual value provided during layout). Its use would look like this:
/* (proposed syntax, to be bikeshed much.. note the demos use a less flexible/different/easier to implement syntax for now ) */
.foo {
grid-template-columns:
switch(
(available-inline-size > 1024px) 1fr 4fr 1fr;
(available-inline-size > 400px) 2fr 1fr;
(available-inline-size > 100px) 1fr;
default 1fr;
);
}
As similar as these may seem, almost everything about them is concretely different. Each is parsed and follows very different paths around what can be resolved and when, as well as what you can do with them. nth-value(), it was suggested by Mozilla's Emilio Cobos, should be extremely easy to implement because it reuses much of the existing infrastructure for CSS math functions. In fact, he went ahead and implemented it in Mozilla's code base to illustrate.
While things were too hectic to advance our own proposal for a while earlier this year, we did have enough time to look into that, and indeed, the nth-value() proposal was fairly simple to implement in Chromium too! In a very short time, without very sustained investment, we were able to create a complete patch that we could submit.
While nth-value() doesn't help advance the container queries use cases, we agree that it looks like a comparatively easy win for developers, and it might be worth having too.
So, we put it to you: Is it?
We would love your feedback on both of these things - are they things that you would like to see standards bodies and implementers pursue? We certainly are willing to implement a similar prototype for WebKit if necessary if developers are interested and it is standardized. Let us know what you think via @igalia or @briankardell!
In this 9th post on OpenGL and Vulkan interoperability on Linux with EXT_external_objects and EXT_external_objects_fd we are going to see another extensions use case where a Vulkan depth buffer is used to render a pattern with OpenGL. Like every other example use case described in these posts, it was implemented for Piglit as part of … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 9: Reusing a Vulkan z buffer from OpenGL
This is a follow-up of my previous post, where I was trying to fix the bug #1042864 in Chromium: key strokes happening on native dialogs, like open and save dialogs, were not reported to the screen reader.
After learning how accessibility tools (ATs) register listeners for key events, I found out the problem was not actually there; I had to investigate how events arrive from the X11 server to the browser, and how they are forwarded to the ATs.
Not this kind of event
Events arrive from the X server
If you are running Chromium on Linux with the X11 backend (most likely, as it is the default), the Chromium browser process receives key press events from the X server. Then, it finds out whether the target of those events is one of its browser windows, and sends them to the proper Window object to be processed.
These are the classes involved in the first part of this process:
The interface PlatformEventSource represents an undetermined source of events coming from the platform, and a PlatformEventDispatcher is any object in the browser capable of managing those events, dispatching them to the actual webpage or UI element. These two classes are related: the PlatformEventSource keeps a list of dispatchers it will forward the event to, if they can manage it (CanDispatchEvent).
The X11EventSource class implements PlatformEventSource; it has the code managing the events coming from an X11 server, in particular. It additionally keeps a list of XEventDispatcher objects, which is a class to manage X11 Event objects independently, but it’s not an implementation of PlatformEventDispatcher.
The X11Window class is the central piece, implementing both the PlatformEventDispatcher and the XEventDispatcher interfaces, in addition to extending the XWindow class. It has all the means required to find out whether it can dispatch an event, and to do it.
The main event processing loop looks like this:
An event arrives to X11EventSource.
X11EventSource loops through its list of XEventDispatcher, and calls CheckCanDispatchNextPlatformEvent for each of them.
The X11Window implementing that function checks if the XWindow ID of the event target matches the ID of the XWindow represented by that object, and saves the XEvent object if affirmative.
X11EventSource calls DispatchEvent as implemented by its parent class PlatformEventSource.
The PlatformEventSource loops through its list of PlatformEventDispatchers and calls CanDispatchEvent on each one of them.
The X11Window object, which had previously run CheckCanDispatchNextPlatformEvent, just verifies if the XEvent object was saved then, and considers that a confirmation it can dispatch the event.
When one of the dispatchers answers positively, it receives the event for processing in a call to DispatchEvent; it is implemented in X11Window.
If it’s a keyboard event, it takes the steps required to send it to any ATs listening to it, which had been previously registered via ATK.
When X11Window ends processing the event, it returns POST_DISPATCH_STOP_PROPAGATION, telling PlatformEventSource to stop looping through the rest of dispatchers.
This is a sequence diagram summarizing this process:
Events leave to the ATs
As explained in the previous post, ATs can register callbacks for key press events, which ultimately call AtkUtilClass::add_key_event_listener. AtkUtilClass is a struct of function pointers; the actual implementation is provided by Chromium in the AtkUtilAuraLinux class, which keeps a list of those callbacks.
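As a rough illustration of the AT side, here is a sketch using the public ATK entry point that dispatches through this struct (the callback body is made up):
#include <atk/atk.h>
/* Called for every key event once registered; returning FALSE lets the
 * event propagate normally. */
static gint on_key_event(AtkKeyEventStruct *event, gpointer data)
{
    g_print("key event: keyval=%u type=%d\n", event->keyval, event->type);
    return FALSE;
}
static guint listener_id;
static void register_key_listener(void)
{
    listener_id = atk_add_key_event_listener(on_key_event, NULL);
}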
When an X11Window class encounters an event that is targeting its own X Window, and it is a keyboard event, it calls X11ExtensionDelegate::OnAtkEvent(), which is actually implemented by the class DesktopWindowTreeHostLinux; it ultimately hands the event to the AtkUtilAuraLinux class and runs HandleAtkEvent(). It will loop through, and run, any listeners that may have been registered.
Native dialogs are different
Native dialogs are stand-alone windows in the X server, different from the browser window that called them, and the browser process doesn't wrap them in an X11Window object. It is considered unnecessary, because the windows for native dialogs talk to the X server and receive events from it directly.
They do belong to the browser process, though, which means that the browser will still receive events targeting the dialog windows. Those events will go through all the steps mentioned above only to eventually be dismissed, because there is no X11Window object in the browser matching the ID of the events' target window.
Another consequence of dialog windows belonging to the browser process is that the AtkUtilClass struct points to Chromium’s own implementation, and here comes the problem… The dialog is expected to manage its own events through GTK+ code, including the GTK+ implementation of AtkUtilClass, but Chromium overrode it. The key press listeners that ATs registered are kept in Chromium code, so the dialog cannot notify them.
Finally, fixing the problem
Chromium does receive the keyboard events targeted at the dialog windows, but it does nothing with them because the target of those events is not a browser window. It gives us, though, a leg up towards building a solution.
To fix the problem, I made Chromium X Windows manage the keyboard events addressed to the native dialogs in addition to their own. For that, I took advantage of the "transient" property, which indicates a dependency of one window on another: the dialog window had been set as transient for the browser window. In my first approach, I modified X11Window::CheckCanDispatchNextPlatformEvent() to verify whether the target of the event was a transient window of the browser X Window, and in that case it would hand the event to X11ExtensionDelegate to be sent to ATs, following the code path previously explained. It stopped processing at this point, because otherwise the browser window would have received key presses directed to the dialog.
The approach had one performance problem: I was calling the X server to check that property for every keystroke, and that call implied using synchronous IPC. This was unacceptable! But it could be worked around: we could instead notify the corresponding internal X11Window object about the existence of this transient window when the dialog is created. This implies no IPC at all; we just store one new property in the X11Window object that can be checked locally when keyboard events are processed. The final flow, sketched in code after the steps below, is:
Chromium creates the native dialog and calls XWindow::SetTransientWindow, setting that property in the corresponding browser X Window.
When Chromium receives a keyboard event, it is captured by the X11Window object whose transient window property has been set before.
X11ExtensionDelegate::OnAtkEvent() is called for that event, then no more processing of this event happens in Chromium.
The native dialog code will also receive the event and manage the keystroke accordingly.
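A minimal, self-contained sketch of the idea (names are illustrative, not the actual Chromium classes):
#include <cstdint>
using XID = std::uint32_t;
class BrowserXWindow {
 public:
  explicit BrowserXWindow(XID self) : self_(self) {}
  // Called once, when a native dialog is created as transient for this window.
  void SetTransientWindow(XID dialog) { transient_ = dialog; }
  // Checked for every key event; no round trip to the X server is needed.
  bool CanDispatchKeyEvent(XID target) const {
    return target == self_ || (transient_ != 0 && target == transient_);
  }
 private:
  XID self_;
  XID transient_ = 0;
};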
I hope you enjoyed this trip through Chromium event processing code. If you want to use the diagrams in this post, you may find their Dia source files in this link. Happy hacking!
In this line of work, we all stumble at least once upon a problem that turns out to be extremely elusive and very tricky to narrow down and solve. If we're lucky, we might have everything at our disposal to diagnose the problem, but sometimes that's not the case – and in embedded development it's often not the case. Add to the mix proprietary drivers, lack of debugging symbols, a bug that's very hard to reproduce under a controlled environment, and weeks in partial confinement due to a pandemic, and what you have is better described as a very long lucid nightmare. Thankfully, even the worst of nightmares end when morning comes, even if sometimes morning might be several days away. And when the fix to the problem is in an unimaginable place, the story is definitely one worth telling.
The problem
It all started with one of Igalia's customers deploying a WPE WebKit-based browser in their embedded devices. Their CI infrastructure had detected a problem caused when the browser was tasked with creating a new webview (in layman's terms, you can imagine that to be the same as opening a new tab in your browser). Occasionally, this view would never load, causing ongoing tests to fail. For some reason, the test failure had a reproducibility of ~75% in the CI environment, but during manual testing it would occur with less than 1% probability. For reasons that are beyond the scope of this post, the CI infrastructure was not reachable in a way that would allow access to running processes in order to diagnose the problem more easily. So with only logs at hand and less than a 1/100 chance of reproducing the bug myself, I set to debug this problem locally.
Diagnosis
The first thing that became evident was that, whenever this bug would occur, the WebKit feature known as web extension (an application-specific loadable module that is used to allow the program to have access to the internals of a web page, as well as to enable customizable communication with the process where the page contents are loaded – the web process) wouldn't work. The browser would be forever waiting for the web extension to load, and since that wouldn't happen, the expected page wouldn't load. The first place to look into, then, is the web process, to try to understand what is preventing the web extension from loading. Enter here our good friend GDB, with less than spectacular results thanks to stripped libraries.
#0 0x7500ab9c in poll () from target:/lib/libc.so.6
#1 0x73c08c0c in ?? () from target:/usr/lib/libEGL.so.1
#2 0x73c08d2c in ?? () from target:/usr/lib/libEGL.so.1
#3 0x73c08e0c in ?? () from target:/usr/lib/libEGL.so.1
#4 0x73bold6a8 in ?? () from target:/usr/lib/libEGL.so.1
#5 0x75f84208 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#6 0x75fa0b7e in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#7 0x7561eda2 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#8 0x755a176a in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#9 0x753cd842 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#10 0x75451660 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#11 0x75452882 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#12 0x75452fa8 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#13 0x76b1de62 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#14 0x76b5a970 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#15 0x74bee44c in g_main_context_dispatch () from target:/usr/lib/libglib-2.0.so.0
#16 0x74bee808 in ?? () from target:/usr/lib/libglib-2.0.so.0
#17 0x74beeba8 in g_main_loop_run () from target:/usr/lib/libglib-2.0.so.0
#18 0x76b5b11c in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#19 0x75622338 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#20 0x74f59b58 in __libc_start_main () from target:/lib/libc.so.6
#21 0x0045d8d0 in _start ()
From all threads in the web process, after much tinkering around, it slowly became clear that one of the places to look into is that poll() call. I will spare you the details related to what the other threads were doing; suffice to say that whenever the browser would hit the bug, there was a similar stacktrace in one thread, going through libEGL to a call to poll() on top of the stack, that would never return. Unfortunately, a stripped EGL driver coming from a proprietary graphics vendor was a bit of a showstopper, as was the inability to have proper debugging symbols running inside the device (did you know that a non-stripped WebKit library binary with debugging symbols can easily get GDB and your device out of memory?). The best one could do to improve on that was to use the gcore feature in GDB, and extract a core from the device for post-mortem analysis. But for some reason, such a stacktrace wouldn't give anything interesting below the poll() call to understand what's being polled here. Did I say this was tricky?
What polls?
Because WebKit is a multiprocess web engine, having system calls that signal, read, and write on sockets communicating with other processes is an everyday thing. Not knowing what a poll() call is doing and who it's trying to listen to is not very good. Because the call is happening under the EGL library, one can presume that it's graphics related, but there are still different possibilities, so trying to find out what this polling is about is a good idea.
A trick I learned while debugging this is that, in the absence of debugging symbols that would give a straightforward look into variables and parameters, one can examine the CPU registers and try to figure out from them what the parameters to function calls are. Let's do that with poll(). First, its signature.
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
Now, let's examine the registers.
(gdb) f 0
#0 0x7500ab9c in poll () from target:/lib/libc.so.6
(gdb) info registers
r0 0x7ea55e58 2124766808
r1 0x1 1
r2 0x64 100
r3 0x0 0
r4 0x0 0
Registers r0, r1, and r2 contain poll()'s three parameters. Because r1 is 1, we know that there is only one file descriptor being polled. fds is a pointer to an array with one element, then. Where is that first element? Well, right there, in the memory pointed to directly by r0. What does struct pollfd look like?
struct pollfd {
int fd; /* file descriptor */
short events; /* requested events */
short revents; /* returned events */
};
What we are interested in here is the contents of fd, the file descriptor that is being polled. Memory alignment is again on our side; we don't need any pointer arithmetic here. We can directly dereference the address held in r0 and find out what the value of fd is.
(gdb) print *0x7ea55e58
$3 = 8
So we now know that the EGL library is polling the file descriptor
with an identifier of 8. But where is this file descriptor coming
from? What is on the other end? The /proc file system can
be helpful here.
# pidof WPEWebProcess
1944 1196
# ls -lh /proc/1944/fd/8
lrwx------ 1 x x 64 Oct 22 13:59 /proc/1944/fd/8 -> socket:[32166]
So we have a socket. What else can we find out about it? Turns out, not much without the unix_diag kernel module, which was not available on our device. But we are slowly getting closer. Time to call another good friend.
Where GDB fails, printf() triumphs
Something I have learned from many years working with a project as large as WebKit is that debugging symbols can be very difficult to work with. To begin with, it takes ages to build WebKit with them. When cross-compiling, it's even worse. And then, very often the target device doesn't even have enough memory to load the symbols when debugging. So they can be pretty useless. It's then that just using fprintf() and logging useful information can simplify things. Since we know that it's at some point during initialization of the web process that we end up stuck, and we also know that we're polling a file descriptor, let's find some early calls in the code of the web process and add some fprintf() calls with a bit of information, especially in those that might have something to do with EGL. What can we find out now?
Oct 19 10:13:27.700335 WPEWebProcess[92]: Starting
Oct 19 10:13:27.720575 WPEWebProcess[92]: Initializing WebProcess platform.
Oct 19 10:13:27.727850 WPEWebProcess[92]: wpe_loader_init() done.
Oct 19 10:13:27.729054 WPEWebProcess[92]: Initializing PlatformDisplayLibWPE (hostFD: 8).
Oct 19 10:13:27.730166 WPEWebProcess[92]: egl backend created.
Oct 19 10:13:27.741556 WPEWebProcess[92]: got native display.
Oct 19 10:13:27.742565 WPEWebProcess[92]: initializeEGLDisplay() starting.
Two interesting findings from the fprintf()-powered
logging here: first, it seems that file descriptor 8 is one known to
libwpe
(the general-purpose library that powers the WPE WebKit port). Second,
that the last EGL API call right before the web process hangs
on poll() is a call
to eglInitialize(). fprintf(),
thanks for your service.
Number 8
We now know that the file descriptor 8 is coming from WPE and is
not internal to the EGL library. libwpe gets this file descriptor from
the UI process,
as one
of the many creation parameters that are passed via IPC to the
nascent process in order to initialize it. Turns out that this file
descriptor in particular, the so-called host client file descriptor,
is the one that the freedesktop backend of libWPE, from here onwards
WPEBackend-fdo,
creates when a new client is set to connect to its Wayland display. In
a nutshell, in the presence of a new client, a Wayland display is supposed to create a pair of connected sockets, create a new client on the display side, give it one of the file descriptors, and pass the other one to the client process. Because this will be useful later on, let's see how that is currently implemented in WPEBackend-fdo.
int pair[2];
if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, pair) < 0)
return -1;
int clientFd = dup(pair[1]);
close(pair[1]);
wl_client_create(m_display, pair[0]);
The file descriptor we are tracking down is the client file
descriptor, clientFd. So we now know what's going on in this socket: Wayland-specific communication. Let's enable Wayland debugging next, by running all the relevant processes with WAYLAND_DEBUG=1. We'll get back to that code fragment later on.
A Heisenbug is a Heisenbug is a Heisenbug
Turns out that enabling Wayland debugging output for a few
processes is enough to alter the state of the system in such a way
that the bug does not happen at all when doing manual
testing. Thankfully the CI's reproducibility is much higher, so after leaving the CI running overnight until it hit the bug, we have logs. What do the logs say?
WPEWebProcess[41]: initializeEGLDisplay() starting.
-> wl_display@1.get_registry(new id wl_registry@2)
-> wl_display@1.sync(new id wl_callback@3)
So the EGL library is trying to fetch the Wayland
registry and it's doing a wl_display_sync() call afterwards, which will block until the server responds. That's
where the blocking poll() call comes from. So, it turns
out, the problem is not necessarily on this end of the Wayland socket,
but perhaps on the other side, that is, in the so-called UI process
(the main browser process). Why is the Wayland display not
replying?
The loop
Something that is worth mentioning before we move on is how the
WPEBackend-fdo Wayland display integrates with the system. This
display is a nested display, with each web view a client, while it is
itself a client of the system's Wayland display. This can be a bit confusing if you're not very familiar with how Wayland works, but
fortunately there is
good
documentation about Wayland elsewhere.
The way that the Wayland display in the UI process of a WPEWebKit
browser is integrated with the rest of the program, when it uses
WPEBackend-fdo, is through the
GLib
main event loop. Wayland itself has an event loop implementation
for servers, but for a GLib-powered application it can be useful to use GLib's and integrate Wayland's event processing with the different stages of the GLib main loop. That is precisely how WPEBackend-fdo handles its clients' events. As discussed earlier, when a new client is created, a pair of connected sockets is created and one end is given to Wayland to control communication with the client. GSourceFunc functions are used to integrate Wayland with the application main loop. In these functions, we make sure that whenever there are pending messages to be sent to clients, those are sent, and that whenever any of the client sockets has pending data to be read, Wayland reads from it and dispatches the events that might be necessary in response to the incoming data. And here is where things start getting really strange, because after doing a bit of fprintf()-powered debugging inside the Wayland GSourceFuncs functions, it became clear that the Wayland events from the clients were never dispatched: the dispatch() GSourceFunc was not being called, as if there was nothing coming from any Wayland client. But how is that possible, if we already know that the web process client is actually trying to get the Wayland registry?
To move forward, one needs to understand how the GLib main loop
works, in particular, with Unix file descriptor sources. A very brief
summary of this is that, during an iteration of the main loop, GLib
will poll file descriptors to see if there are any interesting events
to be reported back to their respective sources, in which case the
sources will decide whether to trigger the dispatch()
phase. A simple source might decide in its dispatch()
method to directly read or write from/to the file descriptor; a
Wayland display source (as in our case) will
call wl_event_loop_dispatch() to do this for us.
However, if the source doesn't find any interesting events, or if the source decides that it doesn't want to handle them, the dispatch() invocation will not happen. More on the
GLib main event loop in
its API
documentation.
So it seems that for some reason the dispatch() method is not being
called. Does that mean that there are no interesting events to read
from? Let's find out.
System call tracing
Here we resort to another helpful
tool, strace. With strace
we can try to figure out what is happening when the main loop polls
file descriptors. The strace output is huge (because it
takes easily over a hundred attempts to reproduce this), but we know
already some of the calls that involve file descriptors from the code
we looked at above, when the client is created. So we can use those
calls as a starting point when searching through the several MBs of
logs. Fast-forward to the relevant logs.
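The original capture is not reproduced here, so below is a reconstruction of the two relevant lines, pieced together from the description that follows (the file descriptor numbers are the real ones; the exact strace formatting is approximate):
socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [128, 130]) = 0
epoll_ctl(34, EPOLL_CTL_ADD, 130, {EPOLLIN, {u32=130, u64=130}}) = 0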
What we see there is, first, WPEBackend-fdo creating a new socket
pair (128, 130) and then, when file descriptor 130 is passed to
wl_client_create() to
create a new client, Wayland adds that file descriptor to its
epoll() instance
for monitoring clients, which is referred to by file descriptor 34. This way, whenever there are
events in file descriptor 130, we will hear about them in file descriptor 34.
So what we would expect to see next is that, after the web process
is spawned, when a Wayland client is created using the passed file
descriptor and the EGL driver requests the Wayland registry from the
display, there should be a POLLIN event coming in file
descriptor 34 and, if the dispatch() call for the source
was called,
an epoll_wait()
call on it, as that is
what wl_event_loop_dispatch()
would do when called from the source's dispatch()
method. But what do we have instead?
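Roughly this (again a reconstruction from the explanation below, not the verbatim capture):
poll([..., {fd=30, events=POLLIN}, {fd=34, events=POLLIN}, ...], ..., -1) = 1 ([{fd=34, revents=POLLIN}])
recvmsg(30, ..., MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)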
strace can be a bit cryptic, so let's explain
those two function calls. The first one is a poll in a series of file
descriptors (including 30 and 34) for POLLIN events. The
return value of that call tells us that there is a POLLIN
event in file descriptor 34 (the Wayland display epoll()
instance for clients). But unintuitively, the call right after is
trying to read a message from socket 30 instead, which we know
doesn't have any pending data at the moment, and consequently
returns an error value with an errno
of EAGAIN (Resource temporarily unavailable).
Why is the GLib main loop triggering a read from 30 instead of 34?
And who is 30?
We can answer the latter question first. Breaking on a running UI
process instance at the right time shows who is reading from
file descriptor 30:
#1 0x70ae1394 in wl_os_recvmsg_cloexec (sockfd=30, msg=msg@entry=0x700fea54, flags=flags@entry=64)
#2 0x70adf644 in wl_connection_read (connection=0x6f70b7e8)
#3 0x70ade70c in read_events (display=0x6f709c90)
#4 wl_display_read_events (display=0x6f709c90)
#5 0x70277d98 in pwl_source_check (source=0x6f71cb80)
#6 0x743f2140 in g_main_context_check (context=context@entry=0x2111978, max_priority=<optimized out>, fds=fds@entry=0x6165f718, n_fds=n_fds@entry=4)
#7 0x743f277c in g_main_context_iterate (context=0x2111978, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
#8 0x743f2ba8 in g_main_loop_run (loop=0x20ece40)
#9 0x00537b38 in ?? ()
So it's also Wayland, but on a different level. This
is the Wayland client source (remember that the browser is also a
Wayland client?), which is installed
by cog (a thin browser layer on top of WPE WebKit that makes writing browsers easier)
to process, among others, input events coming from the parent Wayland
display. Looking
at the cog code, we can see that the
wl_display_read_events()
call happens only if GLib reports that there is
a G_IO_IN
(POLLIN) event in its file descriptor, but we already
know that this is not the case, as per the strace
output. So at this point we know that there are two things here that
are not right:
An FD source with a G_IO_IN condition is not being dispatched.
An FD source without a G_IO_IN condition is being dispatched.
Someone here is not telling the truth, and as a result the main loop
is dispatching the wrong sources.
The loop (part II)
It is at this point that it would be a good idea to look at what
exactly the GLib main loop is doing internally in each of its stages
and how it tracks the sources and file descriptors that are polled and
that need to be processed. Fortunately, debugging symbols for GLib are
very small, so debugging this step by step inside the device is rather
easy.
Let's look at how the main loop decides which sources to dispatch, since for some reason it's dispatching the wrong ones.
Dispatching happens in
the g_main_dispatch()
method. This method goes over a list of pending source dispatches and
after a few checks and setting the stage, the dispatch method for the
source gets called. How is a source set as having a pending dispatch?
This happens in
g_main_context_check(),
where the main loop checks the results of the polling done in this
iteration and runs the check() method for sources that
are not ready yet so that they can decide whether they are ready to be
dispatched or not. Breaking into the Wayland display source, I know
that
the check()
method is called. How does this method decide whether to be dispatched or not?
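The actual code is a C++ lambda in WPEBackend-fdo's GSourceFuncs table; paraphrased (EventSource here stands in for the source's private struct holding its GPollFD; this is a sketch, not the verbatim source), it boils down to this:
// check(): ask to be dispatched iff poll() reported events on our fd.
[](GSource* base) -> gboolean {
    auto& source = *reinterpret_cast<EventSource*>(base);
    return !!source.pfd.revents;
},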
In this lambda function we're returning TRUE or FALSE, depending on whether the revents field in the GPollFD structure has been filled during the polling stage of this iteration of the loop. A return value of TRUE tells the main loop that we want our source to be dispatched. From the strace output, we know that there is a POLLIN (or G_IO_IN) condition, but we also know that the main loop is not dispatching it. So let's look at what's in this GPollFD structure. For this, let's go back to g_main_context_check() and inspect the array of GPollFD structures that it received when called. What do we find?
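Something along these lines (an illustrative reconstruction of the GDB session; the indices are made up and the real array had more entries):
(gdb) print fds[1]
$1 = {fd = 30, events = 1, revents = 0}
(gdb) print fds[2]
$2 = {fd = 34, events = 1, revents = 1}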
That's the result of the poll() call! So far so good. Now the method is supposed to update the polling records that it keeps and uses when calling each source's check() function. What do these records hold?
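Inspecting the first few records gives, again as an illustrative reconstruction consistent with what follows:
(gdb) print *context->poll_records->fd
$3 = {fd = 19, events = 1, revents = 0}
(gdb) print *context->poll_records->next->fd
$4 = {fd = 30, events = 1, revents = 1}
(gdb) print *context->poll_records->next->next->fd
$5 = {fd = 34, events = 1, revents = 0}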
We're not interested in the first record quite yet, but clearly there's something odd here. The polling records are showing a different value in the revents fields for both 30 and 34. Are these records updated correctly? Let's look at the algorithm that is doing this update, because it will be relevant later on.
pollrec = context->poll_records;
i = 0;
while (pollrec && i < n_fds)
{
while (pollrec && pollrec->fd->fd == fds[i].fd)
{
if (pollrec->priority <= max_priority)
{
pollrec->fd->revents =
fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
}
pollrec = pollrec->next;
}
i++;
}
In simple words, what this algorithm does is traverse the polling records and the GPollFD array simultaneously, updating the polling records' revents with the results of polling. From reading how the pollrec linked list is built internally, it's possible to see that it's purposely sorted by increasing file
descriptor identifier value. So the first item in the list will have
the record for the lowest file descriptor identifier, and so on. The
GPollFD array is also built in this way, allowing for a
nice optimization: if more than one polling record – that is, more
than one polling source – needs to poll the same file descriptor,
this can be done at once. This is why this otherwise O(n^2) nested
loop can actually be reduced to linear time.
One thing stands out here though: the linked list is only advanced
when we find a match. Does this mean that we always have a match
between polling records and the file descriptors that have just been
polled? To answer that question we need to check how the array of GPollFD structures is filled. This is done in g_main_context_query(), as we hinted before. I'll spare you the details, and just focus on what seems
relevant here: when is a poll record not used to fill
a GPollFD?
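The relevant part of g_main_context_query() looks approximately like this (simplified; the real code also merges duplicate file descriptors):
n_poll = 0;
for (pollrec = context->poll_records; pollrec; pollrec = pollrec->next)
  {
    if (pollrec->priority > max_priority)
      continue; /* skipped: this record is not polled in this iteration */
    fds[n_poll].fd = pollrec->fd->fd;
    fds[n_poll].events = pollrec->fd->events;
    n_poll++;
  }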
Interesting! If a polling record belongs to a source whose priority
is lower than the maximum priority that the current iteration is
going to process, the polling record is skipped. Why is this?
In simple terms, this happens because each iteration of the main
loop finds out the highest priority among the sources that are ready in the prepare() stage, before polling, and then only those file descriptor sources with at least such a priority are polled. The idea behind this is to make sure that high-priority sources are processed first, and that no file descriptor sources with lower priority are polled in vain, as they shouldn't be dispatched in the current iteration.
GDB tells me that the maximum priority in this iteration is
-60. From an earlier GDB output, we also know that there's a source for file descriptor 19 with priority 0. Since 19 is lower than 30 and 34, we know that its record comes before theirs in the linked list (and, as it happens, it's the first one in the list too). But we also know that, because its priority value is 0, it is too low for it to be added to the file descriptor array to be polled. Let's look at the loop again.
pollrec = context->poll_records;
i = 0;
while (pollrec && i < n_fds)
{
while (pollrec && pollrec->fd->fd == fds[i].fd)
{
if (pollrec->priority <= max_priority)
{
pollrec->fd->revents =
fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
}
pollrec = pollrec->next;
}
i++;
}
The first polling record was skipped when the GPollFD array was filled, so the condition pollrec && pollrec->fd->fd == fds[i].fd is never going to be satisfied, because 19 is not in the array. The innermost while() is not entered, and as such the pollrec list pointer never moves forward to the next record. So no polling record is updated here, even though we have fresh revents information from the polling results.
What happens next should be easy to see. The check() methods of all the polled sources are called with outdated revents. In the case of the source for file descriptor 30, we wrongly tell it there's a G_IO_IN condition, so it asks the main loop to dispatch it, triggering a wl_connection_read() call on a socket with no incoming data. For the source with file descriptor 34, we tell it that there's no incoming data and its dispatch() method is not invoked, even though on the other side of the socket there is a client waiting for data to come and blocking in the meantime. This explains what we see in the strace output above. If the source with file descriptor 19 continues to be ready and its priority remains unchanged, then this situation repeats in every further iteration of the main loop, leading to a hang in the web process, which waits forever for the UI process to read from its end of the socket.
The bug – explained
I have been using GLib for a very long time, and I have only fixed
a couple of minor bugs in it over the years. Very few actually,
which is why it was very difficult for me to come to accept that I
had found a bug in one of the most reliable and complex parts of the
library. Impostor syndrome is a thing and it really gets in the way.
But in a nutshell, the bug in the GLib main loop is that the very clever linear update of the polling records is missing something very important: it should skip to the first matching polling record before attempting to update its revents. Without this, in the presence of a file descriptor source with the lowest file descriptor identifier and also a lower priority than the cutoff priority in the current main loop iteration, revents in the polling records are not updated and therefore the wrong sources can be dispatched. The simplest patch to avoid this would look as follows.
i = 0;
while (pollrec && i < n_fds)
{
+ while (pollrec && pollrec->fd->fd != fds[i].fd)
+ pollrec = pollrec->next;
+
while (pollrec && pollrec->fd->fd == fds[i].fd)
{
if (pollrec->priority <= max_priority)
Once we find the first matching record, we update all consecutive records that also match and need an update, then skip to the next record, rinse and repeat. With this two-line patch, the web process was finally unblocked, the EGL display initialized properly, the web extension and the web page were loaded, CI tests started passing again, and this exhausted developer could finally put his mind to rest.
A complete
patch, including improvements to the code comments around this
fascinating part of GLib, and also a minimal test case reproducing the bug, has already been reviewed by the GLib maintainers and merged to both stable and development branches. I expect that at
least some GLib sources will start being called in a
different (but correct) order from now on, so keep an eye on your
GLib sources. :-)
Standing on the shoulders of giants
At this point I should acknowledge that without the support from my
colleagues in the WebKit team in Igalia, getting to the bottom of this
problem would have probably been much harder and perhaps my sanity
would have been at stake. I want to
thank Adrián
and Žan for
their input on Wayland, debugging techniques, and for allowing me to
bounce back and forth ideas and findings as I went deeper into this
rabbit hole, helping me to step out of dead-ends, reminding me to use
tools out of my everyday box, and ultimately, to be brave enough to doubt GLib's correctness, something that much more often than not I take for granted.
Thanks also to Philip
and Sebastian for their
feedback and prompt code review!
This is the 5th post of the OpenGL and Vulkan interoperability series, where I describe some use cases for the EXT_external_objects and EXT_external_objects_fd extensions. These use cases have been implemented inside Piglit as part of my work for Igalia's graphics team, using a Vulkan framework I've written for this purpose. In this 5th post, a Vulkan pixel buffer is reused by OpenGL.
If you’re developing C/C++ on embedded devices, you might already have stumbled upon a corrupt stacktrace like this when trying to debug with gdb:
(gdb) bt
#0 0xb38e32c4 in pthread_getname_np () from /home/enrique/buildroot/output5/staging/lib/libpthread.so.0
#1 0xb38e103c in __lll_timedlock_wait () from /home/enrique/buildroot/output5/staging/lib/libpthread.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
In these cases I usually give up on gdb and try to solve my problems by adding printf()s and resorting to other tools. However, there are times when you really, really need to know what is in that cursed stack.
On ARM devices, subroutine calls work by setting the return address in the Link Register (LR), so the subroutine knows where to point the Program Counter (PC) register when it returns. While a subroutine isn’t calling further subroutines, the value of the LR register is saved on the stack (to be restored later, right before the current subroutine returns to the caller) and the register can be used for other tasks (LR is a “scratch register”). This means that the functions in the backtrace are actually there, in the stack, in the form of older saved LRs, waiting for us to get them.
So, the first step would be to dump the memory contents of the backtrace, starting from the address pointed to by the Stack Pointer (SP). Let’s print the first 256 32-bit words and save them as a file from gdb:
(gdb) set logging overwrite on
(gdb) set logging file /tmp/bt.txt
(gdb) set logging on
Copying output to /tmp/bt.txt.
(gdb) x/256wa $sp
0xbe9772b0: 0x821e 0xb38e103d 0x1aef48 0xb1973df0
0xbe9772c0: 0x73d 0xb38dc51f 0x0 0x1
0xbe9772d0: 0x191d58 0x191da4 0x19f200 0xb31ae5ed
...
0xbe977560: 0xb28c6000 0xbe9776b4 0x5 0x10871 <main(int, char**)>
0xbe977570: 0xb6f93000 0xaaaaaaab 0xaf85fd4a 0xa36dbc17
0xbe977580: 0x130 0x0 0x109b9 <__libc_csu_init> 0x0
...
0xbe977690: 0x0 0x0 0x108cd <_start> 0x0
0xbe9776a0: 0x0 0x108ed <_start+32> 0x10a19 <__libc_csu_fini> 0xb6f76969
(gdb) set logging off
Done logging to /tmp/bt.txt.
Gdb can already name some of the functions (like main()), but not all of them, and at least not the most interesting ones for our purpose. We’ll have to look for them by hand.
We first get the memory page mapping from the process (WebKit’s WebProcess in my case) looking in /proc/pid/maps. I’m retrieving it from the device (named metro) via ssh and saving it to a local file. I’m only interested in the code pages, those with executable (‘x’) permissions:
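The retrieval command itself was elided here; a sketch of it, assuming a single WPEWebProcess instance and reducing each mapping to one “BASE END BASEOFFSET FILE” line (the destination file name is my choice, not from the original setup):
$ ssh metro 'grep " r-xp " /proc/$(pidof WPEWebProcess)/maps' | sed 's/-/ /' | awk '{print $1, $2, $4, $7}' > /tmp/maps.txt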
Now we process the backtrace to remove address markers and have one word per line:
$ cat /tmp/bt.txt | sed -e 's/^[^:]*://' -e 's/[<][^>]*[>]//g' | while read A B C D; do echo $A; echo $B; echo $C; echo $D; done | sed 's/^0x//' | while read P; do printf '%08x\n' "$((16#"$P"))"; done | sponge /tmp/bt.txt
Then merge and sort both files, so the addresses in the stack appear below their corresponding mappings:
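This command was also elided; a sketch of the idea, relying on the fact that both files use bare lowercase 8-digit hex addresses, so a plain sort interleaves each stack word right after the mapping that contains it (the "-" filler fields make the output match the read pattern of the script below):
$ sort /tmp/maps.txt /tmp/bt.txt | awk 'NF >= 4 { base=$1; end=$2; off=$3; file=$4; next } { print $1, "-", base, "-", end, "-", off, "-", "-", file }' > /tmp/mapped.txt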
The addr2line tool can give us the exact function an address belongs to, or even the function and source code line if the code has been built with symbols. But the addresses addr2line understands are internal offsets, not absolute memory addresses. We can convert the addresses in the stack to offsets with this expression:
offset = address - page start + base offset
I’m using buildroot as my cross-build environment, so I need to pick the library files from the staging directory because those are the unstripped versions. The addr2line tool is the one from the buildroot cross-compiling toolchain. Written as a script:
$ cat /tmp/mapped.txt | while read ADDR _ BASE _ END _ BASEOFFSET _ _ FILE; do OFFSET=$(printf "%08x\n" $((0x$ADDR - 0x$BASE + 0x$BASEOFFSET))); FILE=~/buildroot/output/staging/$FILE; if [[ -f $FILE ]]; then LINE=$(~/buildroot/output/host/usr/bin/arm-buildroot-linux-gnueabihf-addr2line -p -f -C -e $FILE $OFFSET); echo "$ADDR $LINE"; fi; done > /tmp/addr2line.txt
Finally, we filter out the useless [??] entries:
$ cat /tmp/bt.txt | while read DATA; do cat /tmp/addr2line.txt | grep "$DATA"; done | grep -v '[?][?]' > /tmp/fullbt.txt
What remains is something very similar to what the real backtrace should have been if everything had originally worked as it should in gdb:
b31ae5ed gst_pad_send_event_unchecked at /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5571
b31a46c1 gst_debug_log at /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstinfo.c:444
b31b7ead gst_pad_send_event at /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5775
b666250d WebCore::AppendPipeline::injectProtectionEventIfPending() at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/graphics/gstreamer/mse/AppendPipeline.cpp:1360
b657b411 WTF::GRefPtr<_GstEvent>::~GRefPtr() at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/glib/GRefPtr.h:76
b5fb0319 WebCore::HTMLMediaElement::pendingActionTimerFired() at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/html/HTMLMediaElement.cpp:1179
b61a524d WebCore::ThreadTimers::sharedTimerFiredInternal() at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/ThreadTimers.cpp:120
b61a5291 WTF::Function<void ()>::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::{lambda()#1}>::call() at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/Function.h:101
b6c809a3 operator() at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:171
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164
b2ad4223 g_main_context_dispatch at :?
b6c80601 WTF::{lambda(_GSource*, int (*)(void*), void*)#1}::_FUN(_GSource*, int (*)(void*), void*) at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:40
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164
b2adfc49 g_poll at :?
b2ad44b7 g_main_context_iterate.isra.29 at :?
b2ad477d g_main_loop_run at :?
b6c80de3 WTF::RunLoop::run() at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:97
b6c654ed WTF::RunLoop::dispatch(WTF::Function<void ()>&&) at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/RunLoop.cpp:128
b5937445 int WebKit::ChildProcessMain<WebKit::WebProcess, WebKit::WebProcessMain>(int, char**) at /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebKit/Shared/unix/ChildProcessMain.h:64
b27b2978 __bss_start at :?
I hope you find this trick useful, and the scripts handy, in case you ever need to resort to examining the raw stack to get a meaningful backtrace.
You may have seen an interesting paper cross your radar a couple months ago: Everything Old is New Again: Binary Security of WebAssembly, by Daniel Lehmann, Johannes Kinder and Michael Pradel. The paper makes some strong claims and I would like to share some thoughts on it.
reader-response theory
For context, I have been working on web browsers for the last 8 years or so, most recently on the JavaScript and WebAssembly engine in Firefox. My work mostly consists of implementing new features, which if you are familiar with software development translates as "writing bugs". Almost all of those bugs are security bugs, potentially causing Firefox to go from being an agent of the user to an agent of the Mossad, or of cryptocurrency thieves, or anything else.
Mitigating browser bug flow takes a siege mentality. Web browsers treat all web pages and their corresponding CSS, media, JavaScript, and WebAssembly as hostile. We try to reason about global security properties, and translate those properties into invariants ensured at compile-time and run-time, for example to ensure that a web page from site A can't access cookies from site B.
In this regard, WebAssembly has some of the strongest isolation invariants in the whole platform. A WebAssembly module has access to nothing, by default: neither functionality nor data. Even a module's memory is isolated from the rest of the browser, both by construction (that's just how WebAssembly is specified) and by run-time measures (given that pointers are 32 bits in today's WebAssembly, we generally reserve a multi-gigabyte region for a module's memory that can contain nothing else).
All of this may seem obvious, but consider that a C++ program compiled to native code on a POSIX platform can use essentially everything that the person running it has access to: your SSH secrets, your email, all of your programs, and so on. That same program compiled to WebAssembly does not -- any capability it has must have been given to it by the person running the program. For POSIX-like programs, the WebAssembly community is working on a POSIX for the web that standardizes a limited-capability access to data and functionality from the world, and in web browsers, well of course the module has access only to the capabilities that the embedding web page gives to it. Mostly, as the JS run-time accompanying the WebAssembly is usually generated by emscripten, this set of capabilities is a function of the program itself.
Of course, complex WebAssembly systems may contain multiple agents, acting on behalf of different parties. For example, a module might, through capabilities provided by the host, be able to ask flickr to delete a photo, but might also be able to crawl a URL for photos. Probably in this system, crawling a web page shouldn't be able to "trick" the WebAssembly module into deleting a photo. The C++ program compiled to WebAssembly could have a bug of course, in which case, you get to keep both pieces.
I mention all of this because we who work on WebAssembly are proud of this work! It is a pleasure to design and build a platform for high-performance code that provides robust capabilities-based security properties.
the new criticism
Therefore it was with skepticism that I started reading the Lehmann et al paper. The paper focusses on WebAssembly itself, not any particular implementation thereof; what could be wrong about WebAssembly?
I found the answer to be quite nuanced. To me, the paper shows three interesting things:
Memory-safety bugs in C/C++ programs when compiled to WebAssembly can cause control-flow edges that were not present in the source program.
Unexpected control-flow in a web browser can sometimes end up in a call to eval with the permissions of the web page, which is not good.
It's easier in some ways to exploit bugs in a C/C++ program when compiled to WebAssembly than when compiled natively, because many common mitigations aren't used by the WebAssembly compiler toolchain.
Firstly, let's discuss the control-flow point. Let's say that the program has a bug, and you have made an exploit to overwrite some memory location. What can you do with it? Well, consider indirect calls (call_indirect). This is what a compiler will emit for a vtable method call, or for a call to a function pointer. The possible targets for the indirect call are stored in a table, which is a side array of all possible call_indirect targets. The actual target is selected at run-time based on an index; WebAssembly function pointers are just indices into this table.
So if a function loads an index into the indirect call table from memory, and some exploit can change this index, then you can cause a call site to change its callee. Although there is a run-time type check that occurs at the call_indirect site to ensure that the callee is called with the right type, many functions in a module can have compatible types and thus be callable without an error.
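To make this concrete, here is a generic C sketch of the pattern (my own illustration, not code from the paper). The function-pointer call compiles to a call_indirect whose target index is loaded from linear memory, so a stray write that reaches the handlers array can redirect the call to any type-compatible function in the table:
typedef void (*handler_t)(int);

/* The handlers array lives in linear memory; an out-of-bounds write
   elsewhere in the program can overwrite its entries. */
void dispatch(handler_t *handlers, int which, int arg) {
  handlers[which](arg);  /* compiles to a call_indirect */
}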
OK, so that's not great. But what can you do with it? Well it turns out that emscripten will sometimes provide JavaScript's eval to the WebAssembly module. Usually it will be called only with a static string, but anything can happen. If an attacker can redirect a call site to eval instead of one of the possible targets from the source code, you can (e.g.) send the user's cookies to evil.com.
There's a similar vulnerability regarding changing the operand to eval, instead. Strings are represented in linear memory as well, and there's no write protection on them, even if they are read-only data. If your write primitive can change the string being passed to eval, that's also a win for the attacker. More details in the paper.
This observation brings us to the last point, which is that many basic mitigations in (e.g.) POSIX deployments aren't present in WebAssembly. There are no OS-level read-only protections for static data, and the compiler doesn't enforce this either. Also, WebAssembly programs have to bundle their own malloc, but the implementations provided by emscripten don't implement the "hardening" techniques. There is no address-space layout randomization, so exploits are deterministic. And so on.
on mitigations
It must be said that for most people working on WebAssembly, security "mitigations" are... unsatisfactory. They aren't necessary for memory-safe programs, and they can't prevent memory-unsafe programs from having unexpected behavior. Besides, we who work on WebAssembly are more focussed on the security properties of the WebAssembly program as embedded in its environment, but not on the program itself. Garbage in, garbage out, right?
In that regard, I think that one answer to this paper is just "don't". Don't ship memory-unsafe programs, or if you do, don't give them eval capabilities. No general mitigation will make these programs safe. Writing your program in e.g. safe Rust is a comprehensive fix to this class of bug.
But, we have to admit also that shipping programs written in C and C++ is a primary goal of WebAssembly, and that no matter how hard we try, some buggy programs will get shipped, and therefore that there is marginal value to including mitigations like read-only data or even address space randomization. We definitely need to work on getting control-flow integrity protections working well with the WebAssembly toolchain, probably via multi-table support (part of the reference types extension; my colleague Paulo Matos just landed a patch in this area). And certainly Emscripten should work towards minimizing the capabilities set exposed to WebAssembly by the generated JavaScript, notably by compiling away uses of eval by embind.
Finally, I think that many of the problems identified by this paper will be comprehensively fixed in a different way by more "managed" languages. The problem is that C/C++ pointers are capabilities into all of undifferentiated linear memory. By contrast, handles to GC-managed objects are unforgeable: given object A, you can't get to object B except if object A references B. It would be great if we could bring some of the benefits of this more capability-based approach to in-memory objects to languages like C and C++; more on that in a future note, I think.
chapeau
In the end, despite my initial orneriness, I have to admit that the paper authors point out some interesting areas to work on. It's clear that there's more work to do. I was also relieved to find that my code is not at fault in this particular instance :) Onwards and upwards, and until next time, happy hacking!
Greetings, internet! Today I have the silliest of demos for you: malloc-as-a-service.
[Interactive demo elided: an input box that loads walloc compiled to WebAssembly and calls its malloc/free bindings.]
The above input box, if things managed to work, loads up a simple bare-bones malloc implementation, and exposes "malloc" and "free" bindings. But the neat thing is that it's built without emscripten: it's a standalone C file that compiles directly to WebAssembly, with no JavaScript run-time at all. I share it here because it might come in handy to people working on WebAssembly toolchains, and also because it was an amusing experience to build.
wat?
The name of the allocator is "walloc", in which the w is for WebAssembly.
Walloc was designed with the following priorities, in order:
Standalone. No stdlib needed; no emscripten. Can be included in a project without pulling in anything else.
Reasonable allocation speed and fragmentation/overhead.
Small size, to minimize download time.
Standard interface: a drop-in replacement for malloc.
Single-threaded (currently, anyway).
Emscripten includes a couple of good malloc implementations (dlmalloc and emmalloc) which you should probably use instead. But if you are really looking for a bare-bones malloc, walloc is fine.
You can check out all the details at the walloc project page; a selection of salient bits are below.
Firstly, to build walloc, it's just a straight-up compile:
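The command was elided here; if I recall correctly, it is essentially the following (assuming a recent LLVM toolchain with WebAssembly support):
clang -DNDEBUG -Oz --target=wasm32 -nostdlib -c -o walloc.o walloc.c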
The resulting walloc.o is a conforming WebAssembly file on its own, but which also contains additional symbol table and relocation sections which allow wasm-ld to combine separate compilation units into a single final WebAssembly file. walloc.c on its own doesn't import or export anything, in the WebAssembly sense; to make bindings visible to JS, you need to add a little wrapper:
typedef __SIZE_TYPE__ size_t;
#define WASM_EXPORT(name) \
__attribute__((export_name(#name))) \
name
// Declare these as coming from walloc.c.
void *malloc(size_t size);
void free(void *p);
void* WASM_EXPORT(walloc)(size_t size) {
return malloc(size);
}
void WASM_EXPORT(wfree)(void* ptr) {
free(ptr);
}
If you compile that to exports.o and link via wasm-ld --no-entry --import-memory -o walloc.wasm exports.o walloc.o, you end up with the walloc.wasm used in the demo above. See your inspector for the URL.
The resulting wasm file is about 2 kB (uncompressed).
Walloc isn't the smallest allocator out there. A simple bump-pointer allocator that never frees is the fastest thing you can have. There is also an alternate allocator for Rust, wee_alloc, which is said to be smaller than walloc, though I think it is less space-efficient for small objects. But still, walloc is pretty small.
implementation notes
When a C program is compiled to WebAssembly, the resulting wasm module (usually) has associated linear memory. It can be linked in a way that the memory is created by the module when it's instantiated, or such that the module is given a memory by its host. The above example passed --import-memory to the linker, allowing the host to bound memory usage for the module instance.
The linear memory has the usual data, stack, and heap segments. The data and stack are placed first. The heap starts at the &__heap_base symbol. (This symbol is computed and defined by the linker.) All bytes above &__heap_base can be used by the wasm program as it likes. So &__heap_base is the lower bound of memory managed by walloc.
The sensible thing to prevent accidental overflow (underflow, really) is to have the stack grow down to 0, with data at higher addresses. But this can cause WebAssembly code that references data to take up more bytes, because addresses are written using variable-length "LEB" encodings that favor short offsets, so it isn't the default, right now at least.
Anyway! The upper bound of memory managed by walloc is the total size of the memory, which is aligned on 64-kilobyte boundaries. (WebAssembly ensures this alignment.) Walloc manages memory in 64-kb pages as well. It starts with whatever memory is initially given to the module, and will expand the memory if it runs out. The host can specify a maximum memory size, in pages; if no more pages are available, walloc's malloc will simply return NULL; handling out-of-memory is up to the caller.
Walloc has two allocation strategies: small and large objects.
big bois
A large object is more than 256 bytes.
There is a global freelist of available large objects, each of which has a header indicating its size. When allocating, walloc does a best-fit search through that list.
Large object allocations are rounded up to 256-byte boundaries, including the header.
If there is no object on the freelist that can satisfy an allocation, walloc will expand the heap by the size of the allocation, or by half of the current walloc heap size, whichever is larger. The resulting page or pages form a large object that can satisfy the allocation.
If the best object on the freelist has more than a chunk of space on the end, it is split, and the tail put back on the freelist. A chunk is 256 bytes.
As each page is 65536 bytes, and each chunk is 256 bytes, there are therefore 256 chunks in a page. The first chunk in a page that begins an allocated object, large or small, contains a header chunk. The page header has a byte for each of the 256 chunks in the page. The byte is 255 if the corresponding chunk starts a large object; otherwise the byte indicates the size class for packed small-object allocations (see below).
When splitting large objects, we avoid starting a new large object on a page header chunk. A large object can only span where a page header chunk would be if it includes the entire page.
Freeing a large object pushes it on the global freelist. We know a pointer is a large object by looking at the page header. We know the size of the allocation, because the large object header precedes the allocation. When the next large object allocation happens after a free, the freelist will be compacted by merging adjacent large objects.
small fry
Small objects are allocated from segregated freelists. The granule size is 8 bytes. Small object allocations are packed in a chunk of uniform allocation size. There are size classes for allocations of each size from 1 to 6 granules, then 8, 10, 16, and 32 granules; 10 sizes in all. An allocation of e.g. 12 granules will be satisfied from a 16-granule chunk. Each size class has its own free list, as the sketch below illustrates.
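A hypothetical helper showing that size-class mapping (my own sketch, not walloc's actual code):
#include <stddef.h>

#define GRANULE_SIZE 8

/* Round a request up to its size class, in granules; 0 means "large object". */
static size_t size_class_granules(size_t bytes) {
  static const size_t classes[] = {1, 2, 3, 4, 5, 6, 8, 10, 16, 32};
  size_t granules = (bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
  for (size_t i = 0; i < sizeof classes / sizeof classes[0]; i++)
    if (granules <= classes[i])
      return classes[i];
  return 0;  /* more than 256 bytes: allocated as a large object */
}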
When allocating, if there is nothing on the corresponding freelist, walloc will allocate a new large object, then change its chunk kind in the page header to the size class. It then goes through the fresh chunk, threading the objects through each other onto a free list.
In this example, we imagine that the 4-granules freelist was empty, and that the large object freelist contained only large object 2, running all the way to the end of the page. We allocated a new 4-granules chunk, splitting the first chunk off the large object, and pushing the newly trimmed large object back onto the large object freelist, updating the page header appropriately. We then thread the 4-granules (32-byte) allocations in the fresh chunk together (the chunk has room for 8 of them), treating them as if they were instances of struct freelist, pushing them onto the global freelist for 4-granules allocations.
in fresh chunk, next link for object N points to object N+1
/--------\
| |
+------------------+-^--------v-----+----------+
granules=4: | (padding, maybe) | object 0 | ... | object 7 |
+------------------+----------+-----+----------+
^ 4-granule freelist now points here
The size classes were chosen so that any wasted space (padding) is less than the size class.
Freeing a small object pushes it back on its size class's free list. Given a pointer, we know its size class by looking in the chunk kind in the page header.
and that's it
Hey have fun with the thing! Let me know if you find it useful. Happy hacking and until next time!
Flexbox had a lot of early problems, but by mid-May 2020, when our story begins, both Firefox and Chromium had done a lot of work on improving things with this feature. WebKit, however, hadn’t caught up. Prioritizing the incredible amount of work a web engine requires is difficult. The WebKit implementation was still passable for very many (most) cases of the core features, and it didn’t have problems that caused crashes or otherwise urgently demanded attention, so engineers dedicated their limited time to other things. The net result, however, was that as this choice repeated many times, the comparative state of WebKit’s flexbox implementation had fallen behind pretty significantly.
Web Platform Tests (WPT) is a huge ongoing effort from many people to come up with a very extensive list of tests that could help both spec editors and implementors make sure we have great compatibility. In the case of flexbox, for example, there are currently 773 tests (2926 subtests), and WebKit was failing a good number of them. This matters a lot because there are things that flexbox is ideal for, and it is exceptionally widely used. In mid-May, Igalia was contracted to improve things here, and in this post, I’ll explain and illustrate how we did that.
The Challenge
The main issues were (in no particular order):
min-width:auto and min-height:auto handling
Nested flexboxes in column flows
Flexboxes inside tables and vice versa
Percentages in heights with indefinite sizes
WebKit CI not running many WPT flexbox tests
and of course… lack of gap support in Flexbox
Modifying Flexbox layout code is a challenge by itself. Tiny modifications in the source code can cause huge differences in the final layout. You might even have a patch that passes all the tests and still regresses multiple popular web sites.
The good news is that we were able to tackle most of those issues. Let’s review what changes you could eventually expect from future releases of Safari (note that Apple doesn’t disclose information about future products and/or releases) and the other WebKit based browsers (like GNOME Web).
Flexbox gaps
Probably one of the most awaited features in WebKit by web developers. It’s finally here after Firefox and Chrome landed it not so long ago. The implementation was initially inspired by the one in Chrome but then it diverged a bit in the final version of the patch. The important thing is that the behaviour should be the same, at least all the tests in WPT related to gaps are passing now in WebKit trunk.
<div style="display: flex; flex-wrap: wrap; gap: 1ch">
<div style="background: magenta; color: white">Lorem</div>
<div style="background: green; color: white">ipsum</div>
<div style="background: orange; color: white">dolor</div>
<div style="background: blue; color: white">sit</div>
<div style="background: brown; color: white">amet</div>
</div>
Before
After
Tables as flex items
Tables should obey the flex container sizing whenever they are flex items. As can be seen in the examples below, the tables’ layout code was kicking in and ignoring the constraints set by the flex container. Tables should do what the flex algorithm mandates and thus should allow being stretched/squeezed as required.
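A minimal example of the kind of case that was broken (hypothetical markup in the spirit of the WPT tests; the table should squeeze to the 100px set by the container):
<div style="display: flex; width: 100px;">
  <table style="background: orange;">
    <tr><td>Lorem ipsum dolor sit amet</td></tr>
  </table>
</div>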
Tables with items exceeding the 100% of available size
This is the case of tables placed inside flex items. The automatic table layout algorithm was generating tables with unlimited widths when the sum of the sizes of their columns (expressed in percentages) exceeded 100%. It was impossible to fulfill the constraints set by the table and flexbox algorithms at the same time.
Note how, before the fix, the table was growing indefinitely to the right (I cropped the “Before” picture to fit in the post).
Alignment in single-line flexboxes
Interesting case. The code was considering single-line flexboxes to be those where all the flex items were placed in a single line after computing the required space for them. Though sensible, that’s not what a single-line flexbox is: it’s a flex container with flex-wrap: nowrap. This means that a flex container with flex-wrap: wrap whose children do not need more than one flex line to be placed is not a single-line flex container from the spec’s POV (corollary: implementing specs is hard).
<div style="display: flex; flex-wrap: wrap; align-content: flex-end; width: 425px; height: 70px; border: 2px solid black">
<div style="height: 20px">This text should be at the bottom of its container</div>
</div>
Before
After
Percentages in flex items with indefinite sizes
One of the trickiest ones. Although it didn’t involve a lot of code, it caused two serious regressions, in YouTube’s upload form and when viewing Twitter videos in fullscreen, which required some previous fixes and delayed the landing of this patch a bit. Note that this behaviour was really contentious from the pure specification POV, as there were many changes over time. Defining a good behaviour is really complicated. Without going into too much detail, flexbox has a couple of cases where sizes are considered definite when they are theoretically indefinite. In this case we consider that if the flex container main size is definite, then the post-flexing size of flex items is also treated as definite.
Hit testing with overlapping flex items
There were some issues with pointer events passing through overlapping flex items (due to negative margins, for example). This was fixed by letting the hit testing code proceed in reverse order-modified document order (the opposite of painting order) instead of using the raw order from the DOM.
<div style="display:flex; border: 1px solid black; width: 300px;">
<a style="width: 200px;" href="#">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua</a>
<div style="margin-left: -200px; width: 130px; height: 50px; background: orange;"></div>
</div>
Before
After
In the “Before” case, hit testing was bypassing the orange block and thus the cursor was showing a hand, because it detected that it was hovering over a link. After the fix, the cursor is properly rendered as an arrow, because the orange block covers the link underneath.
Computing percentages with scrollbars
In this case the issue was that, in order to compute percentages in heights, we were incorrectly including the size of the scrollbars too.
Note that in the “After” picture the horizontal scrollbar background is visible while in the “Before” the wrong height computation made the flex item overlap the scrollbar.
Image items with specific sizes
The flex layout algorithm needs the intrinsic sizes of the flex items to compute their sizes and the size of the flex container. Changes to those intrinsic sizes should trigger new layouts, and the code was not doing that.
<!-- Just to showcase how the img below is not properly sized -->
<div style="position: absolute; background-color: red; width: 50px; height: 50px; z-index: -1;"></div>
<div style="display: flex; flex-direction: column; width: 100px; height: 5px;">
<img style="width: 100px; height: 100px;" src="https://wpt.live/css/css-flexbox/support/100x100-green.png">
</div>
Before
After
Nested flexboxes with ‘min-height: auto’
Another tricky one, and another related to the handling of nested column flexboxes. As in the previous issue with nested column flexboxes, the problem was that we were not supporting this case. For those wanting a deeper understanding of the issue, this bug was about implementing section 4.5 of the specs. This was one of the more complicated ones to fix: Edward Lorenz would love that part of the layout code, where the slightest change in one of those source code lines could trigger huge changes in the final rendering.
As can be seen, in the “Before” picture the blue and orange blocks are sized differently to the yellow one. That’s fixed in the “After” picture.
Percentages in quirks mode
Another one affecting how percentages are computed in heights, but this one specific to quirks mode. We now match Firefox, Chrome and pre-Chromium Edge, i.e., flexbox should not care much about quirks mode, since it was invented many years after quirky browsers dominated the earth.
Percentages in flex-basis
Percentages were working generally fine inside flex-basis; however, there was one particularly problematic case. It arose whenever that percentage referred to, oh surprise, an indefinite height. And again, we’re talking about nested flexboxes with column flows. Indeed, definite/indefinite sizes is one of the toughest things to get right from the layout POV. In this particular case, the fix was to ignore the percentages and treat them as height: auto.
Fixing a couple of test cases submitted by an anonymous Opera employee 8(!) years ago. This is another case of competing layout contexts trying to do things their own way.
After the fix the table is properly sized to 0px width and thus no red is seen.
Conclusions
These examples are just some interesting ones I’ve chosen to highlight. In the end, almost 50 new flexbox tests are passing in WebKit that weren’t back in May! I wouldn’t like to forget the great job done by my colleague Carlos Lopez, who imported tons of WPT flexbox tests into the WebKit source tree. He also performed awesome triage work, which made my life a lot easier.
Investing in interoperability is a huge deal for the web. It’s good for everyone, from spec authors to final users, including browser vendors, downstream ports and web authors. So if you care about the web, or your business orbits around web technologies, you should definitely promote and invest in interoperability.
Implementing standards or fixing bugs in web engines is the kind of work we happily do at Igalia on a daily basis. We are the second largest contributor to both WebKit and Chrome/Blink, so if you have an annoying bug in a particular web engine (Gecko and Servo as well) that you want fixed, don’t hesitate to contact us; we’d be glad to help. Also, should you want to be part of a workers-owned cooperative with an assembly-based decision-making mechanism and a strong focus on free software technologies, join us!
Acknowledgements
Many thanks to WebKit reviewers from Apple and Igalia like Darin Adler, Manuel Rego, Javier Fernández and Daniel Bates, who made the process really easy for me, always providing very nice feedback for the patches I submitted.
I’m also really thankful to Googlers like Christian Biesinger, David Grogan and Stephen McGruer who worked on the very same things in Blink and/or provided very nice guidance and support when porting patches.
While the WPE WebKit port allowed running windowless NPAPI plug-ins, this
was never advertised nor supported by us.
What is NPAPI?
In 1995, Netscape Navigator 2.0 introduced a mechanism to extend the
functionality of the web browser. That was NPAPI, short for Netscape Plugin
Application Programming Interface. NPAPI allowed third parties to add
support for new content types; for example Future Splash (.spl files),
which later became Flash (.swf).
When an NPAPI plug-in is used to render content, the web browser carves a hole in the rectangular location where content handled by the plug-in will be placed, and hands off the rendering responsibility to the plug-in. This design would end up being a recipe for trouble, as we will see later.
What is NPAPI used for?
A number of technologies have used NPAPI over the years for different purposes:
Displaying documents in non-Web formats (PDF, DjVu) inside
browser windows.
A number of questionable practices, like VPN client software using a browser plug‑in for
configuration.
Why are NPAPI plug-ins being phased out?
The design of NPAPI makes the web browser give full responsibility to
plug-ins: the browser has no control whatsoever over what plug-ins do to
display content, which makes it hard to make them participate in styling and
layout. More importantly, plug-ins are compiled, native code over which
browser developers cannot exercise quality control, which resulted in a
history of security incidents, crashes, and browser hangs.
Today, Web browsers’ rendering engines can do a better job than plug-ins, more
securely and efficiently. The Web platform is mature and there is no place to
blindly trust third party code to behave well. NPAPI is a 25-year-old technology showing its age; it has served its purpose, but it is no longer needed.
What are browsers doing about it?
Glad that you asked! It turns out that all major browsers have plans for
incrementally reducing how much of NPAPI usage they allow, until they
eventually remove it.
Firefox
Version | Date | Behavior
? | ? | All plug-ins except Flash need the user to click on the element to activate them.
52 | March 2017 | Only loads the Flash plug-in by default.
55 | August 2017 | Does not load the Flash plug-in by default; instead asks users to choose whether sites may use it.
56 | September 2017 | On top of asking the user, Flash content can only be loaded from http:// and https:// URIs; the Android version completely removes plug-in support. There is still an option to always allow running the Flash plug-in without asking.
69 | September 2019 | The option to allow running the Flash plug-in without asking the user is gone.
85 | January 2021 | Support for plug-ins is gone.
Table: Firefox NPAPI plug-in roadmap.
In conclusion, the Mozilla folks have been slowly boiling the frog
for the last four years and will completely remove support for NPAPI
plug-ins, coinciding with the Flash player reaching EOL status.
Chromium
The interface to unblock running plug-ins was made progressively more
complicated, to discourage usage.
Version | Date | Behavior
? | January 2015 | Plug-ins blocked by default, some popular ones allowed.
42 | April 2015 | Support for plug-ins disabled by default, setting available in chrome://flags.
45 | September 2015 | Support for NPAPI plug-ins is removed.
55 | December 2016 | Browser does not advertise Flash support to web content; the user is asked whether to run the plug-in for sites that really need it.
76 | July 2019 | Flash support is disabled by default; it can still be enabled with a setting.
88 | January 2021 | Flash support is removed.
Table: Chromium NPAPI/Flash plug-in roadmap.
Note that Chromium continued supporting Flash content even after it removed
support for NPAPI in 2015: in a bout of acute NIH syndrome, Google came up
with PPAPI, a replacement for NPAPI that was basically designed to support
Flash, and which is currently also used by Chromium’s built-in PDF viewer.
Nevertheless, PPAPI will go away as well, coinciding with Flash reaching
EOL.
Safari
In the Apple camp, the story is much easier to tell:
Their handheld devices—iPhone, iPad, iPod Touch—never supported NPAPI
plug-ins to begin with. Easy-peasy.
On desktop, Safari has required explicit approval
from the user to allow running plug-ins since June 2016. The Flash plug-in
has not been preinstalled in Mac OS since 2010, requiring users to manually
install it.
Are WebKitGTK and WPE following suit?
Yes. In September 2019 WebKitGTK 2.26 removed support
for NPAPI plug-ins which use GTK2. This included Flash, but the PPAPI version
could still be used via freshplayerplugin.
In March 2021, when the next stable release series is due, WebKitGTK
2.32 will remove the support for NPAPI plug-ins. This series will receive
updates until September 2021.
The above gives a full two years since we started restricting which plug-ins
can be loaded before they stop working, which we reckon should be enough. At
the moment of writing this article, the support for plug-ins was already
gone from the WebKit source tree for the GTK and WPE ports.
Yes, you read that right: WPE supported NPAPI plug-ins, but in a limited fashion:
only windowless plug-ins worked. In practice, making NPAPI plug-ins work on
Unix-like systems required using the XEmbed protocol to allow them
to place their rendered content overlaid on top of WebKit’s, but the WPE port
does not use X11. Given that we never advertised nor officially supported
NPAPI plug-ins in the WPE port, we do not expect any trouble removing them.
The 2020 X.Org Developers Conference took place from September 16th to September 18th. For the first time, due to the ongoing COVID-19 pandemic, it was a fully virtual event. While this meant that some interesting bits of the conference, like the hallway track, catching up in person with people and doing some networking, were not entirely possible this time, I have to thank the organizers for their work in making the conference an almost flawless event. The conference was livestreamed directly to YouTube, which was the main way for attendees to watch the many different talks. freenode was used for the hallway track, with most discussions happening in the ##xdc2020 IRC channel. In addition, ##xdc2020-QA was used for attendees wanting to ask questions or add comments at the end of each talk.
My talk about VK_EXT_extended_dynamic_state was based on my previous blog post, but it includes a more detailed explanation of the extension, more detailed comments, and an explanation of how the extension came to be. I took advantage of the possibility of using pre-recorded videos for the conference, as I didn’t fully trust my kids not to interrupt me in the middle of the talk. In the end I think it was a good idea and, from the presenter’s point of view, I also found that using a script and following it strictly (to some degree) prevented distractions and made the talk a bit shorter and more to the point, because I tend to beat around the bush when talking live. You can watch my talk in the embedded video below.
[Embedded video: recording of the talk.]
Slides for the talk are also available, and below you can find a transcript.
<Title slide>
Hello, my name is Ricardo García, I work at Igalia as part of its Graphics team and today I will be talking about the extended dynamic state Vulkan extension.
At Igalia I was involved in creating CTS tests for this extension and also in reviewing the spec when writing those tests, in a very minor capacity.
This extension is pretty simple and very useful, and the talk is divided into two parts. First I will talk about the extension itself and then I’ll reflect on a few bits about how this extension was created that I consider quite interesting.
<Part 1>
<Extension description slide>
So, first, what does this extension do? Its documentation says:
VK_EXT_extended_dynamic_state adds some more dynamic state to support applications that need to reduce the number of pipeline state objects they compile and bind.
In other words, as you will see, it makes Vulkan pipeline objects more flexible and easier to use from the application point of view.
<Pipeline diagram slide>
So, to give you some context, this is [the] typical graphics pipeline representation in many APIs like OpenGL, DirectX or Vulkan. You’ve probably seen variations of this a million times.
The pipeline is divided in stages, some of them fixed-function, some of them programmable with shaders.
Each stage usually takes some data from the previous stage and produces data to be consumed by the next one, apart from using other external resources like buffers or textures or whatever.
What’s the Vulkan approach to represent this process?
<Creation structure slide>
Vulkan wants you to specify almost every single aspect of the previous pipeline in advance by creating a graphics pipeline object that contains information about how every stage should work. And, once created, most of these pipeline parameters or configuration cannot be changed.
As you can see here, this includes shader programs, how vertices are read and processed, depth and stencil tests, you name it.
Pipeline objects are heavy objects in Vulkan and they are hard to create.
Why does Vulkan want you to do that? The answer has always been this keyword: “optimization”.
Giving all the information in advance gives more chances for every current or even future implementations to optimize how the pipeline works. It’s the safe choice.
And, despite this, you can see there’s a pipeline creation parameter with information about dynamic state.
These are things that can be changed when using the pipeline without having to create a separate and almost identical pipeline object.
<New dynamic states slide>
What the extension does should be pretty obvious now: it adds a bunch of additional elements that can be changed on the fly without creating additional pipelines.
This includes things like primitive topology, front face vertex order, vertex stride, cull mode and more aspects of the depth and stencil tests, etc. A lot of things.
Using them if needed means fewer pipeline objects, fewer pipeline cache accesses and simpler programs in general.
As I said before, it makes Vulkan pipeline objects more flexible and easier to use from the application point of view, because more pipeline aspects can be changed on the fly when using these pipeline objects instead of having to create separate objects for each combination of parameters you may want to modify at runtime. This may make the application logic simpler and it can also help when Vulkan is used as the backend, for example, to implement higher level APIs that are not so rigid regarding pipelines. I know this extension is useful for some emulators and other API-translating projects.
<New commands slide>
Together with those it also introduces a new set of functions to change those parameters on the fly when recording commands that will use the pipeline state object.
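As a rough sketch of what this looks like in code (my illustration, not from the talk; it assumes the extension and its feature have been enabled on the device), an application lists the new dynamic states at pipeline creation time and can then change them freely while recording commands:

// Sketch: using VK_EXT_extended_dynamic_state. Boilerplate, setup and
// error handling are elided.
#include <vulkan/vulkan.h>

void buildPipelineAndRecord(VkDevice device, VkCommandBuffer cmd,
                            VkGraphicsPipelineCreateInfo pipelineInfo,
                            VkPipelineCache cache, VkPipeline* pipeline) {
  // Declare which pieces of state will be supplied at record time
  // instead of being baked into the pipeline object.
  const VkDynamicState dynamicStates[] = {
      VK_DYNAMIC_STATE_CULL_MODE_EXT,
      VK_DYNAMIC_STATE_FRONT_FACE_EXT,
      VK_DYNAMIC_STATE_PRIMITIVE_TOPOLOGY_EXT,
  };
  VkPipelineDynamicStateCreateInfo dynamicInfo{};
  dynamicInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
  dynamicInfo.dynamicStateCount = 3u;
  dynamicInfo.pDynamicStates = dynamicStates;
  pipelineInfo.pDynamicState = &dynamicInfo;

  vkCreateGraphicsPipelines(device, cache, 1u, &pipelineInfo, nullptr,
                            pipeline);

  // Later, while recording: one pipeline, two front faces, e.g. for the
  // mirrored geometry of a reflection pass.
  vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, *pipeline);
  vkCmdSetCullModeEXT(cmd, VK_CULL_MODE_BACK_BIT);
  vkCmdSetFrontFaceEXT(cmd, VK_FRONT_FACE_COUNTER_CLOCKWISE);
  // ... draw normal geometry ...
  vkCmdSetFrontFaceEXT(cmd, VK_FRONT_FACE_CLOCKWISE);
  // ... draw reflected geometry without switching pipelines ...
}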
<Pipeline diagram slide>
So, knowing that and going back to the graphics pipeline, the obvious question is: does this impact performance? Aren’t we reducing the number of optimization opportunities the implementation has if we use these additional dynamic states?
In theory, yes. In practice, it depends on the implementation.
Many GPUs and Vulkan drivers out there today have some pipeline aspects that are considered “dynamic” in the sense that they are easily changed on the fly without a perceptible impact in performance, while others are truly important for optimization.
For example, take shaders. In Vulkan they’re provided as SPIR-V programs that need to be translated to GPU machine code and creating pipelines when the application starts makes it easy to compile shaders beforehand to avoid stuttering and frame timing issues later, for example.
And not only that. As you create pipelines, you’re telling the implementation which shaders are used together.
Say you have a vertex shader that outputs 4 parameters, and it’s used in a pipeline with a fragment shader that only uses the first 2.
When creating the pipeline the implementation can decide to discard instructions that are only related to producing the 2 extra unused parameters in the vertex shader.
But other things like, for example, changing the front face? That may be trivial without affecting performance.
<Part 2>
<Eric Lengyel tweet slide>
Moving on to the second part, I wanted to talk about how this extension was created.
It all started with an “angry” tweet by Eric Lengyel (sorry if I’m not pronouncing it correctly) who also happens to be the author of the previous diagram.
He complained on Twitter that you couldn’t change the front face dynamically, which happens to be super useful for rendering reflections, and pointed to an OpenGL NVIDIA extension that allowed you to do exactly that.
<Piers Daniell reply slide>
This was noticed by Piers Daniell from NVIDIA, who created a proposal in Khronos. That proposal was discussed with other vendors (software and hardware) that chimed in on aspects that could be or should be made dynamic if possible, which resulted in the multi-vendor extension we have today.
<RADV implementation slide>
In fact, RADV was one of the first Vulkan implementations to support the extension thanks to the effort by Samuel Pitoiset.
<Promoters of Khronos slide>
This whole process got me thinking Khronos may sometimes be seen from the outside as this closed silo composed mainly of hardware vendors.
Certainly, there are a lot of hardware vendors but if you take the list of promoter members you can see some fairly well-known software vendors as well, and API usability and adoption are important for both groups. There are many people in Khronos trying to make Vulkan easier to use even if we’re all aware that’s somewhat in conflict with providing a lower level API that should let you write performant applications.
<Khronos Contributors slide>
If you take a look at the long list of contributor members, that’s only shown partially here because it’s very long, you’ll notice a lot of actors from different backgrounds as well.
<Vulkan-Docs repo slide>
Moreover, while Khronos and its different Vulkan working groups are far from an open source project or community, I believe they’re certainly more open to contributions than what many people think.
For example, the Vulkan spec is published in a GitHub repo with instructions to build it (the spec is written in AsciiDoc) and this repo is open for issues and pull requests.
So, obviously, if you want to change major parts of Vulkan and how some aspects of the API work, you’re going to meet opposition and maybe you should be joining Khronos to discuss things internally with everyone involved in there.
However, while an angry tweet was enough for this particular extension, if you’re not well-known you may want to create an issue instead, explaining your use case, maybe with other colleagues chiming in on details or supporting your proposal.
I know for a fact issues created in this public repo are discussed in periodic Khronos meetings. It may take some weeks if people are busy and there’s a lot of things on the table, but they’re going to end up being discussed, which is a very good thing I was happy to see, and I want to put emphasis on that. I would like Khronos to continue doing that and I would like more people to take advantage of the public repos from Khronos.
I know the people involved in the Vulkan spec want to make the text as clear as possible. Maybe you think some paragraph is confusing, or there’s a missing link to another section that provides more context, or something absurd is allowed by the spec and should be forbidden. You can try a reasoned pull request for any of those. Obviously, no guarantees it will go in, but interesting in any case.
<Blend state tweet slide>
For example, in the Twitter thread I showed before, I tweeted a reply when the extension was published and, among a few retweets, likes and quoted replies I found this very interesting Tweet I’m showing you here, asking for the whole blend state to be made dynamic and indicating that would be game-changing for some developers and very interesting for web browsers. We all want our web browsers to leverage the power of the GPU as much as possible, right? So why not? I thought creating an issue in the public repo for this case could be interesting.
<Dynamic blend state issue slide>
And, in fact, it turns out someone had already created an issue about it, as you can see here.
<Tom Olson reply slide>
And in this issue, Tom Olson from ARM replied that the working group had been discussing it, and it turns out that in this particular case existing hardware doesn’t make it easy to make the blend state fully dynamic without possibly recompiling shaders under the hood and introducing unwanted complexity in the implementations, so it was rejected for now. But even though the reply was negative in this case, you can see what I was mentioning: the issue reached the working group, it was considered and discussed, and the issue creator got a reply and feedback. And that’s what I wanted to show you.
<Final slide>
And that’s all. Thanks for listening! Any questions maybe?
The talk was followed by a Q&A section moderated, in this case, by Martin Peres. In the text below RG stands for Ricardo Garcia and MP stands for Martin Peres.
RG: OK… Hello everyone!
MP: OK, so far we do not have any questions. Jason Ekstrand has a comment: "We (the Vulkan Working Group) has had many contributions to the spec".
RG: Yeah, yeah, exactly. I mean, I don’t think it’s very well known but yeah, indeed, there are a lot of people who have already contributed issues, pull requests and there have been many external contributions already so these things should definitely continue and even happen more often.
MP: OK, I’m gonna ask a question. So… how much do you think this is gonna help layering libraries like Zink because I assume, I mean, one of the big issues with Zink is that you need to have a lot of pipelines precompiled and… is this helping Zink?
RG: I don’t know if it’s being used. I think I did a search yesterday to see if Zink was using the extension and I don’t remember if I found anything specific so maybe the Zink people can answer the question but, yeah, it should definitely help in those cases because OpenGL is not as strict as Vulkan regarding pipelines obviously. You can change more things on the fly and if the underlying Vulkan implementation supports extended dynamic state it should make it easier to emulate OpenGL on top of Vulkan. For example, I know it’s being used by VKD3D right now to emulate DirectX 12 and there’s an emulator, a few emulators out there which are using the extension because, you know, APIs for consoles are different and they can use this type of extension to make code better.
MP: Agree. Jason also has another comment saying there are even extensions in flight from the Mesa community for some windowing-system related stuff.
RG: Yeah, I was happy to see yesterday… I think it was yesterday, well, here at this XDC that the present timing extension pull request is being handled right now on GitHub which I think is a very good thing. It’s a trend I would like to [see] continue because, well, I guess sometimes, you know, the discussions inside the Working Group and inside Khronos may involve IP or whatever so it’s better to have those discussions sometimes in private, but it is a good thing that maybe, you know, there are a few extensions that could be handled publicly in GitHub instead of the internal tools in Khronos. So, yeah, that’s a good thing and a trend I would like to see continue: extensions discussed in public.
MP: Yeah, sounds very cool. OK, I think we do not have any question… other questions or comments so let’s say thank you very much and…
RG: Thank you very much and let me congratulate you for… to the organizers for organizing XDC and… everyone, enjoy the rest of the day, thank you.
MP: Thank you! See you in 13m 30s for the status of freedesktop.org’s GitLab cloud hosting.
I was looking for a job late last year when I saw a tweet about a place called Igalia.
The more I learned about them, the more interested I became, and before long I applied to join their Web Platform team.
I didn’t have enough experience for a permanent position, but they did offer me a place in their Coding Experience program, which as far as I can tell is basically an internship, and I thoroughly enjoyed it.
Here’s an overview of what I did and what I learned.
There’s a wide range of work I can do as a computer programmer, but the vast majority of it seems to be in closed-source web applications, as an employee with a limited voice in the decisions that affect my work.
At the time, all of my work since I graduated had been exactly that, or in builds and releases for said applications.
That was interesting enough for a while, but I wanted to make a bigger impact, work on something I actually cared about of my own volition, and ideally move towards getting paid to do systems programming.
Igalia appeals to me, with their focus on open-source projects, systems programming, and standards work.
Even better, as a field, the web platform has been my one true love, and building things on it is how I got into programming over 15 years ago.
But what cements their place as my “dream job” is how they work: as a distributed worker’s cooperative.
What I mean by “distributed” is that members can work from anywhere in the world, paid in a way that fairly adjusts for location, and in whatever setting they thrive in (such as home).
This alone was huge, as someone who can’t sustainably work in an office five days a week, had to move 4000 km away from home to do so, and had just left an employer that was actively hostile to remote work.
Andy Wingo (author of that tweet) offers some insight into the “worker’s cooperative” part in these three posts.
Igalia’s rough goal here, as far as I can tell, is that everyone gets a voice in deciding what the collective works on and how (to the extent that those decisions affect them), equal ownership of the business, and equivalent pay modulo effort and cost of living.
This appeals to me as an anarchist, but also as a worker that has often been on the receiving end of unethical work, poor working conditions, and lack of autonomy.
ſtylesheet
One goal of my internship was to help the Web Platform team with their MathML work, but I was also there to familiarise myself with working on the web platform, and my first task was purely for the latter.
Many parts of the web platform have case-insensitive keywords that control an API or language feature, like link@rel (the <link rel="..."> attribute), but thanks to Unicode, there’s more than one level of case-insensitivity.
Unicode case-insensitivity won’t break backwards compatibility of web content over time, but to improve interoperability and simplify implementations, things like the HTML spec tend to explicitly call for ASCII case-insensitivity, at least for keywords that are nominally ASCII.
That makes Blink’s widespread use of Unicode case-insensitivity in these situations a bug, and my job was to fix that bug, which sounds simple enough, until you realise that doing so is technically a breaking change.
You see, there are already a couple of non-ASCII characters that can introduce esoteric ways to write many of those keywords.
More importantly, the web platform is almost¹ unique in that breaking existing content is, in general, not allowed.
But this time a breaking change was unavoidable, like any time where an implementation is fixed to align with the standard, or some behaviour is standardised after incompatible implementations appear.
There might be content out there that relies on something like <link rel="ſtylesheet"> because it worked on Chromium.
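To make the distinction concrete, here is a small self-contained sketch (my illustration, not engine code; the helper names are hypothetical) showing why “ſtylesheet” only matches “stylesheet” under Unicode-style folding:

// Sketch: why ASCII and Unicode case-insensitivity differ for keywords.
// U+017F LATIN SMALL LETTER LONG S ("ſ") case-folds to "s" under Unicode
// rules, but it is not an ASCII letter, so ASCII folding leaves it alone.
#include <cctype>
#include <iostream>
#include <string>

// ASCII case-insensitive equality: only [A-Za-z] are folded.
bool equalIgnoringASCIICase(const std::string& a, const std::string& b) {
  if (a.size() != b.size())
    return false;
  for (size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i])))
      return false;
  }
  return true;
}

// Toy Unicode-style fold for this one character: rewrite the UTF-8
// sequence for U+017F (0xC5 0xBF) to plain "s". Real engines use the
// full Unicode case-folding tables instead.
std::string foldLongS(const std::string& s) {
  std::string out;
  for (size_t i = 0; i < s.size(); ++i) {
    if (i + 1 < s.size() && static_cast<unsigned char>(s[i]) == 0xC5 &&
        static_cast<unsigned char>(s[i + 1]) == 0xBF) {
      out += 's';
      ++i;  // skip the second byte of the two-byte sequence
    } else {
      out += s[i];
    }
  }
  return out;
}

int main() {
  const std::string keyword = "stylesheet";
  const std::string input = "\xC5\xBFtylesheet";  // "ſtylesheet"
  std::cout << equalIgnoringASCIICase(input, keyword) << "\n";             // 0
  std::cout << equalIgnoringASCIICase(foldLongS(input), keyword) << "\n";  // 1
}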
There are a few ways to minimise the impact of these breaking changes, like adding analytics to browsers to count how many pages would be affected, or searching archives of web content, but in this case we decided the risk was low enough that I could simply fix the bug and write some tests.
It’s hard to get a usable LSP setup going for a project as big as a browser.
I switched between ccls and clangd a bunch of times, but I never quite got either working too well.
My main machine is also getting pretty long in the tooth, which made indexing take forever and updating my branches expensive.
I considered writing an LSP client that would allow me to kick off an index on one of Igalia’s 128-thread build boxes without an editor, but I eventually settled on using Chromium Code Search to jump around and investigate things.
Firefox similarly has Searchfox², but WebKit doesn’t yet have a public counterpart³.
I was looking for callers of three deprecated functions, but not all of them were relevant to the bug, and not all of those needed tests, and so on.
To help me analyse and categorise all of the potential call sites, I wrote some pretty intricate regular expressions for Sublime Text 2.
This one finds all callers of DeprecatedEqualIgnoringCase, with two arguments, where one of them is an ASCII literal that wouldn’t need new tests (skSK).
Each of the major engines has its own web content tests, and automated tests are strongly preferred over manual tests if at all possible.
All of the tests I wrote were automated, and most were Web Platform Tests, which are especially cool because they’re a shared suite of web content tests that can be run on any browser.
Chromium and Firefox even automatically upstream changes to their vendored WPT trees!
Many of my tests were for values of HTML attributes whose invalid value default was a different state to the keyword’s state.
In these cases, I didn’t even need to assert anything about the attribute’s actual behaviour!
All I had to do was write a tag, read the attribute in JavaScript, and check if the value we get back corresponds to the intended feature (bad) or the invalid value default (good).
Some legacy HTML attributes are now specified in terms of CSS “presentational hints”, so I checked the results of getComputedStyle for those, but the coolest tests I learned to write were reftests.
Very few web platform features guarantee that every user agent on every platform will render them identically down to the pixel, and over time, unrelated platform changes can affect a test’s expected rendering.
Both of these things are ok, but they make it impractical for tests to compare web content against screenshots.
Reftests consist of a test page that uses the feature being tested, and a reference page that should look the same without using the feature.
The reference page is like a screenshot, but it’s subject to all of the same variables as the test page, such as font rendering.
Ever heard of the Acid Tests?
Acid2 is more or less a reftest, because it has a reference page that only uses a screenshot for the platform-independent parts.
Acid1 uses a screenshot of the whole test, hence “except font rasterization and form widgets”.
I had a lot of fun writing my two form-related tests, because I actually had to submit forms to observe those features’ behaviour.
WPT has server-side testing infrastructure that can help with this, and for such tests, I would need to spin up the provided web server or run the finished product with wpt.live⁴.
In both cases, I avoided the need for that with a <form method="GET"> that targets an iframe, plus a helper page that sends its query string back to the test page.
MathML 3 was made a Recommendation in 2014, and like any spec, it has shortcomings that only subsequent experience could identify.
Proposals by the MathML Refresh CG like MathML Core are trying to address them in a bunch of ways, like simplifying the spec, setting clearer expectations around rendering, and redefining features in terms of better-supported CSS constructs.
My remaining tasks touched on some of these.
mo@maxsize
Moving on to WebKit, my next task was to remove some dead code.
Past versions of MathML specified a very complex <mstyle> element with its own inheritance system that’s incompatible with CSS, as well as several attributes that were rarely if ever used by authors, both of which are a burden on implementors.
One of those attributes was mstyle@maxsize, which would serve as the default mo@maxsize instead of infinity.
With the former removed from the spec, there was no longer a need for an explicit infinity value, so I removed the code for that.
It turns out WebKit never got around to implementing mstyle@maxsize anyway, so there was no functional change.
There’s a lot of MathML content that gets rendered like any other text, but stretchy and large operators are a bit more involved than just drawing a single glyph at a single size.
A well-known example of a stretchy operator is square root notation, which consists of a radical (the squiggly part) and a vinculum (the overline part) that stretches to cover the expression being rooted.
√(xy) = √x √y
Traditionally this was achieved by knowing where the glyphs for the separate parts lived in each font, so we could stretch and draw them independently.
Unicode assignments for stretchy operator parts helped, but that wasn’t enough to yield ideal rendering, because many fonts use Private Use Area characters for some operators, and ordinary fonts don’t give applications the necessary tools to control mathematical layout precisely.
OpenType MATH tables eventually solved this problem, but that meant Firefox essentially had three code paths: one for OpenType MATH fonts, one with font-specific operator data, and one generic Unicode path for all other fonts.
That second one adds a lot of complexity, and there was only one font left with its own operator data: STIXGeneral.
The goal was ultimately to remove that code path, dropping support for the font.
That sounded easy enough until we realised that STIXGeneral remains preinstalled on macOS, as the only stock mathematics font, to this day.
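For a taste of what a MATH table offers an engine, here is a hedged sketch (my illustration, not from the post) using HarfBuzz’s OpenType MATH API to ask a font for the pre-designed vertical variants of the radical; the font path is a placeholder:

// Sketch: querying an OpenType MATH table for stretchy-operator variants
// with HarfBuzz. The font path is a placeholder; error handling elided.
#include <hb.h>
#include <hb-ot.h>
#include <cstdio>

int main() {
  hb_blob_t* blob = hb_blob_create_from_file("/path/to/math-font.otf");
  hb_face_t* face = hb_face_create(blob, 0);
  hb_font_t* font = hb_font_create(face);

  if (!hb_ot_math_has_data(face)) {
    std::printf("No MATH table: fall back to a generic Unicode path.\n");
    return 0;
  }

  // Map U+221A SQUARE ROOT to a glyph, then count the progressively
  // larger vertical variants the font designer provided for it.
  hb_codepoint_t radical = 0;
  hb_font_get_nominal_glyph(font, 0x221A, &radical);

  unsigned count = hb_ot_math_get_glyph_variants(
      font, radical, HB_DIRECTION_TTB, 0, nullptr, nullptr);
  std::printf("%u vertical variants of the radical\n", count);

  hb_font_destroy(font);
  hb_face_destroy(face);
  hb_blob_destroy(blob);
}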
My task here was to add a feature flag that disables the code path on nightly builds, and gather data around how many pages would be affected.
The patch was straightforward, with one change to allow Document::WarnOnceAbout to work with parameterised l10n messages, and I wrote a cute little data URL test page for the warning messages.
Turning the feature flag on broke a test though, and I couldn’t for the life of me reproduce it locally.
Fred and I tried every possible strategy we could imagine short of interactively debugging CI, on and off for six weeks, but it looked like the flaky behaviour involved some sort of race against @font-face loading.
Eventually we gave up and disabled the feature flag just for that test, and I landed my patch.
Another way to improve the relationship between MathML and CSS has been defining how existing CSS constructs from the HTML world, including the box model properties, apply to MathML content.
In this case, the consensus was that these properties would “inflate” the content box as necessary, making the element occupy more space.
Existing implementations in WebKit and Firefox didn’t really handle them at all because it wasn’t in the spec, so the last task I had time for was to change that.
A modern browser starts by parsing documents into an element tree, which is also exposed to authors as the DOM, but when it comes to rendering, that tree is converted to a layout tree, which represents the boxes to be drawn in a hierarchy of position/size influence.
The layout tree consists of layout nodes (Chromium), renderer nodes (WebKit), or frame nodes (Firefox), but these all refer to the same concept.
I started with Firefox and <mspace> because that was the only element that could not contain children.
<mspace> represents, well, a space.
It has attributes for width, height (height above the baseline), and depth (height below the baseline), each of which can be negative to bring surrounding elements closer together.
I found the element’s frame node and noticed its Reflow method.
Reflow is the process of traversing the layout tree and figuring out the positions and sizes of all of its nodes, and in Firefox that involves a depth-first tree of nsIFrame::Reflow calls, starting from the initial containing block.
An <mspace> frame never has children, so our reflow logic was more or less to take the three attributes, then return a ReflowOutput that tells the parent we need that much space.
To handle padding and border, we add that to our desired size.
“Physical” here means the nsMargin in terms of absolute directions like left and right, as opposed to the LogicalMargin in terms of flow-relative directions, which are aware of direction (LTR + RTL) and writing-mode (horizontal + vertical + sideways).
We want to use LogicalMargin in most situations, but MathML Core is currently strictly horizontal-tb and sums of left and right are inherently direction-safe, so nsMargin was the way to go here.
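As a rough picture of the kind of change involved, here is a hedged sketch with deliberately simplified stand-ins for Gecko’s types; the real Reflow signature and the actual patch are more involved:

// Sketch of <mspace> reflow growing the desired size by border+padding.
// Types and signatures are simplified Gecko-isms; the real patch differs.
#include <algorithm>

struct nsMargin { int top, right, bottom, left; };  // physical sides

struct ReflowOutput {
  int width = 0;
  int ascent = 0;   // height above the baseline
  int descent = 0;  // height below the baseline
};

// Attribute values parsed from width/height/depth; each may be negative.
struct MspaceAttrs { int width, height, depth; };

ReflowOutput ReflowMspace(const MspaceAttrs& attrs,
                          const nsMargin& borderPadding) {
  ReflowOutput out;
  // Negative sizes are clamped to zero in the desired size. Sums of
  // left+right are direction-safe even in RTL, which is why physical
  // nsMargin (not LogicalMargin) is fine here.
  out.width = std::max(0, attrs.width) + borderPadding.left +
              borderPadding.right;
  out.ascent = std::max(0, attrs.height) + borderPadding.top;
  out.descent = std::max(0, attrs.depth) + borderPadding.bottom;
  return out;
}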
That was enough to pass the <mspace> cases in the Web Platform Tests, but the test page I had put together to play around with my patch yielded both good news and bad news.
Let’s look at the reference, which uses <div> elements and flexbox rather than MathML.
The good news was that Firefox already drew borders, or at least border colours, even though the layout of them was all wrong.
The bad news was that while my patch made each element look Bigger Than Before, the baselines were misaligned.
More importantly, the <mspace> elements and even the whole <math> elements still overlapped each other… almost as if… their parents were unaware of how much space they needed when positioning them!
I fixed the first two problems by adding the padding and border to the nsBoundingMetrics as well, because that controls the sizes and positions of MathML content.
That left the overlapping of the <math> elements, because while they contain MathML content, they themselves are HTML content as far as their ancestors are concerned.
It turns out that in Firefox, MathML frames also need to report their width to their parent via nsMathMLContainerFrame::MeasureForWidth.
With the <mspace> counterpart updated, plus the WPT expectations files updated to mark the <mspace> test cases as passing, my patch was ready to land.
I also put together a test page (reference) for the interaction between negative mspace@width and padding, which more or less rendered as expected, but it potentially revealed a bug in the layout of <math> elements that are flex items.
My guess is that flex items use a code path that clamps negative sizes to zero at some point, like we have to do in ReflowOutput, resulting in excess space for the item.
Reftest for padding with negative mspace@width: reference page, without patch, with patch.
Margins were trickier to implement because, with Firefox and MathML content at least, the positions of elements are the parent’s responsibility to calculate.
I spent a very long time reading nsMathMLContainerFrame, which is the base implementation for most MathML parents, and eventually figured out where and how to handle margins.
With a patch that updates RowChildFrameIterator and Place, and yet another test page (reference) that passed with my patch, we were close to having a template for the remaining MathML elements!
Reftest for margin: reference page, without patch, with patch.
You can see my approach over at D87594, but the patch needed reworking and I ran out of time before I could land it.
This internship was incredibly valuable.
While I was only able to finish the first trimester for mental health reasons, over the last nine months I’ve learned C++, learned how the web platform and browser engines work, gained ample experience reading specs, worked with countless people in the open-source community, and contributed to three major engines plus the Web Platform Tests.
Were I able to continue, I would also look forward to (more) experience contributing to specs, and probably helping Igalia with their MathML in Chromium project.
In any case, my time with the collective has only strengthened my desire to someday join full-time.
Thanks to Caitlin for her advice and support, Eva and Javier and Pablo for getting me settled in so quickly, Manuel and Fred and Rob from the Web Platform team, and Yoav and Emilio for their help on the Chromium and Firefox parts of my work.
¹ Windows is the other major platform that does this. Check out The Old New Thing by Raymond Chen to learn more.