Planet Igalia

September 12, 2023

Eric Meyer

Nuclear Anchored Sidenotes

Exactly one year ago today, which I swear is a coincidence I only noticed as I prepared to publish this, I posted an article on how I coded the footnotes for The Effects of Nuclear Weapons.  In that piece, I mentioned that the footnotes I ended up using weren’t what I had hoped to create when the project first started.  As I said in the original post:

Originally I had thought about putting footnotes off to one side in desktop views, such as in the right-hand grid gutter.  After playing with some rough prototypes, I realized this wasn’t going to go the way I wanted it to…

I came back to this in my post “CSS Wish List 2023”, when I talked about anchor(ed) positioning.  The ideal, which wasn’t really possible a year ago without a bunch of scripting, was to have the footnotes arranged structurally as endnotes, which we did, but in a way that I could place the notes as sidenotes, next to the footnote reference, when there was enough space to show them.

As it happens, that’s still not really possible without a lot of scripting today, unless you have:

  1. A recent (as of late 2023) version of Chrome
  2. With the “Experimental web features” flag enabled

With those things in place, you get experimental support for CSS anchor positioning, which lets you absolutely position an element in relation to any other element, anywhere in the DOM, essentially regardless of their markup relationship to each other, as long as they conform to a short set of constraints related to their containing blocks.  You could reveal an embedded stylesheet and then position it next to the bit of markup it styles!

Anchoring Sidenotes

More relevantly to The Effects of Nuclear Weapons, I can enhance the desktop browsing experience by turning the popup footnotes into Tufte-style static sidenotes.  So, for example, I can style the list items that contain the footnotes like this:

.endnotes li {
	position: absolute;
	top: anchor(top);
	bottom: auto;
	left: calc(anchor(--main right) + 0.5em);
	max-width: 23em;
}
A sidenote next to the main text column, with its number aligned with the referencing number found in the main text column.

Let me break that down.  The position is absolute, and bottom is set to auto to override a previous bit of styling that’s needed in cases where a footnote isn’t being anchored.  I also decided to constrain the maximum width of a sidenote to 23em, for no other reason than it looked right to me.

(A brief side note, pun absolutely intended: I’m using the physical-direction property top because the logical-direction equivalent in this context, inset-block-start, only gained full desktop cross-browser support a couple of years ago, and that’s only true if you ignore IE11’s existence, plus it arrived in several mobile browsers only this year, and I still fret about those kinds of things.  Since this is desktop-centric styling, I should probably set a calendar reminder to fix these at some point in the future.  Anyway, see MDN’s entry for more.)
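(And for the record, the logical version should eventually be a one-line swap; a minimal sketch, on the assumption that anchor() functions are accepted in any inset property:)

 inset-block-start: anchor(top);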

Now for the new and unfamiliar parts.

 top: anchor(top);

This sets the position of the top edge of the list item to be aligned with the top edge of its anchor’s box.  What is a footnote’s anchor?  It’s the corresponding superscripted footnote mark embedded in the text.  How does the CSS know that?  Well, the way I set things up  —  and this is not the only option for defining an anchor, but it’s the option that worked in this use case  —  the anchor is defined in the markup itself.  Here’s what a footnote mark and its associated footnote look like, markup-wise.

explosion,<sup><a href="#fnote01" id="fn01">1</a></sup> although
<li id="fnote01" anchor="fn01"><sup>1</sup> … </li>

The important bits for anchor positioning are the id="fn01" on the superscripted link, and the anchor="fn01" on the list item: the latter establishes the element with an id of fn01 as the anchor for the list item.  Any element can have an anchor attribute, thus creating what the CSS Anchor Positioning specification calls an implicit anchor.  It’s explicit in the HTML, yes, but that makes it implicit to CSS, I guess.  There’s even an implicit keyword, so I could have written this in my CSS instead:

 top: anchor(implicit top);

(There are ways to mark an element as an anchor and associate other elements with that anchor, without the need for any HTML.  You don’t even need to have IDs in the HTML.  I’ll get to that in a bit.)

Note that the superscripted link and the list item are just barely related, structurally speaking.  Their closest ancestor element is the page’s single <main> element, which is the link’s fourth-great-grandparent, and the list item’s third-great-grandparent.  That’s okay!  Much as a <label> can be associated with an input element across DOM structures via its for attribute, any element can be associated with an anchoring element via its anchor attribute.  In both cases, the value is an ID.
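To make the parallel concrete, here are the two association mechanisms side by side; the label/input pairing is everyday HTML, and the footnote markup is condensed from the example above:

<label for="email">Email address</label>
<input id="email" type="email">

<a href="#fnote01" id="fn01">1</a>
<li id="fnote01" anchor="fn01">…</li>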

So anyway, that means the top edge of the endnote will be absolutely positioned to line up with the top edge of its anchor.  Had I wanted the top of the endnote to line up with the bottom edge of the anchor, I would have said:

 top: anchor(bottom);

But I didn’t.  With the top edges aligned, I now needed to drop the endnote into the space outside the main content column, off to its right.  At first, I did it like this:

 left: anchor(--main right);

Wait.  Before you think you can just automatically use HTML element names as anchor references, well, you can’t.  That --main is what CSS calls a dashed-ident, as in a dashed identifier, and I declared it elsewhere in my CSS.  To wit:

main {
	anchor-name: --main;
}

That assigns the anchor name --main to the <main> element in the CSS, no HTML attributes required.  Using the name --main to identify the <main> element was me following the common practice of naming things for what they are.  I could have called it --mainElement or --elMain or --main-column or --content or --josephine or --📕😉 or whatever I wanted.  It made the most sense to me to call it --main, so that’s what I picked.

Having done that, I can use the edges of the <main> element as positioning referents for any absolutely (or fixed) positioned element.  Since I wanted the left side of sidenotes to be placed with respect to the right edge of the <main>, I set their left to be anchor(--main right).

Thus, taking these two declarations together, the top edge of a sidenote is positioned with respect to the top edge of its implicit anchor, and its left edge is positioned with respect to the right edge of the anchor named --main.

	top: anchor(top);
	left: anchor(--main right);

Yes, I’m anchoring the sidenotes with respect to two completely different anchors, one of which is a descendant of the other.  That’s okay!  You can do that!  Literally, you could position each edge of an anchored element to a separate anchor, regardless of how they relate to each other structurally.

Once I previewed the result of those declarations, I saw the sidenotes were too close to the main content, which makes sense: I had made the edges adjacent to each other.

Red borders showing the edges of the sidenote and the main column touching.

I thought about using a left margin on the sidenotes to push them over, and that would work fine, but I figured what the heck, CSS has calculation functions and anchor functions can go inside them, and any engine supporting anchor positioning will also support calc(), so why not?  Thus:

 left: calc(anchor(--main right) + 0.5em);

I wrapped those in a media query that only turned the footnotes into sidenotes at or above a certain viewport width, and wrapped that in a feature query so as to keep the styles away from non-anchor-position-understanding browsers, and I had the solution I’d envisioned at the beginning of the project!
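Structurally, the wrapping came out something like this; the breakpoint and the tested declaration here are illustrative stand-ins, not the production values:

@supports (top: anchor(top)) {
	@media (min-width: 64em) {
		.endnotes li {
			position: absolute;
			top: anchor(top);
			bottom: auto;
			left: calc(anchor(--main right) + 0.5em);
			max-width: 23em;
		}
	}
}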

Except I didn’t.

Fixing Proximate Overlap

What I’d done was fine as long as the footnotes were well separated.  Remember, these are absolutely positioned elements, so they’re out of the document flow.  Since we still don’t have CSS Exclusions, there needs to be a way to deal with situations where there are two footnotes close to each other.  Without it, you get this sort of thing.

Two sidenotes completely overlapping with each other.  This will not do.

I couldn’t figure out how to fix this problem, so I did what you do these days, which is I posted my problem to social media.  Pretty quickly, I got a reply from the brilliant Roman Komarov, pointing me at a Codepen that showed how to do what I needed, plus some very cool highlighting techniques.  I forked it so I could strip it down to the essentials, which is all I really needed for my use case, and also have some hope of understanding it.

Once I’d worked through it all and applied the results to TEoNW, I got exactly what I was after.

The same two sidenotes, except now there is no overlap.

But how?  It goes like this:

.endnotes li {
	position: absolute;
	anchor-name: --sidenote;
	top: max(anchor(top), calc(anchor(--sidenote bottom) + 0.67em));
	bottom: auto;
	left: calc(anchor(--main right) + 0.5em);
	max-width: 23em;
}

Whoa.  That’s a lot of functions working together there in the top value.  (CSS is becoming more and more functional, which I feel some kind of way about.)  It can all be verbalized as, “the position of the top edge of the list item is either the same as the top edge of its anchor, or two-thirds of an em below the bottom edge of the previous sidenote, whichever is further down”.

The browser knows how to do this because the list items have all been given an anchor-name of --sidenote (again, that could be anything, I just picked what made sense to me).  That means every one of the endnote list items will have that anchor name, and other things can be positioned against them.

Those styles mean that I have multiple elements bearing the same anchor name, though.  When any sidenote is positioned with respect to that anchor name, it has to pick just one of the anchors.  The specification says the named anchor that occurs most recently before the thing you’re positioning is what wins.  Given my setup, this means an anchored sidenote will use the previous sidenote as the anchor for its top edge.

At least, it will use the previous sidenote as its anchor if the bottom of the previous sidenote (plus two-thirds of an em) is lower than the top edge of its implicit anchor.  In a sense, every sidenote’s top edge has two anchors, and the max() function picks which one is actually used in every case.

CSS, man.

Remember that all this is experimental, and the specification (and thus how anchor positioning works) could change.  The best practices for accessibility are also not clear yet, from what I’ve been able to find.  As such, this may not be something you want to deploy in production, even as a progressive enhancement.  I’m holding off myself for the time being, which means none of the above is currently used in the published version of The Effects of Nuclear Weapons.  If people are interested, I can create a Codepen to illustrate.

I do know this is something the CSS Working Group is working on pretty hard right now, so I have hopes that things will finalize soon and support will spread.

My thanks to Roman Komarov for his review of and feedback on this article.  For more use cases of anchor positioning, see his lengthy (and quite lovely) article “Future CSS: Anchor Positioning”.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at September 12, 2023 03:16 PM

September 06, 2023

Eric Meyer

Memories of Molly

The Web is a little bit darker today, a fair bit poorer: Molly Holzschlag is dead.  She lived hard, but I hope she died easy.  I am more sparing than most with my use of the word “friend”, and she was absolutely one.  To everyone.

If you don’t know her name, I’m sorry.  Too many didn’t.  She was one of the first web gurus, a title she adamantly rejected  —  “We’re all just people, people!”  —  but it fit nevertheless.  She was a groundbreaker, expanding and explaining the Web at its infancy.  So many people, on hearing the mournful news, have described her as a force of nature, and that’s a title she would have accepted with pride.  She was raucous, rambunctious, open-hearted, never ever close-mouthed, blazing with fire, and laughed (as she did everything) with her entire chest, constantly.  She was giving and took and she hurt and she wanted to heal everyone, all the time.  She was messily imperfect, would tell you so loudly and repeatedly, and gonzo in all the senses of that word.  Hunter S. Thompson should have written her obituary.

I could tell so many stories.  The time we were waiting to check into a hotel, talking about who knows what, and realized Little Richard was a few spots ahead of us in line.  Once he’d finished checking in, Molly walked right over to introduce herself and spend a few minutes talking with him.  An evening a group of us had dinner on the top floor of a building in Chiba City and I got the unexpectedly fresh shrimp hibachi.  The time she and I were chatting online about a talk or training gig, somehow got onto the subject of Nick Drake, and coordinated a playing of “Three Hours” just to savor it together.  A night in San Francisco where the two of us went out for dinner before some conference or other, stopped at a bar just off Union Square so she could have a couple of drinks, and she got propositioned by the impressively drunk couple seated next to her after they’d failed to talk the two of us into hooking up.  The bartender couldn’t stop laughing.

Or the time a bunch of us were gathered in New Orleans (again, some conference or other) and went to dinner at a jazz club, where we ended up seated next to the live jazz trio and she sang along with some of the songs.  She had a voice like a blues singer in a cabaret, brassy and smoky and full of hard-won joys, and she used it to great effect standing in front of Bill Gates to harangue him about Internet Explorer.  She raised it to fight like hell for the Web and its users, for the foundational principles of universal access and accessible development.  She put her voice on paper in some three dozen books, and was working on yet another when she died.  In one book, she managed to sneak past the editors an example that used a stick-figure Kama Sutra custom font face.  She could never resist a prank, particularly a bawdy one, as long as it didn’t hurt anyone.

She made the trek to Cleveland at least once to attend and be part of the crew for one of our Bread and Soup parties.  We put her to work rolling tiny matzoh balls and she immediately made ribald jokes about it, laughing harder at our one-up jokes than she had at her own.  She stopped by the house a couple of other times over the years, when she was in town for consulting work, “Auntie Molly” to our eldest and one of my few colleagues to have spent any time with Rebecca.  Those pictures were lost, and I still keenly regret that.

There were so many things about what the Web became that she hated, that she’d spent so much time and energy fighting to avert, but she still loved it for what it could be and what it had been originally designed to be.  She took more than one fledgling web designer under her wing, boosted their skills and careers, and beamed with pride at their accomplishments.  She told a great story about one, I think it was Dunstan Orchard but I could be wrong, and his afternoon walk through a dry Arizona arroyo.

I could go on for pages, but I won’t; if this were a toast and she were here, she would have long ago heckled me (affectionately) into shutting up.  But if you have treasured memories of Molly, I’d love to hear them in the comments below, or on your own blog or social media or podcasts or anywhere.  She loved stories.  Tell hers.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at September 06, 2023 03:44 PM

September 04, 2023

Clayton Craft

Upgrading Steam Deck storage the lazy way

When I bought my Steam Deck over a year ago, I purchased the basic model with a very unassuming 64GB eMMC module for primary storage. I knew this wouldn't be enough on its own for holding the games I like to play, so I've gotten by with having my Steam library installed on an SD card. This has worked surprisingly well, despite the lower performance of the SD card when compared to something like an NVMe. Load times, etc. actually weren't all that bad! Alas, all good things must come to an end... I started having issues with the eMMC drive filling up and I was constantly interrupted with "low storage space" notifications in Steam. I guess Steam stores saved games, Mesa shader cache, and other things like that on the primary drive despite having gobs of space on SD storage. Oops!

Since I can't be bothered to enter my Steam login, WiFi credentials, etc. a second time on my Deck, I really wanted to preserve/transfer as much data from the eMMC to a new NVMe drive as possible. That's what this post is about. Spoiler alert: I was successful, and it was relatively easy/painless.

by Unknown at September 04, 2023 12:00 AM

August 27, 2023

Brian Kardell

Completely Random Car and Design Stuff

Completely Random Car and Design Stuff

This is a kind of "off brand" post for me. It's very, very random and just for fun. It's just a whole bunch of mental wandering literally all over the place about cars, design, AI and only a very little bit about the web. Many of these observations are probably extremely specific to America, but perhaps still interesting. There is no overarching "point" here, it's just thoughts.

There are certain trends that I notice in the design of everyday things. Sometimes I think I notice them developing early and have lots of conversations pointing them out to other people (especially my son) and wondering aloud about how all of this will look in retrospect: Is this one of those things that will come to represent this "era"?

If it's not clear what I mean, basically every single thing in the photo below screams 1970s:

A Frigidaire ad from the 1970s. A woman standing in a kitchen with avocado Frigidaire appliances and countertop, a green and yellow flowered wallpaper; she's wearing 70's attire, the cabinet design is somehow also unique to the era, and the quality of the film is also reminiscent of the 1970s.

It seems like there are some things that come to be associated with an era... Stuff that got really popular for a period of time and then left behind.

Several years ago I pointed out two changes I thought I saw happening in new car colors, and I started to point them out to my son. If you haven't noticed it, it's real. For a lot of years, while mainstream automobile colors have varied in real ways (you'd realize how much if you ever tried to match paint on your car), they've still fallen within a fairly common range and palette that I think would mainly be described with words like "shiny" or "glossy". I think this has been generally true since the 70's or 80's, when it seems I remember some more different reds and oranges and earthy colors (my dad's old truck maybe was interesting in both ways). In any case, now we have all of these new "ceramic colors" which look more "baked in" to the materials and matte. More often than not they're grays, but there are some beautiful turquoises and reds. Also a new common thing is "blacked out" emblems. I think both of these things began with trendsetting aftermarket people.

Another thing this made me think about and try to explain to my son was that that's kind of true of automobile body styles and some other characteristics too. I found it surprisingly hard to describe because it's not as if there is a single kind of car in any era - it's just that there is a smaller array of characteristics that you can recognize as belonging to that era.

Don't you think a lot of cars today have similar characteristics? I do. But... They also feel like they are increasingly resembling a car developed at the end of the 1970's, by a car company that doesn't even exist anymore. The car company is AMC (American Motors Corporation) and the car was the Eagle SX/4. Below is an advertisement

Advertisement for the Eagle SX/4 around 1979

Basically, it was sort of the first crossover vehicle that combined a car and 4 wheel (later all wheel) drive. But also, just visually: Do you see a resemblance to a lot of popular cars today? I do. I think this car looks much more similar to many cars I see on the road today than most of what I saw for the 40 years or so between them. Also interesting, it was available in one of those interesting earthy yellow/tan (and kind of matte!) colors with dark gray/black accents that kind of mostly disappeared, but I could imagine being reborn today.

Photo of the Eagle SX/4 in an earthy, somewhat matte tan/yellow with interesting dark trims.

So, I was thinking about this and the fact it was kind of ahead of its time. In fact, I searched for exactly that and found this article with a title that is pretty much that: "The AMC Eagle SX/4 – An American 4×4 Sports Car That Was Ahead Of Its Time".

That's cool, but I was also wondering how much of that came from the fact that it was not (yet) part of "The Big 3". There used to be a lot more American car companies, but we just kept consolidating into those 3. Lots of those became "brands" of the Big 3 for a while, but eventually homogenized. A huge number of them no longer exist. My first car was a Pontiac. The last Pontiac produced rolled off the line in January 2010. Saturn, same. My mom drove a Plymouth Voyager. Plymouth was discontinued in 2001. In fact, the car I drive now sometimes (not even American: an Isuzo Amigo) hasn't been sold in the US since 2000. And so on.

I was thinking that most of the diversity that drives anything about automobiles today pretty much doesn't come from those Big 3. And wondering if maybe that meant something for web engines too. Does it? I don't know. For sure we can see outside (but still standards compliant) innovation in browsers like Arc or Vivaldi or Polypane. I suppose the open source nature makes that more easily possible, but it's worth noting too that this is kept alive mainly through ~80% of that bill being footed by the Web Engines Big 3.

Anyway... This is already all over the place (sorry, that's how my mind works!), but in sitting down this morning to write this I had an idea that perhaps AI could help me "illustrate" if not explain the "look" of body styles associated with different decades. Here's what I found out: Midjourney is pretty bad at it!

The photo below was generated with the prompt "photo of a typical 2010's automobile in America with the most common body style of cars built between 2000 and 2009".

A picture that is, for all intents and purposes, almost literally an early 1950's Buick but with a different emblem.

I tried several variations and wound up with something like this as one of the 4 options no matter how explicit the prompt was. I wonder why? I'd kind of think that identifying and sort of recycling the "trends" like that would be very much what AI models like this would be really good at - but apparently not as good as I'd imagine!

I decided to check the other way round, just for giggles. The inverse worked better. When I gave it the first image in this article, it mentioned the 1970's in its explanation! Interesting!

Anyway, like I said at the get-go: There's no overall point here :)

August 27, 2023 04:00 AM

August 23, 2023

Emmanuele Bassi

The Mirror

The GObject type system has been serving the GNOME community for more than 20 years. We have based an entire application development platform on top of the features it provides, and the rules that it enforces; we have integrated multiple programming languages on top of that, and in doing so, we expanded the scope of the GNOME platform in a myriad of directions. Unlike GTK, the GObject API hasn’t seen any major change since its introduction: aside from deprecations and little new functionality, the API is exactly the same today as it was when GLib 2.0 was released in March 2002. If you transported a GNOME developer from 2003 to 2023, they would have no problem understanding a newly written GObject class; though, they would likely appreciate the levels of boilerplate reduction, and the performance improvements that have been introduced over the years.

While having a stable API last this long is definitely a positive, it also imposes a burden on maintainers and users, because any change has to be weighted against the possibility of introducing unintended regressions in code that uses undefined, or undocumented, behaviour. There’s a lot of leeway when it comes to playing games with C, and GObject has dark corners everywhere.

The other burden is that any major change to a foundational library like GObject cascades across the entire platform. Releasing GLib 3.0 today would necessitate breaking API in the entirety of the GNOME stack and further beyond; it would require either a hard to execute “flag day”, or an impossibly long transition, reverberating across downstreams for years to come. Both solutions imply amounts of work that are simply not compatible with a volunteer-based project and ecosystem, especially the current one where volunteers of core components are now stretched thin across too many projects.

And yet, we are now at a cross-roads: our foundational code base has reached the point where recruiting new resources capable of effecting change on the project has become increasingly difficult; where any attempt at performance improvement is heavily counterbalanced by the high possibility of introducing world-breaking regressions; and where fixing the safety and ergonomics of idiomatic code requires unspooling twenty years of limitations inherent to the current design.

Something must be done if we want to improve the coding practices, the performance, and the safety of the platform without a complete rewrite.

The Mirror

‘Many things I can command the Mirror to reveal,’ she answered, ‘and to some I can show what they desire to see. But the Mirror will also show things unbidden, and those are often stranger and more profitable than things which we wish to behold. What you will see, if you leave the Mirror free to work, I cannot tell. For it shows things that were, and things that are, and things that yet may be. But which it is that he sees, even the wisest cannot always tell. Do you wish to look?’ — Lady Galadriel, “The Lord of the Rings”, Volume 1: The Fellowship of the Ring, Book 2: The Ring Goes South

In order to properly understand what we want to achieve, we need to understand the problem space that the type system is meant to solve, and the constraints upon which the type system was implemented. We do that by holding GObject up to Galadriel’s Mirror, and gazing into its surface.

Things that were

History became legend. Legend became myth. — Lady Galadriel, “The Lord of the Rings: The Fellowship of the Ring”

Before GObject there was GtkObject. It was a simpler time, it was a simpler stack. You added types only for the widgets and objects that related to the UI toolkit, and everything else was C89, with a touch of undefined behaviour, like calling function pointers with any number of arguments. Properties were “arguments”, likes were florps, and the timeline went sideways.

We had class initialisation and instance initialisation functions; properties were stored in a global hash table, but the property multiplexer pair of functions was stored on the type data instead of using the class structure. Types did not have private data: you only had keyed fields. No interfaces, only single inheritance. GtkObject was reference counted, and had an initially “floating” reference, to allow transparent ownership transfer from child to parent container when writing C code, and make the life of every language binding maintainer miserable in the process. There were weak references attached to an instance that worked by invoking a callback when the instance’s reference count reached zero. Signals operated exactly as they do today: large hash table of signal information, indexed by an integer.

None of this was thread safe. After all, GTK was not thread safe either, because X11 was not thread safe; and we’re talking about 1997: who even had hardware capable of multi-threading at the time? NPTL wasn’t even a thing, yet.

The introduction of GObject in 2001 changed some of the rules—mainly, around the idea of having dynamic types that could be loaded and unloaded in order to implement plugins. The basic design of the type system, after all, came from Beast, a plugin-heavy audio application, and it was extended to subsume the (mostly) static use cases of GTK. In order to support unloading, the class aspect of the type system was allowed to be cleaned up, but the type data had to be registered and never unloaded; in other words, once a type was registered, it was there forever.

“Arguments” were renamed to properties, and were extended to include more than basic types, provide validations, and notify of changes; the overall design was still using a global hash table to store all the properties across all types. Properties were tied to the GObject type, but the property definition existed as a separate type hierarchy that was designed to validate values, but not manage fields inside a class. Signals were ported wholesale, with minimal changes mainly around the marshalling of values and abstracting closures.

The entire plan was to have GObject as one of the base classes at the root of a specific hierarchy, with all the required functionality for GTK to inherit from for its own GtkObject, while leaving open the possibility of creating other hierarchies, or even other roots with different functionality, for more lightweight objects.

These constraints were entirely intentional; the idea was to be able to port GTK to the new type system, and to an out of tree GLib, during the 1.3 development phase, and minimise the amount of changes necessary to make the transition work not just inside GTK, but inside of GNOME too.

Little by little, the entire GObject layer was ported towards thread safety in the only way that worked without breaking the type system: add global locks around everything; use read-write locks for the type data; lock the access and traversal of the property hash table and of the signals table. The only real world code bases that actively exercised multi-threading support were GStreamer and the GNOME VFS API that was mainly used by Nautilus.

With the 3.0 API, GTK dropped the GtkObject base type: the whole floating reference mechanism was moved to GObject, and a new type was introduced to provide the “initial” floating reference to derived types. Around the same time, a thread-safe version of weak references for GObject appeared as a separate API, which confused the matter even more.

Things that are

Darkness crept back into the forests of the world. Rumour grew of a Shadow in the East, whispers of a Nameless Fear. — Lady Galadriel, “The Lord of the Rings: The Fellowship of the Ring”

Let’s address the elephant in the room: it’s completely immaterial how many lines of code you have to deal with when creating a new type. It’s a one-off cost, and for most cases, it’s a matter of using the existing macros. The declaration and definition macros have the advantages of enforcing a series of best practices, and keep the code consistent across multiple projects. If you don’t want to deal with boilerplate when using C, you chose the wrong language to begin with. The existence of excessive API is mainly a requirement to allow other languages to integrate their type system with GObject’s own.

The dynamic part of the type system has gotten progressively less relevant. Yes: you can still create plugins, and those can register types; but classes are never unloaded, just like their type data. There is some attempt at enforcing an order of operations: you cannot just add an interface after a type has been instantiated any more; and while you can add properties and signals after class initialisation, it’s mainly a functionality reserved for specific language bindings to maintain backward compatibility.

Yes, defining properties is boring, and could probably be simplified, but the real cost is not in defining and installing a GParamSpec: it’s in writing the set and get property multiplexers, validating the values, boxing and unboxing the data, and dealing with the different property flags; none of those things can be wrapped in some fancy C preprocessor macros—unless you go into the weeds with X macros. The other, big cost of properties is their storage inside a separate, global, lock-happy hash table. The main use case of this functionality—adding entirely separate classes of properties with the same semantics as GObject properties, like style properties and child properties in GTK—has completely fallen out of favour, and for good reasons: it cannot be managed by generic code; it cannot be handled by documentation generators without prior knowledge; and, for the same reason, it cannot be introspected. Even calling these “properties” is kind of a misnomer: they are value validation objects that operate only when using the generic (and less performant) GObject accessor API, something that is constrained to things like UI definition files in GTK, or language bindings. If you use the C accessors for your own GObject type, you’ll have to implement validation yourself; and since idiomatic code will have the generic GObject accessors call the public C API of your type, you get twice the amount of validation for no reason whatsoever.

Signals have mostly been left alone, outside of performance improvements that were hard to achieve within the constraints of the existing implementation; the generic FFI-based closure turned out to be a net performance loss, and we’re trying to walk it back even for D-Bus, which was the main driver for it to land in the first place. Marshallers are now generated with a variadic arguments variant, to reduce the amount of boxing and unboxing of GValue containers. Still, there’s not much left to squeeze out of the old GSignal API.

The atomic nature of the reference counting can be a costly feature, especially for code bases that are by necessity single-threaded; the fact that the reference count field is part of the (somewhat) public API prevents fundamental refactorings, like switching to biased reference counting for faster operations on the same thread that created an instance. The lack of room on GObject also prevents storing the thread ID that owns the instance, which in turn prevents calling the GObjectClass.dispose() and GObjectClass.finalize() virtual functions on the right thread, and requires scheduling the destruction of an object on a separate main context, or locking the contents of an object at a further cost.

Things that yet may be

The quest stands upon the edge of a knife: stray but a little, and it will fail to the ruin of all. Yet hope remains, while the company is true. — Lady Galadriel, “The Lord of the Rings: The Fellowship of the Ring”

Over the years, we have been strictly focusing on GObject: speeding up its internals, figuring out ways to improve property registration and performance, adding new API and features to ensure it behaved more reliably. The type system has also been improved, mainly to streamline its use in idiomatic GObject code bases. Not everything worked: properties are still a problem; weak references and pointers are a mess, with two different API that interact badly with GObject; signals still exist on a completely separate plane; GObject is still wildly inefficient when it comes to locking.

The thesis of this strawman is that we reached the limits of backwards compatibility of GObject, and any attempt at improving it will inevitably lead to a more brittle code, rife with potential regressions. The typical answer, in this case, would be to bump the API/ABI of GObject, remove the mistakes of the past, and provide a new idiomatic approach. Sadly, doing so not only would require a level of resources we, as the GLib project stewards, cannot provide, but it would also completely break the entire ecosystem in a way that is not recoverable. Either nobody would port to the new GObject-3.0 API; or the various projects that depend on GObject would inevitably fracture, following whichever API version they can commit to; in the meantime, downstream distributors would suffer the worst effects of the shared platform we call “Linux”.

Between inaction and slow death, and action with catastrophic consequences, there’s the possibility of a third option: what if we stopped trying to emulate Java, and have a single “god” type?

Our type system is flexible enough to support partitioning various responsibilities, and we can defer complexity where it belongs: into faster moving dependencies, that have the benefit of being able to iterate and change at a much higher rate than the foundational library of the platform. What’s the point of shoving every possible feature into the base class, in order to cover ever increasingly complex use cases across multiple languages, when we can let consumers decide to opt into their own well-defined behaviours? What GObject ought to provide is a set of reliable types that can be combined in expressive ways, and that can be inspected by generic API.

A new, old base type

We already have a derivable type, called GTypeInstance. Typed instances don’t have any memory management: once instantiated, they can only be moved, or freed. All our objects already are typed instances, since GObject inherits from it. Contrary to current common practice, we should move towards using GTypeInstance for our types.

There’s a distinct lack of convenience API for defining typed instances, mostly derived from the fact that GTypeInstance is seen as a form of “escape hatch” for projects to use in order to avoid GObject. In practice, there’s nothing that prevents us from improving the convenience of creating new instantiatable/derivable types, especially if we start using them more often. The verbose API must still exist, to allow language bindings and introspection to handle this kind of types, but just like we made convenience macros for declaring and defining GObject types, we can provide macros for new typed instances, and for setting up a GValue table.
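To give a sense of that verbosity, here is roughly what registering a new root typed instance (like the Box type shown a little further down) looks like today; a minimal sketch that elides the GTypeValueTable a complete registration would also supply for GValue support:

static GType
box_get_type (void)
{
  static GType type = 0;

  if (type == 0)
    {
      // A new root type must be registered as a fundamental type
      static const GTypeFundamentalInfo finfo = {
        G_TYPE_FLAG_CLASSED |
        G_TYPE_FLAG_INSTANTIATABLE |
        G_TYPE_FLAG_DERIVABLE,
      };
      static const GTypeInfo info = {
        .class_size = sizeof (GTypeClass),
        .instance_size = sizeof (Box),
      };

      type = g_type_register_fundamental (g_type_fundamental_next (),
                                          "Box", &info, &finfo, 0);
    }

  return type;
}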

Optional functionality

Typed instances require a wrapper API to free their contents before calling g_type_free_instance(). Nothing prevents us from adding a GFinalizable interface that can be implemented by a GTypeInstance, though: interfaces exist at the type system level, and do not require GObject to work.

typedef struct {
  /* all interface structures start with GTypeInterface */
  GTypeInterface g_iface;

  void (* finalize) (GFinalizable *self);
} GFinalizableInterface;

If a typed instance provides an implementation of GFinalizable, then g_type_free_instance() can free the contents of the instance by calling g_finalizable_finalize().

This interface is optional, in case your typed instance just contains simple values, like:

typedef struct {
  GTypeInstance parent;

  bool is_valid;
  double x1, y1;
  double x2, y2;
} Box;

and does not require deallocations outside of the instance block.
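A typed instance that does own external resources would opt in by implementing the interface; a hypothetical example, using the GFinalizable sketched above:

typedef struct {
  GTypeInstance parent;

  char *name; // owned, must be freed
} NamedBox;

static void
named_box_finalize (GFinalizable *self)
{
  NamedBox *box = (NamedBox *) self;

  g_free (box->name);
}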

A similar interface can be introduced for cloning instances, allowing a copy operation alongside a move:

typedef struct {
  GTypeInterface g_iface;

  GClonable * (* clone) (GClonable *self);
} GClonableInterface;

We could then introduce g_type_instance_clone() as a generic entry point that either used GClonable, or simply allocated a new instance and called memcpy() on it, using the size of the instance (and eventual private data) known to the type system.
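A sketch of that generic entry point, assuming a G_TYPE_CLONABLE type and a g_clonable_clone() wrapper for the interface method (neither exists today):

gpointer
g_type_instance_clone (gpointer instance)
{
  GType type = G_TYPE_FROM_INSTANCE (instance);

  // If the type opted into GClonable, defer to its implementation
  if (g_type_is_a (type, G_TYPE_CLONABLE))
    return g_clonable_clone ((GClonable *) instance);

  // Otherwise, do a shallow copy of the whole instance block,
  // whose size is known to the type system
  GTypeQuery query;
  g_type_query (type, &query);

  GTypeInstance *res = g_type_create_instance (type);
  memcpy (res, instance, query.instance_size);

  return res;
}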

The prior art for this kind of functionality exists in GIO, in the form of the GInitable and GAsyncInitable interfaces; unfortunately, those interfaces require GObject, and they depend on GCancellable and GAsyncResult objects, which prevent us from moving them into the lower level API.

Typed containers and life time management

The main functionality provided by GObject is garbage collection through reference counting: you acquire a (strong) reference when you need to access an instance, and release it when you don’t need the instance any more. If the reference you released was the last one, the instance gets finalized.

Of course, once you introduce strong references you open the door to a veritable bestiary of other type of references:

  • weak references, used to keep a “pointer” to the instance, and get a notification when the last reference drops
  • floating references, used as a C convenience to allow ownership transfer of newly constructed “child” objects to their “parent”
  • toggle references, used by language bindings that acquire a strong reference on an instance they wrap with a native object; when the toggle reference gets triggered it means that the last reference being held is the one on the native wrapper, and the wrapper can be dropped causing the instance to be finalized

All of these types of reference exist inside GObject, but since they were introduced over the years, they are bolted on top of the base class using the keyed data storage, which comes with its own costly locking and ordering; they are also managed through the finalisation code, which means there are re-entrancy issues or undefined ordering behaviours that have routinely cropped up over the years, especially when trying to optimise construction and destruction phases.

None of this complexity is, strictly speaking, necessary; we don’t care about an instance being reference counted: a “parent” object can move the memory of a “child” typed instance directly into its own code. What we care about is that, whenever other code interacts with ours, we can hand out a reference to that memory, so that ownership is maintained.

Other languages and standard libraries have the same concept: Rust has Rc<T> and Arc<T>, and C++ has std::shared_ptr and std::weak_ptr.

These constructs are not part of a base class: they are wrappers around instances. This means you’re not handing out a reference to an instance: you are handing out a reference to a container, which holds the instance for you. The behaviour of the value is made explicit by the type system, not implicit to the type.

A simple implementation of a typed “reference counted” container would provide us with both strong and weak references:

typedef struct _GRc GRc;
typedef struct _GWeak GWeak;

GRc *g_rc_new (GType data_type, gpointer data);

GRc *g_rc_acquire (GRc *rc);
void g_rc_release (GRc *rc);

gpointer g_rc_get_data (GRc *rc);

GWeak *g_rc_downgrade (GRc *rc);
GRc *g_weak_upgrade (GWeak *weak);

bool g_weak_is_empty (GWeak *weak);
gpointer g_weak_get_data (GWeak *weak);
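Usage would then make ownership explicit at every hand-off; a hypothetical sketch, reusing the Box type from earlier (box_new() is an assumed constructor, and I’m assuming g_weak_upgrade() returns NULL once the data is gone):

// Wrap a newly created instance into a strong reference
GRc *rc = g_rc_new (box_get_type (), box_new ());

// Hand a weak reference to an observer
GWeak *weak = g_rc_downgrade (rc);

// Later, in the observer: try to acquire a strong reference
GRc *strong = g_weak_upgrade (weak);
if (strong != NULL)
  {
    Box *b = g_rc_get_data (strong);
    // ... use b ...
    g_rc_release (strong);
  }

// Dropping the last strong reference frees the data
g_rc_release (rc);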

Alongside this type of container, we could also have a specialisation for atomic reference counted containers; or pinned containers, which guarantee that an object is kept in the same memory location; or re-implement referenced containers inside each language binding, to ensure that the behaviour is tailored to the memory management of those languages.

Specialised types

Container types introduce the requirement of having the type system understand that an object can be the product of two types: the type of the container, and the type of the data. In order to allow properties, signals, and values to effectively provide introspection of this kind of container types we are going to need to introduce “specialised” types:

  • GRc exists as a “generic”, abstract type in the type system
  • any instance of GRc that contains an instance of type A gets a new type in the type system

A basic implementation would look like:

GRc *
g_rc_new (GType data_type, gpointer data)
{
  // Returns an existing GType if something else already
  // has registered the same GRc<T>
  GType rc_type =
    g_generic_type_register_static (G_TYPE_RC, data_type);

  // Instantiates GRc, but gives it the type of
  // GRc<T>; there is only the base GRc class
  // and instance initialization functions, as
  // GRc<T> is not a pure derived type
  GRc *res = (GRc *) g_type_create_instance (rc_type);
  res->data = data;

  return res;
}

Any instance of type GRc<A> satisfies the “is-a” relationship with GRc, but it is not a purely derived type:

GType rc_type =
  ((GTypeInstance *) rc)->g_class->g_type;
g_assert_true (g_type_is_a (rc_type, G_TYPE_RC));

The GRc<A> type does not have a different instance or class size, or its own class and instance initialisation functions; it’s still an instance of the GRc type, with a different GType. The GRc<A> type only exists at run time, as it is the result of the type instantiation; you cannot instantiate a plain GRc, or derive your type from GRc in order to create your own reference counted type, either:

// WRONG
GRc *rc = g_type_create_instance (G_TYPE_RC);

// WRONG
typedef GRc GtkWidget;

You can only use a GRc inside your own instance:

typedef struct {
  // GRc<GtkWidget>
  GRc *parent;
  // GRc<GtkWidget>
  GRc *first_child;
  // GRc<GtkWidget>
  GRc *next_sibling;

  // ...
} GtkWidgetPrivate;

Tuple types

Tuples are generic containers of N values, but right now we don’t have any way of formally declaring them into the type system. A hack is to use arrays of similarly typed values, but with the deprecation of GValueArray—which is a bad type that does not allow reference counting, and does not give you guarantees anyway—we only have C arrays and pointer types.

Registering a new tuple type would work like registering a generic type: a base GTuple abstract type acts as the “parent”, with specialised tuple types registered against it:

typedef struct _GTuple GTuple;

GTuple *
g_tuple_new_int (size_t n_elements,
                 int elements[])
{
  GType tuple_type =
    g_tuple_type_register_static (G_TYPE_TUPLE, n_elements, G_TYPE_INT);

  GTuple *res = g_type_create_instance (tuple_type);
  for (size_t i = 0; i < n_elements; i++)
    g_tuple_add (res, elements[i]);

  return res;
}

We can also create specialised tuple types, like pairs:

typedef struct _GPair GPair;

GPair *
g_pair_new (GType this_type,
            GType that_type,
            ...);

This would give us the ability to standardise our API around fundamental types, and reduce the amount of ad hoc container types that libraries have to define and bindings have to wrap with native constructs.

Sum types

Of course, once we start with specialised types, we end up with sum types:

typedef enum {
  SQUARE,
  RECT,
  CIRCLE,
} ShapeKind;

typedef struct {
  GTypeInstance parent;

  ShapeKind kind;

  union {
    struct { Point origin; float side; };
    struct { Point origin; Size size; };
    struct { Point center; float radius; };
  } shape;
} Shape;

As of right now, discriminated unions don’t have any special handling in the type system: they are generally boxed types, or typed instances, but they require type-specific API to deal with the discriminator field and type. Since we have types for enumerations and instances, we can register them at the same time, and provide offsets for direct access:

GType
g_sum_type_register_static (const char *name,
                            size_t class_size,
                            size_t instance_size,
                            GType tag_enum_type,
                            offset_t tag_field);

This way it’s possible to ask the type system for:

  • the offset of the tag in an instance, for direct access
  • all the possible values of the tag, by inspecting its GEnum type

From then on, we can easily build types like Option and Result:

typedef enum {
  G_RESULT_OK,
  G_RESULT_ERR
} GResultKind;

typedef struct {
  GTypeInstance parent;

  GResultKind type;
  union {
    GValue value;
    GError *error;
  } result;
} GResult;

// ...
g_sum_type_register_static ("GResult",
                            sizeof (GResultClass),
                            sizeof (GResult),
                            G_TYPE_RESULT_KIND,
                            offsetof (GResult, type));

// ...
GResult *
g_result_new_boolean (gboolean value)
{
  GType res_type =
    g_generic_type_register_static (G_TYPE_RESULT,
                                    G_TYPE_BOOLEAN);
  GResult *res =
    g_type_create_instance (res_type);
  g_value_set_boolean (&res->result.value, value);

  return res;
}

// ...
g_autoptr (GResult) result = obj_finish (task);
switch (g_result_get_kind (result)) {
  case G_RESULT_OK:
    g_print ("Result: %s\n",
      g_result_get_boolean (result)
        ? "true"
        : "false");
    break;

  case G_RESULT_ERR:
    g_printerr ("Error: %s\n",
      g_result_get_error_message (result));
    break;
}

// ...
g_autoptr (GResult) result =
  g_input_stream_read_bytes (stream);
if (g_result_is_error (result)) {
  // ...
} else {
  g_autoptr (GBytes) data = g_result_get_boxed (result);
  // ...
}

Consolidating GLib and GType

Having the type system in a separate shared library did make sense back when GLib was spun off from GTK; after all, GLib was mainly a set of convenient data types for a language that lacked a decent standard library. Additionally, not many C projects were interested in the type system, as it was perceived as a big chunk of functionality in an era where space was at a premium. These days, the smallest environment capable of running GLib code is plenty capable of running the GObject type system as well. The separation between GLib data types and the GObject type system has created data types that are not type safe, and work by copying data, by having run time defined destructor functions, or by storing pointers and assuming everything will be fine. This leads to code duplication between shared libraries, and prevents the use of GLib data types in the public API, lest the introspection information gets lost.

Moving the type system inside GLib would allow us to have properly typed generic container types, like a GVector replacing GArray, GPtrArray, GByteArray, as well as the deprecated GValueArray; or a GMap and a GSet, replacing GHashTable, GSequence, and GtkRBTree. Even the various list models could be assembled on top of these new types, and moved out of GTK.

Current consumers of GLib-only API would still have their basic C types, but if they don’t want to link against a slightly bigger shared library that includes GTypeInstance, GTypeInterface, and the newly added generic, tuple, and sum types, then they would probably be better served by projects like c-util instead.

Properties

Instead of bolting properties on top of GParamSpec, we can move their definition into the type system; after all, properties are a fundamental part of a type, so it does not make sense to bind them to the class instantiation. This would also remove the long-standing issue of properties being available for registration long after a class has been initialised; it would give us the chance to ship a utility for inspecting the type system to get all the meta-information on the hierarchy and generating introspection XML without having to compile a small binary.

If we move property registration to the type registration we can also finally move away from multiplexed accessors, and use direct instance field access where applicable:

GPropertyBuilder builder;

g_property_builder_init (&builder,
  G_TYPE_STRING, "name");
// Stop using flags, and use proper setters; since
// there's no use case for unsetting the readability
// flag, we don't even need a boolean argument
g_property_builder_set_readwrite (&builder);
// The offset is used for read and write access...
g_property_builder_set_private_offset (&builder,
  offsetof (GtkWidgetPrivate, name));
// ... unless an accessor function is provided; in
// this case we want setting a property to go through
// a function
g_property_builder_set_setter_func (&builder,
  gtk_widget_set_name);

// Register the property into the type; we return the
// offset of the property into the type node, so we can
// access the property definition with a fast look up
properties[NAME] =
  g_type_add_instance_property (type,
    g_property_builder_end (&builder));

Accessing the property information would then be a case of looking into the type system under a single reader lock, instead of traversing all properties in a glorified globally locked hash table.

Once we have a property registered in the type system, accessing it is a matter of calling API on the GProperty object:

void
gtk_widget_set_name (GtkWidget *widget,
                     const char *name)
{
  GProperty *prop =
    g_type_get_instance_property (GTK_TYPE_WIDGET,
                                  properties[NAME]);

  g_property_set (prop, widget, name);
}
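The getter would work the same way, with the property object doing a direct read through the registered field offset; hypothetically (g_property_get() is an assumed counterpart to the setter above):

const char *
gtk_widget_get_name (GtkWidget *widget)
{
  GProperty *prop =
    g_type_get_instance_property (GTK_TYPE_WIDGET,
                                  properties[NAME]);

  // Reads the private field directly through the registered offset
  return g_property_get (prop, widget);
}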

Signals

Moving signal registration into the type system would allow us to subsume the global locking into the type locks; it would also give us the chance to simplify some of the complexity for re-emission and hooks:

GSignalBuilder builder;

g_signal_builder_init (&builder, "insert-text");
g_signal_builder_set_args (&builder, 3,
  (GSignalArg[]) {
    { .name = "text", .gtype = G_TYPE_STRING },
    { .name = "length", .gtype = G_TYPE_SIZE },
    { .name = "position", .gtype = G_TYPE_OFFSET },
  });
g_signal_builder_set_retval (&builder,
  G_TYPE_OFFSET);
g_signal_builder_set_class_offset (&builder,
  offsetof (EditableClass, insert_text));

signals[INSERT_TEXT] =
  g_type_add_class_signal (type,
    g_signal_builder_end (&builder));

By taking the chance of moving signals out of their own namespace we can also move to a model where each class is responsible for providing the API necessary to connect and emit signals, as well as providing callback types for each signal. This would allow us to increase type safety, and reduce the reliance on generic API:

typedef offset_t (* EditableInsertText) (Editable *self,
                                         const char *text,
                                         size_t length,
                                         offset_t position);

unsigned long
editable_connect_insert_text (Editable *self,
                              EditableInsertText callback,
                              gpointer user_data,
                              GSignalFlags flags);

offset_t
editable_emit_insert_text (Editable *self,
                           const char *text,
                           size_t length,
                           offset_t position);
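Connecting a handler would then be fully type checked by the compiler, with no G_CALLBACK() cast and no string-based look up; a hypothetical usage sketch, following the typedef above:

static offset_t
on_insert_text (Editable *self,
                const char *text,
                size_t length,
                offset_t position)
{
  // ... veto or adjust the insertion ...
  return position;
}

// ...
unsigned long handler_id =
  editable_connect_insert_text (editable,
                                on_insert_text,
                                NULL, // user_data
                                0);   // flags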

Extending the type system

Some of the metadata necessary to provide properly typed properties and signals is missing from the type system. For instance, by design, there is no type representing a uint16_t; we are supposed to create a GParamSpec to validate the value of a G_TYPE_INT in order to fit in the 16-bit range. Of course, this leads to excessive run time validation, and relies on C’s own promotion rules for variadic arguments; it also does not work for signals, as those do not use GParamSpec. More importantly, though, the missing connection between C types and GTypes prevents gathering proper introspection information for properties and signal arguments: if we only have the GType we cannot generate the full metadata that can be used by documentation and language bindings, unless we’re willing to lose specificity.

Not only should the type system be sufficient to contain all the standard C types that are now available: we also need it to provide us with enough information to be able to serialise those types into the introspection data, if we want to be able to generate code like signal API, type safe bindings, or accurate documentation for properties and signal handlers.

Introspection

Introspection exists outside of GObject mainly because of dependencies; the parser, abstract syntax tree, and transformers are written in Python and interface with a low level C tokeniser. Adding a CPython dependency to GObject is too much of a stretch, especially when it comes to bootstrapping a system. While we could keep the dependency optional, and allow building GObject without support for introspection, keeping the code separate is a simpler solution.

Nevertheless, GObject should not ignore introspection. The current reflection API inside GObject should generate data that is compatible with the libgirepository API and with its GIR parser. Currently, gobject-introspection is tasked with generating a small C executable, compiling it, running it to extract metadata from the type system, as well as the properties and signals of a GObject type, and generate XML that can be parsed and included into the larger GIR metadata for the rest of the ABI being introspected. GObject should ship a pre-built binary, instead; it should dlopen the given library or executable, extract all the type information, and emit the introspection data. This would not make gobject-introspection more cross-compilable, but it would simplify its internals and its distributability. We would not need to know how to compile and run C code from a Python script, for one; a simple executable wrapper around a native copy of the GObject-provided binary would be enough.

Ideally, we could move the girepository API into GObject itself, and allow it to load the binary data compiled out of the XML; language bindings loading the data at run time would then need to depend on GObject instead of an additional library, and we could ship the GIR → typelib compiler directly with GLib, leaving gobject-introspection to deal only with the parsing of C headers, docblocks, and annotations, to generate the XML representation of the C/GObject ABI.

There and back again

And the ship went out into the High Sea and passed on into the West, until at last on a night of rain Frodo smelled a sweet fragrance on the air and heard the sound of singing that came over the water. And then it seemed to him that as in his dream in the house of Bombadil, the grey rain-curtain turned all to silver glass and was rolled back, and he beheld white shores and beyond them a far green country under a swift sunrise. — “The Lord of the Rings”, Volume 3: The Return of the King, Book 6: The End of the Third Age

The hard part of changing a project in a backward compatible way is resisting the temptation of fixing the existing design. Sometimes it’s necessary to backtrack the chain of decisions, and consider the extant code base a dead branch; not because the code is wrong or buggy, but because any attempt at doubling down on the same design will inevitably lead to breakage. In this sense, it’s easy to just declare “maintenance bankruptcy”, and start from a new major API version: breaks allow us to fix the implementation, at the cost of adapting to new API. For instance, widgets are still the core of GTK, even after 4 major revisions; we did not rename them to “elements” or “actors”, and we did not change how the windows are structured. You are still supposed to build a tree of widgets, connect callbacks to signals, and let the main event loop run. Porting has been painful because of underlying changes in the graphics stack, or because of portability concerns, but even with the direction change of favouring composition over inheritance, the knowledge on how to use GTK has been transferred from GTK 1 to 4.

We cannot do the same for GObject. Changing how it is implemented implies changing everything that depends on it; it means introducing behavioural changes in subtle, and hard to predict ways. Luckily for us, the underlying type system is still flexible and nimble enough that it can give us the ability to change direction, and implement an entirely different approach to object orientation—one that is more in line with languages like modern C++ and Rust. By following new approaches we can slowly migrate our platform to other languages over time, with a smaller impedance mismatch caused by the current design of our object model. Additionally, by keeping the root of the type system, we maintain the ability to provide a stable C ABI that can be consumed by multiple languages, which is the strong selling point of the GNOME ecosystem.

Why do all of this work, though? Compared to a full API break, this proposal has the advantage of being tractable and realistic; I cannot overemphasise how little appetite there is for a “GObject 3.0” in the ecosystem. The recent API bump from libsoup2 to libsoup3 has clearly shown that changes deep in the stack end up being too costly an effort: some projects have found it easier to switch to another HTTP library altogether, rather than support two versions of libsoup for a while; other projects have decided to drop compatibility with libsoup2, forcing the hand of every reverse dependency both upstream and downstream. Breaking GObject would end up breaking the ecosystem, with the hope of a “perfect” implementation way down the line and with very few users on one side, and a dead branch used by everybody else on the other.

Of course, the complexity of the change is not going to be trivial, and it will impact things like the introspection metadata and the various language bindings that exist today; some bindings may even require a complete redesign. Nevertheless, by implementing this new object model and leaving GObject alone, we buy ourselves enough time and space to port our software development platform towards a different future.

Maybe this way we will get to save the Shire; and even if we give up some things, or even lose them, we still get to keep what matters.

by ebassi at August 23, 2023 08:23 PM

August 21, 2023

Melissa Wen

AMD Driver-specific Properties for Color Management on Linux (Part 1)

TL;DR:

Color is a visual perception. Human eyes can detect a broader range of colors than any device in the graphics chain. Since each device can generate, capture or reproduce a specific subset of colors and tones, color management controls color conversion and calibration across devices to ensure a more accurate and consistent color representation. We can expose a GPU-accelerated display color management pipeline to support this process and enhance results, and this is what we are doing on Linux to improve color management on Gamescope/SteamDeck. Even with the challenges of being external developers, we have been working on mapping AMD GPU color capabilities to the Linux kernel color management interface, which is a combination of DRM and AMD driver-specific color properties. This more extensive color management pipeline includes pre-defined Transfer Functions, 1-Dimensional LookUp Tables (1D LUTs), and 3D LUTs before and after the plane composition/blending.


The study of color is well-established and has been explored for many years. Color science and research findings have also guided technology innovations. As a result, color in Computer Graphics is a very complex topic that I’m putting a lot of effort into becoming familiar with. I always find myself rereading all the materials I have collected about color space and operations since I started this journey (about one year ago). I also understand how hard it is to find consensus on some color subjects, as exemplified by all explanations around the 2015 online viral phenomenon of The Black and Blue Dress. Have you heard about it? What is the color of the dress for you?

So, taking into account my skills with colors and building consensus, this blog post only focuses on GPU hardware capabilities to support color management :-D If you want to learn more about color concepts and color on Linux, you can find useful links at the end of this blog post.

Linux Kernel, show me the colors ;D

The DRM color management interface only exposes a small set of post-blending color properties. Proposals to enhance the DRM color API from different vendors have landed on the subsystem mailing list over the last few years. On one hand, we got some suggestions to extend the DRM post-blending/CRTC color API: DRM CRTC 3D LUT for R-Car (2020 version); DRM CRTC 3D LUT for Intel (draft - 2020); DRM CRTC 3D LUT for AMD by Igalia (v2 - 2023); DRM CRTC 3D LUT for R-Car (v2 - 2023). On the other hand, there were some proposals to extend the DRM pre-blending/plane API: DRM plane colors for Intel (v2 - 2021); DRM plane API for AMD (v3 - 2021); DRM plane 3D LUT for AMD - 2021. Finally, Simon Ser sent the latest proposal in May 2023: Plane color pipeline KMS uAPI, from discussions in the 2023 Display/HDR Hackfest, and it is still under evaluation by the Linux Graphics community.

All previous proposals seek a generic solution for expanding the API, but many seem to have stalled due to the uncertainty of matching the hardware capabilities of all vendors well. Meanwhile, the use of AMD color capabilities on Linux remained limited by the DRM interface, as the DCN 3.0 family color caps and mapping diagram below shows for the Linux/DRM color interface without driver-specific color properties [*]:

Bearing in mind that we need to know the variety of color pipelines in the subsystem to be clear about a generic solution, we decided to approach the issue from a different perspective and worked on enabling a set of Driver-Specific Color Properties for AMD Display Drivers. As a result, I recently sent another round of the AMD driver-specific color mgmt API.

For those who have been following the AMD driver-specific proposal since the beginning (see [RFC][V1]), the main new features of the latest version [v2] are the addition of pre-blending Color Transformation Matrix (plane CTM) and the differentiation of Pre-defined Transfer Functions (TF) supported by color blocks. For those who just got here, I will recap this work in two blog posts. This one describes the current status of the AMD display driver in the Linux kernel/DRM subsystem and what changes with the driver-specific properties. In the next post, we go deeper to describe the features of each color block and provide a better picture of what is available in terms of color management for Linux.

The Linux kernel color management API and AMD hardware color capabilities

Before discussing colors in the Linux kernel with AMD hardware, consider accessing the Linux kernel documentation (version 6.5.0-rc5). In the AMD Display documentation, you will find my previous work documenting AMD hardware color capabilities and the Color Management Properties. It describes how AMD Display Manager (DM) intermediates requests between the AMD Display Core component (DC) and the Linux/DRM kernel interface for color management features. It also describes the relevant function to call the AMD color module in building curves for content space transformations.

A subsection also describes hardware color capabilities and how they evolve between versions. This subsection, DC Color Capabilities between DCN generations, is a good starting point to understand what we have been doing on the kernel side to provide a broader color management API with AMD driver-specific properties.

Why do we need more kernel color properties on Linux?

Blending is the process of combining multiple planes (the framebuffer abstraction) according to their mode settings. Before blending, we can manage the colors of the various planes separately; after blending, we have combined those planes into only one output per CRTC. Color conversions after blending would be enough in a single-plane scenario or when dealing with planes in the same color space on the kernel side. Still, it cannot help to handle the blending of multiple planes with different color spaces and luminance levels. With plane color management properties, userspace can get a more accurate representation of colors to deal with the diversity of color profiles of devices in the graphics chain, support a wide color gamut (WCG), and convert High-Dynamic-Range (HDR) content to Standard-Dynamic-Range (SDR) content (and vice-versa). With a GPU-accelerated display color management pipeline, we can use hardware blocks for color conversions and color mapping and support advanced color management.

The current DRM color management API enables us to perform some color conversions after blending, but there is no interface to calibrate the input space per plane. Note that here I’m not considering some workarounds in the AMD display manager that map the DRM CRTC de-gamma and DRM CRTC CTM properties to the pre-blending DC de-gamma and gamut remap blocks, respectively. So, in more detail, it only exposes three post-blending features, with a short usage sketch after the list:

  • DRM CRTC de-gamma: used to convert the framebuffer’s colors to linear gamma;
  • DRM CRTC CTM: used for color space conversion;
  • DRM CRTC gamma: used to convert colors to the gamma space of the connected screen.
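To give an idea of how userspace drives these, the standard CRTC gamma is programmed by uploading a LUT as a DRM property blob and attaching the blob to the CRTC’s GAMMA_LUT property in an atomic commit. Here is a minimal libdrm sketch, with an identity LUT and a helper function invented for illustration:

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>  /* struct drm_color_lut, drmModeCreatePropertyBlob() */

/* Upload an identity gamma LUT as a property blob; the returned blob id is
 * then set as the value of the CRTC's "GAMMA_LUT" property. */
static uint32_t create_identity_gamma_blob(int fd, unsigned int lut_size)
{
    struct drm_color_lut lut[256];
    uint32_t blob_id = 0;

    if (lut_size < 2 || lut_size > 256)
        return 0;

    for (unsigned int i = 0; i < lut_size; i++) {
        uint16_t v = (uint16_t)((i * 0xFFFFu) / (lut_size - 1));
        lut[i].red = lut[i].green = lut[i].blue = v;
        lut[i].reserved = 0;
    }

    if (drmModeCreatePropertyBlob(fd, lut, lut_size * sizeof(lut[0]), &blob_id))
        return 0;
    return blob_id;
}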

AMD driver-specific color management interface

We can compare the Linux color management API with and without the driver-specific color properties. From now on, we denote driver-specific properties with the AMD prefix and generic properties with the DRM prefix. For visual comparison, I bring the DCN 3.0 family color caps and mapping diagram closer and present it here again:

Mixing AMD driver-specific color properties with DRM generic color properties, we have a broader Linux color management system with the following features exposed by properties in the plane and CRTC interface, as summarized by this updated diagram:

The blocks highlighted by red lines are the new properties in the driver-specific interface developed by me (Igalia) and Joshua (Valve). The red dashed lines are new links between API and AMD driver components implemented by us to connect the Linux/DRM interface to AMD hardware blocks, mapping components accordingly. In short, we have the following color management properties exposed by the DRM/AMD display driver:

  • Pre-blending - AMD Display Pipe and Plane (DPP):
    • AMD plane de-gamma: 1D LUT and pre-defined transfer functions; used to linearize the input space of a plane;
    • AMD plane CTM: 3x4 matrix; used to convert plane color space;
    • AMD plane shaper: 1D LUT and pre-defined transfer functions; used to delinearize and/or normalize colors before applying 3D LUT;
    • AMD plane 3D LUT: 17x17x17 size with 12 bit-depth; three dimensional lookup table used for advanced color mapping;
    • AMD plane blend/out gamma: 1D LUT and pre-defined transfer functions; used to linearize back the color space after 3D LUT for blending.
  • Post-blending - AMD Multiple Pipe/Plane Combined (MPC):
    • DRM CRTC de-gamma: 1D LUT (can’t be set together with plane de-gamma);
    • DRM CRTC CTM: 3x3 matrix (remapped to post-blending matrix);
    • DRM CRTC gamma: 1D LUT + AMD CRTC gamma TF; added to take advantage of driver pre-defined transfer functions;

Note: You can find more about AMD display blocks in the Display Core Next (DCN) - Linux kernel documentation, provided by Rodrigo Siqueira (Linux/AMD display developer) in a 2021 documentation series. In the next post, I’ll revisit this topic, explaining display and color blocks in detail.
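Since driver-specific properties only exist when the driver exposes them, userspace has to discover them by name before using them. The sketch below shows the usual libdrm discovery dance; the property name in the final comment is only illustrative, as the real names are defined by the driver patches:

#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Look up a (possibly driver-specific) plane property by name.
 * Returns the property id, or 0 if the driver does not expose it. */
static uint32_t find_plane_prop(int fd, uint32_t plane_id, const char *name)
{
    drmModeObjectProperties *props =
        drmModeObjectGetProperties(fd, plane_id, DRM_MODE_OBJECT_PLANE);
    uint32_t prop_id = 0;

    for (uint32_t i = 0; props && i < props->count_props; i++) {
        drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[i]);
        if (prop && strcmp(prop->name, name) == 0)
            prop_id = prop->prop_id;
        drmModeFreeProperty(prop);
    }
    drmModeFreeObjectProperties(props);
    return prop_id;
}

/* e.g.: uint32_t id = find_plane_prop(fd, plane_id, "<AMD plane property name>"); */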

How did we get a large set of color features from AMD display hardware?

So, looking at AMD hardware color capabilities in the first diagram, we can see that there is no post-blending (MPC) de-gamma block in any hardware family. We can also see that the AMD display driver maps CRTC/post-blending CTM to the pre-blending (DPP) gamut_remap block, but there is a post-blending (MPC) gamut_remap (DRM CTM) in newer hardware versions, which include the SteamDeck hardware. You can find more details about hardware versions in the Linux kernel documentation/AMDGPU Product Information.

I needed to rework these two mappings mentioned above to provide pre-blending/plane de-gamma and CTM for SteamDeck. I changed the DC mapping to detach stream gamut remap matrices from the DPP gamut remap block. That means mapping AMD plane CTM directly to the DPP/pre-blending gamut remap block and DRM CRTC CTM to the MPC/post-blending gamut remap block. In this sense, I also limited plane CTM properties to those hardware versions with MPC/post-blending gamut_remap capabilities, since older versions cannot support this feature without clashing with DRM CRTC CTM.

Unfortunately, I couldn’t prevent a conflict between AMD plane de-gamma and DRM CRTC de-gamma, since post-blending de-gamma isn’t available in any AMD hardware version so far. The fact is that a post-blending de-gamma makes little sense in the AMD color pipeline, where plane blending works better in a linear space, and there are enough color blocks to linearize content before blending. To deal with this conflict, the driver now rejects atomic commits if users try to set both AMD plane de-gamma and DRM CRTC de-gamma simultaneously.

Finally, we had no other clashes when enabling other AMD driver-specific color properties for our use case, Gamescope/SteamDeck. Our main work for the remaining properties was understanding the data flow of each property, the hardware capabilities and limitations, and how to shape the data for programming the registers - AMD color block capabilities (and limitations) are the topics of the next blog post. Besides that, we fixed some driver bugs along the way since it was the first Linux use case for most of the new color properties, and some behaviors are only exposed when exercising the engine.

Take a look at the Gamescope/Steam Deck Color Pipeline[**], and see how Gamescope uses the new API to manage color space conversions and calibration (please click on the image for a better view):

In the next blog post, I’ll describe the implementation and technical details of each pre- and post-blending color block/property on the AMD display driver.

* Thanks to Harry Wentland for helping with diagrams, color concepts and AMD capabilities.

** Thanks to Joshua Ashton for providing and explaining the Gamescope/Steam Deck color pipeline.

*** Thanks to the Linux Graphics community - explicitly Harry, Joshua, Pekka, Simon, Sebastian, Siqueira, Alex H. and Ville - for all the learning during this Linux DRM/AMD color journey. Also to Carlos and Tomas for organizing the 2023 Display/HDR Hackfest, where we had a great and immersive opportunity to discuss Color & HDR on Linux.

  1. Cinematic Color - 2012 SIGGRAPH course notes by Jeremy Selan: an introduction to color science, concepts and pipelines.
  2. Color management and HDR documentation for FOSS graphics by Pekka Paalanen: documentation and useful links on applying color concepts to the Linux graphics stack.
  3. HDR in Linux by Jeremy Cline: a blog post exploring color concepts for HDR support on Linux.
  4. Methods for conversion of high dynamic range content to standard dynamic range content and vice-versa by ITU-R: guideline for conversions between HDR and SDR contents.
  5. Using Lookup Tables to Accelerate Color Transformations by Jeremy Selan: Nvidia blog post about Lookup Tables on color management.
  6. The Importance of Being Linear by Larry Gritz and Eugene d’Eon: Nvidia blog post about gamma and color conversions.

August 21, 2023 11:13 AM

August 10, 2023

Brian Kardell

Igalia: Mid-season Power Rankings

Igalia: Mid-season Power Rankings

Let’s take a look at how the year is stacking up in terms of Open Source contributions. If this were an episode of Friends, its title would be "The One With the Charts".

I’ve written before about how I have personally come to really appreciate the act of making a simple list of “things recently accomplished”. It’s always eye opening, and for me at least, usually therapeutic.

For me personally, it’s been a super weird first half of the year and… I feel like I could use a nice list.

It's been a weird year, right?

I mean, not just for me personally, for all of us I guess, right?

Mass layoffs everywhere for a while, new shuffling of people we know from Google to Shopify, Mozilla to Google, Google to Igalia, Mozilla to Igalia, Mozilla to Apple, Google to Meta… Who’s on first? Third base!

LLMs are suddenly everywhere. All of the “big” CSS features people have been clamouring for forever are suddenly right here. HTML finally got <dialog> and now is getting a popover (via attributes). Apple’s got some funky XR glasses coming. There is suddenly significant renewed interest in two novel web engines. And that’s just some of the tech stuff.

So yeah… Let’s see what, if any, impacts all of this are having on the state of projects Igalia works on by looking at our commits so far this year… Note that Igalia's slice of the pie is separated in each of the charts for quick identification...

Quick disclaimers

All of these stats are based on commits publicly available through their respective git repositories. This is of course an imperfect measure for many reasons - some commits are huge, some are small. Some small commits are really hard while some large commits are easy, but verbose. Finally, the biggest challenge, even if we accept these metrics, is mapping commits to organizations. We use a fairly elaborate system and many checkpoints - we collaborate annually with several of the projects to cross-check these mappings. Still, you'll see lots of entries in these charts with just an individual's name. Often these are individual contractors or contributors, but sometimes it's just that we cannot currently map them any other way. If you see one that should be counted differently, please let me know!

The Big Web Projects

Igalia is still the one (and still having fun) with the most commits in Chromium and WebKit after Google and Apple, respectively, as we'll see... But we can add some more #1’s this year - even some where we’re not excluding the steward…

Chromium

Igalia claims a whopping 41.9% of the (non-Google) commits so far in 2023!! That’s more than Microsoft, Intel, Opera, Samsung and ARM combined!!! Yowza!

data

Top 10 contributors

Contributor              Contributions
Igalia                   41.56%
Microsoft                16.42%
Intel                    11.74%
Opera                     4.88%
Ho Cheung                 3.92%
Samsung                   2.08%
Stephan Hartmann          1.62%
ARM                       1.58%
Naver Corporation         1.48%
Bytedance                 1.12%
127 other committers     14%

As you read the others, keep in mind that the chromium repository actually has more than Chrome inside it, so comparisons of these aren't apples-to-apples (or Googles, or foxes).

WebKit

52.6% of the non-Apple commits in WebKit so far this year are from Igalians. It's interesting to note that a huge 4.9% of all of these are from accounts with fewer than 10 commits this year - pretty close to what it was in Firefox!

data

Top 10 contributors

Contributor              Contributions
Igalia                   54.94%
Sony                     18.29%
Ahmad Saleem              8.81%
Red Hat                   5.50%
Rose                      2.69%
open-tec.co.jp            1.96%
warlow.dev                1.91%
umich.edu                 0.62%
Alexey Knyazev            0.51%
Google                    0.45%
40 other committers       4%

Firefox

We're sitting at the #5 spot (excluding Mozilla) with 8.87% of commits. Firefox is, in a lot of ways, the trickiest to describe, but just look at it: it's very diverse! As the inventors of modern open source, I guess it makes sense. The mozilla-central repository has the most individual significant contributors as well as a really long tail of tiny contributors. The tiny contributors (fewer than 10 commits) contributed 5.2% (compared to 4.85% in WebKit, for example). However, there are also a few external contributors who are just astoundingly prolific (some of these bigger slices represent hundreds of commits), and with such a number of significant individual contributors, it amounts to a lot.

data

Top 10 contributors

Contributor              Contributions
André Bargull            14.38%
Red Hat                  12.31%
Birchill                 10.32%
Gregory Pappas            9.04%
Igalia                    8.87%
Robert Longson            5.12%
CanadaHonk                4.28%
Masatoshi Kimura          2.65%
ganna                     2.21%
Itiel                     1.68%
174 other committers     29%

Pause for a Note

When you look at these charts, it's really heartening to see how many people and organizations care and contribute. Especially when you look at the Mozilla/Firefox example, it really gives the impression that that project is just a million volunteers. But it's important to keep it all in perspective too. WebKit has about 50 contributing orgs and individuals, Chrome about 140 and Firefox about 185. A significantly larger share of contributions comes from individuals in Mozilla's case. Importantly: in all of these projects, the steward org's contributions absolutely dwarf the rest of the world's contributions combined:

A version of the pies showing the steward's contributions, for scale (Mozilla contributed 87.2% of all commits, Apple 78.1% and Google 95.5% to their respective projects).

If you think this is astounding, please check out my post Webrise and our Web Ecosystem Health series on the Igalia Chats Podcast

Wolvic

A new #1 in the reports. I guess it should come as no surprise at all that we're #1 in terms of commits to our Wolvic XR browser. It looks a lot like other projects in terms of the steward's balance. What's more interesting, I think, is that its funding model is based on partnerships with several organizations and a collective, rather than Igalia as a "go it alone" source.

data

Top 10 contributors

Contributor              Contributions
Igalia                   94.90%
opensuse.org              0.77%
ratcliffefamily.org       0.77%
gallegonovato             0.51%
net-c.ca                  0.51%
Ayaskant Panigrahi        0.51%
Anushka Chakraborty       0.26%
Ana2k                     0.26%
Luna Jernberg             0.26%
zohocorp.com              0.26%

Servo

This year, thanks to some new funding and internal investment, we can add Servo to a very special #1 list! Igalia is second to no one in terms of commits there either, with 52.7% of all commits! An amazing 22.24% of those commits in Servo are from unmappable committers with fewer than 10 commits so far this year!

data

Top 10 contributors

Contributor              Contributions
Igalia                   52.68%
Mozilla                  25.92%
sagudev                   5.11%
Pu Xingyu                 2.73%
michaelgrigoryan25        2.02%
Alex Touchet              1.66%
2shiori17                 1.55%
cybai (Haku)              1.19%
Yutaro Ohno               0.71%
switchpiggy               0.59%
30 other committers       6%

Test-262

Test-262 is the conformance test suite for ECMAScript (JavaScript). I guess you could say we're doing a lot of work there as well, because guess who's got the most commits there? If you guessed Igalia, you'd be right, with 53.4% of all commits!

data

Top 10 contributors

Contributor              Contributions
Igalia                   53.64%
Google                   11.26%
Justin Grant             10.60%
Richard Gibson            3.97%
André Bargull             3.31%
Jordan Harband            3.31%
Bocoup                    3.31%
Veera                     2.65%
Huáng Jùnliàng            1.99%
José Julián Espina        1.32%

Note that the total number of commits in Test262 is rather small compared to many of the other projects here.

Babel

Igalians are now the #1 contributors to Babel, contributing 46.6% of all commits so far this year!

data

Top 10 contributors

Contributor                      Contributions
Igalia                           46.35%
Huáng Jùnliàng                   22.32%
liuxingbaoyu                     19.31%
Jonathan Browne                   0.86%
Avery                             0.86%
fisker Cheung                     0.86%
Dimitri Papadopoulos Orfanos      0.43%
FabianWarnecke                    0.43%
Abdulaziz Ghuloum                 0.43%
magic-akari                       0.43%

V8

Igalia is the #7 contributor to V8 (excluding Google)! This is a pretty busy repo and it's interesting that 6.36% of these commits are from unmapped/individual contributors with fewer than ten commits so far this year.

data

Top 10 contributors

Contributor              Contributions
Google                   85.66%
Red Hat                   2.98%
Intel                     2.57%
loongson.cn               2.09%
iscas.ac.cn               1.66%
Microsoft                 1.13%
ARM                       0.86%
Igalia                    0.79%
Bytedance                 0.41%
eswincomputing.com        0.34%

Google's contributions account for a giant 87.5% of all commits here as well.

But that's not all!

All of the above is just looking specifically at the big web projects because, you know, the web is sort of my thing. If you're reading my blog, there's a pretty good chance it's your thing too. But Igalia does way more than that, if you can believe it. I probably don't talk about it enough, but it's pretty amazing. I suppose I can't give a million more charts, but here are just a few more highlights of other notable projects and specifications where Igalia has been playing a big role... (Keep in mind that specifications move a lot more slowly and so generally have far fewer commits.)

  • HTML: Igalians were #3 among contributor commits to HTML with 8.94% so far this year (behind Google and Apple).
  • Web Assembly: Igalia is the #3 contributor to Web Assembly with 8.75% of the commits so far this year!
  • ARIA: So far this year, Igalia is the #1 contributor to the ARIA repo with 19.4% of commits!
  • NativeScript: Igalia is currently the #1 contributor so far this year to the NativeScript repository with 58.3% of all commits!
  • GStreamer: GStreamer is a widely used and powerful open source multimedia framework. Igalia is the #2 contributor there!
  • VK-GL-CTS: The official Khronos OpenGL and Vulkan conformance test suite (graphics). It would be a massive understatement to say that Igalia has been a major contributor: We're the #1 contributor there with 31.1% of all commits.
  • Mesa: The Mesa 3D Graphics Library is huge and contains open source implementations of pretty much every graphical standard (Vulkan, as mentioned above, for example). Igalia is the #5 contributor there so far this year, contributing 6.62% of all commits.
  • Piglit: Piglit is an open-source test suite for OpenGL implementations. Igalia is the #5 contributor there with 6.86% of commits.

Wrapping up...

It's always amazing to me to look at the data. I hope it's interesting to others too. There are, of course, lots of reasons that all of the committers do what they do, but ultimately, open source development and maintenance benefits us all. The reason that Igalia is able to do all of this is that we are funded by a diverse array of clients who are building things downstream and have needs.

You know where to find us...

August 10, 2023 04:00 AM

August 08, 2023

Víctor Jáquez

DMABuf modifier negotiation in GStreamer

It took almost a year of design and implementation but finally the DMABuf modifier negotiation in GStreamer is merged. Big kudos to all the people involved but mostly to He Junyan, who did the vast majority of the code.

What’s a DMABuf modifier?

DMABuf is the Linux kernel mechanism to share buffers among different drivers or subsystems. A particular case of DMABuf are the DRM PRIME buffers, which are buffers shared by the Direct Rendering Manager (DRM) subsystem. They allow sharing video frames between devices with zero copies.

When we initially added support for DMABuf in GStreamer, we assumed that only color format and size mattered, just as with plain video frames stored in system memory. But we were wrong. Besides color format and size, the memory layout also has to be considered when sharing DMABufs. By not considering it, the produced output had horrible tiling artifacts on screen. This memory layout is known as the modifier, and it’s uniquely described by a uint64 number.

How was it designed and implemented?

First, we wrote a design document for caps negotiation with DMABuf modifiers, where we added a new color format (DMA_DRM) and a new caps field (drm-format). This new caps field holds a string, or a list of strings, composed of the tuple DRM_color_format:DRM_modifier.

Second, we extended the video info object to support DMABuf with helper functions that parse and construct the drm-format field.
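As a sketch of what those helpers enable (assuming the gst_video_dma_drm_fourcc_* helpers added by this work), constructing the drm-format string for NV12 with a given modifier could look like this:

#include <gst/video/video-info-dma.h>

/* Build the "NV12:0x0100000000000002" style string stored in the drm-format
 * caps field, from a video format and a DRM modifier. */
guint32 fourcc = gst_video_dma_drm_fourcc_from_format (GST_VIDEO_FORMAT_NV12);
gchar *drm_format =
    gst_video_dma_drm_fourcc_to_string (fourcc, 0x0100000000000002);

/* ... set drm_format as the value of the drm-format caps field ... */
g_free (drm_format);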

Third, we added the DMABuf caps negotiation in glupload. This part was the most difficult one, since the capability of importing DMABufs into OpenGL (which is only available in EGL/GLES) is defined at run time, by querying the hardware. Also, there are two code paths to import frames: direct or RGB-emulated. Direct is the most efficient, but it depends on the presence of the GLES2 API in the driver, while RGB-emulated frames are imported as a set of RGB images, where each component is an image. In the end, more than a thousand lines of code were added to the glupload element, besides the code added to the EGL context object.

Fourth, and unexpectedly, waylandsink also got DMABuf caps negotiation support.

And lastly, the decoders in the va plugin merged their DMABuf caps negotiation support.

How can I test it?

You need, of course, to use the current main branch of GStreamer, since this is all fresh and there’s no release with it yet. Then you need a box with VA support. If you inspect, for example, vah264dec, you might see this output if your box is Intel (AMD through Mesa is also supported, though the negotiated memory is linear so far):

Pad Templates:
SINK template: 'sink'
Availability: Always
Capabilities:
video/x-h264
profile: { (string)main, (string)baseline, (string)high, (string)progressive-high, (string)constrained-high, (string)constrained-baseline }
width: [ 1, 4096 ]
height: [ 1, 4096 ]
alignment: au
stream-format: { (string)avc, (string)avc3, (string)byte-stream }

SRC template: 'src'
Availability: Always
Capabilities:
video/x-raw(memory:VAMemory)
width: [ 1, 4096 ]
height: [ 1, 4096 ]
format: NV12
video/x-raw(memory:DMABuf)
width: [ 1, 4096 ]
height: [ 1, 4096 ]
format: DMA_DRM
drm-format: NV12:0x0100000000000002
video/x-raw
width: [ 1, 4096 ]
height: [ 1, 4096 ]
format: NV12

What this says is that, for the memory:DMABuf caps feature, the drm-format to negotiate is NV12:0x0100000000000002.

Now some tests:

NOTE: These commands assume that va decoders are primary ranked (see merge request 2312), and that you’re in a Wayland session.

$ gst-play-1.0 --flags=0x47 video.file --videosink=waylandsink
$ GST_GL_API=gles2 gst-play-1.0 --flags=0x47 video.file --videosink=glimagesink
$ gst-play-1.0 --flags=0x47 video.file --videosink='glupload ! gtkglsink'

Right now it’s required to add --flags=0x47 to playbin, because otherwise playbin adds video filters that still don’t negotiate the new DMABuf caps.

GST_GL_API=gles2 instructs GStreamer OpenGL to use GLES2 API, which allows direct importation of YUV images.

Thanks to all the people involved in this effort!

As usual, if you would like to learn more about DMABuf, VA-API, GStreamer or any other open multimedia framework, contact us!

by vjaquez at August 08, 2023 01:00 PM

August 02, 2023

Pablo Saavedra

http503

This article will delve deeper into the intricacies of the GTK FrameClock, its interaction with the compositor, and how it ensures smooth and synchronized animations. Specifically, we will explore the GTK FrameClockIdle implementation and understand how it manages timing cycles and aligns them with VSync signals in the Wayland platform to optimize performance and enhance the user experience.

Over the last few days, I have been immersed in understanding the inner workings of the GTK FrameClock. This exploration is significant for a better understanding of the integration of animated applications. My focus in this post lies in comprehending two key aspects:

  • How the clock utilizes the system time to implement its ticks.
  • The synchronization mechanism the clock employs with the display refresh rate (VSync).

Notice: The article uses this code for the examples.

First steps. The overall view …

The gdk.FrameClock can be likened to a timing coordinator for a window within an application. It plays a vital role in informing the application when to update and repaint the window. By optionally syncing with the monitor’s refresh rate, it ensures smooth animations. Even without synchronization, the gdk.FrameClock aids in synchronizing painting operations, reducing unnecessary frames and optimizing performance. Additionally, the frame clock can pause painting when frames will not be visible, or adjust animation rates as needed.

When an application requests a frame, the frame clock processes it and emits signals for different phases. These signals help update animations. The phases of a FrameClock can be the following:

  1. Before Paint: This phase occurs before the painting process of a frame. It is a preparatory phase where the application can perform any necessary setup or calculations before the actual rendering.
  2. Update: The FrameClock updates the state of animations and other time-based elements. It signals the application to update the content of the frame.
  3. Layout: This phase involves the layout calculation, where the application organizes and positions the elements to be displayed in the frame.
  4. Draw: The application performs the actual rendering of the frame. It involves painting the content on the screen based on the updated layout.
  5. Paint: After the rendering is completed, this phase marks the end of the painting process. The frame is ready to be presented to the screen.
  6. After Paint: This phase follows the completion of painting and may involve additional clean-up or bookkeeping tasks related to the frame presentation.

These phases represent the sequence of events that a FrameClock typically goes through during the generation and presentation of a frame. A phase can also be requested manually with the gdk_frame_clock_request_phase() method.
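For instance, a widget that wants to be woken up in the update phase of the next frame can ask its clock for it; a minimal sketch:

GdkFrameClock *clock = gtk_widget_get_frame_clock (widget);

/* Ask the clock to run the UPDATE phase on the next frame; the clock will
 * then emit its "update" signal for that cycle. */
if (clock != NULL)
  gdk_frame_clock_request_phase (clock, GDK_FRAME_CLOCK_PHASE_UPDATE);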

GTK internally manages the concept of the frame drawn signal. This signal informs the gdk.FrameClock about the successful rendering and presentation of a frame on the screen by the compositor or windowing system. This signal is crucial as it allows the FrameClock to stay aligned with the monitor’s vertical refresh rate (VSync) whenever the signal is received. In the absence of the frame drawn signal, the frame clock cycles continue to occur at a constant cadence, providing regular updates to the application.

Understanding the cycle of time …

The frame time, given by gdk_frame_clock_get_frame_time(), is reported in microseconds and is similar to g_get_monotonic_time() but not the same. It doesn’t change while a frame is being painted and stays the same for similar calls outside of a frame. This makes sure that different animations timed using the frame time stay synchronized. Overall, gdk.FrameClock helps keep animations smooth and coordinated. The next output of the gtk-frame-clock-example application illustrates a complete frame generation cycle:

(s): Cycle start
               clock:on_before_paint
               clock:on_update
               get timings:
               |  - now: 1803864692852
               |  - frame time: 1803864709176 (counter: 75) (frame time - now: 16324)
               |  - predicted presentation time: 1803864726132 (predicted - now: 33280)
               \
1803864692852:  widget:on_tick_callback (rate: 16454)
               clock:on_layout
               get timings:
               |  - now: 1803864694022
               |  - frame time: 1803864709176 (counter: 75) (frame time - now: 15154)
               |  - predicted presentation time: 1803864726132 (predicted - now: 32110)
               \
1803864694022:  widget:on_draw (tick-draw latency: 1170)
               clock:on_paint
               clock:on_after_paint
               get timings:
               |  - now: 1803864709326
               |  - frame time: 1803864709176 (counter: 75) (frame time - now: -150)
               |  - predicted presentation time: 1803864726132 (predicted - now: 16806)
               \
1803864709326:  wl_surface:on_commit
(e): End of cycle

The sequence of events and timings associated with this example frame generation cycle is described below:

  1. (s) Cycle Start: The cycle begins, representing the start of a new frame generation cycle.
  2. clock:on_before_paint: The FrameClock emits the “on_before_paint” signal, indicating the preparation phase before painting the current frame.
  3. clock:on_update: The FrameClock emits the “on_update” signal, triggering an update for the current frame.
  4. get timings: Shows various timings related to the frame generation cycle. These timings include:
  • now: The current monotonic system time in microseconds.
  • frame time: The time allocated for rendering and painting this frame, along with a counter indicating the frame number.
  • predicted presentation time: The expected time when this frame will be presented on the screen.
  5. widget:on_tick_callback (rate: 16454): The application’s widget receives a tick callback, at an average interval of 16454 microseconds. This callback notifies the application about the right time to initiate the generation of the next frame. Usually the application decides here whether it has to update its animations, and queues a redraw (gtk_widget_queue_draw()).
  6. clock:on_layout: The FrameClock emits the “on_layout” signal, indicating the layout phase, where the application prepares the layout before painting the frame.
  7. widget:on_draw (tick-draw latency: 1170): The widget receives a draw callback, which indicates the right moment to paint the frame. The “tick-draw latency” measures the time delay between the tick callback and the actual drawing of the frame.
  8. clock:on_paint: The FrameClock emits the “on_paint” signal, marking the actual painting phase of the frame.
  9. clock:on_after_paint: The FrameClock emits the “on_after_paint” signal, indicating that the frame painting is completed.
  10. wl_surface:on_commit: This event indicates that the Wayland surface has been committed, meaning that the frame has been drawn and is ready for presentation.
  11. (e) End of Cycle: The cycle ends, representing the completion of the frame generation cycle.
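Tying this back to the frame time: inside a tick callback, an animation should sample gdk_frame_clock_get_frame_time() rather than g_get_monotonic_time(), so that everything animated in the same cycle sees exactly the same timestamp. A minimal sketch:

static gboolean
on_tick (GtkWidget *widget, GdkFrameClock *clock, gpointer user_data)
{
  /* Stable per-frame timestamp in microseconds: every animation sampled
   * during this cycle sees the same value. */
  gint64 frame_time = gdk_frame_clock_get_frame_time (clock);

  /* Derive a phase in [0, 1) for a one-second looping animation and
   * update the animation state from it here. */
  double phase = (frame_time % G_USEC_PER_SEC) / (double) G_USEC_PER_SEC;
  (void) phase;

  gtk_widget_queue_draw (widget);
  return G_SOURCE_CONTINUE;
}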

How does the FrameClock calculate the frame times …

When an animation begins, its first cycle might start at a random time due to external triggers like input events or timers. This phase shift, called the phase of the clock cycle start time, impacts the smoothness of animations.

During the first cycle, the smooth frame time is set at the cycle’s start time. Subsequent cycles may not align with vsync signals. However, once a frame drawn signal is received from the compositor, the clock cycles will synchronize with vsync signals, maintaining a regular cadence. This may cause the first vsync-related cycle to occur close to the previous non-vsync-related one, altering the phase of cycle start times.

To ensure consistent reported frame times, adjustments are made to the frame time. The phase of the first clock cycle start time is computed, considering skipped frames due to compositor stalls. The goal is to have the first vsync-related smooth time separated by exactly 1 frame interval from the previous one. This adjustment maintains regularity even if “frame drawn” signals are missed in subsequent frames.

In the next diagram from gdk/gdkframeclockidle.c#L468, the relationship between vsync signals, clock cycle starts, adjusted frame times, and “frame drawn” events is illustrated. The changing cadence of the clock cycles after the first vsync-related cycle is highlighted, while the regularity of the cycle cadence is maintained even if “frame drawn” events are absent in certain frames.

In the following diagram, '|' mark a vsync, '*' mark the start of a clock cycle, '+' is the adjusted
frame time, '!' marks the reception of *frame drawn* events from the compositor. Note that the clock
cycle cadence changed after the first vsync-related cycle. This cadence is kept even if we don't
receive a 'frame drawn' signal in a subsequent frame, since then we schedule the clock at intervals of
refresh_interval.

vsync             |           |           |           |           |           |... 
frame drawn       |           |           |!          |!          |           |...
cycle start       |       *   |       *   |*          |*          |*          |...
adjusted times    |       *   |       *   |       +   |       +   |       +   |...
phase                                      ^------^

You can find more in-depth information about how the FrameClock handles the adjustment of reported frame times in the comment in gdk/gdkframeclockidle.c. That is where the concept of frame drawn is introduced and explained in detail. As mentioned before, the frame drawn signal refers to whatever mechanism allows the FrameClock to know when a frame has been successfully drawn and presented on the screen by the compositor or windowing system.

Initially, the frame clock cycles occur at a regular interval, approximately matching the desired frame rate. These cycles are not directly tied to the monitor’s vertical refresh rate (VSync), but they will eventually be smoothly aligned with it once a frame drawn signal is received.

In the absence of the frame drawn signal, the frame clock cycles will continue to occur at a constant cadence. However, when the frame drawn signal is received from the compositor, it marks the successful completion of frame rendering and indicates that the frame clock cycles should align with the monitor’s VSync signals.

The frame drawn signal for GTK on the Wayland platform is the “frame.done” event. This represents the VSync for GTK in a Wayland environment. The FrameClock freezes and unfreezes (see the gdk_frame_clock_idle_is_frozen function in gdkframeclockidle.c#L279) as long as ticks are being accumulated and there is no “frame.done” callback invocation from the compositor. This is how it works in the particular case of Wayland, but a similar approach is used for X11 and the other platforms GTK supports.

How can I get a FrameClock for my widget?

Unfortunately, there are no public methods in the GTK API for the manual creation of FrameClock instances. The common way to get a frame clock for a GTK widget is to add it to a GTK window and then request the clock with gtk_widget_get_frame_clock() once the widget is realized:

static GdkFrameClock *frame_clock = NULL;

static void on_realize(GtkWidget* widget, gpointer user_data) {
    /* The frame clock is only available once the widget has been realized. */
    frame_clock = gtk_widget_get_frame_clock(widget);
}

GtkWidget *drawing_area = gtk_drawing_area_new();
gtk_container_add(GTK_CONTAINER(window), drawing_area);

g_signal_connect(drawing_area, "realize", G_CALLBACK(on_realize), NULL);

The obtained FrameClock will be the one created during the instantiation of a new GTK window:

#0  gdk_frame_clock_idle_init (frame_clock_idle=0x5555555e8140) at ../../../../gdk/gdkframeclockidle.c:137
#1  0x00007ffff7e67fba in g_type_create_instance (type=<optimized out>) at ../../../gobject/gtype.c:1929
#2  0x00007ffff7e4f0ed in g_object_new_internal (class=class@entry=0x5555555efa80, params=params@entry=0x0, n_params=n_params@entry=0) at ../../../gobject/gobject.c:2023
#3  0x00007ffff7e5034d in g_object_new_with_properties (object_type=<optimized out>, n_properties=0, names=names@entry=0x0, values=values@entry=0x0) at ../../../gobject/gobject.c:2193
#4  0x00007ffff7e50e51 in g_object_new (object_type=<optimized out>, first_property_name=first_property_name@entry=0x0) at ../../../gobject/gobject.c:1833
#5  0x00007ffff7ed8ba9 in gdk_window_new (parent=0x555555581110, attributes=0x7fffffffd770, attributes_mask=44) at ../../../../gdk/gdkwindow.c:1488
#6  0x00007ffff7f22c42 in create_foreign_dnd_window (display=0x55555557c0e0) at wayland/../../../../../gdk/wayland/gdkdevice-wayland.c:4803
#7  _gdk_wayland_device_manager_add_seat (wl_seat=<optimized out>, id=<optimized out>, device_manager=0x555555572e60) at wayland/../../../../../gdk/wayland/gdkdevice-wayland.c:5177
#8  _gdk_wayland_display_add_seat (version=<optimized out>, id=<optimized out>, display_wayland=0x55555557c0e0) at wayland/../../../../../gdk/wayland/gdkdisplay-wayland.c:238
#9  seat_added_closure_run (display_wayland=0x55555557c0e0, closure=<optimized out>) at wayland/../../../../../gdk/wayland/gdkdisplay-wayland.c:249
#10 0x00007ffff7f241d1 in process_on_globals_closures (display_wayland=0x55555557c0e0) at wayland/../../../../../gdk/wayland/gdkdisplay-wayland.c:209
#11 _gdk_wayland_display_open (display_name=<optimized out>) at wayland/../../../../../gdk/wayland/gdkdisplay-wayland.c:621
#12 0x00007ffff7ec268f in gdk_display_manager_open_display (manager=<optimized out>, name=0x0) at ../../../../gdk/gdkdisplaymanager.c:462
#13 0x00007ffff784ed4b in gtk_init_check (argc=<optimized out>, argv=<optimized out>) at ../../../../gtk/gtkmain.c:1110
#14 gtk_init_check (argc=<optimized out>, argv=<optimized out>) at ../../../../gtk/gtkmain.c:1102
#15 0x00007ffff784ed7d in gtk_init (argc=<optimized out>, argv=<optimized out>) at ../../../../gtk/gtkmain.c:1167
#16 0x0000555555557144 in main (argc=1, argv=0x7fffffffda88) at /home/user/local/git/examples/example_gdk_frame_clock/src/main.c:176

When is the right time to paint my widget?

Overall, the following code demonstrates how to set up a basic drawing area in a GTK application, connect pre-frame and drawing callbacks, and handle custom graphics rendering using the Cairo library. The on_tick_callback ensures that the widget is scheduled for redraw, and the on_draw function is responsible for actually rendering the graphics within the widget:

/*
 * This callback is invoked before each frame (in the UPDATE PHASE); here the widget is scheduled to be redrawn in the PAINT PHASE of the current or the next frame.
 */
static int on_tick_callback(GtkWidget *widget, GdkFrameClock *frame_clock, gpointer user_data) {
    // Schedules this widget to be redrawn in the paint phase of the current or the next frame.
    gtk_widget_queue_draw(GTK_WIDGET(user_data));
    // ...
    return G_SOURCE_CONTINUE;
}

/*
 * This signal is emitted when a widget is supposed to render itself in the PAINT PHASE.
 */
static gboolean on_draw(GtkWidget *widget, cairo_t *cr, gpointer user_data) {
    // Your drawing operations here. E.g: cairo_paint(cr);
    // ...
    return FALSE;
}

// ...

GtkWidget *drawing_area = gtk_drawing_area_new();
gtk_container_add(GTK_CONTAINER(window), drawing_area);

gtk_widget_add_tick_callback(GTK_WIDGET(drawing_area), on_tick_callback, drawing_area, NULL);
g_signal_connect(drawing_area, "draw", G_CALLBACK(on_draw), NULL);

// ...

The FrameClock will notify the widget when it is the right time to schedule the generation of a new frame. This happens in the update phase:

#0  gtk_widget_on_frame_clock_update (frame_clock=0x5555555e74c0, widget=0x5555555a8530) at ../../../../gtk/gtkwidget.c:5273
#4  0x00007ffff7e5c863 in <emit signal ??? on instance ???> (instance=instance@entry=0x5555555e74c0, signal_id=<optimized out>, detail=detail@entry=0) at ../../../gobject/gsignal.c:3587
    #1  0x00007ffff7e3ed2f in g_closure_invoke (closure=0x5555559fd320, return_value=0x0, n_param_values=1, param_values=0x7fffffffd540, invocation_hint=0x7fffffffd4c0) at ../../../gobject/gclosure.c:830
    #2  0x00007ffff7e5ac36 in signal_emit_unlocked_R
    (node=node@entry=0x5555555aa000, detail=detail@entry=0, instance=instance@entry=0x5555555e74c0, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7fffffffd540)
    at ../../../gobject/gsignal.c:3777
    #3  0x00007ffff7e5c614 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fffffffd6f0) at ../../../gobject/gsignal.c:3530
#5  0x00007ffff7ed0b57 in _gdk_frame_clock_emit_update (frame_clock=0x5555555e74c0) at ../../../../gdk/gdkframeclock.c:645
#6  gdk_frame_clock_paint_idle (data=0x5555555e74c0) at ../../../../gdk/gdkframeclockidle.c:547
#7  0x00007ffff7ebd2ad in gdk_threads_dispatch (data=0x55555578b140, data@entry=<error reading variable: value has been optimized out>) at ../../../../gdk/gdk.c:769
#8  0x00007ffff6b032c8 in g_timeout_dispatch (source=0x555555677120, callback=<optimized out>, user_data=<optimized out>) at ../../../glib/gmain.c:4973
#9  0x00007ffff6b02c44 in g_main_dispatch (context=0x55555558e800) at ../../../glib/gmain.c:3419
#10 g_main_context_dispatch (context=0x55555558e800) at ../../../glib/gmain.c:4137
#11 0x00007ffff6b58258 in g_main_context_iterate.constprop.0 (context=0x55555558e800, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4213
#12 0x00007ffff6b022b3 in g_main_loop_run (loop=0x5555558d6d30) at ../../../glib/gmain.c:4413
#13 0x00007ffff7848d2d in gtk_main () at ../../../../gtk/gtkmain.c:1329
#14 0x0000555555557276 in main (argc=1, argv=0x7fffffffda88) at /home/user/local/git/examples/example_gdk_frame_clock/src/main.c:194

gtk_widget_add_tick_callback() actually attaches the callback to the update signal of the FrameClock. Therefore, it basically queues the animation updates and attaches a callback executed before each frame (check the implementation of gtk_widget_add_tick_callback() in gtk/gtkwidget.c):

guint
gtk_widget_add_tick_callback (GtkWidget       *widget,
                              GtkTickCallback  callback,
                              gpointer         user_data,
                              GDestroyNotify   notify)
{
  // ...
      frame_clock = gtk_widget_get_frame_clock (widget);

      if (frame_clock)
        {
          priv->clock_tick_id = g_signal_connect (frame_clock, "update",
                                                  G_CALLBACK (gtk_widget_on_frame_clock_update),
                                                  widget);
          gdk_frame_clock_begin_updating (frame_clock);
        }
  // ...
  info = g_new0 (GtkTickCallbackInfo, 1);
  // info->...
  info->callback = callback;
  // info->...
  priv->tick_callbacks = g_list_prepend (priv->tick_callbacks,
                                         info);
  // ...
}

The callback runs frequently, matching the output device’s frame rate or the app’s repaint speed, whichever is slower. Inside the on_tick_callback() function, gtk_widget_queue_draw() is called to schedule the specified widget for redrawing during the current or next frame’s paint phase. This means the widget will be marked for update, and its draw signal will be emitted.

Lastly, the on_draw() callback is responsible for rendering and drawing on a widget during the paint phase. This callback takes three parameters: a GtkWidget pointer (widget), a cairo_t context pointer (cr) for drawing operations, and a user data pointer (user_data). This callback is the right place to perform drawing operations using the Cairo drawing library, for example.

Keeping the ticks aligned with the VSync signals

As I already mentioned, the GTK framework internally handles the concept of the frame drawn signal. This signal lets the gdk.FrameClock know when a frame has been successfully rendered and presented on the screen. This is vital for keeping the FrameClock in sync with the monitor’s refresh rate (VSync) upon receiving the signal. With the frame drawn signal, the frame clock maintains a consistent cycle, delivering regular updates to the application.

In the context of a GTK application running in a Wayland environment, this sync is implemented by adding a listener for the .done event of the wl_surface_commit() action. This is the frame_callback callback added from on_frame_clock_after_paint() in gdk/wayland/gdkwindow-wayland.c:

static void
on_frame_clock_after_paint (GdkFrameClock *clock,
                            GdkWindow     *window)
{
  // ...
  if (impl->surface_callback == NULL)
    {
      callback = wl_surface_frame (impl->display_server.wl_surface);
      wl_callback_add_listener (callback, &frame_listener, window);  // <-- Here
      impl->surface_callback = callback;
    }
  // ...
static void
frame_callback (void               *data,
                struct wl_callback *callback,
                uint32_t            time)
{
  // ...
  _gdk_frame_clock_thaw (clock);  
  // ...
}

The frame_callback will be called when the server has finished processing the surface commit and has made the changes visible on the screen. This function immediately thaws the FrameClock.

The term thaw refers to the process of unfreezing the FrameClock after it has been frozen. When a FrameClock is frozen, it means that the clock is temporarily paused or halted, preventing the generation of new frame ticks and updates.

When the FrameClock thaws, it resumes its normal operation of generating frame ticks and updates. This is typically done when the application determines that it needs to resume animations or updates that were previously paused.

GTK uses this freezing mechanism to optimize performance and reduce unnecessary updates during periods when animations or updates are not needed, or to align the generation of the next frame with the presentation time of the current frame when limited by the monitor’s vertical refresh rate (VSync).

The following is a GDB backtrace from the example code with a breakpoint added in the frame_callback() function:

(gdb) b frame_callback
Breakpoint 3 at 0x7ffff7f2e100: file wayland/../../../../../gdk/wayland/gdkwindow-wayland.c, line 570.
(gdb) c
Continuing.
(s): Cycle start
               clock:on_before_paint
               clock:on_update
               get timings:
               |  - now: 1903492480109
               |  - frame time: 1903492494736 (counter: 11271) (frame time - now: 14627)
               |  - predicted presentation time: 1903492505049 (predicted - now: 24940)
               \
1903492480109:  widget:on_tick_callback (rate: 337853788)
               get timings:
               |  - now: 1903492480316
               |  - frame time: 1903492494736 (counter: 11271) (frame time - now: 14420)
               |  - predicted presentation time: 1903492505049 (predicted - now: 24733)
               \
1903492480316:  widget:on_draw (tick-draw latency: 207)
               clock:on_paint
               clock:on_after_paint
               get timings:
               |  - now: 1903492499890
               |  - frame time: 1903492494736 (counter: 11271) (frame time - now: -5154)
               |  - predicted presentation time: 1903492505049 (predicted - now: 5159)
               \
1903492499890:  wl_surface:on_commit
(e): End of cycle
Thread 1 "gtk-frame-clock" hit Breakpoint 3, frame_callback (data=0x555555581ad0, callback=0x555555bd2420, time=1903492499) at wayland/../../../../../gdk/wayland/gdkwindow-wayland.c:570
570    {
(gdb) bt
#0  frame_callback (data=0x555555581ad0, callback=0x555555bd2420, time=1903492499) at wayland/../../../../../gdk/wayland/gdkwindow-wayland.c:570
#1  0x00007ffff66e9e2e in ffi_call_unix64 () at ../src/x86/unix64.S:105
#2  0x00007ffff66e6493 in ffi_call_int (cif=<optimized out>, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>, closure=<optimized out>) at ../src/x86/ffi64.c:672
#3  0x00007ffff74cdad0 in wl_closure_invoke (closure=closure@entry=0x5555555e1ac0, target=<optimized out>, target@entry=0x555555bd2420, opcode=opcode@entry=0, data=<optimized out>, flags=<optimized out>) at ../src/connection.c:1025
#4  0x00007ffff74ce243 in dispatch_event (display=display@entry=0x555555575220, queue=0x5555555752f0, queue=<optimized out>) at ../src/wayland-client.c:1583
#5  0x00007ffff74ce43c in dispatch_queue (queue=0x5555555752f0, display=0x555555575220) at ../src/wayland-client.c:1729
#6  wl_display_dispatch_queue_pending (display=0x555555575220, queue=0x5555555752f0) at ../src/wayland-client.c:1971
#7  0x00007ffff74ce490 in wl_display_dispatch_pending (display=<optimized out>) at ../src/wayland-client.c:2034
#8  0x00007ffff7f25548 in _gdk_wayland_display_queue_events (display=<optimized out>) at wayland/../../../../../gdk/wayland/gdkeventsource.c:201
#9  0x00007ffff7ec0a99 in gdk_display_get_event (display=0x55555557c0e0) at ../../../../gdk/gdkdisplay.c:442
#10 0x00007ffff7f2a996 in gdk_event_source_dispatch (base=<optimized out>, callback=<optimized out>, data=<optimized out>) at wayland/../../../../../gdk/wayland/gdkeventsource.c:120
#11 0x00007ffff6b02d3b in g_main_dispatch (context=0x55555558e800) at ../../../glib/gmain.c:3419
#12 g_main_context_dispatch (context=0x55555558e800) at ../../../glib/gmain.c:4137
#13 0x00007ffff6b58258 in g_main_context_iterate.constprop.0 (context=0x55555558e800, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4213
#14 0x00007ffff6b022b3 in g_main_loop_run (loop=0x555555825be0) at ../../../glib/gmain.c:4413
#15 0x00007ffff7848d2d in gtk_main () at ../../../../gtk/gtkmain.c:1329
#16 0x00005555555570a2 in main (argc=1, argv=0x7fffffffda88) at /home/user/local/git/examples/example_gdk_frame_clock/src/main.c:198
(gdb) c
Continuing.
(s): Cycle start
               clock:on_before_paint
               clock:on_update
               get timings:
               |  - now: 1903498816343
               |  - frame time: 1903498827058 (counter: 11272) (frame time - now: 10715)
               |  - predicted presentation time: 1903498841278 (predicted - now: 24935)
               \
1903498816343:  widget:on_tick_callback (rate: 6336234)
               get timings:
               |  - now: 1903498816550
               |  - frame time: 1903498827058 (counter: 11272) (frame time - now: 10508)
               |  - predicted presentation time: 1903498841278 (predicted - now: 24728)
               \
1903498816550:  widget:on_draw (tick-draw latency: 207)
               clock:on_paint
               clock:on_after_paint
               get timings:
               |  - now: 1903498830894
               |  - frame time: 1903498827058 (counter: 11272) (frame time - now: -3836)
               |  - predicted presentation time: 1903498841278 (predicted - now: 10384)
               \
1903498830894:  wl_surface:on_commit
(e): End of cycle

Here’s what’s happening:

  1. A breakpoint at line 570 of the gdkwindow-wayland.c file was set inside the frame_callback function.
  2. The program continues (c) and enters a FrameClock cycle (from “Cycle start”).
  3. The FrameClock goes through different phases like on_before_paint, on_update, and so on, collecting timings including current time, frame time, and predicted presentation time.
  4. The program hits the breakpoint at line 570 (frame_callback) as part of the cycle. The backtrace (bt) shows the function call stack, indicating that the frame_callback was triggered due to a Wayland event.
  5. The program again continues (c) and progresses through the FrameClock phases.

This output represents the cycle of the FrameClock in the example code program, with debug information showing the specific phases, timings, and the function calls being executed.

So… in conclusion

The FrameClock in GTK acts as a coordinator for frame updates and animations. It triggers a series of phases for each frame, including Before Paint, Update, Layout, Draw, Paint, and After Paint. These phases ensure that animations are smoothly coordinated and displayed, optimizing performance.

The concept of the frame drawn signal is crucial for achieving synchronization with the display’s VSync. This signal is emitted when a frame has been successfully presented on the screen by the compositor or windowing system. It allows the FrameClock to adjust its cycle and stay in harmony with the monitor’s refresh rate. This synchronization is achieved in the Wayland platform by utilizing the wl_surface commit done callback, which corresponds to the VSync signal.

There is no way to create your own FrameClock instance for your app. GTK applications that want to work with the FrameClock should first add their widgets to a GTK window and then obtain the FrameClock instance using gtk_widget_get_frame_clock().
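
As a sketch of that usage (GTK 3 API; the handler names are hypothetical): once the widget is realized, you can grab its clock and, for instance, inspect the presentation timings after each paint.

static void
on_after_paint (GdkFrameClock *clock, gpointer user_data)
{
  /* After the paint phase, the timings of the current frame are
   * available; "complete" means the compositor reported presentation. */
  GdkFrameTimings *timings = gdk_frame_clock_get_current_timings (clock);
  if (timings != NULL && gdk_frame_timings_get_complete (timings))
    g_print ("presented at %" G_GINT64_FORMAT "\n",
             gdk_frame_timings_get_presentation_time (timings));
}

static void
watch_frame_clock (GtkWidget *widget)
{
  /* Only valid after the widget has been added to a window and
   * realized; before that there is no frame clock. */
  GdkFrameClock *clock = gtk_widget_get_frame_clock (widget);
  g_signal_connect (clock, "after-paint",
                    G_CALLBACK (on_after_paint), NULL);
}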

And that’s basically all I have. My motivation for digging into this was to get a better understanding of the GTK FrameClock and how it works internally to produce smooth, synchronized animations in graphical applications. I hope this analysis is useful for others who are also curious about how this works.

by Pablo Saavedra at August 02, 2023 03:44 PM

July 28, 2023

José Dapena

Speeding up V8 heap snapshot

My last post, Javascript memory profiling with heap snapshot, finished announcing I would write a follow up post about several optimizations I implemented that make heap snapshot faster.

Good news! The post has been accepted in V8.dev! You can read it here.

by José Dapena Paz at July 28, 2023 05:07 PM

July 25, 2023

Stéphane Cerveau

Discover GStreamer full static mode

How to statically embed your own tailored version of GStreamer in your application #

Since the gstreamer-full effort, it has been possible to create a shared library that embeds the GStreamer framework along with a chosen set of plugins.

As part of this effort, the selected plugins/features could also be registered automatically by calling gst_init() in an application linked against gstreamer-full.

This approach offered a gstreamer-full package with the library, headers, and pkg-config files, but it was not possible to embed GStreamer statically in your application and use it transparently.

GstVkVideoParser: a standalone solution #

In the journey to bring an open source solution for a video parser to the Vulkan Conformance Test Suite, we first chose GStreamer, as it brought all the parsing facilities necessary to support the needed codecs such as H26x or VPx. This solution was also supposed to be cross-platform and to drag in as few system dependencies as possible. Since GStreamer usually brings its own dependencies, such as glib or orc, and we wanted a standalone GstVkVideoParser library supported on Windows, a little bit of work and love was necessary to add this to GStreamer.

Unfortunately, this solution was not retained by the Vulkan Video TSG: not because it did not work, but because another parser was made available that was easy to integrate into the CTS at the source level, avoiding binary linkage; see the Vulkan Video change.

GStreamer as a full static library #

With the gstreamer-full work, everything was almost ready to use, except for having gstreamer-full as a real static library that any application could link against.

Here is the merged MR, and the challenges that were taken up:

Adding gst-full-target-type=static #

To generate the gstreamer-full dependency that will be statically linked into the application, we decided to introduce a new GStreamer meson option, gst-full-target-type.

By default, gstreamer-full is built as a shared library, as before.

By passing gst-full-target-type=static, only static objects are generated, and a pkg-config file is produced for gstreamer-full so that the application does not need to know which static libraries to add to the link line. The GStreamer build system takes care of enabling/disabling the features/libraries you (do/don't) need.

Initialize the plugins/features automatically #

To avoid the multiple calls otherwise necessary to initialize GStreamer, gst_init_static_plugins() also has to be called along with gst_init(), but only in full-static mode; this, however, led to a build issue.

Indeed, most of the tools/examples/tests link with libgstreamer-1.0, which owns gst_init(), but to facilitate the plugin registration, it was necessary to move all the tools' build after the gstreamer-full stage. A first MR was merged to let the GStreamer tools be built against gstreamer-full, but additional work was necessary for some core tools or helpers, such as gst-transcoder or gst-plugin-scanner, to avoid a linking issue.

Disable tests and examples #

In future work, all the tools/examples/tests should support the full-static mode, but as GStreamer aims to be a shared-object framework, we decided to leave this for later and disable all the examples/tests in full-static mode, since most applications using a tailored build won't need them.

Windows support #

One of the goals of this work was to provide the Vulkan CTS with a dependency-free Windows library, which has been achieved, but some additional work might be necessary to support all of the use cases the GStreamer framework offers, especially library-dependent plugins.

Give me an example ... #

In the GstVkVideoParser project, various jobs build Linux and Windows versions, generating a library without any GStreamer/glib dependencies; everything is embedded inside the library, as you can see in these GitHub Actions.

In this project, GStreamer is used as a meson subproject/wrap, which allows GStreamer to be built along with GstVkVideoParser. This can easily be done by adding the following file to your meson project:

subprojects/gstreamer-1.0.wrap

[wrap-git]
directory=gstreamer-1.0
url=https://gitlab.freedesktop.org/gstreamer/gstreamer.git
revision=main

[provide]
dependency_names = gstreamer-1.0, gstreamer-base-1.0, gstreamer-video-1.0, gstreamer-audio-1.0

and then add the following line to your meson.build to depend on gstreamer-full:

meson.build

gstreamer_full_dep = dependency('gstreamer-full-1.0', fallback: ['gstreamer-1.0'], required: true)

In order to build a project, library, or application that uses a tailored version of GStreamer, you can follow this configure example:

$ meson buildfull-static --default-library=static --force-fallback-for=gstreamer-1.0,glib,libffi,pcre2 -Dauto_features=disabled -Dglib:tests=false -Djson-glib:tests=false -Dpcre2:test=false -Dvkparser_standalone=enabled -Dgstreamer-1.0:libav=disabled -Dgstreamer-1.0:ugly=disabled -Dgstreamer-1.0:ges=disabled -Dgstreamer-1.0:devtools=disabled -Dgstreamer-1.0:default_library=static -Dgstreamer-1.0:rtsp_server=disabled -Dgstreamer-1.0:gst-full-target-type=static_library -Dgstreamer-1.0:gst-full-libraries=gstreamer-video-1.0,gstreamer-audio-1.0,gstreamer-app-1.0,gstreamer-codecparsers-1.0 -Dgst-plugins-base:playback=enabled -Dgst-plugins-base:app=enabled -Dgst-plugins-bad:videoparsers=enabled -Dgst-plugins-base:typefind=enabled

In this case we are disabling everything in GStreamer with -Dauto_features=disabled, explicitly disabling features such as ges, libav, and so on, and enabling only the plugins we need: playback, app, videoparsers and typefind.

And finally we are enabling the static build with --default-library=static and -Dgstreamer-1.0:gst-full-target-type=static_library.
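
To show what the application side looks like, here is a minimal sketch of a program linked against such a gstreamer-full static build (the element names are just an example; they are only available if the corresponding plugins were selected in the tailored build):

#include <gst/gst.h>

int
main (int argc, char **argv)
{
  /* With gst-full-target-type=static, gst_init() also registers the
   * plugins selected at build time; no plugin scanning is needed. */
  gst_init (&argc, &argv);

  GstElement *pipeline =
      gst_parse_launch ("videotestsrc num-buffers=30 ! fakesink", NULL);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  /* Wait until the stream ends or an error is posted on the bus. */
  GstBus *bus = gst_element_get_bus (pipeline);
  GstMessage *msg =
      gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
                                  GST_MESSAGE_EOS | GST_MESSAGE_ERROR);
  if (msg != NULL)
    gst_message_unref (msg);

  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (bus);
  gst_object_unref (pipeline);
  return 0;
}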

Next ... #

As you can see, it's now quite easy to build an application that depends on a gstreamer-full static build, but there are still some issues to address, such as plugin dependencies that might not be static, and other platform-specific issues such as gstreamer-full symbol export on Windows.

You can follow some open issues such as:

As usual, if you would like to learn more about Vulkan Video, GStreamer or any other open multimedia framework, please contact us!

July 25, 2023 12:00 AM

July 19, 2023

Ziran Sun

WASH in Schools

Madina Tindano is an elementary school student who lives in Bogandé, East Burkina Faso. For a long time, the toilets in her school remained unusable and completely abandoned. Now in her final year at the school (CM2), Madina is overjoyed by the renovation of the latrines in her school, thanks to the project promoting children’s right to education through better access to water, sanitation and hygiene (WASH).

The NGO behind the WASH Project is the UNICEF Foundation in Spain. As one of their educational projects, WASH believes that every child has the right to a quality education, including access to drinking water, sanitation and hygiene services while at school. This can impact students’ learning, health, and dignity, particularly for girls like Madina. “When I see my period, I am very embarrassed to come to school because we don’t have toilets,” Madina said. WASH aims to improve access to water, sanitation and hygiene in 12 rural schools, including Madina’s, in the East region of Burkina Faso. 2796 students, 77 teachers and 48 parents will benefit from this project.

Igalia has been collaborating with the UNICEF Foundation in Spain since 2007, promoting access to quality education for children in Africa, and is proud to have been a part of this effort by contributing funds to help 2 rural schools, benefiting 430 students, 11 teachers and 7 parents. To make sure the project goes as planned, a monitoring commission has been formed, including some UNICEF members and representatives from Igalia (Javier Fernández and María Piñeiro).

The project started in February 2022 and was expected to finish in a year’s time. Unfortunately, the implementation of the project was affected by the insecurity situation in Gnagna province, which caused the closure of the initially pre-selected schools (our hearts go with the children, teachers and parents from these schools; we hope things work out for the best for them). For this reason, activities were reoriented towards new schools located in a safer area of the same province, the commune of Bogande, and the project was extended until the end of June. Apart from this initial delay, the project has been progressing very well. To achieve its goals, WASH has managed to get students, local authorities and communities, local organizations and the private sector involved throughout the project. The following work has been carried out:

  • Improving access to sustainable water and sanitation facilities by constructing and rehabilitating water points and latrines, also via distribution of WASH kits.
  • Increasing knowledge of good hygiene practices by providing training for Hygiene Clubs, implementing the Schools Action Plan and running awareness-raising campaigns.
  • Strengthening schools and communities’ capacities by providing training to Parents’ Associations and Teachers.

Doesn’t this joyful smile make you feel happy too? :-).

by zsun at July 19, 2023 09:36 AM

July 08, 2023

Philippe Normand

GNOME Web Canary is back

This is a short PSA post announcing the return of the GNOME Web Canary builds. Read on for the crunchy details.

A couple of years ago I blogged about the GNOME Web Canary flavor. In summary, this special build of GNOME Web provides a preview of the upcoming version of the underlying WebKitGTK engine; it is potentially unstable, but allows for testing features that have not yet shipped in a stable release.

Unfortunately, Canary broke right after GNOME Web switched to GTK4, because back then the WebKit CI was missing build bots and infrastructure for hosting WebKitGTK4 build artefacts. Recently, thanks to the efforts of my Igalia colleagues Pablo Abelenda, Lauro Moura, Diego Pino and Carlos López, the WebKit CI provides WebKitGTK4 build artefacts, hosted on a server kindly provided by Igalia.

The installation instructions are already mentioned in the introductory post, but I’ll repeat them here:

flatpak --user remote-add --if-not-exists webkit https://software.igalia.com/flatpak-refs/webkit-sdk.flatpakrepo
flatpak --user install https://nightly.gnome.org/repo/appstream/org.gnome.Epiphany.Canary.flatpakref

Update:

If you installed the older, pre-GTK4 version of Canary, you might see an error related to an expired GPG key. This is due to how I update the WebKit runtime, and I’ll try to avoid it in future updates. For the time being, you can remove the flatpak remote and re-add it:

flatpak --user remote-delete webkit
flatpak --user remote-add webkit https://software.igalia.com/flatpak-refs/webkit-sdk.flatpakrepo

That’s all folks, happy hacking and happy testing.

by Philippe Normand at July 08, 2023 03:30 PM

June 30, 2023

Igalia Compilers Team

Porting BOLT to RISC-V

Recently, initial support for RISC-V has landed in LLVM's BOLT subproject. Even though the current functionality is limited, it was an interesting experience of open source development to get to this point. In this post, I will talk about what BOLT is, what it takes to teach BOLT how to process RISC-V binaries, and the interesting detours I sometimes had to make to get this work upstream.

BOLT overview #

BOLT (Binary Optimization and Layout Tool) is a post-link optimizer whose primary goal is to improve the layout of binaries. It uses sample-based profiling to improve the performance of already fully-optimized binaries. That is, the goal is to be complementary to existing optimization techniques like PGO and LTO, not to replace them.

Sample-based profiling is used in order to make it viable to obtain profiles from production systems as its overhead is usually negligible compared to profiling techniques based on instrumentation. Another advantage is that no special build configuration is needed and production binaries can directly be profiled. The choice for binary optimization (as opposed to, say, optimizing at the IR level) comes from the accuracy of the profile data: since the profile is gathered at the binary level, mapping it back to a higher level representation of the code can be a challenging problem. Since code layout optimizations can quite easily be applied at the binary level, and the accuracy of the profile is highest there, the choice for performing post-link optimization seems to be a logical one.

To use BOLT, you need to give it access to a binary and a corresponding profile. As mentioned before, the goal is to optimize production binaries, so no special build steps are required. The only hard requirement is that the binary contains a symbol table (so stripped binaries are not supported). In order for BOLT to be able to rearrange functions (in addition to the code within functions), it needs access to relocations. Linkers usually remove relocations from the final binary but can be instructed to keep them using the --emit-relocs flag. For best results, it is recommended to link your binaries with this flag.

Gathering a profile on Linux systems can be done in the usual way using perf. BOLT provides the necessary tools to convert perf output to an appropriate format, and to combine multiple profiles. On systems where perf is not available, BOLT can also instrument binaries to create profiles. For more information on how to use BOLT, see the documentation.
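
As a rough sketch of that workflow (flag names as in the BOLT documentation; they may vary by version, and app is a placeholder binary):

# Link with relocations preserved so BOLT can also reorder functions.
$ clang -O2 -Wl,--emit-relocs -o app app.c

# Collect a sample-based profile; -j any,u records branch data (LBR).
$ perf record -e cycles:u -j any,u -o perf.data -- ./app

# Convert the perf profile to BOLT's format.
$ perf2bolt -p perf.data -o perf.fdata ./app

# Rewrite the binary with a profile-guided layout.
$ llvm-bolt app -o app.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort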

For more details on BOLT, including design decisions and evaluation, see the CGO'19 paper. Let's move on to discuss some of BOLT's internals to understand what is needed to support RISC-V.

BOLT internals #

Optimizing the layout of a binary involves shuffling code around. The biggest challenge in doing this is making sure that all code references are still correct. Indeed, moving a function or basic block to a different location means changing its address, and all jumps, calls, or other references to it need to be updated accordingly.

To do this correctly, BOLT's rewriting pipeline transforms binaries in the following (slightly simplified) way:

  1. Function discovery: using (mostly) the ELF symbol table, the boundaries of functions are recorded;
  2. Disassembly: using LLVM's MC-layer, function bodies are disassembled into lists of MCInst objects;
  3. CFG construction: basic blocks are discovered in the instruction lists and references between them resolved, resulting in a control-flow graph for each function;
  4. Optimizations: using the CFG, basic block and function layout is optimized based on the profile;
  5. Assembly: the new layout is emitted, using LLVM's MCStreamer API, to an ELF object file in memory;
  6. Link: since this object file might still contain external references, it is linked to produce the final binary.

Some of these steps are completely architecture independent. For example, function discovery only needs the ELF symbol table. Others do need architecture specific information. Fortunately, BOLT has supported multiple architectures from the beginning (X86-64 and AArch64) so an abstraction layer exists that makes it relatively straightforward to add a new target. Let's talk about what is needed to teach BOLT to transform RISC-V binaries.

Teaching BOLT RISC-V #

Thanks to BOLT's architecture abstraction layer, adding support for a new target turned out to be mostly straightforward. I will go over the parts of BOLT's rewriting pipeline that need architecture-specific information while focusing on the aspects of RISC-V that made this slightly tricky sometimes.

(Dis)assembly #

Assembly and disassembly of binaries is obviously architecture-dependent. BOLT uses various MC-layer LLVM APIs to perform these tasks. More specifically, MCDisassembler is used for disassembly while MCAssembler is used (indirectly via MCObjectStreamer) for assembly. The good news is that there is excellent RISC-V support in the MC-layer so this can readily be used by BOLT.

CFG construction #

The result of disassembly is a linear list of instructions in the order they appear in the binary. In the MC-layer, instructions are represented by MCInst objects. In this representation, instructions essentially consist of an opcode and a list of operands, where operands could be registers, immediates, or more high-level expressions (MCExpr). Expressions can be used, for example, to refer to symbolic program locations (i.e., labels) instead of using constant immediates.

Right after disassembly, however, all operands will be registers or immediates. For example, an instruction like

jal ra, f

will be disassembled into (heavy pseudo-code here)

MCInst(RISCV::JAL, [RISCV::X1, ImmOffset])

where ImmOffset is the offset from the jal instruction to f. This is not convenient to handle in BOLT as nothing indicates that this MCInst actually refers to f.

Therefore, BOLT post-processes instructions after disassembly and replaces immediates with symbolic references where appropriate. Two different mechanisms are used to figure out the address an instruction refers to:

  • For control-transfer instructions (e.g., calls and branches), MCInstrAnalysis is used to evaluate the target. LLVM's RISC-V backend already contained an appropriate implementation for this.
  • For other instructions (e.g., auipc/addi pairs to load an address in RISC-V), relocations are used. For this, BOLT's Relocation class had to be extended to support RISC-V ELF relocations.

Once the target of an instruction has been determined, BOLT creates an MCSymbol at that location and updates the MCInst to point to that symbol instead of an immediate offset.

One question remains: how does BOLT detect control-transfer instructions? Let's first discuss how BOLT creates the control-flow graph now that all instructions symbolically refer to their targets.

A CFG is a directed graph where the nodes are basic blocks and the edges are control-flow transfers between those basic blocks. Without going into details, BOLT has a target-independent algorithm to create a CFG from a list of instructions (for those interested, you can find it here). It needs some target-specific information about instructions though. For example:

  • Terminators are instructions that end a basic block (e.g., branches and returns but not calls).
  • Branches and jumps are the instructions that create edges in the CFG.

To get this information, BOLT relies again on MCInstrAnalysis which provides methods such as isTerminator and isCall. These methods can be specialized by specific LLVM backends but the default implementation relies on the MCInstrDesc class. Objects of this class are generated by various TableGen files in the backends (e.g., this one for RISC-V). An important property of MCInstrDesc for the next discussion is that its information is based only on opcodes, operands are not taken into account.

LLVM's RISC-V backend did not specialize MCInstrAnalysis, so BOLT was relying on MCInstrDesc to get information about terminators and branches. For many targets (e.g., X86) this might actually be fine, but for RISC-V it causes problems. For example, take a jal instruction: is this a terminator, a branch, a call? Based solely on the opcode, we cannot actually answer these questions because jal is used both for direct jumps (terminator) and function calls (non-terminator).

The solution to this problem was to specialize MCInstrAnalysis for RISC-V, taking the calling convention into account (a rough sketch follows the list):

  • jal zero, ... is an unconditional branch (return address discarded);
  • jal ra, ... is a call (return address stored in ra (x1) which the calling convention designates as the return address register);
  • Some more rules for jalr, compressed instructions, detecting returns,...
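
To make that concrete, here is a rough C++ sketch of the idea (simplified from what the actual upstream patch in LLVM's RISC-V MC layer does; the real code also handles jalr, compressed instructions, and return detection):

// Sketch: classify jal by its destination register; the opcode alone
// cannot distinguish a plain jump from a function call.
class RISCVMCInstrAnalysis : public MCInstrAnalysis {
public:
  explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info)
      : MCInstrAnalysis(Info) {}

  bool isCall(const MCInst &Inst) const override {
    if (Inst.getOpcode() == RISCV::JAL)
      return Inst.getOperand(0).getReg() == RISCV::X1; // ra kept: a call
    return MCInstrAnalysis::isCall(Inst);
  }

  bool isUnconditionalBranch(const MCInst &Inst) const override {
    if (Inst.getOpcode() == RISCV::JAL)
      return Inst.getOperand(0).getReg() == RISCV::X0; // ra discarded: a jump
    return MCInstrAnalysis::isUnconditionalBranch(Inst);
  }
};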

So the first patch that landed to pave the way for RISC-V support in BOLT was not in the BOLT project but in the RISC-V MC-layer.


With this in place, the patch to add a RISC-V target to BOLT consisted mainly of implementing the necessary relocations and implementing the architecture abstraction layer. The latter consisted mainly of instruction manipulation (e.g., updating branch targets), detecting some types of instructions not supported by MCInstrAnalysis (e.g., nops), and analyzing RISC-V-specific Procedure Linkage Table (PLT) entries (so BOLT knows which function they refer to). Once I started to understand the internals of BOLT, this was relatively straightforward. After iterating over the patch with the BOLT maintainers (who were very helpful and responsive during this process), it got accepted in less than a month.

There was just one minor issue to resolve.

Linking #

The final step in the rewriting pipeline is linking the generated object file. BOLT is able to rely on LLVM again by using the RuntimeDyld JIT linker which is part of the MCJIT project. Unfortunately, there was no RISC-V support yet in RuntimeDyld. Looking at the supported targets, it seemed easy enough to implement RISC-V support: I just needed to implement the few relocations that BOLT emits. So I submitted a patch.

Alas, it seemed that things might not be as easy as I hoped:

Is there something preventing Bolt from moving to ORC / JITLink? If Bolt is able to move over then the aim should be to do that. If Bolt is unable to move over then we need to know why so that we can address the issue. RuntimeDyld is very much in maintenance mode at the moment, and we're working hard to reach parity in backend coverage so that we can officially deprecate it.

Even though this comment was followed up by this:

None of that is a total blocker to landing this, but the bar is high, and it should be understood that Bolt will need to migrate in the future.

trying to push the patch through didn't feel like the right approach. For one, I anticipate needing some more advanced linker features for RISC-V in the future (e.g., linker relaxation), and I wouldn't want to implement those in a deprecated linker. Moreover, the recommended linker, JITLink, has mostly complete RISC-V support and, importantly, more users and reviewers, making its implementation most certainly of higher quality than what I would implement by myself in RuntimeDyld.

So the way forward for bringing RISC-V support to BOLT seemed to be to first port BOLT from RuntimeDyld to JITLink. Since it looked like this wasn't going to be a priority for the BOLT maintainers, I decided I might as well give it a shot myself. Even though this would surely mean a significant delay in reaching my ultimate goal of RISC-V support in BOLT, it felt like a great opportunity: it allowed me to learn more about linkers and BOLT's internals, as well as to invest in a project that I am hoping to use in the foreseeable future.

Porting BOLT to JITLink was hard, at least for me. It had a far-ranging impact on many parts of BOLT that I had never touched before. This meant it took quite some time to understand these parts, but also that I learned a lot in the process. Besides changes to BOLT, I submitted a few JITLink patches to implement some missing AArch64 relocations that BOLT needed. In the end, I managed to pass all BOLT tests and submit a patch.

This patch took about a month and a half to get accepted. The BOLT maintainers were very helpful and responsive in the process. They were also very strict, though. Rightfully so, of course, as BOLT is being used in production systems. The main requirement for the patch to get accepted was that BOLT's output would be a 100% binary match with the RuntimeDyld version. This was necessary to ease the verification of the correctness of the patch. With the help of the BOLT maintainers, we managed to get the patch in an acceptable state to land it.

Looking forward #

With BOLT being ported to JITLink, the patch to add initial RISC-V support to BOLT could finally land. This doesn't mean that BOLT is currently very usable for RISC-V binaries, though: most binaries can pass through BOLT fine but many of BOLT's transformations are not supported yet.

Since the initial support was added, I have landed a few more patches to improve usability. For example, support for an obscure ELF feature called composed relocations was added, something RISC-V uses for R_RISCV_ADD32/SUB32 relocations (which BOLT supports now). Other patches deal with the creation and reversal of branches, something BOLT needs to fix up basic blocks after their layout has changed.

I'm currently working on handling binaries that have been relaxed during linking. The issue is that, after BOLT has moved code around, relaxed instructions might not fit the new addresses anymore. I plan to handle this as follows: during disassembly, BOLT will "unrelax" instructions (e.g., translating a jal back to an auipc/jalr pair) to make sure new addresses will always fit. The linker will then undo this, when possible, by performing relaxation again. The first step for this, adding linker relaxation support to JITLink, has been landed. More on this in a future post.
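
As an illustration, using standard RISC-V assembly patterns (not BOLT output): a call relaxed by the linker to a single jal only reaches targets within ±1 MiB of the pc, while the "unrelaxed" two-instruction form covers a ±2 GiB pc-relative range:

# Relaxed form: one 4-byte instruction, target within ±1 MiB.
jal    ra, f

# Unrelaxed form: 8 bytes, ±2 GiB pc-relative range.
1: auipc ra, %pcrel_hi(f)
   jalr  ra, %pcrel_lo(1b)(ra)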

Wrapping up #

Bringing initial RISC-V support to BOLT has been a very interesting and educational journey for me, both from a technical as well as a social perspective. Having to work on multiple projects (LLVM MC, JITLink, BOLT) has taught me new technologies and put me in contact with great communities. I certainly hope to be able to continue this work in the future.

I'll close this post with a reference to the graph at the top, showing what it took, over a series of ~25 patches, to get RISC-V support in BOLT. I think this demonstrates the kind of detours that are sometimes needed to get work upstream, in this case benefiting both the RISC-V community (RISC-V support in BOLT) and BOLT as a whole (moving away from a deprecated linker and fixing bugs encountered along the way).

June 30, 2023 12:00 AM

June 22, 2023

Ziran Sun

Igalia helps building a library in Yoff

People in Yoff, Senegal are expecting to have a library built on the ground floor of a local school named “Coruña” in 2024, thanks to the “A library in Yoff” project.

The “A library in Yoff” project is led by Ecodesarrollo Gaia, an NGO based in A Coruña, Spain. Ecodesarrollo Gaia is the founder of the Yoff Coruña school. As part of their educational project, this effort aims to provide a safe and peaceful place for the local community to access books and other educational and cultural resources, and to create more jobs and professional development opportunities for the local community. Above all, Ecodesarrollo Gaia would like to get students from the school involved in this project.

Igalia has been working with Ecodesarrollo Gaia since 2018 and is very proud to fully fund this project. The funds cover building construction work, acquiring essential furniture, creating foundational bibliographic batches, providing computer equipment and digital media, and carrying out new staff employment and training.

This one-year project has been progressing well. At the time of writing, the contractor who built the school has scheduled a meeting in June to specify the details of the construction. The people in charge of the municipal library Sagrada Familia of A Coruña have been contacted to prepare a training course in Yoff at the beginning of 2024. At the moment, some members of Ecodesarrollo have traveled to Yoff and stayed locally to help run the project.

A lot to look forward to!

by zsun at June 22, 2023 12:06 PM

June 20, 2023

Eric Meyer

First-Person Scrollers

I’ve played a lot of video games over the years, and the thing that just utterly blows my mind about them is how every frame is painted from scratch.  So in a game running at 30 frames per second, everything in the scene has to be calculated and drawn every 33 milliseconds, no matter how little or much has changed from one frame to the next.  In modern games, users generally demand 60 frames per second.  So everything you see on-screen gets calculated, placed, colored, textured, shaded, and what-have-you in 16 milliseconds (or less).  And then, in the next 16 milliseconds (or less), it has to be done all over again.  And there are games that render the entire scene in single-digit numbers of milliseconds!

I mean, I’ve done some simple 3D render coding in my day.  I’ve done hobbyist video game development; see Gravity Wars, for example (which I really do need to get back to and make less user-hostile).  So you’d think I’d be used to this concept, but somehow, I just never get there.  My pre-DOS-era brain rebels at the idea that everything has to be recalculated from scratch every frame, and doubly so that such a thing can be done in such infinitesimal slivers of time.

So you can imagine how I feel about the fact that web browsers operate in exactly the same way, and with the same performance requirements.

Maybe this shouldn’t come as a surprise.  After all, we have user interactions and embedded videos and resizable windows and page scrolling and stuff like that, never mind CSS animations and DOM manipulation, so the viewport often needs to be re-rendered to reflect the current state of things.  And to make all that feel smooth like butter, browser engines have to be able to display web pages at a minimum of 60 frames per second.

Admittedly, this would be a popular UI for browsing social media.

This demand touches absolutely everything, and shapes the evolution of web technologies in ways I don’t think we fully appreciate.  You want to add a new selector type?  It has to be performant.  This is what blocked :has() (and similar proposals) for such a long time.  It wasn’t difficult to figure out how to select ancestor elements — it was very difficult to figure out how to do it really, really fast, so as not to lower typical rendering speed below that magic 60fps.  The same logic applies to new features like view transitions, or new filter functions, or element exclusions, or whatever you might dream up.  No matter how cool the idea, if it bogs rendering down too much, it’s a non-starter.

I should note that none of this is to say it’s impossible to get a browser below 60fps: pile on enough computationally expensive operations and you’ll still jank like crazy.  It’s more that the goal is to keep any new feature from dragging rendering performance down too far in reasonable situations, both alone and in combination with already-existing features.  What constitutes “down too far” and “reasonable situations” is honestly a little opaque, but that’s a conversation slash vigorous debate for another time.

I’m sure the people who’ve worked on browser engines have fascinating stories about what they do internally to safeguard rendering speed, and ideas they’ve had to spike because they were performance killers.  I would love to hear those stories, if any BigCo devrel teams are looking for podcast ideas, or would like to guest on Igalia Chats. (We’d love to have you on!)

Anyway, the point I’m making is that performance isn’t just a matter of low asset sizes and script tuning and server efficiency.  It’s also a question of the engine’s ability to redraw the contents of the viewport, no matter what changes for whatever reason, with reasonable anticipation of things that might affect the rendering, every 16 milliseconds, over and over and over and over and over again, just so we can scroll our web pages smoothly.  It’s kind of bananas, and yet, it also makes sense.  Welcome to the web.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at June 20, 2023 12:16 PM

June 19, 2023

Javier Fernández

Secure Curves in the Web Cryptography API

Introduction

Developers are exceptionally creative with the tools they are given. For a long time now they’ve had the ability to apply the Web Cryptography API to many uses. Getting random values from this API is, for example, an exceptionally popular use case, being used on over 60% of page loads in the HTTP Archive dataset. Of course, its intended use is actual cryptography, and it offers numerous algorithms.

However, if developers feel the algorithm they need isn’t available from this API, they’ll write it (or compile it to WASM) themselves. That’s the case today when it comes to “secure curve” algorithms, like X25519 [RFC7748] or Ed25519 [RFC8032]. These are desirable because they offer strong security guarantees while operating at much better performance levels than others. This is a shame, because your browser already has internal support for these as part of TLS 1.3; it’s just not exposed to developers. Those userland solutions come with added costs in complexity, bandwidth, and overall performance, and have security implications.

Adding some Secure Curves to the Web Cryptography API would provide many advantages to web authors, but this has been a multi-year challenge that, thanks to the collaboration between Igalia and Protocol Labs, is close to giving some results.

Context

Secure elliptic curves play a very important role in the area of cryptography, providing robust and efficient algorithms. Among the available algorithms of this kind, two curves that have gained significant attention in recent years are Ed25519 and X25519. These curves are based on the Edwards and Montgomery forms respectively, and offer strong security guarantees while still operating at excellent performance levels.

I think adding these curves to the API has always been an obvious step, but to get the whole picture, we may need to step back and talk a bit about the history of the Web Cryptography API specification and why it has been so difficult to incorporate newer, more modern algorithms in recent years.

The Web Cryptography API specification

In an effort to ensure secure communication and data protection on the web, the W3C created the Web Cryptography Working Group, which among its goals had the definition of an API that lets developers implement secure application protocols at the level of Web applications. Out of this effort the WG published the Web Cryptography API, which became a W3C Recommendation in January 2017.

This specification defines a comprehensive set of interfaces and algorithms for performing various cryptographic tasks, such as encryption, decryption, digital signatures, key generation, and key management. As usual, one of the main goals of the W3C specs is to encourage an interoperable cryptographic API across different web browsers and platforms. This simplifies the development process and ensures compatibility and portability of web-based cryptographic applications.

There are several cryptographic algorithms defined in the Web Cryptography API, including symmetric encryption algorithms like AES, asymmetric encryption algorithms like RSA, Elliptic Curve Cryptography (ECC) algorithms, hash functions like SHA-256, and digital signature algorithms like RSA-PSS. This API allows web authors to implement strong cryptographic mechanisms without requiring deep knowledge of the underlying cryptographic primitives.

It’s also important to note that the spec not only defines the cryptographic algorithms available for web applications, but also some security considerations, such as key storage and management, handling of sensitive data, and protection against common security attacks. These considerations ensure that apps implement their cryptographic logic in a secure and robust way.

The adoption of the Web Cryptography API specification by major web browsers has been a key factor in enabling secure web applications and ensuring trust in online transactions.

Why it took so long to add Secure Curves

The lack of safe curves in the Web Cryptography specification has been a long-term issue for web developers, who were forced to rely on third-party or native implementations for their applications. Even more so given that the use of these curves has become widespread across non-web software components.

All these claims became an actual proposal when Qingsi Wang (Google) filed an issue for the TAG at the beginning of 2020. The proposal got quite positive feedback from Firefox engineers, as clearly stated in the standards position request driven by David Baron (a Mozillian back then) and Tantek Çelik, and endorsed by Martin Thomson.

So, despite the lack of a clear position from Safari, the proposal was accepted by the TAG with the support of two major browsers; the only concern was finding a proper standardization venue, given that the former Web Cryptography WG had been closed a few years before. The solution to address this concern was to develop the new specification in the Web Platform Incubator Community Group (WICG).

The last Web Cryptography candidate recommendation was published in 2017, when it was still under the umbrella of the aforementioned Web Cryptography WG. Since then, the spec drafts have been reviewed and published by the Web Application Security Working Group, with Daniel Huigens (Proton AG) as the only spec editor.

Even in this unstable situation, with the support of two main browsers (Firefox and Chrome), an intent-to-prototype request for Chrome was announced and the implementation started in February 2020. Unfortunately, the work was not completed, and even the partial implementation was removed from the Chromium source code repository.

After some time maturing, the initial explainer written by Qingsi Wang was used to create the Secure Curves in the Web Cryptography API document, a potential W3C spec under the umbrella of the WICG, thanks to the work of its editor Daniel Huigens. The long-term plan is that the spec will eventually be integrated into the Web Cryptography API specification; and this is where Protocol Labs enters the scene.

Protocol Labs contribution to the Web Crypto spec

Last year, Protocol Labs defined a new goal in our long-term collaboration: to make progress on the effort to bring the secure curves spec into the Web Cryptography API. This kind of cryptographic algorithm is a fundamental tool for several use cases of the IPFS ecosystem they have been building over the last years.

Ed25519 key pairs have become the standard in many web applications, and the IPFS protocol adopted them as the default some time ago. Additionally, Ed25519 public keys have been primary identifiers across dat/hypercore and SSB from the beginning, and most of the projects in this technology ecosystem prefer them due to the smaller key sizes and faster operations compared to RSA keys.

Since the adoption of UCANs by many teams inside Protocol Labs, there has frequently been a hard choice between natively supported RSA keys in browsers and the preferred Ed25519 keys, with the only option being to rely on external libraries. The use of these external software components (e.g., many js/wasm implementations) implies the security risk of them being compromised. In most cases it is desirable to have non-extractable private keys to prevent attacks from malicious scripts and/or web extensions, which cannot be accomplished with js/wasm implementations; supply-chain attacks are another vector that user-space implementations are exposed to.

The alternatives to the lack of support for secure curves in the Web Platform have been bundling a user-space implementation of Ed25519 for signature verification (which increases the complexity and amount of code in programs) or using built-in RSA for signing (to prevent attacks such as the ones described above).

In summary, Protocol Labs and Igalia consider that providing implementations of secure curves like Ed25519 and X25519 in the Web Cryptography API will give the Web Platform a very important feature that fills the gap with respect to other native platforms. It will become a more competitive development platform for many projects, addressing the previously described attack vectors and in many cases simplifying applications and their implementation effort, as they will no longer require juggling Ed25519 and RSA keys.

Working plan

As I mentioned above, the long-term goal is to reach full standardization status for the Secure Curves document and make the algorithms it defines part of the general Web Cryptography API specification. Achieving this goal requires most of the main browsers to implement the algorithms, ensuring a good level of interoperability. There are quite a few Web Platform Tests for these new algorithms in the WebCryptoAPI test suite, so that’s a good start.

The nature of this goal, which I want to stress is part of a longer-term and more general collaboration between Igalia and Protocol Labs, makes it a multi-browser task. Our plan is to implement, or to collaborate on with patches, spec work, and tests, the Ed25519 and X25519 algorithms in Chromium, Firefox and Safari. Hence, one of the first steps was to file a standards position request for WebKit, which received positive feedback. This was useful for sending a new intent-to-prototype request for Chrome, reactivating the one abandoned a few years ago.

Regarding Firefox, despite the positive feedback on the standards position request filed back in 2020, the implementation has not started yet and is pending on some blocking issues; I’ll elaborate on these below.

Current status

Chrome

Our first target for this task has been the Chromium browser. Perhaps the best way to follow the progress of this work is through the Chrome Platform Status site, where there is a specific entry for this feature. If you are interested in the implementation details, you can check the tracking bug.

It’s important to note that the feature is being implemented behind the WebCryptoCurve25519 runtime flag, so if you are interested in trying it out, you should enable Experimental Web Platform Features. I’ll talk later about what’s missing before the intent-to-ship request can be proposed so that the feature is enabled by default.

The implementation of the Ed25519 algorithm landed in Chromium in November 2022 and shipped in Chrome 110.0.5424.0. The X25519 key-sharing algorithm took more time due to the review process, but it finally landed in March 2023 and has shipped in Chrome since 113.0.5657.0. I couldn’t be more grateful for the patient and awesome work that David Benjamin (Google) did with all the reviews; contributing to the Chromium project has always been a pleasure, and the review process extremely useful and agile; this time was no exception.
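
As an illustration, here is roughly how the new algorithms are used from JavaScript, following the Secure Curves draft (this requires a browser where the feature is enabled, e.g. Chrome with the flag above):

// Ed25519: generate a signing key pair, then sign and verify.
const { privateKey, publicKey } = await crypto.subtle.generateKey(
  { name: "Ed25519" }, /* extractable */ false, ["sign", "verify"]);

const data = new TextEncoder().encode("hello");
const signature = await crypto.subtle.sign({ name: "Ed25519" }, privateKey, data);
const valid = await crypto.subtle.verify({ name: "Ed25519" }, publicKey, signature, data);

// X25519: derive a shared secret from a private key and a peer's public key.
const alice = await crypto.subtle.generateKey({ name: "X25519" }, false, ["deriveBits"]);
const bob = await crypto.subtle.generateKey({ name: "X25519" }, false, ["deriveBits"]);
const secret = await crypto.subtle.deriveBits(
  { name: "X25519", public: bob.publicKey }, alice.privateKey, 256);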

Safari

Soon after getting positive feedback on the standards position request I filed, and in parallel with the work on the implementation for Chrome, Safari engineers started the implementation of the Ed25519 algorithm for the WebKit engine. The main developer of this work has been Angela Izquierdo, with reviews mainly from Youenn Fablet. Safari shipped the Ed25519 algorithm implementation in STP 163, enabled by default for the Cocoa WebKit port. I have it on my TODO list to enable it for the WebKitGtk+ port as well.

The implementation of the X25519 key-sharing algorithm has not started yet, but I’ve been in conversation with some WebKit engineers to see how we can collaborate on this effort. Anyone interested can follow bug 258279 to track the progress of the implementation. I hope to have some time for this task during H2 this year.

Firefox

This is the browser most delayed regarding the implementation of the secure curves. I filed bug 1804788 to track the implementation work and have already started a preliminary analysis of the Gecko and NSS codebases. Unfortunately, it seems there is still some pending work (see bug 1325335 for details) to add the curve25519 cryptographic primitives to the NSS library, and this is blocking the Web Crypto API implementation.

We are already in conversations with some Firefox engineers and it seems there may be some progress by H2 this year as well.

Summary

The following table provides a high-level overview of the support for the secure Curve25519 algorithms in some of the main browsers:

Browser Ed25519 X25519
Chrome ✅ ✅
Safari 🚀 🚧
Firefox 🚧 🚧

The following graphs from wpt.fyi show the current interoperability of the test results for the generateKey, deriveBits/deriveKey, importKey/exportKey, sign/verify, and wrap/unwrap methods.

Next steps

Shipping by default in Chrome

One of the top priorities for H2 is to send the intent-to-ship request for Chrome. There are currently two issues blocking this task:

  • bug 1402835 – Ensure Ed25519 and X25519 implementations matches the spec regarding small-order keys
  • bug 1433707 – Handling optional length in X25519 does not match spec

Regarding the first issue, the latest draft of the Secure Curves in the Web Cryptography specification states that there must be checks for all-zero values to ensure small-order keys are rejected (as per RFC 7748 Section 6.1):

“If secret is the all-zero value, then throw an OperationError. This check must be performed in constant-time, as per [RFC7748] Section 6.1.”

However, there is an ongoing discussion in PR #13 about introducing a change so that small-order keys are rejected during the import operation instead of when they are used. It’s worth mentioning the strong opposition from Chrome to this spec change, under the argument of following RFC 7748, which states that the checks should be done when the keys are used; Chrome considers this PR a regression. There is also an ongoing discussion about this in WebKit, in the form of a new standards position request, but still no feedback on that side.

There are WPTs to ensure that the X25519 algorithm works as expected with small-order keys, but since they assume that the all-zero checks are performed in the derivation phase, there are asserts to ensure the initial keys are valid. If the spec changes, these tests must be adapted.

Regarding the second issue, there is an active discussion in issue #322 where, despite the different positions on the best approach to address it, there is a clear consensus that the Web Cryptography API spec has several inconsistencies in how the deriveBits function’s ‘length’ parameter is defined. These inconsistencies have led to wrong WPT definitions and possibly some browser implementations that would need to be changed. Although there is a clear lack of interoperability here, the most concerning issue is the correctness of the implementations and how any potential change may affect the deriveKey operations of the ECDH, HKDF and PBKDF2 algorithms.

WebKit’s implementation of X25519

As I said before, we are currently analyzing the WebKit codebase to see if we can allocate some resources to start the implementation early in H2.

Firefox’s implementation of both Ed25519 and X25519

Until there is support in Firefox’s NSS component for the Curve25519 cryptographic primitives, we cannot start the implementation of the Web Cryptography API for these algorithms.

Conclusions

The work that Igalia and Protocol Labs are doing in the Web Cryptography API specification will have a big impact on how web developers use the platform these days, reducing security risks and allowing lighter and simpler applications.

We are working very hard to offer web authors native support for Ed25519 and X25519 in the main browsers (Safari, Firefox, Chrome) by the end of 2023, including all the Chromium-based browsers (e.g., Edge, Brave, Opera).

This work is another example of Protocol Labs’ commitment to an open Web Platform and open-source browsers, investing their resources in a great variety of features with wide impact for web authors.

by jfernandez at June 19, 2023 10:13 PM

June 15, 2023

Andy Wingo

parallel futures in mobile application development

Good morning, hackers. Today I'd like to pick up my series on mobile application development. To recap, we looked at:

  • Ionic/Capacitor, which makes mobile app development more like web app development;

  • React Native, a flavor of React that renders to platform-native UI components rather than the Web, with ahead-of-time compilation of JavaScript;

  • NativeScript, which exposes all platform capabilities directly to JavaScript and lets users layer their preferred framework on top;

  • Flutter, which bypasses the platform's native UI components to render directly using the GPU, and uses Dart instead of JavaScript/TypeScript; and

  • Ark, which is Flutter-like in its rendering, but programmed via a dialect of TypeScript, with its own multi-tier compilation and distribution pipeline.

Taking a step back, with the exception of Ark which has a special relationship to HarmonyOS and Huawei, these frameworks are all layers on top of what is provided by Android or iOS. Why would you do that? Presumably there are benefits to these interstitial layers; what are they?

Probably the most basic answer is that an app framework layer offers the promise of abstracting over the different platforms. This way you can just have one mobile application development team instead of two or more. In practice you still need to test on iOS and Android at least, but this is cheaper than having fully separate Android and iOS teams.

Given that we are abstracting over platforms, it is natural also to abandon platform-specific languages like Swift or Kotlin. This is the moment in the strategic planning process that unleashes chaos: there is a fundamental element of randomness and risk when choosing a programming language and its community. Languages exist on a hype and adoption cycle; ideally you want to catch one on its way up, and you want it to remain popular over the life of your platform (10 years or so). This is not an easy thing to do and it's quite possible to bet on the wrong horse. However the communities around popular languages also bring their own risks, in that they have fashions that change over time, and you might have to adapt your platform to the language as fashions come and go, whether or not these fashions actually make better apps.

Choosing JavaScript as your language places more emphasis on the benefits of popularity, and is in turn a promise to adapt to ongoing fads. Choosing a more niche language like Dart places more emphasis on predictability of where the language will go, and ability to shape the language's future; Flutter is a big fish in a small pond.

There are other language choices, though; if you are building your own thing, you can choose any direction you like. What if you used Rust? What if you doubled down on WebAssembly, somehow? In some ways we'll never know unless we go down one of these paths; one has to pick a direction and stick to it for long enough to ship something, and endless tergiversations on such basic questions as language are not helpful. But in the early phases of platform design, all is open, and it would be prudent to spend some time thinking about what it might look like in one of these alternate worlds. In that spirit, let us explore these futures to see how they might be.

alternate world: rust

The arc of history bends away from C and C++ and towards Rust. Given that a mobile development platform has to have some low-level code, there are arguments in favor of writing it in Rust already instead of choosing to migrate in the future.

One advantage of Rust is that programs written in it generally have fewer memory-safety bugs than their C and C++ counterparts, which is important in the context of smart phones that handle untrusted third-party data and programs, i.e., web sites.

Also, Rust makes it easy to write parallel programs. For the same implementation effort, we can expect Rust programs to make more efficient use of the hardware than C++ programs.

And relative to JavaScript et al, Rust also has the advantage of predictable performance: it requires quite a good ahead-of-time compiler, but no adaptive optimization at run-time.

These observations are just conversation-starters, though, and when it comes to imagining what a real mobile device would look like with a Rust application development framework, things get more complicated. Firstly, there is the approach to UI: how do you get pixels on the screen and events from the user? The three general solutions are to use a web browser engine, to use platform-native widgets, or to build everything in Rust using low-level graphics primitives.

The first approach is taken by the Tauri framework: an app is broken into two pieces, a Rust server and an HTML/JS/CSS front-end. Running a Tauri app creates a WebView in which to run the front-end, and establishes a bridge between the web client and the Rust server. In many ways the resulting system ends up looking a lot like Ionic/Capacitor, and many of the UI questions are left open to the user: what UI framework to use, all of the JavaScript programming, and so on.

Rather than using a platform's WebView library, a Rust app could instead ship its own WebView. This would of course make the application binary size larger, but tighter coupling between the app and the WebView may allow you to run the UI logic from Rust itself instead of having a large JS component. Notably this would be an interesting opportunity to adopt the Servo web engine, which is itself written in Rust. Servo is a project that in many ways exists in potentia; with more investment it could become a viable alternative to Gecko, Blink, or WebKit, and whoever does the investment would then be in a position of influence in the web platform.

If we look towards the platform-native side, though there are quite a number of Rust libraries that provide wrappers to native widgets, practically all of these primarily target the desktop. Only cacao supports iOS widgets, and there is no equivalent binding for Android, so any NativeScript-like solution in Rust would require a significant amount of work.

In contrast, the ecosystem of Rust UI libraries that are implemented on top of OpenGL and other low-level graphics facilities is much more active and interesting. Probably the best recent overview of this landscape is by Raph Levien (see the "quick tour of existing architectures" subsection). In summary, everything is still in motion and there is no established consensus as to how to approach the problem of UI development, but there are many interesting experiments in progress. With my engineer hat on, exploring these directions looks like fun. As Raph notes, some degree of exploration seems necessary as well: we will only know if a given approach is a good idea if we spend some time with it.

However if instead we consider the situation from the perspective of someone building a mobile application development framework, Rust seems more of a mid/long-term strategy than a concrete short-term option. Sure, build low-level libraries in Rust, to the extent possible, but there is no compelling-in-and-of-itself story yet that you can sell to potential UI developers, because everything is still so undecided.

Finally, let us consider the question of scripting: sometimes you need to add logic to a program at run-time. It could be because actually most of your app is dynamic and comes from the network; in that case your app is like a little virtual machine. If your app development framework is written in JavaScript, like Ionic/Capacitor, then you have a natural solution: just serve JavaScript. But if your app is written in Rust, what do you do? Waiting until the app store pushes a new version of the app to the user is not an option.

There would appear to be three common solutions to this problem. One is to use JavaScript -- that's what Servo does, for example. As a web engine, Servo doesn't have much of a choice, but the point stands. Currently Servo embeds a copy of SpiderMonkey, the JS engine from Firefox, and it does make sense for Servo to take advantage of an industrial, complete JS engine. Of course, SpiderMonkey is written in C++; if there were a JS engine written in Rust, probably Rust programmers would prefer it. Also it would be fun to write, or rather, fun to start writing; reaching the level of ECMA-262 conformance of SpiderMonkey is at least a hundred-million-dollar project. Anyway what I am saying is that I understand why Boa was started, and I wish them the many millions of dollars needed to see it through to completion.

You are not obliged to script your app via JavaScript, of course; there are many languages out there that have "extending a low-level core" as one of their core use cases. I think the mixed success that this approach has had over the years—who embeds Python into an iPhone app?—should probably rule out this strategy as a core part of an application development framework. Still, I should mention one Rust-specific option, Rhai; the pitch is that by being Rust-specific, you get more expressive interoperation between Rhai and Rust than you would between Rust and any other dynamic language. Even so, it is not a solution that I would bet on: Rhai internalizes so many Rust concepts (notably around borrowing and lifetimes) that I think you have to know Rust to write effective Rhai, and knowing both is quite rare. Anyone who writes Rhai would probably rather be writing Rust, and that's not a good equilibrium.

The third option for scripting Rust is WebAssembly. We'll get to that in a minute.

alternate world: the web of pixels

Let's return to Flutter for a moment, if you will. Like the more active Rust GUI development projects, Flutter is an all-in-one rendering framework based on low-level primitives; all it needs is Vulkan or Metal or (soon) WebGPU, and it handles the rest, layering on opinionated patterns for how to build user interfaces. It didn't arrive at this state in a day, though. To hear Eric Seidel tell the story, Flutter began as a kind of "reset" for the Web, a conscious attempt to determine which of the pieces that compose the Web rendering stack enable smooth user interfaces and which get in the way. After taking away all of the parts they didn't need, Flutter wasn't left with much: just GPU texture layers, a low-level drawing toolkit, and the necessary bindings to input events. Of course what the application programmer sees is much more high-level, but underneath, these are the platform primitives that Flutter uses.

So, imagine you work at Google. You used to work on the web—maybe on WebKit and then Chrome like Eric, maybe on web standards—but you broke with this past to see what Flutter might become. Flutter works: great job everybody! The set of graphical and input primitives that you use is minimal enough that it is abstract by nature; it doesn't much matter whether you target iOS or Android, because the primitives will be there. But the web is still the web, and it is annoying, aesthetically speaking. Could we Flutter-ize the web? What would that mean?

That's exactly what former HTML specification editor and now Flutter team member Ian Hixie proposed this January in a brief manifesto, Towards a modern Web stack. The basic idea is that the web and thus the browser is, well, a bit much. Hixie proposed to start over, rebuilding the web on top of WebAssembly (for code), WebGPU (for graphics), WebHID (for input), and ARIA (for accessibility). Technically it's a very interesting proposition! After all, people that build complex web apps end up having to fight with the platform to get the results they want; if we can reorient them to focus on these primitives, perhaps web apps can compete better with native apps.

However if you game out what is being proposed, I have doubts. The existing web is largely HTML, with JavaScript and CSS as add-ons: a web of structured text. Hixie's flutterized web proposal, on the other hand, is a web of pixels. This has a number of implications. One is that each app has to ship its own text renderer and internationalization tables, which is a bit silly to say the least. And whereas we take it for granted that we can mouse over a web page and select its text, with a web of pixels it is much less obvious how that would happen. Hixie's proposal is that apps expose structure via ARIA, but as far as I understand there is no association between pixels and ARIA properties: the pixels themselves really have no built-in structure to speak of.

And of course unlike in the web of structured text, in a web of pixels it would be up to each app to actually describe its structure via ARIA: it's not a built-in part of the system. But if you combine this with the rendering story (here's WebGPU, now draw the rest of the owl), Hixie's proposal leaves a void for frameworks to fill between what the app developer wants to write (e.g. Flutter/Dart) and the platform (WebGPU/ARIA/etc).

I said before that I had doubts and indeed I have doubts about my doubts. I am old enough to remember when X11 apps on Unix desktops changed from having fonts rendered on the server (i.e. by the operating system) to having them rendered on the client (i.e. the app), which was associated with a similar kind of anxiety. There were similar factors at play: slow-moving standards (X11) and not knowing at build-time what the platform would actually provide (which X server would be in use, etc). But instead of using the server, you could just ship pixels, and that's how GNOME got good text rendering, with Pango and FreeType and fontconfig, and eventually HarfBuzz, the text shaper used in Chromium and Flutter and many other places. Client-side fonts not only enabled more complex text shaping but also eliminated some round-trips for text measurement during UI layout, which is a bit of a theme in this article series. So could it be that pixels instead of text does not represent an apocalypse for the web? I don't know.

Incidentally I cannot move on from this point without pointing out another narrative thread, which is that of continued human effort over time. Raph Levien, who I mentioned above as a Rust UI toolkit developer, actually spent quite some time doing graphics for GNOME in the early 2000s; I remember working with his libart_lgpl. Behdad Esfahbod, author of HarfBuzz, built many parts of the free software text rendering stack before moving on to Chrome and many other things. I think that if you work on this low level where you are constantly translating text to textures, the accessibility and interaction benefits of using a platform-provided text library start to fade: you are the boss of text around here and you can implement the needed functionality yourself. From this perspective, pixels don't represent risk at all. In the old days of GNOME 2, client-side font rendering didn't lead to bad UI or poor accessibility. To be fair, there were other factors pushing to keep work in a commons, as the actual text rendering libraries still tended to be shipped with the operating system as shared libraries. Would similar factors prevail in a statically-linked web of pixels?

In a way it's a moot question for us, because in this series we are focussing on native app development. So, if you ship a platform, should your app development framework look like the web-of-pixels proposal, or something else? To me it is clear that as a platform, you need more. You need a common development story for how to build user-facing apps: something that looks more like Flutter and less like the primitives that Flutter uses. Though you surely will include a web-of-pixels-like low-level layer, because you need it yourself, probably you should also ship shared text rendering libraries, to reduce the install size for each individual app.

And of course, having text as part of the system has the side benefit of making it easier to get users to install OS-level security patches: it is well-known in the industry that users will make time for the update if they get a new goose emoji in exchange.

alternate world: webassembly

Hark! Have you heard the good word? Have you accepted your Lord and savior, WebAssembly, into your heart? I jest; it does sometimes feel like messianic narratives surrounding WebAssembly prevent us from considering its concrete aspects. But despite the hype, WebAssembly is clearly a technology that will be a part of the future of computing. So let's dive in: what would it mean for a mobile app development platform to embrace WebAssembly?

Before answering that question, a brief summary of what WebAssembly is. WebAssembly 1.0 is a portable bytecode format that is a good compilation target for C, C++, and Rust. These languages have good compiler toolchains that can produce WebAssembly. The nice thing is that when you instantiate a WebAssembly module, it is completely isolated from its host: it can't harm the host (approximately speaking). All points of interoperation with the host are via copying data into memory owned by the WebAssembly guest; the compiler toolchains abstract over these copies, allowing a Rust-compiled-to-native host to call into a Rust-compiled-to-WebAssembly module using idiomatic Rust code.

So, WebAssembly 1.0 can be used as a way to script a Rust application. The guest script can be interpreted, compiled just in time, or compiled ahead of time for peak throughput.

Of course, people that would want to script an application probably want a higher-level language than Rust. In a way, WebAssembly is in a similar situation as WebGPU in the web-of-pixels proposal: it is a low-level tool that needs higher-level toolchains and patterns to bridge the gap between developers and primitives.

Indeed, the web-of-pixels proposal specifies WebAssembly as the compute primitive. The idea is that you ship your application as a WebAssembly module, and give that module WebGPU, WebHID, and ARIA capabilities via imports. Such a WebAssembly module doesn't script an existing application: it is the app. So another way for an app development platform to use WebAssembly would be like how the web-of-pixels proposes to do it: as an interchange format and as a low-level abstraction. As in the scripting case, you can interpret or compile the module. Perhaps an infrequently-run app would just be interpreted, to save on disk space, whereas a more heavily-used app would be optimized ahead of time, or something.
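As a concrete illustration of the imports-as-capabilities idea, here is a hedged sketch using the standard WebAssembly JavaScript API; the file name, the shape of the import object, and the exported main function are assumptions for the example, not part of any proposal:

// Hedged sketch: load a WebAssembly module and grant it capabilities
// via imports (the names "script.wasm", "host" and "log" are made up).
const bytes = await fetch("script.wasm").then((r) => r.arrayBuffer());
const { instance } = await WebAssembly.instantiate(bytes, {
  host: {
    // The guest can only affect the host through imports like this one.
    log: (x: number) => console.log(x),
  },
});
// Run the guest's entry point; the export name is an assumption.
(instance.exports.main as () => void)();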

We should mention another interesting benefit of WebAssembly as a distribution format, which is that it abstracts over the specific chipset on the user's device; it's the device itself that is responsible for efficiently executing the program, possibly via compilation to specialized machine code. I understand for example that RISC-V people are quite happy about this property because it lowers the barrier to entry for them relative to an ARM monoculture.

WebAssembly does have some limitations, though. One is that if the throughput of data transfer between guest and host is high, performance can be bad due to copying overhead. The nascent memory-control proposal aims to provide an mmap capability, but it is still early days. The need to copy would be a limitation for using WebGPU primitives.

More generally, as an abstraction, WebAssembly may not be able to express programs in the most efficient way for a given host platform. For example, its SIMD operations work on 128-bit vectors, whereas host platforms may have much wider vectors. Any current limitation will recede with time, as WebAssembly gains new features, but every year brings new hardware capabilities (tensor operation accelerator, anyone?), so there will be some impedance-matching to do for the foreseeable future.

The more fundamental limitation of the 1.0 version of WebAssembly is that it's only a good compilation target for some languages. This is because some of the fundamental parts of WebAssembly that enable isolation between host and guest (structured control flow, opaque stack, no instruction pointer) make it difficult to efficiently implement languages that need garbage collection, such as Java or Go. The coming WebAssembly 2.0 starts to address this need by including low-level managed arrays and records, allowing for reasonable ahead-of-time compilation of languages like Java. Getting a dynamic language like JavaScript to compile to efficient WebAssembly can still be a challenge, though, because many of the just-in-time techniques needed to efficiently implement these languages will still be missing in WebAssembly 2.0.

Before moving on to WebAssembly as part of an app development framework, one other note: currently WebAssembly modules do not compose very well with each other and with the host, requiring extensive toolchain support to enable e.g. the use of any data type that's not a scalar integer or floating-point value. The component model working group is trying to establish some abstractions and associated tooling, but (again!) it is still early days. Anyone wading into this space needs to be prepared to get their hands dirty.

To return to the question at hand, an app development framework can use WebAssembly for scripting, though the problem of how to compose a host application with a guest script requires good tooling. Or, an app development framework that exposes a web-of-pixels primitive layer can support running WebAssembly apps directly, though again, the set of imports remains to be defined. Either of these two patterns can stick with WebAssembly 1.0 or also allow for garbage collection in WebAssembly 2.0, aiming to capture mindshare among a broader community of potential developers, potentially in a wide range of languages.

As a final observation: WebAssembly is ecumenical, in the sense that it favors no specific church of how to write programs. As a platform, though, you might prefer a state religion, to avoid wasting internal and external efforts on redundant or ill-advised development. After all, if it's your platform, presumably you know best.

summary

What is to be done?

Probably there are as many answers as people, but since this is my blog, here are mine:

  1. On the shortest time-scale I think that it is entirely reasonable to base a mobile application development framework on JavaScript. I would particularly focus on TypeScript, as late error detection is more annoying in native applications.

  2. I would try to build something that looks like Flutter underneath: reactive, based on low-level primitives, with a multithreaded rendering pipeline. Perhaps it makes sense to take some inspiration from WebF.

  3. In the medium-term I am sympathetic to Ark's desire to extend the language in a more ResultBuilder-like direction, though this is not without risk.

  4. Also in the medium-term I think that modifications to TypeScript to allow for sound typing could provide some of the advantages of Dart's ahead-of-time compiler to JavaScript developers (see the sketch after this list).

  5. In the long term... well we can do all things with unlimited resources, right? So after solving climate change and homelessness, it makes sense to invest in frameworks that might be usable 3 or 5 years from now. WebAssembly in particular has a chance of sweeping across all platforms, and the primitives for the web-of-pixels will be present everywhere, so if you manage to produce a compelling application development story targeting those primitives, you could eat your competitors' lunch.
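As a concrete illustration of point 4, here is a small sketch of one of TypeScript's deliberate soundness holes (array covariance); an ahead-of-time compiler cannot trust such static types without inserting extra runtime checks:

class Animal { name = "generic"; }
class Dog extends Animal { bark() { return "woof"; } }

const dogs: Dog[] = [new Dog()];
const animals: Animal[] = dogs; // allowed: arrays are covariant in TypeScript
animals.push(new Animal());     // type-checks, but now dogs[1] is not a Dog
dogs[1].bark();                 // compiles cleanly, throws a TypeError at runtime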

Well, friends, that brings this article series to an end; it has been interesting for me to dive into this space, and if you have read down to here, I can only think that you are a masochist or that you have also found it interesting. In either case, you are very welcome. Until next time, happy hacking.

by Andy Wingo at June 15, 2023 02:02 PM

June 14, 2023

Frédéric Wang

Infinite version of the Set card game

edit 2023/06/17: I elaborated a bit more in the conclusion about the open problem of finding a minimal $\kappa$.

The Set Game

I visited A Coruña last week for the Web Engines Hackfest and to participate in internal events with my fellow Igalians. One of our traditions is to play board games, and my colleague Ioanna presented a card game called Set. To be honest I was not very good at it, but it made me think of a potential generalization for infinite sets that is worth a blog post…

Basically, we have a deck of $\lambda^\mu$ cards with $\mu = 4$ features (number of shapes, shape, shading and color), each of them taking $\lambda = 3$ possible values (e.g. red, green or purple for the color). Given $\kappa$ cards on the table, players must extract $\lambda$ cards forming what is called a Set, which is defined as follows: for each of the $\mu$ features, either the cards use the same value or they use pairwise distinct values.
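To make the definition concrete, here is a small sketch (in TypeScript, with cards encoded as arrays of feature values, an encoding assumed just for illustration) that checks whether a hand of cards forms a Set in the finite game:

type Card = number[]; // card[f] is the value of feature f

// A hand is a Set iff, for every feature, the values across the cards are
// either all equal (constant) or pairwise distinct (one-to-one).
function isSet(cards: Card[], numFeatures = 4): boolean {
  for (let f = 0; f < numFeatures; f++) {
    const values = cards.map((c) => c[f]);
    const distinct = new Set(values).size;
    if (distinct !== 1 && distinct !== values.length) return false;
  }
  return true;
}

isSet([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]); // true: one-to-one on every feature
isSet([[0, 0, 0, 0], [0, 1, 1, 1], [1, 2, 2, 2]]); // false: feature 0 is neither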

Formally, this can be generalized for any cardinal $\lambda$ as follows:

  • A card is a function from $\mu$ (the features) to $\lambda$ (the values).
  • A set of cards $S$ is a Set iff for any feature $\alpha < \mu$, the mapping $\Phi_\alpha^S : S \rightarrow \lambda$ that maps a card $c$ to the value $c(\alpha)$ is either constant or one-to-one.

Given a value $\kappa$ such that $\lambda \leq \kappa \leq \lambda^\mu$, can we always extract a Set when $\kappa$ cards are put on the table? Or said otherwise, is there a set of $\kappa$ cards from which we cannot extract any Set?

Trivial cases ($\lambda \leq 2$ or $\mu \leq 1$)

Given $\kappa \geq \lambda$ cards, we can always extract a Set $S$ in the following trivial cases:

  • If $\mu = 0$ then the deck contains only one card $c = \emptyset$. If $\lambda \geq 2$ such a set of $\kappa$ cards does not exist. Otherwise we can just take $S = \emptyset$ or $S = \{c\}$: these are Sets since the definition is trivial for $\mu = 0$.
  • If $\mu \geq 1$ and $\lambda = 0$ then the deck is empty. We take $S = \emptyset$ and for any $\alpha < \mu$, $\Phi_\alpha^\emptyset = \emptyset$ is both constant and one-to-one.
  • If $\mu = 1$ and $\lambda \geq 1$, a card $c$ is fully determined by its value $c(0)$, so distinct cards give distinct values. So we can pick any $S$ of size $\lambda$: it is a Set since $\Phi_0$ is one-to-one.
  • If $\lambda = 1$ then we can pick any singleton $S$: it is a Set since for any feature $\alpha < \mu$ the mapping $\Phi_\alpha^S$ is both constant and one-to-one.
  • If $\lambda = 2$ then we can pick any pair of cards $S$: it is a Set since for any feature $\alpha < \mu$ the mapping $\Phi_\alpha^S$ is either constant or one-to-one (depending on whether the two cards display the same value or not).

👉🏼 For the rest of this blog post, I’ll assume $\mu \geq 2$ and $\lambda \geq 3$.

Not enough cards on the table ($\kappa \leq \mu$)

If $\mu \geq \kappa \geq \lambda \geq 3$ then we consider cards $c_\alpha$ for each $\alpha < \kappa$ defined for each $\beta < \mu$ as $c_\alpha(\beta) = \delta_{\alpha, \beta}$ (using the Kronecker delta). If we extract a subset $S$ from these cards and $\alpha_1, \alpha_2, \alpha_3 < \kappa \leq \mu$ are indices for elements of $S$ then $\Phi_{\alpha_1}^S$ respectively evaluates to $1$, $0$ and $0$ for $\alpha_1, \alpha_2, \alpha_3$, so $S$ is not a Set.
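For instance, with $\kappa = \lambda = 3$ and $\mu = 4$, this construction gives the following cards:

c_0 = (1, 0, 0, 0), \quad c_1 = (0, 1, 0, 0), \quad c_2 = (0, 0, 1, 0)
% Feature 0 takes the values 1, 0, 0 across these three cards: neither
% constant nor one-to-one, so \{c_0, c_1, c_2\} is not a Set.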

👉🏼 For the rest of this blog post, we’ll assume $\mu < \kappa$ and will even focus on the minimal case $\kappa = \lambda$.

Finite number of values ($\lambda < \aleph_0$)

Let’s consider a finite number of values $\lambda \geq 3$ and define the card $c_\alpha$ for each $\alpha < \lambda$ as follows: $c_\alpha(\beta) = \delta_{\alpha, 0}$ for $\beta = 0$ (again the Kronecker delta) and $c_\alpha(\beta) = \alpha$ for $0 < \beta < \mu$. Since $\mu \geq 2$, the latter case shows that $S = \{ c_\alpha : \alpha < \lambda \}$ contains exactly $\lambda$ cards. Since $\kappa = \lambda < \aleph_0$, the only way to extract a subset of size $\lambda$ would be to take all the cards. But they don’t form a Set since by construction $\Phi_0(c_0) = 1 \neq 0 = \Phi_0(c_1) = \Phi_0(c_2)$.
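Concretely, for $\lambda = 3$ and $\mu = 2$ this construction gives:

c_0 = (1, 0), \quad c_1 = (0, 1), \quad c_2 = (0, 2)
% \Phi_0 takes the values 1, 0, 0: neither constant nor one-to-one,
% so these three cards do not form a Set.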

👉🏼 For the rest of the blog post, I’ll assume $\lambda$ is infinite.

Singular number of values ($\mathrm{cf}(\lambda) < \lambda$)

If $\lambda$ is a singular cardinal, then we consider a cofinal sequence $\{ \alpha_\gamma : \gamma < \nu \} \subseteq \lambda$ of length $\nu < \lambda$ and define the card $c_\alpha$ for $\alpha < \lambda$ as follows:

  • For $\beta = 0$, we consider the smallest ordinal $\gamma < \nu \leq \lambda$ such that $\alpha < \alpha_\gamma$ and define $c_\alpha(0) = \gamma$.
  • For any $1 \leq \beta < \mu$, $c_\alpha(\beta) = \alpha$.

Since $\mu \geq 2$, the latter case shows that these are $\lambda$ distinct cards. Consider $S \subseteq \{ c_\alpha : \alpha < \lambda \}$. If $\Phi_0^S$ evaluates to a constant value $\gamma < \nu$ then $S = (\Phi_0^S)^{-1}(\{\gamma\})$ has size at most $|\alpha_\gamma| < \lambda$. If instead $\Phi_0^S$ is one-to-one then it takes at most $\nu$ distinct values, so again $|S| \leq \nu < \lambda$. Hence $S$ is not a Set.

👉🏼 For the rest of the blog post, I’ll assume $\lambda$ is an infinite regular cardinal.

Finite number of features ($\mu < \aleph_0$)

In this section, we assume that the number of features $\mu$ is finite. Let’s consider $\lambda$ cards $c_\alpha$ and extract a Set $S$ by induction as follows:

  • $S_0 = \{ c_\alpha : \alpha < \lambda \}$
  • For any $\beta < \mu$, we construct $S_{\beta+1} \subseteq S_{\beta}$ of cardinality $\lambda$. We note that $\lambda$ is regular and $S_\beta = \bigcup_{\alpha \in \Phi_\beta^{S_\beta}(S_\beta)} (\Phi_\beta^{S_\beta})^{-1}(\{\alpha\})$, so there are only two possible cases:

    • If $\Phi_\beta^{S_\beta}(S_\beta)$ is of cardinality $\lambda$ then pick $\lambda$ elements of $S_\beta$ with pairwise distinct images by $\Phi_\beta^{S_\beta}$ and let them form $S_{\beta+1}$.
    • Otherwise, if there is $\alpha < \lambda$ such that $(\Phi_\beta^{S_\beta})^{-1}(\{\alpha\})$ is of cardinality $\lambda$, then let it be our $S_{\beta+1}$.
  • $S = S_{\mu}$

Then by construction, $S$ is of size $\lambda$ and for any $\beta < \mu$, $S \subseteq S_{\beta+1}$, which means that $\Phi_\beta^S = (\Phi_\beta^{S_\beta})_{|S}$ is either constant or one-to-one.

Incidentally, although I said I would focus on the case $\kappa = \lambda$, the result of this section shows that we can extract a Set if more than $\lambda$ cards are put on the table!

Summary and open questions

Above are the results I found from a preliminary investigation, which can be summarized as follows:

  1. If $\lambda \leq 2$ or $\mu \leq 1$ then we can always find a Set from $\kappa \geq \lambda$ cards.
  2. If $3 \leq \lambda \leq \mu$ then for any $\kappa$ such that $\lambda \leq \kappa \leq \mu$ there is a set of $\kappa$ cards from which we cannot extract any Set.
  3. If $2 \leq \mu < \lambda < \aleph_0$ then there is a set of $\lambda$ cards from which we cannot extract any Set.
  4. If $2 \leq \mu$ and $\lambda$ is singular then there is a set of $\lambda$ cards from which we cannot extract any Set.
  5. If $2 \leq \mu < \aleph_0 \leq \mathrm{cf}(\lambda) = \lambda$, then we can always find a Set from $\kappa \geq \lambda$ cards.

Note that for the standard game ($3 = \lambda < \mu = 4$) the only one of the results above that applies is (2). Indeed, having only three or four cards on the table is generally not enough to extract a Set!

So far, I was not able to find an answer for the case $\aleph_0 \leq \mu < \mathrm{cf}(\lambda) = \lambda \leq \kappa$. It looks like the inductive construction from the previous section could work, but it’s not clear what guarantees that taking intersections at limit steps would preserve size $\kappa$ (an idea would be to use closed unbounded $S_\beta$ instead, but I didn’t find a satisfying proof). I also failed to build a counter-example set of $\lambda$ cards without any Set subset, despite several attempts.

More generally, an open problem is to determine the minimal number of cards $\kappa$ (with $\lambda \leq \kappa \leq \lambda^\mu$) to put on the table to ensure players can always extract a Set subset… or even whether such a number actually exists! If it does, then in cases (2), (3) and (4) we only know $\kappa > \lambda$. In cases (1) and (5) the minimum value $\kappa = \lambda$ works; and when $\mu \geq 2$ and $\lambda \geq 3$ are finite, the maximum value $\kappa = \lambda^\mu$ means taking the full deck, which works too (e.g. it always contains the Set given by $\forall \alpha < \lambda, \forall \beta < \mu, c_\alpha(\beta) = \alpha$). Incidentally, note that the latter case is consistent with (2) and (3) since we have $\lambda^\mu > \mu, \lambda$. But in general for infinite parameters, putting $\kappa = \lambda^\mu$ cards on the table does not mean putting the full deck, so it’s less obvious whether we can extract a Set.

June 14, 2023 12:00 AM

June 12, 2023

Igalia Compilers Team

QuickJS: An Overview and Guide to Adding a New Feature

In a previous blog post, I briefly mentioned QuickJS (QJS) as an alternative implementation of JavaScript (JS) that does not run in a web browser. This time, I'd like to delve deeper into QJS and explain how it works.

First, some remarks on QJS's history and overall architecture. QJS was written by Fabrice Bellard, who you may know as the original author of Qemu and FFmpeg, and was first released in 2019. QJS is primarily a bytecode interpreter (with no JIT compiler tiers) that can execute JS relatively quickly.

You can invoke QJS from the command-line like NodeJS and similar systems:

$ echo "console.log('hello world');" > hello.js
$ qjs hello.js # qjs is the main executable for quickjs
hello world

QJS comes with another tool called qjsc that can produce small executable binaries from JS source code. It does so by embedding QJS bytecode in C code that links with the QJS runtime, which avoids the need to parse JS to bytecode at runtime.

The following example demonstrates this (note: feel free to skip over the details of this C code output; it's not crucial for the rest of the post):

$ qjsc hello.js -e -o hello.c # qjsc compiles the JS instead of running directly
$ cat hello.c
/* File generated automatically by the QuickJS compiler. */

#include "quickjs-libc.h"

const uint32_t qjsc_hello_size = 78;

const uint8_t qjsc_hello[78] = {
0x02, 0x04, 0x0e, 0x63, 0x6f, 0x6e, 0x73, 0x6f,
0x6c, 0x65, 0x06, 0x6c, 0x6f, 0x67, 0x16, 0x68,
0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72,
0x6c, 0x64, 0x10, 0x68, 0x65, 0x6c, 0x6c, 0x6f,
0x2e, 0x6a, 0x73, 0x0e, 0x00, 0x06, 0x00, 0xa0,
0x01, 0x00, 0x01, 0x00, 0x03, 0x00, 0x00, 0x14,
0x01, 0xa2, 0x01, 0x00, 0x00, 0x00, 0x38, 0xe1,
0x00, 0x00, 0x00, 0x42, 0xe2, 0x00, 0x00, 0x00,
0x04, 0xe3, 0x00, 0x00, 0x00, 0x24, 0x01, 0x00,
0xcd, 0x28, 0xc8, 0x03, 0x01, 0x00,
};

static JSContext *JS_NewCustomContext(JSRuntime *rt)
{
    JSContext *ctx = JS_NewContextRaw(rt);
    if (!ctx)
        return NULL;
    JS_AddIntrinsicBaseObjects(ctx);
    JS_AddIntrinsicDate(ctx);
    JS_AddIntrinsicEval(ctx);
    JS_AddIntrinsicStringNormalize(ctx);
    JS_AddIntrinsicRegExp(ctx);
    JS_AddIntrinsicJSON(ctx);
    JS_AddIntrinsicProxy(ctx);
    JS_AddIntrinsicMapSet(ctx);
    JS_AddIntrinsicTypedArrays(ctx);
    JS_AddIntrinsicPromise(ctx);
    JS_AddIntrinsicBigInt(ctx);
    return ctx;
}

int main(int argc, char **argv)
{
    JSRuntime *rt;
    JSContext *ctx;
    rt = JS_NewRuntime();
    js_std_set_worker_new_context_func(JS_NewCustomContext);
    js_std_init_handlers(rt);
    JS_SetModuleLoaderFunc(rt, NULL, js_module_loader, NULL);
    ctx = JS_NewCustomContext(rt);
    js_std_add_helpers(ctx, argc, argv);
    js_std_eval_binary(ctx, qjsc_hello, qjsc_hello_size, 0);
    js_std_loop(ctx);
    JS_FreeContext(ctx);
    JS_FreeRuntime(rt);
    return 0;
}

It's possible to embed parts of this C output into a larger program, for adding the ability to script a system in JS for example. You can also compile it, along with the QJS runtime, to WebAssembly (as is done in tools such as the Bytecode Alliance's Javy).

QJS as it exists today supports many features in the JS standard, but not all of them. What if you need to extend it to support modern JS features? Where would you start?

To address these questions, the rest of this post explains some of the internals of QJS by walking through the implementation of a new feature. The feature that we will explore is the ergonomic brand checks for private fields proposal, which I picked because it is a relatively simple and straightforward feature to implement. This proposal reached stage 4 in the TC39 process in 2021, and is currently part of the official ECMAScript 2022 standard.

Before getting into the details of adding the new feature, we'll first start with an explanation of what the proposal we are exploring actually does. After that, I'll explain how QJS processes JS code at a high-level before diving into the details of how to implement this proposal.

Explaining "ergonomic brand checks for private fields" #

The proposal we'll be exploring is titled "Ergonomic brand checks for private fields", which for the rest of this post I'll shorten to "private brand checks". Since ES2022, JS has supported private fields in classes. For example, you can declare a private field as follows:

class Foo {
  #priv = 0; // private field declaration (needed for #priv to be in scope)
  get() { return this.#priv; }
}

new Foo().get(); // returns 0
new Foo().#priv; // error, it's private

Note that the # syntax is special and only allowed for private field names. Ordinary identifiers cannot be used to define a private field.

Private brand checks, also added in ES2022, are just a way to check if a given object has a given private field with a convenient syntax. For example, the isFoo static method in the following snippet uses a private brand check:

class Foo {
  #priv; // necessary declaration
  static isFoo(obj) { return #priv in obj; } // brand check for #priv
}

class Bar {
  #priv; // a different #priv than above!
}

Foo.isFoo(new Foo()); // returns true
Foo.isFoo({}); // returns false
Foo.isFoo(new Bar()); // returns false

The example shows that the proposal overloads the behavior of in so that if the left-hand side is a private field name, it checks for the presence of that private field. Note that since private names are scoped to the class, private names that look superficially identical in different classes may not pass the same brand checks (as the example above showed).

Now that we know what this proposal does, let's talk about what it takes to implement it. Before explaining the nitty-gritty details, we'll first talk about the architecture of QJS at a high-level.

Architecture overview #

Most people probably run JS code in a web browser or via a runtime like NodeJS, Deno, or Bun that uses those browsers' JS engines. These engines typically use a tiered implementation strategy in which code often starts running in an interpreter and then tiers up to a compiler, perhaps multiple compilers, to produce faster code (see this blog post by Lin Clark for a high-level overview).

These engines typically also compile the JS source program into bytecode, an intermediate form that can be interpreted and compiled more easily than the source code or its parsed abstract syntax tree (AST).

QJS shares some of these steps, in that it also compiles JS to bytecode and then interprets the bytecode. However, it has no additional execution tiers.

While web browsers generally have to fetch JS source code and compile it to bytecode while running (though there is bytecode caching to optimize this), when QJS emits an executable (e.g., via qjsc from earlier) it avoids the runtime parsing step by compiling the bytecode into the executable.

The QJS bytecode is designed for a stack machine (unlike, say, V8's Ignition interpreter which uses a register machine). That is, the operations in the bytecode fetch data from the runtime system's stack. WebAssembly (Wasm) made a similar choice, which reflects a goal shared by both Wasm and QJS to produce small binaries. A stack machine can save overhead in instruction encoding because the instructions do not specify register names to fetch operands from. Instead, instructions just fetch their operands from the stack.
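To illustrate the difference, here is a toy stack-machine interpreter (a sketch with made-up opcodes, not QuickJS's real instruction set): note that no instruction names a register; everything flows through the stack.

type Op = { kind: "push"; value: number } | { kind: "add" };

// Operands come from the stack, so instructions don't encode register
// names; this is what keeps the instruction encoding small.
function run(ops: Op[]): number {
  const stack: number[] = [];
  for (const op of ops) {
    if (op.kind === "push") {
      stack.push(op.value);
    } else {
      const b = stack.pop()!;
      const a = stack.pop()!;
      stack.push(a + b);
    }
  }
  return stack.pop()!;
}

// `1 + 2` compiles post-order to: push 1, push 2, add
run([{ kind: "push", value: 1 }, { kind: "push", value: 2 }, { kind: "add" }]); // 3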

Thus, the overall operation of QJS is that it parses a JS file and creates a representation of the module or script, which contains some functions. Each function is compiled to bytecode. Then QJS interprets that bytecode to execute the program.

Diagram illustrating the steps in the execution pipeline for QuickJS

Adding support for a new proposal will affect several parts of this pipeline. In the case of private brand checks, we will need to modify the parser to accept the new syntax, add a new bytecode to represent the new operation, and add a new case in the core interpreter loop to implement that operation.

With that high-level overview in mind, we'll dive into specific parts of QJS in the following sections. Since QJS is written in C (in fact, the bulk of the system is contained in a single 10k+ line C file), I'll be showing example snippets of C code to show what needs to change to implement private brand checks.

Parser #

The typical parsing pass in JS engines translates the JS source code to an internal AST representation. There is a separate bytecode generation pass that walks the AST and linearizes its structure into bytecodes.

QJS fuses these two passes and directly generates bytecode while parsing the source code. While this saves execution time, it does add its own kind of complexity.

To understand parsing, it's useful to know where QJS kicks off the process. JS_EvalInternal is the entry point for evaluating JS code. This can either evaluate and construct the runtime representation of a script or module in order to execute it, or just compile it to bytecode to emit to a file.

In turn, this will first run the lexer to create a tokenized version of the source code. Afterwards, it calls js_parse_program to parse the tokenized source code. The parser has its own state (JSParseState) which contains information on where the parser is in the token stream, the bytecodes emitted so far, and so on.

The parser broadly follows the structure of the JS specification's grammar, in which statements and expressions are organized in a particular nesting structure to avoid ambiguity. For modifying how the in operator gets parsed, we'll be interested in how relational expressions in particular are parsed. As relational expressions are a kind of binary operator expression, they're handled in QJS by the js_parse_expr_binary function. That function handles binary operators by "level", corresponding to how they nest in the formal grammar. The bottom level consists of multiplicative expressions, up to bitwise logical operators. The in operator is handled at level 5, along with other relational operators like <.

Since QJS will output the stack bytecode instructions in a single pass, it's necessary in a binary expression like expr_1 in expr_2 to first parse expr_1 and emit its bytecode, then parse expr_2 and emit that, then finally emit the bytecode for OP_in (i.e., it's a post-order traversal of the AST, since stack instructions are essentially postfix).
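Concretely, for an expression like a in b the emission order would look something like this (the opcode names below are illustrative assumptions, not QuickJS's exact mnemonics):

get_var a      ; evaluate expr_1 and push its result
get_var b      ; evaluate expr_2 and push its result
in             ; pop both operands, push a boolean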

We won't need to change js_parse_expr_binary for private brand checks, as the main difference from normal in operators is how the left-hand side is parsed. For that, we'll be interested in js_parse_postfix_expr, which parses references to variable names (and is eventually called by js_parse_expr_binary). The js_parse_postfix_expr function, like most other parsing functions, has a switch statement that dispatches on different token types.

For example, there are tokens such as TOK_IDENT for ordinary identifiers for variables (e.g., foo) and TOK_PRIVATE_NAME for private field names (e.g., #foo). We will need to add a new case for private field tokens in the switch for js_parse_postfix_expr:

    case TOK_PRIVATE_NAME:
        {
            JSAtom name;
            // Only allow this syntax if the next token is `in`.
            // The left-hand side of a private brand check can't be a nested
            // expression, it has to specifically be a private name.
            if (peek_token(s, FALSE) != TOK_IN)
                return -1;
            // I'll explain a bit about atoms later. This code extracts
            // a handle for the string content of the private name.
            name = JS_DupAtom(s->ctx, s->token.u.ident.atom);
            if (next_token(s))
                return -1;
            // This is a new bytecode that we'll add that checks that the
            // private field is valid and produces data for the `in` operator.
            emit_op(s, OP_scope_ref_private_field);
            // These are the arguments for the above opcode in the instruction stream.
            emit_u32(s, name);
            emit_u16(s, s->cur_func->scope_level);
            break;
        }

This case allows a private name to appear, and only allows it if the next token in the stream is in. We need the restriction because we don't want the private name to appear in any other expression, as those are invalid (private names should otherwise only appear in declarations in classes or in expressions like this.#priv).

It also emits the bytecode for this expression, which uses a new scope_ref_private_field operator that we add. When new opcodes get added, they're defined in quickjs-opcode.h. The scope_ref_private_field opcode is a new variant on existing opcodes like scope_get_private_field that are already defined in that header.

The scope_ref_private_field operator actually never appears in executable bytecode, and only appears temporarily as input to another pass. When I said bytecode is emitted from the parser in a single pass earlier, this was actually a slight simplification. After the initial parse, the bytecode goes through a scope resolution phase (see resolve_variables) where certain kinds of scope violations are ruled out. For example, the phase would signal an error on the following code:

// Invalid example
class Foo {
// missing declaration of #priv
foo(obj) { return #priv in obj; } // #priv is unbound
}

There's also an optimization pass on the bytecode to obtain some speedups in interpretation later.

In the scope resolution phase, scope_ref_private_field is translated to a get_var_ref operation, which looks up a variable in the runtime environment. This will resolve a variable to an index that the runtime can use to look up the private field in an object's property table. The reason we add this new operation is that existing operations like scope_get_private_field also get translated to do the actual field lookup in the object immediately, whereas we want to wait until the in operator is executed in order to do that.

Interpreter and runtime #

Once the bytecode compilation process is finished, the interpreter can start executing the program. QJS treats everything uniformly by considering all execution to take place in a function, so for example the code that runs in a module or script top-level is also in a special kind of function.

Therefore, all execution in QJS takes place in a core interpreter loop which runs a function body. It loads the bytecode for that function body and repeatedly runs the operations specified by the bytecode until it reaches the end. When executing the bytecode, the interpreter also maintains a runtime stack that stores temporary values produced by the operators. The interpreter allocates exactly enough stack space to run a particular function; the compiler pre-computes the max stack size for each function and encodes it in the bytecode format.

To add a new instruction, usually you add a new case to the big switch statement in the main interpreter loop in JS_CallInternal. Since we're just extending an existing operator, this case already exists. So instead, we need to extend the helper function js_operator_in. An annotated version of that function looks like this:

// Note: __exception is a QJS convention to warn if the result is unused
static __exception int js_operator_in(JSContext *ctx, JSValue *sp)
{
JSValue op1, op2;
JSAtom atom;
int ret;

// Reference the values in the top two stack slots
// op1 is the result of executing the left-hand side of the `in`
// op2 is the result of executing the right-hand side of the `in`
op1 = sp[-2];
op2 = sp[-1];

// op2 is the right-hand-side of `in`, which must be a JS object
if (JS_VALUE_GET_TAG(op2) != JS_TAG_OBJECT) {
JS_ThrowTypeError(ctx, "invalid 'in' operand");
return -1;
}

// Atoms are covered in more detail below
// but generally this just converts a string or symbol to a
// handle to an interned string, or it's a tagged number
atom = JS_ValueToAtom(ctx, op1);
if (unlikely(atom == JS_ATOM_NULL))
return -1;

// Look up if the property corresponding to left-hand-side name exists in the object.
ret = JS_HasProperty(ctx, op2, atom);

// QJS also has a reference-counting garbage collector. We need to appropriately
// free (i.e, decrement refcounts) on values when we stop using them.
JS_FreeAtom(ctx, atom);
if (ret < 0)
return -1;
JS_FreeValue(ctx, op1);
JS_FreeValue(ctx, op2);

// Push a boolean onto the top stack slot
// Note: the stack is shrunk after this by the main loop, so -2 is the top.
sp[-2] = JS_NewBool(ctx, ret);

return 0;
}

At this point in the code, the results of evaluating the left- and right-hand side expressions of an in are already on the stack. These are JS values, so now might be a good time to talk about how values are represented in QJS.

Object Representation #

All JS engines have their own internal representation of JS values, which include primitive values such as symbols and numbers and also object values. Since JS is dynamically typed, a given function can be called with all kinds of values, so the engine's representation needs a way to distinguish the values to appropriately signal an error, or choose the correct operation.

To do this, values need to come with some kind of tag. Some engines use a tagging scheme such as NaN-boxing to store all values inside the bit pattern of a 64-bit floating point number (using the different kinds of NaNs that exist in the IEEE-754 standard to distinguish cases). My colleague Andy Wingo wrote a blog post on this topic a while ago, laying out various options that JS engines use.

QJS uses a much simpler scheme, and dedicates 128 bits to each JS value. Half of that is the payload (a 64-bit float, pointer, etc.) and half is the tag value. The following definitions show how this is represented in C:

typedef union JSValueUnion {
    int32_t int32;
    double float64;
    void *ptr;
} JSValueUnion;

typedef struct JSValue {
    JSValueUnion u;
    int64_t tag;
} JSValue;

On 32-bit platforms there is a different tagging scheme that I won't detail other than to note that it uses NaN-boxing with a 64-bit representation.

For the most part, the representation details are abstracted by various macros like JS_VALUE_GET_TAG used in the example code above, so there won't be much need to directly interact with the value representation in this post.

Reference counting and objects #

Compound data, such as objects and strings, are tracked by a relatively simple reference counting garbage collector in QJS. This is in contrast to the much more complex collectors in web engines, such as WebKit's Riptide, that have different design tradeoffs and requirements such as the need for concurrency. There's a lot more to say about how reference counting and compound data work in QJS, but I'll save most of those details for a future post.

Atoms and strings #

Certain data types have a special representation because they are so common and are used repeatedly in the program: small integers and strings, which correspond to property names, symbols, private names, and so on. QJS uses a datatype called an Atom for these cases (which has already appeared in code examples above).

An atom is a handle that is either tagged as an integer, or is an index that refers to an interned string, i.e., a unique string that is only allocated once and stored in a hash table. Atoms that appear in the program's bytecode are also serialized in the bytecode format itself, and are loaded into the runtime table on initialization.

The data type JSAtom is defined as a uint32_t, so it's just a 32-bit integer. Properties of objects, for example, are always accessed with atoms as the property key. This means that property tables in objects just need to map atoms to the stored values.
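Here is a minimal sketch of the interning idea behind atoms (in TypeScript for brevity; QuickJS's real implementation is C, with a hash table and reference counts):

// Hedged sketch of string interning: each unique string is stored once
// and referred to by a small integer handle afterwards.
class AtomTable {
  private handles = new Map<string, number>();
  private strings: string[] = [];

  intern(s: string): number {
    const existing = this.handles.get(s);
    if (existing !== undefined) return existing;
    const handle = this.strings.length;
    this.strings.push(s);
    this.handles.set(s, handle);
    return handle;
  }

  toString(handle: number): string {
    return this.strings[handle];
  }
}

const atoms = new AtomTable();
atoms.intern("priv") === atoms.intern("priv"); // true: same handle both times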

You can see this in action with the JS_HasProperty lookup above, which looks like JS_HasProperty(ctx, op2, atom). This code looks up a key atom in the object op2's property table. In turn, atom comes from the line atom = JS_ValueToAtom(ctx, op1), which converts the property name value op1 into either an integer or a handle to an interned string.

Changing the operation to support private fields #

The actual change to js_operator_in to support private brand checks is very simple. In the case that the private field is a non-method field, the resolved private name lookup via get_var_ref pushes a symbol value onto the stack. This case doesn't require any changes.

In the case that the private field refers to a method, the name lookup pushes a function object onto the stack. We then need to run a private brand check with the target object and this private function, to ensure the private function really is part of the object.
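As a usage sketch of this method case (the class and method names are made up for illustration):

class Counter {
  #incr() {} // a private method: the name resolves to the method's function object
  static isCounter(obj: object) {
    return #incr in obj; // for methods, a brand check rather than a property lookup
  }
}

Counter.isCounter(new Counter()); // true
Counter.isCounter({});            // false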

At a high level, you can see the similarity between this operation and the runtime semantics described in the formal spec for the private brand check proposal.

The modified code looks like the following:

static __exception int js_operator_in(JSContext *ctx, JSValue *sp)
{
    JSValue op1, op2;
    JSAtom atom;
    int ret;

    op1 = sp[-2];
    op2 = sp[-1];

    if (JS_VALUE_GET_TAG(op2) != JS_TAG_OBJECT) {
        JS_ThrowTypeError(ctx, "invalid 'in' operand");
        return -1;
    }

    // --- New code here ---
    // This is the same as the previous code, but now under a conditional.
    // It doesn't need to change, because after resolving the private field
    // name to a symbol via `get_var_ref` the normal `JS_HasProperty` lookup
    // works.
    if (JS_VALUE_GET_TAG(op1) != JS_TAG_OBJECT) {
        atom = JS_ValueToAtom(ctx, op1);
        if (unlikely(atom == JS_ATOM_NULL))
            return -1;
        ret = JS_HasProperty(ctx, op2, atom);
        JS_FreeAtom(ctx, atom);
    // New conditional branch, in case the field operand is an object.
    // When a private method is referenced via `get_var_ref`, it actually
    // produces the function object for that method. We then can call
    // the `JS_CheckBrand` operation that is already defined to check the
    // validity of a private method call.
    } else {
        // JS_CheckBrand is modified to take a boolean (last arg) that
        // determines whether to throw on failure or just indicate the
        // success/fail state. This is needed as `in` doesn't throw when
        // the check fails, it just returns false.
        ret = JS_CheckBrand(ctx, op2, op1, FALSE);
    }
    // --- New code end ---

    if (ret < 0)
        return -1;
    JS_FreeValue(ctx, op1);
    JS_FreeValue(ctx, op2);

    sp[-2] = JS_NewBool(ctx, ret);

    return 0;
}

Testing #

We can validate this implementation against the official test262 tests. QJS comes with a test runner that can run against test262 (invoking make test2 will run it). Since we've added a new feature, we must also modify the tested features list in the test262 configuration file to specify that the feature should be tested. For private brand checks, we change class-fields-private-in=skip in that file to class-fields-private-in.

After changing the test file, the test262 tests for the private brand check feature all succeed with the exception of some syntax tests due to an existing bug with how in is parsed in general in QJS (the code function f() { "foo" in {} = 0; } should fail to parse, but errors at runtime instead in QJS).

Wrap-up #

With the examples above, I've walked through what it takes to add a relatively simple JS language feature to QuickJS. The private brand checks proposal just adds a new use of an existing syntax, so implementing it mostly just touches the parser and core interpreter loop. A feature that affects more of the language, such as adding a new datatype or changing how functions are executed, would obviously require more code and deeper changes.

The full changes required to implement this feature (other than test changes) can be reviewed in this patch.

In future posts, I'm planning to explain other parts of the QJS codebase and potentially explore how it's being used in the WebAssembly ecosystem.


Header image credit: https://www.pexels.com/photo/selective-focus-photography-of-train-610683/

June 12, 2023 12:00 AM

June 05, 2023

Alex Bradbury

2023Q2 week log

I tend to keep quite a lot of notes on the development-related work (sometimes at work, sometimes not) I do on a week-by-week basis, and thought it might be fun to write up the parts that were public. This may or may not be of wider interest, but it aims to be a useful aide-mémoire for my purposes at least. Weeks with few entries might be due to focusing on downstream work (or perhaps just a less productive week - I am only human!).

Week of 29th May 2023

Week of 22nd May 2023

Week of 15th May 2023

Week of 17th April 2023

  • Still pinging for an updated riscv-bfloat16 spec version that incorporates the fcvt.bf16.s encoding fix.
  • Bumped the version of the experimental Zfa RISC-V extension supported by LLVM to 0.2 (D146834). This was very straightforward as after inspecting the spec history, it was clear there were no changes that would impact the compiler.
  • Filed a couple of pull requests against the riscv-zacas repo (RISC-V Atomic Compare and Swap extension).
    • #8 made the dependency on the A extension explicit.
    • #7 attempted to explicitly reference the extension for misaligned atomics, though it seems it won't be merged. I do feel uncomfortable with RISC-V extensions that can have their semantics changed by other standard extensions without this possibility being called out very explicitly. As I note in the PR, failure to appreciate this might mean that conformance tests written for zacas might fail on a system with zacas_zam. I see a slight parallel to a recent discussion about RISC-V profiles.
  • Fixed the canonical ordering used for ISA naming strings in RISCVISAInfo (this will mainly affect the string stored in build attributes). This was fixed in D148615 which built on the pre-committed test case.
  • A whole bunch of upstream LLVM reviews. As noted in D148315, I'm thinking we should probably relax the ordering rules for ISA strings in -march in order to avoid issues due to spec changes and incompatibilities between GCC and Clang.
  • LLVM Weekly #485.

Week of 10th April 2023

Week of 3rd April 2023


Article changelog
  • 2023-06-05: Added notes for the week of 22nd May 2023 and week of 29th May 2023.
  • 2023-05-22: Added notes for the week of 15th May 2023.
  • 2023-04-24: Added notes for the week of 17th April 2023.
  • 2023-04-17: Added notes for the week of 10th April 2023.
  • 2023-04-10: Initial publication date.

June 05, 2023 12:00 PM

June 01, 2023

Manuel Rego

Web Engines Hackfest 2023 is coming

Next week Igalia is hosting a new edition of the Web Engines Hackfest in A Coruña.

As last year, we’ll be back at Palexco, an amazing venue, and we have around 100 people registered to participate onsite. You can check the full schedule of the event on the wiki page.

We hope it’s going to be a great week for everyone, and we’re looking forward to the event!

Talks

On Monday 5th there will be five talks:

  • JavaScript Modules: Past, Present, and Future by Nicolò Ribaudo
  • Inside Kotlin/Wasm (or how your language could benefit from new proposals, like GC, EH, TFR) by Zalim Bashorov
  • Status of the WPE & GTK WebKit ports by Žan Doberšek
  • Servo 2023 by Delan Azabani
  • Ladybird: Building a new browser from scratch by Andreas Kling

I’m really happy about the set of talks, and particularly excited about the chance to see presentations about some less-known web rendering engines like WPE, Servo and LibWeb. BTW, the talks will be live streamed on the Web Engines Hackfest YouTube channel.

Breakout Sessions

Apart from that, we’ll have breakout sessions as usual. But this year, remote participation will be allowed in these sessions; you don’t need to register or anything like that, just join the room linked from the GitHub issue of each breakout session at the planned time.

Breakout Session | Facilitator | Issue
Cross-Shadow Root IDREF associations | Alice Boxhall | #10
Getting into web engine contributing | CanadaHonk | #20
Maintenance of Chromium downstream | José Dapena Paz | #9
Servo | Martin Robinson | #16
Standards and Web Performance | Daniel Ehrenberg | #8
Test262, Testing JavaScript Conformance | Philip Chimento | #19
Updates on accelerated compositing in WebKitGTK | Carlos García Campos | #18
Wasm GC in JavaScriptCore | Zalim Bashorov | #12
Wayland | Antonio Gomes | #13
WebKit and Linux graphics | Žan Doberšek | #15
WebViews and Apps | Jonas Kruckenberg | #11
WinterCG | Andreu Botella | #14
Wolvic: An open source XR browser | Javier Fernández García-Boente | #17

More sessions might be scheduled during the event, so keep an eye on the hackfest wiki page and issues.

Sponsors

Last, but not least: thanks to the Web Engines Hackfest sponsors Arm, Google and Igalia; without your support this event wouldn’t be possible.

Web Engines Hackfest 2023 Sponsors

June 01, 2023 10:00 PM

May 31, 2023

Emmanuele Bassi

Constraints editing

Last year I talked about the newly added support for Apple’s Visual Format Language in Emeus, which lets you quickly describe layouts using a cross between ASCII art and predicates. For instance, I can use:

H:|-[icon(==256)]-[name_label]-|
H:[surname_label]-|
H:[email_label]-|
H:|-[button(<=icon)]
V:|-[icon(==256)]
V:|-[name_label]-[surname_label]-[email_label]-|
V:[button]-|

and obtain a layout like this one:

Boxes approximate widgets

Thanks to a contribution from my colleague Martin Abente Lahaye, Emeus now supports extensions to the VFL, namely:

  • arithmetic operators for constant and multiplication factors inside predicates, like [button1(button2 * 2 + 16)]
  • explicit attribute references, like [button1(button1.height / 2)]

This allows more expressive layout descriptions, like keeping aspect ratios between UI elements, without having to touch the code base.
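
For instance, with the new attribute references, a widget can be kept at a 2:1 aspect ratio with a single constraint (a hypothetical description; the widget name is just for illustration):

H:[image(image.height * 2)]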

Of course, editing VFL descriptions blindly is not what I consider a fun activity, so I took some time to write a simple, primitive editing tool that lets you visualize a layout expressed through VFL constraints:

I warned you that it was primitive and simple

Here’s a couple of videos showing it in action:

At some point, this could lead to a new UI tool to lay out widgets inside Builder and/or Glade.

As of now, I consider Emeus in a stable enough state for other people to experiment with it — I’ll probably make a release soon-ish. The Emeus website is up to date, as is the API reference, and I’m happy to review pull requests and feature requests.

by ebassi at May 31, 2023 02:36 PM

May 30, 2023

Emmanuele Bassi

Configuring portals

One of the things I’ve been recently working on at Igalia is the desktop portals implementation, the middleware layer of APIs for application and toolkit developers that allows sandboxed applications to interact with the host system. Sandboxing technologies like Flatpak and Snap expose the portal D-Bus interfaces inside the sandbox they manage, to handle user-mediated interactions like opening a file that exists outside of the locations available to the sandboxed process, or talking to privileged components like the compositor to obtain a screenshot.

Outside of allowing dynamic permissions for sandboxed applications, portals act as a vendor-neutral API for applications to target when dealing with Linux as an OS; this is mostly helpful for commercial applications that are not tied to a specific desktop environment, but don’t want to re-implement the layer of system integration from the first principles of POSIX primitives.

The architecture of desktop portals has been described pretty well in a blog post by Peter Hutterer, but to recap:

  • desktop portals are a series of D-Bus interfaces
  • toolkits and applications call methods on those D-Bus interfaces (see the example call after this list)
  • there is a user session daemon called xdg-desktop-portal that provides a service for the D-Bus interfaces
  • xdg-desktop-portal implements some of those interfaces directly
  • for the interfaces that involve user interaction, or interaction with desktop-specific services, we have separate services that are proxied by xdg-desktop-portal; GNOME has xdg-desktop-portal-gnome, KDE has xdg-desktop-portal-kde; Sway and wlroots-based compositors have xdg-desktop-portal-wlr; and so on, and so forth
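
As a concrete illustration of the second point, any process can call a portal method over the session bus; requesting a screenshot with the gdbus command-line tool looks roughly like this (the empty string is the parent window identifier, the empty dictionary holds the options):

gdbus call --session \
      --dest org.freedesktop.portal.Desktop \
      --object-path /org/freedesktop/portal/desktop \
      --method org.freedesktop.portal.Screenshot.Screenshot \
      "" '{}'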

There’s also xdg-desktop-portal-gtk, which acts a bit as a reference portal implementation, and as a shared desktop portal implementation for a lot of GTK-based environments. Ideally, every desktop environment should have its own desktop portal implementation, so that applications using the portal API can be fully integrated with each desktop’s interface guidelines and specialised services.

One thing that is currently messy is the mechanism by which xdg-desktop-portal finds the portal implementations available on the system, and decides which implementation should be used for a specific interface.

Up until the current stable version of xdg-desktop-portal, the configuration worked this way:

  1. each portal implementation (xdg-desktop-portal-gtk, -gnome, -kde, …) ships a ${NAME}.portal file; the file is a simple INI-like desktop entry file (see the example after this list) with the following keys:
    • DBusName, which contains the service name of the portal, for instance, org.freedesktop.impl.portal.desktop.gnome for the GNOME portals; this name is used by xdg-desktop-portal to launch the portal implementation
    • Interfaces, which contains a list of D-Bus interfaces under the org.freedesktop.impl.portal.* namespace that are implemented by the desktop-specific portal; xdg-desktop-portal will match the portal implementation with the public facing D-Bus interface internally
    • UseIn, which contains the name of the desktop to be matched with the contents of the $XDG_CURRENT_DESKTOP environment variable
  2. once xdg-desktop-portal starts, it finds all the .portal files in a well-known location and builds a list of portal implementations currently installed in the system, containing all the interfaces they implement as well as their preferred desktop environment
  3. whenever something calls a method on an interface in the org.freedesktop.portal.* namespace, xdg-desktop-portal will check the current desktop using the XDG_CURRENT_DESKTOP environment variable, and look for a portal whose UseIn key matches the current desktop
  4. once there’s a match, xdg-desktop-portal will activate the portal implementation and proxy the calls made on the org.freedesktop.portal interfaces over to the org.freedesktop.impl.portal ones
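
For instance, a hypothetical ${NAME}.portal file in the style described above would look like this (the values are made up for the example; real files list the interfaces each portal actually implements):

[portal]
DBusName=org.freedesktop.impl.portal.desktop.example
Interfaces=org.freedesktop.impl.portal.FileChooser;org.freedesktop.impl.portal.Screenshot
UseIn=example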

This works perfectly fine for the average case of a Linux installation with a single session, using a single desktop environment, and a single desktop portal. Where things get messy is the case where you have multiple sessions on the same system, each with its own desktop and portals, or even no portals whatsoever. In a bad scenario, you may get the wrong desktop portal just because the name sorts before the one you’re interested in, so you get the GTK “reference” portals instead of the KDE-specific ones; in the worst case scenario, you may get a stall when launching an application just because the wrong desktop portal is trying to contact a session service that simply does not exist, and you have to wait 30 seconds for a D-Bus timeout.

The problem is that some desktop portal implementations are shared across desktops, or cover only a limited set of interfaces; a mandatory list of desktop environments is far too coarse a tool to deal with this. Additionally, xdg-desktop-portal has to have enough fallbacks to ensure that, if it cannot find any implementation for the current desktop, it will proxy to the first implementation it can find in order to give a meaningful answer. Finally, since the supported desktops are shipped by the portals themselves, there’s no way for packagers, admins, or users to override this information.

After iterating over the issue, I ended up writing the support for a new configuration file. Instead of having portals say what kind of desktop environment they require, we have desktop environments saying which portal implementations they prefer. Now, each desktop should ship a ${NAME}-portals.conf INI-like desktop entry file listing each interface, and what kind of desktop portal should be used for it; for instance, the GNOME desktop should ship a gnome-portals.conf configuration file that specifies a default for every interface:

[preferred]
default=gnome

On the other hand, you could have a Foo desktop that relies on the GTK portal for everything, except for specific interfaces that are implemented by the “foo” portal:

[preferred]
default=gtk
org.freedesktop.impl.portal.Screenshot=foo
org.freedesktop.impl.portal.Screencast=foo

You could also disable all portals except for a specific interface (and its dependencies):

[preferred]
default=none
org.freedesktop.impl.portal.Account=gtk
org.freedesktop.impl.portal.FileChooser=gtk
org.freedesktop.impl.portal.Lockdown=gtk
org.freedesktop.impl.portal.Settings=gtk

Or, finally, you could disable all portal implementations:

[preferred]
default=none

A nice side effect of this work is that you can configure your own system, by dropping a portals.conf configuration file inside the XDG_CONFIG_HOME/xdg-desktop-portal directory; this should cover all the cases in which people assemble their desktop out of disparate components.
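
For example (a hypothetical override; the interface name is real, the choice of backends is just for illustration), a user who prefers the GNOME portals but wants the GTK file chooser could drop this into ~/.config/xdg-desktop-portal/portals.conf:

[preferred]
default=gnome
org.freedesktop.impl.portal.FileChooser=gtk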

By having desktop environments (or, in a pinch, the user themselves) owning the kind of portals they require, we can avoid messy configurations in the portal implementations, and clarify the intended behaviour to downstream packagers; at the same time, generic portal implementations can be adopted by multiple environments without necessarily having to know which ones upfront.


In a way, the desktop portals project is trying to fulfill the original mission of freedesktop.org’s Cross-desktop Group: a set of APIs that are not bound to a single environment, and can be used to define “the Linux desktop” as a platform.

Of course, there’s a lot of work involved in creating a vendor-neutral platform API, especially when it comes to designing both the user and the developer experiences; ideally, more people should be involved in this effort, so if you want to contribute to the Linux ecosystem, this is an area where you can make the difference.

by ebassi at May 30, 2023 01:59 PM

May 25, 2023

Samuel Iglesias

Closing a cycle

For the last four years I’ve served as a member of the X.Org Foundation Board of Directors, but a few days ago I stepped down, as my term ended and I had not run for re-election.

I started contributing to Mesa in 2014 and joined the amazing freedesktop community. Soon after, I joined the X.Org Foundation as a regular member in order to participate in the elections and get access to some interesting perks (VESA, Khronos Group). You can learn more about what the X.Org Foundation does in Ricardo’s blogpost.

But everything changed in 2018. That year, Chema and I organized XDC 2018 in A Coruña, Spain.

XDC 2018 photo

The following year, I ran in the yearly election for the X.Org Foundation’s board of directors (as it is a two-year term, we renew half of the board every year)… and I was elected! It was awesome! Almost immediately, I started coordinating XDC, and looking for organization proposals for the following XDC. I documented my experience organizing XDC 2018 in an attempt to make the job easier for future organizers, reducing the burden that organizing such a conference entails.

In 2021, I was re-elected and everything continued without changes (well, except the pandemic and having our first 2 virtual XDCs: 2020 and 2021).

Unfortunately, my term finished this year… and I did not run for re-election. The reasons were a mix of personal life commitments (having 2 kids changes your life completely) and new professional responsibilities. After those changes, I could not contribute as much as I wanted, and that was enough for me to pass the torch and let others contribute to the X.Org Foundation instead. Congratulations to Christopher Michael and Arek Hiler, I’m pretty sure you are going to do great!

Surprisingly enough, I am closing the cycle as it started: organizing X.Org Developers Conference 2023 in A Coruña, Spain from 17th to 19th October 2023.

A Coruña

I leave the board of directors having won friends and great memories. In case you are interested in participating in the community via the board of directors, prepare your candidacy for next year!

See you in A Coruña!

May 25, 2023 10:09 AM

May 24, 2023

Ricardo García

What is the X.Org Foundation, anyway?

A few weeks ago the annual X.Org Foundation Board of Directors election took place. The Board of Directors has 8 members at any given moment, and members are elected for 2-year terms. Instead of renewing the whole board every 2 years, half the board is renewed every year. Foundation members, who must apply for or renew membership every year, are the electorate in the process. Their main duty is voting in board elections and occasionally voting on other changes proposed by the board.

As you may know, thanks to the work I do at Igalia, and the trust of other Foundation members, I’m part of the board and currently serving the second year of my term, which will end in Q1 2024. Despite my merits coming from my professional life, I do not represent Igalia as a board member. However, to prevent companies from taking over the board, I must disclose my professional affiliation and we must abide by the rule that prohibits more than two people with the same affiliation from being on the board at the same time.

X.Org Logo
Figure 1. X.Org Logo by Wikipedia user Sven, released under the terms of the GNU Free Documentation License

Because of the name of the Foundation and for historical reasons, some people are confused about its purpose and sometimes they tend to think it acts as a governance body for some projects, particularly the X server, but this is not the case. The X.Org Foundation wiki page at freedesktop.org has some bits of information but I wanted to clarify a few points, like mentioning the Foundation has no paid employees, and explain what we do at the Foundation and the tasks of the Board of Directors in practical terms.

Cue the music.

(“The Who - Who Are You?” starts playing)

The main points would be:

  1. The Foundation acts as an umbrella for multiple projects, including the X server, Wayland and others.

  2. The board of directors has no power to decide who has to work on what.

  3. The largest task is probably organizing XDC.

  4. Being a director is not a paid position.

  5. The Foundation pays for project infrastructure.

  6. The Foundation, or its financial liaison, acts as an intermediary with other orgs.

Umbrella for multiple projects

Some directors have argued in the past that we need to change the Foundation name to something different, like the Freedesktop.org Foundation. With some healthy sense of humor, others have advocated for names like Freedesktop Software Foundation, or FSF for short, which should be totally not confusing. Humor or not, the truth is the X.Org Foundation is essentially the Freedesktop Foundation, so the name change would be nice in my own personal opinion.

If you take a look at the Freedesktop Gitlab instance, you can navigate to a list of projects and sort them by stars. Notable mentions you’ll find in the list: Mesa, PipeWire, GStreamer, Wayland, the X server, Weston, PulseAudio, NetworkManager, libinput, etc. Most of them closely related to a free and open source graphics stack, or free and open source desktop systems in general.

X.Org server unmaintained? I feel you

As I mentioned above, the Foundation has no paid employees and the board has no power to direct engineering resources to a particular project under its umbrella. It’s not a legal question, but a practical one. Is the X.Org server dying and nobody wants to touch it anymore? Certainly. Many people who worked on the X server are now working on Wayland and creating and improving something that works better in a modern computer, with a GPU that’s capable of doing things which were not available 25 years ago. It’s their decision and the board can do nothing.

On a tangent, I’m feeling a bit old now, so let me say when I started using Linux more than 20 years ago people were already mentioning most toolkits were drawing stuff to pixmaps and putting those pixmaps on the screen, ignoring most of the drawing capabilities of the X server. I’ve seen tearing when playing movies on Linux many times, and choppy animations everywhere. Attempting to use the X11 protocol over a slow network resulted in broken elements and generally unusable screens, problems which would not be present when falling back to a good VNC server and client (they do only one specialized thing and do it better).

For the last 3 or 4 years I’ve been using Wayland (first on my work laptop, nowadays also on my personal desktop) and I’ve seen it improve all the time. When using Wayland, animations are never choppy in my own experience, tearing is unheard of and things work more smoothly, as far as my experience goes. Thanks to using the hardware better, Wayland may also give you improved battery life. I’ve posted in the past that you can even use NVIDIA with Gnome on Wayland these days, and things are even simpler if you use an Intel or AMD GPU.

Naturally, there may be a few things which may not be ready for you yet. For example, maybe you use a DE which only works on X11. Or perhaps you use an app or DE which works on Wayland, but its support is not great and has problems there. If it’s an app, likely power users or people working on distributions can tune it to make it use XWayland by default, instead of Wayland, while bugs are ironed out.

X.Org Developers Conference

Ouch, there we have the “X.Org” moniker again…​

Back on track, if the Foundation can do nothing about the lack of people maintaining the X server and does not set any technical direction for projects, what does it do? (I hear you shouting “nothing!” while waving your fist at me.) One of the most time-consuming tasks is organizing XDC every year, which is arguably one of the most important conferences, if not the most important one, for open source graphics right now.

Specifically, the board of directors will set up a commission composed of several board members and other Foundation members to review talk proposals, select which ones will have a place at the conference, talk to speakers about shortening or lengthening their talks, and put them on a schedule to be used at the conference, which typically lasts 3 days. I chaired the paper committee for XDC 2022 and spent quite a lot of time on this.

The conference is free to attend for anyone and usually alternates location between Europe and the Americas. Some people may want to travel to the conference to present talks there but they may lack the budget to do so. Maybe they’re a student or they don’t have enough money, or their company will not sponsor travel to the conference. For that, we have travel grants. The board of directors also reviews requests for travel grants and approves them when they make sense.

But that is only the final part. The board of directors selects the conference contents and prepares the schedule, but the job of running the conference itself (finding an appropriate venue, paying for it, maybe providing some free lunches or breakfasts for attendees, handling audio and video, streaming, etc) falls into the hands of the organizer. I kid you not, it’s not easy to find someone willing to spend the needed amount of time and money organizing such a conference, so the work of the board starts a bit earlier. We have to contact people and request proposals to organize the conference. If we get more than one proposal, we have to evaluate and select one.

As the conference nears, we have to fire off some more emails and convince companies to sponsor XDC. This is also really important and takes time as well. Money gathered from sponsors is not only used for the conference itself and travel grants, but also to pay for infrastructure and project hosting throughout the whole year. Which takes us to…

Spending millions in director salaries

No, that’s not happening.

Being a director of the Foundation is not a paid position. Every year we struggle a bit to get enough candidates for the 4 positions up for election. Many times we have to extend the nomination period.

If you read news about the Foundation having trouble finding candidates for the board, that barely qualifies as news because it’s almost the same every year. Which doesn’t mean we’re not happy when people spread the news and we receive some more nominations, thank you!

Just as being an open source maintainer can sometimes be a thankless task, not everybody wants to volunteer and do time-consuming tasks for free. Running the board elections themselves, approving membership renewals and requests every year, and sending voting reminders also takes time. Believe me, I just did that a few weeks ago with help from Mark Filion from Collabora and technical assistance from Martin Roukala.

Project infrastructure

The Foundation spends a lot of money on project hosting costs, including Gitlab and CI systems, for projects under the Freedesktop.org umbrella. These systems are used every day and are fundamental for some projects and software you may be using if you run Linux. Running our own Gitlab instance and associated services helps keep the web decentralized and healthy, and provides more technical flexibility. Many people seem to appreciate those details, judging by the number of projects we host.

Speaking on behalf of the community

The Foundation also approaches other organizations on behalf of the community to achieve some stuff that would be difficult otherwise.

To pick one example, we’ve worked with VESA to provide members with access to various specifications that are needed to properly implement some features. Our financial liaison, formerly SPI and soon SFC, signs agreements with the Khronos Group that waive the fees for certifying open source implementations of their standards.

For example, you know RADV is certified to comply with the Vulkan 1.3 spec and the submission was made on behalf of Software in the Public Interest, Inc. Same thing for lavapipe. Similar for Turnip, which is Vulkan 1.1 conformant.

Conclusions

The song is probably over by now and you have a better idea of what the Foundation does, and what the board members do to keep the lights on. If you have any questions, please let me know.

May 24, 2023 07:52 AM

May 23, 2023

Samuel Iglesias

Joining the Linux Foundation Europe Advisory Board

Last year, the Linux Foundation announced the creation of the Linux Foundation Europe.

Linux Foundation Europe

The goal of the Linux Foundation Europe is, in a nutshell, to promote Open Source in Europe not only to individuals (via events and courses), but to companies (guidance and hosting projects) and European organizations. However, this effort needs the help of European experts in Open Source.

Thus, the Linux Foundation Europe (LFE) has formed an advisory board called the Linux Foundation Europe Advisory Board (LFEAB), which includes representatives from a cross-section of 20 leading European organizations within the EU, the UK, and beyond. The Advisory Board will play an important role in stewarding Linux Foundation Europe’s growing community, which now spans 100 member organizations from across the European region.

Early this year, I was invited to join the LFEAB as an inaugural member. I would not be in this position without the huge amount of work done by the rest of my colleagues at Igalia since the company was founded in 2001, which has paved the way for us to be one of the landmark consultancies specialized in Open Source, both globally and in Europe.

My presence in the LFEAB will help to share our experience, and help the Linux Foundation Europe to grow and spread Open Source everywhere in Europe.

Samuel Iglesias presented as Linux Foundation Europe Advisory Board member

I’m excited to participate in the Linux Foundation Europe Advisory Board! I and the rest of the LFEAB will be at the Open Source Summit Europe; send me an email if you want to meet to learn more about the LFEAB, about Igalia, or about how you can contribute more to Open Source.

Happy hacking!

May 23, 2023 11:30 AM

Brian Kardell

Says Who?

Thoughts on standards and the new baseline effort.

If you've been around me, or my writing, for any time at all, you've probably heard me ask "but what really makes it a standard"?

It is, for example, possible to have words approved in a standards body for which there are, for all intents and purposes, not much in the way of actual implementation or users. Conversely, it is possible to have things that had nothing to do with standards bodies and yet have dozens or hundreds of interoperable implementations and open licenses and are, in reality, much more universal.

At the end of the day, any real judgement kind of involves looking at the reality on the ground. It is a standard... when it is a standard.

I come from Pittsburgh, and in the Steel City, outside the locker room of the Steelers (our amazing football team) it says...

Text painted on the wall that says 'The standard... is the standard'.

See? It's right there on the wall.

Still, if this makes you uncomfortable, think about an English dictionary. The words in it are simply recognized as standards... because they are.

Where is the invisible line?

At some point, it seems, a thing crosses an invisible line and then it's standard. But only after a gradual process to reach that point. The very end of that process is really rather boring because it's really just stating the obvious.

But where is that magical line when it comes to "the reality on the ground" for web standards?

There's a new effort by the WebDX Community Group called "Baseline" which attempts to identify it, and I'm excited because it feels like it could be really valuable in several ways.

One that I am most keen on is using it to create a really high signal-to-noise channel for developers to subscribe to. If we define a line right, then something reaching that line is very newsworthy and pretty rare, so we can all afford to pay some attention to it. Imagine an RSS feed and social media accounts that posted very rarely and only told you this Very Serious Amazing News. Yes, please, give that access to my notifications! I feel like that would make everything feel a lot less overwhelming and also probably markedly speed real adoption at scale.

The really tricky thing here seems to be, mainly, that it's just really hard to define where that line is in a way everyone agrees with that is actually still useful as more than a kind of historical artifact. That's not to say that such an artifact isn't useful to future learners, but again, by that point this will just be common knowledge.

Stage {x}?

The new Baseline idea has a definition (as of the time of this writing, that is "supported in the last 2 major releases of certain popular browsers (Firefox, Samsung Internet, Safari and Chrome)"). There was a lot of debate about it before arriving at that. It also currently has a whole slew of issues about why that definition isn't great.

But maybe that's because there are actually several different lines and all of them are interesting in different ways. Think about a progress meter: there can be lots of lines along the way to "done".

The thing I like about the ECMA "Stages" model is that it's easy to visualize like that, and has no clever names: Just 0, 1, 2, 3, 4. Each of those is a 'line' you pass on the way to done. Maybe that kind of model works to discuss here too - we just need more numbers, because those are about ECMA 'done-ness' and not something like what baseline is trying to convey.

Something reaching stage 4 is a huge day, but it doesn't mean the on-the-ground-reality of "all users have support on their devices". In theory, at least, that could still take years to reach.

Conceptually speaking, we could imagine more interesting "lines" (plural) a thing would cross on the way to that day.

For example, the day we learn there is a final engine implementation in experimental builds passing tests is an interesting line. Maybe that's "Stage 5" in my analogy.

The day when it "ships without caveat in the last of the 'steward' browsers" is, in my opinion, a super interesting line (that is, the steward browser that primarily maintains the engine itself). That seems like an especially newsworthy day we should pay attention to because many people will be working on projects that won't ship for a long time, and maybe it's worth considering using it. Maybe that's like "Stage 6" in my analogy.

But that also doesn't mean all of the downstream browsers have released support - that can take time. If you're just starting on a months or year long project, that's probably pretty safe. There's always also a risk that downstream browsers can choose not to adopt a new feature for some reason. Many downstream browsers have considerable support differences with certain APIs (web speech, for example). Not much you can do about that, but what does it mean? Is Web Speech a standard? Is it “baseline”? There are at least tens of millions of browser instances out there that lack support, but a few billion that don't.

Even in the steward's own browser (ie, Chrome, Firefox or Safari), it's not as if releasing a new version is a lightswitch that updates all of the devices in the world. There are lots of things that prevent or delay updating: Corporate controls, OS limits/settings. In some cases, a user interaction is simply required and for whatever reason, people just... don't, for long stretches of time.

So, what should baseline use? Any of them? There are probably several useful 'lines' (or stages, if that is easier to imagine) worth discussing. I guess one of those can be called "baseline" - I'm just not really sure where that is in this spectrum. I'm curious to hear more of your thoughts!

Feel free to hit me up on any of the social medias. Tweet, toot or skeet at me if you like. Or, even better: If you're interested in contributing to the thinking around this, it's part of the Web Platform DX Community Group which you can participate in. This work is being tracked and discussed mainly in its Feature-Set Repository on GitHub. Participation is welcome.

May 23, 2023 04:00 AM

May 22, 2023

Maíra Canal

May Update: Finishing my Second Igalia CE

After finishing up my first Igalia Coding Experience in January, I got the amazing opportunity to keep working in the DRI community by extending my Igalia CE to a second round. Huge thanks to Igalia for providing me with this opportunity!

Another four months passed by and here I am completing another milestone with Igalia. Previously, in my final reports, I described GSoC as “an experience to get a better understanding of what open source is” and the first round of the Igalia CE as “an opportunity for me to mature my knowledge of technical concepts”. My second round of the Igalia CE was a period for broadening my horizons.

I had the opportunity to deepen my knowledge of a new programming language and learn more about Kernel Mode Setting (KMS). I took my time learning more about Vulkan and the Linux graphics stack. All of this new knowledge about the DRM infrastructure fascinated me and made me excited to keep developing.

So, this is a summary report of my journey at my second Igalia CE.

Wrapping Up


First, I took some time to wrap up the contributions of my previous Igalia CE. In my January Update, I described the journey to include IGT tests for V3D. But at the time, I hadn’t yet sent the final versions of the tests. Right when I started my second Igalia CE, I sent the final versions of the V3D tests, which were accepted and merged.

Series:
  • [PATCH i-g-t 0/6] V3D Job Submission Tests (Accepted)
  • [PATCH i-g-t 0/3] V3D Mixed Job Submission Tests (Accepted)

Rustgem


The first part of my Igalia CE was focused on rewriting the VGEM driver in Rust. VGEM (Virtual GEM Provider) is a minimal non-hardware-backed GEM (Graphics Execution Manager) service. It is used with non-native 3D hardware for buffer sharing between the X server and DRI.

The goal of the project was to explore Rust in the DRM subsystem and have a working VGEM driver written in Rust. Rust is a blazingly fast and memory-efficient language with a powerful ownership model. It was really exciting to learn more about Rust and implement a DRM driver from the ground up.

During the project, I wrote two blog posts describing the technical aspects of the rustgem driver. If you are interested in this project, check them out!

  • 28th February: Rust for VGEM
  • 22nd March: Adding a Timeout feature to Rustgem

By the end of the first half of the Igalia CE, I sent an RFC patch with the rustgem driver. Thanks to Asahi Lina, the Rust for Linux folks, and Daniel Vetter for all the feedback provided during the development of the driver. I still need to address some feedback and rebase the series on top of the new pin-init API, but I hope to see this driver upstream soon. You can check the driver’s current status in this PR.

Series:
  • [RFC PATCH 0/9] Rust version of the VGEM driver (In Review)

Apart from rewriting the VGEM driver, I also sent a couple of improvements to the C version of the VGEM driver and its IGT tests. I found a missing mutex_destroy in the code and also an unused struct.

Patches:
  • [PATCH] drm/vgem: add missing mutex_destroy (Accepted)
  • [PATCH] drm/vgem: Drop struct drm_vgem_gem_object (Accepted)

On the IGT side, I added some new tests to the VGEM tests. I wanted to ensure that my driver returned the correct values for all possible error paths, so I wrote this IGT test. Initially, it was just for me, but I decided to submit it upstream.

Series:
  • [PATCH v3 i-g-t 0/2] Add negative tests to VGEM (Accepted)

Virtual Kernel Mode Setting (VKMS)


Focusing on the VKMS was the major goal of the second part of my Igalia CE. Melissa Wen is one of the maintainers of the VKMS, and she provided me with a fantastic opportunity to learn more about it. Until then, I hadn’t dealt with displays, so learning new concepts in the graphics stack was great.

Rotating Planes

VKMS is a software-only KMS driver that is quite useful for testing and running X (or similar compositors) on headless machines. At the time, the driver didn’t have any support for optional plane properties, such as rotation and blend mode. Therefore, my goal was to implement the first plane property of the driver: rotation. I described the technicalities of this challenge in this blog post, but I can say that it was a nice challenge in this mentorship project.
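
For context, exposing the property boils down to a call to a DRM helper during plane initialization; a minimal sketch (not necessarily the exact VKMS patch; plane->base stands in for the driver’s embedded struct drm_plane) looks like this:

/* Advertise the rotation property on a KMS plane, defaulting to no
 * rotation while accepting every rotation and reflection value;
 * error handling elided. */
drm_plane_create_rotation_property(&plane->base,
                                   DRM_MODE_ROTATE_0,
                                   DRM_MODE_ROTATE_MASK |
                                   DRM_MODE_REFLECT_MASK);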

In the end, we have the first plane property implemented for the VKMS and it is already committed. Together with the VKMS part, I sent a series to the IGT mailing list with some improvements to the kms_rotation_crc tests. These improvements included adding new tests for rotation with offset and reflection and the isolation of some Intel-specific tests.

Series:
  • [PATCH v4 0/6] drm/vkms: introduce plane rotation property (Accepted)
  • [PATCH 1/2] drm/vkms: Add kernel-doc to the function vkms_compose_row() (In Review)
  • [PATCH i-g-t 0/4] kms_rotation_crc improvements and generalization (In Review)

Improvements

As I was working with the rotation series, I discovered a couple of things that could be improved in the VKMS driver. Last year, Igor Torrente sent a series to VKMS that changed the composition work in the driver. Before his series, the plane composition was executed on top of the primary plane. Now, the plane composition is executed on top of the CRTC.

Although his series was merged, some parts of the code still considered that the composition was executed on top of the primary plane, limiting the VKMS capabilities. So I sent a couple of patches to the mailing list, improving the handling of the primary plane and allowing full alpha blending on all planes.

Moreover, I sent a series that added a module parameter to set a background color to the CRTC. This work raised an interesting discussion about the need for this property by the user space and whether this parameter should be a KMS property.

Apart from introducing the rotation property to the VKMS driver, I also took my time to implement two other properties: alpha and blend mode. This series is still awaiting review, but it would be a nice addition to the VKMS, increasing its IGT test coverage rate.

Finally, I found a bug in the RGB565 conversion. The RGB565 conversion to ARGB16161616 involves some fixed-point operations and, when running the pixel-format IGT test, I verified that the RGB565 case was failing: some of those fixed-point operations were returning erroneous values. It turned out that the RGB coefficients weren’t being rounded when converted from fixed-point to integers, although rounding is needed to produce the proper coefficient values. Therefore, the fix was to implement a new helper that rounds the fixed-point value when converting it to an integer.
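
A minimal sketch of what such a rounding helper does, in plain C with a hypothetical Q32.32 layout standing in for the kernel’s drm_fixed.h types:

#include <stdint.h>

#define FP_SHIFT 32  /* hypothetical Q32.32 fixed-point layout */

/* Truncating conversion: drops the fractional bits, so 2.9 becomes 2. */
static inline int64_t fixp2int(int64_t a)
{
        return a >> FP_SHIFT;
}

/* Rounding conversion: add one half before truncating, so 2.9 becomes
 * 3. This is the behaviour the RGB565 coefficients needed. */
static inline int64_t fixp2int_round(int64_t a)
{
        return fixp2int(a + (1LL << (FP_SHIFT - 1)));
}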

After performing all this work on the VKMS, I sent a patch adding myself as a VKMS maintainer, which was acked by Javier Martinez and Melissa Wen. So now, I’m working together with Melissa, Rodrigo Siqueira and the rest of the DRI community to improve and maintain the VKMS driver.

Series:
  • [PATCH v2 0/2] Update the handling of the primary plane (Accepted)
  • [PATCH v2 1/2] drm/vkms: allow full alpha blending on all planes (Accepted)
  • [PATCH v2] drm/vkms: add module parameter to set background color (In Review)
  • [PATCH] drm/vkms: Implement all blend mode properties (In Review)
  • [PATCH v3 1/2] drm: Add fixed-point helper to get rounded integer values (Accepted)

Virtual Hardware

A couple of years ago, Sumera Priyadarsini, an Outreachy intern, worked on a Virtual Hardware functionality for the VKMS. The idea was to add a Virtual Hardware or vblank-less mode as a kernel parameter to enable VKMS to emulate virtual devices. This means no vertical blanking events occur and page flips are completed arbitrarily when required for updating the frame. Unfortunately, she wasn’t able to wrap things up and this ended up never being merged into VKMS.

Melissa suggested rebasing this series, and now we have the Virtual Hardware functionality working on the current VKMS. This was great work by Sumera, and my work here was just to adapt her changes to the new VKMS code.

Series:
  • [PATCH 0/2] drm/vkms: Enable Virtual Hardware support (In Review)

Bug Fixing!

Finally, I was in the last week of the project, just wrapping things up, when I decided to run the VKMS CI. I had recently committed the rotation series and I had run the CI before, but to my surprise, I got the following output:

[root@fedora igt-gpu-tools]# ./build/tests/kms_writeback
IGT-Version: 1.27.1-gce51f539 (x86_64) (Linux: 6.3.0-rc4-01641-gb8e392245105-dirty x86_64)
(kms_writeback:1590) igt_kms-WARNING: Output Writeback-1 could not be assigned to a pipe
Starting subtest: writeback-pixel-formats
Subtest writeback-pixel-formats: SUCCESS (0.000s)
Starting subtest: writeback-invalid-parameters
Subtest writeback-invalid-parameters: SUCCESS (0.001s)
Starting subtest: writeback-fb-id
Subtest writeback-fb-id: SUCCESS (0.020s)
Starting subtest: writeback-check-output
(kms_writeback:1590) CRITICAL: Test assertion failure function get_and_wait_out_fence, file ../tests/kms_writeback.c:288:
(kms_writeback:1590) CRITICAL: Failed assertion: ret == 0
(kms_writeback:1590) CRITICAL: Last errno: 38, Function not implemented
(kms_writeback:1590) CRITICAL: sync_fence_wait failed: Timer expired
Stack trace:
  #0 ../lib/igt_core.c:1963 __igt_fail_assert()
  #1 [get_and_wait_out_fence+0x83]
  #2 ../tests/kms_writeback.c:337 writeback_sequence()
  #3 ../tests/kms_writeback.c:360 __igt_unique____real_main481()
  #4 ../tests/kms_writeback.c:481 main()
  #5 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
  #6 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
  #7 [_start+0x25]
Subtest writeback-check-output failed.
**** DEBUG ****
(kms_writeback:1590) CRITICAL: Test assertion failure function get_and_wait_out_fence, file ../tests/kms_writeback.c:288:
(kms_writeback:1590) CRITICAL: Failed assertion: ret == 0
(kms_writeback:1590) CRITICAL: Last errno: 38, Function not implemented
(kms_writeback:1590) CRITICAL: sync_fence_wait failed: Timer expired
(kms_writeback:1590) igt_core-INFO: Stack trace:
(kms_writeback:1590) igt_core-INFO:   #0 ../lib/igt_core.c:1963 __igt_fail_assert()
(kms_writeback:1590) igt_core-INFO:   #1 [get_and_wait_out_fence+0x83]
(kms_writeback:1590) igt_core-INFO:   #2 ../tests/kms_writeback.c:337 writeback_sequence()
(kms_writeback:1590) igt_core-INFO:   #3 ../tests/kms_writeback.c:360 __igt_unique____real_main481()
(kms_writeback:1590) igt_core-INFO:   #4 ../tests/kms_writeback.c:481 main()
(kms_writeback:1590) igt_core-INFO:   #5 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
(kms_writeback:1590) igt_core-INFO:   #6 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
(kms_writeback:1590) igt_core-INFO:   #7 [_start+0x25]
****  END  ****
Subtest writeback-check-output: FAIL (1.047s)

🫠 🫠 🫠 🫠 🫠 🫠 🫠 🫠 🫠 🫠 🫠 🫠 🫠 🫠 🫠 🫠

Initially, I thought I had introduced the bug with my rotation series. Turns out I had just made it more likely to happen. This bug had been hidden in VKMS for a while, showing up only on rare occasions. Yeah, I’m talking about a race condition… the kind of bug that just stays hidden in your code for a long while.

When I started to debug, I thought it was a performance issue. But then I increased the timeout to 10 seconds and even then the job wouldn’t finish. So, I thought that it could be a deadlock. But after inspecting the DRM internal locks and the VKMS locks, it didn’t seem to be the case.

Melissa pointed me to a hint: there was one framebuffer being leaked when removing the driver. I discovered that it was the writeback framebuffer. It meant that the writeback job was being queued, but it wasn’t being signaled. So, the problem was inside the VKMS locking mechanism.

After tons of GDB and ftrace, I was able to find out that the composer was being set twice without any calls to the composer worker. I changed the internal locks a bit and I was able to run the test repeatedly for minutes! I sent the fix for review and now I’m just waiting for a Reviewed-by.

Patches:
  • [PATCH] drm/vkms: Fix race-condition between the hrtimer and the atomic commit (In Review)

While debugging, I found some things that could be improved in the VKMS writeback file. So, I decided to send a series with some minor improvements to the code.

Series:
  • [PATCH 0/3] drm/vkms: Minor Improvements (In Review)

Improving IGT tests


If you run all IGT KMS tests on the VKMS driver, you will see that some tests fail. That’s not what we would expect: we would expect all tests to pass or skip. The tests could fail due to errors in the VKMS driver or due to wrong expectations on the IGT side. So, in the final part of my Igalia CE, I inspected a couple of IGT failures and sent fixes to address the errors.

Linux Kernel

This patch is a revival of a series I sent in January to fix the IGT test kms_addfb_basic@addfb25-bad-modifier. This test also failed on VC4, and I investigated the reason in January. I sent a patch to guarantee that the test would pass and, after some feedback, I hit a dead end. So, I left this patch aside for a while and decided to pick it up again now. With this patch merged, we can guarantee that the test kms_addfb_basic@addfb25-bad-modifier passes on multiple drivers.

Patches:
  • [PATCH] drm/gem: Check for valid formats (Accepted)

IGT

On the IGT side, I sent a couple of improvements to the tests. The failure was usually just a scenario that the test didn’t consider. For example, the kms_plane_scaling test was failing on VKMS because it didn’t consider the case in which the driver doesn’t have the rotation property. As VKMS didn’t have the rotation property until recently, the tests were failing instead of skipping. Therefore, I developed a path for the tests to skip on drivers without the rotation property.
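
The skip path is essentially a guard like the following sketch (igt_plane_has_prop() and igt_skip() are existing IGT helpers; the surrounding variable names are illustrative):

/* Only try to set a rotation if the plane actually exposes the
 * property; otherwise skip instead of failing. */
if (rotation != IGT_ROTATION_0 &&
    !igt_plane_has_prop(plane, IGT_PLANE_ROTATION))
        igt_skip("Plane does not support the rotation property\n");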

I sent improvements to the kms_plane_scaling, kms_flip, and kms_plane tests, making the tests pass or skip in all cases on the VKMS.

Patches:
  • [PATCH i-g-t] tests/kms_plane_scaling: negative tests can return -EINVAL or -ERANGE (Accepted)
  • [PATCH i-g-t] tests/kms_plane_scaling: fix variable misspelling (Accepted)
  • [PATCH i-g-t] tests/kms_plane_scaling: remove unused parameters (Accepted)
  • [PATCH i-g-t] tests/kms_plane_scaling: Only set rotation if rotation != rotate-0 (Accepted)
  • [PATCH v2 i-g-t] tests/kms_flip: Check if is Intel device before doing all the setup (Accepted)
  • [PATCH i-g-t v2] tests/kms_plane: allow pixel-format tests to run on drivers without legacy LUT (In Review)

VKMS CI List

One important thing for VKMS is creating a baseline of generic KMS tests that should pass. This way, we can test new contributions against this baseline and avoid introducing regressions in the codebase. I sent a patch to IGT creating a testlist for the VKMS driver with all the KMS tests that must pass on it. This is great for maintenance, as we can run the testlist to ensure that the VKMS functionalities are preserved.

With new features being introduced in VKMS, it is important to keep the test list updated. So, I verified the test results and updated this test list during my time at the Igalia CE. I intend to keep this list updated as long as I can.
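
With the list in place, checking for regressions comes down to a single igt_runner invocation against it; roughly (the paths here are hypothetical and depend on your checkout and build directory):

# --test-list restricts the run to the tests named in the file.
./build/runner/igt_runner --test-list vkms.testlist build/tests results/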

Series:
  • [PATCH i-g-t] tests/vkms: Create a testlist to the vkms DRM driver (Accepted)
  • [PATCH i-g-t 0/3] tests/vkms: Update VKMS’s testlist (Accepted)

Acknowledgment


First, I would like to thank my great mentor Melissa Wen. Melissa and I are completing a year together as mentee and mentor and it has been an amazing journey. Since GSoC, Melissa has been helping me by answering every single question I have and providing me with great encouragement. I have a lot of admiration for her and I’m really grateful for having her as my mentor during the last year.

Also, I would like to thank Igalia for giving me this opportunity to keep working in the DRI community and learning more about this fascinating topic. Thanks to all the Igalians who helped me through this journey!

Moreover, I would like to thank the DRI community for reviewing my patches and giving me constructive feedback. Especially, I would like to thank Asahi Lina, Daniel Vetter and the Rust for Linux folks for all the help with the rustgem driver.

May 22, 2023 12:30 PM

May 20, 2023

Andy Wingo

approaching cps soup

Good evening, hackers. Today's missive is more of a massive, in the sense that it's another presentation transcript-alike; these things always translate to many vertical pixels.

In my defense, I hardly ever give a presentation twice, so not only do I miss out on the usual per-presentation cost amortization and on the incremental improvements of repetition, the more dire error is that whatever message I might have can only ever reach a subset of those that it might interest; here at least I can be more or less sure that if the presentation would interest someone, that they will find it.

So for the time being I will try to share presentations here, in the spirit of, well, why the hell not.

CPS Soup

A functional intermediate language

10 May 2023 – Spritely

Andy Wingo

Igalia, S.L.

Last week I gave a training talk to Spritely Institute collaborators on the intermediate representation used by Guile's compiler.

CPS Soup

Compiler: Front-end to Middle-end to Back-end

Middle-end spans gap between high-level source code (AST) and low-level machine code

Programs in middle-end expressed in intermediate language

CPS Soup is the language of Guile’s middle-end

An intermediate representation (IR) (or intermediate language, IL) is just another way to express a computer program. Specifically it's the kind of language that is appropriate for the middle-end of a compiler, and by "appropriate" I mean that an IR serves a purpose: there has to be a straightforward transformation to the IR from high-level abstract syntax trees (ASTs) from the front-end, and there has to be a straightforward translation from IR to machine code.

There are also usually a set of necessary source-to-source transformations on IR to "lower" it, meaning to make it closer to the back-end than to the front-end. There are usually a set of optional transformations to the IR to make the program run faster or allocate less memory or be more simple: these are the optimizations.

"CPS soup" is Guile's IR. This talk presents the essentials of CPS soup in the context of more traditional IRs.

How to lower?

High-level:

(+ 1 (if x 42 69))

Low-level:

  cmpi $x, #f
  je L1
  movi $t, 42
  j L2  
L1:
  movi $t, 69
L2:
  addi $t, 1

How to get from here to there?

Before we dive in, consider what we might call the dynamic range of an intermediate representation: we start with what is usually an algebraic formulation of a program and we need to get down to a specific sequence of instructions operating on registers (unlimited in number, at this stage; allocating to a fixed set of registers is a back-end concern), with explicit control flow between them. What kind of a language might be good for this? Let's attempt to answer the question by looking into what the standard solutions are for this problem domain.

1970s

Control-flow graph (CFG)

graph := array<block>
block := tuple<preds, succs, insts>
inst  := goto B
       | if x then BT else BF
       | z = const C
       | z = add x, y
       ...

BB0: if x then BB1 else BB2
BB1: t = const 42; goto BB3
BB2: t = const 69; goto BB3
BB3: t2 = addi t, 1; ret t2

Assignment, not definition

Of course in the early days, there was no intermediate language; compilers translated ASTs directly to machine code. It's been a while since I dove into all this but the milestone I have in my head is that it's the 70s when compiler middle-ends come into their own right, with Fran Allen's work on flow analysis and optimization.

In those days the intermediate representation for a compiler was a graph of basic blocks, but unlike today the paradigm was assignment to locations rather than definition of values. By that I mean that in our example program, we get t assigned to in two places (BB1 and BB2); the actual definition of t is implicit, as a storage location, and our graph consists of assignments to the set of storage locations in the program.

1980s

Static single assignment (SSA) CFG

graph := array<block>
block := tuple<preds, succs, phis, insts>
phi   := z := φ(x, y, ...)
inst  := z := const C
       | z := add x, y
       ...
BB0: if x then BB1 else BB2
BB1: v0 := const 42; goto BB3
BB2: v1 := const 69; goto BB3
BB3: v2 := φ(v0,v1); v3:=addi t,1; ret v3

Phi is phony function: v2 is v0 if coming from first predecessor, or v1 from second predecessor

These days we still live in Fran Allen's world, but with a twist: we no longer model programs as graphs of assignments, but rather graphs of definitions. The introduction in the mid-80s of so-called "static single-assignment" (SSA) form graphs mean that instead of having two assignments to t, we would define two different values v0 and v1. Then later instead of reading the value of the storage location associated with t, we define v2 to be either v0 or v1: the former if we reach the use of t in BB3 from BB1, the latter if we are coming from BB2.

If you think on the machine level, in terms of what the resulting machine code will be, this either function isn't a real operation; probably register allocation will put v0, v1, and v2 in the same place, say $rax. The function linking the definition of v2 to the inputs v0 and v1 is purely notational; in a way, you could say that it is phony, or not real. But when the creators of SSA went to submit this notation for publication they knew that they would need something that sounded more rigorous than "phony function", so they instead called it a "phi" (φ) function. Really.

2003: MLton

Refinement: phi variables are basic block args

graph := array<block>
block := tuple<preds, succs, args, insts>

Inputs of phis implicitly computed from preds

BB0(a0): if a0 then BB1() else BB2()
BB1(): v0 := const 42; BB3(v0)
BB2(): v1 := const 69; BB3(v1)
BB3(v2): v3 := addi v2, 1; ret v3

SSA is still where it's at, as a conventional solution to the IR problem. There have been some refinements, though. I learned of one of them from MLton; I don't know if they were first but they had the idea of interpreting phi variables as arguments to basic blocks. In this formulation, you don't have explicit phi instructions; rather the "v2 is either v1 or v0" property is expressed by v2 being a parameter of a block which is "called" with either v0 or v1 as an argument. It's the same semantics, but an interesting notational change.

Refinement: Control tail

Often nice to know how a block ends (e.g. to compute phi input vars)

graph := array<block>
block := tuple<preds, succs, args, insts,
               control>
control := if v then L1 else L2
         | L(v, ...)
         | switch(v, L1, L2, ...)
         | ret v

One other refinement to SSA is to note that basic blocks consist of some number of instructions that can define values or have side effects but which otherwise exhibit fall-through control flow, followed by a single instruction that transfers control to another block. We might as well store that control instruction separately; this would let us easily know how a block ends, and in the case of phi block arguments, easily say what values are the inputs of a phi variable. So let's do that.

Refinement: DRY

Block successors directly computable from control

Predecessors graph is inverse of successors graph

graph := array<block>
block := tuple<args, insts, control>

Can we simplify further?

At this point we notice that we are repeating ourselves; the successors of a block can be computed directly from the block's terminal control instruction. Let's drop those as a distinct part of a block, because when you transform a program it's unpleasant to have to needlessly update something in two places.

While we're doing that, we note that the predecessors array is also redundant, as it can be computed from the graph of block successors. Here we start to wonder: am I simplifying or am I removing something that is fundamental to the algorithmic complexity of the various graph transformations that I need to do? We press on, though, hoping we will get somewhere interesting.

Basic blocks are annoying

Ceremony about managing insts; array or doubly-linked list?

Nonuniformity: “local” vs “global” transformations

Optimizations transform graph A to graph B; mutability complicates this task

  • Desire to keep A in mind while making B
  • Bugs because of spooky action at a distance

Recall that the context for this meander is Guile's compiler, which is written in Scheme. Scheme doesn't have expandable arrays built-in. You can build them, of course, but it is annoying. Also, in Scheme-land, functions with side-effects are conventionally suffixed with an exclamation mark; after too many of them, both the writer and the reader get fatigued. I know it's a silly argument but it's one of the things that made me grumpy about basic blocks.

If you permit me to continue with this introspection, I find there is an uneasy relationship between instructions and locations in an IR that is structured around basic blocks. Do instructions live in a function-level array and a basic block is an array of instruction indices? How do you get from instruction to basic block? How would you hoist an instruction to another basic block, might you need to reallocate the block itself?

And when you go to transform a graph of blocks... well how do you do that? Is it in-place? That would be efficient; but what if you need to refer to the original program during the transformation? Might you risk reading a stale graph?

It seems to me that there are too many concepts, that in the same way that SSA itself moved away from assignment to a more declarative language, that perhaps there is something else here that might be more appropriate to the task of a middle-end.

Basic blocks, phi vars redundant

Blocks: label with args sufficient; “containing” multiple instructions is superfluous

Unify the two ways of naming values: every var is a phi

graph := array<block>
block := tuple<args, inst>
inst  := L(expr)
       | if v then L1() else L2()
       ...
expr  := const C
       | add x, y
       ...

I took a number of tacks here, but the one I ended up on was to declare that basic blocks themselves are redundant. Instead of containing an array of instructions with fallthrough control-flow, why not just make every instruction a control instruction? (Yes, there are arguments against this, but do come along for the ride, we get to a funny place.)

While you are doing that, you might as well unify the two ways in which values are named in an MLton-style compiler: instead of distinguishing between basic block arguments and values defined within a basic block, we might as well make all names into basic block arguments.
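
To illustrate (labels invented), a block that previously contained a definition followed by a jump:

BB1(): v0 := const 42; goto BB3(v0)

splits into one block per instruction, with every name bound as a block argument:

BB1(): BB1a(const 42)
BB1a(v0): BB3(v0)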

Arrays annoying

Array of blocks implicitly associates a label with each block

Optimizations add and remove blocks; annoying to have dead array entries

Keep labels as small integers, but use a map instead of an array

graph := map<label, block>

In the traditional SSA CFG IR, a graph transformation would often not touch the structure of the graph of blocks. But now, having given each instruction its own basic block, we find that transformations of the program necessarily change the graph. Consider an instruction that we elide: before, we would just remove it from its basic block, or replace it with a no-op. Now, we have to find its predecessor(s) and forward them to the instruction's successor. It would be useful to have a more capable data structure to represent this graph. We might as well keep labels as small integers, but allow for sparse maps and growth by using an integer-specialized map instead of an array.
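
As a sketch of working with such a map (the constructor names are the ones I recall from Guile's (language cps intmap) module; treat the exact signatures as assumptions):

(use-modules (language cps intmap))

;; labels are small integers; the map is sparse, so removing a
;; block leaves no dead entry behind, unlike a dense array
(define graph
  (intmap-add (intmap-add empty-intmap 0 'entry-cont)
              1 'other-cont))
(intmap-ref (intmap-remove graph 1) 0)  ;; => entry-cont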

This is CPS soup

graph := map<label, cont>
cont  := tuple<args, term>
term  := continue to L
           with values from expr
       | if v then L1() else L2()
       ...
expr  := const C
       | add x, y
       ...

SSA is CPS

This is exactly what CPS soup is! We came at it "from below", so to speak; instead of the heady fumes of the lambda calculus, we get here from down-to-earth basic blocks. (If you prefer the other way around, you might enjoy this article from a long time ago.) The remainder of this presentation goes deeper into what it is like to work with CPS soup in practice.

Scope and dominators

BB0(a0): if a0 then BB1() else BB2()
BB1(): v0 := const 42; BB3(v0)
BB2(): v1 := const 69; BB3(v1)
BB3(v2): v3 := addi v2, 1; ret v3

What vars are “in scope” at BB3? a0 and v2.

Not v0; not all paths from BB0 to BB3 define v0.

a0 always defined: its definition dominates all uses.

BB0 dominates BB3: All paths to BB3 go through BB0.

Before moving on, though, we should discuss what it means in an SSA-style IR that variables are defined rather than assigned. If you consider variables as locations to which values can be assigned and which initially hold garbage, you can read them at any point in your program. You might get garbage, though, if the variable wasn't assigned something sensible on the path that led to reading the location's value. It sounds bonkers but it is still the C and C++ semantic model.

If we switch instead to a definition-oriented IR, then a variable never has garbage; the single definition always precedes any uses of the variable. That is to say that all paths from the function entry to the use of a variable must pass through the variable's definition, or, in the jargon, that definitions dominate uses. This is an invariant of an SSA-style IR, that all variable uses be dominated by their associated definition.

You can flip the question around to ask what variables are available for use at a given program point, which might be read equivalently as which variables are in scope; the answer is: all definitions from all program points that dominate the use site. The "CPS" in "CPS soup" stands for continuation-passing style, a dialect of the lambda calculus, which also has a history of use as a compiler intermediate representation. But it turns out that if we use the lambda calculus in its conventional form, we end up needing to maintain a lexical scope nesting at the same time that we maintain the control-flow graph, and the lexical scope tree can fail to reflect the dominator tree. I go into this topic in more detail in an old article, and if it interests you, please do go deep.

CPS soup in Guile

Compilation unit is intmap of label to cont

cont := $kargs names vars term
      | ...
term := $continue k src expr
      | ...
expr := $const C
      | $primcall 'add #f (a b)
      | ...

Conventionally, entry point is lowest-numbered label

Anyway! In Guile, the concrete form that CPS soup takes is that a program is an intmap of label to cont. A cont is the smallest labellable unit of code. You can call them blocks if that makes you feel better. One kind of cont, $kargs, binds incoming values to variables. It has a list of variables, vars, and also has an associated list of human-readable names, names, for debugging purposes.

A $kargs contains a term, which is like a control instruction. One kind of term is $continue, which passes control to a continuation k. Using our earlier language, this is just goto k, with values, as in MLton. (The src is a source location for the term.) The values come from the term's expr, of which there are a dozen kinds or so, for example $const which passes a literal constant, or $primcall, which invokes some kind of primitive operation, which above is add. The primcall may have an immediate operand, in this case #f, and some variables that it uses, in this case a and b. The number and type of the produced values are a property of the primcall; some are just for effect, some produce one value, some more.
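
For example, a cont binding one incoming value and then adding two (already-defined) variables might be built along these lines; build-cont comes from (language cps), though the quoting here is a sketch and may differ in detail from the real macro syntax:

(build-cont
  ($kargs ('n) (n)                    ;; one human-readable name, one var
    ($continue k src
      ($primcall 'add #f (a b)))))    ;; add a and b; pass the result to k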

CPS soup

term := $continue k src expr
      | $branch kf kt src op param args
      | $switch kf kt* src arg
      | $prompt k kh src escape? tag
      | $throw src op param args

Expressions can have effects, produce values

expr := $const val
      | $primcall name param args
      | $values args
      | $call proc args
      | ...

There are other kinds of terms besides $continue: there is $branch, which proceeds either to the false continuation kf or the true continuation kt depending on the result of performing op on the variables args, with immediate operand param. In our running example, we might have made the initial term via:

(build-term
  ($branch BB1 BB2 'false? #f (a0)))

The definition of build-term (and build-cont and build-exp) is in the (language cps) module.

There is also $switch, which takes an unboxed unsigned integer arg and performs an array dispatch to the continuations in the list kt, or kf otherwise.

There is $prompt, which continues to its k, having pushed on a new continuation delimiter associated with the var tag; if code aborts to tag before the prompt exits via an unwind primcall, the stack will be unwound and control passed to the handler continuation kh. If escape? is true, the continuation is escape-only, and aborting to the prompt doesn't need to capture the suspended continuation.

Finally there is $throw, which doesn't continue at all, because it causes a non-resumable exception to be thrown. And that's it; it's just a handful of kinds of term, determined by the different shapes of control-flow (how many continuations the term has).

When it comes to values, we have about a dozen expression kinds. We saw $const and $primcall, but I want to explicitly mention $values, which simply passes on some number of values. Often a $values expression corresponds to passing an input to a phi variable, though $kargs vars can get their definitions from any expression that produces the right number of values.
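
For example, passing v on to a phi variable bound at continuation k might look like this, in the same hedged build-term style as the $branch example above:

(build-term
  ($continue k src ($values (v))))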

Kinds of continuations

Guile functions untyped, can return multiple values

Error if too few values, possibly truncate too many values, possibly cons as rest arg...

Calling convention: contract between val producer & consumer

  • both on call and return side

Continuation of $call unlike that of $const

When a $continue term continues to a $kargs with a $const 42 expression, there are a number of invariants that the compiler can ensure: that the $kargs continuation is always passed the expected number of values, that the vars that it binds can be allocated to specific locations (e.g. registers), and that because all predecessors of the $kargs are known, that those predecessors can place their values directly into the variable's storage locations. Effectively, the compiler determines a custom calling convention between each $kargs and its predecessors.

Consider the $call expression, though; in general you don't know what the callee will do to produce its values. You don't even generally know that it will produce the right number of values. Therefore $call can't (in general) continue to $kargs; instead it continues to $kreceive, which expects the return values in well-known places. $kreceive will check that it is getting the right number of values and then continue to a $kargs, shuffling those values into place. A standard calling convention defines how functions return values to callers.

The conts

cont := $kfun src meta self ktail kentry
      | $kclause arity kbody kalternate
      | $kargs names vars term
      | $kreceive arity kbody
      | $ktail

$kclause, $kreceive very similar

Continue to $ktail: return

$call and return (and $throw, $prompt) exit first-order flow graph

Of course, a $call expression could be a tail-call, in which case it would continue instead to $ktail, indicating an exit from the first-order function-local control-flow graph.

The calling convention also specifies how to pass arguments to callees, and likewise those continuations have a fixed calling convention; in Guile we start functions with $kfun, which has some metadata attached, and then proceed to $kclause, which bridges the boundary between the standard calling convention and the specialized graph of $kargs continuations. (Many details of this could be tweaked; for example, the case-lambda dispatch built into $kclause could instead dispatch to distinct functions rather than to different places in the same function. Historical accidents abound.)

As a detail, if a function is well-known, in that all its callers are known, then we can lighten the calling convention, moving the argument-count check to callees. In that case $kfun continues directly to $kargs. Similarly for return values, optimizations can make $call continue to $kargs, though there is still some value-shuffling to do.

High and low

CPS bridges AST (Tree-IL) and target code

High-level: vars in outer functions in scope

Closure conversion between high and low

Low-level: Explicit closure representations; access free vars through closure

CPS soup is the bridge between parsed Scheme and machine code. It starts out quite high-level, notably allowing for nested scope, in which expressions can directly refer to free variables. Variables are small integers, and for high-level CPS, variable indices have to be unique across all functions in a program. CPS gets lowered via closure conversion, which chooses specific representations for each closure that remains after optimization. After closure conversion, all variable access is local to the function; free variables are accessed via explicit loads from a function's closure.

Optimizations at all levels

Optimizations before and after lowering

Some exprs only present in one level

Some high-level optimizations can merge functions (higher-order to first-order)

Because of the broad remit of CPS, the language itself has two dialects, high and low. The high level dialect has cross-function variable references, first-class abstract functions (whose representation hasn't been chosen), and recursive function binding. The low-level dialect has only specific ways to refer to functions: labels and specific closure representations. It also includes calls to function labels instead of just function values. But these are minor variations; some optimization and transformation passes can work on either dialect.

Practicalities

Intmap, intset: Clojure-style persistent functional data structures

Program: intmap<label,cont>

Optimization: program→program

Identify functions: (program,label)→intset<label>

Edges: intmap<label,intset<label>>

Compute succs: (program,label)→edges

Compute preds: edges→edges

I mentioned that programs were intmaps, and specifically in Guile they are Clojure/Bagwell-style persistent functional data structures. By functional I mean that intmaps (and intsets) are values that can't be mutated in place (though we do have the transient optimization).

I find that immutability has the effect of deploying a sense of calm to the compiler hacker -- I don't need to worry about data structures changing out from under me; instead I just structure all the transformations that need doing as functions. An optimization is just a function that takes an intmap and produces another intmap. An analysis associating some data with each program label is just a function that computes an intmap, given a program; that analysis will never be invalidated by subsequent transformations, because the program to which it applies will never be mutated.
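
Here is a sketch of the style (the intmap/intset operations are assumed to have these shapes; the pass itself is invented for illustration):

;; an optimization is a pure function from program to program:
;; here, keep only those conts whose labels are in the live set
(define (remove-dead-conts program live)
  (intmap-fold (lambda (label cont out)
                 (if (intset-ref live label)
                     (intmap-add out label cont)
                     out))
               program
               empty-intmap))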

This pervasive feeling of calm allows me to tackle problems that I wouldn't have otherwise been able to fit into my head. One example is the novel online CSE pass; one day I'll either wrap that up as a paper or just capitulate and blog it instead.

Flow analysis

A[k] = meet(A[p] for p in preds[k])
         - kill[k] + gen[k]

Compute available values at labels:

  • A: intmap<label,intset<val>>
  • meet: intmap-intersect<intset-intersect>
  • -, +: intset-subtract, intset-union
  • kill[k]: values invalidated by cont because of side effects
  • gen[k]: values defined at k

But to keep it concrete, let's take the example of flow analysis. You might want to compute the "available values" at a given label: these are the values that are candidates for common subexpression elimination. For example, if a term is dominated by a car x primcall whose value is bound to v, and on no path from the definition of v to a subsequent car x primcall is that value invalidated by a side effect, then we can replace the second, duplicate operation with $values (v) instead.

There is a standard solution for this problem, which is to solve the flow equation above. I wrote about this at length ages ago, but looking back on it, the thing that pleases me is how easy it is to decompose the task of flow analysis into manageable parts, and how the types tell you exactly what you need to do. It's easy to compute an initial analysis A, easy to define your meet function when your maps and sets have built-in intersect and union operators, easy to define what addition and subtraction mean over sets, and so on.
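
As a sketch, the equation transcribes almost directly into code; the operator names below are the ones from the slide, with their exact signatures assumed:

(use-modules (srfi srfi-1))  ;; fold

;; A[k] = meet(A[p] for p in preds[k]) - kill[k] + gen[k]
(define (update-avail A preds kill gen k)
  (let* ((ins  (intset-fold (lambda (p acc)
                              (cons (intmap-ref A p) acc))
                            (intmap-ref preds k)
                            '()))
         (meet (if (null? ins)
                   empty-intset
                   (fold intset-intersect (car ins) (cdr ins)))))
    (intmap-replace A k
                    (intset-union (intset-subtract meet (intmap-ref kill k))
                                  (intmap-ref gen k)))))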

Persistent data structures FTW

  • meet: intmap-intersect<intset-intersect>
  • -, +: intset-subtract, intset-union

Naïve: O(nconts * nvals)

Structure-sharing: O(nconts * log(nvals))

Computing an analysis isn't free, but it is manageable in cost: the structure-sharing means that meet is usually trivial (for fallthrough control flow) and the cost of + and - is proportional to the log of the problem size.

CPS soup: strengths

Relatively uniform, orthogonal

Facilitates functional transformations and analyses, lowering mental load: “I just have to write a function from foo to bar; I can do that”

Encourages global optimizations

Some kinds of bugs prevented by construction (unintended shared mutable state)

We get the SSA optimization literature

Well, we're getting to the end here, and I want to take a step back. Guile has used CPS soup as its middle-end IR for about 8 years now, enough time to appreciate its fine points while also understanding its weaknesses.

On the plus side, it has what to me is a kind of low cognitive overhead, and I say that not just because I came up with it: Guile's development team is small and not particularly well-resourced, and we can't afford complicated things. The simplicity of CPS soup works well for our development process (flawed though that process may be!).

I also like how, since every variable is potentially a phi, any optimization that we implement will be global (i.e. not local to a basic block) by default.

Perhaps best of all, we get these benefits while also being able to use the existing SSA transformation literature. Because CPS is SSA, the lessons learned in SSA (e.g. loop peeling) apply directly.

CPS soup: weaknesses

Pointer-chasing, indirection through intmaps

Heavier than basic blocks: more control-flow edges

Names bound at continuation only; phi predecessors share a name

Over-linearizes control, relative to sea-of-nodes

Overhead of re-computation of analyses

CPS soup is not without its drawbacks, though. It's not suitable for JIT compilers, because it imposes some significant constant-factor (and sometimes algorithmic) overheads. You are always indirecting through intmaps and intsets, and these data structures involve significant pointer-chasing.

Also, there are some forms of lightweight flow analysis that can be performed naturally on a graph of basic blocks without looking too much at the contents of the blocks; for example, our available values analysis could run over blocks instead of individual instructions. In these cases, basic blocks themselves are an optimization, as they can reduce the size of the problem space, with corresponding reductions in time and memory use for analyses and transformations. Of course you could overlay a basic block graph on top of CPS soup, but it's not a well-worn path.

There is a little detail that not all phi predecessor values have names, since names are bound at successors (continuations). But this is a detail; if these names are important, little $values trampolines can be inserted.

Probably the main drawback as an IR is that the graph of conts in CPS soup over-linearizes the program. There are other intermediate representations that don't encode ordering constraints where there are none; perhaps it would be useful to marry CPS soup with sea-of-nodes, at least during some transformations.

Finally, CPS soup does not encourage a style of programming where an analysis is incrementally kept up to date as a program is transformed in small ways. The result is that we end up performing much redundant computation within each individual optimization pass.

Recap

CPS soup is SSA, distilled

Labels and vars are small integers

Programs map labels to conts

Conts are the smallest labellable unit of code

Conts can have terms that continue to other conts

Compilation simplifies and lowers programs

Wasm vs VM backend: a question for another day :)

But all in all, CPS soup has been good for Guile. It's just SSA by another name, in a simpler form, with a functional flavor. Or, it's just CPS, but first-order only, without lambda.

In the near future, I am interested in seeing what a new GC will do for CPS soup; will bump-pointer allocation palliate some of the costs of pointer-chasing? We'll see. A tricky thing about CPS soup is that I don't think that anyone else has tried it in other languages, so it's hard to objectively understand its characteristics independent of Guile itself.

Finally, it would be nice to engage in the academic conversation by publishing a paper somewhere; I would like to see interesting criticism, and blog posts don't really participate in the citation graph. But in the limited time available to me, faced with the choice between hacking on something and writing a paper, it's always been hacking, so far :)

Speaking of limited time, I probably need to hit publish on this one and move on. Happy hacking to all, and until next time.

by Andy Wingo at May 20, 2023 07:10 AM

May 18, 2023

José Dapena

Javascript memory profiling with heap snapshot

In both the web and NodeJS worlds, the main runtime for executing program logic is the Javascript runtime. Because of that, a huge number of applications and user interfaces use it. And like any software component, Javascript code uses system resources, which are not unlimited. We should be careful when using CPU time, application storage, or memory.

In this blog post we are going to focus on the latter.

Where’s my memory!

Usually a web page does not allocate that many objects, so it does not eat a huge amount of memory on a modern, beefy computer. But we find problems like:

  • Oh, but I don’t have a single web page loaded. I like those 40-80 tabs all open for some reason… Well, no, there’s no reason for that! But that’s another topic.
  • Many users are not using beefy phones or computers. So using memory has an impact on what they can do.

The user may not be happy with the web application developer's implementation choices. And this developer may want to be… more efficient. Do something.

Where’s my memory! The cloud strikes back

Now… Think about the cloud providers. And developers implementing software using NodeJS in the cloud. The contract with the provider may limit the available memory… Or charge money depending on the actual usage.

So… An innocent script that takes 10MB, but is run thousands or millions of times, for a few seconds each time. That is expensive!

These developers will need to make their apps… again, more efficient.

A new hope

In performance problems, we usually want reliable data about what is happening, and when. Memory problems are no different: we need some observability of the memory usage.

Chromium and NodeJS share their Javascript runtime, V8, and it provides some tools to help with memory investigation.

In this post we are going to focus on the family of tools around a V8 feature named heap snapshot, that allows capturing the memory usage at any time in a Javascript execution context.

About the heap

❗ This is a fast recap of how the Javascript heap works; you can skip it if you want

In the V8 Javascript runtime, variables, no matter their scope, are allocated on a heap. No matter if it is a number, a string, an object or a function, all of them are stored there. Not only that: in V8 even the code is stored in the heap.

But, in Javascript, memory is freed lazily, by garbage collection. This means that when an object is not used anymore, its memory is not immediately disposed of. The garbage collector will later explore which objects are disposable, and free them when it is convenient.

How do we know if an object is still used? The idea is simple: objects are used if they can be accessed. To find out which ones can be, the runtime will take the root objects and recursively explore all object references. Any object that has not been found in that exploration can be discarded.

OK, and what is a root object? In a script it can be the objects in the global context, but also Javascript objects referred to from native objects.

More details of how the V8 garbage collector works are out of the scope of this post. If you want to learn more, this post should provide a good overview of current implementation: Trash talk: the Orinoco garbage collector.

Heap snapshot: how does it work?

OK, so we know all Javascript memory allocation goes through the heap. And, as I said, heap snapshot is a tool for investigating memory problems.

The name is quite explicit about how it works: a heap snapshot will stop the Javascript execution, traverse the whole heap, analyze it, and dump it in a meaningful format that can be investigated.

What kind of information does it have?

  • Which objects are in the heap, and their types.
  • How much memory each object takes.
  • The references between them, so we can understand which object is keeping another one from being disposed.
  • In some of the tools, it can also store the stack trace of the code that allocated that memory.

Those snapshots use a JSON format, and they can be opened from the Chromium developer tools for analysis.

Heap snapshots from Chromium

In the Chromium browser, heap snapshots can be obtained from the Chrome developer tools, accessed through the Inspect option in the right-click menu.

This is common to any browser based on Chromium that exposes those developer tools, locally or remotely.

Once the developer tools are visible, there is the Memory tab.

We can select three profiling types:

  • Heap snapshot: it just captures the heap at the specific moment it is taken.
  • Allocation instrumentation on timeline: this records all the allocations over time in a session, allowing you to check the allocations that happened in a specific time range. This is quite expensive, and suitable only for short profiling sessions.
  • Allocation sampling: instead of capturing all allocations, this one records them with sampling. Not as accurate as allocation instrumentation, but very lightweight, giving a good approximation for a long profiling session.

In all cases, we will get a profiling report that we can analyze later.

Heap snapshots from NodeJS

Using Chromium dev tools UI

In NodeJS, we can attach the Chrome dev tools by passing --inspect on the command line or through the NODE_OPTIONS environment variable. This will attach the inspector to NodeJS, but it does not stop execution. The variant --inspect-brk will break in the debugger at the start of the user script.

How does it work? It will open a port at localhost:9229, which can then be accessed from the Chromium browser via the URL chrome://inspect. The UI allows users to select which hosts to listen to for Node sessions. The endpoint can be modified using --inspect=[HOST:]PORT, --inspect-brk=[HOST:]PORT, or with the specific command line argument --inspect-port=[HOST:]PORT.
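
For example (app.js being a hypothetical entry point):

node --inspect app.js                      # inspector at the default localhost:9229
node --inspect-brk=localhost:9230 app.js   # custom endpoint; break at script start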

Once you attach the dev tools inspector, you can access the Memory tab as in the Chromium case.

There is a problem, though, when we are using NODE_OPTIONS: all instances of NodeJS will take the same parameter, so they will try to attach to the same host and port, and only the first instance will get the port. So it is less useful than we would expect for a session running multiple NodeJS processes (as can happen when just running NPM or YARN to run stuff).

Oh, but there are some tricks:

  • If you pass port 0 it will allocate a port (and report it through the console!). So you can inspect any arbitrary session (more details).
  • In POSIX systems such as Linux, the inspector will be enabled if the process receives SIGUSR1. It will listen on the default localhost:9229 unless a different setting is specified with --inspect-port=[HOST:]PORT (more details).

Using command line

Also, there are other ways to obtain heap snapshots directly, without using the developer tools UI. NodeJS accepts several command line parameters for programming heap snapshot capture/profiling:

  • --heapsnapshot-near-heap-limit=N will dump a heap snapshot when the V8 heap is close to its maximum size limit. The N parameter is the number of times it will dump a new snapshot. This is important because, when V8 is reaching the heap limit, it will take measures to free memory through garbage collection, so in a pattern of growing usage we will hit the limit several times.
  • --heapsnapshot-signal=SIGNAL will dump heap snapshots every time the NodeJS process gets the UNIX signal SIGNAL.

We can also record a heap profiling session from the start of the process to the end (the same kind of profiling we obtain from Dev Tools with the Allocation sampling option) using the command line option --heap-prof. This will continuously sample memory allocations, and it can be tuned using different command line parameters, as documented here.
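
Concretely, those invocations might look like this (script name and signal chosen just for illustration):

node --heapsnapshot-near-heap-limit=3 app.js   # up to 3 dumps as the heap nears its limit
node --heapsnapshot-signal=SIGUSR2 app.js      # dump a snapshot on each SIGUSR2
node --heap-prof app.js                        # sampled allocation profile, start to end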

Analysis of heap snapshots

The scope of this post is how to capture heap snapshots in different scenarios. But… once you have them… you will want to use that information to actually understand memory usage. Here are some good reads on how to use heap snapshots.

First, from Chrome DevTools documentation:

  • Memory terminology: it gives a great tour on how memory is allocated, and what heap snapshots try to represent.
  • Fix memory problems: this one provides some examples of how to use different tools in Chromium to understand memory usage, including some heap snapshot and profiling examples.
  • View snapshots: a high level view of the different heap snapshot and profiling tools.
  • How to Use the Allocation Profiler Tool: this one is specific to the allocation profiler.

And then, from NodeJS, you also have a couple of interesting things:

  • Memory Diagnostics: some of this has been covered in this post, but it still has an example of how to find a memory leak using Comparison.
  • Heap snapshot exercise: this is an exercise including a memory leak that you can hunt with heap snapshots.

Recap

  • Memory is a valuable resource that Javascript (both web and NodeJS) application developers may want to profile.
  • As usual, when there are resource allocation problems, we need reliable and accurate information about what is happening and when.
  • V8 heap snapshots provide such information, integrated with Chromium and NodeJS.

Next

In a follow up post, I will talk about several optimizations we worked on, that make V8 heap snapshot implementation faster. Stay tuned!

Thanks!

This work was possible thanks to sponsorship from Igalia and Bloomberg.


by José Dapena Paz at May 18, 2023 03:26 PM