Planet Igalia

September 30, 2023

Brian Kardell

Florence Discovery


Last week, on my way back from TPAC, I took a holiday in Florence, Italy. It was an awesome experience all around, but one thing was really personal and I wanted to write it down.

So... I am Italian.

I know, "Kardell" doesn't sound Italian - and that's because it isn't. Kardell is my biological father's name and it has, as best I know, a mix of German and Irish roots. But from even my very youngest memories, I was raised with and by my maternal side. My parents divorced when I was all of 5.

My mother's maiden name is Frediani, and both of her parents were first-generation Americans, born shortly after their parents arrived. My maternal grandfather's family (Muzio Frediani) was from Tuscany. His father was a printer (as all of his kids later were). Their print shop was a historic landmark in Pittsburgh's Strip District. He also founded the Italian Sons and Daughters of America, which, among a lot of other things, did a lot with Italian Americans, unions and political campaigns. My maternal grandmother's family name was Casale, and they were from Sicily. Her father (Giuseppe Casale) was a contractor/mason, and he built many of the houses in the neighborhood my grandparents lived in, and my mom now lives in.

This side of the family had a very strong culture. There was a lot of community. Some people in the family even still spoke Italian. They were constantly instilling this in me: You are Italian. And so, these are the roots that have always resonated in me.

When I learned about Michelangelo, Galileo, Dante, Leonardo, Raphael or Donatello (and so many more) - I felt... Actually connected to it somehow. I don't know, maybe that's silly. But the result was that Italy was the one place I've always wanted to go since I was a child.

I never really managed to get very far in making a family tree (lots of those services to help you felt outrageously expensive to me). Florence, where I went, is part of Tuscany. I went to Florence because my sister had been there, kind of actively looking for connections to our family. Frediani isn't a super common name - I've never met another Frediani myself.

Anyway... I went to Florence.

A series of fortunate events.

I wasted no time; I spent pretty much all of my waking hours touring something: some church, garden, museum, street, bridge, etc. In a lucky coincidence, a friend from TPAC was also in Florence and we met up for dinner. They asked me if I had been to the church where Michelangelo was buried. No! They said "I think maybe Dante is also buried there?".

Wow I wanted to go so much - I'm so fortunate that I wound up in a situation where they were there and we had dinner and they could mention it! They had me at Michelangelo, I was very keen to go.

I went the next afternoon, but I was in shorts, which isn't allowed, so I had to shift my schedule and come back the following day.

So, on my very last day I went to this church, Santa Croce. Unlike every other thing I'd been to, there was no line. Because it was the last day, I had nowhere else I really wanted to go afterward. This was my last "big" stop. So, I decided I would really take my time, soak it all in, and pay my respects. Given this, I also decided to pay extra and do the audio guided tour.

Looking down the nave of the main church.

Inside I was surprised to see there were tombs in the floor everywhere - people were walking on them. A few were roped off - I think they were damaged, but mainly there were just occasionally these tombs in the floor. Around the outer wall of the church were bigger memorials to a number of important Italians (mainly Florentine, but not exclusively). Yeah, wow. Michelangelo, Fermi, Galileo, Dante, Machiavelli - so many.

I really took my time. I learned a ton of interesting stuff and took a lot of pictures.

Before leaving the main part of the church I sat down and listened to more about how all of the people buried here were somehow "important".

I had this fleeting thought that maybe I should go through carefully and see if I could find my family name, but I dismissed it very quickly. Florence is a huge place inside the huge place that is Tuscany - there can't be more than 70 people buried here. The odds would be astronomical. I decided to plod on.

And just then...

As the tour turns from the main church, it points you to a marble corridor leading to the Medici chapel (and more). On the floor down the hall are numerous engraved marble slabs, many are worn to unreadable condition. As I looked down the hallway my audio suddenly said

The corridor is paved with white marble tomb slabs with gray frames, which bear names and coats-of-arms erased by time. One inscription is more clearly legible than the others, almost as if time itself had obeyed the wish it expresses; see if you can find it, it's the tomb of Cosimo Frediani, and on it is written: "Don't tread on me".

Wait what? Surely that is just my mind playing tricks on me.

I rewound it and listened again. It sure sounded like my family name.

I figured out how to switch over and look at the transcript... It was!

It kind of gave me a chill.

I searched the stones, even enlisting the help of the museum personnel. There it was - and it was also the only one with a relief carved on it.

The marble tomb of Cosimo Frediani.


Very cool.

But... Who was he? What did he do? How far are we related? I spent some time searching and I'm not totally sure yet! We've found a bit, but it will surely be a new family quest to find out.

But, I mean... How amazing is it that maybe someone in my family is buried perhaps 50-100 meters from all of those famous Italians that I feel so connected with 🖤.

What a great experience :)

September 30, 2023 04:00 AM

September 12, 2023

Eric Meyer

Nuclear Anchored Sidenotes

Exactly one year ago today, which I swear is a coincidence I only noticed as I prepared to publish this, I posted an article on how I coded the footnotes for The Effects of Nuclear Weapons.  In that piece, I mentioned that the footnotes I ended up using weren’t what I had hoped to create when the project first started.  As I said in the original post:

Originally I had thought about putting footnotes off to one side in desktop views, such as in the right-hand grid gutter.  After playing with some rough prototypes, I realized this wasn’t going to go the way I wanted it to…

I came back to this in my post “CSS Wish List 2023”, when I talked about anchor(ed) positioning.  The ideal, which wasn’t really possible a year ago without a bunch of scripting, was to have the footnotes arranged structurally as endnotes, which we did, but in a way that I could place the notes as sidenotes, next to the footnote reference, when there was enough space to show them.

As it happens, that’s still not really possible without a lot of scripting today, unless you have:

  1. A recent (as of late 2023) version of Chrome
  2. With the “Experimental web features” flag enabled

With those things in place, you get experimental support for CSS anchor positioning, which lets you absolutely position an element in relation to any other element, anywhere in the DOM, essentially regardless of their markup relationship to each other, as long as they conform to a short set of constraints related to their containing blocks.  You could reveal an embedded stylesheet and then position it next to the bit of markup it styles!
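Stripped of the specifics of the footnote project, the core mechanic can be sketched in just a few lines (the class names and anchor name here are illustrative, not from the article's stylesheet):

```css
/* The element we want to position against gets a name... */
.target {
	anchor-name: --target;
}

/* ...and any absolutely positioned element, anywhere in the DOM,
   can pin its edges to that named anchor's edges. */
.note {
	position: absolute;
	top: anchor(--target top);
	left: anchor(--target right);
}
```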

Anchoring Sidenotes

More relevantly to The Effects of Nuclear Weapons, I can enhance the desktop browsing experience by turning the popup footnotes into Tufte-style static sidenotes.  So, for example, I can style the list items that contain the footnotes like this:

.endnotes li {
	position: absolute;
	top: anchor(top);
	bottom: auto;
	left: calc(anchor(--main right) + 0.5em);
	max-width: 23em;
}
A sidenote next to the main text column, with its number aligned with the referencing number found in the main text column.

Let me break that down.  The position is absolute, and bottom is set to auto to override a previous bit of styling that’s needed in cases where a footnote isn’t being anchored.  I also decided to restrain the maximum width of a sidenote to 23em, for no other reason than it looked right to me.

(A brief side note, pun absolutely intended: I’m using the physical-direction property top because the logical-direction equivalent in this context, inset-block-start, only gained full desktop cross-browser support a couple of years ago, and that’s only true if you ignore IE11’s existence, plus it arrived in several mobile browsers only this year, and I still fret about those kinds of things.  Since this is desktop-centric styling, I should probably set a calendar reminder to fix these at some point in the future.  Anyway, see MDN’s entry for more.)

Now for the new and unfamiliar parts.

 top: anchor(top);

This sets the position of the top edge of the list item to be aligned with the top edge of its anchor’s box.  What is a footnote’s anchor?  It’s the corresponding superscripted footnote mark embedded in the text.  How does the CSS know that?  Well, the way I set things up  —  and this is not the only option for defining an anchor, but it’s the option that worked in this use case  —  the anchor is defined in the markup itself.  Here’s what a footnote mark and its associated footnote look like, markup-wise.

explosion,<sup><a href="#fnote01" id="fn01">1</a></sup> although
<li id="fnote01" anchor="fn01"><sup>1</sup> … </li>

The important bits for anchor positioning are the id="fn01" on the superscripted link, and the anchor="fn01" on the list item: the latter establishes the element with an id of fn01 as the anchor for the list item.  Any element can have an anchor attribute, thus creating what the CSS Anchor Positioning specification calls an implicit anchor.  It’s explicit in the HTML, yes, but that makes it implicit to CSS, I guess.  There’s even an implicit keyword, so I could have written this in my CSS instead:

 top: anchor(implicit top);

(There are ways to mark an element as an anchor and associate other elements with that anchor, without the need for any HTML.  You don’t even need to have IDs in the HTML.  I’ll get to that in a bit.)

Note that the superscripted link and the list item are just barely related, structurally speaking.  Their closest ancestor element is the page’s single <main> element, which is the link’s fourth-great-grandparent, and the list item’s third-great-grandparent.  That’s okay!  Much as a <label> can be associated with an input element across DOM structures via its for attribute, any element can be associated with an anchoring element via its anchor attribute.  In both cases, the value is an ID.

So anyway, that means the top edge of the endnote will be absolutely positioned to line up with the top edge of its anchor.  Had I wanted the top of the endnote to line up with the bottom edge of the anchor, I would have said:

 top: anchor(bottom);

But I didn’t.  With the top edges aligned, I now needed to drop the endnote into the space outside the main content column, off to its right.  At first, I did it like this:

 left: anchor(--main right);

Wait.  Before you think you can just automatically use HTML element names as anchor references, well, you can’t.  That --main is what CSS calls a dashed-ident, as in a dashed identifier, and I declared it elsewhere in my CSS.  To wit:

main {
	anchor-name: --main;
}

That assigns the anchor name --main to the <main> element in the CSS, no HTML attributes required.  Using the name --main to identify the <main> element was me following the common practice of naming things for what they are.  I could have called it --mainElement or --elMain or --main-column or --content or --josephine or --📕😉 or whatever I wanted.  It made the most sense to me to call it --main, so that’s what I picked.

Having done that, I can use the edges of the <main> element as positioning referents for any absolutely (or fixed) positioned element.  Since I wanted the left side of sidenotes to be placed with respect to the right edge of the <main>, I set their left to be anchor(--main right).

Thus, taking these two declarations together, the top edge of a sidenote is positioned with respect to the top edge of its implicit anchor, and its left edge is positioned with respect to the right edge of the anchor named --main.

	top: anchor(top);
	left: anchor(--main right);

Yes, I’m anchoring the sidenotes with respect to two completely different anchors, one of which is a descendant of the other.  That’s okay!  You can do that!  Literally, you could position each edge of an anchored element to a separate anchor, regardless of how they relate to each other structurally.

Once I previewed the result of those declarations, I saw that the sidenotes were too close to the main content, which makes sense: I had made the edges adjacent to each other.

Red borders showing the edges of the sidenote and the main column touching.

I thought about using a left margin on the sidenotes to push them over, and that would work fine, but I figured what the heck, CSS has calculation functions and anchor functions can go inside them, and any engine supporting anchor positioning will also support calc(), so why not?  Thus:

 left: calc(anchor(--main right) + 0.5em);

I wrapped those in a media query that only turned the footnotes into sidenotes at or above a certain viewport width, and wrapped that in a feature query so as to keep the styles away from non-anchor-position-understanding browsers, and I had the solution I’d envisioned at the beginning of the project!
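The wrapping described above might look something like this (a sketch: the 64em breakpoint is my assumption, not necessarily the production value, and the feature-query condition could test any anchor-positioning property):

```css
/* Keep the sidenote styles away from browsers that don't
   understand anchor positioning... */
@supports (anchor-name: --x) {
	/* ...and only turn footnotes into sidenotes when the
	   viewport is wide enough to hold them. */
	@media (min-width: 64em) {
		.endnotes li {
			position: absolute;
			top: anchor(top);
			bottom: auto;
			left: calc(anchor(--main right) + 0.5em);
			max-width: 23em;
		}
	}
}
```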

Except I didn’t.

Fixing Proximate Overlap

What I’d done was fine as long as the footnotes were well separated.  Remember, these are absolutely positioned elements, so they’re out of the document flow.  Since we still don’t have CSS Exclusions, there needs to be a way to deal with situations where there are two footnotes close to each other.  Without it, you get this sort of thing.

Two sidenotes completely overlapping with each other.  This will not do.

I couldn’t figure out how to fix this problem, so I did what you do these days, which is I posted my problem to social media.  Pretty quickly, I got a reply from the brilliant Roman Komarov, pointing me at a Codepen that showed how to do what I needed, plus some very cool highlighting techniques.  I forked it so I could strip it down to the essentials, which is all I really needed for my use case, and also have some hope of understanding it.

Once I’d worked through it all and applied the results to TEoNW, I got exactly what I was after.

The same two sidenotes, except now there is no overlap.

But how?  It goes like this:

.endnotes li {
	position: absolute;
	anchor-name: --sidenote;
	top: max(anchor(top) , calc(anchor(--sidenote bottom) + 0.67em));
	bottom: auto;
	left: calc(anchor(--main right) + 0.5em);
	max-width: 23em;
}

Whoa.  That’s a lot of functions working together there in the top value.  (CSS is becoming more and more functional, which I feel some kind of way about.)  It can all be verbalized as, “the position of the top edge of the list item is either the same as the top edge of its anchor, or two-thirds of an em below the bottom edge of the previous sidenote, whichever is further down”.

The browser knows how to do this because the list items have all been given an anchor-name of --sidenote (again, that could be anything, I just picked what made sense to me).  That means every one of the endnote list items will have that anchor name, and other things can be positioned against them.

Those styles mean that I have multiple elements bearing the same anchor name, though.  When any sidenote is positioned with respect to that anchor name, it has to pick just one of the anchors.  The specification says the named anchor that occurs most recently before the thing you’re positioning is what wins.  Given my setup, this means an anchored sidenote will use the previous sidenote as the anchor for its top edge.

At least, it will use the previous sidenote as its anchor if the bottom of the previous sidenote (plus two-thirds of an em) is lower than the top edge of its implicit anchor.  In a sense, every sidenote’s top edge has two anchors, and the max() function picks which one is actually used in every case.

CSS, man.

Remember that all this is experimental, and the specification (and thus how anchor positioning works) could change.  The best practices for accessibility are also not clear yet, from what I’ve been able to find.  As such, this may not be something you want to deploy in production, even as a progressive enhancement.  I’m holding off myself for the time being, which means none of the above is currently used in the published version of The Effects of Nuclear Weapons.  If people are interested, I can create a Codepen to illustrate.

I do know this is something the CSS Working Group is working on pretty hard right now, so I have hopes that things will finalize soon and support will spread.

My thanks to Roman Komarov for his review of and feedback on this article.  For more use cases of anchor positioning, see his lengthy (and quite lovely) article “Future CSS: Anchor Positioning”.

Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at September 12, 2023 03:16 PM

September 06, 2023

Eric Meyer

Memories of Molly

The Web is a little bit darker today, a fair bit poorer: Molly Holzschlag is dead.  She lived hard, but I hope she died easy.  I am more sparing than most with my use of the word “friend”, and she was absolutely one.  To everyone.

If you don’t know her name, I’m sorry.  Too many didn’t.  She was one of the first web gurus, a title she adamantly rejected  —  “We’re all just people, people!”  —  but it fit nevertheless.  She was a groundbreaker, expanding and explaining the Web at its infancy.  So many people, on hearing the mournful news, have described her as a force of nature, and that’s a title she would have accepted with pride.  She was raucous, rambunctious, open-hearted, never ever close-mouthed, blazing with fire, and laughed (as she did everything) with her entire chest, constantly.  She was giving and took and she hurt and she wanted to heal everyone, all the time.  She was messily imperfect, would tell you so loudly and repeatedly, and gonzo in all the senses of that word.  Hunter S. Thompson should have written her obituary.

I could tell so many stories.  The time we were waiting to check into a hotel, talking about who knows what, and realized Little Richard was a few spots ahead of us in line.  Once he’d finished checking in, Molly walked right over to introduce herself and spend a few minutes talking with him.  An evening a group of us had dinner on the top floor of a building in Chiba City and I got the unexpectedly fresh shrimp hibachi.  The time she and I were chatting online about a talk or training gig, somehow got onto the subject of Nick Drake, and coordinated a playing of “Three Hours” just to savor it together.  A night in San Francisco where the two of us went out for dinner before some conference or other, stopped at a bar just off Union Square so she could have a couple of drinks, and she got propositioned by the impressively drunk couple seated next to her after they’d failed to talk the two of us into hooking up.  The bartender couldn’t stop laughing.

Or the time a bunch of us were gathered in New Orleans (again, some conference or other) and went to dinner at a jazz club, where we ended up seated next to the live jazz trio and she sang along with some of the songs.  She had a voice like a blues singer in a cabaret, brassy and smoky and full of hard-won joys, and she used it to great effect standing in front of Bill Gates to harangue him about Internet Explorer.  She raised it to fight like hell for the Web and its users, for the foundational principles of universal access and accessible development.  She put her voice on paper in some three dozen books, and was working on yet another when she died.  In one book, she managed to sneak past the editors an example that used a stick-figure Kama Sutra custom font face.  She could never resist a prank, particularly a bawdy one, as long as it didn’t hurt anyone.

She made the trek to Cleveland at least once to attend and be part of the crew for one of our Bread and Soup parties.  We put her to work rolling tiny matzoh balls and she immediately made ribald jokes about it, laughing harder at our one-up jokes than she had at her own.  She stopped by the house a couple of other times over the years, when she was in town for consulting work, “Auntie Molly” to our eldest and one of my few colleagues to have spent any time with Rebecca.  Those pictures were lost, and I still keenly regret that.

There were so many things about what the Web became that she hated, that she’d spent so much time and energy fighting to avert, but she still loved it for what it could be and what it had been originally designed to be.  She took more than one fledgling web designer under her wing, boosted their skills and careers, and beamed with pride at their accomplishments.  She told a great story about one, I think it was Dunstan Orchard but I could be wrong, and his afternoon walk through a dry Arizona arroyo.

I could go on for pages, but I won’t; if this were a toast and she were here, she would have long ago heckled me (affectionately) into shutting up.  But if you have treasured memories of Molly, I’d love to hear them in the comments below, or on your own blog or social media or podcasts or anywhere.  She loved stories.  Tell hers.

Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at September 06, 2023 03:44 PM

September 04, 2023

Clayton Craft

Upgrading Steam Deck storage the lazy way

When I bought my Steam Deck over a year ago, I purchased the basic model with a very unassuming 64GB eMMC module for primary storage. I knew this wouldn't be enough on its own for holding the games I like to play, so I've gotten by with having my Steam library installed on an SD card. This has worked surprisingly well, despite the lower performance of the SD card when compared to something like an NVMe. Load times, etc actually weren't all that bad! Alas, all good things must come to an end... I started having issues with the eMMC drive filling up and I was constantly interrupted with "low storage space" notifications in Steam. I guess Steam stores saved games, Mesa shader cache, and other things like that on the primary drive despite having gobs of space on SD storage. Oops!

Since I can't be bothered to enter my Steam login, WiFi credentials, etc a second time on my Deck, I really wanted to preserve/transfer as much data from the eMMC to a new NVMe drive as possible. That's what this post is about. Spoiler alert: I was successful, and it was relatively easy/painless.

by Unknown at September 04, 2023 12:00 AM

August 27, 2023

Brian Kardell

Completely Random Car and Design Stuff


This is a kind of "off brand" post for me. It's very, very random and just for fun. It's just a whole bunch of mental wandering literally all over the place about cars, design, AI and only a very little bit about the web. Many of these observations are probably extremely specific to America, but perhaps still interesting. There is no overarching "point" here, it's just thoughts.

There are certain trends that I notice in the design of everyday things. Sometimes I think I notice them developing early and have lots of conversations pointing them out to other people (especially my son) and wondering aloud about how all of this will look in retrospect: Is this one of those things that will come to represent this "era"?

If it's not clear what I mean, basically every single thing in the photo below screams 1970s:

A Frigidaire ad from the 1970s. A woman standing in a kitchen with avocado Frigidaire appliances and countertop, a green and yellow flowered wallpaper; she's wearing 70's attire, the cabinet design is somehow also unique to the era, and the quality of the film is also reminiscent of the 1970s.

It seems like there are some things that come to be associated with an era... Stuff that got really popular for a period of time and then left behind.

Several years ago I noticed two changes I thought I saw happening in new car colors, and I started to point them out to my son. If you haven't noticed it, it's real. For a lot of years, while mainstream automobile colors varied in real ways (you'd realize how much if you ever tried to match paint on your car), they still fell within a fairly common range and palette that I think would mainly be described with words like "shiny" or "glossy". I think this has been generally true since the 70's or 80's, where I seem to remember some more varied reds and oranges and earthy colors (my dad's old truck was maybe interesting in both ways). In any case, now we have all of these new "ceramic" colors which look more "baked into" the materials, and matte. More often than not they're grays, but there are some beautiful turquoises and reds. Another new common thing is "blacked out" emblems. I think both of these trends began with trendsetting aftermarket people.

Another thing that this made me think about and try to explain to my son was that that's kind of true of automobile body styles and some other characteristics too. I found it surprisingly hard to describe because it's not as if there is a single kind of car in any era - it's just that there are a smaller array of characteristics that you can recognize as belonging to that era.

Don't you think a lot of cars today have similar characteristics? I do. But... They also feel like they are increasingly resembling a car developed at the end of the 1970's, by a car company that doesn't even exist anymore. The car company is AMC (American Motors Corporation) and the car was the Eagle SX/4. Below is an advertisement.

Advertisement for the Eagle SX/4 around 1979

Basically, it was sort of the first crossover vehicle that combined a car and 4 wheel (later all wheel) drive. But also, just visually: Do you see a resemblance to a lot of popular cars today? I do. I think this car looks much more similar to many cars I see on the road today than most of what I saw for the 40 years or so between them. Also interesting, it was available in one of those interesting earthy yellow/tan (and kind of matte!) colors with dark gray/black accents that kind of mostly disappeared, but I could imagine being reborn today.

Photo of the Eagle SX/4 in an earthy, somewhat matte tan/yellow with interesting dark trims.

So, I was thinking about this and the fact it was kind of ahead of its time. In fact, I searched for exactly that and found this article with a title that is pretty much that: "The AMC Eagle SX/4 – An American 4×4 Sports Car That Was Ahead Of Its Time".

That's cool, but I was also wondering how much of that came from the fact that it was not (yet) part of "The Big 3". There used to be a lot more American car companies, but we just kept consolidating into those 3. Lots of those became "brands" of the Big 3 for a while, but eventually homogenized. A huge number of them no longer exist. My first car was a Pontiac. The last Pontiac produced rolled off the line in January 2010. Saturn, same. My mom drove a Plymouth Voyager. Plymouth was discontinued in 2001. In fact, the car I sometimes drive now (not even American: an Isuzu Amigo) hasn't been sold in the US since 2000. And so on.

I was thinking that most of the diversity that drives anything about automobiles today pretty much doesn't come from those Big 3. And I was wondering if maybe that meant something for web engines too. Does it? I don't know. For sure we can see outside (but still standards compliant) innovation in browsers like Arc or Vivaldi or Polypane. I suppose the open source nature makes that more easily possible, but it's worth noting too that this is kept alive mainly through ~80% of that bill being footed by the Web Engines Big 3.

Anyway... This is already all over the place (sorry, that's how my mind works!), but in sitting down this morning to write this I had an idea that perhaps AI could help me "illustrate" if not explain the "look" of body styles associated with different decades. Here's what I found out: Midjourney is pretty bad at it!

The photo below was generated with the prompt "photo of a typical 2010's automobile in America with the most common body style of cars built between 2000 and 2009".

A picture that is, for all intents and purposes, almost literally an early 1950's Buick but with a different emblem.

I tried several variations and wound up with something like this as one of the 4 options no matter how explicit I was. I wonder why? I'd kind of think that identifying and sort of recycling the "trends" like that would be very much what AI models like this would be really good at - but apparently not as good as I'd imagined!

I decided to check the other way round, just for giggles. The inverse worked better. Giving it the first image in this article, it mentioned the 1970's in its explanation! Interesting!

Anyway, like I said at the get-go: There's no overall point here :)

August 27, 2023 04:00 AM

August 23, 2023

Emmanuele Bassi

The Mirror

The GObject type system has been serving the GNOME community for more than 20 years. We have based an entire application development platform on top of the features it provides, and the rules that it enforces; we have integrated multiple programming languages on top of that, and in doing so, we expanded the scope of the GNOME platform in a myriad of directions. Unlike GTK, the GObject API hasn’t seen any major change since its introduction: aside from deprecations and little new functionality, the API is exactly the same today as it was when GLib 2.0 was released in March 2002. If you transported a GNOME developer from 2003 to 2023, they would have no problem understanding a newly written GObject class; though, they would likely appreciate the levels of boilerplate reduction, and the performance improvements that have been introduced over the years.
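As an aside, that boilerplate reduction is real. A minimal instantiable class today can be written with the modern convenience macros in just a handful of lines (the type name here is purely illustrative):

```c
/* A minimal modern GObject class, as it might be written today.
 * The G_DECLARE_FINAL_TYPE / G_DEFINE_TYPE macros generate the
 * get_type() function, casting macros, and type registration
 * that once had to be written out by hand. */
#include <glib-object.h>

#define MY_TYPE_THING (my_thing_get_type ())
G_DECLARE_FINAL_TYPE (MyThing, my_thing, MY, THING, GObject)

struct _MyThing
{
  GObject parent_instance;
};

G_DEFINE_TYPE (MyThing, my_thing, G_TYPE_OBJECT)

static void
my_thing_class_init (MyThingClass *klass)
{
  /* vfunc overrides and property installation would go here */
}

static void
my_thing_init (MyThing *self)
{
  /* per-instance initialisation would go here */
}
```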

While having a stable API last this long is definitely a positive, it also imposes a burden on maintainers and users, because any change has to be weighted against the possibility of introducing unintended regressions in code that uses undefined, or undocumented, behaviour. There’s a lot of leeway when it comes to playing games with C, and GObject has dark corners everywhere.

The other burden is that any major change to a foundational library like GObject cascades across the entire platform. Releasing GLib 3.0 today would necessitate breaking API in the entirety of the GNOME stack and further beyond; it would require either a hard to execute “flag day”, or an impossibly long transition, reverberating across downstreams for years to come. Both solutions imply amounts of work that are simply not compatible with a volunteer-based project and ecosystem, especially the current one where volunteers of core components are now stretched thin across too many projects.

And yet, we are now at a cross-roads: our foundational code base has reached the point where recruiting new resources capable of affecting change on the project has become increasingly difficult; where any attempt at performance improvement is heavily counterbalanced by the high possibility of introducing world-breaking regressions; and where fixing the safety and ergonomics of idiomatic code requires unspooling twenty years of limitations inherent to the current design.

Something must be done if we want to improve the coding practices, the performance, and the safety of the platform without a complete rewrite.

The Mirror

‘Many things I can command the Mirror to reveal,’ she answered, ‘and to some I can show what they desire to see. But the Mirror will also show things unbidden, and those are often stranger and more profitable than things we wish to behold. What you will see, if you leave the Mirror free to work, I cannot tell. For it shows things that were, and things that are, and things that yet may be. But which it is that he sees, even the wisest cannot always tell. Do you wish to look?’ — Lady Galadriel, “The Lord of the Rings”, Volume 1: The Fellowship of the Ring, Book 2: The Ring Goes South

In order to properly understand what we want to achieve, we need to understand the problem space that the type system is meant to solve, and the constraints upon which the type system was implemented. We do that by holding GObject up to Galadriel’s Mirror, and gazing into its surface.

Things that were

History became legend. Legend became myth. — Lady Galadriel, “The Lord of the Rings: The Fellowship of the Ring”

Before GObject there was GtkObject. It was a simpler time, it was a simpler stack. You added types only for the widgets and objects that related to the UI toolkit, and everything else was C89, with a touch of undefined behaviour, like calling function pointers with any number of arguments. Properties were “arguments”, likes were florps, and the timeline went sideways.

We had class initialisation and instance initialisation functions; properties were stored in a global hash table, but the property multiplexer pair of functions was stored on the type data instead of using the class structure. Types did not have private data: you only had keyed fields. No interfaces, only single inheritance. GtkObject was reference counted, and had an initially “floating” reference, to allow transparent ownership transfer from child to parent container when writing C code, and make the life of every language binding maintainer miserable in the process. There were weak references attached to an instance that worked by invoking a callback when the instance’s reference count reached zero. Signals operated exactly as they do today: large hash table of signal information, indexed by an integer.

None of this was thread safe. After all, GTK was not thread safe either, because X11 was not thread safe; and we’re talking about 1997: who even had hardware capable of multi-threading at the time? NPTL wasn’t even a thing, yet.

The introduction of GObject in 2001 changed some of the rules—mainly, around the idea of having dynamic types that could be loaded and unloaded in order to implement plugins. The basic design of the type system, after all, came from Beast, a plugin-heavy audio application, and it was extended to subsume the (mostly) static use cases of GTK. In order to support unloading, the class aspect of the type system was allowed to be cleaned up, but the type data had to be registered and never unloaded; in other words, once a type was registered, it was there forever.

“Arguments” were renamed to properties, and were extended to include more than basic types, provide validations, and notify of changes; the overall design was still using a global hash table to store all the properties across all types. Properties were tied to the GObject type, but the property definition existed as a separate type hierarchy that was designed to validate values, but not manage fields inside a class. Signals were ported wholesale, with minimal changes mainly around the marshalling of values and abstracting closures.

The entire plan was to have GObject as one of the base classes at the root of a specific hierarchy, with all the required functionality for GTK to inherit from for its own GtkObject, while leaving open the possibility of creating other hierarchies, or even other roots with different functionality, for more lightweight objects.

These constraints were entirely intentional; the idea was to be able to port GTK to the new type system, and to an out of tree GLib, during the 1.3 development phase, and minimise the amount of changes necessary to make the transition work not just inside GTK, but inside of GNOME too.

Little by little, the entire GObject layer was ported towards thread safety in the only way that worked without breaking the type system: add global locks around everything; use read-write locks for the type data; lock the access and traversal of the property hash table and of the signals table. The only real world code bases that actively exercised multi-threading support were GStreamer and the GNOME VFS API that was mainly used by Nautilus.

With the 3.0 API, GTK dropped the GtkObject base type: the whole floating reference mechanism was moved to GObject, and a new type was introduced to provide the “initial” floating reference to derived types. Around the same time, a thread-safe version of weak references for GObject appeared as a separate API, which confused the matter even more.

Things that are

Darkness crept back into the forests of the world. Rumour grew of a Shadow in the East, whispers of a Nameless Fear. — Lady Galadriel, “The Lord of the Rings: The Fellowship of the Ring”

Let’s address the elephant in the room: it’s completely immaterial how many lines of code you have to deal with when creating a new type. It’s a one-off cost, and for most cases, it’s a matter of using the existing macros. The declaration and definition macros have the advantages of enforcing a series of best practices, and keep the code consistent across multiple projects. If you don’t want to deal with boilerplate when using C, you chose the wrong language to begin with. The existence of excessive API is mainly a requirement to allow other languages to integrate their type system with GObject’s own.

The dynamic part of the type system has gotten progressively less relevant. Yes: you can still create plugins, and those can register types; but classes are never unloaded, just like their type data. There is some attempt at enforcing an order of operations: you cannot just add an interface after a type has been instantiated any more; and while you can add properties and signals after class initialisation, it’s mainly a functionality reserved for specific language bindings to maintain backward compatibility.

Yes, defining properties is boring, and could probably be simplified, but the real cost is not in defining and installing a GParamSpec: it’s in writing the set and get property multiplexers, validating the values, boxing and unboxing the data, and dealing with the different property flags; none of those things can be wrapped in some fancy C preprocessor macros—unless you go into the weeds with X macros.

The other, big cost of properties is their storage inside a separate, global, lock-happy hash table. The main use case of this functionality—adding entirely separate classes of properties with the same semantics as GObject properties, like style properties and child properties in GTK—has completely fallen out of favour, and for good reasons: it cannot be managed by generic code; it cannot be handled by documentation generators without prior knowledge; and, for the same reason, it cannot be introspected.

Even calling these “properties” is kind of a misnomer: they are value validation objects that operate only when using the generic (and less performant) GObject accessor API, something that is constrained to things like UI definition files in GTK, or language bindings. If you use the C accessors for your own GObject type, you’ll have to implement validation yourself; and since idiomatic code will have the generic GObject accessors call the public C API of your type, you get twice the amount of validation for no reason whatsoever.

Signals have mostly been left alone, outside of performance improvements that were hard to achieve within the constraints of the existing implementation; the generic FFI-based closure turned out to be a net performance loss, and we’re trying to walk it back even for D-Bus, which was the main driver for it to land in the first place. Marshallers are now generated with a variadic arguments variant, to reduce the amount of boxing and unboxing of GValue containers. Still, there’s not much left to squeeze out of the old GSignal API.

The atomic nature of the reference counting can be a costly feature, especially for code bases that are by necessity single-threaded; the fact that the reference count field is part of the (somewhat) public API prevents fundamental refactorings, like switching to biased reference counting for faster operations on the same thread that created an instance. The lack of room on GObject also prevents storing the thread ID that owns the instance, which in turn prevents calling the GObjectClass.dispose() and GObjectClass.finalize() virtual functions on the right thread, and requires scheduling the destruction of an object on a separate main context, or locking the contents of an object at a further cost.

Things that yet may be

The quest stands upon the edge of a knife: stray but a little, and it will fail to the ruin of all. Yet hope remains, while the company is true. — Lady Galadriel, “The Lord of the Rings: The Fellowship of the Ring”

Over the years, we have been strictly focusing on GObject: speeding up its internals, figuring out ways to improve property registration and performance, adding new API and features to ensure it behaved more reliably. The type system has also been improved, mainly to streamline its use in idiomatic GObject code bases. Not everything worked: properties are still a problem; weak references and pointers are a mess, with two different APIs that interact badly with GObject; signals still exist on a completely separate plane; and GObject is still wildly inefficient when it comes to locking.

The thesis of this strawman is that we have reached the limits of backwards compatibility of GObject, and any attempt at improving it will inevitably lead to more brittle code, rife with potential regressions. The typical answer, in this case, would be to bump the API/ABI of GObject, remove the mistakes of the past, and provide a new idiomatic approach. Sadly, doing so not only would require a level of resources we, as the GLib project stewards, cannot provide, but it would also completely break the entire ecosystem in a way that is not recoverable. Either nobody would port to the new GObject-3.0 API; or the various projects that depend on GObject would inevitably fracture, following whichever API version they can commit to; in the meantime, downstream distributors would suffer the worst effects of the shared platform we call “Linux”.

Between inaction and slow death, and action with catastrophic consequences, there’s the possibility of a third option: what if we stopped trying to emulate Java, with its single “god” type?

Our type system is flexible enough to support partitioning various responsibilities, and we can defer complexity where it belongs: into faster moving dependencies, that have the benefit of being able to iterate and change at a much higher rate than the foundational library of the platform. What’s the point of shoving every possible feature into the base class, in order to cover ever increasingly complex use cases across multiple languages, when we can let consumers decide to opt into their own well-defined behaviours? What GObject ought to provide is a set of reliable types that can be combined in expressive ways, and that can be inspected by generic API.

A new, old base type

We already have a derivable type, called GTypeInstance. Typed instances don’t have any memory management: once instantiated, they can only be moved, or freed. All our objects already are typed instances, since GObject inherits from it. Contrary to current common practice, we should move towards using GTypeInstance for our types.

There’s a distinct lack of convenience API for defining typed instances, mostly derived from the fact that GTypeInstance is seen as a form of “escape hatch” for projects to use in order to avoid GObject. In practice, there’s nothing that prevents us from improving the convenience of creating new instantiatable/derivable types, especially if we start using them more often. The verbose API must still exist, to allow language bindings and introspection to handle these kinds of types, but just like we made convenience macros for declaring and defining GObject types, we can provide macros for new typed instances, and for setting up a GValue table.

Optional functionality

Typed instances require a wrapper API to free their contents before calling g_type_free_instance(). Nothing prevents us from adding a GFinalizable interface that can be implemented by a GTypeInstance, though: interfaces exist at the type system level, and do not require GObject to work.

typedef struct {
  void (* finalize) (GFinalizable *self);
} GFinalizableInterface;

If a typed instance provides an implementation of GFinalizable, then g_type_free_instance() can free the contents of the instance by calling g_finalizable_finalize().

This interface is optional, in case your typed instance just contains simple values, like:

typedef struct {
  GTypeInstance parent;

  bool is_valid;
  double x1, y1;
  double x2, y2;
} Box;

and does not require deallocations outside of the instance block.

A similar interface can be introduced for cloning instances, allowing a copy operation alongside a move:

typedef struct {
  GClonable * (* clone) (GClonable *self);
} GClonableInterface;

We could then introduce g_type_instance_clone() as a generic entry point that either used GClonable, or simply allocated a new instance and called memcpy() on it, using the size of the instance (and eventual private data) known to the type system.

The prior art for this kind of functionality exists in GIO, in the form of the GInitable and GAsyncInitable interfaces; unfortunately, those interfaces require GObject, and they depend on GCancellable and GAsyncResult objects, which prevent us from moving them into the lower level API.

Typed containers and life time management

The main functionality provided by GObject is garbage collection through reference counting: you acquire a (strong) reference when you need to access an instance, and release it when you don’t need the instance any more. If the reference you released was the last one, the instance gets finalized.

Of course, once you introduce strong references you open the door to a veritable bestiary of other types of references:

  • weak references, used to keep a “pointer” to the instance, and get a notification when the last reference drops
  • floating references, used as a C convenience to allow ownership transfer of newly constructed “child” objects to their “parent”
  • toggle references, used by language bindings that acquire a strong reference on an instance they wrap with a native object; when the toggle reference gets triggered it means that the last reference being held is the one on the native wrapper, and the wrapper can be dropped causing the instance to be finalized

All of these types of reference exist inside GObject, but since they were introduced over the years, they are bolted on top of the base class using the keyed data storage, which comes with its own costly locking and ordering; they are also managed through the finalisation code, which means that re-entrancy issues and undefined ordering behaviours have routinely cropped up over the years, especially when trying to optimise the construction and destruction phases.

None of this complexity is, strictly speaking, necessary; we don’t care about an instance being reference counted: a “parent” object can move the memory of a “child” typed instance directly into its own code. What we care about is that, whenever other code interacts with ours, we can hand out a reference to that memory, so that ownership is maintained.

Other languages and standard libraries have the same concept; for instance:

  • Rust has Rc<T> and Arc<T>, with their Weak<T> counterparts
  • C++ has std::shared_ptr and std::weak_ptr

These constructs are not part of a base class: they are wrappers around instances. This means you’re not handing out a reference to an instance: you are handing out a reference to a container, which holds the instance for you. The behaviour of the value is made explicit by the type system, not implicit to the type.

A simple implementation of a typed “reference counted” container would provide us with both strong and weak references:

typedef struct _GRc GRc;
typedef struct _GWeak GWeak;

GRc *g_rc_new (GType data_type, gpointer data);

GRc *g_rc_acquire (GRc *rc);
void g_rc_release (GRc *rc);

gpointer g_rc_get_data (GRc *rc);

GWeak *g_rc_downgrade (GRc *rc);
GRc *g_weak_upgrade (GWeak *weak);

bool g_weak_is_empty (GWeak *weak);
gpointer g_weak_get_data (GWeak *weak);

Alongside this type of container, we could also have a specialisation for atomic reference counted containers; or pinned containers, which guarantee that an object is kept in the same memory location; or we could re-implement these containers inside each language binding, to ensure that the behaviour is tailored to the memory management of those languages.

Specialised types

Container types introduce the requirement of having the type system understand that an object can be the product of two types: the type of the container, and the type of the data. In order to allow properties, signals, and values to effectively provide introspection of these container types, we are going to need to introduce “specialised” types:

  • GRc exists as a “generic”, abstract type in the type system
  • any instance of GRc that contains an instance of type A gets a new type in the type system

A basic implementation would look like:

GRc *
g_rc_new (GType data_type, gpointer data)
{
  // Returns an existing GType if something else already
  // has registered the same GRc<T>
  GType rc_type =
    g_generic_type_register_static (G_TYPE_RC, data_type);

  // Instantiates GRc, but gives it the type of
  // GRc<T>; there is only the base GRc class
  // and instance initialization functions, as
  // GRc<T> is not a pure derived type
  GRc *res = (GRc *) g_type_create_instance (rc_type);
  res->data = data;

  return res;
}

Any instance of type GRc<A> satisfies the “is-a” relationship with GRc, but it is not a purely derived type:

GType rc_type =
  ((GTypeInstance *) rc)->g_class->g_type;
g_assert_true (g_type_is_a (rc_type, G_TYPE_RC));

The GRc<A> type does not have a different instance or class size, or its own class and instance initialisation functions; it’s still an instance of the GRc type, with a different GType. The GRc<A> type only exists at run time, as it is the result of the type instantiation; you cannot instantiate a plain GRc, or derive your type from GRc in order to create your own reference counted type, either:

// Error: GRc is a generic type, and cannot be instantiated directly
GRc *rc = g_type_create_instance (G_TYPE_RC);

// Error: GRc cannot be used as a base type for derivation
typedef GRc GtkWidget;

You can only use a GRc inside your own instance:

typedef struct {
  // GRc<GtkWidget>
  GRc *parent;
  // GRc<GtkWidget>
  GRc *first_child;
  // GRc<GtkWidget>
  GRc *next_sibling;

  // ...
} GtkWidgetPrivate;

Tuple types

Tuples are generic containers of N values, but right now we don’t have any way of formally declaring them into the type system. A hack is to use arrays of similarly typed values, but with the deprecation of GValueArray—which is a bad type that does not allow reference counting, and does not give you guarantees anyway—we only have C arrays and pointer types.

Registering a new tuple type would work like registering a generic type: a base GTuple abstract type acts as the “parent”, and a number of specialised types are registered at run time:

typedef struct _GTuple GTuple;

GTuple *
g_tuple_new_int (size_t n_elements,
                 int elements[])
{
  GType tuple_type =
    g_tuple_type_register_static (G_TYPE_TUPLE, n_elements, G_TYPE_INT);

  GTuple *res = (GTuple *) g_type_create_instance (tuple_type);
  for (size_t i = 0; i < n_elements; i++)
    g_tuple_add (res, elements[i]);

  return res;
}

We can also create specialised tuple types, like pairs:

typedef struct _GPair GPair;

GPair *
g_pair_new (GType this_type,
            GType that_type,
            gpointer this_value,
            gpointer that_value);

This would give us the ability to standardise our API around fundamental types, and reduce the amount of ad hoc container types that libraries have to define and bindings have to wrap with native constructs.

Sum types

Of course, once we start with specialised types, we end up with sum types:

typedef enum {
  SHAPE_KIND_SQUARE,
  SHAPE_KIND_RECTANGLE,
  SHAPE_KIND_CIRCLE
} ShapeKind;

typedef struct {
  GTypeInstance parent;

  ShapeKind kind;

  union {
    struct { Point origin; float side; } square;
    struct { Point origin; Size size; } rectangle;
    struct { Point center; float radius; } circle;
  } shape;
} Shape;

As of right now, discriminated unions don’t have any special handling in the type system: they are generally boxed types, or typed instances, but they require type-specific API to deal with the discriminator field and type. Since we have types for enumerations and instances, we can register them at the same time, and provide offsets for direct access:

g_sum_type_register_static (const char *name,
                            size_t class_size,
                            size_t instance_size,
                            GType tag_enum_type,
                            offset_t tag_field);

This way it’s possible to ask the type system for:

  • the offset of the tag in an instance, for direct access
  • all the possible values of the tag, by inspecting its GEnum type

From then on, we can easily build types like Option and Result:

typedef enum {
  G_RESULT_OK,
  G_RESULT_ERROR
} GResultKind;

typedef struct {
  GTypeInstance parent;

  GResultKind type;
  union {
    GValue value;
    GError *error;
  } result;
} GResult;

// ...
g_sum_type_register_static ("GResult",
                            sizeof (GResultClass),
                            sizeof (GResult),
                            G_TYPE_RESULT_KIND, // the tag’s enum type
                            offsetof (GResult, type));

// ...
GResult *
g_result_new_boolean (gboolean value)
{
  GType res_type =
    g_generic_type_register_static (G_TYPE_RESULT,
                                    G_TYPE_BOOLEAN);
  GResult *res = (GResult *)
    g_type_create_instance (res_type);

  res->type = G_RESULT_OK;
  g_value_init (&res->result.value, G_TYPE_BOOLEAN);
  g_value_set_boolean (&res->result.value, value);

  return res;
}

// ...
g_autoptr (GResult) result = obj_finish (task);
switch (g_result_get_kind (result)) {
  case G_RESULT_OK:
    g_print ("Result: %s\n",
      g_result_get_boolean (result)
        ? "true"
        : "false");
    break;

  case G_RESULT_ERROR:
    g_printerr ("Error: %s\n",
      g_result_get_error_message (result));
    break;
}

// ...
g_autoptr (GResult) result =
  g_input_stream_read_bytes (stream);
if (g_result_is_error (result)) {
  // ...
} else {
  g_autoptr (GBytes) data = g_result_get_boxed (result);
  // ...
}

Consolidating GLib and GType

Having the type system in a separate shared library did make sense back when GLib was spun off from GTK; after all, GLib was mainly a set of convenient data types for a language that lacked a decent standard library. Additionally, not many C projects were interested in the type system, as it was perceived as a big chunk of functionality in an era where space was at a premium. These days, the smallest environment capable of running GLib code is plenty capable of running the GObject type system as well.

The separation between GLib data types and the GObject type system has created data types that are not type safe, and work by copying data, by having run time defined destructor functions, or by storing pointers and assuming everything will be fine. This leads to code duplication between shared libraries, and prevents the use of GLib data types in public API, lest the introspection information get lost.

Moving the type system inside GLib would allow us to have properly typed generic container types, like a GVector replacing GArray, GPtrArray, GByteArray, as well as the deprecated GValueArray; or a GMap and a GSet, replacing GHashTable, GSequence, and GtkRBTree. Even the various list models could be assembled on top of these new types, and moved out of GTK.

Current consumers of GLib-only API would still have their basic C types, but if they don’t want to link against a slightly bigger shared library that includes GTypeInstance, GTypeInterface, and the newly added generic, tuple, and sum types, then they would probably be better served by projects like c-util instead.


Properties

Instead of bolting properties on top of GParamSpec, we can move their definition into the type system; after all, properties are a fundamental part of a type, so it does not make sense to bind them to the class instantiation. This would also remove the long-standing issue of properties being available for registration long after a class has been initialised; and it would give us the chance to ship a utility for inspecting the type system, getting all the meta-information on the hierarchy, and generating introspection XML without having to compile a small binary.

If we move property registration to the type registration we can also finally move away from multiplexed accessors, and use direct instance field access where applicable:

GPropertyBuilder builder;

g_property_builder_init (&builder,
  G_TYPE_STRING, "name");
// Stop using flags, and use proper setters; since
// there's no use case for unsetting the readability
// flag, we don't even need a boolean argument
g_property_builder_set_readwrite (&builder);
// The offset is used for read and write access...
g_property_builder_set_private_offset (&builder,
  offsetof (GtkWidgetPrivate, name));
// ... unless an accessor function is provided; in
// this case we want setting a property to go through
// a function
g_property_builder_set_setter_func (&builder,
  gtk_widget_real_set_name); // hypothetical internal setter

// Register the property into the type; we return the
// offset of the property into the type node, so we can
// access the property definition with a fast look up
properties[NAME] =
  g_type_add_instance_property (type,
    g_property_builder_end (&builder));

Accessing the property information would then be a case of looking into the type system under a single reader lock, instead of traversing all properties in a glorified globally locked hash table.

Once we have a property registered in the type system, accessing it is a matter of calling API on the GProperty object:

void
gtk_widget_set_name (GtkWidget *widget,
                     const char *name)
{
  GProperty *prop =
    g_type_get_instance_property (GTK_TYPE_WIDGET,
                                  properties[NAME]);

  g_property_set (prop, widget, name);
}


Signals

Moving signal registration into the type system would allow us to subsume the global locking into the type locks; it would also give us the chance to simplify some of the complexity for re-emission and hooks:

GSignalBuilder builder;

g_signal_builder_init (&builder, "insert-text");
g_signal_builder_set_args (&builder, 3,
  (GSignalArg[]) {
    { .name = "text", .gtype = G_TYPE_STRING },
    { .name = "length", .gtype = G_TYPE_SIZE },
    { .name = "position", .gtype = G_TYPE_OFFSET },
  });
g_signal_builder_set_retval (&builder, G_TYPE_OFFSET);
g_signal_builder_set_class_offset (&builder,
  offsetof (EditableClass, insert_text));

signals[INSERT_TEXT] =
  g_type_add_class_signal (type,
    g_signal_builder_end (&builder));

By taking the chance of moving signals out of their own namespace, we can also move to a model where each class is responsible for providing the API necessary to connect to and emit its signals, as well as providing callback types for each signal. This would allow us to increase type safety, and reduce the reliance on generic API:

typedef offset_t (* EditableInsertText) (Editable *self,
                                         const char *text,
                                         size_t length,
                                         offset_t position);

unsigned long
editable_connect_insert_text (Editable *self,
                              EditableInsertText callback,
                              gpointer user_data,
                              GSignalFlags flags);

offset_t
editable_emit_insert_text (Editable *self,
                           const char *text,
                           size_t length,
                           offset_t position);

Extending the type system

Some of the metadata necessary to provide properly typed properties and signals is missing from the type system. For instance, by design, there is no type representing a uint16_t; we are supposed to create a GParamSpec to validate the value of a G_TYPE_INT in order to fit in the 16bit range. Of course, this leads to excessive run time validation, and relies on C’s own promotion rules for variadic arguments; it also does not work for signals, as those do not use GParamSpec. More importantly, though, the missing connection between C types and GTypes prevents gathering proper introspection information for properties and signal arguments: if we only have the GType we cannot generate the full metadata that can be used by documentation and language bindings, unless we’re willing to lose specificity.

Not only should the type system be sufficient to contain all the standard C types that are now available: we also need it to provide us with enough information to serialise those types into the introspection data, if we want to be able to generate code like signal API, type safe bindings, or accurate documentation for properties and signal handlers.


Introspection

Introspection exists outside of GObject mainly because of dependencies; the parser, abstract syntax tree, and transformers are written in Python and interface with a low level C tokeniser. Adding a CPython dependency to GObject is too much of a stretch, especially when it comes to bootstrapping a system. While we could keep the dependency optional, and allow building GObject without support for introspection, keeping the code separate is a simpler solution.

Nevertheless, GObject should not ignore introspection. The current reflection API inside GObject should generate data that is compatible with the libgirepository API and with its GIR parser. Currently, gobject-introspection is tasked with generating a small C executable, compiling it, running it to extract metadata from the type system, as well as the properties and signals of a GObject type, and generate XML that can be parsed and included into the larger GIR metadata for the rest of the ABI being introspected. GObject should ship a pre-built binary, instead; it should dlopen the given library or executable, extract all the type information, and emit the introspection data. This would not make gobject-introspection more cross-compilable, but it would simplify its internals and its distributability. We would not need to know how to compile and run C code from a Python script, for one; a simple executable wrapper around a native copy of the GObject-provided binary would be enough.

Ideally, we could move the girepository API into GObject itself, and allow it to load the binary data compiled out of the XML; language bindings loading the data at run time would then need to depend on GObject instead of an additional library, and we could ship the GIR → typelib compiler directly with GLib, leaving gobject-introspection to deal only with the parsing of C headers, docblocks, and annotations, to generate the XML representation of the C/GObject ABI.

There and back again

And the ship went out into the High Sea and passed on into the West, until at last on a night of rain Frodo smelled a sweet fragrance on the air and heard the sound of singing that came over the water. And then it seemed to him that as in his dream in the house of Bombadil, the grey rain-curtain turned all to silver glass and was rolled back, and he beheld white shores and beyond them a far green country under a swift sunrise. — “The Lord of the Rings”, Volume 3: The Return of the King, Book 6: The End of the Third Age

The hard part of changing a project in a backward compatible way is resisting the temptation of fixing the existing design. Sometimes it’s necessary to backtrack the chain of decisions, and consider the extant code base a dead branch; not because the code is wrong, or riddled with bugs, but because any attempt at doubling down on the same design will inevitably lead to breakage. In this sense, it’s easy to just declare “maintenance bankruptcy”, and start from a new major API version: breaks allow us to fix the implementation, at the cost of adapting to new API. For instance, widgets are still the core of GTK, even after 4 major revisions; we did not rename them to “elements” or “actors”, and we did not change how the windows are structured. You are still supposed to build a tree of widgets, connect callbacks to signals, and let the main event loop run. Porting has been painful because of underlying changes in the graphics stack, or because of portability concerns, but even with the direction change of favouring composition over inheritance, the knowledge of how to use GTK has been transferred from GTK 1 to 4.

We cannot do the same for GObject. Changing how it is implemented implies changing everything that depends on it; it means introducing behavioural changes in subtle, and hard to predict ways. Luckily for us, the underlying type system is still flexible and nimble enough that it can give us the ability to change direction, and implement an entirely different approach to object orientation—one that is more in line with languages like modern C++ and Rust. By following new approaches we can slowly migrate our platform to other languages over time, with a smaller impedance mismatch caused by the current design of our object model. Additionally, by keeping the root of the type system, we maintain the ability to provide a stable C ABI that can be consumed by multiple languages, which is the strong selling point of the GNOME ecosystem.

Why do all of this work, though? Compared to a full API break, this proposal has the advantage of being tractable and realistic; I cannot overstate how little appetite there is for a “GObject 3.0” in the ecosystem. The recent API bump from libsoup2 to libsoup3 has clearly identified that changes deep into the stack end up being too costly an effort: some projects have found it easier to switch to another HTTP library altogether, rather than support two versions of libsoup for a while; other projects have decided to drop compatibility with libsoup2, forcing the hand of every reverse dependency both upstream and downstream. Breaking GObject would end up breaking the ecosystem, with the hope of a “perfect” implementation way down the line and with very few users on one side, and a dead branch used by everybody else on the other.

Of course, the complexity of the change is not going to be trivial, and it will impact things like the introspection metadata and the various language bindings that exist today; some bindings may even require a complete redesign. Nevertheless, by implementing this new object model and leaving GObject alone, we buy ourselves enough time and space to port our software development platform towards a different future.

Maybe this way we will get to save the Shire; and even if we give up some things, or even lose them, we still get to keep what matters.

by ebassi at August 23, 2023 08:23 PM

August 21, 2023

Melissa Wen

AMD Driver-specific Properties for Color Management on Linux (Part 1)


Color is a visual perception. Human eyes can detect a broader range of colors than any device in the graphics chain. Since each device can generate, capture or reproduce a specific subset of colors and tones, color management controls color conversion and calibration across devices to ensure a more accurate and consistent color representation. We can expose a GPU-accelerated display color management pipeline to support this process and enhance results, and this is what we are doing on Linux to improve color management on Gamescope/SteamDeck. Even with the challenges of being external developers, we have been working on mapping AMD GPU color capabilities to the Linux kernel color management interface, which is a combination of DRM and AMD driver-specific color properties. This more extensive color management pipeline includes pre-defined Transfer Functions, 1-Dimensional LookUp Tables (1D LUTs), and 3D LUTs before and after the plane composition/blending.

The study of color is well-established and has been explored for many years. Color science and research findings have also guided technology innovations. As a result, color in Computer Graphics is a very complex topic that I’m putting a lot of effort into becoming familiar with. I always find myself rereading all the materials I have collected about color space and operations since I started this journey (about one year ago). I also understand how hard it is to find consensus on some color subjects, as exemplified by all explanations around the 2015 online viral phenomenon of The Black and Blue Dress. Have you heard about it? What is the color of the dress for you?

So, taking into account my skills with colors and building consensus, this blog post only focuses on GPU hardware capabilities to support color management :-D If you want to learn more about color concepts and color on Linux, you can find useful links at the end of this blog post.

Linux Kernel, show me the colors ;D

DRM color management interface only exposes a small set of post-blending color properties. Proposals to enhance the DRM color API from different vendors have landed on the subsystem mailing list over the last few years. On one hand, we got some suggestions to extend DRM post-blending/CRTC color API: DRM CRTC 3D LUT for R-Car (2020 version); DRM CRTC 3D LUT for Intel (draft - 2020); DRM CRTC 3D LUT for AMD by Igalia (v2 - 2023); DRM CRTC 3D LUT for R-Car (v2 - 2023). On the other hand, some proposals to extend DRM pre-blending/plane API: DRM plane colors for Intel (v2 - 2021); DRM plane API for AMD (v3 - 2021); DRM plane 3D LUT for AMD - 2021. Finally, Simon Ser sent the latest proposal in May 2023: Plane color pipeline KMS uAPI, from discussions in the 2023 Display/HDR Hackfest, and it is still under evaluation by the Linux Graphics community.

All previous proposals seek a generic solution for expanding the API, but many seem to have stalled due to the uncertainty of matching well the hardware capabilities of all vendors. Meanwhile, the use of AMD color capabilities on Linux remained limited by the DRM interface, as the DCN 3.0 family color caps and mapping diagram below shows the Linux/DRM color interface without driver-specific color properties [*]:

Bearing in mind that we need to know the variety of color pipelines in the subsystem to be clear about a generic solution, we decided to approach the issue from a different perspective and worked on enabling a set of Driver-Specific Color Properties for AMD Display Drivers. As a result, I recently sent another round of the AMD driver-specific color mgmt API.

For those who have been following the AMD driver-specific proposal since the beginning (see [RFC][V1]), the main new features of the latest version [v2] are the addition of pre-blending Color Transformation Matrix (plane CTM) and the differentiation of Pre-defined Transfer Functions (TF) supported by color blocks. For those who just got here, I will recap this work in two blog posts. This one describes the current status of the AMD display driver in the Linux kernel/DRM subsystem and what changes with the driver-specific properties. In the next post, we go deeper to describe the features of each color block and provide a better picture of what is available in terms of color management for Linux.

The Linux kernel color management API and AMD hardware color capabilities

Before discussing colors in the Linux kernel with AMD hardware, consider accessing the Linux kernel documentation (version 6.5.0-rc5). In the AMD Display documentation, you will find my previous work documenting AMD hardware color capabilities and the Color Management Properties. It describes how AMD Display Manager (DM) intermediates requests between the AMD Display Core component (DC) and the Linux/DRM kernel interface for color management features. It also describes the relevant function to call the AMD color module in building curves for content space transformations.

A subsection also describes hardware color capabilities and how they evolve between versions. This subsection, DC Color Capabilities between DCN generations, is a good starting point to understand what we have been doing on the kernel side to provide a broader color management API with AMD driver-specific properties.

Why do we need more kernel color properties on Linux?

Blending is the process of combining multiple planes (framebuffers abstraction) according to their mode settings. Before blending, we can manage the colors of various planes separately; after blending, we have combined those planes in only one output per CRTC. Color conversions after blending would be enough in a single-plane scenario or when dealing with planes in the same color space on the kernel side. Still, it cannot help to handle the blending of multiple planes with different color spaces and luminance levels. With plane color management properties, userspace can get a more accurate representation of colors to deal with the diversity of color profiles of devices in the graphics chain, support a wide color gamut (WCG), and convert High-Dynamic-Range (HDR) content to Standard-Dynamic-Range (SDR) content (and vice-versa). With a GPU-accelerated display color management pipeline, we can use hardware blocks for color conversions and color mapping and support advanced color management.

The current DRM color management API enables us to perform some color conversions after blending, but there is no interface to calibrate input space by planes. Note that here I’m not considering some workarounds in the AMD display manager mapping of DRM CRTC de-gamma and DRM CRTC CTM property to pre-blending DC de-gamma and gamut remap block, respectively. So, in more detail, it only exposes three post-blending features:

  • DRM CRTC de-gamma: used to convert the framebuffer’s colors to linear gamma;
  • DRM CRTC CTM: used for color space conversion;
  • DRM CRTC gamma: used to convert colors to the gamma space of the connected screen.

AMD driver-specific color management interface

We can compare the Linux color management API with and without the driver-specific color properties. From now on, we denote driver-specific properties with the AMD prefix and generic properties with the DRM prefix. For visual comparison, I bring the DCN 3.0 family color caps and mapping diagram closer and present it here again:

Mixing AMD driver-specific color properties with DRM generic color properties, we have a broader Linux color management system with the following features exposed by properties in the plane and CRTC interface, as summarized by this updated diagram:

The blocks highlighted by red lines are the new properties in the driver-specific interface developed by me (Igalia) and Joshua (Valve). The red dashed lines are new links between API and AMD driver components implemented by us to connect the Linux/DRM interface to AMD hardware blocks, mapping components accordingly. In short, we have the following color management properties exposed by the DRM/AMD display driver:

  • Pre-blending - AMD Display Pipe and Plane (DPP):
    • AMD plane de-gamma: 1D LUT and pre-defined transfer functions; used to linearize the input space of a plane;
    • AMD plane CTM: 3x4 matrix; used to convert plane color space;
    • AMD plane shaper: 1D LUT and pre-defined transfer functions; used to delinearize and/or normalize colors before applying 3D LUT;
    • AMD plane 3D LUT: 17x17x17 size with 12 bit-depth; three dimensional lookup table used for advanced color mapping;
    • AMD plane blend/out gamma: 1D LUT and pre-defined transfer functions; used to linearize back the color space after 3D LUT for blending.
  • Post-blending - AMD Multiple Pipe/Plane Combined (MPC):
    • DRM CRTC de-gamma: 1D LUT (can’t be set together with plane de-gamma);
    • DRM CRTC CTM: 3x3 matrix (remapped to post-blending matrix);
    • DRM CRTC gamma: 1D LUT + AMD CRTC gamma TF; added to take advantage of driver pre-defined transfer functions;

Note: You can find more about AMD display blocks in the Display Core Next (DCN) - Linux kernel documentation, provided by Rodrigo Siqueira (Linux/AMD display developer) in a 2021-documentation series. In the next post, I’ll revisit this topic, explaining display and color blocks in detail.

How did we get a large set of color features from AMD display hardware?

So, looking at AMD hardware color capabilities in the first diagram, we can see no post-blending (MPC) de-gamma block in any hardware family. We can also see that the AMD display driver maps CRTC/post-blending CTM to pre-blending (DPP) gamut_remap, but there is a post-blending (MPC) gamut_remap (DRM CTM) in newer hardware versions that include SteamDeck hardware. You can find more details about hardware versions in the Linux kernel documentation/AMDGPU Product Information.

I needed to rework these two mappings mentioned above to provide pre-blending/plane de-gamma and CTM for SteamDeck. I changed the DC mapping to detach stream gamut remap matrices from the DPP gamut remap block. That means mapping AMD plane CTM directly to the DPP/pre-blending gamut remap block and DRM CRTC CTM to the MPC/post-blending gamut remap block. In this sense, I also limited plane CTM properties to those hardware versions with MPC/post-blending gamut_remap capabilities since older versions cannot support this feature without clashes with DRM CRTC CTM.

Unfortunately, I couldn’t prevent conflict between AMD plane de-gamma and DRM plane de-gamma since post-blending de-gamma isn’t available in any AMD hardware versions until now. The fact is that a post-blending de-gamma makes little sense in the AMD color pipeline, where plane blending works better in a linear space, and there are enough color blocks to linearize content before blending. To deal with this conflict, the driver now rejects atomic commits if users try to set both AMD plane de-gamma and DRM CRTC de-gamma simultaneously.

Finally, we had no other clashes when enabling other AMD driver-specific color properties for our use case, Gamescope/SteamDeck. Our main work for the remaining properties was understanding the data flow of each property, the hardware capabilities and limitations, and how to shape the data for programming the registers - AMD color block capabilities (and limitations) are the topics of the next blog post. Besides that, we fixed some driver bugs along the way since it was the first Linux use case for most of the new color properties, and some behaviors are only exposed when exercising the engine.

Take a look at the Gamescope/Steam Deck Color Pipeline[**], and see how Gamescope uses the new API to manage color space conversions and calibration (please click on the image for a better view):

In the next blog post, I’ll describe the implementation and technical details of each pre- and post-blending color block/property on the AMD display driver.

* Thanks to Harry Wentland for helping with diagrams, color concepts and AMD capabilities.

** Thanks to Joshua Ashton for providing and explaining the Gamescope/Steam Deck color pipeline.

*** Thanks to the Linux Graphics community - explicitly Harry, Joshua, Pekka, Simon, Sebastian, Siqueira, Alex H. and Ville - for all the learning during this Linux DRM/AMD color journey. Also, thanks to Carlos and Tomas for organizing the 2023 Display/HDR Hackfest, where we had a great and immersive opportunity to discuss Color & HDR on Linux.

  1. Cinematic Color - 2012 SIGGRAPH course notes by Jeremy Selan: an introduction to color science, concepts and pipelines.
  2. Color management and HDR documentation for FOSS graphics by Pekka Paalanen: documentation and useful links on applying color concepts to the Linux graphics stack.
  3. HDR in Linux by Jeremy Cline: a blog post exploring color concepts for HDR support on Linux.
  4. Methods for conversion of high dynamic range content to standard dynamic range content and vice-versa by ITU-R: guideline for conversions between HDR and SDR contents.
  5. Using Lookup Tables to Accelerate Color Transformations by Jeremy Selan: Nvidia blog post about Lookup Tables on color management.
  6. The Importance of Being Linear by Larry Gritz and Eugene d’Eon: Nvidia blog post about gamma and color conversions.

August 21, 2023 11:13 AM

August 10, 2023

Brian Kardell

Igalia: Mid-season Power Rankings

Igalia: Mid-season Power Rankings

Let’s take a look at how the year is stacking up in terms of Open Source contributions. If this were an episode of Friends, its title would be "The One With the Charts".

I’ve written before about how I have personally come to really appreciate the act of making a simple list of “things recently accomplished”. It’s always eye opening, and for me at least, usually therapeutic.

For me personally, it’s been a super weird first half of the year and… I feel like I could use a nice list.

It's been a weird year, right?

I mean, not just for me personally, for all of us I guess, right?

Mass layoffs everywhere for a while, new shuffling of people we know from Google to Shopify, Mozilla to Google, Google to Igalia, Mozilla to Igalia, Mozilla to Apple, Google to Meta… Who’s on first? Third base!

LLMs are suddenly everywhere. All of the “big” CSS features people have been clamouring for forever are suddenly right here. HTML finally got <dialog> and now is getting a popover (via attributes). Apple’s got some funky XR glasses coming. There is suddenly significant renewed interest in two novel web engines. And that’s just some of the tech stuff.

So yeah… Let’s see what, if any, impacts all of this are having on the state of projects Igalia works on by looking at our commits so far this year… Note that Igalia's slice of the pie is separated in each of the charts for quick identification...

Quick disclaimers

All of these stats are based on commits publicly available through their respective gits. This is of course an imperfect measure for many reasons - some commits are huge, some are small. Some small commits are really hard while some large commits are easy, but verbose. Finally, the biggest challenge, even if we accept these metrics is mapping commits to organizations. We use a fairly elaborate system and many checkpoints - we collaborate annually with several of the projects to cross check these mappings. Still, you'll see lots of entries in these charts with just an individual's name. Often these are individual contractors or contributors, but sometimes it's just that we cannot currently map them some other way. If you see one that should be counted differently, please let me know!

The Big Web Projects

Igalia is still the one (and still having fun) with the most commits in Chromium and WebKit after Google and Apple, respectively, as we'll see... But we can add some more #1’s this year - even some where we’re not excluding the steward…


Igalia claims a whopping 41.9% of the (non-Google) commits so far in 2023!! That’s more than Microsoft, Intel, Opera, Samsung and ARM combined!!! Yowza!


Top 10 contributors

Ho Cheung3.92%
Stephan Hartmann1.62%
Naver Corporation1.48%
127 other committers14%

As you read the others, keep in mind that the chromium repository actually has more than chrome inside it, so comparisons of these aren't Apples-to-Apples (or, Googles or foxes).


52.6% of the non-Apple commits in WebKit so far this year are from Igalians. It's interesting to note that a huge 4.9% of all of these are from accounts with less than 10 commits this year - pretty close to what it was in Firefox!


Top 10 contributors

Ahmad Saleem8.81%
Red Hat5.50%
Alexey Knyazev0.51%
40 other committers4%


We're sitting at the #5 spot (excluding Mozilla) with 8.87% of commits. Firefox is, in a lot of ways, the trickiest to describe, but just look at it: It's very diverse! As the inventors of modern open source, I guess it makes sense. The mozilla-central repository has the most significant individual contributors as well as a really long line of tiny contributors. The tiny contributors (less than 10 commits) contributed 5.2% (compared to 4.85% in WebKit, for example). However, there are also a few external contributors who are just astoundingly prolific (some of these bigger slices represent hundreds of commits), and with such a number of significant individual contributors, it amounts to a lot.


Top 10 contributors

André Bargull14.38%
Red Hat12.31%
Gregory Pappas9.04%
Robert Longson5.12%
Masatoshi Kimura2.65%
174 other committers29%

Pause for a Note

When you look at these charts, it's really heartening to see how many people and organizations care and contribute. Especially when you look at the Mozilla/Firefox example, it really gives the impression that that project is just a million volunteers. But, it's important to keep it all in perspective too. WebKit has about 50 contributing orgs and individuals, Chrome about 140 and Firefox about 185. A significantly larger share of contributions comes from individuals in Mozilla's case. Importantly: In all of these projects, the steward org's contributions absolutely dwarf the rest of the world's contributions combined:

A version of the pies showing the steward's contributions, for scale (Mozilla contributed 87.2% of all commits, Apple 78.1% and Google 95.5% to their respective projects).

If you think this is astounding, please check out my post Webrise and our Web Ecosystem Health series on the Igalia Chats Podcast


A new #1 in the reports. I guess it should come as no surprise at all that we're #1 in terms of commits to our Wolvic XR browser. It looks a lot like other projects in terms of the steward's balance. What's more interesting, I think, is that its funding model is based on partnerships with several organizations and a collective rather than Igalia as a "go it alone" source.


Top 10 contributors

Ayaskant Panigrahi0.51%
Anushka Chakraborty0.26%
Luna Jernberg0.26%


This year, thanks to some new funding and internal investment we can add Servo to a very special #1 list! Igalia is second to no one in terms of commits there either with 52.7% of all commits! An amazing 22.24% of those commits in Servo are from unmappable committers with less than 10 commits so far this year!


Top 10 contributors

Pu Xingyu2.73%
Alex Touchet1.66%
cybai (Haku)1.19%
Yutaro Ohno0.71%
30 other committers6%


Test262 is the conformance test suite for ECMAScript (JavaScript). I guess you could say we're doing a lot of work there as well, because guess who's got the most commits there? If you guessed Igalia, you'd be right, with 53.4% of all commits!


Top 10 contributors

Justin Grant10.60%
Richard Gibson3.97%
André Bargull3.31%
Jordan Harband3.31%
Huáng Jùnliàng1.99%
José Julián Espina1.32%

Note that the total number of commits in Test262 is rather small compared to many of the other projects here.


Igalians are now the #1 contributors to Babel, contributing 46.6% of all commits so far this year!


Top 10 contributors

Huáng Jùnliàng22.32%
Jonathan Browne0.86%
fisker Cheung0.86%
Dimitri Papadopoulos Orfanos0.43%
Abdulaziz Ghuloum0.43%


Igalia is the #7 contributor to V8 (excluding Google)! This is a pretty busy repo and it's interesting that 6.36% of these commits are from the unmapped/individual contributors with less than ten commits so far this year.


Top 10 contributors

Red Hat2.98%

Google's contributions account for a giant 87.5% of all commits here as well.

But that's not all!

All of the above is just looking specifically at the big web projects because, you know, the web is sort of my thing. If you're reading my blog, there's a pretty good chance it's your thing too. But Igalia does way more than that, if you can believe it. I probably don't talk about it enough, but it's pretty amazing. I suppose I can't give a million more charts, but here are just a few more highlights of other notable projects and specifications where Igalia has been playing a big role... (Keep in mind that specifications move a lot more slowly and so have generally far fewer commits)

  • HTML: Igalians were #3 among contributor commits to HTML with 8.94% so far this year (behind Google and Apple).
  • WebAssembly: Igalia is the #3 contributor to WebAssembly with 8.75% of the commits so far this year!
  • ARIA: So far this year, Igalia is the #1 contributor to the ARIA repo with 19.4% of commits!
  • NativeScript: Igalia is currently the #1 contributor so far this year to the NativeScript repository with 58.3% of all commits!
  • GStreamer: GStreamer is a widely used and powerful open source multimedia framework. Igalia is the #2 contributor there!
  • VK-GL-CTS: The official Khronos OpenGL and Vulkan conformance test suite (graphics). It would be a massive understatement to say that Igalia has been a major contributor: We're the #1 contributor there with 31.1% of all commits.
  • Mesa: The Mesa 3D Graphics Library is huge and contains open source implementations of pretty much every graphical standard (Vulkan, as mentioned above, for example). Igalia is the #5 contributor there so far this year, contributing 6.62% of all commits.
  • Piglit: Piglit is an open-source test suite for OpenGL implementations. Igalia is the #5 contributor there with 6.86% of all commits.

Wrapping up...

It's always amazing to me to look at the data. I hope it's interesting to others too. There are, of course, lots of reasons that all of the committers do what they do, but ultimately, open source development and maintenance benefits us all. The reason that Igalia is able to do all of this is that we are funded by a diverse array of clients making things downstream with needs.

You know where to find us...

August 10, 2023 04:00 AM

August 08, 2023

Víctor Jáquez

DMABuf modifier negotiation in GStreamer

It took almost a year of design and implementation, but finally the DMABuf modifier negotiation in GStreamer is merged. Big kudos to all the people involved but mostly to He Junyan, who did the vast majority of the code.

What’s a DMABuf modifier?

DMABufs are the Linux kernel mechanism to share buffers among different drivers or subsystems. A particular case of DMABuf are the DRM PRIME buffers, which are buffers shared by the Direct Rendering Manager (DRM) subsystem. They allow sharing video frames between devices with zero copy.

When we initially added support for DMABuf in GStreamer, we assumed that only color format and size mattered, just as with old video frames stored in system memory. But we were wrong. Besides color format and size, the memory layout also has to be considered when sharing DMABufs. By not considering it, the produced output had horrible tiled artifacts on screen. This memory layout is known as the modifier, and it’s uniquely described by a uint64 number.

How was it designed and implemented?

First, we wrote a design document for caps negotiation with dmabuf, where we added a new color format (DMA_DRM) and a new caps field (drm-format). This new caps field holds a string, or a list of strings, composed of the tuple DRM_color_format : DRM_modifier.

Second, we extended the video info object to support DMABuf with helper functions that parse and construct the drm-format field.

Third, we added the dmabuf caps negotiation in glupload. This part was the most difficult one, since the capability of importing DMABufs to OpenGL (which is only available in EGL/GLES) is defined at run time, by querying the hardware. Also, there are two code paths to import frames: direct or RGB-emulated. Direct would be the most efficient, but it depends on the presence of the GLES2 API in the driver; with RGB-emulated, the frame is imported as a set of RGB images where each component is an image. In the end, more than a thousand lines of code were added to the glupload element, besides the code added to the EGL context object.

Fourth, and unexpectedly, waylandsink also got DMABuf caps negotiation.

And lastly, decoders in the `va` plugin merged their DMABuf caps negotiation support.

How can I test it?

You need, of course, to use the current main branch of GStreamer, since it’s just fresh and there’s no release yet. Then you need a box with VA support. And if you inspect, for example, vah264dec, you might see this output if your box is Intel (but AMD through Mesa is also supported, though the negotiated memory is linear so far):

Pad Templates:
SINK template: 'sink'
Availability: Always
profile: { (string)main, (string)baseline, (string)high, (string)progressive-high, (string)constrained-high, (string)constrained-baseline }
width: [ 1, 4096 ]
height: [ 1, 4096 ]
alignment: au
stream-format: { (string)avc, (string)avc3, (string)byte-stream }

SRC template: 'src'
Availability: Always
width: [ 1, 4096 ]
height: [ 1, 4096 ]
format: NV12
width: [ 1, 4096 ]
height: [ 1, 4096 ]
format: DMA_DRM
drm-format: NV12:0x0100000000000002
width: [ 1, 4096 ]
height: [ 1, 4096 ]
format: NV12

What it’s saying is that, for the memory:DMABuf caps feature, the drm-format to negotiate is NV12:0x0100000000000002.

Now some tests:

NOTE: These commands assume that va decoders are primary ranked (see merge request 2312), and that you’re in a Wayland session.

$ gst-play-1.0 --flags=0x47 video.file --videosink=waylandsink
$ GST_GL_API=gles2 gst-play-1.0 --flags=0x47 video.file --videosink=glimagesink
$ gst-play-1.0 --flags=0x47 video.file --videosink='glupload ! gtkglsink'

Right now it’s required to add --flags=0x47 to playbin because it adds video filters that still don’t negotiate the new DMABuf caps.

GST_GL_API=gles2 instructs GStreamer OpenGL to use GLES2 API, which allows direct importation of YUV images.

Thanks to all the people involved in this effort!

As usual, if you would like to learn more about DMABuf, VA-API, GStreamer or any other open multimedia framework, contact us!

by vjaquez at August 08, 2023 01:00 PM

August 02, 2023

Pablo Saavedra


This article will delve deeper into the intricacies of the GTK FrameClock, its interaction with the compositor, and how it ensures smooth and synchronized animations. Specifically, we will explore the GTK FrameClockIdle implementation and understand how it manages timing cycles and aligns them with VSync signals in the Wayland platform to optimize performance and enhance the user experience.

Over the last few days, I have been immersed in understanding the inner workings of the GTK FrameClock. This exploration is important for a better understanding of how animated applications integrate with the toolkit. My focus in this post lies in comprehending two key aspects:

  • How the clock utilizes the system time to implement its ticks.
  • The synchronization mechanism the clock employs with the display refresh rate (VSync).

Notice: The article uses this code for the examples.

First steps. The overall view …

The gdk.FrameClock can be likened to a timing coordinator for a window within an application. It plays a vital role in informing the application when to update and repaint the window. By optionally syncing with the monitor’s refresh rate, it ensures smooth animations. Even without synchronization, the gdk.FrameClock aids in synchronizing painting operations, reducing unnecessary frames and optimizing performance. Additionally, the frame clock can pause painting when frames will not be visible, or adjust animation rates as needed.

When an application requests a frame, the frame clock processes it and emits signals for different phases. These signals help update animations. The phases of a FrameClock can be the following:

  1. Before Paint: This phase occurs before the painting process of a frame. It is a preparatory phase where the application can perform any necessary setup or calculations before the actual rendering.
  2. Update: The FrameClock updates the state of animations and other time-based elements. It signals the application to update the content of the frame.
  3. Layout: This phase involves the layout calculation, where the application organizes and positions the elements to be displayed in the frame.
  4. Draw: The application performs the actual rendering of the frame. It involves painting the content on the screen based on the updated layout.
  5. Paint: After the rendering is completed, this phase marks the end of the painting process. The frame is ready to be presented to the screen.
  6. After Paint: This phase follows the completion of painting and may involve additional clean-up or bookkeeping tasks related to the frame presentation.

These phases represent the sequence of events that a FrameClock typically goes through during the generation and presentation of a frame. A phase can also be requested manually with the gdk_frame_clock_request_phase() method.

GTK internally manages the concept of the frame drawn signal. This signal informs the gdk.FrameClock about the successful rendering and presentation of a frame on the screen by the compositor or windowing system. This signal is crucial as it allows the FrameClock to stay aligned with the monitor’s vertical refresh rate (VSync) whenever the signal is received. In the absence of the frame drawn signal, the frame clock cycles continue to occur at a constant cadence, providing regular updates to the application.

Understanding the frame time cycle …

The frame time given by gdk_frame_clock_get_frame_time() is reported in microseconds and is similar to g_get_monotonic_time(), but not the same: it doesn’t change while a frame is being painted, and stays the same for successive calls within a frame. This ensures that different animations timed using the frame time stay synchronized. Overall, gdk.FrameClock helps keep animations smooth and coordinated. The following output of the gtk-frame-clock-example application illustrates a complete frame generation cycle:
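Because the frame time is frozen for the duration of a frame, it is a convenient base for computing animation progress. A minimal helper (my own illustration, not part of the GDK API) could look like:

```c
#include <stdint.h>

/* Map a frame time onto a 0.0..1.0 animation progress value.
 * start_us is the frame time at which the animation began and
 * duration_us its total length, both in microseconds. */
static double
animation_progress (int64_t start_us, int64_t frame_time_us, int64_t duration_us)
{
  double t = (double) (frame_time_us - start_us) / (double) duration_us;
  if (t < 0.0)
    return 0.0;
  if (t > 1.0)
    return 1.0;
  return t;
}
```

Inside a tick callback one would pass the value of gdk_frame_clock_get_frame_time(); since that value does not change during the frame, every widget sampling it renders the same animation instant.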

(s): Cycle start
               get timings:
               |  - now: 1803864692852
               |  - frame time: 1803864709176 (counter: 75) (frame time - now: 16324)
               |  - predicted presentation time: 1803864726132 (predicted - now: 33280)
1803864692852:  widget:on_tick_callback (rate: 16454)
               get timings:
               |  - now: 1803864694022
               |  - frame time: 1803864709176 (counter: 75) (frame time - now: 15154)
               |  - predicted presentation time: 1803864726132 (predicted - now: 32110)
1803864694022:  widget:on_draw (tick-draw latency: 1170)
               get timings:
               |  - now: 1803864709326
               |  - frame time: 1803864709176 (counter: 75) (frame time - now: -150)
               |  - predicted presentation time: 1803864726132 (predicted - now: 16806)
1803864709326:  wl_surface:on_commit
(e): End of cycle

The sequence of events and timings associated with this example frame generation cycle are described below:

  1. (s) Cycle Start: The cycle begins, representing the start of a new frame generation cycle.
  2. clock:on_before_paint: The FrameClock emits the “on_before_paint” signal, indicating the preparation phase before painting the current frame.
  3. clock:on_update: The FrameClock emits the “on_update” signal, triggering an update for the current frame.
  4. get timings: Shows various timings related to the frame generation cycle. These timings include:
  • now: The current monotonic system time in microseconds.
  • frame time: The time allocated for rendering and painting this frame, along with a counter indicating the frame number.
  • predicted presentation time: The expected time when this frame will be presented on the screen.
  5. widget:on_tick_callback (rate: 16454): The application’s widget receives a tick callback, at an average interval of 16454 microseconds. This callback notifies the application that it is the right time to start generating the next frame. Usually the application decides here whether its animations need to be updated and queues a redraw with gtk_widget_queue_draw().
  6. clock:on_layout: The FrameClock emits the “on_layout” signal, indicating the layout phase, where the application prepares the layout before painting the frame.
  7. widget:on_draw (tick-draw latency: 1170): The widget receives a draw callback, which indicates the right moment to paint the frame. The “tick-draw latency” measures the time delay between the tick callback and the actual drawing of the frame.
  8. clock:on_paint: The FrameClock emits the “on_paint” signal, marking the actual painting phase of the frame.
  9. clock:on_after_paint: The FrameClock emits the “on_after_paint” signal, indicating that the frame painting is completed.
  10. wl_surface:on_commit: This event indicates that the Wayland surface has been committed, meaning that the frame has been drawn and is ready for presentation.
  11. (e) End of Cycle: The cycle ends, representing the completion of the frame generation cycle.
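The latencies and deltas printed in this trace are plain differences between the microsecond timestamps shown on each line; re-doing the arithmetic for the cycle above:

```c
#include <stdint.h>

/* Timestamps copied from the trace above (microseconds). */
static const int64_t tick_time  = 1803864692852;  /* widget:on_tick_callback */
static const int64_t draw_time  = 1803864694022;  /* widget:on_draw */
static const int64_t frame_time = 1803864709176;  /* reported frame time */

/* tick-draw latency: delay between the tick callback and the draw. */
static int64_t
tick_draw_latency (void)
{
  return draw_time - tick_time;   /* 1170, as printed in the trace */
}

/* "frame time - now" as printed in the first "get timings" block. */
static int64_t
frame_time_delta (void)
{
  return frame_time - tick_time;  /* 16324, as printed in the trace */
}
```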

How does the FrameClock calculate the frame times …

When an animation begins, its first cycle might start at a random time due to external triggers like input events or timers. This phase shift, called the phase of the clock cycle start time, impacts the smoothness of animations.

During the first cycle, the smooth frame time is set at the cycle’s start time. Subsequent cycles may not align with vsync signals. However, once a frame drawn signal is received from the compositor, the clock cycles will synchronize with vsync signals, maintaining a regular cadence. This may cause the first vsync-related cycle to occur close to the previous non-vsync-related one, altering the phase of cycle start times.

To ensure consistent reported frame times, adjustments are made to the frame time. The phase of the first clock cycle start time is computed, considering skipped frames due to compositor stalls. The goal is to have the first vsync-related smooth time separated by exactly 1 frame interval from the previous one. This adjustment maintains regularity even if “frame drawn” signals are missed in subsequent frames.
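A simplified model of that adjustment (my own illustration of the idea, not GDK's actual code from gdkframeclockidle.c) keeps the reported times on a fixed cadence by advancing the previous smooth time by whole frame intervals until it reaches the current time, jumping over skipped frames:

```c
#include <stdint.h>

/* Advance the previous smooth frame time by whole frame intervals so
 * that reported times keep a fixed cadence even when clock cycles are
 * delayed or frames are skipped (e.g. due to compositor stalls). */
static int64_t
next_smooth_time (int64_t prev_smooth_us, int64_t now_us, int64_t interval_us)
{
  int64_t elapsed = now_us - prev_smooth_us;
  /* Number of whole intervals covering the elapsed time, rounded up so
   * the result is never in the past. */
  int64_t steps = (elapsed + interval_us - 1) / interval_us;
  if (steps < 1)
    steps = 1;
  return prev_smooth_us + steps * interval_us;
}
```

With a 60 Hz interval of ~16667 us, a cycle arriving slightly late still gets a smooth time exactly one interval after the previous one, while a cycle arriving two intervals late gets a smooth time three intervals later, preserving the cadence.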

In the next diagram from gdk/gdkframeclockidle.c#L468, the relationship between vsync signals, clock cycle starts, adjusted frame times, and “frame drawn” events is illustrated. The changing cadence of the clock cycles after the first vsync-related cycle is highlighted, while the regularity of the cycle cadence is maintained even if “frame drawn” events are absent in certain frames.

In the following diagram, '|' marks a vsync, '*' marks the start of a clock cycle, '+' is the adjusted
frame time, and '!' marks the reception of *frame drawn* events from the compositor. Note that the clock
cycle cadence changed after the first vsync-related cycle. This cadence is kept even if we don't
receive a 'frame drawn' signal in a subsequent frame, since then we schedule the clock at intervals of one frame interval:

vsync             |           |           |           |           |           |... 
frame drawn       |           |           |!          |!          |           |...
cycle start       |       *   |       *   |*          |*          |*          |...
adjusted times    |       *   |       *   |       +   |       +   |       +   |...
phase                                      ^------^

You can check the comment in gdk/gdkframeclockidle.c for more in-depth information about how the FrameClock handles the adjustment of reported frame times. That is where the concept of frame drawn is introduced and explained in detail. As mentioned before, the frame drawn signal refers to whatever mechanism allows the FrameClock to know when a frame has been successfully drawn and presented on the screen by the compositor or windowing system.

Initially, the frame clock cycles occur at a regular interval, approximately matching the desired frame rate. These cycles are not directly tied to the monitor’s vertical refresh rate (VSync), but they will eventually be smoothly aligned with it once a frame drawn signal is received.

In the absence of the frame drawn signal, the frame clock cycles will continue to occur at a constant cadence. However, when the frame drawn signal is received from the compositor, it marks the successful completion of frame rendering and indicates that the frame clock cycles should align with the monitor’s VSync signals.

The frame drawn signal for GTK on the Wayland platform is the “frame.done” event; it represents the VSync for GTK in a Wayland environment. The FrameClock freezes and thaws (see the gdk_frame_clock_idle_is_frozen function in gdkframeclockidle.c#L279) while ticks are being accumulated and no “frame.done” callback invocation has arrived from the compositor. This is how it works in the particular case of Wayland, but a similar approach is used for X11 and the other platforms GTK supports.

How can I get a FrameClock for my widget?

Unfortunately, there are no public methods in the GTK API for the manual creation of FrameClock instances. The common way to get a frame clock for a GTK widget is to add the widget to a GTK window and then request the clock with
gtk_widget_get_frame_clock() once the widget has been realized:

static GdkFrameClock *frame_clock = NULL;

static void on_realize(GtkWidget *widget, gpointer user_data) {
    frame_clock = gtk_widget_get_frame_clock(widget);
}

GtkWidget *drawing_area = gtk_drawing_area_new();
gtk_container_add(GTK_CONTAINER(window), drawing_area);

g_signal_connect(drawing_area, "realize", G_CALLBACK(on_realize), NULL);

The obtained FrameClock will be the one created during the instantiation of a new GTK window:

#0  gdk_frame_clock_idle_init (frame_clock_idle=0x5555555e8140) at ../../../../gdk/gdkframeclockidle.c:137
#1  0x00007ffff7e67fba in g_type_create_instance (type=<optimized out>) at ../../../gobject/gtype.c:1929
#2  0x00007ffff7e4f0ed in g_object_new_internal (class=class@entry=0x5555555efa80, params=params@entry=0x0, n_params=n_params@entry=0) at ../../../gobject/gobject.c:2023
#3  0x00007ffff7e5034d in g_object_new_with_properties (object_type=<optimized out>, n_properties=0, names=names@entry=0x0, values=values@entry=0x0) at ../../../gobject/gobject.c:2193
#4  0x00007ffff7e50e51 in g_object_new (object_type=<optimized out>, first_property_name=first_property_name@entry=0x0) at ../../../gobject/gobject.c:1833
#5  0x00007ffff7ed8ba9 in gdk_window_new (parent=0x555555581110, attributes=0x7fffffffd770, attributes_mask=44) at ../../../../gdk/gdkwindow.c:1488
#6  0x00007ffff7f22c42 in create_foreign_dnd_window (display=0x55555557c0e0) at wayland/../../../../../gdk/wayland/gdkdevice-wayland.c:4803
#7  _gdk_wayland_device_manager_add_seat (wl_seat=<optimized out>, id=<optimized out>, device_manager=0x555555572e60) at wayland/../../../../../gdk/wayland/gdkdevice-wayland.c:5177
#8  _gdk_wayland_display_add_seat (version=<optimized out>, id=<optimized out>, display_wayland=0x55555557c0e0) at wayland/../../../../../gdk/wayland/gdkdisplay-wayland.c:238
#9  seat_added_closure_run (display_wayland=0x55555557c0e0, closure=<optimized out>) at wayland/../../../../../gdk/wayland/gdkdisplay-wayland.c:249
#10 0x00007ffff7f241d1 in process_on_globals_closures (display_wayland=0x55555557c0e0) at wayland/../../../../../gdk/wayland/gdkdisplay-wayland.c:209
#11 _gdk_wayland_display_open (display_name=<optimized out>) at wayland/../../../../../gdk/wayland/gdkdisplay-wayland.c:621
#12 0x00007ffff7ec268f in gdk_display_manager_open_display (manager=<optimized out>, name=0x0) at ../../../../gdk/gdkdisplaymanager.c:462
#13 0x00007ffff784ed4b in gtk_init_check (argc=<optimized out>, argv=<optimized out>) at ../../../../gtk/gtkmain.c:1110
#14 gtk_init_check (argc=<optimized out>, argv=<optimized out>) at ../../../../gtk/gtkmain.c:1102
#15 0x00007ffff784ed7d in gtk_init (argc=<optimized out>, argv=<optimized out>) at ../../../../gtk/gtkmain.c:1167
#16 0x0000555555557144 in main (argc=1, argv=0x7fffffffda88) at /home/user/local/git/examples/example_gdk_frame_clock/src/main.c:176

When is the right time to paint my widget?

Overall, the following code demonstrates how to set up a basic drawing area in a GTK application, connect per-frame tick and drawing callbacks, and handle custom graphics rendering using the Cairo library. The on_tick_callback ensures that the widget is scheduled for redraw, and the on_draw function is responsible for actually rendering the graphics within the widget:

/* Called in the UPDATE phase of each frame; schedules the widget to be
   redrawn in the PAINT PHASE of the current or the next frame. */
static int on_tick_callback(GtkWidget *widget, GdkFrameClock *frame_clock, gpointer user_data) {
    // Schedule this widget to be redrawn in the paint phase of the current or the next frame.
    gtk_widget_queue_draw(widget);
    return G_SOURCE_CONTINUE;
}

/* Emitted when the widget is supposed to render itself in the PAINT PHASE. */
static gboolean on_draw(GtkWidget *widget, cairo_t *cr, gpointer user_data) {
    // Your drawing operations here. E.g.: cairo_paint(cr);
    // ...
    return FALSE;
}

// ...

GtkWidget *drawing_area = gtk_drawing_area_new();
gtk_container_add(GTK_CONTAINER(window), drawing_area);

gtk_widget_add_tick_callback(GTK_WIDGET(drawing_area), on_tick_callback, drawing_area, NULL);
g_signal_connect(drawing_area, "draw", G_CALLBACK(on_draw), NULL);

// ...

The FrameClock will notify the widget when it is the right time to schedule the generation of a new frame. This happens in the update phase:

#0  gtk_widget_on_frame_clock_update (frame_clock=0x5555555e74c0, widget=0x5555555a8530) at ../../../../gtk/gtkwidget.c:5273
#1  0x00007ffff7e3ed2f in g_closure_invoke (closure=0x5555559fd320, return_value=0x0, n_param_values=1, param_values=0x7fffffffd540, invocation_hint=0x7fffffffd4c0) at ../../../gobject/gclosure.c:830
#2  0x00007ffff7e5ac36 in signal_emit_unlocked_R (node=node@entry=0x5555555aa000, detail=detail@entry=0, instance=instance@entry=0x5555555e74c0, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7fffffffd540) at ../../../gobject/gsignal.c:3777
#3  0x00007ffff7e5c614 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fffffffd6f0) at ../../../gobject/gsignal.c:3530
#4  0x00007ffff7e5c863 in <emit signal ??? on instance ???> (instance=instance@entry=0x5555555e74c0, signal_id=<optimized out>, detail=detail@entry=0) at ../../../gobject/gsignal.c:3587
#5  0x00007ffff7ed0b57 in _gdk_frame_clock_emit_update (frame_clock=0x5555555e74c0) at ../../../../gdk/gdkframeclock.c:645
#6  gdk_frame_clock_paint_idle (data=0x5555555e74c0) at ../../../../gdk/gdkframeclockidle.c:547
#7  0x00007ffff7ebd2ad in gdk_threads_dispatch (data=0x55555578b140, data@entry=<error reading variable: value has been optimized out>) at ../../../../gdk/gdk.c:769
#8  0x00007ffff6b032c8 in g_timeout_dispatch (source=0x555555677120, callback=<optimized out>, user_data=<optimized out>) at ../../../glib/gmain.c:4973
#9  0x00007ffff6b02c44 in g_main_dispatch (context=0x55555558e800) at ../../../glib/gmain.c:3419
#10 g_main_context_dispatch (context=0x55555558e800) at ../../../glib/gmain.c:4137
#11 0x00007ffff6b58258 in g_main_context_iterate.constprop.0 (context=0x55555558e800, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4213
#12 0x00007ffff6b022b3 in g_main_loop_run (loop=0x5555558d6d30) at ../../../glib/gmain.c:4413
#13 0x00007ffff7848d2d in gtk_main () at ../../../../gtk/gtkmain.c:1329
#14 0x0000555555557276 in main (argc=1, argv=0x7fffffffda88) at /home/user/local/git/examples/example_gdk_frame_clock/src/main.c:194

gtk_widget_add_tick_callback() actually attaches the callback to the update signal of the FrameClock. It therefore queues the animation updates and attaches a callback that is executed before each frame (check the implementation of gtk_widget_add_tick_callback() in gtk/gtkwidget.c):

guint
gtk_widget_add_tick_callback (GtkWidget       *widget,
                              GtkTickCallback  callback,
                              gpointer         user_data,
                              GDestroyNotify   notify)
{
  // ...
      frame_clock = gtk_widget_get_frame_clock (widget);

      if (frame_clock)
        {
          priv->clock_tick_id = g_signal_connect (frame_clock, "update",
                                                  G_CALLBACK (gtk_widget_on_frame_clock_update),
                                                  widget);
          gdk_frame_clock_begin_updating (frame_clock);
        }
  // ...
  info = g_new0 (GtkTickCallbackInfo, 1);
  // info->...
  info->callback = callback;
  // info->...
  priv->tick_callbacks = g_list_prepend (priv->tick_callbacks, info);
  // ...
}

The callback runs frequently, matching the output device’s frame rate or the app’s repaint speed, whichever is slower. Inside the on_tick_callback() function, the gtk_widget_queue_draw() is called to schedule the specified widget for redrawing during the current or next frame’s paint phase. This means the widget will be marked for update, and its draw signal will be emitted.

Lastly, the on_draw() callback is responsible for rendering and drawing on a widget during the paint phase. This callback takes three parameters: a GtkWidget pointer (widget), a cairo_t context pointer (cr) for drawing operations, and a user data pointer (user_data). This callback is the right place to perform drawing operations, for example using the Cairo drawing library.
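As a fuller sketch of the two callbacks (GTK3; the 500 ms looping fade and variable names are my own example, not from the original demo):

```c
#include <gtk/gtk.h>

static gint64 start_time = 0;

static gboolean
on_tick_callback (GtkWidget *widget, GdkFrameClock *frame_clock, gpointer user_data)
{
  /* Record the frame time of the first tick as the animation origin. */
  if (start_time == 0)
    start_time = gdk_frame_clock_get_frame_time (frame_clock);

  /* Schedule a redraw in the paint phase of this or the next frame. */
  gtk_widget_queue_draw (widget);
  return G_SOURCE_CONTINUE;
}

static gboolean
on_draw (GtkWidget *widget, cairo_t *cr, gpointer user_data)
{
  GdkFrameClock *clock = gtk_widget_get_frame_clock (widget);
  gint64 now = gdk_frame_clock_get_frame_time (clock);

  /* Progress through a 500 ms loop, derived from the frozen frame time. */
  double t = (double) ((now - start_time) % (gint64) 500000) / 500000.0;

  /* Fade the widget between black and white over each cycle. */
  cairo_set_source_rgb (cr, t, t, t);
  cairo_paint (cr);
  return FALSE;
}
```

Because both callbacks read the same frozen frame time, the fade advances exactly one step per frame regardless of how long painting takes.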

Keeping the ticks aligned with the VSync signals

As I already mentioned, the GTK framework internally handles the concept of the frame drawn signal. This signal lets the gdk.FrameClock know when a frame has been successfully rendered and presented on the screen. This is vital for keeping the FrameClock in sync with the monitor’s refresh rate (VSync) upon receiving the signal. With the frame drawn signal, the frame clock maintains a consistent cycle, delivering regular updates to the application.

In the context of a GTK application running in a Wayland environment, this synchronization is implemented by adding a listener for the .done event of the committed wl_surface. This is the frame_callback listener added from on_frame_clock_after_paint() in gdk/wayland/gdkwindow-wayland.c:

static void
on_frame_clock_after_paint (GdkFrameClock *clock,
                            GdkWindow     *window)
{
  // ...
  if (impl->surface_callback == NULL)
    {
      callback = wl_surface_frame (impl->display_server.wl_surface);
      wl_callback_add_listener (callback, &frame_listener, window);  // <-- Here
      impl->surface_callback = callback;
    }
  // ...
}

static void
frame_callback (void               *data,
                struct wl_callback *callback,
                uint32_t            time)
{
  // ...
  _gdk_frame_clock_thaw (clock);
  // ...
}

The frame_callback will be called when the server has finished processing the surface commit and has made the changes visible on the screen. This function immediately thaws the FrameClock.

The term thaw refers to the process of unfreezing the FrameClock after it has been frozen. When a FrameClock is frozen, it means that the clock is temporarily paused or halted, preventing the generation of new frame ticks and updates.

When the FrameClock thaws, it resumes its normal operation of generating frame ticks and updates. This is typically done when the application determines that it needs to resume animations or updates that were previously paused.

GTK uses this freezing mechanism to optimize performance and reduce unnecessary updates during periods when animations or updates are not needed, or to align the generation of the next frame with the presentation time of the current frame when limited by the monitor’s vertical refresh rate (VSync).
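Freezing and thawing themselves are private to GDK, but the closest public knobs are gdk_frame_clock_begin_updating() and gdk_frame_clock_end_updating(), which tell the clock whether continuous update phases are needed (as seen above, gtk_widget_add_tick_callback() calls the former internally). A minimal sketch with hypothetical helper names:

```c
#include <gdk/gdk.h>

/* Ask the clock for a continuous "update" phase while an animation
 * runs; without active requests, the clock is free to pause (freeze). */
static void
start_animation (GdkFrameClock *clock)
{
  gdk_frame_clock_begin_updating (clock);
}

/* Release the request so the clock may stop ticking again. */
static void
stop_animation (GdkFrameClock *clock)
{
  gdk_frame_clock_end_updating (clock);
}
```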

The following is a GDB backtrace from the example code with a breakpoint added in the frame_callback() function:

(gdb) b frame_callback
Breakpoint 3 at 0x7ffff7f2e100: file wayland/../../../../../gdk/wayland/gdkwindow-wayland.c, line 570.
(gdb) c
(s): Cycle start
               get timings:
               |  - now: 1903492480109
               |  - frame time: 1903492494736 (counter: 11271) (frame time - now: 14627)
               |  - predicted presentation time: 1903492505049 (predicted - now: 24940)
1903492480109:  widget:on_tick_callback (rate: 337853788)
               get timings:
               |  - now: 1903492480316
               |  - frame time: 1903492494736 (counter: 11271) (frame time - now: 14420)
               |  - predicted presentation time: 1903492505049 (predicted - now: 24733)
1903492480316:  widget:on_draw (tick-draw latency: 207)
               get timings:
               |  - now: 1903492499890
               |  - frame time: 1903492494736 (counter: 11271) (frame time - now: -5154)
               |  - predicted presentation time: 1903492505049 (predicted - now: 5159)
1903492499890:  wl_surface:on_commit
(e): End of cycle
Thread 1 "gtk-frame-clock" hit Breakpoint 3, frame_callback (data=0x555555581ad0, callback=0x555555bd2420, time=1903492499) at wayland/../../../../../gdk/wayland/gdkwindow-wayland.c:570
570    {
(gdb) bt
#0  frame_callback (data=0x555555581ad0, callback=0x555555bd2420, time=1903492499) at wayland/../../../../../gdk/wayland/gdkwindow-wayland.c:570
#1  0x00007ffff66e9e2e in ffi_call_unix64 () at ../src/x86/unix64.S:105
#2  0x00007ffff66e6493 in ffi_call_int (cif=<optimized out>, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>, closure=<optimized out>) at ../src/x86/ffi64.c:672
#3  0x00007ffff74cdad0 in wl_closure_invoke (closure=closure@entry=0x5555555e1ac0, target=<optimized out>, target@entry=0x555555bd2420, opcode=opcode@entry=0, data=<optimized out>, flags=<optimized out>) at ../src/connection.c:1025
#4  0x00007ffff74ce243 in dispatch_event (display=display@entry=0x555555575220, queue=0x5555555752f0, queue=<optimized out>) at ../src/wayland-client.c:1583
#5  0x00007ffff74ce43c in dispatch_queue (queue=0x5555555752f0, display=0x555555575220) at ../src/wayland-client.c:1729
#6  wl_display_dispatch_queue_pending (display=0x555555575220, queue=0x5555555752f0) at ../src/wayland-client.c:1971
#7  0x00007ffff74ce490 in wl_display_dispatch_pending (display=<optimized out>) at ../src/wayland-client.c:2034
#8  0x00007ffff7f25548 in _gdk_wayland_display_queue_events (display=<optimized out>) at wayland/../../../../../gdk/wayland/gdkeventsource.c:201
#9  0x00007ffff7ec0a99 in gdk_display_get_event (display=0x55555557c0e0) at ../../../../gdk/gdkdisplay.c:442
#10 0x00007ffff7f2a996 in gdk_event_source_dispatch (base=<optimized out>, callback=<optimized out>, data=<optimized out>) at wayland/../../../../../gdk/wayland/gdkeventsource.c:120
#11 0x00007ffff6b02d3b in g_main_dispatch (context=0x55555558e800) at ../../../glib/gmain.c:3419
#12 g_main_context_dispatch (context=0x55555558e800) at ../../../glib/gmain.c:4137
#13 0x00007ffff6b58258 in g_main_context_iterate.constprop.0 (context=0x55555558e800, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4213
#14 0x00007ffff6b022b3 in g_main_loop_run (loop=0x555555825be0) at ../../../glib/gmain.c:4413
#15 0x00007ffff7848d2d in gtk_main () at ../../../../gtk/gtkmain.c:1329
#16 0x00005555555570a2 in main (argc=1, argv=0x7fffffffda88) at /home/user/local/git/examples/example_gdk_frame_clock/src/main.c:198
(gdb) c
(s): Cycle start
               get timings:
               |  - now: 1903498816343
               |  - frame time: 1903498827058 (counter: 11272) (frame time - now: 10715)
               |  - predicted presentation time: 1903498841278 (predicted - now: 24935)
1903498816343:  widget:on_tick_callback (rate: 6336234)
               get timings:
               |  - now: 1903498816550
               |  - frame time: 1903498827058 (counter: 11272) (frame time - now: 10508)
               |  - predicted presentation time: 1903498841278 (predicted - now: 24728)
1903498816550:  widget:on_draw (tick-draw latency: 207)
               get timings:
               |  - now: 1903498830894
               |  - frame time: 1903498827058 (counter: 11272) (frame time - now: -3836)
               |  - predicted presentation time: 1903498841278 (predicted - now: 10384)
1903498830894:  wl_surface:on_commit
(e): End of cycle

Here’s what’s happening:

  1. A breakpoint was set inside the frame_callback function, at line 570 of the gdkwindow-wayland.c file.
  2. The program continues (c) and enters a FrameClock cycle (from “Cycle start”).
  3. The FrameClock goes through different phases like on_before_paint, on_update, and so on, collecting timings including current time, frame time, and predicted presentation time.
  4. The program hits the breakpoint at line 570 (frame_callback) as part of the cycle. The backtrace (bt) shows the function call stack, indicating that the frame_callback was triggered due to a Wayland event.
  5. The program again continues (c) and progresses through the FrameClock phases.

This output represents the cycle of the FrameClock in the example code program, with debug information showing the specific phases, timings, and the function calls being executed.

So… in conclusion

The FrameClock in GTK acts as a coordinator for frame updates and animations. It triggers a series of phases for each frame, including Before Paint, Update, Layout, Draw, Paint, and After Paint. These phases ensure that animations are smoothly coordinated and displayed, optimizing performance.

The concept of the frame drawn signal is crucial for achieving synchronization with the display’s VSync. This signal is emitted when a frame has been successfully presented on the screen by the compositor or windowing system. It allows the FrameClock to adjust its cycle and stay in harmony with the monitor’s refresh rate. This synchronization is achieved in the Wayland platform by utilizing the wl_surface commit done callback, which corresponds to the VSync signal.

There is no way to create your own FrameClock instance for your app. GTK applications that want to work with a FrameClock should first add their widgets to a GTK window and then obtain the FrameClock instance using gtk_widget_get_frame_clock().

And that is basically all I have. My motivation for digging into this was to get a better understanding of the GTK FrameClock and how it works internally to produce smooth, synchronized animations in graphical applications. I hope this analysis is useful for others who are also curious about how this works.

by Pablo Saavedra at August 02, 2023 03:44 PM

July 28, 2023

José Dapena

Speeding up V8 heap snapshot

My last post, Javascript memory profiling with heap snapshot, finished announcing I would write a follow up post about several optimizations I implemented that make heap snapshot faster.

Good news! The post has been accepted! You can read it here.

by José Dapena Paz at July 28, 2023 05:07 PM

July 25, 2023

Stéphane Cerveau

Discover GStreamer full static mode

How to embed statically your own tailored version of GStreamer in your application #

Since the gstreamer-full effort, it has been possible to create a shared library embedding the GStreamer framework along with a chosen set of plugins.

Within this effort, it also became possible to register the selected plugins/features automatically by calling the gst_init method in an application linked against gstreamer-full.

This approach offered a gstreamer-full package with the library, headers and pc files, but it was not possible to embed GStreamer statically in your application and use it transparently.

GstVkVideoParser: a standalone solution #

In the journey to bring an open source solution for a video parser to the Vulkan Conformance Test Suite, we first chose GStreamer, as it provides all the parsing facilities necessary to support the needed codecs such as H26x or VPx. This solution was also supposed to be cross platform and to drag in as few system dependencies as possible. Given that GStreamer usually brings along its own dependencies, such as glib or orc, and since we wanted a standalone GstVkVideoParser library supported on Windows, a little bit of work and love was necessary to add this to GStreamer.

Unfortunately this solution was not retained by the Vulkan Video TSG, not because it was not working, but because another parser was made available that is easy to integrate into the CTS at the source level, avoiding binary linkage; see the Vulkan Video change.

GStreamer as a full static library #

With the gstreamer-full work, everything was almost ready, except for making gstreamer-full a real static library that any application could link against.

Here is the MR merged and the challenges taken up:

Adding gst-full-target-type=static #

To generate the gstreamer-full dependency which will be statically linked into the application, we decided to introduce a new GStreamer meson option, gst-full-target-type.

By default the gstreamer-full will be built as a shared library as before.

By passing gst-full-target-type=static, only static objects will be generated, and a pkg-config file will be produced for gstreamer-full so that the application does not need to know which static libraries to add to its link line. The GStreamer build system takes care of enabling/disabling the features/libraries you do (or don't) need.
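For instance, a minimal configure sketch (hypothetical build directory and an example plugin/library selection; note that the full command later in this post spells the option value as static_library, so check which form your GStreamer version expects):

```shell
# Sketch: build GStreamer with gstreamer-full as a static library.
# The library selection below is an example, not a complete recipe.
meson setup build-static \
    --default-library=static \
    -Dgst-full-target-type=static_library \
    -Dgst-full-libraries=gstreamer-video-1.0,gstreamer-app-1.0
ninja -C build-static
```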

Initialize the plugins/features automatically #

To avoid the multiple calls otherwise necessary to initialize GStreamer, gst_init_static_plugins had to be called automatically along with gst_init in full-static mode, but this initially led to a build issue.

Indeed, most of the tools/examples/tests link with libgstreamer-1.0, which provides gst_init(), but to facilitate the plugin registration it was necessary to move the whole tools build after the gstreamer-full stage. A first MR was merged to let the GStreamer tools be built against gstreamer-full, but additional work was necessary for some core tools or helpers, such as gst-transcoder or gst-plugin-scanner, to avoid a linking issue.

Disable tests and examples #

In future work, all the tools/examples/tests should support the full-static mode, but as GStreamer aims to be a shared-object framework, we decided to leave this for later and disable all the examples/tests in full-static mode, since most applications using a tailored build won't need them.

Windows support #

One of the goals of this work was to provide a dependency-free Windows library to the Vulkan CTS, which has been achieved, but some additional work might be necessary to support all of the use cases the GStreamer framework offers, especially regarding library-dependent plugins.

Give me an example ... #

In the GstVkVideoParser project, various jobs build Linux and Windows versions, generating a library without any GStreamer/glib dependency; everything is embedded inside the library, as you can see in these GitHub Actions.

In this project, GStreamer is used as a meson subproject/wrap, which allows GStreamer to be built along with GstVkVideoParser. This can be achieved easily by adding the following file to your meson project:



dependency_names = gstreamer-1.0, gstreamer-base-1.0, gstreamer-video-1.0, gstreamer-audio-1.0

and then add the following line to your meson.build to depend on gstreamer-full:

gstreamer_full_dep = dependency('gstreamer-full-1.0', fallback: ['gstreamer-1.0'], required :true)

In order to build a project, library or application which is using a tailored version of GStreamer you can follow this configure example:

$ meson buildfull-static --default-library=static --force-fallback-for=gstreamer-1.0,glib,libffi,pcre2 -Dauto_features=disabled -Dglib:tests=false -Djson-glib:tests=false -Dpcre2:test=false -Dvkparser_standalone=enabled -Dgstreamer-1.0:libav=disabled -Dgstreamer-1.0:ugly=disabled -Dgstreamer-1.0:ges=disabled -Dgstreamer-1.0:devtools=disabled -Dgstreamer-1.0:default_library=static -Dgstreamer-1.0:rtsp_server=disabled -Dgstreamer-1.0:gst-full-target-type=static_library -Dgstreamer-1.0:gst-full-libraries=gstreamer-video-1.0, gstreamer-audio-1.0, gstreamer-app-1.0, gstreamer-codecparsers-1.0 -Dgst-plugins-base:playback=enabled -Dgst-plugins-base:app=enabled -Dgst-plugins-bad:videoparsers=enabled -Dgst-plugins-base:typefind=enabled

In this case we disable everything in GStreamer with -Dauto_features=disabled, explicitly disable some features such as ges, libav, etc., and enable only the plugins we need: playback, app, videoparsers and typefind.

And finally we are enabling the static build with --default-library=static and -Dgstreamer-1.0:gst-full-target-type=static_library.
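Once an application links against a gstreamer-full built this way, a single gst_init() call also registers the static plugins. A minimal sketch (assuming the core elements providing fakesrc/fakesink were kept in the tailored build):

```c
#include <gst/gst.h>

int
main (int argc, char *argv[])
{
  /* In full-static mode this also registers the built-in plugins. */
  gst_init (&argc, &argv);

  /* Trivial pipeline to prove the statically linked elements load. */
  GstElement *pipeline = gst_parse_launch ("fakesrc num-buffers=1 ! fakesink", NULL);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  /* Wait for the state change to complete before tearing down. */
  gst_element_get_state (pipeline, NULL, NULL, GST_CLOCK_TIME_NONE);
  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (pipeline);
  return 0;
}
```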

Next ... #

As you can see, it's now quite easy to build an application depending on a gstreamer-full static build, but there are still some issues to address, such as plugin dependencies which might not be static, and other platform-specific issues such as the export of the gstreamer-full symbols on Windows.

You can follow some open issues such as:

As usual, if you would like to learn more about Vulkan Video, GStreamer or any other open multimedia framework, please contact us!

July 25, 2023 12:00 AM

July 19, 2023

Ziran Sun

WASH in Schools

Madina Tindano is an elementary school student who lives in Bogandé, East Burkina Faso. For a long time, the toilets in her school remained unusable and completely abandoned. Now in her final year at the school (CM2), Madina is overjoyed by the renovation of the latrines in her school, thanks to the project promoting children’s right to education through better access to water, sanitation and hygiene (WASH).

The NGO behind the WASH project is the UNICEF Foundation in Spain. As one of their educational projects, WASH believes that every child has the right to a quality education, including access to drinking water, sanitation and hygiene services while at school. This can impact students’ learning, health, and dignity, particularly for girls like Madina. “When I see my period, I am very embarrassed to come to school because we don’t have toilets,” Madina said. WASH aims to improve access to water, sanitation and hygiene in 12 rural schools, including Madina’s, in the East region of Burkina Faso. 2,796 students, 77 teachers and 48 parents will benefit from this project.

Igalia has been collaborating with UNICEF Foundation in Spain since 2007 in promoting access to quality education for children in Africa, and is proud to have been a part of this effort by contributing the funds to help 2 rural schools, which will help 430 students, 11 teachers and 7 parents. To make sure the project goes as planned, a monitoring commission is formed including some UNICEF members and representatives from Igalia (Javier Fernández and María Piñeiro).

The project started in February 2022 and was expected to finish in a year’s time. Unfortunately, its implementation was affected by the insecurity in Gnagna province, which caused the closure of the initially pre-selected schools (our hearts go out to the children, teachers and parents of these schools; we hope things work out for the best for them). For this reason, activities were reoriented towards new schools located in a safer area of the same province, the commune of Bogandé, and the project has been extended until the end of June. Apart from this initial delay, the project has been progressing very well. To achieve its goals, WASH has managed to get students, local authorities and communities, local organizations and the private sector involved throughout the project. The following work has been carried out:

  • Improving access to sustainable water and sanitation facilities by constructing and rehabilitating water points and latrines, also via distribution of WASH kits.
  • Increasing knowledge on good hygiene practices by providing training for Hygiene Clubs, implementing Schools Action Plan and running awareness raising campaigns.
  • Strengthening schools and communities’ capacities by providing training to Parents’ Associations and Teachers.

Doesn’t this joyful smile make you feel happy too? :-).

by zsun at July 19, 2023 09:36 AM

July 08, 2023

Philippe Normand

GNOME Web Canary is back

This is a short PSA post announcing the return of the GNOME Web Canary builds. Read on for the crunchy details.

A couple of years ago I blogged about the GNOME Web Canary flavor. In summary, this special build of GNOME Web provides a preview of the upcoming version of the underlying WebKitGTK engine. It is potentially unstable, but allows for testing features that have not shipped in a stable release yet.

Unfortunately, Canary broke right after GNOME Web switched to GTK4, because back then the WebKit CI was missing build bots and infrastructure for hosting WebKitGTK4 build artefacts. Recently, thanks to the efforts of my Igalia colleagues Pablo Abelenda, Lauro Moura, Diego Pino and Carlos López, the WebKit CI provides WebKitGTK4 build artefacts, hosted on a server kindly provided by Igalia.

The installation instructions were already given in the introductory post, but I’ll repeat them here:

flatpak --user remote-add --if-not-exists webkit
flatpak --user install


If you installed the older version of Canary, pre-GTK4, you might see an error related to an expired GPG key. This is due to how I update the WebKit runtime, and I’ll try to avoid it in future updates. For the time being, you can remove the flatpak remote and re-add it:

flatpak --user remote-delete webkit
flatpak --user remote-add webkit

That’s all folks, happy hacking and happy testing.

by Philippe Normand at July 08, 2023 03:30 PM

June 30, 2023

Igalia Compilers Team

Porting BOLT to RISC-V

Recently, initial support for RISC-V has landed in LLVM's BOLT subproject. Even though the current functionality is limited, it was an interesting experience of open source development to get to this point. In this post, I will talk about what BOLT is, what it takes to teach BOLT how to process RISC-V binaries, and the interesting detours I sometimes had to make to get this work upstream.

BOLT overview #

BOLT (Binary Optimization and Layout Tool) is a post-link optimizer whose primary goal is to improve the layout of binaries. It uses sample-based profiling to improve the performance of already fully-optimized binaries. That is, the goal is to be complementary to existing optimization techniques like PGO and LTO, not to replace them.

Sample-based profiling is used in order to make it viable to obtain profiles from production systems, as its overhead is usually negligible compared to profiling techniques based on instrumentation. Another advantage is that no special build configuration is needed and production binaries can be profiled directly. The choice of optimizing at the binary level (as opposed to, say, the IR level) comes from the accuracy of the profile data: since the profile is gathered at the binary level, mapping it back to a higher-level representation of the code can be a challenging problem. Since code layout optimizations can quite easily be applied at the binary level, where the accuracy of the profile is highest, performing post-link optimization is a logical choice.

To use BOLT, it needs access to a binary and corresponding profile. As mentioned before, the goal is to optimize production binaries so no special build steps are required. The only hard requirement is that the binary contains a symbol table (so stripped binaries are not supported). In order for BOLT to be able to rearrange functions (in addition to the code within functions), it needs access to relocations. Linkers usually remove relocations from the final binary but can be instructed to keep them using the --emit-relocs flag. For best results, it is recommended to link your binaries with this flag.

Gathering a profile on Linux systems can be done in the usual way using perf. BOLT provides the necessary tools to convert perf output to an appropriate format, and to combine multiple profiles. On systems where perf is not available, BOLT can also instrument binaries to create profiles. For more information on how to use BOLT, see the documentation.

For more details on BOLT, including design decisions and evaluation, see the CGO'19 paper. Let's move on to discuss some of BOLT's internals to understand what is needed to support RISC-V.

BOLT internals #

Optimizing the layout of a binary involves shuffling code around. The biggest challenge in doing this is making sure that all code references remain correct. Indeed, moving a function or basic block to a different location changes its address, and all jumps, calls, or other references to it need to be updated accordingly.

To do this correctly, BOLT's rewriting pipeline transforms binaries in the following (slightly simplified) way:

  1. Function discovery: using (mostly) the ELF symbol table, the boundaries of functions are recorded;
  2. Disassembly: using LLVM's MC-layer, function bodies are disassembled into lists of MCInst objects;
  3. CFG construction: basic blocks are discovered in the instruction lists and references between them resolved, resulting in a control-flow graph for each function;
  4. Optimizations: using the CFG, basic block and function layout is optimized based on the profile;
  5. Assembly: the new layout is emitted, using LLVM's MCStreamer API, to an ELF object file in memory;
  6. Link: since this object file might still contain external references, it is linked to produce the final binary.

Some of these steps are completely architecture independent. For example, function discovery only needs the ELF symbol table. Others do need architecture specific information. Fortunately, BOLT has supported multiple architectures from the beginning (X86-64 and AArch64) so an abstraction layer exists that makes it relatively straightforward to add a new target. Let's talk about what is needed to teach BOLT to transform RISC-V binaries.
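To make the pipeline above concrete, here is a toy Python model (nothing like BOLT's real data structures) of the core idea behind steps 3–6: once every reference is symbolic, blocks can be placed in any order and addresses resolved afterwards.

```python
# Toy model: each "instruction" is (opcode, symbolic_target_or_None),
# and blocks reference each other by label rather than by raw address.
def layout(blocks, order):
    """Assign an address to each block in the chosen order (4 bytes/insn)."""
    addr, symbols = 0, {}
    for name in order:
        symbols[name] = addr
        addr += 4 * len(blocks[name])
    return symbols

def resolve(blocks, order, symbols):
    """Emit (address, opcode, resolved_target) tuples for the new layout."""
    out, addr = [], 0
    for name in order:
        for op, target in blocks[name]:
            out.append((addr, op, symbols[target] if target else None))
            addr += 4
    return out

blocks = {
    "hot":  [("insn", None), ("jmp", "cold")],
    "cold": [("insn", None), ("call", "hot")],
}
# Pick any new order; every reference still resolves correctly
# because it was kept symbolic rather than as a fixed offset:
syms = layout(blocks, ["cold", "hot"])
code = resolve(blocks, ["cold", "hot"], syms)
```

Real BOLT of course emits machine code through MCStreamer and leaves remaining external references to the JIT linker; the point here is only why making references symbolic is what enables the later layout and linking steps.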

Teaching BOLT RISC-V #

Thanks to BOLT's architecture abstraction layer, adding support for a new target turned out to be mostly straightforward. I will go over the parts of BOLT's rewriting pipeline that need architecture-specific information while focusing on the aspects of RISC-V that made this slightly tricky sometimes.

(Dis)assembly #

Assembly and disassembly of binaries is obviously architecture-dependent. BOLT uses various MC-layer LLVM APIs to perform these tasks. More specifically, MCDisassembler is used for disassembly while MCAssembler is used (indirectly via MCObjectStreamer) for assembly. The good news is that there is excellent RISC-V support in the MC-layer so this can readily be used by BOLT.

CFG construction #

The result of disassembly is a linear list of instructions in the order they appear in the binary. In the MC-layer, instructions are represented by MCInst objects. In this representation, instructions essentially consist of an opcode and a list of operands, where operands could be registers, immediates, or more high-level expressions (MCExpr). Expressions can be used, for example, to refer to symbolic program locations (i.e., labels) instead of using constant immediates.

Right after disassembly, however, all operands will be registers or immediates. For example, an instruction like

jal ra, f

will be disassembled into (heavy pseudo-code here)

MCInst(RISCV::JAL, [RISCV::X1, ImmOffset])

where ImmOffset is the offset from the jal instruction to f. This is not convenient to handle in BOLT as nothing indicates that this MCInst actually refers to f.
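To see why only an immediate is available at this point, it helps to look at how the J-type immediate is packed into the instruction word. The following decoder is a hypothetical helper (illustrative only, not BOLT or LLVM code) that recovers exactly the (rd, ImmOffset) pair above from a raw 32-bit jal encoding:

```python
def decode_jal(inst):
    """Decode a raw 32-bit RISC-V JAL encoding into (rd, byte offset)."""
    assert inst & 0x7F == 0x6F, "not a JAL opcode"
    rd = (inst >> 7) & 0x1F
    # The J-type immediate is scrambled across the instruction word:
    # inst[31]=imm[20], inst[30:21]=imm[10:1], inst[20]=imm[11], inst[19:12]=imm[19:12]
    imm = (((inst >> 31) & 0x1) << 20) \
        | (((inst >> 12) & 0xFF) << 12) \
        | (((inst >> 20) & 0x1) << 11) \
        | (((inst >> 21) & 0x3FF) << 1)
    if imm & (1 << 20):          # sign-extend the 21-bit immediate
        imm -= 1 << 21
    return rd, imm

# 0x008000EF encodes "jal ra, +8": rd = x1 (ra), offset = 8 bytes.
```

All the decoder can tell us is "x1 and +8 bytes"; which symbol lives 8 bytes away is exactly the information BOLT has to reconstruct.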

Therefore, BOLT post-processes instructions after disassembly and replaces immediates with symbolic references where appropriate. Two different mechanisms are used to figure out the address an instruction refers to:

  • For control-transfer instructions (e.g., calls and branches), MCInstrAnalysis is used to evaluate the target. LLVM's RISC-V backend already contained an appropriate implementation for this.
  • For other instructions (e.g., auipc/addi pairs to load an address in RISC-V), relocations are used. For this, BOLT's Relocation class had to be extended to support RISC-V ELF relocations.

Once the target of an instruction has been determined, BOLT creates an MCSymbol at that location and updates the MCInst to point to that symbol instead of an immediate offset.

One question remains: how does BOLT detect control-transfer instructions? Let's first discuss how BOLT creates the control-flow graph now that all instructions symbolically refer to their targets.

A CFG is a directed graph where the nodes are basic blocks and the edges are control-flow transfers between those basic blocks. Without going into details, BOLT has a target-independent algorithm to create a CFG from a list of instructions (for those interested, you can find it here). It needs some target-specific information about instructions though. For example:

  • Terminators are instructions that end a basic block (e.g., branches and returns but not calls).
  • Branches and jumps are the instructions that create edges in the CFG.
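The block-splitting half of such an algorithm can be sketched as follows (a toy illustration, not BOLT's actual code, under the simplifying assumption that blocks only end at terminators; real CFG construction also starts new blocks at branch targets):

```python
def build_blocks(insns, is_terminator):
    """Split a linear instruction list into basic blocks: a block ends
    after each terminator (branches/returns, but not calls)."""
    blocks, cur = [], []
    for insn in insns:
        cur.append(insn)
        if is_terminator(insn):
            blocks.append(cur)
            cur = []
    if cur:                     # trailing instructions form the last block
        blocks.append(cur)
    return blocks

insns = ["addi", "beq", "call", "addi", "ret"]
# "beq" and "ret" end blocks; "call" does not, since control returns.
blocks = build_blocks(insns, lambda i: i in ("beq", "jal", "ret"))
```

Note how the quality of is_terminator completely determines the CFG: misclassify one instruction and blocks get merged or split incorrectly, which is why the next paragraphs matter.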

To get this information, BOLT relies again on MCInstrAnalysis which provides methods such as isTerminator and isCall. These methods can be specialized by specific LLVM backends but the default implementation relies on the MCInstrDesc class. Objects of this class are generated by various TableGen files in the backends (e.g., this one for RISC-V). An important property of MCInstrDesc for the next discussion is that its information is based only on opcodes, operands are not taken into account.

LLVM's RISC-V backend did not specialize MCInstrAnalysis, so BOLT was relying on MCInstrDesc to get information about terminators and branches. For many targets (e.g., X86) this might actually be fine, but for RISC-V it causes problems. For example, take a jal instruction: is it a terminator, a branch, a call? Based solely on the opcode, we cannot answer these questions because jal is used both for direct jumps (a terminator) and function calls (not a terminator).

The solution to this problem was to specialize MCInstrAnalysis for RISC-V taking the calling convention into account:

  • jal zero, ... is an unconditional branch (return address discarded);
  • jal ra, ... is a call (return address stored in ra (x1) which the calling convention designates as the return address register);
  • Some more rules for jalr, compressed instructions, detecting returns,...
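In Python-ish form, the destination-register rules above look roughly like this (a simplified sketch of what such a specialization checks, not the actual C++ implementation; x1/ra and x5/t0 are the link registers designated by the RISC-V calling convention):

```python
ZERO, RA, T0 = 0, 1, 5   # ABI register numbers: x0, x1 (ra), x5 (t0)

def classify_jal(rd):
    """Classify a jal purely by its destination register."""
    if rd == ZERO:
        return "unconditional branch"   # return address is discarded
    if rd in (RA, T0):
        return "call"                   # return address saved in a link register
    return "unclassified"               # rarer cases need further rules
```

The same operand-sensitive reasoning extends to jalr and the compressed variants, which is why an opcode-only table like MCInstrDesc cannot express it.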

So the first patch that landed to pave the way for RISC-V support in BOLT was not in the BOLT project but in the RISC-V MC-layer.

With this in place, the patch to add a RISC-V target to BOLT consisted mainly of implementing the necessary relocations and implementing the architecture abstraction layer. The latter consisted mainly of instruction manipulation (e.g., updating branch targets), detecting some types of instructions not supported by MCInstrAnalysis (e.g., nops), and analyzing RISC-V-specific Procedure Linkage Table (PLT) entries (so BOLT knows which function they refer to). Once I started to understand the internals of BOLT, this was relatively straightforward. After iterating over the patch with the BOLT maintainers (who were very helpful and responsive during this process), it got accepted in less than a month.

There was just one minor issue to resolve.

Linking #

The final step in the rewriting pipeline is linking the generated object file. BOLT is able to rely on LLVM again by using the RuntimeDyld JIT linker which is part of the MCJIT project. Unfortunately, there was no RISC-V support yet in RuntimeDyld. Looking at the supported targets, it seemed easy enough to implement RISC-V support: I just needed to implement the few relocations that BOLT emits. So I submitted a patch.

Alas, it seemed that things might not be as easy as I hoped:

Is there something preventing Bolt from moving to ORC / JITLink? If Bolt is able to move over then the aim should be to do that. If Bolt is unable to move over then we need to know why so that we can address the issue. RuntimeDyld is very much in maintenance mode at the moment, and we're working hard to reach parity in backend coverage so that we can officially deprecate it.

Even though this comment was followed up by this:

None of that is a total blocker to landing this, but the bar is high, and it should be understood that Bolt will need to migrate in the future.

trying to push the patch through didn't feel like the right approach. For one, I anticipate needing more advanced linker features for RISC-V in the future (e.g., linker relaxation) and I wouldn't want to implement those in a deprecated linker. Moreover, the recommended linker, JITLink, has mostly complete RISC-V support and, importantly, more users and reviewers, making its implementation almost certainly of higher quality than anything I would implement by myself in RuntimeDyld.

So the way forward for bringing RISC-V support to BOLT seemed to be to first port BOLT from RuntimeDyld to JITLink. Since it looked like this wasn't going to be a priority for the BOLT maintainers, I decided I might as well give it a shot myself. Even though this would surely mean a significant delay in reaching my ultimate goal of RISC-V support in BOLT, it felt like a great opportunity: it allowed me to learn more about linkers and BOLT's internals, as well as to invest in a project that I am hoping to use for the foreseeable future.

Porting BOLT to JITLink was hard, at least for me. It had a far-reaching impact on many parts of BOLT that I had never touched before. This meant it took quite some time to understand these parts, but also that I learned a lot in the process. Besides changes to BOLT, I submitted a few JITLink patches to implement some missing AArch64 relocations that BOLT needed. In the end, I managed to pass all BOLT tests and submit a patch.

This patch took about a month and a half to get accepted. The BOLT maintainers were very helpful and responsive in the process. They were also very strict, though. Rightfully so, of course, as BOLT is being used in production systems. The main requirement for the patch to get accepted was that BOLT's output would be a 100% binary match with the RuntimeDyld version. This was necessary to ease the verification of the correctness of the patch. With the help of the BOLT maintainers, we managed to get the patch in an acceptable state to land it.

Looking forward #

With BOLT being ported to JITLink, the patch to add initial RISC-V support to BOLT could finally land. This doesn't mean that BOLT is currently very usable for RISC-V binaries, though: most binaries can pass through BOLT fine but many of BOLT's transformations are not supported yet.

Since the initial support was added, I landed a few more patches to improve usability. For example, support for an obscure ELF feature called composed relocations was added, something RISC-V uses for R_RISCV_ADD32/SUB32 relocations (which BOLT supports now). Other patches deal with the creation and reversal of branches, something BOLT needs to fix up basic blocks after their layout has changed.

I'm currently working on handling binaries that have been relaxed during linking. The issue is that, after BOLT has moved code around, relaxed instructions might not fit the new addresses anymore. I plan to handle this as follows: during disassembly, BOLT will "unrelax" instructions (e.g., translating a jal back to an auipc/jalr pair) to make sure new addresses will always fit. The linker will then undo this, when possible, by performing relaxation again. The first step for this, adding linker relaxation support to JITLink, has been landed. More on this in a future post.

Wrapping up #

Bringing initial RISC-V support to BOLT has been a very interesting and educational journey for me, both from a technical as well as a social perspective. Having to work on multiple projects (LLVM MC, JITLink, BOLT) has taught me new technologies and put me in contact with great communities. I certainly hope to be able to continue this work in the future.

I'll close this post with a reference of the graph at the top, showing what it took, over a series of ~25 patches, to get RISC-V support in BOLT. I think this demonstrates the kind of detours that are sometimes needed to get work upstream, in this case benefiting both the RISC-V community (RISC-V support in BOLT) and BOLT as a whole (moving away from a deprecated linker and fixing bugs encountered along the way)

June 30, 2023 12:00 AM

June 22, 2023

Ziran Sun

Igalia helps build a library in Yoff

People in Yoff, Senegal are expecting to have a library built on the ground floor of a local school named “Coruña” in 2024, thanks to the “A library in Yoff” project.

The “A library in Yoff” project is led by Ecodesarrollo Gaia, an NGO based in A Coruña, Spain. Ecodesarrollo Gaia is the founder of the Yoff Coruña school. As part of their educational project, this effort aims to provide a safe and peaceful place for the local community to access books and other educational and cultural resources, and to create more jobs and professional development opportunities for the local community. Above all, Ecodesarrollo Gaia would like to get students from the school involved in this project.

Igalia has been working with Ecodesarrollo Gaia since 2018 and is very proud to fully fund this project. The funds cover building construction work, acquiring essential furniture, creating foundational bibliographic batches, providing computer equipment and digital media, and carrying out new staff hiring and training.

This one-year project has been progressing well. At the time of writing, the contractor who built the school has scheduled a meeting in June to specify the details of the construction. The people in charge of the municipal library Sagrada Familia of A Coruña have been contacted to prepare a training course in Yoff at the beginning of 2024. Meanwhile, some members of Ecodesarrollo have traveled to Yoff and stayed locally to help run the project.

A lot to look forward to!

by zsun at June 22, 2023 12:06 PM

June 20, 2023

Eric Meyer

First-Person Scrollers

I’ve played a lot of video games over the years, and the thing that just utterly blows my mind about them is how every frame is painted from scratch.  So in a game running at 30 frames per second, everything in the scene has to be calculated and drawn every 33 milliseconds, no matter how little or much has changed from one frame to the next.  In modern games, users generally demand 60 frames per second.  So everything you see on-screen gets calculated, placed, colored, textured, shaded, and what-have-you in 16 milliseconds (or less).  And then, in the next 16 milliseconds (or less), it has to be done all over again.  And there are games that render the entire scene in single-digit numbers of milliseconds!

I mean, I’ve done some simple 3D render coding in my day.  I’ve done hobbyist video game development; see Gravity Wars, for example (which I really do need to get back to and make less user-hostile).  So you’d think I’d be used to this concept, but somehow, I just never get there.  My pre-DOS-era brain rebels at the idea that everything has to be recalculated from scratch every frame, and doubly so that such a thing can be done in such infinitesimal slivers of time.

So you can imagine how I feel about the fact that web browsers operate in exactly the same way, and with the same performance requirements.

Maybe this shouldn’t come as a surprise.  After all, we have user interactions and embedded videos and resizable windows and page scrolling and stuff like that, never mind CSS animations and DOM manipulation, so the viewport often needs to be re-rendered to reflect the current state of things.  And to make all that feel smooth like butter, browser engines have to be able to display web pages at a minimum of 60 frames per second.

Admittedly, this would be a popular UI for browsing social media.

This demand touches absolutely everything, and shapes the evolution of web technologies in ways I don’t think we fully appreciate.  You want to add a new selector type?  It has to be performant.  This is what blocked :has() (and similar proposals) for such a long time.  It wasn’t difficult to figure out how to select ancestor elements — it was very difficult to figure out how to do it really, really fast, so as not to lower typical rendering speed below that magic 60fps.  The same logic applies to new features like view transitions, or new filter functions, or element exclusions, or whatever you might dream up.  No matter how cool the idea, if it bogs rendering down too much, it’s a non-starter.

I should note that none of this is to say it’s impossible to get a browser below 60fps: pile on enough computationally expensive operations and you’ll still jank like crazy.  It’s more that the goal is to keep any new feature from dragging rendering performance down too far in reasonable situations, both alone and in combination with already-existing features.  What constitutes “down too far” and “reasonable situations” is honestly a little opaque, but that’s a conversation slash vigorous debate for another time.

I’m sure the people who’ve worked on browser engines have fascinating stories about what they do internally to safeguard rendering speed, and ideas they’ve had to spike because they were performance killers.  I would love to hear those stories, if any BigCo devrel teams are looking for podcast ideas, or would like to guest on Igalia Chats. (We’d love to have you on!)

Anyway, the point I’m making is that performance isn’t just a matter of low asset sizes and script tuning and server efficiency.  It’s also a question of the engine’s ability to redraw the contents of the viewport, no matter what changes for whatever reason, with reasonable anticipation of things that might affect the rendering, every 16 milliseconds, over and over and over and over and over again, just so we can scroll our web pages smoothly.  It’s kind of bananas, and yet, it also makes sense.  Welcome to the web.

Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at June 20, 2023 12:16 PM

June 19, 2023

Javier Fernández

Secure Curves in the Web Cryptography API


Developers are exceptionally creative with the tools they are given. For a long time now they’ve had the ability to apply the Web Cryptography API to many uses. Getting random values from this API is, for example, an exceptionally popular use case, appearing on over 60% of page loads in the HTTP Archive dataset. Of course, its intended use is actual cryptography, and it offers numerous algorithms.

However, if developers feel the algorithm they need isn’t available from this API, they’ll write it (or compile it to WASM) themselves. That’s the case today when it comes to “secure curve” algorithms like X25519 [RFC7748] or Ed25519 [RFC8032]. These are desirable because they offer strong security guarantees while operating at much better performance levels than others. This is a shame because your browser already has internal support for these as part of TLS 1.3; it’s just not exposed to developers. Those userland solutions come with added costs in complexity, bandwidth and overall performance, and have security implications.

Adding some secure curves to the Web Cryptography API would provide many advantages to web authors. This has been a multi-year challenge that, thanks to the collaboration between Igalia and Protocol Labs, is finally close to producing results.


Secure elliptic curves play a very important role in the area of cryptography, providing robust and efficient algorithms. Among the available algorithms of this kind, two curves have gained significant attention in recent years: Ed25519 and X25519. These curves are based on the Edwards and Montgomery forms respectively, and offer strong security guarantees while still operating at excellent performance levels.

I think adding these curves to the API has always been an obvious step, but to see the whole picture we may need to step back and talk a bit about the history of the Web Cryptography API specification, and why it has been so difficult to incorporate new, more modern algorithms in recent years.

The Web Cryptography API specification

In an effort of ensuring secure communication and data protection in the web, the W3C created the Web Cryptography Working Group which among its goals had the definition of an API that lets developers implement secure application protocols on the level of Web applications. Out of this effort the WG published the Web Cryptography API, becoming a W3C Recommendation in January 2017.

This specification defines a comprehensive set of interfaces and algorithms for performing various cryptographic tasks, such as encryption, decryption, digital signatures, key generation, and key management. As usual, one of the main goals of the W3C specs is to encourage an interoperable cryptographic API across different web browsers and platforms. This simplifies the development process and ensures compatibility and portability of web-based cryptographic applications.

There are several cryptographic algorithms defined in the Web Cryptography API, including symmetric encryption algorithms like AES, asymmetric encryption like RSA, Elliptic Curve Cryptography (ECC) algorithms, hash functions like SHA-256, and digital signature algorithms like RSA-PSS. This API allows web authors to implement strong cryptographic mechanisms without requiring deep knowledge of the underlying cryptographic primitives.

It’s also important to note that the spec not only defines the cryptographic algorithms available to web applications, but also some security considerations, such as key storage and management, handling of sensitive data, and protection against common security attacks. These considerations ensure that apps implement their cryptographic logic in a secure and robust way.

The adoption of the Web Cryptography API specification by major web browsers has been a key factor in enabling secure web applications and ensuring trust in online transactions.

Why it took so long to add Secure Curves

The lack of safe curves in the Web Cryptography specification has been a long-standing issue for web developers, who were forced to rely on third-party or native implementations for their applications. Even more so given how widely these curves are used in non-web software components.

All these claims became an actual proposal when Qingsi Wang (Google) filed an issue for the TAG at the beginning of 2020. The proposal got quite positive feedback from Firefox engineers, as clearly stated in the standards position request driven by David Baron (a Mozillian back then) and Tantek Çelik, and endorsed by Martin Thomson.

So, despite the lack of a clear position from Safari, the proposal was accepted by the TAG with the support of two major browsers, the only concern being a proper standardization venue, given that the former Web Cryptography WG had been closed a few years before. The solution to address this concern was to develop the new specification in the Web Platform Incubator Community Group (WICG).

The last Web Cryptography candidate recommendation was published in 2017, when it was still under the umbrella of the aforementioned Web Cryptography WG. Since then, the spec drafts have been reviewed and published by the Web Application Security Working Group, with Daniel Huigens (Proton AG) as the only spec editor.

Even in this unstable situation, with the support of two main browsers (Firefox and Chrome), an intent-to-prototype request for Chrome was announced and the implementation started in February 2020. Unfortunately, the work was not completed and even the partial implementation was removed from the Chromium source code repository.

After some time maturing, the initial explainer written by Qingsi Wang was used to create the Secure Curves in the Web Cryptography API document, a potential W3C spec under the umbrella of the WICG, thanks to the work of its editor, Daniel Huigens. The long-term plan is for the spec to eventually be integrated into the Web Cryptography API specification; and this is where Protocol Labs enters the scene.

Protocol Labs contribution to the Web Crypto spec

Last year Protocol Labs defined a new goal in our long-term collaboration: to make progress on the effort to make the secure curves spec part of the Web Cryptography API. This kind of cryptographic algorithm is a fundamental tool for several use cases of the IPFS ecosystem they have been building over the last years.

The Ed25519 key pairs have become the standard in many web applications and the IPFS protocol has adopted them as default some time ago. Additionally, Ed25519 public keys had been primary identifiers across dat/hypercore and SSB from the beginning and most of the projects in this technology ecosystem prefer them due to the smaller key sizes and the possibility of implementing faster operations, in comparison to the use of RSA keys.

Since the adoption of UCANs by many teams inside Protocol Labs, they have frequently faced the hard choice between the RSA keys natively supported in browsers and the preferred Ed25519 keys, with the only option for the latter being external libraries. The use of these external software components (e.g., many JS/WASM implementations) implies a security risk of them being compromised. In most cases it is desirable for private keys to be non-extractable, to prevent attacks from malicious scripts and/or web extensions, which cannot be accomplished with JS/WASM implementations; supply-chain attacks are another vector that user-space implementations are exposed to.

The alternatives to the lack of support for secure curves in the Web Platform have been bundling a user-space implementation of Ed25519 for signature verification (which increases the complexity and amount of code in the programs) or using the built-in RSA for signing (to prevent attacks such as the ones described above).

In summary, Protocol Labs and Igalia consider that implementing secure curves like Ed25519 and X25519 in the Web Cryptography API will bring the Web Platform a very important feature that closes the gap with other native implementations. It will become a more competitive development platform for many projects, addressing the previously described attack vectors and, in many cases, simplifying applications and their implementation effort, as they will no longer require juggling Ed25519 and RSA keys.

Working plan

As I mentioned above, the long-term goal is to get the Secure Curves document to full standardization status and make the algorithms it defines part of the general Web Cryptography API specification. Achieving this goal requires that most of the main browsers implement the algorithms, ensuring a good level of interoperability. There are already quite a few Web Platform Tests for these new algorithms in the WebCryptoAPI test suite, which is a good start.

The nature of this goal, which, I want to stress, is part of a longer-term and more general collaboration between Igalia and Protocol Labs, makes this a multi-browser task. Our plan is to implement, or collaborate with patches, spec work and tests on, the Ed25519 and X25519 algorithms in Chromium, Firefox and Safari. Hence, one of the first steps was to file a standards position request for WebKit, which received positive feedback. This was useful for sending a new intent-to-prototype request in Chrome, reviving the one abandoned a few years ago.

Regarding Firefox, despite the positive feedback on the standards position request filed back in 2020, the implementation has not started yet and is pending on some blocking issues; I'll elaborate on this in the next section.

Current status

Chrome
Our first target for this task has been the Chromium browser. Perhaps the best way to follow the progress of this work is through the Chrome Platform Status site, where there is a specific entry for this feature. If you are interested in the implementation details, you can check the tracking bug.

It's important to note that the feature is being implemented behind the WebCryptoCurve25519 runtime flag, so if you are interested in trying it out you should enable Experimental Web Platform Features. I'll talk later about what's missing before we can send the intent-to-ship request so that the feature can be enabled by default.

The implementation of the Ed25519 algorithm landed in Chromium in November 2022 and shipped in Chrome 110.0.5424.0. The X25519 key-agreement algorithm took more time due to the review process, but it finally landed in March 2023 and has shipped in Chrome since 113.0.5657.0. I couldn't be more grateful for the patient and awesome work that David Benjamin (Google) did on all the reviews; contributing to the Chromium project has always been a pleasure, with an extremely useful and agile review process, and this time was no exception.

Safari
Soon after getting positive feedback on the standards position request I filed, and in parallel with the work on the implementation for Chrome, Safari engineers started implementing the Ed25519 algorithm in the WebKit engine. The main developer of this work has been Angela Izquierdo, with reviews mainly from Youenn Fablet. Safari shipped the Ed25519 implementation in STP 163, enabled by default for the Cocoa WebKit port. It's on my TODO list to enable it for the WebKitGTK port as well.

The implementation of the X25519 key-agreement algorithm has not started yet, but I've been in conversation with some WebKit engineers about how we can collaborate on this effort. Anyone interested can follow bug 258279 to track the progress of the implementation. I hope to have some time for this task during H2 this year.

Firefox
Firefox is the browser that is furthest behind in implementing the secure curves. I filed bug 1804788 to track the implementation work and have already started a preliminary analysis of the Gecko and NSS codebases. Unfortunately, it seems there is still some pending work (see bug 1325335 for details) to add the Curve25519 cryptographic primitives to the NSS library, and this is blocking the Web Crypto API implementation.

We are already in conversations with some Firefox engineers, and it seems there may be some progress by H2 this year as well.


The following table provides a high-level overview of the support for the Curve25519 secure curves in some of the main browsers:

Browser Ed25519 X25519
Chrome ✅ ✅
Safari 🚀 🚧
Firefox 🚧 🚧

The following graphs show the current interoperability, based on the Web Platform Tests results.

Test results for the generateKey method:

Test results for the deriveBits and deriveKey methods:

Test results for the importKey and exportKey methods:

Test results for the sign and verify methods:

Test results for the wrap and unwrap methods:

Next steps

Shipping by default in Chrome

One of the top priorities for H2 is to send the intent-to-ship request for Chrome. There are currently two issues blocking this task:

  • bug 1402835 – Ensure Ed25519 and X25519 implementations matches the spec regarding small-order keys
  • bug 1433707 – Handling optional length in X25519 does not match spec

Regarding the first issue, the latest draft of the Secure Curves in the Web Cryptography specification states that there must be checks for all-zero values to ensure small-order keys are rejected (as per RFC 7748, Section 6.1):

"If secret is the all-zero value, then throw an OperationError. This check must be performed in constant-time, as per [RFC7748] Section 6.1."

However, there is an ongoing discussion in PR #13 about changing the spec so that small-order keys are rejected during the import operation instead of when they are used. It's worth mentioning Chrome's strong opposition to this spec change, on the grounds that RFC 7748 states the checks should be done when the keys are used, and that the PR would be a regression. There is also an ongoing discussion about this in WebKit, in the form of a new standards position request, but there has been no feedback on that side yet.

There are WPTs to ensure that the X25519 algorithm works as expected with small-order keys, but since they assume that the all-zero checks are performed at the derivation phase, they contain asserts to ensure the initial keys are valid. If the spec changes, these tests must be adapted.

Regarding the second issue, there is an active discussion in issue #322 where, despite the different positions on the best approach to address it, there is a clear consensus that the Web Cryptography API spec has several inconsistencies in how the deriveBits function's 'length' parameter is defined. These inconsistencies have led to wrong WPT definitions and possibly some browser implementations that would need to be changed. Although there is a clear lack of interoperability here, the most concerning issue is the correctness of the implementations and how any potential change may affect the deriveKey operations of the ECDH, HKDF and PBKDF2 algorithms.

WebKit’s implementation of X25519

As I said before, we are currently analyzing the WebKit codebase to see if we can allocate some resources to start the implementation early in H2.

Firefox’s implementation of both Ed25519 and X25519

Until Firefox's NSS component supports the Curve25519 cryptographic primitives, we cannot start implementing the Web Cryptography API for these algorithms.

Conclusion
The work that Igalia and Protocol Labs are doing on the Web Cryptography API specification will have a big impact on how web developers use the platform, reducing security risks and allowing lighter and simpler applications.

We are working very hard to offer web authors native support for Ed25519 and X25519 in the main browsers (Safari, Firefox, Chrome) by the end of 2023, including all the Chromium-based browsers (e.g., Edge, Brave, Opera).

This work is another example of Protocol Labs' commitment to an open Web Platform and open-source browsers, investing their resources in a great variety of features with wide impact on web authors.

by jfernandez at June 19, 2023 10:13 PM

June 15, 2023

Andy Wingo

parallel futures in mobile application development

Good morning, hackers. Today I'd like to pick up my series on mobile application development. To recap, we looked at:

  • Ionic/Capacitor, which makes mobile app development more like web app development;

  • React Native, a flavor of React that renders to platform-native UI components rather than the Web, with ahead-of-time compilation of JavaScript;

  • NativeScript, which exposes all platform capabilities directly to JavaScript and lets users layer their preferred framework on top;

  • Flutter, which bypasses the platform's native UI components to render directly using the GPU, and uses Dart instead of JavaScript/TypeScript; and

  • Ark, which is Flutter-like in its rendering, but programmed via a dialect of TypeScript, with its own multi-tier compilation and distribution pipeline.

Taking a step back, with the exception of Ark which has a special relationship to HarmonyOS and Huawei, these frameworks are all layers on top of what is provided by Android or iOS. Why would you do that? Presumably there are benefits to these interstitial layers; what are they?

Probably the most basic answer is that an app framework layer offers the promise of abstracting over the different platforms. This way you can just have one mobile application development team instead of two or more. In practice you still need to test on iOS and Android at least, but this is cheaper than having fully separate Android and iOS teams.

Given that we are abstracting over platforms, it is natural also to abandon platform-specific languages like Swift or Kotlin. This is the moment in the strategic planning process that unleashes chaos: there is a fundamental element of randomness and risk when choosing a programming language and its community. Languages exist on a hype and adoption cycle; ideally you want to catch one on its way up, and you want it to remain popular over the life of your platform (10 years or so). This is not an easy thing to do and it's quite possible to bet on the wrong horse. However the communities around popular languages also bring their own risks, in that they have fashions that change over time, and you might have to adapt your platform to the language as fashions come and go, whether or not these fashions actually make better apps.

Choosing JavaScript as your language places more emphasis on the benefits of popularity, and is in turn a promise to adapt to ongoing fads. Choosing a more niche language like Dart places more emphasis on predictability of where the language will go, and ability to shape the language's future; Flutter is a big fish in a small pond.

There are other language choices, though; if you are building your own thing, you can choose any direction you like. What if you used Rust? What if you doubled down on WebAssembly, somehow? In some ways we'll never know unless we go down one of these paths; one has to pick a direction and stick to it for long enough to ship something, and endless tergiversations on such basic questions as language are not helpful. But in the early phases of platform design, all is open, and it would be prudent to spend some time thinking about what it might look like in one of these alternate worlds. In that spirit, let us explore these futures to see how they might be.

alternate world: rust

The arc of history bends away from C and C++ and towards Rust. Given that a mobile development platform has to have some low-level code, there are arguments in favor of writing it in Rust already instead of choosing to migrate in the future.

One advantage of Rust is that programs written in it generally have fewer memory-safety bugs than their C and C++ counterparts, which is important in the context of smart phones that handle untrusted third-party data and programs, i.e., web sites.

Also, Rust makes it easy to write parallel programs. For the same implementation effort, we can expect Rust programs to make more efficient use of the hardware than C++ programs.

And relative to JavaScript et al, Rust also has the advantage of predictable performance: it requires quite a good ahead-of-time compiler, but no adaptive optimization at run-time.

These observations are just conversation-starters, though, and when it comes to imagining what a real mobile device would look like with a Rust application development framework, things get more complicated. Firstly, there is the approach to UI: how do you get pixels on the screen and events from the user? The three general solutions are to use a web browser engine, to use platform-native widgets, or to build everything in Rust using low-level graphics primitives.

The first approach is taken by the Tauri framework: an app is broken into two pieces, a Rust server and an HTML/JS/CSS front-end. Running a Tauri app creates a WebView in which to run the front-end, and establishes a bridge between the web client and the Rust server. In many ways the resulting system ends up looking a lot like Ionic/Capacitor, and many of the UI questions are left open to the user: what UI framework to use, all of the JavaScript programming, and so on.

Instead of using a platform's WebView library, a Rust app could instead ship a WebView. This would of course make the application binary size larger, but tighter coupling between the app and the WebView may allow you to run the UI logic from Rust itself instead of having a large JS component. Notably this would be an interesting opportunity to adopt the Servo web engine, which is itself written in Rust. Servo is a project that in many ways exists in potentia; with more investment it could become a viable alternative to Gecko, Blink, or WebKit, and whoever does the investment would then be in a position of influence in the web platform.

If we look towards the platform-native side, though there are quite a number of Rust libraries that provide wrappers to native widgets, practically all of these primarily target the desktop. Only cacao supports iOS widgets, and there is no equivalent binding for Android, so any NativeScript-like solution in Rust would require a significant amount of work.

In contrast, the ecosystem of Rust UI libraries that are implemented on top of OpenGL and other low-level graphics facilities is much more active and interesting. Probably the best recent overview of this landscape is by Raph Levien, (see the "quick tour of existing architectures" subsection). In summary, everything is still in motion and there is no established consensus as to how to approach the problem of UI development, but there are many interesting experiments in progress. With my engineer hat on, exploring these directions looks like fun. As Raph notes, some degree of exploration seems necessary as well: we will only know if a given approach is a good idea if we spend some time with it.

However if instead we consider the situation from the perspective of someone building a mobile application development framework, Rust seems more of a mid/long-term strategy than a concrete short-term option. Sure, build low-level libraries in Rust, to the extent possible, but there is no compelling-in-and-of-itself story yet that you can sell to potential UI developers, because everything is still so undecided.

Finally, let us consider the question of scripting: sometimes you need to add logic to a program at run-time. It could be because actually most of your app is dynamic and comes from the network; in that case your app is like a little virtual machine. If your app development framework is written in JavaScript, like Ionic/Capacitor, then you have a natural solution: just serve JavaScript. But if your app is written in Rust, what do you do? Waiting until the app store pushes a new version of the app to the user is not an option.

There would appear to be three common solutions to this problem. One is to use JavaScript -- that's what Servo does, for example. As a web engine, Servo doesn't have much of a choice, but the point stands. Currently Servo embeds a copy of SpiderMonkey, the JS engine from Firefox, and it does make sense for Servo to take advantage of an industrial, complete JS engine. Of course, SpiderMonkey is written in C++; if there were a JS engine written in Rust, probably Rust programmers would prefer it. Also it would be fun to write, or rather, fun to start writing; reaching the level of ECMA-262 conformance of SpiderMonkey is at least a hundred-million-dollar project. Anyway what I am saying is that I understand why Boa was started, and I wish them the many millions of dollars needed to see it through to completion.

You are not obliged to script your app via JavaScript, of course; there are many languages out there that have "extending a low-level core" as one of their core use cases. I think the mitigated success that this approach has had over the years—who embeds Python into an iPhone app?—should probably rule out this strategy as a core part of an application development framework. Still, I should mention one Rust-specific option, Rhai; the pitch is that by being Rust-specific, you get more expressive interoperation between Rhai and Rust than you would between Rust and any other dynamic language. Still, it is not a solution that I would bet on: Rhai internalizes so many Rust concepts (notably around borrowing and lifetimes) that I think you have to know Rust to write effective Rhai, and knowing both is quite rare. Anyone who writes Rhai would probably rather be writing Rust, and that's not a good equilibrium.

The third option for scripting Rust is WebAssembly. We'll get to that in a minute.

alternate world: the web of pixels

Let's return to Flutter for a moment, if you will. Like the more active Rust GUI development projects, Flutter is an all-in-one rendering framework based on low-level primitives; all it needs is Vulkan or Metal or (soon) WebGPU, and it handles the rest, layering on opinionated patterns for how to build user interfaces. It didn't arrive to this state in a day, though. To hear Eric Seidel tell the story, Flutter began as a kind of "reset" for the Web, a conscious attempt to determine from the pieces that compose the Web rendering stack, which ones enable smooth user interfaces and which ones get in the way. After taking away all of the parts they didn't need, Flutter wasn't left with much: just GPU texture layers, a low-level drawing toolkit, and the necessary bindings to input events. Of course what the application programmer sees is much more high-level, but underneath, these are the platform primitives that Flutter uses.

So, imagine you work at Google. You used to work on the web—maybe on WebKit and then Chrome like Eric, maybe on web standards—but you broke with this past to see what Flutter might become. Flutter works: great job everybody! The set of graphical and input primitives that you use is minimal enough that it is abstract by nature; it doesn't much matter whether you target iOS or Android, because the primitives will be there. But the web is still the web, and it is annoying, aesthetically speaking. Could we Flutter-ize the web? What would that mean?

That's exactly what former HTML specification editor and now Flutter team member Ian Hixie proposed this January in a brief manifesto, Towards a modern Web stack. The basic idea is that the web and thus the browser is, well, a bit much. Hixie proposed to start over, rebuilding the web on top of WebAssembly (for code), WebGPU (for graphics), WebHID (for input), and ARIA (for accessibility). Technically it's a very interesting proposition! After all, people that build complex web apps end up having to fight with the platform to get the results they want; if we can reorient them to focus on these primitives, perhaps web apps can compete better with native apps.

However if you game out what is being proposed, I have doubts. The existing web is largely HTML, with JavaScript and CSS as add-ons: a web of structured text. Hixie's flutterized web proposal, on the other hand, is a web of pixels. This has a number of implications. One is that each app has to ship its own text renderer and internationalization tables, which is a bit silly to say the least. And whereas we take it for granted that we can mouse over a web page and select its text, with a web of pixels it is much less obvious how that would happen. Hixie's proposal is that apps expose structure via ARIA, but as far as I understand there is no association between pixels and ARIA properties: the pixels themselves really have no built-in structure to speak of.

And of course unlike in the web of structured text, in a web of pixels it would be up each app to actually describe its structure via ARIA: it's not a built-in part of the system. But if you combine this with the rendering story (here's WebGPU, now draw the rest of the owl), Hixie's proposal leaves a void for frameworks to fill between what the app developer wants to write (e.g. Flutter/Dart) and the platform (WebGPU/ARIA/etc).

I said before that I had doubts and indeed I have doubts about my doubts. I am old enough to remember when X11 apps on Unix desktops changed from having fonts rendered on the server (i.e. by the operating system) to having them rendered on the client (i.e. the app), which was associated with a similar kind of anxiety. There were similar factors at play: slow-moving standards (X11) and not knowing at build-time what the platform would actually provide (which X server would be in use, etc). But instead of using the server, you could just ship pixels, and that's how GNOME got good text rendering, with Pango and FreeType and fontconfig, and eventually HarfBuzz, the text shaper used in Chromium and Flutter and many other places. Client-side fonts not only enabled more complex text shaping but also eliminated some round-trips for text measurement during UI layout, which is a bit of a theme in this article series. So could it be that pixels instead of text does not represent an apocalypse for the web? I don't know.

Incidentally I cannot move on from this point without pointing out another narrative thread, which is that of continued human effort over time. Raph Levien, who I mentioned above as a Rust UI toolkit developer, actually spent quite some time doing graphics for GNOME in the early 2000s; I remember working with his libart_lgpl. Behdad Esfahbod, author of HarfBuzz, built many parts of the free software text rendering stack before moving on to Chrome and many other things. I think that if you work on this low level where you are constantly translating text to textures, the accessibility and interaction benefits of using a platform-provided text library start to fade: you are the boss of text around here and you can implement the needed functionality yourself. From this perspective, pixels don't represent risk at all. In the old days of GNOME 2, client-side font rendering didn't lead to bad UI or poor accessibility. To be fair, there were other factors pushing to keep work in a commons, as the actual text rendering libraries still tended to be shipped with the operating system as shared libraries. Would similar factors prevail in a statically-linked web of pixels?

In a way it's a moot question for us, because in this series we are focussing on native app development. So, if you ship a platform, should your app development framework look like the web-of-pixels proposal, or something else? To me it is clear that as a platform, you need more. You need a common development story for how to build user-facing apps: something that looks more like Flutter and less like the primitives that Flutter uses. Though you surely will include a web-of-pixels-like low-level layer, because you need it yourself, probably you should also ship shared text rendering libraries, to reduce the install size for each individual app.

And of course, having text as part of the system has the side benefit of making it easier to get users to install OS-level security patches: it is well-known in the industry that users will make time for the update if they get a new goose emoji in exchange.

alternate world: webassembly

Hark! Have you heard the good word? Have you accepted your Lord and savior, WebAssembly, into your heart? I jest; it does sometimes feel like messianic narratives surrounding WebAssembly prevent us from considering its concrete aspects. But despite the hype, WebAssembly is clearly a technology that will be a part of the future of computing. So let's dive in: what would it mean for a mobile app development platform to embrace WebAssembly?

Before answering that question, a brief summary of what WebAssembly is. WebAssembly 1.0 is a portable bytecode format that is a good compilation target for C, C++, and Rust. These languages have good compiler toolchains that can produce WebAssembly. The nice thing is that when you instantiate a WebAssembly module, it is completely isolated from its host: it can't harm the host (approximately speaking). All points of interoperation with the host are via copying data into memory owned by the WebAssembly guest; the compiler toolchains abstract over these copies, allowing a Rust-compiled-to-native host to call into a Rust-compiled-to-WebAssembly module using idiomatic Rust code.

So, WebAssembly 1.0 can be used as a way to script a Rust application. The guest script can be interpreted, compiled just in time, or compiled ahead of time for peak throughput.

Of course, people that would want to script an application probably want a higher-level language than Rust. In a way, WebAssembly is in a similar situation as WebGPU in the web-of-pixels proposal: it is a low-level tool that needs higher-level toolchains and patterns to bridge the gap between developers and primitives.

Indeed, the web-of-pixels proposal specifies WebAssembly as the compute primitive. The idea is that you ship your application as a WebAssembly module, and give that module WebGPU, WebHID, and ARIA capabilities via imports. Such a WebAssembly module doesn't script an existing application: it is the app. So another way for an app development platform to use WebAssembly would be like how the web-of-pixels proposes to do it: as an interchange format and as a low-level abstraction. As in the scripting case, you can interpret or compile the module. Perhaps an infrequently-run app would just be interpreted, to save on disk space, whereas a more heavily-used app would be optimized ahead of time, or something.

We should mention another interesting benefit of WebAssembly as a distribution format, which is that it abstracts over the specific chipset on the user's device; it's the device itself that is responsible for efficiently executing the program, possibly via compilation to specialized machine code. I understand for example that RISC-V people are quite happy about this property because it lowers the barrier to entry for them relative to an ARM monoculture.

WebAssembly does have some limitations, though. One is that if the throughput of data transfer between guest and host is high, performance can be bad due to copying overhead. The nascent memory-control proposal aims to provide an mmap capability, but it is still early days. The need to copy would be a limitation for using WebGPU primitives.

More generally, as an abstraction, WebAssembly may not be able to express programs in the most efficient way for a given host platform. For example, its SIMD operations work on 128-bit vectors, whereas host platforms may have much wider vectors. Any current limitation will recede with time, as WebAssembly gains new features, but every year brings new hardware capabilities (tensor operation accelerator, anyone?), so there will be some impedance-matching to do for the foreseeable future.

The more fundamental limitation of the 1.0 version of WebAssembly is that it's only a good compilation target for some languages. This is because some of the fundamental parts of WebAssembly that enable isolation between host and guest (structured control flow, opaque stack, no instruction pointer) make it difficult to efficiently implement languages that need garbage collection, such as Java or Go. The coming WebAssembly 2.0 starts to address this need by including low-level managed arrays and records, allowing for reasonable ahead-of-time compilation of languages like Java. Getting a dynamic language like JavaScript to compile to efficient WebAssembly can still be a challenge, though, because many of the just-in-time techniques needed to efficiently implement these languages will still be missing in WebAssembly 2.0.

Before moving on to WebAssembly as part of an app development framework, one other note: currently WebAssembly modules do not compose very well with each other and with the host, requiring extensive toolchain support to enable e.g. the use of any data type that's not a scalar integer or floating-point value. The component model working group is trying to establish some abstractions and associated tooling, but (again!) it is still early days. Anyone wading into this space needs to be prepared to get their hands dirty.

To return to the question at hand, an app development framework can use WebAssembly for scripting, though the problem of how to compose a host application with a guest script requires good tooling. Or, an app development framework that exposes a web-of-pixels primitive layer can support running WebAssembly apps directly, though again, the set of imports remains to be defined. Either of these two patterns can stick with WebAssembly 1.0 or also allow for garbage collection in WebAssembly 2.0, aiming to capture mindshare among a broader community of potential developers, potentially in a wide range of languages.

As a final observation: WebAssembly is ecumenical, in the sense that it favors no specific church of how to write programs. As a platform, though, you might prefer a state religion, to avoid wasting internal and external efforts on redundant or ill-advised development. After all, if it's your platform, presumably you know best.


What is to be done?

Probably there are as many answers as people, but since this is my blog, here are mine:

  1. On the shortest time-scale I think that it is entirely reasonable to base a mobile application development framework on JavaScript. I would particularly focus on TypeScript, as late error detection is more annoying in native applications.

  2. I would aim to build something that looks like Flutter underneath: reactive, based on low-level primitives, with a multithreaded rendering pipeline. Perhaps it makes sense to take some inspiration from WebF.

  3. In the medium-term I am sympathetic to Ark's desire to extend the language in a more ResultBuilder-like direction, though this is not without risk.

  4. Also in the medium-term I think that modifications to TypeScript to allow for sound typing could provide some of the advantages of Dart's ahead-of-time compiler to JavaScript developers.

  5. In the long term... well we can do all things with unlimited resources, right? So after solving climate change and homelessness, it makes sense to invest in frameworks that might be usable 3 or 5 years from now. WebAssembly in particular has a chance of sweeping across all platforms, and the primitives for the web-of-pixels will be present everywhere, so if you manage to produce a compelling application development story targeting those primitives, you could eat your competitors' lunch.

Well, friends, that brings this article series to an end; it has been interesting for me to dive into this space, and if you have read down to here, I can only think that you are a masochist or that you have also found it interesting. In either case, you are very welcome. Until next time, happy hacking.

by Andy Wingo at June 15, 2023 02:02 PM

June 14, 2023

Frédéric Wang

Infinite version of the Set card game

edit 2023/06/17: I elaborated a bit more in the conclusion about the open problem of finding a minimal κ.

The Set Game

I visited A Coruña last week for the Web Engines Hackfest and to participate in internal events with my fellow Igalians. One of our traditions is to play board games, and my colleague Ioanna presented a card game called Set. To be honest I was not very good at it, but it made me think of a potential generalization to infinite sets that is worth a blog post…

Basically, we have a deck of λ^μ cards with μ = 4 features (number of shapes, shape, shading and color), each of them taking λ = 3 possible values (e.g. red, green or purple for the color). Given κ cards on the table, players must extract λ cards forming what is called a Set, which is defined as follows: for each of the μ features, either the cards all use the same value or they use pairwise distinct values.
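For the finite game, the definition is easy to state in code. The following JavaScript sketch is my own illustration (not part of the original game or post), with a card represented as an array of feature values:

```javascript
// Check whether a group of cards forms a Set: for each feature, the values
// must be either all equal (constant) or pairwise distinct (one-to-one).
function isSet(cards) {
  if (cards.length === 0) return true;
  const numFeatures = cards[0].length;
  for (let f = 0; f < numFeatures; f++) {
    // Count the distinct values taken by feature f across the selected cards
    const distinct = new Set(cards.map(c => c[f])).size;
    if (distinct !== 1 && distinct !== cards.length) return false;
  }
  return true;
}
```

For instance, `isSet([[0, 0, 1], [0, 1, 1], [0, 2, 1]])` is true (features 0 and 2 are constant, feature 1 is one-to-one), while `isSet([[0, 0], [0, 1], [1, 2]])` is false because feature 0 takes the values 0, 0, 1.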

Formally, this can be generalized for any cardinal λ as follows:

  • A card is a function from μ (the features) to λ (the values).
  • A set of cards S is a Set iff for any feature α < μ, the mapping Φ_α^S : S → λ

    that maps a card c to the value c(α) is either constant or one-to-one.

Given a value κ such that λ ≤ κ ≤ λ^μ, can we always extract a Set when κ cards are put on the table? Or said otherwise, is there a set of κ cards from which we cannot extract any Set?

Trivial cases (λ ≤ 2 or μ ≤ 1)

Given κ ≥ λ cards, we can always extract a Set S in the following trivial cases:

  • If μ = 0 then the deck contains only one card c = ∅. If λ ≥ 2 such a set of κ cards does not exist. Otherwise we can just take S = ∅ or S = {c}: these are Sets since the definition is trivial for μ = 0.
  • If μ ≥ 1 and λ = 0 then the deck is empty. We take S = ∅ and for any α < μ, Φ_α^∅ = ∅ is both constant and one-to-one.
  • If μ = 1 and λ ≥ 1, a card c is fully determined by its value c(0), so distinct cards give distinct values. So we can pick any S of size λ: it is a Set since Φ_0^S is one-to-one.
  • If λ = 1 then we can pick any singleton S: it is a Set since for any feature α < μ the mapping Φ_α^S is both constant and one-to-one.
  • If λ = 2 then we can pick any pair of cards S: it is a Set since for any feature α < μ the mapping Φ_α^S is either constant or one-to-one (depending on whether the two cards display the same value or not).

👉🏼 For the rest of this blog post, I’ll assume μ ≥ 2 and λ ≥ 3.

Not enough cards on the table (κ ≤ μ)

If μ ≥ κ ≥ λ ≥ 3 then we consider cards c_α for each α < κ, defined for each β < μ as c_α(β) = δ_{α,β} (using the Kronecker delta). If we extract a subset S from these cards and α₁, α₂, α₃ < κ ≤ μ are indices of elements of S, then Φ_{α₁}^S respectively evaluates to 1, 0 and 0 on c_{α₁}, c_{α₂}, c_{α₃}, so S is not a Set.

👉🏼 For the rest of this blog post, we’ll assume μ < κ and will even focus on the minimal case κ = λ.

Finite number of values (λ < ℵ₀)

Let’s consider a finite number of values λ ≥ 3 and define the card c_α for each α < λ as follows: c_α(β) = δ_{α,0} for β = 0 (again the Kronecker delta) and c_α(β) = α for 0 < β < μ. Since μ ≥ 2, the latter case shows that S = {c_α : α < λ} contains exactly λ cards. Since κ = λ < ℵ₀, the only way to extract a subset of size λ would be to take all the cards. But they don’t form a Set since by construction Φ_0(c_0) = 1 ≠ 0 = Φ_0(c_1) = Φ_0(c_2).
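For instance, with λ = 3 and μ = 2 the construction gives the three cards below, and a direct check (a small JavaScript illustration of my own, not part of the original argument) confirms they contain no Set:

```javascript
// The cards c_0 = (1, 0), c_1 = (0, 1), c_2 = (0, 2),
// from c_α(0) = δ_{α,0} and c_α(1) = α.
const cards = [[1, 0], [0, 1], [0, 2]];
// A Set must, for each feature, use one value or pairwise distinct values.
const formsSet = [0, 1].every(f => {
  const distinct = new Set(cards.map(c => c[f])).size;
  return distinct === 1 || distinct === cards.length;
});
// formsSet is false: feature 0 takes the values 1, 0, 0.
```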

👉🏼 For the rest of the blog post, I’ll assume λ is infinite.

Singular number of values (cf(λ) < λ)

If λ is a singular cardinal, then we consider a cofinal sequence {α_γ : γ < ν} ⊆ λ of length ν < λ and define the card c_α for α < λ as follows:

  • For β = 0, we consider the smallest ordinal γ < ν ≤ λ such that α < α_γ and define c_α(0) = γ.
  • For any 1 ≤ β < μ, c_α(β) = α.

Since μ ≥ 2, the latter case shows that these are λ distinct cards. Consider S ⊆ {c_α : α < λ}. If Φ_0^S evaluates to a constant value γ < ν then S = (Φ_0^S)⁻¹({γ}) has size at most |α_γ| < λ. If instead Φ_0^S is one-to-one then it takes at most ν distinct values, so again |S| ≤ ν < λ. Hence no subset of size λ is a Set.

👉🏼 For the rest of the blog post, I’ll assume λ is an infinite regular cardinal.

Finite number of features (μ < ℵ₀)

In this section, we assume that the number of features μ is finite. Let’s consider λ cards c_α and extract a Set S by induction as follows:

  • S_0 = {c_α : α < λ}
  • For any β < μ, we construct S_{β+1} ⊆ S_β of cardinality λ. We note that λ is regular and

    S_β = ⋃_{α ∈ Φ_β^{S_β}(S_β)} (Φ_β^{S_β})⁻¹({α})

    so there are only two possible cases:

    • If Φ_β^{S_β}(S_β) is of cardinality λ then pick λ elements of S_β with pairwise distinct images under Φ_β^{S_β}: they form S_{β+1}.
    • Otherwise, by regularity of λ, there is α < λ such that (Φ_β^{S_β})⁻¹({α}) is of cardinality λ, and we let it be our S_{β+1}.
  • S = S_μ

Then by construction, S is of size λ and for any β < μ, S ⊆ S_{β+1}, which means that Φ_β^S = (Φ_β^{S_β})|_S is either constant or one-to-one.

Incidentally, although I said I would focus on the case κ = λ, the result of this section shows that we can extract a Set if more than λ cards are put on the table!

Summary and open questions

Above are the results I found from a preliminary investigation, which can be summarized as follows:

  1. If λ ≤ 2 or μ ≤ 1 then we can always find a Set from κ ≥ λ cards.
  2. If 3 ≤ λ ≤ μ then for any κ such that λ ≤ κ ≤ μ there is a set of κ cards from which we cannot extract any Set.
  3. If 2 ≤ μ < λ < ℵ₀ then there is a set of λ cards from which we cannot extract any Set.
  4. If 2 ≤ μ and λ is singular then there is a set of λ cards from which we cannot extract any Set.
  5. If 2 ≤ μ < ℵ₀ ≤ cf(λ) = λ then we can always find a Set from κ ≥ λ cards.

Note that for the standard game 3 = λ < μ = 4 the only one of the results above that applies is (2). Indeed, having only three or four cards on the table is generally not enough to extract a Set!

So far, I was not able to find an answer for the case ℵ₀ ≤ μ < cf(λ) = λ ≤ κ. It looks like the inductive construction from the previous section could work, but it’s not clear what guarantees that taking intersections at limit steps would preserve size κ (an idea would be to use closed unbounded S_β instead, but I didn’t find a satisfying proof). I also failed to build a counter-example set of λ cards without any Set subset, despite several attempts.

More generally, an open problem is to determine the minimal number of cards κ (with λ ≤ κ ≤ λ^μ) to put on the table to ensure players can always extract a Set subset… or even whether such a number actually exists! If it does, then in cases (2), (3) and (4) we only know κ > λ. In cases (1) and (5) the minimum value κ = λ works; and when μ ≥ 2 and λ ≥ 3 are finite, the maximum value κ = λ^μ means taking the full deck, which works too (e.g. it always contains the Set given by ∀α < λ, ∀β < μ, c_α(β) = α). Incidentally, note that the latter case is consistent with (2) and (3) since we have λ^μ > μ, λ. But in general for infinite parameters, putting κ = λ^μ cards on the table does not mean putting the full deck, so it’s less obvious whether we can extract a Set.

June 14, 2023 12:00 AM

June 12, 2023

Igalia Compilers Team

QuickJS: An Overview and Guide to Adding a New Feature

In a previous blog post, I briefly mentioned QuickJS (QJS) as an alternative implementation of JavaScript (JS) that does not run in a web browser. This time, I'd like to delve deeper into QJS and explain how it works.

First, some remarks on QJS's history and overall architecture. QJS was written by Fabrice Bellard, who you may know as the original author of Qemu and FFmpeg, and was first released in 2019. QJS is primarily a bytecode interpreter (with no JIT compiler tiers) that can execute JS relatively quickly.

You can invoke QJS from the command-line like NodeJS and similar systems:

$ echo "console.log('hello world');" > hello.js
$ qjs hello.js # qjs is the main executable for quickjs
hello world

QJS comes with another tool called qjsc that can produce small executable binaries from JS source code. It does so by embedding QJS bytecode in C code that links with the QJS runtime, which avoids the need to parse JS to bytecode at runtime.

The following example demonstrates this (note: feel free to skip over the details of this C code output, it's not crucial for the rest of the post):

$ qjsc hello.js -e -o hello.c # qjsc compiles the JS instead of running directly
$ cat hello.c
/* File generated automatically by the QuickJS compiler. */

#include "quickjs-libc.h"

const uint32_t qjsc_hello_size = 78;

const uint8_t qjsc_hello[78] = {
0x02, 0x04, 0x0e, 0x63, 0x6f, 0x6e, 0x73, 0x6f,
0x6c, 0x65, 0x06, 0x6c, 0x6f, 0x67, 0x16, 0x68,
0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72,
0x6c, 0x64, 0x10, 0x68, 0x65, 0x6c, 0x6c, 0x6f,
0x2e, 0x6a, 0x73, 0x0e, 0x00, 0x06, 0x00, 0xa0,
0x01, 0x00, 0x01, 0x00, 0x03, 0x00, 0x00, 0x14,
0x01, 0xa2, 0x01, 0x00, 0x00, 0x00, 0x38, 0xe1,
0x00, 0x00, 0x00, 0x42, 0xe2, 0x00, 0x00, 0x00,
0x04, 0xe3, 0x00, 0x00, 0x00, 0x24, 0x01, 0x00,
0xcd, 0x28, 0xc8, 0x03, 0x01, 0x00,
};

static JSContext *JS_NewCustomContext(JSRuntime *rt)
{
  JSContext *ctx = JS_NewContextRaw(rt);
  if (!ctx)
    return NULL;
  return ctx;
}

int main(int argc, char **argv)
{
  JSRuntime *rt;
  JSContext *ctx;
  rt = JS_NewRuntime();
  JS_SetModuleLoaderFunc(rt, NULL, js_module_loader, NULL);
  ctx = JS_NewCustomContext(rt);
  js_std_add_helpers(ctx, argc, argv);
  js_std_eval_binary(ctx, qjsc_hello, qjsc_hello_size, 0);
  return 0;
}

It's possible to embed parts of this C output into a larger program, for adding the ability to script a system in JS for example. You can also compile it, along with the QJS runtime, to WebAssembly (as is done in tools such as the Bytecode Alliance's Javy).

QJS as it exists today supports many features in the JS standard, but not all of them. What if you need to extend it to support modern JS features? Where would you start?

To address these questions, the rest of this post explains some of the internals of QJS by walking through the implementation of a new feature. The feature that we will explore is the ergonomic brand checks for private fields proposal, which I picked because it is a relatively simple and straightforward feature to implement. This proposal reached stage 4 in the TC39 process in 2021, and is currently part of the official ECMAScript 2022 standard.

Before getting into the details of adding the new feature, we'll first start with an explanation of what the proposal we are exploring actually does. After that, I'll explain how QJS processes JS code at a high-level before diving into the details of how to implement this proposal.

Explaining "ergonomic brand checks for private fields" #

The proposal we'll be exploring is titled "Ergonomic brand checks for private fields", which for the rest of this post I'll shorten to "private brand checks". Since ES2022, JS has supported private fields in classes. For example, you can declare a private field as follows:

class Foo {
  #priv = 0; // private field declaration (needed for #priv to be in scope)
  get() { return this.#priv; }
}

new Foo().get(); // returns 0
new Foo().#priv; // error, it's private

Note that the # syntax is special and only allowed for private field names. Ordinary identifiers cannot be used to define a private field.

Private brand checks, also added in ES2022, are just a way to check if a given object has a given private field with a convenient syntax. For example, the isFoo static method in the following snippet uses a private brand check:

class Foo {
  #priv; // necessary declaration
  static isFoo(obj) { return #priv in obj; } // brand check for #priv
}

class Bar {
  #priv; // a different #priv than above!
}

Foo.isFoo(new Foo()); // returns true
Foo.isFoo({}); // returns false
Foo.isFoo(new Bar()); // returns false

The example shows that the proposal overloads the behavior of in so that if the left-hand side is a private field name, it checks for the presence of that private field. Note that since private names are scoped to the class, private names that look superficially identical in different classes may not pass the same brand checks (as the example above showed).

Now that we know what this proposal does, let's talk about what it takes to implement it. Before explaining the nitty-gritty details, we'll first talk about the architecture of QJS at a high-level.

Architecture overview #

Most people probably run JS code in a web browser or via a runtime like NodeJS, Deno, or Bun that uses those browsers' JS engines. These engines typically use a tiered implementation strategy in which code often starts running in an interpreter and then tiers up to a compiler, perhaps multiple compilers, to produce faster code (see this blog post by Lin Clark for a high-level overview).

These engines typically also compile the JS source program into bytecode, an intermediate form that can be interpreted and compiled more easily than the source code or its parsed abstract syntax tree (AST).

QJS shares some of these steps, in that it also compiles JS to bytecode and then interprets the bytecode. However, it has no additional execution tiers.

While web browsers generally have to fetch JS source code and compile it to bytecode while running (though there is bytecode caching to optimize this), when QJS emits an executable (e.g., the use of qjsc from earlier) it avoids the runtime parsing step by compiling the bytecode into the executable.

The QJS bytecode is designed for a stack machine (unlike, say, V8's Ignition interpreter which uses a register machine). That is, the operations in the bytecode fetch data from the runtime system's stack. WebAssembly (Wasm) made a similar choice, which reflects a goal shared by both Wasm and QJS to produce small binaries. A stack machine can save overhead in instruction encoding because the instructions do not specify register names to fetch operands from. Instead, instructions just fetch their operands from the stack.

Thus, the overall operation of QJS is that it parses a JS file and creates a representation of the module or script, which contains some functions. Each function is compiled to bytecode. Then QJS interprets that bytecode to execute the program.

Diagram illustrating the steps in the execution pipeline for QuickJS

Adding support for a new proposal will affect several parts of this pipeline. In the case of private brand checks, we will need to modify the parser to accept the new syntax, add a new bytecode to represent the new operation, and add a new case in the core interpreter loop to implement that operation.

With that high-level overview in mind, we'll dive into specific parts of QJS in the following sections. Since QJS is written in C (in fact, the bulk of the system is contained in a single 10k+ line C file), I'll be showing example snippets of C code to show what needs to change to implement private brand checks.

Parser #

The typical parsing pass in JS engines translates the JS source code to an internal AST representation. There is a separate bytecode generation pass that walks the AST and linearizes its structure into bytecodes.

QJS fuses these two passes and directly generates bytecode while parsing the source code. While this saves execution time, it does add its own kind of complexity.

To understand parsing, it's useful to know where QJS kicks off the process. JS_EvalInternal is the entry point for evaluating JS code. This can either evaluate and construct the runtime representation of a script or module in order to execute it, or just compile it to bytecode to emit to a file.

In turn, this will first run the lexer to create a tokenized version of the source code. Afterwards, it calls js_parse_program to parse the tokenized source code. The parser has its own state (JSParseState) which contains information on where the parser is in the token stream, the bytecodes emitted so far, and so on.

The parser broadly follows the structure of the JS specification's grammar, in which statements and expressions are organized in a particular nesting structure to avoid ambiguity. For modifying how the in operator gets parsed, we'll be interested in how relational expressions in particular are parsed. As relational expressions are a kind of binary operator expression, they're handled in QJS by the js_parse_expr_binary function. That function handles binary operators by "level", corresponding to how they nest in the formal grammar. The bottom level consists of multiplicative expressions, up to bitwise logical operators. The in operator is handled at level 5, along with other relational operators like <.

Since QJS will output the stack bytecode instructions in a single pass, it's necessary in a binary expression like expr_1 in expr_2 to first parse expr_1 and emit its bytecode, then parse expr_2 and emit that, then finally emit the bytecode for OP_in (i.e., it's a post-order traversal of the AST, since stack instructions are essentially postfix).
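For instance (an illustrative sketch with made-up mnemonics, not the exact QJS opcode names), `x in obj` would come out roughly as:

```
get_var x      ; code for expr_1, emitted first: push the value of x
get_var obj    ; code for expr_2, emitted second: push the value of obj
in             ; finally OP_in: pop both operands, push a boolean
```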

We won't need to change js_parse_expr_binary for private brand checks, as the main difference from normal in operators is how the left-hand side is parsed. For that, we'll be interested in js_parse_postfix_expr, which parses references to variable names (and is eventually called by js_parse_expr_binary). The js_parse_postfix_expr function, like most other parsing functions, has a switch statement that dispatches on different token types.

For example, there are tokens such as TOK_IDENT for ordinary identifiers for variables (e.g., foo) and TOK_PRIVATE_NAME for private field names (e.g., #foo). We will need to add a new case for private field tokens in the switch for js_parse_postfix_expr:

case TOK_PRIVATE_NAME:
    {
        JSAtom name;
        // Only allow this syntax if the next token is `in`.
        // The left-hand side of a private brand check can't be a nested expression, it
        // has to specifically be a private name.
        if (peek_token(s, FALSE) != TOK_IN)
            return -1;
        // I'll explain a bit about atoms later. This code extracts
        // a handle for the string content of the private name.
        name = JS_DupAtom(s->ctx, s->token.u.ident.atom);
        if (next_token(s))
            return -1;
        // This is a new bytecode that we'll add that looks up that the private
        // field is valid and produces data for the `in` operator.
        emit_op(s, OP_scope_ref_private_field);
        // These are the arguments for the above op code in the instruction stream.
        emit_u32(s, name);
        emit_u16(s, s->cur_func->scope_level);
    }
    break;

This case allows a private name to appear, and only allows it if the next token in the stream is in. We need the restriction because we don't want the private name to appear in any other expression, as those are invalid (private names should otherwise only appear in declarations in classes or in expressions like this.#priv).
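This restriction is easy to observe from JS itself. The following is a quick illustration of my own, runnable in any engine that supports the proposal: a private name directly to the left of `in` parses fine, while other bare uses are syntax errors.

```javascript
class C {
  #p;
  static has(obj) { return #p in obj; } // OK: private name directly before `in`
}
console.log(C.has(new C())); // true
console.log(C.has({}));      // false

// By contrast, a bare private name in any other expression fails to parse:
let syntaxError = false;
try {
  // `#q + 1` is not `obj.#q` and not `#q in obj`, so this is rejected at parse time
  new Function("class D { #q; m() { return #q + 1; } }");
} catch (e) {
  syntaxError = e instanceof SyntaxError;
}
console.log(syntaxError); // true
```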

It also emits the bytecode for this expression, which uses a new scope_ref_private_field operator that we add. When new opcodes get added, they're defined in quickjs-opcode.h. The scope_ref_private_field opcode is a new variant on existing opcodes like scope_get_private_field that are already defined in that header.

The scope_ref_private_field operator actually never appears in executable bytecode, and only appears temporarily as input to another pass. When I said bytecode is emitted from the parser in a single pass earlier, this was actually a slight simplification. After the initial parse, the bytecode goes through a scope resolution phase (see resolve_variables) where certain kinds of scope violations are ruled out. For example, the phase would signal an error on the following code:

// Invalid example
class Foo {
  // missing declaration of #priv
  foo(obj) { return #priv in obj; } // #priv is unbound
}

There's also an optimization pass on the bytecode to obtain some speedups in interpretation later.

In the scope resolution phase, scope_ref_private_field is translated to a get_var_ref operation, which looks up a variable in the runtime environment. This will resolve a variable to an index that the runtime can use to look up the private field in an object's property table. The reason we add this new operation is that existing operations like scope_get_private_field also get translated to do the actual field lookup in the object immediately, whereas we want to wait until the in operator is executed in order to do that.

Interpreter and runtime #

Once the bytecode compilation process is finished, the interpreter can start executing the program. QJS treats everything uniformly by considering all execution to take place in a function, so for example the code that runs in a module or script top-level is also in a special kind of function.

Therefore, all execution in QJS takes place in a core interpreter loop which runs a function body. It loads the bytecode for that function body and repeatedly runs the operations specified by the bytecode until it reaches the end. When executing the bytecode, the interpreter also maintains a runtime stack that stores temporary values produced by the operators. The interpreter allocates exactly enough stack space to run a particular function; the compiler pre-computes the max stack size for each function and encodes it in the bytecode format.
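As a toy model of that shape (in JavaScript, purely illustrative, with made-up opcode names rather than QJS's real ones), a stack-machine interpreter loop looks like:

```javascript
// A miniature stack-machine loop: fetch an opcode, dispatch on it in a
// switch, and manipulate a runtime stack of temporary values.
function run(code, consts) {
  const stack = [];
  let pc = 0;
  for (;;) {
    const op = code[pc++];
    switch (op) {
      case "push":                 // push a constant onto the stack
        stack.push(consts[code[pc++]]);
        break;
      case "in": {                 // like OP_in: pop two operands, push a boolean
        const rhs = stack.pop();
        const lhs = stack.pop();
        stack.push(lhs in rhs);
        break;
      }
      case "ret":                  // end of the function body
        return stack.pop();
    }
  }
}
```

For example, `run(["push", 0, "push", 1, "in", "ret"], ["x", { x: 1 }])` evaluates `"x" in { x: 1 }` and returns true.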

To add a new instruction, usually you add a new case to the big switch statement in the main interpreter loop in JS_CallInternal. Since we're just extending an existing operator, this case already exists. So instead, we need to extend the helper function js_operator_in. An annotated version of that function looks like this:

// Note: __exception is a QJS convention to warn if the result is unused
static __exception int js_operator_in(JSContext *ctx, JSValue *sp)
{
    JSValue op1, op2;
    JSAtom atom;
    int ret;

    // Reference the values in the top two stack slots
    // op1 is the result of executing the left-hand side of the `in`
    // op2 is the result of executing the right-hand side of the `in`
    op1 = sp[-2];
    op2 = sp[-1];

    // op2 is the right-hand-side of `in`, which must be a JS object
    if (JS_VALUE_GET_TAG(op2) != JS_TAG_OBJECT) {
        JS_ThrowTypeError(ctx, "invalid 'in' operand");
        return -1;
    }

    // Atoms are covered in more detail below
    // but generally this just converts a string or symbol to a
    // handle to an interned string, or it's a tagged number
    atom = JS_ValueToAtom(ctx, op1);
    if (unlikely(atom == JS_ATOM_NULL))
        return -1;

    // Look up if the property corresponding to left-hand-side name exists in the object.
    ret = JS_HasProperty(ctx, op2, atom);

    // QJS also has a reference-counting garbage collector. We need to appropriately
    // free (i.e., decrement refcounts) on values when we stop using them.
    JS_FreeAtom(ctx, atom);
    if (ret < 0)
        return -1;
    JS_FreeValue(ctx, op1);
    JS_FreeValue(ctx, op2);

    // Push a boolean onto the top stack slot
    // Note: the stack is shrunk after this by the main loop, so -2 is the top.
    sp[-2] = JS_NewBool(ctx, ret);

    return 0;
}

At this point in the code, the results of evaluating the left- and right-hand side expressions of an in are already on the stack. These are JS values, so now might be a good time to talk about how values are represented in QJS.

Object Representation #

All JS engines have their own internal representation of JS values, which include primitive values such as symbols and numbers and also object values. Since JS is dynamically typed, a given function can be called with all kinds of values, so the engine's representation needs a way to distinguish the values to appropriately signal an error, or choose the correct operation.

To do this, values need to come with some kind of tag. Some engines use a tagging scheme such as NaN-boxing to store all values inside the bit pattern of a 64-bit floating point number (using the different kinds of NaNs that exist in the IEEE-754 standard to distinguish cases). My colleague Andy Wingo wrote a blog post on this topic a while ago, laying out various options that JS engines use.

QJS uses a much simpler scheme, and dedicates 128 bits to each JS value. Half of that is the payload (a 64-bit float, pointer, etc.) and half is the tag value. The following definitions show how this is represented in C:

typedef union JSValueUnion {
int32_t int32;
double float64;
void *ptr;
} JSValueUnion;

typedef struct JSValue {
JSValueUnion u;
int64_t tag;
} JSValue;

On 32-bit platforms there is a different tagging scheme that I won't detail other than to note that it uses NaN-boxing with a 64-bit representation.

For the most part, the representation details are abstracted by various macros like JS_VALUE_GET_TAG used in the example code above, so there won't be much need to directly interact with the value representation in this post.

Reference counting and objects #

Compound data, such as objects and strings, are tracked by a relatively simple reference counting garbage collector in QJS. This is in contrast to the much more complex collectors in web engines, such as WebKit's Riptide, that have different design tradeoffs and requirements such as the need for concurrency. There's a lot more to say about how reference counting and compound data work in QJS, but I'll save most of those details for a future post.

Atoms and strings #

Certain data types have a special representation because they are so common and are used repeatedly in the program. These are small integers and strings. These correspond to property names, symbols, private names, and so on. QJS uses a datatype called an Atom for these cases (which has already appeared in code examples above).

An atom is a handle that is either tagged as an integer, or is an index that refers to an interned string, i.e., a unique string that is only allocated once and stored in a hash table. Atoms that appear in the program's bytecode are also serialized in the bytecode format itself, and are loaded into the runtime table on initialization.
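The interning idea itself can be sketched in a few lines of JavaScript (my own illustration; QJS's real atom table also handles refcounts and the integer-tagged case):

```javascript
// A toy interning table: each distinct string gets a small integer handle,
// and equal strings always map to the same handle.
const table = [];
const handles = new Map();
function intern(str) {
  let h = handles.get(str);
  if (h === undefined) {
    h = table.length;   // next free slot becomes the handle
    table.push(str);
    handles.set(str, h);
  }
  return h;
}
// Comparing two interned names is now an integer comparison, and a
// property table can simply map handles to stored values.
```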

The data type JSAtom is defined as a uint32_t, so it's just a 32-bit integer. Properties of objects, for example, are always accessed with atoms as the property key. This means that property tables in objects just need to map atoms to the stored values.

You can see this in action with the JS_HasProperty lookup above, which looks like JS_HasProperty(ctx, op2, atom). This code looks up a key atom in the object op2's property table. In turn, atom comes from the line atom = JS_ValueToAtom(ctx, op1), which converts the property name value op1 into either an integer or a handle to an interned string.

Changing the operation to support private fields #

The actual change to js_operator_in to support private brand checks is very simple. In the case that the private field is a non-method field, the resolved private name lookup via get_var_ref pushes a symbol value onto the stack. This case doesn't require any changes.

In the case that the private field refers to a method, the name lookup pushes a function object onto the stack. We then need to run a private brand check with the target object and this private function, to ensure the private function really is part of the object.

At a high level, you can see the similarity between this operation and the runtime semantics described in the formal spec for the private brand check proposal.

The modified code looks like the following:

static __exception int js_operator_in(JSContext *ctx, JSValue *sp)
{
    JSValue op1, op2;
    JSAtom atom;
    int ret;

    op1 = sp[-2];
    op2 = sp[-1];

    if (JS_VALUE_GET_TAG(op2) != JS_TAG_OBJECT) {
        JS_ThrowTypeError(ctx, "invalid 'in' operand");
        return -1;
    }

    // --- New code here ---
    // This is the same as the previous code, but now under a conditional.
    // It doesn't need to change, because after resolving the private field
    // name to a symbol via `get_var_ref` the normal `JS_HasProperty` lookup
    // works.
    if (!JS_IsObject(op1)) {
        atom = JS_ValueToAtom(ctx, op1);
        if (unlikely(atom == JS_ATOM_NULL))
            return -1;
        ret = JS_HasProperty(ctx, op2, atom);
        JS_FreeAtom(ctx, atom);
    // New conditional branch, in case the field operand is an object.
    // When a private method is referenced via `get_var_ref`, it actually
    // produces the function object for that method. We then can call
    // the `JS_CheckBrand` operation that is already defined to check the
    // validity of a private method call.
    } else {
        // JS_CheckBrand is modified to take a boolean (last arg) that
        // determines whether to throw on failure or just indicate the
        // success/fail state. This is needed as `in` doesn't throw when
        // the check fails, it just returns false.
        ret = JS_CheckBrand(ctx, op2, op1, FALSE);
    }
    // --- New code end ---

    if (ret < 0)
        return -1;
    JS_FreeValue(ctx, op1);
    JS_FreeValue(ctx, op2);

    sp[-2] = JS_NewBool(ctx, ret);

    return 0;
}

Testing #

We can validate this implementation against the official test262 tests. QJS comes with a test runner that can run against test262 (invoking make test2 will run it). Since we've added a new feature, we must also modify the tested features list in the test262 configuration file to specify that the feature should be tested. For private brand checks, we change class-fields-private-in=skip in that file to class-fields-private-in.

After changing the test file, the test262 tests for the private brand check feature all succeed with the exception of some syntax tests due to an existing bug with how in is parsed in general in QJS (the code function f() { "foo" in {} = 0; } should fail to parse, but errors at runtime instead in QJS).

Wrap-up #

With the examples above, I've walked through what it takes to add a relatively simple JS language feature to QuickJS. The private brand checks proposal just adds a new use of an existing syntax, so implementing it mostly just touches the parser and core interpreter loop. A feature that affects more of the language, such as adding a new datatype or changing how functions are executed, would obviously require more code and deeper changes.

The full changes required to implement this feature (other than test changes) can be reviewed in this patch.

In future posts, I'm planning to explain other parts of the QJS codebase and potentially explore how it's being used in the WebAssembly ecosystem.

June 12, 2023 12:00 AM

June 05, 2023

Alex Bradbury

2023Q2 week log

I tend to keep quite a lot of notes on the development-related work (sometimes at work, sometimes not) I do on a week-by-week basis, and thought it might be fun to write up the parts that were public. This may or may not be of wider interest, but it aims to be a useful aide-mémoire for my purposes at least. Weeks with few entries might be due to focusing on downstream work (or perhaps just a less productive week - I am only human!).

Week of 29th May 2023

Week of 22nd May 2023

Week of 15th May 2023

Week of 17th April 2023

  • Still pinging for an updated riscv-bfloat16 spec version that incorporates the fcvt.bf16.s encoding fix.
  • Bumped the version of the experimental Zfa RISC-V extension supported by LLVM to 0.2 (D146834). This was very straightforward as after inspecting the spec history, it was clear there were no changes that would impact the compiler.
  • Filed a couple of pull requests against the riscv-zacas repo (RISC-V Atomic Compare and Swap extension).
    • #8 made the dependency on the A extension explicit.
    • #7 attempted to explicitly reference the extension for misaligned atomics, though it seems it won't be merged. I do feel uncomfortable with RISC-V extensions that can have their semantics changed by other standard extensions without this possibility being called out very explicitly. As I note in the PR, failure to appreciate this might mean that conformance tests written for zacas might fail on a system with zacas_zam. I see a slight parallel to a recent discussion about RISC-V profiles.
  • Fixed the canonical ordering used for ISA naming strings in RISCVISAInfo (this will mainly affect the string stored in build attributes). This was fixed in D148615 which built on the pre-committed test case.
  • A whole bunch of upstream LLVM reviews. As noted in D148315, I'm thinking we should probably relax the ordering rules for ISA strings in -march in order to avoid issues due to spec changes and incompatibilities between GCC and Clang.
  • LLVM Weekly #485.

Week of 10th April 2023

Week of 3rd April 2023

Article changelog
  • 2023-06-05: Added notes for the week of 22nd May 2023 and week of 29th May 2023.
  • 2023-05-22: Added notes for the week of 15th May 2023.
  • 2023-04-24: Added notes for the week of 17th April 2023.
  • 2023-04-17: Added notes for the week of 10th April 2023.
  • 2023-04-10: Initial publication date.

June 05, 2023 12:00 PM