Planet Igalia

May 26, 2020

Brian Kardell

Web Engine Diversity and Ecosystem Health

Web Engine Diversity and Ecosystem Health

For many years, we've seen lots of blog posts and conversation on the topic of "browser engine diversity". I'd like to offer a slightly tilted view of this based instead on "the health of the ecosystem", and explain why I think this is more valuable measure and way to discuss these topics.

Back in January, Jeremy Keith compared the complexity of browser engine diversity to political systems...

If you have hundreds of different political parties, that’s not ideal. But if you only have one political party, that’s very bad indeed!

I like this analogy bat's almost self-evident: 1 is too few, a hundred is too many - it's just chaos and noise. But what is the ideal number? This answer dogs us a lot, in part because it is not only unanswerable, it's actually kind of a red herring. I think there's something to this that leads to an important takeaway about how we think about the problem...

Numbers and goals...

The interesting part about this analogy is that the simple number of political parties isn't actually a very meaningful measure: Parties can differ a lot, or they can differ a little. 6 parties that barely disagree is quite a different thing than 3 that disagree substantially. Further, it's not simply a matter of giving voice to every dissenting opinion that matters either: There are political parties formed on ideas of hate, for example. Adding those doesn't add good things. In short, what really matters is the goal of those numbers.

In the case of the political analogy, the real aim is about what makes for a healthy, effective and just model of governance. Similarly on the topic of "browser engine diversity", I believe that the real goal is "What makes for a healthy and effective ecosystem?" and that the numbers we frequently talk about (and how) can easily lead us into perhaps the wrong takeaways.

Older engines and diversity

It is fairly common that discussions of browser diversity point to the Web's own history for examples of "why engine diversity matters": IE vs Netscape are often held up. IE's dominance and ability to dictate the market (or even kill it in this case) was overcome by diversity. This is true, but very incomplete: There's a lot more beneath the surface...

Consider, for example, the fact that for a time there were only two mainstream browsers - and the very dominant one (IE) was both entirely proprietary and based on a single proprietary OS.

Consider how different this could have been, if IE had been open source and properly licensed. Even if Microsoft didn't want it to run on other operating systems, someone else could step in and fill that gap. If Microsoft walked away from the Web, or worse, went belly up, someone could fork and take over the project. If, for any reason at all, the primary maintainer cannot prioritize - other people and organization could more directly invest in advancements.

A while later, Opera's Presto was very interesting, even multi-OS - but ultimately similarly proprietary.

So, while we had greater 'diverity', these numbers alone aren't a great measure of the overall health of the ecosystem.

The most healthy, open ecosystem ever.

This is in stark contrast to the situation today: In important ways, we are a more diverse, efficient and healthier ecosystem with the three multi-os, open source engines we have left than when we had had more and were dominated by projects that weren't that at all.

A lot of people seem to want to suggest that there aren't really 3 today because blink was born from a fork of WebKit, or because they think that WebKit is somehow "Just for iOS".

There are certainly 3 entirely different JavaScript engines, and 3 entirely different architectures. There are some bits that remain similar, but the truth is that that this is neither new nor easily solvable. Web engines today are so astonishingly complex (measured in tens of millions of lines of code) that we only get new ones through evolution. For the most part, this has been true for a long time. Even Netscape was a "take 2" from many of the same minds from Mosaic. Mozilla was created from a rework of Netscape. Even IE was born by licensing the Mosaic code. WebKit was born by forking KHTML. In other words: The critical investment required to create a full-blown compliant browser from nothing would be mind-boggling if everything were stopped today - and it never stops.

Today, the 3 that remain are all multi-OS.

"...Wait, even WebKit?" - you, just now.

Yes! I'm glad you asked! The GNOME flagship Epiphany/Web browser for Linux is WebKit based -- and billions of devices are based on Embedded WebKit. PS4 is a fun example of something lots of people might be familliar with -- not only the Web browser on your PS4, but in fact lots of the UI itself is made with Web content running in a WebKit based browser. But chances are pretty good that you encounter emdedded WebKit all the time: On cable boxes, smart TVs and appliances, kiosks, car and airplane infotainment systems, digital signage and so on.

This is possible because WebKit (and all of the engines today) are open source. Because of this, they all receive investments more widely than the org that maintains them, and that's expanding in important ways that are very good for the health of the ecosystem.

Igalia, where I work, for example are able to work on all of the browsers and help expand investment in advancing the commons. The list of things we have helped advance, and who has helped fund that is growing all the time: From recent things in JavaScript like class fields and Big Integer, to CSS features like CSS Grid, or features in HTML like lazy loading - the benefits of this are clear.

This also leads to lots of interesting new opportunities. For example, we are the creators/maintainers of WPE, the official WebKit Port for Embedded that powers tons of those devices mentioned above. This matters a lot because a rising tide lifts all boats in the commons. You can see an example of this in that Igalia is advancing work on SVG2 and hardware acceleration for SVG in WebKit - a topic which has failed to get the prioritization for years, but is especially critical for lots of embedded use cases. If you're interested in this, as well as some interesting history of SVG, WebKit, HTML/CSS and embedded browsers, have a listen to this.

Opportunities to do better...

There is another interesting thing here that is always left out of these discussions but I think is considerably interesting in this reframing. A funny little quirk of history is that while we say there are 3 rendering engines, that's only partially true.

More specifically, there are 3 modern browser rendering engines, and a bunch of other rendering engines that aren't that.

Amazon, for example, has a renderer that is used for ebooks and printing. PrinceXML has a renderer used for print. Antenna house too. And there are more still.

For the most part, these are proprietary and what they do and don't support isn't necessarily clear cut. Their support for modern standards is pretty ragged and the gaps are only likely to grow faster. This is a shame as there are lots of potentially useful things for these industries, like Houdini which are unlikely to ever exist for them.

This is a topic I'd love to see discussed more for several reasons - but mainy they center on the fact that it is just harder for us to move forward together if things are too fragmented. Instead of growing together and a rising tide lifting all boats - the boats are kind of all in entirely different bodies of water.

I would love to see some effort to resolve this. Imagine if these vendors came into the fray and either invested in basing their work on modern engines (hopefully various), or pulled together to create something new. That might be the one practical way we could arrive at an actual viable 4th engine somehow.

Why now is a good time to discuss it

Most of these engines also contain some things that were developed in standards organizations but which no browser supports... That's kind of why they were created. However, there are new opportunities to align...

Web engine architectures are like a lot of engineering projects - they grow, they get crufty, you get locked into certain boundaries that you know you'd like to escape. However,until you do you can't really afford to tackle certain kinds of problems. That's a big part of why browser engines today don't do a lot of the things you needed to fill their use cases. Browser architectures have traditionally, generally not been well-prepared for the problems of print or ebooks, and that's part of why we find ourselves in the current situation.

However, for many years, there has been much rework and rearchitecture of layout engines aimed at solving precisely the fundamental sorts of things that prevented those sorts of things from being considered.

Stir in that most of our office suites and things are now web based and there's considerable incentives to figuring out things like good print and pagination.

More open prioritization

But it's not just the projects being open source that matters, it's also that this affords us new opportunities for standardization itself.

Ultimately, it is the calculus of prioritization which impacts our ability to get things done almost more than anything else. Historically, lots of companies get together and, effectively, ask that a very few companies do the work. They are big organizations with big investments, to be sure - but they are all constrained by hard limits. The gauntlet of issues surrounding our ability to prioritize everything from attention to actual implementation have stymied many a topic.

But in this new age, companies like Igalia are showing that there is potentially a whole new game here: Where centralized, concrete investments from outside that small group can lift up the commons, benefit everyone and help drive progress.

In other words: Things are better and healthier because we continue to find better ways to work together. And when we do, everyone does better.

May 26, 2020 04:00 AM

May 14, 2020

Mario Sanchez Prada

The Web Platform Tests project

Web Browsers and Test Driven Development

Working on Web browsers development is not an easy feat but if there’s something I’m personally very grateful for when it comes to collaborating with this kind of software projects, it is their testing infrastructure and the peace of mind that it provides me with when making changes on a daily basis.

To help you understand the size of these projects, they involve millions of lines of code (Chromium is ~25 million lines of code, followed closely by Firefox and WebKit) and around 200-300 new patches landing everyday. Try to imagine, for one second, how we could make changes if we didn’t have such testing infrastructure. It would basically be utter and complete chao​s and, more especially, it would mean extremely buggy Web browsers, broken implementations of the Web Platform and tens (hundreds?) of new bugs and crashes piling up every day… not a good thing at all for Web browsers, which are these days some of the most widely used applications (and not just ‘the thing you use to browse the Web’).

The Chromium Trybots in action
The Chromium Trybots in action

Now, there are all different types of tests that Web engines run automatically on a regular basis: Unit tests for checking that APIs work as expected, platform-specific tests to make sure that your software runs correctly in different environments, performance tests to help browsers keep being fast and without increasing too much their memory footprint… and then, of course, there are the tests to make sure that the Web engines at the core of these projects implement the Web Platform correctly according to the numerous standards and specifications available.

And it’s here where I would like to bring your attention with this post because, when it comes to these last kind of tests (what we call “Web tests” or “layout tests”), each Web engine used to rely entirely on their own set of Web tests to make sure that they implemented the many different specifications correctly.

Clearly, there was some room for improvement here. It would be wonderful if we could have an engine-independent set of tests to test that a given implementation of the Web Platform works as expected, wouldn’t it? We could use that across different engines to make sure not only that they work as expected, but also that they also behave exactly in the same way, and therefore give Web developers confidence on that they can rely on the different specifications without having to implement engine-specific quirks.

Enter the Web Platform Tests project

Good news is that just such an ideal thing exists. It’s called the Web Platform Tests project. As it is concisely described in it’s official site:

“The web-platform-tests project is a cross-browser test suite for the Web-platform stack. Writing tests in a way that allows them to be run in all browsers gives browser projects confidence that they are shipping software which is compatible with other implementations, and that later implementations will be compatible with their implementations.”

I’d recommend visiting its website if you’re interested in the topic, watching the “Introduction to the web-platform-tests” video or even glance at the git repository containing all the tests here. Here, you can also find specific information such as how to run WPTs or how to write them. Also, you can have a look as well at the dashboard to get a sense of what tests exists and how some of the main browsers are doing.

In short: I think it would be safe to say that this project is critical to the health of the whole Web Platform, and ultimately to Web developers. What’s very, very surprising is how long it took to get to where it is, since it came into being only about halfway into the history of the Web (there were earlier testing efforts at the W3C, but none that focused on automated & shared testing). But regardless of that, this is an interesting challenge: Filling in all of the missing unified tests, while new things are being added all the time!

Luckily, this was a challenge that did indeed took off and all the major Web engines can now proudly say that they are regularly running about 36500 of these Web engine-independent tests (providing ~1.7 million sub-tests in total), and all the engines are showing off a pass rate between 91% and 98%. See the numbers below, as extracted from today’s WPT data:

Chrome 84 Edge 84 Firefox 78 Safari 105 preview
Pass Total Pass Total Pass Total Pass Total
1680105 1714711 1669977 1714195 1640985 1698418 1543625 1695743
Pass rate: 97.98% Pass rate: 97.42% Pass rate: 96.62% Pass rate: 91.03%

And here at Igalia, we’ve recently had the opportunity to work on this for a little while and so I’d like to write a bit about that…

Upstreaming Chromium’s tests during the Coronavirus Outbreak

As you all know, we’re in the middle of an unprecedented world-wide crisis that is affecting everyone in one way or another. One particular consequence of it in the context of the Chromium project is that Chromium releases were paused for a while. On top of this, some constraints on what could be landed upstream were put in place to guarantee quality and stability of the Chromium platform during this strange period we’re going through these days.

These particular constraints impacted my team in that we couldn’t really keep working on the tasks we were working on up to that point, in the context of the Chromium project. Our involvement with the Blink Onion Soup 2.0 project usually requires the landing of relatively large refactors, and these kind of changes were forbidden for the time being.

Fortunately, we found an opportunity to collaborate in the meantime with the Web Platform Tests project by analyzing and trying to upstream many of the existing Chromium-specific tests that haven’t yet been unified. This is important because tests exist for widely used specifications, but if they aren’t in Web Platform Tests, their utility and benefits are limited to Chromium. If done well, this would mean that all of the tests that we managed to upstream would be immediately available for everyone else too. Firefox and WebKit-based browsers would not only be able to identify missing features and bugs, but also be provided with an extra set of tests to check that they were implementing these features correctly, and interoperably.

The WPT Dashboard
The WPT Dashboard

It was an interesting challenge considering that we had to switch very quickly from writing C++ code around the IPC layers of Chromium to analyzing, migrating and upstreaming Web tests from the huge pool of Chromium tests. We focused mainly on CSS Grid Layout, Flexbox, Masking and Filters related tests… but I think the results were quite good in the end:

As of today, I’m happy to report that, during the ~4 weeks we worked on this my team migrated 240 Chromium-specific Web tests to the Web Platform Tests’ upstream repository, helping increase test coverage in other Web Engines and thus helping towards improving interoperability among browsers:

  • CSS Flexbox: 89 tests migrated
  • CSS Filters: 44 tests migrated
  • CSS Masking: 13 tests migrated
  • CSS Grid Layout: 94 tests migrated

But there is more to this than just numbers. Ultimately, as I said before, these migrations should help identifying missing features and bugs in other Web engines, and that was precisely the case here. You can easily see this by checking the list of automatically created bugs in Firefox’s bugzilla, as well as some of the bugs filed in WebKit’s bugzilla during the time we worked on this.

…and note that this doesn’t even include the additional 96 Chromium-specific tests that we analyzed but determined were not yet eligible for migrating to WPT (normally because they relied on some internal Chromium API or non-standard behaviour), which would require further work to get them upstreamed. But that was a bit out of scope for those few weeks we could work on this, so we decided to focus on upstreaming the rest of tests instead.

Personally, I think this was a big win for the Web Platform and I’m very proud and happy to have had an opportunity to have contributed to it during these dark times we’re living, as part of my job at Igalia. Now I’m back to working on the Blink Onion Soup 2.0 project, where I think I should write about too, but that’s a topic for a different blog post.

Credit where credit is due

IgaliaI wouldn’t want to finish off this blog post without acknowledging all the different contributors who tirelessly worked on this effort to help improve the Web Platform by providing the WPT project with these many tests more, so here it is:

From the Igalia side, my whole team was the one which took on this challenge, that is: Abhijeet, Antonio, Gyuyoung, Henrique, Julie, Shin and myself. Kudos everyone!

And from the reviewing side, many people chimed in but I’d like to thank in particular the following persons, who were deeply involved with the whole effort from beginning to end regardless of their affiliation: Christian Biesinger, David Grogan, Robert Ma, Stephen Chenney, Fredrik Söderquist, Manuel Rego Casasnovas and Javier Fernandez. Many thanks to all of you!

Take care and stay safe!

by mario at May 14, 2020 09:07 AM

May 11, 2020

Gyuyoung Kim

How Chromium Got its Mojo?

Chromium IPC

Chromium has a multi-process architecture to become more secure and robust like modern operating systems, and it means that Chromium has a lot of processes communicating with each other. For example, renderer process, browser process, GPU process, utility process, and so on. Those processes have been communicating using IPC [1].

Why is Mojo needed?

As a long-term intent, the Chromium team wanted to refactor Chromium into a large set of smaller services. To achieve that, they had considered below questions [3]

  • Which services we bring up?
  • How can we isolate these services to improve security and stability?
  • Which binary features can we ship?

They learned much from using the legacy Chromium IPC and maintaining Chromium dependencies over the past years. They felt a more robust messaging layer could allow them to integrate a large number of components without link-time interdependencies as well as help to build more and better features, faster, and with much less cost to users. So, that’s why Chromium team begins to make the Mojo communication framework.

From the performance perspective, Mojo is 3 times faster than IPC, and ⅓ less context switching compared to the old IPC in Chrome [3]. Also, we can remove unnecessary layers like content/renderer layer to communicate between different processes. Because combined with generated code from the Mojom IDL, we can easily connect interface clients and implementations across arbitrary inter-process boundaries. Lastly, Mojo is a collection of runtime libraries providing a platform-agnostic abstraction of common IPC primitives. So, we can build higher-level bindings APIs to simplify messaging for developers writing C++, Java, Javascript.

Status of migrating legacy IPC to Mojo

Igalia has been working on the migration since this year in earnest. But, hundreds of IPCs still remain in Chromium. The below chart shows the progress of migrating legacy IPC to Mojo [4].

Mojo Terminology

Let’s take a look at the key terminology before starting the migration briefly.

  • Message Pipe: A pair of endpoints and either endpoint may be transferred over another message pipe. Because we bootstrap a primordial message pipe between the browser process and each child process, eventually this means that a new pipe we create ultimately sends either end to any process, and the two ends will still be able to talk to each other seamlessly and exclusively. We don’t need to use routing ID anymore. Each point has a queue of incoming messages.
  • Mojom file: Define interfaces, which are strongly-typed collections of messages. Each interface message is roughly analogous to a single prototype message
  • Remote: Used to send messages described by the interface.
  • Receiver: Used to receive the interface messages sent by Remote.
  • PendingRemote: Typed container to hold the other end of a Receiver’s pipe.
  • PendingReceiver: Typed container to hold the other end of a Remote’s pipe.
  • AssociatedRemote/Receiver: Similar to a Remote and a Receiver. But, they run on multiple interfaces over a single message pipe while preserving message order, because the AssociatedRemote/Receiver was implemented by using the IPC::Channel used by legacy IPC messages.

Example of migrating a legacy IPC to Mojo

In the following example, we migrate WebTestHostMsg_SimulateWebNotificationClose to illustrate the conversion from legacy IPC to Mojo.

The existing WebTestHostMsg_SimulateWebNotificationClose IPC

  1. Message definition
    File: content/shell/common/web_test/web_test_messages.h
                    std::string /*title*/,  bool /*by_user*/)
  • Send the message in the renderer
    File: content/shell/renderer/web_test/
  • void BlinkTestRunner::SimulateWebNotificationClose(
        const std::string& title, bool by_user) {
      Send(new WebTestHostMsg_SimulateWebNotificationClose(
        routing_id(), title, by_user));
  • Receive the message in the browser
    File: content/shell/browser/web_test/
  • bool WebTestMessageFilter::OnMessageReceived(
        const IPC::Message& message) {
      bool handled = true;
      IPC_BEGIN_MESSAGE_MAP(WebTestMessageFilter, message)
  • Call the handler in the browser
    File: content/shell/browser/web_test/
  • void WebTestMessageFilter::OnSimulateWebNotificationClose(
        const std::string& title, bool by_user) {
          SimulateClose(title, by_user);

    Call flow after migrating the legacy IPC to Mojo

    We begin to migrate WebTestHostMsg_SimulateWebNotificationClose to WebTestClient interface from here. First, let’s see an overall call flow through simple diagrams. [5]

    1. The WebTestClientImpl factory method is called with passing the WebTestClientImpl PendingReceiver along to the Receiver.
    2. The receiver takes ownership of the WebTestClientImpl PendingReceiver’s pipe endpoint and begins to watch it for incoming messages. The pipe is readable immediately, so a task is scheduled to read the pending SimulateWebNotificationClose message from the pipe as soon as possible.
    3. The WebTestClientImpl message is read and deserialized, then, it will make the Receiver to invoke the WebTestClientImpl::SimulateWebNotificationClose() implementation on its bound WebTestClientImpl.

    Migrate the legacy IPC to Mojo

    1. Write a mojom file
      File: content/shell/common/web_test/web_test.mojom
    module content.mojom;
    // Web test messages sent from the renderer process to the
    // browser. 
    interface WebTestClient {
      // Simulates closing a titled web notification depending on the user
      // click.
      //   - |title|: the title of the notification.
      //   - |by_user|: whether the user clicks the notification.
      SimulateWebNotificationClose(string title, bool by_user);
  • Add the mojom file to a proper GN target.
    File: content/shell/
  • mojom("web_test_common_mojom") {
      sources = [
  • Implement the interface files
    File: content/shell/browser/web_test/web_test_client_impl.h
  • #include "content/shell/common/web_test.mojom.h"
    class WebTestClientImpl : public mojom::WebTestClient {
      WebTestClientImpl() = default;
      ~WebTestClientImpl() override = default;
      WebTestClientImpl(const WebTestClientImpl&) = delete;
      WebTestClientImpl& operator=(const WebTestClientImpl&) = delete;
      static void Create(
          mojo::PendingReceiver<mojom::WebTestClient> receiver);
      // WebTestClient implementation.
     void SimulateWebNotificationClose(const std::string& title,
                             bool by_user) override;
  • Implement the interface files
    File: content/shell/browser/web_test/
  • void WebTestClientImpl::SimulateWebNotificationClose(
        const std::string& title, bool by_user) {
            SimulateClose(title, by_user);
  • Creating an interface pipe
    File: content/shell/renderer/web_test/blink_test_runner.h
  • mojo::AssociatedRemote<mojom::WebTestClient>&

    File: content/shell/renderer/web_test/

    BlinkTestRunner::GetWebTestClientRemote() {
      if (!web_test_client_remote_) {
       return web_test_client_remote_;
  • Register the WebTest interface
    File: content/shell/browser/web_test/
  • void WebTestContentBrowserClient::ExposeInterfacesToRenderer {
     void WebTestContentBrowserClient::BindWebTestController(
         int render_process_id,
         StoragePartition* partition,
               receiver) {
  • Call an interface message in the renderer
    File: content/shell/renderer/web_test/
  • void BlinkTestRunner::SimulateWebNotificationClose(
     const std::string& title, bool by_user) {
         SimulateWebNotificationClose(title, by_user);
  • Receive the incoming message in the browser
    File: content/shell/browser/web_test/
  • void WebTestClientImpl::SimulateWebNotificationClose(
     const std::string& title, bool by_user) {
         SimulateClose(title, by_user);

    Appendix: A case study of Regression

    There were a lot of flaky web test failures after finishing the migration of WebTestHostMsg to Mojo. The failures were caused by using ‘Remote’ instead of ‘AssociatedRemote’ for WebTestClient interface in the BlinkTestRunner class. Because BlinkTestRunner was using the WebTestControlHost interface for ‘PrintMessage’ as an ‘AssociatedRemote’. But, ‘Remote’ used by WebTestClient didn’t guarantee the message order between ‘PrintMessage’ and ‘InitiateCaptureDump’ message implemented by different interfaces(WebTestControlHost vs. WebTestClient). Thus, tests had often finished before receiving all logs. The actual results could be different from the expected results.

    Changing Remote with AssociatedRemote for the WebTestClient interface solved the flaky test issues.

    [1] Inter-process Communication (IPC)
    [2] Mojo in Chromium
    [3] Mojo & Servicification Performance Notes
    [4] Chrome IPC legacy Conversion Status
    [5] Convert Legacy IPC to Mojo



    by gyuyoung at May 11, 2020 07:46 AM

    April 29, 2020

    Brian Kardell

    Interactive Elements: A Strange Game

    Note from the author...

    My posts frequently (like this one) have a 'theme' and tend to use a number of images for visual flourish. Personally, I like it that way, I find it more engaging and I prefer for people to read it that way. However, for users on a metered or slow connection, downloading unnecessary images is, well, unnecessary, potentially costly and kind of rude. Just to be polite to my users, I offer the ability for you to opt out of 'optional' images if the total size of viewing the page would exceed a budget I have currently defined as 200k...

    Interactive Elements: A Strange Game

    It can seem especially frustrating to a lot of developers that we don't have more elements "out of the box" as part of HTML. I mean: Where are the tabs? How do we not have tabs by now? It seems like almost every week this comes up in some fashion or other. In this piece I'll talk a little about why, how to see this for what it is, how we're trying to move forward - and also present an interesting experiment for your consideration and feedback that may help inform some of these efforts. If you want, you can skip right to the 'new idea/experiment', but I think this background is pretty helpful in understanding it, so please, come back and read it...

    Let's have a look at HTML so far: ~10 of the elements are the 'boilerplate stuff': <html>, <body>, <head> and metadata stuff - <link>, <meta>, <base>, <style>, <script>, <title>. There's a few elements for linking and embedding other content and multimedia, and a few with special powers like <template> and <slot>. And then the rest: ~30 are "weak semantics about text" that aren't interactive and don't have signficant (any) special meaning necessarily and are primarily just things with default CSS stylesheet rules. About 15 are "weak semantics about text, but meaningfully weak" - stuff like lists. ~15 are sectioning or landmarks. 11 more are about tables - a 2 dimensional relationship of text. 14 are about forms. And then two are... well... something else. We'll come back to that. And ~30 are deprecated.

    The route to all of these elements was a little different - and they have different costs associated with them, and offer different value. One thing that's interesting to note is that the majority of them are what Dave Rupert refers to as "Spicy Divs". That is, there's not much to them - they aren't complex or interactive, and they don't bring a ton of real world value. That's not to say they are without value, but ultimately we spend a lot of time and bandwidth in standards debating what often amount to silly things because of it. <main> is a kind of a good example. Nothing wrong with it as an element, it's great, in fact - it just has a long and complex backstory that ultimately cost a lot relative to its real world value and proven use. Tons of time defining and debating <address>, which isn't really that meaningful to browsers, and then lots of evangelism telling people they're using it wrong - only in the end to eventually have to admit that it ultimately means the thing people used it as.

    The second thing to notice is that there there are very few interactive elements. Further, most of the ones we usually talk about are about forms. And that lead us to this: Interactive, form-related elements on the web are... tricky.

    The form-related game

    On the one hand, there's a lot to like about native, interactive form controls. In theory, they can bring a simple declarative form with all kinds of goodness in terms of platform integration, portability and centralized work on accessibility and UX.


    ...primary goal has not yet been achieved
    > What is the primary goal?
    To win the game.

    More simply: Despite a lot of work, and huge investments, people still gravitate toward custom implementations. That's not good for anybody.

    Why are native form controls hard?

    Well... So many reasons, but to start with, the one that they could really do without is that they got off to a tricky start by design.

    <input type="WOPR">

    Most of the form-related controls are created based on <input type-"...">. You've probably heard someone suggest that this is great because in cases where it isn't supported, simply providing the text field is fine. However, this is incomplete and we've been paying the price for it for a long time.

    > People sometimes make... mistakes.

    Did you ever notice how the JavaScript API surface of input works differently based on the type? If the type is number and you type gibberish that looks remotely 'potentially numbery' like '12312e314124', then .value will be '' (empty string). Or, the checkbox type has a checked property, but that means that input type=number has that property too... What does it do? Anything!? Or, have you ever noticed how .selectionStart and .selectionEnd actually would initially throw if you tried to use them on a numeric input type?! (fixed by Simon Pieters). If not, go have a watch of Monica Dinculescu's <input> I ♡ you, but you're bringing me down -- a 45 minute talk about just this.

    Ultimately, many of these problems stem from us accepting the illusion presented that a date picker is a kind of text input.

    Faulken pointing to NORAD screens
    Stephen Falken: Uh, uh, General, what you see on these screens up here is a fantasy; a computer-enhanced hallucination. Those blips are not real missiles. They're phantoms.

    Realistically, a date picker isn't a text input. Perhaps at an HTTP level, or even a database level, it's just a value - but controls are about how to populate values, and that's quite a different thing. It's not an is-a relationship: A date picker is a completely different complex control with potentially lots of complex interactions: Different things it needs to communicate, lots of sub-iteractions to manage, and so on.

    All of this is why you might hear people suggest "LSP" or "Favor composition over inheritance" as a better alternative.

    Acceptable Losses?

    This is exacerbated by the fact that at the end of the day, developers are left with some pretty unappealing choices: Something is going to be sacrificed.

    Color pickers provide a great example of how this plays out in practice: Will developers building the sorts of tools that use a color picker allow a simple input field as the fallback until every browser ships something acceptable? Seems like not much of a choice really.

    Sadly the story for 'polyfilling' an element also still isn't great: It's really just 'replacing it with something else entirely'. So, at the end of the day, you've got to go and find something that is a good quality stand in anyway. Finally, since literally everything your code relies on understanding the DOM, and that stand in will have entirely different DOM, you now have two entirely different interactive trees to worry about. Yikes.

    Further, this situation isn't necessarily short lived, historically. The reality is that lacking universal support can be the case for a long time: Native support for <input type="color"> was added to the last major browser... Checks notes... last year.

    Even after all of that... Guess what? It still isn't acceptable for a lot of people because it isn't styleable -- and in fact, it has entirely different UI and features everywhere. While this can occasionally be very useful on some kinds of devices, it makes things like providing helpful documentation hard. Worse, even the native ones weren't super keyboard accessible either.

    So... That's a lot of barriers and disincentives.

    Conversely, a custom solution can solve all of these problems now... So it shouldn't actually be surprising that developers often choose a custom solution: There are no 'acceptable losses' for developers here - all of the choices are somehow bad.

    Learn, dammit.

    Now, let's talk about how we get out of this mess.

    You'll be happy to know that there are efforts to both learn and improve existing native interactive elements. If you haven't seen Greg Whitworth and Nicole Sullivan's talk HTML Isn't Done from Chrome Dev Summit last year, I would highly recommend it (below).

    There's also been a heck of a lot of work to make it possible for authors to define their own interactive elements that work just like the native ones. The idea is to empower developers to explore the space and try to help find the really sweet spots that strike all of the right balances.

    It's worth pointing out why this is useful - it's not just an academic exercise that is trying to push responsibility to developers. It's important because the truth is: We don't actually know how. We need to be able to throw lots of things at the wall and see what sticks. The trouble is, currently (historically) failure is really slow and really expensive -- and even our most educated 'guesses' at improvement are still just that: Guesses. We haven't proved we have this figured out yet.

    We need something like the WOPR for playing out possible answers without doing harm: failing, and learning - that's what the custom elements stuff is all about.
    Shall we play a (different) game?

    Ok, but hold on.... Let's talk about non-form related interactive controls now. Realistically, we've come up with really only 1 strictly generic non-form-related UI control in 30 years: <details>. That "other" one in my intial list. Wow... And we've had that done for like 10 years, right? Let's see, it's been supported universally since... checks notes... Jan 14, 2020. Waaaaat?

    But... Again, it has most of the same kinds of general problems. In the most recent HTTP Almanac data, it was the 102nd most used HTML element... Can you even name 101 other HTML elements? It has styling problems. It's misunderstood. It was hard to polyfill well without just creating other problems.

    So... This is terrible.

    Our continued problems in this space are part of the reason there is hesitancy to take up entirely new things... Like, tabs.

    Back in 2015, I managed to get a number of people in Web standards (especially a11y) interested in working on a possible new element proposal for HTML which would have given not only the tabset but some other things as well. Determined not to be doomed by the same sorts of stylability problems, we set out to define things that would become shadow parts and themes and custom properties and so on.

    But, in many ways, it wasn't great. Recently, this has got me to thinking: What if this isn't even the right game we're playing? What if for some elements at least - non-form associated ones - the right lesson to take away is that this is an unnecessary exercise in futility?

    Maybe we need a different kind of experiment: One that just says "Strange game. The only winning move is not to play".

    How about a nice game of chess?

    So, here's the basic idea: What if we just created a resilient pattern focused on mainly function and meaning -- and hardly at all on the UI aspect. What if instead of inventing complex new ways to strike the balance of preventing authors from styling too much, we mostly just acknowledged the fact that they want to, and... let them.

    What if, like with <video> you could kind of just take control of the UI, but... maybe without throwing it away entirely.

    I spoke with Greg Whitworth about this, who has been investigating how to improve the existing controls on the Web (see also Can we please style select?.

    In fact, he completely agreed with the overall premise and went on to described how...

    The core mission of Open UI, a new W3C community group, is to document the anatomies, behaviors and states of components and controls from across frameworks. By doing this, we can ensure that the key pieces that folks don't want to re-create, they don't have to. We've got a few different ideas that are beginning to take shape but changes like this are like peeling an onion. If it's a new control/component then it would be a bit more straight forward but the most painful controls we've found have been on the web since the 90s. So we'll be gathering telemetry and ensuring that any modifications we ultimately propose are web compatible.

    In our conversation we discussed how things like tabs seem like an interesting target where this kind of thing might just work really well. The particular shape of the tabs problem lends itself, I think, to easily trying something completely different and entirely without past baggage.

    So, in that light here's a demo/link to just such an experiment which gives us... tabs! - and which we think has some nice qualities:

    • It is a declarative decorator over otherwise good but non-interactive semantics. That means there's no miss of differing interaction issues here caused by complex inheritance. It's composition.
    • It doesn't change your light DOM, and it barely uses Shadow DOM. There's no "two trees" problem, and there's not secrets or new challenges. Go ahead, use CSS and JavaScript.
    • It adds the functionality: roving focus, keyboard handling, accessibility stuff and a simple container for you. That's it, really.
    • If you really want to opt in to some simple 'default styles' and like the 'minor tweaksonly' approach, it has an attribute that enables 'native' look and feel, with a bit less flexibility, but which might be enough for some people. It just starts with the assumption that that's not the case.

    So... that's it. It could for sure use more work, but WDYT? Would you try it out? Let us know your thoughts? Such experiments will be useful in informing work and directions for efforts like Open-UI and possible future standards.

    Will it help? To be honest, I don't know, but maybe the right way to learn involves trying something really different, and I'm open to anything...

    April 29, 2020 04:00 AM

    April 14, 2020

    Andy Wingo

    understanding webassembly code generation throughput

    Greets! Today's article looks at browser WebAssembly implementations from a compiler throughput point of view. As I wrote in my article on Firefox's WebAssembly baseline compiler, web browsers have multiple wasm compilers: some that produce code fast, and some that produce fast code. Implementors are willing to pay the cost of having multiple compilers in order to satisfy these conflicting needs. So how well do they do their jobs? Why bother?

    In this article, I'm going to take the simple path and just look at code generation throughput on a single chosen WebAssembly module. Think of it as X-ray diffraction to expose aspects of the inner structure of the WebAssembly implementations in SpiderMonkey (Firefox), V8 (Chrome), and JavaScriptCore (Safari).

    experimental setup

    As a workload, I am going to use a version of the "Zen Garden" demo. This is a 40-megabyte game engine and rendering demo, originally released for other platforms, and compiled to WebAssembly a couple years later. Unfortunately the original URL for the demo was disabled at some point in late 2019, so it no longer has a home on the web. A bit of a weird situation and I am not clear on licensing either. In any case I have a version downloaded, and have hacked out a minimal set of "imports" that the WebAssembly module needs from the host to allow the module to compile and link when run from a JavaScript shell, without requiring WebGL and similar facilities. So the benchmark is just to instantiate a WebAssembly module from the 40-megabyte byte array and see how long it takes. It would be better if I had more test cases (and would be happy to add them to the comparison!) but this is a start.

    I start by benchmarking the various WebAssembly implementations, firstly in their standard configuration and then setting special run-time flags to measure the performance of the component compilers. I run these tests on the core-rich machine that I use for browser development (2 Xeon Silver 4114 CPUs for a total of 40 logical cores). The default-configuration numbers are therefore not indicative of performance on a low-end Android phone, but we can use them to extract aspects of the different implementations.

    Since I'm interested in compiler throughput, I'm not particularly concerned about how well a compiler will use all 40 cores. Therefore when testing the specific compilers I will set implementation-specific flags to disable parallelism in the compiler and GC: --single-threaded on V8, --no-threads on SpiderMonkey, and --useConcurrentGC=false --useConcurrentJIT=false on JSC. To further restrict any threads that the implementation might decide to spawn, I'll bind these to a single core on my machine using taskset -c 4. Otherwise the machine is in its normal configuration (nothing else significant running, all cores available for scheduling, turbo boost enabled).

    I'll express results in nanoseconds per WebAssembly code byte. Of the 40 megabytes or so in the Zen Garden demo, only 23 891 164 bytes are actually function code; the rest is mostly static data (textures and so on). So I'll divide the total time by this code byte count.

    I tested V8 at git revision 0961376575206, SpiderMonkey at hg revision 8ec2329bef74, and JavaScriptCore at subversion revision 259633. The benchmarks can be run using just a shell; see the pull request. I timed how long it took to instantiate the Zen Garden demo, ensuring that a basic export was callable. I collected results from 20 separate runs, sleeping a second between them. The bars in the charts below show the median times, with a histogram overlay of all results.

    results & analysis

    We can see some interesting results in this graph. Note that the Y axis is logarithmic. The "concurrent tiering" results in the graph correspond to the default configurations (no special flags, no taskset, all cores available).

    The first interesting conclusions that pop out for me concern JavaScriptCore, which is the only implementation to have a baseline interpreter (run using --useWasmLLInt=true --useBBQJIT=false --useOMGJIT=false). JSC's WebAssembly interpreter is actually structured as a compiler that generates custom WebAssembly-specific bytecode, which is then run by a custom interpreter built using the same infrastructure as JSC's JavaScript interpreter (the LLInt). Directly interpreting WebAssembly might be possible as a low-latency implementation technique, but since you need to validate the WebAssembly anyway and eventually tier up to an optimizing compiler, apparently it made sense to emit fresh bytecode.

    The part of JSC that generates baseline interpreter code runs slower than SpiderMonkey's baseline compiler, so one is tempted to wonder why JSC bothers to go the interpreter route; but then we recall that on iOS, we can't generate machine code in some contexts, so the LLInt does appear to address a need.

    One interesting feature of the LLInt is that it allows tier-up to the optimizing compiler directly from loops, which neither V8 nor SpiderMonkey support currently. Failure to tier up can be quite confusing for users, so good on JSC hackers for implementing this.

    Finally, while baseline interpreter code generation throughput handily beats V8's baseline compiler, it would seem that something in JavaScriptCore is not adequately taking advantage of multiple cores; if one core compiles at 51ns/byte, why do 40 cores only do 41ns/byte? It could be my tests are misconfigured, or it could be that there's a nice speed boost to be found somewhere in JSC.

    JavaScriptCore's baseline compiler (run using --useWasmLLInt=false --useBBQJIT=true --useOMGJIT=false) runs much more slowly than SpiderMonkey's or V8's baseline compiler, which I think can be attributed to the fact that it builds a graph of basic blocks instead of doing a one-pass compile. To me these results validate SpiderMonkey's and V8's choices, looking strictly from a latency perspective.

    I don't have graphs for code generation throughput of JavaSCriptCore's optimizing compiler (run using --useWasmLLInt=false --useBBQJIT=false --useOMGJIT=true); it turns out that JSC wants one of the lower tiers to be present, and will only tier up from the LLInt or from BBQ. Oh well!

    V8 and SpiderMonkey, on the other hand, are much of the same shape. Both implement a streaming baseline compiler and an optimizing compiler; for V8, we get these via --liftoff --no-wasm-tier-up or --no-liftoff, respectively, and for SpiderMonkey it's --wasm-compiler=baseline or --wasm-compiler=ion.

    Here we should conclude directly that SpiderMonkey generates code around twice as fast as V8 does, in both tiers. SpiderMonkey can generate machine code faster even than JavaScriptCore can generate bytecode, and optimized machine code faster than JSC can make baseline machine code. It's a very impressive result!

    Another conclusion concerns the efficacy of tiering: for both V8 and SpiderMonkey, their baseline compilers run more than 10 times as fast as the optimizing compiler, and the same ratio holds between JavaScriptCore's baseline interpreter and compiler.

    Finally, it would seem that the current cross-implementation benchmark for lowest-tier code generation throughput on a desktop machine would then be around 50 ns per WebAssembly code byte for a single core, which corresponds to receiving code over the wire at somewhere around 160 megabits per second (Mbps). If we add in concurrency and manage to farm out compilation tasks well, we can obviously double or triple that bitrate. Optimizing compilers run at least an order of magnitude slower. We can conclude that to the desktop end user, WebAssembly compilation time is indistinguishable from download time for the lowest tier. The optimizing tier is noticeably slower though, running more around 10-15 Mbps per core, so time-to-tier-up is still a concern for faster networks.

    Going back to the question posed at the start of the article: yes, tiering shows a clear benefit in terms of WebAssembly compilation latency, letting users interact with web sites sooner. So that's that. Happy hacking and until next time!

    by Andy Wingo at April 14, 2020 08:59 AM

    April 08, 2020

    Andy Wingo

    multi-value webassembly in firefox: a binary interface

    Hey hey hey! Hope everyone is staying safe at home in these weird times. Today I have a final dispatch on the implementation of the multi-value feature for WebAssembly in Firefox. Last week I wrote about multi-value in blocks; this week I cover function calls.

    on the boundaries between things

    In my article on Firefox's baseline compiler, I mentioned that all WebAssembly engines in web browsers treat the function as the unit of compilation. This facilitates streaming, parallel compilation of WebAssembly modules, by farming out compilation of individual functions to worker threads. It also allows for easy tier-up from quick-and-dirty code generated by the low-latency baseline compiler to the faster code produced by the optimizing compiler.

    There are some interesting Conway's Law implications of this choice. One is that division of compilation tasks becomes an opportunity for division of human labor; there is a whole team working on the experimental Cranelift compiler that could replace the optimizing tier, and in my hackings on Firefox I have had minimal interaction with them. To my detriment, of course; they are fine people doing interesting things. But the code boundary means that we don't need to communicate as we work on different parts of the same system.

    Boundaries are where places touch, and sometimes for fluid crossing we have to consider boundaries as places in their own right. Functions compiled with the baseline compiler, with Ion (the production optimizing compiler), and with Cranelift (the experimental optimizing compiler) are all able to call each other because they actively maintain a common boundary, a binary interface (ABI). (Incidentally the A originally stands for "application", essentially reflecting division of labor between groups of people making different components of a software system; Conway's Law again.) Let's look closer at this boundary-place, with an eye to how it changes with multi-value.

    what's in an ABI?

    Among other things, an ABI specifies a calling convention: which arguments go in registers, which on the stack, how the stack values are represented, how results are returned to the callers, which registers are preserved over calls, and so on. Intra-WebAssembly calls are a closed world, so we can design a custom ABI if we like; that's what V8 does. Sometimes WebAssembly may call functions from the run-time, though, and so it may be useful to be closer to the C++ ABI on that platform (the "native" ABI); that's what Firefox does. (Incidentally here I think Firefox is probably leaving a bit of performance on the table on Windows by using the inefficient native ABI that only allows four register parameters. I haven't measured though so perhaps it doesn't matter.) Using something closer to the native ABI makes debugging easier as well, as native debugger tools can apply more easily.

    One thing that most native ABIs have in common is that they are really only optimized for a single result. This reflects their heritage as artifacts from a world built with C and C++ compilers, where there isn't a concept of a function with more than one result. If multiple results are required, they are represented instead as arguments, typically as pointers to memory somewhere. Consider the AMD64 SysV ABI, used on Unix-derived systems, which carefully specifies how to pass arbitrary numbers of arbitrary-sized data structures to a function (§3.2.3), while only specifying what to do for a single return value. If the return value is too big for registers, the ABI specifies that a pointer to result memory be passed as an argument instead.

    So in a multi-result WebAssembly world, what are we to do? How should a function return multiple results to its caller? Let's assume that there are some finite number of general-purpose and floating-point registers devoted to return values, and that if the return values will fit into those registers, then that's where they go. The problem is then to determine which results will go there, and if there are remaining results that don't fit, then we have to put them in memory. The ABI should indicate how to address that memory.

    When looking into a design, I considered three possibilities.

    first thought: stack results precede stack arguments

    When a function needs some of its arguments passed on the stack, it doesn't receive a pointer to those arguments; rather, the arguments are placed at a well-known offset to the stack pointer.

    We could do the same thing with stack results, either reserving space deeper on the stack than stack arguments, or closer to the stack pointer. With the advent of tail calls, it would make more sense to place them deeper on the stack. Like this:

    The diagram above shows the ordering of stack arguments as implemented by Firefox's WebAssembly compilers: later arguments are deeper (farther from the stack pointer). It's an arbitrary choice that happens to match up with what the native ABIs do, as it was easier to re-use bits of the already-existing optimizing compiler that way. (Native ABIs use this stack argument ordering because of sloppiness in a version of C from before I was born. If you were starting over from scratch, probably you wouldn't do things this way.)

    Stack result order does matter to the baseline compiler, though. It's easier if the stack results are placed in the same order in which they would be pushed on the virtual stack, so that when the function completes, the results can just be memmove'd down into place (if needed). The same concern dictates another aspect of our ABI: unlike calls, registers are allocated to the last results rather than the first results. This is to make it easy to preserve stack invariant (1) from the previous article.

    At first I thought this was the obvious option, but I ran into problems. It turns out that stack arguments are fundamentally unlike stack results in some important ways.

    While a stack argument is logically consumed by a call, a stack result starts life with a call. As such, if you reserve space for stack results just by decrementing the stack pointer before a call, probably you will need to load the results eagerly into registers thereafter or shuffle them into other positions to be able to free the allocated stack space.

    Eager shuffling is busy-work that should be avoided if possible. It's hard to avoid in the baseline compiler. For example, a call to a function with 10 arguments will consume 10 values from the temporary stack; any results will be pushed on after removing argument values from the stack. If there any stack results, it's almost impossible to avoid a post-call memmove, to move stack results to where they should be before the 10 argument values were pushed on (and probably spilled). So the baseline compiler case is not optimal.

    However, things get gnarlier with the Ion optimizing compiler. Like many other optimizing compilers, Ion is designed to compute the necessary stack frame size ahead of time, and to never move the stack pointer during an activation. The only exception is for pushing on any needed stack arguments for nested calls (which are popped directly after the nested call). So in that case, assuming there are a number of multi-value calls in a stack frame, we'll be shuffling in the optimizing compiler as well. Not great.

    Besides the need to shuffle, stack arguments and stack results differ as regards ownership and garbage collection. A callee "owns" the memory for its stack arguments; it is responsible for them. The caller can't assume anything about the contents of that memory after a call, especially if the WebAssembly implementation supports tail calls (a whole 'nother blog post, that). If the values being passed are just bits, that's one thing, but with the reference types proposal, some result values may be managed by the garbage collector. The callee is responsible for making stack arguments visible to the garbage collector; the caller is responsible for the results. The caller will need to emit metadata to allow the garbage collector to see stack result references. For this reason, a stack result actually starts life just before a call, because it can become initialized at any point and thus needs to be traced during the entire callee activation. Not all callers can easily add garbage collection roots for writable stack slots, so the need to place stack results in a fixed position complicates calling multi-value WebAssembly functions in some cases (e.g. from C++).

    second thought: pointers to individual stack results

    Surely there are more well-trodden solutions to the multiple-result problem. If we encoded a multi-value return in C, how would we do it? Consider a function in C that has three 64-bit integer results. The idiomatic way to encode it would be to have one of the results be the return value of the function, and the two others to be passed "by reference":

    int64_t foo(int64_t* a, int64_t* b) {
      *a = 1;
      *b = 2;
      return 3;
    void call_foo(void) {
      int64 a, b, c;
      c = foo(&a, &b);

    This program shows us a possibility for encoding WebAssembly's multiple return values: pass an additional argument for each stack result, pointing to the location to which to write the stack result. Like this:

    The result pointers are normal arguments, subject to normal argument allocation. In the above example, given that there are already stack arguments, they will probably be passed on the stack, but in many cases the stack result pointers may be passed in registers.

    The result locations themselves don't even need to be on the stack, though they certainly will be in intra-WebAssembly calls. However the ability to write to any memory is a useful form of flexibility when e.g. calling into WebAssembly from C++.

    The advantage of this approach is that we eliminate post-call shuffles, at least in optimizing compilers. But, having to make an argument for each stack result, each of which might itself become a stack argument, seems a bit offensive. I thought we might be able to do a little better.

    third thought: stack result area, passed as pointer

    Given that stack results are going to be written to memory, it doesn't really matter where they will be written, from the perspective of the optimizing compiler at least. What if we allocated them all in a block and just passed one pointer to the block? Like this:

    Here there's just one additional argument, no matter how many stack results. While we're at it, we can specify that the layout of the stack arguments should be the same as how they would be written to the baseline stack, to make the baseline compiler's job easier.

    As I started implementation with the baseline compiler, I chose this third approach, essentially because I was already allocating space for the results in a block in this way by bumping the stack pointer.

    When I got to the optimizing compiler, however, it was quite difficult to convince Ion to allocate an area on the stack of the right shape.

    Looking back on it now, I am not sure that I made the right choice. The thing is, the IonMonkey compiler started life as an optimizing compiler for JavaScript. It can represent unboxed values, which is how it came to be used as a compiler for asm.js and later WebAssembly, and it does a good job on them. However it has never had to represent aggregate data structures like a C++ class, so it didn't have support for spilling arbitrary-sized data to the stack. It took a while staring at the register allocator to convince it to allocate arbitrary-sized stack regions, and then to allocate component scalar values out of those regions. If I had just asked the register allocator to give me one appropriate-sized stack slot for each scalar, and hacked out the ability to pass separate pointers to the stack slots to WebAssembly calls with stack results, then I would have had an easier time of it, and perhaps stack slot allocation could be more dense because multiple results wouldn't need to be allocated contiguously.

    As it is, I did manage to hack it in, and I think in a way that doesn't regress. I added a layer over an argument type vector that adds a synthetic stack results pointer argument, if the function returns stack results; iterating over this type with ABIArgIter will allocate a stack result area pointer, either as a register argument or a stack argument. In the optimizing compiler, I added add a kind of value allocation coresponding to a variable-sized stack area, (using pointer tagging again!), and extended the register allocator to allocate LStackArea, and the component stack results. Interestingly, I had to add a kind of definition that starts life on the stack; previously all Ion results started life in registers and were only spilled if needed.

    In the end, a function will capture the incoming stack result area argument, either as a normal SSA value (for Ion) or stored to a stack slot (baseline), and when returning will write stack results to that pointer as appropriate. Passing in a pointer as an argument did make it relatively easy to implement calls from WebAssembly to and from C++, getting the variable-shape result area to be known to the garbage collector for C++-to-WebAssembly calls was simple in the end but took me a while to figure out.

    Finally I was a bit exhausted from multi-value work and ready to walk away from the "JS API", the bit that allows multi-value WebAssembly functions to be called from JavaScript (they return an array) or for a JavaScript function to return multiple values to WebAssembly (via an iterable) -- but then when I got to thinking about this blog post I preferred to implement the feature rather than document its lack. Avoidance-of-document-driven development: it's a thing!

    towards deployment

    As I said in the last article, the multi-value feature is about improved code generation and also making a more capable base for expressing further developments in the WebAssembly language.

    As far as code generation goes, things are progressing but it is still early days. Thomas Lively has implemented support in LLVM for emitting return of C++ aggregates via multiple results, which is enabled via the -experimental-multivalue-abi cc1 flag. Thomas has also been implementing multi-value support in the binaryen WebAssembly toolchain component, used by the emscripten C++-to-WebAssembly toolchain. I think it will be a few months though before everything lands in a way that end users can take advantage of.

    On the specification side, the multi-value feature is now at phase 4 since January, which basically means things are all done there.

    Implementation-wise, V8 has had experimental support since 2017 or so, and the feature was staged last fall, although V8 doesn't yet support multi-value in their baseline compiler. WebKit also landed support last fall.

    Unlike V8 and SpiderMonkey, JavaScriptCore (the JS and wasm engine in WebKit) actually implements a WebAssembly interpreter as their solution to the one-pass streaming compilation problem. Then on the compiler side, there are two tiers that both operate on basic block graphs (OMG and BBQ; I just puked a little in my mouth typing that). This strategy makes the compiler implementation quite straightforward. It's also an interesting design point because JavaScriptCore's garbage collector scans the stack conservatively; there's no need for the compiler to do bookkeeping on the GC's behalf, which I'm sure was a relief to the hacker. Anyway, multi-value in WebKit is done too.

    The new thing of course is that finally, in Firefox, the feature is now fully implemented (woo) and enabled by default on Nightly builds (woo!). I did that! It took me a while! Perhaps too long? Anyway it's done. Thanks again to Bloomberg for supporting this work; large ups to y'all for helping the web move forward.

    See you next time with a more general article rounding up compile-time benchmarks on a variety of WebAssembly implementations. Until then, happy hacking!

    by Andy Wingo at April 08, 2020 09:02 AM

    April 03, 2020

    Andy Wingo

    multi-value webassembly in firefox: from 1 to n

    Greetings, hackers! Today I'd like to write about something I worked on recently: implementation of the multi-value future feature of WebAssembly in Firefox, as sponsored by Bloomberg.

    In the "minimum viable product" version of WebAssembly published in 2018, there were a few artificial restrictions placed on the language. Functions could only return a single value; if a function would naturally return two values, it would have to return at least one of them by writing to memory. Loops couldn't take parameters; any loop state variables had to be stored to and loaded from indexed local variables at each iteration. Similarly, any block that would naturally return more than one result would also have to do so via locals.

    This restruction is lifted with the multi-value proposal. Function types now map from result type to result type, where a result type is a sequence of value types. That is to say, just as functions can take multiple arguments, they can return multiple results. Similarly, with the multi-value proposal, block types are now the same as function types: loops and blocks can take arguments and return any number of results. This change improves the expressiveness of WebAssembly as a compilation target; a C++ program compiled to multi-value WebAssembly can be encoded in fewer bytes than before. Multi-value also establishes a base for other language extensions. For example, the exception handling proposal builds on multi-value to pass multiple values to catch blocks.

    So, that's multi-value. You would think that relaxing a restriction would be easy, but you'd be wrong! This task took me 5 months and had a number of interesting gnarly bits. This article is part one of two about interesting aspects of implementing multi-value in Firefox, specifically focussing on blocks. We'll talk about multi-value function calls next week.

    multi-value in blocks

    In the last article, I presented the basic structure of Firefox's WebAssembly support: there is a baseline compiler optimized for low latency and an optimizing compiler optimized for throughput. (There is also Cranelift, a new experimental compiler that may replace the current implementation of the optimizing compiler; but that doesn't affect the basic structure.)

    The optimizing compiler applies traditional compiler techniques: SSA graph construction, where values flow into and out of graphs using the usual defs-dominate-uses relationship. The only control-flow joins are loop entry and (possibly) block exit, so the addition of loop parameters means in multi-value there are some new phi variables in that case, and the expansion of block result count from [0,1] to [0,n] means that you may have more block exit phi variables. But these compilers are built to handle these situations; you just build the SSA and let the optimizing compiler go to town.

    The problem comes in the baseline compiler.

    from 1 to n

    Recall that the baseline compiler is optimized for compiler speed, not compiled speed. If there are only ever going to be 0 or 1 result from a block, for example, the baseline compiler's internal data structures will use something like a Maybe<ValType> to represent that block result.

    If you then need to expand this to hold a vector of values, the naïve approach of using a Vector<ValType> would mean heap allocation and indirection, and thus would regress the baseline compiler.

    In this case, and in many other similar cases, the solution is to use value tagging to represent 0 or 1 value type directly in a word, and the general case by linking out to an external vector. As block types are function types, they actually appear as function types in the WebAssembly type section, so they are already parsed; the BlockType in that case can just refer out to already-allocated memory.

    In fact this value-tagging pattern applies all over the place. (The jit/ links above are for the optimizing compiler, but they relate to function calls; will write about that next week.) I have a bit of pause about value tagging, in that it's gnarly complexity and I didn't measure the speed of alternative implementations, but it was a useful migration strategy: value tagging minimizes performance risk to existing specialized use cases while adding support for new general cases. Gnarly it is, then.

    control-flow joins

    I didn't mention it in the last article, but there are two important invariants regarding stack discipline in the baseline compiler. Recall that there's a virtual stack, and that some elements of the virtual stack might be present on the machine stack. There are four kinds of virtual stack entry: register, constant, local, and spilled. Locals indicate local variable reads and are mostly like registers in practice; when registers spill to the stack, locals do too. (Why spill to the temporary stack instead of leaving the value in the local variable slot? Because locals are mutable. A local.get captures a local variable value at its point of execution. If future code changes the local variable value, you wouldn't want the captured value to change.)

    Digressing, the stack invariants:

    1. Spilled values precede registers and locals on the virtual stack. If u and v are virtual stack entries and u is older than v, then if u is in a register or is a local, then v is not spilled.

    2. Older values precede newer values on the machine stack. Again for u and v, if they are both spilled, then u will be farther from the stack pointer than v.

    There are five fundamental stack operations in the baseline compiler; let's examine them to see how the invariants are guaranteed. Recall that before multi-value, targets of non-local exits (e.g. of the br instruction) could only receive 0 or 1 value; if there is a value, it's passed in a well-known register (e.g. %rax or %xmm0). (On 32-bit machines, 64-bit values use a well-known pair of registers.)

    Results of WebAssembly operations never push spilled values, neither onto the virtual nor the machine stack. v is either a register, a constant, or a reference to a local. Thus we guarantee both (1) and (2).
    pop() -> v
    Doesn't affect older stack entries, so (1) is preserved. If the newest stack entry is spilled, you know that it is closest to the stack pointer, so you can pop it by first loading it to a register and then incrementing the stack pointer; this preserves (2). Therefore if it is later pushed on the stack again, it will not be as a spilled value, preserving (1).
    When spilling the virtual stack to the machine stack, you first traverse stack entries from new to old to see how far you need to spill. Once you get to a virtual stack entry that's already on the stack, you know that everything older has already been spilled, because of (1), so you switch to iterating back towards the new end of the stack, pushing registers and locals onto the machine stack and updating their virtual stack entries to be spilled along the way. This iteration order preserves (2). Note that because known constants never need to be on the machine stack, they can be interspersed with any other value on the virtual stack.
    return(height, v)
    This is the stack operation corresponding to a block exit (local or nonlocal). We drop items from the virtual and machine stack until the stack height is height. In WebAssembly 1.0, if the target continuation takes a value, then the jump passes a value also; in that case, before popping the stack, v is placed in a well-known register appropriate to the value type. Note however that v is not pushed on the virtual stack at the return point. Popping the virtual stack preserves (1), because a stack and its prefix have the same invariants; popping the machine stack also preserves (2).
    Whereas return operations happen at block exits, capture operations happen at the target of block exits (the continuation). If no value is passed to the continuation, a capture is a no-op. If a value is passed, it's in a register, so we just push that register onto the virtual stack. Both invariants are obviously preserved.

    Note that a value passed to a continuation via return() has a brief instant in which it has no name -- it's not on the virtual stack -- but only a location -- it's in a well-known place. capture() then gives that floating value a name.

    Relatedly, there is another invariant, that the allocation of old values on block entry is the same as their allocation on block exit, so that all predecessors of the block exit flow all values via the same places. This is preserved by spilling on block entry. It's a big hammer, but effective.

    So, given all this, how do we pass multiple values via return()? We don't have unlimited registers, so the %rax strategy isn't going to work.

    The answer for the baseline compiler is informed by our lean into the stack machine principle. Multi-value returns are allocated in such a way that a capture() can push them onto the virtual stack. Because spilled values must precede registers, we therefore allocate older results on the stack, and put the last result in a register (or register pair for i64 on 32-bit platforms). Note that it's possible in theory to allocate multiple results to registers; we'll touch on this next week.

    Therefore the implementation of return(height, is straightforward: we first pop register results, then spill the remaining virtual stack items, then shuffle stack results down towards height. This should result in a memmove of contiguous stack results towards the frame pointer. However because const values aren't present on the machine stack, depending on the stack height difference, it may mean a split between moving some values toward the frame pointer and some towards the stack pointer, then filling in by spilling constants. It's gnarly, but it is what it is. Note that the links to the return and capture implementations above are to the post-multi-value world, so you can see all the details there.

    that's it!

    In summary, the hard part of multi-value blocks was reworking internal compiler data structures to be able to represent multi-value block types, and then figuring out the low-level stack manipulations in the baseline compiler. The optimizing compiler on the other hand was pretty easy.

    When it comes to calls though, that's another story. We'll get to that one next week. Thanks again to Bloomberg for supporting this work; I'm really delighted that Igalia and Bloomberg have been working together for a long time (coming on 10 years now!) to push the web platform forward. A special thanks also to Mozilla's Lars Hansen for his patience reviewing these patches. Until next week, then, stay at home & happy hacking!

    by Andy Wingo at April 03, 2020 10:56 AM

    March 25, 2020

    Andy Wingo

    firefox's low-latency webassembly compiler

    Good day!

    Today I'd like to write a bit about the WebAssembly baseline compiler in Firefox.

    background: throughput and latency

    WebAssembly, as you know, is a virtual machine that is present in web browsers like Firefox. An important initial goal for WebAssembly was to be a good target for compiling programs written in C or C++. You can visit a web page that includes a program written in C++ and compiled to WebAssembly, and that WebAssembly module will be downloaded onto your computer and run by the web browser.

    A good virtual machine for C and C++ has to be fast. The throughput of a program compiled to WebAssembly (the amount of work it can get done per unit time) should be approximately the same as its throughput when compiled to "native" code (x86-64, ARMv7, etc.). WebAssembly meets this goal by defining an instruction set that consists of similar operations to those directly supported by CPUs; WebAssembly implementations use optimizing compilers to translate this portable instruction set into native code.

    There is another dimension of fast, though: not just work per unit time, but also time until first work is produced. If you want to go play Doom 3 on the web, you care about frames per second but also time to first frame. Therefore, WebAssembly was designed not just for high throughput but also for low latency. This focus on low-latency compilation expresses itself in two ways: binary size and binary layout.

    On the size front, WebAssembly is optimized to encode small files, reducing download time. One way in which this happens is to use a variable-length encoding anywhere an instruction needs to specify an integer. In the usual case where, for example, there are fewer than 128 local variables, this means that a local.get instruction can refer to a local variable using just one byte. Another strategy is that WebAssembly programs target a stack machine, reducing the need for the instruction stream to explicitly load operands or store results. Note that size optimization only goes so far: it's assumed that the bytes of the encoded module will be compressed by gzip or some other algorithm, so sub-byte entropy coding is out of scope.

    On the layout side, the WebAssembly binary encoding is sorted by design: definitions come before uses. For example, there is a section of type definitions that occurs early in a WebAssembly module. Any use of a declared type can only come after the definition. In the case of functions which are of course mutually recursive, function type declarations come before the actual definitions. In theory this allows web browsers to take a one-pass, streaming approach to compilation, starting to compile as functions arrive and before download is complete.

    implementation strategies

    The goals of high throughput and low latency conflict with each other. To get best throughput, a compiler needs to spend time on code motion, register allocation, and instruction selection; to get low latency, that's exactly what a compiler should not do. Web browsers therefore take a two-pronged approach: they have a compiler optimized for throughput, and a compiler optimized for latency. As a WebAssembly file is being downloaded, it is first compiled by the quick-and-dirty low-latency compiler, with the goal of producing machine code as soon as possible. After that "baseline" compiler has run, the "optimizing" compiler works in the background to produce high-throughput code. The optimizing compiler can take more time because it runs on a separate thread. When the optimizing compiler is done, it replaces the baseline code. (The actual heuristics about whether to do baseline + optimizing ("tiering") or just to go straight to the optimizing compiler are a bit hairy, but this is a summary.)

    This article is about the WebAssembly baseline compiler in Firefox. It's a surprising bit of code and I learned a few things from it.

    design questions

    Knowing what you know about the goals and design of WebAssembly, how would you implement a low-latency compiler?

    It's a question worth thinking about so I will give you a bit of space in which to do so.




    After spending a lot of time in Firefox's WebAssembly baseline compiler, I have extracted the following principles:

    1. The function is the unit of compilation

    2. One pass, and one pass only

    3. Lean into the stack machine

    4. No noodling!

    In the remainder of this article we'll look into these individual points. Note, although I have done a good bit of hacking on this compiler, its design and original implementation comes mainly from Mozilla hacker Lars Hansen, who also currently maintains it. All errors of exegesis are mine, of course!

    the function is the unit of compilation

    As we mentioned, in the binary encoding of a WebAssembly module, all definitions needed by any function come before all function definitions. This naturally leads to a partition between two phases of bytestream parsing: an initial serial phase that collects the set of global type definitions, annotations as to which functions are imported and exported, and so on, and a subsequent phase that compiles individual functions in an essentially independent manner.

    The advantage of this approach is that compiling functions is a natural task unit of parallelism. If the user has a machine with 8 virtual cores, the web browser can keep one or two cores for the browser itself and farm out WebAssembly compilation tasks to the rest. The result is that the compiled code is available sooner.

    Taking functions to be the unit of compilation also allows for an easy "tier-up" mechanism: after the baseline compiler is done, the optimizing compiler can take more time to produce better code, and when it is done, it can swap out the results on a per-function level. All function calls from the baseline compiler go through a jump table indirection, to allow for tier-up. In SpiderMonkey there is no mechanism currently to tier down; if you need to debug WebAssembly code, you need to refresh the page, causing the wasm code to be compiled in debugging mode. For the record, SpiderMonkey can only tier up at function calls (it doesn't do OSR).

    This simple approach does have some down-sides, in that it leaves intraprocedural optimizations on the table (inlining, contification, custom calling conventions, speculative optimizations). This is mitigated in two ways, the most obvious being that LLVM or whatever produced the WebAssembly has ideally already done whatever inlining might be fruitful. The second is that WebAssembly is designed for predictable performance. In JavaScript, an implementation needs to do run-time type feedback and speculative optimizations to get good performance, but the result is that it can be hard to understand why a program is fast or slow. The designers and implementers of WebAssembly in browsers all had first-hand experience with JavaScript virtual machines, and actively wanted to avoid unpredictable performance in WebAssembly. Therefore there is currently a kind of détente among the various browser vendors, that everyone has agreed that they won't do speculative inlining -- yet, anyway. Who knows what will happen in the future, though.

    Digressing, the summary here is that the baseline compiler receives an individual function body as input, and generates code just for that function.

    one pass, and one pass only

    The WebAssembly baseline compiler makes one pass through the bytecode of a function. Nowhere in all of this are we going to build an abstract syntax tree or a graph of basic blocks. Let's follow through how that works.

    Firstly, emitFunction simply emits a prologue, then the body, then an epilogue. emitBody is basically a big loop that consumes opcodes from the instruction stream, dispatching to opcode-specific code emitters (e.g. emitAddI32).

    The opcode-specific code emitters are also responsible for validating their arguments; for example, emitAddI32 is wrapped in an assertion that there are two i32 values on the stack. This validation logic is shared by a templatized codestream iterator so that it can be re-used by the optimizing compiler, as well as by the publicly-exposed WebAssembly.validate function.

    A corollary of this approach is that machine code is emitted in bytestream order; if the WebAssembly instruction stream has an i32.add followed by a i32.sub, then the machine code will have an addl followed by a subl.

    WebAssembly has a syntactically limited form of non-local control flow; it's not goto. Instead, instructions are contained in a tree of nested control blocks, and control can only exit nonlocally to a containing control block. There are three kinds of control blocks: jumping to a block or an if will continue at the end of the block, whereas jumping to a loop will continue at its beginning. In either case, as the compiler keeps a stack of nested control blocks, it has the set of valid jump targets and can use the usual assembler logic to patch forward jump addresses when the compiler gets to the block exit.

    lean into the stack machine

    This is the interesting bit! So, WebAssembly instructions target a stack machine. That is to say, there's an abstract stack onto which evaluating i32.const 32 pushes a value, and if followed by i32.const 10 there would then be i32(32) | i32(10) on the stack (where new elements are added on the right). A subsequent i32.add would pop the two values off, and push on the result, leaving the stack as i32(42). There is also a fixed set of local variables, declared at the beginning of the function.

    The easiest thing that a compiler can do, then, when faced with a stack machine, is to emit code for a stack machine: as values are pushed on the abstract stack, emit code that pushes them on the machine stack.

    The downside of this approach is that you emit a fair amount of code to do read and write values from the stack. Machine instructions generally take arguments from registers and write results to registers; going to memory is a bit superfluous. We're willing to accept suboptimal code generation for this quick-and-dirty compiler, but isn't there something smarter we can do for ephemeral intermediate values?

    Turns out -- yes! The baseline compiler keeps an abstract value stack as it compiles. For example, compiling i32.const 32 pushes nothing on the machine stack: it just adds a ConstI32 node to the value stack. When an instruction needs an operand that turns out to be a ConstI32, it can either encode the operand as an immediate argument or load it into a register.

    Say we are evaluating the i32.add discussed above. After the add, where does the result go? For the baseline compiler, the answer is always "in a register" via pushing a new RegisterI32 entry on the value stack. The baseline compiler includes a stupid register allocator that spills the value stack to the machine stack if no register is available, updating value stack entries from e.g. RegisterI32 to MemI32. Note, a ConstI32 never needs to be spilled: its value can always be reloaded as an immediate.

    The end result is that the baseline compiler avoids lots of stack store and load code generation, which speeds up the compiler, and happens to make faster code as well.

    Note that there is one limitation, currently: control-flow joins can have multiple predecessors and can pass a value (in the current WebAssembly specification), so the allocation of that value needs to be agreed-upon by all predecessors. As in this code:

    (func $f (param $arg i32) (result i32)
      (block $b (result i32)
        (i32.const 0)
        (local.get $arg)
        (br_if $b) ;; return 0 from $b if $arg is zero
        (i32.const 1))) ;; otherwise return 1
    ;; result of block implicitly returned

    When the br_if branches to the block end, where should it put the result value? The baseline compiler effectively punts on this question and just puts it in a well-known register (e.g., $rax on x86-64). Results for block exits are the only place where WebAssembly has "phi" variables, and the baseline compiler allocates all integer phi variables to the same register. A hack, but there we are.

    no noodling!

    When I started to hack on the baseline compiler, I did a lot of code reading, and eventually came on code like this:

    void BaseCompiler::emitAddI32() {
      int32_t c;
      if (popConstI32(&c)) {
        RegI32 r = popI32();
        masm.add32(Imm32(c), r);
      } else {
        RegI32 r, rs;
        pop2xI32(&r, &rs);
        masm.add32(rs, r);

    I said to myself, this is silly, why are we only emitting the add-immediate code if the constant is on top of the stack? What if instead the constant was the deeper of the two operands, why do we then load the constant into a register? I asked on the chat channel if it would be OK if I improved codegen here and got a response I was not expecting: no noodling!

    The reason is, performance of baseline-compiled code essentially doesn't matter. Obviously let's not pessimize things but the reason there's a baseline compiler is to emit code quickly. If we start to add more code to the baseline compiler, the compiler itself will slow down.

    For that reason, changes are only accepted to the baseline compiler if they are necessary for some reason, or if they improve latency as measured using some real-world benchmark (time-to-first-frame on Doom 3, for example).

    This to me was a real eye-opener: a compiler optimized not for the quality of the code that it generates, but rather for how fast it can produce the code. I had seen this in action before but this example really brought it home to me.

    The focus on compiler throughput rather than compiled-code throughput makes it pretty gnarly to hack on the baseline compiler -- care has to be taken when adding new features not to significantly regress the old. It is much more like hacking on a production JavaScript parser than your traditional SSA-based compiler.

    that's a wrap!

    So that's the WebAssembly baseline compiler in SpiderMonkey / Firefox. Until the next time, happy hacking!

    by Andy Wingo at March 25, 2020 04:29 PM

    March 16, 2020

    Víctor Jáquez

    Review of the Igalia Multimedia team Activities (2019/H2)

    This blog post is a review of the various activities the Igalia Multimedia team was involved along the second half of 2019.

    Here are the previous 2018/H2 and 2019/H1 reports.


    Succinctly, GstWPE is a GStreamer plugin which allows to render web-pages as a video stream where it frames are GL textures.

    Phil, its main author, wrote a blog post explaning at detail what is GstWPE and its possible use-cases. He wrote a demo too, which grabs and previews a live stream from a webcam session and blends it with an overlay from wpesrc, which displays HTML content. This composited live stream can be broadcasted through YouTube or Twitch.

    These concepts are better explained by Phil himself in the following lighting talk, presented at the last GStreamer Conference in Lyon:

    Video Editing

    After implementing a deep integration of the GStreamer Editing Services (a.k.a GES) into Pixar’s OpenTimelineIO during the first half of 2019, we decided to implement an important missing feature for the professional video editing industry: nested timelines.

    Toward that goal, Thibault worked with the GSoC student Swayamjeet Swain to implement a flexible API to support nested timelines in GES. This means that users of GES can now decouple each scene into different projects when editing long videos. This work is going to be released in the upcoming GStreamer 1.18 version.

    Henry Wilkes also implemented the support for nested timeline in OpenTimelineIO making GES integration one of the most advanced one as you can see on that table:

    Single Track of Clips ✔ ✔ ✔ ✔ ✔ W-O ✔ ✔
    Multiple Video Tracks ✔ ✖ ✔ ✔ ✔ W-O ✔ ✔
    Audio Tracks & Clips ✔ ✔ ✔ ✔ ✔ W-O ✔ ✔
    Gap/Filler ✔ ✔ ✔ ✔ ✔ ✔ ✖ ✔
    Markers ✔ ✔ ✔ ✔ ✖ N/A ✖ ✔
    Nesting ✔ ✖ ✔ ✔ ✔ W-O ✔ ✔
    Transitions ✔ ✔ ✖ ✖ ✔ W-O ✖ ✔
    Audio/Video Effects ✖ ✖ ✖ ✖ ✖ N/A ✖ ✔
    Linear Speed Effects ✔ ✔ ✖ ✖ R-O ✖ ✖ ✖
    Fancy Speed Effects ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖
    Color Decision List ✔ ✔ ✖ ✖ ✖ ✖ N/A ✖

    Along these lines, Thibault delivered a 15 minutes talk, also in the GStreamer Conference 2019:

    After detecting a few regressions and issues in GStreamer, related to frame accuracy, we decided to make sure that we can seek in a perfectly frame accurate way using GStreamer and the GStreamer Editing Services. In order to ensure that, an extensive integration testsuite has been developed, mostly targeting most important container formats and codecs (namely mxf, quicktime, h264, h265, prores, jpeg) and issues have been fixed in different places. On top of that, new APIs are being added to GES to allow expressing times in frame number instead of nanoseconds. This work is still ongoing but should be merged in time for GStreamer 1.18.

    GStreamer Validate Flow

    GstValidate has been turning into one of the most important GStreamer testing tools to check that elements behave as they are supposed to do in the framework.

    Along with our MSE work, we found that other way to specify tests, related with produced buffers and events through specific pads, was needed. Thus, Alicia developed a new plugin for GstValidate: Validate Flow.

    Alicia gave an informative 30 minutes talk about GstValidate and the new plugin in the last GStreamer Conference too:

    GStreamer VAAPI

    Most of the work along the second half of 2019 were maintenance tasks and code reviews.

    We worked mainly on memory restrictions per backend driver, and we reviewed a big refactor: internal encoders now use GstObject, instead of the custom GstVaapiObject. Also we reviewed patches for new features such as video rotation and cropping in vaapipostproc.

    Servo multimedia

    Last year we worked integrating media playing in Servo. We finally delivered hardware accelerated video playback in Linux and Android. We worked also for Windows and Mac ports but they were not finished. As natural, most of the work were in servo/media crate, pushing code and reviewing contributions. The major tasks were to rewrite the media player example and the internal source element looking to handle the download playbin‘s flag properly.

    We also added WebGL integration support with <video> elements, thus webpages can use video frames as WebGL textures.

    Finally we explored how to isolate the multimedia processing in a dedicated thread or process, but that task remains pending.

    WebKit Media Source Extension

    We did a lot of downstream and upstream bug fixing and patch review, both in WebKit and GStreamer, for our MSE GStreamer-based backend.

    Along this line we improved WebKitMediaSource to use playbin3 but also compatibility with older GStreamer versions was added.

    WebKit WebRTC

    Most of the work in this area were maintenance and fix regressions uncovered by the layout tests. Besides, the support for the Rasberry Pi was improved by handling encoded streams from v4l2 video sources, with some explorations with Minnowboard on top of that.


    GStreamer Conference

    Igalia was Gold sponsor this last GStreamer Conference held in Lyon, France.

    All team attended and five talks were delivered. Only Thibault presented, besides the video editing one which we already referred, another two more: One about GstTranscoder API and the other about the new documentation infrastructure based in Hotdoc:

    We also had a productive hackfest, after the conference, where we worked on AV1 Rust decoder, HLS Rust demuxer, hardware decoder flag in playbin, and other stuff.

    Linaro Connect

    Phil attended the Linaro Connect conference in San Diego, USA. He delivered a talk about WPE/Multimedia which you can enjoy here:


    Charlie attended Demuxed, in San Francisco. The conference is heavily focused on streaming and codec engineering and validation. Sadly there are not much interest in GStreamer, as the main focus is on FFmpeg.


    Phil and I attended the last RustFest in Barcelona. Basically we went to meet with the Rust community and we attended the “WebRTC with GStreamer-rs” workshop presented by Sebastian Dröge.

    by vjaquez at March 16, 2020 03:20 PM

    March 09, 2020

    Jacobo Aragunde

    Mapping the input method implementations in Chromium

    This is an overview of the different input method implementations in Chromium, centered in Linux platforms.

    By Rime-devel – 小狼毫输入法界面 / Interface of Weasel Input Method, GPL, Link

    Native IME

    InputMethodAuraLinux is the central piece here. It implements two important interfaces and triggers the construction of the platform-specific bits. This is the structure for the Wayland/Ozone backend:

    The InputMethodAuraLinux triggers the construction of a specific LinuxInputMethodContext through the Factory. In the case above, it would be an instance of WaylandInputMethodContext. It will save a reference to the Context while, at the same time, set itself as the delegate for that Context object. The set of methods of both classes is pretty similar, because they are expected to work together: the Context object, in the parts related to the platform, and InputMethodAuraLinux in the parts related to the rest of Chromium. It will interact with other system classes, mainly through the singleton object IMEBridge.

    The X11 backend currently involves slightly different classes. It still has the InputMethodAuraLinux as a center piece, but uses different implementations for the Context and the ContextFactory: the multi-purpose class LinuxUI also acts as a factory. It’s expected to adapt to the Ozone structure at some point in the future.

    ChromeOS and IME extension API

    Chromium provides an extension API to allow the implementation of IMEs exclusively with web technologies. It shares code with the ChromeOS implementation.

    The piece glueing the native infrastructure and the ChromeOS/Extension API implementations is the IMEEngineHandlerInterface, which has hooks to manage relevant events like focus in or out, process key events, etc. The singleton class IMEBridge can set an EngineHandler, which is an implementation of the aforementioned interface, and in the case that such implementation exists, it will receive those events from Chromium whenever they happen.

    There is one base implementation of IMEEngineHandlerInterface called InputMethodEngineBase. It is extended by chromeos::InputMethodEngine, for the ChromeOS implementation, and by input_method::InputMethodEngine, for the IME extension API implementation.

    The setup of the IME extension API contains the following pieces:

    • observers to the IME bridge
    • an InputIMEEventRouter that contains an instance of the InputMethodEngine class
    • a factory to associate instances of InputIMEEventRouter with browser profiles.
    • An InputImeAPI class that acts as several kinds of observers.
    • the actual code that backs every JavaScript IME operation, which makes use of the classes mentioned above and is contained in individual classes that implement UIThreadExtensionFunction for every operation.

    The IME API operates in two directions: when one of the Javascript operations is directly called, it will trigger the native code associated to it, which would invoke any relevant Chromium code; when a relevant event happens in the system, the IMEBridgeObserver is notified and the associated JavaScript callback will be triggered.

    by Jacobo Aragunde Pérez at March 09, 2020 05:00 PM

    Brian Kardell

    Making Sure Content Lives On...

    Making Sure Content Lives On...

    In which I describe a problem, share a possible solution that you can use today, and, hopefully start a conversation...

    (If you already know the problem, and you're looking for the solution, check out the documentation over on GitHub)

    There's a series of problems that I experience literally all the time: Link rot and the disappearance of our history. I have written posts on blogging platforms that no longer exist. I have written guest posts for sites that no longer exist. I frequently wish that that content still existed - not only still existed, but was findable again. I have researched hard and bookmarked or linked to obscure but important history which, you guessed it: no longer exists. This is tremendously frustrating, but also just really sad (more on this later)...

    This is one thing that really interests me about a potentially future decentralized web - we could do better here. At the same time, we often have to deal with the Web we have now - not the one we wish existed. So, let's look the state of this today?

    Archiving today

    Luckily, when content disappears, I can often turn to the Internet Archive and the Wayback Machine to find content. But the truth is that they've got a very tough challenge. Identifing all the things that need indexing is really hard. If you have a relatively small blog or site, especially, it can be hard to find. Then, even if it gets found, how do they know when you've got some new content? So, the net result is that no matter how good a job they do - a lot of really interesting stuff can wind up not being findable today there either.

    History itself makes this problem even harder: It's very unpredictable. Often things are considerably more interesting in only in hindsight. Things also tend to seem a lot more stable in the short term than they are in reality in the long arc of history: Even giant "too big to fail", semi-monopolies ultimately shift, burn out or even just break their history. One of the wisest things I ever heard was an older engineer who told me if they could impart one things to students it was this - all the paradigms that you think are forever will shift. Once upon a time IBM ruled the world. Motorola was absolutely dominant. Once upon a time, SourceForge was the shit. Then GitHub. Once upon a time Netscape won the interwebs, and then they lost it to Microsoft who won it even bigger. Then Microsoft lost it again.

    It's ok if you don't share my desire, but I personally, want to be sure my content continues to exist, and people can find it via old links despite all of these uncertain futures. Today, the best chance of that is in the Internet Archive. So, how can I be more confident that it does?

    Maybe... Just ask?

    Interestingly, a lot of these problems get hella simpler if you just ask.

    For a really long time now (as long as I can remember the Wayback Machine existing), you can go to, enter a URL under the heading 'Save page now' and click the button.

    A notification model is way, way simpler than somehow trying to monitor and guess - and we use this same basic pattern for several things in engineering. It's quite success at keeping things efficient.

    The trouble here is, I think, that it is currently manual. It's unlikely that I'm going to do that every time I add a new blog post, even thought that's exactly what I want to do.

    Maybe we can fix that?

    Asking by default...

    Recently, this got me to thinking... What if it were really easy to incorporate this into whatever blog software you use today? Think about it: RSS continues to thrive, in part, because you hardly have to know about it - it's so comparatively easy (entirely free in some cases in that it is just part of the software you get). If you could incorporate this into your system in somewhere between 0 and 30 minutes, would you?

    Every now and then I have a thought like this that seems so clear to me that I figure "surely this must exist? Why do I not know about it?" So, I asked around on social media. Surprisingly, I could find no one who was aware of such a thing. What about publicly exposing an API? Do they? Maybe? Kinda? Could we make it easier?

    Brendan Eich cc'ed Jason Scott from the Internet Archive into the conversation. Jason cc'ed in Mark Graham, the Director for the Wayback Machine. Mark got me access to a service and set up this little experiement to try it out.

    But how to get there?

    One of the challenges here is how to make something 'easy' when there are so many blogging systems and site hosting things. Submitting a URL automatically, even, is technically not hard - but fitting that into each system is a little more work. Then, each of these uses different frameworks, languages, package managers and so on. I don't really have time to build out 100 different 'easy' solutions just to start a conversation. In fact, I want this for myself, and currently my setup is very custom - it's all stuff I wrote that works for me but would be more or less useful to someone else.

    But then I had a thought: There's actually lots of good stuff in the Web we have for tackling this problem.

    Existing, higher level hooks

    HTTP is already a nice, high-level API that doesn't really care about how you built your site, but it might not be automatically obvious or convenient regarding how to plug into it. We already have lots of infrastructure too... As I said, almost everyone has an RSS feed and almost everyone has some kind of 'publish' process. Maybe I could make it easy for huge swaths of people by just tapping into some big patterns and existing machinery. What I really wanted, I thought, was just a flexible and easy to integrate service. It doesn't have to block your build, it isn't absolutely critical, it's just "hey... I just published a thing."

    So, I decided to try this..

    I created an HTTP service with a Cloudflare worker. Cloudflare workers are interesting because they're deployed at the edge, they sleep when youre not using them and you can process up to 100k requests per-day for free. Since they kind of spin up as needed, handling errors for this kind of thing is easy too - you aren't going to pollute the system somehow. In fact, they seem kind of ideally suited for this kind of thing.

    This seems very good actually because blogs, especially personal blogs, tend to churn out content "slowly". That is, 100k requests per day seems like plenty for a few orders of magnitude more blogs than that. So... Cool.

    Basically - you send an HTTP Post to this service with the content-type `application/json` and provide some stuff in the body. But what you provide can differ to make integration very easy regardless of what kind of setup you have. It allows, effectively 3 'shapes' that I've put together that should fit nicely and easily into whatever system you already have today and be easy to integrate. For my own blog, hosted on gh pages, it took ~5 minutes. If you're interested in trying it out or seeing what I came up with check out the experiment itself, try it out.

    I think it should be pretty easy and flexible, but I don't know, wdyt? Are you willing to try it? Is this an interesting problem? Should the Wayback maybe offer something like this itself? How could it be better?

    March 09, 2020 04:00 AM

    March 02, 2020

    Žan Doberšek

    Flatpak repository for WPE

    To let developers play with the WPE stack, we have set up a Flatpak repository containing all the necessary bits to start working with it. To install applications (like Cog, the very simple WPE launcher), first add the remote repository, and proceed with the following instructions:

    $ flatpak --user remote-add wpe-releases --from
    $ flatpak --user install org.wpe.Cog
    $ flatpak run org.wpe.Cog -P fdo <url>

    Currently the 2.26 release of the WPE port is used, along with libwpe 1.4.0, WPEBackend-fdo 1.4.0 and Cog 0.4.0. Upgrades to the newer releases (happening in next few weeks) will be done in the next month or two. Builds are provided for x86_64, arm and aarch64 architectures.

    If you need ideas or inspiration on how to use WPE, this repository also contains GstWPEBroadcastDemo, an application that showcases both GStreamer and WPE, enabling you to mix live video input with HTML content that can be updated on-the-fly. You can read more about this in the blog post made by Philippe Normand.

    The current Cog/WPE stack still imposes the Wayland-only limitation, with Mesa-based graphics stacks most likely to work well. In future release, we plan to add support for new platforms, graphics stacks and methods of integration.

    All of this is still in very early stages. If you find an issue with the applications or libraries in the repository, please do not hesitate to report it to our issue tracker. The issues will be rerouted to the trackers of the problematic component if necessary.

    by Žan Doberšek at March 02, 2020 09:00 AM

    February 17, 2020

    Paulo Matos

    CReduce - it's a kind of magic!

    During my tenure as a C compiler developer working on GCC and LLVM there was an indispensable tool when it came to fixing bugs and that was C-Reduce, or its precursor delta. Turns out this magic applies well to JavaScript, so lets take a quick glance at what it is and how to use it.


    by Paulo Matos at February 17, 2020 10:00 AM

    February 12, 2020

    Brian Kardell

    Toward Responsive Elements

    Toward Responsive Elements

    In this piece I'll talk about the "Container Queries" problem, try to shine some light on some misconceptions, and tell you about the state of things.

    As developers, watching standards can be frustrating. We might learn of some new spec developing, some new features landing that you weren't even aware of - maybe many of these are even things you don't actually care much about. In the meantime, "Container Queries," clearly the single most requested feature in CSS for years, appears to be just... ignored.

    What gives? Why does the CSS Working Group not seem to listen? After all of these years -- why isn't there some official CSS spec at least?

    It's very easy to get the wrong impression about what's going on (or isn't) in Web standards, and why. This isn't because it is a secret, but because there is just so much happening and so many levels of discussion it would be impossible to follow it all. The net result is often that things the view from the outside can feel kind of opaque - and our impression can easily be a kind of a distorted image. Shining some light on this sort of thing is one of my goals this year, so let's start here…

    N problems

    One of the more difficult aspects of "Container Queries" is that there isn't actually a nice, single problem to discuss. Many discussions begin with how to write them - usually in a way that "feels kind of obvious" -- but this usually (inadvertently) often creates a stack of new asks of CSS and numerous conflicts. These wind up derailing the conversation. In the process, it can also (inadvertently) hand-wave over tremendous complexity of what turn out to be very significant particulars and important aspects of CSS's current design and implementation. This makes it very hard to focus a discussion, it is easily derailed.

    A big part of the challenge here is that depending on how you divide up the problem, some of these asks would suggest that fundamental changes are necessary to the architectures of browser engines. Worse: We don't even know what they would be. Unfortunately, it's very hard to appreciate what this means in practice since so much of this, again, deals with things that are opaque to most developers. For now: suffice it to say that any such re-architecture could require a entirely unknown (in years) speculative amount of work in order to create and prove such an idea - and then many more years of development per-engine. Worse, all of this would be similarly invisible to developers in the meantime. Did you know, for example, that there are multiple many year long efforts with huge investments underway already aimed at unlocking many new things in CSS? There are - and I don't mean Houdini!

    Ok, but what about container queries...

    What is happening in this space?

    The truth is that while it might look like nothing much is happening, there have been lots of discussions and efforts to feel out various ideas. They have mostly, unfortunately, run into certain kinds of problems very quickly and it's not been at all easy to even determine which paths aren't dead ends.

    More recently, many have instead turned to "how do we make this into more solvable problems?" and "How do we actually make some progress, mitigate risk - take a step, and and actually get something to developers?"

    Two important ideas that have gotten a lot of traction in this time are 'containment' and ResizeObserver, both of which force us to tackle a different set of (hopefully more answerable) problems - allow us to lay what we think are probably necessary foundations to solving the problem, while enabling developers to meet these and other use cases more effectively, and in more timely fashion.

    Lightspeed progress, actually...

    Standards move at a scale unfamilliar to most of us. Consider that we shifted the stalemate conversation and ResizeObserver was envisioned, incubated, speced, tested, agreed upon, iterated on (we got things wrong!) and implemented in all browsers in about 2 years (with Igalia doing the latest implementation in WebKit thanks to some sponsorship from AMP). Containment is shipping in 2 of 3 engines.

    This is standards and normalization of something really new for developers at light speed.

    But... this is incomplete.

    Yes, it is.

    In the announcement about Igalia's implementation of ResizeObserver in WebKit, I included an overly simple example of how you could create responsive components with very few lines of code that actually worked, regardless of how or why things resized. But... no one wants us to ultimately have to write JavaScript for this… Literally nobody that I know of - so what gives?

    First, by treating it this way, we've made some incredible, actual progress on parts of the problem in very short time - for the first time. All browsers will have ResizeObserver now. In the meantime developers will at least have the ability to do a thing they couldn't actually do before, and do the things they could do before more efficiently and experiments can understand a little more about how it actually has to work. That's good for developers - we don't have to wait forever for something better than what we have now... But it's also good for problem solving: As things settle and we have more answers, its allowed us to focus conversations more and ask better next questions.

    We definitely care about moving futher!

    For the last several months, I and some folks at Igalia have been working on this space. We're very interested in helping find the connections that bring together developers, implementer and standards bodies to solve the problems.

    Internally, we have had numerous sessions to discuss past proposals and tackle challenges. I've been researching the various user-space solutions: How they work, what their actual use is like on the Web, and so on. In developer space we've worked a lot with folks active in this space: I've met regularly with Tommy Hodgins and Jon Neal, and had discussions with folks like Eric Portis, Brad Frost, Scott Jehl and several others.

    But we've also talked to people from Google, Mozilla and Apple, and they've been very helpful and willing to chat - both commenting on and generating new ideas and discussions. I really can't stress enough how helpful these discussions have been, or how willing all of the CSSWG members representing browser vendors have been to spend time doing this. Just during the recent travel for the CSS Working Group face to face, I had numerous breakout discussions - many hours worth of idea trading between Igalians and browser folks.

    I wish I could say that we were 'there' but we're not.

    As it stands it seems that there are kind of a few 'big ideas' that seem like they could go somewhere - but not actually agreement on whether one or both of them is acceptable or prioritization yet.

    Current ideas...

    There does seem to be some general agreement on at least one part of what I am going to call instead "Responsive Design for Components" and that is that flipping the problem on its head is better. In other words: Let's see if we can find a way talk about how we can expose the necessary powers in CSS, in a way that is largely compatible with CSS's architecture today... even if writing that looks absolutely nothing like past proposals at first.

    There seems to be some agreement at large that this is where the hardest problems lie and separating that part of the discussion from things that we can already sugar in user-space seems helpful for making progreess. This has led to a few different ideas…

    Lay down small, very optimal paths in CSS

    One very good idea generated by lots of discussion with many people during CSS Working Group breakouts sprang from a thought from Google's Ian Kilpatrick…

    I am constantly impressed by Ian's ability to listen, pull the right threads, help guide the big ship and coordinate progress on even long visions that enable us to exceed current problem limits. You probably don't hear a lot about him, but the Web is a lot better for his efforts…

    The crux of the idea is that there seems to be one 'almost natural' space that enables a ton of the use cases we've actually seen in research. Some community members have landed on not entirely dissimilar ideas: creating something of an 'if' via calc expressions to express different values for properties. The idea goes something like this:

    Imagine that we added a function, let's call it "switch" which allows you to express something "like a media query" with several possible values for a property. These would be based on the available-inline-width during the layout phase. Something like this…

    .foo {
    	display: grid;
    	grid-template-columns: switch(
    	 	(available-inline-size > 1024px) 1fr 4fr 1fr;
    	 	(available-inline-size > 400px) 2fr 1fr;
    		(available-inline-size > 100px) 1fr;
    		default 1fr;

    See, a a whole lot of the problems with existing ideas is that they heave to loop back through (expensive) phases potentially several times and make it (seemingly) impossible to keep CSS rendering in the same frame - thus requiring fairly major architectural changes. It would need to be limited to properties that affect things only during layout, but another practical benefit here is that it would be not that hard to start some basic prototyping and feel this out a little in an implementation without actually committing to years of work. It would be, very likely, deliverable to developers in a comparatively short amount of time.

    Importantly: While it makes sense to expose this step to developers, the idea really is that if we could get agreement and answers on this piece, we would be able to discuss concretely how we sugar some better syntax which effectly distills down to this, internally. Nicole Sullivan also has some ideas which might help here, and which I look forward to learn more about.

    Better Containment and...

    There is another interesting idea generated primarily by David Baron at Mozilla. In fact, David has a whole series of proposals on this that would lay out how to get all the way there in one vision. It is still being fleshed out, he's writing something up as I write this.

    I've seen some early drafts and it's really interesting but it's worth noting that it also isn't without a practical order of dependencies. Effectively, everything in David's idea hinges on "containment in a single direction". This is the smallest and most fundamental property - the next step is almost certainly to define and impement that and if we have that then the next things follow more easily - but so do many other possibilities.

    This information as a property potentially provides an opportunity for an optimization of what would happen if you built any system with ResizeObserver today, where it might be possible to avoid complete trips through the event loop.

    It isn't currently clear how some questions will be answered but David is one of the most knowledgable people I know, and there's a lot of good stuff in here - I look forward to seeing more details!

    Some new experiments?

    Another idea that we've tossed around a bunch at Igalia and with several developers punts on a few hard of these hard questions whose answers seem hard to speculate on and really focus instead on the circularity problem and thinking about the kind of unbounded space of use cases developers seem to imagine. Perhaps it "fits" with option B even...

    These would be a kind of declarative way, in CSS, to wire up finite states. They would focus on a laying down a single new path in CSS that allowed us to express pattern of what to observe, how to observe it, how to measure and say that an element was in a particular state. The practical upshot of this is that not just Container Queries, but lots and lots of use cases that are currently hard to solve in CSS, but are entirely solvable in JS with Observers become suddenly possible to just express via CSS without asking a lot of new stuff CSS - and doing that might present some new optimization opportunities, but it might not -- and at least we wouldn't block moving forward.

    So what's next?

    In the coming months I hope to continue to think about, explore this space and continue discussions with others. I would love to publish some research and maybe some new (functional) experiments with JS that aim to be 'closer' to a path that might be paveable. We have raw materials and enough information to understand some of where things are disjoint between what we're asking for and the practical aspects of what we're proving.

    Ultimately, I'd like to help gather all of these ideas back into a place where we have enough to really dig in in WICG or CSSWG without a very significant likelihood of easy derailment. Sooner rather than later, I hope.

    In the meantime - let's celebrate that we have actually made progress and work to encourage more!

    February 12, 2020 05:00 AM

    February 09, 2020

    Andy Wingo

    state of the gnunion 2020

    Greetings, GNU hackers! This blog post rounds up GNU happenings over 2019. My goal is to celebrate the software we produced over the last year and to help us plan a successful 2020.

    Over the past few months I have been discussing project health with a group of GNU maintainers and we were wondering how the project was doing. We had impressions, but little in the way of data. To that end I wrote some scripts to collect dates and versions for all releases made by GNU projects, as far back as data is available.

    In 2019, I count 243 releases, from 98 projects. Nice! Notably, on we have the first stable releases from three projects:

    GNU Guix
    GNU Guix is perhaps the most exciting project in GNU these days. It's a package manager! It's a distribution! It's a container construction tool! It's a package-manager-cum-distribution-cum-container-construction-tool! Hearty congratulations to Guix on their first stable release.
    GNU Shepherd
    The GNU Daemon Shepherd is a modern dependency-based init service, written in Guile Scheme, and used in Guix. When you install Guix as an operating system, it actually stages Scheme programs from the operating system definition into the Shepherd configuration. So cool!
    GNU Backgammon
    Version 1.06.002 is not GNU Backgammon's first stable release, but it is the earliest version which is available on Formerly hosted on the now-defunct, GNU Backgammon is a venerable foe, and uses neural networks since before they were cool. Welcome back, GNU Backgammon!

    The total release counts above are slightly above what Mike Gerwitz's scripts count in his "GNU Spotlight", posted on the FSF blog. This could be because in addition to files released on, I also manually collected release dates for most packages that upload their software somewhere other than I don't count releases, and there were a handful of packages for which I wasn't successful at retrieving their release dates. But as a first approximation, it's a relatively complete data set.

    I put my scripts in git repository if anyone is interested in playing with the data. Some raw CSV files are there as well.

    where we at?

    Hair toss, check my nails, baby how you GNUing? Hard to tell!

    To get us closer to an answer, I calculated the active package count per year. There can be other definitions, but my reading is that an active package is one that has had a stable release within the preceding 3 calendar years. So for 2019, for example, a GNU package is considered active if it had a stable release in 2017, 2018, or 2019. What I got was a graph that looks like this:

    What we see is nothing before 1991 -- surely pointing to lacunae in my data set -- then a more or less linear rise in active package count until 2002, some stuttering growth rising to a peak in 2014 at 208 active packages, and from there a steady decline down to 153 active packages in 2019.

    Of course, as a metric, active package count isn't precisely the same as project health; GNU ed is indeed the standard editor but it's not GCC. But we need to look for measurements that indirectly indicate project health and this is what I could come up with.

    Looking a little deeper, I tabulated the first and last release date for each GNU package, and then grouped them by year. In this graph, the left blue bars indicate the number of packages making their first recorded release, and the right green bars indicate the number of packages making their last release. Obviously a last release in 2019 indicates an active package, so it's to be expected that we have a spike in green bars on the right.

    What this graph indicates is that GNU had an uninterrupted growth phase from its beginning until 2006, with more projects being born than dying. Things are mixed until 2012 or so, and since then we see many more projects making their last release and above all, very few packages "being born".

    where we going?

    I am not sure exactly what steps GNU should take in the future but I hope that this analysis can be a good conversation-starter. I do have some thoughts but will post in a follow-up. Until then, happy hacking in 2020!

    by Andy Wingo at February 09, 2020 07:44 PM

    February 07, 2020

    Andy Wingo

    lessons learned from guile, the ancient & spry

    Greets, hackfolk!

    Like just about every year, last week I took the train up to Brussels for FOSDEM, the messy and wonderful carnival of free software and of those that make it. Mostly I go for the hallway track: to see old friends, catch up, scheme about future plans, and refill my hacker culture reserves.

    I usually try to see if I can get a talk or two in, and this year was no exception. First on my mind was the recent release of Guile 3. This was the culmination of a 10-year plan of work and so obviously there are some things to say! But at the same time, I wanted to reflect back a bit and look at the past with a bit of distance.

    So in the end, my one talk was two talks. Let's start with the first one. (I'm trying a new thing where I share my talks as blog posts. We'll see how this goes. I know the rendering can be a bit off relative to the slides, but hopefully it's good enough. If you prefer, you can just watch the video instead!)

    Celebrating Guile 3

    FOSDEM 2020, Brussels

    Andy Wingo | | @andywingo

    So yeah let's celebrate! I co-maintain the Guile implementation of Scheme. It's a programming language. Guile 3, in summary, is just Guile, but faster. We added a simple just-in-time compiler as well as a bunch of ahead-of-time optimizations. The result is that it runs faster -- sometimes by a lot!

    In the image above you can see Guile 3's performance on a number of microbenchmarks, relative to Guile 2.2, sorted by speedup. The baseline is 1.0x as fast. You can see that besides the first couple microbenchmarks where things are a bit inconclusive (click for full-size image), everything gets faster. Most are at least 2x as fast, and one benchmark is even 32x as fast. (Note the logarithmic scale on the Y axis.)

    I only took a look at microbenchmarks at the end of the Guile 3 series; before that, I was mostly going by instinct. It's a relief to find out that in this case, my instincts did align with improvement.

    mini-benchmark: eval

     ’(let fib ((n 30))
        (if (< n 2)
            (+ (fib (- n 1)) (fib (- n 2))))))

    Guile 1.8: primitive-eval written in C

    Guile 2.0+: primitive-eval in Scheme

    Taking a look at a more medium-sized benchmark, let's compute the 30th fibonacci number, but using the interpreter instead of compiling the procedure. In Guile 2.0 and up, the interpreter (primitive-eval) is implemented in Scheme, so it's a good test of an important small Scheme program.

    Before 2.0, though, primitive-eval was actually implemented in C. This had a number of disadvantages, notably that it prevented tail calls between interpreted and compiled code. When we switched to a Scheme implementation of primitive-eval, we knew we would have a performance hit, but we thought that we would gain it back eventually as the compiler got better.

    As you can see, it took a while before the compiler and run-time improved to the point that primitive-eval in Scheme reached the speed of its old hand-tuned C implementation, but for Guile 3, we finally got there. Note again the logarithmic scale on the Y axis.

    macro-benchmark: guix

    guix build libreoffice ghc-pandoc guix \
      –dry-run --derivation

    7% faster

    guix system build config.scm \
      –dry-run --derivation

    10% faster

    Finally, taking a real-world benchmark, the Guix package manager is implemented entirely in Scheme. All ten thousand packages are defined in Scheme, the building scripts are in Scheme, the initial RAM disk is in Scheme -- you get the idea. Guile performance in Guix can have an important effect on user experience. As you can see, Guile 3 lowered elapsed time for some operations by around 10 percent or so. Of course there's a lot of I/O going on in addition to computation, so Guile running twice as fast will rarely make Guix run twice as fast (Amdahl's law and all that).

    spry /sprī/

    • adjective: active; lively

    So, when I was thinking about words that describe Guile, the word "spry" came to mind.

    spry /sprī/

    • adjective: (especially of an old person) active; lively

    But actually when I went to look up the meaning of "spry", Collins Dictionary says that it especially applies to the agèd. At first I was a bit offended, but I knew in my heart that the dictionary was right.

    Lessons Learned from Guile, the Ancient & Spry

    FOSDEM 2020, Brussels

    Andy Wingo | | @andywingo

    That leads me into my second talk.

    guile is ancient

    2010: Rust

    2009: Go

    2007: Clojure

    1995: Ruby

    1995: PHP

    1995: JavaScript

    1993: Guile (33 years before 3.0!)

    It's common for a new project to be lively, but Guile is definitely not new. People have been born, raised, and earned doctorates in programming languages in the time that Guile has been around.

    built from ancient parts

    1991: Python

    1990: Haskell

    1990: SCM

    1989: Bash

    1988: Tcl

    1988: SIOD

    Guile didn't appear out of nothing, though. It was hacked up from the pieces of another Scheme implementation called SCM, which itself was initially based on Scheme in One Defun (SIOD), back before the Berlin Wall fell.

    written in an ancient language

    1987: Perl

    1984: C++

    1975: Scheme

    1972: C

    1958: Lisp

    1958: Algol

    1954: Fortran

    1958: Lisp

    1930s: λ-calculus (34 years ago!)

    But it goes back further! The Scheme language, of which Guile is an implementation, dates from 1975, before I was born; and you can, if you choose, trace the lines back to the lambda calculus, created in mid-30s as a notation for computation. I suppose at this point I should say mid-2030s, to disambiguate.

    The point is, Guile is old! Statistically, most software projects from olden times are now dead. How has Guile managed to survive and (sometimes) thrive? Surely there must be some lesson or other that can be learned here.

    ancient & spry

    Men make their own history, but they do not make it as they please; they do not make it under self-selected circumstances, but under circumstances existing already, given and transmitted from the past.

    The tradition of all dead generations weighs like a nightmare on the brains of the living. [...]

    Eighteenth Brumaire of Louis Bonaparte, Marx, 1852

    I am no philospher of history, but I know that there are some ways of looking at the past that do not help me understand things. One is the arrow of enlightened progress, in which events exist in a causal chain, each producing the next. It doesn't help me understand the atmosphere, tensions, and possibilities inherent at any particular point. I find the "progress" theory of history to be an extreme form of selection bias.

    Much more helpful to me is the Hegelian notion of dialectics: that at an given point in time there are various tensions at work. In our field, an example could be memory safety versus systems programming. These tensions create an environment that favors actions that lead towards resolution of the tensions. It doesn't mean that there's only one way to resolve the tensions, and it's not an automatic process -- people still have to do things. But the tendency is to ratchet history forward to a new set of tensions.

    The history of a project, to me, is then a process of dialectic tensions and resolutions. If the project survives, as Guile has, then it should teach us something about the way this process works in practice.

    ancient & spry

    Languages evolve; how to remain minimal?

    Dialectic opposites

    • world and guile

    • stable and active

    • ...

    Lessons learned from inside Hegel’s motor of history

    One dialectic is the tension between the world's problems and what tools Guile offers to understand and solve them. In 1993, the web didn't really exist. In 2033, if Guile doesn't run well in a web browser, probably it will be dead. But this process operates very slowly, for an old project; Guile isn't built on CORBA or something ephemeral like that, so we don't have very much data here.

    The tension between being a stable base for others to build on, and in being a dynamic project that improves and changes, is a key tension that this talk investigates.

    In the specific context of Guile, and for the audience of the FOSDEM minimal languages devroom, we should recognize that for a software project, age and minimalism don't necessarily go together. Software gets features over time and becomes bigger. What does it mean for a minimal language to evolve?

    hill-climbing is insufficient

    Ex: Guile 1.8; Extend vs Embed

    One key lesson that I have learned is that the strategy of making only incremental improvements is a recipe for death, in the long term. The natural result is that you reach what you perceive to be the most optimal state of your project. Any change can only make it worse, so you stop moving.

    This is what happened to Guile around version 1.8: we had taken the paradigm of the interpreter as language implementation strategy as far as it could go. There were only around 150 commits to Guile in 2007. We were stuck.

    users stay unless pushed away

    Inertial factor: interface

    • Source (API)

    • Binary (ABI)

    • Embedding (API)

    • CLI

    • ...

    Ex: Python 3; local-eval; R6RS syntax; set!, set-car!

    So how do we make change, in such a circumstance? You could start a new project, but then you wouldn't have any users. It would be nice to change and keep your users. Fortunately, it turns out that users don't really go away; yes, they trickle out if you don't do anything, but unless you change in an incompatible way, they stay with you, out of inertia.

    Inertia is good and bad. It does conflict with minimalism as a principle; if you were to design Scheme in 2020, you would not include mutable variables or even mutable pairs. But they are still with us because if we removed them, we'd break too many users.

    Users can even make you add back things that you had removed. In Guile 2.0, we removed the capability to evaluate an expression at run-time within the lexical environment of an expression, as we didn't know how to implement this outside an interpreter. It turns out this was so important to users that we had to add local-eval back to Guile, later in the 2.0 series. (Fortunately we were able to do it in a way that layered on lower-level facilities; this approach reconciled me to the solution.)

    you can’t keep all users

    What users say: don’t change or remove existing behavior

    But: sometimes losing users is OK. Hard to know when, though

    No change at all == death

    • Natural result of hill-climbing

    Ex: psyntax; BDW-GC mark & finalize; compile-time; Unicode / locales

    Unfortunately, the need to change means that sometimes you will lose users. It's either a dead project, or losing users.

    In Guile 1.8, for example, the macro expander ran lazily: it would only expand code the first time it ran it. This was good for start-up time, because not all code is evaluated in the course of a simple script. Lazy expansion allowed us to start doing important work sooner. However, this approach caused immense pain to people that wanted "proper" Scheme macros that preserved lexical scoping; the state of the art was to eagerly expand an entire file. So we switched, and at the same time added a notion of compile-time. This compromise kept good start-up time while allowing fancy macros.

    But eager expansion was a change. Users that relied on side effects from macro expansion would see them at compile-time instead of run-time. Users of old "defmacros" that could previously splice in live Scheme closures as literals in expanded source could no longer do that. I think it was the right choice but it did lose some users. In fact I just got another bug report related to this 10-year-old change last week.

    every interface is a cost

    Guile binary ABI:; compiled Scheme files

    Make compatibility easier: minimize interface

    Ex: scm_sym_unquote, GOOPS, Go, Guix

    So if you don't want to lose users, don't change any interface. The easiest way to do this is to minimize your interface surface. In Go, for example, they mostly haven't had dynamic-linking problems because that's not a thing they do: all code is statically linked into binaries. Similarly, Guix doesn't define a stable API, because all of its code is maintained in one "monorepo" that can develop in lock-step.

    You always have some interfaces, though. For example Guix can't change its command-line interface from one day to the next, for example, because users would complain. But it's been surprising to me the extent to which Guile has interfaces that I didn't consider. Recently for example in the 3.0 release, we unexported some symbols by mistake. Users complained, so we're putting them back in now.

    parallel installs for the win

    Highly effective pattern for change



    Changed ABI is new ABI; it should have a new name

    Ex: make-struct/no-tail, GUILE_PKG([2.2]), libtool

    So how does one do incompatible change? If "don't" isn't a sufficient answer, then parallel installs is a good strategy. For example in Guile, users don't have to upgrade to 3.0 until they are ready. Guile 2.2 happily installs in parallel with Guile 3.0.

    As another small example, there's a function in Guile called make-struct (old doc link), whose first argument is the number of "tail" slots, followed by initializers for all slots (normal and "tail"). This tail feature is weird and I would like to remove it. Unfortunately I can't just remove the argument, so I had to make a new function, make-struct/no-tail, which exists in parallel with the old version that I can't break.

    deprecation facilitates migration

    __attribute__ ((__deprecated__))
     "(ice-9 mapping) is deprecated."
     "  Use srfi-69 or rnrs hash tables instead.")
      ("Arbiters are deprecated.  "
       "Use mutexes or atomic variables instead.");

    begin-deprecated, SCM_ENABLE_DEPRECATED

    Fortunately there is a way to encourage users to migrate from old interfaces to new ones: deprecation. In Guile this applies to all of our interfaces (binary, source, etc). If a feature is marked as deprecated, we cause its use to issue a warning, ideally at compile-time when users responsible for the package can fix it. You can even add __attribute__((__deprecated__)) on C types!

    the arch-pattern

    Replace, Deprecate, Remove

    All change is possible; question is only length of deprecation period

    Applies to all interfaces

    Guile deprecation period generally one stable series

    Ex: scm_t_uint8; make-struct; Foreign objects; uniform vectors

    Finally, you end up in a situation where you have replaced the old interface and issued deprecation warnings to help users migrate. The next step is to remove the old interface. If you don't do this, you are failing as a project maintainer -- your project becomes literally unmaintainable as it just grows and grows.

    This strategy applies to all changes. The deprecation period may last a while, and it may be that the replacement you built doesn't serve the purpose. There is still a dialog with the users that needs to happen. As an example, I made a replacement for the "SMOB" facility in Guile that allows users to define new types, backed by C interfaces. This new "foreign object" facility might not actually be good enough to replace SMOBs; since I haven't formally deprecatd SMOBs, I don't know yet because users are still using the old thing!

    change produces a new stable point

    Stability within series: only additions

    Corollary: dependencies must be at least as stable as you!

    • for your definition of stable

    • social norms help (GNU, semver)

    Ex: libtool; unistring; gnulib

    In my experience, the old management dictum that "the only constant is change" does not describe software. Guile changes, then it becomes stable for a while. You need an unstable series escape hill-climbing, then once you found your new hill, you start climbing again in the stable series.

    Once you reach your stable point, the projects you rely on need to exhibit the same degree of stability that you envision for your project. You can't build a web site that you expect to maintain for 10 years on technology that fundamentally changes every 6 months. But stable dependencies isn't something you can ensure technically; rather it relies on social norms of who makes the software you use.

    who can crank the motor of history?

    All libraries define languages

    Allow user to evolve the language

    • User functionality: modules (Guix)

    • User syntax: macros (yay Scheme)

    Guile 1.8 perf created tension

    • incorporate code into Guile

    • large C interface “for speed”

    Compiler removed pressure on C ABI

    Empowered users need less from you

    A dialectic process does not progress on its own: it requires actions. As a project maintainer, some of my actions are because I want to do them. Others are because users want me to do them. The user-driven actions are generally a burden and as a lazy maintainer, I want to minimize them.

    Here I think Guile has to a large degree escaped some of the pressures that weigh on other languages, for example Python. Because Scheme allows users to define language features that exist on par with "built-in" features, users don't need my approval or intervention to add (say) new syntax to the language they work in. Furthermore, their work can still compose with the work of others, even if the others don't buy in to their language extensions.

    Still, Guile 1.8 did have a dynamic whereby the relatively poor performance of having to run all code through primitive-eval meant that users were pushed towards writing extensions in C. This in turn pushed Guile to expose all of its guts for access from C, which obviously has led to an overbloated C API and ABI. Happily the work on the Scheme compiler has mostly relieved this pressure, and we may therefore be able to trim the size of the C API and ABI over time.

    contributions and risk

    From maintenance point of view, all interface is legacy

    Guile: Sometimes OK to accept user modules when they are more stable than Guile

    In-tree users keep you honest

    Ex: SSAX, fibers, SRFI

    It can be a good strategy to "sediment" solutions to common use cases into Guile itself. This can improve the minimalism of an entire ecosystem of code. The maintenance burden has to be minimal, however; Guile has sometimes adopted experimental code into its repository, and without active maintenance, it soon becomes stale relative to what users and the module maintainers expect.

    I would note an interesting effect: pieces of code that were adopted into Guile become a snapshot of the coding style at that time. It's useful to have some in-tree users because it gives you a better idea about how a project is seen from the outside, from a code perspective.

    sticky bits

    Memory management is an ongoing thorn

    Local maximum: Boehm-Demers-Weiser conservative collector

    How to get to precise, generational GC?

    Not just Guile; e.g. CPython __del__

    There are some points that resist change. The stickiest of these is the representation of heap-allocated Scheme objects in C. Guile currently uses a garbage collector that "automatically" finds all live Scheme values on the C stack and in registers. It was the right choice at the time, given our maintenance budget. But to get the next bump in performance, we need to switch to a generational garbage collector. It's hard to do that without a lot of pain to C users, essentially because the C language is too weak to express the patterns that we would need. I don't know how to proceed.

    I would note, though, that memory management is a kind of cross-cutting interface, and that it's not just Guile that's having problems changing; I understand PyPy has had a lot of problems regarding changes on when Python destructors get called due to its switch from reference counting to a proper GC.


    We are here: stability

    And then?

    • Parallel-installability for source languages: #lang

    • Sediment idioms from Racket to evolve Guile user base

    Remove myself from “holding the crank”

    So where are we going? Nowhere, for the moment; or rather, up the hill. We just released Guile 3.0, so let's just appreciate that for the time being.

    But as far as next steps in language evolution, I think in the short term they are essentially to further enable change while further sedimenting good practices into Guile. On the change side, we need parallel installability for entire languages. Racket did a great job facilitating this with #lang and we should just adopt that.

    As for sedimentation, we should step back and if any common Guile use patterns built by our users should be include core Guile, and widen our gaze to Racket also. It will take some effort both on a technical perspective but also on a social/emotional consensus about how much change is good and how bold versus conservative to be: putting the dialog into dialectic.

    dialectic, boogie woogie woogie

    #guile on freenode


    Happy hacking!

    Hey that was the talk! Hope you enjoyed the writeup. Again, video and slides available on the FOSDEM web site. Happy hacking!

    by Andy Wingo at February 07, 2020 11:38 AM