Planet Igalia

July 26, 2024

Loïc Le Page

FFmpeg 101

A high-level architecture overview to start with FFmpeg.

FFmpeg package content #

FFmpeg is composed of a suite of tools and libraries.

FFmpeg tools #

The tools can be used to encode/decode/transcode a multitude of different audio and video formats, and to stream the encoded media over networks.

  • ffmpeg: a command line tool to convert multimedia files between formats
  • ffplay: a simple media player based on SDL and the FFmpeg libraries
  • ffprobe: a simple multimedia stream analyzer

FFmpeg libraries #

The libraries can be used to integrate those same features into your own product.

  • libavformat: I/O and muxing/demuxing
  • libavcodec: encoding/decoding
  • libavfilter: graph-based filters for raw media
  • libavdevice: input/output devices
  • libavutil: common multimedia utilities
  • libswresample: audio resampling, samples format conversion and audio mixing
  • libswscale: color conversion and image scaling
  • libpostproc: video post-processing (deblocking/noise filters)

FFmpeg simple player #

A basic usage of FFmpeg is to demux a multimedia stream (obtained from a file or from the network) into its audio and video streams and then to decode those streams into raw audio and raw video data.

To manage the media streams, FFmpeg uses the following structures:

  • AVFormatContext: a high level structure providing sync, metadata and muxing for the streams
  • AVStream: a continuous stream (audio or video)
  • AVCodec: defines how data are encoded and decoded
  • AVPacket: encoded data in the stream
  • AVFrame: decoded data (raw video frame or raw audio samples)

The process used to demux and decode follows this logic:

basic processing

Here is the basic code needed to read an encoded multimedia stream from a file, analyze its content and demux the audio and video streams. These features are provided by the libavformat library, which uses the AVFormatContext and AVStream structures to store the information.

// Allocate memory for the context structure
AVFormatContext* format_context = avformat_alloc_context();

// Open a multimedia file (like an mp4 file or any format recognized by FFmpeg)
avformat_open_input(&format_context, filename, NULL, NULL);
printf("File: %s, format: %s\n", filename, format_context->iformat->name);

// Analyze the file content and identify the streams within
avformat_find_stream_info(format_context, NULL);

// List the streams
for (unsigned int i = 0; i < format_context->nb_streams; ++i)
{
    AVStream* stream = format_context->streams[i];

    printf("---- Stream %02d\n", i);
    printf("  Time base: %d/%d\n", stream->time_base.num, stream->time_base.den);
    printf("  Framerate: %d/%d\n", stream->r_frame_rate.num, stream->r_frame_rate.den);
    printf("  Start time: %" PRId64 "\n", stream->start_time);
    printf("  Duration: %" PRId64 "\n", stream->duration);
    printf("  Type: %s\n", av_get_media_type_string(stream->codecpar->codec_type));

    uint32_t fourcc = stream->codecpar->codec_tag;
    printf("  FourCC: %c%c%c%c\n", fourcc & 0xff, (fourcc >> 8) & 0xff, (fourcc >> 16) & 0xff, (fourcc >> 24) & 0xff);
}

// Close the multimedia file and free the context structure
avformat_close_input(&format_context);

Once we’ve got the different streams from inside the multimedia file, we need to find the specific codecs to decode those streams into raw audio and raw video data. All codecs are statically included in libavcodec. You can easily create your own codec by creating an instance of the FFCodec structure and registering it as an extern const FFCodec in libavcodec/allcodecs.c, but that is a topic for another post.

To find the codec corresponding to the content of an AVStream, we can use the following code:

// Stream obtained from the AVFormatContext structure in the former streams listing loop
AVStream* stream = format_context->streams[i];

// Search for a compatible codec
const AVCodec* codec = avcodec_find_decoder(stream->codecpar->codec_id);
if (!codec)
{
    fprintf(stderr, "Unsupported codec\n");
    continue;
}
printf("  Codec: %s, bitrate: %" PRId64 "\n", codec->name, stream->codecpar->bit_rate);

if (codec->type == AVMEDIA_TYPE_VIDEO)
{
    printf("  Video resolution: %dx%d\n", stream->codecpar->width, stream->codecpar->height);
}
else if (codec->type == AVMEDIA_TYPE_AUDIO)
{
    printf("  Audio: %d channels, sample rate: %d Hz\n",
        stream->codecpar->ch_layout.nb_channels,
        stream->codecpar->sample_rate);
}

With the right codec and codec parameters extracted from the AVStream information, we can now allocate the AVCodecContext structure that will be used to decode the corresponding stream. It is important to remember the index of the stream we want to decode in the streams list seen earlier (format_context->streams), because this index will be used later to identify the demuxed packets extracted by the AVFormatContext.

In the following code we’re going to select the first video stream contained in the multimedia file.

// first_video_stream_index is determined during the streams listing in the former loop
int first_video_stream_index = ...;

AVStream* first_video_stream = format_context->streams[first_video_stream_index];
AVCodecParameters* first_video_stream_codec_params = first_video_stream->codecpar;
const AVCodec* first_video_stream_codec = avcodec_find_decoder(first_video_stream_codec_params->codec_id);

// Allocate memory for the decoding context structure
AVCodecContext* codec_context = avcodec_alloc_context3(first_video_stream_codec);

// Configure the decoder with the codec parameters
avcodec_parameters_to_context(codec_context, first_video_stream_codec_params);

// Open the decoder
avcodec_open2(codec_context, first_video_stream_codec, NULL);

Now that we have a running decoder, we can extract the demuxed packets using the AVFormatContext structure and decode them to raw video frames. For that we need 2 different structures:

  • AVPacket which contains the encoded packets extracted from the input multimedia file,
  • AVFrame which will contain the raw video frame after the AVCodecContext has decoded those packets.

// Allocate memory for the encoded packet structure
AVPacket* packet = av_packet_alloc();

// Allocate memory for the decoded frame structure
AVFrame* frame = av_frame_alloc();

// Demux the next packet from the input multimedia file
while (av_read_frame(format_context, packet) >= 0)
{
    // The demuxed packet uses the stream index to identify the AVStream it is coming from
    printf("Packet received for stream %02d, pts: %" PRId64 "\n", packet->stream_index, packet->pts);

    // In our example we are only decoding the first video stream identified formerly by first_video_stream_index
    if (packet->stream_index == first_video_stream_index)
    {
        // Send the packet to the previously initialized decoder
        int res = avcodec_send_packet(codec_context, packet);
        if (res < 0)
        {
            fprintf(stderr, "Cannot send packet to the decoder: %s\n", av_err2str(res));
            break;
        }

        // The decoder (AVCodecContext) acts like a FIFO queue, we push the encoded packets on one end and we need to
        // poll the other end to fetch the decoded frames. The codec implementation may (or may not) use different
        // threads to perform the actual decoding.

        // Poll the running decoder to fetch all available decoded frames until now
        while (res >= 0)
        {
            // Fetch the next available decoded frame
            res = avcodec_receive_frame(codec_context, frame);
            if (res == AVERROR(EAGAIN) || res == AVERROR_EOF)
            {
                // No more decoded frame is available in the decoder output queue, go to next encoded packet
                break;
            }
            else if (res < 0)
            {
                fprintf(stderr, "Error while receiving a frame from the decoder: %s\n", av_err2str(res));
                goto end;
            }

            // Now the AVFrame structure contains a decoded raw video frame, we can process it further...
            printf("Frame %02" PRId64 ", type: %c, format: %d, pts: %03" PRId64 ", keyframe: %s\n",
                codec_context->frame_num, av_get_picture_type_char(frame->pict_type), frame->format, frame->pts,
                (frame->flags & AV_FRAME_FLAG_KEY) ? "true" : "false");

            // The AVFrame internal content is automatically unreffed and recycled during the next call to
            // avcodec_receive_frame(codec_context, frame)
        }
    }

    // Unref the packet internal content to recycle it for the next demuxed packet
    av_packet_unref(packet);
}

// Free the previously allocated memory for the different FFmpeg structures
end:
    av_packet_free(&packet);
    av_frame_free(&frame);
    avcodec_free_context(&codec_context);
    avformat_close_input(&format_context);
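
As a concrete illustration of what “process it further” could mean, here is a small helper that is not part of the original example: it walks the pixel planes of a decoded frame, assuming the decoder outputs planar YUV 4:2:0 (AV_PIX_FMT_YUV420P), which is typical for H.264. Other pixel formats lay out their planes differently.

// Hypothetical helper: access the raw pixel data of a decoded frame,
// assuming a planar YUV 4:2:0 format.
static void process_frame(const AVFrame* frame)
{
    if (frame->format != AV_PIX_FMT_YUV420P)
        return;

    // Luma plane: frame->height rows of frame->width bytes, each row padded
    // to frame->linesize[0] bytes.
    for (int y = 0; y < frame->height; ++y)
    {
        const uint8_t* luma_row = frame->data[0] + y * frame->linesize[0];
        // ...consume luma_row[0 .. frame->width - 1]...
    }

    // Chroma planes (U in frame->data[1], V in frame->data[2]) are subsampled
    // to half width and half height, with strides frame->linesize[1] and
    // frame->linesize[2].
}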

The behavior of the code above is summarized in the following diagram:

processing diagram

You can find the full code here.

To build the example you will need meson and ninja. If you have python and pip installed, you can install them very easily by calling pip3 install meson ninja. Then, once the example archive is extracted to an ffmpeg-101 folder, go to that folder and call meson setup build. It will automatically download the right version of FFmpeg if you don’t already have it installed on your system. Then call ninja -C build to build the code, and ./build/ffmpeg-101 sample.mp4 to run it.

You should obtain the following result:

File: sample.mp4, format: mov,mp4,m4a,3gp,3g2,mj2
---- Stream 00
  Time base: 1/3000
  Framerate: 30/1
  Start time: 0
  Duration: 30000
  Type: video
  FourCC: avc1
  Codec: h264, bitrate: 47094
  Video resolution: 206x80
---- Stream 01
  Time base: 1/44100
  Framerate: 0/0
  Start time: 0
  Duration: 440320
  Type: audio
  FourCC: mp4a
  Codec: aac, bitrate: 112000
  Audio: 2 channels, sample rate: 44100 Hz
Packet received for stream 00, pts: 0
Send video packet to decoder...
Frame 01, type: I, format: 0, pts: 000, keyframe: true
Packet received for stream 00, pts: 100
Send video packet to decoder...
Frame 02, type: P, format: 0, pts: 100, keyframe: false
Packet received for stream 00, pts: 200
Send video packet to decoder...
Frame 03, type: P, format: 0, pts: 200, keyframe: false
Packet received for stream 00, pts: 300
Send video packet to decoder...
Frame 04, type: P, format: 0, pts: 300, keyframe: false
Packet received for stream 00, pts: 400
Send video packet to decoder...
Frame 05, type: P, format: 0, pts: 400, keyframe: false
Packet received for stream 00, pts: 500
Send video packet to decoder...
Frame 06, type: P, format: 0, pts: 500, keyframe: false
Packet received for stream 00, pts: 600
Send video packet to decoder...
Frame 07, type: P, format: 0, pts: 600, keyframe: false
Packet received for stream 00, pts: 700
Send video packet to decoder...
Frame 08, type: P, format: 0, pts: 700, keyframe: false
Packet received for stream 01, pts: 0
Packet received for stream 01, pts: 1024
Packet received for stream 01, pts: 2048
Packet received for stream 01, pts: 3072
Packet received for stream 01, pts: 4096
Packet received for stream 01, pts: 5120
Packet received for stream 01, pts: 6144
Packet received for stream 01, pts: 7168
Packet received for stream 01, pts: 8192
Packet received for stream 01, pts: 9216
Packet received for stream 01, pts: 10240
Packet received for stream 01, pts: 11264
Packet received for stream 01, pts: 12288
Packet received for stream 01, pts: 13312
Packet received for stream 01, pts: 14336
Packet received for stream 01, pts: 15360
Packet received for stream 01, pts: 16384
Packet received for stream 01, pts: 17408
Packet received for stream 01, pts: 18432
Packet received for stream 01, pts: 19456
Packet received for stream 01, pts: 20480
Packet received for stream 01, pts: 21504
Packet received for stream 00, pts: 800
Send video packet to decoder...
Frame 09, type: P, format: 0, pts: 800, keyframe: false
Packet received for stream 00, pts: 900
Send video packet to decoder...
Frame 10, type: P, format: 0, pts: 900, keyframe: false

July 26, 2024 12:00 AM

July 24, 2024

Andy Wingo

whippet progress update: funding, features, future

Greets greets! Today, an update on recent progress in Whippet, including sponsorship, a new collector, and a new feature.

the lob, the pitch

But first, a reminder of what the haps: Whippet is a garbage collector library. The target audience is language run-time authors, particularly “small” run-times: wasm2c, Guile, OCaml, and so on; to a first approximation, the kinds of projects that currently use the Boehm-Demers-Weiser collector.

The pitch is that if you use Whippet, you get a low-fuss small dependency to vendor into your source tree that offers you access to a choice of advanced garbage collectors: not just the conservative mark-sweep collector from BDW-GC, but also copying collectors, an Immix-derived collector, generational collectors, and so on. You can choose the GC that fits your problem domain, like Java people have done for many years. The Whippet API is designed to be a no-overhead abstraction that decouples your language run-time from the specific choice of GC.

I co-maintain Guile and will be working on integrating Whippet in the next months, and have high hopes for success.

bridgeroos!

I’m delighted to share that Whippet was granted financial support from the European Union via the NGI zero core fund, administered by the Dutch non-profit, NLnet foundation. See the NLnet project page for the overview.

This funding allows me to devote time to Whippet to bring it from proof-of-concept to production. I’ll finish the missing features, spend some time adding tracing support, measuring performance, and sanding off any rough edges, then work on integrating Whippet into Guile.

This bloggery is a first update of the progress of the funded NLnet project.

a new collector!

I landed a new collector a couple weeks ago, a parallel copying collector (PCC). It’s like a semi-space collector, in that it always evacuates objects (except large objects, which are managed in their own space). However instead of having a single global bump-pointer allocation region, it breaks the heap into 64-kB blocks. In this way it supports multiple mutator threads: mutators do local bump-pointer allocation into their own block, and when their block is full, they fetch another from the global store.

The block size is 64 kB, but really it’s 128 kB, because each block has two halves: the active region and the copy reserve. It’s a copying collector, after all. Dividing each block in two allows me to easily grow and shrink the heap while ensuring there is always enough reserve space.

Blocks are allocated in 64-MB aligned slabs, so you get 512 blocks in a slab. The first block in a slab is used by the collector itself, to keep metadata for the rest of the blocks, for example a chain pointer allowing blocks to be collected in lists, a saved allocation pointer for partially-filled blocks, whether the block is paged in or out, and so on.
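
To make the allocation path more concrete, here is a rough C sketch of per-mutator bump-pointer allocation against a shared block store. It is not Whippet’s actual code: the names, the block layout (no copy reserve, no slab metadata) and the simplified lock-free pop are assumptions made purely for illustration.

/* Illustrative sketch only; not Whippet's API. */
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE (64 * 1024)

struct block {
    struct block *next;   /* chain pointer while the block sits in the store */
    uint8_t data[];       /* bump-allocation region */
};

struct mutator {
    uint8_t *hp;          /* bump pointer into the current block */
    uint8_t *limit;       /* end of the current block */
};

static _Atomic(struct block *) block_store;   /* global list of empty blocks */

static struct block *acquire_block(void) {
    /* Simplified pop; a real implementation must deal with ABA, etc. */
    struct block *head = atomic_load(&block_store);
    while (head && !atomic_compare_exchange_weak(&block_store, &head, head->next))
        ;
    return head;
}

static void *allocate(struct mutator *m, size_t bytes) {
    if ((size_t)(m->limit - m->hp) < bytes) {   /* current block is full */
        struct block *b = acquire_block();
        if (!b)
            return NULL;                        /* out of blocks: trigger a collection */
        m->hp = b->data;
        m->limit = (uint8_t *)b + BLOCK_SIZE;
    }
    void *obj = m->hp;
    m->hp += bytes;
    return obj;
}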

The PCC not only supports parallel mutators, it can also trace in parallel. This mechanism works somewhat like allocation, in which multiple trace workers compete to evacuate objects into their local allocation buffers; when an allocation buffer is full, the trace worker grabs another, just like mutators do.

However, unlike the simple semi-space collector which uses a Cheney grey worklist, the PCC uses the fine-grained work-stealing parallel tracer originally developed for Whippet’s Immix-like collector. Each trace worker maintains a local queue of objects that need tracing, which currently has 1024 entries. If the local queue becomes full, the worker will publish 3/4 of those entries to the worker’s shared worklist. When a worker runs out of local work, it will first try to remove work from its own shared worklist, then will try to steal from other workers.

Of course, because threads compete to evacuate objects, we have to use atomic compare-and-swap instead of simple forwarding pointer updates; if you only have one mutator thread and are OK with just one tracing thread, you can avoid the ~30% performance penalty that atomic operations impose. The PCC generally starts to win over a semi-space collector when it can trace with 2 threads, and gets better with each thread you add.

I sometimes wonder whether the worklist should contain grey edges or grey objects. MMTk seems to do the former, and bundles edges into work packets, which are the unit of work sharing. I don’t know yet what is best and look forward to experimenting once I have better benchmarks.

Anyway, maintaining an external worklist is cheating in a way: unlike the Cheney worklist, this memory is not accounted for as part of the heap size. If you are targeting a microcontroller or something, probably you need to choose a different kind of collector. Fortunately, Whippet enables this kind of choice, as it contains a number of collector implementations.

What about benchmarks? Well, I’ll be doing those soon in a more rigorous way. For now I will just say that it seems to behave as expected and I am satisfied; it is useful to have a performance oracle against which to compare other collectors.

finalizers!

This week I landed support for finalizers!

Finalizers work in all the collectors: semi, pcc, whippet, and the BDW collector that is a shim to use BDW-GC behind the Whippet API. They have a well-defined relationship with ephemerons and are multi-priority, allowing embedders to build guardians or phantom references on top.

In the future I should do some more work to make finalizers support generations, if the collector is generational, allowing a minor collection to avoid visiting finalizers for old objects. But this is a straightforward extension that will happen at some point.

future!

And that’s the end of this update. Next up, I am finally going to tackle heap resizing, using the membalancer approach. Then basic Windows and Mac support, then I move on to the tracing and performance measurement phase. Onwards and upwards!

by Andy Wingo at July 24, 2024 09:19 AM

July 22, 2024

Eric Meyer

Design for Real Life News!

If you’re reading this, odds are you’ve at least heard of A Book Apart (ABA), who published Design for Real Life, which I co-wrote with Sara Wachter-Boettcher back in 2016.  What you may not have heard is that ABA has closed up shop.  There won’t be any more new ABA titles, nor will ABA continue to sell the books in their catalog.

That’s the bad news.  The great news is that ABA has transferred the rights for all of its books to their respective authors! (Not every ex-publisher does this, and not every book contract demands it, so thanks to ABA.) We’re all figuring out what to do with our books, and everyone will make their own choices.  One of the things Sara and I have decided to do is to eventually put the entire text online for free, as a booksite.  That isn’t ready yet, but it should be coming somewhere down the road.

In the meantime, we’ve decided to cut the price of print and e-book copies available through Ingram.  DfRL was the eighteenth book ABA put out, so we’ve decided to make the price of both the print and e-book $18, regardless of whether those dollars are American, Canadian, or Australian.  Also €18 and £18.  Basically, in all five currencies we can define, the price is 18 of those.

…unless you buy it through Apple Books; then it’s 17.99 of every currency, because the system forces us to make it cheaper than the list price and also have the amount end in .99.  Obversely, if you’re buying a copy (or copies) for a library, the price has to be more than the list price and also end in .99, so the library cost is 18.99 currency units.  Yeah, I dunno either.

At any rate, compared to its old price, this is a significant price cut, and in some cases (yes, Australia, we’re looking at you) it’s a huge discount.  Or, at least, it will be at a discount once online shops catch up.  The US-based shops seem to be up to date, and Apple Books as well, but some of the “foreign” (non-U.S.) sources are still at their old prices.  In those cases, maybe wishlist or bookmark or something and keep an eye out for the drop.  We hope it will become global by the end of the week.  And hey, as I write this, a couple of places have the ebook version for like 22% less than our listed price.

So!  If you’ve always thought about buying a copy but never got around to it, now’s a good time to get a great deal.  Ditto if you’ve never heard of the book but it sounds interesting, or you want it in ABA branding, or really for any other reason you have to buy a copy now.

I suppose the real question is, should you buy a copy?  We’ll grant that some parts of it are a little dated, for sure.  But the concepts and approaches we introduced can be seen in a lot of work done even today.  It made significant inroads into government design practices in the UK and elsewhere, for example, and we still hear from people who say it really changed how they think about design and UX.  We’re still very proud of it, and we think anyone who takes the job of serving their users seriously should give it a read.  But then, I guess we would, or else we’d never have written it in the first place.

And that’s the story so far.  I’ll blog again when the freebook is online, and if anything else changes as we go through the process.  Got questions?  Leave a comment or drop me a line.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at July 22, 2024 03:22 PM

Andy Wingo

finalizers, guardians, phantom references, et cetera

Consider guardians. Compared to finalizers, in which the cleanup procedures are run concurrently with the mutator, by the garbage collector, guardians allow the mutator to control concurrency. See Guile’s documentation for more notes. Java’s PhantomReference / ReferenceQueue seems to be similar in spirit, though the details differ.

questions

If we want guardians, how should we implement them in Whippet? How do they relate to ephemerons and finalizers?

It would be a shame if guardians were primitive, as they are a relatively niche language feature. Guile has them, yes, but really what Guile has is bugs: because Guile implements guardians on top of BDW-GC’s finalizers (without topological ordering), all the other things that finalizers might do in Guile (e.g. closing file ports) might run at the same time as the objects protected by guardians. For the specific object being guarded, this isn’t so much of a problem, because when you put an object in the guardian, it arranges to prepend the guardian finalizer before any existing finalizer. But when a whole clique of objects becomes unreachable, objects referred to by the guarded object may be finalized. So the object you get back from the guardian might refer to, say, already-closed file ports.

The guardians-are-primitive solution is to add a special guardian pass to the collector that will identify unreachable guarded objects. In this way, the transitive closure of all guarded objects will be already visited by the time finalizables are computed, protecting them from finalization. This would be sufficient, but is it necessary?

answers?

Thinking more abstractly, perhaps we can solve this issue and others with a set of finalizer priorities: a collector could have, say, 10 finalizer priorities, and run the finalizer fixpoint once per priority, in order. If no finalizer is registered at a given priority, there is no overhead. A given embedder (Guile, say) could use priority 0 for guardians, priority 1 for “normal” finalizers, and ignore the other priorities. I think such a facility could also support other constructs, including Java’s phantom references, weak references, and reference queues, especially when combined with ephemerons.

Anyway, all this is a note for posterity. Are there other interesting mutator-exposed GC constructs that can’t be implemented with a combination of ephemerons and multi-priority finalizers? Do let me know!

by Andy Wingo at July 22, 2024 09:27 AM

July 18, 2024

Brian Kardell

927: Thoughts on a Global Design System

927: Thoughts on a Global Design System

My thoughts on "A Global Design System" as is being discussed in OpenUI.

As you may or may not be aware, there's been recent discussion in OpenUI, brought forward by an effort by my fellow Pittsburgher Brad Frost, about the group taking on the effort of creating a global design system.

First, let me say that the problem that Brad describes is real, and also not new. He and I have discussed this in the past as well. I've spent a lot (the majority maybe) of my career (which began in the 90s) working on projects that were either using, evaluating or making their own common controls.

So much wasted energy

While explaining this, Brad frequently notes that inventing and reinventing the same things over and over wastes an enormous amount of human potential. We could be spending that time better.

I mean... Yes. I agree.

But, even more than that, the time spent re-inventing is only part of the story. The status quo is good for approximately no one. It also has multiplicative effects far beyond just the actual reinvention.

There might be 100 toolkits/component libraries which combined represent 100k hours of invested effort, and yeah, that's a huge amount of time... Those hours are also wildly skewed. One might have 10x or even 100x the thought, care, review and testing of another.

But while there might be thousands of people spending time re-inventing, there are millions of authors who need components - and so many are spending at least a few hours, or maybe in some cases days searching for suitable components. I've been involved in corporate evaluations that were weeks of time. And it's hard to evaluate them and make good choices that consider accessibility, responsiveness, design, and internationalization. It is not only time-consuming, we often don't have the skills to do it. That is, after all, one of the reasons we want them: So that we don't each have to know all that stuff.

But then, how do we expect authors to make a good choice?

Sometimes the ones with the least effort put into them can have a great looking web site, nice documentation, charismatic promotion, or be somehow associated with a big tech company. Too often we wind up choosing components by proxy and just assuming that something else must mean it's good, and will last a long time. However, history has not borne that out — see the various component toolkits and design systems from even big orgs like Microsoft and Google, for example, that fell by the wayside.

But yeah - multiply that time out... What all of this currently creates is bad all around. All of those millions of developers looking, and ultimately unable to make well-informed choices, probably adds up to tens of millions of hours, by comparison.

In the end, many give up and re-implement again, making the problem even worse.

Each one might introduce tiny variations and accidentally invent something subtly new and create new challenges for users that we'll spend years sorting out too.

Ugh. It's bad. We should want a better future, and we should act on that.

Imagining a Better Future

Here's where I believe we get into trouble though: We have to be clear on what we are imagining, and whether it is practical/pragmatic to deliver adequate value in a reasonable timeframe.

Native HTML?

We could, for example, choose to imagine that HTML can be given a great and complete set of elements representing a complete UI toolkit. In addition to correcting all of the issues with the elements we've added so far, this means adding powerful grids connected to data, tabsets, notifications, carousels, charts, and so on.

Can it? Eventually, maybe, but I hope it is not controversial to suggest that it is extremely unlikely that we could accomplish this with the necessary qualities and in a reasonable timeframe. There's just no information or insight I have that gives me hope that focusing only on that scenario is a good idea.

This is a good end-goal for many components, but it's not where to start. It's hard and time consuming and gated on very specific and limited participation of a small number of people. HTML itself moves slow, on purpose.

I think HTML is at the end of 99 other steps...

The real question, I believe, is about improving how we get there, and deliver iterative partial value along the way.

New Web Components Reference Implementations?

It's been suggested that we could work on a single standard with a reference implementation for each component.

I do believe that ultimately this is a good goal, but I'd like to suggest that it's not where to start either.

In some ways the challenges here are smaller than trying to add it to HTML: it doesn't require browser vendors to act in concert, sure. We can iterate on it, sure. But the challenges are still huge, and we'd be trading knowns for unknowns.

Instead of needing to convince 3 browser vendors to act in concert, we have to convince several UI kit vendors and developers to participate. We also have to convince everyone to use it and try to avoid XKCD 927 territory...

XKCD 927
Situation: There are 14 Competing Standards
Person 1: 14? Ridiculous! We need to develop one universal standard that covers everyone's use cases!
Person 2: Yeah!
Situation: There are 15 Competing Standards

This is exacerbated by the fact that it won't come all at once. It'll still be a non-trivial amount of time before we have a whole library of components which could reasonably be promoted for use. It still requires people with expertise (probably many of the same people as if it were native) to participate for reviewing accessibility, usability, internationalization, etc. In practice, there are just very finite resources available to put toward large scale, long term cooperation. Practically speaking, it seems likely we could only focus on a couple of components at a time.

Let's say we finish and ship the first component: Tabs. Can we really call it a global design system if it has just one component? Won't that really limit who can/will adopt it?

Adopt, modify and bless an library

It's been suggested that we could take up a library as a kind of a 'donation' project to provide a starting point. Specifically, maintainers from Shoelace/Web Awesome (also formerly MS components) have volunteered components for this purpose. Not as a "this is the thing" but a "this is a start". That would give us a nice leap forward.

Yeah, it would.

Except... Doesn't it raise a lot of questions we have to answer anyways?

First, but maybe not as importantly: Why that one? That goes to legitimacy. We should be able to explain why this is not just the first attractive looking opportunity that presented itself.

More importantly, it seems to me that the rest of the situation described above remains largely unchanged. We can't seriously promote that until it is deemed "good", and practically speaking it seems that we will approve them individually, not as a library. So, can't we define how we think it should work before we worry about picking a library?

The most obvious thing we could have ever done that with was jQuery, and we didn't.

I think that a library of reference implementations that we can agree to and recommend is still very far along the timeline...

The real question, I believe, is about improving how we get there, and deliver iterative partial value along the way.

We still don't have a great way to evolve the web - but I keep saying that I think we should.

How I think we could get there...

This is what I want more than anything: A plan to get there. Reasonable steps of value along the way, comparatively quickly.

It is effectively what I thought in 2013-2014 too. I suggested to the W3C Advisory Committee that we needed to rethink how we approach standards to include this sort of idea, which could work more like languages/dictionaries. I tried to suggest the W3C should create such a process/listing/review process.

What follows is a vague outline of what I imagine:

I'd like to create a central place where we lay out some new rules and a process where components, in a basic form that we agree to (as a module, using shadow dom or not, etc.), can be submitted.

What are the criteria? That's the first few steps...

We'd define some criteria gating submission, first with IP/license agreements we agree to, possibly some kind of bar for contributors or stars or something, but mainly: A commitment of participation in this new process, especially from experts. Honestly, participation is a bigger part of the limiting factor than anyone really imagines.

Once submitted it would undergo wide review and get some kind of 'verification stamps' for each kind (a11y, i18n, etc).

For this reason, I would really love to try to include the authors of government tools here. They are legally mandated and funded to solve the problem already and seem highly incentivized to participate. A collective of government efforts also lends immediate credibility and sense of independence to it.

To me, ideally, we would begin with a call for components/participation.

A call for participation/submissions...

You might have noticed...

You might have noticed that I didn't answer the question of "how do we pick one?" That's because I think that's like 99 steps down the road and will come naturally.

We can get a set of people who can contribute tabs, and a set of people who can review, and we can all discuss several of them at the same time. We can begin to lay out conformance criteria, and give each one little 'conformance stamps' along the way. Inevitably we can more easily get implementations to align and develop universal definitions and tests -- new stamps to be had.

Components get conformance stamps...

For authors, along the way, there's a nice central catalog somewhere, like webcomponents.org, but better. You'll know those have been submitted, and which ones have which conformance stamps. Maybe there isn't a 'the one', yet. But, it's ok? You have a smaller set, and the information you really need to choose one. Maybe all 3 of them are ... fine?

That's not the worst thing, we can sit back and evaluate it for a while while already saving ourselves collectively millions of hours and our users a lot of pain.

In fact, collecting data and a little variation is good. Probably, they continue to align, or one begins to be the clearer winner.

We have very well defined, portable criteria for testing and more or less 1 definition...

And, that's the point: as we go, we would move forward slowly, but without stopping major progress at any point. Even if nothing more happens, each of those steps has had real value. No one has just wasted time.

Then, maybe we can get somewhere where we have a single reference implementation of all of those things - or even a standard almost identical to them.

We have a true global reference implementation... Should we bake it into HTML?

In any case, that's how I would prefer to approach it. I wouldn't call it a "global design system" to start, because I wouldn't even start out assuming there would be only one of anything initially... But eventually.

July 18, 2024 04:00 AM

Igalia Compilers Team

Summary of the June 2024 TC39 plenary in Helsinki

In June, many colleagues from Igalia participated in a TC39 meeting organized in Helsinki by Aalto University and Mozilla to discuss proposed features for the JavaScript standard alongside delegates from various other organizations.

Let's delve together into some of the most exciting updates!

You can also read the full agenda and the meeting minutes on GitHub.

Day 1 #

import defer to Stage 2.7 igalia logo #

The import defer proposal allows pre-loading modules while deferring their evaluation until a later time. This proposal aims at giving developers better tools to optimize the startup performance of their application.

As soon as some code needs the variables exported by a deferred module, it will be synchronously evaluated to immediately give access to its value:

// This does not cause evaluation of my-mod or its dependencies:
import defer * as myMod from "./my-mod.js";

$button.addEventListener("click", () => {
  // but this does!
  console.log("val:", myMod.val);
});

This is similar to, when using CommonJS, moving a require(...) call from the top-level of a module to inside a function. Adding a similar capability to ES modules is one further step towards helping people migrate from CommonJS to ESM.
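
For comparison, here is the CommonJS pattern being mirrored (a minimal sketch reusing the names from the example above):

$button.addEventListener("click", () => {
  // With CommonJS, moving require() into the handler defers both the loading
  // and the evaluation of the module until the first click:
  const myMod = require("./my-mod.js");
  console.log("val:", myMod.val);
});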

The proposal reached stage 2.7 after answering a few questions centered around the behavior of top-level await: modules with top-level await will not be deferred even if imported through import defer, because they cannot be synchronously evaluated. If your application can easily handle asynchronous deferred evaluation of modules, it can just as well use dynamic import().

The proposal now needs to have test262 tests written, to be able to go to Stage 3.

Promise.try to Stage 3 #

The new Promise.try helper allows calling a function that might or might not be asynchronous, unifying the error handling paths. This is useful for asynchronous APIs (which should always signal errors by returning a rejected promise) that interact with user-provided callbacks.

Consider this example:

type AsyncNegator = (val: number) => number | Promise<number>;
function subtract(a: number, b: number, negate: AsyncNegator): Promise<number> {
  return Promise.resolve(negate(b)).then(negB => a + negB);
}

While the developer here took care of wrapping negate's result in Promise.resolve, in case negate returns a number directly, what happens if negate throws an error? In that case, subtract will throw synchronously rather than returning a rejected promise!

With Promise.try, you can easily handle both the success and error paths correctly:

type AsyncNegator = (val: number) => number | Promise<number>;
function subtract(a: number, b: number, negate: AsyncNegator): Promise<number> {
  return Promise.try(() => negate(b)).then(negB => a + negB);
}

Day 2 #

Source maps update #

Source maps are an important tool in a developer's toolbox: they are what lets you debug transpiled/minified code in your editor or browser, while still stepping through your original hand-authored code.

While they are supported by most tools and browsers, there hasn't been a shared standard that defines how they should work. Tools and browsers all have to peek at what the others are doing to understand how to properly implement them, and this situation makes it very difficult to evolve the format to improve the debugging experience.

TC39 recently picked up the task of formalizing the standard, as well as adding new features such as the scopes proposal that would let devtools better understand renamed variables and inlined functions.

Iterator.zip to Stage 2.7 #

TC39 is working on many helpers for more easily working with iterators ("lazy lists", that only produce values as needed). While most of them are in the Iterator Helpers proposal, this one is advancing on its own.

Iterator.zip allows pairing values coming from multiple iterators:

function* getNums(start = 0, step = 1) {
  for (let val = start; ; val += step) yield val;
}

let naturals = getNums();
let evens = getNums(0, 2);
let negatives = getNums(-1, -1);

// an iterator of [a natural, an even, a negative]
let allTogether = Iterator.zip([naturals, evens, negatives]);

console.log(allTogether.next().value); // [0, 0, -1]
console.log(allTogether.next().value); // [1, 2, -2]
console.log(allTogether.next().value); // [2, 4, -3]

This proposal, like import defer, just reached the new Stage 2.7: it will now need test262 tests to be eligible for Stage 3.

Temporal reduction igalia logo #

Temporal is one of the longest awaited features of JavaScript, advancing bit by bit on its path to stage 4 as obstacles are removed. For the last 6 months or so we have been working on removing one of the final obstacles: addressing feedback from JS engines on the size and complexity of the proposal, which culminated in this meeting.

As we get closer to having shipping implementations, it's become clear that the size of Temporal was an obstacle for platforms such as low-end Android devices: it added a large chunk to the size of the JS engine all at once. So, Philip Chimento and Justin Grant presented a slate of API removals to make the proposal smaller.

What was removed? Some methods previously existed for convenience, but were removed as somewhat redundant because there was a one-line way to accomplish the same thing. A more substantial removal was Temporal.Calendar and Temporal.TimeZone objects, along with the ability to extend them to implement custom calendars and custom time zones. We've received feedback that these have been the most complicated parts of the proposal for implementations, and they've also been where the most bugs have popped up. As well, due to the existence of formats like jsCalendar (RFC 8984), as well as learning more about the drawbacks of a callback-based design, we believe there are better designs possible for custom time zones and calendars than there were when the feature was designed.

Most of the slate of removals was adopted, and Temporal continues its journey to stage 4 smaller than it was before. You can follow the progress in this ticket on Temporal's issue tracker.

Day 3 #

Decimal update igalia logo #

If you're tired of the fact that 0.1 + 0.2 is not 0.3 in JavaScript, then the decimal proposal is for you! This proposal, which is currently at stage 1, was presented by Jesse Alama. The goal was to present an update about some changes to the API, and go through the status of the proposal's spec text. Although most of the committee was generally supportive of allowing this proposal to go to stage 2, it remains at stage 1 due to some concerns about missing details in the spec text and the overall motivation of the proposal.
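
For reference, the binary floating-point behavior that motivates the proposal:

console.log(0.1 + 0.2);          // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3);  // false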

Discard bindings update #

The intention of discard bindings is to formalize a pattern commonly seen out there in the wild of JavaScript:

let [ _, rhs ] = s.split(".");

Notice that underscore character (_)? Although our intention is to signal to the readers of our code that we don't care what is on the left-hand side of the . in our string, _ is actually a valid identifier. It might even contain a big value, which takes up memory. This is just the tip of the iceberg; things get even more complex when one imagines binding -- but not using! -- more elaborate entities. We would like to use _ -- or perhaps something else, like void -- to signal to the JavaScript engine that it can throw away whatever value _ might have held. Ron Buckton presented an update about this proposal and successfully advanced it to Stage 2 in the TC39 process.

Signals update #

Signals is an ambitious proposal that draws on the various approaches to reactivity found in JS frameworks such as Vue, React, and so on. The idea is to bring some form of reactivity into the JS core. Of course, different approaches to reactivity are possible, and should remain valid; the idea of the Signals proposal is to provide some common abstraction that different approaches can build on. Dan Ehrenberg presented an update on this new proposal, which is currently at stage 1. A lot of work remains to be done; Dan explicitly said that a request to move Signals to stage 2 might take at least a year.

July 18, 2024 12:00 AM

July 12, 2024

Georges Stavracas

Profiling a web engine

One topic that interests me endlessly is profiling. I’ve covered this topic many times in this blog, but not enough to risk sounding like a broken record yet. So here we are again!

Not everyone may know this but GNOME has its own browser, Web (a.k.a. Epiphany, or Ephy for the intimates). It’s a fairly old project, descendant of Galeon. It uses the GTK port of WebKit as its web engine.

The recent announcement that WebKit on Linux (both WebKitGTK and WPE WebKit) switched to Skia for rendering brought with it a renewed interest in measuring the performance of WebKit.

And that was only natural; prior to that, WebKit on Linux was using Cairo, which is entirely CPU-based, whereas Skia had both CPU and GPU-based rendering easily available. The CPU renderer mostly matches Cairo in terms of performance and resource usage. Thus one of the big promises of switching to Skia was better hardware utilization and better overall performance by switching to the GPU renderer.

A Note About Cairo

Even though nowadays we often talk about Cairo as a legacy piece of software, there’s no denying that Cairo is really good at what it does. Cairo can be, and often is, extremely fast at 2D rendering on the CPU, especially for small images with simple rendering. Cairo has received optimizations and improvements for this specific use case for almost 20 years, and it is definitely not a low bar to beat.

I think it’s important to keep this in mind because, as tempting as it may sound, simply switching to use GPU rendering doesn’t necessarily imply better performance.

Guesswork is a No-No

Optimizations should always be a byproduct of excellent profiling. Categorically speaking, meaningful optimizations are a consequence of instrumenting the code so much that the bottlenecks become obvious.

I think the most important and practical lesson I’ve learned is: when I’m guessing what are the performance issues of my code, I will be wrong pretty much 100% of the time. The only reliable way to optimize anything is to have hard data about the behavior of the app.

I mean, so many people – myself included – were convinced that GNOME Software was slow due to Flatpak that nobody thought about looking at app icons loading.

Enter the Profiler

Thanks to the fantastic work of Søren Sandmann, Christian Hergert, et al, we have a fantastic modern system profiler: Sysprof.

Sysprof offers a variety of instruments to profile the system. The most basic one uses perf to gather stack traces of the processes that are running. Sysprof also supports time marks, which allow plotting specific events and timings in a timeline. Sysprof also offers extra instrumentation for more specific metrics, such as network usage, graphics, storage, and more.

  • Screenshot of Sysprof's callgraph view
  • Screenshot of Sysprof's flamegraphs view
  • Screenshot of Sysprof's mark chart view
  • Screenshot of Sysprof's waterfall view

All these metrics are super valuable when profiling any app, but they’re particularly useful for profiling WebKit.

One challenging aspect of WebKit is that, well, it’s not exactly a small project. A WebKit build can easily take 30~50min. You need a fairly beefy machine to even be able to build a debug build of WebKit. The debug symbols can take hundreds of megabytes. This makes WebKit particularly challenging to profile.

Another problem is that Sysprof marks require integration code. Apps have to purposefully link against, and use, libsysprof-capture to send these marks to Sysprof.

Integrating with Sysprof

As a first step, Adrian brought the libsysprof-capture code into the WebKit tree. As libsysprof-capture is a static library with minimal dependencies, this was relatively easy. We’re probably going to eventually remove the in-tree copy and switch to host system libsysprof-capture, but having it in-tree was enough to kickstart the whole process.

Originally I started sprinkling Sysprof code all around the WebKit codebase, and to some degree, it worked. But eventually I learned that WebKit has its own macro-based tracing mechanism that is only ever implemented for Apple builds.

Looking at it, it didn’t seem impossible to implement these macros using Sysprof, and that’s what I’ve been doing for the past few weeks. The review was lengthy but behold, WebKit now reports Sysprof marks!

Screenshot of Sysprof with WebKit marks highlighted

Right now these marks cover a variety of JavaScript events, layout and rendering events, and web page resources. This all came for free from integrating with the preexisting tracing mechanism!

This gives us a decent understanding of how the Web process behaves. It’s not yet complete enough, but it’s a good start. I think the most interesting data to me is correlating frame timings across the whole stack, from the kernel driver to the compositor to GTK to WebKit’s UI process to WebKit’s Web process, and back:

Screenshot of Sysprof with lots of compositor and GTK and WebKit marks

But as interesting as it may be, oftentimes the fun part of profiling is being surprised by the data you collect.

For example, in WebKit, one specific, seemingly innocuous, completely bland method is in the top 3 of the callgraph chart:

Screenshot of Sysprof showing the callgraph view with an interesting result highlighted

Why is WebCore::FloatRect::contains so high in the profiling? That’s what I’m investigating right now. Who guessed this specific method would be there? Nobody, as far as I know.

Once this is out in a stable release, anyone will be able to grab a copy of GNOME Web, and run it with Sysprof, and help find out any performance issues that only reproduce in particular combinations of hardware.

Next Plans

To me this already is a game changer for WebKit, but of course we can do more. Besides the rectangular surprise, and one particular slowdown that comes from GTK loading Vulkan on startup, no other big obvious data point popped up. As for the marks in particular, I think their coverage is still fairly small compared to what it could be.

We need more data.

Some ideas that are floating right now:

  • Track individual frames and correlate them with Sysprof marks
  • Measure top-to-bottom-to-top latency
  • Measure input delay
  • Integrate with multimedia frames

Perhaps this will allow us to make WebKit the prime web engine for Linux, with top-tier performance, excellent system integration, and more. Maybe we can even redesign the whole rendering architecture of WebKit on Linux to be more GPU friendly now. I can dream high, can’t I? 🙂

In any case, I think we have a promising and exciting time ahead for WebKit on Linux!

by Georges Stavracas at July 12, 2024 12:42 PM

July 10, 2024

Andy Wingo

copying collectors with block-structured heaps are unreliable

Good day, garbage pals! This morning, a quick note on “reliability” and garbage collectors, how a common GC construction is unreliable, and why we choose it anyway.

on reliability

For context, I’m easing back into Whippet development. One of Whippet’s collectors is a semi-space collector. Semi-space collectors are useful as correctness oracles: they always move objects, so they require their embedder to be able to precisely enumerate all edges of the object graph, to update those edges to point to the relocated objects. A semi-space collector is so simple that if there is a bug, it is probably in the mutator rather than the collector. They also have well-understood performance, and as such are useful when comparing the performance of other collectors.

But one other virtue of the simple semi-space collector is that it is reliable, in the sense that given a particular set of live objects, allocated in any order, there is a single heap size at which the allocation (and collection) will succeed, and below which the program fails (not enough memory). This is because all allocations go in the same linear region, collection itself doesn’t allocate memory, the amount of free space after an object (the fragmentation) does not depend on where it is allocated, and those object extents just add up in a commutative way.

Reliability is a virtue. Sometimes it is a requirement: for example, the Toit language and run-time target embedded microcontrollers, and there you have finite resources: either the workload fits or it doesn’t. You can’t really tolerate a result of “it works sometimes”. So, Toit uses a generational semi-space + mark-sweep collector that never makes things worse.

on block-structured heaps

But, threads make reliability tricky. With Whippet I am targeting embedders with multiple mutator threads, and a classic semi-space collector doesn’t scale – you could access the allocation pointer atomically, but that would be a bottleneck, serializing mutators, not to mention the cache contention.

The usual solution for this problem is to arrange the heap in such a way that different threads can allocate in different areas, so they don’t need to share an allocation pointer and so they don’t write to the same cache lines. And, the most common way to do this is to use a block-structured heap; for example you might have a 256 MB heap, but divided into 4096 blocks, each of which is 64 kB. That’s enough granularity to dynamically partition out space between many threads: you keep a list of available blocks and allocator threads compete to grab fresh blocks as needed. There’s central contention on the block list, so you want blocks big enough that you aren’t fetching blocks too often.

To break a heap into blocks requires a large-object space, to allow for allocations that are larger than a block. And actually, as I mentioned in the article about block-structured heaps, usually you choose a threshold for large object allocations that is smaller than the block size, to limit the maximum and expected amount of fragmentation at the end of each block, when an allocation doesn’t fit.

on unreliability

Which brings me to my point: a copying collector with a block-structured heap is unreliable, in the sense that there is no single heap size below which the program fails and above which it succeeds.

Consider a mutator with a single active thread, allocating a range of object sizes, all smaller than the large object threshold. There is a global list of empty blocks available for allocation, and the thread grabs blocks as needed and bump-pointer allocates into that block. The last allocation in each block will fail: that’s what forces the thread to grab a new fresh block. The space left at the end of the block is fragmentation.

Assuming that the sequence of allocations performed by the mutator is deterministic, by the time the mutator has forced the first collection, the total amount of fragmentation will also be deterministic, as will the set of live roots at the time of collection. Assume also that there is a single collector thread which evacuates the live objects; this will also produce deterministic fragmentation.

However, there is no guarantee that the post-collection fragmentation is less than the pre-collection fragmentation. Unless objects are copied in such a way that preserves allocation order—generally not the case for a semi-space collector, and it would negate any advantage of a block-structured heap—then different object order could produce different end-of-block fragmentation.

causes of unreliability

The unreliability discussed above is due to non-commutative evacuation. If your collector marks objects in place, you are not affected. If you don’t commute live objects—if you preserve their allocation order, as Toit’s collector does—then you are not affected. If your evacuation commutes, as in the case of the simple semi-space collector, you are not affected. But if you have a block-structured heap and you evacuate, your collector is probably unreliable.

There are other sources of unreliability in a collector, but to my mind they are not as fundamental as this one.

  • Multiple mutator threads generally lead to a kind of unreliability, because the size of the live object graph is not deterministic at the time of collection: even if all threads have the same allocation trace, they don’t necessarily proceed in lock-step nor stop in the same place.

  • Adding collector threads to evacuate in parallel adds its own form of unreliability: if you have 8 evacuator threads, then there are 8 blocks local to the evacuator threads which also contribute to post-collection wasted space, possibly an entire block per thread.

  • Some collectors will allocate memory during collection, for example to represent a worklist of objects that need tracing. This allocation can fail. Also, annoyingly, collection-time allocation complicates comparison: you can no longer compare two collectors at the same heap size, because one of them cheats.

  • Virtual memory and paging can make you have a bad time. For example, you go to allocate a large object, so you remove some empty blocks from the main space and return them to the OS, providing you enough budget to allocate the new large object. Then the new large object is collected, so you reclaim the pages you returned to the OS, adding them to the available list. But probably you don’t page them in already, because who wants a syscall? They get paged in lazily when the mutator needs them, but that could fail because of other processes on the system.

embracing unreliability

I think it only makes sense to insist on a reliable collector if your mutator does not have threads; otherwise, the fragmentation-related unreliability pales in comparison.

What’s more, fragmentation-related unreliability can be entirely mitigated by giving the heap more memory: the maximum amount of fragmentation is an object just shy of the large object threshold, per block, so in our case 8 kB per 64 kB. So, just increase your heap by 12.5%. You will certainly not regret increasing your heap by 12.5%.

And happily, increasing heap size also works to mitigate unreliability related to multiple mutator threads. Consider 10 threads each of which has a local object graph that is usually 10 MB but briefly 100MB when calculating: usually when GC happens, total live object size is 10×10MB=100MB, but sometimes as much as 1 GB; there is a minimum heap size for which the program sometimes works, but also a minimum heap size at which it always works. The trouble is, of course, that you generally only know the minimum always-works size by experimentation, and you are unlikely to be able to actually measure the maximum heap size.

Which brings me to my final point, which is that virtual memory and growable heaps are good. Unless you have a dedicated devops team or you are evaluating a garbage collector, you should not be using a fixed heap size. The ability to just allocate some pages to keep the heap from being too tight buys us a form of soft reliability.

And with that, end of observations. Happy fragmenting, and until next time!

by Andy Wingo at July 10, 2024 08:48 AM

Stephen Chenney

Canvas Text Editing

Editing text in HTML canvas has never been easy. It requires identifying which character is under the hit point in order to place a caret, and it requires computing bounds for a range of text that is selected. The existing implementations of Canvas TextMetrics made these things possible, but not without a lot of Javascript making multiple expensive calls to compute metrics for substrings. Three new additions to the TextMetrics API are intended to support editing use cases in Canvas text. They are in the standards pipeline, and implemented in Chromium-based browsers behind the ExtendedTextMetrics flag:

  • caretPositionFromPoint gives the location in a string corresponding to a pixel length along the string. Use it to identify where the caret is in the string, and what the bounds of a selection range are.
  • getSelectionRects returns the rectangles that a browser would use to highlight a range of text. Use it to draw the selection highlight.
  • getActualBoundingBox returns the bounding box for a sub-range of text within a string. Use it if you need to know whether a point lies within a substring, rather than the entire string.

To enable the flag, use --enable-blink-features=ExtendedTextMetrics when launching Chrome from a script or command line, or enable “Experimental Web Platform features” via chrome://flags/#enable-experimental-web-platform-features.

I wrote a basic web app (opens in a new tab) in order to demonstrate the use of these features. It will function in Chrome versions beyond 128.0.6587.0 (Canary at the time of writing) with the above flags set.

The app allows the editing of a single line of text drawn in an HTML canvas. Here I’ll work through usage of the new features.

In the demo, the first instance of “new Canvas Text Metrics” is considered a link back to this blog page. Canvas Text has no notion of links, and thousands of people have looked at Stack Exchange for a way to insert hyperlinks in canvas text. Part of the problem, assuming you know where the link is in the text, is determining when the link was clicked on. The TextMetrics getActualBoundingBox(start, end) method is intended to simplify the problem by returning the bounding box of a substring of the text, in this case the link.

onStringChanged() {
  text_metrics = context.measureText(string);
  link_start_position = string.indexOf(link_text);
  if (link_start_position != -1) {
    link_end_position = link_start_position + link_text.length;
  }
}
...
linkHit(x, y) {
  let bound_rect = undefined;
  try {
    bound_rect = text_metrics.getActualBoundingBox(link_start_position, link_end_position);
  } catch (error) {
    return false;
  }
  let relative_x = x - string_x;
  let relative_y = y - string_y;
  return relative_x >= bound_rect.left && relative_y >= bound_rect.top
      && relative_x < bound_rect.right && relative_y < bound_rect.bottom;
}

The first function finds the link in the string and stores the start and end string offsets. When a click event happens, the second method is called to determine if the hit point was within the link area. The text metrics object is queried for the bounding box of the link’s substring. Note the call is contained within a try...catch block because an exception will be thrown if the substring is invalid. The event offset is mapped into the coordinate system of the text (in this case by subtracting the text location) and the resulting point is tested against the rectangle.
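
For completeness, here is a sketch of wiring that check up to a click handler. It is not part of the demo code; canvas and link_url are hypothetical names for the canvas element and the link target.

// Hypothetical glue code: open the link when a click lands inside its bounding box.
canvas.addEventListener('click', (event) => {
  if (linkHit(event.offsetX, event.offsetY)) {
    window.open(link_url, '_blank');
  }
});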

In more general situations you may need to use a regular expression to find links, and keep track of a more complex transformation chain to convert event locations into the text string’s coordinate system.

Mapping a Point to a String Index #

A primary concept of any editing application is the caret location because it indicates where typed text will appear, or what will be deleted by backspace, or where an insertion will happen. Mapping a hit point in the canvas into the caret position in the text string is a fundamental editing operation. It is possible to do this with existing methods but it is expensive (you can do a binary search using the width of substrings).
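
For comparison, here is a rough sketch of that expensive fallback: a binary search over substring advance widths using measureText, assuming left-aligned, left-to-right text. The function name offsetToIndexFallback is made up for illustration.

// Hypothetical fallback: map an x offset (pixels from the text origin) to a
// character index by binary searching over substring widths. Each iteration
// costs one measureText call, so every caret placement is O(log n) measurements.
function offsetToIndexFallback(context, string, offset) {
  let lo = 0, hi = string.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const width = context.measureText(string.slice(0, mid + 1)).width;
    if (width < offset) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo; // index of the character whose advance first reaches the offset
}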

The TextMetrics caretPositionFromPoint(offset) method uses existing code in browsers to efficiently map a point to a string position. The underlying functionality is very similar to the document.caretPositionFromPoint(x,y) method, but modified for the canvas situation. The demo code uses it to position the caret and to identify the selection range.

  text_offset = event.offsetX - string_x;
caret_position = text_metrics.caretPositionFromPoint(text_offset);

The caretPositionFromPoint function takes the horizontal offset, in pixels, measured from the origin of the text (based on the textAlign property of the canvas context). The function finds the character boundary closest to the given offset, then returns the character index to the right for left-to-right text, and to the left for right-to-left text. The offset can be negative to allow characters to the left of the origin to be mapped.

In the figure below, the top string has textDirection = "ltr" and textAlign = "center". The origin for measuring offsets is the center of the string. Green shows the offsets given, while blue shows the indexes returned. The bottom string demonstrates textDirection = "rtl" and textAlign = "start".

An offset past the beginning of the text always returns 0, and past the end returns the string length. Note that the offset is always measured left-to-right, even if the text direction is right-to-left. The “beginning” and “end” of the text string do respect the text direction, so for RTL text the beginning is on the right.
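
Once an index is known, one way to draw the caret is to measure the advance of the preceding substring. This is a minimal sketch, not part of the demo code, assuming left-aligned, left-to-right text drawn at (string_x, string_y) with the default alphabetic baseline.

// Sketch: draw a 1px-wide caret just before the character at caret_position.
const advance = context.measureText(string.slice(0, caret_position)).width;
const caret_top = string_y - text_metrics.fontBoundingBoxAscent;
const caret_height = text_metrics.fontBoundingBoxAscent + text_metrics.fontBoundingBoxDescent;
context.fillRect(string_x + advance, caret_top, 1, caret_height);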

The caretPositionFromPoint function may produce very counter-intuitive results when the text string has mixed bidi content, such as a Latin substring within an Arabic string. As the offset moves along the string the positions will not steadily increase or decrease, but may jump around at the boundaries of a directional run. Full handling of bidi content requires incorporating bidi level information, particularly for selecting text, and is beyond the scope of this article.

Selection Rectangles #

Selected text is normally indicated by drawing a highlight over the range, but to produce such an effect in canvas requires estimating the rectangle using existing text metrics, and again making multiple queries to text metrics to obtain the left and right extents. The new TextMetrics getSelectionRects(start, end) function returns a list of browser-defined selection rectangles for the given subrange of the string. There may be multiple rectangles because the browser returns one for each bidi run; you would need to draw them all to highlight the complete range (a sketch of that follows the demo code below). The demo assumes a single rectangle because it assumes no mixed-direction strings.

selection_rect = text_metrics.getSelectionRects(selection_range[0], selection_range[1])[0];
...
context.fillStyle = 'yellow';
context.fillRect(selection_rect.x + string_x,
                 selection_rect.y + string_y,
                 selection_rect.width,
                 selection_rect.height)

Like all the new methods, the rectangle returned is in the coordinate system of the string, as defined by the transform, textAlign and textBaseline.
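
For strings with mixed-direction runs, a sketch that draws every rectangle returned for the range, rather than only the first as the demo does, could look like the following (same variable names as the demo):

// Sketch: highlight a selection that may span several bidi runs by drawing
// every rectangle returned for the range, translated by the string origin.
const rects = text_metrics.getSelectionRects(selection_range[0], selection_range[1]);
context.fillStyle = 'yellow';
for (const rect of rects) {
  context.fillRect(rect.x + string_x, rect.y + string_y, rect.width, rect.height);
}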

Conclusion #

The new Canvas Text Metrics described here are in the process of standardization. When the feedback process is opened we will update this blog post with the place to raise issues with these proposed methods.

Thanks #

The implementation of Canvas Text Features was aided by Igalia S.L. funded by Bloomberg L.P.

July 10, 2024 12:00 AM

July 09, 2024

Guilherme Piccoli

Presenting kdumpst, or how to collect kernel crash logs on Arch Linux

It’s been a long time since I last posted something here – yet there are interesting news to mention, a bit far from ARM64 (the topic of my previous posts). Let’s talk today about kernel crashes, or even better, how can we collect information if a kernel panic happens on Arch Linux and on SteamOS, … Continue reading "Presenting kdumpst, or how to collect kernel crash logs on Arch Linux"

by gpiccoli at July 09, 2024 04:21 PM

July 08, 2024

Frédéric Wang

My recent contributions to Gecko (2/3)

Introduction

This is the second in a series of blog posts describing new web platform features Igalia has implemented in Gecko, as part of an effort to improve browser interoperability. I’ll talk about the task of implementing ‘content-visibility’, to which several Igalians have contributed since early 2022, and I’ll focus on two main roadblocks I had to overcome.

The ‘content-visibility’ property

In the past, Igalia worked on CSS containment, a feature allowing authors to isolate a subtree from the rest of the document to improve rendering performance. This is done using the ‘contain’ property, which accepts four kinds of containment: size, layout, style and paint.

‘content-visibility’ is a new property allowing authors to “hide” some content from the page, and save the browser unnecessary work by applying containment. The most interesting one is probably content-visibility: auto, which hides content that is not relevant to the user. This is essentially native “virtual scrolling”, allowing you to build virtualized or “recycled” lists without breaking accessibility and find-in-page.

To explain this, consider the typical example of a page with a series of posts, as shown below. By default, each post has the four types of containment applied; it is not painted, does not respond to hit-testing, and uses the dimensions specified in the ‘contain-intrinsic-size’ property. It’s only once a post becomes relevant to the user (e.g. when scrolled close enough to the viewport, or when focus is moved into the post) that the actual effort to properly render the content, and calculate its actual size, is performed:

div.post {
  content-visibility: auto;
  contain-intrinsic-size: 500px 1000px;
}
<div class="post">
...
</div>
<div class="post">
...
</div>
<div class="post">
...
</div>
<div class="post">
...
</div>

If a post later loses its relevance (e.g. when scrolled away, or when focus is lost) then it would use the dimensions specified by ‘contain-intrinsic-size’ again, discarding the content size that was obtained after layout. One can also avoid that and use the last remembered size instead:

div.post {
  contain-intrinsic-size: auto 500px auto 1000px;
}

Finally, there is also a content-visibility: hidden value, which is the same as content-visibility: auto but never reveals the content, enhancing other methods to hide content such as display: none or visibility: hidden.

This is just a quick overview of the feature, but I invite you to read the web.dev article on content-visibility for further details and thoughts.

Viewport distance for content-visibility: auto

As is often the case, the feature looks straightforward to implement, but issues appear when you get into the details.

In bug 1807253, my colleague Oriol Brufau raised an interoperability bug with a very simple test case, reproduced below for convenience. Chromium would report 0 and 42, whereas Firefox would sometimes report 0 twice, meaning that the post did not become relevant after a rendering update:

<!DOCTYPE html>
<div id="post" style="content-visibility: auto">
  <div style="height: 42px"></div>
</div>
<script>
console.log(post.clientHeight);
requestAnimationFrame(() => requestAnimationFrame(() => {
  console.log(post.clientHeight);
}));
</script>

It turned out that an early version of the specification relied too heavily on a modified version of IntersectionObserver to synchronously detect when an element is close to the viewport, as this was how it was implemented in Chromium. However, the initial implementation in Firefox relied on a standard IntersectionObserver (with asynchronous notifications of observers) and so failed to produce the behavior described in the specification. This issue was showing up in several WPT failures.

To solve that problem, the moment when we determine an element’s proximity to the viewport was moved into the HTML5 specification, at the step when the rendering is updated, more precisely when the ResizeObserver notifications are broadcast. My colleague Alexander Surkov had started rewriting Firefox’s implementation to align with this new behavior in early 2023, and I took over his work in November.

Since this touches the “update the rendering” step which is executed on every page, it was quite likely to break things… and indeed many regressions were caused by my patch, for example:

  • One regression was about white flickering of pages on every reload/navigation.
  • One more regression was about content-visibility: auto nodes not being rendered at all.
  • Another regression was about new resize loop errors appearing in tests.
  • Some test cases were also found where the “update the rendering step” would repeat indefinitely, causing performance regressions.
  • Last but not least, crashes were reported.

Some of these issues were due to the fact that support for the last remembered size in Firefox relied on an internal ResizeObserver. However, the CSS Box Sizing spec only says that the last remembered size is updated when ResizeObserver events are delivered, not that such an internal ResizeObserver object is actually needed. I removed this internal observer and ensured the last remembered size is computed directly in the “update the rendering” phase, making the whole thing simpler and more robust.

Dynamic changes to CSS ‘contain’ and ‘content-visibility’

Before sending the intent-to-ship, we reviewed remaining issues and stumbled on bug 1765615, which had been opened during the initial 2022 work. Mozilla indicated this performance bug was important enough to consider an optimization, so I started tackling the issue.

Elaborating a bit about what was mentioned above, a non-visible ‘content-visibility’ implies layout, style and paint containment, and when the element is not relevant to the user, it also implies size containment 1. This has certain side effects, for example paint and layout containment establish an independent formatting context and affect how the contained box interacts with floats and how margin collapsing applies. Style containment can have even more drastic consequences, since it makes counter-* and *-quote properties scoped to the subtree.

When we dynamically modify the ‘contain’ or ‘content-visibility’ properties, or when the relevance of a content-visibility: auto element changes, browsers must make sure that the rendering is properly updated. It turned out that there were almost no tests for that, and unsurprisingly, Chromium and WebKit had various invalidation bugs. Firefox was always forcing a rebuild of the tree used for rendering, which avoided such bugs but is not optimal.

I wrote a couple of web platform tests for ‘contain’ and ‘content-visibility’ 2, and made sure that Firefox does the minimal invalidation effort needed, being careful not to cause any regressions. As a result, except for style containment changes, we’re now able to avoid the cost of rebuilding the tree used for rendering!

Conclusion

Almost two years after the initial work on ‘content-visibility’, I was able to send the intent-to-ship, and the feature finally became available in Firefox 125. Finishing the implementation work on this feature was challenging, but quite interesting to me.

I believe ‘content-visibility’ is a good example of why implementing a feature in different browsers is important to ensure that both the specification and tests are good enough. The lack of details in the spec regarding when we determine viewport proximity, and the absence of WPT tests for invalidation, definitely made the Firefox work take longer than expected. But finishing that implementation work was also useful for improving the spec, tests, and other implementations 3.

I’ll conclude this series of blog posts with fetch priority, which also has its own interesting story…

  1. In both cases, “implies” means the used value of ‘contain’ is modified accordingly. 

  2. One of the things I had to handle with care was the update of the accessibility tree, since content that is not relevant to the user must not be exposed. Unfortunately it’s not possible to write WPT tests for accessibility yet, so for now I had to write internal Firefox-specific non-regression tests. 

  3. Another interesting report happened after the release and is related to content-visibility: auto on elements drawn in a canvas.

July 08, 2024 10:00 PM

July 01, 2024

Alex Bradbury

pwr

Summary

pwr (paced web reader) is a script and terminal-centric workflow I use for keeping up to date with various sources online, shared on the off chance it's useful to you too.

Motivation

The internet is (mostly) a wonderful thing, but it's kind of a lot. It can be distracting, and I think we all know the unhealthy loops of scrolling and refreshing the same sites. pwr provides a structured workflow for keeping up to date with a preferred set of sites in an incremental fashion (willpower required). It takes some inspiration from a widely reported workflow that involved sending a URL to a server and having it returned via email to be read in a batch later. pwr adopts the delayed gratification aspect of this but doesn't involve downloading for offline reading.

The pwr flow

One-time setup:

  • Configure the pwr script so it supports your desired feed sources (RSS or using hand-written extractors for those that don't have a good feed).

Regular workflow (just run pwr with no arguments to initiate this sequence in one invocation):

  • Run pwr read to open any URLs that were previously queued for reading.
  • Run pwr fetch to get any new URLs from the configured sources.
  • Run pwr filter to open an editor window where you can quickly mark which retrieved articles to queue for reading.

In my preferred usage, the above is run once a day as a replacement for unstructured web browsing. This flow means you're always reading items that were identified the previous day. Although comments on sites such as Hacker News or Reddit are much maligned, I do find they can be a source of additional insight, and this flow means that by the time you're reading a post ~24 hours after it was initially found, discussion has died down so there's little reason to keep refreshing.

pwr filter is the main part requiring active input, and involves the editor in a way that is somewhat inspired by git rebase -i. For instance, at the time of writing it produces the following output (you would simply replace the d prefix with r for any items you want to queue to read):

------------------------------------------------------------
Filter file generated at 2024-07-01 08:51:54 UTC
DO NOT DELETE OR MOVE ANY LINES
To mark an item for reading, replace the 'd' prefix with 'r'
Exit editor with non-zero return code (:cq in vim) to abort
------------------------------------------------------------

# Rust Internals
d [Discussion] Hybrid borrow (0 replies)

# Swift Evolution
d [Pitch #2] Safe Access to Contiguous Storage (27 replies)
d [Re-Proposal] Type only Unions (69 replies)

# HN
d Programmers Should Never Trust Anyone, Not Even Themselves
d Unification in Elixir
d Quaternions in Signal and Image Processing

# lobste.rs
d Code Reviews Do Find Bugs
d Integrated assembler improvements in LLVM 19
d Cubernetes
d Grafana security update: Grafana Loki and unintended data write attempts to Amazon S3 buckets
d regreSSHion: RCE in OpenSSH's server, on glibc-based Linux systems (CVE-2024-6387)
d Elaboration of the PostgreSQL sort cost model

# /r/programminglanguages
d Rate my syntax (Array Access)

Making it your own

Ultimately pwr is a tool that happens to scratch an itch for me. It's out there in case any aspect of it is useful to you. It's very explicitly written as a script, where the expected usage is that you take a copy and make what modifications you need for yourself (changing sources, new fetchers, or other improvements).


Article changelog
  • 2024-07-01: Initial publication date.

July 01, 2024 10:00 AM

June 26, 2024

Lucas Fryzek

Software Rendering and Android

My current project at Igalia has had me working on Mesa’s software renderers, llvmpipe and lavapipe. I’ve been working to get them running on Android, and I wanted to document the progress I’ve made, the challenges I’ve faced, and talk a little bit about the development process for a project like this. My work is not totally merged into upstream mesa yet, but you can see the MRs I made here:

Setting up an Android development environment

Getting system level software to build and run on Android is unfortunately not straightforward. Since we are doing software rendering we don’t need a physical device and can instead make use of the Android emulator. If you didn’t know, Android has two emulators: the common one most people use is “goldfish”, and the other, lesser-known one is “cuttlefish”. For this project I did my work on the cuttlefish emulator, as it’s meant for testing the Android OS itself instead of just Android apps and is more reflective of real hardware. The cuttlefish emulator takes a little bit more work to set up, and I’ve found that it only works properly in Debian-based Linux distros. I run Fedora, so I had to run the emulator in a Debian VM.

Thankfully Google has good instructions for building and running cuttlefish, which you can find here. The instructions show you how to set up the emulator using nightly build images from Google. We’ll also need to set up our own Android OS images, so after we’ve confirmed we can run the emulator, we need to start looking at building AOSP.

For building our own AOSP image, we can also follow the instructions from Google here. For the target we’ll want aosp_cf_x86_64_phone-trunk_staging-eng. At this point it’s a good idea to verify that you can build the image, which you can do by following the rest of the instructions on the page. Building AOSP from source does take a while though, so prepare to wait potentially an entire day for the image to build. Also, if you get errors complaining that you’re out of memory, you can try to reduce the number of parallel builds. Google officially recommends having 64GB of RAM, and I only had 32GB, so some packages had to be built with the parallel builds set to 1 so I wouldn’t run out of RAM.

For running this custom-built image on Cuttlefish, you can just copy all the *.img files from out/target/product/vsoc_x86_64/ to the root cuttlefish directory, and then launch cuttlefish. If everything worked successfully you should be able to see your custom built AOSP image running in the cuttlefish webui.

Building Mesa targeting Android

Working from the changes in MR !29344, building llvmpipe or lavapipe targeting Android should just work™️. To get to that stage required a few changes. First, llvmpipe actually already had some support on Android, as long as it was running on a device that supports a DRM display driver. In that case it could use the dri window system integration which already works on Android. I wanted to get llvmpipe (and lavapipe) running without dri, so I had to add support for Android in the drisw window system integration.

To support Android in drisw, this mainly meant adding support for importing dmabufs as framebuffers. The Android windowing system will provide us with a “gralloc” buffer which contains a dmabuf fd that represents the framebuffer. Adding support for importing dmabufs in drisw means we can import and begin drawing to these framebuffers. Most of the changes to support that can be found in drisw_allocate_textures and the underlying changes to llvmpipe to support importing dmabufs in MR !27805. The EGL Android platform code also needed some changes to use the drisw window system code. Previously this code would only work with true dri drivers, but with some small tweaks it was possible to have it initialize the drisw window system and then use it for rendering if no hardware devices are available.

For lavapipe the changes were a lot simpler. The Android Vulkan loader requires your driver to have HAL_MODULE_INFO_SYM symbol in the binary, so that got created and populated correctly, following other Vulkan drivers in Mesa like turnip. Then the image creation code had to be modified to support the VK_ANDROID_native_buffer extension which allows the Android Vulkan loader to create images using Android native buffer handles. Under the hood this means getting the dmabuf fd from the native buffer handle. Thankfully mesa already has some common code to handle this, so I could just use that. Some other small changes were also necessary to address crashes and other failures that came up during testing.

With the changes out of the way we can now start building Mesa on Android. For this project I had to update the Android documentation for Mesa to include steps for building LLVM for Android, since the version Google ships with the NDK is missing libraries that llvmpipe/lavapipe need to function. You can see the updated documentation here and here. After sorting out LLVM, building llvmpipe/lavapipe is the same as building any other Mesa driver for Android: we set up a cross file to tell meson how to cross compile and then we run meson. At this point you could manually modify the Android image and copy these files to the VM, but I also wanted to support building a new AOSP image directly including the driver. In order to do that you also have to rename the driver binaries to match Android’s naming convention, and make sure SO_NAME matches as well. If you check out this section of the documentation I wrote, it covers how to do that.

If you followed all of that you should have built a version of llvmpipe and lavapipe that you can run on Android’s cuttlefish emulator.

Android running lavapipe

References

June 26, 2024 11:00 PM

June 20, 2024

Frédéric Wang

My recent contributions to Gecko (1/3)

Introduction

Igalia has been contributing to the web platform implementations of different web engines for a long time. One of our goals is ensuring that these implementations are interoperable, by relying on various web standards and web platform tests. In July 2023, I happily joined a project that focuses on this goal, and I worked more specifically on the Gecko web engine. One year later, three new features I contributed to are being shipped in Firefox. In this series of blog posts, I’ll give an overview of those features (namely registered custom properties, content visibility, and fetch priority) and my journey to make them “ride the train” as Mozilla people say.

Let’s start with registered custom properties, an enhancement of traditional CSS variables.

Registered custom properties

You may already be familiar with CSS variables, these “dash dash” names that facilitate the maintenance of a large web site by allowing author-defined CSS properties. In the example below, the :root selector defines a variable --main-theme-color with value “blue”, which is used for the style applied to other elements via the var() CSS function. As you can see, this makes the usage of the main theme color in different places more readable and makes customizing that color much easier.

:root { --main-theme-color: blue; }
p { color: var(--main-theme-color); }
section {
  padding: 1em;
  border: 1px solid var(--main-theme-color);
}
.progress-bar {
  height: 10px;
  width: 100%;
  background: linear-gradient(white, var(--main-theme-color));
}
<section>
  <p>Loading...</p>
  <div class="progress-bar"></div>
</section>

In browsers supporting CSS variables, you should see a frame containing the text “Loading” and a progress bar, all of these components being blue:


Having such CSS variables available is already nice, but they are lacking some features available to native CSS properties… For example, there is (almost) no syntax checking on specified values, they are always inherited, and their initial value is always the guaranteed invalid value. In order to improve on that situation, the CSS Properties and Values specification provides some APIs to register custom properties with further characteristics:

  • An accepted syntax for the property; for example, igalia | <url> | <integer>+ means either the custom identifier “igalia”, or a URL, or a space-separated list of integers.
  • Whether the property is inherited or non-inherited.
  • An initial value.

Custom properties can be registered via CSS or via a JS API, and these ways are equivalent. For example, to register --main-theme-color as a non-inherited color with initial value blue:

@property --main-theme-color {
  syntax: "<color>";
  inherits: false;
  initial-value: blue;
}
window.CSS.registerProperty({
  name: "--main-theme-color",
  syntax: "<color>",
  inherits: false,
  initialValue: "blue",
});
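
In a page that must also work in browsers without this API, a small feature-detection guard can be placed around the JS registration. This is just a sketch, not part of the original example; without registration the variable still behaves as a plain (inherited, untyped) custom property.

// Sketch: only register where the API exists; registering the same name twice throws.
if (window.CSS && "registerProperty" in CSS) {
  try {
    CSS.registerProperty({
      name: "--main-theme-color",
      syntax: "<color>",
      inherits: false,
      initialValue: "blue",
    });
  } catch (e) {
    // Ignore an attempt to register the same property name twice.
  }
}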

Interpolation of registered custom properties

By having custom properties registered with a specific syntax, we open up the possibility of interpolating between two values of the properties when performing an animation. Consider the following example, where the width of the animated div depends on the custom property --my-length. Defining this property as a length allows browsers to interpolate it continuously between 10px and 200px when it is animated:

 @property --my-length {
   syntax: "<length>";
   inherits: false;
   initial-value: 0px;
 }
 @keyframes test {
   from {
     --my-length: 10px;
   }
   to {
     --my-length: 200px;
   }
 }
 div#animated {
   animation: test 2s linear both;
   width: var(--my-length, 10px);
   height: 200px;
   background: lightblue;
 }

With non-registered custom properties, we can instead only animate discretely; --my-length would suddenly jump from 10px to 200px halfway through the duration of the animation, which is generally not what is desired for lengths.

Custom properties in the cascade

If you check the Interop 2023 Dashboard for custom properties, you may notice that interoperability was really bad at the beginning of the year, and this was mainly due to Firefox’s low score. Consequently, when I joined the project, I was asked to help with improving that situation.

Graph showing the 2023 evolution of scores and interop for custom properties

While the two registration methods previously mentioned had already been implemented, the main issue was that the CSS cascade was always treating custom properties as inherited and initialized with the guaranteed invalid value. This is indeed correct for unregistered custom properties, but it’s generally incorrect for registered custom properties!

In bug 1840478, bug 1855887, and others, I made registered custom properties work properly in the cascade, including non-inherited properties and registered initial values. But in the past, with the previous assumptions around inheritance and initial values, it was possible to store the computed values of custom properties on an element as a “cheap” map, considering only the properties actually specified on the element or an ancestor and (in most cases) only taking shallow copies of the parent’s map. As a result, when generalizing the cascade for registered custom properties, I had to be careful to avoid introducing performance regressions for existing content.

Custom properties in animations

Another area where the situation was pretty bad was animations. Not only was Firefox unable to interpolate registered custom properties between two values — one of the main motivations for the new spec — but it was actually unable to animate custom properties at all!

The main problem was that the existing animation code referred to CSS properties using an enum nsCSSPropertyID, with all custom properties represented by the single value nsCSSPropertyID::eCSSPropertyExtra_variable. To make this work for custom properties, I had to essentially replace that value with a structure containing the nsCSSPropertyID and the name of the custom property.

I uploaded patches to bug 1846516 to perform that change throughout the whole codebase, and with a few more tweaks, I was able to make registered custom properties animate discretely, but my patches still needed some polish before they could be reviewed. I had to move onto other tasks, but fortunately, some Mozilla folks were kind enough to take over this task, and more generally, complete the work on registered custom properties!

Conclusion

This was an interesting task to work on, and because a lot of the work happened in Stylo, the CSS engine shared by Servo and Gecko, I also had the opportunity to train more on the Rust programming language. Thanks to help from folks at Mozilla, we were able to get excellent progress on registered custom properties in Firefox in 2023, and this feature is expected to ship in Firefox 128!

As I said, I’ve since moved onto other tasks, which I’ll describe in subsequent blog posts in this series. Stay tuned for content-visibility, enabling interesting layout optimizations for web pages.

June 20, 2024 10:00 PM

June 06, 2024

Víctor Jáquez

GStreamer Vulkan Operation API

Two weeks ago the GStreamer Spring Hackfest took place in Thessaloniki, Greece. I had a great time. I hacked a bit on VA, Vulkan and my toy, planet-rs, but mostly I ate delicious Greek food ☻. A big thanks to our hosts: Vivia, Jordan and Sebastian!

First day of the GStreamer Spring Hackfest 2024 - https://floss.social/@gstreamer/112511912596084571

And now, writing this supposed small note, I recalled that I have in my to-do list an item to write a comment about GstVulkanOperation, an addition to GstVulkan API which helps with the synchronization of operations on frames, in order to enable Vulkan Video.

Originally, GstVulkan API didn’t provide almost any synchronization operation, besides fences, and that appeared to be enough for elements, since they do simple Vulkan operations. Nonetheless, as soon as we enabled the VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT feature, which reports resource access conflicts due to missing or incorrect synchronization operations between actions [*], a sea of hazard operation warnings drowned us [*].

Hazard operations are a sequence of read/write commands in a memory area, such as an image, that might be re-ordered, or racy even.

Why are those hazard operations reported by the Vulkan Validation Layer, if the programmer pushes the commands to execute in queue in order? Why is explicit synchronization required? Because, as the great blog post from Hans-Kristian Arntzen, Yet another blog explaining Vulkan synchronization, (make sure you read it!) states:

[…] all commands in a queue execute out of order. Reordering may happen across command buffers and even vkQueueSubmits

In order to explain how synchronization is done in Vulkan, allow me to yank a couple definitions stated by the specification:

Commands are instructions that are recorded in a device’s queue. There are four types of commands: action, state, synchronization and indirection. Synchronization commands impose ordering constraints on action commands, by introducing explicit execution and memory dependencies.

Operation is an arbitrary amount of commands recorded in a device’s queue.

Since the driver can reorder commands (perhaps for better performance, dunno), we need to send explicit synchronization commands to the device’s queue to enforce a specific sequence of action commands.

Nevertheless, Vulkan doesn’t offer fine-grained dependencies between individual operations. Instead, dependencies are expressed as a relation of two elements, where each element is composed of the intersection of a scope and an operation. A scope is a concept in the specification that, in practical terms, can be either a pipeline stage (for execution dependencies), or both a pipeline stage and a memory access type (for memory dependencies).

First let’s review execution dependencies through pipeline stages:

Every command submitted to a device’s queue goes through a sequence of steps known as pipeline stages. This sequence of steps is one of the very few implicit ordering guarantees that Vulkan has. Draw calls, copy commands, compute dispatches, all go through certain sequential stages; how many stages are covered depends on the specific command and the current command buffer state.

In order to visualize an abstract execution dependency let’s imagine two compute operations and the first must happen before the second.

Operation 1
Sync command
Operation 2
  1. The programmer has to specify the Sync command in terms of two scopes (Scope 1 and Scope 2), in this execution dependency case, two pipeline stages.
  2. The driver generates an intersection between commands in Operation 1 and Scope 1 defined as Scoped operation 1. The intersection contains all the commands in Operation 1 that go through up to the pipeline stage defined in Scope 1. The same is done with Operation 2 and Scope 2 generating Scoped operation 2.
  3. Finally, we got an execution dependency that guarantees that Scoped operation 1 happens before Scoped operation 2.

Now let’s talk about memory dependencies:

First we need to understand the concepts of memory availability and visibility. Their formal definition in Vulkan are a bit hard to grasp since they come from the Vulkan memory model, which is intended to abstract all the ways of how hardware access memory. Perhaps we could say that availability is the operation that assures the existence of the required memory; while visibility is the operation that assures it’s possible to read/write the data in that memory area.

Memory dependencies are limited to an Operation 1 that has to be done before memory availability, and an Operation 2 that has to be done after its visibility.

But again, there’s no fine-grained way to declare that memory dependency. Instead, there are memory access types, which are functions used by descriptor types, or functions for pipeline stage to access memory, and they are used as access scopes.

All in all, when a synchronization command defines a memory dependency between two operations, each scope is composed of the intersection between a command and a pipeline stage, further intersected with the memory access type associated with the memory processed by those commands.

Now that the concepts are more or less explained we could see those concepts expressed in code. The synchronization command for execution and memory dependencies is defined by VkDependencyInfoKHR. And it contains a set of barrier arrays, for memory, buffers and images. Barriers express the relation of dependency between two operations. For example, Image barriers use VkImageMemoryBarrier2 which contain the mask for source pipeline stage (to define Scoped operation 1), and the mask for the destination pipeline stage (to define Scoped operation 2); the mask for source memory access type and the mask for the destination memory access to define access scopes; and also layout transformation declaration.

A Vulkan synchronization example from Vulkan Documentation wiki:

vkCmdDraw(...);

... // First render pass teardown etc.

VkImageMemoryBarrier2KHR imageMemoryBarrier = {
  ...
  .srcStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT_KHR,
  .dstStageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT_KHR,
  .dstAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT_KHR,
  .oldLayout = VK_IMAGE_LAYOUT_READ_ONLY_OPTIMAL,
  .newLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL
  /* .image and .subresourceRange should identify image subresource accessed */};

VkDependencyInfoKHR dependencyInfo = {
  ...
  1,                   // imageMemoryBarrierCount
  &imageMemoryBarrier, // pImageMemoryBarriers
  ...
}

vkCmdPipelineBarrier2KHR(commandBuffer, &dependencyInfo);

... // Second render pass setup etc.

vkCmdDraw(...);

First draw samples a texture in the fragment shader. Second draw writes to that texture as a color attachment.

This is a Write-After-Read (WAR) hazard, which you would usually only need an execution dependency for - meaning you wouldn’t need to supply any memory barriers. In this case you still need a memory barrier to do a layout transition though, but you don’t need any access types in the src access mask. The layout transition itself is considered a write operation though, so you do need the destination access mask to be correct - or there would be a Write-After-Write (WAW) hazard between the layout transition and the color attachment write.

Other explicit synchronization mechanisms, along with barriers, are semaphores and fences. Semaphores are a synchronization primitive that can be used to insert a dependency between operations without notifying the host; while fences are a synchronization primitive that can be used to insert a dependency from a queue to the host. Semaphores and fences are expressed in the VkSubmitInfo2KHR structure.

As a preliminary conclusion, synchronization in Vulkan is hard and a helper API would be very helpful. Inspired by FFmpeg work done by Lynne, I added GstVulkanOperation object helper to GStreamer Vulkan API.

The GstVulkanOperation object helper aims to represent an operation in the sense of the Vulkan specification mentioned before. It owns a command buffer as a public member where external commands can be pushed to the associated device’s queue.

It has a set of methods, most of which are used in the example below.

Internally, GstVulkanOperation contains two arrays:

  1. The array of dependency frames, which are the set of frames, each representing an operation, which will hold dependency relationships with other dependency frames.

    gst_vulkan_operation_add_dependency_frame appends frames to this array.

    When calling gst_vulkan_operation_end the frame’s barrier state for each frame in the array is updated.

    Also, each dependency frame creates a timeline semaphore, which will be signaled when a command, associated with the frame, is executed in the device’s queue.

  2. The array of barriers, which contains a list of synchronization commands. gst_vulkan_operation_add_frame_barrier fills and appends a VkImageMemoryBarrier2KHR associated with a frame, which can be in the array of dependency frames.

Here’s a generic view of video decoding example:

gst_vulkan_operation_begin (priv->exec, ...);

cmd_buf = priv->exec->cmd_buf->cmd;

gst_vulkan_operation_add_dependency_frame (exec, out,
    VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
    VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR);

/* assume an engine where out frames can be used for DPB frames, */
/* so a barrier for layout transition is required */
gst_vulkan_operation_add_frame_barrier (exec, out,
    VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
    VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR,
    VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR, NULL);

for (i = 0; i < dpb_size; i++) {
  gst_vulkan_operation_add_dependency_frame (exec, dpb_frame,
      VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
      VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR);
}

barriers = gst_vulkan_operation_retrieve_image_barriers (exec);
vkCmdPipelineBarrier2 (cmd_buf, &(VkDependencyInfo) {
    ...
    .pImageMemoryBarriers = barriers->data,
    .imageMemoryBarrierCount = barriers->len,
  });
g_array_unref (barriers);

vkCmdBeginVideoCodingKHR (cmd_buf, &decode_start);
vkCmdDecodeVideoKHR (cmd_buf, &decode_info);
vkCmdEndVideoCodingKHR (cmd_buf, &decode_end);

gst_vulkan_operation_end (exec, ...);

Here, just one memory barrier is required for the memory layout transition, but semaphores are required to signal when an output frame and its DPB frames are processed, so that later the output frame can be used as a DPB frame. Otherwise, the output frame might not be fully reconstructed when it’s used as DPB for the next output frame, generating only noise.

And that’s all. Thank you.

June 06, 2024 12:00 AM

June 05, 2024

Alberto Garcia

More ways to install software in SteamOS: Distrobox and Nix

Introduction

In my previous post I talked about how to use systemd-sysext to add software to the Steam Deck without modifying the root filesystem. In this post I will give a brief overview of two additional methods.

Distrobox

distrobox is a tool that uses containers to create a mutable environment on top of your OS.

Distrobox running in SteamOS

With distrobox you can open a terminal with your favorite Linux distro inside, with full access to the package manager and the ability to install additional software. Containers created by distrobox are integrated with the system so apps running inside have normal access to the user’s home directory and the Wayland/X11 session.

Since these containers are not stored in the root filesystem they can survive an OS update and continue to work fine. For this reason they are particularly suited to systems with an immutable root filesystem such as Silverblue, Endless OS or SteamOS.

Starting from SteamOS 3.5 the system comes with distrobox (and podman) preinstalled and it can be used right out of the box without having to do any previous setup.

For example, in order to create a Debian bookworm container simply open a terminal and run this:

$ distrobox create -i debian:bookworm debbox

Here debian:bookworm is the image that this container is created from (debian is the name and bookworm is the tag, see the list of supported tags here) and debbox is the name that is given to this new container.

Once the container is created you can enter it:

$ distrobox enter debbox

Or from the ‘Debian’ entry in the desktop menu -> Lost & Found.

Once inside the container you can run your Debian commands normally:

$ sudo apt update
$ sudo apt install vim-gtk3

Nix

Nix is a package manager for Linux and other Unix-like systems. It has the property that it can be installed alongside the official package manager of any distribution, allowing the user to add software without affecting the rest of the system.

Nix running in SteamOS

Nix installs everything under the /nix directory, and packages are made available to the user through a new entry in the PATH and a ~/.nix-profile symlink stored in the home directory.

Nix is many more things, including the basis of the NixOS operating system. Explaining Nix in more detail is beyond the scope of this blog post, but for SteamOS users these are perhaps its most interesting properties:

  • Nix is self-contained: all packages and their dependencies are installed under /nix.
  • Unlike software installed with pacman, Nix survives OS updates.
  • Unlike podman / distrobox, Nix does not create any containers. All packages have normal access to the rest of the system, just like native SteamOS packages.
  • Nix has a very large collection of packages, here is a search engine: https://search.nixos.org/packages

The only thing that Nix needs from SteamOS is help to set up the /nix directory so its contents are not stored in the root filesystem. This is already happening starting from SteamOS 3.5 so you can install Nix right away in single-user mode:

$ sudo chown deck:deck /nix
$ wget https://nixos.org/nix/install
$ sh ./install --no-daemon

This installs Nix and adds a line to ~/.bash_profile to set up the necessary environment variables. After that you can log in again and start using it. Here’s a very simple example (refer to the official documentation for more details):

# Install and run Midnight Commander
$ nix-env -iA nixpkgs.mc
$ mc

# List installed packages
$ nix-env -q
mc-4.8.31
nix-2.21.1

# Uninstall Midnight Commander
$ nix-env -e mc-4.8.31

What we have seen so far is how to install Nix in single-user mode, which is the simplest one and probably good enough for a single-user machine like the Steam Deck. The Nix project however recommends a multi-user installation, see here for the reasons.

Unfortunately the official multi-user installer does not work out of the box on the Steam Deck yet, but if you want to go the multi-user way you can use the Determinate Systems installer: https://github.com/DeterminateSystems/nix-installer

Conclusion

Distrobox and Nix are useful tools and they give SteamOS users the ability to add additional software to the system without having to modify the base operating system.

While for graphical applications the recommended way to install third-party software is still Flatpak, Distrobox and Nix give the user additional flexibility and are particularly useful for installing command-line utilities and other system tools.

by berto at June 05, 2024 03:53 PM

May 27, 2024

Andy Wingo

cps in hoot

Good morning good morning! Today I have another article on the Hoot Scheme-to-Wasm compiler, this time on Hoot’s use of the continuation-passing-style (CPS) transformation.

calls calls calls

So, just a bit of context to start out: Hoot is a Guile, Guile is a Scheme, Scheme is a Lisp, one with “proper tail calls”: function calls are either in tail position, syntactically, in which case they are tail calls, or they are not in tail position, in which case they are non-tail calls. A non-tail call suspends the calling function, putting the rest of it (the continuation) on some sort of stack, and will resume when the callee returns. Because non-tail calls push their continuation on a stack, we can call them push calls.

(define (f)
  ;; A push call to g, binding its first return value.
  (define x (g))
  ;; A tail call to h.
  (h x))

Usually the problem in implementing Scheme on other language run-times comes in tail calls, but WebAssembly supports them natively (except on JSC / Safari; should be coming at some point though). Hoot’s problem is the reverse: how to implement push calls?

The issue might seem trivial but it is not. Let me illustrate briefly by describing what Guile does natively (not compiled to WebAssembly). Firstly, note that I am discussing residual push calls, by which I mean to say that the optimizer might remove a push call in the source program via inlining: we are looking at those push calls that survive the optimizer. Secondly, note that native Guile manages its own stack instead of using the stack given to it by the OS; this allows for push-call recursion without arbitrary limits. It also lets Guile capture stack slices and rewind them, which is the fundamental building block we use to implement exception handling, Fibers and other forms of lightweight concurrency.

The straightforward function call will have an artificially limited total recursion depth in most WebAssembly implementations, meaning that many idiomatic uses of Guile will throw exceptions. Unpleasant, but perhaps we could stomach this tradeoff. The greater challenge is how to slice the stack. That I am aware of, there are three possible implementation strategies.

generic slicing

One possibility is that the platform provides a generic, powerful stack-capture primitive, which is what Guile does. The good news is that one day, the WebAssembly stack-switching proposal should provide this too. And in the meantime, the so-called JS Promise Integration (JSPI) proposal gets close: if you enter Wasm from JS via a function marked as async, and you call out to JavaScript to a function marked as async (i.e. returning a promise), then on that nested Wasm-to-JS call, the engine will suspend the continuation and resume it only when the returned promise settles (i.e. completes with a value or an exception). Each entry from JS to Wasm via an async function allocates a fresh stack, so I understand you can have multiple pending promises, and thus multiple wasm coroutines in progress. It gets a little gnarly if you want to control when you wait, for example if you might want to wait on multiple promises; in that case you might not actually mark promise-returning functions as async, and instead import an async-marked async function waitFor(p) { return await p} or so, allowing you to use Promise.race and friends. The main problem though is that JSPI is only for JavaScript. Also, its stack sizes are even smaller than the default stack size.

instrumented slicing

So much for generic solutions. There is another option, to still use push calls from the target machine (WebAssembly), but to transform each function to allow it to suspend and resume. This is what I think of as Joe Marshall’s stack trick (also see §4.2 of the associated paper). The idea is that although there is no primitive to read the whole stack, each frame can access its own state. If you insert a try/catch around each push call, the catch handler can access local state for activations of that function. You can slice a stack by throwing a SaveContinuation exception, in which each frame’s catch handler saves its state and re-throws. And if we want to avoid exceptions, we can use checked returns as Asyncify does.

I never understood, though, how you resume a frame. The Generalized Stack Inspection paper would seem to indicate that you need the transformation to introduce a function to run “the rest of the frame” at each push call, which becomes the Invoke virtual method on the reified frame object. To avoid code duplication you would have to make normal execution flow run these Invoke snippets as well, and that might undo much of the advantages. I understand the implementation that Joe Marshall was working on was an interpreter, though, which bounds the number of sites needing such a transformation.

cps transformation

The third option is a continuation-passing-style transformation. A CPS transform results in a program whose procedures “return” by tail-calling their “continuations”, which themselves are procedures. Taking our previous example, a naïve CPS transformation would reify the following program:

(define (f' k)
  (g' (lambda (x) (h' k x))))

Here f' (“f-prime”) receives its continuation as an argument. We call g', for whose continuation argument we pass a closure. That closure is the return continuation of g, binding a name to its result, and then tail-calls h with respect to f. We know their continuations are the same because it is the same binding, k.

Unfortunately we can’t really slice arbitrary ranges of a stack with the naïve CPS transformation: we can only capture the entire continuation, and can’t really inspect its structure. There is also no way to compose a captured continuation with the current continuation. And, in a naïve transformation, we would be constantly creating lots of heap allocation for these continuation closures; a push call effectively pushes a frame onto the heap as a closure, as we did above for g'.

There is also the question of when to perform the CPS transform; most optimizing compilers would like a large first-order graph to work on, which is out of step with the way CPS transformation breaks functions into many parts. Still, there is a nugget of wisdom here. What if we preserve the conventional compiler IR for most of the pipeline, and only perform the CPS transformation at the end? In that way we can have nice SSA-style optimizations. And, for return continuations of push calls, what if instead of allocating a closure, we save the continuation data on an explicit stack. As Andrew Kennedy notes, closures introduced by the CPS transform follow a stack discipline, so this seems promising; we would have:

(define (f'' k)
  (push! k)
  (push! h'')
  (g'' (lambda (x)
         (define h'' (pop!))
         (define k (pop!))
         (h'' k x))))

The explicit stack allows for generic slicing, which makes it a win for implementing delimited continuations.

hoot and cps

Hoot takes the CPS transformation approach with stack-allocated return closures. In fact, Hoot goes a little farther, too far probably:

(define (f''')
  (push! k)
  (push! h''')
  (push! (lambda (x)
           (define h'' (pop!))
           (define k (pop!))
           (h'' k x)))
  (g'''))

Here instead of passing the continuation as an argument, we pass it on the stack of saved values. Returning pops off from that stack; for example, (lambda () 42) would transform as (lambda () ((pop!) 42)). But some day I should go back and fix it to pass the continuation as an argument, to avoid excess stack traffic for leaf function calls.

There are some gnarly details though, which I know you are here for!

splits

For our function f, we had to break it into two pieces: the part before the push-call to g and the part after. If we had two successive push-calls, we would instead split into three parts. In general, each push-call introduces a split; let us use the term tails for the components produced by a split. (You could also call them continuations.) How many tails will a function have? Well, one for the entry, one for each push call, and one any time control flow merges between two tails. This is a fixpoint problem, given that the input IR is a graph. (There is also some special logic for call-with-prompt but that is too much detail for even this post.)
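
As a hypothetical illustration (not taken from Hoot’s sources), consider a function with a push call in each arm of a conditional:

(define (choose flag)
  (define v (if flag (f) (g)))  ; a push call in each branch
  (h v))                        ; the branches merge here, then tail-call h

;; One tail for the entry, one return tail for the push call to f, another
;; for the push call to g, and one for the merge point that calls h: four
;; tails in total.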

where to save the variables

Guile is a dynamically-typed language, having a uniform SCM representation for every value. However in the compiler and run-time we can often unbox some values, generally as u64/s64/f64 values, but also raw pointers of some specific types, some GC-managed and some not. In native Guile, we can just splat all of these data members into 64-bit stack slots and rely on the compiler to emit stack maps to determine whether a given slot is a double or a tagged heap object reference or what. In WebAssembly though there is no sum type, and no place we can put either a u64 or a (ref eq) value. So we have not one stack but three (!) stacks: one for numeric values, implemented using a Wasm memory; one for (ref eq) values, using a table; and one for return continuations, because the func type hierarchy is disjoint from eq. It’s.... it’s gross? It’s gross.

what variables to save

Before a push-call, you save any local variables that will be live after the call. This is also a flow analysis problem. You can leave off constants, and instead reify them anew in the tail continuation.

I realized, though, that we have some pessimality related to stacked continuations. Consider:

(define (q x)
  (define y (f))
  (define z (f))
  (+ x y z))

Hoot’s CPS transform produces something like:

(define (q0 x)
  (save! x)
  (save! q1)
  (f))

(define (q1 y)
  (restore! x)
  (save! x)
  (save! y)
  (save! q2)
  (f))

(define (q2 z)
  (restore! x)
  (restore! y)
  ((pop!) (+ x y z)))

So q0 saved x, fine, indeed we need it later. But q1 uselessly restored x, only to save it again on q2’s behalf. Really we should be applying a stack discipline for saved data within a function. Given that the source IR is a graph, this means another flow analysis problem, one that I haven’t thought about how to solve yet. I am not even sure if there is a solution in the literature, given that SSA-like flow graphs plus tail calls / CPS is a somewhat niche combination.
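
For what it’s worth, such a stack discipline for q might look something like the following sketch, in which q1 simply leaves x on the saved-variable stack instead of restoring and re-saving it. This is just an illustration of the idea, not what Hoot currently emits.

(define (q0 x)
  (save! x)
  (save! q1)
  (f))

(define (q1 y)
  ;; x stays on the saved-variable stack; no restore!/save! round-trip.
  (save! y)
  (save! q2)
  (f))

(define (q2 z)
  (restore! y)   ; restore in last-in, first-out order
  (restore! x)
  ((pop!) (+ x y z)))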

calling conventions

The continuations introduced by CPS transformation have associated calling conventions: return continuations may have the generic varargs type, or the compiler may have concluded they have a fixed arity that doesn’t need checking. In any case, for a return, you call the return continuation with the returned values, and the return point then restores any live-in variables that were previously saved. But for a merge between tails, you can arrange to take the live-in variables directly as parameters; it is a direct call to a known continuation, rather than an indirect call to an unknown call site.
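
In the informal notation from above, that contrast might look like this (again a sketch, not Hoot output): the return point pulls its live-in off the saved-variable stack, while the merge point receives its live-ins as parameters of a direct call.

(define (return-point y)   ; called indirectly when the push call returns
  (restore! x)
  (merge-point x y))       ; direct call to a known continuation

(define (merge-point x y)  ; live-ins arrive as parameters
  ((pop!) (+ x y)))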

cps soup?

Guile’s intermediate representation is called CPS soup, and you might wonder what relationship that CPS has to this CPS. The answer is not much. The continuations in CPS soup are first-order; a term in one function cannot continue to a continuation in another function. (Inlining and contification can merge graphs from different functions, but the principle is the same.)

It might help to explain that it is the same relationship as it would be if Guile represented programs using SSA: the Hoot CPS transform runs at the back-end of Guile’s compilation pipeline, where closure representations have already been made explicit. The IR is still direct-style, just that syntactically speaking, every call in a transformed program is a tail call. We had to introduce save and restore primitives to implement the saved variable stack, and some other tweaks, but generally speaking, the Hoot CPS transform ensures the run-time all-tail-calls property rather than altering the compile-time language; a transformed program is still CPS soup.

fin

Did we actually make the right call in going for a CPS transformation?

I don’t have good performance numbers at the moment, but from what I can see, the overhead introduced by CPS transformation can impose some penalties, even 10x penalties in some cases. But some results are quite good, improving over native Guile, so I can’t be categorical.

But really the question is, is the performance acceptable for the functionality, and there I think the answer is more clear: we have a port of Fibers that I am sure Spritely colleagues will be writing more about soon, we have good integration with JavaScript promises while not relying on JSPI or Asyncify or anything else, and we haven’t had to compromise in significant ways regarding the source language. So, for now, I am satisfied, and looking forward to experimenting with the stack-switching proposal as it becomes available.

Until next time, happy hooting!

by Andy Wingo at May 27, 2024 12:36 PM

May 24, 2024

Andy Wingo

hoot's wasm toolkit

Good morning! Today we continue our dive into the Hoot Scheme-to-WebAssembly compiler. Instead of talking about Scheme, let’s focus on WebAssembly, specifically the set of tools that we have built in Hoot to wrangle Wasm. I am peddling a thesis: if you compile to Wasm, probably you should write a low-level Wasm toolchain as well.

(Incidentally, some of this material was taken from a presentation I gave to the Wasm standardization organization back in October, which I think I haven’t shared yet in this space, so if you want some more context, have at it.)

naming things

Compilers are all about names: definitions of globals, types, local variables, and so on. An intermediate representation in a compiler is a graph of definitions and uses in which the edges are names, and the set of possible names is generally unbounded; compilers make more names when they see fit, for example when copying a subgraph via inlining, and remove names if they determine that a control or data-flow edge is not necessary. Having an unlimited set of names facilitates the graph transformation work that is the essence of a compiler.

Machines, though, generally deal with addresses, not names; one of the jobs of the compiler back-end is to tabulate the various names in a compilation unit, assigning them to addresses, for example when laying out an ELF binary. Some uses may refer to names from outside the current compilation unit, as when you use a function from the C library. The linker intervenes at the back-end to splice in definitions for dangling uses and applies the final assignment of names to addresses.

When targeting Wasm, consider what kinds of graph transformations you would like to make. You would probably like for the compiler to emit calls to functions from a low-level run-time library written in wasm. Those functions are probably going to pull in some additional definitions, such as globals, types, exception tags, and so on. Then once you have your full graph, you might want to lower it, somehow: for example, you choose to use the stringref string representation, but browsers don’t currently support it; you run a post-pass to lower to UTF-8 arrays, but then all your strings are not constant, meaning they can’t be used as global initializers; so you run another post-pass to initialize globals in order from the start function. You might want to make other global optimizations as well, for example to turn references to named locals into unnamed stack operands (not yet working :).
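
As a toy illustration of the kind of post-pass described above (this is not Hoot’s actual toolkit API, just a sketch over a module held as an s-expression), here is a pass that replaces hypothetical non-constant global initializers with placeholders and moves the real initialization into a start function:

(use-modules (ice-9 match))

(define (lower-global-inits module)
  (let ((deferred '()))
    (define (lower-field field)
      (match field
        ;; A global whose initializer is not constant after lowering:
        ;; defer its initialization and install a placeholder.
        (('global name type ('string.const str))
         (set! deferred (cons `(global.set ,name (string.const ,str)) deferred))
         `(global ,name ,type (ref.null none)))
        (_ field)))
    (match module
      (('module . fields)
       `(module ,@(map lower-field fields)
                (func $init ,@(reverse deferred))
                (start $init))))))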

Anyway what I am getting at is that you need a representation for Wasm in your compiler, and that representation needs to be fairly complete. At the very minimum, you need a facility to transform that in-memory representation to the standard WebAssembly text format, which allows you to use a third-party assembler and linker such as Binaryen’s wasm-opt. But since you have to have the in-memory representation for your own back-end purposes, probably you also implement the names-to-addresses mapping that will allow you to output binary WebAssembly also. Also it could be that Binaryen doesn’t support something you want to do; for example Hoot uses block parameters, which are supported fine in browsers but not in Binaryen.

(I exaggerate a little; Binaryen is a more reasonable choice now than it was before the GC proposal was stabilised. But it has been useful to be able to control Hoot’s output, for example as the exception-handling proposal has evolved.)

one thing leads to another

Once you have a textual and binary writer, and an in-memory representation, perhaps you want to be able to read binaries as well; and perhaps you want to be able to read text. Reading the text format is a little annoying, but I had implemented it already in JavaScript a few years ago; and porting it to Scheme was a no-brainer, allowing me to easily author the run-time Wasm library as text.

And so now you have the beginnings of a full toolchain, built just out of necessity: reading, writing, in-memory construction and transformation. But how are you going to test the output? Are you going to require a browser? That’s gross. Node? Sure, we have to check against production Wasm engines, and that’s probably the easiest path to take; still, would be nice if this were optional. Wasmtime? But that doesn’t do GC.

No, of course not, you are a dirty little compilers developer, you are just going to implement a little wasm interpreter, aren’t you. Of course you are. That way you can build nice debugging tools to help you understand when things go wrong. Hoot’s interpreter doesn’t pretend to be high-performance—it is not—but it is simple and it just works. Massive kudos to Spritely hacker David Thompson for implementing this. I think implementing a Wasm VM also had the pleasant side effect that David is now a Wasm expert; implementation is the best way to learn.

Finally, one more benefit of having a Wasm toolchain as part of the compiler: %inline-wasm. In my example from last time, I had this snippet that makes a new bytevector:

(%inline-wasm
 '(func (param $len i32) (param $init i32)
    (result (ref eq))
    (struct.new
     $mutable-bytevector
     (i32.const 0)
     (array.new $raw-bytevector
                (local.get $init)
                (local.get $len))))
 len init)

%inline-wasm takes a literal as its first argument, which should parse as a Wasm function. Parsing guarantees that the wasm is syntactically valid, and allows the arity of the wasm to become apparent: we just read off the function’s type. Knowing the number of parameters and results is one thing, but we can do better, in that we also know their type, which we use for intentional types, requiring in this case that the parameters be exact integers which get wrapped to the signed i32 range. The resulting term is spliced into the CPS graph, can be analyzed for its side effects, and ultimately when written to the binary we replace each local reference in the Wasm with a reference of the appropriate local variable. All this is possible because we have the tools to work on Wasm itself.

fin

Hoot’s Wasm toolchain is about 10K lines of code, and is fairly complete. I think it pays off for Hoot. If you are building a compiler targeting Wasm, consider budgeting for a 10K SLOC Wasm toolchain; you won’t regret it.

Next time, an article on Hoot’s use of CPS. Until then, happy hacking!

by Andy Wingo at May 24, 2024 10:37 AM

Víctor Jáquez

GStreamer Hackfest 2024

The last few weeks were a bit hectic. First, a couple of friends and I biked through the southwest of the Netherlands for almost a week. The week after that, the most recent one, I attended the 2024 Display Next Hackfest.

This week was Igalia’s Assembly meetings, and next week, along with other colleagues, I’ll be in Thessaloniki for the GStreamer Spring Hackfest.

I’m happy to meet friends from the GStreamer community again and to talk and move things forward related to Vulkan, VA-API, KMS, video codecs, etc.

May 24, 2024 12:00 AM

May 23, 2024

Patrick Griffis

Introducing the WebKit Container SDK

Developing WebKitGTK and WPE has always had challenges, such as the number of dependencies or its fairly complex C++ codebase, which not all compiler versions handle well. To help with this we've made a new SDK to make it easier.

Current Solutions

There have always been multiple ways to build WebKit and its dependencies on your host; however, this was never a great developer experience. Only very specific hosts could be “supported”, you often had to build a large number of dependencies, and the end result wasn't very reproducible for others.

The current solution used by default is a Flatpak based one. This was a big improvement for ease of use and excellent for reproducibility, but it introduced many challenges for development work. As it has a strict sandbox and provides read-only runtimes, it was difficult to use complex tooling/IDEs or develop third party libraries in it.

The new SDK tries to take a middle ground between those two alternatives, isolating itself from the host to be somewhat reproducible, yet being a mutable environment flexible enough for a wide range of tools and workflows.

The WebKit Container SDK

At the core it is an Ubuntu OCI image with all of the dependencies and tooling needed to work on WebKit. On top of this we added some scripts to run/manage these containers with podman and to aid in developing inside of the container. Its intention is to be as simple as possible and not change traditional development workflows.

You can find the SDK and follow the quickstart guide on our GitHub: https://github.com/Igalia/webkit-container-sdk

The main requirement is that this only works on Linux with podman 4.0+ installed, for example Ubuntu 23.10+.

In the most simple case, once you clone https://github.com/Igalia/webkit-container-sdk.git, using the SDK can be a few commands:

source /your/path/to/webkit-container-sdk/register-sdk-on-host.sh
wkdev-create --create-home
wkdev-enter

From there you can use WebKit’s build scripts (./Tools/Scripts/build-webkit --gtk) or CMake. As mentioned before it is an Ubuntu installation so you can easily install your favorite tools directly like VSCode. We even provide a wkdev-setup-vscode script to automate that.

Advanced Usage

Disposability

A workflow that some developers may not be familiar with is making use of entirely disposable development environments. Since these are isolated containers you can easily make two. This allows you to do work in parallel that would otherwise interfere with each other, without worrying about it, and makes it easy to get back to a known good state:

wkdev-create --name=playground1
wkdev-create --name=playground2

podman rm playground1 # You would stop first if running.
wkdev-enter --name=playground2

Working on Dependencies

An important part of WebKit development is working on the dependencies of WebKit rather than itself, either for debugging or for new features. This can be difficult or error-prone with previous solutions. In order to make this easier we use a project called JHBuild which isn’t new but works well with containers and is a simple solution to work on our core dependencies.

Here is an example workflow working on GLib:

wkdev-create --name=glib
wkdev-enter --name=glib

# This will clone glib main, build, and install it for us. 
jhbuild build glib

# At this point you could simply test if a bug was fixed in a different version of glib.
# We can also modify and debug glib directly. All of the projects are cloned into ~/checkout.
cd ~/checkout/glib

# Modify the source however you wish then install your new version.
jhbuild make

Remember that containers are isolated from each other, so you can even have two terminals open with different builds of glib. This can also be used to test projects like Epiphany against your build of WebKit if you install it into the JHBUILD_PREFIX.

To Be Continued

In the next blog post I’ll document how to use VSCode inside of the SDK for debugging and development.

May 23, 2024 04:00 AM

May 22, 2024

Frédéric Wang

Flygskam and o caminho da Corunha

Prolegomenon

Early next June, I’m traveling from Paris to A Coruña for the Web Engines Hackfest and other internal Igalia events. In recent years I’ve done it by train, as previously mentioned. Some colleagues at Igalia were curious about it, so I decided to write this blog post, investigating possible ways to do it from various European places. I hope this can also motivate more people at Igalia and beyond who are still hesitant to give up alternatives with a heavier carbon footprint.

In addition to various trip planners, I’ve also used this nice map from Wikipedia, which gives a good overview of high-speed rail in Europe:

I also sought advice from Nicolò Ribaudo who is quite familiar with train traveling and provided useful recommendations. In particular, he mentioned the ÖBB trip planner, which seems quite efficient combining trains from multiple operators.

I’ve focused on big European cities (with airports) that are close to Spain, but this is definitely not exhaustive. There is probably a lot more to discuss beyond trip planning, but hopefully this can be a good starting point.

Paris as a departure or connection

Based on my experience traveling from Paris, I found these direct trains between big cities:

  • Renfe offers several AVE and Alvia trains traveling every day between A Coruña and Madrid, with a duration of 3h30-4h 1. There are two important things to note:
    1. These local trains are only available for sale maybe 2 months in advance at best.
    2. These trains are connecting to Madrid Chamartín which is maybe 30 minutes away from Madrid Atocha by the Madrid metro.
  • Renfe also proposes even more options (say one each half hour during the day) between Barcelona and Madrid Atocha, with a duration of 2h30-3h 2.
  • The SNCF proposes two or three direct trains during the day between Paris Gare de Lyon and Barcelona, with a duration of 6h30-7h. If you are coming from Paris and want to take a train to Madrid, you will likely have to cross the station and pass some x-ray baggage scanner, so be sure to keep enough time for the connection.
  • The Eurostar offers several options during the day to connect Paris with cities below. They connect to Gare du Nord or Gare de l’Est, which are very close to each other but ~30 minutes away from Gare de Lyon by public transport.

Personally, I’m doing this one-day-and-half trip (inbound trip is similar):

  1. Take the train from Paris (9:42 AM) to Barcelona (4:33 PM).
  2. Keep enough time for the Barcelona Sants connection.
  3. Take a train from Barcelona to Madrid in the evening.
  4. Stay one night in Madrid.
  5. Take a train from Madrid to A Coruña in the morning.

From London, Amsterdam, Brussels, Berlin one could instead do a two-days trip:

  1. Travel to Paris in the morning 3.
  2. Keep enough time for the Paris connection.
  3. Take the train from Paris (2:42PM) to Barcelona (9:27PM).
  4. Stay one night in Barcelona.
  5. Travel from Barcelona to Madrid Atocha.
  6. Keep enough time for the Madrid Metro connection.
  7. Travel from Madrid Chamartín to A Coruña.

I also looked at the trip with the minimum number of connections to go to Barcelona from big cities in Switzerland, and a similar itinerary is possible. See later for alternatives.

Finally, Nicolò mentioned that ÖBB recently started running a night train from Berlin to Paris, which you can probably use to do a similar trip as mine.

Estimate of CO2 emissions

In order to estimate CO2 emission for the trips suggested in the previous section, I compiled information from different sources:

There would be a lot to discuss about the methodology but the goal here is only to give a rough idea. Anyway, below is an estimate of kilograms of CO2 per passenger for some of the train trips previously mentioned:

Route                 Eurostar   SNCF   Ecopassenger 4   Ecotree
Berlin ↔ Cologne      -          -      17-19            1-3
Cologne ↔ Paris       5.2        5.2    7.4              1-3
London ↔ Paris        2.4        2.4    1.7              1-3
Brussels ↔ Paris      1.6        1.8    1.8              1-2
Amsterdam ↔ Paris     2.6        2.9    9.3-9.5          1-3
Paris ↔ Barcelona     -          3.8    -                1-6
Barcelona ↔ Madrid    -          -      17-20            1-6
Madrid ↔ A Coruña     -          -      -                1-3

The best/worst cases are compiled into the following table and compared with a flight to A Coruña (with a connection via Madrid) as calculated by Ecotree. If we follow these data, a train from Berlin, London, Paris, Brussels or Amsterdam will at worst emit around 50kg of CO2 per passenger, representing a reduction of at least around 90% compared to using a plane:

Departure    Flight via Madrid (Ecotree)   Train 5    CO2 reduction
Berlin       448                           6-55.4     ≥87%
London       388                           4-32       ≥91%
Brussels     396                           4-30.8     ≥92%
Amsterdam    396                           4-38.5     ≥90%
Paris        368                           3-29       ≥92%
Madrid       244                           1-3        ≥98%

More cities and trains

Disclaimer: This section is essentially based on information provided by Nicolò and what I found on internet.

In addition to the SNCF train previously mentioned, Renfe proposes two trains that are quite important for traveling from France to Spain:

  • One train between Lyon and Barcelona per day (duration ~5h)
  • One train between Marseille and Madrid per day (duration ~8h)

From Switzerland, you can pass through the south of France, using more trains/connections to arrive in Barcelona faster than with what was previously proposed. For example, taking a local train from Genève to Lyon followed by the Lyon to Barcelona train mentioned above can be done in around 7h. From Zurich, with connections at Genève and Lyon, it takes around 10h, as opposed to 12h with a single connection in Paris.

From the Netherlands, Belgium or Germany it makes sense to consider the night trains from Paris to the border with Spain. Those trains do not have a fixed schedule, but vary depending on the weekday. Most of them arrive in Portbou, and from there you can take a regional train to Barcelona. Some of them arrive at Latour-de-Carol, and from there it’s three hours on a regional train to Barcelona. In any case, you’ll be at the border early enough that it’s possible to then arrive in A Coruña in the afternoon or evening. Rarely, the night train arrives at the border on the west coast, and continuing from there to A Coruña with the regional trains that go along the northern coast might be a good experience.

From Belgium it’s also possible to take a TGV from Brussels to Lyon, and then from there take the train to Barcelona. This avoids a stressful connection in Paris, where you need to move between the two stations Gare du Nord and Gare de Lyon.

From Italy, the main trouble is dealing with connections. The Turin–Lyon high-speed railway construction may help in the future, but it’s not running yet. The alternatives are either to go along the coast through Genova-Ventimiglia-Nice, or to take the Eurocity from Milan to Switzerland and finally get to France.

From Portugal, which is so close geographically and culturally to Galicia, we could think there should be an easy way to travel to A Coruña. But apparently neither Comboios de Portugal nor Renfe provides any direct train:

  • From Porto, the ÖBB trip planner suggests a connection at Vigo-Guixar for a total duration of 6h30. As a comparison, the website of the Web Engines Hackfest indicates a 3-hour trip by car and a 5-hour trip by bus.
  • Between Lisbon and Porto, Comboios de Portugal seems to propose trains taking 2h30-3h. You can combine that with the other trains to do the trip to A Coruña with one night in Porto or Vigo.

Last but not least, A Coruña railway station is a quarter-of-an-hour walk from the Igalia office and a three-quarters-of-an-hour walk from Palexco (the Web Engines Hackfest’s venue). This is more convenient than the A Coruña airport, which is around 10 km away from A Coruña.

Notes and references

  • Flygskam: anti-flying social movement started in Sweden, with the aim of reducing the environmental impact of aviation.
  • O Caminho de Santiago: The Way of St. James, a network of pilgrims’ ways leading to Santiago de Compostela (whose railway station you will likely stop at if you take a train to A Coruña).
  1. Incidentally, people travelling from far away are unlikely to find a direct flight to the A Coruña airport. In that case, using local trains from bigger airports like Madrid or Porto may be a better option. 

  2. Direct trains between A Coruña and Barcelona seem to be rarer, slower and no longer available as a night train. So a connection or night stay in Madrid seems the best option.

  3. Deutsche Bahn offers a lot of Berlin to Cologne trains per day with a duration of 4h-4h30, including early/late trains or night trains, that you can combine with the Cologne-Paris Eurostar. Deutsche Bahn also offers (non-direct) ICE trains to go from Paris to Berlin. 

  4. Ecopassenger gives information per train, so I provided some lower/upper bounds based on different trains.

  5. Based on the previous data from Eurostar, SNCF, Ecopassenger, Ecotree, trying to find the lowest/highest sum for each individual segment. 

May 22, 2024 10:00 PM

Andy Wingo

growing a bootie

Following on last week’s egregious discussion of the Hoot Scheme-to-WebAssembly compiler bootie, today I would like to examine another axis of boot, which is a kind of rebased branch of history: not the hack as it happened, but the logic inside the hack, the structure of the built thing, the history as it might have been. Instead of describing the layers of shims and props that we used while discovering what we were building, let’s look at how we would build Hoot again, if we had to.

I think many readers of this blog will have seen Growing a Language, a talk / performance art piece in which Guy L. Steele—I once mentioned to him that Guy L. was one of the back-justifications for the name Guile; he did not take it well—takes the set of monosyllabic words as primitives and builds up a tower of terms on top, bootstrapping a language as he goes. I just watched it again and I think it holds up, probably well enough to forgive the superfluous presence of the gender binary in the intro; ideas were different in the 1900s.

It is in the sense of that talk that I would like to look at growing a Hoot: how Hoot defines nouns and verbs in terms of smaller, more primitive terms: terms in terms of terms.

[Graph: dependencies among the libraries shipped with Hoot, with (hoot primitives) at the root, the various (hoot ...) modules built on top of it, then the R7RS (scheme ...) libraries, (guile) and Fibers.]

If you are reading this on the web, you should see above a graph of dependencies among the 50 or so libraries that are shipped as part of Hoot. (Somehow I doubt that a feed reader will plumb through the inline SVG, but who knows.) It’s a bit of a mess, but still I think it’s a useful illustration of a number of properties of how the Hoot language is grown from small to large. Click on any box to visit the source code for that module.

the root of the boot

Firstly, let us note that the graph is not a forest: it is a single tree. There is no module that does not depend (possibly indirectly) on (hoot primitives). This is because there are no capabilities that Hoot libraries can access without importing them, and the only way into the Hootosphere from outside is via the definitions in the primitives module.

So what are these definitions, you might ask? Well, these are the “well-known” bindings, for example + for which the compiler might have some special understanding, the sort of binding that gets translated to a primitive operation at the compiler IR level. They are used in careful ways by the modules that use (hoot primitives) to ensure that their uses are all open-coded by the compiler. (“Open coding” is inlining. But inlining to me implies that the whole implementation is inlined, with no slow-path callouts, whereas open coding implies to me that it’s the compiler that knows what the op does and may or may not inline the actual asm.)

But, (hoot primitives) also exposes some other definitions, for example define and let and lambda and all that. Scheme doesn’t have keywords in the sense that Python has def and with and such: there is no privileged way to associate a name with its meaning. It is in this sense that it is impossible to avoid (hoot primitives): the most simple (define x 42) depends on the lexical meaning of define, which is provided by the primitives module.

Syntax definitions are an expander construct; they are not present at run-time. Using a syntax definition causes the expander to invoke code, and the expander runs on the host system, which is Guile and not WebAssembly. So, syntax definitions belong to the host. This goes also for some first-order definitions such as syntax->datum and so on, which are only used in syntax expanders; these definitions are plumbed through (hoot primitives), but can only ever be used by macro definitions, which run on the meta-level.

(Is this too heavy? Allow me to lighten the mood: when I was 22 or so and working in Namibia, I somehow got an advance copy of Notes from the Metalevel. I was working on algorithmic music synthesis, and my chief strategy was knocking hubris together with itself, as one does. I sent the author a bunch of uninvited corrections to his book. I think it was completely unwelcome! Anyway, moral of the story, at 22 you get a free pass to do whatever you want, and come to think of it, now that I am 44 I think I should get some kind of hubris loyalty award or something.)

powerful primitives

So, there are expand-time primitives and run-time primitives. The expander knows about expand-time primitives and the compiler knows about run-time primitives. One particularly powerful primitive is %inline-wasm, which takes an inline snippet of WebAssembly as an s-expression and applies it to a number of arguments passed at run-time. Consider make-bytevector:

(define* (make-bytevector len #:optional (init 0))
  (%inline-wasm
   '(func (param $len i32) (param $init i32)
      (result (ref eq))
      (struct.new
       $mutable-bytevector
       (i32.const 0)
       (array.new $raw-bytevector
                  (local.get $init)
                  (local.get $len))))
   len init))

We have an inline snippet of wasm that makes a $mutable-bytevector. It passes 0 as the hash field, meaning that the hashq of this value will be lazily initialized, and the contents are a new array of a given size and initial value. Inputs will be unboxed to the appropriate type (two i32s in this case), and likewise with outputs; here we produce the universal (ref eq) representation.

The nice thing about %inline-wasm is that the compiler didn’t have to be taught about make-bytevector: this definition suffices, because %inline-wasm can access a number of lower-level capabilities.

dual denotations

But as we learned in my notes on whole-program compilation, any run-time definition is available at compile-time, if it is reachable from a syntax transformer. So this definition above isn’t quite sufficient; we can’t call make-bytevector as part of a procedural macro, which we might want to do. What we need instead is to provide one definition when residualizing wasm at run-time, and another when loading a module at expand-time.

In Hoot we do this with cond-expand, where we expand to %inline-wasm when targeting Hoot, and... what, precisely, at expand-time? Really we need to make a Guile bytevector, so in this sort of case, we end up having to include a run-time make-bytevector definition in the (hoot primitives) module. This happens wherever we end up using %inline-wasm.
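
A simplified sketch of that shape (hedged: this is not Hoot’s exact source, and the real definitions live behind the module layering) could be:

(cond-expand
  (hoot
   ;; When targeting Hoot, residualize the inline wasm shown earlier.
   (define (make-bytevector len init)
     (%inline-wasm
      '(func (param $len i32) (param $init i32)
         (result (ref eq))
         (struct.new $mutable-bytevector
                     (i32.const 0)
                     (array.new $raw-bytevector
                                (local.get $init)
                                (local.get $len))))
      len init)))
  (guile
   ;; At expand time we are running on the host, so a plain Guile
   ;; bytevector will do.
   (define (make-bytevector len init)
     ((@ (rnrs bytevectors) make-bytevector) len init))))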

building to guile

Returning to our graph, we see that there is a red-colored block for Hoot modules, a teal-colored layer on top for those modules that are defined by R7RS, a few oddballs, and then (guile) and Fibers built on top. The (guile) module provides a shim that implements Guile’s own default set of bindings, allowing Guile modules to be loaded on a Hoot system. (guile) is layered on top of the low-level Hoot libraries, and out of convenience, on top of the various R7RS libraries as well, because it was easier to remember what was where in R7RS than in our ad-hoc nest of Hoot internal libraries.

Having (guile) lets Guile hackers build on Hoot. It’s still incomplete but I think eventually it will be capital-G Good. Even for a library that needed more porting like Fibers (Hoot has no threads so much of the parallel concurrent ML implementation can be simplified, and we use an event loop from the Wasm run-time instead of an epoll-based scheduler), it was still pleasant to be able to use define-module and keyword arguments and all of that.

next layers

I mentioned that this tower of terms is incomplete, and so that is one of the next work items for Hoot: complete support for Guile’s run-time library. At that point we’d probably want to merge it into Guile, but that is another topic.

But let’s leave that for another day; until then, happy hacking!

by Andy Wingo at May 22, 2024 08:16 AM

May 21, 2024

Brian Kardell

Improving the WPT Dashboard

Improving the WPT Dashboard

Thoughts on things I'd like to see as part of the WPT dashboard.

In my last post I dug into the data behind wpt.fyi's Browser Specific Failures chart (below) and the site's general reporting capabilities.

I suggested that linking to data on specifically what failed (at least what is queryable) would probably be really helpful (perhaps some kind of "understanding this chart" link as well).

While these aren't part of the design today, I think this is mainly because the primary audience of that chart was originally mainly the vendors themselves. It was intended to allow for certain simple kinds of tracking, planning and prioritization. For example "Let's set a goal to not let the failure exceed such and such threshold" or "Let's aim to lower the failures by X this quarter". It wasn't critical to link to the tests because the audience knew how to interrogate the data - the purpose was just to get a quantifiable number you can easily report on.

But, now we see this chart shared a lot and it's pretty clear that people are curious so we should probably adjust it for the wider audience.

Additionally though, that's also only a single view of the data, and I'd like to argue that we could make some other improvements too.

Prioritization

BSF made the observation that if we can identify a test that fails in only 1 browser, then that browser's team can easily prioritize something that has significant impact. That browser is the boat anchor holding things back. Except, it's not quite that cut and dried in reality.

Real management of software projects is hard. I think that anyone who's worked on software projects can relate to this, at least a bit, if we take some time to consider all of the things that go into choosing how to apply our limited resources. Obviously, not all failures are equal - especially when we're talking about projects which are a quarter of a century old. The reality is that all of that decisioning and prioritization is happening independently across different organizations, with different views on the web, different budgets, different legacy challenges, etc.

That's where I think there are some things to learn from Interop.

What I learned from Interop Is...

If you think about it, Interop is about trying to achieve thematically, basically the same thing as BSF: Make more things "green across the board". But it is a very different thing than BSF.

I've really learned a lot from the process of helping organize Interop every year about why this takes so long to happen naturally. There are so many limits and signals and opinions. One of the things we do as part of the process is to take all of the submissions and independently order them in terms of what we think are their priorities. There are 6 organizations doing that: Apple, Bocoup, Google, Igalia, Microsoft and Mozilla. How many do you think chose the same #1? The answer is 0.

It really highlights how waiting for all of the stars to align by chance winds up often being a painfully slow process and full of problems.

However, a huge part of Interop is dedicated to dealing with the stuff BSF doesn't really consider - aligning agreement on:

  1. what features are most important
  2. which tests regarding those are valid/important
  3. are all the spec questions really answered?
  4. is this actually (hopefully) achievable in the next year?

In that, I believe it has been extremely successful in creating way more "green across the board" than anything else. I think this is true beyond even what is officially part of Interop, because we're all able to kind of discuss and see where others are probably going to invest in work because things that were important for them didn't make the cut.

In a way, each year is sort of like doing what we used to do with "CSS2" and "HTML4"... Creating a more focused discussion that is the goal floor, not the ceiling.

It's not enough... Sure, I believe this gives us much better results by helping alignment. I think this is obvious given how rapidly and smoothly we've found so much high-quality alignment in recent years. However, there's something I want to stress in all of this: Choosing what to prioritize is also inherently choosing what to collectively deprioritize. It is inevitable because at the end of the day there is just too much.

The only real solution to this problem is wider investment in the platform and, ultimately, almost certainly, changing how we fund it.

Alignment vs Passing

Interop also showed us that a simple, individual pass/fail can be incomplete and misleading. If 3 browsers reach a point of passing 50% of measured tests, the number of tests that pass in all browsers might actually be 0, as illustrated in the table below...

[Table: an example where Chrome, Firefox and WebKit each pass a different 50% of the tests. Lots of tests pass, but not even one passes universally!]

In fact, here's a real world example of exactly this kind of misleading view in a set of SVG tests. If we look at the numbers across the bottom:

  • chrome: 166 / 191
  • edge: 166 / 191
  • firefox: 175 / 191
  • safari: 132 / 191

It's not terrible if you're only looking at those numbers. But, if you scroll down through that table you'll see that there are ragged failures all over that. In fact, only 52 of 189 are "green across the board"!

We can only realistically solve this by having a more holistic view and working together. BSF is just the slice that is theoretically actionable individually, not everything that matters.

What about a focus on Universally Passing?

In the Interop project we track the difference above as its own data point: The Interop number, and we put it as a separate column in the test tables:

[Image: a table containing a column for each individual browser's scores on different features, and a column for the number that pass in all]
The interop column reports how many tests pass on all of the tracked browsers

Similarly, we track it over time:

[Image: a graph showing the scores of each browser over time, as well as an "interop line"]

Could we learn something from this? Wouldn't something like that be great to have in general?

For example, in the wpt.fyi tables? Now, it couldn't look just like that because those numbers are all in percentages, and this only really works because the interop process carefully sets a governance process for defining/agreeing to what the tests are. It would be enough to add a column to the table in the same form, something like this:

That might help us uncover situations like the SVG one above and present opportunities, like Interop, for us to collectively decide to try to address them.

Similarly, we could track it over time - sort of the opposite of BSF. We want to see the simple number of subtests passing in browsers, and it should always be going up (even as new tests are added, no existing ones should stop passing - those are just more opportunities to go up). Further, ideally the Universally Passing number shouldn't drift significantly further away from that over time, or we're making less of the platform universal. That is, you could see, over time, when we are cooperating better and when we are not.

We do better when we are. In my mind, that's an explicit goal, and this would be a view into it.

May 21, 2024 04:00 AM

May 20, 2024

Frédéric Wang

Time travel debugging of WebKit with rr

Introduction

rr is a debugging tool for Linux that was originally developed by Mozilla for Firefox. It has long been adopted by Igalia and other web platform developers for Chromium and WebKit too. Back in 2019, there were breakout sessions on this topic at the Web Engines Hackfest and BlinkOn.

For WebKitGTK, the Flatpak SDK provides a copy of rr, but recently I was unable to use the instructions on trac.webkit.org. Fortunately, my colleague Adrián Pérez suggested using a direct build without flatpak or the bubblewrap sandbox, and that indeed solved my problem. I thought it might be interesting to share this information with others, so I decided to write this blog post.

Disclaimer: The build instructions below may be imperfect, will likely become outdated, and are in any case not a replacement for the official ones for WebKitGTK development. Use them at your own risk!

CMake configuration

The approach that worked for me was thus to perform a direct build from my system. I came up with the following configuration step:

cmake -S. -BWebKitBuild/Release \
   -DCMAKE_BUILD_TYPE=Release \
   -DCMAKE_INSTALL_PREFIX=$HOME/WebKit/WebKitBuild/install/Release \
   -GNinja -DPORT=GTK -DENABLE_BUBBLEWRAP_SANDBOX=OFF \
   -DDEVELOPER_MODE=ON -DDEVELOPER_MODE_FATAL_WARNINGS=OFF \
   -DENABLE_TOOLS=ON -DENABLE_LAYOUT_TESTS=ON

where:

  • The -B option specifies the build directory, which is traditionally called WebKitBuild/ for the WebKit project.
  • CMAKE_BUILD_TYPE specifies the build type, e.g. optimized release builds (Release, corresponding to --release for the official script) or debug builds with assertions (Debug, corresponding to --debug) 1.
  • CMAKE_INSTALL_PREFIX specifies the installation directory, which I place inside WebKitBuild/install/ 2.
  • The -G option specifies the build system generator. I used Ninja, which is the default for the official script too.
  • -DPORT=GTK is for building WebKitGTK. I haven’t tested rr with other Linux ports.
  • -DENABLE_BUBBLEWRAP_SANDBOX=OFF was suggested by Adrián. The bubblewrap sandbox probably does not make sense without flatpak, so it should be safe to disable it anyway.
  • I extracted the other -D flags from the official script, trying to stay as close as possible to what it provides for WebKit development (being able to run layout tests, building the Tools/, ignoring fatal warnings, etc).

Needless to say, the advantage of using flatpak is that it automatically downloads and installs all the required dependencies. But if you do your own build, you need to figure out what they are and perform the setup manually. Generally, this is straightforward using your distribution’s package manager, but there can be some tricky exceptions 3.

While we are still at the configuration step, I believe it’s worth sharing two more tricks for WebKit developers:

  • You can use -DENABLE_SANITIZERS=address to produce Asan builds or builds with other sanitizers.
  • You can use -DCMAKE_CXX_FLAGS="-DENABLE_TREE_DEBUGGING" in release builds if you want to get access to the tree debugging functions (ShowRenderTree and the like). This flag is turned on by default for debug builds (see the example just after this list).
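
As a quick sketch of that second trick (assuming a build where the flag is enabled; in current WebKit the helpers are declared as showRenderTree and friends, callable directly from the debugger), once execution is stopped inside a rendering method you can dump the render tree from the prompt:

(gdb) call showRenderTree(this)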

Building and running WebKit

Once the configure step is successful, you can build and install WebKit using the following CMake command 2.

cmake --build WebKitBuild/Release --target install

When that operation completes, you should be able to run MiniBrowser with the following command:

LD_LIBRARY_PATH=WebKitBuild/install/Release/lib ./WebKitBuild/Release/bin/MiniBrowser

For WebKitTestRunner, some extra environment variables are necessary 2:

TEST_RUNNER_INJECTED_BUNDLE_FILENAME=$HOME/WebKit/WebKitBuild/Release/lib/libTestRunnerInjectedBundle.so LD_LIBRARY_PATH=WebKitBuild/install/Release/lib ./WebKitBuild/Release/bin/WebKitTestRunner filename.html

You can also use the official scripts, Tools/Script/run-minibrowser and Tools/Script/run-webkit-tests. They expect some particular paths, but a quick workaround is to use a symbolic link:

ln -s $HOME/WebKit/WebKitBuild $HOME/WebKit/WebKitBuild/GTK

Using rr for WebKit debugging

rr is generally easily installable from your distribution’s package manager. However, as stated on the project wiki page:

Support for the latest hardware and kernel features may require building rr from Github master.

Indeed, using the source has always worked best for me to avoid mysterious execution failures when starting the recording 4.

If you are not familiar with rr, I strongly invite you to take a look at the overview on the project home page or at some of the references I mentioned in the introduction. In any case, the first step is to record a trace by passing the program and arguments to rr. For example, to record a trace for MiniBrowser:

LD_LIBRARY_PATH=WebKitBuild/install/Debug/lib rr ./WebKitBuild/Debug/bin/MiniBrowser https://www.igalia.com/

After the program exits, you can replay the recorded trace as many times as you want. For hard-to-reproduce bugs (e.g. non-deterministic issues or involving a lot of manual steps), that means you only need to be able to record and reproduce the bug once and then can just focus on debugging. You can even turn off your machine after hours of exhausting debugging, then continue the effort later when you have more time and energy! The trace is played in a deterministic way, always using the same timing and pointer addresses. You can use most gdb commands (to run the program, interrupt it, and inspect data), but the real power comes from new commands to perform reverse execution!

Before coming to that, let’s explain how to handle programs with multiple processes, which is the case for WebKit and modern browsers in general. After you have recorded a trace, you can display the pids of all recorded processes using the rr ps command. For example, we can see in the following output that the MiniBrowser process (pid 24103) actually forked three child processes, including the Network Process (pid 24113) and the Web Process (pid 24116):

PID     PPID    EXIT    CMD
24103   --      0       ./WebKitBuild/Debug/bin/MiniBrowser https://www.igalia.com/
24113   24103   -9      ./WebKitBuild/Debug/bin/WebKitNetworkProcess 7 12
24115   24103   1       (forked without exec)
24116   24103   -9      ./WebKitBuild/Debug/bin/WebKitWebProcess 15 15

Here is a small debugging session similar to the single-process example from Chromium Chronicle #13 5. We use the option -p 24116 to attach the debugger to the Web Process and -e to start debugging from where it exited:

rr replay -p 24116 -e
(rr) break RenderFlexibleBox::layoutBlock
(rr) rc # Run back to the last layout call
Thread 2 hit Breakpoint 1, WebCore::RenderFlexibleBox::layoutBlock (this=0x7f66699cc400, relayoutChildren=false) at /home/fred/src-obj/WebKit/Source/WebCore/rendering/RenderFlexibleBox.cpp:420
(rr) # Inspect anything you want here. To find the previous Layout call on this object:
(rr) cond 1 this == 0x7f66699cc400
(rr) rc
Thread 2 hit Breakpoint 1, WebCore::RenderFlexibleBox::layoutBlock (this=0x7f66699cc400, relayoutChildren=false) at /home/fred/src-obj/WebKit/Source/WebCore/rendering/RenderFlexibleBox.cpp:420
420     {
(rr) delete 1
(rr) watch -l m_style.m_nonInheritedFlags.effectiveDisplay # Or find the last time the effective display was changed
Thread 4 hit Hardware watchpoint 2: -location m_style.m_nonInheritedFlags.effectiveDisplay

Old value = 16
New value = 0
0x00007f6685234f39 in WebCore::RenderStyle::RenderStyle (this=0x7f66699cc4a8) at /home/fred/src-obj/WebKit/Source/WebCore/rendering/style/RenderStyle.cpp:176
176     RenderStyle::RenderStyle(RenderStyle&&) = default;

rc is an abbreviation for reverse-continue and continues execution backward. Similarly, you can use the reverse-next, reverse-step and reverse-finish commands, or their abbreviations. Notice that the watchpoint change is naturally reversed compared to normal execution: the old value (sixteen) is the one after initialization, while the new value (zero) is the one before initialization!

Restarting playback from a known point in time

rr also has a concept of “event” and associates a number to each event it records. They can be obtained by the when command, or printed to the standard output using the -M option. To elaborate a bit more, suppose you add the following printf in RenderFlexibleBox::layoutBlock:

@@ -423,6 +423,8 @@ void RenderFlexibleBox::layoutBlock(bool relayoutChildren, LayoutUnit)
     if (!relayoutChildren && simplifiedLayout())
         return;

+    printf("this=%p\n", this);
+

After building, recording and replaying again, the output should look like this:

$ rr -M replay -p 70285 -e # replay with the new PID of the web process.
...
[rr 70285 57408]this=0x7f742203fa00
[rr 70285 57423]this=0x7f742203fc80
[rr 70285 57425]this=0x7f7422040200
...

Each printed output is now annotated with two numbers in brackets: a PID and an event number. So in order to restart from when an interesting output happened (let’s say [rr 70285 57425]this=0x7f7422040200), you can now execute run 57425 from the debugging session, or equivalently:

rr replay -p 70285 -g 57425

Older traces and parallel debugging

Another interesting thing to know is that traces are stored in ~/.local/share/rr/ and you can always specify an older trace to the rr command e.g. rr ps ~/.local/share/rr/MiniBrowser-0. Be aware that the executable image must not change, but you can use rr pack to be able to run old traces after a rebuild, or even to copy traces to another machine.
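
For example, here is a minimal sketch of packing the latest MiniBrowser trace so that it remains replayable after a rebuild, and copying it to another machine (the trace directory name and the host are placeholders; yours will differ):

rr pack ~/.local/share/rr/MiniBrowser-0
scp -r ~/.local/share/rr/MiniBrowser-0 other-machine:~/.local/share/rr/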

To be honest, most of the time I’m just using the latest trace. However, one thing I’ve sometimes found useful is what I would call the “parallel debugging” technique. Basically, I’m recording one trace for a testcase that exhibits the bug and another one for a very similar testcase (e.g. with one CSS property difference) that behaves correctly. Then I replay the two traces side by side, comparing them to understand where the issue comes from and what can be done to fix it.

The usage documentation also provides further tips, but this should be enough to get you started with time travel debugging in WebKit!

  1. RelWithDebInfo build type (which yields an optimized release build with debug symbols) might also be interesting to consider in some situations, e.g. debugging bugs that reproduce in release builds but not in debug builds. 

  2. Using an installation directory might not be necessary, but without that, I had trouble making the whole thing work properly (wrong libraries loaded or libraries not found).

  3. In my case, I chose the easiest path of disabling some features, namely -DUSE_JPEGXL=OFF, -DUSE_LIBBACKTRACE=OFF, and -DUSE_GSTREAMER_TRANSCODER=OFF.

  4. Incidentally, you are likely to get an error saying that perf_event_paranoid is required to be at most 1, which you can force using sudo sysctl kernel.perf_event_paranoid=1

  5. The equivalent example would probably have been to watch for the previous style change with watch -l m_style, but this was exceeding my hardware watchpoint limit, so I narrowed it down to a smaller observation scope. 

May 20, 2024 10:00 PM

May 16, 2024

Andy Wingo

on hoot, on boot

I realized recently that I haven’t been writing much about the Hoot Scheme-to-WebAssembly compiler. Upon reflection, I have been too conscious of its limitations to give it verbal tribute, preferring to spend each marginal hour fixing bugs and filling in features rather than publicising progress.

In the last month or so, though, Hoot has gotten to a point that pleases me. Not to the point where I would say “accept no substitutes” by any means, but good already for some things, and worth writing about.

So let’s start today by talking about bootie. Boot, I mean! The boot, the boot, the boot of Hoot.

hoot boot: temporal tunnel

The first axis of boot is time. In the beginning, there was nary a toot, and now, through boot, there is Hoot.

The first boot of Hoot was on paper. Christine Lemmer-Webber had asked me, ages ago, what I thought Guile should do about the web. After thinking a bit, I concluded that it would be best to avoid compromises when building an in-browser Guile: if you have to pollute Guile to match what JavaScript offers, you might as well program in JavaScript. JS is cute of course, but Guile is a bit different in some interesting ways, the most important of which is control: delimited continuations, multiple values, tail calls, dynamic binding, threads, and all that. If Guile’s web bootie doesn’t pack all the funk in its trunk, probably it’s just junk.

So I wrote up a plan, something to which I attributed the name tailification. In retrospect, this is simply a specific flavor of a continuation-passing-style (CPS) transmutation, late in the compiler pipeline. I’ll elocute more in a future dispatch. I did end up writing the tailification pass back then; I could have continued to target JS, but it was sufficiently annoying and I didn’t prosecute. It sat around unused for a few years, until Christine’s irresistible charisma managed to conjure some resources for Hoot.

In the meantime, the GC extension for WebAssembly shipped (woot woot!), and to boot Hoot, I filled in the missing piece: a backend for Guile’s compiler that tailified and then translated primitive operations to snippets of WebAssembly.

It was, well, hirsute, but cute and it did compute, so we continued to boot. From this root we grew a small run-time library, written in raw WebAssembly, used for slow-paths for the various primitive operations that are part of Guile’s compiler back-end. We filled out Guile primcalls, in minute commits, growing the WebAssembly runtime library and toolchain as we went.

Eventually we started constituting facilities defined in terms of those primitives, via a Scheme prelude that was prepended to all programs, within a nested lexical environment. It was never our intention though to drown the user’s programs in a sea of predefined bindings, as if the ultimate program were but a vestigial inhabitant of the lexical lake—don’t dilute the newt!, we would often say [ed: we did not]— so eventually when the prelude became unmanageable, we finally figured out how to do whole-program compilation of a set of modules.

Then followed a long month in which I would uproot the loot from the boot: take each binding from the prelude and reattribute it into an appropriate module. User code could import all the modules that suit, as long as they were known to Hoot, but no others; it wasn’t until we added the ability for users to programmatically constitute an environment from their modules that Hoot became a language implementation of any repute.

Which brings us to the work of the last month, about which I cannot be mute. When you have existing Guile code that you want to distribute via the web, Hoot required you transmute its module definitions into the more precise R6RS syntax. Precise, meaning that R6RS modules are static, in a way that Guile modules, at least in absolute terms, are not: Guile programs can use first-class accessors on the module systems to pull out bindings. This is yet another example of what I impute as the original sin of 1990s language development, that modules are just mutable hash maps. You see it in Python, for example: because you don’t know for sure to what values global names are bound, it is easy for any discussion of what a particular piece of code means to end in dispute.

The question is, though, are the semantics of name binding in a language fixed and absolute? Once your language is booted, are its aspects definitively attributed? I think some perfection, in the sense of becoming more perfect or more like the thing you should be, is something to salute. Anyway, in Guile it would be coherent with Scheme’s lexical binding heritage to restitute some certainty as to the meanings of names, at least in a default compilation mode. Lexical binding is, after all, the foundation of the Macro Writer’s Statute of Rights. Of course if you are making a build for development purposes, not to distribute, then you might prefer a build that marks all bindings as dynamic. Otherwise I think it’s reasonable to require the user to explicitly indicate which definitions are denotations, and which constitute locations.

Hoot therefore now includes an implementation of the static semantics of Guile’s define-module: it can load Guile modules directly, and as a tribute, it also has an implementation of the ambient (guile) module that constitutes the lexical soup of modules that aren’t #:pure. (I agree, it would be better if all modules were explicit about the language they are written in—their imported bindings and so on—but there is an existing corpus to accommodate; the point is moot.)

The astute reader (whom I salute!) will note that we have a full boot: Hoot is a Guile. Not an implementation to substitute the original, but more of an alternate route to the same destination. So, probably we should scoot the two implementations together, to knock their boots, so to speak, merging the offshoot Hoot into Guile itself.

But do I circumlocute: I can only plead a case of acute Hoot. Tomorrow, we elocute on a second axis of boot. Until then, happy compute!

by Andy Wingo at May 16, 2024 08:01 PM

May 15, 2024

Manuel A. Fernandez Montecelo

WiFi back-ends in SteamOS 3.6

As of NetworkManager v1.46.0 (and a very long while before that, since 1.12 released in 2018), there are two supported values for wifi.backend: wpa_supplicant (the default) and iwd (experimental).

iwd is still considered experimental after all these years; see the list of issues with the iwd backend, and especially this one.

In SteamOS, however, the default is iwd. It works relatively well in most scenarios; and the main advantage is that, when waking up the device (think of Steam Deck returning from sleep, "suspended to RAM" or S3 sleeping state), it regains network connectivity much faster -- typically 1-2 seconds vs. 5+ for wpa_supplicant.

However, if for some reason iwd doesn't work properly in your case, you can give wpa_supplicant a try.

With steamos-wifi-set-backend command

In 3.6, the back-end can be changed via a command.

3.6 has been available in Main channel since late March and in Beta/Preview for several days, for example 20240509.100.

If you haven't done this before: to be able to run the command, you need to either log in via SSH (not available out of the box), or switch to desktop mode, open a terminal and type the commands there.

Assuming that one wants to change from the default iwd to wpa_supplicant, the command is:

steamos-wifi-set-backend wpa_supplicant

The script tries to ensure that things are in place in terms of some systemd units being stopped and others brought up as appropriate. After a few seconds, if it's able to connect correctly to the WiFi network, the device should get connectivity (e.g. ping works). It is not necessary to re-enter the preferred networks and passwords; NetworkManager feeds the back-ends with the necessary config and credentials.

It is possible to switch back and forth between back-ends by using alternatively iwd and wpa_supplicant as many times as desired, and without restarting the system.

The current back-end as per configuration can be obtained with:

steamos-wifi-set-backend --check

And finally, this is the output of changing to wpa_supplicant and back to iwd (i.e., the output that you should expect):

(deck@steamdeck ~)$ steamos-wifi-set-backend --check
iwd

(deck@steamdeck ~)$ steamos-wifi-set-backend wpa_supplicant
INFO: switching back-end to 'wpa_supplicant'
INFO: stopping old back-end service and restarting NetworkManager,
      networking will be disrupted (hopefully only momentary blip, max 10 seconds)...
INFO: restarting done
INFO: checking status of services ...
      (sleeping for 2 seconds to catch-up with state)
INFO: status OK

(deck@steamdeck ~)$ steamos-wifi-set-backend --check
wpa_supplicant

(deck@steamdeck ~)$ ping -c 3 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=16.4 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=56 time=24.8 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=56 time=16.5 ms

--- 1.1.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 16.370/19.195/24.755/3.931 ms

(deck@steamdeck ~)$ steamos-wifi-set-backend iwd
INFO: switching back-end to 'iwd'
INFO: stopping old back-end service and restarting NetworkManager,
      networking will be disrupted (hopefully only momentary blip, max 10 seconds)...
INFO: restarting done
INFO: checking status of services ...
      (sleeping for 2 seconds to catch-up with state)
INFO: status OK

(deck@steamdeck ~)$ steamos-wifi-set-backend --check
iwd

(deck@steamdeck ~)$ ping -c 3 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=16.0 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=56 time=16.5 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=56 time=21.3 ms

--- 1.1.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 15.991/17.940/21.295/2.382 ms

By hand

⚠️ IMPORTANT ⚠️ : Please do this only if you know what you are doing, you don't fear minor breakage and you are confident that you can recover from the situation if something goes wrong. If unsure, better wait until the channel that you run supports the commands to handle this automatically.

If you are running a channel that still does not have this command, e.g. Stable at the time of writing this, you can achieve the same effect by executing the instructions run by this script by hand.

Note, however, that you might easily lose network connectivity if you make some mistake or something goes wrong. So be prepared to restore things by hand by switching to desktop-mode and having a terminal ready.

Now, the actual instructions. In /etc/NetworkManager/conf.d/wifi_backend.conf, change the value of wifi.backend; for example, comment out the default line with iwd and add a line using wpa_supplicant as the value:

[device]
#wifi.backend=iwd
wifi.backend=wpa_supplicant
wifi.iwd.autoconnect=yes

⚠️ IMPORTANT ⚠️ : Leave the other lines alone. If you don't have experience with the format you can easily cause the file to not be parseable, and then networking could stop working entirely.
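
If you want to double-check that NetworkManager can still parse the merged configuration and has picked up your change, one quick way (not part of the official steps; it assumes the standard NetworkManager binary is available on your image) is to dump the effective configuration and look for the back-end line:

sudo NetworkManager --print-config | grep wifi.backend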

After that, things get more complicated depending on the hardware on which the system runs.

If the device is the SteamDeck, there is a difference between the original LCD and the newer OLED model. With the OLED model, when you switch away from iwd, the network device is typically destroyed, so it's necessary to re-create it. This happens automatically when booting the system, so if the network doesn't come up within 15 seconds of applying the following instructions, the easiest fix is to just restart the whole system.

If you're running SteamOS on models of the SteamDeck that don't exist at the time of writing this, or on other types of computers, the situation can be similar to the OLED or even more complex, depending on the WiFi hardware and drivers. So, again, tread carefully.

In the simplest scenario (the LCD model), execute the following commands as root (or prepend sudo), with OLD_BACKEND being either iwd or wpa_supplicant, whichever one you want to migrate away from:

systemctl stop NetworkManager
systemctl disable --now OLD_BACKEND
systemctl restart NetworkManager

If you have the OLED model and are switching from wpa_supplicant to iwd, that should be enough as well.

But if you have the OLED model and you changed from iwd to wpa_supplicant, as explained above, your system will not be able to connect to the network, and the easiest fix is to restart the system.

However, if you want to try to solve the problem by hand without restarting, then execute these commands as root/sudo:

systemctl stop NetworkManager
systemctl disable --now OLD_BACKEND

if ! iw dev wlan0 info &>/dev/null; then iw phy phy0 interface add wlan0 type station; fi

systemctl restart NetworkManager

In any case, if after trying to change the back-end by hand and rebooting the system is still not able to connect, perhaps the best thing to do is to restore the previous configuration, restart the system again, and wait until your channel is upgraded to include this command.

Change back-ends via UI?

In the future, it is conceivable that the UI will allow changing WiFi back-ends with the help of this command or by other means, but this is not possible at the moment.

by Manuel A. Fernandez Montecelo at May 15, 2024 09:21 AM

May 13, 2024

Andy Wingo

partitioning pitfalls for generational collectors

You might have heard of long-pole problems: when you are packing a tent, its bag needs to be at least as big as the longest pole. (This is my preferred etymology; there are others.) In garbage collection, the long pole is the longest chain of object references; there is no way you can algorithmically speed up pointer-chasing of (say) a long linked-list.

As a GC author, some of your standard tools can mitigate the long-pole problem, and some don’t apply.

Parallelism doesn’t help: a baby takes 9 months, no matter how many people you assign to the problem. You need to visit each node in the chain, one after the other, and having multiple workers available to process those nodes does not get us there any faster. Parallelism does help in general, of course, but doesn’t help for long poles.

You can apply concurrency: instead of stopping the user’s program (the mutator) to enumerate live objects, you trace while the mutator is running. This can happen on mutator threads, interleaved with allocation, via what is known as incremental tracing. Or it can happen in dedicated tracing threads, which is what people usually mean when they refer to concurrent tracing. Though it does impose some coordination overhead on the mutator, the pause-time benefits are such that most production garbage collectors do trace concurrently with the mutator, if they have the development resources to manage the complexity.

Then there is partitioning: instead of tracing the whole graph all the time, try to just trace part of it and just reclaim memory for that part. This bounds the size of a long pole—it can’t be bigger than the partition you trace—and so tracing a graph subset should reduce total pause time.

The usual way to partition is generational, in which the main program allocates into a nursery, and objects that survive a few collections then get promoted into old space. But there may be other partitions, for example to put “large objects” (for example, bigger than a few virtual memory pages) in their own section, to be managed with their own algorithm.

partitions, take one

And that brings me to today’s article: generational partitioning has a failure mode which manifests itself as spurious out-of-memory. For example, in V8, running this small JavaScript program that repeatedly allocates a 1-megabyte buffer grows to consume all memory in the system and eventually panics:

while (1) new Uint8Array(1e6);

This is a funny result, to say the least. Let’s dig in a bit to see why.

First off, note that allocating a 1-megabyte Uint8Array makes a large object, because it is bigger than half a V8 page, which is 256 kB on most systems. There is a backing store for the array that will get allocated into the large object space, and then the JS object wrapper that gets allocated in the young generation of the regular object space.

Let’s imagine the heap consists of the nursery, the old space, and a large object space (lospace, for short). (See the next section for a refinement of this model.) In the loop, we cause allocation in the nursery and the lospace. When the nursery starts to get full, we run a minor collection, to trace the newly allocated part of the object graph, possibly promoting some objects to the old generation.

Promoting an object adds to the byte count in the old generation. You could say that promoting causes allocation in the old-gen, though it might happen just on an accounting level if an object is promoted in place. In V8, old-gen allocation is subject to a limit, and that limit is set by a gnarly series of heuristics. Exceeding the limit will cause a major GC, in which the whole heap is traced.

So, minor collections cause major collections via promotion. But what if a minor collection never promotes? This can happen in request-oriented workflows, in which a request comes in, you handle it in JS, write out the result, and then handle the next. If the nursery is at least as large as the memory allocated in one request, no object will ever survive more than one collection. But in our new Uint8Array(1e6) example above, we are allocating to newspace and the lospace. If we never cause promotion, we will always collect newspace but never trigger the full GC that would also collect the large object space, triggering this out-of-memory phenomenon.

partitions, take two

In general, the phenomenon we are looking for is nursery allocations that cause non-nursery allocations, where those non-nursery allocations will not themselves bring about a major GC.

In our example above, it was a typed array object with an associated lospace backing store, assuming the lospace allocation wouldn’t eventually trigger GC. But V8’s heap is not exactly like our model, for one because it actually has separate nursery and old-generation lospaces, and for two because allocation to the old-generation lospace does count towards the old-gen allocation limit. And yet, that simple loop does still blow through all memory. So what is the deal?

The answer is that V8’s heap now has around two dozen spaces, and allocations to some of those spaces escape the limits and allocation counters. In this case, V8’s sandbox makes the link between the JS typed array object and its backing store pass through a table of indirections. At each allocation, we make an object in the nursery, allocate a backing store in the nursery lospace, and then allocate a new entry in the external pointer table, storing the index of that entry in the JS object. When the object needs to get its backing store, it dereferences the corresponding entry in the external pointer table.

Our tight loop above would therefore cause an allocation (in the external pointer table) that itself would not hasten a major collection.

Seen this way, one solution immediately presents itself: maybe we should find a way to make external pointer table entry allocations trigger a GC based on some heuristic, perhaps the old-gen allocated bytes counter. But then you have to actually trigger GC, and this is annoying and not always possible, as EPT entries can be allocated in response to a pointer write. V8 hacker Michael Lippautz summed up the issue in a comment that can be summarized as “it’s gnarly”.

In the end, it would be better if new-space allocations did not cause old-space allocations. We should separate the external pointer table (EPT) into old and new spaces. Because there is at most one “host” object that is associated with any EPT entry—if it’s zero objects, the entry is dead—then the heuristics that apply to host objects will do the right thing with regards to EPT entries.

A couple weeks ago, I landed a patch to do this. It was much easier said than done; the patch went through upwards of 40 revisions, as it took me a while to understand the delicate interactions between the concurrent and parallel parts of the collector, which were themselves evolving as I was working on the patch.

The challenge mostly came from the fact that V8 has two nursery implementations that operate in different ways.

The semi-space nursery (the scavenger) is a parallel stop-the-world collector, which is relatively straightforward... except that there can be a concurrent/incremental major trace in progress while the scavenger runs (a so-called interleaved minor GC). External pointer table entries have mark bits, which the scavenger collector will want to use to determine which entries are in use and which can be reclaimed, but the concurrent major marker will also want to use those mark bits. In the end we decided that if a major mark is in progress, an interleaved collection will neither mark nor sweep the external pointer table nursery; a major collection will finish soon, and reclaiming dead EPT entries will be its responsibility.

The minor mark-sweep nursery does not run concurrently with a major concurrent/incremental trace, which is something of a relief, but it does run concurrently with the mutator. When we promote a page, we also need to promote all EPT entries for objects on that page. To keep track of which objects have external pointers, we had to add a new remembered set, built up during a minor trace, and cleared when the page is swept (also lazily / concurrently). Promoting a page iterates through that set, evacuating EPT entries to the old space. This is additional work during the stop-the-world pause, but hopefully it is not too bad.

To be honest I don’t have the performance numbers for this one. It rides the train this week to Chromium 126 though, and hopefully it fixes this problem in a robust way.

partitions, take three

The V8 sandboxing effort has sprouted a number of indirect tables: external pointers, external buffers, trusted objects, and code objects. More recently, to better integrate with compressed pointers, there is also a table for V8-to-managed-C++ (Oilpan) references. I expect the Oilpan reference table will soon grow a nursery along the same lines as the regular external pointer table.

In general, if you have a generational partition of your main object space, it would seem that you almost always need a generational partition of every other space. Otherwise either you cause new allocations to occur in what is effectively an old space, perhaps needlessly hastening a major GC, or you forget to track allocations in that space, leading to a memory leak. If the generational hypothesis applies for a wrapper object, it probably also applies for any auxiliary allocations as well.

fin

I have a few cleanups remaining in this area but I am relieved to have landed this patch, and pleased to have spent time under the hood of V8’s collector. Many many thanks to Samuel Groß, Michael Lippautz, and Dominik Inführ for their review and patience. Until the next cycle, happy allocating!

by Andy Wingo at May 13, 2024 01:03 PM

May 12, 2024

Alex Bradbury

Notes from the Carbon panel session at EuroLLVM 2024

Last month I had the pleasure of attending EuroLLVM which featured a panel session on the Carbon programming language. It was recorded and of course we all know automated transcription can be stunningly accurate these days, yet I still took fairly extensive notes throughout. I often take notes (or near transcriptions) as I find it helps me process. I'm not sure whether I'm adding any value by writing this up, or just contributing entropy but here it goes. You should of course assume factual mistakes or odd comments are errors in my transcription or interpretation, and if you keep an eye on the LLVM YouTube channel you should find the session recording uploaded there in the coming months.

First, a bit of background on the session, Carbon, and my interest in it. Carbon is a programming language started by Google aiming to be used in many cases where you would currently use C++ (billed as "an experimental successor to C++"). The Carbon README gives a great description of its goals and purpose, but one point I'll highlight is that ease of interoperability with C++ is a key design goal and constraint. The project recognises that this limits the options for the language to such a degree that it explicitly recommends that if you are able to make use of a modern language like Go, Swift, Kotlin, or Rust, then you should. Carbon is intended to be there for when you need that C++ interoperability. Of course, (and as mentioned during the panel) there are parallel efforts to improve Rust's ability to interoperate with C++.

Whatever other benefits the Carbon project is able to deliver, I think there's huge value in the artefacts being produced as part of the design and decision making process so far. This has definitely been true of languages like Swift and especially Rust, where going back through the development history there's masses of discussion on difficult decisions e.g. (in the case of Rust) green threads, the removal of typestate, or internal vs external iterators. The Swift Evolution and Rust Internals forums are still a great read, but obviously there's little ability to revisit fundamental decisions at this point. I try to follow Carbon's development due to a combination of its openness, the fact it's early stage enough to be making big decisions (e.g. lambdas in Carbon), and also because they're exploring different design trade-offs than those other languages (e.g. the Carbon approach to definition-checked generics). I follow because I'm a fan of programming language design discussion, not necessarily because of the language itself, which may or may not turn out to be something I'm excited to program in. As a programming language development lurker I like to imagine I'm learning something by osmosis if nothing else - but perhaps it's just a displacement activity...

If you want to keep up with Carbon, then check out their recently started newsletter, issues/PRs/discussions on GitHub and there's a lot going on on Discord too (I find group chat distracting so don't really follow there). The current Carbon roadmap is of course worth a read too. In case I wasn't clear enough already, I have no affiliation to the project. From where I'm standing, I'm impressed by the way it's being designed and developed out in the open, more than (rightly or wrongly) you might assume given the Google origin.

Notes from the panel session

The panel "Carbon: An experiment in different tradeoffs" took place on April 11th at EuroLLVM 2024. It was moderated by Hana Dusíková and the panel members were Chandler Carruth, Jon Ross-Perkins, and Richard Smith. Rather than go through each question in the order they were asked I've tried to group together questions I saw as thematically linked. Anything in quotes should be a close approximation of what was said, but I can't guarantee I didn't introduce errors.

Background and basics of Carbon

This wasn't actually the first question, but I think a good starting point is 'how would you sell Carbon to a C++ user?'. Per Chandler, "we can provide a better language, tooling, and ecosystem without needing to leave everything you built up in C++ (both existing code and the training you had). The cost of leveraging Carbon should have as simple a ramp as possible. Sometimes you need performance, we can give more advanced tools to help you get the most performance from your code. In other cases, it's security and we'll be able to offer more tools here than C++. It's having the ability to unlock improvements in the language without having to walk away from your investments in C++, that of course isn't going anywhere."

When is it going to be done? Chandler: "When it's ready! We're trying to put our roadmap out there publicly where we can. It's a long term project. Our goal for this year is to get the toolchain caught up with the language design, get into building practical C++ interop this year. There are many unknowns in terms of how far we'll get this year, but next year I think you'll see a toolchain that works and you can do interesting stuff with and evaluate in a more concrete context."

As for the size of the team and how many companies are contributing, the conclusion was that Google is definitely the main backer right now but there are others starting to take a look. There are probably about 5-10 people active on Carbon in any different week, but it varies so this can be a different set of people from week to week.

Given we've established that Google are still the main backer, one line of questioning was about what Google see in it and how management were convinced to back it. Richard commented "I think we're always interested in exploring new possibilities for how to deal with our existing C++ codebase, which is a hugely valuable asset for us. That includes both looking at how we can keep the C++ language itself happy and our tools for it being good and maintainable, but also places we might take it in the future. For a long time we've been talking about if we can build something that works better for our use case than existing tools, and had an opportunity to explore that and went with it."

A fun question that followed later aimed to establish the favourite thing about Carbon from each of the panel members (of course, it's nice that much of this motivation overlaps with the reasons for my own interest!):

  • Jon: For me it's unlocking the ability to improve the developer experience. Improving the language syntax, making things more understandable. Also C++ is very slow to compile and we're hoping for 1/3rd of the compile time for a typical file vs Clang.
  • Richard: from a language design perspective, being able to go through and systematically evaluate every decision and look at them again using things that have been learned since then.
  • Chandler: It's not so much the systematic review for me. We're building this with an eye to C++, and to integrate with C++ need a fairly big complex language (generics, inheritance, overloading etc etc). But when we look at this stuff, we keep finding principled and small/focused designs that we can use to underpin the sort of language functionality that is in C++ but fits together in a coherent and principled way. And we write this down. I think a lot of people don't realise that Carbon is writing down all the design as we go rather than having some mostly complete starting point and then making changes to that using something like an RFC process.

Finally, (for this section) there was also some intriguing discussion about Carbon's design tradeoffs. This is covered mostly elsewhere, but I think Chandler's answer focusing on the philosophy of Carbon rather than technical implementation details fits well here: "One of the philosophical goals of our language design is we don't try to solve hard problems in the compiler. [In other languages] there's often a big problem and the compiler has to solve it, freeing the programmer from worrying about it. Instead we provide the syntax and ask the programmer to tell us. e.g. for type inference, you could imagine having Hindley-Milner type inference. That's a hard problem, even if it makes ergonomics better in some cases. So we use a simpler and more direct type system. You see this crop up all over the language design."

Carbon governance

The first question related to organisation of the project and its governance was about what has been done to make it easier for contributors to get involved. My interpretation of the responses is that although they've had some success in attracting new contributors, the feeling is that there's more that could be done here. e.g. from Jon "Things like maintaining the project documentation, writing a start guide on how to build. We're making some different infrastructure choices, e.g. bazel, but in general we're trying to use things people are familiar with: GitHub, GH actions etc. There's probably still room for improvement. Listening to the LLVM newcomers session a couple of days ago gave me some ideas. Right now there's probably a lot of challenge learning how the different model works in Carbon."

The governance model for Carbon in general was something Chandler had a lot to say about:

  • On getting it in place early: "I was keen we start off with it, seeing challenges with C++ or even LLVM where as a project gets big the governance model that worked well at the beginning might not work so well and changing it at that point can be really hard. So I wanted to set up something that could scale, and can be changed if needed."
  • Initial mistakes: "The initial model was terrible. It was very elaborate, a large core team of people that pulled in people that weren't even involved in the project. We'd have weekly/biweekly meetings, a complex consensus process. In some ways it worked, we were being intentional and it was clear there was a governance project. But it was so inefficient, so people tried to avoid it. So we revamped it and simplified it, distilling it to something a little closer to the benevolent dictator for life model with minimal patches. Three leads rather than one to add some redundancy, one top-level governance body, will eventually have a rotation there."
  • The model now: "It's not that we'll try to agree on everything - if a change is uncontroversial then any one of the leads can approve, which gives us 3x bandwidth. Except for the cases if there is controversy, then the three of us step back and discuss. I had my personal pet favourite feature of carbon overturned by this governance model. The model allows us to make decisions quickly and easily, and most decisions are reversible if they turn out to be wrong. So we don't try to optimise for getting to the best outcome, rather to minimise cost for getting to an outcome. We're still early - we have a plan for scaling but that's more in the future."

Further questioning led to additional points being made about the governance model, such as the fact there are now several community members unaffiliated to Google with commit privileges, that the project goals are documented which helps contributors understand if a proposal is likely to be aligned with Carbon or not, and that these goals are potentially changeable if people come in with good arguments as to why they should be updated.

Carbon vs other languages

Naturally, there were multiple questions related to Carbon vs Rust. In terms of high-level comparisons, the panel members were keen to point out that Rust is a great language and is mature, while Carbon remains at an early stage. There are commonalities in terms of the aim to provide modern type system features to systems programmers, but the focus on C++ interoperability as a goal driving the language design is a key differentiator.

Thoughts about Rust naturally lead to questions about Carbon's memory safety story, and whether Carbon plans to have something like Rust's borrow checker. Richard commented "I think it's possible to get something like that. We're not sure exactly what it will look like yet, we're planning to look at this more between the 0.1 and 0.2 milestone. Borrow checking is an interesting option to pursue but there are some others to explore."

Concerns were also raised about whether, given that C++ interop is such a core goal of Carbon, it may have problems as C++ continues to evolve (perhaps with features that clash with features added in Carbon). Chandler's response was "As long as clang gets new C++ features, we'll get them. Similar to how Swift's interop works, linking the toolchain and using that to understand the functionality. If C++ starts moving in ways addressing the safety concerns etc that would be fantastic. We could shut down Carbon! I don't think that's very likely. In terms of functionality that would make interop not work well that's a bit of a concern, but of course if C++ adds something they need it to interop with existing C++ code so we face similar constraints", while Richard commented "C++ modules is a big one we need to keep an eye on. But at the moment as modules adoption hasn't taken off in a big way yet, we're still targeting header files." There was also a brief exchange about what if Carbon gets reflection before C++, with the hope that if it did happen it could help with the design process by giving another example for C++ to learn from (just as Carbon learns from Rust, Swift, Go, C++, and others).

Compiler implementation questions

EuroLLVM is of course a compilers conference, so there was a lot of curiosity about the implementation of the Carbon toolchain itself. In discussing trade-offs in the design, Jon leapt into a discussion of the choice "to go for SemIR vs a traditional AST. In Clang the language is represented by an AST, which is what you're typically taught about how compilers work. We have a semantic IR model where we produce lex tokens, then a parse tree (slightly different to an AST) and then it becomes SemIR. This is very efficient and fast to process, lowers to LLVM IR very easily, but it's a different model and a different way to think about writing a compiler. To the earlier point about newcomers, it's something people have to learn and a bit of a barrier because of that. We try to address it, but it is a trade-off." Jon since provided more insight on Carbon's implementation approach in a recent /r/ProgrammingLanguages thread.

Given what Carbon devs have learned about applying data oriented design to optimise the frontend (see talks like Modernizing Compiler Design for Carbon Toolchain), could the same ideas be applied to the backend? Chandler commented on the tradeoff he sees "By necessity with data-oriented design you have to specialise a lot. We think we can benefit from this a lot with the frontend at the cost of reusability. The trade-off might be different within LLVM."

When asked about whether the Carbon frontend may be reusable for other tools (reducing duplicated effort writing parsers for Carbon), Jon responded "I'm pretty confident at least through the parse tree it will be reusable. I'm more hesitant to make that claim through to SemIR. It may not be the best choice for everyone, and in some cases you may want more of an AST. But providing these tools as part of the language project like a formatter, migration tool [for language or API changes], something like clang-tidy, we're going to be taking on these costs ourselves which is going to incentivise us to find ways of amortising this and finding the right balance."


Article changelog
  • 2024-05-13: Various phrasing tweaks and fixed improper heading nesting.
  • 2024-05-12: Initial publication date.

May 12, 2024 12:00 PM

May 09, 2024

Brian Kardell

Brian Specific Failures

Through my work in Interop, I've been learning stuff about WPT. Part of this has been driven by a desire to really understand the graph on the main page about browser specific failures (like: which tests?). You might think this is stuff I'd already know, but I didn't. So, I thought I'd share some of the things I learned along the way, just in case you didn't know either.

Did you know that wpt.fyi has lots of ways to query it? It does!

You can, for example, type nameOfBrowser: fail (or nameOfBrowser: pass) into the search box - and you can even string together several of them to ask interesting questions!

What tests fail in every browser?

It seemed like an interesting question: Are there tests that fail in every single browser?

A query for that might look like: chrome:fail firefox:fail edge:fail safari:fail (using the browsers that are part of Interop, for example). And, as of this writing, it tells me...

There are 2,724 tests (15,718 subtests) which fail in every single browser!

Wow. I didn't expect there to be zero, but that feels like kind of a lot of tests to fail in every browser, doesn't it? How can that even happen? To be totally honest, I don't know!

A chicken and egg problem?

The purpose of WPT is to test conformance to a standard, but this is... tricky. Tests can be used to discuss specs as they are being written - or highlight things that are yet to be implemented, and WPT doesn't have the same kind of guardrails as specs. Things can get committed there that aren't "real" or yet agreed upon. Frequently the first implementers and experimenters write tests. And, it's not uncommon that then things change - or sometimes the specs just never go anywhere. Maybe those tests sit in a limbo "hoping" to advance?

WPT asks that files for early things like this, which don't yet have agreement, be placed in a /tentative directory or have .tentative in their names.

Luckily, thanks to the query API, we can see which of the tests above fall into that category by adding is:tentative to our query (chrome:fail firefox:fail edge:fail safari:fail is:tentative). We can see that indeed

348 of the tests (1,911 subtests) that fail in every single browser are marked as tentative.

The query language supports negating parameters with an exclamation, so we can adjust the query (chrome:fail firefox:fail edge:fail safari:fail !is:tentative) and see...

2,372 tests (13,797 subtests) of the tests that fail in every single browser are not marked as tentative.

So, I guess .tentative doesn't explain most of it (or maybe some of those just didn't follow that guidance). What does? I don't know!

Poking around, I see there's even 1 ACID test that literally everyone fails. Maybe that's just a bad test at this point? Maybe all of the tests that match this query need to be reviewed? I don't know!

You can also query simply all(status:fail), which will show pretty much the same thing for whatever browsers you happen to be showing. I like the explicit version for introducing this though, as it's very clear from the query itself which "all" I'm referring to.

Are there any tests that only pass in one browser?

I also wondered: hmm... if there are tests that fail in exactly one browser, and we've just shown there's a bunch that pass in none, how many are there that pass in exactly one? That's pretty interesting to look at too.

Engines and Browsers: Tricky

Often when we're showing results we're using the 3 flagship browsers as a proxy for engine capabilities. We do that a lot - like in the Browser Specific Failures (BSF) graph at the top of wpt.fyi - but... it's an imperfect proxy.

For example, there are also tests that only fail in Edge. As of this writing 1,456 tests (1,727 subtests) of them.

Or, there are also tests that fail uniquely in WebKit GTK - 1,459 tests (1,740 subtests) of those.

Here's where there's a bit of a catch-22 though. BSF is really useful for the browser teams, so links like the above are actually very handy if you're on the Edge or GTK teams. But we can't add those to the BSF chart because it kind of really changes the meaning of the whole thing. What that kind of wants to be is "Engine Specific Failures", but that's not actually directly measurable.

Brian Specific Failures...

Below is an interesting table which links to query results that show actual tests that report as failing in only one of the three flagship browsers.

Browser   Non-tentative tests (subtests) which uniquely fail among these browsers
Chrome    1,345 tests (12,091 subtests)
Firefox   2,290 tests (16,791 subtests)
Safari    4,263 tests (14,867 subtests)

If that sounds like the data for the Browser Specific Failures (BSF) graph on the front page of wpt.fyi, yes. And no. I call this the "Brian Specific Failures" (BSF) table.

I think that this is about as close as we're probably going to get in the short term to a linkable set of tests that fail, if you'd like to explore them. The BSF graph is also, I believe, more discriminating than the simple "pass" and "fail" that we're showing here. Like, what do you do if a test times out? Are there intermediate or currently unknown statuses?

It was also kind of interesting, for me at least, while looking at the data, to realize just how different the situation looks depending on whether you are looking at tests, or subtests. Firefox fails the most subtests, but Safari fails the most tests. BSF "scores" subtests as decimal values of a whole test.

It's pretty complicated to pull any of this off actually. Things go wrong with tests and runs, and so on. I also realized that all of these scores are inherently a little imperfect.

For example, if you look at the top of the page, it says (as I am writing this) that there are 55,176 tests (1,866,346 subtests).

But if you look to the bottom of the screen, the table has a summary with the passing/total subtests for each browser. As I am writing this it says:

chrome: 1836812 / 1890027, edge: 1834777 / 1887771, firefox: 1786949 / 1868316, safari: 1787112 / 1879926

You might note that none of those lists the same number of subtests!

Anyway, it was an interesting rabbit hole to dive into! I have further thoughts on the WPT.fyi dashboard page which I'll share in another post, but I'd love to hear of any interesting queries that you come up with or questions you have. I probably don't know the answers, but I'll try to help find them!

May 09, 2024 04:00 AM

Igalia Compilers Team

MessageFormat 2.0: A domain-specific language?

I recommend reading the prequel to this post, "MessageFormat 2.0: a new standard for translatable messages"; otherwise what follows won't make much sense!

This blog post represents the Igalia compilers team; all opinions expressed here are our own and may not reflect the opinions of other members of the MessageFormat Working Group or any other group.

What we've achieved #

In the previous post, I described the work done by the members of the MF2 Working Group to develop a specification for MF2, a process that spanned several years. When I joined the project in 2023, my task was to implement the spec and produce a back-end for the JavaScript Intl.MessageFormat proposal.

|-------------------------------|
| JavaScript code using the API |
|-------------------------------|
| browser engine                |
|-------------------------------|
| MF2 in ICU                    |
|-------------------------------|

For code running in a browser, the stack looks like this: web developers write JavaScript code that uses the API; their code runs in a JavaScript engine that's built into a Web browser; and that browser makes API calls to the ICU library in order to execute the JS code.

The bottom layer of the stack is what I've implemented as a tech preview. Since the major JavaScript engines are implemented in C++, I worked on ICU4C, the C++ version of ICU.

The middle layer is not yet implemented. A polyfill makes it possible to try out the API in JavaScript now.

Understanding MF2 #

To understand the remainder of this post, it's useful to have read over a few more examples from the MF2 documentation.

Is MF2 a programming language? (And does it matter?) #

Different members of the working group have expressed different opinions on whether MF2 is a programming language:

  • Addison Phillips: "I think we made a choice (we can reconsider it if necessary, although I don't think we need to) that MF2 is not a resource format. I'm not sure if 'templating language' is the right characterization, but let's go with it for now." (comment on pull request 474, September 2023)
  • Stanisław Małolepszy: "...we’re neither resource format nor templating language. I tend to think we’re a storage format for variants." (comment on the same PR)
  • Mihai Niță: "For me, this is a DSL designed for i18n / l10n." (comment on PR 507)

MF2 lacks block structure and complex control flow: the .match construct can't be nested, and .local variable declarations are sequential rather than nested. Some people might expect a general-programming language to include these features.

But in implementing it, I have found that several issues arise that are familiar from my experience studying, and implementing, compilers and interpreters for more conventional programming languages.

I wrote in part 1 that when using either MF1 or MF2, messages can be separated from code -- but it turns out that maybe, messages are code! But they constitute code in a different programming language, which can be cleanly separated from the host programming language, just as with uninterpreted strings.

Abstract syntax #

As with a programming language, the syntax of MF2 is specified using a formal grammar, in this case using Augmented Backus-Naur Form. In most compilers and interpreters, a parser (that's either generated from a specification of the formal grammar, or written by hand to closely follow that grammar) reads the text of the program and generates an abstract syntax tree (AST), which is an internal representation of the program.

MF2 has a data model, which is like an AST. Users can either write messages in the text-based syntax (probably easiest for most people) or construct a data model directly using part of the API (useful for things like building external tools).

Parsers and ASTs are familiar concepts for programming language implementors, and writing the "front-end" for MF2 (the parser, and classes that define the data model) was not that different from writing a front-end for a programming language. Because it works on ASTs, the formatter itself (the part that does all the work once everything is parsed) also looks a lot like an interpreter. The "back-end", producing a formatted result, is currently quite simple since the end result is a flat string -- however, in the future, "formatting to parts" (necessary to easily support features like markup) will be supported.

For more details, see Eemeli Aro's FOSDEM 2024 talk on the data model.

Naming #

MF2 has local variable declarations, like the let construct in JavaScript and in some functional programming languages. That means that in implementing MF2, we've had to think about evaluation strategies (lazy versus eager evaluation), scope, and mutability. MF2 settled on a spec in which all local variables are immutable and no name shadowing is permitted except in a very limited way, with .input declarations. Free variables are permitted, and don't need to be declared explicitly: these correspond to runtime arguments provided to a message. Eager and lazy implementations can both conform to the spec, since variables are immutable, and lazy implementations are free to use call-by-need semantics (memoization). These design choices weren't obvious, and in some cases, inspired quite a lot of discussion.
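
As a small illustration (the message itself is made up, though the options shown follow the ones used elsewhere in this post), $date below is an external argument, .input annotates it, and .local binds a new, immutable name derived from it:

.input {$date :datetime month=long day=numeric}
.local $when = {$date}
{{Your order shipped on {$when}.}}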

Naming can have subtle consequences. In at least one case, the presence of .local declarations necessitates a dataflow analysis (albeit a simple one). The analysis is to check for "missing selector annotation" errors; the details are outside the scope of this post, but the point is just that error checking requires "looking through" a variable reference, possibly multiple chains of references, to the variable's definition. Sounds like a programming language!
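
Here is a rough sketch of that "looking through" (the variable names are illustrative): the selector {$n} carries no annotation of its own, but the check succeeds because following $n through its .local declaration reaches the .input declaration of $count, which is annotated with :number. Remove the .input line and a "missing selector annotation" error would be reported.

.input {$count :number}
.local $n = {$count}
.match {$n}
one {{You have {$n} item}}
*   {{You have {$n} items}}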

Functions #

Placeholders in MF2 may be expressions with annotations. These expressions resemble function calls in a more conventional language. Unlike in most languages, functions cannot be implemented in MF2 itself, but must be implemented in a host language. For example, in the ICU4C implementation, the functions would either be written in C++ or (less likely) another language that has a foreign function interface with C++.

Functions can be built-in (required in the spec to be provided), or custom. Programmers using an implementation of the MF2 API can write their own custom functions that format data ("formatters"), that customize the behavior of the .match construct ("selectors"), or do both. Formatters and selectors have different interfaces.

The term "annotation" in the MF2 spec is suggestive. For example, a placeholder like {$x :number} can be read: "x should be formatted as a number." But custom formatters and selectors may also do much more complicated things, as long as they conform to their interfaces. An implementor could define a custom :squareroot function, for example, such that {$x :squareroot} is replaced with a formatted number whose value is the square root of the runtime value of $x. It's unclear how common use cases like this will be, but the generality of the custom function mechanism makes them possible.

Wherever there are functions, functional programmers think about types, explicit or implicit. MF2 is designed to use a single runtime type: it's unityped. The spec uses the term "resolved value" to refer to the type of its runtime values. Moreover, the spec tries to avoid constraining the structure of "resolved values", to allow the implementation as much freedom as possible. However, the implementation has to call functions written in a host language that may have a different type system. The challenge comes in defining the "function registry", which is the part of the MF2 spec that defines both the names and behavior of built-in functions, and how to specify custom functions.

Different host languages have different type systems. Specifying the interface between MF2 and externally-defined functions is tricky, since the goal is to be able to implement MF2 in any programming language.

While a language that can call foreign functions but not define its own functions is unusual, defining foreign function interfaces that bridge the gaps between disparate programming languages is a common task when designing a language. This also sounds like a programming language.

Pattern matching #

The .match construct in MF2 looks a lot like a case expression in Haskell, or perhaps a switch statement in C/C++, depending on your perspective. Unlike switch, .match allows multiple expressions to be examined in order to choose an alternative. And unlike case, .match only compares data against specific string literals or a wildcard symbol, rather than destructuring the expressions being scrutinized.

The ability provided by MF2 to customize pattern-matching is unusual. An implementation of a selector function takes a list of keys, one per variant, and returns a sorted list of keys in order of preference. The list is then used by the pattern matching algorithm in the formatter to pick the best-matching variant given that there are multiple selectors. An abstract example is:

.match {$x :X} {$y :Y}
A1 A2 {{1}}
B1 B2 {{2}}
C1 * {{3}}
* C2 {{4}}
* * {{5}}

In this example, the implementation of X would be called with the list of keys [A1, B1, C1] (the * key is not passed to the selector implementation) and returns a list of the same keys, arranged in any order. Likewise, the implementation of Y would be called with [A2, B2, C2].

I don't know of any existing programming languages with a pattern-matching construct like this one; usually, the comparison between values and patterns is based on structural equality and can't be abstracted over. But the ability to factor out the workings of pattern matching and swap in a new kind of matching (by defining a custom selector function) is a kind of abstraction that would be found in a general-purpose programming language. Of course, the goal here is not to match patterns in general but to select a grammatically correct variant depending on data that flows in at runtime.

But is it Turing-complete? #

MF2 has no explicit looping constructs or recursion, but it does have function calls. The details of how custom functions are realized are implementation-specific; typically, using a general-purpose programming language. That means that MF2 can invoke code that does arbitrary computation, but not express it. I think it would be fair to say that the combination of MF2 and a suitable registry of custom functions is Turing-complete, but MF2 itself is not Turing-complete. For example, imagine a custom function named eval that accepts an arbitrary JavaScript program as a string, and returns its output as a string. This is not how MF2 is intended to be used; the spec notes that "execution time SHOULD be limited" for invocations of custom functions. I'm not aware of any other languages whose Turing-completeness depends on their execution environment. (Though there is at least one lengthy discussion of whether CSS is Turing-complete.)

If custom functions were removed altogether and the registry of functions was limited to a small built-in set, MF2 would look much less like a programming language; its underlying "instruction set" would be much more limited.

Code versus data #

There is an old Saturday Night Live routine: “It’s a floor wax! It’s a dessert topping! It’s both!” XML is similar. “It’s a database! It’s a document! It’s both!” -- Philip Wadler, "XQuery: a typed functional language for querying XML" (2002)

The line between code and data isn't always clear, and the MF2 data model can be seen as a representation of the input data, rather than as an AST representing a program. Likewise, the syntax can be seen as a serialized format for representing the data, rather than as the syntax of a program.

There is also a "floor wax / dessert topping" dichotomy when it comes to functions. Is MF2 an active agent that calls functions, or does it define data passed to an API, whose implementation is what calls functions?

"Language" has multiple meanings in software: it can refer to a programming language, but the "L" in "HTML" and "XML" stands for "language". We would usually say that HTML and XML are markup languages, not programming languages, but even that point is debatable. After all, HTML embeds JavaScript code; the relationship between HTML and JavaScript resembles the relationship between MF2 and function implementations. Languages differ in how much computation they can directly express, but there are common aspects that unite different languages, like having a formal grammar.

I view MF2 as a domain-specific language for formatting messages, but another perspective is that it's a representation of data passed to an API. An API itself can be viewed as a domain-specific language: it provides verbs (functions or methods that can be called) and nouns (data structures that the functions can accept.)

Summing up #

As someone with a background in programming language design and implementation, I'm naturally inclined to see everything as a language design problem. A general-purpose programming language is complex, or at least can be used to solve complex problems. In contrast, those who have worked on the MF2 spec over the years have put in a lot of effort to make it as simple and focused on a narrow domain as possible.

One of the ways in which MF2 has veered towards programming language territory is the custom function mechanism, which was added to provide extensibility. The mechanism is general, but if there was a less general solution that still supported the desired range of use cases, these years of effort would have unearthed one. The presence of name binding (with all the complexities that brings up) and an unusual form of pattern matching also suggest to me that it's appropriate to consider MF2 a programming language, and to apply known programming language techniques to its design and implementation. This shows that programming language theory has interesting applications in internationalization, which is a new finding as far as I know.

What's left to do? #

The MessageFormat spec working group welcomes feedback during the tech preview period. User feedback on the MF2 spec and implementations will influence its future design. The current tech preview spec is part of LDML 45; the beta version, to be included in LDML 46, may include improvements suggested by users and implementors.

Igalia plans to continue collaboration to advance the Intl.MessageFormat proposal in TC39. Implementation in browser engines will be part of that process.

Acknowledgments #

Thanks to my colleagues Ben Allen, Philip Chimento, Brian Kardell, Eric Meyer, Asumu Takikawa, and Andy Wingo; and MF2 working group members Eemeli Aro, Elango Cheran, Richard Gibson, Mihai Niță, and Addison Phillips for their comments and suggestions on this post and its predecessor. Special thanks to my colleague Ujjwal Sharma, whose work I borrowed from in parts of the first post.

May 09, 2024 12:00 AM

May 07, 2024

Melissa Wen

Get Ready for the 2024 Linux Display Next Hackfest in A Coruña!

We’re excited to announce the details of our upcoming 2024 Linux Display Next Hackfest in the beautiful city of A Coruña, Spain!

This year’s hackfest will be hosted by Igalia and will take place from May 14th to 16th. It will be a gathering of minds from a diverse range of companies and open source projects, all coming together to share, learn, and collaborate outside the traditional conference format.

Who’s Joining the Fun?

We’re excited to welcome participants from various backgrounds, including:

  • GPU hardware vendors;
  • Linux distributions;
  • Linux desktop environments and compositors;
  • Color experts, researchers and enthusiasts;

This diverse mix of backgrounds is represented by developers from several companies working on the Linux display stack: AMD, Arm, BlueSystems, Bootlin, Collabora, Google, GravityXR, Igalia, Intel, LittleCMS, Qualcomm, Raspberry Pi, RedHat, SUSE, and System76. It'll ensure a dynamic exchange of perspectives and foster collaboration across the Linux Display community.

Please take a look at the list of participants for more info.

What’s on the Agenda?

The beauty of the hackfest is that the agenda is driven by participants! As this is a hybrid event, we decided to improve the experience for remote participants by creating a dedicated space for them to propose topics and some introductory talks in advance. From those inputs, we defined a schedule that reflects the collective interests of the group, but is still open for amendments and new proposals. Find the schedule details on the official event webpage.

Expect discussions on:

KMS Color/HDR
  • Proposal with new DRM object type:
    • Brief presentation of GPU-vendor features;
    • Status update of plane color management pipeline per vendor on Linux;
  • HDR/Color Use-cases:
    • HDR gainmap images and how should we think about HDR;
    • Google/ChromeOS GFX view about HDR/per-plane color management, VKMS and lessons learned;
  • Post-blending Color Pipeline.
  • Color/HDR testing/CI
    • VKMS status update;
    • Chamelium boards, video capture.
  • Wayland protocols
    • color-management protocol status update;
    • color-representation and video playback.
Display control
  • HDR signalling status update;
  • backlight status update;
  • EDID and DDC/CI.
Strategy for video and gaming use-cases
  • Multi-plane support in compositors
    • Underlay, overlay, or mixed strategy for video and gaming use-cases;
    • KMS Plane UAPI to simplify the plane arrangement problem;
    • Shared plane arrangement algorithm desired.
  • HDR video and hardware overlay
Frame timing and VRR
  • Frame timing:
    • Limitations of uAPI;
    • Current user space solutions;
    • Brainstorm better uAPI;
  • Cursor/overlay plane updates with VRR;
  • KMS commit and buffer-readiness deadlines;
Power Saving vs Color/Latency
  • ABM (adaptive backlight management);
  • PSR1 latencies;
  • Power optimization vs color accuracy/latency requirements.
Content-Adaptive Scaling & Sharpening
  • Content-Adaptive Scalers on display hardware;
  • New drm_colorop for content adaptive scaling;
  • Proprietary algorithms.
Display Mux
  • Laptop muxes for switching of the embedded panel between the integrated GPU and the discrete GPU;
  • Seamless/atomic hand-off between drivers on Linux desktops.
Real time scheduling & async KMS API
  • Potential benefits: lower latency input feedback, better VRR handling, buffer synchronization, etc.
  • Issues around “async” uAPI usage and async-call handling.

In-person, but also geographically-distributed event

This year Linux Display Next hackfest is a hybrid event, hosted onsite at the Igalia offices and available for remote attendance. In-person participants will find an environment for networking and brainstorming in our inspiring and collaborative office space. Additionally, A Coruña itself is a gem waiting to be explored, with stunning beaches, good food, and historical sites.

Semi-structured structure: how the 2024 Linux Display Next Hackfest will work

  • Agenda: Participants proposed the topics and talks to be discussed in sessions.
  • Interactive Sessions: Discussions, workshops, introductory talks and brainstorming sessions lasting around 1h30. There is always a starting point for discussions and new ideas will emerge in real time.
  • Immersive experience: We will have coffee-breaks between sessions and lunch time at the office for all in-person participants. Lunches and coffee-breaks are sponsored by Igalia. This will keep us sharing knowledge and in continuous interaction.
  • Spaces for all group sizes: In-person participants will find different room sizes that match various group sizes at Igalia HQ. Besides that, there will be some devices for showcasing and real-time demonstrations.

Social Activities: building connections beyond the sessions

To make the most of your time in A Coruña, we’ll be organizing some social activities:

  • First-day Dinner: In-person participants will enjoy a Galician dinner on Tuesday, after a first day of intensive discussions in the hackfest.
  • Getting to know a little of A Coruña: Finding out a little about A Coruña and current local habits.

Participants of a guided tour in one of the sectors of the Museum of Estrella Galicia (MEGA). Source: mundoestrellagalicia.es

  • On Thursday afternoon, we will close the 2024 Linux Display Next hackfest with a guided tour of the Museum of Galicia’s favorite beer brand, Estrella Galicia. The guided tour covers the eight sectors of the museum and ends with beer pouring and tasting. After this experience, a transfer bus will take us to the Maria Pita square.
  • At Maria Pita square we will see the charm of some historical landmarks of A Coruña, explore the casual and vibrant style of the city center and taste local foods while chatting with friends.

Sponsorship

Igalia sponsors lunches and coffee-breaks on hackfest days, Tuesday’s dinner, and the social event on Thursday afternoon for in-person participants.

We can’t wait to welcome hackfest attendees to A Coruña! Stay tuned for further details and outcomes of this unconventional and unique experience.

May 07, 2024 02:33 PM

May 06, 2024

Igalia Compilers Team

MessageFormat 2.0: a new standard for translatable messages

This blog post represents the Igalia compilers team; all opinions expressed here are our own and may not reflect the opinions of other members of the MessageFormat Working Group or any other group.

MessageFormat 2.0 is a new standard for expressing translatable messages in user interfaces, both in Web applications and other software. Last week, my work on implementing MessageFormat 2.0 in C++ was released in ICU 75, the latest release of the International Components for Unicode library.

As a compiler engineer, when I learned about MessageFormat 2.0, I began to see it as a programming language, albeit an unconventional one. The message formatter is analogous to an interpreter for a programming language. I was surprised to discover a programming language hiding in this unexpected place. Understanding my surprise requires some context: first of all, what "messages" are.

A story about message formatting #

Over the past 40 to 50 years, user interfaces (UIs) have grown increasingly complex. As interfaces have become more interactive and dynamic, the process of making them accessible in a multitude of natural languages has increased in complexity as well. Internationalization (i18n) refers to this general practice, while the process of localization (l10n) refers to the act of modifying a specific system for a specific natural language.

i18n history #

Localization of user interfaces (both command-line and graphical) began by translating strings embedded in code that implements UIs. Those strings are called "messages". For example, consider a text-based adventure game: messages like "It is pitch black. You are likely to be eaten by a grue." might appear anywhere in the code. As a slight improvement, the messages could all be stored in a separate "resource file" that is selected based on the user's locale. The ad hoc approaches to translating these messages and integrating them into code didn't scale well. In the late 1980s, C introduced a gettext() function into glibc, which was never standardized but was widely adopted. This function primarily provided string replacement functionality. While it was limited, it inspired the work that followed. Microsoft and Apple operating systems had more powerful i18n support during this time, but that's beyond the scope of this post.

The rise of the Web required increased flexibility. Initially, static documents and apps with mostly static content dominated. Java, PHP, and Ruby on Rails all brought more dynamic content to the web, and developers using these languages needed to tinker more with message formatting. Their applications needed to handle not just static strings, but messages that could be customized to dynamic content in a way that produced understandable and grammatically correct messages for the target audience. MessageFormat arose as one solution to this problem, implemented by Taligent (along with other i18n functionality).

The creators of Java had an interest in making Java the preferred language for implementing Web applications, so Sun Microsystems incorporated Taligent's work into Java in JDK 1.1 (1997). This version of MessageFormat supported pattern strings, formatting basic types, and choice format.

The MessageFormat implementation was then incorporated into ICU, the widely used library that implements internationalization functions. Initially, Taligent and IBM (which was one of Taligent's parent companies) worked with Sun to maintain parallel i18n implementations in ICU and in the JDK, but eventually the two implementations diverged. Moving at a faster pace, ICU added plural selection to MessageFormat and expanded other capabilities, based on feedback from developers and translators who had worked with the previous generations of localization tools. ICU also added a C++ port of this Java-based implementation; hence, the main ICU project includes ICU4J (implemented in Java) and ICU4C (implemented in C++ with some C APIs).

Since ICU MessageFormat was introduced, many much more complex Web UIs, largely based on JavaScript, have appeared. Complex programs can generate much more interesting content, which implies more complex localization. This necessitates a different model of message formatting. An update to MessageFormat has been in the works at least since 2019, and that work culminates with the recent release of the MessageFormat 2.0 specification. The goals are to provide more modularity and extensibility, and easier adaptation for different locales.

For brevity, we will refer to ICU MessageFormat, otherwise known simply as "MessageFormat", as "MF1" throughout. Likewise, "MF2" refers to MessageFormat 2.0.

Why message formatting? (Not just for translation) #

The work described in this post is motivated by the desire to make it as easy as possible to write software that is accessible to people regardless of the language(s) they use to communicate. But message formatting is needed even in user interfaces that only support a single natural language.

Consider this C code, where number is presumed to be an unsigned integer variable that's already been defined:

printf("You have %u files", number);

We've probably all used applications that informed us that we "have 1 files". While it's probably clear what that means, why should human readers have to mentally correct the grammatical errors of a computer? A programmer trying to improve things might write:

printf("You have %u file%s", number, number == 1 ? "" : "s");

which handles the case where number is 1. However, English is full of irregular plurals: consider "bunny" and "bunnies", "life" and "lives", or "mouse" and "mice". The code would be easier to read and maintain if it were easier to express "print the plural of 'file'", instead of a conditional expression that must be scrutinized for its meaning. While "printf" is short for "print formatted", its formatting is limited to simpler tasks like printing numbers as strings, not complex tasks like pluralizing "mouse".

This is just one example; another example is messages including ordinals, like "1st", "2nd", or "3rd", where the suffix (like "st") is determined in a complicated way from the number (consider "11th" versus "21st").

We've seen how messages can vary based on user input in non-local ways: the bug in "You have %u files" is that not only does the number of files vary, so does the word "files". So you might begin to see the value of a specialized notation for expressing such messages, which would make it easier to notice bugs like the "1 files" bug and even prevent them in the first place. That notation might even resemble a domain-specific language.
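
Jumping ahead a little, here is roughly how the "files" message could be written in MessageFormat 2.0, which is introduced later in this post; treat it as a sketch rather than a definitive rendering. The plural categories ("one" and the catch-all "*") come from CLDR:

.match {$count :number}
one {{You have {$count} file}}
*   {{You have {$count} files}}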

Why message formatting? #

Turning now to translation, you might be wondering why you would need a separate message formatter, instead of writing code in your programming language of choice that would look something like this (if your language of choice is C/C++):

// Supposing a char* "what" and int "planet" are already defined
printf("There was %s on planet %d", what, planet);

Here, the message is the first argument to printf and the other arguments are substituted into the message at runtime; no special message formatting functionality is used, beyond what the C standard library offers.

The problem is that if you want to translate the words in the message, translation may require changing the code, not just the text of the message. Suppose that in the target language, we needed to write the equivalent of:

printf("On planet %d, there was a %s", what, planet);

Oops! We've reordered the text, but not the other arguments to printf. When using a modern C/C++ compiler, this would be a compile-time error (since what is a string and not an integer). But it's easy to imagine similar cases where both arguments are strings, and a nonsense message would result. Message formatters are necessary not only for resilience to errors, but also for division of labor: people building systems want to decouple the tasks of programming and translation.

ICU MessageFormat #

In search of a better tool for the job, we turn to one of many tools for formatting messages: ICU MessageFormat (MF1). According to the ICU4C API documentation, "MessageFormat prepares strings for display to users, with optional arguments (variables/placeholders)".

The MF2 working group describes MF1 more succinctly as "An i18n-aware printf, basically."

Here's an expanded version of the previous printf example, expressed in MF1 (and taken from the "Usage Information" of the API docs):

"At {1,time,::jmm} on {1,date,::dMMMM}, there was {2} on planet {0,number}."

Typically, the task of creating a message string like this is done by a software engineer, perhaps one who specializes in localization. The work of translating the words in the message from one natural language to another is done by a translator, who is not assumed to also be a software developer.

The designers of MF1 made it clear in the syntax what content needs to be translated, and what doesn't. Any text occurring inside curly braces is non-translatable: for example, the string "number" in the last set of curly braces doesn't mean the word "number", but rather, is a directive that the value of argument 0 should be formatted as a number. This makes it easy for both developers and translators to work with a message. In MF1, a piece of text delimited by curly braces is called a "placeholder".

The numbers inside placeholders represent arguments provided at runtime. (Arguments can be specified either by number or name.) Conceptually, time, date, and number are formatting functions, which take an argument and an optional "style" (like ::jmm and ::dMMMM in this example). MF1 has a fixed set of these functions. So {1,time,::jmm} should be read as "Format argument 1 as a time value, using the format specified by the string ::jmm". (The details of how the format strings work aren't important for this explanation.) Since {2} has no formatting function specified, the value of argument 2 is formatted based on its type. (From context, we can suppose it has a string type, but we would need to look at the arguments to the message to know for sure.) The set of types that can be formatted in this way is fixed (users can't add new types of their own).

For brevity, I won't show how to provide the arguments that are substituted for the numbers 0, 1 and 2 in the message; the API documentation shows the C++ code to create those arguments.

MF1 addresses the reordering issue we noticed in the printf example. As long as they don't change what's inside the placeholders, a translator doesn't have to worry that their translation will disturb the relationship between placeholders and arguments. Likewise, a developer doesn't have to worry about manually changing the order of arguments to reflect a translation of the message. The placeholder {2} means the same thing regardless of where it appears in the message. (Using named rather than positional arguments would make this point even clearer.)
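
For example, a named-argument version of the same MF1 message might look roughly like this (the argument names are purely illustrative):

"At {eventDate,time,::jmm} on {eventDate,date,::dMMMM}, there was {what} on planet {planet,number}."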

(The ICU User Guide contains more documentation on MF1. A tutorial by Mohammad Ashour also contains some useful information on MF1.)

Enter MessageFormat 2.0 #

To summarize a few of the shortcomings of MF1:

  • Lack of modularity: the set of formatters (mostly for numbers, dates, and times) is fixed. There's no way to add your own.
    • Like formatting, selection (choosing a different pattern based on the runtime contents of one or more arguments) is not customizable. It can only be done either based on plural form, or literal string matching. This makes it hard to express the grammatical structures of various human languages.
  • No separate data model: the abstract syntax is concrete syntax.
  • There is no way to declare a local variable so that the same piece of a message can be re-used without repeating it; all variables are external arguments.
  • Baggage: extending the existing spec with different syntax could change the meaning of existing messages, so a new spec is needed. (MF1 was not designed for forward-compatibility, and the new spec is backward-incompatible.)

MF2 represents the best efforts of a number of experts in the field to address these shortcomings and others.

I hope the brief history I gave shows how much work, by many tech workers, has gone into the problem of message formatting. Synthesizing the lessons of the past few decades into one standard has taken years, with seemingly small details provoking nuanced discussions. What follows may seem complex, but the complexity is inherent in the problem space that it addresses. The plethora of different competing tools to address the same set of problems is evidence for that.

"Inherent is Latin for not your fault." -- Rich Hickey, "Simple Made Easy"

A simple example #

Here's the same example in MF2 syntax:

At time {$time :datetime hour=numeric minute=|2-digit|}, on {$time :datetime day=numeric month=long},
there was {$what} on planet {$planet :number}

As in MF1, placeholders are enclosed in curly braces. Variables are prefixed with a $; since there are no local bindings for time, what, or planet, they are treated as external arguments. Although variable names are already delimited by curly braces, the $ prefix helps make it even clearer that variable names should not be translated.

Function names are now prefixed by a :, and options (like hour) are named. There can be multiple options, not just one as in MF1. This is a loose translation of the MF1 version, since in MF2, the skeleton option is not part of the alpha version of the spec. (It is possible to write a custom function that takes this option, and it may be added in the future as part of the built-in datetime function.) Instead, the :datetime formatter can take a variety of field options (shown in the example) or style options (not shown). The full set of options is specified in the Default Registry portion of the spec.

Literal strings that occur within placeholders, like 2-digit, are quoted. The MF2 syntax for quoting a literal is to enclose it in vertical bars (|). The vertical bars are optional in most cases, but are necessary for the literal 2-digit because it includes a hyphen. This syntax was chosen instead of literal quotation marks (") because some use cases for MF2 messages require embedding them in a file format that assigns meaning to quotation marks. Vertical bars are less commonly used in this way.

A shorter version of this example is:

At time {$time :time}, on {$time :date}, there was {$what} on planet {$planet :number}

:time formats the time portion of its operand with default options, and likewise for :date. Implementations may differ on how options are interpreted and their default values. Thus, the formatted output of this version may be slightly different from that of the first version.

A more complex example: selection and custom functions #

So far, the examples are single messages that contain placeholders that vary based on runtime input. But sometimes, the translatable text in a message also depends on runtime input. A common example is pluralization: in English, consider "You have one item" versus "You have 5 items." We can call these different strings "variants" of the same message.

While you can imagine code that selects the right variant with a conditional, that violates the separation between code and data that we previously discussed.

Another motivator for variants is grammatical case. English has a fairly simple case system (consider "She told me" versus "I told her"; the "she" of the first sentence changes to "her" when that pronoun is the object of the transitive verb "tell".) Some languages have much more complex case systems. For example, Russian has six grammatical cases; Basque, Estonian, and Finnish all have more than ten. The complexity of translating messages into and between languages like these is further motivation for organizing all the variants of a message together. The overall goal is to make messages as self-contained as possible, so that changing them doesn't require changing the code that manipulates them. MF2 makes that possible in more situations. While MF1 supports selection based on plural categories, MF2 also supports a general, customizable form of selection.

Here's a very simple example (necessarily simple since it's in English) of using custom selectors in MF2 to express grammatical case:

.match {$userName :hasCase}
vocative {{Hello, {$userName :person case=vocative}!}}
accusative {{Please welcome {$userName :person case=accusative}!}}
* {{Hello!}}

The keyword .match designates the beginning of a matcher.

How to read a matcher #

Although MF2 is declarative, you can read this example imperatively as follows:

  1. Apply :hasCase to the runtime value of $userName to get a string c representing a grammatical case.
  2. Compare c to each of the keys in the three variants ("vocative", "accusative", and the wildcard "*", which matches any string.)
  3. Take the matching variant v and format its pattern.

This example assumes that the runtime value of the argument $userName is a structure whose grammatical case can be determined. This example also assumes that custom :hasCase and :person functions have been defined (the details of how those functions are defined are outside the scope of this post). In this example, :person is a formatting function, like :datetime in the simpler example. :hasCase is a different kind of function: a selector function, which may extract a field from its argument or do arbitrarily complicated computation to determine a value to be matched against.

In general: a matcher includes one or more selectors and one or more variants. A variant includes a list of keys, whose length must equal the number of selectors in the matcher; and a single pattern.

In this example, the selector of the matcher is the placeholder {$userName :hasCase}. Selectors appear between the .match keyword and the beginning of the first variant. There are three variants, each of which has a single key. The strings delimited by double sets of curly braces are patterns, which in turn contain other placeholders. The selectors are used to select a pattern based on whether the runtime value of the selector matches one of the keys.

Formatting the selected pattern #

Supposing that :hasCase maps the value of $userName onto "vocative", the formatted pattern consists of the string "Hello, " concatenated with the result of formatting this placeholder:

{$userName :person case=vocative}

concatenated with the string "!". (This also supposes that we are formatting to a single string result; future versions of MF2 may also support formatting to a list of "parts", in which case the result strings would be returned in a more complicated data structure, not concatenated.)

You can read this placeholder like a function call with arguments: "Call the :person function on the value of $userName with an additional named argument, the name 'case' bound to the string 'vocative'." We also elide the details of :person, but you can suppose it uses additional fields in $userName to format it as a personal name. Such fields might include a title, first name, and last name (incidentally, see "Falsehoods Programmers Believe About Names" for why formatting personal names is more complicated than you might think).

Note that MF2 provides no constructs that mutate variables. Once a variable is bound, its value doesn't change. Moreover, built-in functions don't have side effects, so as long as custom functions are written in a reasonable way (that is, without side effects that cross the boundary between MF2 and function execution), MF2 has no side effects.

That means that $userName represents the same value when it appears in the selector and when it appears in any of the patterns. Conceptually, :hasCase returns a result that is used for selection; it doesn't change what the name $userName is bound to.

Abstraction over function implementations #

A developer could plug in an implementation of :hasCase that requires its argument to be an object or record that contains grammatical metadata, and then simply returns one of the fields of this object. Or we could plug in an implementation that can accept a string and uses a dictionary for the target language to guess its case. The message is structured the same way regardless, though to make it work as expected, the structure of the arguments must match the expectations of whatever functions consume them. Effectively, the message is parameterized over the meanings of its arguments and over the meanings of any custom functions it uses. This parameterization is a key feature of MF2.

Summing up #

This example couldn't be expressed in MF1, since MF1 has no custom functions. The checking for different grammatical cases would be done in the underlying programming language, with ad hoc code for selecting between the different strings. This would be error-prone, and would force a code change whenever a message changes (just as in the simple example shown previously). MF1 does have an equivalent of .match that supports a few specific kinds of matching, like plural matching. In MF2, the ability to write custom selector functions allows for much richer matching.

Further reading (or watching) #

For more background both on message formatting in general and on MF2, I recommend my teammate Ujjwal Sharma's talk at FOSDEM 2024, on which portions of this post are based. A recent MessageFormat 2 open house talk by Addison Phillips and Elango Cheran also provides some great context and motivation for why a new standard is needed.

You can also read a detailed argument in favor of a new standard by visiting the spec repository on GitHub: "Why MessageFormat needs a successor".

Igalia's work on MF2, or: Who are we and what are we doing here? #

So far, I've provided a very brief overview of the syntax and semantics of MF2. More examples will be provided via links in the next post.

A question I haven't answered is why this post is on the Igalia compilers team blog. I'm a member of the Igalia compilers team; together with Ujjwal Sharma, I have been collaborating with the MF2 working group, a subgroup of the Unicode CLDR technical committee. The working group is chaired by Addison Phillips and has many other members and contributors.

Part of our work on the compilers team has been to implement the MF2 specification as a C++ API in ICU. (Mihai Niță at Google, also a member of the working group, implemented the specification as a Java API in ICU.)

So why would the compilers team work on internationalization?

Part of our work as a team is to introduce and refine proposals for new JavaScript (JS) language features and work with TC39, the JS standards committee, to advance these proposals with the goal of inclusion in the official JS specification.

One such proposal that the compilers team has been involved with is the Intl.MessageFormat proposal.

The implementation of MessageFormat 2 in ICU provides support for browser engines (the major ones are implemented in C++) to implement this proposal. Prototype implementations are part of the TC39 proposal process.

But there's another reason: as I hinted at the beginning, the more I learned about MF2, the more I began to see it as a programming language, at least a domain-specific language. In the next post on this blog, I'll talk about the implementation work that Igalia has done on MF2 and reflect on that work from a programming language design perspective. Stay tuned!

May 06, 2024 12:00 AM

May 04, 2024

Brian Kardell

Known Knowns

Fun with the DOM, the parser, illogical trees and "unknowns"...

HTML has this very tricky parser that does 'corrections' and things on the fly. So if you create a page like:

<table>
   Hello
   <td>
      Look below...
   </td>
</table>

And load it in your browser, what you'll actually get parsed as a tree will be

  • HTML
    • HEAD
    • BODY
      • #text: Hello
      • TABLE
        • TBODY
          • TR
            • TD
              • #text: Look below...
            • #text:

Things can literally be moved around, implied elements added and so on.

Illogical trees

But trees that are impossible to create with the parser itself aren't impossible to create if you do it dynamically. With the DOM API, you can create whatever wild constructs you want: paragraphs inside of paragraphs? Sure, why not.

Or, text that is a direct child of a table. Note this still renders the text in every browser engine.

You can even add children to 'void' elements that way too. Here's an interesting one: an <hr> with a text node child. Again, it renders the text in every browser engine (the color varies).
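
As a minimal sketch you can paste into a developer console, the <hr> case can be built like this:

// Create an <hr> and give it a text node child -- something the HTML parser
// itself would never produce -- then attach it to the document.
const hr = document.createElement('hr');
hr.appendChild(document.createTextNode('Hello from inside an hr'));
document.body.appendChild(hr);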

You can also put unknown elements into your markup and the text content is shown... By default, it is basically a <span>.

In most cases, HTML wants to show something... or at least leave CSS in control. For example, you can dynamically add children to a <script>. While that won't be shown by default, it's simply because the UA style sheet has script set to display: none;. If we change that, we can totally see it.

But this isn't universal: in some cases there are other renderers that are in control - mainly when it comes to form controls. But also, for example, if you switch the <hr> in the example above to a <br> it won't render the text. It doesn't generate a box that you can do anything with via CSS, as far as I can tell, except make it display: none (useful if they're in white-space: pre blocks to keep them from forcing open extra lines).

SVG

The HTML parser has fix-ups for embedding SVG too; there are integration points, of a sort. But in SVG you can have an unknown element too... For example:

<svg>
    <unknown>Test</unknown>
    <rect width="150" height="150" x="10" y="10" style="fill:blue;stroke:pink;stroke-width:5;opacity:0.5" />
</svg>

The unknown element won't render the text, nor additional SVG children inside it. For example:

<svg id=one width="300" height="170">
  <unknown><ellipse cx="120" cy="80" rx="100" ry="50" style="fill:yellow;stroke:green;stroke-width:3" /></unknown>
  <rect width="150" height="150" x="10" y="10" style="fill:blue;stroke:pink;stroke-width:5;opacity:0.5" />
</svg>

MathML

As you might expect, there are parser integrations for MathML too, and you can have unknown elements in MathML too. In MathML, all elements (including unknown ones) generate a "math content box", but only token elements (like <mi>, <mo>, <mn>) render text. For example, the <math> element itself - if you try to put text in it, the text won't render, but it will still generate a box and other content.

<math>
   Not a token. Doesn't render.
   <mi>X</mi>
</math>

MathML has other elements too like <mrow> and <mspace> and <mphantom> which are just containers. Same story there, if you try to put text in them, the text won't render...

<math>
   <mrow>Not a token. Doesn't render.</mrow>
   <mi>X</mi>
</math>

But if you put the text inside a token element (like <mi>) inside that same <mrow>, then the text will render...

<math>
   <mrow><mi>ok</mi></mrow>
   <mi>X</mi>
</math>

In MathML, unknown elements are basically treated just like <mrow>. In the above examples, you could replace <mrow> with <unknown> and it'd be the same.

Unknown Unknowns

Ok, here's something you don't think about every day: Given this markup:

<unknown id=one>One</unknown>
<math>
  <unknown id=two>Two</unknown>
</math>
<svg>
  <unknown id=three>Three</unknown>
</svg>

One, Two and Three are three different kinds of unknowns!

console.log(one.namespaceURI, one.constructor.name)
// logs 'http://www.w3.org/1999/xhtml HTMLUnknownElement'

console.log(two.namespaceURI, two.constructor.name)
// logs 'http://www.w3.org/1998/Math/MathML MathMLElement'

console.log(three.namespaceURI, three.constructor.name)
// logs 'http://www.w3.org/2000/svg SVGElement'

In CSS, these can also (theoretically) be styled via namespaces. The following will only style the first of those:

@namespace html url(http://www.w3.org/1999/xhtml);
html|unknown { color: blue; }

Under-defined Unknowns

Remember how in the beginning we created nonsensical constructs dynamically? Well, we can do that here too. We can move an unknown MathMLElement right into HTML, or an unknown HTMLElement right into MathML, and so on - and it's not currently well-defined, universal, or consistent what actually happens here.

Here's an interesting example that moves an unknown MathMLElement and an SVGElement into HTML, an HTMLElement into MathML, and so on.

See the Pen Untitled by вкαя∂εℓℓ (@briankardell) on CodePen.

Here's how that renders in the various browsers today:

Left to right: Chrome, Firefox, Safari all render differently

So, I guess we should probably fix that. I'll have to start creating some issues and tentative tests (feel free to beat me to it 😉)

Semi-related Rabbit Holes

Not specifically these issues, but related namespace stuff has caused a lot of problems that we're remedying, as shown by a recent flurry of activity started by my colleague Luke Warlow to fix MathML-Core support in various libraries/frameworks.

Who's next? :)

May 04, 2024 04:00 AM

May 01, 2024

Stephanie Stimac

Exploring the Immersive Web with Wolvic

When I was interviewing for a role with Microsoft Edge in early 2016, I got to my final interview in the loop and I can still remember sitting across from my soon-to-be manager when he asked about the future of the web.

HoloLens had been announced, so I must have had that in my mind because I answered with a question.

"What if you could crawl into a web page?" And began to describe what it would look like to crawl into a news article to be immersed in a visual representation of the story you were reading.

I got the job and joined the land of browser work and discovered an entire career path that changed my life. I didn't do anything with browsers in VR while I was at Microsoft.

Eight years later, I powered on a Meta Quest 3 in my little home office, fumbled through the setup, downloaded Igalia's open source browser Wolvic and crawled into a browser. Jhey was outside when I was suddenly immersed in a browser cityscape and very loudly exclaimed, "holy shit" and laughed.

I entered an art exhibition and then crawled into a painting. Exploring art in 3D. "This is so cool." And that's what I kept repeating.

Hello Wolvic #

So, why exactly do I have an XR headset, and what does this have to do with your career? One of the projects I am slowly ramping up on at Igalia is Wolvic. Wolvic used to be Firefox Reality, but Igalia took over stewardship of the project in 2022 and has been working on developing it since. It's the only open source browser available on several devices.

From a product perspective, this is a really cool space for me to flex some product management chops and think about user growth and how to make getting started with Wolvic easier. Then there's the project management side of things and finding partnerships for companies interested in developing the browser.

What could you use a VR headset for? #

I was sharing my first experience with my friends and one of them works for a telecommunications company. "I spend so much time in Google Earth exploring and drawing lines on maps between points. It would be pretty fun to do in VR."

That sounds pretty practical to me. Being "in" a view is completely different from viewing something on your normal computer monitor.

I'm pondering all the different use cases now, things I'd never thought of before. It's pretty cool.

Immersed in the Web #

I am sure I will be blogging more about Wolvic and building an XR browser. I'm excited to explore this space...a space I imagined eight years ago. It feels a little full circle and that's pretty neat for me.

Wolvic is Open Source & if you're interested in helping to fund the project or use it, here are a few links:

You can download Wolvic from the Meta App store. It's also available on Pico & Huawei devices.

If you're an XR developer, please let us know if you have any feedback for Wolvic. You can follow the project on GitHub.

More to come but for now I just wanted to say....this is SO freaking cool. Stay tuned for some posts and videos on YouTube.

May 01, 2024 12:00 AM

April 30, 2024

Enrique Ocaña

Dissecting GstSegments

During all these years using GStreamer, I've been having to deal with GstSegments in many situations. I've always had an intuitive understanding of the meaning of each field, but never had the time to properly write a good reference explanation for myself, ready to be checked at those times when the task at hand stops being so intuitive and nuances start being important. I used the notes I took during an interesting conversation with Alba and Alicia about those nuances, during the GStreamer Hackfest in A Coruña, as the seed that evolved into this post.

But what actually are GstSegments? They are the structures that track the values needed to synchronize the playback of a region of interest in a media file.

GstSegments are used to coordinate the translation between Presentation Timestamps (PTS), supplied by the media, and Runtime.

PTS is the timestamp that specifies, in buffer time, when the frame must be displayed on screen. This buffer time concept (called buffer running-time in the docs) refers to the ideal time flow where rate isn't taken into account.

Decode Timestamp (DTS) is the timestamp that specifies, in buffer time, when the frame must be supplied to the decoder. On decoders supporting P-frames (forward-predicted) and B-frames (bi-directionally predicted), the PTS of the frames reaching the decoder may not be monotonic, but the PTS of the frames reaching the sinks are (the decoder outputs monotonic PTSs).

Runtime (called clock running time in the docs) is the amount of physical time that the pipeline has been playing back. More specifically, the Runtime of a specific frame indicates the physical time that has passed or must pass until that frame is displayed on screen. It starts from zero.

Base time is the point when the Runtime starts with respect to the input timestamp in buffer time (PTS or DTS). It’s the Runtime of the PTS=0.

Start, stop, duration: Those fields are buffer timestamps that specify when the piece of media that is going to be played starts and stops, and how long that portion of the media is (the absolute difference between start and stop; I say absolute because a segment being played backwards may have a higher start buffer timestamp than its stop buffer timestamp).

Position is like the Runtime, but in buffer time. This means that in a video being played back at 2x, Runtime would flow at 1x (it's physical time after all, and reality goes at 1x pace) and Position would flow at 2x (the video moves twice as fast as physical time).

The Stream Time is the position in the stream. It is not exactly the same concept as buffer time. When handling multiple streams, some of them can be offset with respect to each other, not starting to be played from the beginning, or they can even have loops (e.g. repeating the same sound clip from PTS=100 until PTS=200 indefinitely). In this case of repeating, the Stream Time would flow from PTS=100 to PTS=200 and then go back again to the start position of the sound clip (PTS=100). There's a nice graphic in the docs illustrating this, so I won't repeat it here.

Time is the base of Stream Time. It’s the Stream time of the PTS of the first frame being played. In our previous example of the repeating sound clip, it would be 100.

There are also concepts such as Rate and Applied Rate, but we didn’t get into them during the discussion that motivated this post.

So, for translating between Buffer Time (PTS, DTS) and Runtime, we would apply this formula:

Runtime = BufferTime * ( Rate * AppliedRate ) + BaseTime

And for translating between Buffer Time (PTS, DTS) and Stream Time, we would apply this other formula:

StreamTime = BufferTime * AppliedRate + Time
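
In code you rarely apply these formulas by hand: the GstSegment API provides helpers for both translations. Here is a minimal sketch (the segment values are made up; in a real pipeline they would come from the segment event):

#include <gst/gst.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    // Describe a region of interest: play from 2s to 10s of the media at 2x.
    GstSegment segment;
    gst_segment_init(&segment, GST_FORMAT_TIME);
    segment.start = 2 * GST_SECOND;
    segment.stop = 10 * GST_SECOND;
    segment.rate = 2.0;

    // Buffer time (PTS) of some frame within the segment.
    guint64 pts = 4 * GST_SECOND;

    // Translate buffer time into Runtime and Stream Time.
    guint64 running_time = gst_segment_to_running_time(&segment, GST_FORMAT_TIME, pts);
    guint64 stream_time = gst_segment_to_stream_time(&segment, GST_FORMAT_TIME, pts);

    g_print("running time: %" GST_TIME_FORMAT "\n", GST_TIME_ARGS(running_time));
    g_print("stream time:  %" GST_TIME_FORMAT "\n", GST_TIME_ARGS(stream_time));

    return 0;
}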

And that’s it. I hope these notes in the shape of a post serve me as reference in the future. Again, thanks to Alicia, and especially to Alba, for the valuable clarifications during the discussion we had that day in the Igalia office. This post wouldn’t have been possible without them.

by eocanha at April 30, 2024 06:00 AM

April 26, 2024

Gyuyoung Kim

Web Platform Test and Chromium

I’ve been working on tasks related to web platform tests since last year. In this blog post, I’ll introduce what web platform tests are and how the Chromium project incorporates these tests into its development process.

1. Introduction

The web-platform-tests project serves as a cross-browser test suite for the Web platform stack. By crafting tests that can run seamlessly across all browsers, browser projects gain assurance that their software aligns with other implementations. This confidence extends to future implementations, ensuring compatibility. Consequently, web authors and developers can trust the Web platform to fulfill its promise of seamless functionality across browsers and devices, eliminating the need for additional layers of abstraction to address gaps introduced by specification editors and implementors.

For your information, the Web Platform Test Community operates a dashboard that tracks the pass ratio of tests across major web browsers. The chart below displays the number of failing tests in major browsers over time, with Chrome showing the lowest failure rate.

[Caption] Test failures graph on major browsers
The chart below shows the interoperability of web platform technologies among the major browsers for 2023. Interoperability has been improving.

[Caption] Interoperability among the major browsers 2023

2. Test Suite Design

The majority of the test suite is made up of HTML pages that are designed to be loaded in a browser. These pages may either generate results programmatically or provide a set of steps for executing the test and obtaining the outcome. Overall, the tests are concise, cross-platform, and self-contained, making them easy to run in any browser.

2.1 Test Layout

Most primary directories within the repository are dedicated to tests associated with specific web standards. For W3C specifications, these directories typically use the short name of the spec, which is the name used for snapshot publications under the /TR/ path. For WHATWG specifications, the directories are usually named after the spec’s subdomain, omitting “.spec.whatwg.org” from the URL. Other specifications follow a logical naming convention.

The css/ directory contains test suites specifically designed for the CSS Working Group specifications.

Within each specification-specific directory, tests are organized in one of two common ways: a flat structure, sometimes used for shorter specifications, or a nested structure, where each subdirectory corresponds to the ID of a heading within the specification. The nested structure, which provides implicit metadata about the tested section of the specification based on its location in the filesystem, is preferred for larger specifications.

For example, tests related to “The History interface” in HTML can be found in html/browsers/history/the-history-interface/.

Many directories also include a file named META.yml, which may define properties such as:
  • spec: a link to the specification covered by the tests in the directory
  • suggested_reviewers: a list of GitHub usernames for individuals who are notified when pull requests modify files in the directory
Various resources that tests rely on are stored in common directories, including images, fonts, media, and general resources.

2.2 Test Types

Tests in this project employ various approaches to validate expected behavior, with classifications based on how expectations are expressed:

  1. Rendering Tests:
    • Reftests: Compare the graphical rendering of two (or more) web pages, asserting equality in their display (e.g., A.html and B.html must render identically). This can be done manually by users switching between tabs/windows or through automated scripts.
    • Visual Tests: Evaluate a page’s appearance, with results determined either by human observation or by comparing it with a saved screenshot specific to the user agent and platform.
  2. JavaScript Interface Tests (testharness.js tests):
    • Ensure that JavaScript interfaces behave as expected. Named after the JavaScript harness used to execute them.
  3. WebDriver Browser Automation Protocol Tests (wdspec tests):
    • Written in Python, these tests validate the WebDriver browser automation protocol.
  4. Manual Tests rely on a human to run them and determine their result.

3. Web Platform Test in Chromium

The Chromium project runs web tests, used by Blink to verify many components including, but not limited to, layout and rendering. Generally, these web tests load pages in a test renderer (content_shell) and compare the rendered output or JavaScript output with an expected output file. The web tests include the “web platform tests” (WPT) located at web_tests/external/wpt; other directories are for Chromium-specific tests only. The web_tests/external/wpt directory currently contains around 50,000 web platform tests.

3.1 Web Tests in the Chromium development process

Every patch must pass all tests before it can be merged into the main source tree. In Gerrit, trybots execute a wide array of tests, including the web tests, to ensure this. We can launch these trybots with a specific patch, and each trybot will then run the scheduled tests using that patch.

[Caption] Trybots in Gerrit
For example, the linux-rel trybot runs the web tests in the blink_web_tests step as below,

[Caption] blink_web_tests in the linux-rel trybot

3.2 Internal sequence for running the Web Test in Chromium

Let’s take a brief look at how Chromium runs the web tests internally. There are basically two major components: run_web_tests.py, which acts as a server, and content_shell, which acts as a client that loads each test it is given and returns the result. The input and output of content_shell follow the run_web_tests protocol, carried over pipes that connect the stdin and stdout of run_web_tests.py and content_shell, as shown below.

[Caption] Sequence how to execute a test between run_web_test.py and content_shell
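
To give a feel for that pipe-based pattern, here is a toy sketch in C of a “server” feeding a test to a client over stdin and reading its stdout. It is not the actual run_web_tests.py code or its full protocol; the content_shell invocation and the #EOF marker are assumptions based on the Chromium documentation:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    int to_child[2], from_child[2];

    /* Error handling omitted for brevity. */
    pipe(to_child);
    pipe(from_child);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: wire the pipes to stdin/stdout and exec the test runner.
         * The "--run-web-tests -" invocation (read test names from stdin)
         * is an assumption based on the Chromium docs. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[1]);
        close(from_child[0]);
        execlp("content_shell", "content_shell", "--run-web-tests", "-", (char*)NULL);
        _exit(127);
    }

    /* Parent ("server"): send one test name per line, then read result lines. */
    close(to_child[0]);
    close(from_child[1]);
    FILE* to = fdopen(to_child[1], "w");
    FILE* from = fdopen(from_child[0], "r");

    fprintf(to, "external/wpt/dom/historical.html\n");
    fflush(to);

    char line[1024];
    while (fgets(line, sizeof line, from)) {
        fputs(line, stdout);
        if (strcmp(line, "#EOF\n") == 0) /* assumed end-of-output marker */
            break;
    }

    fclose(to);
    fclose(from);
    waitpid(pid, NULL, 0);
    return 0;
}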

4. In conclusion

We just explored what web tests are and how Chromium runs them for web platform testing. Web platform tests play a crucial role in ensuring interoperability among various web browsers and platforms. In a following blog post, I will share how I added support for running web tests on iOS for the Blink port.


by gyuyoung at April 26, 2024 05:48 PM

April 24, 2024

Stephen Chenney

CSS Custom Properties in Highlight Pseudos

The CSS highlight inheritance model describes the process for inheriting the CSS properties of the various highlight pseudo-elements:

  • ::selection controlling the appearance of selected content
  • ::spelling-error controlling the appearance of misspelled word markers
  • ::grammar-error controlling how grammar errors are marked
  • ::target-text controlling the appearance of the string matching a target-text URL
  • ::highlight defining the appearance of a named highlight, accessed via the CSS highlight API

The inheritance model is intended to produce more intuitive behavior for examples like the one below, where we would expect the second line to be entirely blue because the <em> is a child of the blue paragraph and would typically inherit properties from it:

<style>
  ::selection /* = *::selection (universal) */ {
    color: lightgreen;
  }
  .blue::selection {
    color: blue;
  }
</style>
<p>Some <em>lightgreen</em> text</p>
<p class="blue">Some <em>lightgreen</em> text that one would expect to be blue</p>
<script>
  range = new Range();
  range.setStart(document.body, 0);
  range.setEnd(document.body, 3);
  document.getSelection().addRange(range);
</script>

Before this model was standardized, web developers achieved the same effect using CSS custom properties. The selection highlight could make use of a custom property defined on the originating element (the element that is selected). Being defined on the originating element tree, those custom properties were inherited, so a universal highlight would effectively inherit the properties of its parent. Here’s what it looks like:

<style>
  :root {
    --selection-color: lightgreen;
  }
  ::selection /* = *::selection (universal) */ {
    color: var(--selection-color);
  }
  .blue {
    --selection-color: blue;
  }
</style>
<p>Some <em>lightgreen</em> text</p>
<p class="blue">Some <em>lightgreen</em> text that is also blue</p>
<script>
  range = new Range();
  range.setStart(document.body, 0);
  range.setEnd(document.body, 3);
  document.getSelection().addRange(range);
</script>

In the example, all selection highlights use the value of --selection-color as the selection text color. The <em> element inherits the property value of blue from its parent <p> and hence has a blue highlight.

This approach to selection inheritance was promoted in Stack Overflow posts and other places, even in posts that did not discuss inheritance. The problem is that the new highlight inheritance model, as previously specified, broke this behavior in a way that required significant changes for web sites making use of the former approach. At the prompting of developer advocates, the CSS Working Group decided to change the specified behavior of custom properties with highlight pseudos to support the existing recommended approach.

Custom Properties for Highlights #

The updated behavior for custom properties in highlight pseudos is that highlight properties that have var(...) references take the property values from the originating element. In addition, custom properties defined in the highlight pseudo itself will be ignored. The example above works with this new approach, so content making use of it will not break when highlight inheritance is enabled in browsers.

Note that this explicitly conflicts with the previous guidance, which was to define the custom properties on the highlight pseudos, with only the root highlight inheriting custom properties from the document root element.

The change is implemented behind a flag in Chrome 126 and higher, and is expected to be enabled by default as early as Chrome 130. To see the effects right now, enable “Experimental Web Platform features” via chrome://flags in M126 Chrome (Canary at the time of writing).

A Unified Approach for Derived Properties in Highlights #

There are other situations in which properties are disallowed on highlight pseudos yet are needed to resolve variables. For example, lengths that use font-based units, such as 0.5em, cannot use font-size from a highlight pseudo because font-size is not allowed in highlights. The same applies to container-based and viewport units. The general rule is that the highlight will get whatever value it needs from the originating element, and that now includes custom property values.

April 24, 2024 12:00 AM

April 22, 2024

Brian Kardell

Mirror Effects


Today I had this thought about "AI" that felt more like a short blog post than a social media thread, so I thought I'd share it.

I love learning and thinking about weird indirect connections like how thing X made things Y and Z suddenly possible, and given some time these had secondary or tertiary impacts that were really unexpected.

There are so many possible examples: We didn't really expect the web and social media to give rise to the Arab Spring. Nor to such disinformation. Nor the 2016 American election (or several others similarly around the world). Maybe Carrier didn't expect that the invention of air conditioning would help reshape the American political map, but it did.

One of the things that came to my mind today was a thing I read about mirrors. Ian Mortimer explains how improvements to and the spread of mirrors radically changed lots of things. Here's a quote from the piece:

The very act of a person seeing himself in a mirror or being represented in a portrait as the center of attention encouraged him to think of himself in a different way. He began to see himself as unique. Previously the parameters of individual identity had been limited to an individual’s interaction with the people around him and the religious insights he had over the course of his life. Thus individuality as we understand it today did not exist; people only understood their identity in relation to groups—their household, their manor, their town or parish—and in relation to God... The Mirror Effect

It's pretty interesting to look back and observe all of the changes that this began to stir - from the way people literally lived (more privacy), to changes in the types of writing and art, and how we thought about fitting in to the larger world (and thus the societies we built).

So, today I had this random thought that I wonder what sorts of effects like these will come from all of the "AI" focus. That is, not the common/direct sorts of things we're talking about but the ones that maybe have very little to do with any actual technology even. Which things will some future us look back on in 20 or 100 years and say "huh, interesting".

In particular, the thing that brought this to mind is that I am suddenly seeing lots more people having conversations like "What is consciousness though, really?" and "What is intelligence, really?". I don't mean technologists; it just seems to be causing lots more people to suddenly think about that kind of thing.

So it made me think: I wonder if we will see increased signups for philosophy courses? Or, sales of more books along these lines? Could that ultimately lead to another, similar sort of changing in how we collectively see ourselves? I wonder what effects this has in the long term on literature, film, or even science and government.

This isn't a "take" - it's not trying to be optimistic or pessimistic. It's more of a "huh... I hadn't thought much about that before, but it's kind of interesting to think about." Don't you think? Outside of the normal sorts of things we're obviously thinking about - what are some you could imagine?

As a kind of final interesting note: Stephen Johnson is a great storyteller, and his work is full of connections like these. If you want to read more, check out his books. Part of the reason I mention him is that, interestingly, he has himself recently gone to work with Google on an LM project about writing. I bet he's got some notes and ideas on this already too.

April 22, 2024 04:00 AM

April 16, 2024

Philippe Normand

From WebKit/GStreamer to rust-av, a journey on our stack’s layers

In this post I’ll try to document the journey starting from a WebKit issue and ending up improving third-party projects that WebKitGTK and WPEWebKit depend on.

I’ve been working on WebKit’s GStreamer backends for a while. Usually some new feature needed on the WebKit side would trigger work on GStreamer. That’s quite common and healthy, actually: by improving GStreamer (bug fixes or implementing new features) we make the whole stack stronger (hopefully). It’s not hard to imagine other web engines, such as Servo for instance, leveraging fixes made in GStreamer in the context of WebKit use-cases.

Sometimes though we have to go deeper and this is what this post is about!

Since version 2.44, WebKitGTK and WPEWebKit ship with a WebCodecs backend. That backend leverages the wide range of GStreamer audio and video decoders/encoders to give low-level access to encoded (or decoded) audio/video frames to Web developers. I delivered a lightning talk at gst-conf 2023 about this topic.

There are still some issues to fix regarding performance and some W3C web platform tests are still failing. The AV1 decoding tests were flagged early on while I was working on WebCodecs, I didn’t have time back then to investigate the failures further, but a couple weeks ago I went back to those specific issues.

The WebKit layout tests harness is executed by various post-commit bots, on various platforms. The WebKitGTK and WPEWebKit bots run on Linux. The WebCodec tests for AV1 currently make use of the GStreamer av1enc and dav1ddec elements. We currently don’t run the tests using the modern and hardware-accelerated vaav1enc and vaav1dec elements because the bots don’t have compatible GPUs.

The decoding tests were failing; this one, for instance (the ?av1 variant). In that test both encoding and decoding are tested, but decoding was failing, for a couple of reasons. The rabbit hole starts here. After debugging this for a while, it was clear that the colorspace information was lost between the encoded chunks and the decoded frames. The decoded video frames didn’t have the expected colorimetry values.

The VideoDecoderGStreamer class basically takes encoded chunks and hands decoded VideoFrameGStreamer objects to the upper layers (JS) in WebCore. A video frame is basically a GstSample (Buffer and Caps) and we have code in place to interpret the colorimetry parameters exposed in the sample caps and translate those to the various WebCore equivalents. So far so good, but the caps set on the dav1ddec element didn’t have that information! I thought the dav1ddec element could be fixed, “shouldn’t be that hard”, and I knew that code because I wrote it in 2018 :)

So let’s fix the GStreamer dav1ddec element. It’s a video decoder written in Rust, relying on the dav1d-rs bindings of the popular C libdav1d library. The dav1ddec element basically feeds encoded chunks of data to dav1d using the dav1d-rs bindings. In return, the bindings provide the decoded frames using a Dav1dPicture Rust structure, and the dav1ddec GStreamer element basically makes buffers and caps out of this decoded picture. The dav1d-rs bindings are quite minimal; we implemented API on a per-need basis so far, so it wasn’t very surprising that… colorimetry information for decoded pictures was not exposed! The rabbit hole goes one level deeper.

So let’s add colorimetry API in dav1d-rs. When working on (Rust) bindings of a C library, if you need to expose additional API the answer is quite often in the C headers of the library. Every Dav1dPicture has a Dav1dSequenceHeader, in which we can see a few interesting fields:

typedef struct Dav1dSequenceHeader {
...
    enum Dav1dColorPrimaries pri; ///< color primaries (av1)
    enum Dav1dTransferCharacteristics trc; ///< transfer characteristics (av1)
    enum Dav1dMatrixCoefficients mtrx; ///< matrix coefficients (av1)
    enum Dav1dChromaSamplePosition chr; ///< chroma sample position (av1)
    ...
    uint8_t color_range;
    ...
...
} Dav1dSequenceHeader;

After sharing a naive branch with rust-av co-maintainers Luca Barbato and Sebastian Dröge, I came up with a couple of pull-requests that eventually were shipped in version 0.10.3 of dav1d-rs. I won’t deny matching primaries, transfer, matrix and chroma-site enum values to rust-av's Pixel enum was a bit challenging :P Anyway, with dav1d-rs fixed up, the rabbit hole level goes up one level :)

Now with the needed dav1d-rs API, the GStreamer dav1ddec element could be fixed. Again, matching the various enum values to their GStreamer equivalents was an interesting exercise. The merge request was merged, but to this date it’s not shipped in a stable gst-plugins-rs release yet. There’s one more complication here: the ABI broke between dav1d 1.2 and 1.4. The dav1d-rs 0.10.3 release expects the latter. I’m not sure how we will cope with that in terms of gst-plugins-rs release versioning…
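
To give a flavor of that kind of mapping (purely illustrative; the real code lives in the Rust dav1ddec element), a C-level translation could lean on GstVideo’s ISO/IEC 23091-4 helpers, since AV1 signals colorimetry with those code points:

/* Purely illustrative sketch: the actual mapping is done in Rust inside the
 * dav1ddec element. AV1 uses ISO/IEC 23091-4 (H.273) code points for
 * colorimetry, so GstVideo's *_from_iso() helpers can translate them. */
#include <gst/video/video.h>

static void fill_colorimetry_from_av1(GstVideoColorimetry* cinfo,
                                      guint pri, guint trc, guint mtrx,
                                      gboolean full_range)
{
    cinfo->primaries = gst_video_color_primaries_from_iso(pri);
    cinfo->transfer  = gst_video_transfer_function_from_iso(trc);
    cinfo->matrix    = gst_video_color_matrix_from_iso(mtrx);
    cinfo->range     = full_range ? GST_VIDEO_COLOR_RANGE_0_255
                                  : GST_VIDEO_COLOR_RANGE_16_235;
}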

Anyway, WebKit’s runtime environment can be adapted to ship dav1d 1.4 and development version of the dav1ddec element, which is what was done in this pull request. The rabbit is getting out of his hole.

The WebCodec AV1 tests were finally fixed in WebKit, by this pull request. Beyond colorimetry handling a few more fixes were needed, but luckily those didn’t require any fixes outside of WebKit.

Wrapping up, if you’re still reading this post, I thank you for your patience. Working on inter-connected projects can look a bit daunting at times, but eventually the whole ecosystem benefits from cross-project collaborations like this one. Thanks to Luca and Sebastian for the help and reviews in dav1d-rs and the dav1ddec element. Thanks to my fellow Igalia colleagues for the WebKit reviews.

by Philippe Normand at April 16, 2024 08:14 PM

April 04, 2024

Jesse Alama

The decimals around us: Cataloging support for decimal numbers

A catalog of support for decimal numbers in various programming languages

Decimals are a data type that aims to exactly represent decimal numbers. Some programmers may not know, or fully realize, that, in most programming languages, the numbers that you enter look like decimal numbers but internally are represented as binary (that is, base-2) floating-point numbers. Things that are totally simple for us, such as 0.1, simply cannot be represented exactly in binary. The decimal data type, whatever its stripe or flavor, aims to remedy this by giving us a way of representing and working with decimal numbers, not binary approximations thereof. (Wikipedia has more.)
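
To make this concrete, here is a small, purely illustrative C program showing the binary approximations that a typical IEEE 754 double actually stores (the values in the comments are what such an implementation prints):

#include <stdio.h>

int main(void)
{
    double a = 0.1;
    double sum = 0.1 + 0.2;

    /* Print more digits than a double can faithfully hold to expose the
     * underlying binary approximation. */
    printf("0.1       -> %.20f\n", a);    /* 0.10000000000000000555... */
    printf("0.1 + 0.2 -> %.20f\n", sum);  /* 0.30000000000000004441... */
    printf("0.1 + 0.2 == 0.3? %s\n", (sum == 0.3) ? "yes" : "no");  /* no */
    return 0;
}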

To help with my work on adding decimals to JavaScript, I've gone through a list of popular programming languages, taken from the 2022 StackOverflow developer survey. What follows is a brief summary of where these languages stand regarding decimals. The intention is to keep things simple. The purpose is:

  1. If a language does have decimals, say so;
  2. If a language does not have decimals, but at least one third-party library exists, mention it and link to it. If a discussion is underway to add decimals to the language, link to that discussion.

There is no intention to filter out any language in particular; I'm just working with a slice of languages found in the StackOverflow list linked to earlier. If a language does not have decimals, there may well be multiple third-party decimal libraries. I'm not aware of all libraries, so if I have linked to a minor library and neglected to link to a more high-profile one, please let me know. More importantly, if I have misrepresented the basic fact of whether decimals exist at all in a language, send mail.

C

C does not have decimals. But they're working on it! The C23 (as in, 2023) standard proposes to add new fixed bit-width data types (32, 64, and 128 bits) for these numbers.
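
As a hedged sketch of what that could look like, assuming a compiler that already ships the decimal floating-point extensions the C23 work builds on (for example recent GCC, which provides _Decimal64 and the DD literal suffix); this is not yet portable C:

#include <stdio.h>

int main(void)
{
    /* _Decimal64 and the DD suffix come from compiler extensions for decimal
     * floating point (e.g. GCC); C23 aims to standardize these types. */
    _Decimal64 a = 0.1DD;
    _Decimal64 sum = a + 0.2DD;

    /* 0.1, 0.2 and 0.3 are all exact in decimal, so the comparison holds. */
    printf("0.1 + 0.2 (decimal) == 0.3? %s\n", (sum == 0.3DD) ? "yes" : "no");

    /* printf support for decimal types is not universal, so convert to
     * double just for display. */
    printf("approximate value: %f\n", (double)sum);
    return 0;
}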

C#

C# has decimals in its underlying .NET subsystem. (For the same reason, decimals also exist in Visual Basic.)

C++

C++ does not have decimals. But, like C, they're working on it!

Dart

Dart does not have decimals. But a third-party library exists.

Go

Go does not have decimals, but a third-party library exists.

Java

Java has decimals.

JavaScript

JavaScript does not have decimals. We're working on it!

Kotlin

Kotlin does not have decimals. But, in a way, it does: since Kotlin is running on the JVM, one can get decimals by using Java's built-in support.

PHP

PHP does not have decimals. An extension exists and at least one third-party library exists.

Python

Python has decimals.

Ruby

Ruby has decimals. Despite that, there is some third-party work to improve the built-in support.

Rust

Rust does not have decimals, but a crate exists.

SQL

SQL has decimals (it is the DECIMAL data type). (Here is the documentation for, e.g., PostgreSQL, and here is the documentation for MySQL.)

Swift

Swift has decimals.

TypeScript

TypeScript does not have decimals. However, if decimals get added to JavaScript (see above), TypeScript will probably inherit decimals, eventually.

April 04, 2024 12:52 PM

Getting started with Lean 4, your next programming language

I had the pleasure of learning about Lean 4 with David Christiansen and Joachim Breitner at their tutorial at BOBKonf 2024. I'm planning on doing a couple of formalizations with Lean and would love to share what I learn as a total newbie, working on macOS.

Needed tools

I'm on macOS and use Homebrew extensively. My simple go-to approach to finding new software is to do brew search lean. This revealed lean and also surfaced elan. Running brew info lean showed me that that package (at the time I write this) installs Lean 3. But I know, out-of-band, that Lean 4 is what I want to work with. Running brew info elan looked better, but the output reminds me that (1) the information is for the elan-init package, not the elan cask, and (2) elan-init conflicts with both the elan and the aforementioned lean. Yikes! This strikes me as a potential problem for the community, because I think Lean 3, though it still works, is presumably not where new Lean development should be taking place. Perhaps the Homebrew formula for Lean should be updated to be called lean3, and a new lean4 package should be made available. I'm not sure. The situation seems less than ideal, but in short, I have been successful with the elan-init package.

After installing elan-init, you'll have the elan tool available in your shell. elan is the tool used for maintaining different versions of Lean, similar to nvm in the Node.js world or pyenv.

Setting up a blank package

When I did the Lean 4 tutorial at BOB, I worked entirely within VS Code (…) and created a new standalone package using some in-editor functionality. At the command line, I use lake init to manually create a new Lean package. At first, I made the mistake of running this command, assuming it would create a new directory for me and set up any configuration and boilerplate code there. I was surprised to find, instead, that lake init sets things up in the current directory, in addition to creating a subdirectory and populating it. Using lake --help, I read about the lake new command, which does what I had in mind. So I might suggest using lake new rather than lake init.

What's in the new directory? Doing tree foobar reveals

foobar
├── Foobar
│   └── Basic.lean
├── Foobar.lean
├── Main.lean
├── lakefile.lean
└── lean-toolchain

Taking a look there, I see four .lean files. Here's what they contain:

Main.lean

import «Foobar»

def main : IO Unit :=
  IO.println s!"Hello, {hello}!"

Foobar.lean

-- This module serves as the root of the `Foobar` library.
-- Import modules here that should be built as part of the library.
import «Foobar».Basic

Foobar/Basic.lean

def hello := "world"

lakefile.lean

import Lake
open Lake DSL

package «foobar» where
  -- add package configuration options here

lean_lib «Foobar» where
  -- add library configuration options here

@[default_target]
lean_exe «foobar» where
  root := `Main

It looks like there's a little module structure here, and a reference to the identifier hello, defined in Foobar/Basic.lean and made available via Foobar.lean. I'm not going to touch lakefile.lean for now; as a newbie, it looks scary enough that I think I'll just stick to things like Basic.lean.

There's also an automatically created .git there, not shown in the directory output above.

Now what?

Now that you've got Lean 4 installed and have set up a package, you're ready to dive into one of the official tutorials. The one I'm working through is David's Functional Programming in Lean. There are all sorts of additional things to learn, such as all the different lake commands. Enjoy!

April 04, 2024 01:48 AM

April 03, 2024

Brian Kardell

The Blessing of the Strings


Trusted Types have been a proposal by Google for quite some time at this point, but it's currently getting a lot of attention and work in all browsers (Igalia is working on implementations in WebKit and Gecko, sponsored by Salesforce and Google, respectively). I've been looking at it a lot and thought it's probably something worth writing about.

The Trusted Types proposal is about preventing Cross-site scripting (XSS). It rides atop Content Security Policy (CSP) and allows website maintainers to say "require trusted-types". Once required, lots of the Web Platform's dangerous API surfaces ("sinks") which currently accept a string will now require... well, a different type.

myElement.innerHTML (and a whole lot of other APIs) for example, would now require a TrustedHTML object instead of just a string.

You can think of TrustedHTML as an interface indicating that a string has been somehow specially "blessed" as safe... Sanitized.

the Holy Hand grenade scene from Monty Python's Holy Grail
And Saint Attila raised the string up on high, saying, 'O Lord, bless this thy string, that with it we may trust that it is free of XSS...' [ref].

Granting Blessings

The interesting thing about this is how one goes about blessing strings, and how this changes the dynamics of development and safety to protect from XSS.

To start with, there is a new global trustedTypes object (available in both window and workers) with a method called .createPolicy which can be used to create "policies" for blessing various kinds of input (createHTML, createScript, and createScriptURL). Trusted Types comes with the concept of a default policy, and the ability for you to register a specially named "default"...

//returns a policy, but you 
// don't really need to do anything 
// with the default one
trustedTypes.createPolicy(
    "default", 
    {
      createHTML: s => { 
          return DOMPurify.sanitize(s) 
      } 
    }
);

And now, the practical upshot is that all attempts to set HTML will be sanitized... So if there's some code that tries to do:

// if str contains
// `<img src="no" onerror="dangerous code">`;
target.innerHTML = str;

Then the onerror attribute will be automatically stripped (sanitized) before .innerHTML gets it.

Hey that's pretty cool!

one of the scenes where the castle guard is mocking arthur and his men
It's almost like you just put defenses around all that stuff and can just peer over the wall at would be attackers and make faces at them....

But wait... can't someone come along then and just create a more lenient policy called default?

No! That will throw an exception!

Also, you don't have to create a default. If you don't, and someone tries to use one of those methods to assign a string, it will throw.

The only thing this enforcement cares about is that the value is one of these "blessed" types. Website administrators can also provide (in the header) the names of one or more policies that are allowed to be created.

Any attempts to define a policy not in that list will throw (it's a bit more complicated than that, see Name your Policy below). Let's imagine that in the header we specified that a policy named "sanitize" is allowed to be created.

Maybe you can see some of why that starts to get really interesting. In order to use any of those APIs (at all), you'd need access to a policy in order to bless the string. But because the policy which can do that blessing is a handle, it's up to you what code you give it to...

{
  const sanitizerPolicy =
      trustedTypes.createPolicy(
        "sanitize",
        {
          createHTML: s => {
            return DOMPurify.sanitize(s)
          }
        }
      );


    // give someOtherModule access to a sanitization policy
    someOtherModule.init(sanitizerPolicy)

    // yetAnotherModule can't even sanitize, any use of those
    // APIs will throw
    yetAnotherModule.foo()
}

// Anything out here also doesn't have 
// access to a sanitization policy

What's interesting about this is that the thing doing the trusting on the client, is actually on the client as well - but the pattern ensures that this becomes a considerably more finite problem. It is much easier to audit whether the "trust" is warranted. That is, we can look at the above to see that there is only one policy and it only supports creating HTML. We can see that the trust there is placed in DOMPurify, and even that amount of trust is only provided to select modules.

Finally, most importantly: It is a pattern that is machine enforceable. Anything that tries to use any of those APIs without a blessed string (a Trusted Type) will fail... Unless you ask it not to.

Don't Throw, Just Help?

Shutting down all of those APIs after the fact is hard because all of those dangerous APIs are also really useful and therefore widely used. As I said earlier, auditing to find and understand all uses of them all is pretty difficult. Chances are pretty good that there might just be a lot more unsafe stuff floating around in your site than you expected.

Instead of Content-Security-Policy CSP headers, you can send Content-Security-Policy-Report-Only and include a directive that includes report-to /csp-violation-report-endpoint/ where /csp-violation-report-endpoint/ is an endpoint path (on the same origin). If set, whenever violations occur, browsers should send a request to report a violation to that endpoint (JSON formatted with lots of data).

The general idea is that it is then pretty easy to turn this on and monitor your site to discover where you might have some problems, and begin to work through them. This should be especially good for your QA environment. Just keep in mind that the report doesn't actually prevent the potentially bad things from happening, it just lets you know they exist.

Shouldn't there just be a standard sanitizer too?

Yes!! That is also a thing that is being worked on.

Name Your Policy

I'm not going to lie, I found CSP headers a little confusing, both to read and to figure out how their parts relate to each other. You might see a header set up to report only....

Content-Security-Policy-Report-Only: report-uri /csp-violation-report-endpoint; default-src 'self'; require-trusted-types-for 'script'; trusted-types one two;

Believe it or not that's a fairly simple one. Basically though, you split it up on semi-colons and each of those is a directive. The directive has a name like "report-uri" followed by whitespace and then a list of values (potentially containing only 1) which are whitespace separated. There are also keyword values which are quoted.

So, the last two parts of this are about Trusted Types. The first, require-trusted-types-for is about what gets some kind of enforcement and really the only thing you can put there currently is the keyword 'script'. The second, trusted-types is about what policies can be created.

Note that I said "some kind of enforcement" because the above is "report only" which means those things will report, but not actually throw, while if we just change the name of the header from Content-Security-Policy-Report-Only to Content-Security-Policy lots of things might start throwing - which didn't greatly help my exploration. So, here's a little table that might help..

If the directives are... then...

  • (missing) → You can create whatever policies you want (except duplicates), but they aren't enforced in any way.
  • require-trusted-types-for 'script'; → You can create whatever policies you want (except duplicates), and they are enforced. All attempts to assign strings to those sinks will throw. This means that if you create a policy named default, it will 'bless' strings through that automatically, but it also means anyone can create any policy to 'bless' strings too.
  • trusted-types → You cannot create any policies whatsoever. Attempts to do so will throw.
  • trusted-types 'none' → Same as with no value.
  • trusted-types a b → You can call createPolicy with the names 'a' and 'b' exactly once each. Attempts to call it with other names (including 'default'), or repeatedly, will throw.
  • trusted-types default → You can call createPolicy with the name 'default' exactly once. Attempts to call it with other names, or repeatedly, will throw.
  • require-trusted-types-for 'script'; trusted-types a → You can call createPolicy with the name 'a' exactly once. Attempts to call it with other names (including 'default'), or repeatedly, will throw. All attempts to assign strings to those sinks will throw unless they are 'blessed' by a function in a policy named 'a'.

April 03, 2024 04:00 AM

April 02, 2024

Maíra Canal

Linux 6.8: AMD HDR and Raspberry Pi 5

The Linux kernel 6.8 came out on March 10th, 2024, bringing brand-new features and plenty of performance improvements on different subsystems. As part of Igalia, I’m happy to be an active part of many features that are released in this version, and today I’m going to review some of them.

Linux 6.8 is packed with a lot of great features, performance optimizations, and new hardware support. In this release, we can check the Intel Xe DRM driver experimentally, further support for AMD Zen 5 and other upcoming AMD hardware, initial support for the Qualcomm Snapdragon 8 Gen 3 SoC, the Imagination PowerVR DRM kernel driver, support for the Nintendo NSO controllers, and much more.

Igalia is widely known for its contributions to Web Platforms, Chromium, and Mesa. But, we also make significant contributions to the Linux kernel. This release shows some of the great work that Igalia is putting into the kernel and strengthens our desire to keep working with this great community.

Let’s take a deep dive into Igalia’s major contributions to the 6.8 release:

AMD HDR & Color Management

You may have seen the release of a new Steam Deck last year, the Steam Deck OLED. What you may not know is that Igalia helped bring this product to life by putting some effort into the AMD driver-specific color management properties implementation. Melissa Wen, together with Joshua Ashton (Valve), and Harry Wentland (AMD), implemented several driver-specific properties to allow Gamescope to manage color features provided by the AMD hardware to fit HDR content and improve gamers’ experience.

She has explained all the features implemented in the AMD display kernel driver in two blog posts and a 2023 XDC talk.

Async Flip

André Almeida worked together with Simon Ser (SourceHut) to provide support for asynchronous page-flips in the atomic API. This feature targets users who want to present a new frame immediately, even after missing a V-blank. It is particularly useful for applications with high frame rates, such as gaming.

Raspberry Pi 5

Raspberry Pi 5 was officially released in October 2023, and Igalia was ready to bring top-notch graphics support for it. Although we still can’t use the RPi 5 with the mainline kernel, it is superb to see some pieces coming upstream. Iago Toral worked on implementing all the kernel support needed for the V3D 7.1.x driver.

With the kernel patches in place, by the time the RPi 5 was released, it already included a fully compliant OpenGL ES 3.1 and Vulkan 1.2 driver implemented by Igalia.

GPU stats and CPU jobs for the Raspberry Pi 4/5

Apart from the release of the Raspberry Pi 5, Igalia is still working on improving the whole Raspberry Pi environment. I worked together with José Maria “Chema” Casanova on implementing support for GPU stats in the V3D driver. This means that RPi 4/5 users can now access the GPU usage percentage, and the statistics are available per process or globally.

I also worked together with Melissa on implementing CPU jobs for the V3D driver. As the Broadcom GPU isn’t capable of performing some operations, the Vulkan driver uses the CPU to compensate. In order to avoid stalls in job submission, CPU jobs are now part of the kernel and can be easily synchronized through synchronization objects.

If you are curious about the CPU job implementation, you can check this blog post.

Other Contributions & Fixes

Sometimes we don’t contribute a major feature to the release, but we can still help by improving documentation and sending fixes. André also contributed to this release by documenting the different AMD GPU reset methods, making them easier for future users to understand.

During Igalia’s efforts to improve the general users’ experience on the Steam Deck, Guilherme G. Piccoli noticed a message in the kernel log and readily provided a fix for this PCI issue.

Outside of the Steam Deck world, we can check some of Igalia’s work on the Qualcomm Adreno GPUs. Although most of our Adreno-related work happens in user space, Danylo Piliaiev sent a couple of kernel fixes to the msm driver, fixing some hangs and some CTS tests.

We also had contributions from our 2023 Igalia CE student, Nia Espera. Nia’s project was related to mobile Linux and she managed to write a couple of patches to the kernel in order to add support for the OnePlus 9 and OnePlus 9 Pro devices.

If you are a student interested in open source and would like to have a first exposure to the professional world, check if we have openings for the Igalia Coding Experience. I was a CE student myself, and being mentored by an Igalian was an incredible experience.

Check the complete list of Igalia’s contributions for the 6.8 release

Authored (57):
  • André Almeida (2)
  • Danylo Piliaiev (2)
  • Guilherme G. Piccoli (1)
  • Iago Toral Quiroga (4)
  • Maíra Canal (17)
  • Melissa Wen (27)
  • Nia Espera (4)

Signed-off-by (88):
  • André Almeida (4)
  • Danylo Piliaiev (2)
  • Guilherme G. Piccoli (1)
  • Iago Toral Quiroga (4)
  • Jose Maria Casanova Crespo (2)
  • Maíra Canal (28)
  • Melissa Wen (43)
  • Nia Espera (4)

Acked-by (4):
  • Jose Maria Casanova Crespo (2)
  • Maíra Canal (1)
  • Melissa Wen (1)

Reviewed-by (30):
  • André Almeida (1)
  • Christian Gmeiner (1)
  • Iago Toral Quiroga (20)
  • Maíra Canal (4)
  • Melissa Wen (4)

Tested-by (1):
  • Guilherme G. Piccoli (1)

April 02, 2024 11:00 AM