Planet Igalia

January 20, 2022

Tim Chevalier

Implementing records in Warp

Toward the goal of implementing records and tuples in Warp, I’m starting with code generation for empty records. The two modules I’ve been staring at the most are WarpBuilder.cpp and WarpCacheIRTranspiler.cpp.

While the baseline compiler translates bytecode and CacheIR directly into assembly code (using the MacroAssembler), the optimizing compiler (Warp) uses two intermediate languages: it translates bytecode and CacheIR to MIR; MIR to LIR; and then LIR to assembly code.

As explained in more detail here, WarpBuilder takes a snapshot (generated by another module, WarpOracle.cpp) of running code, and for each bytecode op, it generates MIR instructions, either from CacheIR (for bytecode ops that can have inline caches), or directly. For ops that can be cached, WarpBuilder calls its own buildIC() method, which in turn calls the TranspileCacheIRToMIR() method in WarpCacheIRTranspiler.

A comment in WarpBuilderShared.h says “Because this code is used by WarpCacheIRTranspiler we should generally assume that we only have access to the current basic block.” From that, I’m inferring that WarpCacheIRTranspiler maps each CacheIR op onto exactly one basic block. In addition, the addEffectful() method in WarpCacheIRTranspiler enforces that each basic block contains at most one effectful instruction.

In the baseline JIT implementation that I already finished, the InitRecord and FinishRecord bytecodes each have their own corresponding CacheIR ops; I made this choice by looking at how existing ops like NewArray were implemented. In all of these cases, though, I'm still not sure I fully understand what the benefit of caching is (rather than just generating code): my understanding of inline caching is that it's an optimization to avoid method lookups when polymorphic code is repeatedly instantiated at the same type, and in none of these cases is there any type-based polymorphism.

I could go ahead and add InitRecord and FinishRecord to MIR and LIR as well; this would be similar to my existing code, where the BaselineCacheIRCompiler compiles these operations to assembly. To implement these operations in Warp, I would add code to CodeGenerator.cpp (the module that compiles LIR to assembly) similar to what is currently in the BaselineCacheIRCompiler.
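
To make option 1 more concrete, here is a rough, hypothetical sketch of the shape a new MIR node might take, modeled on how other unary instructions are declared in MIR.h. This is not actual SpiderMonkey code; the class name, type policy, and result type are illustrative guesses:

// Hypothetical sketch only, not real SpiderMonkey code. It assumes the
// INSTRUCTION_HEADER / TRIVIAL_NEW_WRAPPERS declaration pattern used by
// existing MIR instructions; the policy and operand names are illustrative.
class MFinishRecord : public MUnaryInstruction, public SingleObjectPolicy::Data {
  explicit MFinishRecord(MDefinition* record)
      : MUnaryInstruction(classOpcode, record) {
    // The finished (now read-only) record is the result value.
    setResultType(MIRType::Object);
  }

 public:
  INSTRUCTION_HEADER(FinishRecord)
  TRIVIAL_NEW_WRAPPERS
  NAMED_OPERANDS((0, record))
};

A matching LIR node and a CodeGenerator visit method (emitting roughly the same MacroAssembler code that the BaselineCacheIRCompiler emits today) would then complete the pipeline.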

But, MIR includes some lower-level operations that aren’t present in CacheIR — most relevantly to me, operations for manipulating ObjectElements fields: Elements, SetInitializedLength, and so on. Using these operations (and adding a few more similar ones), I could translate FinishRecord to a series of simpler MIR operations, rather than adding it to MIR. To be more concrete, it would look something like:

(CacheIR)

FinishRecord r

== WarpCacheIRTranspiler ==>

(MIR)

e = Elements r
Freeze e
sortedKeys = LoadFixedSlot r SORTED_KEYS_SLOT
sortedKeysElements = Elements sortedKeys
CallShrinkCapacityToInitializedLength sortedKeys
SetNonWritableArrayLength sortedKeysElements
recordInitializedLength = InitializedLength r
SetArrayLength sortedKeysElements recordInitializedLength
CallSort sortedKeys

(I’m making up a concrete syntax for MIR.)

This would encapsulate the operations involved in finishing a record, primarily sorting the keys array and setting flags to ensure that the record and its sorted keys array are read-only. Several of these are already present in MIR, and the others would be easy to add, following existing operations as a template.

The problem with this approach is that FinishRecord in CacheIR would map onto multiple effectful MIR instructions, so I can’t just add a case for it in WarpCacheIRTranspiler.

I could also push the lower-level operations up into CacheIR, but I don’t know if that’s a good idea, since presumably there’s a reason why it hasn’t been done already.

To summarize, the options I’m considering are:

  1. Pass down InitRecord and FinishRecord through the pipeline by adding them to MIR and LIR
  2. Open up FinishRecord (InitRecord isn’t as complicated) in the translation to MIR, which might involve making FinishRecord non-cacheable altogether
  3. Open up FinishRecord in the translation to CacheIR, by adding more lower-level operations into CacheIR

I’ll have to do more research and check my assumptions before making a decision. A bigger question I’m wondering about is how to determine if it’s worth it to implement a particular operation in CacheIR at all; maybe I’m going about things the wrong way by adding the record/tuple opcodes into CacheIR right away, and instead I should just be implementing code generation and defer anything else until benchmarks exist?

by Tim Chevalier at January 20, 2022 07:31 AM

January 18, 2022

Tim Chevalier

Tuple prototype methods ☑️

It’s been a little while. But since my last post, Nicolò’s patches implementing records and tuples in SpiderMonkey landed, which means that I was able to submit my patches implementing the prototype methods for the Tuple type. That code is awaiting review.

Some initial review comments that I received focused on concerns about changing existing code that implements Arrays. Immutable and mutable sequences support a lot of the same operations, so it made sense to re-use existing Array code in SpiderMonkey rather than reinventing the wheel. Section 8.2.3 of the Records and Tuples proposal lists the tuple prototype methods and, for many of them, specifies that the behavior should be the same as what's already specified for the equivalent Array method.

In some cases, code reuse required changing existing code. Built-in JavaScript types can have their methods implemented in SpiderMonkey in two different ways: self-hosted (implemented in JavaScript as a library) or native (implemented inside the compiler, in C++). I implemented most of the tuple methods as self-hosted methods, which can easily call existing self-hosted methods for arrays. In some cases, I thought a native implementation would be more efficient: for example, for the toReversed() method, which returns a reversed copy of a given tuple. (Since tuples are immutable, it can't reverse in-place.)

There's already an efficient C++ implementation of reverse for arrays, and because of how tuples are currently implemented in SpiderMonkey (using the NativeObject C++ type, which is the same underlying type that represents arrays and most other built-in types), making it work for tuples as well just required changing the method's precondition, not the implementation. However, the maintainers were reluctant to allow any changes to the built-in array methods, so I replaced that with a straightforward self-hosted implementation of toReversed(). It seems a shame not to take advantage of existing optimized code, but perhaps I was falling prey to premature optimization: we don't have any performance benchmarks for code that uses tuples yet, and without profiling, it's impossible to know what will actually be performance-critical.

While I await further review, I’ll be learning about WarpMonkey, the next tier of optimizing JIT in SpiderMonkey. The work I described in my previous posts was all about implementing records and tuples in the baseline compiler, and that work is finished (I’ll hold off on submitting it for review until after the prototype methods code has been accepted). I expect this to be where the real fun is!

by Tim Chevalier at January 18, 2022 12:29 AM

January 17, 2022

Víctor Jáquez

Digging further into Flatpak with NVIDIA

As you may know, the development environment used by WebKitGTK and WPE is based on Flatpak. Hacking software within Flatpak feels like teleoperating a rover on Mars, since I have to go through Flatpak commands to execute the commands I actually want to execute. The learning curve is steeper, but in exchange we get a common development environment.

I started to work on another project that requires an NVIDIA GPU, without stopping my work on WebKitGTK/WPE. So I needed to use the card within Flatpak, and it's well known that, currently, that setup is not available out-of-the-box. Furthermore, I have to use a very specific version of the graphics card driver for Vulkan.

This is the story of how I made it work.

My main references are, of course, the blog post by my colleague TingPing, Using host Nvidia driver with Flatpak, and Flatpak's NVIDIA GL runtime platform.

As TingPing explained, Flatpak does not use host libraries; that's why it may need runtimes and extensions for specific hardware setups, providing the user-space libraries, such as the NVIDIA GL platform runtime. And those libraries must have the same version as the driver running in the kernel.

The NVIDIA GL platform extension is a small project which generates Flatpak runtimes for every public NVIDIA driver. The interesting part is that those runtimes are not created at build time, but at install time. When the user installs the runtime, a driver blob is downloaded from NVIDIA servers (see --extra-data in flatpak build-finish for reference), and a small program is executed, which extracts the embedded tarball from the blob and, from it, extracts the required libraries. In a few words, initially the runtime is composed only of a definition of the file to download, and the small program that populates the Flatpak filesystem at install time.

The trick here, which took me a while to realize, is that this small program has to be statically compiled, since it has to be executed regardless of the available runtime.

This little program uses libarchive to extract the libraries from NVIDIA's tarball, but libarchive is not available statically in any Flatpak SDK. Furthermore, our use of libarchive depends on libz and liblzma, which must be statically compiled as well. Gladly, there's one very old, obsolete version of the freedesktop SDK which offers static versions of libz and liblzma: 1.6. And that's why org.freedesktop.Platform.GL.nvidia demands that specific old version of the SDK. So the manifest of the extension basically contains the static compilation of libarchive and of the soon-to-be apply_extra program.
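
To give an idea of what such an apply_extra helper does, here is a minimal sketch (my own simplification, not the actual program) of extracting a tarball with libarchive; in reality the program first pulls the embedded tarball out of the downloaded .run blob, and the file name and flags below are just placeholders:

// Rough sketch, not the actual apply_extra source: extract a tarball with
// libarchive into the current directory (the extension's install prefix).
#include <archive.h>
#include <archive_entry.h>
#include <cstdio>

int main() {
  struct archive* a = archive_read_new();
  archive_read_support_filter_all(a);   // gzip/xz compressed tarballs
  archive_read_support_format_tar(a);

  if (archive_read_open_filename(a, "nvidia-driver.tar.xz", 16384) != ARCHIVE_OK) {
    std::fprintf(stderr, "open failed: %s\n", archive_error_string(a));
    return 1;
  }

  struct archive_entry* entry;
  while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
    // Extract every entry, keeping permissions.
    if (archive_read_extract(a, entry, ARCHIVE_EXTRACT_PERM) != ARCHIVE_OK) {
      std::fprintf(stderr, "extract failed: %s\n", archive_error_string(a));
      return 1;
    }
  }

  archive_read_free(a);
  return 0;
}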

Update: There’s a merge request to use current freedesktop SDK 21.08, which, basically, builds statically libz and liblzma, besides libarchive.

I needed to modify the org.freedesktop.Platform.GL.nvidia sources a bit, since by default they consist of a big loop that downloads, hashes, templates a JSON manifest, and builds, for every supported driver. But as my case is just one custom driver, I didn't want to waste time in that loop. The hack to achieve that is fairly simple:

diff --git a/versions.sh b/versions.sh
index 8b72664..86686c0 100755
--- a/versions.sh
+++ b/versions.sh
@@ -15,4 +15,5 @@ TESLA_VERSIONS="450.142.00 450.119.04 450.51.06 450.51.05 440.118.02 440.95.01 4
# Probably never: https://ahayzen.com/direct/flathub_downloads_only_nvidia_runtimes.txt
UNSUPPORTED_VERSIONS="390.147 390.144 390.143 390.141 390.138 390.132 390.129 390.116 390.87 390.77 390.67 390.59 390.48 390.42 390.25 390.12 387.34 387.22 387.12 384.130 384.111 384.98 384.90 384.69 384.59 384.47 381.22 381.09 378.13 375.82 375.66 375.39 375.26 370.28 367.57"

-DRIVER_VERSIONS="$BETA_VERSIONS $VULKAN_VERSIONS $NEW_FEATURE_VERSIONS $PRODUCTION_VERSIONS $LEGACY_VERSIONS $TESLA_VERSIONS $UNSUPPORTED_VERSIONS"
+#DRIVER_VERSIONS="$BETA_VERSIONS $VULKAN_VERSIONS $NEW_FEATURE_VERSIONS $PRODUCTION_VERSIONS $LEGACY_VERSIONS $TESLA_VERSIONS $UNSUPPORTED_VERSIONS"
+DRIVER_VERSIONS="470.XX.XX"

But in order to make it work, it needs a file in the data/ directory with the specification of the file to download, in the format NAME:SHA256:DOWNLOAD-SIZE:INSTALL-SIZE:URL.

--- /dev/null
+++ b/data/nvidia-470.XX.XX-x86_64.data
@@ -0,0 +1 @@
+:34...checksum-sha264...:123456789::http://compu.home.arpa/NVIDIA/NVIDIA-Linux-x86_64-470.XX.XX.run

The last parameter is the URL from which the driver will be downloaded. In my case it's a local server, to ease testing.

Long story short, the commands to execute are:

To setup the building environment:

$ flatpak install org.freedesktop.Sdk//1.6 org.freedesktop.Platform//1.6

To build the flatpak repository and package:

$ make

The command will output a repo directory inside the current one. That's where the generated flatpak package is stored.

To install the local repository and the extension:

$ flatpak --user remote-add --no-gpg-verify nvidia-local repo
$ flatpak -v install nvidia-local org.freedesktop.Platform.GL.nvidia-470-XX-XX

To remove the obsolete SDK and platform once built:

$ flatpak uninstall org.freedesktop.Sdk//1.6 org.freedesktop.Platform//1.6

To remove the local repository and the extension if something went wrong:

$ flatpak -v uninstall org.freedesktop.Platform.GL.nvidia-470-62-15
$ flatpak --user remote-delete nvidia-local

One way to verify that the libraries are installed correctly, and that they match the driver running in the host's kernel, is to install and run GreenWithEnvy:

$ flatpak install com.leinardi.gwe
$ flatpak run com.leinardi.gwe

If you want to install the driver in your WebKit development environment, you just need to set the environment variable FLATPAK_USER_DIR:

$ FLATPAK_USER_DIR=~/WebKit/WebKitBuild/UserFlatpak flatpak --user remote-add --no-gpg-verify nvidia-local repo
$ FLATPAK_USER_DIR=~/WebKit/WebKitBuild/UserFlatpak flatpak -v install nvidia-local org.freedesktop.Platform.GL.nvidia-470-XX-XX

by vjaquez at January 17, 2022 12:46 PM

January 14, 2022

Brian Kardell

What even is a web browser?

A number of things (some of which I will share later this month) have me thinking about this question. It probably seems silly at first, but it's a surprisingly interesting - and maybe important - question.

The other day, Chris Coyier wrote a post called What is Chromium without Chrome on top that hit on some things I've been thinking about/discussing recently too, so I thought I'd share a bit more.

To the vast majority of people, a web browser is - you know, "the button for the internet". It's that icon that you click to somehow navigate the web. I mean, what could be simpler?

Within the web community, we break this down pretty commonly into "engines" and "browsers". The common description of which is roughly something like...

The browser provides the window(s) and interface (the 'chrome') around the stuff inside. The actual rendering of the website inside the window is handled by the rendering engine.

However, this is actually kind of incomplete, and quite a lot of interesting and important conversations and debates require more. Sadly, as you dig into it - things can get a little overloaded and confusing.

The un-browser

Most browsers can be launched, from the terminal, in a "full screen" mode. Many can be launched in a true "kiosk" mode, even. In both cases, with none of the typical "browser" user interface parts at all. What if someone released a product that only did that.... Is that still a browser?

Given a good start page (any popular search engine, for example), the web would still work pretty darn well. If there is some fundamental need for something (going back in history, for example, seems important), providing it via a typical window chrome with a URL bar and so on isn't the only possible way to provide that.

If it isn't a browser - what is it? Perhaps the closest answer is that it is a WebView.

Web standards, platform neutrality and good rendering are super useful to just about any program. Many "regular programs" from Sublime Text to Slack contain some kind of WebView to do all of that. WebViews (or any of the few things like it) are often called "embedded browsers".

But... is a WebView really a browser? Hmm... Tricky.

No True Scotsman

You might have had that experience where you are in some "regular program", clicking around, reading - and everything is pretty much fine. And then you click a link to read something and are asked to log in. But wait... you are logged in, in your browser! That kind of stuff is painful for users, and some people say that that is because it isn't a "real browser". However, this is also what happens if you have more than one "real browser" installed. An important distinction here, perhaps, is the concept of a "default browser".

But now we get into a very uncanny valley and a lot of interesting challenges and further distinctions.

If you download Firefox, launch it and don't make it your default browser, clicking a link from a page within Firefox still opens in Firefox. You, as a user, chose to browse with Firefox right now.

That is sort of what is happening when you use something with an embedded browser too. You've chosen to browse with, say, Facebook Messenger - and, is it a browser? Kinda?!

We really haven't entirely sorted this out.

One common technique when making a "real program" with a WebView is to just not have it handle HTTP requests at all. All of the "program stuff" uses some local, non-HTTP scheme. When a user clicks a regular HTTP-based hyperlink, then, most GUI operating systems receive the notification of an effort to open an HTTP URL. To deal with this, they launch the program registered as the default handler for that - the default browser.

Are you a browser? If you are, you have to tell me.

In other words, applications built like this have to decide: "should you be handling this?" and provide an answer to the question "are you really a browser?".

Nearly all "regular programs" built with WebViews don't actually claim to be a browser, and don't offer the ability to be registered as the default browser. The first question though is often stickier - especially on mobile.

"Should you be handling this?" is often a kind of tricky question. They are after all, mostly capable of rendering anything your "real browser" can and there are valid reasons to open their own content, like documentation, that actually lives on the web, at least. The perceived cost of leaving the experience on mobile is also much higher. So, it gets tricky.

But operating systems vary too, and they change over time, learning lessons and adapting along the way. It's kind of hard to pin down.

Default WebView

Since WebViews are super useful for all apps, every GUI OS/desktop today also ships with a default WebView that programs can use. That makes a lot of sense, because there's no point in everyone independently packing up and managing their own - and this way you could share logins and stuff.

It makes even more sense when you realize how intertwined the OS and the rendering engine are.

In fact, "engine" is a better metaphor than the way we usually describe it. An engine, all by itself, can't do anything. It only has potential. It needs to be hooked up in all directions: Something to provide it fuel, something to transfer its power into useful motion, and something to govern that with input.

Web engines aren't different in that respect - they provide consistent abstractions for features but need to be connected in all directions to the OS. Each of the engines have different philosophical/design differences about what needs to be hooked up, or how.

Form controls are one example. Some controls are always provided by the OS, others, in some engines are provided by the engine itself. WebSpeech is another. WebKit, for example supports WebSpeech - but some ports of WebKit don't. Chrome is similar. Speech itself can be wired up in many ways. In some browsers this wires to a service on the Web. In some browsers this uses a local speech engine packed up with the browser. In some it wires up to the OS level speech engine. In some browsers it uses one as a preference and the other as a backup, and so on.

So, WebViews (or evolutions of them) are an even higher level abstraction. They're more like a kit car which is pretty much drivable - it just doesn't look like much.

Whose job is it anyways?

There are a lot of choices that go into all of this. Think about it: There is no keyboard on some devices. Whose job is it to present a virtual keyboard? The "easy" answer is that the OS itself fundamentally needs that concept so it can offer it as a service. That's pretty cool until the site decides you need animated gifs or emojis or a particular keyboard layout, etc. Actually, some apps kind of like having control because it means that all of their users have a more similar experience, regardless of mobile OS.

The OS uncanny valley

So... OSes pretty much always also ship a default browser, and you can change what the default browser is. Lots of apps use WebViews, so the OSes provide a default WebView that they can use instead of shipping their own.

At some level, it would make sense if the default browser itself used the default OS level WebView. That would make our initial statement almost true: The browser handles the stuff on the outside, and the WebView handles the stuff on the inside.

Wait... But... Do they work that way?!

¯\_(ツ)_/¯

Recently Duck Duck Go announced that they would do something just like this - depend on the OS level default view. That is... kind of an interesting take.

The ecosystem interplay

There is a really interesting aspect to all of this: The Duck Duck Go version kind of leans into what others are upset about in iOS. Basically, "That's not the choice of a browser - because there is only one rendering engine allowed".

But... is it? I mean, everyday humans don't really choose engines. They choose something higher level: A device. They use what that device makes available to them unless it gets real bad. IE won the first browser wars because they were the default, and they more than met expectations. There was no need to go elsewhere - you had, at the time, the best browser. A browser with a synergy with the OS too.

People only began looking beyond IE when lots of stuff just stopped working. Is it great that there was a competitor who could step in on the same OS? Sure, definitely. But a lot of things have changed too.

Contrast this with Android today. The default browser has the lion's share for lots of reasons - some of them are pretty good ones even. Most of what isn't that is still Chromium based. Gecko is technically "there" but it exists only at a non-trivial cost which ultimately hasn't amounted to much in terms of users. I guess this is important for Mozilla because they don't have a default OS, but you'll note that there's not a WebKit option there (yet), and there wasn't an Edge option when that was its own engine either - until sharing the burden through Chromium made that a lot easier.

If, tomorrow, Google disbanded the Chrome team, one has to imagine that the natural successor on Android would still be Chromium based.

...If there is a successor.

Chromium is bordering on 30 million lines of code - it takes an army of unparalleled size to maintain, 80% of which today is Google committers. Only a company of astonishing size and means could step in and fill the gap just to take it over and keep the lights on - Microsoft, probably, is our best hope - but I wrote lots of words around those kinds of worries and possibilities in my post Webrise.

From another angle ...

Of course, there are those who claim that WebKit is so bad that people would choose differently on iOS if they could. I am biased, of course, but I don't see even anecdotal data suggesting that the general public actually agrees. I know very few iOS users who aren't engineers who complain about Safari - and I know very few engineers who won't admit that WebKit has made huge strides in the last few years.

If there's a random person I know who has installed another browser on iOS, it's not been about shortcomings, but everything to do with seamless integration with other Google products they own, or the payment model on Brave, or something.

What's interesting to me about all of this is that one can think about it from another angle: While it's easy to agree with diversity of choice - keeping a competitive market of actual choices around in the first place is maybe even trickier with the current model we've built. It's interesting to think about how differently things might have played out if everything had followed the "browser is built on the OS level default WebView" model, where those WebViews were spaces of great collaboration and flexible architectures allowing differentiation.

The cost of making the standard stuff would go way down - the cost of providing the actual products would go way down. As long as there is a variety of OSes (there definitely will continue to be, with so many devices and uses floating out there) there would be a diversity of engines, and competition on more or less equal footing in the space of the stuff around them.

I'm not suggesting that's the way it should be, but -- it's kind of interesting to think about.

January 14, 2022 05:00 AM

January 07, 2022

Clayton Craft

Network booting an aarch64 SBC with u-boot and iPXE

I recently started trying to figure out network booting for aarch64 single board computers (SBC), such as the Raspberry Pi, for a new CI I've been helping out with at my Igalia day job. For one reason or another, I never participated in the Raspberry Pi "fad" (maybe because they use Broadcom chips, which are [or were] notoriously unfriendly on Linux? I don't recall why... But I digress...)

But, I do have a quite capable aarch64 SBC just laying around, literally collecting dust... the Purism Librem 5 DevKit!

While not exactly a Raspberry Pi, I believe many of the concepts pre-Linux boot are similar and this should serve as a decent replacement until the Great Chip Shortage of 2020-???? is over and those things are available for purchase again.

The general idea is that u-boot will execute iPXE, which will be responsible for establishing a network connection and booting whatever the DHCP server on the other end tells it to boot. The end goal is to have it load/boot the Linux kernel and an initfs based on boot2container.

This is the first in a series of posts to get there. The focus of this initial post is building/setting up iPXE, the DHCP server, and doing a test boot from u-boot.

Building iPXE and configuring the devkit

The first step is to build iPXE, since I want to embed a script for it to run automatically on boot. I did the compilation on the devkit, since iPXE is a relatively small program and it didn't take too long to compile on this CPU:

$ git clone git://git.ipxe.org/ipxe.git
$ cd ipxe/src

## needed so that ipxe doesn't lock up if you want to C-b to enter the cmdline
$ cat << EOF > config/local/nap.h
#undef NAP_EFIX86
#undef NAP_EFIARM
#define NAP_NULL
EOF

## and create a simple ipxe script that will be executed when ipxe runs:
$ cat << EOF > ipxescript
#!ipxe

:retry_dhcp
echo Acquiring an IP
dhcp || goto retry_dhcp

echo Got the IP: $${netX/ip} / $${netX/netmask}

:retry_boot
echo Booting from DHCP...
autoboot || goto retry_boot
EOF

## build/install:
$ make bin-arm64-efi/snp.efi -j4 EMBED=ipxescript
$ doas cp bin-arm64-efi/snp.efi /boot/ipxe.efi

Loading things in u-boot is quite tedious, since you have to specify memory addresses to load files into, and the correct *load command to read files into memory. I already have an existing install of postmarketOS on my devkit, so I used the /boot partition (formatted as ext2) as a home for the iPXE binary. I created the following U-boot helper script for loading iPXE, since typing all of these in becomes tiresome very quickly:

## quote 'EOF' so the u-boot variables below (e.g. $kernel_addr_r) are not expanded by the host shell
$ cat << 'EOF' > /tmp/ipxe
echo ===== Loading iPXE =====
ext2load mmc 0:1 $kernel_addr_r ipxe.efi
ext2load mmc 0:1 $fdt_addr_r imx8mq-librem5-devkit.dtb
fdt addr $fdt_addr_r
fdt resize
echo ===== Running iPXE =====
bootefi $kernel_addr_r $fdt_addr_r
EOF

I'm not entirely sure if we need to specify/load the dtb, but it doesn't seem to hurt! Also note that this u-boot script is using bootefi to load the iPXE app. That'll be important later on when we try to boot a kernel.

The u-boot script must be compiled before u-boot can execute it:

$ mkimage -A arm64 -C none -O linux -T script -d /tmp/ipxe /tmp/ipxe.scr
$ doas cp /tmp/ipxe.scr /boot

In that last step, I copy it to /boot since I'm performing these steps on the devkit, and /boot is the ext2 partition I'll run iPXE from when booted into u-boot.

Configuring dnsmasq for BOOTP/DHCP

Now that all the necessary pieces are set up/installed on the devkit, the last step is to run dnsmasq on a host to provide BOOTP service to the devkit. This should be good enough for our purposes:

$ export workdir=/path/to/some/dir

## set to the network interface that is on the same physical LAN as the devkit that dnsmasq will bind to
$ export iface=eth0

## needs to run as root since it binds to privileged ports (< 1024)
$ doas dnsmasq \
    --port=0 \
    --dhcp-hostsfile="$workdir"/hosts.dhcp \
    --dhcp-optsfile="$workdir"/options.dhcp \
    --dhcp-leasefile="$workdir"/dnsmasq.leases \
    --dhcp-boot=grubnetaa64.efi \
    --dhcp-range=10.42.0.10,10.42.0.100 \
    --dhcp-script=/bin/echo \
    --enable-tftp="$iface" \
    --tftp-root="$workdir"/tftp \
    --log-queries=extra \
    --conf-file=/dev/null \
    --log-debug \
    --no-daemon \
    --interface="$iface"

Note that the boot option sent to the client is grubnetaa64.efi. This is a binary I pulled from some Debian build of grub2 for aarch64, since it was annoying to have to build grub myself just for a quick smoke test.

Grub isn't necessary for booting the Linux kernel, but it is a small application that serves as a good test to make sure that u-boot, iPXE, and dnsmasq are happy.

If you're like me and run firewalls everywhere, you'll need to punch some holes in it for bootp / tftp to work.

Once dnsmasq is started, the devkit is reset and the u-boot script to run iPXE is executed:

Hit any key to stop autoboot:  0
u-boot=> env set boot_scripts ipxe.scr
u-boot=> boot
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
Found U-Boot script /ipxe.scr
294 bytes read in 1 ms (287.1 KiB/s)

In Part 2, I'll cover booting the Linux kernel... Stay tuned!

January 07, 2022 12:00 AM

January 03, 2022

Danylo Piliaiev

Graphics Flight Recorder - unknown but handy tool to debug GPU hangs

It appears that Google created a handy tool that helps find the command which causes a GPU hang/crash. It is called Graphics Flight Recorder (GFR) and was open-sourced a year ago but didn't receive any attention. From the readme:

The Graphics Flight Recorder (GFR) is a Vulkan layer to help trackdown and identify the cause of GPU hangs and crashes. It works by instrumenting command buffers with completion tags. When an error is detected a log file containing incomplete command buffers is written. Often the last complete or incomplete commands are responsible for the crash.

It requires VK_AMD_buffer_marker support; however, this extension is rather trivial to implement - I only had to copy-paste the code from our vkCmdSetEvent implementation and that was it. Note that, at the moment of writing, GFR unconditionally uses VK_AMD_device_coherent_memory, which could be manually patched out for it to run on other GPUs.
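
To give an idea of how the completion tags work, here is a simplified sketch (my own illustration, not GFR's actual code) of the VK_AMD_buffer_marker idea: a marker is written into a host-visible buffer around each command of interest, and when a hang happens, reading that buffer back tells you how far the GPU actually got. The handles, offsets and pipeline stages below are placeholders:

// Simplified illustration of the completion-tag idea behind GFR, not its real
// implementation. cmd, pipeline and markerBuffer are assumed to exist already.
#include <vulkan/vulkan.h>

void recordTaggedDispatch(VkCommandBuffer cmd, VkPipeline pipeline,
                          VkBuffer markerBuffer, uint32_t commandIndex) {
  // Marker written when the GPU starts processing this point in the stream.
  vkCmdWriteBufferMarkerAMD(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, markerBuffer,
                            commandIndex * 2 * sizeof(uint32_t), commandIndex);

  vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
  vkCmdDispatch(cmd, 5, 1, 1);

  // Marker written only after all previous work has completed; if the dispatch
  // hangs, this second tag never shows up when the buffer is read back.
  vkCmdWriteBufferMarkerAMD(cmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, markerBuffer,
                            (commandIndex * 2 + 1) * sizeof(uint32_t), commandIndex);
}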

GFR already helped me to fix hangs in “Alien: Isolation” and “Digital Combat Simulator”. In both cases the hang was in a compute shader and the output from GFR looked like:

...
- # Command:
        id: 6/9
        markerValue: 0x000A0006
        name: vkCmdBindPipeline
        state: [SUBMITTED_EXECUTION_COMPLETE]
        parameters:
          - # parameter:
            name: commandBuffer
            value: 0x000000558CFD2A10
          - # parameter:
            name: pipelineBindPoint
            value: 1
          - # parameter:
            name: pipeline
            value: 0x000000558D3D6750
      - # Command:
        id: 6/9
        message: '>>>>>>>>>>>>>> LAST COMPLETE COMMAND <<<<<<<<<<<<<<'
      - # Command:
        id: 7/9
        markerValue: 0x000A0007
        name: vkCmdDispatch
        state: [SUBMITTED_EXECUTION_INCOMPLETE]
        parameters:
          - # parameter:
            name: commandBuffer
            value: 0x000000558CFD2A10
          - # parameter:
            name: groupCountX
            value: 5
          - # parameter:
            name: groupCountY
            value: 1
          - # parameter:
            name: groupCountZ
            value: 1
        internalState:
          pipeline:
            vkHandle: 0x000000558D3D6750
            bindPoint: compute
            shaderInfos:
              - # shaderInfo:
                stage: cs
                module: (0x000000558F82B2A0)
                entry: "main"
          descriptorSets:
            - # descriptorSet:
              index: 0
              set: 0x000000558E498728
      - # Command:
        id: 8/9
        markerValue: 0x000A0008
        name: vkCmdPipelineBarrier
        state: [SUBMITTED_EXECUTION_NOT_STARTED]
...

After confirming that the corresponding vkCmdDispatch was indeed the call which hangs, in both cases I made an Amber test which fully simulated the call. For a compute shader this is relatively easy to do, since all you need is to save the decompiled shader and the buffers it uses. Luckily, in both cases these Amber tests reproduced the hangs.

With standalone reproducers, the problems were much easier to debug, and fixes were made shortly: MR#14044 for “Alien: Isolation” and MR#14110 for “Digital Combat Simulator”.

Unfortunately this tool is not a panacea:

  • It likely would fail to help with unrecoverable hangs where it would be impossible to read the completion tags back.
  • Or when the mere addition of the tags could “fix” the issue which may happen with synchronization issues.
  • If draw/dispatch calls run in parallel on the GPU, writing tags may force them to execute sequentially or to be imprecise.

Anyway, it’s easy to use so you should give it a try.

by Danylo Piliaiev at January 03, 2022 10:00 PM

January 01, 2022

Clayton Craft

Using ASNs and nftables to block connections

Blocking Facebook, and similarly-toxic sites/services, is a common theme amongst those who value privacy. Facebook goes to great lengths to track everyone, regardless of whether or not they have an account or use anything they "generously" offer to the public. Previously I had a long, long list of domains that Facebook owned, and set up unbound (the DNS resolver I run) to deny lookups to those domains. This was a classic game of cat & mouse, as Facebook would frequently acquire new domains and it was basically impossible to keep up.

Enter autonomous system numbers (ASN), which are unique identifiers that the IANA assigns to owners of public IP blocks. Using an ASN, it's possible to look up every IP "owned" by the thing the ASN was given to. Once you have every IP, it's trivial to generate a firewall rule (using nftables, at least) to block connections to them. You can evidently even get ASNs for entire ISPs (and therefore, effectively, [some] entire countries!)

I have done this in the script below. ASNs can be set to include others as well, but I have left the two ASNs for Facebook as a convenience to the reader :D

#!/bin/sh

set -euf

# facebook ASNs
ASNs="AS32934 AS11917"

get_asn_ips() {
        asn="$1"
        whois -h whois.radb.net -- -i origin "$asn" |  awk '/route:/ {printf("\t\t%s,", $2)}'
}

asn_ips=

for a in $ASNs; do
        asn_ips=$(printf "%s%s" $asn_ips $(get_asn_ips $a))
done

cat  <<EOF > /etc/nftables.d/50-nft_asn_block.nft
#!/usr/sbin/nft -f
table inet filter {
    set asn_blocked_addresses {
        type ipv4_addr
        flags interval
        elements = {
            $asn_ips
        }
        auto-merge
    }
    chain output {
        meta nfproto ipv4 ip daddr @asn_blocked_addresses log prefix "BLOCKED BY NFT_ASN_BLOCK: " drop;
    }
}
EOF

I have this set to run as a cron job every week, which might be too often (don't forget to reload nftables), but it works fine ¯\_(ツ)_/¯

There are various ways to find an ASN, some searches allow you to specify the company/organization name, but the most common seem to do lookups based on a given IP address. I won't link to any here, because it's easy to find them using your favorite search engine.

January 01, 2022 12:00 AM

December 30, 2021

Clayton Craft

Timing performance of functions in shell scripts

I recently started looking at some shell scripts that run fine on big, fast, powerful systems (e.g. x86 laptops/desktops), but quite slowly on small, slow devices. Using time to run the script (or the amazing hyperfine) works OK if you're timing the entire script execution, but what do you do if you want to time individual functions within the script?

Well without getting too fancy, I came up with the following, which is capable of timing far below 1 second:

#!/bin/sh

foo() {
    # do real work
    sleep 4
}

start=$(date +%s.%N)

foo

end=$(date +%s.%N)
echo "foo: $( echo "$end - $start" | bc -l ) seconds..."

It's not the most accurate thing in the world, and you'll pay some penalty for running date in a sub shell, twice, but it works well for blaming slowdowns in a shell script.

$ ./run.sh
foo: 4.037054796 seconds...

December 30, 2021 12:00 AM

December 28, 2021

Manuel Rego

A story on web engines interoperability related to wavy text decorations

Some weeks ago I wrote a Twitter thread explaining the process of fixing a Chromium bug related to wavy text decorations. At first it was just a patch in Chromium, but we also fixed the same issue in WebKit, which unveiled a mistake in the initial Chromium fix (a win-win situation).

This blog post is, in a way, a story about implementing web platform features, one which highlights the importance of interoperability and how investigating and fixing bugs in different web engines usually leads to gains for the whole ecosystem.

Some background

Let’s start from the beginning. Igalia (as part of our collaboration with Bloomberg) is working on adding support for ::spelling-error & ::grammar-error highlight pseudo-elements in Chromium.

My colleague Delan Azabani has been leading this effort. If you want more details about this work you can read her two blog posts. Also don’t miss the chance to enjoy her amazing talk from last BlinkOn 15, this talk gives lots of details about how highlight pseudos work, and includes some cool animations that help to understand this complex topic.

Lately I’ve been also helping with some related tasks here and there. Next I’m going to talk about one of them.

Spelling and grammar error markers

As you probably know, spelling and grammar error markers use wavy underlines on some platforms, like Linux and Windows, though not on all of them: they use dotted underlines on Mac. In Chromium they're painted on a separate codepath, totally independent of how CSS text decorations are painted. You can easily spot the difference between a "native" spelling error (left) and an element with text-decoration: wavy red underline (right) in the next picture.

Spelling errors in Chromium Linux (left) vs wavy red underlines (right)

As part of our work around the ::spelling|grammar-error highlight pseudos, we plan to merge both codepaths and use the CSS one for painting the default spelling and grammar error markers in the future. This doesn't mean that they'll look the same; they will still render differently, so the user can tell a spelling marker apart from a wavy text decoration (as happens now). But they'll share the same code, so any improvement we make will apply to both of them.

There have been some bugs on each of them in the past, related to invalidation and overflow issues, and they had to be fixed in two places instead of just one. That’s why we’re looking into sharing the code, as its main job is to produce very similar things.

The issue we’re describing in the next section doesn’t happen on native spelling error markers, but as we plan to follow the CSS codepath, we had to fix it as a preliminary task getting things ready to move the spelling markers to use that codepath.

The issue

One problem with wavy text decorations in Chromium was that they sometimes don't cover the full length of the text. This is because Chromium only paints whole cycles of the wave, and thus falls short in some situations.

A simple example is a wavy underline (text-decoration: wavy green underline) on an "m" letter using a big font (see the picture below and how it doesn't cover the full length of the letter).

  <div style="font-size: 5em; text-decoration: wavy green underline;">m</div>

Green wavy underline doesn’t cover the full length of the letter “m” (Chromium)

Fixing the problem

This section goes into some implementation details about how this works on Chromium and how we fixed it.

To draw wavy text decorations, Chromium defines a vector path for a Bezier curve; that path is generated in TextDecorationInfo::PrepareWavyStrokePath(). The comment in that method is quite self-explanatory:

Comment from TextDecorationInfo::PrepareWavyStrokePath() explaining how the Bezier curve is defined

This method generates the path for wavy text decorations using the next loop:

    for (float x = x1; x + 2 * step <= x2;) {
      control_point1.set_x(x + step);
      control_point2.set_x(x + step);
      x += 2 * step;
      path.AddBezierCurveTo(control_point1, control_point2,
                            gfx::PointF(x, y_axis));
    }

As you can see, it only uses whole cycles of the wave (2 * step), and it never splits them into smaller chunks. If we're going to end up further away than the text width, we don't add that wave to the path (x + 2 * step <= x2). This leads to the wrong behavior we saw in the example of the "m" letter above, where the text decoration falls short.

To prevent this problem, the code was using the method AdjustStepToDecorationLength(), which was expected to adjust the length of a whole wave. If that method had worked properly, we would always cover the full text width by adjusting the size of the waves. However, there were two different problems with that method:

  • On one hand, that method adjusted the step, but we were always generating whole waves (2 * step), so we would need to adjust the whole length of the wave instead.
  • On the other hand, the method had a bug, as it was changing the step when that was not actually needed. For example, if you pass a total length of 40px and a step of 10px, this method was adjusting the step to 10.75px, which makes no sense.

Digging a little bit into the repository’s history, we found out that this method had been around since 2013, thus it was present in both Blink and WebKit. As it had a bunch of issues, and our proposed fix cuts the waves at any point, we decided there was no need to try to adjust their size anymore, so we got rid of this method.

The solution we used for the length issue requires two main changes:

  • First we generate two extra waves before and after the text width:
  // We paint the wave before and after the text line (to cover the whole length
  // of the line) and then we clip it at
  // AppliedDecorationPainter::StrokeWavyTextDecoration().
  // Offset the start point, so the beizer curve starts before the current line,
  // that way we can clip it exactly the same way in both ends.
  FloatPoint p1(start_point + FloatPoint(-2 * step, wave_offset));
  // Increase the width including the previous offset, plus an extra wave to be
  // painted after the line.
  FloatPoint p2(start_point + FloatPoint(width_ + 4 * step, wave_offset));
  
  • Then clip the path so it’s no longer than the text width. For which GraphicsContextStateSaver was really useful to just clip things related to the line that is currently being painted (for example in cases where you have both underline and overline text decorations).

Video showing the solution described above

The reviewers liked the idea and the patch landed in Chromium 97.0.4692 with some internal tests (not using WPT tests as how wavy lines are painted is not defined per spec and varies between implementations).

To finish this section, below there is a screenshot of the “m” with wavy green underline after this patch.

Green wavy underline covering the full length of the letter “m” (Chromium)

WebKit & WPT test

While looking into the history of the AdjustStepToDecorationLength() method we ended up reviewing some old WebKit patches, and we realized that the code for wavy text decorations in WebKit is still very similar to Chromium’s, and that this very same issue was also present in WebKit. For that reason we decided to fix this problem in WebKit too, with the same approach as the Chromium patch.

Green wavy underline doesn’t cover the full length of the letter “m” (WebKit)

The cool thing is that during the patch review Myles Maxfield suggested creating a mismatch reference test.

Just a quick aside: reference tests (reftests) usually compare a screenshot of the test with a screenshot of the reference file, to see whether the rendered output of both files matches exactly. But sometimes browsers do a different thing, called a mismatch reftest, which compares a test with a reference and checks that they’re actually different.

The idea here was to write a test that has a wavy text decoration but hides most of the content with some element on top of it, showing just the bottom right corner of the decoration. We mismatch against a blank page, because there should be something painted there if the wavy text decoration covers the whole line.

So we wrote WPT tests, which we can share between implementations, to check that this was working as expected. And while working on that test we discovered an issue in the initial Chromium fix, as the wavy underline was slightly misplaced to the left. More about that later.

On top of that there was another issue: WPT mismatch tests were not supported by the WebKit test importer, so we also added support for that in order to be able to use these new tests in the final WebKit patch fixing the wavy text decoration length, which is included in Safari Technology Preview 136.

Again let’s finish the section with a screenshot of the “m” with wavy green underline after the WebKit patch.

Green wavy underline covering the full length of the letter “m” (WebKit)

Round trip

As mentioned in the previous section, thanks to porting the patch to WebKit and working on a WPT test we found out a mistake on the first Chromium fix.

So we’re back in Chromium, where we were clipping the wavy text decoration with the wrong offset, so it looked like it was shifted a little bit to the left (especially when using big fonts). I’m repeating here the image from the initial Chromium fix, adding a grey background so it’s easier to notice the problem and compare with the final fix. There you can see that the wavy underline starts further to the left than expected, and ends before the “m” letter does.

Green wavy underline covering the full length of the letter “m” (Chromium). Initial fix

The patch to fix that was pretty simple, so we landed it in Chromium 98.0.4697. And this is the final output with the text decoration positioned in the proper place.

Green wavy underline covering the full length of the letter “m” (Chromium). Final fix

Other issues

In addition to improved rendering of static wavy text decorations, these fixes have some nice effects when animations are involved. See the following video showing an example with animations (stealing a letter-spacing example from Delan’s latest blog post); on the left you can see Chromium (buggy version on top, fixed one at the bottom) and on the right WebKit (again, buggy on top and fixed at the bottom).

Video showing the fix (top left: Chromium buggy, bottom left: Chromium fixed, top right: WebKit buggy, bottom right: WebKit fixed)

But, as usual, there’s still something else, in this case very much related to this topic. There’s the concept of the decorating box in the CSS Text Decoration spec, and that hasn’t been implemented in Chromium or WebKit yet (though Firefox seems to do it right in most cases, except for dotted text decorations).

This issue is quite noticeable when you have different elements in the same line (like a <strong> element), or different font sizes in the same line. See the next video that shows this problem in Chromium (on the left) and Firefox (on the right, where only dotted text decorations have problems).

Video showing the problem related to decorating box (left: Chromium, right: Firefox)

This has been a problem in Chromium and WebKit forever; even native spelling and grammar error markers have the same issue (though it’s less noticeable, as they’re always painted at a small size). Even though this isn’t strictly a blocker for all this work, it is something we’re looking forward to getting fixed too.

Conclusion

The main takeaway from this blog post is how browser interoperability plays a key role in the implementation of web platform features, which is something we care deeply about at Igalia.

The fact that we fixed this issue in Chromium and WebKit at the same time helped to get more eyes looking into the same code, which is usually very beneficial.

We ended up not just fixing the issue in both implementations (Chromium and WebKit), but also adding new WPT tests that will be useful for any other implementation and will prevent regressions in the future. And as a positive side effect, WebKit added support for mismatch reference tests as part of this work.

Finally, thanks to all the people that helped to make this happen by providing ideas, feedback and reviewing the patches; particularly Delan Azabani (Igalia), Myles Maxfield (Apple) and Stephen Chenney (Google).

December 28, 2021 11:00 PM

December 20, 2021

Guilherme Piccoli

Booting upstream kernel on Inforce 6640

My first task in the Core team at Igalia was to boot a more recent kernel on the Inforce 6640 board, which was very interesting given my very limited experience working in the ARM64 world – also, the work aimed to benefit Igalia’s Graphics team in their Freedreno development. The board itself is pretty … Continue reading "Booting upstream kernel on Inforce 6640"

by gpiccoli at December 20, 2021 10:45 PM

December 17, 2021

Alexander Dunaev

Ozone: our way to the big change

Did you know that there is an Ozone layer inside Chromium? Well, Chromium is so huge, it has enough space for anything. Like the real ozone layer of planet Earth that protects life beneath from harmful radiation, the Ozone layer in Chromium shields the browser from the (sometimes unfriendly) environment. The purpose of Ozone is to hide the actual platform implementation and convert platform-specific entities such as UI events or windows into platform-agnostic ones. Unlike the ozone layer on planet Earth, which is above most things, Ozone in Chromium is somewhere below almost everything. Heck, many things in computers are upside down.

Ozone is old. Quite a few of its source files are dated 2014, but probably that is the time when parts of the design that already existed somewhere else were re-shaped as the new component that got this name: Ozone. At that time it was used only by ChromeOS—likely to provide some adapters for the variety of base systems that ChromeOS had to integrate with.

Igalia started dealing with Ozone while implementing the native support for Wayland in the desktop Chromium. The full story of the project is so long and complicated that it is hard to find a single source that would cover it all, but if you are interested, take a look at The pathway to Chromium on Wayland by Antonio Gomes and Jeongeun Kim, or at this presentation from BlinkOn 13 by Maksim Sisov and me. Here I will only mention the major milestones.

From compile time to run time

When the work began, the Linux port of Chromium only knew X11 as an implied part of the Linux environment. There was a USE_X11 macro that was defined in Linux builds, and it was used generously to guard Linux-specific logic in hundreds of places throughout the code base. Everything was static, defined at compile time, and not welcoming any change. Our task was to find a way to insert something totally new, and after trying a few approaches, it was agreed that using Ozone for that would be the best choice. After all, its main purpose is hiding the platform, and Wayland fits well the definition of “platform”, so why not?

There was one problem with that. Back then, Ozone was also enabled at compile time, and the only configuration it was enabled for was ChromeOS. For the desktop Chromium it was not even compiled; the source files were not included in the build. To bring Wayland support into Chromium in the form of an Ozone platform, we had to refactor the entire Linux implementation to make Ozone an integral part of Chromium, and then to convert the existing Linux implementation into one of the Ozone platforms. What is more, we needed to do all that following the standard development process in Chromium: committing our changes to the upstream Chromium repository, keeping tests passing, and introducing as little overhead as possible. What is even more, we had to keep the existing implementation working until the Ozone platform was fully functional.

When the plan was set, our partners in the Chromium community said, “even if we do all the work, no one will accept a change that big right away, so we need to be silent and approach the goal in small steps.” So we did, and it was silent until we sent the patch that enabled compilation of Ozone in Linux desktop builds. One of the reviewers said then, “I don’t think this is right.” We managed to convince them that it was right.

Long story short, we followed that plan from 2019, and by the end of 2020 we started to feel the finale coming. The new Ozone platform, X11, caught up rapidly with the legacy implementation. In our dev environments, we used Ozone/X11 routinely and noticed no difference from the legacy mode. To ship Ozone as the new default, the only thing we needed was to ensure that everyone else would not notice the change either.

How could we know that? Chromium has Finch, the built-in facility for doing the A/B testing on the real audience: users of Chrome. It also has another built-in facility for gathering various performance metrics on users’ hosts and aggregating them on the server side. Together they make it possible to compare different configurations. A new feature is first enabled for a small share of users in the developers release channel, then it is extended to a larger share, then the same is repeated in the beta channel, and finally it comes to the stable one. The performance metrics are analysed all the time during the rollout, so if any regressions are found, the experiment will be put on hold or stopped.

So to roll Ozone out gracefully and in a controlled way, we had to combine both the legacy and the Ozone paths in a single binary, so that Ozone could be enabled for the end user at run time during A/B testing. The Ozone mode would be the new “feature”, and switching the mode would be possible by setting the so-called feature flag. That was also part of the grand plan.

One year ago and today

At the beginning of 2021 we were actively working on completing the feature for A/B testing. We revised all Linux-specific code so that it could be used by both the legacy implementation and the Ozone platform. We fixed and enabled tests. We configured the infrastructure so that the continuous integration would be aware of Ozone. Finally, everything was ready.

The field trial of Ozone with the X11 platform started on April 30th and finished successfully by the end of August. Since then, Ozone has been the default path for Linux. We started to clean up the code immediately, and to date, the USE_X11 macro is history.

We still keep a few items of USE_X11 displayed in museums, but sooner or later the inventory committee may decide that they have no historical value—and throw them away.

Modularity has always been one of the core principles of Linux. From that perspective, migrating Chromium from its monolithic design to Ozone, which is naturally modular, was the right choice. The new architecture is much easier to extend, both for Chromium itself and for downstream projects. Check this video to see what it takes to implement a minimal Ozone platform—not much!

For more details on where we are now, see another post by my colleague Maksim.

What is next?

These days we are approaching the public release of Wayland, the second Ozone platform for desktop Chromium. It is not as simple as the demo I mentioned above—we have been working on it throughout all these years. Ozone has grown a lot thanks to our work on two real platforms and gained flexibility it did not have before; we extended it considerably to make that possible, and hopefully it will now stabilise a bit.

After the Wayland platform is finally released, we expect some support and maintenance work for a while. Wayland (I mean the compositor here) is itself evolving rapidly, and it already has a few distinct “flavours” tied to the major Linux desktop environments, which poses one more challenge to the modularity and flexibility of Chromium and other applications. We will keep an eye on that—and tell you the news.

by Alex at December 17, 2021 09:27 AM

Maksim Sisov

“Where” is Ozone now?


The “Ozone” abstraction.

Within Chromium, an abstraction known as “Ozone” has been in development for many years. It is an important part of Chromium’s design, which relies heavily on dependency inversion: it isolates high-level and low-level components, which communicate only through interfaces.

It is also designed with interface segregation in mind, which increases cohesion and keeps the code clean overall—something that is not easy to maintain given the extremely different requirements that different backends may have.
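To make the dependency-inversion idea concrete, here is a small hypothetical sketch (the class names are made up and are not Ozone’s real interfaces): high-level code depends only on an abstract platform interface, and each backend supplies its own implementation behind a factory.

#include <memory>
#include <string>

// High-level code depends only on this small interface.
class PlatformWindowSketch {
 public:
  virtual ~PlatformWindowSketch() = default;
  virtual void Show() = 0;
  virtual void SetTitle(const std::string& title) = 0;
};

// Each backend (X11, Wayland, ...) implements the interface, keeping its
// low-level details hidden behind it.
class WaylandWindowSketch : public PlatformWindowSketch {
 public:
  void Show() override { /* talk to the Wayland compositor */ }
  void SetTitle(const std::string& title) override { /* e.g. xdg_toplevel_set_title */ }
};

// Only the factory knows which backend is in use.
std::unique_ptr<PlatformWindowSketch> CreatePlatformWindowSketch() {
  return std::make_unique<WaylandWindowSketch>();
}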

Nowadays, Chromium on both Chrome OS and Linux is a heavy user of Ozone.


Where are we now?

We reached a major milestone this fall: Ozone became an integral part of Chromium on Linux. The main backend that Chromium uses when running on Linux is Ozone/X11.

If you ask “why?”, the answer is simple: Linux has always been a user of the X Window System, and it was a natural choice to continue using X11. But… via Ozone this time.

Even though enabling that required major refactoring and thousands of lines of new code, it opened an opportunity to bring another backend into play: Wayland.

Honestly speaking, Wayland was and is the reason why all this effort exists.
And, if you are interested in more details, I encourage you to check out my colleague’s blog post, “Ozone: our way to the big change”, which describes the past progress in a bit more detail.

Let’s talk about the current state of Ozone backends in a little more detail.


Ozone/X11

As was said previously, Ozone/X11 has been released and no longer requires as much effort as before. There is only one task left: a grand final code migration.

You may remember from our previous blog posts that we had to refactor the old legacy X11 path in such a way that both Ozone/X11 and legacy X11 implementations can reuse the same low-level code. To achieve that, all the low-level pieces were placed into common directories – //ui/base/x and //ui/platform_window/x11.

Recently, the old path has been deprecated and removed, and all these low-level details can finally be moved into the //ui/ozone/platform/x11 directory.

However, just a few bits are left: there are a couple of interactive UI tests that use low-level components directly. These tests will be moved into the //ui/ozone/platform/x11 directory very soon.

Once that task is completed, all the low-level X11 code I mentioned before will also be moved into the Ozone/X11 folder.


Ozone/Wayland

Wayland is the reason why all this work happened. The implementation is stable enough, and it is used by several of our customers in different areas, from automotive to home appliances.

Nowadays, we can call it beta as most of the features have already been implemented. The list includes handling of UI events, window state changes, window management, buffer management, zero-copy and more.

Today, we are working on several performance improvements, better test coverage, and better support for different hardware. Moreover, we will be adding ANGLE support soon. Vulkan is also part of our plans.

Going forward, the performance improvements I mentioned are based on delegating most of the compositing to Wayland by promoting as many quads to overlays as possible. This approach is better than doing all of the compositing on the Chromium side (it deserves a separate blog post).


What’s next?

We are continuing our work in the Ozone/Wayland area.

There are still many issues to be addressed and new Wayland protocols to be written, but thanks to the milestones achieved so far we are confident we are building on solid foundations, so a fully functional native Wayland implementation for Chromium on Linux is only a matter of time.

by Maksim Sisov at December 17, 2021 08:00 AM

December 16, 2021

Delan Azabani

Chromium spelling and grammar, part 2

Modern web browsers can help users with their word processing needs by drawing squiggly lines under possible spelling or grammar errors in their input. CSS will give authors more control over when and how they appear, with the new ::spelling- and ::grammar-error pseudo-elements, and spelling- and grammar-error text decorations. Since part 1 in May, we’ve done a fair bit of work in both Chromium and the CSSWG towards making this possible.


The client funding this work had an internal patch that allowed you to change the colors of those squiggly lines, and our job was to upstream it. The patch itself was pretty simple, but turning that into an upstream feature is a much bigger can of worms. So far, we’ve landed over 30 patches, including dozens of new web platform tests, opened 8 spec issues, and run into some gnarly bugs going back to at least 2009.

Check out our project index for a complete list of demos, tests, patches, and issues. For more details about the CSS highlight pseudos in particular, check out my BlinkOn 15 talk, including the highlight painting visualiser.

(slides)

Contents

Implementation status

Chromium 96 includes a rudimentary version of highlight inheritance, with support for ::highlight in Chromium 98 (Fernando Fiori). This is currently behind a Blink feature:

--enable-blink-features=HighlightInheritance

Adding to our initial support for ::{spelling,grammar}-error, we’ve since made progress on the new {spelling,grammar}-error decorations. While they are accepted but ignored in Chromium 96, you’ll be able to see them in Chromium 98, with our early paint support.

Chromium 96 also makes it possible to change the color of native squiggly lines by setting ‘text-decoration-color’ on either of the new pseudo-elements. This feature, and the features above, are behind another flag:

--enable-blink-features=CSSSpellingGrammarErrors

Charlie’s bird spec lawyerings

I’ve learned a lot of things while working on this project. One interesting lesson was that no matter how clearly a feature is specified, and how much discussion goes into spec details, half the questions won’t become apparent until someone starts building it.

  • What happens when both highlight and originating content define text shadows? What if multiple highlights do the same? What order do we paint these shadows in? (#3932)
  • What happens to the originating content’s decorations when highlighted? What happens when highlights define their own decorations? Which decorations get recolored to the foreground color for clarity? What’s the painting order? Does it even mean anything for a highlight to set ‘text-decoration-color’ only? (#6022)
  • Some browsers invert the author’s ::selection background based on contrast with the foreground color. Should this be allowed, or does it do more harm than good? (#6150)
    • What about other “tweaks”? What if a browser needs to force translucency to make its selection highlighting work? (#6853)
    • How do we even write reftests if they are allowed? (no issue)
    • While we’re talking about testing, how do we even test ::{spelling,grammar}-error without a way to guarantee that some text is treated as an error? (wpt#30863)
  • How does paired cascade work? Does “use” mean used value? Which properties are “highlight colors”? Do we really mean ::selection only, and color and background-color only? What does it mean for a highlight color to have been “specified by the author”? Does the user origin stylesheet count as “specified”? Do unset and revert count as “specified”? Does unset mean inherit even when the property is not normally inherited? (#6386)
  • Should custom properties be allowed? What about variable references? Do we force non-inherited custom properties to become inherited like we do for non-custom properties? Should we provide a better way to set custom properties in a way that affects highlight pseudos? (#6264, #6641)
  • What if existing content relies on implicitly inheriting a highlight foreground color when setting background-color explicitly, or vice versa? Do we need to accommodate this for compat? (#6774)
  • The spec effectively recommends that ::{spelling,grammar}-error (and requires that ::highlight) force the text color to black by default. Surely we want to not change the color by default? (#6779)
  • Does color:currentColor point to the next active highlight overlay below, or are inactive highlights included too? What happens when the author tries to getComputedStyle with ::selection? (#6818)
  • Do decorations “propagate” to descendants in highlights like they would normally? How do we reconcile that with highlight inheritance? How do we ensure that “decorating box” semantics aren’t broken? (#6829)

Squiggly lines

Since landing ‘text-decoration-color’ support for the new pseudos, my colleague Rego has taken the lead on the rest of the core spelling and grammar features, starting with the new ‘text-decoration-line’ values.

Currently, when setting ‘text-decoration-color’ on the pseudos, we change the color, but ‘text-decoration-line’ is still ‘none’, which doesn’t really make sense. This might sound like it required gross hacks, but the style system just gives us a blob of properties, where ‘color’ and ‘line’ are independent. All of the logic that uses them is in paint and layout.

We started by adding the new values to the stylesheet parser. While highlight painting still needs a lot more work before we can do so, the idea is that eventually the pseudos and decorations will meet in the default stylesheet.

::spelling-error { text-decoration-line: spelling-error; }
::grammar-error { text-decoration-line: grammar-error; }

Something that’s often neglected in CSS tests is dynamic testing, which checks that the rendering updates correctly when styles are changed by JavaScript, since the easiest and most common way to write a rendering test involves no scripting at all.

In this case, only ::selection had dynamic tests, and only ::selection actually worked correctly, so we then fixed the other pseudos.

Platform “conventions”

Blink’s squiggly lines look quite different to anything CSS can achieve with wavy or dotted decorations, and they’re painted on unrelated codepaths (more details). We want to unify these codepaths, to make them easier to maintain and help us integrate them with CSS, but this creates a few complications.

The CSS codepath naïvely paints as many bézier curves as needed to span the necessary width, but the squiggly codepath has always painted a single rectangle with a cached texture, which is probably more efficient. This texture used to be a hardcoded bitmap, but even when we made the decorations scale with the user’s dpi, we still kept the same technique, so the approach we use for CSS decorations might be too slow.
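As a rough sketch of the CSS-style approach (not the actual Blink code), the painter appends one cubic bézier segment per wave until the required width is covered and then strokes the whole path; the squiggly codepath instead stamps a single cached texture across the same width.

#include "third_party/skia/include/core/SkCanvas.h"
#include "third_party/skia/include/core/SkPaint.h"
#include "third_party/skia/include/core/SkPath.h"

// Builds as many wave segments as the width needs, then strokes them.
void PaintWavyDecorationSketch(SkCanvas* canvas, SkScalar x, SkScalar y,
                               SkScalar width, SkScalar wave_length,
                               SkScalar amplitude, const SkPaint& stroke_paint) {
  SkPath path;
  path.moveTo(x, y);
  for (SkScalar wave_x = x; wave_x < x + width; wave_x += wave_length) {
    // One "wave": two control points, one above and one below the baseline.
    path.cubicTo(wave_x + wave_length / 3, y + amplitude,
                 wave_x + 2 * wave_length / 3, y - amplitude,
                 wave_x + wave_length, y);
  }
  canvas->drawPath(path, stroke_paint);  // stroke_paint uses kStroke_Style
}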

Another question is the actual appearance of spelling and grammar decorations. We don’t necessarily want to make them identical to the default wavy or dotted decorations, because it might be nice to tell when, say, a wavy-decorated word is misspelled.

We also want to conform to platform conventions where possible, and you would think there’s at least a consistent convention for macOS… but not exactly. One thing that’s clear is that gradients are no longer conventional.

macOS squiggly lines in Safari, Notes, TextEdit, and Keynote (compare demo0).

But anyway, if we’re adding new decoration values that mimic the native ones, which codepath do we paint them with? We decided to go down the CSS route — leaving native squiggly lines untouched for now — and take this time to refactor and extend those decoration painters for the needs of spelling and grammar errors.

Precise wavy decorations

To that end, one of the biggest improvements we’ve landed is making wavy decorations start and stop exactly where needed, rather than falling short. This includes the new spelling and grammar decoration values, other than on macOS.

Wavy decorations under ‘letter-spacing’, top version 96, bottom version 97 (demo0).

You may have noticed that the decorations in that last example sometimes extend to the right of “h”. This is working as expected: ‘letter-spacing’ adds a space after letters, not between them, even though it Really Should Not. I tried wrapping the last letter of each word in a span, but then the letter appears to have its own decoration, out of phase with the rest of the word. This is because Blink lacks phase-locked decorations.


Phase-locked decorations

Blink uses an inheritance hack to propagate decorations from parents to children, rather than properly implementing the concept of decorating box. In other words, we paint two independent decorations, whereas we should paint one decoration that spans the entire word. This has been the cause of a lot of bugs, and is widely regarded as a bad move.

Note that we don’t actually have to paint the decoration in a single pass; we only have to render as if that were the case. For example, when testing the same change in Firefox, the decoration appears to jitter near the last letter, which suggests that the decoration is probably being painted separately for that element.

Gecko goes above and beyond with this, even synchronising separate decorations introduced under the same block, which allows authors to make it look like their decorations change color partway through.

A related problem in the highlight painting space is that the spec calls for “recoloring” originating decorations to the highlight foreground color. By making these decorations “lose their color”, we avoid situations where a decoration becomes illegible when highlighted, despite being legible in its original context.

I’ve partially implemented this for ::selection in Chromium 95, by adding a special case that splits originating decorations into two clipped paints with different colors — though not yet the correct colors — while carefully keeping them in phase.

highlight-painting-004 and -ref3, version 97. In this test, the originating element has a red underline, while ::selection introduces a purple line-through. The underline needs to become blue in the highlighted part, to match the ::selection ‘color’, but for now, we match its ‘text-decoration-color’.

To paint the highlighted part of the decoration, we clip the canvas to a rectangle as wide as the background, and paint the decoration in full. To paint the rest, we clip “out” the canvas to the same rectangle, which means we don’t touch anything inside the rectangle.
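In Skia terms this is roughly two clipped paints, one intersecting the selection rectangle and one subtracting it; a hedged sketch with a hypothetical PaintDecoration helper, not the actual highlight painting code.

#include "third_party/skia/include/core/SkCanvas.h"
#include "third_party/skia/include/core/SkRect.h"

void PaintDecorationInTwoColorsSketch(SkCanvas* canvas, const SkRect& selection_rect) {
  // Highlighted part: clip to the selection rectangle and paint in full.
  canvas->save();
  canvas->clipRect(selection_rect, SkClipOp::kIntersect);
  // PaintDecoration(canvas, highlight_color);  // hypothetical helper
  canvas->restore();

  // Unhighlighted part: clip "out" the same rectangle and paint again,
  // so nothing inside the rectangle is touched.
  canvas->save();
  canvas->clipRect(selection_rect, SkClipOp::kDifference);
  // PaintDecoration(canvas, original_color);  // hypothetical helper
  canvas->restore();
}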

But how tall should that rectangle be? Short answer: infinity.

Bézier bounding box

Long answer: Skia doesn’t let us clip to an infinitely tall rectangle, so it depends on several things, including ‘text-decoration-thickness’, ‘text-underline-offset’, and in the case of wavy decorations, the amplitude of the bézier curves.

In the code, there was a pretty diagram that illustrated the four relevant points to each “wave” repeated in the decoration. Clearly, it suggested that the pattern in that example was bounded by the control points, but I had no idea whether this was true for all cubic béziers, my terrible search engine skills failed me again, and I don’t like assuming.

/*                   controlPoint1
 *                         +
 *
 *
 *                  . .
 *                .     .
 *              .         .
 * (x1, y1) p1 +           .            + p2 (x2, y2)
 *                          .         .
 *                            .     .
 *                              . .
 *
 *
 *                         +
 *                   controlPoint2
 */

To avoid getting stuck on those questions for too long, and because I genuinely didn’t know how to determine the amplitude of a bézier curve, I went with three times the background height. This should be Good Enough™ for most content, but you can easily break it with, say, a very large ‘text-underline-offset’.

Weeks later, I stumbled upon a video by Freya Holmér answering that very question.

So, how do we get [the bounding] box?

The naïve solution is to simply use the control points of the bézier curve. This can be good enough, but what we really want is the “tight bounding box”; in some cases, the difference between the two is huge.

For now, the code still clips to a fixed three times the background height, but at least we now have some ideas for how to properly measure these decorations:

  • use the minimum and maximum y values of the control points (naïve)
  • find better min and max y values by evaluating the derivative at its zeros (sketched after this list)
  • use a dedicated function for this purpose like SkDCubic::convexHull?
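To make the second idea concrete, here is a small self-contained sketch (not Blink code) that finds the tight vertical bounds of a single cubic bézier segment: the extrema can only be at the endpoints or where the derivative, a quadratic in t, is zero.

#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Evaluates one coordinate (here: y) of a cubic bézier at parameter t.
double CubicAt(double p0, double p1, double p2, double p3, double t) {
  double u = 1.0 - t;
  return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
}

// Returns the tight {min, max} of the curve's y values over t in [0, 1].
std::pair<double, double> CubicYBoundsSketch(double p0, double p1, double p2, double p3) {
  std::vector<double> candidates = {CubicAt(p0, p1, p2, p3, 0.0),
                                    CubicAt(p0, p1, p2, p3, 1.0)};
  // Coefficients of the derivative B'(t) = a*t^2 + b*t + c.
  double a = 3 * (-p0 + 3 * p1 - 3 * p2 + p3);
  double b = 6 * (p0 - 2 * p1 + p2);
  double c = 3 * (p1 - p0);
  if (std::abs(a) < 1e-12) {
    if (std::abs(b) > 1e-12) {
      double t = -c / b;
      if (t > 0.0 && t < 1.0) candidates.push_back(CubicAt(p0, p1, p2, p3, t));
    }
  } else {
    double disc = b * b - 4 * a * c;
    if (disc >= 0.0) {
      for (double sign : {-1.0, 1.0}) {
        double t = (-b + sign * std::sqrt(disc)) / (2 * a);
        if (t > 0.0 && t < 1.0) candidates.push_back(CubicAt(p0, p1, p2, p3, t));
      }
    }
  }
  auto [min_it, max_it] = std::minmax_element(candidates.begin(), candidates.end());
  return {*min_it, *max_it};
}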

Cover me!

Writing the reference pages for that test was also a fun challenge. When written the obvious way, Blink would actually fail, because in general we make no attempt to keep any decoration paints in phase.

highlight-painting-004-ref1 and -ref3, version 96.

The ref that Blink ended up matching has five layers. Each layer contains the word “quick” in full with any decorations spanning the whole word, but only part of the layer is shown. This is achieved by an elaborate system of positioned “covers” and “hiders”: the former clips a layer from the right with a white rectangle, while the latter clips a layer from the left by way of right:0 wrapped in overflow:hidden.

Wanna know the best part though? All three refs are identical in Firefox. Someday, hopefully, this will also be true for Blink.


Highlight inheritance

Presto (Opera), uniquely, supported inheritance for ::selection before it was cool, by mapping those styles to synthesised (internal) ‘selection-color’ and ‘selection-background’ properties that were marked as inherited.

Blink also has internal properties for things like :visited links and forced colors, where we need to keep track of both “original” and “new” colors. This works well enough, but internal properties add a great deal of complexity to the code that applies and consumes styles. Now that there are multiple highlight pseudos, supporting a lot more than just ‘color’ and ‘background-color’, this complexity is hard to justify.

To understand the approach we went with, let’s look at how CSS works in Chromium.

CSS is managed by Blink’s style system, which at its highest level consists of the engine, the resolver, and the ComputedStyle data structure. The engine maintains all of the style-related state for a document, including all of its stylesheet rules and the information needed to recalculate styles efficiently when the document changes. The resolver’s job is to calculate styles for some element, writing the results to a new ComputedStyle object.

ComputedStyle itself is also interesting. Blink recognises over 600 properties, including internal properties, shorthands (like ‘margin’), and aliases (like ‘-webkit-transform’), so most of the fields and methods are actually generated (ComputedStyleBase) with the help of some Python scripts.

These fields are “sharded” into field groups, so we can efficiently reuse style data from ancestors and previous resolver outputs. Some of these field groups are human-defined, like “surround” for all of the margin/border/padding properties, but there are also several raredata groups generated from property popularity stats.

When resolving styles, we usually clone an “empty” ComputedStyle, then we copy over the inherited properties from the parent to this fresh new object. Many of these live in the “inherited” field group, so all we need to do for them is copy a single pointer. At this point, we have the parent’s inherited properties, and everything else as initial values, so if the element doesn’t have any rules of its own, we’re more or less done.

Otherwise, we search for matching rules, sort all of their declarations by things like specificity, then apply the winning declarations by overwriting various ComputedStyle fields. If the field we’re overwriting is in a field group, we need to clone the field group too, to avoid clobbering someone else’s styles.
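The copy-on-write behaviour is roughly like this hand-written sketch; Blink’s real field groups are generated and use its own refcounting, so treat the names and types here as illustrative only.

#include <memory>

// A "field group" shared between styles until someone writes to it.
struct SurroundGroupSketch {
  int margin_top = 0;
  int padding_top = 0;
};

class ComputedStyleSketch {
 public:
  // Inheriting from the parent is just a pointer copy.
  void InheritFrom(const ComputedStyleSketch& parent) { surround_ = parent.surround_; }

  // A write clones the group first if it is shared, so we never clobber
  // another style that still points at the same data.
  void SetMarginTop(int value) {
    if (surround_.use_count() > 1)
      surround_ = std::make_shared<SurroundGroupSketch>(*surround_);
    surround_->margin_top = value;
  }

 private:
  std::shared_ptr<SurroundGroupSketch> surround_ =
      std::make_shared<SurroundGroupSketch>();
};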

For ordinary elements, as well as pseudo-elements with a clear place in the DOM tree (e.g. ::before, ::marker), we resolve styles as part of style’s regular tree traversal. We start by updating :root’s styles, then any children affected by the update, and so on. But for other pseudos we usually use a “lazy” approach, where we don’t bother resolving styles unless they are needed by a later phase of the rendering process, like layout or paint.

Let’s say we’re resolving styles for some ordinary element. When we’re searching for matching rules, if we find one that actually matches our ::selection, we make a note in our pseudo bits saying we’ve seen rules for that pseudo, but otherwise ignore the rule.

Once we’re in paint, if the user has selected some text, then we need to know our ::selection styles, so we check our pseudo bits. If the ::selection bit was set, we call our resolver with a special request for pseudo styles, then cache the result into a vector inside the originating element’s ComputedStyle.

This is how ::selection used to work, and at first I tried to keep it that way.

Status quo

My initial solution was to make paint pass in a custom inheritance parent with its style request. Normally pseudo styles inherit from the originating element, but here they would inherit from the parent’s highlight styles, which we would obtain recursively. Then in the resolver, if we’re dealing with a highlight, we copy non-inherited properties too.

On the surface, this worked, but to make it correct, we had to work around an optimisation where the resolver would bail out early if there were no matching rules. Worse still, we had to bypass the pseudo cache entirely. While we already had to do so under :window-inactive, the performance penalty there was at least pretty contained.

If we copy over the parent’s inherited properties as usual, and for highlights, copy the non-inherited properties too, that more or less means we’re copying all the fields, so why not do away with that and just clone the parent’s ComputedStyle?

The pseudo cache is only designed for pseudos whose styles won’t need to change between the originating element’s style updates. For most pseudos, this is true anyway, as long as we bypass the cache under pseudo-classes like :window-inactive.

These caches are rarely actually cleared, but when the next update happens, the whole ComputedStyle — including the cache — gets discarded. Caching results with custom inheritance parents is usually frowned upon, because changing the parent you inherit your styles from can yield different styles. But for highlights, we will always have the same parent throughout an update cycle, so surely we can use the cache here?

…well, yes and no.

Given an element that inherits a bunch of highlight styles, the initial styles are correct. But when those inherited values change in some ancestor, our highlight styles fail to update! This is a classic cache invalidation bug. Our invalidation system wasn’t even the problem — it’s just unaware of lazily resolved styles in pseudo caches. This is usually fine, because most pseudos inherit from the originating element, but not here.

Storing highlight styles

With the pseudo cache being unsuitable for highlight styles, we needed some other way of storing them. Only a handful of properties are allowed in highlight styles, so why not make a dedicated type with only those fields?

The declarations and basic methods for CSS properties are entirely generated, so let’s write some new templates…

{% macro declare_highlight_class(name, fields, field_templates): -%}
class {{name}} : public RefCounted<{{name}}> {
 public:
  static scoped_refptr<{{name}}> Create() { /* ... */ }
  scoped_refptr<{{name}}> Copy() const { /* ... */ }
  bool operator==(const {{name}}& other) const { /* ... */ }
  bool operator!=(const {{name}}& other) const { /* ... */ }
  {% for field in fields %}
  {{declare_storage(field)}}
  {% endfor %}
  {% for field in fields %}
  {{field_templates[field.field_template]
      .decl_public_methods(field.without_group())
    |indent(2)}}
  {% endfor %}
 private:
  {{name}}();
  CORE_EXPORT {{name}}(const {{name}}&);
};
{%- endmacro %}

…then use them in the ComputedStyleBase template.

{{declare_highlight_class(
    'StyleHighlightData',
    computed_style.all_fields
        |sort(attribute='name')
        |selectattr('valid_for_highlight')
        |list,
    field_templates)
  |indent(2)}}

Trouble is, all of the methods that apply and serialise property values — and there are hundreds of them — take a ComputedStyle, not some other type.

const blink::Color Color::ColorIncludingFallback(
    bool visited_link,
    const ComputedStyle& style) const { /* ... */ }

const CSSValue* Color::CSSValueFromComputedStyleInternal(
    const ComputedStyle& style,
    const LayoutObject*,
    bool allow_visited_style) const { /* ... */ }

Combined with the fact that our copy-on-write field groups mitigate a lot of the wasted memory (well hopefully anyway), we quickly abandoned this dedicated type.

We then optimised the top-level struct a bit, saving a few pointer widths by moving the four highlight style pointers into a separate type, but this was still less than ideal. We were widening ComputedStyle by one pointer, but the vast majority of web content doesn’t use highlight pseudos at all, and ComputedStyle and ComputedStyleBase are very sensitive to size changes. To give you an idea of how much it matters, Blink even throws a compile-time error if the size inadvertently changes!

struct SameSizeAsComputedStyleBase {
  SameSizeAsComputedStyleBase() { Alias(&pointers); Alias(&bitfields); }
 private:
  void* pointers[9];
  unsigned bitfields[5];
};

struct SameSizeAsComputedStyle : public SameSizeAsComputedStyleBase,
                                 public RefCounted<SameSizeAsComputedStyle> {
  SameSizeAsComputedStyle() { Alias(&own_pointers); }
 private:
  void* own_pointers[1];
};

ASSERT_SIZE(ComputedStyle, SameSizeAsComputedStyle);

To move highlights out of the top-level and into a raredata group, we had to get rid of all the fancy generated code and Just write a plain struct, which has the added benefit of making the code easier to read. Luckily, we were only using that code to loop through the four highlight pseudos at this point, not dozens or hundreds of properties.

Then all we needed was a bit of JSON to tell the code generator to add an “extra” field, and find an appropriate field group for us ("*"). Because this field is not for a popular CSS property, or a property at all really, it automatically goes in a raredata group.

[{
  name: "HighlightData",
  inherited: true,
  field_template: "external",
  type_name: "StyleHighlightData",
  include_paths: ["third_party/blink/renderer/core/style/style_highlight_data.h"],
  default_value: "",
  wrapper_pointer_name: "DataRef",
  field_group: "*",
  computed_style_custom_functions: ["initial", "getter", "setter", "resetter"],
}]

Single-pass resolution

With our new storage ready, we now needed to actually write to it. We want to resolve highlight styles as part of the regular style update cycle, so that they can eventually benefit from style invalidation.

Looking at the resolver, I thought wow, there does seem to be a lot of redundant work being done when resolving highlight styles in a separate request, so why not weave highlight resolution into the resolver while we’re at it?

@@ third_party/blink/renderer/core/css/css_selector.h @@
   enum RelationType {
+    kHighlights,
@@ third_party/blink/renderer/core/css/css_selector.cc @@
       case kShadowSlot:
+      case kHighlights:
@@ third_party/blink/renderer/core/css/element_rule_collector.h @@
   MatchedRule(const RuleData* rule_data,
               unsigned style_sheet_index,
               const CSSStyleSheet* parent_style_sheet,
+              absl::optional<PseudoId> highlight)
@@ third_party/blink/renderer/core/css/resolver/match_result.h @@
   void AddMatchedProperties(
       const CSSPropertyValueSet* properties,
       unsigned link_match_type = CSSSelector::kMatchAll,
       ValidPropertyFilter = ValidPropertyFilter::kNoFilter,
+      absl::optional<PseudoId> highlight = absl::nullopt);
@@ ... @@
   const MatchedPropertiesVector& GetMatchedProperties(
+      absl::optional<PseudoId> highlight) const {
+    DCHECK(!highlight || highlight_matched_properties_.Contains(*highlight));
+    return highlight ? *highlight_matched_properties_.at(*highlight)
                      : matched_properties_;
@@ ... @@
   MatchedPropertiesVector matched_properties_;
+  HeapHashMap<PseudoId, Member<MatchedPropertiesVector>>
+      highlight_matched_properties_;
@@ third_party/blink/renderer/core/css/resolver/style_cascade.h @@
   void Apply(CascadeFilter = CascadeFilter());
+  void ApplyHighlight(PseudoId);
@@ third_party/blink/renderer/core/css/resolver/style_cascade.cc @@
 const CSSValue* ValueAt(const MatchResult& result,
+                        absl::optional<PseudoId> highlight,
@@ ... @@
 const TreeScope& TreeScopeAt(const MatchResult& result,
+                             absl::optional<PseudoId> highlight,
                              uint32_t position) {

In general we must find a less intrusive way to implement this. We can not have |highlight| params on everything.

andruud, Blink style owner

You know what? Fair enough.

Multi-pass resolution

Element::Recalc{,Own}Style are pretty big friends of the style system. They drive the style update cycle by determining how the tree has changed, making a resolver request for the element, and determining which descendants also need to be updated.

This makes them the perfect place to update highlight styles. All we need to do is make an additional resolver request for each highlight pseudo, store it in the highlight data, and bob’s your uncle.

StyleRecalcChange Element::RecalcOwnStyle(
    const StyleRecalcChange change,
    const StyleRecalcContext& style_recalc_context) {
  // ...
  if (new_style) {
    StyleHighlightData* highlights = new_style->MutableHighlightData();
    if (new_style->HasPseudoElementStyle(kPseudoIdSelection)) {
      ComputedStyle* parent = ParentComputedStyle()->HighlightData()->Selection();
      StyleRequest request{kPseudoIdSelection, parent};
      highlights->SetSelection(StyleForPseudoElement(style_recalc_context, request));
    }
    // kPseudoIdTargetText
    // kPseudoIdSpellingError
    // kPseudoIdGrammarError
    // ...
  }
  // SetComputedStyle(new_style);
  // ...
}

Pathology in legacy

So far, I had been writing this patch as a replacement for the old inheritance logic, but since we decided to defer highlight inheritance for ::highlight to a later patch, we had to undelete the old behaviour and switch between them with a Blink feature.

Another reason for the feature gate was performance. Of the pages in the wild already using highlight pseudos, most of them probably use universal ::selection rules, if only because of how useless the old model was for more complex use cases.

::selection { color: lime; background: green; }

But ::selection isn’t magic — it literally means *::selection, which makes the rule match everywhere in the ::selection tree. When highlight inheritance is enabled, that means we end up cloning highlight styles for each descendant, only to apply the same property values, which wastes time and memory.

The reality is a bit more complicated than this, because ‘color’ and ‘background-color’ are actually in field groups that would also need to be cloned.

Under the old model, where lack of inheritance made this necessary, *::selection rules suffered from roughly the same problem, but the lazy style resolution meant that time and memory was only wasted on the elements directly containing selected content.

As a result, this will need to be fixed before we can enable the feature for everyone.

Paired cascade

Next we tried to reimplement paired cascade. For compatibility reasons, ::selection has special logic for the browser’s default ‘color’ and ‘background-color’ (e.g. white on blue), where we only use those colors if neither of them were set by the author. Otherwise, they default to initial values, usually black on transparent.

default on default
+
::selection { background: yellow; }
= initial on yellow
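In pseudo-implementation terms, the rule boils down to something like the following hedged sketch (made-up types, not the actual Blink logic):

struct ColorSketch { unsigned char r = 0, g = 0, b = 0, a = 255; };

struct SelectionColorsSketch {
  ColorSketch foreground;
  ColorSketch background;
};

// Use the UA's own highlight colors only when the author set neither
// property; otherwise both fall back to their cascaded/initial values.
SelectionColorsSketch ResolveSelectionColorsSketch(
    bool author_set_color, bool author_set_background,
    const SelectionColorsSketch& ua_defaults,
    const SelectionColorsSketch& cascaded) {
  if (!author_set_color && !author_set_background)
    return ua_defaults;  // e.g. white on blue
  return cascaded;       // e.g. initial (black) on yellow
}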

The spec says so in a mere 22 words:

The UA must use its own highlight colors for ::selection only when neither color nor background-color has been specified by the author.

Brevity is a good thing, and this seemed clear enough to me in the past. But once I actually had to implement it, I had questions about almost every word (#6386). While they aren’t entirely resolved, we’ve been getting pretty close over the last few weeks.

Who’s got green?

Much of the remaining work was to fix test failures and other bugs. These included crashes under legacy layout, since we only implemented this for LayoutNG, and functional changes leaking out of the feature gate. One of the reftest failures was also interesting to deal with. Let’s minimise it and take a look.

<!doctype html><meta charset="utf-8">
<title>active selection and background-color (basic)</title>
<style>
    main { color: fuchsia; background: red; }
    main::selection { background: green; }
</style>
<p>Pass if text is fuchsia on green, not fuchsia on red.
<main>Selected Text</main>
<script>/* selectNodeContents(main); */</script>

In the past, the “Selected Text” would render as fuchsia on green, and the test passes. But under highlight inheritance it fails, rendering as initial (black) on green, because we now inherit styles in a tree for each pseudo, not from the originating element.

“Selected Text” rendered the old way and the new way (inline demo).

So if the test is wrong, then how do we fix it? Well… it depends on the intent of the test, at least if we want to Do The Right Thing and preserve that. Clearly the primary intent of the test is ‘background-color’, given the <title>, but tests can also have secondary, less explicit intents. In this case, the flavour text1 even mentions fuchsia!

It might have helped if the test had a <meta name=assert>, an optional field dedicated to conveying intent, but probably not. Most of the assert tags I’ve seen are poorly written anyway, being a more or less verbose adaptation of the title or flavour text, and there’s a good chance that the intent for fuchsia (if any) was simply to inherit it from the originating element, so we would still need to invent a new intent.

We could change the reference to initial (black) on green, which would serve as a secondary test that we don’t inherit from the originating element, or remove the existing ‘color’, which would serve as a secondary test for paired cascade. But I didn’t think it through that far at the time, so I gave ::selection a new ‘color’, achieving neither.

 main::selection {
+ color: aqua;
  background: green; }
 </style>
 <p>Pass if text is
- fuchsia
+ aqua
 on green, not fuchsia on red.

Because the selected and unselected text colors were now different, I created another test failure, though only under legacy layout. The reference for this test was straightforward: aqua on green, no mention of fuchsia. This makes sense on the surface, given that all of the text under test was selected.

In this case, the tip of the “t” was crossing the right edge of the selection as ink overflow, and we were carefully painting the overflow in the unselected color. The test would have failed under LayoutNG too, if not for an optimisation that skips this technique when everything is selected. Let me illustrate with an exaggerated example:

This behaviour is generally considered desirable, at least when there are unselected characters, so Blink isn’t exactly wrong here. It’s definitely possible to make the active-selection tests account for this — and the tools to do so already exist in the Web Platform Tests — but I don’t have the time to pursue this right now.

What now?

After the holidays, we plan to:

  • Resolve the remaining spec issues. These issues are critical for finishing highlight inheritance and allowing highlights to add their own decorations.
  • Port ::selection’s painting logic to the other highlights. We might even use this as an opportunity to roll ::selection into the marker system.

Other work needed before we can ship the spelling and grammar features:

  • Ship highlight inheritance. This includes addressing any spec resolutions, fixing the performance issues, and adding devtools support.
  • Integrate spelling and grammar errors with decoration painting (bug 1257553).
  • Make automated testing possible for spelling and grammar errors (wpt#30863).

Special thanks to Rego, Frédéric (Igalia), Rune, andruud (Google), Florian, fantasai (CSSWG), Emilio (Mozilla), and Fernando (Microsoft). We would also like to thank Bloomberg for sponsoring this work.


  1. This is an automated reftest, so the instructions in <p> have no effect on the outcome. We require them anyway, because they add a bit of redundancy that helps humans understand and verify the test’s assertions. 

December 16, 2021 12:30 PM

December 13, 2021

Andy Wingo

webassembly: the new kubernetes?

I had an "oh, duh, of course" moment a few weeks ago that I wanted to share: is WebAssembly the next Kubernetes?

katers gonna k8s

Kubernetes promises a software virtualization substrate that allows you to solve a number of problems at the same time:

  • Compared to running services on bare metal, Kubernetes ("k8s") lets you use hardware more efficiently. K8s lets you run many containers on one hardware server, and lets you just add more servers to your cluster as you need them.

  • The "cloud of containers" architecture efficiently divides up the work of building server-side applications. Your database team can ship database containers, your backend team ships java containers, and your product managers wire them all together using networking as the generic middle-layer. It cuts with the grain of Conway's law: the software looks like the org chart.

  • The container abstraction is generic enough to support lots of different kinds of services. Go, Java, C++, whatever -- it's not language-specific. Your dev teams can use what they like.

  • The operations team responsible for the k8s servers that run containers don't have to trust the containers that they run. There is some sandboxing and security built-in.

K8s itself is an evolution on a previous architecture, OpenStack. OpenStack had each container be a full virtual machine, with a whole kernel and operating system and everything. K8s instead generally uses containers, which don't generally require a kernel in the containers. The result is that they are lighter-weight -- think Docker versus VirtualBox.

In a Kubernetes deployment, you still have the kernel at a central place in your software architecture. The fundamental mechanism of containerization is the Linux kernel process, with private namespaces. These containers are then glued together by TCP and UDP sockets. However, though one or more kernel processes per container do scale better than full virtual machines, it doesn't generally scale to millions of containers. And processes do have some start-up time -- you can't spin up a container for each request to a high-performance web service. These technical constraints lead to certain kinds of system architectures, with generally long-lived components that keep some kind of state.

k8s <=? w9y

Server-side WebAssembly is in a similar space as Kubernetes -- or rather, WebAssembly is similar to processes plus private namespaces. WebAssembly gives you a good abstraction barrier and (can give) high security isolation. It's even better in some ways because WebAssembly provides "allowlist" security -- it has no capabilities to start with, requiring that the "host" that runs the WebAssembly explicitly delegate some of its own capabilities to the guest WebAssembly module. Compare to processes which by default start with every capability and then have to be restricted.

Like Kubernetes, WebAssembly also gets you Conway's-law-affine systems. Instead of shipping containers, you ship WebAssembly modules -- and some metadata about what kinds of things they need from their environment (the 'imports'). And WebAssembly is generic -- it's a low level virtual machine that anything can compile to.

But, in WebAssembly you get a few more things. One is fast start. Because memory is data, you can arrange to create a WebAssembly module that starts with its state pre-initialized in memory. Such a module can start in microseconds -- fast enough to create one on every request, in some cases, just throwing away the state afterwards. You can run function-as-a-service architectures more effectively on WebAssembly than on containers. Another is that the virtualization is provided entirely in user-space. One process can multiplex between many different WebAssembly modules. This lets one server do more. And, you don't need to use networking to connect WebAssembly components; they can transfer data in memory, sometimes even without copying.

(A digression: this lightweight in-process aspect of WebAssembly makes it so that other architectures are also possible, e.g. this fun hack to sandbox a library linked into Firefox. They actually shipped that!)

I compare WebAssembly to K8s, but really it's more like processes and private namespaces. So one answer to the question as initially posed is that no, WebAssembly is not the next Kubernetes; that next thing is waiting to be built, though I know of a few organizations that have started already.

One thing does seem clear to me though: WebAssembly will be at the bottom of the new thing, and therefore that the near-term trajectory of WebAssembly is likely to follow that of Kubernetes, which means...

  • Champagne time for analysts!

  • The Gartner ✨✨Magic Quadrant✨✨™®© rides again

  • IBM spins out a new WebAssembly division

  • Accenture starts asking companies about their WebAssembly migration plan

  • The Linux Foundation tastes blood in the waters

And so on. I see turbulent waters in the near-term future. So in that sense, in which Kubernetes is not essentially a technical piece of software but rather a nexus of frothy commercial jousting, then yes, certainly: we have a fun 5 years or so ahead of us.

by Andy Wingo at December 13, 2021 03:50 PM

Julie Kim

Mojo conversion in printing

[from the renderer to the browser]

For the past few years, Chromium has put effort into the conversion of the legacy IPC system to Mojo.


If you need information about Mojo, please take a look at the documents and a video below.
Igalia also joined the task, and we converted the old-style IPC communication into Mojo messages. The highest priority was the conversion of the messages in the //content area, and then we expanded into other components. Regarding the work on Mojo conversion by Igalia, please see the blog post from one of my teammates, Mario.

In this blog post, I’d like to talk about the Mojo conversion in the printing module. I gave a small talk about this during the BlinkOn 14 lightning talks.

Chromium’s architecture is based on a multiprocess model, and a large share of IPC messages are used for communication between the browser process and the renderer process. The printing module also communicates with both of them. When I started to work on the Mojo conversion in the printing module, the conversion from the browser to the renderer had already been completed. So, I looked into the messages from the renderer to the browser.


Printing in Chromium

I knew that the printing functionality was great while I was using the Chromium browser, but I was not aware that it has a lot of features designed with various environments in mind. On the desktop, you can use it through:

  • window.print() from JavaScript code.
  • Keyboard shortcut (e.g. Ctrl+P)
  • From the context menu: print a page or print a selection.
  • The three-dots menu
There might be more entry points I don’t know about yet.


Once a request arrives at the main printing code, the code flow is almost the same, but many of the issues I faced were caused by these various entry points and use cases. If you have used printing in Chromium, I believe you’re aware of the print preview. But did you know that you can also use the native print dialog directly, skipping the preview? I didn’t. To disable the print preview:
  • Command line switch: --disable-print-preview
  • Keyboard shortcut:
    - With preview: Ctrl+P
    - Without preview: Ctrl+Shift+P
The printing module also supports printing in headless mode and in Android WebView. Whenever a CL related to the Mojo conversion landed, I tried to verify all the possible combinations, but regressions still slipped through.



Printing with legacy IPC vs Printing with Mojo

Here are the legacy IPC messages. You can see how many messages the printing module had and how they were distributed.
Using legacy IPC

And here is how they have changed now; you can see each endpoint.
Using Mojo
For the interfaces added, you can refer to //components/printing/common/print.mojom.

So why do we want to use Mojo instead of the legacy IPC? Legacy IPC is deprecated in Chromium, and Mojo helps to refactor Chromium into a large set of smaller services. For more detail, please refer to this document. With the Mojo conversion in printing, we also removed multiple layers that existed in printing only to support legacy IPC.


Replacing EnableMessagePumping()

There were a few challenges while working on the printing Mojo conversion. One of them was finding a replacement for EnableMessagePumping(), a function that supports pumping messages while a sync message is waiting for a reply. It was not used in many places, because it could increase complexity during IPC, and the messages that used it (or their use cases) had already been cleaned up. In the printing module, there were two legacy IPC messages still using it. After much deliberation on how to handle this part, I defined a sync message and passed a callback that quits a run loop; the nested run loop then allows other tasks to run while waiting for the reply.
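The browser-side pattern looks roughly like this; base::RunLoop with nestable tasks is the real Chromium primitive, but the interface and function around it are made up for illustration.

#include "base/callback.h"
#include "base/run_loop.h"

// Hypothetical stand-in for the Mojo printing interface.
class PrintClientSketch {
 public:
  virtual ~PrintClientSketch() = default;
  virtual void RequestPrint(base::OnceClosure reply_callback) = 0;
};

void WaitForPrintReplySketch(PrintClientSketch* client) {
  // A nested run loop that still allows other tasks to run while we wait.
  base::RunLoop run_loop(base::RunLoop::Type::kNestableTasksAllowed);
  // The callback quits the loop once the renderer's reply arrives.
  client->RequestPrint(run_loop.QuitClosure());
  run_loop.Run();
}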


A racing issue

There were also some interesting issues. One of them was related to handling the printing results from multiple renderer processes. When subframes live in more than one renderer process, the browser process manages the results from each renderer, and in that case there was a race condition like the one below.
A racing issue
The sub frame should be handled after the compositor for the main frame is created, but it could be handled before the main frame. So the code was updated to queue the sub frame if the compositor has not been created yet, and to process the queue once the compositor is created.
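A hedged sketch of that queueing pattern, with hypothetical names rather than the actual printing classes:

#include <vector>

#include "base/callback.h"

// Defers sub-frame work until the main frame's compositor exists.
class PrintCompositorQueueSketch {
 public:
  void OnSubframeReady(base::OnceClosure handle_subframe) {
    if (compositor_created_)
      std::move(handle_subframe).Run();
    else
      pending_.push_back(std::move(handle_subframe));  // handle it later
  }

  void OnCompositorCreated() {
    compositor_created_ = true;
    for (base::OnceClosure& task : pending_)
      std::move(task).Run();
    pending_.clear();
  }

 private:
  bool compositor_created_ = false;
  std::vector<base::OnceClosure> pending_;
};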


Wrap up

With the Mojo conversion in the printing module, I’ve naturally cleaned up classes, structures, and files that are no longer necessary. \o/


There were a few more problems after I thought the work was done, but with the help of many people I was able to solve them safely. Thanks a lot to everyone who reviewed CLs, reported issues, and discussed how to fix them.


by jkim at December 13, 2021 10:23 AM

December 09, 2021

Samuel Iglesias

Cross-compiling with icecream

Introduction

One of the big issues I have when working on Turnip driver development is that compiling either Mesa or VK-GL-CTS takes a lot of time, no matter how powerful the embedded board is. There are reasons for that: typically those boards have a limited amount of RAM (8 GB in the best case), a slow storage disk (typically UFS 2.1 on-board storage), and CPUs that are not so powerful compared with x86_64 desktop alternatives.

Photo of the Qualcomm® Robotics RB3 Platform embedded board that I use for Turnip development.

To fix this, cross-compilation is recommended; however, installing a development environment for cross-compilation can be cumbersome and error-prone depending on the toolchain you use. One alternative is to use a distributed compilation system that allows cross-compilation, like Icecream.

Icecream is a distributed compilation system that is very useful when you have to compile big projects and/or compile on low-spec machines, while having powerful machines on the local network that can do the job instead. However, it is not perfect: the linking stage is still done on the machine that submits the job, which, depending on the available RAM, could be too much for it (you can alleviate this a bit by using ZRAM, for example).

One of the features that icecream has over its alternatives is that there is no need to install the same toolchain in all the machines as it is able to share the toolchain among all of them. This is very useful as we will see below in this post.

Installation

Debian-based systems

$ sudo apt install icecc

Fedora systems

$ sudo dnf install icecream

Compile it from sources

You can compile it from sources.

Configuration of icecc scheduler

You need to have an icecc scheduler in the local network that will balance the load among all the available nodes connected to it.

It does not matter which machine is the scheduler; you can use any of them, as it is quite lightweight. To run the scheduler, execute the following command:

$ sudo icecc-scheduler

Notice that the machine running this command is going to be the scheduler, but it will not participate in the compilation process by default unless you run the iceccd daemon as well (see next step).

Setup on icecc nodes

Launch daemon

First you need to run the iceccd daemon as root. This is not needed on Debian-based systems, as its systemd unit is enabled by default.

You can do that using systemd in the following way:

$ sudo systemctl start iceccd

Or you can enable the daemon at startup time:

$ sudo systemctl enable iceccd

The daemon will connect automatically to the scheduler that is running in the local network. If that’s not the case, or there is more than one scheduler, you can run it standalone and give the scheduler’s IP as a parameter:

$ sudo iceccd -s <ip_scheduler>

Enable icecc compilation

With ccache

If you use ccache (recommended option), you just need to add the following in your .bashrc:

export CCACHE_PREFIX=icecc

Without ccache

To use it without ccache, you need to add its path to the $PATH environment variable so it is picked up before the system compilers:

export PATH=/usr/lib/icecc/bin:$PATH

Execution

Same architecture

If you followed the previous steps, any time you compile anything in C/C++, icecream will distribute the work among the fastest nodes in the network. Notice that it takes into account system load, network connection, and core count, among other variables, to decide which node will compile each object file.

Remember that the linking stage is always done in the machine that submits the job.

Different architectures (example cross-compiling for aarch64 on x86_64 nodes)

Icecream Icemon showing my x86_64 desktop (maxwell) cross-compiling a job for my aarch64 board (rb3).

Preparation on x86_64 machine

On one x86_64 machine, you need to create a toolchain. This is not done automatically by icecc, as you can have different toolchains for cross-compilation.

Install cross-compiler

For example, you can install the cross-compiler from the distribution repositories:

For Debian-based systems:

$ sudo apt install crossbuild-essential-arm64

For Fedora:

$ sudo dnf install gcc-aarch64-linux-gnu gcc-c++-aarch64-linux-gnu

Create toolchain for icecc

Finally, to create the toolchain to share in icecc:

$ icecc-create-env --gcc /usr/bin/aarch64-linux-gnu-gcc /usr/bin/aarch64-linux-gnu-g++

This will create a <hash>.tar.gz file. The <hash> is used to identify the toolchain to distribute among the nodes in case there is more than one. But don’t worry: once it is copied to a node, it won’t be copied again, as icecc detects it is already present.

Note: it is important that the toolchain is compatible with the target machine. For example, if my aarch64 board is using Debian 11 Bullseye, it is better if the cross-compilation toolchain is created from a Debian Bullseye x86_64 machine (a VM also works), because you avoid incompatibilities like having different glibc versions.

If you have installed Debian 11 Bullseye in your aarch64, you can use my own cross-compilation toolchain for x86_64 and skip this step.

Copy the toolchain to the aarch64 machine

$ scp <hash>.tar.gz aarch64-machine-hostname:

Preparation on aarch64

Once the toolchain (<hash>.tar.gz) is copied to the aarch64 machine, you just need to export this in your .bashrc:

# Icecc setup for crosscompilation
export CCACHE_PREFIX=icecc
export ICECC_VERSION=x86_64:~/<hash>.tar.gz

Execute

Just compile on the aarch64 machine and the jobs will be distributed among your x86_64 machines as well. Take into account that the jobs may also be shared with other aarch64 machines if icecc decides so; no extra step is needed for that.

It is important to note that the cross-compilation toolchain only needs to be created once, as icecream will copy it to all the x86_64 machines that execute jobs launched by this aarch64 machine. However, you do need to copy this toolchain to every aarch64 machine that will use icecream resources for cross-compiling.

Icecream monitor

Icemon

This is an interesting graphical tool to see the status of the icecc nodes and the jobs under execution.

Install on Debian-based systems

$ sudo apt install icecc-monitor

Install on Fedora

$ sudo dnf install icemon

Install it from sources

You can compile it from sources.

Acknowledgments

Even though icecream has good cross-compilation documentation, it was the post written 8 years ago by my Igalia colleague Víctor Jáquez that convinced me to set up icecream as explained in this post.

Hope you find this info as useful as I did :-)

December 09, 2021 02:28 PM

Tim Chevalier

Fun with pointer arithmetic

A diagram showing the layout of the NativeObject and RecordType types.

A picture is worth a thousand words, they say.

Where I left off yesterday, I was trying to figure out why my generated AddRecordProperty code was crashing. I was still using the simplest possible test case, a record with one field:

function f() { x = #{"a": 1}; }

Fixed slots

My code was writing a literal zero into the record’s “initialized length” slot:

store32(Imm32(0), Address(result, NativeObject::getFixedSlotOffset(RecordType::INITIALIZED_LENGTH_SLOT)));

But this should have been:

  storeValue(Int32Value(0), Address(result, NativeObject::getFixedSlotOffset(RecordType::INITIALIZED_LENGTH_SLOT)));

In the drawing, I indicated that offset 24 of a RecordType is a Value (JS::Value) denoting the number of elements that have been initialized so far. While it’s an invariant that this will actually be an integer value, as far as the compiler is concerned, the representation is of a Value, which has a different bit pattern from the integer 0.

Some existing code in RecordType::createUninitializedRecord() (this is code that isn’t upstream yet) should have been a clue:

  uint32_t length = getFixedSlot(INITIALIZED_LENGTH_SLOT).toPrivateUint32();

To get an unsigned int32, we call the Value method toPrivateUint32(), which returns an integer when called on an integer value.

Moreover, the getFixedSlot() method of NativeObject also returns a Value, which should have been a pretty good hint to me that fixed slots are Values:

const Value& getFixedSlot(uint32_t slot) const;

(NativeObject.h)

Observing the length field

Supposing that the register %rcx points to a record, I would like to be able to execute:

call js::DumpValue(((RecordType*) $rcx)->getFixedSlot(INITIALIZED_LENGTH_SLOT))

in gdb. (Where INITIALIZED_LENGTH_SLOT is defined as 0, since it happens to be the first fixed slot in this object.) Casting the value in %rcx to RecordType is necessary to tell gdb where the struct fields begin and end, but from there, I would have thought there would be enough debug information for it to know that RecordType inherits from NativeObject, which has a getFixedSlot() method.

Since I can’t do that, the next best thing is:

(gdb) call js::DumpValue( (JS::Value::fromRawBits (*($rcx + 24)) ))
0

And that works — it prints 0, which is what I would expect for a record with no initialized fields. Effectively, I inlined getFixedSlot(), which accesses offset 24 from the object. Then, JS::Value::fromRawBits decodes the tagged pointer that represents a Value, and DumpValue() pretty-prints it.

Observing the sortedKeys field

Looking at the picture again, records have a second fixed slot that’s a Value that is guaranteed (assuming the compiler works) to correspond to an ArrayObject, which just contains the record keys, in sorted order. I knew that my code was temporarily storing the value of this slot in register %rbx (as before, I figured this out by putting breakpoints in the code generation methods and looking at the values of various variables), so if I do:

call js::DumpValue ((JS::Value::fromRawBits($rbx)))

in gdb, I get output that’s something like <Array object at 232d468007f8>

But for more detail, I can do:

(gdb) call js::DumpObject (& ((JS::Value::fromRawBits($rbx)).toObject()))
object 29574ae007f0
  global 21bd66c40030 [global]
  class 5555590a8770 Array
  shape 21bd66c66320
  flags:
  proto 
  properties:
    [Latin 1]"length" (map 21bd66c62670/0 writable )

which interprets the Value as an Object and uses DumpObject to print out more details about its representation.

Observing the record itself

Having seen that the individual fixed slots of the record seemed to be correct, I wanted to debug my generated code for creating uninitialized records to see what the entire record object looked like. Knowing that the record was stored in %rcx, I figured out that I could do:

(gdb) call js::DumpObject (& ((JS::Value::fromRawBits($rcx)).toExtendedPrimitive()))
object 29574ae007b0
  global 21bd66c40030 [global]
  class 5555590b7d10 record
  shape 21bd66c663c0
  flags:
  proto null
  reserved slots:
      0 : 0
      1 : 
      2 : false
  properties:
  elements:
      0: false
      1: Assertion failure: (asBits_ & js::gc::CellAlignMask) == 0 (GC pointer is not aligned. Is this memory corruption?), at /home/tjc/gecko-fork/obj-x64-debug/dist/include/js/Value.h:622

“Extended primitive” is a provisional name for records and tuples, which are not objects, but are (in our prototype implementation) represented internally in the compiler as objects; that’s why I’m able to use DumpObject to print out the fields. Under “reserved slots”, it’s showing me the values of the three reserved slots shown at offsets 24, 32, and 40 in the picture above.

Obviously, it’s a warning sign that trying to print out the elements array causes an assertion failure. I would like to be able to print it out using:

call js::DumpValue(((RecordType*) $rcx)->getElements()[0])

since getElements() is a NativeObject method that returns the object’s elements as an array of Values. But this doesn’t work in gdb, so knowing that the offset of the elements_ field is 16, I did the following:

(gdb) p *(ObjectElements::fromElements((HeapSlot*) ($rcx + 16)))
$10 = { flags = 3952501696,  initializedLength = 13525, capacity = 1437286072, length = 21845}
    (gdb) 

I deleted some of the output so as to show only the dynamic fields. The picture above makes this much clearer, but once we access the HeapSlot array stored in the elements_ field, we can call the ObjectElements::fromElements method on it to access the elements header, which is stored physically before the array itself. This header consists of four int32 fields: flags, initializedLength, capacity, and length. This output makes it seem like garbage got written into the object, since the initialized length shouldn’t be 13525.
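
As an aside, the layout trick behind fromElements() can be summarized with a small sketch (illustrative only; these are simplified stand-ins, not the real SpiderMonkey definitions):

#include <cstdint>

// The header sits in memory immediately before the elements array, so
// fromElements() just steps back by the size of the header.
struct ObjectElementsSketch {
  uint32_t flags;
  uint32_t initializedLength;
  uint32_t capacity;
  uint32_t length;

  static ObjectElementsSketch* fromElements(void* elements) {
    return reinterpret_cast<ObjectElementsSketch*>(
        static_cast<char*>(elements) - sizeof(ObjectElementsSketch));
  }
};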

Looking at the slots_ array of the record (which is at offset 8), I observed:

p ((HeapSlot*) (*($rcx + 8)))
$22 = (js::HeapSlot *) 0x55ab3eb8
(gdb) 

This corresponds to this MacroAssembler call:

  storePtr(ImmPtr(emptyObjectSlots), Address(result, NativeObject::offsetOfSlots()));

result here represents the register that contains the record, and offsetOfSlots() is 8. I had copied/pasted this code from existing code that initializes arrays, without looking at it too carefully, but when I read it again, I noticed that emptyObjectSlots is a special value. A comment in NativeObject.h mentions that special singleton values are used when the elements_ and slots_ arrays are empty.

Initializing the elements

And that’s how I realized that some other code that I had copied and pasted out of the existing array initialization code was misplaced:

  // Initialize elements pointer for fixed (inline) elements.
  computeEffectiveAddress(
      Address(result, NativeObject::offsetOfFixedElements()), temp);
  storePtr(temp, Address(result, NativeObject::offsetOfElements()));

This works fine for what it does, but what I needed was the equivalent of the NativeObject::setEmptyElements() method, which does exactly this:

  void setEmptyElements() { elements_ = emptyObjectElements; }

(code)

So what I really wanted was:

  // Initialize elements pointer
  storePtr(ImmPtr(emptyObjectElements),
           Address(result, NativeObject::offsetOfElements()));
                     

And after making that change, the elements header looked a lot better:

(gdb) p (*((ObjectElements*) ($rcx)))
$8 = {flags = 0, initializedLength = 0, capacity = 1, length = 1, static VALUES_PER_HEADER = 2}
(gdb)  call ((JSObject*) $rcx)->dump()
object 17d67b3007b0
  global 2689ff340030 [global]
  class 5555590b7b80 record
  shape 2689ff3663c0
  flags:
  proto null
  reserved slots:
      0 : 0
      1 : 
      2 : false
  properties:
    (gdb) 

There’s more to recount, but I don’t have any more time today.

by Tim Chevalier at December 09, 2021 01:58 AM

December 08, 2021

Víctor Jáquez

GstVA in GStreamer 1.20

It was a year and a half ago when I announced a new VA-API H.264 decoder element in gst-plugins-bad. And it was bundled in the GStreamer 1.18 release a couple of months later. Since then, we have been working on adding more decoders and filters, fixing bugs, and enhancing its design. I wanted to publish this blog post as soon as release 1.20 was announced, but since the development window is now closed, meaning no more new features will be included, I’ll publish it now to create buzz around the next GStreamer release.

Here’s the list of new GstVA decoders (of course, they are only available if your driver supports them):

  • vah265dec
  • vavp8dec
  • vavp9dec
  • vaav1dec
  • vampeg2dec

Also, there are a couple new features in vah264dec (common to all gstcodecs-based H.264 decoders):

  • Supports interlaced streams (vah265dec and vampeg2dec too).
  • Added a compliance property to relax the specification conformance, for example to lower the latency, or to enable non-standard features.

But not only decoders, there are two new elements for post-processing:

  • vapostproc
  • vadeinterlace

vapostproc is similar to vaapipostproc but without the deinterlacing operation, since that was moved to another element. The reason is that there are deinterlacing methods which require holding a list of reference frames; those methods are broken in vaapipostproc, and adding them would needlessly increase the complexity of the element. To keep things simple, it’s better to handle deinterlacing in a different element.

This is the list of filters and features supported by vapostproc:

  • Color conversion
  • Resizing
  • Cropping
  • Color balance (Intel only -so far-)
  • Video direction (Intel only)
  • Skin tone enhancement (Intel only)
  • Denoise and Sharpen (Intel only)

And, I ought to say, HDR is in the pipeline, but it will be released after 1.20.

vadeinterlace, on the other hand, only does deinterlacing, but it supports all the methods currently available in the VA-API specification, using the new way to select the field to extract, since the old one (used by GStreamer-VAAPI and FFMPEG) is a bit more expensive.

Finally, if either video filter cannot handle the incoming format, it is configured in passthrough mode.

But there are not only new elements, there’s also a new library!

Since many other elements need to share a common VADisplay in the GStreamer pipeline, the new library exposes only the GstVaDisplay object for now. The new library must be thin and lean, exposing only what is requested by other elements, such as gst-msdk. For example, the addition of GstContext helpers is pending to be merged after 1.20, and the plan is to expose the allocators and buffer pools later.

Another huge task is encoders. After the freeze, we’ll merge the first implementation of the H.264 encoder, and add more encoders in later iterations.

As I said in the previous blog post, all these elements are ranked as none, so they won’t be autoplugged, for example by playbin. To do so, users need to export the environment variable GST_PLUGIN_FEATURE_RANK as documented.

$ GST_PLUGIN_FEATURE_RANK=vah264dec:MAX,vah265dec:MAX,vampeg2dec:MAX,vavp8dec:MAX,vavp9dec:MAX gst-play-1.0 stream.mp4

Thanks a bunch to He Junyan, Seungha Yang and Nicolas Dufresne, for all the effort and care.


Still, the to-do list is large enough. Just to share what I have in my notes:

  • Add a new upload method in glupload to interop with VA surfaces — though this will hardly be merged since it creates a circular dependency between -base and -bad.
  • vavc1dec — it might need a rewrite of vc1parse.
  • vajpegdec — it needs a rewrite of jpegparse.
  • vaalphacombine — decoding alpha channel with VA within vp9alphacodebin and vp8alphacodebin
  • vamixer — similar to compositor, glmixer or vaapioverlay, to compose a single frame from different video streams.
  • And encoders (mainly H.264 and H.265).

As a final note, GStreamer-VAAPI has entered maintenance mode. The general plan, without any promises or dates, is to deprecate it once most of its use cases are covered by GstVA.

by vjaquez at December 08, 2021 11:58 AM

Tim Chevalier

Observability

I’m realizing that this blog is mostly about debugging, since figuring out how to debug SpiderMonkey is taking up most of my time. Right now, the actual compilation algorithms I’m implementing are straightforward; if I was working on performance tuning an existing implementation or something, I’m sure it would be different. What’s actually hard is finding ways to make generated code more observable. I’m sure that more experienced SpiderMonkey hackers have their own techniques, but since there isn’t a lot of documentation to go on, I’m making it up mostly from scratch.

Having gotten empty records to work, I started working on code generation for the AddRecordProperty opcode, so we can have non-empty records. When I initially tested my code with input x = #{"a": 1}, I got an assertion failure:

Assertion failure: !this->has(reg), at /home/tjc/gecko-fork/js/src/jit/RegisterSets.h:680

Thread 1 "js" received signal SIGSEGV, Segmentation fault.
0x00005555580486c9 in js::jit::SpecializedRegSet<js::jit::LiveSetAccessors<js::jit::TypedRegisterSet >, js::jit::TypedRegisterSet >::add (
    this=0x7fffffffc768, reg=...) at /home/tjc/gecko-fork/js/src/jit/RegisterSets.h:680
680     MOZ_ASSERT(!this->has(reg));

AddRecordProperty takes three operands (the record, the field name, and the value for the field). When I dug into it, it turned out that two of the three operands had been allocated to the same register, which wasn’t right. The CacheIRCompiler module handles register allocation, spilling registers to the stack when necessary, so you can implement new operations without having to worry about those details. But in this case, I had to worry about that detail. Internally, the register allocator keeps a list, operandLocations_ that maps integers (operands are numbered from 0 onward) to physical registers. It turned out that operands 0 and 2 were both allocated to %rax, which I figured out in gdb (irrelevant details snipped):

(gdb) p operandLocations_[0]
$13 = (js::jit::OperandLocation &) @0x7fffffffc680: {
  kind_ = js::jit::OperandLocation::PayloadReg, data_ = {payloadReg = {reg = {
        reg_ = js::jit::X86Encoding::rax, 
                ...
(gdb) p operandLocations_[1]
$14 = (js::jit::OperandLocation &) @0x7fffffffc690: {
  kind_ = js::jit::OperandLocation::ValueReg, data_ = {payloadReg = {reg = {
        reg_ = js::jit::X86Encoding::rbx, 
                ...
(gdb) p operandLocations_[2]
$15 = (js::jit::OperandLocation &) @0x7fffffffc6a0: {
  kind_ = js::jit::OperandLocation::ValueReg, data_ = {payloadReg = {reg = {
        reg_ = js::jit::X86Encoding::rax,  

The BaselineCacheIRCompiler::init() method (code) is called each time a new opcode is compiled, and it assigns a virtual register for each operand. When I compared the code I’d added for AddRecordProperty with the code for the other opcodes that also have 3 operands (such as SetElem), I noticed that all of them passed the third operand on the stack. When I changed AddRecordProperty to work the same way, the problem was fixed. I don’t know how I would have figured this out except by a lucky guess! I’m guessing it’s this way because of the small number of registers on x86, but the internal error-checking here seems lacking; it doesn’t seem good to silently alias distinct operands to each other.
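
For reference, the pattern that the existing three-operand opcodes use in BaselineCacheIRCompiler::init() looks roughly like this (a sketch from memory rather than a quote of the upstream code, using the SetElem case as the model):

case CacheKind::SetElem:
  // The first two operands arrive in the usual value registers...
  allocator.initInputLocation(0, R0);
  allocator.initInputLocation(1, R1);
  // ...while the third operand is read from the baseline frame's stack.
  allocator.initInputLocation(2, BaselineFrameSlot(0));
  break;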

Once I changed the init() method and modified my emitAddRecordPropertyResult() method to expect the third operand on the stack, I found that for the compiled code to be run and not just emitted, I had to embed the code in a function once again:
function f() { x = #{"a":1}; }

(I still don’t understand the details as to when the JIT decides to generate native code.)

The result was a segfault, which was okay for the time being; it means that my code was at least able to generate code for setting a record property, without internal errors.

I realized that I wasn’t using the protocol for calling from the JIT into C++ functions (the callVM() method in the baseline compiler) correctly, but took a break before fixing it.

After taking a break, I fixed that problem only to run into another one: callVM() doesn’t automatically save registers. The macro assembler provides another method, callWithABI() (code), that makes it easier to save and restore registers. Evidently, ABI calls are less efficient than VM calls, but I’m not worried about performance at the moment. I switched it around to use callWithABI() and had to do some debugging by setting breakpoints both in the C++ code for the callee function and in the generated assembly code to make sure all the data types matched up. That worked reasonably well, but alas, I had to switch back to callVM() because I couldn’t figure out how to pass a HandleObject (see the comment at the beginning of RootingAPI.h for a fairly good explanation) into an ABI function. I added code to manually save and restore registers.

That worked okay enough, and my generated code got almost all the way through without a segfault. At the very end, my code calls the existing emitStoreDenseElement method. Internally (in the current implementation; this is all subject to change), records contain a sorted array of keys. Separately from storing the actual value at the correct offset in the record, the key has to be added to the array. (Since records are immutable, the actual sorting only has to happen once, when the FinishRecord opcode executes.)

I was getting a segfault because there was something wrong about the array that I was passing into the generated code for storing dense elements. I haven’t found the problem yet, but (going back to the beginning of this post), I’m trying to find ways to make the generated code more transparent. In this case, I knew that the emitStoreDenseElement code was getting its array input in the %rsi register, so I put a breakpoint in the right place and looked at it in gdb:


Thread 1 "js" received signal SIGTRAP, Trace/breakpoint trap.
0x00001dec1e58ab70 in ?? ()

(gdb) p *((JSObject*) $rsi)
$3 = {<js::gc::CellWithTenuredGCPointer> = { = {
      header_ = {<mozilla::detail::AtomicBaseIncDec> = {<mozilla::detail::AtomicBase> = {
            mValue = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 6046507808}, 
              static is_always_lock_free = true}}, }, }, static RESERVED_MASK = 7, static FORWARD_BIT = 1}, }, 
  static TraceKind = JS::TraceKind::Object, static MAX_BYTE_SIZE = 152}
(gdb) p *((JSObject*) $rsi).getHeader()
Attempt to take address of value not located in memory.
(gdb) p ((JSObject*) $rsi)->header_
$4 = {<mozilla::detail::AtomicBaseIncDec> = {<mozilla::detail::AtomicBase> = {
      mValue = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 6046507808}, 
        static is_always_lock_free = true}}, }, }
(gdb) p ((JSObject*) $rsi)->slots_
There is no member or method named slots_.
(gdb) p ((js::HeapSlot*) ($rsi + 8))
$5 = (js::HeapSlot *) 0x3a6c5de007f8
(gdb) p ((js::HeapSlot*) ($rsi + 8))[0]
$6 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 93824997702264}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 8))[1]
$7 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 64237105842200}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 8))[2]
$8 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 0}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 16))[0]
$9 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 64237105842200}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 16))[0].value
$10 = {asBits_ = 64237105842200}
(gdb) p (ObjectElements::fromElements (((js::HeapSlot*) ($rsi + 16))))
No symbol "ObjectElements" in current context.
(gdb) p (js::ObjectElements::fromElements (((js::HeapSlot*) ($rsi + 16))))
$11 = (js::ObjectElements *) 0x3a6c5de007f0
(gdb) p (js::ObjectElements::fromElements (((js::HeapSlot*) ($rsi + 16))))->initializedLength
$12 = 1
(gdb) 

I’m posting this debugging transcript to show some of the things I’ve discovered so far. In gdb, you can cast a pointer to an arbitrary type and use that to print out the contents according to the physical layout of that type; hence, printing ((JSObject*) $rsi). Weirdly, the NativeObject class (code) seems to be inaccessible in gdb; I’m not sure if this is because NativeObject inherits from JSObject (in this case, I know that the object pointed to by %rsi is a NativeObject), or because of something different about that class. (None of NativeObject, js::NativeObject, or JS::NativeObject seem to be names that gdb recognizes.) That makes it harder to print some of the other fields, but the debugger does know about the header_ field, since it’s defined in the Cell class (code) that JSObject inherits from. The getHeader() method is just supposed to return the header_ field, so I’m not sure why gdb reported the “Attempt to take address of value not located in memory” error.

I knew that NativeObjects have a slots_ field and an elements_ field, physically laid out right after the header_ field, but due to the problem I mentioned with the NativeObject class, gdb won’t recognize these field names. I knew that slots_ would be at offset 8 and elements would be at offset 16, so using some pointer math and casting the pointers to the type that both slots_ and elements_ have (js::HeapSlot*; the HeapSlot class is defined here), I was able to make gdb treat these addresses as HeapSlot arrays. HeapSlot is basically an alias for the Value type, which represents all runtime values in SpiderMonkey (see Value.h).

The slots_ array contains “fixed slots” (attributes, basically); this object’s slots array has two elements that are initialized (the one at index 2 is printed as if its asBits_ field is 0, meaning it’s just a 0 and is probably uninitialized). For array-like objects (which this one is — if you’ve lost track, it’s the array of sorted record keys that’s part of a record), the elements_ array is an array of, well, elements. NativeObject has a getElements method, which casts the HeapSlot array into an ObjectElements object (which has some additional methods on it), but I can’t call any NativeObject methods from gdb, so instead I called the ObjectElements::fromElements method directly to cast it. Since the initializedLength field of the resulting object is 1, which is correct (the previous generated code sets the initialized length to 1, since we’re adding a property to a previously empty record), it seems like the layout is vaguely correct.

That’s as far as I got. Debugging tools for generated code that don’t require all these manual casts and have more access to type information that would make pretty-printing easier would be great! Maybe that exists and I haven’t found it, or maybe it’s just that compiler implementors are a small enough group that making tools more usable isn’t too compelling for anyone. In any case, the hard part about all of this is finding good ways to observe data at runtime and relate it to types that exist in the compiler.

by Tim Chevalier at December 08, 2021 05:31 AM

December 07, 2021

Brian Kardell

Helping Move the Web Forward: The UI Fund

Helping Move the Web Forward: The UI Fund

It is a universal truth that one’s time and energy are, unfortunately, limited. We have to prioritize how we spend them. I’ve talked a lot this year about how implementation funding limits bottleneck standards, but there are other important kinds of limits too. In this piece I’ll talk about one of them, and an exciting new development to help it from the Chrome team.

The truth is, lots more people would love to help move the web forward in some way, but their budgets of “free” time and energy are small. As a result, many potentially good things just don’t happen. Lots of critical stuff that does happen ultimately winds up being unhealthy for people who began work as a labor of love.

Efforts toward potential standardization are especially hard in this respect. Ultimately we need lots more web developers to be involved in finding the ways we move the web forward; in the end, they’ll provide the fitness test anyway. But the truth is that while we have made our processes and efforts theoretically open, the kind of sustained attention and effort that standardization requires is implausibly difficult without the support of one’s employer.

That’s why I was pretty psyched to read this post about Chrome’s new UI Fund by my friend Nicole Sullivan. It aims to add some grant-based funding toward regular people doing lots of the work that helps move the Web forward.

I think this is an amazingly great idea that I’m excited to watch develop. I feel like it is very spiritually aligned with our Open Prioritization efforts - finding ways to invest in the under-addressed parts of the platform and ecosystem that need it.

The hard economics of developer involvement

Why am I so excited? Standardization operates on a timescale completely foreign to most developers. It requires long, sustained efforts and massive coordination of time and attention, where things move forward in fits and starts as planets align. The value proposition is an all-or-nothing reward that might pay off only years down the road.

This is a problem that has intrigued me a long time.

I’ve been very keen on reimagining standards as a more natural process, most of which didn’t happen in a committee. I believe this could help address the economics on both ends of it. I have tried to help with a number of things here aimed at better involvement and lower barriers.

Robin Berjon and later I created a discourse instance we called “Specifiction” which tried to help with some of this. Later, with the jQuery Foundation I helped the (now defunct) chapters.io to try to engage developers directly. I’ve been involved with things like WebWeWant and the WICG.

Most recently I’ve been a part of Open UI - a kind of “supergroup” within WICG focused on UI Components.

There’s a lot I like about it. First, it’s not all-or-nothing. It doesn’t make assumptions about standardization. Standardization into the platform itself is, we hope, a possible outcome for some things - but there are lots of value points along the way, in shorter timeframes, which can potentially help shape the broader ecosystem. Its first aim is research, which will serve any library going forward. It’s a good place for ideas to begin to coalesce and have critical questions asked along the way to keep things on a path that could be easily paved. It should help libraries align, and hopefully shape thought and conversations in the work we all do.

But… It’s still a lot of effort.

What I learned along the way…

What I found along the way is that all of my efforts to shape a process change the calculus somehow in useful ways, but the fundamental time budget/funding issues don’t really go away. In practice, things are moved forward by people who show up and find effective ways to move something forward. That is already terribly hard for the people who work for browsers, but consider what a head start they already have here. First, they have a ton of deep background and insights gained from the fact that they are deeply plugged in. They know about past efforts, related work, engine internals, gotchas to avoid and so on. Thus, they can use their time comparatively efficiently.

It’s probably worth noting here that many of those people tend not to be keen to start such efforts. This can be frustrating and time consuming, and a lot of it is not in your control. Ideas don’t pan out, even for them - and that can be terribly frustrating and demoralizing if you have personally invested a lot.

… And remember, they’re getting paid for it.

How much more then is this asking of someone who isn’t? The truth is, significant developer/designer involvement is kind of disadvantaged from the start and only gets worse.

One improvement might be to sponsor or at least incentivize continued involvement. That would be a lot healthier and maybe make this more plausible.

So check out The UI Fund - and I encourage you to look into this if you’re doing work already and not getting paid, or to look into getting some sponsorship for that stuff you’ve always wanted to try getting involved with.

December 07, 2021 05:00 AM

Tim Chevalier

Further adventures in gdb

So, that updating-every-day plan hasn’t exactly worked out. Working on the code is so much fun that it’s hard to stop, and then I have to choose between going to bed and writing a blog post.

I’m currently working on generating code for the three opcodes related to records: InitRecord, FinishRecord, and AddRecordProperty. The work I wrote about in previous posts wasn’t exactly complete, because it allowed code generation in the presence of records, but I just plugged in no-op implementations for the tryAttachStub() methods in CacheIR.cpp that actually do the work. I still had to change some code generation methods so that existing code could work with records and tuples, but the really fun stuff is what I’m doing now!

Rather than try to cover everything, I’ll summarize my notes from the end of last week. I had added two methods to BaselineCacheIRCompiler.cpp, which is the part of the baseline compiler that generates baseline code from CacheIR code. As an initial test case, I was using a function that just returns an empty record literal:

function f() { x = #{}; }

since I hadn’t yet implemented code generation for adding record properties. The JavaScript bytecode for this function looks like:

loc     op
-----   --
main:
00000:  BindGName "x"                   # GLOBAL
00005:  InitRecord 0                    # GLOBAL 
00010:  FinishRecord                    # GLOBAL 
00011:  SetGName "x"                    # 
00016:  Pop                             # 
00017:  RetRval                         # 

The InitRecord 0 instruction creates an uninitialized record of length 0, and the FinishRecord instruction sets internal flags to make the record read-only. In a non-empty record, one or more AddRecordProperty instructions would appear in between; it would be a dynamic error to invoke an AddRecordProperty instruction after the FinishRecord.

The bytecode above is from executing dis(f) in the interpreter; I also discovered that you can do disnative(f) after CacheIR and then native code are generated for f. This prints out the generated assembly code; it’s actually much easier to read if you execute print(disnative(f)). Invoking the interpreter with the environment variable IONFLAGS=codegen also prints out the code as it’s generated, which is sometimes more useful since it can be interleaved with other debug messages that show which methods are generating which bits of code.

I was getting a segfault, meaning that the code being generated by my emitNewRecordResult() method didn’t match the expectations that the code generated by my emitFinishRecordResult() method had about the data. So I fired up gdb, as I had done before. Trying to zero in on the problem, I was looking at some generated code that took its input in the %rax register (gdb uses the $rax syntax instead.) With careful use of casts and calling accessor methods, it’s possible to look at the C++ representations of the objects that generated code creates. I found it helpful to look at the value of this expression:

(gdb) p (*(*((JSObject*) $rax)).shape()).getObjectClass()
$9 = (const JSClass *) 0x5555590a6130 

From context, I was able to figure out that 0x5555590a6130 was the address of the JSClass structure for the ArrayObject type. This wasn’t what I expected, since the input for this code was supposed to be a record (with type RecordType in C++).

Since this was a few days ago, I’ve lost track of the logical steps in between, but eventually, I put a breakpoint in the createUninitializedRecord() method that I’d added to MacroAssembler.cpp; shape and arrayShape are variable names referring to two registers I allocated within that method.

Thread 1 "js" hit Breakpoint 1, js::jit::MacroAssembler::createUninitializedRecord (
    this=0x7fffffffae00, result=..., shape=..., arrayShape=..., temp=..., temp2=..., 
    recordLength=0, allocKind=js::gc::AllocKind::OBJECT4_BACKGROUND, 
    initialHeap=js::gc::DefaultHeap, fail=0x7fffffffaa38, allocSite=...)
    at /home/tjc/gecko-fork/js/src/jit/MacroAssembler.cpp:493
warning: Source file is more recent than executable.
493   allocateObject(result, temp, allocKind, 0, initialHeap, fail, allocSite);
(gdb) p shape
$10 = {reg_ = js::jit::X86Encoding::rcx, static DefaultType = js::jit::RegTypeName::GPR}
(gdb) p arrayShape
$11 = {reg_ = js::jit::X86Encoding::rcx, static DefaultType = js::jit::RegTypeName::GPR}
(gdb) quit

This debug output shows that the register allocator assigned both shape and arrayShape to the same register, %rcx. As a result, the code I generated to initialize the shapes was using the same shape for both the record object itself, and the internal sortedKeys array pointed to by the record’s sortedKeys field — causing createUninitializedRecord() to return something that looked like an array rather than a record.

The reason this was happening was the following code that I wrote:

 
AutoScratchRegisterMaybeOutput shape(allocator, masm, output);
AutoScratchRegisterMaybeOutput arrayShape(allocator, masm, output);

The AutoScratchRegisterMaybeOutput class (defined in CacheIRCompiler.h) re-uses the output register if possible, so calling it twice allocates both symbolic names to the same register. That wasn’t what I wanted.
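
The fix was to switch to the AutoScratchRegister class. Roughly, the declarations become something like this (a sketch reconstructed from memory rather than the exact patch):

// Each of these claims its own scratch register, so shape and arrayShape
// can no longer alias the output register (or each other).
AutoScratchRegister shape(allocator, masm);
AutoScratchRegister arrayShape(allocator, masm);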

The AutoScratchRegister class tells the register allocator that shape and arrayShape should live in distinct registers. After making that change, repeating the original experiment gives:

(gdb) p (*(*((JSObject*) $rcx)).shape()).getObjectClass()
$5 = (const JSClass *) 0x5555590b5690 
(gdb) c
Continuing....
(gdb) p (*(*((JSObject*) $rbx)).shape()).getObjectClass()
$8 = (const JSClass *) 0x5555590a60f0 
(gdb) 

(To be sure that $rcx and $rbx were the registers allocated to shape and arrayShape, I had to set a breakpoint in my createUninitializedRecord() method again and look at the values of these variables.)

I hope this might be marginally useful to somebody trying to debug SpiderMonkey baseline code generation, or if nothing else, useful to me if I ever have to set aside this work for a few months (let’s be honest, weeks) and come back to it!

by Tim Chevalier at December 07, 2021 02:54 AM

December 04, 2021

Clayton Craft

Quick n' Dirty Mobile IRC/Matrix via Weechat

Due to a current lack of usable Weechat relay clients on Linux that work well with mobile display sizes, and a lack of free time on my part to write one, I've come up with this simple (albeit not elegant) way to "run" Glowing Bear in a way that doesn't take up valuable tab space in Firefox. This essentially just runs a new Firefox window in kiosk mode, so that the tab bar, menus, etc. are hidden, which makes it "feel" a bit more like a "native app" than a web thing running in a browser tab.

~/.local/share/applications/weechat.desktop:

[Desktop Entry]
Name=Weechat
Exec=firefox --kiosk --new-window https://glowing-bear.org
Terminal=false
Type=Application
StartupNotify=true
X-Purism-FormFactor=Workstation;Mobile;

I did experiment with using webkit2gtk and having a very simple wrapper to load Glowing Bear, but the performance of webkit2gtk is not great... scrolling in Glowing Bear was very slow even on the Librem 5.

December 04, 2021 12:00 AM

December 03, 2021

Miyoung Shin

Cache the directionality of the HTML element

Chromium has supported the :dir pseudo-class as an experimental feature (--enable-blink-features=CSSPseudoDir) since M89. I actively worked on this feature, funded by eyeo. While working on :dir in Chromium, it was necessary to solve issues regarding the performance and functionality of the existing direction implementation. This post explains these issues and how I solved them in more detail. I also gave a talk about this at BlinkOn14.

What is directionality?

All HTML elements have their own directionality. Let’s see a simple example.
1) LTR
When we add explicitly a dir attribute to an element like <div dir="ltr">, or in cases where an element that doesn’t have the dir attribute inherits it from its parent that has ltr direction, the element’s directionality is ltr.
2) RTL
When we add explicitly the dir attribute to an element like <div dir="rtl">, or when an element that doesn’t have the dir attribute inherits it from its parent that has rtl direction, the element’s directionality is rtl.
3) dir=auto
The element’s directionality is resolved through its own descendants, and it is determined as the directionality of the closest descendant that can determine directionality among the descendants. This is usually a character with bidirectional information.

How do elements have directionality in Blink?

Before the element cached its directionality, it depended on the direction property in ComputedStyle, and there were functionality and performance problems:
  • We didn’t know the directionality if ComputedStyle had not been created yet.
  • The CSS direction property in ComputedStyle is not exactly the same thing as the element’s directionality.
  • There were cases where the directionality of the element had to be recalculated even when there was no change to the element.
  • We would need to recalculate even more for the :dir pseudo-class.
To solve these problems, we now cache the directionality on the element. You can see the detailed signatures in node.h. These are the rules for caching the element’s directionality:
  • If dir=ltr|rtl, update the directionality of the element and its descendant immediately.
  • If dir=auto, resolve the directionality and update the element and its descendant immediately.
  • If no valid dir, use the directionality of the parent before parsing children.
  • If the child is inserted with no valid dir, use the directionality of the parent.
  • Complete caching the directionality of all elements before calculating the style.
Note: We have exception handling for the <slot> element so that it is accessed via the flattened tree, but that part of the implementation is still in progress, since the expected behavior is still unclear and under discussion in the web spec.
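
To make these rules concrete, here is a small, self-contained sketch of the resolution logic (illustrative pseudo-C++ only, not actual Blink code; Element and the helper below are hypothetical stand-ins for what node.h really provides):

#include <optional>

enum class TextDirection { kLtr, kRtl };

// Hypothetical, simplified element; real Blink keeps the cached direction in bits on Node.
struct Element {
  std::optional<TextDirection> dir_attribute;  // explicit dir=ltr or dir=rtl
  bool dir_is_auto = false;                    // dir=auto
  Element* parent = nullptr;
  TextDirection cached_direction = TextDirection::kLtr;
};

// Placeholder for the first-strong-character walk used to resolve dir=auto.
TextDirection FirstStrongDirectionInDescendants(const Element&) {
  return TextDirection::kLtr;
}

// Resolve the directionality to cache for an element, following the rules above.
TextDirection ResolveDirection(const Element& e) {
  if (e.dir_attribute)  // explicit dir=ltr|rtl wins
    return *e.dir_attribute;
  if (e.dir_is_auto)    // dir=auto: closest descendant that determines a direction
    return FirstStrongDirectionInDescendants(e);
  if (e.parent)         // no valid dir: inherit the parent's cached directionality
    return e.parent->cached_direction;
  return TextDirection::kLtr;  // document default
}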

Improved performance for dir=auto by reducing directionality recalculation

Unfortunately, there are no comprehensive measurements of how much performance improved after caching the directionality of all elements, because it is tricky to measure. Theoretically, however, this is clearly a way to improve performance. In addition, a few follow-up patches did get performance measurements, and we were able to see improvement. For instance, we do not adjust the directionality if the new text’s direction is the same as the old one or the text has no strong directionality, and we drop out of the tree walk once the node whose children changed has been passed during traversal. The results are shown in the talk presented at BlinkOn14, which you can also watch online (47:40 min). I will post about the :dir pseudo-class when we clarify <slot> handling and support the feature by default in Chromium. Thanks all!

by mshin at December 03, 2021 04:15 AM

December 02, 2021

Danylo Piliaiev

:tada: Turnip is Vulkan 1.1 Conformant :tada:

Khronos submission indicating Vulkan 1.1 conformance for Turnip on Adreno 618 GPU.

It is a great feat, especially for a driver which is created without hardware documentation. And we support features far from the bare minimum required for conformance.

But first of all, I want to thank and congratulate everyone working on the driver: Connor Abbott, Rob Clark, Emma Anholt, Jonathan Marek, Hyunjun Ko, Samuel Iglesias. And special thanks to Samuel Iglesias and Ricardo Garcia for tirelessly improving Khronos Vulkan Conformance Tests.


At the start of the year, when I started working on Turnip, I looked at the list of failing tests and thought “It wouldn’t take a lot to fix them!”, right, sure… And so I started fixing issues alongside looking for missing features.

In June there were even more failures than there were in January; how could that be? Of course we were adding new features, and that accounted for some of them. However, even this list was likely not exhaustive, because in GitLab CI, instead of running the whole Vulkan CTS suite, we ran 1/3 of it. We didn’t have enough devices to run the whole suite fast enough to make it usable in CI, so I just ran it locally from time to time.

1/3 of the tests doesn’t sound bad and for the most part it’s good enough since we have a huge amount of tests looking like this:

dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_copy
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_copy_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_load
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_load_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_texture
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_texture_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_copy
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_copy_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_load
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_load_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_texture
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_texture_format_list
...

Every format, every operation, etc. Tens of thousands of them.

Unfortunately, the selection of tests for a fractional run is as straightforward as possible - just every third test. That bites us when there are single unique tests, like:

dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_depth
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_depth
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_depth_no_attachment
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil_no_attachment
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_depth_no_attachment
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil_no_attachment
...

Most of them test something unique that has a much higher probability of triggering a special path in a driver compared to the uncountable image tests. And they fell through the cracks. I even had to fix one test twice because the CI didn’t run it.

A possible solution is to skip tests only when there is a large swath of them and run smaller groups as-is. But it’s likely more productive to just throw more hardware at the issue =).

Not enough hardware in CI

Another trouble is that we had only one 6xx sub-generation present in CI - Adreno 630. We distinguish four sub-generations. Not only do they have some different capabilities, there are also differences in the existing ones, causing the same test to pass in CI and be broken on another, newer GPU. Presently in CI we test only Adreno 618 and 630, which are “Gen 1” GPUs, and we claimed conformance only for Adreno 618.

Yet another issue is that we can render in either tiling or bypass (sysmem) mode. That’s because there are a few features we can support only when there is no tiling and we render directly into sysmem, and sometimes rendering directly into sysmem is just faster. At the moment we use tiled rendering by default unless we hit an edge case, so by default CTS tests only tiled rendering.

We are forcing sysmem mode for a subset of tests in CI; however, that’s not enough, because the difference between the modes is relevant for more than just a few tests. Thus ideally we should run twice as many tests, and even better would be three times as many, to also account for tiling mode without a binning vertex shader.

That issue became apparent when I implemented a magical eight-ball to choose between tiling and bypass modes depending on run-time information in order to squeeze out more performance (it’s still work-in-progress). The basic idea is that a single draw call or a few small draw calls are faster to render directly into system memory instead of loading the framebuffer into tile memory and storing it back. But almost every single CTS test does exactly this: a single draw call or a few draw calls per render pass, which causes all tests to run in bypass mode. Fun!

Now we would be forced to deal with this issue, since with the magic eight-ball games would run partly in tiling mode and partly in bypass mode, making both equally important for real-world workloads.

Does conformance matter? Does it reflect anything real-world?

Unfortunately no test suite could wholly reflect what game developers do in their games. However, the number of tests keeps growing, and new tests are contributed based on issues found in games and other applications.

When I ran my stash of D3D11 game traces through DXVK on Turnip for the first time, I found a bunch of new crashes and hangs, but it took fixing just a few of them for the majority of games to render correctly. This shows that Khronos Vulkan Conformance Tests are doing their job and we at Igalia are striving to make them even better.

by Danylo Piliaiev at December 02, 2021 10:00 PM

Samuel Iglesias

VK_EXT_image_view_min_lod Vulkan extension released

One of the extensions released as part of Vulkan 1.2.199 was the VK_EXT_image_view_min_lod extension. I’m happy to see it published, as I participated in the release process of this extension: from reviewing the spec exhaustively (I even contributed a few things to improve it!) to developing CTS tests for it that will eventually be merged into the CTS repo.

This extension was proposed by Valve to mirror a feature present in Direct3D 12 (check ResourceMinLODClamp here) and Direct3D 11 (check SetResourceMinLOD here). In other words, this extension allows clamping the minimum LOD value accessed by an image view to a minLod value set at image view creation time.

That way, any library or API layer that translates Direct3D 11/12 calls to Vulkan can use the extension to mirror the behavior above on Vulkan directly without workarounds, facilitating the port of Direct3D applications such as games to Vulkan. For example, projects like Vkd3d, Vkd3d-proton and DXVK could benefit from it.

Going into more detail, this extension changes how the image level selection is calculated and, when enabled, sets an additional minimum on the image level used for integer texel coordinate operations.

The way to use this feature in an application is very simple:

  • Check that the extension is supported and that the physical device supports the respective feature:
// Provided by VK_EXT_image_view_min_lod
typedef struct VkPhysicalDeviceImageViewMinLodFeaturesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           minLod;
} VkPhysicalDeviceImageViewMinLodFeaturesEXT;
  • Once you know everything is working, enable both the extension and the feature when creating the device.

  • When you want to create a VkImageView that defines a minLod for image accesses, add the following structure, filled with the value you want, to VkImageViewCreateInfo’s pNext.

// Provided by VK_EXT_image_view_min_lod
typedef struct VkImageViewMinLodCreateInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    float              minLod;
} VkImageViewMinLodCreateInfoEXT;

And that’s all! As you see, it is a very simple extension.
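
For illustration, putting the pieces together might look roughly like the following sketch (physical_device, device, image and format are assumed to already exist in the application, error handling is omitted, and the minLod value is arbitrary):

// 1. Query whether the minLod feature is supported.
VkPhysicalDeviceImageViewMinLodFeaturesEXT min_lod_features = {};
min_lod_features.sType =
    VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_IMAGE_VIEW_MIN_LOD_FEATURES_EXT;

VkPhysicalDeviceFeatures2 features2 = {};
features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
features2.pNext = &min_lod_features;
vkGetPhysicalDeviceFeatures2(physical_device, &features2);
// min_lod_features.minLod is now VK_TRUE if the feature is available. Remember to
// enable the extension and chain this structure (with minLod = VK_TRUE) into
// VkDeviceCreateInfo::pNext when creating the device.

// 2. Create an image view that clamps accesses to mip level 2 and above.
VkImageViewMinLodCreateInfoEXT min_lod_info = {};
min_lod_info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_MIN_LOD_CREATE_INFO_EXT;
min_lod_info.minLod = 2.0f;

VkImageViewCreateInfo view_info = {};
view_info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
view_info.pNext = &min_lod_info;  // chain the min LOD structure
view_info.image = image;
view_info.viewType = VK_IMAGE_VIEW_TYPE_2D;
view_info.format = format;
view_info.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
view_info.subresourceRange.baseMipLevel = 0;
view_info.subresourceRange.levelCount = VK_REMAINING_MIP_LEVELS;
view_info.subresourceRange.baseArrayLayer = 0;
view_info.subresourceRange.layerCount = 1;

VkImageView view;
vkCreateImageView(device, &view_info, nullptr, &view);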

Happy hacking!

December 02, 2021 02:48 PM

November 29, 2021

Brian Kardell

Webrise

Webrise

Something in a recent piece by Jeremy Keith really clicked with something I’ve been thinking about, so I thought I’d write about it.

Recently Jeremy Keith published “The State of the Web”. It’s based on an opening talk from the An Event Apart Spring Summit earlier this year. He’s also made it available in audio format. Jeremy is a great storyteller/writer/speaker, so it is unsurprisingly a delightful read/listen, and I couldn’t recommend it enough. In it, he (beautifully) explains a lot about perspective. He holds up that

“Astronauts have been known to experience something called the overview effect. It’s a profound change in perspective that comes from seeing the totality of our home planet in all its beauty and fragility.”

He notes that the famous “earthrise” photo (below) that the astronauts took, gave everyone here on earth a very small taste of that too.

Earthrise, taken on December 24, 1968, by Apollo 8 astronaut William Anders.

Then he asks...

“I wonder if it’s possible to get an overview effect for the World Wide Web?”

When I heard this, I realized: This is exactly what I have been trying to get at too, just from a different angle.

Zoom out...

I have been trying to ask people to put aside a lot of the conversations that we typically have for a moment. Zoom out, and see the whole ecosystem.

It’s not that all of things we talk about today are unimportant, in fact, some of them are profoundly important - but when you zoom way out and look at the whole thing, you gain some new perspective on all of them... And more.

Gaining new perspective can have big impacts. In How we got to now Steven Johnson describes how mirrors, and the simple ability to see one’s self ultimately impacted art, literature and politics. It literally helped shape the world in profound ways.

Since coming to work at Igalia, I’ve gotten the rare privilege of observing the web ecosystem from a whole new point of view. That is, not the sites and pages, but what makes all of that stuff possible and holds it all together. This has caused exactly that sort of “overview effect” shift for me, and I really want to share it.

There is, unfortunately, no camera with which to snap a nice neat “Webrise” photo that I can distribute, nor a mirror I can just show people. So, I’ll try to use words.

New Perspective

We spend so much time discussing particular details: “Why are none of them giving time to feature Q?”. Or, “Why does z push so many (or few) features?”. Or even, “Why are they all doing x instead of y?!”. We imagine larger motives. We fill volumes with debates.

But, from my vantage point, I see something that informs all of those, and seems far more important.

We’ve built the web so far on a very particular model, with web engine implementers at the center. The whole world is leaning extremely heavily on the independent, voluntary funding (and management and prioritization) of a very few steward organizations.

Maybe that isn’t a great idea.

Fragility

However much we might have convinced ourselves that this is how it should work, it feels increasingly bad to me. It seems like it is not a lasting strategy, and we really need one that is. Web engines have to outlast each of those voluntary investments/investors. The current situation feels precarious.

There are, of course, a lot of variables, but I can easily imagine a lot of ways that it could all come apart - either with a bang, or a whimper.

Imagine, for example, that Apple is convinced to match or even exceed Google’s contributions. Yay! A boom in innovation! Great! It seems hard to imagine Mozilla not being left behind. Maybe Google responds in kind and we enter a sort of new “arms race”. Again, great for a lot of things, but sustainability isn’t one. In this scenario it feels almost certain to me that Mozilla (the only foundation here) is the first casualty, but maybe not the last. The aim of a war is to win. And then what? Microsoft won the first browser wars, and then left the game.

Or perhaps legislation hits Google’s default search deal and seriously disrupts things in the ecosystem. That’s pretty much all of the actual funding for Mozilla. Uh oh. Interestingly too though, Apple’s entire earnings would suddenly dip by something like 15-20%. Yikes! Perhaps there are simply changes in leadership. Or several of these happen together. All sorts of things like these cause companies to re-evaluate what they’re spending money on. If any of these things caused reevaluation by either company, it’s not impossible for me to imagine them deciding that maybe the costs outweigh the benefits of maintaining an engine in the first place. That’s an entirely normal response and there are historical precedents: Opera did it. Microsoft did it. And the problems in this space only get harder and harder (see next section).

The only thing I can say for sure is that things will change for businesses we’re leaning on. In fact, things have changed. In 1993 when the web was still in its infancy - Microsoft had just entered the top 10 by market cap. By actual revenue, they weren’t even on the list. In fact, the first company who makes an engine to appear on the top 10 list by revenue was Apple, in 2014… At the same time, there are several other tech companies who are also on the list who don’t invest in implementations at all. Many have come and gone since. It is incredibly rare for a business to stay on the Fortune 500 list (let alone dominate it) for more than 10-15 years. When this status is lost, actions and reevaluation usually follow and as a result key dominating names in computing have disappeared entirely.

Not just more sustainable… More.

Being reliant on the historical model isn’t just possibly precarious in the long term - it also has definite limits. All of the engine teams, no matter how big, have to do some fairly radical prioritization. The backlog is already miles long, and subject to lots of filters. It’s not just how big a team is, but tons of mundane things about the makeup, expertise, and often the vision and current state of some area of code in their engine.

A really basic implication of this is that rollouts of features can be extremely ragged, but it’s much more than that. It also means that they have to short-circuit things where they can. Even in standards discussion, it means a lot of potentially good stuff just can’t get discussed. In the end, it’s hard to work through all of those things in a way that can easily be called representative of everyone if only a few are investing in the commons.

Solvable, but not solved anywhere

Luckily, this is all very solvable, and is very much in our control. To some extent we’ve already started to address it: there are, today, more limited partners too. I think that somehow people have impressions of this, but we don’t talk about the actual details much, so let’s…

It’s held up, for example, that Microsoft, Samsung and Intel are all Chromium partners. That’s great, and that isn’t even counting Igalia, who has, for the last few years, made more contributions than anyone outside of Google.

Others, I’ve heard say that Mozilla has a veritable army of independent contributors too.

Conversely, I often hear WebKit described as “mainly an Apple thing”. However, there are partners there too. Igalia, Sony and RedHat all contribute significantly, for example.

However, if we look at commits: Over 80% of contributions to Chromium come from Google, about 77% of contributions in WebKit come from Apple, and at Mozilla Central - about 82% of commits are from Mozillans.

In other words, they aren’t all that different in terms of diversity of investment. Each engine project would seem to have about 20-25% of its investment diversified. That’s way better than exclusive investment, but I think we’ve still got a long way to go.

One obvious solution seems to be for existing implementation partners to simply ramp up contribution budgets.

Sure, that would be great on its own, but that's still a really small number of organizations. There are many big tech web companies who aren't on that list at all; at least one of them is in the "trillion dollar club".

Imagine what we could do if we changed our perspective, and built a model in which we invested and prioritized more collaboratively. Imagine how much more resilient that would be.

Collective Funding for Collective Benefits

We’ve spent a lot of time trying to solve problems together in standards, but we don’t then also act together. But… we could.

In fact, why should we stop at a dozen big tech companies making gigantic or general investments? We could decide that investments could also be shared in different ways. Funding doesn't need to come from giant sources, or to be generally purposed. 10 companies agreeing to invest $10k apiece to advance and maintain some area of shared interest is every bit as useful as 1 agreeing to invest $100k generally. In fact, maybe it's more representative.

Igalia has helped advance things that boost capabilities for everyone by working with individual organizations, often much smaller ones, who have quite finite asks: more responsive cable box interfaces, or more fluid SVG interfaces on their cooking machines. We do this precisely because we can see the interconnectedness of it.

We believe that there is a very long tail of increasingly smaller companies who could do something, if only they coordinated to fund it together. The further we stretch this out, the more sources we enable, the more its potential adds up.

That’s part of what our Open Prioritization efforts have been about. We’re trying to shine light on this in different ways, open new doors and help people see the web ecosystem from a different perspective.

My colleague Eric Meyer recently gave a talk to W3C member organizations on this topic, and we did a podcast together on it too, as part of announcing our new MathML-Core Support Collective. You can find links to both and learn more about it in this announcement.

If you find this interesting, please let us know. Consider talking to organizations interested in promoting more rapidly interoperable, standard, and accessible mathematical support on the web about adding some supporting funding through the collective - but also to organizations who aren't interested in math specifically about the bigger idea. I'm hopeful that we can shift our perspective.

November 29, 2021 05:00 AM

November 23, 2021

Alexander Dunaev

Drop shadows on Linux, or why standards are good

Since the origins of graphical desktop environments, there have been two approaches to styling the GUI of an application: using the standard system toolkit or choosing a custom one.

When a single platform is targeted, choosing the approach is often a matter of aesthetics or of particular features that may be supported only in certain toolkits. The additional cost of adopting a custom toolkit may actually be a one-time investment, and if the decision to use it is taken at the right time, the cost may be low. However, when it comes to cross-platform applications, using a cross-platform toolkit is the obvious choice.

GUI toolkits do a good job at rendering the contents of the window, but there is an area where they usually step aside: window decorations. Even if we look at cross-platform toolkits, the best they can do is provide some façade for the standard options available on supported platforms. But what if we want to customise everything?

Let us take a look at some random window in a modern desktop environment.

This is KCalc, the standard calculator application built into the KDE Plasma desktop environment.

KCalc, the standard application built into KDE Plasma

What if we wanted to replicate that on our own? At first glance, no big deal. Drawing the title bar would not be that difficult, as long as we render everything in the window. The border is easy too, and rounded corners are also feasible if the window manager supports transparency.

But the window also has a drop shadow. We have to render it too, and this is where things become tricky.


KCalc vs. Chromium, note how different the shadows are

Yes, the drop shadow is essentially just one more area inside the window: we have to render it, and we also have to make things around it work smoothly. The inner strip of the shadow should act as the frame of the window, where the user would see the resize mouse pointer (and resizing should work that way), while the outer part should be totally transparent to mouse events, but not to the user's eye.

The outermost rectangle is the edge of the “real” window; the innermost one is the “logical” one. The narrow strip (partially striped) that borders the logical window is the resize area.

Basically, to be able to do what we have just explained, we need two things. The first one is support for transparency in the window manager. The second one is some way to tell the window manager where our "logical" window resides within the "real" one, so that the environment can correctly snap our window to the edge of the screen or to other windows when we drag it there. (The inner part that makes sense as a window to the user is often called the "window geometry".)

On Wayland, transparency is always supported (yay!), and the concept of the window geometry is part of the desktop shell protocol, such as xdg_wm_base. Both requirements are met.

On X11 it is more complicated. First, transparency is not always supported, but let us assume that we have that support, otherwise we cannot have any shadows. The major pain is setting the window geometry, or rather, the lack (at the time of writing) of a standard way to do so. There is a _GTK_FRAME_EXTENTS window property that, as its name suggests, was once introduced in GTK. There it seems to be used to define margins at the edges of the window—you may ask, "it seems"? Are you not certain? Well, yes, because that property is not documented. There are a few other posts about this issue on the internet. I would recommend What are _GTK_FRAME_EXTENTS and how does Gnome Window Sizing work? by Erwin and CSD support in KWin by Vlad Zahorodnii.

Currently _GTK_FRAME_EXTENTS is supported by GNOME (naturally) and KDE Plasma (reverse engineered). In other desktop environments (or rather, in window managers other than Mutter and KWin), setting it may cause weird issues.

Precisely that issue is what happened to Chromium.

With regard to window decorations, the Linux port of Chromium lagged behind for a very long time. It had an old-style thick frame with sharp corners and no drop shadow. Finally, that was improved, and the modern window decorations shipped in Chromium version 94. The new implementation used _GTK_FRAME_EXTENTS to define the shadow area.

Soon after that, a bug report came from users of Enlightenment. In that environment things inside the Chromium window went mad: mouse clicks strayed from the actual position of the pointer. The quick investigation (it was really quick thanks to the help of people who reported the problem) showed that the culprit was that very window property. The window manager got confused when the frame extents were set to zeros for a maximised window; instead, it expected the property to be reset completely.

Soon after we landed the fix, and people from Enlightenment confirmed that the issue was resolved, another bug report came, this time from Xfce. There, the investigation was a bit longer, but finally we found (thanks to the help of people who reported the problem and to the maintainers of the window manager) that the window manager in that environment actually expects quite the opposite: for the maximised window it wants all zeros, and gets confused if the property is reset completely.

The situation came to a dead end. Two window managers wanted exactly the opposite things. What could be done to resolve the issue? We could easily end up having workarounds for every non-standard window manager, which is one of the most unpleasant situations in software maintenance.

Luckily, the maintainers of Xfwm4 (the window manager in Xfce) suggested fixing the issue from their side, and landed the fix really promptly. So this story has a happy ending!

Or rather, the story will have a happy ending, because we still had to put in a workaround for Xfwm4 that disables window decorations on that window manager. The workaround is temporary, and we will remove it once the Linux distributions shipping Xfwm4 adopt the fix.

by Alex at November 23, 2021 01:19 PM

November 19, 2021

Tim Chevalier

The emotional roller coaster that is programming

I skipped a few days’ worth of updates; turns out it’s a bit difficult to fit in time to write an entire post when your work schedule isn’t very consistent.

In the meantime, I finished implementing all the record and tuple opcodes in the JIT. Having done some manual testing, it was time to start running the existing test suite with the compiler enabled. Fortunately, I figured out the flag to pass in so that I wouldn’t have to add it to each test file by hand (which would have been bad practice anyway):

mach jstests Record --args=--baseline-eager

This runs all the tests with Record in the name and passes the --baseline-eager flag to the JavaScript shell.

At this stage, failures are good — it means there’s still something interesting left to work on. Yay, a failure!

Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
REGRESSION - non262/Record/equality.js
[7|1|0|0] 100% ======================================================>|   1.1s
REGRESSIONS
    non262/Record/equality.js
FAIL
 

Narrowing down the code that caused the failure, I got:

js> Object.is(#{x: +0}, #{x: -0})
Object.is(withPosZ, withNegZ)
Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745

Thread 1 "js" received signal SIGSEGV, Segmentation fault.
0x00005555583763f8 in js::jit::CallIRGenerator::tryAttachObjectIs (this=0x7fffffffcd10, callee=...)
    at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
7745            MOZ_CRASH("Unexpected type");
(gdb) 

So this told me that I hadn’t yet implemented the cases for comparing records/tuples/boxes to each other in Object.is() in the JIT.

Fixing the problem seemed straightforward. I found the CallIRGenerator::tryAttachObjectIs() method in CacheIR.cpp. The CallIRGenerator takes care of generating code for built-in methods as they're called; each time a known method is called on a particular combination of operand types that the baseline compiler supports, code gets generated that will be called the next time, instead of either interpreting the code or re-generating it from scratch.

For example, this code snippet from tryAttachObjectIs() shows that the first time Object.is() is called with two int32 operands, the compiler will generate a version of Object.is() that’s specialized to this case and saves the need to call a more generic method and do more type checks. Of course, the generated code has to include a check that the operand types actually are int32, and either call a different generated method or generate a new stub (specialized version of the method) if not.

    MOZ_ASSERT(lhs.type() == rhs.type());
    MOZ_ASSERT(lhs.type() != JS::ValueType::Double);

    switch (lhs.type()) {
      case JS::ValueType::Int32: {
        Int32OperandId lhsIntId = writer.guardToInt32(lhsId);
        Int32OperandId rhsIntId = writer.guardToInt32(rhsId);
        writer.compareInt32Result(JSOp::StrictEq, lhsIntId, rhsIntId);
        break;
      }

The existing code handles cases where both arguments have type Int32, String, Symbol, Object, et al. So it was easy to follow that structure and add a case where both operands have a box, record, or tuple type. After a fun adventure through the MacroAssembler, I had all the pieces implemented and the test passed; I was able to apply Object.is() to records (etc.) with the baseline compiler enabled.
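
For reference, the behavior that equality.js exercises is, as I understand the proposal's semantics, that Object.is() (unlike ===) distinguishes +0 and -0 inside a record:

js> Object.is(#{x: +0}, #{x: -0})
false
js> #{x: +0} === #{x: -0}
true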

After that, all the tests for records passed, which isn’t too surprising since there aren’t many methods for records. Next, I tried running the tests for what’s currently called Box in the Records and Tuples proposal (subject to change), and got more failures; still a good thing.

mach-with record-tuple-with-jit jstests Box --args=--baseline-eager
[1|0|0|0]  20% ==========>                                            |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/unbox.js
[1|1|0|0]  40% =====================>                                 |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/json.js
[1|2|0|0]  60% ================================>                      |   1.3s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/constructor.js
[2|3|0|0] 100% ======================================================>|   1.3s
REGRESSIONS
    non262/Box/unbox.js
    non262/Box/json.js
    non262/Box/constructor.js
FAIL

The common cause: generating code for any method calls on Boxes invokes GetPropIRGenerator::tryAttachPrimitive() (also in CacheIR.cpp as above), which didn’t have a case for records/tuples/boxes. (In JavaScript, a method is just another property on an object; so the GetProp bytecode operation extracts the property, and calling it is a separate instruction.) Similarly to the above, I added a case, and the code worked; I was able to successfully call (Box({}).unbox()) with the compiler enabled.
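
To make the GetProp/call split concrete, here is the same idea with an ordinary object (plain JavaScript, nothing Box-specific; the property lookup and the invocation are separate steps, and it's the lookup that goes through GetPropIRGenerator):

const obj = { unbox() { return 42; } };
const method = obj.unbox;  // property access: this is the GetProp operation
method.call(obj);          // the call itself is a separate instruction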

The next test failure, in json.js, was harder. I minimized the test case to one line, but wasn’t able to get it any simpler than this:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

This code calls the JSON.stringify() standard library method on the value Box({}) (a box wrapped around an empty object); the second argument is a function that’s applied to the value of each property in the structure before converting it to a string. The fix I made that fixed unbox.js got rid of the MOZ_CRASH(unexpected type) failure, but replaced it with a segfault.

It took me too many hours to figure out that I had made the mistake of copying/pasting code without fully understanding it. The cached method stubs rely on “guards”, which is to say, runtime type checks, to ensure that we only call a previously-generated method in the future if the types of the operands match the ones from the past (when we generated the code for this particular specialization of the method). When making the change for Object.is(), I had looked at CacheIRCompiler.cpp and noticed that the CacheIRCompiler::emitGuardToObject() method generates code that tests whether an operand is an object or not:

bool CacheIRCompiler::emitGuardToObject(ValOperandId inputId) {
  JitSpew(JitSpew_Codegen, "%s", __FUNCTION__);
  if (allocator.knownType(inputId) == JSVAL_TYPE_OBJECT) {
    return true;
  }

  ValueOperand input = allocator.useValueRegister(masm, inputId);
  FailurePath* failure;
  if (!addFailurePath(&failure)) {
    return false;
  }
  masm.branchTestObject(Assembler::NotEqual, input, failure->label());
  return true;
}

The generated code contains a “failure” label that this code branches to when the operand inputId is not an object. (It’s up to the caller to put appropriate code under the “failure” label so that this result will be handled however the caller wants.) I copied and pasted this code to create an emitGuardToExtendedPrimitive() method (“extended primitives” are what we’re calling records/tuples/boxes for now), and changed JSVAL_TYPE_OBJECT to JSVAL_TYPE_EXTENDED_PRIMITIVE so that the code would check for the “extended primitive” runtime type tag instead of the “object” type tag. The problem is that I also needed to use something else instead of branchTestObject. As it was, whenever a stub that expects a record/tuple/box as an argument was generated, it would be re-used for operands that are objects. This is obviously unsound and, looking at the failing test case again, we can see why this code exposed the bug:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

The first time the (key, value) anonymous function is called, the name value is bound to Box({}). So a stub gets generated that’s a version of the typeof operation, specialized to Box things (actually anything that’s a record, tuple, or box, for implementation-specific reasons). The stub checks that the operand is a record/tuple/box, and if so, returns the appropriate type tag string (such as “box”). Except because of the bug that I introduced, this stub got re-used for any object operands. The way that the JSON stringify code works (JSON.cpp), it calls the “replacer” (i.e. the anonymous (key, value) function) on the value of each property — but then, it calls the replacer again on the replaced value. So my generated stub that worked perfectly well for Box({}) was subsequently called on {x: {}}, which has an entirely different type; hence the segfault.
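
Spelled out as code, the sequence of replacer calls that tripped the stub looks roughly like this (a sketch of my understanding of JSON.cpp, not a trace of the real internals):

const replacer = (key, value) =>
  typeof value === "box" ? { x: value.unbox() } : value;

// 1st call: value is Box({})  -- typeof hits the record/tuple/box stub; the replacer returns {x: {}}
// 2nd call: value is {x: {}}  -- an ordinary object, but the buggy guard let it reuse the same stub
JSON.stringify(Box({}), replacer);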

Finding this bug took a long time (partly because I couldn’t figure out how to enable the “CacheIR spew” code that prints out generated CacheIR code, so I was trying to debug generated code without being able to read it…), but I experimented with commenting out various bits of code and eventually deduced that typeof was probably the problem; once I read over the code related to typeof, I spotted that my emitGuardToExtendedPrimitive() method was calling branchTestObject(). Adding a branchTestExtendedPrimitive() method to the macro assembler was easy, but tedious, since the code is architecture-specific. It would be nice if dynamic type-testing code was automatically generated, since the code that tests for tags denoting different runtime types is all basically the same. But rather than trying to automate that, I decided it was better to bite the bullet, since I already had enough cognitive load with trying to understand the compiler as it is.

It turned out that the json.js test case, despite being designed to test something else, was perfect for catching this bug, since it involved applying the same method first to a Box and then to an object. Once I’d fixed the problem with guards, this test passed. The constructor.js test still fails, but that just means I’ll have something interesting to work on tomorrow.

Perhaps the swings from despair to elation are why programming can be so habit-forming. While trying to track down the bug, I felt like I was the dullest person in the world and had hit my limit of understanding, and would never make any further progress. When I found the bug, for a moment I felt like I was on top of the world. That’s what keeps us doing it, right? (Besides the money, anyway.)

By the way, I still don’t fully understand inline caching in SpiderMonkey, so other resources, such as “An Inline Cache Isn’t Just A Cache” by Matthew Gaudet, are better sources than my posts. I mean for this blog to be more of a journal of experimentation than a definitive source of facts about anything.

by Tim Chevalier at November 19, 2021 07:28 AM

November 17, 2021

Manuel Rego

The path of bringing :focus-visible to WebKit

Last weekend I was speaking at CSS Conf Armenia 2021 about the work Igalia has been doing adding support for :focus-visible in WebKit.

The slides of my talk are available on this blog and the video is on Igalia’s YouTube channel.

The presentation is divided into 4 parts:

  1. An introduction to the :focus-visible feature, paying attention to some special details.
  2. An explanation of the Open Prioritization effort from Igalia that led to the implementation of :focus-visible in WebKit.
  3. A summary of the work done during this year.
  4. Some discussion about the next steps looking forward to shipping :focus-visible in Safari/WebKit.

Last but not least, thanks again to all the people and organizations that have sponsored the implementation of :focus-visible in WebKit. We're closer than ever to seeing it ship there. We'll keep you posted!

November 17, 2021 11:00 PM

November 16, 2021

Igalia Compilers Team

Recent talks at GUADEC and NodeConf

Over the summer and now going into autumn, Igalia compilers team members have been presenting talks at various venues about JavaScript and web engines. Today we’d like to share with you two of those talks that you can watch online.

First, Philip Chimento gave a talk titled “What’s new with JavaScript in GNOME: The 2021 edition” at GUADEC 2021 about GNOME’s integrated JavaScript engine GJS. This is part of a series of talks about JavaScript in GNOME that Philip has been giving at GUADEC for a number of years.

You can watch it on Youtube here and the slides for the talk are available here.

Screenshot of NodeConf 2021 talk

Second, Romulo Cintra gave a talk at NodeConf Remote 2021 titled “IPFS – InterPlanetary File System with Node.js”. In this talk, Romulo introduces IPFS: a new distributed file system protocol for sharing files and media in a peer-to-peer fashion. Romulo also talks about some of the efforts to bring this to the web (https://arewedistributedyet.com/) and goes over how IPFS can be used with Node.js.

You can watch Romulo’s talk on YouTube as well by going here.

The slides for the talk are available here or you can even use IPFS to download it: ipfs://QmQCZaHJBZVFncftY8YGsS3BEbgA9Pu6B3JT4gdE7EhELD

by Compilers Team at November 16, 2021 04:26 PM

November 10, 2021

Tim Chevalier

Adventures in gdb

I picked up from yesterday wanting to see what code was being generated for record initialization. A colleague pointed me to a page of SpiderMonkey debugging tips. This was helpful, but required being able to run the JS interpreter inside GDB and type some code into the REPL. The problem is that before it got to that point, the interpreter was trying to compile all the self-hosted code; I knew that this wasn’t going to succeed since I’ve only implemented one of the record/tuple opcodes. I wanted to be able to just do:

> x = #{}

(binding the variable x to an empty record literal) and see the generated code. But because the much-more-complicated self-hosted code has to get compiled first, I never get to that point.

Another colleague suggested looking at the IONFLAGS environment variable. This, in turn, seems to only have an effect if you build the compiler with the --enable-jitspew option. Once I did that, I was able to find out more:

$ IONFLAGS=zzzz mach run
obj-x64-debug/dist/bin/js
found tag: zzzz
Unknown flag.

usage: IONFLAGS=option,option,option,... where options can be:

  aborts        Compilation abort messages
  scripts       Compiled scripts
  mir           MIR information
    ...
    

And so on.

I found that IONFLAGS=codegen mach run would cause the interpreter to print out all the generated assembly code, including all the code for self-hosted methods. This wasn’t entirely helpful, since it was hard to see where the boundaries were between different methods.

I decided to try a different strategy and see what I could do inside gdb. I’ve avoided using debuggers as much as possible throughout my programming career. I’m a fan of printf-style debugging. So much so that I created the printf-style debugging page on Facebook. (This made more sense back when Facebook pages were “fan pages”, so you could be a “fan of” printf-style debugging.) I’ve always had the feeling that any more sophisticated debugging technology wasn’t worth the difficulty of use. Working on a compiler implemented in C++, though, it seems I’m finally having to suck it up and learn.

The first question was how to set a breakpoint on a templated function. I found the rbreak command in gdb, which takes a regular expression. I realized I could also just do:

(gdb) info functions .*emit_InitR.*
All functions matching regular expression ".*emit_InitR.*":

File js/src/jit/BaselineCodeGen.cpp:
2590:   bool js::jit::BaselineCodeGen::emit_InitRecord();
2590:   bool js::jit::BaselineCodeGen::emit_InitRecord();

File js/src/jit/BaselineIC.cpp:
2454:   bool js::jit::FallbackICCodeCompiler::emit_InitRecord();
(gdb)

So I set a breakpoint on the method I wrote to generate code for the InitRecord opcode:

(gdb) b js::jit::BaselineCodeGen::emit_InitRecord
Breakpoint 1 at 0x555558093884: file /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp, line 2591.
(gdb) b js::jit::FallbackICCodeCompiler::emit_InitRecord
Breakpoint 2 at 0x5555580807b1: file /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp, line 2455.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/tjc/gecko-fork/obj-x64-debug/dist/bin/js 
[snip]

Thread 1 "js" hit Breakpoint 2, js::jit::FallbackICCodeCompiler::emit_InitRecord (this=0x7fffffffd1b0)
    at /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp:2455
2455      EmitRestoreTailCallReg(masm);
(gdb) 

Finally! At this point, I was hoping to be able to view the code that was being generated for the empty record literal. Stepping through the code from here gave me what I was looking for:

(gdb) s
js::jit::FallbackICCodeCompiler::tailCallVMInternal (
    this=0x7fffffffd1b0, masm=..., 
    id=js::jit::TailCallVMFunctionId::DoInitRecordFallback)
    at /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp:510
510   TrampolinePtr code = cx->runtime()->jitRuntime()->getVMWrapper(id);
(gdb) n
511   const VMFunctionData& fun = GetVMFunction(id);
(gdb) n
512   MOZ_ASSERT(fun.expectTailCall == TailCall);
(gdb) n
513   uint32_t argSize = fun.explicitStackSlots() * sizeof(void*);
(gdb) n
514   EmitBaselineTailCallVM(code, masm, argSize);
(gdb) n
515   return true;
(gdb) p code
$18 = {value = 0x1e4412b875e0 "H\277"}
(gdb) p code.value
$19 = (uint8_t *) 0x1e4412b875e0 "H\277"
(gdb) x/64i code.value
   0x1e4412b875e0:  movabs $0x7ffff4219000,%rdi
   0x1e4412b875ea:  mov    0x1c0(%rdi),%rax
   0x1e4412b875f1:  mov    %rsp,0x70(%rax)
   0x1e4412b875f5:  movabs $0x55555903de60,%r11
   0x1e4412b875ff:  push   %r11
   0x1e4412b87601:  lea    0x18(%rsp),%r10
   0x1e4412b87606:  movabs $0xfff9800000000000,%r11
     

So that’s the generated code for DoInitRecordFallback (the fallback method implemented in the inline cache module of the baseline compiler), but I realized this wasn’t really what I was hoping to find. I wanted to see the intermediate representation first.

From there, I realized I was barking up the wrong tree, since the baseline compiler just goes straight from JS to assembly; only the more sophisticated compilers (which weren't being invoked at this point) use MIR and LIR. (A blog post from Matthew Gaudet, "A Beginners Guide To SpiderMonkey's MacroAssembler", explains some of the pipeline.)

So at least I knew one way to get to the generated assembly code for one opcode, but it wasn’t particularly helpful. My co-worker suggested putting in no-op implementations for the other opcodes so that it would be able to compile all the self-hosted code (even if the generated code wouldn’t work). This seemed like the fastest way to get to a functioning REPL so I could experiment with simpler code snippets, and it worked. After just adding a no-op emit_ method in BaselineCodeGen.cpp for each opcode, the interpreter was able to start up.

When I typed code into the REPL, I could tell it was only being interpreted, not compiled, since everything still worked, and I would expect anything that used records/tuples except for an empty record literal to fail. I found the --baseline-eager flag with a little bit of digging, and:

obj-x64-debug/dist/bin/js --baseline-eager
js> function f() { return #{}; }
function f() { return #{}; }
js> f()
f()
Assertion failure: !BytecodeOpHasIC(op) (Missing entry in OpToFallbackKindTable for JOF_IC op), at js/src/jit/BaselineIC.cpp:353
Segmentation fault
$

Excellent! This pointed to something I didn’t change yesterday (since the compiler didn’t make me) — I had to update the OpToFallbackKindTable in BaselineIC.cpp.

Once I did that, I realized that I couldn’t get very far with just InitRecord, since I wouldn’t expect even the empty record to compile without being able to compile the FinishRecord opcode. (Since records are immutable, Nicolò’s implementation adds three opcodes for creating records: one to initialize the empty record, one to add a new record field, and one to finish initialization, the last of which marks the record as immutable so that no more fields can be added.)
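
As a rough sketch of how I understand the lowering from Nicolò's patches (the exact bytecode may differ), a record literal turns into those opcodes like this:

x = #{}              // InitRecord, then FinishRecord; nothing to add
y = #{ a: 1, b: 2 }  // InitRecord, one AddRecordProperty per field, then FinishRecord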

So I implemented FinishRecord, similarly to the work from yesterday. Now what? I was able to type in an empty record literal without errors:

> x = #{}
#{}

But how do I know that x is bound to a well-formed record that satisfies its interface? There's not too much you can do with an empty record. I decided to check that typeof(x) worked (it should return "record"), and got an assertion failure in the emitGuardNonDoubleType() method in CacheIRCompiler.cpp. It took me some time to make sense of various calls through generated code, but the issue was the TypeOfIRGenerator::tryAttachStub() method in CacheIR.cpp:

AttachDecision TypeOfIRGenerator::tryAttachStub() {
[...snip...]
  TRY_ATTACH(tryAttachPrimitive(valId));
  TRY_ATTACH(tryAttachObject(valId));

  MOZ_ASSERT_UNREACHABLE("Failed to attach TypeOf");
  return AttachDecision::NoAction;
}
    

This code decides, based on the type of the operand (valId), whether to use the typeOf code for primitives or for objects. The record/tuple implementation adds "object primitives", which share some qualities with objects but aren't objects (since, among other things, objects are mutable). The tryAttachPrimitive() call was successfully selecting the typeOf code for primitives, since the isPrimitive() method on the Value type returns true for object primitives. Because there was no explicit case in the code for records, the code for double values was getting called as a fallback, and that's where the assertion failure was coming from. Tracking this down took much more time than actually implementing typeOf for records, which I proceeded to do. And now I can get the type of a record-valued variable in compiled code:

js> x = #{}
    #{}
js> typeof(x)
"record"

This provides at least some evidence that the code I’m generating is laying out records properly. Next up, I’ll try implementing the opcode that adds record properties, so that I can test out non-empty records!

by Tim Chevalier at November 10, 2021 06:06 AM

November 09, 2021

Tim Chevalier

Adding record and tuple support to the JIT

Today I started working on implementing the Record and Tuples proposal for JavaScript in the JIT in SpiderMonkey. All of this work is building on code written by Nicolò Ribaudo, which isn’t merged into SpiderMonkey yet but can be seen in patches linked from the Bugzilla bug.

Up until now, SpiderMonkey would automatically disable the JIT if you built it with the compile-time flag that enables records and tuples. Currently, the interpreter implements records and tuples, but not the compiler. I started by searching through the code to figure out how to re-enable the JIT, but realized it would be faster to look through the commit history, and found it in js/moz.configure. (If you try to follow along, you won’t be able to see some of the code I’m referring to since it’s in unapplied patches, but I’m including some links anyway to give context.)

I saw that if I just passed in the --enable-jit build flag explicitly, it should override what the config file said, and indeed it did. I decided to operate on the assumption that the compiler error messages would tell me what I needed to implement, which isn't always a safe assumption when working in C/C++, but seems to have served me okay in my SpiderMonkey work so far.

The first set of compiler errors I got had to do with adding the IsTuple() built-in method to the LIR. (The MIR and LIR, two of the intermediate languages used in SpiderMonkey, are explained briefly on the SpiderMonkey documentation page.) This involved implementing EmitObjectIsTuple() and visitIsTuple methods in CodeGenerator.cpp, the module that compiles LIR to assembly (the documentation also explains the various compilers that make up the JIT). That was straightforward, since IsTuple() is just a predicate that returns true for tuple arguments and false for arguments of any other type. When I implemented this method before, I chose to implement it as a JS_INLINABLE_FN, not knowing what I was getting myself into. With JIT disabled at compile time, the compiler made me implement it down to the MIR level, but now I had to implement it in LIR.

Once that was done, I ran the interpreter and got an assertion failure: "Hit MOZ_CRASH(Record and Tuple are not supported by jit) at gecko-fork/js/src/jit/BaselineCodeGen.cpp:2589". This was excellent, since it told me exactly where to start. When I looked at BaselineCodeGen.cpp, I saw that the seven opcodes for records and tuples were all defined with the UNSUPPORTED_OPCODE macro, so I planned to proceed by removing each of the UNSUPPORTED_OPCODE calls one-by-one and seeing what that forced me to implement.

I started with the InitRecord opcode, which as you might guess, creates a new record with a specified number of fields. As a strategy, I followed the pattern for the existing NewArray and NewObject opcodes, since creating new arrays and objects is similar to creating new records.

By following the error messages, I found the files that I needed to change; I’m putting this list in logical order rather than in the order that the compile errors came up, which was quite different.

  • VMFunctionList-inl.h — added the RecordType::createUninitialized C++ function to the list of functions that can be called from the JIT
  • VMFunctions.h — added a TypeToDataType case for the RecordType C++ type
  • BaselineCodeGen.cpp, where I added an emit method for InitRecord
  • BaselineIC.cpp and CacheIR.cpp, where I added code to support inline caching (explained here) for InitRecord.
  • MIROps.yaml, the file that defines all MIR opcodes; a lot of other code is automatically generated from this file. I had to add a new InitRecord opcode.
  • MIR.h, where I had to define a new MInitRecord class, and MIR.cpp, where I had to implement its methods.
  • Lowering.cpp, where I added code to translate the MIR representation for an InitRecord call to LIR.
  • LIROps.yaml, similarly to MIROps.yaml.
  • CodeGenerator.cpp, where I added the visitInitRecord method that translates the LIR code to assembly.
  • Recover.cpp — while I don’t understand this code very well, I think it’s what implements the “bailout” mechanism described in the docs. Similarly to the other modules, I had to add methods for InitRecord and a new class to the accompanying header file.

I love compiler errors! Without static typechecking, I wouldn’t have any information about what parts of the code I needed to change to add a new feature. As a functional programmer, I normally don’t give C++ a lot of credit for static typechecking, but whether it’s about modern language features or the coding style used in SpiderMonkey (or both), I actually find that I get a lot of helpful information from type error messages when working on SpiderMonkey. Without static type errors, I would have had to understand the JIT from the top down to know what parts I needed to change, maybe by reading through the code (slow and tedious) or maybe by reading through documentation (likely to be out of date). Types are documentation that can’t fall out of date, since the compiler won’t generate code for you if you give it something that doesn’t typecheck.

Once everything compiled and I started the interpreter again, I got a different assertion failure:

"Assertion failure: BytecodeOpHasIC(op), at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:649"

This pointed to the final change, in BytecodeLocation.h. I had added the code for inline caching, but hadn’t updated the opcode table defined in this file to indicate that the InitRecord opcode had an inline cache. Since the relationship between this table and the code itself exists only in the programmers’ heads, there’s no way for the compiler to check this for us.

Once I fixed this and started the interpreter again, I got a new error:

Hit MOZ_CRASH(Record and Tuple are not supported by jit) at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:2604
Thread 1 "js" received signal SIGSEGV, Segmentation fault.
0x000055555809ce62 in js::jit::BaselineCodeGen::emit_AddRecordProperty (this=0x7fffffffd080)
    at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:2604
2604      UNSUPPORTED_OPCODE(AddRecordProperty)

This is just saying that AddRecordProperty is an unsupported opcode, which is what I would expect since I only implemented one of the record/tuple opcodes. So that means that after my changes, SpiderMonkey was able to generate code for the InitRecord opcode. (The reason why these errors showed up as soon as I launched the interpreter, without having to execute any code, is that at startup time with JIT enabled, the interpreter compiles all the self-hosted libraries, which are implemented in JavaScript. Since on my working branch, there is library code that uses the Record and Tuple types, that means that the code path leading to those UNSUPPORTED_OPCODES was guaranteed to be reached.)

So what do I know now? The JIT seems to be able to generate code for the InitRecord opcode, at least for the first occurrence of it in the self-hosted libraries. Whether that code works (that is, implements the semantics in the spec) is a separate question. To know the answer, I would have to look at the generated code — I won’t be able to actually test any code in the interpreter until I implement all the opcodes, since each one will subsequently fail with the same error message as above. But that’s for another day.

by Tim Chevalier at November 09, 2021 05:32 AM

November 08, 2021

Tim Chevalier

Hello, world!

It's been a long time since I've blogged regularly, especially about software. When I worked on the Rust team, I wrote an update post at the end of every single day about what I'd worked on that day, every day I possibly could. I'm going to try to do that again. I joined the Compilers team at Igalia this past September and am currently working on implementing new JavaScript features in the SpiderMonkey JavaScript engine; at the moment, the Records and Tuples proposal, which would add immutable data types to JavaScript. As much as possible, I'm going to document how I spend each work day and what problems arise. This is mostly for me (so that I don't look back and wonder what I did all month), but if anyone else happens to find it interesting, that's an added bonus.

by Tim Chevalier at November 08, 2021 02:12 AM

November 01, 2021

Qiuyi Zhang (Joyee)

Building V8 on an M1 MacBook

I’ve recently got an M1 MacBook and played around with it a bit. It seems many open source projects still haven’t added MacOS with ARM64

November 01, 2021 01:50 PM

My 2019

It’s that time of the year again! I did not manage to write a recap about my 2018, so I’ll include some reflection about that year in

November 01, 2021 01:50 PM

Uncaught exceptions in Node.js

In this post, I’ll jot down some notes that I took when refactoring the uncaught exception handling routines in Node.js. Hopefully it

November 01, 2021 01:50 PM

On deps/v8 in Node.js

I recently ran into a V8 test failure that only showed up in the V8 fork of Node.js but not in the upstream. Here I’ll write down my

November 01, 2021 01:50 PM

Tips and Tricks for Node.js Core Development and Debugging

I thought about writing some guides on this topic in the nodejs/node repo, but it’s easier to throw whatever tricks I personally use on

November 01, 2021 01:50 PM

My 2017

I decided to write a recap of my 2017 because looking back, it was a very important year to me.

November 01, 2021 01:50 PM

New Blog

I’ve been thinking about starting a new blog for a while now. So here it is.

Not sure if I am going to write about tech here.

November 01, 2021 01:50 PM

October 13, 2021

Nikolas Zimmermann

Accelerating SVG - an update

Yikes, it’s been more than a year since my last post.

October 13, 2021 12:00 AM

October 02, 2021

Alicia Boya

Setting up VisualStudio code to work with WebKitGTK using clangd

Lately I'm working on a refactor in the append pipeline of the MediaSource Extensions implementation of WebKit for the GStreamer ports. Working on refactors often triggers many build issues, not only because they often encompass a lot of code, but also because it's very easy to miss errors in the client code when updating an interface.

The traditional way to tackle this problem is by doing many build cycles: compile, fix the topmost error and maybe some other errors in view that seem legit (note that in C++ it's very common to have chains of errors that are a consequence of previous errors), and repeat until it builds successfully.

This approach is not very pleasant in a project like WebKit, where an incremental build of a single file takes just long enough to invite a distraction. It's also worsened when it's not just one file, but a complete build that may stop at any time, depending on the order the build system chooses for the files. Often it takes more time to wait for the compiler to show the error than to fix the error.

Unpleasantness hurts motivation, and lack of motivation hurts productivity, and by the end of the day you are tired and still not done. Somehow it feels like the time spent fixing trivial build issues is substantially more than the time of a build cycle times the number of errors. Whether that perception is accurate or not, I am acutely aware of the huge impact helpful tooling has on both productivity and quality of life, both while you're doing the work and after you're done, so I decided to have a look at the state of modern C++ language servers when working on a large codebase like WebKit. Previous experiences were very unsuccessful, but there are people dedicated to this and progress has been made.

Creating a WebKit project in VS Code

  1. Open the directory containing the WebKit checkout in VS Code.
  2. WebKit has A LOT of files. If you use Linux you will see a warning telling you to increase the number of inotify watchers. Do so if you haven't done it before, but even then, it will not be enough, because WebKit has more files than the maximum number of inotify watchers supported by the kernel. Also, they use memory.
  3. Go to File/Preferences/Settings, click the Workspace tab, search for Files: Watcher Exclude and add the following patterns:
    **/CMakeFiles/**
    **/JSTests/**
    **/LayoutTests/**
    **/Tools/buildstream/cache/**
    **/Tools/buildstream/repo/**
    **/WebKitBuild/UserFlatpak/repo/**

    This will keep the number of watches at a workable 258k. Still a lot, but under the 1M limit.

How to set up clangd

The following instructions assume you’re using WebKitGTK with the WebKit Flatpak SDK. They should also work for WPE with minimal substitutions.

  1. Microsoft has its own C++ plugin for VS Code, which may be installed by default. The authors of the clangd plugin recommend to uninstall the built-in C++ plugin, as running both doesn’t make much sense and could cause conflicts.
  2. Install the clangd extension for VS Code from the VS Code Marketplace.
  3. The WebKit flatpak SDK already includes clangd, so it’s not necessary to install it if you’re using it. On the other hand, because the flatpak has a virtual filesystem, it’s necessary to map paths from the flatpak to the outside. You can create this wrapper script for this purpose. Make sure to give it execution rights (chmod +x).
    #!/bin/bash
    set -eu
    # https://stackoverflow.com/a/17841619
    function join_by { local d=${1-} f=${2-}; if shift 2; then printf %s "$f" "${@/#/$d}"; fi; }
    
    local_webkit=/webkit
    include_path=("$local_webkit"/WebKitBuild/UserFlatpak/runtime/org.webkit.Sdk/x86_64/*/active/files/include)
    if [ ! -f "${include_path[0]}/stdio.h" ]; then
      echo "Couldn't find the directory hosting the /usr/include of the flatpak SDK."
      exit 1
    fi
    include_path="${include_path[0]}"
    mappings=(
      "$local_webkit/WebKitBuild/GTK/Debug=/app/webkit/WebKitBuild/Debug"
      "$local_webkit/WebKitBuild/GTK/Release=/app/webkit/WebKitBuild/Release"
      "$local_webkit=/app/webkit"
      "$include_path=/usr/include"
    )
    
    exec "$local_webkit"/Tools/Scripts/webkit-flatpak --gtk --debug run -c clangd --path-mappings="$(join_by , "${mappings[@]}")" "$@"

    Make sure to set the path of your WebKit repository in local_webkit.

    Then, in VS Code, go to File/Preferences/Settings, and in the left pane, search for Extensions/clangd. Change Clangd: Path to the absolute path of the saved script above. I recommend making these changes in the Workspace tab, so they apply only to WebKit.

  4. Create a symlink named compile_commands.json inside the root of the WebKit checkout directory pointing to the compile_commands.json file of the WebKit build you will be using, for instance: WebKitBuild/GTK/Debug/compile_commands.json
  5. Create a .clangd file inside the root of the WebKit checkout directory with these contents:
    If:
        PathMatch: "(/app/webkit/)?Source/.*\\.h"
        PathExclude: "(/app/webkit/)?Source/ThirdParty/.*"
    
    CompileFlags:
        Add: [-include, config.h]

    This includes config.h in header files in WebKit files, with the exception of those in Source/ThirdParty. Note: If you need to add additional rules, this is done by adding additional YAML documents, which are separated by a --- line.

  6. The VS Code clangd plugin doesn't read .clangd by default. Instead, it has to be instructed to do so by adding --enable-config to Clangd: Arguments. Also add --limit-results=5000, since the default limit for cross-reference search results (100) is too small for WebKit. Additional tip: clangd will also add #include lines when you autocomplete a type. While the intention is good, this often can lead to spurious redundant includes. I have disabled it by adding --header-insertion=never to clangd's arguments.
  7. Restart VS Code. Next time you open a C++ file you will get a prompt requesting confirmation of your edited configuration:

VS Code will start indexing your code, and you will see a progress count in the status bar.

Debugging problems

clangd has a log. To see it, click View/Output, then in the Output panel combo box, select clangd.

The clangd database is stored in .cache/clangd inside the WebKit checkout directory. rm -rf’ing that directory will reset it back to its initial state.

For each compilation unit indexed, you’ll find a file following the pattern .cache/clangd/index/<Name>.<Hash>.idx. For instance: .cache/clangd/index/MediaSampleGStreamer.cpp.0E0C77DCC76C3567.idx. This way you can check whether a particular compilation unit has been indexed.

Bug: Some files are not indexed

You may notice VS Code has not indexed all your files. This is apparent when using the Find all references feature, since you may be missing results. This particularly affects generated code, especially unified sources (.cpp files generated by concatenating, via #include, a series of related .cpp files with the purpose of speeding up the build, compared to compiling them as individual units).

I don’t know the reason for this bug, but I can confirm the following workaround: Open a UnifiedSources file. Any UnifiedSources file will do. You can find them in paths such as WebKitBuild/GTK/Debug/WebCore/DerivedSources/unified-sources/UnifiedSource-043dd90b-1.cpp. After you open any of them, you’ll see VS Code indexing over a thousand files that were skipped before. You can close the file now. Find all references should work once the indexing is done.

Things that work

Overall I’m quite satisfied with the setup. The following features work:

  • Autocompletion:
  • . gets replaced with -> when autocompleting a member inside an object accessible by dereferencing a pointer or smart pointer. (. will autocomplete not only the members of the object, but also those of the pointee).
  • Right click/Find All References: What it finds is accurate, although I don't feel very confident in it being exhaustive, as that requires a full index.
  • Right click/Show Call Hierarchy: This is a useful tool that shows what functions call the selected function, and so on, automating what otherwise is a very manual process. At least, when it's exhaustive enough.
  • Right click/Type hierarchy: It shows the class tree containing a particular class (ancestors, children classes and siblings).
  • Error reporting: the right bar of VS Code will show errors and warnings that clangd identifies with the code. It’s important to note that there is a maximum number of errors per file, after which the checking will stop, so it’s a good idea to start from the top of the file. The errors seem quite precise and avoid a lot of trips to the compiler. Unfortunately, they’re not completely exhaustive, so even after the file shows no errors in clangd, it might still show errors in the actual compiler, but it still catches most with very detailed information.
  • Signature completion: after completing a function, you get help showing you what types the parameters expect

Known issues and workarounds

“Go to definition” not working sometimes

If “Go to definition” (ctrl+click on the name of a function) doesn’t work on a header file, try opening the source file by pressing Ctrl+o, then go back to the header file by pressing Ctrl+o again and try going to definition again.

Base functions of overridden functions don't show up when looking for references

Although this is supposed to be a closed issue I can still reproduce it. For instance, when searching for uses of SourceBufferPrivateGStreamer::enqueueSample(), calls to the parent class, SourceBufferPrivate::enqueueSample() get ignored.

This is also a common issue when using Show Call Hierarchy.

Lots of strange errors after a rebase

Clean the cache, reindex the project. Close VS Code, rm -rf .cache/clangd/index inside the WebKit checkout directory, then open VS Code again. Remember to open a UnifiedSources file to create a complete index.

by aboya at October 02, 2021 01:07 PM

September 30, 2021

Brian Kardell

Making the whole web better, one canvas at a time.

Making the whole web better, one canvas at a time.

One can have an entire career on the web and never write a single canvas.getContext('2d'), so "Why should I care about this new OffscreenCanvas thing?" is a decent question for many. In this post, I'll tell you why I'm certain that it will matter to you, in real ways.

How relevant is canvas?

As a user, you know from lived experience that <video> on the web is pretty popular. It isn't remotely niche. However, many developers I talk to think that <canvas> is. The sentiment seems to be something like...

I can see how it is useful if you want to make a photo editor or something, but... It's not really a thing I've ever added to a site or think I experience much... It's kind of niche, right?

What's interesting though, is that in reality, <canvas>'s prevalence in the HTTPArchive isn't so far behind <video> (63rd/70th most popular elements respectively). It's considerably more widely used than many other standard HTML elements.

Amazing, right? I mean, how could that even be?!

The short answer is, it's just harder to recognize. A great example of this is maps. As a user, you recognize maps. You know they are common and popular. But perhaps what you don't recognize is that they're drawn on a canvas.

As a developer, there is a fair chance you have included a <canvas> somewhere without even realizing it. But again, since it is harder to recognize "ah, this is a canvas", we don't identify it the way we do video. Think about it: we include videos similarly all the time - not by directly including a <video> but via an abstraction - maybe it is a custom element or an iframe. Still, as a user you clearly identify it, so in your mind, as a developer, you count it.

If canvas is niche, it is only so in the sense of who has to worry about those details. So let's talk about why you'll care, even if you don't directly use the API...

The trouble with canvas...

Unfortunately, <canvas> itself has a fundamental flaw. Let me show you...

Canvas (old)

This video is made by Andreas Hocevar using a common mapping library, on some fairly powerful hardware. You'll note how janky it gets - what you also can't tell from the video is that user interactions are temporarily interrupted on and off as rendering tries to keep up. The interface feels a little broken and frustrating.

For whom the bell tolls

For as bad as the video above is, as is the case on all performance related things, it's tempting to kind of shrug it off and think "Well, I don't know.. it's pretty usable, still - and hardware will catch up".

For all of the various appeals that have been made over the years to get us to care more about performance ("What about the fact that the majority of people use hardware less powerful than yours?" or "What about the fact that you're losing potential customers and users?" etc.), we haven't moved that ball as meaningfully as we'd like. But I'd like to add one more to the list of things to think about here...

Ask not for whom the performance bell tolls, because increasingly: It tolls for you.

While we've been busy talking about phones and computers, something interesting happened: billions of new devices using embedded web rendering engines appeared. TVs, game consoles, GPS systems, audio systems, infotainment systems in cars, planes and trains, kiosks, point of sale, digital signage, refrigerators, cooking appliances, ereaders, etc. They're all using web engines.

Interestingly, if you own a high-end computer or phone, you're similarly more likely to encounter even more of these, as a user.

Embedded systems are generally way less powerful than the universal devices we usually talk about, even when they're brand new -- and their replacement rate is way slower.

So, while that moderately uncomfortable jank on your new iPhone still seems pretty bearable, it might translate to just a few (or even 1) FPS on your embedded device. Zoiks!

In other words, increasingly, that person that all of the other talks ask you to consider and empathize with... is you.

Enter: OffscreenCanvas

OffscreenCanvas is a solution to this. Its API surface is really small: it has a constructor, and a getContext('2d') method. Unlike the canvas element itself, however, it is neatly decoupled from the DOM. It can be used in a worker - in fact, OffscreenCanvases are transferable - you can pass them between windows and workers via postMessage. The existing DOM <canvas> API itself adds a .transferControlToOffscreen() which will (explicitly) give you one back that is in charge of painting into that element's rectangle.
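
For the people who do write against the API, the pattern looks roughly like this (a sketch; the element lookup and the worker file name are just for illustration):

// main thread
const canvas = document.querySelector('canvas');
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker('render-worker.js');
worker.postMessage({ canvas: offscreen }, [offscreen]);  // transferred, not copied

// render-worker.js
self.onmessage = (event) => {
  const ctx = event.data.canvas.getContext('2d');
  ctx.fillRect(0, 0, 100, 100);  // drawing happens entirely off the main thread
};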

If you are one of the many people who don't program against canvases yourself, don't worry about the details... Instead, let me show you what that means. The practical upshot of simply decoupling this is pretty clear, even on good hardware, as you can see in this demo...

OffscreenCanvas based maps
Using OffscreenCanvas, user interactions are not blocked - the rendering is way more fluid and the interface is able to feel smooth and responsive.

A Unique Opportunity

Canvas is also pretty unique in the history of the web because it began as unusually low level. That has its pros and its cons - but one positive thing is that the fact that most people use it via an abstraction presents an interesting opportunity. We can radically improve things for pretty much all real users through the actions of a comparatively small group of people who directly write things against the actual canvas APIs. Your own work can realize this, in most cases, without any changes to your code. Potentially without you even knowing. Nice.

New super powers, same great taste

There's a knock-on effect here too that might be hard to notice at first. OffscreenCanvas doesn't create a whole new API to do its work - it's basically the same canvas context. And so are Houdini Custom Paint worklets. In fact, it's pretty hard not to see the relationship between painting on a canvas in a worker and painting on a canvas in a worklet - right? They are effectively the same idea. There is minimal new platform "stuff", but we gain whole new superpowers and a clearer architecture. To me, this seems great.
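
A tiny illustration of that sameness (the worklet name and file name here are made up; the drawing calls are the familiar 2d ones):

// fancy-fill.js, registered via CSS.paintWorklet.addModule('fancy-fill.js')
// and used from CSS with: background-image: paint(fancy-fill);
registerPaint('fancy-fill', class {
  paint(ctx, size) {
    // the same kind of 2d drawing calls you'd make on a canvas or OffscreenCanvas
    ctx.fillRect(0, 0, size.width, size.height);
  }
});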

What's more, while breaking off control and decoupling it from the main thread is a kind of easy win for performance and an interesting superpower on its own, we actually get more than that: in the case of Houdini, we are suddenly able to tap into all of the rest of the CSS infrastructure and use this to brainstorm, explore, test and polyfill interesting new paint ideas before we talk about standardizing them. Amazing! That's really good for both standards and users.

Really interestingly though: in the case of OffscreenCanvas, we suddenly have the ability to parallelize tasks and throw more hardware at highly parallelizable problems. Maps are one example of that, but they aren't the only one.

My colleague Chris Lord recently gave a talk in which he gave a great demo visualizing an interactive and animated Mandelbrot set (below). If you're unfamiliar with why this is impressive: a fractal is a self-repeating geometric pattern, and they can be pretty intense to visualize - and even harder to make explorable in a UI. At 1080p resolution and 250 iterations, that's about half a billion complex-number calculations per rendered frame. Fortunately, they are also an example of a highly parallelizable problem, so they make for a nice demo of something that was just totally impossible with web technology yesterday suddenly becoming possible with this new superpower.

OffscreenCanvas super powers!
A video of a talk from a recent WebKit Contributors meeting, showing impressive rendering. It should be time-jumped, but in case that fails, you can skip to about the 5 minute mark to see the demo.
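
To give a flavor of how that kind of fan-out works, here is a rough sketch of the general idea - not Chris's actual demo, and it skips the coloring and interactivity. It assumes a canvas whose height divides evenly among the workers, and the file names are made up:

// main.js (illustrative): split the image into horizontal bands, one worker each
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');
const WORKERS = 4;
const bandHeight = canvas.height / WORKERS;

for (let i = 0; i < WORKERS; i++) {
  const worker = new Worker('mandelbrot-worker.js');
  worker.onmessage = (e) => ctx.putImageData(e.data.pixels, 0, e.data.yOffset);
  worker.postMessage({ width: canvas.width, height: bandHeight,
                       yOffset: i * bandHeight, totalHeight: canvas.height });
}

// mandelbrot-worker.js (illustrative): compute one band's pixels in parallel
onmessage = ({ data: { width, height, yOffset, totalHeight } }) => {
  const pixels = new ImageData(width, height);
  for (let py = 0; py < height; py++) {
    for (let px = 0; px < width; px++) {
      // Map the pixel to a point c in the complex plane and iterate z = z*z + c
      const cx = (px / width) * 3.5 - 2.5;
      const cy = ((py + yOffset) / totalHeight) * 2 - 1;
      let x = 0, y = 0, i = 0;
      while (x * x + y * y <= 4 && i < 250) {
        const xNew = x * x - y * y + cx;
        y = 2 * x * y + cy;
        x = xNew;
        i++;
      }
      // Simple grayscale shading based on the escape iteration count
      const idx = (py * width + px) * 4;
      pixels.data[idx] = pixels.data[idx + 1] = pixels.data[idx + 2] = 255 - i;
      pixels.data[idx + 3] = 255;
    }
  }
  postMessage({ pixels, yOffset });
};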

What other doors will this open, and what will we see come from it? It will be super exciting to see!

September 30, 2021 04:00 AM

September 29, 2021

Thibault Saunier

GStreamer: one repository to rule them all

For the last few years, the GStreamer community has been analysing and discussing the idea of merging all the modules into one single repository. Since all the official modules are released in sync and the code evolves simultaneously across those repositories, having the code split up was a burden, and several core GStreamer developers believed it was worth making the effort to consolidate them into a single repository. As announced a while back, this is now effective, and this post explains the technical choices and implications of that change.

You can also check out our Monorepo FAQ for a list of questions and answers.

Technical details of the unification

Since we moved to meson as a build system a few years ago, we implemented gst-build, which leverages the meson subproject feature to build all GStreamer modules as one single project. This greatly enhanced the development experience of the GStreamer framework, but we considered that we could improve it even more by having all GStreamer code in a single repository that looks the same as gst-build.

This is what the new unified git repository looks like: essentially gst-build moved into the main gstreamer repository, except that all the code from the GStreamer modules, located in the subprojects/ directory, is checked in directly.

This new setup now lives in the main default branch of the gstreamer repository; the master branches of all the other module repositories are now retired and frozen, and no new merge requests or code changes will be accepted there.

This is only the first step and we will consider reorganizing the repository in the future, but the goal is to minimize disruptions.

The technical process for merging the repositories looks like:

foreach GSTREAMER_MODULE
    git remote add GSTREAMER_MODULE.name GSTREAMER_MODULE.url
    git fetch GSTREAMER_MODULE.name
    git merge GSTREAMER_MODULE.name/master
    git mv list_all_files_from_merged_gstreamer_module() GSTREAMER_MODULE.shortname
    git commit -m "Moved all files from " + GSTREAMER_MODULE.name
endforeach

This allows us to keep the exact same history (and checksum of each commit) for all the old GStreamer modules in the new repository, which guarantees that the code is still exactly the same as before.

Releases with the new setup

In the same spirit of avoiding disruption, releases will look exactly the same as before. In the new unified gstreamer repository, we still have meson subprojects for each GStreamer module, and each of them will still have its own release tarball. In practice, this means that not much (nothing?) should change for distribution packagers and consumers of GStreamer tarballs.

What should I do with my pending MRs in old modules repositories?

Since we cannot create new merge requests in your name on GitLab, we wrote a move_mrs_to_monorepo script that you can run yourself. The script is located in the gstreamer repository, and you can start moving all your pending MRs by simply running it (scripts/move_mrs_to_monorepo.py) and following the instructions.


Thanks to everyone in the community for providing us with all the feedback and thanks to Xavier Claessens for co-leading the effort.

We are still working on making the transition as smooth as possible, and if you have any questions don't hesitate to come talk to us in #gstreamer on the OFTC IRC network.

Happy GStreamer hacking!

by thiblahute at September 29, 2021 09:34 PM