Planet Igalia Chromium

April 26, 2024

Gyuyoung Kim

Web Platform Test and Chromium

I’ve been working on tasks related to web platform tests since last year. In this blog post, I’ll introduce what web platform tests are and how the Chromium project incorporates these tests into its development process.

1. Introduction

The web-platform-tests project is a cross-browser test suite for the Web platform stack. By writing tests that can run in all browsers, browser projects gain assurance that their software is compatible with other implementations, and that future implementations will be compatible too. Consequently, web authors and developers can trust the Web platform to deliver on its promise of working seamlessly across browsers and devices, without needing extra layers of abstraction to paper over gaps left by specification editors and implementors.

For reference, the web platform tests community operates a dashboard that tracks the pass rate of tests across the major web browsers. The chart below shows the number of failing tests in each browser over time, with Chrome showing the lowest failure rate.

[Caption] Test failures graph on major browsers
The next chart shows the interoperability of web platform features among the major browsers for 2023; interoperability has been improving.

[Caption] Interoperability among the major browsers 2023

2. Test Suite Design

The majority of the test suite is made up of HTML pages that are designed to be loaded in a browser. These pages may either generate results programmatically or provide a set of steps for executing the test and obtaining the outcome. Overall, the tests are concise, cross-platform, and self-contained, making them easy to run in any browser.

2.1 Test Layout

Most primary directories within the repository are dedicated to tests associated with specific web standards. For W3C specifications, these directories typically use the short name of the spec, which is the name used for snapshot publications under the /TR/ path. For WHATWG specifications, the directories are usually named after the spec’s subdomain, omitting “.spec.whatwg.org” from the URL. Other specifications follow a logical naming convention.

The css/ directory contains test suites specifically designed for the CSS Working Group specifications.

Within each specification-specific directory, tests are organized in one of two common ways: a flat structure, sometimes used for shorter specifications, or a nested structure, where each subdirectory corresponds to the ID of a heading within the specification. The nested structure, which provides implicit metadata about the tested section of the specification based on its location in the filesystem, is preferred for larger specifications.

For example, tests related to “The History interface” in HTML can be found in html/browsers/history/the-history-interface/.

Many directories also include a file named META.yml, which may define properties such as:
  • spec: a link to the specification covered by the tests in the directory
  • suggested_reviewers: a list of GitHub usernames for individuals who are notified when pull requests modify files in the directory
Shared files that tests depend on, including images, fonts, media, and general resources, are stored in common directories.

2.2 Test Types

Tests in this project employ various approaches to validate expected behavior, with classifications based on how expectations are expressed:

  1. Rendering Tests:
    • Reftests: Compare the graphical rendering of two (or more) web pages, asserting equality in their display (e.g., A.html and B.html must render identically). This can be done manually by users switching between tabs/windows or through automated scripts.
    • Visual Tests: Evaluate a page’s appearance, with results determined either by human observation or by comparing it with a saved screenshot specific to the user agent and platform.
  2. JavaScript Interface Tests (testharness.js tests):
    • Ensure that JavaScript interfaces behave as expected. Named after the JavaScript harness used to execute them.
  3. WebDriver Browser Automation Protocol Tests (wdspec tests):
    • Written in Python, these tests validate the WebDriver browser automation protocol.
  4. Manual Tests rely on a human to run them and determine their result.

3. Web Platform Test in Chromium

The Chromium project runs web tests, which Blink uses to test many components, including but not limited to layout and rendering. In general, a web test loads a page in a test renderer (content_shell) and compares the rendered output or JavaScript output against an expected output file. The web tests include the web platform tests (WPT), located at web_tests/external/wpt; the other directories are for Chromium-specific tests. web_tests/external/wpt currently contains around 50,000 web platform tests.

3.1 Web Tests in the Chromium development process

Every patch must pass all tests before it can be merged into the main source tree. In Gerrit, trybots run a wide array of tests, including the web tests, to ensure this. We can launch these trybots against a specific patch, and each trybot then runs its scheduled tests with that patch applied.

[Caption] Trybots in Gerrit
For example, the linux-rel trybot runs the web tests in its blink_web_tests step, as shown below:

[Caption] blink_web_tests in the linux-rel trybot

3.2 Internal sequence for running the Web Test in Chromium

Let's take a brief look at how Chromium runs the web tests internally. There are two major components: run_web_tests.py, which acts as a server, and content_shell, which acts as a client that loads each test it is handed and returns the result. content_shell's input and output follow the run_web_tests protocol over pipes that connect its stdin and stdout to run_web_tests.py, as shown below:

[Caption] Sequence for executing a test between run_web_tests.py and content_shell
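To make the shape of that exchange concrete, below is a heavily simplified sketch of the client side of such a loop. It is illustrative only: it assumes a line-based exchange in which the harness writes one test per line to the client's stdin and the client replies with its dump followed by a delimiter line. The real run_web_tests protocol carries more information (expected pixel hashes, image dumps, timings), and the RunSingleTest helper and the #DONE delimiter here are made up for the example.

#include <iostream>
#include <string>

// Hypothetical stand-in for loading the test page in the renderer and
// producing a text dump of the result; not a real content_shell API.
std::string RunSingleTest(const std::string& test_uri) {
  return "PASS " + test_uri + "\n";  // placeholder result
}

int main() {
  std::string test_uri;
  // The harness (run_web_tests.py) writes one test per line to our stdin.
  while (std::getline(std::cin, test_uri)) {
    if (test_uri.empty())
      break;  // Treat an empty line as "shut down" (illustrative only).
    // Stream the dump back over stdout, then a delimiter line so the
    // harness knows where this test's output ends.
    std::cout << RunSingleTest(test_uri) << "#DONE\n" << std::flush;
  }
  return 0;
}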

4. In conclusion

We have just explored what web tests are and how Chromium runs them for web platform testing. Web platform tests play a crucial role in ensuring interoperability across web browsers and platforms. In the next blog post, I will share how I implemented support for running the web tests on iOS for the Blink port.

by gyuyoung at April 26, 2024 08:48 AM

March 26, 2024

Andy Wingo

hacking v8 with guix, bis

Good day, hackers. Today, a pragmatic note, on hacking on V8 from a Guix system.

I’m going to skip a lot of the background because, as it turns out, I wrote about this already almost a decade ago. But following that piece, I mostly gave up on doing V8 hacking from a Guix machine—it was more important to just go with the flow of the ever-evolving upstream toolchain. In fact, I ended up installing Ubuntu LTS on my main workstations for precisely this reason, which has worked fine; I still get Guix in user-space, which is better than nothing.

Since then, though, Guix has grown to the point that it’s easier to create an environment that can run a complicated upstream source management project like V8’s. This is mainly guix shell in the --container --emulate-fhs mode. This article is a step-by-step for how to get started with V8 hacking using Guix.

get the code

You would think this would be the easy part: just git clone the V8 source. But no, the build wants a number of other Google-hosted dependencies to be vendored into the source tree. To perform the initial fetch for those dependencies and to keep them up to date, you use helpers from the depot_tools project. You also use depot_tools to submit patches to code review.

When you live in the Guix world, you might be tempted to look into what depot_tools actually does, and to replicate its functionality in a more minimal, Guix-like way. Which, sure, perhaps this is a good approach for packaging V8 or Chromium or something, but when you want to work on V8, you need to learn some humility and just go with the flow. (It’s hard for the kind of person that uses Guix. But it’s what you do.)

You can make some small adaptations, though. depot_tools is mostly written in Python, and it actually bundles its own virtualenv support for using a specific python version. This isn’t strictly needed, so we can set the funny environment variable VPYTHON_BYPASS="manually managed python not supported by chrome operations" to just use python from the environment.

Sometimes depot_tools will want to run some prebuilt binaries. Usually on Guix this is anathema—we always build from source—but there’s only so much time in the day and the build system is not our circus, not our monkeys. So we get Guix to set up the environment using a container in --emulate-fhs mode; this lets us run third-party pre-built binaries. Note that these binaries are indeed free software! We can run them just fine if we trust Google, which you have to when working on V8.

no, really, get the code

Enough with the introduction. The first thing to do is to check out depot_tools.

mkdir src
cd src
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

I’m assuming you have git in your Guix environment already.

Then you need to initialize depot_tools. For that you run a python script, which needs to run other binaries – so we need to make a specific environment in which it can run. This starts with a manifest of packages, conventionally placed in a file named manifest.scm in the project’s working directory; you don’t have a working directory for this yet, though, so you can just write it into v8.scm or something, anywhere:

(use-modules (guix packages)
             (gnu packages gcc))

(concatenate-manifests
 (list
  (specifications->manifest
   '(
     "bash"
     "binutils"
     "clang-toolchain"
     "coreutils"
     "diffutils"
     "findutils"
     "git"
     "glib"
     "glibc"
     "glibc-locales"
     "grep"
     "less"
     "ld-gold-wrapper"
     "make"
     "nss-certs"
     "nss-mdns"
     "openssh"
     "patch"
     "pkg-config"
     "procps"
     "python"
     "python-google-api-client"
     "python-httplib2"
     "python-pyparsing"
     "python-requests"
     "python-tzdata"
     "sed"
     "tar"
     "wget"
     "which"
     "xz"
     ))
  (packages->manifest
   `((,gcc "lib")))))

Then, you guix shell -m v8.scm. But you actually do more than that, because we need to set up a container so that we can expose a standard /lib, /bin, and so on:

guix shell --container --network \
  --share=$XDG_RUNTIME_DIR --share=$HOME \
  --preserve=TERM --preserve=SSH_AUTH_SOCK \
  --emulate-fhs \
  --manifest=v8.scm

Let’s go through these options one by one.

  • --container: This is what lets us run pre-built binaries, because it uses Linux namespaces to remap the composed packages to /bin, /lib, and so on.

  • --network: Depot tools are going to want to download things, so we give them net access.

  • --share: By default, the container shares the current working directory with the “host”. But we need not only the checkout for V8 but also the sibling checkout for depot tools (more on this in a minute); let’s just share the whole home directory. Also, we share the /run/user/1000 directory, which is $XDG_RUNTIME_DIR, which lets us access the SSH agent, so we can check out over SSH.

  • --preserve: By default, the container gets a pruned environment. This lets us pass some environment variables through.

  • --emulate-fhs: The crucial piece that lets us bridge the gap between Guix and the world.

  • --manifest: Here we specify the list of packages to use when composing the environment.

We can use short arguments to make this a bit less verbose:

guix shell -CNF --share=$XDG_RUNTIME_DIR --share=$HOME \
  -ETERM -ESSH_AUTH_SOCK -m v8.scm

I would like it if all of these arguments could somehow be optional, so that a bare guix shell invocation would just apply them when run in this directory. Perhaps some day.

Running guix shell like this drops you into a terminal. So let’s initialize depot tools:

cd $HOME/src
export VPYTHON_BYPASS="manually managed python not supported by chrome operations"
export PATH=$HOME/src/depot_tools:$PATH
export SSL_CERT_DIR=/etc/ssl/certs/
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
gclient

This should download a bunch of things, I don’t know what. But at this point we’re ready to go:

fetch v8

This checks out V8, which is about 1.3 GB, and then probably about as much again in dependencies.

build v8

You can build V8 directly:

# note caveat below!
cd v8
tools/dev/gm.py x64.release

This will build fine... and then fail to link. The precise reason is obscure to me: it would seem that by default, V8 uses a whole Debian sysroot for Some Noble Purpose, and ends up linking against it. But it compiles against system glibc, which seems to have replaced fcntl64 with a versioned symbol, or some such nonsense. It smells like V8 built against a too-new glibc and then failed trying to link to an old glibc.

To fix this, you need to go into the args.gn that was generated in out/x64.release and then add use_sysroot = false, so that it links to system glibc instead of the downloaded one.

echo 'use_sysroot = false' >> out/x64.release/args.gn
tools/dev/gm.py x64.release

You probably want to put the commands needed to set up your environment into some shell scripts. For Guix you could make guix-env:

#!/bin/sh
guix shell -CNF --share=$XDG_RUNTIME_DIR --share=$HOME \
  -ETERM -ESSH_AUTH_SOCK -m v8.scm -- "$@"

Then inside the container you need to set the PATH and such, so we could put this into the V8 checkout as env:

#!/bin/sh
# Look for depot_tools in sibling directory.
depot_tools=`cd $(dirname $0)/../depot_tools && pwd`
export PATH=$depot_tools:$PATH
export VPYTHON_BYPASS="manually managed python not supported by chrome operations"
export SSL_CERT_DIR=/etc/ssl/certs/
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
exec "$@"

This way you can run ./guix-env ./env tools/dev/gm.py x64.release and not have to “enter” the container so much.

notes

This all works fine enough, but I do have some meta-reflections.

I would prefer it if I didn’t have to use containers, for two main reasons. One is that the resulting build artifacts have to be run in the container, because they are dynamically linked to e.g. /lib, at least for the ELF loader. It would be better if I could run them on the host (with the host debugger, for example). Using Guix to make the container is better than e.g. docker, though, because I can ensure that the same tools are available in the guest as I use on the host. But also, I don’t like adding “modes” to my terminals: are you in or out of this or that environment. Being in a container is not like being in a vanilla guix shell, and that’s annoying.

The build process uses many downloaded tools and artifacts, including clang itself. This is a feature, in that I am using the same compiler that colleagues at Google use, which is important. But it’s also annoying and it would be nice if I could choose. (Having the same clang-format though is an absolute requirement.)

There are two tests failing in this configuration; they seem to be somehow related to time zones. I have no idea why, but I just ignore them.

If the build system were any weirder, I would think harder about maybe using Docker or something like that. Colleagues point to distrobox as being a useful wrapper. It is annoying though, because such a docker image becomes like a little stateful thing to do sysadmin work on, and I would like to avoid that if I can.

Welp, that’s all for today. Hopefully if you are contemplating installing Guix as your operating system (rather than just in user-space), this can give you a bit more information as to what it might mean when working on third-party projects. Happy hacking and until next time!

by Andy Wingo at March 26, 2024 11:51 AM

March 05, 2024

José Dapena

Maintaining downstreams of Chromium: why downstream?

Chromium, the open source browser project that Google Chrome is based on, can nowadays be considered the reference implementation of the web platform. As such, it is the first choice when implementing the web platform in a software platform or product.

Why is that? In this blog post I am going to introduce the topic, and then review the various reasons why projects use a downstream Chromium.

A series of blog posts

This is the first in a series of blog posts in which I will go through several aspects and challenges of maintaining a downstream project that uses Chromium as its upstream.

They are mostly based on the discussions at two events: first, the Web Engines Hackfest 2023 breakout session of the same title that I chaired in A Coruña; and then my BlinkOn 18 talk in November 2023, at Sunnyvale.

Some definitions

Before starting the discussion of the different aspects, let’s clarify how I will use several terms.

Repository vs. project

I am going to use repository strictly to mean version-controlled storage of code. A project (specifically, a software project), in contrast, is the community of people that share goals and some kind of organization in order to maintain one or several software products.

So, a project may use several repositories to achieve its goals.

In this discussion I will talk about Chromium, an open source project whose goal is to implement a web platform user agent, a web browser, for different platforms. As such, it uses a number of repositories (src, v8, and more).

Downstream and upstream

I will use the terms downstream and upstream to refer to the relationship between the version control repositories of different software projects.

If there is a software project repository (typically open source), and a new repository is created that contains all or part of the original repository, then:

  • The upstream project is the original repository.
  • The downstream project is the new repository.

It is important to highlight that different things can happen to the downstream repository:

  • The copy could be a one-time event, so the downstream repository becomes an independent fork, and there may be no interest in tracking the upstream evolution. This often happens with abandoned repositories, where a different set of people starts an independent project. But there can be other reasons.
  • There is an intent to track the upstream repository’s changes, so the downstream repository evolves as the upstream one does, but with some specific differences maintained on top of the original repository.

Why use Chromium?

Nowadays, the web platform is a solid alternative for delivering content to users. It enables modern user interfaces, based on well-known standards, that integrate well with local and remote services. The gap between native applications and web content has been reduced, so the web is quite often a good alternative.

But when integrating web content, product integrators need an implementation of the web platform. It is no surprise that Chromium is the most widely used one, for a number of reasons:

  • It is open source, with a license that allows adapting it to new product needs.
  • It is well maintained and up to date, and it even pushes improvements through standardization continuously.
  • It is secure, from both an architecture and a maintenance-model point of view.
  • It provides integration points to tailor the implementation to one’s needs.
  • It supports the most popular software platforms (Windows, Android, Linux, …) for integrating new products.
  • On top of the web platform itself, it provides an implementation of many of the components required to build a modern web browser.

Still, there are other good choices for integrating the web, such as WebKit (especially WPE for embedded use cases), or the system-provided web components (the Android or iOS web view, …).

In this blog post, though, I will focus on the Chromium case.

Why downstreaming Chromium?

But, why do different projects need to use downstream Chromium repositories?

The main reason a project needs a downstream repository is simple: to carry changes that are not upstream. This can happen for a variety of reasons:

  • Downstream changes that upstream will not accept, e.g. because they would make the upstream project harder to maintain, or would not be tested often.
  • Downstream changes that the downstream project does not want to contribute upstream.

Let’s see some examples of changes of both types.

Hardware and OS adaptation

This is when a downstream adds support for a hardware target or OS that the upstream Chromium project does not originally support.

Upstream Chromium provides an abstraction layer for that purpose, named Ozone, which allows adapting it to the OS, desktop environment, and system graphics compositor. There are other abstraction layers as well, for media acceleration, accessibility, and input methods.

The Wayland protocol adaptation started as a downstream effort, as upstream Chromium did not intend to support Wayland at that time. Eventually it evolved into an official upstream Ozone backend.

An example? LGE webOS Chromium port.

Differentiation

The previous case mostly forces you to have a downstream project or repository. But there are also cases where this is deliberate: there is a will to keep some features in the downstream repository and not in upstream, an intended differentiation.

Why would anybody want that? Some typical examples:

  • A set of features that the downstream project owners consider to make the project better in some way, and that they want to keep downstream. This can happen when a new browser ships with features that make the product offering different from, and in some ways better than, upstream Chrome: a different user experience, some security features, better privacy…
  • Adaptation to a different product brand. Each browser or browser-based product will want its own specific brand instead of the upstream Chromium brand.

Examples of this:

  • Brave browser, with completely different privacy and security choices.
  • ARC browser, with an innovative user experience.
  • Microsoft Edge, with tight Windows OS integration and corporate features.

Hybrid application runtimes

And one last interesting case: integrating the web platform for developing hybrid applications, those that mix parts of the user interface implemented in a native toolkit with parts implemented using the web platform.

Though Chromium includes official support for hybrid applications on Android, with the Android WebView, other toolkits also provide web application support, and in Chromium’s case that integration belongs to downstream projects.

Examples?

What’s next?

In this blog post I presented different reasons why projects end up maintaining a downstream fork of Chromium.

In the next blog post I will present one of the main challenges when maintaining a downstream of Chromium: the different rebase and upgrade strategies.

by José Dapena Paz at March 05, 2024 03:54 PM

January 05, 2024

Andy Wingo

v8's precise field-logging remembered set

A remembered set is used by a garbage collector to identify graph edges between partitioned sub-spaces of a heap. The canonical example is in generational collection, where you allocate new objects in newspace, and eventually promote survivor objects to oldspace. If most objects die young, we can focus GC effort on newspace, to avoid traversing all of oldspace all the time.

Collecting a subspace instead of the whole heap is sound if and only if we can identify all live objects in the subspace. We start with some set of roots that point into the subspace from outside, and then traverse all links in those objects, but only to other objects within the subspace.

The roots are, like, global variables, and the stack, and registers; and in the case of a partial collection in which we identify live objects only within newspace, also any link into newspace from other spaces (oldspace, in our case). This set of inbound links is a remembered set.

There are a few strategies for maintaining a remembered set. Generally speaking, you start by implementing a write barrier that intercepts all stores in a program. Instead of:

obj[slot] := val;

You might abstract this away:

write_slot(obj, sizeof obj, &obj[slot], val);

As you can see, it’s quite an annoying transformation to do by hand; typically you will want some sort of language-level abstraction that lets you keep the more natural syntax. C++ can do this pretty well, or if you are implementing a compiler, you just add this logic to the code generator.
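As a hand-wavy illustration (not any particular collector's API), a C++ embedding might wrap barriered fields in a small template so that assignments keep their natural syntax while still going through write_slot:

#include <cstddef>

// The barrier from the text, declared here only so the sketch is complete;
// its parameter types are simplified relative to the pseudocode above.
void write_slot(void* obj, std::size_t size, void** slot, void* val);

// Sketch: a field wrapper whose assignment operator routes the store
// through the write barrier, so user code can keep writing `o->field = p;`.
template <typename T>
class Barriered {
 public:
  Barriered(void* owner, std::size_t owner_size)
      : owner_(owner), owner_size_(owner_size) {}

  Barriered& operator=(T* val) {
    write_slot(owner_, owner_size_,
               reinterpret_cast<void**>(&ptr_), val);
    return *this;
  }
  T* get() const { return ptr_; }

 private:
  T* ptr_ = nullptr;
  void* owner_;        // object containing this field
  std::size_t owner_size_;
};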

Then the actual write barrier... well, its implementation is tangled up with the implementation of the remembered set. The simplest variant is a card-marking scheme, whereby the heap is divided into equal cards whose size is a power of two, and each card has a bit. If the heap is also divided into blocks (say, 2 MB in size), then you might divide those blocks into 256-byte cards, yielding 8192 cards per block. A barrier might look like this:

void write_slot(ObjRef obj, size_t size,
                SlotAddr slot, ObjRef val) {
  *slot := val; // Start with the store.

  uintptr_t block_size = 1<<21;
  uintptr_t card_size = 1<<8;
  uintptr_t cards_per_block = block_size / card_size;

  uintptr_t obj_addr = (uintptr_t) obj;
  uintptr_t card_idx = (obj_addr / card_size) % cards_per_block;

  // Assume remset allocated at block start.
  uintptr_t block_start = obj_addr & ~(block_size-1);
  uint32_t *cards = (uint32_t *) block_start;

  // Set the bit.
  cards[card_idx / 32] |= 1 << (card_idx % 32);
}

Then when marking the new generation, you visit all cards, and for all marked cards, trace all outbound links in all live objects that begin on the card.
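For concreteness, here is an illustrative sketch (neither Whippet's nor V8's actual code) of visiting the marked cards of one block, matching the card layout in the barrier above; the tracing helper is assumed, since finding the objects that begin on a card needs extra metadata such as an object-start bitmap:

#include <stddef.h>
#include <stdint.h>

// Assumed helper: trace the outbound links of every live object that
// begins on the given card. Not defined here; a real collector needs
// object-start metadata to implement it.
void trace_objects_on_card(uintptr_t card_start, uintptr_t card_size);

// Visit every marked card in one block's card table.
void scan_cards(uint32_t *cards, size_t cards_per_block,
                uintptr_t block_start, uintptr_t card_size) {
  for (size_t word = 0; word < cards_per_block / 32; word++) {
    uint32_t bits = cards[word];
    if (!bits) continue;                // 32 unmarked cards; skip them all.
    for (size_t bit = 0; bit < 32; bit++)
      if (bits & (UINT32_C(1) << bit)) {
        size_t card_idx = word * 32 + bit;
        trace_objects_on_card(block_start + card_idx * card_size, card_size);
      }
  }
}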

Card-marking is simple to implement and simple to statically allocate as part of the heap. Finding marked cards takes time proportional to the size of the heap, but you hope that the constant factors and SIMD minimize this cost. However iterating over objects within a card can be costly. You hope that there are few old-to-new links but what do you know?

In Whippet I have been struggling a bit with sticky-mark-bit generational marking, in which new and old objects are not spatially partitioned. Sometimes generational collection is a win, but in benchmarking I find that often it isn’t, and I think Whippet’s card-marking barrier is at fault: it is simply too imprecise. Consider firstly that our write barrier applies to stores to slots in all objects, not just those in oldspace; a store to a new object will mark a card, but that card may contain old objects which would then be re-scanned. Or consider a store to an old object in a more dense part of oldspace; scanning the card may incur more work than needed. It could also be that Whippet is being too aggressive at re-using blocks for new allocations, where it should be limiting itself to blocks that are very sparsely populated with old objects.

what v8 does

There is a tradeoff in write barriers between the overhead imposed on stores, the size of the remembered set, and the precision of the remembered set. Card-marking is relatively low-overhead and usually small as a fraction of the heap, but not very precise. It would be better if a remembered set recorded objects, not cards. And it would be even better if it recorded slots in objects, not just objects.

V8 takes this latter strategy: it has per-block remembered sets which record slots containing “interesting” links. All of the above words were to get here, to take a brief look at its remembered set.

The main operation is RememberedSet::Insert. It takes the MemoryChunk (a block, in our language from above) and the address of a slot in the block. Each block has a remembered set; in fact, six remembered sets for some reason. The remembered set itself is a SlotSet, whose interesting operations come from BasicSlotSet.

The structure of a slot set is a bitvector partitioned into equal-sized, possibly-empty buckets. There is one bit per slot in the block, so in the limit the size overhead for the remembered set may be 3% (1/32, assuming compressed pointers). Currently each bucket is 1024 bits (128 bytes), plus the 4 bytes for the bucket pointer itself.

Inserting into the slot set will first allocate a bucket (using C++ new) if needed, then load the “cell” (32-bit integer) containing the slot. There is a template parameter declaring whether this is an atomic or normal load. Finally, if the slot bit in the cell is not yet set, V8 will set the bit, possibly using atomic compare-and-swap.
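For intuition, here is a rough sketch of the indexing arithmetic only (not V8's implementation), assuming 4-byte compressed-pointer slots and 1024-bit buckets as described above; it also shows where the 1/32 size bound comes from:

#include <stddef.h>
#include <stdint.h>

// Rough sketch of slot-set indexing; names and layout are illustrative.
// One bit per 4-byte slot; each bucket holds 1024 bits = 32 32-bit cells,
// so the bitmap costs at most 1 bit per 32 bits of block (~3%).
constexpr uintptr_t kSlotSize = 4;      // compressed pointers
constexpr size_t kBitsPerCell = 32;
constexpr size_t kCellsPerBucket = 32;  // 32 * 32 = 1024 bits per bucket
constexpr size_t kSlotsPerBucket = kBitsPerCell * kCellsPerBucket;

struct SlotIndex {
  size_t bucket;  // which (possibly not-yet-allocated) bucket
  size_t cell;    // which 32-bit cell within the bucket
  uint32_t mask;  // bit within that cell
};

// Map a slot's byte offset within its block to (bucket, cell, bit).
SlotIndex IndexForSlot(uintptr_t slot_offset_in_block) {
  size_t slot = slot_offset_in_block / kSlotSize;
  return {slot / kSlotsPerBucket,
          (slot % kSlotsPerBucket) / kBitsPerCell,
          UINT32_C(1) << (slot % kBitsPerCell)};
}

Insertion is then essentially: make sure the bucket exists, load the cell, and set the bit (atomically if needed), which is the shape of the operation described above.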

In the language of Blackburn’s Design and analysis of field-logging write barriers, I believe this is a field-logging barrier, rather than the bit-stealing slot barrier described by Yang et al in the 2012 Barriers Reconsidered, Friendlier Still!. Unlike Blackburn’s field-logging barrier, however, this remembered set is implemented completely on the side: there is no in-object remembered bit, nor remembered bits for the fields.

On the one hand, V8’s remembered sets are precise. There are some tradeoffs, though: they require off-managed-heap dynamic allocation for the buckets, and traversing the remembered sets takes time proportional to the whole heap size. And, should V8 ever switch its minor mark-sweep generational collector to use sticky mark bits, the lack of a spatial partition could lead to similar problems as I am seeing in Whippet. I will be interested to see what they come up with in this regard.

Well, that’s all for today. Happy hacking in the new year!

by Andy Wingo at January 05, 2024 09:44 AM