Planet Igalia Chromium

March 26, 2024

Andy Wingo

hacking v8 with guix, bis

Good day, hackers. Today, a pragmatic note, on hacking on V8 from a Guix system.

I’m going to skip a lot of the background because, as it turns out, I wrote about this already almost a decade ago. But following that piece, I mostly gave up on doing V8 hacking from a Guix machine—it was more important to just go with the flow of the ever-evolving upstream toolchain. In fact, I ended up installing Ubuntu LTS on my main workstations for precisely this reason, which has worked fine; I still get Guix in user-space, which is better than nothing.

Since then, though, Guix has grown to the point that it’s easier to create an environment that can run a complicated upstream source management project like V8’s. This is mainly guix shell in the --container --emulate-fhs mode. This article is a step-by-step for how to get started with V8 hacking using Guix.

get the code

You would think this would be the easy part: just git clone the V8 source. But no, the build wants a number of other Google-hosted dependencies to be vendored into the source tree. To perform the initial fetch for those dependencies and to keep them up to date, you use helpers from the depot_tools project. You also use depot_tools to submit patches to code review.

When you live in the Guix world, you might be tempted to look into what depot_tools actually does, and to replicate its functionality in a more minimal, Guix-like way. Which, sure, perhaps this is a good approach for packaging V8 or Chromium or something, but when you want to work on V8, you need to learn some humility and just go with the flow. (It’s hard for the kind of person that uses Guix. But it’s what you do.)

You can make some small adaptations, though. depot_tools is mostly written in Python, and it actually bundles its own virtualenv support for using a specific python version. This isn’t strictly needed, so we can set the funny environment variable VPYTHON_BYPASS="manually managed python not supported by chrome operations" to just use python from the environment.

Sometimes depot_tools will want to run some prebuilt binaries. Usually on Guix this is anathema—we always build from source—but there’s only so much time in the day and the build system is not our circus, not our monkeys. So we get Guix to set up the environment using a container in --emulate-fhs mode; this lets us run third-party prebuilt binaries. Note, these binaries are indeed free software! We can run them just fine if we trust Google, which you have to when working on V8.

no, really, get the code

Enough with the introduction. The first thing to do is to check out depot_tools.

mkdir src
cd src
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

I’m assuming you have git in your Guix environment already.

Then you need to initialize depot_tools. For that you run a python script, which needs to run other binaries, so we need to make a specific environment in which it can run. This starts with a manifest of packages, conventionally placed in a file named manifest.scm in the project’s working directory. You don’t have a working directory yet, though, so you can just write it into v8.scm or something, anywhere:

(use-modules (guix packages)
             (gnu packages gcc))

(concatenate-manifests
 (list
  (specifications->manifest
   '(
     "bash"
     "binutils"
     "clang-toolchain"
     "coreutils"
     "diffutils"
     "findutils"
     "git"
     "glib"
     "glibc"
     "glibc-locales"
     "grep"
     "less"
     "ld-gold-wrapper"
     "make"
     "nss-certs"
     "nss-mdns"
     "openssh"
     "patch"
     "pkg-config"
     "procps"
     "python"
     "python-google-api-client"
     "python-httplib2"
     "python-pyparsing"
     "python-requests"
     "python-tzdata"
     "sed"
     "tar"
     "wget"
     "which"
     "xz"
     ))
  (packages->manifest
   `((,gcc "lib")))))

Then, you guix shell -m v8.scm. But you actually need more than that, because we need to set up a container so that we can expose a standard /lib, /bin, and so on:

guix shell --container --network \
  --share=$XDG_RUNTIME_DIR --share=$HOME \
  --preserve=TERM --preserve=SSH_AUTH_SOCK \
  --emulate-fhs \
  --manifest=v8.scm

Let’s go through these options one by one.

  • --container: This is what lets us run pre-built binaries, because it uses Linux namespaces to remap the composed packages to /bin, /lib, and so on.

  • --network: Depot tools are going to want to download things, so we give them net access.

  • --share: By default, the container shares the current working directory with the “host”. But we need not only the checkout for V8 but also the sibling checkout for depot tools (more on this in a minute); let’s just share the whole home directory. We also share /run/user/1000, which is $XDG_RUNTIME_DIR; that lets us access the SSH agent, so we can check out over SSH.

  • --preserve: By default, the container gets a pruned environment. This lets us pass some environment variables through.

  • --emulate-fhs: The crucial piece that lets us bridge the gap between Guix and the world.

  • --manifest: Here we specify the list of packages to use when composing the environment.

We can use short arguments to make this a bit less verbose:

guix shell -CNF --share=$XDG_RUNTIME_DIR --share=$HOME \
  -ETERM -ESSH_AUTH_SOCK -m v8.scm

I would like it if all of these arguments could somehow be optional, so that a bare guix shell invocation would just apply them when run in this directory. Perhaps some day.

Running guix shell like this drops you into a terminal. So let’s initialize depot tools:

cd $HOME/src
export VPYTHON_BYPASS="manually managed python not supported by chrome operations"
export PATH=$HOME/src/depot_tools:$PATH
export SSL_CERT_DIR=/etc/ssl/certs/
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
gclient

This should download a bunch of things, I don’t know what. But at this point we’re ready to go:

fetch v8

This checks out V8, which is about 1.3 GB, and then probably about as much again in dependencies.

build v8

You can build V8 directly:

# note caveat below!
cd v8
tools/dev/gm.py x64.release

This will build fine... and then fail to link. The precise reason is obscure to me: it would seem that by default, V8 uses a whole Debian sysroot for Some Noble Purpose, and ends up linking against it. But it compiles against system glibc, which seems to have replaced fcntl64 with a versioned symbol, or some such nonsense. It smells like V8 built against a too-new glibc and then failed trying to link to an old glibc.

To fix this, you need to go into the args.gn that was generated in out/x64.release and then add use_sysroot = false, so that it links to system glibc instead of the downloaded one.

echo 'use_sysroot = false' >> out/x64.release/args.gn
tools/dev/gm.py x64.release

You probably want to put the commands needed to set up your environment into some shell scripts. For Guix you could make guix-env (here I assume you’ve moved the manifest into your working directory under its conventional name, manifest.scm):

#!/bin/sh
guix shell -CNF --share=$XDG_RUNTIME_DIR --share=$HOME \
  -ETERM -ESSH_AUTH_SOCK -m manifest.scm -- "$@"

Then inside the container you need to set the PATH and such, so we could put this into the V8 checkout as env:

#!/bin/sh
# Look for depot_tools in sibling directory.
depot_tools=`cd $(dirname $0)/../depot_tools && pwd`
export PATH=$depot_tools:$PATH
export VPYTHON_BYPASS="manually managed python not supported by chrome operations"
export SSL_CERT_DIR=/etc/ssl/certs/
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
exec "$@"

This way you can run ./guix-env ./env tools/dev/gm.py x64.release and not have to “enter” the container so much.

notes

This all works fine enough, but I do have some meta-reflections.

I would prefer it if I didn’t have to use containers, for two main reasons. One is that the resulting build artifacts have to be run in the container, because they are dynamically linked to e.g. /lib, at least for the ELF loader. It would be better if I could run them on the host (with the host debugger, for example). Using Guix to make the container is better than e.g. docker, though, because I can ensure that the same tools are available in the guest as I use on the host. But also, I don’t like adding “modes” to my terminals: are you in or out of this or that environment. Being in a container is not like being in a vanilla guix shell, and that’s annoying.

The build process uses many downloaded tools and artifacts, including clang itself. This is a feature, in that I am using the same compiler that colleagues at Google use, which is important. But it’s also annoying and it would be nice if I could choose. (Having the same clang-format though is an absolute requirement.)

Two tests fail in this configuration; the failures are somehow related to time zones. I have no idea why, but I just ignore them.

If the build system were any weirder, I would think harder about maybe using Docker or something like that. Colleagues point to distrobox as being a useful wrapper. It is annoying though, because such a docker image becomes like a little stateful thing to do sysadmin work on, and I would like to avoid that if I can.

Welp, that’s all for today. Hopefully if you are contemplating installing Guix as your operating system (rather than just in user-space), this can give you a bit more information as to what it might mean when working on third-party projects. Happy hacking and until next time!

by Andy Wingo at March 26, 2024 11:51 AM

March 05, 2024

José Dapena

Maintaining downstreams of Chromium: why downstream?

Chromium, the open source web browser project that Google Chrome is based on, can nowadays be considered the reference implementation of the web platform. As such, it is the first choice when implementing the web platform in a software platform or product.

Why is that? In this blog post I am going to introduce the topic, and then review the different reasons why various projects use a downstream of Chromium.

A series of blog posts

This is the first of a series of blog posts in which I will go through several aspects and challenges of maintaining a downstream project that uses Chromium as its upstream.

They are mostly based on discussions at two events: first, the Web Engines Hackfest 2023 breakout session with the same title, which I chaired in A Coruña; and then my BlinkOn 18 talk in November 2023, in Sunnyvale.

Some definitions

Before starting the discussion of the different aspects, let’s clarify how I will use several terms.

Repository vs. project

I am going to use repository to refer strictly to the version-controlled storage of code. A project (specifically, a software project) is then the community of people that share goals and some kind of organization in order to maintain one or several software products.

So a project may use several repositories for its goals.

In this discussion I will talk about Chromium, an open source project that implements a web platform user agent, that is, a web browser, for different platforms. As such, it uses a number of repositories (src, v8 and more).

Downstream and upstream

I will use the terms downstream and upstream to refer to the relationship between the version control repositories of different software projects.

If there is a software project repository (typically open source), and a new repository is created that contains all or part of the original repository, then:

  • The upstream project is the one based on the original repository.
  • The downstream project is the one based on the new repository.

It is important to highlight that different things can happen to the downstream repository:

  • The copy could be a one-time event, so that the downstream repository becomes an independent fork, with no interest in tracking upstream evolution. This often happens with abandoned repositories, where a different set of people starts an independent project; but there can be other reasons.
  • There can be the intent to track upstream changes: the downstream repository then evolves as the upstream repository does, but with some specific differences maintained on top of the original repository.

Why use Chromium?

Nowadays, the web platform is a solid alternative for providing content to users. It allows modern user interfaces, based on well-known standards, that integrate well with local and remote services. The gap between native applications and web content has been reduced, so the web is quite often a good alternative.

But to integrate web content, product integrators need an implementation of the web platform. It is no surprise that Chromium is the most widely used, for a number of reasons:

  • It is open source, with a license that allows adapting it to new product needs.
  • It is well maintained and up to date, even pushing improvements through standardization continuously.
  • It is secure, from the point of view of both its architecture and its maintenance model.
  • It provides integration points to tailor the implementation to one’s needs.
  • It supports the most popular software platforms (Windows, Android, Linux, …) for integrating new products.
  • On top of the web platform itself, it provides an implementation of many of the components required to build a modern web browser.

Still, there are other good choices for integrating the web, such as WebKit (especially WPE for embedded use cases), or the system-provided web components (the Android or iOS web view, …).

In this blog post, though, I will focus on the Chromium case.

Why downstream Chromium?

But why do different projects need to use downstream Chromium repositories?

The main, simple reason a project needs a downstream repository is to carry changes that are not upstream. This can be for a variety of reasons:

  • Downstream changes that upstream will not accept, e.g. because they would make the upstream project harder to maintain, or because they would not be tested often.
  • Downstream changes that the downstream project does not want to contribute upstream.

Let’s see some examples of both types of changes.

Hardware and OS adaptation

This is when a downstream adds support for a hardware target or OS that is not originally supported by the upstream Chromium project.

Upstream Chromium provides an abstraction layer for that purpose, named Ozone, which allows adapting it to the OS, desktop environment, and system graphics compositor. There are also other abstraction layers for media acceleration, accessibility, and input methods.

The Wayland protocol adaptation started as a downstream effort, as upstream Chromium did not intend to support Wayland at that time. Eventually it evolved into an official upstream Ozone backend.

An example? The LGE webOS Chromium port.

Differentiation

The previous case essentially forces you to have a downstream project or repository. But there are also cases where this is intended: there is the will to have some features in the downstream repository and not in upstream, an intended differentiation.

Why would anybody want that? Some typical examples:

  • A set of features that the downstream project owners consider to make the project better in some way, and that they want to keep downstream. This can happen when a new browser is shipped with features that make the product offering different from, and in some ways better than, upstream Chrome. That can be a different user experience, some security features, better privacy…
  • Adaptation to a different product brand. Each browser or browser-based product will want to carry its specific brand instead of the upstream Chromium brand.

Examples of this:

  • Brave browser, with completely different privacy and security choices.
  • ARC browser, with an innovative user experience.
  • Microsoft Edge, with tight Windows OS integration and corporate features.

Hybrid application runtimes

And one last interesting case: integrating the web platform for developing hybrid applications, those that mix parts of the user interface implemented in a native toolkit with parts implemented using the web platform.

Though Chromium includes official support for hybrid applications on Android, with the Android Web View, other toolkits also provide support for web applications, and in Chromium’s case, integrating with those toolkits belongs to downstream projects.

Examples?

What’s next?

In this blog post I presented different reasons why projects end up maintaining a downstream fork of Chromium.

In the next blog post I will present one of the main challenges when maintaining a downstream of Chromium: the different rebase and upgrade strategies.

by José Dapena Paz at March 05, 2024 03:54 PM

January 05, 2024

Andy Wingo

v8's precise field-logging remembered set

A remembered set is used by a garbage collector to identify graph edges between partitioned sub-spaces of a heap. The canonical example is in generational collection, where you allocate new objects in newspace, and eventually promote survivor objects to oldspace. If most objects die young, we can focus GC effort on newspace, to avoid traversing all of oldspace all the time.

Collecting a subspace instead of the whole heap is sound if and only if we can identify all live objects in the subspace. We start with some set of roots that point into the subspace from outside, and then traverse all links in those objects, but only to other objects within the subspace.

The roots are, like, global variables, and the stack, and registers; and in the case of a partial collection in which we identify live objects only within newspace, also any link into newspace from other spaces (oldspace, in our case). This set of inbound links is a remembered set.

There are a few strategies for maintaining a remembered set. Generally speaking, you start by implementing a write barrier that intercepts all stores in a program. Instead of:

obj[slot] := val;

You might abstract this away:

write_slot(obj, sizeof obj, &obj[slot], val);

As you can see, it’s quite an annoying transformation to do by hand; typically you will want some sort of language-level abstraction that lets you keep the more natural syntax. C++ can do this pretty well, or if you are implementing a compiler, you just add this logic to the code generator.

Then the actual write barrier... well, its implementation is twingled up with the implementation of the remembered set. The simplest variant is a card-marking scheme, whereby the heap is divided into equal-sized, power-of-two-sized cards, and each card has a bit. If the heap is also divided into blocks (say, 2 MB in size), then you might divide those blocks into 256-byte cards, yielding 8192 cards per block. A barrier might look like this:

void write_slot(ObjRef obj, size_t size,
                SlotAddr slot, ObjRef val) {
  *slot = val; // Start with the store.

  uintptr_t block_size = 1<<21;
  uintptr_t card_size = 1<<8;
  uintptr_t cards_per_block = block_size / card_size;

  uintptr_t obj_addr = (uintptr_t) obj;
  uintptr_t card_idx = (obj_addr / card_size) % cards_per_block;

  // Assume remset allocated at block start.
  uintptr_t block_start = obj_addr & ~(block_size - 1);
  uint32_t *cards = (uint32_t *) block_start;

  // Set the card's bit.
  cards[card_idx / 32] |= 1u << (card_idx % 32);
}

Then when marking the new generation, you visit all cards, and for all marked cards, trace all outbound links in all live objects that begin on the card.
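Concretely, the card scan can proceed one bitmap word at a time. Here is a hedged sketch of that loop, not V8’s or Whippet’s actual code: the trace_card callback is hypothetical, standing in for re-scanning the objects that begin on a card, and __builtin_ctz is the GCC/Clang count-trailing-zeros builtin.

#include <stddef.h>
#include <stdint.h>

void scan_marked_cards(uint32_t *cards, size_t cards_per_block,
                       void (*trace_card)(size_t card_idx)) {
  for (size_t word = 0; word < cards_per_block / 32; word++) {
    uint32_t bits = cards[word];
    cards[word] = 0; // Clear the cards as we go.
    while (bits) {
      size_t bit = __builtin_ctz(bits); // Index of lowest set bit.
      trace_card(word * 32 + bit);
      bits &= bits - 1; // Clear the lowest set bit and continue.
    }
  }
}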

Card-marking is simple to implement and simple to statically allocate as part of the heap. Finding marked cards takes time proportional to the size of the heap, but you hope that the constant factors and SIMD minimize this cost. However iterating over objects within a card can be costly. You hope that there are few old-to-new links but what do you know?

In Whippet I have been struggling a bit with sticky-mark-bit generational marking, in which new and old objects are not spatially partitioned. Sometimes generational collection is a win, but in benchmarking I find that often it isn’t, and I think Whippet’s card-marking barrier is at fault: it is simply too imprecise. Consider firstly that our write barrier applies to stores to slots in all objects, not just those in oldspace; a store to a new object will mark a card, but that card may contain old objects which would then be re-scanned. Or consider a store to an old object in a more dense part of oldspace; scanning the card may incur more work than needed. It could also be that Whippet is being too aggressive at re-using blocks for new allocations, where it should be limiting itself to blocks that are very sparsely populated with old objects.

what v8 does

There is a tradeoff in write barriers between the overhead imposed on stores, the size of the remembered set, and the precision of the remembered set. Card-marking is relatively low-overhead and usually small as a fraction of the heap, but not very precise. It would be better if a remembered set recorded objects, not cards. And it would be even better if it recorded slots in objects, not just objects.

V8 takes this latter strategy: it has per-block remembered sets which record slots containing “interesting” links. All of the above words were to get here, to take a brief look at its remembered set.

The main operation is RememberedSet::Insert. It takes the MemoryChunk (a block, in our language from above) and the address of a slot in the block. Each block has a remembered set; in fact, six remembered sets for some reason. The remembered set itself is a SlotSet, whose interesting operations come from BasicSlotSet.

The structure of a slot set is a bitvector partitioned into equal-sized, possibly-empty buckets. There is one bit per slot in the block, so in the limit the size overhead for the remembered set may be 3% (1/32, assuming compressed pointers). Currently each bucket is 1024 bits (128 bytes), plus the 4 bytes for the bucket pointer itself.

Inserting into the slot set will first allocate a bucket (using C++ new) if needed, then load the “cell” (32-bit integer) containing the slot. There is a template parameter declaring whether this is an atomic or normal load. Finally, if the slot bit in the cell is not yet set, V8 will set the bit, possibly using atomic compare-and-swap.
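To make the shape concrete, here is a minimal sketch of such a slot set in C, assuming 4-byte compressed-pointer slots and 2 MB blocks. The names are mine, not V8’s, and the plain-store bucket installation is a simplification: as noted above, the real thing allocates buckets with C++ new and can install bits with atomic compare-and-swap.

#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define SLOT_SIZE 4 /* compressed pointers */
#define BITS_PER_BUCKET 1024
#define CELLS_PER_BUCKET (BITS_PER_BUCKET / 32)
#define BLOCK_SIZE (1 << 21)
#define BUCKETS_PER_BLOCK (BLOCK_SIZE / SLOT_SIZE / BITS_PER_BUCKET)

typedef struct {
  /* One pointer per bucket; buckets are allocated lazily. */
  _Atomic uint32_t *buckets[BUCKETS_PER_BLOCK];
} slot_set;

void slot_set_insert(slot_set *set, uintptr_t slot_offset_in_block) {
  size_t bit = slot_offset_in_block / SLOT_SIZE;
  size_t bucket = bit / BITS_PER_BUCKET;
  size_t cell = (bit % BITS_PER_BUCKET) / 32;
  uint32_t mask = 1u << (bit % 32);

  _Atomic uint32_t *cells = set->buckets[bucket];
  if (!cells) {
    /* Simplified: the real thing installs the bucket atomically. */
    cells = calloc(CELLS_PER_BUCKET, sizeof *cells);
    set->buckets[bucket] = cells;
  }
  /* Check before setting, to avoid redundant atomic writes. */
  if (!(atomic_load_explicit(&cells[cell], memory_order_relaxed) & mask))
    atomic_fetch_or_explicit(&cells[cell], mask, memory_order_relaxed);
}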

In the language of Blackburn’s Design and analysis of field-logging write barriers, I believe this is a field-logging barrier, rather than the bit-stealing slot barrier described by Yang et al in the 2012 Barriers Reconsidered, Friendlier Still!. Unlike Blackburn’s field-logging barrier, however, this remembered set is implemented completely on the side: there is no in-object remembered bit, nor remembered bits for the fields.

On the one hand, V8’s remembered sets are precise. There are some tradeoffs, though: they require off-managed-heap dynamic allocation for the buckets, and traversing the remembered sets takes time proportional to the whole heap size. And, should V8 ever switch its minor mark-sweep generational collector to use sticky mark bits, the lack of a spatial partition could lead to similar problems as I am seeing in Whippet. I will be interested to see what they come up with in this regard.

Well, that’s all for today. Happy hacking in the new year!

by Andy Wingo at January 05, 2024 09:44 AM

December 14, 2023

Jacobo Aragunde

Setting up a minimal, command-line Android emulator on Linux

Android has all kinds of nice development tools, but sometimes you just want to run an apk and don’t need all the surrounding tooling. In my case, I already have my Chromium setup, which can produce binaries for several platforms, including Android.

I usually test on a physical device, a smartphone, but I would like to try a device with a tablet form factor, and I don’t have one at hand.

Chromium provides different user experiences for smartphones and tablets.

I’ve set up the most stripped-down environment possible to run an Android emulator, using the standard tools provided by the platform. Notice these are generic instructions, not tied to Chromium tooling, even though I’m doing this mainly to run Chromium.

The first step is to download the latest version of the command line tools, instead of Android Studio, from the Android developer website. We will extract the tools to the location ~/Android/Sdk, in the path where they like to find themselves:

mkdir -p ~/Android/Sdk/cmdline-tools
unzip -d ~/Android/Sdk/cmdline-tools ~/Downloads/commandlinetools-linux-10406996_latest.zip 
mv ~/Android/Sdk/cmdline-tools/cmdline-tools/ ~/Android/Sdk/cmdline-tools/latest

Now that we have the tools installed in ~/Android/Sdk/cmdline-tools/latest, we need to add their binaries to the path. We will also add other paths, for tools we are about to install, with the command:

export PATH=~/Android/Sdk/cmdline-tools/latest/bin:~/Android/Sdk/emulator:~/Android/Sdk/platform-tools:$PATH

Run the command above for a one-time use, or add it to ~/.bashrc or your preferred shell configuration for future use.

You will also need to set up another envvar:

export ANDROID_HOME=~/Android/Sdk

Again, run it once for a one-time use or add it to your shell configuration in ~/.bashrc or equivalent.

Now we use the tool sdkmanager, which came with Android’s command-line tools, to install the rest of the software we need. All of this will be installed inside your $ANDROID_HOME, so in ~/Android/Sdk.

sdkmanager "emulator" "platform-tools" "platforms;android-29" "system-images;android-29;google_apis;x86_64"

In the command above I have installed the emulator, the platform tools (adb and friends), and the system libraries and a system image for Android 10 (API level 29). You may check what else is available with sdkmanager --list; you will find multiple variants of the Android platform and system images.

Finally, we have to set up a virtual machine, called an AVD (“Android Virtual Device”) in Android jargon. We do that with:

avdmanager -v create avd -n Android_API_29_Google -k "system-images;android-29;google_apis;x86_64" -d pixel_c

Here I have created a virtual device called “Android_API_29_Google” with the system image I had downloaded, and the form factor and screen size of a Pixel C, which is a tablet. You may get a list of all the devices that can be simulated with avdmanager list device.

Now we can already start an emulator running the AVD with the name we have just chosen, with:

emulator -avd Android_API_29_Google

The files and configuration for this AVD are stored in ~/.android/avd/Android_API_29_Google.avd, or a sibling directory if you used a different name. Configuration is in the config.ini file; here you can set many options you would normally configure from Android Studio, and even more. I recommend changing at least this value, for your convenience:

hw.keyboard = yes

Otherwise you will find yourself typing URLs with a mouse on a virtual keyboard… Not fun.

Finally, your software will be up to date after a fresh install but, when new versions are released, you will be able to install them with:

sdkmanager --update

Make sure to run this every now and then!

At this point, your setup should be ready, and all the tools you need are in the PATH. Feel free to reuse the commands above to create any number of virtual devices with different Android system images, different form factors… Happy hacking!

by Jacobo Aragunde Pérez at December 14, 2023 05:00 PM

December 08, 2023

Andy Wingo

v8's mark-sweep nursery

Today, a followup to yesterday’s note with some more details on V8’s new young-generation implementation, minor mark-sweep or MinorMS.

A caveat again: these observations are just from reading the code; I haven’t run these past the MinorMS authors yet, so any of these details might be misunderstandings.

The MinorMS nursery consists of pages, each of which is 256 kB, unless huge-page mode is on, in which case they are 2 MB. The total size of the nursery is 72 MB by default, or 144 MB if pointer compression is off.

There can be multiple threads allocating into the nursery, but let’s focus on the main allocator, which is used on the main thread. Nursery allocation is bump-pointer, whether in a MinorMS page or scavenger semi-space. Bump-pointer regions are called linear allocation buffers, and often abbreviated as Lab in the source, though the class is LinearAllocationArea.

If the current bump-pointer region is too small for the current allocation, the nursery implementation finds another one, or triggers a collection. In the MinorMS nursery, each page collects its set of allocatable spans into a free-list; if the free-list is non-empty, the allocator pops off one entry as the current Lab and tries again.
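The fast path here is plain bump-pointer allocation. A minimal sketch, with illustrative names rather than V8’s actual MainAllocator interface:

#include <stddef.h>
#include <stdint.h>

typedef struct {
  uintptr_t top;   /* Next free byte in the buffer. */
  uintptr_t limit; /* One past the end of the buffer. */
} lab;

/* On success, just slide the top pointer. On failure, the caller
   pops a new span off the page's free-list, finds a swept page, or
   triggers a collection, as described in the surrounding text. */
void *lab_allocate(lab *l, size_t bytes) {
  if (l->limit - l->top < bytes)
    return NULL;
  void *result = (void *) l->top;
  l->top += bytes;
  return result;
}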

Otherwise, MinorMS needs another page, and specifically a swept page: a page which has been visited since the last GC, and whose spans of unused memory have been collected into a free-list. There is a concurrent sweeping task which should usually run ahead of the mutator, but if there is no swept page available, the allocator might need to sweep some. This logic is in MainAllocator::RefillLabMain.

Finally, if all pages are swept and there’s no Lab big enough for the current allocation, we trigger collection from the roots. The initial roots are the remembered set: pointers from old objects to new objects. Most of the trace happens concurrently with the mutator; when the nursery utilisation rises over 90%, V8 will kick off concurrent marking tasks.

Then once the mutator actually runs out of space, it pauses, drains any pending marking work, marks conservative roots, then drains again. I am not sure whether MinorMS with conservative stack scanning visits the whole C/C++ stacks or whether it manages to install some barriers (i.e. “don’t scan deeper than 5 frames because we collected then, and so all older frames are older”); dunno. All of this logic is in MinorMarkSweepCollector::MarkLiveObjects.

Marking traces the object graph, setting object mark bits. It does not trace pages. However, the MinorMS space promotes in units of pages. So how to decide what pages to promote? The answer is that sweeping partitions the MinorMS pages into empty, recycled, aging, and promoted pages.

Empty pages have no surviving objects, and are very useful because they can be given back to the operating system if needed or shuffled around elsewhere in the system. If they are re-used for allocation, they do not need to be swept.

Recycled pages have some survivors, but not many; MinorMS keeps the page around for allocation in the next cycle, because it has enough empty space. By default, a page is recyclable if it has 50% or more free space after a minor collection, or 30% after a major collection. MinorMS also promotes a page eagerly if, in the last cycle, we only managed to allocate into 30% or less of its empty space, probably due to fragmentation. These pages need to be swept before re-use.

Finally, MinorMS doesn’t let pages be recycled indefinitely: after 4 minor cycles, a page goes into the aging pool, in which it is kept unavailable for allocation for one cycle, but is not yet promoted. This allows any new allocations made on that page in the previous cycle to age out and probably die, preventing premature tenuring.
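Putting the policy together: here is a hedged sketch of the per-page decision after sweeping. The thresholds mirror the defaults mentioned above, but the function and its parameters are illustrative, not V8’s code.

typedef enum { PAGE_EMPTY, PAGE_RECYCLED, PAGE_AGING, PAGE_PROMOTED } page_fate;

page_fate classify_page(double free_fraction,      /* after sweeping */
                        double allocated_fraction, /* of last cycle's free space */
                        int was_minor_gc,
                        int minor_cycles_survived) {
  if (free_fraction == 1.0) return PAGE_EMPTY;
  double threshold = was_minor_gc ? 0.5 : 0.3; /* 50% minor, 30% major */
  if (free_fraction < threshold) return PAGE_PROMOTED;
  if (allocated_fraction <= 0.3) return PAGE_PROMOTED; /* fragmented */
  if (minor_cycles_survived >= 4) return PAGE_AGING;
  return PAGE_RECYCLED;
}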

And that’s it. Next time, a note on a way in which generational collectors can run out of memory. Have a nice weekend, hackfolk!

by Andy Wingo at December 08, 2023 02:34 PM

December 07, 2023

Andy Wingo

the last 5 years of V8's garbage collector

Captain, status report: I’m down here in a Jeffries tube, poking at V8’s garbage collector. Despite having worked on other areas of the project recently, I find that V8 is now so large that it’s necessary to ignore whole subsystems when working on any given task. But now I’m looking at the GC in anger: what is its deal? What does V8’s GC even look like these days?

The last public article on the structure of V8’s garbage collector was in 2019; fine enough, but dated. Now in the evening of 2023 I think it could be useful to revisit it and try to summarize the changes since then. At least, it would have been useful to me had someone else written this article.

To my mind, work on V8’s GC has had three main goals over the last 5 years: improving interactions between the managed heap and C++, improving security, and increasing concurrency. Let’s visit these in turn.

C++ and GC

Building on the 2018 integration of the Oilpan tracing garbage collector into the Blink web engine, there was some refactoring to move the implementation of Oilpan into V8 itself. Oilpan is known internally as cppgc.

I find the cppgc name a bit annoying because I can never remember what it refers to, because of the other thing that has been happening in C++ integration: a migration away from precise roots and instead towards conservative root-finding.

Some notes here: with conservative stack scanning, we can hope for better mutator throughput and fewer bugs. The throughput comes from not having to put all live pointers in memory; the compiler can keep them in registers, and avoid managing the HandleScope. You may be able to avoid the compile-time and space costs of stack maps (side tables telling the collector where the pointers are). There are also two classes of bug that we can avoid: holding on to a handle past the lifetime of a handlescope, and holding on to a raw pointer (instead of a handle) during a potential GC point.

Somewhat confusingly, it would seem that conservative stack scanning has garnered the acronym “CSS” inside V8. What does CSS have to do with GC?, I ask. I know the answer but my brain keeps asking the question.

In exchange for this goodness, conservative stack scanning means that because you can’t be sure that a word on the stack refers to an object and isn’t just a spicy integer, you can’t move objects that might be the target of a conservative root. And indeed the conservative edge might actually not point to the start of the object; it could be an interior pointer, which places additional constraints on the heap, that it be able to resolve internal pointers.
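In outline, conservative stack scanning looks something like the following generic sketch, which is not V8’s code; the is_heap_address and pin_and_mark helpers are hypothetical, and a real implementation also has to resolve interior pointers, as just noted.

#include <stdint.h>

void scan_stack_conservatively(uintptr_t *stack_top, uintptr_t *stack_base,
                               int (*is_heap_address)(uintptr_t),
                               void (*pin_and_mark)(uintptr_t)) {
  /* Stacks grow down, so scan from the current top up to the base. */
  for (uintptr_t *p = stack_top; p < stack_base; p++) {
    uintptr_t word = *p;
    /* The word might be a spicy integer rather than a pointer, so
       the object it appears to reference is pinned, not moved. */
    if (is_heap_address(word))
      pin_and_mark(word);
  }
}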

Security

Which brings us to security and the admirable nihilism of the sandbox effort. The idea is that everything is terrible, so why not just assume that no word is safe and that an attacker can modify any word they can address. The only way to limit the scope of an attacker’s modifications is then to limit the address space. This happens firstly by pointer compression, which happily also has some delightful speed and throughput benefits. Then the pointer cage is placed within a larger cage, and off-heap data such as Wasm memories and array buffers go in that larger cage. Any needed executable code or external object is accessed indirectly, through dedicated tables.
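As a brief aside on the first of those mechanisms: pointer compression stores on-heap pointers as 32-bit offsets from a cage base, so even a fully attacker-controlled field can only address memory within the cage. A generic sketch of the idea, not V8’s actual implementation:

#include <stdint.h>

/* The cage base is a suitably aligned address chosen at startup. */
static uintptr_t cage_base;

typedef uint32_t compressed_ptr; /* An on-heap pointer field. */

static inline void *decompress(compressed_ptr field) {
  /* Whatever the field holds, the result lies inside the cage. */
  return (void *) (cage_base + field);
}

static inline compressed_ptr compress(void *ptr) {
  return (compressed_ptr) (uintptr_t) ptr; /* Keep the low 32 bits. */
}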

However, this indirection comes with a cost of a proliferation in the number of spaces. In the beginning, there was just an evacuating newspace, a mark-compact oldspace, and a non-moving large object space. Now there are closer to 20 spaces: a separate code space, a space for read-only objects, a space for trusted objects, a space for each kind of indirect descriptor used by the sandbox, in addition to spaces for objects that might be shared between threads, newspaces for many of the preceding kinds, and so on. From what I can see, managing this complexity has taken a significant effort. The result is pretty good to work with, but you pay for what you get. (Do you get security guarantees? I don’t know enough to say. Better pay some more to be sure.)

Finally, the C++ integration has also had an impact on the spaces structure, and with a security implication to boot. The thing is, conservative roots can’t be moved, but the original evacuating newspace required moveability. One can get around this restriction by pretenuring new allocations from C++ into the mark-compact space, but this would be a performance killer. The solution that V8 is going for is to use the block-structured mark-compact space that is already used for the old-space, but for new allocations. If an object is ever traced during a young-generation collection, its page will be promoted to the old generation, without copying. Originally called minor mark-compact or MinorMC in the commit logs, it was renamed to minor mark-sweep or MinorMS to indicate that it doesn’t actually compact. (V8’s mark-compact old-space doesn’t have to compact: V8 usually chooses to just mark in place. But we call it a mark-compact space because it has that capability.)

This last change is a performance hazard: yes, you keep the desirable bump-pointer allocation scheme for new allocations, but you lose on locality in the old generation, and the rate of promoted bytes will be higher than with the semi-space new-space. The only relief is that for a given new-space size, you can allocate twice as many objects, because you don’t need the copy reserve.

Why do I include this discussion in the security section? Well, because most MinorMS commits mention this locked bug. One day we’ll know, but not today. I speculate that evacuating is just too rich a bug farm, especially with concurrency and parallel mutators, and that never-moving collectors will have better security properties. But again, I don’t know for sure, and I prefer to preserve my ability to speculate rather than to ask for too many details.

Concurrency

Speaking of concurrency, ye gods, the last few years have been quite the ride I think. Every phase that can be done in parallel (multiple threads working together to perform GC work) is now fully parallel: semi-space evacuation, mark-space marking and compaction, and sweeping. Every phase that can be done concurrently (where the GC runs threads while the mutator is running) is concurrent: marking and sweeping. A major sweep task can run concurrently with an evacuating minor GC. And, V8 is preparing for multiple mutators running in parallel. It’s all a bit terrifying but again, with engineering investment and a huge farm of fuzzers, it seems to be a doable transition.

Concurrency and threads means that V8 has sprouted new schedulers: should a background task have incremental or concurrent marking? How many sweepers should a given isolate have? How should you pause concurrency when the engine needs to do something gnarly?

The latest in-progress work would appear to be concurrent marking of the new-space. I think we should expect this work to result in a lower overall pause-time, though I am curious also to learn more about the model: how precise is it? Does it allow a lot of slop to get promoted? It seems to have a black allocator, so there will be some slop, but perhaps it can avoid promotion for those pages. I don’t know yet.

Summary

Yeah, GCs, man. I find the move to a non-moving young generation is quite interesting and I wish the team luck as they whittle down the last sharp edges from the conservative-stack-scanning performance profile. The sandbox is pretty fun too. All good stuff and I look forward to spending a bit more time with it; engineering out.

by Andy Wingo at December 07, 2023 12:15 PM