Planet Igalia Chromium (https://planet.igalia.com/chromium/atom.xml), updated 2024-03-28T16:00:06+00:00

Andy Wingo: hacking v8 with guix, bis (https://wingolog.org/2024/03/26/hacking-v8-with-guix-bis, 2024-03-26T11:51:53+00:00)
<div><p>Good day, hackers. Today, a pragmatic note, on hacking on <a href="https://v8.dev/">V8</a> from a
<a href="https://guix.gnu.org/">Guix</a> system.</p><p>I’m going to skip a lot of the background because, as it turns out, I
<a href="https://wingolog.org/archives/2015/08/04/developing-v8-with-guix">wrote about this already almost a decade
ago</a>.
But following that piece, I mostly gave up on doing V8 hacking from a
Guix machine—it was more important to just go with the flow of the
ever-evolving upstream toolchain. In fact, I ended up installing Ubuntu
LTS on my main workstations for precisely this reason, which has worked
fine; I still get Guix in user-space, which is better than nothing.</p><p>Since then, though, Guix has grown to the point that it’s easier to
create an environment that can run a complicated upstream source
management project like V8’s. This is mainly <a href="https://guix.gnu.org/manual/en/html_node/Invoking-guix-shell.html"><tt>guix shell</tt></a>
in the <tt>--container --emulate-fhs</tt> mode. This article is a step-by-step
for how to get started with V8 hacking using Guix.</p><h3>get the code</h3><p>You would think this would be the easy part: just <tt>git clone</tt> the V8
source. But no, the build wants a number of other Google-hosted
dependencies to be vendored into the source tree. To perform the
initial fetch for those dependencies and to keep them up to date, you
use helpers from the
<a href="https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/depot_tools_tutorial.html"><tt>depot_tools</tt></a>
project. You also use <tt>depot_tools</tt> to submit patches to code review.</p><p>When you live in the Guix world, you might be tempted to look into what
<tt>depot_tools</tt> actually does, and to replicate its functionality in a
more minimal, Guix-like way. Which, sure, perhaps this is a good
approach for <i>packaging</i> V8 or Chromium or something, but when you want
to work <i>on</i> V8, you need to learn some humility and just go with the
flow. (It’s hard for the kind of person that uses Guix. But it’s what
you do.)</p><p>You can make some small adaptations, though. <tt>depot_tools</tt> is mostly
written in Python, and it actually bundles its own <tt>virtualenv</tt> support
for using a specific python version. This isn’t strictly needed, so we
can set the funny environment variable <tt>VPYTHON_BYPASS="manually managed python not supported by chrome operations"</tt> to just use python from the
environment.</p><p>Sometimes <tt>depot_tools</tt> will want to run some prebuilt binaries.
Usually on Guix this is anathema—we always build from source—but there’s
only so much time in the day and the build system is not our circus, not
our monkeys. So we get Guix to set up the environment using a container
in
<a href="https://guix.gnu.org/manual/en/html_node/Invoking-guix-shell.html#index-FHS-_0028file-system-hierarchy-standard_0029"><tt>--emulate-fhs</tt></a>
mode; this lets us run third-party prebuilt binaries. Note, these
binaries are indeed free software! We can run them just fine if we
trust Google, which you have to when working on V8.</p><h3>no, really, get the code</h3><p>Enough with the introduction. The first thing to do is to check out
<tt>depot_tools</tt>.</p><pre>mkdir src
cd src
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
</pre><p>I’m assuming you have <tt>git</tt> in your Guix environment already.</p><p>Then you need to initialize <tt>depot_tools</tt>. For that you run a python
script, which needs to run other binaries – so we need to make a
specific environment in which it can run. This starts with a <i>manifest</i>
of packages, conventionally placed in a file named <tt>manifest.scm</tt> in
the project’s working directory; you don’t have a working directory yet,
though, so you can just write it into <tt>v8.scm</tt> or something, anywhere:</p><pre>(use-modules (guix packages)
             (gnu packages gcc))
(concatenate-manifests
 (list
  (specifications->manifest
   '("bash"
     "binutils"
     "clang-toolchain"
     "coreutils"
     "diffutils"
     "findutils"
     "git"
     "glib"
     "glibc"
     "glibc-locales"
     "grep"
     "less"
     "ld-gold-wrapper"
     "make"
     "nss-certs"
     "nss-mdns"
     "openssh"
     "patch"
     "pkg-config"
     "procps"
     "python"
     "python-google-api-client"
     "python-httplib2"
     "python-pyparsing"
     "python-requests"
     "python-tzdata"
     "sed"
     "tar"
     "wget"
     "which"
     "xz"))
  (packages->manifest
   `((,gcc "lib")))))
</pre><p>Then, you <tt>guix shell -m v8.scm</tt>. But you actually do more than that,
because we need to set up a container so that we can expose a standard
<tt>/lib</tt>, <tt>/bin</tt>, and so on:</p><pre>guix shell --container --network \
--share=$XDG_RUNTIME_DIR --share=$HOME \
--preserve=TERM --preserve=SSH_AUTH_SOCK \
--emulate-fhs \
--manifest=v8.scm
</pre><p>Let’s go through these options one by one.</p><ul><li><p><tt>--container</tt>: This is what lets us run pre-built binaries, because
it uses Linux namespaces to remap the composed packages to <tt>/bin</tt>,
<tt>/lib</tt>, and so on.</p></li><li><p><tt>--network</tt>: Depot tools are going to want to download things, so we
give them net access.</p></li><li><p><tt>--share</tt>: By default, the container shares the current working
directory with the “host”. But we need not only the checkout for V8
but also the sibling checkout for depot tools (more on this in a
minute); let’s just share the whole home directory. Also, we share
the <tt>/run/user/1000</tt> directory, which is <tt>$XDG_RUNTIME_DIR</tt>, which
lets us access the SSH agent, so we can check out over SSH.</p></li><li><p><tt>--preserve</tt>: By default, the container gets a pruned environment.
This lets us pass some environment variables through.</p></li><li><p><tt>--emulate-fhs</tt>: The crucial piece that lets us bridge the gap
between Guix and the world.</p></li><li><p><tt>--manifest</tt>: Here we specify the list of packages to use when
composing the environment.</p></li></ul><p>We can use short arguments to make this a bit less verbose:</p><pre>guix shell -CNF --share=$XDG_RUNTIME_DIR --share=$HOME \
-ETERM -ESSH_AUTH_SOCK -m v8.scm
</pre><p>I would like it if all of these arguments could somehow be optional,
that I could get a bare <tt>guix shell</tt> invocation to just apply them, when
run in this directory. Perhaps some day.</p><p>Running <tt>guix shell</tt> like this drops you into a terminal. So let’s
initialize depot tools:</p><pre>cd $HOME/src
export VPYTHON_BYPASS="manually managed python not supported by chrome operations"
export PATH=$HOME/src/depot_tools:$PATH
export SSL_CERT_DIR=/etc/ssl/certs/
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
gclient
</pre><p>This should download a bunch of things, I don’t know what. But at this
point we’re ready to go:</p><pre>fetch v8
</pre><p>This checks out V8, which is about 1.3 GB, and then probably about as
much again in dependencies.</p><h3>build v8</h3><p>You can build V8 directly:</p><pre># note caveat below!
cd v8
tools/dev/gm.py x64.release
</pre><p>This will build fine... and then fail to link. The precise reason is obscure to me: it would seem
that by default, V8 uses a whole Debian sysroot for Some Noble Purpose, and ends up linking against it. But it
compiles against system glibc, which seems to have replaced <tt>fcntl64</tt>
with a versioned symbol, or some such nonsense. It smells like V8 built
against a too-new glibc and then failed trying to link to an old glibc.</p><p>To fix this, you need to go into the <tt>args.gn</tt> that was generated in
<tt>out/x64.release</tt> and then add <tt>use_sysroot = false</tt>, so that it links
to system glibc instead of the downloaded one.</p><pre>echo 'use_sysroot = false' >> out/x64.release/args.gn
tools/dev/gm.py x64.release
</pre><p>You probably want to put the commands needed to set up your environment
into some shell scripts. For Guix you could make <tt>guix-env</tt>:</p><pre>#!/bin/sh
guix shell -CNF --share=$XDG_RUNTIME_DIR --share=$HOME \
-ETERM -ESSH_AUTH_SOCK -m v8.scm -- "$@"
</pre><p>Then inside the container you need to set the <tt>PATH</tt> and such, so we
could put this into the V8 checkout as <tt>env</tt>:</p><pre>#!/bin/sh
# Look for depot_tools in sibling directory.
depot_tools=`cd $(dirname $0)/../depot_tools && pwd`
export PATH=$depot_tools:$PATH
export VPYTHON_BYPASS="manually managed python not supported by chrome operations"
export SSL_CERT_DIR=/etc/ssl/certs/
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
exec "$@"
</pre><p>This way you can run <tt>./guix-env ./env tools/dev/gm.py x64.release</tt> and
not have to “enter” the container so much.</p><h3>notes</h3><p>This all works fine enough, but I do have some meta-reflections.</p><p>I would prefer it if I didn’t have to use containers, for two main
reasons. One is that the resulting build artifacts have to be run in
the container, because they are dynamically linked to e.g. <tt>/lib</tt>, at
least for the ELF loader. It would be better if I could run them on the
host (with the host debugger, for example). Using Guix to make the
container is better than e.g. docker, though, because I can ensure that
the same tools are available in the guest as I use on the host. But
also, I don’t like adding “modes” to my terminals: are you in or out of
this or that environment? Being in a container is not like being in a
vanilla <tt>guix shell</tt>, and that’s annoying.</p><p>The build process uses many downloaded tools and artifacts, including
<tt>clang</tt> itself. This is a feature, in that I am using the same compiler
that colleagues at Google use, which is important. But it’s also
annoying and it would be nice if I could choose. (Having the same
<tt>clang-format</tt> though is an absolute requirement.)</p><p>Two tests fail in this configuration; the failures are somehow
related to time zones. I have no idea why, but I just ignore them.</p><p>If the build system were any weirder, I would think harder about maybe
using Docker or something like that. Colleagues point to
<a href="https://github.com/89luca89/distrobox"><tt>distrobox</tt></a> as being a useful
wrapper. It is annoying, though, because such a Docker image becomes
like a little stateful thing to do sysadmin work on, and I would like to
avoid that if I can.</p><p>Welp, that’s all for today. Hopefully if you are contemplating
installing Guix as your operating system (rather than just in
user-space), this can give you a bit more information as to what it
might mean when working on third-party projects. Happy hacking and
until next time!</p></div>

Author: Andy Wingo (https://wingolog.org/)

José Dapena: Maintaining downstreams of Chromium: why downstream? (http://blogs.igalia.com/dape/?p=409, 2024-03-05T15:54:05+00:00)
<p><a href="https://www.chromium.org/Home/">Chromium</a>, the open source web browser project that <a href="https://www.google.com/intl/en_us/chrome/">Google Chrome</a> is based on, can nowadays be considered the reference implementation of the web platform. As such, it is the first choice when implementing the web platform in a software platform or product.</p>
<p>Why is that? In this blog post I will introduce the topic, and then review the reasons why different projects use a downstream Chromium.</p>
<h2>A series of blog posts</h2>
<p>This is the first of a series of blog posts, where I am going through several aspects and challenges of maintaining a downstream project using Chromium as its upstream project.</p>
<p>They will be mostly based on the discussions in two events: first, on <a href="https://github.com/Igalia/webengineshackfest/issues/9">The Web Engines Hackfest 2023 breakout session</a> with the same title that I chaired in A Coruña; then, on my <a href="https://youtu.be/N47g8V9y7pc">BlinkOn 18 talk</a> in November 2023, at Sunnyvale.</p>
<h2>Some definitions</h2>
<p>Before starting the discussion of the different aspects, let’s clarify how I will use several terms.</p>
<h4>Repository vs. project</h4>
<p>I will use the term repository strictly for version-controlled storage of code. A project (specifically, a software project) is then the community of people who share goals and some kind of organization in order to maintain one or several software products.</p>
<p>So, a project may use several repositories for its goals.</p>
<p>In this discussion I will talk about Chromium, an open source project that implements a web platform user agent (a web browser) for different platforms. As such, it uses a number of repositories (<code>src</code>, <code>v8</code> and more).</p>
<h4>Downstream and upstream</h4>
<p>I will use the terms downstream and upstream to refer to the relationship between the version control repositories of different software projects.</p>
<p>If there is a software project repository (typically open source), and a new repository is created that contains all or part of the original repository, then:</p>
<ul>
<li>The upstream repository is the original repository.</li>
<li>The downstream repository is the new repository.</li>
</ul>
<p>It is important to highlight that different things can happen to the downstream repository:</p>
<ul>
<li>The copy could be a one-time event, so that the downstream repository becomes an independent fork, with no interest in tracking the upstream evolution. This often happens with abandoned repositories, where a different set of people start an independent project, but there can be other reasons.</li>
<li>There may be the intent to track upstream changes: the downstream repository evolves as the upstream repository does, but with some specific differences maintained on top of the original repository.</li>
</ul>
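<p>To make the second arrangement concrete, here is a throwaway shell sketch of a downstream repository that keeps tracking its upstream; the repository names and paths are invented for the example:</p>

```shell
# Throwaway demo of the upstream/downstream relationship (illustrative paths).
set -e
dir=$(mktemp -d)
cd "$dir"

# The original ("upstream") repository.
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "upstream work"

# The copy ("downstream"), keeping a remote that tracks the original.
git clone -q upstream downstream
git -C downstream remote rename origin upstream
git -C downstream fetch -q upstream
git -C downstream remote   # prints: upstream
```

<p>From here, the downstream project carries its own commits on top and periodically fetches and rebases or merges, which is exactly the maintenance burden the rest of this series is about.</p>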
<h2>Why using Chromium?</h2>
<p>Nowadays, the web platform is a solid alternative for delivering content to users. It allows modern user interfaces, based on well-known standards, that integrate well with local and remote services. The gap between native applications and web content has narrowed, so it is quite often a good choice.</p>
<p>But to integrate web content, product integrators need an implementation of the web platform. It is no surprise that Chromium is the most used one, for a number of reasons:</p>
<ul>
<li>It is open source, with a license that allows adapting it for new product needs.</li>
<li>It is well maintained and up to date, even pushing standardization forward to improve the platform continuously.</li>
<li>It is secure, from both the architecture and the maintenance-model points of view.</li>
<li>It provides integration points to tailor the implementation to one’s needs.</li>
<li>It supports the most popular software platforms (Windows, Android, Linux, …) for integrating new products.</li>
<li>On top of the web platform itself, it provides an implementation for many of the components required to build a modern web browser.</li>
</ul>
<p>Still, there are other good alternatives for integrating the web, such as <a href="https://webkit.org/">WebKit</a> (especially <a href="https://wpewebkit.org/">WPE</a> for embedded use cases), or the system-provided web components (the Android or iOS web view, …).</p>
<p>In this blog post, though, I will focus on the Chromium case.</p>
<h2>Why downstreaming Chromium?</h2>
<p>But why do different projects need to use downstream Chromium repositories?</p>
<p>The main reason a project needs a downstream repository is simple: to add changes that are not upstream. This can happen for a variety of reasons:</p>
<ul>
<li>Downstream changes that upstream will not accept, for example because they would make the upstream project harder to maintain or would not be tested often.</li>
<li>Downstream changes that the downstream project does not want to contribute upstream.</li>
</ul>
<p>Let’s see some examples of changes of both types.</p>
<h3>Hardware and OS adaptation</h3>
<p>This is when a downstream adds support for a hardware target or OS that the upstream Chromium project does not originally support.</p>
<p>Upstream Chromium provides an abstraction layer for this purpose, named <a href="https://chromium.googlesource.com/chromium/src/+/HEAD/docs/ozone_overview.md">Ozone</a>, which allows adapting it to the OS, desktop environment, and system graphics compositor. There are other abstraction layers for media acceleration, accessibility, and input methods.</p>
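<p>As a concrete illustration, selecting an Ozone backend is a build-time decision expressed in GN arguments. A sketch of an <code>args.gn</code> for a Wayland-backed Linux build might look like this (argument names as used by upstream at the time of writing; verify against <code>gn args --list</code> for your checkout):</p>

```
# args.gn: pick the Wayland Ozone backend at build time
use_ozone = true
ozone_platform = "wayland"
```

<p>A downstream port for a new OS would add its own backend under <code>ui/ozone/platform/</code> and expose it through the same kind of argument.</p>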
<p>The <a href="https://chromium.googlesource.com/chromium/src.git/+/refs/heads/main/ui/ozone/platform/wayland/">Wayland protocol adaptation</a> started as a downstream effort, as upstream Chromium did not intend to support Wayland at the time. Eventually it evolved into an official upstream Ozone backend.</p>
<p>An example? The <a href="https://www.webosose.org/">LGE webOS</a> Chromium port.</p>
<h3>Differentiation</h3>
<p>The previous case essentially forces you to have a downstream project or repository. But in other cases this is deliberate: there is the will to have some features in the downstream repository and not upstream, an intended differentiation.</p>
<p>Why would anybody want that? Some typical examples:</p>
<ul>
<li>A set of features that the downstream project owners consider to make the project better in some way, and want to keep downstream. This can happen when a new browser is shipped that contains features making the product offering different from, and in some ways better than, upstream Chrome. That can be a different user experience, some security features, better privacy…</li>
<li>Adaptation to a different product brand. Each browser or browser-based product will want its own specific brand instead of the upstream Chromium brand.</li>
</ul>
<p>Examples of this:</p>
<ul>
<li><a href="https://brave.com/">Brave browser</a>, with completely different privacy and security choices.</li>
<li><a href="https://arc.net/">ARC browser</a>, with an innovative user experience.</li>
<li><a href="https://www.microsoft.com/en-us/edge">Microsoft Edge</a>, with tight Windows OS integration and corporate features.</li>
</ul>
<h3>Hybrid application runtimes</h3>
<p>And one last interesting case: integrating the web platform for developing hybrid applications, those that mix parts of the user interface implemented in a native toolkit with parts implemented using the web platform.</p>
<p>Though Chromium includes official support for hybrid applications on Android, with the Android WebView, other toolkits also provide support for web applications, and in Chromium’s case their integration belongs to downstream projects.</p>
<p>Examples?</p>
<ul>
<li><a href="https://doc.qt.io/qt-6/qtwebengine-overview.html">Qt Web Engine</a>.</li>
<li><a href="https://bitbucket.org/chromiumembedded/cef/">CEF</a>.</li>
</ul>
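<p>To make the hybrid model concrete: in Qt Web Engine, the web platform surfaces as just another widget inside a native UI. A minimal sketch (assuming Qt 6 with the WebEngineWidgets module; an illustration rather than a complete application) looks like this:</p>

```cpp
#include <QApplication>
#include <QUrl>
#include <QWebEngineView>

// A native Qt window whose whole content area is a Chromium-backed web view.
int main(int argc, char *argv[]) {
  QApplication app(argc, argv);
  QWebEngineView view;
  view.setUrl(QUrl("https://example.org/"));
  view.show();
  return app.exec();
}
```

<p>Everything below that widget (rendering, networking, JavaScript) is Chromium, maintained as a downstream of the upstream project.</p>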
<h2>What’s next?</h2>
<p>In this blog post I presented different reasons why projects end up maintaining a downstream fork of Chromium.</p>
<p>In the next blog post I will present one of the main challenges when maintaining a downstream of Chromium: the different rebase and upgrade strategies.</p>

Author: José Dapena Paz (https://blogs.igalia.com/dape)

Andy Wingo: v8's precise field-logging remembered set (https://wingolog.org/2024/01/05/v8s-precise-field-logging-remembered-set, 2024-01-05T09:44:21+00:00)
<div><p>A <i>remembered set</i> is used by a garbage collector to identify graph
edges between partitioned sub-spaces of a heap. The canonical example
is in generational collection, where you allocate new objects in
<i>newspace</i>, and eventually promote survivor objects to <i>oldspace</i>. If
most objects die young, we can focus GC effort on newspace, to avoid
traversing all of oldspace all the time.</p><p>Collecting a subspace instead of the whole heap is sound if and only if
we can identify all live objects in the subspace. We start with some
set of <i>roots</i> that point into the subspace from outside, and then
traverse all links in those objects, but only to other objects within
the subspace.</p><p>The roots are, like, global variables, and the stack, and registers; and
in the case of a partial collection in which we identify live objects
only within newspace, also any link into newspace from other spaces
(oldspace, in our case). This set of inbound links is a <i>remembered
set</i>.</p><p>There are a few strategies for maintaining a remembered set. Generally
speaking, you start by implementing a write barrier that intercepts all
stores in a program. Instead of:</p><pre>obj[slot] := val;
</pre><p>You might abstract this away:</p><pre>write_slot(obj, sizeof obj, &obj[slot], val);
</pre><p>As you can see, it’s quite an annoying transformation to do by hand;
typically you will want some sort of language-level abstraction that
lets you keep the more natural syntax. C++ can do this pretty well, or
if you are implementing a compiler, you just add this logic to the
code generator.</p><p>Then the actual write barrier... well, its implementation is tangled up
with implementation of the remembered set. The simplest variant is a
<i>card-marking</i> scheme, whereby the heap is divided into equal,
power-of-two-sized <i>cards</i>, and each card has a bit. If the heap is
also divided into blocks (say, 2 MB in size), then you might divide
those blocks into 256-byte cards, yielding 8192 cards per block. A
barrier might look like this:</p><pre>void write_slot(ObjRef obj, size_t size,
                SlotAddr slot, ObjRef val) {
  *slot = val;  // Start with the store.
  uintptr_t block_size = 1<<21;
  uintptr_t card_size = 1<<8;
  uintptr_t cards_per_block = block_size / card_size;
  uintptr_t obj_addr = (uintptr_t)obj;
  uintptr_t card_idx = (obj_addr / card_size) % cards_per_block;
  // Assume remset allocated at block start.
  uint32_t *cards = (uint32_t *)(obj_addr & ~(block_size-1));
  // Set the bit.
  cards[card_idx / 32] |= 1u << (card_idx % 32);
}
</pre><p>Then when marking the new generation, you visit all cards, and for all
marked cards, trace all outbound links in all live objects that begin on
the card.</p><p>Card-marking is simple to implement and simple to statically allocate as
part of the heap. Finding marked cards takes time proportional to the
size of the heap, but you hope that the constant factors and SIMD
minimize this cost. However, iterating over objects within a card can be
costly. You hope that there are few old-to-new links but what do you
know?</p><p>In <a href="https://github.com/wingo/whippet">Whippet</a> I have been struggling a
bit with <a href="https://wingolog.org/archives/2022/10/22/the-sticky-mark-bit-algorithm">sticky-mark-bit generational
marking</a>,
in which new and old objects are not spatially partitioned. Sometimes
generational collection is a win, but in benchmarking I find that often
it isn’t, and I think <a href="https://github.com/wingo/whippet/blob/main/api/gc-api.h#L187">Whippet’s card-marking
barrier</a>
is at fault: it is simply too imprecise. Consider firstly that our
write barrier applies to stores to slots in all objects, not just those
in oldspace; a store to a new object will mark a card, but that card may
contain old objects which would then be re-scanned. Or consider a store
to an old object in a more dense part of oldspace; scanning the card may
incur more work than needed. It could also be that Whippet is being too
aggressive at re-using blocks for new allocations, where it should be
limiting itself to blocks that are very sparsely populated with old
objects.</p><h3>what v8 does</h3><p>There is a tradeoff in write barriers between the overhead imposed on
stores, the size of the remembered set, and the precision of the
remembered set. Card-marking is relatively low-overhead and usually
small as a fraction of the heap, but not very precise. It would be
better if a remembered set recorded objects, not cards. And it would be
even better if it recorded slots in objects, not just objects.</p><p>V8 takes this latter strategy: it has per-block remembered sets which
record slots containing “interesting” links. All of the above words
were to get here, to take a brief look at its remembered set.</p><p>The main operation is
<a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/remembered-set.h#92"><tt>RememberedSet::Insert</tt></a>.
It takes the <tt>MemoryChunk</tt> (a block, in our language from above) and the
address of a slot in the block. Each block has a remembered set; in
fact, <a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/memory-chunk-layout.h#25">six remembered
sets</a>
for some reason. The remembered set itself is a
<a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/slot-set.h"><tt>SlotSet</tt></a>,
whose interesting operations come from
<a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/base/basic-slot-set.h"><tt>BasicSlotSet</tt></a>.</p><p>The structure of a slot set is a <a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/base/basic-slot-set.h#44">bitvector partitioned into
equal-sized, possibly-empty
buckets</a>.
There is one bit per slot in the block, so in the limit the size
overhead for the remembered set may be 3% (1/32, assuming compressed
pointers). Currently <a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/base/basic-slot-set.h#265">each bucket is 1024 bits (128
bytes)</a>,
plus the 4 bytes for the bucket pointer itself.</p><p><a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/base/basic-slot-set.h#109">Inserting into the slot
set</a>
will first allocate a bucket (using C++ <tt>new</tt>) if needed, then <a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/base/basic-slot-set.h#130">load the
“cell” (32-bit
integer)</a>
containing the slot. There is a template parameter declaring whether
this is an atomic or normal load. Finally, if the slot bit in the cell
is not yet set, V8 will set the bit, possibly using atomic
compare-and-swap.</p><p>In the language of Blackburn’s <a href="https://users.cecs.anu.edu.au/~steveb/pubs/papers/fieldbarrier-ismm-2019.pdf"><i>Design and analysis of field-logging
write
barriers</i></a>,
I believe this is a field-logging barrier, rather than the bit-stealing
slot barrier described by Yang et al in the 2012 <a href="https://users.cecs.anu.edu.au/~steveb/pubs/papers/barrier-ismm-2012.pdf"><i>Barriers
Reconsidered, Friendlier
Still!</i></a>.
Unlike Blackburn’s field-logging barrier, however, this remembered set
is implemented completely on the side: there is no in-object remembered
bit, nor remembered bits for the fields.</p><p>On the one hand, V8’s remembered sets are precise. There are some
tradeoffs, though: they require off-managed-heap dynamic allocation for
the buckets, and <a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/minor-mark-sweep.cc#162">traversing the remembered
sets</a>
takes time proportional to the whole heap size. And, should V8 ever
switch its <a href="https://wingolog.org/archives/2023/12/08/v8s-mark-sweep-nursery">minor mark-sweep generational
collector</a>
to use sticky mark bits, the lack of a spatial partition could lead to
similar problems as I am seeing in Whippet. I will be interested to see
what they come up with in this regard.</p><p>Well, that’s all for today. Happy hacking in the new year!</p></div>

Author: Andy Wingo (https://wingolog.org/)

Jacobo Aragunde: Setting up a minimal, command-line Android emulator on Linux (http://blogs.igalia.com/jaragunde/?p=1154, 2023-12-14T17:00:47+00:00)
<p>Android has all kinds of nice development tools, but sometimes you just want to run an apk and don’t need all the surrounding tooling. In my case, I already have my Chromium setup, which can produce binaries for several platforms <a href="https://source.chromium.org/chromium/chromium/src/+/main:docs/android_build_instructions.md">including Android</a>.</p>
<p>I usually test on a physical device, a smartphone, but I would like to try a device with a tablet form factor and I don’t have one at hand.</p>
<div id="attachment_1162" class="wp-caption aligncenter"><a href="http://blogs.igalia.com/jaragunde/files/2023/12/chromium_android_screenshots.jpg"><img src="http://blogs.igalia.com/jaragunde/files/2023/12/chromium_android_screenshots-1024x535.jpg" alt="Chromium smartphone and tablet screenshots" width="584" height="305" class="size-large wp-image-1162" /></a><p id="caption-attachment-1162" class="wp-caption-text">Chromium provides different user experiences for smartphones and tablets.</p></div>
<p>I’ve set up the most stripped-down environment possible to run an Android emulator using the same tools provided by the platform. Notice these are generic instructions, not tied to Chromium tooling, even though I’m doing this mainly to run Chromium.</p>
<p>The first step is to download the latest version of the <strong>command line tools</strong>, instead of Android Studio, from the <a href="https://developer.android.com/studio#command-tools">Android developer website</a>. We will extract the tools to the location <code>~/Android/Sdk</code>, in the path where they like to find themselves:</p>
<pre><code class="shell">mkdir -p ~/Android/Sdk/cmdline-tools
unzip -d ~/Android/Sdk/cmdline-tools ~/Downloads/commandlinetools-linux-10406996_latest.zip
mv ~/Android/Sdk/cmdline-tools/cmdline-tools/ ~/Android/Sdk/cmdline-tools/latest
</code></pre>
<p>Now we have the tools installed in <code>~/Android/Sdk/cmdline-tools/latest</code>, we need to <strong>add their binaries to the path</strong>. We will also add other paths for tools we are about to install, with the command:</p>
<pre><code class="shell">export PATH=~/Android/Sdk/cmdline-tools/latest/bin:~/Android/Sdk/emulator:~/Android/Sdk/platform-tools:$PATH
</code></pre>
<p>Run the command above for a one-time use, or add it to <code>~/.bashrc</code> or your preferred shell configuration for future use.</p>
<p>You will also need to set up another environment variable:</p>
<pre><code class="shell">export ANDROID_HOME=~/Android/Sdk
</code></pre>
<p>Again, run it once for a one-time use or add it to your shell configuration in <code>~/.bashrc</code> or equivalent.</p>
<p>Now we use the tool <code>sdkmanager</code>, which came with Android’s command-line tools, to <strong>install the rest of the software</strong> we need. All of this will be installed inside your <code>$ANDROID_HOME</code>, so in <code>~/Android/Sdk</code>.</p>
<pre><code class="shell">sdkmanager "emulator" "platform-tools" "platforms;android-29" "system-images;android-29;google_apis;x86_64"
</code></pre>
<p>In the command above I have installed the emulator, the platform tools (<code>adb</code> and friends), and the system libraries and system image for Android 10 (API level 29). You may check what else is available with <code>sdkmanager --list</code>; you will find multiple variants of the Android platform and system images.</p>
<p>Finally, we have to <strong>setup a virtual machine</strong>, called AVD (“Android Virtual Device”) in Android jargon. We do that with:</p>
<pre><code class="shell">avdmanager -v create avd -n Android_API_29_Google -k "system-images;android-29;google_apis;x86_64" -d pixel_c
</code></pre>
<p>Here I have created a virtual device called “Android_API_29_Google” with the system image I had downloaded, and the form factor and screen size of a <a href="https://en.wikipedia.org/wiki/Pixel_C">Pixel C</a>, which is a tablet. You may get a list of all the devices that can be simulated with <code>avdmanager list device</code>.</p>
<p>Now we can already <strong>start an emulator</strong> running the AVD with the name we have just chosen, with:</p>
<pre><code class="shell">emulator -avd Android_API_29_Google
</code></pre>
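<p>With the emulator booted, installing and launching an apk is a couple of <code>adb</code> commands; the apk path, package, and activity names below are placeholders for your own app:</p>

```
# Wait for the virtual device to come up.
adb wait-for-device

# Install the apk; -r replaces an existing install, keeping app data.
adb install -r path/to/your-app.apk

# Launch it by package/activity name (placeholder names).
adb shell am start -n your.package.name/.MainActivity
```

<p>If you have both a physical device and the emulator connected, add <code>-s &lt;serial&gt;</code> (see <code>adb devices</code>) to target the right one.</p>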
<p>The files and configuration for this AVD are stored in <code>~/.android/avd/Android_API_29_Google.avd</code>, or a sibling directory if you used a different name. <strong>Configuration</strong> is in the <code>config.ini</code> file; here you can set many options you would normally configure from Android Studio, and <a href="https://learn.microsoft.com/en-us/xamarin/android/get-started/installation/android-emulator/device-properties">even more</a>. I recommend changing at least this value, for your convenience:</p>
<pre><code>hw.keyboard = yes
</code></pre>
<p>Otherwise you will find yourself typing URLs with a mouse on a virtual keyboard… Not fun.</p>
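<p>You can also script this edit. Here is a sketch; the path assumes the AVD name we chose above, and the snippet falls back to a local sample <code>config.ini</code> only so that it can be tried standalone:</p>

```shell
# Sketch: set hw.keyboard = yes in the AVD's config.ini.
# The path assumes the AVD name "Android_API_29_Google" used above.
config="$HOME/.android/avd/Android_API_29_Google.avd/config.ini"

# Fall back to a local sample file so the snippet works standalone.
[ -f "$config" ] || { config=./config.ini; printf 'hw.lcd.density = 320\n' > "$config"; }

if grep -q '^hw\.keyboard' "$config"; then
  # Setting already present: rewrite it in place.
  sed -i 's/^hw\.keyboard.*/hw.keyboard = yes/' "$config"
else
  # Setting absent: append it.
  echo 'hw.keyboard = yes' >> "$config"
fi
grep '^hw\.keyboard' "$config"   # prints: hw.keyboard = yes
```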
<p>Finally, your software will be up to date after a fresh install but, when new versions are released, you will be able to install them with:</p>
<pre><code class="shell">sdkmanager --update
</code></pre>
<p>Make sure to run this every now and then!</p>
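<p>To double-check that everything is reachable from your shell, a small sketch; each tool should report “found” if your <code>PATH</code> is set up as described earlier:</p>

```shell
# Sanity check: report whether each SDK tool resolves on the PATH.
for tool in sdkmanager avdmanager adb emulator; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```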
<p>At this point, your setup should be ready, and all the tools you need are in the PATH. Feel free to reuse the commands above to create any number of virtual devices with different Android system images, different form factors… Happy hacking!</p> Jacobo Aragunde Pérezhttps://blogs.igalia.com/jaragundeAndy Wingo: v8's mark-sweep nurseryhttps://wingolog.org/2023/12/08/v8s-mark-sweep-nursery2023-12-08T14:34:03+00:00
<div><p>Today, a followup to <a href="https://wingolog.org/archives/2023/12/07/the-last-5-years-of-v8s-garbage-collector">yesterday’s note</a> with some more details on V8’s new
young-generation implementation, <i>minor mark-sweep</i> or <i>MinorMS</i>.</p><p>A
caveat again: these observations are just from reading the code; I
haven’t run these past the MinorMS authors yet, so any of these details
might be misunderstandings.</p><p>The MinorMS nursery consists of <i>pages</i>, each of which is 256 kB, unless
huge-page mode is on, in which case they are 2 MB. The total
size of the nursery is 72 MB by default, or 144 MB if <a href="https://v8.dev/blog/pointer-compression">pointer
compression</a> is off.</p><p>There can be multiple threads allocating into the nursery, but let’s
focus on the <a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/main-allocator.h">main
allocator</a>,
which is used on the main thread. Nursery allocation is bump-pointer,
whether in a MinorMS page or scavenger semi-space. Bump-pointer regions
are called <i>linear allocation buffers</i>, and often abbreviated as <tt>Lab</tt>
in the source, though the class is
<a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/linear-allocation-area.h"><tt>LinearAllocationArea</tt></a>.</p><p>If the current bump-pointer region is too small for the current
allocation, the nursery implementation finds another one, or triggers a
collection. For the MinorMS nursery, each page collects the set of
allocatable spans in a free list; if the free list is non-empty, it pops
off one entry as the current Lab and tries again.</p><p>Otherwise, MinorMS needs another page, and specifically a <i>swept page</i>:
a page which has been visited since the last GC, and whose spans of
unused memory have been collected into a free-list. There is a
concurrent sweeping task which should usually run ahead of the mutator,
but if there is no swept page available, the allocator might need to
sweep some. This logic is in
<a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/main-allocator.cc#625"><tt>MainAllocator::RefillLabMain</tt></a>.</p><p>Finally, if all pages are swept and there’s no Lab big enough for the
current allocation, we trigger collection from the roots. The initial
roots are the <i>remembered set</i>: pointers from old objects to new
objects. Most of the trace happens concurrently with the mutator; when
the nursery utilisation rises over 90%, V8 will kick off concurrent
marking tasks.</p><p>Then once the mutator actually runs out of space, it pauses, drains any pending marking work, marks
conservative roots, then drains again. I am not sure whether MinorMS
with conservative stack scanning visits the whole C/C++ stacks or
whether it manages to install some barriers (i.e. “don’t scan deeper
than 5 frames because we collected then, and so all older frames are
older”); dunno. All of this logic is in
<a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/main/src/heap/minor-mark-sweep.cc#618"><tt>MinorMarkSweepCollector::MarkLiveObjects</tt></a>.</p><p>Marking traces the object graph, setting object mark bits. It does not
trace pages. However, the MinorMS space promotes in units of pages. So
how to decide what pages to promote? The answer is that <i>sweeping</i> partitions the MinorMS pages into empty,
recycled, aging, and promoted pages.</p><p>Empty pages have no surviving
objects, and are very useful because they can be given back to the
operating system if needed or shuffled around elsewhere in the system. If they are re-used for allocation, they do not need to be swept.</p><p>Recycled pages have some survivors, but not many; MinorMS keeps the page
around for allocation in the next cycle, because it has enough empty
space. By default, a page is recyclable if it has 50% or more free
space after a minor collection, or 30% after a major collection.
MinorMS also promotes a page eagerly if, in the last cycle, we only
managed to allocate into 30% or less of its empty space, probably due to
fragmentation. These pages need to be swept before re-use.</p><p>Finally, MinorMS doesn’t let pages be recycled indefinitely:
after 4 minor cycles, a page goes into the <i>aging</i> pool, in which it is
kept unavailable for allocation for one cycle, but is not yet promoted.
This allows any new allocations made on that page in the previous cycle to age
out and probably die, preventing premature tenuring.</p><p>And that’s it. Next time, a note on a way in which generational
collectors can run out of memory. Have a nice weekend, hackfolk!</p></div> Andy Wingohttps://wingolog.org/Andy Wingo: the last 5 years of V8's garbage collectorhttps://wingolog.org/2023/12/07/the-last-5-years-of-v8s-garbage-collector2023-12-07T12:15:45+00:00
<div><p>Captain, status report: I’m down here in a Jeffries tube, poking at V8’s
garbage collector. I have been working on other areas of the project
recently, but V8 is now so large that it’s necessary to ignore whole subsystems when working on any given task. Now I’m looking at the GC in anger: what is its deal? What does V8’s GC even look like these days?</p><p>The <a href="https://v8.dev/blog/trash-talk">last public article on the structure of V8’s garbage
collector</a> was in 2019; fine enough, but
dated. Now in the evening of 2023 I think it could be useful to revisit
it and try to summarize the changes since then. At least, it would have
been useful to me had someone else written this article.</p><p>To my mind, work on V8’s GC has had three main goals over the last 5
years: improving interactions between the managed heap and C++,
improving security, and increasing concurrency. Let’s visit these in
turn.</p><h3>C++ and GC</h3><p>Building on the 2018 <a href="https://docs.google.com/document/d/1Hs60Zx1WPJ_LUjGvgzt1OQ5Cthu-fG-zif-vquUH_8c/edit#heading=h.nh3gzht95k4n">integration of the Oilpan tracing garbage
collector into the Blink web
engine</a>,
there was some refactoring to <a href="https://chromium.googlesource.com/v8/v8/+/main/include/cppgc/README.md">move the implementation of Oilpan into V8
itself</a>.
Oilpan is known internally as <i>cppgc</i>.</p><p>I find the cppgc name a bit annoying because I can never remember what
it refers to, because of the other thing that has been happening in C++
integration: a migration away from <a href="https://github.com/v8/v8/blob/main/src/handles/handles.h">precise
roots</a> and
instead towards <a href="https://bugs.chromium.org/p/v8/issues/detail?id=13257">conservative
root-finding</a>.</p><p>Some notes here: with conservative stack scanning, we can hope for
better mutator throughput and fewer bugs. The throughput comes from not
having to put all live pointers in memory; the compiler can keep them in
registers, and avoid managing the HandleScope. You may be able to avoid
the compile-time and space costs of stack maps (<a href="https://wingolog.org/archives/2023/10/16/on-safepoints">side tables telling the
collector where the pointers
are</a>). There
are also two classes of bug that we can avoid: holding on to a handle
past the lifetime of a handlescope, and holding on to a raw pointer
(instead of a handle) during a potential GC point.</p><p>Somewhat confusingly, it would seem that conservative stack scanning has
garnered the acronym “CSS” inside V8. <i>What does CSS have to do with
GC?</i>, I ask. I know the answer but my brain keeps asking the question.</p><p>In exchange for this goodness, conservative stack scanning means that
because you can’t be sure that a word on the stack refers to an object
and isn’t just a spicy integer, you can’t move objects that might be the
target of a conservative root. And indeed the conservative edge might
actually not point to the start of the object; it could be an interior
pointer, which places additional constraints on the heap, that it be
able to resolve internal pointers.</p><h3>Security</h3><p>Which brings us to security and the admirable nihilism of the <a href="https://docs.google.com/document/d/1FM4fQmIhEqPG8uGp5o9A-mnPB5BOeScZYpkHjo0KKA8/edit#heading=h.xzptrog8pyxf">sandbox
effort</a>.
The idea is that everything is terrible, so why not just assume that no
word is safe and that an attacker can modify any word they can address. The only way to limit the scope of an attacker’s modifications is then to
limit the address space. This happens firstly by <a href="https://v8.dev/blog/pointer-compression">pointer
compression</a>, which happily
also has some delightful speed and throughput benefits. Then the
pointer cage is placed within a larger cage, and off-heap data such as
Wasm memories and array buffers go in that larger cage. Any needed
<a href="https://docs.google.com/document/d/1CPs5PutbnmI-c5g7e_Td9CNGh5BvpLleKCqUnqmD82k/edit#heading=h.xzptrog8pyxf">executable
code</a>
or <a href="https://docs.google.com/document/d/1V3sxltuFjjhp_6grGHgfqZNK57qfzGzme0QTk0IXDHk/edit">external
object</a>
is accessed indirectly, through dedicated tables.</p><p>However, this indirection comes with a cost of a proliferation in the
number of spaces. <a href="https://chromium.googlesource.com/v8/v8.git/+/43d26ecc3563a46f62a0224030667c8f8f3f6ceb/src/spaces.h#36">In the
beginning</a>,
there was just an evacuating newspace, a mark-compact oldspace, and a
non-moving large object space. Now there are closer to 20 spaces: a
separate code space, a space for <i>read-only</i> objects, a space for
<i>trusted</i> objects, a space for each kind of indirect descriptor used by
the sandbox, in addition to spaces for objects that might be shared
between threads, newspaces for many of the preceding kinds, and so on.
From what I can see, managing this complexity has taken a significant
effort. The result is pretty good to work with, but you pay for what
you get. (Do you get security guarantees? I don’t know enough to say.
Better pay some more to be sure.)</p><p>Finally, the C++ integration has also had an impact on the spaces
structure, and with a security implication to boot. The thing is,
conservative roots can’t be moved, but the original evacuating newspace
required moveability. One can get around this restriction by
pretenuring new allocations from C++ into the mark-compact space, but
this would be a performance killer. The solution that V8 is going for
is to use the block-structured mark-compact space that is already used for the old-space, but for new
allocations. If an object is ever traced during a young-generation
collection, its page will be promoted to the old generation, without
copying. Originally called <i>minor mark-compact</i> or <i>MinorMC</i> in the
commit logs, it was renamed to <i>minor mark-sweep</i> or <i>MinorMS</i> to
indicate that it doesn’t actually compact. (V8’s mark-compact old-space
doesn’t <i>have</i> to compact: V8 usually chooses to just mark in place.
But we call it a mark-compact space because it has that capability.)</p><p>This last change is a performance hazard: yes, you keep the desirable
bump-pointer allocation scheme for new allocations, but you lose on
locality in the old generation, and the rate of promoted bytes will be
higher than with the semi-space new-space. The only relief is that for
a given new-space size, you can allocate twice as many objects, because
you don’t need the copy reserve.</p><p>Why do I include this discussion in the security section? Well, because
<a href="https://bugs.chromium.org/p/v8/issues/detail?id=12612">most MinorMS commits mention this locked
bug</a>. One day
we’ll know, but not today. I speculate that evacuating is just too rich
a bug farm, especially with concurrency and parallel mutators, and that
never-moving collectors will have better security properties. But
again, I don’t know for sure, and I prefer to preserve my ability to
speculate rather than to ask for too many details.</p><h3>Concurrency</h3><p>Speaking of concurrency, ye gods, the last few years have been quite the
ride, I think. Every phase that can be done in parallel (multiple
threads working together to perform GC work) is now fully parallel:
semi-space evacuation, mark-space marking and compaction, and sweeping.
Every phase that can be done <i>concurrently</i> (where the GC runs threads
while the mutator is running) is concurrent: marking and sweeping. A
major sweep task can run concurrently with an evacuating minor GC. And,
V8 is preparing for multiple mutators running in parallel. It’s all a
bit terrifying but again, with engineering investment and a huge farm of
fuzzers, it seems to be a doable transition.</p><p>Concurrency and threads means that V8 has sprouted new schedulers:
should a background task have incremental or concurrent marking? How
many sweepers should a given isolate have? How should you pause
concurrency when the engine needs to do something gnarly?</p><p>The latest in-progress work would appear to be <a href="https://bugs.chromium.org/p/v8/issues/detail?id=13012">concurrent marking of
the new-space</a>.
I think we should expect this work to result in a lower overall
pause-time, though I am curious also to learn more about the model: how
precise is it? Does it allow a lot of slop to get promoted? It seems
to have a black allocator, so there will be some slop, but perhaps it
can avoid promotion for those pages. I don’t know yet.</p><h3>Summary</h3><p>Yeah, GCs, man. I find the move to a non-moving young generation
quite interesting, and I wish the team luck as they whittle down the last
sharp edges from the conservative-stack-scanning performance profile.
The sandbox is pretty fun too. All good stuff and I look forward to
spending a bit more time with it; engineering out.</p></div> Andy Wingohttps://wingolog.org/