Planet Igalia

April 03, 2020

Andy Wingo

multi-value webassembly in firefox: from 1 to n

Greetings, hackers! Today I'd like to write about something I worked on recently: implementation of the multi-value future feature of WebAssembly in Firefox, as sponsored by Bloomberg.

In the "minimum viable product" version of WebAssembly published in 2018, there were a few artificial restrictions placed on the language. Functions could only return a single value; if a function would naturally return two values, it would have to return at least one of them by writing to memory. Loops couldn't take parameters; any loop state variables had to be stored to and loaded from indexed local variables at each iteration. Similarly, any block that would naturally return more than one result would also have to do so via locals.

This restriction is lifted with the multi-value proposal. Function types now map from result type to result type, where a result type is a sequence of value types. That is to say, just as functions can take multiple arguments, they can return multiple results. Similarly, with the multi-value proposal, block types are now the same as function types: loops and blocks can take arguments and return any number of results. This change improves the expressiveness of WebAssembly as a compilation target; a C++ program compiled to multi-value WebAssembly can be encoded in fewer bytes than before. Multi-value also establishes a base for other language extensions. For example, the exception handling proposal builds on multi-value to pass multiple values to catch blocks.

So, that's multi-value. You would think that relaxing a restriction would be easy, but you'd be wrong! This task took me 5 months and had a number of interesting gnarly bits. This article is part one of two about interesting aspects of implementing multi-value in Firefox, specifically focussing on blocks. We'll talk about multi-value function calls next week.

multi-value in blocks

In the last article, I presented the basic structure of Firefox's WebAssembly support: there is a baseline compiler optimized for low latency and an optimizing compiler optimized for throughput. (There is also Cranelift, a new experimental compiler that may replace the current implementation of the optimizing compiler; but that doesn't affect the basic structure.)

The optimizing compiler applies traditional compiler techniques: SSA graph construction, where values flow into and out of graphs using the usual defs-dominate-uses relationship. The only control-flow joins are loop entry and (possibly) block exit, so the addition of loop parameters means in multi-value there are some new phi variables in that case, and the expansion of block result count from [0,1] to [0,n] means that you may have more block exit phi variables. But these compilers are built to handle these situations; you just build the SSA and let the optimizing compiler go to town.

The problem comes in the baseline compiler.

from 1 to n

Recall that the baseline compiler is optimized for compiler speed, not compiled speed. If there are only ever going to be 0 or 1 result from a block, for example, the baseline compiler's internal data structures will use something like a Maybe<ValType> to represent that block result.

If you then need to expand this to hold a vector of values, the naïve approach of using a Vector<ValType> would mean heap allocation and indirection, and thus would regress the baseline compiler.

In this case, and in many other similar cases, the solution is to use value tagging to represent 0 or 1 value type directly in a word, and the general case by linking out to an external vector. As block types are function types, they actually appear as function types in the WebAssembly type section, so they are already parsed; the BlockType in that case can just refer out to already-allocated memory.
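To make that concrete, here is a minimal sketch of what such a tagged representation could look like. The names and layout are illustrative, not SpiderMonkey's actual definitions; the point is only that the common zero-result and one-result cases stay inline in a word, while the general multi-value case points out to a function type that the type section already owns.

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ValType : uint8_t { I32, I64, F32, F64 };

// Hypothetical function-type record, as parsed from the wasm type section.
struct FuncType {
  std::vector<ValType> params;
  std::vector<ValType> results;
};

// Tagged block type: [] -> [] and [] -> [t] are stored inline; the general
// case borrows the already-parsed FuncType instead of heap-allocating.
class BlockType {
  enum class Kind : uint8_t { Void, Single, Func };
  Kind kind_;
  union {
    ValType single_;        // valid when kind_ == Kind::Single
    const FuncType* func_;  // valid when kind_ == Kind::Func
  };

 public:
  BlockType() : kind_(Kind::Void) {}
  explicit BlockType(ValType t) : kind_(Kind::Single), single_(t) {}
  explicit BlockType(const FuncType* f) : kind_(Kind::Func), func_(f) {}

  size_t resultCount() const {
    switch (kind_) {
      case Kind::Void:   return 0;
      case Kind::Single: return 1;
      case Kind::Func:   return func_->results.size();
    }
    return 0;
  }

  ValType result(size_t i) const {
    assert(i < resultCount());
    return kind_ == Kind::Single ? single_ : func_->results[i];
  }
};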

In fact this value-tagging pattern applies all over the place. (The jit/ links above are for the optimizing compiler, but they relate to function calls; I'll write about that next week.) I have a bit of pause about value tagging, in that it's gnarly complexity and I didn't measure the speed of alternative implementations, but it was a useful migration strategy: value tagging minimizes performance risk to existing specialized use cases while adding support for new general cases. Gnarly it is, then.

control-flow joins

I didn't mention it in the last article, but there are two important invariants regarding stack discipline in the baseline compiler. Recall that there's a virtual stack, and that some elements of the virtual stack might be present on the machine stack. There are four kinds of virtual stack entry: register, constant, local, and spilled. Locals indicate local variable reads and are mostly like registers in practice; when registers spill to the stack, locals do too. (Why spill to the temporary stack instead of leaving the value in the local variable slot? Because locals are mutable. A local.get captures a local variable value at its point of execution. If future code changes the local variable value, you wouldn't want the captured value to change.)

Digressing, the stack invariants:

  1. Spilled values precede registers and locals on the virtual stack. If u and v are virtual stack entries and u is older than v, then if u is in a register or is a local, then v is not spilled.

  2. Older values precede newer values on the machine stack. Again for u and v, if they are both spilled, then u will be farther from the stack pointer than v.

There are five fundamental stack operations in the baseline compiler; let's examine them to see how the invariants are guaranteed. Recall that before multi-value, targets of non-local exits (e.g. of the br instruction) could only receive 0 or 1 value; if there is a value, it's passed in a well-known register (e.g. %rax or %xmm0). (On 32-bit machines, 64-bit values use a well-known pair of registers.)

push(v)
Results of WebAssembly operations never push spilled values, neither onto the virtual nor the machine stack. v is either a register, a constant, or a reference to a local. Thus we guarantee both (1) and (2).
pop() -> v
Doesn't affect older stack entries, so (1) is preserved. If the newest stack entry is spilled, you know that it is closest to the stack pointer, so you can pop it by first loading it to a register and then incrementing the stack pointer; this preserves (2). Therefore if it is later pushed on the stack again, it will not be as a spilled value, preserving (1).
spill()
When spilling the virtual stack to the machine stack, you first traverse stack entries from new to old to see how far you need to spill. Once you get to a virtual stack entry that's already on the stack, you know that everything older has already been spilled, because of (1), so you switch to iterating back towards the new end of the stack, pushing registers and locals onto the machine stack and updating their virtual stack entries to be spilled along the way. This iteration order preserves (2). Note that because known constants never need to be on the machine stack, they can be interspersed with any other value on the virtual stack.
return(height, v)
This is the stack operation corresponding to a block exit (local or nonlocal). We drop items from the virtual and machine stack until the stack height is height. In WebAssembly 1.0, if the target continuation takes a value, then the jump passes a value also; in that case, before popping the stack, v is placed in a well-known register appropriate to the value type. Note however that v is not pushed on the virtual stack at the return point. Popping the virtual stack preserves (1), because a stack and its prefix have the same invariants; popping the machine stack also preserves (2).
capture(t)
Whereas return operations happen at block exits, capture operations happen at the target of block exits (the continuation). If no value is passed to the continuation, a capture is a no-op. If a value is passed, it's in a register, so we just push that register onto the virtual stack. Both invariants are obviously preserved.
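To make the spill() walk above a bit more concrete, here is a rough sketch of the two-phase traversal, assuming hypothetical Stk entries and a stubbed-out assembler. It only illustrates the ordering argument; it elides the real details (constant kinds, register classes, actual code emission).

#include <cstddef>
#include <vector>

// Hypothetical virtual-stack entry: constant, register, local read, or
// already spilled to the machine stack.
struct Stk {
  enum class Kind { Const, Register, Local, Spilled } kind;
  // ... payload: constant value, register id, local index, or stack offset ...
};

struct Masm {
  void push(Stk&) { /* emit a push; record the entry's stack offset */ }
};

// Spill registers and locals to the machine stack, oldest first, so that
// invariant (2) -- older values sit farther from the stack pointer -- holds.
void spill(std::vector<Stk>& vstack, Masm& masm) {
  // Phase 1: walk from newest to oldest until we hit an entry that is
  // already spilled; by invariant (1), nothing older needs spilling.
  size_t first = vstack.size();
  while (first > 0 && vstack[first - 1].kind != Stk::Kind::Spilled) {
    first--;
  }
  // Phase 2: walk back towards the newest entry, pushing registers and
  // locals.  Constants never need machine-stack slots, so they are skipped.
  for (size_t i = first; i < vstack.size(); i++) {
    if (vstack[i].kind == Stk::Kind::Register ||
        vstack[i].kind == Stk::Kind::Local) {
      masm.push(vstack[i]);
      vstack[i].kind = Stk::Kind::Spilled;
    }
  }
}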

Note that a value passed to a continuation via return() has a brief instant in which it has no name -- it's not on the virtual stack -- but only a location -- it's in a well-known place. capture() then gives that floating value a name.

Relatedly, there is another invariant, that the allocation of old values on block entry is the same as their allocation on block exit, so that all predecessors of the block exit flow all values via the same places. This is preserved by spilling on block entry. It's a big hammer, but effective.

So, given all this, how do we pass multiple values via return()? We don't have unlimited registers, so the %rax strategy isn't going to work.

The answer for the baseline compiler is informed by our lean into the stack machine principle. Multi-value returns are allocated in such a way that a capture() can push them onto the virtual stack. Because spilled values must precede registers, we therefore allocate older results on the stack, and put the last result in a register (or register pair for i64 on 32-bit platforms). Note that it's possible in theory to allocate multiple results to registers; we'll touch on this next week.
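As a toy sketch of that allocation rule -- the last result in the well-known register for its type, every earlier result in a stack slot -- something like the following. The names here are invented for illustration; the real code also deals with register pairs, floating-point registers, and the other value types.

#include <cstddef>
#include <vector>

enum class ValType { I32, I64, F32, F64 };

// Where one block result lives at a control-flow join.
struct ResultLoc {
  enum class Kind { Register, StackSlot } kind;
  size_t index;  // which register, or which stack slot
};

// Older results go to stack slots (so a capture() can treat them as spilled
// virtual-stack entries); only the newest result is passed in a register.
std::vector<ResultLoc> allocateResults(const std::vector<ValType>& results) {
  std::vector<ResultLoc> locs;
  for (size_t i = 0; i < results.size(); i++) {
    bool isLast = (i + 1 == results.size());
    if (isLast) {
      locs.push_back({ResultLoc::Kind::Register, 0});  // e.g. %rax / %xmm0
    } else {
      locs.push_back({ResultLoc::Kind::StackSlot, i});
    }
  }
  return locs;
}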

Therefore the implementation of return(height, v1..vn) is straightforward: we first pop register results, then spill the remaining virtual stack items, then shuffle stack results down towards height. This should result in a memmove of contiguous stack results towards the frame pointer. However because const values aren't present on the machine stack, depending on the stack height difference, it may mean a split between moving some values toward the frame pointer and some towards the stack pointer, then filling in by spilling constants. It's gnarly, but it is what it is. Note that the links to the return and capture implementations above are to the post-multi-value world, so you can see all the details there.

that's it!

In summary, the hard part of multi-value blocks was reworking internal compiler data structures to be able to represent multi-value block types, and then figuring out the low-level stack manipulations in the baseline compiler. The optimizing compiler on the other hand was pretty easy.

When it comes to calls though, that's another story. We'll get to that one next week. Thanks again to Bloomberg for supporting this work; I'm really delighted that Igalia and Bloomberg have been working together for a long time (coming on 10 years now!) to push the web platform forward. A special thanks also to Mozilla's Lars Hansen for his patience reviewing these patches. Until next week, then, stay at home & happy hacking!

by Andy Wingo at April 03, 2020 10:56 AM

March 25, 2020

Andy Wingo

firefox's low-latency webassembly compiler

Good day!

Today I'd like to write a bit about the WebAssembly baseline compiler in Firefox.

background: throughput and latency

WebAssembly, as you know, is a virtual machine that is present in web browsers like Firefox. An important initial goal for WebAssembly was to be a good target for compiling programs written in C or C++. You can visit a web page that includes a program written in C++ and compiled to WebAssembly, and that WebAssembly module will be downloaded onto your computer and run by the web browser.

A good virtual machine for C and C++ has to be fast. The throughput of a program compiled to WebAssembly (the amount of work it can get done per unit time) should be approximately the same as its throughput when compiled to "native" code (x86-64, ARMv7, etc.). WebAssembly meets this goal by defining an instruction set that consists of similar operations to those directly supported by CPUs; WebAssembly implementations use optimizing compilers to translate this portable instruction set into native code.

There is another dimension of fast, though: not just work per unit time, but also time until first work is produced. If you want to go play Doom 3 on the web, you care about frames per second but also time to first frame. Therefore, WebAssembly was designed not just for high throughput but also for low latency. This focus on low-latency compilation expresses itself in two ways: binary size and binary layout.

On the size front, WebAssembly is optimized to encode small files, reducing download time. One way in which this happens is to use a variable-length encoding anywhere an instruction needs to specify an integer. In the usual case where, for example, there are fewer than 128 local variables, this means that a local.get instruction can refer to a local variable using just one byte. Another strategy is that WebAssembly programs target a stack machine, reducing the need for the instruction stream to explicitly load operands or store results. Note that size optimization only goes so far: it's assumed that the bytes of the encoded module will be compressed by gzip or some other algorithm, so sub-byte entropy coding is out of scope.
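The variable-length encoding in question is LEB128. As a rough sketch of how cheap it is to decode (this is an illustration, not SpiderMonkey's actual decoder), an unsigned immediate such as a local index reads like so:

#include <cstdint>
#include <stdexcept>

// Decode an unsigned LEB128 integer: each byte contributes 7 bits, low bits
// first, and the high bit of a byte says "more bytes follow".  An index
// below 128 therefore fits in a single byte.
uint32_t decodeULEB128(const uint8_t*& p, const uint8_t* end) {
  uint32_t result = 0;
  int shift = 0;
  while (p != end && shift < 35) {
    uint8_t byte = *p++;
    result |= uint32_t(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      return result;  // high bit clear: this was the last byte
    }
    shift += 7;
  }
  throw std::runtime_error("truncated or over-long LEB128");
}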

On the layout side, the WebAssembly binary encoding is sorted by design: definitions come before uses. For example, there is a section of type definitions that occurs early in a WebAssembly module. Any use of a declared type can only come after the definition. In the case of functions which are of course mutually recursive, function type declarations come before the actual definitions. In theory this allows web browsers to take a one-pass, streaming approach to compilation, starting to compile as functions arrive and before download is complete.

implementation strategies

The goals of high throughput and low latency conflict with each other. To get best throughput, a compiler needs to spend time on code motion, register allocation, and instruction selection; to get low latency, that's exactly what a compiler should not do. Web browsers therefore take a two-pronged approach: they have a compiler optimized for throughput, and a compiler optimized for latency. As a WebAssembly file is being downloaded, it is first compiled by the quick-and-dirty low-latency compiler, with the goal of producing machine code as soon as possible. After that "baseline" compiler has run, the "optimizing" compiler works in the background to produce high-throughput code. The optimizing compiler can take more time because it runs on a separate thread. When the optimizing compiler is done, it replaces the baseline code. (The actual heuristics about whether to do baseline + optimizing ("tiering") or just to go straight to the optimizing compiler are a bit hairy, but this is a summary.)

This article is about the WebAssembly baseline compiler in Firefox. It's a surprising bit of code and I learned a few things from it.

design questions

Knowing what you know about the goals and design of WebAssembly, how would you implement a low-latency compiler?

It's a question worth thinking about so I will give you a bit of space in which to do so.

.

.

.

After spending a lot of time in Firefox's WebAssembly baseline compiler, I have extracted the following principles:

  1. The function is the unit of compilation

  2. One pass, and one pass only

  3. Lean into the stack machine

  4. No noodling!

In the remainder of this article we'll look into these individual points. Note, although I have done a good bit of hacking on this compiler, its design and original implementation come mainly from Mozilla hacker Lars Hansen, who also currently maintains it. All errors of exegesis are mine, of course!

the function is the unit of compilation

As we mentioned, in the binary encoding of a WebAssembly module, all definitions needed by any function come before all function definitions. This naturally leads to a partition between two phases of bytestream parsing: an initial serial phase that collects the set of global type definitions, annotations as to which functions are imported and exported, and so on, and a subsequent phase that compiles individual functions in an essentially independent manner.

The advantage of this approach is that compiling functions is a natural task unit of parallelism. If the user has a machine with 8 virtual cores, the web browser can keep one or two cores for the browser itself and farm out WebAssembly compilation tasks to the rest. The result is that the compiled code is available sooner.

Taking functions to be the unit of compilation also allows for an easy "tier-up" mechanism: after the baseline compiler is done, the optimizing compiler can take more time to produce better code, and when it is done, it can swap out the results on a per-function level. All function calls from the baseline compiler go through a jump table indirection, to allow for tier-up. In SpiderMonkey there is no mechanism currently to tier down; if you need to debug WebAssembly code, you need to refresh the page, causing the wasm code to be compiled in debugging mode. For the record, SpiderMonkey can only tier up at function calls (it doesn't do OSR).

This simple approach does have some down-sides, in that it leaves interprocedural optimizations on the table (inlining, contification, custom calling conventions, speculative optimizations). This is mitigated in two ways, the most obvious being that LLVM or whatever produced the WebAssembly has ideally already done whatever inlining might be fruitful. The second is that WebAssembly is designed for predictable performance. In JavaScript, an implementation needs to do run-time type feedback and speculative optimizations to get good performance, but the result is that it can be hard to understand why a program is fast or slow. The designers and implementers of WebAssembly in browsers all had first-hand experience with JavaScript virtual machines, and actively wanted to avoid unpredictable performance in WebAssembly. Therefore there is currently a kind of détente among the various browser vendors, that everyone has agreed that they won't do speculative inlining -- yet, anyway. Who knows what will happen in the future, though.

Digressing, the summary here is that the baseline compiler receives an individual function body as input, and generates code just for that function.

one pass, and one pass only

The WebAssembly baseline compiler makes one pass through the bytecode of a function. Nowhere in all of this are we going to build an abstract syntax tree or a graph of basic blocks. Let's follow through how that works.

Firstly, emitFunction simply emits a prologue, then the body, then an epilogue. emitBody is basically a big loop that consumes opcodes from the instruction stream, dispatching to opcode-specific code emitters (e.g. emitAddI32).
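Schematically -- this is an illustration of the shape of that loop, not the real emitBody -- it looks something like this:

#include <cstdint>

// A few real wasm opcode values, for flavor.
enum class Op : uint8_t { End = 0x0b, I32Add = 0x6a, I32Sub = 0x6b };

class BaselineSketch {
 public:
  // One linear pass over the function body: read an opcode, emit code for
  // it, move on.  No AST, no basic-block graph.
  bool emitBody(const uint8_t* pc, const uint8_t* end) {
    while (pc != end) {
      switch (Op(*pc++)) {
        case Op::I32Add: emitAddI32(); break;
        case Op::I32Sub: emitSubI32(); break;
        case Op::End:    return true;   // end of the function body
        default:         return false;  // unknown opcode: validation error
      }
    }
    return false;  // ran off the end without seeing `end`
  }

 private:
  void emitAddI32() { /* pop operands, emit addl, push result */ }
  void emitSubI32() { /* pop operands, emit subl, push result */ }
};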

The opcode-specific code emitters are also responsible for validating their arguments; for example, emitAddI32 is wrapped in an assertion that there are two i32 values on the stack. This validation logic is shared by a templatized codestream iterator so that it can be re-used by the optimizing compiler, as well as by the publicly-exposed WebAssembly.validate function.

A corollary of this approach is that machine code is emitted in bytestream order; if the WebAssembly instruction stream has an i32.add followed by an i32.sub, then the machine code will have an addl followed by a subl.

WebAssembly has a syntactically limited form of non-local control flow; it's not goto. Instead, instructions are contained in a tree of nested control blocks, and control can only exit nonlocally to a containing control block. There are three kinds of control blocks: blocks, ifs, and loops. Jumping to a block or an if will continue at the end of the block, whereas jumping to a loop will continue at its beginning. In either case, as the compiler keeps a stack of nested control blocks, it has the set of valid jump targets and can use the usual assembler logic to patch forward jump addresses when the compiler gets to the block exit.
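A sketch of that bookkeeping might look like the following; it is illustrative only, and the real control items carry much more state.

#include <cstddef>
#include <vector>

// One entry per nested block/loop/if currently open.  Branches to a loop
// jump backwards to a known offset; branches to a block or if are recorded
// so they can be patched when the matching `end` is reached.
struct ControlItem {
  bool isLoop;
  size_t loopHeaderOffset;                  // target for backward branches
  std::vector<size_t> pendingForwardJumps;  // jump sites to patch at `end`
};

class ControlStackSketch {
  std::vector<ControlItem> stack_;

 public:
  void enterLoop(size_t headerOffset) {
    stack_.push_back({true, headerOffset, {}});
  }
  void enterBlock() { stack_.push_back({false, 0, {}}); }

  // `br depth` names its target by how many control items it skips outward.
  void branch(size_t depth, size_t jumpSiteOffset) {
    ControlItem& target = stack_[stack_.size() - 1 - depth];
    if (target.isLoop) {
      // emit a jump back to target.loopHeaderOffset right away
    } else {
      target.pendingForwardJumps.push_back(jumpSiteOffset);
    }
  }

  // On `end`, every recorded forward jump gets patched to this offset.
  void leave(size_t endOffset) {
    for (size_t site : stack_.back().pendingForwardJumps) {
      patchJump(site, endOffset);
    }
    stack_.pop_back();
  }

 private:
  static void patchJump(size_t /*site*/, size_t /*target*/) {
    // rewrite the emitted jump's displacement to point at the target
  }
};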

lean into the stack machine

This is the interesting bit! So, WebAssembly instructions target a stack machine. That is to say, there's an abstract stack onto which evaluating i32.const 32 pushes a value, and if followed by i32.const 10 there would then be i32(32) | i32(10) on the stack (where new elements are added on the right). A subsequent i32.add would pop the two values off, and push on the result, leaving the stack as i32(42). There is also a fixed set of local variables, declared at the beginning of the function.

The easiest thing that a compiler can do, then, when faced with a stack machine, is to emit code for a stack machine: as values are pushed on the abstract stack, emit code that pushes them on the machine stack.

The downside of this approach is that you emit a fair amount of code to read and write values from the stack. Machine instructions generally take arguments from registers and write results to registers; going to memory is a bit superfluous. We're willing to accept suboptimal code generation for this quick-and-dirty compiler, but isn't there something smarter we can do for ephemeral intermediate values?

Turns out -- yes! The baseline compiler keeps an abstract value stack as it compiles. For example, compiling i32.const 32 pushes nothing on the machine stack: it just adds a ConstI32 node to the value stack. When an instruction needs an operand that turns out to be a ConstI32, it can either encode the operand as an immediate argument or load it into a register.

Say we are evaluating the i32.add discussed above. After the add, where does the result go? For the baseline compiler, the answer is always "in a register" via pushing a new RegisterI32 entry on the value stack. The baseline compiler includes a stupid register allocator that spills the value stack to the machine stack if no register is available, updating value stack entries from e.g. RegisterI32 to MemI32. Note, a ConstI32 never needs to be spilled: its value can always be reloaded as an immediate.
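Here is an illustrative sketch of that value stack, with names in the spirit of ConstI32, RegisterI32 and MemI32; the real data structures are richer, this is just the shape of the idea.

#include <cstdint>
#include <variant>
#include <vector>

struct ConstI32    { int32_t value; };   // a known constant; no machine state
struct RegisterI32 { int reg; };         // lives in a register
struct MemI32      { int32_t offset; };  // spilled to the machine stack
using Entry = std::variant<ConstI32, RegisterI32, MemI32>;

struct ValueStackSketch {
  std::vector<Entry> entries;

  // Compiling `i32.const 42` emits no machine code: just a node.
  void pushConstI32(int32_t v) { entries.push_back(ConstI32{v}); }

  // An operand that turns out to be a constant can be folded into an
  // immediate; otherwise the caller materializes it into a register.
  bool popConstI32(int32_t* out) {
    if (entries.empty()) return false;
    if (auto* c = std::get_if<ConstI32>(&entries.back())) {
      *out = c->value;
      entries.pop_back();
      return true;
    }
    return false;
  }
};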

The end result is that the baseline compiler avoids lots of stack store and load code generation, which speeds up the compiler, and happens to make faster code as well.

Note that there is one limitation, currently: control-flow joins can have multiple predecessors and can pass a value (in the current WebAssembly specification), so the allocation of that value needs to be agreed-upon by all predecessors. As in this code:

(func $f (param $arg i32) (result i32)
  (block $b (result i32)
    (i32.const 0)
    (local.get $arg)
    (i32.eqz)
    (br_if $b) ;; return 0 from $b if $arg is zero
    (drop)
    (i32.const 1))) ;; otherwise return 1
;; result of block implicitly returned

When the br_if branches to the block end, where should it put the result value? The baseline compiler effectively punts on this question and just puts it in a well-known register (e.g., %rax on x86-64). Results for block exits are the only place where WebAssembly has "phi" variables, and the baseline compiler allocates all integer phi variables to the same register. A hack, but there we are.

no noodling!

When I started to hack on the baseline compiler, I did a lot of code reading, and eventually came on code like this:

void BaseCompiler::emitAddI32() {
  int32_t c;
  if (popConstI32(&c)) {
    RegI32 r = popI32();
    masm.add32(Imm32(c), r);
    pushI32(r);
  } else {
    RegI32 r, rs;
    pop2xI32(&r, &rs);
    masm.add32(rs, r);
    freeI32(rs);
    pushI32(r);
  }
}

I said to myself, this is silly, why are we only emitting the add-immediate code if the constant is on top of the stack? What if instead the constant was the deeper of the two operands, why do we then load the constant into a register? I asked on the chat channel if it would be OK if I improved codegen here and got a response I was not expecting: no noodling!

The reason is, performance of baseline-compiled code essentially doesn't matter. Obviously let's not pessimize things but the reason there's a baseline compiler is to emit code quickly. If we start to add more code to the baseline compiler, the compiler itself will slow down.

For that reason, changes are only accepted to the baseline compiler if they are necessary for some reason, or if they improve latency as measured using some real-world benchmark (time-to-first-frame on Doom 3, for example).

This to me was a real eye-opener: a compiler optimized not for the quality of the code that it generates, but rather for how fast it can produce the code. I had seen this in action before but this example really brought it home to me.

The focus on compiler throughput rather than compiled-code throughput makes it pretty gnarly to hack on the baseline compiler -- care has to be taken when adding new features not to significantly regress the old. It is much more like hacking on a production JavaScript parser than your traditional SSA-based compiler.

that's a wrap!

So that's the WebAssembly baseline compiler in SpiderMonkey / Firefox. Until the next time, happy hacking!

by Andy Wingo at March 25, 2020 04:29 PM

March 16, 2020

Víctor Jáquez

Review of the Igalia Multimedia team Activities (2019/H2)

This blog post is a review of the various activities the Igalia Multimedia team was involved along the second half of 2019.

Here are the previous 2018/H2 and 2019/H1 reports.

GstWPE

Succinctly, GstWPE is a GStreamer plugin which allows rendering web pages as a video stream whose frames are GL textures.

Phil, its main author, wrote a blog post explaining in detail what GstWPE is and its possible use-cases. He wrote a demo too, which grabs and previews a live stream from a webcam session and blends it with an overlay from wpesrc, which displays HTML content. This composited live stream can be broadcast through YouTube or Twitch.

These concepts are better explained by Phil himself in the following lightning talk, presented at the last GStreamer Conference in Lyon:

Video Editing

After implementing a deep integration of the GStreamer Editing Services (a.k.a GES) into Pixar’s OpenTimelineIO during the first half of 2019, we decided to implement an important missing feature for the professional video editing industry: nested timelines.

Toward that goal, Thibault worked with the GSoC student Swayamjeet Swain to implement a flexible API to support nested timelines in GES. This means that users of GES can now decouple each scene into different projects when editing long videos. This work is going to be released in the upcoming GStreamer 1.18 version.

Henry Wilkes also implemented support for nested timelines in OpenTimelineIO, making the GES integration one of the most advanced, as you can see in this table:

Feature               | OTIO | EDL | FCP7 XML | FCP X | AAF | RV  | ALE | GES
Single Track of Clips | ✔    | ✔   | ✔        | ✔     | ✔   | W-O | ✔   | ✔
Multiple Video Tracks | ✔    | ✖   | ✔        | ✔     | ✔   | W-O | ✔   | ✔
Audio Tracks & Clips  | ✔    | ✔   | ✔        | ✔     | ✔   | W-O | ✔   | ✔
Gap/Filler            | ✔    | ✔   | ✔        | ✔     | ✔   | ✔   | ✖   | ✔
Markers               | ✔    | ✔   | ✔        | ✔     | ✖   | N/A | ✖   | ✔
Nesting               | ✔    | ✖   | ✔        | ✔     | ✔   | W-O | ✔   | ✔
Transitions           | ✔    | ✔   | ✖        | ✖     | ✔   | W-O | ✖   | ✔
Audio/Video Effects   | ✖    | ✖   | ✖        | ✖     | ✖   | N/A | ✖   | ✔
Linear Speed Effects  | ✔    | ✔   | ✖        | ✖     | R-O | ✖   | ✖   | ✖
Fancy Speed Effects   | ✖    | ✖   | ✖        | ✖     | ✖   | ✖   | ✖   | ✖
Color Decision List   | ✔    | ✔   | ✖        | ✖     | ✖   | ✖   | N/A | ✖

Along these lines, Thibault delivered a 15 minutes talk, also in the GStreamer Conference 2019:

After detecting a few regressions and issues in GStreamer, related to frame accuracy, we decided to make sure that we can seek in a perfectly frame-accurate way using GStreamer and the GStreamer Editing Services. In order to ensure that, an extensive integration testsuite has been developed, mostly targeting the most important container formats and codecs (namely mxf, quicktime, h264, h265, prores, jpeg), and issues have been fixed in different places. On top of that, new APIs are being added to GES to allow expressing times in frame numbers instead of nanoseconds. This work is still ongoing but should be merged in time for GStreamer 1.18.

GStreamer Validate Flow

GstValidate has been turning into one of the most important GStreamer testing tools to check that elements behave as they are supposed to do in the framework.

Along with our MSE work, we found that another way to specify tests, related to the buffers and events produced through specific pads, was needed. Thus, Alicia developed a new plugin for GstValidate: Validate Flow.

Alicia gave an informative 30 minutes talk about GstValidate and the new plugin in the last GStreamer Conference too:

GStreamer VAAPI

Most of the work during the second half of 2019 consisted of maintenance tasks and code reviews.

We worked mainly on memory restrictions per backend driver, and we reviewed a big refactor: internal encoders now use GstObject, instead of the custom GstVaapiObject. We also reviewed patches for new features such as video rotation and cropping in vaapipostproc.

Servo multimedia

Last year we worked on integrating media playback in Servo. We finally delivered hardware-accelerated video playback on Linux and Android. We also worked on the Windows and Mac ports, but they were not finished. Naturally, most of the work was in the servo/media crate, pushing code and reviewing contributions. The major tasks were to rewrite the media player example and the internal source element, aiming to handle playbin's download flag properly.

We also added WebGL integration support with <video> elements, thus webpages can use video frames as WebGL textures.

Finally we explored how to isolate the multimedia processing in a dedicated thread or process, but that task remains pending.

WebKit Media Source Extension

We did a lot of downstream and upstream bug fixing and patch review, both in WebKit and GStreamer, for our MSE GStreamer-based backend.

Along this line, we improved WebKitMediaSource to use playbin3, and we also added compatibility with older GStreamer versions.

WebKit WebRTC

Most of the work in this area was maintenance and fixing regressions uncovered by the layout tests. Besides that, support for the Raspberry Pi was improved by handling encoded streams from v4l2 video sources, with some explorations with Minnowboard on top of that.

Conferences

GStreamer Conference

Igalia was a Gold sponsor of this last GStreamer Conference, held in Lyon, France.

The whole team attended and five talks were delivered. Thibault alone presented three of them: besides the video editing talk already mentioned, one about the GstTranscoder API and another about the new documentation infrastructure based on Hotdoc:

We also had a productive hackfest, after the conference, where we worked on the AV1 Rust decoder, the HLS Rust demuxer, the hardware decoder flag in playbin, and other topics.

Linaro Connect

Phil attended the Linaro Connect conference in San Diego, USA. He delivered a talk about WPE/Multimedia which you can enjoy here:

Demuxed

Charlie attended Demuxed, in San Francisco. The conference is heavily focused on streaming and codec engineering and validation. Sadly there was not much interest in GStreamer, as the main focus is on FFmpeg.

RustFest

Phil and I attended the last RustFest in Barcelona. Basically we went to meet with the Rust community and we attended the “WebRTC with GStreamer-rs” workshop presented by Sebastian Dröge.

by vjaquez at March 16, 2020 03:20 PM

March 09, 2020

Jacobo Aragunde

Mapping the input method implementations in Chromium

This is an overview of the different input method implementations in Chromium, centered in Linux platforms.

By Rime-devel – 小狼毫输入法界面 / Interface of Weasel Input Method, GPL, Link

Native IME

InputMethodAuraLinux is the central piece here. It implements two important interfaces and triggers the construction of the platform-specific bits. This is the structure for the Wayland/Ozone backend:

The InputMethodAuraLinux triggers the construction of a specific LinuxInputMethodContext through the Factory. In the case above, it would be an instance of WaylandInputMethodContext. It will save a reference to the Context while, at the same time, setting itself as the delegate for that Context object. The set of methods of both classes is pretty similar, because they are expected to work together: the Context object, in the parts related to the platform, and InputMethodAuraLinux in the parts related to the rest of Chromium. It will interact with other system classes, mainly through the singleton object IMEBridge.

The X11 backend currently involves slightly different classes. It still has the InputMethodAuraLinux as a centerpiece, but uses different implementations for the Context and the ContextFactory: the multi-purpose class LinuxUI also acts as a factory. It's expected to adapt to the Ozone structure at some point in the future.

ChromeOS and IME extension API

Chromium provides an extension API to allow the implementation of IMEs exclusively with web technologies. It shares code with the ChromeOS implementation.

The piece gluing together the native infrastructure and the ChromeOS/Extension API implementations is the IMEEngineHandlerInterface, which has hooks to manage relevant events like focus in or out, process key events, etc. The singleton class IMEBridge can set an EngineHandler, which is an implementation of the aforementioned interface, and if such an implementation exists, it will receive those events from Chromium whenever they happen.

There is one base implementation of IMEEngineHandlerInterface called InputMethodEngineBase. It is extended by chromeos::InputMethodEngine, for the ChromeOS implementation, and by input_method::InputMethodEngine, for the IME extension API implementation.

The setup of the IME extension API contains the following pieces:

  • observers to the IME bridge
  • an InputIMEEventRouter that contains an instance of the InputMethodEngine class
  • a factory to associate instances of InputIMEEventRouter with browser profiles
  • an InputImeAPI class that acts as several kinds of observers
  • the actual code that backs every JavaScript IME operation, which makes use of the classes mentioned above and is contained in individual classes that implement UIThreadExtensionFunction for every operation

The IME API operates in two directions: when one of the JavaScript operations is directly called, it will trigger the native code associated with it, which would invoke any relevant Chromium code; when a relevant event happens in the system, the IMEBridgeObserver is notified and the associated JavaScript callback will be triggered.

by Jacobo Aragunde Pérez at March 09, 2020 05:00 PM

Brian Kardell

Making Sure Content Lives On...

Making Sure Content Lives On...

In which I describe a problem, share a possible solution that you can use today, and, hopefully start a conversation...

(If you already know the problem, and you're looking for the solution, check out the documentation over on GitHub)

There's a series of problems that I experience literally all the time: Link rot and the disappearance of our history. I have written posts on blogging platforms that no longer exist. I have written guest posts for sites that no longer exist. I frequently wish that that content still existed - not only still existed, but was findable again. I have researched hard and bookmarked or linked to obscure but important history which, you guessed it: no longer exists. This is tremendously frustrating, but also just really sad (more on this later)...

This is one thing that really interests me about a potential future decentralized web - we could do better here. At the same time, we often have to deal with the Web we have now - not the one we wish existed. So, let's look at the state of this today.

Archiving today

Luckily, when content disappears, I can often turn to the Internet Archive and the Wayback Machine to find it. But the truth is that they've got a very tough challenge. Identifying all the things that need indexing is really hard. If you have a relatively small blog or site, especially, it can be hard to find. Then, even if it gets found, how do they know when you've got some new content? So, the net result is that no matter how good a job they do, a lot of really interesting stuff can wind up not being findable there either.

History itself makes this problem even harder: It's very unpredictable. Often things are considerably more interesting only in hindsight. Things also tend to seem a lot more stable in the short term than they are in reality in the long arc of history: Even giant, "too big to fail" semi-monopolies ultimately shift, burn out or even just break their history. One of the wisest things I ever heard was an older engineer who told me if they could impart one thing to students it was this - all the paradigms that you think are forever will shift. Once upon a time IBM ruled the world. Motorola was absolutely dominant. Once upon a time, SourceForge was the shit. Then GitHub. Once upon a time Netscape won the interwebs, and then they lost it to Microsoft who won it even bigger. Then Microsoft lost it again.

It's ok if you don't share my desire, but I personally, want to be sure my content continues to exist, and people can find it via old links despite all of these uncertain futures. Today, the best chance of that is in the Internet Archive. So, how can I be more confident that it does?

Maybe... Just ask?

Interestingly, a lot of these problems get hella simpler if you just ask.

For a really long time now (as long as I can remember the Wayback Machine existing), you can go to https://archive.org/web/, enter a URL under the heading 'Save page now' and click the button.

A notification model is way, way simpler than somehow trying to monitor and guess - and we use this same basic pattern for several things in engineering. It's quite successful at keeping things efficient.

The trouble here is, I think, that it is currently manual. It's unlikely that I'm going to do that every time I add a new blog post, even though that's exactly what I want to do.

Maybe we can fix that?

Asking by default...

Recently, this got me to thinking... What if it were really easy to incorporate this into whatever blog software you use today? Think about it: RSS continues to thrive, in part, because you hardly have to know about it - it's so comparatively easy (entirely free in some cases in that it is just part of the software you get). If you could incorporate this into your system in somewhere between 0 and 30 minutes, would you?

Every now and then I have a thought like this that seems so clear to me that I figure "surely this must exist? Why do I not know about it?" So, I asked around on social media. Surprisingly, I could find no one who was aware of such a thing. What about publicly exposing an API? Do they? Maybe? Kinda? Could we make it easier?

Brendan Eich cc'ed Jason Scott from the Internet Archive into the conversation. Jason cc'ed in Mark Graham, the Director for the Wayback Machine. Mark got me access to a service and set up this little experiment to try it out.

But how to get there?

One of the challenges here is how to make something 'easy' when there are so many blogging systems and site hosting things. Submitting a URL automatically, even, is technically not hard - but fitting that into each system is a little more work. Then, each of these uses different frameworks, languages, package managers and so on. I don't really have time to build out 100 different 'easy' solutions just to start a conversation. In fact, I want this for myself, and currently my setup is very custom - it's all stuff I wrote that works for me but would be more or less useful to someone else.

But then I had a thought: There's actually lots of good stuff in the Web we have for tackling this problem.

Existing, higher level hooks

HTTP is already a nice, high-level API that doesn't really care about how you built your site, but it might not be automatically obvious or convenient regarding how to plug into it. We already have lots of infrastructure too... As I said, almost everyone has an RSS feed and almost everyone has some kind of 'publish' process. Maybe I could make it easy for huge swaths of people by just tapping into some big patterns and existing machinery. What I really wanted, I thought, was just a flexible and easy to integrate service. It doesn't have to block your build, it isn't absolutely critical, it's just "hey... I just published a thing."

So, I decided to try this..

I created an HTTP service with a Cloudflare worker. Cloudflare workers are interesting because they're deployed at the edge, they sleep when you're not using them, and you can process up to 100k requests per day for free. Since they kind of spin up as needed, handling errors for this kind of thing is easy too - you aren't going to pollute the system somehow. In fact, they seem kind of ideally suited for this kind of thing.

This seems very good actually because blogs, especially personal blogs, tend to churn out content "slowly". That is, 100k requests per day seems like plenty for a few orders of magnitude more blogs than that. So... Cool.

Basically - you send an HTTP POST to this service with the content-type `application/json` and provide some stuff in the body. But what you provide can differ, to make integration very easy regardless of what kind of setup you have. It allows, effectively, 3 'shapes' that I've put together, which should fit nicely and easily into whatever system you already have today. For my own blog, hosted on gh pages, it took ~5 minutes. If you're interested in seeing what I came up with, check out the experiment itself and try it out.

I think it should be pretty easy and flexible, but I don't know, wdyt? Are you willing to try it? Is this an interesting problem? Should the Wayback maybe offer something like this itself? How could it be better?

March 09, 2020 04:00 AM

March 02, 2020

Žan Doberšek

Flatpak repository for WPE

To let developers play with the WPE stack, we have set up a Flatpak repository containing all the necessary bits to start working with it. To install applications (like Cog, the very simple WPE launcher), first add the remote repository, and proceed with the following instructions:

$ flatpak --user remote-add wpe-releases --from https://software.igalia.com/flatpak-refs/wpe-releases.flatpakrepo
$ flatpak --user install org.wpe.Cog
$ flatpak run org.wpe.Cog -P fdo <url>

Currently the 2.26 release of the WPE port is used, along with libwpe 1.4.0, WPEBackend-fdo 1.4.0 and Cog 0.4.0. Upgrades to the newer releases (happening in the next few weeks) will follow in the next month or two. Builds are provided for the x86_64, arm and aarch64 architectures.

If you need ideas or inspiration on how to use WPE, this repository also contains GstWPEBroadcastDemo, an application that showcases both GStreamer and WPE, enabling you to mix live video input with HTML content that can be updated on-the-fly. You can read more about this in the blog post made by Philippe Normand.

The current Cog/WPE stack still imposes the Wayland-only limitation, with Mesa-based graphics stacks most likely to work well. In future releases, we plan to add support for new platforms, graphics stacks and methods of integration.

All of this is still in very early stages. If you find an issue with the applications or libraries in the repository, please do not hesitate to report it to our issue tracker. The issues will be rerouted to the trackers of the problematic component if necessary.

by Žan Doberšek at March 02, 2020 09:00 AM

February 17, 2020

Paulo Matos

CReduce - it's a kind of magic!

During my tenure as a C compiler developer working on GCC and LLVM there was an indispensable tool when it came to fixing bugs, and that was C-Reduce, or its precursor delta. Turns out this magic applies well to JavaScript, so let's take a quick glance at what it is and how to use it.

More…

by Paulo Matos at February 17, 2020 10:00 AM

February 12, 2020

Brian Kardell

Toward Responsive Elements

Toward Responsive Elements

In this piece I'll talk about the "Container Queries" problem, try to shine some light on some misconceptions, and tell you about the state of things.

As developers, watching standards can be frustrating. We might learn of some new spec developing, some new features landing that you weren't even aware of - maybe many of these are even things you don't actually care much about. In the meantime, "Container Queries," clearly the single most requested feature in CSS for years, appears to be just... ignored.

What gives? Why does the CSS Working Group not seem to listen? After all of these years -- why isn't there some official CSS spec at least?

It's very easy to get the wrong impression about what's going on (or isn't) in Web standards, and why. This isn't because it is a secret, but because there is just so much happening and so many levels of discussion that it would be impossible to follow it all. The net result is often that the view from the outside can feel kind of opaque - and our impression can easily be a kind of distorted image. Shining some light on this sort of thing is one of my goals this year, so let's start here…

N problems

One of the more difficult aspects of "Container Queries" is that there isn't actually a nice, single problem to discuss. Many discussions begin with how to write them - usually in a way that "feels kind of obvious" -- but this usually (inadvertently) creates a stack of new asks of CSS and numerous conflicts. These wind up derailing the conversation. In the process, it can also (inadvertently) hand-wave over the tremendous complexity of what turn out to be very significant particulars and important aspects of CSS's current design and implementation. This makes it very hard to focus a discussion; it is easily derailed.

A big part of the challenge here is that depending on how you divide up the problem, some of these asks would suggest that fundamental changes are necessary to the architectures of browser engines. Worse: We don't even know what they would be. Unfortunately, it's very hard to appreciate what this means in practice since so much of this, again, deals with things that are opaque to most developers. For now: suffice it to say that any such re-architecture could require an entirely unknown (in years) speculative amount of work in order to create and prove such an idea - and then many more years of development per engine. Worse, all of this would be similarly invisible to developers in the meantime. Did you know, for example, that there are multiple many-year-long efforts with huge investments underway already aimed at unlocking many new things in CSS? There are - and I don't mean Houdini!

Ok, but what about container queries...

What is happening in this space?

The truth is that while it might look like nothing much is happening, there have been lots of discussions and efforts to feel out various ideas. They have mostly, unfortunately, run into certain kinds of problems very quickly and it's not been at all easy to even determine which paths aren't dead ends.

More recently, many have instead turned to "how do we make this into more solvable problems?" and "How do we actually make some progress, mitigate risk - take a step, and actually get something to developers?"

Two important ideas that have gotten a lot of traction in this time are 'containment' and ResizeObserver, both of which force us to tackle a different set of (hopefully more answerable) problems - allowing us to lay what we think are probably necessary foundations to solving the problem, while enabling developers to meet these and other use cases more effectively, and in a more timely fashion.

Lightspeed progress, actually...

Standards move at a scale unfamiliar to most of us. Consider that we shifted the stalemate conversation and ResizeObserver was envisioned, incubated, specced, tested, agreed upon, iterated on (we got things wrong!) and implemented in all browsers in about 2 years (with Igalia doing the latest implementation in WebKit thanks to some sponsorship from AMP). Containment is shipping in 2 of 3 engines.

This is standards and normalization of something really new for developers at light speed.

But... this is incomplete.

Yes, it is.

In the announcement about Igalia's implementation of ResizeObserver in WebKit, I included an overly simple example of how you could create responsive components with very few lines of code that actually worked, regardless of how or why things resized. But... no one wants us to ultimately have to write JavaScript for this… Literally nobody that I know of - so what gives?

First, by treating it this way, we've made some incredible, actual progress on parts of the problem in very short time - for the first time. All browsers will have ResizeObserver now. In the meantime developers will at least have the ability to do a thing they couldn't actually do before, and do the things they could do before more efficiently, and experiments can teach us a little more about how it actually has to work. That's good for developers - we don't have to wait forever for something better than what we have now... But it's also good for problem solving: As things settle and we have more answers, it's allowed us to focus conversations more and ask better next questions.

We definitely care about moving further!

For the last several months, I and some folks at Igalia have been working on this space. We're very interested in helping find the connections that bring together developers, implementer and standards bodies to solve the problems.

Internally, we have had numerous sessions to discuss past proposals and tackle challenges. I've been researching the various user-space solutions: How they work, what their actual use is like on the Web, and so on. In developer space we've worked a lot with folks active in this space: I've met regularly with Tommy Hodgins and Jon Neal, and had discussions with folks like Eric Portis, Brad Frost, Scott Jehl and several others.

But we've also talked to people from Google, Mozilla and Apple, and they've been very helpful and willing to chat - both commenting on and generating new ideas and discussions. I really can't stress enough how helpful these discussions have been, or how willing all of the CSSWG members representing browser vendors have been to spend time doing this. Just during the recent travel for the CSS Working Group face to face, I had numerous breakout discussions - many hours worth of idea trading between Igalians and browser folks.

I wish I could say that we were 'there' but we're not.

As it stands, it seems that there are a few 'big ideas' that seem like they could go somewhere - but no actual agreement yet on whether one or both of them is acceptable, nor on prioritization.

Current ideas...

There does seem to be some general agreement on at least one part of what I am going to call instead "Responsive Design for Components", and that is that flipping the problem on its head is better. In other words: Let's see if we can find a way to talk about how we can expose the necessary powers in CSS, in a way that is largely compatible with CSS's architecture today... even if writing that looks absolutely nothing like past proposals at first.

There seems to be some agreement at large that this is where the hardest problems lie, and separating that part of the discussion from things that we can already sugar in user-space seems helpful for making progress. This has led to a few different ideas…

Lay down small, very optimal paths in CSS

One very good idea generated by lots of discussion with many people during CSS Working Group breakouts sprang from a thought from Google's Ian Kilpatrick…

I am constantly impressed by Ian's ability to listen, pull the right threads, help guide the big ship and coordinate progress on even long visions that enable us to exceed current problem limits. You probably don't hear a lot about him, but the Web is a lot better for his efforts…

The crux of the idea is that there seems to be one 'almost natural' space that enables a ton of the use cases we've actually seen in research. Some community members have landed on not entirely dissimilar ideas: creating something of an 'if' via calc expressions to express different values for properties. The idea goes something like this:

Imagine that we added a function, let's call it "switch", which allows you to express something "like a media query" with several possible values for a property. These would be based on the available inline size during the layout phase. Something like this…

.foo {
	display: grid;
	grid-template-columns: switch(
	 	(available-inline-size > 1024px) 1fr 4fr 1fr;
	 	(available-inline-size > 400px) 2fr 1fr;
		(available-inline-size > 100px) 1fr;
		default 1fr;
	 );
}

See, a whole lot of the problem with existing ideas is that they have to loop back through (expensive) phases potentially several times and make it (seemingly) impossible to keep CSS rendering in the same frame - thus requiring fairly major architectural changes. It would need to be limited to properties that affect things only during layout, but another practical benefit here is that it would be not that hard to start some basic prototyping and feel this out a little in an implementation without actually committing to years of work. It would be, very likely, deliverable to developers in a comparatively short amount of time.

Importantly: While it makes sense to expose this step to developers, the idea really is that if we could get agreement and answers on this piece, we would be able to discuss concretely how we sugar some better syntax which effectively distills down to this, internally. Nicole Sullivan also has some ideas which might help here, and which I look forward to learning more about.

Better Containment and...

There is another interesting idea generated primarily by David Baron at Mozilla. In fact, David has a whole series of proposals on this that would lay out how to get all the way there in one vision. It is still being fleshed out, he's writing something up as I write this.

I've seen some early drafts and it's really interesting, but it's worth noting that it also isn't without a practical order of dependencies. Effectively, everything in David's idea hinges on "containment in a single direction". This is the smallest and most fundamental property - the next step is almost certainly to define and implement that, and if we have that, then the next things follow more easily - but so do many other possibilities.

This information as a property potentially provides an opportunity for an optimization of what would happen if you built any system with ResizeObserver today, where it might be possible to avoid complete trips through the event loop.

It isn't currently clear how some questions will be answered but David is one of the most knowledgable people I know, and there's a lot of good stuff in here - I look forward to seeing more details!

Some new experiments?

Another idea that we've tossed around a bunch at Igalia and with several developers punts on a few of these hard questions, whose answers seem hard to speculate on, and really focuses instead on the circularity problem and thinking about the kind of unbounded space of use cases developers seem to imagine. Perhaps it "fits" with option B even...

These would be a kind of declarative way, in CSS, to wire up finite states. They would focus on laying down a single new path in CSS that allowed us to express a pattern of what to observe, how to observe it, how to measure, and how to say that an element was in a particular state. The practical upshot of this is that not just Container Queries, but lots and lots of use cases that are currently hard to solve in CSS, but are entirely solvable in JS with Observers, become suddenly possible to express via CSS without asking a lot of new stuff of CSS - and doing that might present some new optimization opportunities, but it might not -- and at least we wouldn't block moving forward.

So what's next?

In the coming months I hope to continue to think about, explore this space and continue discussions with others. I would love to publish some research and maybe some new (functional) experiments with JS that aim to be 'closer' to a path that might be paveable. We have raw materials and enough information to understand some of where things are disjoint between what we're asking for and the practical aspects of what we're proving.

Ultimately, I'd like to help gather all of these ideas back into a place where we have enough to really dig in in WICG or CSSWG without a very significant likelihood of easy derailment. Sooner rather than later, I hope.

In the meantime - let's celebrate that we have actually made progress and work to encourage more!

February 12, 2020 05:00 AM

February 09, 2020

Andy Wingo

state of the gnunion 2020

Greetings, GNU hackers! This blog post rounds up GNU happenings over 2019. My goal is to celebrate the software we produced over the last year and to help us plan a successful 2020.

Over the past few months I have been discussing project health with a group of GNU maintainers and we were wondering how the project was doing. We had impressions, but little in the way of data. To that end I wrote some scripts to collect dates and versions for all releases made by GNU projects, as far back as data is available.

In 2019, I count 243 releases, from 98 projects. Nice! Notably, on ftp.gnu.org we have the first stable releases from three projects:

GNU Guix
GNU Guix is perhaps the most exciting project in GNU these days. It's a package manager! It's a distribution! It's a container construction tool! It's a package-manager-cum-distribution-cum-container-construction-tool! Hearty congratulations to Guix on their first stable release.
GNU Shepherd
The GNU Daemon Shepherd is a modern dependency-based init service, written in Guile Scheme, and used in Guix. When you install Guix as an operating system, it actually stages Scheme programs from the operating system definition into the Shepherd configuration. So cool!
GNU Backgammon
Version 1.06.002 is not GNU Backgammon's first stable release, but it is the earliest version which is available on ftp.gnu.org. Formerly hosted on the now-defunct gnubg.org, GNU Backgammon is a venerable foe, and has used neural networks since before they were cool. Welcome back, GNU Backgammon!

The total release counts above are slightly above what Mike Gerwitz's scripts count in his "GNU Spotlight", posted on the FSF blog. This could be because in addition to files released on ftp.gnu.org, I also manually collected release dates for most packages that upload their software somewhere other than gnu.org. I don't count alpha.gnu.org releases, and there were a handful of packages for which I wasn't successful at retrieving their release dates. But as a first approximation, it's a relatively complete data set.

I put my scripts in a git repository if anyone is interested in playing with the data. Some raw CSV files are there as well.

where we at?

Hair toss, check my nails, baby how you GNUing? Hard to tell!

To get us closer to an answer, I calculated the active package count per year. There can be other definitions, but my reading is that an active package is one that has had a stable release within the preceding 3 calendar years. So for 2019, for example, a GNU package is considered active if it had a stable release in 2017, 2018, or 2019. What I got was a graph that looks like this:

What we see is nothing before 1991 -- surely pointing to lacunae in my data set -- then a more or less linear rise in active package count until 2002, some stuttering growth rising to a peak in 2014 at 208 active packages, and from there a steady decline down to 153 active packages in 2019.
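
For concreteness, the active-package computation is simple enough to sketch. This is a minimal Python illustration rather than the actual scripts, and it assumes the release data has already been reduced to (package, year) pairs:

from collections import defaultdict

# Pretend release data, reduced to (package, year) pairs, e.g. parsed from the CSVs.
releases = [("guix", 2019), ("shepherd", 2019), ("backgammon", 2019), ("ed", 2017)]

years_by_package = defaultdict(set)
for package, year in releases:
    years_by_package[package].add(year)

def active_count(year, window=3):
    # A package is active in `year` if it had a release in the preceding `window` calendar years.
    recent = range(year - window + 1, year + 1)
    return sum(1 for years in years_by_package.values()
               if any(y in years for y in recent))

print({y: active_count(y) for y in range(2017, 2020)})  # {2017: 1, 2018: 1, 2019: 4}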

Of course, as a metric, active package count isn't precisely the same as project health; GNU ed is indeed the standard editor but it's not GCC. But we need to look for measurements that indirectly indicate project health and this is what I could come up with.

Looking a little deeper, I tabulated the first and last release date for each GNU package, and then grouped them by year. In this graph, the left blue bars indicate the number of packages making their first recorded release, and the right green bars indicate the number of packages making their last release. Obviously a last release in 2019 indicates an active package, so it's to be expected that we have a spike in green bars on the right.

What this graph indicates is that GNU had an uninterrupted growth phase from its beginning until 2006, with more projects being born than dying. Things are mixed until 2012 or so, and since then we see many more projects making their last release and above all, very few packages "being born".

where we going?

I am not sure exactly what steps GNU should take in the future but I hope that this analysis can be a good conversation-starter. I do have some thoughts but will post in a follow-up. Until then, happy hacking in 2020!

by Andy Wingo at February 09, 2020 07:44 PM

February 07, 2020

Andy Wingo

lessons learned from guile, the ancient & spry

Greets, hackfolk!

Like just about every year, last week I took the train up to Brussels for FOSDEM, the messy and wonderful carnival of free software and of those that make it. Mostly I go for the hallway track: to see old friends, catch up, scheme about future plans, and refill my hacker culture reserves.

I usually try to see if I can get a talk or two in, and this year was no exception. First on my mind was the recent release of Guile 3. This was the culmination of a 10-year plan of work and so obviously there are some things to say! But at the same time, I wanted to reflect back a bit and look at the past with a bit of distance.

So in the end, my one talk was two talks. Let's start with the first one. (I'm trying a new thing where I share my talks as blog posts. We'll see how this goes. I know the rendering can be a bit off relative to the slides, but hopefully it's good enough. If you prefer, you can just watch the video instead!)

Celebrating Guile 3

FOSDEM 2020, Brussels

Andy Wingo | wingo@igalia.com

wingolog.org | @andywingo

So yeah let's celebrate! I co-maintain the Guile implementation of Scheme. It's a programming language. Guile 3, in summary, is just Guile, but faster. We added a simple just-in-time compiler as well as a bunch of ahead-of-time optimizations. The result is that it runs faster -- sometimes by a lot!

In the image above you can see Guile 3's performance on a number of microbenchmarks, relative to Guile 2.2, sorted by speedup. The baseline is 1.0x as fast. You can see that besides the first couple microbenchmarks where things are a bit inconclusive (click for full-size image), everything gets faster. Most are at least 2x as fast, and one benchmark is even 32x as fast. (Note the logarithmic scale on the Y axis.)

I only took a look at microbenchmarks at the end of the Guile 3 series; before that, I was mostly going by instinct. It's a relief to find out that in this case, my instincts did align with improvement.

mini-benchmark: eval

(primitive-eval
 '(let fib ((n 30))
    (if (< n 2)
        n
        (+ (fib (- n 1)) (fib (- n 2))))))

Guile 1.8: primitive-eval written in C

Guile 2.0+: primitive-eval in Scheme

Taking a look at a more medium-sized benchmark, let's compute the 30th Fibonacci number, but using the interpreter instead of compiling the procedure. In Guile 2.0 and up, the interpreter (primitive-eval) is implemented in Scheme, so it's a good test of an important small Scheme program.

Before 2.0, though, primitive-eval was actually implemented in C. This had a number of disadvantages, notably that it prevented tail calls between interpreted and compiled code. When we switched to a Scheme implementation of primitive-eval, we knew we would have a performance hit, but we thought that we would gain it back eventually as the compiler got better.

As you can see, it took a while before the compiler and run-time improved to the point that primitive-eval in Scheme reached the speed of its old hand-tuned C implementation, but for Guile 3, we finally got there. Note again the logarithmic scale on the Y axis.

macro-benchmark: guix

guix build libreoffice ghc-pandoc guix \
  --dry-run --derivation

7% faster

guix system build config.scm \
  --dry-run --derivation

10% faster

Finally, taking a real-world benchmark, the Guix package manager is implemented entirely in Scheme. All ten thousand packages are defined in Scheme, the building scripts are in Scheme, the initial RAM disk is in Scheme -- you get the idea. Guile performance in Guix can have an important effect on user experience. As you can see, Guile 3 lowered elapsed time for some operations by around 10 percent or so. Of course there's a lot of I/O going on in addition to computation, so Guile running twice as fast will rarely make Guix run twice as fast (Amdahl's law and all that).

spry /sprī/

  • adjective: active; lively

So, when I was thinking about words that describe Guile, the word "spry" came to mind.

spry /sprī/

  • adjective: (especially of an old person) active; lively

But actually when I went to look up the meaning of "spry", Collins Dictionary says that it especially applies to the agèd. At first I was a bit offended, but I knew in my heart that the dictionary was right.

Lessons Learned from Guile, the Ancient & Spry

FOSDEM 2020, Brussels

Andy Wingo | wingo@igalia.com

wingolog.org | @andywingo

That leads me into my second talk.

guile is ancient

2010: Rust

2009: Go

2007: Clojure

1995: Ruby

1995: PHP

1995: JavaScript

1993: Guile (27 years before 3.0!)

It's common for a new project to be lively, but Guile is definitely not new. People have been born, raised, and earned doctorates in programming languages in the time that Guile has been around.

built from ancient parts

1991: Python

1990: Haskell

1990: SCM

1989: Bash

1988: Tcl

1988: SIOD

Guile didn't appear out of nothing, though. It was hacked up from the pieces of another Scheme implementation called SCM, which itself was initially based on Scheme in One Defun (SIOD), back before the Berlin Wall fell.

written in an ancient language

1987: Perl

1984: C++

1975: Scheme

1972: C

1958: Lisp

1958: Algol

1954: Fortran

1930s: λ-calculus (about 85 years ago!)

But it goes back further! The Scheme language, of which Guile is an implementation, dates from 1975, before I was born; and you can, if you choose, trace the lines back to the lambda calculus, created in the mid-30s as a notation for computation. I suppose at this point I should say mid-1930s, to disambiguate.

The point is, Guile is old! Statistically, most software projects from olden times are now dead. How has Guile managed to survive and (sometimes) thrive? Surely there must be some lesson or other that can be learned here.

ancient & spry

Men make their own history, but they do not make it as they please; they do not make it under self-selected circumstances, but under circumstances existing already, given and transmitted from the past.

The tradition of all dead generations weighs like a nightmare on the brains of the living. [...]

Eighteenth Brumaire of Louis Bonaparte, Marx, 1852

I am no philosopher of history, but I know that there are some ways of looking at the past that do not help me understand things. One is the arrow of enlightened progress, in which events exist in a causal chain, each producing the next. It doesn't help me understand the atmosphere, tensions, and possibilities inherent at any particular point. I find the "progress" theory of history to be an extreme form of selection bias.

Much more helpful to me is the Hegelian notion of dialectics: that at any given point in time there are various tensions at work. In our field, an example could be memory safety versus systems programming. These tensions create an environment that favors actions that lead towards resolution of the tensions. It doesn't mean that there's only one way to resolve the tensions, and it's not an automatic process -- people still have to do things. But the tendency is to ratchet history forward to a new set of tensions.

The history of a project, to me, is then a process of dialectic tensions and resolutions. If the project survives, as Guile has, then it should teach us something about the way this process works in practice.

ancient & spry

Languages evolve; how to remain minimal?

Dialectic opposites

  • world and guile

  • stable and active

  • ...

Lessons learned from inside Hegel’s motor of history

One dialectic is the tension between the world's problems and what tools Guile offers to understand and solve them. In 1993, the web didn't really exist. In 2033, if Guile doesn't run well in a web browser, probably it will be dead. But this process operates very slowly, for an old project; Guile isn't built on CORBA or something ephemeral like that, so we don't have very much data here.

The tension between being a stable base for others to build on, and in being a dynamic project that improves and changes, is a key tension that this talk investigates.

In the specific context of Guile, and for the audience of the FOSDEM minimal languages devroom, we should recognize that for a software project, age and minimalism don't necessarily go together. Software gets features over time and becomes bigger. What does it mean for a minimal language to evolve?

hill-climbing is insufficient

Ex: Guile 1.8; Extend vs Embed

One key lesson that I have learned is that the strategy of making only incremental improvements is a recipe for death, in the long term. The natural result is that you reach what you perceive to be the most optimal state of your project. Any change can only make it worse, so you stop moving.

This is what happened to Guile around version 1.8: we had taken the paradigm of the interpreter as language implementation strategy as far as it could go. There were only around 150 commits to Guile in 2007. We were stuck.

users stay unless pushed away

Inertial factor: interface

  • Source (API)

  • Binary (ABI)

  • Embedding (API)

  • CLI

  • ...

Ex: Python 3; local-eval; R6RS syntax; set!, set-car!

So how do we make change, in such a circumstance? You could start a new project, but then you wouldn't have any users. It would be nice to change and keep your users. Fortunately, it turns out that users don't really go away; yes, they trickle out if you don't do anything, but unless you change in an incompatible way, they stay with you, out of inertia.

Inertia is good and bad. It does conflict with minimalism as a principle; if you were to design Scheme in 2020, you would not include mutable variables or even mutable pairs. But they are still with us because if we removed them, we'd break too many users.

Users can even make you add back things that you had removed. In Guile 2.0, we removed the capability to evaluate an expression at run-time within the lexical environment of an expression, as we didn't know how to implement this outside an interpreter. It turns out this was so important to users that we had to add local-eval back to Guile, later in the 2.0 series. (Fortunately we were able to do it in a way that layered on lower-level facilities; this approach reconciled me to the solution.)

you can’t keep all users

What users say: don’t change or remove existing behavior

But: sometimes losing users is OK. Hard to know when, though

No change at all == death

  • Natural result of hill-climbing

Ex: psyntax; BDW-GC mark & finalize; compile-time; Unicode / locales

Unfortunately, the need to change means that sometimes you will lose users. It's either a dead project, or losing users.

In Guile 1.8, for example, the macro expander ran lazily: it would only expand code the first time it ran it. This was good for start-up time, because not all code is evaluated in the course of a simple script. Lazy expansion allowed us to start doing important work sooner. However, this approach caused immense pain to people that wanted "proper" Scheme macros that preserved lexical scoping; the state of the art was to eagerly expand an entire file. So we switched, and at the same time added a notion of compile-time. This compromise kept good start-up time while allowing fancy macros.

But eager expansion was a change. Users that relied on side effects from macro expansion would see them at compile-time instead of run-time. Users of old "defmacros" that could previously splice in live Scheme closures as literals in expanded source could no longer do that. I think it was the right choice but it did lose some users. In fact I just got another bug report related to this 10-year-old change last week.

every interface is a cost

Guile binary ABI: libguile.so; compiled Scheme files

Make compatibility easier: minimize interface

Ex: scm_sym_unquote, GOOPS, Go, Guix

So if you don't want to lose users, don't change any interface. The easiest way to do this is to minimize your interface surface. In Go, for example, they mostly haven't had dynamic-linking problems because that's not a thing they do: all code is statically linked into binaries. Similarly, Guix doesn't define a stable API, because all of its code is maintained in one "monorepo" that can develop in lock-step.

You always have some interfaces, though. For example, Guix can't change its command-line interface from one day to the next, because users would complain. But it's been surprising to me the extent to which Guile has interfaces that I didn't consider. Recently, for example, in the 3.0 release, we unexported some symbols by mistake. Users complained, so we're putting them back in now.

parallel installs for the win

Highly effective pattern for change

  • libguile-2.0.so

  • libguile-3.0.so

https://ometer.com/parallel.html

Changed ABI is new ABI; it should have a new name

Ex: make-struct/no-tail, GUILE_PKG([2.2]), libtool

So how does one do incompatible change? If "don't" isn't a sufficient answer, then parallel installs is a good strategy. For example in Guile, users don't have to upgrade to 3.0 until they are ready. Guile 2.2 happily installs in parallel with Guile 3.0.

As another small example, there's a function in Guile called make-struct (old doc link), whose first argument is the number of "tail" slots, followed by initializers for all slots (normal and "tail"). This tail feature is weird and I would like to remove it. Unfortunately I can't just remove the argument, so I had to make a new function, make-struct/no-tail, which exists in parallel with the old version that I can't break.

deprecation facilitates migration

__attribute__ ((__deprecated__))
(issue-deprecation-warning
 "(ice-9 mapping) is deprecated."
 "  Use srfi-69 or rnrs hash tables instead.")
scm_c_issue_deprecation_warning
  ("Arbiters are deprecated.  "
   "Use mutexes or atomic variables instead.");

begin-deprecated, SCM_ENABLE_DEPRECATED

Fortunately there is a way to encourage users to migrate from old interfaces to new ones: deprecation. In Guile this applies to all of our interfaces (binary, source, etc). If a feature is marked as deprecated, we cause its use to issue a warning, ideally at compile-time when users responsible for the package can fix it. You can even add __attribute__((__deprecated__)) on C types!

the arch-pattern

Replace, Deprecate, Remove

All change is possible; question is only length of deprecation period

Applies to all interfaces

Guile deprecation period generally one stable series

Ex: scm_t_uint8; make-struct; Foreign objects; uniform vectors

Finally, you end up in a situation where you have replaced the old interface and issued deprecation warnings to help users migrate. The next step is to remove the old interface. If you don't do this, you are failing as a project maintainer -- your project becomes literally unmaintainable as it just grows and grows.

This strategy applies to all changes. The deprecation period may last a while, and it may be that the replacement you built doesn't serve the purpose. There is still a dialog with the users that needs to happen. As an example, I made a replacement for the "SMOB" facility in Guile that allows users to define new types, backed by C interfaces. This new "foreign object" facility might not actually be good enough to replace SMOBs; since I haven't formally deprecated SMOBs, I don't know yet because users are still using the old thing!

change produces a new stable point

Stability within series: only additions

Corollary: dependencies must be at least as stable as you!

  • for your definition of stable

  • social norms help (GNU, semver)

Ex: libtool; unistring; gnulib

In my experience, the old management dictum that "the only constant is change" does not describe software. Guile changes, then it becomes stable for a while. You need an unstable series to escape hill-climbing; then, once you've found your new hill, you start climbing again in the stable series.

Once you reach your stable point, the projects you rely on need to exhibit the same degree of stability that you envision for your project. You can't build a web site that you expect to maintain for 10 years on technology that fundamentally changes every 6 months. But stable dependencies aren't something you can ensure technically; rather, they rely on social norms about who makes the software you use.

who can crank the motor of history?

All libraries define languages

Allow user to evolve the language

  • User functionality: modules (Guix)

  • User syntax: macros (yay Scheme)

Guile 1.8 perf created tension

  • incorporate code into Guile

  • large C interface “for speed”

Compiler removed pressure on C ABI

Empowered users need less from you

A dialectic process does not progress on its own: it requires actions. As a project maintainer, some of my actions are because I want to do them. Others are because users want me to do them. The user-driven actions are generally a burden and as a lazy maintainer, I want to minimize them.

Here I think Guile has to a large degree escaped some of the pressures that weigh on other languages, for example Python. Because Scheme allows users to define language features that exist on par with "built-in" features, users don't need my approval or intervention to add (say) new syntax to the language they work in. Furthermore, their work can still compose with the work of others, even if the others don't buy in to their language extensions.

Still, Guile 1.8 did have a dynamic whereby the relatively poor performance of having to run all code through primitive-eval meant that users were pushed towards writing extensions in C. This in turn pushed Guile to expose all of its guts for access from C, which obviously has led to an overbloated C API and ABI. Happily the work on the Scheme compiler has mostly relieved this pressure, and we may therefore be able to trim the size of the C API and ABI over time.

contributions and risk

From maintenance point of view, all interface is legacy

Guile: Sometimes OK to accept user modules when they are more stable than Guile

In-tree users keep you honest

Ex: SSAX, fibers, SRFI

It can be a good strategy to "sediment" solutions to common use cases into Guile itself. This can improve the minimalism of an entire ecosystem of code. The maintenance burden has to be minimal, however; Guile has sometimes adopted experimental code into its repository, and without active maintenance, it soon becomes stale relative to what users and the module maintainers expect.

I would note an interesting effect: pieces of code that were adopted into Guile become a snapshot of the coding style at that time. It's useful to have some in-tree users because it gives you a better idea about how a project is seen from the outside, from a code perspective.

sticky bits

Memory management is an ongoing thorn

Local maximum: Boehm-Demers-Weiser conservative collector

How to get to precise, generational GC?

Not just Guile; e.g. CPython __del__

There are some points that resist change. The stickiest of these is the representation of heap-allocated Scheme objects in C. Guile currently uses a garbage collector that "automatically" finds all live Scheme values on the C stack and in registers. It was the right choice at the time, given our maintenance budget. But to get the next bump in performance, we need to switch to a generational garbage collector. It's hard to do that without a lot of pain to C users, essentially because the C language is too weak to express the patterns that we would need. I don't know how to proceed.

I would note, though, that memory management is a kind of cross-cutting interface, and that it's not just Guile that's having problems changing; I understand PyPy has had a lot of problems regarding changes on when Python destructors get called due to its switch from reference counting to a proper GC.

future

We are here: stability

And then?

  • Parallel-installability for source languages: #lang

  • Sediment idioms from Racket to evolve Guile user base

Remove myself from “holding the crank”

So where are we going? Nowhere, for the moment; or rather, up the hill. We just released Guile 3.0, so let's just appreciate that for the time being.

But as far as next steps in language evolution, I think in the short term they are essentially to further enable change while further sedimenting good practices into Guile. On the change side, we need parallel installability for entire languages. Racket did a great job facilitating this with #lang and we should just adopt that.

As for sedimentation, we should step back and see whether any common Guile use patterns built by our users should be included in core Guile, and widen our gaze to Racket as well. It will take some effort, both from a technical perspective and in building a social/emotional consensus about how much change is good and how bold versus conservative to be: putting the dialog into dialectic.

dialectic, boogie woogie woogie

https://gnu.org/s/guile

https://wingolog.org/

#guile on freenode

@andywingo

wingo@igalia.com

Happy hacking!

Hey that was the talk! Hope you enjoyed the writeup. Again, video and slides available on the FOSDEM web site. Happy hacking!

by Andy Wingo at February 07, 2020 11:38 AM

January 17, 2020

Iago Toral

Raspberry Pi 4 V3D driver gets OpenGL ES 3.1 conformance

So continuing with the news, here is a fairly recent one: as the title states, I am happy to announce that the Raspberry Pi 4 is now an OpenGL ES 3.1 conformant product! This means that the Mesa V3D driver has successfully passed a whole lot of tests designed to validate the OpenGL ES 3.1 feature set, which should be a good sign of driver quality and correctness.

It should be noted that the Raspberry Pi 4 shipped with a V3D driver exposing OpenGL ES 3.0, so this also means that on top of all the bugfixes that we implemented for conformance, the driver has also gained new functionality! Particularly, we merged Eric’s previous work to enable Compute Shaders.

All this work has been in Mesa master since December (I believe there is only one fix missing waiting for us to address review feedback), and will hopefully make it to Raspberry Pi 4 users soon.

by Iago Toral at January 17, 2020 10:02 AM

Raspberry Pi 4 V3D driver gets Geometry Shaders

I actually landed this in Mesa back in December but never got to announce it anywhere. The implementation passes all the tests available in the Khronos Conformance Tests Suite (CTS). If you give this a try and find any bugs, please report them here with the V3D tag.

This is also the first large feature I land in V3D! Hopefully there will be more coming in the future.

by Iago Toral at January 17, 2020 09:45 AM

I am working on the Raspberry Pi 4 Mesa V3D driver

Yeah… this blog post is well overdue, but better late than never! So yes, I am currently working on progressing the Raspberry Pi 4 Mesa driver stack, together with my Igalian colleagues Piñeiro and Chema, continuing the fantastic work started by Eric Anholt on the Mesa V3D driver.

The Raspberry Pi 4 sports a Video Core VI GPU that is capable of OpenGL ES 3.2, so it is a big update from the Raspberry Pi 3, which could only do OpenGL ES 2.0. Another big change with the Raspberry Pi 4 is that the Mesa v3d driver is the driver used by default with Raspbian. Because both GPUs are quite different, Eric had to write an all new driver for the Raspberry Pi 4, and that is why there are two drivers in Mesa: the VC4 driver is for the Raspberry Pi 3, while the V3D driver targets the Raspberry Pi 4.

As for what we have been working on exactly, I wrote a long post on the Raspberry Pi blog some months ago with a lot of the details, but for those looking for the quick summary:

  • Shader compiler optimizations.
  • Significant Transform Feedback fixes and improvements.
  • Implemented OpenGL Logic Operations.
  • A bunch of bugfixes for Piglit test failures.
  • Set up a Continuous Integration system to identify regressions.
  • Rebased and merged Eric’s work on Compute Shaders.
  • Many bug fixes targeting the Khronos OpenGL ES Conformance Test Suite (CTS).

So that’s it for the late news. I hope to do a better job keeping this blog updated with the news this year, and to start with that I will be writing a couple of additional posts to highlight a few significant development milestones we achieved recently, so stay tuned for more!

by Iago Toral at January 17, 2020 09:31 AM

January 16, 2020

Paulo Matos

Cross-Arch Reproducibility using Containers

I present the use of containers for cross architecture reproducibility using docker and podman, which I then go on to apply to JSC. If you are trying to understand how to create cross-arch reproducible environments for your software, this might help you!

More…

by Paulo Matos at January 16, 2020 04:00 PM

January 08, 2020

Víctor Jáquez

GStreamer-VAAPI 1.16 and libva 2.6 in Debian

Debian has migrated libva 2.6 into testing. This release includes a pull request that changes how the drivers are selected to be loaded and used. As the pull request mentions:

libva will try to load iHD firstly, if it failed. then it will load i965.

Also, Debian testing has imported that iHD driver with two flavors: intel-media-driver and intel-media-driver-non-free. So basically the iHD driver is now the main VAAPI driver for Intel platforms, though it only supports the newer chips; the old ones still require i965-va-driver.

Sadly, for current GStreamer-VAAPI stable, the iHD driver is not included in its driver white list. And this will pose a problem for users that have installed either of the intel-media-driver packages, because, by default, such driver is ignored and the VAAPI GStreamer elements won’t be registered.

There are three temporary workarounds (mutually exclusive) for those users (updated):

  1. Uninstall intel-media-driver* and install (or keep) the old i965-va-driver-shaders/i965-va-driver.
  2. Export, by default in your session, LIBVA_DRIVER_NAME=i965. Normally this is done by adding the export to your $HOME/.profile file. This environment variable will force libva to load the i965 driver.
  3. And finally, export, by default in your sessions, GST_VAAPI_ALL_DRIVERS=1. This is not advised since many applications, such as Epiphany, might fail.

We prefer to not include iHD in the stable white list because most of the work done for that driver has occurred after release 1.16.

In the case of the GStreamer-VAAPI master branch (actively in development), we have merged iHD into the white list, since the Intel team has been working a lot to make it work. It will be released with GStreamer version 1.18.

by vjaquez at January 08, 2020 04:36 PM

Angelos Oikonomopoulos

A Dive Into JavaScriptCore

Recently, the compiler team at Igalia was discussing the available resources for the WebKit project, both for the purpose of onboarding new Igalians and for lowering the bar for third-party contributors. As compiler people, we are mainly concerned with JavaScriptCore (JSC), WebKit’s javascript engine implementation. There are many high quality blog posts on the webkit blog that describe various phases in the evolution of JSC, but finding one’s bearings in the actual source can be a daunting task.

The aim of this post is twofold: first, document some aspects of JavaScriptCore at the source level; second, show how one can figure out what a piece of code actually does in a large and complex source base (which JSC’s certainly is).

In medias res

As an exercise, we’re going to arbitrarily use a commit I had open in a web browser tab. Specifically, we will be looking at this snippet:

Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
	continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}

This seems like a good starting point for taking a dive into the low-level details of JSC internals. Virtual registers look like a concept that’s good to know about. And what are those “locals used for callee saves” anyway? How do locals differ from vars? What are “vars with values”? Let’s find out!

Backstory

Recall that JSC is a multi-tiered execution engine. Most Javascript code is only executed once; compiling takes longer than simply interpreting the code, so Javascript code is always interpreted the first time through. If it turns out that a piece of code is executed frequently though[1], compiling it becomes a more attractive proposition.

Initially, the tier up happens to the baseline JIT, a simple and fast non-optimizing compiler that produces native code for a Javascript function. If the code continues to see much use, it will be recompiled with DFG, an optimizing compiler that is geared towards low compilation times and decent performance of the produced native code. Eventually, the code might end up being compiled with the FTL backend too, but the upper tiers won't be making an appearance in our story here.

What do tier up and tier down mean? In short, tier up is when code execution switches to a more optimized version, whereas tier down is the reverse operation. So the code might tier up from the interpreter to the baseline JIT, but later tier down (under conditions we’ll briefly touch on later) back to the baseline JIT. You can read a more extensive overview here.

Diving in

With this context now in place, we can revisit the snippet above. The code is part of operationOptimize. Just looking at the two sites it’s referenced in, we can see that it’s only ever used if the DFG_JIT option is enabled. This is where the baseline JIT ➞ DFG tier up happens!

The sites that make use of operationOptimize both run during the generation of native code by the baseline JIT. The first one runs in response to the op_enter bytecode opcode, i.e. the opcode that marks entry to the function. The second one runs when encountering an op_loop_hint opcode (an opcode that only appears at the beginning of a basic block marking the entry to a loop). Those are the two kinds of program points at which execution might tier up to the DFG.

Notice that calls to operationOptimize only occur during execution of the native code produced by the baseline JIT. In fact, if you look at the emitted code surrounding the call to operationOptimize for the function entry case, you’ll see that the call is conditional and only happens if the function has been executed enough times that it’s worth making a C++ call to consider it for optimization.

The function accepts two arguments: a vmPointer which is, umm, a pointer to a VM structure (i.e. the “state of the world” as far as this function is concerned) and the bytecodeIndex. Remember that the bytecode is the intermediate representation (IR) that all higher tiers start compiling from. In operationOptimize, the bytecodeIndex is used for

Again, the bytecodeIndex is a parameter that has already been set in stone during generation of the native code by the baseline JIT.

The other parameter, the VM, is used in a number of things. The part that’s relevant to the snippet we started out to understand is that the VM is (sometimes) used to give us access to the current CallFrame. CallFrame inherits from Register, which is a thin wrapper around a (maximally) 64-bit value.

The CodeBlock

In this case, the various accessors defined by CallFrame effectively treat the (pointer) value that CallFrame consists of as a pointer to an array of Register values. Specifically, a set of constant expressions

struct CallFrameSlot {
    static constexpr int codeBlock = CallerFrameAndPC::sizeInRegisters;
    static constexpr int callee = codeBlock + 1;
    static constexpr int argumentCount = callee + 1;
    static constexpr int thisArgument = argumentCount + 1;
    static constexpr int firstArgument = thisArgument + 1;
};

give the offset (relative to the callframe) of the pointer to the codeblock, the callee, the argument count and the this pointer. Note that the first CallFrameSlot is the CallerFrameAndPC, i.e. a pointer to the CallFrame of the caller and the returnPC.

The CodeBlock is definitely something we’ll need to understand better, as it appears in our motivational code snippet. However, it’s a large class that is intertwined with a number of other interesting code paths. For the purposes of this discussion, we need to know that it

  • is associated with a code block (i.e. a function, eval, program or module code block)
  • holds data relevant to tier up/down decisions and operations for the associated code block

We’ll focus on three of its data members:

int m_numCalleeLocals;
int m_numVars;
int m_numParameters;

So, it seems that a CodeBlock can have at least some parameters (makes sense, right?) but also has both variables and callee locals.

First things first: what’s the difference between callee locals and vars? Well, it turns out that m_numCalleeLocals is only incremented in BytecodeGeneratorBase<Traits>::newRegister whereas m_numVars is only incremented in BytecodeGeneratorBase<Traits>::addVar(). Except, addVar calls into newRegister, so vars are a subset of callee locals (and therefore m_numVars ≤ m_numCalleeLocals).

Somewhat surprisingly, newRegister is only called in 3 places:

So there you have it. Callee locals

  1. are allocated by a function called newRegister
  2. are either a var or a temporary.

Let’s start with the second point. What is a var? Well, let’s look at where vars are created (via addVar):

There is definitely a var for every lexical variable (VarKind::Stack), i.e. a non-local variable accessible from the current scope. Vars are also generated (via BytecodeGenerator::createVariable) for

So, intuitively, vars are allocated more or less for “every JS construct that could be called a variable”. Conversely, temporaries are storage locations that have been allocated as part of bytecode generation (i.e. there is no corresponding storage location in the JS source). They can store intermediate calculation results and what not.

Coming back to the first point regarding callee locals, how come they’re allocated by a function called newRegister? Why, because JSC’s bytecode operates on a register VM! The RegisterID returned by newRegister wraps the VirtualRegister that our register VM is all about.

Virtual registers, locals and arguments, oh my!

A virtual register (of type VirtualRegister) consists simply of an int (which is also called its offset). Each virtual register corresponds to one of

There is no differentiation between locals and arguments at the type level (everything is an int); however, virtual registers that map to locals are negative and those that map to arguments are nonnegative. In the context of bytecode generation, the int

It feels like JSC is underusing C++ here.

In all cases, what we get after indexing with a local, argument or constant is a RegisterID. As explained, the RegisterID wraps a VirtualRegister. Why do we need this indirection?

Well, there are two extra bits of info in the RegisterID. The m_refcount and an m_isTemporary flag. The reference count is always greater than zero for a variable, but the rules under which a RegisterID is ref’d and unref’d are too complicated to go into here.

When you have an argument, you get the VirtualRegister for it by directly adding it to CallFrame::thisArgumentOffset().

When you have a local, you map it to (-1 - local) to get the corresponding VirtualRegister. So

local vreg
0 -1
1 -2
2 -3

(remember, virtual registers that correspond to locals are negative).

For an argument, you map it to (arg + CallFrame::thisArgumentOffset()):

argument vreg
0 this
1 this + 1
2 this + 2

Which makes all the sense in the world when you remember what the CallFrameSlot looks like. So argument 0 is always the `this` pointer.

If the vreg is greater than some large offset (s_firstConstantRegisterIndex), then it is an index into the CodeBlock's constant pool (after subtracting the offset).
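
Putting those ranges together, a raw operand can be decoded roughly like the following sketch. This is not JSC code; the concrete offset values are stand-ins for CallFrame::thisArgumentOffset() and s_firstConstantRegisterIndex, whose real values come from JSC's headers:

THIS_ARGUMENT_OFFSET = 4           # assumed for illustration
FIRST_CONSTANT_INDEX = 0x40000000  # assumed for illustration

def classify_operand(operand):
    if operand < 0:
        return ("local", -1 - operand)                        # local n <-> offset -1 - n
    if operand >= FIRST_CONSTANT_INDEX:
        return ("constant", operand - FIRST_CONSTANT_INDEX)   # index into the constant pool
    return ("argument", operand - THIS_ARGUMENT_OFFSET)       # argument 0 is `this`

print(classify_operand(-3))          # ('local', 2)
print(classify_operand(4))           # ('argument', 0), i.e. `this`
print(classify_operand(0x40000001))  # ('constant', 1)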

Bytecode operands

If you’ve followed any of the links to the functions doing the actual mapping of locals and arguments to a virtual register, you may have noticed that the functions are called localToOperand and argumentToOperand. Yet they’re only ever used in virtualRegisterForLocal and virtualRegisterForArgument respectively. This raises the obvious question: what are those virtual registers operands of?

Well, of the bytecode instructions in our register VM of course. Instead of recreating the pictures, I’ll simply encourage you to take a look at a recent blog post describing it at a high level.

How do we know that’s what “operand” refers to? Well, let’s look at a use of virtualRegisterForLocal in the bytecode generator. BytecodeGenerator::createVariable will allocate2 the next available local index (using the size of m_calleeLocals to keep track of it). This calls into virtualRegisterForLocal, which maps the local to a virtual register by calling localToOperand.

The newly allocated local is inserted into the function symbol table, along with its offset (i.e. the ID of the virtual register).

The SymbolTableEntry is looked up when we generate bytecode for a variable reference. A variable reference is represented by a ResolveNode[3].

So looking into ResolveNode::emitBytecode, we dive into BytecodeGenerator::variable and there’s our symbolTable->get() call. And then the symbolTableEntry is passed to BytecodeGenerator::variableForLocalEntry which uses entry.varOffset() to initialize the returned Variable with offset. It also uses registerFor to retrieve the RegisterID from m_calleeLocals.

ResolveNode::emitBytecode will then pass the local RegisterID to move which calls into emitMove, which just calls OpMov::emit (a function generated by the JavaScriptCore/generator code). Note that the compiler implicitly converts the RegisterID arguments to VirtualRegister type at this step. Eventually, we end up in the (generated) function

template<OpcodeSize __size, bool recordOpcode, typename BytecodeGenerator>
static bool emitImpl(BytecodeGenerator* gen, VirtualRegister dst, VirtualRegister src)
{
    if (__size == OpcodeSize::Wide16)
	gen->alignWideOpcode16();
    else if (__size == OpcodeSize::Wide32)
	gen->alignWideOpcode32();
    if (checkImpl<__size>(gen, dst, src)) {
	if (recordOpcode)
	    gen->recordOpcode(opcodeID);
	if (__size == OpcodeSize::Wide16)
	    gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide16));
	else if (__size == OpcodeSize::Wide32)
	    gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide32));
	gen->write(Fits<OpcodeID, __size>::convert(opcodeID));
	gen->write(Fits<VirtualRegister, __size>::convert(dst));
	gen->write(Fits<VirtualRegister, __size>::convert(src));
	return true;
    }
    return false;
}

where Fits::convert(VirtualRegister) will trivially encode the VirtualRegister into the target type. Specifically the mapping is nicely summed up in the following comment

// Narrow:
// -128..-1  local variables
//    0..15  arguments
//   16..127 constants
//
// Wide16:
// -2**15..-1  local variables
//      0..64  arguments
//     64..2**15-1 constants

You may have noticed that the Variable returned by BytecodeGenerator::variableForLocalEntry already has been initialized with the virtual register offset we set when inserting the SymbolTableEntry for the local variable. And yet we use registerFor to look up the RegisterID for the local and then use the offset of the VirtualRegister contained therein. Surely those are the same? Oh well, something for a runtime assert to check.

Variables with values

Whew! Quite the detour there. Time to get back to our original snippet:

Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
	continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}

What are those numVarsWithValues then? Well, the definition is right before our snippet:

unsigned numVarsWithValues;
if (bytecodeIndex)
    numVarsWithValues = codeBlock->numCalleeLocals();
else
    numVarsWithValues = 0;

OK, so this looks straightforward for a change. If the bytecodeIndex is not zero, we’re doing the tier up from JIT to DFG in the body of a function (i.e. at a loop entry). In that case, we consider all our callee locals to have values. Conversely, when we’re running for the function entry (i.e. bytecodeIndex == 0), none of the callee locals are live yet. Do note that the variable is incorrectly named. Vars are not the same as callee locals; we’re dealing with the latter here.

A second gotcha is that, whereas vars are always live, temporaries might not be. The DFG compiler will do liveness analysis at compile time to make sure it’s only looking at live values. That must have been a fun bug to track down!

Values that must be handled

Back to our snippet, numVarsWithValues is used as an argument to the constructor of mustHandleValues which is of type Operands<Optional<JSValue>>. Right, so what are the Operands? They simply hold a number of T objects (here T is Optional<JSValue>) of which the first m_numArguments correspond to, well, arguments whereas the remaining correspond to locals.

What we’re doing here is recording all the live (non-heap, obviously) values when we try to do the tier up. The idea is to be able to mix those values in with the previously observed values that DFG’s Control Flow Analysis will use to emit code which will bail us out of the optimized version (i.e. do a tier down). According to the comments and commit logs, this is in order to increase the chances of a successful OSR entry (tier up), even if the resulting optimized code may be slightly less conservative.

Remember that the optimized code that we tier up to makes assumptions with regard to the types of the incoming values (based on what we’ve observed when executing at lower tiers) and will bail out if those assumptions are not met. Taking the values of the current execution at the time of the tier up attempt ensures we won’t be doing all this work only to immediately have to tier down again.

Operands provides an operandForIndex method which will directly give you a virtual reg for every kind of element. For example, if you had called Operands<T> opnds(2, 1), then the first iteration of the loop would give you

operandForIndex(0)
-> virtualRegisterForArgument(0).offset()
  -> VirtualRegister(argumentToOperand(0)).offset()
    -> VirtualRegister(CallFrame::thisArgumentOffset).offset()
      -> CallFrame::thisArgumentOffset

The second iteration would similarly give you CallFrame::thisArgumentOffset + 1.

In the third iteration, we’re now dealing with a local, so we’d get

operandForIndex(2)
-> virtualRegisterForLocal(2 - 2).offset()
  -> VirtualRegister(localToOperand(0)).offset()
    -> VirtualRegister(-1).offset()
      -> -1
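
In other words, operandForIndex behaves roughly like the following sketch -- a hypothetical model rather than JSC's implementation, with an assumed value standing in for CallFrame::thisArgumentOffset():

THIS_ARGUMENT_OFFSET = 4  # assumed stand-in for CallFrame::thisArgumentOffset()

def operand_for_index(num_arguments, index):
    # The first num_arguments indices are arguments, the rest are locals.
    if index < num_arguments:
        return THIS_ARGUMENT_OFFSET + index   # argumentToOperand(index)
    return -1 - (index - num_arguments)       # localToOperand(index - num_arguments)

# Operands<T> opnds(2, 1): two arguments followed by one local.
print([operand_for_index(2, i) for i in range(3)])
# [4, 5, -1], i.e. thisArgumentOffset, thisArgumentOffset + 1, local 0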

Callee save space as virtual registers

So, finally, what is our snippet doing here? It’s iterating over the values that are likely to be live at this program point and storing them in mustHandleValues. It will first iterate over the arguments (if any) and then over the locals. However, it will use the “operand” (remember, everything is an int…) to get the index of the respective local and then skip the first locals up to localsUsedForCalleeSaves. So, in fact, even though we allocated space for (arguments + callee locals), we skip some slots and only store (arguments + callee locals - localsUsedForCalleeSaves). This is OK, as the Optional<JSValue> values in the Operands will have been initialized by the default constructor of Optional<> which gives us an object without a value (i.e. an object that will later be ignored).

Here, callee-saved register (csr) refers to a register that is available for use to the LLInt and/or the baseline JIT. This is described a bit in LowLevelInterpreter.asm, but is more apparent when one looks at what csr sets are used on each platform (or, in C++).

platform         metadataTable   PC-base (PB)   numberTag   notCellMask
X86_64           csr1            csr2           csr3        csr4
x86_64_win       csr3            csr4           csr5        csr6
ARM64 / ARM64E   csr6            csr7           csr8        csr9
C_LOOP 64b       csr0            csr1           csr2        csr3
C_LOOP 32b       csr3            -               -          -
ARMv7            csr0            -               -          -
MIPS             csr0            -               -          -
X86              -               -               -          -

On 64-bit platforms, offlineasm (JSC’s portable assembler) makes a range of callee-saved registers available to .asm files. Those are properly saved and restored. For example, for X86_64 on non-Windows platforms, the returned RegisterSet contains registers r12-r15 (inclusive), i.e. the callee-saved registers as defined in the System V AMD64 ABI. The mapping from symbolic names to architecture registers can be found in GPRInfo.

On 32-bit platforms, the assembler doesn’t make any csr regs available, so there’s nothing to save except if the platform makes special use of some register (like C_LOOP does for the metadataTable [4]).

What are the numberTag and notCellMask registers? Out of scope, that’s what they are!

Conclusion

Well, that wraps it up. Hopefully now you have a better understanding of what the original snippet does. In the process, we learned about a few concepts by reading through the source and, importantly, we added lots of links to JSC’s source code. This way, not only can you check that the textual explanations are still valid when you read this blog post, you can use the links as spring boards for further source code exploration to your heart’s delight!

Footnotes

[1] Both the interpreter – better known as LLInt – and the baseline JIT keep track of execution statistics, so that JSC can make informed decisions on when to tier up.

[2] Remarkably, no RegisterID has been allocated at this point – we used the size of m_calleeLocals but never modified it. Instead, later in the function (after adding the new local to the symbol table!) the code will call addVar which will allocate a new “anonymous” local. But then the code asserts that the index of the newly allocated local (i.e. the offset of the virtual register it contains) is the same as the offset we previously used to create the virtual register, so it’s all good.

[3] How did we know to look for the ResolveNode? Well, the emitBytecode method needs to be implemented by subclasses of ExpressionNode. If we look at how a simple binary expression is parsed (and given that ASTBuilder defines BinaryOperand as std::pair<ExpressionNode*, BinaryOpInfo>), it’s clear that any variable reference has already been lifted to an ExpressionNode.

So instead, we take the bottom up approach. We find the lexer/parser token definitions, one of which is the IDENT token. Then it’s simply a matter of going over its uses in Parser.cpp, until we find our smoking gun. This gets us into createResolve aaaaand

return new (m_parserArena) ResolveNode(location, ident, start);

That’s the node we’re looking for!

[4] C_LOOP is a special backend for JSC’s portable assembler. What is special about it is that it generates C++ code, so that it can be used on otherwise unsupported architectures. Remember that the portable assembler (offlineasm) runs at compilation time.

January 08, 2020 12:00 PM

January 01, 2020

Brian Kardell

Unlocking Colors

Despite the fact that I draw and paint, and am involved with CSS, I am actually generally not good at design. Color, however, is endlessly fascinating. As there have been many color-oriented proposals for CSS, to enable people who are good at design, I thought it might be nice to write about an important, but underdiscussed bit that's necessary in order to really unlock a lot - and how we can help.

First... What even is color?

When you look at real physical things, what you perceive as color is a reflection of which wavelengths of light bounce back at you without being 'swallowed up' by the pigment. Pigments are subtractive, light is additive. That is, if you combine all of the colors of light, you get a pure white - and if you combine all of the pigments you get a pure black.

There is 'real information' that exists in the universe, but there are also real constraints when talking about color: If the light shining on that pigment doesn't contain all of the possible wavelengths, that color will look different, for example. Or, if your monitor can't physically create true pure white or pure black, things are a little skewed.

When we're doing things digitally there's another important constraint: we also have to store colors in some number of bits and some format, express them somehow, and be able to reason about them. Storing 'really accurate true values' would require a lot of bits, and machines (at various points in time) weren't capable of creating, managing or displaying that - either efficiently, or at all. As a result, over the years we've created a lot of interesting tradeoffs with regard to color.

Color spaces

If you can imagine a true rainbow, with all of the real possible colors in the universe, the set of those you can actually represent (and how you move about in it) is called a 'color space'. The limits of a color space are called its gamut. Because your monitor works with light, it is an additive process and 'Red', 'Green' and 'Blue' combine to create white (once upon a time, with actual CRTs). Along the way, we created the sRGB color space. The sRGB color space is a limited section of the rainbow, which is kind of narrow compared to all of the possible colors that exist in nature, but is very optimized for machines and common cases. Below is an example from Wikipedia illustrating the sRGB color space. The colors within the triangle are part of the sRGB color space.

sRGB color space, visualized

But, the point is that these tradeoffs aren't always desirable.

Today, for example, we have machines capable of recording or displaying things with a much wider gamut -- the stuff outside the triangle. Your monitor does, and if you have a TV capable of HDR (high dynamic range) that's got a wider gamut, for example... And then, there's the math…

Humans vs machines

It's more than just efficiency. sRGB, as I said, is really built for machines - and humans and machines are quite different. If I showed you a color and asked you to guess its value in RGB, mapped that to sRGB space and showed you the result on your monitor, the chances that you could get it very close are pretty low in general. You can test this theory with hex-guess.glitch.me. Further, the odds aren't great that you could efficiently zero in on closer and closer guesses, simply because it's a little unnatural. This is why we have created things like hsla, which some people find easier to reason about: 'more saturated' and 'lighter' feel natural, and a circle of color is potentially a little easier to think about. But it's not just that one uses RGB either; it's that the color space itself has weird characteristics.

This plays in especially if we want to reason about things mathematically - for example - in creating design systems.

Our design systems, ideally, want to reason about colors by 'moving through the color space'. Doing things like mixing colors, lightening, darkening, can be done well only if they include a sense of how our eyes really work rather than how machines like to think about storing and displaying. That is, if I move this one aspect just a little bit, I expect it to have the same effect no matter what the color I start with is... But in sRGB space, that really isn't true at all. The sRGB space is not perceptually uniform. The same mathematical movement has different degrees of perceived effect depending on where you are at in the color space. If you want to read a designer's experience with this, here's an interesting example which does a good job struggling to do well.

Some examples…

This is, of course, very hard to understand if you don't live in this space, because we're just not used to it. It doesn't sound intuitive, or maybe it sounds unnecessarily complex - but it really isn't. It's kind of simple: the RGB color spaces aren't built for humans to reason about mathematically; really, that's it. Here are some examples where it is easy to see in action...

See the Pen NWWZPdL by Brian Kardell (@bkardell) on CodePen.

Recently, my friend and co-author of the CSS Color specification, Adam Argyle, also ran a Twitter poll asking the question "Which of these is lighter?". Polls are tricky for stuff like this because people "assume" it's a trick question somehow. Nevertheless, there was overwhelming agreement that #2 was lighter… Because, well, it is.

There's another great illustration of this in that earlier article I mentioned:

The perceived brightness of all of the hues in a spectrum with the same saturation and lightness... It's quite clear they're different.
This matters a lot...

This matters a lot, for example, if you wanted to create a design system which mathematically reasons about color contrast (just like the author of that article).
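As an illustration of the kind of math such a system ends up automating, here's a tiny sketch of the WCAG 2.x contrast-ratio formula (the formula is standard; the example colors are just for illustration):

    def srgb_luminance(r, g, b):
        # WCAG relative luminance; channels in [0, 1].
        lin = lambda c: c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        r, g, b = lin(r), lin(g), lin(b)
        return 0.2126 * r + 0.7152 * g + 0.0722 * b

    def contrast_ratio(c1, c2):
        l1, l2 = srgb_luminance(*c1), srgb_luminance(*c2)
        lighter, darker = max(l1, l2), min(l1, l2)
        return (lighter + 0.05) / (darker + 0.05)

    white, blue, yellow = (1, 1, 1), (0, 0, 1), (1, 1, 0)
    print(contrast_ratio(white, blue))    # ~8.6:1  - passes WCAG AA for normal text
    print(contrast_ratio(white, yellow))  # ~1.07:1 - fails badly, despite equal HSL lightness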

There are color spaces like Lab and LCH which deal with the full spectrum and have qualities like perceptual uniformity. Thus, if we want great color functions for use in real design systems, everyone seems to agree that support for doing said math in the Lab/LCH color spaces is the ideal enabling feature. It's not exactly a prerequisite to doing useful things - in fact, it seems a lot of designers aren't even aware of this quality because we've lived so long with the sRGB space. However, realistically, we get the most value if we invest in support for these two color spaces first. They are also well defined, don't require a lot of debate and are non-controversial. Having them would make everything else so much better, because they give us the right tools to help the actual humans.
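To give a sense of what 'doing said math' involves, here's a rough, illustrative sketch of the sRGB to CIELAB to LCH conversion (standard D65 math, simplified; not how a browser would actually implement lab()/lch()):

    import math

    def srgb_to_lch(r, g, b):
        # 1. Undo the sRGB transfer function ("gamma"); channels in [0, 1].
        lin = lambda c: c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        r, g, b = lin(r), lin(g), lin(b)
        # 2. Linear sRGB -> CIE XYZ (D65).
        x = 0.4124 * r + 0.3576 * g + 0.1805 * b
        y = 0.2126 * r + 0.7152 * g + 0.0722 * b
        z = 0.0193 * r + 0.1192 * g + 0.9505 * b
        # 3. XYZ -> CIELAB, relative to the D65 white point.
        xn, yn, zn = 0.95047, 1.0, 1.08883
        f = lambda t: t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
        fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
        L = 116 * fy - 16          # perceptual lightness
        a = 500 * (fx - fy)
        b2 = 200 * (fy - fz)
        # 4. Lab -> LCH: the same space in cylindrical (Lightness, Chroma, Hue) form.
        return L, math.hypot(a, b2), math.degrees(math.atan2(b2, a)) % 360

    print(srgb_to_lch(0, 0, 1))  # blue:   L ~ 32
    print(srgb_to_lch(1, 1, 0))  # yellow: L ~ 97

In LCH, 'lighten a little' is just 'add a few units to L', and the perceived step is roughly the same wherever you start - exactly the property a design system wants.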

Recently, Adam opened this issue in Chromium. (Update: since this post, Simon Fraser has also opened this issue in WebKit.)

Let's get this done.

To recap... We have two CSS color related functions which:

  • Are not controversial
  • Are quite mature and widely used color standards
  • Are important for accessibility
  • Are important for design systems
  • Make all of the other design system related work much more attractive
  • Are actually some of the easiest things we could do in CSS
  • Just aren't getting done

Chances are pretty good, I think, that as a designer or developer reading this, you'll find that last bullet irritating. You might be asking yourself "But why isn't it getting done? Wouldn't that help so much?". The answer is neither as offensive nor as complex as you might imagine; it's kind of simple really: yes, it would - but there's just so much in the pipeline already. In the end, everyone (including browser vendors) has to prioritize the work in the queue against available resources and skills. There are only so many people available with the particular skills to work on this, and there are lots of important or interesting things competing for their prioritization.

I recently wrote a post about this problem and how Igalia enables us to move things like this. We have shown that we can get big things done - like CSS Grid. This is a small thing in terms of effort and cost, with an outsized impact. We don't need to wait and ask the browsers to move this up in their priority queues; we just need someone to see the value in funding the work - and it's not huge. We can get this done.

So, if your organization is interested in design systems, or how you can help move important standards work, this is a great entry point for discussion and I'd love to talk with you. My Twitter dms are open as well.

January 01, 2020 05:00 AM

December 23, 2019

Mario Sanchez Prada

End of the year Update: 2019 edition

It’s the end of December and it seems that yet another year has gone by, so I figured that I’d write an EOY update to summarize my main work at Igalia as part of our Chromium team, as my humble attempt to make up for the lack of posts in this blog during this year.

I did quite a few things this year, but for the purpose of this blog post I’ll focus on what I consider the most relevant ones: work on the Servicification and the Blink Onion Soup projects, the migration to the new Mojo APIs and the BrowserInterfaceBroker, as well as a summary of the conferences I attended, both as a regular attendee and as a speaker.

But enough of an introduction, let’s dive now into the gory details…

Servicification: migration to the Identity service

As explained in my previous post from January, I started this year working on the Chromium Servicification (s13n) project. More specifically, I joined my teammates in helping with the migration to the Identity service, updating consumers of several classes from the sign-in component to ensure they now use the new IdentityManager API instead of directly accessing other, lower-level APIs.

This was important because at some point the Identity service will run in a separate process, and a precondition for that to happen is that all access to sign-in related functionality goes through the IdentityManager, so that other processes can communicate with it directly via Mojo interfaces exposed by the Identity service.

I’ve already talked long enough in my previous post, so please take a look in there if you want to know more details on what that work was exactly about.

The Blink Onion Soup project

Interestingly enough, a bit after finishing up working on the Identity service, our team dived deep into helping with another Chromium project that shared at least one of the goals of the s13n project: to improve the health of Chromium’s massive codebase. The project is code-named Blink Onion Soup and its main goal is, as described in the original design document from 2015, to “simplify the codebase, empower developers to implement features that run faster, and remove hurdles for developers interfacing with the rest of the Chromium”. There’s also a nice slide deck from 2016’s BlinkOn 6 that explains the idea in a more visual way, if you’re interested.


“Layers”, by Robert Couse-Baker (CC BY 2.0)

In a nutshell, the main idea is to simplify the codebase by removing/reducing the several layers of indirection located between Chromium and Blink that were necessary back in the day, before Blink was forked out of WebKit, to support different embedders with their particular needs (e.g. Epiphany, Chromium, Safari…). Those layers made sense back then, but these days Blink’s only embedder is Chromium’s content module, which is the module that Chrome and other Chromium-based browsers embed to leverage Chromium’s implementation of the Web Platform, and also where the multi-process and sandboxing architecture is implemented.

And in order to implement the multi-process model, the content module is split in two main parts running in separate processes, which communicate among each other over IPC mechanisms: //content/browser, which represents the “browser process” that you embed in your application via the Content API, and //content/renderer, which represents the “renderer process” that internally runs the web engine’s logic, that is, Blink.

With this in mind, the initial version of the Blink Onion Soup project (aka “Onion Soup 1.0”) was born about 4 years ago, and the folks spearheading this proposal started working on a 3-step plan to implement their vision, which can be summarized as follows:

  1. Migrate usage of Chromium’s legacy IPC to the new IPC mechanism called Mojo.
  2. Move as much functionality as possible from //content/renderer down into Blink itself.
  3. Slim down Blink’s public APIs by removing classes/enums unused outside of Blink.

Three clear steps, but definitely not easy ones as you can imagine. First of all, if we were to remove levels of indirection between //content/renderer and Blink as well as to slim down Blink’s public APIs as much as possible, a precondition for that would be to allow direct communication between the browser process and Blink itself, right?

In other words, if you need your browser process to communicate with Blink for some specific purpose (e.g. reacting in a visual way to a Push Notification), it would certainly be sub-optimal to have something like this:

…and yet that is what would happen if we kept using Chromium’s legacy IPC which, unlike Mojo, doesn’t allow us to communicate with Blink directly from //content/browser, meaning that we’d need to go first through //content/renderer and then navigate through different layers to move between there and Blink itself.

In contrast, using Mojo would allow us to have Blink implement those remote services internally and then publicly declare the relevant Mojo interfaces so that other processes can interact with them without going through extra layers. Thus, doing that kind of migration would ultimately allow us to end up with something like this:

…which looks nicer indeed, since now it is possible to communicate directly with Blink, where the remote service would be implemented (either in its core or in a module). Besides, it would no longer be necessary to consume Blink’s public API from //content/renderer, nor the other way around, enabling us to remove some code.

However, we can’t simply ignore some stuff that lives in //content/renderer implementing part of the original logic so, before we can get to the lovely simplification shown above, we would likely need to move some logic from //content/renderer right into Blink, which is what the second bullet point of the list above is about. Unfortunately, this is not always possible but, whenever it is an option, the job here would be to figure out what of that logic in //content/renderer is really needed and then figure out how to move it into Blink, likely removing some code along the way.

This particular step is what we commonly call “Onion Soup’ing” //content/renderer/<feature> (not entirely sure “Onion Soup” is a verb in English, though…), and this is, for instance, how things looked before (left) and after (right) Onion Soup’ing a feature I worked on myself: Chromium’s implementation of the Push API:


Onion Soup’ing //content/renderer/push_messaging

Note how the whole design got quite simplified moving from the left to the right side? Well, that’s because some abstract classes declared in Blink’s public API and implemented in //content/renderer (e.g. WebPushProvider, WebPushMessagingClient) are no longer needed now that those implementations got moved into Blink (i.e. PushProvider and PushMessagingClient), meaning that we can now finally remove them.

Of course, there were also cases where we found some public APIs in Blink that were not used anywhere, as well as cases where they were only being used inside of Blink itself, perhaps because nobody noticed when that happened at some point in the past due to some other refactoring. In those cases the task was easier, as we would just remove them from the public API, if completely unused, or move them into Blink if still needed there, so that they are no longer exposed to a content module that no longer cares about that.

Now, trying to provide a high-level overview of what our team “Onion Soup’ed” this year, I think I can say with confidence that we migrated (or helped migrate) more than 10 different modules like the one I mentioned above, such as android/, appcache/, media/stream/, media/webrtc, push_messaging/ and webdatabase/, among others. You can see the full list with all the modules migrated during the lifetime of this project in the spreadsheet tracking the Onion Soup efforts.

In my particular case, I “Onion Soup’ed” the PushMessaging, WebDatabase and SurroundingText features, which was a fairly complete exercise as it involved working on all 3 bullet points: migrating to Mojo, moving logic from //content/renderer to Blink and removing unused classes from Blink’s public API.

And as for slimming down Blink’s public API, I can tell that we helped get to a point where more than 125 classes/enums were removed from Blink’s public APIs, simplifying and reducing the Chromium codebase along the way, as you can check in this other spreadsheet that tracked that particular piece of work.

But we’re not done yet! While overall progress for the Onion Soup 1.0 project is around 90% right now, there are still a few more modules that require “Onion Soup’ing”, among which we’ll be tackling media/ (already WIP) and accessibility/ (starting in 2020), so there’s quite a bit more work to be done in that regard.

Also, there is a newer design document for the so-called Onion Soup 2.0 project that contains some tasks that we have already been working on for a while, such as “Finish Onion Soup 1.0”, “Slim down Blink public APIs”, “Switch Mojo to new syntax” and “Convert legacy IPC in //content to Mojo”, so definitely not done yet. Good news here, though: some of those tasks are already quite advanced, and in the particular case of the migration to the new Mojo syntax it’s nearly done by now, which is precisely what I’ll talk about next…

Migration to the new Mojo APIs and the BrowserInterfaceBroker

Along with working on “Onion Soup’ing” some features, a big chunk of my time this year went also into this other task from the Onion Soup 2.0 project, where I was lucky enough again not to be alone, but accompanied by several of my team mates from Igalia‘s Chromium team.

This was a massive task where we worked hard to migrate all of Chromium’s codebase to the new Mojo APIs that were introduced a few months back, with the idea of getting Blink updated first and then having everything else migrated by the end of the year.


Progress of migrations to the new Mojo syntax: June 1st – Dec 23rd, 2019

But first things first: you might be wondering what was wrong with the “old” Mojo APIs since, after all, Mojo is the new thing we were migrating to from Chromium’s legacy API, right?

Well, as it turns out, the previous APIs had a few problems that were causing some confusion due to not providing the most intuitive type names (e.g. what is an InterfacePtrInfo anyway?), as well as being quite error-prone, since the old types were not as strict as the new ones in enforcing conditions that should not happen (e.g. trying to bind an already-bound endpoint shouldn’t be allowed). In the Mojo Bindings Conversion Cheatsheet you can find an exhaustive list of cases that needed to be considered, in case you want to know more details about this type of migration.

Now, as a consequence of this additional complexity, the task wouldn’t be as simple as a “search & replace” operation because, while moving from old to new code, it would often be necessary to fix situations where the old code was working fine just because it was relying on some constraints not being checked. And if you top that off with the fact that there were, literally, thousands of lines in the Chromium codebase using the old types, then you’ll see why this was a massive task to take on.

Fortunately, after a few months of hard work done by our Chromium team, we can proudly say that we have nearly finished this task, which involved more than 1100 patches landed upstream after combining the patches that migrated the types inside Blink (see bug 978694) with those that tackled the rest of the Chromium repository (see bug 955171).

And by “nearly finished” I mean an overall progress of 99.21% according to the Migration to new mojo types spreadsheet where we track this effort, where Blink and //content have been fully migrated, and all the other directories, aggregated together, are at 98.64%, not bad!

In this regard, I’ve also been sending a bi-weekly status report mail to the chromium-mojo and platform-architecture-dev mailing lists for a while (see the latest report here), so make sure to subscribe there if you’re interested, even though those reports might not last much longer!

Now, back with our feet on the ground, the main roadblock at the moment preventing us from reaching 100% is //components/arc, whose migration needs to be agreed upon with the folks maintaining a copy of Chromium’s ARC mojo files for Android and ChromeOS. This is currently under discussion (see the chromium-mojo ML and bug 1035484), so I’m hopeful it’s something we’ll be able to achieve early next year.

Finally, and still related to these Mojo migrations, my colleague Shin and I took a “little detour” while working on this migration and focused for a while on the more specific task of migrating uses of Chromium’s InterfaceProvider to the new BrowserInterfaceBroker class. And while this was not as massive a task as the other migration, it was also very important because, besides fixing some problems inherent to the old InterfaceProvider API, it was also blocking the migration to the new Mojo types, as InterfaceProvider usually relied on the old types!


Architecture of the BrowserInterfaceBroker

Good news here as well, though: after having the two of us working on this task for a few weeks, we can proudly say that, today, we have finished all the 132 migrations that were needed and are now in the process of doing some after-the-job cleanup operations that will remove even more code from the repository! \o/

Attendance to conferences

This year was particularly busy for me in terms of conferences, as I did travel to a few events both as an attendee and a speaker. So, here’s a summary about that as well:

As usual, I started the year by attending one of my favourite conferences, FOSDEM 2019 in Brussels. And even though I didn’t have any talk to present there, I did enjoy my visit like every year. Being able to meet so many people and to attend such an impressive number of interesting talks over the weekend, while having some beers and chocolate, is always great!

Next stop was Toronto, Canada, where I attended BlinkOn 10 on April 9th & 10th. I was honoured to have a chance to present a summary of the contributions that Igalia made to the Chromium Open Source project in the 12 months before the event, which was a rewarding experience but also quite an intense one, because it was a lightning talk and I had to go through all the ~10 slides in a bit under 3 minutes! Slides are here and there is also a video of the talk, in case you want to check how crazy that was.

Took a bit of a rest from conferences over the summer and then attended, also as usual, the Web Engines Hackfest that we at Igalia have been organising every single year since 2009. Didn’t have a presentation this time, but still it was a blast to attend it once again as an Igalian and celebrate the hackfest’s 10th anniversary sharing knowledge and experiences with the people who attended this year’s edition.

Finally, I attended two conferences in the Bay Area in mid November: the first was the Chrome Dev Summit 2019 in San Francisco on Nov 11-12, and the second was BlinkOn 11 in Sunnyvale on Nov 14-15. It was my first time at the Chrome Dev Summit and I have to say I was fairly impressed by the event, how it was organised and the quality of the talks there. It was also great for me, as a browser developer, to see first hand what things web developers are more & less excited about, what’s coming next… and to get to meet people I would never have had a chance to meet at other events.

As for BlinkOn 11, I presented a 30 min talk about our work on the Onion Soup project, the Mojo migrations and improving Chromium’s code health in general, along with my colleague Antonio Gomes. It was basically an “extended” version of this post, where we went not only through the tasks I was personally involved with, but also talked about tasks that other members of our team worked on during this year, which include many other things! Feel free to check out the slides here, as well as the video of the talk.

Wrapping Up

As you might have guessed, 2019 has been a pretty exciting and busy year for me work-wise, but the most interesting bit in my opinion is that what I mentioned here was just the tip of the iceberg… many other things happened in the personal side of things, starting with the fact that this was the year that we consolidated our return to Spain after 6 years living abroad, for instance.

Also, getting back to work-related stuff, this year I was also accepted back into Igalia‘s Assembly after having re-joined this amazing company in September 2018, following a 6-year “gap” living and working in the UK. Besides being something I was very excited and happy about, this also brought some more responsibilities onto my plate, as is natural.

Last, I can’t finish this post without being explicitly grateful for all the people I got to interact with during this year, both at work and outside, who made my life easier and nicer on so many different levels. To all of you, cheers!

And to everyone else reading this… happy holidays and happy new year in advance!

by mario at December 23, 2019 11:13 PM

December 16, 2019

Brian Kardell

2019, That's a Wrap.


As 2019 draws to a close and we look toward a new year (a new decade, in fact), Igalians like to take a moment and look back on the year. As it is my first year at Igalia, I am substantially impressed by the long list of things we accomplished. I thought it was worth, then, highlighting some of that work and putting it into context: what we did, how, and why I think this work is important to the open web platform and to us all…

All the Browsers…

All of the browser projects are open source now and we are active and trusted committers to all of them. That's important because browser diversity matters and we're able to jump in and lend a hand wherever necessary. The rate at which we contribute to any browser can vary from year to year for various reasons (many explained in this post), but I'd like to illustrate what that actually looked like in 2019. Let's look at the commits for 2019…

Commits are an imperfect kind of measure. Not all commits are of the same value (in fact, value is hard to qualify here), and they don't account for the potentially lots and lots of downstream work that went into getting one upstream commit. Even figuring out how to count them can be tricky. Looking at them purely as a vague relative scale of "how much did we do", though, they're one useful illustration. That's how I'd like you to view these statistics - as being about our slice of the commit pie, and not anyone else's.

Chromium

During BlinkOn this year, you might have seen a number of mentions about our contributions to Chromium in 2019: We made the second most commits after Google this year.

Just one of the slides shared from Google’s presentation at blinkon 2019 showing 1782 commits from Igalia at that point in 2019…

Chromium, if you're not aware, is a very large and busy project with lots of big company contributors, so it's quite exciting to show the level at which we are contributing here. (Speaking of imperfections in this measure - all of our MathML work was done downstream and thus commits toward this very significant project aren't even counted here!)

Mozilla

This year, our contributions to Mozilla projects were relatively small by comparison. Nevertheless, 9 Igalians still actively contributed over 160 commits to servo and mozilla-central in 2019 placing us in the top 10 contributors to servo and the top 20 to mozilla-central.

A little over 1% of contributions to Servo were by Igalians in 2019 (over 3.4% of non-Mozilla commits).

As an aside, it's pretty great to see that the #1 committer to mozilla-central this year is our friend Emilio Cobos (by a wide margin, over 2-to-1). Emilio completed an Igalia Coding Experience in 2017… and gave a great talk at last year's Web Engines Hackfest. Congratulations, Emilio!

WebKit

Very importantly: we are also the #2 contributor to WebKit. In 2019, 37 Igalians made over 1100 commits to WebKit, delivering ~11% of the total commits.

Almost 11% of all commits to WebKit in 2019 were from Igalians.

In fact, if we zoom out and look at what that looks like without Apple, our level of commitment is even more evident…

60% of all non-Apple commits to WebKit in 2019 were from Igalians.

I think this is important because there's a larger, untold story here which I look forward to talking more about next year. In short though: while you might think of WebKit as “Apple”, the truth is that there are a number of companies invested and interested in keeping WebKit alive and increasingly competitive, and Igalia is key among them. Why? Because 3 of the 5 downloads available from webkit.org are actually maintained by Igalia. These WebKit browsers are used on hundreds of millions of devices already, and that's growing. Chances are pretty good that you've encountered one, maybe even have one, and just didn't know it.

A Unique Role…

As I explained in my post earlier this year Beyond Browser Vendors: Igalia does this work by uniquely expanding the ability to prioritize work and allowing us all to better collectively improve the commons and share the load.

Bloomberg, as you might know, funded the development of CSS Grid. In 2019 they continued to fund a lot of great efforts in both JavaScript and CSS standardization and implementation. To name just a few: in CSS they funded some paint optimizations and the development of white-space: break-spaces, line-break: anywhere, overflow-wrap: anywhere, ::marker and list-style-type: <string>; in JavaScript, BigInt, Public and Private Class Fields, Decorators, and Records and Tuples.

But they weren't remotely alone: 2019 saw many diverse clients funding some kind of upstream work for all sorts of reasons. Google AMP, as another example, funded a lot of great work in our partnership for Predictability efforts. Several sponsors helped fund work on MathML. Our work with embedded browsers has helped introduce a number of new paths for contributions as well. We worked on Internationalization, accessibility, module loading, top-level-await, a proposal for operator overloading and more.

I'm also very proud of the fact that we are an example of what we preach in this respect: among those who sponsor our work is… us. Igalia cares about doing things that matter, and while we are great at finding ways to help fund the advancement of work that matters, we're also willing to invest ourselves.

So, what matters?

Lots of things matter to Igalia; here are a few broad technical areas that we prioritized in 2019…

Predictability

This year MDN organized a giant survey which turned up a lot of interesting findings about what things caused developers pain, or that they would like to see improved. Several of these had to do with uneven, inconsistent (buggy) or lacking implementations of standards.

Thanks to lots of clients, and some of our own investment, we were able to help here in a big way. This year we were able to resolve a number of incompatibilities, create a lot of tests, and fix bugs that cause a lot of developer pain. AMP, notably, has sponsored a lot of these important things in partnership with us, even funding 'last implementations' of important missing features in WebKit like ResizeObserver and preloading for responsive images.

In 2019, 20 Igalians made 343 commits to Web Platform Tests, adding thousands of new tests and placing us among the top few contributors.

In 2019, Igalia was the #3 contributor to Web Platform Tests, after Google and Mozilla.

We were among the top contributors to Test262 as well… Great contributions here from our friends at Bocoup! We are two unique companies in this space - contributing to the open web with a different kind of background than most. It's excellent to see how much we're all accomplishing together - just between the two of us, that's approaching 60% of the commits!

We're among the top contributors to Test262 as well.
Accessibility

“Accessibility is very important to us” might sound like a nice thing to say, but at Igalia it's very true. We are probably the greatest champions of accessibility on Linux. Here's what Igalia brought to the table in 2019:

As the maintainers of the Orca screen reader for the GNOME desktop, we did an enormous amount of work this year, making over 460 commits to Orca itself in 2019 (over 83% of its commits).

Over 83% of all commits to Orca in 2019 were from Igalians.

We were also active contributors for lots of the underlying parts (at-spi2-atk, at-spi2-core). And, thanks to investment from Google, we were also able to properly implement ATK support in Chromium, making Chrome for Linux now very accessible with a screen reader. We also actively contributed to Accerciser, a popular Python based interactive accessibility explorer.

Standards

Of course, all of our implementation work is predicated on open standards; we care deeply about them, and I'm proud of our participation. Here are a few highlights from 2019:

Igalia's Joanmarie Diggs is the chair of the ARIA Working Group, and co-editor of several of its related specifications. Igalia had the second most commits to the ARIA specification itself in 2019 - almost 20%, behind only Adobe (well done, Adobe!).

Almost 1/5th of contributions to ARIA came from Igalians in 2019.

We were also a top-3 contributor to HTML Accessibility API Mappings 1.0 and actively contributed to Accessible Name and Description Computation.

This year, during the CSS Working Group's Face-to-Face meeting at the Mozilla offices in Toronto, our colleague Oriol Brufau was unanimously (and quickly) appointed as co-editor of the CSS Grid specification.

2019 also saw the hiring of Nikolas Zimmermann, bringing the original authors of KSVG (the original SVG implementation) back together at Igalia. We've nominated Niko to the SVG Working Group and we're so excited to share what he's working on (see the end of this post).

Math on the Web

Enabling authors to share text was an original goal of the Web. That math was an important aspect of this was evident from the very earliest days at CERN; however, because of a complex history and a lacking implementation in Chromium, this remained elusive, without a starting point for productive conversation. We believe that this matters to society, and that resolving it and enabling a way to move forward is the right thing to do.

With some initial funding from sponsors, this year we also helped to steadily advance and define an interoperable, modern and rigorous MathML Core specification, well integrated with the Web Platform, to provide a starting point. Igalia's Fred Wang is co-editor of MathML Core, and we've completed an initial implementation alongside it thanks to lots of hard work from Rob Buis and Fred, writing tests and working toward high-quality interoperability along the way.

Modern interoperability and high quality rendering by July 2019.

We filed the Intent to Implement and began the process of upstreaming. As part of this, we have also created a corresponding explainer and filed for a review with the W3C Technical Architecture Group (TAG). We've seen positive movement and feedback, and several patches have already begun to land upstream toward this long-standing, popular and overdue issue. Thanks to our work, we will soon be able to say (interoperably) that the Web can handle the original use cases defined at CERN in the early 1990s.

Outreach and Community

Standardization is a complex process, and we believe that enabling good standardization requires input from a considerably larger audience than has historically been able to engage in standards meetings. Bringing diverse parts of the community together is important, and it has historically been quite a challenge to achieve. As such, we've been doing a lot in this space.

2020…

So, I think we did some amazing and important things in 2019. But, as amazing as it was: just wait until you see what we do in 2020.

In fact, we're excited about so many things, we couldn't even wait to give you a taste...

December 16, 2019 05:00 AM

December 12, 2019

Nikolas Zimmermann

CSS 3D transformations & SVG

As mentioned in my first article, I have a long relationship with the WebKit project, and with its SVG implementation. In this post I will explain some exciting new developments and possible advances, and present some demos of the state of the art (if you cannot wait, go and watch them, and come back for the details). To understand why these developments are both important and achievable now, though, we’ll first have to understand some history.

by zimmermann@kde.org (Nikolas Zimmermann) at December 12, 2019 12:00 AM

December 09, 2019

Frédéric Wang

Review of my year 2019 at Igalia

Co-owner and mentorship

In 2016, I was among the new software engineers who joined Igalia. Three years later I applied to become co-owner of the company and the legal paperwork was completed in April. As my colleague Andy explained on his blog, this does not change a lot of things in practice because most of the decisions are taken within the assembly. However, I’m still very happy and proud of having achieved this step 😄

One of my new duties has been to be the “mentor” of Miyoung Shin since February and to help with her integration at Igalia. Shin has been instrumental in Igalia’s project to improve Chromium’s Code Health, with an impressive number of ~500 commits. You can watch the video of her BlinkOn lightning talk on YouTube. In addition to her excellent technical contributions she has also invested her energy in the company’s life, helping with the organization of social activities during our summits, something which has really been appreciated by her colleagues. I’m really glad that she has recently entered the assembly and will be able to take a more active role in the company’s decisions!

Julie and Shin at BlinkOn11
Julie (left) and Shin (right) at BlinkOn 11.

Working with new colleagues

Igalia also hired Cathie Chen last December and I’ve gotten the chance to work with her as part of our Web Platform collaboration with AMP. Cathie has been very productive and we have been able to push forward several features, mostly related to scrolling. Additionally, she implemented ResizeObserver in WebKit, which is quite an exciting new feature for web developers!

Cathie Chen signing 溜溜溜
Cathie executing traditional Chinese sign language.

The other project I’ve contributed to this year is MathML. We are introducing new math and font concepts into Chromium’s new layout engine, so the project is both technically challenging and fascinating. For this project, I was able to rely on Rob’s excellent development skills to complete this year’s implementation roadmap on our Chromium branch and start upstreaming patches.

In addition, a lot of extra non-implementation effort has been done which led to consensus between browser implementers, MathML enthusiasts and other web developer groups. Brian Kardell became a developer advocate for Igalia in March and has been very helpful talking to different people, explaining our project and introducing ideas from the Extensible Web, HTML or CSS Communities. I strongly recommend his recent Igalia chat and lightning talk for a good explanation of what we have been doing this year.

Photo of Brian
Brian presenting his lightning talk at BlinkOn11.

Conferences

These are the developer conferences I attended this year and gave talks about our MathML project:

Me at BlinkOn 11
Me, giving my BlinkOn 11 talk on "MathML Core".

As usual it was nice to meet the web platform and chromium communities during these events and to hear about the great projects happening. I’m also super excited that the assembly decided to try something new for my favorite event. Indeed, next year we will organize the Web Engines Hackfest in May to avoid conflicts with other browser events but more importantly, we will move to a bigger venue so that we can welcome more people. I’m really looking forward to seeing how things go!

Paris - A Coruña by train

Environmental protection is an important topic discussed in our cooperative. This year, I’ve tried to reduce my carbon footprint when traveling to Igalia’s headquarters by using the train instead of the plane. Obviously the latter is faster, but the former is much more comfortable and has fewer security constraints. It is possible to take a high-speed train from Paris to Barcelona and then a Trenhotel to avoid a hotel night. For a quantitative comparison, let’s consider this table, based on my personal travel experience:

Route                           Transport   Duration   Price      CO2/passenger
Paris - Barcelona               TGV         ~6h30      60-90€     6kg
Barcelona - A Coruña            Trenhotel   ~15h       40-60€     Unspecified
Paris - A Coruña (via Madrid)   Iberia      4h30-5h    100-150€   192kg

The price depends on how early the tickets are booked, but it is more or less the same for train and plane. Renfe does not seem to indicate estimated CO2 emissions, and trains to Galicia are probably not the most eco-friendly. However, using this online estimator, the 1200 kilometers between Barcelona and A Coruña would emit 50kg per passenger by national train. This would mean a reduction of about 71% in CO2 emissions, which is quite significant. Hopefully things will get better when, one day, AVE is available between Madrid and A Coruña… 😉
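For the curious, here's the back-of-the-envelope arithmetic behind that figure (a quick sketch using the per-leg estimates quoted above):

    tgv_kg = 6          # Paris - Barcelona by TGV (from the table above)
    trenhotel_kg = 50   # Barcelona - A Coruña, estimate for national train
    plane_kg = 192      # Paris - A Coruña via Madrid, by plane

    train_total = tgv_kg + trenhotel_kg              # 56 kg
    reduction = (plane_kg - train_total) / plane_kg
    print(f"{reduction:.0%}")                        # ~71%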

December 09, 2019 11:00 PM

December 08, 2019

Philippe Normand

HTML overlays with GstWPE, the demo

Once again this year I attended the GStreamer Conference and, just before that, the Embedded Linux Conference Europe, which took place in Lyon (France). Both events were a good opportunity to demo one of the use-cases I have in mind for GstWPE: HTML overlays!

As we at Igalia usually have a booth at ELC, I thought a GstWPE demo would be nice to have so we could show it there. The demo is a rather simple GTK application presenting a live preview of the webcam video capture with an HTML overlay blended in. The HTML and CSS can be modified using the embedded text editor and the overlay will be updated accordingly. The final video stream can even be streamed over RTMP to the main streaming platforms (Twitch, YouTube, Mixer)! Here is a screenshot:

The code of the demo is available on Igalia’s GitHub. Interested people should be able to try it as well, as the app is packaged as a Flatpak. See the instructions in the GitHub repo for more details.
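If you're curious how such an overlay pipeline can be wired up, here is a rough sketch (not the demo's actual code; element and property names such as wpesrc's location and draw-background, and the overlay path, are assumptions you should check against your GStreamer >= 1.16 install):

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    # Webcam capture composited with an HTML overlay rendered by GstWPE.
    # In a real application you would also set position/zorder/alpha on the
    # compositor sink pads, and swap autovideosink for an RTMP muxer/sink.
    # Depending on your build, wpesrc may output GL memory and need gl*
    # conversion elements instead of plain videoconvert.
    pipeline = Gst.parse_launch(
        "compositor name=mix ! videoconvert ! autovideosink "
        "v4l2src ! videoconvert ! mix. "
        "wpesrc location=file:///path/to/overlay.html draw-background=0 "
        "! videoconvert ! mix."
    )

    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                           Gst.MessageType.ERROR | Gst.MessageType.EOS)
    pipeline.set_state(Gst.State.NULL)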

Having this demo running at our booth greatly helped us explain GstWPE and how it can be used in real-life GStreamer applications. Combining the flexibility of the multimedia framework with the wide (wild) features of the Web Platform will for sure increase the synergy and foster collaboration!

As a reminder, GstWPE is available in GStreamer since the 1.16 version. On the roadmap for the mid-term I plan to add Audio support, thus allowing even better integration between WPEWebKit and GStreamer. Imagine injecting PCM audio coming from WPEWebKit into an audio mixer in your GStreamer-based application! Stay tuned.

by Philippe Normand at December 08, 2019 02:00 PM