Planet Igalia

June 26, 2019

Jacobo Aragunde

Introducing the Chromium-based web runtime for the AGL platform

Igalia has been working with AGL (Automotive Grade Linux) to provide a web application runtime to their platform, based on Chromium. We delivered the first phase of this project back in January, and it’s been available since the Flounder 6.0.5 and Guppy 7.0.0 releases. This is a summary of how it came to be, and of the next steps.

Igalia stand showing the AGL web runtime

Web applications as first-class citizens

The idea of web applications as first-class citizens is not new in AGL. The early versions of the stack, based on Tizen 3.0, already implemented this idea, via the Crosswalk runtime. It provided extended JS APIs for developers to access system services from inside a web application runtime.

The abandonment of the Tizen 3.0 effort by its main developers gave AGL the chance to start fresh, redefining its architecture to become what it is today, but the idea of web applications was still there. The current AGL architecture still supports web apps, and the system APIs are available through WebSockets, so both native and web applications can use them. But, until now, there wasn’t a piece of software to run them other than a general-purpose browser.

Leveraging existing projects

One of the strengths of open source is speeding up time-to-market by allowing code reuse. We looked into the webOS OSE platform, developed by LGe with contributions from Igalia. It provides the Web Application Manager (WAM) component, capable of booting up a Chromium-based runtime with a low footprint and managing the application life cycle. By reusing it, we were able to deliver a web application runtime more quickly.

Our contribution back to the webOS OSE project is the use of the new Wayland backend, independently developed by Igalia and available upstream in new Chromium versions. It was backported to the Chromium version used by webOS OSE, and hopefully it will be part of future releases of the platform based on newer versions of Chromium.

Development process

My colleague Julie gave a great presentation about the AGL web runtime at the AGL All Member Meeting last March. I’m reusing her work for this overview.

Wayland

The version of Chromium provided by the webOS OSE platform makes use of an updated version of the Intel Ozone-Wayland backend. That project has been abandoned for a while and was never upstreamed, due to deep architectural differences. In the last couple of years, Igalia has implemented a new backend independently, using the lessons learned from integrating and maintaining Ozone-Wayland in previous projects, and following the current (and future) architecture of the video pipeline.

Compared process structure of Wayland implementations

As we mentioned before, we backported the new implementation of the Wayland backend to webOS OSE, and added IVI-shell integration patches on top of it.

AGL life cycle

The WAM launcher process was modified to integrate with the AGL life cycle callbacks and events. In particular, it registers event callbacks for HomeScreen and WindowManager and notifications for ILMControl, activates the WebApp window when it receives Event_TapShortcut, and manages WebApp states on Event_Active/Event_Inactive. LGe, also a member of AGL, provided the initial work based on the Eel release, which we ported over to Flounder and kept evolving and maintaining.

WAM Launcher process integration diagram

AGL security model

In the AGL security model, access to system services is controlled with SMACK labels. A process with a certain label can only access a subset of the system API. The installation manifest for AGL applications and services defines the relation between labels and services.

Access to system APIs in AGL happens through WebSockets, so from the point of view of the browser it’s just a network request. Since WAM reuses the Chromium process model, the network interaction happening in any running webapp is actually performed by the browser process. The problem is that there is only one browser process in the system; it runs in the background, and its label doesn’t match the labels of the running applications. As a solution, we configure web applications to channel their networking through a proxy, and the WAM launcher creates a proxy process with the proper SMACK label for every webapp.

Proxy process integration diagram

The current implementation is based on the Tinyproxy project, but we plan to review this model and find a more efficient solution.
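To make the idea concrete, here is a hypothetical sketch of mine, not the actual WAM launcher code: it forks a per-application process, gives it the application's SMACK label by writing to /proc/self/attr/current (the standard SMACK interface), and execs Tinyproxy; the web application is then configured to route its traffic through that proxy.

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

// Hypothetical sketch, not the actual WAM launcher code.
pid_t launch_app_proxy(const char *smack_label) {
  pid_t pid = fork();
  if (pid == 0) {
    // Child: take the application's SMACK label before exec'ing the proxy.
    int fd = open("/proc/self/attr/current", O_WRONLY);
    if (fd >= 0) {
      write(fd, smack_label, strlen(smack_label));
      close(fd);
    }
    execlp("tinyproxy", "tinyproxy", "-d", (char *)nullptr);
    _exit(1);  // only reached if exec failed
  }
  return pid;  // Parent: point the webapp at the proxy, e.g. 127.0.0.1:8888
}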

Next steps

We are removing the Qt dependencies from WAM, replacing them with standard C++ or the Boost library. This work is interesting for AGL in order to create a web-only version of the platform, and also for webOS OSE to make the platform lighter, so we will be contributing it back. In this regard, the final goal is that AGL doesn’t require any patches on top of webOS OSE to be able to use WAM.

Also, in line with creating a web-only AGL, we will provide new demo apps so that the Qt dependency isn’t needed anywhere in the platform.

And, after the AGL face-to-face meeting that Igalia hosted in Coruña, we will be integrating more subsystems available in the platform, and reworking the integration with the security framework to be more robust and efficient. You can follow our progress in the AGL Jira with the label WebAppMgr.

Try it now!

The web application runtime is available in the Flounder (starting in 6.0.5) and Guppy releases, and it will be in Halibut (current master). It can be tested by adding the agl-html5-framework feature to the agl setup script. A bunch of test web applications are available in the wam-demo-applications repository, but there will be official demos in AGL soon.

Youtube running as an AGL application

This project has been made possible by the Linux Foundation, through a sponsorship from Automotive Grade Linux. Thanks!

AGL logo

Igalia

by Jacobo Aragunde Pérez at June 26, 2019 05:00 PM

Andy Wingo

fibs, lies, and benchmarks

Friends, consider the recursive Fibonacci function, expressed most lovelily in Haskell:

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

Computing elements of the Fibonacci sequence ("Fibonacci numbers") is a common microbenchmark. Microbenchmarks are like Suzuki exercises for learning violin: they are not written to be good tunes (good programs), but rather to help you improve a skill.

The fib microbenchmark teaches language implementors to improve recursive function call performance.

I'm writing this article because after adding native code generation to Guile, I wanted to check how Guile was doing relative to other language implementations. The results are mixed. We can start with the most favorable of the comparisons: Guile present versus Guile of the past.


I collected these numbers on my i7-7500U CPU @ 2.70GHz 2-core laptop, with no particular performance tuning, running each benchmark 10 times, waiting 2 seconds between measurements. The bar value indicates the median elapsed time, and above each bar is an overlaid histogram of all results for that scenario. Note that the y axis is on a log scale. The 2.9.3* version corresponds to unreleased Guile from git.

Good news: Guile has been getting significantly faster over time! Over decades, true, but I'm pleased.

where are we? static edition

How good are Guile's numbers on an absolute level? It's hard to say, because there's no absolute performance oracle out there. However, there are relative performance oracles, so we can compare against some other language implementations.

First up would be the industrial C compilers, GCC and LLVM. We can throw in a few more "static" language implementations as well: compilers that completely translate to machine code ahead-of-time, with no type feedback, and a minimal run-time.
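The post doesn't include the C source being measured; a minimal version of mine, consistent with the disassembly shown below, would be:

// Minimal C version of the benchmark (my reconstruction, not the post's exact source).
long fib(long n) {
  if (n < 2)
    return n;                      // fib(0) = 0, fib(1) = 1
  return fib(n - 1) + fib(n - 2);  // two recursive calls
}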


Here we see that GCC is doing best on this benchmark, completing in an impressive 0.304 seconds. It's interesting that the result differs so much from clang. I had a look at the disassembly for GCC and I see:

fib:
    push   %r12
    mov    %rdi,%rax
    push   %rbp
    mov    %rdi,%rbp
    push   %rbx
    cmp    $0x1,%rdi
    jle    finish
    mov    %rdi,%rbx
    xor    %r12d,%r12d
again:
    lea    -0x1(%rbx),%rdi
    sub    $0x2,%rbx
    callq  fib
    add    %rax,%r12
    cmp    $0x1,%rbx
    jg     again
    and    $0x1,%ebp
    lea    0x0(%rbp,%r12,1),%rax
finish:
    pop    %rbx
    pop    %rbp
    pop    %r12
    retq   

It's not quite straightforward; what's the loop there for? It turns out that GCC inlines one of the recursive calls to fib. The microbenchmark is no longer measuring call performance, because GCC managed to reduce the number of calls. If I had to guess, I would say this optimization doesn't have wide applicability and is just there to game benchmarks. In that case, well played, GCC, well played.

LLVM's compiler (clang) looks more like what we'd expect:

fib:
   push   %r14
   push   %rbx
   push   %rax
   mov    %rdi,%rbx
   cmp    $0x2,%rdi
   jge    recurse
   mov    %rbx,%rax
   add    $0x8,%rsp
   pop    %rbx
   pop    %r14
   retq   
recurse:
   lea    -0x1(%rbx),%rdi
   callq  fib
   mov    %rax,%r14
   add    $0xfffffffffffffffe,%rbx
   mov    %rbx,%rdi
   callq  fib
   add    %r14,%rax
   add    $0x8,%rsp
   pop    %rbx
   pop    %r14
   retq   

Note the two recursive calls, the two callq fib instructions.

Incidentally, the fib as implemented by GCC and LLVM isn't quite the same program as Guile's version. If the result gets too big, GCC and LLVM will overflow, whereas in Guile we overflow into a bignum. Also in C, it's possible to "smash the stack" if you recurse too much; compilers and run-times attempt to mitigate this danger but it's not completely gone. In Guile you can recurse however much you want. Finally in Guile you can interrupt the process if you like; the compiled code is instrumented with safe-points that can be used to run profiling hooks, debugging, and so on. Needless to say, this is not part of C's mission.

Some of these additional features can be implemented with no significant performance cost (e.g., via guard pages). But it's fair to expect that they have some amount of overhead. More on that later.

The other compilers are OCaml's ocamlopt, coming in with a very respectable result; Go, also doing well; and V8 WebAssembly via Node. As you know, you can compile C to WebAssembly, and then V8 will compile that to machine code. In practice it's just as static as any other compiler, but the generated assembly is a bit more involved:


fib_tramp:
    jmp    fib

fib:
    push   %rbp
    mov    %rsp,%rbp
    pushq  $0xa
    push   %rsi
    sub    $0x10,%rsp
    mov    %rsi,%rbx
    mov    0x2f(%rbx),%rdx
    mov    %rax,-0x18(%rbp)
    cmp    %rsp,(%rdx)
    jae    stack_check
post_stack_check:
    cmp    $0x2,%eax
    jl     return_n
    lea    -0x2(%rax),%edx
    mov    %rbx,%rsi
    mov    %rax,%r10
    mov    %rdx,%rax
    mov    %r10,%rdx
    callq  fib_tramp
    mov    -0x18(%rbp),%rbx
    sub    $0x1,%ebx
    mov    %rax,-0x20(%rbp)
    mov    -0x10(%rbp),%rsi
    mov    %rax,%r10
    mov    %rbx,%rax
    mov    %r10,%rbx
    callq  fib_tramp
return:
    mov    -0x20(%rbp),%rbx
    add    %ebx,%eax
    mov    %rbp,%rsp
    pop    %rbp
    retq   
return_n:
    jmp    return
stack_check:
    callq  WasmStackGuard
    mov    -0x10(%rbp),%rbx
    mov    -0x18(%rbp),%rax
    jmp    post_stack_check

Apparently fib compiles to a function of two arguments, the first passed in rsi, and the second in rax. (V8 uses a custom calling convention for its compiled WebAssembly.) The first synthesized argument is a handle onto run-time data structures for the current thread or isolate, and in the function prelude there's a check to see that the function has enough stack. V8 uses these stack checks also to handle interrupts, for when a web page is stuck in JavaScript.

Otherwise, it's a more or less normal function, with a bit more register/stack traffic than would be strictly needed, but pretty good.

do optimizations matter?

You've heard of Moore's Law -- though it doesn't apply any more, it roughly translated into hardware doubling in speed every 18 months. (Yes, I know it wasn't precisely that.) There is a corresponding rule of thumb for compiler land, Proebsting's Law: compiler optimizations make software twice as fast every 18 years. Zow!

The previous results with GCC and LLVM were with optimizations enabled (-O3). One way to measure Proebsting's Law would be to compare the results with -O0. Obviously in this case the program is small and we aren't expecting much work out of the optimizer, but it's interesting to see anyway:


Answer: optimizations don't matter much for this benchmark. This investigation does give a good baseline for compilers from high-level languages, like Guile: in the absence of clever trickery like the recursive inlining thing GCC does, and in the absence of industrial-strength instruction selection, what's a good baseline target for a compiler? Here we see for this benchmark that it's somewhere between 420 and 620 milliseconds or so. Go gets there, and OCaml does even better.

how is time being spent, anyway?

Might we expect V8/WebAssembly to get there soon enough, or is the stack check that costly? How much time does one stack check take anyway? For that we'd have to determine the number of recursive calls for a given invocation.

Friends, it's not entirely clear to me why this is, but I instrumented a copy of fib, and I found that the number of calls in fib(n) was a more or less constant factor of the result of calling fib. That ratio converges to twice the golden ratio, which means that since fib(n+1) ~= φ * fib(n), then the number of calls in fib(n) is approximately 2 * fib(n+1). I scratched my head for a bit as to why this is and I gave up; the Lord works in mysterious ways.
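(For the record, a short induction does recover that constant; this derivation is mine, not from the post. Writing $C(n)$ for the number of calls performed by fib(n):

\[
C(0) = C(1) = 1, \qquad C(n) = 1 + C(n-1) + C(n-2)
\quad\Longrightarrow\quad C(n) = 2\,\mathrm{fib}(n+1) - 1,
\]

and since $\mathrm{fib}(n+1) \approx \varphi\,\mathrm{fib}(n)$, the ratio of calls to result converges to $2\varphi$.)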

Anyway for fib(40), that means that there are around 3.31e8 calls, absent GCC shenanigans. So that would indicate that each call for clang takes around 1.27 ns, which at turbo-boost speeds on this machine is 4.44 cycles. At maximum throughput (4 IPC), that would indicate 17.8 instructions per call, and indeed on the n > 2 path I count 17 instructions.

For WebAssembly I calculate 2.25 nanoseconds per call, or 7.9 cycles, or 31.5 (fused) instructions at max IPC. And indeed counting the extra jumps in the trampoline, I get 33 instructions on the recursive path. I count 4 instructions for the stack check itself, one to save the current isolate, and two to shuffle the current isolate into place for the recursive calls. But, compared to clang, V8 puts 6 words on the stack per call, as opposed to only 4 for LLVM. I think with better interprocedural register allocation for the isolate (i.e.: reserve a register for it), V8 could get a nice boost for call-heavy workloads.

where are we? dynamic edition

Guile doesn't aim to replace C; it's different. It has garbage collection, an integrated debugger, and a compiler that's available at run-time, and it is dynamically typed. It's perhaps more fair to compare to languages that share some of these characteristics, so I ran these tests on versions of recursive fib written in a number of languages. Note that all of the numbers in this post include start-up time.


Here, the ocamlc line is the same as before, but using the bytecode compiler instead of the native compiler. It's a bit of an odd thing to add to the comparison, but it performs so well I just had to include it.

I think the real takeaway here is that Chez Scheme has fantastic performance. I have not been able to see the disassembly -- does it do the trick like GCC does? -- but the numbers are great, and I can see why Racket decided to rebase its implementation on top of it.

Interestingly, as far as I understand, Chez implements stack checks in the straightforward way (an inline test-and-branch), not with a guard page, and instead of using the stack check as a generic ability to interrupt a computation in a timely manner as V8 does, Chez emits a separate interrupt check. I would like to be able to see Chez's disassembly but haven't gotten around to figuring out how yet.
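To make the distinction concrete, here is a sketch of mine (not Chez's actual code) of what an inline test-and-branch check amounts to:

extern char *stack_limit;        // per-thread limit, set when the stack is created
void handle_stack_overflow();    // slow path: grow the stack, raise an error, ...

void on_function_entry() {
  char probe;                    // address of a local approximates the stack pointer
  if (&probe < stack_limit)      // explicit test-and-branch on every entry
    handle_stack_overflow();
}

With a guard page, the test disappears entirely: overflowing the stack touches an unmapped page, and the runtime recovers in the fault handler.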

Haskell's call performance is surprisingly bad here, beaten even by OCaml's bytecode compiler; is this the cost of laziness, or just a lacuna of the implementation? I do not know. I do know I have this mental image that Haskell has a good compiler, but apparently if that's the standard, so is Guile :)

Finally, in this comparison section, I was not surprised by cpython's relatively poor performance; we know cpython is not fast. I think though that it just goes to show how little these microbenchmarks are worth when it comes to user experience; like many of you I use plenty of Python programs in my daily work and don't find them slow at all. Think of micro-benchmarks like x-ray diffraction; they can reveal the hidden substructure of DNA but they say nothing at all about the organism.

where to now?

Perhaps you noted that in the last graph, the Guile and Chez lines were labelled "(lexical)". That's because instead of running this program:

(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

They were running this, instead:

(define (fib n)
  (define (fib* n)
    (if (< n 2)
        n
        (+ (fib* (- n 1)) (fib* (- n 2)))))
  (fib* n))

The thing is, historically, Scheme programs have treated top-level definitions as being mutable. This is because you don't know the extent of the top-level scope -- there could always be someone else who comes and adds a new definition of fib, effectively mutating the existing definition in place.

This practice has its uses. It's useful to be able to go in to a long-running system and change a definition to fix a bug or add a feature. It's also a useful way of developing programs, to incrementally build the program bit by bit.


But, I would say that as someone who has written and maintained a lot of Scheme code, it's not a normal occurrence to mutate a top-level binding on purpose, and it has a significant performance impact. If the compiler knows the target of a call, that unlocks a number of important optimizations: type check elision on the callee, more optimal closure representation, smaller stack frames, possible contification (turning calls into jumps), argument and return value count elision, representation specialization, and so on.

This overhead is especially egregious for calls inside modules. Scheme-the-language only gained modules relatively recently -- relative to the history of Scheme -- and one of the aspects of modules is precisely to allow reasoning about top-level module-level bindings. This is why running Chez Scheme with the --program option is generally faster than --script (which I used for all of these tests): it opts in to the "newer" specification of what a top-level binding is.

In Guile we would probably like to move towards a more static way of treating top-level bindings, at least those within a single compilation unit. But we haven't done so yet. It's probably the most important single optimization we can make over the near term, though.

It's true though that even absent lexical optimizations, top-level calls can be made more efficient in Guile. I am not sure if we can reach Chez with the current setup of having a template JIT, because we need two return addresses: one virtual (for bytecode) and one "native" (for JIT code). Register allocation is also something to improve but it turns out to not be so important for fib, as there are few live values and they need to spill for the recursive call. But, we can avoid some of the indirection on the call, probably using an inline cache associated with the callee; Chez has had this optimization since 1984!

what guile learned from fib

This exercise has been useful to speed up Guile's procedure calls, as you can see for the difference between the latest Guile 2.9.2 release and what hasn't been released yet (2.9.3).

To decide what improvements to make, I extracted the assembly that Guile generated for fib to a standalone file, and tweaked it in a number of ways to determine what the potential impact of different scenarios was. Some of the detritus from this investigation is here.

There were three big performance improvements. One was to avoid eagerly initializing the slots in a function's stack frame; this took a surprising amount of run-time. Fortunately, the rest of the toolchain, like the local variable inspector, was already prepared for this change.

Another thing that became clear from this investigation was that our stack frames were too large; there was too much memory traffic. I was able to improve this in the lexical-call case by adding an optimization to elide useless closure bindings. Usually in Guile when you call a procedure, you pass the callee as the 0th parameter, then the arguments. This is so the procedure has access to its closure. For some "well-known" procedures -- procedures whose callers can be enumerated -- we optimize to pass a specialized representation of the closure instead ("closure optimization"). But for well-known procedures with no free variables, there's no closure, so we were just passing a throwaway value (#f). An unhappy combination of Guile's current calling convention being stack-based and a strange outcome from the slot allocator meant that frames were a couple words too big. Changing to allow a custom calling convention in this case sped up fib considerably.

Finally, and also significantly, Guile's JIT code generation used to manually handle calls and returns via manual stack management and indirect jumps, instead of using the platform calling convention and the C stack. This is to allow unlimited stack growth. However, it turns out that the indirect jumps at return sites were stalling the pipeline. Instead we switched to use call/return but keep our manual stack management; this allows the CPU to use its return address stack to predict return targets, speeding up code.

et voilà

Well, long article! Thanks for reading. There's more to do but I need to hit the publish button and pop this off my stack. Until next time, happy hacking!

by Andy Wingo at June 26, 2019 10:34 AM

June 22, 2019

Eleni Maria Stea

Depth-aware upsampling experiments (Part 4: Improving the nearest depth where we detect discontinuities)

This is another post of the series where I explain some ideas I tried in order to improve the upscaling of the half-resolution SSAO render target of the VKDF sponza demo that was written by Iago Toral. In the previous post, I had classified the sample neighborhoods in surface neighborhoods and neighborhoods that contain depth … Continue reading Depth-aware upsampling experiments (Part 4: Improving the nearest depth where we detect discontinuities)

by hikiko at June 22, 2019 09:44 AM

June 18, 2019

Manuel Rego

Speaking at CSS Day 2019

This year I got invited to speak at CSS Day, this is an amazing event that every year brings to Amsterdam great speakers from all around the globe to talk about cutting edge topics related to CSS.

Speakers and MC at CSS Day 2019

The conference happened in the beautiful Compagnietheater. Kudos to the organization as they were really kind and supportive during the whole event. Thanks for giving me the opportunity to speak there.

Compagnietheater in Amsterdam

For this event I prepared a totally new talk, focused on explaining what it takes to implement something like CSS Grid Layout in a browser engine. I took an idea from Jen Simmons and implemented a new property, grid-skip-areas, during the presentation; this was useful to explain the different things that happen during the whole process. Video of the talk is available on YouTube, and the slides are available too if you are interested; however, note that some of them won’t work in your browser unless you built the linked patches.

The feedback after the talk was really good, everyone seemed to like it (despite the fact that I showed lots of slides with C++ code) and find it useful to understand what’s going on behind the scenes. Thank you very much for your kind words! 😊


Somehow with this talk I was explaining the kind of things we do at Igalia and how we can help to fill the gaps left by browser vendors in the evolution of the Web Platform. Igalia now has a solid position inside the Web community, which makes us an excellent partner for improving the platform in line with your specific needs. If you want to know more, don’t hesitate to read our website or contact us directly.

I hope you enjoy it! 😀

June 18, 2019 10:00 PM

Samuel Iglesias

My latest VK-GL-CTS contributions

Even if you are not a gamer, odds are that you have already heard about Vulkan, a graphics and compute API that provides high-efficiency, cross-platform access to modern GPUs. This API is designed by the Khronos Group, and it is supported by a new set of drivers specifically designed to implement the different functions and features defined by the spec (at the time of writing this post, it is version 1.1).

Vulkan

In order to guarantee that the drivers work according to the spec, they need to pass a conformance test suite that ensures they do what is expected of them. VK-GL-CTS is the conformance test suite used to certify conformance of both Vulkan and OpenGL implementations and… it is open-source!

VK-GL-CTS

As part of my daily job at Igalia, I contribute to VK-GL-CTS in several ways: fixing bugs, improving existing tests, or even writing new tests for a variety of extensions. In this post I am going to describe some of the work I have been doing in the last few months.

VK_EXT_host_query_reset

This extension gives you the opportunity to reset queries outside a command buffer, which is a fast way of doing it once your application has finished reading a query’s data. All you need is to call the vkResetQueryPoolEXT() function. There are several Vulkan drivers already supporting this extension on GNU/Linux (NVIDIA, and the open-source drivers AMDVLK, RADV and ANV) and probably more on other platforms.
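As a hypothetical sketch (mine, not taken from the test suite), resetting a pool of queries from the host could look like this, assuming the extension was enabled at device creation and that device and queryPool are valid handles:

#include <vulkan/vulkan.h>

void reset_queries(VkDevice device, VkQueryPool queryPool, uint32_t queryCount) {
  // Extension entry points are fetched at run time.
  PFN_vkResetQueryPoolEXT pfnResetQueryPoolEXT =
      (PFN_vkResetQueryPoolEXT)vkGetDeviceProcAddr(device, "vkResetQueryPoolEXT");

  // Reset queries [0, queryCount) without recording a command buffer.
  pfnResetQueryPoolEXT(device, queryPool, 0, queryCount);
}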

I have implemented tests for all the different queries: occlusion queries, pipeline timestamp queries and statistics queries. Transform feedback stream queries tests landed a bit later.

VK_EXT_discard_rectangles

VK_EXT_discard_rectangles provides a way to define rectangles in framebuffer-space coordinates that discard rasterization of all points, lines and triangles that fall inside (exclusive mode) or outside (inclusive mode) of their area. You can regard this feature as something similar to scissor testing but it operates orthogonally to the existing scissor test functionality.

It is easier to understand with an example. Imagine that you want to do the following in your application: clear the color attachment to red, then draw a green quad covering the whole attachment, but define a discard rectangle in order to restrict the rasterization of the quad to the area of that rectangle.

For that, you define the discard rectangles at pipeline creation time, for example (it is possible to define them dynamically too); as we want to restrict the rasterization of the quad to the area defined by the discard rectangle, we set its mode to VK_DISCARD_RECTANGLE_MODE_INCLUSIVE_EXT.
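A sketch of what this looks like in code (hypothetical, not taken from the CTS tests), chaining the extension structure into the pipeline creation info:

#include <vulkan/vulkan.h>

// Hypothetical sketch: one inclusive discard rectangle for a graphics pipeline.
void set_up_discard_rectangle(VkGraphicsPipelineCreateInfo *pipelineInfo,
                              VkPipelineDiscardRectangleStateCreateInfoEXT *discardInfo,
                              VkRect2D *discardRect)
{
  // One discard rectangle, in framebuffer-space coordinates.
  discardRect->offset = { 64, 64 };
  discardRect->extent = { 128, 128 };

  discardInfo->sType = VK_STRUCTURE_TYPE_PIPELINE_DISCARD_RECTANGLE_STATE_CREATE_INFO_EXT;
  discardInfo->pNext = nullptr;
  discardInfo->flags = 0;
  discardInfo->discardRectangleMode = VK_DISCARD_RECTANGLE_MODE_INCLUSIVE_EXT;
  discardInfo->discardRectangleCount = 1;
  discardInfo->pDiscardRectangles = discardRect;

  // Chain the extension struct into the pipeline creation info.
  pipelineInfo->pNext = discardInfo;
  // ... shader stages, render pass, layout, etc. as usual ...
}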

VK_EXT_discard_rectangles inclusive mode example

If we want to discard the rasterization of the green quad inside the area defined by the discard rectangle, then we set VK_DISCARD_RECTANGLE_MODE_EXCLUSIVE_EXT mode at pipeline creation time and that’s all. Here you have the output for this case:

VK_EXT_discard_rectangles exclusive mode example

You are not limited to defining just one discard rectangle: drivers supporting this extension should support a minimum of 4 discard rectangles, and some may support more. As this feature works orthogonally to other tests like the scissor test, you can do fancy things in your app :-)

The tests I developed for VK_EXT_discard_rectangles extension are already available in VK-GL-CTS repo. If you want to test them on an open-source driver, right now only RADV has implemented this extension.

VK_EXT_pipeline_creation_feedback

VK_EXT_pipeline_creation_feedback is another example of a useful feature for application developers, especially game developers. This extension provides a way to know, at pipeline creation, whether the pipeline hit the provided pipeline cache, the time consumed to create it, or even which shader stages hit the cache. This feedback about pipeline creation can help to improve the pipeline caches that are shipped to users, with the final goal of reducing load times.
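A hypothetical sketch of mine (not from the tests) of how an application could request and inspect this feedback:

#include <vulkan/vulkan.h>
#include <cstdio>

// One feedback slot for the whole pipeline plus one per shader stage
// (a single stage assumed here).
static VkPipelineCreationFeedbackEXT pipelineFeedback;
static VkPipelineCreationFeedbackEXT stageFeedbacks[1];

void request_feedback(VkGraphicsPipelineCreateInfo *pipelineInfo,
                      VkPipelineCreationFeedbackCreateInfoEXT *feedbackInfo)
{
  feedbackInfo->sType = VK_STRUCTURE_TYPE_PIPELINE_CREATION_FEEDBACK_CREATE_INFO_EXT;
  feedbackInfo->pNext = nullptr;
  feedbackInfo->pPipelineCreationFeedback = &pipelineFeedback;
  feedbackInfo->pipelineStageCreationFeedbackCount = 1;
  feedbackInfo->pPipelineStageCreationFeedbacks = stageFeedbacks;
  pipelineInfo->pNext = feedbackInfo;  // chain into the pipeline create info
}

// After vkCreateGraphicsPipelines() has run:
void report_feedback(void)
{
  if (pipelineFeedback.flags & VK_PIPELINE_CREATION_FEEDBACK_VALID_BIT_EXT) {
    bool cacheHit = pipelineFeedback.flags &
        VK_PIPELINE_CREATION_FEEDBACK_APPLICATION_PIPELINE_CACHE_HIT_BIT_EXT;
    printf("cache hit: %d, creation time: %llu ns\n",
           (int)cacheHit, (unsigned long long)pipelineFeedback.duration);
  }
}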

Tests for VK_EXT_pipeline_creation_feedback extension have made their way into VK-GL-CTS repo. Good news for the ones using open-source drivers: both RADV and ANV have implemented the support for this extension!

Conclusions

Since I started working in the Graphics team at Igalia, I have been contributing code to Mesa drivers for both OpenGL and Vulkan, adding new tests to Piglit, and improving VkRunner, among other contributions.

Now I am contributing to increase VK-GL-CTS coverage by developing new tests for extensions and fixing existing tests, among other things. This work also involves developing patches for the Vulkan Validation Layers, fixes for glslang, and more things to come. In summary, I am enjoying a lot contributing to the open-source ecosystem created by the Khronos Group as part of my daily work!

Note: if you are a student and you want to start contributing to open-source projects, don’t miss our Igalia Coding Experience program (more info on our website).

Igalia

June 18, 2019 06:45 AM

June 14, 2019

Michael Catanzaro

An OpenJPEG Surprise

My previous blog post seems to have resolved most concerns about my requests for Ubuntu stable release updates, but I again received rather a lot of criticism for the choice to make WebKit depend on OpenJPEG, even though my previous post explained clearly why there are not any good alternatives.

I was surprised to receive a pointer to ffmpeg, which has its own JPEG 2000 decoder that I did not know about. However, we can immediately dismiss this option due to legal problems with depending on ffmpeg. I also received a pointer to a resurrected libjasper, which is interesting, but since libjasper was removed from Ubuntu, its status is not currently better than OpenJPEG.

But there is some good news! I have looked through Ubuntu’s security review of the OpenJPEG code and found some surprising results. Half the reported issues affect the library’s companion tools, not the library itself. And the other half of the issues affect the libmj2 library, a component of OpenJPEG that is not built by Ubuntu and not used by WebKit. So while these are real security issues that raise concerns about the quality of the OpenJPEG codebase, none of them actually affect OpenJPEG as used by WebKit. Yay!

The remaining concern is that huge input sizes might cause problems within the library that we don’t yet know about. We don’t know because OpenJPEG’s fuzzer discards huge images instead of testing them. Ubuntu’s security team thinks there’s a good chance that fixing the fuzzer could uncover currently-unknown multiplication overflow issues, for instance, a class of vulnerability that OpenJPEG has clearly had trouble with in the past. It would be good to see improvement on this front. I don’t think this qualifies as a security vulnerability, but it is certainly a security problem that would facilitate discovering currently-unknown vulnerabilities if fixed.

Still, on the whole, the situation is not anywhere near as bad as I’d thought. Let’s hope OpenJPEG can be included in Ubuntu main sooner rather than later!

by Michael Catanzaro at June 14, 2019 02:43 PM

June 10, 2019

Javier Fernández

A new terminal-style line breaking with CSS Text

The CSS Text 3 specification defines a module for text manipulation and covers, among a few other features, the line breaking behavior of the browser, including white space handling. I’ve been working lately on some new features and bug fixing for this specification, and I’d like to introduce in this post the last one we made available for Web Platform users. This is yet another contribution that came out of the collaboration between Igalia and Bloomberg, which has been running for several years now and has produced many important new features for the Web, like CSS Grid Layout.

The feature

I guess everybody knows the white-space CSS property, which allows web authors to control two main aspects of the rendering of a text line: collapsing and wrapping. A new value, break-spaces, has been added to the ones available for this property, which allows web authors to emulate a terminal-like line breaking behavior. This new value operates basically like pre-wrap, but with two key differences:

  • any sequence of preserved white space characters takes up space, even at the end of the line.
  • a preserved white space sequence can be wrapped at any character, moving the rest of the sequence, intact, to the line below.

What does this new behavior actually mean? I’ll try to explain it with a few examples. Let’s start with a simple but quite illustrative demo, which tries to emulate a meteorology monitoring system that shows relevant changes over time, where the gaps between subsequent changes must be preserved:



#terminal {
  font: 20px/1 monospace;
  width: 340px;
  height: 5ch;
  background: black;
  color: green;
  overflow: hidden;
  white-space: break-spaces;
  word-break: break-all;
}

Another interesting use case for this feature could be a logging system which should preserve the text formatting of the logged information, considering different window sizes. The following demo tries to describe such a scenario:



body { width: 1300px; }
#logging {
  font: 20px/1 monospace;
  background: black;
  color: green;

  animation: resize 7s infinite alternate;

  white-space: break-spaces;
  word-break: break-all;
}
@keyframes resize {
  0% { width: 25%; }
  100% { width: 100%; }
}

Hash: 5a2a3d23f88174970ed8
Version: webpack 3.12.0
Time: 22209ms
Asset                                        Size     Chunks             Chunk Names
pages/widgets/index.51838abe9967a9e0b5ff.js  1.17 kB  10     [emitted]   pages/widgets/index
img/icomoon.7f1da5a.svg                      5.38 kB         [emitted]
fonts/icomoon.2d429d6.ttf                    2.41 kB         [emitted]
img/fontawesome-webfont.912ec66.svg          444 kB          [emitted]   [big]
fonts/fontawesome-webfont.b06871f.ttf        166 kB          [emitted]
img/mobile.8891a7c.png                       39.6 kB         [emitted]
img/play_button.6b15900.png                  14.8 kB         [emitted]
img/keyword-back.f95e10a.jpg                 43.4 kB         [emitted]
. . .

Use cases

In the demo shown before there are several cases that I think are worth analyzing in detail.

A breaking opportunity exists after any white space character

The main purpose of this feature is to preserve the length of white space sequences even when they have to be wrapped into multiple lines. The following example tries to describe this basic use case:



.container {
  font: 20px/1 monospace;
  width: 5ch;
  white-space: break-spaces;
  border: 1px solid;
}

XX               XX

The example above shows how the white space sequence with a length of 15 characters is preserved and wrapped along 3 different lines.

Single leading white space

Before the addition of the break-spaces value, this scenario was only possible at the beginning of the line. In any other case, the trailing white spaces were either collapsed or hung, hence the next line couldn’t start with a sequence of white spaces. Let’s consider the following example:



.container {
  font: 20px/1 monospace;
  width: 3ch;
  white-space: break-spaces;
  border: 1px solid;
}

 XX  XX

Like when using pre-wrap, the single leading space is preserved. Since break-spaces allows breaking opportunities after any white space character, we break after the first leading white space (” |XX XX”). The second line can be broken after the first preserved white space, creating another leading white space in the next line (” |XX | XX”).

However, let’s now consider a case without such a first single leading white space.



.container {
  font: 20px/1 monospace;
  width: 3ch;
  white-space: break-spaces;
  border: 1px solid;
}

XXX  XX

Again, it’s not allowed to break before the first space, but in this case there isn’t any previous breaking opportunity, so the first space after the word XXX should overflow (“XXX | XX”); the next white space character will be moved down to the next line as a preserved leading space.

Breaking before the first white space

I mentioned before that the spec states clearly that the break-spaces feature allows breaking opportunities only after white space characters. However, it’d be possible to break the line just before the first white space character after a word if the feature is used in combination with other line breaking CSS properties, like word-break or overflow-wrap (and other properties too).



.container {
  font: 20px/1 monospace;
  width: 4ch;
  white-space: break-spaces;
  overflow-wrap: break-word;
  border: 1px solid;
}

XXXX  X

The two white spaces between the words are preserved due to the break-spaces feature, but the first space after the XXXX word would overflow. Hence, the overflow-wrap: break-word feature is applied to prevent the line from overflowing and to introduce an additional breaking opportunity just before the first space after the word. This behavior causes the trailing spaces to be moved down as a leading white space sequence in the next line.

We would get the same rendering if word-break: break-all were used instead of overflow-wrap (or even in combination with it), but this is actually incorrect behavior, which has corresponding bug reports in WebKit (197277) and Blink (952254), according to the discussion in the CSS WG (see issue #3701).

Consider previous breaking opportunities

In the previous example I described a combination of line breaking features that would allow breaking before the first space after a word. However, this should be avoided if there are previous breaking opportunities. The following example is one of the possible scenarios where this may happen:



.container {
  font: 20px/1 monospace;
  width: 4ch;
  white-space: break-spaces;
  overflow-wrap: break-word;
  border: 1px solid;
}

XX X X

In this case, we could break after the second word (“XX X| X”), since overflow-wrap: break-word would allow us to do that in order to avoid the line to overflow due to the following white space. However, white-space: break-spaces only allows breaking opportunities after a space character, hence, we shouldn’t break before if there are valid previous opportunities, like in this case in the space after the first word (“XX |X X”).

This preference for previous breaking opportunities before breaking the word, honoring the overflow-wrap property, is also part of the behavior defined for the white-space: pre-wrap feature; although in that case, there is no need to deal with the issue of breaking before the first space after a word, since trailing spaces just hang. The following example uses just pre-wrap to show how previous opportunities are selected to avoid overflowing or breaking a word (unless explicitly requested with the word-break property).



.container {
  font: 20px/1 monospace;
  width: 2ch;
  white-space: pre-wrap;
  border: 1px solid;
}

(demo: the text “XX” followed by preserved spaces, rendered with pre-wrap alone, with overflow-wrap: break-word, and with word-break: break-all)

In this case, break-all enables breaking opportunities that are not available otherwise (we can break a word at any letter), which can be used to prevent the line from overflowing; hence, the overflow-wrap property doesn’t take any effect. The existence of previous opportunities is not considered now, since break-all mandates producing the longest line possible.

This new white-space: break-spaces feature implies a different behavior when used in combination with break-all. Even though the preference for previous opportunities should be ignored when we use word-break: break-all, this may not be the case for the scenario of breaking before the first space after a word. Let’s consider the same example, but now using the word-break: break-all feature:



.container {
  font: 20px/1 monospace;
  width: 4ch;
  white-space: break-spaces;
  overflow-wrap: break-word;
  word-break: break-all;
  border: 1px solid;
}

XX X X

The example above shows that using word-break: break-all doesn’t produce any effect. It’s debatable whether the use of break-all should force the selection of the breaking opportunity that produces the longest line, like it happens in the pre-wrap case described before. However, the spec states clearly that break-spaces should only allow breaking opportunities after white space characters. Hence, I considered that breaking before the first space should only happen if there is no other choice.

As a matter of fact, when specifying break-all we shouldn’t consider only previous white spaces in order to avoid breaking before the first white space after a word; the break-all feature creates additional breaking opportunities, since it allows breaking the word at any character. Since break-all is intended to produce the longest line possible, this new breaking opportunity should be chosen over any previous white space. See the following test case to get a clearer idea of this scenario:



.container {
  font: 20px/1 monospace;
  width: 4ch;
  white-space: break-spaces;
  overflow-wrap: break-word;
  word-break: break-all;
  border: 1px solid;
}

X XX X

Bear in mind that the expected rendering in the above example may not be obtained if your browser’s version is still affected by the bugs 197277 (Safari/WebKit) and 952254 (Chrome/Blink). In that case, the word is broken despite the opportunity in the previous white space, also avoiding a break after the ‘XX’ word, just before the white space.

There is an exception to the rule of avoiding breaking before the first white space after a word when there are previous opportunities, and it’s precisely the behavior the line-break: anywhere feature would provide. As I said, all these assumptions were not, in my opinion, clearly defined in the current spec, so I filed an issue for the CSS WG so that we can clarify when it’s allowed to break before the first space.

Current status and support

The intent-to-ship request for Chrome has been approved recently, so I’m confident the feature will be enabled by default in Chrome 76. However, it’s possible to try the feature in older versions by enabling the Experimental Web Platform Features flag. More details are in the corresponding Chrome Status entry. I want to highlight that I also implemented the feature for LayoutNG, the new layout engine that Chrome will eventually ship; this is very important to ensure the stability of the feature in future versions of Chrome.

In the case of Safari, the patch with the implementation of the feature landed in WebKit’s trunk in r244036, but since Apple doesn’t announce publicly when a new release of Safari will happen or which features it’ll ship, it’s hard to guess when the break-spaces feature will be available for web authors using that browser. Meanwhile, it’s possible to try the feature in the Safari Technology Preview 80.

Finally, while I haven’t see any signal of active development in Firefox, some of the Mozilla developers working on this area of the Gecko engine have shown public support for the feature.

The following table summarizes the support of the break-spaces feature in the 3 main browsers:

             Chrome   Safari    Firefox
Experimental M73      STP 80    Public support
Ship         M76      Unknown   Unknown

Web Platform Tests

At Igalia we believe that the Web Platform Tests project is a key piece to ensure the compatibility and interoperability of any development on the Web Platform. That’s why a substantial part of my work to implement this relatively small feature was the definition of enough tests to cover the new functionality and basic use cases of the feature.

white-space: pre-wrap-008, pre-wrap-015, pre-wrap-016, break-spaces-003, break-spaces-004, break-spaces-005, break-spaces-006, break-spaces-007, break-spaces-008, break-spaces-009
overflow-wrap: break-word-004, break-word-005, break-word-006, break-word-007, break-word-008
word-break: break-all-010, break-all-011, break-all-012, break-all-013, break-all-014, break-all-015

Implementation in several web engines

During the implementation of a browser feature, even a small one like this, it’s quite usual to find bugs and interoperability issues. Even though this may slow down the implementation of the feature, it’s also a source of additional Web Platform tests, and it may contribute to the robustness of the feature itself and the related CSS properties and values. That’s why I decided to implement the feature in parallel for the WebKit (Safari) and Blink (Chrome) engines, which I think helped to ensure interoperability and code maturity. This approach also helped me get a deeper understanding of the line breaking logic and its design and implementation in different web engines.

I think it’s worth mentioning some of these code architectural differences, to get a better understanding of the work and challenges this feature required until it reached web author’s browser.

Chrome/Blink engine

Let’s start with Chrome/Blink, which was especially challenging due to the fact that Blink is implementing a new layout engine (LayoutNG). The implementation for the legacy layout engine was the first step, since it ensures the feature arrives earlier, even behind an experimental runtime flag.

The legacy layout relies on the BreakingContext class to implement the line breaking logic for the inline layout operations. Its main characteristic is that it handles the white space breaking opportunities on its own, instead of using the TextBreakIterator (based on the ICU libraries), as it does for determining breaking opportunities between letters and/or symbols. This design implies too much complexity for even small changes like this, especially because it is very sensitive in terms of performance impact. In the following diagram I try to show a simplified view of the classes involved and the interactions implemented by this line breaking logic.

The LayoutNG line breaking logic is based on a new concept of fragments, mainly handled by the NGLineBreaker class. This new design simplifies the line breaking logic considerably; it’s highly optimized and adapted to get the most out of the TextBreakIterator classes and the ICU features. I tried to show a simplified view of this new design with the following diagram:

In order to describe the work done to implement the feature for this web engine, I’ll list the main bugs and patches landed during this time: CR#956465, CR#952254, CR#944063,CR#900727, CR#767634, CR#922437

Safari/WebKit engine

Although as time passes this becomes less probable, WebKit and Blink still share some of the layout logic from the ages prior to the fork. Blink engineers have applied important changes to the inline layout logic, both code refactoring and optimizations, but there are common design patterns that made it relatively easy to port to WebKit the patches that implemented the feature for Blink’s legacy layout. In WebKit, the line breaking logic is also implemented by the BreakingContext class, and it has a similar architecture, as described, in a quite simplified way, in the class diagram above (it uses different class names for the render/layout objects, though).

However, on the Mac and iOS platforms Safari supports a different code path for the line breaking logic, implemented in the SimpleLineLayout class. This class provides a different design for the line breaking logic and, similar to what Blink implements in LayoutNG, is based on a concept of text fragments. It also relies as much as possible on the TextBreakIterator, instead of implementing complex rules to handle white spaces and breaking opportunities. The following diagrams show this alternate design to implement the line breaking process.

This SimpleLineLayout code path is not supported by other WebKit ports (like WebKitGtk+ or WPE) and it’s not available either when using some CSS Text features or specific fonts. There are other limitations to the use of this SimpleLineLayout code path, which may lead to rendering the text using the BreakingContext class.

Again, this is the list of bugs that were solved to implement the feature for the WebKit engine: WK#197277, WK#196169, WK#196353, WK#195361, WK#177327, WK#197278

Conclusion

I hope that at this point these 2 facts are clear now:

  • The white-space: break-spaces feature is a very simple but powerful feature that provides a new line breaking behavior, inspired by unix terminals.
  • Although it’s a simple feature on paper (in the spec), it implies a considerable amount of work so that it reaches the browser and is available for web authors.

In this post I tried to explain in a simple way the main purpose of this new feature and also some interesting corner cases and combinations with other line breaking features. The demos I used showed 2 different use cases of the feature, but there are many more. I’m sure the creativity of web authors will push the feature to its limits; by then, I’ll be happy to answer doubts about the spec or the implementation in the web engines, and of course fix the bugs that may appear once the feature is more widely used.

Igalia logo
Bloomberg logo

Igalia and Bloomberg working together to build a better web

Finally, I want to thank Bloomberg for supporting the work to implement this feature. It’s another example of how non-browser vendors can influence the Web Platform and contribute with actual features that will be eventually available for web authors. This is the kind of vision that we need if we want to keep a healthy, open and independent Web Platform.

by jfernandez at June 10, 2019 08:11 PM

June 06, 2019

Eleni Maria Stea

Depth-aware upsampling experiments (Part 3.2: Improving the upsampling using normals to classify the samples)

This post is again about improving the upsampling of the half-resolution SSAO render target used in the VKDF sponza demo that was written by Iago Toral. I am going to explain how I used information from the normals to understand if the samples of each 2×2 neighborhood we check during the upsampling belong to the … Continue reading Depth-aware upsampling experiments (Part 3.2: Improving the upsampling using normals to classify the samples)

by hikiko at June 06, 2019 08:15 PM

June 05, 2019

Eleni Maria Stea

Depth-aware upsampling experiments (Part 3.1: Improving the upsampling using depths to classify the samples)

In my previous posts of these series I analyzed the basic idea behind the depth-aware upsampling techniques. In the first post [1], I implemented the nearest depth sampling algorithm [3] from NVIDIA and in the second one [2], I compared some methods that are improving the quality of the z-buffer downsampled data that I use … Continue reading Depth-aware upsampling experiments (Part 3.1: Improving the upsampling using depths to classify the samples)

by hikiko at June 05, 2019 07:41 PM

Depth-aware upsampling experiments (Part 2: Improving the Z-buffer downsampling)

In the previous post of these series, I tried to explain the nearest depth algorithm [1] that I used to improve Iago Toral‘s SSAO upscaling in the sponza demo of VKDF. Although the nearest depth was improving the ambient occlusion in higher resolutions the results were not very good, so I decided to try more … Continue reading Depth-aware upsampling experiments (Part 2: Improving the Z-buffer downsampling)

by hikiko at June 05, 2019 01:37 PM

June 04, 2019

Eleni Maria Stea

Some additions to vkrunner

A new option has been added to Vkrunner (the Vulkan shader testing tool written by Neil Roberts) to allow selecting the Vulkan device for each shader test. When the device id is not set, the default GPU is used. IDs start from 1 to match the convention of the VK-GL-CTS … Continue reading Some additions to vkrunner

by hikiko at June 04, 2019 12:30 PM

June 03, 2019

Andy Wingo

pictie, my c++-to-webassembly workbench

Hello, interwebs! Today I'd like to share a little skunkworks project with y'all: Pictie, a workbench for WebAssembly C++ integration on the web.

loading pictie...

JavaScript disabled, no pictie demo. See the pictie web page for more information.

wtf just happened????!?

So! If everything went well, above you have some colors and a prompt that accepts Javascript expressions to evaluate. If the result of evaluating a JS expression is a painter, we paint it onto a canvas.

But allow me to back up a bit. These days everyone is talking about WebAssembly, and I think with good reason: just as many of the world's programs run on JavaScript today, tomorrow much of it will also be in languages compiled to WebAssembly. JavaScript isn't going anywhere, of course; it's around for the long term. It's the "also" aspect of WebAssembly that's interesting, that it appears to be a computing substrate that is compatible with JS and which can extend the range of the kinds of programs that can be written for the web.

And yet, it's early days. What are programs of the future going to look like? What elements of the web platform will be needed when we have systems composed of WebAssembly components combined with JavaScript components, combined with the browser? Is it all going to work? Are there missing pieces? What's the status of the toolchain? What's the developer experience? What's the user experience?

When you look at the current set of applications targeting WebAssembly in the browser, mostly it's games. While compelling, games don't provide a whole lot of insight into the shape of the future web platform, inasmuch as there doesn't have to be much JavaScript interaction when you have an already-working C++ game compiled to WebAssembly. (Indeed, for much of the incidental interactions with JS that are currently necessary -- bouncing through JS in order to call WebGL -- people are actively working on removing all of that overhead, so that WebAssembly can call platform facilities (WebGL, etc) directly. But I digress!)

For WebAssembly to really succeed in the browser, there should also be incremental stories -- what does it look like when you start to add WebAssembly modules to a system that is currently written mostly in JavaScript?

To find out the answers to these questions and to evaluate potential platform modifications, I needed a small, standalone test case. So... I wrote one? It seemed like a good idea at the time.

pictie is a test bed

Pictie is a simple, standalone C++ graphics package implementing an algebra of painters. It was created not to be a great graphics package but rather to be a test-bed for compiling C++ libraries to WebAssembly. You can read more about it on its github page.

Structurally, pictie is a modern C++ library with a functional-style interface, smart pointers, reference types, lambdas, and all the rest. We use emscripten to compile it to WebAssembly; you can see more information on how that's done in the repository, or check the README.

Pictie is inspired by Peter Henderson's "Functional Geometry" (1982, 2002). "Functional Geometry" inspired the Picture language from the well-known Structure and Interpretation of Computer Programs computer science textbook.

prototype in action

So far it's been surprising how much stuff just works. There's still lots to do, but just getting a C++ library on the web is pretty easy! I advise you to take a look to see the details.

If you are thinking of dipping your toe into the WebAssembly water, maybe take a look also at Pictie when you're doing your back-of-the-envelope calculations. You can use it or a prototype like it to determine the effects of different compilation options on compile time, load time, throughput, and network traffic. You can check if the different binding strategies are appropriate for your C++ idioms; Pictie currently uses embind (source), but I would like to compare to WebIDL as well. You might also use it if you're considering what shape your C++ library should have to have a minimal overhead in a WebAssembly context.
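For reference, a minimal embind example of my own (not Pictie's actual bindings) looks like this; the Vec type and midpoint function are hypothetical stand-ins:

#include <emscripten/bind.h>

struct Vec { double x, y; };

Vec midpoint(const Vec& a, const Vec& b) {
  return Vec{ (a.x + b.x) / 2, (a.y + b.y) / 2 };
}

EMSCRIPTEN_BINDINGS(example) {
  // Expose the struct as a JS value object and the function by name.
  emscripten::value_object<Vec>("Vec")
      .field("x", &Vec::x)
      .field("y", &Vec::y);
  emscripten::function("midpoint", &midpoint);
}

After compiling with em++ --bind, JavaScript can call Module.midpoint({x: 0, y: 0}, {x: 2, y: 2}) and get back {x: 1, y: 1}.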

I use Pictie as a test-bed when working on the web platform: for the weakref proposal, which adds finalization; for leak detection; and for working on the binding layers around Emscripten. Eventually I'll be able to use it in other contexts as well, with the WebIDL bindings proposal, typed objects, and GC.

prototype the web forward

As the browser and adjacent environments have come to dominate programming in practice, we lost a bit of the delightful variety from computing. JS is a great language, but it shouldn't be the only medium for programs. WebAssembly is part of this future world, waiting in potentia, where applications for the web can be written in any of a number of languages. But this future world will only arrive if it "works" -- if all of the various pieces, from standards to browsers to toolchains to virtual machines, fit together in some kind of sensible way. Now is the early phase of annealing, when the platform as a whole is actively searching for its new low-entropy state. We're going to need a lot of prototypes to get from here to there. In that spirit, may your prototypes be numerous and soon replaced. Happy annealing!

by Andy Wingo at June 03, 2019 10:10 AM

May 28, 2019

Eleni Maria Stea

Depth-aware upsampling experiments (Part 1: Nearest depth)

This post is about different depth aware techniques I tried in order to improve the upsampling of the low resolution Screen Space Ambient Occlusion (SSAO) texture of a VKDF demo. VKDF is a library and collection of Vulkan demos, written by Iago Toral. In one of his demos (the sponza), Iago implemented SSAO among many … Continue reading Depth-aware upsampling experiments (Part 1: Nearest depth)

by hikiko at May 28, 2019 08:42 PM

May 24, 2019

Andy Wingo

lightening run-time code generation

The upcoming Guile 3 release will have just-in-time native code generation. Finally, amirite? There's lots that I'd like to share about that and I need to start somewhere, so this article is about one piece of it: Lightening, a library to generate machine code.

on lightning

Lightening is a fork of GNU Lightning, adapted to suit the needs of Guile. In fact at first we chose to use GNU Lightning directly, "vendored" into the Guile source repository via the git subtree mechanism. (I see that in the meantime, git gained a kind of a subtree command; one day I will have to figure out what it's for.)

GNU Lightning has lots of things going for it. It has support for many architectures, even things like Itanium that I don't really care about but which a couple of Guile users use. It abstracts the differences between e.g. x86 and ARMv7 behind a common API, so that in Guile I don't need to duplicate the JIT for each back-end. Such an abstraction can have a slight performance penalty, because it may miss opportunities to generate optimal code, but this is acceptable to me: I was more concerned about the maintenance burden, and GNU Lightning seemed to solve that nicely.

GNU Lightning also has fantastic documentation. It's written in C and not C++, which is the right thing for Guile at this time, and it's also released under the LGPL, which is Guile's license. As it's a GNU project there's a good chance that GNU Guile's needs might be taken into account if any changes need be made.

I mentally associated the project with Paolo Bonzini, who I knew was a good no-nonsense hacker, as he used Lightning for a Smalltalk implementation; and I knew also that Matthew Flatt used Lightning in Racket. Then I looked in the source code to see architecture support and was pleasantly surprised to see MIPS, POWER, and so on, so I went with GNU Lightning for Guile in our 2.9.1 release last October.

on lightening the lightning

When I chose GNU Lightning, I had in mind that it was a very simple library to cheaply write machine code into buffers. (Incidentally, if you have never worked with this stuff, I remember a time when I was pleasantly surprised to realize that an assembler could be a library and not just a program that processes text. A CPU interprets machine code. Machine code is just bytes, and you can just write C (or Scheme, or whatever) functions that write bytes into buffers, and pass those buffers off to the CPU. Now you know!)

Anyway, GNU Lightning 1.4 or so was indeed that very simple library that I had in my head. I needed simple because I would need to debug any problems that came up, and I didn't want to add more complexity to the C side of Guile -- eventually I should be migrating this code over to Scheme anyway. And, of course, simple can mean fast, and I needed fast code generation.

However, GNU Lightning has a new release series, the 2.x series, which is in a way a rewrite of the old version. On the plus side, this new series adds all of the weird architectures that I was pleasantly surprised to see. The old 1.4 didn't even have much x86-64 support, much less AArch64.

This new GNU Lightning 2.x series fundamentally changes the way the library works: instead of having a jit_ldr_f function that directly emits code to load a float from memory into a floating-point register, the jit_ldr_f function now creates a node in a graph. Before code is emitted, that graph is optimized, some register allocation happens around call sites and for temporary values, dead code is elided, and so on, then the graph is traversed and code emitted.

Unfortunately this wasn't really what I was looking for. The optimizations were a bit opaque to me and I just wanted something simple. Building the graph took more time than just emitting bytes into a buffer, and it takes more memory as well. When I found bugs, I couldn't tell whether they were related to my usage or in the library itself.

In the end, the node structure wasn't paying its way for me. But I couldn't just go back to the 1.4 series that I remembered -- it didn't have the architecture support that I needed. Faced with the choice between changing GNU Lightning 2.x in ways that went counter to its upstream direction, switching libraries, or refactoring GNU Lightning into something that I needed, I chose the last option.

in which our protagonist cannot help himself

Friends, I regret to admit: I named the new thing "Lightening". True, it is a lightened Lightning, yes, but I am aware that it's horribly confusing. Pronounced almost the same, visually almost identical -- I am a bad person. Oh well!!

I ported some of the existing GNU Lightning backends over to Lightening: ia32, x86-64, ARMv7, and AArch64. I deleted the backends for Itanium, HPPA, Alpha, and SPARC; they have no Debian ports and there is no situation in which I can afford to do QA on them. I would gladly accept contributions for PPC64, MIPS, RISC-V, and maybe S/390. At this point I reckon it takes around 20 hours to port an additional backend from GNU Lightning to Lightening.

Incidentally, if you need a code generation library, consider your choices wisely. It is likely that Lightening is not right for you. If you can afford platform-specific code and you need C, Lua's DynASM is probably the right thing for you. If you are in C++, copy the assemblers from a JavaScript engine -- C++ offers much more type safety, capabilities for optimization, and ergonomics.

But if you can only afford one emitter of JIT code for all architectures, you need simple C, you don't need register allocation, you want a simple library to just include in your source code, and you are good with the LGPL, then Lightening could be a thing for you. Check the gitlab page for info on how to test Lightening and how to include it into your project.

giving it a spin

Yesterday's Guile 2.9.2 release includes Lightening, so you can give it a spin. The switch to Lightening allowed us to lower our JIT optimization threshold by a factor of 50, letting us generate fast code sooner. If you try it out, let #guile on freenode know how it went. In any case, happy hacking!

by Andy Wingo at May 24, 2019 08:44 AM

May 23, 2019

Andy Wingo

bigint shipping in firefox!

I am delighted to share with folks the results of a project I have been helping out on for the last few months: implementation of "BigInt" in Firefox, which is finally shipping in Firefox 68 (beta).

what's a bigint?

BigInts are a new kind of JavaScript primitive value, like numbers or strings. A BigInt is a true integer: it can take on the value of any finite integer (subject to some arbitrarily large implementation-defined limits, such as the amount of memory in your machine). This contrasts with JavaScript number values, which have the well-known property of only being able to precisely represent integers between -2^53 and 2^53.

BigInts are written like "normal" integers, but with an n suffix:

var a = 1n;
var b = a + 42n;
b << 64n
// result: 793209995169510719488n

With the bigint proposal, the usual mathematical operations (+, -, *, /, %, <<, >>, **, and the comparison operators) are extended to operate on bigint values. As a new kind of primitive value, bigint values have their own typeof:

typeof 1n
// result: 'bigint'

Besides allowing for more kinds of math to be easily and efficiently expressed, BigInt also allows for better interoperability with systems that use 64-bit numbers, such as "inodes" in file systems, WebAssembly i64 values, high-precision timers, and so on.

You can read more about the BigInt feature over on MDN, as usual. You might also like this short article on BigInt basics that V8 engineer Mathias Bynens wrote when Chrome shipped support for BigInt last year. There is an accompanying language implementation article as well, for those of y'all that enjoy the nitties and the gritties.

can i ship it?

To try out BigInt in Firefox, simply download a copy of Firefox Beta. This version of Firefox will be fully released to the public in a few weeks, on July 9th. If you're reading this in the future, I'm talking about Firefox 68.

BigInt is also already shipping in V8 and Chrome, and my colleague Caio Lima has a project in progress to implement it in JavaScriptCore / WebKit / Safari. Depending on your target audience, BigInt might be deployable already!

thanks

I must mention that my role in the BigInt work was relatively small; my Igalia colleague Robin Templeton did the bulk of the BigInt implementation work in Firefox, so large ups to them. Hearty thanks also to Mozilla's Jan de Mooij and Jeff Walden for their patient and detailed code reviews.

Thanks as well to the V8 engineers for their open source implementation of BigInt fundamental algorithms, as we used many of them in Firefox.

Finally, I need to make one big thank-you, and I hope that you will join me in expressing it. The road to ship anything in a web browser is long; besides the "simple matter of programming" that it is to implement a feature, you need a specification with buy-in from implementors and web standards people, you need a good working relationship with a browser vendor, you need willing technical reviewers, you need to follow up on the inevitable security bugs that any browser change causes, and all of this takes time. It's all predicated on having the backing of an organization that's foresighted enough to invest in this kind of long-term, high-reward platform engineering.

In that regard I think all people that work on the web platform should send a big shout-out to Tech at Bloomberg for making BigInt possible by underwriting all of Igalia's work in this area. Thank you, Bloomberg, and happy hacking!

by Andy Wingo at May 23, 2019 12:13 PM

May 21, 2019

Adrián Pérez

WPE WebKit 2.24

While WPE WebKit 2.24 has now been out for a couple of months, it includes over a year of development effort since our first official release, which means there is plenty to talk about. Let's dive in!

API & ABI Stability

The public API for WPE WebKit has been essentially unchanged since the 2.22.x releases, and we consider it now stable and its version has been set to 1.0. The pkg-config modules for the main components have been updated accordingly and are now named wpe-1.0 (for libwpe), wpebackend-fdo-1.0 (the FDO backend), and wpe-webkit-1.0 (WPE WebKit itself).

Our plan for the foreseeable future is to keep the WPE WebKit API backwards-compatible in the upcoming feature releases. On the other hand, the ABI might change, but will be kept compatible if possible, on a best-effort basis.

Both API and ABI are always guaranteed to remain compatible inside the same stable release series, and we are trying to follow a strict “no regressions allowed” policy for patch releases. We have added a page on the Web site which summarizes the WPE WebKit release schedule and this API/ABI stability guarantee.

This should allow distributors to always ship the latest available point release in a stable series. This is something we always strongly recommend, because point releases almost always include fixes for security vulnerabilities.

Security

Web engines are security-critical software components, on which users rely every day for visualizing and manipulating sensitive information like personal data, medical records, or banking information—to name a few. Having regular releases means that we are able to publish periodical security advisories detailing the vulnerabilities fixed by them.

As WPE WebKit and WebKitGTK share a number of components with each other, advisories and the releases containing the corresponding fixes are published in sync, typically on the same day.

The team takes security seriously, and we are always happy to receive notice of security bugs. We ask reporters to act responsibly and observe the WebKit security policy for guidance.

Content Filtering

This new feature provides access to the WebKit internal content filtering engine, the one also used by Safari content blockers. The implementation is quite interesting: filtering rule sets are written as JSON documents, which are parsed and compiled to a compact bytecode representation that a tiny virtual machine executes for every resource load. This way, deciding whether a resource load should be blocked adds very little overhead, at the cost of a (potentially) slow initial compilation. To give you an idea: converting the popular EasyList rules to JSON results in a ~15 MiB file that can take up to three seconds to compile on the ARM processors typically used in embedded devices.

In order to penalize application startup as little as possible, the new APIs are fully asynchronous and compilation is offloaded to a worker thread. On top of that, compiled rule sets are cached on disk to be reused across different runs of the same application (see WebKitUserContentFilterStore for details). Last but not least, the compiled bytecode is mapped in memory and shared among all the processes which need it: a browser with many tabs open will use practically the same amount of memory for content filtering as one with a single Web page loaded. The implementation is shared by the GTK and WPE WebKit ports.

I had been interested in implementing support for content filtering even before the WPE WebKit port existed, with the goal of replacing the ad blocker in GNOME Web. Some of the code had been lying around in a branch since the 2016 edition of the Web Engines Hackfest; it moved from my old laptop to the current one, and I worked on it on and off while the different patches needed to make it work slowly landed in the WebKit repository—one of the patches went through as many as seventeen revisions! At the moment I am still working on replacing the ad blocker in Web—in my free time—which I expect will be ready for GNOME 3.34.

It's All Text!

No matter how much the Web has evolved over the years, almost every Web site out there still needs textual content. This is one department where 2.24.x shines: text rendering.

Carlos García has been our typography hero during the development cycle: he single-handedly implemented support for variable fonts (demo), fixed our support for composite emoji (like 🧟‍♀️, composed of the glyphs “woman” and “zombie”), and improved the typeface selection algorithm to prefer coloured fonts for emoji. Additionally, many other subtle issues have been fixed, and the latest two patch releases include important fixes for text rendering.

Tip: WPE WebKit uses locally installed fonts as fallback. You may want to install at least one coloured font like Twemoji, which will ensure emoji glyphs can always be displayed.

API Ergonomics

GLib 2.44 added a nifty feature back in 2015: automatic cleanup of variables when they go out of scope using g_auto, g_autofree, and g_autoptr.

We have added the needed bits in the headers to allow their usage with the types from the WPE WebKit API. This enables developers to write code that is less likely to introduce accidental memory leaks, because they do not need to remember to free resources manually:

WebKitWebView* create_view (void)
{
    g_autoptr(WebKitWebContext) ctx = webkit_web_context_new ();
    /*
     * Configure "ctx" to your liking here. At the end of the scope (this
     * function), a g_object_unref(ctx) call will be automatically done.
     */
    return webkit_web_view_new_with_context (ctx);
}

Note that this does not change the API (nor the ABI). You will need to build your applications with GCC or Clang to make use of this feature.

“Featurism” and “Embeddability”

Look at that, I just coined two new “technobabble” terms!

There are many other improvements shipping right now in WPE WebKit. The following list highlights the main user- and developer-visible features that can be found in the 2.24.x versions:

  • A new GObject-based API for JavaScriptCore.
  • A fairly complete WebDriver implementation. There is a patch for supporting WPE WebKit in Selenium pending integration. Feel free to vote 👍 for it to be merged.
  • WPEQt, which provides an idiomatic API similar to that of QWebView and allows embedding WPE WebKit as a widget in Qt/QML applications.
  • Support for the JPEG2000 image format. Michael Catanzaro has written about the reasoning behind this in his write-up about WebKitGTK 2.24.
  • Allow configuring the background of the WebKitWebView widget. Translucent backgrounds work as expected, which allows for novel applications like overlaying Web content on top of video streams.
  • An opt-in 16bpp rendering mode, which can be faster in some cases—remember to measure and profile on your target hardware! For now this only works with the RGB565 pixel format (see the sketch after this list), which is the most common one used in embedded devices where 24bpp and 32bpp modes are not available.
  • Support for hole-punching using external media players. Note that at the moment there is no public API for this and you will need to patch the WPE WebKit code to plug your playback engine.

Despite all the improvements and features, the main focus of WPE WebKit is still to provide an embeddable Web engine. Fear not: new features are either opt-in (e.g. 16bpp rendering), disabled by default and adding no overhead unless enabled (WebDriver, background color), or have no measurable impact at all (g_autoptr). Not to mention that many features can even be disabled at build time, bringing smaller binaries and a smaller runtime footprint—but that would be a topic for another day.

by aperez (adrian@perezdecastro.org) at May 21, 2019 07:00 PM

Eleni Maria Stea

A simple pixel shader viewer

In a previous post, I wrote about Vkrunner, and how I used it to play with fragment shaders. While I was writing the shaders for it, I had to save them, generate a PPM image and display it to see the changes. This render to image/display repetition gave me the idea to write a minimal … Continue reading A simple pixel shader viewer

by hikiko at May 21, 2019 05:52 AM

May 14, 2019

Javier Muñoz

Cephalocon Barcelona 2019

Next week I will attend Cephalocon 2019. It will take place on 19 and 20 May in Barcelona.

I will deliver a talk, under the sponsorship of my company Igalia, about Ceph Object Storage and the RGW/S3 service layer.

In this talk, I will share my experience contributing new features and bugfixes upstream that were developed through open projects in the community.

I will also review some of the contributions from Jewel to Nautilus and their impact, from the product/service perspective, on users and companies investing in upstream development.

Cephalocon 2019 is our second international conference and it aims to bring together more than 800 technologists and adopters from across the globe to showcase the history and future of Ceph, demonstrate real-world applications and highlight vendor solutions.

Attendee registration is still open. You can find more information about the event and how to register on the official event page. The complete schedule is also available.

See you there!

by Javier at May 14, 2019 10:00 PM

Alicia Boya

validateflow: A new tool to test GStreamer pipelines

It has been a while since GstValidate has been available. GstValidate has made it easier to write integration tests that check that playback and transcoding work as expected while executing actions (like seeking, changing subtitle tracks, etc.); testing at a high level rather than checking the exact, fine-grained data flow.

As GStreamer is applied to an ever wider variety of cases, testing often becomes cumbersome for those cases that bear less resemblance to typical playback. On one hand there is the C testing framework intended for unit tests, which is admittedly low level. Even when using something like GstHarness, checking that an element outputs the correct buffers and events requires a lot of manual coding. On the other hand, gst-validate has so far focused mostly on assets that can be played with a typical playbin, requiring extra effort and coding for the less straightforward cases.

This has historically left many specific test cases in that middle ground without an effective way to be tested. validateflow attempts to fill this gap by allowing gst-validate to check that custom pipelines, when acted on in a certain way, produce the expected result.

validateflow itself is a GstValidate plugin that monitors the buffers and events flowing through a given pad and records them in a log file. The first time a test is run, this log becomes the expectation log. Further executions of the test still create a new log file, but this time it’s compared against the expectation log. Any difference is reported as an error. The user can rebaseline the tests by removing the expectation log file and running the test again. This is very similar to how many web browser tests work (e.g. Web Platform Tests).

How to get it

validateflow has landed recently in the development versions of GStreamer. Until 1.16 is released, you can use it by checking out the latest master branches of the GStreamer subprojects, preferably with something like gst-build.

Make sure to update both gst-devtools and gst-integration-testsuites. Update the latter by running the following command, which will update the repo and fetch media files; otherwise you will get errors.

gst-validate-launcher --sync -L

Writing tests

The usual way to use validateflow is through pipelines.json, a file parsed by the validate test suite (the one run by default by gst-validate-launcher) where all the necessary elements of a validateflow test can be placed together.

For instance:

"qtdemux_change_edit_list":
{
    "pipeline": "appsrc ! qtdemux ! fakesink async=false",
    "config": [
        "%(validateflow)s, pad=fakesink0:sink, record-buffers=false"
    ],
    "scenarios": [
        {
            "name": "default",
            "actions": [
                "description, seek=false, handles-states=false",
                "appsrc-push, target-element-name=appsrc0, file-name=\"%(medias)s/fragments/car-20120827-85.mp4/init.mp4\"",
                "appsrc-push, target-element-name=appsrc0, file-name=\"%(medias)s/fragments/car-20120827-85.mp4/media1.mp4\"",
                "checkpoint, text=\"A moov with a different edit list is now pushed\"",
                "appsrc-push, target-element-name=appsrc0, file-name=\"%(medias)s/fragments/car-20120827-86.mp4/init.mp4\"",
                "appsrc-push, target-element-name=appsrc0, file-name=\"%(medias)s/fragments/car-20120827-86.mp4/media2.mp4\"",
                "stop"
            ]
        }
    ]
},

These are:

  • pipeline: A string with the same syntax as gst-launch describing the pipeline to use. Python string interpolation can be used to get the path to the medias directory (where audio and video assets are placed in the gst-integration-testsuites repo) by writing %(medias)s. It can also be used to get a video or audio sink that can be muted, with %(videosink)s or %(audiosink)s.
  • config: A validate configuration file. Among other things that can be set here, validateflow overrides are defined, one per line, starting with %(validateflow)s, which expands to validateflow plus some options defining where the logs will be written (which depends on the test name). Each override monitors one pad; the settings here define which pad, and what will be recorded.
  • scenarios: Usually a single scenario is provided. A series of actions performed in order on the pipeline. These are normal GstValidate scenarios, but new actions have been added, e.g. for controlling appsrc elements (so that you can push chunks of data in several steps instead of relying on a filesrc to push a whole file and be done with it).
    Running tests

    The tests defined in pipelines.json are automatically run by default when running gst-validate-launcher, since they are part of the default test suite.

    You can get the list of all the pipelines.json tests like this:

    gst-validate-launcher -L |grep launch_pipeline
    

    You can use these test names to run specific tests. The -v flag is useful to see the actions as they are executed. --gdb runs the test inside the GNU debugger.

    gst-validate-launcher -v validate.launch_pipeline.qtdemux_change_edit_list.default
    

    In the command line argument above, validate. defines the name of the test suite Python file, testsuites/validate.py. The rest, launch_pipeline.qtdemux_change_edit_list.default, is actually a regex: the . just happens to match a period, but it would match any character (it would be more correct, albeit also more inconvenient, to use \. instead). You can use this feature to run several related tests, for instance:

    $ gst-validate-launcher -m 'validate.launch_pipeline\.appsrc_.*'
    
    Setting up GstValidate default tests
    
    [3 / 3]  validate.launch_pipeline.appsrc_preroll_test.single_push: Passed
    
    Statistics:
    -----------                                                  
    
               Total time spent: 0:00:00.369149 seconds
    
               Passed: 3
               Failed: 0
               ---------
               Total: 3
    

    Expectation files are stored in a directory named flow-expectations, e.g.:

    ~/gst-validate/gst-integration-testsuites/flow-expectations/qtdemux_change_edit_list/log-fakesink0:sink-expected
    

    The actual output log (which is compared to the expectations) is stored as a log file, e.g.:

    ~/gst-validate/logs/qtdemux_change_edit_list/log-fakesink0:sink-actual
    

    Here is how a validateflow log looks.

    event stream-start: GstEventStreamStart, flags=(GstStreamFlags)GST_STREAM_FLAG_NONE, group-id=(uint)1;
    event caps: video/x-h264, stream-format=(string)avc, alignment=(string)au, level=(string)2.1, profile=(string)main, codec_data=(buffer)014d4015ffe10016674d4015d901b1fe4e1000003e90000bb800f162e48001000468eb8f20, width=(int)426, height=(int)240, pixel-aspect-ratio=(fraction)1/1;
    event segment: format=TIME, start=0:00:00.000000000, offset=0:00:00.000000000, stop=none, time=0:00:00.000000000, base=0:00:00.000000000, position=0:00:00.000000000
    event tag: GstTagList-stream, taglist=(taglist)"taglist\,\ video-codec\=\(string\)\"H.264\\\ /\\\ AVC\"\;";
    event tag: GstTagList-global, taglist=(taglist)"taglist\,\ datetime\=\(datetime\)2012-08-27T01:00:50Z\,\ container-format\=\(string\)\"ISO\\\ fMP4\"\;";
    event tag: GstTagList-stream, taglist=(taglist)"taglist\,\ video-codec\=\(string\)\"H.264\\\ /\\\ AVC\"\;";
    event caps: video/x-h264, stream-format=(string)avc, alignment=(string)au, level=(string)2.1, profile=(string)main, codec_data=(buffer)014d4015ffe10016674d4015d901b1fe4e1000003e90000bb800f162e48001000468eb8f20, width=(int)426, height=(int)240, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)24000/1001;
    
    CHECKPOINT: A moov with a different edit list is now pushed
    
    event caps: video/x-h264, stream-format=(string)avc, alignment=(string)au, level=(string)3, profile=(string)main, codec_data=(buffer)014d401effe10016674d401ee8805017fcb0800001f480005dc0078b168901000468ebaf20, width=(int)640, height=(int)360, pixel-aspect-ratio=(fraction)1/1;
    event segment: format=TIME, start=0:00:00.041711111, offset=0:00:00.000000000, stop=none, time=0:00:00.000000000, base=0:00:00.000000000, position=0:00:00.041711111
    event tag: GstTagList-stream, taglist=(taglist)"taglist\,\ video-codec\=\(string\)\"H.264\\\ /\\\ AVC\"\;";
    event tag: GstTagList-stream, taglist=(taglist)"taglist\,\ video-codec\=\(string\)\"H.264\\\ /\\\ AVC\"\;";
    event caps: video/x-h264, stream-format=(string)avc, alignment=(string)au, level=(string)3, profile=(string)main, codec_data=(buffer)014d401effe10016674d401ee8805017fcb0800001f480005dc0078b168901000468ebaf20, width=(int)640, height=(int)360, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)24000/1001;
    

    Prerolling and appsrc

    By default scenarios don’t start executing actions until the pipeline is playing. Also by default sinks require a preroll for that to occur (that is, a buffer must reach the sink before the state transition to playing is completed).

    This poses a problem for scenarios using appsrc, as no action will be executed until a buffer reaches the sink, but a buffer can only be pushed as the result of an appsrc-push action, creating a chicken and egg problem.

    For many cases that don’t require playback we can solve this simply by disabling prerolling altogether, setting async=false in the sinks.

    For cases where prerolling is desired (like playback), handles-states=true should be set in the scenario description. This makes the scenario actions run without having to wait for a state change. appsrc-push will notice the pipeline is in a state where buffers can’t flow and will enqueue the buffer without waiting for it, so that the next action can run immediately. Then the set-state action can be used to set the state of the pipeline to playing, which will let the appsrc emit the buffer.

    description, seek=false, handles-states=true
    appsrc-push, target-element-name=appsrc0, file-name="raw_h264.0.mp4"
    set-state, state=playing
    appsrc-eos, target-element-name=appsrc0
    

    Documentation

    The documentation of validateflow, explaining its usage in more detail, can be found here:

    https://gstreamer.freedesktop.org/documentation/gst-devtools-1.0/plugins/validateflow.html

    by aboya at May 14, 2019 02:13 PM

    May 06, 2019

    Eleni Maria Stea

    Vkrunner allows specifying the required Vulkan version

    The required Vulkan implementation version for a Vkrunner shader test can now be specified in its [require] section. Tests that are targeting Vulkan versions that aren’t supported by the device driver will be skipped. As a reminder, vkrunner is a Vulkan shader testing tool similar to piglit that was written by … Continue reading Vkrunner allows specifying the required Vulkan version

    by hikiko at May 06, 2019 07:11 PM

    Having fun with Vkrunner!

    Vkrunner is a Vulkan shader testing tool similar to Piglit, written by Neil Roberts. It is mostly used by graphics drivers developers, and was also part of the official Khronos conformance tests suite repository (VK-GL-CTS) for some time [1]. There are already posts [2] about its use but they are all written from a driver … Continue reading Having fun with Vkrunner!

    by hikiko at May 06, 2019 07:10 PM

    April 29, 2019

    Asumu Takikawa

    WebAssembly in Redex

    Recently I’ve been studying the semantics of WebAssembly (wasm). If you’ve never heard of WebAssembly, it’s a new web programming language, supported by multiple browsers, that’s planned to be a low-level analogue to JavaScript. Wasm could be used as a compilation target for a variety of new frontend languages for web programming.

    (see also Lin Clark’s illustrated blog series on WebAssembly for a nice introduction)

    As a cross-browser effort, the language comes with a detailed independent specification. A very nice aspect of the WebAssembly spec is that the language’s semantics are specified precisely with a small-step operational semantics (it’s even a reduction semantics in the style of Felleisen and Hieb).

    A condensed version of the wasm semantics was presented in a 2017 PLDI paper written by Andreas Haas and co-authors.

    (the current semantics of the full language is available at the WebAssembly spec website)

    In this blog post, I’m going to share some work I’ve done to make an executable version of the operational semantics from the 2017 paper. In the process, I’ll try to explain a few of the interesting design choices of wasm via the semantics.

    A good thing about operational semantics is that they are relatively easy to understand and can be easily converted into an executable form. You can think of them as basically an interpreter for your programming language that is specified precisely using mathematical relations defined over a formal grammar.

    Specifying a language in this way is helpful for proving various properties about your language. It’s also an implementation independent way to understand how programs will evaluate, which you can use to validate a production implementation.

    Because of the way that operational semantics resemble interpreters, you can construct executable operational semantics using domain specific modeling languages designed for that purpose (Redex, K, etc).

    (Note: wasm already comes with a reference interpreter written in OCaml, but it’s not using a modeling language in the sense described here)

    The modeling language I’ll use in this blog post is Redex, a DSL hosted in Racket that is designed for reduction semantics.

    Why Make an Executable Formal Semantics?

    If you’re not familiar with semantics modeling tools, you might be wondering what it even means to write an executable model. The basic process is to take the formal grammars and rules presented in an operational semantics and transcribe them as programs written in a specialized modeling language. The resulting interpreters can be used to run examples in the modeled language.

    The advantage to the executable model is that you can apply software engineering principles like randomized testing on your language semantics to see if there are bugs in your specification.

    Another benefit that you get is visualization, which can be helpful for understanding how specific programs execute.

    For example, here’s a screenshot showing a trace visualization for executing a factorial program in wasm:

    step-through of a wasm program
    Stepping through a function call in Redex

    This is a custom visualization that I made leveraging Redex’s built-in traces function, which lets you visualize every step of the reduction process. Here it shows a trace starting with a function call to an implementation of factorial. Each box shows a visual representation of the instructions for that machine state. The arrows show the order of reduction and which rule was used to produce the next state.

    In the next section I’m going to go over some background about semantics and modeling in Redex that’s needed to understand how the wasm model works (you can skip this if you’re already familiar with operational semantics and Redex).

    Redex & Reduction Semantics Basics

    As mentioned above, the basic idea of reduction semantics is to define a kind of formal interpreter for your programming language. In order to define this interpreter, you need to define (1) your language as a grammar, and (2) what the states of the machine that runs your programs look like.

    In a simple language, the machine states might just be the program expressions in your language. But when you start to deal with more complicated features like memory and side effects, your machine may need additional components (like a store for representing memory, a representation of stack frames, and so on).

    In Redex, language definitions are made using the define-language form. It defines a grammar for your language using a straightforward pattern-based language that looks like BNF.

    A simple grammar might look like the following:

    (define-language simple-language
      ;; values
      (v ::= number)
    
      ;; expressions
      (e ::= v
             (add e e)
             (if e e e))
    
      ;; evaluation contexts
      (E ::= hole
             (add E e)
             (add v E)
             (if E e e)))
    

    The define-language form takes sets of non-terminals (e.g., e or v) paired with productions for terms in the language (e.g., (add e e)) and creates a language description bound to the given name (i.e., simple-language).

    This is a small and simple language with only conditionals, numbers, and basic addition. It accepts terms like (term 5), (term (add (add 1 2) 3)), (term (if 0 (add 1 3) (if 1 1 2))), and so on.

    (term is a special Redex form for constructing language terms)

    In order to actually evaluate programs, we will need to define a reduction relation, which describes how machine states reduce to other states. In this simple language, states are just expressions.

    Here’s a very simple reduction relation:

    (define simple-red-v0
      (reduction-relation simple-language
        ;; machine states are just expressions "e"
        #:domain e
    
        (--> (add v_1 v_2)               ; pattern match
             ,(+ (term v_1) (term v_2))  ; result of reduction
             ;; this part is just the name of the rule
             addition)
    
        ;; two rules for conditionals, true and false
        (--> (if 1 e_t e_f) ; the _ is used to name pattern occurences, like e_t
             e_t
             if-true)
    
        (--> (if 0 e_t e_f) e_f
             if-false)))
    

    You can read this reduction-relation form as describing a relation on the states e of the language, where the --> clauses define specific rules.

    Each --> describes how to reduce terms that match a pattern, like (add v_1 v_2), into a result term. The , allows an escape from the Redex DSL into full Racket to do arbitrary computations, like arithmetic in this case.

    With this reduction relation in hand, we can evaluate examples in the language using the test-->> unit test form, which will recursively apply the reduction relation until it’s done:

    (test-->> simple-red-v0 (term (add 3 4)) (term 7))
    (test-->> simple-red-v0 (term (if 0 1 2)) (term 2))
    

    These tests will succeed, showing that the first term evaluates to the second in each case.

    Unfortunately, this isn’t quite enough to evaluate more complex terms:

    > (test-->> simple-red-v0 (term (add (if 0 (add 3 4) (add 18 2)) 5)) (term 25))
    FAILED ::2666
    expected: 25
      actual: '(add (if 0 (add 3 4) (add 18 2)) 5)
    

    This is because the reduction relation above is defined for every form in the language, but it doesn’t tell you how to reduce inside nested expressions. For example, there is no explicit rule for evaluating an if nested inside an add.

    Of course, writing rules for all these nested combinations doesn’t make sense. That’s where evaluation contexts come in. An evaluation context like E (see the grammar from earlier) defines where reduction can happen. Anywhere that a hole exists in an evaluation context is a reducible spot.

    So for example, (add 3 hole) and (add hole (if 0 1 2)) are valid evaluation contexts, following the productions in the grammar. On the other hand, (if (add 1 -1) hole 5) is not a valid evaluation context, because you can’t evaluate the “then” branch of a conditional before evaluating the condition.

    You can combine evaluation contexts with an in-hole form to do context-based matching in reduction relations. Adding that changes the above reduction relation to this one:

    (define simple-red-v1
      (reduction-relation simple-language
        #:domain e
    
        ;; note the in-hole that's added
        (--> (in-hole E (add v_1 v_2))
             (in-hole E ,(+ (term v_1) (term v_2)))
             addition)
    
        (--> (in-hole E (if 1 e_t e_f))
             (in-hole E e_t)
             if-true)
    
        (--> (in-hole E (if 0 e_t e_f))
             (in-hole E e_f)
             if-false)))
    

    With this new relation, the test from before will succeed:

    (test-->> simple-red-v1 (term (add (if 0 (add 3 4) (add 18 2)) 5)) (term 25))
    

    This is because the in-hole pattern will match an evaluation context whose hole is substituted with a term that will be reduced.

    To make that a bit more concrete, the test above would execute a pattern match like this:

    > (redex-match
       simple-language
       (in-hole E (if 0 e_1 e_2))
       (term (add (if 0 (add 3 4) (add 18 2)) 5)))
    (list
     (match
      (list
       (bind 'E '(add hole 5))
       (bind 'e_1 '(add 3 4))
       (bind 'e_2 '(add 18 2)))))
    

    This interaction shows that the evaluation context pattern E gets matched to an add term with a hole. Inside that hole is an if expression, which is where the reduction will happen.

    Evaluation contexts are powerful in that you can describe all of these nested computations with straightforward rules that only mention the inner term that is being reduced. They are also useful for describing more complicated language features, such as mutation and state.

    With that brief tour of Redex, the next section will give a high-level overview of a model of wasm in Redex.

    WebAssembly Design via Redex

    WebAssembly is an interesting language because its design goals are specifically oriented towards web programming. That means, for example, that programs should be compact so that web browsers don’t have to download and process large scripts.

    Security is of course also a major concern on the web, so the language is designed with isolation in mind to ensure that programs cannot access unnecessary state or interfere with the runtime system’s data.

    The desire for “a compact program representation” (Haas et al.) led to a stack-based design, in contrast to the simple nested expression language I gave as an example earlier.

    This means that operations take values from the stack, rather than nesting them in a tree-like structure. For example, instead of an expression like (add 3 4), wasm would use something like (i32.const 3) (i32.const 4) add. This sequence of instructions pushes two constants onto the stack, and then performs an addition with values popped from the stack (and then pushes the result).

    Wasm is also statically typed with a very simple type system. The validation provided by the type system ensures that you can statically know the stack layout at any point in the program, which means that you can’t have programs that access out-of-bounds positions in the stack (or the wrong registers, if you compile stack accesses to register accesses).

    By understanding the semantics, either in math or in model code, you can see how these security concerns are addressed operationally. For example, later in this section I’ll explain how the operational semantics demonstrates wasm’s memory isolation.

    To provide a starting point for explaining the Redex model, here’s a screenshot of the grammar of wasm from Haas et al.’s paper:

    wasm grammar
    Grammar from Haas et al., CC-BY 4.0

    This formal grammar can be transcribed straightforwardly as a BNF-style language definition, as explained in the previous section. The grammar in Redex looks something like this:

    (define-language wasm-lang
      ;; basic types
      (t   ::= t-i t-f)
      (t-f ::= f32 f64)
      (t-i ::= i32 i64)
    
      ;; function types
      (tf  ::= (-> (t ...) (t ...)))
    
      ;; instructions (excerpted)
      (e-no-v ::= drop                      ; drop stack value
                  select                    ; select 1 of 2 values
                  (block tf e*)             ; control block
                  (loop tf e*)              ; looping
                  (if tf e* else e*)        ; conditional
                  (br i)                    ; branch
                  (br-if i)                 ; conditional branch
                  (call i)                  ; call (function by index)
                  (call cl)                 ; call (a closure)
                  return                    ; return from function
                  (get-local i)             ; get local variable
                  (set-local i)             ; set local variable
                  (label n {e*} e*)         ; branch target
                  (local n {i (v ...)} e*)  ; function body instruction
    
                  ...                       ; and so on
                  )
    
      ;; instructions including constant values
      (e    ::= e-no-v
                (const t c))
      (c    ::= number)
    
      ;; sequences of instructions
      (e*   ::= ϵ
                (e e*))
    
      ;; various kinds of indices
      ((i j l k m n a o) integer)
    
      ;; modules and various other forms omitted
      )
    

    This shows just a subset of the grammar, but you can see there’s a close correspondence to the math. The main differences are just in the surface syntax, such as how expressions are ordered or nested.

    Using this syntax, the addition instructions from before would inhabit the e* non-terminal for a sequence of instructions. Following the grammar, the sequence would be written as ((const i32 3) ((const i32 4) (add ϵ))). This instruction sequence is represented with a nested list (where ϵ is the null list) rather than a flat sequence to make it easier to manipulate in Redex.

    We also need to define evaluation contexts for this language. This is thankfully pretty simple:

    (E ::= hole
           (v E)
           ((label n {e*} E) e*))
    

    What these contexts mean is that either you look for a matching sequence of instructions that comes after a nested prefix (possibly empty) of values v to reduce, or you look to reduce inside a label form (or a combination of these two patterns).

    Unlike the simple language from earlier, wasm has side effects, memory, modules, and other features. Because of this complexity, the machine states of wasm have to include a store (the non-terminal s), a call frame (F), a sequence of instructions (e*), and an index for the current module instance (i).

    As a result, the reduction relation becomes more complicated. The structure of the relation looks like this:

    (define wasm->
      (reduction-relation
       wasm-runtime-lang
    
       ;; the machine states
       #:domain (s F e* i)
    
       ;; a subset of reduction rules shown below
    
       ;; rule for binary operations, like add or sub
       (++> ;; this pattern matches two consts on the stack
            (in-hole E ((const t c_1) ((const t c_2) ((binop t) e*))))
            ;; the two consts are consumed, and replaced with a result const
            (in-hole E ((const t c) e*))
            ;; do-binop is a helper function for executing the operations
            (where c (do-binop binop t c_1 c_2))
            binop)
    
       ;; rule for just dropping a stack value
       (++> (in-hole E (v (drop e*))) ; consumes one v = (const t c)
            (in-hole E e*)            ; return remaining instructions e*
            drop)
    
       ;; more rules would go here
       ...
    
       ;; shorthand reduction definitions
       with
       [(--> (s F x i)
             (s F y i))
        (++> x y)]))
    

    Again, we have the #:domain keyword indicating what the machine states are. Then there are two rules for binary operations and the drop instruction (other rules omitted for now).

    The rules use a shorthand form ++> (defined at the bottom) that only matches the instruction sequence and ignores the store, call frame, and so on. This is just used to simplify how the rules look, to match the paper.

    For comparison, here’s a screenshot of the full reduction semantics figure from the wasm paper:

    wasm reduction rules
    Reduction relation from Haas et al., CC-BY 4.0

    You can look at the 3rd and 12th rules in that figure and compare them to the two in the code excerpt above. You can see that there’s a close correspondence. In this fashion, you can transcribe all the rules from math to code, though you also have to write a significant amount of helper code to implement the side conditions in the rules.

    As I promised earlier, by inspecting the semantics, you can see how the language isolates the memory of wasm scripts that are executing. The store term s, whose grammar I omitted earlier, is defined like this:

      ;; stores: contains modules, tables, memories
      (s       ::= {(inst modinst ...)
                    (tab tabinst ...)
                    (mem meminst ...)})
    
      ;; modules with a list of closures cl, global values v,
      ;; table index and memory index
      (modinst ::= {(func cl ...) (glob v ...)}
                   ;; tab and mem are optional, hence all these cases
                   {(func cl ...) (glob v ...) (tab i)}
                   {(func cl ...) (glob v ...) (mem i)}
                   {(func cl ...) (glob v ...) (tab i) (mem i)})
    
      ;; tables are lists of closures
      (tabinst ::= (cl ...))
    
      ;; memories are lists of bytes
      (meminst ::= (b ...))
    

    The store is a structure containing some module, table, and memory instances. These are basically several kinds of global state that the instructions need to reference to accomplish various non-local operations, such as memory access, global variable access, function calls, and so on.

    Each module instance contains a list of functions (stored as closures cl), a list of global variables, and optionally indices specifying tables and memories associated with the module.

    The functions and global variables have an obvious purpose: to allow function calls to fetch the code for the function, and to allow global variables to be read and written.

    The table index inside modules allows dynamic dispatch to functions stored in a shared table of closures. This lets you use a function-pointer-like dispatch pattern without the dangers of pointers into memory.

    Finally, each module may declare an associated memory via an index, which means different modules can share access to the same memory if desired.

    This index number indexes into the list of memory instances (mem meminst ...), each of which is just represented as a list of bytes (b ...). The memory can be used for loads and stores, to represent structured data, or for other purposes.

    The Redex code for memory load and store rules look like this:

       ;; reductions for operating on memory
       (--> ;; one stack argument for address in memory, k
            (s F (in-hole E ((const i32 k) ((load t a o) e*))) i)
    
            ;; reinterpret bytes as the appropriately typed data
            (s F (in-hole E ((const t (const-reinterpret t (b ...))) e*)) i)
    
            ;; helper function fetches bytes from appropriate memory in s
            (where (b ...) (store-mem s i ,(+ (term k) (term o)) (sizeof t)))
            load)
    
       (--> ;; stack arguments for address k and new value c
            (s F (in-hole E ((const i32 k) ((const t c) ((store t a o) e*)))) i)
           
            ;; installs a new store s_new with the memory changed
            (s_new F (in-hole E e*) i)
    
            ;; the size in bytes for the given type
            (where n (sizeof t))
    
            ;; helper function modifies the memory in s, creating s_new
            (where s_new (store-mem= s i ,(+ (term k) (term o)) n (bits n t c)))
            store)
    

    These rules are a little more complicated and rely on various helper functions. The helper functions are not too complicated, however, and basically just do the appropriate indexing into the store data structure.

    One key difference from the earlier rules is the use of --> without any shorthands. This allows the rules to reference the store s to access parts of the global machine state. In this case, it’s the memory part of the store that’s needed.

    From this, you can see that wasm code only ever touches the appropriate memory instance that’s associated with the module that the code is in. More specifically, that’s because the store lookup is done using the current module index i. No other memories can be accessed via this lookup since this index is fixed for a given function definition.

    All memory accesses are also bounds-checked to avoid access to arbitrary regions of memory that might interfere with the runtime system and result in security vulnerabilities. The bounds checking is done inside the store-mem and store-mem= helper functions, which will fail to match in the where clauses if the index k is out of bounds.

    You can get an idea of how wasm is designed with web security in mind from this particular rule and how the store is designed. If you look at other rules, you can also see how global variables and local variables are kept separate from general memory access as well, which prevents memory access from interfering with variables and vice versa.

    In addition, code (e.g., function definitions, indirectly accessed functions in tables) is not stored in the memory either, which prevents arbitrary code execution or modification. You can see this from how the function closures are stored in separate function and table sections in modules, as explained above.

    Where to go from here

    The last section gave an overview of some interesting parts of the wasm semantics, with a focus on understanding some of the isolation guarantees that the design provides.

    This is just a small taste of the full semantics, which you can understand more comprehensively by reading the paper or the execution part of the language specification. The paper’s a very interesting read, with lots of attention paid to explaining design rationales.

    You can also try examples out with the basic Redex model that I’ve built, though with the caveat that it’s not quite complete. For example, I didn’t implement the type system and there are some rough edges around the numeric operations.

    There are also interesting ways in which the model could be extended.

    For one, it could cover the full semantics that’s described in the spec instead of the small version in the paper. If you combined that with a parser for wasm’s surface syntax, you could run real wasm programs that a web browser would understand and trace through the reductions.

    Wasm’s reference interpreter also comes with a lot of tests. With a proper parser, we could feed in all the tests to make sure the model is accurate (I’m sure there are bugs in my model).

    It would then be interesting to do random testing to discover if there are discrepancies between the specification-based semantics encoded in the executable model and implementations in various web browsers.

    If anyone’s interested in exploring any of that, you can check out the code I’ve written so far on github: https://github.com/takikawa/wasm-redex

    Appendix: More Redex Discussion

    This section is an appendix of more in-depth discussion about using Redex specifically, if you’re interested in some of the nitty-gritty details.

    There were several challenges involved in actually modeling wasm in Redex. The first challenge I bumped into is that the wasm model’s reduction rules operate on sequences of values, sometimes even producing an empty sequence as a result.

    This actually makes encoding into Redex a little tricky, as Redex assumes that evaluation contexts are plugged in with a single datum and not a splicing sequence of them. The solution to this, which you can see in the rules shown above, is to explicitly match over the “remaining” sequence of expressions in the instruction stack and handle it explicitly (either discarding it or appending to it).

    Then evaluation contexts can just be plugged with a sequence of instructions or values.

    This requires a few cosmetic modifications to the grammar and rules when compared to the paper. For example, the wasm paper’s evaluation contexts simply define a basic context as (v ... hole e ...) in which the hole could be plugged with a general e ....

Instead, the Redex model uses a context like (v E) where E can be a hole or another nesting level. Then a hole is plugged with an e* (defined as a list) representing both the thing to plug into the hole in the original rule and the “remaining” e ... expressions.

    The rules also need to explicitly handle cons-ing and appending of expression sequences. In a sense, this is just making explicit what was implicit in the ... and juxtaposition in the paper’s rules.
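To make that encoding concrete, here is a rough sketch of the shape of the grammar (the nonterminal and language names here are illustrative, not the model’s exact ones):

;; instruction sequences are explicit lists, and a context is a stack
;; of single values wrapped around a hole, which gets plugged with a
;; whole e* rather than a single expression
(define-extended-language wasm-eval-sketch wasm-lang
  (e* ::= (e ...))
  (E  ::= hole (v E)))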

    Another challenge was the indexed evaluation contexts of the paper. The paper’s contexts are indexed by a nesting depth, which constrains the kinds of contexts that a rule can apply to. In Redex it’s not possible to add such data-indexed constraints into grammars, so you end up having to apply extra side conditions in the reduction rules where such indexed contexts are used.

    For example, this shows in the rule for branching from inside of a control construct. The evaluation context E below is indexed by a nesting depth k in the paper’s rules:

;; branch to a label: keep the n values the branch carries (via v-split)
;; and continue with the label's stored continuation e*_0
(==> ((label n {e*_0} (in-hole E ((br j) e*_1))) e*_2)
     (e*-append (in-hole E_v e*_0) e*_2)
     ;; only applies when the nesting depth of E matches the target j
     (where j (label-depth E))
     (where (E_outer E_v) (v-split E n))
     label-br)
    

In the code above, by contrast, the nesting depth is checked with the label-depth metafunction.

Finally, the reduction relation in wasm has a rule for operating on the local form (used for function invocations) that doesn’t follow the structure of typical evaluation-context-based reduction.

    Specifically, the rule states that when a model state reduces as:

    s; v*; e* →_i s'; v'*; e'*,

    then you can reduce under a function’s local expression as follows:

    s; v_0*; local_n {i; v*} e* →_j s'; v_0*; local_n {i; v'*} e'*.

That is, you can reduce under a local if you swap out the call frame and are able to reduce under the swapped-out call frame.

    I think this can’t be expressed using a normal evaluation context, but it’s still possible to express in Redex as a recursive call to the reduction relation being defined. Basically, you have to call apply-reduction-relation on the right-hand side of the rule if you encounter a local form.

To make debugging easier, you also need to use apply-reduction-relation/tag-with-names and the computed-name form for the rule name, to make sure the traces window shows the right rule names.

    Here’s what the reduction rule for the local case looks like:

    ;; specifies how to reduce inside a local/frame instruction via a
    ;; recursive use of the reduction relation
    (--> (s_0 F_0 (in-hole E ((local n {i F_1} e*_0) e*_2)) j)
         (s_1 F_0 (in-hole E ((local n {i F_2} e*_1) e*_2)) j)
    
         ;; apply --> recursively
         (where any_rec
                ,(apply-reduction-relation/tag-with-names
                  wasm-> (term (s_0 F_1 e*_0 i))))
    
         ;; only apply this rule if this reduction was valid
         (side-condition (not (null? (term any_rec))))
    
         ;; the relation should be deterministic, so just take the first
         (where (string_tag (s_1 F_2 e*_1 i)) ,(first (term any_rec)))
    
         (computed-name (term string_tag)))
    

    by Asumu Takikawa at April 29, 2019 08:12 PM

    April 26, 2019

    Samuel Iglesias

    Igalia coding experience open positions

The Igalia Coding Experience is a grant program which provides students with their first exposure to the professional world, working hand in hand with Igalia programmers and learning with them. The program is aimed at students with a background in Computer Science, Information Technology, or Free Software development.

This program is a great opportunity for students who want to improve their technical skills by working in the field, learn how to contribute to open-source projects, and work together with the engineers of Igalia, a worker-owned company that has been rocking the Free Software world for more than 18 years!

    Igalia

We are looking for candidates who are passionate about the Free Software philosophy and willing to work on Free Software projects. If you have already contributed to any Free Software project related to our areas of specialization… that’s great! But don’t worry if you have not yet; we encourage you to apply as well!

    The conditions of the program are the following:

• You will be mentored by an Igalian who is an expert in the respective field, so you are not going to be alone.
• You will need to spend 450 hours working on the tasks agreed with your mentor, but you are free to distribute them over the year as fits you best. Students usually prefer schedules of 3 months working full-time, 6 months part-time, or even 1 year working 10 hours per week!
• You are not going to do it for free: we will compensate you with 6,500€ for all your work :)

This year we are offering Coding Experience positions in 6 different areas:

• Implementation of web standards. The student will become familiar with, and contribute to, the implementation of W3C standards in open source web engines.

    • WebKit, one of the most important open source web rendering engines. The student will have the opportunity to help maintain and/or contribute to the development of new features.

    • Chromium, a well-known browser rendering engine. The student will work on specific features development and/or bug-fixing. Additional tasks may include maintenance of our internal buildbots, and creation of Chromium/Wayland packages for distribution.

• Compilers, with a focus on WebAssembly and JavaScript implementations. The student will contribute to JS engines like V8 or JSC, working on new language features, optimizations or ports.

    • Multimedia and GStreamer, the leading open source multimedia framework. The student will help develop the Video Editing stack in GStreamer (namely GES and NLE). This work will include adding new features in any part of GStreamer, GStreamer Editing Services or in the Pitivi video editor, as well as fixing bugs in any of those components.

• Open-source graphics stack. The student will work on the development of specific features in Mesa or on improving any of the open-source testing suites (VkRunner, piglit) used in the Mesa community. Candidates who would like to propose topics of interest to work on should include them in the cover letter.

The last area is the one I have been working on for more than 5 years inside the Graphics team at Igalia, and I am thrilled we can offer this kind of position this year :-)

You can find more information about the Igalia Coding Experience program on the website… don’t forget to apply for it! Happy hacking!

    April 26, 2019 06:30 AM

    April 22, 2019

    Thibault Saunier

    GStreamer Editing Services OpenTimelineIO support

OpenTimelineIO is an open source API and interchange format for editorial timeline information; it basically allows some form of interoperability between the different post-production video editing tools. It is developed by Pixar, and several other studios contribute to the project, allowing it to evolve quickly.

We, at Igalia, recently landed support for the GStreamer Editing Services (GES) serialization format in OpenTimelineIO, making it possible to convert GES timelines to any format supported by the library. This is extremely useful for integrating GES into existing post-production workflows, as it allows projects in any format supported by OpenTimelineIO to be used in the GStreamer Editing Services and vice versa.
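Once the adapter is in place, a conversion is just a couple of lines with OpenTimelineIO’s Python API. A minimal sketch (the file names, and the assumption that the GES adapter is selected from the .xges extension, are mine):

import opentimelineio as otio

# read a GES project through the OpenTimelineIO adapter system...
timeline = otio.adapters.read_from_file("project.xges")

# ...and write it back out in any other supported format,
# e.g. OpenTimelineIO's native JSON format
otio.adapters.write_to_file(timeline, "project.otio")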

On top of that, we are building a GESFormatter that allows us to transparently handle any file format supported by OpenTimelineIO. In practice, it will be possible to use cuts produced by other video editing tools in any project using GES, for instance Pitivi.

At Igalia we are aiming at making GStreamer ready to be used in existing video post-production pipelines, and this work is one step in that direction. We are working on additional features in GES to fill the gaps toward that goal; for instance, we are now implementing nested timeline support and framerate-based timestamps in GES. Once we implement them, those features will enhance the compatibility of video editing projects created in other NLE software through OpenTimelineIO. Stay tuned for more information!

    by thiblahute at April 22, 2019 03:21 PM

    April 14, 2019

    Javier Muñoz

    Ceph Days Galicia 2019

Last Wednesday, the second Ceph Days Galicia took place in Santiago de Compostela. It was organized by AMTEGA in collaboration with Red Hat, Supermicro, Colabora Ingenieros, Mellanox, Dinahosting, Aitire and Igalia.

    I presented in detail the new archive zone functionality available in Ceph Nautilus. The slides I used in the talk are available here.

    If you could not attend and are interested in the topics we talked about, you can read more about the event here. Félix and Camilo have also published a blog post in Spanish about the event.

Thanks to all the people who participated in the organization and actively collaborated to make the event possible. See you at the next one!

    by Javier at April 14, 2019 10:00 PM

    April 08, 2019

    Philippe Normand

    Introducing WPEQt, a WPE API for Qt5

WPEQt provides a QML plugin implementing an API very similar to the QWebView API. This blog post explains the rationale behind this new project, aimed at QtWebKit users.

Qt5 already provides multiple WebView APIs, one based on QtWebKit (deprecated) and one based on QtWebEngine (aka Chromium). WPEQt aims to provide a viable alternative to the former. QtWebKit is being retired and has by now lagged a lot behind upstream WebKit in terms of features and security fixes. WPEQt can also be considered an alternative to QtWebEngine, but bear in mind the underlying Chromium web engine doesn’t support the same HTML5 features as WebKit.

    WPEQt is included in WPEWebKit, starting from the 2.24 series. Bugs should be reported in WebKit’s Bugzilla. WPEQt’s code is published under the same licenses as WPEWebKit, the LGPL2 and BSD.

At Igalia we have compared WPEQt and QtWebKit using the BrowserBench tests. The JetStream 1.1 results show that WPEQt completes all the tests twice as fast as QtWebKit. The Speedometer benchmark doesn’t even finish, due to a crash in the QtWebKit DFG JIT. Although the memory consumption looks similar in both engines, the upstream WPEQt engine is well maintained and includes security bug-fixes. Another advantage of WPEQt compared to QtWebKit is that its multimedia support is much stronger, with specs such as MSE, EME and media-capabilities being covered. WebRTC support is coming along as well!

So to everybody still stuck with QtWebKit in their apps and not yet ready (or reluctant) to migrate to QtWebEngine, please have a look at WPEQt! The remainder of this post explains how to build and test it.

    Building WPEQt

    For the time being, WPEQt only targets Linux platforms using graphics drivers compatible with wayland-egl. Therefore, the end-user Qt application has to use the wayland-egl Qt QPA plugin. Under certain circumstances the EGLFS QPA might also work, YMMV.

    Using a SVN/git WebKit snapshot

    If you have a SVN/git development checkout of upstream WebKit, then you can build WPEQt with the following commands on a Linux desktop platform:

    $ Tools/wpe/install-dependencies
    $ Tools/Scripts/webkit-flatpak --wpe --wpe-extension=qt update
    $ Tools/Scripts/build-webkit --wpe --cmakeargs="-DENABLE_WPE_QT=ON"
    

    The first command will install the main WPE host build dependencies. The second command will setup the remaining build dependencies (including Qt5) using Flatpak. The third command will build WPEWebKit along with WPEQt.

    Using the WPEWebKit 2.24 source tarball

    This procedure is already documented in the WPE Wiki page. The only change required is the new CMake option for WPEQt, which needs to be explicitly enabled as follows:

    $ cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_WPE_QT=ON -GNinja
    

    Then, invoke ninja, as documented in the Wiki.

    Using Yocto

    At Igalia we’re maintaining a Yocto overlay for WPE (and WebKitGTK). It was tested for the rocko, sumo and thud Yocto releases. The target platform we tested so far is the Zodiac RDU2 board, which is based on the Freescale i.MX6 QuadPlus SoC. The backend we used is WPEBackend-fdo which fits very naturally in the Mesa open-source graphics environment, inside Weston 5. The underlying graphics driver is etnaviv. In addition to this platform, WPEQt should also run on Raspberry Pi (with the WPEBackend-rdk or -fdo). Please let us know how it goes!

    To enable WPEQt in meta-webkit, the qtwpe option needs to be enabled in the wpewebkit recipe:

    PACKAGECONFIG_append_pn-wpewebkit = " qtwpe"
    

    The resulting OS image can also include WPEQt’s sample browser application:

    IMAGE_INSTALL_append = " wpewebkit-qtwpe-qml-plugin qt-wpe-simple-browser"
    

    Then, on device, the sample application can be executed either in Weston:

    $ qt-wpe-simple-browser -platform wayland-egl https://wpewebkit.org
    

    Or with the EGLFS QPA:

    $ # stop weston
    $ qt-wpe-simple-browser -platform eglfs https://wpewebkit.org
    

    Using WPEQt in your application

    A sample MiniBrowser application is included in WebKit, in the Tools/MiniBrowser/wpe/qt directory. If you have a desktop build of WPEQt you can launch it with the following command:

    $ Tools/Scripts/run-qt-wpe-minibrowser -platform wayland <url>
    

    Here’s the QML code used for the WPEQt MiniBrowser. As you can see it’s fairly straightforward!

    import QtQuick 2.11
    import QtQuick.Window 2.11
    import org.wpewebkit.qtwpe 1.0
    
    Window {
        id: main_window
        visible: true
        width: 1280
        height: 720
        title: qsTr("Hello WPE!")
    
        WPEView {
            url: initialUrl
            focus: true
            anchors.fill: parent
            onTitleChanged: {
                main_window.title = title;
            }
        }
    }
    

    As explained in this blog post, WPEQt is a simple alternative to QtWebKit. Migrating existing applications should be straightforward because the API provided by WPEQt is very similar to the QWebView API. We look forward to hearing your feedback or inquiries on the webkit-wpe mailing list and you are welcome to file bugs in Bugzilla.

    I wouldn’t close this post without acknowledging the support of my company Igalia and Zodiac, many thanks to them!

    by Philippe Normand at April 08, 2019 10:20 AM

    March 28, 2019

    Jacobo Aragunde

    The Chromium startup process

    I’ve been investigating the process of Chromium startup, the classes involved and the calls exchanged between them. This is a summary of my findings!

    There are several implementations of a browser living inside Chromium source code, known as “shells”. Chrome is the main one, of course, but there are other implementations like the content_shell, a minimal browser designed to exercise the content API; the app_shell, a minimal container for Chrome Apps, and several others.

To investigate the differences between the shells, we can start by checking the binary entry point and finding out how it evolves. This is a sequence diagram that starts from the content_shell main() function:

    content_shell and app_shell sequence diagram

    It creates two objects, ShellMainDelegate and ContentMainParams, then hands control to ContentMain() as implemented in the content module.

Chrome’s main() is very similar: it also spawns a couple of objects and then hands control to ContentMain(), following exactly the same code path from that point onward:

    Chrome init sequence diagram

If we took a look at the app_shell, it would be very similar, and it’s probably the same for other shells, so where’s the magic? What’s the difference between the many shells in Chromium? The key is the implementation of that first object created in the main() function:

    ContentMainDelegate class diagram

    Those *MainDelegate objects created in main() are implementations of ContentMainDelegate. This delegate will get the control in key moments of the initialization process, so the shells can customize what happens. Two important events are the calls to CreateContentBrowserClient and CreateContentRendererClient, which will enable the shells to customize the behavior of the Browser and Render processes.
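As a rough sketch of what such a delegate looks like (simplified and hypothetical: the MyShell* classes are stand-ins for a shell’s own code, and the signatures are abbreviated from the content API of this period):

#include <memory>

#include "content/public/app/content_main_delegate.h"

// each shell provides a delegate like this, returning its own
// customization objects for the browser and renderer processes
class MyShellMainDelegate : public content::ContentMainDelegate {
 public:
  content::ContentBrowserClient* CreateContentBrowserClient() override {
    // runs in the browser process: customize UI-thread behavior here
    browser_client_ = std::make_unique<MyShellContentBrowserClient>();
    return browser_client_.get();
  }

  content::ContentRendererClient* CreateContentRendererClient() override {
    // runs in each renderer process
    renderer_client_ = std::make_unique<MyShellContentRendererClient>();
    return renderer_client_.get();
  }

 private:
  std::unique_ptr<MyShellContentBrowserClient> browser_client_;
  std::unique_ptr<MyShellContentRendererClient> renderer_client_;
};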

    ContentBrowserClient class diagram

The diagram above shows how the ContentMainDelegate implementations provided by the different shells each instantiate their own implementation of ContentBrowserClient. This class runs in the UI thread and is able to customize the browser logic: its API can enable or disable certain parameters (e.g. AllowGpuLaunchRetryOnIOThread), provide delegates for certain objects (e.g. GetWebContentsViewDelegate), etc. A remarkable responsibility of ContentBrowserClient is providing an implementation of BrowserMainParts, which runs code in certain stages of the initialization.

    There is a parallel hierarchy of ContentRendererClient classes, which works analogously to what we’ve just seen for ContentBrowserClient:

    ContentRendererClient class diagram

The specific case of extensions::ShellContentRendererClient is interesting because it contains the details to set up the extension API:

    ShellContentRendererClient class diagram

The purpose of both ExtensionsClient and ExtensionsRendererClient is to set up the extensions system. The difference is that ExtensionsRendererClient knows about the renderer process and its concepts: only methods that make use of this knowledge should live there, while everything else should be part of ExtensionsClient, which already has a much bigger API.
The specific implementation of ShellExtensionsRendererClient is very simple, but it owns an instance of extensions::Dispatcher; this is an important class that sets up extension features on demand whenever necessary.

The investigation may continue in different directions, and I’ll try to share more reports like this one. Finally, these are the source files for the diagrams and a shared document containing the same information as this report, where any comments, corrections and updates are welcome!

    by Jacobo Aragunde Pérez at March 28, 2019 05:03 PM

    March 27, 2019

    Michael Catanzaro

    Epiphany 3.32 and WebKitGTK 2.24

    I’m very pleased to (belatedly) announce the release of Epiphany 3.32 and WebKitGTK 2.24. This Epiphany release contains far more changes than usual, while WebKitGTK continues to improve steadily as well. There are a lot of new features to discuss, so let’s dive in.

    Dazzling New Address Bar

    The most noticeable change is the new address bar, based on libdazzle’s DzlSuggestionEntry. Christian put a lot of effort into designing this search bar to work for both Builder and Epiphany, and Jan-Michael helped integrate it into Epiphany. The result is much nicer than we had before:

    The address bar is a central component of the user interface, and this clean design is important to provide a quality user experience. It should also leave a much better first impression than we had before.

    Redesigned Tabs Menu

Epiphany 3.24 first added a tab menu at the end of the tab bar. This isn’t very useful if you have only a few tabs open, but if you have a huge number of tabs then it helps you navigate through them. Previously, this menu only showed the page titles of the tabs. For 3.32, Adrien has converted this menu to a nice popover, including favicons, volume indicators, and close buttons. These enhancements were primarily aimed at making the browser easier to use on mobile devices, where there is no tab bar, but they’re a nice improvement for desktop users, too.

    (On mobile, the tab rows are much larger, to make touch selection easier.)

    Touchpad Gestures

    Epiphany now supports touchpad gestures. Jan-Michael first added a three-finger swipe to Epiphany, for navigating back and forward. Then Alexander (Exalm) decided to go and rewrite it, pushing the implementation down into WebKit to share as much code as possible with Safari. The end result is a two-finger swipe. This was much more involved than I expected as it required converting a bunch of Apple-specific Objective C++ code into cross-platform C++, but the end result was worth the effort:

    Applications that depend on WebKitGTK 2.24 may opt-in to these gestures using webkit_settings_set_enable_back_forward_navigation_gestures().
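From C, opting in is a one-liner on the view’s settings object. A minimal sketch (only the setter named above is the new 2.24 API; the rest is ordinary WebKitGTK usage, and the function wrapper is mine):

#include <webkit2/webkit2.h>

/* enable the two-finger swipe gestures on an existing web view */
static void
enable_navigation_gestures (WebKitWebView *web_view)
{
    WebKitSettings *settings = webkit_web_view_get_settings (web_view);
    webkit_settings_set_enable_back_forward_navigation_gestures (settings, TRUE);
}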

    Alexander also added pinch zoom.

    Variable Fonts

    Carlos Garcia decided to devote some attention to WebKit’s FreeType font backend, and the result speaks for itself:

    Emoji 🦇

    WebKit’s FreeType backend has supported emoji for some time, but there were a couple problems:

• Most emoji combinations were not supported, so while characters like 🧟 (zombie) would work just fine, characters like 🧟‍♂️ (man zombie) and 🧟‍♀️ (woman zombie) were broken. Carlos fixed this. (Technically, only emoji combinations using a certain character code were broken, but that was most of them.)
    • There was no code to prefer emoji fonts for rendering emoji, meaning emoji would almost always be displayed in non-ideal fonts, usually DejaVu, resulting in a black and white glyph rather than color. Carlos fixed this, too. This seems to work properly in Firefox on some websites but not others, and it’s currently WONTFIXed in Chrome. It’s good to see WebKit ahead of the game, for once. Note that you’ll see color on this page regardless of your browser, because WordPress replaces the emoji characters with images, but I believe only WebKit can handle the characters themselves. You can test your browser here.

    Improved Adaptive Mode

Adaptive mode was first introduced in 3.30, and Adrien has continued improving it to ensure Epiphany works well on mobile devices. 3.32 is the first release to depend on libhandy. Adrien has converted various portions of the UI to use libhandy widgets.

    Reader Mode

    Jan-Michael’s reader mode has been available since 3.30, but new to 3.32 are many style improvements and new preferences to choose between dark and light theme, and between sans and serif font, thanks to Adrian (who is, confusingly, not Adrien). The default, sans on light background, still looks the best to me, but if you like serif fonts or dark backgrounds, now you can have them.

    JPEG 2000

    Wait, JPEG 2000? The obscure image standard not supported by Chrome or Firefox? Why would we add support for this? Simple: websites are using it. A certain piece of popular server-side software is serving JPEG 2000 images in place of normal JPEGs and even in place of PNG images to browsers with Safari-style user agents. (The software in question doesn’t even bother to change the file extensions. We’ve found far too many images in the wild ending in .png that are actually JPEG 2000.) Since this software is used on a fairly large number of websites, and our user agent is too fragile to change, we decided to support JPEG 2000 in order to make these websites work properly. So Carlos has implemented JPEG 2000 support, using the OpenJPEG library.

This isn’t a happy event for the web, because WebKit is only as secure as its least-secure dependency, and adding new obscure image formats is not a step in the right direction. But in this case, it is necessary.

    Mouse Gestures

    Experimental mouse gesture support is now available, thanks to Jan-Michael, if you’re willing to use the command line to enable it:

    $ gsettings set org.gnome.Epiphany.web:/org/gnome/epiphany/web/ enable-mouse-gestures true

With this, I find myself closing tabs by dragging the mouse down and then to the right. Down and back up will reload the tab. Straight to the left is Back, straight to the right is Forward. Straight down will open a new tab. I had originally hoped to use the right mouse button for this, as in Opera, but it turns out there is a difference in context menu behavior: whereas Windows apps normally pop up the context menu on button release, GTK apps open the menu on button press. That means the context menu would appear at the start of every mouse gesture, which is certainly no good, so we’ve opted to use the middle mouse button instead. We aren’t sure whether this is a good state of things, and need your feedback to decide where to go with this feature.

    Improved Fullscreen Mode

    A cool side benefit of using libdazzle is that the header bar is now available in fullscreen mode by pressing the mouse towards the top of the screen. There’s even a nice animation to show the header bar sliding up to the top of the screen, so you know it’s there (animation disabled for fullscreen video).

    The New Tab Button

    Some users were disconcerted that the new tab button would jump from the end of the tab bar (when multiple tabs are open) back up to the end of the header bar (when there is only one tab open). Now this button will remain in one place: the header bar. Since it will no longer appear in the tab bar, Jan-Michael has moved it back to the start of the header bar, where it was from 3.12 through 3.22, rather than the end. This is mostly arbitrary, but makes for a somewhat more balanced layout.

    The history of the new tab button is rather fun: when the new tab button was first added in 3.8, it was added at the end of the header bar, but moved to the start in 3.12 to be more consistent with gedit, then moved back to the end in 3.24 to reduce the distance it would need to move to reach the tab bar. So we’ve come full circle here, twice. Only time will tell if this nomadic button will finally be able to stay put.

    New Icon

    Yes, most GNOME applications have a new icon in 3.32, so Epiphany is not special here. But I just can’t resist the urge to show it off. Thanks, Jakub!

    And More…

    It’s impossible to mention all the improvements in 3.32 in a single blog post, but I want to squeeze a few more in.

    Alexander (Exalm) landed several improvements to Epiphany’s theme, especially the incognito mode theme, which needed work to look good with the new Adwaita in 3.32.

    Jan-Michael added an animation for completed downloads, so we don’t need to annoyingly pop open the download popover anymore to let you know that your download has completed.

    Carlos Garcia added support for automation mode. This means Epiphany can now be used for running automated tests with WebDriver (e.g. with Selenium). Using the new automation mode, I’ve upstreamed support for running tests with Epiphany to the Web Platform Tests (WPT) project, the test suite used by web engine developers to test standards conformance.

    Carlos also reworked the implementation of script dialogs so that they are now modal only to their associated web view, not modal to the entire application. This means you can just close the browser tab if a particular website is abusing script dialogs in a problematic way, e.g. by continuously opening new dialogs.

    Patrick has improved the directory layout Epiphany uses to store data on disk to avoid storing non-configuration data under ~/.config, and reworked the internals of the password manager to mitigate Spectre-related concerns. He also implemented Happy Eyeballs support in GLib, so Epiphany will now fall back to an IPv4 connection if IPv6 is available but broken.

    Now Contains 100% Less Punctuation!

    Did you notice any + signs missing in this blog? Following GTK+’s rename to GTK, WebKitGTK+ has been renamed to WebKitGTK. You’re welcome.

    Extra Credit

Although Epiphany 3.32 has been the work of many developers, as you’ve seen, I want to give special credit to Epiphany’s newest maintainer, Jan-Michael. He has closed a considerable number of bugs, landed too many improvements to mention here, and has been a tremendous help. Thank you!

    Now, onward to 3.34!

    by Michael Catanzaro at March 27, 2019 12:41 PM

    March 19, 2019

    Michael Catanzaro

    Epiphany Technology Preview Upgrade Requires Manual Intervention

    Jan-Michael has recently changed Epiphany Technology Preview to use a separate app ID. Instead of org.gnome.Epiphany, it will now be org.gnome.Epiphany.Devel, to avoid clashing with your system version of Epiphany. You can now have separate desktop icons for both system Epiphany and Epiphany Technology Preview at the same time.

    Because flatpak doesn’t provide any way to rename an app ID, this means it’s the end of the road for previous installations of Epiphany Technology Preview. Manual intervention is required to upgrade. Fortunately, this is a one-time hurdle, and it is not hard:

    $ flatpak uninstall org.gnome.Epiphany

    Uninstall the old Epiphany…

    $ flatpak install gnome-apps-nightly org.gnome.Epiphany.Devel org.gnome.Epiphany.Devel.Debug

    …install the new one, assuming that your remote is named gnome-apps-nightly (the name used locally may differ), and that you also want to install debuginfo to make it possible to debug it…

    $ mv ~/.var/app/org.gnome.Epiphany ~/.var/app/org.gnome.Epiphany.Devel

    …and move your personal data from the old app to the new one.

    Then don’t forget to make it your default web browser under System Settings -> Details -> Default Applications. Thanks for testing Epiphany Technology Preview!

    by Michael Catanzaro at March 19, 2019 06:39 PM

    March 07, 2019

    Brian Kardell

    Interesting Custom Element Data Begins

    Interesting Custom Element Data Begins

A while back I wrote a piece asking how we begin to think about using data to move forward with standardization, and called for ways to help get data. One thing I did was request a new query from the HTTPArchive including data on “dasherized elements”. Keep in mind that while the top 1.2 million sites or so in this dataset are a lot of data, it is still a small sampling with its own biases. It reports mostly on a particular ‘kind’ of site, which is not representative of the giant bottom of the iceberg that lives beneath the surface, inside corporate intranets, behind logins and paywalls and so on. Ultimately, we need more, but you have to start somewhere.

    Yesterday, Simon Pieters answered with this tweet linking to an HTTPArchive post and yielding this dataset which is amazing.

    It’s still a little hard to track because we can’t tell whether that is one page that includes an element a bunch of times, or many pages that include them, but this is an awesome start!

It’s a little hard to view that dataset, and while the attributes are awesome in helping us know more about what each element is, they also add some noise and slightly confuse the counts. So I took the data, ran it through some processing and created a few other views (linked where appropriate below)…

    Here’s some preliminary, interesting observations:

The HTTPArchive query that reports on the use of HTML elements searches for only 140 known specific elements that are in a standard, but this report shows over 24k different “dasherized” tags that appear in the top 1.2 million pages. Wow! Even from this small sample, what this tells me is that there are a lot of dasherized tags in use.

It is important to note that this doesn’t mean these are “custom elements” proper, but it also doesn’t really matter: what we care about, really, is what you were trying to say there, semantically.

Of these, there are 3,227 different unique prefixes. These may or may not indicate common authors, but they might at least be a helpful way to look for popular ‘sets’ of elements. For example, it’s unsurprising to see the amp- prefix in there given all of the boosts that it gets, and it’s nice to see them all linked in and counted there. I’ve organized a json output that looks like this
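As a trivial sketch of the kind of prefix grouping involved (assuming a plain mapping from tag names to counts as input; the sample values below are illustrative, reusing a couple of counts quoted later in this post):

from collections import defaultdict
import json

def group_by_prefix(counts):
    # bucket each dasherized tag under the text before its first dash,
    # e.g. "amp-auto-ads" and "amp-ad" both land under "amp"
    groups = defaultdict(dict)
    for tag, n in counts.items():
        groups[tag.split("-", 1)[0]][tag] = n
    return groups

counts = {"amp-auto-ads": 3718, "amp-ad": 395, "my-widget": 12}
print(json.dumps(group_by_prefix(counts), indent=2))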

    To break them down into some further semi-arbitrary groups for summary:

• ~7.8k of these occur between 1 and 100 times.
• 31 of these occur between 101 and 200 times.
• 14 occur between 200 and 500 times.
• 4 occur between 500 and 1000 times.
• 4 occur more than 1000 times.

One personal note: I’m kind of sad to see that the most popular one is amp-auto-ads, occurring a whopping 3718 times, and it’s not remotely the only thing that would appear to be about ads. In fact, amp-ad also occurs 395 times, and there are many other non-amp elements that appear to be ad related. But... I guess the web has a lot of ads. Who knew.

More importantly, it’s interesting to look at this file from the bottom up (or the grouped one) and think about whether we can identify the possible sources of these, or ‘tag’ them according to common purposes somehow. If you feel like you’re potentially interested in digging in and helping think about this (identifying where some of those come from, what their purpose is, or getting that data into a place where we can do that kind of stuff better), feel free to leave comments on any of these gists or cc me (@briankardell) on twitter.

    March 07, 2019 05:00 AM