Planet Igalia

January 16, 2019

Víctor Jáquez

Rust bindings for GStreamerGL: Memoirs

Rust is a great programming language, but the community around it is just amazing. Those are the ingredients for crafting useful software tools, such as Servo, an experimental browser engine designed for task isolation and high parallelization.

Both projects, Rust and Servo, are funded by Mozilla.

Thanks to Mozilla and Igalia, I have the opportunity to work on Servo, adding HTML5 multimedia features to it.

First, with the help of Fernando Jiménez, we finished what my colleague Philippe Normand and Sebastian Dröge (one of my programming heroes) started: a media player in Rust designed to be integrated into Servo. This media player lives in its own crate, servo/media, along with the WebAudio engine. A crate, in Rust jargon, is like a library. This crate is (in a very ad hoc way) designed to be multimedia framework agnostic, but the only backend right now is for GStreamer. Later we integrated it into Servo, adding initial support for the audio and video tags.

Currently, servo/media passes the whole frame to render as an array to Servo through an IPC channel. This implies at least one copy of the frame in memory, which we would like to avoid.

For painting and compositing the web content, Servo uses WebRender, a crate designed to use the GPU intensively. Thus, if we pass OpenGL textures to WebRender instead of raw frame data, performance could improve notably.

Luckily, GStreamer already supports uploading, downloading, painting and compositing video frames as OpenGL textures with the OpenGL plugin and its OpenGL Integration library. Even more, with plugins such as GStreamer-VAAPI, Gst-OMX (OpenMAX), and others, it’s possible to process video without using the main CPU or its mapped memory on different platforms.

But there is a gap between what’s available in GStreamer and what’s available in Rust. Although Sebastian has put a lot of effort into the Rust bindings for GStreamer, both for applications and plugins, GStreamer’s OpenGL Integration library (GstGL for short) wasn’t covered at that time. So I rolled up my sleeves and got to work on the bindings.

These are the stories of that work.

As GStreamer shares the GObject framework and its introspection mechanism with GTK+, both projects have collaborated on the infrastructure required to support Rust bindings. Thanks to all the GNOME folks who are working on the intercommunication between Rust and GObject. The quest has been long and complex, since Rust doesn’t map all object-oriented concepts, and GObject, being a set of practices and helpers for object-oriented programming in C, is not used homogeneously.

The key piece that eases the generation of Rust bindings for GObject-based projects is GIR, a tool written in Rust that reads .gir files, along with metadata in TOML, and outputs two types of bindings: sys and api.

Rust can call external functions through FFI (foreign function interface), which is just a declaration of a C function’s signature using Rust types. But calling these functions is considered unsafe. The sys bindings are just these declarations for every C function of the library, organized by the library’s namespace.

The next step is to create a safe and rustified API. These are the api bindings.
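To make the split between the two layers concrete, here is a rough analogy in Python using the standard ctypes module instead of Rust (an illustrative swap, not what GIR generates). The first layer only declares the raw C signature of libc’s `strlen`, which would happily read through an invalid pointer; the second wraps it in a function that validates its input and converts types. The choice of `strlen` and the name `safe_strlen` are hypothetical.

```python
import ctypes
import ctypes.util

# "sys"-like layer: a raw declaration of a C function's signature.
# Nothing here prevents misuse, which is why Rust marks such calls unsafe.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


# "api"-like layer: a safe, idiomatic wrapper that checks its input
# and converts between native and foreign types.
def safe_strlen(text: str) -> int:
    if not isinstance(text, str):
        raise TypeError("text must be a str")
    # strlen counts UTF-8 bytes up to the first NUL, not characters.
    return _libc.strlen(text.encode("utf-8"))


print(safe_strlen("GstGL"))  # 5
```

In the Rust case the same shape appears as an `extern "C"` declaration in the sys crate and a safe method in the api crate that hides the `unsafe` block.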

As we said, GObject libraries are not homogeneous, and even when they follow the introspection annotations, there will be cases where GIR won’t be able to generate the correct bindings. For that reason GIR is constantly evolving, looking for a common way to solve the corner cases that exist in every GObject project. For example, these are the patches I wrote in order to generate the GstGL bindings.

The tasks I completed were these:

For this document we assume that the reader has a functional Rust setup and knows the basic concepts.

Clone and build gir

$ cd ~/ws
$ git clone https://github.com/gtk-rs/gir.git
$ cd gir
$ cargo build --release

We build gir in release mode because otherwise it would be very slow.

Sys bindings

This kind of binding is normally straightforward (and unsafe), since it only maps the C API to Rust via the FFI mechanism.

$ cd ~/ws
$ git clone https://gitlab.freedesktop.org/gstreamer/gstreamer-rs-sys.git
$ cd gstreamer-rs-sys
$ cp /usr/share/gir-1.0/GstGL-1.0.gir gir-files/
  1. Verify whether the gir file is more or less correct.
    1. If there is something strange, we should fix the code that generated it.
    2. If that is not possible, the last resort is to fix the gir file directly, which is just XML, not manually but through a script using xmlstarlet. See fix.sh in gtk-rs as an example.
  2. Create the toml file with the metadata required to create the bindings. In other words, this file contains the exceptions, rules and options used by the tool to generate the bindings. See Gir_GstGL.toml in gstreamer-rs-sys as an example. The documentation of the toml file is in gir’s README.md file.
$ ~/ws/gir/target/release/gir -c Gir_GstGL.toml

This command generates a crate in the directory gstreamer-gl-sys, as specified by target_path in the toml file.

Api bindings

This type of binding may require more manual work, since its purpose is to offer a rustified API of the library, with all its syntactic sugar, semantics, and so on. But in general terms, the process is similar:

$ cd ~/ws
$ git clone https://gitlab.freedesktop.org/gstreamer/gstreamer-rs.git
$ cd gstreamer-rs
$ cp /usr/share/gir-1.0/GstGL-1.0.gir gir-files/

Again, you might end up applying fixes to the gir file through a fix.sh script using xmlstarlet.

And again, writing the toml file might take a lot of time, through trial and error, cleaning and tidying up the API. See Gir_GstGL.toml in gstreamer-rs as an example.

$ ~/ws/gir/target/release/gir -c Gir_GstGL.toml

A good way to test your bindings is by crafting a test application that shows how to use the API. Personally, I devoted a ton of time to the test application for GstGL, but it was worth it. It made me aware of a missing piece in Glutin, the crate used for GL applications in Rust: a way to get the EGLDisplay in use. So I also worked on that and sent a pull request that was recently merged. Such are the sweets of free software development.

Nowadays I’m integrating the GstGL API into servo/media and, later, into Servo!

by vjaquez at January 16, 2019 07:42 PM

January 14, 2019

Andrés Gómez

matrix-send me a notification!

When you are working in the console of an Un*x system, you always have the possibility of using some kind of notification system to warn you when a task has completed. Quite typically, that would involve an email arriving at your box’s local inbox or, if you have a properly configured mail agent, at some other inbox on the Internet.

With the arrival of Instant Messaging systems, you could move from the good old email notification to some other fancy service. That has been my preferred method for quite a while, since I consider email a “non-instant” messaging system. Basically, I do not want to get instant notifications when a mail arrives. Add to that the hassle of setting up filter criteria to get notifications only for specific mail rules, and the not yet universally supported IMAP4 push method, instead of polling for newly arrived mail…

Anyway, long story short, for some time now we have been using [matrix] as our Instant Messaging service at Igalia so, why not get notifications there when a task is completed?

Yes, you have guessed correctly: that’s possible and, actually, it’s very easy to set up, especially with the help of matrix-send.

First, you need an account that will send you the notification(s). Ideally, that would be a bot user, but it could be any account. Then, you have to get an access token for that user so you can interact with the matrix server from the command line as if it were any other ordinary matrix client. Finally, you need to create a chat room between that user and your own in order to keep the communication going. All this is explained in matrix’ client-server API documentation but, to make things easier, it goes as follows:

$ curl -XPOST -d '{"user":"<matrix-user>", "password":"<password>", "type":"m.login.password"}' "https://<matrix-server>/_matrix/client/r0/login"
{
    "access_token": "<access-token>",
    "device_id": "<device-id>",
    "home_server": "<home-server>",
    "user_id": "@<matrix-user>:<home-server>"
}

This will give you the needed access-token.
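If you would rather script this step than use curl, the same login call can be built with Python’s standard library. This is a minimal sketch: the server name, user and password below are hypothetical placeholders, and the request is only constructed, not sent, so you can inspect it first.

```python
import json
import urllib.request


def build_login_request(server: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the r0 password login request used above."""
    body = {"user": user, "password": password, "type": "m.login.password"}
    return urllib.request.Request(
        f"https://{server}/_matrix/client/r0/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body: bytes) -> str:
    """Pull the access token out of the JSON login response."""
    return json.loads(response_body)["access_token"]


req = build_login_request("matrix.example.org", "notify-bot", "s3cret")
print(req.get_method(), req.full_url)
# POST https://matrix.example.org/_matrix/client/r0/login
```

To actually log in, pass the request to `urllib.request.urlopen` and feed the body it returns to `extract_token`.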

Now, from your regular matrix client, invite the bot user to a conversation in a new room. Check the configuration of the new room for its internal ID. It will look something like !<internal-room-id>:<home-server>.

Then, accept such invitation from the command line:

$ curl -XPOST -d '{}' "https://<matrix-server>/_matrix/client/r0/rooms/%21<internal-room-id>:<home-server>/join?access_token=<access-token>"
{
    "room_id": "!<internal-room-id>:<home-server>"
}
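Note the `%21` in the join URL: room IDs start with `!`, which has to be percent-encoded in the path. A small helper can build that URL for you; this is a sketch with hypothetical server, room and token values, keeping `:` unescaped to match the curl call.

```python
import urllib.parse


def join_url(server: str, room_id: str, access_token: str) -> str:
    """Build the r0 join URL for a room ID such as !abc:example.org."""
    # Percent-encode the room ID; the leading '!' becomes %21,
    # while ':' is left as-is, like in the curl example.
    quoted_room = urllib.parse.quote(room_id, safe=":")
    query = urllib.parse.urlencode({"access_token": access_token})
    return f"https://{server}/_matrix/client/r0/rooms/{quoted_room}/join?{query}"


print(join_url("matrix.example.org", "!AbCdEf:example.org", "TOKEN"))
# https://matrix.example.org/_matrix/client/r0/rooms/%21AbCdEf:example.org/join?access_token=TOKEN
```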

All that is left is to configure matrix-send and start using it. Mind you, I’ve made a small addition that has not been merged yet, so I would just clone from my fork.

The configuration file would look like this:

$ cat ~/.config/matrix-send/config.ini
[DEFAULT]
endpoint=https://<matrix-server>/_matrix/
access_token=<access-token>
channel_id=!<internal-room-id>:<home-server>
msgtype=m.text

My interesting addition is the msgtype field. By default, matrix-send uses m.notice, which, depending on the configuration, quite typically won’t trigger a notification in your matrix client.
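To see where msgtype ends up, here is a sketch of the client-server call that sends a room message, driven by a config file in the same format as above. It is not matrix-send’s actual code: the config values are hypothetical, the request is only built (not sent), and it uses the r0 `PUT .../rooms/{roomId}/send/m.room.message/{txnId}` endpoint with a random transaction ID.

```python
import configparser
import json
import urllib.parse
import urllib.request
import uuid

CONFIG_TEXT = """\
[DEFAULT]
endpoint=https://matrix.example.org/_matrix/
access_token=TOKEN
channel_id=!AbCdEf:example.org
msgtype=m.text
"""


def build_send_request(config_text: str, message: str) -> urllib.request.Request:
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    cfg = parser["DEFAULT"]
    # Fall back to m.notice, the default the post attributes to matrix-send.
    msgtype = cfg.get("msgtype", "m.notice")
    room = urllib.parse.quote(cfg["channel_id"], safe=":")
    txn_id = uuid.uuid4().hex  # must be unique per message for this client
    url = (f"{cfg['endpoint']}client/r0/rooms/{room}"
           f"/send/m.room.message/{txn_id}")
    body = {"msgtype": msgtype, "body": message}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {cfg['access_token']}",
                 "Content-Type": "application/json"},
        method="PUT",
    )
```

With `msgtype=m.text` the message is an ordinary text message; dropping that line from the config makes the sketch send an m.notice instead.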

Finally, make matrix-send executable and test it:

$ chmod +x <path-to-matrix-send>/matrix-send.py
$ <path-to-matrix-send>/matrix-send.py "Hello World!"

by tanty at January 14, 2019 10:05 PM

January 10, 2019

Manuel Rego

An introduction to CSS Containment

Igalia has recently been working on the implementation of css-contain in Chromium, providing some fixes and optimizations based on this standard. This brief blog post tries to give an introduction to the spec, explain the status of things and the work done during the past year, and share some plans for the future.

What’s css-contain?

The main goal of the CSS Containment standard is to improve the rendering performance of web pages by allowing the isolation of a subtree from the rest of the document. This specification introduces only one new CSS property, called contain, with different possible values. Browser engines can use that information to implement optimizations and avoid extra work when they know which subtrees are independent of the rest of the page.

Let’s explain what this is about and why it can bring performance improvements to complex websites. Imagine that you have a big HTML page which generates a complex DOM tree, but you know that some parts of that page are totally independent of the rest of the page and that the content in those parts is modified at some point.

Browser engines usually try to avoid doing more work than needed and use some heuristics to avoid spending more time than required. However, there are lots of corner cases and complex situations in which the browser needs to actually recompute the whole webpage. To improve these scenarios, the author has to identify which parts (subtrees) of their website are independent and isolate them from the rest of the page by means of the contain property. Then, when there are changes in some of those subtrees, the rendering engine will be able to avoid doing any work outside of the subtree boundaries.

Not everything comes for free: when you use contain, there are some restrictions that affect those elements, so that the browser is totally certain it can apply optimizations without causing any breakage (e.g. you need to manually set the size of the element if you want to use size containment).

The CSS Containment specification defines four values for the contain property, one for each type of containment:

  • layout: The internal layout of the element is totally isolated from the rest of the page, it’s not affected by anything outside and its contents cannot have any effect on the ancestors.
  • paint: Descendants of the element cannot be displayed outside its bounds, nothing will overflow this element (or if it does it won’t be visible).
  • size: The size of the element can be computed without checking its children, the element dimensions are independent of its contents.
  • style: The effects of counters and quotes cannot escape this element, so they are isolated from the rest of the page.
    Note that regarding style containment there is an ongoing discussion in the CSS Working Group about how useful it is (due to the narrow scope of counters and quotes).

You can combine the different types of containment as you wish, but the spec also provides two extra values that are a kind of “shorthand” for the other four:

  • content: Which is equivalent to contain: layout paint style.
  • strict: This is the same as having all four types of containment, so it’s equivalent to contain: layout paint size style.
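The relationship between the shorthands and the four containment types can be modelled as a small function. This is a hypothetical Python helper for illustration, not any browser’s parser. Besides the values listed above, it accepts none (the property’s initial value in the spec), and it follows the spec grammar in which strict and content stand alone while the four individual types may be combined, each at most once.

```python
CONTAIN_TYPES = frozenset({"layout", "paint", "size", "style"})
CONTAIN_SHORTHANDS = {
    "content": frozenset({"layout", "paint", "style"}),
    "strict": frozenset({"layout", "paint", "size", "style"}),
}


def expand_contain(value: str) -> frozenset:
    """Return the set of containment types a `contain` value turns on."""
    keywords = value.lower().split()  # CSS keywords are case-insensitive
    if keywords == ["none"]:
        return frozenset()
    if len(keywords) == 1 and keywords[0] in CONTAIN_SHORTHANDS:
        return CONTAIN_SHORTHANDS[keywords[0]]
    # Otherwise: one or more of the four individual types, no repeats.
    if (not keywords
            or len(set(keywords)) != len(keywords)
            or not set(keywords) <= CONTAIN_TYPES):
        raise ValueError(f"invalid contain value: {value!r}")
    return frozenset(keywords)


print(sorted(expand_contain("content")))  # ['layout', 'paint', 'style']
```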

Example

Let’s show an example of how CSS Containment can help to improve the performance of a webpage.

Imagine a page with lots of elements, in this case 10,000 elements like this:

  <div class="item">
    <div>Lorem ipsum...</div>
  </div>

And that it modifies the content of one of the inner DIVs through the textContent property.

If you don’t use css-contain, even when the change is on a single element, Chromium spends a lot of time on layout because it traverses the whole DOM tree (which in this case is big as it has 10,000 elements).

CSS Containment Example DOM Tree

This is when the contain property comes to the rescue. In this example the item DIV has a fixed size, and the contents we’re changing in the inner DIV will never overflow it. So we can apply contain: strict to the item; that way, when something changes inside an item, the browser won’t need to visit the rest of the nodes: it can stop checking things at that element and avoid going outside.

Notice that if the content overflows the item it would get clipped, also if we don’t set a fixed size for the item it’ll be rendered as an empty box so nothing would be visible (actually in this example the borders would be present but they would be the only visible thing).

CSS Containment Example

Despite how simple each of the items in this example is, using CSS Containment brings a big improvement: layout time goes down from ~4ms to ~0.04ms, a huge difference. Imagine a DOM tree with very complex structures and contents where only a small part of the page gets modified: if you can isolate that part from the rest of the page, you could get similar benefits.

State of the art

This is not a new spec: Chrome 52 shipped initial support in July 2016. But during the last year there has been some extra development related to it, and that’s what I want to highlight in this blog post.

First of all, many specification issues have been fixed, and some of them imply changes in the implementations. Most of this work has been carried out by Florian Rivoal in collaboration with the CSS Working Group.

On the testing side, Gérard Talbot has completed the test suite in the web-platform-tests (WPT) repository, which is really important to fix bugs in the implementations and ensure interoperability.

In my case, I’ve been working on the Chromium implementation, fixing several bugs and interoperability issues and bringing it up to date with the latest specification changes. I took advantage of the WPT test suite to do this work and also contributed a bunch of tests back to it. I also imported Firefox tests into Chromium to improve interop (I even did a small Firefox patch as part of this work).

Last, it’s worth noting that Firefox has been actively working on the implementation of css-contain during the last year (you can test it by enabling the runtime flag layout.css.contain.enabled). Hopefully that will bring a second browser engine shipping the spec in the future.

Wrap-up

CSS Containment is a nice and simple specification that can be useful to improve web rendering performance in many different use cases. It’s true that it’s currently only supported by Chromium (remember that Firefox is working on it too) and that more improvements and optimizations can be implemented based on it; still, it seems to have huge potential.

Igalia and Bloomberg working together to build a better web

One more time all the work from Igalia related to css-contain has been sponsored by Bloomberg as part of our ongoing collaboration.

Bloomberg has some complex UIs that are taking advantage of css-contain to improve rendering performance. In future blog posts we’ll talk about some of these cases and the optimizations that have been implemented in the rendering engine to improve them.

January 10, 2019 11:00 PM

Diego Pino

The eXpress Data Path

In the previous article I briefly introduced XDP (eXpress Data Path) and eBPF, the multipurpose in-kernel virtual machine. On the XDP side, I focused only on the motivations behind this new technology: the reasons for rearchitecting the Linux kernel networking layer to enable faster packet processing. However, I didn’t get much into the details of how XDP works. In this new blog post I try to go deeper into XDP.

XDP: A fast path for packet processing

The design of XDP has its roots in a DDoS attack mitigation solution presented by Cloudflare at Netdev 1.1. Cloudflare relies heavily on iptables, which according to their own metrics is able to handle 1 Mpps on a decent server (Source: Why we use the Linux kernel’s TCP stack). In the event of a DDoS attack, the amount of spoofed traffic can be up to 3 Mpps. Under those circumstances, a Linux box starts to be flooded by IRQ interrupts until it becomes unusable.

Because Cloudflare wanted to keep the convenience of using iptables (and the rest of the kernel’s network stack), they couldn’t go with a solution that takes full control of the hardware, such as DPDK. Their solution consisted of implementing what they called a “partial kernel bypass”. Some queues of the NIC are still attached to the kernel, while others are attached to a user-space program that decides whether a packet should be dropped or not. By dropping packets at the lowest point of the stack, the amount of traffic that reaches the kernel’s networking subsystem gets significantly reduced.

Cloudflare’s solution used the Netmap toolkit to implement its partial kernel bypass (Source: Single Rx queue kernel bypass with Netmap). However, this idea could be generalized by adding a checkpoint in the Linux kernel network stack, preferably as soon as a packet is received by the NIC. This checkpoint should pass the packet to a user-supplied program that decides what to do with it: drop it or let it continue through the normal path.

Luckily, Linux already features a mechanism that allows user-space code execution within the kernel: the eBPF VM. So the solution seemed obvious.

Linux network stack with XDP

Packet operations

Every network function, no matter how complex it is, consists of a series of basic operations:

  • Firewall: read incoming packets, compare them to a table of rules and execute an action: forward or drop.
  • NAT: read incoming packets, modify headers and forward packet.
  • Tunneling: read incoming packets, create a new packet, embed the packet into the new one and forward it.

XDP passes packets to our eBPF program, which decides what to do with them. We can read or modify them if needed. We can also access helper functions to parse packets, compute checksums, and more, at no cost (avoiding system call penalties). And thanks to eBPF Maps, we have access to complex data structures for persistent data storage, like tables. We are also able to decide what to do with a packet. Are we going to drop it? Forward it? To control a packet’s processing logic, XDP provides a set of predefined actions:

  • XDP_PASS: pass the packet to the normal network stack.
  • XDP_DROP: very fast drop.
  • XDP_TX: forward the packet by bouncing it back out of the same interface it was received on.
  • XDP_REDIRECT: redirects the packet to another NIC or CPU.
  • XDP_ABORTED: indicates eBPF program error.

XDP_PASS, XDP_TX and XDP_REDIRECT are specific cases of a forwarding action, whereas XDP_ABORTED is actually treated as a packet drop.

Let’s take a look at one example that uses most of these elements to program a simple network function.

Example: An IPv6 packet filter

The canonical example when introducing XDP is a DDoS filter. What such network function does is to drop packets if they’re coming from a suspicious origin. In my case, I’m going with something even simpler: a function that filters out all traffic except IPv6.

The advantage of this simpler function is that we don’t need to manage a list of suspicious addresses. Our program will simply examine the ethertype value of a packet and let it continue through the network stack or drop it, depending on whether it is an IPv6 packet or not.

SEC("prog")
int xdp_ipv6_filter_program(struct xdp_md *ctx)
{
    void *data_end = (void *)(long)ctx->data_end;
    void *data     = (void *)(long)ctx->data;
    struct ethhdr *eth = data;
    u16 eth_type = 0;

    /* parse_eth writes the ethertype through a pointer, so pass its address */
    if (!(parse_eth(eth, data_end, &eth_type))) {
        bpf_debug("Debug: Cannot parse L2\n");
        return XDP_PASS;
    }

    bpf_debug("Debug: eth_type:0x%x\n", ntohs(eth_type));
    /* eth_type is in network byte order; 0x86dd is the IPv6 ethertype */
    if (eth_type == htons(0x86dd)) {
        return XDP_PASS;
    } else {
        return XDP_DROP;
    }
}

The function xdp_ipv6_filter_program is our main program. We define a new section in the binary called prog. This serves as a hook between our program and XDP. Whenever XDP receives a packet, our code will be executed.

ctx represents a context, a struct which contains all the data necessary to access a packet. Our program calls parse_eth to fetch the ethertype. It then checks whether its value is 0x86dd (the IPv6 ethertype); if so, the packet passes. Otherwise, the packet is dropped. In addition, all the ethertype values are printed for debugging purposes.

bpf_debug is in fact a macro defined as:

#define bpf_debug(fmt, ...)                          \
    ({                                               \
        char ____fmt[] = fmt;                        \
        bpf_trace_printk(____fmt, sizeof(____fmt),   \
            ##__VA_ARGS__);                          \
    })

It uses the function bpf_trace_printk under the hood, a function which prints out messages in /sys/kernel/debug/tracing/trace_pipe.

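The function parse_eth takes the start of the packet (its Ethernet header) and the packet’s end, checks that the whole header fits inside the packet, and stores the ethertype in eth_type.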

static __always_inline
bool parse_eth(struct ethhdr *eth, void *data_end, u16 *eth_type)
{
    u64 offset;

    offset = sizeof(*eth);
    if ((void *)eth + offset > data_end)
        return false;
    *eth_type = eth->h_proto;
    return true;
}
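To experiment with the bounds check and the decision logic outside the kernel, here is a user-space Python model of the same filter. It is an illustration only (XDP cannot load it): it takes a raw Ethernet frame as bytes, mirrors parse_eth’s check that the 14-byte header fits before reading the ethertype, and returns the numeric values of enum xdp_action from linux/bpf.h. Like the C program, it ignores VLAN tags.

```python
import struct
from enum import IntEnum


class XdpAction(IntEnum):
    # Numeric values of enum xdp_action in linux/bpf.h
    XDP_ABORTED = 0
    XDP_DROP = 1
    XDP_PASS = 2
    XDP_TX = 3
    XDP_REDIRECT = 4


ETH_HLEN = 14          # dst MAC (6) + src MAC (6) + ethertype (2)
ETH_P_IPV6 = 0x86DD


def parse_eth(frame: bytes):
    """Return the ethertype in host order, or None if the frame is too short."""
    # Same check the verifier forces on the C version: never read past data_end.
    if len(frame) < ETH_HLEN:
        return None
    # "!H" reads a big-endian (network order) u16, like ntohs(eth->h_proto).
    (eth_type,) = struct.unpack_from("!H", frame, 12)
    return eth_type


def ipv6_filter(frame: bytes) -> XdpAction:
    eth_type = parse_eth(frame)
    if eth_type is None:
        return XdpAction.XDP_PASS  # cannot parse L2: let the stack handle it
    return XdpAction.XDP_PASS if eth_type == ETH_P_IPV6 else XdpAction.XDP_DROP
```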

Running external code in the kernel involves certain risks. For instance, an infinite loop may freeze the kernel, or a program may access an unrestricted area of memory. To avoid these potential hazards, a verifier is run when the eBPF code is loaded. The verifier walks all possible code paths, checking that our program doesn’t access out-of-range memory and that there are no out-of-bounds jumps. The verifier also ensures the program terminates in finite time.

The snippets above make up our eBPF program. Now we just need to compile it (the full source code is available at: xdp_ipv6_filter).

$ make

This generates xdp_ipv6_filter.o, the eBPF object file.

Now we’re going to load this object file into a network interface. There are two ways to do that:

  • Write a user-space program that loads the object file and attaches it to a network interface.
  • Use iproute2 to load the object file to an interface.

For this example, I’m going to use the latter method.

Currently only a limited number of network interfaces support XDP (ixgbe, i40e, mlx5, veth, tap, tun, virtio_net and others), although the list is growing. Some of these network interfaces support XDP at driver level. That means the XDP hook is implemented at the lowest point in the networking layer, just when the NIC receives a packet in the Rx ring. In other cases, the XDP hook is implemented at a higher point in the network stack. The former method offers better performance, although the latter makes XDP available for any network interface.

Luckily, veth interfaces are supported by XDP. I’m going to create a veth pair and attach the eBPF program to one of its ends. Remember that veth interfaces always come in pairs. It’s like a virtual patch cable connecting two interfaces: whatever is transmitted on one end arrives at the other end, and vice versa.

$ sudo ip link add dev veth0 type veth peer name veth1
$ sudo ip link set up dev veth0
$ sudo ip link set up dev veth1

Now I attach the eBPF program to veth1:

$ sudo ip link set dev veth1 xdp object xdp_ipv6_filter.o

You may have noticed I called the section for the eBPF program “prog”. That’s the name of the section iproute2 expects to find, and naming the section differently will result in an error.

If the program was successfully loaded I should see an xdp flag in the veth1 interface:

$ sudo ip link sh veth1
8: veth1@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 32:05:fc:9a:d8:75 brd ff:ff:ff:ff:ff:ff
    prog/xdp id 32 tag bdb81fb6a5cf3154 jited

To verify my program works as expected, I’m going to push a mix of IPv4 and IPv6 packets to veth0 (ipv4-and-ipv6-data.pcap). My sample has a total of 20 packets (10 IPv4 and 10 IPv6). Before doing that though, I’m going to launch a tcpdump program on veth1 which is ready to capture only 10 IPv6 packets.

$ sudo tcpdump "ip6" -i veth1 -w captured.pcap -c 10
tcpdump: listening on veth1, link-type EN10MB (Ethernet), capture size 262144 bytes

Send packets to veth0:

$ sudo tcpreplay -i veth0 ipv4-and-ipv6-data.pcap

The filtered packets arrived at the other end. The tcpdump program terminates since all the expected packets were received.

10 packets captured
10 packets received by filter
0 packets dropped by kernel
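To double-check that only IPv6 frames made it through, you can count the ethertypes stored in captured.pcap. The sketch below parses the classic pcap format that tcpdump writes with -w (24-byte global header, then a 16-byte header per record); it is a hypothetical helper, does not handle pcapng, and assumes an Ethernet link type.

```python
import struct
from collections import Counter

# Classic pcap magic numbers (microsecond and nanosecond variants),
# as they appear on disk in little- and big-endian files.
_LITTLE = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
_BIG = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")
LINKTYPE_ETHERNET = 1


def count_ethertypes(data: bytes) -> Counter:
    """Count the ethertypes of the Ethernet frames in a classic pcap capture."""
    if data[:4] in _LITTLE:
        endian = "<"
    elif data[:4] in _BIG:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"expected Ethernet link type, got {linktype}")

    counts = Counter()
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, incl_len, orig_len
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) >= 14:
            counts[struct.unpack_from("!H", frame, 12)[0]] += 1
    return counts
```

Reading the file with `open("captured.pcap", "rb").read()` and passing the bytes to `count_ethertypes` should report only 0x86dd entries if the filter behaved as expected.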

We can also print out /sys/kernel/debug/tracing/trace_pipe to check the ethertype values listed:

$ sudo cat /sys/kernel/debug/tracing/trace_pipe
tcpreplay-4496  [003] ..s1 15472.046835: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046847: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046855: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046862: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046869: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046878: 0: Debug: eth_type:0x800
tcpreplay-4496  [003] ..s1 15472.046885: 0: Debug: eth_type:0x800
tcpreplay-4496  [003] ..s1 15472.046892: 0: Debug: eth_type:0x800
tcpreplay-4496  [003] ..s1 15472.046903: 0: Debug: eth_type:0x800
tcpreplay-4496  [003] ..s1 15472.046911: 0: Debug: eth_type:0x800
...
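The bpf_debug lines can also be tallied programmatically. This hypothetical helper matches the "Debug: eth_type:0x…" format printed by the program above. Keep in mind that trace_pipe is a consuming read that blocks waiting for new data, so in practice you would save its output to a file (stopping cat with Ctrl-C) and feed that file’s lines to the function.

```python
import re
from collections import Counter

TRACE_RE = re.compile(r"Debug: eth_type:0x([0-9a-fA-F]+)")


def count_trace_ethertypes(lines):
    """Tally the eth_type values printed by bpf_debug in trace_pipe output."""
    counts = Counter()
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts
```

Since the program prints the ethertype before deciding, both the 0x86dd (passed) and 0x800 (dropped) values show up here.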

XDP: The future of in-kernel packet processing?

XDP started as a fast path for certain use cases, especially the ones which could result in an early packet drop (like a DDoS attack prevention solution). However, since a network function is nothing but a combination of basic primitives (reads, writes, forwarding, dropping…), all of them available via XDP/eBPF, it could be possible to use XDP for more than packet dropping. In fact, it could be used to implement any network function.

So what started as a fast path is gradually becoming the normal path. We’re now seeing tools such as iptables being rewritten in XDP/eBPF while keeping their user-level interfaces intact. The enormous performance gains of this new approach make the effort worthwhile. And since the hunger for more performance never ends, it seems reasonable to think that any other tool that can possibly be written in XDP/eBPF will follow a similar fate.

iptables vs nftables vs bpfilter

Source: Why is the kernel community replacing iptables with BPF?

Summary

In this article I took a closer look at XDP. I explained the motivations that led to its design. Through a simple example, I showed how XDP and eBPF work together to perform fast packet processing inside the kernel. XDP provides checkpoints within the kernel’s network stack. An eBPF program can hook into XDP events to perform an operation on a packet (modify its headers, drop it, forward it, etc).

XDP offers high-performance packet processing while maintaining interoperability with the rest of the networking subsystem, an advantage over full kernel bypass solutions. I didn’t go much into the internals of XDP and how it interacts with other parts of the networking subsystem, though. I encourage checking the first two links in the recommended readings section for a deeper understanding of XDP internals.

In the next article, the last in the series, I will cover the new AF_XDP socket address family and the implementation of a Snabb bridge for this new interface.

Recommended readings:

January 10, 2019 10:00 AM

January 08, 2019

Carlos García Campos

Epiphany automation mode

Last week I finally found some time to add the automation mode to Epiphany, which allows running automated tests using WebDriver. It’s important to note that the automation mode is not meant to be used by users or applications to control the browser remotely, but only by WebDriver automated tests. For that reason, the automation mode is incompatible with a primary user profile. A few other things are affected by the automation mode:

  • There’s no persistence. A private profile is created in tmp and only ephemeral web contexts are used.
  • URL entry is not editable, since users are not expected to interact with the browser.
  • An info bar is shown to notify the user that the browser is being controlled by automation.
  • The window decoration is orange to make it even clearer that the browser is running in automation mode.

So, how can I write tests to be run in Epiphany? First, you need to install a recent enough version of selenium. For now, only the Python API is supported. Selenium doesn’t have an Epiphany driver, but the WebKitGTK driver can be used with any WebKitGTK+ based browser by providing the browser information as part of the session capabilities.

from selenium import webdriver

options = webdriver.WebKitGTKOptions()
options.binary_location = 'epiphany'
options.add_argument('--automation-mode')
options.set_capability('browserName', 'Epiphany')
options.set_capability('version', '3.31.4')

ephy = webdriver.WebKitGTK(options=options, desired_capabilities={})
ephy.get('http://www.webkitgtk.org')
ephy.quit()

This is a very simple example that just opens Epiphany in automation mode, loads http://www.webkitgtk.org and closes Epiphany. A few comments about the example:

  • Version 3.31.4 will be the first one including the automation mode.
  • The parameter desired_capabilities shouldn’t be needed, but there’s a bug in selenium that has been fixed very recently.
  • WebKitGTKOptions.set_capability was added in selenium 3.14; if you have an older version, you can use the following snippet instead:
from selenium import webdriver

options = webdriver.WebKitGTKOptions()
options.binary_location = 'epiphany'
options.add_argument('--automation-mode')
capabilities = options.to_capabilities()
capabilities['browserName'] = 'Epiphany'
capabilities['version'] = '3.31.4'

ephy = webdriver.WebKitGTK(desired_capabilities=capabilities)
ephy.get('http://www.webkitgtk.org')
ephy.quit()

To simplify the driver instantiation, you can create your own Epiphany driver derived from the WebKitGTK one:

from selenium import webdriver

class Epiphany(webdriver.WebKitGTK):
    def __init__(self):
        options = webdriver.WebKitGTKOptions()
        options.binary_location = 'epiphany'
        options.add_argument('--automation-mode')
        options.set_capability('browserName', 'Epiphany')
        options.set_capability('version', '3.31.4')

        webdriver.WebKitGTK.__init__(self, options=options, desired_capabilities={})

ephy = Epiphany()
ephy.get('http://www.webkitgtk.org')
ephy.quit()

The same for Selenium < 3.14:

from selenium import webdriver

class Epiphany(webdriver.WebKitGTK):
    def __init__(self):
        options = webdriver.WebKitGTKOptions()
        options.binary_location = 'epiphany'
        options.add_argument('--automation-mode')
        capabilities = options.to_capabilities()
        capabilities['browserName'] = 'Epiphany'
        capabilities['version'] = '3.31.4'

        webdriver.WebKitGTK.__init__(self, desired_capabilities=capabilities)

ephy = Epiphany()
ephy.get('http://www.webkitgtk.org')
ephy.quit()
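If the same test suite has to run against both old and new Selenium releases, the choice between the two snippets can be made at run time. A minimal sketch, assuming you compare selenium.__version__ numerically; supports_set_capability is a hypothetical helper, not part of Selenium:

```python
import itertools


def supports_set_capability(version: str) -> bool:
    """True if a Selenium version string such as "3.141.0" is 3.14 or newer."""
    numbers = []
    for piece in version.split(".")[:2]:
        # Keep the leading digits only, so pre-release tags like "0a1" still parse.
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers) >= (3, 14)
```

Call it with selenium.__version__ and take the options.set_capability path when it returns True. Comparing the components as integers matters: as plain strings, "3.9" would sort after "3.14".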

by carlos garcia campos at January 08, 2019 05:22 PM

January 07, 2019

Diego Pino

A brief introduction to XDP and eBPF

In a previous post I explained how to build a kernel with XDP (eXpress Data Path) support. Having that feature enabled is mandatory in order to use XDP. XDP is a new Linux kernel component that greatly improves packet-processing performance.

In recent years, we have seen a rise of programming toolkits and techniques that overcome the limitations of the Linux kernel when it comes to high-performance packet processing. One of the most popular techniques is kernel bypass, which means skipping the kernel’s networking layer and doing all packet processing from user-space. Kernel bypass also involves managing the NIC from user-space, in other words, relying on a user-space driver to handle the NIC.

By giving full control of the NIC to a user-space program, we reduce the overhead introduced by the kernel (context switching, networking layer processing, interrupts, etc.), which is significant when working at speeds of 10Gbps or higher. Kernel bypass plus a combination of other features (batch packet processing) and performance tuning adjustments (NUMA awareness, CPU isolation, etc.) form the basis of high-performance user-space networking. Perhaps the poster child of this new approach to packet processing is Intel’s DPDK (Data Plane Development Kit), although other well-known toolkits and techniques are Cisco’s VPP (Vector Packet Processing), Netmap and, of course, Snabb.

The disadvantages of user-space networking are several:

  • An OS’s kernel is an abstraction layer for hardware resources. Since user-space programs need to manage their resources directly, they also need to manage their hardware. That often means writing their own drivers.
  • As the kernel-space is completely skipped, all the networking functionality provided by the kernel is skipped too. User-space programs need to reimplement functionality that might already be provided by the kernel or the OS.
  • Programs work as sandboxes, which severely limits their ability to interact, and be integrated, with other parts of the OS.

Essentially, user-space networking achieves high-speed performance by moving packet processing out of the kernel’s realm into user-space. XDP in fact does the opposite: it moves user-space networking programs (filters, mappers, routing, etc.) into the kernel’s realm. XDP allows us to execute our network function as soon as a packet hits the NIC, before it starts moving upwards into the kernel’s networking subsystem, which results in a significant increase in packet-processing speed. But how does the kernel make it possible for a user to execute their programs within the kernel’s realm? Before answering this question we need to take a look at BPF.

BPF and eBPF

Despite its somewhat misleading name, BPF (Berkeley Packet Filter) is in fact a virtual machine model. This VM was originally designed for packet filtering, hence its name.

One of the most prominent users of BPF is the tool tcpdump. When capturing packets with tcpdump, a user can define a packet-filtering expression. Only packets that match that expression will actually be captured. For instance, the expression “tcp dst port 80” captures all TCP packets whose destination port is 80. A compiler can reduce this expression to BPF bytecode, which tcpdump -d prints in human-readable form:

$ sudo tcpdump -d "tcp dst port 80"
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 6
(002) ldb      [20]
(003) jeq      #0x6             jt 4    jf 15
(004) ldh      [56]
(005) jeq      #0x50            jt 14   jf 15
(006) jeq      #0x800           jt 7    jf 15
(007) ldb      [23]
(008) jeq      #0x6             jt 9    jf 15
(009) ldh      [20]
(010) jset     #0x1fff          jt 15   jf 11
(011) ldxb     4*([14]&0xf)
(012) ldh      [x + 16]
(013) jeq      #0x50            jt 14   jf 15
(014) ret      #262144
(015) ret      #0

Basically what the program above does is:

  • Instruction (000): loads the 16-bit word at offset 12 of the packet into the accumulator. Offset 12 holds the packet’s ethertype.
  • Instruction (001): compares the value of the accumulator to 0x86dd, the ethertype value for IPv6. If they are equal, the program counter jumps to instruction (002); otherwise, it jumps to (006).
  • Instruction (006): compares the value to 0x800 (the ethertype value for IPv4). If they are equal, jump to (007); otherwise, jump to (015).

And so forth, until the packet-filtering program returns a result. This result is generally a boolean. Returning a non-zero value (instruction (014)) means the packet matched, whereas returning a zero value (instruction (015)) means the packet didn’t match.
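To make the walk-through concrete, here is a minimal Python sketch of a classic BPF interpreter that executes the listing above. It models only the opcodes that appear in that listing, with the accumulator A, the index register X and an implicit program counter. The opcode spellings ldxb_msh and ldh_ind and the packet-building helpers are my own hypothetical names, and jump targets are the absolute instruction numbers printed by tcpdump rather than the relative offsets of the real encoding.

```python
import struct

# The `tcpdump -d "tcp dst port 80"` listing as (opcode, k, jt, jf) tuples.
# Jump targets are absolute instruction numbers, as tcpdump prints them.
PROGRAM = [
    ("ldh", 12, 0, 0),          # (000) A = ethertype
    ("jeq", 0x86DD, 2, 6),      # (001) IPv6?
    ("ldb", 20, 0, 0),          # (002) A = IPv6 next header
    ("jeq", 0x6, 4, 15),        # (003) TCP?
    ("ldh", 56, 0, 0),          # (004) A = TCP destination port (IPv6)
    ("jeq", 0x50, 14, 15),      # (005) port 80?
    ("jeq", 0x800, 7, 15),      # (006) IPv4?
    ("ldb", 23, 0, 0),          # (007) A = IPv4 protocol
    ("jeq", 0x6, 9, 15),        # (008) TCP?
    ("ldh", 20, 0, 0),          # (009) A = IPv4 flags + fragment offset
    ("jset", 0x1FFF, 15, 11),   # (010) non-first fragment: reject
    ("ldxb_msh", 14, 0, 0),     # (011) X = 4 * (IPv4 header length nibble)
    ("ldh_ind", 16, 0, 0),      # (012) A = half-word at X + 16
    ("jeq", 0x50, 14, 15),      # (013) port 80?
    ("ret", 262144, 0, 0),      # (014) accept (snapshot length)
    ("ret", 0, 0, 0),           # (015) reject
]


def run(program, packet: bytes) -> int:
    """Execute a classic BPF program using registers A and X and an implicit PC."""
    a = x = 0
    pc = 0
    while True:
        op, k, jt, jf = program[pc]
        pc += 1
        try:
            if op == "ldh":          # 16-bit big-endian load at absolute offset k
                a = struct.unpack_from("!H", packet, k)[0]
            elif op == "ldb":        # 8-bit load at absolute offset k
                a = packet[k]
            elif op == "ldh_ind":    # 16-bit load at offset X + k
                a = struct.unpack_from("!H", packet, x + k)[0]
            elif op == "ldxb_msh":   # X = 4 * (packet[k] & 0xf)
                x = 4 * (packet[k] & 0x0F)
            elif op == "jeq":
                pc = jt if a == k else jf
            elif op == "jset":
                pc = jt if a & k else jf
            elif op == "ret":
                return k
            else:
                raise ValueError(f"unsupported opcode: {op}")
        except (IndexError, struct.error):
            return 0  # a load past the end of the packet rejects it


def ipv4_tcp(dst_port: int, proto: int = 6) -> bytes:
    """Ethernet + minimal IPv4 header (IHL=5) + TCP header, zeros elsewhere."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = bytes([0x45, 0]) + bytes(6) + bytes([64, proto]) + bytes(10)
    tcp = struct.pack("!HH", 40000, dst_port) + bytes(16)
    return eth + ip + tcp


def ipv6_tcp(dst_port: int) -> bytes:
    """Ethernet + IPv6 header with next header = TCP + TCP header."""
    eth = bytes(12) + struct.pack("!H", 0x86DD)
    ip6 = bytes(6) + bytes([6, 64]) + bytes(32)
    tcp = struct.pack("!HH", 40000, dst_port) + bytes(16)
    return eth + ip6 + tcp
```

For example, run(PROGRAM, ipv4_tcp(80)) returns 262144 (match), while run(PROGRAM, ipv4_tcp(443)) returns 0 (no match).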

The BPF VM and its bytecode were introduced by Steve McCanne and Van Jacobson in late 1992, in their paper The BSD Packet Filter: A New Architecture for User-level Packet Capture, which was presented for the first time at the USENIX Winter ‘93 Conference.

Since BPF is a VM, it defines an environment where programs are executed. Besides a bytecode, it also defines a packet-based memory model (load instructions implicitly operate on the packet being processed), registers (A and X; the Accumulator and Index register), a scratch memory store and an implicit Program Counter. Interestingly, BPF’s bytecode was modeled after the MOS Technology 6502 ISA. As Steve McCanne recalls in his Sharkfest ‘11 keynote, he was familiar with 6502 assembly from his junior high-school days programming on an Apple II, and that influenced him when he designed the BPF bytecode.

The Linux kernel has featured BPF support since v2.5, mainly added by Jay Schulist. There were no major changes in the BPF code until 2011, when Eric Dumazet added a JIT compiler to the BPF interpreter (Source: A JIT for packet filters). Instead of interpreting BPF bytecode, the kernel was now able to translate BPF programs directly into native code for a target architecture: x86, ARM, MIPS, etc.

Later on, in 2014, Alexei Starovoitov introduced a new BPF JIT. This new JIT was actually a new architecture based on BPF, known as eBPF. I think both VMs co-existed for some time, but nowadays packet filtering is implemented on top of eBPF. In fact, a lot of documentation now refers to eBPF as BPF, and the classic BPF is known as cBPF.

eBPF extends the classic BPF virtual machine in several ways:

  • Takes advantage of modern 64-bit architectures. eBPF uses 64-bit registers and increases the number of available registers from 2 (Accumulator and X register) to 10. eBPF also extends the set of opcodes (BPF_MOV, BPF_JNE, BPF_CALL…).
  • Decoupled from the networking subsystem. BPF was bound to a packet-based data model. Since it was used for packet filtering, its code lived within the networking subsystem. However, the eBPF VM is no longer bound to a data model and can be used for any purpose. It’s now possible to attach an eBPF program to a tracepoint or to a kprobe. This opens the door to using eBPF for instrumentation, performance analysis and many more uses within other kernel subsystems. The eBPF code now lives in its own path: kernel/bpf.
  • Global data stores called Maps. Maps are key-value stores that allow the interchange of data between user-space and kernel-space. eBPF provides several types of Maps.
  • Helper functions, such as packet rewriting, checksum calculation or packet cloning. Unlike in user-space programming, these functions get executed inside the kernel. Helpers are also the way eBPF programs interact with the rest of the kernel, since eBPF programs cannot issue arbitrary system calls.
  • Tail-calls. eBPF programs are limited to 4096 instructions. The tail-call feature allows an eBPF program to pass control to a new eBPF program, overcoming this limitation (up to 32 programs can be chained).

eBPF: an example

The Linux kernel sources include several eBPF examples. They’re available at samples/bpf/. To compile these examples simply type:

$ sudo make samples/bpf/

Instead of coding a new eBPF example, I’m going to reuse one of the samples available in samples/bpf/. I will go through some parts of the code and explain how it works. The example I chose was the tracex4 program.

Generally, all the examples at samples/bpf/ consist of 2 files. In this case:

  • tracex4_kern.c: the eBPF program.
  • tracex4_user.c: the main program.

We then need to compile tracex4_kern.c to eBPF bytecode. At the moment, gcc lacks a backend for eBPF. Luckily, clang can emit eBPF bytecode. The Makefile uses clang to compile tracex4_kern.c into an object file.

I mentioned earlier that one of the most interesting features of eBPF is Maps. Maps are key/value stores that allow exchanging data between user-space and kernel-space programs. tracex4_kern defines one map:

struct pair {
    u64 val;
    u64 ip;
};  

struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(long),
    .value_size = sizeof(struct pair),
    .max_entries = 1000000,
};

BPF_MAP_TYPE_HASH is one of the many Map types offered by eBPF; in this case, it’s simply a hash table. You may also have noticed the SEC("maps") declaration. SEC is a macro used to create a new section in the binary. The tracex4_kern example defines two more sections, one for each eBPF program:

SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{   
    long ptr = PT_REGS_PARM2(ctx);

    bpf_map_delete_elem(&my_map, &ptr); 
    return 0;
}
    
SEC("kretprobe/kmem_cache_alloc_node") 
int bpf_prog2(struct pt_regs *ctx)
{
    long ptr = PT_REGS_RC(ctx);
    long ip = 0;

    // get ip address of kmem_cache_alloc_node() caller
    BPF_KRETPROBE_READ_RET_IP(ip, ctx);

    struct pair v = {
        .val = bpf_ktime_get_ns(),
        .ip = ip,
    };
    
    bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
    return 0;
}   

These two functions delete an entry from the map when an object is freed (kprobe/kmem_cache_free) and add a new entry when an object is allocated (kretprobe/kmem_cache_alloc_node). All the calls in capital letters are actually macros defined in bpf_helpers.h.
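Both programs rely on the update and delete semantics of the hash map. The following Python class is a toy model of those rules, assuming the behavior I understand BPF_MAP_TYPE_HASH to have: the BPF_ANY, BPF_NOEXIST and BPF_EXIST update flags (same values as in linux/bpf.h), -E2BIG when a full map receives a new key, and -ENOENT when deleting a missing key. It is not the kernel implementation, and HashMapModel is a hypothetical name.

```python
import errno

# Update flags, with the values defined in linux/bpf.h.
BPF_ANY = 0      # create a new element or update an existing one
BPF_NOEXIST = 1  # create a new element only if it doesn't exist yet
BPF_EXIST = 2    # update an existing element only


class HashMapModel:
    """Toy model of BPF_MAP_TYPE_HASH update/lookup/delete rules.

    Return values follow the kernel convention: 0 or a negative errno.
    """

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data = {}

    def update(self, key, value, flags: int = BPF_ANY) -> int:
        if flags not in (BPF_ANY, BPF_NOEXIST, BPF_EXIST):
            return -errno.EINVAL
        exists = key in self._data
        if flags == BPF_NOEXIST and exists:
            return -errno.EEXIST
        if flags == BPF_EXIST and not exists:
            return -errno.ENOENT
        if not exists and len(self._data) >= self.max_entries:
            return -errno.E2BIG  # full map: only replacing existing keys works
        self._data[key] = value
        return 0

    def lookup(self, key):
        return self._data.get(key)  # None plays the role of a NULL pointer

    def delete(self, key) -> int:
        if key not in self._data:
            return -errno.ENOENT
        del self._data[key]
        return 0
```

In tracex4 terms, bpf_prog2 corresponds to update(ptr, (timestamp, ip), BPF_ANY) and bpf_prog1 to delete(ptr); a delete for an object allocated before tracing started simply fails with -ENOENT, which bpf_prog1 ignores.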

If I dump the sections of the object file, I should be able to see these new sections defined:

$ objdump -h tracex4_kern.o

tracex4_kern.o:     file format elf64-little

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 kprobe/kmem_cache_free 00000048  0000000000000000  0000000000000000  00000040  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 kretprobe/kmem_cache_alloc_node 000000c0  0000000000000000  0000000000000000  00000088  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  3 maps          0000001c  0000000000000000  0000000000000000  00000148  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  4 license       00000004  0000000000000000  0000000000000000  00000164  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  5 version       00000004  0000000000000000  0000000000000000  00000168  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  6 .eh_frame     00000050  0000000000000000  0000000000000000  00000170  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
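The maps section is 0x1c (28) bytes long, which matches a single map definition made of seven 32-bit fields. The sketch below decodes such a section with Python’s struct module. The seven-field layout of struct bpf_map_def is an assumption based on the samples/bpf/bpf_helpers.h of that time, and decode_maps_section is a hypothetical helper, not part of any loader.

```python
import struct

# Assumed layout of struct bpf_map_def: seven unsigned ints, little-endian
# (objdump reports the object file as elf64-little).
BPF_MAP_DEF = struct.Struct("<7I")
FIELDS = ("type", "key_size", "value_size", "max_entries",
          "map_flags", "inner_map_idx", "numa_node")
BPF_MAP_TYPE_HASH = 1  # value in enum bpf_map_type (linux/bpf.h)


def decode_maps_section(data: bytes):
    """Split the raw contents of a "maps" section into one dict per map."""
    if len(data) % BPF_MAP_DEF.size:
        raise ValueError("section size is not a multiple of struct bpf_map_def")
    return [dict(zip(FIELDS, BPF_MAP_DEF.unpack_from(data, off)))
            for off in range(0, len(data), BPF_MAP_DEF.size)]


# What my_map would look like on a 64-bit target: sizeof(long) == 8 and
# sizeof(struct pair) == 16 (two u64 fields); the remaining fields are zero.
my_map_bytes = BPF_MAP_DEF.pack(BPF_MAP_TYPE_HASH, 8, 16, 1_000_000, 0, 0, 0)
```

decode_maps_section(my_map_bytes) yields a single entry whose max_entries is 1000000, mirroring the C definition above.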

Then there is tracex4_user.c, the main program. Basically, the program listens to kmem_cache_alloc_node and kmem_cache_free events. When one of these events happens, the corresponding eBPF code is executed. The eBPF code stores the allocation timestamp and the caller’s instruction pointer (IP) for each allocated object into a map, which the main program prints in a loop. Example:

$ sudo ./tracex4
obj 0xffff8d6430f60a00 is  2sec old was allocated at ip ffffffff9891ad90
obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
obj 0xffff8d5f80161780 is  6sec old was allocated at ip ffffffff98090e8f
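The interplay between the two eBPF programs and the user-space loop can be modeled in a few lines of Python. This is a simulation, not the sample’s code: the handler names are hypothetical, and the one-second minimum age before an object is reported is my assumption about print_old_objects.

```python
NSEC_PER_SEC = 1_000_000_000


class Tracex4Model:
    """Stand-in for tracex4: my_map maps pointer -> (alloc time in ns, caller ip)."""

    def __init__(self):
        self.my_map = {}

    def on_alloc_return(self, ptr: int, caller_ip: int, now_ns: int) -> None:
        # Like bpf_prog2 (kretprobe on kmem_cache_alloc_node): record the object.
        self.my_map[ptr] = (now_ns, caller_ip)

    def on_free(self, ptr: int) -> None:
        # Like bpf_prog1 (kprobe on kmem_cache_free): the object is gone.
        self.my_map.pop(ptr, None)

    def old_objects(self, now_ns: int, min_age_ns: int = NSEC_PER_SEC) -> list:
        # Like the user-space loop: report objects alive for at least min_age_ns.
        lines = []
        for ptr, (alloc_ns, ip) in self.my_map.items():
            age = now_ns - alloc_ns
            if age < min_age_ns:
                continue
            lines.append("obj 0x%x is %2dsec old was allocated at ip %x"
                         % (ptr, age // NSEC_PER_SEC, ip))
        return lines
```

Objects that get freed disappear from the map before they can be reported, so only long-lived allocations show up in the output.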

How are the user-space program and the eBPF program connected? On initialization, tracex4_user.c loads the tracex4_kern.o object file using the load_bpf_file function:

int main(int ac, char **argv)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    char filename[256];
    int i;

    snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);

    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
        return 1;
    }

    if (load_bpf_file(filename)) {
        printf("%s", bpf_log_buf);
        return 1;
    }

    for (i = 0; ; i++) {
        print_old_objects(map_fd[1]);
        sleep(1);
    }

    return 0;
}

When load_bpf_file is executed, the probes defined in the eBPF file are added to /sys/kernel/debug/tracing/kprobe_events. We’re now listening to those events, and our program can do something when they happen:

$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/kmem_cache_free kmem_cache_free
r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node
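Each line follows the kprobe_events syntax: p for a kprobe, r for a kretprobe, an optional GROUP/EVENT name, then the probed symbol and optional fetch arguments. A simplified Python parser, written from my reading of the kernel’s kprobe tracing documentation and without validating details such as module prefixes or offsets, could look like this; parse_kprobe_event is a hypothetical name.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified grammar of a kprobe_events line:
#   p[:[GRP/]EVENT] SYMBOL[+offs] [FETCHARGS]      (kprobe)
#   r[MAXACTIVE][:[GRP/]EVENT] SYMBOL [FETCHARGS]  (kretprobe)
_LINE = re.compile(
    r"^(?P<kind>[pr])(?P<maxactive>\d*)"
    r"(?::(?:(?P<group>[^/\s]+)/)?(?P<event>\S+))?"
    r"\s+(?P<target>\S+)(?:\s+(?P<args>.+))?$"
)


@dataclass
class ProbeEvent:
    kind: str             # "kprobe" or "kretprobe"
    group: Optional[str]  # e.g. "kprobes"
    event: Optional[str]  # event name inside the group
    target: str           # probed symbol (optionally +offset) or address
    fetchargs: str        # extra arguments to record, if any


def parse_kprobe_event(line: str) -> ProbeEvent:
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a kprobe_events line: {line!r}")
    return ProbeEvent(
        kind="kprobe" if m["kind"] == "p" else "kretprobe",
        group=m["group"],
        event=m["event"],
        target=m["target"],
        fetchargs=m["args"] or "",
    )
```

Applied to the two lines above, it reports a kprobe on kmem_cache_free and a kretprobe on kmem_cache_alloc_node, both in the kprobes group.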

All the other programs in samples/bpf/ follow a similar structure. There are always two files:

  • XXX_kern.c: the eBPF program.
  • XXX_user.c: the main program.

The eBPF program defines Maps and functions hooked to a binary section. When the kernel emits a certain type of event (a tracepoint, for instance) our hooks will be executed. Maps are used to exchange data between the kernel program and the user-space program.
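The link between a hook and its section is just the section name. The hypothetical helper below illustrates the idea with a small, partial mapping of name prefixes to attachment types, applied to the section list printed by objdump earlier. The real loader in samples/bpf/bpf_load.c recognizes more prefixes and does far more work (creating maps, loading programs, attaching probes).

```python
# Partial, illustrative mapping of section-name prefixes to attachment types.
ATTACH_PREFIXES = {
    "kprobe/": "kprobe",
    "kretprobe/": "kretprobe",
    "tracepoint/": "tracepoint",
}
DATA_SECTIONS = {"maps", "license", "version"}


def classify_section(name: str):
    """Return (role, detail) for an ELF section name of an eBPF object file."""
    if name in DATA_SECTIONS:
        return ("data", name)
    for prefix, kind in ATTACH_PREFIXES.items():
        if name.startswith(prefix):
            return (kind, name[len(prefix):])  # e.g. the probed kernel function
    return ("ignored", name)


sections = [".text", "kprobe/kmem_cache_free", "kretprobe/kmem_cache_alloc_node",
            "maps", "license", "version", ".eh_frame"]
plan = [classify_section(s) for s in sections]
```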

Wrapping up

In this article I have covered BPF and eBPF from a high-level view. I’m aware there are a lot of resources and information about eBPF nowadays, but I felt I needed to explain it in my own words. Please check out the list of recommended readings for further information.

In the next article I will cover XDP and its relation to eBPF.

Recommended readings:

January 07, 2019 08:00 AM

December 08, 2018

Philippe Normand

Web overlay in GStreamer with WPEWebKit

After a year or two of hiatus I attended the GStreamer conference, which happened in beautiful Edinburgh. It was great to meet friends from the community again and learn about what’s going on in the multimedia world. The quality of the talks was great, and the videos are published online, as usual, on Ubicast. I delivered a talk about the multimedia support in WPEWebKit; you can watch it there, and the slides are also available.

One of the many interesting presentations was about GStreamer for cloud-based live video. Usually anything with the word cloud would tend to draw my attention away but for some reason I attended this presentation, and didn’t regret it! The last demo presented by the BBC folks was about overlaying Web content on native video streams. It’s an interesting use-case for live TV broadcasting for instance. A web page provides dynamic notifications popping up and down, the web page is rendered with a transparent background and blended over the live video stream. The BBC folks implemented a GStreamer source element relying on CEF for their Brave project.

So here you wonder, why am I talking about Chromium Embedded Framework (CEF)? Isn’t this post about WPEWebKit? After seeing the demo from the Brave developers I immediately thought WPE could be a great fit for this HTML overlay use-case too! So a few weeks after the conference I finally had the time to start working on the WPE GStreamer plugin. My colleague Žan Doberšek, WPE’s founder hacker, provided a nice solution for the initial rendering issues of the prototype, many thanks to him!

Here’s a first example, a basic web-browser with gst-play:

$ gst-play-1.0 --videosink gtkglsink wpe://https://gnome.org

A GTK window opens up and the GNOME homepage should load. You can click on links too! To overlay a web page on top of a video you can use a pipeline like this one:

$ gst-launch-1.0 glvideomixer name=m sink_1::zorder=0 sink_0::height=818 sink_0::width=1920 ! gtkglsink \
 wpesrc location="file:///home/phil/Downloads/plunk/index.html" draw-background=0 ! m. \
 uridecodebin uri="http://192.168.1.44/Sintel.2010.1080p.mkv" name=d d. ! queue ! glupload \
  ! glcolorconvert ! m.

which can be represented with this simplified graph:

The advantage of this approach is that many heavy-lifting tasks happen in the GPU. WPE loads the page using its WPENetworkProcess external process, parses everything (DOM, CSS, JS, …) and renders it as an EGLImage, shared with the UIProcess (the GStreamer application, gst-launch in this case). In most situations decodebin will use a hardware decoder. The decoded video frames are uploaded to the GPU and composited with the EGLImages representing the web page, in a single OpenGL scene, using the glvideomixer element.
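With a transparent page background, compositing boils down to the classic “over” operator. As a rough CPU-side model of what happens per pixel on the GPU, assuming non-premultiplied 8-bit RGBA for the page and opaque RGB for the video (glvideomixer’s actual blending is configurable per pad and runs in OpenGL; the function names are hypothetical):

```python
def blend_over(overlay_rgba, video_rgb):
    """Composite one non-premultiplied RGBA overlay pixel over an opaque video pixel."""
    r, g, b, a = overlay_rgba
    alpha = a / 255.0
    return tuple(round(c * alpha + v * (1.0 - alpha))
                 for c, v in zip((r, g, b), video_rgb))


def blend_row(overlay_row, video_row):
    """Apply blend_over pixel by pixel to one row of the frame."""
    return [blend_over(o, v) for o, v in zip(overlay_row, video_row)]
```

Fully transparent page pixels (alpha 0) leave the video untouched, while opaque ones (alpha 255) replace it, which is exactly what makes the notification overlays appear on top of the stream.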

The initial version of the GstWPE plugin is now part of the gst-plugins-bad staging area, where most new plugins are uploaded for further improvements later on. Speaking of improvements, the following tasks have been identified:

  • The wpesrc draw-background property is not yet operational due to missing WPEWebKit API for background-color configuration support. I expect to complete this task very soon; interested people can follow this bugzilla ticket.
  • Audio support: WPEWebKit currently provides only EGLImages to the application side. The audio session is rendered directly to GStreamer’s autoaudiosink in WebKit, so there’s currently no audio sharing support in wpesrc.
  • DMABuf support as an alternative to EGLImages. WPEWebKit internally leverages linux-dmabuf support already but doesn’t expose the file descriptors and plane information.
  • Better navigation events support. GStreamer’s navigation events API was initially designed mostly for DVD menu navigation use-cases, and the exposed input event information is not a perfect match for WPEWebKit, which expects hardware-level information from keyboard, mouse and touch devices.

There are more ways and use-cases related to WPE, and I expect to unveil another WPE embedding project very soon. Watch this space! As usual, many thanks to my Igalia colleagues for sponsoring this work. We are always happy to hear what others are doing with WPE and to help improve it, so don’t hesitate to get in touch!

by Philippe Normand at December 08, 2018 02:09 PM

GStreamer’s playbin3 overview for application developers

Multimedia applications based on GStreamer usually handle playback with the playbin element. I recently added support for playbin3 in WebKit. This post aims to document the changes needed on application side to support this new generation flavour of playbin.

So, first off, why is it named playbin3 anyway? The GStreamer 0.10.x series had a playbin element, but a first rewrite (playbin2) made it obsolete in the GStreamer 1.x series, so playbin2 was renamed to playbin. That’s why a second rewrite is nicknamed playbin3, I suppose :)

Why should you care about playbin3? Playbin3 (and the elements it’s using internally: parsebin, decodebin3, uridecodebin3 among others) is the result of a deep re-design of playbin2 (along with decodebin2 and uridecodebin) to better support:

  • gapless playback
  • audio cross-fading support (not yet implemented)
  • adaptive streaming
  • reduced CPU, memory and I/O resource usage
  • faster stream switching and full control over the stream selection process

This work was carried out mostly by Edward Hervey, who presented it in detail at 3 GStreamer conferences. If you want to learn more about this and the internals of playbin3, make sure to watch his awesome presentations at the 2015 gst-conf, 2016 gst-conf and 2017 gst-conf.

Playbin3 was added in GStreamer 1.10. It is still considered experimental, but in my experience it already works very well. Just keep in mind you should use at least the latest GStreamer 1.12 (or even the upcoming 1.14) release before reporting any issue in Bugzilla. Playbin3 is not a drop-in replacement for playbin; both elements share only a subset of GObject properties and signals. However, if you don’t want to modify your application source code just yet, it’s very easy to try playbin3 anyway:

$ USE_PLAYBIN3=1 my-playbin-based-app

Setting the USE_PLAYBIN3 environment variable enables a code path inside the GStreamer playback plugin which swaps the playbin element for the playbin3 element. This trick provides a glimpse of the playbin3 element for the laziest people :) The problem is that, depending on your use of playbin, you might get runtime warnings. Here’s an example with the Totem player:

$ USE_PLAYBIN3=1 totem ~/Videos/Agent327.mp4
(totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'video-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'

(totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'audio-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'

(totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'text-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'

(totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'video-tags-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'

(totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'audio-tags-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'

(totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'text-tags-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'
sys:1: Warning: g_object_get_is_valid_property: object class 'GstPlayBin3' has no property named 'n-audio'
sys:1: Warning: g_object_get_is_valid_property: object class 'GstPlayBin3' has no property named 'n-text'
sys:1: Warning: ../../../../gobject/gsignal.c:3492: signal name 'get-video-pad' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'

As mentioned previously, playbin and playbin3 don’t share the same set of GObject properties and signals, so some changes in your application are required in order to use playbin3.

If your application is based on the GstPlayer library, then you should set the GST_PLAYER_USE_PLAYBIN3 environment variable. GstPlayer already handles both playbin and playbin3, so no changes are needed in your application if you use GstPlayer!

Ok, so what if your application relies directly on playbin? Some changes are needed! If you previously used playbin stream selection properties and signals, you will now need to handle the GstStream and GstStreamCollection APIs. Playbin3 will emit a stream collection message on the bus, which is very nice because the collection includes information (metadata!) about the streams (or tracks) the media asset contains. In playbin this was handled with a bunch of signals (audio-tags-changed, audio-changed, etc), properties (n-audio, n-video, etc) and action signals (get-audio-tags, get-audio-pad, etc). The new GstStream API provides a centralized and non-playbin-specific access point for all this information. To select streams with playbin3 you now need to send a select_streams event so that the demuxer knows exactly which streams should be exposed to downstream elements. That means potentially improved performance! Once playbin3 has completed the stream selection, it will emit a streams selected message; the application should handle this message and potentially update its internal state about the selected streams. This is also the best moment to update your UI regarding the selected streams (like audio track language, video track dimensions, etc).
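The application-side decision in that flow is: walk the collection, pick one stream per type, and send the chosen stream ids back. In real code the data comes from gst_stream_collection_get_size(), gst_stream_collection_get_stream(), gst_stream_get_stream_type() and the language-code tag from gst_stream_get_tags(), and the ids go into gst_event_new_select_streams(). The Python sketch below models only the selection policy with plain data classes; StreamInfo, choose_streams and the “prefer one language” rule are hypothetical illustrations, not GStreamer API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StreamInfo:
    """Stand-in for the fields an application reads from each GstStream."""
    stream_id: str
    stream_type: str     # "audio", "video" or "text"
    language: str = ""   # e.g. taken from the language-code tag


def choose_streams(collection: Iterable[StreamInfo],
                   preferred_language: str = "en") -> List[str]:
    """Pick one stream per type, preferring the given language.

    Returns the stream ids to put in a select-streams event.
    """
    chosen = {}
    for stream in collection:
        current = chosen.get(stream.stream_type)
        if current is None or (
            stream.language == preferred_language
            and current.language != preferred_language
        ):
            chosen[stream.stream_type] = stream
    return [s.stream_id for s in chosen.values()]
```

Keeping this policy separate from the bus handling makes it easy to re-run when a new stream collection message arrives, for example after an adaptive stream adds tracks.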

Another small difference between playbin and playbin3 concerns the source element setup. In playbin there is a read-only source GObject property and a source-setup GObject signal. In playbin3 only the latter is available, so your application should rely on source-setup instead of the notify::source GObject signal.

The gst-play-1.0 playback utility program already supports playbin3 so it provides a good source of inspiration if you consider porting your application to playbin3. As mentioned at the beginning of this post, WebKit also now supports playbin3, however it needs to be enabled at build time using the CMake -DUSE_GSTREAMER_PLAYBIN3=ON option. This feature is not part of the WebKitGTK+ 2.20 series but should be shipped in 2.22. As a final note I wanted to acknowledge my favorite worker-owned coop Igalia for allowing me to work on this WebKit feature and also our friends over at Centricular for all the quality work on playbin3.

by Philippe Normand at December 08, 2018 09:48 AM

December 05, 2018

Samuel Iglesias

VK_KHR_shader_float_controls and Mesa support

Khronos Group has published two new extensions for Vulkan: VK_KHR_shader_float16_int8 and VK_KHR_shader_float_controls. In this post, I will talk about VK_KHR_shader_float_controls, which is the extension I have been implementing in the Anvil driver, the open-source Intel Vulkan driver, as part of my job at Igalia. For information about VK_KHR_shader_float16_int8 and its implementation in Mesa, you can read Iago’s blogpost.

The Vulkan Working Group has defined a new extension, VK_KHR_shader_float_controls, which allows applications to query and override the implementation’s default floating point behavior for rounding modes, denormals, signed zero and infinity. From the Vulkan application developer’s perspective, VK_KHR_shader_float_controls defines a new structure called VkPhysicalDeviceFloatControlsPropertiesKHR where drivers expose the supported capabilities: the rounding modes supported for each floating point data type, how denormals are expected to be handled by the hardware (either flushed to zero or with their bits preserved), and whether signed zero, infinity and NaN values have their bits preserved.

typedef struct VkPhysicalDeviceFloatControlsPropertiesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           separateDenormSettings;
    VkBool32           separateRoundingModeSettings;
    VkBool32           shaderSignedZeroInfNanPreserveFloat16;
    VkBool32           shaderSignedZeroInfNanPreserveFloat32;
    VkBool32           shaderSignedZeroInfNanPreserveFloat64;
    VkBool32           shaderDenormPreserveFloat16;
    VkBool32           shaderDenormPreserveFloat32;
    VkBool32           shaderDenormPreserveFloat64;
    VkBool32           shaderDenormFlushToZeroFloat16;
    VkBool32           shaderDenormFlushToZeroFloat32;
    VkBool32           shaderDenormFlushToZeroFloat64;
    VkBool32           shaderRoundingModeRTEFloat16;
    VkBool32           shaderRoundingModeRTEFloat32;
    VkBool32           shaderRoundingModeRTEFloat64;
    VkBool32           shaderRoundingModeRTZFloat16;
    VkBool32           shaderRoundingModeRTZFloat32;
    VkBool32           shaderRoundingModeRTZFloat64;
} VkPhysicalDeviceFloatControlsPropertiesKHR;

The driver fills this structure when the application calls vkGetPhysicalDeviceProperties2() with a pointer to it chained in the pNext list of the VkPhysicalDeviceProperties2 structure. With that, we know whether the driver supports the SPIR-V capabilities we want to use in our shaders. If separate*Settings are true, remember to check the value of the property for each floating point bit-size type you are planning to work with.

The required bits to enable such capabilities in a SPIR-V shader are the following:

  1. Enable the extension: OpExtension "SPV_KHR_float_controls"
  2. Enable the desired capability. For example: OpCapability DenormFlushToZero
  3. Specify where to apply it. For example, if we would like to flush to zero all fp64 denormals in the %main function of a shader: OpExecutionMode %main DenormFlushToZero 64. If we want to apply different modes, we repeat that line with the needed ones.
  4. Profit!
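Those steps can be wrapped in a small generator. The Python sketch below checks each requested (execution mode, bit width) pair against a dict standing in for the queried VkPhysicalDeviceFloatControlsPropertiesKHR members, then emits the matching SPIR-V assembly lines. It relies on the capability and execution mode names being identical in SPV_KHR_float_controls. The function and the dict-based properties are hypothetical, and the output is a fragment to splice into a complete module (which also needs OpMemoryModel, OpEntryPoint and so on), not a module on its own.

```python
# Each SPV_KHR_float_controls execution mode and the name prefix of the
# matching VkPhysicalDeviceFloatControlsPropertiesKHR member.
MODE_TO_PROPERTY = {
    "DenormPreserve": "shaderDenormPreserveFloat",
    "DenormFlushToZero": "shaderDenormFlushToZeroFloat",
    "SignedZeroInfNanPreserve": "shaderSignedZeroInfNanPreserveFloat",
    "RoundingModeRTE": "shaderRoundingModeRTEFloat",
    "RoundingModeRTZ": "shaderRoundingModeRTZFloat",
}


def float_controls_spirv(modes, props, entry_point="%main"):
    """Return SPIR-V assembly lines enabling `modes`, a list of (name, bit_width).

    `props` stands in for the queried properties structure,
    e.g. {"shaderDenormFlushToZeroFloat64": True}.
    """
    capabilities = []
    for mode, width in modes:
        prop = f"{MODE_TO_PROPERTY[mode]}{width}"
        if not props.get(prop, False):
            raise ValueError(f"{mode} for {width}-bit floats is not supported ({prop})")
        if mode not in capabilities:
            capabilities.append(mode)
    # SPIR-V module layout: capabilities first, then extensions; execution
    # modes come later, after the entry point declarations.
    lines = [f"OpCapability {c}" for c in capabilities]
    lines.append('OpExtension "SPV_KHR_float_controls"')
    lines += [f"OpExecutionMode {entry_point} {m} {w}" for m, w in modes]
    return lines
```

For the fp64 flush-to-zero example above, the generator yields the capability line, the extension line and OpExecutionMode %main DenormFlushToZero 64, and it refuses to emit anything the driver did not advertise.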

I implemented support for this extension for Anvil’s supported GPUs (Broadwell, Skylake, Kabylake and newer), although we don’t support all the capabilities. For example, on Broadwell float16 denormals are not supported, and on the other generations flushing float16 denormals to zero is not supported for all instructions.

If you are interested, the patches are now under review :-) As there is no real-world code using this feature yet, please file any bug you find about it in our bugzilla.

December 05, 2018 03:57 PM

December 04, 2018

Iago Toral

VK_KHR_shader_float16_int8 on Anvil

The last time I talked about my driver work was to announce the implementation of the shaderInt16 feature for the Anvil Vulkan driver back in May, and since then I have been working on VK_KHR_shader_float16_int8, a new Vulkan extension recently announced by the Khronos group, for which I have just posted initial patches in mesa-dev supporting Broadwell and later Intel platforms.

As you probably guessed from the name, this extension enables Vulkan to consume SPIR-V shaders that use Float16 and Int8 types in arithmetic operations, extending the functionality included with VK_KHR_16bit_storage and VK_KHR_8bit_storage, which were limited to load/store operations. In theory, applications that do not need the range and precision of regular 32-bit floating point and integers can use these new types to improve performance by increasing ALU throughput and reducing register pressure, which on some platforms can also lead to improved parallelism.

On Intel platforms, initial testing done by Intel suggests that better ALU throughput is expected when issuing half-float instructions. Lower register pressure is also expected, at least for SIMD16 fragment and compute shaders, where we can pack all 16 channels’ worth of half-float data into a single GPU register, which could significantly improve performance for shaders that would otherwise need to spill registers to memory.
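The register-pressure argument is simple arithmetic. Assuming a 256-bit (32-byte) general register, as on the Gen platforms discussed here, 16 half-float channels occupy 16 × 2 = 32 bytes, exactly one register, while 16 single-precision channels need 64 bytes, that is, two registers. The Python sketch below does the same calculation and also shows the precision cost of binary16; registers_needed and to_half are hypothetical helpers.

```python
import struct

GRF_BYTES = 32      # assumption: one Gen GPU register (GRF) is 256 bits wide
SIMD_CHANNELS = 16  # a SIMD16 shader processes 16 channels at once


def registers_needed(fmt: str, channels: int = SIMD_CHANNELS) -> int:
    """Registers needed to hold one value per channel, for a struct format char."""
    size = struct.calcsize(f"<{channels}{fmt}")
    return -(-size // GRF_BYTES)  # ceiling division


def to_half(x: float) -> float:
    """Round-trip a value through IEEE 754 binary16 (struct's 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

registers_needed("e") is 1 and registers_needed("f") is 2. The flip side is precision: to_half(2049.0) returns 2048.0, because binary16 has only 11 significant bits, which is why these types only pay off where that range and precision are acceptable.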

Another neat thing is that while VK_KHR_shader_float16_int8 is a Vulkan extension, its implementation is mostly API agnostic, so most of the work we did here should also help us have a proper mediump implementation for GLSL ES shaders in the future.

There are a few caveats to consider as well, though: on some hardware platforms smaller bit-sizes have certain hardware restrictions that may lead to emitting worse shader code in some scenarios, and generally, Mesa’s compiler infrastructure (and the Intel compiler backend in particular) has a long history of being 32-bit only, so there are parts of the compiler stack that still work better for 32-bit code.

Because VK_KHR_shader_float16_int8 is a brand new feature, we don’t really have any real world use cases yet. This is on top of the fact that Mesa’s compiler backends have been mostly (or exclusively) 32-bit aware until now (and more recently 64-bit too), so going forward I would expect a lot of focus on making our compiler be as robust (and optimal) for 16-bit code as it is for 32-bit code.

While we are already aware of a few areas where we can do better and I am currently working on addressing a few of these, one of the major limiting factors we have at the moment is the fact that the only source of 16-bit shaders available to us is the Khronos CTS, which due to its particular motivation, is very different from real world shader workloads and it is not a valid source material to drive compiler optimization work. Unfortunately, it might take some time until we start seeing applications using these new features, so in the meantime we will need to find other ways to drive further work in this area, and I think our best option here might be GLSL ES’s mediump and lowp qualifiers.

GLSL ES mediump and lowp qualifiers have been around for a long time, but they are only defined as hints to the shader compiler that lower precision is acceptable, and we have never really used them to emit half-float code. Thankfully, Topi Pohjolainen from Intel has been working on this for a while; that work would open up a much better scenario for improving our 16-bit compiler paths, so this is something I am really looking forward to.

Finally, as I said above, we could definitely use more testing and feedback from real-world use cases, so if you decide to use this feature in your next project and you hit any bugs, please be sure to file them in Bugzilla so we can continue to improve our implementation.

by Iago Toral at December 04, 2018 08:25 AM

November 23, 2018

Víctor Jáquez

Building gst-msdk with MediaSDK opensource

Several months ago I tried the open source version of Intel MediaSDK, and it was a complete mess. In order to review some patches for gst-msdk I tried it again, and I was surprised by how much the situation has improved since then.

Install dependencies

$ sudo apt install libva-dev vainfo cmake ccache
$ sudo apt build-dep gstreamer1.0 gst-plugins-{base,good,bad}1.0
$ sudo apt remove libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev

Setting up the workspace

$ sudo mkdir /opt/intel
$ sudo chown $USER:$USER /opt/intel
$ mkdir ~/msdk
$ cd ~/msdk

Build MediaSDK

It will be built in its source directory: ~/msdk/MediaSDK/build

It will be installed in /opt/intel

$ git clone https://github.com/Intel-Media-SDK/MediaSDK.git
$ cd MediaSDK
$ mkdir build
$ cd build
$ cmake ..
$ make
$ make install

Build media-driver

$ cd ~/msdk
$ git clone https://github.com/intel/media-driver.git
$ git clone https://github.com/intel/gmmlib.git
$ mkdir build
$ cd build
$ cmake ../media-driver
$ make

Let’s install media-driver in /opt/intel too

$ cd ~/msdk/build
$ cp ./media_driver/iHD_drv_video.so /opt/intel

But don’t remove, rename or move the directory ~/msdk/build, because iHD_drv_video.so links against libigdgmm.so.5, which is built there. Either keep the directory, install that library in a path searched by the dynamic linker, or set the environment variable LD_LIBRARY_PATH.

Test environment

$ LIBVA_DRIVERS_PATH=/opt/intel LIBVA_DRIVER_NAME=iHD vainfo
  libva info: VA-API version 1.3.0
  libva info: va_getDriverName() returns -1
  libva info: User requested driver 'iHD'
  libva info: Trying to open /opt/intel/iHD_drv_video.so
  libva info: Found init function __vaDriverInit_1_3
  libva info: va_openDriver() returns 0
 vainfo: VA-API version: 1.3 (libva 2.2.0)
 vainfo: Driver version: Intel iHD driver - 1.0.0
 vainfo: Supported profile and entrypoints
   VAProfileNone                   : VAEntrypointVideoProc
   VAProfileNone                   : VAEntrypointStats
   VAProfileMPEG2Simple            : VAEntrypointVLD
   VAProfileMPEG2Simple            : VAEntrypointEncSlice
   VAProfileMPEG2Main              : VAEntrypointVLD
   VAProfileMPEG2Main              : VAEntrypointEncSlice
   VAProfileH264Main               : VAEntrypointVLD
   VAProfileH264Main               : VAEntrypointEncSlice
   VAProfileH264Main               : VAEntrypointFEI
   VAProfileH264Main               : VAEntrypointEncSliceLP
   VAProfileH264High               : VAEntrypointVLD
   VAProfileH264High               : VAEntrypointEncSlice
   VAProfileH264High               : VAEntrypointFEI
   VAProfileH264High               : VAEntrypointEncSliceLP
   VAProfileVC1Simple              : VAEntrypointVLD
   VAProfileVC1Main                : VAEntrypointVLD
   VAProfileVC1Advanced            : VAEntrypointVLD
   VAProfileJPEGBaseline           : VAEntrypointVLD
   VAProfileJPEGBaseline           : VAEntrypointEncPicture
   VAProfileH264ConstrainedBaseline: VAEntrypointVLD
   VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
   VAProfileH264ConstrainedBaseline: VAEntrypointFEI
   VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
   VAProfileVP8Version0_3          : VAEntrypointVLD
   VAProfileHEVCMain               : VAEntrypointVLD
   VAProfileHEVCMain               : VAEntrypointEncSlice
   VAProfileHEVCMain               : VAEntrypointFEI

Set up gst-build

It will be built in its source directory: ~/msdk/gst-build/build

$ cd ~/msdk
$ git clone https://gitlab.freedesktop.org/gstreamer/gst-build.git
$ cd gst-build
$ export INTELMEDIASDKROOT=/opt/intel/mediasdk
$ meson build -Dpython=disabled -Dgst-plugins-bad:msdk=enabled
$ ninja -C build

Check for built elements

$ ninja -C ~/msdk/gst-build/build  uninstalled
[gst-master] $ GST_VAAPI_ALL_DRIVERS=1 \
               LIBVA_DRIVERS_PATH=/opt/intel \
               LIBVA_DRIVER_NAME=iHD \
               gst-inspect-1.0 | egrep "vaapi|msdk"
vaapi:  vaapijpegdec: VA-API JPEG decoder
vaapi:  vaapimpeg2dec: VA-API MPEG2 decoder
vaapi:  vaapih264dec: VA-API H264 decoder
vaapi:  vaapivc1dec: VA-API VC1 decoder
vaapi:  vaapivp8dec: VA-API VP8 decoder
vaapi:  vaapih265dec: VA-API H265 decoder
vaapi:  vaapipostproc: VA-API video postprocessing
vaapi:  vaapidecodebin: VA-API Decode Bin
vaapi:  vaapisink: VA-API sink
vaapi:  vaapimpeg2enc: VA-API MPEG-2 encoder
vaapi:  vaapih265enc: VA-API H265 encoder
vaapi:  vaapijpegenc: VA-API JPEG encoder
vaapi:  vaapih264enc: VA-API H264 encoder
msdk:  msdkvpp: MSDK Video Postprocessor
msdk:  msdkvc1dec: Intel MSDK VC1 decoder
msdk:  msdkvp8enc: Intel MSDK VP8 encoder
msdk:  msdkvp8dec: Intel MSDK VP8 decoder
msdk:  msdkmpeg2enc: Intel MSDK MPEG2 encoder
msdk:  msdkmpeg2dec: Intel MSDK MPEG2 decoder
msdk:  msdkmjpegenc: Intel MSDK MJPEG encoder
msdk:  msdkmjpegdec: Intel MSDK MJPEG decoder
msdk:  msdkh265enc: Intel MSDK H265 encoder
msdk:  msdkh265dec: Intel MSDK H265 decoder
msdk:  msdkh264enc: Intel MSDK H264 encoder
msdk:  msdkh264dec: Intel MSDK H264 decoder

Remember

Remember to export these environment variables (perhaps you want to create a script file to set them):

export GST_VAAPI_ALL_DRIVERS=1
export LIBVA_DRIVERS_PATH=/opt/intel
export LIBVA_DRIVER_NAME=iHD

by vjaquez at November 23, 2018 05:05 PM

November 14, 2018

Asumu Takikawa

Data Path Objects in VPP

A while back, I wrote a blog post explaining some of the basics of writing plugins for the VPP networking toolkit.

In that previous post, I explained a few mechanisms for hooking a plugin into VPP’s graph architecture so that your code can process incoming packets.

I also briefly mentioned something called DPOs (data path objects) but didn’t explain what they are or how they work. Since then, I’ve been reading and hacking on code that involves DPOs, so I’d like to attempt to explain them in this post.

(I’ll be assuming you’ve read the previous post or are already somewhat familiar with VPP, so if that’s not the case, you may want to take a look at my previous post first.)

Data path objects

Here’s how DPOs are defined in their main header file (dpo.h):

A Data-Path Object is an object that represents actions that are applied to packets as they are switched through VPP’s data-path.

So a DPO is an object, which means that it’s a value that we can create and manipulate (via instances of dpo_id_t) and that also has some behavior (i.e., it has specialized methods or functions that do something).

“As they are switched through” means that DPOs can be activated via rules set in VPP’s FIB (forwarding information base). For example, you can add a DPO that will act on IPv6 packets matching an address prefix that you choose.

The job of a FIB is to maintain forwarding information so that the switch knows which interface to forward each packet on. With DPOs, you can add entries to the FIB that tell VPP to forward packets via your DPO to a VPP node of your choosing instead (where presumably you will act on the packets somehow).

You can see this at work by interacting with the FIB in VPP. Here’s an example CLI interaction:

vpp# dslite set aftr-tunnel-endpoint-address 2001:db8:85a3::8a2e:370:1
vpp# dslite add pool address 10.1.1.5
vpp# show fib entry
FIB Entries:
[... omitted ...]
7@2001:db8:85a3::8a2e:370:1/128
  unicast-ip6-chain
  [@0]: dpo-load-balance: [proto:ip6 index:9 buckets:1 uRPF:7 to:[0:0]]
    [0] [@19]: DS-Lite: AFTR:0
8@10.1.1.5/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:10 buckets:1 uRPF:8 to:[0:0]]
    [0] [@12]: DS-Lite: AFTR:0

The first two commands are part of the DS-Lite plugin and use some DPOs to set up a kind of IPv4 in IPv6 tunnel. You can see from the results of show fib entry that the commands have populated the FIB with entries for the given addresses and their associated DPO: DS-Lite: AFTR:0.

The ability to tie into the FIB is why you may want to use DPOs instead of some of the mechanisms I mentioned in the previous blog post. For some applications, it could make sense to hook into a feature arc because you potentially want to look at all packets (e.g., a monitoring program like an IPFIX meter) or, say, all IP packets. But in other cases, you are only interested in packets going to a specific prefix (e.g., you are setting up an endpoint for a tunnel) and would like to take advantage of the FIB for that.

DPO API

To set up DPOs, you first define your own DPO type along with an API of DPO functions for it. I’ve been reading the DS-Lite implementation in VPP a lot recently, so I’ll show some (simplified) examples from that.

The first part of this API is a constructor function for making instances of the DPO, like dslite_dpo_create:

dpo_type_t dslite_dpo_type;

void
dslite_dpo_create (dpo_proto_t dproto, index_t aftr_index, dpo_id_t * dpo)
{
  dpo_set (dpo, dslite_dpo_type, dproto, aftr_index);
}

The dpo_set function takes a pointer to a dpo_id_t (a struct that identifies a particular DPO), a DPO type, a protocol constant, and an index_t, and uses them to initialize the DPO. You could use dpo_set directly if you wanted by passing in dslite_dpo_type, so dslite_dpo_create is effectively a partial application of dpo_set.

A use of the constructor in your API client’s code might look like this:

/* declaration & temp initialization of DPO */
dpo_id_t my_dpo = DPO_INVALID;

/* initialize DPO for desired protocol */
dslite_dpo_create(DPO_PROTO_IP6, 0, &my_dpo);

The constructor takes a few arguments, namely the protocol to use, an index for the DPO, and a pointer to the DPO that’s going to be initialized.

The most interesting argument is the protocol, which in this case is DPO_PROTO_IP6. You pass in a protocol at construction time because:

  • DPOs can be specialized to work on packets with a specific protocol because the actions you take on them are specialized, and
  • DPOs can be used with more than one protocol type, for example both IPv4 and IPv6.

In particular, you can also send packets to different nodes depending on the protocol that is matched. This is set up with some additional data structures in the API code like this:

const static char *const dslite_ce_ip4_nodes[] = {
  "dslite-ce-encap",
  NULL,
};

const static char *const dslite_ce_ip6_nodes[] = {
  "dslite-ce-decap",
  NULL,
};

const static char *const *const dslite_nodes[DPO_PROTO_NUM] = {
  [DPO_PROTO_IP4] = dslite_ce_ip4_nodes,
  [DPO_PROTO_IP6] = dslite_ce_ip6_nodes,
  [DPO_PROTO_MPLS] = NULL
};

The code above basically constructs a table mapping DPO protocol to arrays of node names. The table doesn’t have to be exhaustive (relative to all protocols that DPOs work on), but it should cover whatever protocols you want to use with your particular DPO type.

The nodes specified in your mapping are actually registered for a DPO type by calling the dpo_register_new_type function:

void
dslite_dpo_module_init (void)
{
  dslite_dpo_type = dpo_register_new_type (&dslite_dpo_vft, dslite_nodes);
}

This dslite_dpo_module_init function is called from the NAT plugin’s initialization function (the DS-Lite code is a part of the NAT code). If you write your own DPO API, you’ll need to register the new DPO type in your VPP plugin’s initialization code.

You might be wondering where the object-oriented aspect of DPOs comes from, given the allusion in the name. When defining your DPO API, you also define a virtual function table struct (dpo_vft_t) that is passed to the dpo_register_new_type call shown above. That table might look like this:

const static dpo_vft_t dslite_dpo_vft = {
  .dv_lock = dslite_dpo_lock,
  .dv_unlock = dslite_dpo_unlock,
  .dv_format = format_dslite_dpo,
};

Its fields are basically methods that you implement for the DPO type. For the DS-Lite example, these functions do very little, so I won’t go into the details here.

Using DPOs in forwarding

Once you’ve defined your DPO type and API, you can use it to forward packets to your VPP graph nodes. In order to hook your DPO up to the FIB, which lets you switch packets to your nodes, you need to construct a DPO instance in your plugin code and then call a function that registers FIB entries.

This example code from the DS-Lite implementation illustrates some of this:

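/* declare the DPO, then construct it as in the earlier examples */
dpo_id_t dpo = DPO_INVALID;
dslite_dpo_create (DPO_PROTO_IP6, 0, &dpo);

/* FIB prefix data structure, used below; addr points to the
   ip6_address_t of the tunnel endpoint */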
fib_prefix_t pfx = {
  .fp_proto = FIB_PROTOCOL_IP6,
  .fp_len = 128,
  .fp_addr.ip6.as_u64[0] = addr->as_u64[0],
  .fp_addr.ip6.as_u64[1] = addr->as_u64[1],
};

/* register FIB entry for DPO */
fib_table_entry_special_dpo_add (0,
                                 &pfx,
                                 /* if you're writing a plugin you use this,
                                    some other DPO code uses other constants */
                                 FIB_SOURCE_PLUGIN_HI,
                                 FIB_ENTRY_FLAG_EXCLUSIVE,
                                 &dpo);

The excerpt above is doing a few things:

  1. Constructs a DPO and puts it in the dpo variable,
  2. Declares a FIB prefix (fib_prefix_t) used for switching (which could be an address for IP, or a label for MPLS), and
  3. Adds an entry to the FIB using the DPO and FIB prefix.

With that information added, the FIB can start switching packets to the nodes specified in your DPO (in this case to dslite-ce-decap). When packets go to your node, they are processed like in any other VPP node that you write.

To explain what the above excerpt is doing a bit more concretely, recall that this example is taken from the DS-Lite code. DS-Lite is a mechanism for sending IPv4 traffic tunneled (i.e., encapsulated) over an IPv6 network.

The DPO above is associated with the server endpoint for a DS-Lite tunnel that does NAT on the inner (decapsulated) packet, which means the server only receives encapsulated packets addressed to its IPv6 address.

Therefore the prefix is an IPv6 address (put inside .fp_addr.ip6 above) and the prefix length is 128, the full length of the address.

In other networking setups, you may have a different kind of tunnel in which the client does NAT rather than the server. In this case, you might set a prefix length that isn’t the full length of an address, and instead corresponds to however you allocate your NAT addresses (see the MAP-E code in VPP for an example of this).

The point here is that DPOs let you use the typical prefix forwarding capabilities of IP (or MPLS, etc.) to hook up packets to your VPP node.

Further reading

Hopefully this blog post made it a bit clearer why you might want to use DPOs in your own VPP code and how to start doing so. To learn more, I would suggest just reading examples in the code base (files ending with _dpo.c and _dpo.h are helpful, and then look for uses of those API functions).

The DS-Lite code is also relatively simple and easy to read. The main files for the DPO code in DS-Lite are dslite_dpo.c and dslite.c.

by Asumu Takikawa at November 14, 2018 06:40 PM

November 12, 2018

Michael Catanzaro

The GNOME (and WebKitGTK+) Networking Stack

WebKit currently has four network backends:

  • CoreFoundation (used by macOS and iOS, and thus Safari)
  • CFNet (used by iTunes on Windows… I think only iTunes?)
  • cURL (used by most Windows applications, also PlayStation)
  • libsoup (used by WebKitGTK+ and WPE WebKit)

One guess which of those we’re going to be talking about in this post. Yeah, of course, libsoup! If you’re not familiar with libsoup, it’s the GNOME HTTP library. Why is it called libsoup? Because before it was an HTTP library, it was a SOAP library. And apparently somebody thought that when Mexican people say “soap,” it often sounds like “soup,” and also thought that this was somehow both funny and a good basis for naming a software library. You can’t make this stuff up.

Anyway, libsoup is built on top of GIO’s sockets APIs. Did you know that GIO has GObject wrappers for BSD sockets? Well it does. If you fancy lower-level APIs, create a GSocket and have a field day with it. Want something a bit more convenient? Use GSocketClient to create a GSocketConnection connected to a GNetworkAddress. Pretty straightforward. Everything parallels normal BSD sockets, but the API is nice and modern and GObject, and that’s really all there is to know about it. So when you point WebKitGTK+ at an HTTP address, libsoup is using those APIs behind the scenes to handle connection establishment. (We’re glossing over details like “actually implementing HTTP” here. Trust me, libsoup does that too.)

Things get more fun when you want to load an HTTPS address, since we have to add TLS to the picture, and we can’t have TLS code in GIO or GLib due to this little thing called “copyright law.” See, there are basically three major libraries used to implement TLS on Linux, and they all have problems:

  • OpenSSL is by far the most popular, but it’s, hm, shall we say technically non-spectacular. There are forks, but the forks have problems too (ask me about BoringSSL!), so forget about them. The copyright problem here is that the OpenSSL license is incompatible with the GPL. (Boring details: Red Hat waves away this problem by declaring OpenSSL a system library qualifying for the GPL’s system library exception. Debian has declared the opposite, so Red Hat’s choice doesn’t gain you anything if you care about Debian users. The OpenSSL developers are trying to relicense to the Apache license to fix this, but this process is taking forever, and the Apache license is still incompatible with GPLv2, so this would make it impossible to use GPLv2+ software except under the terms of GPLv3+. Yada yada details.) So if you are writing a library that needs to be used by GPL applications, like say GLib or libsoup or WebKit, then it would behoove you to not use OpenSSL.
  • GnuTLS is my favorite from a technical standpoint. Its license is LGPLv2+, which is unproblematic everywhere, but some of its dependencies are licensed LGPLv3+, and that’s uncomfortable for many embedded systems vendors, since LGPLv3+ contains some provisions that make it difficult to deny you your freedom to modify the LGPLv3+ software. So if you rely on embedded systems vendors to fund the development of your library, like say libsoup or WebKit, then you’re really going to want to avoid GnuTLS.
  • NSS is used by Firefox. I don’t know as much about it, because it’s not as popular. I get the impression that it’s more designed for the needs of Firefox than as a Linux system library, but it’s available, and it works, and it has no license problems.

So naturally GLib uses NSS to avoid the license issues of OpenSSL and GnuTLS, right?

Haha no, it uses a dynamically-loadable extension point system to allow you to pick your choice of OpenSSL or GnuTLS! (Support for NSS was started but never finished.) This is OK because embedded systems vendors don’t use GPL applications and have no problems with OpenSSL, while desktop Linux users don’t produce tivoized embedded systems and have no problems with LGPLv3. So if you’re using desktop Linux and point WebKitGTK+ at an HTTPS address, then GLib is going to load a GIO extension point called glib-networking, which implements all of GIO’s TLS APIs — notably GTlsConnection and GTlsCertificate — using GnuTLS. But if you’re building an embedded system, you simply don’t build or install glib-networking, and instead build a different GIO extension point called glib-openssl, and libsoup will create GTlsConnection and GTlsCertificate objects based on OpenSSL instead. Nice! And if you’re Centricular and you’re building GStreamer for Windows, you can use yet another GIO extension point, glib-schannel, for your native Windows TLS goodness, all hidden behind GTlsConnection so that GStreamer (or whatever application you’re writing) doesn’t have to know about SChannel or OpenSSL or GnuTLS or any of that sad complexity.

Now you know why the TLS extension point system exists in GIO. Software licenses! And you should not be surprised to learn that direct use of any of these crypto libraries is banned in libsoup and WebKit: we have to cater to both embedded system developers and to GPL-licensed applications. All TLS library use is hidden behind the GTlsConnection API, which is really quite nice to use because it inherits from GIOStream. You ask for a TLS connection, have it handed to you, and then read and write to it without having to deal with any of the crypto details.

As a recap, the layering here is: WebKit -> libsoup -> GIO (GLib) -> glib-networking (or glib-openssl or glib-schannel).

So when Epiphany fails to load a webpage, and you’re looking at a TLS-related error, glib-networking is probably to blame. If it’s an HTTP-related error, the fault most likely lies in libsoup. Same for any other GNOME applications that are having connectivity troubles: they all use the same network stack. And there you have it!

P.S. The glib-openssl maintainers are helping merge glib-openssl into glib-networking, so that glib-networking will offer a choice of GnuTLS or OpenSSL, making glib-openssl obsolete. This is still a work in progress. glib-schannel will be next!

P.P.S. libcurl also gives you multiple choices of TLS backend, but makes you choose at build time, whereas with GIO extension points it’s actually possible to choose at runtime from the selection of installed extension points. The libcurl approach is fine in theory, but creates some weird problems, e.g. different backends with different bugs are used on different distributions. On Fedora, it used to use NSS, but now uses OpenSSL, which is fine for Fedora, but would be a license problem elsewhere. Debian actually builds several different backends and gives you a choice, unlike everywhere else. I digress.

by Michael Catanzaro at November 12, 2018 04:51 AM

November 07, 2018

Javier Fernández

CSS Grid on LayoutNG: a Web Engines Hackfest story

I had the pleasure of attending the Web Engines Hackfest last week, hosted and organized by Igalia at its HQ in A Coruña. I’m really proud of what we are achieving with this event; thanks so much to everybody involved in the organization, but especially to the people attending. This year we had a lot of talent in a single place, hacking and sharing their expertise on the Web Platform; I really think we all pushed the Web Platform forward during these days.

As you may already know, I’m part of the Web Platform team at Igalia working on the implementation of the CSS Grid Layout feature for the Blink and WebKit web engines. This work has been sponsored by Bloomberg, as part of the collaboration we started several years ago to improve the Web Platform in many areas, from JS (V8, JSC and even ChakraCore) to different modules of the layout engine (e.g. CSS features, editing/selection).

Since the day I received the invitation to attend the hackfest, I knew that one of the tasks I wanted to hack on was the implementation of the CSS Grid feature in Chromium’s new (still experimental) layout engine, known as LayoutNG. Having Christian Biesinger, one of the Google engineers working on LayoutNG, here for 3 days was too good an opportunity to pass up without at least starting this task. I asked him to give a lightning talk during the layout breakout session about the current status of the LayoutNG project, with a brief explanation of some of the most relevant details of its logic and its advantages over the current layout.

Layout Breakout Session

A small group of people interested in layout met to discuss the future of layout in different web engines. I attended with some folks representing Igalia, and other people from Mozilla, Google, WebKit and ARM.

Christian described the key parts of the new LayoutNG, which provides simpler code and generally better performance (although results are still preliminary since it’s still under development). The concept of fragments gained relevance in this new layout model; currently, inline and block layout are basically complete, multicolumn layout is quite advanced, and Flexbox is still in the early stages (although a substantial portion of the layout tests are passing now).

We discussed the different strategies that Firefox, Chrome and Safari are following to redesign their layout logic, which carries a huge legacy codebase required to support the old web. However, browsers need to adapt to the new layout models that a modern Web Platform requires. Chrome’s LayoutNG is a clear bet, with a big team and strong determination; it seems it’ll be ready to ship in the first months of 2019. Firefox is also starting to implement a new layout design with Servo, although I couldn’t get details about its current status and plans. Finally, a few months ago WebKit started a new project called Next-Generation layout, which tries to implement a new Layout Formatting Context (LFC) logic from scratch, getting rid of the huge technical debt acquired over the last years; although I couldn’t get confirmation, my opinion is that it’s still an experimental project.

We also had time to talk about ARM’s effort towards better parallelization of the CSS parsing and style recalc logic, following an approach similar to Mozilla’s Stylo in Servo. It’s a very interesting initiative, but still quite experimental. There is some progress on specific codepaths, but they are still dealing with Oilpan (Blink’s garbage collector), which is the root cause of several issues that prevent effective parallelization.

Hacking, hacking, hacking, ….

As I mentioned, this event is designed precisely to gather some of the most brilliant minds in the Web Platform to discuss, analyze and hack on complex topics that are usually very difficult to handle when working remotely. I had a clear hacking task this time, which is why I decided to focus a bit more on coding. Although I had already assumed that implementing CSS Grid in LayoutNG would be a huge challenge, I decided to take it on and at least start the task. I took as reference the Flexible Box implementation, which is under development right now and which Christian was partially involved in.

As happened with the Flexible Box implementation, the first step was to redesign the logic so that we can get rid of the dependency on the old Layout Tree, in this case the LayoutGrid class. This has been a complex and long task, which took quite a big part of my time during the hackfest. The following diagrams show the redesign effort achieved, which I’d admit is still a preliminary approach that needs to be refined:

The next step was to implement a skeleton of the new LayoutNG grid algorithm. Thanks to Christian’s direction, I quickly figured out how to do it, and it looks something like this:

namespace blink {

NGGridLayoutAlgorithm::NGGridLayoutAlgorithm(NGBlockNode node,
                                             const NGConstraintSpace& space,
                                             NGBreakToken* break_token)
    : NGLayoutAlgorithm(node, space, ToNGBlockBreakToken(break_token)) {}

scoped_refptr<NGLayoutResult> NGGridLayoutAlgorithm::Layout() {
  return container_builder_.ToBoxFragment();
}

base::Optional<MinMaxSize> NGGridLayoutAlgorithm::ComputeMinMaxSize(
    const MinMaxSizeInput& input) const {
  // TODO Implement this.
  return base::nullopt;
}

}  // namespace blink

Finally, I tried to implement the Grid layout algorithm, according to the CSS Grid Layout specification, using the new LayoutNG APIs. This is the most complex task, since I still have to learn how sizing and positioning functions are used in the new layout logic, especially how to use the new Fragments and ContainerBuilder concepts.

I submitted a WIP CL so that anybody can take a look, give suggestions or continue with the work. My plan is to devote some time to this challenge every now and then, but I can’t set specific goals or a schedule for the time being. If anybody wants to speed up this task, perhaps it’d be possible to fund a project, in which Igalia would be happy to participate.

Other Web Engines Hackfest stories

I also tried to attend some of the talks given during the hackfest and participate in a few breakout sessions. Here are my impressions of some of the ones I liked most.

I really enjoyed the one given by Camille Lamy, Colin Blundell and Robert Kroeger (Google) about Chrome’s Servicification project. The new services design they are implementing is awesome, and it will surely improve Chrome’s modularity and the maintainability of its codebase.

I participated in the MathML breakout session, which is somewhat related to LayoutNG. Igalia launched a crowdfunding campaign to implement the MathML specification in Chrome, using the new LayoutNG APIs. We think that MathML could be a great success case for the new LayoutNG APIs, whose goal is to provide a stable API for implementing new and complex layout models. This model will provide flexibility to the web engine, offering an easier way to implement new layout models without depending too much on the Chrome development cycle. In a way, this development model could be similar to a polyfill, but integrated in the browser as native code instead of via external libraries.

by jfernandez at November 07, 2018 12:23 PM

Michael Catanzaro

Mesa Update Breaks WebKitGTK+ in Fedora 29

If you’re using Fedora and discovered that WebKitGTK+ is displaying blank pages, the cause is a bad mesa update, mesa-18.2.3-1.fc29. This in turn was caused by a GCC bug that resulted in miscompilation of mesa.

To avoid this bug, downgrade to mesa-18.2.2-1.fc29:

$ sudo dnf downgrade mesa*

You can also update to mesa-18.2.4-2.fc29, but this build has not yet reached updates-testing, let alone stable, so downgrading is easier for now. Another workaround is to run your application with accelerated compositing mode disabled, to avoid OpenGL usage:

$ WEBKIT_DISABLE_COMPOSITING_MODE=1 epiphany

On the bright side of things, from all the bug reports I’ve received over the past two days I’ve discovered that lots of people use Epiphany and notice when it’s broken. That’s nice!

Huge thanks to Dave Airlie for quickly preparing the fixed mesa update, and to Jakub Jelinek for handling the same for GCC.

by Michael Catanzaro at November 07, 2018 02:28 AM

November 03, 2018

Michael Catanzaro

WebKitGTK+ 2.22.2 and 2.22.3, Media Source Extensions, and YouTube

Last month, I attended the Web Engines Hackfest (hosted by Igalia in A Coruña, Spain) and also the WebKit Contributors Meeting (hosted by Apple in San Jose, California). These are easily the two biggest WebKit development events of the year, and it’s always amazing to meet everyone in person yet again. A Coruña is an amazing city, and every browser developer ought to visit at least once. And the Contributors Meeting is a no-brainer event for WebKit developers.

One of the main discussion points this year was Media Source Extensions (MSE). MSE is basically a way for websites to control how videos are downloaded. Until recently, if you were to play a YouTube video in Epiphany, you’d notice that the video loads way faster than it does in other browsers. This is because WebKitGTK+, until recently, had no support for MSE. In other browsers, YouTube uses MSE to limit the speed at which video is downloaded, in order to reduce wasted bandwidth in case you stop watching the video before it ends. But with WebKitGTK+, MSE was not available, so videos would load as quickly as possible. MSE also makes it harder for browsers to offer the ability to download the videos; you’ll notice that neither Firefox nor Chrome offers to download the videos in its context menu, a feature that’s been available in Epiphany for as long as I can remember.

So that sounds like it’s good to not have MSE. Well, the downside is that YouTube requires it in order to serve HD videos, both to avoid that wasted bandwidth and to make it harder for users to download HD videos. And so WebKitGTK+ users have been limited to 720p video with H.264 and 480p video with WebM, where other browsers had access to 1080p and 1440p video. I’d been stuck with 480p video on Fedora for so long, I’d forgotten that internet video could look good.

Unfortunately, WebKitGTK+ was quite late to implement MSE. All other major browsers turned it on several years ago, but WebKitGTK+ dawdled. There was some code to support MSE, but it didn’t really work, and was disabled. And so it came to pass that, in September of this year, YouTube began to require MSE to access any WebM video, and we had a crisis. We don’t normally enable major new features in stable releases, but this was an exceptional situation and users would not be well-served by delaying until the next release cycle. So within a couple of weeks, we were able to release WebKitGTK+ 2.22.2 and Epiphany 3.30.1 (both on September 21), and GStreamer 1.14.4 (on October 2, thanks to Tim-Philipp Müller for expediting that release). Collectively, these releases enabled basic video playback with MSE for users of GNOME 3.30. And if you still use GNOME 3.28, worry not: you are still supported and can get MSE if you update to Epiphany 3.28.5 and also have the aforementioned versions of WebKitGTK+ and GStreamer.

MSE in WebKitGTK+ 2.22.2 had many rough edges because it was a mad rush to get the feature into a minimally-viable state, but those issues have been polished off in 2.22.3, which we released earlier this week on October 29. Be sure you have WebKitGTK+ 2.22.3, plus GStreamer 1.14.4, for a good experience on YouTube. Unfortunately we can’t provide support for older software versions anymore: if you don’t have GStreamer 1.14.4, then you’ll need to configure WebKitGTK+ with -DENABLE_MEDIA_SOURCE=OFF at build time and suffer from lack of MSE.

Epiphany 3.30.1 and 3.28.5 use WebKitSettings to turn on the “enable-mediasource” setting. Turn that on if your application wants MSE now (if it’s a web browser, it certainly does). This setting will be enabled by default in WebKitGTK+ 2.24. Huge thanks to the talented developers who made this feature possible! Enjoy your 1080p and 1440p video.

by Michael Catanzaro at November 03, 2018 04:19 AM

On WebKit Build Options (Also: How to Accidentally Disable Important Security Features!)

When building WebKitGTK+, it’s a good idea to stick to the default values for the build options. If you’re building some sort of embedded system and really know what you’re doing, then OK, it might make sense to change some settings and disable some stuff. But Linux distros are generally well-advised to stick to the defaults to avoid creating problems for users.

One exception is if you need to disable certain features to avoid newer dependencies when building WebKit for older systems. For example, Ubuntu 18.04 disables WOFF2 web fonts (USE_WOFF2=OFF) because it doesn’t have the libbrotli and libwoff2 dependencies that are required for that feature to work, hence some webpages will display using subpar fonts. And distributions shipping older versions of GStreamer will need to disable the ENABLE_MEDIA_SOURCE option (which is missing from the feature list below by mistake), since that requires the very latest GStreamer to work.

Other exceptions are the ENABLE_GTKDOC and ENABLE_MINIBROWSER settings, which distros do want. ENABLE_GTKDOC is disabled by default because it’s slow to build, and ENABLE_MINIBROWSER because, well, actually I don’t know why, you always want that one and it’s just annoying to find it’s not built.

OK, but really now, other than those exceptions, you should probably leave the defaults alone.

The feature list that prints when building WebKitGTK+ looks like this:

--  ENABLE_ACCELERATED_2D_CANVAS .......... OFF
--  ENABLE_DRAG_SUPPORT                     ON
--  ENABLE_GEOLOCATION .................... ON
--  ENABLE_GLES2                            OFF
--  ENABLE_GTKDOC ......................... OFF
--  ENABLE_ICONDATABASE                     ON
--  ENABLE_INTROSPECTION .................. ON
--  ENABLE_JIT                              ON
--  ENABLE_MINIBROWSER .................... OFF
--  ENABLE_OPENGL                           ON
--  ENABLE_PLUGIN_PROCESS_GTK2 ............ ON
--  ENABLE_QUARTZ_TARGET                    OFF
--  ENABLE_SAMPLING_PROFILER .............. ON
--  ENABLE_SPELLCHECK                       ON
--  ENABLE_TOUCH_EVENTS ................... ON
--  ENABLE_VIDEO                            ON
--  ENABLE_WAYLAND_TARGET ................. ON
--  ENABLE_WEBDRIVER                        ON
--  ENABLE_WEB_AUDIO ...................... ON
--  ENABLE_WEB_CRYPTO                       ON
--  ENABLE_X11_TARGET ..................... ON
--  USE_LIBHYPHEN                           ON
--  USE_LIBNOTIFY ......................... ON
--  USE_LIBSECRET                           ON
--  USE_SYSTEM_MALLOC ..................... OFF
--  USE_WOFF2                               ON

And, aside from the exceptions noted above, those are probably the options you want to ship with.

Why are some things disabled by default? ENABLE_ACCELERATED_2D_CANVAS is OFF by default because it is experimental (i.e. not great :) and requires CairoGL, which has been available in most distributions for about half a decade now, but still hasn’t reached Debian yet, because the Debian developers know that the Cairo developers consider CairoGL experimental (i.e. not great!). Many of our developers use Debian, and we’re not keen on having two separate sets of canvas bugs depending on whether you’re using Debian or not, so best keep this off for now. ENABLE_GLES2 switches you from desktop GL to GLES, which is maybe needed for embedded systems with crap proprietary graphics drivers, but certainly not what you want when building for a general-purpose distribution with mesa. Then ENABLE_QUARTZ_TARGET is for building on macOS, not for Linux. And then we come to USE_SYSTEM_MALLOC.

USE_SYSTEM_MALLOC disables WebKit’s bmalloc memory allocator (“fast malloc”) in favor of glibc malloc. bmalloc is performance-optimized for macOS, and I’m uncertain how its performance compares to glibc malloc on Linux. Doesn’t matter really, because bmalloc contains important heap security features that will be disabled if you switch to glibc malloc, and that’s all you need to know to decide which one to use. If you disable bmalloc, you lose the Gigacage, isolated heaps, heap subspaces, etc. I don’t pretend to understand how any of those things work, so I’ll just refer you to this explanation by Sam Brown, who sounds like he knows what he’s talking about. The point is that, if an attacker has found a memory vulnerability in WebKit, these heap security features make it much harder to exploit and take control of users’ computers, and you don’t want them turned off.

USE_SYSTEM_MALLOC is currently enabled (bad!) in openSUSE and SUSE Linux Enterprise 15, presumably because when the Gigacage was originally introduced, it crashed immediately for users who set address space (virtual memory allocation) limits. Gigacage works by allocating a huge address space to reduce the chances that an attacker can find pointers within that space, similar to ASLR, so limiting the size of the address space prevents Gigacage from working. At first we thought it made more sense to crash than to allow a security feature to silently fail, but we got a bunch of complaints from users who use ulimit to limit the address space used by processes, and also from users who disable overcommit (which is required for Gigacage to allocate ludicrous amounts of address space), and so nowadays we just silently disable Gigacage instead if enough address space for it cannot be allocated. So hopefully there’s no longer any reason to disable this important security feature at build time! Distributions should be building with the default USE_SYSTEM_MALLOC=OFF.
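The "silently disable instead of crashing" behavior described above can be pictured with a small C sketch. This is a hypothetical illustration assuming Linux `mmap` semantics, not WebKit's actual Gigacage code; the helper names are invented for the example. Reserving a range with `PROT_NONE` and `MAP_NORESERVE` commits no memory, but the range still counts against an address-space limit such as `ulimit -v` (`RLIMIT_AS`), so the reservation can fail and the caller just carries on without the cage.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve `size` bytes of address space without
 * committing memory. PROT_NONE pages cannot be touched, and
 * MAP_NORESERVE (Linux) skips swap reservation for the range. */
static void *try_reserve_cage(size_t size)
{
    void *base = mmap(NULL, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Returns 1 and stores the cage base if the reservation succeeded.
 * Returns 0 (feature silently disabled) if the address space could not
 * be reserved, e.g. because of `ulimit -v`; it never aborts. */
static int init_cage(size_t size, void **out_base)
{
    *out_base = try_reserve_cage(size);
    return *out_base != NULL;
}
```

Returning a flag rather than aborting captures the trade-off in the post: the hardening is lost in that configuration, but the process keeps running.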

The openSUSE CMake line currently looks like this:

%cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DLIBEXEC_INSTALL_DIR=%{_libexecdir}/libwebkit2gtk%{_wk2sover} \
  -DPORT=GTK \
%if 0%{?suse_version} == 1315
  -DCMAKE_C_COMPILER=gcc-7 \
  -DCMAKE_CXX_COMPILER=g++-7 \
  -DENABLE_WEB_CRYPTO=OFF \
  -DUSE_GSTREAMER_GL=false \
%endif
%if 0%{?suse_version} <= 1500
  -DUSE_WOFF2=false \
%endif
  -DENABLE_MINIBROWSER=ON \
%if %{with python3}
  -DPYTHON_EXECUTABLE=%{_bindir}/python3 \
%endif
%if !0%{?is_opensuse}
  -DENABLE_PLUGIN_PROCESS_GTK2=OFF \
%endif
%ifarch armv6hl ppc ppc64 ppc64le riscv64 s390 s390x
  -DENABLE_JIT=OFF \
%endif
  -DUSE_SYSTEM_MALLOC=ON \
  -DCMAKE_EXE_LINKER_FLAGS="-Wl,--as-needed -Wl,-z,now -pthread" \
  -DCMAKE_MODULE_LINKER_FLAGS="-Wl,--as-needed -Wl,-z,now -pthread" \
  -DCMAKE_SHARED_LINKER_FLAGS="-Wl,--as-needed -Wl,-z,now -pthread"

which all looks pretty reasonable to me: certain features that require “newer” dependencies are disabled on the old distros, and NPAPI plugins are not supported in the enterprise distro, and JIT doesn’t work on odd architectures. I would remove the ENABLE_JIT=OFF lines only because WebKit’s build system should be smart enough nowadays to disable it automatically to save you the trouble of thinking about which architectures the JIT works on. And I would also remove the -DUSE_SYSTEM_MALLOC=ON line to ensure users are properly protected.

by Michael Catanzaro at November 03, 2018 03:29 AM

October 25, 2018

José Dapena

3 events in a month

As part of my job at Igalia, I have been attending 2-3 events per year. My role, mostly as a Chromium stack engineer, does not usually demand many conference trips, but they are quite important as an opportunity to meet collaborators and project mates.

This month has been a bit different, as I ended up visiting Santa Clara LG Silicon Valley Lab in California, Igalia headquarters in A Coruña, and Dresden. It was mostly because I got involved in the discussions for the web runtime implementation being developed by Igalia for AGL.

AGL f2f at LGSVL

It is always great to visit LG Silicon Valley Lab (Santa Clara, US), where my team is located. For 6 years I have been participating in the development of the webOS web stack, which you can most prominently enjoy in LG webOS smart TVs.

One of the goals for the coming months at AGL is providing an efficient web runtime. In LGSVL we have been developing and maintaining WAM, the webOS web runtime. And as it was released with an open source license in webOS Open Source Edition, it looked like a great match for AGL. So my team did a proof of concept in May and it was successful. At the same time Igalia has been working on porting the Chromium browser to AGL. So, after some discussions, AGL approved sponsoring my company, Igalia, to port the LG webOS web runtime to AGL.

As LGSVL was hosting the September 2018 AGL f2f meeting, Igalia sponsored my trip to the event.

AGL f2f Santa Clara 2018, AGL wiki CC BY 4.0

So we took the opportunity to continue discussions and progress in the development of the WAM AGL port. And, as we expected, it was quite beneficial to unblock tasks like AGL app framework security integration, and support for AGL’s latest official release, Funky Flounder. Julie Kim from Igalia attended the event too, and presented an update on the progress of the Ozone Wayland port.

The organization and the venue were great. Thanks to LGSVL!

Web Engines Hackfest 2018 at Igalia

My next trip was much closer: just a 90-minute drive to our Igalia headquarters in A Coruña.


Igalia has been organizing this event since 2009. It is a cross-web-engine event, where engineers from Mozilla, Chromium and WebKit have been meeting yearly to do some hacking and discuss the future of the web.

This time my main interest was participating in the discussions about the effort by Igalia and Google to support Wayland natively in Chromium. I was pleased to learn that around 90% of the work had already landed in upstream Chromium. Great news, as it will smooth the integration of Chromium for embedders using Ozone Wayland, like webOS. It was also great to learn about the work on improving GPU performance by reducing the number of copies required for painting web contents.

Web Engines Hackfest 2018 CC BY-SA 2.0

Other topics of my interest:
– We followed up on the discussion at the last BlinkOn about the barriers for Chromium embedders, sharing our experiences maintaining a downstream Chromium tree.
– Joined the discussions about the future of WebKitGTK, in particular adapting the graphics pipeline to the upcoming GTK+ 4.

As usual, the organization was great. We had 70 people at the event, and it was awesome to see all the activity in the office, and so many talented engineers in the same place. Thanks Igalia!

Web Engines Hackfest 2018 CC BY-SA 2.0

AGL All Members Meeting Europe 2018 at Dresden

The last of these events, all in barely a month, was my first visit to the beautiful city of Dresden (Germany).

The goal was continuing the discussions on the projects Igalia is developing for the AGL platform: Chromium upstream native Wayland support, and the WAM web runtime port. We also had a booth showcasing that work, along with our lightweight WebKit port WPE, which was, as usual, attracting interest with its 60fps video playback performance on a Raspberry Pi 2.

I co-presented a talk with Steve Lemke about the automotive activities at LGSVL, taking the opportunity to give an update on the status of the WAM web runtime work for AGL (slides here). The project is progressing and Igalia should soon be landing the first results of the work.

Igalia booth at AGL AMM Europe 2018

It was great to meet all these people and discuss the architecture proposal for the web runtime in person, unblocking several tasks and offering more detailed planning for the coming months.

Dresden was great, and I can’t help highlighting the reception and guided tour in the Dresden Transportation Museum. Great choice by the organization. Thanks to Linux Foundation and the AGL project community!

Next: Chrome Dev Summit 2018

So… what’s next? I will be visiting San Francisco in November for Chrome Dev Summit.

I can only thank Igalia for sponsoring my attendance at these events. They are quite important for keeping things moving forward. But it is also really nice to meet friends and collaborators. Thanks Igalia!

by José Dapena Paz at October 25, 2018 09:29 AM

October 20, 2018

Manuel Rego

Igalia at TPAC 2018

Just a quick update before boarding my flight to Lyon for TPAC 2018. This year 12 igalians will be at TPAC, 10 employees (Álex García Castro, Daniel Ehrenberg, Javier Fernández, Joanmarie Diggs, Martin Robinson, Rob Buis, Sergio Villar, Thibault Saunier and myself) and 2 coding experience students (Oriol Brufau and Sven Sauleau). We will represent Igalia in the different working groups and breakout sessions.

On top of that, Igalia will have a booth in the solutions showcase, where we’ll have a few demos of our latest developments, such as WebRTC, MSE, CSS Grid Layout, CSS Box Alignment, MathML, etc., running on low-end boards like the Raspberry Pi using WPE, an optimized WebKit port for embedded platforms.

Thread by W3C Developers announcing my talk.

In my personal case, I’ll be attending the CSS Working Group (CSSWG) and Houdini Task Force meetings to follow the work Igalia has been doing on the implementation of different standards. In addition, I’ll be giving a talk about how to contribute to the evolution of CSS at the W3C Developer Meetup that happens on Monday. I’ll try to explain how easy it is nowadays to provide feedback to the CSSWG and have some influence on the different specifications.

Tweet by Daniel Ehrenberg about the Web Platform position.

Last but not least, Igalia’s Web Platform Team is hiring: we’re looking for people willing to work on web standards, from the implementation in the different browser engines to the discussions with the standards bodies or the definition of test suites. If you’re attending TPAC and you want to work at a flat company focused on free software development, you are probably a good candidate to join us. Read the position announcement and don’t hesitate to talk to any of us there about it.

See you at TPAC tomorrow!

October 20, 2018 10:00 PM

October 11, 2018

Andy Wingo

heap object representation in spidermonkey

I was having a look through SpiderMonkey's source code today and found something interesting about how it represents heap objects and wanted to share.

I was first looking to see how to implement arbitrary-length integers ("bigints") by storing the digits inline in the allocated object. (I'll use the term "object" here, but from JS's perspective, bigints are rather values; they don't have identity. But I digress.) So you have a header indicating how many words it takes to store the digits, and the digits follow. This is how JavaScriptCore and V8 implementations of bigints work.
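The inline layout just described, a length header followed directly by the digits in one allocation, can be sketched in C with a flexible array member. This is a simplified illustration with hypothetical names, using `malloc` in place of a GC allocator; it is not the actual JSC or V8 code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Digit;  /* one machine word per digit */

/* Hypothetical layout: a header word with the digit count, then the
 * digits themselves, all in a single allocation (C99 flexible array). */
typedef struct {
    size_t length;   /* how many digits follow the header */
    Digit digits[];  /* storage lives right after `length` */
} InlineBigInt;

static InlineBigInt *bigint_new(size_t length)
{
    InlineBigInt *b = malloc(sizeof(InlineBigInt) + length * sizeof(Digit));
    if (b == NULL)
        return NULL;
    b->length = length;
    memset(b->digits, 0, length * sizeof(Digit));  /* start at zero */
    return b;
}
```

There is no separate digits pointer to chase: the size of the allocation varies per value, which is exactly what needs a variable-sized allocator.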

Incidentally, JSC's implementation was taken from V8. V8's was taken from Dart. Dart's was taken from Go. We might take SpiderMonkey's from Scheme48. Good times, right??

When seeing if SpiderMonkey could use this same strategy, I couldn't find how to make a variable-sized GC-managed allocation. It turns out that in SpiderMonkey you can't do that! SM's memory management system wants to work in terms of fixed-sized "cells". Even objects that store properties inline in named slots are implemented in terms of standard cell sizes. So if an object has 6 slots, it might be allocated in a cell that holds 8 slots.
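Rounding a requested slot count up to a fixed cell size might look like the following sketch. The size classes of 0, 2, 4, 8, 12 and 16 slots are an assumption modeled loosely on SpiderMonkey's object allocation kinds, and `slot_class_for` is a hypothetical helper; the point is only that 6 slots land in an 8-slot cell.

```c
#include <stddef.h>

/* Assumed slot-count size classes for this sketch. */
static const size_t kSlotClasses[] = {0, 2, 4, 8, 12, 16};

/* Returns the smallest class that fits `nslots`, or -1 when the slots
 * would not fit in any fixed-size cell and must live out of line. */
static long slot_class_for(size_t nslots)
{
    for (size_t i = 0; i < sizeof kSlotClasses / sizeof kSlotClasses[0]; i++) {
        if (nslots <= kSlotClasses[i])
            return (long)kSlotClasses[i];
    }
    return -1;
}
```

The unused slots (2 of 8 in the example) are the price paid for allocating only from a handful of uniform cell sizes.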

Truly variable-sized allocations seem to be managed off-heap, via malloc or other allocators. I am not quite sure how this works for GC-traced allocations like arrays, but let's assume that somehow it does.

Anyway, the point of this blog post. I was looking to see which part of SpiderMonkey reserves space for type information. For example, almost all objects in V8 start with a "map" word. This is the object's "hidden class". To know what kind of object you've got, you look at the map word. That word points to information corresponding to a class of objects; it's not available to store information that might vary between objects of that same class.
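A map word can be pictured in C as a pointer at offset zero to a descriptor shared by every object of a class. The `Map`, `HeapObject` and `Point` types below are hypothetical and far simpler than V8's real layout; they only show that type checks read the first word, and that the map cannot hold per-object data.

```c
#include <stddef.h>

/* Hypothetical class descriptor, shared by all objects of one class. */
typedef struct {
    const char *name;
    size_t field_count;
} Map;

/* Every heap object begins with a map word pointing at its descriptor. */
typedef struct {
    const Map *map;
} HeapObject;

/* A "Point" class: the map word first, then per-object fields. */
typedef struct {
    HeapObject header;
    double x, y;
} Point;

static const Map kPointMap = { "Point", 2 };

/* Type checks compare the map word against the class's descriptor. */
static int is_point(const HeapObject *obj)
{
    return obj->map == &kPointMap;
}
```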

Interestingly, SpiderMonkey doesn't have a map word! Or at least, it doesn't have them on all allocations. Concretely, BigInt values don't need to reserve space for a map word. I can start storing data right from the beginning of the object.

But how can this work, you ask? How does the engine know what the type of some arbitrary object is?

The answer has a few interesting wrinkles. Firstly I should say that for objects that need hidden classes -- e.g. generic JavaScript objects -- there is indeed a map word. SpiderMonkey calls it a "Shape" instead of a "map" or a "hidden class" or a "structure" (as in JSC), but it's there, for that subset of objects.

But not all heap objects need to have these words. Strings, for example, are values rather than objects, and in SpiderMonkey they just have a small type code rather than a map word. But you know it's a string rather than something else in two ways: one, for "newborn" objects (those in the nursery), the GC reserves a bit to indicate whether the object is a string or not. (Really: it's specific to strings.)

For objects promoted out to the heap ("tenured" objects), objects of similar kinds are allocated in the same memory region (in kind-specific "arenas"). There are about a dozen trace kinds, corresponding to arena kinds. To get the kind of object, you find its arena by rounding the object's address down to the arena size, then look at the arena to see what kind of objects it has.
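The arena lookup can be sketched as a mask on the cell address. This assumes arenas of 4 KiB aligned to their own size, plus a hypothetical `TraceKind` enum with only three members; it uses C11 `aligned_alloc` to obtain suitably aligned blocks and is not SpiderMonkey's actual code.

```c
#include <stdint.h>
#include <stdlib.h>

#define ARENA_SIZE 4096u  /* assumed: arenas are 4 KiB, aligned to 4 KiB */

typedef enum { KIND_OBJECT, KIND_STRING, KIND_BIGINT } TraceKind; /* hypothetical subset */

/* Every cell in an arena shares the kind stored in the arena header. */
typedef struct {
    TraceKind kind;
} ArenaHeader;

/* Round the cell address down to the arena boundary. */
static ArenaHeader *arena_of(const void *cell)
{
    return (ArenaHeader *)((uintptr_t)cell & ~(uintptr_t)(ARENA_SIZE - 1));
}

static TraceKind kind_of(const void *cell)
{
    return arena_of(cell)->kind;
}

/* Allocate one size-aligned arena whose cells all have `kind`. */
static ArenaHeader *arena_new(TraceKind kind)
{
    ArenaHeader *a = aligned_alloc(ARENA_SIZE, ARENA_SIZE);
    if (a != NULL)
        a->kind = kind;
    return a;
}
```

Because the kind is stored once per arena, individual cells spend no bits on it.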

There's another cell bit reserved to indicate that an object has been moved, and that the rest of the bits have been overwritten with a forwarding pointer. These two reserved bits mostly don't conflict with any use a derived class might want to make of the first word of an object; if the derived class uses the first word for integer data, it's easy to just reserve the bits. If the first word is a pointer, then it's probably always aligned to a 4- or 8-byte boundary, so the low bits are zero anyway.
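The two reserved low bits can be modeled as follows. The bit values and names (`FORWARD_BIT`, `STRING_BIT`) are assumptions for this sketch rather than SpiderMonkey's exact definitions. The code relies on cells being at least 4-byte aligned, so the two low bits of a forwarding pointer are free to carry flags.

```c
#include <stdint.h>

/* Assumed bit assignments, both in the low bits of the first word. */
#define FORWARD_BIT   ((uintptr_t)1)  /* cell moved; word holds new address */
#define STRING_BIT    ((uintptr_t)2)  /* nursery cell is a string */
#define RESERVED_MASK (FORWARD_BIT | STRING_BIT)

typedef struct {
    uintptr_t header;  /* first word; derived classes avoid the low 2 bits */
} Cell;

static int is_forwarded(const Cell *c) { return (c->header & FORWARD_BIT) != 0; }
static int is_string(const Cell *c)    { return (c->header & STRING_BIT) != 0; }

/* Overwrite the first word with a forwarding pointer plus the flag.
 * Requires `to` to be 4-byte aligned so its low two bits are zero. */
static void set_forwarded(Cell *c, const Cell *to)
{
    c->header = (uintptr_t)to | FORWARD_BIT;
}

/* Strip the reserved bits to recover the new address. */
static Cell *forwarding_address(const Cell *c)
{
    return (Cell *)(c->header & ~RESERVED_MASK);
}
```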

The upshot is that while we won't be able to allocate digits inline to BigInt objects in SpiderMonkey in the general case, we won't have a per-object map word overhead; and we can optimize the common case of digits requiring only a word or two of storage to have the digit pointer point to inline storage. GC is about compromise, and it seems this can be a good one.
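The inline-digits optimization can be sketched with a union: small BigInts keep their digits inside the cell, larger ones point to a separately allocated array. `INLINE_DIGITS = 2` follows the "a word or two" above and is an assumption, as are all the names; `calloc` and `free` stand in for whatever allocator the engine actually uses.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t Digit;
#define INLINE_DIGITS 2  /* assumed: digits that fit inside the cell */

/* Fixed-size cell: either inline digits or a pointer to heap digits. */
typedef struct {
    size_t length;
    union {
        Digit *heap_digits;                  /* used when length > INLINE_DIGITS */
        Digit inline_digits[INLINE_DIGITS];  /* used for small values */
    } u;
} BigInt;

/* Returns 1 on success, 0 if the out-of-line allocation failed. */
static int bigint_init(BigInt *b, size_t length)
{
    b->length = length;
    if (length > INLINE_DIGITS) {
        b->u.heap_digits = calloc(length, sizeof(Digit));
        return b->u.heap_digits != NULL;
    }
    for (size_t i = 0; i < INLINE_DIGITS; i++)
        b->u.inline_digits[i] = 0;
    return 1;
}

/* Callers see one digits pointer regardless of where digits live. */
static Digit *bigint_digits(BigInt *b)
{
    return b->length > INLINE_DIGITS ? b->u.heap_digits : b->u.inline_digits;
}

/* Only out-of-line digits need freeing (a GC finalizer in a real engine). */
static void bigint_finalize(BigInt *b)
{
    if (b->length > INLINE_DIGITS)
        free(b->u.heap_digits);
}
```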

Well, that's all I wanted to say. Looking forward to getting BigInt turned on upstream in Firefox!

by Andy Wingo at October 11, 2018 02:33 PM

October 09, 2018

Manuel Rego

Web Engines Hackfest 2018

Another year, another edition of the Web Engines Hackfest organized by Igalia. This time it was the tenth edition: the first five used the WebKitGTK+ Hackfest name and the other five the new, broader name Web Engines Hackfest. A group of igalians, including myself, organized this event. These have been busy days for us, but we hope everyone enjoyed it and had a great time during the hackfest.

This was the biggest edition ever: we were 70 people from 15 different companies, including Apple, Google and Mozilla (three of the main browser vendors). The hackfest seems to be getting more popular, and many attendees come back edition after edition, which shows they enjoy it. This is really awesome and we’re thrilled about the future of this event.

Talks

The presentations are not the main part of the event, but I think it’s worth doing a quick recap of the ones we had this year:

  • Behdad Esfahbod and Dominik Röttsches from Google talked about Variable Fonts and the implementation in Chromium. It’s always amazing to check the possibilities of this new technology.

  • Camille Lamy, Colin Blundell and Robert Kroeger from Google presented the Servicification effort in the Chromium project, which aims to modularize Chromium into smaller parts.

  • Žan Doberšek from Igalia gave an update on WPE WebKit. The port is now official and it’s used every day in more and more low-end devices.

  • Thibault Saunier from Igalia complemented Žan’s presentation talking about the GStreamer based WebRTC implementation in WebKitGTK+ and WPE ports. Really cool to see WebRTC arriving in more browsers and web engines.

  • Antonio Gomes and Jeongeun Kim from Igalia explained the status of Chromium on Wayland and its way to becoming fully supported upstream. This work will help to use Chromium on embedded systems.

  • Youenn Fablet from Apple closed the event talking about Service Workers support on WebKit. This is a key technology for Progressive Web Apps (PWA) and is now available in all major browsers.

The slides of the talks are available on the website and wiki. The videos will be published soon in our YouTube channel.

Some pictures from Web Engines Hackfest 2018 (Flickr album)

Other topics

During the event there were breakout sessions about many different topics. In this section I’m going to talk about the ones I’m most interested in.

  • Web Platform Tests (WPT)

    This is a key topic to improve interoperability on the web platform. Simon Pieters started the session with an introduction to WPT just in case someone was not aware of the repository and how it works. For the rest of the session we discussed the status of WPT on the different browsers.

    Chromium and Firefox run an automatic two-way (import/export) synchronization process, so the tests can be easily shared between both implementations. WebKit, on the other hand, still relies on a partly manual process: neither import nor export is fully automatic, though there are some scripts that help with it.

    Apart from that, WPT is a first-class citizen in Chromium, and the encouraged way to do new developments. In Firefox it’s still not there, as the test suites are not run in all the possible configurations yet (but they’re getting there).

    Finally, the WPT dashboard shows results for the most recent unstable releases of the different browsers, which is really cool despite being somewhat hidden in the UI: https://wpt.fyi/results/?label=experimental.

  • LayoutNG

    Christian Biesinger gave an introduction to LayoutNG project in Blink, where Google is rewriting Chromium’s layout engine. He showed the main ideas and concepts behind this effort and navigated the code showing some examples. According to Christian things are getting ready and LayoutNG could be shipping in the coming months for inline and block layout.

    On top of questions about LayoutNG, we briefly mentioned how other browsers are also trying to improve the layout code: Firefox with Servo layout and WebKit with Layout Formatting Context (LFC) aka Layout Reloaded. It seems quite clear that the current layout engines are getting to their limits and people are looking for new solutions.

  • Chromium downstream

    Several companies (Google included) have to maintain downstream forks of Chromium with their own customizations to fit their particular use cases and hardware platforms.

    Colin Blundell explained the process of maintaining the downstream version of Chrome for iOS. After trying many different strategies, the best solution was rebasing their changes 2-3 times per day. That way the conflicts they had to deal with were much simpler to resolve; otherwise they could not cope with all the upstream changes. Note that he mentioned one (rotating) full-time engineer was required to do this job in time.

    It was good to share the experiences of different companies that are facing very similar issues for this kind of work.

Thank you very much

Just to close this post, big thanks to all the people attending the event; without you the hackfest wouldn’t make any sense at all. People are the key to this event, where discussions and conversations are one of the main parts.

Of course, special acknowledgments to the speakers for the hard work they put into their lovely talks.

Finally, I couldn’t forget to thank the Web Engines Hackfest 2018 sponsors: Google and Igalia. Without their support this event wouldn’t be possible.

Web Engines Hackfest 2018 sponsors: Google and Igalia

Looking forward to a new edition!

October 09, 2018 10:00 PM

October 08, 2018

Eleni Maria Stea

XDC2018

I am back home after my last trip for this year and a long and great week in A Coruña, where I attended my first XDC, organized and sponsored by Igalia. It was a fun but also tiring week, especially for my colleagues on the organisation team (Sam, Chema, Juan and Antia) who were working …

by hikiko at October 08, 2018 08:48 AM

October 03, 2018

Samuel Iglesias

XDC 2018 experience

This year, the X.Org Developers’ Conference (XDC 2018) took place at the Computer Science Faculty of the University of A Coruña, in the city of A Coruña, Spain, during the last week of September, from Wednesday 26th to Friday 28th. XDC 2018 was a 3-day conference full of talks about the technologies of the free software graphics stack, covering topics like graphics driver development, testing and benchmarking, DRM, X, virtualization… Check the schedule for more info, slide decks and, once we have them edited, videos. For those loving statistics, we had 18 main track talks, 4 workshops, 17 lightning talks, 3 social events, hallway tracks… Amazing!

XDC 2018 photo

This year we had 110 registered attendees and over 40 students from the University of A Coruña attending. Taking into account that some people couldn’t come at the very last minute, we estimate that we had ~140 attendees in total, which is probably the most successful XDC conference ever in terms of attendance. Honestly, we were a bit worried, as coming to A Coruña is not as easy as reaching other international hubs, so thanks everybody for coming to A Coruña!

GPUL

This year the conference was organized by GPUL (Galician Linux User and Developer Group, founded in 1998) together with the University of A Coruña, Igalia and, of course, the X.Org Foundation. The organization team was composed of 12 volunteers, some from Igalia and the rest from GPUL, who took care that everything went fine and fixed all the last-minute issues that happened. I hope GPUL can keep organizing events for another 20 years! :-D

However, we are not perfect. Feel free to send your feedback to both xdc2018@gpul.org and board@foundation.x.org; we would like to improve the organization of both future XDC conferences and our own local conferences.

Thanks to Igalia for allowing me to organize this event, for their platinum sponsorship, and for sponsoring the Tuesday and Wednesday events.

Igalia

October 03, 2018 12:00 PM

September 22, 2018

Adrián Pérez

Comments: Begone!

Starting today, the comments section is no longer available in the articles on the perezdecastro.org website. The main reason for taking this decision is that I have never felt particularly comfortable leaving that to a third-party service. I have had in mind to provide a self-hosted solution, which never materialized; and given the small number of comments posted in the last years, it seemed to me that removing the comments section altogether would be the best course of action.

This means that now there is absolutely nothing in the website which could potentially track visitors. This is a small contribution against surveillance capitalism that I have been wanting to do for a while.

The only information stored are the usual HTTP server logs; which contain the IP addresses of visitors and the URLs they fetch from the server. Because I do my own hosting I know for a fact that this data is not used for anything other than troubleshooting, and only a few months of it are kept. Also, IP addresses are anyway “public” information in the sense that any server we connect to gets to know it, and that is part of how the Internet works.

Thanks for reading, and enjoy the ad- and tracker-free experience!

by aperez (adrian@perezdecastro.org) at September 22, 2018 03:55 PM