Planet Igalia

January 17, 2018

Andy Wingo

instruction explosion in guile

Greetings, fellow Schemers and compiler nerds: I bring fresh nargery!

instruction explosion

A couple years ago I made a list of compiler tasks for Guile. Most of these are still open, but I've been chipping away at the one labeled "instruction explosion":

Now we get more to the compiler side of things. Currently in Guile's VM there are instructions like vector-ref. This is a little silly: there are also instructions to branch on the type of an object (br-if-tc7 in this case), to get the vector's length, and to do a branching integer comparison. Really we should replace vector-ref with a combination of these test-and-branches, with real control flow in the function, and then the actual ref should use some more primitive unchecked memory reference instruction. Optimization could end up hoisting everything but the primitive unchecked memory reference, while preserving safety, which would be a win. But probably in most cases optimization wouldn't manage to do this, which would be a lose overall because you have more instruction dispatch.

Well, this transformation is something we need for native compilation anyway. I would accept a patch to do this kind of transformation on the master branch, after version 2.2.0 has forked. In theory this would remove most all high level instructions from the VM, making the bytecode closer to a virtual CPU, and likewise making it easier for the compiler to emit native code as it's working at a lower level.

Now that I'm getting close to finished I wanted to share some thoughts. Previous progress reports on the mailing list.

a simple loop

As an example, consider this loop that sums the 32-bit floats in a bytevector. I've annotated the code with lines and columns so that you can correspond different pieces to the assembly.

   0       8   12     19
 +-v-------v---v------v-
 |
1| (use-modules (rnrs bytevectors))
2| (define (f32v-sum bv)
3|   (let lp ((n 0) (sum 0.0))
4|     (if (< n (bytevector-length bv))
5|         (lp (+ n 4)
6|             (+ sum (bytevector-ieee-single-native-ref bv n)))
7|          sum)))

The assembly for the loop before instruction explosion went like this:

L1:
  17    (handle-interrupts)     at (unknown file):5:12
  18    (uadd/immediate 0 1 4)
  19    (bv-f32-ref 1 3 1)      at (unknown file):6:19
  20    (fadd 2 2 1)            at (unknown file):6:12
  21    (s64<? 0 4)             at (unknown file):4:8
  22    (jnl 8)                ;; -> L4
  23    (mov 1 0)               at (unknown file):5:8
  24    (j -7)                 ;; -> L1

So, already Guile's compiler has hoisted the (bytevector-length bv) and unboxed the loop index n and accumulator sum. This work aims to simplify further by exploding bv-f32-ref.

exploding the loop

In practice, instruction explosion happens in CPS conversion, as we are converting the Scheme-like Tree-IL language down to the CPS soup language. When we see a Tree-Il primcall (a call to a known primitive), instead of lowering it to a corresponding CPS primcall, we inline a whole blob of code.

In the concrete case of bv-f32-ref, we'd inline it with something like the following:

(unless (and (heap-object? bv)
             (eq? (heap-type-tag bv) %bytevector-tag))
  (error "not a bytevector" bv))
(define len (word-ref bv 1))
(define ptr (word-ref bv 2))
(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))
(f32-ref ptr len)

As you can see, there are four branches hidden in the bv-f32-ref: two to check that the object is a bytevector, and two to check that the index is within range. In this explanation we assume that the offset idx is already unboxed, but actually unboxing the index ends up being part of this work as well.

One of the goals of instruction explosion was that by breaking the operation into a number of smaller, more orthogonal parts, native code generation would be easier, because the compiler would only have to know about those small bits. However without an optimizing compiler, it would be better to reify a call out to a specialized bv-f32-ref runtime routine instead of inlining all of this code -- probably whatever language you write your runtime routine in (C, rust, whatever) will do a better job optimizing than your compiler will.

But with an optimizing compiler, there is the possibility of removing possibly everything but the f32-ref. Guile doesn't quite get there, but almost; here's the post-explosion optimized assembly of the inner loop of f32v-sum:

L1:
  27    (handle-interrupts)
  28    (tag-fixnum 1 2)
  29    (s64<? 2 4)             at (unknown file):4:8
  30    (jnl 15)               ;; -> L5
  31    (uadd/immediate 0 2 4)  at (unknown file):5:12
  32    (u64<? 2 7)             at (unknown file):6:19
  33    (jnl 5)                ;; -> L2
  34    (f32-ref 2 5 2)
  35    (fadd 3 3 2)            at (unknown file):6:12
  36    (mov 2 0)               at (unknown file):5:8
  37    (j -10)                ;; -> L1

good things

The first thing to note is that unlike the "before" code, there's no instruction in this loop that can throw an exception. Neat.

Next, note that there's no type check on the bytevector; the peeled iteration preceding the loop already proved that the bytevector is a bytevector.

And indeed there's no reference to the bytevector at all in the loop! The value being dereferenced in (f32-ref 2 5 2) is a raw pointer. (Read this instruction as, "sp[2] = *(float*)((byte*)sp[5] + (uptrdiff_t)sp[2])".) The compiler does something interesting; the f32-ref CPS primcall actually takes three arguments: the garbage-collected object protecting the pointer, the pointer itself, and the offset. The object itself doesn't appear in the residual code, but including it in the f32-ref primcall's inputs keeps it alive as long as the f32-ref itself is alive.

bad things

Then there are the limitations. Firstly, instruction 28 tags the u64 loop index as a fixnum, but never uses the result. Why is this here? Sadly it's because the value is used in the bailout at L2. Recall this pseudocode:

(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))

Here the error ends up lowering to a throw CPS term that the compiler recognizes as a bailout and renders out-of-line; cool. But it uses idx as an argument, as a tagged SCM value. The compiler untags the loop index, but has to keep a tagged version around for the error cases.

The right fix is probably some kind of allocation sinking pass that sinks the tag-fixnum to the bailouts. Oh well.

Additionally, there are two tests in the loop. Are both necessary? Turns out, yes :( Imagine you have a bytevector of length 1025. The loop continues until the last ref at offset 1024, which is within bounds of the bytevector but there's one one byte available at that point, so we need to throw an exception at this point. The compiler did as good a job as we could expect it to do.

is is worth it? where to now?

On the one hand, instruction explosion is a step sideways. The code is more optimal, but it's more instructions. Because Guile currently has a bytecode VM, that means more total interpreter overhead. Testing on a 40-megabyte bytevector of 32-bit floats, the exploded f32v-sum completes in 115 milliseconds compared to around 97 for the earlier version.

On the other hand, it is very easy to imagine how to compile these instructions to native code, either ahead-of-time or via a simple template JIT. You practically just have to look up the instructions in the corresponding ISA reference, is all. The result should perform quite well.

I will probably take a whack at a simple template JIT first that does no register allocation, then ahead-of-time compilation with register allocation. Getting the AOT-compiled artifacts to dynamically link with runtime routines is a sufficient pain in my mind that I will put it off a bit until later. I also need to figure out a good strategy for truly polymorphic operations like general integer addition; probably involving inline caches.

So that's where we're at :) Thanks for reading, and happy hacking in Guile in 2018!

by Andy Wingo at January 17, 2018 10:30 AM

January 15, 2018

Asumu Takikawa

Supporting both VMDq and RSS in Snabb

In my previous blog post, I talked about the support libraries and the core structure of Snabb’s NIC drivers. In this post, I’ll talk about some of the driver improvements we made at Igalia over the last few months.

(as in my previous post, this work was joint work with Nicola Larosa)

Background

Modern NICs are designed to take advantage of increasing parallelism in modern CPUs in order to scale to larger workloads.

In particular, to scale to 100G workloads, it becomes necessary to work in parallel since a single off-the-shelf core cannot keep up. Even with 10G hardware, processing packets in parallel makes it easier for software to operate at line-rate because the time budget is quite tight.

To get an idea of what the time budget is like, see these calculations. tl;dr is 67.2 ns/packet or about 201 cycles.

To scale to multiple CPUs, NICs have a feature called receive-side scaling or RSS which distributes incoming packets to multiple receive queues. These queues can be serviced by separate cores.

RSS and related features for Intel NICs are detailed more in an overview whitepaper

RSS works by computing a hash in hardware over the packet to determine the flow it belongs to (this is similar to the hashing used in IPFIX, which I described in a previous blog post).

RSS diagram
A diagram showing how RSS directs packets

The diagram above tries to illustrate this. When a packet arrives in the NIC, the hash is computed. Packets with the same hash (i.e., they’re in the same flow) are directed to a particular receive queue. Receive queues live in RAM as a ring buffer (shown as blue rings in the diagram) and packets are placed there via DMA by consulting registers on the NIC.

All this means that network functions that depend on tracking flow-related state can usually still work in this parallel setup.

As a side note, you might wonder (I did anyway!) what happens to fragmented packets whose flow membership may not be identifiable from a fragment. It turns out that on Intel NICs, the hash function will ignore the layer 3 flow information when a packet is fragmented. This means that on occasion a fragmented packet may end up on a different queue than a non-fragmented packet in the same flow. More on this problem here.

Snabb’s two Intel drivers

The existing driver used in most Snabb programs (apps.intel.intel_app) worked well and was mature but was missing support for RSS.

An alternate driver (apps.intel_mp.intel_mp) made by Peter Bristow supported RSS, but wasn’t entirely compatible with the features provided by the main Intel driver. We worked on extending intel_mp to work as a more-or-less drop in replacement for intel_app.

The incompatibility between the two drivers was caused mainly by lack of support for VMDq (Virtual Machine Device Queues) in intel_mp. This is another feature that allows for multiple queue operation on Intel NICs that is used to allow a NIC to present itself as multiple virtualized sets of queues. It’s often used to host VMs in a virtualized environment, but can also be used (as in Snabb) for serving logically separate apps.

The basic idea is that queues may be assigned to separate pools assigned to a VM or app with its own particular MAC address. A host can use this to run logically separate network functions sharing a single NIC. As with RSS, services running on separate cores can service the queues in parallel.

RSS diagram
A diagram showing how VMDq affects queue selection

As the diagram above shows, adding VMDq changes queue selection slightly from the RSS case above. An appropriate pool is selected based on criteria such as the MAC address (or VLAN tag, and so on) and then RSS may be used.

BTW, VMDq is not the only virtualization feature on these NICs. There is also SR-IOV or “Single Root I/O Virtualization” which is designed to provide a virtualized NIC for every VM that directly uses the NIC hardware resources. My understanding is that Snabb doesn’t use it for now because we can implement more switching flexibility in software.

The intel_app driver supports VMDq but not RSS and the opposite situation is true for intel_mp. It turns out that both features can be used simultaneously, in which case packets are first sorted by MAC address and then by flow hashing for RSS. Basically each VMDq pool has its own set of RSS queues.

We implemented this support in the intel_mp driver and made the driver interface mostly compatible with intel_app so that only minimal modifications are necessary to switch over. In the process, we made bug-fixes and performance fixes in the driver to try to ensure that performance and reliability are comparable to using intel_app.

The development process was made a lot easier due to the existence of the intel_app code that we could copy and follow in many cases.

The tricky parts were making sure that the NIC state was set correctly when multiple processes were using the NIC. In particular, intel_app can rely on tracking VMDq state inside a single Lua process.

For intel_mp, it is necessary to use locking and IPC (via shared memory) to coordinate between different Lua processes that are setting driver state. In particular, the driver needs to be careful to be aware of what resources (VMDq pool numbers, MAC address registers, etc.) are available for use.

Current status

The driver improvements are now merged upstream in intel_mp, which is now the default driver, and is available in the Snabb 2017.11 “Endive” release. It’s still possible to opt out and use the old driver in case there are any problems with using intel_mp. And of course we appreciate any bug reports or feedback.

by Asumu Takikawa at January 15, 2018 04:24 PM

January 12, 2018

Diego Pino

More practical Snabb

Some time ago, in a Hacker News thread an user proposed the following use case for Snabb:

I have a ChromeCast on my home network, but I want sandbox/log its traffic. I would want to write some logic to ignore video data, because that’s big. But I want to see the metadata and which servers it’s talking to. I want to see when it’s auto-updating itself with new binaries and record them.

Is that a good use case for Snabb Switch, or is there is an easier way to accomplish what I want?

I decided to take this request and implement it as a tutorial. Hopefully, the resulting tutorial can be a valuable piece of information highlighting some of Snabb’s strengths:

  • Fine-grained control of the data-plane.
  • Wide variety of solid libraries for protocol parsing.
  • Rapid prototyping.

Limiting the project’s scope

Before putting my hands down on this project, I broke it down into smaller pieces and checked how much of it is already supported in Snabb. To fully implement this project I’d need:

  1. To be able to discover Chromecast devices.
  2. Identify their network flows.
  3. Save the data to disk.

Snabb provides libraries to identify network flows as well as capturing packets and filter them by content. That pretty much covers bullets 2) and 3). However, Snabb doesn’t provide any tool or library to fully support bullet 1). Thus, I’m going to limit the scope of this tutorial to that single feature: Discover Chromecast and similar devices in a local network.

Multicast DNS

A fast lookup on Chromecast’s Wikipedia article reveals Chromecast devices rely on a protocol called Multicast DNS (mDNS).

Multicast DNS is standardized as RFC6762. The origin of the protocol goes back to Apple’s Rendezvous, later rebranded as Bonjour. Bonjour is in fact the origin of the more generic concept known as Zeroconf. Zeroconf’s goal is to automatically create usable TCP/IP computer networks when computers or network peripherals are interconnected. It is composed of three main elements:

  • Addressing: Self-Assigned Link-Local Addressing (RFC2462 and RFC3927). Automatically assigned addresses in the 169.254.0.0/16 network space.
  • Naming: Multicast DNS (RFC6762). Host name resolution.
  • Browsing: DNS Service Discovery (RFC6763). The ability of discovering devices and services in a local network.

Multicast DNS and DNS-SD are very similar and are often mixed up, although they are not strictly the same thing. The former is the description of how to do name resolution in a serverless DNS network, while DNS-SD, although a protocol as well, is an specific use of Multicast DNS.

One of the nicest things of Multicast DNS is that it reuses many of the concepts of DNS. This allowed mDNS to spread quickly and gain fast adoption, since existing software only required mininimal change. What’s more, programmers didn’t need to learn new APIs or study a completely brand-new protocol.

Today Multicast DNS is featured in a myriad of small devices, ranging from Google Chromecast to Amazon’s FireTV or Philips Hue lights, as well as software such as Apple’s Bonjour or Spotify.

This tutorial is going to focus pretty much on mDNS/DNS-SD. Since Multicast DNS reuses many of the ideas of DNS, I am going to review DNS first. Feel free to skip the next section if you are already familiar with DNS.

DNS basis

The most common use case of DNS is resolving host names to IP addresses:

$ dig igalia.com -t A +short
91.117.99.155

In the command above, flag ‘-t A’ means an Address record. There are actually many different types of DNS records. The most common ones are:

  • A (Address record). Used to map hostnames to IPv4 address.
  • AAAA (IPv6 address record). Used to map hostnames to IPv6 address.
  • PTR (Pointer record). Used for reverse DNS lookups, that means, IP addresses to hostnames.
  • SOA (Start of zone of authority). DNS can be seen as a distributed database which is organized in a hierarchical layout of subdomains. A DNS zone is a contiguous portion of the domain space for which a server is responsible of. The top-level DNS zone is known as the DNS root zone, which consists of 13 logical root name servers (although there are more than 13 instances) that contain the top-level domains, generic top-level domains (.com, .net, etc) and country code top-level domains. The command below prints out how the domain www.google.com gets resolved (I trimmed down the output for the sake of clarity).
$ dig @8.8.8.8 www.google.com +trace

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @8.8.8.8 www.google.com +trace
; (1 server found)
;; global options: +cmd
.                       181853  IN      NS      k.root-servers.net.
.                       181853  IN      NS      g.root-servers.net.
.                       181853  IN      NS      j.root-servers.net.
.                       181853  IN      RRSIG   NS 8 0 518400 20180117170000 20180104160000 41824 ....
;; Received 525 bytes from 8.8.8.8#53(8.8.8.8) in 48 ms

com.                    172800  IN      NS      j.gtld-servers.net.
com.                    172800  IN      NS      k.gtld-servers.net.
com.                    172800  IN      NS      l.gtld-servers.net.
com.                    86400   IN      RRSIG   DS 8 1 86400 20180118170000 20180105160000 41824 ...
;; Received 1174 bytes from 199.7.83.42#53(l.root-servers.net) in 44 ms

google.com.             172800  IN      NS      ns2.google.com.
google.com.             172800  IN      NS      ns1.google.com.
google.com.             172800  IN      NS      ns3.google.com.
google.com.             172800  IN      NS      ns4.google.com.

;; Received 664 bytes from 192.26.92.30#53(c.gtld-servers.net) in 44 ms

www.google.com.         300     IN      A       216.58.201.132
;; Received 48 bytes from 216.239.32.10#53(ns1.google.com) in 48 ms

The domain name is split in parts. First the top-level domain is consulted which returns a list of name servers. The root server l.root-servers.net gets consulted to resolve the subdomain .com. That also returns a list of generic top-level domain name servers. Name server c.gtld-servers.net is picked and returns another list of name servers for google.com. Finally www.google.com gets resolved by ns1.google.com, that returns the A record containing the domain name IPv4 address.

Using DNS is also possible to resolve an IP address to a domain name.

$ dig -x 8.8.4.4 +short
google-public-dns-b.google.com.

In this case, the type record is PTR. The command above is equivalent to:

$ dig 4.4.8.8.in-addr.arpa -t PTR +short
google-public-dns-b.google.com.

When using PTR records for reverse lookups, the target IPv4 addres has to be part of the domain in-addr.arpa. This is an special domain registered under the top-level domain arpa and it’s used for reverse IPv4 lookup. Reverse lookup is the most common use of PTR records, but in fact PTR records are just pointers to a canonical name and other uses are possible as we will see later.

Summarizing:

  • DNS helps solving a host name to an IP address. Other types of record resolution are possible.
  • DNS is a centralized protocol where DNS servers respond to DNS queries.
  • DNS names are grouped in zones or domains, forming a hierarchical structure. Each SOA is responsible of the name resolution within its area.

DNS Service Discovery

Unlike DNS, Multicast DNS doesn’t require a central server. Instead devices listen on port 5353 for DNS queries to a multicast address. In the case of IPv4, this destination address is 224.0.0.251. In addition, the destination Ethernet address of a mDNS request must be 01:00:5E:00:00:FB.

The Multicast DNS standard defines the domain name local as a pseudo-TLD (top-level domain) under which hosts and services can register. For instance, a laptop computer might answer to the name mylaptop.local (replace mylaptop for your actual laptop’s name).

$ dig @224.0.0.251 -p 5353 mylaptop.local. +short
192.168.0.13

To discover all the services and devices in a local network, DNS-SD sends a PTR Multicast DNS request asking for the domain name `services._dns-sd._udp.local.

$ dig @224.0.0.251 -p 5353 -t PTR _services._dns-sd._udp.local

The expected result should be a set of PTR records announcing their domain name. In my case the dig command doesn’t print out any PTR records, but using tcpdump I can check I’m in fact receiving mDNS responses:

$ sudo tcpdump "port 5353" -t -qns 0 -e -i wlp3s0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wlp3s0, link-type EN10MB (Ethernet), capture size 262144 bytes
44:85:00:4f:b8:fc > 01:00:5e:00:00:fb, IPv4, length 99: 192.168.86.30.58722 > 224.0.0.251.5353: UDP, length 57
54:60:09:fc:d6:04 > 01:00:5e:00:00:fb, IPv4, length 82: 192.168.86.57.5353 > 224.0.0.251.5353: UDP, length 40
54:60:09:fc:d6:04 > 01:00:5e:00:00:fb, IPv4, length 299: 192.168.86.57.5353 > 224.0.0.251.5353: UDP, length 257
54:60:09:fc:d6:04 > 01:00:5e:00:00:fb, IPv4, length 119: 192.168.86.57.5353 > 224.0.0.251.5353: UDP, length 77
f4:f5:d8:d3:de:dc > 01:00:5e:00:00:fb, IPv4, length 299: 192.168.86.61.5353 > 224.0.0.251.5353: UDP, length 257
f4:f5:d8:d3:de:dc > 01:00:5e:00:00:fb, IPv4, length 186: 192.168.86.61.5353 > 224.0.0.251.5353: UDP, length 144

Why dig doesn’t print out the PTR records is still a mystery to me. So instead of dig I used Avahi, the free software implementation of mDNS/DNS-SD, to browse the available devices:

$ avahi-browse -a
+ wlp3s0 IPv4 dcad2b6c-7a21-10c310-568b-ad83b4a3ea1e          _googlezone._tcp     local
+ wlp3s0 IPv4 1ebe35f6-26f1-bc92-318c-9e35fdcbe11d          _googlezone._tcp     local
+ wlp3s0 IPv4 Google-Cast-Group-71010755f10ad16b10c231437a5e543d1dc3 _googlecast._tcp     local
+ wlp3s0 IPv4 Chromecast-Audio-fd7d2b9d29c92b24db10be10661010eebb9f _googlecast._tcp     local
+ wlp3s0 IPv4 Google-Home-d81d02e1e48a1f0b7d2cbac88f2df820  _googlecast._tcp     local
+ wlp3s0 IPv4 dcad2b6c7a2110c310-0                            _spotify-connect._tcp local

Each row identifies a service instance name. The structure of a service instance name is the following:

Service Instance Name = <Instance> . <Service> . <Domain>

For example, consider the following record “_spotify-connect._tcp.local”:

  • Domain: local. The pseudo-TLD used by Multicast DNS.
  • Service: spotify-connect._tcp. The service names consists of a pair of DNS labels. The first label identifies what the service does (_spotify-connect is a service that allows an user to continue playing Spotify from a phone to a desktop computer, and viceversa). The second label identifies what protocol the service uses, in this case TCP.
  • Instance: dcad2b6c7a2110c310-0. An user friendly name that identifies the service.

Besides a PTR record, an instance also replies with several additional DNS records that might be useful for the requester. These extra records are part of the PTR record and are embed in the DNS additional records field. These extra records are of 3 types:

  • SRV: Gives the target host and port where the service instance can be reached.
  • TXT: Gives additional information about this instance, in a structured form using key/value pairs.
  • A: IPv4 address of the reached instance.

Snabb’s DNS-SD

Now that we have a fair understanding of Multicast DNS and DNS-SD, we can start coding the app in Snabb. Like on the previous posts I decided not to past the code directly here, instead I’ve pushed the code to a remote branch and will comment on the most relevant parts. To checkout this repo do:

$ git clone https://github.com/snabbco/snabb.git
$ cd snabb
$ git remote add dpino https://github.com/dpino/snabb.git
$ git checkout dns-sd

Highlights:

  • The app needs to send a DNS-SD packet through a network interface managed by the OS. I used Snabb’s RawSocket app to do that.
  • A DNSSD app emits one DNS-SD request every second. This is done in DNSSD’s pull method. There’s a helper class called mDNSQuery that is in charge of composing this request.
  • The DNSSD app receives responses on its push method. If the response is a Multicast DNS packet, it will print out all the contained DNS records in stdout.
  • A Multicast DNS packet is composed by a header and a body. The header contains control information such as number of queries, answers, additional records, etc. The body contains DNS records. If the mDNS packet is a response packet, these are the DNS records we would need to print out.
  • To help me handling Multicast DNS packets I created a MDNS helper class. Similarly, I added a DNS helper class that helps me parsing the necessary DNS records: PTR, SRV, TXT and A records.

Here is Snabb’s dns-sd command in use:

$ sudo ./snabb dnssd --interface wlp3s0
PTR: (name: _services._dns-sd._udp.local; domain-name: _spotify-connect._tcp )
SRV: (target: dcad2b6c7a2110c310-0)
TXT: (CPath=/zc/0;VERSION=1.0;Stack=SP;)
Address: 192.168.86.55
PTR: (name: _googlecast._tcp.local; domain-name: Chromecast-Audio-fd7d2b9d29c92b24db10be10661010eebb9f)
SRV: (target: 1ebe35f6-26f1-bc92-318c-9e35fdcbe11d)
TXT: (id=fd7d2b9d29c92b24db10be10661010eebb9f;cd=224708C2E61AED24676383796588FF7E;
rm=8F2EE2757C6626CC;ve=05;md=Chromecast Audio;ic=/setup/icon.png;fn=Jukebox;
ca=2052;st=0;bs=FA8FCA9E3FC2;nf=1;rs=;)
Address: 192.168.86.57

Finally I’d like to share some trick or practices I used when coding the app:

1) I started small by capturing a DNS-SD’s request emited from Avahi. Then I sent that very same packet from Snabb and checked the response was a Multicast DNS packet:

$ avahi-browse -a
$ sudo tcpdump -i wlp3s0 -w mdns.pcap

Then open mdns.pcap with Wireshark, mark the request packet only and save it to disk. Then use od command to dump the packet as text:

$ od -j 40 -A x -tx1 mdns_request.pcap
000028 01 00 5e 00 00 fb 44 85 00 4f b8 fc 08 00 45 00
000038 00 55 32 7c 00 00 01 11 8f 5a c0 a8 56 1e e0 00
000048 00 fb e3 53 14 e9 00 41 89 9d 25 85 01 20 00 01
000058 00 00 00 00 00 01 09 5f 73 65 72 76 69 63 65 73
000068 07 5f 64 6e 73 2d 73 64 04 5f 75 64 70 05 6c 6f
000078 63 61 6c 00 00 0c 00 01 00 00 29 10 00 00 00 00
000088 00 00 00

This dumped packet can be copied raw into Snabb such in MDNS’s selftest.

NOTE: text2pcap command can also be a very convenient tool to convert a dumped packet in text format to a pcap file.

2) Instead of sending requests on the wire to obtain responses, I saved a bunch of responses to a .pcap file and used the file as an input for the DNS parser. In fact the command supports a –pcap flag that can be used to print out DNS records.

$ sudo ./snabb dnssd --pcap /home/dpino/avahi-browse.pcap
Reading from file: /home/dpino/avahi-browse.pcap
PTR: (name: _services._dns-sd._udp.local; domain-name: _spotify-connect._tcp)
PTR: (name: ; domain-name: dcad2b6c7a2110c310-0)
SRV: (target: dcad2b6c7a2110c310-0)
TXT: (CPath=/zc/0;VERSION=1.0;Stack=SP;)
Address: 192.168.86.55
..._

3) When sending a packet to the wire, checkout the packet’s header checksum are correct. Wireshark has a mode to verify whether a packet’s header checksums are correct or not, which is very convenient. Snabb counts with protocol libraries to calculate a IP, TCP or UDP checksums. Check out how mDNSQuery does it.

Last thoughts

Implementing this tool has helped me to understand DNS better, specially the Multicast DNS/DNS-SD part. I never expected it could be so interesting.

Going from an idea to a working prototype with Snabb is really fast. It’s one of the advantages of user-space networking and one of the things I enjoy the most. That said the resulting code has been bigger that I initially expected. I think that to avoid losing this work I will try to land the DNS and mDNS libraries into Snabb.

This post puts an end to this series of practical Snabb posts. I hope you found them interesting as much as I enjoyed writing them. Luckily in the future these posts can be useful for anyone interested in user-space networking to try out Snabb.

January 12, 2018 06:00 AM

January 11, 2018

Frédéric Wang

Review of Igalia's Web Platform activities (H2 2017)

Last september, I published a first blog post to let people know a bit more about Igalia’s activities around the Web platform, with a plan to repeat such a review each semester. The present blog post focuses on the activity of the second semester of 2017.

Accessibility

As part of Igalia’s commitment to diversity and inclusion, we continue our effort to standardize and implement accessibility technologies. More specifically, Igalian Joanmarie Diggs continues to serve as chair of the W3C’s ARIA working group and as an editor of Accessible Rich Internet Applications (WAI-ARIA) 1.1, Core Accessibility API Mappings 1.1, Digital Publishing WAI-ARIA Module 1.0, Digital Publishing Accessibility API Mappings 1.0 all of which became W3C Recommandations in December! Work on versions 1.2 of ARIA and the Core AAM will begin in January. Stay tuned for the First Public Working Drafts.

We also contributed patches to fix several issues in the ARIA implementations of WebKit and Gecko and implemented support for the new DPub ARIA roles. We expect to continue this collaboration with Apple and Mozilla next year as well as to resume more active maintenance of Orca, the screen reader used to access graphical desktop environments in GNU/Linux.

Last but not least, progress continues on switching to Web Platform Tests for ARIA and “Accessibility API Mappings” tests. This task is challenging because, unlike other aspects of the Web Platform, testing accessibility mappings cannot be done by solely examining what is rendered by the user agent. Instead, an additional tool, an “Accessible Technology Test Adapter” (ATTA) must be also be run. ATTAs work in a similar fashion to assistive technologies such as screen readers, using the implemented platform accessibility API to query information about elements and reporting what it obtains back to WPT which in turn determines if a test passed or failed. As a result, the tests are currently officially manual while the platform ATTAs continue to be developed and refined. We hope to make sufficient progress during 2018 that ATTA integration into WPT can begin.

CSS

This semester, we were glad to receive Bloomberg’s support again to pursue our activities around CSS. After a long commitment to CSS and a lot of feedback to Editors, several of our members finally joined the Working Group! Incidentally and as mentioned in a previous blog post, during the CSS Working Group face-to-face meeting in Paris we got the opportunity to answer Microsoft’s questions regarding The Story of CSS Grid, from Its Creators (see also the video). You might want to take a look at our own videos for CSS Grid Layout, regarding alignment and placement and easy design.

On the development side, we maintained and fixed bugs in Grid Layout implementation for Blink and WebKit. We also implemented alignment of positioned items in Blink and WebKit. We have several improvements and bug fixes for editing/selection from Bloomberg’s downstream branch that we’ve already upstreamed or plan to upstream. Finally, it’s worth mentioning that the work done on display: contents by our former coding experience student Emilio Cobos was taken over and completed by antiik (for WebKit) and rune (for Blink) and is now enabled by default! We plan to pursue these developments next year and have various ideas. One of them is improving the way grids are stored in memory to allow huge grids (e.g. spreadsheet).

Web Platform Predictability

One of the area where we would like to increase our activity is Web Platform Predictability. This is obviously essential for our users but is also instrumental for a company like Igalia making developments on all the open source Javascript and Web engines, to ensure that our work is implemented consistently across all platforms. This semester, we were able to put more effort on this thanks to financial support from Bloomberg and Google AMP.

We have implemented more frame sandboxing attributes WebKit to improve user safety and make control of sandboxed documents more flexible. We improved the sandboxed navigation browser context flag and implemented the new allow-popup-to-escape-sandbox, allow-top-navigation-without-user-activation, and allow-modals values for the sandbox attribute.

Currently, HTML frame scrolling is not implemented in WebKit/iOS. As a consequence, one has to use the non-standard -webkit-overflow-scrolling: touch property on overflow nodes to emulate scrollable elements. In parallel to the progresses toward using more standard HTML frame scrolling we have also worked on annoying issues related to overflow nodes, including flickering/jittering of “position: fixed” nodes or broken Find UI or a regression causing content to disappear.

Another important task as part of our CSS effort was to address compatibility issues between the different browsers. For example we fixed editing bugs related to HTML List items: WebKit’s Bug 174593/Chromium’s Issue 744936 and WebKit’s Bug 173148/Chromium’s Issue 731621. Inconsistencies in web engines regarding selection with floats have also been detected and we submitted the first patches for WebKit and Blink. Finally, we are currently improving line-breaking behavior in Blink and WebKit, which implies the implementation of new CSS values and properties defined in the last draft of the CSS Text 3 specification.

We expect to continue this effort on Web Platform Predictability next year and we are discussing more ideas e.g. WebPackage or flexbox compatibility issues. For sure, Web Platform Tests are an important aspect to ensure cross-platform inter-operability and we would like to help improving synchronization with the conformance tests of browser repositories. This includes the accessibility tests mentioned above.

MathML

Last November, we launched a fundraising Campaign to implement MathML in Chromium and presented it during Frankfurt Book Fair and TPAC. We have gotten very positive feedback so far with encouragement from people excited about this project. We strongly believe the native MathML implementation in the browsers will bring about a huge impact to STEM education across the globe and all the incumbent industries will benefit from the technology. As pointed out by Rick Byers, the web platform is a commons and we believe that a more collective commitment and contribution are essential for making this world a better place!

While waiting for progress on Chromium’s side, we have provided minimal maintenance for MathML in WebKit:

  • We fixed all the debug ASSERTs reported on Bugzilla.
  • We did follow-up code clean up and refactoring.
  • We imported Web Platform tests in WebKit.
  • We performed review of MathML patches.

Regarding the last point, we would like to thank Minsheng Liu, a new volunteer who has started to contribute patches to WebKit to fix issues with MathML operators. He is willing to continue to work on MathML development in 2018 as well so stay tuned for more improvements!

Javascript

During the second semester of 2017, we worked on the design, standardization and implementation of several JavaScript features thanks to sponsorship from Bloomberg and Mozilla.

One of the new features we focused on recently is BigInt. We are working on an implementation of BigInt in SpiderMonkey, which is currently feature-complete but requires more optimization and cleanup. We wrote corresponding test262 conformance tests, which are mostly complete and upstreamed. Next semester, we intend to finish that work while our coding experience student Caio Lima continues work on a BigInt implementation on JSC, which has already started to land. Google also decided to implement that feature in V8 based on the specification we wrote. The BigInt specification that we wrote reached Stage 3 of TC39 standardization. We plan to keep working on these two BigInt implementations, making specification tweaks as needed, with an aim towards reaching Stage 4 at TC39 for the BigInt proposal in 2018.

Igalia is also proposing class fields and private methods for JavaScript. Similarly to BigInt, we were able to move them to Stage 3 and we are working to move them to stage 4. Our plan is to write test262 tests for private methods and work on an implementation in a JavaScript engine early next year.

Igalia implemented and shipped async iterators and generators in Chrome 63, providing a convenient syntax for exposing and using asynchronous data streams, e.g., HTML streams. Additionally, we shipped a major performance optimization for Promises and async functions in V8.

We implemented and shipped two internationalization features in Chrome, Intl.PluralRules and Intl.NumberFormat.prototype.formatToParts. To push the specifications of internationalization features forwards, we have been editing various other internationalization-related specifications such as Intl.RelativeTimeFormat, Intl.Locale and Intl.ListFormat; we also convened and led the first of what will be a monthly meeting of internationalization experts to propose and refine further API details.

Finally, Igalia has also been formalizing WebAssembly’s JavaScript API specification, which reached the W3C first public working draft stage, and plans to go on to improve testing of that specification as the next step once further editorial issues are fixed.

Miscellaneous

Thanks to sponsorship from Mozilla we have continued our involvement in the Quantum Render project with the goal of using Servo’s WebRender in Firefox.

Support from Metrological has also given us the opportunity to implement more web standards from some Linux ports of WebKit (GTK and WPE, including:

  • WebRTC
  • WebM
  • WebVR
  • Web Crypto
  • Web Driver
  • WebP animations support
  • HTML interactive form validation
  • MSE

Conclusion

Thanks for reading and we look forward to more work on the web platform in 2018. Onwards and upwards!

January 11, 2018 11:00 PM

Gyuyoung Kim

Share my experience to build Chromium with ICECC

If you’re a Chromium developer, I guess that you’ve suffered from the long build time of Chromium like me. Recently, I’ve set up the icecc build environment for Chromium in the Igalia Korea office. Although there have been some instructions how to build Chromium with ICECC, I think that someone might feel they are a bit difficult. So, I’d like to share my experience how to setup the environment to build Chromium on the icecc with Clang on Linux in order to speed up the build of Chromium.

P.S. Recently Google announced that they will open Goma (Google internal distributed build system) for everyone. Let’s see goma can make us not to use icecc anymore 😉
https://groups.google.com/a/chromium.org/forum/?utm_medium=email&utm_source=footer#!msg/chromium-dev/q7hSGr_JNzg/p44IkGhDDgAJ

Prerequisites in your environment

  1. First, we should install the icecc on your all machines.
    sudo apt-get install icecc [icecc-monitor]
  2. To build Chromium using icecc on the Clang, we have to use some configurations when generating the gn files. To do it as easily as possible, I made a script. So you just download it and register it to $PATH. 
    1. Clone the script project
      $ git clone https://github.com/Gyuyoung/ChromiumBuild.git

      FYI, you can see the detailed configuration here – https://github.com/Gyuyoung/ChromiumBuild/blob/master/buildChromiumICECC.sh

    2. Add your chromium/src path to buildChromiumICECC.sh
      # Please set your path to ICECC_VERSION and CHROMIUM_SRC.
      export CHROMIUM_SRC=$HOME/chromium/src
      export ICECC_VERSION=$HOME/chromium/clang.tar.gz
    3. Register it to PATH environment in .bashrc
      export PATH=/path/to/ChromiumBuild:$PATH
    4. Create a clang toolchain from the patched Chromium version
      I added the process to “sync” argument of buildChromiumICECC.sh. So please just execute the below command before starting compiling. Whenever you run it, clang.tar.gz will be updated every time against the latest Chromium version.
$ buildChromiumICECC.sh sync

Build

  1. Run an icecc scheduler on a master machine,
    sudo service icecc-scheduler start
  2. Then, run an icecc daemon on each slave machine,
    sudo service iceccd start

    If you run icemon, you can monitor the build status on the icecc.

  3. Start building in chromium/src,
    $ buildChromiumICECC.sh Debug|Release

    When you start building Chromium, you’ll see that the icecc works on the monitor!

Build Time

In my case, I’ve  been using 1 laptop and 2 desktops for Chromium build. The HW information is as below,

  • Laptop (Dell XPS 15″ 9560)
    1. CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
    2. RAM: 16G
  • Desktop 1
    1. CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
    2. RAM: 16G
  • Desktop 2
    1. CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
    2. RAM: 8G

I’ve measured how long time I’ve spent to build Chromium with the script in below cases,

  • Laptop
    • Build time on Release with the jumbo build : About 84 min
    • Build time on Release without the jumbo build : About 203 min
  • Laptop + Desktop 1 + Desktop 2
    • Build time on Release with the jumbo build : About 35 min
    • Build time on Release without the jumbo build :  About 73 min

But these builds haven’t applied any object caching by ccache yet. If ccache works next time, the build time will be reduced. Besides, the build time is depended on the count of build nodes and performance. So this time can differ from your environment.

Troubleshooting (Ongoing)

  1. Undeclared identifier
    1. Error message
      /home/gyuyoung/chromium/src/third_party/llvm-build/Release+Asserts
       /lib/clang/6.0.0/include/avx512vnniintrin.h:38:20:
       error: use of undeclared identifier '__builtin_ia32_vpdpbusd512_mask'
       return (__m512i) __builtin_ia32_vpdpbusd512_mask ((__v16si) __S, ^
    2. Solution
      Please check if the path of ICECC_VERSION was set correctly.
  2. Loading error libtinfo.so.5
    1. Error message
      usr/bin/clang: error while loading shared libraries: libtinfo.so.5: 
      failed to map segment from shared object
    2. Solution
      Not find a correct fix yet. Just restart the build for now.

Reference

  1. WebKitGTK SpeedUpBuild: https://trac.webkit.org/wiki/WebKitGTK/SpeedUpBuild
  2. compiling-chromium-with-clang-and-icecc : http://mkollaro.github.io/2015/05/08/compiling-chromium-with-clang-and-icecc/
  3. Rune Lillesveen’s icecc-chromium project:
    https://github.com/lilles/icecc-chromium

by gyuyoung at January 11, 2018 12:37 AM

January 10, 2018

Manuel Rego

"display: contents" is coming

Yes, display: contents is enabled by default in Blink and WebKit and it will be probably shipped in Chrome 65 and Safari 11.1. These browsers will join Firefox that is shipping it since version 37, which makes Edge the only one missing the feature (you can vote for it!).

Regarding this I’d like to highlight that the work to support it in Chromium was started by Emilio Cobos during his Igalia Coding Experience that took place from fall 2016 to summer 2017.

You might (or not) remember a blog post from early 2016 where I was talking about the Igalia Coding Experience program and some ideas of tasks to be done as part of the Web Platform team. One of the them was display: contents which is finally happening.

What is display: contents?

This new value for the display property allows you to somehow remove an element from the box tree but still keep its contents. The proper definition from the spec:

The element itself does not generate any boxes, but its children and pseudo-elements still generate boxes and text runs as normal. For the purposes of box generation and layout, the element must be treated as if it had been replaced in the element tree by its contents (including both its source-document children and its pseudo-elements, such as ::before and ::after pseudo-elements, which are generated before/after the element’s children as normal).

A simple example will help to understand it properly:

<div style="display: contents;
            background: magenta; border: solid thick black; padding: 20px;
            color: cyan; font: 30px/1 Monospace;">
  <span style="background: black;">foobar</span>
</div>

display: contents makes that the div doesn’t generate any box, so its background, border and padding are not renderer. However the inherited properties like color and font have effect on the child (span element) as expected.

For this example, the final result would be something like:

<span style="background: black; color: cyan; font: 30px/1 Monospace;">foobar</span>

Unsupported

foobar

Actual

foobar

Supported

foobar

Unsupported vs actual (in your browser) vs supported output for the previous example

If you want more details Rachel Andrew has a nice blog post about this topic.

CSS Grid Layout & display: contents

As you could expect from a post from myself this is somehow related to CSS Grid Layout. 😎 display: contents can be used as a replacement of subgrids (which are not supported by any browser at this point) in some use cases. However subgrids are still needed for other scenarios.

The canonical example for Grid Layout auto-placement is a simple form like:

<style>
  form   { display:     grid;   }
  label  { grid-column: 1;      }
  input  { grid-column: 2;      }
  button { grid-column: span 2; }
</style>
<form>
  <label>Name</label><input />
  <label>Mail</label><input />
  <button>Send</button>
</form>

A simple form formatted with CSS Grid Layout A simple form formatted with CSS Grid Layout

However this is not the typical HTML of a form, as you usually want to use a list inside, so people using screen readers will know how many fields they have to fill in your form beforehand. So the HTML looks more like this:

<form>
  <ul>
    <li><label>Name</label><input /></li>
    <li><label>Mail</label><input /></li>
    <li><button>Send</button></li>
  </ul>
</form>

With display: contents you’ll be able to have the same layout than in the first case with a similar CSS:

ul     { display: grid;       }
li     { display: contents;   }
label  { grid-column: 1;      }
input  { grid-column: 2;      }
button { grid-column: span 2; }

So this is really nice, now when you convert your website to start using CSS Grid Layout, you would need less changes on your HTML and you won’t need to remove some HTML that is really useful, like the list in the previous example.

Chromium implementation

As I said in the introduction, Firefox shipped display: contents three years ago, however Chromium didn’t have any implementation for it. Igalia as CSS Grid Layout implementor was interested in having support for the feature as it’s a handy solution for several Grid Layout use cases.

The proposal for the Igalia Coding Experience was the implementation of display: contents on Blink as the main task. Emilio did an awesome job and managed to implement most of it, reporting issues to CSS Working Group and other browsers as needed, and writing tests for the web-platform-tests repository to ensure interoperability between the implementations.

Once the Coding Experience was over there were still a few missing things to be able to enable display: contents by default. Rune Lillesveen (Google and previously Opera) who was helping during the whole process with the reviews, finished the work and shipped it past week.

WebKit implementation

WebKit already had an initial support for display: contents that was only used internally by Shadow DOM implementation and not exposed to the end users, neither supported by the rest of the code.

We reactivated the work there too, he didn’t have time to finish the whole thing but later Antti Koivisto (Apple) completed the work and enabled it by default on trunk by November 2017.

Conclusions

Igalia is one of the top external contributors on the open web platform projects. This put us on a position that allows us to implement new features in the different open source projects, thanks to our community involvement and internal knowledge after several years of experience on the field. Regarding display: contents implementation, this feature probably wouldn’t be available today in Chromium and WebKit without Igalia’s support, in this particular case through a Coding Experience.

We’re really happy about the results of the Coding Experience and we’re looking forward to repeat the success story in the future.

Of course, all the merit goes to Emilio, who is an impressive engineer and did a great job during the Coding Experience. As part of this process he got committer privileges in both Chromium and WebKit projects. Kudos!

Last, but not least, thanks to Antti and Rune for finishing the work and making display: contents available to WebKit and Chromium users.

January 10, 2018 11:00 PM

December 27, 2017

Manuel Rego

Web Engines Hackfest 2017

One year more Igalia organized, hosted and sponsored a new edition of the Web Engines Hackfest. This is my usual post about the event focusing on the things I was working on during those days.

Organization

This year I wanted to talk about this because being part of the organization in an event with 60 people is not that simple and take some time. I’m one of the members of the organization which ended up meaning that I was really busy during the 3 days of the event trying to make the life of all the attendees as easy as possible.

Yeah, this year we were 60 people, the biggest number ever! Note that last year we were 40 people, so it’s clear the interest in the event is growing after each edition, which is really nice.

For the first time we had conference badges, with so many faces it was going to be really hard to put names to all of them. In addition we had pronouns stickers and different lanyard color for the people that don’t want to appear in the multimedia material published during and after the event. I believe all these things worked very well and we’ll be repeating for sure in future editions.

My Web Engines Hackfest 2017 conference badge My Web Engines Hackfest 2017 conference badge

The survey after the event showed an awesome positive feedback, so we’re really glad that you have enjoined the hackfest. I was specially happy seeing how many parallel discussions were taking place all around the office in small groups of people.

Talks

The main focus of the event is the different discussions that happen between people about many different topics. This together with the possibility to hack with some people from other browser engines and/or other companies make the hackfest an special event.

On top of that, we arrange a few talks that are usually quite interesting. The talks from this year can be found on a YouTube playlist (thanks Juan for helping with the video editing). This year the talks covered the following topics: Web Platform Tests, zlib optimizations, Chromium on Wayland, BigInt, WebRTC and WebVR.

Some pictures of the Web Engines Hackfest 2017 Some pictures of the Web Engines Hackfest 2017

CSS Grid Layout

During the hackfest I participated in several breakout sessions, like for example one about Web Platform Tests or another one about MathML. However as usual on the last years my main was related to CSS Grid Layout. In this case we took advantage to discuss several topics from which I’m going to highlight two:

Chromium bug on input elements which only happens on Mac

This is about Chromium bug #727076, where the input elements that are grid items get collapsed/shrunk when the user starts to enter some text. This was a tricky issue only reproducible on Mac platform that was hitting us for a while, so we needed to find a solution.

We had some long discussions about this and finally my colleague Javi found the root cause of the problem and finally fixed it, kudos!

This bug is a quite complex and tight to some Chromium specific implementation bits. The summary is that in Mac platform you can get requests about the intrinsic (preferred) width of the grid container at random times, which means that you’re not sure a full layout will be performed afterwards. Our code was not ready for that, as we were always expecting a full layout after asking for the intrinsic width.

Percentage tracks and gutters

This is a neverending topic that has been discussed in the CSS WG for a long time. There are 2 different things so let’s go one by one.

First, how percentage tracks are resolved when the width/height of a grid container is indefinite. So far, this was not symmetric but was working like in regular blocks:

  • Percentage columns work like percentage widths: This means that they are treated as auto during intrinsic size computation and later resolved during layout.
  • Percentage rows work like percentage heights: In this case they are treated as auto and never resolved.

However the CSS WG decided to change this and make both symmetric, so percentage rows will work like percentage columns. This hasn’t been implemented by any browser yet, and all of them have interoperability with the previous status. We’re not 100% sure about the complexity this could bring and before changing current behavior we’re going to gather some usage statistics to verify this won’t break a lot o content out there. We’d also love to get feedback from other implementors about this. More information about this topic can be found on CSS WG issue #1921.

Now let’s talk about the second issue, how percentage gaps work. The whole discussion can be checked on CSS WG issue #509 that I started back in TPAC 2016. For this case there are no interoperability between browsers as Firefox has its own solution of back-computing percentage gaps, the rest of browsers have the same behavior in line with the percentage tracks resolution, but again it is not symmetric:

  • Percentage column gaps contribute zero to intrinsic width computation and are resolved as percent during layout.
  • Percentage row gaps are treated always as zero.

The CSS WG resolved to modify this behavior to make them both symmetric, in this case choosing the row gaps behavior as reference. So browsers will need to change how column gaps work to avoid resolving the percentages during layout. We don’t know if we could detect this situation in Blink/WebKit without quite complex changes on the engine, and we’re waiting for feedback from other implementors on that regard.

So I won’t say any of those topics are definitely closed yet, and it won’t be unexpected if some other changes happen in the future when the implementations try to catch up with the spec changes.

Thanks

To close this blog post let’s say thanks to everyone that come to our office and participated in the event, the Web Engines Hackfest won’t be possible without such a great bunch of people that decided to spend a few days working together on improving the status of the Web.

Web Engines Hackfest 2017 sponsors: Collabora, Google, Igalia and Mozilla Web Engines Hackfest 2017 sponsors: Collabora, Google, Igalia and Mozilla

Of course we cannot forget about the sponsors either: Collabora, Google, Igalia and Mozilla. Thank you all very much!

And last, but not least, thanks to Igalia for organizing and hosting one year more this event. Looking forward to the new year and the 2018 edition!

December 27, 2017 11:00 PM

December 17, 2017

Michael Catanzaro

Epiphany Stable Flatpak Releases

The latest stable version of Epiphany is now available on Flathub. Download it here. You should be able to double click the flatpakref to install it in GNOME Software, if you use any modern GNOME operating system not named Ubuntu. But, in my experience, GNOME Software is extremely buggy, and it often as not does not work for me. If you have trouble, you can use the command line:

flatpak install --from https://flathub.org/repo/appstream/org.gnome.Epiphany.flatpakref

This has actually been available for quite a while now, but I’ve delayed announcing it because some things needed to be fixed to work well under Flatpak. It’s good now.

I’ve also added a download link to Epiphany’s webpage, so that you can actually, you know, download and install the software. That’s a useful thing to be able to do!

Benefits

The obvious benefit of Flatpak is that you get the latest stable version of Epiphany (currently 3.26.5) and WebKitGTK+ (currently 2.18.3), no matter which version is shipped in your operating system.

The other major benefit of Flatpak is that the browser is protected by Flatpak’s top-class bubblewrap sandbox. This is, of course, a UI process sandbox, which is different from the sandboxing model used in other browsers, where individual browser tabs are sandboxed from each other. In theory, the bubblewrap sandbox should be harder to escape than the sandboxes used in other major browsers, because the attack surface is much smaller: other browsers are vulnerable to attack whenever IPC messages are sent between the web process and the UI process. Such vulnerabilities are mitigated by a UI process sandbox. The disadvantage of this approach is that tabs are not sandboxed from each other, as they would be with a web process sandbox, so it’s easier for a compromised tab to do bad things to your other tabs. I’m not sure which approach is better, but clearly either way is much better than having no sandbox at all. (I still hope to have a web process sandbox working for use when WebKit is used outside of Flatpak, but that’s not close to being ready yet.)

Problems

Now, there are a couple of loose ends. We do not yet have desktop notifications working under Flatpak, and we also don’t block the screen from turning off when you’re watching fullscreen video, so you’ll have to wiggle your mouse every five minutes or so when you’re watching YouTube to keep the lights on. These should not be too hard to fix; I’ll try to get them both working soon. Also, drag and drop does not work. I’m not nearly brave enough to try fixing that, though, so you’ll just have to live without drag and drop if you use the Flatpak version.

Also, unfortunately the stable GNOME runtimes do not receive regular updates. So while you get the latest version of Epiphany, most everything else will be older. This is not good. I try to make sure that WebKit gets updated, so you’ll have all the latest security updates there, but everything else is generally stuck at older versions. For example, the 3.26 runtime uses, for the most part, whatever software versions were current at the time of the 3.26.1 release, and any updates newer than that are just not included. That’s a shame, but the GNOME release team does not maintain GNOME’s Flatpak runtimes: we have three other other redundant places to store the same build information (JHBuild, GNOME Continuous, BuildStream) that we need to take care of, and adding yet another is not going to fly. Hopefully this situation will change soon, though, since we should be able to use BuildStream to replace the current JSON manifest that’s used to generate the Flatpak runtimes and keep everything up to date automatically. In the meantime, this is a problem to be aware of.

by Michael Catanzaro at December 17, 2017 07:21 PM

December 13, 2017

Gyuyoung Kim

What is navigator.registerProtocolHandler?

Have you heard that navigator.registerProtocolHandler javascript API?

The API is to give you the power to add your customized scheme(a.k.a. protocol) to your website or web application. If you register your own custom scheme, it can help you to avoid collisions with other DNS, or help other people use it correctly, and be able to claim the moral high ground if they use it incorrectly. In this post, I would like to introduce the feature with some examples, and what I’ve contributed to the feature.

Introduction

The registerProtocolHandler had been started discussing since 2006. Finally, it was introduced in HTML5 for the first time.

This is a simple example of the use of navigator.registerProtocolHandler. Basically, the registerProtocolHandler takes three arguments,

  • scheme – The scheme that you want to handle. For example, mailto, tel, bitcoin, or irc.
  • url – A URL within your web application that can handle the specified scheme.
  • title – The title of the handler.
<script>
     navigator.registerProtocolHandler("web+search",
                                       "https://www.example/search/url=%s",
                                       "web search");
</script>
<a href="web+search:igalia">Igalia</a>

As this example, you can register a custom scheme with a URL that can be combined with the given URL(web+search:igalia)

However, you need to keep in mind some limitations when you use it.

  1. URL should include %s placeholder.
  2. You can register own custom handler only when the URL of the custom handler is same the website origin. If not, the browser will generate a security error.
  3. There are pre-defined schemes in the specification. Except for these schemes, you’re only able to use “web+foo” style scheme. But, the length of “web+foo” is at least five characters including ‘web+’ prefix. If not, the browser will generate a security error.
    • bitcoin, geo, im, irc, ircs, magnet, mailto, mms, news, nntp, openpgp4fpr, sip, sms, smsto, ssh, tel, urn, webcal, wtai, xmpp.
  4. Schemes should not contain any colons. For example, mailto: will generate an error. So, you have to just use mailto.

Status on major browsers

Firefox (3+)

When Firefox meets a webpage which includes navigator.registerProtocolHandler as below,

navigator.registerProtocolHandler("web+search",
                                  "https://www.example/search/url=%s",
                                  "web search");

It will show an alert bar just below the URL bar. Then, it will ask us if we want to add the URL to the Firefox handler list.

After allowing to add it to the supported handler list, you can see it in the Applications section of the settings page (about:preferences#general).

Now, the web+github custom scheme can be used in the site as below,

<a href="web+github:wiki">Wiki</a>
<a href="web+github:Pull requests">PR</a>

Chromium (13+)

Let’s take a look Chrome on the registerProtocolHandler. In Chrome, Chrome shows a new button inside the URL bar instead of the alert bar in Firefox.

If you press the button, a dialog will be shown in order to ask if you allow the website to register the custom scheme.

In the “Handlers” section of the settings (chrome://settings/handlers?search=protocol), you’re able to see that the custom handler was registered.

FYI, other browsers based on Chromium (i.e. Opera, Yandex, Whale, etc.) have handled it similar to Chrome unless each vendor has own specific behavior.

Safari (WebKit)

Though WebKit has supported this feature in WebCore and WebProcess of WebKit2, no browsers based on WebKit have supported this feature with UI. Although I had tried to implement it in UIProcess of WebKit2 through the webkit-dev mailing list so that browsers based on WebKit can support it, Unfortunately, some of Apple engineers had doubts about this feature though there were some agreements to support it in WebKit. So I failed to implement it in UIProcess of WebKit2.

My contributions in WebKit and Chromium

I’ve mainly contributed to apply new changes in the specification, to fix bugs and improve test cases. First, I’d like to introduce the bug that I modified in Chromium recently.

Let’s assume that URL has multiple placeholders(%s).

navigator.registerProtocolHandler("test",
                                  "http://example.com/%s/url=%s",
                                  "title");" 
<a href="test:duplicated_placeholders">this</a>

According to the specification, only first “%s” placeholder should be substituted, not substitute all placeholders. But, Chrome has substituted all placeholders with the given URL as below, even though Firefox has only substituted the first placeholder.

http://example.com/test%3Aduplicated_placeholders/url=test%3Aduplicated_placeholders

So I fixed the problem in [registerProtocolHandler] Only substitute the first “%s” placeholder. The latest Chromium substitutes only first placeholder like below,

http://example.com/test%3Aduplicated_placeholders/url=%s

This is a whole list what I’ve contributed both WebKit and Chromium so far.

  1.  WebKit
  2. Chromium

Summary

So far, we’ve looked at the navigator.registerProtcolHandler for your web application simply. The API can be useful if you want to make users use your web applications like a web-based email client or calendar.

by gyuyoung at December 13, 2017 08:59 AM

December 07, 2017

Víctor Jáquez

Enabling HuC for SKL/KBL in Debian/testing

Recently, our friend Florent complained that it was impossible to set a constant bitrate when encoding H.264 using low-power profile with gstreamer-vaapi .

Low-power (LP) profiles are VA-API entry points, available in Intel SkyLake-based procesor and succesors, which provide video encoding with low power consumption.

Later on, Ullysses and Sree, pointed out that CBR in LP is ony possible if HuC is enabled in the kernel.

HuC is a firmware, loaded by i915 kernel module, designed to offload some of the media functions from the CPU to GPU. One of these functions is bitrate control when encoding. HuC saves unnecessary CPU-GPU synchronization.

In order to load HuC, it is required first to load GuC, another Intel’s firmware designed to perform graphics workload scheduling on the various graphics parallel engines.

How we can install and configure these firmwares to enable CBR in low-power profile, among other things, in Debian/testing?

Check i915 parameters

First we shall confirm that our kernel and our i915 kernel module is capable to handle this functionality:

$ sudo modinfo i915 | egrep -i "guc|huc|dmc"
firmware:       i915/bxt_dmc_ver1_07.bin
firmware:       i915/skl_dmc_ver1_26.bin
firmware:       i915/kbl_dmc_ver1_01.bin
firmware:       i915/kbl_guc_ver9_14.bin
firmware:       i915/bxt_guc_ver8_7.bin
firmware:       i915/skl_guc_ver6_1.bin
firmware:       i915/kbl_huc_ver02_00_1810.bin
firmware:       i915/bxt_huc_ver01_07_1398.bin
firmware:       i915/skl_huc_ver01_07_1398.bin
parm:           enable_guc_loading:Enable GuC firmware loading (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           enable_guc_submission:Enable GuC submission (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           guc_log_level:GuC firmware logging level (-1:disabled (default), 0-3:enabled) (int)
parm:           guc_firmware_path:GuC firmware path to use instead of the default one (charp)
parm:           huc_firmware_path:HuC firmware path to use instead of the default one (charp)

Install firmware

$ sudo apt install firmware-misc-nonfree

UPDATE: In order to install this Debian package, you should have enabled the non-free apt repository in your sources list.

Verify the firmware are installed:

$ ls -1 /lib/firmware/i915/
bxt_dmc_ver1_07.bin
bxt_dmc_ver1.bin
bxt_guc_ver8_7.bin
bxt_huc_ver01_07_1398.bin
kbl_dmc_ver1_01.bin
kbl_dmc_ver1.bin
kbl_guc_ver9_14.bin
kbl_huc_ver02_00_1810.bin
skl_dmc_ver1_23.bin
skl_dmc_ver1_26.bin
skl_dmc_ver1.bin
skl_guc_ver1.bin
skl_guc_ver4.bin
skl_guc_ver6_1.bin
skl_guc_ver6.bin
skl_huc_ver01_07_1398.bin

Update modprobe configuration

Edit or create the configuration file /etc/modprobe.d/i915.con

$ sudo vim /etc/modprobe.d/i915.conf
....
$ cat /etc/modprobe.d/i915.conf
options i915 enable_guc_loading=1 enable_guc_submission=1

Reboot

$ sudo systemctl reboot 

Verification

Now it is possible to verify that the i915 module kernel loaded the firmware correctly by looking at the kenrel logs:

$ journalctl -b -o short-monotonic -k | egrep -i "i915|dmr|dmc|guc|huc"
[   10.303849] miau kernel: Setting dangerous option enable_guc_loading - tainting kernel
[   10.303852] miau kernel: Setting dangerous option enable_guc_submission - tainting kernel
[   10.336318] miau kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   10.338664] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_dmc_ver1_01.bin
[   10.339635] miau kernel: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_01.bin (v1.1)
[   10.361811] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_huc_ver02_00_1810.bin
[   10.362422] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_guc_ver9_14.bin
[   10.393117] miau kernel: [drm] GuC submission enabled (firmware i915/kbl_guc_ver9_14.bin [version 9.14])
[   10.410008] miau kernel: [drm] Initialized i915 1.6.0 20170619 for 0000:00:02.0 on minor 0
[   10.559614] miau kernel: snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[   11.937413] miau kernel: i915 0000:00:02.0: fb0: inteldrmfb frame buffer device

That means that HuC and GuC firmwares were loaded successfully.

Now we can check the status of the modules using sysfs

$ sudo cat /sys/kernel/debug/dri/0/i915_guc_load_status
GuC firmware status:
        path: i915/kbl_guc_ver9_14.bin
        fetch: SUCCESS
        load: SUCCESS
        version wanted: 9.14
        version found: 9.14
        header: offset is 0; size = 128
        uCode: offset is 128; size = 142272
        RSA: offset is 142400; size = 256

GuC status 0x800330ed:
        Bootrom status = 0x76
        uKernel status = 0x30
        MIA Core status = 0x3

Scratch registers:
         0:     0xf0000000
         1:     0x0
         2:     0x0
         3:     0x5f5e100
         4:     0x600
         5:     0xd5fd3
         6:     0x0
         7:     0x8
         8:     0x3
         9:     0x74240
        10:     0x0
        11:     0x0
        12:     0x0
        13:     0x0
        14:     0x0
        15:     0x0
$ sudo cat /sys/kernel/debug/dri/0/i915_huc_load_status
HuC firmware status:
        path: i915/kbl_huc_ver02_00_1810.bin
        fetch: SUCCESS
        load: SUCCESS
        version wanted: 2.0
        version found: 2.0
        header: offset is 0; size = 128
        uCode: offset is 128; size = 218304
        RSA: offset is 218432; size = 256

HuC status 0x00006080:

Test GStremer

$ gst-launch-1.0 videotestsrc num-buffers=1000 ! video/x-raw, format=NV12, width=1920, height=1080, framerate=\(fraction\)30/1 ! vaapih264enc bitrate=8000 keyframe-period=30 tune=low-power rate-control=cbr ! mp4mux ! filesink location=test.mp4
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Got context from element 'vaapiencodeh264-0': gst.vaapi.Display=context, gst.vaapi.Display=(GstVaapiDisplay)"\(GstVaapiDisplayGLX\)\ vaapidisplayglx0";
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:11.620036001
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
$ gst-discoverer-1.0 test.mp4 
Analyzing file:///home/vjaquez/gst/master/intel-vaapi-driver/test.mp4
Done discovering file:///home/vjaquez/test.mp4

Topology:
  container: Quicktime
    video: H.264 (High Profile)

Properties:
  Duration: 0:00:33.333333333
  Seekable: yes
  Live: no
  Tags: 
      video codec: H.264 / AVC
      bitrate: 8084005
      encoder: VA-API H264 encoder
      datetime: 2017-12-07T14:29:23Z
      container format: ISO MP4/M4A

Misison accomplished!

References

by vjaquez at December 07, 2017 02:35 PM

November 28, 2017

Diego Pino

Practical Snabb

In a previous article I introduced Snabb, a toolkit for developing network functions. In this article I want to dive into some practical examples on how to use Snabb for network function programming.

The elements of a network function

A network function is any program that does something with traffic data. There’s a certain set of operations that can be done onto any packet. Operations such as reading, modifying (headers or payload), creating (new packets), dropping or forwarding. Any network function is a combination of these primitives. For instance, a NAT function consists of packet header modification and forwarding.

Some of the built-in network functions featured in Snabb are:

  • lwAFTR (NAT, encap/decap): Implementation of the lwAFTR network function as specified in RFC7596. lwAFTR is a NAT between IPv6 and IPv4 address+port.
  • IPSEC (processing): encryption of packet payloads using AES instructions.
  • Snabbwall (filtering): a L7 firewall that relies on libnDPI for Deep-Packet Inspection. It also allows L3/L4 filtering using tcpdump alike expressions.

Real-world scenarios

The downside of by-passing the kernel and taking full control of a NIC is that the NIC cannot be used by any other program. That means the network function run by Snabb acts as a black-box. Some traffic comes in, gets transformed and it’s pushed out through the same NIC (or any other NIC controlled by the network function). The advantage is clear, outstanding performance.

For this reason Snabb is mostly used to develop network functions that run within the ISP’s network, where traffic load is expected to be high. An ISP can spare one or several NICs to run a network function alone since the results pay off (lower hardware costs, custom network function development, good performance, etc).

Snabb might seem like a less attractive tool in other scenarios. However, that doesn’t mean it cannot be used to program network functions that run in a personal computer or in a less demanding network. Snabb has interfaces to Tap, Raw socket and Unix socket programming, which allows to use Snabb as a program managed by the kernel. In fact, using some of these interfaces is the best way to start with Snabb if you don’t count with native hardware support.

Building Snabb

In this tutorial I’ll cover two examples to help me illustrate how to use Snabb. But before proceeding with the examples, we need to download and build Snabb.

$ git clone https://github.com/snabbco/snabb
$ cd snabb
$ make

Now we can run the snabb executable, which will print out a list of all the subprograms available:

$ cd src/
$ sudo ./snabb
Usage: ./snabb <program> ...

This snabb executable has the following programs built in:
  config
  example_replay
  example_spray
  firehose
  ...
  snsh
  wall

For detailed usage of any program run:
  snabb <program> --help

If you rename (or copy or symlink) this executable with one of
the names above then that program will be chosen automatically.

Hello world!

One of the simplest network functions to build is something that reads packets from a source, filters some of them and forwards the rest to an output. In this case I want to capture traffic from my browser (packets to HTTP or HTTPS). Here is how our hello world! program looks like:

#!./snabb snsh

local pcap = require("apps.pcap.pcap")
local PcapFilter = require("apps.packet_filter.pcap_filter").PcapFilter
local RawSocket = require("apps.socket.raw").RawSocket

local args = main.parameters
local iface = assert(args[1], "No listening interface")
local fileout = args[2] or "output.pcap"

local c = config.new()
config.app(c, "nic", RawSocket, iface)
config.app(c, "filter", PcapFilter, {filter = "tcp dst port 80 or dst port 443"})
config.app(c, "writer", pcap.PcapWriter, fileout)

config.link(c, "nic.tx -> filter.input")
config.link(c, "filter.output -> writer.input")

engine.configure(c)
engine.main({duration=30})

main.exit(0)

Now save the script and run it:

$ chmod +x http-filter.snabb 
$ sudo ./http-filter.snabb wlp3s0

While the script is running I open a few websites in my browser. Hopefully some packets will be captured onto output.pcap:

$ sudo tcpdump -tr output.pcap
IP sagan.50062 > 54.239.17.7.http: Flags [P.], seq 0:926, ack 1, win 229, length 926: HTTP: GET / HTTP/1.1
IP sagan.50062 > 54.239.17.7.http: Flags [.], ack 189, win 237, length 0
IP sagan.50062 > 54.239.17.7.http: Flags [.], ack 368, win 245, length 0
IP sagan.37346 > 93.184.220.29.http: Flags [S], seq 370675941, win 29200, options [mss 1460,sackOK,TS val 1370741706 ecr 0,nop,wscale 7], length 0
IP sagan.37346 > 93.184.220.29.http: Flags [.], ack 2640726891, win 229, options [nop,nop,TS val 1370741710 ecr 2287287426], length 0
IP sagan.37346 > 93.184.220.29.http: Flags [P.], seq 0:439, ack 1, win 229, options [nop,nop,TS val 1370741729 ecr 2287287426], length 439: HTTP: POST / HTTP/1.1
IP sagan.37346 > 93.184.220.29.http: Flags [.], ack 789, win 251, options [nop,nop,TS val 1370741733 ecr 2287287449], length 0

Some highlights in this script:

  • The shebang line (#./snabb snsh) refers to the Snabb’s shell (snsh), one of the many subprograms available in Snabb. It allows us to run Snabb scripts, that is Lua programs that have access to the Snabb environment (engine, apps, libraries, etc).
  • There’s a series of libraries that where not loaded: config, engine, main, etc. These libraries are part of the Snabb environment and are automatically loaded in every program.
  • The network function instantiates 3 apps: RawSocket, PcapFilter and PcapWriter, initializes them and pipes them together through links forming a graph. This graph is passed to the engine that executes it for 30 seconds.

Martian packets

Let’s continue with another example: a network function that manages a more complex set of rules to filter out traffic. Since there are more rules I will encapsulate the filtering logic into a custom app.

The data we’re going to filter are martian packets. According to Wikipedia, a martian packet is “an IP packet seen on the public internet that contains a source or destination address that is reserved for special-use by Internet Assigned Numbers Authority (IANA)”. For instance, packets with RFC1918 addresses or multicast addresses seen on the public internet are martian packets.

Unlike the previous example, I decided not to code this network function as an script, but as a program instead. The network function lives at src/program/martian. I’ve pushed the final code to a branch in my Snabb repository:

$ git remote add https://github.com/dpino/snabb.git dpino
$ git fetch dpino
$ git checkout -b dpino/martian-packets

To run the app:

$ sudo ./snabb martian program/martian/test/sample.pcap
link report:
   3 sent on filter.output -> writer.input (loss rate: 0%)
   5 sent on reader.output -> filter.input (loss rate: 0%)

The functions lets pass 3 out of 5 packets from sample.pcap.

$ sudo tcpdump -qns 0 -t -e -r program/martian/test/sample.pcap
reading from file program/martian/test/sample.pcap, link-type EN10MB (Ethernet)
00:00:01:00:00:00 > fe:ff:20:00:01:00, IPv4, length 62: 145.254.160.237.3372 > 65.208.228.223.80: tcp 0
fe:ff:20:00:01:00 > 00:00:01:00:00:00, IPv4, length 62: 65.208.228.223.80 > 145.254.160.237.3372: tcp 0
00:00:01:00:00:00 > fe:ff:20:00:01:00, IPv4, length 54: 145.254.160.237.3372 > 65.208.228.223.80: tcp 0
90:e2:ba:94:2a:bc > 02:cf:69:15:81:01, IPv4, length 242: 10.0.1.100 > 10.10.0.0: ICMP echo reply, id 1024, seq 0, length 208
90:e2:ba:94:2a:bc > 02:cf:69:15:81:01, IPv4, length 242: 10.0.1.100 > 10.10.0.0: ICMP echo reply, id 53, seq 0, length 208

The last two packets are martian packets. They cannot occur in a public network since their source or destination addresses are private addresses.

Some highlights about this network function:

  • Instead of a filtering app, I’ve coded my own filtering app, called MartianFiltering. This new app is the responsible for determining whether a packet is a martian packet or not. This operation has to be done in the push method of the app.
  • I’ve coded some utility functions to parse CIDR addresses (such as 100.64.0.0/10) and to check whether an IP address belongs to a network. Instead I could have used Snabb’s filtering library that allows to filter packets using tcpdump like expressions. For instance, “net 100.64.0.0 mask 255.192.0.0”.
  • The network function doesn’t use a network interface to read packets from, instead it reads packets out of a .pcap file.
  • Every Snabb program has a run function, that is the program’s entry point. A Snabb program or library can also add a selftest function, which is used to unit test the module ($ sudo ./snabb snsh -t program.martian). On the other hand, Snabb apps must implement a new method and optionally a push or pull method (or both, but at least one of them).

Here’s the app’s graph:

config.app(c, "reader", pcap.PcapReader, filein)
config.app(c, "filter", MartianFilter)
config.app(c, "writer", pcap.PcapWriter, fileout)

config.link(c, "reader.output -> filter.input")
config.link(c, "filter.output -> writer.input")

And here is how MartianPacket:pull method looks like:

function MartianFilter:push ()
   local input, output = assert(self.input.input), assert(self.output.output)

   while not link.empty(input) do
      local pkt = link.receive(input)
      local ip_hdr = ipv4:new_from_mem(pkt.data + IPV4_OFFSET, IPV4_SIZE)
      if self:is_martian(ip_hdr:src()) or self:is_martian(ip_hdr:dst()) then
         packet.free(pkt)
      else
         link.transmit(output, pkt)
      end
   end
end

As a rule of thumb, in every Snabb program there’s always one app only that feeds packets into the graph, in this case the PcapReader app. Such applications have to override the method pull. Apps that would like to manipulate packets will have a chance to do it in their push method.

Summary

Snabb is a very useful tool for coding network functions that need to run at very high speed. For this reason, it’s usually deployed as part of an ISP network infrastructure. However, the toolkit is versatile enough to allow us code any type of application that has to manipulate network traffic.

In this tutorial I introduced how to start using Snabb to code network functions. In a first example I showed how to download and build Snabb plus a very simple application that filters HTTP or HTTPS traffic from a network interface. On a second example, I introduced how to code a Snabb program and an app, MartianFiltering. This app exemplifies how to filter out packets based on a set of rules and forward or drop packets based on those conditions. Other more sophisticated network functions, such as firewalling, packet-rate limiting or DDoS prevention attack, behave in a similar manner.

That’s all for now. I left out another example that consisted of sending and receiving Multicast DNS packets. Likely I’ll cover it in a followup article.

November 28, 2017 06:00 AM

November 24, 2017

Víctor Jáquez

Intel MediaSDK on Debian (testing)

Everybody knows it: install Intel MediaSDK in GNU/Linux is a PITA. With CentOS or Yocto is less cumbersome, if you trust blindly on scripts ran as root.

I don’t like CentOS, I feel it like if I were living in the past. I like Debian (testing, of course) and I also wanted to understand a little more about MediaSDK. And this is what I did to have Intel MediaSDK working in Debian/testing.

First, I did a pristine installation of Debian testing with a netinst image in my NUC 6i5SYK, with a normal desktop user setup (Gnome3).

The madness comes later.

Intel’s identifies two types of MediaSDK installation: Gold and Generic. Gold is for CentOS, and Generic for the rest of distributions. Obviously, Generic means you’re on your own. For the purpose of this exercise I used as reference Generic Linux* Intel® Media Server Studio Installation.

Let’s begin by grabbing the Intel® Media Server Studio – Community Edition. You will need to register yourself and accept the user agreement, because this is proprietary software.

At the end, you should have a tarball named MediaServerStudioEssentials2017R3.tar.gz

Extract the files for Generic instalation

$ cd ~
$ tar xvf MediaServerStudioEssentials2017R3.tar.gz
$ cd MediaServerStudioEssentials2017R3
$ tar xvf SDK2017Production16.5.2.tar.gz
$ cd SDK2017Production16.5.2/Generic
$ mkdir tmp
$ tar -xvC tmp -f intel-linux-media_generic_16.5.2-64009_64bit.tar.gz

Kernel

Bad news: in order to get MediaSDK working you need to patch the mainlined kernel.

Worse news: the available patches are only for the version 4.4 the kernel.

Still, systemd works on 4.4, as far as I know, so it would not be a big problem.

Grab building dependencies
$ sudo apt install build-essential devscripts libncurses5-dev
$ sudo apt build-dep linux

Grab kernel source

I like to use the sources from the git repository, since it would be possible to do some rebasing and blaming in the future.

$ cd ~
$ git clone https://github.com/torvalds/linux.git
...
$ git pull -v --tags
$ git checkout -b 4.4 v4.4

Extract MediaSDK patches

$ cd ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/patches/kmd/4.4
$ tar xvf intel-kernel-patches.tar.bz2
$ cd intel-kernel-patches
$ PATCHDIR=$(pwd)

Patch the kernel

cd ~/linux
$ git am $PATCHDIR/*.patch

The patches should apply with some warnings but no fatal errors (don’t worry, be happy).

Still, there’s a problem with this old kernel: our recent compiler doesn’t build it as it is. Another patch is required:

$ wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8-rc2/0002-UBUNTU-SAUCE-no-up-disable-pie-when-gcc-has-it-enabl.patch
$ git am 0002-UBUNTU-SAUCE-no-up-disable-pie-when-gcc-has-it-enabl.patch

TODO: Shall I need to modify the EXTRAVERSION string in kernel’s Makefile?

Build and install the kernel

Notice that we are using our current kernel configuration. That is error prone. I guess that is why I had to select NVM manually.

$ cp /boot/config-4.12.0-1-amd64 ./.config
$ make olddefconfig
$ make nconfig # -- select NVM
$ scripts/config --disable DEBUG_INFO
$ make deb-pkg
...
$ sudo dpkg -i linux-image-4.4.0+_4.4.0+-2_amd64.deb linux-headers-4.4.0+_4.4.0+-2_amd64.deb linux-firmware-image-4.4.0+_4.4.0+-2_amd64.deb

Configure GRUB2 to boot Linux 4.4. by default

This part was absolutely tricky for me. It took me a long time to figure out how to specify the kernel ID in the grubenv.

$ sudo vi /etc/default/grub

And change the line GRUB_DEFAULT=saved. By default it is set to 0. And update GRUB.

$ sudo update-grub

Now look for the ID of the installed kernel image in /etc/grub/grub.cfg and use it:

$ sudo grub-set-default "gnulinux-4.4.0+-advanced-2c246bc6-65bb-48ea-9517-4081b016facc>gnulinux-4.4.0+-advanced-2c246bc6-65bb-48ea-9517-4081b016facc"

Please note it is twice and separated by a >. Don’t ask me why.

Copy MediaSDK firmware (and libraries too)

I like to use rsync rather normal cp because there are the options like --dry-run and --itemize-changes to verify what I am doing.

$ cd ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp
$ sudo rsync -av --itemize-changes ./lib /
$ sudo rsync -av --itemize-changes ./opt/intel/common /opt/intel
$ sudo rsync -av --itemize-changes ./opt/intel/mediasdk/{include,lib64,plugins} /opt/intel/mediasdk

All these directories contain blobs that do the MediaSDK magic. They are dlopened by hard coded paths by mfx_dispatch, which will be explain later.

In /lib lives the firmware (kernel blob).

In /opt/intel/common… I have no idea what are those shared objects.

In /opt/intel/mediasdk/include live header files for programming an compilation.

In /opt/intel/mediasdk/lib64 live the driver for the modified libva (iHD) and other libraries.

In /opt/intel/mediasdk/plugins live, well, plugins…

In conclusion, all these bytes are darkness and mystery.

Reboot

$ sudo systemctl reboot

The system should boot, automatically, in GNU/Linux 4.4.

Please, log with Xorg, not in Wayland, since it is not supported, as far as I know.

GStreamer

For compiling GStreamer I will use gst-uninstalled. Someone may say that I should use gst-build because is newer and faster, but I feel more comfortable doing the following kind of hacks with the old&good autotools.

Basically this is a reproduction of Quick-start guide to gst-uninstalled for GStreamer 1.x.

$ sudo apt build-dep gst-plugins-{base,good,bad}1.0
$ wget https://cgit.freedesktop.org/gstreamer/gstreamer/plain/scripts/create-uninstalled-setup.sh -q -O - | sh

I will modify the gst-uninstalled script, and keep it outside of the repository. For that I will use the systemd file-hierarchy spec for user’s executables.

$ cd ~/gst
$ mkdir -p ~/.local/bin
$ mv master/gstreamer/scripts/gst-uninstalled ~/.local/bin
$ ln -sf ~/.local/bin/gst-uninstalled ./gst-master

Do not forget to edit your ~/.profile to add ~/.local/bin in the environment variable PATH.

Patch ~/.local/bin/gst-uninstalled

The modifications are to handle the three dependencies libraries that are required by MediaSDK: libdrm, libva and mfx_dispatch.

diff --git a/scripts/gst-uninstalled b/scripts/gst-uninstalled
index 81f83b6c4..d79f19abd 100755
--- a/scripts/gst-uninstalled
+++ b/scripts/gst-uninstalled
@@ -122,7 +122,7 @@ GI_TYPELIB_PATH=$GST/gstreamer/gst:$GI_TYPELIB_PATH
 export LD_LIBRARY_PATH
 export DYLD_LIBRARY_PATH
 export GI_TYPELIB_PATH
-  
+
 export PKG_CONFIG_PATH="\
 $GST_PREFIX/lib/pkgconfig\
 :$GST/gstreamer/pkgconfig\
@@ -140,6 +140,9 @@ $GST_PREFIX/lib/pkgconfig\
 :$GST/orc\
 :$GST/farsight2\
 :$GST/libnice/nice\
+:$GST/drm\
+:$GST/libva/pkgconfig\
+:$GST/mfx_dispatch\
 ${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

 export GST_PLUGIN_PATH="\
@@ -227,6 +230,16 @@ export GST_VALIDATE_APPS_DIR=$GST_VALIDATE_APPS_DIR:$GST/gst-editing-services/te
 export GST_VALIDATE_PLUGIN_PATH=$GST_VALIDATE_PLUGIN_PATH:$GST/gst-devtools/validate/plugins/
 export GIO_EXTRA_MODULES=$GST/prefix/lib/gio/modules:$GIO_EXTRA_MODULES

+# MediaSDK
+export LIBVA_DRIVERS_PATH=/opt/intel/mediasdk/lib64
+export LIBVA_DRIVER_NAME=iHD
+export LD_LIBRARY_PATH="\
+/opt/intel/common/mdf/lib64\
+:$GST/drm/.libs\
+:$GST/drm/intel/.libs\
+:$GST/libva/va/.libs\
+:$LD_LIBRARY_PATH"
+

Now, initialize the gst-uninstalled environment:

$ cd ~/gst
$ ./gst-master
libdrm

Grab libdrm from its repository and switch to the branch with the supported version by MediaSDK.

$ cd ~/gst/master
$ git clone git://anongit.freedesktop.org/mesa/drm
$ cd drm
$ git checkout -b intel libdrm-2.4.67

Extract the distributed tarball in the cloned repository.

$ tar -xv --strip-components=1 -C . -f ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/libdrm/2.4.67-64009/libdrm-2.4.67.tar.bz2

Then we could check the big delta between upstream and the changes done by Intel for MediaSDK.

Let’s put it in a commit for later rebases.

$ git add -u
$ git add .
$ git commit -m "mediasdk changes"

Get build dependencies and compile.

$ sudo apt build-dep libdrm
$ ./configure
$ make -j8

Since the pkgconfig files (*.pc) of libdrm are generated to work installed, it is needed to modify them in order to work uninstalled.

$ prefix=${HOME}/gst/master/drm
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/.libs#" ${prefix}/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/*.pc

In order to C preprocessor could find the uninstalled libdrm header files we need to make them available in the expected path according to the pkgconfig file and right now they are not there. To fix that it is possible to create proper symbolic links.

$ cd ~/gst/master/drm
$ ln -s include/drm/ libdrm

libva

This modified a version of libva. These modifications messed a bit with the opensource version of libva, because Intel decided not to prefix the library, or some other strategy. In gstreamer-vaapi we had to blacklist VA-API version 0.99, because it is the version number, arbitrary set, of this modified version of libva for MediaSDK.

Again, grab the original libva from repo and change the branch aiming to the divert point. It was difficult to find the divert commit id since even the libva version number was changed. Doing some archeology I guessed the branch point was in version 1.0.15, but I’m not sure.

$ cd ~/gst/master
$ git clone https://github.com/01org/libva.git
$ cd libva
$ git checkout -b intel libva-1.0.15
$ tar -xv --strip-components=1 -C . -f ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/libva/1.67.0.pre1-64009/libva-1.67.0.pre1.tar.bz2
$ git add -u
$ git add .
$ git commit -m "mediasdk"

Before compile, verify that Makefile is going to link against the uninstalled libdrm. You can do that by grepping for LIBDRM in Makefile.

Get compilation dependencies and build.

$ sudo apt build-dep libva
$ ./configure
$ make -j8

Moidify the pkgconfig files for uninstalled

$ prefix=${HOME}/gst/master/libva
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/va/.libs#" ${prefix}/pkgconfig/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/pkgconfig/*.pc

Fix header path with symbolic links

$ cd ~/gst/master/libva/va
$ ln -sf drm/va_drm.h

mfx_dispatch

This static library which must be linked with MediaSDK applications. In our case, to the GStreamer plugin.

According to its documentation (included in the tarball):

the dispatcher is a layer that lies between application and the SDK implementations. Upon initialization, the dispatcher locates the appropiate platform-specific SDK implementation. If there is none, it will select the software SDK implementation. The dispatcher will redirect subsequent function calls to the same functions in the selected SDK implementation.

In the tarball there is the source of the mfx_dispatcher, but it only compiles with cmake. I have not worked with cmake on uninstalled setups, but we are lucky, there is a repository with autotools support:

$ cd ~/gst/master
$ git clone https://github.com/lu-zero/mfx_dispatch.git

And compile. After running ./configure it is better to confirm, grepping the generated Makefie, that the uninstalled versions of libdrm and libva are going to be used.

$ autoreconf  --install
$ ./configure
$ make -j8

Finally, just as the other libraries, it is required to fix the pkgconfig files:d

$ prefix=${HOME}/gst/master/mfx_dispatch
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/.libs#" ${prefix}/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/*.pc

Test it!

At last we are in a position where it is possible to test if everything works as expected. For it we are going to run the pre-compiled version of vainfo bundled in the tarball.

We will copy it to our uninstalled setup, thus we would running without specifing the path.

$ sync -av /home/vjaquez/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/usr/bin/vainfo ./prefix/bin/
$ vainfo
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.99 (libva 1.67.0.pre1)
vainfo: Driver version: 16.5.2.64009-ubit
vainfo: Supported profile and entrypoints
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: <unknown entrypoint>
      VAProfileH264ConstrainedBaseline: <unknown entrypoint>
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : <unknown entrypoint>
      VAProfileH264Main               : <unknown entrypoint>
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : <unknown entrypoint>
      VAProfileH264High               : <unknown entrypoint>
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : <unknown entrypoint>
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileVP9Profile0            : <unknown entrypoint>
      <unknown profile>               : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : <unknown entrypoint>

It works!

Compile GStreamer

I normally make a copy of ~/gst/master/gstreamer/script/git-update.sh in ~/.local/bin in order to modify it, like adding support for ccache, disabling gtkdoc and gobject-instrospections, increase the parallel tasks, etc. But that is out of the scope of this document.

$ cd ~/gst/master/
$ ./gstreamer/scripts/git-update.sh

Everything should be built without issues and, at the end, we could test if the gst-msdk elements are available:

$ gst-inspect-1.0 msdk
Plugin Details:
  Name                     msdk
  Description              Intel Media SDK encoders
  Filename                 /home/vjaquez/gst/master/gst-plugins-bad/sys/msdk/.libs/libgstmsdk.so
  Version                  1.13.0.1
  License                  BSD
  Source module            gst-plugins-bad
  Source release date      2017-11-23 16:39 (UTC)
  Binary package           GStreamer Bad Plug-ins git
  Origin URL               Unknown package origin

  msdkh264dec: Intel MSDK H264 decoder
  msdkh264enc: Intel MSDK H264 encoder
  msdkh265dec: Intel MSDK H265 decoder
  msdkh265enc: Intel MSDK H265 encoder
  msdkmjpegdec: Intel MSDK MJPEG decoder
  msdkmjpegenc: Intel MSDK MJPEG encoder
  msdkmpeg2enc: Intel MSDK MPEG2 encoder
  msdkvp8dec: Intel MSDK VP8 decoder
  msdkvp8enc: Intel MSDK VP8 encoder

  9 features:
  +-- 9 elements

Great!

Now, let’s run a simple pipeline. Please note that gst-msdk elements have rank zero, then they will not be autoplugged, it is necessary to craft the pipeline manually:

$ gst-launch-1.0 filesrc location= ~/test.264 ! h264parse ! msdkh264dec ! videoconvert ! xvimagesink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:02.502411331
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

\o/

by vjaquez at November 24, 2017 09:57 AM

November 22, 2017

Asumu Takikawa

Writing network drivers in a high-level language

Another day, another post about Snabb. Today, I’ll start to explain some work I’ve been doing at Igalia for Deutsche Telekom on driver development. All the DT driver work I’ll be talking about was joint work with Nicola Larosa.

When writing a networking program with Snabb, the program has to get some packets to crunch on from somewhere. Like everything else in Snabb, these packets come from an app.

These source apps might be a synthetic packet generator or, for anything running on real hardware, a network driver that talks to a NIC (a network card). That network driver part is the subject of this blog post.

These network drivers are written in LuaJIT like the rest of the system. This is maybe not that surprising if you know Snabb does kernel-bypass networking (like DPDK or other similar approaches), but it’s still quite remarkable! The vast majority of drivers that people are familiar with (graphics drivers, wifi drivers, or that obscure CueCat driver) are written in C.

For the Igalia project, we worked on extending the existing Snabb drivers for Intel NICs with some extra features. I’ll talk more about the new work that we did specifically in a second blog post. For this post, I’ll introduce how we can even write a driver in Lua.

(and to be clear, the existing Snabb drivers aren’t my work; they’re the work of some excellent Snabb hackers like Luke Gorrie and others)

For the nitty-gritty details about how Snabb bypasses the kernel to let a LuaJIT program operate on the NIC, I recommend reading Luke Gorrie’s neat blog post about it. In this post, I’ll talk about what happens once user-space has a hold on the network card.

Driver infrastructure

When a driver starts up, it of course needs to initialize the hardware. The datasheet for the Intel 82599 NIC for example dedicates an entire chapter to this. A lot of the initialization process consists of poking at the appropriate configuration registers on the device, waiting for things to power up and tell you they’re ready, and so on.

To actually poke at these registers, the driver uses memory-mapped I/O to the PCI device. The MMIO memory is, as far as LuaJIT and we are concerned, just a pointer to a big chunk of memory given to us by the hardware.pci library via the FFI.

It’s up to us to interpret this returned uint32_t pointer in a useful way. Specifically, we know certain offsets into this memory are mapped to registers as specified in the datasheet.

Since we’re living in a high-level language, we want to hide away the pointer arithmetic needed to access these registers. So Snabb has a little DSL in the lib.hardware.register library that takes text descriptions of registers like this:

1
2
3
4
-- Name        Address / layout        Read-write status/description
array_registers = [[
   RSSRK       0x5C80 +0x04*0..9       RW RSS Random Key
]]

and then lets you map them into a register table:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
my_regs = {}

pci      = require 'lib.hardware.pci'
register = require 'lib.hardware.register'

-- get a pointer to MMIO for some PCI address
base_ptr = pci.map_pci_memory_unlocked("02:00.01", 0)

-- defines (an array of) registers in my_regs
register.define_array(array_registers, my_regs, base_ptr)

After defining these registers, you can use the my_regs table to access the registers like any other Lua data. For example, the “RSS Random Key” array of registers can be initialized with some random data like this:

1
2
3
for i=0, 9 do
   my_regs.RSSRK[i](math.random(2^32))
end

This code looks like straightforward Lua code, but it’s poking at the NIC’s configuration registers. These registers are also often manipulated at the bit level, and there is some library support for that in the lib.bits.

For example, here are some prose instructions to initialize a certain part of the NIC from the datasheet:

1
2
3
4
Disable TC arbitrations while enabling the packet buffer free space monitor:

  — Tx Descriptor Plane Control and Status (RTTDCS), bits:
  TDPAC=0b, VMPAC=1b, TDRM=0b, BDPM=1b, BPBFSM=0b

This is basically instructing the implementor to set some bits and clear some bits in the RTTDCS register, which can be translated into some code that looks like this:

1
2
3
4
5
6
7
bits = require "lib.bits"

-- clear these bits
my_regs.RTTDCS:clr(bits { TDPAC=0, TDRM=4, BPBFSM=23 })

-- set these bits
my_regs.RTTDCS:set(bits { VMPAC=1, BDPM=22 })

The bits function just takes a table of bit offsets to set (the table key strings only matter for documentation’s sake) and turns it into a number to use for setting a register. It’s possible to write these bit manipulations with just arithmetic operations as well, but it’s usually more verbose that way.

Getting packets into the driver

To build the actual driver, we use the handy infrastructure above to do the device initialization and configuration and then drive a main loop that accepts packets from the NIC and feeds them into the Snabb program (we will just consider the receive path in this post). The core structure of this main loop is simpler than you might expect.

On a NIC like the Intel 82599, the packets are transferred into the host system’s memory via DMA into a receive descriptor ring. This is a circular buffer that keeps entries that contain a pointer to packet data and then some metadata.

A typical descriptor entry looks like this:

1
2
3
4
5
----------------------------------------------------------------------
|             Address (to memory allocated by driver)                |
----------------------------------------------------------------------
|    VLAN tag    | Errors | Status |    Checksum    |    Length      |
----------------------------------------------------------------------

The driver allocates some DMA-friendly memory (via memory.dma_alloc from core.memory) for the descriptor ring and then sets the NIC registers (RDBAL & RDBAH) so that the NIC knows the physical address of the ring. There are some neat tricks in core.memory which make the virtual to physical address translation easy.

In addition to this ring, a packet buffer is allocated for each entry in the ring and its (physical) address is stored in the first field of the entry (see diagram above).

The NIC will then DMA packets into the buffer as they are received and filtered by the hardware.

The descriptor ring has head/tail pointers (like a typical circular buffer) indicating where new packets arrive, and where the driver is reading off of. The driver mainly sets the tail pointer, indicating how far it has processed.

A Snabb app can introduce new packets into a program by implementing the pull method. A driver’s pull method might have the following shape (based on the intel_app driver):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
local link = require "core.link"

-- pull method definition on Driver class
function Driver:pull ()
   -- make sure the output link exists
   local l = self.output.tx
   if l == nil then return end

   -- sync the driver and HW on descriptor ring head/tail pointers
   self:sync_receive()

   -- pull a standard number of packets for a link
   for i = 1, engine.pull_npackets do
      -- check head/tail pointers to make sure packets are available
      if not self:can_receive() then break end

      -- take packet from descriptor ring, put into the Snabb output link
      link.transmit(l, self:receive())
   end

   -- allocate new packet buffers for all the descriptors we processed
   -- we can't reuse the buffers since they are now owned by the next app
   self:add_receive_buffers()
end

Of course, the real work is done in the helper methods like sync_receive and receive. I won’t go over the implementation of those, but they mostly deal with manipulating the head and tail pointers of the descriptor ring appropriately while doing any allocation that is necessary to keep the ring set up.

The takeaway I wanted to communicate from this skeleton is that using Lua makes for very clear and pleasant code that doesn’t get too bogged down in low-level details. That’s partly because Snabb’s core abstracts the complexity of using device registers and allocating DMA memory and things like that. That kind of abstraction is made a lot easier by LuaJIT and its FFI, so that the surface code looks like it’s just manipulating tables and making function calls.

In the next blog post, I’ll talk about some specific improvements we made to Snabb’s drivers to make it more ready for use with multi-process Snabb apps.

by Asumu Takikawa at November 22, 2017 10:45 AM

November 16, 2017

Alberto Garcia

“Improving the performance of the qcow2 format” at KVM Forum 2017

I was in Prague last month for the 2017 edition of the KVM Forum. There I gave a talk about some of the work that I’ve been doing this year to improve the qcow2 file format used by QEMU for storing disk images. The focus of my work is to make qcow2 faster and to reduce its memory requirements.

The video of the talk is now available and you can get the slides here.

The KVM Forum was co-located with the Open Source Summit and the Embedded Linux Conference Europe. Igalia was sponsoring both events one more year and I was also there together with some of my colleages. Juanjo Sánchez gave a talk about WPE, the WebKit port for embedded platforms that we released.

The video of his talk is also available.

by berto at November 16, 2017 10:16 AM

November 14, 2017

Michael Catanzaro

Igalia is Hiring

Igalia is hiring web browser developers. If you think you’re a good candidate for one of these jobs, you’ll want to fill out the online application accompanying one of the postings. We’d love to hear from you.

We’re especially interested in hiring a browser graphics developer. We realize that not many graphics experts also have experience in web browser development, so it’s OK if you haven’t worked with web browsers before. Low-level Linux graphics experience is the more important qualification for this role.

Igalia is not just a great place to work on cool technical projects like WebKit. It’s also a political and social project: an egalitarian, worker-owned cooperative where everyone has an equal vote in company decisions and receives equal pay. It’s been around for 16 years, so it’s also not a startup. You can work remotely from wherever you happen to be, or from our office in A Coruña, Spain. You won’t have a boss, but you will be expected to work well with your colleagues. It’s not the right fit for everyone, but there’s nowhere I’d rather be.

by Michael Catanzaro at November 14, 2017 03:04 PM

November 13, 2017

Diego Pino

Snabb explained in less than 10 minutes

Last month I attended the 20th edition of GORE (the Spain’s Network Operator Group meeting) where I delivered an introductory talk about Snabb (Spanish). Slides of the talk are also available online (English).

Taking advantage of this presentation I decided to write down an introductory article about Snabb. Something that could allow anyone to understand what’s Snabb easily.

What is Snabb?

Snabb is a toolkit for developing network functions in user-space. This definition refers to two keywords that are worth clarifying: network functions and user-space.

What’s a network function?

A network function is any program that does something on network traffic. What kind of things can be done on traffic? For instance: to read packets, modify their headers, create new packets, discard packets or forward them. Any network function is a combination of these basic operations. Here are some examples:

  • Filtering function (i.e. firewalling): read incoming packets, compare to table of rules and execute an action (forward or drop).
  • Traffic mapping (i.e. NAT): read incoming packets, modify headers and forward packet.
  • Encapsulation (i.e. VPN): read incoming packets, create a new packet, embed packet into new one and send it.

What’s user-space networking?

In the last few years, there has been a new trend for writing down network functions. This new trend consists of writing down the entire network function in user-space and do not leave any processing to the kernel.

Traditionally when writing down network functions we use the abstractions provided by the OS. The goal of any OS is to create abstractions over hardware that programs can use. This happens at many levels. For instance, when dealing with a hard-drive we don’t need to think of heads, cylinders and sectors but use a higher level abstraction: the filesystem. Networking is another layer abstracted by the OS. As programmers, we don’t deal with the NIC directly, instead we work with sockets and have access to APIs to deal with the TCP/IP stack.

However, the addition of higher level abstractions implicitly adds an overhead to the processing of our network function. The first disadvantage is that the function is split in two lands: user-space and kernel-space, and switching between both lands has a cost. And even if we move as much logic as possible to the kernel, there are inherit costs caused by the kernel’s networking layer.

The need of skipping the kernel and program network functions entirely in user-space was triggered by the continuous improvement of hardware. Today is possible to buy a 10G NIC for less than 200 euros. Soon the idea of building high-performance network appliances out of commodity hardware seemed feasible. Someone could pick an Intel Xeon, fill in the available PCI slots with 10G NICs and expect to have the equivalent of a very expensive Cisco or Juniper router for a fraction of its cost.

If we drive the hardware described above entirely with Linux we won’t be able to squeeze all its performance. Every packet hitting the NICs will have to go through the kernel’s networking layer and that has a cost caused by all the operations the kernel does onto packets before they’re available to manipulate by our program. To understand how much this is a problem, I need to introduce the concept of budget in a network function.

Know your network function budget

If we want to make the most of our hardware we generally would like to run our network function at line-rate speed, that means, the maximum speed of the NIC. How much time is that? In a 10G NIC, if we are receiving packets of an average size of 550-bytes at the maximum speed then we’re receiving a new packet every 440ns. That’s all the time we have available to run our network function on a packet.

Usually the way a NIC works is by placing incoming packets in a queue or buffer. This buffer is actually a ring-buffer, that means there are two cursors pointing to the buffer, the Rx cursor and the Tx cursor. When a new packet arrives, the packet is written at the Rx position and the cursor gets updated. When a packet leaves the buffer, the packet is read at the Tx position and the cursor gets updated after read. Our network function fetches packets from the Tx cursor. If it’s too slow processing a packet, the Rx cursor will eventually overpass the TX cursor. When that happens there’s a packet drop (a packet was overwritten before it was consumed).

Let’s go back to the 440ns number. How much time is that? Kernel hacker Jesper Brouer discusses this issue on his excellent talk “Network stack challenges at increasing speed” (I also recommend LWN’s summary of the talk: Improving Linux networking performance). Here’s the cost of some common operations: (cost varies depending on hardware but the order of magnitude is similar across different hardware settings)

  • Spinlock (Lock/Unlock): 16ns.
  • L2 cache hit: 4.3ns.
  • L3 cache hit: 7.9ns.
  • Cache miss: 32ns.

Taking into account those numbers 440ns doesn’t seem like a lot of time. System calls cost is also prohibitive, which should be minimized as much as possible.

Another important thing to notice is that the smaller the size of the packet, the smaller the budget. On a 10G NIC if we’re receiving packets of 64-byte on average, the smallest IPv4 packet size possible, that means we are receiving a new packet every 59ns. In this scenario two straight cache misses would eat the whole budget.

In conclusion, at these NIC speeds the additional overhead the kernel networking layer adds is non trivial, but significantly big enough to affect the execution of our network function. Since our budget gets reduced packets are more likely to be dropped at higher speeds or at smaller packet sizes, limiting the overall performance of our network card.

NOTE: This is a general picture of the issue of doing high-performance networking in the Linux kernel. The kernel hackers are not ignorant of these problems and have been working on ways to fix them in the last years. In that regard is worth mentioning the addition of XDP (eXpress Data Path), a kernel abstraction to execute network functions as closer to the hardware as possible. But that’s a subject for another post.

By-passing the kernel

User-space networking needs to by-pass the kernel’s networking layer so it can squeeze all the performance of the underlying hardware. There are several strategies to do that: user-space drivers, PF_RING, Netmap, etc (Cloudflare has an excellent article on kernel by-pass, commenting several of those strategies). Snabb chooses to handling the hardware directly, that means, to provide user-space drivers for the NICs it supports.

Snabb offers support mostly for Intel cards (although some Solarflare and Mellanox models are also supported). Implementing a driver, either in kernel-space or user-space, is not an easy task. It’s fundamental to have access to the vendor’s datasheet (generally a very large document) to know how to initialize the NIC, how to read packets from it, how to transfer data, etc. Intel provides such datasheet. In fact, Intel started a few years ago a project with a similar goal: DPDK. DPDK is an open-source project that implements drivers in user-space. Although originally it only provided drivers for Intel NICs, as the adoption of the project increased, other vendors have started to add drivers for their hardware.

Inside Snabb

Snabb was started in 2012 by free software hacker Luke Gorrie. Snabb provides direct access to the high-performance NICs but in addition to that it also provides an environment for building and running network functions.

Snabb is composed of several elements:

  • An Engine, that runs the network functions.
  • Libraries, that ease the development of network functions.
  • Apps, reusable software components that generally manipulate packets.
  • Programs, ready-to-use standalone network functions.

A network function in Snabb is a combination of apps connected together by links. The Snabb’s engine is in charge of feeding the app graph with packets and give a chance to every app to execute.

The engine processes the app graph in breadths. A breadth consists of two steps:

  • Inhale, puts packet into the graph.
  • Process, every app has a chance to receive packets and manipulate them.

During the inhale phase the method pull of an app gets executed. Apps that implement such method act as packet generators within the app graph. Packets are placed at the app’s links. Generally there’s only one app of think kind for every graph.

During the process phase the method push of an app gets executed. This gives a chance to every app to read packet from its incoming link, do something with them and likely place them out their outgoing link.

Hands-on example

Let’s build a network function that captures packets from a 10G NIC filters them using a packet-filtering expression and writes down the filtered packets to a pcap file. Such network function would look like this:

Snabb basic filter
Snabb basic filter

In Snabb code the equivalent graph above could be coded like this:

function run()
	local c = config.new()

	-- App definition.
	config.add(c, "nic", Intel82599, {
		pci = "0000:04:00.0"
	})
	config.add(c, "filter", PcapFilter, "src port 80")
	config.add(c, "pcap", Pcap.PcapWriter, "output.pcap")

	-- Link definition.
	config.link(c, "nic.tx        -> filter.input")
	config.link(c, "filter.output -> pcap.input")

	engine.configure(c)
	engine.main({duration=10})
end

A configuration is created describing the app graph of the network function. The configuration is passed down to Snabb which executes it for 10 seconds.

When Snabb’s engine runs this network function it executes the pull method of each app to feed packets into the graph links, inhale step. During the process step, the method push of each app is executed so apps have a chance to fetch packets from their incoming links, do something with them and likely place them into their outgoing links.

Here’s how the real implementation of PcapFilter.push method looks like:

function PcapFilter:push ()
	while not link.empty(self.input.rx) do
 		local p = link.receive(self.input.rx)
  		if self.accept_fn(p.data, p.length) then
     		link.transmit(self.output.tx, p)
     	else
     		packet.free(p)
		end
	end
end

A packet in Snabb is a really simple data structure. Basically, it consists of a length field and an array of bytes of fixed size.

struct packet {
	uint16_t length;
  	unsigned char data[10*1024];
};

A link is a ring-buffer of packets.

struct link {
	struct packet *packets[1024];
  	// the next element to be read
  	int read;
  	// the next element to be written
  	int write;
};

Every app has zero or many input links and zero or many output links. The number of links is created on runtime when the graph is defined. In the example above, the nic app has one outgoing link (nic.tx); the filter app has one incoming link (filter.rx) and one outgoing link (filter.tx); and the pcap app has one incoming link (pcap.input).

It might be surprising that packets and links are defined in C code, instead of Lua. Snabb runs on top of LuaJIT, an ultra-fast virtual machine for executing Lua programs. LuaJIT implements an FFI (Foreign Function Interface) to interact with C data types and call C runtime functions or external libraries directly from Lua code. In Snabb most data structures are defined in C which allows to compact data more efficiently.

local ether_header_t = ffi.typeof [[
/* All values in network byte order.  */
struct {
   uint8_t  dhost[6];
   uint8_t  shost[6];
   uint16_t type;
} __attribute__((packed))
]]

Calling a C-runtime function is really easy too.

ffi.cdef[[
  void syslog(int priority, const char\* format, ...);
]]
ffi.C.syslog(2, "error:...");

Wrapping up and last thoughts

In this article I’ve covered the basics of Snabb. I showed how to use Snabb to build network functions and explained why Snabb is a very convenient toolkit to write such type of programs. Snabb runs very fast since it by-passes the kernel, which makes it very useful for high-performance networking. In addition, Snabb is written in the high-level language Lua which enormously simplifies the entry barrier to start writing network functions.

However, there’s more things in Snabb I left out in this article. Snabb comes with a preset of programs ready to run. It also comes with a vast collection of apps and libraries which can help to speed up the construction of new network functions.

You don’t need to own a Intel10G card to start using Snabb today. Snabb can be used over TAP interfaces. It won’t be highly performant but it’s the best way to start with Snabb.

In a next article I plan to cover a more elaborated example of a network function using TAP interfaces.

November 13, 2017 06:00 AM

November 08, 2017

Eleni Maria Stea

Fosscomm 2017 [update: slides in English]

FOSSCOMM (Free and Open Source Software Communities Meeting) is a Greek conference aiming at free-software and open-source enthusiasts, developers, and communities. The event is solely organized and ran by volunteers (usually university students, communities, Linux User Groups) and is taking place in a different city every year. The attendance is free and everyone is welcome to make a presentation or a workshop related to free and open source projects.

I always try to attend this meeting when the dates and the place are convenient, as it is a great opportunity to meet old friends and hangout with geek people.

This year’s Fosscomm2017 (website is in Greek) was held at the Harokopio University of Athens, during the weekend: 4-5th November 2017.

I grabbed the opportunity to go and give a talk about Mesa 3D, a project where the Igalia’s graphics team makes several contributions and releases the last 5 years.

My presentation was titled: “Hacking on Mesa 3D” and it was a short introductory talk about the OpenGL implementation, the OpenGL extension system, the GLSL compiler, the drivers and some of the development and debugging processes we use.

Original slides in Greek:

index

 

English translation:

index-1

 

Video (in Greek1 because that was the conference language):

 

To my surprise, my talk wasn’t the only one mentioning Igalia. 🙂 Dimitris Glynos, one of the Co-Founders of Census (and FOSSCOMM sponsor) gave the talk “FOSS is all we got: building a competitive IT skill set in Greece today” and mentioned us among other examples of companies that work successfully on open source projects.

Among the other talks I’ve attended, I particularly liked the “Linux Metrics” workshop by Effie Mouzeli and Giorgos Kargiotakis (during FOSSCOMM day #1), that was aiming to teach users and developers how to use metrics tools to detect performance issues. It was so successful that they’ve been asked to re-run it the following day.

Most of the other presentations and workshops, as well as the schedule, can be found here (for those who can understand greek):  https://www.fosscomm.hua.gr/. The FOSSCOMM organizers will soon edit the videos and upload them on a channel on YouTube.

this is my post-FOSSCOMM cup collection

I’d like to thank the people who attended the presentation, the FOSSCOMM2017 organization team who did such a great job on preparing and hosting the event and of course Igalia that is giving me the opportunity to work on cool graphics stuff.

See you at FOSSCOMM 2018! 😉

[1]: Subtitles coming soon.

by hikiko at November 08, 2017 07:30 AM

November 01, 2017

Martin Robinson

Small Things

Even between two highly-developed western countries, there are a lot of cultural differences. After moving, I experienced the sort of culture shock that the Internet warns you about. Thankfully, the passage of time means that grumbling noon-time stomachs gradually give way to curiously peckish 2:30pm lunches. Instead of sitting in dread, willing your useless, polite American hands to flag a waiter, you manage to order a tiny beer using only your eyeballs. Big differences fade into the background so much that maybe you start to keep a list of them, just to avoid the feeling that you are forgetting some original piece of yourself.

This new familiarity begins to expose the incredibly long tail of subtle differences that have been hanging out quietly in the background. Unnamed onomatopoeias have a completely different sound. People are making gestures with their hands while they speak, and those gestures actually mean something very clear. Your brain calmly catalogs these curiosities as they become too trivial to comment on.

If you are like me, you stare at the street, the stoplights, and the sidewalks. Suddenly, the endless, small scale war being waged in the space between the double (and triple) parkers and the buildings becomes apparent. You see the rows of bollards silently holding back a tide of cars and delivery vans. Unspoken rules from your home country no longer apply here, after having taken them for granted for years

I don’t want to blab on too long about mundane things, so I will just point to the example of curb cuts. In the US we use curb cuts to connect the roadway to private garages, driveways, and parking lots. Thousands of dollars are spent lovingly crafting each of these small cement altars to the passage of automobiles. The sidewalk itself kneels to the pavement, so that cars can smoothly and comfortably climb into pedestrian space. This is all, of course, at the expense of people walking and in wheelchairs who often have to travel across an uneven sidewalk and wait for cars as they appear and (hopefully) leave. A curb cut is a signal that at any moment a car may enter the sidewalk and that it has a right to be there.

Spanish cities sometimes use little ramps instead. They look cheap and their metal surface is usually painted a bright and gaudy yellow. Their angle is decidedly steeper compared to modern curb cuts in the US, which means it is not easy to drive onto the sidewalk quickly or comfortably. Additionally, they look like they can also be added and removed cheaply and without modifying the sidewalk at all. Even more bizarrely, they are installed on the sacred roadway itself, so the sidewalk remains level for the all people who might happen to be using it. Sometimes, they even extend so far out into the roadway that parallel parking would be difficult or impossible, which prevents the space from becoming a private parking spot.

These little ramps are a small detail of the city, but for me they send a clear message. They announce to cars that they are entering a segregated pedestrian space. This invitation is conditional on moving slowly and carefully and can be revoked at any time with a hydraulic wrench. Maybe they are common simply because they are a cheap leftover from a period when this was a poorer country. I have a feeling that as time goes on, they will slowly be replaced by compact curb cuts descending from nice, new sidewalks. Despite all this, I feel a little bit of sadness, because their economy and their imperfection made the sidewalk just that much nicer.

November 01, 2017 04:00 AM

October 30, 2017

Víctor Jáquez

GStreamer Conference 2017

This year, the GStreamer Conference happened in Prague, along with the traditional autumn Hackfest.

Prague is a beautiful city, though this year I couldn’t visit it as much as I wanted, since the Embedded Linux Conference Europe and the Open Source Summit also took place there, and Igalia, being a Linux Foundation sponsor, had a booth in the venue, where I talked about our work with WebKit, Snabb, and obviously, GStreamer.

But, let’s back to the GStreamer Hackfest and Conference.

One of the features that I like the most of the GStreamer project is its community, the people involved in it, by writing code, sharing their work with many others. They might appear a bit tough at beginning (or at least that looked to me) but in real they are all kind and talented persons. And I’m proud of consider myself part of this community. Nonetheless it has a diversity problem, as many other Open Source communities.

GStreamer Conference 2017

During the Hackfest, Hyunjun and I, met with Sree and talked about the plans for GStreamer-VAAPI, the new features in VA-API and libva and how we could map them to the GStreamer’s design. Also we talked about the future developments in the msdk elements, merged one year ago in gst-plugins-bad. Also, I talked a bit with Nicolas Dufresne regarding kmssink and DMABuf.

In the Conference, which happened in the same venue as the hackfest, I talked wit the authors of gstreamer-media-SDK. They are really energetic.

I delivered my usual talk about GStreamer-VAAPI. You can find the slides, as a web presentation, here. Also, as every year, our friends of Ubicast, recorded the talks, and made them available for streaming almost instantaneously:

My colleague Enrique talked in the Conference about the Media Source Extensions (MSE) on WebKit, and Hyunjun shared his experience with VA-API on Rust.

Also, in the conference venue, we showed a couple demos. One of them was a MinnowBoard running WPE, rendering videos from YouTube using gstreamer-vaapi to decode video.

by vjaquez at October 30, 2017 04:24 PM

October 22, 2017

Frédéric Wang

Recent Browser Events

TL;DR

At Igalia, we attend many browser events. This is a quick summary of some recents conferences I participated to… or that gave me the opportunity to meet Igalians in Paris 😉.

Week 31: Paris - CSS WG F2F - W3C

My teammate Sergio attended the CSS WG F2F meeting as an observer. On Tuesday morning, I also made an appearance (but it was so brief that ceux que j’ai rencontrés ne m’ont peut-être pas vu). Together with other browser vendors and WG members, Sergio gave an interview regarding the successful story of CSS Grid Layout. By the way, given our implementation work in WebKit and Blink, Igalia finally decided to join the CSS Working Group 😊. Of course, during that week I had dinner with Sergio and it was nice to chat with my colleague in a French restaurant of Montmartre.

Week 38: Tokyo - BlinkOn 8 - Google

Jacobo, Gyuyoung and I attended BlinkOn 8. I had nice discussions and listened to interesting talks about a wide range of topics (Layout NG, Accessibility, CSS, Fonts, Web Predictability & Standards, etc). It was a pleasure to finally meet in persons some developers I had been in touch with during my projects on Ozone/Wayland and WebKit/iOS. For the lightning talks, we presented our activities on embedded linux platforms and the Web Platform. Incidentally, it was great to see Igalia’s work mentioned during the Next Generation Rendering Engine session. Obviously, I had the opportunity to visit places and taste Japanese food in Asakusa, Ueno and Roppongi 😋.

Week 40: A Coruña - Web Engines Hackfest - Igalia

I attended one of my favorite events, that gathers the whole browser community during three days for technical presentations, breakout sessions, hacking and galician food. This year, we had many sponsors and attendees. It is good to see that the event is becoming more and more popular! It was long overdue, but I was finally able to make Brotli and WOFF2 installable as system libraries on Linux and usable by WebKitGTK+ 😊. I opened similar bugs in Gecko and the same could be done in Chromium. Among the things I enjoyed, I met Jonathan Kew in person and heard more about Antonio and Maksim’s progress on Ozone/Wayland. As usual, it was nice to share time with colleagues, attend the assembly meeting, play football matches, have meals, visit Asturias… and tell one’s story 😉.

Week 41: San Jose - WebKit Contributors Meeting - Apple

In the past months, I have mostly been working on WebKit at Igalia and I would have been happy to see my fellow WebKit developers. However, given the events in Japan and Spain, I was not willing to make another trip to the USA just after. Hence I had to miss the WebKit Contributors Meeting again this year 😞. Fortunately, my colleagues Alex, Michael and Žan were present. Igalia is an important contributor to WebKit and we will continue to send people and propose some talks next year.

Week 42: Paris - Monthly Speaker Series - Mozilla

This Wednesday, I attended a conference on Privacy as a Competitive Advantage in Mozilla’s office. It was nice to hear about the increasing interest on privacy and to see the regulation made by the European Union in that direction. My colleague Philippe was visiting the office to work with some Mozilla developers on one of our project, so I was also able to meet him in the conference room. Actually, Mozilla employees were kind enough to let me stay at the office after the conference… Hence I was able to work on Apple’s Web Engine on a project sponsored by Google at the Mozilla office… probably something you can only do at Igalia 😉. Last but not least, Guillaume was also in holidays in Paris this week, so I let you imagine what happens when three French guys meet (hint: it involves food 😋).

October 22, 2017 10:00 PM

October 21, 2017

Adrián Pérez

Web Engines Hackfest, 2017 Edition

At the beginning of October I had the wonderful chance of attending the Web Engines Hackfest in A Coruña, hosted by Igalia. This year we were over 50 participants, which was great to associate even more faces to IRC nick names, but more importantly allows hackers working at all the levels of the Web stack to share a common space for a few days, making it possible to discuss complex topics and figure out the future of the projects which allow humanity to see pictures of cute kittens — among many other things.

Mandatory fluff (CC-BY-NC).

During the hackfest I worked mostly on three things:

  • Preparing the code of the WPE WebKit port to start making preview releases.

  • A patch set which adds WPE packages to Buildroot.

  • Enabling support for the CSS generic system font family.

Fun trivia: Most of the WebKit contributors work from the United States, so the week of the Web Engines hackfest is probably the only single moment during the whole year that there is a sizeable peak of activity in European day times.

Watching repository activity during the hackfest.

Towards WPE Releases

At Igalia we are making an important investment in the WPE WebKit port, which is specially targeted towards embedded devices. An important milestone for the project was reached last May when the code was moved to main WebKit repository, and has been receiving the usual stream of improvements and bug fixes. We are now approaching the moment where we feel that is is ready to start making releases, which is another major milestone.

Our plan for the WPE is to synchronize with WebKitGTK+, and produce releases for both in parallel. This is important because both ports share a good amount of their code and base dependencies (GStreamer, GLib, libsoup) and our efforts to stabilize the GTK+ port before each release will benefit the WPE one as well, and vice versa. In the coming weeks we will be publishing the first official tarball starting off the WebKitGTK+ 2.18.x stable branch.

Wild WEBKIT PORT appeared!

Syncing the releases for both ports means that:

  • Both stable and unstable releases are done in sync with the GNOME release schedule. Unstable releases start at version X.Y.1, with Y being an odd number.

  • About one month before the release dates, we create a new release branch and from there on we work on stabilizing the code. At least one testing release with with version X.Y.90 will be made. This is also what GNOME does, and we will mimic this to avoid confusion for downstream packagers.

  • The stable release will have version X.Y+1.0. Further maintenance releases happen from the release branch as needed. At the same time, a new cycle of unstable releases is started based on the code from the tip of the repository.

Believe it or not, preparing a codebase for its first releases involves quite a lot of work, and this is what took most of my coding time during the Web Engines Hackfest and also the following weeks: from small fixes for build failures all the way to making sure that public API headers (only the correct ones!) are installed and usable, that applications can be properly linked, and that release tarballs can actually be created. Exhausting? Well, do not forget that we need to set up a web server to host the tarballs, a small website, and the documentation. The latter has to be generated (there is still pending work in this regard), and the whole process of making a release scripted.

Still with me? Great. Now for a plot twist: we won't be making proper releases just yet.

APIs, ABIs, and Releases

There is one topic which I did not touch yet: API/ABI stability. Having done a release implies that the public API and ABI which are part of it are stable, and they are not subject to change.

Right after upstreaming WPE we switched over from the cross-port WebKit2 C API and added a new, GLib-based API to WPE. It is remarkably similar (if not the same in many cases) to the API exposed by WebKitGTK+, and this makes us confident that the new API is higher-level, more ergonomic, and better overall. At the same time, we would like third party developers to give it a try (which is easier having releases) while retaining the possibility of getting feedback and improving the WPE GLib API before setting it on stone (which is not possible after a release).

It is for this reason that at least during the first WPE release cycle we will make preview releases, meaning that there might be API and ABI changes from one release to the next. As usual we will not be making breaking changes in between releases of the same stable series, i.e. code written for 2.18.0 will continue to build unchanged with any subsequent 2.18.X release.

At any rate, we do not expect the API to receive big changes because —as explained above— it mimics the one for WebKitGTK+, which has already proven itself both powerful enough for complex applications and convenient to use for the simpler ones. Due to this, I encourage developers to try out WPE as soon as we have the first preview release fresh out of the oven.

Packaging for Buildroot

At Igalia we routinely work with embedded devices, and often we make use of Buildroot for cross-compilation. Having actual releases of WPE will allow us to contribute a set of build definitions for the WPE WebKit port and its dependencies — something that I have already started working on.

Lately I have been taking care of keeping the WebKitGTK+ packaging for Buildroot up-to-date and it has been delightful to work with such a welcoming community. I am looking forward to having WPE supported there, and to keep maintaining the build definitions for both. This will allow making use of WPE with relative ease, while ensuring that Buildroot users will pick our updates promptly.

Generic System Font

Some applications like GNOME Web Epiphany use a WebKitWebView to display widget-like controls which try to follow the design of the rest of the desktop. Unfortunately for GNOME applications this means Cantarell gets hardcoded in the style sheet —it is the default font after all— and this results in mismatched fonts when the user has chosen a different font for the interface (e.g. in Tweaks). You can see this in the following screen capture of Epiphany:

Web using hardcoded Cantarell and (on hover) -webkit-system-font.

Here I have configured the beautiful Inter UI font as the default for the desktop user interface. Now, if you roll your mouse over the image, you will see how much better it looks to use a consistent font. This change also affects the list of plugins and applications, error messages, and in general all the about: pages.

If you are running GNOME 3.26, this is already fixed using font: menu (part of the CSS spec since ye olde CSS 2.1) — but we can do better: Safari has had support since 2015, for a generic “system” font family, similar to sans-serif or cursive:

/* Using the new generic font family (nice!). */
body {
    font-family: -webkit-system-font;
}

/* Using CSS 2.1 font shorthands (not so nice). */
body {
    font: menu;       /* Pick ALL font attributes... */
    font-size: 12pt;  /* ...then reset some of them. */
    font-weight: 400;
}

During the hackfest I implemented the needed moving parts in WebKitGTK+ by querying the GtkSettings::gtk-font-name property. This can be used in HTML content shown in Epiphany as part of the UI, and to make the Web Inspector use the system font as well.

Web Inspector using Cantarell, the default GNOME 3 font (full size).

I am convinced that users do notice and appreciate attention to detail, even if they do unconsciously, and therefore it is worthwhile to work on this kind of improvements. Plus, as a design enthusiast with a slight case of typographic OCD, I cannot stop myself from noticing inconsistent usage of fonts and my mind is now at ease knowing that opening the Web Inspector won't be such a jarring experience anymore.

Outro

But there's one more thing: On occasion we developers have to debug situations in which a process is seemingly stuck. One useful technique involves running the offending process under the control of a debugger (or, in an embedded device, under gdbserver and controlled remotely), interrupting its execution at intervals, and printing stack traces to try and figure out what is going on. Unfortunately, in some circumstances running a debugger can be difficult or impractical. Wouldn't it be grand if it was possible to interrupt the process without needing a debugger and request a stack trace? Enter “Out-Of-Band Stack Traces” (proof of concept):

  1. The process installs a signal handler using sigaction(7), with the SA_SIGINFO flag set.

  2. On reception of the signal, the kernel interrupts the process (even if it's in an infinite loop), and invokes the signal handler passing an extra pointer to an ucontext_t value, which contains a snapshot of the execution status of the thread which was in the CPU before the signal handler was invoked. This is true for many platform including Linux and most BSDs.

  3. The signal handler code can get obtain the instruction and stack pointers from the ucontext_t value, and walk the stack to produce a stack trace of the code that was being executed. Jackpot! This is of course architecture dependent but not difficult to get right (and well tested) for the most common ones like x86 and ARM.

The nice thing about this approach is that the code that obtains the stack trace is built into the program (no rebuilds needed), and it does not even require to relaunch the process in a debugger — which can be crucial for analyzing situations which are hard to reproduce, or which do not happen when running inside a debugger. I am looking forward to have some time to integrate this properly into WebKitGTK+ and specially WPE, because it will be most useful in embedded devices.

by aperez (adrian@perezdecastro.org) at October 21, 2017 01:30 AM

October 17, 2017

Enrique Ocaña

Attending the GStreamer Conference 2017

This weekend I’ll be in Node5 (Prague) presenting our Media Source Extensions platform implementation work in WebKit using GStreamer.

The Media Source Extensions HTML5 specification allows JavaScript to generate media streams for playback and lets the web page have more control on complex use cases such as adaptive streaming.

My plan for the talk is to start with a brief introduction about the motivation and basic usage of MSE. Next I’ll show a design overview of the WebKit implementation of the spec. Then we’ll go through the iterative evolution of the GStreamer platform-specific parts, as well as its implementation quirks and challenges faced during the development. The talk continues with a demo, some clues about the future work and a final round of questions.

Our recent MSE work has been on desktop WebKitGTK+ (the WebKit version powering the Epiphany, aka: GNOME Web), but we also have MSE working on WPE and optimized for a Raspberry Pi 2. We will be showing it in the Igalia booth, in case you want to see it working live.

I’ll be also attending the GStreamer Hackfest the days before. There I plan to work on webm support in MSE, focusing on any issue in the Matroska demuxer or the vp9/opus/vorbis decoders breaking our use cases.

See you there!

UPDATE 2017-10-22:

The talk slides are available at https://eocanha.org/talks/gstconf2017/gstconf-2017-mse.pdf and the video is available at https://gstconf.ubicast.tv/videos/media-source-extension-on-webkit (the rest of the talks here).

by eocanha at October 17, 2017 10:48 AM

October 15, 2017

Javier Muñoz

Attending LibreCon 2017

This week I will be attending LibreCon 2017, one of the largest international events on open source technologies. It will be held on 19 and 20 October in Santiago de Compostela (Spain).

This year’s theme is the application of open source technologies in the industrial and primary sector, as well as the new opportunities that these technologies offer in areas like Cloud Computing, Big Data, Internet of Things (IoT) and the Sharing Economy.

I will be delivering one talk, under the sponsorship of my company Igalia, on Ceph Object Storage and its S3 API. I will introduce the Ceph architecture and the basics to understand how make cloud storage products and services based on Ceph/RGW. I will also comment on the most useful and supported S3 API and tooling working with Ceph.

See you there!

by Javier at October 15, 2017 10:00 PM

October 02, 2017

Iago Toral

Working with lights and shadows – Part III: rendering the shadows

In the previous post in this series I introduced how to render the shadow map image, which is simply the depth information for the scene from the view point of the light. In this post I will cover how to use the shadow map to render shadows.

The general idea is that for each fragment we produce we compute the light space position of the fragment. In this space, the Z component tells us the depth of the fragment from the perspective of the light source. The next step requires to compare this value with the shadow map value for that same X,Y position. If the fragment’s light space Z is larger than the value we read from the shadow map, then it means that this fragment is behind an object that is closer to the light and therefore we can say that it is in the shadows, otherwise we know it receives direct light.

Changes in the shader code

Let’s have a look at the vertex shader changes required for this:

void main()
{
   vec4 pos = vec4(in_position.x, in_position.y, in_position.z, 1.0);
   out_world_pos = Model * pos;
   gl_Position = Projection * View * out_world_pos;

   [...]

   out_light_space_pos = LightViewProjection * out_world_pos;
} 

The vertex shader code above only shows the code relevant to the shadow mapping technique. Model is the model matrix with the spatial transforms for the vertex we are rendering, View and Projection represent the camera’s view and projection matrices and the LightViewProjection represents the product of the light’s view and projection matrices. The variables prefixed with ‘out’ represent vertex shader outputs to the fragment shader.

The code generates the world space position of the vertex (world_pos) and clip space position (gl_Position) as usual, but then also computes the light space position for the vertex (out_light_space_pos) by applying the View and Projection transforms of the light to the world position of the vertex, which gives us the position of the vertex in light space. This will be used in the fragment shader to sample the shadow map.

The fragment shader will need to:

  1. Apply perspective division to compute NDC coordinates from the interpolated light space position of the fragment. Notice that this process is slightly different between OpenGL and Vulkan, since Vulkan’s NDC Z is expected to be in the range [0, 1] instead of OpenGL’s [-1, 1].
  • Transform the X,Y coordinates from NDC space [-1, 1] to texture space [0, 1].

  • Sample the shadow map and compare the result with the light space Z position we computed for this fragment to decide if the fragment is shadowed.

  • The implementation would look something like this:

    float
    compute_shadow_factor(vec4 light_space_pos, sampler2D shadow_map)
    {
       // Convert light space position to NDC
       vec3 light_space_ndc = light_space_pos.xyz /= light_space_pos.w;
    
       // If the fragment is outside the light's projection then it is outside
       // the light's influence, which means it is in the shadow (notice that
       // such sample would be outside the shadow map image)
       if (abs(light_space_ndc.x) > 1.0 ||
           abs(light_space_ndc.y) > 1.0 ||
           abs(light_space_ndc.z) > 1.0)
          return 0.0;
    
       // Translate from NDC to shadow map space (Vulkan's Z is already in [0..1])
       vec2 shadow_map_coord = light_space_ndc.xy * 0.5 + 0.5;
    
       // Check if the sample is in the light or in the shadow
       if (light_space_ndc.z > texture(shadow_map, shadow_map_coord.xy).x)
          return 0.0; // In the shadow
    
       // In the light
       return 1.0;
    }  
    

    The function returns 0.0 if the fragment is in the shadows and 1.0 otherwise. Note that the function also avoids sampling the shadow map for fragments that are outside the light’s frustum (and therefore are not recorded in the shadow map texture): we know that any fragment in this situation is shadowed because it is obviously not visible from the light. This assumption is valid for spotlights and point lights because in these cases the shadow map captures the entire influence area of the light source, for directional lights that affect the entire scene however, we usually need to limit the light’s frustum to the surroundings of the camera, and in that case we probably want want to consider fragments outside the frustum as lighted instead.

    Now all that remains in the shader code is to use this factor to eliminate the diffuse and specular components for fragments that are in the shadows. To achieve this we can simply multiply these components by the factor computed by this function.

    Changes in the program

    The list of changes in the main program are straight forward: we only need to update the pipeline layout and descriptors to attach the new resources required by the shaders, specifically, the light’s view projection matrix in the vertex shader (which could be bound as a push constant buffer or a uniform buffer for example) and the shadow map sampler in the fragment shader.

    Binding the light’s ViewProjection matrix is no different from binding the other matrices we need in the shaders so I won’t cover it here. The shadow map sampler doesn’t really have any mysteries either, but since that is new let’s have a look at the code:

    ...
    VkSampler sampler;
    VkSamplerCreateInfo sampler_info = {};
    sampler_info.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
    sampler_info.addressModeU = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
    sampler_info.addressModeV = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
    sampler_info.addressModeW = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
    sampler_info.anisotropyEnable = false;
    sampler_info.maxAnisotropy = 1.0f;
    sampler_info.borderColor = VK_BORDER_COLOR_INT_OPAQUE_BLACK;
    sampler_info.unnormalizedCoordinates = false;
    sampler_info.compareEnable = false;
    sampler_info.compareOp = VK_COMPARE_OP_ALWAYS;
    sampler_info.magFilter = VK_FILTER_LINEAR;
    sampler_info.minFilter = VK_FILTER_LINEAR;
    sampler_info.mipmapMode = VK_SAMPLER_MIPMAP_MODE_NEAREST;
    sampler_info.mipLodBias = 0.0f;
    sampler_info.minLod = 0.0f;
    sampler_info.maxLod = 100.0f;
    
    VkResult result =
       vkCreateSampler(device, &sampler_info, NULL, &sampler);
    ...
    

    This creates the sampler object that we will use to sample the shadow map image. The address mode fields are not very relevant since our shader ensures that we do not attempt to sample outside the shadow map, we use linear filtering, but that is not mandatory of course, and we select nearest for the mipmap filter because we don’t have more than one miplevel in the shadow map.

    Next we have to bind this sampler to the actual shadow map image. As usual in Vulkan, we do this with a descriptor update. For that we need to create a descriptor of type VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, and then do the update like this:

    VkDescriptorImageInfo image_info;
    image_info.sampler = sampler;
    image_info.imageView = shadow_map_view;
    image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    
    VkWriteDescriptorSet writes;
    writes.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    writes.pNext = NULL;
    writes.dstSet = image_descriptor_set;
    writes.dstBinding = 0;
    writes.dstArrayElement = 0;
    writes.descriptorCount = 1;
    writes.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    writes.pBufferInfo = NULL;
    writes.pImageInfo = &image_info;
    writes.pTexelBufferView = NULL;
    
    vkUpdateDescriptorSets(ctx->device, 1, &writes, 0, NULL);
    

    A combined image sampler brings together the texture image to sample from (a VkImageView of the image actually) and the description of the filtering we want to use to sample that image (a VkSampler). As with all descriptor sets, we need to indicate its binding point in the set (in our case it is 0 because we have a separate descriptor set layout for this that only contains one binding for the combined image sampler).

    Notice that we need to specify the layout of the image when it will be sampled from the shaders, which needs to be VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
    If you revisit the definition of our render pass for the shadow map image, you’ll see that we had it automatically transition the shadow map to this layout at the end of the render pass, so we know the shadow map image will be in this layout immediately after it has been rendered, so we don’t need to add barriers to execute the layout transition manually.

    So that’s it, with this we have all the pieces and our scene should be rendering shadows now. Unfortunately, we are not quite done yet, if you look at the results, you will notice a lot of dark noise in surfaces that are directly lit. This is an artifact of shadow mapping called self-shadowing or shadow acne. The next section explains how to get rid of it.

    Self-shadowing artifacts

    Eliminating self-shadowing

    Self-shadowing can happen for fragments on surfaces that are directly lit by a source light for which we are producing a shadow map. The reason for this is that these are the fragments’s Z coordinate in light space should exactly match the value we read from the shadow map for the same X,Y coordinates. In other words, for these fragments we expect:

    light_space_ndc.z == texture(shadow_map, shadow_map_coord.xy).x.
    

    However, due to different precession errors that can be generated on both sides of that equation, we may end up with slightly different values for each side and when the value we produce for light_space_ndc.z end ups being larger than what we read from the shadow map, even if it is a very small amount, it will mark the pixel as shadowed, leading to the result we see in that image.

    The usual way to fix this problem involves adding a small depth offset or bias to the depth values we store in the shadow map so we ensure that we always read a larger value from the shadow map for the fragment. Another way to think about this is to think that when we record the shadow map, we push every object in the scene slightly away from the light source. Unfortunately, this depth offset bias should not be a constant value, since the angle between the surface normals and the vectors from the light source to the fragments also affects the bias value that we should use to correct the self-shadowing.

    Thankfully, GPU hardware provides means to account for this. In Vulkan, when we define the rasterization state of the pipeline we use to create the shadow map, we can add the following:

    VkPipelineRasterizationStateCreateInfo rs;
    ...
    rs.depthBiasEnable = VK_TRUE;
    rs.depthBiasConstantFactor = 4.0f;
    rs.depthBiasSlopeFactor = 1.5f;
    

    Where depthBiasConstantFactor is a constant factor that is automatically added to all depth values produced and depthBiasSlopeFactor is a factor that is used to compute depth offsets also based on the angle. This provides us with the means we need without having to do any extra work in the shaders ourselves to offset the depth values correctly. In OpenGL the same functionality is available via glPolygonOffset().

    Notice that the bias values that need to be used to obtain the best results can change for each scene. Also, notice that too big values can lead to shadows that are “detached” from the objects that cast them leading to very unrealistic results. This effect is also known as Peter Panning, and can be observed in this image:

    Peter Panning artifacts

    As we can see in the image, we no longer have self-shadowing, but now we have the opposite problem: the shadows casted by the red and blue blocks are visibly incorrect, as if they were being rendered further away from the light source than they should be.

    If the bias values are chosen carefully, then we should be able to get a good result, although some times we might need to accept some level of visible self-shadowing or visible Peter Panning:

    Correct shadowing

    The image above shows correct shadowing without any self-shadowing or visible Peter Panning. You may wonder why we can’t see some of the shadows from the red light in the floor where the green light is more intense. The reason is that even though it is not clear because I don’t actually render the objects projecting the lights, the green light is mostly looking down, so its reflection on the floor (that has normals pointing upwards) is strong enough that the contribution from the red light to the floor pixels in this area is insignificant in comparison making the shadows casted from the red light barely visible. You can still see some shadowing if you get close enough with the camera though, I promise 😉

    Shadow antialiasing

    The images above show aliasing around at the edges of the shadows. This happens because for each fragment we decide if it is shadowed or not as a boolean decision, and we use that result to fully shadow or fully light the pixel, leading to aliasing:

    Shadow aliasing

    Another thing contributing to the aliasing effect is that a single pixel in the shadow map image can possibly expand to multiple pixels in camera space. That can happen if the camera is looking at an area of the scene that is close to the camera, but far away from the light source for example. In that case, the resolution of that area of the scene in the shadow map is small, but it is large for the camera, meaning that we end up sampling the same pixel from the shadow map to shadow larger areas in the scene as seen by the camera.

    Increasing the resolution of the shadow map image will help with this, but it is not a very scalable solution and can quickly become prohibitive. Alternatively, we can implement something called Percentage-Closer Filtering to produce antialiased shadows. The technique is simple: instead of sampling just one texel from the shadow map, we take multiple samples in its neighborhood and average the results to produce shadow factors that do not need to be exactly 1 o 0, but can be somewhere in between, producing smoother transitions for shadowed pixels on the shadow edges. The more samples we take, the smoother the shadows edges get but do note that extra samples per pixel also come with a performance cost.

    Smooth shadows with PCF

    This is how we can update our compute_shadow_factor() function to add PCF:

    float
    compute_shadow_factor(vec4 light_space_pos,
                          sampler2D shadow_map,
                          uint shadow_map_size,
                          uint pcf_size)
    {
       vec3 light_space_ndc = light_space_pos.xyz /= light_space_pos.w;
    
       if (abs(light_space_ndc.x) > 1.0 ||
           abs(light_space_ndc.y) > 1.0 ||
           abs(light_space_ndc.z) > 1.0)
          return 0.0;
    
       vec2 shadow_map_coord = light_space_ndc.xy * 0.5 + 0.5;
    
       // compute total number of samples to take from the shadow map
       int pcf_size_minus_1 = int(pcf_size - 1);
       float kernel_size = 2.0 * pcf_size_minus_1 + 1.0;
       float num_samples = kernel_size * kernel_size;
    
       // Counter for the shadow map samples not in the shadow
       float lighted_count = 0.0;
    
       // Take samples from the shadow map
       float shadow_map_texel_size = 1.0 / shadow_map_size;
       for (int x = -pcf_size_minus_1; x <= pcf_size_minus_1; x++)
       for (int y = -pcf_size_minus_1; y <= pcf_size_minus_1; y++) {
          // Compute coordinate for this PFC sample
          vec2 pcf_coord = shadow_map_coord + vec2(x, y) * shadow_map_texel_size;
    
          // Check if the sample is in light or in the shadow
          if (light_space_ndc.z <= texture(shadow_map, pcf_coord.xy).x)
             lighted_count += 1.0;
       }
    
       return lighted_count / num_samples;
    }
    

    We now have a loop where we go through the samples in the neighborhood of the texel and average their respective shadow factors. Notice that because we sample the shadow map in texture space [0, 1], we need to consider the size of the shadow map image to properly compute the coordinates for the texels in the neighborhood so the application needs to provide this for every shadow map.

    Conclusion

    In this post we discussed how to use the shadow map image to produce shadows in the scene as well as typical issues that can show up with the shadow mapping technique, such as self-shadowing and aliasing, and how to correct them. This will be the last post in this series, there is a lot more stuff to cover about lighting and shadowing, such as Cascaded Shadow Maps (which I introduced briefly in this other post), but I think (or I hope) that this series provides enough material to get anyone interested in the technique a reference for how to implement it.

    by Iago Toral at October 02, 2017 09:42 AM

    September 30, 2017

    Samuel Iglesias

    II Google Devfest Asturias 2017

    Hoy os hablo en la lengua de Cervantes para comentaros que el miércoles pasado fui invitado a dar una charla sobre Vulkan en el II Google DevFest Asturias organizado por GDG Asturias. Cabe destacar que este evento parte de la VII Semana de Impulso TIC organizada por el COIIPA y CITIPA, la cual es una magnífica manera de conocer qué se está haciendo en el mundo de las TIC en el Principado de Asturias.

    Mi charla se centró en explicar qué problemas pretende solucionar Vulkan y cuáles son los conceptos que introduce este nuevo API para aplicaciones de gráficos 3D. Espero que sea una charla útil para la gente que quiera conocer Vulkan teniendo algo de conocimiento previo en gráficos.

    Las slides de la charla están subidas aquí.

    GDG Asturias

    September 30, 2017 10:33 AM