Planet Igalia

December 07, 2017

Víctor Jáquez

Enabling HuC for SKL/KBL in Debian/testing

Recently, our friend Florent complained that it was impossible to set a constant bitrate when encoding H.264 using the low-power profile with gstreamer-vaapi.

Low-power (LP) profiles are VA-API entry points, available in Intel Skylake-based processors and successors, which provide video encoding with low power consumption.

Later on, Ullysses and Sree pointed out that CBR in LP is only possible if HuC is enabled in the kernel.

HuC is a firmware, loaded by the i915 kernel module, designed to offload some media functions from the CPU to the GPU. One of these functions is bitrate control when encoding. HuC saves unnecessary CPU-GPU synchronization.

In order to load HuC, it is first required to load GuC, another Intel firmware designed to perform graphics workload scheduling on the various graphics parallel engines.

How can we install and configure these firmwares to enable CBR in the low-power profile, among other things, in Debian/testing?

Check i915 parameters

First we shall confirm that our kernel and our i915 kernel module are capable of handling this functionality:

$ sudo modinfo i915 | egrep -i "guc|huc|dmc"
firmware:       i915/bxt_dmc_ver1_07.bin
firmware:       i915/skl_dmc_ver1_26.bin
firmware:       i915/kbl_dmc_ver1_01.bin
firmware:       i915/kbl_guc_ver9_14.bin
firmware:       i915/bxt_guc_ver8_7.bin
firmware:       i915/skl_guc_ver6_1.bin
firmware:       i915/kbl_huc_ver02_00_1810.bin
firmware:       i915/bxt_huc_ver01_07_1398.bin
firmware:       i915/skl_huc_ver01_07_1398.bin
parm:           enable_guc_loading:Enable GuC firmware loading (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           enable_guc_submission:Enable GuC submission (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           guc_log_level:GuC firmware logging level (-1:disabled (default), 0-3:enabled) (int)
parm:           guc_firmware_path:GuC firmware path to use instead of the default one (charp)
parm:           huc_firmware_path:HuC firmware path to use instead of the default one (charp)

Install firmware

$ sudo apt install firmware-misc-nonfree

Verify the firmware files are installed:

$ ls -1 /lib/firmware/i915/
bxt_dmc_ver1_07.bin
bxt_dmc_ver1.bin
bxt_guc_ver8_7.bin
bxt_huc_ver01_07_1398.bin
kbl_dmc_ver1_01.bin
kbl_dmc_ver1.bin
kbl_guc_ver9_14.bin
kbl_huc_ver02_00_1810.bin
skl_dmc_ver1_23.bin
skl_dmc_ver1_26.bin
skl_dmc_ver1.bin
skl_guc_ver1.bin
skl_guc_ver4.bin
skl_guc_ver6_1.bin
skl_guc_ver6.bin
skl_huc_ver01_07_1398.bin

Update modprobe configuration

Edit or create the configuration file /etc/modprobe.d/i915.conf

$ sudo vim /etc/modprobe.d/i915.conf
....
$ cat /etc/modprobe.d/i915.conf
options i915 enable_guc_loading=1 enable_guc_submission=1

Reboot

$ sudo systemctl reboot 

Verification

Now it is possible to verify that the i915 kernel module loaded the firmware correctly by looking at the kernel logs:

$ journalctl -b -o short-monotonic -k | egrep -i "i915|dmr|dmc|guc|huc"
[   10.303849] miau kernel: Setting dangerous option enable_guc_loading - tainting kernel
[   10.303852] miau kernel: Setting dangerous option enable_guc_submission - tainting kernel
[   10.336318] miau kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   10.338664] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_dmc_ver1_01.bin
[   10.339635] miau kernel: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_01.bin (v1.1)
[   10.361811] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_huc_ver02_00_1810.bin
[   10.362422] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_guc_ver9_14.bin
[   10.393117] miau kernel: [drm] GuC submission enabled (firmware i915/kbl_guc_ver9_14.bin [version 9.14])
[   10.410008] miau kernel: [drm] Initialized i915 1.6.0 20170619 for 0000:00:02.0 on minor 0
[   10.559614] miau kernel: snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[   11.937413] miau kernel: i915 0000:00:02.0: fb0: inteldrmfb frame buffer device

That means that HuC and GuC firmwares were loaded successfully.

Now we can check the status of the firmware loading through debugfs:

$ sudo cat /sys/kernel/debug/dri/0/i915_guc_load_status
GuC firmware status:
        path: i915/kbl_guc_ver9_14.bin
        fetch: SUCCESS
        load: SUCCESS
        version wanted: 9.14
        version found: 9.14
        header: offset is 0; size = 128
        uCode: offset is 128; size = 142272
        RSA: offset is 142400; size = 256

GuC status 0x800330ed:
        Bootrom status = 0x76
        uKernel status = 0x30
        MIA Core status = 0x3

Scratch registers:
         0:     0xf0000000
         1:     0x0
         2:     0x0
         3:     0x5f5e100
         4:     0x600
         5:     0xd5fd3
         6:     0x0
         7:     0x8
         8:     0x3
         9:     0x74240
        10:     0x0
        11:     0x0
        12:     0x0
        13:     0x0
        14:     0x0
        15:     0x0
$ sudo cat /sys/kernel/debug/dri/0/i915_huc_load_status
HuC firmware status:
        path: i915/kbl_huc_ver02_00_1810.bin
        fetch: SUCCESS
        load: SUCCESS
        version wanted: 2.0
        version found: 2.0
        header: offset is 0; size = 128
        uCode: offset is 128; size = 218304
        RSA: offset is 218432; size = 256

HuC status 0x00006080:

Test GStreamer

$ gst-launch-1.0 videotestsrc num-buffers=1000 ! video/x-raw, format=NV12, width=1920, height=1080, framerate=\(fraction\)30/1 ! vaapih264enc bitrate=8000 keyframe-period=30 tune=low-power rate-control=cbr ! mp4mux ! filesink location=test.mp4
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Got context from element 'vaapiencodeh264-0': gst.vaapi.Display=context, gst.vaapi.Display=(GstVaapiDisplay)"\(GstVaapiDisplayGLX\)\ vaapidisplayglx0";
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:11.620036001
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
$ gst-discoverer-1.0 test.mp4 
Analyzing file:///home/vjaquez/gst/master/intel-vaapi-driver/test.mp4
Done discovering file:///home/vjaquez/test.mp4

Topology:
  container: Quicktime
    video: H.264 (High Profile)

Properties:
  Duration: 0:00:33.333333333
  Seekable: yes
  Live: no
  Tags: 
      video codec: H.264 / AVC
      bitrate: 8084005
      encoder: VA-API H264 encoder
      datetime: 2017-12-07T14:29:23Z
      container format: ISO MP4/M4A

Mission accomplished!

by vjaquez at December 07, 2017 02:35 PM

November 28, 2017

Diego Pino

Practical Snabb

In a previous article I introduced Snabb, a toolkit for developing network functions. In this article I want to dive into some practical examples on how to use Snabb for network function programming.

The elements of a network function

A network function is any program that does something with traffic data. There’s a certain set of operations that can be done on any packet: reading, modifying (headers or payload), creating (new packets), dropping or forwarding. Any network function is a combination of these primitives. For instance, a NAT function consists of packet header modification and forwarding.

Some of the built-in network functions featured in Snabb are:

  • lwAFTR (NAT, encap/decap): Implementation of the lwAFTR network function as specified in RFC7596. lwAFTR is a NAT between IPv6 and IPv4 address+port.
  • IPSEC (processing): encryption of packet payloads using AES instructions.
  • Snabbwall (filtering): a L7 firewall that relies on libnDPI for Deep-Packet Inspection. It also allows L3/L4 filtering using tcpdump-like expressions.

Real-world scenarios

The downside of by-passing the kernel and taking full control of a NIC is that the NIC cannot be used by any other program. That means the network function run by Snabb acts as a black box: some traffic comes in, gets transformed and is pushed out through the same NIC (or any other NIC controlled by the network function). The advantage is clear: outstanding performance.

For this reason Snabb is mostly used to develop network functions that run within the ISP’s network, where traffic load is expected to be high. An ISP can spare one or several NICs to run a network function alone since the results pay off (lower hardware costs, custom network function development, good performance, etc).

Snabb might seem like a less attractive tool in other scenarios. However, that doesn’t mean it cannot be used to program network functions that run on a personal computer or in a less demanding network. Snabb has interfaces for Tap devices, raw sockets and Unix sockets, which allow using Snabb as a program managed by the kernel. In fact, using some of these interfaces is the best way to start with Snabb if you don’t have natively supported hardware.

Building Snabb

In this tutorial I’ll cover two examples to help me illustrate how to use Snabb. But before proceeding with the examples, we need to download and build Snabb.

$ git clone https://github.com/snabbco/snabb
$ cd snabb
$ make

Now we can run the snabb executable, which will print out a list of all the subprograms available:

$ cd src/
$ sudo ./snabb
Usage: ./snabb <program> ...

This snabb executable has the following programs built in:
  config
  example_replay
  example_spray
  firehose
  ...
  snsh
  wall

For detailed usage of any program run:
  snabb <program> --help

If you rename (or copy or symlink) this executable with one of
the names above then that program will be chosen automatically.

Hello world!

One of the simplest network functions to build is something that reads packets from a source, filters some of them and forwards the rest to an output. In this case I want to capture traffic from my browser (packets to HTTP or HTTPS). Here is what our hello world! program looks like:

#!./snabb snsh

local pcap = require("apps.pcap.pcap")
local PcapFilter = require("apps.packet_filter.pcap_filter").PcapFilter
local RawSocket = require("apps.socket.raw").RawSocket

local args = main.parameters
local iface = assert(args[1], "No listening interface")
local fileout = args[2] or "output.pcap"

local c = config.new()
config.app(c, "nic", RawSocket, iface)
config.app(c, "filter", PcapFilter, {filter = "tcp dst port 80 or dst port 443"})
config.app(c, "writer", pcap.PcapWriter, fileout)

config.link(c, "nic.tx -> filter.input")
config.link(c, "filter.output -> writer.input")

engine.configure(c)
engine.main({duration=30})

main.exit(0)

Now save the script and run it:

$ chmod +x http-filter.snabb 
$ sudo ./http-filter.snabb wlp3s0

While the script is running, I open a few websites in my browser. Hopefully some packets will be captured into output.pcap:

$ sudo tcpdump -tr output.pcap
IP sagan.50062 > 54.239.17.7.http: Flags [P.], seq 0:926, ack 1, win 229, length 926: HTTP: GET / HTTP/1.1
IP sagan.50062 > 54.239.17.7.http: Flags [.], ack 189, win 237, length 0
IP sagan.50062 > 54.239.17.7.http: Flags [.], ack 368, win 245, length 0
IP sagan.37346 > 93.184.220.29.http: Flags [S], seq 370675941, win 29200, options [mss 1460,sackOK,TS val 1370741706 ecr 0,nop,wscale 7], length 0
IP sagan.37346 > 93.184.220.29.http: Flags [.], ack 2640726891, win 229, options [nop,nop,TS val 1370741710 ecr 2287287426], length 0
IP sagan.37346 > 93.184.220.29.http: Flags [P.], seq 0:439, ack 1, win 229, options [nop,nop,TS val 1370741729 ecr 2287287426], length 439: HTTP: POST / HTTP/1.1
IP sagan.37346 > 93.184.220.29.http: Flags [.], ack 789, win 251, options [nop,nop,TS val 1370741733 ecr 2287287449], length 0

Some highlights in this script:

  • The shebang line (#!./snabb snsh) refers to Snabb’s shell (snsh), one of the many subprograms available in Snabb. It allows us to run Snabb scripts, that is, Lua programs that have access to the Snabb environment (engine, apps, libraries, etc).
  • There’s a series of libraries that were not explicitly loaded: config, engine, main, etc. These libraries are part of the Snabb environment and are automatically loaded in every program.
  • The network function instantiates 3 apps: RawSocket, PcapFilter and PcapWriter, initializes them and pipes them together through links, forming a graph. This graph is passed to the engine, which executes it for 30 seconds.

Martian packets

Let’s continue with another example: a network function that manages a more complex set of rules to filter out traffic. Since there are more rules I will encapsulate the filtering logic into a custom app.

The data we’re going to filter are martian packets. According to Wikipedia, a martian packet is “an IP packet seen on the public internet that contains a source or destination address that is reserved for special-use by Internet Assigned Numbers Authority (IANA)”. For instance, packets with RFC1918 addresses or multicast addresses seen on the public internet are martian packets.

Unlike the previous example, I decided not to code this network function as a script, but as a program instead. The network function lives at src/program/martian. I’ve pushed the final code to a branch in my Snabb repository:

$ git remote add dpino https://github.com/dpino/snabb.git
$ git fetch dpino
$ git checkout -b martian-packets dpino/martian-packets

To run the app:

$ sudo ./snabb martian program/martian/test/sample.pcap
link report:
   3 sent on filter.output -> writer.input (loss rate: 0%)
   5 sent on reader.output -> filter.input (loss rate: 0%)

The function lets 3 out of 5 packets from sample.pcap pass through.

$ sudo tcpdump -qns 0 -t -e -r program/martian/test/sample.pcap
reading from file program/martian/test/sample.pcap, link-type EN10MB (Ethernet)
00:00:01:00:00:00 > fe:ff:20:00:01:00, IPv4, length 62: 145.254.160.237.3372 > 65.208.228.223.80: tcp 0
fe:ff:20:00:01:00 > 00:00:01:00:00:00, IPv4, length 62: 65.208.228.223.80 > 145.254.160.237.3372: tcp 0
00:00:01:00:00:00 > fe:ff:20:00:01:00, IPv4, length 54: 145.254.160.237.3372 > 65.208.228.223.80: tcp 0
90:e2:ba:94:2a:bc > 02:cf:69:15:81:01, IPv4, length 242: 10.0.1.100 > 10.10.0.0: ICMP echo reply, id 1024, seq 0, length 208
90:e2:ba:94:2a:bc > 02:cf:69:15:81:01, IPv4, length 242: 10.0.1.100 > 10.10.0.0: ICMP echo reply, id 53, seq 0, length 208

The last two packets are martian packets. They cannot occur in a public network since their source or destination addresses are private addresses.

Some highlights about this network function:

  • Instead of using a built-in filtering app, I’ve coded my own, called MartianFilter. This new app is responsible for determining whether a packet is a martian packet or not. This operation has to be done in the push method of the app.
  • I’ve coded some utility functions to parse CIDR addresses (such as 100.64.0.0/10) and to check whether an IP address belongs to a network (see the sketch after this list). Alternatively, I could have used Snabb’s filtering library, which allows filtering packets using tcpdump-like expressions. For instance, “net 100.64.0.0 mask 255.192.0.0”.
  • The network function doesn’t use a network interface to read packets from, instead it reads packets out of a .pcap file.
  • Every Snabb program has a run function, which is the program’s entry point. A Snabb program or library can also add a selftest function, which is used to unit test the module ($ sudo ./snabb snsh -t program.martian). On the other hand, Snabb apps must implement a new method and optionally a push or pull method (or both, but at least one of them).
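
For illustration, here is a minimal sketch of what those CIDR helpers could look like in Lua (the names ip_to_u32 and in_cidr are hypothetical, not necessarily the ones used in the branch):

local bit = require("bit")

-- Convert a dotted-quad string into a 32-bit integer.
local function ip_to_u32 (str)
   local a, b, c, d = str:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
   assert(a, "not an IPv4 address: " .. str)
   return bit.bor(bit.lshift(tonumber(a), 24), bit.lshift(tonumber(b), 16),
                  bit.lshift(tonumber(c), 8), tonumber(d))
end

-- Check whether an address belongs to a CIDR range such as "100.64.0.0/10".
-- (Assumes a prefix length between 1 and 32.)
local function in_cidr (ip, cidr)
   local net, prefix = cidr:match("^(.+)/(%d+)$")
   local mask = bit.lshift(-1, 32 - tonumber(prefix))
   return bit.band(ip_to_u32(ip), mask) == bit.band(ip_to_u32(net), mask)
end

print(in_cidr("100.127.0.1", "100.64.0.0/10")) --> true
print(in_cidr("8.8.8.8", "100.64.0.0/10"))     --> false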

Here’s the app’s graph:

config.app(c, "reader", pcap.PcapReader, filein)
config.app(c, "filter", MartianFilter)
config.app(c, "writer", pcap.PcapWriter, fileout)

config.link(c, "reader.output -> filter.input")
config.link(c, "filter.output -> writer.input")

And here is what the MartianFilter:push method looks like:

-- Required at the top of the module (excerpted here for clarity):
local link   = require("core.link")
local packet = require("core.packet")
local ipv4   = require("lib.protocol.ipv4")

function MartianFilter:push ()
   local input, output = assert(self.input.input), assert(self.output.output)

   while not link.empty(input) do
      local pkt = link.receive(input)
      -- Parse the IPv4 header in place (IPV4_OFFSET and IPV4_SIZE are
      -- constants defined elsewhere in the module).
      local ip_hdr = ipv4:new_from_mem(pkt.data + IPV4_OFFSET, IPV4_SIZE)
      if self:is_martian(ip_hdr:src()) or self:is_martian(ip_hdr:dst()) then
         packet.free(pkt)             -- drop martian packets
      else
         link.transmit(output, pkt)   -- forward everything else
      end
   end
end

As a rule of thumb, in every Snabb program there is exactly one app that feeds packets into the graph, in this case the PcapReader app. Such apps have to override the pull method. Apps that want to manipulate packets get a chance to do so in their push method.

Summary

Snabb is a very useful tool for coding network functions that need to run at very high speed. For this reason, it’s usually deployed as part of an ISP’s network infrastructure. However, the toolkit is versatile enough to let us code any type of application that has to manipulate network traffic.

In this tutorial I introduced how to start using Snabb to code network functions. In the first example I showed how to download and build Snabb, plus a very simple application that filters HTTP or HTTPS traffic from a network interface. In the second example, I introduced how to code a Snabb program and an app, MartianFilter. This app exemplifies how to check packets against a set of rules and forward or drop them based on those conditions. Other more sophisticated network functions, such as firewalling, packet-rate limiting or DDoS attack prevention, behave in a similar manner.

That’s all for now. I left out another example that consisted of sending and receiving Multicast DNS packets. I’ll likely cover it in a follow-up article.

November 28, 2017 06:00 AM

November 24, 2017

Víctor Jáquez

Intel MediaSDK on Debian (testing)

Everybody knows it: installing Intel MediaSDK on GNU/Linux is a PITA. With CentOS or Yocto it is less cumbersome, if you blindly trust scripts run as root.

I don’t like CentOS; using it feels like living in the past. I like Debian (testing, of course) and I also wanted to understand a little more about MediaSDK. And this is what I did to get Intel MediaSDK working in Debian/testing.

First, I did a pristine installation of Debian testing with a netinst image in my NUC 6i5SYK, with a normal desktop user setup (Gnome3).

The madness comes later.

Intel identifies two types of MediaSDK installation: Gold and Generic. Gold is for CentOS, and Generic is for the rest of the distributions. Obviously, Generic means you’re on your own. For the purpose of this exercise I used Generic Linux* Intel® Media Server Studio Installation as reference.

Let’s begin by grabbing the Intel® Media Server Studio – Community Edition. You will need to register yourself and accept the user agreement, because this is proprietary software.

At the end, you should have a tarball named MediaServerStudioEssentials2017R3.tar.gz

Extract the files for the Generic installation

$ cd ~
$ tar xvf MediaServerStudioEssentials2017R3.tar.gz
$ cd MediaServerStudioEssentials2017R3
$ tar xvf SDK2017Production16.5.2.tar.gz
$ cd SDK2017Production16.5.2/Generic
$ mkdir tmp
$ tar -xvC tmp -f intel-linux-media_generic_16.5.2-64009_64bit.tar.gz

Kernel

Bad news: in order to get MediaSDK working you need to patch the mainline kernel.

Worse news: the available patches are only for version 4.4 of the kernel.

Still, systemd works on 4.4, as far as I know, so it would not be a big problem.

Grab build dependencies

$ sudo apt install build-essential devscripts libncurses5-dev
$ sudo apt build-dep linux

Grab kernel source

I like to use the sources from the git repository, since it would be possible to do some rebasing and blaming in the future.

$ cd ~
$ git clone https://github.com/torvalds/linux.git
...
$ git pull -v --tags
$ git checkout -b 4.4 v4.4

Extract MediaSDK patches

$ cd ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/patches/kmd/4.4
$ tar xvf intel-kernel-patches.tar.bz2
$ cd intel-kernel-patches
$ PATCHDIR=$(pwd)

Patch the kernel

$ cd ~/linux
$ git am $PATCHDIR/*.patch

The patches should apply with some warnings but no fatal errors (don’t worry, be happy).

Still, there’s a problem with this old kernel: our recent compiler doesn’t build it as it is. Another patch is required:

$ wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8-rc2/0002-UBUNTU-SAUCE-no-up-disable-pie-when-gcc-has-it-enabl.patch
$ git am 0002-UBUNTU-SAUCE-no-up-disable-pie-when-gcc-has-it-enabl.patch

TODO: Shall I need to modify the EXTRAVERSION string in kernel’s Makefile?

Build and install the kernel

Notice that we are using our current kernel configuration. That is error prone. I guess that is why I had to select NVM manually.

$ cp /boot/config-4.12.0-1-amd64 ./.config
$ make olddefconfig
$ make nconfig # -- select NVM
$ scripts/config --disable DEBUG_INFO
$ make deb-pkg
...
$ sudo dpkg -i linux-image-4.4.0+_4.4.0+-2_amd64.deb linux-headers-4.4.0+_4.4.0+-2_amd64.deb linux-firmware-image-4.4.0+_4.4.0+-2_amd64.deb

Configure GRUB2 to boot Linux 4.4 by default

This part was absolutely tricky for me. It took me a long time to figure out how to specify the kernel ID in the grubenv.

$ sudo vi /etc/default/grub

And change the line to GRUB_DEFAULT=saved (by default it is set to 0). Then update GRUB.

$ sudo update-grub

Now look for the ID of the installed kernel image in /boot/grub/grub.cfg and use it:

$ sudo grub-set-default "gnulinux-4.4.0+-advanced-2c246bc6-65bb-48ea-9517-4081b016facc>gnulinux-4.4.0+-advanced-2c246bc6-65bb-48ea-9517-4081b016facc"

Please note the ID appears twice, separated by a >. Don’t ask me why.

Copy MediaSDK firmware (and libraries too)

I like to use rsync rather than plain cp because of options like --dry-run and --itemize-changes, which let me verify what I am doing.

$ cd ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp
$ sudo rsync -av --itemize-changes ./lib /
$ sudo rsync -av --itemize-changes ./opt/intel/common /opt/intel
$ sudo rsync -av --itemize-changes ./opt/intel/mediasdk/{include,lib64,plugins} /opt/intel/mediasdk

All these directories contain blobs that do the MediaSDK magic. They are dlopened via hard-coded paths by mfx_dispatch, which will be explained later.

In /lib lives the firmware (kernel blob).

In /opt/intel/common… I have no idea what those shared objects are.

In /opt/intel/mediasdk/include live the header files for programming and compilation.

In /opt/intel/mediasdk/lib64 live the driver for the modified libva (iHD) and other libraries.

In /opt/intel/mediasdk/plugins live, well, plugins…

In conclusion, all these bytes are darkness and mystery.

Reboot

$ sudo systemctl reboot

The system should boot, automatically, in GNU/Linux 4.4.

Please log in with Xorg, not Wayland, since the latter is not supported, as far as I know.

GStreamer

For compiling GStreamer I will use gst-uninstalled. Someone may say that I should use gst-build because it is newer and faster, but I feel more comfortable doing the following kind of hacks with the good old autotools.

Basically this is a reproduction of Quick-start guide to gst-uninstalled for GStreamer 1.x.

$ sudo apt build-dep gst-plugins-{base,good,bad}1.0
$ wget https://cgit.freedesktop.org/gstreamer/gstreamer/plain/scripts/create-uninstalled-setup.sh -q -O - | sh

I will modify the gst-uninstalled script, and keep it outside of the repository. For that I will use the systemd file-hierarchy spec for user’s executables.

$ cd ~/gst
$ mkdir -p ~/.local/bin
$ mv master/gstreamer/scripts/gst-uninstalled ~/.local/bin
$ ln -sf ~/.local/bin/gst-uninstalled ./gst-master

Do not forget to edit your ~/.profile to add ~/.local/bin to the PATH environment variable.

Patch ~/.local/bin/gst-uninstalled

The modifications are to handle the three dependency libraries required by MediaSDK: libdrm, libva and mfx_dispatch.

diff --git a/scripts/gst-uninstalled b/scripts/gst-uninstalled
index 81f83b6c4..d79f19abd 100755
--- a/scripts/gst-uninstalled
+++ b/scripts/gst-uninstalled
@@ -122,7 +122,7 @@ GI_TYPELIB_PATH=$GST/gstreamer/gst:$GI_TYPELIB_PATH
 export LD_LIBRARY_PATH
 export DYLD_LIBRARY_PATH
 export GI_TYPELIB_PATH
-  
+
 export PKG_CONFIG_PATH="\
 $GST_PREFIX/lib/pkgconfig\
 :$GST/gstreamer/pkgconfig\
@@ -140,6 +140,9 @@ $GST_PREFIX/lib/pkgconfig\
 :$GST/orc\
 :$GST/farsight2\
 :$GST/libnice/nice\
+:$GST/drm\
+:$GST/libva/pkgconfig\
+:$GST/mfx_dispatch\
 ${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

 export GST_PLUGIN_PATH="\
@@ -227,6 +230,16 @@ export GST_VALIDATE_APPS_DIR=$GST_VALIDATE_APPS_DIR:$GST/gst-editing-services/te
 export GST_VALIDATE_PLUGIN_PATH=$GST_VALIDATE_PLUGIN_PATH:$GST/gst-devtools/validate/plugins/
 export GIO_EXTRA_MODULES=$GST/prefix/lib/gio/modules:$GIO_EXTRA_MODULES

+# MediaSDK
+export LIBVA_DRIVERS_PATH=/opt/intel/mediasdk/lib64
+export LIBVA_DRIVER_NAME=iHD
+export LD_LIBRARY_PATH="\
+/opt/intel/common/mdf/lib64\
+:$GST/drm/.libs\
+:$GST/drm/intel/.libs\
+:$GST/libva/va/.libs\
+:$LD_LIBRARY_PATH"
+

Now, initialize the gst-uninstalled environment:

$ cd ~/gst
$ ./gst-master

libdrm

Grab libdrm from its repository and create a branch at the version supported by MediaSDK.

$ cd ~/gst/master
$ git clone git://anongit.freedesktop.org/mesa/drm
$ cd drm
$ git checkout -b intel libdrm-2.4.67

Extract the distributed tarball in the cloned repository.

$ tar -xv --strip-components=1 -C . -f ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/libdrm/2.4.67-64009/libdrm-2.4.67.tar.bz2

Then we could check the big delta between upstream and the changes done by Intel for MediaSDK.

Let’s put it in a commit for later rebases.

$ git add -u
$ git add .
$ git commit -m "mediasdk changes"

Get build dependencies and compile.

$ sudo apt build-dep libdrm
$ ./configure
$ make -j8

Since the pkgconfig files (*.pc) of libdrm are generated to work installed, we need to modify them in order to work uninstalled.

$ prefix=${HOME}/gst/master/drm
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/.libs#" ${prefix}/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/*.pc

In order for the C preprocessor to find the uninstalled libdrm header files, we need to make them available at the path the pkgconfig file expects, and right now they are not there. To fix that we can create the proper symbolic links.

$ cd ~/gst/master/drm
$ ln -s include/drm/ libdrm

libva

MediaSDK ships a modified version of libva. These modifications mess a bit with the open-source version of libva, because Intel decided not to prefix the library or use some other namespacing strategy. In gstreamer-vaapi we had to blacklist VA-API version 0.99, because that is the version number, arbitrarily set, of this modified libva for MediaSDK.

Again, grab the original libva from its repo and create a branch aiming at the divergence point. It was difficult to find the divergence commit id since even the libva version number was changed. Doing some archeology I guessed the branch point was version 1.0.15, but I’m not sure.

$ cd ~/gst/master
$ git clone https://github.com/01org/libva.git
$ cd libva
$ git checkout -b intel libva-1.0.15
$ tar -xv --strip-components=1 -C . -f ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/libva/1.67.0.pre1-64009/libva-1.67.0.pre1.tar.bz2
$ git add -u
$ git add .
$ git commit -m "mediasdk"

Before compiling, verify that the Makefile is going to link against the uninstalled libdrm. You can do that by grepping for LIBDRM in the Makefile.

Get compilation dependencies and build.

$ sudo apt build-dep libva
$ ./configure
$ make -j8

Modify the pkgconfig files for uninstalled use

$ prefix=${HOME}/gst/master/libva
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/va/.libs#" ${prefix}/pkgconfig/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/pkgconfig/*.pc

Fix header path with symbolic links

$ cd ~/gst/master/libva/va
$ ln -sf drm/va_drm.h

mfx_dispatch

This is a static library which must be linked with MediaSDK applications. In our case, with the GStreamer plugin.

According to its documentation (included in the tarball):

the dispatcher is a layer that lies between the application and the SDK implementations. Upon initialization, the dispatcher locates the appropriate platform-specific SDK implementation. If there is none, it will select the software SDK implementation. The dispatcher will redirect subsequent function calls to the same functions in the selected SDK implementation.

The tarball contains the source of mfx_dispatch, but it only compiles with cmake. I have not worked with cmake on uninstalled setups, but we are lucky: there is a repository with autotools support:

$ cd ~/gst/master
$ git clone https://github.com/lu-zero/mfx_dispatch.git

And compile. After running ./configure it is better to confirm, by grepping the generated Makefile, that the uninstalled versions of libdrm and libva are going to be used.

$ autoreconf  --install
$ ./configure
$ make -j8

Finally, just as with the other libraries, it is required to fix the pkgconfig files:

$ prefix=${HOME}/gst/master/mfx_dispatch
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/.libs#" ${prefix}/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/*.pc

Test it!

At last we are in a position where it is possible to test whether everything works as expected. For that we are going to run the pre-compiled version of vainfo bundled in the tarball.

We will copy it into our uninstalled setup, so we can run it without specifying the path.

$ rsync -av /home/vjaquez/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/usr/bin/vainfo ./prefix/bin/
$ vainfo
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.99 (libva 1.67.0.pre1)
vainfo: Driver version: 16.5.2.64009-ubit
vainfo: Supported profile and entrypoints
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: <unknown entrypoint>
      VAProfileH264ConstrainedBaseline: <unknown entrypoint>
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : <unknown entrypoint>
      VAProfileH264Main               : <unknown entrypoint>
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : <unknown entrypoint>
      VAProfileH264High               : <unknown entrypoint>
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : <unknown entrypoint>
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileVP9Profile0            : <unknown entrypoint>
      <unknown profile>               : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : <unknown entrypoint>

It works!

Compile GStreamer

I normally make a copy of ~/gst/master/gstreamer/scripts/git-update.sh in ~/.local/bin in order to modify it: adding support for ccache, disabling gtkdoc and gobject-introspection, increasing the parallel tasks, etc. But that is out of the scope of this document.

$ cd ~/gst/master/
$ ./gstreamer/scripts/git-update.sh

Everything should build without issues and, at the end, we can test whether the gst-msdk elements are available:

$ gst-inspect-1.0 msdk
Plugin Details:
  Name                     msdk
  Description              Intel Media SDK encoders
  Filename                 /home/vjaquez/gst/master/gst-plugins-bad/sys/msdk/.libs/libgstmsdk.so
  Version                  1.13.0.1
  License                  BSD
  Source module            gst-plugins-bad
  Source release date      2017-11-23 16:39 (UTC)
  Binary package           GStreamer Bad Plug-ins git
  Origin URL               Unknown package origin

  msdkh264dec: Intel MSDK H264 decoder
  msdkh264enc: Intel MSDK H264 encoder
  msdkh265dec: Intel MSDK H265 decoder
  msdkh265enc: Intel MSDK H265 encoder
  msdkmjpegdec: Intel MSDK MJPEG decoder
  msdkmjpegenc: Intel MSDK MJPEG encoder
  msdkmpeg2enc: Intel MSDK MPEG2 encoder
  msdkvp8dec: Intel MSDK VP8 decoder
  msdkvp8enc: Intel MSDK VP8 encoder

  9 features:
  +-- 9 elements

Great!

Now, let’s run a simple pipeline. Please note that the gst-msdk elements have rank zero, so they will not be autoplugged; it is necessary to craft the pipeline manually:

$ gst-launch-1.0 filesrc location=~/test.264 ! h264parse ! msdkh264dec ! videoconvert ! xvimagesink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:02.502411331
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

\o/

by vjaquez at November 24, 2017 09:57 AM

November 22, 2017

Asumu Takikawa

Writing network drivers in a high-level language

Another day, another post about Snabb. Today, I’ll start to explain some work I’ve been doing at Igalia for Deutsche Telekom on driver development. All the DT driver work I’ll be talking about was joint work with Nicola Larosa.

When writing a networking program with Snabb, the program has to get some packets to crunch on from somewhere. Like everything else in Snabb, these packets come from an app.

These source apps might be a synthetic packet generator or, for anything running on real hardware, a network driver that talks to a NIC (a network card). That network driver part is the subject of this blog post.

These network drivers are written in LuaJIT like the rest of the system. This is maybe not that surprising if you know Snabb does kernel-bypass networking (like DPDK or other similar approaches), but it’s still quite remarkable! The vast majority of drivers that people are familiar with (graphics drivers, wifi drivers, or that obscure CueCat driver) are written in C.

For the Igalia project, we worked on extending the existing Snabb drivers for Intel NICs with some extra features. I’ll talk more about the new work that we did specifically in a second blog post. For this post, I’ll introduce how we can even write a driver in Lua.

(and to be clear, the existing Snabb drivers aren’t my work; they’re the work of some excellent Snabb hackers like Luke Gorrie and others)

For the nitty-gritty details about how Snabb bypasses the kernel to let a LuaJIT program operate on the NIC, I recommend reading Luke Gorrie’s neat blog post about it. In this post, I’ll talk about what happens once user-space has a hold on the network card.

Driver infrastructure

When a driver starts up, it of course needs to initialize the hardware. The datasheet for the Intel 82599 NIC for example dedicates an entire chapter to this. A lot of the initialization process consists of poking at the appropriate configuration registers on the device, waiting for things to power up and tell you they’re ready, and so on.

To actually poke at these registers, the driver uses memory-mapped I/O to the PCI device. The MMIO memory is, as far as LuaJIT and we are concerned, just a pointer to a big chunk of memory given to us by the hardware.pci library via the FFI.

It’s up to us to interpret this returned uint32_t pointer in a useful way. Specifically, we know certain offsets into this memory are mapped to registers as specified in the datasheet.

Since we’re living in a high-level language, we want to hide away the pointer arithmetic needed to access these registers. So Snabb has a little DSL in the lib.hardware.register library that takes text descriptions of registers like this:

-- Name        Address / layout        Read-write status/description
array_registers = [[
   RSSRK       0x5C80 +0x04*0..9       RW RSS Random Key
]]

and then lets you map them into a register table:

my_regs = {}

pci      = require 'lib.hardware.pci'
register = require 'lib.hardware.register'

-- get a pointer to MMIO for some PCI address
base_ptr = pci.map_pci_memory_unlocked("02:00.01", 0)

-- defines (an array of) registers in my_regs
register.define_array(array_registers, my_regs, base_ptr)

After defining these registers, you can use the my_regs table to access the registers like any other Lua data. For example, the “RSS Random Key” array of registers can be initialized with some random data like this:

for i=0, 9 do
   my_regs.RSSRK[i](math.random(2^32))
end

This code looks like straightforward Lua code, but it’s poking at the NIC’s configuration registers. These registers are also often manipulated at the bit level, and there is some library support for that in lib.bits.

For example, here are some prose instructions to initialize a certain part of the NIC from the datasheet:

Disable TC arbitrations while enabling the packet buffer free space monitor:

  — Tx Descriptor Plane Control and Status (RTTDCS), bits:
  TDPAC=0b, VMPAC=1b, TDRM=0b, BDPM=1b, BPBFSM=0b

This is basically instructing the implementor to set some bits and clear some bits in the RTTDCS register, which can be translated into some code that looks like this:

bits = require "lib.bits"

-- clear these bits
my_regs.RTTDCS:clr(bits { TDPAC=0, TDRM=4, BPBFSM=23 })

-- set these bits
my_regs.RTTDCS:set(bits { VMPAC=1, BDPM=22 })

The bits function just takes a table of bit offsets to set (the table key strings only matter for documentation’s sake) and turns it into a number to use when setting a register. It’s possible to write these bit manipulations with plain arithmetic operations as well, but it’s usually more verbose that way.
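
As a rough illustration, here is the idea behind a bits-like helper (a sketch, not the real lib.bits code):

-- Turn a table of named bit offsets into a number with those bits set.
local function bits (t)
   local ret = 0
   for _, offset in pairs(t) do
      ret = ret + 2^offset
   end
   return ret
end

-- TDPAC=0, TDRM=4, BPBFSM=23 --> 0x800011
assert(bits({ TDPAC = 0, TDRM = 4, BPBFSM = 23 }) == 0x800011)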

Getting packets into the driver

To build the actual driver, we use the handy infrastructure above to do the device initialization and configuration and then drive a main loop that accepts packets from the NIC and feeds them into the Snabb program (we will just consider the receive path in this post). The core structure of this main loop is simpler than you might expect.

On a NIC like the Intel 82599, the packets are transferred into the host system’s memory via DMA into a receive descriptor ring. This is a circular buffer that keeps entries that contain a pointer to packet data and then some metadata.

A typical descriptor entry looks like this:

----------------------------------------------------------------------
|             Address (to memory allocated by driver)                |
----------------------------------------------------------------------
|    VLAN tag    | Errors | Status |    Checksum    |    Length      |
----------------------------------------------------------------------

The driver allocates some DMA-friendly memory (via memory.dma_alloc from core.memory) for the descriptor ring and then sets the NIC registers (RDBAL & RDBAH) so that the NIC knows the physical address of the ring. There are some neat tricks in core.memory which make the virtual to physical address translation easy.
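
In code, that setup could look roughly like the following sketch. It assumes RDBAL/RDBAH have been defined in the my_regs table from earlier (on the 82599 they are actually per-queue registers), and it uses memory.dma_alloc plus memory.virtual_to_physical from core.memory:

local memory = require("core.memory")

-- 82599 receive descriptors are 16 bytes each.
local NUM_DESCRIPTORS = 1024
local ring = memory.dma_alloc(NUM_DESCRIPTORS * 16)
local ring_phys = memory.virtual_to_physical(ring)

-- Tell the NIC where the descriptor ring lives in physical memory.
my_regs.RDBAL(tonumber(ring_phys % 2^32))  -- low 32 bits of the address
my_regs.RDBAH(tonumber(ring_phys / 2^32))  -- high 32 bits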

In addition to this ring, a packet buffer is allocated for each entry in the ring and its (physical) address is stored in the first field of the entry (see diagram above).

The NIC will then DMA packets into the buffer as they are received and filtered by the hardware.

The descriptor ring has head/tail pointers (like a typical circular buffer) indicating where new packets arrive, and where the driver is reading off of. The driver mainly sets the tail pointer, indicating how far it has processed.

A Snabb app can introduce new packets into a program by implementing the pull method. A driver’s pull method might have the following shape (based on the intel_app driver):

local link = require "core.link"

-- pull method definition on Driver class
function Driver:pull ()
   -- make sure the output link exists
   local l = self.output.tx
   if l == nil then return end

   -- sync the driver and HW on descriptor ring head/tail pointers
   self:sync_receive()

   -- pull a standard number of packets for a link
   for i = 1, engine.pull_npackets do
      -- check head/tail pointers to make sure packets are available
      if not self:can_receive() then break end

      -- take packet from descriptor ring, put into the Snabb output link
      link.transmit(l, self:receive())
   end

   -- allocate new packet buffers for all the descriptors we processed
   -- we can't reuse the buffers since they are now owned by the next app
   self:add_receive_buffers()
end

Of course, the real work is done in the helper methods like sync_receive and receive. I won’t go over the implementation of those, but they mostly deal with manipulating the head and tail pointers of the descriptor ring appropriately while doing any allocation that is necessary to keep the ring set up.
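
Just to give a flavor, the availability check boils down to a cursor comparison. A rough sketch (illustrative field names, not the actual intel_app code):

-- Packets are available while our software read cursor hasn't caught
-- up with the position the hardware has written up to.
function Driver:can_receive ()
   return self.rxnext ~= self.rdh
end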

The takeaway I wanted to communicate from this skeleton is that using Lua makes for very clear and pleasant code that doesn’t get too bogged down in low-level details. That’s partly because Snabb’s core abstracts the complexity of using device registers and allocating DMA memory and things like that. That kind of abstraction is made a lot easier by LuaJIT and its FFI, so that the surface code looks like it’s just manipulating tables and making function calls.

In the next blog post, I’ll talk about some specific improvements we made to Snabb’s drivers to make it more ready for use with multi-process Snabb apps.

by Asumu Takikawa at November 22, 2017 10:45 AM

November 16, 2017

Alberto Garcia

“Improving the performance of the qcow2 format” at KVM Forum 2017

I was in Prague last month for the 2017 edition of the KVM Forum. There I gave a talk about some of the work that I’ve been doing this year to improve the qcow2 file format used by QEMU for storing disk images. The focus of my work is to make qcow2 faster and to reduce its memory requirements.

The video of the talk is now available and you can get the slides here.

The KVM Forum was co-located with the Open Source Summit and the Embedded Linux Conference Europe. Igalia was sponsoring both events one more year and I was also there together with some of my colleagues. Juanjo Sánchez gave a talk about WPE, the WebKit port for embedded platforms that we released.

The video of his talk is also available.

by berto at November 16, 2017 10:16 AM

November 14, 2017

Michael Catanzaro

Igalia is Hiring

Igalia is hiring web browser developers. If you think you’re a good candidate for one of these jobs, you’ll want to fill out the online application accompanying one of the postings. We’d love to hear from you.

We’re especially interested in hiring a browser graphics developer. We realize that not many graphics experts also have experience in web browser development, so it’s OK if you haven’t worked with web browsers before. Low-level Linux graphics experience is the more important qualification for this role.

Igalia is not just a great place to work on cool technical projects like WebKit. It’s also a political and social project: an egalitarian, worker-owned cooperative where everyone has an equal vote in company decisions and receives equal pay. It’s been around for 16 years, so it’s also not a startup. You can work remotely from wherever you happen to be, or from our office in A Coruña, Spain. You won’t have a boss, but you will be expected to work well with your colleagues. It’s not the right fit for everyone, but there’s nowhere I’d rather be.

by Michael Catanzaro at November 14, 2017 03:04 PM

November 13, 2017

Diego Pino

Snabb explained in less than 10 minutes

Last month I attended the 20th edition of GORE (the Spain’s Network Operator Group meeting) where I delivered an introductory talk about Snabb (Spanish). Slides of the talk are also available online (English).

Taking advantage of this presentation I decided to write an introductory article about Snabb; something that could allow anyone to easily understand what Snabb is.

What is Snabb?

Snabb is a toolkit for developing network functions in user-space. This definition refers to two keywords that are worth clarifying: network functions and user-space.

What’s a network function?

A network function is any program that does something on network traffic. What kind of things can be done on traffic? For instance: to read packets, modify their headers, create new packets, discard packets or forward them. Any network function is a combination of these basic operations. Here are some examples:

  • Filtering function (i.e. firewalling): read incoming packets, compare to table of rules and execute an action (forward or drop).
  • Traffic mapping (i.e. NAT): read incoming packets, modify headers and forward packet.
  • Encapsulation (i.e. VPN): read incoming packets, create a new packet, embed packet into new one and send it.

What’s user-space networking?

In the last few years, there has been a new trend in writing network functions. This trend consists of writing the entire network function in user-space, leaving no processing to the kernel.

Traditionally, when writing network functions we use the abstractions provided by the OS. The goal of any OS is to create abstractions over hardware that programs can use. This happens at many levels. For instance, when dealing with a hard drive we don’t need to think of heads, cylinders and sectors but use a higher-level abstraction: the filesystem. Networking is another layer abstracted by the OS. As programmers, we don’t deal with the NIC directly; instead we work with sockets and have access to APIs to deal with the TCP/IP stack.

However, the addition of higher-level abstractions implicitly adds an overhead to the processing of our network function. The first disadvantage is that the function is split between two lands: user-space and kernel-space, and switching between both lands has a cost. And even if we move as much logic as possible to the kernel, there are inherent costs caused by the kernel’s networking layer.

The need to skip the kernel and program network functions entirely in user-space was triggered by the continuous improvement of hardware. Today it is possible to buy a 10G NIC for less than 200 euros. Soon the idea of building high-performance network appliances out of commodity hardware seemed feasible. Someone could pick an Intel Xeon, fill the available PCI slots with 10G NICs and expect to have the equivalent of a very expensive Cisco or Juniper router for a fraction of its cost.

If we drive the hardware described above entirely with Linux we won’t be able to squeeze out all its performance. Every packet hitting the NICs will have to go through the kernel’s networking layer, and that has a cost caused by all the operations the kernel performs on packets before they’re available for our program to manipulate. To understand how big a problem this is, I need to introduce the concept of budget in a network function.

Know your network function budget

If we want to make the most of our hardware we generally would like to run our network function at line-rate speed, that is, the maximum speed of the NIC. How much time is that? On a 10G NIC, if we are receiving packets of an average size of 550 bytes at maximum speed, then we’re receiving a new packet every 440ns. That’s all the time we have available to run our network function on a packet.
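
The arithmetic behind that number is simple: the budget is the packet size in bits divided by the link speed. A quick sanity check in Lua:

-- Per-packet time budget (in nanoseconds) at line rate:
-- bits per packet divided by link speed in Gbps.
local function budget_ns (packet_bytes, gbps)
   return packet_bytes * 8 / gbps
end

print(budget_ns(550, 10)) --> 440 ns
print(budget_ns(64, 10))  --> 51.2 ns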

Usually the way a NIC works is by placing incoming packets in a queue or buffer. This buffer is actually a ring buffer, meaning there are two cursors pointing into the buffer, the Rx cursor and the Tx cursor. When a new packet arrives, the packet is written at the Rx position and the cursor gets updated. When a packet leaves the buffer, the packet is read at the Tx position and the cursor gets updated after the read. Our network function fetches packets from the Tx cursor. If it’s too slow processing a packet, the Rx cursor will eventually overtake the Tx cursor. When that happens there’s a packet drop (a packet was overwritten before it was consumed).
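
The cursor logic can be sketched in a few lines of Lua (illustrative only; Snabb’s actual ring implementation is the struct link shown later in this article):

local SIZE = 1024
local ring = { rx = 0, tx = 0, packets = {} }

-- NIC side: writes at Rx and advances; it may overwrite unread slots.
local function nic_write (p)
   ring.packets[ring.rx % SIZE] = p
   ring.rx = ring.rx + 1
end

-- Network function side: reads at Tx and advances.
local function fn_read ()
   if ring.tx == ring.rx then return nil end   -- nothing to read
   if ring.rx - ring.tx > SIZE then            -- we fell behind: drops
      ring.tx = ring.rx - SIZE
   end
   local p = ring.packets[ring.tx % SIZE]
   ring.tx = ring.tx + 1
   return p
end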

Let’s go back to the 440ns number. How much time is that? Kernel hacker Jesper Brouer discusses this issue in his excellent talk “Network stack challenges at increasing speed” (I also recommend LWN’s summary of the talk: Improving Linux networking performance). Here’s the cost of some common operations (cost varies depending on hardware, but the order of magnitude is similar across different hardware settings):

  • Spinlock (Lock/Unlock): 16ns.
  • L2 cache hit: 4.3ns.
  • L3 cache hit: 7.9ns.
  • Cache miss: 32ns.

Taking those numbers into account, 440ns doesn’t seem like a lot of time. The cost of a system call is also prohibitive, so system calls should be minimized as much as possible.

Another important thing to notice is that the smaller the packet, the smaller the budget. On a 10G NIC, if we’re receiving packets of 64 bytes on average, the minimum Ethernet frame size, then we’re receiving a new packet roughly every 51ns (67.2ns counting Ethernet framing overhead). In this scenario two straight cache misses would eat the whole budget.

In conclusion, at these NIC speeds the additional overhead added by the kernel networking layer is non-trivial: big enough to affect the execution of our network function. Since our budget gets reduced, packets are more likely to be dropped at higher speeds or smaller packet sizes, limiting the overall performance of our network card.

NOTE: This is a general picture of the issue of doing high-performance networking in the Linux kernel. The kernel hackers are not ignorant of these problems and have been working on ways to fix them over the last years. In that regard it is worth mentioning the addition of XDP (eXpress Data Path), a kernel abstraction to execute network functions as close to the hardware as possible. But that’s a subject for another post.

By-passing the kernel

User-space networking needs to by-pass the kernel’s networking layer so it can squeeze all the performance out of the underlying hardware. There are several strategies to do that: user-space drivers, PF_RING, Netmap, etc. (Cloudflare has an excellent article on kernel by-pass, commenting on several of those strategies). Snabb chooses to handle the hardware directly, that is, it provides user-space drivers for the NICs it supports.

Snabb offers support mostly for Intel cards (although some Solarflare and Mellanox models are also supported). Implementing a driver, either in kernel-space or user-space, is not an easy task. It’s fundamental to have access to the vendor’s datasheet (generally a very large document) to know how to initialize the NIC, how to read packets from it, how to transfer data, etc. Intel provides such a datasheet. In fact, Intel started a project with a similar goal a few years ago: DPDK. DPDK is an open-source project that implements drivers in user-space. Although originally it only provided drivers for Intel NICs, as the adoption of the project increased, other vendors have started to add drivers for their hardware.

Inside Snabb

Snabb was started in 2012 by free software hacker Luke Gorrie. Snabb provides direct access to the high-performance NICs but in addition to that it also provides an environment for building and running network functions.

Snabb is composed of several elements:

  • An Engine, that runs the network functions.
  • Libraries, that ease the development of network functions.
  • Apps, reusable software components that generally manipulate packets.
  • Programs, ready-to-use standalone network functions.

A network function in Snabb is a combination of apps connected together by links. Snabb’s engine is in charge of feeding the app graph with packets and giving every app a chance to execute.

The engine processes the app graph in breadths. A breadth consists of two steps:

  • Inhale: puts packets into the graph.
  • Process: every app gets a chance to receive packets and manipulate them.

During the inhale phase the pull method of an app gets executed. Apps that implement such a method act as packet generators within the app graph. Packets are placed on the app’s links. Generally there’s only one app of this kind in every graph.

During the process phase the push method of an app gets executed. This gives every app a chance to read packets from its incoming links, do something with them and likely place them on its outgoing links.
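
Put together, a minimal app of each kind could look like this sketch (illustrative, not real Snabb apps; it assumes the apps are wired with links named tx and rx, and that engine is provided by the Snabb environment as usual):

local link   = require("core.link")
local packet = require("core.packet")

-- A packet generator: implements pull, which runs during the inhale step.
Source = {}
function Source.new () return setmetatable({}, {__index = Source}) end
function Source:pull ()
   for _ = 1, engine.pull_npackets do
      link.transmit(self.output.tx, packet.allocate())
   end
end

-- A packet consumer: implements push, which runs during the process step.
Sink = {}
function Sink.new () return setmetatable({}, {__index = Sink}) end
function Sink:push ()
   while not link.empty(self.input.rx) do
      packet.free(link.receive(self.input.rx))
   end
end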

Hands-on example

Let’s build a network function that captures packets from a 10G NIC, filters them using a packet-filtering expression and writes the filtered packets to a pcap file. Such a network function would look like this:

[Figure: Snabb basic filter]

The graph above could be coded in Snabb like this:

local pcap       = require("apps.pcap.pcap")
local PcapFilter = require("apps.packet_filter.pcap_filter").PcapFilter
local Intel82599 = require("apps.intel.intel_app").Intel82599

function run()
   local c = config.new()

   -- App definition.
   config.app(c, "nic", Intel82599, {
      pci = "0000:04:00.0"
   })
   config.app(c, "filter", PcapFilter, { filter = "src port 80" })
   config.app(c, "pcap", pcap.PcapWriter, "output.pcap")

   -- Link definition.
   config.link(c, "nic.tx        -> filter.input")
   config.link(c, "filter.output -> pcap.input")

   engine.configure(c)
   engine.main({duration=10})
end

A configuration is created describing the app graph of the network function. The configuration is passed down to Snabb which executes it for 10 seconds.

When Snabb’s engine runs this network function, it executes the pull method of each app to feed packets into the graph links (the inhale step). During the process step, the push method of each app is executed so apps have a chance to fetch packets from their incoming links, do something with them and likely place them into their outgoing links.

Here’s what the real implementation of the PcapFilter:push method looks like:

function PcapFilter:push ()
   while not link.empty(self.input.rx) do
      local p = link.receive(self.input.rx)
      if self.accept_fn(p.data, p.length) then
         link.transmit(self.output.tx, p)
      else
         packet.free(p)
      end
   end
end

A packet in Snabb is a really simple data structure. Basically, it consists of a length field and an array of bytes of fixed size.

struct packet {
   uint16_t length;
   unsigned char data[10*1024];
};

A link is a ring-buffer of packets.

struct link {
   struct packet *packets[1024];
   // the next element to be read
   int read;
   // the next element to be written
   int write;
};

Every app has zero or more input links and zero or more output links. The links are created at runtime when the graph is defined. In the example above, the nic app has one outgoing link (nic.tx); the filter app has one incoming link (filter.input) and one outgoing link (filter.output); and the pcap app has one incoming link (pcap.input).

It might be surprising that packets and links are defined in C code, instead of Lua. Snabb runs on top of LuaJIT, an ultra-fast virtual machine for executing Lua programs. LuaJIT implements an FFI (Foreign Function Interface) to interact with C data types and call C runtime functions or external libraries directly from Lua code. In Snabb most data structures are defined in C, which allows data to be packed more efficiently.

local ether_header_t = ffi.typeof [[
/* All values in network byte order.  */
struct {
   uint8_t  dhost[6];
   uint8_t  shost[6];
   uint16_t type;
} __attribute__((packed))
]]

Calling a C-runtime function is really easy too.

ffi.cdef[[
  void syslog(int priority, const char* format, ...);
]]
ffi.C.syslog(2, "error:...");

Wrapping up and last thoughts

In this article I’ve covered the basics of Snabb. I showed how to use Snabb to build network functions and explained why Snabb is a very convenient toolkit for writing this type of program. Snabb runs very fast since it by-passes the kernel, which makes it very useful for high-performance networking. In addition, Snabb is written in the high-level language Lua, which enormously lowers the entry barrier for writing network functions.

However, there are more things in Snabb than I could cover in this article. Snabb comes with a set of ready-to-run programs. It also comes with a vast collection of apps and libraries which can help to speed up the construction of new network functions.

You don’t need to own an Intel 10G card to start using Snabb today. Snabb can be used over TAP interfaces. It won’t be highly performant, but it’s the best way to start with Snabb.

In a future article I plan to cover a more elaborate example of a network function using TAP interfaces.

November 13, 2017 06:00 AM

November 08, 2017

Eleni Maria Stea

Fosscomm 2017

FOSSCOMM (Free and Open Source Software Communities Meeting) is a Greek conference aimed at free-software and open-source enthusiasts, developers, and communities. The event is organized and run solely by volunteers (usually university students, communities, and Linux User Groups) and takes place in a different city every year. Attendance is free and everyone is welcome to give a presentation or a workshop related to free and open source projects.

I always try to attend this meeting when the dates and the place are convenient, as it is a great opportunity to meet old friends and hang out with geek people.

This year’s FOSSCOMM 2017 (the website is in Greek) was held at the Harokopio University of Athens during the weekend of 4–5 November 2017.

I grabbed the opportunity to go and give a talk about Mesa 3D, a project where Igalia’s graphics team has been making contributions and handling releases for the last 5 years.

My presentation was titled “Hacking on Mesa 3D” and it was a short introductory talk about the OpenGL implementation, the OpenGL extension system, the GLSL compiler, the drivers, and some of the development and debugging processes we use.

Slides and video (in Greek because that was the conference language):

To my surprise, my talk wasn’t the only one mentioning Igalia. 🙂 Dimitris Glynos, one of the co-founders of Census (and a FOSSCOMM sponsor), gave the talk “FOSS is all we got: building a competitive IT skill set in Greece today” and mentioned us among other examples of companies that work successfully on open source projects.

Among the other talks I attended, I particularly liked the “Linux Metrics” workshop by Effie Mouzeli and Giorgos Kargiotakis (during FOSSCOMM day #1), which aimed to teach users and developers how to use metrics tools to detect performance issues. It was so successful that they were asked to re-run it the following day.

Most of the other presentations and workshops, as well as the schedule, can be found here (for those who can understand Greek): https://www.fosscomm.hua.gr/. The FOSSCOMM organizers will soon edit the videos and upload them to a YouTube channel.

This is my post-FOSSCOMM cup collection

I’d like to thank the people who attended the presentation, the FOSSCOMM 2017 organizing team, who did such a great job preparing and hosting the event, and of course Igalia, which is giving me the opportunity to work on cool graphics stuff.

See you at FOSSCOMM 2018! 😉


by hikiko at November 08, 2017 07:30 AM

November 01, 2017

Martin Robinson

Small Things

Even between two highly-developed western countries, there are a lot of cultural differences. After moving, I experienced the sort of culture shock that the Internet warns you about. Thankfully, the passage of time means that grumbling noon-time stomachs gradually give way to curiously peckish 2:30pm lunches. Instead of sitting in dread, willing your useless, polite American hands to flag a waiter, you manage to order a tiny beer using only your eyeballs. Big differences fade into the background so much that maybe you start to keep a list of them, just to avoid the feeling that you are forgetting some original piece of yourself.

This new familiarity begins to expose the incredibly long tail of subtle differences that have been hanging out quietly in the background. Unnamed onomatopoeias have a completely different sound. People are making gestures with their hands while they speak, and those gestures actually mean something very clear. Your brain calmly catalogs these curiosities as they become too trivial to comment on.

If you are like me, you stare at the street, the stoplights, and the sidewalks. Suddenly, the endless, small-scale war being waged in the space between the double (and triple) parkers and the buildings becomes apparent. You see the rows of bollards silently holding back a tide of cars and delivery vans. Unspoken rules from your home country no longer apply here, after having been taken for granted for years.

I don’t want to blab on too long about mundane things, so I will just point to the example of curb cuts. In the US we use curb cuts to connect the roadway to private garages, driveways, and parking lots. Thousands of dollars are spent lovingly crafting each of these small cement altars to the passage of automobiles. The sidewalk itself kneels to the pavement, so that cars can smoothly and comfortably climb into pedestrian space. This is all, of course, at the expense of people walking and in wheelchairs who often have to travel across an uneven sidewalk and wait for cars as they appear and (hopefully) leave. A curb cut is a signal that at any moment a car may enter the sidewalk and that it has a right to be there.

Spanish cities sometimes use little ramps instead. They look cheap and their metal surface is usually painted a bright and gaudy yellow. Their angle is decidedly steeper compared to modern curb cuts in the US, which means it is not easy to drive onto the sidewalk quickly or comfortably. Additionally, they look like they can be added and removed cheaply and without modifying the sidewalk at all. Even more bizarrely, they are installed on the sacred roadway itself, so the sidewalk remains level for all the people who might happen to be using it. Sometimes, they even extend so far out into the roadway that parallel parking would be difficult or impossible, which prevents the space from becoming a private parking spot.

These little ramps are a small detail of the city, but for me they send a clear message. They announce to cars that they are entering a segregated pedestrian space. This invitation is conditional on moving slowly and carefully and can be revoked at any time with a hydraulic wrench. Maybe they are common simply because they are a cheap leftover from a period when this was a poorer country. I have a feeling that as time goes on, they will slowly be replaced by compact curb cuts descending from nice, new sidewalks. Despite all this, I feel a little bit of sadness, because their economy and their imperfection made the sidewalk just that much nicer.

November 01, 2017 04:00 AM

October 30, 2017

Víctor Jáquez

GStreamer Conference 2017

This year, the GStreamer Conference happened in Prague, along with the traditional autumn Hackfest.

Prague is a beautiful city, though this year I couldn’t visit it as much as I wanted, since the Embedded Linux Conference Europe and the Open Source Summit also took place there, and Igalia, being a Linux Foundation sponsor, had a booth in the venue, where I talked about our work with WebKit, Snabb, and obviously, GStreamer.

But let’s get back to the GStreamer Hackfest and Conference.

One of the features I like the most about the GStreamer project is its community, the people involved in it, writing code and sharing their work with many others. They might appear a bit tough at the beginning (or at least that’s how it looked to me), but in reality they are all kind and talented people. And I’m proud to consider myself part of this community. Nonetheless, it has a diversity problem, as many other Open Source communities do.

GStreamer Conference 2017

During the Hackfest, Hyunjun and I met with Sree and talked about the plans for GStreamer-VAAPI, the new features in VA-API and libva, and how we could map them to GStreamer’s design. We also talked about future developments in the msdk elements, merged one year ago in gst-plugins-bad. I also talked a bit with Nicolas Dufresne regarding kmssink and DMABuf.

At the Conference, which happened in the same venue as the hackfest, I talked with the authors of gstreamer-media-SDK. They are really energetic.

I delivered my usual talk about GStreamer-VAAPI. You can find the slides, as a web presentation, here. Also, as every year, our friends at Ubicast recorded the talks and made them available for streaming almost instantaneously:

My colleague Enrique talked in the Conference about the Media Source Extensions (MSE) on WebKit, and Hyunjun shared his experience with VA-API on Rust.

Also, in the conference venue, we showed a couple of demos. One of them was a MinnowBoard running WPE, rendering videos from YouTube using gstreamer-vaapi to decode video.

by vjaquez at October 30, 2017 04:24 PM

October 22, 2017

Frédéric Wang

Recent Browser Events

TL;DR

At Igalia, we attend many browser events. This is a quick summary of some recent conferences I participated in… or that gave me the opportunity to meet Igalians in Paris 😉.

Week 31: Paris - CSS WG F2F - W3C

My teammate Sergio attended the CSS WG F2F meeting as an observer. On Tuesday morning, I also made an appearance (but it was so brief that the people I met may not even have seen me). Together with other browser vendors and WG members, Sergio gave an interview regarding the successful story of CSS Grid Layout. By the way, given our implementation work in WebKit and Blink, Igalia finally decided to join the CSS Working Group 😊. Of course, during that week I had dinner with Sergio and it was nice to chat with my colleague in a French restaurant in Montmartre.

Week 38: Tokyo - BlinkOn 8 - Google

Jacobo, Gyuyoung and I attended BlinkOn 8. I had nice discussions and listened to interesting talks about a wide range of topics (Layout NG, Accessibility, CSS, Fonts, Web Predictability & Standards, etc). It was a pleasure to finally meet in person some developers I had been in touch with during my projects on Ozone/Wayland and WebKit/iOS. For the lightning talks, we presented our activities on embedded Linux platforms and the Web Platform. Incidentally, it was great to see Igalia’s work mentioned during the Next Generation Rendering Engine session. Obviously, I had the opportunity to visit places and taste Japanese food in Asakusa, Ueno and Roppongi 😋.

Week 40: A Coruña - Web Engines Hackfest - Igalia

I attended one of my favorite events, which gathers the whole browser community for three days of technical presentations, breakout sessions, hacking and Galician food. This year, we had many sponsors and attendees. It is good to see that the event is becoming more and more popular! It was long overdue, but I was finally able to make Brotli and WOFF2 installable as system libraries on Linux and usable by WebKitGTK+ 😊. I opened similar bugs in Gecko and the same could be done in Chromium. Among the things I enjoyed, I met Jonathan Kew in person and heard more about Antonio and Maksim’s progress on Ozone/Wayland. As usual, it was nice to share time with colleagues, attend the assembly meeting, play football matches, have meals, visit Asturias… and tell one’s story 😉.

Week 41: San Jose - WebKit Contributors Meeting - Apple

In the past months, I have mostly been working on WebKit at Igalia and I would have been happy to see my fellow WebKit developers. However, given the events in Japan and Spain, I was not willing to make another trip to the USA just after. Hence I had to miss the WebKit Contributors Meeting again this year 😞. Fortunately, my colleagues Alex, Michael and Žan were present. Igalia is an important contributor to WebKit and we will continue to send people and propose some talks next year.

Week 42: Paris - Monthly Speaker Series - Mozilla

This Wednesday, I attended a conference on Privacy as a Competitive Advantage at Mozilla’s office. It was nice to hear about the increasing interest in privacy and to see the regulation made by the European Union in that direction. My colleague Philippe was visiting the office to work with some Mozilla developers on one of our projects, so I was also able to meet him in the conference room. Actually, Mozilla employees were kind enough to let me stay at the office after the conference… Hence I was able to work on Apple’s Web Engine on a project sponsored by Google at the Mozilla office… probably something you can only do at Igalia 😉. Last but not least, Guillaume was also on holiday in Paris this week, so I’ll let you imagine what happens when three French guys meet (hint: it involves food 😋).

October 22, 2017 10:00 PM

October 20, 2017

Adrián Pérez

Web Engines Hackfest, 2017 Edition

At the beginning of October I had the wonderful chance of attending the Web Engines Hackfest in A Coruña, hosted by Igalia. This year we were over 50 participants, which was great for associating even more faces with IRC nicknames, but more importantly it allowed hackers working at all levels of the Web stack to share a common space for a few days, making it possible to discuss complex topics and figure out the future of the projects which allow humanity to see pictures of cute kittens — among many other things.

Mandatory fluff (CC-BY-NC).

During the hackfest I worked mostly on three things:

  • Preparing the code of the WPE WebKit port to start making preview releases.

  • A patch set which adds WPE packages to Buildroot.

  • Enabling support for the CSS generic system font family.

Fun trivia: Most of the WebKit contributors work from the United States, so the week of the Web Engines Hackfest is probably the only moment during the whole year when there is a sizeable peak of activity during European daytime.

Watching repository activity during the hackfest.

Towards WPE Releases

At Igalia we are making an important investment in the WPE WebKit port, which is specially targeted towards embedded devices. An important milestone for the project was reached last May when the code was moved to the main WebKit repository, and it has been receiving the usual stream of improvements and bug fixes. We are now approaching the moment where we feel it is ready to start making releases, which is another major milestone.

Our plan for WPE is to synchronize with WebKitGTK+ and produce releases for both in parallel. This is important because both ports share a good amount of their code and base dependencies (GStreamer, GLib, libsoup), and our efforts to stabilize the GTK+ port before each release will benefit the WPE one as well, and vice versa. In the coming weeks we will be publishing the first official tarball, starting off the WebKitGTK+ 2.18.x stable branch.

Wild WEBKIT PORT appeared!

Syncing the releases for both ports means that:

  • Both stable and unstable releases are done in sync with the GNOME release schedule. Unstable releases start at version X.Y.1, with Y being an odd number.

  • About one month before the release dates, we create a new release branch and from there on we work on stabilizing the code. At least one testing release with version X.Y.90 will be made. This is also what GNOME does, and we will mimic this to avoid confusion for downstream packagers.

  • The stable release will have version X.Y+1.0 (for example, the 2.19.x unstable series leads to a 2.20.0 stable release). Further maintenance releases happen from the release branch as needed. At the same time, a new cycle of unstable releases is started based on the code from the tip of the repository.

Believe it or not, preparing a codebase for its first releases involves quite a lot of work, and this is what took most of my coding time during the Web Engines Hackfest and also the following weeks: from small fixes for build failures all the way to making sure that public API headers (only the correct ones!) are installed and usable, that applications can be properly linked, and that release tarballs can actually be created. Exhausting? Well, do not forget that we need to set up a web server to host the tarballs, a small website, and the documentation. The latter has to be generated (there is still pending work in this regard), and the whole process of making a release has to be scripted.

Still with me? Great. Now for a plot twist: we won't be making proper releases just yet.

APIs, ABIs, and Releases

There is one topic I have not touched on yet: API/ABI stability. Having made a release implies that the public API and ABI that are part of it are stable and not subject to change.

Right after upstreaming WPE we switched over from the cross-port WebKit2 C API and added a new, GLib-based API to WPE. It is remarkably similar (if not the same in many cases) to the API exposed by WebKitGTK+, and this makes us confident that the new API is higher-level, more ergonomic, and better overall. At the same time, we would like third party developers to give it a try (which is easier having releases) while retaining the possibility of getting feedback and improving the WPE GLib API before setting it in stone (which is not possible after a release).

It is for this reason that at least during the first WPE release cycle we will make preview releases, meaning that there might be API and ABI changes from one release to the next. As usual we will not be making breaking changes in between releases of the same stable series, i.e. code written for 2.18.0 will continue to build unchanged with any subsequent 2.18.X release.

At any rate, we do not expect the API to receive big changes because —as explained above— it mimics the one for WebKitGTK+, which has already proven itself both powerful enough for complex applications and convenient to use for the simpler ones. Due to this, I encourage developers to try out WPE as soon as we have the first preview release fresh out of the oven.

Packaging for Buildroot

At Igalia we routinely work with embedded devices, and often we make use of Buildroot for cross-compilation. Having actual releases of WPE will allow us to contribute a set of build definitions for the WPE WebKit port and its dependencies — something that I have already started working on.

Lately I have been taking care of keeping the WebKitGTK+ packaging for Buildroot up-to-date and it has been delightful to work with such a welcoming community. I am looking forward to having WPE supported there, and to keep maintaining the build definitions for both. This will allow making use of WPE with relative ease, while ensuring that Buildroot users will pick our updates promptly.

Generic System Font

Some applications, like GNOME Web (Epiphany), use a WebKitWebView to display widget-like controls which try to follow the design of the rest of the desktop. Unfortunately for GNOME applications this means Cantarell gets hardcoded in the style sheet —it is the default font after all— and this results in mismatched fonts when the user has chosen a different font for the interface (e.g. in Tweaks). You can see this in the following screen capture of Epiphany:

Web using hardcoded Cantarell and (on hover) -webkit-system-font.

Here I have configured the beautiful Inter UI font as the default for the desktop user interface. Now, if you roll your mouse over the image, you will see how much better it looks to use a consistent font. This change also affects the list of plugins and applications, error messages, and in general all the about: pages.

If you are running GNOME 3.26, this is already fixed using font: menu (part of the CSS spec since ye olde CSS 2.1) — but we can do better: Safari has had support since 2015 for a generic “system” font family, similar to sans-serif or cursive:

/* Using the new generic font family (nice!). */
body {
    font-family: -webkit-system-font;
}

/* Using CSS 2.1 font shorthands (not so nice). */
body {
    font: menu;       /* Pick ALL font attributes... */
    font-size: 12pt;  /* ...then reset some of them. */
    font-weight: 400;
}

During the hackfest I implemented the needed moving parts in WebKitGTK+ by querying the GtkSettings::gtk-font-name property. This can be used in HTML content shown in Epiphany as part of the UI, and to make the Web Inspector use the system font as well.

Web Inspector using Cantarell, the default GNOME 3 font (full size).

I am convinced that users do notice and appreciate attention to detail, even if they do so unconsciously, and therefore it is worthwhile to work on this kind of improvement. Plus, as a design enthusiast with a slight case of typographic OCD, I cannot stop myself from noticing inconsistent usage of fonts, and my mind is now at ease knowing that opening the Web Inspector won't be such a jarring experience anymore.

Outro

But there's one more thing: on occasion we developers have to debug situations in which a process is seemingly stuck. One useful technique involves running the offending process under the control of a debugger (or, in an embedded device, under gdbserver and controlled remotely), interrupting its execution at intervals, and printing stack traces to try and figure out what is going on. Unfortunately, in some circumstances running a debugger can be difficult or impractical. Wouldn't it be grand if it were possible to interrupt the process without needing a debugger and request a stack trace? Enter “Out-Of-Band Stack Traces” (proof of concept):

  1. The process installs a signal handler using sigaction(2), with the SA_SIGINFO flag set.

  2. On reception of the signal, the kernel interrupts the process (even if it's in an infinite loop), and invokes the signal handler passing an extra pointer to a ucontext_t value, which contains a snapshot of the execution status of the thread that was on the CPU before the signal handler was invoked. This is true for many platforms, including Linux and most BSDs.

  3. The signal handler code can obtain the instruction and stack pointers from the ucontext_t value, and walk the stack to produce a stack trace of the code that was being executed. Jackpot! This is of course architecture dependent, but not difficult to get right (and well tested) for the most common ones like x86 and ARM — see the sketch below.
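
A minimal sketch of steps 1–3 for Linux/x86-64 might look as follows (illustrative only: the signal choice is arbitrary, the REG_RIP/REG_RSP register names are platform specific, fprintf is not async-signal-safe, and a real implementation would walk the stack instead of printing two registers):

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>

static void trace_handler (int sig, siginfo_t *info, void *data)
{
    /* The kernel hands us a snapshot of the interrupted thread's registers. */
    ucontext_t *uc = (ucontext_t *) data;
    fprintf (stderr, "pc=%p sp=%p\n",
             (void *) (uintptr_t) uc->uc_mcontext.gregs[REG_RIP],
             (void *) (uintptr_t) uc->uc_mcontext.gregs[REG_RSP]);
}

int main (void)
{
    struct sigaction sa;
    memset (&sa, 0, sizeof (sa));
    sa.sa_sigaction = trace_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction (SIGUSR1, &sa, NULL);

    for (;;)  /* Pretend to be stuck; "kill -USR1 <pid>" requests a trace. */
        ;
}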

The nice thing about this approach is that the code that obtains the stack trace is built into the program (no rebuilds needed), and it does not even require relaunching the process under a debugger — which can be crucial for analyzing situations which are hard to reproduce, or which do not happen when running inside a debugger. I am looking forward to having some time to integrate this properly into WebKitGTK+ and specially WPE, because it will be most useful in embedded devices.

by aperez (adrian@perezdecastro.org) at October 20, 2017 11:30 PM

October 17, 2017

Enrique Ocaña

Attending the GStreamer Conference 2017

This weekend I’ll be in Node5 (Prague) presenting our Media Source Extensions platform implementation work in WebKit using GStreamer.

The Media Source Extensions HTML5 specification allows JavaScript to generate media streams for playback and lets the web page have more control on complex use cases such as adaptive streaming.

My plan for the talk is to start with a brief introduction about the motivation and basic usage of MSE. Next I’ll show a design overview of the WebKit implementation of the spec. Then we’ll go through the iterative evolution of the GStreamer platform-specific parts, as well as its implementation quirks and challenges faced during the development. The talk continues with a demo, some clues about the future work and a final round of questions.

Our recent MSE work has been on desktop WebKitGTK+ (the WebKit version powering Epiphany, aka GNOME Web), but we also have MSE working on WPE and optimized for a Raspberry Pi 2. We will be showing it at the Igalia booth, in case you want to see it working live.

I’ll also be attending the GStreamer Hackfest the days before. There I plan to work on WebM support in MSE, focusing on any issues in the Matroska demuxer or the VP9/Opus/Vorbis decoders that break our use cases.

See you there!

UPDATE 2017-10-22:

The talk slides are available at https://eocanha.org/talks/gstconf2017/gstconf-2017-mse.pdf and the video is available at https://gstconf.ubicast.tv/videos/media-source-extension-on-webkit (the rest of the talks here).

by eocanha at October 17, 2017 10:48 AM

October 15, 2017

Javier Muñoz

Attending LibreCon 2017

This week I will be attending LibreCon 2017, one of the largest international events on open source technologies. It will be held on 19 and 20 October in Santiago de Compostela (Spain).

This year’s theme is the application of open source technologies in the industrial and primary sector, as well as the new opportunities that these technologies offer in areas like Cloud Computing, Big Data, Internet of Things (IoT) and the Sharing Economy.

I will be delivering one talk, under the sponsorship of my company, Igalia, on Ceph Object Storage and its S3 API. I will introduce the Ceph architecture and the basics needed to understand how to build cloud storage products and services based on Ceph/RGW. I will also comment on the most useful and well-supported parts of the S3 API and the tooling that works with Ceph.

See you there!

by Javier at October 15, 2017 10:00 PM

October 02, 2017

Iago Toral

Working with lights and shadows – Part III: rendering the shadows

In the previous post in this series I introduced how to render the shadow map image, which is simply the depth information for the scene from the view point of the light. In this post I will cover how to use the shadow map to render shadows.

The general idea is that, for each fragment we produce, we compute its position in light space. In this space, the Z component tells us the depth of the fragment from the perspective of the light source. The next step is to compare this value with the shadow map value for that same X,Y position. If the fragment’s light space Z is larger than the value we read from the shadow map, then the fragment is behind an object that is closer to the light, and therefore we can say that it is in the shadows; otherwise we know it receives direct light.

Changes in the shader code

Let’s have a look at the vertex shader changes required for this:

void main()
{
   vec4 pos = vec4(in_position.x, in_position.y, in_position.z, 1.0);
   out_world_pos = Model * pos;
   gl_Position = Projection * View * out_world_pos;

   [...]

   out_light_space_pos = LightViewProjection * out_world_pos;
} 

The vertex shader code above only shows the code relevant to the shadow mapping technique. Model is the model matrix with the spatial transforms for the vertex we are rendering, View and Projection represent the camera’s view and projection matrices, and LightViewProjection represents the product of the light’s view and projection matrices. The variables prefixed with ‘out’ represent vertex shader outputs to the fragment shader.

The code generates the world space position of the vertex (out_world_pos) and the clip space position (gl_Position) as usual, but it also computes the light space position of the vertex (out_light_space_pos) by applying the light’s view and projection transforms to the world position of the vertex. This will be used in the fragment shader to sample the shadow map.

The fragment shader will need to:

  1. Apply perspective division to compute NDC coordinates from the interpolated light space position of the fragment. Notice that this process is slightly different between OpenGL and Vulkan, since Vulkan’s NDC Z is expected to be in the range [0, 1] instead of OpenGL’s [-1, 1].

  2. Transform the X,Y coordinates from NDC space [-1, 1] to texture space [0, 1].

  3. Sample the shadow map and compare the result with the light space Z position we computed for this fragment to decide if the fragment is shadowed.

    The implementation would look something like this:

    float
    compute_shadow_factor(vec4 light_space_pos, sampler2D shadow_map)
    {
       // Convert light space position to NDC
       vec3 light_space_ndc = light_space_pos.xyz /= light_space_pos.w;
    
       // If the fragment is outside the light's projection then it is outside
       // the light's influence, which means it is in the shadow (notice that
       // such sample would be outside the shadow map image)
       if (abs(light_space_ndc.x) > 1.0 ||
           abs(light_space_ndc.y) > 1.0 ||
           abs(light_space_ndc.z) > 1.0)
          return 0.0;
    
       // Translate from NDC to shadow map space (Vulkan's Z is already in [0..1])
       vec2 shadow_map_coord = light_space_ndc.xy * 0.5 + 0.5;
    
       // Check if the sample is in the light or in the shadow
       if (light_space_ndc.z > texture(shadow_map, shadow_map_coord.xy).x)
          return 0.0; // In the shadow
    
       // In the light
       return 1.0;
    }  
    

    The function returns 0.0 if the fragment is in the shadows and 1.0 otherwise. Note that the function also avoids sampling the shadow map for fragments that are outside the light’s frustum (and therefore are not recorded in the shadow map texture): we know that any fragment in this situation is shadowed because it is obviously not visible from the light. This assumption is valid for spotlights and point lights, because in these cases the shadow map captures the entire influence area of the light source. For directional lights that affect the entire scene, however, we usually need to limit the light’s frustum to the surroundings of the camera, and in that case we probably want to consider fragments outside the frustum as lighted instead.

    Now all that remains in the shader code is to use this factor to eliminate the diffuse and specular components for fragments that are in the shadows. To achieve this we can simply multiply these components by the factor computed by this function, as sketched below.
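
    For example, the final fragment color could be computed along these lines (a sketch; the ambient, diffuse and specular variables are illustrative names, not from the original program):

    // Attenuate only the diffuse and specular terms by the shadow factor,
    // leaving the ambient term untouched.
    float shadow = compute_shadow_factor(in_light_space_pos, shadow_map);
    vec3 color = ambient + shadow * (diffuse + specular);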

    Changes in the program

    The list of changes in the main program are straight forward: we only need to update the pipeline layout and descriptors to attach the new resources required by the shaders, specifically, the light’s view projection matrix in the vertex shader (which could be bound as a push constant buffer or a uniform buffer for example) and the shadow map sampler in the fragment shader.

    Binding the light’s ViewProjection matrix is no different from binding the other matrices we need in the shaders so I won’t cover it here. The shadow map sampler doesn’t really have any mysteries either, but since that is new let’s have a look at the code:

    ...
    VkSampler sampler;
    VkSamplerCreateInfo sampler_info = {};
    sampler_info.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
    sampler_info.addressModeU = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
    sampler_info.addressModeV = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
    sampler_info.addressModeW = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
    sampler_info.anisotropyEnable = false;
    sampler_info.maxAnisotropy = 1.0f;
    sampler_info.borderColor = VK_BORDER_COLOR_INT_OPAQUE_BLACK;
    sampler_info.unnormalizedCoordinates = false;
    sampler_info.compareEnable = false;
    sampler_info.compareOp = VK_COMPARE_OP_ALWAYS;
    sampler_info.magFilter = VK_FILTER_LINEAR;
    sampler_info.minFilter = VK_FILTER_LINEAR;
    sampler_info.mipmapMode = VK_SAMPLER_MIPMAP_MODE_NEAREST;
    sampler_info.mipLodBias = 0.0f;
    sampler_info.minLod = 0.0f;
    sampler_info.maxLod = 100.0f;
    
    VkResult result =
       vkCreateSampler(device, &sampler_info, NULL, &sampler);
    ...
    

    This creates the sampler object that we will use to sample the shadow map image. The address mode fields are not very relevant, since our shader ensures that we do not attempt to sample outside the shadow map. We use linear filtering, but that is not mandatory of course, and we select nearest for the mipmap filter because we don’t have more than one mip level in the shadow map.

    Next we have to bind this sampler to the actual shadow map image. As usual in Vulkan, we do this with a descriptor update. For that we need to create a descriptor of type VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, and then do the update like this:

    VkDescriptorImageInfo image_info;
    image_info.sampler = sampler;
    image_info.imageView = shadow_map_view;
    image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    
    VkWriteDescriptorSet writes;
    writes.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    writes.pNext = NULL;
    writes.dstSet = image_descriptor_set;
    writes.dstBinding = 0;
    writes.dstArrayElement = 0;
    writes.descriptorCount = 1;
    writes.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    writes.pBufferInfo = NULL;
    writes.pImageInfo = &image_info;
    writes.pTexelBufferView = NULL;
    
    vkUpdateDescriptorSets(ctx->device, 1, &writes, 0, NULL);
    

    A combined image sampler brings together the texture image to sample from (a VkImageView of the image actually) and the description of the filtering we want to use to sample that image (a VkSampler). As with all descriptor sets, we need to indicate its binding point in the set (in our case it is 0 because we have a separate descriptor set layout for this that only contains one binding for the combined image sampler).

    Notice that we need to specify the layout of the image when it will be sampled from the shaders, which needs to be VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. If you revisit the definition of our render pass for the shadow map image, you’ll see that we had it automatically transition the shadow map to this layout at the end of the render pass, so we know the shadow map image will be in this layout immediately after it has been rendered and we don’t need to add barriers to execute the layout transition manually.

    So that’s it: with this we have all the pieces, and our scene should be rendering shadows now. Unfortunately, we are not quite done yet; if you look at the results, you will notice a lot of dark noise on surfaces that are directly lit. This is an artifact of shadow mapping called self-shadowing or shadow acne. The next section explains how to get rid of it.

    Self-shadowing artifacts

    Eliminating self-shadowing

    Self-shadowing can happen for fragments on surfaces that are directly lit by a light source for which we are producing a shadow map. The reason is that these fragments’ Z coordinate in light space should exactly match the value we read from the shadow map for the same X,Y coordinates. In other words, for these fragments we expect:

    light_space_ndc.z == texture(shadow_map, shadow_map_coord.xy).x.
    

    However, due to different precision errors that can be generated on both sides of that equation, we may end up with slightly different values for each side, and when the value we produce for light_space_ndc.z ends up being larger than what we read from the shadow map, even if by a very small amount, the pixel will be marked as shadowed, leading to the result we see in that image.

    The usual way to fix this problem involves adding a small depth offset or bias to the depth values we store in the shadow map so we ensure that we always read a larger value from the shadow map for the fragment. Another way to think about this is to think that when we record the shadow map, we push every object in the scene slightly away from the light source. Unfortunately, this depth offset bias should not be a constant value, since the angle between the surface normals and the vectors from the light source to the fragments also affects the bias value that we should use to correct the self-shadowing.

    Thankfully, GPU hardware provides means to account for this. In Vulkan, when we define the rasterization state of the pipeline we use to create the shadow map, we can add the following:

    VkPipelineRasterizationStateCreateInfo rs;
    ...
    rs.depthBiasEnable = VK_TRUE;
    rs.depthBiasConstantFactor = 4.0f;
    rs.depthBiasSlopeFactor = 1.5f;
    

    Where depthBiasConstantFactor is a constant factor that is automatically added to all depth values produced, and depthBiasSlopeFactor is a factor used to compute depth offsets based on the angle (slope) of the primitive. This provides us with the means we need without having to do any extra work in the shaders ourselves to offset the depth values correctly. In OpenGL the same functionality is available via glPolygonOffset(), as sketched below.
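
    For reference, a rough OpenGL equivalent of the Vulkan state above could look like this (a sketch; the values mirror the ones used above and would need tuning per scene):

    // Enable depth offsets for filled polygons while rendering the shadow map.
    glEnable(GL_POLYGON_OFFSET_FILL);
    // glPolygonOffset(factor, units): 'factor' scales with the polygon's depth
    // slope and 'units' adds a constant offset, mirroring depthBiasSlopeFactor
    // and depthBiasConstantFactor respectively.
    glPolygonOffset(1.5f, 4.0f);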

    Notice that the bias values needed to obtain the best results can change for each scene. Also, notice that values that are too big can lead to shadows that are “detached” from the objects that cast them, leading to very unrealistic results. This effect is also known as Peter Panning, and can be observed in this image:

    Peter Panning artifacts

    As we can see in the image, we no longer have self-shadowing, but now we have the opposite problem: the shadows cast by the red and blue blocks are visibly incorrect, as if they were being rendered further away from the light source than they should be.

    If the bias values are chosen carefully, then we should be able to get a good result, although sometimes we might need to accept some level of visible self-shadowing or visible Peter Panning:

    Correct shadowing

    The image above shows correct shadowing without any self-shadowing or visible Peter Panning. You may wonder why we can’t see some of the shadows from the red light on the floor where the green light is more intense. The reason is that, even though it is not obvious (I don’t actually render the objects projecting the lights), the green light is mostly pointing down, so its reflection on the floor (which has normals pointing upwards) is strong enough that the contribution from the red light to the floor pixels in this area is insignificant in comparison, making the shadows cast by the red light barely visible. You can still see some shadowing if you get close enough with the camera though, I promise 😉

    Shadow antialiasing

    The images above show aliasing around the edges of the shadows. This happens because for each fragment we decide if it is shadowed or not as a boolean decision, and we use that result to fully shadow or fully light the pixel, leading to aliasing:

    Shadow aliasing

    Another thing contributing to the aliasing effect is that a single pixel in the shadow map image can expand to multiple pixels in camera space. That can happen, for example, if the camera is looking at an area of the scene that is close to the camera but far away from the light source. In that case, the resolution of that area of the scene in the shadow map is low, but it is high for the camera, meaning that we end up sampling the same texel from the shadow map to shadow larger areas of the scene as seen by the camera.

    Increasing the resolution of the shadow map image will help with this, but it is not a very scalable solution and can quickly become prohibitive. Alternatively, we can implement something called Percentage-Closer Filtering (PCF) to produce antialiased shadows. The technique is simple: instead of sampling just one texel from the shadow map, we take multiple samples in its neighborhood and average the results to produce shadow factors that do not need to be exactly 1 or 0, but can be somewhere in between, producing smoother transitions for shadowed pixels on the shadow edges. The more samples we take, the smoother the shadow edges get, but do note that extra samples per pixel also come with a performance cost.

    Smooth shadows with PCF

    This is how we can update our compute_shadow_factor() function to add PCF:

    float
    compute_shadow_factor(vec4 light_space_pos,
                          sampler2D shadow_map,
                          uint shadow_map_size,
                          uint pcf_size)
    {
       vec3 light_space_ndc = light_space_pos.xyz /= light_space_pos.w;
    
       if (abs(light_space_ndc.x) > 1.0 ||
           abs(light_space_ndc.y) > 1.0 ||
           abs(light_space_ndc.z) > 1.0)
          return 0.0;
    
       vec2 shadow_map_coord = light_space_ndc.xy * 0.5 + 0.5;
    
       // compute total number of samples to take from the shadow map
       int pcf_size_minus_1 = int(pcf_size - 1);
       float kernel_size = 2.0 * pcf_size_minus_1 + 1.0;
       float num_samples = kernel_size * kernel_size;
    
       // Counter for the shadow map samples not in the shadow
       float lighted_count = 0.0;
    
       // Take samples from the shadow map
       float shadow_map_texel_size = 1.0 / shadow_map_size;
       for (int x = -pcf_size_minus_1; x <= pcf_size_minus_1; x++)
       for (int y = -pcf_size_minus_1; y <= pcf_size_minus_1; y++) {
          // Compute coordinate for this PFC sample
          vec2 pcf_coord = shadow_map_coord + vec2(x, y) * shadow_map_texel_size;
    
          // Check if the sample is in light or in the shadow
          if (light_space_ndc.z <= texture(shadow_map, pcf_coord.xy).x)
             lighted_count += 1.0;
       }
    
       return lighted_count / num_samples;
    }
    

    We now have a loop where we go through the samples in the neighborhood of the texel and average their respective shadow factors. Notice that because we sample the shadow map in texture space [0, 1], we need to consider the size of the shadow map image to properly compute the coordinates of the texels in the neighborhood, so the application needs to provide this size for every shadow map.

    Conclusion

    In this post we discussed how to use the shadow map image to produce shadows in the scene, as well as typical issues that can show up with the shadow mapping technique, such as self-shadowing and aliasing, and how to correct them. This will be the last post in this series; there is a lot more to cover about lighting and shadowing, such as Cascaded Shadow Maps (which I introduced briefly in this other post), but I think (or I hope) that this series provides enough material to give anyone interested in the technique a reference for how to implement it.

    by Iago Toral at October 02, 2017 09:42 AM

    September 30, 2017

    Samuel Iglesias

    II Google Devfest Asturias 2017

    Today I am writing in the language of Cervantes to tell you that last Wednesday I was invited to give a talk about Vulkan at the II Google DevFest Asturias, organized by GDG Asturias. It is worth noting that this event was part of the VII Semana de Impulso TIC, organized by COIIPA and CITIPA, which is a great way to learn about what is being done in ICT in the Principality of Asturias.

    My talk focused on explaining which problems Vulkan aims to solve and which concepts this new API introduces for 3D graphics applications. I hope it is a useful talk for people who want to get to know Vulkan and already have some background in graphics.

    The slides of the talk are available here.

    GDG Asturias

    September 30, 2017 10:33 AM

    September 19, 2017

    Asumu Takikawa

    IPFIX app for Snabb

    As you know if you’ve been following this blog, at Igalia we build network functions using the Snabb toolkit. When we’re not directly working on customer projects, we often invest time into building new features into Snabb.

    One of these recent investments has been building a basic IP flow export (IPFIX) app for Snabb. The app is now available in the v2017.08 Snabb release. The app’s documentation is up on the web here.

    As you can see from the commit log, Andy Wingo helped out a great deal in building the app (and was responsible for much of the performance engineering).

    What is IP flow export?

    IPFIX refers to a widely used set of tools that let you monitor IP flows in your network. An IP flow is a set of IP packets that share some common characteristics. Often these are described by a flow key composed of a standard 5-tuple: source and destination addresses, protocol, and TCP/UDP/etc. ports.
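
    For illustration, such a 5-tuple key for IPv4 could be laid out like this (a sketch; the field names are illustrative, not the app’s actual definitions):

    #include <stdint.h>

    /* A 5-tuple flow key for IPv4: packets sharing these five values
     * belong to the same flow. */
    struct flow_key_v4 {
        uint32_t src_ip;     /* source IPv4 address */
        uint32_t dst_ip;     /* destination IPv4 address */
        uint8_t  protocol;   /* IP protocol identifier (TCP, UDP, ...) */
        uint16_t src_port;   /* transport-layer source port */
        uint16_t dst_port;   /* transport-layer destination port */
    } __attribute__((packed));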

    Monitoring flows can be useful for traffic measurement and management, usage-based billing, and other use cases (many of which are spelled out in RFC 3917).

    You may have also heard of “Netflow”, which is the trade name used by Cisco for these tools. IPFIX is the standardized version (see RFC 5470 and related RFCs) that is backwards compatible with Netflow.

    (A great long-form overview of the topic is “Flow Monitoring Explained” by Hofstede et al, which was very helpful in designing the app)

    The basic architecture of IPFIX consists of a flow metering process which monitors the traffic, an exporter that communicates the data collected by the meter, and finally a collector that aggregates the data. In practice, the metering process and exporter are often combined into one program (so I’ll just refer to it as an exporter or as a probe).

    Here’s some ASCII art from the RFC showing the structure:

                                 +----------------+     +----------------+
                                 |[*Application 1]| ... |[*Application n]|
                                 +--------+-------+     +-------+--------+
                                          ^                     ^
                                          |                     |
                                          + = = = = -+- = = = = +
                                                     ^
                                                     |
       +------------------------+            +-------+------------------+
       |IPFIX Exporter          |            | Collector(1)             |
       |[Exporting Process(es)] |<---------->| [Collecting Process(es)] |
       +------------------------+            +--------------------------+
               ....                                  ....
       +------------------------+           +---------------------------+
       |IPFIX Device(i)         |           | Collector(j)              |
       |[Observation Point(s)]  |<--------->| [Collecting Process(es)]  |
       |[Metering Process(es)]  |     +---->| [*Application(s)]         |
       |[Exporting Process(es)] |     |     +---------------------------+
       +------------------------+     .
              ....                    .              ....
       +------------------------+     |     +--------------------------+
       |IPFIX Device(m)         |     |     | Collector(n)             |
       |[Observation Point(s)]  |<----+---->| [Collecting Process(es)] |
       |[Metering Process(es)]  |           | [*Application(s)]        |
       |[Exporting Process(es)] |           +--------------------------+
       +------------------------+
    

    (what the RFC calls a “Device” contains some number of “metering processes”)

    The exporter might track information such as the total number of packets, payload sizes, and start/end times for a unique flow (these are called information elements and are standardized by the IANA).

    The exporter periodically sends its summarized data to a flow collector, which uses some kind of database to keep and track all of the flow information (unlike the exporter which may evict inactive flows). The collector can present this information to users in a variety of ways (e.g., a web UI).

    An exporter communicates with a collector using the IPFIX protocol, so that you can use an exporter with any off-the-shelf collector.

    Snabb app

    For Snabb, we implemented an IPFIX exporter as an app that can be integrated with other apps. For convenience, we provide a snabb ipfix probe commandline program that runs an exporter with a simple configuration.

    The flow keys and record fields to be collected are described using Lua tables, like the following:

    v4 = make_template_info {
       id     = 256,
       filter = "ip",
       keys   = { "sourceIPv4Address",
                  "destinationIPv4Address",
                  "protocolIdentifier",
                  "sourceTransportPort",
                  "destinationTransportPort" },
       values = { "flowStartMilliseconds",
                  "flowEndMilliseconds",
                  "packetDeltaCount",
                  "octetDeltaCount"}
    }
    

    This table describes a flow record that’s then used to dynamically generate the appropriate FFI data structures and configure the control flow of the app. The fields are described using the names from the IANA information element table.

    Since the exporter collects some information from all of the IP packets going through it, it has to be performant and use minimal resources.

    Making IPFIX perform well

    Performance was the most difficult part of implementing the IPFIX app. The core of the app is fairly simple. For each packet, it looks at the fields that define a flow (currently we just support the usual 5-tuple) and uses this as a flow key to index a ctable.

    Since ctables are implemented as hash tables, this means that hashing can be a key bottleneck for the app. This isn’t really surprising, since hashing is a core operation that is often optimized and also delegated to hardware in networking.

    It turns out that the hashing algorithm used by ctable performed well with keys of certain sizes (4 or 8 bytes), but not with larger keys such as flow keys. We tried out several hash algorithms (e.g., FNV hashing) and got some incremental improvements. In the end, Andy was able to get better performance out of implementing SipHash using DynASM.

    Andy also implemented a bunch of other performance improvements, such as separating out the IPv4 and IPv6 data paths and re-organizing the multiple original apps into a single app that contains several mini-queues and work loops. In the end, these improvements got the app to line-rate performance in our synthetic benchmark for most packet sizes (performance on 64-byte packet sizes needs more work).

    In terms of real-world performance, we still need to do more work but are making progress thanks to our early adopters. Alexander Gall from SWITCH has been providing us with valuable feedback from running and hacking on the app with production data.

    Future improvements

    At this point, we have an IPFIX app that can be integrated into other Snabb solutions or used standalone on commodity Intel hardware. There’s still a lot of work that can be done on it though. One idea for improvement on the performance side is to parallelize it using the RSS feature on 10G network cards. RSS (receive-side scaling) lets you use multiple processes running in parallel on several receive queues on a single NIC.

    Conveniently, it turns out that at Igalia we’ve also been working on improving Snabb’s network drivers to make it easier to use RSS. There’s a good chance that we can parallelize IPFIX very easily, since there’s minimal coordination that’s needed between IPFIX instances.

    There are a number of other improvements we could make too. Probably the most useful is to add support for more information elements, and to make the observed IEs more configurable.

    Another area for improvement is the app configuration. Snabb has support for app configuration with YANG schemas and the IETF has a schema for IPFIX that we could use.

    Another limitation is that the IPFIX app currently just exports over UDP to a collector. The RFCs technically require support for SCTP as well (adding that could be a lot of work; maybe we would offload it to an existing userspace library).

    Final thoughts

    Working on IPFIX was pretty fun overall. One of the rewarding aspects of working on the IPFIX app is that it’s relatively easy to test it and see that it’s doing something. For example, you can plug it up to Wireshark and manually check the IPFIX packets to see what’s going on.

    You can also plug it into an off-the-shelf flow collector like nfdump and see some useful output. In fact, we have a test written using nix-shell that will spawn a shell in a new Nix environment with nfdump installed and test the app with it. Nix can be very nice for this kind of test environment setup.

    In any case, please feel free to try out the app. We would appreciate any feedback or bug reports!

    by Asumu Takikawa at September 19, 2017 01:23 AM

    September 14, 2017

    Jacobo Aragunde

    Attending BlinkOn 8

    Next week I will be in Tokyo to attend BlinkOn 8! It will be a great opportunity to meet the Chromium community and share what we are doing.

    Godzilla at Shinjuku

    I will give a lightning talk about the challenges of making Chromium run on embedded platforms. I hope to spark the curiosity of the audience in this complex field!

    EDIT: some pictures from the event:

    by Jacobo Aragunde Pérez at September 14, 2017 05:28 PM

    September 09, 2017

    Carlos García Campos

    WebDriver support in WebKitGTK+ 2.18

    WebDriver is an automation API to control a web browser. It allows you to create automated tests for web applications, independently of the browser and platform. WebKitGTK+ 2.18, which will be released next week, includes an initial implementation of the WebDriver specification.

    WebDriver in WebKitGTK+

    There’s a new process (WebKitWebDriver) that works as the server, processing the clients’ requests to spawn and control the web browser. The WebKitGTK+ driver is not tied to any specific browser; it can be used with any WebKitGTK+ based browser, but it uses MiniBrowser as the default. The driver uses the same remote controlling protocol used by the remote inspector to communicate with and control the web browser instance. The implementation is not complete yet, but it’s enough for what many users need.

    The clients

    The web application tests are the clients of the WebDriver server. The Selenium project provides APIs for different languages (Java, Python, Ruby, etc.) to write the tests. Python is the only language supported by WebKitGTK+ for now. It’s not yet upstream, but we hope it will be integrated soon. In the meantime you can use our fork on GitHub. Let’s see an example to understand how it works and what we can do.

    from selenium import webdriver
    
    # Create a WebKitGTK driver instance. It spawns WebKitWebDriver 
    # process automatically that will launch MiniBrowser.
    wkgtk = webdriver.WebKitGTK()
    
    # Let's load the WebKitGTK+ website.
    wkgtk.get("https://www.webkitgtk.org")
    
    # Find the GNOME link.
    gnome = wkgtk.find_element_by_partial_link_text("GNOME")
    
    # Click on the link. 
    gnome.click()
    
    # Find the search form. 
    search = wkgtk.find_element_by_id("searchform")
    
    # Find the first input element in the search form.
    text_field = search.find_element_by_tag_name("input")
    
    # Type epiphany in the search field and submit.
    text_field.send_keys("epiphany")
    text_field.submit()
    
    # Let's count the links in the contents div to check we got results.
    contents = wkgtk.find_element_by_class_name("content")
    links = contents.find_elements_by_tag_name("a")
    assert len(links) > 0
    
    # Quit the driver. The session is closed so MiniBrowser 
    # will be closed and then WebKitWebDriver process finishes.
    wkgtk.quit()
    

    Note that this is just an example to show how to write a test and what kind of things you can do. There are better ways to achieve the same results, and it depends on the current source of public websites, so it might not work in the future.

    Web browsers / applications

    As I said before, WebKitWebDriver process supports any WebKitGTK+ based browser, but that doesn’t mean all browsers can automatically be controlled by automation (that would be scary). WebKitGTK+ 2.18 also provides new API for applications to support automation.

    • First of all, the application has to explicitly enable automation using webkit_web_context_set_automation_allowed(). It’s important to know that the WebKitGTK+ API doesn’t allow automation to be enabled in several WebKitWebContexts at the same time. The driver will spawn the application when a new session is requested, so the application should enable automation at startup. It’s recommended that applications add a new command line option to enable automation, and only enable it when provided.
    • After launching the application, the driver will request the browser to create a new automation session. The signal “automation-started” will be emitted in the context to notify the application that a new session has been created. If automation is not allowed in the context, the session won’t be created and the signal won’t be emitted either.
    • A WebKitAutomationSession object is passed as a parameter to the “automation-started” signal. This can be used to provide information about the application (name and version) to the driver, which will match it against what the client requires, accepting or rejecting the session request accordingly.
    • The WebKitAutomationSession will emit the signal “create-web-view” every time the driver needs to create a new web view. The application can then create a new window or tab containing the new web view that should be returned by the signal. This signal will always be emitted even if the browser already has an initial web view open; in that case it’s recommended to return the existing empty web view.
    • Web views are also automation-aware: similar to ephemeral web views, web views that allow automation should be created with the constructor property “is-controlled-by-automation” enabled (see the C sketch after this list).
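
    A minimal sketch in C of how a browser could wire these pieces together is shown below. The application name and the enable_automation() helper are illustrative, and window management and error handling are omitted:

    #include <webkit2/webkit2.h>
    
    static WebKitWebView *
    create_web_view_cb (WebKitAutomationSession *session, gpointer user_data)
    {
      /* Web views controlled by automation must be created with the
       * "is-controlled-by-automation" constructor property enabled. */
      return WEBKIT_WEB_VIEW (g_object_new (WEBKIT_TYPE_WEB_VIEW,
          "is-controlled-by-automation", TRUE, NULL));
    }
    
    static void
    automation_started_cb (WebKitWebContext *context,
        WebKitAutomationSession *session, gpointer user_data)
    {
      /* Provide the application name and version so that the driver
       * can match them against what the client requires. */
      WebKitApplicationInfo *info = webkit_application_info_new ();
      webkit_application_info_set_name (info, "MyBrowser");
      webkit_application_info_set_version (info, 1, 0, 0);
      webkit_automation_session_set_application_info (session, info);
      webkit_application_info_unref (info);
    
      g_signal_connect (session, "create-web-view",
          G_CALLBACK (create_web_view_cb), NULL);
    }
    
    /* Called at startup, only when e.g. an --automation command line
     * option was given. */
    static void
    enable_automation (WebKitWebContext *context)
    {
      webkit_web_context_set_automation_allowed (context, TRUE);
      g_signal_connect (context, "automation-started",
          G_CALLBACK (automation_started_cb), NULL);
    }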

    This is the new API that applications need to implement to support WebDriver. It’s designed to be as safe as possible, but there are many things that can’t be controlled by WebKitGTK+, so we have several recommendations for applications that want to support automation:

    • Add a way to enable automation in your application at startup, like a command line option, that is disabled by default. Never allow automation in a normal application instance.
    • Enabling automation is not the only thing the application should do, so add a dedicated automation mode to your application.
    • Add visual feedback when in automation mode, like changing the theme or the window title, or anything else that makes it clear that a window or instance of the application is controllable by automation.
    • Add a message to explain that the window is being controlled by automation and the user is not expected to use it.
    • Use ephemeral web views in automation mode.
    • Use a temporary user profile in automation mode; do not allow automation to change the history, bookmarks, etc. of an existing user.
    • Do not load any homepage in automation mode; just keep an empty web view (about:blank) that can be used when a new web view is requested by automation.

    The WebKitGTK client driver

    Applications need to implement the new automation API to support WebDriver, but the WebKitWebDriver process doesn’t know how to launch the browsers. That information should be provided by the client using the WebKitGTKOptions object. The driver constructor can receive an instance of a WebKitGTKOptions object with the browser information and other options. Let’s see how it works with an example that launches Epiphany:

    from selenium import webdriver
    from selenium.webdriver import WebKitGTKOptions
    
    options = WebKitGTKOptions()
    options.browser_executable_path = "/usr/bin/epiphany"
    options.add_browser_argument("--automation-mode")
    epiphany = webdriver.WebKitGTK(browser_options=options)
    

    Again, this is just an example; Epiphany doesn’t even support WebDriver yet. Browsers or applications could create their own drivers on top of the WebKitGTK one to make it more convenient to use.

    from selenium import webdriver
    epiphany = webdriver.Epiphany()
    

    Plans

    During the next release cycle, we plan to do the following tasks:

    • Complete the implementation: add support for all commands in the spec and complete the ones that are partially supported now.
    • Add support for running the WPT WebDriver tests in the WebKit bots.
    • Add a WebKitGTK driver implementation for other languages in Selenium.
    • Add support for automation in Epiphany.
    • Add WebDriver support to WPE/dyz.

    by carlos garcia campos at September 09, 2017 05:33 PM

    September 06, 2017

    Frédéric Wang

    Review of Igalia's Web Platform activities (H1 2017)

    Introduction

    For many years Igalia has been committed to, and dedicated efforts to, the improvement of the Web Platform in all open-source Web Engines (Chromium, WebKit, Servo, Gecko) and JavaScript implementations (V8, SpiderMonkey, ChakraCore, JSC). We have been working on the implementation and standardization of some important technologies (CSS Grid/Flexbox, ECMAScript, WebRTC, WebVR, ARIA, MathML, etc.). This blog post contains a review of these activities performed during the first half (and a bit more) of 2017.

    Projects

    CSS

    A few years ago Bloomberg and Igalia started a collaboration to implement a new layout model for the Web Platform. Bloomberg had complex layout requirements, and what the Web provided was not enough and caused performance issues. CSS Grid Layout seemed to be the right choice: a feature that would provide such complex designs with more flexibility than the methods available at the time.

    We’ve been implementing CSS Grid Layout in Blink and WebKit, initially behind some flags as an experimental feature. This year, after some coordination effort to ensure interoperability (talking to the different parties involved, like browser vendors, the CSS Working Group and the web authors community), it has been shipped by default in Chrome 58 and Safari 10.1. This is a huge step for layout on the web, and modern websites will benefit from this new model and enjoy all the features provided by the CSS Grid Layout spec.

    Since CSS Grid Layout shares the same alignment properties as the CSS Flexible Box feature, a new spec has been defined to generalize alignment for all the layout models. We started implementing this new spec as part of our work on Grid, with Grid being the first layout model to support it.

    Finally, we worked on other minor CSS features in Blink such as caret-color or :focus-within and also several interoperability issues related to Editing and Selection.

    MathML

    MathML is a W3C recommendation to represent mathematical formulae that has been included in many other standards such as ISO/IEC, HTML5, ebook and office formats. There are many tools available to handle it, including various assistive technologies as well as generators from the popular LaTeX typesetting system.

    After the improvements we made to WebKit’s MathML implementation, we have regularly been in contact with Google to see how we can implement MathML in Chromium. Early this year, we had several meetings with Google’s layout team to discuss this in further detail. We agreed that MathML is an important feature to consider for users and that the right approach would be to rely on the new LayoutNG model currently being implemented. We created a prototype for a small LayoutNG-based MathML implementation as a proof of concept and as a basis for future technical discussions. We are going to follow up on this after the end of Q3, once Chromium’s layout team has made more progress on LayoutNG.

    Servo

    Servo is Mozilla’s next-generation web content engine based on Rust, a language that guarantees memory safety. Servo relies on a Rust project called WebRender which replaces the typical rasterizer and compositor duo in the web browser stack. WebRender makes extensive use of GPU batching to achieve very exciting performance improvements in common web pages. Mozilla has decided to make WebRender part of the Quantum Render project.

    We’ve had the opportunity to collaborate with Mozilla for a few years now, focusing on the graphics stack. Our work has focused on bringing full support for CSS stacking and clipping to WebRender, so that it will be available in both Servo and Gecko. This has involved creating a data structure similar to what WebKit calls the “scroll tree” in WebRender. The scroll tree divides the scene into independently scrolled elements, clipped elements, and various transformation spaces defined by CSS transforms. The tree allows WebRender to handle page interaction independently of page layout, allowing maximum performance and responsiveness.

    WebRTC

    WebRTC is a collection of communications protocols and APIs that enable real-time communication over peer-to-peer connections. Typical use cases include video conferencing, file transfer, chat, or desktop sharing. Igalia has been working on the WebRTC implementation in WebKit and this development is currently sponsored by Metrological.

    This year we have continued the implementation effort in WebKit for the WebKitGTK and WebKit WPE ports, as well as the maintenance of two test servers for WebRTC: Ericsson’s p2p and Google’s apprtc. Finally, a lot of progress has been made to add support for Jitsi using the existing OpenWebRTC backend.

    Since OpenWebRTC is no longer an active project and libwebrtc is gaining traction in both Blink and the WebRTC implementation of WebKit for Apple software, we are taking the first steps to replace the original OpenWebRTC-based WebRTC implementation in WebKitGTK with a new one based on libwebrtc. Hopefully, this way we will share more code between platforms and get more robust WebRTC support for end users. GStreamer integration in this new implementation is an issue we will have to study, as it’s not built into libwebrtc. libwebrtc offers many services, but not every WebRTC implementation uses all of them. This seems to be the case for the Apple WebRTC implementation, and it may become our case too if we need tighter integration with GStreamer or hardware decoding.

    WebVR

    WebVR is an API that provides support for virtual reality devices in Web engines. Implementations and devices are currently being actively developed by browser vendors, and it looks like it is going to be a huge thing. Igalia has started to investigate this topic to see how we can join that effort. This year, we have been in discussions with Mozilla, Google and Apple to see how we could help with the implementation of WebVR on Linux. We decided to start experimenting with an implementation within WebKitGTK. We announced our intention on the webkit-dev mailing list and got encouraging feedback from Apple and the WebKit community.

    ARIA

    ARIA defines a way to make Web content and Web applications more accessible to people with disabilities. Igalia strengthened its ongoing commitment to the W3C: Joanmarie Diggs joined Richard Schwerdtfeger as a co-Chair of the W3C’s ARIA working group, and became editor of the Core Accessibility API Mappings, Digital Publishing Accessibility API Mappings (https://w3c.github.io/aria/dpub-aam/dpub-aam.html) and Accessible Name and Description: Computation and API Mappings specifications. Her main focus over the past six months has been to get ARIA 1.1 transitioned to Proposed Recommendation through a combination of implementation and bugfixing in WebKit and Gecko, creation of automated testing tools to verify platform accessibility API exposure on GNU/Linux and macOS, and working with fellow Working Group members to ensure the platform mappings stated in the various “AAM” specs are complete and accurate. We will provide more information about these activities after ARIA 1.1 and the related AAM specs are further along on their respective REC tracks.

    Web Platform Predictability for WebKit

    The AMP Project has recently sponsored Igalia to improve WebKit’s implementation of the Web platform. We have worked on many issues, the main ones being:

    • Frame sandboxing: Implementing sandbox values to allow trusted third-party resources to open unsandboxed popups or restrict unsafe operations of malicious ones.
    • Frame scrolling on iOS: Addressing issues with scrollable nodes; trying to move to a more standard and interoperable approach with scrollable iframes.
    • Root scroller: Finding a solution to the old interoperability issue about how to scroll the main frame; considering a new rootScroller API.

    This project aligns with Web Platform Predictability which aims at making the Web more predictable for developers by improving interoperability, ensuring version compatibility and reducing footguns. It has been a good opportunity to collaborate with Google and Apple on improving the Web. You can find further details in this blog post.

    JavaScript

    Igalia has been involved in design, standardization and implementation of several JavaScript features in collaboration with Bloomberg and Mozilla.

    On the implementation side, Bloomberg has been sponsoring the implementation of modern JavaScript features in V8, SpiderMonkey, JSC and ChakraCore, in collaboration with the open source community:

    • Implementation of many ES6 features in V8, such as generators, destructuring binding and arrow functions
    • Async/await and async iterators and generators in V8 and some work in JSC
    • Optimizing SpiderMonkey generators
    • Ongoing implementation of BigInt in SpiderMonkey and class field declarations in JSC

    On the design/standardization side, Igalia is active in TC39, with Bloomberg’s support.

    In partnership with Mozilla, Igalia has been involved in various JavaScript standard library features for internationalization: specification work, implementation in V8, code reviews in other JavaScript engines, as well as work with the underlying ICU library.

    Other activities

    Preparation of Web Engines Hackfest 2017

    Igalia has been organizing and hosting the Web Engines Hackfest since 2009. This event, held under an unconference format, has been a great opportunity for web engine developers to meet, discuss and work together on the web platform and on web engines in general. We announced the 2017 edition and many developers have already confirmed their attendance. We would like to thank our sponsors for supporting this event and we are looking forward to seeing you in October!

    Coding Experience

    Emilio Cobos has completed his coding experience program on the implementation of web standards. He has been working on the implementation of “display: contents” in Blink, but some work is pending due to unresolved CSS WG issues. He also started the corresponding work in WebKit, but that implementation is still very partial. It has been a pleasure to mentor a skilled hacker like Emilio and we wish him the best for his future projects!

    New Igalians

    During this semester we have been glad to welcome new Igalians who will help us pursue Web platform developments:

    • Daniel Ehrenberg joined Igalia in January. He is an active contributor to the V8 JavaScript engine and has been representing Igalia at the ECMAScript TC39 meetings.
    • Alicia Boya joined Igalia in March. She has experience in many areas of computing, including web development, computer graphics, networks, security, and performance-minded software design, which we believe will be valuable for our Web platform activities.
    • Ms2ger joined Igalia in July. He is a well-known hacker of the Mozilla community and has wide experience in both Gecko and Servo. He has notably worked on DOM implementation and web platform test automation.

    Conclusion

    Igalia has been involved in a wide range of Web Platform technologies, ranging from JavaScript and layout engines to accessibility and multimedia features. Efforts have been made in all parts of the process:

    • Participation in standardization bodies (W3C, TC39).
    • Elaboration of conformance tests (web-platform-tests, test262).
    • Implementation and bug fixes in all open source web engines.
    • Discussion with users, browser vendors and other companies.

    Although some of this work has been sponsored by Google or Mozilla, it is important to highlight how external companies (other than browser vendors) can make good contributions to the Web Platform, playing an important role in its evolution. Alan Stearns already pointed out the responsibility of Web Platform users in the evolution of CSS, while Rachel Andrew emphasized how any company or web author can effectively contribute to the W3C in many ways.

    As mentioned in this blog post, Bloomberg is an important contributor to several open source projects, and they’ve been a key player in the development of CSS Grid Layout and JavaScript. Similarly, Metrological’s support has been instrumental for the implementation of WebRTC in WebKit. We believe others could follow their example and we are looking forward to seeing more companies sponsoring Web Platform developments!

    September 06, 2017 10:00 PM

    August 31, 2017

    Xabier Rodríguez Calvar

    Some rough numbers on WebKit code

    My wife asked me for some rough LOC numbers on the WebKit project and I think I could share them with you here as well. They come from r221232. As I’ll take into account some generated code it is relevant to mention that I built WebKitGTK+ with the default CMake options.

    The first thing I did was run sloccount Source, which gave me the following numbers:

    cpp: 2526061 (70.57%)
    ansic: 396906 (11.09%)
    asm: 207284 (5.79%)
    javascript: 175059 (4.89%)
    java: 74458 (2.08%)
    perl: 73331 (2.05%)
    objc: 44422 (1.24%)
    python: 38862 (1.09%)
    cs: 13011 (0.36%)
    ruby: 11605 (0.32%)
    xml: 11396 (0.32%)
    sh: 3747 (0.10%)
    yacc: 2167 (0.06%)
    lex: 1007 (0.03%)
    lisp: 89 (0.00%)
    php: 10 (0.00%)

    These numbers do not include IDL code, so I did some grepping to get that number myself, which gave me 19632 IDL lines:

    $ find Source/ -name "*.idl" | xargs cat | grep -ve "^[[:space:]]*\/\*" -ve "^[[:space:]]*\*" -ve "^[[:space:]]*$" -ve "^[[:space:]]*\[$" -ve "^[[:space:]]*};$" | wc -l
    19632

    The interesting part of the IDL files is that they are used to generate code so those 19632 IDL lines expand to:

    ansic: 699140 (65.25%)
    cpp: 368720 (34.41%)
    python: 1492 (0.14%)
    xml: 1040 (0.10%)
    javascript: 883 (0.08%)
    asm: 169 (0.02%)
    perl: 11 (0.00%)

    Let’s have a look now at the LayoutTests (they test the functionality of WebCore + the platform). Tests are composed mainly of HTML files, so if you run sloccount LayoutTests you get:

    javascript: 401159 (76.74%)
    python: 87231 (16.69%)
    xml: 22978 (4.40%)
    php: 4784 (0.92%)
    ansic: 3661 (0.70%)
    perl: 2726 (0.52%)
    sh: 199 (0.04%)

    It’s quite interesting to see that sloccount does not consider HTML, which is quite relevant when you’re testing a web engine. So again, we have to count those lines manually (thanks to Carlos López, who helped me grep properly here, as some binary lines were giving me a headache):

    $ find LayoutTests/ -name "*.html" -print0 | xargs -0 cat | strings | grep -Pv "^[[:space:]]*$" | wc -l
    2205690

    That gives 2205690 “meaningful lines” combining HTML with the other languages shown above. I can’t simply subtract the previous numbers to get just the HTML lines: those came from files with extensions other than .html, while the HTML files themselves also include other languages, especially JavaScript.

    But the LayoutTests do not include only pure WebKit tests. There are some imported ones, so it might be interesting to run the same procedure under LayoutTests/imported to see which ones are imported and not written directly into the WebKit project. I emphasize that because they can be written by WebKit developers in other repositories; I can present myself and Youenn Fablet as an example, as we wrote some tests that were eventually moved into the specification and included back later when imported. So again, sloccount LayoutTests/imported:

    python: 84803 (59.99%)
    javascript: 51794 (36.64%)
    ansic: 3661 (2.59%)
    php: 575 (0.41%)
    xml: 250 (0.18%)
    sh: 199 (0.14%)
    perl: 86 (0.06%)

    The same procedure to count HTML + other stuff lines inside that directory gives a number of 295490:

    $ find LayoutTests/imported/ -name "*.html" -print0 | xargs -0 cat | strings | grep -Pv "^[[:space:]]*$" | wc -l
    295490

    There are also some other tests that we can talk about, for example the JSTests. I’ll just give the numbers summed up per language plus the manually counted HTML code (if you made it here, you know the drill already):

    javascript: 1713200 (98.64%)
    xml: 20665 (1.19%)
    perl: 2449 (0.14%)
    python: 421 (0.02%)
    ruby: 56 (0.00%)
    sh: 38 (0.00%)
    HTML+stuff: 997

    ManualTests:

    javascript: 297 (41.02%)
    ansic: 187 (25.83%)
    java: 118 (16.30%)
    xml: 103 (14.23%)
    php: 10 (1.38%)
    perl: 9 (1.24%)
    HTML+stuff: 16026

    PerformanceTests:

    javascript: 950916 (83.12%)
    cpp: 147194 (12.87%)
    ansic: 38540 (3.37%)
    asm: 5466 (0.48%)
    sh: 872 (0.08%)
    ruby: 419 (0.04%)
    perl: 348 (0.03%)
    python: 325 (0.03%)
    xml: 5 (0.00%)
    HTML+stuff: 238002

    TestsWebKitAPI:

    cpp: 44753 (99.45%)
    ansic: 163 (0.36%)
    objc: 76 (0.17%)
    xml: 7 (0.02%)
    javascript: 1 (0.00%)
    HTML+stuff: 3887

    And this is all. Remember that these are just some rough statistics, not a “scientific” paper.

    Update:

    In her expert opinion, in the WebKit project we are devoting around 50% of the total LOC to testing, which makes it a software engineering “textbook” project regarding testing, and I think we can be proud of it!

    by calvaris at August 31, 2017 09:03 AM

    August 29, 2017

    Frédéric Wang

    The AMP Project and Igalia working together to improve WebKit and the Web Platform

    TL;DR

    The AMP Project and Igalia have recently been collaborating to improve WebKit’s implementation of the Web platform. Both teams are committed to making the Web better and we expect that all developers and users will benefit from this effort. In this blog post, we review some of the bug fixes and features currently being considered:

    • Frame sandboxing: Implementing sandbox values to allow trusted third-party resources to open unsandboxed popups or restrict unsafe operations of malicious ones.

    • Frame scrolling on iOS: Trying to move to a more standard and interoperable approach via iframe elements; addressing miscellaneous issues with scrollable nodes (e.g. visual artifacts while scrolling, view not scrolled when using “Find Text”…).

    • Root scroller: Finding a solution to the old interoperability issue about how to scroll the main frame; considering a new rootScroller API.

    Some demo pages for frame sandboxing and scrolling are also available if you wish to test features discussed in this blog post.

    Introduction

    AMP is an open-source project to enable websites and ads that are consistently fast, beautiful and high-performing across devices and distribution platforms. Several interoperability bugs and missing features in WebKit have caused problems for AMP users and for Web developers in general. Although it is possible to add platform-specific workarounds to AMP, the best way to help the Web Platform community is to fix these issues directly in WebKit, so that everybody can benefit from the improvements.

    Igalia is a consulting company with a team dedicated to Web Platform developments in all open-source Web Engines (Chromium, WebKit, Servo, Gecko), working on the implementation and standardization of miscellaneous technologies (CSS Grid/Flexbox, ECMAScript, WebRTC, WebVR, ARIA, MathML, etc.). Given this expertise, the AMP Project sponsored Igalia to lead these developments in WebKit. It is worth noting that this project aligns with the Web Predictability effort supported by both Google and Igalia, which aims at making the Web more predictable for developers. In particular, the following aspects are considered:

    • Interoperability: Effort is made to write Web Platform Tests (WPT), to follow Web standards and ensure consistent behaviors between web engines or operating systems.
    • Compatibility: Changes are carefully analyzed using telemetry techniques or user feedback in order to avoid breaking compatibility with previous versions of WebKit.
    • Reducing footguns: Removals of non-standard features (e.g. CSS vendor prefixes) are attempted while new features are carefully introduced.

    Below we provide further description of the WebKit improvements, showing concretely how the above principles are followed.

    Frame sandboxing

    A sandbox attribute can be specified on the iframe element in order to enable a set of restrictions on any content it hosts. These restrictions can be relaxed by specifying a list of values such as allow-scripts (to allow JavaScript execution in the frame) or allow-popups (to allow the frame to open popups). By default, the same restrictions apply to a popup opened by a sandboxed frame.

    Figure 1: Example of sandboxed frames (Can they navigate their top frame or open popups? Are such popups also sandboxed?)

    However, sometimes this behavior is not wanted. Consider for example the case of an advertisement inside a sandboxed frame. If a popup is opened from this frame then it is likely that a non-sandboxed context is desired on the landing page. In order to handle this use case, a new allow-popups-to-escape-sandbox value has been introduced. The value is now supported in Safari Technology Preview 34.

    While performing that work, it was noticed that some WPT tests for the sandbox attribute were still failing. It turns out that WebKit does not really follow the rules for allowing navigation. More specifically, navigating a top context is never allowed when that context corresponds to an opened popup. We have made some changes to WebKit so that it behaves more closely to the specification. This is integrated into Safari Technology Preview 35, and you can for example try this W3C test. Note that this test requires changing preferences to allow popups.

    It is worth noting that web engines may slightly depart from the specification regarding the previously mentioned rules. In particular, WebKit checks a same-origin condition to be sure that one frame is allowed to navigate another one. WebKit has always contained a special case to ignore this condition when a sandboxed frame with the allow-top-navigation flag tries to navigate its top frame. This feature, sometimes known as “frame busting,” has been used by third-party resources to perform malicious auto-redirecting. As a consequence, Chromium developers proposed to restrict frame busting to the case where the navigation is triggered by a user gesture.

    According to Chromium’s telemetry, frame busting without a user gesture is very rare. But when experimenting with the behavior change of allow-top-navigation, several regressions were reported. Hence it was instead decided to introduce the allow-top-navigation-by-user-activation flag in order to provide this improved safety context while still preserving backward compatibility. We implemented this feature in WebKit and it is now available in Safari Technology Preview 37.

    Finally, another proposed security improvement is to use an allow-modals flag to explicitly allow sandboxed frames to display modal dialogs (with alert, prompt, etc). That is, the default behavior for sandboxed frames will be to forbid such modal dialogs. Again, such a change of behavior must be done with care. Experiments in Chromium showed that the usage of modal dialogs in sandboxed frames is very low and no users complained. Hence we implemented that behavior in WebKit and the feature should arrive in Safari Technology Preview soon.

    Check out the frame sandboxing demos if you want to test the new allow-popups-to-escape-sandbox, allow-top-navigation-by-user-activation and allow-modals flags.

    Frame scrolling on iOS

    Apple’s UI choice was to (almost) always “flatten” (expand) frames so that users do not need to scroll them. The rationale for this is that it avoids users getting trapped in a hierarchy of nested frames. Changing that behavior is likely to cause a big backward compatibility issue on iOS, so for now we proposed a less radical solution: add a heuristic to support the case of “fullscreen” iframes used by the AMP Project. Note that such exceptions already exist in WebKit, e.g. to avoid making offscreen content visible.

    We thus added the following heuristic into WebKit Nightly: do not flatten out-of-flow iframes (e.g. position: absolute) that have viewport units (e.g. vw and vh). This includes the case of the “fullscreen” iframe previously mentioned. For now it is still under a developer flag so that WebKit developers can control when they want to enable it. Of course, if this is successful we might consider more advanced heuristics.

    The fact that frames are never scrollable in iOS is an obvious interoperability issue. As a workaround, it is possible to emulate such “scrollable nodes” behavior using overflow: scroll nodes with the -webkit-overflow-scrolling: touch property set. This is not really ideal for our Web Predictability goal as we would like to get rid of browser vendor prefixes. Also, in practice such workarounds lead to even more problems in AMP as explained in these blog posts. That’s why implementing scrolling of frames is one of the main goals of this project and significant steps have already been made in that direction.

    Figure 2: C++ classes involved in frame scrolling

    The (relatively complex) class hierarchy involved in frame scrolling is summarized in Figure 2. The frame flattening heuristic mentioned above is handled in the WebCore::RenderIFrame class (in purple). The WebCore::ScrollingTreeFrameScrollingNodeIOS and WebCore::ScrollingTreeOverflowScrollingNodeIOS classes from the scrolling tree (in blue) are used to scroll, respectively, the main frame and overflow nodes on iOS. Scrolling of non-main frames will obviously have some code to share with the former, but it will also have some parts in common with the latter. For example, passing an extra UIScrollView layer is needed instead of relying on the one contained in the WKWebView of the main frame. An important step is thus to introduce a special class for scrolling inner frames that would share some logic from the two other classes and some refactoring to ensure optimal code reuse. Similar refactoring has been done for scrolling node states (in red) to move the scrolling layer parameter into WebCore::ScrollingStateNode instead of having separate members for WebCore::ScrollingStateOverflowScrollingNode and WebCore::ScrollingStateFrameScrollingNode.

    The scrolling coordinator classes (in green) are also important, for example to handle hit testing. At the moment, this is not really implemented for overflow nodes but it might be important to have it for scrollable frames. Again, one sees that some logic is shared for asynchronous scrolling on macOS (WebCore::ScrollingCoordinatorMac) and iOS (WebCore::ScrollingCoordinatorIOS) in ancestor classes. Indeed, our effort to make frames scrollable on iOS is also opening the possibility of asynchronous scrolling of frames on macOS, something that is currently not implemented.

    Figure 4: Video of this demo page on WebKit iOS with experimental patches to make frames scrollable (2017/07/10)

    Finally, some more work is necessary in the render classes (purple) to ensure that the layer hierarchies are correctly built. Patches have been uploaded and you can view the result on the video of Figure 4. Notice that this work has not been reviewed yet and there are known bugs, for example with overlapping elements (hit testing not implemented) or position: fixed elements.

    Various other scrolling bugs were reported, analyzed and sometimes fixed by Apple. The switch from overflow nodes to scrollable iframes is unlikely to address them. For example, the “Find Text” operation in iOS has advanced features done by the UI process (highlight, smart magnification) but the scrolling operation needed only works for the main frame. It looks like this could be fixed by unifying a bit the scrolling code path with macOS. There are also several jump and flickering bugs with position: fixed nodes. Finally, Apple fixed inconsistent scrolling inertia used for the main frame and the one used for inner scrollable nodes by making the former the same as the latter.

    Root Scroller

    The CSSOM View specification extends the DOM element with some scrolling properties. That specification indicates that the element to consider for scrolling the main view is document.body in quirks mode while it is document.documentElement in no-quirks mode. This is the behavior that has always been followed by browsers like Firefox or Internet Explorer. However, WebKit-based browsers always treat document.body as the root scroller. This interoperability issue has been a big problem for web developers. One convenient workaround was to introduce the document.scrollingElement property, which returns the element to use for scrolling the main view (document.body or document.documentElement) and was recently implemented in WebKit. Use this test page to verify whether your browser supports the document.scrollingElement property and which DOM element is used to scroll the main view in no-quirks mode.

    Nevertheless, this does not solve the issue with existing web pages. Chromium’s Web Platform Predictability team has made a huge communication effort with Web authors and developers, which has drastically reduced the use of document.body in no-quirks mode. For instance, Chromium’s telemetry on Figure 3 indicates that the percentage of document.body.scrollTop in no-quirks pages has gone from 18% down to 0.0003% during the past three years. Hence the Chromium team is now considering shipping the standard behavior.

    Figure 3: Use of document.body.scrollTop in no-quirks mode over time (Chromium's UseCounter)

    In WebKit, the issue has been known for a long time and an old attempt to fix it was reverted for causing regressions. For now, we imported the CSSOM View tests and just marked the one related to the scrolling element as failing. An analysis of the situation has been left on WebKit’s bug; depending on how things evolve on Chromium’s side, we could continue the discussion and implementation work in WebKit.

    Related to that work, a new API is being proposed to set the root scroller to an arbitrary scrolling element, giving more flexibility to authors of Web applications. Today, this is unfortunately not possible without losing some of the special features of the main view (e.g. on iOS, Safari’s URL bar is hidden when scrolling the main view to maximize the screen space). Such an API is currently being experimented with in Chromium and we plan to investigate whether it can be implemented in WebKit too.

    Conclusion

    In the past months, the AMP Project and Igalia have worked on analyzing some interoperability issues and fixing them in WebKit. Many improvements for frame sandboxing are going to be available soon. Significant progress has also been made on frame scrolling on iOS, and collaboration continues with Apple reviewers to ensure that the work will be integrated in future versions of WebKit. Improvements to “root scrolling” are also being considered, although they depend on how the related issues evolve on Chromium’s side. All these efforts are expected to be useful for WebKit users and the Web platform in general.


    Last but not least, I would like to thank Apple engineers Simon Fraser, Chris Dumez, and Youenn Fablet for their reviews and help, as well as Google and the AMP team for supporting that project.

    August 29, 2017 10:00 PM

    August 24, 2017

    Eleni Maria Stea

    A terrain rendering approach (part 1)

    There are several methods to create and display a terrain in real time. In this post, I will explain the approach I followed in the demo I’m writing for my work at Igalia. Some work is still in progress.

    The terrain had to meet the following requirements:

    • its size should be arbitrary
    • parts outside the viewer’s field of view should be culled

    Parameters that describe the terrain

    For reasons that will become obvious when I explain the terrain generation and drawing, I decided to use a heightfield made of tiles, and I used the following parameters to generate it:

    /* parameters needed in terrain generation */
    
    struct TerrainParams {
        float xsz; /* terrain size in x axis */
        float ysz; /* terrain size in y axis */
        float max_height; /* max height of the heightfield */
        int xtiles; /* number of tiles in x axis */
        int ytiles; /* number of tiles in y axis */
        int tile_usub;
        int tile_vsub;
        int num_octaves; /* Perlin noise sums */
        float noise_freq; /* Perlin noise scaling factor */
        Image coarse_heightmap; /* mask for low detail heightmap */
    };

    Let’s explain them a little bit:

    Imagine the terrain as an xsz * ysz grid of tiles where each tile is subdivided into usub * vsub smaller parts. Each grid point can have an “arbitrary” height (we’ll see later how we calculate it) that cannot exceed the maximum height: max_height. The number of tiles in each terrain axis is xtiles for the x-axis and ytiles for the y-axis (which is practically the z-axis of our 3D space). The variables tile_usub and tile_vsub give the number of subdivisions of each tile.

    Image 1: a wireframe terrain

    Note that in general I use the u, v notation in normalized spaces and x, y in world space.

    The variables num_octaves and noise_freq and the coarse_heightmap image are used to calculate the heights at different terrain points; they will be explained later.

    Generating the geometry, applying the textures

    We’ve already seen that the terrain is a grid of xsz * ysz with xtiles * ytiles tiles, each subdivided by tile_usub in the x-axis and by tile_vsub in the z-axis. In order to generate a height at every point of this grid, I needed to calculate uniformly distributed random values, like those of Perlin Noise (PN). But I also needed some higher distortions for “mountains” and “hills” here and there, which cannot be simulated with PN alone. For them I used a function that calculates the sum of several Perlin noise frequencies and results in a fractal-like heightfield. The number of PN octaves and the frequency (num_octaves, noise_freq) can be customized depending on how much distortion we want.
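
    For illustration, such a sum of octaves (often called fractal Brownian motion) can be sketched as follows; perlin2() here stands for any 2D Perlin noise function returning values in [-1, 1], and is not the actual function used in the demo:

    /* sketch: sum num_octaves of Perlin noise, doubling the frequency
     * and halving the amplitude at each octave */
    float fbm2(float x, float y, int num_octaves)
    {
        float sum = 0.0f, amplitude = 1.0f, frequency = 1.0f;
        for(int i = 0; i < num_octaves; i++) {
            /* each octave adds finer detail with a smaller contribution */
            sum += perlin2(x * frequency, y * frequency) * amplitude;
            frequency *= 2.0f;
            amplitude *= 0.5f;
        }
        return sum;
    }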

    Having a terrain with spread-out, random-looking mountains is nice, but it would be much better if we could further customize its appearance to make it more suitable for our scene. For example, I wanted to place a number of objects in the middle of the terrain, so I wanted it to look more flat in the center. Also, I wanted to put some high mountains at the edges to hide some skybox parts (for example mountains and buildings that were part of the skybox texture and looked ugly). In order to modify the terrain shape, I used a grayscale image, the coarse_heightmap, as a mask, and I multiplied the terrain height values with the mask intensities (I’ll explain this better later). The mask was like the image below:

    Image 2: terrain coarse heightmap

    Since the heightmap is black at the center (=> the intensity values there are close to 0), the height in the middle will be low and the terrain will look flat, whereas at the edges, where the intensity takes its maximum values, the heights will stay close to their original values. If you take a more careful look at Image 1 and Image 2 you will understand better the relationship between the terrain heights and the mask intensities.

    This is the function that generates each tile’s heightfield; it calculates the height at each point as a sum of Perlin noise frequencies:

    float Terrain::get_height(float u, float v) const
    {
        float sn = gph::fbm(u * params.noise_freq, 
                   v * params.noise_freq,
                   params.num_octaves);
        sn = sn * 0.5 + 0.5;
    
        if(params.coarse_heightmap.pixels) {
            Vec4 texel = params.coarse_heightmap.lookup_linear(u, v,
                         1.0 / params.tile_usub, 
                         1.0 / params.tile_vsub);
            sn *= texel.x;
        }
        return sn;
    }

    Note that the function performs the calculations in u-v space. I found it convenient to write a similar function for the world space to use it in other places.

    Now, let’s see how we use the coarse_heightmap on top of that:

    The coarse_heightmap image (Image 2) has a very low resolution of 128x128, which means that if we just looked up the closest pixel value for each terrain point, and the terrain is big, many terrain points would map to the same pixel and we’d start seeing aliasing. To avoid this artifact, I used bilinear interpolation among the pixel’s neighboring pixels, taking into account the distance of the terrain point from each pixel of the neighborhood.
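
    A bilinear lookup over such a low-resolution mask could be sketched as follows; this is a hypothetical helper working on a w x h grayscale mask stored as floats in [0, 1], not the demo’s actual lookup_linear implementation:

    /* sample the mask at normalized coordinates (u, v) in [0, 1],
     * blending the 4 pixels surrounding the sample point */
    float mask_lookup_bilinear(const float *mask, int w, int h, float u, float v)
    {
        float x = u * (w - 1), y = v * (h - 1);
        int x0 = (int)x, y0 = (int)y;
        int x1 = x0 + 1 < w ? x0 + 1 : x0;
        int y1 = y0 + 1 < h ? y0 + 1 : y0;
        float tx = x - x0, ty = y - y0;
    
        /* blend along x on the two nearest rows, then blend along y */
        float top = mask[y0 * w + x0] * (1.0f - tx) + mask[y0 * w + x1] * tx;
        float bot = mask[y1 * w + x0] * (1.0f - tx) + mask[y1 * w + x1] * tx;
        return top * (1.0f - ty) + bot * ty;
    }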

    Image 3: terrain and skybox

    The following video is a preview of this early stage of the demo where you can see the terrain from different views (as well as a cow called Spot :P):

    Improvements:

    Fog:

    A good trick to make the terrain edges appear more distant than they really are, and to get a more realistic horizon, is to add some fog that is denser at the more distant terrain points and less dense at the points close to the viewer. The fog can be simulated by interpolating the terrain color at each pixel with a color that looks like the sky (I used a light blue here). The interpolation factor must be a function of the distance between the point and the viewer. I used the following function:

    exp(-density * distance)

    (taken from the glFog OpenGL manpage). I hard-coded the density to a value that looked good and I used -pos.z as the distance, where pos is the vertex position in view space.
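
    In code, the fog blend could be sketched like this (density is assumed to be a hand-tuned constant and dist is the -pos.z value mentioned above):

    #include <math.h>
    
    /* exponential fog factor: 1.0 right at the viewer, approaching
     * 0.0 far away, following the glFog manpage formula */
    float fog_factor(float density, float dist)
    {
        float f = expf(-density * dist);
        return f < 0.0f ? 0.0f : (f > 1.0f ? 1.0f : f);
    }
    
    /* final color per channel: fog_color * (1 - f) + terrain_color * f */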

    Image 4 shows the result of this operation:

    Image 4: the terrain with the fog and a cow 😉

    Image-based lighting:

    The idea behind IBL is that instead of using point lights with a fixed position in our 3D world, we can calculate the lighting as the radiance coming from each skybox pixel. For this purpose, we need an irradiance map. I avoided the overhead of calculating one by using a nice tool I found on GitHub, cmft, to pre-calculate it from the images of my skybox. Then, I used the normal direction at each terrain point (in world space coordinates) to sample the map and I calculated the color inside the pixel shader:

    /* texel is the terrain texture sample computed earlier in the shader */
    vec4 itexel = textureCube(dstex, normalize(world_normal));
    vec3 object_color = diffuse.xyz * texel.xyz * itexel.xyz;

    As you can guess, world_normal is the normal in world space (after the modelling transformation) and object_color is the color we get if we multiply the diffuse color from the material with the texel value (from the terrain texture) and the irradiance map texel value (itexel). The result of this operation can be seen in this video:

    TODO LIST

    Although the terrain looks better with these additions, I think that it can be further improved if I add the following things:

    1. Multiple textures for different parts of the terrain
    2. (Maybe) specular color calculated by the irradiance map (since atm I have support for diffuse only)
    Performance optimisations:

    I also want to optimize its performance by adding:

    1. View Frustum Culling: the idea is that we only draw the terrain tiles that are visible (part of the view frustum).
    2. Tessellation shaders: use TC and TE shaders to improve the terrain tessellation.

    Some of the above TODOs are in progress and some are still on my wishlist, but analyzing all of them here would make this post TL;DR anyway.

    I will post about these additions as soon as I finish them, so stay tuned! 😉

    by hikiko at August 24, 2017 07:59 AM

    August 20, 2017

    Hyunjun Ko

    Support GstContext for VA-API elements

    Since I started working on gstreamer-vaapi, one of the things that has disappointed me is that vaapisink is not so popular, even though it should be the best choice on a machine with VA-API installed. There are some reasonable causes, and one of them is probably that it doesn’t provide a convenient way to be integrated by application developers.

    Until now, we provided a way to set the X11 window handle by gst_video_overlay_set_window_handle, which tells the overlay to display the video output in a specific window. But this is not enough, since the VA and X11 display handles are managed internally inside the gstreamer-vaapi elements, which means that users can’t handle them by themselves.

    In short, there was no way to share a display handle created by the application. This also caused some additional problems, such as the following:

    • If users want to handle multiple displays separately, it isn’t possible. bug 754820
    • If users run multiple decoding pipelines with vaapisink, performance drops critically, since there are some locks in each vaapisink sharing the same VADisplay. bug 747946

    Recently we have merged a series of patches that provide a way to set an external VA display and X11 display from the application via GstContext. GstContext provides a means of sharing context not only between elements but also with the application, using queries and messages. (For more details, see https://developer.gnome.org/gstreamer/stable/gstreamer-GstContext.html)

    With these patches, an application can set its own VA display and X11 display on the VA-API elements as follows:

    • Create a VADisplay instance with vaGetDisplay; it doesn’t need to be initialized at startup or terminated at exit.
    • Call gst_element_set_context with a context on which each display instance has been set (see the sketch right after this list).
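
    If the displays are created before the pipeline starts, the context can also be pushed proactively to the pipeline, which distributes it to its elements. A small sketch, assuming va_display, x11_display and pipeline already exist:

    GstContext *context = gst_context_new ("gst.vaapi.app.Display", TRUE);
    GstStructure *s = gst_context_writable_structure (context);
    
    /* same structure fields as in the bus callback example below */
    gst_structure_set (s, "va-display", G_TYPE_POINTER, va_display, NULL);
    gst_structure_set (s, "x11-display", G_TYPE_POINTER, x11_display, NULL);
    gst_element_set_context (GST_ELEMENT (pipeline), context);
    gst_context_unref (context);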

    Example: sharing a VADisplay and an X11 display with the bus callback; this is almost the same as other examples using GstContext.

    static GstBusSyncReply
    bus_sync_handler (GstBus * bus, GstMessage * msg, gpointer data)
    {
      switch (GST_MESSAGE_TYPE (msg)) {
        case GST_MESSAGE_NEED_CONTEXT:{
          const gchar *context_type;
          GstContext *context;
          GstStructure *s;
          VADisplay va_display;
          Display *x11_display;
    
          gst_message_parse_context_type (msg, &context_type);
          gst_println ("Got need context %s from %s", context_type,
              GST_MESSAGE_SRC_NAME (msg));
    
          if (g_strcmp0 (context_type, "gst.vaapi.app.Display") != 0)
            break;
    
          x11_display = XOpenDisplay (NULL); /* or get the X11 Display some other way */
          va_display = vaGetDisplay (x11_display);
    
          context = gst_context_new ("gst.vaapi.app.Display", TRUE);
          s = gst_context_writable_structure (context);
          gst_structure_set (s, "va-display", G_TYPE_POINTER, va_display, NULL);
          gst_structure_set (s, "x11-display", G_TYPE_POINTER, x11_display, NULL);
    
          gst_element_set_context (GST_ELEMENT (GST_MESSAGE_SRC (msg)), context);
          gst_context_unref (context);
          break;
        }   
        default:
          break;
      }
    
      return GST_BUS_PASS;
    }

    Also you can find the entire example code here.

    Furthermore, we know we need to support Wayland for this feature. See bug 705821. There are already some pending patches, but they need to be rebased and adapted to the current approach. I’ll be working on this in the near future.

    We really want to test this feature more, especially in practical cases, before the next release. I’d appreciate it if someone reported any bug or issue, and I promise I’ll look into it carefully.

    Thanks for reading!

    August 20, 2017 03:00 PM