Planet Igalia

February 18, 2019

Neil Roberts

VkRunner at FOSDEM

I attended FOSDEM again this year thanks to funding from Igalia. This time I gave a talk about VkRunner in the graphics dev room. The talk is now available on Igalia’s YouTube channel.

I thought this might be a good opportunity to give a small status update of what has happened since my last blog post nearly a year ago.

Test suite integration

The biggest news is that VkRunner is now integrated into Khronos’ Vulkan CTS test suite and Mesa’s Piglit test suite. This means that if you work on a feature or a bugfix in your Vulkan driver and you want to make sure it doesn’t get regressed, it’s now really easy to add a VkRunner test for it and have it collected in one of these test suites. For Piglit all that is needed is to give the test script a .vk_shader_test extension and drop it anywhere under the tests/vulkan folder and it will automatically be picked up by the Piglit framework. As an added bonus, these tests are also run automatically on Intel’s CI system, so if your test is related to i965 in Mesa you can be sure it will not be regressed.

On the Khronos CTS side the integration is currently a little less simple. With help from Samuel Iglesias, we have merged a branch into master that lays the groundwork for adding VkRunner tests. Currently there are only proof-of-concept tests to show how the tests could work. Adding more tests still requires tweaking the C++ code, so it’s not quite as simple as we might hope.

API

When VkRunner is built, it now also builds a static library containing a public API. This can be used to integrate VkRunner into a larger test suite. Indeed, the Khronos CTS integration takes advantage of this to execute the tests using the VkDevice created by the test suite itself. This also means it can execute multiple tests quickly without having to fork an external process.

The API is intended to be very high-level and is as close as possible to just having simple functions that ask VkRunner to execute a test script and return an enum reporting whether the test succeeded or not. There is an example of its usage in the README.
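
For illustration, a minimal C program using the API might look like the sketch below. The function and type names here are recalled from memory and may not match the current header exactly; the example in the README is the authoritative reference.

/* Sketch of the high-level VkRunner API; names are illustrative and may
 * differ from the real vkrunner.h header. */
#include <vkrunner/vkrunner.h>

int
main(void)
{
        struct vr_config *config = vr_config_new();
        struct vr_executor *executor = vr_executor_new(config);
        struct vr_source *source = vr_source_from_file("example.shader_test");

        /* Run the script and get a single enum back */
        enum vr_result result = vr_executor_execute(executor, source);

        vr_source_free(source);
        vr_executor_free(executor);
        vr_config_free(config);

        return result == VR_RESULT_PASS ? 0 : 1;
}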

Precompiled shader scripts

One of the concerns raised when integrating VkRunner into CTS is that it’s not ideal to have to run glslang as an external process in order to compile the shaders in the scripts to SPIR-V. To work around this, I added the ability to have scripts with binary shaders. In this case the 32-bit integer numbers of the compiled SPIR-V are just listed in ASCII in the shader test instead of the GLSL source. Of course writing this by hand would be a pain, so the VkRunner repo includes a Python script to precompile a bunch of shaders in a batch. This can be really useful to run the tests on an embedded device where installing glslang isn’t practical.
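
As a sketch of the idea, such a script section might look like the following, where the first word is the SPIR-V magic number (0x07230203). The exact section name and number format are assumptions here, so check the README for the precise syntax.

[compute shader binary]
07230203 00010000 00080001 0000002e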

However, in the end for the CTS integration we took a different approach. The CTS suite already has a mechanism to precompile all of the shaders for all tests. We wanted to take advantage of this also when compiling the shaders from VkRunner tests. To make this work, Samuel added some functions to the VkRunner API to query the GLSL shaders in a VkRunner script and then replace them with binary equivalents. That way the CTS suite can use these functions to replace the shaders with its cached compiled versions.

UBOs, SSBOs and compute shaders

One of the biggest missing features mentioned in my last post was UBO and SSBO support. This has now been fixed with full support for setting values in UBOs and SSBOs and also probing the results of writing to SSBOs. Probing SSBOs is particularly useful alongside another added feature: compute shaders. Thanks to this we can run our shaders as compute shaders to calculate some results into an SSBO and probe the buffer to see whether it worked correctly. Here is an example script to show how that might look:

[compute shader]
#version 450

/* UBO input containing an array of vec3s */
layout(binding = 0) uniform inputs {
        vec3 input_values[4];
};

/* A matrix to apply to these values. This is stored in a push
 * constant. */
layout(push_constant) uniform transforms {
        mat3 transform;
};

/* An SSBO to store the results */
layout(binding = 1) buffer outputs {
        vec3 output_values[];
};

void
main()
{
        uint i = gl_WorkGroupID.x;

        /* Transform one of the inputs */
        output_values[i] = transform * input_values[i];
}

[test]
# Set some input values in the UBO
ubo 0 subdata vec3 0 \
  3 4 5 \
  1 2 3 \
  1.2 3.4 5.6 \
  42 11 9

# Create the SSBO
ssbo 1 1024

# Store a matrix uniform to swap the x and y
# components of the inputs
push mat3 0 \
  0 1 0 \
  1 0 0 \
  0 0 1

# Run the compute shader with one instance
# for each input
compute 4 1 1

# Check that we got the expected results in the SSBO
probe ssbo vec3 1 0 ~= \
  4 3 5 \
  2 1 3 \
  3.4 1.2 5.6 \
  11 42 9

Extensions in the requirements section

The requirements section can now contain the name of any extension. If this is done then VkRunner will check for the availability of the extension when creating the device and enable it. Otherwise it will report that the test was skipped. A lot of the Vulkan extensions also add an extended features struct to be used when creating the device. These features can also be queried and enabled for extensions that VkRunner knows about simply by listing the name of the feature in that struct. For example, if shaderFloat16 is listed in the requirements section, VkRunner will check for the VK_KHR_shader_float16_int8 extension and the shaderFloat16 feature within its extended feature struct. This makes it really easy to test optional features.
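
For instance, based on the description above, a test that needs 16-bit float arithmetic in shaders would just list the feature in its requirements section, and VkRunner takes care of the extension check (a minimal sketch):

[require]
shaderFloat16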

Cross-platform support

I spent a fair bit of time making sure VkRunner works on Windows including compiling with Visual Studio. The build files have been converted to CMake which makes building on Windows even easier. It also compiles for Android thanks to patches from Jaebaek Seo. The repo contains Android build files to build the library and the vkrunner executable. This can be run directly on a device using adb.

User interface

There is a branch containing the beginnings of a user interface for editing VkRunner scripts. It presents an editor widget via GTK and continuously runs the test script in the background as you are editing it. It then displays the results in an image and reports any errors in a text field. The test is run in a separate process so that if it crashes it doesn’t bring down the user interface. I’m not sure whether it makes sense to merge this branch into master, but in the meantime it can be a convenient way to fiddle with a test when it fails and it’s not obvious why.

And more…

Lots of other work has been going on in the background. The best way to get more details on what VkRunner can do is to take a look at the README. This has been kept up-to-date as the source of documentation for writing scripts.

by nroberts at February 18, 2019 05:23 PM

February 17, 2019

Eleni Maria Stea

i965: Improved support for the ETC/EAC formats on Intel Gen 7 and previous GPUs

This post is about a recent contribution I made to the i965 Mesa driver to improve the emulation of the ETC/EAC texture formats on Intel Gen 7 and older GPUs, as part of my work for Igalia’s graphics team. Demo: The video mostly shows the behavior of some GL calls and operations with … Continue reading i965: Improved support for the ETC/EAC formats on Intel Gen 7 and previous GPUs

by hikiko at February 17, 2019 04:45 PM

February 13, 2019

Víctor Jáquez

Generating a GStreamer-1.14 bundle for TravisCI with Ubuntu/Trusty

To have continuous integration with TravisCI in your multimedia project hosted on GitHub, you may want to compile and run tests with a recent version of GStreamer. Nonetheless, TravisCI mainly offers Ubuntu Trusty as one of the possible distributions to deploy in their CI, and that distribution packages GStreamer 1.2, which might be a bit old for your project’s requirements.

A solution to this problem is to provide TravisCI with your own GStreamer bundle containing the version you want to compile and test your project against, in this case 1.14. This post is the recipe I followed to generate that GStreamer bundle with GstGL support.

There are three main issues:

  1. The packaged libglib version is too old; we can only hope that we will not hit an ABI breakage while running the CI.
  2. The packaged ffmpeg version is too old.
  3. As we want to compile GStreamer using gst-build, we need a recent version of meson, which requires Python 3.5, not available in Trusty.

schroot

Old habits die hard: I have used schroot to handle chroot environments for years without complaint. It takes care of bind-mounting /proc, /sys and all that repetitive stuff that seals the isolation of the chrooted environment.

The debootstrap variant I use is buildd because it installs the build-essential package.

$ sudo mkdir /srv/chroot/gst-trusty64
$ sudo debootstrap --arch=amd64 --variant=buildd trusty ./gst-trusty64/ http://archive.ubuntu.com/ubuntu
$ sudo vim /etc/schroot/chroot.d/gst

This is the schroot configuration I will use. Please adapt it to your needs.

[gst]
description=Ubuntu Trusty 64-bit for GStreamer
directory=/srv/chroot/gst-trusty64
type=directory
users=vjaquez
root-users=vjaquez
profile=default
setup.fstab=default/vjaquez-home.fstab

I am overriding the default fstab file with a custom one where the home directory of the vjaquez user points to a clean directory.

$ mkdir -p ~/home-chroot/gst
$ sudo vim /etc/schroot/default/vjaquez-home.fstab
# fstab: static file system information for chroots.
# Note that the mount point will be prefixed by the chroot path
# (CHROOT_PATH)
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/proc           /proc           none    rw,bind         0       0
/sys            /sys            none    rw,bind         0       0
/dev            /dev            none    rw,bind         0       0
/dev/pts        /dev/pts        none    rw,bind         0       0
/home           /home           none    rw,bind         0       0
/home/vjaquez/home-chroot/gst   /home/vjaquez   none    rw,bind 0       0
/tmp            /tmp            none    rw,bind         0       0

configure chroot environment

We will get into the chroot environment as superuser in order to add the required packages. For that purpose we add the universe repository in apt.

  • libglib requires: autotools-dev gnome-pkg-tools libtool libffi-dev libelf-dev libpcre3-dev desktop-file-utils libselinux1-dev libgamin-dev dbus dbus-x11 shared-mime-info libxml2-utils
  • Python requires: libssl-dev libreadline-dev libsqlite3-dev
  • GStreamer requires: bison flex yasm python3-pip libasound2-dev libbz2-dev libcap-dev libdrm-dev libegl1-mesa-dev libfaad-dev libgl1-mesa-dev libgles2-mesa-dev libgmp-dev libgsl0-dev libjpeg-dev libmms-dev libmpg123-dev libogg-dev libopus-dev liborc-0.4-dev libpango1.0-dev libpng-dev libpulse-dev librtmp-dev libtheora-dev libtwolame-dev libvorbis-dev libvpx-dev libwebp-dev pkg-config unzip zlib1g-dev
  • And for general setup: language-pack-en ccache git curl
$ schroot --user root --chroot gst
(gst)# sed -i "s/main$/main universe/g" /etc/apt/sources.list
(gst)# apt update
(gst)# apt upgrade
(gst)# apt --no-install-recommends --no-install-suggests install \
autotools-dev gnome-pkg-tools libtool libffi-dev libelf-dev \
libpcre3-dev desktop-file-utils libselinux1-dev libgamin-dev dbus \
dbus-x11 shared-mime-info libxml2-utils \
libssl-dev libreadline-dev libsqlite3-dev \
language-pack-en ccache git curl bison flex yasm python3-pip \
libasound2-dev libbz2-dev libcap-dev libdrm-dev libegl1-mesa-dev \
libfaad-dev libgl1-mesa-dev libgles2-mesa-dev libgmp-dev libgsl0-dev \
libjpeg-dev libmms-dev libmpg123-dev libogg-dev libopus-dev \
liborc-0.4-dev libpango1.0-dev libpng-dev libpulse-dev librtmp-dev \
libtheora-dev libtwolame-dev libvorbis-dev libvpx-dev libwebp-dev \
pkg-config unzip zlib1g-dev

Finally we create our installation prefix, in this case /opt/gst, to avoid contaminating /usr/local. Then we log out as root.

(gst)# mkdir -p /opt/gst
(gst)# chown vjaquez /opt/gst
(gst)# exit

compile ffmpeg 3.2

Now, let’s log in again, but as the unprivileged user, to build the bundle, starting with ffmpeg. Notice that we are using ccache and building out-of-source.

$ schroot --chroot gst
(gst)$ git clone https://git.ffmpeg.org/ffmpeg.git ffmpeg
(gst)$ cd ffmpeg
(gst)$ git checkout -b work n3.2.12
(gst)$ mkdir build
(gst)$ cd build
(gst)$ ../configure --disable-static --enable-shared \
--disable-programs --enable-pic --disable-doc --prefix=/opt/gst 
(gst)$ PATH=/usr/lib/ccache/:${PATH} make -j8 install

compile glib 2.48

(gst)$ cd ~
(gst)$ git clone https://gitlab.gnome.org/GNOME/glib.git
(gst)$ cd glib
(gst)$ git checkout -b work origin/glib-2-48
(gst)$ mkdir mybuild
(gst)$ cd mybuild
(gst)$ ../autogen.sh --prefix=/opt/gst
(gst)$ PATH=/usr/lib/ccache/:${PATH} make -j8 install

install Python 3.5

Pyenv is a project that automates installing and running multiple versions of Python in the user’s home directory.

(gst)$ curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash
(gst)$ ~/.pyenv/bin/pyenv install 3.5.0

Install meson 0.50

We will install the latest available version of meson in the user’s home directory; that is why PATH is extended and exported.

(gst)$ cd ~
(gst)$ ~/.pyenv/versions/3.5.0/bin/pip3 install --user meson
(gst)$ export PATH=${HOME}/.local/bin:${PATH}

build GStreamer 1.14

PKG_CONFIG_PATH is exported to expose the compiled versions of ffmpeg and glib. Notice that --libdir=lib is passed so the libraries are installed in /opt/gst/lib, in order to avoid the dispersion of pkg-config files.

(gst)$ cd ~/
(gst)$ export PKG_CONFIG_PATH=/opt/gst/lib/pkgconfig/
(gst)$ git clone https://gitlab.freedesktop.org/gstreamer/gst-build.git
(gst)$ cd gst-build
(gst)$ git checkout -b work origin/1.14
(gst)$ meson -Denable_python=false \
-Ddisable_gst_libav=false -Ddisable_gst_plugins_ugly=true \
-Ddisable_gst_plugins_bad=false -Ddisable_gst_devtools=true \
-Ddisable_gst_editing_services=true -Ddisable_rtsp_server=true \
-Ddisable_gst_omx=true -Ddisable_gstreamer_vaapi=true \
-Ddisable_gstreamer_sharp=true -Ddisable_introspection=true \
--prefix=/opt/gst build --libdir=lib
(gst)$ ninja -C build install

test!

(gst)$ cd ~/
(gst)$ LD_LIBRARY_PATH=/opt/gst/lib \
GST_PLUGIN_SYSTEM_PATH=/opt/gst/lib/gstreamer-1.0/ \
/opt/gst/bin/gst-inspect-1.0

And the list of available elements shall be shown.

archive the bundle

(gst)$ cd ~/
(gst)$ tar zpcvf gstreamer-1.14-x86_64-linux-gnu.tar.gz -C /opt ./gst

update your .travis.yml

These are the packages you will need to add in order to run this generated GStreamer bundle:

  • libasound2-plugins
  • libfaad2
  • libfftw3-single3
  • libjack-jackd2-0
  • libmms0
  • libmpg123-0
  • libopus0
  • liborc-0.4-0
  • libpulsedsp
  • libsamplerate0
  • libspeexdsp1
  • libtdb1
  • libtheora0
  • libtwolame0
  • libwayland-egl1-mesa
  • libwebp5
  • libwebrtc-audio-processing-0
  • liborc-0.4-dev
  • pulseaudio
  • pulseaudio-utils

And these are the before_install and before_script targets:

      before_install:
        - curl -L http://server.example/gstreamer-1.14-x86_64-linux-gnu.tar.gz | tar xz
        - sed -i "s;prefix=/opt/gst;prefix=$PWD/gst;g" $PWD/gst/lib/pkgconfig/*.pc
        - export PKG_CONFIG_PATH=$PWD/gst/lib/pkgconfig
        - export GST_PLUGIN_SYSTEM_PATH=$PWD/gst/lib/gstreamer-1.0
        - export GST_PLUGIN_SCANNER=$PWD/gst/libexec/gstreamer-1.0/gst-plugin-scanner
        - export PATH=$PATH:$PWD/gst/bin
        - export LD_LIBRARY_PATH=$PWD/gst/lib:$LD_LIBRARY_PATH

      before_script:
        - pulseaudio --start
        - gst-inspect-1.0 | grep Total

by vjaquez at February 13, 2019 07:43 PM

February 11, 2019

Víctor Jáquez

Review of Igalia’s Multimedia Activities (2018/H2)

This is the first semiyearly report about Igalia’s activities around multimedia, covering the second half of 2018.

A good part of this report was covered in Phil’s talk surveying multimedia development in WebKitGTK and WPE:

WebKit Media Source Extensions (MSE)

MSE is a specification that allows JS to generate media streams for playback in Web browsers that support HTML5 video and audio.

Last semester we upstreamed support for the WebM format in WebKitGTK, with the related patches in GStreamer, particularly in the qtdemux and matroskademux elements.

WebKit Encrypted Media Extensions (EME)

EME is a specification for enabling playback of encrypted content in Web browsers that support HTML5 video.

In a downstream project for WPE WebKit we managed to have almost full test coverage in the YoutubeTV 2018 test suite.

We merged our contributions upstream in WebKit and GStreamer (at least what is legal to publish), for example, making demuxers aware of encrypted content and having them send protection events with the initialization data and the encrypted caps, in order to select the decryption key later.

We started to coordinate the upstreaming process of a new implementation of the CDM (Content Decryption Module) abstraction, and there will even be changes in that abstraction.

Lightning talk about the EME implementation in WPE/WebKitGTK at the GStreamer Conference 2018.

WebKit WebRTC

WebRTC consists of several interrelated APIs and real-time protocols that enable Web applications and sites to capture audio or A/V streams, and exchange them between browsers without requiring an intermediary.

We added GStreamer interfaces to LibWebRTC, to use it for the network part, while using GStreamer for the media capture and processing. All that was upstreamed in 2018 H2.

Thibault described thoroughly the tasks done for this achievement.

Talk about WebRTC implementation in WPE/WebKitGTK in WebEngines hackfest 2018.

Servo/media

Servo is a browser engine written in Rust designed for high parallelization and high GPU usage.

We added basic support for <video> and <audio> media elements in Servo. Later on, we added the GstreamerGL bindings for Rust in gstreamer-rs to render GL textures from the GStreamer pipeline in Servo.

Lightning talk at the GStreamer Conference 2018.

GstWPE

Taking an idea from the GStreamer Conference, we developed a GStreamer source element that wraps WPE. With this source element, it is possible to blend a web page and video in a single video stream; that is, the output of a Web browser (namely, a rendered web page) is used as a video source of a GStreamer pipeline: GstWPE. The element is already merged in the gst-plugins-bad repository.

Talk about GstWPE in FOSDEM 2019

Demo #1

Demo #2

GStreamer VA-API and gst-MSDK

Last, but not least, we continued helping with the maintenance of GStreamer-VAAPI and gst-msdk, with code reviewing and the ongoing migration of the internal library to GObject.

Other activities

The second half of 2018 was also intense in terms of conferences and hackfests for the team:


Thanks for bearing with us through this blog post and for keeping our work on your radar.

by vjaquez at February 11, 2019 12:52 PM

February 10, 2019

Manuel Rego

Summary of a week in Lyon for TPAC 2018

Last October Igalia participated in TPAC 2018 with 12 people. I believe it was the biggest presence of Igalians at this event ever, probably because of its proximity to many of us (it happened in Lyon), but also reflecting our increasing presence in the web platform ecosystem.

Apart from TPAC itself, Igalia also participated in the W3C Developers Meetup that happened the very same week, where I gave a talk about how to contribute to CSS (more about that later).

Igalia booth at TPAC

In the Igalia booth we were showcasing some of our latest developments with different demos running on embedded devices, in which you could find our more recent work around the web platform (WebRTC, MSE, CSS Grid Layout, CSS Box Alignment, MathML, etc.). These demos were using WPE, a WebKit port developed by Igalia and optimized for low-end platforms.

Igalia booth at TPAC 2018

Be part of CSS evolution

As I mentioned in the introduction, I gave a talk in the W3C Developers Meetup. My talk was called “Be part of CSS evolution” and it tried to explain how the CSS Working Group works and also how anyone can have a direct impact on the development of CSS specifications by raising issues, providing feedback, explaining use cases, etc.

The slides of the talk can be found on this blog and the video has recently been published on Vimeo.

Video of my talk “Be part of CSS evolution”

MathML

Most of the people that came to our booth asked about this topic; it’s clear there are a lot of people interested in MathML. As you might already know, Igalia has been looking for funding to implement MathML in Chromium, and during TPAC we got the confirmation that NISO will be sponsoring an important part of this work.

At TPAC there were several concerns about the future of MathML, and a TAG review was requested just after the conference. Igalia has been in conversations with many people since TPAC: TAG members, Google engineers and folks interested in reviving the MathML specification. Last month a new MathML Refresh Community Group was created and the TAG review was closed with a positive answer regarding the future of MathML.

On top of the specs work, Chromium implementation is in progress and more news will be released soon at mathml.igalia.com about the status of things. If your company would like to support MathML please don’t hesitate to contact us. Stay tuned!

Other

In my case I was attending the CSS Working Group and Houdini Task Force meetings; it’s always a pleasure to share a room with so many brilliant people working hard on defining the future of CSS. Several people were quite interested in the work Igalia has recently been doing around the CSS Containment specification (more info in my previous post). It seems this spec has some potential to become relevant regarding web rendering performance.

Apart from that, there were a bunch of interesting breakout sessions on the technical plenary day. I’d like to highlight the one given by fantasai and Marcos Cáceres about Spec Editing Best Practices; it was really interesting to understand how both write specs to make things easier for people reading them.

“Spec Editing Best Practices” notes by fantasai

Last but not least, despite spending most of the day at TPAC, we found some time to enjoy Lyon during dinners at night; it looks like a nice city.

February 10, 2019 11:00 PM

January 30, 2019

Samuel Iglesias

VkRunner is integrated into VK-GL-CTS and piglit

One of the greatest features of piglit was the easy development of OpenGL tests based on GLSL shaders plus some simple commands, through the shader_runner command. I even wrote about it.

However, the Vulkan ecosystem was missing a tool like that for SPIR-V shader tests… until last year!

Vulkan Logo

VkRunner is a tool written by Neil Roberts, heavily inspired by shader_runner. VkRunner was the result of Igalia’s work to enable the ARB_gl_spirv extension for Intel’s i965 driver on Mesa, where there was a need to test the driver code against a good number of shaders to be sure that it was fine.

VkRunner uses a script language to define the requirements needed to run the test, such as the needed extensions and features, the shaders to be run, and a series of commands to run them. It will then parse everything and execute the equivalent Vulkan commands under the hood, like shader_runner did for OpenGL in piglit.

This is an example of what a VkRunner test looks like:

[compute shader]
#version 450

layout(std140, push_constant) uniform push_constants {
        float in_value;
};

layout(std140, binding = 0) buffer ssbo {
        float out_value;
};

void
main()
{
        out_value = sqrt(in_value);
}

[test]
# Allocate an ssbo big enough for a float at binding 0
ssbo 0 4

# Set the push constant as an input value
uniform float 0 4

compute 1 1 1

# Probe that we got the expected value
tolerance 0.00006% 0.00006% 0.00006% 0.00006%
probe ssbo float 0 0 ~= 2

The end of 2018 was great for VkRunner! First, the tool was integrated into piglit so we can now use it in this amazing open-source testing suite for 3D graphics drivers. Soon after, it was integrated into Khronos Group’s Vulkan and OpenGL Conformance Test Suite (see commit), which will help contributors to easily write SPIR-V tests on Vulkan.

If you want to learn more about VkRunner, apart from browsing the repository, Neil wrote a nice blogpost explaining the tool basics, gave a lightning talk at XDC 2018 (slides) in A Coruña and now, he is going to give a talk in the graphics devroom at FOSDEM 2019! You can follow his FOSDEM talk on Saturday via live-stream (or see the recording afterwards) in case you are not going to FOSDEM this year :-)

Igalia

FOSDEM 2019

January 30, 2019 07:00 AM

January 29, 2019

Mario Sanchez Prada

Working on the Chromium Servicification Project

It’s been a few months already since I (re)joined Igalia as part of its Chromium team and I couldn’t be happier about it: right from the very first day I felt perfectly integrated in the team, and quickly started making my way through the (fully upstream) project that would keep me busy during the following months: the Chromium Servicification Project.

But what is this “Chromium servicification project“? Well, according to the Wiktionary the word “servicification” means, applied to computing, “the migration from monolithic legacy applications to service-based components and solutions”, which is exactly what this project is about: as described in the Chromium servicification project’s website, the whole purpose behind this idea is “to migrate the code base to a more modular, service-oriented architecture”, in order to “produce reusable and decoupled components while also reducing duplication”.

Doing so would not only make Chromium a more manageable project from a source code-related point of view and create better and more stable interfaces to embed Chromium from different projects, but should also enable teams to experiment with new features by combining these services in different ways, as well as to ship different products based on Chromium without having to bundle the whole world just to provide a particular set of features.

For instance, as Camille Lamy put it in the talk delivered (slides here) during the latest Web Engines Hackfest,  “it might be interesting long term that the user only downloads the bits of the app they need so, for instance, if you have a very low-end phone, support for VR is probably not very useful for you”. This is of course not the current status of things yet (right now everything is bundled into a big executable), but it’s still a good way to visualise where this idea of moving to a services-oriented architecture should take us in the long run.

With this in mind, the idea behind this project would be to work on the migration of the different parts of Chromium depending on those components that are being converted into services, which would be part of a “foundation” base layer providing the core services that any application, framework or runtime built on top of Chromium would need.

As you can imagine, the whole idea of refactoring such an enormous code base as Chromium’s is daunting and a lot of work, especially considering that currently ongoing efforts can’t simply be stopped just to perform this migration, and that is where our focus currently lies: we integrate with the different teams from the Chromium project working on the migration of those components into services, and we make sure that the clients of their old APIs move away from them and use the new services’ APIs instead, while keeping everything running normally in the meantime.

At the beginning, we started working on the migration to the Network Service (which makes it possible to run Chromium’s network stack even without a browser) and managed to get it shipped in Chromium Beta by early October already, which was a pretty big deal as far as I understand. In my particular case, that stage was a very short ride since the migration was nearly done by the time I joined Igalia, but it is still something worth mentioning due to the impact it had on the project, for extra context.

After that, our team started working on the migration of the Identity service, where the main idea is to encapsulate the functionality of accessing the user’s identities right through this service, so that one day this logic can be run outside of the browser process. One interesting bit about this migration is that this particular functionality (largely implemented inside the sign-in component) has historically been located quite high up in the stack, and yet it’s now being pushed all the way down into that “foundation” base layer, as a core service. That’s probably one of the factors contributing to making this migration quite complicated, but everyone involved is being very dedicated and has been very helpful so far, so I’m confident we’ll get there in a reasonable time frame.

If you’re curious enough, though, you can check this status report for the Identity service, where you can see the evolution of this particular migration, along with the impact our team had since we started working on this part, back on early October. There are more reports and more information in the mailing list for the Identity service, so feel free to check it out and/or subscribe there if you like.

One clarification is needed, though: for now, the scope of these migrations is focused on using the public C++ APIs that such services expose (see //services/<service_name>/public/cpp), but in the long run the idea is that those services will also provide Mojo interfaces. That will enable using their functionality regardless of whether you’re running those services as part of the browser’s process, or inside their own separate processes, which will then allow the flexibility that Chromium will need to run smoothly and safely in different kinds of environments, from the least constrained ones to others with a less favourable set of resources at their disposal.
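
To give an idea of the shape of those interfaces, here is a tiny, entirely hypothetical Mojo interface definition; it is not the actual Identity service API, just a sketch of what clients would eventually consume through Mojo instead of the C++ classes:

// Hypothetical .mojom sketch, not the real Identity service interface.
module identity.mojom;

interface IdentityAccessor {
  // Returns the identifier of the primary account, if any.
  GetPrimaryAccountId() => (string? account_id);
};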

And this is it for now, I think. I was really looking forward to writing a status update about what I’ve been up to in the past months and here it is, even though it’s not the shortest of all reports.

One last thing, though: as usual, I’m going to FOSDEM this year as well, along with a bunch of colleagues & friends from Igalia, so please feel free to drop me/us a line if you want to chat and/or hang out, either to talk about work-related matters or anything else really.

And, of course, I’d also be more than happy to talk about any of the open job positions at Igalia, should you consider applying. There are quite a few of them available at the moment for all kinds of things (most of them available for remote work): from more technical roles such as graphics, compilers, multimedia, JavaScript engines, browsers (WebKit, Chromium, Web Platform) or systems administration (this one not available for remote work, though), to other less “hands-on” types of roles like developer advocate, sales engineer or project manager, so it’s possible there’s something interesting for you if you’re considering joining such a special company as this one.

See you in FOSDEM!

by mario at January 29, 2019 06:35 PM

Samuel Iglesias

Improving my emacs setup

I need to start this post mentioning the reason I improved my Emacs setup after so many years with no change in it. Funny enough, I need to say thanks to Visual Studio Code :-D

Last month, I came across the blogpost “10 years of love for Emacs undone by a week of VSCode”. As I have been using Emacs for almost a decade, I wondered if that could be true… so I tried Visual Studio Code!

During my testing period, I found that IntelliSense worked like a charm, I loved the “peek declaration” feature, it has a huge number of extensions that provide almost anything you want, and it is well supported on GNU/Linux, including GDB and terminal support. This was the first Microsoft product in years that I considered worth using every day :-O

Screenshot of Visual Studio Code

This experience made me think about what I was missing in my Emacs setup and did not know about before. I realized that I missed having multiple cursors, a good source code tagging system, a modern theme (yes, why not?), markdown support and, if possible, an integrated way to check Pull Requests on Github and Merge Requests on Gitlab. I found a way to have everything in Emacs except the Gitlab Merge Requests integration, due to a failed installation of the gitlab package. Now I am a much happier Emacs user than one month ago, and I need to say thanks to Visual Studio Code :-P
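
For illustration, this is the kind of snippet (not a literal excerpt from my config) that adds one of those missing pieces, multiple cursors, via use-package:

;; Not a literal excerpt from my config; just the general idea.
(use-package multiple-cursors
  :ensure t
  :bind (("C-S-c C-S-c" . mc/edit-lines)
         ("C->" . mc/mark-next-like-this)))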

In case you want to test it, I have pushed my emacs.d/ config to Github. Be aware this is not the final version… I plan to improve it in the future.

Screenshot of Emacs

January 29, 2019 07:00 AM

January 27, 2019

Adrián Pérez

Web Engines Hackfest 2018 → FOSDEM 2019

The last quarter of 2018 has been a quite hectic one, and every time I had some spare time after the Web Engines Hackfest the prospect of sitting down to write some thoughts about it seemed dreadful. Christmas went by already —two full weeks of holidays, practically without touching a computer— and suddenly I found myself booking tickets to this year's FOSDEM and it just hit me: it is about time to get back to blogging!

FOSDEM

There is not much that I would want to add about FOSDEM, an event which I have attended a number of times before (and some others about which I have not even blogged). This is an event I always look forward to, and the one single reason that keeps me coming back is recharging my batteries.

This may seem contradictory because the event includes hundreds of talks and workshops tucked in just two days. Don't get me wrong, the event is physically tiresome, but there are always tons of new and exciting topics to learn about and many Free/Libre Software communities being represented, which means that there is a contagious vibe of enthusiasm. This makes me go back home with the will to do more.

Last but not least, FOSDEM is one of these rare events in which I get to meet many people who are dear to me — in some cases spontaneously, even without knowing we all would be attending. See you in Brussels!

Web Engines Hackfest

Like in previous years, the Web Engines Hackfest has been hosted by Igalia, in the lovely city of A Coruña. Every year the number of participants has been increasing, and we hit the mark of 70 people in the 2018 edition.

Are We GTK+4 Yet?

This time I was looking forward to figuring out how to bring WebKitGTK+ into the future, and in particular to GTK+4. We had a productive discussion with Benjamin Otte which helped a great deal to understand how the GTK+ scene graph works, and how to approach the migration to the new version of the toolkit in an incremental way. And he happens to be a fan of Factorio, too!

In its current incarnation the WebKitWebView widget needs to use Cairo as the final step to draw its contents, because that is how widgets work, while widgets in GTK+4 populate nodes of a scene graph with the contents they need to display. The “good” news is that it is possible to populate a render node using a Cairo surface, which will allow us to keep the current painting code. While it would be more optimal to avoid Cairo altogether and let WebKit paint using the GPU on textures that the scene graph would consume directly, I expect this to make the initial bringup more approachable, and allow building WebKitGTK+ both for GTK+3 and GTK+4 from the same code base. There will be room for improvements, but at least we expect performance to be on par with the current WebKitGTK+ releases running on GTK+3.
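
As a rough sketch of that approach (GTK+4 is still a moving target, so the exact API may change, and the function name here is made up), the widget's snapshot function could hand a Cairo context backed by a render node to the existing painting code:

/* Sketch only: GTK+4 API as of this writing, subject to change. */
#include <gtk/gtk.h>

static void
web_view_snapshot (GtkWidget *widget, GtkSnapshot *snapshot)
{
    graphene_rect_t bounds;

    graphene_rect_init (&bounds, 0, 0,
                        gtk_widget_get_width (widget),
                        gtk_widget_get_height (widget));

    /* The returned cairo_t draws into a node of the scene graph. */
    cairo_t *cr = gtk_snapshot_append_cairo (snapshot, &bounds);
    /* ... reuse the existing Cairo painting code here ... */
    cairo_destroy (cr);
}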

An ideal future: paint Web content in the GPU, feed textures to GTK+.

While not needing to modify our existing rendering pipeline should help, and probably just having the WebKitWebView display something on GTK+4 should not take that much effort, the migration will still be a major undertaking involving significant changes like switching input event handling to use GtkEventController, and it will not precisely be a walk in the park.

As of this writing, we are not (yet) actively working on supporting GTK+4, but rest assured that it will eventually happen. There are other ideas we have on the table to provide a working Web content rendering widget for GTK+4, but that will be a topic for another day.

The MSE Rush

At some point people decided that it would be a good idea to allow Web content to play videos, and thus the <video> and <audio> tags were born. All was good and swell until people wanted playback to adapt to different types of network connections and multiple screen resolutions (phones, tablets, cathode ray tubes, cinema projectors...). The “solution” is to serve video and audio in multiple small chunks of varying qualities, which are then chosen, downloaded, and stitched together while the content is being played back. Sci-fi? No: Media Source Extensions.

A few days before the hackfest it came to our attention that a popular video site stopped working with WebKitGTK+ and WPE WebKit. The culprit: The site started requiring MSE in some cases, without supporting a fallback anymore, our MSE implementation was disabled by default, and when enabled it showed a number of bugs which made it hardly possible to watch an entire video in one go.

During the Web Engines Hackfest a few of us worked tirelessly, sometimes into the wee hours, to make MSE work well. We managed to crank out no less than two WebKitGTK+ releases (and one for WPE WebKit) which fixed most of the rough edges, making it possible to have MSE enabled and working.

And What Else?

To be completely honest, shipping the releases with a working MSE implementation made the hackfest pass in a blur and I cannot remember much else other than having a great time meeting everybody, and having many fascinating conversations — often around a table sharing good food. And that is already good motivation to attend again next year 😉

by aperez (adrian@perezdecastro.org) at January 27, 2019 09:45 PM

Eleni Maria Stea

Hair simulation with a mass-spring system (punk’s not dead!)

Hair rendering and simulation can be challenging, especially in real-time. There are many sophisticated algorithms for it (based on particle systems, hair mesh simulation, mass-spring systems and more) that can give very good results. But in this post, I will try to explain a simple and somehow hacky approach I followed in my first attempt to … Continue reading Hair simulation with a mass-spring system (punk’s not dead!)

by hikiko at January 27, 2019 08:34 PM

January 17, 2019

Gyuyoung Kim

The story of the webOS Chromium contribution over the past year

In this article, I share how I started the webOS Chromium upstreaming work, what webOS patches were contributed by LG Electronics, and how I’ve contributed to Chromium.

First, let’s briefly describe the history of webOS. WebOS was created by Palm, Inc. Palm, Inc. was acquired by HP in 2010 and HP made the platform open source, so it then became Open webOS. In January 2014 the operating system was sold to LG Electronics. LG Electronics has been shipping webOS on their TV and signage products since then, and has also been spreading it to more of their products.

The webOS uses Chromium to run web applications, so Chromium is a very important component of the webOS. Like other Chromium embedders, the webOS also has many downstream patches. LG Electronics has therefore tried to contribute its own downstream patches to the Chromium open source project, both to reduce the effort of catching up to the latest Chromium version and to improve the quality of the downstream patches. As one of LG Electronics’ contractors, I’ve been working on the webOS Chromium contribution since September 2017, for the last one and a half years. So, let’s explain the process of contributing:

1. The Corporate CLA

The Chromium project only accepts patches after the contributor signs a Contributor License Agreement (CLA). There are two kinds of CLA: one for individual contributors and one for corporate contributors. If a company signs the corporate CLA, then its individual contributors are exempt from signing an individual CLA; however, they must use their corporate email address and join the Google group which was created when the corporate CLA was signed. LG Electronics signed the corporate CLA and was added to the AUTHORS file.

  • Corporate Contributor License Agreement (Link)
  • Individual Contributor License Agreement (Link)

2. List upstreamable patches in webOS

After finishing the registration of the corporate CLA, I started to list the upstreamable webOS patches. It seemed to me that the patches fell into two categories: one was new features for the webOS, the other one was bug fixes. In the case of new features, the patches were mainly to improve performance or to make LG products like TVs and signage use less memory. I tried to pick the upstreamable patches among them; the criterion was that a patch either improved performance or brought a benefit on the desktop. I thought this would allow owners to accept and merge the patches into the mainline more easily.

3. What patches have been merged to Chromium mainline?

Before uploading webOS patches, I merged patches to replace deprecated WTF or base utilities (WTF::RefPtr, base::MakeUnique) with C++ standard equivalents. I thought that it would be good to show that LG Electronics had started to contribute to Chromium. After replacing all of them, I could start to contribute webOS patches in earnest. I’ve since merged webOS patches to reduce memory usage, release more used memory under OOM, add a new content API to suspend/resume DOM operation, and so on. Below is the list of the main patches I successfully merged.

  1. New content API to suspend/resume DOM operation
  2. Release more used memory under an out-of-memory situation
    • Note: According to the Chromium performance bot, the patches to reduce the memory usage in RenderThreadImpl::ClearMemory could reduce the memory usage by up to 2 MB in the background.
    • Note: An OnMemoryPressure listener was added to the compositor layers through these patches. So, the compositor has been releasing more used memory under OOM through the OOM handler. In my opinion, this is a very good contribution from the webOS.
  3. Introduce new command line switches for embedded devices

4. Trace LG Electronics contribution stats

As more webOS patches have been merged to Chromium mainline, I thought that it would be good to run a tool to track all LG Electronics Chromium contributions so that LG Electronics’ Chromium contribution efforts are well documented. To do this I set up the LG Electronics Chromium contribution stats using the GitStats tool. The tool has been generating the stats every day.

I was happy to work on the webOS upstream project over the past year. It was challenging work because each downstream patch should show some benefit in Chromium mainline. I’m sure that LG Electronics will keep contributing good patches from webOS, and I hope they’re going to become a good partner as well as a contributor in Chromium.

by gyuyoung at January 17, 2019 09:12 AM

January 16, 2019

Víctor Jáquez

Rust bindings for GStreamerGL: Memoirs

Rust is a great programming language, but the community around it is just amazing. Those are the ingredients for crafting useful software tools, just like Servo, an experimental browser engine designed for task isolation and high parallelization.

Both projects, Rust and Servo, are funded by Mozilla.

Thanks to Mozilla and Igalia I have the opportunity to work on Servo, adding HTML5 multimedia features to it.

First, with the help of Fernando Jiménez, we finished what my colleague Philippe Normand and Sebastian Dröge (one of my programming heroes) started: a media player in Rust designed to be integrated in Servo. This media player lives in its own crate: servo/media, along with the WebAudio engine. A crate, in Rust jargon, is like a library. This crate is (very ad-hocly) designed to be multimedia framework agnostic, but the only backend right now is for GStreamer. Later we integrated it into Servo, adding initial support for the audio and video tags.

Currently, servo/media passes, through an IPC channel, an array with the whole frame to render in Servo. This implies, at least, one copy of the frame in memory, and we would like to avoid it.

For painting and compositing the web content, Servo uses WebRender, a crate designed to use the GPU intensively. Thus, if instead of raw frame data we pass OpenGL textures to WebRender, the performance could be improved considerably.

Luckily, GStreamer already supports the uploading, downloading, painting and composition of video frames as OpenGL textures with the OpenGL plugin and its OpenGL Integration library. Even more, with plugins such as GStreamer-VAAPI, Gst-OMX (OpenMAX), and others, it’s possible to process video without using the main CPU or its mapped memory on different platforms.

But there is a distance between what’s available in GStreamer and what’s available in Rust. Nonetheless, Sebastian has put a lot of effort into the Rust bindings for GStreamer, both for applications and plugins; sadly, GStreamer’s OpenGL Integration library (GstGL for short) wasn’t available at that time. So I rolled up my sleeves and got to work on the bindings.

These are the stories of that work.

As GStreamer shares with GTK+ the GObject framework and its introspection mechanism, both projects have collaborated on the required infrastructure to support Rust bindings. Thanks to all the GNOME folks who are working on the intercommunication between Rust and GObject. The quest has been long and complex, since Rust doesn’t map all the object-oriented concepts, and since GObject, being a set of practices and software helpers for doing object-oriented programming in C, is not homogeneous in its usage.

The tool that eases the generation of Rust bindings for GObject-based projects is GIR; it is written in Rust, reads gir files, along with metadata in toml, and outputs two types of bindings: sys and api.

Rust can call external functions through FFI (foreign function interface), which is just a declaration of a C function with Rust types. But these functions are considered unsafe. The sys bindings are just the exported C functions of the library, organized by the library’s namespace.
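
For instance, a sys-level declaration looks roughly like the following; the GstGL function is real, but the types are simplified to raw pointers for illustration:

// Sketch of a gir-generated "sys" binding: a raw, unsafe FFI declaration.
// Types are simplified for illustration; real sys crates declare proper structs.
use libc::{c_int, c_void};

extern "C" {
    // gboolean gst_gl_context_activate (GstGLContext *context, gboolean activate);
    pub fn gst_gl_context_activate(context: *mut c_void, activate: c_int) -> c_int;
}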

The next step is to create a safe and rustified API. This is what the api bindings are.

As we said, GObject libraries are not quite homogeneous, and even following the introspection annotations, there will be cases where GIR won’t be able to generate the correct bindings. For that reason GIR is constantly evolving, looking for a common way to solve the corner cases that exist in every GObject project. For example, these are my patches in order to generate the GstGL bindings.

The tasks done were:

For this document we assume that the reader has a functional Rust setup and knows the basic concepts.

Clone and build gir

$ cd ~/ws
$ git clone https://github.com/gtk-rs/gir.git
$ cd gir
$ cargo build --release

The reason to build gir in release mode is that otherwise it would be very slow.

For sys bindings.

This kind of bindings is normally straightforward (and unsafe) since they only map the C API to Rust via the FFI mechanism.

$ cd ~/ws
$ git clone https://gitlab.freedesktop.org/gstreamer/gstreamer-rs-sys.git
$ cd gstreamer-rs-sys
$ cp /usr/share/gir-1.0/GstGL-1.0.gir gir-files/

1. Verify that the gir file is more or less correct
  1. If there is something strange, we should fix the code that generated it.
  2. If that is not possible, the last resort is to fix the gir file directly, which is just XML, not manually but through a script using xmlstarlet. See fix.sh in gtk-rs as an example.
2. Create the toml file with the metadata required to create the bindings. In other words, this file contains the exceptions, rules and options used by the tool to generate the bindings. See Gir_GstGL.toml in gstreamer-rs-sys as an example. The documentation of the toml file is in gir’s README.md file.

$ ~/ws/gir/target/release/gir -c Gir_GstGL.toml

This command will generate, as specified in the toml file (target_path), a crate in the directory named gstreamer-gl-sys.

Api bindings.

This type of bindings may require more manual work since their purpose is to offer a rustified API of the library, with all its syntactic sugar, semantics, and so on. But in general terms, the process is similar:

$ cd ~/ws
$ git clone https://gitlab.freedesktop.org/gstreamer/gstreamer-sys.git
$ cd gstreamer-sys
$ cp /usr/share/gir-1.0/GstGL-1.0.gir gir-files/

Again, it is possible to end up applying fixes to the gir file through a fix.sh script using xmlstarlet.

And again, the crafting of the toml file might take a lot of time, through trial and error and by cleaning and tidying up the API. See Gir_GstGL.toml in gstreamer-rs as an example.

$ ~/ws/gir/target/release/gir -c Gir_GstGL.toml

A good way to test your bindings is by crafting a test application which shows how to use the API. Personally, I devoted a ton of time to the test application for GstGL, but it was worth it. It made me aware of a missing piece in the crate used for GL applications in Rust, named Glutin: a way to get the EGLDisplay in use. So I also worked on that and sent a pull request that was recently merged. The sweets of free software development.

Nowadays I’m integrating the GstGL API into servo/media and later, Servo!

by vjaquez at January 16, 2019 07:42 PM

January 14, 2019

Andrés Gómez

matrix-send me a notification!

When you are working in the console of an Un*x system you always have the possibility of using some kind of notification system to warn you when a task has completed. Quite typically, that would involve an email that could arrive in your box’s local inbox or, if you have a mail agent properly configured, in some other inbox on the Internet.

With the arrival of Instant Messaging systems you could somehow move from the good old email notification to some other fancy service. That has been my preferred method for quite a while, since I understand email as a “non-instant” messaging system. Basically, I do not want to get instant notifications when a mail arrives. Add to that the hassle of setting up some kind of filter criteria to get notifications only for specific mail rules, and the not yet universally supported IMAP4 push method, instead of polling for newly arrived mail …

Anyway, long story short, for some time now we have been using [matrix] as our Instant Messaging service at Igalia so, why not get notifications there when a task is completed?

Yes, you have guessed correctly, that’s possible and, actually, it’s very easy to set up, especially with the help of matrix-send.

First, you need an account that will send you the notification(s). Ideally, that would be a bot user, but it could be any account. Then, you have to get an access token for that user so you can interact with the matrix server from the command line as if it were any other ordinary matrix client. Finally, you need to create a chat room between that user and your own in order to keep the communication going. All this is explained in matrix’s client-server API documentation but, to make things easier, it goes as follows:

$ curl -XPOST -d '{"user":"<matrix-user>", "password":"<password>", "type":"m.login.password"}' "https://<matrix-server>/_matrix/client/r0/login"
{
    "access_token": "<access-token>",
    "device_id": "<device-id>",
    "home_server": "<home-server>",
    "user_id": "@<matrix-user>:<home-server>"
}

This will give you the needed access-token.

Now, from your regular matrix client, invite the bot user to a conversation in a new room. Check the configuration of the new room for its internal ID. It will be something like
!<internal-id>:<home-server>.

Then, accept such invitation from the command line:

$ curl -XPOST -d '{}' "https://<matrix-server>/_matrix/client/r0/rooms/%21<internal-room-id>:<home-server>/join?access_token=<access-token>"
{
    "room_id": "!<internal-room-id>:<home-server>"
}

All that is left is to configure matrix-send and start using it. Mind you, I’ve done a small addition that has not been merged yet, so I would just clone from my fork.

The configuration file would look like this:

$ cat ~/.config/matrix-send/config.ini
[DEFAULT]
endpoint=https://<matrix-server>/_matrix/
access_token=<access-token>
channel_id=!<internal-room-id>:<home-server>
msgtype=m.text

The interesting addition of my own is the msgtype field. By default, matrix-send uses m.notice which, depending on the configuration, quite typically won’t trigger a notification in your matrix client.

The only thing left is to make matrix-send executable and test it:

$ chmod +x <path-to-matrix-send>/matrix-send.py
$ <path-to-matrix-send>/matrix-send.py "Hello World!"

by tanty at January 14, 2019 10:05 PM

January 10, 2019

Manuel Rego

An introduction to CSS Containment

Igalia has recently been working on the implementation of css-contain in Chromium, providing some fixes and optimizations based on this standard. This is a brief blog post trying to give an introduction to the spec, explain the status of things, the work done during the past year, and some plans for the future.

What’s css-contain?

The main goal of the CSS Containment standard is to improve the rendering performance of web pages by allowing the isolation of a subtree from the rest of the document. This specification only introduces one new CSS property, called contain, with different possible values. Browser engines can use that information to implement optimizations and avoid doing extra work when they know which subtrees are independent of the rest of the page.

Let’s explain what this is about and why it can bring performance improvements in complex websites. Imagine that you have a big HTML page which generates a complex DOM tree, but you know that some parts of that page are totally independent of the rest, and the content in those parts is modified at some point.

Browser engines usually try to avoid doing more work than needed and use some heuristics to avoid spending more time than required. However, there are lots of corner cases and complex situations in which the browser needs to actually recompute the whole webpage. To improve these scenarios the author has to identify which parts (subtrees) of their website are independent and isolate them from the rest of the page thanks to the contain property. Then, when there are changes in some of those subtrees, the rendering engine will be able to avoid doing any work outside of the subtree boundaries.

Not everything is for free: when you use contain there are some restrictions that will affect those elements, so the browser can be totally certain it can apply optimizations without causing any breakage (e.g. you need to manually set the size of the element if you want to use size containment).

The CSS Containment specification defines four values for the contain property, one per each type of containment:

  • layout: The internal layout of the element is totally isolated from the rest of the page; it’s not affected by anything outside and its contents cannot have any effect on the ancestors.
  • paint: Descendants of the element cannot be displayed outside its bounds; nothing will overflow this element (or if it does it won’t be visible).
  • size: The size of the element can be computed without checking its children; the element’s dimensions are independent of its contents.
  • style: The effects of counters and quotes cannot escape this element, so they are isolated from the rest of the page.
    Note that regarding style containment there is an ongoing discussion in the CSS Working Group about how useful it is (due to the narrow scope of counters and quotes).

You can combine the different types of containment as you wish, but the spec also provides two extra values that are a kind of “shorthand” for the other four:

  • content: Which is equivalent to contain: layout paint style.
  • strict: This is the same as having all four types of containment, so it’s equivalent to contain: layout paint size style.

    Example

    Let’s show an example of how CSS Containment can help to improve the performance of a webpage.

    Imagine a page with lots of elements, in this case 10,000 elements like this:

      <div class="item">
        <div>Lorem ipsum...</div>
      </div>

And imagine that it modifies the content of one of the inner DIVs through the textContent attribute.

    If you don’t use css-contain, even when the change is on a single element, Chromium spends a lot of time on layout because it traverses the whole DOM tree (which in this case is big as it has 10,000 elements).

CSS Containment Example DOM Tree

Here is where the contain property comes to the rescue. In this example the DIV item has a fixed size, and the contents we’re changing in the inner DIV will never overflow it. So we can apply contain: strict to the item; that way the browser won’t need to visit the rest of the nodes when something changes inside an item: it can stop checking things at that element and avoid going outside.

Notice that if the content overflows the item it will get clipped. Also, if we don’t set a fixed size for the item it’ll be rendered as an empty box, so nothing would be visible (actually, in this example the borders would still be painted, but they would be the only visible thing).
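For reference, here is roughly what the styles for this example could look like; the exact dimensions are illustrative, the important parts are the explicit size and the contain declaration:

    .item {
      /* Size containment requires an explicit size,
         otherwise the box is laid out as if it were empty. */
      width: 100px;
      height: 30px;
      border: 1px solid;
      /* Equivalent to: contain: layout paint size style */
      contain: strict;
    }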

CSS Containment Example

Despite how simple each of the items is in this example, we’re getting a big improvement by using CSS Containment: layout time goes down from ~4ms to ~0.04ms, which is a huge difference. Imagine what would happen if the DOM tree had very complex structures and contents but only a small part of the page got modified; if you can isolate that from the rest of the page, you could get similar benefits.

    State of the art

This is not a new spec: Chrome 52 shipped the initial support back in July 2016, but during the last year there has been some extra development related to it, and that’s what I want to highlight in this blog post.

First of all, many specification issues have been fixed, and some of them imply changes in the implementations; most of this work has been carried out by Florian Rivoal in collaboration with the CSS Working Group.

Not only that, but on the testing side Gérard Talbot has completed the test suite in the web-platform-tests (WPT) repository, which is really important to fix bugs in the implementations and ensure interoperability.

In my case, I’ve been working on the Chromium implementation, fixing several bugs and interoperability issues and getting it up to date with the latest specification changes. I took advantage of the WPT test suite to do this work and also contributed back a bunch of tests there. I also imported Firefox tests into Chromium to improve interop (and even wrote a small Firefox patch as part of this work).

Last, it’s worth noting that Firefox has been actively working on its implementation of css-contain during the last year (you can test it by enabling the runtime flag layout.css.contain.enabled). Hopefully that will bring a second browser engine shipping the spec in the future.

    Wrap-up

CSS Containment is a nice and simple specification that can be useful to improve web rendering performance in many different use cases. It’s true that it’s currently only supported by Chromium (remember that Firefox is working on it too) and that more improvements and optimizations can be implemented on top of it, but it seems to have huge potential.

Igalia and Bloomberg working together to build a better web

Once again, all the work from Igalia related to css-contain has been sponsored by Bloomberg as part of our ongoing collaboration.

Bloomberg has some complex UIs that are taking advantage of css-contain to improve rendering performance; in future blog posts we’ll talk about some of these cases and the optimizations that have been implemented in the rendering engine to improve them.

    January 10, 2019 11:00 PM

    Diego Pino

    The eXpress Data Path

In the previous article I briefly introduced XDP (eXpress Data Path) and eBPF, the multipurpose in-kernel virtual machine. On the XDP side, I focused only on the motivations behind this new technology and the reasons for rearchitecting the Linux kernel networking layer to enable faster packet processing. However, I didn’t get much into the details of how XDP works. In this new blog post I try to go deeper into XDP.

    XDP: A fast path for packet processing

The design of XDP has its roots in a DDoS attack mitigation solution presented by Cloudflare at Netdev 1.1. Cloudflare relies heavily on iptables, which according to their own metrics is able to handle 1 Mpps on a decent server (Source: Why we use the Linux kernel’s TCP stack). In the event of a DDoS attack, the amount of spoofed traffic can be up to 3 Mpps. Under those circumstances, a Linux box starts to be flooded by IRQ interrupts until it becomes unusable.

Because Cloudflare wanted to keep the convenience of using iptables (and the rest of the kernel’s network stack), they couldn’t go with a solution that takes full control of the hardware, such as DPDK. Their solution consisted of implementing what they called a “partial kernel bypass”. Some queues of the NIC are still attached to the kernel while others are attached to a user-space program that decides whether a packet should be dropped or not. By dropping packets at the lowest point of the stack, the amount of traffic that reaches the kernel’s networking subsystem gets significantly reduced.

Cloudflare’s solution used the Netmap toolkit to implement its partial kernel bypass (Source: Single Rx queue kernel bypass with Netmap). However, this idea can be generalized by adding a checkpoint in the Linux kernel network stack, preferably as soon as a packet is received by the NIC. This checkpoint should pass a packet to a user-space program that will decide what to do with it: drop it or let it continue through the normal path.

    Luckily, Linux already features a mechanism that allows user-space code execution within the kernel: the eBPF VM. So the solution seemed obvious.

Linux network stack with XDP

    Packet operations

    Every network function, no matter how complex it is, consists of a series of basic operations:

• Firewall: read incoming packets, compare them to a table of rules and execute an action: forward or drop.
• NAT: read incoming packets, modify headers and forward the packet.
• Tunnelling: read incoming packets, create a new packet, embed the original packet into the new one and forward it.

XDP passes packets to our eBPF program, which decides what to do with them. We can read or modify them if we need to. We can also access helper functions to parse packets, compute checksums, and other functionality, without paying the cost of system calls. And thanks to eBPF Maps we have access to complex data structures for persistent data storage, like tables. We also get to decide what to do with a packet: are we going to drop it? Forward it? To control a packet’s processing logic, XDP provides a set of predefined actions:

    • XDP_PASS: pass the packet to the normal network stack.
    • XDP_DROP: very fast drop.
• XDP_TX: transmit the packet back out the same interface it arrived on (TX-bounce).
    • XDP_REDIRECT: redirects the packet to another NIC or CPU.
    • XDP_ABORTED: indicates eBPF program error.

    XDP_PASS, XDP_TX and XDP_REDIRECT are specific cases of a forwarding action, whereas XDP_ABORTED is actually treated as a packet drop.

    Let’s take a look at one example that uses most of these elements to program a simple network function.

    Example: An IPv6 packet filter

The canonical example when introducing XDP is a DDoS filter. What such a network function does is drop packets if they come from a suspicious origin. In my case, I’m going with something even simpler: a function that filters out all traffic except IPv6.

The advantage of this simpler function is that we don’t need to manage a list of suspicious addresses. Our program will simply examine the ethertype value of a packet and let it continue through the network stack or drop it depending on whether it is an IPv6 packet or not.

    SEC("prog")
    int xdp_ipv6_filter_program(struct xdp_md *ctx)
    {
        void *data_end = (void *)(long)ctx->data_end;
        void *data     = (void *)(long)ctx->data;
        struct ethhdr *eth = data;
        u16 eth_type = 0;
    
    if (!parse_eth(eth, data_end, &eth_type)) {
            bpf_debug("Debug: Cannot parse L2\n");
            return XDP_PASS;
        }
    
        bpf_debug("Debug: eth_type:0x%x\n", ntohs(eth_type));
        if (eth_type == ntohs(0x86dd)) {
            return XDP_PASS;
        } else {
            return XDP_DROP;
        }
    }
    

    The function xdp_ipv6_filter_program is our main program. We define a new section in the binary called prog. This serves as a hook between our program and XDP. Whenever XDP receives a packet, our code will be executed.

ctx represents a context, a struct which contains all the data necessary to access a packet. Our program calls parse_eth to fetch the ethertype. Then it checks whether its value is 0x86dd (the IPv6 ethertype); in that case the packet passes. Otherwise the packet is dropped. In addition, all the ethertype values are printed for debugging purposes.

    bpf_debug is in fact a macro defined as:

    #define bpf_debug(fmt, ...)                          \
        ({                                               \
            char ____fmt[] = fmt;                        \
            bpf_trace_printk(____fmt, sizeof(____fmt),   \
                ##__VA_ARGS__);                          \
        })
    

It uses the function bpf_trace_printk under the hood, a function which prints out messages to /sys/kernel/debug/tracing/trace_pipe.

    The function parse_eth takes a packet’s beginning and end and parses its content.

    static __always_inline
    bool parse_eth(struct ethhdr *eth, void *data_end, u16 *eth_type)
    {
        u64 offset;
    
        offset = sizeof(*eth);
        if ((void *)eth + offset > data_end)
            return false;
        *eth_type = eth->h_proto;
        return true;
    }
    

Running external code in the kernel involves certain risks. For instance, an infinite loop could freeze the kernel, or a program could access an unrestricted area of memory. To avoid these potential hazards, a verifier is run when the eBPF code is loaded. The verifier walks all possible code paths, checking that our program doesn’t access out-of-range memory and that there are no out-of-bounds jumps. The verifier also ensures the program terminates in finite time.

The snippets above make up our eBPF program. Now we just need to compile it (full source code is available at: xdp_ipv6_filter).

    $ make
    

    Which generates xdp_ipv6_filter.o, the eBPF object file.

    Now we’re going to load this object file into a network interface. There are two ways to do that:

• Write a user-space program that loads the object file and attaches it to a network interface.
    • Use iproute2 to load the object file to an interface.

    For this example, I’m going to use the latter method.

Currently there’s a limited number of network drivers that support XDP (ixgbe, i40e, mlx5, veth, tap, tun, virtio_net and others), although the list is growing. Some of these drivers support XDP at the driver level. That means the XDP hook is implemented at the lowest point in the networking layer, just when the NIC receives a packet in the Rx ring. In other cases, the XDP hook is implemented at a higher point in the network stack. The former method offers better performance, although the latter makes XDP available for any network interface.

Luckily, veth interfaces are supported by XDP. I’m going to create a veth pair and attach the eBPF program to one of its ends. Remember that a veth always comes in pairs. It’s like a virtual patch cable connecting two interfaces. Whatever is transmitted on one end arrives at the other end, and vice versa.

    $ sudo ip link add dev veth0 type veth peer name veth1
    $ sudo ip link set up dev veth0
    $ sudo ip link set up dev veth1
    

    Now I attach the eBPF program to veth1:

    $ sudo ip link set dev veth1 xdp object xdp_ipv6_filter.o
    

You may have noticed I called the section for the eBPF program “prog”. That’s the name of the section iproute2 expects to find; naming the section differently will result in an error.
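As an aside, the plain xdp keyword used above lets the kernel pick the best mode available for the interface; recent iproute2 versions also accept explicit mode keywords. A sketch, reusing this example’s interface and object file:

    # Force the generic (skb-based) hook, available for any interface:
    $ sudo ip link set dev veth1 xdpgeneric object xdp_ipv6_filter.o

    # Force the native driver-level hook (fails if the driver lacks support):
    $ sudo ip link set dev veth1 xdpdrv object xdp_ipv6_filter.o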

    If the program was successfully loaded I should see an xdp flag in the veth1 interface:

    $ sudo ip link sh veth1
    8: veth1@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
        link/ether 32:05:fc:9a:d8:75 brd ff:ff:ff:ff:ff:ff
        prog/xdp id 32 tag bdb81fb6a5cf3154 jited
    

To verify my program works as expected, I’m going to push a mix of IPv4 and IPv6 packets to veth0 (ipv4-and-ipv6-data.pcap). My sample has a total of 20 packets (10 IPv4 and 10 IPv6). Before doing that, though, I’m going to launch tcpdump on veth1, ready to capture only 10 IPv6 packets.

    $ sudo tcpdump "ip6" -i veth1 -w captured.pcap -c 10
    tcpdump: listening on veth1, link-type EN10MB (Ethernet), capture size 262144 bytes
    

    Send packets to veth0:

    $ sudo tcpreplay -i veth0 ipv4-and-ipv6-data.pcap
    

The filtered packets arrived at the other end, and the tcpdump program terminated since all the expected packets were received.

    10 packets captured
    10 packets received by filter
    0 packets dropped by kernel
    

    We can also print out /sys/kernel/debug/tracing/trace_pipe to check the ethertype values listed:

    $ sudo cat /sys/kernel/debug/tracing/trace_pipe
    tcpreplay-4496  [003] ..s1 15472.046835: 0: Debug: eth_type:0x86dd
    tcpreplay-4496  [003] ..s1 15472.046847: 0: Debug: eth_type:0x86dd
    tcpreplay-4496  [003] ..s1 15472.046855: 0: Debug: eth_type:0x86dd
    tcpreplay-4496  [003] ..s1 15472.046862: 0: Debug: eth_type:0x86dd
    tcpreplay-4496  [003] ..s1 15472.046869: 0: Debug: eth_type:0x86dd
    tcpreplay-4496  [003] ..s1 15472.046878: 0: Debug: eth_type:0x800
    tcpreplay-4496  [003] ..s1 15472.046885: 0: Debug: eth_type:0x800
    tcpreplay-4496  [003] ..s1 15472.046892: 0: Debug: eth_type:0x800
    tcpreplay-4496  [003] ..s1 15472.046903: 0: Debug: eth_type:0x800
    tcpreplay-4496  [003] ..s1 15472.046911: 0: Debug: eth_type:0x800
    ...
    

    XDP: The future of in-kernel packet processing?

XDP started as a fast path for certain use cases, especially ones which can result in an early packet drop (like a DDoS attack mitigation solution). However, since a network function is nothing but a combination of basic primitives (reads, writes, forwarding, dropping…), all of them available via XDP/eBPF, it is possible to use XDP for more than packet dropping. It could be used, in fact, to implement any network function.

So what started as a fast path is gradually becoming the normal path. We’re seeing now how tools such as iptables are getting rewritten in XDP/eBPF while keeping their user-level interfaces intact. The enormous performance gains of this new approach make the effort worth it. And since the hunger for more performance never ends, it seems reasonable to think that any other tool that can possibly be written in XDP/eBPF will follow a similar fate.

iptables vs nftables vs bpfilter

    Source: Why is the kernel community replacing iptables with BPF?

    Summary

In this article I took a closer look at XDP. I explained the motivations that led to its design. Through a simple example, I showed how XDP and eBPF work together to perform fast packet processing inside the kernel. XDP provides checkpoints within the kernel’s network stack. An eBPF program can hook into XDP events to perform an operation on a packet (modify its headers, drop it, forward it, etc).

XDP offers high-performance packet processing while maintaining interoperability with the rest of the networking subsystem, an advantage over full kernel bypass solutions. I didn’t get much into the internals of XDP and how it interacts with other parts of the networking subsystem, though. I encourage checking the first two links in the recommended readings section for a deeper understanding of XDP internals.

    In the next article, the last in the series, I will cover the new AF_XDP socket address family and the implementation of a Snabb bridge for this new interface.

    Recommended readings:

    January 10, 2019 10:00 AM

    January 08, 2019

    Carlos García Campos

    Epiphany automation mode

Last week I finally found some time to add the automation mode to Epiphany, which allows running automated tests using WebDriver. It’s important to note that the automation mode is not expected to be used by users or applications to control the browser remotely, but only by WebDriver automated tests. For that reason, the automation mode is incompatible with a primary user profile. There are a few other things affected by the automation mode:

• There’s no persistence. A private profile is created in tmp and only ephemeral web contexts are used.
    • URL entry is not editable, since users are not expected to interact with the browser.
    • An info bar is shown to notify the user that the browser is being controlled by automation.
    • The window decoration is orange to make it even clearer that the browser is running in automation mode.

So, how can I write tests to be run in Epiphany? First, you need to install a recent enough version of Selenium. For now, only the Python API is supported. Selenium doesn’t have an Epiphany driver, but the WebKitGTK driver can be used with any WebKitGTK+ based browser by providing the browser information as part of the session capabilities.

    from selenium import webdriver
    
    options = webdriver.WebKitGTKOptions()
    options.binary_location = 'epiphany'
    options.add_argument('--automation-mode')
    options.set_capability('browserName', 'Epiphany')
    options.set_capability('version', '3.31.4')
    
    ephy = webdriver.WebKitGTK(options=options, desired_capabilities={})
    ephy.get('http://www.webkitgtk.org')
    ephy.quit()
    

    This is a very simple example that just opens Epiphany in automation mode, loads http://www.webkitgtk.org and closes Epiphany. A few comments about the example:

    • Version 3.31.4 will be the first one including the automation mode.
    • The parameter desired_capabilities shouldn’t be needed, but there’s a bug in selenium that has been fixed very recently.
    • WebKitGTKOptions.set_capability was added in selenium 3.14, if you have an older version you can use the following snippet instead
    from selenium import webdriver
    
    options = webdriver.WebKitGTKOptions()
    options.binary_location = 'epiphany'
    options.add_argument('--automation-mode')
    capabilities = options.to_capabilities()
    capabilities['browserName'] = 'Epiphany'
    capabilities['version'] = '3.31.4'
    
    ephy = webdriver.WebKitGTK(desired_capabilities=capabilities)
    ephy.get('http://www.webkitgtk.org')
    ephy.quit()
    

To simplify the driver instantiation you can create your own Epiphany driver derived from the WebKitGTK one:

    from selenium import webdriver
    
    class Epiphany(webdriver.WebKitGTK):
        def __init__(self):
            options = webdriver.WebKitGTKOptions()
            options.binary_location = 'epiphany'
            options.add_argument('--automation-mode')
            options.set_capability('browserName', 'Epiphany')
            options.set_capability('version', '3.31.4')
    
            webdriver.WebKitGTK.__init__(self, options=options, desired_capabilities={})
    
    ephy = Epiphany()
    ephy.get('http://www.webkitgtk.org')
    ephy.quit()
    

The same for selenium < 3.14:

    from selenium import webdriver
    
    class Epiphany(webdriver.WebKitGTK):
        def __init__(self):
            options = webdriver.WebKitGTKOptions()
            options.binary_location = 'epiphany'
            options.add_argument('--automation-mode')
            capabilities = options.to_capabilities()
            capabilities['browserName'] = 'Epiphany'
            capabilities['version'] = '3.31.4'
    
            webdriver.WebKitGTK.__init__(self, desired_capabilities=capabilities)
    
    ephy = Epiphany()
    ephy.get('http://www.webkitgtk.org')
    ephy.quit()
    

    by carlos garcia campos at January 08, 2019 05:22 PM

    January 07, 2019

    Diego Pino

    A brief introduction to XDP and eBPF

In a previous post I explained how to build a kernel with XDP (eXpress Data Path) support. Having that feature enabled is mandatory in order to use XDP, a new Linux kernel component that greatly improves packet processing performance.

In recent years, we have seen a rise in programming toolkits and techniques that aim to overcome the limitations of the Linux kernel when it comes to high-performance packet processing. One of the most popular techniques is kernel bypass, which means skipping the kernel’s networking layer and doing all packet processing from user-space. Kernel bypass also involves managing the NIC from user-space, in other words, relying on a user-space driver to handle the NIC.

By giving full control of the NIC to a user-space program, we reduce the overhead introduced by the kernel (context switching, networking layer processing, interrupts, etc), which is relevant enough when working at speeds of 10 Gbps or higher. Kernel bypass, plus a combination of other features (batch packet processing) and performance tuning adjustments (NUMA awareness, CPU isolation, etc), forms the basis of high-performance user-space networking. Perhaps the poster child of this new approach to packet processing is Intel’s DPDK (Data Plane Development Kit), although other well-known toolkits and techniques are Cisco’s VPP (Vector Packet Processing), Netmap and of course Snabb.

    The disadvantages of user-space networking are several:

• An OS’s kernel is an abstraction layer for hardware resources. Since user-space programs need to manage their resources directly, they also need to manage the hardware. That often means programming their own drivers.
• As kernel-space is completely skipped, all the networking functionality provided by the kernel is skipped too. User-space programs need to reimplement functionality that might already be provided by the kernel or the OS.
• Programs work as sandboxes, which severely limits their ability to interact, and be integrated, with other parts of the OS.

Essentially, user-space networking achieves high-speed performance by moving packet processing out of the kernel’s realm into user-space. XDP in fact does the opposite: it moves user-space networking programs (filters, mappers, routing, etc) into the kernel’s realm. XDP allows us to execute our network function as soon as a packet hits the NIC, before it starts moving upwards into the kernel’s networking subsystem, which results in a significant increase in packet-processing speed. But how does the kernel make it possible for a user to execute their programs within the kernel’s realm? Before answering this question we need to take a look at BPF.

    BPF and eBPF

Despite its somewhat misleading name, BPF (Berkeley Packet Filter) is in fact a virtual machine model. This VM was originally designed for packet filtering, hence its name.

One of the most prominent users of BPF is the tool tcpdump. When capturing packets with tcpdump, a user can define a packet-filtering expression. Only packets that match that expression will actually be captured. For instance, the expression “tcp dst port 80” captures all TCP packets whose destination port equals 80. This expression can be reduced by a compiler to BPF bytecode.

    $ sudo tcpdump -d "tcp dst port 80"
    (000) ldh      [12]
    (001) jeq      #0x86dd          jt 2    jf 6
    (002) ldb      [20]
    (003) jeq      #0x6             jt 4    jf 15
    (004) ldh      [56]
    (005) jeq      #0x50            jt 14   jf 15
    (006) jeq      #0x800           jt 7    jf 15
    (007) ldb      [23]
    (008) jeq      #0x6             jt 9    jf 15
    (009) ldh      [20]
    (010) jset     #0x1fff          jt 15   jf 11
    (011) ldxb     4*([14]&0xf)
    (012) ldh      [x + 16]
    (013) jeq      #0x50            jt 14   jf 15
    (014) ret      #262144
    (015) ret      #0
    

    Basically what the program above does is:

    • Instruction (000): loads the packet’s offset 12, as a 16-bit word, into the accumulator. Offset 12 represents a packet’s ethertype.
    • Instruction (001): compares the value of the accumulator to 0x86dd, which is the ethertype value for IPv6. If the result is true, the program counter jumps to instruction (002), if not it jumps to (006).
    • Instruction (006): compares the value to 0x800 (ethertype value of IPv4). If true jump to (007), if not (015).

    And so forth, until the packet-filtering program returns a result. This result is generally a boolean. Returning a non-zero value (instruction (014)) means the packet matched, whereas returning a zero value (instruction (015)) means the packet didn’t match.

    The BPF VM and its bytecode was introduced by Steve McCanne and Van Jacobson in late 1992, in their paper The BSD Packet Filter: A New Architecture for User-level Packet Capture, and it was presented for the first time at Usenix Conference Winter ‘93.

Since BPF is a VM, it defines an environment where programs are executed. Besides a bytecode, it also defines a packet-based memory model (load instructions are implicitly done on the packet being processed), registers (A and X; the Accumulator and the Index register), a scratch memory store and an implicit Program Counter. Interestingly, BPF’s bytecode was modeled after the Motorola 6502 ISA. As Steve McCanne recalls in his Sharkfest ‘11 keynote, he was familiar with 6502 assembly from his junior high-school days programming on an Apple II, and that influenced him when he designed the BPF bytecode.

The Linux kernel has featured BPF support since v2.5, mainly added by Jay Schullist. There were no major changes in the BPF code until 2011, when Eric Dumazet turned the BPF interpreter into a JIT (Source: A JIT for packet filters). Instead of interpreting BPF bytecode, the kernel was now able to translate BPF programs directly to a target architecture: x86, ARM, MIPS, etc.

    Later on, in 2014, Alexei Starovoitov introduced a new BPF JIT. This new JIT was actually a new architecture based on BPF, known as eBPF. Both VMs co-existed for some time I think, but nowadays packet-filtering is implemented on top of eBPF. In fact, a lot of documentation refers now to eBPF as BPF, and the classic BPF is known as cBPF.

    eBPF extends the classic BPF virtual machine in several ways:

• Takes advantage of modern 64-bit architectures. eBPF uses 64-bit registers and increases the number of available registers from 2 (Accumulator and X register) to 10. eBPF also extends the number of opcodes (BPF_MOV, BPF_JNE, BPF_CALL…).
• Decoupled from the networking subsystem. BPF was bound to a packet-based data model. Since it was used for packet filtering, its code lived within the networking subsystem. However, the eBPF VM is no longer bound to a data model and it can be used for any purpose. It’s now possible to attach an eBPF program to a tracepoint or to a kprobe. This opens up the door for eBPF to instrumentation, performance analysis and many more uses within other kernel subsystems. The eBPF code now lives at its own path: kernel/bpf.
• Global data stores called Maps. Maps are key-value stores that allow the exchange of data between user-space and kernel-space. eBPF provides several types of Maps.
• Helper functions, such as packet rewriting, checksum calculation or packet cloning. Unlike user-space programming, these functions get executed inside the kernel. In addition, it’s possible to execute system calls from eBPF programs.
• Tail-calls. eBPF programs are limited to 4096 instructions. The tail-call feature allows an eBPF program to pass control to a new eBPF program, overcoming this limitation (up to 32 programs can be chained); a short sketch follows this list.
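Here is a rough sketch of what chaining via a tail-call looks like in an eBPF program; jmp_table and entry_prog are hypothetical names, not taken from the kernel samples, and includes are omitted as in the other snippets:

    /* A program array map holding the eBPF programs we may jump to. */
    struct bpf_map_def SEC("maps") jmp_table = {
        .type = BPF_MAP_TYPE_PROG_ARRAY,
        .key_size = sizeof(u32),
        .value_size = sizeof(u32),
        .max_entries = 8,
    };

    SEC("prog")
    int entry_prog(struct xdp_md *ctx)
    {
        /* Jump to the program stored at index 0; if the tail-call
           succeeds, execution never returns here. */
        bpf_tail_call(ctx, &jmp_table, 0);

        /* Fallback, only reached if the tail-call failed
           (e.g. the slot is empty). */
        return XDP_PASS;
    }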

    eBPF: an example

    The Linux kernel sources include several eBPF examples. They’re available at samples/bpf/. To compile these examples simply type:

    $ sudo make samples/bpf/
    

    Instead of coding a new eBPF example, I’m going to reuse one of the samples available in samples/bpf/. I will go through some parts of the code and explain how it works. The example I chose was the tracex4 program.

Generally, all the examples at samples/bpf/ consist of 2 files. In this case:

• tracex4_kern.c: the eBPF program.
• tracex4_user.c: the user-space program.

We need to compile tracex4_kern.c to eBPF bytecode. At this moment, gcc lacks a backend for eBPF. Luckily, clang can emit eBPF bytecode. The Makefile uses clang to compile tracex4_kern.c into an object file.

I mentioned earlier that one of the most interesting features of eBPF is Maps. Maps are key/value stores that allow exchanging data between user-space and kernel-space programs. tracex4_kern defines one map:

    struct pair {
        u64 val;
        u64 ip;
    };  
    
    struct bpf_map_def SEC("maps") my_map = {
        .type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(long),
        .value_size = sizeof(struct pair),
        .max_entries = 1000000,
    };
    

    BPF_MAP_TYPE_HASH is one of the many Map types offered by eBPF. In this case, it’s simply a hash. You may also have noticed the SEC("maps") declaration. SEC is a macro used to create a new section in the binary. Actually the tracex4_kern example defines two more sections:

    SEC("kprobe/kmem_cache_free")
    int bpf_prog1(struct pt_regs *ctx)
    {   
        long ptr = PT_REGS_PARM2(ctx);
    
        bpf_map_delete_elem(&my_map, &ptr); 
        return 0;
    }
        
    SEC("kretprobe/kmem_cache_alloc_node") 
    int bpf_prog2(struct pt_regs *ctx)
    {
        long ptr = PT_REGS_RC(ctx);
        long ip = 0;
    
        // get ip address of kmem_cache_alloc_node() caller
        BPF_KRETPROBE_READ_RET_IP(ip, ctx);
    
        struct pair v = {
            .val = bpf_ktime_get_ns(),
            .ip = ip,
        };
        
        bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
        return 0;
    }   
    

These two functions will allow us to delete an entry from the map (kprobe/kmem_cache_free) and to add a new entry to it (kretprobe/kmem_cache_alloc_node). All the function calls in capital letters are actually macros defined in bpf_helpers.h.

    If I dump the sections of the object file, I should be able to see these new sections defined:

    $ objdump -h tracex4_kern.o
    
    tracex4_kern.o:     file format elf64-little
    
    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .text         00000000  0000000000000000  0000000000000000  00000040  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
      1 kprobe/kmem_cache_free 00000048  0000000000000000  0000000000000000  00000040  2**3
                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
      2 kretprobe/kmem_cache_alloc_node 000000c0  0000000000000000  0000000000000000  00000088  2**3
                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
      3 maps          0000001c  0000000000000000  0000000000000000  00000148  2**2
                      CONTENTS, ALLOC, LOAD, DATA
      4 license       00000004  0000000000000000  0000000000000000  00000164  2**0
                      CONTENTS, ALLOC, LOAD, DATA
      5 version       00000004  0000000000000000  0000000000000000  00000168  2**2
                      CONTENTS, ALLOC, LOAD, DATA
      6 .eh_frame     00000050  0000000000000000  0000000000000000  00000170  2**3
                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
    

Then there is tracex4_user.c, the main program. Basically, what the program does is listen to kmem_cache_alloc_node events. When such an event happens, the corresponding eBPF code is executed. The code stores the IP attribute of an object into a map, which is printed in a loop by the main program. Example:

    $ sudo ./tracex4
    obj 0xffff8d6430f60a00 is  2sec old was allocated at ip ffffffff9891ad90
    obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
    obj 0xffff8d5f80161780 is  6sec old was allocated at ip ffffffff98090e8f
    

How are the user-space program and the eBPF program connected? On initialization, tracex4_user.c loads the tracex4_kern.o object file using the load_bpf_file function.

    int main(int ac, char **argv)
    {
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
        char filename[256];
        int i;
    
        snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
    
        if (setrlimit(RLIMIT_MEMLOCK, &r)) {
            perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
            return 1;
        }
    
        if (load_bpf_file(filename)) {
            printf("%s", bpf_log_buf);
            return 1;
        }
    
        for (i = 0; ; i++) {
            print_old_objects(map_fd[1]);
            sleep(1);
        }
    
        return 0;
    }
    

    When load_bpf_file is executed, the probes defined in the eBPF file are added to /sys/kernel/debug/tracing/kprobe_events. We’re listening now to those events and our program can do something when they happen.

    $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:kprobes/kmem_cache_free kmem_cache_free
    r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node
    

All the other programs in samples/bpf/ follow a similar structure. There are always two files:

    • XXX_kern.c: the eBPF program.
    • XXX_user.c: the main program.

    The eBPF program defines Maps and functions hooked to a binary section. When the kernel emits a certain type of event (a tracepoint, for instance) our hooks will be executed. Maps are used to exchange data between the kernel program and the user-space program.
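For illustration, here is a rough sketch (not the sample’s literal code) of how the user-space side can walk a hash map such as my_map through its file descriptor, using the bpf_map_get_next_key()/bpf_map_lookup_elem() wrappers from tools/lib/bpf; map_fd here stands for the map’s file descriptor (the sample gets it from the map_fd[] array filled by load_bpf_file):

    long key = -1, next_key;
    struct pair v;

    /* Asking for the successor of a key that is not in the map
       returns the first key; from there we iterate until the end. */
    while (bpf_map_get_next_key(map_fd, &key, &next_key) == 0) {
        if (bpf_map_lookup_elem(map_fd, &next_key, &v) == 0)
            printf("obj 0x%lx was allocated at ip %llx\n",
                   next_key, (unsigned long long)v.ip);
        key = next_key;
    }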

    Wrapping up

In this article I have covered BPF and eBPF from a high-level view. I’m aware there are a lot of resources and plenty of information about eBPF nowadays, but I felt I needed to explain it in my own words. Please check out the list of recommended readings for further information.

In the next article I will cover XDP and its relation to eBPF.

    Recommended readings:

    January 07, 2019 08:00 AM

    December 08, 2018

    Philippe Normand

    Web overlay in GStreamer with WPEWebKit

    After a year or two of hiatus I attended the GStreamer conference which happened in beautiful Edinburgh. It was great to meet the friends from the community again and learn about what’s going on in the multimedia world. The quality of the talks was great, the videos are published online as usual in Ubicast. I delivered a talk about the Multimedia support in WPEWebKit, you can watch it there and the slides are also available.

One of the many interesting presentations was about GStreamer for cloud-based live video. Usually anything with the word cloud tends to draw my attention away, but for some reason I attended this presentation, and didn’t regret it! The last demo, presented by the BBC folks, was about overlaying Web content on native video streams. It’s an interesting use-case for live TV broadcasting, for instance: a web page provides dynamic notifications popping up and down, and the page is rendered with a transparent background and blended over the live video stream. The BBC folks implemented a GStreamer source element relying on CEF for their Brave project.

So here you wonder: why am I talking about the Chromium Embedded Framework (CEF)? Isn’t this post about WPEWebKit? After seeing the demo from the Brave developers I immediately thought WPE could be a great fit for this HTML overlay use-case too! So a few weeks after the conference I finally had the time to start working on the WPE GStreamer plugin. My colleague Žan Doberšek, WPE’s founding hacker, provided a nice solution for the initial rendering issues of the prototype, many thanks to him!

    Here’s a first example, a basic web-browser with gst-play:

    $ gst-play-1.0 --videosink gtkglsink wpe://https://gnome.org
    

    A GTK window opens up and the GNOME homepage should load. You can click on links too! To overlay a web page on top of a video you can use a pipeline like this one:

    $ gst-launch-1.0 glvideomixer name=m sink_1::zorder=0 sink_0::height=818 sink_0::width=1920 ! gtkglsink \
     wpesrc location="file:///home/phil/Downloads/plunk/index.html" draw-background=0 ! m. \
     uridecodebin uri="http://192.168.1.44/Sintel.2010.1080p.mkv" name=d d. ! queue ! glupload \
      ! glcolorconvert ! m.
    

    which can be represented with this simplified graph:

The advantage of this approach is that most of the heavy-lifting happens in the GPU. WPE loads the page using its WPENetworkProcess external process, parses everything (DOM, CSS, JS, …) and renders it as an EGLImage, shared with the UIProcess (the GStreamer application, gst-launch in this case). In most situations decodebin will use a hardware decoder. The decoded video frames are uploaded to the GPU and composited with the EGLImages representing the web page, in a single OpenGL scene, using the glvideomixer element.

    The initial version of the GstWPE plugin is now part of the gst-plugins-bad staging area, where most new plugins are uploaded for further improvements later on. Speaking of improvements, the following tasks have been identified:

• The wpesrc draw-background property is not yet operational due to missing WPEWebKit API for background-color configuration support. I expect to complete this task very soon; interested people can follow this bugzilla ticket.
• Audio support. WPEWebKit currently provides only EGLImages to the application side. The audio session is rendered directly to GStreamer’s autoaudiosink in WebKit, so there’s currently no audio sharing support in wpesrc.
• DMABuf support as an alternative to EGLImages. WPEWebKit internally leverages linux-dmabuf support already, but doesn’t expose the file descriptors and plane information.
• Better navigation events support. GStreamer’s navigation events API was initially designed mostly for DVD menu navigation use-cases; the exposed input event information is not a perfect match for WPEWebKit, which expects hardware-level information from keyboard, mouse and touch devices.

There are more ways and use-cases related to WPE; I expect to unveil another WPE embedding project very soon. Watch this space! As usual, many thanks to my Igalia colleagues for sponsoring this work. We are always happy to hear what others are doing with WPE and to help improve it, so don’t hesitate to get in touch!

    by Philippe Normand at December 08, 2018 02:09 PM

    GStreamer’s playbin3 overview for application developers

    Multimedia applications based on GStreamer usually handle playback with the playbin element. I recently added support for playbin3 in WebKit. This post aims to document the changes needed on application side to support this new generation flavour of playbin.

So, first off, why is it named playbin3 anyway? The GStreamer 0.10.x series had a playbin element, but a first rewrite (playbin2) made it obsolete in the GStreamer 1.x series, so playbin2 was renamed to playbin. That’s why a second rewrite is nicknamed playbin3, I suppose :)

Why should you care about playbin3? Playbin3 (and the elements it uses internally: parsebin, decodebin3 and uridecodebin3, among others) is the result of a deep re-design of playbin2 (along with decodebin2 and uridecodebin) to better support:

    • gapless playback
    • audio cross-fading support (not yet implemented)
    • adaptive streaming
    • reduced CPU, memory and I/O resource usage
    • faster stream switching and full control over the stream selection process

This work was carried out mostly by Edward Hervey, who presented it in detail at 3 GStreamer conferences. If you want to learn more about this and the internals of playbin3, make sure to watch his awesome presentations at the 2015 gst-conf, 2016 gst-conf and 2017 gst-conf.

    Playbin3 was added in GStreamer 1.10. It is still considered experimental but in my experience it works already very well. Just keep in mind you should use at least the latest GStreamer 1.12 (or even the upcoming 1.14) release before reporting any issue in Bugzilla. Playbin3 is not a drop-in replacement for playbin, both elements share only a sub-set of GObject properties and signals. However, if you don’t want to modify your application source code just yet, it’s very easy to try playbin3 anyway:

    $ USE_PLAYBIN3=1 my-playbin-based-app
    

Setting the USE_PLAYBIN3 environment variable enables a code path inside the GStreamer playback plugin which swaps the playbin element for the playbin3 element. This trick provides a glance at the playbin3 element for the laziest people :) The problem is that, depending on your use of playbin, you might get runtime warnings; here’s an example with the Totem player:

    $ USE_PLAYBIN3=1 totem ~/Videos/Agent327.mp4
    (totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'video-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'
    
    (totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'audio-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'
    
    (totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'text-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'
    
    (totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'video-tags-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'
    
    (totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'audio-tags-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'
    
    (totem:22617): GLib-GObject-WARNING **: ../../../../gobject/gsignal.c:2523: signal 'text-tags-changed' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'
    sys:1: Warning: g_object_get_is_valid_property: object class 'GstPlayBin3' has no property named 'n-audio'
    sys:1: Warning: g_object_get_is_valid_property: object class 'GstPlayBin3' has no property named 'n-text'
    sys:1: Warning: ../../../../gobject/gsignal.c:3492: signal name 'get-video-pad' is invalid for instance '0x556db67f3170' of type 'GstPlayBin3'
    

    As mentioned previously, playbin and playbin3 don’t share the same set of GObject properties and signals, so some changes in your application are required in order to use playbin3.

    If your application is based on the GstPlayer library then you should set the GST_PLAYER_USE_PLAYBIN3 environment variable. GstPlayer already handles both playbin and playbin3, so no changes needed in your application if you use GstPlayer!

Ok, so what if your application relies directly on playbin? Some changes are needed! If you previously used playbin stream selection properties and signals, you will now need to handle the GstStream and GstStreamCollection APIs. Playbin3 will emit a stream collection message on the bus; this is very nice because the collection includes information (metadata!) about the streams (or tracks) the media asset contains. In playbin this was handled with a bunch of signals (audio-tags-changed, audio-changed, etc), properties (n-audio, n-video, etc) and action signals (get-audio-tags, get-audio-pad, etc). The new GstStream API provides a centralized and non-playbin-specific access point for all this information. To select streams with playbin3 you now need to send a select-streams event so that the demuxer knows exactly which streams should be exposed to downstream elements. That means potentially improved performance! Once playbin3 has completed the stream selection it will emit a streams-selected message; the application should handle this message and potentially update its internal state about the selected streams. This is also the best moment to update your UI regarding the selected streams (like audio track language, video track dimensions, etc).
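Here is a rough sketch of handling those messages; bus_cb and the selection policy (exposing every video stream) are mine, not a canonical implementation:

    #include <gst/gst.h>

    static gboolean
    bus_cb (GstBus *bus, GstMessage *msg, gpointer user_data)
    {
      GstElement *playbin3 = GST_ELEMENT (user_data);

      switch (GST_MESSAGE_TYPE (msg)) {
        case GST_MESSAGE_STREAM_COLLECTION: {
          GstStreamCollection *collection = NULL;
          GList *selected = NULL;
          guint i;

          gst_message_parse_stream_collection (msg, &collection);
          for (i = 0; i < gst_stream_collection_get_size (collection); i++) {
            GstStream *stream = gst_stream_collection_get_stream (collection, i);
            /* Example policy: expose every video stream. */
            if (gst_stream_get_stream_type (stream) & GST_STREAM_TYPE_VIDEO)
              selected = g_list_append (selected,
                  (gchar *) gst_stream_get_stream_id (stream));
          }
          gst_element_send_event (playbin3,
              gst_event_new_select_streams (selected));
          g_list_free (selected);
          gst_object_unref (collection);
          break;
        }
        case GST_MESSAGE_STREAMS_SELECTED:
          /* Stream selection completed: update internal state and UI here. */
          break;
        default:
          break;
      }
      return TRUE;
    }

The callback would be attached with something like gst_bus_add_watch (gst_element_get_bus (playbin3), bus_cb, playbin3).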

Another small difference between playbin and playbin3 concerns the source element setup. In playbin there is a read-only source GObject property and a source-setup GObject signal. In playbin3 only the latter is available, so your application should rely on source-setup instead of the notify::source GObject signal.
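A minimal sketch of the source-setup approach; the user-agent tweak is purely illustrative, which is why the code first checks that the source element actually has such a property:

    #include <gst/gst.h>

    static void
    source_setup_cb (GstElement *playbin3, GstElement *source, gpointer user_data)
    {
      /* The actual source element depends on the URI, so only touch
         properties the element really has. */
      if (g_object_class_find_property (G_OBJECT_GET_CLASS (source), "user-agent"))
        g_object_set (source, "user-agent", "my-app/1.0", NULL);
    }

    /* ... at setup time: */
    g_signal_connect (playbin3, "source-setup", G_CALLBACK (source_setup_cb), NULL);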

The gst-play-1.0 playback utility already supports playbin3, so it provides a good source of inspiration if you consider porting your application to playbin3. As mentioned at the beginning of this post, WebKit also supports playbin3 now; however, it needs to be enabled at build time using the CMake -DUSE_GSTREAMER_PLAYBIN3=ON option. This feature is not part of the WebKitGTK+ 2.20 series but should ship in 2.22. As a final note, I wanted to acknowledge my favorite worker-owned coop Igalia for allowing me to work on this WebKit feature, and also our friends over at Centricular for all the quality work on playbin3.

    by Philippe Normand at December 08, 2018 09:48 AM

    December 05, 2018

    Samuel Iglesias

    VK_KHR_shader_float_controls and Mesa support

    Khronos Group has published two new extensions for Vulkan: VK_KHR_shader_float16_int8 and VK_KHR_shader_float_controls. In this post, I will talk about VK_KHR_shader_float_controls, which is the extension I have been implementing on Anvil driver, the open-source Intel Vulkan driver, as part of my job at Igalia. For information about VK_KHR_shader_float16_int8 and its implementation in Mesa, you can read Iago’s blogpost.

The Vulkan Working Group has defined a new extension, VK_KHR_shader_float_controls, which allows applications to query and override the implementation’s default floating-point behavior for rounding modes, denormals, signed zero and infinity. From the Vulkan application developer’s perspective, VK_KHR_shader_float_controls defines a new structure called VkPhysicalDeviceFloatControlsPropertiesKHR where the driver exposes the supported capabilities, such as the rounding modes for each floating-point data type, how denormals are expected to be handled by the hardware (either flushed to zero or preserved), and whether the bits of signed zeros, infinities and NaNs will be preserved.

    typedef struct VkPhysicalDeviceFloatControlsPropertiesKHR {
        VkStructureType    sType;
        void*              pNext;
        VkBool32           separateDenormSettings;
        VkBool32           separateRoundingModeSettings;
        VkBool32           shaderSignedZeroInfNanPreserveFloat16;
        VkBool32           shaderSignedZeroInfNanPreserveFloat32;
        VkBool32           shaderSignedZeroInfNanPreserveFloat64;
        VkBool32           shaderDenormPreserveFloat16;
        VkBool32           shaderDenormPreserveFloat32;
        VkBool32           shaderDenormPreserveFloat64;
        VkBool32           shaderDenormFlushToZeroFloat16;
        VkBool32           shaderDenormFlushToZeroFloat32;
        VkBool32           shaderDenormFlushToZeroFloat64;
        VkBool32           shaderRoundingModeRTEFloat16;
        VkBool32           shaderRoundingModeRTEFloat32;
        VkBool32           shaderRoundingModeRTEFloat64;
        VkBool32           shaderRoundingModeRTZFloat16;
        VkBool32           shaderRoundingModeRTZFloat32;
        VkBool32           shaderRoundingModeRTZFloat64;
    } VkPhysicalDeviceFloatControlsPropertiesKHR;
    

This structure will be filled by the driver when calling vkGetPhysicalDeviceProperties2(), with a pointer to it chained in the pNext pointers of the VkPhysicalDeviceProperties2 structure. With that, we know if the driver supports the SPIR-V capabilities we want to use in our shaders. If the separate*Settings members are true, remember to check the value of the respective property for each floating-point bit-size type you plan to work with.
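As a sketch, querying these properties could look like this (physical_device is assumed to be an already-retrieved VkPhysicalDevice):

    #include <vulkan/vulkan.h>

    VkPhysicalDeviceFloatControlsPropertiesKHR float_controls = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR,
        .pNext = NULL,
    };
    VkPhysicalDeviceProperties2 props2 = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &float_controls,
    };

    vkGetPhysicalDeviceProperties2(physical_device, &props2);

    /* Only use the corresponding SPIR-V execution mode if the driver
       reports support for it. */
    if (float_controls.shaderDenormFlushToZeroFloat64) {
        /* Safe to use OpExecutionMode %main DenormFlushToZero 64. */
    }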

    The required bits to enable such capabilities in a SPIR-V shader are the following:

    1. Enable the extension: OpExtension "SPV_KHR_float_controls"
    2. Enable the desired capability. For example: OpCapability DenormFlushToZero
3. Specify where to apply it. For example, if we would like to flush to zero all fp64 denormals in the %main function of a shader: OpExecutionMode %main DenormFlushToZero 64. If we want to apply different modes, we would repeat that line with the needed ones.
    4. Profit!
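Putting those pieces together, the relevant lines of a SPIR-V module would look something like this (a sketch; %main is the shader’s entry point):

    OpCapability DenormFlushToZero
    OpExtension "SPV_KHR_float_controls"
    ...
    OpExecutionMode %main DenormFlushToZero 64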

I implemented support for this extension on Anvil’s supported GPUs (Broadwell, Skylake, Kabylake and newer), although we don’t support all the capabilities. For example, on Broadwell float16 denormals are not supported at all, and on the rest of the generations flushing float16 denormals to zero is not supported for all instructions.

If you are interested, the patches are now under review :-) As there is no real-world code using this feature yet, please file any bug you find about this in our bugzilla.

    December 05, 2018 03:57 PM

    December 04, 2018

    Iago Toral

    VK_KHR_shader_float16_int8 on Anvil

    The last time I talked about my driver work was to announce the implementation of the shaderInt16 feature for the Anvil Vulkan driver back in May, and since then I have been working on VK_KHR_shader_float16_int8, a new Vulkan extension recently announced by the Khronos group, for which I have just posted initial patches in mesa-dev supporting Broadwell and later Intel platforms.

As you probably guessed by the name, this extension enables Vulkan to consume SPIR-V shaders that use Float16 and Int8 types in arithmetic operations, extending the functionality included with VK_KHR_16bit_storage and VK_KHR_8bit_storage, which was limited to load/store operations. In theory, applications that do not need the range and precision of regular 32-bit floating point and integers can use these new types to improve performance by increasing ALU throughput and reducing register pressure, which on some platforms can also lead to improved parallelism.

    In the case of the Intel platforms initial testing done by Intel suggests that better ALU throughput is expected when issuing half-float instructions. Lower register pressure is also expected, at least for SIMD16 fragment and compute shaders, where we can pack all 16-channels worth of half-float data into a single GPU register, which could significantly improve performance for shaders that would otherwise need to spill registers to memory.

    Another neat thing is that while VK_KHR_shader_float16_int8 is a Vulkan extension, its implementation is mostly API agnostic, so most of the work we did here should also help us have a proper mediump implementation for GLSL ES shaders in the future.

    There are a few caveats to consider as well though: on some hardware platforms smaller bit-sizes have certain hardware restrictions that may lead to emitting worse shader code in some scenarios, and generally, Mesa’s compiler infrastructure (and the Intel compiler backend in particular) have a long history of being 32-bit only, so there are parts of the compiler stack that still work better for 32-bit code.

Because VK_KHR_shader_float16_int8 is a brand new feature, we don’t really have any real-world use cases yet. This is on top of the fact that Mesa’s compiler backends have been mostly (or exclusively) 32-bit aware until now (and more recently 64-bit too), so going forward I would expect a lot of focus on making our compiler as robust (and optimal) for 16-bit code as it is for 32-bit code.

    While we are already aware of a few areas where we can do better and I am currently working on addressing a few of these, one of the major limiting factors we have at the moment is the fact that the only source of 16-bit shaders available to us is the Khronos CTS, which due to its particular motivation, is very different from real world shader workloads and it is not a valid source material to drive compiler optimization work. Unfortunately, it might take some time until we start seeing applications using these new features, so in the meantime we will need to find other ways to drive further work in this area, and I think our best option here might be GLSL ES’s mediump and lowp qualifiers.

GLSL ES mediump and lowp qualifiers have been around for a long time, but they are only defined as hints to the shader compiler that lower precision is acceptable, and we have never really used them to emit half-float code. Thankfully, Topi Pohjolainen from Intel has been working on this for a while, which would open up a much better scenario for improving our 16-bit compiler paths, so this is something I am really looking forward to.

Finally, as I said above, we could definitely use more testing and feedback from real-world use cases, so if you decide to use this feature in your next project and you hit any bugs, please be sure to file them in Bugzilla so we can continue to improve our implementation.

    by Iago Toral at December 04, 2018 08:25 AM

    November 23, 2018

    Víctor Jáquez

    Building gst-msdk with MediaSDK opensource

I tried the open source version of Intel MediaSDK several months ago and it was a complete mess. In order to review some patches for gst-msdk I tried it again, and I am surprised by how much the situation has improved since then.

    Install dependencies

$ sudo apt install libva-dev vainfo cmake ccache
    $ sudo apt build-dep gstreamer1.0 gst-plugins-{base,good,bad}1.0
    $ sudo apt remove libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
    

Setting up the workspace

    $ sudo mkdir /opt/intel
$ sudo chown $USER:$USER /opt/intel
    $ mkdir ~/msdk
    $ cd ~/msdk
    

    Build MediaSDK

    It will be built in its source directory: ~/msdk/MediaSDK/build

    It will be installed in /opt/intel

    $ git clone https://github.com/Intel-Media-SDK/MediaSDK.git
    $ cd MediaSDK
    $ mkdir build
    $ cd build
    $ cmake ..
    $ make
    $ make install
    

    Build media-driver

    $ cd ~/msdk
    $ git clone https://github.com/intel/media-driver.git
    $ git clone https://github.com/intel/gmmlib.git
    $ mkdir build
    $ cd build
    $ cmake ../media-driver
    $ make
    

    Let’s install media-driver in /opt/intel too

    $ cd ~/msdk/build
    $ cp ./media_driver/iHD_drv_video.so /opt/intel
    

But don’t remove, rename or move the directory ~/msdk/build, because iHD_drv_video.so links against libigdgmm.so.5, which lives there. Thus, either keep the directory, install that library in a path searchable by the linker, or set the LD_LIBRARY_PATH environment variable.
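For example, a sketch of the last option (adjust the path if your build placed libigdgmm.so.5 in a subdirectory of ~/msdk/build):

    $ export LD_LIBRARY_PATH=~/msdk/build:$LD_LIBRARY_PATH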

    Test environment

    $ LIBVA_DRIVERS_PATH=/opt/intel LIBVA_DRIVER_NAME=iHD vainfo
      libva info: VA-API version 1.3.0
      libva info: va_getDriverName() returns -1
      libva info: User requested driver 'iHD'
      libva info: Trying to open /opt/intel/iHD_drv_video.so
      libva info: Found init function __vaDriverInit_1_3
      libva info: va_openDriver() returns 0
     vainfo: VA-API version: 1.3 (libva 2.2.0)
     vainfo: Driver version: Intel iHD driver - 1.0.0
     vainfo: Supported profile and entrypoints
       VAProfileNone                   : VAEntrypointVideoProc
       VAProfileNone                   : VAEntrypointStats
       VAProfileMPEG2Simple            : VAEntrypointVLD
       VAProfileMPEG2Simple            : VAEntrypointEncSlice
       VAProfileMPEG2Main              : VAEntrypointVLD
       VAProfileMPEG2Main              : VAEntrypointEncSlice
       VAProfileH264Main               : VAEntrypointVLD
       VAProfileH264Main               : VAEntrypointEncSlice
       VAProfileH264Main               : VAEntrypointFEI
       VAProfileH264Main               : VAEntrypointEncSliceLP
       VAProfileH264High               : VAEntrypointVLD
       VAProfileH264High               : VAEntrypointEncSlice
       VAProfileH264High               : VAEntrypointFEI
       VAProfileH264High               : VAEntrypointEncSliceLP
       VAProfileVC1Simple              : VAEntrypointVLD
       VAProfileVC1Main                : VAEntrypointVLD
       VAProfileVC1Advanced            : VAEntrypointVLD
       VAProfileJPEGBaseline           : VAEntrypointVLD
       VAProfileJPEGBaseline           : VAEntrypointEncPicture
       VAProfileH264ConstrainedBaseline: VAEntrypointVLD
       VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
       VAProfileH264ConstrainedBaseline: VAEntrypointFEI
       VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
       VAProfileVP8Version0_3          : VAEntrypointVLD
       VAProfileHEVCMain               : VAEntrypointVLD
       VAProfileHEVCMain               : VAEntrypointEncSlice
       VAProfileHEVCMain               : VAEntrypointFEI
    

    Setup gst-build

    It will be built in its source directory: ~/msdk/gst-build/build

    $ cd ~/msdk
    $ git clone https://gitlab.freedesktop.org/gstreamer/gst-build.git
    $ cd gst-build
    $ export INTELMEDIASDKROOT=/opt/intel/mediasdk
    $ meson build -Dpython=disabled -Dgst-plugins-bad:msdk=enabled
    $ ninja -C build
    

    Check for built elements

    $ ninja -C ~/msdk/gst-build/build  uninstalled
    [gst-master] $ GST_VAAPI_ALL_DRIVERS=1 \
                   LIBVA_DRIVERS_PATH=/opt/intel \
                   LIBVA_DRIVER_NAME=iHD \
                   gst-inspect-1.0 | egrep "vaapi|msdk"
    vaapi:  vaapijpegdec: VA-API JPEG decoder
    vaapi:  vaapimpeg2dec: VA-API MPEG2 decoder
    vaapi:  vaapih264dec: VA-API H264 decoder
    vaapi:  vaapivc1dec: VA-API VC1 decoder
    vaapi:  vaapivp8dec: VA-API VP8 decoder
    vaapi:  vaapih265dec: VA-API H265 decoder
    vaapi:  vaapipostproc: VA-API video postprocessing
    vaapi:  vaapidecodebin: VA-API Decode Bin
    vaapi:  vaapisink: VA-API sink
    vaapi:  vaapimpeg2enc: VA-API MPEG-2 encoder
    vaapi:  vaapih265enc: VA-API H265 encoder
    vaapi:  vaapijpegenc: VA-API JPEG encoder
    vaapi:  vaapih264enc: VA-API H264 encoder
    msdk:  msdkvpp: MSDK Video Postprocessor
    msdk:  msdkvc1dec: Intel MSDK VC1 decoder
    msdk:  msdkvp8enc: Intel MSDK VP8 encoder
    msdk:  msdkvp8dec: Intel MSDK VP8 decoder
    msdk:  msdkmpeg2enc: Intel MSDK MPEG2 encoder
    msdk:  msdkmpeg2dec: Intel MSDK MPEG2 decoder
    msdk:  msdkmjpegenc: Intel MSDK MJPEG encoder
    msdk:  msdkmjpegdec: Intel MSDK MJPEG decoder
    msdk:  msdkh265enc: Intel MSDK H265 encoder
    msdk:  msdkh265dec: Intel MSDK H265 decoder
    msdk:  msdkh264enc: Intel MSDK H264 encoder
    msdk:  msdkh264dec: Intel MSDK H264 decoder
    

    Remember

    Remember to export these environment variables (perhaps you want to create a script file to set them):

    export GST_VAAPI_ALL_DRIVERS=1
    export LIBVA_DRIVERS_PATH=/opt/intel
    export LIBVA_DRIVER_NAME=iHD
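
    With those variables exported, a quick smoke test inside the uninstalled environment could look like this (a hedged example using the msdkh264enc element listed above; any of the other encoders would do):

    $ gst-launch-1.0 videotestsrc num-buffers=100 ! \
        video/x-raw,format=NV12,width=640,height=480 ! \
        msdkh264enc ! h264parse ! filesink location=/tmp/test.h264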
    

    by vjaquez at November 23, 2018 05:05 PM

    November 14, 2018

    Asumu Takikawa

    Data Path Objects in VPP

    A while back, I wrote a blog post explaining some of the basics of writing plugins for the VPP networking toolkit.

    In that previous post, I explained a few mechanisms for hooking a plugin into VPP’s graph architecture so that your code can process incoming packets.

    I also briefly mentioned something called DPOs (data path objects) but didn’t explain what they are or how they work. Since then, I’ve been reading and hacking on code that involves DPOs, so I’d like to attempt to explain them in this post.

    (I’ll be assuming you’ve read that post or are already somewhat familiar with VPP; if that’s not the case, you may want to take a look at it first.)

    Data path objects

    Here’s how DPOs are defined in their main header file (dpo.h):

    A Data-Path Object is an object that represents actions that are applied to packets as they are switched through VPP’s data-path.

    So a DPO is an object, which means that it’s a value we can create and manipulate (via instances of dpo_id_t) and that also has some behavior (i.e., it has specialized methods or functions that do something).

    “As they are switched through” means that DPOs are activated by rules installed in VPP’s FIB (forwarding information base). For example, you can add a DPO that will act on IPv6 packets matching an address prefix that you choose.

    The job of a FIB is to maintain forwarding information so that the switch knows which interfaces to forward packets on. With DPOs, you can add entries to the FIB that tell VPP to forward packets via your DPO to a VPP node of your choosing instead (where presumably you will act on the packets somehow).

    You can see this at work by interacting with the FIB in VPP. Here’s an example CLI interaction:

    vpp# dslite set aftr-tunnel-endpoint-address 2001:db8:85a3::8a2e:370:1
    vpp# dslite add pool address 10.1.1.5
    vpp# show fib entry
    FIB Entries:
    [... omitted ...]
    7@2001:db8:85a3::8a2e:370:1/128
      unicast-ip6-chain
      [@0]: dpo-load-balance: [proto:ip6 index:9 buckets:1 uRPF:7 to:[0:0]]
        [0] [@19]: DS-Lite: AFTR:0
    8@10.1.1.5/32
      unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:10 buckets:1 uRPF:8 to:[0:0]]
        [0] [@12]: DS-Lite: AFTR:0
    

    The first two commands are part of the DS-Lite plugin and use some DPOs to set up a kind of IPv4 in IPv6 tunnel. You can see from the results of show fib entry that the commands have populated the FIB with entries for the given addresses and their associated DPO: DS-Lite: AFTR:0.

    The ability to tie into the FIB is why you may want to use DPOs instead of some of the mechanisms I mentioned in the previous blog post. For some applications, it could make sense to hook into a feature arc because you potentially want to look at all packets (e.g., a monitoring program like an IPFIX meter) or, say, all IP packets. But in other cases, you are only interested in packets going to a specific prefix (e.g., you are setting up an endpoint for a tunnel) and would like to take advantage of the FIB for that.

    DPO API

    In order to set up DPOs, you first create an interface of DPO functions for your own DPO type. I’ve been reading the DS-Lite implementation in VPP a lot recently so I’ll show some (simplified) examples from that.

    The typical pattern to use DPOs is to first create your own DPO type and create an API of DPO functions to use with that type. The first part of this API is a constructor function for making instances of the DPO, like dslite_dpo_create:

    dpo_type_t dslite_dpo_type;
    
    void
    dslite_dpo_create (dpo_proto_t dproto, index_t aftr_index, dpo_id_t * dpo)
    {
      dpo_set (dpo, dslite_dpo_type, dproto, aftr_index);
    }
    

    The dpo_set function takes a protocol constant, an index_t, and a dpo_id_t (a struct that identifies a particular DPO). It’s used to initialize the DPO. You could use dpo_set directly if you wanted by passing in the dslite_dpo_type, so dslite_dpo_create is effectively a partial application of dpo_set.

    A use of the constructor in your API client’s code might look like this:

    /* declaration & temp initialization of DPO */
    dpo_id_t my_dpo = DPO_INVALID;
    
    /* initialize DPO for desired protocol */
    dslite_dpo_create(DPO_PROTO_IP6, 0, &my_dpo);
    

    The constructor takes a few arguments, namely the protocol to use, an index for the DPO, and a pointer to the DPO that’s going to be initialized.

    The most interesting argument is the protocol, which in this case is DPO_PROTO_IP6. You pass in a protocol at construction time because:

    • DPOs can be specialized to work on packets with a specific protocol because the actions you take on them are specialized, and
    • DPOs can be used with more than one protocol type, for example both IPv4 and IPv6.

    In particular, you can also send packets to different nodes depending on the protocol that is matched. This is set up with some additional data structures in the API code like this:

    const static char *const dslite_ce_ip4_nodes[] = {
      "dslite-ce-encap",
      NULL,
    };
    
    const static char *const dslite_ce_ip6_nodes[] = {
      "dslite-ce-decap",
      NULL,
    };
    
    const static char *const *const dslite_nodes[DPO_PROTO_NUM] = {
      [DPO_PROTO_IP4] = dslite_ce_ip4_nodes,
      [DPO_PROTO_IP6] = dslite_ce_ip6_nodes,
      [DPO_PROTO_MPLS] = NULL
    };
    

    The code above basically constructs a table mapping DPO protocol to arrays of node names. The table doesn’t have to be exhaustive (relative to all protocols that DPOs work on), but it should cover whatever protocols you want to use with your particular DPO type.

    The nodes specified in your mapping are actually registered for a DPO type by calling the dpo_register_new_type function:

    void
    dslite_dpo_module_init (void)
    {
      dslite_dpo_type = dpo_register_new_type (&dslite_dpo_vft, dslite_nodes);
    }
    

    This dslite_dpo_module_init function is called from the NAT plugin’s initialization function (the DS-Lite code is a part of the NAT code). If you write your own DPO API, you’ll need to register the new DPO type in your VPP plugin’s initialization code.
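
    For reference, here is a minimal sketch of what that registration hook could look like, assuming VPP’s usual VLIB_INIT_FUNCTION mechanism (my_plugin_init is a hypothetical name, not code from the DS-Lite plugin):

    #include <vlib/vlib.h>
    
    static clib_error_t *
    my_plugin_init (vlib_main_t * vm)
    {
      /* hypothetical init hook: register the DPO type (and its nodes)
         once at startup */
      dslite_dpo_module_init ();
      return 0;
    }
    
    VLIB_INIT_FUNCTION (my_plugin_init);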

    You might be wondering where the object-oriented aspect of DPOs comes from, given the allusion in the name. When defining your DPO API, you also define a virtual function table struct (dpo_vft_t) that is passed to the dpo_register_new_type call shown above. That table might look like this:

    const static dpo_vft_t dslite_dpo_vft = {
      .dv_lock = dslite_dpo_lock,
      .dv_unlock = dslite_dpo_unlock,
      .dv_format = format_dslite_dpo,
    };
    

    The fields are basically methods that you implement for the DPO type. For the DS-Lite example, these functions do very little, so I’ll just give a rough sketch below.
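
    The following is a simplified sketch (not the literal DS-Lite code), assuming VPP’s usual format-function calling convention, in which the DPO’s index and an indent level arrive via varargs:

    static void
    dslite_dpo_lock (dpo_id_t * dpo)
    {
      /* nothing to reference-count in this simplified example */
    }
    
    static void
    dslite_dpo_unlock (dpo_id_t * dpo)
    {
    }
    
    static u8 *
    format_dslite_dpo (u8 * s, va_list * args)
    {
      index_t index = va_arg (*args, index_t);
      CLIB_UNUSED (u32 indent) = va_arg (*args, u32);
    
      /* this is the text shown in the "show fib entry" output earlier */
      return format (s, "DS-Lite: AFTR:%d", index);
    }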

    Using DPOs in forwarding

    Once you’ve defined your DPO type and API, you can use it to forward packets to your VPP graph nodes. In order to hook your DPO up to the FIB, which lets you switch packets to your nodes, you need to construct a DPO instance in your plugin code and then call a function that registers FIB entries.

    This example code from the DS-Lite implementation illustrates some of this:

    /* recall constructor examples from earlier */
    dslite_dpo_create (DPO_PROTO_IP6, 0, &dpo);
    
    /* FIB prefix data structure, used below */
    fib_prefix_t pfx = {
      .fp_proto = FIB_PROTOCOL_IP6,
      .fp_len = 128,
      .fp_addr.ip6.as_u64[0] = addr->as_u64[0],
      .fp_addr.ip6.as_u64[1] = addr->as_u64[1],
    };
    
    /* register FIB entry for DPO */
    fib_table_entry_special_dpo_add (0,
                                     &pfx,
                                     /* if you're writing a plugin you use this,
                                        some other DPO code uses other constants */
                                     FIB_SOURCE_PLUGIN_HI,
                                     FIB_ENTRY_FLAG_EXCLUSIVE,
                                     &dpo);
    

    The excerpt above is doing a few things:

    1. Constructs a DPO and puts it in the dpo variable,
    2. Declares a FIB prefix (fib_prefix_t) used for switching (which could be an address for IP, or a label for MPLS), and
    3. Adds an entry to the FIB using the DPO and FIB prefix.

    With that information added, the FIB can start switching packets to the nodes specified in your DPO (in this case to dslite-ce-decap). When packets go to your node, they are processed like in any other VPP node that you write.

    To explain what the above excerpt is doing a bit more concretely, recall that this example is taken from the DS-Lite code. DS-Lite is a mechanism for sending IPv4 traffic tunneled (i.e., encapsulated) over an IPv6 network.

    The DPO above is associated with the server endpoint for a DS-Lite tunnel that does NAT on the inner (decapsulated) packet, which means the server only receives encapsulated packets addressed to its IPv6 address.

    Therefore the prefix is an IPv6 address (put inside .fp_addr.ip6 above) and the prefix length is 128, the full length of the address.

    In other networking setups, you may have a different kind of tunnel in which the client does NAT rather than the server. In this case, you might set a prefix length that isn’t the full length of an address, and instead corresponds to however you allocate your NAT addresses (see the MAP-E code in VPP for an example of this).

    The point here is that DPOs let you use the typical prefix forwarding capabilities of IP (or MPLS, etc) to hook up packets to your VPP node.

    Further reading

    Hopefully this blog post made it a bit clearer why you might want to use DPOs in your own VPP code and how to start doing so. To learn more, I would suggest just reading examples in the code base (files ending with _dpo.c and _dpo.h are helpful, and then look for uses of those API functions).

    The DS-Lite code is also relatively simple and easy to read. The main files for the DPO code in DS-Lite are dslite_dpo.c and dslite.c.

    by Asumu Takikawa at November 14, 2018 06:40 PM

    November 12, 2018

    Michael Catanzaro

    The GNOME (and WebKitGTK+) Networking Stack

    WebKit currently has four network backends:

    • CoreFoundation (used by macOS and iOS, and thus Safari)
    • CFNet (used by iTunes on Windows… I think only iTunes?)
    • cURL (used by most Windows applications, also PlayStation)
    • libsoup (used by WebKitGTK+ and WPE WebKit)

    One guess which of those we’re going to be talking about in this post. Yeah, of course, libsoup! If you’re not familiar with libsoup, it’s the GNOME HTTP library. Why is it called libsoup? Because before it was an HTTP library, it was a SOAP library. And apparently somebody thought that when Mexican people say “soap,” it often sounds like “soup,” and also thought that this was somehow both funny and a good basis for naming a software library. You can’t make this stuff up.

    Anyway, libsoup is built on top of GIO’s sockets APIs. Did you know that GIO has Object wrappers for BSD sockets? Well it does. If you fancy lower-level APIs, create a GSocket and have a field day with it. Want something a bit more convenient? Use GSocketClient to create a GSocketConnection connected to a GNetworkAddress. Pretty straightforward. Everything parallels normal BSD sockets, but the API is nice and modern and GObject, and that’s really all there is to know about it. So when you point WebKitGTK+ at an HTTP address, libsoup is using those APIs behind the scenes to handle connection establishment. (We’re glossing over details like “actually implementing HTTP” here. Trust me, libsoup does that too.)
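
    Here’s a minimal sketch of what that looks like in C (error handling trimmed; example.com and port 80 are placeholders):

    #include <gio/gio.h>
    
    int
    main (void)
    {
      g_autoptr(GError) error = NULL;
      g_autoptr(GSocketClient) client = g_socket_client_new ();
    
      /* one call resolves the name and connects; calling
       * g_socket_client_set_tls (client, TRUE) beforehand would hand you
       * a TLS connection behind the very same GIOStream API */
      g_autoptr(GSocketConnection) connection =
        g_socket_client_connect_to_host (client, "example.com", 80,
                                         NULL, &error);
      if (!connection)
        {
          g_printerr ("failed to connect: %s\n", error->message);
          return 1;
        }
    
      GOutputStream *out =
        g_io_stream_get_output_stream (G_IO_STREAM (connection));
      g_output_stream_write_all (out, "HEAD / HTTP/1.0\r\n\r\n", 19,
                                 NULL, NULL, &error);
      return 0;
    }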

    Things get more fun when you want to load an HTTPS address, since we have to add TLS to the picture, and we can’t have TLS code in GIO or GLib due to this little thing called “copyright law.” See, there are basically three major libraries used to implement TLS on Linux, and they all have problems:

    • OpenSSL is by far the most popular, but it’s, hm, shall we say technically non-spectacular. There are forks, but the forks have problems too (ask me about BoringSSL!), so forget about them. The copyright problem here is that the OpenSSL license is incompatible with the GPL. (Boring details: Red Hat waves away this problem by declaring OpenSSL a system library qualifying for the GPL’s system library exception. Debian has declared the opposite, so Red Hat’s choice doesn’t gain you anything if you care about Debian users. The OpenSSL developers are trying to relicense to the Apache license to fix this, but this process is taking forever, and the Apache license is still incompatible with GPLv2, so this would make it impossible to use GPLv2+ software except under the terms of GPLv3+. Yada yada details.) So if you are writing a library that needs to be used by GPL applications, like say GLib or libsoup or WebKit, then it would behoove you to not use OpenSSL.
    • GnuTLS is my favorite from a technical standpoint. Its license is LGPLv2+, which is unproblematic everywhere, but some of its dependencies are licensed LGPLv3+, and that’s uncomfortable for many embedded systems vendors, since LGPLv3+ contains some provisions that make it difficult to deny you your freedom to modify the LGPLv3+ software. So if you rely on embedded systems vendors to fund the development of your library, like say libsoup or WebKit, then you’re really going to want to avoid GnuTLS.
    • NSS is used by Firefox. I don’t know as much about it, because it’s not as popular. I get the impression that it’s more designed for the needs of Firefox than as a Linux system library, but it’s available, and it works, and it has no license problems.

    So naturally GLib uses NSS to avoid the license issues of OpenSSL and GnuTLS, right?

    Haha no, it uses a dynamically-loadable extension point system to allow you to pick your choice of OpenSSL or GnuTLS! (Support for NSS was started but never finished.) This is OK because embedded systems vendors don’t use GPL applications and have no problems with OpenSSL, while desktop Linux users don’t produce tivoized embedded systems and have no problems with LGPLv3. So if you’re using desktop Linux and point WebKitGTK+ at an HTTPS address, then GLib is going to load a GIO extension point called glib-networking, which implements all of GIO’s TLS APIs — notably GTlsConnection and GTlsCertificate — using GnuTLS. But if you’re building an embedded system, you simply don’t build or install glib-networking, and instead build a different GIO extension point called glib-openssl, and libsoup will create GTlsConnection and GTlsCertificate objects based on OpenSSL instead. Nice! And if you’re Centricular and you’re building GStreamer for Windows, you can use yet another GIO extension point, glib-schannel, for your native Windows TLS goodness, all hidden behind GTlsConnection so that GStreamer (or whatever application you’re writing) doesn’t have to know about SChannel or OpenSSL or GnuTLS or any of that sad complexity.

    Now you know why the TLS extension point system exists in GIO. Software licenses! And you should not be surprised to learn that direct use of any of these crypto libraries is banned in libsoup and WebKit: we have to cater to both embedded system developers and to GPL-licensed applications. All TLS library use is hidden behind the GTlsConnection API, which is really quite nice to use because it inherits from GIOStream. You ask for a TLS connection, have it handed to you, and then read and write to it without having to deal with any of the crypto details.

    As a recap, the layering here is: WebKit -> libsoup -> GIO (GLib) -> glib-networking (or glib-openssl or glib-schannel).

    So when Epiphany fails to load a webpage, and you’re looking at a TLS-related error, glib-networking is probably to blame. If it’s an HTTP-related error, the fault most likely lies in libsoup. Same for any other GNOME applications that are having connectivity troubles: they all use the same network stack. And there you have it!

    P.S. The glib-openssl maintainers are helping merge glib-openssl into glib-networking, such that glib-networking will offer a choice of GnuTLS or OpenSSL, making glib-openssl obsolete. This is still a work in progress. glib-schannel will be next!

    P.P.S. libcurl also gives you multiple choices of TLS backend, but makes you choose which at build time, whereas with GIO extension points it’s actually possible to choose at runtime from the selection of installed extension points. The libcurl approach is fine in theory, but creates some weird problems, e.g. different backends with different bugs are used on different distributions. On Fedora, it used to use NSS, but now uses OpenSSL, which is fine for Fedora, but would be a license problem elsewhere. Debian actually builds several different backends and gives you a choice, unlike everywhere else. I digress.

    by Michael Catanzaro at November 12, 2018 04:51 AM

    November 07, 2018

    Javier Fernández

    CSS Grid on LayoutNG: a Web Engines Hackfest story

    I had the pleasure to attend the Web Engines Hackfest last week, hosted and organized by Igalia in its HQ in A Coruña. I’m really proud of what we are achieving with this event; thanks so much to everybody involved in the organization and, especially, to the people attending. This year we had a lot of talent in a single place, hacking and sharing their expertise on the Web Platform; I really think we all pushed the Web Platform forward during these days.

    As you may already know, I’m part of the Web Platform team at Igalia working on the implementation of the CSS Grid Layout feature for Blink and WebKit web engines. This work has been sponsored by Bloomberg, as part of the collaboration we started several years ago to improve the Web Platform in many areas, from JS (V8, JSC and even ChakraCore) to different modules of the layout engine (eg. CSS features, editing/selection).

    Since the day I received the invitation to attend the hackfest, I knew that one of the tasks I wanted to hack on was the implementation of the CSS Grid feature in Chromium’s new layout engine (still experimental), known as LayoutNG. Having Christian Biesinger, one of the Google engineers working on LayoutNG, here for 3 days was too good an opportunity to pass up, so I decided to at least start this task. I asked him to give a lightning talk during the layout breakout session about the current status of the LayoutNG project, with a brief explanation of some of the most relevant details of its logic and its advantages over the current layout.

    Layout Breakout Session

    A small group of people interested in layout met to discuss the future of layout in different web engines. I attended with some folks representing Igalia, along with other people from Mozilla, Google, WebKit and ARM.

    Christian described the key parts of the new LayoutNG, which provides simpler code and generally better performance (although results are still preliminary since it’s still in development). The concept of fragments gained relevance in this new layout model. Currently, inline and block layout is basically complete; the multicolumn layout is quite advanced, while Flexbox is still in the early stages (although a substantial portion of the layout tests are passing now).

    We discussed the different strategies that Firefox, Chrome and Safari are following to redesign their layout logic, which carries a huge legacy codebase required to support the old web. Browsers need to adapt to the new layout models that a modern Web Platform requires. Chrome is making a clear bet with LayoutNG, with a big team and strong determination; it seems it’ll be ready to ship in the first months of 2019. Firefox is also starting to implement a new layout design with Servo, although I couldn’t get details about its current status and plans. Finally, a few months ago WebKit started a new project called Next-Generation layout, which tries to implement a new Layout Formatting Context (LFC) logic from scratch, getting rid of the huge technical debt acquired over the last years; although I couldn’t get confirmation, my impression is that it’s still an experimental project.

    We also had time to talk about the effort ARM is doing towards better parallelization of the CSS parsing and style recalc logic, following an approach similar to Mozilla’s Stylo in Servo. It’s a very interesting initiative, but still quite experimental. There is some progress on specific codepaths, but it is still dealing with Oilpan (Blink’s garbage collector), which is the root cause of several issues that prevent effective parallelization.

    Hacking, hacking, hacking, ….

    As I commented, this event is designed precisely to gather together some of the most brilliant minds on the Web Platform to discuss, analyze and hack on complex topics that are usually very difficult to handle when working remotely. I had a clear hacking task this time, so that’s why I decided to focus a bit more on coding. Although I had already assumed that implementing CSS Grid in LayoutNG would be a huge challenge, I decided to take it on and at least start the task. I took the Flexible Box implementation as a reference, which is under development right now and something Christian was partially involved in.

    As happened with the Flexible Box implementation, the first step was to redesign the logic so that we can get rid of the dependency on the old layout tree, in this case, the LayoutGrid class. This has been a complex and long task, which took up quite a big part of my time during the hackfest. The following diagrams show the redesign effort achieved, which I’d admit is still a preliminary approach that needs to be refined:

    The next step was to implement a skeleton of the new LayoutNG grid algorithm. Thanks to Christian’s direction, I quickly figured out how to do it, and it looks something like this:

    namespace blink {
    
    NGGridLayoutAlgorithm::NGGridLayoutAlgorithm(NGBlockNode node,
                                                 const NGConstraintSpace& space,
                                                 NGBreakToken* break_token)
        : NGLayoutAlgorithm(node, space, ToNGBlockBreakToken(break_token)) {}
    
    scoped_refptr<NGLayoutResult> NGGridLayoutAlgorithm::Layout() {
      return container_builder_.ToBoxFragment();
    }
    
    base::Optional<MinMaxSize> NGGridLayoutAlgorithm::ComputeMinMaxSize(
        const MinMaxSizeInput& input) const {
      // TODO Implement this.
      return base::nullopt;
    }
    
    }  // namespace blink
    

    Finally, I tried to implement the Grid layout algorithm, according to the CSS Grid Layout specification, using the new LayoutNG APIs. This is the most complex task, since I still have to learn how sizing and positioning functions are used in the new layout logic, especially how to use the new Fragments and ContainerBuilder concepts.

    I submitted a WIP CL so that anybody can take a look, give suggestions or continue the work. My plan is to devote some time to this challenge every now and then, but I can’t set specific goals or a schedule for the time being. If anybody wants to speed up this task, perhaps it’d be possible to fund a project, in which Igalia would be happy to participate.

    Other Web Engines Hackfest stories

    I also tried to attend some of the talks given during the hackfest and participate in a few breakout sessions. I’ll give my impressions of some of the ones I liked most.

    I really enjoyed the one given by Camille Lamy, Colin Blundell and Robert Kroeger (Google) about Chrome’s Servicification project. The new services design they are implementing is awesome, and it will surely improve Chrome’s modularity and codebase maintenance.

    I participated in the MathML breakout session, which is somewhat related to LayoutNG. Igalia launched a crowdfunding campaign to implement the MathML specification in Chrome, using the new LayoutNG APIs. We think that MathML could be a great success case for the new LayoutNG APIs, which have the goal of providing a stable API for implementing new and complex layout models. This model will give the web engine flexibility, providing an easier way to implement new layout models without depending too much on the Chrome development cycle. In a way, this development model could be similar to a polyfill, but integrated in the browser as native code instead of via external libraries.

    by jfernandez at November 07, 2018 12:23 PM

    Michael Catanzaro

    Mesa Update Breaks WebKitGTK+ in Fedora 29

    If you’re using Fedora and discovered that WebKitGTK+ is displaying blank pages, the cause is a bad mesa update, mesa-18.2.3-1.fc29. This in turn was caused by a GCC bug that resulted in miscompilation of mesa.

    To avoid this bug, downgrade to mesa-18.2.2-1.fc29:

    $ sudo dnf downgrade mesa*
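
    If you want to keep dnf from pulling the broken build back in on your next update, the versionlock plugin can pin the downgraded packages (the plugin ships in a separate package; treat the exact package name as an assumption for your release):

    $ sudo dnf install python3-dnf-plugin-versionlock
    $ sudo dnf versionlock add mesa\*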

    You can also update to mesa-18.2.4-2.fc29, but this build has not yet reached updates-testing, let alone stable, so downgrading is easier for now. Another workaround is to run your application with accelerated compositing mode disabled, to avoid OpenGL usage:

    $ WEBKIT_DISABLE_COMPOSITING_MODE=1 epiphany

    On the bright side of things, from all the bug reports I’ve received over the past two days I’ve discovered that lots of people use Epiphany and notice when it’s broken. That’s nice!

    Huge thanks to Dave Airlie for quickly preparing the fixed mesa update, and to Jakub Jelínek for handling the same for GCC.

    by Michael Catanzaro at November 07, 2018 02:28 AM

    November 03, 2018

    Michael Catanzaro

    WebKitGTK+ 2.22.2 and 2.22.3, Media Source Extensions, and YouTube

    Last month, I attended the Web Engines Hackfest (hosted by Igalia in A Coruña, Spain) and also the WebKit Contributors Meeting (hosted by Apple in San Jose, California). These are easily the two biggest WebKit development events of the year, and it’s always amazing to meet everyone in person yet again. A Coruña is an amazing city, and every browser developer ought to visit at least once. And the Contributors Meeting is a no-brainer event for WebKit developers.

    One of the main discussion points this year was Media Source Extensions (MSE). MSE is basically a way for browsers to control how videos are downloaded. Until recently, if you were to play a YouTube video in Epiphany, you’d notice that the video loads way faster than it does in other browsers. This is because WebKitGTK+ — until recently — had no support for MSE. In other browsers, YouTube uses MSE to limit the speed at which video is downloaded, in order to reduce wasted bandwidth in case you stop watching the video before it ends. But with WebKitGTK+, MSE was not available, so videos would load as quickly as possible. MSE also makes it harder for browsers to offer the ability to download the videos; you’ll notice that neither Firefox nor Chrome offer to download the videos in their context menus, a feature that’s been available in Epiphany for as long as I remember.

    So that sounds like it’s good to not have MSE. Well, the downside is that YouTube requires it in order to receive HD videos, to avoid that wasted bandwidth and to make it harder for users to download HD videos. And so WebKitGTK+ users have been limited to 720p video with H.264 and 480p video with WebM, where other browsers had access to 1080p and 1440p video. I’d been stuck with 480p video on Fedora for so long, I’d forgotten that internet video could look good.

    Unfortunately, WebKitGTK+ was quite late to implement MSE. All other major browsers turned it on several years ago, but WebKitGTK+ dawdled. There was some code to support MSE, but it didn’t really work, and was disabled. And so it came to pass that, in September of this year, YouTube began to require MSE to access any WebM video, and we had a crisis. We don’t normally enable major new features in stable releases, but this was an exceptional situation and users would not be well-served by delaying until the next release cycle. So within a couple weeks, we were able to release WebKitGTK+ 2.22.2 and Epiphany 3.30.1 (both on September 21), and GStreamer 1.14.4 (on October 2, thanks to Tim-Philipp Müller for expediting that release). Collectively, these releases enabled basic video playback with MSE for users of GNOME 3.30. And if you still use GNOME 3.28, worry not: you are still supported and can get MSE if you update to Epiphany 3.28.5 and also have the aforementioned versions of WebKitGTK+ and GStreamer.

    MSE in WebKitGTK+ 2.22.2 had many rough edges because it was a mad rush to get the feature into a minimally-viable state, but those issues have been polished off in 2.22.3, which we released earlier this week on October 29. Be sure you have WebKitGTK+ 2.22.3, plus GStreamer 1.14.4, for a good experience on YouTube. Unfortunately we can’t provide support for older software versions anymore: if you don’t have GStreamer 1.14.4, then you’ll need to configure WebKitGTK+ with -DENABLE_MEDIA_SOURCE=OFF at build time and suffer from lack of MSE.

    Epiphany 3.28.1 uses WebKitSettings to turn on the “enable-mediasource” setting. Turn that on if your application wants MSE now (if it’s a web browser, it certainly does). This setting will be enabled by default in WebKitGTK+ 2.24. Huge thanks to the talented developers who made this feature possible! Enjoy your 1080p and 1440p video.
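
    In C, flipping that setting on looks roughly like this (a sketch against the WebKitGTK+ 2.22 API; webkit_settings_set_enable_mediasource is the setter behind the “enable-mediasource” property):

    #include <gtk/gtk.h>
    #include <webkit2/webkit2.h>
    
    static GtkWidget *
    create_web_view_with_mse (void)
    {
      GtkWidget *view = webkit_web_view_new ();
      WebKitSettings *settings =
        webkit_web_view_get_settings (WEBKIT_WEB_VIEW (view));
    
      /* off by default until WebKitGTK+ 2.24 */
      webkit_settings_set_enable_mediasource (settings, TRUE);
      return view;
    }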

    by Michael Catanzaro at November 03, 2018 04:19 AM

    On WebKit Build Options (Also: How to Accidentally Disable Important Security Features!)

    When building WebKitGTK+, it’s a good idea to stick to the default values for the build options. If you’re building some sort of embedded system and really know what you’re doing, then OK, it might make sense to change some settings and disable some stuff. But Linux distros are generally well-advised to stick to the defaults to avoid creating problems for users.

    One exception is if you need to disable certain features to avoid newer dependencies when building WebKit for older systems. For example, Ubuntu 18.04 disables web fonts (ENABLE_WOFF2=OFF) because it doesn’t have the libbrotli and libwoff2 dependencies that are required for that feature to work, hence some webpages will display using subpar fonts. And distributions shipping older versions of GStreamer will need to disable the ENABLE_MEDIA_SOURCE option (which is missing from the below feature list by mistake), since that requires the very latest GStreamer to work.

    Other exceptions are the ENABLE_GTKDOC and ENABLE_MINIBROWSER settings, which distros do want. ENABLE_GTKDOC is disabled by default because it’s slow to build, and ENABLE_MINIBROWSER because, well, actually I don’t know why, you always want that one and it’s just annoying to find it’s not built.

    OK, but really now, other than those exceptions, you should probably leave the defaults alone.

    The feature list that prints when building WebKitGTK+ looks like this:

    --  ENABLE_ACCELERATED_2D_CANVAS .......... OFF
    --  ENABLE_DRAG_SUPPORT                     ON
    --  ENABLE_GEOLOCATION .................... ON
    --  ENABLE_GLES2                            OFF
    --  ENABLE_GTKDOC ......................... OFF
    --  ENABLE_ICONDATABASE                     ON
    --  ENABLE_INTROSPECTION .................. ON
    --  ENABLE_JIT                              ON
    --  ENABLE_MINIBROWSER .................... OFF
    --  ENABLE_OPENGL                           ON
    --  ENABLE_PLUGIN_PROCESS_GTK2 ............ ON
    --  ENABLE_QUARTZ_TARGET                    OFF
    --  ENABLE_SAMPLING_PROFILER .............. ON
    --  ENABLE_SPELLCHECK                       ON
    --  ENABLE_TOUCH_EVENTS ................... ON
    --  ENABLE_VIDEO                            ON
    --  ENABLE_WAYLAND_TARGET ................. ON
    --  ENABLE_WEBDRIVER                        ON
    --  ENABLE_WEB_AUDIO ...................... ON
    --  ENABLE_WEB_CRYPTO                       ON
    --  ENABLE_X11_TARGET ..................... ON
    --  USE_LIBHYPHEN                           ON
    --  USE_LIBNOTIFY ......................... ON
    --  USE_LIBSECRET                           ON
    --  USE_SYSTEM_MALLOC ..................... OFF
    --  USE_WOFF2                               ON

    And, aside from the exceptions noted above, those are probably the options you want to ship with.

    Why are some things disabled by default? ENABLE_ACCELERATED_2D_CANVAS is OFF by default because it is experimental (i.e. not great :) and requires CairoGL, which has been available in most distributions for about half a decade now, but still hasn’t reached Debian yet, because the Debian developers know that the Cairo developers consider CairoGL experimental (i.e. not great!). Many of our developers use Debian, and we’re not keen on having two separate sets of canvas bugs depending on whether you’re using Debian or not, so best keep this off for now. ENABLE_GLES2 switches you from desktop GL to GLES, which is maybe needed for embedded systems with crap proprietary graphics drivers, but certainly not what you want when building for a general-purpose distribution with mesa. Then ENABLE_QUARTZ_TARGET is for building on macOS, not for Linux. And then we come to USE_SYSTEM_MALLOC.

    USE_SYSTEM_MALLOC disables WebKit’s bmalloc memory allocator (“fast malloc”) in favor of glibc malloc. bmalloc is performance-optimized for macOS, and I’m uncertain how its performance compares to glibc malloc on Linux. Doesn’t matter really, because bmalloc contains important heap security features that will be disabled if you switch to glibc malloc, and that’s all you need to know to decide which one to use. If you disable bmalloc, you lose the Gigacage, isolated heaps, heap subspaces, etc. I don’t pretend to understand how any of those things work, so I’ll just refer you to this explanation by Sam Brown, who sounds like he knows what he’s talking about. The point is that, if an attacker has found a memory vulnerability in WebKit, these heap security features make it much harder to exploit and take control of users’ computers, and you don’t want them turned off.

    USE_SYSTEM_MALLOC is currently enabled (bad!) in openSUSE and SUSE Linux Enterprise 15, presumably because when the Gigacage was originally introduced, it crashed immediately for users who set address space (virtual memory allocation) limits. Gigacage works by allocating a huge address space to reduce the chances that an attacker can find pointers within that space, similar to ASLR, so limiting the size of the address space prevents Gigacage from working. At first we thought it made more sense to crash than to allow a security feature to silently fail, but we got a bunch of complaints from users who use ulimit to limit the address space used by processes, and also from users who disable overcommit (which is required for Gigacage to allocate ludicrous amounts of address space), and so nowadays we just silently disable Gigacage instead if enough address space for it cannot be allocated. So hopefully there’s no longer any reason to disable this important security feature at build time! Distributions should be building with the default USE_SYSTEM_MALLOC=OFF.
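
    To illustrate the address-space interaction described above (a hypothetical demonstration, not taken from a bug report): with a cap in place, Gigacage’s huge reservation cannot succeed, and it now turns itself off silently instead of crashing.

    $ ulimit -v 4194304    # cap this shell's address space at 4 GiB
    $ epiphany             # Gigacage quietly disables itself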

    The openSUSE CMake line currently looks like this:

    %cmake \
      -DCMAKE_BUILD_TYPE=Release \
      -DLIBEXEC_INSTALL_DIR=%{_libexecdir}/libwebkit2gtk%{_wk2sover} \
      -DPORT=GTK \
    %if 0%{?suse_version} == 1315
      -DCMAKE_C_COMPILER=gcc-7 \
      -DCMAKE_CXX_COMPILER=g++-7 \
      -DENABLE_WEB_CRYPTO=OFF \
      -DUSE_GSTREAMER_GL=false \
    %endif
    %if 0%{?suse_version} <= 1500
      -DUSE_WOFF2=false \
    %endif
      -DENABLE_MINIBROWSER=ON \
    %if %{with python3}
      -DPYTHON_EXECUTABLE=%{_bindir}/python3 \
    %endif
    %if !0%{?is_opensuse}
      -DENABLE_PLUGIN_PROCESS_GTK2=OFF \
    %endif
    %ifarch armv6hl ppc ppc64 ppc64le riscv64 s390 s390x
      -DENABLE_JIT=OFF \
    %endif
      -DUSE_SYSTEM_MALLOC=ON \
      -DCMAKE_EXE_LINKER_FLAGS="-Wl,--as-needed -Wl,-z,now -pthread" \
      -DCMAKE_MODULE_LINKER_FLAGS="-Wl,--as-needed -Wl,-z,now -pthread" \
      -DCMAKE_SHARED_LINKER_FLAGS="-Wl,--as-needed -Wl,-z,now -pthread"

    which all looks pretty reasonable to me: certain features that require “newer” dependencies are disabled on the old distros, and NPAPI plugins are not supported in the enterprise distro, and JIT doesn’t work on odd architectures. I would remove the ENABLE_JIT=OFF lines only because WebKit’s build system should be smart enough nowadays to disable it automatically to save you the trouble of thinking about which architectures the JIT works on. And I would also remove the -DUSE_SYSTEM_MALLOC=ON line to ensure users are properly protected.

    by Michael Catanzaro at November 03, 2018 03:29 AM

    October 25, 2018

    José Dapena

    3 events in a month

    As part of my job at Igalia, I have been attending 2-3 events per year. My role, mostly as a Chromium stack engineer, is not usually very demanding regarding conference trips, but they are quite important as an opportunity to meet collaborators and project mates.

    This month has been a bit different, as I ended up visiting Santa Clara LG Silicon Valley Lab in California, Igalia headquarters in A Coruña, and Dresden. It was mostly because I got involved in the discussions for the web runtime implementation being developed by Igalia for AGL.

    AGL f2f at LGSVL

    It is always great to visit the LG Silicon Valley Lab (Santa Clara, US), where my team is located. I have been participating for 6 years in the development of the webOS web stack that you can most prominently enjoy in LG webOS smart TVs.

    One of the goals for the next months at AGL is providing an efficient web runtime. At LGSVL we have been developing and maintaining WAM, the webOS web runtime. And as it was released under an open source license in webOS Open Source Edition, it looked like a great match for AGL. So my team did a proof of concept in May and it was successful. At the same time, Igalia has been working on porting the Chromium browser to AGL. So, after some discussions, AGL approved sponsoring my company, Igalia, to port the LG webOS web runtime to AGL.

    As LGSVL was hosting the September 2018 AGL f2f meeting, Igalia sponsored my trip to the event.

    AGL f2f Santa Clara 2018, AGL wiki CC BY 4.0

    So we took the opportunity to continue discussions and make progress on the WAM AGL port. And, as we expected, it was quite beneficial for unblocking tasks like the AGL app framework security integration and the support of AGL’s latest official release, Funky Flounder. Julie Kim from Igalia attended the event too, and presented an update on the progress of the Ozone Wayland port.

    The organization and the venue were great. Thanks to LGSVL!

    Web Engines Hackfest 2018 at Igalia

    Next trip was definitely closer. Just 90 minutes drive to our Igalia headquarters in A Coruña.


    Igalia has been organizing this event since 2009. It is a cross-web-engine event, where engineers of Mozilla, Chromium and WebKit have been meeting yearly to do some hacking, and discuss the future of the web.

    This time my main interest was participating in the discussions about the effort by Igalia and Google to support Wayland natively in Chromium. I was pleased to learn that around 90% of the work has already landed in upstream Chromium. Great news, as it will smooth the integration of Chromium for embedders using Ozone Wayland, like webOS. It was also great to hear about the work to improve GPU performance by reducing the number of copies required for painting web contents.

    Web Engines Hackfest 2018 CC BY-SA 2.0

    Other topics of my interest:
    – We did a follow-up of the discussion in last BlinkOn about the barriers for Chromium embedders, sharing the experiences maintaining a downstream Chromium tree.
    – Joined the discussions about the future of WebKitGTK. In particular the graphics pipeline adaptation to the upcoming GTK+ 4.

    As usual, the organization was great. We had 70 people at the event, and it was awesome to see all the activity in the office, and so many talented engineers in the same place. Thanks Igalia!

    Web Engines Hackfest 2018 CC BY-SA 2.0

    AGL All Members Meeting Europe 2018 at Dresden

    The last event in barely a month was my first visit to the beautiful town of Dresden (Germany).

    The goal was continuing the discussions on the projects Igalia is developing for the AGL platform: Chromium upstream native Wayland support, and the WAM web runtime port. We also had a booth showcasing that work, as well as our lightweight WebKit port WPE, which was, as usual, attracting interest with its 60fps video playback performance on a Raspberry Pi 2.

    I co-presented with Steve Lemke a talk about the automotive activities at LGSVL, taking the opportunity to give an update on the status of the WAM web runtime work for AGL (slides here). The project is progressing, and Igalia should soon be landing the first results of the work.

    Igalia booth at AGL AMM Europe 2018

    It was great to meet all these people and discuss the architecture proposal for the web runtime in person, unblocking several tasks and allowing more detailed planning for the next months.

    Dresden was great, and I can’t help highlighting the reception and guided tour in the Dresden Transportation Museum. Great choice by the organization. Thanks to Linux Foundation and the AGL project community!

    Next: Chrome Dev Summit 2018

    So… what’s next? I will be visiting San Francisco in November for Chrome Dev Summit.

    I can only thank Igalia for sponsoring my attendance at these events. They are quite important for keeping things moving forward. But it is also really nice to meet friends and collaborators. Thanks Igalia!

    by José Dapena Paz at October 25, 2018 09:29 AM