Planet Igalia WebKit

September 29, 2022

WPE WebKit Blog

WPE Networking Overview

At the heart of any browser engine is networking: connecting with services and other users. Unlike other engines, WebKit approaches this more abstractly by leaving a large portion of the networking up to individual ports. This includes network protocols such as HTTP, WebSockets, and WebRTC. The upside to this approach is a higher level of integration with the system-provided libraries and features, so WebKit behaves similarly to other software on the platform, often with more centralized configuration.

Due to this abstraction, there are a few independent layers that make up the networking stack of WPE. In this post, I’ll break down what each layer accomplishes and give some insight into the codebase’s structure.

Networking Layers

WebKit Network Layers

WebKit

Before we get into the libraries used for WPE, let’s discuss WebKit itself. Despite abstracting out a lot of the protocol handling, WebKit still needs to understand the fundamentals of HTTP.

WebCore (discussed in WPE Overview) understands HTTP requests, headers, and cookies, as they are required to implement many higher-level features. What it does not handle is the network operations themselves, most parsing, or on-disk storage. In the codebase, requests and responses are represented by ResourceRequest and ResourceResponse objects, which map to general HTTP functionality.

NetworkProcess

A core part of modern web engine security is the multi-process model. To defend against exploits, each website runs in its own isolated process that does not have network access. To gain network access, these processes must talk over IPC to a dedicated NetworkProcess, typically one per browser instance. The NetworkProcess receives a ResourceRequest, creates a NetworkDataTask with it to download the data, and responds to the WebProcess with a ResourceResponse. The flow looks like this:

WebKit Network Flowchart

WPE

WPE implements the platform-specific versions of the classes above as ResourceRequestSoup and NetworkDataTaskSoup, primarily using a library called libsoup.

The libsoup library was originally created for the GNOME project’s email client and has since grown to be a very featureful HTTP implementation, now maintained by Igalia.

At a high level, libsoup’s main task is to manage connections and queued requests to websites, and then efficiently stream the responses back to WPE. Properly implementing HTTP is a fairly large task; here is a non-exhaustive list of features it implements: HTTP/1.1, HTTP/2, WebSockets, cookies, decompression, multiple authentication standards, HSTS, and HTTP proxies.

On its own, libsoup is really focused on the HTTP layer, and it uses the GLib library to implement many of its networking features in a portable way. This is where TCP, DNS, and TLS are handled. GLib is also directly used by WebKit for URI parsing and DNS pre-caching.

Using GLib also helps standardize behavior across modern Linux systems. It allows configuration of a global proxy resolver that WebKit, along with other applications, can use.
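
As a concrete illustration, here is a minimal C sketch (not WebKit code, just the underlying GIO API any GLib application sees) showing how to ask the default proxy resolver which proxy applies to a given URI:

#include <gio/gio.h>

int main (void)
{
    GError *error = NULL;

    /* The same system-wide resolver WebKit ends up using. */
    GProxyResolver *resolver = g_proxy_resolver_get_default ();

    /* Returns a NULL-terminated list of candidates, such as
       "http://proxy.example.com:8080", or "direct://" when no proxy
       applies to this URI. */
    gchar **proxies = g_proxy_resolver_lookup (resolver,
                                               "https://wpewebkit.org/",
                                               NULL, &error);
    if (!proxies) {
        g_printerr ("Proxy lookup failed: %s\n", error->message);
        g_clear_error (&error);
        return 1;
    }

    for (gsize i = 0; proxies[i] != NULL; i++)
        g_print ("Proxy candidate: %s\n", proxies[i]);

    g_strfreev (proxies);
    return 0;
}

Compile it with gcc example.c $(pkg-config --cflags --libs gio-2.0); changing the system proxy settings changes the answer without touching the program.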

TLS

Another unique detail of our stack is that TLS is fully abstracted inside of GLib by a project called GLib-Networking. This project provides multiple implementations of TLS that can be chosen at runtime, including OpenSSL and gnutls on Linux. The benefit here is that clients can choose the implementation they prefer—whether for licensing, certification, or technical reasons.
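
As a small sketch of what “chosen at runtime” means, the following C program asks GIO for the default TLS backend; exporting GIO_USE_TLS=gnutls or GIO_USE_TLS=openssl before running it selects a specific glib-networking module:

#include <gio/gio.h>

int main (void)
{
    /* Loads the TLS implementation from glib-networking at runtime.
       No TLS code is linked into the application itself. */
    GTlsBackend *backend = g_tls_backend_get_default ();

    g_print ("TLS supported: %s\n",
             g_tls_backend_supports_tls (backend) ? "yes" : "no");
    g_print ("DTLS supported: %s\n",
             g_tls_backend_supports_dtls (backend) ? "yes" : "no");
    return 0;
}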

Usage

Let’s go step by step through some real-world usage. If we call webkit_web_view_load_uri() for a new domain, it will:

  1. Create a ResourceRequest in WebCore that represents an HTTP request with a few basic headers set.
    • ResourceRequestSoup will create its own internal representation for the request using soup_message_new_from_uri().
  2. This is passed to the NetworkProcess, which loads the request as a NetworkDataTask.
  3. NetworkDataTaskSoup will send the request and receive the response with soup_session_send(), which queues the message to be sent.
  4. libsoup will connect to the host using GSocketClient, which does the DNS lookup and TCP connection.
    • If this is a TLS connection, GTlsClientConnection will use a library such as gnutls to do the TLS handshake.
  5. libsoup will write the HTTP request and read from the socket, parsing the HTTP response and eventually returning the data to WebKit.
  6. WebKit receives this data, along with periodic updates about the state of the request, and sends it out of the NetworkProcess back as a ResourceResponse, eventually loading the data in the WebProcess.
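
The same flow can be reproduced outside of WebKit with a few lines of libsoup. The sketch below (assuming the libsoup 3 API, and synchronous calls for brevity, whereas WebKit’s NetworkDataTaskSoup works asynchronously) shows steps 3 to 5 in miniature:

#include <stdio.h>
#include <libsoup/soup.h>

int main (void)
{
    GError *error = NULL;

    /* A SoupSession manages connection pooling, queuing, HTTP/2, etc. */
    SoupSession *session = soup_session_new ();

    /* Roughly what ResourceRequestSoup builds internally. */
    SoupMessage *msg = soup_message_new (SOUP_METHOD_GET,
                                         "https://wpewebkit.org/");

    /* Queues the message; GSocketClient does the DNS lookup and TCP
       connection, and GTlsClientConnection the TLS handshake. */
    GInputStream *stream = soup_session_send (session, msg, NULL, &error);
    if (!stream) {
        g_printerr ("Request failed: %s\n", error->message);
        return 1;
    }

    g_print ("HTTP status: %u\n", soup_message_get_status (msg));

    /* Stream the body, as WebKit does when filling a ResourceResponse. */
    char buffer[4096];
    gssize n;
    while ((n = g_input_stream_read (stream, buffer, sizeof buffer,
                                     NULL, NULL)) > 0)
        fwrite (buffer, 1, (size_t) n, stdout);

    g_object_unref (stream);
    g_object_unref (msg);
    g_object_unref (session);
    return 0;
}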

Summary

In conclusion, WebKit provides a very flexible abstraction for platforms, and WPE leverages mature system libraries to provide a portable implementation. It has many layers, but they are all well organized and suited to their tasks.

If you are working with WPE and are interested in collaborating, feel free to contact us. If you are interested in working with Igalia, you can apply here.

September 29, 2022 12:00 AM

September 27, 2022

Manuel Rego

TPAC 2022

A few weeks ago the W3C arranged a new edition of TPAC, this time as a hybrid event. I had the chance to attend onsite in Vancouver (Canada).

This is the second conference for me this year after the Web Engines Hackfest, and the first one I travelled abroad for. It was awesome to meet so many people in real life again, some of them face to face for the first time. It was also great spending some days with the other Igalians attending (Andreu Botella, Brian Kardell and Valerie Young).

This is my third TPAC, and I’ve experienced an evolution since my first one back in 2016 in Lisbon. At that time Igalia was starting to be known in some specific groups, mostly the ARIA and CSS Working Groups, but many people didn’t know us at all. Nowadays, however, lots of people know us: Igalia is mentioned every now and then in conversations, in presentations, etc. When someone asks where you work and you reply Igalia, they only have good words about the work we do. “You should hire Igalia to implement that” is a sentence we heard several times during the whole week. It’s clear how we’ve grown into the ecosystem and how we can have an important impact on the evolution of the web platform. These days many companies understand the benefits of contributing to the web platform, bringing new priorities to the table. And Igalia is ready to help push features forward in the different standards bodies and browser engines.

Igalia sponsored TPAC at the bronze level and also contributed to the inclusion fund.

The rest of this blog post covers some brief highlights from my participation in the event.

Web Developers Meetup

On Tuesday evening there was the web developers meetup, with 4 talks and some demos. The talks were very nice; recordings will be published soon, but meanwhile you can check the slides at the meetup website. It’s worth highlighting that Igalia is in some way involved in topics related to all four presentations, some of them probably better known than others. Here are some examples so you can connect the dots:

Wolvic logo

On top of the talks, Igalia was showing a demo of Wolvic, the open source browser for WebXR devices. A bunch of people asked questions and got interested in it, and everyone liked the stickers with the Wolvic logo.

Igalia also sponsored this meetup.

CSS

Lots of amazing things have been happening in CSS recently, like :has() and Container Queries. But these folks are always thinking about the next steps, and during the CSSWG meeting there were discussions about some new proposals:

  • CSS Anchoring: This is about positioning things like pop-ups relative to other elements. The goal is to simplify the current situation so it can be done with a few lines of CSS instead of having to deal manually with JavaScript code.
  • CSS Toggles: This is a proposal to properly add the toggle concept to CSS, similar to what people have been doing with the “Checkbox Hack”.

Picture of the CSSWG meeting by Rossen Atanassov

During the breaks there were lots of informal discussions about the line-clamp property and how to unprefix it. There were different proposals about how to address this issue; let’s hope there’s some sort of agreement and it can be implemented soon.

Accessibility and Shadow DOM

Shadow DOM is cool but also quite complex, and right now the accessibility story with Shadow DOM is mostly broken. The Accessibility Object Model (AOM) proposals aim to solve some of these issues.

At TPAC a bunch of folks interested in AOM arranged an informal meeting, followed by a number of conversations around some of these topics. On one side, there’s the ARIA reflection proposal that is being worked on in different browsers, which will allow setting ARIA attributes via JavaScript. In addition, there were lots of discussions around the Cross-root ARIA Delegation proposal, and its counterpart Cross-root ARIA Reflection, thinking about the best way to solve these kinds of issues. We now have some kind of agreement on an initial proposal design that needs to be discussed with the different groups involved. Let’s hope we’re on the right path to find a proper solution that helps make Shadow DOM more accessible.

Maps on the Web

This is one of those problems that doesn’t have a proper solution on the web platform. Igalia has been in conversations around this for a while; for example, you can check this talk by Brian Kardell at a W3C workshop. This year Bocoup, together with Natural Resources Canada, did research on the topic, defining a long-term roadmap to add support for maps on the Web.

There were a few sessions about maps at TPAC in which Igalia participated. Things are still at an early stage, but some of the features described in the roadmap are very interesting and have a broader scope than just maps (for example, pan and zoom would be very useful for other, more generic use cases too). Let’s see how all this evolves in the future.

Wrap-up

And that’s all from my side after a really great week in the nice city of Vancouver. As usual in this kind of event, the most valuable part was meeting people and having lots of informal conversations all over the place. The hybrid setup was really nice, but those face-to-face conversations are still something different from what you can do attending remotely.

See you all in the next editions!

September 27, 2022 10:00 PM

July 28, 2022

WPE WebKit Blog

WPE QA and tooling

In the previous posts, my colleagues Claudio and Miguel wrote, respectively, about the major components of the project and the graphics architecture of WPE. Today, we’ll look at our efforts to improve the quality of WPE and the experience of working with and using it. While the previous entries in this blog post series about WPE aren’t strictly required reading, we recommend starting with the first post in the series.

Automated testing

Testing is an essential part of the WebKit project, primarily due to the large number of use cases covered by the HTML/CSS/JavaScript specifications and the need for the project to work correctly in a wide range of configurations.

As an official port of WebKit, WPE uses the former’s testing infrastructure, based on BuildBot. There are two primary servers, one working as an early warning system by testing the patches before they’re committed to the main repository, and another for more extensive testing after accepting the incoming changes.

build.webkit.org screenshot

Currently, the WPE testing bots target debug and release configurations using the Flatpak SDK (more on it later in this article) on 64-bit Intel-based Debian Linux systems. We have plans to add bots running on Raspberry Pi boards in the future. Alongside nightly testing, we keep builder bots covering the Ubuntu LTS/Debian versions we support. After August 14th, 2022, the earliest supported versions will be Ubuntu 20.04 LTS and Debian 11 (Bullseye).

Test suites

First, the WPE builder bots build WPE in both release and debug configurations and feed the built packages into the tester bots, which run test suites according to their configuration, each suite focused on one aspect of the project:

  • Layout tests: The main suite tests whether WebKit correctly renders web pages and implements web APIs. This suite comprises both WebKit’s own test cases and tests imported from Web Platform Tests. At the time of writing, it runs over 50,000 test cases.
  • API Tests: This suite tests the API provided to developers by WebKit and its ports. For example, this step tests the WPE API used in Cog.
  • JavaScriptCore tests: Covers the JavaScriptCore engine, running WebKit’s tests alongside test262, the reference test suite for JS/ECMAScript implementations.
  • WebDriver: Tests from Selenium and W3C WebDriver APIs for browser automation.
  • Other small suites: Tests for WebKit’s tooling components.

Due to a large number of tests and the fast development of both WebKit and the specifications—it’s not uncommon to have dozens of daily commits touching dozens of tests—it’s hard to keep the testing bots green.

For example, while we try to make the tests work on all platforms, many old layout tests use the -expected.txt scheme, where the render tree is printed in a textual format with the text size in pixels for every node. While this works fine in most cases, many tests have minor differences between the expected result on the Mac platform and on the WPE/GTK ports. One of the causes is the font rendering particularities of each port.

Thankfully, this situation has improved significantly since the beginning of the project. Among other efforts, many tests now use a “reference” HTML file, which renders to the same expected result as the test case, so both the test case and the reference use the same font rendering scheme and can be compared pixel by pixel.

Building and running WPE

This section focuses on the experience of building and running WPE on a regular Linux x86-64 system. In a future post, we’ll cover building for and running on embedded devices.

Checking out the code

Recently, WebKit moved to GitHub, so you can clone it directly from there:

$ mkdir ~/dev
$ git clone https://github.com/WebKit/WebKit.git

Note: Due to the size of the project history, you might want to use --depth=1 to clone a single revision, followed by git pull --unshallow from inside the cloned repository to fetch the history if needed.

There’s more information in WebKit’s GitHub wiki about setting up the git checkout for contributing code back to WebKit. It’ll set up some git hooks to do some tasks required by the project, like formatting the commit message and automatically linking the pull request to the Bugzilla issue.

All commands in the following sections are run from inside the cloned repository.

Updating the dependencies (aka The WebKit Flatpak SDK)

Like most complex software projects, WebKit has a reasonably extensive list of dependencies. Keeping a reference set of their versions frozen during development is desirable, as it makes it easier to reproduce bugs and test results. In the past, WPE and WebKitGTK used JHBuild to freeze a set of dependencies. While this worked for a long time, it did not cover all dependencies: sometimes there could be minor differences in the layout tests between the reference test bots and a developer machine, due to some dependency resolved by the host system outside JHBuild.

To improve reproducibility, since 2020 WPE and WebKitGTK have been using an SDK based on Flatpak (kudos to my colleagues Thibault Saunier and Philippe Normand), with much more extensive dependency coverage and isolation from the host system. Alongside the dependencies, it ships some tools like rr and supports tools like clangd. Almost all bots use this SDK, the exception being the LTS/stable bots, as for those we want to build with the packages already available in each distribution.

$ ./Tools/Scripts/update-webkit-flatpak

The command will set up the local flatpak repository at ./WebKitBuild/UserFlatpak with the downloaded SDK and create some bundled icecc toolchains, which enable distributed builds on local networks.

Building

Once the SDK download finishes, you can use the helper script ./Tools/Scripts/build-webkit, which wraps the cmake command with some preset options commonly used in day-to-day development, like enabling developer-only features usually disabled in regular builds. Manually invoking cmake is possible, although that’s usually only needed when you want more control over the build. To build WPE in release mode, use:

$ ./Tools/Scripts/build-webkit --release --wpe

Optionally, you can pass it multiple arguments to be fed directly to make or cmake with the switches --makeargs=... and --cmakeargs=..., respectively. For example, --makeargs="-j8" will limit make to 8 parallel jobs and --cmakeargs="-DENABLE_GAMEPAD=1" will enable gamepad support (requires libmanette, bundled in the SDK).

The first build might take a while (up to almost one hour on a regular laptop). Fortunately, the SDK uses ccache to avoid recompiling the same object files, so subsequent builds without significant changes are usually faster. For more info on speeding up the build, check the wiki.

Running the browser (Cog)

To run Cog, the reference WPE browser, you need a Wayland server, which is common in most Linux systems nowadays.

$ ./Tools/Scripts/run-minibrowser --wpe --release https://wpewebkit.org/
Cog with GTK4 shell screenshot

Running some tests

To run the API tests, which reside in Tools/TestWebKitAPI/Tests/, you can use the following command:

$ ./Tools/Scripts/run-wpe-tests --release --display-server=headless

Other test suites:

  • Layout tests: ./Tools/Scripts/run-webkit-tests --wpe
  • JSC tests: ./Tools/Scripts/run-javascriptcore-tests
  • WebDriver: ./Tools/Scripts/run-webdriver-tests

As stated when we described the test suites, the main challenge in testing is keeping up with the fast pace of development, as it’s not uncommon to have some revisions updating hundreds of tests.

Contributing code to WPE

After hacking locally, you can submit your changes following the workflow listed in the WebKit wiki.

Testing WPE in the wild

If you don’t want to build your own WPE or image, there are some options for getting a taste of WPE listed on our website, including:

Some of these options, like the prebuilt images and the Balena blocks, will be the subject of future blog posts in this series.

Final thoughts

With this, we conclude this brief overview of WPE automated testing and the main tools we use in our daily work with WPE. In future posts in this area, we’ll go deeper into other subjects like testing on embedded boards and debugging practices.

If this post got you interested in collaborating with WPE development, or you are in need of a web engine to run on your embedded device, feel free to contact us. We’ll be pleased to help!

We also have open positions at the WebKit team at Igalia. If you’re motivated by this field and you’re interested in developing your career around it, you can apply here!

July 28, 2022 12:00 AM

July 20, 2022

Víctor Jáquez

Gamepad in WPEWebkit

This is the brief story of the Gamepad implementation in WPEWebKit.

It started with an early development done by Eugene Mutavchi (kudos!). Later, by the end of 2021, I retook those patches and discussed them with my fellow Igalian Adrián, and we decided to come up with a slightly different approach.

Before going into the details, let’s quickly review the WPE architecture:

  1. cog library — it’s a shell library that simplifies the task of writing a WPE browser from scratch, by providing common functionality and helper APIs.
  2. WebKit library — that’s the web engine that, given a URI and other inputs, returns, among other outputs, graphics buffers with the page rendered.
  3. WPE library — it’s the API that bridges cog (1) (or whatever other browser application) and WebKit (2).
  4. WPE backend — its main duty is to provide graphics buffers to WebKit: buffers supported by the hardware, the operating system, the windowing system, etc.

Eugene’s implementation had code in WebKit (implementing gamepad support for the WPE port); code in the WPE library, with an API to connect WebKit’s gamepad support with the WPE backend; and a WPE backend with a custom gamepad implementation that read events directly from the Linux device. Almost everything was there, but there were some issues:

  • The WPE backend is mainly designed as a set of protocols, similar to Wayland, to deal with graphics buffers or audio buffers, but not with input events. The cog library is the place where input events, such as keyboard events, are handled and injected into WebKit.
  • The gamepad handling in the WPE backend was ad-hoc and low-level, reading events directly from the Linux devices. This approach is problematic, since there are plenty of gamepads on the market and each has its own axes and buttons, so remapping them to the standard mapping is required. To overcome this issue and many others there’s a GNOME library, libmanette, which is already used by the WebKitGTK port (see the sketch below).
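
To illustrate why libmanette helps, here is a minimal C sketch, independent of WebKit, that listens for gamepads and receives button events already remapped to the standard layout (it assumes libmanette is installed; compile against pkg-config --cflags --libs manette-0.2):

#include <libmanette.h>

/* Buttons arrive already remapped to the standard gamepad layout,
   regardless of the physical device's quirks. */
static void
on_button_press (ManetteDevice *device, ManetteEvent *event,
                 gpointer user_data)
{
    guint16 button;
    if (manette_event_get_button (event, &button))
        g_print ("%s: button %u pressed\n",
                 manette_device_get_name (device), button);
}

static void
on_device_connected (ManetteMonitor *monitor, ManetteDevice *device,
                     gpointer user_data)
{
    g_print ("Gamepad connected: %s\n", manette_device_get_name (device));
    g_signal_connect (device, "button-press-event",
                      G_CALLBACK (on_button_press), NULL);
}

int main (void)
{
    /* The monitor watches for devices appearing and disappearing. */
    ManetteMonitor *monitor = manette_monitor_new ();
    g_signal_connect (monitor, "device-connected",
                      G_CALLBACK (on_device_connected), NULL);

    GMainLoop *loop = g_main_loop_new (NULL, FALSE);
    g_main_loop_run (loop);
    return 0;
}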

Today’s status of the gamepad support is that it works, but it’s not yet fully upstreamed:

  • merged libwpe pull request.
  • cog pull request — there are two implementations: none and libmanette. None is just a dummy implementation that ignores any request for a gamepad provider; it’s used if libmanette is not available or if the available libwpe lacks gamepad support.
  • WebKit pull request.

To prove to you all that it works, my exhibit A is this video, where I play Asteroids on a Raspberry Pi 4 (64 bits):

The image was built with Buildroot, using its master branch (from a week ago) with a bunch of modifications, such as adding libmanette, a kernel patch for my gamepad device, kernel 5.15.55 and its corresponding firmware, etc.

by vjaquez at July 20, 2022 10:08 AM

July 19, 2022

Manuel Rego

Some highlights of the Web Engines Hackfest 2022

Last month Igalia arranged a new edition of the Web Engines Hackfest in A Coruña (Galicia, Spain), which brought together more than 70 people working on the web platform for 2 days, with lots of discussions and conversations around different features.

This was my first onsite event since the “before times”. It was amazing seeing people for real after such a long time, meeting again some old colleagues and also a bunch of people for the first time. Being an organizer of the event meant these were very busy days for me, but it looks like people were happy with the result and enjoyed the event quite a lot.

This is a brief post about my personal highlights during the event.

Talks

During the hackfest we had an afternoon with 5 talks. They were live streamed on YouTube, so people could follow them remotely and also ask questions through the event’s Matrix channel.

Leo Balter’s Talk

  • Leo Balter talked about how Salesforce participates in the web platform as a partner, working with browsers and web standards.
    I really liked this talk because it explains how companies that use the web platform can collaborate and have a direct impact on the evolution of the web. And there are many ways to do that: reporting bugs that affect your company, explaining use cases that are important for you and the things you miss from the platform, providing feedback about different features, looking for partners to fix outstanding issues or add support for new stuff, etc.
    Igalia has shown during the last decade that there’s a way to have an impact on the web platform outside of the big companies and browser vendors. Thanks to our position in the different communities, we can help companies push features they’re interested in and that would benefit the entire web platform in the future.
  • Dominik Röttsches gave a talk about COLRv1 fonts, giving details on the Chromium implementation and the different open-source software components involved.
    This new font format makes really amazing things possible, and Dominik showed how to create a Galician emoji font with popular subjects like the Tower of Hercules or polbo á feira, with some early demos of variable COLRv1 and the beginnings of the first Galician emoji font.
  • Daniel Minor explained the work done in Gecko and SpiderMonkey to refactor the internationalization system.
    Very interesting talk with lots of information and details about internationalization, going deep into text segmentation and how it works in different languages, and also introducing the ICU4X project.
  • Ada Rose Cannon did a great introduction to WebXR and Augmented Reality.
    Despite Ada not being onsite, this was an awesome talk and the video was actually a very immersive experience. Ada explained many concepts and features around WebXR and Augmented Reality with a bunch of cool examples and demos.
  • Thomas Steiner talked about Project Fugu APIs that have been implemented in Chromium.
    Using the Web Engines Hackfest logo as example, he explained different new capabilities that Project Fugu is adding to the web through a real application called SVGcode.

It was a great set of talks, and you can now watch them all on YouTube. We hope you enjoy them if you haven’t had the chance to watch them yet.

CSS & Interop 2022

In the CSS breakout session we talked about all the big new features arriving in browsers these days, Container Queries and :has() probably being the most notable examples: features that people have been requesting since the early days and that are shipping in browsers this year.

Apart from that, we talked about the Interop 2022 effort: how the target areas for improving interoperability are defined, and how much work some of them imply.

MathML & Fonts

Frédéric Wang gave a nice presentation about MathML and all the work that has been done in recent years. The feature is close to shipping in Chromium (modulo finding a solution for printing, or waiting for LayoutNG to be ready to print), which will be a huge step forward for MathML, making it a feature supported by all the major browser engines.

Related to the MathML work, there was some discussion around fonts, particularly OpenType MATH fonts; you can read Fred’s post for more details. There is some good news regarding this topic: the new macOS version includes STIX Two Math installed by default, and there are ongoing conversations to get an OpenType MATH font by default on Android too.

MathML Breakout Session

Accessibility & AOM

Valerie Young, who has recently started acting as co-chair of the ARIA Working Group, led a session on accessibility where we talked about ARIA and related things like AOM.

The Accessibility Object Model (AOM) is an effort that involves a lot of different things. In this session we talked about ARIA Attribute Reflection and the issues with making custom elements that use Shadow DOM accessible, which proposals like Cross-root ARIA Delegation are trying to solve.

Accessibility Breakout Session

Acknowledgements

To close this post I’d like to say thank you to everyone who participated in the Web Engines Hackfest; without your presence this event wouldn’t make any sense. I’d also like to thank the speakers for the great talks and the time devoted to preparing them for this event. As usual, big thanks to the sponsors Arm, Google and Igalia for making this event possible once more. And thanks again to Igalia for letting me be part of the event organization.

Web Engines Hackfest 2022 Sponsors - Host & Organizer: Igalia. Gold Sponsors: Arm, Google and Igalia. Other Sponsors: Arm (Lunch sponsor)

July 19, 2022 10:00 PM

July 15, 2022

WPE WebKit Blog

WPE Graphics architecture

Following the previous post in the series about WPE, where we talked about the WPE components, this post will briefly explain the WPE graphics architecture and how the engine is able to render HTML content onto the display. If you haven’t read the previous entries in this blog post series about WPE, we recommend starting with the first post in the series for an introduction, and then coming back to this one.

DOM + CSS = RenderTree

As the document is parsed, the engine begins building the DOM tree and fetching load-blocking CSS resources. At some point, possibly before the entire DOM tree is built, it’s time to draw things on the screen. The first step to render the content of a page is to perform what’s called the attachment, which merges the DOM tree with the CSS rules in order to create the RenderTree. This RenderTree is a collection of RenderObjects, structured into a tree, and each of these RenderObjects represents an element in the DOM tree that has visual output. RenderObjects have the capability to render the associated DOM tree node into a surface by using the GraphicsContext class (in the case of WPE, this GraphicsContext uses Cairo to perform the rendering).

Once the RenderTree is created, layout is performed, ensuring that each RenderObject has its proper size and position set.

Going from source content to displayed content

It would be possible to render the content of the web page just by traversing this RenderTree and painting each of the RenderObjects, but there would be problems when rendering elements that overlap each other, because the order of the elements in the RenderTree doesn’t necessarily match the order in which they must be painted to get the appropriate result. For example, an element with a big z-index value should be painted last, no matter its position in the RenderTree.

This is an example of how some HTML content is translated into the RenderTree (there are some RenderObjects missing here that are not relevant for the explanation).

RenderTree generated from example HTML

RenderLayers

In order to ensure that the elements of the RenderTree are rendered in the appropriate order, the concept of the RenderLayer is added. A RenderLayer represents a layer in the document containing some elements that have to be rendered at the same depth (even though this is not exactly the case, you can think of each RenderLayer as a group of RenderObjects that are at a certain z-index). Each RenderObject is associated with a RenderLayer, either directly or indirectly via an ancestor RenderObject.

RenderLayers are grouped into a tree, which is called the RenderLayer tree, and each RenderLayer’s children are sorted into two lists: those that are below the RenderLayer, and those that are above it. With this we have a structure that groups all the RenderObjects that have to be rendered together: they will be on top of the content rendered by the RenderLayers below this one, and below the content rendered by the RenderLayers above this one.

There are several conditions that determine whether a RenderLayer is needed for some element; it’s not necessarily due to the usage of z-index. It can be required due to transparency, CSS filters, overflow, transformations, and so on.

Continuing with the example, these are the RenderLayers that we would get for that HTML code:

RenderLayer tree generated from example HTML

We can see that there are four RenderLayers:

  • The root one, corresponding to the RenderView element. This is mandatory.
  • Another one corresponding to the first RenderBlock.
  • One corresponding to the RenderVideo element, because video elements always get their own RenderLayer.
  • One corresponding to the transformed RenderBlock.

RenderLayers have a paint method that is able to paint all the RenderObjects associated with the layer into a GraphicsContext (as mentioned, WPE uses Cairo for this). As in the previous case, it’s possible to paint the content of the page at this point just by traversing the RenderLayer tree and requesting the RenderLayers to paint their content, but in this case the result will be the correct one. This is actually what WebKitGTK does when it’s run with accelerated compositing disabled.
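
To make the traversal order concrete, here is a simplified C sketch (a toy model, not WebCore’s actual code) of how painting a RenderLayer tree respects the two child lists described above:

#include <stddef.h>

/* Toy model: a layer keeps its children split into the two sorted
   lists described above, plus a callback that paints its own
   RenderObjects into the current graphics context. */
typedef struct Layer Layer;
struct Layer {
    Layer **below;   /* children painted behind this layer */
    size_t  n_below;
    Layer **above;   /* children painted in front of this layer */
    size_t  n_above;
    void  (*paint_contents) (Layer *self);
};

/* Paint the layers behind first, then our own content, then the
   layers in front: this produces the correct stacking order even
   when it differs from the RenderTree order. */
static void
layer_paint (Layer *layer)
{
    for (size_t i = 0; i < layer->n_below; i++)
        layer_paint (layer->below[i]);

    layer->paint_contents (layer);

    for (size_t i = 0; i < layer->n_above; i++)
        layer_paint (layer->above[i]);
}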

Layer composition

While with the previous step we are already able to render the page contents, this approach is not very efficient, especially when the page contains animations, elements with transparency, etc. This is because in order to paint a single pixel, all the RenderLayers need to be traversed, and those that are contributing to that pixel need to be repainted (totally or partially), even if the content of those RenderLayers hasn’t changed. For example, think about an animation that’s moving an element. For each frame of that animation, the animated element needs to be repainted, but the area that was covered by the animated element in the last frame needs to be repainted as well. The same happens if there’s a translucent element on top of other content. If the translucent element changes, it needs to be repainted, but the content below the translucent element needs to be repainted as well because the blend needs to be performed again.

This would be much more efficient if the content that doesn’t change were somehow separated from the content that’s changing, so we could render those two types of content separately. This is where the composition stage comes into action.

The idea here is that we’re going to paint the RenderLayer contents into intermediate buffers, and then compose those buffers one on top of the other to get the final result. This last step is what we call composition. And it fixes the problems we mentioned with animations and transparency: animations no longer require repainting several RenderLayers; moving an element just means painting one buffer with an offset during the composition. And for transparency, we just need to perform the new blending of the two buffers during the composition; the RenderLayers of the content below the translucent element don’t need to be repainted.

Once we have the RenderLayer tree, we could just paint each RenderLayer into its own buffer in order to perform the composition. But this would be a waste of memory, as not every RenderLayer needs a buffer. Here we introduce a new component: the GraphicsLayer.

A GraphicsLayer is a structure used to group the RenderLayers that will render into the same buffer, and it also contains all the information required to perform the composition of these buffers. A RenderLayer will have a GraphicsLayer associated with it if it requires its own buffer to render. Otherwise, it will render into an ancestor’s buffer (specifically, that of the first ancestor that has a GraphicsLayer). As usual, GraphicsLayers are structured into a tree.

This is how the example code would be translated into GraphicsLayers.

GraphicsLayer tree generated from example HTML

We can see that we have now three GraphicsLayers:

  • The root one, which is mandatory. It belongs to the RenderView element, but the first RenderBlock will render into this GraphicsLayer's buffer as well.
  • The one for the RenderVideo element, as videos are updated independently from the rest of the content.
  • The one for the transformed element, as the transformed elements are updated independently from the rest of the content.

With this structure, we can now render the intermediate buffers of the RenderView and the transformed RenderBlock, and we don’t need to update them anymore. For each frame, those buffers will be composited together with the RenderVideo buffer. The RenderVideo will update its buffer whenever a new video frame arrives, but that won’t affect the content of the other GraphicsLayers.

So now we have successfully separated the content that is changing and needs to be updated from the content that remains constant and doesn’t need to be repainted anymore, just composited.

Accelerated compositing and threaded accelerated compositing

There’s something else that can be done to increase rendering performance: using the GPU to perform the composition. The GPU is highly optimized to perform operations like the buffer composition that we need to do, as well as to handle 3D transforms, blending, etc. We just need to upload the buffers into textures and let the GPU handle the required operations. WPE does this through the EGL and GLES2 graphics APIs. In order to perform the composition, EGL is used to create a GLES2 EGLContext. Using that context, the intermediate buffers are uploaded to textures, and then those textures are composited at their appropriate positions. This leverages the GPU for the composition work, leaving the CPU free to perform other tasks.

This is why this step is called accelerated compositing.
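
As a rough idea of what uploading a layer buffer means in GLES2 terms, here is a hedged sketch (it assumes an EGLContext has already been created and made current; the helper name is invented for this example):

#include <GLES2/gl2.h>

/* upload_layer_buffer() is a hypothetical helper: it turns one
   layer's pixel buffer into a texture that the compositor can later
   draw as a textured quad at the layer's position. */
static GLuint
upload_layer_buffer (const unsigned char *pixels, int width, int height)
{
    GLuint texture;
    glGenTextures (1, &texture);
    glBindTexture (GL_TEXTURE_2D, texture);
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                  GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    return texture;
}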

But there’s more.

Until this point, all the steps needed to render the content of the page are performed in the main thread. This means that while the main thread is rendering and compositing, it’s not able to perform other tasks, like running JS code.

WPE improves this by using a parallel thread whose only mission is to perform the composition. You can think of it as a thread running a loop that composites the incoming buffers using the GPU whenever there’s content to render. This is what we call threaded accelerated compositing.
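
The pattern is easy to picture with a plain POSIX threads sketch (a toy model of the idea, not WPE’s actual compositor code; composite_layers() is a hypothetical helper standing in for the GPU work):

#include <pthread.h>
#include <stdbool.h>

void composite_layers (void);   /* hypothetical: upload dirty buffers,
                                   draw the quads, swap buffers */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static bool pending = false;

/* Called from the main thread, a video sink, or an animation timer. */
void
request_composition (void)
{
    pthread_mutex_lock (&lock);
    pending = true;
    pthread_cond_signal (&wake);
    pthread_mutex_unlock (&lock);
}

/* The compositor thread sleeps until a composition is requested,
   then performs it entirely off the main thread. */
void *
compositor_thread (void *data)
{
    for (;;) {
        pthread_mutex_lock (&lock);
        while (!pending)
            pthread_cond_wait (&wake, &lock);
        pending = false;
        pthread_mutex_unlock (&lock);

        composite_layers ();
    }
    return NULL;
}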

This is especially useful when there’s a video or an animation running on the page:

  • If there’s a video running in the page, in the non-threaded case, for each video frame the main thread would need to get the frame and perform the composition with the rest of the page content. In the threaded case, the video element delivers the frames directly to the compositor thread, and requests a composition to be done, without the main thread being involved at all.

  • If there’s an animation, in the non-threaded case, for each frame of the animation the main thread would need to calculate the animation step and then perform the composition of the animated element with the rest of the page content. In the threaded case, the animation is passed to the compositor thread, and the animation steps are calculated on that thread, triggering a composition when needed. The main thread doesn’t need to do anything besides starting the animation.

It would take another post to explain in detail how threaded accelerated compositing is implemented in WPE, but if you’re curious about it, know that WPE uses a specialization of the GraphicsLayer called CoordinatedGraphicsLayer to implement this. You can use that as a starting point.

So this is the whole process that’s performed in WPE in order to display the content of a page. We hope it’s useful!

Future

At Igalia we’re constantly evolving and improving WPE, and have ongoing efforts to improve the graphics architecture as well. Besides small optimizations and refactors here and there, the most important goals that we have are:

  • Add a GPU process. Currently the EGL and GLES2 operations are performed in the web process. As there can be several web processes running when several pages are open, this means the browser can be using a lot of EGL contexts in total, which is a waste of memory. Also, all these processes could potentially be affected by errors, leaks, etc., in the code that handles the GPU operations. The idea is to centralize all the GPU operations into a single process, the GPU one, so all the web processes will issue paint requests to the GPU process instead of painting their content themselves. This will reduce the memory usage and improve the software’s robustness.

  • Remove CPU rasterization and paint all the content with GLES2. Using the CPU to paint the layer contents with Cairo is expensive, especially on platforms with slow CPUs, as is often the case with embedded devices. Our goal here is to completely remove the Cairo rasterization and use GLES2 calls to render the 2D primitives. This will greatly improve the rendering performance.

  • Use ANGLE to perform WebGL operations. WPE currently implements the WebGL 1.0 specification through direct calls to GLES2 methods. We are changing this in order to perform the operations using ANGLE, which will allow WPE to support the WebGL 2.0 specification as well.

But what about the backends?

In the previous post there was a mention of backends that are used to integrate with the underlying platform. How is this relevant to the graphics architecture?

Backends have several missions when it comes to communicating with the platform, but regarding graphics, they have two functions to fulfill:

  • Provide a platform dependent surface that WPE will render to. This can be a normal buffer, a Wayland buffer, a native window, or whatever, as long as the system EGL implementation allows creating an EGLContext to render to it.

  • Process WPE’s indications that a new frame has been rendered, performing whatever tasks are necessary to take that frame to the display, and also notify WPE when that frame has been displayed.

The most common example of this is a Wayland backend, which provides a buffer to WPE for rendering. When WPE has finished rendering the content, it notifies the backend, which sends the buffer to the Wayland compositor, and notifies back to WPE when the frame has been displayed.

So, whatever platform you want to run WPE on, you need to have a backend providing at least these capabilities.

Final thoughts

This was a brief overview of how WPE rendering works, and of the major improvements we’re trying to achieve at Igalia. We’re constantly putting in a lot of work to keep WPE the best web engine available for embedded devices.

If this post got you interested in collaborating with WPE development, or you are in need of a web engine to run on your embedded device, feel free to contact us. We’ll be pleased to help!

We also have open positions at the WebKit team at Igalia. If you’re motivated by this field and you’re interested in developing your career around it, you can apply here!

July 15, 2022 12:00 AM

July 01, 2022

Claudio Saavedra

Fri 2022/Jul/01

I wrote a technical overview of the WebKit WPE project for the WPE WebKit blog, for those interested in WPE as a potential solution to the problem of browsers in embedded devices.

This article begins a series of technical write-ups on the architecture of WPE, and we hope to publish further articles during the rest of the year breaking down different components of WebKit, including graphics and other subsystems, which will surely be of great help for those interested in getting more familiar with WebKit and its internals.

July 01, 2022 10:39 AM

WPE WebKit Blog

An overview of the WPE WebKit project

In the previous post in this series, we explained that WPE is a WebKit port optimized for embedded devices. In this post, we’ll dive into a more technical overview of the different components of WPE, WebKit, and how they all fit together. If you’re still wondering what a web engine is or how WPE came to be, we recommend you to go back to the first post in the series and then come back here.

WebKit architecture in a nutshell

To understand what makes WPE special, we first need to have a basic understanding of the architecture of WebKit itself, and how it ties together a given architecture/platform and a user-facing web browser.

WebKit, the engine, is split into different components that encapsulate its different parts. At the heart of it is WebCore. As the name suggests, this contains the core features of the engine (rendering, layout, platform access, HTML and DOM support, the graphics layer, etc). However, some of these ultimately depend heavily on the OS and underlying software platform in order to function. For example: how do we actually do any I/O on different platforms? How do we render onscreen? What’s the underlying multimedia platform and how does it decode media and play it?

WebCore handles the multitude of potential answers to these questions by abstracting the implementation of each component and allowing port developers to fill the gaps for each supported platform. For example, for rendering on Mac, Cocoa APIs implement the graphics APIs needed. On Linux, this can be done through different implementations via Wayland, Vulkan, etc. For networking I/O on Mac, the networking APIs in the Foundation framework are used. On Linux, libsoup fills that gap, and so on.

On the opposite side, for browser implementors to be able to write a browser using WebKit, an API is needed. WebKit, after all, is a library. WebKit ports, besides providing the platform support described above, also provide APIs that suit the target environments: the Apple ports provide Objective-C APIs (which are then used to write Safari and the iOS browsers, for instance), while the GTK+ port provides a GObject-based API for Linux (which is used in Epiphany, the GNOME browser, and other GNOME applications that rely on WebKit to render HTML). All of these APIs are built on top of an internal, middle-man C API that is meant to make it easy for each port to provide a high-level API for browser developers.

With all this in place, it would seem that it shouldn’t be so difficult for any vendor trying to reuse WebKit in a new platform to support new hardware and implement a browser, right? All that you need to do is:

  • Implement backends that integrate with your hardware platform: for multimedia, IO, OS support, networking, graphics, etc.
  • Write an API that you can use to plug the engine into your browser.
  • Maintain the changes needed off-tree, that is, outside the source code tree of WebKit.
  • Keep your implementation up-to-date with the many changes that happen in the WebKit codebase on a daily basis, so that you can update WebKit regularly and take advantage of the many bug fixes, improvements, and new features that land on WebKit continuously.

Does that sound easy? No, it’s not easy at all! In fact, implementing ports in this fashion is strongly discouraged, and vendors who have tried this approach in the past have had to make a huge effort just to play catch-up with the fast-paced development of WebKit. This is where WPE comes to the rescue.

Simplifying browsers development in the diverse embedded world

To simplify the task of porting WebKit to different platforms, Igalia started working on a platform-agnostic, Linux-based, and full-featured port of WebKit. This port relies on existing and mature platform backends for everything that can be easily reused across platforms: multimedia, networking, and I/O, which are already present in-tree and are used by Linux ports, like the GTK one. For the areas that are most likely to require hardware-specific support (that is, graphics and input), WPE abstracts the implementation so that it can be more easily provided out of tree, allowing implementors to avoid having to deal with the WebKit internals more than what’s strictly needed.

Additionally, WPE provides a high-level API that can be used to implement actual browsers. This API is very similar to the WebKitGTK API, making it easy for developers already familiar with the latter to start working with WPE. The cog library also serves as a wrapper around WPE to make it easier still. Once WPE was mature enough, it was accepted by Apple as an official WebKit port, meaning that the port now lives in-tree and takes immediate advantage of the many improvements that land in the WebKit repository on a daily basis.
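
As a taste of how small a WPE-based “browser” can be, here is a hedged C sketch using the WPE WebKit API (error handling omitted; it assumes libwpe and a working WPE backend are installed and loadable at runtime):

#include <wpe/webkit.h>

int main (void)
{
    /* Create a view backed by whatever WPE backend the platform
       provides, and load a page. A real browser (see cog) would also
       wire up input, navigation signals, error handling, etc. */
    WebKitWebViewBackend *backend =
        webkit_web_view_backend_new (wpe_view_backend_create (),
                                     NULL, NULL);
    WebKitWebView *view = webkit_web_view_new (backend);

    webkit_web_view_load_uri (view, "https://wpewebkit.org/");

    GMainLoop *loop = g_main_loop_new (NULL, FALSE);
    g_main_loop_run (loop);
    return 0;
}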

How does WPE integrate with WebKit?

A diagram of the WPE WebKit architecture

The WPE port has several components. Some are in-tree (that is, are a part of WebKit itself), while others are out-of-tree. Let’s examine those components and how they relate to each other, from top to bottom:

  • The Cog library. While not an integral part of WPE, libcog is a shell library that simplifies the task of writing a WPE browser from scratch, by providing common functionality and helper APIs. This component also includes the cog browser, a simple WPE browser built on top of libcog that can be used as a reference or a starting point for the development of a new browser for a specific use case.
  • The WPE WebKit API: the entry point for browser developers to the WebKit engine, providing a comprehensive GObject/C API. The cog library uses this API extensively and we recommend relying on it, but for more specific needs and finer tuning of the engine, working directly with the WebKit API can often be necessary. The API is stable and easy to use, especially for those familiar with the GTK/GNOME platform.
  • WPE’s WebCore implementation: This part, internal to WebKit, implements an abstraction of the graphics and input layers of WebKit. This implementation relies on the libwpe library to provide the functionality required in an abstract way. Thanks to the architecture of WPE, implementors don’t need to bother with the complexities of WebCore and WebKit internals.
  • The libwpe library. This is an out-of-tree library that provides the API required by the WPE port in a generic way to implement the graphical and input backends. Specific functionality for a concrete platform is not provided, but the library relies on the existence of a backend implementation, as is described next.
  • Finally, a WPE backend implementation. This is where all the platform-specific code lives. Backends are loadable modules that can be chosen depending on the underlying hardware. These should provide access to graphics and input depending on the specific architecture, platform, and operating system requirements. As a reference, WPEBackend-fdo is a freedesktop.org-based backend, which uses Wayland and other freedesktop.org technologies, and is supported on several architectures, including NXP and Broadcom chipsets, like the Raspberry Pi, and also regular PC architectures, easing testing and development.

An implementor interested in building a browser for a new architecture only needs to focus on the development of the last component – a WPE backend. With a backend in place, starting the development of a WebKit-powered browser is already much easier than it ever was!

For a more detailed description of the architecture of WPE and WebKit, check this article on the architecture of WPE.

OK, sounds interesting, how do I get my hands dirty?

If you have made it this far, you should give WPE a try!

We have listed several options on the exploring WPE page. There you will see that, depending on how interested you are in the project, your background, and what you’d like to do with it, there are different ways to get started!

It can be as easy as installing WPE directly from the most popular Linux distributions or downloading and flashing prebuilt images for the Raspberry Pi. There are easy and flexible options like Flatpak or Balena, which you can dig into to learn more. If you want to build WPE yourself, you can use Yocto, and if you’d like to contribute, that’s very welcome!

Happy hacking!

July 01, 2022 12:00 AM

June 29, 2022

Patrick Griffis

WebExtension Support in Epiphany

I’m excited to help bring WebExtensions to Epiphany (GNOME Web) thanks to investment from my employer Igalia. In this post, I’ll go over a summary of how extensions work and give details on what Epiphany supports.

Web browsers have supported extensions in some form for decades. They allow the creation of features that would otherwise be part of a browser but can be authored and experimented with more easily. They’ve helped develop and popularize ideas like ad blocking, password management, and reader modes. Sometimes, as in very popular cases like these, browsers themselves then begin trying to apply lessons upstream.

Toward universal support

For most of this history, web extensions have used incompatible browser-specific APIs. This began to change in 2015 with Firefox adopting an API similar to Chrome’s. In 2020, Safari also followed suit. We now have the foundations of an ecosystem-wide solution.

“The foundations of” is an important thing to understand: there are still plenty of existing extensions built with browser-specific APIs, and this doesn’t magically make them all portable. It does, however, provide a path towards making portable extensions. In some cases, existing extensions might just need some porting. In other cases, they may utilize features that aren’t entirely universal yet (or may never be).

Bringing Extensions to Epiphany

With version 43.alpha, Epiphany users can begin to take advantage of some of the same powerful and portable extensions described above. Note that quite a few APIs power this; with this release we’ve covered a meaningful segment of them, but not all (details below). Over time our API coverage and interoperability will continue to grow.

What WebExtensions can do: Technical Details

At a high level, WebExtensions allow a private privileged web page to run in the browser. This is an invisible Background Page that has access to a browser JavaScript API. This API, given permission, can interact with browser tabs, cookies, downloads, bookmarks, and more.

Along with the invisible background page, it gives a few options to show a UI to the user. One such method is a Browser Action which is shown as a button in the browser’s toolbar that can popup an HTML view for the user to interact with. Another is an Options Page dedicated to configuring the extension.

Lastly, an extension can inject JavaScript directly into any website it has permissions for via Content Scripts. These scripts are given full access to the DOM of any web page they run in. Content scripts don’t have access to the majority of the browser API, but, along with the above pages, they can send and receive custom JSON messages to and from all pages within an extension.

Example usage

For a real-world example, take Bitwarden, which I use as my password manager; I’ll simplify how it roughly functions. Firstly, there is a Background Page that does account management for your user. It has a Popup that the user can trigger to interface with your account, passwords, and options. Finally, it also injects Content Scripts into every website you open.

The Content Script can detect all input fields and then wait for a message to autofill information into them. The Popup can request the details of the active tab and, upon you selecting an account, send a message to the Content Script to fill this information. This flow does function in Epiphany now but there are still some issues to iron out for Bitwarden.

Epiphany’s current support

Epiphany 43.alpha supports the basic structure described above. We are currently modeling our behavior after Firefox’s ManifestV2 API which includes compatibility with Chrome extensions where possible. Supporting ManifestV3 is planned alongside V2 in the future.

As of today, we support the majority of:

  • alarms - Scheduling of events to trigger at specific dates or times.
  • commands - Keyboard shortcuts.
  • cookies - Management and querying of browser cookies.
  • downloads - Ability to start and manage downloads.
  • menus - Creation of context menu items.
  • notifications - Ability to show desktop notifications.
  • storage - Storage of extension private settings.
  • tabs - Control and monitoring of browser tabs, including creating, closing, etc.
  • windows - Control and monitoring of browser windows.

A notable missing API is webRequest, which is commonly used by blocking extensions such as uBlock Origin or Privacy Badger. I would like to implement this API at some point; however, it requires WebKitGTK improvements.

For specific API details please see Epiphany’s documentation.

What this means today is that users of Epiphany can write powerful extensions using a well-documented and commonly used format and API. What this does not mean is that most extensions for other browsers will just work out of the box, at least not yet. Cross-browser extensions are possible but they will have to only require the subset of APIs and behaviors Epiphany currently supports.

How to install extensions

This support is still considered experimental so do understand this may lead to crashes or other unwanted behavior. Also please report issues you find to Epiphany rather than to extensions.

You can install the development release and test it like so:

flatpak remote-add --if-not-exists gnome-nightly https://nightly.gnome.org/gnome-nightly.flatpakrepo
flatpak install gnome-nightly org.gnome.Epiphany.Devel
flatpak run --command=gsettings org.gnome.Epiphany.Devel set org.gnome.Epiphany.web:/org/gnome/epiphany/web/ enable-webextensions true

You will now see Extensions in Epiphany’s menu and if you run it from the terminal it will print out any message logged by extensions for debugging. You can download extensions most easily from Mozilla’s website.

June 29, 2022 04:00 AM

June 20, 2022

Frédéric Wang

Update on OpenType MATH fonts

I mentioned in a previous post that Igalia organized the Web Engines Hackfest 2022 last week. As usual, fonts were one of the topics discussed. Dominik Röttsches presented COLRv1 color vector fonts in Chrome and OSS (transcript) and we also held a breakout session on Tuesday morning. Because one issue raised was the availability of OpenType MATH fonts on operating systems, I believe it’s worth giving an update on the latest status…

There are only a few fonts with an OpenType MATH table. Such fonts can be used for math layout, e.g. by modern TeX engines to render LaTeX, by Microsoft Office to render OMML, or by web engines to render MathML. Three such fonts are interesting to consider, so I’m providing a quick overview together with screenshots generated by XeTeX from the LaTeX formula $${\sqrt{\sum_{n=1}^\infty {\frac{10}{n^4}}}} = {\int_0^\infty \frac{2x\,dx}{e^x-1}} = \frac{\pi^2}{3} \in {\mathbb R}$$:

Recently, Igalia has been in touch with Myles C. Maxfield who has helped with internal discussion at Apple regarding inclusion of STIX Two Math in the list of fonts on macOS. Last week he came back to us announcing it’s now the case on all the betas of macOS 13 Ventura 🎉 ! I just tested it this morning and indeed STIX Two Math is now showing up as expected in the Font Book. Here is the rendering of the last slide of my hackfest presentation in Safari 16.0:

Screenshot of a math formula rendered with STIX Two Math by Safari

Obviously, this is good news for Chromium and Firefox too. For the former, we are preparing our MathML intent-to-ship, and having decent rendering on macOS by default is important. As for the latter, we could in the future finally get rid of the hardcoded tables used to support the deprecated STIXGeneral set.

Another interesting platform to consider for Chromium is Android. Last week, there has been new activity on the Noto fonts bug and answers seem more encouraging now… So let’s hope we can get a proper math font on Android soon!

Finally, I’m not exactly sure about the situation on Linux and it may be different for the various distributions. STIX and Latin Modern should generally be available as system packages that can be easily installed. It would be nicer if they were pre-installed by default though…

June 20, 2022 12:00 AM