Planet Igalia WebKit

February 20, 2024

Carlos García Campos

A Clarification About WebKit Switching to Skia

In the previous post I talked about the plans of the WebKit ports currently using Cairo to switch to Skia for 2D rendering. Apple ports don’t use Cairo, so they won’t be switching to Skia. I understand the post title was confusing; I’m sorry about that. The original post has been updated for clarity.

by carlos garcia campos at February 20, 2024 06:11 PM

February 19, 2024

Carlos García Campos

WebKitGTK and WPEWebKit Switching to Skia for 2D Graphics Rendering

In recent years we have had an ongoing effort to improve graphics performance of the WebKit GTK and WPE ports. As a result, we shipped features like threaded rendering, the DMA-BUF renderer, or proper vertical retrace synchronization (VSync). While these improvements have helped keep WebKit competitive, and even perform better than other engines in some scenarios, it has been clear for a while that we were reaching the limits of what can be achieved with a CPU-based 2D renderer.

There was an attempt at making Cairo support GPU rendering, which did not work particularly well due to the library being designed around stateful operation based upon the PostScript model—resulting in a convenient and familiar API, great output quality, but hard to retarget and with some particularly slow corner cases. Meanwhile, other web engines have moved more work to the GPU, including 2D rendering, where many operations are considerably faster.

We checked all the available 2D rendering libraries we could find, but none of them met all our requirements, so we decided to try writing our own library. At the beginning it worked really well, with impressive results in performance even compared to other GPU based alternatives. However, it proved challenging to find the right balance between performance and rendering quality, so we decided to try other alternatives before continuing with its development. Our next option had always been Skia. The main reason why we didn’t choose Skia from the beginning was that it didn’t provide a public library with API stability that distros can package and we can use like most of our dependencies. It still wasn’t what we wanted, but now we have more experience in WebKit maintaining third party dependencies inside the source tree like ANGLE and libwebrtc, so it was no longer a blocker either.

In December 2023 we decided to give Skia a try internally and see if it would be worth the effort of maintaining the project as a third party module inside WebKit. In just one month we had implemented enough features to be able to run all MotionMark tests. The results on the desktop were quite impressive, doubling the global MotionMark score. We still had to do more tests on embedded devices, which are the actual target of WPE, but it was clear that, at least on the desktop, this very initial implementation that was not even optimized (we kept our current architecture, which is optimized for CPU rendering) gave much better results. We decided that Skia was the option, so we continued working on it and doing more tests on embedded devices. On the boards that we tried we also got better results than with CPU rendering, but the difference was not so big, which means that with less powerful GPUs and with our current architecture designed for CPU rendering we were not that far from CPU rendering. That’s the reason why we managed to keep WPE competitive on embedded devices, but Skia will not only bring performance improvements, it will also simplify the code and allow us to implement new features. So, we had enough data already to make the final decision of going with Skia.

In February 2024 we reached a point in which our Skia internal branch was in an “upstreamable” state, so there was no reason to continue working privately. We met with several teams from Google, Sony, Apple and Red Hat to discuss our intention to switch from Cairo to Skia, upstreaming what we had as soon as possible. We got really positive feedback from all of them, so we sent an email to the WebKit developers mailing list to make it public. Again we only got positive feedback, so we started to prepare the patches to import Skia into WebKit, add the CMake integration and add the initial Skia implementation for the WPE port, which has already landed in main.

We will continue working on the Skia implementation in upstream WebKit, and we also have plans to change our architecture to better support the GPU rendering case in a more efficient way. We don’t have a deadline: it will be ready when we have implemented everything currently supported by Cairo, as we don’t plan to switch with regressions. We are focused on the WPE port for now, but at some point we will start working on GTK too, and other ports using Cairo will eventually start getting Skia support as well.

by carlos garcia campos at February 19, 2024 01:27 PM

February 01, 2024

WPE WebKit Blog

Use Case: Server-side headless rendering

WPE and server-side headless rendering

In many distributed applications, it can be useful to run a light web browser on the server side to render some HTML content or process images, video and/or audio using JavaScript.

Some concrete use-cases can be:

  • Video post-production using HTML overlays.
  • Easy 3D rendering with WebGL that can be broadcast as a video stream.
  • Reusing the same JavaScript code between a frontend web application and the backend processing.

WPE WebKit is the perfect solution for all those use cases as it offers a lightweight solution which can run on low-end hardware or even within a container. It provides a lot of flexibility at the moment of choosing the backend infrastructure as WPE WebKit can, for instance, run from within a container with a very minimal Linux configuration (no need for any windowing system) and with full hardware acceleration and zero-copy of the video buffers between the GPU and the CPU.

Additionally, the fact that WPE WebKit is optimized for lower-powered devices also makes it the perfect option for server-side rendering when scaling commercial deployments while keeping cost under control, which is yet another important factor to take into account when considering cloud rendering.

February 01, 2024 12:00 AM

January 29, 2024

WPE WebKit Blog

A New WPE Backend Using EGLStream

What is a WPE Backend?

Depending on the target hardware WPE may need to use different techniques and technologies to ensure correct graphical rendering. To be independent of any user-interface toolkit and windowing system, WPE WebKit delegates the rendering to a third-party API defined in the libwpe library. A concrete implementation of this API is a “WPE backend”.

WPE WebKit is a multiprocess application: the end-user starts and controls the web widgets in the application process (which we often call “the UI process”), while the web engine itself uses different subprocesses: WPENetworkProcess is in charge of managing network connections, and WPEWebProcess (or “web process”) is in charge of HTML and JavaScript parsing, execution and rendering. The WPE backend is at a crossroads between the UI process and one or more web process instances.

Diagram showing a box for the WPE backend in between the UI process and WPEWebProcess

The WPE backend is a shared library that is loaded at runtime by the web process and by the UI process. It is used to render the visual aspect of a web page and transfer the resulting video buffer from the web process to the application process.

Backend Interfaces

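The WPE backend shared library must export at least one symbol called _wpe_loader_interface of type struct wpe_loader_interface as defined in the libwpe API. Presently its only member is load_object, a callback function that receives a string with an interface name and returns the concrete implementation of the requested interface. The interfaces discussed in this post are wpe_renderer_host_interface, wpe_renderer_backend_egl_interface, wpe_renderer_backend_egl_target_interface, wpe_renderer_backend_egl_offscreen_target_interface and wpe_view_backend_interface.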

The names passed to the .load_object() function are the same as those of the interface types, prefixed with an underscore. For example, a .load_object("_wpe_renderer_host_interface") call must return a pointer to a struct wpe_renderer_host_interface object.

Example C code for a load_object callback.

#include <string.h>
#include <wpe/wpe.h>
#include <wpe/wpe-egl.h>

/* Interface tables implemented by this backend. */
static struct wpe_renderer_host_interface my_renderer_host = { /* ... */ };
static struct wpe_renderer_backend_egl_interface my_renderer_backend_egl = { /* ... */ };

static void*
my_backend_load_object(const char *name)
{
    if (!strcmp(name, "_wpe_renderer_host_interface"))
        return &my_renderer_host;
    if (!strcmp(name, "_wpe_renderer_backend_egl_interface"))
        return &my_renderer_backend_egl;

    /* ... */

    /* Unknown interface name: nothing to return. */
    return NULL;
}

struct wpe_loader_interface _wpe_loader_interface = {
    .load_object = my_backend_load_object,
};

Each of these interfaces follows the same base structure: the struct members are callback functions, and all interfaces have create and destroy members which act as instance constructor and destructor, plus any additional “methods”. The pointer returned by the create callback will be passed as the object “instance” of the other methods:

struct wpe_renderer_host_interface {
    void* (*create)(void);
    void (*destroy)(void *object);
    /* ... */
};
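To make the pattern concrete, here is a minimal sketch of how a caller might drive an interface shaped like this. The struct demo_interface, its next_frame "method", and all the demo_* functions are hypothetical names invented for this example; they are not part of libwpe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical interface with the same shape as the libwpe ones:
 * a constructor, a destructor, and one extra "method". */
struct demo_interface {
    void *(*create)(void);
    void (*destroy)(void *object);
    int (*next_frame)(void *object);
};

/* Private per-instance state, known only to the backend. */
struct demo_instance {
    int frames;
};

static void *demo_create(void)
{
    return calloc(1, sizeof(struct demo_instance));
}

static void demo_destroy(void *object)
{
    free(object);
}

static int demo_next_frame(void *object)
{
    struct demo_instance *self = object;
    return ++self->frames;
}

static const struct demo_interface demo_iface = {
    .create = demo_create,
    .destroy = demo_destroy,
    .next_frame = demo_next_frame,
};

/* Caller side: create an instance, invoke a method `calls` times passing
 * the opaque pointer back each time, then destroy the instance.
 * Returns the value of the last method call, or -1 if creation failed. */
static int demo_run(const struct demo_interface *iface, int calls)
{
    void *object = iface->create();
    if (!object)
        return -1;

    int last = 0;
    for (int i = 0; i < calls; i++)
        last = iface->next_frame(object);

    iface->destroy(object);
    return last;
}
```

Calling demo_run(&demo_iface, 3) creates one instance, calls the method three times on it, and destroys it. The caller never looks inside the instance pointer, which is what lets each backend keep its own private state.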

In the UI process side WPE WebKit will create:

  • One “renderer host” instance, using wpe_renderer_host_interface.create().
  • Multiple “renderer host client” instances, using wpe_renderer_host_interface.create_client(). These are mainly used for IPC communication, one instance gets created for each web process launched by WebKit.
  • Multiple “view backend” instances, using wpe_view_backend_interface.create(). One instance is created for each rendering target in the web process.

In each web process—there can be more than one—WPE WebKit will create:

  • One “renderer backend EGL” instance, using wpe_renderer_backend_egl_interface.create().
  • Multiple “renderer backend EGL target” instances, using wpe_renderer_backend_egl_target_interface.create(). An instance is created for each new rendering target needed by the application.
How about wpe_renderer_backend_egl_offscreen_target_interface?

The rendererBackendEGLTarget instances may be created by the wpe_renderer_backend_egl_target_interface, or the wpe_renderer_backend_egl_offscreen_target_interface depending on the interfaces implemented in the backend.

Here we are only focusing on the wpe_renderer_backend_egl_target_interface that is relying on a classical EGL display (defined in the rendererBackendEGL instance). The wpe_renderer_backend_egl_offscreen_target_interface may be used in very specific use-cases that are out of the scope of this post. You can check its usage in the WPE WebKit source code for more information.

These instances typically communicate with each other using Unix sockets for IPC. The IPC layer must be implemented in the WPE backend itself because the libwpe interfaces only pass around the file descriptors to be used as communication endpoints.

From a topological point of view, all those instances are organized as follows:

From a usage point of view:

  • The rendererHost and rendererHostClient instances are only used to manage IPC endpoints on the UI process side that are connected to each running web process. They are not used by the graphical rendering system.
  • The rendererBackendEGL instance (one per web process) is only used to connect to the native display for a specific platform. For example, on a desktop Linux, the platform may be X11 where the native display would be the result of calling XOpenDisplay(); or the platform may be Wayland and in this case the native display would be the result of calling wl_display_connect(); and so on.
  • The rendererBackendEGLTarget (on the web process side) and viewBackend (on the UI process side) instances are the ones truly managing the web page graphical rendering.

Graphics Rendering

As seen above, the interfaces in charge of the rendering are wpe_renderer_backend_egl_target_interface and wpe_view_backend_interface. During their creation, WPE WebKit exchanges the file descriptors used to establish a direct IPC connection between a rendererBackendEGLTarget (in the web process) and a viewBackend (in the UI process).

During the EGL initialization phase, when a new web process is launched, WebKit will use the native display and platform provided by the wpe_renderer_backend_egl_interface.get_native_display() and .get_platform() functions to create a suitable OpenGL ES context.

When WebKit’s ThreadedCompositor is ready to render a new frame (in the web process), it calls the wpe_renderer_backend_egl_target_interface.frame_will_render() function to let the WPE backend know that rendering is about to start. At this moment, the previously created OpenGL ES context is made current to be used as the target for GL drawing commands.

Once the threaded compositor has finished drawing, it will swap the front and back EGL buffers and call the wpe_renderer_backend_egl_target_interface.frame_rendered() function to signal that the frame is ready. The compositor will then wait until the WPE backend calls wpe_renderer_backend_egl_target_dispatch_frame_complete() to indicate that the compositor may produce a new frame.

What happens inside the .frame_will_render() and .frame_rendered() implementations is up to the WPE backend. As an example, it could set up a Frame Buffer Object to have the web content draw offscreen, in a texture that can be passed back to the UI process for further processing, or use extensions like EGLStream or DMA-BUF exports to transfer the frame to the UI process without copying the pixel data.

Typically the backend sends each new frame to the corresponding view backend in its .frame_rendered() function. The application can use the frame until it sends back an IPC message to the renderer target (in the web process) to indicate that the frame is not in use anymore and may be freed or recycled. Although it is not a requirement to do it at this exact point, usually when a renderer backend receives this message it calls the wpe_renderer_backend_egl_target_dispatch_frame_complete() function to trigger the rendering of a new frame. As a side effect, this mechanism also allows controlling the pace at which new frames are produced.
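The pacing effect of this handshake can be modelled with a few lines of state tracking. The sketch below is only a toy model: struct demo_target and the demo_* functions are hypothetical names for this example, and no real rendering or IPC takes place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical renderer-target state for this sketch; not a libwpe type. */
struct demo_target {
    bool waiting_for_release; /* A frame is still held by the UI process. */
    unsigned frames_sent;
};

/* Compositor side: try to produce a frame. Returns false when the previous
 * frame has not been released yet, so no new frame may be rendered. */
static bool demo_try_render(struct demo_target *target)
{
    if (target->waiting_for_release)
        return false;

    /* Drawing would happen here; a real backend would then send the
     * frame to the view backend from its .frame_rendered() function. */
    target->frames_sent++;
    target->waiting_for_release = true;
    return true;
}

/* Called when the "frame released" IPC message arrives from the UI process.
 * A real backend would typically call
 * wpe_renderer_backend_egl_target_dispatch_frame_complete() at this point. */
static void demo_on_frame_released(struct demo_target *target)
{
    target->waiting_for_release = false;
}
```

In this model a second call to demo_try_render() fails until demo_on_frame_released() has run, so the UI process decides how fast frames are produced simply by choosing when to send the release message.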

Using EGLStream

EGLStream is an EGL extension that defines a mechanism to transfer hardware video buffers from one process to another efficiently, without getting them out of GPU memory. Although the extension is supported only on NVIDIA hardware, it makes for a good example as it transparently handles some of the complexities involved, like buffers with multiple planes.

This backend uses the EGLStream extension to transfer graphics buffers from the web process, which acts as a producer, to the UI process, which acts as a consumer. The producer extension EGL_KHR_stream_producer_eglsurface allows creating a surface that may be used as a target for rendering; calling eglSwapBuffers() then finishes drawing and sends the result to the consumer. Meanwhile, on the consumer side, the EGL_NV_stream_consumer_eglimage extension is used to turn each buffer into an EGLImage.

The reference source code for this WPE backend is available in the WPEBackend-offscreen-nvidia repository, which has been tested with WPE WebKit 2.38.x or 2.40.x, and libwpe version 1.14.x.

Behold, the Future Belongs to DMA-BUF!

With the growing adoption of DMA-BUF for sharing memory buffers on modern Linux platforms, the WPE WebKit architecture will be evolving and, in the future, the need for a WPE Backend should disappear in most cases.

Ongoing work on WPE WebKit removes the need to provide a WPE backend implementation for most hardware platforms, with a generic implementation using DMA-BUF provided as an integral, built-in feature of WebKit. It will still be possible to provide external implementations for platforms that might need to use custom buffer sharing mechanisms.

From the application developer point of view, in most cases writing programs that use the WPE WebKit API will be simpler, with the complexity of the communication among multiple processes handled by WebKit.

Stream Setup

The steps needed to set up EGLStream endpoints need to be done in a particular order:

  1. Create the consumer.
  2. Get the stream file descriptor for the consumer.
  3. Send the stream file descriptor to the producer.
  4. Create the producer.

First, the consumer needs to be created:

EGLStreamKHR createConsumerStream(EGLDisplay eglDisplay) {
    static const EGLint s_streamAttribs[] = {
        EGL_STREAM_FIFO_LENGTH_KHR, 1,                      // FIFO mode, one frame.
        EGL_CONSUMER_ACQUIRE_TIMEOUT_USEC_KHR, 1000 * 1000, // One second.
        EGL_NONE
    };
    return eglCreateStreamKHR(eglDisplay, s_streamAttribs);
}

The EGL_STREAM_FIFO_LENGTH_KHR parameter defines the length of the EGLStream queue. If set to zero, the stream will work in “mailbox” mode and each time the producer has a new frame it will empty the stream content and replace the frame by the new one. If non-zero, the stream works in “FIFO” mode, which means that the stream queue can contain up to EGL_STREAM_FIFO_LENGTH_KHR frames.

Here we configure a queue for one frame because in this case the EGL_KHR_stream_producer_eglsurface specification guarantees that a call to eglSwapBuffers() on the producer will block until the consumer retires the previous frame from the queue. This is used as implicit synchronization between the UI process side and the web process side without needing to rely on custom IPC, which would add a small delay between frames.
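The difference between the two modes can be illustrated with a toy model of the queue. This sketch does not use EGL at all: struct demo_stream and the demo_stream_* functions are hypothetical names, frames are plain integers, and a "would block" condition is reported as a false return value instead of actually blocking. It assumes fifo_length never exceeds DEMO_MAX_FRAMES.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_FRAMES 8

/* Toy model of the stream queue; frame contents are just integers. */
struct demo_stream {
    int fifo_length;             /* 0 selects mailbox mode. */
    int frames[DEMO_MAX_FRAMES];
    int count;
};

/* Producer side. Returns false where a real FIFO-mode producer would
 * block in eglSwapBuffers() because the queue is full. */
static bool demo_stream_push(struct demo_stream *stream, int frame)
{
    if (stream->fifo_length == 0) {
        /* Mailbox: discard whatever is pending and keep the newest frame. */
        stream->frames[0] = frame;
        stream->count = 1;
        return true;
    }
    if (stream->count == stream->fifo_length)
        return false;
    stream->frames[stream->count++] = frame;
    return true;
}

/* Consumer side: takes the oldest frame, or returns -1 when empty. */
static int demo_stream_pop(struct demo_stream *stream)
{
    if (stream->count == 0)
        return -1;
    int frame = stream->frames[0];
    for (int i = 1; i < stream->count; i++)
        stream->frames[i - 1] = stream->frames[i];
    stream->count--;
    return frame;
}
```

With fifo_length set to 1, a second push fails until the consumer pops the pending frame, which mirrors the implicit synchronization described above. With fifo_length set to 0, pushes always succeed and the consumer only ever sees the most recent frame.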

The EGL_CONSUMER_ACQUIRE_TIMEOUT_USEC_KHR parameter defines the maximum timeout in microseconds to wait on the consumer side to acquire a frame when calling eglStreamConsumerAcquireKHR(). It is only used with the EGL_KHR_stream_consumer_gltexture extension because the EGL_NV_stream_consumer_eglimage extension allows setting a timeout on each call to the eglQueryStreamConsumerEventNV() function.

Second, to initialize the consumer using the EGL_NV_stream_consumer_eglimage extension it is enough to call the eglStreamImageConsumerConnectNV() function.

Once the consumer has been initialized, you need to send the EGLStream file descriptor to the producer process. The usual way of achieving this would be using IPC between the two processes, sending the file descriptor in an SCM_RIGHTS message through a Unix socket, although with recent kernels using pidfd_getfd() may be an option if both processes are related.
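As an illustration of the SCM_RIGHTS approach, the following sketch passes an arbitrary file descriptor over a connected Unix socket using the standard sendmsg() and recvmsg() calls. The demo_send_fd() and demo_receive_fd() helpers are hypothetical names written for this example, not functions from the reference backend, and error handling is reduced to a bare minimum.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends file descriptor `fd` over the Unix socket `socket_fd`.
 * Returns 0 on success, -1 on error. */
static int demo_send_fd(int socket_fd, int fd)
{
    /* One byte of regular data should accompany the ancillary data. */
    char payload = 'F';
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };

    /* The union guarantees suitable alignment for the control buffer. */
    union {
        char buffer[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;
    memset(&control, 0, sizeof(control));

    struct msghdr message = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.buffer,
        .msg_controllen = sizeof(control.buffer),
    };

    struct cmsghdr *header = CMSG_FIRSTHDR(&message);
    header->cmsg_level = SOL_SOCKET;
    header->cmsg_type = SCM_RIGHTS;
    header->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(header), &fd, sizeof(int));

    return sendmsg(socket_fd, &message, 0) == 1 ? 0 : -1;
}

/* Receives a file descriptor from `socket_fd`.
 * Returns the new descriptor, or -1 on error. */
static int demo_receive_fd(int socket_fd)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };

    union {
        char buffer[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;

    struct msghdr message = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.buffer,
        .msg_controllen = sizeof(control.buffer),
    };

    if (recvmsg(socket_fd, &message, 0) != 1)
        return -1;

    struct cmsghdr *header = CMSG_FIRSTHDR(&message);
    if (!header || header->cmsg_level != SOL_SOCKET || header->cmsg_type != SCM_RIGHTS)
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(header), sizeof(int));
    return fd;
}
```

The descriptor returned by demo_receive_fd() usually has a different number than the one passed to demo_send_fd(), but it refers to the same underlying object. In the EGLStream case the consumer side would send the stream descriptor and the producer side would receive it before creating its endpoint.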

When the file descriptor is finally received, the producer endpoint can be created using the EGL_KHR_stream_producer_eglsurface extension:

const EGLint surfaceAttribs[] = {
    EGL_WIDTH, width,
    EGL_HEIGHT, height,
    EGL_NONE
};
// Recreate the stream from the descriptor received from the consumer.
EGLStreamKHR eglStream = eglCreateStreamFromFileDescriptorKHR(eglDisplay, consumerFD);
EGLSurface eglSurface = eglCreateStreamProducerSurfaceKHR(eglDisplay, config, eglStream, surfaceAttribs);

As with pbuffer surfaces, the dimensions need to be specified as surface attributes. When picking a frame buffer configuration with eglChooseConfig() the EGL_SURFACE_TYPE attribute must be set to EGL_STREAM_BIT_KHR. From this point onwards, rendering proceeds as usual: the EGL surface and context are made active, and once the painting is done a call to eglSwapBuffers() will “present” the frame, which in this case means sending the buffer with the pixel data down the EGLStream to the consumer.

Consuming Frames

While on the producer side rendering treats the EGLStream surface like any other, on the consumer some more work is needed to manage the lifetime of the data received: frames have to be manually acquired and released once they are not needed anymore.

The consumer calls eglQueryStreamConsumerEventNV() repeatedly to retrieve the next event from the stream:

  • EGL_STREAM_IMAGE_ADD_NV indicates that there is a buffer in the stream that has not yet been bound to an EGLImage, and the application needs to create a new one to which the actual data will be bound later.
  • EGL_STREAM_IMAGE_AVAILABLE_NV indicates that a new frame is available and that it can be bound to the previously created EGLImage.
  • EGL_STREAM_IMAGE_REMOVE_NV indicates that a buffer has been retired from the stream, and that its associated EGLImage may be released once the application has finished using it.

This translates roughly to the following code:

static constexpr EGLTime MAX_TIMEOUT_USEC = 1000 * 1000;
EGLImage eglImage = EGL_NO_IMAGE;

while (true) {
    EGLenum event = 0;
    EGLAttrib data = 0;

    // WARNING: The specification states that the timeout is in nanoseconds
    // (see: https://registry.khronos.org/EGL/extensions/NV/EGL_NV_stream_consumer_eglimage.txt)
    // but in reality it is in microseconds, at least with the version 535.113.01 of the NVidia drivers.
    if (!eglQueryStreamConsumerEventNV(display, eglStream, MAX_TIMEOUT_USEC, &event, &data))
        break;

    switch (event) {
    case EGL_STREAM_IMAGE_ADD_NV: // Bind an incoming buffer to an EGLImage.
        if (eglImage)
            eglDestroyImage(display, eglImage);
        eglImage = eglCreateImage(display, EGL_NO_CONTEXT, EGL_STREAM_CONSUMER_IMAGE_NV,
                                  static_cast<EGLClientBuffer>(eglStream), nullptr);
        continue; // Handle the next event.

    case EGL_STREAM_IMAGE_REMOVE_NV: // Buffer removed, EGLImage may be disposed.
        if (data) {
            EGLImage image = reinterpret_cast<EGLImage>(data);
            eglDestroyImage(display, image);
            if (image == eglImage)
                eglImage = EGL_NO_IMAGE;
        }
        continue; // Handle the next event.

    case EGL_STREAM_IMAGE_AVAILABLE_NV: // New frame available.
        if (eglStreamAcquireImageNV(display, eglStream, &eglImage, EGL_NO_SYNC))
            break; // Frame acquired: leave the switch and use it below.
        // Acquiring failed: fall through and wait for the next event.

    default:
        continue; // Handle the next event.
    }

    /*** Use the EGLImage here ***/

    eglStreamReleaseImageNV(display, eglStream, eglImage, EGL_NO_SYNC);
}

The application is free to use each EGLImage as it sees fit. An obvious example would be to use it as the contents for a texture, which then gets painted in the “content” area of a web browser; or as the contents of the screen for an in-game computer that the player can interact with, enabling display of real, live web content as part of the gaming experience—now that would be a deeply embedded browser!

One Last Thing

There is a small showstopper to have EGLStream support working: currently, when WPE WebKit uses surfaceless EGL contexts it sets the surface type attribute to EGL_WINDOW_BIT, while EGL_STREAM_BIT_KHR would be needed instead. A small patch is enough to apply this tweak:

diff --git a/Source/WebCore/platform/graphics/egl/GLContextEGL.cpp b/Source/WebCore/platform/graphics/egl/GLContextEGL.cpp
index d5efa070..5f200edc 100644
--- a/Source/WebCore/platform/graphics/egl/GLContextEGL.cpp
+++ b/Source/WebCore/platform/graphics/egl/GLContextEGL.cpp
@@ -122,9 +122,11 @@ bool GLContextEGL::getEGLConfig(EGLDisplay display, EGLConfig* config, EGLSurfac
         attributeList[13] = EGL_PIXMAP_BIT;
         break;
     case GLContextEGL::WindowSurface:
-    case GLContextEGL::Surfaceless:
         attributeList[13] = EGL_WINDOW_BIT;
         break;
+    case GLContextEGL::Surfaceless:
+        attributeList[13] = EGL_STREAM_BIT_KHR;
+        break;
     }

     EGLint count;

January 29, 2024 06:00 AM