Planet Igalia

April 22, 2021

Brian Kardell

TAG 2021

The W3C is in the middle of a big, and arguably very difficult, election for the W3C Technical Architecture Group (aka TAG). The TAG is one of two small bodies within the W3C which are elected by the membership. If you're unfamiliar with this, I wrote this exceptionally brief Primer for Busy People. I'd like to tell you why I think this is a big election, why it is complex, what I (personally) think about it, and what you can do if you agree.

The current W3C TAG election is both important and complex for several reasons. It's big because 4 of the 6 elected seats are up for election and two exceptional members (Alice Boxhall from Google and David Baron from Mozilla) are unable to run for re-election. It's big because there are nine candidates and each brings different things to the table (and statements don't always capture enough). It's complex because of the voting system and participation.

Let me share thoughts on what I think would be a good result, and then I'll explain why that's hard to achieve and what I think we need to avoid.

A good result...

I believe the best result involves 3 candidates for sure (listed alphabetically): Lea Verou, Sangwhan Moon and Theresa O’Connor.

Let's start with the incumbents. I fully support re-electing both Theresa O’Connor and Sangwhan Moon and cannot imagine reasons not to. Both have a history in standards, have served well on TAG, have a diversity of knowledge, and are very reasonable and able to work well with others. Re-electing good incumbents with these qualities is also a practical advantage for the TAG, as they are already well immersed. These are easy choices.

Lea Verou is another easy choice for me. Lea brings a really diverse background, set of perspectives and skills to the table. She's worked for the W3C, she's a great communicator to developers (definitely a valuable skill for the TAG, whose outreach is important), she's worked with small teams, produced a number of popular libraries and helped drive some interesting standards. The OpenJS Foundation was pleased to nominate her, but Frontiers and several others were also supportive. Lea also deserves "high marks".

These 3 are also a happily diverse group.

This leaves one seat. There are 3 other candidates who I think would be good, for different reasons: Martin Thomson, Amy Guy and Jeffrey Yasskin. Each of them would bring something different to the table and, if I am really honest, it is a hard choice. I wish we could seat all 3 of them, but we can't, at least not in this election (they can run again).

For brevity, I will not even attempt to make all the cases here, but I encourage you to read their statements and ask friends. Truth be told, I have a strong sense that "any mix of these 6 could be fine" and different mixes optimize for slightly different things. Also, as I will explain, some things seem slightly more important to me than whether my third recommendation is really third best versus fourth or fifth...

TLDR; Turn out the vote

If you find yourself in agreement with me, more or less, I would suggest: place the 3 I mentioned above (Lea, Tess, Sangwhan) in at least your top 4 places, pick a fourth from my other list, and put them in whatever order you like.

Tess, Lea and Sangwhan

I think there are many possible great slates for TAG in 2021, but they all involve Lea, Tess and Sangwhan. Please help support them and place them among your top 4 votes.

If you're a W3C member, your AC Representative votes for you -- tell them. Make sure they vote - the best result definitely depends on higher than average turnout. Surprisingly, about 75% of the membership doesn't normally vote. These elections are among the rare times when every member has an equal voice: a tiny 1-person member org has exactly the same voting power as every mega company.

If you're not a W3C member, you don't have a way to vote directly but you can publicly state your support and tag in member orgs or reach out to people you know who work for member orgs. Historically this has definitely helped - let's keep W3C working well!


STV, Turnout and Negative Preference

The W3C's election system uses STV. STV stands for "single transferable vote". Single is the operative word: while you express 9 preferences in this election, only one of them will actually be counted to help someone win a seat. The counting system is (I think) rather complex, but the higher up on your list someone appears, the more likely that is the vote that counts. Each vote that is counted counts as 1 vote in that candidate's favor.

Let me stress why this matters: STV optimizes for choosing diversity of opinion with demonstrably critical support. A passionate group supporting an 'issue' candidate will all place their candidate as the #1 choice - those are guaranteed to be counted.

Historically only about 100 of the W3C's 438 member organizations actually turn out in a good election. Let's imagine turnout is even lower this time and it's only 70. This means that if a candidate reaches 18 votes (a little over 4% of membership) they have a seat, no matter how the rest of us vote - even if everyone else had an actively negative preference for them.

Non-participation is an issue for all voting systems, but it seems like STV can really amplify outcomes which are undesirable here. The only solution to this problem is to increase turnout. Increasing turnout raises the quota bar and helps ensure that this doesn't happen.

Regardless of how you feel about any of the candidates, please help turn out the vote. The W3C works best when as many members vote as possible!

April 22, 2021 07:00 PM

Imanol Fernandez

WebXR landing in WebKit

Since I joined Igalia, I have been working on finishing up the core WebXR implementation in WebKit, focused on the DOM, render loop, graphics and input sources. We are targeting the OpenXR API and we have reached the point where we are able to run some demos, so it is a good time to share a summary of the work that we have done so far. You can check all the patches and the code at the WebKit WebXR module.

April 22, 2021 10:45 AM

April 21, 2021

Víctor Jáquez

Review of Igalia Multimedia activities (2020/H2)

As the first quarter of 2021 has already come to a close, we reckon it’s time to recap our achievements from the second half of 2020, and update you on the improvements we have been making to the multimedia experience on the Web and Linux in general.

Our previous reports:

WPE / WebKitGTK

We have closed ~100 issues related to multimedia in WebKitGTK/WPE: fixing seek issues during playback, plugging memory leaks, gardening tests, improving the Flatpak-based development workflow, enabling new codecs, etc. Overall, we improved the multimedia user experience a bit on these WebKit engine ports.

To highlight a couple of tasks: we did some maintenance work on the WebAudio backends, and we upstreamed an internal audio mixer that combines all streams into a single connection to the audio server (such as PulseAudio), instead of opening one connection per audio resource.

Adaptive media streaming for the Web (MSE)

We have been working on a new MSE backend for a while, but along the way many related bugs have appeared and been squashed, and many code cleanups have been carried out. Though it has been like yak shaving, we are confident that we will reach the end of this long and winding road soonish.

DRM media playback for the Web (EME)

Regarding protected media playback, we worked to upstream OpenCDM support, with Widevine, through RDK’s Thunder framework, while continuing the usual maintenance of the other key systems, such as Clear Key, Widevine and PlayReady.

For more details we published a blog post: Serious Encrypted Media Extensions on GStreamer based WebKit ports.

Realtime communications for the Web (WebRTC)

Just like EME, WebRTC is not currently enabled by default in browsers such as Epiphany because of licensing problems, but it is available for custom adopters, and we are maintaining it. For example, we collaborated on upgrading LibWebRTC to M87, fixing the expected regressions and doing the associated test gardening.

Along the way we experimented a bit with the new GPUProcess for capture devices, but we decided to pause the experimentation while waiting for broader adoption of the process in WPE/WebKitGTK, for example for graphics rendering.

The GPUProcess work will be resumed at some point; it’s not currently a hard requirement, since we have already moved capture device handling from the UIProcess to the WebProcess, isolating all GStreamer operations in the latter.

GStreamer

GStreamer is one of our core multimedia technologies, and we contribute to it on a daily basis. We pushed ~400 commits, with a similar number of code reviews, during the second half of 2020. Among those contributions, let us highlight the following:

  • A lot of bug fixing aiming for release 1.18.
  • Reworked and enhanced decodebin3, the GstTranscoder API and encodebin.
  • Merged av1parse in video parsers plugin.
  • Merged qroverlay plugin.
  • Iterated on the mono-repo proposal, which requires consensus and coordination among the whole community.
  • The gstwpe element has been greatly improved, driven by new user requests.
  • Contributed to the new libgstcodecs library, which enables stateless video decoders across different platforms (for example, v4l2, d3d11, va, etc.).
  • Developed a new plugin for VA-API using this library, exposing H.264, H.265, VP9, VP8 and MPEG2 decoders and a full-featured postprocessor, with better performance, according to our measurements, than GStreamer-VAAPI (see the example below).
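
As an illustration, a playback pipeline using the new plugin could look like the line below. This is only a sketch: the element names (vah264dec, vapostproc) are the ones used by the upstream GStreamer VA plugin, and the input file is an example.

# Decode an H.264 elementary stream with the VA-API based decoder and
# run it through the VA postprocessor before displaying it.
gst-launch-1.0 filesrc location=video.h264 ! h264parse ! vah264dec ! vapostproc ! videoconvert ! autovideosink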

Conferences

Although 2020 was not a year for in-person conferences, many of them went virtual. We attended one, the Mile High Video conference, and participated in its Slack workspace.

Thank you for reading this report, and stay tuned for more of our work.

by vjaquez at April 21, 2021 04:49 AM

April 20, 2021

Danylo Piliaiev

Turnips in the wild (Part 1)

Running games and benchmarks is much more exciting than trying to fix a handful of remaining synthetic tests. Turnip, which is an open-source Vulkan driver for recent Adreno GPUs, should already be capable of running real-world applications, and they always find a way to break the driver in new, unexpected ways.

TauCeti Vulkan Technology Benchmark

The benchmark greeted me with this wonderful field which looked a little blocky:

Screenshot of the benchmark with grass/dirt field which looks more like a patchwork

It’s not a crash, it’s not a hang, there is something wrong either with textures or with a shader. So, let’s take a closer look at a frame. Here is the first major draw call which looks wrong:

Screenshot of a bad draw call being inspected in RenderDoc

Now, let’s take a look at textures used by the draw:

More than twelve textures used by the draw call

That’s a lot of textures! But all of them look fine; yes, there are some textures that look blocky, but these textures are just small, so nothing wrong here.

The next stop is the fragment shader; I recently implemented an extension which helps with that. VK_KHR_pipeline_executable_properties allows reporting additional information about the shaders in a pipeline. In the case of Turnip, that is statistics about register usage and the assembly of the shader:

Part of the problematic shader with a few instructions having a warning printed near them
Excerpt from the problematic shader
sam.base0 (f32)(xy)r0.x, r0.z, a1.x	; no field 'HAS_SAMP', WARNING: unexpected bits[0:7] in #cat5-samp-s2en-bindless-a1: 0x6 vs 0x0

Bingo! Usually when the issue is in a shader I have to make educated guesses and/or reduce the shader until the issue is apparent. However, here we can immediately spot instructions with a warning. It may not be the cause of the ground mis-rendering, but these instructions do sampling from textures, soooo.

After fixing their encoding, which was caused by a mistake in the XML definition, the ground is now rendered correctly:

Screenshot of the benchmark with a field which now has proper grass and dirt

After looking a bit more at the frame I saw that the same issue plagued all rock formations, and now it is gone too. The final fix can be seen at:

ir3/isa,parser: fix encoding and parsing of bindless s2en SAM (!9628)

Genshin Impact

Now it’s time for a more heavyweight opponent: Genshin Impact, one of the most popular mobile games at the moment. Genshin Impact supports both GLES and Vulkan, and defaults to GLES on all but a few devices. In any case, Vulkan can be enabled by editing a config file.

There are several mis-renderings in the game; here I will describe the one which shows why it is important to use all available validation tooling for such a complex API as Vulkan. The other mis-renderings will be a matter for the second post.

Gameplay – Disco Water

Proceeding to the gameplay and running around for a bit revealed major artifacts on the water surface:

Screenshot of the gameplay with body of water that has large colorful artifacts

This one was a multi-frame effect, so I had to capture a trace of all frames and then find the one where the issue began. Here is the draw call I found:

GIF showing before and after problematic draw call

It adds water to the first attachment and a lot of artifacts to the second one. Looking at the framebuffer configuration, I could see that the fragment shader doesn’t write to the second attachment. Is that allowed? Yes. Is the behavior defined? No! VK_LAYER_KHRONOS_validation warns us about it, if warnings are enabled:

UNASSIGNED-CoreValidation-Shader-InputNotProduced(WARN / SPEC): Validation Warning:
Attachment 1 not written by fragment shader; undefined values will be written to attachment

“undefined values will be written to attachment” may mean that nothing is written into it, like expected by the game, or that random values are written to the second attachment, like Turnip does. Such behavior should not be relied upon, so Turnip does not contradict the specification here and there is nothing to fix. Case closed.
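
For reference, a minimal way to enable the validation layer without modifying the application is through the standard Vulkan loader environment variable (a sketch; depending on your layer configuration, warning-level messages may also need to be enabled, for example via the Vulkan Configurator):

# Load the Khronos validation layer for every Vulkan instance created by the app
export VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation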

More mis-renderings and investigations to come in the next post(s).

by Danylo Piliaiev at April 20, 2021 09:00 PM

Eleni Maria Stea

EXT_external_objects and EXT_external_objects_fd for the Intel iris driver have been merged into mesa3D!

This post is a quick status update on OpenGL and Vulkan Interoperability extensions for Linux mesa3D drivers: Both EXT_external_objects and EXT_external_objects_fd implementations for the Intel iris driver have been finally merged into mesa3D earlier today and will be available in the next release! 🎉 Parts of these extensions were already upstream (EXT_semaphore, EXT_semaphore_fd, and some drivers …

by hikiko at April 20, 2021 07:49 PM

Enrique Ocaña

GStreamer WebKit debugging tricks using GDB (2/2)

This post is a continuation of a series of blog posts about the most interesting debugging tricks I’ve found while working on GStreamer WebKit on embedded devices. These are the other posts of the series published so far:

Print corrupt stacktraces

In some circumstances you may get stacktraces that eventually stop because of missing symbols or corruption (?? entries).

#3  0x01b8733c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

However, you can print the stack in a useful way that gives you leads about what was next in the stack:

  • For i386: x/256wa $esp
  • For x86_64: x/256ga $rsp
  • For ARM 32 bit: x/256wa $sp

You may want to enable asm-demangle: set print asm-demangle

Example output, the 3 last lines give interesting info:

0x7ef85550:     0x1b87400       0x2     0x0     0x1b87400
0x7ef85560:     0x0     0x1b87140       0x1b87140       0x759e88a4
0x7ef85570:     0x1b87330       0x759c71a9 <gst_base_sink_change_state+956>     0x140c  0x1b87330
0x7ef85580:     0x759e88a4      0x7ef855b4      0x0     0x7ef855b4
...
0x7ef85830:     0x76dbd6c4 <WebCore::AppendPipeline::resetPipeline()::__PRETTY_FUNCTION__>        0x4     0x3     0x1bfeb50
0x7ef85840:     0x0     0x76d59268      0x75135374      0x75135374
0x7ef85850:     0x76dbd6c4 <WebCore::AppendPipeline::resetPipeline()::__PRETTY_FUNCTION__>        0x1b7e300       0x1d651d0       0x75151b74

More info: 1

Sometimes the symbol names aren’t printed in the stack memdump. You can do this trick to iterate the stack and print the symbols found there (take with a grain of salt!):

(gdb) set $i = 0
(gdb) p/a *((void**)($sp + 4*$i++))

[Press ENTER multiple times to repeat the command]

$46 = 0xb6f9fb17 <_dl_lookup_symbol_x+250>
$58 = 0xb40a9001 <g_log_writer_standard_streams+128>
$142 = 0xb40a877b <g_return_if_fail_warning+22>
$154 = 0xb65a93d5 <WebCore::MediaPlayerPrivateGStreamer::changePipelineState(GstState)+180>
$164 = 0xb65ab4e5 <WebCore::MediaPlayerPrivateGStreamer::playbackPosition() const+420>
...

Many times it’s just a matter of gdb not having loaded the unstripped version of the library. /proc/<PID>/smaps and info proc mappings can help to locate the library providing the missing symbol. Then we can load it by hand.

For instance, for this backtrace:

#0  0x740ad3fc in syscall () from /home/enrique/buildroot-wpe/output/staging/lib/libc.so.6 
#1  0x74375c44 in g_cond_wait () from /home/enrique/buildroot-wpe/output/staging/usr/lib/libglib-2.0.so.0 
#2  0x6cfd0d60 in ?? ()

In a shell, we examine smaps and find out that the unknown piece of code comes from libgstomx:

$ cat /proc/715/smaps
...
6cfc1000-6cff8000 r-xp 00000000 b3:02 785380     /usr/lib/gstreamer-1.0/libgstomx.so
...

Now we load the unstripped .so in gdb and we’re able to see the new symbol afterwards:

(gdb) add-symbol-file /home/enrique/buildroot-wpe/output/build/gst-omx-custom/omx/.libs/libgstomx.so 0x6cfc1000
(gdb) bt
#0  0x740ad3fc in syscall () from /home/enrique/buildroot-wpe/output/staging/lib/libc.so.6
#1  0x74375c44 in g_cond_wait () from /home/enrique/buildroot-wpe/output/staging/usr/lib/libglib-2.0.so.0
#2  0x6cfd0d60 in gst_omx_video_dec_loop (self=0x6e0c8130) at gstomxvideodec.c:1311
#3  0x6e0c8130 in ?? ()

Useful script to prepare the add-symbol-file:

cat /proc/715/smaps | grep '[.]so' | sed -e 's/-[0-9a-f]*//' | { while read ADDR _ _ _ _ LIB; do echo "add-symbol-file $LIB 0x$ADDR"; done; }

More info: 1

The “figuring out corrupt ARM stacktraces” post has some additional info about how to use addr2line to translate memory addresses to function names on systems with a hostile debugging environment.
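
As a sketch of that technique with the addresses from the example above (the offset is the absolute address minus the library's load address from smaps: 0x6cfd0d60 - 0x6cfc1000 = 0xfd60):

# Resolve the file-relative offset to a function name and source line
addr2line -C -f -e /home/enrique/buildroot-wpe/output/build/gst-omx-custom/omx/.libs/libgstomx.so 0xfd60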

Debugging a binary without debug symbols

There are times when there’s just no way to get debug symbols working, or where we’re simply debugging a release version of the software. In those cases, we must directly debug the assembly code. The gdb text user interface (TUI) can be used to examine the disassembled code and the CPU registers. It can be enabled with these commands:

layout asm
layout regs
set print asm-demangle

Some useful keybindings in this mode:

  • Arrows: scroll the disassembly window
  • CTRL+p/n: Navigate history (previously done with up/down arrows)
  • CTRL+b/f: Go backward/forward one character (previously left/right arrows)
  • CTRL+d: Delete character (previously “Del” key)
  • CTRL+a/e: Go to the start/end of the line

This screenshot shows how we can infer that an empty RefPtr is causing a crash in some WebKit code.

Wake up an unresponsive gdb on ARM

Sometimes, when you continue (‘c’) execution on ARM, there’s no way to stop it again unless a breakpoint is hit. But there’s a trick to retake control: just send a harmless signal to the process.

kill -SIGCONT 1234

Know which GStreamer thread id matches with each gdb thread

Sometimes you need to match threads in the GStreamer logs with threads in a running gdb session. The simplest way is to ask for the GThread of each gdb thread:

(gdb) set output-radix 16
(gdb) thread apply all call g_thread_self()

This will print a list of gdb threads and GThread*. We only need to find the one we’re looking for.

Generate a pipeline dump from gdb

If we have a pointer to the pipeline object, we can call the function that dumps the pipeline:

(gdb) call gst_debug_bin_to_dot_file_with_ts((GstBin*)0x15f0078, GST_DEBUG_GRAPH_SHOW_ALL, "debug")
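
Note that the dump is only written if the GST_DEBUG_DUMP_DOT_DIR environment variable points to a directory, and it must be set before the process starts; the resulting .dot file can then be rendered with Graphviz. A sketch (the directory is an example and the actual file name carries a timestamp prefix):

export GST_DEBUG_DUMP_DOT_DIR=/tmp/gst-dumps   # set before launching the target process
dot -Tpng /tmp/gst-dumps/<timestamp>-debug.dot -o pipeline.png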

by eocanha at April 20, 2021 06:00 AM

April 19, 2021

Samuel Iglesias

Low Resolution Z Buffer support on Turnip

Last year I worked on implementing in Turnip the support for a HW feature present in Qualcomm Adreno GPUs: the low-resolution Z buffer (aka LRZ). This is a HW feature already supported in Freedreno, which is the open-source OpenGL driver for these GPUs.

What is low-resolution Z buffer

The low-resolution Z buffer is very similar to a depth prepass: it helps the HW avoid executing the fragment shader on those fragments that will subsequently be discarded by the depth test (hidden surface removal). This feature comes with some limitations though, such as the fragment shader not being allowed to have side effects (writing to SSBOs, atomic operations, etc.), among others.

The interesting part of this feature is that it allows applications to submit vertices in any order (saving the CPU time that would otherwise be spent ordering them); the HW processes them in the binning pass, as explained below, detecting which ones are occluded, which increases performance in some specific use cases.

Tiled-rendering

To better understand how LRZ works, we need to talk a bit about tile-based rendering. This is a way of rendering based on subdividing the framebuffer into tiles and rendering each tile separately. The advantage of this design is that the amount of memory and bandwidth required is reduced compared to immediate-mode rendering systems that draw the entire frame at once. Tile-based rendering is very popular on embedded GPUs, including Qualcomm Adreno.

Going into more detail, the graphics pipeline is divided into three different passes, executed per tile of the frame.

Tiled-rendering architecture diagram
Tiled-rendering architecture diagram.
  • The binning pass. This pass processes the geometry of the scene and records in a table on which tiles each primitive will be rendered. By doing this, the HW only needs to render the primitives that affect a specific tile when it is processed.

  • The rendering pass. This pass gets the rasterized primitives and executes all the fragment related processes of the pipeline (fragment shader execution, depth pass, stencil pass, blending, etc). Once it finishes, the resolve pass starts.

  • The resolve pass. It first resolves the tile buffer (GMEM) if it is multisampled, and copies the final color and depth values for all tile pixels back to system memory. If it is the last tile of the framebuffer, it swaps buffers and starts the binning pass for the next frame.

Where is LRZ used then? Well, in both binning and rendering passes. In the binning pass, it is possible to store the depth value of each vertex of the geometries of the scene in a buffer as the HW has that data available. That is the depth buffer used internally for LRZ. It has lower resolution as too much detail is not needed, which helps to save bandwidth while transferring its contents to system memory.

Thanks to LRZ, the rendering pass is only executed on the fragments that are going to be visible at the end. However, there are some limitations, as mentioned before: if the fragment shader has side effects (writing to SSBOs, atomics, etc.), if blending is enabled, or if the fragment shader could modify the fragment’s depth… then LRZ cannot be used, as the results may be wrong.

However, LRZ brings a couple of things to the table that make it interesting. One is that applications don’t need to reorder their primitives before submission to be more efficient; that is done automatically by the HW with LRZ. Another is performance improvements in some use cases. For example, imagine a fragment shader that discards some fragments but has no other side effects. In that case, although we cannot do early depth testing, we can do early LRZ, as we know that some fragments won’t pass the depth test even if they are not discarded by the fragment shader.

Turnip implementation

Talking about the LRZ implementation, I took Freedreno’s code as a starting point to implement LRZ on Turnip. After some months of work, it finally landed in Mesa master.

Last week, more patches related to LRZ landed in Mesa master: the ones fixing LRZ interactions with VK_EXT_extended_dynamic_state, as with this extension the application can change some states in command buffer time that could affect LRZ state and, therefore, we need to track them accordingly.

I also implemented some further LRZ improvements that have now also landed (thanks Eric Anholt!), such as the support for the early-LRZ-late-depth test that I mentioned before, which could bring a performance improvement in some applications.

LRZ improvements
Left: original vulkan tutorial demo implementation. Right: same demo modified to discard fragments with red component lower than 0.5f.

For instance, I did some measurements in a vulkan-tutorial.com implementation of my own that I modified to discard a significant amount of fragments (see previous figure). This is one of the cases that early-LRZ-late-depth test helps to improve performance.

When running the modified demo with these patches, I found a performance improvement between 13-16%.
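
As a side note for anyone trying to reproduce this kind of comparison, Turnip exposes a debug flag to disable LRZ at run time, so the same demo can be run with and without it. A sketch, assuming the nolrz flag present in current Mesa and an illustrative binary name:

# Run the demo with LRZ disabled to compare frame times against the default run
TU_DEBUG=nolrz ./vulkan-demo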

Acknowledgments

All this LRZ work was my first big contribution to this open-source reverse-engineered driver! I don’t want to finish this post without publicly thanking Rob Clark for the original Freedreno implementation and his reviews of my work, as well as Jonathan Marek and Connor Abbott for their insightful reviews, advice and tips to make it work. Edited: Many thanks to Eric Anholt for his reviews of the last two patch series!

Happy hacking!

April 19, 2021 07:40 AM

April 16, 2021

Eric Meyer

Scripted Server Startup for MDN and WPT

A sizeable chunk of my work at Igalia so far involves editing and updating the Mozilla Developer Network (MDN), and a smaller chunk has me working on the Web Platform Tests (WPT).  In both cases, the content is stored in large public repositories (MDN, WPT) and contributors are encouraged to fork the repositories, clone them locally, and push updates via the fork as PRs (Pull Requests).  And while both repositories roll in localhost web server setups so you can preview your edits locally, each has its own.

As useful as these are, if you ignore the whole “auto-force a browser page reload every time the file is modified in any way whatsoever” thing that I’ve been trying very hard to keep from discouraging me from saving often, each has to be started in its own way, from within their respective repository directories, and it’s generally a lot more convenient to do so in a separate Terminal window.

I was getting tired of constantly opening a new Terminal window, cding into the correct place, remembering the exact invocation needed to launch the local server, and on and on, so I decided to make my life slightly easier with a few short scripts and aliases.  Maybe this will be useful to you as well.

First, I decided to keep things relatively simple.  Instead of writing a small program that would handle all server startups by parsing shell arguments and what have you, I wrote a couple of very similar shell scripts.  Here’s the script for launching MDN’s localhost:

#!/bin/bash
cd ~/repos/mdn/content/
yarn start

Then I added an alias to ~/.bashrc which employs a technique I swiped from this Stack Overflow answer.

alias mdn-server="open -a Terminal.app ~/bin/mdn-start.bsh"

Translated into English, that means “open the file ~/bin/mdn-start.bsh using the -application Terminal.app”.

Thus, when I type mdn-server in any command prompt, a new Terminal window will open and the shell script mdn-start.bsh will be run; the script switches into the needed directory and launches the localhost server using yarn, as per the MDN instructions.  What’s more, when I’m done working on MDN, I can switch to the window running the server, stop the server with ⌃C (control-C), and the Terminal window closes automatically.

I did something very similar for WPT, except in this case the alias reads:

alias wpt-server="open -a Terminal.app ~/bin/wpt-serve.bsh"

And the script to which it points reads:

#!/bin/bash
cd ~/repos/wpt/
./wpt serve

As I mentioned before, I chose to do it this way rather than writing a single alias (say, local-server) that would accept arguments (mdn, wpt, etc.) and fire off scripts accordingly, but that’s also an option and a viable one at that.
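
For the curious, a minimal sketch of that single-script approach could look something like the following (the script name is hypothetical and the repository paths simply mirror the ones above):

#!/bin/bash
# local-server.bsh — launch the requested localhost server
case "$1" in
  mdn) cd ~/repos/mdn/content/ && yarn start ;;
  wpt) cd ~/repos/wpt/ && ./wpt serve ;;
  *)   echo "usage: $0 mdn|wpt" >&2; exit 1 ;;
esac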

So that’s my little QoL (Quality of Life) upgrade to make working on MDN and WPT a little easier.  I hope it helps you in some way!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at April 16, 2021 01:55 PM

April 13, 2021

Eleni Maria Stea

Using EGL and the dma_buf kernel framework to associate two textures with the contents of the same buffer without copy taking place

It’s been a few weeks I’ve been experimenting with EGL/GLESv2 as part of my work for WebKit (Browsers) team of Igalia. One thing I wanted to familiarize with was using shared DMA buffers to avoid copying textures in graphics programs. I’ve been experimenting with the dma_buf API, which is a generic Linux kernel framework for …

by hikiko at April 13, 2021 01:12 PM

Enrique Ocaña

GStreamer WebKit debugging tricks using GDB (1/2)

I’ve been developing and debugging desktop and mobile applications on embedded devices over the last decade or so. The main part of this period I’ve been focused on the multimedia side of the WebKit ports using GStreamer, an area that is a mix of C (glib, GObject and GStreamer) and C++ (WebKit).

Over these years I’ve had to work on ARM embedded devices (mobile phones, set-top-boxes, Raspberry Pi using buildroot) where most of the environment aids and tools we take for granted on a regular x86 Linux desktop just aren’t available. In these situations you have to be imaginative and find your own way to get the work done and debug the issues you find along the way.

I’ve been writing down the most interesting tricks I’ve found in this journey and I’m sharing them with you in a series of 7 blog posts, one per week. Most of them aren’t mine, and the ones I learnt at the beginning of my career can even seem a bit naive, but I find them worth sharing anyway. I hope you find them as useful as I do.

Breakpoints with command

You can break on a place, run some command and continue execution. Useful to get logs:

break getenv
command
 # This disables scroll continue messages
 # and suppresses output
 silent
 set pagination off
 p (char*)$r0
continue
end

break grl-xml-factory.c:2720 if (data != 0)
command
 call grl_source_get_id(data->source)
 # $ is the last value in the history, the result of
 # the previous call
 call grl_media_set_source (send_item->media, $)
 call grl_media_serialize_extended (send_item->media, 
  GRL_MEDIA_SERIALIZE_FULL)
 continue
end

This idea can be combined with watchpoints to trace reference counting in GObjects and find out from which places the refcount is increased and decreased.
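
For example, a minimal sketch of such a refcount trace (the object address is hypothetical, taken from a previous print):

watch -location ((GObject *) 0x15f0078)->ref_count
command
 # print a short backtrace every time the refcount changes
 silent
 bt 5
 continue
end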

Force execution of an if branch

Just wait until the if chooses a branch and then jump to the other one:

6 if (i > 3) {
(gdb) next
7 printf("%d > 3\n", i);
(gdb) break 9
(gdb) jump 9
9 printf("%d <= 3\n", i);
(gdb) next
5 <= 3

Debug glib warnings

If you get a warning message like this:

W/GLib-GObject(18414): g_object_unref: assertion `G_IS_OBJECT (object)' failed

the functions involved are g_return_if_fail_warning() and g_log(), which it calls. It’s good to set a breakpoint on either of the two:

break g_log

Another method is to export G_DEBUG=fatal_criticals, which will convert all criticals into crashes, which will in turn stop execution in the debugger.

Debug GObjects

If you want to inspect the contents of a GObject that you have a reference to…

(gdb) print web_settings 
$1 = (WebKitWebSettings *) 0x7fffffffd020

you can dereference it…

(gdb) print *web_settings
$2 = {parent_instance = {g_type_instance = {g_class = 0x18}, ref_count = 0, qdata = 0x0}, priv = 0x0}

even if it’s an untyped gpointer…

(gdb) print user_data
(void *) 0x7fffffffd020
(gdb) print *((WebKitWebSettings *)(user_data))
{parent_instance = {g_type_instance = {g_class = 0x18}, ref_count = 0, qdata = 0x0}, priv = 0x0}

To find the type, you can use GType:

(gdb) call (char*)g_type_name( ((GTypeInstance*)0x70d1b038)->g_class->g_type )
$86 = 0x2d7e14 "GstOMXH264Dec-omxh264dec"

Instantiate C++ object from gdb

(gdb) call malloc(sizeof(std::string))
$1 = (void *) 0x91a6a0
(gdb) call ((std::string*)0x91a6a0)->basic_string()
(gdb) call ((std::string*)0x91a6a0)->assign("Hello, World")
$2 = (std::basic_string<char, std::char_traits<char>, std::allocator<char> > &) @0x91a6a0: {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x91a6f8 "Hello, World"}}
(gdb) call SomeFunctionThatTakesAConstStringRef(*(const std::string*)0x91a6a0)

See: 1 and 2

by eocanha at April 13, 2021 10:49 AM

April 11, 2021

Andy Wingo

guile's reader, in guile

Good evening! A brief(ish?) note today about some Guile nargery.

the arc of history

Like many language implementations that started life when you could turn on the radio and expect to hear Def Leppard, Guile has a bottom half and a top half. The bottom half is written in C and exposes a shared library and an executable, and the top half is written in the language itself (Scheme, in the case of Guile) and somehow loaded by the C code when the language implementation starts.

Since 2010 or so we have been working at replacing bits written in C with bits written in Scheme. Last week's missive was about replacing the implementation of dynamic-link from using the libltdl library to using Scheme on top of a low-level dlopen wrapper. I've written about rewriting eval in Scheme, and more recently about how the road to getting the performance of C implementations in Scheme has been sometimes long.

These rewrites have a quixotic aspect to them. I feel something in my gut about rightness and wrongness and I know at a base level that moving from C to Scheme is the right thing. Much of it is completely irrational and can be out of place in a lot of contexts -- like if you have a task to get done for a customer, you need to sit and think about minimal steps from here to the goal and the gut doesn't have much of a role to play in how you get there. But it's nice to have a project where you can do a thing in the way you'd like, and if it takes 10 years, that's fine.

But besides the ineffable motivations, there are concrete advantages to rewriting something in Scheme. I find Scheme code to be more maintainable, yes, and more secure relative to the common pitfalls of C, obviously. It decreases the amount of work I will have when one day I rewrite Guile's garbage collector. But also, Scheme code gets things that C can't have: tail calls, resumable delimited continuations, run-time instrumentation, and so on.

Taking delimited continuations as an example, five years ago or so I wrote a lightweight concurrency facility for Guile, modelled on Parallel Concurrent ML. It lets millions of fibers exist on a system. When a fiber would need to block on an I/O operation (read or write), it instead suspends its continuation and arranges to restart it when the operation becomes possible.

A lot had to change in Guile for this to become a reality. Firstly, delimited continuations themselves. Later, a complete rewrite of the top half of the ports facility in Scheme, to allow port operations to suspend and resume. Many of the barriers to resumable fibers were removed, but the Fibers manual still names quite a few.

Scheme read, in Scheme

Which brings us to today's note: I just rewrote Guile's reader in Scheme too! The reader is the bit that takes a stream of characters and parses it into S-expressions. It was in C, and now is in Scheme.

One of the primary motivators for this was to allow read to be suspendable. With this change, read-eval-print loops are now implementable on fibers.

Another motivation was to finally fix a bug in which Guile couldn't record source locations for some kinds of datums. It used to be that Guile would use a weak-key hash table to associate datums returned from read with source locations. But this only works for fresh values, not for immediate values like small integers or characters, nor does it work for globally unique non-immediates like keywords and symbols. So for these, we just wouldn't have any source locations.

A robust solution to that problem is to return annotated objects rather than using a side table. Since Scheme's macro expander is already set to work with annotated objects (syntax objects), a new read-syntax interface would do us a treat.

With read in C, this was hard to do. But with read in Scheme, it was no problem to implement. Adapting the expander to expect source locations inside syntax objects was a bit fiddly, though, and the resulting increase in source location information makes the output files bigger by a few percent -- due somewhat to the increased size of the .debug_lines DWARF data, but also due to serialized source locations for syntax objects in macros.

Speed-wise, switching to read in Scheme is a regression, currently. The old reader could parse around 15 or 16 megabytes per second when recording source locations on this laptop, or around 22 or 23 MB/s with source locations off. The new one parses more like 10.5 MB/s, or 13.5 MB/s with positions off, when in the old mode where it uses a weak-key side table to record source locations. The new read-syntax runs at around 12 MB/s. We'll be noodling at these in the coming months, but unlike when the original reader was written, at least now the reader is mainly used only at compile time. (It still has a role when reading s-expressions as data, so there is still a reason to make it fast.)

As is the case with eval, we still have a C version of the reader available for bootstrapping purposes, before the Scheme version is loaded. Happily, with this rewrite I was able to remove all of the cruft from the C reader related to non-default lexical syntax, which simplifies maintenance going forward.

An interesting aspect of attempting to make a bug-for-bug rewrite is that you find bugs and unexpected behavior. For example, it turns out that since the dawn of time, Guile always read #t and #f without requiring a terminating delimiter, so reading "(#t1)" would result in the list (#t 1). Weird, right? Weirder still, when the #true and #false aliases were added to the language, Guile decided to support them by default, but in an oddly backwards-compatible way... so "(#false1)" reads as (#f 1) but "(#falsa1)" reads as (#f alsa1). Quite a few more things like that.

All in all it would seem to be a successful rewrite, introducing no new behavior, even producing the same errors. However, this is not the case for backtraces, which can expose the guts of read in cases where that previously wouldn't happen because the C stack was opaque to Scheme. Probably we will simply need to add more sensible error handling around callers to read, as a backtrace isn't a good user-facing error anyway.

OK enough rambling for this evening. Happy hacking to all and to all a good night!

by Andy Wingo at April 11, 2021 07:51 PM

April 08, 2021

Andy Wingo

sign of the times

Hello all! There is a mounting backlog of things that landed in Guile recently and to avoid having to eat the whole plate in one bite, I'm going to try to send some shorter missives over the next weeks.

Today's is about a silly thing, dynamic-link. This interface is dlopen, but "portable". See, back in the day -- like, 1998 -- there were lots of kinds of systems and how to make and load a shared library portably was hard. You'd have people with AIX and Solaris and all kinds of weird compilers and linkers filing bugs on your project if you hard-coded a GNU toolchain invocation when creating loadable extensions, or hard-coded dlopen or similar to use them.

Libtool provided a solution to create portable loadable libraries, which involved installing .la files alongside the .so files. You could use libtool to link them to a library or an executable, or you could load them at run-time via the libtool-provided libltdl library.

But, the .la files were a second source of truth, and thus a source of bugs. If a .la file is present, so is an .so file, and you could always just use the .so file directly. For linking against an installed shared library on modern toolchains, the .la files are strictly redundant. Therefore, all GNU/Linux distributions just delete installed .la files -- Fedora, Debian, and even Guix do so.

Fast-forward to today: there has been a winnowing of platforms, and a widening of the GNU toolchain (in which I include LLVM as well as it has a mostly-compatible interface). The only remaining toolchain flavors are GNU and Windows, from the point of view of creating loadable shared libraries. Whether you use libtool or not to create shared libraries, the result can be loaded either way. And from the user side, dlopen is the universally supported interface, outside of Windows; even Mac OS fixed their implementation a few years back.

So in Guile we have been in an unstable equilibrium: creating shared libraries by including a probably-useless libtool into the toolchain, and loading them by using a probably-useless libtool-provided libltdl.

But the use of libltdl has not been without its costs. Because libltdl intends to abstract over different platforms, it encourages you to leave off the extension when loading a library, instead promising to try a platform-specific set such as .so, .dll, .dylib etc as appropriate. In practice the abstraction layer was under-maintained and we always had some problems on Mac OS, for example.

Worse, as ltdl would search through the path for candidates, it would only report the last error it saw from the underlying dlopen interface. It was almost always the case that if A and B were in the search path, and A/foo.so failed to load because of a missing dependency, the error you would get as a user would instead be "file not found", because ltdl swallowed the first error and kept trucking to try to load B/foo.so which didn't exist.

In summary, this is a case where the benefits of an abstraction layer decline over time. For a few years now, libltdl hasn't been paying for itself. Libtool is dead, for all intents and purposes (last release in 2015); best to make plans to migrate away, somehow.

In the case of the dlopen replacement, in Guile we ended up rewriting the functionality in Scheme. The underlying facility is now just plain dlopen, for which we shim a version of dlopen on Windows, inspired by the implementation in cygwin. There are still platform-specific library extensions, but that is handled easily on the Scheme layer.

Looking forward, I think it's probably time to replace Guile's use of libtool to create its libraries and executables. I loathe the fact that libtool puts shell scripts in the place of executables in build directories and stashes the actual executables elsewhere -- like, visceral revulsion. There is no need for that nowadays. Not sure what to replace it with, nor on what timeline.

And what about autotools? That, my friends, would be a whole nother blog post. Until then, & probably sooner, happy hacking!

by Andy Wingo at April 08, 2021 07:09 PM

Jacobo Aragunde

A recap of Chromium dialog accessibility enhancements

It’s been a bit more than a year since I started working on Chromium accessibility. Although I’ve worked on several kinds of issues, a lot of them had to do with UI dialogs of some kind. Browser interfaces present many kinds of dialogs and these problems affected all the desktop platforms. This post is a recap of all the kinds of things that have been fixed related with different uses of dialogs in Chromium!

JavaScript prompt dialogs

Several JavaScript features cause dialogs to be shown: things like alert(), prompt() or, in certain cases, the window.beforeunload event (if the handler is a function which returns a string, that string is presented in a dialog before unload happens). These were reported in #779501, #1164927 and #1166370. With Windows ATs, the actual content of these dialogs was not read when they were announced, thus defeating their purpose. This was fixed by removing some workarounds in the MessageBoxView class, a reusable component found inside these dialogs which used to implement alert-like behavior. That confused some ATs, which saw an alert inside a dialog; now it is the parent dialog that behaves like an alert if needed.

Once this was fixed, we took also care of some speech duplication caused by several items having the same accessible name, reported as #1185092.

Save password and credit card dialogs

But browser interfaces themselves also use dialogs to build (hopefully) helpful features for their users, like offering to save sensitive information such as a password or credit card details. Sadly, neither the save password nor the credit card dialog was announced (#1079320, #1119367, #1125118). A previous attempt to enforce announcing “save password” (which I talked about in a previous post) caused it to be announced twice when using the toolbar icon to open it (#1132318).

These dialogs are particular because they have two behaviors: they can be explicitly opened using a toolbar icon, in which case they get focus immediately and must use the standard dialog role, or can be opened when a password is submitted. In the latter case, they don’t get focus and so they need to be announced as alerts. We fixed the problem by making sure we use one or the other role depending on how they are triggered, which enforces the correct code paths to produce alert event only when necessary.

Another useful thing browsers offer to do is translating a page. The same problem also affected the translate page dialog (#1119734); it shares code and behavior with the aforementioned ones and was fixed at the same time.

Accounts and sync menu

Accounts and sync, the avatar icon in the toolbar, looks and behaves more similarly to a menu but it’s implemented with dialog (bubble) code. There were, unfortunately, some problems with it: it was not announced when opened (#1098304) and contents that were informative and not focusable were not announced (#1078580 and #1161166). We used a variety of techniques to fix the problem:

  • The bubble already used a menu-like container role; so we enforce it to behave completely like a menu, using all the corresponding roles and emitting menu open/close events.
  • Use the informative, inaccessible titles as accessible names for containers, to make ATs announce them by context when focus jumps inside the container. Enforce certain role names for this to work on Windows.
  • Make the top-most, inaccessible content (user name and email) part of the menu title for it to be announced by context when the menu/dialog is opened.

Extension-related dialogs

Extension management also involves a lot of dialogs (install, manage, remove, confirmation). We detected and fixed the issue #1123107: the “add extension” dialog was not properly announced on Linux.

System dialogs

Browsers also launch system dialogs (open, save, print, etc.). In issue #1042864, keystroke events were not being produced in these cases. This was quite a complex one! I wrote two posts dedicated just to this problem and its solution.

Restore after crash dialog

If your browser crashes, a dialog asking the user if they’d like to restore the tabs that were open before the crash is shown. This dialog was not being announced (#1023822) and could not be focused with the Alt+Shift+A and F6 hotkeys (#1042010). These were the first issues I worked on, and I wrote a specific post about them as well.

Site permission dialogs

Dialogs are also used in browsers to ask for the user’s explicit permission to access powerful features. We briefly mentioned how these were fixed in a previous post: we fixed their announcement (#1052675) and their ability to be focused with hotkeys (#1052676). In the process, I wrote the code required to emit alert events in the Linux accessibility backend, which was still missing at that point.

Additionally, I added more tests and cleaned up things a bit, refactoring code and removing some redundant events. It’s been quite a trip, building and testing on all the desktop platforms, taking into account platform-specific behavior and AT quirks… I still plan to keep working on this area, hardening automated testing if possible to prevent regressions, which should reduce the amount of effort in the long run.

Thanks for reading, and happy hacking!

by Jacobo Aragunde Pérez at April 08, 2021 04:00 PM

April 06, 2021

Eleni Maria Stea

Setting up to debug ANGLE with GDB (on Linux)

ANGLE is an EGL/GLES2 implementation on top of other graphics APIs. It is mainly used in systems that lack a native GLES2/GLES3 driver and in some browsers, for example Chromium. Recently, I’ve used it for some browser-related work in Igalia‘s WebKit team (more on that coming soon) and had to set it up …

by hikiko at April 06, 2021 06:53 PM

April 04, 2021

Manuel Rego

:focus-visible in WebKit - March 2021

Another month is gone, and we are back with another status update (see January and February ones).

This is about the work Igalia is doing on the implementation of :focus-visible in WebKit. This is part of the Open Prioritization campaign and is being sponsored by many people. Thank you!

The work in March slowed down, so this status update is smaller than the previous ones. The main focus has been on spec discussions, trying to reach agreement.

Implementation details

The initial patch is available in the latest Safari Technology Preview (STP) releases behind a runtime flag, but it had an annoying bug that caused the body element to match :focus-visible when you used the keyboard to move focus. The issue was fixed last month but hasn’t been included in an STP release yet (hopefully it’ll make it into release 124). Apart from that, some minor patches related to implementation details have landed too. But this was just a small part of the work during March.

In addition, I realized that :focus-visible appears in the Chromium and Firefox DevTools, so I took a look at how to make that happen in WebKit too. At that point I realized that :focus-within, which has been shipping for a long time, isn’t available in the WebKit Web Inspector yet, so I cooked up a simple patch to add it there. However, that hasn’t landed yet, because it needs some UI rework; otherwise the list of pseudo-classes is going to be too long and won’t look nice in the inspector. So the patch is waiting for some UI changes before it can be merged. Once that’s solved, adding :focus-within and :focus-visible to the Web Inspector is going to be pretty straightforward.

Spec discussions

This was the main part of the work during March, and the goal was to reach some agreement before finishing the implementation bits.

The main issue was how :focus-visible should work when a script moves focus. The behavior from the current implementations was not interoperable, the spec was not totally clear and, as explained on the previous report, in order to clarify this I created a set of new tests. These tests demonstrated some interesting incompatibilities. Based on this, we compared the results with the widely used polyfill as well. We found that there were various misalignments on tricky cases which generated significant discussions on which was correct, and why. After considerable discussion with people from Google and Mozilla, it looks like we have finally reached an agreement on the expectations.

Next was to see if we could clarify the text so that these cases couldn’t be interpreted in importantly incompatible ways. Following the advice of the CSS Working Group, I worked on a PR for the HTML spec trying to define when a browser should draw a focus indicator, and thus match :focus-visible. Some discussion was raised about which elements should always match :focus-visible and how to define that in normative text (as some elements like <select> draw a focus ring in some browsers and not others when clicked, and some elements like <input type="date"> allow keyboard input or not depending on the platform). The discussion is still ongoing, and we’re still trying to find the proper way to define this in the HTML spec. If we manage to do that, it would be a great step forward for the interoperability of :focus-visible implementations, and a big win for the people ultimately using this feature.

Apart from that I’ve also created a test for my proposal about how :focus-visible should work in combination with Shadow DOM, but I don’t think I want to open that can of worms until the other part is done.

Some numbers

Let’s take a look to the numbers again, though things have moved slowly this time.

  • 21 PRs merged in WPT (1 in March).
  • 17 patches landed in WebKit (3 in March).
  • 7 patches landed in Chromium.
  • 2 PRs merged in CSS specs (1 in March).
  • 1 PR merged in HTML spec.

Next steps

The main goal for April would be to close the spec discussions and land the PRs in HTML spec, so we can focus again in finishing the implementation in WebKit.

However, if reaching an agreement to make these changes to the HTML spec is not possible, we can probably still land some patches on the implementation side, adding some support for script focus in WebKit.

Stay tuned for more updates!

April 04, 2021 10:00 PM

March 31, 2021

Pablo Saavedra

Mini-PC Fulong 2.0 Inside

The history of the world is a continuous succession of contradictions. The announcement from MIPS Technologies about their decision to definitively abandon the MIPS architecture in favour of RISC-V is just another example. But the truth is that things are far from trivial in this topic. Even when the end-of-life date for the MIPS architecture looks closer than ever, there are still infrastructures and platforms that need to keep being supported and maintained for this architecture in the meantime. To make the situation more complex, as I am writing this post, Loongson Technology Ltd is announcing a new 16-core MIPS 12nm CPU for 256-core systems (Tom’s Hardware news). Loongson Technology also says that they keep a strong commitment to RISC-V for the future, but they will keep their bet on MIPS64 in the meantime. So if MIPS is going to die, it is going to be a lovely death.

In this context, here in Igalia we are hosting and maintaining the CI workers for the JavaScriptCore 32-bit (MIPS) infrastructure for the WebKit web browser engine.

No one ever said that finding end-user hardware for this kind of system is easy-peasy. The options on the market often don’t reach a sufficient level of maturity or come with a poor set of hardware specifications. The choices are often not intended for long, CPU-intensive tasks, or they simply lack good OS support (maintenance, updates, custom kernels, open-source drivers …).

Nowadays we are using a parallelized cluster of MIPSel CI 20 boards to run the JavaScriptCore 32-bit (MIPS) CI workers. Don’t get me wrong: the CI 20 boards are certainly not bad. These boards are really great for development and evaluation purposes, but even rare failures become commonplace when you run 30 of them 24/7 in parallel. For this reason, some time ago we started looking for an alternative that would eventually replace them. And this is when we found the following candidate.

The candidate

We had a look at what Debian was using for their QA infrastructure and talked to the MIPS team – credits to Berto García who helped us with this – and we concluded that the Loongson 3B4000 MIPSel board was a promising option so we decided to explore it.

We started looking for information about this CPU model and we found references to the Loongson 3A4000 + Fuloong 2.0 Mini-PC. This computer is a very interesting kind of end-user product based on the MIPS64el architecture. In particular, it uses a similar but more recent and powerful evolution of the Loongson 3B4000 processor. The Fuloong 2.0 comes in a barebone format with a Loongson-3A R4 (Loongson-3A4000) @ 1500MHz, a quad-core processor, 8GB of DDR4 RAM and 1TB of NVMe internal storage. These technical specifications are completed with a Realtek ALC662 sound card, 2x USB 3.0 ports + 1x USB Type-C + 4x USB 2.0, 2x HDMI video outputs, 2x Ethernet (WGI211AT), audio connectors, an M.2 slot for a WiFi module and, finally, a Vivante GL1000 GPU (OpenGL ES 2.0/1.1). These specifications are clearly far from the common constraints of regular MIPS development boards, and technically make it a serious candidate for replacing the boards currently used in the CI cluster.

However, acquiring this kind of product has some non-technical cons that are important to keep in mind before making any decision. For example, it is very difficult to find a reseller in Europe providing these machines. This means the computer needs to be shipped directly from China, which also means the acquisition process can suffer from the common problems of this kind of order: longer delivery times (~1 month), customs paperwork, taxes, delivery-tracking issues … Anyway, this post is intended to keep the focus on the technical details ;-). The fact is, once these issues are solved you will receive a machine similar to the one shown in the photos:

[Photos: the Mini-PC Fuloong 2.0 running a Linux distro, closed, and opened up to show the inside.]

The unboxing

The machine comes with a pre-installed custom distro (“Dragon Dream F28”, based on Fedora 28). This distro is quite old, but it is the one provided by the manufacturer (Lemote) and apparently the only one that, in theory, fully supports the machine. The installed image comes with a desktop environment on top of an X server. The distro is also synced with an RPM repository hosted by Lemote. This is really convenient to start experimenting with the computer and very useful for getting information about the system before taking any action on it. Here is the output of some commands:

# cat /proc/cpuinfo
system type : generic-loongson-machine
machine : loongson,generic
processor : 0
cpu model : Loongson-3 V0.4 FPU V0.1
model name : Loongson-3A R4 (Loongson-3A4000) @ 1500MHz
CPU MHz : 1500.00
BogoMIPS : 2990.15
wait instruction : yes
microsecond timers : yes
tlb_entries : 2112
extra interrupt vector : no
hardware watchpoint : no
isa : mips1 mips2 mips3 mips4 mips5 mips32r1 mips32r2 mips64r1 mips64r2
ASEs implemented : vz msa loongson-mmi loongson-cam loongson-ext loongson-ext2
shadow register sets : 1
kscratch registers : 6
package : 0
core : 0
... (x4)

dmesg:

Mar 9 12:43:19 fuloong-01 kernel: [ 2.884260] Console: switching to colour frame buffer device 240x67 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.915928] loongson-drm 0000:00:06.1: fb0: loongson-drmdrm frame buffer device 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.919792] etnaviv 0000:00:06.0: Device 14:7a15, irq 93 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.920249] etnaviv 0000:00:06.0: model: GC1000, revision: 5037 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.920378] [drm] Initialized etnaviv 1.3.0 20151214 for 0000:00:06.0 on minor 1

lsblk:

# lsblk
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 190M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1,7G 0 part /boot
├─nvme0n1p3 259:3 0 7,5G 0 part [SWAP]
├─nvme0n1p4 259:4 0 46,6G 0 part /
└─nvme0n1p5 259:5 0 421,1G 0 part /home

Getting Debian into the Fuloong 2.0

The WebKitGTK and WPE WebKit CI infrastructure is entirely based on Debian Stable and/or Ubuntu LTS, according to the WebKitGTK maintenance and development policy. For that reason we were pretty interested in getting the machine running with Debian Stable (“buster” as of this writing). So what comes next is a description of the installation process of a pure Debian base system hybridized with the Lemote Fedora Linux kernel, using an external USB storage stick as the bootable disk. The process is a mix between the following two documents:

Those documents provide a good, detailed explanation of the steps to follow to perform the installation. Only the installation of the kernel and GRUB2 EFI differs a bit, but let’s come back to that later. The idea is:

  • Set the EFI/BIOS to boot from the USB storage (EFI)
  • Install the base Debian OS on an external microSD card connected to the USB3-SS port
  • Keep using the internal NVMe disk as the working directory (/home, /var/lib/lxc)

The installation process is initiated from the pre-installed Fedora image. The first action is to mount the external USB storage (sda) in the live system as follows:

# lsblk
sda 8:0 1 14,9G 0 disk
├─sda1 8:1 1 200M 0 part /mnt/debinst/boot/efi
└─sda2 8:2 1 10G 0 part /mnt/debinst
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 190M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1,7G 0 part /boot
├─nvme0n1p3 259:3 0 7,5G 0 part [SWAP]
├─nvme0n1p4 259:4 0 46,6G 0 part /
└─nvme0n1p5 259:5 0 421,1G 0 part /home

As I said, the steps to install the Debian system onto the SD card are quite straightforward. The problems begin during the installation of GRUB and the Linux kernel …

The Linux Kernel

Following the guide, we reach the “Install a Kernel” step. Debian provides a Linux 4.19 kernel for the Loongson 3A/3B boards.

ii linux-image-4.19.0-14-loongson-3 4.19.171-2 mips64el Linux 4.19 for Loongson 3A/3B
ii linux-image-loongson-3 4.19+105+deb10u9 mips64el Linux for Loongson 3A/3B (meta-package)
ii linux-libc-dev:mips64el 4.19.171-2 mips64el Linux support headers for userspace development

It is quite old in comparison with the one the Lemote Fedora distro contains (5.4.63-20201012-def), so I preferred to keep the latter, although it should be possible to get the machine running with the Debian kernel as well.

Grub2 EFI, first attempt trying to build it for the device

This is the main issue that I found. The first thing that I tried was to look for a GRUB package with EFI support in the mips64el Debian chroot:

root@fuloong-01:/# apt search grub | grep efi
<<empty>>

The frustration came quickly when I didn’t find any GRUB candidate. It was then that I remembered there was a grub-yeeloong package in the Debian repository that could be useful in this case. The Yeeloong is the predecessor of the Loongson, so what I tried next was to rebuild the GRUB package, adding the mips64el architecture to the grub-yeeloong package. Something like the following:

  • Getting the Debian sources and dependencies for the grub2 packages:
    apt source grub2
    apt install debhelper patchutils python flex bison po-debconf help2man texinfo xfonts-unifont libfreetype6-dev gettext libdevmapper-dev libsdl1.2-dev xorriso parted libfuse-dev ttf-dejavu-core liblzma-dev wamerican pkg-config bash-completion build-essential
    
  • Patching the /debian/control file using this patch
  • … and then building the Debian package:
    ~/debs# cd grub2-2.02+dfsg1 && dpkg-buildpackage
    
    ~/debs/grub2-2.02+dfsg1# ls ../
    grub-common-dbgsym_2.02+dfsg1-20+deb10u3_mips64el.deb grub-yeeloong_2.02+dfsg1-20+deb10u3_mips64el.deb grub2_2.02+dfsg1-20+deb10u3.debian.tar.xz grub2_2.02+dfsg1.orig.tar.xz
    grub-common_2.02+dfsg1-20+deb10u3_mips64el.deb grub2-2.02+dfsg1 grub2_2.02+dfsg1-20+deb10u3.dsc
    grub-mount-udeb_2.02+dfsg1-20+deb10u3_mips64el.udeb grub2-common-dbgsym_2.02+dfsg1-20+deb10u3_mips64el.deb grub2_2.02+dfsg1-20+deb10u3_mips64el.buildinfo
    grub-yeeloong-bin_2.02+dfsg1-20+deb10u3_mips64el.deb grub2-common_2.02+dfsg1-20+deb10u3_mips64el.deb grub2_2.02+dfsg1-20+deb10u3_mips64el.changes
    

The .deb package builds correctly, but the problem is the resulting binary: it lacks EFI runtime support, so it is not useful in our case:

*******************************************************
GRUB2 will be compiled with following components:
Platform: mipsel-none <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
With devmapper support: Yes
With memory debugging: No
With disk cache statistics: No
With boot time statistics: No
efiemu runtime: No (only available on i386)
grub-mkfont: Yes
grub-mount: Yes
starfield theme: Yes
With DejaVuSans font from /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf
With libzfs support: No (need zfs library)
Build-time grub-mkfont: Yes
With unifont from /usr/share/fonts/X11/misc/unifont.pcf.gz
With liblzma from -llzma (support for XZ-compressed mips images)
With quiet boot: No
*******************************************************

This is what happens if you still try to install it:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# dpkg -i ../grub-yeeloong-bin_2.02+dfsg1-20+deb10u3_mips64el.deb ../grub-common_2.02+dfsg1-20+deb10u3_mips64el.deb ../grub2-common_2.02+dfsg1-20+deb10u3_mips64el.deb
root@fuloong-01:~/debs/grub2-2.02+dfsg1# grub-install /dev/sda
Installing for mipsel-loongson platform.
...
grub-install: warning: WARNING: no platform-specific install was performed. <<<<<<<<<<
Installation finished. No error reported.

There is no glue between EFI and GRUB. Files like BOOTMIPS.EFI, gcdmips64el.efi and grub.efi are missing, so this package is not useful at all:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/
System.map-4.19.0-14-loongson-3 config-4.19.0-14-loongson-3 efi grub grub.elf initrd.img-4.19.0-14-loongson-3 vmlinux-4.19.0-14-loongson-3
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/grub
fonts grubenv locale mipsel-loongson
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/efi/
<<empty>>
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/
System.map-4.19.0-14-loongson-3 config-4.19.0-14-loongson-3 efi grub grub.elf initrd.img-4.19.0-14-loongson-3 vmlinux-4.19.0-14-loongson-3
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/grub
grub/ grub.elf

The grub-install command will also confirm that the mips64el-efi target is not supported:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# /usr/sbin/grub-install --help
Usage: grub-install [OPTION...] [OPTION] [INSTALL_DEVICE]
Install GRUB on your drive.
...
--target=TARGET install GRUB for TARGET platform
[default=mipsel-loongson]; available targets:
arm-efi, arm-uboot, arm64-efi, i386-coreboot,
i386-efi, i386-ieee1275, i386-multiboot, i386-pc,
i386-qemu, i386-xen, i386-xen_pvh, ia64-efi,
mips-arc, mips-qemu_mips, mipsel-arc,
mipsel-loongson, mipsel-qemu_mips,
powerpc-ieee1275, sparc64-ieee1275, x86_64-efi,
x86_64-xen

Second attempt, the loongson-community Grub2 EFI

Now that we know we cannot use an official Debian package to install and configure GRUB, it is time for a bit of google-fu.

I must have a lot of practice, since it only took me a short while to find that the Lemote Fedora distro provides its own GRUB package for the Loongson and, later, to find new hope reading this article. The article explains how to build the GRUB from loongson-community with EFI support, so the next obvious logical step was to try to build it and check it:

    • git clone https://github.com/loongson-community/grub.git
      cd grub
      bash autogen.sh
      ./configure --prefix=/opt/alternative/
      make ; make install
    • The configure output looks promising:
      *******************************************************
      GRUB2 will be compiled with following components:
      Platform: mips64el-efi <<<<<<<<<<<<<<<<< Looks good.
      With devmapper support: Yes
      With memory debugging: No
      With disk cache statistics: No
      With boot time statistics: No
      efiemu runtime: No (not available on efi)
      grub-mkfont: No (need freetype2 library)
      grub-mount: Yes
      starfield theme: No (No build-time grub-mkfont)
      With libzfs support: No (need zfs library)
      Build-time grub-mkfont: No (need freetype2 library)
      Without unifont (no build-time grub-mkfont)
      With liblzma from -llzma (support for XZ-compressed mips images)
      *******************************************************

    … but unfortunately I hit more and more build errors at every step. Errors like these:

cc1: error: position-independent code requires ‘-mabicalls’
grub_script.yy.c:19:22: error: statement with no effect [-Werror=unused-value]
build-grub-module-verifier: error: unsupported relocation 0x51807.

… so after several attempts I finally gave up trying to build the loongson-community GRUB with EFI support. Here is the patch with some of the modifications I tried in the code, just in case you are better at solving these build errors than I am.

Third attempt, reusing the GRUB2 EFI resources from the pre-installed system

… and the last one.

The winning horse was the simplest solution: reusing the /boot and /boot/efi directories installed by the Fedora system as the base for a new Debian system:

    • Clone the tree in the destination dir:
      cp -a /boot /mnt/debinst/boot
    • Replace the UUIDs patch

    The /boot dir in the target installation will look like this:

    [root@fuloong-01 boot]# tree /mnt/debinst/boot/
    /mnt/debinst/boot/
    ├── boot -> .
    ├── config-5.4.60-1.fc28.lemote.mips64el
    ├── e8a27b4e4fcc4db9ab7a64bd81393773
    │   └── 5.4.60-1.fc28.lemote.mips64el
    │   ├── initrd
    │   └── linux
    ├── efi
    │   ├── boot
    │   │   ├── grub.cfg
    │   │   └── grub.efi
    │   ├── EFI
    │   │   ├── BOOT
    │   │   │   ├── BOOTMIPS.EFI
    │   │   │   ├── fonts
    │   │   │   │   └── unicode.pf2
    │   │   │   ├── gcdmips64el.efi
    │   │   │   ├── grub.cfg
    │   │   │   └── grubenv
    │   │   └── fedora
    │   ├── mach_kernel
    │   └── System
    │   └── Library
    │   └── CoreServices
    │   └── SystemVersion.plist
    ├── extlinux
    ├── grub2
    │   ├── grubenv -> ../efi/EFI/BOOT/grubenv
    │   └── themes
    │   └── system
    │   ├── background.png
    │   └── fireworks.png
    ├── grub.cfg
    ├── grub.efi
    ├── initramfs-5.4.60-1.fc28.lemote.mips64el.img
    ├── loader
    │   └── entries
    │   └── e8a27b4e4fcc4db9ab7a64bd81393773-5.4.60-1.fc28.lemote.mips64el.conf
    ├── lost+found
    ├── System.map-5.4.60-1.fc28.lemote.mips64el
    ├── vmlinuz-205
    └── vmlinuz-5.4.60-1.fc28.lemote.mips64el

… et voilà!

Finally we have a pure Debian Buster root base system hybridized with the Lemote Fedora Linux kernel:

root@fuloong-01:~# cat /etc/debian_version
10.8
root@fuloong-01:~# uname -a
Linux fuloong-01 5.4.60-1.fc28.lemote.mips64el #1 SMP PREEMPT Mon Aug 24 09:33:35 CST 2020 mips64 GNU/Linux
root@fuloong-01:~# cat /etc/apt/sources.list
deb http://httpredir.debian.org/debian buster main contrib non-free 
deb-src http://httpredir.debian.org/debian buster main contrib non-free 
deb http://security.debian.org/ buster/updates main contrib non-free 
deb http://httpredir.debian.org/debian/ buster-updates main contrib non-free
root@fuloong-01:~# apt update
Hit:1 http://httpredir.debian.org/debian buster InRelease
Get:2 http://security.debian.org buster/updates InRelease [65,4 kB]
Get:3 http://httpredir.debian.org/debian buster-updates InRelease [51,9 kB]
Get:4 http://security.debian.org buster/updates/main mips64el Packages [242 kB]
Get:5 http://security.debian.org buster/updates/main Translation-en [142 kB]
Fetched 501 kB in 1s (417 kB/s)                                
Reading package lists... Done
Building dependency tree       
Reading state information... Done
3 packages can be upgraded. Run 'apt list --upgradable' to see them.

With this hardware we can reasonably run native GDB directly on the machine, and we also have the possibility of running other tools on the host (e.g. any monitoring agent to gather stats). Definitely, having this hardware enabled for use in the CI infrastructure will be a promising step towards better QA for the project.
That is all from my side. I will probably continue dedicating some time to getting buildable packages of GRUB-EFI and the Linux kernel that we could use for this and similar machines (e.g. for tools like perf, which need the userspace binaries in sync with the kernel version). In the meantime, I really hope this can be useful to someone out there who is interested in this hardware. If you have a comment or question, or simply wish to share your thoughts about this, just leave a comment.

Stay safe!

by Pablo Saavedra at March 31, 2021 07:16 AM

March 25, 2021

Andy Wingo

here we go again

Around 18 months ago, Richard Stallman was forced to resign from the Free Software Foundation board of directors and as president. It could have been anything -- at that point he already had a history of behaving in a way that was particularly alienating to women -- but in the end it was his insinuation that it was somehow OK if his recently-deceased mentor Marvin Minsky, then in his 70s or 80s, had sex with a 17-year-old on Jeffrey Epstein's private island. A weird pick of hill to stake one's reputation on, to say the least.

At the time I was relieved that we would finally be getting some leadership renewal at the FSF, and hopeful that we could get some mission renewal as well. I was also looking forward to what the practical implications would be for the GNU project, as more people agreed that GNU was about software freedom and not about its founder.

But now we're back! Not only has RMS continued through this whole time to insist that he runs the GNU project -- something that is simply not the case, in my estimation -- but this week, a majority of a small self-selected group of people, essentially a subset of current and former members of the FSF board of directors and including RMS himself, elected to reinstate RMS to the board of the Free Software Foundation. Um... read the room, FSF voting members? What kind of message are you sending?

In this context I can only agree with the calls for the entire FSF board to resign. The board is clearly not fit for purpose, if it can make choices like this.

dissociation?

I haven't (yet?) signed the open letter because I would be in an inconsistent position if I did so. The letter enjoins people to "refuse to contribute to projects related to the FSF and RMS"; as a co-maintainer of GNU Guile, which has its origins in the heady 1990s of the FSF but has nothing to do any more with RMS, but whose copyrights are entirely held by the FSF, is hosted on FSF-run servers, and is even obliged (GPLv3 §5d, as referenced by LGPLv3) to print out Copyright (C) 1995-2021 Free Software Foundation, Inc. when it starts, I must admit that I contribute to a project that is "related to the FSF". But I don't see how Guile could continue this association, if the FSF board continues as it is. It's bad for contributors and for the future of the project.

It would be very tricky to disentangle Guile from the FSF -- consider hosting, for example -- so it's not the work of a day, but it's something to think about.

Of course I would rather that the FSF wouldn't allow itself to be seen as an essentially misogynist organization. So clean house, FSF!

on the nature of fire

Reflecting on how specifically we could have gotten here -- I don't know. I don't know the set of voting members at the FSF, what discussions were made, who voted what. But, having worked as a volunteer on GNU projects for almost two decades now, I have a guess. RMS and his closest supporters see themselves as guardians of the flame of free software -- a lost world of the late 70s MIT AI lab, reborn in a flurry of mid-80s hack, but since 25 years or so, slipping further and further away. These are dark times, in their view, and having the principled founder in a leadership role can only be a good thing.

(Of course, the environment in the AI lab was only good for some. The treatment of Margaret Hamilton as recounted in Levy's Hackers shows that not all were welcome. If this were just one story, I would discount it, but looking back, it does seem to be part of a pattern.)

But is that what the FSF is for today? If so, Guile should certainly leave. I'm not here for software as performative nostalgia -- I'm here to have fun with friends and start a fire. The FSF should look to do the same -- look at the world we are in, look where the energy is now, and engage in real conversations about success and failure and tactics. There is a world to win and doubling down on RMS won't get us there from here.

by Andy Wingo at March 25, 2021 12:22 PM

March 17, 2021

Samuel Iglesias

VK_KHR_depth_stencil_resolve support on Turnip

Last year I worked on Turnip driver development as my daily job at Igalia. One of those tasks was implementing support for the VK_KHR_depth_stencil_resolve extension.

VK_KHR_depth_stencil_resolve

This extension adds support for automatically resolving multisampled depth/stencil attachments in a subpass, in a similar manner as for color attachments. This extension, however, does not add support for resolving MSAA depth/stencil images with the vkCmdResolveImage() command.

As you can imagine, this extension is easy for any application to use. Unless you are using a driver that supports Vulkan 1.2 or higher (VK_KHR_depth_stencil_resolve was promoted to core in Vulkan 1.2), you should first check whether it is supported. Once that is done, ask the driver which features are supported by extending VkPhysicalDeviceProperties2 with a VkPhysicalDeviceDepthStencilResolveProperties structure when calling vkGetPhysicalDeviceProperties2().

struct VkPhysicalDeviceDepthStencilResolveProperties {
    VkStructureType       sType;
    void*                 pNext;
    VkResolveModeFlags    supportedDepthResolveModes;
    VkResolveModeFlags    supportedStencilResolveModes;
    VkBool32              independentResolveNone;
    VkBool32              independentResolve;
}

This structure will be filled by the driver to indicate the different depth/stencil resolve modes it supports (more info about their meaning and possible values in the spec).
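
As an illustration, a minimal query could look like this sketch (physicalDevice is assumed to be an existing VkPhysicalDevice; when using the extension rather than Vulkan 1.2 core, the equivalent _KHR-suffixed names apply):

/* Sketch: query the supported depth/stencil resolve modes. */
VkPhysicalDeviceDepthStencilResolveProperties resolveProps = {
    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DEPTH_STENCIL_RESOLVE_PROPERTIES,
};
VkPhysicalDeviceProperties2 props2 = {
    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
    .pNext = &resolveProps,
};
vkGetPhysicalDeviceProperties2(physicalDevice, &props2);

if (resolveProps.supportedDepthResolveModes & VK_RESOLVE_MODE_SAMPLE_ZERO_BIT) {
    /* The "use the value of sample 0" depth resolve mode is available. */
}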

Next step: just fill a VkSubpassDescriptionDepthStencilResolve struct with the desired resolve mode and the depth/stencil attachment used for the resolve. Then extend the VkSubpassDescription2 struct with it. And that’s all.

struct VkSubpassDescriptionDepthStencilResolve {
    VkStructureType                  sType;
    const void*                      pNext;
    VkResolveModeFlagBits            depthResolveMode;
    VkResolveModeFlagBits            stencilResolveMode;
    const VkAttachmentReference2*    pDepthStencilResolveAttachment;
}
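
To make that concrete, here is a hedged sketch of the subpass setup; colorAttachmentRef, depthStencilAttachmentRef and the single-sample dsResolveAttachmentRef are assumed to be VkAttachmentReference2 structures defined elsewhere, and VK_RESOLVE_MODE_SAMPLE_ZERO_BIT is just one possible mode reported by the query above:

/* Sketch: request an automatic depth/stencil resolve at the end of the subpass. */
VkSubpassDescriptionDepthStencilResolve dsResolve = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_DEPTH_STENCIL_RESOLVE,
    .depthResolveMode   = VK_RESOLVE_MODE_SAMPLE_ZERO_BIT,
    .stencilResolveMode = VK_RESOLVE_MODE_SAMPLE_ZERO_BIT,
    .pDepthStencilResolveAttachment = &dsResolveAttachmentRef,
};

VkSubpassDescription2 subpass = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_2,
    .pNext = &dsResolve, /* chain the resolve info into the subpass */
    .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
    .colorAttachmentCount = 1,
    .pColorAttachments = &colorAttachmentRef,
    .pDepthStencilAttachment = &depthStencilAttachmentRef,
};

With this in place, the multisampled depth/stencil attachment is resolved into the single-sample attachment automatically when the subpass ends, with no extra commands needed.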

Turnip implementation

Implementing this extension in Turnip was more or less straightforward, although there were some issues to fix.

Support was added for all depth/stencil formats in both resolve paths used by the driver, one for sysmem (system memory) and the other for gmem (tile buffer), including flushing the depth cache when needed. However, for the VK_FORMAT_D32_SFLOAT_S8_UINT format, which has the stencil part in a separate plane, specific code was needed to extend the 2D path used by the driver for sysmem resolves.

However, the main issue appeared when testing the extension implementation on the Adreno 630 GPU. It turns out that the VK-GL-CTS tests exercising this extension for MSAA VK_FORMAT_D24_UNORM_S8_UINT formats were always failing, except when disabling UBWC via the TU_DEBUG=noubwc environment variable. UBWC (Universal Bandwidth Compression) is a HW feature designed to improve throughput to system memory by minimizing the bandwidth of data (which also gives some power savings). The problem was that UBWC support for the MSAA VK_FORMAT_D24_UNORM_S8_UINT format is known to be failing on Adreno 630 and Adreno 618 (see the merge request for the freedreno driver). I just needed to disable it in Turnip to fix these failures.

I also found other VK_KHR_depth_stencil_resolve CTS tests failing: the ones testing format compatibility for the VK_FORMAT_D32_SFLOAT_S8_UINT and VK_FORMAT_D24_UNORM_S8_UINT formats. For the VK_FORMAT_D32_SFLOAT_S8_UINT failures, we needed to take into account the particularity that it has a separate plane for the stencil part when resolving it to VK_FORMAT_S8_UINT. For the VK_FORMAT_D24_UNORM_S8_UINT failures, the problem was that we were setting the resolve mode used by the HW incorrectly: it was doing a sample average when we wanted to use the value of sample 0. This merge request fixed both issues.

And that’s all. This was an extension that allowed me to dive into the different resolve paths used by Turnip and learn one or two things about the HW ;-). Thanks a lot to Jonathan Marek for his reviews and suggestions for improving the implementation of this extension.

Happy hacking!

March 17, 2021 08:14 AM

March 16, 2021

Iago Toral

Improving performance of the V3D compiler for OpenGL and Vulkan

Lately, I have been looking at improving performance of the V3DV Vulkan driver for the Raspberry Pi 4. So far we had been toying a lot with some Vulkan ports of the Quake trilogy but we wanted to have a look at more modern games as well, and for that we started to look at some Unreal Engine 4 samples, particularly the Shooter demo.


Unreal Engine 4 Shooter Demo

In our initial tests with “High” settings, even at 480p we were running the sample at 8-15 fps, with 720p being mostly in the 5-10 fps range. Not great, obviously, but a good opportunity to guide and focus our performance efforts.

One aspect of the UE4 sample that was immediately obvious compared to the Quake games is that the shading is a lot more expensive in general, and more specifically, it involves more texture lookups and UBO loads, which require expensive accesses to memory from the shader cores, so this was the first area we targeted. The good thing about this is that because our Vulkan and OpenGL drivers share the compiler stack, any optimizations we do here benefit both drivers.

What follows is a brief discussion of some of the optimizations we did recently to our backend compiler and the results we have observed from this work.

Optimizing the backend compiler

So the first thing we tackled was better managing the latency of texture lookups. Interestingly, the NIR scheduler was setting this up so that we would try to put instructions that consumed the result of a texture lookup as far away as possible from the instruction that triggered the lookup, but then the backend compiler was not fully taking advantage of this and would still end up waiting on lookup results sooner than it should.

Fixing this helped performance by 1%-3%, although it could go a bit above that in some cases. It has a caveat though: doing this extends the liveness of our lookup sequences, and that makes spilling more difficult (we can’t emit spills/unspills in the middle of an outstanding memory lookup), so when register pressure is high enough that we need to inject register spills to compile the shader, we would typically be a lot more constrained and end up producing significantly worse code, or even worse, failing to register allocate for the shader completely. To avoid this, we recompile the shader without the optimization if we detect that we need to do any spilling. One can use V3D_DEBUG=perf to detect if this is happening for any shaders, looking for messages like this:

Falling back to strategy 'disable TMU pipelining' for MESA_SHADER_FRAGMENT.

While the above optimization was useful, I was expecting it to make a larger impact for this particular demo, so I kept looking for ways to make our memory lookups more efficient. One thing that is relevant to this analysis is that we were using the same hardware unit for both texture and UBO lookups, but for the latter we could really use a different strategy by handling our UBOs as uniform streams. This has some caveats, but to make a long story short: given that many UBO loads use uniform addresses, that we usually read a bunch of consecutive scalars from a UBO load (such as a full vec4), and that applications usually emit UBO loads for nearby addresses together, we can emit fairly optimal code for many of these, leading to more efficient memory access in general.

Eventually, this turned out to be a big win, yielding 20%-30% improvements for the Shooter demo even with the initial basic implementation, which we would then tune and optimize further.

Again related to memory accesses, I have also been improving how we schedule the instructions involved in setting up memory lookups. Our scheduler was more restrictive here than it needed to be, and the extra flexibility can help reduce instruction counts, which will affect these more modern games most, as they are more likely to emit a larger number of texture/image operations in their shaders.

Another thing we did was to improve instruction packing. The V3D GPU is able to emit multiple instructions in the same cycle so long as the instructions meet some requirements. Turns out our assembly scheduler was being too restrictive with this and we could do better. Going by shader-db results, this led to ~5% less instructions on our programs and added another modest 1%-2% performance improvement for the Shooter demo.

Another issue we noticed is that a lot of the shaders were passing a lot of varyings from the vertex to the fragment shaders, and our setup for fragment shader inputs was not optimal. The issue here was that there is a specific instruction involved in this process that writes two registers, one of them with one instruction of delay, and our dependency tracking was not able to handle this properly, effectively assuming that both registers are written in the same instruction, which then had an impact on how we scheduled instructions. Fixing this required handling this case specially in our scheduler, so we could be more effective at scheduling these instructions in a way that enables optimal pipelining of the instructions involved with varying setups for fragment shaders.

Another aspect we improved was related to our uniform handling. Generally, there are many instances in which we need to emit duplicate uniforms. There are reasons for that related to how the GPU works, but in some cases, for example with consecutive UBO loads, we would emit the uniform with the base address of the UBO multiple times very close to each other. We can obviously do better by tracking previous uses of a uniform/constant value and reusing them in nearby instructions (see the sketch below). Of course, this comes at the expense of increasing register pressure (for reasons beyond the scope of this post our shaders typically require a lot of uniforms), so there is a balancing game to play here too. Reducing the size of our uniform streams this way also plays an important role in lowering some of the CPU overhead of the driver, since these streams need to be rebuilt often when certain pipeline states change.
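
Purely as an illustration of the idea (this is not the actual V3D compiler code; all the names below are made up), the reuse logic amounts to keeping a small window of recently emitted uniforms and returning the existing temporary when the same value shows up again:

#include <stdbool.h>
#include <stdint.h>

#define UNIFORM_WINDOW 4

struct recent_uniform {
        uint32_t contents; /* uniform/constant value or UBO base address */
        int temp;          /* temporary register holding it */
        bool valid;
};

/* Return a temp holding 'contents', reusing a recently emitted one when possible. */
static int
get_uniform_temp(struct recent_uniform *window, uint32_t contents,
                 int (*emit_uniform)(uint32_t))
{
        for (int i = 0; i < UNIFORM_WINDOW; i++) {
                if (window[i].valid && window[i].contents == contents)
                        return window[i].temp; /* reuse: no duplicate uniform emitted */
        }

        int temp = emit_uniform(contents);

        /* Simple FIFO eviction: shift entries and insert the new one first. */
        for (int i = UNIFORM_WINDOW - 1; i > 0; i--)
                window[i] = window[i - 1];
        window[0] = (struct recent_uniform){ contents, temp, true };
        return temp;
}

Keeping the window small is part of the balancing game mentioned above: the longer a value is kept around for reuse, the longer its register stays live.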

Finally, an optimization that was more targeted at reducing register pressure than at improving performance: we noticed that we would sometimes put some instructions far away from their consumers with no obvious benefit. This was bad enough at times that it would even cause us to be unable to compile some shaders. Mesa has a handy NIR pass for this called nir_opt_sink, which proved to be helpful here. It allowed us to get a few more shaders from shader-db to compile and to reduce spilling in a bunch of shaders. For the Shooter demo, this changed a large compute shader involved with histogram post-processing from 48 spills and 50 fills to only 8 spills and 15 fills. While the impact on performance is probably very small, since the game only runs this pass once per frame, it made a notable difference in loading time, since compiling a shader with this much spilling is very slow at present. I have a mental note to improve this some day; I know Intel managed to fix this for their compiler, but for the time being, this alone managed to make the loading time much more reasonable.

Results

First, here are some shader-db stats, which describe how these optimizations change various statistics for a large collections of real world shaders:

total instructions in shared programs: 14992758 -> 13731927 (-8.41%)
instructions in affected programs: 14003658 -> 12742827 (-9.00%)
helped: 80448
HURT: 4297

total threads in shared programs: 407932 -> 412242 (1.06%)
threads in affected programs: 4514 -> 8824 (95.48%)
helped: 2189
HURT: 34

total uniforms in shared programs: 4069524 -> 3790401 (-6.86%)
uniforms in affected programs: 2267834 -> 1988711 (-12.31%)
helped: 40588
HURT: 1186

total max-temps in shared programs: 2388462 -> 2322009 (-2.78%)
max-temps in affected programs: 897803 -> 831350 (-7.40%)
helped: 30598
HURT: 2464

total spills in shared programs: 6241 -> 5940 (-4.82%)
spills in affected programs: 3963 -> 3662 (-7.60%)
helped: 75
HURT: 24

total fills in shared programs: 14587 -> 13372 (-8.33%)
fills in affected programs: 11192 -> 9977 (-10.86%)
helped: 90
HURT: 14

total sfu-stalls in shared programs: 28106 -> 31489 (12.04%)
sfu-stalls in affected programs: 16039 -> 19422 (21.09%)
helped: 4382
HURT: 6429

total inst-and-stalls in shared programs: 15020864 -> 13763416 (-8.37%)
inst-and-stalls in affected programs: 14028723 -> 12771275 (-8.96%)
helped: 80396
HURT: 4305

Less is better for all stats, except threads. We can see significant improvements across the board: we generally produce shaders with fewer instructions and lower maximum register pressure, we reduce spills and uniform counts, and we can run with more threads. We are only worse at stalls, but that is generally because we now produce more compact code with fewer instructions, so more stalls are expected.

Another good thing in these stats is the large number of helped shaders compared to hurt shaders, meaning that it is very likely that these optimizations will help most applications to some extent.

But enough of boring compiler statistics; most people won’t care about that, and what they want to know is how this impacts performance on actual games and applications, which is what the following graph shows (these results were obtained by replaying specific traces with gfx-reconstruct). Keep in mind that while I am using a collection of Vulkan samples/games here, it is expected that these optimizations apply to OpenGL too.


Before vs After

Framerate improvement after optimization (in %)

As can be observed from the graph above, this optimization work made a significant impact on the observed framerate in all cases. It is not surprising that the UE4 demo is the one that sees the most improvement, considering it is the one we used to guide most of the optimization work.

Other optimizations and future work

In this post I have been focusing exclusively on compiler optimizations, but we have also been improving other parts of the Vulkan driver. While I won’t go into details to avoid making this post too long, we have also been improving aspects of the driver involved with buffer to image copies, depth buffer clears, dirty descriptor state management, usage of the TFU unit for transfer operations and more.

Finally, there is one other aspect of this UE4 demo that is pretty obvious as soon as you start a game: it can compile a lot of shaders in the middle of the gameplay loop, which can lead to significant stutter. There is not much we can do about this on the driver side, but adding support for an on-disk shader cache should eliminate the problem on sessions after the first, so this is something that we may work on in the future.

We will certainly continue to look at improving the driver performance in the future so stay tuned for further updates on our progress or maybe join us at #videocore on Freenode.

by Iago Toral at March 16, 2021 09:36 AM

Eric Meyer

First Month at Igalia

Today marks one month at Igalia.  It’s been a lot, and there’s more to come, but it’s been a really great experience.  I get to do things I really enjoy and value, and Igalia supports and encourages all of it without trying to steer me in specific directions.  I’ve been incredibly lucky to experience that kind of working environment twice in my life — and the other one was an outfit I helped create.

Here’s a summary of what I’ve been up to:

  • Generally got up to speed on what Igalia is working on (spoiler: a lot).
  • Redesigned parts of wpewebkit.org, fixed a few outstanding bugs, edited most of the rest. (The site runs on 11ty, so I’ve been learning that as well.)
  • Wrote a bunch of CSS tests/demos that will form the basis for other works, like articles and videos.
  • Drafted a few of said articles.  As I write this, two are very close to being complete, and a third is almost ready for editing.
  • Edited some pages on the Mozilla Developer Network (MDN), clarifying or upgrading text in some places and replacing unclear examples in others.
  • Joined the Open Web Docs Steering Committee.
  • Reviewed various specs and proposals (e.g., Miriam’s very interesting @scope proposal).

And that’s not all!  Here’s what I have planned for the next few months:

  • More contributions to MDN, much of it in the CSS space, but also branching out into documenting some up-and-coming APIs in areas that are fairly new to me.  (Details to come!)
  • Contributions to the Web Platform Tests (WPT), once I get familiar with how that process is structured.
  • Articles on topics that will include (but are not limited to!) gaps in CSS, logical properties, and styling based on writing direction.  I haven’t actually settled on outlets for those yet, so if you’d be interested in publishing any of them, hit me up.  I usually aim for about a thousand words, including example markup and CSS.
  • Very likely will rejoin the CSS Working Group after a (mumblecough)-year absence.
  • Assembling a Raspberry Pi system to test out WPEWebKit in its native, embedded environment and get a handle on how to create a “setting up WPEWebKit for total embedded-device noobs” guide, of which I am one.

That last one will be an entirely new area for me, as I’ve never really worked with an embedded-device browser before.  WPEWebKit is a WebKit port, actually the official WebKit port for embedded devices, and as such is aggressively tuned for performance and low resource demand.  I’m really looking forward to not only seeing what it’s like to use it, but also how I might be able to leverage it into some interesting projects.

WPEWebKit is one of the reasons why Igalia is such a big contributor to WebKit, helping drive its standards support forward and raise its interoperability with other browser engines.  There’s a thread of self-interest there: a better WebKit means a better WPEWebKit, which means more capable embedded devices for Igalia’s clients.  But after a month on the inside, I feel comfortable saying most of Igalia’s commitment to interoperability is philosophical in nature — they truly believe that more consistency and capability in web browsers benefits everyone.  As in, THIS IS FOR EVERYONE.

And to go along with that, more knowledge and awareness is seen as an unvarnished good, which is why they’re having me working on MDN content.  To that end, I’m putting out an invitation here and now: if you come across a page on MDN about CSS or HTML that confuses you, or seems inaccurate, or just doesn’t have much information at all, please get in touch to let me know, particularly if you are not a native English speaker.

I can’t offer translation services, unfortunately, but I can do my best to make the English content of MDN as clear as possible.  Sometimes, what makes sense to a native English speaker is obscure or unclear to others.  So while this offer is open to everyone, don’t hold back if you’re struggling to parse the English.  It’s more likely the English is unclear and imprecise, and I’d like to erase that barrier if I can.

The best way to submit a report is to send me email with [MDN] and the URL of the page you’re writing about in the subject line.  If you’re writing about a collection of pages, put the URLs into the email body rather than the subject line, but please keep the [MDN] in the subject so I can track it more easily.  You can also ping me on Twitter, though I’ll probably ask you to email me so I don’t lose track of the report.  Just FYI.

I feel like there was more, but this is getting long enough and anyway, it already seems like a lot.  I can’t wait to share more with you in the coming months!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at March 16, 2021 12:53 AM

March 09, 2021

Igalia Compilers Team

Igalia’s Compilers Team in 2020

In a previous blog post, we introduced the kind of work the Igalia compilers team does and gave a mid-year update on our 2020 progress.

Now that we have made our way into 2021, we wanted to recap our achievements from 2020 and update you on the exciting improvements we have been making to the web programming platform. Of course, we couldn’t have done this work alone; all of this was brought to you through our collaborations with our clients and upstream partners in the web ecosystem.

https://twitter.com/rkirsling/status/1276299298020290561

JavaScript class features

Our engineers at Igalia have continued to push forward on improvements to JS classes. Earlier in 2020, we had landed support for public field declarations in JavaScriptCore (JSC). In the latter half of 2020, we achieved major milestones such as getting private class fields into JSC (with optimizing compiler support!):

https://twitter.com/robpalmer2/status/1276378092349657095
https://twitter.com/caitp88/status/1318919341467979776

as well as static public and private fields.

We also helped ship private methods and accessors in V8 version 84. Our work on private methods also landed in JSC and we expect it to be available in future releases of Safari.

These additions will help JS developers create better abstractions by encapsulating state and behavior in their classes.

TC39 and Temporal

Our compilers team also contributed throughout 2020 to web standards through its participation in TC39 and related standards bodies.

One of the big areas we have been working on is the Temporal proposal, which aims to provide better date and time handling in JS. When we blogged about this in mid-2020, the proposal was still in Stage 2, but we expect it to reach Stage 3 soon in 2021. Igalians have been working hard on many aspects of the proposal since mid-2020, including managing community feedback, working on the polyfill, and maintaining the documentation.

For more info on Temporal, also check out a talk by one of engineers, Ujjwal Sharma, at Holy JS 2020 Piter.

Another area we have been contributing to for a number of years is the ECMA-402 Internationalization (Intl) standard, an important effort that provides i18n support for JS. We help maintain and edit the specification while also contributing tests and pushing Intl proposals forward. For example, we helped with the test suite of the Intl.Segmenter feature for implementing localized text segmentation, which recently shipped in Chrome. For a good overview of other recent Intl efforts, check out these slides from IUC44.

We’re also contributing to many other proposed features for JS, such as WeakRefs, Decimal (Daniel Ehrenberg from our team gave a talk on this at Node.TLV 2020), Import Assertions, Records & Tuples, Top-level await, and Module blocks & module bundling (Daniel also gave a talk on these topics at Holy JS 2020 Moscow).

Node.js

In addition to our contributions to the client side of the web, we are also contributing to server side use of web engines. In particular, we have continued to contribute to Node.js throughout 2020.

Some notable contributions include adding experimental support for per-context memory measurements in version 13 during early 2020.

Since late 2020, we have been working on improving Node.js startup speed by moving more of the bootstrap process into the startup snapshot. For more on this topic, you can watch a talk that one of our engineers, Joyee Cheung, presented at NodeConf Remote 2020 here (slides are available here).

JSC support on 32-bit platforms

Our group also continues to maintain support in JSC for 32-bit platforms. Earlier in 2020 we contributed improvements to JSC on 32-bit such as tail call optimizations, support for checkpoints, and others.

Since then we have been optimizing LLInt (the low-level interpreter for JSC) on 32-bit, and porting the support of inline caching for delete operations to 32-bit (to improve the performance of delete, you can read about the background on the original optimization from the Webkit blog here).

We also blogged about our efforts to support the for-of intrinsic on 32-bit to improve iteration on JS arrays.

WebAssembly

Finally, we have made a number of contributions to WebAssembly (Wasm), the new low-level compiler-target language for the web, on both the specification and implementation sides.

During 2020, we helped ship and standardize several Wasm features in web engines such as support for multiple-values, which can help compilers to Wasm produce better code, and support for BigInt/I64 conversion in the JS API, which lifts a restriction that made it harder to interact with Wasm programs from JS.

We’ve also improved support in tools such as LLVM for the reference types proposal, which adds new types to the language that can represent references to values from JS or other host languages. Eventually reference types will be key to supporting the garbage collection proposal (in which references are extended to new struct and array types), which will allow for easier compilation of languages that use GC to Wasm.

We’re also actively working on web engine support for exception handling, reference types, and other proposals while continuing to contribute to tools and specification work. We plan to help ship more WebAssembly features in browsers during 2021, so look forward to our mid-year update post!

by Compilers Team at March 09, 2021 09:35 AM

March 03, 2021

Samuel Iglesias

Igalia is hiring!

One of the best decisions I made in my life was joining Igalia in 2012. Inside Igalia, I have been working on different open-source projects, most of the time related to graphics technologies, interacting with different communities, giving talks, organizing conferences and, more importantly, contributing to free software as my daily job.

Now I’m thrilled to announce that we are hiring for our Graphics team!

Igalia

Right now we have two open positions:

  • Graphics Developer.

    We are looking for candidates that would like to contribute to open-source OpenGL/Vulkan graphics drivers (Mesa), or other areas of the open-source graphics stack such as X11 or Wayland, among others. If you have experience with them, or you are very motivated to become an expert there, just send us your CV!

  • Kernel Developer.

    We are looking for candidates that either have experience with kernel development or can ramp up quickly to contribute to Linux kernel drivers. Although no specific subsystem is mentioned in the job position, I encourage you to apply if you have DRM experience and/or ARM[64]/MIPS related knowledge.

Graphics technologies are not your cup of tea? We have positions in other areas like browsers, compilers, multimedia… Just check out our job offers on our website!

What we offer is to work in an open-source consultancy in which you can participate equally in the management and decision-making process of the company via our democratic, consensus-based assembly structure. As all of our positions are remote-friendly, we welcome submissions from any part of the world.

Are you still a student? We have launched the 2021 edition of our Coding Experience program. Check it out!

Igalia's office

March 03, 2021 01:21 PM

February 28, 2021

Manuel Rego

:focus-visible in WebKit - February 2021

One month has passed since the previous report so it’s time for a status update.

As you probably already know, Igalia is working on the implementation of :focus-visible in WebKit, a project that is being sponsored by many people and organizations through the Open Prioritization campaign. We’ve reached 84% of the goal, thank you all! 🎉 And if you haven’t contributed yet, you’re still in time to do so if you believe this work is important to you.

The main highlight for February is that initial work has started to land in WebKit, though some important bits are still under review.

Spec issues

There were some open discussions on the spec side regarding different topics, let’s review here the status of them:

  • :focus-visible on the <select> element: After some discussion on the CSS Working Group (CSSWG) issue, it was agreed to remove the <select> element from the tests; each browser can decide whether to match :focus-visible when it’s clicked, as there was no clear agreement regarding how to interpret whether it allows keyboard input or not.

    In any case, Chromium still has a bug on elements that have a popup (like a <select>). When you click them they match :focus-visible, but they don’t show the focus indicator (because they want to avoid showing two focus indicators, one on the <select> and another on the <option>). As it’s not showing a focus indicator, it actually shouldn’t match :focus-visible in that situation.

  • :focus-visible on script focus: The spec is not totally clear about when an element should match (or not) :focus-visible after script focus. There has been a nice proposal from Alice Boxhall and Brian Kardell on the CSSWG issue, but when we discussed this in the CSSWG it was decided that it was not the proper forum for these discussions. This is because the CSSWG has defined that :focus-visible should match when the browser shows a focus indicator to the user, but it doesn’t care when that indicator is actually shown or not. That definition is very clear, and even though each browser shows the focus indicator (or not) in different situations, the definition is still correct.

    Currently the list of heuristics in the spec is not normative; they’re just hints to browsers about when to match :focus-visible (when to show a focus indicator). But I believe it’d be really nice to have interoperability here, so the people using this feature won’t find weird problems here and there. The suggestion from the CSSWG was to discuss this on the HTML spec directly, proposing there a set of rules about when a browser should show a focus indicator to the user; those rules would be the current heuristics from the :focus-visible spec with some slight modifications to cover the corner cases that have been discussed. Hopefully we can reach an agreement between the different parties and manage to define this properly in the HTML spec, so all the implementations can be interoperable in this regard.

    I believe we need to dig deeper into the specific case of script focus, as I’m not totally sure how some scenarios (e.g. blur() before focus() and things like that) should work. For that reason I worked on a set of tests trying to clarify the different situations in which the browser should or should not show a focus indicator. These need to be discussed with more people to see if we can reach an agreement and prepare some spec text for HTML.

  • :focus-visible and Shadow DOM: Another topic already explained in the previous report. My proposal to the CSSWG was to avoid matching :focus-visible on the ShadowRoot when some element in the Shadow Tree has the focus, in order to avoid having two focus indicators.

    A concern has been raised that this would make it possible to guess whether an element is a ShadowRoot by focusing it via script and then checking whether it matches :focus-visible (but ShadowRoot elements shouldn’t be observable). However, that’s already possible in WebKit, which currently uses :-webkit-direct-focus in the default User Agent style sheet to avoid the double focus indicator in this case. In WebKit you can focus an element via script and check whether it has an outline to determine if it’s a ShadowRoot.

    Anyway, like in the previous case, this would be part of the heuristics, so according to the CSSWG’s suggestion it should be discussed on the HTML spec directly.

Default User Agent style sheet

Early this month I landed a patch to start using :focus-visible in the Chromium User Agent style sheet; this is included in version 90. 🚀 This means that from that version on you won’t see an outline when you click on a <div tabindex="0">, only when you focus it with the keyboard. Also, the :focus:not(:focus-visible) hack won’t be needed anymore (it has actually been removed from the spec too).

In addition, Firefox is also using :focus-visible on their User Agent style sheet since version 87.

More about tests

During this month there has still been some extra work on the tests. While I was implementing things in WebKit I noticed some minor issues in the tests, which have been fixed along the way.

I also found out some limitations of WebKit with regard to testdriver.js support for simulating keyboard inputs. Some of the :focus-visible tests use the method test_driver.send_keys() to send keys like Control or Enter. I added support for them on WebKit. Apart from that, I fixed how modifier keys are identified in WebKitGTK and WPE, as they were not following other browsers exactly (e.g. event.ctrlKey was not set on keydown event, only on keyup).

WebKit implementation

And the most important part, the actual WebKit implementation has been moving forward during this month. I managed to have a patch that passed all the tests, and split it a little bit in order to merge things upstream.

The first patch, which just does the parsing of the new pseudo-class and adds an experimental flag, has already landed.

Now a second patch is under review. It originally contained the whole implementation that passes all the tests, but due to the ongoing discussion on the script focus issues, that part has been removed. Anyway, the review is ongoing and hopefully it’ll land soon so you can start testing it in the WebKit nightlies.

Some numbers

Like in the previous post, let’s review again the numbers for what has been done in this project:

  • 20 PRs merged in WPT (7 in February).
  • 14 patches landed in WebKit (9 in February).
  • 7 patches landed in Chromium (3 in February).
  • 1 PR merged in Selectors spec (1 in February).
  • 1 PR merged in HTML spec (1 in February).

Next steps

First thing is to get the main patch landed in WebKit and verify that things are working as expected on the different platforms.

Another issue we need to solve is to reach an agreement on how script focus should work regarding :focus-visible, and then get that implemented in WebKit covering all the cases.

After that we could request to enable the feature by default in WebKit. Once that’s done, we could discuss the possibility to change the default User Agent style sheet to use :focus-visible too.

There is some interop work pending too. A few things are failing in Firefox and we could try to help fix them. There are also some weird issues with <select> elements in Chromium that might need some love. And depending on the ongoing spec discussions, there might be some changes needed in the different browsers. Anyway, we may or may not find the time to do this; let’s see how things evolve in the next weeks.

Big thanks to everyone that has contributed to this project, you’re all wonderful for letting us work on this. 🙏 Stay tuned for more updates in the future!

February 28, 2021 11:00 PM

February 22, 2021

Eric Meyer

First Week at Igalia

The first week on the job at Igalia was… it was good, y’all.  Upon formally joining the Support Team, I got myself oriented, built a series of tests-slash-demos that will be making their way into some forthcoming posts and videos, and forked a copy of the Mozilla Developer Network (MDN) so I can start making edits and pushing them to the public site.  In fact, the first of those edits landed Sunday night!  And there was the usual setting up of accounts and figuring out of internal processes and all that stuff.

A series of tests of the CSS logical property ‘border-block’.
Illustrating the uses of border-block.

To be perfectly honest, a lot of my first-week momentum was provided by the rest of the Support Team, and setting expectations during the interview process.  You see, at one point in the past I had a position like this, and I had problems meeting expectations.  This was partly due to my inexperience working in that sort of setting, but also partly due to a lack of clear communication about expectations.  Which I know because I thought I was doing well in meeting them, and then was told otherwise in evaluations.

So when I was first talking with the folks at Igalia, I shared that experience.  Even though I knew Igalia has a different approach to management and evaluation, I told them repeatedly, “If I take this job, I want you to point me in a direction.”  They’ve done exactly that, and it’s been great.  Special thanks to Brian Kardell in this regard.

I’m already looking forward to what we’re going to do with the demos I built and am still refining, and to making more MDN edits, including some upgrades to code examples.  And I’ll have more to say about MDN editing soon.  Stay tuned!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at February 22, 2021 09:32 PM

February 15, 2021

Eric Meyer

First Day at Igalia

Today is my first day as a full-time employee at Igalia, where I’ll be doing a whole lot of things I love to do: document and explain web standards at MDN and other places, participate in standards work at the W3C, take on some webmaster duties, and play a part in planning Igalia’s strategy with respect to advancing the web.  And likely other things!

I’ll be honest, this is a pretty big change for me.  I haven’t worked for anyone other than myself since 2003.  But the last time I did work for someone else, it was for Netscape (slash AOL slash Time Warner) as a Standards Evangelist, a role I very much enjoyed.  In many ways, I’m taking that role back up at Igalia, in a company whose values and structure are much more in line with my own.  I’m really looking forward to finding out what we can do together.

If the name Igalia doesn’t ring any bells, don’t worry: nobody outside the field has heard of them, and most people inside the field haven’t either.  So, remember when CSS Grid came to browsers back in 2017?  Igalia did the implementation that landed in Safari and Chromium.  They’ve done a lot of other things besides that — some of which I’ll be helping to spread the word about — but it’s the thing that web folks will be most likely to recognize.

This being my first day and all, I’m still deep in the setting up of logins and filling out of forms and general orienting of oneself to a new team and set of opportunities to make a positive difference, so there isn’t much more to say besides I’m stoked and planning to say more a little further down the road.  For now, onward!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at February 15, 2021 05:23 PM

February 13, 2021

Eleni Maria Stea

About VK_EXT_sample_locations

More than a year ago, I had worked on the implementation of VK_EXT_sample_locations extension for anv, the Intel Vulkan driver of mesa3D, as part of my work for Igalia. The implementation had been reviewed (see acknowledgments) at the time, but as the conformance tests that were available back then had to be improved and that … Continue reading About VK_EXT_sample_locations

by hikiko at February 13, 2021 09:47 PM

February 10, 2021

Danylo Piliaiev

Freedreno now supports OpenGL 3.3 on A6XX

I recently joined Igalia and as a way to familiarize myself with Adreno GPUs it was decided to get Freedreno up to OpenGL 3.3.

Until recently, Freedreno exposed only OpenGL 3.0. The big jump in version required only two small extensions and a few fixes to get rid of most crashes in Piglit, since almost all features were already supported.

ARB_blend_func_extended

I decided to start with dual-source-blending. Fittingly, it was the cause of the first issue I investigated in Mesa two and a half years ago =)

Turnip already had it implemented, so it was a good first task. Most of the time was spent getting familiar with Freedreno and squashing the hangs that happened due to my lack of experience with this driver. And shortly I had a working MR:

freedreno/a6xx: add support for dual-source blending (!7708)

However, Freedreno didn’t have Piglit tests running on CI and I missed the failure of the arb_blend_func_extended-fbo-extended-blend-pattern_gles2 test. I did test the _gles3 version though, so the issue with gles2 was not found immediately.

What’s different in the gles2 test? It uses gl_SecondaryFragColorEXT for the second color of dual-source blending. gl_FragColor and gl_SecondaryFragColorEXT are different from the other ways to specify color in that they broadcast the color to all enabled draw buffers. In Mesa they both translate to the slot FRAG_RESULT_COLOR, for which Freedreno didn’t expect a second blending source. The solution was simple: if a shader uses dual-source blending, remap FRAG_RESULT_COLOR to FRAG_RESULT_DATA0 + dual_src_index.

freedreno/ir3: remap FRAG_RESULT_COLOR to _DATA* for dual-src blending (!8245)

While looking at how other drivers handle FRAG_RESULT_COLOR I saw that Zink also has a similar issue but in a NIR lowering path, so I fixed it too since it was simple enough.

nir/lower_fragcolor: handle dual source blending (!8247)

ARB_shader_stencil_export

While implementing ARB_blend_func_extended I saw in Turnip that RB_FS_OUTPUT_CNTL0 could also enable stencil export:

tu_cs_emit_pkt4(cs, REG_A6XX_RB_FS_OUTPUT_CNTL0, 2);
tu_cs_emit(cs, COND(fs->writes_pos, A6XX_RB_FS_OUTPUT_CNTL0_FRAG_WRITES_Z) |
               COND(fs->writes_smask, A6XX_RB_FS_OUTPUT_CNTL0_FRAG_WRITES_SAMPMASK) |
               COND(fs->writes_stencilref, A6XX_RB_FS_OUTPUT_CNTL0_FRAG_WRITES_STENCILREF) |
               COND(dual_src_blend, A6XX_RB_FS_OUTPUT_CNTL0_DUAL_COLOR_IN_ENABLE));

Meanwhile, in Freedreno there was just a cryptic 0xfc000000:

OUT_PKT4(ring, REG_A6XX_SP_FS_OUTPUT_CNTL0, 1);
OUT_RING(ring, A6XX_SP_FS_OUTPUT_CNTL0_DEPTH_REGID(posz_regid) |
        A6XX_SP_FS_OUTPUT_CNTL0_SAMPMASK_REGID(smask_regid) |
        COND(fs_has_dual_src_color,
                A6XX_SP_FS_OUTPUT_CNTL0_DUAL_COLOR_IN_ENABLE) |
        0xfc000000);

I was quickly told that this is the shifted register r63.x, which has the value 0xfc and is used to signify an unused value.

Again, Turnip already supported stencil export and the compiler had it wired up; it’s hard to find a simpler extension to implement.

freedreno/a6xx: add support for ARB_shader_stencil_export (!7810)

Running tests for GL 3.1 - 3.3

After enabling ARB_blend_func_extended and ARB_shader_stencil_export, Freedreno had everything needed for a jump to GL 3.3. It was time to run tests.

KHR-GL31.texture_size_promotion.functional

The first observation was that the test passes with FD_MESA_DEBUG=nogmem. Next I compared traces with and without nogmem and saw a difference in a texture layout (I should have just printed the layouts with FD_MESA_DEBUG=layout). That didn’t give me an answer, so I checked whether I could bisect the failure. To my delight it was bisectable:

freedreno: reduce extra height alignment in a6xx layout (e4974852)

The change is only one line of code, and without context the first thing that comes to mind is that maybe the alignment is wrong:

         if (level == mip_levels - 1)
-            nblocksy = align(nblocksy, 32);
+            height = align(nblocksy, 4);

         uint32_t nblocksx =

But it’s even more trivial: nblocksy was erroneously changed to height, and height just wasn’t used afterwards, which hid the issue. Thus it was fixed in

freedreno/a6xx: Fix typo in height alignment calculation in a6xx layout (!7792)

gl-3.2-layered-rendering-clear-color-all-types 2d_multisample_array single_level

It was quickly found that clearing a framebuffer in Freedreno didn’t take into account that a framebuffer could have several layers, so only the first one was cleared.

Initially, I thought that we would need number_of_layers draw calls to clear all layers, but looking at the generic blitter in gallium suggested a better answer: an instanced draw where each instance draws a fullscreen quad into a distinct layer. This can be accomplished just by:

gl_Layer = gl_InstanceID;

in the vertex shader. However, this required the support of GL_AMD_vertex_shader_layer which Freedreno didn’t have…

Implementing GL_AMD_vertex_shader_layer

Fortunately, Turnip again had it implemented. As with stencil export, it was just a matter of finding the register of VARYING_SLOT_LAYER and plugging it into the respective control structure.

freedreno/a6xx: add support for gl_Layer in vertex shader

Simple enough? Yes! However, it failed the only tests available for the extension:

amd_vertex_shader_layer-layered-2d-texture-render    // Fails
amd_vertex_shader_layer-layered-depth-texture-render // Crashes

The crash when clearing a depth texture is a known one, but why does 2d-texture-render fail? Let’s look at the output:

Output of amd_vertex_shader_layer-layered-2d-texture-render (without noubwc)

The first layer and its mipmaps are cleared, but on other layers most mipmaps are not! Mipmaps are nowhere near gl_Layer, so it seems that the implementation is indeed correct; however, the single test we expected to pass fails. Which is bad, because we don’t have other tests to exercise gl_Layer.

Fixing amd_vertex_shader_layer-layered-2d-texture-render

A good starting point is always to check whether one of the FD_MESA_DEBUG options helps. In this case noubwc changed the result of the test to:

Output of amd_vertex_shader_layer-layered-2d-texture-render with noubwc

Not by a lot but still a lead. Something wrong with image layouts…

I searched for similar tests for Vulkan and found them in vktDrawShaderLayerTests.cpp, the tests passed on Turnip which told me that most likely the issue is not in the common code of layout calculations. After some hacking of the tests I made one close enough to amd_vertex_shader_layer-layered-2d-texture-render. However, directly comparing traces between GL and Vulkan isn’t productive. So I decided to see what changes with different mip levels in Freedreno and Turnip separately.

So I made 4 test cases:

  • GL with clearing level 1
  • GL with clearing level 2
  • VK drawing to level 1
  • VK drawing to level 2

With that, comparing “GL level 1” with “GL level 2” and “VK level 1” with “VK level 2” yielded just a few differences. For GL the only difference which could have been relevant was:

Level 1:

RB_MRT[0].ARRAY_PITCH: 8192

Level 2:

RB_MRT[0].ARRAY_PITCH: 4096

However, in VK this value didn’t change; there was no difference in the traces barring window/scissor/blit sizes!

RB_MRT[0].ARRAY_PITCH: 45056

In VK it was just a constant 45056 regardless of mip level. Hardcoding the value in Freedreno fixed the test. After that it was a matter of comparing how the field is calculated in Turnip and Freedreno, resulting in:

freedreno/a6xx: fix array pitch for layer-first layouts

The layered clear

With the above changes the layered clear was simple enough:

freedreno/a6xx: support layered framebuffers in blitter_clear

gl-3.2-minmax or bumping the maximum count of varyings

Before, Freedreno supported only 16 vec4 varyings, or 64 components, between shader stages. GL 3.2, on the other hand, mandates a minimum of 128 components. After a couple of trivial fixes I thought it would be easy, but Marek Olšák pointed out that I didn’t cover all cases with my tests. And of course one of them, the passing of varyings between VS and TCS, hung the GPU.

I ran a few tests changing only the varyings count and observed that the GPU hung starting from 17 vec4s. Comparing the traces I didn’t find anything suspicious in the shaders, so I decided to check a few states that depend on the varyings count. After poking a few of them I found that setting SP_HS_UNKNOWN_A831 above 64 resulted in a guaranteed hang.

In Turnip SP_HS_UNKNOWN_A831 had a special case for A650, which made me a bit suspicious. However, using A650’s formula for A630 didn’t change anything. It was time to see what the blob driver does, so I wrote a script to iterate from 1 to 32 vec4s, push the test to the device, run it, copy the trace back, extract the A831 value and append it to a CSV. Here is the result:

vec4 count A831 (Hex) A831 (Dec) PC_HS_INPUT_SIZE
1 0x10 16 0xc
2 0x14 20 0xf
3 0x18 24 0x12
4 0x1c 28 0x15
5 0x20 32 0x18
6 0x24 36 0x1b
7 0x28 40 0x1e
8 0x2c 44 0x21
9 0x30 48 0x24
10 0x34 52 0x27
11 0x38 56 0x2a
12 0x3c 60 0x2d
13 0x3f 63 0x30
14 0x40 64 0x33
15 0x3d 61 0x36
16 0x3d 61 0x39
17 0x40 64 0x3c
18 0x3f 63 0x3f
19 0x3e 62 0x42
20 0x3d 61 0x45
21 0x3f 63 0x48
22 0x3d 61 0x4b
23 0x40 64 0x4e
24 0x3d 61 0x51
25 0x3f 63 0x54
26 0x3c 60 0x57
27 0x3e 62 0x5a
28 0x40 64 0x5d
29 0x3c 60 0x60
30 0x3e 62 0x63
31 0x40 64 0x66

So A831 is indeed capped at 64 and after the first time A831 reaches 64, the relation ceases to be linear. Ugh…

Without having any good ideas I posted the results on Gitlab asking Jonathan Marek, who added a special case for A650 some time ago, for help. The previous A650 formula was:

prims_per_wave = wavesize // tcs_vertices_out
total_size = output_size * patch_control_points * prims_per_wave
unknown_a831 = DIV_ROUND_UP(total_size, wavesize)

After looking at the table above Jonathan Marek suggested the following formula:

prims_per_wave = wavesize // tcs_vertices_out
while True:
    total_size = output_size * patch_control_points * prims_per_wave
    unknown_a831 = DIV_ROUND_UP(total_size, wavesize)
    if unknown_a831 <= 0x40:
        break
    prims_per_wave -= 1

And it worked! What the new formula means is that there is A831 * wavesize of space available per wave for varyings; the more varyings we use in a shader, the fewer threads the wave will have. For TCS we use a wave size of 64, so the total size available would be

64 (wave size) * 64 (max A831) * 4 = 16k
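
If it helps to see it concretely, here is the same formula as a small runnable Python sketch; DIV_ROUND_UP is spelled out, and the parameter meanings are just my reading of the snippet above, not something verified against the driver:

def div_round_up(a, b):
    return (a + b - 1) // b

def unknown_a831(output_size, patch_control_points, tcs_vertices_out, wavesize=64):
    # Keep reducing the number of patches per wave until the per-wave
    # varying storage fits under the 0x40 (64) cap seen in the table above.
    prims_per_wave = wavesize // tcs_vertices_out
    while True:
        total_size = output_size * patch_control_points * prims_per_wave
        a831 = div_round_up(total_size, wavesize)
        if a831 <= 0x40:
            return a831
        prims_per_wave -= 1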

freedreno/a6xx: bump varyings limit (!7917)

Bumping OpenGL support to 3.3

There was still a considerable number of Piglit failures, but they were either already known or not important enough to prevent the version increase. Which was done in

freedreno: Enable GLSL 3.30, updating us to GL 3.3 contexts (!8270)

by Danylo Piliaiev at February 10, 2021 10:00 PM

February 09, 2021

Andrés Gómez

Replaying 3D traces with piglit


If you don’t know what traces-based rendering regression testing is, read the appendix before continuing.


The Mesa community has witnessed an explosion of interest in Continuous Integration in the last two years.

In addition to checking that the project builds properly, integrating tests of its functional correctness has become a priority. User-space graphics drivers are covered by a wide variety of tests and test suites. One of those kinds is traces-based rendering regression testing.

The public effort to add this kind of test to Mesa’s CI started with this mail from Alexandros Frantzis.

At some point, we had support for replaying OpenGL, Vulkan and D3D11 traces using apitrace, RenderDoc and GFXReconstruct with the in-tree tool tracie. However, it was a very custom solution tailored to the needs of Mesa, so I proposed moving this codebase and integrating it into the piglit test suite. It was a natural step forward.

This is how replayer was born into piglit.

replayer

The first step to test a trace is, actually, obtaining a trace. I won’t go into the details about how to create one from scratch. The process is well documented for each of the tools listed above. However, the Mesa community has been collecting publicly distributable traces for a while and placing them in traces-db, whose CI copies them to Freedesktop.org’s MinIO instance.

To make things simple, once we have built and installed piglit, if we would like to test an apitrace-created OpenGL trace, we can download it from there with:

$ replayer.py download \
 	 --download-url https://minio-packet.freedesktop.org/mesa-tracie-public/ \
 	 --db-path ./traces-db \
 	 --force-download \
 	 glxgears/glxgears-2.trace

The parameters are self-explanatory. The downloaded trace will now exist at ./traces-db/glxgears/glxgears-2.trace.

The next step will be to dump an image from the trace. Since it is a .trace file we will need to have apitrace installed in the system. If we do not specify the call(s) from which to dump the image(s), we will just get the last frame of the trace:

$ replayer.py dump ./traces-db/glxgears/glxgears-2.trace

The dumped PNG image will be at ./results/glxgears-2.trace-0000001413.png. Notice that the number suffix is the snapshot id from the trace.

Dumping from a trace may result in a range of different possible images. One example is when the trace makes use of uninitialized values, leading to undefined behaviors.

However, since the original aim was pre-merge rendering regression testing in Mesa’s CI, the idea is that replaying any of the provided traces should be quick and the dumped image should be consistent. In other words, if we dump the same frame of a trace several times with the same GFX stack, the image will always be the same.

With this precondition, we can test whether 2 different images are the same just by hashing their content. replayer can obtain the hash for the generated dumped image:

$ replayer.py checksum ./results/glxgears-2.trace-0000001413.png 
f8eba0fec6e3e0af9cb09844bc73bdc8
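
Under the hood this is just a digest of the dumped image, so renders from two builds can be compared cheaply without storing the reference picture itself. As a conceptual sketch in Python (whether replayer hashes the raw file bytes or the decoded pixel data is an implementation detail I’m glossing over here), it boils down to something like:

import hashlib
import sys

def image_checksum(path):
    # Hash the dumped image so renders from two builds can be compared cheaply.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

if __name__ == "__main__":
    print(image_checksum(sys.argv[1]))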

Now, if we build a different commit of Mesa, we can check the image generated at this new point against the previously generated reference image. If everything goes well, we will see something like:

$ replayer.py compare trace \
 	 --download-url https://minio-packet.freedesktop.org/mesa-tracie-public/ \
 	 --device-name gl-vmware-llvmpipe \
 	 --db-path ./traces-db \
 	 --keep-image \
 	 glxgears/glxgears-2.trace f8eba0fec6e3e0af9cb09844bc73bdc8
[dump_trace_images] Info: Dumping trace ./traces-db/glxgears/glxgears-2.trace...
[dump_trace_images] Running: apitrace dump --calls=frame ./traces-db/glxgears/glxgears-2.trace
// process.name = "/usr/bin/glxgears"
1384 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)

1413 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)

error: drawable failed to resize: expected 1515x843, got 300x300
[dump_trace_images] Running: eglretrace --headless --snapshot=1413 --snapshot-prefix=./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace- ./blog-traces-db/glxgears/glxgears-2.trace
Wrote ./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace-0000001413.png

OK
[check_image]
    actual: f8eba0fec6e3e0af9cb09844bc73bdc8
  expected: f8eba0fec6e3e0af9cb09844bc73bdc8
[check_image] Images match for:
  glxgears/glxgears-2.trace

PIGLIT: {"images": [{"image_desc": "glxgears/glxgears-2.trace", "image_ref": "f8eba0fec6e3e0af9cb09844bc73bdc8.png", "image_render": "./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace-0000001413-f8eba0fec6e3e0af9cb09844bc73bdc8.png"}], "result": "pass"}

replayer’s compare subcommand is the one emitting the piglit-formatted test expectations output.

Putting everything together

We can make the whole process much simpler by passing replayer a YAML test list file. For example:

$ cat testing-traces.yml
traces-db:
  download-url: https://minio-packet.freedesktop.org/mesa-tracie-public/

traces:
  - path: gputest/triangle.trace
    expectations:
      - device: gl-vmware-llvmpipe
        checksum: c8848dec77ee0c55292417f54c0a1a49
  - path: glxgears/glxgears-2.trace
    expectations:
      - device: gl-vmware-llvmpipe
        checksum: f53ac20e17da91c0359c31f2fa3f401e
$ replayer.py compare yaml \
 	 --device-name gl-vmware-llvmpipe \
 	 --yaml-file testing-traces.yml 
[check_image] Downloading file gputest/triangle.trace took 5s.
[dump_trace_images] Info: Dumping trace ./replayer-db/gputest/triangle.trace...
[dump_trace_images] Running: apitrace dump --calls=frame ./replayer-db/gputest/triangle.trace
// process.name = "/home/anholt/GpuTest_Linux_x64_0.7.0/GpuTest"
397 glXSwapBuffers(dpy = 0x7f0ad0005a90, drawable = 56623106)

510 glXSwapBuffers(dpy = 0x7f0ad0005a90, drawable = 56623106)


/home/anholt/GpuTest_Linux_x64_0.7.0/GpuTest
[dump_trace_images] Running: eglretrace --headless --snapshot=510 --snapshot-prefix=./results/trace/gl-vmware-llvmpipe/gputest/triangle.trace- ./replayer-db/gputest/triangle.trace
Wrote ./results/trace/gl-vmware-llvmpipe/gputest/triangle.trace-0000000510.png

OK
[check_image]
    actual: c8848dec77ee0c55292417f54c0a1a49
  expected: c8848dec77ee0c55292417f54c0a1a49
[check_image] Images match for:
  gputest/triangle.trace

[check_image] Downloading file glxgears/glxgears-2.trace took 5s.
[dump_trace_images] Info: Dumping trace ./replayer-db/glxgears/glxgears-2.trace...
[dump_trace_images] Running: apitrace dump --calls=frame ./replayer-db/glxgears/glxgears-2.trace
// process.name = "/usr/bin/glxgears"
1384 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)

1413 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)


/usr/bin/glxgears
error: drawable failed to resize: expected 1515x843, got 300x300
[dump_trace_images] Running: eglretrace --headless --snapshot=1413 --snapshot-prefix=./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace- ./replayer-db/glxgears/glxgears-2.trace
Wrote ./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace-0000001413.png

OK
[check_image]
    actual: f8eba0fec6e3e0af9cb09844bc73bdc8
  expected: f8eba0fec6e3e0af9cb09844bc73bdc8
[check_image] Images match for:
  glxgears/glxgears-2.trace

replayer also features the query subcommand, which is just a helper to read the YAML files with the test configuration.

Testing the other kinds of supported 3D traces doesn’t change much from what’s shown here. Just make sure to have the needed tools installed: RenderDoc, GFXReconstruct, the VK_LAYER_LUNARG_screenshot layer, Wine and DXVK. A good reference for building, installing and configuring these tools is Mesa’s GL and VK test container building scripts.

replayer also accepts several configuration options to tweak how it behaves and where to find the actual tracing tools needed for replaying the different types of traces. Make sure to check the replay section in piglit’s configuration example file.

replayer’s README.md file is also a good read for further information.

piglit

replayer is a test runner, in a similar fashion to shader_runner or glslparsertest. What remains is how it integrates so we can do piglit runs which produce piglit-formatted results.

This is done through the replay test profile.

This profile needs a couple of configuration values. The easiest approach is to set the PIGLIT_REPLAY_DESCRIPTION_FILE and PIGLIT_REPLAY_DEVICE_NAME env variables. They are self-explanatory, but make sure to check the documentation for this and other configuration options for this profile.

The following example features a run similar to the one done above by invoking replayer directly, but with piglit integration, providing formatted results:

$ PIGLIT_REPLAY_DESCRIPTION_FILE=testing-traces.yml PIGLIT_REPLAY_DEVICE_NAME=gl-vmware-llvmpipe piglit run replay -n replay-example replay-results
[2/2] pass: 2   
Thank you for running Piglit!
Results have been written to replay-results

We can create some summary based on the results:

# piglit summary console replay-results/
trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace: pass
trace/gl-vmware-llvmpipe/gputest/triangle.trace: pass
summary:
       name: replay-example
       ----  --------------
       pass:              2
       fail:              0
      crash:              0
       skip:              0
    timeout:              0
       warn:              0
 incomplete:              0
 dmesg-warn:              0
 dmesg-fail:              0
    changes:              0
      fixes:              0
regressions:              0
      total:              2
       time:       00:00:00

Creating an HTML summary may also be interesting, especially when finding failures!

Wishlist

  • Through different backends, replayer supports running apitrace, RenderDoc and GFXReconstruct traces. We may want to support other tracing tools in the future. The dummy backend used for functional testing is a good starting point when writing a new backend.
  • The solution chosen for checking whether we detect a rendering regression is dependent on having consistent results, as said before. It’d be great if we could add a secondary testing method for whenever the expected rendered image is variable. Off the top of my head, using exclusion masks could be a quick single-run solution when we know which specific areas in a rendered scenario are the ones fluctuating (a rough sketch of this idea follows after this list). For more complex variations, a multi-run based solution seems to be the best option. EzBench has a great statistical approach for this!
  • The current syntax of the YAML test list files implies running the compare subcommand with the default behavior of checking against the last frame of the tested trace. This means figuring out which call number is the one of the last frame first. It would be great to support providing the call numbers directly in the YAML files to be able to test more than just the last frame and, additionally, cut down the time taken to run the test.
  • The HTML generated summary allows us to see the reference and generated images side by side during a test run when it fails. It’d be great to also have some easy way of checking their differences. Using Rembrandt.js could be a possible solution.
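
To illustrate the exclusion mask idea from the wishlist above, here is a minimal Python sketch, assuming numpy and Pillow and a mask image where white marks the fluctuating areas; nothing like this exists in replayer today, it is just a way of picturing the approach:

import numpy as np
from PIL import Image

def masked_equal(rendered_png, reference_png, mask_png):
    # Compare two renders while ignoring the pixels marked as unstable.
    rendered = np.asarray(Image.open(rendered_png).convert("RGB"))
    reference = np.asarray(Image.open(reference_png).convert("RGB"))
    # White pixels in the mask mark the areas excluded from the comparison.
    excluded = np.asarray(Image.open(mask_png).convert("L")) > 127
    return np.array_equal(rendered[~excluded], reference[~excluded])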

Thanks a lot to the whole Mesa community for helping with the creation of this tool. Alexandros Frantzis, Rohan Garg and Tomeu Vizoso did a lot of the initial development for the in-tree tracie tool while Dylan Baker was very patient while reviewing my patches for the piglit integration.

Finally, thanks to Igalia for allowing me to work on this.


Appendix

In 3D computer graphics we say “traces”, for short, to name the files generated by 3D API capturing tools, which store not only the calls to the specific 3D API but also the internal state of the 3D program during the capturing process: shaders, textures, buffers, etc.

Being able to “record” the execution of a 3D program is very useful. Usually, it allows us to replay the execution without needing the original program from which we generated the trace; it also allows in-depth analysis for debugging and performance optimization; it’s a very good solution for sharing with other developers; and, in some cases, it allows us to check how the replay behaves with different GPUs.

In this post, however, I focus on a specific usage: rendering regression testing.

When doing a regression test what we would do is compare a specific metric obtained by replaying the trace with a specific version of the GFX software stack against the same metric obtained from a different version of the GFX stack. If the value of the metric changes we have found a regression (or an improvement!).

To make things simpler, we would like to check changes happening just in one of the many elements of the software stack. The most relevant component is the user space driver. In particular, I care about the Mesa drivers and the GNU/Linux stack.

Mainly, there are two kinds of regression testing we can do with a trace: performance or rendering regression testing. When doing a performance one, the checked metric(s) are usually in terms of speed or memory usage. In the case of the rendering ones, what we do is compare the rendered output at one (or many) points during the trace replay. This output, a bitmap image, is the metric that we compare between two different points of the Mesa driver. If the images differ, we may have found a regression (artifacts, improper colors, etc.), or an enhancement, if the reference image is the one featuring any of these problems.

by tanty at February 09, 2021 08:47 AM

February 03, 2021

Eleni Maria Stea

Build WebKit/WPE on Linux/X11

There’s a lot of documentation online about building Webkit/WPE on Linux. But as most instructions are targeting embedded platform developers, the focus is on building Webkit with Wayland using the flatpak-sdk to automate and speed up the building process. As the steps I’ve followed to build it on my X11 system and run the Webkit/WPE … Continue reading Build WebKit/WPE on Linux/X11

by hikiko at February 03, 2021 09:41 PM

Alejandro Piñeiro

v3dv status update 2021-02-03

So some months have passed since our last update, when we announced that v3dv became Vulkan 1.0 conformant. The main reason for not publishing more posts is that we saw the 1.0 checkpoint as a good moment to hold off on adding new big features, and focus on improving the codebase (refactoring, clean-ups, etc.) and the already existing features. For the latter we did a lot of work on performance. That alone would deserve a specific blog post, so in this one I will summarize the other stuff we did.

New features

Even if we didn’t focus on adding new features, we were still able to add some:

  • The following optional 1.0 features were enabled: logicOp, alphaToOne, independentBlend, drawIndirectFirstInstance, and shaderStorageImageExtendedFormats.
  • Added support for timestamp queries.
  • Added implementations for the VK_KHR_maintenance1, VK_EXT_private_data, and VK_KHR_display extensions.
  • Added support for Wayland WSI.

Here I would like to highlight that we started to get feature contributions from outside the initial core of developers that created the driver. VK_KHR_display was submitted by Steven Houston, and Wayland WSI support was submitted by Ella-0. Thanks a lot for it, really appreciated! We hope that this will begin a trend of having more things implemented by the rpi/mesa community as a whole.

Bugfixing and vulkan tools

Even after the driver became conformant, we kept testing it with several demos and applications, and provided fixes. As an example, we got Sascha Willems’ oit (Order Independent Transparency) demo working:

Sascha Willems’ oit demo on the rpi4

Among the applications that we were testing, we can highlight renderdoc and gfxreconstruct. The former is a frame-capture based graphics debugger and the latter is a tool that allows capturing and replaying several frames. Both tools are heavily used when debugging and testing Vulkan applications. We tested that they work on the rpi4 (fixing some bugs while doing it), and also started to use them to help guide the performance work we are doing.

Fosdem 2021

If you are interested in an overview of the development of the driver during the last year, we are going to present “Overview of the Open Source Vulkan Driver for Raspberry Pi 4” at FOSDEM this weekend (presentation details here).

Previous updates

Just in case you missed any of the updates on the Vulkan driver so far:

Vulkan raspberry pi first triangle
Vulkan update now with added source code
v3dv status update 2020-07-01
V3DV Vulkan driver update: VkQuake1-3 now working
v3dv status update 2020-07-31
v3dv status update 2020-09-07
Vulkan update: we’re conformant!

by infapi00 at February 03, 2021 11:13 AM

January 27, 2021

Manuel Rego

:focus-visible in WebKit - January 2021

Let’s do a small introduction as this is a kind of special post.

As you might already know, last summer Igalia launched the Open Prioritization experiment, and :focus-visible in WebKit was the winner according to the pledges that the different projects got. Now it has moved into the funds collection stage; so far we’ve reached 80% of the goal and Igalia has already started to work on this. If you are interested and want to help sponsor this work, please visit the project page at Open Collective.

In our regular client projects in Igalia, we provide periodic progress reports about the status of tasks and next plans. This blog post is a kind of monthly report, but this time the project has many customers, so it looks like this is a better format to share information about the status of things. Thank you all for supporting us in this development! 🙏

Understanding :focus-visible

Disclaimer: This is not a blog post explaining how :focus-visible works or the implications it has; you can read other articles if you’re looking for that.

First things first, my initial thought was that :focus-visible was a pseudo-class which would match an element when the browser natively shows a focus indicator (focus ring, outline) when an element of a page is focused. And that’s more or less what the spec says in its first sentence:

The :focus-visible pseudo-class applies while an element matches the :focus pseudo-class and the user agent determines via heuristics that the focus should be made evident on the element.

The key part here is that the native behavior deliberately doesn’t show the focus indicator in some situations when the :focus pseudo-class matches, mainly because usability studies indicate that showing it in all cases is not what the user expects and wants. Before :focus-visible, web authors had no way to apply the same criteria to style the focus indicator only when it is going to be shown natively, while still keeping the website accessible.

Apart from that, the spec has a set of heuristics that, despite being non-normative, all implementations seem to follow. Summarizing them briefly, they’d be something like:

  • If you use the mouse to focus an element it won’t match :focus-visible.
  • If you use the keyboard it’ll match :focus-visible.
  • Elements that support keyboard input (like <input> or contenteditable) always match :focus-visible.
  • When a script focuses a new element, it will or won’t match :focus-visible depending on the previously active element.

This is just a quick & dirty summary, please read the spec for all the details. There have been years of research around these topics (how focus should work or not on the different use cases, what are the users and accessibility needs, how websites are managing focus, etc.) and these heuristics are somehow the result of all that work.

:focus-visible in the default UA style sheet

At this point it looks like we can more or less understand what :focus-visible is about. So let’s start playing with it. The definition seems very clear, but testing things in the current implementations (Chromium and Firefox) you might find some unexpected situations.

Let’s use a very basic example:

<style>
  :focus-visible { background: lime; }
</style>
<div tabindex="0">Focus me.</div>

If you focus the <div> with a mouse click, :focus-visible doesn’t match per spec, so in this case the background doesn’t become green (if you use the keyboard to focus it, it will match :focus-visible and the background will be green). This works the same in Chromium and Firefox, but Chromium (despite the element not matching :focus-visible) shows a focus indicator. Somehow the first spec definition is already not working as expected in Chromium… The issue here is that Chromium still uses :focus { outline: auto; } in the default UA style sheet, and the element matches :focus after the mouse click; that’s why it shows a focus indicator while not matching :focus-visible.

Actually this was already in the spec, but Chromium is not following it yet:

User agents should also use :focus-visible to specify the default focus style, so that authors using :focus-visible will not also need to disable the default :focus style.

There was already a related CSSWG issue on the topic, as the spec currently suggests the following code:

:focus:not(:focus-visible) {
  outline: 0;
}

This works as a kind of workaround for this issue, but if the default UA style sheet uses :focus-visible that won’t be needed.

Anyway, I’ve reported the Chromium bug and created WPT tests. During the tests review, Emilio Cobos realized that this needed a change in the HTML spec and wrote a PR to update it. After some discussion with Alice Boxhall, the HTML change and the tests were approved and merged. I was even brave enough to write a patch to change this in Chromium, which is still under review and needs some extra work.

The tests

WebKit is the third browser engine adding support for :focus-visible, so the natural first step was to import the WPT tests related to the feature. This looked simple, but it ended up requiring some work to improve the tests.

There was a bunch of :focus-visible tests already in the WPT suite, but they needed some love:

  • Some parts of the spec were not covered by tests, so I added some new tests.
  • Some tests were passing in WebKit, even when there was not an implementation yet, so I modified them to fail if there’s no :focus-visible support.

Then I imported the tests into WebKit and discovered a bug related to the focus event and the :focus pseudo-class: :focus was not matching inside the focus event handler. This is probably not important for web authors, but the :focus-visible tests were relying on it. Actually this had been fixed in Chromium more than 5 years ago, so first I moved the Chromium internal test to WPT and used it to fix the problem in WebKit.

Once the tests were imported into WebKit, the problem was that a bunch of them were timing out on Mac platforms. After investigating the issue I realized that it’s because the focus event is not dispatched when you click on a button on Mac. Digging deeper I found this old bug with lots of discussion on the topic; it looks like this is done to keep alignment with the native platform behavior, and also to avoid showing a focus indicator. Even Firefox has the same behavior on Mac. However, Chromium always dispatches the event independently of the platform. This means that some of the tests don’t work automatically on Mac, as they wait for a focus event that is never dispatched. Anyway, maybe once :focus-visible is implemented, the possibility of modifying this behavior could be rediscussed, though it might not be possible anyway. In any case, the WebKitGTK port, the one I’m using for the development, does trigger the focus event in this case; and I’ve also changed the WPE port to do the same (maybe the Windows port will follow too).

One more thing about the tests: lots of these :focus-visible tests use testdriver.js to simulate user actions. For example, for clicking an element they use test_driver.click(element); however, that simple instruction was causing some kind of memory leak in Chromium when running the tests. The actual Chromium bug hasn’t been fixed yet, but I landed some workarounds that prevent the issue in these tests (waiting for the promise to be resolved before marking the test as done).

Status of WPT :focus-visible tests in the different browsers

To close the tests part, you can check the status on wpt.fyi; most of them are passing in all implementations, which is great, but there are some interoperability issues that we’ll review next.

Interop issues

As I mentioned the wpt.fyi website helps to easily identify the interop issues between the different implementations.

  • :focus-visible in the default UA style sheet: This has already been commented on before, but it is the reason why Chromium fails the focus-visible-018.html test. Firefox fails focus-visible-017.html because the default UA style sheet mentions outline: auto, while Firefox uses a dotted outline.

  • :focus-visible on the <select> element: There’s a Firefox failure on focus-visible-002.html because it doesn’t match :focus-visible when you click a <select> element. I opened a CSSWG issue to discuss this, and I initially thought the agreement was that Firefox’s behavior is the right one. So I wrote a patch to change Chromium’s behavior and update the tests, but during the review I was pointed to a Chromium bug about this topic that was closed as WONTFIX; the reason is that when you click a <select> element you can type letters to select an option from the keyboard. Right now the discussion has been reopened, and we’ll need to wait for the final resolution on the topic to see which is the right implementation.

  • Keyboard interaction once an element is focused: This is tested by focus-visible-007.html. The example here is that you click an element to focus it; initially the element doesn’t match :focus-visible, but then you use the keyboard (for example, you type a letter). In that situation Chromium will start matching :focus-visible while Firefox won’t. The spec is quite explicit on the topic, so it looks like a Firefox bug:

    If the user interacts with the page via the keyboard, the currently focused element should match :focus-visible (i.e. keyboard usage may change whether this pseudo-class matches even if it doesn’t affect :focus).

  • Programmatic focus and :focus-visible: What should happen with :focus-visible when the website uses element.focus() from a script to move the focus? The spec has some heuristics that depend on whether the active element before focus() was called matched (or not) :focus-visible. But I’ve opened a CSSWG issue to discuss what should happen when there’s no active element. The discussion is still ongoing, and depending on it there might be changes in the current implementations. Right now there are some subtle differences between Chromium and Firefox here.

:-webkit-direct-focus?

You probably don’t know what that is, but it’s somehow related to :focus-visible, so I believe it’s worth mentioning here.

WebKit is the browser with the best :focus pseudo-class support on Shadow DOM (see the WPT test results). The issue here is that the ShadowRoot should match :focus if any of its descendants are focused, so if you have an <input> element in the Shadow Tree and you focus it, you’ll have 2 elements matching :focus: the ShadowRoot and the <input>.

<style>
  #host {  padding: 1em; background: lightgrey; }
  #host:focus { background: lime; }
</style>
<div id="host"></div>
<script>
  shadowRoot = host.attachShadow(
    {mode: 'open', delegatesFocus: true});
  shadowRoot.innerHTML =
    '<input value="Focus me">';
</script>

In Chromium, if you use delegatesFocus=true in element.attachShadow() and you have an example like the one described above, you’ll get two focus indicators: one on the ShadowRoot and one on the <input>. Firefox doesn’t match :focus on the ShadowRoot, so the issue is not present there.

WebKit matches :focus independently of the delegatesFocus value (which is the right behavior per spec), so it would be even more common to end up with two focus indicators. To avoid that, WebKit introduced the :-webkit-direct-focus pseudo-class, which is not web exposed but is used in the default UA style sheet to avoid this bad effect of having a focus indicator on the ShadowRoot.

I believe the :focus-visible spec should describe how it works on the ShadowRoot, so that it doesn’t match in those situations. That way WebKit could get rid of :-webkit-direct-focus and use :focus-visible once it’s implemented. I’ve reported a CSSWG issue to discuss this topic.

WIP implementation

So far I haven’t talked about the implementation at all, but the reason is that all the previous work is required in order to do a proper implementation, with good quality, that is interoperable between the different browsers. :focus-visible is a new feature, and despite all the interop mess regarding how focus works in the different browsers and platforms, we should aim for a :focus-visible implementation that is as interoperable as possible.

Despite all this related work, I’ve also found some time to work on a patch. It’s still not ready to be sent upstream, but it’s already doing some things and passing some of the WPT tests. Of course several things are still missing, but below you can see a quick screen recording with :focus-visible working on WebKit.

:focus-visible example running on WebKitGTK MiniBrowser

Some numbers

I know this is not really relevant, but it helps to get a grasp on what has been happening during this month:

  • 3 CSSWG issues reported.
  • 13 PRs merged in WPT.
  • 5 patches landed in WebKit.
  • 4 patches landed in Chromium.
  • And many discussions with different people, special thanks to Alice and Emilio that have been really helpful.

Next steps

The plan for February is to try to find an agreement on the CSSWG issues, close them, and update the WPT tests accordingly. Maybe this work could even include landing some patches on the current implementations. And of course, focusing (pun intended) the effort on the implementation of :focus-visible in WebKit.

I hope this blog post helps you better understand the work that goes on behind the scenes when a web platform feature is implemented, especially if you want to do it in a way that ensures browser interoperability and reduces web authors’ pain.

If you enjoyed this project update, stay tuned as there will be more in the future.

January 27, 2021 11:00 PM

January 24, 2021

Brian Kardell

Optimizing The W3C Sandwiches

Optimizing The W3C Sandwiches

In which I muse about a boring, but, I think important topic of flawed W3C structures and process, in a hopefully not-boring way...

Imagine that there is a trade organization which holds a marathon meeting event for its 500 members every year. It's a long meeting, and so during it, they serve sandwiches. A break, with food and peers, it seems, is really conducive to productivity. So, while it's a little non-obvious, and while small and with no special 'powers' - the sandwiches are actually really important. The trade organization realizes this and decides to formalize a way to make sure we always have good sandwiches. A few weeks before, they send out an email asking people:

Please tell us an ingredient which you could provide enough of for every sandwich for the next 3 years. Your ingredient may or may not be used, we are just looking for ingredient nomination statements expressing what your ingredient brings to the table. If your ingredient is chosen, we will tell you whether that is a 1, 2 or 3 year commitment.

Even if everyone doesn't really understand how inter-connectedly important the sandwiches are, they aren't entirely apathetic about them either: When it comes time to eat, and everyone's hungry after long meetings, there will definitely be opinions about the sandwiches.

BUT... At the same time, there will incredibly be very few ingredient nominations. Why? So many reasons!

Even if a person really had some vague notion that "a lot of my favorite sandwiches had pickles", their company is small, and it doesn't actually deal with pickles. After a moment or three worth of thought about "which pickles?" and "do other people even like pickles?" and "how would I ask my boss for money to offer 1000 pickles... maybe" - this is quickly abandoned as not really worth the time. Each person receiving this email knows there are 499 other organizations, and that what we need are really just 4 ingredients! Someone else - with the money and food connections - will nominate anyway.

And so, that's what happens... Several of them are from some grocery or food service industry. Their businesses totally get the importance of the sandwiches, as well as what people largely "like", and they are structured in a way where this basically isn't a problem for them. Several of them might even discuss among themselves to make sure they're bringing complementary things... They'll nominate things like "lettuce" or "tomatoes", or maybe "cheese". Maybe some others don't discuss, but they do a lot of sandwich-adjacent work and so they still nominate some pretty general condiments that could plausibly be used on a lot of good sandwiches... maybe they offer "mustard" or "mayonnaise". But, then there's also people with other food connections who offer other things. In the end, we wind up with eggs, Vegemite, peanut butter, some gourmet cactus jelly, caviar, and butter. Then, of course - there's someone who offers some not-even-food item.

Now, with the exception of the last one - there's not really something inherently wrong with any of those ingredients - but I think we could at least agree that making a sandwich from random combinations of them is likely to produce a lot of suboptimal sandwiches.

Vote for pickles!

Everyone should have a say here, and we need to account for multiple tastes - so the way the organization goes about deciding on which sandwich everyone will have is by voting for ingredients.

Now, again -- not due to apathy about the actual sandwiches, or the results of the meeting that they will affect - only about 1/5th of people will actually vote for ingredients. Why? Again, "this isn't worth my time, surely someone else can figure out sandwiches" plays in heavily. Perhaps you could even say that the lack of participation is an indirect recognition of what a dysfunctional way this is to make a sandwich.

I mean, who really has time to read the ingredient nomination statements, and also to filter out what might be just bull? Who wants to take the time to learn about the dietary benefits of whey, or read the back-and-forth debates extolling the health virtues of mayonnaise over mustard - or read some people ranting about why the tomato lobby is colluding with the spinach people, putting honest bean farmers out of business - and the inevitable back and forth. Who has time to get entangled in the debate about how maybe the caviar people shouldn't even have been invited in the first place?

Still, the organization believes that members should make the decision, and, it seems there are a few ways we could ask this question. So, in early years it was asked like this:

Here are 12 ingredients, everybody pick 4, and we'll use some of the same ingredients from last year, plus the top 4 vote-getters here, to make everyone sandwiches!

The initial sandwich reviews seemed pretty good, actually. Some people even praised the first sandwiches' 'distinguished tastes'. But, slowly, a lot of people began admitting that really, they actually preferred something a little more boring, or less healthy, but had been kind of afraid to admit it.

Also, it was really kind of a crap shoot. There was really nothing preventing us from getting a real mishmash of random ingredients that didn't go together at all. Well, except for the fact that, in the end, most people were casting 4 votes representing all of the ingredients of an actual sandwich. Nobody probably picked the combination of Vegemite, mustard, butter and caviar on purpose - even if they really liked some of those. This meant that there were at least some odds that lots of "common sandwich" votes were somewhat compatible.

But really... that still involved a lot of luck, and after a few years, people began to kind of resent the sandwiches. The quality of productivity, collaboration and relationships at the event suffered.

How about a nice club sandwich?

But then, some people realized... Hey... Maybe a better way would be to try to talk to lots of people and put together a proposal for a good sandwich. The whole sandwich. Let's call them the "Whole Sandwich People". Like, can we get enough people to vote the same way, just by saying:

Hi. We've taken the time to help make sure we had good ingredients, and looking at everything that is available and trying to balance lots of things, we're recommending this nice club sandwich - <here's why>. If that sounds good, order the club.

Well, that question is a lot easier to answer, so at least some additional people say "yes, please, that sounds good". The simple volume of several new people voting in the same direction was enough to generally make sure that those sandwiches won.

Cool! The Whole Sandwich People also cared about variety and nutrition and all of the other things and so began actively working to make sure that good, compatible, but not commonly offered ingredients got nominated. And, we got some more interesting sandwiches because of it.

The Exclusive Club Sandwich

In the end, not everyone was happy though. Some kinds of ingredients just weren't getting picked anymore. People began asking: How can our trade organization possibly represent the tastes of the people who always offer us caviar if we never pick caviar!? Your club sandwich is an exclusive club, it's biased toward the food service people!

The Whole Sandwich People suggested that this seemed like a weird take. In fact, they pointed to the fact that sandwiches had gotten more diverse, interesting and nutritious by discussing the whole sandwich. Sure, a lot of the ingredients were coming from food service companies, but that's just a practical reality of other parts of the dysfunctional system.

And, yes, the Whole Sandwich People admitted, it's true, none of our sandwiches have included caviar. But, where is the bug? Couldn't the caviar people work with others to find a way to offer an acceptable Whole Sandwich with caviar? It might be plausible - it's not even that we necessarily hate caviar! The truth is, we just don't know how to make a good sandwich with it, and all of our sandwich experiences with it so far are bad. Or, if you can't offer a whole sandwich with caviar that people will order willingly.... Maybe just find something to offer other than caviar.

This, it seems, was unacceptable. It was seen as unfair. The Whole Sandwich People held too much sway over sandwich determination. So, in an attempt to improve the fairness of things, the trade organization changed the question to:

Here are 9 ingredients; everybody rank them in order of preference, and then we will use a complex system whereby one of the things you ranked will be counted, and we'll use the 4 winners to make everyone sandwiches

This amplifies the likelihood that things like caviar wind up in our sandwiches. That's literally the feature.

Why? Because there is no single ingredient that makes a club sandwich. While everyone ranks the widely approved ingredients, they inevitably have to be in some order, and only one of them will count. Meanwhile, the small group of people who are really passionate about a less common ingredient, and who are maybe even a little irked because it is never picked, just put that as #1 - which definitely counts.

Have we optimized the sandwich? Because I thought that was the goal. I think not.

The TAG/AB Sandwiches

All of this is an analogy for how the W3C TAG and AB elections work, and I've laid it out in an attempt to explain why I think this is a really silly way to do it, and why I think it should change: the "best sandwich" isn't created by talking about ingredients in isolation - you have to talk about the sandwich.

Yes, I get that sandwiches are a bit of a strained analogy. TAG isn't exactly a sandwich.... Maybe "a bunch of people going to a restaurant and attempting to pick a shared menu of 4 items based on whatever random stuff is available" is a better one. Or maybe picking a DND party is a better one still...

But, the fuzzy point is mostly the same with any of them: "good" and "bad" are, except at real extremes, not really judgeable in isolation. Sure, there are some "not-actually-even-food" items we can universally say "no thanks" to, but that's also rare, and not really how it works.

What matters in counting is "which one is best", and there is almost never a clear answer to that question without considering the whole. What matters for really knowing what people want is also asking them in such a way that they can actually participate meaningfully.

How big of a problem is this, really?

After all, it's been a few elections now since we changed the question, and while there were some early "surprises", things seem to be generally going alright. We just had a really big TAG election, for example, and people seem generally pretty happy with the results of the actual election part. I know I am. So... Maybe this is much ado about nothing? Is it really worth our time to discuss?

I think, yes. Here's why: good results in recent elections have been a combination of factors that won't always hold and that we shouldn't have to count on. They are as much despite the process as anything (as argued below).

Plus, it's just silly.

How to address this...?

Well, that's the big question. But, one thing seems obvious to me: we need to stop insisting that we can't talk about the sandwich and instead focus on how we talk about the sandwich all the way through the process.

We Whole Sandwich People have learned that even if the way the question is asked has been changed, it's still possible to help elect whole sandwiches; it's just much harder and requires an astonishing amount of additional coordination, trust and, ultimately, still some luck.

The point is that all of the things that we need to happen are already happening, but the process actually fights them. That's broken.

I think we need to move to a more open model, but one that moves impractical noise out of the general discussion for people for whom that is mostly annoying and/or confusing. I think we need to stop being "procedurally secretive" about nominations. I think we need to enlist a willing group of people who can help cooperatively search for candidates and make sure that we're getting ingredients that are good by a number of metrics. I think this group should work to seek something like consensus on a menu of a few possible "well-balanced meals" and provide recommendations that allow ACs to work with that by default, or dig into à la carte options and more details only if they really want to.

Further, I think that rotating terms only complicates this. It's very, very simple to re-affirm seats every time - and this really lets us talk about the whole thing at once.

We don't really even have to change a single rule to accomplish any of this in practice, but currently the rules fight it at every step and make it an extraordinarily difficult exercise. We should fix that. I intend to open some issues with some more specific proposals soon, but I'd love to hear anyone's thoughts or work collaboratively with anyone to make those proposals good.

January 24, 2021 05:00 AM

January 20, 2021

Sergio Villar

Flexbox Cats (a.k.a fixing images in flexbox)

In my previous post I discussed my most recent contributions to flexbox code in WebKit, mainly targeted at reducing the number of interoperability issues among the most popular browsers. The ultimate goal was, of course, to make the life of web developers easier. It got quite some attention (I loved Alan Stearns' description of the post), so I decided to write another one, this time focused on the changes I recently landed in WebKit (Safari’s engine) to improve the handling of elements with aspect ratio inside flexbox.

January 20, 2021 10:45 AM

January 08, 2021

Samuel Iglesias

Vkrunner RPM packages available

VkRunner is a Vulkan shader tester based on Piglit’s shader_runner (I already talked about it in my blog). This tool is very helpful for creating simple Vulkan tests without writing hundreds of lines of code. In the Graphics Team at Igalia, we use it extensively to help us with open-source driver development in Mesa, for drivers such as V3D and Turnip.
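
To give an idea of the format, here is a rough sketch of a tiny .shader_test file in the Piglit-inspired syntax VkRunner consumes (the contents below are illustrative only; check the VkRunner README for the exact, current syntax):

[vertex shader passthrough]

[fragment shader]
#version 430

layout(location = 0) out vec4 color_out;

void main()
{
	// Illustrative only: fill the framebuffer with green
	color_out = vec4(0.0, 1.0, 0.0, 1.0);
}

[test]
draw rect -1 -1 2 2
probe all rgba 0.0 1.0 0.0 1.0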

As a hobby project for last Christmas holiday season, I wrote the .spec file for VkRunner and uploaded it to Fedora’s Copr and OpenSUSE Build Service (OBS) for generating the respective RPM packages.

This is the first time I have created a package, and thanks to the documentation on how to create RPM packages, the process was simpler than I initially thought. If I find the time to read the Debian New Maintainers’ Guide, I will create a DEB package as well.

Anyway, if you have Fedora or OpenSUSE installed on your computer and you want to try VkRunner, just follow these steps (there’s a quick run example right after them):

  • Fedora:
$ sudo dnf copr enable samuelig/vkrunner
$ sudo dnf install vkrunner

  • OpenSUSE / SLE:
$ sudo zypper addrepo https://download.opensuse.org/repositories/home:samuelig/openSUSE_Leap_15.2/home:samuelig.repo
$ sudo zypper refresh
$ sudo zypper install vkrunner
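
Once everything is installed, running a test should be as simple as pointing the binary at a .shader_test file, along the lines of the following (assuming the packaged executable is called vkrunner, and using the illustrative file from the sketch above):

$ vkrunner green.shader_test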

Enjoy it!

January 08, 2021 07:42 AM

January 01, 2021

Eric Meyer

Highlighting Accessible Twitter Content

For my 2020 holiday break, I decided to get more serious about supporting the use of alternative text on Twitter.  I try to be rigorous about adding descriptive text to my images, GIFs, and videos, but I want to be more conscientious about not spreading inaccessible content through my retweets.

The thing is, Twitter doesn’t make it obvious whether someone else’s content has been described, and the way it structures (if I can reasonably use that word) its content makes it annoyingly difficult to conduct element or accessibility-property inspections.  So, in keeping with the design principles that underlie both the Web and CSS, I decided to take matters into my own hands.  Which is to say, I wrote a user stylesheet.

I started out by calling out things that lacked useful alt text.  It went something like this:

div[aria-label="Image"]::before {
	content: "WARNING: no useful ALT text";
	background: yellow;
	border: 0.5em solid red;
	border-radius: 1em;
	box-shadow: 0 0 0.5em black;
	…

…and so on, layering on some sizing, font stuff, and positioning to hopefully place the text where it would be visible.  This failed to satisfy for two reasons:

  1. Because of the way Twitter nests its dozens of repeatedly utility-classed divs, and the styles repeatedly applied thereby, many images were tall (but cut off) or wide (ditto) in ways that pulled the positioned generated text out of the visible frame shown on the site.  There wasn’t an easily-found, human-readable, predictable way to address the element I wanted to use as a positioning context.  So that was a problem.
  2. Almost every image in my feed had a big red and yellow WARNING on it, which quickly depressed me.

What I realized was that rather than calling out the failures, I needed to highlight the successes.  So I commented out the Big Red Angry Text approach above and got a lot simpler.

div[aria-label="Image"] {
	filter: grayscale(1) contrast(0.5);
}
div[aria-label="Image"]:hover {
	filter: none;
}

A screenshot of the author’s Twitter timeline, showing three tweets.  The middle tweet is text only.  The top tweet has a blurred-out image which is grayscale and of reduced contrast, indicating it has no useful alternative text.  The bottom tweet is also blurred, but in full color and contrast, indicating it has been given useful alternative text by the person who posted it.
Three consecutive tweets from my timeline on Friday, January 1st, 2021.  The blurring of the images in the top and bottom tweets is an effect of the Data Saver preference, not my CSS.

Just that.  De-emphasize the images that have the default alt text by way of their enclosing divs, and remove that effect on hover so I can see the image as intended if I so choose.  I like this approach better because it de-emphasizes images that aren’t properly described, while those which are described get a visual pop.  They stand out as lush islands in a flat sea.

In case you’ve been wondering why I’m selecting divs instead of img and video elements, it’s because I use the Data Saver setting on Twitter, which requires me to click on an image or video to load it.  (You can set it via Settings > Accessibility, display and languages > Data usage > Data saver.  It’s also what’s blurring the images in the screenshot shown here.)  I enable this setting to reduce network load, but also to give me an extra layer of protection when disturbing images and videos circulate.  I generally follow people who are careful about not sharing disturbing content, but I sometimes go wandering outside my main timeline, and never know what I’ll find out there.

After some source digging, I discovered a decent way to select non-described videos, which I combined with the existing image styles:

div[aria-label="Image"],
div[aria-label="Embedded video"] {
	filter: grayscale(1) contrast(0.5);
}
div[aria-label="Image"]:hover,
div[aria-label="Embedded video"]:hover {
	filter: none;
}

The fun part is, Twitter’s architecture spits out nested divs with that same ARIA label for videos, which I imagine could be annoying to people using screen readers.  Besides that, it also has the effect of applying the filter twice, which means videos that haven’t been described get their contrast double-reduced!  And their grayscale double-enforced!  Fun.

What I didn’t expect was that when I start playing a video, it loses the grayscale and contrast reduction effects even when not being hovered, which makes the second rule above a little redundant.  I don’t see the DOM structure changing a whole lot when the video loads and plays, so either videos are being treated differently for filter purposes, or I’m missing something in the DOM that’s invalidating the selector matching.  I might poke at it over time to find a fix, or I may just let it go.  The user experience isn’t too far off what I wanted anyway.

There is a gap in my coverage, which is GIFs pulled from Twitter’s GIF pool.  These have default alt text other than Image, which makes selecting for them next to impossible.  Just a few examples pulled from Firefox’s Accessibility panel when I searched the GIF panel for “this is a test”:

testing GIF
This Is ATest Fool GIF
Corona Test GIF by euronews
Test Fail GIF
Corona Virus GIF by guardian
Its ATest Josh Subdquist GIF
Corona Stay Home GIF by INTO ACTION
Is This A Test GIF
Stressed Out Community GIF
A1b2c3 GIF

I assume these are Giphy titles or something like that.  In nearly every case, they’re insufficient, if not misleading or outright useless.  I looked for markers in the DOM to be able to catch these, but didn’t find anything that was obviously useful.

I did think briefly about filtering for any aria-label that contains the string GIF ([aria-label*="GIF"]), but that would improperly catch images and videos that have been described but happen to have the string GIF inside them somewhere.  This might be a relatively rare occurrence, but I’m loth to gray out media that someone went to the effort of describing.  I may change my mind about this, but for now, I’m accepting that GIFs which appear in full color are probably not described, particularly when containing common memes, and will try to be careful.
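For reference, the rule I considered and rejected would have looked something like this (same filter values as before; the only change is the substring attribute selector):

div[aria-label*="GIF"] {
	/* Too broad: also matches described media whose alt text happens to contain "GIF" */
	filter: grayscale(1) contrast(0.5);
}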

I apply the above styles in Firefox using Stylus, which is also available for Chrome, and they’re working pretty well for me.  I wish I could figure out a way to apply them in mobile contexts, but that’s a (much bigger) problem for another day.

I’m not the first to tread this ground, nor do I expect to be the last, sadly.  For a deeper dive into all the details of Twitter accessibility and the pitfalls that can occur, please read Adrian Roselli’s excellent article Improving Your Tweet Accessibility from just over two years ago.  And if you want to apply accessibility-aid CSS to your own Twitter experience but can’t or won’t use Stylus, Adrian has a bookmarklet that injects Twitter alt text all set up and ready to go — you can use it as-is, or replace the CSS in his bookmarklet with mine above, or with your own, if you want to take a different approach.

So that’s how I’m upping my awareness of accessible content on Twitter in 2021.  I’d love to hear what y’all are using to improve your own experiences, or links to tools and resources on this same topic.  If you have any of that, please drop the links in a comment below, so that everyone who reads this can benefit.  Thanks!


by Eric Meyer at January 01, 2021 09:06 PM