Planet Igalia

May 06, 2021

Eleni Maria Stea

Sharing texture data between ANGLE and the native system driver using DMA buffers and EGL on Linux (proof of concept)

This post is about an experiment I’ve performed to investigate if it’s possible to fill a texture from an ANGLE EGL/GLESv2 context (ANGLE is an EGL/GLESv2 implementation on top of other graphics APIs), and use it from a native (for example mesa3D) EGL/OpenGL context on Linux. I’ve written a program that is similar to my …

by hikiko at May 06, 2021 01:23 PM

May 05, 2021

Danylo Piliaiev

Turnips in the wild (Part 2)

In Turnips in the wild (Part 1) we walked through two issues, one in TauCeti Benchmark and the other in Genshin Impact. Today, I have an update about the one I didn’t plan to fix, and a showcase of two remaining issues I met in Genshin Impact.

Genshin Impact

Gameplay – Disco Water

In the previous post I said that I’m not planning to fix the broken water effect since it relied on undefined behavior.

Screenshot of the gameplay with body of water that has large colorful artifacts

However, I was notified that the same issue was fixed in the OpenGL driver for Adreno (Freedreno), and the fix is rather easy. Even though for Vulkan it is clearly undefined behavior, with other APIs it might not be so clear. Thus, given that we want to support translation from other APIs, that there are already apps which rely on this behavior, and that it would be just a bit more performant, I made a fix for it.

Screenshot of the gameplay with body of water without artifacts

The issue was fixed by “tu: do not corrupt unwritten render targets (!10489)”

Login Screen

The login screen welcomes us with not-so-healthy colors:

Screenshot of a login screen in Genshin Impact which has wrong colors - columns and road are blue and white

And with a few failures to allocate registers in the logs. The failure to allocate registers isn’t good and may prevent some important shader from running, but let’s hope it’s not that. Thus, again, we should take a closer look at the frame.

Once the frame is loaded I’m staring at an empty image at the end of the frame… Not a great start.

Such things mostly happen due to a GPU hang. Since I’m inspecting frames on Linux I took a look at dmesg and confirmed the hang:

 [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence ...

Fortunately, after walking through draw calls, I found that the mis-rendering happens before the call which hangs. Let’s look at it:

Screenshot of a correct draw call right before the wrong one being inspected in RenderDoc
Draw call right before
Screenshot of a draw call, that draws the wrong colors, being inspected in RenderDoc
Draw call with the issue

It looks like some fullscreen effect. As in the previous case, the inputs are fine: the only image input is a depth buffer. Also, there are always uniforms passed to the shaders, but when there is only a single problematic draw call they are rarely an issue (and they are easily comparable with the proprietary driver if I spot some nonsensical values among them).

Now it’s time to look at the shader: ~150 assembly instructions, nothing fancy, nothing obvious, and a lonely kill near the top. Before going into the most “fun” part, it’s a good idea to make sure that the issue is 99% in the shader. RenderDoc has a cool feature which allows debugging a shader (its SPIR-V code) at a certain fragment (or vertex, or CS invocation); it does the evaluation on the CPU, so I can use it as a kind of reference implementation. In our case the output between RenderDoc and the actual shader evaluation on the GPU is different:

Screenshot of the color value calculated on CPU by RenderDoc
Evaluation on CPU: color = vec4(0.17134, 0.40289, 0.69859, 0.00124)
Screenshot of the color value calculated on GPU
On GPU: color = vec4(3.1875, 4.25, 5.625, 0.00061)

Knowing the above, there is only one thing left to do: reduce the shader until we find the problematic instruction(s). Fortunately, there is a proprietary driver which renders the scene correctly, therefore instead of relying on intuition, luck, and persistence, we can quickly bisect to the issue by editing the shader and comparing the result against the reference driver. Actually, it’s possible to do this with shader debugging in RenderDoc, but I had problems with it at that moment and it’s not that easy to do.

The process goes like this:

  1. Decompile the SPIR-V into GLSL and check that it compiles back (sometimes it requires some editing; one possible toolchain is sketched after this list)
  2. Remove half of the code, write the most promising temporary variable as a color, and take a look at results
  3. Copy the edited code to RenderDoc instance which runs on proprietary driver
  4. Compare the results
  5. If there is a difference, return the deleted code; now we know that the issue is probably in it. Thus, bisect it by returning to step 2.
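
For step 1, one possible toolchain (a sketch; RenderDoc’s shader editing can serve the same purpose, and the file names here are placeholders) is SPIRV-Cross to decompile and glslangValidator to compile back:

spirv-cross extracted.spv --output extracted.frag
glslangValidator -V extracted.frag -o edited.spv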

This way I bisected to this fragment:

_243 = clamp(_243, 0.0, 1.0);
_279 = clamp(_279, 0.0, 1.0);
float _290;
if (_72.x) {
  _290 = _279;
} else {
  _290 = _243;
}

color0 = vec4(_290);
return;

Writing _279 or _243 to color0 produced reasonable results, but writing _290 produced nonsense. The only difference was the presence of the condition. Now, having a minimal change which reproduces the issue, it’s possible to compare the native assembly.

Bad:

mad.f32 r0.z, c0.y, r0.x, c6.w
sqrt r0.y, r0.y
mul.f r0.x, r1.y, c1.z
(ss)(nop2) mad.f32 r1.z, c6.x, r0.y, c6.y
(nop3) cmps.f.ge r0.y, r0.x, r1.w
(sat)(nop3) sel.b32 r0.w, r0.z, r0.y, r1.z

Good:

(sat)mad.f32 r0.z, c0.y, r0.x, c6.w
sqrt r0.y, r0.y
(ss)(sat)mad.f32 r1.z, c6.x, r0.y, c6.y
(nop2) mul.f r0.y, r1.y, c1.z
add.f r0.x, r0.z, r1.z
(nop3) cmps.f.ge r0.w, r0.y, r1.w
cov.u r1.w, r0.w
(rpt2)nop
(nop3) add.f r0.w, r0.x, r1.w

By running them in my head I reasoned that they should produce the same results, so something was not working as expected. After a few more changes in the GLSL, it became apparent that something was wrong with clamp(x, 0, 1), which is translated into the (sat) modifier on instructions. A bit more digging and I found out that the hardware doesn’t understand the saturation modifier being placed on the sel instruction (sel is a selection between two values based on a third).

Disallowing the compiler from placing the saturation modifier on sel instructions resolved the bug:

Login screen after the fix

The issue was fixed by “ir3: disallow .sat on SEL instructions (!9666)”

Gameplay – Where did the trees go?

Screenshot of the gameplay with trees and grass being almost black

The trees and grass seem to be rendered incorrectly. After looking through the trace and not finding where they were actually rendered, I studied the trace on the proprietary driver and found them there. However, there weren’t any such draw calls on Turnip!

The answer was simple: the shaders failed to compile due to the failure in register allocation I mentioned earlier… The general solution would be an implementation of register spilling. However, in this case there is a pending merge request that implements a new register allocator, which would later help us implement register spilling. With it, the shaders can now be compiled!

Screenshot of the gameplay with trees and grass being rendered correctly

More Turnip adventures to come!

by Danylo Piliaiev at May 05, 2021 09:00 PM

May 04, 2021

Manuel Rego

:focus-visible in WebKit - April 2021

A new report after a new month is gone, you can see the previous ones at:

Again, this is a status report about the work Igalia is doing on the implementation of :focus-visible in WebKit, which is part of the Open Prioritization campaign. Thanks to everyone supporting this!

There has been some nice progress in April, though some things are still under ongoing discussion.

Script focus and :focus-visible

Finally we decided to reflect reality in the script focus tests that I created, and merge them into WPT. The ones where the Chromium and Firefox implementations don’t match the agreed expectations (the cases where you click a non-focusable element and that click moves focus elsewhere) were marked as .tentative.

So we opened a separate issue to explain the situation and gather more feedback about what to do here.

Once we had the tests, I implemented in WebKit the same behavior as the other browsers, the current one reflected in the tests. The patch got reviewed and merged, so script focus and :focus-visible work the same in all browsers right now.

:focus-visible and Shadow DOM

The test that checks that :focus-visible doesn’t match on ShadowRoot was also merged in WPT. That’s the current behavior on WebKit implementation too. More about this in January’s post.

Implementation details

There was a crash in WebKit due to my initial implementation of script focus; that problem has already been fixed. An extra bug was also found and fixed.

During the review of those patches, some new discussion started about different aspects of the :focus-visible feature, like why a keyboard input triggers :focus-visible matching. The discussion with Apple engineers is ongoing on the bug, so let’s see how it ends.

Some numbers

Let’s take a look at the numbers again:

  • 26 PRs merged in WPT (5 in April).
  • 27 patches landed in WebKit (10 in April).
  • 9 patches landed in Chromium (2 in April).
  • 2 PRs merged in CSS specs.
  • 1 PR merged in HTML spec.

Next steps

The implementation is mostly done; now the goal is to close the discussions with the different parties and check the possibilities of shipping this in WebKit.

Thanks to everyone that has provided input in the different discussions and jumped on the patch reviews. Your feedback has been really useful to keep moving this forward.

Stay tuned!

May 04, 2021 10:00 PM

Enrique Ocaña

GStreamer WebKit debugging by instrumenting source code (2/3)

In this post I show some more useful debugging tricks. Check also the other posts of the series:

Print current thread id

The thread id is generated by Linux and can take values much higher than 1-9, just like PIDs. This thread number is useful for knowing which function calls are issued by the same thread, avoiding confusion between threads.

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

printf("%s [%d]\n", __PRETTY_FUNCTION__, syscall(SYS_gettid));
fflush(stdout);

Debug GStreamer thread locks

We redefine the GST_OBJECT_LOCK/UNLOCK/TRYLOCK macros to print the calls, compare locks against unlocks, and see who’s not releasing its lock:

#include "wtf/Threading.h"
#define GST_OBJECT_LOCK(obj) do { \
  printf("### [LOCK] %s [%p]\n", __PRETTY_FUNCTION__, &Thread::current()); fflush(stdout); \
  g_mutex_lock(GST_OBJECT_GET_LOCK(obj)); \
} while (0)
#define GST_OBJECT_UNLOCK(obj) do { \
  printf("### [UNLOCK] %s [%p]\n", __PRETTY_FUNCTION__, &Thread::current()); fflush(stdout); \
  g_mutex_unlock(GST_OBJECT_GET_LOCK(obj)); \
} while (0)
#define GST_OBJECT_TRYLOCK(obj) ({ \
  gboolean result = g_mutex_trylock(GST_OBJECT_GET_LOCK(obj)); \
  if (result) { \
   printf("### [LOCK] %s [%p]\n", __PRETTY_FUNCTION__, &Thread::current()); fflush(stdout); \
  } \
  result; \
})

Warning: The statement expression that allows the TRYLOCK macro to return a value will only work on GCC.

There’s a way to know which thread has taken a lock in glib/GStreamer using gdb. First locate the stalled thread:

(gdb) thread 
(gdb) bt
#2  0x74f07416 in pthread_mutex_lock ()
#3  0x7488aec6 in gst_pad_query ()
#4  0x6debebf2 in autoplug_query_allocation ()

(gdb) frame 3
#3  0x7488aec6 in gst_pad_query (pad=pad@entry=0x54a9b8, ...)
4058        GST_PAD_STREAM_LOCK (pad);

Now get the process id (PID) and use the pthread_mutex_t structure to print the Linux thread id that has acquired the lock:

(gdb) call getpid()
$30 = 6321
(gdb) p ((pthread_mutex_t*)pad.stream_rec_lock.p)->__data.__owner
$31 = 6368
(gdb) thread find 6321.6368
Thread 21 has target id 'Thread 6321.6368'

Trace function calls (poor developer version)

If you’re using C++, you can define a tracer class. This one is for WebKit, but you get the idea:

#define MYTRACER() MyTracer(__PRETTY_FUNCTION__);
class MyTracer {
public:
    MyTracer(const gchar* functionName)
      : m_functionName(functionName) {
      printf("### %s : begin %d\n", m_functionName.utf8().data(), currentThread()); fflush(stdout);
    }
    virtual ~MyTracer() {
        printf("### %s : end %d\n", m_functionName.utf8().data(), currentThread()); fflush(stdout);
    }
private:
    String m_functionName;
};

And use it like this in all the functions you want to trace:

void somefunction() {
  MYTRACER();
  // Some other code...
}

The constructor will log when the execution flow enters into the function and the destructor will log when the flow exits.

Setting breakpoints from C

In the C code, just call raise(SIGINT) (this simulates CTRL+C; normally the program would finish).
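
For example (a minimal sketch; the function name is hypothetical):

#include <signal.h>

void someSuspiciousFunction()
{
    // With gdb attached, execution breaks here and control
    // returns to the debugger.
    raise(SIGINT);
}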

And then, in a previously attached gdb, after breaking and having debugged all you needed, just continue the execution by ignoring the signal or just plainly continuing:

(gdb) signal 0
(gdb) continue

There’s a way to do the same but attach gdb after the raise. Use raise(SIGSTOP) instead (this simulates CTRL+Z). Then attach gdb, locate the thread that called raise and switch to it:

(gdb) thread apply all bt
[now search for "raise" in the terminal log]
Thread 36 (Thread 1977.2033): #1 0x74f5b3f2 in raise () from /home/enrique/buildroot/output2/staging/lib/libpthread.so.0
(gdb) thread 36

Now, from a terminal, send a continuation signal: kill -SIGCONT 1977. Finally instruct gdb to single-step only the current thread (IMPORTANT!) and run some steps until all the raises have been processed:

(gdb) set scheduler-locking on
(gdb) next    // Repeat several times...

Know the name of a GStreamer function stored in a pointer at runtime

Just use this macro:

GST_DEBUG_FUNCPTR_NAME(func)
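
For example, to print the name of a pad’s query function (a sketch; this only works for functions that were registered through GST_DEBUG_FUNCPTR, as GStreamer code usually does):

printf("query function: %s\n", GST_DEBUG_FUNCPTR_NAME(GST_PAD_QUERYFUNC(pad)));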

Detecting memory leaks in WebKit

RefCountedLeakCounter is a tool class that can help to debug reference leaks by printing this kind of messages when WebKit exits:

  LEAK: 2 XMLHttpRequest
  LEAK: 25 CachedResource
  LEAK: 3820 WebCoreNode

To use it you have to modify the particular class you want to debug (see the sketch after this list):

  • Include wtf/RefCountedLeakCounter.h
  • DEFINE_DEBUG_ONLY_GLOBAL(WTF::RefCountedLeakCounter, myClassCounter, ("MyClass"));
  • In the constructor: myClassCounter.increment()
  • In the destructor: myClassCounter.decrement()
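
Putting those steps together, a minimal sketch for a hypothetical MyClass:

// MyClass.cpp
#include <wtf/RefCountedLeakCounter.h>

DEFINE_DEBUG_ONLY_GLOBAL(WTF::RefCountedLeakCounter, myClassCounter, ("MyClass"));

MyClass::MyClass()
{
    myClassCounter.increment();
}

MyClass::~MyClass()
{
    myClassCounter.decrement();
}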

by eocanha at May 04, 2021 06:00 AM

April 30, 2021

Brian Kardell

ES Temporal: 2 Minute Standards

You might have heard that Temporal has recently reached Stage 3 in ECMA; here's a #StandardsIn2Min explanation of it...

Since its beginnings, JavaScript has had only a rudimentary Date object. It copied Date from an early edition of Java, and it was intended to be used for programming just about anything relating to time. While Java itself quickly deprecated and improved its date handling, JavaScript implementations didn't follow suit, and JavaScript introduced its own warts and quirks to Date along the way. As a result, JavaScript libraries dedicated to reasoning about the complexities of time, like moment.js, became common and essential.

Temporal, now at stage 3 as of this writing, is the result of a lot of work initiated by maintainers of those projects and shepherded through the standards process. It introduces lots of rich APIs to the JavaScript standard library, hosted by a new top-level Temporal object. Large top-level introductions of this sort (like Temporal, Math and Intl) are exceedingly rare.

Temporal provides a number of objects, all immutable and serializable, and each with their own methods for reasoning about time in different ways.

It contains some fundamental concepts which implement standards for calendar systems and timezones respectively, and set the foundations for how a lot of methods do their work:

  • Temporal.Calendar
  • Temporal.TimeZone

It introduces Temporal.Instant which is used for dealing with an instant in time to various degrees of precision.

It also introduces a number of "plain" themed objects geared toward providing APIs for the different ways we think about time, not simply in terms of different kinds of precision. For example...

  • Temporal.PlainTime is for dealing with wall-clock time that is not associated with a particular date or time zone.
  • Temporal.PlainMonthDay is for a date without a year component, useful for annual events like "The Fourth of July" or "Christmas Day".
  • Temporal.PlainDateTime is for representing a calendar date and wall-clock time that does not carry time zone information, e.g. December 7th, 1995 at 3:00 PM (in the Gregorian calendar).
  • Temporal.PlainYearMonth is useful for expressing things like "The October 2020 edition of Vanity Fair".

It also includes...

  • Temporal.ZonedDateTime is for reasoning about dates and times in the timezone offsets reckoned by a particular calendar
  • Temporal.Duration is used for measuring the duration between two temporal objects.
  • Temporal.now provides APIs about the current moment in time.
A diagram illustrating the different types, relationships and concepts described that make up Temporal

You can learn a lot more by following the links in the Temporal proposal repo, including the helpful reference documentation and examples, and a cookbook to help you get started and learn the ins and outs of Temporal. Each of those pages embeds a (not production ready) polyfill so that you can open Dev Tools and explore for yourself.

If you're interested in hearing about the history, development, challenges, inner workings or rationale behind any of this, I recently hosted an edition of our podcast on this topic with guests who worked on the standard.

April 30, 2021 04:00 AM

April 29, 2021

Ricardo García

Vulkan Ray Tracing Resources and Overview

As you may know, I’ve been working on VK-GL-CTS for some time now. VK-GL-CTS is the Conformance Test Suite for Vulkan and OpenGL, a large collection of tests used to verify that implementations of the Vulkan and OpenGL APIs work as intended by the specification. My work has been mainly focused on the Vulkan side of things as part of Igalia's ongoing collaboration with Valve.

Last year, Khronos released the official specification of the Vulkan ray tracing extensions and I had the chance to participate in the final stages of the process by improving test coverage and fixing bugs in existing CTS tests, which is work that continues to this day mixed with other types of tasks in my backlog.

As part of this effort I learned many bits of the new Vulkan Ray Tracing API and even provided some very minor feedback about the spec, which resulted in me being listed as a contributor to the VK_KHR_acceleration_structure extension.

Now that the waters are a bit more calm, I wanted to give you a list of resources and a small overview of the main concepts behind the Vulkan version of ray tracing.

General Overview

There are a few basic resources that can help you get acquainted with the new APIs.

  1. The official Khronos blog published an overview of the ray tracing extensions that explains some of the basic concepts like acceleration structures, ray tracing pipelines (and what their different shader stages do) and ray queries.

  2. Intel’s Jason Ekstrand gave an excellent talk about ray tracing in Vulkan in XDC 2020. I highly recommend you to watch it if you’re interested.

  3. For those wanting to get their hands on some code, the Khronos official Vulkan Samples repository includes a basic ray tracing sample.

  4. The official Vulkan specification text (warning: very large HTML document), while intimidating, is actually a good source to learn many new parts of the API. If you’re already familiar with Vulkan, the different sections about ray tracing and ray tracing pipelines are worth reading.

Acceleration Structures

The basic idea of ray tracing, as a tool, is that you must be able to choose an arbitrary point in space as the ray origin and a direction vector, and ask your implementation if that ray intersects anything along the way given a minimum and maximum distance.

In a modern computer or console game the number of triangles present in a scene is huge, so you can imagine detecting intersections between them and your ray can be very expensive. The implementation typically needs to organize the scene geometry in a hierarchical tree-like structure that can be traversed more efficiently by discarding large amounts of geometry with some simple tests. That’s what an Acceleration Structure is.

Fortunately, you don’t have to organize the scene geometry yourself. Implementations are free to choose the best and most suitable acceleration structure format according to the underlying hardware. They will build this acceleration structure for you and give you an opaque handle to it that you can use in your app with the rest of the API. You’re only required to provide the long list of geometries making up your scene.

You may be thinking, and you’d be right, that building the acceleration structure must be a complex and costly process itself, and it is. For this reason, you must try to avoid rebuilding acceleration structures completely in every frame of the app. This is why they are divided into two types: bottom level and top level.

Bottom level acceleration structures (BLAS) contain lists of geometries and typically represent whole models in your scene: a building, a tree, an object, etc.

Top level acceleration structures (TLAS) contain lists of “pointers” to bottom level acceleration structures, together with a transformation matrix for each pointer.

In the diagram below, taken from Jason Ekstrand’s XDC 2020 talk[1], you can see the blue square representing the TLAS, the red squares representing BLAS and the purple squares representing geometries.

Picture showing a hand-drawn cowboy, cactus and cow. A blue square surrounds the whole picture. Orange squares surround the cowboy, cactus and cow. Individual pieces of the cowboy, cactus and cow are surrounded by purple squares.

The whole idea behind this is that you may be able to build the bottom level acceleration structure for each model only once, as long as the model itself does not change, and you will include this model in your scene one or more times. Each time, it will have an associated transformation matrix that will allow you to translate, rotate or scale the model without rebuilding it. So, in each frame, you may only have to rebuild the top level acceleration structure while keeping the bottom level ones intact. Other tricks you can use include rebuilding the top level acceleration structure at a reduced frame rate compared to the app, or using a simplified version of the world geometry when tracing rays instead of the more detailed model used when rendering the scene normally.

Acceleration structures, ray origins and direction vectors typically use world-space coordinates.

Ray Queries

In its most basic form, you can access the ray tracing facilities of the implementation by using ray queries. Before ray tracing, Vulkan already had graphics and compute pipelines. One of the main components of those pipelines are shader programs: application-provided instructions that run on the GPU telling it what to do and, in a graphics pipeline, how to process geometry data (vertex shaders) and calculate the color of each pixel that ends up on the screen (fragment shaders).

When ray queries are supported, you can trace rays from those “classic” shader programs for any purpose. For example, to implement lighting effects in a fragment shader.

Ray Tracing Pipelines

The full power of ray tracing in Vulkan comes in the form of a completely new type of pipeline, the ray tracing pipeline, that complements the existing compute and graphics pipelines.

Most Vulkan ray tracing tutorials, including the Khronos blog post I mentioned before, explain the basics of these pipelines, including the new shader stages (ray generation, intersection, any hit, closest hit, etc) and how they work together. They cover acceleration structure traversal for each ray and how that triggers execution of a particular shader program provided by your app. The image below, taken from the official Vulkan specification[2], contains the typical representation of this traversal process.

Ray Tracing Acceleration Structure traversal diagram showing the ray generation shader initiating the traversal procedure, the miss shader called when the ray does not intersect any geometry and the intersection, any hit and closest hit shaders called when an intersection is found

The main difference between the traditional graphics pipelines and ray tracing pipelines is the following one. If you’re familiar with the classic graphics pipelines, you know the app decides, and has full control over, what is being drawn at any moment. Your command stream usually looks like this:

  1. Begin render pass (I’ll be using this depth buffer to discard overlapping geometry on the screen and the resulting pixels need to be written to this image)

  2. Bind descriptor sets (I’ll be using these textures and data buffers)

  3. Bind pipeline (This is how the whole process looks, including the crucial part of shader programs: let me tell you what to do with each vertex and how to calculate the color of each resulting pixel)

  4. Draw this

  5. Draw that

  6. Bind pipeline (I’ll be using different shader programs for the next draws, thank you)

  7. Draw some more

  8. Draw even more

  9. Bind descriptor sets (The textures and other data will be different from now on)

  10. Bind pipeline (The shaders will be different too)

  11. Additional draws

  12. Final draws (Almost there, buddy)

  13. End render pass (I’m done)

Each draw command in the command stream instructs the GPU to draw an object and, because the app is recording that command, the app knows what that object is and the appropriate resources that need to be used to draw that object, including textures, data buffers and shader programs. Before recording the draw command, the app can prepare everything in advance and tell the implementation which shaders and resources will be used with the draw command.

In a ray tracing pipeline, the scene geometry is organized in an acceleration structure. When tracing a ray, you don’t know, in advance, which geometry it’s going to intersect. Each geometry may need a particular set of resources and even the shader programs may need to change with each geometry or geometry type.

Shader Binding Table

For this reason, ray tracing APIs need you to create a Shader Binding Table or SBT for short. SBTs represent (potentially) large arrays of shaders organized in shader groups, where each shader group has a handle that sits in a particular position in the array. The implementation will access this table, for example, when the ray hits a particular piece of geometry. The index it will use to access this table or array will depend on several parameters. Some of them come from the ray tracing command call in a ray generation shader, and others come from the index of the geometry and instance data in the acceleration structure.

There’s a formula to calculate that index and, while it’s not very complex, it will determine the way you must organize your shader binding table so it matches your acceleration structure, which can be a bit of a headache if you’re new to the process.
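
For the hit shaders, a simplified sketch of that calculation (my reading of the spec; check the specification text for the authoritative definition) looks like this:

hitGroupIndex = instanceSbtRecordOffset + geometryIndex * sbtRecordStride + sbtRecordOffset

Here, instanceSbtRecordOffset comes from the instance data in the top level acceleration structure, geometryIndex is the index of the geometry inside the bottom level acceleration structure, and sbtRecordOffset and sbtRecordStride are arguments of the trace rays call (traceRayEXT) in the shader.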

I highly recommend taking a look at Will Usher’s Shader Binding Table Tutorial, which includes an interactive SBT builder tool that will let you get an idea of how things work and fit together.

The Shader Binding Table is complemented in Vulkan by a Shader Record Buffer. The concept is that entries in the Shader Binding Table don’t have a fixed size that merely corresponds to the size of a shader group handle identifying what to run when the ray hits that particular piece of geometry. Instead, each table entry can be a bit larger and you can put arbitrary data after each handle. That data block is called the Shader Record Buffer, and can be accessed from shader programs when they run. They may be used, for example, to store indices to resources and other data needed to draw that particular piece of geometry, so the shaders themselves don’t have to be completely unique per geometry and can be reused more easily.
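
As a mental model, each entry can be pictured like the struct below (a sketch; the field names are made up, the real handle size must be queried from VkPhysicalDeviceRayTracingPipelinePropertiesKHR::shaderGroupHandleSize, and entries must be padded to the required alignment):

#include <cstdint>

struct SbtEntry {
    // Opaque handle identifying the shader group to run.
    uint8_t shaderGroupHandle[32];
    // Shader record buffer: arbitrary application data readable from
    // the shaders, e.g. indices of the resources needed to draw this
    // particular piece of geometry.
    uint32_t materialIndex;
    uint32_t textureIndex;
};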

Conclusion

As you can see, ray tracing can be more complex than usual but it’s a very powerful tool. I hope the basic explanations and resources I linked above help you get to know it better. Happy hacking!

Notes

[1] The Acceleration Structure representation image with the cowboy, cactus and cow is © 2020 Jason Ekstrand and licensed under the terms of CC-BY.

[2] The Acceleration Structure traversal diagram in a ray tracing pipeline is © 2020 The Khronos Group and released under the terms of CC-BY.

April 29, 2021 07:48 PM

April 27, 2021

Enrique Ocaña

GStreamer WebKit debugging by instrumenting source code (1/3)

This is the continuation of the GStreamer WebKit debugging tricks post series. In the next three posts, I’ll focus on what we can get by making some small changes to the source code for debugging purposes (known as “instrumenting”), but before that, you might want to check the previous posts of the series:

Know all the env vars read by a program by using LD_PRELOAD to intercept libc calls

// File getenv.c
// To compile: gcc -shared -Wall -fPIC -o getenv.so getenv.c -ldl
// To use: export LD_PRELOAD="./getenv.so", then run any program you want
// See http://www.catonmat.net/blog/simple-ld-preload-tutorial-part-2/

#define _GNU_SOURCE

#include <stdio.h>
#include <dlfcn.h>

// This function will take the place of the original getenv() in libc
char *getenv(const char *name) {
 printf("Calling getenv(\"%s\")\n", name);

 char *(*original_getenv)(const char*);
 original_getenv = dlsym(RTLD_NEXT, "getenv");

 return (*original_getenv)(name);
}

See the “breakpoints with command” example to learn how to get the same using gdb. Check also Zan’s libpine for more features.

Track lifetime of GObjects by LD_PRELOADing gobject-list

The gobject-list project, written by Thibault Saunier, is a simple LD_PRELOAD library for tracking the lifetime of GObjects. When loaded into an application, it prints a list of living GObjects on exiting the application (unless the application crashes), and also prints reference count data when it changes. SIGUSR1 or SIGUSR2 can be sent to the application to trigger printing of more information.

Overriding the behaviour of a debugging macro

The usual debugging macros aren’t printing messages? Redefine them to do what you want:

#undef LOG_MEDIA_MESSAGE
#define LOG_MEDIA_MESSAGE(...) do { \
  printf("LOG %s: ", __PRETTY_FUNCTION__); \
  printf(__VA_ARGS__); \
  printf("\n"); \
  fflush(stdout); \
} while(0)

This can be done to enable asserts on demand in WebKit too:

#undef ASSERT
#define ASSERT(assertion) \
  (!(assertion) ? \
      (WTFReportAssertionFailure(__FILE__, __LINE__, WTF_PRETTY_FUNCTION, #assertion), \
       CRASH()) : \
      (void)0)

#undef ASSERT_NOT_REACHED
#define ASSERT_NOT_REACHED() do { \
  WTFReportAssertionFailure(__FILE__, __LINE__, WTF_PRETTY_FUNCTION, 0); \
  CRASH(); \
} while (0)

It may be interesting to enable WebKit LOG() and GStreamer GST_DEBUG() macros only on selected files:

#define LOG(channel, msg, ...) do { \
  printf("%s: ", #channel); \
  printf(msg, ## __VA_ARGS__); \
  printf("\n"); \
  fflush(stdout); \
} while (false)

#define _GST_DEBUG(msg, ...) do { \
  printf("### %s: ", __PRETTY_FUNCTION__); \
  printf(msg, ## __VA_ARGS__); \
  printf("\n"); \
  fflush(stdout); \
} while (false)

Note all the preprocessor trickery used here:

  • The first arguments (channel, msg) are captured independently
  • The remaining args are captured in __VA_ARGS__
  • do while(false) is a trick to avoid {braces} and make the code block work when used in if/then/else one-liners
  • #channel expands LOG(MyChannel,....) as printf("%s: ", "MyChannel"). It’s called “stringification”.
  • ## __VA_ARGS__ expands the variable argument list as a comma-separated list of items, but if the list is empty, it eats the comma after “msg”, preventing syntax errors

Print the compile-time type of an expression

Use typeid(<expression>).name(). Filter the output through c++filt -t:

std::vector<char *> v; 
printf("Type: %s\n", typeid(v.begin()).name());

Abusing the compiler to know all the places where a function is called

If you want to know all the places from where the GstClockTime toGstClockTime(float time) function is called, you can convert it to a template function and use static_assert on a wrong datatype like this (in the .h):

template <typename T = float> GstClockTime toGstClockTime(float time) { 
  static_assert(std::is_integral<T>::value,
    "Don't call toGstClockTime(float)!");
  return 0;
}

Note that the default T=float is not an integral type, so the static_assert fails at every instantiation. This has nothing to do with the float time parameter declaration.

You will get compile-time errors like this on every place the function is used:

WebKitMediaSourceGStreamer.cpp:474:87:   required from here
GStreamerUtilities.h:84:43: error: static assertion failed: Don't call toGstClockTime(float)!

Use pragma message to print values at compile time

Sometimes it’s useful to know if a particular define is enabled:

#include <limits.h>

#define _STR(x) #x
#define STR(x) _STR(x)

#pragma message "Int max is " STR(INT_MAX)

#ifdef WHATEVER
#pragma message "Compilation goes by here"
#else
#pragma message "Compilation goes by there"
#endif

...

The code above would generate this output:

test.c:6:9: note: #pragma message: Int max is 0x7fffffff
 #pragma message "Int max is " STR(INT_MAX)
         ^~~~~~~
test.c:11:9: note: #pragma message: Compilation goes by there
 #pragma message "Compilation goes by there"
         ^~~~~~~

by eocanha at April 27, 2021 06:00 AM

April 25, 2021

Brian Kardell

Design Affordance Controls

Design Affordance Controls

We often use a single word (the actual word varies) to discuss "controls" . This, I believe, carries with it a fundamental assumption that they are all somehow "the same," and that has shaped how we think about them. In this post I'll explain why I have recently come to think that perhaps they aren't really quite the same at all. I'll also suggest that some additional terminology which could describe a few broad classes of controls, could help us better discuss (and perhaps shape) them.

What is a "control", exactly?

On its face, it seems like kind of an absurdly simple question, right?

However, consider this: Many UI toolkits outside the web have, at some point, explicitly defined some kind of control which could be described as "a container with scrollbars". In a way, this makes sense: When something is scrollable, there are several UI implications:

  • They paint scroll bars and mouse/touch affordances.
  • They become part of the sequential focus order.
  • Standard keyboard control affordances for managing the scroll are added.
  • There are events and UI states to track.
  • They might gain an affordance to become user-sizable.
  • ...and so on.

On the web, we don't look at the problem that way. There is no special "scroll container" element, nor even an ARIA role for it. Instead, elements are, first, just "Good Semantic Content", and whether they are scrollable (or not) is considered a matter of, and subject to, the design. Overflow, and affordances related to it, are decidedly presentational.

This is pretty interesting because, for example, authors can choose when a <section> should be "component-like" with affordances or not - they vary in pursuit of good design.

Form controls, on the other hand, are definitely not like that at all. Their entire nature is to always be a particular control. As we look to introducing new "controls" in HTML, I think it could be helpful to think about this sort of distinction.

"Design Affordance Controls"

I would like to argue that there are several other "common controls" (collapsible content, tabs and accordions are some examples) which have more in common with scrollable areas than they do with form controls, and that this might be worth careful consideration.

While these components aren't simply about "overflow" (they manage actual hiddenness and have ARIA roles), they do seem to share a lot of other qualities with scroll containers:

  • They can be (and should be, I will argue) thought of, first and foremost, as natural, meaningful document content.
  • Whether or not these elements are "control-like" is something which should be deployed by authors subject to the design in order to provide helpful affordances for users to be able to more easily consume the content.
  • Their interaction is not primary to the control, but secondary.

I label these here as "Design Affordance Controls". To consider why this matters: HTML has <details> and <summary> elements which provide collapsible content and I will use them as a point of comparison.

What about print?

When you print a page containing input collection controls, what prints is the control, in its current state. This is entirely unsurprising. What else could it do, really?

However, is this uniformly desirable? I would suggest it is not. In fact, it is not universally true of any of the things I have labelled "Design Affordance Controls". Just as with scrolling, there isn't a simple "Yes, always" or "No, never" answer to the question of whether it should be content-like or control-like. It is reasonable for an author to decide whether it should print either way.

To illustrate: Imagine that I built a site about recipes. It has sections about the 'ingredients', 'instructions' and 'dietary information'. I might like those to display on someone's screen as collapsible sections of the sort provided by <details> and <summary> in order to provide a nicer design and a set of affordances for a user to consume the content more easily.

Recipes displayed with disclosure sections

But really, at the end of the day, it's just that: A convenience of design affordance. As an author, in this case, I'd like it to print with all of the content, sans controls.

Recipes as simple sections

With <details> and <summary>, this is not an option. Their control-ness is hard-wired and fundamental. This feels like a mistake.

Interchangeability?

Unlike input collection, which wants to describe a single "right shape", there isn't really a "right" answer as to which of the things I have labelled Design Affordance Controls we should use. In the above example, one could easily and reasonably swap in several (maybe any) of them. Tabs, for example, are also a reasonable choice.

Recipes displayed with sections as tabs

Because this is all in pursuit of design, it might even be desirable to change our minds! "Responsive Tabs", which allow a control to be presented as either "tab-like" or "accordion-like" based on design constraints, aren't uncommon and are an example of just this. Their existence helps illustrate that there are at least reasons to consider that this observation is relevant.

(There is a Note about "browser tabs" later)

UnControling?

Even outside of print, as an author, the answer to whether I'd like something to even provide <details> and <summary> affordances varies based on the design. This website, in fact, is an example.

If your browser window is big enough, the left side of this website (as of this writing, at least) is a bunch of information about me. It's not, very probably, why you're here. You didn't come to learn about me. You want to read the <main> content. But, that's not a design problem either... In fact, I think that showing you both and making it easy for you to get that information without being overly distracting is a feature of the design.

However, in a smaller viewport, this would become really inconvenient for a user. It would involve scrolling through a lot of "noise" in order to get to the actual content. The "easy" solution might be to just hide it. However, as I said, making it easy to find that information is a feature. So, on a small screen, I choose to put a disclosure widget that is collapsed by default with 'author information' in the summary.

Almost every website on earth, it seems, has some "spiritually similar" idea (hamburger menus, for example). Cool.

Except... wait... is it? That's not how the API surface of <details> and <summary> works. At all. I believe that's because of how we approached its design. I can't think of any precedent of a control that becomes... not a control. I can, however, point to plenty of examples of "regular things" that can potentially gain affordances.

Enhancing vs... Unenhancing?

Just as scrollbars "enhance" regular content with affordances, Progressive Enhancement does something similar. That's useful to think about.

Assuming that content should be meaningful and non-interactive to start is a good exercise, generally. Any of a myriad of problems can get in the way, and if markup and styling imply something is interactive but it doesn't wind up being so, users are left in one of two bad states:

  • (ideally) a UI which seems to imply that a section could be collapsed, but frustratingly that won't seem to work.
  • (much worse) a UI which seems to imply there is content (which there is) that the user should be able to expand, but frustratingly cannot.

This exercise is also potentially helpful in designing a new control like this for HTML itself. Until new controls are supported by every browser (and historically, that takes a long time), the situation is not dissimilar.

A more robust solution involves enhancing otherwise good and meaningful content (as you see in the print version above) with affordances, if that is both possible and desirable. In fact, this is precisely what several of the original PE examples/essays did. The essence of the element didn't change, only the affordances inside it did, and only if they could. They were perfectly good on their own, and were careful to avoid the above kind of situation.

But <details> and <summary> were not that. They were just unknown elements (effectively, <span>s). They ran together stylistically, and they had no useful meaning to assistive technology either. If we had thought of these as "Design Affordances" which could appear on a <section>, how much better would that have been in the interim?

Clues in Linking and Searching?

Pages allow authors to share links to anchors. Find-in-page allows users to find content in the page by searching. Today, both are a problem for <details> and <summary>, again in part because of how 'control-ness' is fundamentally baked into them. Again, this feels like a mistake, and these are pain points that we're working on. It's interesting to realize that the same would be the case for other controls of this class if we continue to think of them this way.

I feel like these offer further evidence that their UI affordances are necessarily secondary and not fundamental.

A Quick Note on "Browser Tabs"

Lots of interfaces that we use have things we'd all refer to in polite conversation as "Tabs". These are used as examples that confound a lot of conversations and take them in many directions. It's never the case, for example, that one would "print all the open tabs" or "view them all in a sequence". In fact, one could reply to many of the things I've written here with "...but, browser tabs...".

It's important to note that while we refer to all of these things as "tabs", most UI toolkits have entirely separate classes of "tab-like" things with distinct names. The fundamental distinction between them is that one of those kinds of tabs (the kind in browsers, chat clients, editors and so on) are actually managing windows, and the other kind is managing panels of content inside a window. On the web, our model is documents. But, there is an easy-to-imagine parallel with embedded documents here. That is, one could perhaps make tabs out of <section>s, or perhaps out of <iframe>s. These would have roughly the same kinds of boundaries as toolkits, but (even from a user's POV) actually different expectations on several of the things noted in this post.

Conclusions?

This post isn't actually intended to present a "solution" as much as it is to provide food for thought about how we shape the conversations as we design new things for HTML. Perhaps even toward helping shape answers to some long open questions. Even something as simple as "what is an accordion?" remains, believe it or not, just a bit elusive in some important ways that even the ARIA Practices Guide (APG) itself has struggled with. I hope that this line of thinking and discussion is helpful to that struggle.

In truth, it's probably not even this cut and dry. As I said: These controls aren't exactly like scrollbars either. I'm definitely not suggesting we should just match that either. It's tempting to kind of think of something like input's type attribute here - but it's not quite that either (which is good, because that is rife with issues -- see Monica's Input I <3 you, but you're bringing me down. for a rough sense of them).

Instead, it seems there is very probably a small spectrum of "classes of controls and concerns" here that is worth thinking about carefully as we design new controls. But we can't really do that unless we start talking about it somewhere. Starting that conversation is my hope here.

I'm currently working with lots of folks in OpenUI on trying to introduce some of these controls, and I'm hopeful we can incorporate these thoughts into our discussions. While there is a lot more to be fleshed out, there is a version of this that I can imagine that very closely resembles a proposal from Ian Hickson from long ago. Ian's proposal provided a wrapper element around otherwise good, sectioned content and would have offered minimal affordances (like grouping the headings as "tabs") and state management. There is a lot that appeals to me about that kind of approach and it could play well here. A few of us (myself, Dave Rupert, Pascal Schilp, Jonathan Neal, Miriam Suzanne, Zach Leatherman, Greg Whitworth, Nicole Sullivan) have also been discussing bits of this. Some of us have also been working on creating a custom element along similar lines which can be plain old sections, "tab-like" or "accordion-like". This broke out from collaborative work that began with several of us attempting to align our ideas on Pascal Schilp's Generic Tabs repository (that isn't the component, but it already has some nice qualities). I'm looking forward to sharing our component and more details soon.

Honestly, I'd love to hear your thoughts. I feel like this is an area ripe for serious R&D, study and discussion.

Special thanks to several friends for proofing/commenting on this as it developed: Alice Boxhall, Jonathan Neal, Miriam Suzanne, Eric Meyer, Dave Rupert. Thanks don't imply some kind of endorsement.

April 25, 2021 04:00 AM

April 22, 2021

Alex Surkov

Hidden Gem of Chromium Accessibility

Low-level accessibility tools

There’s a hidden gem in Chromium accessibility: the command line accessibility tools. Similar to the accessibility inspection tools available on all popular platforms such as Windows or OSX, these tools are designed for the very same purpose: they provide low-level access to the accessibility features of a web site. They also have a superpower that makes them mighty and unique, but let’s talk about that later. First, some background.

An accessible website means that assistive technologies like a screen reader can see, perceive and operate on the website. Not every website is accessible. If website content can’t be expressed in a way the assistive technologies understand, then the website is inaccessible. It’s like translating the website into the assistive technologies’ language: if it cannot be done, then you are out of luck.

Each platform has its own accessibility API, the assistive technologies rely on those to work with websites. In particular it means the website content should be designed in such a way that it can be mapped to those APIs. So if you have an accessibility issue on a website, then quite likely it indicates the content is not properly mapped to the accessibility APIs.

If a website has an accessibility bug, then it’s often a good idea to inspect how the website is mapped to the APIs. Each platform has its own set of accessibility tools: Linux has Accerciser, OSX has Accessibility Inspector, Windows has Inspect. Each browser also comes packed with developer tools which incorporate functions for website accessibility inspection. They are all designed to help web authors better understand how screen readers see a web site. They represent website content in a quite techy way, in terms of APIs, objects and their properties, but it may give you a clue why accessibility doesn’t work as expected. It’s definitely worth a shot.

Let’s stop burrowing further into the tech details of how accessibility works; that should be enough context to get back to the original topic. In case you want to dive deeper, you can check out this podcast where Igalians discuss tech accessibility.

So what makes Chromium accessibility tools special?

  • First, they are cross platform. You have a single tool for all major platforms: Windows, OSX and Linux. It is handy and makes a learning curve shallow.
  • Second, these tools are open source, you know what happens under-the-hood and you can always add a new feature if missing.
  • And finally, these are CLI tools and can be used as a building block to create other accessibility inspection applications. This is my favourite part. I think this latter argument captures the spirit of the Unix world quite nicely, but let’s talk on this later.

Chromium accessibility is packed with two tools:

  • ax_dump_tree, which is used to dump the accessible tree of an application. This makes it quite similar to the well known inspection tools (see above), and
  • ax_dump_events, which is designed to log all accessibility events coming from an application.

Being able to inspect an accessible tree, which shows the hierarchy of accessible elements and includes all accessibility properties like name and description, is a quite handy feature when you dig into accessibility bugs.

Accessibility event logging can also be super helpful. It’s not something typically supported out of the box by existing accessibility tools, but it is a super important bit of information. Indeed, events are the primary notification mechanism used by the assistive technologies to pick up changes from a website. If events are broken, then the assistive technologies get broken too. This makes the ax_dump_events tool special, and is another good reason to try it next time :)

These tools support a bunch of pre-defined selectors to grab a tree or record events in browsers in no time. For example:

  • --chrome and --chromium for Chrome and Chromium
  • --firefox for Firefox
  • --edge for Edge
  • --safari for Safari

For example, you do:

ax_dump_tree --edge

to dump the accessible tree of Microsoft Edge browser on Windows.

Same with events. To start recording events for Safari browser you do:

ax_dump_events --safari

If you need to scope the tool to a web site (as opposed to the whole browser), you have the --active-tab selector to do so. In case of multiple windows of the same application, you can tune your search with the --pattern option. For non-browser applications you can specify a process ID via the --pid option. See the docs for full documentation.
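
For example, combining the selectors should let you dump just the active tab of Chromium (a sketch based on the options above):

ax_dump_tree --chromium --active-tab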

Let’s recap some of the above. You’d want to use these tools when:

  • You need to inspect an accessible tree for a website/test case, or when you need to check which accessibility events are fired: sometimes it’s important to understand how the assistive technologies see your website (Say “hi” to web authors);
  • You need to compare accessible trees/events between browsers. It’s helpful when you need to figure out why a screen reader works in one browser and doesn’t work in another (say “hi” to browser developers);
  • It’s useful to find implementation gaps and inconsistencies (say “hi” to spec authors).

There is no official place for the tools yet. You can build them yourself from the Chromium sources. I have also prepared nightly builds and uploaded them to my Google Drive for your convenience.

The tools are awesome: fast, reliable and easy to use. They are perfect except … they have no GUI :) All console tools have such flaws, just by design. Indeed, a CLI lacks certain useful options inherent to GUI tools, for example, the click-to-inspect feature, which allows you to select a DOM element and inspect its accessible tree. This is very handy. You can find the feature in literally any accessibility tool, and not having it is a real bummer.

But the no-GUI disadvantage can become a benefit: the tools can be easily incorporated into other accessibility tools as an accessibility inspection engine. You can create a cross-platform GUI tool backed, for example, by Electron and powered by Chromium’s ax_dump tools. Indeed, you don’t have to C++ or Python your way into web application accessibility; you can rely on the ax_dump tools’ output instead. Chromium’s developer tools, not surprisingly, are based on these very same CLI tools.

And last but not least, it opens a new world for cross-browser testing. Now you should be able to automate the platform accessibility mapping, for example HTML-AAM, and it might be an easier solution than the Accessible Testing Protocol. Needless to say, Chromium already uses the tools in its automated test suite.

This is the beauty of the Unix world: you have a basic but powerful command line tool which can serve as an engine to build anything you want upon it: GUI accessibility tools, automated testing, or you can use the tools as they are, because they are just cool.

by Alexander Surkov at April 22, 2021 07:22 PM

Imanol Fernandez

WebXR landing in WebKit

Since I joined Igalia, I have been working on finishing up the core WebXR implementation in WebKit, focused on the DOM, render loop, graphics and input sources. We are targeting the OpenXR API and we have reached the point where we are able to run some demos, so it is a good time to share a summary of the work that we have done so far. You can check all the patches and the code at the WebKit WebXR module.

April 22, 2021 10:45 AM

April 21, 2021

Víctor Jáquez

Review of Igalia Multimedia activities (2020/H2)

As the first quarter of 2021 has already come to a close, we reckon it’s time to recap our achievements from the second half of 2020 and update you on the improvements we have been making to the multimedia experience on the Web and Linux in general.

Our previous reports:

WPE / WebKitGTK

We have closed ~100 issues related to multimedia in WebKitGTK/WPE, such as fixing seek issues during playback, plugging memory leaks, gardening tests, improving the Flatpak-based development workflow, enabling new codecs, etc. Overall, we improved the multimedia user experience a bit on these WebKit engine ports.

To highlight a couple of tasks: we did some maintenance work on the WebAudio backends, and we upstreamed an internal audio mixer which keeps only one connection to the audio server, such as PulseAudio, instead of multiple connections, one for every audio resource. The mixer combines all streams into a single audio server connection.

Adaptive media streaming for the Web (MSE)

We have been working on a new MSE backend for a while, but along the way many related bugs have appeared and been squashed, and many code cleanups have been carried out. Though it has been like yak shaving, we are confident that we will reach the end of this long and winding road soonish.

DRM media playback for the Web (EME)

Regarding digitally protected media playback, we worked to upstream OpenCDM support, with Widevine, through RDK’s Thunder framework, while continuing with the usual maintenance of the other key systems, such as Clear Key, Widevine and PlayReady.

For more details we published a blog post: Serious Encrypted Media Extensions on GStreamer based WebKit ports.

Realtime communications for the Web (WebRTC)

Just like EME, WebRTC is not currently enabled by default in browsers such as Epiphany because of license problems, but it is available for custom adopters, and we are maintaining it. For example, we collaborated on upgrading LibWebRTC to M87 and fixed the expected regressions, plus the usual gardening.

Along the way we experimented a bit with the new GPUProcess for capture devices, but we decided to stop the experimentation while waiting for a broader adoption of the process, for example in graphics rendering, in WPE/WebKitGTK.

GPUProcess work will be resumed at some point; it’s not currently a hard requirement, since we have already moved capture device handling from the UIProcess to the WebProcess, isolating all GStreamer operations in the latter.

GStreamer

GStreamer is one of our core multimedia technologies, and we contribute to it on a daily basis. We pushed ~400 commits, with a similar number of code reviews, during the second half of 2020. Among those contributions let us highlight the following:

  • A lot of bug fixing aiming for release 1.18.
  • Reworked and enhanced decodebin3, the GstTranscoder API and encodebin.
  • Merged av1parse in video parsers plugin.
  • Merged qroverlay plugin.
  • Iterated on the mono-repo proposal, which requires consensus and coordination among the whole community.
  • The gstwpe element has been greatly improved, driven by new user requests.
  • Contributed to the new libgstcodecs library, which enables stateless video decoders across different platforms (for example, v4l2, d3d11, va, etc.).
  • Developed a new plugin for VA-API using this library, exposing H.264, H.265, VP9, VP8 and MPEG2 decoders and a full-featured postprocessor, with better performance, according to our measurements, than GStreamer-VAAPI.

Conferences

Although 2020 was not a year for in-person conferences, many of them went virtual. We attended one, the Mile High Video conference, and participated in its Slack workspace.

Thank you for reading this report, and stay tuned for more of our work.

by vjaquez at April 21, 2021 04:49 AM

April 20, 2021

Danylo Piliaiev

Turnips in the wild (Part 1)

Running games and benchmarks is much more exciting than trying to fix a handful of remaining synthetic tests. Turnip, which is an open-source Vulkan driver for recent Adreno GPUs, should already be capable of running real-world applications, and they always find a way to break the driver in new, unexpected ways.

TauCeti Vulkan Technology Benchmark

The benchmark greeted me with this wonderful field which looked a little blocky:

Screenshot of the benchmark with grass/dirt field which looks more like a patchwork

It’s not a crash, it’s not a hang; there is something wrong either with the textures or with a shader. So, let’s take a closer look at a frame. Here is the first major draw call that looks wrong:

Screenshot of a bad draw call being inspected in RenderDoc

Now, let’s take a look at textures used by the draw:

More than twelve textures used by the draw call

That’s a lot of textures! But all of them look fine; yes, some textures look blocky, but those are just small, so nothing is wrong here.

The next stop is the fragment shader. I recently implemented an extension which helps with that: VK_KHR_pipeline_executable_properties allows reporting additional information about the shaders in a pipeline. In the case of Turnip, that is statistics about registers and the assembly of the shader:

Part of the problematic shader with a few instructions having a warning printed near them
Excerpt from the problematic shader
sam.base0 (f32)(xy)r0.x, r0.z, a1.x	; no field 'HAS_SAMP', WARNING: unexpected bits[0:7] in #cat5-samp-s2en-bindless-a1: 0x6 vs 0x0

Bingo! Usually when the issue is in a shader I have to either make educated guesses and/or reduce the shader until the issue is apparent. Here, however, we can immediately spot instructions with a warning. They may not be the cause of the ground mis-rendering, but these instructions do sampling from textures, soooo.

After fixing their encoding, which was caused by a mistake in the XML definition, the ground is now rendered correctly:

Screenshot of the benchmark with a field which now has proper grass and dirt

After looking a bit more at the frame I saw that the same issue had plagued all the rock formations, and now that is gone too. The final fix can be seen at

ir3/isa,parser: fix encoding and parsing of bindless s2en SAM (!9628)

Genshin Impact

Now it’s time for a more heavyweight opponent: Genshin Impact, one of the most popular mobile games at the moment. Genshin Impact supports both GLES and Vulkan, and defaults to GLES on all but a few devices. In any case, Vulkan can be enabled by editing a config file.

There are several mis-renderings in the game; however, here I will describe the one which shows why it is important to use all available validation tooling for an API as complex as Vulkan. The other mis-renderings will be a matter for the second post.

Gameplay – Disco Water

Proceeding to the gameplay and running around for a bit revealed major artifacts on the water surface:

Screenshot of the gameplay with body of water that has large colorful artifacts

This one was a multi-frame effect, so I had to capture a trace of all frames and then find the one where the issue began. Here is the draw call I found:

GIF showing before and after problematic draw call

It adds water to the first framebuffer and a lot of artifacts to the second one. Looking at the framebuffer configuration, I could see that the fragment shader doesn’t write to the second framebuffer. Is that allowed? Yes. Is the behavior defined? No! VK_LAYER_KHRONOS_validation warns us about it, if warnings are enabled:

UNASSIGNED-CoreValidation-Shader-InputNotProduced(WARN / SPEC): Validation Warning:
Attachment 1 not written by fragment shader; undefined values will be written to attachment

“undefined values will be written to attachment” may mean that nothing is written into it, as expected by the game, or that random values are written to the second attachment, as Turnip does. Such behavior should not be relied upon, so Turnip does not contradict the specification here and there is nothing to fix. Case closed.
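
To make the situation concrete, here is a minimal GLSL sketch of such a fragment shader (illustrative only, not the game’s actual shader): the framebuffer has two color attachments, but only the first output is declared and written, so attachment 1 receives undefined values.

#version 450
// Output for attachment 0 only; attachment 1 exists in the
// framebuffer but has no corresponding output variable, so its
// contents after this draw are undefined per the Vulkan spec.
layout(location = 0) out vec4 out_color0;

void main() {
    out_color0 = vec4(0.0, 0.3, 0.8, 0.5); // some water-like color
}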

More mis-renderings and investigations to come in the next post(s).

Turnips in the wild (Part 2)

by Danylo Piliaiev at April 20, 2021 09:00 PM

Eleni Maria Stea

EXT_external_objects and EXT_external_objects_fd for the Intel iris driver have been merged into mesa3D! [updated]

This post is a quick status update on OpenGL and Vulkan Interoperability extensions for Linux mesa3D drivers: Both EXT_external_objects and EXT_external_objects_fd implementations for the Intel iris driver have been finally merged into mesa3D earlier today and will be available in next release! 🎉 Parts of these extensions were already upstream (EXT_semaphore, EXT_semaphore_fd, and some drivers … Continue reading EXT_external_objects and EXT_external_objects_fd for the Intel iris driver have been merged into mesa3D! [updated]

by hikiko at April 20, 2021 07:49 PM

Enrique Ocaña

GStreamer WebKit debugging tricks using GDB (2/2)

This post is a continuation of a series of blog posts about the most interesting debugging tricks I’ve found while working on GStreamer WebKit on embedded devices. These are the other posts of the series published so far:

Print corrupt stacktraces

In some circumstances you may get stacktraces that eventually stop because of missing symbols or corruption (?? entries).

#3  0x01b8733c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

However, you can print the stack in a useful way that gives you leads about what was next in the stack:

  • For i386: x/256wa $esp
  • For x86_64: x/256ga $rsp
  • For ARM 32 bit: x/256wa $sp

You may want to enable asm-demangle: set print asm-demangle

Example output, the 3 last lines give interesting info:

0x7ef85550:     0x1b87400       0x2     0x0     0x1b87400
0x7ef85560:     0x0     0x1b87140       0x1b87140       0x759e88a4
0x7ef85570:     0x1b87330       0x759c71a9 <gst_base_sink_change_state+956>     0x140c  0x1b87330
0x7ef85580:     0x759e88a4      0x7ef855b4      0x0     0x7ef855b4
...
0x7ef85830:     0x76dbd6c4 <WebCore::AppendPipeline::resetPipeline()::__PRETTY_FUNCTION__>        0x4     0x3     0x1bfeb50
0x7ef85840:     0x0     0x76d59268      0x75135374      0x75135374
0x7ef85850:     0x76dbd6c4 <WebCore::AppendPipeline::resetPipeline()::__PRETTY_FUNCTION__>        0x1b7e300       0x1d651d0       0x75151b74

More info: 1

Sometimes the symbol names aren’t printed in the stack memdump. You can do this trick to iterate the stack and print the symbols found there (take with a grain of salt!):

(gdb) set $i = 0
(gdb) p/a *((void**)($sp + 4*$i++))

[Press ENTER multiple times to repeat the command]

$46 = 0xb6f9fb17 <_dl_lookup_symbol_x+250>
$58 = 0xb40a9001 <g_log_writer_standard_streams+128>
$142 = 0xb40a877b <g_return_if_fail_warning+22>
$154 = 0xb65a93d5 <WebCore::MediaPlayerPrivateGStreamer::changePipelineState(GstState)+180>
$164 = 0xb65ab4e5 <WebCore::MediaPlayerPrivateGStreamer::playbackPosition() const+420>
...

Many times it’s just a matter of gdb not having loaded the unstripped version of the library. /proc/<PID>/smaps and info proc mappings can help to locate the library providing the missing symbol. Then we can load it by hand.

For instance, for this backtrace:

#0  0x740ad3fc in syscall () from /home/enrique/buildroot-wpe/output/staging/lib/libc.so.6 
#1  0x74375c44 in g_cond_wait () from /home/enrique/buildroot-wpe/output/staging/usr/lib/libglib-2.0.so.0 
#2  0x6cfd0d60 in ?? ()

In a shell, we examine smaps and find out that the unknown piece of code comes from libgstomx:

$ cat /proc/715/smaps
...
6cfc1000-6cff8000 r-xp 00000000 b3:02 785380     /usr/lib/gstreamer-1.0/libgstomx.so
...

Now we load the unstripped .so in gdb and we’re able to see the new symbol afterwards:

(gdb) add-symbol-file /home/enrique/buildroot-wpe/output/build/gst-omx-custom/omx/.libs/libgstomx.so 0x6cfc1000
(gdb) bt
#0  0x740ad3fc in syscall () from /home/enrique/buildroot-wpe/output/staging/lib/libc.so.6
#1  0x74375c44 in g_cond_wait () from /home/enrique/buildroot-wpe/output/staging/usr/lib/libglib-2.0.so.0
#2  0x6cfd0d60 in gst_omx_video_dec_loop (self=0x6e0c8130) at gstomxvideodec.c:1311
#3  0x6e0c8130 in ?? ()

Useful script to prepare the add-symbol-file:

cat /proc/715/smaps | grep '[.]so' | sed -e 's/-[0-9a-f]*//' | { while read ADDR _ _ _ _ LIB; do echo "add-symbol-file $LIB 0x$ADDR"; done; }

More info: 1

The “figuring out corrupt ARM stacktraces” post has some additional info about how to use addr2line to translate memory addresses to function names on systems with a hostile debugging environment.
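
As a quick illustration using the libgstomx example above (output approximate): the unknown frame was at 0x6cfd0d60 and the library base was 0x6cfc1000, so the offset inside the .so is 0xfd60, which addr2line (-e selects the binary, -f also prints the function name) can resolve against the unstripped library:

$ addr2line -f -e /home/enrique/buildroot-wpe/output/build/gst-omx-custom/omx/.libs/libgstomx.so 0xfd60
gst_omx_video_dec_loop
gstomxvideodec.c:1311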

Debugging a binary without debug symbols

There are times when there’s just no way to get debug symbols working, or when we’re simply debugging a release version of the software. In those cases, we must debug the assembly code directly. The gdb text user interface (TUI) can be used to examine the disassembled code and the CPU registers. It can be enabled with these commands:

layout asm
layout regs
set print asm-demangle

Some useful keybindings in this mode:

  • Arrows: scroll the disassemble window
  • CTRL+p/n: Navigate history (previously done with up/down arrows)
  • CTRL+b/f: Go backward/forward one character (previously left/right arrows)
  • CTRL+d: Delete character (previously “Del” key)
  • CTRL+a/e: Go to the start/end of the line

This screenshot shows how we can infer that an empty RefPtr is causing a crash in some WebKit code.

Wake up an unresponsive gdb on ARM

Sometimes, when you continue (‘c’) execution on ARM there’s no way to stop it again unless a breakpoint is hit. But there’s a trick to retake control: just send a harmless signal to the process.

kill -SIGCONT 1234

Know which GStreamer thread id matches with each gdb thread

Sometimes you need to match threads in the GStreamer logs with threads in a running gdb session. The simplest way is to ask GThread itself from each gdb thread:

(gdb) set output-radix 16
(gdb) thread apply all call g_thread_self()

This will print a list of gdb threads and GThread*. We only need to find the one we’re looking for.

Generate a pipeline dump from gdb

If we have a pointer to the pipeline object, we can call the function that dumps the pipeline:

(gdb) call gst_debug_bin_to_dot_file_with_ts((GstBin*)0x15f0078, GST_DEBUG_GRAPH_SHOW_ALL, "debug")

by eocanha at April 20, 2021 06:00 AM

April 19, 2021

Samuel Iglesias

Low Resolution Z Buffer support on Turnip

Last year I worked on implementing in Turnip the support for a HW feature present in Qualcomm Adreno GPUs: the low-resolution Z buffer (aka LRZ). This is a HW feature already supported in Freedreno, which is the open-source OpenGL driver for these GPUs.

What is low-resolution Z buffer

The low-resolution Z buffer is very similar to a depth prepass that helps the HW avoid executing the fragment shader on those fragments that will subsequently be discarded by the depth test (hidden surface removal). This feature comes with some limitations though, such as the fragment shader not being allowed to have side effects (writing to SSBOs, atomic operations, etc), among others.

The interesting part of this feature is that it allows applications to submit their vertices in any order (saving the CPU time that was otherwise spent sorting them), and the HW will process them in the binning pass as explained below, detecting which ones are occluded; this gives a performance increase in some specific use cases.

Tiled-rendering

To understand better how LRZ works, we need to talk a bit about tiled-based rendering. This is a way of rendering based on subdividing the framebuffer in tiles and rendering each tile separately. The advantage of this design is that the amount of memory and bandwidth is reduced compared to immediate mode rendering systems that draw the entire frame at once. Tile-based rendering is very popular on embedded GPUs, including Qualcomm Adreno.

Entering into more details, the graphics pipeline is divided into three different passes executed per tile of the frame.

Tiled-rendering architecture diagram
Tiled-rendering architecture diagram.
  • The binning pass. This pass processes the geometry of the scene and records in a table which tiles each primitive will be rendered on. This way, the HW only needs to render the primitives that affect a specific tile when that tile is processed.

  • The rendering pass. This pass takes the rasterized primitives and executes all the fragment-related stages of the pipeline (fragment shader execution, depth pass, stencil pass, blending, etc). Once it finishes, the resolve pass starts.

  • The resolve pass. It first resolves the tile buffer (GMEM) if it is multisampled, and copies the final color and depth values for all tile pixels back to system memory. If it is the last tile of the framebuffer, it swaps buffers and starts the binning pass for the next frame.

Where is LRZ used, then? Well, in both the binning and rendering passes. In the binning pass, it is possible to store the depth value of each vertex of the scene’s geometry in a buffer, as the HW has that data available. That is the depth buffer used internally for LRZ. It has a lower resolution, as that much detail is not needed, which helps to save bandwidth while transferring its contents to system memory.

Thanks to LRZ, the rendering pass is only executed on the fragments that are going to be visible at the end. However, there are some limitations, as mentioned before: if a fragment shader has side effects (writing to SSBOs, atomics, etc), if blending is enabled, or if the fragment shader could modify the fragment’s depth… then LRZ cannot be used, as the results may be wrong.

However, LRZ brings a couple of things to the table that make it interesting. One is that applications don’t need to reorder their primitives before submission to be more efficient; that is done automatically by the HW with LRZ. Another is performance improvements in some use cases. For example, imagine a fragment shader that discards parts of fragments but doesn’t have any other side effect. In that case, although we cannot do early depth testing, we can do early LRZ, as we know that some fragments won’t pass a depth test even if they are not discarded by the fragment shader.

Turnip implementation

Moving on to the implementation itself, I took Freedreno’s code as a starting point to implement LRZ on Turnip. After some months of work, it finally landed in Mesa master.

Last week, more patches related to LRZ landed in Mesa master: the ones fixing LRZ interactions with VK_EXT_extended_dynamic_state, as with this extension the application can change, at command buffer recording time, some states that could affect the LRZ state; therefore, we need to track them accordingly.

I also implemented some LRZ improvements that have now also landed (thanks Eric Anholt!), such as support for the early-LRZ-late-depth test that I mentioned before, which could bring a performance improvement in some applications.

LRZ improvements
Left: original Vulkan tutorial demo implementation. Right: the same demo modified to discard fragments with a red component lower than 0.5f.

For instance, I did some measurements in a vulkan-tutorial.com implementation of my own that I modified to discard a significant number of fragments (see the previous figure). This is one of the cases where the early-LRZ-late-depth test helps to improve performance.
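
The modification was conceptually as simple as this fragment shader sketch (not the demo’s exact code):

#version 450
layout(location = 0) in vec3 frag_color;
layout(location = 0) out vec4 out_color;

void main() {
    // Discarding disables early depth testing, but early LRZ can
    // still reject fragments that would fail the depth test anyway.
    if (frag_color.r < 0.5)
        discard;
    out_color = vec4(frag_color, 1.0);
}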

When running the modified demo with these patches, I found a performance improvement of between 13% and 16%.

Acknowledgments

All this LRZ work was my first big contribution to this open-source reverse-engineered driver! I don’t want to finish this post without publicly thanking Rob Clark for the original Freedreno implementation and his reviews of my work, as well as Jonathan Marek and Connor Abbott for their insightful reviews, advice and tips to make it work. Edited: Many thanks to Eric Anholt for his reviews of the last two patch series!

Happy hacking!

April 19, 2021 07:40 AM

April 16, 2021

Eric Meyer

Scripted Server Startup for MDN and WPT

A sizeable chunk of my work at Igalia so far involves editing and updating the Mozilla Developer Network (MDN), and a smaller chunk has me working on the Web Platform Tests (WPT).  In both cases, the content is stored in large public repositories (MDN, WPT) and contributors are encouraged to fork the repositories, clone them locally, and push updates via the fork as PRs (Pull Requests).  And while both repositories roll in localhost web server setups so you can preview your edits locally, each has its own.

As useful as these are, if you ignore the whole “auto-force a browser page reload every time the file is modified in any way whatsoever” thing that I’ve been trying very hard to keep from discouraging me from saving often, each has to be started in its own way, from within their respective repository directories, and it’s generally a lot more convenient to do so in a separate Terminal window.

I was getting tired of constantly opening a new Terminal window, cding into the correct place, remembering the exact invocation needed to launch the local server, and on and on, so I decided to make my life slightly easier with a few short scripts and aliases.  Maybe this will be useful to you as well.

First, I decided to keep things relatively simple.  Instead of writing a small program that would handle all server startups by parsing shell arguments and what have you, I wrote a couple of very similar shell scripts.  Here’s the script for launching MDN’s localhost:

#!/bin/bash
cd ~/repos/mdn/content/
yarn start

Then I added an alias to ~/.bashrc which employs a technique I swiped from this Stack Overflow answer.

alias mdn-server="open -a Terminal.app ~/bin/mdn-start.bsh"

Translated into English, that means “open the file ~/bin/mdn-start.bsh using the -application Terminal.app”.

Thus, when I type mdn-server in any command prompt, a new Terminal window will open and the shell script mdn-start.bsh will be run; the script switches into the needed directory and launches the localhost server using yarn, as per the MDN instructions.  What’s more, when I’m done working on MDN, I can switch to the window running the server, stop the server with ⌃C (control-C), and the Terminal window closes automatically.

I did something very similar for WPT, except in this case the alias reads:

alias wpt-server="open -a Terminal.app ~/bin/wpt-serve.bsh"

And the script to which it points reads:

#!/bin/bash
cd ~/repos/wpt/
./wpt serve

As I mentioned before, I chose to do it this way rather than writing a single alias (say, local-server) that would accept arguments (mdn, wpt, etc.) and fire off scripts accordingly, but that’s also an option and a viable one at that.
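
For reference, that single-entry-point variant could look something like this (a sketch reusing the scripts above; local-server is a hypothetical name):

#!/bin/bash
# local-server: open a Terminal window running the requested server.
case "$1" in
  mdn) open -a Terminal.app ~/bin/mdn-start.bsh ;;
  wpt) open -a Terminal.app ~/bin/wpt-serve.bsh ;;
  *)   echo "usage: local-server [mdn|wpt]" >&2; exit 1 ;;
esac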

So that’s my little QoL (Quality of Life) upgrade to make working on MDN and WPT a little easier.  I hope it helps you in some way!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at April 16, 2021 01:55 PM

April 13, 2021

Eleni Maria Stea

Using EGL and the dma_buf kernel framework to associate two textures with the contents of the same buffer without copy taking place

It’s been a few weeks I’ve been experimenting with EGL/GLESv2 as part of my work for WebKit (Browsers) team of Igalia. One thing I wanted to familiarize with was using shared DMA buffers to avoid copying textures in graphics programs. I’ve been experimenting with the dma_buf API, which is a generic Linux kernel framework for … Continue reading Using EGL and the dma_buf kernel framework to associate two textures with the contents of the same buffer without copy taking place

by hikiko at April 13, 2021 01:12 PM

Enrique Ocaña

GStreamer WebKit debugging tricks using GDB (1/2)

I’ve been developing and debugging desktop and mobile applications on embedded devices over the last decade or so. The main part of this period I’ve been focused on the multimedia side of the WebKit ports using GStreamer, an area that is a mix of C (glib, GObject and GStreamer) and C++ (WebKit).

Over these years I’ve had to work on ARM embedded devices (mobile phones, set-top boxes, Raspberry Pi using buildroot) where most of the environment aids and tools we take for granted on a regular x86 Linux desktop just aren’t available. In these situations you have to be imaginative and find your own way to get the work done and debug the issues you find along the way.

I’ve been writing down the most interesting tricks I’ve found along this journey and I’m sharing them with you in a series of 7 blog posts, one per week. Most of them aren’t mine, and the ones I learnt at the beginning of my career may even seem a bit naive, but I find them worth sharing anyway. I hope you find them as useful as I do.

Breakpoints with command

You can break at a certain place, run some commands and continue execution. Useful to get logs:

break getenv
command
 # This disables scroll continue messages
 # and suppresses output
 silent
 set pagination off
 p (char*)$r0
continue
end

break grl-xml-factory.c:2720 if (data != 0)
command
 call grl_source_get_id(data->source)
 # $ is the last value in the history, the result of
 # the previous call
 call grl_media_set_source (send_item->media, $)
 call grl_media_serialize_extended (send_item->media, 
  GRL_MEDIA_SERIALIZE_FULL)
 continue
end

This idea can be combined with watchpoints and applied to trace reference counting in GObjects, to know from which places the refcount is increased and decreased.
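
For instance, a watchpoint on the ref_count field can log a short backtrace on every change (a sketch; the object address is hypothetical):

watch -l ((GObject *) 0x12345678)->ref_count
commands
 silent
 bt 5
 continue
end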

Force execution of an if branch

Just wait until the if chooses a branch and then jump to the other one:

6 if (i > 3) {
(gdb) next
7 printf("%d > 3\n", i);
(gdb) break 9
(gdb) jump 9
9 printf("%d <= 3\n", i);
(gdb) next
5 <= 3

Debug glib warnings

If you get a warning message like this:

W/GLib-GObject(18414): g_object_unref: assertion `G_IS_OBJECT (object)' failed

the functions involved are g_return_if_fail_warning(), which calls g_log(). It’s good to set a breakpoint in either of the two:

break g_log

Another method is to export G_DEBUG=fatal_criticals, which will convert all criticals into crashes, which will stop the debugger.
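
For example, launching the application under gdb with that variable set (the binary name is just a placeholder; the environment variable is inherited by the debuggee):

G_DEBUG=fatal_criticals gdb --args ./my-application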

Debug GObjects

If you want to inspect the contents of a GObject that you have a reference to…

(gdb) print web_settings 
$1 = (WebKitWebSettings *) 0x7fffffffd020

you can dereference it…

(gdb) print *web_settings
$2 = {parent_instance = {g_type_instance = {g_class = 0x18}, ref_count = 0, qdata = 0x0}, priv = 0x0}

even if it’s an untyped gpointer…

(gdb) print user_data
(void *) 0x7fffffffd020
(gdb) print *((WebKitWebSettings *)(user_data))
{parent_instance = {g_type_instance = {g_class = 0x18}, ref_count = 0, qdata = 0x0}, priv = 0x0}

To find the type, you can use GType:

(gdb) call (char*)g_type_name( ((GTypeInstance*)0x70d1b038)->g_class->g_type )
$86 = 0x2d7e14 "GstOMXH264Dec-omxh264dec"

Instantiate C++ object from gdb

(gdb) call malloc(sizeof(std::string))
$1 = (void *) 0x91a6a0
(gdb) call ((std::string*)0x91a6a0)->basic_string()
(gdb) call ((std::string*)0x91a6a0)->assign("Hello, World")
$2 = (std::basic_string<char, std::char_traits<char>, std::allocator<char> > &) @0x91a6a0: {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x91a6f8 "Hello, World"}}
(gdb) call SomeFunctionThatTakesAConstStringRef(*(const std::string*)0x91a6a0)

See: 1 and 2

by eocanha at April 13, 2021 10:49 AM

April 11, 2021

Andy Wingo

guile's reader, in guile

Good evening! A brief(ish?) note today about some Guile nargery.

the arc of history

Like many language implementations that started life when you could turn on the radio and expect to hear Def Leppard, Guile has a bottom half and a top half. The bottom half is written in C and exposes a shared library and an executable, and the top half is written in the language itself (Scheme, in the case of Guile) and somehow loaded by the C code when the language implementation starts.

Since 2010 or so we have been working at replacing bits written in C with bits written in Scheme. Last week's missive was about replacing the implementation of dynamic-link from using the libltdl library to using Scheme on top of a low-level dlopen wrapper. I've written about rewriting eval in Scheme, and more recently about how the road to getting the performance of C implementations in Scheme has been sometimes long.

These rewrites have a quixotic aspect to them. I feel something in my gut about rightness and wrongness and I know at a base level that moving from C to Scheme is the right thing. Much of it is completely irrational and can be out of place in a lot of contexts -- like if you have a task to get done for a customer, you need to sit and think about minimal steps from here to the goal and the gut doesn't have much of a role to play in how you get there. But it's nice to have a project where you can do a thing in the way you'd like, and if it takes 10 years, that's fine.

But besides the ineffable motivations, there are concrete advantages to rewriting something in Scheme. I find Scheme code to be more maintainable, yes, and more secure relative to the common pitfalls of C, obviously. It decreases the amount of work I will have when one day I rewrite Guile's garbage collector. But also, Scheme code gets things that C can't have: tail calls, resumable delimited continuations, run-time instrumentation, and so on.

Taking delimited continuations as an example, five years ago or so I wrote a lightweight concurrency facility for Guile, modelled on Parallel Concurrent ML. It lets millions of fibers exist on a system. When a fiber needs to block on an I/O operation (read or write), it instead suspends its continuation and arranges to restart it when the operation becomes possible.

A lot had to change in Guile for this to become a reality. Firstly, delimited continuations themselves. Later, a complete rewrite of the top half of the ports facility in Scheme, to allow port operations to suspend and resume. Many of the barriers to resumable fibers were removed, but the Fibers manual still names quite a few.

Scheme read, in Scheme

Which brings us to today's note: I just rewrote Guile's reader in Scheme too! The reader is the bit that takes a stream of characters and parses it into S-expressions. It was in C, and now is in Scheme.

One of the primary motivators for this was to allow read to be suspendable. With this change, read-eval-print loops are now implementable on fibers.

Another motivation was to finally fix a bug in which Guile couldn't record source locations for some kinds of datums. It used to be that Guile would use a weak-key hash table to associate datums returned from read with source locations. But this only works for fresh values, not for immediate values like small integers or characters, nor does it work for globally unique non-immediates like keywords and symbols. So for these, we just wouldn't have any source locations.

A robust solution to that problem is to return annotated objects rather than using a side table. Since Scheme's macro expander is already set to work with annotated objects (syntax objects), a new read-syntax interface would do us a treat.
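
As a sketch of the difference between the two interfaces (assuming Guile 3.0’s read-syntax; the printed form of the result is approximate):

;; read returns a bare datum; source info needs a side table.
(call-with-input-string "(+ 1 2)" read)          ; => (+ 1 2)

;; read-syntax returns a syntax object that carries its own source
;; location, even for immediates like 1 and 2.
(call-with-input-string "(+ 1 2)" read-syntax)   ; => #<syntax (+ 1 2)>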

With read in C, this was hard to do. But with read in Scheme, it was no problem to implement. Adapting the expander to expect source locations inside syntax objects was a bit fiddly, though, and the resulting increase in source location information makes the output files bigger by a few percent -- due somewhat to the increased size of the .debug_lines DWARF data, but also due to serialized source locations for syntax objects in macros.

Speed-wise, switching to read in Scheme is a regression, currently. The old reader could parse around 15 or 16 megabytes per second when recording source locations on this laptop, or around 22 or 23 MB/s with source locations off. The new one parses more like 10.5 MB/s, or 13.5 MB/s with positions off, when in the old mode where it uses a weak-key side table to record source locations. The new read-syntax runs at around 12 MB/s. We'll be noodling at these in the coming months, but unlike when the original reader was written, at least now the reader is mainly used only at compile time. (It still has a role when reading s-expressions as data, so there is still a reason to make it fast.)

As is the case with eval, we still have a C version of the reader available for bootstrapping purposes, before the Scheme version is loaded. Happily, with this rewrite I was able to remove all of the cruft from the C reader related to non-default lexical syntax, which simplifies maintenance going forward.

An interesting aspect of attempting to make a bug-for-bug rewrite is that you find bugs and unexpected behavior. For example, it turns out that since the dawn of time, Guile always read #t and #f without requiring a terminating delimiter, so reading "(#t1)" would result in the list (#t 1). Weird, right? Weirder still, when the #true and #false aliases were added to the language, Guile decided to support them by default, but in an oddly backwards-compatible way... so "(#false1)" reads as (#f 1) but "(#falsa1)" reads as (#f alsa1). Quite a few more things like that.

All in all it would seem to be a successful rewrite, introducing no new behavior, even producing the same errors. However, this is not the case for backtraces, which can expose the guts of read in cases where that previously wouldn't happen because the C stack was opaque to Scheme. Probably we will simply need to add more sensible error handling around callers to read, as a backtrace isn't a good user-facing error anyway.

OK enough rambling for this evening. Happy hacking to all and to all a good night!

by Andy Wingo at April 11, 2021 07:51 PM

April 08, 2021

Andy Wingo

sign of the times

Hello all! There is a mounting backlog of things that landed in Guile recently and to avoid having to eat the whole plate in one bite, I'm going to try to send some shorter missives over the next weeks.

Today's is about a silly thing, dynamic-link. This interface is dlopen, but "portable". See, back in the day -- like, 1998 -- there were lots of kinds of systems and how to make and load a shared library portably was hard. You'd have people with AIX and Solaris and all kinds of weird compilers and linkers filing bugs on your project if you hard-coded a GNU toolchain invocation when creating loadable extensions, or hard-coded dlopen or similar to use them.

Libtool provided a solution to create portable loadable libraries, which involved installing .la files alongside the .so files. You could use libtool to link them to a library or an executable, or you could load them at run-time via the libtool-provided libltdl library.

But, the .la files were a second source of truth, and thus a source of bugs. If a .la file is present, so is an .so file, and you could always just use the .so file directly. For linking against an installed shared library on modern toolchains, the .la files are strictly redundant. Therefore, all GNU/Linux distributions just delete installed .la files -- Fedora, Debian, and even Guix do so.

Fast-forward to today: there has been a winnowing of platforms, and a widening of the GNU toolchain (in which I include LLVM as well as it has a mostly-compatible interface). The only remaining toolchain flavors are GNU and Windows, from the point of view of creating loadable shared libraries. Whether you use libtool or not to create shared libraries, the result can be loaded either way. And from the user side, dlopen is the universally supported interface, outside of Windows; even Mac OS fixed their implementation a few years back.

So in Guile we have been in an unstable equilibrium: creating shared libraries by including a probably-useless libtool into the toolchain, and loading them by using a probably-useless libtool-provided libltdl.

But the use of libltdl has not been without its costs. Because libltdl intends to abstract over different platforms, it encourages you to leave off the extension when loading a library, instead promising to try a platform-specific set such as .so, .dll, .dylib etc as appropriate. In practice the abstraction layer was under-maintained and we always had some problems on Mac OS, for example.

Worse, as ltdl would search through the path for candidates, it would only report the last error it saw from the underlying dlopen interface. It was almost always the case that if A and B were in the search path, and A/foo.so failed to load because of a missing dependency, the error you would get as a user would instead be "file not found", because ltdl swallowed the first error and kept trucking to try to load B/foo.so which didn't exist.

In summary, this is a case where the benefits of an abstraction layer decline over time. For a few years now, libltdl hasn't been paying for itself. Libtool is dead, for all intents and purposes (last release in 2015); best to make plans to migrate away, somehow.

In the case of the dlopen replacement, in Guile we ended up rewriting the functionality in Scheme. The underlying facility is now just plain dlopen, for which we shim a version of dlopen on Windows, inspired by the implementation in cygwin. There are still platform-specific library extensions, but that is handled easily on the Scheme layer.

Looking forward, I think it's probably time to replace Guile's use of libtool to create its libraries and executables. I loathe the fact that libtool puts shell scripts in the place of executables in build directories and stashes the actual executables elsewhere -- like, visceral revulsion. There is no need for that nowadays. Not sure what to replace it with, nor on what timeline.

And what about autotools? That, my friends, would be a whole nother blog post. Until then, & probably sooner, happy hacking!

by Andy Wingo at April 08, 2021 07:09 PM

Jacobo Aragunde

A recap of Chromium dialog accessibility enhancements

It’s been a bit more than a year since I started working on Chromium accessibility. Although I’ve worked on several kinds of issues, a lot of them had to do with UI dialogs of some kind. Browser interfaces present many kinds of dialogs and these problems affected all the desktop platforms. This post is a recap of all the kinds of things that have been fixed related with different uses of dialogs in Chromium!

JavaScript prompt dialogs

Several JavaScript features cause dialogs to be shown: things like alert(), prompt() or even, in certain cases, the window.beforeunload event (if its handler is a function which returns a string, that string is presented in a dialog before unload happens). These were reported in #779501, #1164927 and #1166370. On Windows, ATs were not reading the actual content of these dialogs when they were announced, thus defeating their purpose. This was fixed by removing some workarounds in the MessageBoxView class, a reusable component found inside these dialogs which used to implement alert-like behavior. This confused some ATs, which saw an alert inside a dialog; now it’s the parent dialog that behaves like an alert if needed.

Once this was fixed, we took also care of some speech duplication caused by several items having the same accessible name, reported as #1185092.

Save password and credit card dialogs

But browser interfaces themselves also use dialogs to create (hopefully) helpful interfaces for their users, doing things like offering to save sensitive information such as a password or credit card details. Sadly, neither the save password nor the save credit card dialogs were announced (#1079320, #1119367, #1125118). A previous attempt to enforce announcing “save password” (which I talked about in a previous post) caused it to be announced twice when using the toolbar icon to open it (#1132318).

These dialogs are particular because they have two behaviors: they can be explicitly opened using a toolbar icon, in which case they get focus immediately and must use the standard dialog role, or they can be opened when a password is submitted, in which case they don’t get focus and so need to be announced as alerts. We fixed the problem by making sure we use one role or the other depending on how they are triggered, which enforces the correct code paths to produce alert events only when necessary.
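
In pseudo-Chromium terms, the fix boils down to something like this (a sketch: the class and member names are hypothetical; only the ax::mojom role values are real):

// Pick the accessible role based on how the bubble was opened.
ax::mojom::Role SaveBubbleView::GetAccessibleWindowRole() {
  // Opened from the toolbar icon: it takes focus, so a regular
  // dialog role is enough for it to be announced.
  if (opened_by_user_gesture_)  // hypothetical member
    return ax::mojom::Role::kDialog;
  // Opened on password submission: it does not take focus, so it
  // must be announced proactively, like an alert.
  return ax::mojom::Role::kAlertDialog;
}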

Another useful thing browsers offer to do is translate a page. This problem also affected the translate page dialog (#1119734); it shares code and behavior with the aforementioned ones and was fixed at the same time.

Accounts and sync menu

Accounts and sync, the avatar icon in the toolbar, looks and behaves more like a menu but is implemented with dialog (bubble) code. There were, unfortunately, some problems with it: it was not announced when opened (#1098304), and contents that were informative but not focusable were not announced (#1078580 and #1161166). We used a variety of techniques to fix the problems:

  • The bubble already used a menu-like container role, so we enforced it to behave completely like a menu, using all the corresponding roles and emitting menu open/close events.
  • We used the informative, inaccessible titles as accessible names for their containers, so ATs announce them by context when focus jumps inside the container, enforcing certain role names for this to work on Windows.
  • We made the top-most, inaccessible content (user name and email) part of the menu title, so it is announced by context when the menu/dialog is opened.

Extension-related dialogs

Extension management also involves a lot of dialogs (install, manage, remove, confirmation). We detected and fixed the issue #1123107: the “add extension” dialog was not properly announced on Linux.

System dialogs

Browsers also launch system dialogs (open, save, print, etc). In issue #1042864, keystroke events were not being produced in these cases. This was quite a complex one! I wrote two posts dedicated to just this problem and its solution.

Restore after crash dialog

If your browser crashes, a dialog is shown asking the user if they’d like to restore the tabs that were open before the crash. This dialog was not being announced (#1023822) and could not be focused with the Alt+Shift+A and F6 hotkeys (#1042010). These were the first issues I worked on, and I wrote a specific post about them as well.

Site permission dialogs

Dialogs are also used in browsers to ask for the user’s explicit permission to access powerful features. We briefly mentioned how these were fixed in a previous post: we fixed their announcement (#1052675) and their ability to be focused with hotkeys (#1052676). In the process, I wrote the code required to emit alert events in the Linux accessibility backend, which was still missing at that point.

Additionally, I added more tests and cleaned up things a bit, refactoring code and removing some redundant events. It’s been quite a trip, building and testing on all the desktop platforms, taking into account platform-specific behavior and AT quirks… I still plan to keep working on this area, hardening automated testing if possible to prevent regressions, which should reduce the amount of effort in the long run.

Thanks for reading, and happy hacking!

by Jacobo Aragunde Pérez at April 08, 2021 04:00 PM

April 06, 2021

Eleni Maria Stea

Setting up to debug ANGLE with GDB (on Linux)

ANGLE is an EGL/GLES2 implementation on top of other graphics APIs. It is mainly used in systems that lack a native GLES2/GLES3 driver and in some browsers for example Chromium. As recently, I’ve used it for some browsers related work in Igalia‘s WebKit team (more on that coming soon) and had to set it up … Continue reading Setting up to debug ANGLE with GDB (on Linux)

by hikiko at April 06, 2021 06:53 PM

April 04, 2021

Manuel Rego

:focus-visible in WebKit - March 2021

Another month is gone, and we are back with another status update (see January and February ones).

This is about the work Igalia is doing on the implementation of :focus-visible in WebKit. This is part of the Open Prioritization campaign and is being sponsored by many people. Thank you!

The work in March slowed down, so this status update is smaller than the previous ones. The main focus has been on spec discussions, trying to reach agreement.

Implementation details

The initial patch is available in the latest Safari Technology Preview (STP) releases behind a runtime flag, but it had an annoying bug that caused the body element to match :focus-visible when you used the keyboard to move focus. The issue was fixed last month but it hasn’t been included in an STP release yet (hopefully it’ll make it into release 124). Apart from that, some minor patches related to implementation details have landed too. But this was just a small part of the work during March.

In addition, I realized that :focus-visible appears in the Chromium and Firefox DevTools, so I took a look at how to make that happen in WebKit too. At that point I realized that :focus-within, which has been shipping for a long time, isn’t available in the WebKit Web Inspector yet, so I cooked up a simple patch to add it there. However, that hasn’t landed yet, because it needs some UI rework; otherwise the list of pseudo-classes is going to be too long and won’t look nice in the inspector. So the patch is waiting for some changes to the UI before it can be merged. Once that’s solved, adding :focus-within and :focus-visible to the Web Inspector is going to be pretty straightforward.

Spec discussions

This was the main part of the work during March, and the goal was to reach some agreement before finishing the implementation bits.

The main issue was how :focus-visible should work when a script moves focus. The behavior of the current implementations was not interoperable, the spec was not totally clear and, as explained in the previous report, I created a set of new tests to clarify this. These tests demonstrated some interesting incompatibilities. Based on this, we compared the results with the widely used polyfill as well. We found various misalignments on tricky cases, which generated significant discussion about which behavior was correct, and why. After considerable discussion with people from Google and Mozilla, it looks like we have finally reached an agreement on the expectations.
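
A minimal version of the kind of test case involved looks like this (illustrative only, not one of the actual WPT tests):

<style>
  :focus-visible { outline: 2px solid green; }
</style>
<button id="trigger">Click me</button>
<div id="target" tabindex="-1">Should this match :focus-visible?</div>
<script>
  // Focus is moved by script from a mouse-triggered handler; whether
  // #target should then match :focus-visible is exactly the kind of
  // case where implementations disagreed.
  document.getElementById('trigger').addEventListener('click',
      () => document.getElementById('target').focus());
</script>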

Next was to see if we could clarify the text so that these cases couldn’t be interpreted in importantly incompatible ways. Following the advice of the CSS Working Group, I worked on a PR for the HTML spec trying to define when a browser should draw a focus indicator, and thus match :focus-visible. Some discussion was raised about which elements should always match :focus-visible and how to define that in normative text (as some elements, like <select>, draw a focus ring in some browsers and not others when clicked, and some elements, like <input type="date">, allow keyboard input or not depending on the platform). The discussion is still ongoing, and we’re still trying to find the proper way to define this in the HTML spec. Anyway, if we manage to do that, it would be a great step forward for the interoperability of :focus-visible implementations, and a big win for the people ultimately using this feature.

Apart from that I’ve also created a test for my proposal about how :focus-visible should work in combination with Shadow DOM, but I don’t think I want to open that can of worms until the other part is done.

Some numbers

Let’s take a look at the numbers again, though things have moved slowly this time.

  • 21 PRs merged in WPT (1 in March).
  • 17 patches landed in WebKit (3 in March).
  • 7 patches landed in Chromium.
  • 2 PRs merged in CSS specs (1 in March).
  • 1 PR merged in HTML spec.

Next steps

The main goal for April is to close the spec discussions and land the PRs in the HTML spec, so we can focus again on finishing the implementation in WebKit.

However, if reaching an agreement on these changes to the HTML spec is not possible, we can probably still land some patches on the implementation side, to add some support for script focus in WebKit.

Stay tuned for more updates!

April 04, 2021 10:00 PM

March 31, 2021

Pablo Saavedra

Mini-PC Fulong 2.0 Inside

The history of the world is a continuous succession of contradictions. The announcement from MIPS Technologies about their decision to definitively abandon the MIPS architecture in favour of RISC-V is just another example. But the truth is that things are far from trivial in this area. Even though the end-of-life date for the MIPS architecture looks closer than ever, there are still infrastructures and platforms that need to be supported and maintained for this architecture in the meantime. To make the situation more complex, at the same time I am writing this post, Loongson Technology Ltd is announcing a new 16-core MIPS 12nm CPU for 256-core systems (Tom’s Hardware news). Loongson Technology also says that they keep a strong commitment to RISC-V for the future, but they will keep their bet on MIPS64 in the meantime. So if MIPS is going to die, it is going to be a lovely death.

In this context, here in Igalia we are hosting and maintaining the CI workers for the JavaScriptCore 32-bit (MIPS) infrastructure for the WebKit web browser engine.

No one ever said that finding end-user hardware for this kind of system is easy-peasy. The options on the market often don’t achieve a sufficient level of maturity or come with a poor set of hardware specifications. The choices are often not intended for long, time-consuming CPU tasks, or they simply lack good OS support (maintenance, updates, custom kernels, open-source drivers…).

Nowadays we are using a parallelized cluster of MIPSel CI 20 boards to run the JavaScriptCore 32-bit (MIPS) CI workers. Don’t get me wrong: the CI 20 boards are certainly not bad. These boards are really great for development and evaluation purposes, but even rare failures become commonplace when you run 30 of them 24/7 in parallel. For this reason, some time ago we started looking for an alternative that could eventually replace them. And this was when we found the following candidate.

The candidate

We had a look at what Debian was using for their QA infrastructure and talked to the MIPS team – credits to Berto García, who helped us with this – and we concluded that the Loongson 3B4000 MIPSel board was a promising option, so we decided to explore it.

We started looking for information about this CPU model and found references to the Loongson 3A4000 + Fuloong 2.0 Mini-PC. This computer is a very interesting end-user product based on the MIPS64el architecture. In particular, it uses a similar but more recent and powerful evolution of the Loongson 3B4000 processor. The Fuloong 2.0 comes in a barebone format with the Loongson-3A R4 (Loongson-3A4000) @ 1500MHz, a quad-core processor, with 8GB of DDR4 RAM and 1TB of internal NVMe storage. These technical specifications are completed with a Realtek ALC662 sound card, 2x USB 3.0 ports + 1x USB Type-C + 4x USB 2.0, 2x HDMI video outputs, 2x Ethernet (WGI211AT), audio connectors, an M.2 slot for a WiFi module and, finally, a Vivante GL1000 GPU (OpenGL ES 2.0/1.1). These specifications are clearly far from the common constraints of regular MIPS development boards, and make the machine technically a serious candidate to replace the current boards used in the CI cluster.

However, the acquisition of this kind of product has some non-technical cons that are important to keep in mind before taking any decision. For example, it is very difficult to find a reseller in Europe providing these machines. This means that the computer needs to be shipped directly from China, which also means that the acquisition process can suffer from the common problems of such orders: longer delivery times (~1 month), paperwork for customs, taxes, delivery tracking issues… Anyway, this post is intended to keep the focus on the technical details ;-). The fact is, once these issues are solved, you will receive a machine similar to the one shown in the photos:

Mini-PC Fulong 2.0 running Linux Distro Mini-PC Fulong 2.0 Closed Mini-PC Fulong 2.0 Inside

The unboxing

The machine comes with a pre-installed custom distro (“Dragon Dream F28”, based on Fedora 28). This distro is quite old but it is the one provided by the manufacturer (Lemote). Apparently it is the only one that, in theory, fully supports the machine. The installed image comes with a desktop environment on top of an X server. The distro is also synced with an RPM repository hosted by Lemote. This is really convenient to start experimenting with the computer and very useful to get information about the system before taking any action on the computer. Here is the output of some commands:

# cat /proc/cpuinfo
system type : generic-loongson-machine
machine : loongson,generic
processor : 0
cpu model : Loongson-3 V0.4 FPU V0.1
model name : Loongson-3A R4 (Loongson-3A4000) @ 1500MHz
CPU MHz : 1500.00
BogoMIPS : 2990.15
wait instruction : yes
microsecond timers : yes
tlb_entries : 2112
extra interrupt vector : no
hardware watchpoint : no
isa : mips1 mips2 mips3 mips4 mips5 mips32r1 mips32r2 mips64r1 mips64r2
ASEs implemented : vz msa loongson-mmi loongson-cam loongson-ext loongson-ext2
shadow register sets : 1
kscratch registers : 6
package : 0
core : 0
... (x4)

dmesg:

Mar 9 12:43:19 fuloong-01 kernel: [ 2.884260] Console: switching to colour frame buffer device 240x67 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.915928] loongson-drm 0000:00:06.1: fb0: loongson-drmdrm frame buffer device 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.919792] etnaviv 0000:00:06.0: Device 14:7a15, irq 93 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.920249] etnaviv 0000:00:06.0: model: GC1000, revision: 5037 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.920378] [drm] Initialized etnaviv 1.3.0 20151214 for 0000:00:06.0 on minor 1

lsblk:

# lsblk
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 190M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1,7G 0 part /boot
├─nvme0n1p3 259:3 0 7,5G 0 part [SWAP]
├─nvme0n1p4 259:4 0 46,6G 0 part /
└─nvme0n1p5 259:5 0 421,1G 0 part /home

Getting Debian into the Fuloong 2.0

The WebKitGTK and WPE WebKit CI infrastructure is entirely based on Debian Stable and/or Ubuntu LTS, in line with the WebKitGTK maintenance and development policy. For that reason we were pretty interested in getting the machine running Debian Stable (“buster” as of this writing). So what comes next is a description of the installation process of a pure Debian base system, hybridized with the Lemote Fedora Linux kernel, using an external USB storage stick as the bootable disk. The process is a mix between the following two documents:

Those documents provide a good detailed explanation of the steps to follow to perform the installation. Only the installation of the kernel and grub2-efi differs a bit, but let’s come back to that later. The idea is:

  • Set the EFI/BIOS to boot from the USB storage (EFI)
  • Install the base Debian OS in a external microSD card connected to the USB3-SS port
  • Keep using the internal nvme disk as the working dir (/home, /var/lib/lxc)

The installation process is initiated from the pre-installed Fedora image. The first action is to mount the external USB storage (sda) in the live system as follows:

# lsblk
sda 8:0 1 14,9G 0 disk
├─sda1 8:1 1 200M 0 part /mnt/debinst/boot/efi
└─sda2 8:2 1 10G 0 part /mnt/debinst
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 190M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1,7G 0 part /boot
├─nvme0n1p3 259:3 0 7,5G 0 part [SWAP]
├─nvme0n1p4 259:4 0 46,6G 0 part /
└─nvme0n1p5 259:5 0 421,1G 0 part /home

As I said, the steps to install the Debian system onto the SD card are quite straightforward. The problems begin during the installation of GRUB and the Linux kernel…

The Linux Kernel

Having followed the guide, we reach the “Install a Kernel” step. Debian provides a Loongson Linux 4.19 kernel for the Loongson 3A/3B boards.

ii linux-image-4.19.0-14-loongson-3 4.19.171-2 mips64el Linux 4.19 for Loongson 3A/3B
ii linux-image-loongson-3 4.19+105+deb10u9 mips64el Linux for Loongson 3A/3B (meta-package)
ii linux-libc-dev:mips64el 4.19.171-2 mips64el Linux support headers for userspace development

It is quite old in comparison with the one the Lemote Fedora distro contains (5.4.63-20201012-def), so I preferred to keep the latter, although it should be possible to get the machine running with the Debian kernel as well.

Grub2 EFI, first attempt trying to build it for the device

This is the main issue that I found. The first thing that I tried was to look for a GRUB package with EFI support in the mips64el Debian chroot:

root@fuloong-01:/# apt search grub | grep efi
<<empty>>

The frustration came quickly when I didn’t find any GRUB candidate. It was then that I remembered that there was a grub-yeeloong package in the Debian repository that could be useful in this case. The Yeeloong is the predecessor of the Loongson, so what I tried next was to rebuild the GRUB package, adding the mips64el architecture to the grub-yeeloong package. Something like the following:

  • Getting the Debian sources and dependencies for the grub2 packages:
    apt source grub2
    apt install debhelper patchutils python flex bison po-debconf help2man texinfo xfonts-unifont libfreetype6-dev gettext libdevmapper-dev libsdl1.2-dev xorriso parted libfuse-dev ttf-dejavu-core liblzma-dev wamerican pkg-config bash-completion build-essential
    
  • Patching the /debian/control file using this patch
  • … and then to build the Debian package:
    ~/debs# cd grub2-2.02+dfsg1 && dpkg-buildpackage
    
    ~/debs/grub2-2.02+dfsg1# ls ../
    grub-common-dbgsym_2.02+dfsg1-20+deb10u3_mips64el.deb grub-yeeloong_2.02+dfsg1-20+deb10u3_mips64el.deb grub2_2.02+dfsg1-20+deb10u3.debian.tar.xz grub2_2.02+dfsg1.orig.tar.xz
    grub-common_2.02+dfsg1-20+deb10u3_mips64el.deb grub2-2.02+dfsg1 grub2_2.02+dfsg1-20+deb10u3.dsc
    grub-mount-udeb_2.02+dfsg1-20+deb10u3_mips64el.udeb grub2-common-dbgsym_2.02+dfsg1-20+deb10u3_mips64el.deb grub2_2.02+dfsg1-20+deb10u3_mips64el.buildinfo
    grub-yeeloong-bin_2.02+dfsg1-20+deb10u3_mips64el.deb grub2-common_2.02+dfsg1-20+deb10u3_mips64el.deb grub2_2.02+dfsg1-20+deb10u3_mips64el.changes
    

The .deb package builds correctly, but the problem is the binary: it lacks EFI runtime support, so it is not useful in our case:

*******************************************************
GRUB2 will be compiled with following components:
Platform: mipsel-none <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
With devmapper support: Yes
With memory debugging: No
With disk cache statistics: No
With boot time statistics: No
efiemu runtime: No (only available on i386)
grub-mkfont: Yes
grub-mount: Yes
starfield theme: Yes
With DejaVuSans font from /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf
With libzfs support: No (need zfs library)
Build-time grub-mkfont: Yes
With unifont from /usr/share/fonts/X11/misc/unifont.pcf.gz
With liblzma from -llzma (support for XZ-compressed mips images)
With quiet boot: No
*******************************************************

This is what happens if you still try to install it:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# dpkg -i ../grub-yeeloong-bin_2.02+dfsg1-20+deb10u3_mips64el.deb ../grub-common_2.02+dfsg1-20+deb10u3_mips64el.deb ../grub2-common_2.02+dfsg1-20+deb10u3_mips64el.deb
root@fuloong-01:~/debs/grub2-2.02+dfsg1# grub-install /dev/sda
Installing for mipsel-loongson platform.
...
grub-install: warning: WARNING: no platform-specific install was performed. <<<<<<<<<<
Installation finished. No error reported.

There is no glue between EFI and GRUB. Files like BOOTMIPS.EFI, gcdmips64el.efi and grub.efi are missing, so this package is not useful at all:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/
System.map-4.19.0-14-loongson-3 config-4.19.0-14-loongson-3 efi grub grub.elf initrd.img-4.19.0-14-loongson-3 vmlinux-4.19.0-14-loongson-3
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/grub
fonts grubenv locale mipsel-loongson
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/efi/
<<empty>>

The grub-install command will also confirm that the mips64el-efi target is not supported:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# /usr/sbin/grub-install --help
Usage: grub-install [OPTION...] [OPTION] [INSTALL_DEVICE]
Install GRUB on your drive.
...
--target=TARGET install GRUB for TARGET platform
[default=mipsel-loongson]; available targets:
arm-efi, arm-uboot, arm64-efi, i386-coreboot,
i386-efi, i386-ieee1275, i386-multiboot, i386-pc,
i386-qemu, i386-xen, i386-xen_pvh, ia64-efi,
mips-arc, mips-qemu_mips, mipsel-arc,
mipsel-loongson, mipsel-qemu_mips,
powerpc-ieee1275, sparc64-ieee1275, x86_64-efi,
x86_64-xen

Second attempt, the loongson-community Grub2 EFI

Now that we know that we cannot use an official Debian package to install and configure GRUB, it is time for a bit of google-fu.

I must have had a lot of practice, since it only took me a short while to find that the Lemote Fedora distro provides its own GRUB package for the Loongson and, later, to find new hope reading this article. The article explains how to build the GRUB from loongson-community with EFI support, so the next obvious logical step was to try to build it and check it:

    • git clone https://github.com/loongson-community/grub.git
      cd grub
      bash autogen.sh
      ./configure --prefix=/opt/alternative/
      make ; make install
    • The configure output looks promising:
      *******************************************************
      GRUB2 will be compiled with following components:
      Platform: mips64el-efi <<<<<<<<<<<<<<<<< Looks good.
      With devmapper support: Yes
      With memory debugging: No
      With disk cache statistics: No
      With boot time statistics: No
      efiemu runtime: No (not available on efi)
      grub-mkfont: No (need freetype2 library)
      grub-mount: Yes
      starfield theme: No (No build-time grub-mkfont)
      With libzfs support: No (need zfs library)
      Build-time grub-mkfont: No (need freetype2 library)
      Without unifont (no build-time grub-mkfont)
      With liblzma from -llzma (support for XZ-compressed mips images)
      *******************************************************

    … but unfortunately I ran into more and more build errors at every step. Errors like these:

cc1: error: position-independent code requires ‘-mabicalls’
grub_script.yy.c:19:22: error: statement with no effect [-Werror=unused-value]
build-grub-module-verifier: error: unsupported relocation 0x51807.

… so after several attempts I finally gave up trying to build the loongson-community GRUB with EFI support. Here is the patch with some of the modifications that I tried in the code, just in case you are better at solving these build errors than I am.

Third attempt, reusing the GRUB2 EFI resources from the pre-installed system

… and the last one.

My winning horse was the simplest solution: reusing the /boot and /boot/efi directories installed by the Fedora system as the base for the new Debian system:

    • Clone the tree in the destination dir:
      cp -a /boot /mnt/debinst/boot
    • Replace the UUIDs in the copied boot configuration (patch); a sketch of this step follows the tree below.

    The /boot dir in the target installation will look like this:

    [root@fuloong-01 boot]# tree /mnt/debinst/boot/
    /mnt/debinst/boot/
    ├── boot -> .
    ├── config-5.4.60-1.fc28.lemote.mips64el
    ├── e8a27b4e4fcc4db9ab7a64bd81393773
    │   └── 5.4.60-1.fc28.lemote.mips64el
    │       ├── initrd
    │       └── linux
    ├── efi
    │   ├── boot
    │   │   ├── grub.cfg
    │   │   └── grub.efi
    │   ├── EFI
    │   │   ├── BOOT
    │   │   │   ├── BOOTMIPS.EFI
    │   │   │   ├── fonts
    │   │   │   │   └── unicode.pf2
    │   │   │   ├── gcdmips64el.efi
    │   │   │   ├── grub.cfg
    │   │   │   └── grubenv
    │   │   └── fedora
    │   ├── mach_kernel
    │   └── System
    │       └── Library
    │           └── CoreServices
    │               └── SystemVersion.plist
    ├── extlinux
    ├── grub2
    │   ├── grubenv -> ../efi/EFI/BOOT/grubenv
    │   └── themes
    │       └── system
    │           ├── background.png
    │           └── fireworks.png
    ├── grub.cfg
    ├── grub.efi
    ├── initramfs-5.4.60-1.fc28.lemote.mips64el.img
    ├── loader
    │   └── entries
    │       └── e8a27b4e4fcc4db9ab7a64bd81393773-5.4.60-1.fc28.lemote.mips64el.conf
    ├── lost+found
    ├── System.map-5.4.60-1.fc28.lemote.mips64el
    ├── vmlinuz-205
    └── vmlinuz-5.4.60-1.fc28.lemote.mips64el
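
For reference, the UUID replacement mentioned above could be scripted with something like this (a minimal sketch; the device names are placeholders, and the list of files to touch depends on your layout):

    # Hypothetical helper: swap the old Fedora root UUID for the new Debian one
    OLD_UUID=$(blkid -s UUID -o value /dev/sda4)   # old Fedora root (placeholder)
    NEW_UUID=$(blkid -s UUID -o value /dev/sdb4)   # new Debian root (placeholder)
    for f in /mnt/debinst/boot/efi/boot/grub.cfg \
             /mnt/debinst/boot/efi/EFI/BOOT/grub.cfg \
             /mnt/debinst/etc/fstab; do
        sed -i "s/${OLD_UUID}/${NEW_UUID}/g" "$f"
    done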

… et voilà!

Finally, we have a pure Debian Buster base system hybridized with the Lemote Fedora Linux kernel:

root@fuloong-01:~# cat /etc/debian_version
10.8
root@fuloong-01:~# uname -a
Linux fuloong-01 5.4.60-1.fc28.lemote.mips64el #1 SMP PREEMPT Mon Aug 24 09:33:35 CST 2020 mips64 GNU/Linux
root@fuloong-01:~# cat /etc/apt/sources.list
deb http://httpredir.debian.org/debian buster main contrib non-free 
deb-src http://httpredir.debian.org/debian buster main contrib non-free 
deb http://security.debian.org/ buster/updates main contrib non-free 
deb http://httpredir.debian.org/debian/ buster-updates main contrib non-free
root@fuloong-01:~# apt update
Hit:1 http://httpredir.debian.org/debian buster InRelease
Get:2 http://security.debian.org buster/updates InRelease [65,4 kB]
Get:3 http://httpredir.debian.org/debian buster-updates InRelease [51,9 kB]
Get:4 http://security.debian.org buster/updates/main mips64el Packages [242 kB]
Get:5 http://security.debian.org buster/updates/main Translation-en [142 kB]
Fetched 501 kB in 1s (417 kB/s)                                
Reading package lists... Done
Building dependency tree       
Reading state information... Done
3 packages can be upgraded. Run 'apt list --upgradable' to see them.

With this hardware we can reasonably run native GDB directly on the machine, and we have the possibility of running other tools on the host (e.g. any monitoring agent to gather stats). Definitely, getting this hardware enabled for use in the CI infrastructure will be a promising step towards better QA for the project.
That is all from my side. I will probably keep dedicating some time to getting buildable packages of GRUB-EFI and the Linux kernel that we could use for this and similar machines (e.g. for tools like perf, which need the userspace binaries in sync with the kernel version). In the meantime, I really hope this can be useful to someone out there who is interested in this hardware. If you have a comment or question, or you simply wish to share your thoughts about this, just leave a comment.

Stay safe!

by Pablo Saavedra at March 31, 2021 07:16 AM

March 25, 2021

Andy Wingo

here we go again

Around 18 months ago, Richard Stallman was forced to resign from the Free Software Foundation board of directors and as president. It could have been anything -- at that point he already had a history of behaving in a way that was particularly alienating to women -- but in the end it was his insinuation that it was somehow OK if his recently-deceased mentor Marvin Minsky, then in his 70s or 80s, had sex with a 17-year-old on Jeffrey Epstein's private island. A weird pick of hill to stake one's reputation on, to say the least.

At the time I was relieved that we would finally be getting some leadership renewal at the FSF, and hopeful that we could get some mission renewal as well. I was also looking forward to what the practical implications would be for the GNU project, as more people agreed that GNU was about software freedom and not about its founder.

But now we're back! Not only has RMS continued through this whole time to insist that he runs the GNU project -- something that is simply not the case, in my estimation -- but this week, a majority of a small self-selected group of people, essentially a subset of current and former members of the FSF board of directors and including RMS himself, elected to reinstate RMS to the board of the Free Software Foundation. Um... read the room, FSF voting members? What kind of message are you sending?

In this context I can only agree with the calls for the entire FSF board to resign. The board is clearly not fit for purpose, if it can make choices like this.

dissociation?

I haven't (yet?) signed the open letter because I would be in an inconsistent position if I did so. The letter enjoins people to "refuse to contribute to projects related to the FSF and RMS"; as a co-maintainer of GNU Guile, which has its origins in the heady 1990s of the FSF but has nothing to do any more with RMS, but whose copyrights are entirely held by the FSF, is hosted on FSF-run servers, and is even obliged (GPLv3 §5d, as referenced by LGPLv3) to print out Copyright (C) 1995-2021 Free Software Foundation, Inc. when it starts, I must admit that I contribute to a project that is "related to the FSF". But I don't see how Guile could continue this association, if the FSF board continues as it is. It's bad for contributors and for the future of the project.

It would be very tricky to disentangle Guile from the FSF -- consider hosting, for example -- so it's not the work of a day, but it's something to think about.

Of course I would rather that the FSF wouldn't allow itself to be seen as an essentially misogynist organization. So clean house, FSF!

on the nature of fire

Reflecting on how specifically we could have gotten here -- I don't know. I don't know the set of voting members at the FSF, what discussions were made, who voted what. But, having worked as a volunteer on GNU projects for almost two decades now, I have a guess. RMS and his closest supporters see themselves as guardians of the flame of free software -- a lost world of the late 70s MIT AI lab, reborn in a flurry of mid-80s hack, but for the last 25 years or so, slipping further and further away. These are dark times, in their view, and having the principled founder in a leadership role can only be a good thing.

(Of course, the environment in the AI lab was only good for some. The treatment of Margaret Hamilton as recounted in Levy's Hackers shows that not all were welcome. If this were just one story, I would discount it, but looking back, it does seem to be part of a pattern.)

But is that what the FSF is for today? If so, Guile should certainly leave. I'm not here for software as performative nostalgia -- I'm here to have fun with friends and start a fire. The FSF should look to do the same -- look at the world we are in, look where the energy is now, and engage in real conversations about success and failure and tactics. There is a world to win and doubling down on RMS won't get us there from here.

by Andy Wingo at March 25, 2021 12:22 PM

March 17, 2021

Samuel Iglesias

VK_KHR_depth_stencil_resolve support on Turnip

Last year, I was working on Turnip driver development as my daily job at Igalia. One of those tasks was implementing support for the VK_KHR_depth_stencil_resolve extension.

VK_KHR_depth_stencil_resolve

This extension adds support for automatically resolving multisampled depth/stencil attachments in a subpass in a similar manner as for color attachments. This extension, however, does not add support for resolving msaa depth/stencil images with vkCmdResolveImage() command.

As you can imagine, this extension is easy to use by any application. Unless you are using a driver that supports Vulkan 1.2 or higher (VK_KHR_depth_stencil_resolve was promoted to core in Vulkan 1.2), you should first check that it is supported. Once that is done, ask the driver which features are supported by extending VkPhysicalDeviceProperties2 with a VkPhysicalDeviceDepthStencilResolveProperties structure when calling vkGetPhysicalDeviceProperties2().

struct VkPhysicalDeviceDepthStencilResolveProperties {
    VkStructureType       sType;
    void*                 pNext;
    VkResolveModeFlags    supportedDepthResolveModes;
    VkResolveModeFlags    supportedStencilResolveModes;
    VkBool32              independentResolveNone;
    VkBool32              independentResolve;
}

This structure will be filled by the driver to indicate the different depth/stencil resolve modes supported by the driver (more info about their meaning and possible values in the spec).
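
For illustration, the query could look like this (a minimal sketch; physicalDevice is assumed to have been picked beforehand, and the non-KHR names require Vulkan 1.2 headers):

VkPhysicalDeviceDepthStencilResolveProperties resolve_props = {
    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DEPTH_STENCIL_RESOLVE_PROPERTIES,
};
VkPhysicalDeviceProperties2 props2 = {
    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
    .pNext = &resolve_props,
};
vkGetPhysicalDeviceProperties2(physicalDevice, &props2);

/* Check, for example, that sample-zero resolve is available for depth */
if (resolve_props.supportedDepthResolveModes & VK_RESOLVE_MODE_SAMPLE_ZERO_BIT) {
    /* safe to request VK_RESOLVE_MODE_SAMPLE_ZERO_BIT for the depth resolve */
}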

Next step: just fill a VkSubpassDescriptionDepthStencilResolve struct with the resolve mode wanted and the depth/stencil attachment used for resolve. Then, extend the VkSubpassDescription2 struct with it. And that’s all.

struct VkSubpassDescriptionDepthStencilResolve {
    VkStructureType                  sType;
    const void*                      pNext;
    VkResolveModeFlagBits            depthResolveMode;
    VkResolveModeFlagBits            stencilResolveMode;
    const VkAttachmentReference2*    pDepthStencilResolveAttachment;
}
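
Putting it together, the subpass setup could look roughly like this (again a sketch; ds_resolve_attachment_ref is an assumed VkAttachmentReference2 pointing at the single-sampled destination attachment):

VkSubpassDescriptionDepthStencilResolve ds_resolve = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_DEPTH_STENCIL_RESOLVE,
    .depthResolveMode   = VK_RESOLVE_MODE_SAMPLE_ZERO_BIT,
    .stencilResolveMode = VK_RESOLVE_MODE_SAMPLE_ZERO_BIT,
    .pDepthStencilResolveAttachment = &ds_resolve_attachment_ref,
};
VkSubpassDescription2 subpass = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_2,
    .pNext = &ds_resolve,
    /* ... color attachments and the multisampled depth/stencil attachment ... */
};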

Turnip implementation

Implementing this extension on Turnip was more or less straightforward, although there were some issues to fix.

For all depth/stencil formats, support was added in both resolve paths used by the driver, one for sysmem (system memory) and the other for gmem (tile buffer), including flushing the depth cache when needed. However, for the VK_FORMAT_D32_SFLOAT_S8_UINT format, which has the stencil part in a separate plane, specific code was needed to extend the 2D path used in the driver for sysmem resolves.

However, the main issue appeared when testing the extension implementation on the Adreno 630 GPU. It turns out that the VK-GL-CTS tests exercising this extension for MSAA VK_FORMAT_D24_UNORM_S8_UINT formats were always failing, except when disabling UBWC via the TU_DEBUG=noubwc environment variable. UBWC (Universal Bandwidth Compression) is a HW feature designed to improve throughput to system memory by minimizing the bandwidth of data (which also gives some power savings). The problem was that UBWC support for the MSAA VK_FORMAT_D24_UNORM_S8_UINT format is known to be broken on Adreno 630 and Adreno 618 (see the merge request for the freedreno driver). I just needed to disable it in Turnip to fix these failures.

I also found other VK_KHR_depth_stencil_resolve CTS tests failing: the ones testing format compatibility for the VK_FORMAT_D32_SFLOAT_S8_UINT and VK_FORMAT_D24_UNORM_S8_UINT formats. For the VK_FORMAT_D32_SFLOAT_S8_UINT failures, we needed to take into account that it has a separate plane for the stencil part when resolving to VK_FORMAT_S8_UINT. In the VK_FORMAT_D24_UNORM_S8_UINT failures, the problem was that we were setting the resolve mode used by the HW incorrectly: it was doing a sample average when we wanted the value of sample 0. This merge request fixed both issues.

And that’s all. This was an extension that allowed me to dive into the different resolve paths used by Turnip and learn a thing or two about the HW ;-) Thanks a lot to Jonathan Marek for his reviews and suggestions to improve the implementation of this extension.

Happy hacking!

March 17, 2021 08:14 AM

March 16, 2021

Iago Toral

Improving performance of the V3D compiler for OpenGL and Vulkan

Lately, I have been looking at improving performance of the V3DV Vulkan driver for the Raspberry Pi 4. So far we had been toying a lot with some Vulkan ports of the Quake trilogy but we wanted to have a look at more modern games as well, and for that we started to look at some Unreal Engine 4 samples, particularly the Shooter demo.


Unreal Engine 4 Shooter Demo

In our initial tests with “High” settings, even at 480p we were running the sample at 8-15 fps, with 720p being mostly in the 5-10 fps range. Not great, obviously, but a good opportunity to guide and focus our performance efforts.

One aspect of the UE4 sample that was immediately obvious compared to the Quake games is that the shading is a lot more expensive in general, and more specifically, it involves more texture lookups and UBO loads, which require expensive accesses to memory from the shader cores, so this was the first area we targeted. The good thing about this is that because our Vulkan and OpenGL drivers share the compiler stack, any optimizations we do here benefit both drivers.

What follows is a brief discussion of some of the optimizations we did recently to our backend compiler and the results we have observed from this work.

Optimizing the backend compiler

So the first thing we tackled was better managing the latency of texture lookups. Interestingly, the NIR scheduler was setting this up so that we would try to put instructions that consumed the result of a texture lookup as far away as possible from the instruction that triggered the lookup, but then the backend compiler was not fully taking advantage of this and would still end up waiting on lookup results sooner than it should.

Fixing this helped performance by 1%-3%, although it could go a bit above that in some cases. It has a caveat though: doing this extends the liveness of our lookup sequences, and that makes spilling more difficult (we can’t emit spills/unspills in the middle of an outstanding memory lookup), so when register pressure is high enough that we need to inject register spills to compile the shader, we would typically be a lot more constrained and end up producing significantly worse code, or even worse, failing to register allocate for the shader completely. To avoid this, we recompile the shader without the optimization if we detect that we need to do any spilling. One can use V3D_DEBUG=perf to detect if this is happening for any shaders, looking for messages like this:

Falling back to strategy 'disable TMU pipelining' for MESA_SHADER_FRAGMENT.

While the above optimization was useful, I was expecting it to make a larger impact for this particular demo, so I kept looking for ways to make our memory lookups more efficient. One thing that is relevant to this analysis is that we were using the same hardware unit for both texture and UBO lookups, but for the latter we could use a different strategy by handling our UBOs as uniform streams. This has some caveats, but making a long story short: because many UBO loads use uniform addresses, because we usually read a bunch of consecutive scalars from a UBO load (such as a full vec4), and because applications usually emit UBO loads for nearby addresses together, we can emit fairly optimal code for many of these, leading to more efficient memory access in general.

Eventually, this turned out to be a big win, yielding 20%-30% improvements for the Shooter demo even with the initial basic implementation, which we would then tune and optimize further.

Again related to memory accesses, I have also been improving how we schedule the instructions involved with setting up memory lookups. Our scheduler was more restrictive here than it needed to be, and the extra flexibility can help reduce instruction counts, which will benefit these more modern games most, as they are more likely to emit a larger number of texture/image operations in their shaders.

Another thing we did was to improve instruction packing. The V3D GPU is able to emit multiple instructions in the same cycle so long as the instructions meet some requirements. Turns out our assembly scheduler was being too restrictive with this and we could do better. Going by shader-db results, this led to ~5% less instructions on our programs and added another modest 1%-2% performance improvement for the Shooter demo.

Another issue we noticed is that a lot of the shaders were passing a lot of varyings from the vertex to the fragment shaders, and our setup for fragment shader inputs was not optimal. The issue here was that there is a specific instruction involved in this process that writes two registers, one of them with one instruction of delay, and our dependency tracking was not able to handle this properly, effectively assuming that both registers are written in the same instruction, which then had an impact on how we scheduled instructions. Fixing this required handling this case specially in our scheduler, so we could be more effective at scheduling these instructions in a way that enables optimal pipelining of the instructions involved with varying setup for fragment shaders.

Another aspect we improved was related to our uniform handling. Generally, there are many instances in which we need to emit duplicate uniforms. There are reasons for that related to how the GPU works, but in some cases, for example with consecutive UBO loads, we would emit the uniform with the base address of the UBO multiple times very close to each other. We can obviously do better by tracking previous uses of a uniform/constant value and reusing them in nearby instructions. Of course, this comes at the expense of increasing register pressure (for reasons beyond the scope of this post our shaders typically require a lot of uniforms), so there is a balancing game to play here too. Reducing the size of our uniform streams this way also plays an important role in lowering some of the CPU overhead of the driver, since these streams need to be rebuilt often when certain pipeline states change.

Finally, an optimization that was more targeted at reducing register pressure than improving performance: we noticed that we would sometimes put instructions far away from their consumers with no obvious benefit. This was sometimes bad enough that it would even cause us to be unable to compile some shaders. Mesa has a handy NIR pass for this called nir_opt_sink, which proved to be helpful here. It allowed us to get a few more shaders from shader-db to compile and to reduce spilling for a bunch of shaders. For the Shooter demo, this changed a large compute shader involved with histogram post-processing, which had 48 spills and 50 fills, to only have 8 spills and 15 fills. While the impact in performance of this is probably very small, since the game only runs this pass once per frame, it made a notable difference in loading time, since compiling a shader with this much spilling is very slow at present. I have a mental note to improve this some day; I know Intel managed to fix this for their compiler, but for the time being, this alone managed to make the loading time much more reasonable.

Results

First, here are some shader-db stats, which describe how these optimizations change various statistics for a large collections of real world shaders:

total instructions in shared programs: 14992758 -> 13731927 (-8.41%)
instructions in affected programs: 14003658 -> 12742827 (-9.00%)
helped: 80448
HURT: 4297

total threads in shared programs: 407932 -> 412242 (1.06%)
threads in affected programs: 4514 -> 8824 (95.48%)
helped: 2189
HURT: 34

total uniforms in shared programs: 4069524 -> 3790401 (-6.86%)
uniforms in affected programs: 2267834 -> 1988711 (-12.31%)
helped: 40588
HURT: 1186

total max-temps in shared programs: 2388462 -> 2322009 (-2.78%)
max-temps in affected programs: 897803 -> 831350 (-7.40%)
helped: 30598
HURT: 2464

total spills in shared programs: 6241 -> 5940 (-4.82%)
spills in affected programs: 3963 -> 3662 (-7.60%)
helped: 75
HURT: 24

total fills in shared programs: 14587 -> 13372 (-8.33%)
fills in affected programs: 11192 -> 9977 (-10.86%)
helped: 90
HURT: 14

total sfu-stalls in shared programs: 28106 -> 31489 (12.04%)
sfu-stalls in affected programs: 16039 -> 19422 (21.09%)
helped: 4382
HURT: 6429

total inst-and-stalls in shared programs: 15020864 -> 13763416 (-8.37%)
inst-and-stalls in affected programs: 14028723 -> 12771275 (-8.96%)
helped: 80396
HURT: 4305

Less is better for all stats, except threads. We can see significant improvements across the board: we generally produce shaders with fewer instructions and lower maximum register pressure, we reduce spills and uniform counts, and we can run with more threads. We are only worse at stalls, but that is generally because we now produce more compact code with fewer instructions, so more stalls are expected.

Another good thing in these stats is the large number of helped shaders compared to hurt shaders, meaning that it is very likely that these optimizations will help most applications to some extent.

But enough of boring compiler statistics; most people won’t care about those, and what they want to know is how this impacts performance in actual games and applications, which is what the following graph shows (these were obtained by replaying specific traces with gfx-reconstruct). Keep in mind that while I am using a collection of Vulkan samples/games here, it is expected that these optimizations apply to OpenGL too.
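
As a side note, replaying a gfx-reconstruct capture boils down to something like this (the file name is just a placeholder):

gfxrecon-replay shooter-demo.gfxr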


Before vs After

Framerate improvement after optimization (in %)

As can be observed from the graph above, this optimization work made a significant impact on the observed framerate in all cases. It is not surprising that the UE4 demo is the one that sees the most improvement, considering it is the one we used to guide most of the optimization work.

Other optimizations and future work

In this post I have been focusing exclusively on compiler optimizations, but we have also been improving other parts of the Vulkan driver. While I won’t go into details to avoid making this post too long, we have also been improving aspects of the driver involved with buffer to image copies, depth buffer clears, dirty descriptor state management, usage of the TFU unit for transfer operations and more.

Finally, there is one other aspect of this UE4 demo that is pretty obvious as soon as you start the game: it can compile a lot of shaders in the middle of the gameplay loop, which can lead to significant stutter. While there is not much we can do about this on the driver side, adding support for an on-disk shader cache should eliminate the problem in sessions after the first, so this is something that we may work on in the future.

We will certainly continue to look at improving the driver performance in the future so stay tuned for further updates on our progress or maybe join us at #videocore on Freenode.

by Iago Toral at March 16, 2021 09:36 AM

Eric Meyer

First Month at Igalia

Today marks one month at Igalia.  It’s been a lot, and there’s more to come, but it’s been a really great experience.  I get to do things I really enjoy and value, and Igalia supports and encourages all of it without trying to steer me in specific directions.  I’ve been incredibly lucky to experience that kind of working environment twice in my life — and the other one was an outfit I helped create.

Here’s a summary of what I’ve been up to:

  • Generally got up to speed on what Igalia is working on (spoiler: a lot).
  • Redesigned parts of wpewebkit.org, fixed a few outstanding bugs, edited most of the rest. (The site runs on 11ty, so I’ve been learning that as well.)
  • Wrote a bunch of CSS tests/demos that will form the basis for other works, like articles and videos.
  • Drafted a few of said articles.  As I write this, two are very close to being complete, and a third is almost ready for editing.
  • Edited some pages on the Mozilla Developer Network (MDN), clarifying or upgrading text in some places and replacing unclear examples in others.
  • Joined the Open Web Docs Steering Committee.
  • Reviewed various specs and proposals (e.g., Miriam’s very interesting @scope proposal).

And that’s not all!  Here’s what I have planned for the next few months:

  • More contributions to MDN, much of it in the CSS space, but also branching out into documenting some up-and-coming APIs in areas that are fairly new to me.  (Details to come!)
  • Contributions to the Web Platform Tests (WPT), once I get familiar with how that process is structured.
  • Articles on topics that will include (but are not limited to!) gaps in CSS, logical properties, and styling based on writing direction.  I haven’t actually settled on outlets for those yet, so if you’d be interested in publishing any of them, hit me up.  I usually aim for about a thousand words, including example markup and CSS.
  • Very likely will rejoin the CSS Working Group after a (mumblecough)-year absence.
  • Assembling a Raspberry Pi system to test out WPEWebKit in its native, embedded environment and get a handle on how to create a “setting up WPEWebKit for total embedded-device noobs” guide (I am one such noob).

That last one will be an entirely new area for me, as I’ve never really worked with an embedded-device browser before.  WPEWebKit is a WebKit port, actually the official WebKit port for embedded devices, and as such is aggressively tuned for performance and low resource demand.  I’m really looking forward to not only seeing what it’s like to use it, but also how I might be able to leverage it into some interesting projects.

WPEWebKit is one of the reasons why Igalia is such a big contributor to WebKit, helping drive its standards support forward and raise its interoperability with other browser engines.  There’s a thread of self-interest there: a better WebKit means a better WPEWebKit, which means more capable embedded devices for Igalia’s clients.  But after a month on the inside, I feel comfortable saying most of Igalia’s commitment to interoperability is philosophical in nature — they truly believe that more consistency and capability in web browsers benefits everyone.  As in, THIS IS FOR EVERYONE.

And to go along with that, more knowledge and awareness is seen as an unvarnished good, which is why they’re having me working on MDN content.  To that end, I’m putting out an invitation here and now: if you come across a page on MDN about CSS or HTML that confuses you, or seems inaccurate, or just doesn’t have much information at all, please get in touch to let me know, particularly if you are not a native English speaker.

I can’t offer translation services, unfortunately, but I can do my best to make the English content of MDN as clear as possible.  Sometimes, what makes sense to a native English speaker is obscure or unclear to others.  So while this offer is open to everyone, don’t hold back if you’re struggling to parse the English.  It’s more likely the English is unclear and imprecise, and I’d like to erase that barrier if I can.

The best way to submit a report is to send me email with [MDN] and the URL of the page you’re writing about in the subject line.  If you’re writing about a collection of pages, put the URLs into the email body rather than the subject line, but please keep the [MDN] in the subject so I can track it more easily.  You can also ping me on Twitter, though I’ll probably ask you to email me so I don’t lose track of the report.  Just FYI.

I feel like there was more, but this is getting long enough and anyway, it already seems like a lot.  I can’t wait to share more with you in the coming months!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at March 16, 2021 12:53 AM

March 09, 2021

Igalia Compilers Team

Igalia’s Compilers Team in 2020

In a previous blog post, we introduced the kind of work the Igalia compilers team does and gave a mid-year update on our 2020 progress.

Now that we have made our way into 2021, we wanted to recap our achievements from 2020 and update you on the exciting improvements we have been making to the web programming platform. Of course, we couldn’t have done this work alone; all of this was brought to you through our collaborations with our clients and upstream partners in the web ecosystem.

https://twitter.com/rkirsling/status/1276299298020290561

JavaScript class features

Our engineers at Igalia have continued to push forward on improvements to JS classes. Earlier in 2020, we had landed support for public field declarations in JavaScriptCore (JSC). In the latter half of 2020, we achieved major milestones such as getting private class fields into JSC (with optimizing compiler support!):

https://twitter.com/robpalmer2/status/1276378092349657095
https://twitter.com/caitp88/status/1318919341467979776

as well as static public and private fields.

We also helped ship private methods and accessors in V8 version 84. Our work on private methods also landed in JSC and we expect it to be available in future releases of Safari.

These additions will help JS developers create better abstractions by encapsulating state and behavior in their classes.
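
As a quick illustration of what all these class features look like together (a generic example, not code from any of the engines):

class Counter {
  #count = 0;                   // private instance field
  static #instances = 0;        // static private field
  constructor() { Counter.#instances++; }
  #bump() { return ++this.#count; }   // private method
  increment() { return this.#bump(); }
  static get instances() { return Counter.#instances; }
}

const c = new Counter();
c.increment();   // 1
// c.#count      // SyntaxError: #count is not accessible outside Counter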

TC39 and Temporal

Our compilers team also contributed throughout 2020 to web standards through its participation in TC39 and related standards bodies.

One of the big areas we have been working on is the Temporal proposal, which aims to provide better date and time handling in JS. When we blogged about this in mid-2020, the proposal was still in Stage 2, but we’re expecting it to move to Stage 3 soon in 2021. Igalians have been working hard on many aspects of the proposal since mid-2020, including managing community feedback, working on the polyfill, and maintaining the documentation.

For more info on Temporal, also check out a talk by one of our engineers, Ujjwal Sharma, at Holy JS 2020 Piter.
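
To give a small taste of the API (hedged, since the proposal was still evolving at the time of writing):

const date = Temporal.PlainDate.from('2021-03-09');  // a date with no time zone ambiguity
date.add({ months: 1 }).toString();                   // '2021-04-09'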

Another area we have been contributing to for a number of years is the ECMA-402 Internationalization (Intl) standard, an important effort that provides i18n support for JS. We help maintain and edit the specification while also contributing tests and pushing Intl proposals forward. For example, we helped with the test suite of the Intl.Segmenter feature for implementing localized text segmentation, which recently shipped in Chrome. For a good overview of other recent Intl efforts, check out these slides from IUC44.

We’re also contributing to many other proposed features for JS, such as WeakRefs, Decimal (Daniel Ehrenberg from our team gave a talk on this at Node.TLV 2020), Import Assertions, Records & Tuples, Top-level await, and Module blocks & module bundling (Daniel also gave a talk on these topics at Holy JS 2020 Moscow).

Node.js

In addition to our contributions to the client side of the web, we are also contributing to server side use of web engines. In particular, we have continued to contribute to Node.js throughout 2020.

Some notable contributions include adding experimental support for per-context memory measurements in version 13 during early 2020.

Since late 2020, we have been working on improving Node.js startup speed by moving more of the bootstrap process into the startup snapshot. For more on this topic, you can watch a talk that one of our engineers, Joyee Cheung, presented at NodeConf Remote 2020 here (slides are available here).

JSC support on 32-bit platforms

Our group also continues to maintain support in JSC for 32-bit platforms. Earlier in 2020 we contributed improvements to JSC on 32-bit such as tail call optimizations, support for checkpoints, and others.

Since then we have been optimizing LLInt (the low-level interpreter for JSC) on 32-bit, and porting support for inline caching of delete operations to 32-bit, to improve the performance of delete (you can read about the background of the original optimization on the WebKit blog here).

We also blogged about our efforts to support the for-of intrinsic on 32-bit to improve iteration on JS arrays.

WebAssembly

Finally, we have made a number of contributions to WebAssembly (Wasm), the new low-level compiler-target language for the web, on both the specification and implementation sides.

During 2020, we helped ship and standardize several Wasm features in web engines such as support for multiple-values, which can help compilers to Wasm produce better code, and support for BigInt/I64 conversion in the JS API, which lifts a restriction that made it harder to interact with Wasm programs from JS.
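
To illustrate the BigInt/I64 part (a sketch; sum_i64 is an assumed export with an i64 signature, and the code runs in an async context):

// Before the proposal, calling an i64-taking export from JS threw a TypeError;
// now BigInt values convert to and from i64 transparently.
const { instance } = await WebAssembly.instantiateStreaming(fetch('lib.wasm'));
const result = instance.exports.sum_i64(2n ** 40n, 1n);  // returns a BigInt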

We’ve also improved support in tools such as LLVM for the reference types proposal, which adds new types to the language that can represent references to values from JS or other host languages. Eventually reference types will be key to supporting the garbage collection proposal (in which references are extended to new struct and array types), which will allow for easier compilation of languages that use GC to Wasm.

We’re also actively working on web engine support for exception handling, reference types, and other proposals while continuing to contribute to tools and specification work. We plan to help ship more WebAssembly features in browsers during 2021, so look forward to our mid-year update post!

by Compilers Team at March 09, 2021 09:35 AM

March 03, 2021

Samuel Iglesias

Igalia is hiring!

One of the best decisions I ever made was joining Igalia in 2012. Inside Igalia, I have been working on different open-source projects, most of the time related to graphics technologies, interacting with different communities, giving talks, organizing conferences and, more importantly, contributing to free software as my daily job.

Now I’m thrilled to announce that we are hiring for our Graphics team!

Igalia

Right now we have two open positions:

  • Graphics Developer.

    We are looking for candidates that would like to contribute to open-source OpenGL/Vulkan graphics drivers (Mesa), or other areas of the open-source graphics stack such as X11 or Wayland, among others. If you have experience with them, or you are very motivated to become an expert there, just send us your CV!

  • Kernel Developer.

    We are looking for candidates that either have experience with kernel development or can ramp up quickly to contribute to Linux kernel drivers. Although no specific subsystem is mentioned in the job position, I encourage you to apply if you have DRM experience and/or ARM[64]/MIPS related knowledge.

Graphics technologies are not your cup of tea? We have positions in other areas like browsers, compilers, multimedia… Just check out our job offers on our website!

What we offer is to work in an open-source consultancy in which you can participate equally in the management and decision-making process of the company via our democratic, consensus-based assembly structure. As all of our positions are remote-friendly, we welcome submissions from any part of the world.

Are you still a student? We have launched the 2021 edition of our Coding Experience program. Check it out!

Igalia's office

March 03, 2021 01:21 PM

February 28, 2021

Manuel Rego

:focus-visible in WebKit - February 2021

One month has passed since the previous report so it’s time for a status update.

As you probably already know, Igalia is working on the implementation of :focus-visible in WebKit, a project that is being sponsored by many people and organizations through the Open Prioritization campaign. We’ve reached 84% of the goal, thank you all! 🎉 And if you haven’t contributed yet, you’re still in time to do it if you believe this work is important to you.

The main highlight for February is that initial work has started to land in WebKit, though some important bits are still under review.

Spec issues

There were some open discussions on the spec side regarding different topics, let’s review here the status of them:

  • :focus-visible on <select> element: After some discussion on the CSS Working Group (CSSWG) issue it was agreed to remove the <select> element from the tests, and each browser can decide whether to match :focus-visible when it’s clicked, as there was no clear agreement on how to determine whether it allows keyboard input or not.

    In any case, Chromium still has a bug on elements that have a popup (like a <select>). When you click them they match :focus-visible, but they don’t show the focus indicator (because Chromium wants to avoid showing two focus indicators, one on the <select> and another one on the <option>). As it’s not showing a focus indicator, it shouldn’t actually match :focus-visible in that situation.

  • :focus-visible on script focus: The spec is not totally clear about when an element should match (or not) :focus-visible after script focus. There has been a nice proposal from Alice Boxhall and Brian Kardell on the CSSWG issue, but when we discussed this in the CSSWG it was decided that it was not the proper forum for these discussions. This was because the CSSWG has defined that :focus-visible should match when the browser shows a focus indicator to the user, but it doesn’t care when it’s actually shown or not. That definition is very clear, and even though each browser shows the focus indicator (or not) in different situations, the definition is still correct.

    Currently the heuristics in the spec are not normative; they’re just hints to the browsers about when to match :focus-visible (when to show a focus indicator). But I believe it’d be really nice to have interoperability here, so people using this feature won’t find weird problems here and there. The suggestion from the CSSWG was to discuss this in the HTML spec directly, proposing there a set of rules about when a browser should show a focus indicator to the user; those rules would be the current heuristics from the :focus-visible spec with some slight modifications to cover the corner cases that have been discussed. Hopefully we can reach an agreement between the different parties and manage to define this properly in the HTML spec, so all the implementations can be interoperable in this regard.

    I believe we need to dig deeper into the specific case of script focus, as I’m not totally sure how some scenarios (e.g. blur() before focus() and things like that) should work. For that reason I worked on a set of tests trying to clarify the different situations in which the browser should or should not show a focus indicator. These need to be discussed with more people to see if we can reach an agreement and prepare some spec text for HTML.

  • :focus-visible and Shadow DOM: Another topic already explained in the previous report. My proposal to the CSSWG was to avoid matching :focus-visible on the ShadowRoot when some element in the Shadow Tree has the focus, in order to avoid having two focus indicators.

    A concern has been raised that this would make it possible to guess whether an element is a ShadowRoot by focusing it via script and then checking if it matches :focus-visible (but ShadowRoot elements shouldn’t be observable). However, that is already possible in WebKit, which currently uses :-webkit-direct-focus in the default User Agent style sheet to avoid the double focus indicator in this case. In WebKit you can focus an element via script and check whether it has an outline to determine if it’s a ShadowRoot.

    Anyway, as in the previous case, this would be part of the heuristics, so according to the CSSWG’s suggestion it should be discussed in the HTML spec directly.

Default User Agent style sheet

Early this month I landed a patch to start using :focus-visible in the Chromium User Agent style sheet; this is included in version 90. 🚀 This means that from that version on you won’t see an outline when you click on a <div tabindex="0">, only when you focus it with the keyboard. Also, the hack :focus:not(:focus-visible) won’t be needed anymore (it has actually been removed from the spec too).

In addition, Firefox has also been using :focus-visible in its User Agent style sheet since version 87.
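
For reference, the typical authoring pattern this enables looks like the following (a generic example):

/* Focus ring only when the browser deems it helpful, e.g. keyboard navigation */
:focus-visible {
  outline: 2px solid dodgerblue;
}

/* The old workaround, unnecessary once UA style sheets use :focus-visible */
:focus:not(:focus-visible) {
  outline: none;
}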

More about tests

During this month there has still been some extra work on the tests. While implementing things in WebKit I noticed some minor issues in the tests, which have been fixed along the way.

I also found some limitations in WebKit’s testdriver.js support for simulating keyboard input. Some of the :focus-visible tests use the method test_driver.send_keys() to send keys like Control or Enter, so I added support for them in WebKit. Apart from that, I fixed how modifier keys are identified in WebKitGTK and WPE, as they were not following other browsers exactly (e.g. event.ctrlKey was not set on the keydown event, only on keyup).
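
A simplified version of what such a test does (not one of the actual tests; \uE007 is the WebDriver code point for Enter):

promise_test(async () => {
  const el = document.querySelector('#target');
  await test_driver.send_keys(el, '\uE007');   // simulate pressing Enter
  assert_true(el.matches(':focus-visible'));
}, 'Element matches :focus-visible after keyboard input');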

WebKit implementation

And the most important part: the actual WebKit implementation has been moving forward during this month. I managed to get a patch that passed all the tests, and split it up a bit in order to merge things upstream.

The first patch, which just does the parsing of the new pseudo-class and adds an experimental flag, has already landed.

Now a second patch is under review. It originally contained the whole implementation passing all the tests but, due to some discussion on the script focus issues, that part has been removed. Anyway, the review is ongoing and hopefully it’ll land soon, so you can start testing it in the WebKit nightlies.

Some numbers

Like in the previous post, let’s review again the numbers of what has been done in this project:

  • 20 PRs merged in WPT (7 in February).
  • 14 patches landed in WebKit (9 in February).
  • 7 patches landed in Chromium (3 in February).
  • 1 PR merged in Selectors spec (1 in February).
  • 1 PR merged in HTML spec (1 in February).

Next steps

The first thing is to get the main patch landed in WebKit and verify that things work as expected on the different platforms.

Another issue we need to solve is to reach an agreement on how script focus should work regarding :focus-visible, and then get that implemented in WebKit covering all the cases.

After that we could request enabling the feature by default in WebKit. Once that’s done, we could discuss the possibility of changing the default User Agent style sheet to use :focus-visible too.

There is some interop work pending too. A few things are failing in Firefox and we could try to help fix them. There are also some weird issues with <select> elements in Chromium that might need some love. And depending on the ongoing spec discussions, there might be some changes needed in the different browsers. Anyway, we may or may not find the time to do all this; let’s see how things evolve in the next weeks.

Big thanks to everyone that has contributed to this project, you’re all wonderful for letting us work on this. 🙏 Stay tuned for more updates in the future!

February 28, 2021 11:00 PM

February 22, 2021

Eric Meyer

First Week at Igalia

The first week on the job at Igalia was… it was good, y’all.  Upon formally joining the Support Team, I got myself oriented, built a series of tests-slash-demos that will be making their way into some forthcoming posts and videos, and forked a copy of the Mozilla Developer Network (MDN) so I can start making edits and pushing them to the public site.  In fact, the first of those edits landed Sunday night!  And there was the usual setting up of accounts and figuring out of internal processes and all that stuff.

A series of tests of the CSS logical property 'border-block'.
Illustrating the uses of border-block.

To be perfectly honest, a lot of my first-week momentum was provided by the rest of the Support Team, and setting expectations during the interview process.  You see, at one point in the past I had a position like this, and I had problems meeting expectations.  This was partly due to my inexperience working in that sort of setting, but also partly due to a lack of clear communication about expectations.  Which I know because I thought I was doing well in meeting them, and then was told otherwise in evaluations.

So when I was first talking with the folks at Igalia, I shared that experience.  Even though I knew Igalia has a different approach to management and evaluation, I told them repeatedly, “If I take this job, I want you to point me in a direction.”  They’ve done exactly that, and it’s been great.  Special thanks to Brian Kardell in this regard.

I’m already looking forward to what we’re going to do with the demos I built and am still refining, and to making more MDN edits, including some upgrades to code examples.  And I’ll have more to say about MDN editing soon.  Stay tuned!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at February 22, 2021 09:32 PM

February 15, 2021

Eric Meyer

First Day at Igalia

Today is my first day as a full-time employee at Igalia, where I’ll be doing a whole lot of things I love to do: document and explain web standards at MDN and other places, participate in standards work at the W3C, take on some webmaster duties, and play a part in planning Igalia’s strategy with respect to advancing the web.  And likely other things!

I’ll be honest, this is a pretty big change for me.  I haven’t worked for anyone other than myself since 2003.  But the last time I did work for someone else, it was for Netscape (slash AOL slash Time Warner) as a Standards Evangelist, a role I very much enjoyed.  In many ways, I’m taking that role back up at Igalia, in a company whose values and structure are much more in line with my own.  I’m really looking forward to finding out what we can do together.

If the name Igalia doesn’t ring any bells, don’t worry: nobody outside the field has heard of them, and most people inside the field haven’t either.  So, remember when CSS Grid came to browsers back in 2017?  Igalia did the implementation that landed in Safari and Chromium.  They’ve done a lot of other things besides that — some of which I’ll be helping to spread the word about — but it’s the thing that web folks will be most likely to recognize.

This being my first day and all, I’m still deep in the setting up of logins and filling out of forms and general orienting of oneself to a new team and set of opportunities to make a positive difference, so there isn’t much more to say besides I’m stoked and planning to say more a little further down the road.  For now, onward!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at February 15, 2021 05:23 PM

February 13, 2021

Eleni Maria Stea

About VK_EXT_sample_locations

More than a year ago, I had worked on the implementation of VK_EXT_sample_locations extension for anv, the Intel Vulkan driver of mesa3D, as part of my work for Igalia. The implementation had been reviewed (see acknowledgments) at the time, but as the conformance tests that were available back then had to be improved and that … Continue reading About VK_EXT_sample_locations

by hikiko at February 13, 2021 09:47 PM

February 10, 2021

Danylo Piliaiev

Freedreno now supports OpenGL 3.3 on A6XX

I recently joined Igalia and as a way to familiarize myself with Adreno GPUs it was decided to get Freedreno up to OpenGL 3.3.

Until recently, Freedreno exposed only OpenGL 3.0. The big jump in version required only two small extensions and a few fixes to get rid of most crashes in Piglit, since almost all features were already supported.

ARB_blend_func_extended

I decided to start with dual-source blending. Fittingly, it was the cause of the first issue I investigated in Mesa two and a half years ago =)

Turnip already had it implemented, so it was a good first task. Most of the time was spent familiarizing myself with Freedreno and squashing the hangs that happened due to my lack of experience with this driver. Shortly after, I had a working MR:

freedreno/a6xx: add support for dual-source blending (!7708)

However, Freedreno didn’t have Piglit tests running on CI, and I missed the failure of the arb_blend_func_extended-fbo-extended-blend-pattern_gles2 test. I did test the _gles3 version, though. Thus the issue with gles2 was not found immediately.

What’s different in the gles2 test? It uses gl_SecondaryFragColorEXT for the second color of dual-source blending. gl_FragColor and gl_SecondaryFragColorEXT are different from the other ways to specify color in that they broadcast the color to all enabled draw buffers. In Mesa both are translated to the slot FRAG_RESULT_COLOR, for which Freedreno didn’t expect a second blending source. The solution was simple: if a shader does dual-source blending, remap FRAG_RESULT_COLOR to FRAG_RESULT_DATA0 + dual_src_index.

freedreno/ir3: remap FRAG_RESULT_COLOR to _DATA* for dual-src blending (!8245)
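
Conceptually, the change boils down to something like this (a simplified sketch of the idea, not the actual code from the MR; the field names are illustrative):

/* gl_FragColor / gl_SecondaryFragColorEXT both arrive as FRAG_RESULT_COLOR,
 * which the backend doesn't expect as a second blend source, so remap them
 * to the per-output data slots. */
if (shader->dual_src_blend && slot == FRAG_RESULT_COLOR)
   slot = FRAG_RESULT_DATA0 + dual_src_index;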

While looking at how other drivers handle FRAG_RESULT_COLOR I saw that Zink also has a similar issue but in a NIR lowering path, so I fixed it too since it was simple enough.

nir/lower_fragcolor: handle dual source blending (!8247)

ARB_shader_stencil_export

While implementing ARB_blend_func_extended, I saw in Turnip that RB_FS_OUTPUT_CNTL0 could also enable stencil export:

tu_cs_emit_pkt4(cs, REG_A6XX_RB_FS_OUTPUT_CNTL0, 2);
tu_cs_emit(cs, COND(fs->writes_pos, A6XX_RB_FS_OUTPUT_CNTL0_FRAG_WRITES_Z) |
               COND(fs->writes_smask, A6XX_RB_FS_OUTPUT_CNTL0_FRAG_WRITES_SAMPMASK) |
               COND(fs->writes_stencilref, A6XX_RB_FS_OUTPUT_CNTL0_FRAG_WRITES_STENCILREF) |
               COND(dual_src_blend, A6XX_RB_FS_OUTPUT_CNTL0_DUAL_COLOR_IN_ENABLE));

Meanwhile, in Freedreno there was just a cryptic 0xfc000000:

OUT_PKT4(ring, REG_A6XX_SP_FS_OUTPUT_CNTL0, 1);
OUT_RING(ring, A6XX_SP_FS_OUTPUT_CNTL0_DEPTH_REGID(posz_regid) |
        A6XX_SP_FS_OUTPUT_CNTL0_SAMPMASK_REGID(smask_regid) |
        COND(fs_has_dual_src_color,
                A6XX_SP_FS_OUTPUT_CNTL0_DUAL_COLOR_IN_ENABLE) |
        0xfc000000);

I was quickly told that this is the shifted register r63.x, which has a value of 0xfc and is used to signify an unused value.

Again, Turnip already supported stencil export and the compiler had it wired up; it’s hard to find a simpler extension to implement.

freedreno/a6xx: add support for ARB_shader_stencil_export (!7810)

Running tests for GL 3.1 - 3.3

After enabling ARB_blend_func_extended and ARB_shader_stencil_export, Freedreno had everything needed for the jump to GL 3.3. It was time to run tests.

KHR-GL31.texture_size_promotion.functional

The first observation was that the test passes with FD_MESA_DEBUG=nogmem. Next, I compared traces with and without nogmem and saw a difference in a texture layout (I should have just printed the layouts with FD_MESA_DEBUG=layout). It didn’t give me an answer, so I checked whether I could bisect the failure. To my delight it was bisectable:

freedreno: reduce extra height alignment in a6xx layout (e4974852)

The change is only one line of code, and without the context the first thing that comes to mind is that maybe the alignment is wrong:

         if (level == mip_levels - 1)
-            nblocksy = align(nblocksy, 32);
+            height = align(nblocksy, 4);

         uint32_t nblocksx =

But it’s even more trivial: nblocksy was erroneously changed to height, and height just wasn’t used afterwards, which hid the issue. Thus it was fixed in

freedreno/a6xx: Fix typo in height alignment calculation in a6xx layout (!7792)

gl-3.2-layered-rendering-clear-color-all-types 2d_multisample_array single_level

It was quickly found that clearing a framebuffer in Freedreno didn’t take into account that the framebuffer could have several layers, so only the first one was cleared.

Initially, I thought that we would need number_of_layers draw calls to clear all layers, but looking at the generic blitter in gallium suggested a better answer: an instanced draw where each instance draws a fullscreen quad into a distinct layer. This could be accomplished just by:

gl_Layer = gl_InstanceID;

in the vertex shader. However, this required the support of GL_AMD_vertex_shader_layer which Freedreno didn’t have…
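
A full clear shader following this idea could look roughly like this (a sketch, assuming the fullscreen quad is drawn with one instance per layer):

#version 330
#extension GL_AMD_vertex_shader_layer : require

in vec2 position;               // fullscreen quad corner

void main()
{
    gl_Layer = gl_InstanceID;   // instance N renders into layer N
    gl_Position = vec4(position, 0.0, 1.0);
}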

Implementing GL_AMD_vertex_shader_layer

Fortunately, Turnip again had it implemented. As with stencil export, it was just a matter of finding the register of VARYING_SLOT_LAYER and plugging it into the respective control structure.

freedreno/a6xx: add support for gl_Layer in vertex shader

Simple enough? Yes! However, it failed the only tests available for the extension:

amd_vertex_shader_layer-layered-2d-texture-render    // Fails
amd_vertex_shader_layer-layered-depth-texture-render // Crashes

The crash when clearing a depth texture is a known one, but why does 2d-texture-render fail? Let’s look at the output:

[Image: output of amd_vertex_shader_layer-layered-2d-texture-render (without noubwc)]

The first layer and its mipmaps are cleared, but on the other layers most mipmaps are not! Mipmaps are nowhere near gl_Layer, so it seems that the implementation is indeed correct; however, the single test we expected to pass fails. Which is bad, because we don't have any other tests for gl_Layer.

Fixing amd_vertex_shader_layer-layered-2d-texture-render

A good starting point is always to check whether one of the FD_MESA_DEBUG options helps. In this case noubwc changed the result of the test to:

[Image: output of amd_vertex_shader_layer-layered-2d-texture-render (with noubwc)]

Not by a lot, but still a lead. Something is wrong with image layouts…

I searched for similar tests for Vulkan and found them in vktDrawShaderLayerTests.cpp. The tests passed on Turnip, which told me that the issue is most likely not in the common layout calculation code. After some hacking of the tests I made one close enough to amd_vertex_shader_layer-layered-2d-texture-render. However, directly comparing traces between GL and Vulkan isn't productive, so I decided to see what changes with different mip levels in Freedreno and Turnip separately.

So I made 4 tests:

  • GL with clearing level 1
  • GL with clearing level 2
  • VK drawing to level 1
  • VK drawing to level 2

With that, comparing "GL level 1" with "GL level 2" and "VK level 1" with "VK level 2" yielded just a few differences. For GL the only difference which could have been relevant was:

Level 1:

RB_MRT[0].ARRAY_PITCH: 8192

Level 2:

RB_MRT[0].ARRAY_PITCH: 4096

However, in VK this value didn't change; there was no difference in the traces at all, barring window/scissor/blit sizes!

RB_MRT[0].ARRAY_PITCH: 45056

In VK it was just a constant 45056 regardless of the mip level. Hardcoding that value in Freedreno fixed the test. After that it was a matter of comparing how the field is calculated in Turnip and Freedreno, resulting in:

freedreno/a6xx: fix array pitch for layer-first layouts
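
The underlying difference is easy to picture: in a level-first layout (all layers of mip 0, then all layers of mip 1, and so on) the distance between two consecutive array layers depends on the mip level being rendered, while in a layer-first layout (all mip levels of layer 0, then all mip levels of layer 1) that distance is the full per-layer size and therefore constant, which is exactly what the traces showed. A toy sketch of the two orders (made-up sizes; this is not the actual Mesa layout code):

# Toy illustration of ARRAY_PITCH under the two layout orders.
# The sizes are made up; this is not the actual Mesa layout code.
mip_sizes = [32768, 8192, 2048, 1024, 512]  # bytes of one layer at each mip level

def array_pitch_level_first(level):
    # Level-first: all layers of a given level are adjacent, so the pitch
    # between layers shrinks together with the mip level.
    return mip_sizes[level]

def array_pitch_layer_first(level):
    # Layer-first: each layer stores all of its mip levels contiguously,
    # so the pitch between layers is the same for every level.
    return sum(mip_sizes)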

The layered clear

With the above changes the layered clear was simple enough:

freedreno/a6xx: support layered framebuffers in blitter_clear

gl-3.2-minmax or bumping the maximum count of varyings

Before this, Freedreno supported only 16 vec4 varyings, or 64 components, between shader stages. GL 3.2, on the other hand, mandates a minimum of 128 components. After a couple of trivial fixes I thought it would be easy, but Marek Olšák pointed out that I didn't cover all cases with my tests. And of course one of them, the passing of varyings between VS and TCS, hung the GPU.

I ran a few tests, changing only the varyings count, and observed that the GPU hung starting from 17 vec4s. Comparing the traces, I didn't find anything suspicious in the shaders, so I decided to check a few states that depended on the varyings count. After poking a few of them I found that setting SP_HS_UNKNOWN_A831 above 64 resulted in a guaranteed hang.

In Turnip, SP_HS_UNKNOWN_A831 had a special case for A650 which made me a bit suspicious. However, using A650's formula for A630 didn't change anything. It was time to see what the blob driver does, so I wrote a script to iterate from 1 to 32 vec4s: push the test to the device, run it, copy the trace back, extract the A831 value, and append it to a CSV. Here is the result:

vec4 count | A831 (hex) | A831 (dec) | PC_HS_INPUT_SIZE
-----------+------------+------------+-----------------
         1 |       0x10 |         16 |              0xc
         2 |       0x14 |         20 |              0xf
         3 |       0x18 |         24 |             0x12
         4 |       0x1c |         28 |             0x15
         5 |       0x20 |         32 |             0x18
         6 |       0x24 |         36 |             0x1b
         7 |       0x28 |         40 |             0x1e
         8 |       0x2c |         44 |             0x21
         9 |       0x30 |         48 |             0x24
        10 |       0x34 |         52 |             0x27
        11 |       0x38 |         56 |             0x2a
        12 |       0x3c |         60 |             0x2d
        13 |       0x3f |         63 |             0x30
        14 |       0x40 |         64 |             0x33
        15 |       0x3d |         61 |             0x36
        16 |       0x3d |         61 |             0x39
        17 |       0x40 |         64 |             0x3c
        18 |       0x3f |         63 |             0x3f
        19 |       0x3e |         62 |             0x42
        20 |       0x3d |         61 |             0x45
        21 |       0x3f |         63 |             0x48
        22 |       0x3d |         61 |             0x4b
        23 |       0x40 |         64 |             0x4e
        24 |       0x3d |         61 |             0x51
        25 |       0x3f |         63 |             0x54
        26 |       0x3c |         60 |             0x57
        27 |       0x3e |         62 |             0x5a
        28 |       0x40 |         64 |             0x5d
        29 |       0x3c |         60 |             0x60
        30 |       0x3e |         62 |             0x63
        31 |       0x40 |         64 |             0x66

So A831 is indeed capped at 64, and after the first time it reaches 64 the relation ceases to be linear. Ugh…

Without having any good ideas, I posted the results on GitLab, asking Jonathan Marek, who had added the special case for A650 some time ago, for help. The previous A650 formula was:

prims_per_wave = wavesize // tcs_vertices_out
total_size = output_size * patch_control_points * prims_per_wave
unknown_a831 = DIV_ROUND_UP(total_size, wavesize)

After looking at the table above Jonathan Marek suggested the following formula:

prims_per_wave = wavesize // tcs_vertices_out
while True:
    total_size = output_size * patch_control_points * prims_per_wave
    unknown_a831 = DIV_ROUND_UP(total_size, wavesize)
    if unknown_a831 <= 0x40:
        break
    prims_per_wave -= 1

And it worked! What the new formula means is that there is A831 * wavesize of space available per wave for varyings; the more varyings we use in a shader, the fewer threads the wave will have. For TCS we use a wave size of 64, so the total size available would be

64 (wave size) * 64 (max A831) * 4 = 16k
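
Expressed as a standalone function, the final formula looks roughly like this (a Python sketch mirroring the pseudocode above; the actual output_size, patch_control_points and tcs_vertices_out values come from the shaders in use):

# Sketch of the final A831 computation: shrink the number of primitives
# per wave until the per-wave varying storage fits under the 0x40 cap.
def div_round_up(n, d):
    return (n + d - 1) // d

def unknown_a831(output_size, patch_control_points, tcs_vertices_out, wavesize=64):
    prims_per_wave = wavesize // tcs_vertices_out
    while prims_per_wave > 0:
        total_size = output_size * patch_control_points * prims_per_wave
        a831 = div_round_up(total_size, wavesize)
        if a831 <= 0x40:
            return a831
        prims_per_wave -= 1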

freedreno/a6xx: bump varyings limit (!7917)

Bumping OpenGL support to 3.3

There was still a considerable amount of Piglit failures, but they were either already known or not important enough to prevent the version increase. Which was done in

freedreno: Enable GLSL 3.30, updating us to GL 3.3 contexts (!8270)

by Danylo Piliaiev at February 10, 2021 10:00 PM

February 09, 2021

Andrés Gómez

Replaying 3D traces with piglit


If you don’t know what traces-based rendering regression testing is, read the appendix before continuing.


The Mesa community has witnessed an explosion of interest in Continuous Integration in the last two years.

In addition to checking the proper building of the project, integrating the testing of its functional correctness has become a priority. The user space graphics drivers exhibit a wide variety of types of tests and test suites. One kind of those tests is traces-based rendering regression testing.

The public effort to add this kind of tests into Mesa’s CI started with this mail from Alexandros Frantzis.

At some point, we had support for replaying OpenGL, Vulkan and D3D11 traces using apitrace, RenderDoc and GFXReconstruct with the in-tree tool tracie. However, it was a very custom solution tailored to the needs of Mesa, so I proposed to move this codebase and integrate it into the piglit test suite. It was a natural step forward.

This is how replayer was born into piglit.

replayer

The first step to test a trace is, actually, obtaining a trace. I won’t go into the details about how to create one from scratch. The process is well documented on each of the tools listed above. However, the Mesa community has been collecting publicly distributable traces for a while and placing them in traces-db whose CI is copying them to Freedesktop.org’s MinIO instance.

To make things simple, once we have built and installed piglit, if we would like to test an OpenGL trace created with apitrace, we can download it from there with:

$ replayer.py download \
 	 --download-url https://minio-packet.freedesktop.org/mesa-tracie-public/ \
 	 --db-path ./traces-db \
 	 --force-download \
 	 glxgears/glxgears-2.trace

The parameters are self explanatory. The downloaded trace will now exist at ./traces-db/glxgears/glxgears-2.trace.

The next step will be to dump an image from the trace. Since it is a .trace file, we will need apitrace installed on the system. If we do not specify the call(s) from which to dump the image(s), we will just get the last frame of the trace:

$ replayer.py dump ./traces-db/glxgears/glxgears-2.trace

The dumped PNG image will be at ./results/glxgears-2.trace-0000001413.png. Notice that the number suffix is the snapshot id from the trace.

Dumping from a trace may result in a range of different possible images. One example is when the trace makes use of uninitialized values, leading to undefined behaviors.

However, since the original aim was performing pre-merge rendering regression testing in Mesa’s CI, the idea is that replaying any of the provided traces should be quick and the dumped image should be consistent. In other words, if we dump the same frame of a trace several times with the same GFX stack, the image will always be the same.

With this precondition, we can test whether 2 different images are the same just by hashing their contents. replayer can obtain the hash for the generated dumped image:

$ replayer.py checksum ./results/glxgears-2.trace-0000001413.png 
f8eba0fec6e3e0af9cb09844bc73bdc8
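
The 32 hex digits suggest an MD5 digest. As a minimal sketch of the idea (hashing the decoded pixels via Pillow, which makes the result independent of the PNG encoder; whether replayer hashes pixels or raw file bytes is an implementation detail of the tool):

# Illustrative only: one plausible way to checksum a dumped image.
import hashlib
from PIL import Image

def image_checksum(path):
    with Image.open(path) as image:
        return hashlib.md5(image.tobytes()).hexdigest()

print(image_checksum('./results/glxgears-2.trace-0000001413.png'))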

Now, if we build a different commit of Mesa, we can check the image generated at this new point against the previously generated reference image. If everything goes well, we will see something like:

$ replayer.py compare trace \
 	 --download-url https://minio-packet.freedesktop.org/mesa-tracie-public/ \
 	 --device-name gl-vmware-llvmpipe \
 	 --db-path ./traces-db \
 	 --keep-image \
 	 glxgears/glxgears-2.trace f8eba0fec6e3e0af9cb09844bc73bdc8
[dump_trace_images] Info: Dumping trace ./traces-db/glxgears/glxgears-2.trace...
[dump_trace_images] Running: apitrace dump --calls=frame ./traces-db/glxgears/glxgears-2.trace
// process.name = "/usr/bin/glxgears"
1384 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)

1413 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)

error: drawable failed to resize: expected 1515x843, got 300x300
[dump_trace_images] Running: eglretrace --headless --snapshot=1413 --snapshot-prefix=./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace- ./blog-traces-db/glxgears/glxgears-2.trace
Wrote ./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace-0000001413.png

OK
[check_image]
    actual: f8eba0fec6e3e0af9cb09844bc73bdc8
  expected: f8eba0fec6e3e0af9cb09844bc73bdc8
[check_image] Images match for:
  glxgears/glxgears-2.trace

PIGLIT: {"images": [{"image_desc": "glxgears/glxgears-2.trace", "image_ref": "f8eba0fec6e3e0af9cb09844bc73bdc8.png", "image_render": "./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace-0000001413-f8eba0fec6e3e0af9cb09844bc73bdc8.png"}], "result": "pass"}

replayer‘s compare subcommand is the one that spits out a piglit-formatted test expectations output.

Putting everything together

We can make the whole process much simpler by passing replayer a YAML tests list file. For example:

$ cat testing-traces.yml
traces-db:
  download-url: https://minio-packet.freedesktop.org/mesa-tracie-public/

traces:
  - path: gputest/triangle.trace
    expectations:
      - device: gl-vmware-llvmpipe
        checksum: c8848dec77ee0c55292417f54c0a1a49
  - path: glxgears/glxgears-2.trace
    expectations:
      - device: gl-vmware-llvmpipe
        checksum: f53ac20e17da91c0359c31f2fa3f401e
$ replayer.py compare yaml \
 	 --device-name gl-vmware-llvmpipe \
 	 --yaml-file testing-traces.yml 
[check_image] Downloading file gputest/triangle.trace took 5s.
[dump_trace_images] Info: Dumping trace ./replayer-db/gputest/triangle.trace...
[dump_trace_images] Running: apitrace dump --calls=frame ./replayer-db/gputest/triangle.trace
// process.name = "/home/anholt/GpuTest_Linux_x64_0.7.0/GpuTest"
397 glXSwapBuffers(dpy = 0x7f0ad0005a90, drawable = 56623106)

510 glXSwapBuffers(dpy = 0x7f0ad0005a90, drawable = 56623106)


/home/anholt/GpuTest_Linux_x64_0.7.0/GpuTest
[dump_trace_images] Running: eglretrace --headless --snapshot=510 --snapshot-prefix=./results/trace/gl-vmware-llvmpipe/gputest/triangle.trace- ./replayer-db/gputest/triangle.trace
Wrote ./results/trace/gl-vmware-llvmpipe/gputest/triangle.trace-0000000510.png

OK
[check_image]
    actual: c8848dec77ee0c55292417f54c0a1a49
  expected: c8848dec77ee0c55292417f54c0a1a49
[check_image] Images match for:
  gputest/triangle.trace

[check_image] Downloading file glxgears/glxgears-2.trace took 5s.
[dump_trace_images] Info: Dumping trace ./replayer-db/glxgears/glxgears-2.trace...
[dump_trace_images] Running: apitrace dump --calls=frame ./replayer-db/glxgears/glxgears-2.trace
// process.name = "/usr/bin/glxgears"
1384 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)

1413 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)


/usr/bin/glxgears
error: drawable failed to resize: expected 1515x843, got 300x300
[dump_trace_images] Running: eglretrace --headless --snapshot=1413 --snapshot-prefix=./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace- ./replayer-db/glxgears/glxgears-2.trace
Wrote ./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace-0000001413.png

OK
[check_image]
    actual: f8eba0fec6e3e0af9cb09844bc73bdc8
  expected: f8eba0fec6e3e0af9cb09844bc73bdc8
[check_image] Images match for:
  glxgears/glxgears-2.trace

replayer also features the query subcommand, which is just a helper to read the YAML files with the tests configuration.
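
Conceptually, reading those files boils down to very little code. Here is a hedged sketch (using PyYAML and the testing-traces.yml layout shown above; the real query subcommand is the authoritative implementation):

# Print the expected checksum of each trace for a given device,
# following the YAML structure shown earlier.
import yaml

with open('testing-traces.yml') as f:
    description = yaml.safe_load(f)

for trace in description['traces']:
    for expectation in trace['expectations']:
        if expectation['device'] == 'gl-vmware-llvmpipe':
            print(trace['path'], expectation['checksum'])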

Testing the other kinds of supported 3D traces doesn’t change much from what’s shown here. Just make sure to have the needed tools installed: RenderDoc, GFXReconstruct, the VK_LAYER_LUNARG_screenshot layer, Wine and DXVK. A good reference for building, installing and configuring these tools is Mesa’s GL and VK test containers building scripts.

replayer also accepts several configuration options to tweak how it behaves and where to find the actual tracing tools needed for replaying the different types of traces. Make sure to check the replay section in piglit’s configuration example file.

replayer‘s README.md file is also a good read for further information.

piglit

replayer is a test runner in a similar fashion to shader_runner or glslparsertest. What we are still missing is how it integrates with piglit so we can do piglit runs that produce piglit-formatted results.

This is done through the replay test profile.

This profile needs a couple of configuration values. The easiest approach is just to set the PIGLIT_REPLAY_DESCRIPTION_FILE and PIGLIT_REPLAY_DEVICE_NAME env variables. They are self explanatory, but make sure to check the documentation for this and other configuration options for this profile.

The following example features a similar run to the one done above invoking directly replayer but with piglit integration, providing formatted results:

$ PIGLIT_REPLAY_DESCRIPTION_FILE=testing-traces.yml PIGLIT_REPLAY_DEVICE_NAME=gl-vmware-llvmpipe piglit run replay -n replay-example replay-results
[2/2] pass: 2   
Thank you for running Piglit!
Results have been written to replay-results

We can create a summary based on the results:

# piglit summary console replay-results/
trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace: pass
trace/gl-vmware-llvmpipe/gputest/triangle.trace: pass
summary:
       name: replay-example
       ----  --------------
       pass:              2
       fail:              0
      crash:              0
       skip:              0
    timeout:              0
       warn:              0
 incomplete:              0
 dmesg-warn:              0
 dmesg-fail:              0
    changes:              0
      fixes:              0
regressions:              0
      total:              2
       time:       00:00:00

Creating an HTML summary may also be interesting, especially when finding failures!

Wishlist

  • Through different backends, replayer supports running apitrace, RenderDoc and GFXReconstruct traces. We may want to support other tracing tools in the future. The dummy backend used for functional testing is a good starting point when writing a new backend.
  • The solution chosen for checking whether we detect a rendering regression depends on having consistent results, as said before. It’d be great if we could add a secondary testing method for when the expected rendered image is variable. Off the top of my head, using exclusion masks could be a quick single-run solution when we know which specific areas in a rendered scenario are the ones fluctuating. For more complex variations, a multi-run based solution seems to be the best option. EzBench has a great statistical approach for this!
  • The current syntax of the YAML test list files implies running the compare subcommand with the default behavior of checking against the last frame of the tested trace. This means first figuring out the call number of the last frame. It would be great to support providing the call numbers directly in the YAML files, to be able to test more than just the last frame and, additionally, cut down the time taken to run the test.
  • The generated HTML summary allows us to see the reference and generated images side by side when a test fails. It’d be great to also have some easy way of checking their differences. Using Rembrandt.js could be a possible solution.

Thanks a lot to the whole Mesa community for helping with the creation of this tool. Alexandros Frantzis, Rohan Garg and Tomeu Vizoso did a lot of the initial development for the in-tree tracie tool, and Dylan Baker was very patient reviewing my patches for the piglit integration.

Finally, thanks to Igalia for allowing me to work on this.


Appendix

In 3D computer graphics we say “traces”, for short, to name the files generated by 3D API capturing tools, which store not only the calls to the specific 3D API but also the internal state of the 3D program during the capturing process: shaders, textures, buffers, etc.

Being able to “record” the execution of a 3D program is very useful. Usually, it allows us to replay the execution without needing the original program from which we generated the trace; it also allows in-depth analysis for debugging and performance optimization; it’s a very good solution for sharing with other developers; and, in some cases, it allows us to check how the replay will happen with different GPUs.

In this post, however, I focus on a specific usage: rendering regression testing.

When doing a regression test what we would do is compare a specific metric obtained by replaying the trace with a specific version of the GFX software stack against the same metric obtained from a different version of the GFX stack. If the value of the metric changes we have found a regression (or an improvement!).

To make things simpler, we would like to check changes happening just in one of the many elements of the software stack. The most relevant component is the user space driver. In particular, I care about the Mesa drivers and the GNU/Linux stack.

Mainly, there are two kinds of regression testing we can do with a trace: performance or rendering regression testing. When doing a performance one, the checked metric(s) are usually in terms of speed or memory usage. In the case of the rendering ones, what we do is compare the rendered output at one (or many) points during the trace replay. This output, a bitmap image, is the metric that we compare between two different points of the Mesa driver. If the images differ, we may have found a regression: artifacts, improper colors, etc. Or an enhancement, if the reference image is the one featuring any of these problems.

by tanty at February 09, 2021 08:47 AM

February 03, 2021

Eleni Maria Stea

Build WebKit/WPE on Linux/X11

There’s a lot of documentation online about building Webkit/WPE on Linux. But as most instructions are targeting embedded platform developers, the focus is on building Webkit with Wayland using the flatpak-sdk to automate and speed up the building process. As the steps I’ve followed to build it on my X11 system and run the Webkit/WPE … Continue reading Build WebKit/WPE on Linux/X11

by hikiko at February 03, 2021 09:41 PM

Alejandro Piñeiro

v3dv status update 2021-02-03

So some months have passed since our last update, when we announced that v3dv became Vulkan 1.0 conformant. The main reason for not publishing more posts is that we saw the 1.0 checkpoint as a good moment to hold off on adding new big features, and focus on improving the codebase (refactoring, clean-ups, etc.) and the already existing features. For the latter we did a lot of work on performance. That alone would deserve a specific blog post, so in this one I will summarize the other stuff we did.

New features

Even if we didn’t focus on adding new features, we were still able to add some:

  • The following optional 1.0 features were enabled: logicOp, alphaToOne, independentBlend, drawIndirectFirstInstance, and shaderStorageImageExtendedFormats.
  • Added support for timestamp queries.
  • Added implementations for the VK_KHR_maintenance1, VK_EXT_private_data, and VK_KHR_display extensions.
  • Added support for Wayland WSI.

Here I would like to highlight that we started to get feature contributions from outside the initial core of developers that created the driver. VK_KHR_display was submitted by Steven Houston, and Wayland WSI support was submitted by Ella-0. Thanks a lot for it, really appreciated! We hope this begins a trend of having more things implemented by the rpi/mesa community as a whole.

Bugfixing and vulkan tools

Even though the driver got conformant, we were still testing it with several demos and applications, and provided fixes. As an example, we got Sascha Willem’s oit (Order Independent Transparency) demo working:

Sascha Willem’s oit demo on the rpi4

Among those applications that we were testing, we can highlight renderdoc and gfxreconstruct. The former is a frame-capture based graphics debugger and the latter is a tool that allows capturing and replaying several frames. Both tools are heavily used when debugging and testing vulkan applications. We verified that they work on the rpi4 (fixing some bugs along the way), and also started to use them to help and guide the performance work we are doing.

Fosdem 2021

If you are interested in an overview of the development of the driver during the last year, we are going to present “Overview of the Open Source Vulkan Driver for Raspberry Pi 4” at FOSDEM this weekend (presentation details here).

Previous updates

Just in case you missed any of the updates of the vulkan driver so far:

Vulkan raspberry pi first triangle
Vulkan update now with added source code
v3dv status update 2020-07-01
V3DV Vulkan driver update: VkQuake1-3 now working
v3dv status update 2020-07-31
v3dv status update 2020-09-07
Vulkan update: we’re conformant!

by infapi00 at February 03, 2021 11:13 AM

January 27, 2021

Manuel Rego

:focus-visible in WebKit - January 2021

Let’s do a small introduction as this is a kind of special post.

As you might already know, last summer Igalia launched the Open Prioritization experiment, and :focus-visible in WebKit was the winner according to the pledges that the different projects got. Now it has moved into collecting funds stage, so far we’ve reached 80% of the goal and Igalia has already started to work on this. If you are interested and want to help sponsoring this work, please visit the project page at Open Collective.

In our regular client projects in Igalia, we provide periodic progress reports about the status of tasks and next plans. This blog post is a kind of monthly report, but this time the project has many customers, so it looks like this is a better format to share information about the status of things. Thank you all for supporting us in this development! 🙏

Understanding :focus-visible

Disclaimer: This is not a blog post explaining how :focus-visible works or the implications it has, you can read other articles if you’re looking for that.

First things first, my initial thought was that :focus-visible was a pseudo-class which would match an element when the browser natively shows a focus indicator (focus ring, outline) for a focused element of a page. And that’s more or less what the spec says in its first sentence:

The :focus-visible pseudo-class applies while an element matches the :focus pseudo-class and the user agent determines via heuristics that the focus should be made evident on the element.

The key part here is that the native behavior deliberately doesn’t show the focus indicator in some situations when the :focus pseudo-class matches, mainly because usability studies indicate that showing it in all cases is not what the user expects and wants. Before :focus-visible, web authors had no way to use the same criteria to style the focus indicator only when it’s going to be shown natively, while still keeping the website accessible.

Apart from that, the spec has a set of heuristics that, despite being non-normative, all implementations seem to follow. Briefly summarized, they’d be something like:

  • If you use the mouse to focus an element it won’t match :focus-visible.
  • If you use the keyboard it’ll match :focus-visible.
  • Elements that support keyboard input (like <input> or contenteditable) always match :focus-visible.
  • When a script focuses a new element, it’ll match :focus-visible or not depending on the previously active element.

This is just a quick & dirty summary; please read the spec for all the details. There have been years of research around these topics (how focus should work or not in the different use cases, what the users’ and accessibility needs are, how websites are managing focus, etc.) and these heuristics are somehow the result of all that work.

:focus-visible in the default UA style sheet

At this point it looks like we can more or less understand what :focus-visible is about. So let’s start playing with it. The definition seems very clear, but when testing things in the current implementations (Chromium and Firefox) you might find some unexpected situations.

Let’s use a very basic example:

<style>
  :focus-visible { background: lime; }
</style>
<div tabindex="0">Focus me.</div>

If you focus the <div> with a mouse click, :focus-visible doesn’t match per spec, so in this case the background doesn’t become green (if you use the keyboard to focus it, it will match :focus-visible and the background will be green). This works the same in Chromium and Firefox, but Chromium, despite the element not matching :focus-visible, shows a focus indicator. Somehow the first spec definition is already not working as expected on Chromium… The issue here is that Chromium still uses :focus { outline: auto; } in the default UA style sheet, and the element matches :focus after the mouse click; that’s why it’s showing a focus indicator while not matching :focus-visible.

Actually this was already on the spec, but Chromium is not following that yet:

User agents should also use :focus-visible to specify the default focus style, so that authors using :focus-visible will not also need to disable the default :focus style.

There was already a related CSSWG issue on the topic, as the spec currently suggests the following code:

:focus:not(:focus-visible) {
  outline: 0;
}

This works as a kind of workaround for this issue, but if the default UA style sheet uses :focus-visible that won’t be needed.

Anyway, I’ve reported the Chromium bug and created WPT tests. During the tests review, Emilio Cobos realized that this needed a change in the HTML spec and wrote a PR to update it. After some discussion with Alice Boxhall, the HTML change and the tests were approved and merged. I was even brave enough to write a patch to change this in Chromium, which is still under review and needs some extra work.

The tests

WebKit is the third browser engine adding support for :focus-visible so the first natural step was to import the WPT tests related to the feature. This looked like something simple but it ended up needing some work to improve the tests.

There was a bunch of :focus-visible tests already in the WPT suite, but they needed some love:

  • Some parts of the spec were not covered by tests, so I added some new tests.
  • Some tests were passing in WebKit, even when there was not an implementation yet, so I modified them to fail if there’s no :focus-visible support.

Then I imported the tests in WebKit and discovered a bug related to the focus event and the :focus pseudo-class: :focus was not matching inside the focus event handler. This is probably not important for web authors, but the :focus-visible tests were relying on it. Actually this had been fixed in Chromium more than 5 years ago, so first I moved the Chromium internal test to WPT and then used it to fix the problem in WebKit.

Once the tests were imported in WebKit, the problem was that a bunch of them were timing out on Mac platforms. After investigating the issue I realized that it’s because the focus event is not dispatched when you click on a button on Mac. Digging deeper I found this old bug with lots of discussions on the topic; it looks like this is done to keep alignment with the native platform behavior, and also in order to avoid showing a focus indicator. Even Firefox has the same behavior on Mac. However, Chromium always dispatches the event independently of the platform. This means that some of the tests don’t work automatically on Mac, as they wait for a focus event that is never dispatched. Anyway, maybe once :focus-visible is implemented the possibility of modifying this behavior could be rediscussed, though it might not be possible anyway. In any case, the WebKitGTK port, the one I’m using for the development, does trigger the focus event in this case; and I’ve also changed the WPE port to do the same (maybe the Windows port will follow too).

One more thing about the tests: lots of these :focus-visible tests use testdriver.js to simulate user actions. For example, for clicking an element they use test_driver.click(element); however, that simple instruction causes some kind of memory leak in Chromium when running the tests. The actual Chromium bug hasn’t been fixed yet, but I landed some workarounds that prevent the issue in these tests (waiting for the promise to be resolved before marking the test as done).

Status of WPT :focus-visible tests in the different browsers

To close the tests part, you can check the status on wpt.fyi; most of them are passing in all implementations, which is great, but there are some interoperability issues that we’ll review next.

Interop issues

As I mentioned, the wpt.fyi website helps to easily identify the interop issues between the different implementations.

  • :focus-visible on the default UA style sheet: This has already been commented on above, and it is the reason why Chromium fails the focus-visible-018.html test. Firefox fails focus-visible-017.html because the default UA style sheet mentions outline: auto, but Firefox uses a dotted outline.

  • :focus-visible on <select> element: There’s a Firefox failure on focus-visible-002.html because it doesn’t match :focus-visible when you click a <select> element. I opened a CSSWG issue to discuss this, and initially thought that the agreement was that Firefox’s behavior is the right one. So I wrote a patch to change Chromium’s behavior and update the tests, but during the review I was pointed to a Chromium bug about this topic that was closed as WONTFIX; the reason is that when you click a <select> element you can type letters to select an option from the keyboard. Right now the discussion has been reopened, and we’ll need to wait for the final resolution on the topic to see which is the right implementation.

  • Keyboard interaction once an element is focused: This is tested by focus-visible-007.html. The example here is that you click an element to focus it; initially the element doesn’t match :focus-visible, but then you use the keyboard (for example, you type a letter). In that situation Chromium will start matching :focus-visible while Firefox won’t. The spec is quite explicit on the topic, so it looks like a Firefox bug:

    If the user interacts with the page via the keyboard, the currently focused element should match :focus-visible (i.e. keyboard usage may change whether this pseudo-class matches even if it doesn’t affect :focus).

  • Programmatic focus and :focus-visible: What should happen with :focus-visible when the website uses element.focus() from a script to move the focus? The spec has some heuristics that depend on whether the active element before focus() is called was matching (or not) :focus-visible. But I’ve opened a CSSWG issue to discuss what should happen when there’s no active element. The discussion is still ongoing, and depending on the outcome there might be changes in the current implementations. Right now there are some subtle differences between Chromium and Firefox here.

:-webkit-direct-focus?

Probably you don’t know what that is, but it’s somehow related to :focus-visible, so I believe it’s worth mentioning here.

WebKit is the browser with the best :focus pseudo-class support on Shadow DOM (see the WPT tests results). The issue here is that the ShadowRoot should match :focus if any of its descendants are focused, so if you have an <input> element in the Shadow Tree and you focus it, you’ll have 2 elements matching :focus: the ShadowRoot and the <input>.

<style>
  #host {  padding: 1em; background: lightgrey; }
  #host:focus { background: lime; }
</style>
<div id="host"></div>
<script>
  shadowRoot = host.attachShadow(
    {mode: 'open', delegatesFocus: true});
  shadowRoot.innerHTML =
    '<input value="Focus me">';
</script>

In Chromium if you use delegatesFocus=true in element.attachShadow(), and you have an example like the one described above, you’ll get two focus indicators, one in the ShadowRoot and one in the <input>. Firefox doesn’t match :focus in the ShadowRoot so the issue is not present there.

WebKit matches :focus independently of the delegatesFocus value (which is the right behavior per spec), so it’d be even more common to end up with two focus indicators. To avoid that, WebKit introduced the :-webkit-direct-focus pseudo-class, which is not web exposed, but is used in the default UA style sheet to avoid this bad effect of having a focus indicator on the ShadowRoot.

I believe the :focus-visible spec should describe that behavior regarding how it works on ShadowRoot, so it doesn’t match in those situations. That way WebKit could get rid of :-webkit-direct-focus and use :focus-visible once it’s implemented. I’ve reported a CSSWG issue to discuss this topic.

WIP implementation

So far I haven’t talked about the implementation at all, but the reason is that all the previous work is required in order to do a proper implementation, with good quality, that is interoperable between the different browsers. :focus-visible is a new feature, and despite all the interop mess regarding how focus works in the different browsers and platforms, we should aim for a :focus-visible implementation that is as interoperable as possible.

Despite all this related work, I’ve also found some time to work on a patch. It’s still not ready to be sent upstream, but it’s already doing some things and passing some of the WPT tests. Of course several things are still missing, but next you can see a quick screen recording with :focus-visible working on WebKit.

:focus-visible example running on WebKitGTK MiniBrowser

Some numbers

I know this is not really relevant, but it helps to get a grasp on what has been happening during this month:

  • 3 CSSWG issues reported.
  • 13 PRs merged in WPT.
  • 5 patches landed in WebKit.
  • 4 patches landed in Chromium.
  • And many discussions with different people, special thanks to Alice and Emilio that have been really helpful.

Next steps

The plan for February is to try to find an agreement on the CSSWG issues, close them, and update the WPT tests accordingly. Maybe this work could even include landing some patches in the current implementations. And of course, focus (pun intended) the effort on the implementation of :focus-visible in WebKit.

I hope this blog post helps you understand better the work that goes on behind the scenes when a web platform feature is implemented, especially if you want to do it in a way that ensures browser interoperability and reduces web authors’ pain.

If you enjoyed this project update, stay tuned as there will be more in the future.

January 27, 2021 11:00 PM

January 24, 2021

Brian Kardell

Optimizing The W3C Sandwiches

In which I muse about a boring, but, I think important topic of flawed W3C structures and process, in a hopefully not-boring way...

Imagine that there is a trade organization which holds a marathon meeting event for its 500 members every year. It's a long meeting, and so during it, they serve sandwiches. A break, with food and peers, it seems, is really conducive to productivity. So, while it's a little non-obvious - small and with no special 'powers' - the sandwiches are actually really important. The trade organization realizes this and decides to formalize a way to make sure we always have good sandwiches. A few weeks before, they send out an email asking people:

Please tell us an ingredient which you could provide enough of for every sandwich for the next 3 years. Your ingredient may or may not be used, we are just looking for ingredient nomination statements expressing what your ingredient brings to the table. If your ingredient is chosen, we will tell you whether that is a 1, 2 or 3 year commitment.

Even if everyone doesn't really understand how inter-connectedly important the sandwiches are, they aren't entirely apathetic about them either: When it comes time to eat, and everyone's hungry after long meetings, there will definitely be opinions about the sandwiches.

BUT... At the same time, there will incredibly be very few ingredient nominations. Why? So many reasons!

Even if a person really had some vague notion that "a lot of my favorite sandwiches had pickles", their company is small, and it doesn't actually deal with pickles. After a moment or three worth of thought about "which pickles?" and "do other people even like pickles?" and "how would I ask my boss for money to offer 1000 pickles... maybe" - this is quickly abandoned as not really worth the time. Each person receiving this email knows there are 499 other organizations, and that what we need are really just 4 ingredients! Someone else - with the money and food connections will nominate anyways.

And so, that's what happens... Several of them are from some grocery or food service industry. Their businesses totally get the importance of the sandwiches, as well as what people largely "like", and they are structured in a way where this basically isn't a problem for them. Several of them might even discuss among themselves to make sure they're bringing complementary things... They'll nominate things like "lettuce" or "tomatoes", or maybe "cheese". Maybe some others don't discuss, but they do a lot of sandwich-adjacent work and so they still nominate some pretty general condiments that could plausibly be used on a lot of good sandwiches... maybe they offer "mustard" or "mayonnaise". But, then there's also people with other food connections who offer other things. In the end, we wind up with eggs, Vegemite, peanut butter, some gourmet cactus jelly, caviar, and butter. Then, of course - there's someone who offers some not-even-food item.

Now, with the exception of the last one - there's not really something inherently wrong with any of those ingredients - but I think we could at least agree that making a sandwich from random combinations of them is likely to produce a lot of suboptimal sandwiches.

Vote for pickles!

Everyone should have a say here, and we need to account for multiple tastes - so the way organization goes about deciding on which sandwich everyone will have is by voting for ingredients.

Now, again -- not due to apathy about the actual sandwiches, or the results of the meeting that they will affect - only about 1/5th of people will actually vote for ingredients. Why? Again, "this isn't worth my time, surely someone else can figure out sandwiches" plays in heavily. Perhaps you could even say that the lack of participation is an indirect recognition of what a dysfunctional way this is to make a sandwich.

I mean, who really has time to read the ingredient nomination statements, and also to filter out what might be just bull? Who wants to take the time to learn about the dietary benefits of whey, or read the back and forth debates extolling the health virtues of mayonnaise over mustard - or read some people ranting about why the tomato lobby is colluding with the spinach people, putting honest bean farmers out of business - and the inevitable back and forth? Who has time to get entangled in the debate about how maybe the caviar people shouldn't even have been invited in the first place?

Still, the organization believes that members should make the decision, and, it seems there are a few ways we could ask this question. So, in early years it was asked like this:

Here are 12 ingredients; everybody pick 4, and we'll use some of the same ingredients from last year, plus the top 4 vote getters here, to make everyone sandwiches!

The initial sandwich reviews seemed pretty good, actually. Some people even praised the first sandwiches' 'distinguished tastes'. But, slowly, a lot of people began admitting that really, they actually preferred something a little more boring, or less healthy, but had been kind of afraid to admit it.

Also, it was really kind of a crap shoot. There was really nothing preventing us from getting a real mishmash of random ingredients that didn't go together at all. Well, except for the fact that, in the end, most people were casting 4 votes, representing all of the ingredients of an actual sandwich. Nobody probably picked the combination of vegemite, mustard, butter and caviar on purpose - even if they really liked some of those. This meant that there were at least some odds that lots of "common sandwich" votes were somewhat compatible.

But really... that still involved a lot of luck, and after a few years, people began to kind of resent the sandwiches. The quality of productivity, collaboration and relationships at the event suffered.

How about a nice club sandwich?

But then, some people realized... Hey... Maybe a better way would be to try to talk to lots of people and put together a proposal for a good sandwich. The whole sandwich. Let's call them the "Whole Sandwich People". Like, can we get enough people to vote the same way, by just saying:

Hi. We've taken the time to help make sure we had good ingredients, and looking at everything that is available and trying to balance lots of things, we're recommending this nice club sandwich - <here's why>. If that sounds good, order the club.

Well, that question is a lot easier to answer, so at least some additional people say "yes, please, that sounds good". The simple volume of several new people voting in the same direction was enough to generally make sure that those sandwiches won.

Cool! The Whole Sandwich People also cared about variety and nutrition and all of the other things and so began actively working to make sure that good, compatible, but not commonly offered ingredients got nominated. And, we got some more interesting sandwiches because of it.

The Exclusive Club Sandwich

In the end, not everyone was happy though. Some kinds of ingredients just weren't getting picked anymore. People began asking: How can our trade organization possibly represent the tastes of the people who always offer us caviar if we never pick caviar!? Your club sandwich is an exclusive club, it's biased toward the food service people!

The Whole Sandwich People suggested that this seemed like a weird take. In fact, they pointed to the fact that sandwiches had gotten more diverse, interesting and nutritious by discussing the whole sandwich. Sure, a lot of the ingredients were coming from food service companies, but that's just a practical reality of other parts of the dysfunctional system.

And, yes, the Whole Sandwich People admitted, it's true, none of our sandwiches have included caviar. But, where is the bug? Couldn't the caviar people work with others to find a way to offer an acceptable Whole Sandwich with caviar? It might be plausible - it's not even that we necessarily hate caviar! The truth is, we just don't know how to make a good sandwich with it, and all of our sandwich experiences with it so far are bad. Or, if you can't offer a whole sandwich with caviar that people will order willingly.... Maybe just find something to offer other than caviar.

This, it seems, was unacceptable. It was seen as unfair. The Whole Sandwich People held too much sway over sandwich determination. So, in an attempt to improve the fairness of things, the trade organization changed the question to:

Here are 9 ingredients; everybody rank them in order of preference, and then we will use a complex system whereby one of the things you ranked will be counted, and we'll use the 4 winners to make everyone sandwiches

This amplifies the likelihood that things like caviar wind up on our sandwiches. That's literally the feature.

Why? Because there is no single ingredient that makes a club sandwich. While everyone ranks the widely approved ingredients, they inevitably have to be in some order, and only one of them will count. Meanwhile, the small but passionate group of people who really love a less common ingredient, and who are maybe even a little irked because they are never picked, just put that as #1 - which definitely counts.

Have we optimized the sandwich? Because I thought that was the goal. I think, no.

The TAG/AB Sandwiches

All of this is an analogy for how the W3C TAG and AB are chosen, and I've laid it out in an attempt to explain why I think this is a really silly way to do it, and why I think it should change: the "best sandwich" isn't created by talking about ingredients in isolation - you have to talk about the sandwich.

Yes, I get that sandwiches are a bit of a strained analogy. TAG isn't exactly a sandwich.... Maybe "a bunch of people going to a restaurant and attempting to pick a shared menu of 4 items based on whatever random stuff is available" is a better one. Or maybe picking a DND party is a better one still...

But, the fuzzy point is mostly the same with any of them: "good" and "bad" are, except at real extremes, not really judgeable in isolation. Sure, there are some "not-actually-even-food" items we can universally say "no thanks" to, but that's also rare, and not really how it works.

What matters in counting is "which one is best", and there is almost never a clear answer to that question without considering the whole. What matters to really knowing what people want is also asking them in such a way that they can actually participate meaningfully.

How big of a problem is this, really?

After all, it's been a few elections now since we changed the question, and while there were some early "surprises", things seem to be generally going alright. We just had a really big TAG election, for example, and people seem generally pretty happy with the results of the actual election part. I know I am. So... Maybe this is much ado about nothing? Is it really worth our time to discuss?

I think, yes. Here's why: good results in recent elections have been a combination of factors that won't always hold and that we shouldn't have to count on. They are as much despite the process as anything (as argued below).

Plus, it's just silly.

How to address this...?

Well, that's the big question. But, one thing seems obvious to me: we need to stop insisting that we can't talk about the sandwich, and instead focus on how we talk about the sandwich all the way through the process.

We Whole Sandwich People have learned that even if the way the question is asked has been changed, it's still possible to help elect whole sandwiches; it's just much harder and requires an astonishing amount of extra coordination, trust, and ultimately still some luck.

The point is that all of the things that we need to happen are already happening, but the process actually fights them. That's broken.

I think we need to move to a more open model, but one that moves impractical noise out of the general discussion for the people for whom it is mostly annoying and/or confusing. I think we need to stop being "procedurally secretive" about nominations. I think we need to enlist a willing group of people who can help cooperatively search for candidates and make sure that we're getting ingredients that are good by a number of metrics. I think this group should work to seek something like consensus on a menu of a few possible, "well-balanced meals" and provide recommendations that allow ACs to work with that by default, or dig into ala carte options and more details only if they really want to.

Further, I think that rotating terms only complicates this. It's very, very simple to re-affirm seats every time - and this really lets us talk about the whole thing at once.

We don't really even have to change a single rule to accomplish any of this in practice, but currently the rules fight it at every step and make it an extraordinarily difficult exercise. We should fix that. I intend to open some issues with some more specific proposals soon, but I'd love to hear anyone's thoughts or work collaboratively with anyone to make those good proposals.

January 24, 2021 05:00 AM

January 20, 2021

Sergio Villar

Flexbox Cats (a.k.a fixing images in flexbox)

In my previous post I discussed my most recent contributions to flexbox code in WebKit, mainly targeted at reducing the number of interoperability issues among the most popular browsers. The ultimate goal was of course to make the life of web developers easier. It got quite some attention (I loved Alan Stearns' description of the post) so I decided to write another one, this time focused on the changes I recently landed in WebKit (Safari’s engine) to improve the handling of elements with aspect ratio inside flexbox, a…

January 20, 2021 10:45 AM