Planet Igalia

October 18, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 7: Reusing a Vulkan vertex buffer from OpenGL

This is the 7th post on OpenGL and Vulkan Interoperability with EXT_external_objects. It’s about another EXT_external_objects use case implemented for Piglit as part of my work for Igalia’s graphics team. In this case a vertex buffer is allocated and filled with data from Vulkan and then it’s used from OpenGL to render a pattern on … Continue reading

by hikiko at October 18, 2020 04:47 PM

[OpenGL and Vulkan Interoperability on Linux] Part 6: We should be able to reuse a Vulkan pixel buffer from OpenGL but not to overwrite it!

This is another blog post on OpenGL and Vulkan Interoperability. It’s not really a description of a new use case as the Piglit test I am going to describe is quite similar to the previous example we’ve seen where we reused a Vulkan pixel buffer from OpenGL. This Piglit test was written because there’s an … Continue reading

by hikiko at October 18, 2020 11:23 AM

[OpenGL and Vulkan Interoperability on Linux] Part 5: A Vulkan pixel buffer is reused by OpenGL

This is the 5th post of the OpenGL and Vulkan interoperability series where I describe some use cases for the EXT_external_objects and EXT_external_objects_fd extensions. These use cases have been implemented inside Piglit as part of my work for Igalia’s graphics team using a Vulkan framework I’ve written for this purpose. And in this 5th post, … Continue reading

by hikiko at October 18, 2020 10:02 AM

October 17, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 4: Using OpenGL to overwrite Vulkan allocated textures.

This is the 4th post on OpenGL and Vulkan Interoperability on Linux. The first one was an introduction to the EXT_external_objects and EXT_external_objects_fd extensions, the second described a simple interoperability use case where a Vulkan allocated texture is filled by OpenGL, and the third was about a slightly more complex use case where a Vulkan … Continue reading

by hikiko at October 17, 2020 08:06 AM

October 16, 2020

Enrique Ocaña

Figuring out corrupt stacktraces on ARM

If you’re developing C/C++ on embedded devices, you might already have stumbled upon a corrupt stacktrace like this when trying to debug with gdb:

(gdb) bt 
#0  0xb38e32c4 in pthread_getname_np () from /home/enrique/buildroot/output5/staging/lib/libpthread.so.0
#1  0xb38e103c in __lll_timedlock_wait () from /home/enrique/buildroot/output5/staging/lib/libpthread.so.0 
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

In these cases I usually give up on gdb and try to solve my problems by adding printf()s and resorting to other tools. However, there are times when you really, really need to know what is in that cursed stack.

On ARM devices, subroutine calls work by setting the return address in the Link Register (LR), so the subroutine knows where to point the Program Counter (PC) register when it returns. Once inside a subroutine, the value of the LR register is saved on the stack (to be restored later, right before the current subroutine returns to the caller) and the register can be used for other tasks (LR is a “scratch register”). This means that the functions in the backtrace are actually there, in the stack, in the form of older saved LRs, waiting for us to get them.

So, the first step is to dump the memory contents of the stack, starting from the address pointed to by the Stack Pointer (SP). Let’s print the first 256 32-bit words and save them to a file from gdb:

(gdb) set logging overwrite on
(gdb) set logging file /tmp/bt.txt
(gdb) set logging on
Copying output to /tmp/bt.txt.
(gdb) x/256wa $sp
0xbe9772b0:     0x821e  0xb38e103d   0x1aef48   0xb1973df0
0xbe9772c0:      0x73d  0xb38dc51f        0x0          0x1
0xbe9772d0:   0x191d58    0x191da4   0x19f200   0xb31ae5ed
...
0xbe977560: 0xb28c6000  0xbe9776b4        0x5      0x10871 <main(int, char**)>
0xbe977570: 0xb6f93000  0xaaaaaaab 0xaf85fd4a   0xa36dbc17
0xbe977580:      0x130         0x0    0x109b9 <__libc_csu_init> 0x0
...
0xbe977690:        0x0         0x0    0x108cd <_start>  0x0
0xbe9776a0:        0x0     0x108ed <_start+32>  0x10a19 <__libc_csu_fini> 0xb6f76969  
(gdb) set logging off
Done logging to /tmp/bt.txt.

Gdb can already name some of the functions (like main()), but not all of them. At least not the ones most interesting for our purpose. We’ll have to look those up by hand.

We first get the memory page mapping of the process (WebKit’s WebProcess in my case) by looking at /proc/<pid>/maps. I’m retrieving it from the device (named metro) via ssh and saving it to a local file. I’m only interested in the code pages, those with executable (‘x’) permissions:

$ ssh metro 'cat /proc/$(ps axu | grep WebProcess | grep -v grep | { read _ P _ ; echo $P ; })/maps | grep " r.x. "' > /tmp/maps.txt

The file looks like this:

00010000-00011000 r-xp 00000000 103:04 2617      /usr/bin/WPEWebProcess
...
b54f2000-b6e1e000 r-xp 00000000 103:04 1963      /usr/lib/libWPEWebKit-0.1.so.2.2.1 
b6f6b000-b6f82000 r-xp 00000000 00:02 816        /lib/ld-2.24.so 
be957000-be978000 rwxp 00000000 00:00 0          [stack] 
be979000-be97a000 r-xp 00000000 00:00 0          [sigpage] 
be97b000-be97c000 r-xp 00000000 00:00 0          [vdso] 
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

Now we process the backtrace to remove the address markers and leave one word per line (note that sponge comes from the moreutils package):

$ cat /tmp/bt.txt | sed -e 's/^[^:]*://' -e 's/[<][^>]*[>]//g' | while read A B C D; do echo $A; echo $B; echo $C; echo $D; done | sed 's/^0x//' | while read P; do printf '%08x\n' "$((16#"$P"))"; done | sponge /tmp/bt.txt

Then we merge and sort both files, so that each address from the stack appears below its corresponding mapping (this is why the addresses were zero-padded to eight digits):

$ cat /tmp/maps.txt /tmp/bt.txt | sort > /tmp/merged.txt

Now we process the resulting file to get each address in the stack with its corresponding mapping:

$ cat /tmp/merged.txt | while read LINE; do if [[ $LINE =~ - ]]; then MAPPING="$LINE"; else echo $LINE '-->' $MAPPING; fi; done | grep '/' | sed -E -e 's/([0-9a-f][0-9a-f]*)-([0-9a-f][0-9a-f]*)/\1 - \2/' > /tmp/mapped.txt

Like this (address in the stack, page start (or base), page end, page permissions, executable file load offset (base offset), etc.):

0001034c --> 00010000 - 00011000 r-xp 00000000 103:04 2617 /usr/bin/WPEWebProcess
...
b550bfa4 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/libWPEWebKit-0.1.so.2.2.1 
b5937445 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/libWPEWebKit-0.1.so.2.2.1 
b5fb0319 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/libWPEWebKit-0.1.so.2.2.1
...

The addr2line tool can give us the exact function an address belongs to, or even the function and source code line if the code has been built with debug symbols. But the addresses addr2line understands are internal offsets within the binary, not absolute memory addresses. We can convert the addresses in the stack to offsets with this expression:

offset = address - page start + base offset
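
For example, taking the b5fb0319 entry from the mapping shown earlier (page start b54f2000, base offset 00000000):

offset = 0xb5fb0319 - 0xb54f2000 + 0x00000000 = 0x00abe319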

I’m using buildroot as my cross-build environment, so I need to pick the library files from the staging directory, because those are the unstripped versions. The addr2line tool is the one from the buildroot cross-compiling toolchain. Written as a script:

$ cat /tmp/mapped.txt | while read ADDR _ BASE _ END _ BASEOFFSET _ _ FILE; do OFFSET=$(printf "%08x\n" $((0x$ADDR - 0x$BASE + 0x$BASEOFFSET))); FILE=~/buildroot/output/staging/$FILE; if [[ -f $FILE ]]; then LINE=$(~/buildroot/output/host/usr/bin/arm-buildroot-linux-gnueabihf-addr2line -p -f -C -e $FILE $OFFSET); echo "$ADDR $LINE"; fi; done > /tmp/addr2line.txt

Finally, we filter out the useless [??] entries:

$ cat /tmp/bt.txt | while read DATA; do cat /tmp/addr2line.txt | grep "$DATA"; done | grep -v '[?][?]' > /tmp/fullbt.txt

What remains is something very similar to what the real backtrace would have been if everything had worked as it should in gdb:

b31ae5ed gst_pad_send_event_unchecked en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5571 
b31a46c1 gst_debug_log en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstinfo.c:444 
b31b7ead gst_pad_send_event en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5775 
b666250d WebCore::AppendPipeline::injectProtectionEventIfPending() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/graphics/gstreamer/mse/AppendPipeline.cpp:1360 
b657b411 WTF::GRefPtr<_GstEvent>::~GRefPtr() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/glib/GRefPtr.h:76 
b5fb0319 WebCore::HTMLMediaElement::pendingActionTimerFired() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/html/HTMLMediaElement.cpp:1179 
b61a524d WebCore::ThreadTimers::sharedTimerFiredInternal() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/ThreadTimers.cpp:120 
b61a5291 WTF::Function<void ()>::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::{lambda()#1}>::call() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/Function.h:101 
b6c809a3 operator() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:171 
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
b2ad4223 g_main_context_dispatch en :? 
b6c80601 WTF::{lambda(_GSource*, int (*)(void*), void*)#1}::_FUN(_GSource*, int (*)(void*), void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:40 
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
b2adfc49 g_poll en :? 
b2ad44b7 g_main_context_iterate.isra.29 en :? 
b2ad477d g_main_loop_run en :? 
b6c80de3 WTF::RunLoop::run() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:97 
b6c654ed WTF::RunLoop::dispatch(WTF::Function<void ()>&&) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/RunLoop.cpp:128 
b5937445 int WebKit::ChildProcessMain<WebKit::WebProcess, WebKit::WebProcessMain>(int, char**) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebKit/Shared/unix/ChildProcessMain.h:64 
b27b2978 __bss_start en :?

I hope you find this trick useful and the scripts handy in case you ever need to resort to examining the raw stack to get a meaningful backtrace.

Happy debugging!

by eocanha at October 16, 2020 07:07 PM

October 15, 2020

Andy Wingo

on "binary security of webassembly"

Greets!

You may have seen an interesting paper cross your radar a couple months ago: Everything Old is New Again: Binary Security of WebAssembly, by Daniel Lehmann, Johannes Kinder and Michael Pradel. The paper makes some strong claims and I would like to share some thoughts on it.

reader-response theory

For context, I have been working on web browsers for the last 8 years or so, most recently on the JavaScript and WebAssembly engine in Firefox. My work mostly consists of implementing new features, which if you are familiar with software development translates as "writing bugs". Almost all of those bugs are security bugs, potentially causing Firefox to go from being an agent of the user to an agent of the Mossad, or of cryptocurrency thieves, or anything else.

Mitigating browser bug flow takes a siege mentality. Web browsers treat all web pages and their corresponding CSS, media, JavaScript, and WebAssembly as hostile. We try to reason about global security properties, and translate those properties into invariants ensured at compile-time and run-time, for example to ensure that a web page from site A can't access cookies from site B.

In this regard, WebAssembly has some of the strongest isolation invariants in the whole platform. A WebAssembly module has access to nothing, by default: neither functionality nor data. Even a module's memory is isolated from the rest of the browser, both by construction (that's just how WebAssembly is specified) and by run-time measures (given that pointers are 32 bits in today's WebAssembly, we generally reserve a multi-gigabyte region for a module's memory that can contain nothing else).

All of this may seem obvious, but consider that a C++ program compiled to native code on a POSIX platform can use essentially everything that the person running it has access to: your SSH secrets, your email, all of your programs, and so on. That same program compiled to WebAssembly does not -- any capability it has must have been given to it by the person running the program. For POSIX-like programs, the WebAssembly community is working on a POSIX for the web that standardizes a limited-capability access to data and functionality from the world, and in web browsers, well of course the module has access only to the capabilities that the embedding web page gives to it. Mostly, as the JS run-time accompanying the WebAssembly is usually generated by emscripten, this set of capabilities is a function of the program itself.

Of course, complex WebAssembly systems may contain multiple agents, acting on behalf of different parties. For example, a module might, through capabilities provided by the host, be able to ask flickr to delete a photo, but might also be able to crawl a URL for photos. Probably in this system, crawling a web page shouldn't be able to "trick" the WebAssembly module into deleting a photo. The C++ program compiled to WebAssembly could have a bug of course, in which case, you get to keep both pieces.

I mention all of this because we who work on WebAssembly are proud of this work! It is a pleasure to design and build a platform for high-performance code that provides robust capabilities-based security properties.

the new criticism

Therefore it was with skepticism that I started reading the Lehmann et al paper. The paper focusses on WebAssembly itself, not any particular implementation thereof; what could be wrong about WebAssembly?

I found the answer to be quite nuanced. To me, the paper shows three interesting things:

  1. Memory-safety bugs in C/C++ programs when compiled to WebAssembly can cause control-flow edges that were not present in the source program.

  2. Unexpected control-flow in a web browser can sometimes end up in a call to eval with the permissions of the web page, which is not good.

  3. It's easier in some ways to exploit bugs in a C/C++ program when compiled to WebAssembly than when compiled natively, because many common mitigations aren't used by the WebAssembly compiler toolchain.

Firstly, let's discuss the control-flow point. Let's say that the program has a bug, and you have made an exploit to overwrite some memory location. What can you do with it? Well, consider indirect calls (call_indirect). This is what a compiler will emit for a vtable method call, or for a call to a function pointer. The possible targets for the indirect call are stored in a table, which is a side array of all possible call_indirect targets. The actual target is selected at run-time based on an index; WebAssembly function pointers are just indices into this table.

So if a function loads an index into the indirect call table from memory, and some exploit can change this index, then you can cause a call site to change its callee. Although there is a run-time type check that occurs at the call_indirect site to ensure that the callee is called with the right type, many functions in a module can have compatible types and thus be callable without an error.
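
To make this concrete, here is a hedged sketch (mine, not an example from the paper) of the kind of C code a WebAssembly toolchain lowers to call_indirect; dispatch, handler_t and the handler functions are hypothetical names:

typedef int (*handler_t)(int);

static int log_event(int x) { return x; }
static int drop_event(int x) { return -x; }  /* same type as log_event */

static handler_t handlers[] = { log_event, drop_event };

/* The load of handlers[idx] comes from linear memory and the call
   becomes call_indirect. If an exploit can overwrite idx (or the
   stored pointer), the call is redirected to any function in the
   table whose type matches int(int). */
int dispatch(int idx, int x) {
  return handlers[idx](x);
}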

OK, so that's not great. But what can you do with it? Well it turns out that emscripten will sometimes provide JavaScript's eval to the WebAssembly module. Usually it will be called only with a static string, but anything can happen. If an attacker can redirect a call site to eval instead of one of the possible targets from the source code, you can (e.g.) send the user's cookies to evil.com.

There's a similar vulnerability regarding changing the operand to eval, instead. Strings are represented in linear memory as well, and there's no write protection on them, even if they are read-only data. If your write primitive can change the string being passed to eval, that's also a win for the attacker. More details in the paper.

This observation brings us to the last point, which is that many basic mitigations in (e.g.) POSIX deployments aren't present in WebAssembly. There are no OS-level read-only protections for static data, and the compiler doesn't enforce this either. Also, WebAssembly programs have to bundle their own malloc, but the implementations provided by emscripten don't implement the "hardening" techniques. There is no address-space layout randomization, so exploits are deterministic. And so on.
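
As an illustration of the missing read-only protections (again a hedged sketch of mine, not an example from the paper): compiled natively, the write below would normally trap, because string literals live in a read-only segment; compiled to WebAssembly it silently succeeds, since no page of linear memory is write-protected:

const char *command = "harmless";

void corrupt(void) {
  /* Undefined behavior that typically crashes natively (read-only
     .rodata), but succeeds in today's WebAssembly. */
  ((char *)command)[0] = 'x';
}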

on mitigations

It must be said that for most people working on WebAssembly, security "mitigations" are... unsatisfactory. They aren't necessary for memory-safe programs, and they can't prevent memory-unsafe programs from having unexpected behavior. Besides, we who work on WebAssembly are more focussed on the security properties of the WebAssembly program as embedded in its environment, but not on the program itself. Garbage in, garbage out, right?

In that regard, I think that one answer to this paper is just "don't". Don't ship memory-unsafe programs, or if you do, don't give them eval capabilities. No general mitigation will make these programs safe. Writing your program in e.g. safe Rust is a comprehensive fix to this class of bug.

But, we have to admit also that shipping programs written in C and C++ is a primary goal of WebAssembly, and that no matter how hard we try, some buggy programs will get shipped, and therefore that there is marginal value to including mitigations like read-only data or even address space randomization. We definitely need to work on getting control-flow integrity protections working well with the WebAssembly toolchain, probably via multi-table support (part of the reference types extension; my colleague Paulo Matos just landed a patch in this area). And certainly Emscripten should work towards minimizing the capabilities set exposed to WebAssembly by the generated JavaScript, notably by compiling away uses of eval by embind.

Finally, I think that many of the problems identified by this paper will be comprehensively fixed in a different way by more "managed" languages. The problem is that C/C++ pointers are capabilities into all of undifferentiated linear memory. By contrast, handles to GC-managed objects are unforgeable: given object A, you can't get to object B except if object A references B. It would be great if we could bring some of the benefits of this more capability-based approach to in-memory objects to languages like C and C++; more on that in a future note, I think.

chapeau

In the end, despite my initial orneriness, I have to admit that the paper authors point out some interesting areas to work on. It's clear that there's more work to do. I was also relieved to find that my code is not at fault in this particular instance :) Onwards and upwards, and until next time, happy hacking!

by Andy Wingo at October 15, 2020 10:29 AM

October 13, 2020

Andy Wingo

malloc as a service

Greetings, internet! Today I have the silliest of demos for you: malloc-as-a-service.

[Interactive demo: an input box that loads walloc and lets you call malloc and free. JavaScript is required; see the walloc web page for more information.]

The above input box, if things managed to work, loads up a simple bare-bones malloc implementation, and exposes "malloc" and "free" bindings. But the neat thing is that it's built without emscripten: it's a standalone C file that compiles directly to WebAssembly, with no JavaScript run-time at all. I share it here because it might come in handy to people working on WebAssembly toolchains, and also because it was an amusing experience to build.

wat?

The name of the allocator is "walloc", in which the w is for WebAssembly.

Walloc was designed with the following priorities, in order:

  1. Standalone. No stdlib needed; no emscripten. Can be included in a project without pulling in anything else.

  2. Reasonable allocation speed and fragmentation/overhead.

  3. Small size, to minimize download time.

  4. Standard interface: a drop-in replacement for malloc.

  5. Single-threaded (currently, anyway).

Emscripten includes a couple of good malloc implementations (dlmalloc and emmalloc) which you should probably use instead. But if you are really looking for a bare-bones malloc, walloc is fine.

You can check out all the details at the walloc project page; a selection of salient bits are below.

Firstly, to build walloc, it's just a straight-up compile:

clang -DNDEBUG -Oz --target=wasm32 -nostdlib -c -o walloc.o walloc.c

The resulting walloc.o is a conforming WebAssembly file on its own, but it also contains additional symbol table and relocation sections which allow wasm-ld to combine separate compilation units into a single final WebAssembly file. walloc.c on its own doesn't import or export anything, in the WebAssembly sense; to make bindings visible to JS, you need to add a little wrapper:

typedef __SIZE_TYPE__ size_t;

#define WASM_EXPORT(name) \
  __attribute__((export_name(#name))) \
  name

// Declare these as coming from walloc.c.
void *malloc(size_t size);
void free(void *p);
                          
void* WASM_EXPORT(walloc)(size_t size) {
  return malloc(size);
}

void WASM_EXPORT(wfree)(void* ptr) {
  free(ptr);
}

If you compile that to exports.o and link via wasm-ld --no-entry --import-memory -o walloc.wasm exports.o walloc.o, you end up with the walloc.wasm used in the demo above. See your inspector for the URL.

The resulting wasm file is about 2 kB (uncompressed).

Walloc isn't the smallest allocator out there. A simple bump-pointer allocator that never frees is the fastest thing you can have. There is also an alternate allocator for Rust, wee_alloc, which is said to be smaller than walloc, though I think it is less space-efficient for small objects. But still, walloc is pretty small.

implementation notes

When a C program is compiled to WebAssembly, the resulting wasm module (usually) has associated linear memory. It can be linked in a way that the memory is created by the module when it's instantiated, or such that the module is given a memory by its host. The above example passed --import-memory to the linker, allowing the host to bound memory usage for the module instance.

The linear memory has the usual data, stack, and heap segments. The data and stack are placed first. The heap starts at the &__heap_base symbol. (This symbol is computed and defined by the linker.) All bytes above &__heap_base can be used by the wasm program as it likes. So &__heap_base is the lower bound of memory managed by walloc.

                                              memory growth ->
+----------------+-----------+-------------+-------------+----
| data and stack | alignment | walloc page | walloc page | ...
+----------------+-----------+-------------+-------------+----
^ 0              ^ &__heap_base            ^ 64 kB aligned
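
As a minimal sketch (assuming the usual wasm-ld behavior described above; illustrative, not walloc's literal code), the allocator can locate the bottom of its heap like this:

/* wasm-ld computes and defines __heap_base; taking the symbol's
   address yields the first byte above the data and stack. */
extern char __heap_base;

static char *heap_bottom(void) {
  return &__heap_base;
}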

Interestingly, there are a few different orderings of data and stack used by different toolchains. It used to even be the case that the stack grew up. The diagram in the recent "Everything Old is New Again: Binary Security of WebAssembly" paper by Lehmann et al is a good summary of the variations.

The sensible thing to prevent accidental overflow (underflow, really) is to have the stack grow down to 0, with data at higher addresses. But this can cause WebAssembly code that references data to take up more bytes, because addresses are written using variable-length "LEB" encodings that favor short offsets, so it isn't the default, right now at least.

Anyway! The upper bound of memory managed by walloc is the total size of the memory, which is aligned on 64-kilobyte boundaries. (WebAssembly ensures this alignment.) Walloc manages memory in 64-kb pages as well. It starts with whatever memory is initially given to the module, and will expand the memory if it runs out. The host can specify a maximum memory size, in pages; if no more pages are available, walloc's malloc will simply return NULL; handling out-of-memory is up to the caller.
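
That expansion step can be sketched with clang's wasm builtins (hedged: grow_heap is a hypothetical name and walloc's actual code may differ):

typedef __SIZE_TYPE__ size_t;

#define PAGE_SIZE 65536

/* memory.grow takes a page count and returns the previous memory
   size in pages, or (size_t)-1 if the host refuses to grow. */
static void *grow_heap(size_t pages) {
  size_t old_pages = __builtin_wasm_memory_grow(0, pages);
  if (old_pages == (size_t)-1)
    return 0;  /* no more pages: malloc will return NULL */
  return (void *)(old_pages * PAGE_SIZE);
}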

Walloc has two allocation strategies: small and large objects.

big bois

A large object is more than 256 bytes.

There is a global freelist of available large objects, each of which has a header indicating its size. When allocating, walloc does a best-fit search through that list.

struct large_object {
  struct large_object *next;
  size_t size;
  char payload[0];
};
struct large_object* large_object_free_list;

Large object allocations are rounded up to 256-byte boundaries, including the header.

If there is no object on the freelist that can satisfy an allocation, walloc will expand the heap by the size of the allocation, or by half of the current walloc heap size, whichever is larger. The resulting page or pages form a large object that can satisfy the allocation.

If the best object on the freelist has more than a chunk of space on the end, it is split, and the tail put back on the freelist. A chunk is 256 bytes.

+-------------+---------+---------+-----+-----------+
| page header | chunk 1 | chunk 2 | ... | chunk 255 |
+-------------+---------+---------+-----+-----------+
^ +0          ^ +256    ^ +512                      ^ +64 kB

As each page is 65536 bytes, and each chunk is 256 bytes, there are therefore 256 chunks in a page. The first chunk in a page that begins an allocated object, large or small, contains a header chunk. The page header has a byte for each of the 256 chunks in the page. The byte is 255 if the corresponding chunk starts a large object; otherwise the byte indicates the size class for packed small-object allocations (see below).

+-------------+---------+---------+----------+-----------+
| page header | large object 1    | large object 2 ...   |
+-------------+---------+---------+----------+-----------+
^ +0          ^ +256    ^ +512                           ^ +64 kB

When splitting large objects, we avoid starting a new large object on a page header chunk. A large object can only span where a page header chunk would be if it includes the entire page.

Freeing a large object pushes it on the global freelist. We know a pointer is a large object by looking at the page header. We know the size of the allocation, because the large object header precedes the allocation. When the next large object allocation happens after a free, the freelist will be compacted by merging adjacent large objects.
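
The page-header lookup used here can be sketched as follows (hedged: chunk_kind is a hypothetical helper assuming the 64 kB page and 256-byte chunk layout described above, not walloc's actual code):

typedef __SIZE_TYPE__ size_t;
typedef __UINTPTR_TYPE__ uintptr_t;

#define PAGE_SIZE  65536
#define CHUNK_SIZE 256

static unsigned char chunk_kind(void *p) {
  uintptr_t addr = (uintptr_t)p;
  /* The page header chunk holds one byte per chunk in its page. */
  unsigned char *page = (unsigned char *)(addr & ~(uintptr_t)(PAGE_SIZE - 1));
  size_t chunk = (addr & (PAGE_SIZE - 1)) / CHUNK_SIZE;
  return page[chunk];  /* 255 means large object; otherwise a size class */
}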

small fry

Small objects are allocated from segregated freelists. The granule size is 8 bytes. Small object allocations are packed in a chunk of uniform allocation size. There are size classes for allocations of each size from 1 to 6 granules, then 8, 10, 16, and 32 granules; 10 sizes in all. An allocation of e.g. 12 granules will be satisfied from a 16-granule chunk. Each size class has its own free list.

struct small_object_freelist {
  struct small_object_freelist *next;
};
struct small_object_freelist small_object_freelists[10];
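
The size-class mapping just described can be sketched like this (hedged: size_class is a hypothetical helper, not walloc's actual code):

typedef __SIZE_TYPE__ size_t;

#define GRANULE_SIZE 8

/* Granule counts of the 10 small size classes described above. */
static const unsigned char class_granules[10] = {1, 2, 3, 4, 5, 6, 8, 10, 16, 32};

/* Round the request up to whole granules, then pick the smallest
   class that fits; e.g. 12 granules lands in the 16-granule class. */
static int size_class(size_t bytes) {
  size_t granules = (bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
  for (int i = 0; i < 10; i++)
    if (granules <= class_granules[i])
      return i;
  return -1;  /* more than 256 bytes: take the large-object path */
}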

When allocating, if there is nothing on the corresponding freelist, walloc will allocate a new large object, then change its chunk kind in the page header to the size class. It then goes through the fresh chunk, threading the objects through each other onto a free list.

+-------------+---------+---------+------------+---------------------+
| page header | large object 1    | granules=4 | large object 2' ... |
+-------------+---------+---------+------------+---------------------+
^ +0          ^ +256    ^ +512    ^ +768       ^ +1024               ^ +64 kB

In this example, we imagine that the 4-granules freelist was empty, and that the large object freelist contained only large object 2, running all the way to the end of the page. We allocated a new 4-granules chunk, splitting the first chunk off the large object, and pushing the newly trimmed large object back onto the large object freelist, updating the page header appropriately. We then thread the 4-granules (32-byte) allocations in the fresh chunk together (the chunk has room for 8 of them), treating them as if they were instances of struct small_object_freelist, pushing them onto the global freelist for 4-granules allocations.

           in fresh chunk, next link for object N points to object N+1
                                 /--------\                     
                                 |        |
            +------------------+-^--------v-----+----------+
granules=4: | (padding, maybe) | object 0 | ... | object 7 |
            +------------------+----------+-----+----------+
                               ^ 4-granule freelist now points here 

The size classes were chosen so that any wasted space (padding) is less than the size class.

Freeing a small object pushes it back on its size class's free list. Given a pointer, we know its size class by looking in the chunk kind in the page header.

and that's it

Hey have fun with the thing! Let me know if you find it useful. Happy hacking and until next time!

by Andy Wingo at October 13, 2020 01:34 PM

October 01, 2020

Sergio Villar

Closing the gap (in flexbox 😇)

Flexbox had a lot of early problems, but by mid-May 2020, when our story begins, both Firefox and Chromium had done a lot of work on improving things with this feature. WebKit, however, hadn’t caught up. Prioritizing the incredible amount of work a web engine requires is difficult. The WebKit implementation was still passable for very many (most) cases of the core features, and it didn’t have problems that caused crashes or otherwise urgently demanded attention, so engineers dedicated their limited time to other things. The net result, however, was that as this choice was repeated many times, the comparative state of WebKit’s flexbox implementation fell behind pretty significantly.
Web Platform Tests (WPT) is a huge ongoing effort by many people to come up with a very extensive list of tests that helps both spec editors and implementors make sure we have great compatibility. In the case of flexbox, for example, there are currently 773 tests (2926 subtests), and WebKit was failing a good number of them. This matters a lot because there are things that flexbox is ideal for, and it is exceptionally widely used. In mid-May, Igalia was contracted to improve things here, and in this post I’ll explain and illustrate how we did that.

The Challenge

The main issues were (in no particular order):
  • min-width:auto and min-height:auto handling
  • Nested flexboxes in column flows
  • Flexboxes inside tables and vice versa
  • Percentages in heights with indefinite sizes
  • WebKit CI not running many WPT flexbox tests
  • and of course… lack of gap support in Flexbox
Modifying flexbox layout code is a challenge in itself. Tiny modifications in the source code can cause huge differences in the final layout. You might even have a patch that passes all the tests and still regresses multiple popular web sites.
The good news is that we were able to tackle most of those issues. Let’s review the changes you can eventually expect from future releases of Safari (note that Apple doesn’t disclose information about future products and/or releases) and the other WebKit based browsers (like GNOME Web).

Flexbox gaps 🥳🎉

Probably one of the features most awaited in WebKit by web developers. It’s finally here, after Firefox and Chrome landed it not so long ago. The implementation was initially inspired by the one in Chrome, but it diverged a bit in the final version of the patch. The important thing is that the behaviour should be the same; at least, all the WPT tests related to gaps now pass in WebKit trunk.

<div style="display: flex; flex-wrap: wrap; gap: 1ch">
  <div style="background: magenta; color: white">Lorem</div>
  <div style="background: green; color: white">ipsum</div>
  <div style="background: orange; color: white">dolor</div>
  <div style="background: blue; color: white">sit</div>
  <div style="background: brown; color: white">amet</div>
</div>

Tables as flex items

Tables should obey the flex container sizing whenever they are flex items. As can be seen in the examples below, the tables’ layout code was kicking in and ignoring the constraints set by the flex container. Tables should do what the flex algorithm mandates and thus allow being stretched/squeezed as required.

<div style="display:flex; width:100px; background:red;">
  <div style="display:table; width:10px; max-width:10px; height:100px; background:green;">
    <div style="width:100px; height:10px; background:green;"></div>
  </div>
</div>

Tables with items exceeding 100% of the available size

This is the case of tables placed inside flex items. The automatic table layout algorithm was generating tables with unlimited widths when the sum of the sizes of their columns (expressed in percentages) exceeded 100%. It was impossible to fulfill the constraints set by the table and flexbox algorithms at the same time.

<div style="display:flex; width:100px; height:100px; align-items:flex-start; background:green;">
  <div style="flex-grow:1; flex-shrink:0;">
    <table style="height:50px; background:green;" cellpadding="0" cellspacing="0">
      <tr>
        <td style="width:100%; background:green;"> </td>
        <td style="background:green;"> </td>
      </tr>
    </table>
  </div>
</div>

Note how the table was growing indefinitely to the right before the fix (I cropped the “Before” picture to fit in the post).

Alignment in single-line flexboxes

Interesting case. The code considered single-line flexboxes to be those where all the flex items were placed in a single line after computing the required space for them. Though sensible, that’s not what a single-line flexbox is: it’s a flex container with flex-wrap:nowrap. This means that a flex container with flex-wrap:wrap whose children do not need more than one flex line to be placed is not a single-line flex container from the spec’s POV (corollary: implementing specs is hard).

<div style="display: flex; flex-wrap: wrap; align-content: flex-end; width: 425px; height: 70px; border: 2px solid black">
  <div style="height: 20px">This text should be at the bottom of its container</div>
</div>

Percentages in flex items with indefinite sizes

One of the trickiest ones. Although it didn’t involve a lot of code, it caused two serious regressions (in YouTube’s upload form and when viewing Twitter videos in fullscreen) which required some previous fixes and delayed the landing of this patch a bit. Note that this behaviour was really contentious from the pure specification POV, as there were many changes over time. Defining a good behaviour is really complicated. Without entering into too much detail, flexbox has a couple of cases where sizes are treated as definite when they are theoretically indefinite. In this case we consider that if the flex container main size is definite, then the post-flexing size of flex items is also treated as definite.

<div style="display: flex; flex-direction: column; height: 150px; width: 150px; border: 2px solid black;">
  <div>
    <div style="height: 50%; overflow: hidden;">
      <div style="width: 50px; height: 50px; background: green;"></div>
    </div>
  </div>
  <div style="flex: none; width: 50px; height: 50px; background: green;"></div>
</div>

Hit testing with overlapping flex items

There were some issues with pointer events passing through overlapping flex items (due to negative margins, for example). This was fixed by making the hit-testing code proceed in reverse order-modified document order (the opposite of paint order) instead of using the raw order from the DOM.

<div style="display:flex; border: 1px solid black; width: 300px;">
  <a style="width: 200px;" href="#">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua</a>
  <div style="margin-left: -200px; width: 130px; height: 50px; background: orange;"></div>
</div>

In the “Before” case, hit testing was bypassing the orange block, and thus the cursor was showing a hand because it detected that it was hovering over a link. After the fix, the cursor is properly rendered as an arrow because the orange block covers the link underneath.

Computing percentages with scrollbars

In this case the issue was that, when computing percentages in heights, we were incorrectly including the size of the scrollbars.

<div style="display: inline-flex; height: 10em;">
  <div style="overflow-x: scroll;">
    <div style="width: 200px; height: 100%; background: green"></div>
  </div>
</div>

Note that in the “After” picture the horizontal scrollbar background is visible while in the “Before” the wrong height computation made the flex item overlap the scrollbar.

Image items with specific sizes

The flex layout algorithm needs the intrinsic sizes of the flex items to compute their sizes and the size of the flex container. Changes to those intrinsic sizes should trigger new layouts, but the code was not doing that.

<!-- Just to showcase how the img below is not properly sized -->
<div style="position: absolute; background-color: red; width: 50px; height: 50px; z-index: -1;"></div>
 
<div style="display: flex; flex-direction: column; width: 100px; height: 5px;">
  <img style="width: 100px; height: 100px;" src="https://wpt.live/css/css-flexbox/support/100x100-green.png">
</div>


Nested flexboxes with ‘min-height: auto’

Another tricky one, and another related to the handling of nested column flexboxes. As in the previous issue with nested column flexboxes, the problem was that we were not supporting this case. For those wanting a deeper understanding of the issue, this bug was about implementing section 4.5 of the spec. This was one of the more complicated ones to fix: Edward Lorenz would love that part of the layout code, where the slightest change in one of those source code lines can trigger huge changes in the final rendering.

<div style='display:flex; flex-direction: column; overflow-y: scroll; width: 250px; height: 250px; border: 1px solid black'>
  <div style='display:flex;'>
    <div style="width: 100px; background: blue"></div>
    <div style='width: 120px; background: orange'></div>
    <div style='width: 10px; background: yellow; height: 300px'></div>
  </div>
</div>

As can be seen, in the “Before” picture the blue and orange blocks are sized differently from the yellow one. That’s fixed in the “After” picture.

Percentages in quirks mode

Another one affecting how percentages are computed in heights, but this one specific to quirks mode. We now match Firefox, Chrome and pre-Chromium Edge, i.e., flexbox should not care much about quirks mode, since it was invented many years after quirky browsers dominated the earth.

<!DOCTYPE html PUBLIC>
<div style="width: 100px; height: 50px;">
  <div style="display: flex; flex-direction: column; outline: 2px solid blue;">
    <div style="flex: 0 0 50%"></div>
  </div>
</div>

Percentages in ‘flex-basis’

One of the trickiest ones. Percentages were generally working fine inside flex-basis; however, there was one particularly problematic case. It arose whenever that percentage referred to, oh surprise, an indefinite height. And again, we’re talking about nested flexboxes with column flows. Indeed, definite/indefinite sizes are one of the toughest things to get right from the layout POV. In this particular case, the fix was to ignore the percentages and treat them as height: auto.

<div style="display: flex; flex-direction: column; width: 200px;">
  <div style="flex-basis: 0%; height: 100px; background: red;">
    <div style="background: lime">Here's some text.</div>
  </div>
</div>

Flex containers inside shrink-to-fit (STF) tables

Fixing a couple of test cases submitted by an anonymous Opera employee eight (!) years ago. This is another case of competing layout contexts trying to do things their own way.

<div style="display: table; background:red">
   <div style="display: flex; width: 0px">
      <p style="margin: 1em 1em;width: 50px">Text</p>
      <p style="margin: 1em 1em;width: 50px">Text</p>
      <p style="margin: 1em 1em;width: 50px">Text</p>
   </div>
</div>

After the fix the table is properly sized to 0px width and thus no red is seen.

Conclusions

These examples are just some interesting ones I’ve chosen to highlight. In the end, almost 50 new flexbox tests are passing in WebKit that weren’t back in May! I wouldn’t like to forget the great job done by my colleague Carlos Lopez, who imported tons of WPT flexbox tests into the WebKit source tree. He also performed awesome triage work which made my life a lot easier.
Investing in interoperability is a huge deal for the web. It’s good for everyone, from spec authors to final users, including browser vendors, downstream ports and web authors. So if you care about the web, or your business orbits around web technologies, you should definitely promote and invest in interoperability.

Implementing standards or fixing bugs in web engines is the kind of work we happily do at Igalia on a daily basis. We are the second largest contributor to both WebKit and Chrome/Blink, so if you have an annoying bug in a particular web engine (Gecko and Servo as well) that you want fixed, don’t hesitate to contact us; we’d be glad to help. Also, should you want to be part of a workers-owned cooperative with an assembly-based decision-making mechanism and a strong focus on free software technologies, join us!

Acknowledgements

Many thanks to WebKit reviewers from Apple and Igalia like Darin Adler, Manuel Rego, Javier Fernández or Daniel Bates who made the process really easy for me, always providing very nice feedback for the patches I submitted.
I’m also really thankful to Googlers like Christian Biesinger, David Grogan and Stephen McGruer who worked on the very same things in Blink and/or provided very nice guidance and support when porting patches.

by svillar at October 01, 2020 11:34 AM

September 28, 2020

Adrián Pérez

Sunsetting NPAPI support in WebKitGTK (and WPE)

  1. Summary
  2. What is NPAPI?
  3. What is NPAPI used for?
  4. Why are NPAPI plug-ins being phased out?
  5. What are other browsers doing?
  6. Is WebKitGTK following suit?

Summary

Here’s a tl;dr list of bullet points:

  • NPAPI is an old mechanism to extend the functionality of a web browser. It is time to let it go.
  • One year ago, WebKitGTK 2.26.0 removed support for NPAPI plug-ins which used GTK2, but the rest of plug-ins kept working.
  • WebKitGTK 2.30.x will be the last stable series with support for NPAPI plug-ins at all. Version 2.30.0 was released a couple of weeks ago.
  • WebKitGTK 2.32.0, due in March 2021, will be the first stable release to ship without support for NPAPI plug-ins.
  • We have already removed the relevant code from the WebKit repository.
  • While the WPE WebKit port allowed running windowless NPAPI plug-ins, this was never advertised nor supported by us.

What is NPAPI?

In 1995, Netscape Navigator 2.0 introduced a mechanism to extend the functionality of the web browser. That was NPAPI, short for Netscape Plugin Application Programming Interface. NPAPI allowed third parties to add support for new content types; for example Future Splash (.spl files), which later became Flash (.swf).

When an NPAPI plug-in is used to render content, the web browser carves a hole at the rectangular location where the content handled by the plug-in will be placed, and hands off the rendering responsibility to the plug-in. This would turn out to be a recipe for trouble, as we will see later.

What is NPAPI used for?

A number of technologies have used NPAPI along the years for different purposes:

  • Displaying multimedia content using the Flash Player or Silverlight plug-ins.
  • Running rich Java™ applications in the browser.
  • Displaying documents in non-Web formats (PDF, DjVu) inside browser windows.
  • A number of questionable practices, like VPN client software using a browser plug‑in for configuration.

Why are NPAPI plug-ins being phased out?

The design of NPAPI makes the web browser give full responsibility to plug-ins: the browser has no control whatsoever over what plug-ins do to display content, which makes it hard to make them participate in styling and layout. More importantly, plug-ins are compiled, native code over which browser developers cannot exercise quality control, which resulted in a history of security incidents, crashes, and browser hangs.

Today, Web browsers’ rendering engines can do a better job than plug-ins, more securely and efficiently. The Web platform is mature and there is no place for blindly trusting third-party code to behave well. NPAPI is a 25-year-old technology showing its age: it has served its purpose, but it is no longer needed.

The last nail in the coffin was Adobe’s 2017 announcement that the Flash plug-in would be discontinued in January 2021.

What are other browsers doing?

Glad that you asked! It turns out that all major browsers have plans for incrementally reducing how much of NPAPI usage they allow, until they eventually remove it.

Firefox

Let’s take a look at the Firefox roadmap first:

Version Date Plug-in support changes
47 June 2016 All plug-ins except Flash need the user to click on the element to activate them.
52 March 2017 Only loads the Flash plug‑in by default.
55 August 2017 Does not load the Flash plug‑in by default, instead it asks users to choose whether sites may use it.
56 September 2017 On top of asking the user, Flash content can only be loaded from http:// and https:// URIs; the Android version completely removes plug‑in support. There is still an option to allow always running the Flash plug-in without asking.
69 September 2019 The option to allow running the Flash plug-in without asking the user is gone.
85 January 2021 Support for plug-ins is gone.
Table: Firefox NPAPI plug-in roadmap.

In conclusion, the Mozilla folks have been slowly boiling the frog for the last four years and will completely remove the support for NPAPI plug-ins coinciding with the Flash player reaching EOL status.

Chromium / Chrome

Here’s a timeline of the Chromium roadmap, merged with some highlights from their Flash Roadmap:

Version Date Plug-in support changes
? Mid 2014 The interface to unblock running plug-ins is made more complicated, to discourage usage.
? January 2015 Plug-ins blocked by default, some popular ones allowed.
42 April 2015 Support for plug-ins disabled by default, setting available in chrome://flags.
45 September 2015 Support for NPAPI plug-ins is removed.
55 December 2016 Browser does not advertise Flash support to web content, the user is asked whether to run the plug-in for sites that really need it.
76 July 2019 Flash support is disabled by default, can still be enabled with a setting.
88 January 2021 Flash support is removed.
Table: Chromium NPAPI/Flash plug-in roadmap.

Note that Chromium continued supporting Flash content even after it removed support for NPAPI in 2015: by means of their acute NIH syndrome, Google came up with PPAPI, a replacement for NPAPI basically designed to support Flash, which is currently used by Chromium’s built-in PDF viewer. That viewer will nevertheless also go away, coinciding with Flash reaching EOL.

Safari

On the Apple camp, the story is much easier to tell:

  • Their handheld devices—iPhone, iPad, iPod Touch—never supported NPAPI plug-ins to begin with. Easy-peasy.
  • On desktop, Safari has required explicit approval from the user to allow running plug-ins since June 2016. The Flash plug-in has not been preinstalled in Mac OS since 2010, requiring users to manually install it.
  • NPAPI plug-in support will be removed from WebKit by the end of 2020.

Is WebKitGTK following suit?

Yes. In September 2019 WebKitGTK 2.26 removed support for NPAPI plug-ins which use GTK2. This included Flash, but the PPAPI version could still be used via freshplayerplugin.

In March 2021, when the next stable release series is due, WebKitGTK 2.32 will remove the support for NPAPI plug-ins. This series will receive updates until September 2021.

The above gives a full two years between the moment we started restricting which plug-ins can be loaded and the moment they stop working, which we reckon should be enough. At the time of writing this article, the support for plug-ins is already gone from the WebKit source for the GTK and WPE ports.

Yes, you read that right: WPE supported NPAPI plug-ins, although in a limited fashion, as only windowless plug-ins worked. In practice, making NPAPI plug-ins work on Unix-like systems requires using the XEmbed protocol to let them place their rendered content overlaid on top of WebKit’s, but the WPE port does not use X11. Given that we never advertised nor officially supported NPAPI in the WPE port, we do not expect any trouble removing it.

by aperez (adrian@perezdecastro.org) at September 28, 2020 10:00 PM

September 27, 2020

Ricardo García

My participation in XDC 2020

The 2020 X.Org Developers Conference took place from September 16th to September 18th. For the first time, due to the ongoing COVID-19 pandemic, it was a fully virtual event. While this meant that some interesting bits of the conference, like the hallway track, catching up in person with some people and doing some networking, were not entirely possible this time, I have to thank the organizers for their work in making the conference an almost flawless event. The conference was livestreamed directly to YouTube, which was the main way for attendees to watch the many different talks. freenode was used for the hallway track, with most discussions happening in the ##xdc2020 IRC channel. In addition, ##xdc2020-QA was used for attendees wanting to add questions or comments at the end of each talk.

Igalia was a silver sponsor of the event and we also participated with 5 different talks, including one by yours truly.

My talk about VK_EXT_extended_dynamic_state was based on my previous blog post, but it includes a more detailed explanation of the extension as well as additional comments and an explanation of how the extension was created. I took advantage of the possibility of using pre-recorded videos for the conference, as I didn’t fully trust my kids not to interrupt me in the middle of the talk. In the end I think it was a good idea and, from the presenter’s point of view, I also found that using a script and following it strictly (to some degree) prevented distractions and made the talk a bit shorter and more to the point, because I tend to beat around the bush when talking live. You can watch my talk in the embedded video below.

Slides for the talk are also available and below you can find a transcript of the talk.

<Title slide>

Hello, my name is Ricardo García, I work at Igalia as part of its Graphics team and today I will be talking about the extended dynamic state Vulkan extension. At Igalia I was involved in creating CTS tests for this extension and also in reviewing the spec when writing those tests, in a very minor capacity. This extension is pretty simple and very useful, and the talk is divided in two parts. First I will talk about the extension itself and then I’ll reflect on a few bits about how this extension was created that I consider quite interesting.

<Part 1>

<Extension description slide>

So, first, what does this extension do? Its documentation says:

VK_EXT_extended_dynamic_state adds some more dynamic state to support applications that need to reduce the number of pipeline state objects they compile and bind.

In other words, as you will see, it makes Vulkan pipeline objects more flexible and easier to use from the application point of view.

<Pipeline diagram slide>

So, to give you some context, this is [the] typical graphics pipeline representation in many APIs like OpenGL, DirectX or Vulkan. You’ve probably seen variations of this a million times. The pipeline is divided in stages, some of them fixed-function, some of them programmable with shaders. Each stage usually takes some data from the previous stage and produces data to be consumed by the next one, apart from using other external resources like buffers or textures or whatever. What’s the Vulkan approach to represent this process?

<Creation structure slide>

Vulkan wants you to specify almost every single aspect of the previous pipeline in advance by creating a graphics pipeline object that contains information about how every stage should work. And, once created, most of these pipeline parameters or configuration cannot be changed. As you can see here, this includes shader programs, how vertices are read and processed, depth and stencil tests, you name it. Pipeline objects are heavy objects in Vulkan and they are hard to create. Why does Vulkan want you to do that? The answer has always been this keyword: “optimization”. Giving all the information in advance gives more chances for every current or even future implementations to optimize how the pipeline works. It’s the safe choice. And, despite this, you can see there’s a pipeline creation parameter with information about dynamic state. These are things that can be changed when using the pipeline without having to create a separate and almost identical pipeline object.

<New dynamic states slide>

What the extension does should be pretty obvious now: it adds a bunch of additional elements that can be changed on the fly without creating additional pipelines. This includes things like primitive topology, front face vertex order, vertex stride, cull mode and more aspects of the depth and stencil tests, etc. A lot of things. Using them if needed means fewer pipeline objects, fewer pipeline cache accesses and simpler programs in general. As I said before, it makes Vulkan pipeline objects more flexible and easier to use from the application point of view, because more pipeline aspects can be changed on the fly when using these pipeline objects instead of having to create separate objects for each combination of parameters you may want to modify at runtime. This may make the application logic simpler and it can also help when Vulkan is used as the backend, for example, to implement higher level APIs that are not so rigid regarding pipelines. I know this extension is useful for some emulators and other API-translating projects.

<New commands slide>

Together with those it also introduces a new set of functions to change those parameters on the fly when recording commands that will use the pipeline state object.
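
For illustration, here is a minimal hedged sketch of these commands in use, assuming a pipeline created with VK_DYNAMIC_STATE_CULL_MODE_EXT and VK_DYNAMIC_STATE_FRONT_FACE_EXT in its dynamic state list and the extension enabled on the device (record_reflection_passes is a hypothetical helper, not code from the talk):

#include <vulkan/vulkan.h>

/* Record two draws that differ only in winding order, flipping the
   front face dynamically instead of binding a second pipeline. */
void record_reflection_passes(VkCommandBuffer cmd, VkPipeline pipeline,
                              uint32_t vertexCount)
{
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
    vkCmdSetCullModeEXT(cmd, VK_CULL_MODE_BACK_BIT);

    /* Regular geometry. */
    vkCmdSetFrontFaceEXT(cmd, VK_FRONT_FACE_COUNTER_CLOCKWISE);
    vkCmdDraw(cmd, vertexCount, 1, 0, 0);

    /* Mirrored geometry (e.g. a reflection): the winding flips, so
       flip the front face on the fly. */
    vkCmdSetFrontFaceEXT(cmd, VK_FRONT_FACE_CLOCKWISE);
    vkCmdDraw(cmd, vertexCount, 1, 0, 0);
}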

<Pipeline diagram slide>

So, knowing that and going back to the graphics pipeline, the obvious question is: does this impact performance? Aren’t we reducing the number of optimization opportunities the implementation has if we use these additional dynamic states? In theory, yes. In practice, it depends on the implementation. Many GPUs and Vulkan drivers out there today have some pipeline aspects that are considered “dynamic” in the sense that they are easily changed on the fly without a perceptible impact in performance, while others are truly important for optimization. For example, take shaders. In Vulkan they’re provided as SPIR-V programs that need to be translated to GPU machine code and creating pipelines when the application starts makes it easy to compile shaders beforehand to avoid stuttering and frame timing issues later, for example. And not only that. As you create pipelines, you’re telling the implementation which shaders are used together. Say you have a vertex shader that outputs 4 parameters, and it’s used in a pipeline with a fragment shader that only uses the first 2. When creating the pipeline the implementation can decide to discard instructions that are only related to producing the 2 extra unused parameters in the vertex shader. But other things like, for example, changing the front face? That may be trivial without affecting performance.

<Part 2>

<Eric Lengyel tweet slide>

Moving on to the second part, I wanted to talk about how this extension was created. It all started with an “angry” tweet by Eric Lengyel (sorry if I’m not pronouncing it correctly) who also happens to be the author of the previous diagram. He complained on Twitter that you couldn’t change the front face dynamically, which happens to be super useful for rendering reflections, and pointed to an OpenGL NVIDIA extension that allowed you to do exactly that.

<Piers Daniell reply slide>

This was noticed by Piers Daniell from NVIDIA, who created a proposal in Khronos. That proposal was discussed with other vendors (software and hardware) that chimed in on aspects that could be or should be made dynamic if possible, which resulted in the multi-vendor extension we have today.

<RADV implementation slide>

In fact, RADV was one of the first Vulkan implementations to support the extension thanks to the effort by Samuel Pitoiset.

<Promoters of Khronos slide>

This whole process got me thinking that Khronos may sometimes be seen from the outside as a closed silo composed mainly of hardware vendors. Certainly, there are a lot of hardware vendors, but if you take the list of promoter members you can see some fairly well-known software vendors as well, and API usability and adoption are important for both groups. There are many people in Khronos trying to make Vulkan easier to use, even if we’re all aware that’s somewhat in conflict with providing a lower level API that should let you write performant applications.

<Khronos Contributors slide>

If you take a look at the long list of contributor members, that’s only shown partially here because it’s very long, you’ll notice a lot of actors from different backgrounds as well.

<Vulkan-Docs repo slide>

Moreover, while Khronos and its different Vulkan working groups are far from an open source project or community, I believe they’re certainly more open to contributions than many people think. For example, the Vulkan spec is published in a GitHub repo with instructions to build it (the spec is written in AsciiDoc) and this repo is open for issues and pull requests. So, obviously, if you want to change major parts of Vulkan and how some aspects of the API work, you’re going to meet opposition and maybe you should be joining Khronos to discuss things internally with everyone involved in there. However, while an angry tweet was enough for this particular extension, if you’re not well-known you may want to create an issue instead, exposing your use case, maybe with other colleagues chiming in on details or supporting your proposal. I know for a fact that issues created in this public repo are discussed in periodic Khronos meetings. It may take some weeks if people are busy and there are a lot of things on the table, but they’re going to end up being discussed, which is a very good thing I was happy to see, and I want to put emphasis on that. I would like Khronos to continue doing that and I would like more people to take advantage of the public repos from Khronos. I know the people involved in the Vulkan spec want to make the text as clear as possible. Maybe you think some paragraph is confusing, or there’s a missing link to another section that provides more context, or something absurd is allowed by the spec and should be forbidden. You can try a reasoned pull request for any of those. Obviously, there are no guarantees it will go in, but it’s interesting in any case.

<Blend state tweet slide>

For example, in the Twitter thread I showed before, I tweeted a reply when the extension was published and, among a few retweets, likes and quoted replies, I found the very interesting tweet I’m showing you here, asking for the whole blend state to be made dynamic and indicating that it would be game-changing for some developers and very interesting for web browsers. We all want our web browsers to leverage the power of the GPU as much as possible, right? So why not? I thought creating an issue in the public repo for this case could be interesting.

<Dynamic blend state issue slide>

And, in fact, it turns out someone had already created an issue about it, as you can see here.

<Tom Olson reply slide>

And in this case, in this issue, Tom Olson from ARM replied that the working group had been discussing it and it turns out in this particular case existing hardware doesn’t make it easy to make the blend state fully dynamic without possibly recompiling shaders under the hood and introducing unwanted complexity in the implementations, so it was rejected for now. But even if, in this case, the reply is negative, you can see what I was mentioning: the issue reached the working group, it was considered, discussed and the issue creator got a reply and feedback. And that’s what I wanted to show you.

<Final slide>

And that’s all. Thanks for listening! Any questions maybe?

The talk was followed by a Q&A section moderated, in this case, by Martin Peres. In the text below RG stands for Ricardo Garcia and MP stands for Martin Peres.

RG: OK…​ Hello everyone!

MP: OK, so far we do not have any questions. Jason Ekstrand has a comment: "We (the Vulkan Working Group) has had many contributions to the spec".

RG: Yeah, yeah, exactly. I mean, I don’t think it’s very well known but yeah, indeed, there are a lot of people who have already contributed issues, pull requests and there have been many external contributions already so these things should definitely continue and even happen more often.

MP: OK, I’m gonna ask a question. So…​ how much do you think this is gonna help layering libraries like Zink because I assume, I mean, one of the big issues with Zink is that you need to have a lot of pipelines precompiled and…​ is this helping Zink?

RG: I don’t know if it’s being used. I think I did a search yesterday to see if Zink was using the extension and I don’t remember if I found anything specific so maybe the Zink people can answer the question but, yeah, it should definitely help in those cases because OpenGL is not as strict as Vulkan regarding pipelines obviously. You can change more things on the fly and if the underlying Vulkan implementation supports extended dynamic state it should make it easier to emulate OpenGL on top of Vulkan. For example, I know it’s being used by VKD3D right now to emulate DirectX 12 and there are a few emulators out there which are using the extension because, you know, APIs for consoles are different and they can use this type of extension to make the code better.

MP: Agree. Jason also has another comment saying there are even extensions in flight from the Mesa community for some windowing-system related stuff.

RG: Yeah, I was happy to see yesterday…​ I think it was yesterday, well, here at this XDC that the present timing extension pull request is being handled right now on GitHub which I think is a very good thing. It’s a trend I would like to [see] continue because, well, I guess sometimes, you know, the discussions inside the Working Group and inside Khronos may involve IP or whatever so it’s better to have those discussions sometimes in private, but it is a good thing that maybe, you know, there are a few extensions that could be handled publicly in GitHub instead of the internal tools in Khronos. So, yeah, that’s a good thing and a trend I would like to see continue: extensions discussed in public.

MP: Yeah, sounds very cool. OK, I think we do not have any question…​ other questions or comments so let’s say thank you very much and…​

RG: Thank you very much and let me congratulate you for…​ to the organizers for organizing XDC and…​ everyone, enjoy the rest of the day, thank you.

MP: Thank you! See you in 13m 30s for the status of freedesktop.org’s GitLab cloud hosting.

Regarding Zink, at the time I’m writing this, there’s an in-progress merge request for it to take advantage of the extension. Regarding the present timing extension, its pull request is at GitHub and you can also watch a short talk from Day One of the conference. I also mentioned the extension being used by VKD3D. I was specifically referring to the VKD3D-Proton fork.

References used in the talk:

September 27, 2020 01:25 PM

September 24, 2020

Samuel Iglesias

X.Org Developers Conference 2020

XDC 2020 sponsors

Last week, X.Org Developers Conference 2020 was held online for the first time. This year, with all the COVID-19 situation that is affecting almost every country worldwide, the X.Org Foundation Board of Directors decided to make it virtual.

I love open-source conferences :-) They are great for networking, having fun with the rest of the community members, having really good technical discussions in the hallway track… and visiting a new place every year! Unfortunately, we couldn’t do any of that this time and we needed to look for an alternative… going virtual being the obvious one.

The organization team at Intel, led by Radoslaw Szwichtenberg and Martin Peres, analyzed the different open-source alternatives to organize XDC 2020 in a virtual manner. Finally, due to the setup requirements and the possibility of having more than 200 attendees connected to the video stream at the same time (Big Blue Button doesn’t recommend more than 100 simultaneous users), they selected Jitsi for speakers + Youtube for streaming/recording + IRC for questions. Arkadiusz Hiler summarized very well what they did from the A/V technical point of view and what the experience of hosting a virtual XDC was like.

I’m very happy with the final result given the special situation this year: the streaming was flawless, we had almost no technical issues (except one audio issue in the opening session… just like at physical ones! :-D), and IRC turned out to be very active during the conference. Thanks a lot to the organizers for their great job!

However, there is always room for improvements. Therefore, the X.Org Foundation board is asking for feedback, please share with us your opinion on XDC 2020!

Just focusing on my own experience, it was very good. I really enjoyed the talks presented this year and the interesting discussions happening on IRC. I would like to highlight the four talks presented by my colleagues at Igalia :-D

This year I was also a speaker! I presented the “Improving Khronos CTS tests with Mesa code coverage” talk (watch recording), where I explained how we can improve VK-GL-CTS quality by leveraging Mesa code coverage using open-source tools.

My XDC 2020 talk

I’m looking forward to attending X.Org Developers Conference 2021! But first, we need an organizer! Requests For Proposals for hosting XDC2021 are now open!

See you next year!

September 24, 2020 08:10 AM

September 20, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 3: Using OpenGL to display Vulkan allocated textures.

This is the third post of the OpenGL and Vulkan interoperability series, where I explain some EXT_external_objects and EXT_external_objects_fd use cases with examples taken by the Piglit tests I’ve written to test the extensions as part of my work for Igalia‘s graphics team. We are going to see a slightly more complex case of Vulkan/GL … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 3: Using OpenGL to display Vulkan allocated textures.

by hikiko at September 20, 2020 01:08 PM

September 18, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux]: The XDC 2020 presentation

XDC 2020 (co-sponsored by Igalia) was virtual this year due to covid-19. Several presentations were coming from Igalians (see my XDC post) and I’ve given a talk about OpenGL and Vulkan interoperability and our work to support EXT_external_objects and EXT_external_objects_fd on different Mesa3D drivers. These are my slides and pages 9-12 contain short descriptions of … Continue reading [OpenGL and Vulkan Interoperability on Linux]: The XDC 2020 presentation

by hikiko at September 18, 2020 05:19 PM

XDC 2020

XDC 2020 was virtual this year due to the covid-19 situation. Despite that, there were many interesting talks and 5 of them were given by Igalians. Like in the previous years Igalia was among the sponsors (a silver sponsor). The talks coming from Igalians were the following: Day #1: Overview of the open source Vulkan … Continue reading XDC 2020

by hikiko at September 18, 2020 05:18 PM

September 07, 2020

Víctor Jáquez

Review of Igalia Multimedia activities (2020/H1)

This blog post is a review of the various activities the Igalia Multimedia team was involved in during the first half of 2020.

Our previous reports are:

Just before a new virus turned into a pandemic, we could enjoy our traditional FOSDEM. There, our colleague Phil gave a talk about many of the topics covered in this report.

GstWPE

GstWPE’s wpesrc element produces a video texture representing a web page rendered off-screen by WPE.

We have worked on a new iteration of the GstWPE demo, focusing on one-to-many, web-augmented overlays, broadcasting with WebRTC and Janus.

Also, since the merge of the gstwpe plugin in gst-plugins-bad (the staging area for new elements), new users have come along, spotting rough areas and improving the element along the way.

Video Editing

GStreamer Editing Services (GES) is a library that simplifies the creation of multimedia editing applications. It is based on the GStreamer multimedia framework and is heavily used by the Pitivi video editor.

Implemented frame accuracy in the GStreamer Editing Services (GES)

As required by the industry, it is now possible to reference all times in frame numbers, providing a precise mapping between frame number and play time. Many issues were fixed in GStreamer to reach enough precision to make this work. Intensive regression tests were added as well.

Implemented time effects support in GES

Important refactoring inside GStreamer Editing Services has happened to allow cleanly and safely changing the playback speed of individual clips.

Implemented reverse playback in GES

Several issues have been fixed inside GStreamer core elements and base classes in order to support reverse playback. This allows us to implement reliable and frame accurate reverse playback for individual clips.

Implemented ImageSequence support in GStreamer and GES

Since OpenTimelineIO implemented ImageSequence support, many users in the community had said it was really required. We reviewed and finished up the imagesequencesrc element, which had been awaiting review for years.

This feature is now also supported in the OpenTimelineIO GES adapter.

Optimized nested timelines preroll time by an order of magnitude

Caps negotiation, done while the pipeline transitions from the paused state to the playing state and which exercises the whole pipeline functionality, was the bottleneck for nested timelines, so pipelines were reworked to avoid useless negotiations. At the same time, other members of the GStreamer community have improved caps negotiation performance in general.

Last but not least, our colleague Thibault gave a talk in The Pipeline Conference about The Motion Picture Industry and Open Source Software: GStreamer as an Alternative, explaining how and why GStreamer could be leveraged in the motion picture industry to allow faster innovation, and solve issues by reusing all the multi-platform infrastructure the community has to offer.

WebKit multimedia

There has been a lot of work on WebKit multimedia, particularly for WebKitGTK and WPE ports which use GStreamer framework as backend.

WebKit Flatpak SDK

But first of all we would like to draw readers attention to the new WebKit Flatpak SDK. It was not a contribution only from the multimedia team, but rather a joint effort among different teams in Igalia.

Before the WebKit Flatpak SDK, JHBuild was used for setting up a WebKitGTK/WPE environment for testing and development. Its purpose is to provide a common set of well-defined dependencies instead of relying on the ones available in the different Linux distributions, which might produce different outputs. Nonetheless, Flatpak offers a much more coherent environment for testing and development, isolated from the rest of the build host, getting closer to reproducible outputs.

Another great advantage of the WebKit Flatpak SDK, at least for the multimedia team, is the possibility of using gst-build to set up a custom GStreamer environment, with the latest master, for example.

Now, for the sake of brevity, let us sketch a non-exhaustive list of activities and achievements related to WebKit multimedia.

General multimedia

Media Source Extensions (MSE)

Encrypted Media Extension (EME)

One of the major results of this first half is the upstreaming of ThunderCDM, which is an implementation of a Content Decryption Module providing Widevine decryption support. Recently, our colleague Xabier published a blog post in this regard.

It has also enabled client-side video rendering support, which ensures video frames remain protected in GPU memory so they can’t be reached by third parties. This is a requirement for DRM/EME.

WebRTC

GStreamer

Though we normally contribute in GStreamer with the activities listed above, there are other tasks not related with WebKit. Among these we can enumerate the following:

GStreamer VAAPI

  • Reviewed a lot of patches.
  • Support for media-driver (iHD), the new VAAPI driver for Intel, mostly for Gen9 onwards. There are a lot of features with this driver.
  • A new vaapioverlay element.
  • Deep code cleanups. Among these we would like to mention:
    • Added quirk mechanism for different backends.
    • Change base classes to GstObject and GstMiniObject of most of classes and buffers types.
  • Enhanced caps negotiation given current driver’s constraints

Conclusions

The multimedia team at Igalia has kept working, during the first half of this strange year, on our three main areas: browsers (mainly WebKitGTK and WPE), video editing, and the GStreamer framework.

We worked on adding and enhancing WebKitGTK and WPE multimedia features in order to offer a solid platform for media providers.

We have enhanced the Video Editing support in GStreamer.

And, along with these tasks, we have contributed as much to the GStreamer framework, particularly in hardware-accelerated decoding and encoding, and VA-API.

by vjaquez at September 07, 2020 03:12 PM

Alejandro Piñeiro

v3dv status update 2020-09-07

So here is a new update on the evolution of the Vulkan driver for the rpi4 (Broadcom GPU).

Features

Since my last update we finished the support for two features: robust buffer access and multisampling.

Robust buffer access is a feature that allows specifying that accesses to buffers are bounds-checked against the range of the buffer descriptor. Usually this is used as a debug tool during development, but disabled in release builds (this is explained in more detail in this ARM guide). So sorry, no screenshot here.
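
As a reminder of how this is exposed, robust buffer access is a core VkPhysicalDeviceFeatures bit requested at logical device creation; a minimal C sketch (queue setup and error handling omitted):

    VkPhysicalDeviceFeatures features = {
        /* buffer accesses are bounds-checked against the descriptor range */
        .robustBufferAccess = VK_TRUE,
    };

    VkDeviceCreateInfo device_info = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
        /* queue create info etc. omitted for brevity */
        .pEnabledFeatures = &features,
    };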

In my last update I mentioned that we had started the support for multisampling, enough to get some demos working. Since then we were able to finish the rest of the multisampling support, and even implemented the optional sample rate shading feature. So now the following demo by Sascha Willems is working:

Sascha Willems deferred multisampling demo run on rpi4

Bugfixing

Taking into account that most of the features needed to support Vulkan core 1.0 are implemented now, a lot of the effort since the last update went into bugfixing, focusing on the specifics of the driver. Our main reference for this is the Vulkan CTS, the official Khronos test suite for Vulkan and OpenGL.

As usual, here are some screenshots from the nice demos by Sascha Willems, showing demos that were failing when I wrote the last update and are working now thanks to the bugfixing work.

Sascha Willems hdr demo run on rpi4

Sascha Willems gltf skinning demo run on rpi4

Next

At this point there are no major features pending implementation to fulfill the support for Vulkan core 1.0, so our focus will be on getting all the Vulkan CTS tests to pass.

Previous updates

Just in case you missed any of the updates of the vulkan driver so far:

Vulkan raspberry pi first triangle
Vulkan update now with added source code
v3dv status update 2020-07-01
V3DV Vulkan driver update: VkQuake1-3 now working
v3dv status update 2020-07-31

by infapi00 at September 07, 2020 02:37 PM

September 02, 2020

Xabier Rodríguez Calvar

Serious Encrypted Media Extensions on GStreamer based WebKit ports

Encrypted Media Extensions (a.k.a. EME) is the W3C standard for encrypted media on the web. This way, media providers such as Hulu, Netflix, HBO, Disney+, Prime Video, etc. can provide their content with a reasonable amount of confidence that it will be very complicated for people to “save” their assets without permission. Why do I use the word “serious” in the title? In WebKit there is already support for Clear Key, which is the W3C EME reference implementation, but EME supports more encryption systems, even proprietary ones (I have my opinion about this; you can ask me privately). No service provider (that I know of) supports Clear Key; they usually rely on Widevine, PlayReady or some other system.

Three years ago, my colleague Žan Doberšek finished the implementation of what was going to be the shell of WebKit’s modern EME implementation, following the latest W3C proposal. We implemented that downstream (at Web Platform for Embedded) as well using Thunder, which includes as a plugin a fork of what was Open Content Decryption Module (a.k.a. OpenCDM). The OpenCDM API changed quite a lot during this journey. It works well and there are millions of set-top boxes using it currently.

The delta between downstream and the upstream GStreamer based WebKit ports was quite big, testing was difficult and syncing was not always easy, so we decided to reverse the situation.

Our first step was done by my colleague Charlie Turner, who made Clear Key work upstream again while adapting some changes the Apple folks had made in the meantime. It was amazing to see the Clear Key tests passing again, and his work with the CDMProxy related classes was awesome. After having Clear Key working, I had to adapt those classes a bit to accommodate Thunder. To explain a bit about the WebKit EME architecture, I must say that there are two layers. The first is the cross-platform one, which implements the W3C API (MediaKeys, MediaKeySession, CDM…). These classes rely on the platform ones (CDMPrivate, CDMInstance, CDMInstanceSession) to handle the platform management, message exchange, etc., which would be the second layer. Apple’s playback system is fully integrated with their DRM system, so they don’t need anything else. We do, because we need to integrate our own decryptors to defer to Thunder for decryption, so in the GStreamer based ports we also need the CDMProxy related classes, which would be CDMProxy, CDMInstanceProxy, CDMInstanceSessionProxy… The last two extend CDMInstance and CDMInstanceSession respectively to be able to deal with the key management, which is abstracted into the KeyHandle and KeyStore.

Once the abstraction is there (let’s remember that the abstraction works both for Clear Key and Thunder), the Thunder implementation is quite simple, just gluing the CDMProxy, CDMInstanceProxy and CDMInstanceSessionProxy classes to the Thunder system and writing a GStreamer decryptor element for it. I might have made a mistake when selecting the files but considering Thunder classes + the GStreamer common decryptor code, cloc says it is just 1198 lines of platform code. I think it is pretty low for what it does. Apart from that, obviously, there are 5760 lines of cross-platform code.

To build and run all this you need to do several things:

  1. Build the dependencies with WEBKIT_JHBUILD=1 JHBUILD_ENABLE_THUNDER="yes" to enable the old-fashioned JHBuild build and force it to build the Thunder dependencies. All dependencies are in JHBuild; even Widevine is referenced, but to download it you need the proper credentials as it is closed source.
  2. Pass --thunder when calling build-webkit.sh.
  3. Run MiniBrowser with WEBKIT_GST_EME_RANK_PRIORITY="Thunder" and pass parameters --enable-mediasource=TRUE --enable-encrypted-media=TRUE --autoplay-policy=allow. The autoplay policy is usually optional but in this case it is necessary for the YouTube TV tests. We need to give the Thunder decryptor a higher priority because of WebM, which does not specify a key system; without the higher rank the Clear Key one can be selected and fail. MP4 does not create trouble because the protection system is specified and the caps negotiation does its magic.

As you could have guessed if you have a closer look at the GStreamer JHBuild moduleset, you’ll see that only Widevine is supported. To support more, you only have to make them build in the Thunder ecosystem and add them to CDMFactoryThunder::supportedKeySystems.

When I coded this, all YouTube TV tests for Widevine were green on the desktop. At the time of writing this post they aren’t, because of some problem with the Widevine installation that will be sorted out quickly, I hope.

by calvaris at September 02, 2020 02:59 PM

August 27, 2020

Chris Lord

OffscreenCanvas, jobs, life

Hoo boy, it’s been a long time since I last blogged… About 2 and a half years! So, what’s been happening in that time? This will be a long one, so if you’re only interested in a part of it (and who could blame you), I’ve titled each section.

Leaving Impossible

Well, unfortunately my work with Impossible ended, as we essentially ran out of funding. That’s really a shame, we worked on some really cool, open-source stuff, and we’ve definitely seen similar innovations in the field since we stopped working on it. We took a short break (during which we also, unsuccessfully, searched for further funding), after which Rob started working on a cool, related project of his own that you should check out, and I, being a bit less brave, started seeking out a new job. I did consider becoming a full-time musician, but business wasn’t picking up as quickly as I’d hoped it might in that down-time, and with hindsight, I’m glad I didn’t (Covid-19 and all).

I interviewed with a few places, which was certainly an eye-opening experience. The last ‘real’ job interview I did was for Mozilla in 2011, which consisted mainly of talking with engineers that worked there, and working through a few whiteboard problems. Being a young, eager coder at the time, this didn’t really faze me back then. Turns out either the questions have evolved or I’m just not quite as sharp as I used to be in that very particular environment. The one interview I had that involved whiteboard coding was a very mixed bag. It seemed a mix of two types of questions; those that are easy to answer (but unless you’re in the habit of writing very quickly on a whiteboard, slow to write down) and those that were pretty impossible to answer without specific preparation. Perhaps this was the fault of recruiters, but you might hope that interviews would be catered somewhat to the person you’re interviewing, or the work they might actually be doing, neither of which seemed to be the case? Unsurprisingly, I didn’t get past that interview, but in retrospect I’m also glad I didn’t. Igalia’s interview process was much more humane, and involved mostly discussions about actual work I’ve done, hypothetical situations and ethics. They were very long discussions, mind, but I’m very glad that they were happy to hire me, and that I didn’t entertain different possibilities. If you aren’t already familiar with Igalia, I’d highly recommend having a read about them/us. I’ve been there a year now, and the feeling is quite similar to when I first joined Mozilla, but I believe with Igalia’s structure, this is likely to stay a happier and safer environment. Not that I mean to knock Mozilla, especially now, but anyone that has worked there will likely admit that along with the giddy highs, there are also some unfortunate lows.

Igalia

I joined Igalia as part of the team that works on WebKit, and that’s what I’ve been doing since. It almost makes perfect sense in a way. Surprisingly, although I’ve spent overwhelmingly more time on Gecko, I did actually work with WebKit first while at OpenedHand, and for a short period at Intel. While celebrating my first commit to WebKit, I did actually discover it wasn’t my first commit at all, but I’d contributed a small embedding-related fix-up in 2008. So it’s nice to have come full-circle! My first work at Igalia was fixing up some patches that Žan Doberšek had prototyped to allow direct display of YUV video data via pixel shaders. Later on, I was also pleased to extend that work somewhat by fixing some vc4 driver bugs and GStreamer bugs, to allow for hardware decoding of YUV video on Raspberry Pi 3b (this, I believe, is all upstream at this point). WebKitGTK and WPE WebKit may be the only Linux browser backends that leverage this pipeline, allowing for 1080p30 video playback on a Pi3b. There are other issues making this less useful than you might think, but either way, it’s a nice first achievement.

OffscreenCanvas

After that introduction, I was pointed at what could be fairly described as my main project, OffscreenCanvas. This was also a continuation of Žan’s work (he’s prolific!), though there has been significant original work since. This might be the part of this post that people find most interesting or relevant, but having not blogged in over 2 years, I can’t be blamed for waffling just a little. OffscreenCanvas is a relatively new web standard that allows the use of canvas API disconnected from the DOM, and within Workers. It also makes some provisions for asynchronously updated rendering, allowing canvas updates in Workers to bypass the main thread entirely and thus not be blocked by long-running processes on that thread. The most obvious use-case for this, and I think the most practical, is essentially non-blocking rendering of generated content. This is extremely handy for maps, for example. There are some other nice use-cases for this as well – you can, for example, show loading indicators that don’t stop animating while performing complex DOM manipulation, or procedurally generate textures for games, asynchronously. Any situation where you might want to do some long-running image processing without blocking the main thread (image editing also springs to mind).

Currently, the only complete implementation is within Blink. Gecko has a partial implementation that only supports WebGL contexts (and last time I tried, crashed the browser on creation…), but as far as I know, that’s it. I’ve been working on this, with encouragement and cooperation from Apple, on and off for the past year. In fact, as of August 12th, it’s even partially usable, though there is still a fair bit missing. I’ve been concentrating on the 2d context use-case, as I think it’s by far the most useful part of the standard. It’s at the point where it’s mostly usable, minus text rendering and minus some edge-case colour parsing. Asynchronous updates are also not yet supported, though I believe that’s fairly close for Linux. OffscreenCanvas is enabled with experimental features, for those that want to try it out.

My next goal, after asynchronous updates on Linux, is to enable WebGL context support. I believe these aren’t particularly tough goals, given where it is now, so hopefully they’ll happen by the end of the year. Text rendering is a much harder problem, but I hope that between us at Igalia and the excellent engineers at Apple, we can come up with a plan for it. The difficulty is that both styling and font loading/caching were written with the assumption that they’d run on just one thread, and that that thread would be the main thread. A very reasonable assumption in a pre-Worker and pre-many-core-CPU world of course, but increasingly less so now, and very awkward for this particular piece of work. Hopefully we’ll persevere though, this is a pretty cool technology, and I’d love to contribute to it being feasible to use widely, and lessen the gap between native and the web.

And that’s it from me. Lots of non-work related stuff has happened in the time since I last posted, but I’m keeping this post tech-related. If you want to hear more of my nonsense, I tend to post on Twitter a bit more often these days. See you in another couple of years 🙂

by Chris Lord at August 27, 2020 08:56 AM

August 16, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 2: Using OpenGL to draw on Vulkan textures.

This is the second post of the OpenGL and Vulkan interoperability series, where I explain some EXT_external_objects and EXT_external_objects_fd use cases with examples taken by the Piglit tests I’ve written to test the extensions as part of my work for Igalia‘s graphics team. We are going to see a very simple case of Vulkan/GL interoperability … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 2: Using OpenGL to draw on Vulkan textures.

by hikiko at August 16, 2020 09:51 AM

August 13, 2020

Javier Fernández

Improving CSS Custom Properties performance

Chrome 84 reached the stable channel a few weeks ago, and there are already several great posts describing the many important additions, interesting new features, security fixes and improvements in privacy policies ([1], [2], [3], [4]) it contains. However, there is a change that I worked on in this release which might have passed unnoticed by most, but I think is very valuable: A change regarding CSS Custom Properties (variables) performance.

The design of CSS, in general, takes great care to make it possible for features to perform well. However, implementations may not perform as well as they could, and it takes a considerable amount of time to understand how authors use the features and which cases are most relevant to them.

CSS Custom Properties are an interesting example to look at here: They are a wonderful feature that provides a lot of advantages for web authors. For a whole lot of cases, all of the implementations of CSS Custom Properties perform well enough that most people won’t notice. However, we at Igalia have been analyzing several use cases and looking at some reports around their performance in different implementations.

Let’s consider a fairly straightforward example in which an author sets a single property in a toggleable class in the body, and then uses that property several times deeper in the tree to change the foreground color of some text.


   .red { --prop: red; }
   .green { --prop: green; }
   /* hypothetical rule for the elements deeper in the tree: */
   .user { color: var(--prop); }



Only about 20% of the elements actually use this property, 5 levels deep into the tree, and only to change the foreground color.

To evaluate Chromium’s performance in a case like this we can define a new perf test, using the perf tools the Chromium project has available for browser engineers. In this case, we want a huge tree so that we can better evaluate the impact of the different optimizations.


These are the results obtained running the test in Chrome 83:

avg        median     stdev    min        max
163.74 ms  163.79 ms  3.69 ms  158.59 ms  163.74 ms

I admit that it’s difficult to evaluate the results, especially considering the number of nodes of such a huge DOM tree. Let’s compare the results of the same test on Firefox, using different numbers of nodes.

Nodes      50K        20K       10K       5K        1K       500
Chrome 83  163.74 ms  55.05 ms  25.12 ms  14.18 ms  2.74 ms  1.50 ms
FF 78      28.35 ms   12.05 ms  6.10 ms   3.50 ms   1.15 ms  0.55 ms
FF/Chrome  1/6        1/5       1/4       1/4       1/2      1/3

As I commented before, the data are more accurate when the DOM tree has a lot of nodes; in any case, the difference is quite clear and shows there is plenty of room for improvement. WebKit-based browsers show results similar to Chromium’s as well.

Performance tests like the one above can be added to browsers for tracking improvements and regressions over time, so we’ve added that to Chromium’s tree (r763335): We’d like to see it get faster over time, and definitely cannot afford regressions (see the Chrome Performance Dashboard and the ChangeStyleCustomPropertyDeclaration test for details).

So… What can we do?

In Chrome 83 and lower, whenever the custom property declaration changed, the new declaration would be inherited by the whole tree. This inheritance implied executing the whole CSS cascade and recalculating the styles of all the nodes in the entire tree, since with this approach, all nodes may be affected.

Chrome had already implemented an optimization in the CSS cascade implementation for regular CSS properties that don’t depend on any other property to resolve their value. This subset of CSS properties is defined as Independent Properties in the Chromium codebase. The optimization mentioned before affects how the inheritance mechanism is implemented for these Independent properties. Whenever one of these properties changes, instead of recalculating the styles of the inherited properties, children can just copy the whole parent’s computed style. Blink’s style engine has a component known as the Matched Properties Cache, responsible for deciding when it is possible to avoid the style resolution of an element and instead perform an efficient copy of the matched computed style. I’ll get back to this concept in the last part of this post.

In the case of CSS Custom Properties, we could apply a similar approach as a good first step. We can consider that nodes whose computed styles don’t have references to custom property declarations shouldn’t be affected by the new declaration, and we can implement the inheritance directly by copying the parent’s computed style. The patch with the optimization I implemented in r765278 initially landed in Chrome 84.0.4137.0.

Let’s look at the result of this one action in the Chrome Performance Dashboard:

That’s a really good improvement!

However, it’s also just a first step. It’s clear that Chrome still has a wide margin for improvement in this case, as does any WebKit-based browser – Firefox is still, impressively, markedly faster, as described in the bug report filed to track this issue. The following table shows the results of the different browsers together; even disabling the multi-thread capabilities of Firefox’s Stylo engine (STYLO_THREAD=1), FF is much faster than Chrome with the optimization applied.

        Chrome 83   Chrome 84   FF 78     FF 78 th=1
avg     163.74 ms   117.37 ms   28.35 ms  38.25 ms
median  163.79 ms   117.52 ms   28.50 ms  38.50 ms
stdev   3.69 ms     1.98 ms     0.93 ms   1.86 ms
min     158.59 ms   113.66 ms   26.00 ms  35.00 ms
max     163.74 ms   120.87 ms   30.00 ms  41.00 ms

Before continuing, I want to get back to the Matched Properties Cache (MPC) concept, since it has an important role in these style optimizations. This cache is not a new concept in Chrome’s engine; as a matter of fact, it’s also used in WebKit, since it was implemented long ago, before the fork that created the new Blink engine. However, Google has been working a lot on this area in recent years and some of the most recent changes in the MPC have had an important impact on style resolution performance. As a result of this work, elements with independent and non-independent properties using CSS Variables might produce cache hits in the MPC. The results of the Performance Dashboard show a considerable improvement in the aforementioned ChangeStyleCustomPropertyDeclaration test (avg: 108.06 ms).

Additionally, there are several other cases where the use of CSS Variables has a considerable impact on performance, compared with using regular CSS properties. Obviously, resolving CSS Variables has a cost, so it’s clear that we could apply additional optimizations that reduce the impact of the variable resolution, especially for handling specific style changes that might not affect a substantial portion of the DOM tree. I’ve been experimenting with the MPC to explore the idea of an independent CSS Custom Properties cache; nodes with variables referencing the same custom property will produce cache hits in the MPC, even though other properties don’t match. The preliminary approach I’ve been implementing consists of a new matching function, specific to custom properties, and a mechanism to transfer/copy the property’s data to avoid resolving the variable again, since the property’s declaration hasn’t changed. We would need to apply the CSS cascade again, but at least we could save the cost of the variable resolution.

Of course, at the end of the day, improving performance has costs and challenges – and it’s hard to keep performance even once you get it. But if we really want performant CSS Custom Properties, this means that we have to decide to prioritize this work. Currently there is reluctance to explore the concept of a new Custom Properties specific cache – the challenge is big and the risks are not non-existent; cache invalidation can get complicated. But the point is that we have to understand that we aren’t all going to agree on what is important enough to warrant attention, or how much investment, or when. Web authors must convince vendors that these use cases are worth optimizing and that the cost and risks of such complex challenges should be assumed by them.

This work has been sponsored by Bloomberg, which I consider one of the most important contributors to the Web Platform. After several years, the vision of this company and its responsibility as a consumer of the platform has led to many important contributions that we all enjoy now. Although CSS Grid Layout might be the most remarkable one, there are many others not that big, like this work on CSS Custom Properties, or several other new features of the CSS Text specification. This is a perfect example of a company that tries to change priorities and adapt the web platform to its needs and the use cases they consider more aligned with their business strategy.

I understand that not every user of the web platform can make this kind of investment. This is why I believe that initiatives like Open Prioritization could help to move the web platform in a positive direction, by providing a way for us to move past a lot of these conversations and focus on the needs that some web authors and users of the platform consider more important, or higher priority. Improving performance for CSS Custom Properties isn’t currently one of the projects we’ve listed, but perhaps it would be an interesting one we might try in the future if we are successful with these. If you haven’t already, have a look and see if there is something there that is interesting to you or your company – pledges of any size are good – ten thousand $1 donations are every bit as good as ten $1000 donations. Together, we can make a difference, and we all benefit.

Also, we would love to hear about your ideas. Is improving CSS Custom Properties performance important to you? What else is? Share your comments with us on Twitter, either me (@lajava77) or our developer advocate Brian Kardell (@briankardell), or email me at jfernandez@igalia.com. I’d be glad to answer any question about the Open Priorization experiment.

by jfernandez at August 13, 2020 06:16 PM

August 12, 2020

Brian Kardell

Reimaging the Vendor-Only Model

I have been trying to organize my personal thoughts on this and figure out how to carefully articulate them for a while now. I hadn't planned on posting this just yet but recent events and discussions make me think that maybe it is worth sharing.

I've been talking a lot this year about the health of the Web Ecosystem [1], [2], [3], kind of trying to raise the larger discussion about how we think about it and to reconsider how we ensure it remains vibrant (or becomes more so). Our recent open-prioritization experiment is meant, in part, to prompt some of this discussion and begin looking at ways to make things better. Recent news from Mozilla, it seems, has done as much to prompt some urgency on this discussion in people's minds, so let's dig into it.

One comment that I heard a lot over the years goes something like this...

"This is vendors responsibility - and their companies make bajillions of dollars on the web... Why don't they just hire more people?.

It's a very natural question since this is the way we tried to look at it in practice for roughly the first 25 years, and it's mostly how many of us still think about it. It's hard to even imagine something else with that kind of inertia. And... Maybe that model is fine.

But then again... Maybe not. I think it's worth considering together, because I am ever increasingly on the "probably not" side. So, now that the conversation is started, let me try to explain why...

The trouble with the vendor-only model...

In the entire world, at the time of this writing, only 3 organizations currently make the level of investments necessary to maintain a rendering engine at all. Developers sometimes express frustration at the level of investment because 2 of these 3 organizations are part of the most profitable companies on earth, and they have chosen to invest very differently. That's fine. You can continue to do that, and I'm not suggesting that is invalid... But I think this is somewhat of a red herring and that we might be missing the more important conversation.

At some level, what is far more interesting than the specific arbitrary amount and how it relates to their profits (by this argument both of them could, in theory, invest more) is the fact that they are under no actual obligation to do so, and that in the end, there are no guarantees about the continued level of investment.

Don't get me wrong: I'm tremendously happy that they do, and I absolutely want them to continue playing a core role - just not alone. I think this is too important to not discuss - because we've already seen several varieties of changes here over time. Whatever the world looks like today - the only thing you can be sure of is that change is inevitable. As hard as it is to believe, all the paradigms of organizational power will eventually shift. The Fortune 500 changes; ultimately many declare bankruptcy and/or are completely re-imagined. Even if we are extremely fortunate and a company really wants, at its very core, to put in a huge level of investment - there may come a day when they just can't.

In other words: The traditional model of a few big vendors voluntarily funding development of the Web with very specific revenues puts all of our eggs in a particularly centralized and fickle basket. Just this week, there is related news from Mozilla, kind of demonstrating the latter point.

Conversely: diversifying investments in the commons would buffer it significantly - and thankfully, it would seem that there are plenty of opportunities to do so. Let's look at some…

Diversifying investments

The most obvious and immediate thing to do is to try to expand the sorts of things that have already begun happening and working...

Finding additional ways of funding an organization like Mozilla itself is a good idea. Experimenting with new models for funding development, like Brave or Puma, which could potentially make a browser itself actually profitable, are interesting ideas too.

Some things, like CSS Grid, are examples of really big investments from non-browser vendors. A whole lot of that implementation in two browsers was funded by Bloomberg Tech and done by Igalia thanks to the increasingly open and collaborative nature of all of the engines. That is great too! There are many wealthy organizations out there (Fortune companies) who can help expand investment like that. If you work for an organization that is interested in that, reach out to me!

However, this avenue isn't immediately appealing even to a lot of big companies. Could we do more?

Sure, why not! There is a much bigger group still of big companies who perhaps won't, themselves, fund a whole feature. What we are trying to show with open-prioritization is that there are many ways that we can pool money toward a specific purpose. Several big companies can share the costs. That seems good, and it's one thing that open-prioritization allows for: A comparatively small number of really big businesses working better together would be excellent all on its own.

But then... If we're exploring diversifying funding, why place some arbitrary limits on "bigness"? What makes them special? Perhaps you don't have to be a mega-company to help advance things? Experimenting with this is also part of what open-prioritization is about - maybe there doesn't even have to be a single answer. In fact, maybe it's better if there isn't!

Exploring many models

When I was a kid, and cable television was "new", my grandmother thought it was just ridiculous that someone would pay for television. After all, it came over the airwaves "for free". Of course... It wasn't free, and despite it costing a lot to produce and run, it was a money-making endeavor. The cost to maintain a station is comparatively tiny compared to the potential profits they could make from shoving advertising into our eyeballs. You paid. All the way through the system. A small amount of your product purchases funded advertising, which funded broadcast.

Actually, the web isn't that different here. It is traditionally largely supported by very few revenue streams, mainly advertising in search. When ad revenue takes a hit, so do the budgets.

But over time we've created lots of ways to explore different models. When it comes to watching TV and movies, I haven't seen "traditional commercial TV" in years - I use some other model to pay for content more directly. And, honestly, this has been good in a whole lot of ways. It's even generated some really good content. Still, there are other forms of short-form content - podcasts or music services, for example - that I actually prefer the advertising model for. And, you can also donate to public radio.

Open prioritization is aimed at exploring all of the different ways that we can realize our potential for power and give people opportunities to improve things in many ways.

With a way to pool funding, there don't have to be artificial barriers. We can give smaller organizations opportunities to participate however they can: Maybe that is simply through advocating/lobbying bigger companies for specific dollars for development of a specific thing. Perhaps sometimes they don't even need a bigger company. Perhaps several smaller organizations can rally together to act as a big one? It seems good to allow them to.

The one thing that you'll notice that all answers have in common is that ultimately, somehow lots of people are paying in small amounts, and lots of little bits add up to a lot.

The potential power of collectives

With no arbitrary limits, even developers themselves could play a big, very direct role... If we want to.

Should we? I don't know!

I'd love to talk about it though, because I think that hidden in here is something with immense potential for good through even small voluntary commitments, simply by realizing our sheer numbers. This is sort of the power of collectives. If you're looking for interesting analogies, check out Pia Mancini's Talks/Media posts explaining things like how and why this can also work for governments.

Consider that meetup.com alone lists something like 18 million people who identify as Web developers. Using that as just some kind of working number: If even 1/10th of those developers decided to voluntarily contribute a single dollar each month, this would be over 21.5 million dollars annually that we could use to collectively and directly decide what should happen. This seems very plausible, as you need far less than 1/10th if we assume that some people would be willing to give $2 or $5 or maybe even $10.

In other words, in addition to all of the other ways that we can explore, even we developers actually have enormous untapped potential power, even with very limited kinds of investment, to play a role (not exclusively)... That's more than we can convince most big companies to invest, to directly shape things today. That would be some extreme diversification, and a real "commons". To me that seems really good to have in the mix... If we want it.

I guess the question is: Do we?

I am willing to accept the possibility that it is possible that the answer is simply "No thanks" - but I think it's great that at least we have the choice/opportunity and an avenue to begin to consider all of these things, directly, and have the discussion. I think that's a real positive. What do you think? I'd love to hear from you. If you like this idea, pass it on and maybe consider pledging $1 to one of the pledge collectives in our experiment to help advance the larger ideas and conversation.

August 12, 2020 04:00 AM

July 31, 2020

Alejandro Piñeiro

v3dv status update 2020-07-31

Iago talked recently about the work done testing and supporting well-known applications, like the Vulkan ports of Quake 1, Quake 2 and Quake 3. Let's go back here to the latest news on feature and bugfixing work.

Pipeline cache

Pipeline cache objects allow the result of pipeline construction to be reused. Usually (and specifically in our implementation) that means caching compiled shaders. Reuse can be achieved between pipeline creations during the same application run by passing the same pipeline cache object when creating multiple pipelines. Reuse across runs of an application is achieved by retrieving the pipeline cache contents in one run of the application, saving the contents, and using them to preinitialize a pipeline cache on a subsequent run.

Note that it may happen that a pipeline cache does not improve the performance of an application once it starts to render. This is because application developers are encouraged to create all the pipelines in advance, to avoid any hiccups during rendering. In that situation the pipeline cache helps to reduce load times. In any case, creating pipelines at render time is not always avoidable, and then the pipeline cache helps to reduce the hiccup, as a cache hit is far faster than a shader recompilation.

One specific detail about our implementation is that internally we keep a default pipeline cache, used if the user doesn't provide a pipeline cache when creating a pipeline, and also to cache the custom shaders we use for internal operations. This allowed us to simplify our code, discarding some custom caches that were already implemented.
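
To illustrate the cross-run reuse described above, a hedged C sketch (device and cache are placeholders; file I/O, includes and error handling omitted):

    /* First run: serialize the cache contents before shutdown. */
    size_t size = 0;
    vkGetPipelineCacheData(device, cache, &size, NULL);
    void *blob = malloc(size);
    vkGetPipelineCacheData(device, cache, &size, blob);
    /* ... write blob to disk; read it back on the next run ... */

    /* Subsequent run: preinitialize a new cache with the saved blob. */
    VkPipelineCacheCreateInfo cache_info = {
        .sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO,
        .initialDataSize = size,
        .pInitialData = blob,
    };
    VkPipelineCache preloaded_cache;
    vkCreatePipelineCache(device, &cache_info, NULL, &preloaded_cache);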

Uniform/storage texel buffer

Uniform texel buffers define a tightly-packed 1-dimensional linear array of texels, with texels going through format conversion when read in a shader in the same way as they are for an image. They are mostly equivalent to OpenGL buffer textures, so you can see them as textures backed by a VkBuffer (through a VkBufferView). With uniform texel buffers you can only do formatted loads.

Storage texel buffers are the equivalent concept, but applied to images instead of textures. Unlike uniform texel buffers, they can also be written to in the same way as for storage images.
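
A short C sketch of the uniform texel buffer case (buffer is assumed to have been created with VK_BUFFER_USAGE_UNIFORM_TEXEL_BUFFER_BIT):

    /* The buffer view provides the format used for the texel
       format conversion on shader loads. */
    VkBufferViewCreateInfo view_info = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_VIEW_CREATE_INFO,
        .buffer = buffer,
        .format = VK_FORMAT_R32G32B32A32_SFLOAT,
        .offset = 0,
        .range = VK_WHOLE_SIZE,
    };
    VkBufferView view;
    vkCreateBufferView(device, &view_info, NULL, &view);
    /* Bound to the shader with descriptor type
       VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER (or the
       _STORAGE_TEXEL_BUFFER variant for storage texel buffers). */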

Multisampling

Multisampling is a technique that allows reducing aliasing artifacts on images by sampling pixel coverage at multiple subpixel locations and then averaging the subpixel samples to produce a final color value for each pixel. We have already started working on this feature and included some patches in the development branch, but it is still a work in progress. Having said that, it is enough to get the basic multisampling demo by Sascha Willems working:

Sascha Willems multisampling demo run on rpi4

Bugfixing

Again, in addition to work on specific features, we also spent some time fixing specific driver bugs, using failing Vulkan CTS tests as a reference. So let's take a look at some screenshots of Sascha Willems's demos that are now working:

Sascha Willems deferred demo run on rpi4

Sascha Willems texture array demo run on rpi4

Sascha Willems Compute N-Body demo run on rpi4

Next

We plan to work on supporting the following features next:

  • Robust access
  • Multisample (finish it)

Previous updates

Just in case you missed any of the updates of the vulkan driver so far:

Vulkan raspberry pi first triangle
Vulkan update now with added source code
v3dv status update 2020-07-01
V3DV Vulkan driver update: VkQuake1-3 now working

by infapi00 at July 31, 2020 08:23 AM

July 27, 2020

Jacobo Aragunde

The trip of a key press event in Chromium accessibility

It’s amazing to think about how much computing goes into something as simple as a keystroke that we just take for granted. Recently, I was fixing a bug related to accessibility key events, and to do this, first I had to understand the complex trip that these events take when they arrive at the browser – from the X server until they reach the accessibility system.

Let me start from the beginning. I’m working on the accessibility of the Chromium browser on Linux. The bug was #1042864: key strokes happening on native dialogs, like open and save dialogs, were not reported to the screen reader. The issue also affects Electron-based software; one important example is Visual Studio Code.

A cake with many layers

Linux accessibility is composed of many layers, and this fact stands out when working on a complex piece of software like Chromium which comes with its own UI toolkit.

Currently, the most straightforward way to build accessible applications for the Linux desktop is using and implementing the hooks provided by the ATK (Accessibility Toolkit) API. GTK+ widgets already do so, so applications using them will be accessible by default, but more complex software that implements custom widgets or its own toolkit will need to provide the code for the ATK entry points and emit the corresponding events. That’s the case of Chromium!

The screen reader, or any assistive technology (AT), has to get information and listen for events happening in any software running on the system, to transform them into something meaningful for its users, like speech or braille.

AT-SPI is the glue between these two ends: it runs at system level, observing what’s going on in applications and keeping a global registry of the accessible objects; it receives updates from applications, which translate local ATK objects into global AT-SPI objects; and it receives queries from ATs and pings them when events of their interest happen. It uses D-Bus for inter-process communication (IPC).

You can learn more about accessibility, in general and also in the web platform, in this great interview with my colleague Martin Robinson.

The trip of a keypress event

Let’s say we are building an AT in Python, making use of the pyatspi2 library. We want to listen for keypress events, so we run registerKeystrokeListener to register a callback function.

Pyatspi2 actually wraps AT-SPI’s function atspi_register_keystroke_listener, which eventually calls the remote method RegisterKeystrokeListener via D-Bus. The actual D-Bus remote calls happen at dbind.c.
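
A minimal sketch of that first step might look like this (assuming pyatspi2 is installed and an AT-SPI bus is running; the exact keyword arguments can differ between versions):

import pyatspi

def on_key(event):
    # event.event_string is the key name, e.g. 'Return' or 'a'
    print('key press:', event.event_string)
    return False  # don't consume the event

# Register for key-press events only, then enter the event loop.
pyatspi.Registry.registerKeystrokeListener(
    on_key, kind=(pyatspi.KEY_PRESSED_EVENT,))
pyatspi.Registry.start()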

We have jumped from our AT to the AT-SPI service, via IPC. The DeviceEventController interface provides the remote method mentioned above, and the actual code implementing it is in impl_register_keystroke_listener. Then, the function is added to a list of listeners for key events, in spi_controller_register_device_listener; these listeners in the list will be notified when an event happens, in spi_controller_notify_keylisteners.

AT-SPI will sit there, waiting for the events from applications to arrive. They will come through D-Bus, as they come from different processes: the entry point for any D-Bus message in the AT-SPI core is handle_dec_method_from_idle. One of the operations, NotifyListenersSync, will run impl_notify_listeners_sync, which will later call the function spi_controller_notify_keylisteners we just mentioned, and run all the registered listeners.

Who will call the remote method NotifyListenersSync? Applications will have to do it, if they want to be accessible. They could implement this themselves, but they are likely using a wrapper library. In the case of GTK+, there is at-spi2-atk, which bridges ATK signals with the at-spi2-core D-Bus interfaces so applications don’t have to know them.

The bridge sports its own callback, named spi_atk_bridge_key_listener, and eventually calls the NotifyListenersSync method via D-Bus, at Accessibility_DeviceEventController_NotifyListenersSync. The callback was registered as an ATK key event listener in spi_atk_register_event_listeners: the function atk_add_key_event_listener, which is part of the ATK API, registers the key event listener, and it internally makes use of the AtkUtil struct as defined in atkutil.h.

AtkUtil functions are not implemented by the ATK library; instead, toolkits must provide them. GTK+ does it in _gtk_accessibility_override_atk_util; in the case of Chromium, the functions that populate the AtkUtil struct are defined in atk_util_auralinux_class_init. The particular function that registers the key event listener in Chromium is AtkUtilAuraLinuxAddKeyEventListener: it gets added to a list, which will later be called when an Atk key event is processed in Chromium, at HandleAtkKeyEvent.

Are we there yet?

There are more pieces of software involved in this issue: Chromium will receive key press events from the X server, involving another IPC connection and API layer, and there’s all the browser code managing them, where the solution was actually implemented. I will cover those parts, and the actual solution, in a future post.

Meanwhile, I hope you enjoyed reading this, and happy hacking!

by Jacobo Aragunde Pérez at July 27, 2020 03:01 PM

July 23, 2020

Iago Toral

V3DV Vulkan driver update: VkQuake1-3 now working

A few weeks ago we shared an update regarding the status of the Vulkan driver for the Raspberry Pi 4. In that post I mentioned that the plan was to focus on completing the feature set for Vulkan 1.0 and then moving on to conformance and bugfixing work before attempting to run actual games and applications. A bit later my colleague Alejandro shared another update detailing some of our recent feature work.

We have been making good progress so far and at this point we are getting close to having a complete Vulkan 1.0 implementation. I believe the main pending features are pipeline caches, which Alejandro is currently working on, texel buffers, multisampling support and robust buffer access. So in the last few weeks I decided to take a break from feature development and try to get some Vulkan games running with our driver, using them to guide some initial performance work.

I decided to work with all 3 vkQuake games since they run on Linux, their source code is available (which makes things a lot easier to debug) and they seemed to use a subset of the Vulkan API we already supported. For vkQuake we needed compute shaders and input attachments, which we had implemented recently, and for vkQuake3 we needed a couple of optional Vulkan features, which I implemented recently to get it running without having to modify the game code. So all these games are now running on the Raspberry Pi 4 with the V3DV driver. At the same time, our friend Salva from Pi Labs has also been testing the PPSSPP emulator using Vulkan and reporting that some games seem to be working already, which has been great to hear.

I was particularly interested in getting vkQuake3 to work because the project includes both the Vulkan and the original OpenGL renderers, which makes it possible to compare performance between both APIs. vkQuake3 comes with a GL1 and a GL2 renderer, with the GL1 renderer being the faster of the two by a large margin (apparently the GL2 renderer has additional rendering features that make it much slower). I think the Vulkan renderer is based on the GL1 renderer (although I have not actually checked), so I figured it would make the most reasonable comparison, and in our tests we found the Vulkan version to be up to 60% faster. Of course, it could be argued that GL1 is a pretty old API and that the difference with a more modern GL or GLES renderer might be less significant, but it is still a good sign.

To finish the post, here are some pics of the games:


vkQuake

vkQuake2

vkQuake3 OpenGL 1 renderer

vkQuake3 OpenGL 1 renderer

vkQuake3 Vulkan renderer

vkQuake3 Vulkan renderer

A couple of final notes:
* Note that the Vulkan renderer for vkQuake3 is much darker, but that is just how the renderer operates and not a driver issue; we observed the same behavior on Intel GPUs.
* A note for those interested in trying vkQuake3: we noticed that exterior levels have broken sky rendering. I hope we will get to fix that soon.

by Iago Toral at July 23, 2020 11:29 AM

July 13, 2020

Frédéric Wang

Igalia's contribution to the Mozilla project and Open Prioritization

Like many web platform developers and Firefox users, I believe Mozilla’s mission is instrumental for a better Internet. In a recent Igalia chat about Web Ecosystem Health, participants made the usual observation regarding the important role played by Mozilla on the one hand and its limited development resources and Firefox’s small usage share on the other hand. In this blog post, I’d like to explain an experimental idea we are launching at Igalia to try and make browser development better match the interest of the web developer and user community.

Igalia’s contribution to browser repositories

As mentioned in the past on this blog, Igalia has contributed to different parts of Firefox such as multimedia (e.g. <video> support), layout (e.g. Stylo, WebRender, CSS, MathML), scripts (e.g. BigInt, WebAssembly) or accessibility (e.g. ARIA). But is it enough?

Although commit count is an imperfect metric, it is also one of the easiest to obtain. Let’s take a look at how Igalia’s commits to the repositories of the Chromium (chromium, v8), Mozilla (mozilla-central, servo, servo-web-render) and WebKit projects were distributed last year:

pie chart
Diagram showing the distribution of Igalia's contributions to browser repositories in 2019 (~5200 commits): Chromium (~73%), Mozilla (~4%) and WebKit (~23%).

As you can see, in absolute value Igalia contributed roughly 3/4 to Chromium and 1/4 to WebKit, with a small remaining amount to Mozilla. This is not surprising since Igalia is a consulting company and our work depends on the importance of browsers in the market, where Chromium dominates and WebKit is also quite important for iOS devices and embedded systems.

This suggests a different way to measure our contribution by considering, for each project, the percentage relative to the total amount of commits:

Bar graph
Diagram showing, for each project, the percentage of Igalia's commits in 2019 relative to the total amount of the project. From left to right: Chromium (~3.96%), Mozilla (~0.43%) and WebKit (~10.92%).

In the WebKit project, where ~80% of the contributions were made by Apple, Igalia was second with ~10% of the total. In the Chromium project, the huge Google team made more than 90% of the contributions and many more companies are involved, but Igalia was second with about 4% of the total. In the Mozilla project, Mozilla is also doing ~90% of the contributions, but Igalia only had ~0.5% of the total. Interestingly, the second contributing organization was… the community of unidentified gmail.com addresses! Of course, this shows the importance of volunteers in the Mozilla project, where a great effort is made to encourage participation.

Open Prioritization

From the commit count, it’s clear Igalia is not contributing as much to the Mozilla project as to the Chromium or WebKit projects. But this is expected and just reflects the priorities set by large companies. The solid base of Firefox users as well as the large number of volunteer contributors show that the Mozilla project is nevertheless still attractive for many people. Could we turn this into browser development that is not funded by advertising or selling devices?

Another related question is whether the internet can really be shaped by the global community, as defended by Mozilla’s mission. Is the web doomed to be controlled by big corporations doing technology “evangelism” or lobbying at standardization committees? Are there prioritization issues that can be addressed by moving to a more collective decision process?

At Igalia, we internally try and follow a more democratic organization and, at our level, intend to make the world a better place. Today, we are launching a new Open Prioritization experiment to verify whether crowdfunding could be a way to influence how browser development is prioritized. Below is a short (5 min) introductory video:

I strongly recommend taking a look at the proposed projects and reading the FAQ to understand how this is going to work. But remember this is an experiment, so we are starting with a few ideas that we selected and tasks that are relatively small. We know there are tons of user reports in bug trackers and suggestions of standards, but we are not going to solve everything in one day!

If the process is successful, we can consider generalizing this approach, but we need to test it first, check what works and what doesn’t, consider whether it is worth pursuing, analyze how it can be improved, etc.

Two Crowdfunding Tasks for Firefox

CIELAB color space
Representation of the CIELAB color space (top view) by Holger Everding, under CC-SA 4.0.

As explained in the previous paragraph, we are starting with small tasks. For Firefox, we selected the following ones:

  • CSS lab() colors. This is about giving web developers a way to express colors using the CIELAB color space, which better approximates human perception. My colleague Brian Kardell wrote a blog post with more details. Some investigations have been made by Apple and Google. Let’s see what we can do for Firefox!

  • SVG path d attribute. This is about expressing SVG paths using the corresponding CSS syntax, for example <path style="d: path('M0,0 L10,10,...')">. This will likely involve a refactoring to use the same parser for both SVG and CSS paths. It’s a small feature but part of a more general convergence effort between SVG and CSS that Igalia has been involved in.

Conclusion

Is this crowd-funded experiment going to work? Can this approach solve the prioritization problems or at least help a bit? How can we improve that idea in the future?…

There are many open questions but we will only be able to answer them if we have enough people participating. I’ll personally pledge for the two Firefox projects and I invite you to at least take a look and decide whether there is something there that is interesting for you. Let’s try and see!

July 13, 2020 10:45 AM

Oriol Brufau

Open Prioritization for implementing selector list argument of :not() in Chrome

CSS :not() selector

As you probably know, CSS selectors are patterns that can be used to apply a group of style declarations to multiple elements. For example, if you want to select all elements with the class foo, you can use the selector .foo or [class~=foo]. But sometimes, instead of styling the elements that have a specific characteristic, we want to exclude them and style the other ones.

For example, some developers prefer the border box sizing model rather than the content box one, so they use

* { box-sizing: border-box }

But then, imagine there is an image like

<img src="img.png" width="100" height="50" style="padding: 5px" />

The size attributes will include the padding, so the resulting content area will be 90x40. Not only will the image be downscaled, possibly producing artifacts, but it will also be stretched: the aspect ratio becomes 2.25 instead of the natural 2.

The above shows that excluding images can be a good idea. In CSS2, the solution was simply using a more specific selector to undo the general one:

* { box-sizing: border-box }
img { box-sizing: content-box }

However, in more complex cases this can get tedious, and the reset value may not be that obvious. Therefore, the Selectors Level 3 specification introduced the :not() pseudo-class. It accepts a selector argument, and matches all elements that do not match the argument. The example above can then be

:not(img) { box-sizing: border-box }

There were some limitations though. The selector argument had to be a simple selector, that is, one of:

  • a type selector, e.g. div
  • the universal selector *
  • an attribute selector, e.g. [reversed]
  • a class selector, e.g. .foo
  • an ID selector, e.g. #bar
  • a pseudo-class, e.g. :checked

Additionally, nesting negations like :not(:not(...)) was also invalid.

:not() with selector list argument

Selectors Level 4 has made :not() more flexible: it now accepts any selector list as the argument.

In particular, it allows a compound selector argument, like :not(ul[reversed]), which matches all elements except the ones that both are ul and have the attribute reversed.

Another example of what you can do in level 4 is using a complex selector, that is, a sequence of compound selectors separated by combinators. For instance, :not(div > p) matches all elements except the p elements that have a div parent.

And finally, an example with a selector list could be :not(div, p), matching all elements except the ones that are either a div or a p.

Is that completely new behavior? Well, using De Morgan’s laws, the selectors above can be transformed into others which were valid in level 3:

Level 4 new syntax  |  Level 3 alternative
:not(ul[reversed])  |  :not(ul), :not([reversed])
:not(div > p)       |  :not(p), :not(div) > p, p:root
:not(div, p)        |  :not(div):not(p)

However, the specificities are not completely equivalent. The specificity of a selector is a triple of 3 natural numbers (A,B,C), where in simple cases A is the number of ID selectors in the whole selector, B is the number of attribute or class selectors and pseudo-classes, and C is the number of type selectors and pseudo-elements. The specificity is one of the criteria used to solve conflicts when different CSS rules set the same property on the same element: the winning declaration will be the one with the most specific selector, when comparing specificities in lexicographical order.

The specificity of a :not() pseudo-class is the greatest among the specificities of the complex selectors in the argument, while the specificity of a selector list is the greatest among the complex selectors that match.

For example, for :not(ul[reversed]), the specificity is the same as for ul[reversed], i.e. (0,1,1). However, the specificity of :not(ul), :not([reversed]) can either be:

  • If only :not(ul) matches: (0,0,1).
  • If only :not([reversed]) matches: (0,1,0).
  • If both match, the maximum: (0,1,0).

Additionally, the level 3 alternatives can become cumbersome when used as part of a bigger selector. For instance, :not(.a1.a2) :not(.b1.b2) :not(.c1.c2) would become

:not(.a1) :not(.b1) :not(.c1), :not(.a1) :not(.b1) :not(.c2),
:not(.a1) :not(.b2) :not(.c1), :not(.a1) :not(.b2) :not(.c2),
:not(.a2) :not(.b1) :not(.c1), :not(.a2) :not(.b1) :not(.c2),
:not(.a2) :not(.b2) :not(.c1), :not(.a2) :not(.b2) :not(.c2)

Note the combinatorial explosion! It could be avoided using :is() or :where(), but they are also new level 4 additions.

Moreover, not all :not() selectors have a finite alternative in level 3. Consider :not(div p), that is, all elements which either are not p or don’t have any div ancestor. The problem is that we can’t directly enforce a constraint over all ancestors, instead we need something like

:not(p), p:root,
:root:not(div) > p,
:root:not(div) > :not(div) > p,
:root:not(div) > :not(div) > :not(div) > p,
/* ... */

and if a priori the DOM tree can be arbitrarily deep, we don’t know when to end the selector.

It’s for all these reasons that allowing selector lists in :not() is such a nice addition!

Browser support and Igalia’s Open Prioritization

If the new capabilities of :not() with a selector list sound cool to you, you might want to start using them right now! However, while major browsers have supported the level 3 version for a long time, only WebKit supports the level 4 one; Chrome and Firefox don’t.

But here is where Igalia comes into play! We are happy to announce Open Prioritization, an experiment for crowdfunding web platform features.

Historically, which features are implemented sooner and which ones are delayed has been decided by browser vendors and the big companies that can fund the work. But wouldn’t it be great if web developers could also have a say in this prioritization?

At Igalia we have selected a few tasks that we think the community might be interested in. During a first stage, anybody can pledge their desired amount of money for their preferred choices. The first feature that reaches its goal will be selected for a second stage, in which the actual funding will happen. And once funded, Igalia will do the implementation work. Note: don’t worry if you pledge for an option which is not selected; your money is only deducted when the funding happens. See the FAQ for more details.

One of the features that we offer is precisely implementing :not() with selector lists in Chrome. If you like the idea, I invite you to pledge here!

Open Prioritization by Igalia. An experiment in crowd-funding prioritization.

by Oriol Brufau at July 13, 2020 10:45 AM

Brian Kardell

Open Prioritization and Advocacy

Open Prioritization and Advocacy

I love the Web, and I want it to thrive. It is an incredible open commons that makes so much of modern life possible, and better. If you're reading this, chances are pretty good that the web has even enabled your career. Mine too! But there's also pretty good odds that you think it could be better. I agree! Let me tell you about an experiment at visibly making it better, together.

The web is 30 years old. In that time we've learned a lot about the problem space involved. I write a lot about the challenges we've faced, and am a regular advocate for the idea that we can do better. I believe that when we lean on pragmatic answers and include more people as directly as possible, we all win. The commons wins. That's why I am so excited about our new experiment...

Open Prioritization, by Igalia: An experiment in crowdfunding prioritization

If you're a fan of The ShopTalk show, I first publicly discussed the idea there back in April. But, as you might have noticed, the world has been full of snags in 2020 and things got a little delayed. But finally, here we are!

There will be lots of posts describing what it’s about; it’s a big idea. But, in a very few words: Here are six concrete web features that we can crowdfund together toward advancing actual work. Here is a means of having a tangible voice in how we advance this commons that we all share. Below is a short (5 min) video that describes it and answers questions submitted by some friends we talked to about it early...

More details are available at open-prioritization.igalia.com, and that's where you can pledge. But before you pledge anything, I'd like to offer some thoughts on it.

I think that we'll see that prioritization is hard. I think we'll see that we probably don't all automatically agree on which thing is most worthy of prioritization.

And that's ok... In fact, it's really interesting.

Which of the 6 features best to support...

Everything is best. It depends who you ask. Part of the purpose of this experiment is to show that groups can find new ways to advocate and work together to decide to do concrete things. So let me attempt to do just that and get the ball rolling...

Let me tell you how I prioritized, and why - and advocate for the ones I think are the best choices. You can judge for yourself whether that makes sense and make your own decisions, or share your own advocacy.

First off, note that you have a lot of expressive power here: I pledged something to everything because I really believe in the idea. I think that changing the status quo and advancing any of these in a new way is a positive result. What I did is vary how much I helped each thing along the way to its goal. My minimum was $5, my max was $40, but that could just as well be $1 or $5. If you pledge $1, that is $1 worth of funding of the commons that didn't exist before, and lots of $1's can get it done too.

The implementation projects I pledged less to...
  • Selector list support for `:not()` in Chromium

I pledged $5 (1/9600th of the goal) to supporting selector lists in Chromium. Chrome already spends way more than anyone else, and is by far the most dominant browser. Generally speaking, I am more interested in helping to level things out. Almost as importantly to me: It's only a second implementation - it doesn't achieve universality in the way some others do. Those are more valuable to me. A second implementation is better than just a first, and it's a necessary step along the way, for sure. However, it still leaves me without it for a while. In a very tangible way, the last implementation is worth a lot. Finally, while it offers something toward specificity issues, mostly it's just sugar. Would I like it? Sure, but it feels like there's not a lot of "bang for the buck".

  • CSS `lab()` colors in Firefox

    I gave $5 toward CSS's lab colors in Firefox too (the same degree toward the goal as :not). This was a tough one, I'm not going to lie. It is potentially very valuable, but it is the kind of investment you only fully realize at the end of several stages. This is, in fact, one of the reasons it's been slow to get traction. We have to get this to get colors level 5. We want this in design tools. But the real value is only when we have all of the next steps, even across the platform (like in canvas too). It would also appear that this is not yet a "cross the finish line" investment.

  • CSS d (SVG path) support in Firefox

I also pledged $5 toward supporting the SVG path d attribute in CSS in Firefox (but that is 1/2600th of the goal). It seems less important than others to me on a number of counts. It also doesn't help us cross a finish line: Once this is done, it still isn't in WebKit. That said, it is very good bang for the buck and I think we could get it done.

  • CSS contain in WebKit

I pledged $10 toward supporting containment in WebKit (1/7100th of the goal). Again, this is tough. It is a last mile - only WebKit doesn't support it. However, the price tag is much bigger too, and my big challenge here is that I don't know what the payoff really looks like in practice. It can improve performance, which is great, but for me - it's extra. It can be used in conjunction with ResizeObserver for some pretty good approximations of container queries, which is great too. But it isn't 100% clear that it is a necessary element for container queries. In fact, in our proposal and experiments so far, it isn't necessary for a whole bunch of use cases. But... it's uncertain. If it turns out to be critical, and we haven't invested in it, it's only that much longer until we finally arrive. So... Sigh... That's a tough one.

My top choices: inert and focus-visible

This leads me to my top choices: :focus-visible (I pledged $40 - 1/875th of the goal) and inert ($40, or 1/1200th of the goal) - both in WebKit.

Why these? Well, because I think they are the most valuable in a lot of ways. Of course, I'm biased: I worked on the design and championing of both of these features with my friends Alice Boxhall and Rob Dodson, including writing and popularizing polyfills and iterating on the details with practical feedback and experience under our belts. And now, here we are a few years later and these are last miles - only one browser hasn't implemented them or isn't implementing them yet. We can change that.

I did this work on my own time, as a developer, before I came to work for Igalia, because there were real problems my teams were facing every day. Worse, they were right at the intersection of UX, DX and accessibility, so the pains were very real. So, let me tell you what they are and why I think they're worth giving that last push.

inert

There are a whole bunch of design patterns that require you to manage a number of other properties en masse: make them unclickable, make their text unselectable, hide them from screen readers via ARIA, take them out of sequential focus temporarily. Probably the easiest example of this is when a dialog is open - the rest of the page is 'there' but kind of 'inert'. In fact, inert was added as a term to the HTML specification when <dialog> was spec'ed. But this isn't the only time when you want this behavior and, in fact, it is useful in a number of other use cases - but it's hard enough that even a lot of common UI libraries get it wrong.

inert exposes the ability to say "the stuff inside this element should be all of those things" in a simple way via a reflecting attribute - that is, you can set it in markup as an attribute, or via a DOM property (which updates the attribute).
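
A small sketch of both forms (assuming an element with id "page" and a browser or polyfill that supports inert):

// In markup:  <div id="page" inert> … </div>
// Or via the reflecting DOM property:
const page = document.getElementById('page');
page.inert = true;   // content becomes unclickable, unfocusable and
                     // hidden from assistive technology
// ...show a modal dialog on top...
page.inert = false;  // restore everything when the dialog closes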

You can read more about it, including other use cases and details on the WICG explainer (the related pull request to HTML itself is pending).

:focus-visible

Research shows that way too many people disable the focus indicator, at the cost of accessibility. Well, there's kind of a rational explanation for why this happens so much, and that is that CSS's :focus pseudo-class is just plain broken.

From very early days, browsers have used various heuristics to make the native focus ring valuable and, at the same time, unobtrusive. There are times when an element gets focus where showing the indicator just feels bad, with no particularly good reason for it to be there. The use cases are not all the same: If you click on a password field in a busy environment, for example, it's very important that you understand that your typed characters will land in the obscured field -- so you will always get a focus indicator. If you click on a button, on the other hand, and it gets a focus indicator, that might be disorienting... But if you keyboard-navigate to that button, then you need to see the indicator.

Unfortunately, designs frequently don't "work" with only the native indicator, and the trouble is that traditionally the only tool designers have had to style the focus indicator is :focus, which doesn't care about any of that. The net result is that, unfortunately, attempting to use :focus to make the focus indicator work better for your site results in unfamiliar and disorienting UX for everyone. Given only bad choices, the answer that too many make is the easy one: Simply disable it.

:focus-visible, on the other hand, only matches if the indicator would natively be shown, taking into account all of the browser UX research and preventing this disconnect. It is, in our experience, provably easier to get right, and more successful.
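
A minimal sketch of the pattern (assuming native or polyfilled support for :focus-visible):

/* Style the indicator only when the browser would natively show one. */
button:focus-visible {
  outline: 2px solid royalblue;
}

/* Hide it otherwise (e.g. for mouse clicks) without hurting keyboard users. */
button:focus:not(:focus-visible) {
  outline: none;
}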

What about you?

What about you, will you support something? Will you help advocate for the priority of something specific? I (and we at Igalia at large) would love to hear your thoughts - share them on social media with the hashtag #openprioritization so that we can try to collect them.

July 13, 2020 04:00 AM

July 12, 2020

Manuel Rego

Open Prioritization and CSS Containment

Igalia is a major contributor to all the open source web rendering engines (Blink, Gecko, Servo and WebKit). We have been doing different kind of contributions for years, which has led us to have an important position on the different communities. This allows us to help our customers to solve their problems through upstream contributions that also benefit the whole web community.

Implementing a feature in a rendering engine (or in several) might look very simple at first sight, but contributing it upstream can take a while depending on the standardization status, the related bugs, the browser architecture, and many other factors. You can find examples of things implemented by Igalia in the past on my previous blog posts, and you will realize all the work behind some of those features.

There’s a common thing everywhere: people usually get really angry because that bug they reported years ago is still not fixed in a given browser. That can be for a variety of reasons, and not simply because the developers of that browser are very lazy and not paying attention to that particular bug. In many cases the answer to why it hasn’t been solved yet is pretty simple: priorities. Different companies and individuals contributing to the projects have their own interests and priorities; they prioritize the different issues and tasks and put their focus and effort on the ones that have a higher priority for them. A possible solution, now that major browsers are all open source, would be to look for a consulting company like Igalia that can fix that bug for you; but you as an individual, or even as a company, may not have the budget to make that happen.

What would happen if we allowed several parties to contribute together to the development of some features? That would make it possible for both individuals and organizations that don’t have the power to implement them alone to contribute their piece of the cake in order to add support for those features on the web platform.

Open Prioritization

Igalia is launching Open Prioritization, a crowd-funding campaign for the web platform. We believe this can open the door for many different people and organizations to prioritize the development of some features on the different web engines. Initially we have defined 6 tasks that can be found on the website, together with a FAQ explaining all the details of the campaign. 🚀

Let’s hope we can make this happen. If this is a success and some of these items get funded and implemented, probably there’ll be more in the future, including new things or ideas that you can share with us.

Open Prioritization by Igalia. An experiment in crowd-funding prioritization.

One of the tasks of the Open Prioritization campaign we’re starting this week is about adding CSS Containment support in WebKit, and we have experience working on that in Chromium.

Why CSS Containment in WebKit?

Briefly speaking, CSS Containment is a standard focused on improving the rendering performance of web pages: it allows authors to isolate DOM subtrees from the rest of the document, so any change that happens in the “contained” subtree doesn’t affect anything outside that element.

This is the spec behind the contain property, which can take a few values defining the “type of containment”: layout, paint, size and style. I’m not going to go deeper into this; I’ll refer you to my introductory post or my CSSconf EU talk if you’re interested in getting more details about this specification.
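
For instance, a web author might write something like this (a sketch; the class name is made up):

/* Changes inside each .comment cannot affect layout or painting outside it. */
.comment {
  contain: layout paint;
}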

So why do we think this is important? Currently we have an issue with CSS Containment: it’s supported in Chromium and Firefox (except style containment) but not in WebKit. This might not be a big deal as it’s a performance oriented feature, so if you don’t have support you’ll simply have worse performance and that’s all. But that’s not completely true, as the different types of containment have some restrictions that apply to the contained element (e.g. layout containment makes the element become the containing block of positioned descendants), which might cause interoperability issues if you start to use the contain property in your websites.

The main goal of this task would be to add CSS Containment support in WebKit, at least to the level that it’s spec compliant with the other implementations, and, if time permits, to implement some optimizations based on it. Once we have interoperability you can start using it without any concern in your web pages, as the behavior won’t change between the different browsers and you might get some perf improvements (that will vary depending on each browser implementation).

In addition, this will allow WebKit to implement further optimizations thanks to the information that web authors provide through the contain property. On top of that, this initial support is a requirement in order to implement new features that are based on it, like the new CSS properties content-visibility and contain-intrinsic-size which are part of the Display Locking feature.

If you think this is an important feature for you, please go ahead and do your pledge so it can get prioritized and implemented in WebKit upstream.

Really looking forward to seeing how this Open Prioritization campaign goes in the coming weeks. 🤞

July 12, 2020 10:00 PM

July 10, 2020

Víctor Jáquez

New VA-API H.264 decoder in gst-plugins-bad

Recently, a new H.264 decoder, using VA-API, was merged in gst-plugins-bad.

Why another VA-based H.264 decoder if there is already gstreamer-vaapi?

As usual, an historical perspective may give some clues.

It started when Seungha Yang implemented the GStreamer decoders for Windows using DXVA2 and D3D11 APIs.

Perhaps we need one step back and explain what are stateless decoders.

Video decoders are magic and opaque boxes where we push encoded frames, and later we’ll pop full decoded frames in raw format. This is how OpenMAX and V4L2 decoders work, for example.

Internally we can imagine those magic and opaque boxes have two main operations:

  • Codec state handling
  • Signal processing like Fourier-related transformations (such as DCT), entropy coding, etc. (DSP, in general)

The codec state handling basically extracts, from the stream, each frame’s parameters and its compressed data, so the DSP algorithms can decode the frames. Codec state handling can be done with generic CPUs, while DSP algorithms are massively improved through special-purpose processors.

These video decoders are known as stateful decoders, and usually they are distributed through binary and closed blobs.

Soon, silicon vendors realized they could offload the burden of state handling to third-party user-space libraries, releasing what is known as stateless decoders. With them, your code not only has to push frames into the opaque box, but it now has to handle the codec specifics to provide all the parameters and references for each frame. VAAPI and DXVA2 are examples of those stateless decoders.

Returning to Seungha’s implementation: in order to get their DXVA2/D3D11 decoders, they also needed a state handler library for each codec. And Seungha wrote that library!

Initially they wanted to reuse the state handling in gstreamer-vaapi, which works pretty well, but its internal library, from the GStreamer perspective, is over-engineered: it is impossible to rip out only the state handling without importing all its data types, which is kind of sad.

Later, Nicolas Dufresne realized that this library could be re-used by other GStreamer plugins, because more stateless decoders are now available, particularly V4L2 stateless, in which he is interested. Nicolas moved Seungha’s code into a library in gst-plugins-bad.

Currently, libgstcodecs provides state handling of H.264, H.265, VP8 and VP9.

Let’s return to our original question: Why another VA-based H.264 decoder if there is already one in gstreamer-vaapi?

The quick answer is «to pay my technical debt».

As we already mentioned, gstreamer-vaapi is big and over-engineered, though we have been simplifying the internal libraries; in particular, He Junyan has done a lot of work replacing the internal base class, GstVaapiObject, with GstObject or GstMiniObject. Also, in this kind of project, where there’s a lot of untouched code, a lot of cargo-cult decisions pile up.

So I took the libgstcodecs opportunity to write a simple, thin and lean H.264 decoder, using new VA API calls (vaExportSurfaceHandle(), for example) and learning from other implementations, such as FFMpeg and ChromeOS. This exercise allowed me to identify where the dusty spots in gstreamer-vaapi are and how they should be fixed (and we have been doing it since then!).
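
As a taste of the new API style, here is a rough sketch (not the actual element code; the display and surface handles are assumed to exist already, and error handling is omitted) of exporting a decoded VA surface as DMA-BUF file descriptors:

#include <va/va.h>
#include <va/va_drmcommon.h>

/* Export a decoded surface as DRM PRIME file descriptors. */
VADRMPRIMESurfaceDescriptor desc;
VAStatus status = vaExportSurfaceHandle(display, surface,
    VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME_2,
    VA_EXPORT_SURFACE_READ_ONLY | VA_EXPORT_SURFACE_SEPARATE_LAYERS,
    &desc);
/* desc.objects[i].fd now holds DMA-BUF fds other elements can import. */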

Also, this opportunity led me to learn a bit more about the H.264 specification, since I implemented the reference picture list handling, and I also fixed a small bug in Chromium.

Now, let me be crystal clear: GStreamer VA-API is not going anywhere. It is, right now, one of the most feature-complete implementations using VA-API, even with its integration issues, and we are working on them; particularly, Intel folks are working hard on a new AV1 decoder, enhancing encoders and adding new video post-processing features.

But this new vah264dec is an experimental VA-API decoder, which aims for tight integration with GStreamer, oriented to provide a good experience in most of the common use cases and to enhance the common libgstcodecs library shared with other stateless decoders, while avoiding Intel-specific nuances.

These are the main characteristics and plans of this new decoder:

  • It uses, by default, a DRM connection to the VA display, avoiding the trouble of choosing X11 or Wayland.
    • It uses the first DRM device found as the VA display
    • In the future, users will be able to provide their custom VA display through the pipeline’s context.
  • It requires libva >= 1.6
  • No multiview/stereo profiles, nor interlaced streams, because libgstcodecs doesn’t handle them yet
  • It is incompatible with gstreamer-vaapi: mixing elements might lead to problems.
  • Even if memory:VAMemory is exposed, it is not handled by any other element yet.
    • Users will get VASurfaces via mapping, as GstGL does with textures.
  • Caps templates are dynamically generated by querying VA-API
  • YV12 and I420 are added for system memory caps because they seem to be supported by all the drivers when downloading frames onto main memory, and they are used by xvimagesink and others, avoiding color conversion.
  • Decoding surfaces aren’t bound to the context, so they can grow beyond the DPB size, allowing smooth reverse playback.
  • There isn’t any error handling and recovery yet.
  • The element is supposed to spawn multiple instances if different renderD nodes with VA-API driver support are found (as gstv4l2 does), but this hasn’t been tested yet.

Now you may be asking: how do I use vah264dec?

Currently vah264dec has rank NONE, which means that it will never be autoplugged, but you can use the trick of the GST_PLUGIN_FEATURE_RANK environment variable (a rank above GST_RANK_PRIMARY, which is 256, makes the element win autoplugging):

$ GST_PLUGIN_FEATURE_RANK=vah264dec:259 gst-play-1.0 ~/video.mp4

And that’s it!

Thanks.

by vjaquez at July 10, 2020 06:15 PM

July 08, 2020

Ricardo García

VK_EXT_extended_dynamic_state released for Vulkan

A few days ago, the VK_EXT_extended_dynamic_state extension for Vulkan was released and included for the first time as part of Vulkan 1.2.145. This is a pretty interesting extension that makes Vulkan pipelines more flexible and practical for many use cases. At Igalia, I was involved in getting this extension out the door as the author of its VK-GL-CTS tests and, in a very minor capacity, by reviewing the spec text and contributing a couple of small fixes to it.

Vulkan pipelines

The purpose of this Vulkan extension is to make Vulkan pipelines less rigid by allowing them to have certain values set dynamically when you use the pipeline instead of those values being set in stone when creating the pipeline. For those less familiar with Vulkan, Vulkan pipelines are one of the most “heavy” objects in the API. Vulkan typically has compute and graphics pipelines. For this extension, we’ll be talking about graphics pipelines. A pipeline object, when created, contains a lot of information about what the GPU needs to do when rendering a scene or part of a scene, like how triangle vertices need to be read from memory, the number of textures, buffers and images that will be used, parameters for color blending operations, depth and stencil tests, multisample antialiasing, viewports, etc.

Vulkan, being a low-overhead API that tries to help you squeeze as much performance as possible out of a GPU, wants you to specify all that information in advance so implementations (GPU plus driver) have higher chances of optimizing the process, both at pipeline creation time and at runtime. Every time you “bind a pipeline” (i.e. setting it as the active pipeline for future commands) you’re telling the implementation how everything should work, which is usually followed by commands telling the GPU to draw lots of geometry using the previous parameters.

Creating a pipeline may also involve compiling shaders to native GPU instructions. Shaders are “small” programs that run on the GPU when the rendering process reaches a programmable stage. When a GPU is drawing anything, the drawing process is divided in stages. Each stage takes a number of inputs both directly from the previous stage and as external resources (buffers, textures, etc), and produces a number of outputs to be directly consumed by the next stage or as side effects in external resources. Some of those stages are fixed and some are programmable with user-provided shader programs. When these shaders are not so small, compiling and optimizing them to native GPU instructions takes some time. Usually not a very long time, but every millisecond counts when you only have 16 of them to draw the next frame in order to achieve 60 frames per second. Stuff like this is what drove the creation of the ACO shader compiler for the Mesa RADV driver and it’s also why some drivers hash shader contents and use a shader cache to check if that exact shader has been compiled before. It’s also why Vulkan wants you to create pipelines in advance if possible. Otherwise, if you realize you need a new pipeline in the middle of preparing the next frame in an action game, the pipeline creation process may make the game stutter at that point due to the extra processing time needed.

Vulkan gives you several possibilities to alleviate the problem. You can create every pipeline you may need in advance. This is one of the most effective approaches but may involve a good number of pipelines due to the different possible combinations of pipeline parameters you may want to use. Say you want to vary 7 different parameters independently from each other with two possible values each. That means you have to create 128 different pipelines and manage them in your application. Another option is using a pipeline cache that will speed up creation of pipelines identical or similar to other ones created in the past. This lets you focus only on the pipeline variants you need at a given point in time. Finally, Vulkan gives you the possibility of changing a few pipeline parameters on the fly instead of giving them fixed values at pipeline creation time. This is the dynamic state inside the pipeline.

Dynamic state and VK_EXT_extended_dynamic_state

Dynamic state helps in addition to anything I mentioned before. It makes your application logic easier by not having to deal with so many different variations and reduces the total number of times you may have to create a new pipeline, which may decrease initialization time, pipeline cache sizes and access, state changes and game stuttering. VK_EXT_extended_dynamic_state, when available and as its name implies, extends the number of pipeline elements that can be part of that dynamic state. It adds states like the culling mode, front face, primitive topology, viewport with count, scissor with count (previously, viewports and scissors could be changed dynamically but not their counts), vertex input binding stride, depth test activation and writes, depth comparison operation, depth bounds activation and stencil test activation and operations. That’s a pretty large set of new dynamic elements.
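
As a sketch of how this looks in practice (assuming VK_EXT_extended_dynamic_state is enabled on the device and its entry points have been loaded; error handling omitted):

#include <vulkan/vulkan.h>

/* At pipeline creation: declare which states will be set dynamically. */
static const VkDynamicState dynamic_states[] = {
    VK_DYNAMIC_STATE_CULL_MODE_EXT,
    VK_DYNAMIC_STATE_FRONT_FACE_EXT,
};
static const VkPipelineDynamicStateCreateInfo dynamic_info = {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,
    .dynamicStateCount = 2,
    .pDynamicStates = dynamic_states,
};
/* ...chained as pDynamicState in VkGraphicsPipelineCreateInfo... */

/* At command recording: flip the winding order without a second pipeline,
 * e.g. when rendering a reflection. */
void draw_reflection(VkCommandBuffer cmd)
{
    vkCmdSetCullModeEXT(cmd, VK_CULL_MODE_BACK_BIT);
    vkCmdSetFrontFaceEXT(cmd, VK_FRONT_FACE_CLOCKWISE);
    /* ...vkCmdDraw* calls follow... */
}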

The obvious question that follows is if using so many dynamic elements decreases performance, in the sense that it may reduce the optimization opportunities the implementation may have because some details about the pipeline are not known in advance. The answer is that this really depends on the implementation. For example, in some implementations the culling mode or front face may be set in a register before drawing operations and there’s no practical difference between setting it when the pipeline is bound to be used or dynamically before a large set of drawing commands are used.

I’ve measured the impact of enabling every new dynamic state in a simple GPU-bound Vulkan program that displays a rotating model on screen and I haven’t noticed any performance impact with the NVIDIA proprietary driver and a GTX 1070 card, but your mileage may vary. As usual, measure before deploying.

VK_EXT_extended_dynamic_state can also help when Vulkan is used as the backend to implement other higher level APIs which are not as rigid as Vulkan itself and in which some drawing parameters can be changed on the fly, being up to the driver to implement those changes as efficiently as possible. We’re talking about OpenGL, or DirectX up to version 11. As you can imagine, it’s an interesting extension for projects like DXVK and it can help improve the state of Linux gaming through Wine and Proton.

Origins of VK_EXT_extended_dynamic_state

The story about how this extension came to be is also interesting. It all started as a reaction to an “angry” tweet by Eric Lengyel, in which he lamented that he had to create two separate pipelines just to change the front face or winding order of triangles when rendering a reflection. That prompted Piers Daniell from NVIDIA to start a multivendor effort inside Khronos that resulted in VK_EXT_extended_dynamic_state. As you can read in the extension summary, several companies were involved: AMD, Arm, Broadcom, Google, Imagination, Intel, NVIDIA, and Valve.

For that reason, this extension is also one of the many success stories from the Khronos Group, a forum in which hardware and software vendors, big and small, participate in designing and standardizing cross-platform solutions for the graphics industry. Many different points of view are taken into account when designing those solutions. If you look at the member list you’ll see plenty of known logos from hardware manufacturers and software developers, including companies making widely available game engines.

In this case an angry tweet was enough to spark an effort, but that’s not the ideal situation. You can propose specification improvements, extensions or new ideas using the Vulkan Docs repository. An issue could be enough and, for small changes, a pull request can be even better.

July 08, 2020 09:02 PM

Mario Sanchez Prada

​Chromium now migrated to the new C++ Mojo types

At the end of the last year I wrote a long blog post summarizing the main work I was involved with as part of Igalia’s Chromium team. In it I mentioned that a big chunk of my time was spent working on the migration to the new C++ Mojo types across the entire codebase of Chromium, in the context of the Onion Soup 2.0 project.

For those of you who don’t know what Mojo is about, there is extensive information about it in Chromium’s documentation, but for the sake of this post, let’s simplify things and say that Mojo is a modern replacement to Chromium’s legacy IPC APIs which enables a better, simpler and more direct way of communication among all of Chromium’s different processes.

One interesting thing about this conversion is that, even though Mojo was already “the new thing” compared to Chromium’s legacy IPC APIs, the original Mojo API presented a few problems that could only be fixed with a newer API. This is the main reason that motivated this migration, since the new Mojo API fixed those issues by providing less confusing and less error-prone types, as well as additional checks that force your code to be safer than before, all done in a binary compatible way. Please check out the Mojo Bindings Conversion Cheatsheet for more details on what exactly those conversions involve.
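
To give a flavor of what these conversions look like (a sketch only: mojom::Frobinator is a made-up interface here, and the cheatsheet covers the real rules):

#include "mojo/public/cpp/bindings/receiver.h"
#include "mojo/public/cpp/bindings/remote.h"

// Calling side.
// Old: mojom::FrobinatorPtr frobinator_;
mojo::Remote<mojom::Frobinator> frobinator_;

// Implementation side, as a member of a class implementing mojom::Frobinator.
// Old: mojo::Binding<mojom::Frobinator> binding_{this};
mojo::Receiver<mojom::Frobinator> receiver_{this};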

Another interesting aspect of this conversion is that, unfortunately, it wouldn’t be as easy as running a “search & replace” operation, since in most cases deeper changes were needed to make sure that the migration wouldn’t break existing tests or production code. This is the reason why we often had to write bigger refactorings than one would have anticipated for some of those migrations, or why some patches took a bit longer to land, as they would span too many directories, making the merging process extra challenging.

Now combine all this with the fact that we were confronted with about 5000 instances of the old types in the Chromium codebase when we started, spanning nearly every single subdirectory of the project, and you’ll probably understand why this was a massive feat that would take quite some time to tackle.

Turns out, though, that after just 6 months since we started working on this and more than 1100 patches landed upstream, our team managed to have nearly all the existing uses of the old APIs migrated to the new ones, reaching to a point where, by the end of December 2019, we had completed 99.21% of the entire migration! That is, we basically had almost everything migrated back then and the only part we were missing was the migration of //components/arc, as I already announced in this blog back in December and in the chromium-mojo mailing list.


Progress of migrations to the new Mojo syntax by December 2019

This was good news indeed. But the fact that we didn’t manage to reach 100% was still a bit of a pain point because, as Kentaro Hara mentioned in the chromium-mojo mailing list yesterday, “finishing 100% is very important because refactoring projects that started but didn’t finish leave a lot of tech debt in the code base”. And surely we didn’t want to leave the project unfinished, so we kept collaborating with the Chromium community in order to finish the job.

The main problem with //components/arc was that, as explained in the bug where we tracked that particular subtask, we couldn’t migrate it yet because the external libchrome repository was still relying on the old types! Thus, even though almost nothing else in Chromium was using them at that point, migrating those .mojom files under //components/arc to the new types would basically break libchrome, which wouldn’t have a recent enough version of Mojo to understand them (and no, according to the people collaborating with us on this effort at that particular moment, getting Mojo updated to a new version in libchrome was not really a possibility).

So, in order to fix this situation, we collaborated closely with the people maintaining the libchrome repository (external to Chromium’s repository and still relying on the old Mojo types) to get the remaining migration, inside //components/arc, unblocked. And after a few months doing some small changes here and there to provide the libchrome folks with the tools they’d need to allow them to proceed with the migration, they could finally integrate the necessary changes that would ultimately allow us to complete the task.

Once this important piece of the puzzle was in place, all that was left was for my colleague Abhijeet to land the CL that would migrate most of //components/arc to the new types (a CL which had been put on hold for about 6 months!), and then to land a few CLs more on top to make sure we did get rid of any trace of old types that might still be in codebase (special kudos to my colleague Gyuyoung, who wrote most of those final CLs).


Progress of migrations to the new Mojo syntax by July 2020

After all this effort, which would sit on top of all the amazing work that my team had already done in the second half of 2019, we finally reached the point where we are today, when we can proudly and loudly announce that the migration of the old C++ Mojo types to the new ones is finally complete! Please feel free to check out the details on the spreadsheet tracking this effort.

So please join me in celebrating this important milestone for the Chromium project and enjoy the new codebase free of the old Mojo types. It’s been difficult but it definitely pays off to see it completed, something which wouldn’t have been possible without all the people who contributed along the way with comments, patches, reviews and any other type of feedback. Thank you all! 👌 🍻

Last, while the main topic of this post is to celebrate the unblocking of these last migrations we had left since December 2019, I’d like to finish by acknowledging the work of all my colleagues from Igalia who worked along with me on this task since we started, one year ago. That is, Abhijeet, Antonio, Gyuyoung, Henrique, Julie and Shin.

Now if you’ll excuse me, we need to get back to working on the Onion Soup 2.0 project because we’re not done yet: at the moment we’re mostly focused on converting remote calls using Chromium’s legacy IPC to Mojo (see the status report by Dave Tapuska) and helping finish Onion Soup’ing the remaining directories under //content/renderer (see the status report by Kentaro Hara), so there’s no time to waste. But those migrations will be material for another post, of course.

by mario at July 08, 2020 08:55 AM

Philip Chimento


It’s been a long time since I last blogged. In the interim I started a new job at Igalia as a JavaScript Engine Developer on the compilers team, and attended FOSDEM in Brussels several million years ago in early February back when “getting on a plane and traveling to a different country” was still a reasonable thing to do.

In this blog post I would like to present Temporal, a proposal to add modern and comprehensive handling of dates and times to the JavaScript language. This has been the project I’m working on at Igalia, as sponsored by Bloomberg. I’ve been working on it for the last 6 months, joining several of my coworkers in a cross-company group of talented people who have already been working on it for several years.

This is the kind of timekeeping you get with the old JavaScript Date… (Public domain photograph by Julo)

I already collaborated on a blog post about Temporal, “Dates and Times in JavaScript”, so I won’t repeat all that here, but all the explanation you really need is that Temporal is a modern replacement for the Date object in JavaScript, which is terrible. You may also want to read “Fixing JavaScript Date”, a two-part series providing further background, by Maggie Pint, one of the originators of Temporal.

How Temporal can be useful in GNOME

I’m aware that this blog is mostly read by the GNOME community. That’s why in this blog post I want to talk especially about how a large piece of desktop software like GNOME is affected by JavaScript Date being so terrible.

Of course most improvements to the JavaScript language are driven by the needs of the web.[1] But a few months ago this merge request caught my eye, fixing a bug that made the date displayed in GNOME wrong by a full 1,900 years! The difference between Date.getYear() not doing what you expect (and Date.getFullYear() doing it instead) is one of the really awful parts of JavaScript Date. In this case, if there had been a better API without evil traps, the mistake might not have been made in the first place, and it wouldn’t have come down to a last-minute code freeze break.
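To make the trap concrete, here is a minimal snippet you can paste into any JS console (the date is just an arbitrary example):

const d = new Date(2020, 6, 8); // July 8, 2020 — months count from 0!
d.getYear();     // 120 — years since 1900, a relic inherited from java.util.Date
d.getFullYear(); // 2020 — what you almost certainly wanted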

In the group working on the Temporal proposal we are seeking feedback from people who are willing to try out the Temporal API, so that we can find out if there are any parts that don’t meet people’s needs and change them before we try to move the proposal to Stage 3 of the TC39 process. Since I think GNOME Shell and GNOME Weather, and possibly other apps, might benefit from using this API when it becomes part of JavaScript in the future, I’d be interested in finding out what we in the GNOME community need from the Temporal API.

It seems to me the best way to do this would be to make a port of GNOME Shell and/or GNOME Weather to the experimental Temporal API, and see what issues come up. Unfortunately, it would defeat the purpose for me to do this myself, since I am already overly familiar with Temporal and by now its shortcomings are squarely in my blind spot! So instead I’ll offer my help and guidance to anyone who wants to try this out. Please get in touch with me if you are interested.

How to try it out

Since Temporal is of course not yet a built-in object in JavaScript, to try it out we will need to import a polyfill. We have published a polyfill which is experimental only, for the purpose of trying out the API and integrating it with existing code. Here’s a link to the API documentation.

The polyfill is primarily published as an NPM library, but we can get it to work with GJS quite easily. Here’s how I did it.

First I cloned the tc39/proposal-temporal repo, and ran npm install and npm run build in it. This generates a file called polyfill/script.js which you can copy into your code, into a place in your imports path so that the importer can find it. Then you can import Temporal:

const {Temporal} = imports.temporal.temporal;
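As a quick smoke test (a sketch only; these method names match the polyfill at the time of writing and may well change):

// Today's date in the ISO calendar, from the experimental polyfill
const today = Temporal.now.date();
log(`Today is ${today.toString()}`);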

Note that the API is not stable, so only use this to try out the API and give feedback! Don’t actually include it in your code. We have every intention of changing the API, maybe even drastically, based on feedback that we receive.

Once you have tried it out, the easiest way to tell us about your findings is to complete the survey, but do also open an issue in the bug tracker if you have something specific.

Intl, or how to stop doing _("%B %-d %Y")

While I was browsing through GNOME Shell bug reports to find ones related to JavaScript Date, I found several such as gnome-shell#2293 where the translated format strings lag behind the release while translators figure out how to translate cryptic strings such as "%B %-d %Y" for their locales. By doing our own translations, we are actually creating the conditions to receive these kinds of bug reports in the first place. Translations for these kinds of formats that respect the formatting rules for each locale are already built into JavaScript engines nowadays, in the Intl API via libicu, and we could take advantage of these translations to take some pressure off of our translators.

In fact, we could do this right now already, no need to wait for the Temporal proposal to be adopted into JavaScript and subsequently make it into GJS. We already have everything we need in GNOME 3.36. With Intl, the function I linked above would become:

_updateTitle() {
    const locale = getCachedLocale();
    const timeSpanDay = GLib.TIME_SPAN_DAY / 1000;
    const now = new Date();
    const rtf = new Intl.RelativeTimeFormat(locale, {numeric: 'auto'});

    if (this._startDate <= now && now <= this._endDate)
        this._title.text = rtf.format(0, 'day');
    else if (this._endDate < now && now - this._endDate < timeSpanDay)
        this._title.text = rtf.format(-1, 'day');
    else if (this._startDate > now && this._startDate - now < timeSpanDay)
        this._title.text = rtf.format(1, 'day');
    else if (this._startDate.getFullYear() === now.getFullYear())
        this._title.text = this._startDate.toLocaleString(locale, {month: 'long', day: 'numeric'});
    else
        this._title.text = this._startDate.toLocaleString(locale, {year: 'numeric', month: 'long', day: 'numeric'});
}

(Note, this presumes a function getCachedLocale() which determines the correct locale for Intl by looking at the LC_TIME, LC_ALL, etc. environment variables. If GNOME apps wanted to move to Intl generally, I think it might be worth adding such a function to GJS’s Gettext module.)
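For experimentation purposes, here is a rough sketch of what such a hypothetical getCachedLocale() could look like; the environment variable precedence below is my simplification, not an exact implementation of the POSIX rules:

const GLib = imports.gi.GLib;

let _cachedLocale = null;
function getCachedLocale() {
    if (_cachedLocale)
        return _cachedLocale;
    // Simplified precedence: LC_ALL beats LC_TIME, which beats LANG
    const env = GLib.getenv('LC_ALL') || GLib.getenv('LC_TIME') ||
        GLib.getenv('LANG') || 'en-US';
    // Turn "en_US.UTF-8" into a BCP 47 language tag such as "en-US"
    _cachedLocale = env.split('.')[0].replace('_', '-');
    return _cachedLocale;
}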

Whereas in the future with Temporal, it would be even simpler and clearer, and I couldn’t resist rewriting that method! We wouldn’t need to store a start Date at 00:00 and end Date at 23:59.999 which is really just a workaround for the fact that we are talking here about a date without a time component, that is purely a calendar day. Temporal covers this use case out of the box:

_updateTitle() {
    const locale = getCachedLocale();
    const today = Temporal.now.date();

    let {days} = today.difference(this._date);
    if (days <= 1) {
        const rtf = new Intl.RelativeTimeFormat(locale, {numeric: 'auto'});
        // Note: if this negation seems a bit unwieldy, be aware that we are
        // considering revising the API to allow negative-valued durations
        days = Temporal.Date.compare(today, this._date) < 0 ? days : -days;
        this._title.text = rtf.format(days, 'day');
    } else {
        const options = {month: 'long', day: 'numeric'};
        if (today.year !== this._date.year)
            options.year = 'numeric';

        this._title.text = this._date.toLocaleString(locale, options);
    }
}

Calendar systems

One exciting thing about Temporal is that it will support non-Gregorian calendars. If you are a GNOME user or developer who uses a non-Gregorian calendar, or develops code for users who do, then please get in touch with me! In the group of people developing Temporal everyone uses the Gregorian calendar, so we have a knowledge gap about what users of other calendars need. We’d like to try to close this gap by talking to people.

A Final Note

In the past months I’ve not been much in the mood to write blog posts. My mind has been occupied worrying about the health of my family, friends, and myself; feeling fury and shame at the inequalities of our society that, frankly, the pandemic has made harder to fool ourselves into forgetting if it doesn’t affect us directly; and fury at our governments that perpetuate these problems and resist meaningful attempts at reform.

With all that’s going on in the world, blogging about technical achievements feels a bit ridiculous and inconsequential, but, well, I’m writing this, and you’re reading this, and here we are. So keep in mind there are other important things too. Be safe, be kind, but don’t forget to stay furious after the dust settles.


[1] One motivation for why some are eagerly awaiting Temporal as part of the JavaScript language, as opposed to a library, is that it would be built-in to the browser. The most popular library for fixing the deficiencies of Date, moment.js, can mean an extra download of 20–100 kb, depending on whether you include all locales and support for time zones. This adds up to quite a lot of wasted data if you are downloading this on a large number of the websites you visit, but this specifically doesn’t affect GNOME. ↩

by Philip Chimento at July 08, 2020 12:43 AM

July 06, 2020

Asumu Takikawa

Shipping WebAssembly's BigInt/I64 conversion in Firefox

Hello folks. Today I’m excited to share with you about some work I’ve been hacking on in Firefox’s WebAssembly (AKA Wasm) engine recently.

The tl;dr summary: starting in Firefox 78 (released June 30, 2020), you will be able to write WebAssembly functions that pass 64-bit integers to JavaScript (where they will turn into BigInts) and vice versa.

(see Lin Clark’s excellent Cartoon Intro to WebAssembly if you’re not familiar with WebAssembly)

Wasm comes with a JavaScript API that allows a Wasm program to interact with JavaScript, by exchanging values through mechanisms such as function calls between the languages, globals, or shared linear memory. The initial MVP release of Wasm came with four built-in types: i32, i64, f32, and f64 (32-bit and 64-bit integers and 32-bit and 64-bit floats, respectively).

All but one of these types could be used to talk with JavaScript programs from the start by converting to a Number appropriately. But i64s were disallowed.

Concretely, a simple example program (with a single function that returns an i64) like this one:

(module
  (func (export "f") (result i64)
    (i64.const 42)))

would produce an error like this when you would try to run it from JS:

> WebAssembly.instantiateStreaming(fetch('../out/main.wasm'))
    .then(obj => console.log(obj.instance.exports.f()))
    .catch(console.error);

[error]: TypeError: cannot pass i64 to or from JS

(you can try this out in WebAssembly Studio: select a wat project and put the above code in the .wat and .js files respectively)

In Firefox 78, you won’t get an error. Instead, the JS caller will receive a BigInt value 42n and can continue its computation.

One of the practical ramifications of this limitation in previous releases was that tools that produce Wasm code would have to legalize it by transforming the code to accept or produce multiple 32-bit integers instead. For example, a function with a signature (param i64) would be translated to one with (param i32 i32). This can increase compiled code size and complicate the JS logic in your program.
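To illustrate what legalization means on the JS side, here is a sketch (hypothetical helper, expressed with BigInt for clarity) of how a legalized i64 relates to its two 32-bit halves:

// Reassemble an i64 that was "legalized" into (lo, hi) i32 halves.
// The low half must be treated as unsigned; the high half carries the sign.
function reassembleI64(lo, hi) {
    return (BigInt(hi) << 32n) | BigInt(lo >>> 0);
}

reassembleI64(42, 0);  // 42n
reassembleI64(-1, -1); // -1n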

The reason that the limitation existed was that until relatively recently, there was no good way to represent a 64-bit integer value in JS; the Number type in JS doesn’t allow you to represent the full range of a 64-bit integer.
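To see the problem concretely:

// Number is an IEEE 754 double: integers above 2^53 - 1 lose precision…
console.log(9007199254740993 === 9007199254740992); // true (!)
// …while BigInt keeps them distinct
console.log(9007199254740993n === 9007199254740992n); // false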

That’s all changed since BigInt (arbitrary-length integers) became part of the JS standard and started shipping in browsers. In Firefox, it’s been available since 2019 after my colleagues at Igalia shipped it (see Andy Wingo’s blog post about that).

With the now implemented “JavaScript BigInt to WebAssembly i64 integration” proposal, i64 values from Wasm get converted to BigInts on their way out to JS, as you saw in the earlier example. On the other hand, JS code can send a BigInt (e.g., as an argument to a Wasm function) to Wasm and it will get converted to an i64. In case a BigInt’s value exceeds the i64 range, it is wrapped into the range with a modulo operation.
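This wrapping is the same one that JS itself exposes through BigInt.asIntN(), so you can reproduce the conversion from the JS side:

// Values outside the signed 64-bit range wrap modulo 2^64
const tooBig = 2n ** 64n + 5n;
console.log(BigInt.asIntN(64, tooBig)); // 5n
console.log(BigInt.asIntN(64, -1n));    // -1n (already in range)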

Support for this proposal has also landed in tools like Emscripten (you have to pass a WASM_BIGINT flag to use it).

It was possible to fill in this gap in the JS API thanks to a long line of work on BigInt in JS standards and engines. Some of my colleagues at Igalia such as Dan Ehrenberg and Caio Lima have been pushing forward a lot of the work on BigInt, which we talked about a little bit in a recent blog post.

Much of the BigInt/i64 interoperation work in Firefox was originally done by Sven Sauleau, who championed the spec proposal along with Dan Ehrenberg. Ms2ger from Igalia also worked on the spec text and tests. Many thanks to the engineers at Mozilla (Lars Hansen, Benjamin Bouvier, and André Bargull) who helped review the work and provided bug fixes.

Finally, our work on WebAssembly at Igalia is sponsored by Tech at Bloomberg. Thanks to them for making this work on the web platform possible! They have been a great partner in Igalia’s mission to help build an expressive and open web platform for everyone.

by Asumu Takikawa at July 06, 2020 07:00 PM

July 05, 2020

Frédéric Wang

Contributions to Web Platform Interoperability (First Half of 2020)

Note: This blog post was co-authored by AMP and Igalia teams.

Web developers continue to face challenges with web interoperability issues and a lack of implementation of important features. As an open-source project, the AMP Project can help represent developers and aid in addressing these challenges. In the last few years, we have partnered with Igalia to collaborate on helping advance predictability and interoperability among browsers. Reaching the standards agreement and degree of interoperability that we want can be a long process. New features frequently require experimentation to get things rolling and course corrections along the way; then, as more implementations and users begin exploring the space, doing really interesting things and finding issues at the edges, we continue to advance interoperability.

Both AMP and Igalia are very pleased to have been able to play important roles at all stages of this process and help drive things forward. During the first half of this year, here’s what we’ve been up to…

Default Aspect Ratio of Images

In our previous blog post we mentioned our experiment to implement the intrinsic size attribute in WebKit. Although this was a useful prototype for standardization discussions, in the end there was a consensus to switch to an alternative approach. This new approach addresses the same use case without the need for a new attribute. The idea is pretty simple: use the specified width and height attributes of an image to determine its default aspect ratio. If additional CSS is used, e.g. “width: 100%; height: auto;”, browsers can then compute the final size of the image without waiting for it to be downloaded. This avoids any relayout that could cause a bad user experience. This was implemented in Firefox and Chromium, and we did the same in WebKit. We implemented this under a flag which is currently on by default in Safari Tech Preview and the latest iOS 14 beta.

Scrolling

We continued our efforts to enhance scroll features. In WebKit, we began with scroll-behavior, which provides the ability to do smooth scrolling. Based on our previous patch, it has landed and is guarded by an experimental flag “CSSOM View Smooth Scrolling” which is disabled by default. Smooth scrolling currently has a generic platform-independent implementation controlled by a timer in the web process, and we continue working on a more efficient alternative relying on the native iOS UI interfaces to perform scrolling.

We have also started to work on overscroll and overscroll customization, especially for the scrollend event. The scrollend event, as you might expect, is fired when the scroll is finished, but it lacked interoperability and required some additional tests. We added web platform tests for programmatic scroll and user scroll including scrollbar, dragging selection and keyboard scrolling. With these in place, we are now working on a patch in WebKit which supports scrollend for programmatic scroll and Mac user scroll.

On the Chrome side, we continue working on the standard scroll values in non-default writing modes. This is an interesting set of challenges surrounding the scroll API and how it works with writing modes which was previously not entirely interoperable or well defined. Gaining interoperability requires changes, and we have to be sure that those changes are safe. Our current changes are implemented and guarded by a runtime flag “CSSOM View Scroll Coordinates”. With the help of Google engineers, we are trying to collect user data to decide whether it is safe to enable it by default.

Another minor interoperability fix that we were involved in was to ensure that the scrolling attribute of frames recognizes values “noscroll” or “off”. That was already the case in Firefox and this is now the case in Chromium and WebKit too.

Intersection and Resize Observers

As mentioned in our previous blog post, we drove the implementation of IntersectionObserver (enabled in iOS 12.2) and ResizeObserver (enabled in iOS 14 beta) in WebKit. We have made a few enhancements to these useful developer APIs this year.

Users reported difficulties observing the root of an inner iframe, and the specification was modified to accept an explicit document as a root parameter. This was implemented in Chromium and we implemented the same change in WebKit and Firefox. It is currently available in Safari Tech Preview, iOS 14 beta and Firefox 75.

A bug was also reported with ResizeObserver incorrectly computing size for non-default zoom levels, which was in particular causing a bug on Twitter feeds. We landed a patch last April and the fix is available in the latest Safari Tech Preview and iOS 14 beta.

Resource Loading

Another thing that we have been concerned with is how we can give more control and power to authors to more effectively tell the browser how to manage the loading of resources and improve performance.

The work that we started in 2019 on lazy loading has matured a lot along with the specification.

The lazy image loading implementation in WebKit therefore passes the related WPT tests and is functional and comparable to the Firefox and Chrome implementations. However, as you might expect, as we compare uses and implementation notes it becomes apparent that determining the moment when the lazy image load should start is not defined well enough. Before this can be enabled in releases some more work has to be done on improving that. The related frame lazy loading work has not started yet since the specification is not in place.

We also added an implementation for stale-while-revalidate. The stale-while-revalidate Cache-Control directive allows a grace period in which the browser is permitted to serve a stale asset while the browser is checking for a newer version. This is useful for non-critical resources where some degree of staleness is acceptable, like fonts. The feature has been enabled recently in WebKit trunk, but it is still disabled in the latest iOS 14 beta.

Contributions were made to improve prefetching in WebKit taking into account its cache partitioning mechanism. Before this work can be enabled some more patches have to be landed and possibly specified (for example, prenavigate) in more detail. Finally, various general Fetch improvements have been done, improving the fetch WPT score. Examples are:

What’s next

There is still a lot to do in scrolling and resource loading improvements and we will continue to focus on the features mentioned such as scrollend event, overscroll behavior and scroll behavior, lazy loading, stale-while-revalidate and prefetching.

As a continuation of the work done for aspect ratio calculation of images, we will consider the more general CSS aspect-ratio property. Performance metrics such as the ones provided by the Web Vitals project are also critical for web developers to ensure that their websites provide a good user experience, and we are willing to investigate support for these in Safari.

We love doing this work to improve the platform and we’re happy to be able to collaborate in ways that contribute to bettering the web commons for all of us.

July 05, 2020 10:00 PM

July 03, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 1: Introduction

It’s been a while that Igalia’s graphics team had been working on the OpenGL extensions that provide the mechanisms for OpenGL and Vulkan interoperability in the Intel iris (gallium3d) driver that is part of mesa. As there were no conformance tests (CTS) for this extension, and we needed to test it, we have written (and … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 1: Introduction

by hikiko at July 03, 2020 11:39 AM

Martin Robinson

CSS Painting Order

How does a browser determine what order to paint content in? A first guess might be that browsers will paint content in the order that it is specified in the DOM, which for an HTML page is the order it appears in the source code of the page.

We can construct a simple example showing that two divs overlap in this order. We overlap two divs by giving one of them a negative top margin.

<style>
    .box {
        width: 8ex;
        height: 8ex;
        padding: 0.2ex;
        color: white;
        font-weight: bold;
        text-align: right;
    }

    .blue { background: #99DDFF; }

    /* The second div has a negative top margin so that it overlaps
       with the first (blue) div. Also, shift it over to the right
       slightly. */
    .green {
        background: #44BB99;
        margin-left: 3ex;
        margin-top: -6ex;
     }
</style>

<div class="blue box">1</div>
<div class="green box">2</div>

It seems that our guess was a pretty good guess! I’m sure some of you are saying, “Hold on! What about z-index?” You’re right. Using the z-index property, we can override the normal painting order used by the browser. We give the green div a z-index and make it relatively positioned, because z-index only works on positioned elements. We also add a yellow child of the green div to see how this affects children. Finally, let’s start labeling each div with its z-index.

<style>
    .yellow {
        margin-left: 3ex;
        background: #EEDD88;
    }
</style>

<div class="blue box">0</div>
<div class="green box" style="position: relative; z-index: -1;">-1
    <div class="yellow box">-1</div>
</div>

In this example, the green div is painted before the blue div, even though it comes later in the source code. We can see that the z-index affects the div itself and also the yellow child div. What if we want to now paint the yellow nested child on top of everything by giving it a large positive z-index?

<div class="blue box">0</div>
<div class="green box" style="position: relative; z-index: -1;">-1
    <div class="yellow box" style="position: relative; z-index: 1000;">1000</div>
</div>

Wait! What’s going on here? The blue div has no z-index specified, which should mean that the value used for its z-index is zero. The z-index of our nested yellow child is 1000, yet this div is still painted underneath. Why isn’t the nested child painted on top of the blue div as we might expect?

At this point, it’s time to buy the classic joke “CSS IS AWESOME” mug, fill it up with coffee, and read the entirety of the CSS2 specification. Suddenly, we understand that the answer is that our divs are forming something called stacking contexts.

The Stacking Context

We determined exactly what was going on when we arrived at Appendix E: Elaborate description of Stacking Contexts. Thankfully, we made a stupidly big cup of coffee, since all the good information is apparently stuffed in the appendices. Appendix E gives us a peek at the algorithm that browsers use to determine the painting order of content on the page, including what sorts of properties affect this painting order. It turns out that our early guesses were mostly correct: things generally stack according to the order in the DOM and active z-indices. Sometimes though, certain CSS properties applied to elements trigger the creation of a stacking context, which might affect painting order in a way we don’t expect.

We learn from Appendix E that a stacking context is an atomically painted collection of page items. What does this mean? To put it simply, it means that things inside a stacking context are painted together, as a unit, and that items outside the stacking context will never be painted between them. Having an active z-index is one of the situations in CSS which triggers the creation of a stacking context. Is there a way we can adjust our example above so that the third element belongs to the same stacking context as the first two elements? The answer is that we must remove it from the stacking context created by the second element.

<div class="blue box">0</div>
<div class="yellow box" style="position: relative; z-index: 1000; margin-top: -5ex">1000</div>
<div class="green box" style="position: relative; z-index: -1; margin-left: 6ex;">-1</div>

Now the yellow div is a sibling of the blue and the green and is painted on top of both of them, even though it now comes second in the source.

It’s clear that stacking contexts can impose strong limitations on the order our elements are painted, so it’d be great to know when we are triggering them. Whether or not a particular CSS feature triggers the creation of a new stacking context is defined with that feature, which means the information is spread throughout quite a few specifications. Helpfully, MDN has a great list of situations where elements create a stacking context. Some notable examples are elements with an active z-index, position: fixed and position: sticky elements, and elements with a transform or perspective.

Surprising Details

I’m going to level with you. While the stacking context might be a bit confusing at first, for a browser implementor it makes things a lot simpler. The stacking context is a handy abstraction over a chunk of the layout tree which can be processed atomically. In fact, it would be nice if more things created stacking contexts. Rereading the list above you may notice some unusual exceptions. Some of these exceptions are not “on purpose,” but were just arbitrary decisions made a long time ago.

For me, one of the most surprising exceptions to stacking context creation is overflow: scroll. We know that setting scroll for the overflow property causes all contents that extend past the padding edge of a box to be hidden within a scrollable area. What does it mean that they do not trigger the creation of a stacking context? It means that content outside of a scrollable area can intersect content inside of it. All it takes is a little bit of work to see this in action:

<style>
    .scroll-area {
        overflow: scroll;
        border: 3px solid salmon;
        width: 18ex;
        height: 15ex;
        margin-left: 2ex;
    }

    .scroll-area .vertical-bar {
        position: relative; /* We give each bar position: relative so that they can have z-indices. */
        float: left;
        height: 50ex;
        width: 4ex;

        /* A striped background that shows scrolling motion. */
        background: repeating-linear-gradient(
            120deg,
            salmon 0px, salmon 10px,
            orange 10px, orange 20px
       );
    }

    /* Even bars will be on top of the yellow vertical bar due to having a greater z-index. */
    .scroll-area .vertical-bar:nth-child(even) {
        z-index: 4;
        opacity: 0.9;
    }

    .yellow-horizontal-bar {
        margin-top: -10ex;
        margin-bottom: 10ex;
        max-width: 30ex;
        background: #EEDD88;

        /* Raise the horizontal bar above the scrollbar of the scrolling area. */
        position: relative;
        z-index: 2;
    }
</style>

<!-- A scroll area with four vertical bars. -->
<div class="scroll-area">
    <div class="vertical-bar"></div>
    <div class="vertical-bar"></div>
    <div class="vertical-bar"></div>
    <div class="vertical-bar"></div>
</div>

<!-- A div that will thread in between the vertical bars of the scroll area above. -->
<div class="yellow-horizontal-bar">~~JUST PASSING THROUGH~~</div>

Using the power of web design, we’ve managed to wedge the final div between the contents of the scroll frame. Half of the scrolling content is on top of the interloper and half is underneath. This probably renders in a surprising way, with the interposed div on top of the scrolling area’s scrollbar (if it has one). You can imagine what kind of headaches this causes for the implementation of scrollable areas in browser engines, because the children of a particular scroll area might be spread throughout the layout tree. There’s no guarantee that it has any kind of recursive encapsulation.

CSS’s rules often have a reasoned origin, but some of them are just arbitrary implementation decisions made roughly 20 years ago without the benefit of hindsight. Rough edges like this stacking context exception might seldom come into play, but the web is huge and has collected years of content. There are potentially thousands of pages relying on this behavior, such as lists of 2003’s furriest angora rabbits or memorials to someone’s weird obsession with curb cuts. The architects of the web have chosen not to break those galleries of gorgeous lagomorphs and have instead opted for maximizing long-term web compatibility.

Breaking the Rules

Earlier, I wrote that nothing from outside a stacking context can be painted in between a stacking context’s contents. Is that really, really, really true though? CSS is so huge, there must be at least one exception, right? I now have a concrete answer to this question and that answer is “maybe.” CSS is full of big hammers, and one of the biggest hammers (this is foreshadowing for a future post) is CSS transformations. This makes sense. Stacking contexts are all about enforcing order amidst the chaos of the z-axis, which is the one that extends straight from your heart into your screen. Transformed elements can traverse this dimension, allowing for snazzy flipbook effects and also requiring web browsers to gradually become full 3D compositors. Surely if it’s possible to break this rule, we can do it with 3D CSS transformations.

Let’s take a modified version of one of our examples above. Here we have three boxes. The last two are inside of a div with a z-index of -2, which means that they are both inside a single stacking context that stacks underneath the first box.

<style>
    .salmon {
        background: salmon;
        margin-top: -5ex;
        margin-left: 4ex;
    }
</style>

<div class="blue box">0</div>
<div style="position: relative; z-index: -2;">
    <div class="green box">-2</div>
    <div class="salmon box">-2</div>
</div>

Now we make two modifications to this example. First, we wrap the example in a new div with a transform-style of preserve-3d, which will position all children in 3d space. Finally, we push one of the divs with z-index of -2 out of the screen using a 3d translation.

<div style="transform-style: preserve-3d;">
    <div class="blue box">0</div>
    <div style="position: relative; z-index: -2;">
        <div class="green box">-2</div>
        <div class="salmon box" style="transform: translateZ(50px);">-2</div>
    </div>
</div>

It’s possible that your browser might not render this in the same way, but in Chrome the div with z-index of 0 is rendered in between two divs within the same stacking context both with z-index of -2.

We broke the cardinal rule of the stacking context. Take that, architects of the web! Is this exercise useful at all? Almost certainly not. I hope it was sufficiently weird though!

Conclusion

Hopefully I’ll be back soon to talk about the implementation of this wonderful nonsense in Servo. I want to thank Frédéric Wang for input on this post and also Mozilla for allowing me to hack on this as part of my work for Igalia. Servo is a really great way to get involved in browser development. It’s also written in Rust, which is a language that can help you become a better programmer simply by learning it, so check it out. Thanks for reading!

July 03, 2020 04:00 AM

July 02, 2020

Philippe Normand

Web-augmented graphics overlay broadcasting with WPE and GStreamer

Graphics overlays are everywhere nowadays in the live video broadcasting industry. In this post I introduce a new demo relying on GStreamer and WPEWebKit to deliver low-latency web-augmented video broadcasts.

Readers of this blog might remember a few posts about WPEWebKit and a GStreamer element we at Igalia worked on. In December 2018 I introduced GstWPE and a few months later blogged about a proof-of-concept application I wrote for it. So, learning from this first iteration, I wrote another demo!

The first demo was already quite cool, but had a few downsides:

  1. It works only on desktop (running in a Wayland compositor). The Wayland compositor dependency can be a burden in some cases. Ideally we could imagine GstWPE applications running “in the cloud”, on bare-metal machines without a GPU.
  2. While it was cool to stream to Twitch, YouTube and the like, these platforms currently can ingest only RTMP streams. That means the latency introduced can be quite significant, depending on the network conditions of course, but even in ideal conditions it was between one and two seconds. This is not great, in the world we live in.

To address the first point, WPE founding engineer, Žan Doberšek enabled software rasterizing support in WPE and its FDO backend. This is great because it allows WPE to run on machines without GPU (like continuous integration builders, test bots) but also “in the cloud” where machines with GPU are less affordable than bare metal! Following up, I enabled this feature in GstWPE. The source element caps template now has video/x-raw, in addition to video/x-raw(memory:GLMemory). To force swrast, you need to set the LIBGL_ALWAYS_SOFTWARE=true environment variable. The downside of swrast is that you need a good CPU. Of course it depends on the video resolution and framerate you want to target.

On the latency front, I decided to switch from RTMP to WebRTC! This W3C spec isn’t only about video chat! With WebRTC, sub-second live one-to-many broadcasting can be achieved without much effort, given you have a good SFU. For this demo I chose Janus, because its APIs are well documented and it’s a cool project! I’m not sure it would scale very well in large deployments, but for my modest use-case, it fits very well.

Janus has a plugin called video-room which allows multiple participants to chat. But then imagine a participant only publishing its video stream and multiple “clients” connecting to that room without sharing any video or audio stream: one-to-many broadcasting. As it turns out, GStreamer applications can already connect to this video-room plugin using GstWebRTC! A demo was developed by tobiasfriden and saket424 in Python; it recently moved to the gst-examples repository. As I kind of prefer to use Rust nowadays (whenever I can anyway), I ported this demo to Rust; it was upstreamed in gst-examples as well. This specific demo streams the video test pattern to a Janus instance.

Adapting this Janus demo was then quite trivial. By relying on a similar video mixer approach I used for the first GstWPE demo, I had a GstWPE-powered WebView streaming to Janus.

The next step was the actual graphics overlays infrastructure. In the first GstWPE demo I had a basic GTK UI allowing me to edit the overlays on-the-fly. This can’t be used for this new demo, because I wanted to use it headless. After doing some research I found a really nice NodeJS app on GitHub, developed by Luke Moscrop, who’s actually one of the main developers of the Brave BBC project. The Roses CasparCG Graphics app was developed in the context of the Lancaster University Students’ Union TV Station; it starts a web-server on port 3000 with two main entry points:

  • An admin web-UI (in /admin/) allowing you to create and manage overlays, like sports score boards, info banners, and so on.
  • The target overlay page (in the root location of the server), which is a web-page without predetermined background, displaying the overlays with HTML, CSS and JS. This web-page is meant to be fed to CasparCG (or GstWPE :))

After making a few tweaks in this NodeJS app, I can now:

  1. Start the NodeJS app, load the admin UI in a browser and enable some overlays
  2. Start my native Rust GStreamer/WPE application, which:
    • connects to the overlay web-server
    • mixes a live video source (a webcam, for instance) with the WPE-powered overlay
    • encodes the video stream to H.264, VP8 or VP9
    • sends the encoded RTP stream using WebRTC to a Janus server
  3. Let “consumer” clients connect to Janus with their browser, in order to see the resulting live broadcast.

(If the video doesn’t display, here is the Youtube link.)

This is pretty cool and fun, as my colleague Brian Kardell mentions in the video. Working on this new version gave me more ideas for the next one. And very recently the audio rendering protocol was merged in WPEBackend-FDO! That means even more use-cases are now unlocked for GstWPE.

This demo’s source code is hosted on GitHub. Feel free to open issues there; I am always interested in getting feedback, good or bad!

GstWPE is maintained upstream in GStreamer and relies heavily on WPEWebKit and its FDO backend. Don’t hesitate to contact us if you have specific requirements or issues with these projects :)

by Philippe Normand at July 02, 2020 01:00 PM

July 01, 2020

Alejandro Piñeiro

v3dv status update 2020-07-01

About three weeks ago there was a big announcement about the update of the status of the Vulkan effort for the Raspberry Pi 4. Now the source code is public. Taking into account the interest that it got, and that now the driver is more usable, we will try to post status updates more regularly. Let’s talk about what’s happened since then.

Input Attachments

Input attachments are one of the main sub-features of Vulkan multipass, and we’ve gained support for them since the announcement. In Vulkan, multipass is supported directly by the API: renderpasses can have multiple subpasses, these can have dependencies between each other, and each subpass defines a subset of “attachments”. One attachment that is easy to understand is the color attachment: this is where a given subpass writes a given color. Another, the input attachment, is an attachment that was updated in a previous subpass (for example, it was the color attachment on that previous subpass), and that you get as an input on following subpasses. From the shader POV, you interact with it as a texture, with some restrictions. One important restriction is that you can only read the input attachment at the current pixel location. The main reason for this restriction is that on tile-based GPUs (like the rpi4) all primitives are batched on tiles and fragment processing is rendered one tile at a time. In general, if you can live with those restrictions, Vulkan multipass and input attachments will provide better performance than traditional multipass solutions.

If you are interested in reading more details on this, you can check out ARM’s very nice presentation “Vulkan Multipass mobile deferred done right”, or Sascha Willems’ post “Vulkan input attachments and sub passes”. The latter also includes information about how to use them and code snippets of one of his demos. For reference, this is how the input attachment demos looks on the rpi4:

Sascha Willems inputattachment demos run on rpi4

Compute Shader

Given that this was one of the most requested features after the last update, we expect that this will likely be the most popular news from this post: compute shaders are now supported.

Compute shaders give applications the ability to perform non-graphics related tasks on the GPU, outside the normal rendering pipeline. For example they don’t have vertices as input, or fragments as output. They can still be used for massively parallel GPGPU algorithms. For example, this demo from Sascha Willems uses a compute shader to simulate cloth:

Sascha Willems Compute Cloth demos run on rpi4

Storage Image

Storage images are another recent addition. A storage image is a descriptor type that represents an image view, and supports unfiltered loads, stores, and atomics in a shader. In most other ways it is really similar to the well-known OpenGL concept of a texture. Storage images are really common with compute shaders: compute shaders can’t render directly to any image, so if they need one, it is likely that they will update it. In fact the two Sascha Willems demos using storage images also require compute shader support:

Sascha Willems compute shader demos run on rpi4

Sascha Willems compute raytracing demo run on rpi4

Performance

Right now our main focus for the driver is working on features, targeting a compliant Vulkan 1.0 driver. Having said so, now that we both support a good range of features and can run non-basic applications, we have devoted some time to analyzing whether there were clear points where we could improve performance. Among these we implemented:
1. A buffer object (BO) cache: internally we allocate and free buffer objects really often for basically the same tasks, so there is a constant need for buffers of the same size. Each allocation/free requires a DRM call, so we implemented a BO cache (based on the existing one in the OpenGL driver) so that freed BOs are added to a cache and reused when a new BO is allocated with the same size.
2. New code paths for buffer to image copies.

Bugfixing!!

In addition to work on specific features, we also spent some time fixing specific driver bugs, using failing Vulkan CTS tests as reference. Thanks to that work, the Sascha Willems’ radial blur demo is now properly rendering, even though we didn’t focus specifically on working on that demo:

Sascha Willems radial blur demo run on rpi4

Next?

Now that the driver supports a good range of features and we are able to test more applications and run more Vulkan CTS Tests with all the needed features implemented, we plan to focus some efforts towards bugfixing for a while.

We also plan to start to work on implementing the support for Pipeline Cache, which allows the result of pipeline construction to be reused between pipelines and between runs of an application.

by infapi00 at July 01, 2020 05:26 PM

June 30, 2020

Enrique Ocaña

Developing on WebKitGTK with Qt Creator 4.12.2

After the latest migration of the WebKitGTK test bots to the new SDK based on Flatpak, the old development environment based on jhbuild became deprecated. It can still be used with export WEBKIT_JHBUILD=1, though support for this way of working will gradually fade out.

I used to work in a chroot because I love the advantages of having an isolated and self-contained environment, but an issue in the way bubblewrap manages mountpoints basically made it impossible to use the new SDK from a chroot. It was time for me to update my development environment to the new ages and have it working in my main Kubuntu 18.04 distro.

My main goal was to have a comfortable IDE that follows standard GUI conventions (that is, no emacs nor vim) and has code indexing features that (more or less) work with the WebKit codebase. Qt Creator was providing all that to me in the old chroot environment thanks to some configuration tricks by Alicia, so it should be good for the new one.

I preferred to use the Qt Creator 4.12.2 offline installer for Linux, so I can download exactly the same version in the future in case I need it, but other platforms and versions are also available.

The WebKit source code can be downloaded as always using git:

git clone git.webkit.org/WebKit.git

It’s useful to add WebKit/Tools/Scripts and WebKit/Tools/gtk to your PATH, as well as any other custom tools you may have. You can customize your $HOME/.bashrc for that, but I prefer to have an env.sh environment script to be sourced from the current shell when I want to enter into my development environment (by running webkit). If you’re going to use it too, remember to adjust to your needs the paths used there.

Even if you have a pretty recent distro, it’s still interesting to have the latest Flatpak tools. Add Alex Larsson’s PPA to your apt sources:

sudo add-apt-repository ppa:alexlarsson/flatpak

In order to ensure that your distro has all the packages that webkit requires and to install the WebKit SDK, you have to run these commands (I omit the full path). Downloading the Flatpak modules will take a while, but at least you won’t need to build everything from scratch. You will need to do this again from time to time, every time the WebKit base dependencies change:

install-dependencies
update-webkitgtk-libs

Now just build WebKit and check that MiniBrowser works:

build-webkit --gtk
run-minibrowser --gtk

I have automated the previous steps as go full-rebuild and runtest.sh.

This build process should have generated a WebKit/WebKitBuild/GTK/Release/compile_commands.json
file with the right parameters and paths used to build each compilation unit in the project. This file can be leveraged by Qt Creator to get the right include paths and build flags after some preprocessing to translate the paths that make sense from inside Flatpak to paths that make sense from the perspective of your main distro. I wrote compile_commands.sh to take care of those transformations. It can be run manually or automatically when calling go full-rebuild or go update.

The WebKit way of managing includes is a bit weird. Most of the cpp files include config.h and, only after that, they include the header file related to the cpp file. Those header files depend on defines declared transitively when including config.h, but that file isn’t directly included by the header file. This breaks the intuitive rule of “headers should include any other header they depend on” and, among other things, completely confuses code indexers. So, in order to give the Qt Creator code indexer a hand, the compile_commands.sh script pre-includes WebKit.config for every file and includes config.h from it.

With all the needed pieces in place, it’s time to import the project into Qt Creator. To do that, click File → Open File or Project, and then select the compile_commands.json file that compile_commands.sh should have generated in the WebKit main directory.

Now make sure that Qt Creator has the right plugins enabled in Help → About Plugins…. Specifically: GenericProjectManager, ClangCodeModel, ClassView, CppEditor, CppTools, ClangTools, TextEditor and LanguageClient (more on that later).

With this setup, after a brief initial indexing time, you will have support for features like Switch header/source (F4), Follow symbol under cursor (F2), shading of disabled if-endif blocks, auto variable type resolving and code outline. There are some oddities of compile_commands.json based projects, though. There are no compilation units in that file for header files, so indexing features for them only work sometimes. For instance, you can switch from a method implementation in the cpp file to its declaration in the header file, but not the opposite. Also, you won’t see all the source files under the Projects view, only the compilation units, which are often just a bunch of UnifiedSource-*.cpp files. That’s why I prefer to use the File System view.

Additional features like Open Type Hierarchy (Ctrl+Shift+T) and Find References to Symbol Under Cursor (Ctrl+Shift+U) are only available when a Language Client for Language Server Protocol is configured. Fortunately, the new WebKit SDK comes with the ccls C/C++/Objective-C language server included. To configure it, open Tools → Options… → Language Client and add a new item with the following properties:

  • Name: ccls
  • Language: *.c;*.cpp;*.h
  • Startup behaviour: Always On
  • Executable: /home/enrique/work/webkit/WebKit/Tools/Scripts/webkit-flatpak
  • Arguments: --gtk -c ccls --index=/home/enrique/work/webkit/WebKit

Some “LanguageClient ccls: Unexpectedly finished. Restarting in 5 seconds.” errors will appear in the General Messages panel after configuring the language client and every time you launch Qt Creator. It’s just ccls taking its time to index the whole source code. It’s “normal”, don’t worry about it. Things will get stable and start to work after some minutes.

Due to the way the Locator file indexer works in Qt Creator, it can become confused, run out of memory and die if it finds cycles in the project file tree. This is common when using Flatpak and running the MiniBrowser or the tests, since /proc and other large filesystems are accessible from inside WebKit/WebKitBuild. To avoid that, open Tools → Options… → Environment → Locator and set Refresh interval to 0 min.

I also prefer to call my own custom build and run scripts (go and runtest.sh) instead of letting Qt Creator build the project with the default builders and mess everything up. To do that, from the Projects mode (Ctrl+5), click on Build & Run → Desktop → Build and edit the build configuration to be like this:

  • Build directory: /home/enrique/work/webkit/WebKit
  • Add build step → Custom process step
    • Command: go (no absolute route because I have it in my PATH)
    • Arguments:
    • Working directory: /home/enrique/work/webkit/WebKit

Then, for Build & Run → Desktop → Run, use these options:

  • Deployment: No deploy steps
  • Run:
    • Run configuration: Custom Executable → Add
      • Executable: runtest.sh
      • Command line arguments:
      • Working directory:

With this configuration you can build the project with Ctrl+B and run it with Ctrl+R.

I think I’m not forgetting anything more regarding environment setup. With the instructions in this post you can end up with a pretty complete IDE. Here’s a screenshot of it working in its full glory:

Anyway, to be honest, nothing will ever reach the level of code indexing features I got with Eclipse some years ago. I could find usages of a variable/attribute and know where it was being read, written or read-written. Unfortunately, that environment stopped working for me long ago, so Qt Creator has been the best I’ve managed to get for a while.

Properly configured web based indexers such as the Searchfox instance configured in Igalia can also be useful alternatives to a local setup, although they lack features such as type hierarchy.

I hope you’ve found this post useful in case you try to setup an environment similar to the one described here. Enjoy!

by eocanha at June 30, 2020 03:47 PM

June 23, 2020

Igalia Compilers Team

Dates and Times in JavaScript

tl;dr: We are looking for feedback on the Temporal proposal. Try out the polyfill, and complete the survey; but don’t use it in production yet!

JavaScript Date is broken in ways that cannot be fixed without breaking the web. As the story goes, it was included in the original 10-day JavaScript engine hack and based on java.util.Date, which itself was deprecated in 1997 due to being a terrible API and replaced with a better one. The result is that for all of JavaScript’s history, the built-in Date has remained very hard to work with directly.

Starting a few years ago, a proposal has been developing, to add a new globally available object to JavaScript, Temporal. Temporal is a robust and modern API for working with dates, times, and timestamps, and also makes it easy to do things that were hard or impossible with Date, like converting dates between time zones, adding and subtracting while accounting for daylight saving time, working with date-only or time-only data, and even handling dates in non-Gregorian calendars. Although Temporal has “just works” defaults, it also provides fine-grained opt-in control of overflows, interpreting ambiguous times, and other corner cases. For more on the history of the proposal, and why it’s not possible to fix Date itself, read Maggie Pint’s two-part blog post “Fixing JavaScript Date”.

For examples of the power of Temporal, check out the cookbook. Many of these examples would be difficult to do with legacy Date, particularly the ones involving time zones. (We would have put an example in this post, but the code might soon become stale, for reasons which will hopefully become clear!)

This proposal is currently at Stage 2 in TC39’s proposal process, and we[1] are hoping to move it along to Stage 3 soon.[2] We have been working on the feature set of Temporal and the API for a long time, and we believe it’s full-featured and that the API is reasonable. You don’t design good APIs solely on the drawing board, however, so it’s time to put it to the test and let the JavaScript developer community try it out and see whether what we’ve come up with meets people’s needs.

It is still early enough that we can make drastic changes to the API if we find we need to, based on the feedback that we get. So please, try it out and let us know!

How to Try Temporal

If you just want to try Temporal out casually, with an interactive prompt, that’s easy! Visit the API documentation in your browser. On any of the documentation or cookbook pages, you can open your browser console and Temporal will be already loaded, ready for you to try out the examples. Or you can try it out on RunKit.

Or, maybe you are interested in a bit more in-depth evaluation, like building a small test project using Temporal. We know this takes up people’s valuable project time, but it’s also the best way that we can get the most valuable feedback, so we’d really appreciate this! We have released a polyfill for the Temporal API on npm. You can use it in your project with npm install --save proposal-temporal, and import it in your project with const { Temporal } = require('proposal-temporal');.

However, don’t use the polyfill in production applications! The proposal is still at Stage 2, and the polyfill has an 0.x version, so that should make it clear that the API is subject to change, and we do intend to keep changing it when we get feedback from you!

How to Give Feedback

We would love to hear from you about your experiences with Temporal! Once you’ve tried it, we have a short survey for you to fill out. If you feel comfortable doing so, please leave us your contact information, since we might want to ask some follow up questions.

Please also open an issue on our issue tracker if you have a suggestion! We welcome suggestions whether or not you filled out the survey. You can also browse the feedback that’s already been given in the issue tracker, and give it a thumbs-up if you agree or a thumbs-down if you disagree.

Thanks for participating if you can! All the feedback that we receive now will help us make the right decisions as the proposal moves along to Stage 3 and Temporal eventually appears in your browser.


[1] “We” in this post means the Temporal champions group, a group of TC39 delegates and interested people. As you may guess from where this blog post is hosted, it includes members of Igalia’s Compilers team, but this was written on behalf of the Temporal champions. ↩

[2] Read the TC39 process document for more information on what these stages mean. tl;dr: Stage 2 is the time to give feedback on the proposal that can still be incorporated even if it requires drastic changes. Stage 3 is when the proposal remains stable except for serious problems discovered during implementation in browsers. ↩

by compilers at June 23, 2020 06:11 PM