Planet Igalia

November 22, 2020

Eleni Maria Stea

FOSSCOMM 2020, and a status update on EXT_external_objects(_fd) extensions [en, gr]

FOSSCOMM (Free and Open Source Software Communities Meeting) is a Greek conference aiming to promote the use of FOSS in Greece and to bring FOSS enthusiasts together. It is organized entirely by volunteers and universities and takes place in a different city each year. This year it was virtual as Greece is under lockdown, and … Continue reading FOSSCOMM 2020, and a status update on EXT_external_objects(_fd) extensions [en, gr]

by hikiko at November 22, 2020 06:11 PM

November 20, 2020

Paulo Matos

A tour of the for..of implementation for 32bits JSC

We look at the implementation of the for-of intrinsic in 32bit JSC (JavaScriptCore).

More…

by Paulo Matos at November 20, 2020 02:00 PM

Maksim Sisov

Chrome/Chromium on Wayland: The Waylandification project.

It has been a long time since I wrote my last blog post and since I wrote about something that I and my colleagues at Igalia have been working for the past 4 years. I have been postponing writing this post waiting until something big happens. Well, something big just happened…

If you already know what Ozone is, then I am happy to tell you that Chromium for Linux includes Ozone by default now and can be enabled with runtime command line flags. If you are interested in trying Chrome/Chromium with native Wayland support, you are encouraged to download Google Chrome for developers and try Ozone/Wayland by running the browser with the following command line flags – ‘–enable-features=UseOzonePlatform –ozone-platform=wayland’.

If you don’t know what Ozone is, here’s a brief explanation, before I talk about the history, status and design of this effort.

What is Ozone?

The very first thing that one may think about when they hear “Ozone” is the gas or a thin layer of the Earth’s atmosphere. Well… it is partly correct. In the case of Chromium, it is a platform abstraction layer.

I will not go into many details, but here is the description of that layer from Chromium’s documentation about Ozone –
“Ozone is a platform abstraction layer beneath Aura, Chromium’s platform independent windowing system, that is used for low level input and graphics. Once complete, the abstraction will support underlying systems ranging from embedded SoC targets to new X11-alternative window systems on Linux such as Wayland or Mir to bring up Aura Chromium by providing an implementation of the platform interface.”.
If you are interested in more details, you are welcome to read the project’s documentation at https://chromium.googlesource.com.

The Summary of the Design of Ozone/Wayland

It has been a long time since Antonio Gomes started to work on this project. It started as a research project for our customer – Renesas Electronics, and was based on a former abstraction project with another clever name, “mus+ash” (pronounced “mustache”, you can read more about that here – Chromium, ozone, wayland and beyond).

Since that time, the project has been moved to downstream and back to upstream (because of some unknowns related to the “mus+ash”) and the design of Ozone integration has also been changed.

Currently, the Aura/Platform classes are injected into the Browser process and communicate directly with the underlying Ozone platforms including Wayland. In the browser process, Wayland creates a connection with a Wayland compositor, while in the GPU process, it only draws pixels into the created DMABUFs and neither receives events nor creates surfaces.

Migrating Away From X11-only Legacy Backend.

It is worth mentioning that Igalia has been working on both Ozone/X11 and Ozone/Wayland.

Since June 2020, we have been working on switching Ozone for Linux from needing to be done at compile time to being choosable at runtime. At the moment, one can try Ozone by running Chrome downloaded from the development channel with the ‘–enable-features=UseOzonePlatform –ozone-platform=wayland/x11’ runtime flags.

That approach is allowing us to gather a bigger audience of users who are willing to test the Ozone capabilities, but also achieve a better feature parity between non-Ozone/X11 and Ozone/X11/Wayland.

That is, most of the features and code paths are shared between the two implementations, and the paths that are not compatible with Ozone are being refactored at the moment.

Once all the incompatible  paths are refactored ( just a few of them remain) and all the available test suites are enabled on the Linux/Ozone bot, we will start what is known as a “finch trial”.  This allows Ozone to be enabled by default for some percentage of users (about 10%). If the finch trial goes well, the percentage of users will be gradually grown to 100% and we will start removing old non”-Ozone/X11 implementation.

Wayland + Tab Drag

If you’ve been trying it out, you might have already noticed that Ozone/Wayland does not support the Tab Drag feature well enough. The problem is the lack of the protocol for this feature.

At the moment, my colleague Nick Diego is working on the definition of the protocol for tab drag and implementation of that in Chromium.

Unfortunately, Ozone will fallback to x11/xwayland for compositors that do not support the aforementioned protocol. However, once more and more compositors will support that, Chrome will whitelist those compositors.

I would not go into details of that effort here in this blog post, but just rather leave a link to the design document that Nick has created – Tab Dragging on Ozone/Wayland.

Summary

This blog post was rather a brief summary of the design, feature, and status of the project. Thank you for reading it. Hopefully, when we start a finch trial, I will write another blog telling you how it goes. Bye.

by msisov at November 20, 2020 11:28 AM

Brian Kardell

Open Prioritization First Experiment Wrap Up

Open Prioritization First Experiment Wrap Up

Earlier this year, Igalia launched an experiment called "Open Prioritization" which, effectively, lets us pull money together to prioritize work on a feature in web browsers. In this piece I'll talk about the outcome, lessons learned along the way and next steps.

Our Open Prioritization experiment was a pretty big idea. On the surface, it seems to be asking a pretty simple question: "Could we crowdfund some development work on browser?" However, it was quite a bit more involved in its goals, because there is a lot more hiding behind that question than meets the eye, and most of it is kind of difficult to talk about in purely theoretical ways. I'll get into all of that in a minute, but let's start with the results...

One project advances: :focus-visible

We began the experiment with six possible things we would try to crowdfund, and I'm pleased to say that one will advance: :focus-visible in WebKit.

We are working with Open Collective on next steps as it involves some decision making on how we manage future experiments and a bigger idea too. However, very soon this will shift from a pledged collective which just asked "would you financially support this project if it were to be offered?" to a proper way to collect funds for it. If you pledged, you will receive an contact when it's ready asking you to fulfill your pledge with information on how. We will also write a post when that happens as it's likely that at least some people will not come back and fulfill their pledge.

As soon as this begins and enough funds are available, it will enter our developers work queue and as staff frees up, they will shift to begin work on implementing this in WebKit!

We did it! We at Igalia would like to say a giant "thank you" for all of those who helped support these efforts in improving our commons.

Retrospective

Let's talk about some of those bigger ideas this experiment was aiming to look at, and lessons we learned along the way, in retrospect...

  • Resources are finite. Prioritization is hard. No matter how big the budget, resources are still finite and work has to be prioritized. Even with only a few choices on the table to choose from, it's not necessarily easy or plain what to choose because there isn't a "right" answer.

  • There are reasonable arguments for prioritizing different things. The two finalists both had strong arguments from different angles, but even a step back - at least some people chose to pledge to something else. Some thought that supporting SVG path syntax in CSS was the best choice. They only pledged to that one. Many of these were final implementations, but this wasn't. Some people thought that advancing new things that no one else seems to be advancing was the way to go. Others supported it because they thought that it was really important to boost ones that are help Mozilla. There just weren't enough people either seeing or agreeing with that weighting of things.

  • Cost is a factor It's not an exclusive factor - the cheapest option by far (SVG Path in CSS/Mozilla) was eliminated earlier. There are other reasons :focus-visible made some giant leaps too, but - at the end of the day the bar was also just lower. The second place project never actually managed to pull ahead, depite hoving more actual pledged dollars at one point.

  • Investing with uncertainty is especially hard . Just last week, Twitter exploded with excitement that Google was going to prototype some high level stuff with Container Queries. Fundamental to Google's intent is CSS containment in a single direction. CSS does not currently define containment in a single direction, but it does define the containment module where it would be defined. Containment was, in part, trying to lay some potential groundwork here. When we launched the project, I wrote about this: WebKit doesn't currently support the containment that is defined already and is a necessary prerequisite of any proposal involving that approach. The trouble is: We don't know if that will be the approach, and supporting it is a big task. Building a high level solution on the magic in our switch proposal, for example, doesn't require containment at all. Adding general containment support was the most expensive project on our list, by far. In fact, we could have done a couple of them for that price. This makes the value proposition of that work very speculative. Despite being potentially critically valuable for the single biggest/longest ask in CSS history - that project didn't make the finals when we put it to the public either.

  • Some things are difficult to predict. Going into this, I didn't know what to expect. A single viral tweet and a mass of developers pitching in $1 or $2 could, in theory, have funded any of these in hours. While I didn't expect that, I did kind of expect some amount of funds in the end would be of that sort. Interestingly, that didn't happen. At all. Depite lots of efforts trying to get lots of people to pledge very small dollars even asking specifically, and making it possible to do with a tweet - very, very few did (literally 1 on the winning project pledged less than five dollars). The most popular pledge was $20 with about a quarter of the pledges being over $50, and going up from there.

  • Matching funds are a really big deal. You can definitely see why fundraisers stress this. For the duration of this experiment, we saw periods of little actual movement, despite lots of tweets about it, likes and blog posts. There were a few giant leaps, and they all involved offers of matching dollars. Igalia ourselves, The A11Y Project and AMPHTML all had some offer of matching dollars that really seemed to inspire a lot more participation. The bigger the matching dollars available, the bigger the participation was.

  • Communication is hard. These might not have been the most ideal projects, in some respects. This last bullet is complicated enough that I'll give it it's own section.
Lessons learned: Communication challenges

While I am tremendously happy that inert and :focus-visible were our finalists and both did very well, I am biased. I helped advocate for and specify these two features before I came to Igalia, working with some Googlers who also did the initial implementations. I also advocated for them to be included in the list of projects we offered. However, I failed to anticipate that the very reasons I did both of these would present challenges for the experiment, so I'd like to talk about that a bit...

Unfortunately a confluence of things led to a lot of chatter and blog posts which were effectively saying something along the lines of "Developers shouldn't have to foot the bill because Apple doesn't care about accessibility and refuses to implement something. They don't care, and this is evidence proof - they are the last ones to not implement" and I wound up having a lot of conversations trying to correct the various misunderstanding here. That's not everyone else's fault, it's mine. I should have taken more time to communicate these things clearly, but for the record, nothing about this is really correct, so let me take the time to add the clarity for posterity...

  • On last implementations The second implementations only recently began or completed in Firefox, and one of those was also by Igalia. It seems really unfortunate and not exactly fair to suggest that being a few weeks/months behind, and especially when that came from outside help, is really an indictment. It's not. As an example, in the winning project, Chromium shipped this by default in October 2020. Firefox is right now pending a default release. Keep in mind that vendors don't have perfect insight into what is happening in other browsers, and even if they did reallocating resources isn't a thing that is done on a whim: Different browsers have different people with different skills and availability at any given point in time.

  • On refusal to implement This is 100% incorrect. I want to really stress this: Every item on our list comes from the list of things that are 'wants' from vendors themselves that need prioritization and are among the things they will be considering taking up next. If not funded here, it will definitely still get done - it's just impossible to say when really, and whatever priority they give it, they can't give to something else. This experiment gives us a more definite timeframe and frees them to spend that on implementing something else.

  • On web developers shouldn't have to foot the bill. Well, if you mean contributing dollars directly in crowdfunding in order to get the feature, we absolutely don't (see above bullet). However, generally speaking, this was in fact part of the conversation we wanted to start. Make no mistake: You are paying today, indirectly - and the actual investment back into the commons is inefficeint and non-guaranteed. It's wonderful that 3 organizations have seemed to foot the bill for decades, but starting a conversation about whether it is talking about that is definitely part of the goal here.

  • On "Apple doesn't care about accessibility" This one makes me really sad, not only because I know it isn't true and it seems easy to show otherwise, but also because there are some really great people from Apple like James Craig who absolutely not only care very deeply but often help lead on important things.

  • On "it's wrong to crowdfund accessibility features"Unfortunately, it seems the very things that drew me to work on these in the first place wound up working against us a little: Both inert and :focus-visible are interesting because they are "core features" to the platform that are useful to everyone. However, they are designed to sit at an intersection where they happily have really out-sized impact for accessibility. There are good polyfills for both of these which work well and somewhat reduce the degree of 'urgency'. I really thought that this made for a nice combination of interests/pains might lead to good partnerships of investment where, yes, I imagined that perhaps some organizations interested in advancing the accessibility end of things and who have historically contribute their labors, might see value in contributing to the flame more directly. Perhaps this wasn't as wise or obviously great as I imagined.

Wrapping up

All in all, in the end - despite some rocky communications, we are really encouraged by this first experiment. Thank you to everyone who pledged, boosted, blogged about the effort, etc. We're really looking forward to taking this much further next year and we'd like to begin by asking you to share which specific projects you'd be interested in seeing or supporting in the future? Hit us up on @briankardell or @igalia.

November 20, 2020 05:00 AM

November 14, 2020

Eleni Maria Stea

A hack to display the Vulkan CTS tests output

Vulkan conformance tests for graphics drivers save their output images inside an XML file called TestResults.qpa. As binary outputs aren’t allowed, these output images (that would be saved as PNG otherwise) are encoded to text using Base64 and the result is printed between <Image></Image> XML tags. This is a problem sometimes, as external tools are … Continue reading A hack to display the Vulkan CTS tests output

by hikiko at November 14, 2020 03:20 AM

November 13, 2020

Alexander Dunaev

HiDPI support in Chromium for Wayland

It all started with this bug. The description sounded humble and harmless: the browser ignored some command line flag on Wayland. A screenshot was attached where it was clearly seen that Chromium (version 72 at that time, 2019 spring) did not respect the screen density and looked blurry on a HiDPI screen.

HiDPI literally means small pixels. It is hard to tell now what was the first HiDPI screen, but I assume that their wide recognition came around 2010 with Apple’s Retina displays. Ultra HD had been standardised in 2012, defining the minimum resolution and aspect ratio for what today is known informally as 4K—and 4K screens for laptops have pixels that are small enough to call it HiDPI. This Chromium issue, dated 2012, says that the Linux port lacks support for HiDPI while the Mac version has it already. On the other hand, HiDPI on Windows was tricky even in 2014.

‘That should be easy. Apparently it’s upscaled from low resolution. Wayland allows setting scale for the back buffers, likely you’ll have to add a single call somewhere in the window initialisation’, a colleague said.

Like many stories that begin this way, this turned out to be wrong. It was not so easy. Setting the buffer scale did the right thing indeed, but it was absolutely not enough. It turned out that support for HiDPI screens was entirely missing in our implementation of the Wayland client. On my way to the solution, I have found that scaling support in Wayland is non-trivial and sometimes confusing. Since I finished this work, I have been asked a few times about what happens there, so I thought that writing it all down in a post would be useful.

Background

Modern desktop environments usually allow configuring the scale of the display at global system level. This allows all standard controls and window decorations to be sized proportionally. For applications that use those standard controls, this is a happy end: everything will be scaled automatically. Those which prefer doing everything themselves have to get the current scale from the environment and adjust rendering.  Chromium does exactly that: inside it has a so-called device scale factor. This factor is applied equally to all sizes, locations, and when rendering images and fonts. No code has to bother ever. It works within this scaled coordinate system, known as device independent pixels, or DIP. The device scale factor can take fractional values like 1.5, but, because it is applied at the stage of rendering, the result looks nice. The system scale is used as default device scale factor, and the user can override it using the command line flag named --force-device-scale-factor. However, this is the very flag which did not work in the bug mentioned in the beginning of this story.

Note that for X11 the ‘natural’ scale is still the physical pixels.  Despite having the system-wide scale, the system talks to the application in pixels, not in DIP.  It is the application that is responsible to handle the scale properly. If it does not, it will look perfectly sharp, but its details will be perhaps too small for the naked eye.

However, Wayland does it a bit differently. The system scale there is respected by the compositor when pasting buffers rendered by clients. So, if some application has no idea about the system scale and renders itself normally, the compositor will upscale it.  This is what originally happened to Chromium: it simply drew itself at 100%, and that image was then stretched by the system compositor. Remember that the Wayland way is giving a buffer to each application and then compositing the screen from those buffers, so this approach of upscaling buffers rendered by applications is natural. The picture below shows what that looks like. The screenshot is taken on a HiDPI display, so in order to see the difference better, you may want to see the full version (click the picture to open).

What Chromium looked like when it did not set its back buffer scale

Firefox (left) vs. Chromium (right)

How do Wayland clients support HiDPI then?

Level 1. Basic support

Each physical output device is represented at the Wayland level by an object named output. This object has a special integer property named buffer scale that tells literally how many physical pixels are used to represent the single logical pixel. The application’s back buffer has that property too. If scales do not match, Wayland will simply scale the raster image, thus emulating the ‘normal DPI’ device for the application that is not aware of any buffer scales.

The first thing the window is supposed to do is to check the buffer scale of the output that it currently resides at, and to set the same value to its back buffer scale. This will basically make the application using all available physical pixels: as scales of the buffer and the output are the same, Wayland will not re-scale the image.

Back buffer scale is set but rendering is not aware of that

Chromium now renders sharp image but all details are half their normal size

The next thing is fixing the rendering so it would scale things to the right size.  Using the output buffer scale as default is a good choice: the result will be ‘normal size’.  For Chromium, this means simply setting the device scale factor to the output buffer scale.

Now Chromium looks right

All set now

The final bit is slightly trickier.  Wayland sends UI events in DIP, but expects the client to send surface bounds in physical pixels. That means that if we implement something like interactive resize of the window, we will also have to do some math to convert the units properly.

This is enough for the basic support.  The application will work well on a modern laptop with 4K display.  But what if more than a single display is connected, and they have different pixel density?

Level 2. Multiple displays

If there are several output devices present in the system, each one may have its own scale. This makes things more complicated, so a few improvements are needed.

First, the window wants to know that it has been moved to another device.  When that happens, the window will ask for the new buffer scale and update itself.

Second, there may be implementation-specific issues. For example, some Wayland servers initially put the new sub-surface (which is used for menus) onto the default output, even if its parent surface has been moved to another output.  This may cause weird changes of their scale during their initialisation.  In Chromium, we just made it so the sub-surface always takes its scale from the parent.

Level 3? Fractional scaling?

Not really. Fractional scaling is basically ‘non-even’ scales like 125%. The entire feature had been somewhat controversial when it had been announced, because of how rendering in Wayland is performed. Here, non-even scale inevitably uses raster operations which make the image blurry. However, all that is transparent to the applications. Nothing new has been introduced at the level of Wayland protocols.

Conclusion

Although this task was not as simple as we thought, in the end it turned out to be not too hard. Check the output scale, set the back buffer scale, scale the rendering, translate pixels to DIP and vice versa in certain points. Pretty straightforward, and if you are trying to do something related, I hope this post helps you.

The issue is that there are many implementations of Wayland servers out there, not all of them are consistent, and some of them have bugs. It is worth testing the solution on a few distinct Linux distributions and looking for discrepancies in behaviour.

Anyway, Chromium with native Wayland support has recently reached beta—and it supports HiDPI! There may be bugs too, but the basic support should work well. Try it, and let us know if something is not right.

Note: the Wayland support is so far experimental. To try it, you would need to launch chrome via the command line with two flags:
--enable-features=UseOzonePlatform
--ozone-platform=wayland

by adunaev at November 13, 2020 10:10 AM

November 08, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 10: Reusing a Vulkan stencil buffer from OpenGL

This is 10th post on OpenGL and Vulkan interoperability with EXT_external_objects and EXT_external_objects_fd. We’ll see the last use case I’ve written for Piglit to test the extensions implementation on various mesa drivers as part of my work for Igalia. In this test a stencil buffer is allocated and filled with a pattern by Vulkan and … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 10: Reusing a Vulkan stencil buffer from OpenGL

by hikiko at November 08, 2020 10:07 PM

November 05, 2020

Iago Toral

V3DV + Zink

During my presentation at the X Developers Conference I stated that we had been mostly using the Khronos Vulkan Conformance Test suite (aka Vulkan CTS) to validate our Vulkan driver for Raspberry Pi 4 (aka V3DV). While the CTS is an invaluable resource for driver testing and validation, it doesn’t exactly compare to actual real world applications, and so, I made the point that we should try to do more real world testing for the driver after completing initial Vulkan 1.0 support.

To be fair, we had been doing a little bit of this already when I worked on getting the Vulkan ports of all 3 Quake game classics to work with V3DV, which allowed us to identify and fix a few driver bugs during development. The good thing about these games is that we could get the source code and compile them natively for ARM platforms, so testing and debugging was very convenient.

Unfortunately, there are not a plethora of Vulkan applications and games like these that we can easily test and debug on a Raspberry Pi as of today, which posed a problem. One way to work around this limitation that was suggested after my presentation at XDC was to use Zink, the OpenGL to Vulkan layer in Mesa. Using Zink, we can take existing OpenGL applications that are currently available for Raspberry Pi and use them to test our Vulkan implementation a bit more thoroughly, expanding our options for testing while we wait for the Vulkan ecosystem on Raspberry Pi 4 to grow.

So last week I decided to get hands on with that. Zink requires a few things from the underlying Vulkan implementation depending on the OpenGL version targeted. Currently, Zink only targets desktop OpenGL versions, so that limits us to OpenGL 2.1, which is the maximum version of desktop OpenGL that Raspbery Pi 4 can support (we support up to OpenGL ES 3.1 though). For that desktop OpenGL version, Zink required a few optional Vulkan 1.0 features that we were missing in V3DV, namely:

  • Logic operations.
  • Alpha to one.
  • VK_KHR_maintenance1.

The first two were trivial: they were already implemented and we only had to expose them in the driver. Notably, when I was testing these features with the relevant CTS tests I found a bug in the alpha to one tests, so I proposed a fix to Khronos which is currently in review.

I also noticed that Zink was also implicitly requiring support for timestamp queries, so I also implemented that in V3DV and then also wrote a patch for Zink to handle this requirement better.

Finally, Zink doesn’t use Vulkan swapchains, instead it creates presentable images directly, which was problematic for us because our platform needs to handle allocations for presentable images specially, so a patch for Zink was also required to address this.

As of the writing of this post, all this work has been merged in Mesa and it enables Zink to run OpenGL 2.1 applications over V3DV on Raspberry Pi 4. Here are a few screenshots of Quake3 taken with the native OpenGL driver (V3D), with the native Vulkan driver (V3DV) and with Zink (over V3DV). There is a significant performance hit with Zink at present, although that is probably not too unexpected at this stage, but otherwise it seems to be rendering correctly, which is what we were really interested to see:


Quake3 Vulkan renderer (V3DV)

Quake3 OpenGL renderer (V3D)

Quake3 OpenGL renderer (Zink + V3DV)

Note: you’ll notice that the Vulkan screenshot is darker than the OpenGL versions. As I reported in another post, that is a feature of the Vulkan port of Quake3 and is unrelated to the driver.

Going forward, we expect to use Zink to test more applications and hopefully identify driver bugs that help us make V3DV better.

by Iago Toral at November 05, 2020 10:14 AM

Brian Kardell

All Them Switches: Responsive Elements and More

All Them Switches: Responsive Elements and More

In this post I'll talk about developments along the way to a 'responsive elements' proposal (aka container queries/element queries use cases) that I talked about earlier this year, a brief detour along the way, and finally, ask for your input on both...

I've been talking a lot this year about the web ecosystem as a commons, its health, and why I believe that diversifying investment in it is both important and productive1,2,3,4,5. Collectively, at Igalia, believe this and we choose to invest in the commons ourselves too. We try to apply our expertise toward helping solve hard problems that have been long stuck, trying to listen to developers and do things that we believe can help further what we think are valuable causes. I'd like to tell you the story of one of those efforts, which became two - and enlist your input..

Advancing the Container Queries Cause

As you may recall, back in Feburary I posted an article explaining that we had been working on this problem a bunch, and sharing our thoughts and progress and just letting people know that something is happening... People are listening, and trying. I also shared that our discussions also prompted David Baron's work toward another possible path.

We wanted to present these together, so by late April we both made informal proposals to the CSS working group of what we'd like to explore. Ours was to begin with a switch() function in CSS focused on slotting into the architecture of CSS in a way that allows us to solve currently impossible problems. If we can show that this works and progress all of the engines, the problem of sugaring an even higher level expression becomes possible, but we deliver useful values fast too.

Neither the CSS working nor anyone involved in any of the proposals is arguing that these are an either/or choice here. We are pursuing options and answering questions, together. We all believe that working this problem from both ends has definite value in both the short and long term and are mutually supportive. We are also excited by Miriam Suzanne's recent work. They are almost certainly complimentary and may even wind up helping each other with different cases.

Shortly after we presented our idea, Igalia also said that we would be willing to invest time to try to do some prototyping and implementation exploration and report back.

Demos and Updates

My colleague Javi Fernadez agreed to tackle initial implementation investigations with some down time he had. Initially, he made some really nice progress pretty quickly, coming up with a strategy, writing some low-level tests and getting them passing. But, then the world got very... you know... hectic.

However, I'm really happy to announce today that that we have recently completed enough to to share and to say we'll be able to take this experience back to report to CSSWG pretty soon.

The following demos represent research and development. Implementation is limited, not yet standard and was done for the purposes of investigation, dicussion and to answer questions necessary for implementers. It is, nevertheless, real, functioning code.

A running demo in a build of Chromium of an image grid component designed independently from page layout, which uses the proposed CSS switch() function to declaratively, responsively change the grid-template-columns that it uses based on the size available to it.

Cool, right? Here's a short "lightning talk" style presentation on it with some more demos too (note the bit of jank you see is dropped frames from my recording, rendering is very fluid as they are in the version embedded above)...

So - I think this is pretty exciting... What do you think? Here are answers to some questions I know people have

FAQ
Why a function and not a MQ or pseduo?
My post from Feb and the proposal explains that this is not an "instead of", but rather a " simpler and powerful step in breaking down the problem, which is luckily also a very useful step on its own". The things we want ultimately and tend to discuss are full of several hard problems, not just one. It's very hard to discuss numerous hypotheticals all at once, especially if they represent enormous amounts of work to imagine how they slot together in existing CSS architecture. That's not to say we shouldn't still try that too, it's just that the path for one is more definite and known. Our proposal, we believe, neatly slots out a few of the hardest problems in CSS and provides an achieveable answer we can make fast progress on in all engines and lessen the number of open questoons. This could allow us to both take on higher level sugar next, but also to fill that gap in user-land until we do. Breaking down problems like this is probably a thing you have done on your own engineering projects. It makes sense.
Why is it called inline available size?
The short answer is because that is accurately what it actually represents internally and there are good arguments for it I'll save for a more detailed post if this carries on, but don't get hung up on that, we haven't begun bikeshedding details of how you write the function and it will change. In fact, these demos use a slightly different format than our proposal because it was easier to parse. Don't get hung up on that either.
Where can you use switch?
You can use anything anywhere, but it will only be valid and meaningful in certain places. The function will only provide an available-inline-size value to switch in places that the CSS WG agrees to list. Sensibly what you can say is that these will never include things that could create circularties because they are used in derermining the available size. So, you wont be able to change the display, or the font with a switch() that depends on inine-available-size, but anything that changes properties of a layout module or paint is probably fair game. CSS could make other switchable values available for other properties though.
Why doesn't it use min-width/max-width style like media queries?
Media Queries L4 supports these examples, we just wanted to show you could. You could just as easily use min-width/max-width here!
Bonus Round: Switching gears...

Shortly after we made our switch proposal, my friend Jon Neal opened a github issue based on some twitter conversations. For the next week or two this thread was very busy with lots of function proposals that looked vaguely "switch-like". In fact, a number of them were also using the word "switch". From these, there are 3 ideas which seem interesting, look somewhat similar, but have some (very) importantly different characteristics, challenges and abilities. They are described below.

nth-value()

This proposal is a function which lets a variable represent an index into a list of possible values. Its use would look like this:

.foo {
  color: 
    nth-value(
      var(--x); 
      red; 
      blue; 
      yellow
    );
}
cond()

This proposal is a function which allows you to pass pairs of math-like conditions and value associations, as well as a default value. The conditions are evaluated from left to right and the value following the first condition to be true is used, or the default if none are. Its use would look like this:

.foo {
  margin-left: 
    cond(
      (50vw < 400px) 2em, 
      (50vw < 800px) 1em, 
      0px
    );
}
switch()

This (our) proposal is a function which works like cond() above, but can provide contextual information only available at appropriate stages in the lifecycle. In the case of layout properties, it would have the same sorts of information available to it as a layout worklet, thus allowing you to do a lot of the things people want to do with "container queries" as in this example below (available-inline-size is the contextual value provided during layout). Its use would look like this:

/* (proposed syntax, to be bikeshed much.. note the demos use a less flexible/different/easier to implement syntax for now ) */ 
.foo {
  grid-template-columns: 
    switch(
      (available-inline-size > 1024px) 1fr 4fr 1fr;
      (available-inline-size > 400px) 2fr 1fr;
      (available-inline-size > 100px) 1fr;
      default 1fr;
    );
}

As similar as these may seem, almost everything about them concretely is different. Each is parsed and follows very different paths around what can be resolved and when, as well as what you can do with them. nth-value(), it was suggested by Mozilla's Emilio Cobos, should be extremely easy to implement because it reuses much of the existing infrastructure for CSS math functions. In fact, he went ahead and implemented it in Mozilla's code base to illustrate.

While things were too hectic to advance our own proposal for a while earlier this year, we did have a enough time to look into that and indeed, the nth-value() proposal was fairly simple to implement in Chromium too! In a very short time, without very sustained investment, we were able to create a complete patch that we could submit.

While nth-value() doesn't help advance the container queries use cases, we agree that it looks like a comparatively easy win for developers, and it might be worth having too.

So, we put it to you: Is it?

We would love your feedback on both of these things - are they things that you would like to see standards bodies and implementers pursue? We certainly are willing to implement a similar prototype for WebKit if necessary if developers are interested and it is standardized. Let us know what you think via @igalia or @briankardell!

November 05, 2020 05:00 AM

October 31, 2020

Eleni Maria Stea

[OpenGL and Vulkan Interoperability on Linux] Part 9: Reusing a Vulkan z buffer from OpenGL

In this 9th post on OpenGL and Vulkan interoperability on Linux with EXT_external_objects and EXT_external_objects_fd we are going to see another extensions use case where a Vulkan depth buffer is used to render a pattern with OpenGL. Like every other example use case described in these posts, it was implemented for Piglit as part of … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 9: Reusing a Vulkan z buffer from OpenGL

by hikiko at October 31, 2020 02:10 PM

October 30, 2020

Jacobo Aragunde

Event management in X11 Chromium

This is a follow-up of my previous post, where I was trying to fix the bug #1042864 in Chromium: key strokes happening on native dialogs, like open and save dialogs, were not reported to the screen reader.

After learning how accessibility tools (ATs) register listeners for key events, I found out the problem was not actually there; I had to investigate how events arrive from the X11 server to the browser, and how they are forwarded to the ATs.

Not this kind of event

Events arrive from the X server

If you are running Chromium on Linux with the X11 backend (most likely, as it is the default), the Chromium browser process receives key press events from the X server. Then, it finds out if the target of those events is one of its browser windows, and sends it to the proper Window object to be processed.

These are the classes involved in the first part of this process:

The interface PlatformEventSource represents an undetermined source of events coming from the platform, and a PlatformEventDispatcher is any object in the browser capable of managing those events, dispatching them to the actual webpage or UI element. These two classes are related, the PlatformEventSource keeps a list of dispatchers it will forward the event to, if they can manage it (CanDispatchEvent).

The X11EventSource class implements PlatformEventSource; it has the code managing the events coming from an X11 server, in particular. It additionally keeps a list of XEventDispatcher objects, which is a class to manage X11 Event objects independently, but it’s not an implementation of PlatformEventDispatcher.

The X11Window class is the central piece, implementing both the PlatformEventDispatcher and the XEventDispatcher interfaces, in addition to the XWindow class. It has all the means required to find out if it can dispatch an event, and do it.

The main event processing loop looks like this:

  1. An event arrives to X11EventSource.
  • X11EventSource loops through its list of XEventDispatcher, and calls CheckCanDispatchNextPlatformEvent for each of them.

  • The X11Window implementing that function checks if the XWindow ID of the event target matches the ID of the XWindow represented by that object, and saves the XEvent object if affirmative.

  • X11EventSource calls DispatchEvent as implemented by its parent class PlatformEventSource.

  • The PlatformEventSource loops through its list of PlatformEventDispatchers and calls CanDispatchEvent on each one of them.

  • The X11Window object, which had previously run CheckCanDispatchNextPlatformEvent, just verifies if the XEvent object was saved then, and considers that a confirmation it can dispatch the event.

  • When one of the dispatchers answers positively, it receives the event for processing in a call to DispatchEvent; it is implemented at X11Window.

  • If it’s a keyboard event, it takes the steps required to send it to any ATs listening to it, which had been previously registered via ATK.

  • When X11Window ends processing the event, it returns POST_DISPATCH_STOP_PROPAGATION, telling PlatformEventSource to stop looping through the rest of dispatchers.

  • This is a sequence diagram summarizing this process:

    Events leave to the ATs

    As explained in the previous post, ATs can register callbacks for key press events, which ultimately call AtkUtilClass::add_key_event_listener. AtkUtilClass is a struct of function pointers, the actual implementation is provided by Chromium in the AtkUtilAuraLinux class, which keeps a list of those callbacks.

    When an X11Window class encounters an event that is targetting its own X Window, and it is a keyboard event, it calls X11ExtensionDelegate::OnAtkEvent() which is actually implemented by the class DesktopWindowTreeHostLinux; it ultimately hands the event to the AtkUtilAuraLinux class and runs HandleAtkEvent(). It will loop through, and run, any listeners that may have been registered.

    Native dialogs are different

    Native dialogs are stand-alone windows in the X server, different from the browser window that called them, and the browser process doesn’t wrap them in X11Window object. It is considered unnecessary, because the windows for native dialogs talk to the X server and receive events from it directly.

    They do belong to the browser process, though, which means that the browser will still receive events targetting the dialog windows. They will go through all the steps mentioned above to eventually be dismissed, because there is no X11Window object in the browser matching the ID of the target window of the process.

    Another consequence of dialog windows belonging to the browser process is that the AtkUtilClass struct points to Chromium’s own implementation, and here comes the problem… The dialog is expected to manage its own events through GTK+ code, including the GTK+ implementation of AtkUtilClass, but Chromium overrode it. The key press listeners that ATs registered are kept in Chromium code, so the dialog cannot notify them.

    Finally, fixing the problem

    Chromium does receive the keyboard events targetted to the dialog windows, but it does nothing with them because the target of those events is not a browser window. It gives us, though, a leg towards building a solution.

    To fix the problem, I made Chromium X Windows manage the keyboard events addressed to the native dialogs in addition to their own. For that, I took advantage of the “transient” property, which indicates a dependency of one window from the other: the dialog window had been set as transient for the browser window. In my first approach, I modified X11Window::CheckCanDispatchNextPlatformEvent() to verify if the target of the event was a transient window of the browser X Window, and in that case it would hand the event to X11ExtensionDelegate to be sent to ATs, following the code patch previously explained. It stopped processing at this point, otherwise the browser window would have received key presses directed to the dialog.

    The approach had one performance problem: I was calling the X server to check that property, for every keystroke, and that call implied using synchronous IPC. This was unacceptable! But it could be worked around: we could also notify the corresponding internal X11Window object about the existence of this transient window, when the dialog is created. This implies no IPC at all, we just store one new property in the X11Window object that can be checked locally when keyboard events are processed.

    This is a link to the review process of the patch, if you are interested in its history. To sum up, in the final solution:

    1. Chromium creates the native dialog and calls XWindow::SetTransientWindow, setting that property in the corresponding browser X Window.
  • When Chromium receives a keyboard event, it is captured by the X11Window object whose transient window property has been set before.

  • X11ExtensionDelegate::OnAtkEvent() is called for that event, then no more processing of this event happens in Chromium.

  • The native dialog code will also receive the event and manage the keystroke accordingly.

  • I hope you enjoyed this trip through Chromium event processing code. If you want to use the diagrams in this post, you may find their Dia source files in this link. Happy hacking!

    by Jacobo Aragunde Pérez at October 30, 2020 05:00 PM

    October 29, 2020

    Claudio Saavedra

    Thu 2020/Oct/29

    In this line of work, we all stumble at least once upon a problem that turns out to be extremely elusive and very tricky to narrow down and solve. If we&aposre lucky, we might have everything at our disposal to diagnose the problem but sometimes that&aposs not the case – and in embedded development it&aposs often not the case. Add to the mix proprietary drivers, lack of debugging symbols, a bug that&aposs very hard to reproduce under a controlled environment, and weeks in partial confinement due to a pandemic and what you have is better described as a very long lucid nightmare. Thankfully, even the worst of nightmares end when morning comes, even if sometimes morning might be several days away. And when the fix to the problem is in an inimaginable place, the story is definitely one worth telling.

    The problem

    It all started with one of Igalia&aposs customers deploying a WPE WebKit-based browser in their embedded devices. Their CI infrastructure had detected a problem caused when the browser was tasked with creating a new webview (in layman terms, you can imagine that to be the same as opening a new tab in your browser). Occasionally, this view would never load, causing ongoing tests to fail. For some reason, the test failure had a reproducibility of ~75% in the CI environment, but during manual testing it would occur with less than a 1% of probability. For reasons that are beyond the scope of this post, the CI infrastructure was not reachable in a way that would allow to have access to running processes in order to diagnose the problem more easily. So with only logs at hand and less than a 1/100 chances of reproducing the bug myself, I set to debug this problem locally.

    Diagnosis

    The first that became evident was that, whenever this bug would occur, the WebKit feature known as web extension (an application-specific loadable module that is used to allow the program to have access to the internals of a web page, as well to enable customizable communication with the process where the page contents are loaded – the web process) wouldn&apost work. The browser would be forever waiting that the web extension loads, and since that wouldn&apost happen, the expected page wouldn&apost load. The first place to look into then is the web process and to try to understand what is preventing the web extension from loading. Enter here, our good friend GDB, with less than spectacular results thanks to stripped libraries.

    #0  0x7500ab9c in poll () from target:/lib/libc.so.6
    #1  0x73c08c0c in ?? () from target:/usr/lib/libEGL.so.1
    #2  0x73c08d2c in ?? () from target:/usr/lib/libEGL.so.1
    #3  0x73c08e0c in ?? () from target:/usr/lib/libEGL.so.1
    #4  0x73bold6a8 in ?? () from target:/usr/lib/libEGL.so.1
    #5  0x75f84208 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #6  0x75fa0b7e in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #7  0x7561eda2 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #8  0x755a176a in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #9  0x753cd842 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #10 0x75451660 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #11 0x75452882 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #12 0x75452fa8 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #13 0x76b1de62 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #14 0x76b5a970 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #15 0x74bee44c in g_main_context_dispatch () from target:/usr/lib/libglib-2.0.so.0
    #16 0x74bee808 in ?? () from target:/usr/lib/libglib-2.0.so.0
    #17 0x74beeba8 in g_main_loop_run () from target:/usr/lib/libglib-2.0.so.0
    #18 0x76b5b11c in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #19 0x75622338 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
    #20 0x74f59b58 in __libc_start_main () from target:/lib/libc.so.6
    #21 0x0045d8d0 in _start ()

    From all threads in the web process, after much tinkering around it slowly became clear that one of the places to look into is that poll() call. I will spare you the details related to what other threads were doing, suffice to say that whenever the browser would hit the bug, there was a similar stacktrace in one thread, going through libEGL to a call to poll() on top of the stack, that would never return. Unfortunately, a stripped EGL driver coming from a proprietary graphics vendor was a bit of a showstopper, as it was the inability to have proper debugging symbols running inside the device (did you know that a non-stripped WebKit library binary with debugging symbols can easily get GDB and your device out of memory?). The best one could do to improve that was to use the gcore feature in GDB, and extract a core from the device for post-mortem analysis. But for some reason, such a stacktrace wouldn&apost give anything interesting below the poll() call to understand what&aposs being polled here. Did I say this was tricky?

    What polls?

    Because WebKit is a multiprocess web engine, having system calls that signal, read, and write in sockets communicating with other processes is an everyday thing. Not knowing what a poll() call is doing and who is it that it&aposs trying to listen to, not very good. Because the call is happening under the EGL library, one can presume that it&aposs graphics related, but there are still different possibilities, so trying to find out what is this polling is a good idea.

    A trick I learned while debugging this is that, in absence of debugging symbols that would give a straightforward look into variables and parameters, one can examine the CPU registers and try to figure out from them what the parameters to function calls are. Let&aposs do that with poll(). First, its signature.

    int poll(struct pollfd *fds, nfds_t nfds, int timeout);

    Now, let's examine the registers.

    (gdb) f 0
    #0  0x7500ab9c in poll () from target:/lib/libc.so.6
    (gdb) info registers
    r0             0x7ea55e58	2124766808
    r1             0x1	1
    r2             0x64	100
    r3             0x0	0
    r4             0x0	0

    Registers r0, r1, and r2 contain poll()&aposs three parameters. Because r1 is 1, we know that there is only one file descriptor being polled. fds is a pointer to an array with one element then. Where is that first element? Well, right there, in the memory pointed to directly by r0. What does struct pollfd look like?

    struct pollfd {
      int   fd;         /* file descriptor */
      short events;     /* requested events */
      short revents;    /* returned events */
    };

    What we are interested in here is the contents of fd, the file descriptor that is being polled. Memory alignment is again in our side, we don&apost need any pointer arithmetic here. We can inspect directly the register r0 and find out what the value of fd is.

    (gdb) print *0x7ea55e58
    $3 = 8

    So we now know that the EGL library is polling the file descriptor with an identifier of 8. But where is this file descriptor coming from? What is on the other end? The /proc file system can be helpful here.

    # pidof WPEWebProcess
    1944 1196
    # ls -lh /proc/1944/fd/8
    lrwx------    1 x x      64 Oct 22 13:59 /proc/1944/fd/8 -> socket:[32166]
    

    So we have a socket. What else can we find out about it? Turns out, not much without the unix_diag kernel module, which was not available in our device. But we are slowly getting closer. Time to call another good friend.

    Where GDB fails, printf() triumphs

    Something I have learned from many years working with a project as large as WebKit, is that debugging symbols can be very difficult to work with. To begin with, it takes ages to build WebKit with them. When cross-compiling, it&aposs even worse. And then, very often the target device doesn&apost even have enough memory to load the symbols when debugging. So they can be pretty useless. It&aposs then when just using fprintf() and logging useful information can simplify things. Since we know that it&aposs at some point during initialization of the web process that we end up stuck, and we also know that we&aposre polling a file descriptor, let&aposs find some early calls in the code of the web process and add some fprintf() calls with a bit of information, specially in those that might have something to do with EGL. What can we find out now?

    Oct 19 10:13:27.700335 WPEWebProcess[92]: Starting
    Oct 19 10:13:27.720575 WPEWebProcess[92]: Initializing WebProcess platform.
    Oct 19 10:13:27.727850 WPEWebProcess[92]: wpe_loader_init() done.
    Oct 19 10:13:27.729054 WPEWebProcess[92]: Initializing PlatformDisplayLibWPE (hostFD: 8).
    Oct 19 10:13:27.730166 WPEWebProcess[92]: egl backend created.
    Oct 19 10:13:27.741556 WPEWebProcess[92]: got native display.
    Oct 19 10:13:27.742565 WPEWebProcess[92]: initializeEGLDisplay() starting.

    Two interesting findings from the fprintf()-powered logging here: first, it seems that file descriptor 8 is one known to libwpe (the general-purpose library that powers the WPE WebKit port). Second, that the last EGL API call right before the web process hangs on poll() is a call to eglInitialize(). fprintf(), thanks for your service.

    Number 8

    We now know that the file descriptor 8 is coming from WPE and is not internal to the EGL library. libwpe gets this file descriptor from the UI process, as one of the many creation parameters that are passed via IPC to the nascent process in order to initialize it. Turns out that this file descriptor in particular, the so-called host client file descriptor, is the one that the freedesktop backend of libWPE, from here onwards WPEBackend-fdo, creates when a new client is set to connect to its Wayland display. In a nutshell, in presence of a new client, a Wayland display is supposed to create a pair of connected sockets, create a new client on the Display-side, give it one of the file descriptors, and pass the other one to the client process. Because this will be useful later on, let&aposs see how is that currently implemented in WPEBackend-fdo.

        int pair[2];
        if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, pair)  0)
            return -1;
    
        int clientFd = dup(pair[1]);
        close(pair[1]);
    
        wl_client_create(m_display, pair[0]);
    

    The file descriptor we are tracking down is the client file descriptor, clientFd. So we now know what&aposs going on in this socket: Wayland-specific communication. Let&aposs enable Wayland debugging next, by running all relevant process with WAYLAND_DEBUG=1. We&aposll get back to that code fragment later on.

    A Heisenbug is a Heisenbug is a Heisenbug

    Turns out that enabling Wayland debugging output for a few processes is enough to alter the state of the system in such a way that the bug does not happen at all when doing manual testing. Thankfully the CI&aposs reproducibility is much higher, so after waiting overnight for the CI to continuously run until it hit the bug, we have logs. What do the logs say?

    WPEWebProcess[41]: initializeEGLDisplay() starting.
      -> wl_display@1.get_registry(new id wl_registry@2)
      -> wl_display@1.sync(new id wl_callback@3)
    

    So the EGL library is trying to fetch the Wayland registry and it&aposs doing a wl_display_sync() call afterwards, which will block until the server responds. That&aposs where the blocking poll() call comes from. So, it turns out, the problem is not necessarily on this end of the Wayland socket, but perhaps on the other side, that is, in the so-called UI process (the main browser process). Why is the Wayland display not replying?

    The loop

    Something that is worth mentioning before we move on is how the WPEBackend-fdo Wayland display integrates with the system. This display is a nested display, with each web view a client, while it is itself a client of the system&aposs Wayland display. This can be a bit confusing if you&aposre not very familiar with how Wayland works, but fortunately there is good documentation about Wayland elsewhere.

    The way that the Wayland display in the UI process of a WPEWebKit browser is integrated with the rest of the program, when it uses WPEBackend-fdo, is through the GLib main event loop. Wayland itself has an event loop implementation for servers, but for a GLib-powered application it can be useful to use GLib&aposs and integrate Wayland&aposs event processing with the different stages of the GLib main loop. That is precisely how WPEBackend-fdo is handling its clients&apos events. As discussed earlier, when a new client is created a pair of connected sockets are created and one end is given to Wayland to control communication with the client. GSourceFunc functions are used to integrate Wayland with the application main loop. In these functions, we make sure that whenever there are pending messages to be sent to clients, those are sent, and whenever any of the client sockets has pending data to be read, Wayland reads from them, and to dispatch the events that might be necessary in response to the incoming data. And here is where things start getting really strange, because after doing a bit of fprintf()-powered debugging inside the Wayland-GSourceFuncs functions, it became clear that the Wayland events from the clients were never dispatched, because the dispatch() GSourceFunc was not being called, as if there was nothing coming from any Wayland client. But how is that possible, if we already know that the web process client is actually trying to get the Wayland registry?

    To move forward, one needs to understand how the GLib main loop works, in particular, with Unix file descriptor sources. A very brief summary of this is that, during an iteration of the main loop, GLib will poll file descriptors to see if there are any interesting events to be reported back to their respective sources, in which case the sources will decide whether to trigger the dispatch() phase. A simple source might decide in its dispatch() method to directly read or write from/to the file descriptor; a Wayland display source (as in our case), will call wl_event_loop_dispatch() to do this for us. However, if the source doesn&apost find any interesting events, or if the source decides that it doesn&apost want to handle them, the dispatch() invocation will not happen. More on the GLib main event loop in its API documentation.

    So it seems that for some reason the dispatch() method is not being called. Does that mean that there are no interesting events to read from? Let&aposs find out.

    System call tracing

    Here we resort to another helpful tool, strace. With strace we can try to figure out what is happening when the main loop polls file descriptors. The strace output is huge (because it takes easily over a hundred attempts to reproduce this), but we know already some of the calls that involve file descriptors from the code we looked at above, when the client is created. So we can use those calls as a starting point in when searching through the several MBs of logs. Fast-forward to the relevant logs.

    socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [128, 130]) = 0
    dup(130)               = 131
    close(130)             = 0
    fcntl64(128, F_DUPFD_CLOEXEC, 0) = 130
    epoll_ctl(34, EPOLL_CTL_ADD, 130, {EPOLLIN, {u32=1639599928, u64=1639599928}}) = 0

    What we see there is, first, WPEBackend-fdo creating a new socket pair (128, 130) and then, when file descriptor 130 is passed to wl_client_create() to create a new client, Wayland adds that file descriptor to its epoll() instance for monitoring clients, which is referred to by file descriptor 34. This way, whenever there are events in file descriptor 130, we will hear about them in file descriptor 34.

    So what we would expect to see next is that, after the web process is spawned, when a Wayland client is created using the passed file descriptor and the EGL driver requests the Wayland registry from the display, there should be a POLLIN event coming in file descriptor 34 and, if the dispatch() call for the source was called, a epoll_wait() call on it, as that is what wl_event_loop_dispatch() would do when called from the source&aposs dispatch() method. But what do we have instead?

    poll([{fd=30, events=POLLIN}, {fd=34, events=POLLIN}, {fd=59, events=POLLIN}, {fd=110, events=POLLIN}, {fd=114, events=POLLIN}, {fd=132, events=POLLIN}], 6, 0) = 1 ([{fd=34, revents=POLLIN}])
    recvmsg(30, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
    

    strace can be a bit cryptic, so let&aposs explain those two function calls. The first one is a poll in a series of file descriptors (including 30 and 34) for POLLIN events. The return value of that call tells us that there is a POLLIN event in file descriptor 34 (the Wayland display epoll() instance for clients). But unintuitively, the call right after is trying to read a message from socket 30 instead, which we know doesn&apost have any pending data at the moment, and consequently returns an error value with an errno of EAGAIN (Resource temporarily unavailable).

    Why is the GLib main loop triggering a read from 30 instead of 34? And who is 30?

    We can answer the latter question first. Breaking on a running UI process instance at the right time shows who is reading from the file descriptor 30:

    #1  0x70ae1394 in wl_os_recvmsg_cloexec (sockfd=30, msg=msg@entry=0x700fea54, flags=flags@entry=64)
    #2  0x70adf644 in wl_connection_read (connection=0x6f70b7e8)
    #3  0x70ade70c in read_events (display=0x6f709c90)
    #4  wl_display_read_events (display=0x6f709c90)
    #5  0x70277d98 in pwl_source_check (source=0x6f71cb80)
    #6  0x743f2140 in g_main_context_check (context=context@entry=0x2111978, max_priority=, fds=fds@entry=0x6165f718, n_fds=n_fds@entry=4)
    #7  0x743f277c in g_main_context_iterate (context=0x2111978, block=block@entry=1, dispatch=dispatch@entry=1, self=)
    #8  0x743f2ba8 in g_main_loop_run (loop=0x20ece40)
    #9  0x00537b38 in ?? ()

    So it&aposs also Wayland, but on a different level. This is the Wayland client source (remember that the browser is also a Wayland client?), which is installed by cog (a thin browser layer on top of WPE WebKit that makes writing browsers easier to do) to process, among others, input events coming from the parent Wayland display. Looking at the cog code, we can see that the wl_display_read_events() call happens only if GLib reports that there is a G_IO_IN (POLLIN) event in its file descriptor, but we already know that this is not the case, as per the strace output. So at this point we know that there are two things here that are not right:

    1. A FD source with a G_IO_IN condition is not being dispatched.
    2. A FD source without a G_IO_IN condition is being dispatched.

    Someone here is not telling the truth, and as a result the main loop is dispatching the wrong sources.

    The loop (part II)

    It is at this point that it would be a good idea to look at what exactly the GLib main loop is doing internally in each of its stages and how it tracks the sources and file descriptors that are polled and that need to be processed. Fortunately, debugging symbols for GLib are very small, so debugging this step by step inside the device is rather easy.

    Let&aposs look at how the main loop decides which sources to dispatch, since for some reason it&aposs dispatching the wrong ones. Dispatching happens in the g_main_dispatch() method. This method goes over a list of pending source dispatches and after a few checks and setting the stage, the dispatch method for the source gets called. How is a source set as having a pending dispatch? This happens in g_main_context_check(), where the main loop checks the results of the polling done in this iteration and runs the check() method for sources that are not ready yet so that they can decide whether they are ready to be dispatched or not. Breaking into the Wayland display source, I know that the check() method is called. How does this method decide to be dispatched or not?

        [](GSource* base) -> gboolean
        {
            auto& source = *reinterpret_cast(base);
            return !!source.pfd.revents;
        },

    In this lambda function we&aposre returning TRUE or FALSE, depending on whether the revents field in the GPollFD structure have been filled during the polling stage of this iteration of the loop. A return value of TRUE indicates the main loop that we want our source to be dispatched. From the strace output, we know that there is a POLLIN (or G_IO_IN) condition, but we also know that the main loop is not dispatching it. So let&aposs look at what&aposs in this GPollFD structure.

    For this, let&aposs go back to g_main_context_check() and inspect the array of GPollFD structures that it received when called. What do we find?

    (gdb) print *fds
    $35 = {fd = 30, events = 1, revents = 0}
    (gdb) print *(fds+1)
    $36 = {fd = 34, events = 1, revents = 1}

    That&aposs the result of the poll() call! So far so good. Now the method is supposed to update the polling records it keeps and it uses when calling each of the sources check() functions. What do these records hold?

    (gdb) print *pollrec->fd
    $45 = {fd = 19, events = 1, revents = 0}
    (gdb) print *(pollrec->next->fd)
    $47 = {fd = 30, events = 25, revents = 1}
    (gdb) print *(pollrec->next->next->fd)
    $49 = {fd = 34, events = 25, revents = 0}

    We&aposre not interested in the first record quite yet, but clearly there&aposs something odd here. The polling records are showing a different value in the revent fields for both 30 and 34. Are these records updated correctly? Let&aposs look at the algorithm that is doing this update, because it will be relevant later on.

      pollrec = context->poll_records;
      i = 0;
      while (pollrec && i  n_fds)
        {
          while (pollrec && pollrec->fd->fd == fds[i].fd)
            {
              if (pollrec->priority = max_priority)
                {
                  pollrec->fd->revents =
                    fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
                }
              pollrec = pollrec->next;
            }
    
          i++;
        }

    In simple words, what this algorithm is doing is to traverse simultaneously the polling records and the GPollFD array, updating the polling records revents with the results of polling. From reading how the pollrec linked list is built internally, it&aposs possible to see that it&aposs purposely sorted by increasing file descriptor identifier value. So the first item in the list will have the record for the lowest file descriptor identifier, and so on. The GPollFD array is also built in this way, allowing for a nice optimization: if more than one polling record – that is, more than one polling source – needs to poll the same file descriptor, this can be done at once. This is why this otherwise O(n^2) nested loop can actually be reduced to linear time.

    One thing stands out here though: the linked list is only advanced when we find a match. Does this mean that we always have a match between polling records and the file descriptors that have just been polled? To answer that question we need to check how is the array of GPollFD structures filled. This is done in g_main_context_query(), as we hinted before. I&aposll spare you the details, and just focus on what seems relevant here: when is a poll record not used to fill a GPollFD?

      n_poll = 0;
      lastpollrec = NULL;
      for (pollrec = context->poll_records; pollrec; pollrec = pollrec->next)
        {
          if (pollrec->priority > max_priority)
            continue;
      ...
    

    Interesting! If a polling record belongs to a source whose priority is lower than the maximum priority that the current iteration is going to process, the polling record is skipped. Why is this?

    In simple terms, this happens because each iteration of the main loop finds out the highest priority between the sources that are ready in the prepare() stage, before polling, and then only those file descriptor sources with at least such a a priority are polled. The idea behind this is to make sure that high-priority sources are processed first, and that no file descriptor sources with lower priority are polled in vain, as they shouldn&apost be dispatched in the current iteration.

    GDB tells me that the maximum priority in this iteration is -60. From an earlier GDB output, we also know that there&aposs a source for a file descriptor 19 with a priority 0.

    (gdb) print *pollrec
    $44 = {fd = 0x7369c8, prev = 0x0, next = 0x6f701560, priority = 0}
    (gdb) print *pollrec->fd
    $45 = {fd = 19, events = 1, revents = 0}

    Since 19 is lower than 30 and 34, we know that this record is before theirs in the linked list (and so it happens, it&aposs the first one in the list too). But we know that, because its priority is 0, it is too low to be added to the file descriptor array to be polled. Let&aposs look at the loop again.

      pollrec = context->poll_records;
      i = 0;
      while (pollrec && i  n_fds)
        {
          while (pollrec && pollrec->fd->fd == fds[i].fd)
            {
              if (pollrec->priority = max_priority)
                {
                  pollrec->fd->revents =
                    fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
                }
              pollrec = pollrec->next;
            }
    
          i++;
        }

    The first polling record was skipped during the update of the GPollFD array, so the condition pollrec && pollrec->fd->fd == fds[i].fd is never going to be satisfied, because 19 is not in the array. The innermost while() is not entered, and as such the pollrec list pointer never moves forward to the next record. So no polling record is updated here, even if we have updated revent information from the polling results.

    What happens next should be easy to see. The check() method for all polled sources are called with outdated revents. In the case of the source for file descriptor 30, we wrongly tell it there&aposs a G_IO_IN condition, so it asks the main loop to call dispatch it triggering a a wl_connection_read() call in a socket with no incoming data. For the source with file descriptor 34, we tell it that there&aposs no incoming data and its dispatch() method is not invoked, even when on the other side of the socket we have a client waiting for data to come and blocking in the meantime. This explains what we see in the strace output above. If the source with file descriptor 19 continues to be ready and with its priority unchanged, then this situation repeats in every further iteration of the main loop, leading to a hang in the web process that is forever waiting that the UI process reads its socket pipe.

    The bug – explained

    I have been using GLib for a very long time, and I have only fixed a couple of minor bugs in it over the years. Very few actually, which is why it was very difficult for me to come to accept that I had found a bug in one of the most reliable and complex parts of the library. Impostor syndrome is a thing and it really gets in the way.

    But in a nutshell, the bug in the GLib main loop is that the very clever linear update of registers is missing something very important: it should skip to the first polling record matching before attempting to update its revents. Without this, in the presence of a file descriptor source with the lowest file descriptor identifier and also a lower priority than the cutting priority in the current main loop iteration, revents in the polling registers are not updated and therefore the wrong sources can be dispatched. The simplest patch to avoid this, would look as follows.

       i = 0;
       while (pollrec && i  n_fds)
         {
    +      while (pollrec && pollrec->fd->fd != fds[i].fd)
    +        pollrec = pollrec->next;
    +
           while (pollrec && pollrec->fd->fd == fds[i].fd)
             {
               if (pollrec->priority = max_priority)

    Once we find the first matching record, let&aposs update all consecutive records that also match and need an update, then let&aposs skip to the next record, rinse and repeat. With this two-line patch, the web process was finally unlocked, the EGL display initialized properly, the web extension and the web page were loaded, CI tests starting passing again, and this exhausted developer could finally put his mind to rest.

    A complete patch, including improvements to the code comments around this fascinating part of GLib and also a minimal test case reproducing the bug have already been reviewed by the GLib maintainers and merged to both stable and development branches. I expect that at least some GLib sources will start being called in a different (but correct) order from now on, so keep an eye on your GLib sources. :-)

    Standing on the shoulders of giants

    At this point I should acknowledge that without the support from my colleagues in the WebKit team in Igalia, getting to the bottom of this problem would have probably been much harder and perhaps my sanity would have been at stake. I want to thank Adrián and &Zcaronan for their input on Wayland, debugging techniques, and for allowing me to bounce back and forth ideas and findings as I went deeper into this rabbit hole, helping me to step out of dead-ends, reminding me to use tools out of my everyday box, and ultimately, to be brave enough to doubt GLib&aposs correctness, something that much more often than not I take for granted.

    Thanks also to Philip and Sebastian for their feedback and prompt code review!

    October 29, 2020 01:10 PM

    October 23, 2020

    Eleni Maria Stea

    [OpenGL and Vulkan Interoperability on Linux] Part 8: Using a Vulkan vertex buffer from OpenGL and then from Vulkan

    This is the 8th post on OpenGL and Vulkan Interoperability with EXT_external_objects and EXT_external_objects_fd where I explain some example use cases of the extensions I’ve implemented for Piglit as part of my work for Igalia. In this example, a Vulkan vertex buffer is created and filled with vertices and then it’s used to render the … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 8: Using a Vulkan vertex buffer from OpenGL and then from Vulkan

    by hikiko at October 23, 2020 05:00 PM

    October 18, 2020

    Eleni Maria Stea

    [OpenGL and Vulkan Interoperability on Linux] Part 7: Reusing a Vulkan vertex buffer from OpenGL

    This is the 7th post on OpenGL and Vulkan Interoperability with EXT_external_objects. It’s about another EXT_external_objects use case implemented for Piglit as part of my work for Igalia‘s graphics team. In this case a vertex buffer is allocated and filled with data from Vulkan and then it’s used from OpenGL to render a pattern on … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 7: Reusing a Vulkan vertex buffer from OpenGL

    by hikiko at October 18, 2020 04:47 PM

    [OpenGL and Vulkan Interoperability on Linux] Part 6: We should be able to reuse a Vulkan pixel buffer from OpenGL but not to overwrite it!

    This is another blog post on OpenGL and Vulkan Interoperability. It’s not really a description of a new use case as the Piglit test I am going to describe is quite similar to the previous example we’ve seen where we reused a Vulkan pixel buffer from OpenGL. This Piglit test was written because there’s an … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 6: We should be able to reuse a Vulkan pixel buffer from OpenGL but not to overwrite it!

    by hikiko at October 18, 2020 11:23 AM

    [OpenGL and Vulkan Interoperability on Linux] Part 5: A Vulkan pixel buffer is reused by OpenGL

    This is the 5th post of the OpenGL and Vulkan interoperability series where I describe some use cases for the EXT_external_objects and EXT_external_objects_fd extensions. These use cases have been implemented inside Piglit as part of my work for Igalia‘s graphics team using a Vulkan framework I’ve written for this purpose. And in this 5th post, … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 5: A Vulkan pixel buffer is reused by OpenGL

    by hikiko at October 18, 2020 10:02 AM

    October 17, 2020

    Eleni Maria Stea

    [OpenGL and Vulkan Interoperability on Linux] Part 4: Using OpenGL to overwrite Vulkan allocated textures.

    This is the 4th post on OpenGL and Vulkan Interoperability on Linux. The first one was an introduction to EXT_external_objects and EXT_external_objects_fd extensions, the second was describing a simple interoperability use case where a Vulkan allocated textured is filled by OpenGL, and the third was about a slightly more complex use case where a Vulkan … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 4: Using OpenGL to overwrite Vulkan allocated textures.

    by hikiko at October 17, 2020 08:06 AM

    October 16, 2020

    Enrique Ocaña

    Figuring out corrupt stacktraces on ARM

    If you’re developing C/C++ on embedded devices, you might already have stumbled upon a corrupt stacktrace like this when trying to debug with gdb:

    (gdb) bt 
    #0  0xb38e32c4 in pthread_getname_np () from /home/enrique/buildroot/output5/staging/lib/libpthread.so.0
    #1  0xb38e103c in __lll_timedlock_wait () from /home/enrique/buildroot/output5/staging/lib/libpthread.so.0 
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    In these cases I usually give up gdb and try to solve my problems by adding printf()s and resorting to other tools. However, there are times when you really really need to know what is in that cursed stack.

    ARM devices subroutine calls work by setting the return address in the Link Register (LR), so the subroutine knows where to point the Program Counter (PC) register to. While not jumping into subroutines, the values of the LR register is saved in the stack (to be restored later, right before the current subroutine returns to the caller) and the register can be used for other tasks (LR is a “scratch register”). This means that the functions in the backtrace are actually there, in the stack, in the form of older saved LRs, waiting for us to get them.

    So, the first step would be to dump the memory contents of the backtrace, starting from the address pointed by the Stack Pointer (SP). Let’s print the first 256 32-bit words and save them as a file from gdb:

    (gdb) set logging overwrite on
    (gdb) set logging file /tmp/bt.txt
    (gdb) set logging on
    Copying output to /tmp/bt.txt.
    (gdb) x/256wa $sp
    0xbe9772b0:     0x821e  0xb38e103d   0x1aef48   0xb1973df0
    0xbe9772c0:      0x73d  0xb38dc51f        0x0          0x1
    0xbe9772d0:   0x191d58    0x191da4   0x19f200   0xb31ae5ed
    ...
    0xbe977560: 0xb28c6000  0xbe9776b4        0x5      0x10871 <main(int, char**)>
    0xbe977570: 0xb6f93000  0xaaaaaaab 0xaf85fd4a   0xa36dbc17
    0xbe977580:      0x130         0x0    0x109b9 <__libc_csu_init> 0x0
    ...
    0xbe977690:        0x0         0x0    0x108cd <_start>  0x0
    0xbe9776a0:        0x0     0x108ed <_start+32>  0x10a19 <__libc_csu_fini> 0xb6f76969  
    (gdb) set logging off
    Done logging to /tmp/bt.txt.

    Gdb already can name some of the functions (like main()), but not all of them. At least not the ones more interesting for our purpose. We’ll have to look for them by hand.

    We first get the memory page mapping from the process (WebKit’s WebProcess in my case) looking in /proc/pid/maps. I’m retrieving it from the device (named metro) via ssh and saving it to a local file. I’m only interested in the code pages, those with executable (‘x’) permissions:

    $ ssh metro 'cat /proc/$(ps axu | grep WebProcess | grep -v grep | { read _ P _ ; echo $P ; })/maps | grep " r.x. "' > /tmp/maps.txt

    The file looks like this:

    00010000-00011000 r-xp 00000000 103:04 2617      /usr/bin/WPEWebProcess
    ...
    b54f2000-b6e1e000 r-xp 00000000 103:04 1963      /usr/lib/libWPEWebKit-0.1.so.2.2.1 
    b6f6b000-b6f82000 r-xp 00000000 00:02 816        /lib/ld-2.24.so 
    be957000-be978000 rwxp 00000000 00:00 0          [stack] 
    be979000-be97a000 r-xp 00000000 00:00 0          [sigpage] 
    be97b000-be97c000 r-xp 00000000 00:00 0          [vdso] 
    ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

    Now we process the backtrace to remove address markers and have one word per line:

    $ cat /tmp/bt.txt | sed -e 's/^[^:]*://' -e 's/[<][^>]*[>]//g' | while read A B C D; do echo $A; echo $B; echo $C; echo $D; done | sed 's/^0x//' | while read P; do printf '%08x\n' "$((16#"$P"))"; done | sponge /tmp/bt.txt

    Then merge and sort both files, so the addresses in the stack appear below their corresponding mappings:

    $ cat /tmp/maps.txt /tmp/bt.txt | sort > /tmp/merged.txt

    Now we process the resulting file to get each address in the stack with its corresponding mapping:

    $ cat /tmp/merged.txt | while read LINE; do if [[ $LINE =~ - ]]; then MAPPING="$LINE"; else echo $LINE '-->' $MAPPING; fi; done | grep '/' | sed -E -e 's/([0-9a-f][0-9a-f]*)-([0-9a-f][0-9a-f]*)/\1 - \2/' > /tmp/mapped.txt

    Like this (address in the stack, page start (or base), page end, page permissions, executable file load offset (base offset), etc.):

    0001034c --> 00010000 - 00011000 r-xp 00000000 103:04 2617 /usr/bin/WPEWebProcess
    ...
    b550bfa4 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/libWPEWebKit-0.1.so.2.2.1 
    b5937445 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/libWPEWebKit-0.1.so.2.2.1 
    b5fb0319 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/libWPEWebKit-0.1.so.2.2.1
    ...

    The addr2line tool can give us the exact function an address belongs to, or even the function and source code line if the code has been built with symbols. But the addresses addr2line understands are internal offsets, not absolute memory addresses. We can convert the addresses in the stack to offsets with this expression:

    offset = address - page start + base offset

    I’m using buildroot as my cross-build environment, so I need to pick the library files from the staging directory because those are the unstripped versions. The addr2line tool is the one from the buldroot cross compiling toolchain. Written as a script:

    $ cat /tmp/mapped.txt | while read ADDR _ BASE _ END _ BASEOFFSET _ _ FILE; do OFFSET=$(printf "%08x\n" $((0x$ADDR - 0x$BASE + 0x$BASEOFFSET))); FILE=~/buildroot/output/staging/$FILE; if [[ -f $FILE ]]; then LINE=$(~/buildroot/output/host/usr/bin/arm-buildroot-linux-gnueabihf-addr2line -p -f -C -e $FILE $OFFSET); echo "$ADDR $LINE"; fi; done > /tmp/addr2line.txt

    Finally, we filter out the useless [??] entries:

    $ cat /tmp/bt.txt | while read DATA; do cat /tmp/addr2line.txt | grep "$DATA"; done | grep -v '[?][?]' > /tmp/fullbt.txt

    What remains is something very similar to what the real backtrace should have been if everything had originally worked as it should in gdb:

    b31ae5ed gst_pad_send_event_unchecked en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5571 
    b31a46c1 gst_debug_log en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstinfo.c:444 
    b31b7ead gst_pad_send_event en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5775 
    b666250d WebCore::AppendPipeline::injectProtectionEventIfPending() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/graphics/gstreamer/mse/AppendPipeline.cpp:1360 
    b657b411 WTF::GRefPtr<_GstEvent>::~GRefPtr() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/glib/GRefPtr.h:76 
    b5fb0319 WebCore::HTMLMediaElement::pendingActionTimerFired() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/html/HTMLMediaElement.cpp:1179 
    b61a524d WebCore::ThreadTimers::sharedTimerFiredInternal() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/ThreadTimers.cpp:120 
    b61a5291 WTF::Function<void ()>::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::{lambda()#1}>::call() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/Function.h:101 
    b6c809a3 operator() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:171 
    b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
    b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
    b2ad4223 g_main_context_dispatch en :? 
    b6c80601 WTF::{lambda(_GSource*, int (*)(void*), void*)#1}::_FUN(_GSource*, int (*)(void*), void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:40 
    b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
    b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
    b2adfc49 g_poll en :? 
    b2ad44b7 g_main_context_iterate.isra.29 en :? 
    b2ad477d g_main_loop_run en :? 
    b6c80de3 WTF::RunLoop::run() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:97 
    b6c654ed WTF::RunLoop::dispatch(WTF::Function<void ()>&&) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/RunLoop.cpp:128 
    b5937445 int WebKit::ChildProcessMain<WebKit::WebProcess, WebKit::WebProcessMain>(int, char**) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebKit/Shared/unix/ChildProcessMain.h:64 
    b27b2978 __bss_start en :?

    I hope you find this trick useful and the scripts handy in case you ever to resort to examining the raw stack to get a meaningful backtrace.

    Happy debugging!

    by eocanha at October 16, 2020 07:07 PM

    October 15, 2020

    Andy Wingo

    on "binary security of webassembly"

    Greets!

    You may have seen an interesting paper cross your radar a couple months ago: Everything Old is New Again: Binary Security of WebAssembly, by Daniel Lehmann, Johannes Kinder and Michael Pradel. The paper makes some strong claims and I would like to share some thoughts on it.

    reader-response theory

    For context, I have been working on web browsers for the last 8 years or so, most recently on the JavaScript and WebAssembly engine in Firefox. My work mostly consists of implementing new features, which if you are familiar with software development translates as "writing bugs". Almost all of those bugs are security bugs, potentially causing Firefox to go from being an agent of the user to an agent of the Mossad, or of cryptocurrency thieves, or anything else.

    Mitigating browser bug flow takes a siege mentality. Web browsers treat all web pages and their corresponding CSS, media, JavaScript, and WebAssembly as hostile. We try to reason about global security properties, and translate those properties into invariants ensured at compile-time and run-time, for example to ensure that a web page from site A can't access cookies from site B.

    In this regard, WebAssembly has some of the strongest isolation invariants in the whole platform. A WebAssembly module has access to nothing, by default: neither functionality nor data. Even a module's memory is isolated from the rest of the browser, both by construction (that's just how WebAssembly is specified) and by run-time measures (given that pointers are 32 bits in today's WebAssembly, we generally reserve a multi-gigabyte region for a module's memory that can contain nothing else).

    All of this may seem obvious, but consider that a C++ program compiled to native code on a POSIX platform can use essentially everything that the person running it has access to: your SSH secrets, your email, all of your programs, and so on. That same program compiled to WebAssembly does not -- any capability it has must have been given to it by the person running the program. For POSIX-like programs, the WebAssembly community is working on a POSIX for the web that standardizes a limited-capability access to data and functionality from the world, and in web browsers, well of course the module has access only to the capabilities that the embedding web page gives to it. Mostly, as the JS run-time accompanying the WebAssembly is usually generated by emscripten, this set of capabilties is a function of the program itself.

    Of course, complex WebAssembly systems may contain multiple agents, acting on behalf of different parties. For example, a module might, through capabilities provided by the host, be able to ask flickr to delete a photo, but might also be able to crawl a URL for photos. Probably in this system, crawling a web page shouldn't be able to "trick" the WebAssembly module into deleting a photo. The C++ program compiled to WebAssembly could have a bug of course, in which case, you get to keep both pieces.

    I mention all of this because we who work on WebAssembly are proud of this work! It is a pleasure to design and build a platform for high-performance code that provides robust capabilities-based security properties.

    the new criticism

    Therefore it was with skepticism that I started reading the Lehmann et al paper. The paper focusses on WebAssembly itself, not any particular implementation thereof; what could be wrong about WebAssembly?

    I found the answer to be quite nuanced. To me, the paper shows three interesting things:

    1. Memory-safety bugs in C/C++ programs when compiled to WebAssembly can cause control-flow edges that were not present in the source program.

    2. Unexpected control-flow in a web browser can sometimes end up in a call to eval with the permissions of the web page, which is not good.

    3. It's easier in some ways to exploit bugs in a C/C++ program when compiled to WebAssembly than when compiled natively, because many common mitigations aren't used by the WebAssembly compiler toolchain.

    Firstly, let's dicuss the control-flow point. Let's say that the program has a bug, and you have made an exploit to overwrite some memory location. What can you do with it? Well, consider indirect calls (call_indirect). This is what a compiler will emit for a vtable method call, or for a call to a function pointer. The possible targets for the indirect call are stored in a table, which is a side array of all possible call_indirect targets. The actual target is selected at run-time based on an index; WebAssembly function pointers are just indices into this table.

    So if a function loads an index into the indirect call table from memory, and some exploit can change this index, then you can cause a call site to change its callee. Although there is a run-time type check that occurs at the call_indirect site to ensure that the callee is called with the right type, many functions in a module can have compatible types and thus be callable without an error.

    OK, so that's not great. But what can you do with it? Well it turns out that emscripten will sometimes provide JavaScript's eval to the WebAssembly module. Usually it will be called only with a static string, but anything can happen. If an attacker can redirect a call site to eval instead of one of the possible targets from the source code, you can (e.g.) send the user's cookies to evil.com.

    There's a similar vulnerability regarding changing the operand to eval, instead. Strings are represented in linear memory as well, and there's no write protection on them, even if they are read-only data. If your write primitive can change the string being passed to eval, that's also a win for the attacker. More details in the paper.

    This observation brings us to the last point, which is that many basic mitigations in (e.g.) POSIX deployments aren't present in WebAssembly. There are no OS-level read-only protections for static data, and the compiler doesn't enforce this either. Also WebAssembly programs have to bundle their own malloc, but the implementations provided by emscripten don't implement the "hardening" techniques. There is no addres-space layout randomization, so exploits are deterministic. And so on.

    on mitigations

    It must be said that for most people working on WebAssembly, security "mitigations" are... unsatisfactory. They aren't necessary for memory-safe programs, and they can't prevent memory-unsafe programs from having unexpected behavior. Besides, we who work on WebAssembly are more focussed on the security properties of the WebAssembly program as embedded in its environment, but not on the program itself. Garbage in, garbage out, right?

    In that regard, I think that one answer to this paper is just "don't". Don't ship memory-unsafe programs, or if you do, don't give them eval capabilities. No general mitigation will make these programs safe. Writing your program in e.g. safe Rust is a comprehensive fix to this class of bug.

    But, we have to admit also that shipping programs written in C and C++ is a primary goal of WebAssembly, and that no matter how hard we try, some buggy programs will get shipped, and therefore that there is marginal value to including mitigations like read-only data or even address space randomization. We definitely need to work on getting control-flow integrity protections working well with the WebAssembly toolchain, probably via multi-table support (part of the reference types extension; my colleague Paulo Matos just landed a patch in this area). And certainly Emscripten should work towards minimizing the capabilities set exposed to WebAssembly by the generated JavaScript, notably by compiling away uses of eval by embind.

    Finally, I think that many of the problems identified by this paper will be comprehensively fixed in a different way by more "managed" languages. The problem is that C/C++ pointers are capabilities into all of undifferentiated linear memory. By contrast, handles to GC-managed objects are unforgeable: given object A, you can't get to object B except if object A references B. It would be great if we could bring some of the benefits of this more capability-based approach to in-memory objects to languages like C and C++; more on that in a future note, I think.

    chapeau

    In the end, despite my initial orneriness, I have to admit that the paper authors point out some interesting areas to work on. It's clear that there's more work to do. I was also relieved to find that my code is not at fault in this particular instance :) Onwards and upwards, and until next time, happy hacking!

    by Andy Wingo at October 15, 2020 10:29 AM

    October 13, 2020

    Andy Wingo

    malloc as a service

    Greetings, internet! Today I have the silliest of demos for you: malloc-as-a-service.

    loading walloc...

    JavaScript disabled, no walloc demo. See the walloc web page for more information. >&&<&>>>>>&&>

    The above input box, if things managed to work, loads up a simple bare-bones malloc implementation, and exposes "malloc" and "free" bindings. But the neat thing is that it's built without emscripten: it's a standalone C file that compiles directly to WebAssembly, with no JavaScript run-time at all. I share it here because it might come in handy to people working on WebAssembly toolchains, and also because it was an amusing experience to build.

    wat?

    The name of the allocator is "walloc", in which the w is for WebAssembly.

    Walloc was designed with the following priorities, in order:

    1. Standalone. No stdlib needed; no emscripten. Can be included in a project without pulling in anything else.

    2. Reasonable allocation speed and fragmentation/overhead.

    3. Small size, to minimize download time.

    4. Standard interface: a drop-in replacement for malloc.

    5. Single-threaded (currently, anyway).

    Emscripten includes a couple of good malloc implementations (dlmalloc and emmalloc) which probably you should use instead. But if you are really looking for a bare-bones malloc, walloc is fine.

    You can check out all the details at the walloc project page; a selection of salient bits are below.

    Firstly, to build walloc, it's just a straight-up compile:

    clang -DNDEBUG -Oz --target=wasm32 -nostdlib -c -o walloc.o walloc.c
    

    The resulting walloc.o is a conforming WebAssembly file on its own, but which also contains additional symbol table and relocation sections which allow wasm-ld to combine separate compilation units into a single final WebAssembly file. walloc.c on its own doesn't import or export anything, in the WebAssembly sense; to make bindings visible to JS, you need to add a little wrapper:

    typedef __SIZE_TYPE__ size_t;
    
    #define WASM_EXPORT(name) \
      __attribute__((export_name(#name))) \
      name
    
    // Declare these as coming from walloc.c.
    void *malloc(size_t size);
    void free(void *p);
                              
    void* WASM_EXPORT(walloc)(size_t size) {
      return malloc(size);
    }
    
    void WASM_EXPORT(wfree)(void* ptr) {
      free(ptr);
    }
    

    If you compile that to exports.o and link via wasm-ld --no-entry --import-memory -o walloc.wasm exports.o walloc.o, you end up with the walloc.wasm used in the demo above. See your inspector for the URL.

    The resulting wasm file is about 2 kB (uncompressed).

    Walloc isn't the smallest allocator out there. A simple bump-pointer allocator that never frees is the fastest thing you can have. There is also an alternate allocator for Rust, wee_alloc, which is said to be smaller than walloc, though I think it is less space-efficient for small objects. But still, walloc is pretty small.

    implementation notes

    When a C program is compiled to WebAssembly, the resulting wasm module (usually) has associated linear memory. It can be linked in a way that the memory is created by the module when it's instantiated, or such that the module is given a memory by its host. The above example passed --import-memory to the linker, allowing the host to bound memory usage for the module instance.

    The linear memory has the usual data, stack, and heap segments. The data and stack are placed first. The heap starts at the &__heap_base symbol. (This symbol is computed and defined by the linker.) All bytes above &__heap_base can be used by the wasm program as it likes. So &__heap_base is the lower bound of memory managed by walloc.

                                                  memory growth ->
    +----------------+-----------+-------------+-------------+----
    | data and stack | alignment | walloc page | walloc page | ...
    +----------------+-----------+-------------+-------------+----
    ^ 0              ^ &__heap_base            ^ 64 kB aligned
    

    Interestingly, there are a few different orderings of data and stack used by different toolchains. It used to even be the case that the stack grew up. This diagram from the recent "Everything Old is New Again: Binary Security of WebAssembly" paper by Lehmann et al is a good summary:

    The sensible thing to prevent accidental overflow (underflow, really) is to have the stack grow down to 0, with data at higher addresses. But this can cause WebAssembly code that references data to take up more bytes, because addresses are written using variable-length "LEB" encodings that favor short offsets, so it isn't the default, right now at least.

    Anyway! The upper bound of memory managed by walloc is the total size of the memory, which is aligned on 64-kilobyte boundaries. (WebAssembly ensures this alignment.) Walloc manages memory in 64-kb pages as well. It starts with whatever memory is initially given to the module, and will expand the memory if it runs out. The host can specify a maximum memory size, in pages; if no more pages are available, walloc's malloc will simply return NULL; handling out-of-memory is up to the caller.

    Walloc has two allocation strategies: small and large objects.

    big bois

    A large object is more than 256 bytes.

    There is a global freelist of available large objects, each of which has a header indicating its size. When allocating, walloc does a best-fit search through that list.

    struct large_object {
      struct large_object *next;
      size_t size;
      char payload[0];
    };
    struct large_object* large_object_free_list;
    

    Large object allocations are rounded up to 256-byte boundaries, including the header.

    If there is no object on the freelist that can satisfy an allocation, walloc will expand the heap by the size of the allocation, or by half of the current walloc heap size, whichever is larger. The resulting page or pages form a large object that can satisfy the allocation.

    If the best object on the freelist has more than a chunk of space on the end, it is split, and the tail put back on the freelist. A chunk is 256 bytes.

    +-------------+---------+---------+-----+-----------+
    | page header | chunk 1 | chunk 2 | ... | chunk 255 |
    +-------------+---------+---------+-----+-----------+
    ^ +0          ^ +256    ^ +512                      ^ +64 kB
    

    As each page is 65536 bytes, and each chunk is 256 bytes, there are therefore 256 chunks in a page. The first chunk in a page that begins an allocated object, large or small, contains a header chunk. The page header has a byte for each of the 256 chunks in the page. The byte is 255 if the corresponding chunk starts a large object; otherwise the byte indicates the size class for packed small-object allocations (see below).

    +-------------+---------+---------+----------+-----------+
    | page header | large object 1    | large object 2 ...   |
    +-------------+---------+---------+----------+-----------+
    ^ +0          ^ +256    ^ +512                           ^ +64 kB
    

    When splitting large objects, we avoid starting a new large object on a page header chunk. A large object can only span where a page header chunk would be if it includes the entire page.

    Freeing a large object pushes it on the global freelist. We know a pointer is a large object by looking at the page header. We know the size of the allocation, because the large object header precedes the allocation. When the next large object allocation happens after a free, the freelist will be compacted by merging adjacent large objects.

    small fry

    Small objects are allocated from segregated freelists. The granule size is 8 bytes. Small object allocations are packed in a chunk of uniform allocation size. There are size classes for allocations of each size from 1 to 6 granules, then 8, 10, 16, and 32 granules; 10 sizes in all. For example, an allocation of e.g. 12 granules will be satisfied from a 16-granule chunk. Each size class has its own free list.

    struct small_object_freelist {
      struct small_object_freelist *next;
    };
    struct small_object_freelist small_object_freelists[10];
    

    When allocating, if there is nothing on the corresponding freelist, walloc will allocate a new large object, then change its chunk kind in the page header to the size class. It then goes through the fresh chunk, threading the objects through each other onto a free list.

    +-------------+---------+---------+------------+---------------------+
    | page header | large object 1    | granules=4 | large object 2' ... |
    +-------------+---------+---------+------------+---------------------+
    ^ +0          ^ +256    ^ +512    ^ +768       + +1024               ^ +64 kB
    

    In this example, we imagine that the 4-granules freelist was empty, and that the large object freelist contained only large object 2, running all the way to the end of the page. We allocated a new 4-granules chunk, splitting the first chunk off the large object, and pushing the newly trimmed large object back onto the large object freelist, updating the page header appropriately. We then thread the 4-granules (32-byte) allocations in the fresh chunk together (the chunk has room for 8 of them), treating them as if they were instances of struct freelist, pushing them onto the global freelist for 4-granules allocations.

               in fresh chunk, next link for object N points to object N+1
                                     /--------\                     
                                     |        |
                +------------------+-^--------v-----+----------+
    granules=4: | (padding, maybe) | object 0 | ... | object 7 |
                +------------------+----------+-----+----------+
                                   ^ 4-granule freelist now points here 
    

    The size classes were chosen so that any wasted space (padding) is less than the size class.

    Freeing a small object pushes it back on its size class's free list. Given a pointer, we know its size class by looking in the chunk kind in the page header.

    and that's it

    Hey have fun with the thing! Let me know if you find it useful. Happy hacking and until next time!

    by Andy Wingo at October 13, 2020 01:34 PM

    October 01, 2020

    Sergio Villar

    Closing the gap (in flexbox 😇)

    Flexbox had a lot of early problems, but by mid-May 2020 where our story begins, both Firefox and Chromium had done a lot of work on improving things with this feature. WebKit, however, hadn’t caught up. Prioritizing the incredible amounts of work a web engine requires is difficult. The WebKit implementation was still passable for very many (most) cases of the core features, and it didn’t have problems that caused crashes or something that urgently demanded attention, so engineers dedicated their limited time toward other things. The net result, however, was that as this choice repeated many times, the comparative state of WebKit’s flexbox implementation had fallen behind pretty significantly.
    Web Platform Tests (WPT) is a huge ongoing effort from many people to come up with a very extensive list of tests that could help both spec editors and implementors to make sure we have great compatibility. In the case of flexbox, for example, there are currently 773 tests (2926 subtests) and WebKit was failing a good amount of them. This matters a lot because there are things that flexbox is ideal for, and it is exceptionally widely used. In mid-May, Igalia was contracted to improve things here, and in this post, I’ll explain and illustrate how we did that.

    The Challenge

    The main issues were (in no particular order):
    • min-width:auto and min-height:auto handling
    • Nested flexboxes in column flows
    • Flexboxes inside tables and viceversa
    • Percentages in heights with indefinite sizes
    • WebKit CI not runnning many WPT flexbox tests
    • and of course… lack of gap support in Flexbox
    Modifying Flexbox layout code is a challenge by itself. Tiny modifications in the source code could cause huge differences in the final layout. You might even have a patch that passes all the tests and regresses multiple popular web sites.
    Good news is that we were able to tackle most of those issues. Let’s review what changes you could eventually expect from future releases of Safari (note that Apple doesn’t disclose information about future products and/or releases) and the other WebKit based browsers (like GNOME Web).

    Flexbox gaps 🥳🎉

    Probably one of the most awaited features in WebKit by web developers. It’s finally here after Firefox and Chrome landed it not so long ago. The implementation was initially inspired by the one in Chrome but then it diverged a bit in the final version of the patch. The important thing is that the behaviour should be the same, at least all the tests in WPT related to gaps are passing now in WebKit trunk.
    
    <div style="display: flex; flex-wrap: wrap; gap: 1ch">
      <div style="background: magenta; color: white">Lorem</div>
      <div style="background: green; color: white">ipsum</div>
      <div style="background: orange; color: white">dolor</div>
      <div style="background: blue; color: white">sit</div>
      <div style="background: brown; color: white">amet</div>
    </div>
    
    

    Tables as flex items

    Tables should obey the flex container sizing whenever they are flex items. As it can be seen in the examples bellow, the tables’ layout code was kicking in and ignoring the constraints set by the flex container. Tables should do what the flex algorithm mandates and thus they should allow being stretched/squeezed as required.
    
    <div style="display:flex; width:100px; background:red;">
      <div style="display:table; width:10px; max-width:10px; height:100px; background:green;">
        <div style="width:100px; height:10px; background:green;"></div>
      </div>
    </div>
    
    

    Tables with items exceeding the 100% of available size

    This is the case of tables placed inside flex items. The automatic layout table algorithm was generating tables with unlimited widths when the sum of the sizes of their columns (expressed in percentages) was exceeding the 100%. It was impossible to fulfill at the same time the constraints set by tables and flexbox algorithms.
    
    <div style="display:flex; width:100px; height:100px; align-items:flex-start; background:green;">
      <div style="flex-grow:1; flex-shrink:0;">
        <table style="height:50px; background:green;" cellpadding="0" cellspacing="0">
          <tr>
            <td style="width:100%; background:green;"> </td>
            <td style="background:green;"> </td>
          </tr>
        </table>
      </div>
    </div>
    
    
    Note how the table was growing indefinitely (I cropped the “Before” picture to fit in the post) to the right before the fix.

    Alignment in single-line flexboxes

    Interesting case. The code was considering that single-line flexboxes were those where all the flex items were placed in a single line after computing the required space for them. Though sensible, that’s not what a single line flexbox is, it’s a flex container with flex-wrap:nowrap. This means that a flex container with flex-wrap:wrap whose children do not need more than 1 flex line to be placed is not a single-line flex container from the specs POV (corolary: implementing specs is hard).
    
    <div style="display: flex; flex-wrap: wrap; align-content: flex-end; width: 425px; height: 70px; border: 2px solid black">
      <div style="height: 20px">This text should be at the bottom of its container</div>
    </div>
    
    

    Percentages in flex items with indefinite sizes

    One of the trickiest ones. Although it didn’t involve a lot of code it caused two serious regressions in Youtube’s upload form and when viewing Twitter videos in fullscreen which required some previous fixes and delayed a bit the landing of this patch. Note that this behaviour was really conflictive from the pure specification POV as there were many changes over the time. Defining a good behaviour is really complicated. Without entering in too much details, flexbox has a couple of cases were sizes are considered as definite when they are theoretically indefinite. In this case we consider that if the flex container main size is definite then the post-flexing size of flex items is also treated as definite.
    
    <div style="display: flex; flex-direction: column; height: 150px; width: 150px; border: 2px solid black;">
      <div>
        <div style="height: 50%; overflow: hidden;">
          <div style="width: 50px; height: 50px; background: green;"></div>
        </div>
      <div>
      <div style="flex: none; width: 50px; height: 50px; background: green;"></div>
    </div>
    
    

    Hit testing with overlapping flex items

    There were some issues with pointer events passing through overlapping flex items (due to negative margins for example). This was fixed by letting the hit testing code proceed in reverse (the opposite to painting) order-modified document order instead of using the raw order from the DOM.
    
    <div style="display:flex; border: 1px solid black; width: 300px;">
      <a style="width: 200px;" href="#">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua</a>
      <div style="margin-left: -200px; width: 130px; height: 50px; background: orange;"></div>
    </div>
    
    
    In the “Before” case hit testing was bypassing the orange block and thus, the cursor was showing a hand because it detected that it was hovering a link. After the fix, the cursor is properly rendered as an arrow because the orange block covers the underneath link.

    Computing percentages with scrollbars

    In this case the issue was that, in order to compute percentages in heights, we were incorrectly using the size of the scrollbars too.
    
    <div style="display: inline-flex; height: 10em;">
      <div style="overflow-x: scroll;">
        <div style="width: 200px; height: 100%; background: green"></div>
      </div>
    </div>
    
    
    Note that in the “After” picture the horizontal scrollbar background is visible while in the “Before” the wrong height computation made the flex item overlap the scrollbar.

    Image items with specific sizes

    The flex layout algorithm needs the intrinsic sizes of the flex items to compute their sizes and the size of the flex container. Changes to those intrinsic sizes should trigger new layouts, and the code was not doing that.
    
    <!-- Just to showcase how the img bellow is not properly sized -->
    <div style="position: absolute; background-color: red; width: 50px; height: 50px; z-index: -1;"></div>
     
    <div style="display: flex; flex-direction: column; width: 100px; height: 5px;">
      <img style="width: 100px; height: 100px;" src="https://wpt.live/css/css-flexbox/support/100x100-green.png">
    </div>
    
    
    

    Nested flexboxes with ‘min-height: auto’

    Another tricky one and another one related to the handling of nested column flexboxes. As in the previous issue with nested column flexboxes the problem was that we were not supporting this case. For those wanting to have a deeper understanding of the issue this bug was about implementing section 4.5 of the specs. This was one of the more complicated ones to fix, Edward Lorenz would love that part of the layout code, the slightest change in one of those source code lines could trigger huge changes in the final rendering.
    
    <div style='display:flex; flex-direction: column; overflow-y: scroll; width: 250px; height: 250px; border: 1px solid black'>
      <div style='display:flex;'>
        <div style="width: 100px; background: blue"></div>
        <div style='width: 120px; background: orange'></div>
        <div style='width: 10px; background: yellow; height: 300px'></div>
      </div>
    </div>
    
    
    As it can be seen, in the “Before” picture the blue and orange blocks are sized differently to the yellow one. That’s fixed in the “After” picture.

    Percentages in quirks mode

    Another one affecting how percentages are computed in heights, but this one specific to quirks mode. We’re matching now Firefox, Chrome and pre-Chromium Edge, i.e., flexbox should not care much about quirks mode since it was invented many years after quirky browsers dominated the earth.
    
    <!DOCTYPE html PUBLIC>
    <div style="width: 100px; height: 50px;">
      <div style="display: flex; flex-direction: column; outline: 2px solid blue;">
        <div style="flex: 0 0 50%"></div>
      </div>
    </div>
    
    

    Percentages in ‘flex-basis’

    Percentages were working generally fine inside flex-basis, however there was one particular problematic case. It arose whenever that percentage was refererring to, oh surprise, and indefinite height. And again, we’re talking about nested flexboxes with column flows. Indeed, definite/indefinite sizes is one of the toughest things to get right from the layout POV. In this particular case, the fix was to ignore the percentages and and treat them as height: auto.
    
    <div style="display: flex; flex-direction: column; width: 200px;">
      <div style="flex-basis: 0%; height: 100px; background: red;">
        <div style="background: lime">Here's some text.</div>
      </div>
    </div>
    
    

    Flex containers inside STF tables

    Fixing a couple of test cases submitted by an anonymous Opera employee 8! years ago. This is another case of competing layout contexts trying to do things their own way.
    
    <div style="display: table; background:red">
       <div style="display: flex; width: 0px">
          <p style="margin: 1em 1em;width: 50px">Text</p>
          <p style="margin: 1em 1em;width: 50px">Text</p>
          <p style="margin: 1em 1em;width: 50px">Text</p>
       </div>
    </div>
    
    
    After the fix the table is properly sized to 0px width and thus no red is seen.

    Conclusions

    These examples are just some interesting ones I’ve chosen to highlight. In the end, almost 50 new flexbox tests are passing in WebKit that weren’t back in May!. I wouldn’t like to forget the great job done by my colleague Carlos Lopez who imported tons of WPT flexbox tests into the WebKit source tree. He also performed awesome triage work which made my life a lot easier.
    Investing in interoperability is a huge deal for the web. It’s good for everyone, from spec authors to final users, including browser vendors, downstream ports or web authors. So if you care about the web, or your business orbits around web technologies, you should definitely promote and invest on interoperability.

    Implementing standards or fixing bugs in web engines is the kind of work we happily do at Igalia on a daily basis. We are the second largest contributor to both WebKit and Chrome/Blink, so if you have an annoying bug on a particular web engine (Gecko and Servo as well) that you want to be fixed, don’t hesitate and contact us, we’d be glad to help. Also, should you want to be part of a workers-owned cooperative with an asambleary decision-making mechanism and a strong focus on free software technologies join us!.

    Acknowledgements

    Many thanks to WebKit reviewers from Apple and Igalia like Darin Adler, Manuel Rego, Javier Fernández or Daniel Bates who made the process really easy for me, always providing very nice feedback for the patches I submitted.
    I’m also really thankful to Googlers like Christian Biesinger, David Grogan and Stephen McGruer who worked on the very same things in Blink and/or provided very nice guidance and support when porting patches.

    by svillar at October 01, 2020 11:34 AM

    September 28, 2020

    Adrián Pérez

    Sunsetting NPAPI support in WebKitGTK (and WPE)

    1. Summary
    2. What is NPAPI?
    3. What is NPAPI used for?
    4. Why are NPAPI plug-ins being phased out?
    5. What are other browsers doing?
    6. Is WebKitGTK following suit?

    Summary

    Here’s a tl;dr list of bullet points:

    • NPAPI is an old mechanism to extend the functionality of a web browser. It is time to let it go.
    • One year ago, WebKitGTK 2.26.0 removed support for NPAPI plug-ins which used GTK2, but the rest of plug-ins kept working.
    • WebKitGTK 2.30.x will be the last stable series with support for NPAPI plug-ins at all. Version 2.30.0 was released a couple of weeks ago.
    • WebKitGTK 2.32.0, due in March 2021, will be the first stable release to ship without support for NPAPI plug-ins.
    • We have already removed the relevant code from the WebKit repository.
    • While the WPE WebKit port allowed running windowless NPAPI plug-ins, this was never advertised nor supported by us.

    What is NPAPI?

    In 1995, Netscape Navigator 2.0 introduced a mechanism to extend the functionality of the web browser. That was NPAPI, short for Netscape Plugin Application Programming Interface. NPAPI allowed third parties to add support for new content types; for example Future Splash (.spl files), which later became Flash (.swf).

    When a NPAPI plug-in is used to render content, the web browser carves a hole in the rectangular location where content handled by the plug-in will be placed, and hands off the rendering responsibility to the plug-in. This would end up calling call for trouble, as we will see later.

    What is NPAPI used for?

    A number of technologies have used NPAPI along the years for different purposes:

    • Displaying of multimedia content using Flash Player or the Silverlight plug-ins.
    • Running rich Java™ applications in the browser.
    • Displaying documents in non-Web formats (PDF, DjVu) inside browser windows.
    • A number of questionable practices, like VPN client software using a browser plug‑in for configuration.

    Why are NPAPI plug-ins being phased out?

    The design of NPAPI makes the web browser give full responsibility to plug-ins: the browser has no control whatsoever over what plug-ins do to display content, which makes it hard to make them participate in styling and layout. More importantly, plug-ins are compiled, native code over which browser developers cannot exercise quality control, which resulted in a history of security incidents, crashes, and browser hangs.

    Today, Web browsers’ rendering engines can do a better job than plug-ins, more securely and efficiently. The Web platform is mature and there is no place to blindly trust third party code to behave well. NPAPI is a 25 years old technology showing its age—it has served its purpose, but it is no longer needed.

    The last nail in the coffin was Adobe’s 2017 announcement that the Flash plugin will be discontinued in January 2021.

    What are other browsers doing?

    Glad that you asked! It turns out that all major browsers have plans for incrementally reducing how much of NPAPI usage they allow, until they eventually remove it.

    Firefox

    Let’s take a look at the Firefox roadmap first:

    Version Date Plug-in support changes
    47 June 2016 All plug-ins except Flash need the user to click on the element to activate them.
    52 March 2017 Only loads the Flash plug‑in by default.
    55 August 2017 Does not load the Flash plug‑in by default, instead it asks users to choose whether sites may use it.
    56 September 2017 On top of asking the user, Flash content can only be loaded from http:// and https:// URIs; the Android version completely removes plug‑in support. There is still an option to allow always running the Flash plug-in without asking.
    69 September 2019 The option to allow running the Flash plug-in without asking the user is gone.
    85 January 2021 Support for plug-ins is gone.
    Table: Firefox NPAPI plug-in roadmap.

    In conclusion, the Mozilla folks have been slowly boiling the frog for the last four years and will completely remove the support for NPAPI plug-ins coinciding with the Flash player reaching EOL status.

    Chromium / Chrome

    Here’s a timeline of the Chromium roadmap, merged with some highlights from their Flash Roadmap:

    Version Date Plug-in support changes
    ? Mid 2014 The interface to unblock running plug-ins is made more complicated, to discourage usage.
    ? January 2015 Plug-ins blocked by default, some popular ones allowed.
    42 April 2015 Support for plug-ins disabled by default, setting available in chrome://flags.
    45 September 2015 Support for NPAPI plug-ins is removed.
    55 December 2016 Browser does not advertise Flash support to web content, the user is asked whether to run the plug-in for sites that really need it.
    76 July 2019 Flash support is disabled by default, can still be enabled with a setting.
    88 January 2021 Flash support is removed.
    Table: Chromium NPAPI/Flash plug-in roadmap.

    Note that Chromium continued supporting Flash content even when it already removed support for NPAPI in 2015: by means of their acute NIH syndrome, Google came up with PPAPI, which replaced NPAPI and which was basically designed to support Flash and is currently used by Chromium’s built-in PDF viewer—which will go away also coinciding with Flash being EOL, nevertheless.

    Safari

    On the Apple camp, the story is much easier to tell:

    • Their handheld devices—iPhone, iPad, iPod Touch—never supported NPAPI plug-ins to begin with. Easy-peasy.
    • On desktop, Safari has required explicit approval from the user to allow running plug-ins since June 2016. The Flash plug-in has not been preinstalled in Mac OS since 2010, requiring users to manually install it.
    • NPAPI plug-in support will be removed from WebKit by the end of 2020.

    Is WebKitGTK following suit?

    Yes. In September 2019 WebKitGTK 2.26 removed support for NPAPI plug-ins which use GTK2. This included Flash, but the PPAPI version could still be used via freshplayerplugin.

    In March 2021, when the next stable release series is due, WebKitGTK 2.32 will remove the support for NPAPI plug-ins. This series will receive updates until September 2021.

    The above gives a full two years since we started restricting which plug-ins can be loaded before they stop working, which we reckon should be enough. At the moment of writing this article, the support for plug-ins was already gone from the WebKit source the GTK and WPE ports.

    Yes, you read well, WPE supported NPAPI plug-ins, but in a limited fashion: only windowless plug-ins worked. In practice, making NPAPI plug-ins work on Unix-like systems required using the XEmbed protocol to allow them to place their rendered content overlaid on top of WebKit’s, but the WPE port does not use X11. Provided that we never advertised nor officially supported the NPAPI support in the WPE port, we do not expect any trouble removing it.

    by aperez (adrian@perezdecastro.org) at September 28, 2020 10:00 PM

    September 27, 2020

    Ricardo García

    My participation in XDC 2020

    The 2020 X.Org Developers Conference took place from September 16th to September 18th. For the first time, due to the ongoing COVID-19 pandemic, it was a fully virtual event. While this meant that some interesting bits of the conference, like the hallway track, catching up in person with some people and doing some networking, was not entirely possible this time, I have to thank the organizers for their work in making the conference an almost flawless event. The conference was livestreamed directly to YouTube, which was the main way for attendees to watch the many different talks. freenode was used for the hallway track, with most discussions happening in the ##xdc2020 IRC channel. In addition ##xdc2020-QA was used for attendees wanting to add questions or comments at the end of the talk.

    Igalia was a silver sponsor of the event and we also participated with 5 different talks, including one by yours truly.

    My talk about VK_EXT_extended_dynamic_state was based on my previous blog post, but it includes a more detailed explanation of the extension as well as more detailed comments and an explanation about how the extension was created. I took advantage of the possibility of using pre-recorded videos for the conference, as I didn’t fully trust my kids wouldn’t interrupt me in the middle of the talk. In the end I think it was a good idea and, from the presenter point of view, I also found out using a script and following it strictly (to some degree) prevented distractions and made the talk a bit shorter and more to the point, because I tend to beat around the bush when talking live. You can watch my talk in the embedded video below.

    * { padding: 0; margin: 0; overflow: hidden; } html, body { height: 100%; } img, span { /* All elements take the whole iframe width and are vertically centered. */ position: absolute; width: 100%; top: 0; bottom: 0; margin: auto; } span { /* This mostly applies to the play button. */ height: 1.5em; text-align: center; font-family: sans-serif; font-size: 500%; color: white; } Video XDC 2020 | How the Vulkan VK_EXT_extended_dynamic_state extension came to be "> * { padding: 0; margin: 0; overflow: hidden; } html, body { height: 100%; } img, span { /* All elements take the whole iframe width and are vertically centered. */ position: absolute; width: 100%; top: 0; bottom: 0; margin: auto; } span { /* This mostly applies to the play button. */ height: 1.5em; text-align: center; font-family: sans-serif; font-size: 500%; color: white; } Video XDC 2020 | How the Vulkan VK_EXT_extended_dynamic_state extension came to be " >

    Slides for the talk are also available and below you can find a transcript of the talk.

    <Title slide>

    Hello, my name is Ricardo García, I work at Igalia as part of its Graphics team and today I will be talking about the extended dynamic state Vulkan extension. At Igalia I was involved in creating CTS tests for this extension and also in reviewing the spec when writing those tests, in a very minor capacity. This extension is pretty simple and very useful, and the talk is divided in two parts. First I will talk about the extension itself and then I’ll reflect on a few bits about how this extension was created that I consider quite interesting.

    <Part 1>

    <Extension description slide>

    So, first, what does this extension do? Its documentation says:

    VK_EXT_extended_dynamic_state adds some more dynamic state to support applications that need to reduce the number of pipeline state objects they compile and bind.

    In other words, as you will see, it makes Vulkan pipeline objects more flexible and easier to use from the application point of view.

    <Pipeline diagram slide>

    So, to give you some context, this is [the] typical graphics pipeline representation in many APIs like OpenGL, DirectX or Vulkan. You’ve probably seen variations of this a million times. The pipeline is divided in stages, some of them fixed-function, some of them programmable with shaders. Each stage usually takes some data from the previous stage and produces data to be consumed by the next one, apart from using other external resources like buffers or textures or whatever. What’s the Vulkan approach to represent this process?

    <Creation structure slide>

    Vulkan wants you to specify almost every single aspect of the previous pipeline in advance by creating a graphics pipeline object that contains information about how every stage should work. And, once created, most of these pipeline parameters or configuration cannot be changed. As you can see here, this includes shader programs, how vertices are read and processed, depth and stencil tests, you name it. Pipeline objects are heavy objects in Vulkan and they are hard to create. Why does Vulkan want you to do that? The answer has always been this keyword: “optimization”. Giving all the information in advance gives more chances for every current or even future implementations to optimize how the pipeline works. It’s the safe choice. And, despite this, you can see there’s a pipeline creation parameter with information about dynamic state. These are things that can be changed when using the pipeline without having to create a separate and almost identical pipeline object.

    <New dynamic states slide>

    What the extension does should be pretty obvious now: it adds a bunch of additional elements that can be changed on the fly without creating additional pipelines. This includes things like primitive topology, front face vertex order, vertex stride, cull mode and more aspects of the depth and stencil tests, etc. A lot of things. Using them if needed means fewer pipeline objects, fewer pipeline cache accesses and simpler programs in general. As I said before, it makes Vulkan pipeline objects more flexible and easier to use from the application point of view, because more pipeline aspects can be changed on the fly when using these pipeline objects instead of having to create separate objects for each combination of parameters you may want to modify at runtime. This may make the application logic simpler and it can also help when Vulkan is used as the backend, for example, to implement higher level APIs that are not so rigid regarding pipelines. I know this extension is useful for some emulators and other API-translating projects.

    <New commands slide>

    Together with those it also introduces a new set of functions to change those parameters on the fly when recording commands that will use the pipeline state object.

    <Pipeline diagram slide>

    So, knowing that and going back to the graphics pipeline, the obvious question is: does this impact performance? Aren’t we reducing the number of optimization opportunities the implementation has if we use these additional dynamic states? In theory, yes. In practice, it depends on the implementation. Many GPUs and Vulkan drivers out there today have some pipeline aspects that are considered “dynamic” in the sense that they are easily changed on the fly without a perceptible impact in performance, while others are truly important for optimization. For example, take shaders. In Vulkan they’re provided as SPIR-V programs that need to be translated to GPU machine code and creating pipelines when the application starts makes it easy to compile shaders beforehand to avoid stuttering and frame timing issues later, for example. And not only that. As you create pipelines, you’re telling the implementation which shaders are used together. Say you have a vertex shader that outputs 4 parameters, and it’s used in a pipeline with a fragment shader that only uses the first 2. When creating the pipeline the implementation can decide to discard instructions that are only related to producing the 2 extra unused parameters in the vertex shader. But other things like, for example, changing the front face? That may be trivial without affecting performance.

    <Part 2>

    <Eric Lengyel tweet slide>

    Moving on to the second part, I wanted to talk about how this extension was created. It all started with an “angry” tweet by Eric Lengyel (sorry if I’m not pronouncing it correctly) who also happens to be the author of the previous diagram. He complained in Twitter that you couldn’t change the front face dynamically, which happens to be super useful for rendering reflections, and pointed to an OpenGL NVIDIA extension that allowed you to do exactly that.

    <Piers Daniell reply slide>

    This was noticed by Piers Daniell from NVIDIA, who created a proposal in Khronos. That proposal was discussed with other vendors (software and hardware) that chimed in on aspects that could be or should be made dynamic if possible, which resulted in the multi-vendor extension we have today.

    <RADV implementation slide>

    In fact, RADV was one of the first Vulkan implementations to support the extension thanks to the effort by Samuel Pitoiset.

    <Promoters of Khronos slide>

    This whole process got me thinking Khronos may sometimes be seen from the outside as this closed silo composed mainly of hardware vendors. Certainly, there are a lot of hardware vendors but if you take the list of promoter members you can see some fairly well-known software vendors as well, and API usability and adoption are important for both groups. There are many people in Khronos trying to make Vulkan easier to use even if we’re all aware that’s somewhat in conflict with providing a lower level API that should let you write performant applications.

    <Khronos Contributors slide>

    If you take a look at the long list of contributor members, that’s only shown partially here because it’s very long, you’ll notice a lot of actors from different backgrounds as well.

    <Vulkan-Docs repo slide>

    Moreover, while Khronos and its different Vulkan working groups are far from an open source project or community, I believe they’re certainly more open to contributions than what many people think. For example, the Vulkan spec is published in a GitHub repo with instructions to build it (the spec is written in AsciiDoc) and this repo is open for issues and pull requests. So, obviously, if you want to change major parts of Vulkan and how some aspects of the API work, you’re going to meet opposition and maybe you should be joining Khronos to discuss things internally with everyone involved in there. However, while an angry tweet was enough for this particular extension, if you’re not well-known you may want to create an issue instead, exposing your use case and maybe with other colleagues chiming in on details or supporting of your proposal. I know for a fact issues created in this public repo are discussed in periodic Khronos meetings. It may take some weeks if people are busy and there’s a lot of things on the table, but they’re going to end up being discussed, which is a very good thing I was happy to see, and I want to put emphasis on that. I would like Khronos to continue doing that and I would like more people to take advantage of the public repos from Khronos. I know the people involved in the Vulkan spec want to make the text as clear as possible. Maybe you think some paragraph is confusing, or there’s a missing link to another section that provides more context, or something absurd is allowed by the spec and should be forbidden. You can try a reasoned pull request for any of those. Obviously, no guarantees it will go in, but interesting in any case.

    <Blend state tweet slide>

    For example, in the Twitter thread I showed before, I tweeted a reply when the extension was published and, among a few retweets, likes and quoted replies I found this very interesting Tweet I’m showing you here, asking for the whole blend state to be made dynamic and indicating that would be game-changing for some developers and very interesting for web browsers. We all want our web browsers to leverage the power of the GPU as much as possible, right? So why not? I thought creating an issue in the public repo for this case could be interesting.

    <Dynamic blend state issue slide>

    And, in fact, it turns out someone had already created an issue about it, as you can see here.

    <Tom Olson reply slide>

    And in this case, in this issue, Tom Olson from ARM replied that the working group had been discussing it and it turns out in this particular case existing hardware doesn’t make it easy to make the blend state fully dynamic without possibly recompiling shaders under the hood and introducing unwanted complexity in the implementations, so it was rejected for now. But even if, in this case, the reply is negative, you can see what I was mentioning: the issue reached the working group, it was considered, discussed and the issue creator got a reply and feedback. And that’s what I wanted to show you.

    <Final slide>

    And that’s all. Thanks for listening! Any questions maybe?

    The talk was followed by a Q&A section moderated, in this case, by Martin Peres. In the text below RG stands for Ricardo Garcia and MP stands for Martin Peres.

    RG: OK…​ Hello everyone!

    MP: OK, so far we do not have any questions. Jason Ekstrand has a comment: "We (the Vulkan Working Group) has had many contributions to the spec".

    RG: Yeah, yeah, exactly. I mean, I don’t think it’s very well known but yeah, indeed, there are a lot of people who have already contributed issues, pull requests and there have been many external contributions already so these things should definitely continue and even happen more often.

    MP: OK, I’m gonna ask a question. So…​ how much do you think this is gonna help layering libraries like Zink because I assume, I mean, one of the big issues with Zink is that you need to have a lot of pipelines precompiled and…​ is this helping Zink?

    RG: I don’t know if it’s being used. I think I did a search yesterday to see if Zink was using the extension and I don’t remember if I found anything specific so maybe the Zink people can answer the question but, yeah, it should definitely help in those cases because OpenGL is not as strict as Vulkan regarding pipelines obviously. You can change more things on the fly and if the underlying Vulkan implementation supports extended dynamic state it should make it easier to emulate OpenGL on top of Vulkan. For example, I know it’s being used by VKD3D right now to emulate DirectX 12 and there’s a emulator, a few emulators out there which are using the extension because, you know, APIs for consoles are different and they can use this type of extensions to make code better.

    MG: Agree. Jason also has another comment saying there are even extensions in flight from the Mesa community for some windowing-system related stuff.

    RG: Yeah, I was happy to see yesterday…​ I think it was yesterday, well, here at this XDC that the present timing extension pull request is being handled right now on GitHub which I think is a very good thing. It’s a trend I would like to [see] continue because, well, I guess sometimes, you know, the discussions inside the Working Group and inside Khronos may involve IP or whatever so it’s better to have those discussions sometimes in private, but it is a good thing that maybe, you know, there are a few extensions that could be handled publicly in GitHub instead of the internal tools in Khronos. So, yeah, that’s a good thing and a trend I would like to see continue: extensions discussed in public.

    MG: Yeah, sounds very cool. OK, I think we do not have any question…​ other questions or comments so let’s say thank you very much and…​

    RG: Thank you very much and let me congratulate you for…​ to the organizers for organizing XDC and…​ everyone, enjoy the rest of the day, thank you.

    MG: Thank you! See you in 13m 30s for the status of freedesktop.org’s GitLab cloud hosting.

    Regarding Zink, at the time I’m writing this, there’s an in-progress merge request for it to take advantage of the extension. Regarding the present timing extension, its pull request is at GitHub and you can also watch a short talk from Day One of the conference. I also mentioned the extension being used by VKD3D. I was specifically referring to the VKD3D-Proton fork.

    References used in the talk:

    September 27, 2020 01:25 PM

    September 24, 2020

    Samuel Iglesias

    X.Org Developers Conference 2020

    XDC 2020 sponsors

    Last week, X.Org Developers Conference 2020 was held online for the first time. This year, with all the COVID-19 situation that is affecting almost every country worldwide, the X.Org Foundation Board of Directors decided to make it virtual.

    I love open-source conferences :-) They are great for networking, have fun with the rest of community members, have really good technical discussions in the hallway track… and visit a new place every year! Unfortunately, we couldn’t do any of that this time and we needed to look for an alternative… being going virtual the obvious one.

    The organization team at Intel, lead by Radoslaw Szwichtenberg and Martin Peres, analyzed the different open-source alternatives to organize XDC 2020 in a virtual manner. Finally, due to the setup requirements and the possibility of having more than 200 attendees connected to the video stream at the same time (Big Blue Button doesn’t recommend more than 100 simultaneous users), they selected Jitsi for speakers + Youtube for streaming/recording + IRC for questions. Arkadiusz Hiler summarized very well what they did from the A/V technical point of view and how was the experience hosting a virtual XDC.

    I’m very happy with the final result given the special situation this year: the streaming was flawless, we didn’t have almost any technical issue (except one audio issue in the opening session… like in physical ones! :-D), and IRC turned out to be very active during the conference. Thanks a lot to the organizers for their great job!

    However, there is always room for improvements. Therefore, the X.Org Foundation board is asking for feedback, please share with us your opinion on XDC 2020!

    Just focusing on my own experience, it was very good. I enjoyed a lot the talks presented this year and the interesting discussions happening in IRC. I would like to highlight the four talks presented by my colleagues at Igalia :-D

    This year I was also a speaker! I presented “Improving Khronos CTS tests with Mesa code coverage” talk (watch recording), where I explained how we can improve the VK-GL-CTS quality by leveraging Mesa code coverage using open-source tools.

    My XDC 2020 talk

    I’m looking forward to attending X.Org Developers Conference 2021! But first, we need an organizer! Requests For Proposals for hosting XDC2021 are now open!

    See you next year!

    September 24, 2020 08:10 AM

    September 20, 2020

    Eleni Maria Stea

    [OpenGL and Vulkan Interoperability on Linux] Part 3: Using OpenGL to display Vulkan allocated textures.

    This is the third post of the OpenGL and Vulkan interoperability series, where I explain some EXT_external_objects and EXT_external_objects_fd use cases with examples taken by the Piglit tests I’ve written to test the extensions as part of my work for Igalia‘s graphics team. We are going to see a slightly more complex case of Vulkan/GL … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 3: Using OpenGL to display Vulkan allocated textures.

    by hikiko at September 20, 2020 01:08 PM

    September 18, 2020

    Eleni Maria Stea

    [OpenGL and Vulkan Interoperability on Linux]: The XDC 2020 presentation

    In this year’s XDC I’ve given a talk about OpenGL and Vulkan interoperability and our work to support EXT_external_objects and EXT_external_objects_fd on different Mesa3D drivers. These are my slides and pages 9-12 contain short descriptions of all the Piglit EXT_external_objects use cases I plan to describe in the upcoming OpenGL and Vulkan Interoperability blog posts: … Continue reading [OpenGL and Vulkan Interoperability on Linux]: The XDC 2020 presentation

    by hikiko at September 18, 2020 05:19 PM

    XDC 2020

    XDC 2020 was virtual this year due to covid-19. Despite that, there were many interesting talks and 5 of them were given by Igalians. Like in the previous years Igalia was among the sponsors (a silver sponsor). The talks coming from Igalians were the following: Day #1: Overview of the open source Vulkan driver for … Continue reading XDC 2020

    by hikiko at September 18, 2020 05:18 PM

    September 08, 2020

    Brian Kardell

    Numeric Literal Separators: 2 Minute Standards

    Numeric Literal Separators: 2 Minute Standards

    Part of #StandardsIn2Min, an effort to provide short, but useful information about standards. Follow @StandardsIn2Min for a low-stress way to keep up, 2 minutes at a time.

    Imagine if we didn't have spaces...

    Imagineifyouhadtoreadallofthesentencesinthispostlikethis.

    What a nightmare that would be. Separators are really useful for humans. However, for most of JavaScript's history, code like this wasn't uncommon...

    var x = 10000000;

    This is, basically, the same problem. How many is that? It's too hard for our eyeballs to parse. If we were to display that in any form intended to be readable, we'd use some kind of separators to put these into hundred, thousands, millions, etc.. Like this... 10,000,000

    Numeric separators allow us to achieve this same visual separation for readability in code, using the _ (underscore) character. The same example above could be written now as..

    var x = 10_000_000;

    That is, obviously, an improvement for readability.

    In fact, JavaScript allows for several kinds of numeric literals beyond simple integers and these separators can be employed in all of them for visual separation. We have decimals (like 10_000.00), binary (using 'b', like 0b1101_0010), octal (using 'o' like 0o7_6_5) and hexidecimal (using '0x' like 0xA0_B2_C3), and BigInt (using 'n' like 100_000_000_000_000n).

    There are just a few limitations to be aware of..

    1. You cannot use two contiguous separators
    2. You cannot end a number with a separator
    3. You cannot begin a number with a leading 0 immediately followed by a separator

    That's about all there is to it. It has rich support in both modern, shipping engines and is also usable by transpilers.

    September 08, 2020 04:00 AM

    September 07, 2020

    Víctor Jáquez

    Review of Igalia Multimedia activities (2020/H1)

    This blog post is a review of the various activities the Igalia Multimedia team was involved in during the first half of 2020.

    Our previous reports are:

    Just before a new virus turned into pandemics we could enjoy our traditional FOSDEM. There, our colleague Phil gave a talk about many of the topics covered in this report.

    GstWPE

    GstWPE’s wpesrc element, produces a video texture representing a web page rendered off-screen by WPE.

    We have worked on a new iteration of the GstWPE demo, focusing on one-to-many, web-augmented overlays, broadcasting with WebRTC and Janus.

    Also, since the merge of gstwpe plugin in gst-plugins-bad (staging area for new elements) new users have come along spotting rough areas and improving the element along the way.

    Video Editing

    GStreamer Editing Services (GES) is a library that simplifies the creation of multimedia editing applications. It is based on the GStreamer multimedia framework and is heavily used by Pitivi video editor.

    Implemented frame accuracy in the GStreamer Editing Services (GES)

    As required by the industry, it is now possible to reference all time in frame number, providing a precise mapping between frame number and play time. Many issues were fixed in GStreamer to reach the precision enough for make this work. Also intensive regression tests were added.

    Implemented time effects support in GES

    Important refactoring inside GStreamer Editing Services have happened to allow cleanly and safely change playback speed of individual clips.

    Implemented reverse playback in GES

    Several issues have been fixed inside GStreamer core elements and base classes in order to support reverse playback. This allows us to implement reliable and frame accurate reverse playback for individual clips.

    Implemented ImageSequence support in GStreamer and GES

    Since OpenTimelineIO implemented ImageSequence support, many users in the community had said it was really required. We reviewed and finished up imagesequencesrc element, which had been awaiting review for years.

    This feature is now also supported in the OpentimelineIO GES adapater.

    Optimized nested timelines preroll time by an order of magnitude

    Caps negotiation, done while the pipeline transitions from pause state to playing state, testing the whole pipeline functionality, was the bottleneck for nested timelines, so pipelines were reworked to avoid useless negotiations. At the same time, other members of GStreamer community have improved caps negotiation performance in general.

    Last but not least, our colleague Thibault gave a talk in The Pipeline Conference about The Motion Picture Industry and Open Source Software: GStreamer as an Alternative, explaining how and why GStreamer could be leveraged in the motion picture industry to allow faster innovation, and solve issues by reusing all the multi-platform infrastructure the community has to offer.

    WebKit multimedia

    There has been a lot of work on WebKit multimedia, particularly for WebKitGTK and WPE ports which use GStreamer framework as backend.

    WebKit Flatpak SDK

    But first of all we would like to draw readers attention to the new WebKit Flatpak SDK. It was not a contribution only from the multimedia team, but rather a joint effort among different teams in Igalia.

    Before WebKit Flatpak SDK, JHBuild was used for setting up a WebKitGTK/WPE environment for testing and development. Its purpose to is to provide a common set of well defined dependencies instead of relying on the ones available in the different Linux distributions, which might bring different outputs. Nonetheless, Flatpak offers a much more coherent environment for testing and develop, isolated from the rest of the building host, approaching to reproducible outputs.

    Another great advantage of WebKit Flatpak SDK, at least for the multimedia team, is the possibility of use gst-build to setup a custom GStreamer environment, with latest master, for example.

    Now, for sake of brevity, let us sketch an non-complete list of activities and achievements related with WebKit multimedia.

    General multimedia

    Media Source Extensions (MSE)

    Encrypted Media Extension (EME)

    One of the major results of this first half, is the upstream of ThunderCDM, which is an implementation of a Content Decryption Module, providing Widevine decryption support. Recently, our colleague Xabier, published a blog post on this regard.

    And it has enabled client-side video rendering support, which ensures video frames remain protected in GPU memory so they can’t be reached by third-party. This is a requirement for DRM/EME.

    WebRTC

    GStreamer

    Though we normally contribute in GStreamer with the activities listed above, there are other tasks not related with WebKit. Among these we can enumerate the following:

    GStreamer VAAPI

    • Reviewed a lot of patches.
    • Support for media-driver (iHD), the new VAAPI driver for Intel, mostly for Gen9 onwards. There are a lot of features with this driver.
    • A new vaapioverlay element.
    • Deep code cleanups. Among these we would like to mention:
      • Added quirk mechanism for different backends.
      • Change base classes to GstObject and GstMiniObject of most of classes and buffers types.
    • Enhanced caps negotiation given current driver’s constraints

    Conclusions

    The multimedia team in Igalia has keep working, along the first half of this strange year, in our three main areas: browsers (mainly on WebKitGTK and WPE), video editing and GStreamer framework.

    We worked adding and enhancing WebKitGTK and WPE multimedia features in order to offer a solid platform for media providers.

    We have enhanced the Video Editing support in GStreamer.

    And, along these tasks, we have contribuited as much in GStreamer framework, particulary in hardware accelerated decoding and encoding and VA-API.

    by vjaquez at September 07, 2020 03:12 PM

    Alejandro Piñeiro

    v3dv status update 2020-09-07

    So here a new update of the evolution of the Vulkan driver for the rpi4 (broadcom GPU).

    Features

    Since my last update we finished the support for two features. Robust buffer access and multisampling.

    Robust buffer access is a feature that allows to specify that accesses to buffers are bounds-checked against the range of the buffer descriptor. Usually this is used as a debug tool during development, but disabled on release (this is explained with more detail on this ARM guide). So sorry, no screenshot here.

    On my last update I mentioned that we have started the support for multisampling, enough to get some demos working. Since then we were able to finish the rest of the mulsisampling support, and even implemented the optional feature sample rate shading. So now the following Sascha Willems’s demo is working:

    Sascha Willems deferred multisampling demo run on rpi4

    Bugfixing

    Taking into account that most of the features towards support Vulkan core 1.0 are implemented now, a lot of the effort since the last update was done on bugfixing, focusing on the specifics of the driver. Our main reference for this is Vulkan CTS, the official Khronos testsuite for Vulkan and OpenGL.

    As usual, here some screenshots from the nice Sascha Willems’s demos, showing demos that were failing when I wrote the last update, and are working now thanks of the bugfixing work.

    Sascha Willems hdr demo run on rpi4

    Sascha Willems gltf skinning demo run on rpi4

    Next

    At this point there are no full features pending to implement to fulfill the support for Vulkan core 1.0. So our focus would be on getting to pass all the Vulkan CTS tests.

    Previous updates

    Just in case you missed any of the updates of the vulkan driver so far:

    Vulkan raspberry pi first triangle
    Vulkan update now with added source code
    v3dv status update 2020-07-01
    V3DV Vulkan driver update: VkQuake1-3 now working
    v3dv status update 2020-07-31

    by infapi00 at September 07, 2020 02:37 PM

    September 02, 2020

    Xabier Rodríguez Calvar

    Serious Encrypted Media Extensions on GStreamer based WebKit ports

    Encrypted Media Extensions (a.k.a. EME) is the W3C standard for encrypted media in the web. This way, media providers such as Hulu, Netflix, HBO, Disney+, Prime Video, etc. can provide their contents with a reasonable amount of confidence that it will make it very complicated for people to “save” their assets without their permission. Why do I use the word “serious” in the title? In WebKit there is already support for Clear Key, which is the W3C EME reference implementation but EME supports more encryption systems, even privative ones (I have my opinion about this, you can ask me privately). No service provider (that I know) supports Clear Key, they usually rely on Widevine, PlayReady or some other.

    Three years ago, my colleague Žan Doberšek finished the implementation of what was going to be the shell of WebKit’s modern EME implementation, following latest W3C proposal. We implemented that downstream (at Web Platform for Embedded) as well using Thunder, which includes as a plugin a fork of what was Open Content Decryption Module (a.k.a. OpenCDM). The OpenCDM API changed quite a lot during this journey. It works well and there are millions of set-top-boxes using it currently.

    The delta between downstream and the upstream GStreamer based WebKit ports was quite big, testing was difficult and syncing was not always easy, so we decided reverse the situation.

    Our first step was done by my colleague Charlie Turner, who made Clear Key work upstream again while adapted some changes the Apple folks had done meanwhile. It was amazing to see Clear Key tests passing again and his work with the CDMProxy related classes was awesome. After having ClearKey working, I had to adapt them a bit to accomodate Thunder. To explain a bit about the WebKit EME architecture, I must say that there are two layers. The first is the crossplatform one, which implements the W3C API (MediaKeys, MediaKeySession, CDM…). These classes rely on the platform ones (CDMPrivate, CDMInstance, CDMInstanceSession) to handle the platform management, message exchange, etc. which would be the second layer. Apple playback system is fully integrated with their DRM system so they don’t need anything else. We do because we need to integrate our own decryptors to defer to Thunder for decryption so in the GStreamer based ports we also need the CDMProxy related classes, which would be CDMProxy, CDMInstanceProxy, CDMInstanceSessionProxy… The last two extend CDMInstance and CDMInstanceSession respectively to be able to deal with the key management, that is abstracted to the KeyHandle and KeyStore.

    Once the abstraction is there (let’s remember that the abstranction works both for Clear Key and Thunder), the Thunder implementation is quite simple, just gluing the CDMProxy, CDMInstanceProxy and CDMInstanceSessionProxy classes to the Thunder system and writing a GStreamer decryptor element for it. I might have made a mistake when selecting the files but considering Thunder classes + the GStreamer common decryptor code, cloc says it is just 1198 lines of platform code. I think it is pretty low for what it does. Apart from that, obviously, there are 5760 lines of crossplatform code.

    To build and run all this you need to do several things:

    1. Build the dependencies with WEBKIT_JHBUILD=1 JHBUILD_ENABLE_THUNDER="yes" to enable the old fashioned JHBuild build and force it to build the Thunder dependencies. All dependendies are on JHBuild, even Widevine is referenced but to download it you need the proper credentials as it is closed source.
    2. Pass --thunder when calling build-webkit.sh.
    3. Run MiniBrowser with WEBKIT_GST_EME_RANK_PRIORITY="Thunder" and pass parameters --enable-mediasource=TRUE --enable-encrypted-media=TRUE --autoplay-policy=allow. The autoplay policy is usually optional but in this case it is necessary for the YouTube TV tests. We need to give the Thunder decryptor a higher priority because of WebM, that does not specify a key system and without it the Clear Key one can be selected and fail. MP4 does not create trouble because the protection system is specified and the caps negotiation does its magic.

    As you could have guessed if you have a closer look at the GStreamer JHBuild moduleset, you’ll see that only Widevine is supported. To support more, you only have to make them build in the Thunder ecosystem and add them to CDMFactoryThunder::supportedKeySystems.

    When I coded this, all YouTube TV tests for Widevine were green in the desktop. At the moment of writing this post they aren’t because of some problem with the Widevine installation that will be sorted quickly, I hope.

    by calvaris at September 02, 2020 02:59 PM

    August 27, 2020

    Chris Lord

    OffscreenCanvas, jobs, life

    Hoo boy, it’s been a long time since I last blogged… About 2 and a half years! So, what’s been happening in that time? This will be a long one, so if you’re only interested in a part of it (and who could blame you), I’ve titled each section.

    Leaving Impossible

    Well, unfortunately my work with Impossible ended, as we essentially ran out of funding. That’s really a shame, we worked on some really cool, open-source stuff, and we’ve definitely seen similar innovations in the field since we stopped working on it. We took a short break (during which we also, unsuccessfully, searched for further funding), after which Rob started working on a cool, related project of his own that you should check out, and I, being a bit less brave, starting seeking out a new job. I did consider becoming a full-time musician, but business wasn’t picking up as quickly as I’d hoped it might in that down-time, and with hindsight, I’m glad I didn’t (Covid-19 and all).

    I interviewed with a few places, which was certainly an eye-opening experience. The last ‘real’ job interview I did was for Mozilla in 2011, which consisted mainly of talking with engineers that worked there, and working through a few whiteboard problems. Being a young, eager coder at the time, this didn’t really phase me back then. Turns out either the questions have evolved or I’m just not quite as sharp as I used to be in that very particular environment. The one interview I had that involved whiteboard coding was a very mixed bag. It seemed a mix of two types of questions; those that are easy to answer (but unless you’re in the habit of writing very quickly on a whiteboard, slow to write down) and those that were pretty impossible to answer without specific preparation. Perhaps this was the fault of recruiters, but you might hope that interviews would be catered somewhat to the person you’re interviewing, or the work they might actually be doing, neither of which seemed to be the case? Unsurprisingly, I didn’t get past that interview, but in retrospect I’m also glad I didn’t. Igalia’s interview process was much more humane, and involved mostly discussions about actual work I’ve done, hypothetical situations and ethics. They were very long discussions, mind, but I’m very glad that they were happy to hire me, and that I didn’t entertain different possibilities. If you aren’t already familiar with Igalia, I’d highly recommend having a read about them/us. I’ve been there a year now, and the feeling is quite similar to when I first joined Mozilla, but I believe with Igalia’s structure, this is likely to stay a happier and safer environment. Not that I mean to knock Mozilla, especially now, but anyone that has worked there will likely admit that along with the giddy highs, there are also some unfortunate lows.

    Igalia

    I joined Igalia as part of the team that works on WebKit, and that’s what I’ve been doing since. It almost makes perfect sense in a way. Surprisingly, although I’ve spent overwhelmingly more time on Gecko, I did actually work with WebKit first while at OpenedHand, and for a short period at Intel. While celebrating my first commit to WebKit, I did actually discover it wasn’t my first commit at all, but I’d contributed a small embedding-related fix-up in 2008. So it’s nice to have come full-circle! My first work at Igalia was fixing up some patches that Žan Doberšek had prototyped to allow direct display of YUV video data via pixel shaders. Later on, I was also pleased to extend that work somewhat by fixing some vc3 driver bugs and GStreamer bugs, to allow for hardware decoding of YUV video on Raspberry Pi 3b (this, I believe, is all upstream at this point). WebKit Gtk and WPE WebKit may be the only Linux browser backends that leverage this pipeline, allowing for 1080p30 video playback on a Pi3b. There are other issues making this less useful than you might think, but either way, it’s a nice first achievement.

    OffscreenCanvas

    After that introduction, I was pointed at what could be fairly described as my main project, OffscreenCanvas. This was also a continuation of Žan’s work (he’s prolific!), though there has been significant original work since. This might be the part of this post that people find most interesting or relevant, but having not blogged in over 2 years, I can’t be blamed for waffling just a little. OffscreenCanvas is a relatively new web standard that allows the use of canvas API disconnected from the DOM, and within Workers. It also makes some provisions for asynchronously updated rendering, allowing canvas updates in Workers to bypass the main thread entirely and thus not be blocked by long-running processes on that thread. The most obvious use-case for this, and I think the most practical, is essentially non-blocking rendering of generated content. This is extremely handy for maps, for example. There are some other nice use-cases for this as well – you can, for example, show loading indicators that don’t stop animating while performing complex DOM manipulation, or procedurally generate textures for games, asynchronously. Any situation where you might want to do some long-running image processing without blocking the main thread (image editing also springs to mind).

    Currently, the only complete implementation is within Blink. Gecko has a partial implementation that only supports WebGL contexts (and last time I tried, crashed the browser on creation…), but as far as I know, that’s it. I’ve been working on this, with encouragement and cooperation from Apple, on and off for the past year. In fact, as of August 12th, it’s even partially usable, though there is still a fair bit missing. I’ve been concentrating on the 2d context use-case, as I think it’s by far the most useful part of the standard. It’s at the point where it’s mostly usable, minus text rendering and minus some edge-case colour parsing. Asynchronous updates are also not yet supported, though I believe that’s fairly close for Linux. OffscreenCanvas is enabled with experimental features, for those that want to try it out.

    My next goal, after asynchronous updates on Linux, is to enable WebGL context support. I believe these aren’t particularly tough goals, given where it is now, so hopefully they’ll happen by the end of the year. Text rendering is a much harder problem, but I hope that between us at Igalia and the excellent engineers at Apple, we can come up with a plan for it. The difficulty is that both styling and font loading/caching were written with the assumption that they’d run on just one thread, and that that thread would be the main thread. A very reasonable assumption in a pre-Worker and pre-many-core-CPU world of course, but increasingly less so now, and very awkward for this particular piece of work. Hopefully we’ll persevere though, this is a pretty cool technology, and I’d love to contribute to it being feasible to use widely, and lessen the gap between native and the web.

    And that’s it from me. Lots of non-work related stuff has happened in the time since I last posted, but I’m keeping this post tech-related. If you want to hear more of my nonsense, I tend to post on Twitter a bit more often these days. See you in another couple of years 🙂

    by Chris Lord at August 27, 2020 08:56 AM

    August 16, 2020

    Eleni Maria Stea

    [OpenGL and Vulkan Interoperability on Linux] Part 2: Using OpenGL to draw on Vulkan textures.

    This is the second post of the OpenGL and Vulkan interoperability series, where I explain some EXT_external_objects and EXT_external_objects_fd use cases with examples taken by the Piglit tests I’ve written to test the extensions as part of my work for Igalia‘s graphics team. We are going to see a very simple case of Vulkan/GL interoperability … Continue reading [OpenGL and Vulkan Interoperability on Linux] Part 2: Using OpenGL to draw on Vulkan textures.

    by hikiko at August 16, 2020 09:51 AM

    August 13, 2020

    Javier Fernández

    Improving CSS Custom Properties performance

    Chrome 84 reached the stable channel a few weeks ago, and there are already several great posts describing the many important additions, interesting new features, security fixes and improvements in privacy policies (([1], [2], [3], [4]) it contains. However, there is a change that I worked on in this release which might have passed unnoticed by most, but I think is very valuable: A change regarding CSS Custom Properties (variables) performance.

    The design of CSS, in general, takes great care in considering how features are designed with respect to making it possible for them to perform well. However, implementations may not perform as well as they could, and it takes a considerable amount of time to understand how authors use the features and which cases are more relevant for them.

    CSS Custom Properties are an interesting example to look at here: They are a wonderful feature that provides a lot of advantages for web authors. For a whole lot of cases, all of the implementations of CSS Custom Properties perform well enough that most people won’t notice. However, we at Igalia have been analyzing several use cases and looking at some reports around their performance in different implementations.

    Let’s consider a fairly straightforward example in which an author sets a single property in a toggleable class in the body, and then uses that property several times deeper in the tree to change the foreground color of some text.

    
       .red { --prop: red; }
       .green { --prop: green; }
    
    
    
    

    Only about 20% of those actually use this property, 5 elements deep into the tree, and only to change the foreground color.

    To evaluate Chromium’s performance in a case like this we can define a new perf tests, using the perf tools the Chromium project has available for browser engineers. In this case, we want a huge tree so that we can evaluate better the impact of the different optimizations.

    
        .green { --prop: green; }
        .red { --prop: red; }
    
    
        

    These are the results obtained runing the test in Chrome 83:

    avg median
    stdev

    min

    max

    163.74 ms 163.79 ms 3.69 ms 158.59 ms 163.74 ms

    I admit that it’s difficult to evaluate the results, especially considering the number of nodes of such a huge DOM tree. Lets compare the results of the same test on Firefox, using different number of nodes.

    Nodes 50K 20K 10K 5K 1K 500
    Chrome 83 163.74 ms 55.05 ms 25.12 ms 14.18 ms 2.74 ms 1.50 ms
    FF 78 28.35 ms 12.05 ms 6.10 ms 3.50 ms 1.15 ms 0.55 ms
    1/6 1/5 1/4 1/4 1/2 1/3

    As I commented before, the data are more accurate when the DOM tree has a lot of nodes; in any case, the difference is quite clear and shows there is plenty room for improvement. WebKit based browsers have results more similar to Chromium as well.

    Performance tests like the one above can be added to browsers for tracking improvements and regressions over time, so we’ve added (r763335) that to Chromium’s tree: We’d like to see it get faster over time, and definitely cannot afford regressions (see Chrome Performance Dashboard and the ChangeStyleCustomPropertyDeclaration test for details) .

    So… What can we do?

    In Chrome 83 and lower, whenever the custom property declaration changed, the new declaration would be inherited by the whole tree. This inheritance implied executing the whole CSS cascade and recalculating the styles of all the nodes in the entire tree, since with this approach, all nodes may be affected.

    Chrome had already implemented an optimization on the CSS cascade implementation for regular CSS properties that don’t depend on any other to resolve their value. These subset of CSS properties are defined as Independent Properties in the Chromium codebase. The optimization mentioned before affects how the inheritance mechanism is implemented for these Independent properties. Whenever one of these properties changes, instead of recalculating the styles of the inherited properties, children can just copy the whole parent’s computed style. Blink’s style engine has a component known as Matched Properties Cache responsible of deciding when is possible to avoid the style resolution of an element and instead, performing an efficient copy of the matched computed style. I’ll get back to this concept in the last part of this post.

    In the case of CSS Custom Properties, we could apply a similar approach as a good step. We can consider that the nodes with computed styles that don’t have references to custom properties declarations shouldn’t be affected by the new declaration, and we can implement the inheritance directly by copying the parent’s computed style. The patch with the optimization I’ve implemented in r765278 initially landed in Chrome 84.0.4137.0

    Let’s look at the result of this one action in the Chrome Performance Dashboard:

    That’s a really good improvement!

    However, it’s also just a first step. It’s clear that Chrome still has a wide margin for improvement in this case, as well any WebKit based browser – Firefox is still, impressively, markedly faster as it’s been described in the bug report filed to track this issue. The following table shows the result of the different browsers together; even disabling the muti-thread capabilities of Firefox’s Stylo engine (STYLO_THREAD=1), FF is much faster than Chrome with the optimization applied.

    Chrome 83 Chrome 84 FF 78 FF 78 th=1
    avg
    median
    stdev
    min
    max
    163.74 ms
    163.79 ms
    3.69 ms
    158.59 ms
    163.74 ms
    117.37 ms
    117.52 ms
    1.98 ms
    113.66 ms
    120.87 ms
    28.35 ms
    28.50 ms
    0.93 ms
    26.00 ms
    30.00 ms
    38.25 ms
    38.50 ms
    1.86 ms
    35.00 ms
    41.00 ms

    Before continue, I want get back to the Matched Properties Cache (MPC) concept, since it has an important role on these style optimizations. This cache is not a new concept in the Chrome’s engine; as a matter of fact, it’s also used in WebKit, since it was implemented long ago, before the fork that created the new blink engine. However, Google has been working a lot on this area in the last years and some of the most recent changes in the MPC have had an important impact on style resolution performance. As a result of this work, elements with independent and non-independent properties using CSS Variables might produce cache hits in the MPC. The results of the Performance Dashboard show a considerable improvement in the mentioned ChangeStyleCustomPropertyDeclaration test (avg: 108.06 ms)

    Additionally, there are several other cases where the use of CSS Variables has a considerable impact on performance, compared with using regular CSS properties. Obviously, resolving CSS Variables has a cost, so it’s clear that we could apply additional optimizations that reduce the impact of the variable resolution, especially for handling specific style changes that might not affect to a substantial portion of the DOM tree. I’ve been experimenting with the MPC to explore the idea an independent CSS Custom Properties cache; nodes with variables referencing the same custom property will produce cache hits in the MPC, even though other properties don’t match. The preliminary approach I’ve been implementing consists on a new matching function, specific for custom properties, and a mechanism to transfer/copy the property’s data to avoid resolving the variable again, since the property’s declaration hasn’t change. We would need to apply the css cascade again, but at least we could save the cost of the variable resolution.

    Of course, at the end of the day, improving performance has costs and challenges – and it’s hard to keep performance even once you get it. Bit if we really want performant CSS Custom Properties, this means that we have to decide to prioritize this work. Currently there is reluctance to explore the concept of a new Custom Properties specific cache – the challenge is big and the risks are not non-existent; cache invalidation can get complicated. But, the point is that we have to understand that we aren’t all going to agree what is important enough to warrant attention, or how much investment, or when. Web authors must convince vendors that these use cases are worth being optimized and that the cost and risks of such a complex challenges should be assumed by them.

    This work has been sponsored by Bloomberg, which I consider one of the most important contributors of the Web Platform. After several years, the vision of this company and its responsibility as consumer of the platform has lead to many and important contributions that we all enjoy now. Although CSS Grid Layout might be the most remarkable one, there are may other not that big, like this work on CSS Custom Properties, or several other new features of the CSS Text specification. This is a perfect example of an company that tries to change priorities and adapt the web platform to its needs and the use cases they consider more aligned with their business strategy.

    I understand that not every user of the web platform can do this kind of investment. This is why I believe that initiatives like Open Priorization could help to move the web platform in a positive direction. By providing a way for us to move past a lot of these conversation and focus on the needs that some web authors and users of the platform consider more important, or higher priority. Improving performance for CSS Custom Properties isn’t currently one of the projects we’ve listed, but perhaps it would be an interesting one we might try in the future if we are successful with these. If you haven’t already, have a look and see if there is something there that is interesting to you or your company – pledges of any size are good – ten thousand $1 donations are every bit as good as ten $1000 donations. Together, we can make a difference, and we all benefit.

    Also, we would love to hear about your ideas. Is improving CSS Custom Properties performance important to you? What else is? Share your comments with us on Twitter, either me (@lajava77) or our developer advocate Brian Kardell (@briankardell), or email me at jfernandez@igalia.com. I’d be glad to answer any question about the Open Priorization experiment.

    by jfernandez at August 13, 2020 06:16 PM

    August 12, 2020

    Brian Kardell

    Reimagining the Vendor-Only Model

    Reimagining the Vendor-Only Model

    I have been trying to organize my personal thoughts on this and figure out how to carefuly articulate them for a while now. I hadn't planned on posting this just yet but recent events and discussions make me think that maybe it is worth sharing.

    I've been talking a lot this year about the health of the Web Ecosystem[1], [2], [3],, kind of trying to raise the larger discussion that we should have a discussion about how we think about it and re-consider how we ensure it remains (or increases) its vibrance. Our recent open-prioritization experiment is meant, in part, to prompt some of this discussion and begin looking at ways to make things better. Recent news from Mozilla, it seems, has done as much to prompt some urgency on this discussion in people's minds, so let's dig into it.

    One comment that I heard a lot over the years goes something like this...

    "This is vendors responsibility - and their companies make bajillions of dollars on the web... Why don't they just hire more people?.

    It's a very natural question since this is the way we tried look at it in practice for roughly the first 25 years, and it's mostly how many of us still think about it. It's hard to even imagine something else with that kind of inertia. And... Maybe that model is fine.

    But then again... Maybe not. I think it's worth considering together, because I am ever increasingly on the "probably not" side. So, now that the conversation is started, let me try to explain why...

    The trouble with the vendor-only model...

    In the entire world, at the time of this writing, only 3 organizations currently make the level of investments necessary to maintain a rendering engine at all. Developers sometimes express frustration at the level of investment because 2 of these 3 organizations are part of the most profitable companies on earth, and they have chosen to invest very differently. That's fine. You can continue to do that, and I'm not suggesting that is invalid ... But I think this somewhat of a red herring and that we might be missing the more important conversation.

    At some level, what is far more interesting than the specific arbitrary amount and how it relates to their profits (by this argument both of them could, in theory, invest more) is the fact that they are under no actual obligation to do so, and that in the end,there are no guarantees about continued level of investment.

    Don't get me wrong: I'm tremendously happy that they do, and I absolutely want them to continue playing a core role - just not alone. I think this is too important to not discuss - because we've already seen several varieties of changes here over time. Whatever the world looks like today - the only thing you can be sure of is that change is inevitable. As hard as it is to believe, all the paradigms of organizational power will eventually shift. The Fortune 500 changes, ultimately many declare bankruptcy and/or are completely re-imagined. Even if we are extremely fortunate and a company really wants, at its very core, to put in a huge level of investment - there may come a day when they just can't.

    In other words: The traditional model of a few big vendors voluntarily funding development of the Web with very specific revenues puts all of eggs in a very particularly centralized and fickle basket. Just this week, there is related news from Mozilla, kind of demonstrating the latter point.

    Conversely: diversifying investments in the commons would buffer it significantly - and thankfully, it would seem that there are plenty of opportunities to do so. Let's look some..

    Diversifying investments

    The most obvious and immediate thing to do is to try to expand the sorts of things that have already begun happening and working...

    Finding additional ways of funding an organization like Mozilla itself is a good idea. Experimenting with new models for funding development like Brave or Puma which could potentially make a browser itself actually profitable are interesting ideas too.

    Some things, like CSS Grid, are examples of really big investments from non-browser vendors. A whole lot of that implementation in two browsers was funded by Bloomberg Tech and done by Igalia thanks to the increasingly open and collaborative nature of all of the engines. That is great too! There are many wealthy organizations out there (Fortune companies)who can help expand investment like that. If you work for an organization who is intersted in that, reach out to me!

    However, this avenue isn't immediately appealing even to a lot of big companies. Could we do more?

    Sure, why not! There is a much bigger group still of big companies who perhaps won't, themselves, fund a whole feature. What we are trying to show with open-prioritization is that there are many ways that we can pool money toward a specific purpose. Several big companies can share the costs. That seems good, and it's one thing that open-prioritization allows for: A comparatively small number of really big businesses working better together would be excellent all on its own.

    But then... If we're exploring diversifying funding, why place some arbitrary limits on "bigness"? What makes them special? Perhaps you don't have to be a mega-company to help advance things? Experimenting with this is also part of what open-prioritization is about -Maybe there doesn't even have to be a single answer. In fact, maybe it's better if there isn't!

    Exploring many models

    When I was a kid, and cable television was "new", my grandmother thought it was just ridiculous that someone would pay for television. After all, it came over the air wave "for free". Of course... It wasn't free, and despite it costing a lot to produce and run, it was a money making endeavor. The cost to maintain a station is comparatively tiny compared to the potential profits they could make from shoving advertising in our eyeballs. You paid. All the way through the system. A small amount of your product purchases funded advertising, which funded broadcast.

    Actually, the web isn't that different here. It is traditionally largely supported by very few revenues, mainly advertising in search. When ad revenue takes a hit, so do the budgets.

    But over time we've created lots of ways to explore different models. When it comes to watching TV and movies, I haven't seen "traditional commercial TV" in years - I use some other model to pay for content more directly. And, honestly, this has been good in a whole lot of ways. It's generated some real good content even. Still, there are other forms of short form content - podcasts or music services, for example, that I actually prefer the advertising model for. And, you can also donate to public radio.

    Open prioritization is aimed at exploring all of the different ways that we can realize our potential for power and give people opportunities to improve things in many ways.

    With a way to pool funding, there doesn't have to be artificial barriers. We can give smaller organizations opportunities to participate however they can: Maybe that is simply through advocating/lobbying bigger companies for specific dollars for development of specific thing. Perhaps sometimes they don't even need a bigger company. Perhaps together, several smaller organizations can rally together to act as a big one? It seems good to allow them to.

    The one thing that you'll notice that all answers have in common is that ultimately, somehow lots of people are paying in small amounts, and lots of little bits add up to a lot.

    The potential power of collectives

    With no arbitrary limits, even developers themselves could play a big, very direct role... If we want to.

    Should we? I don't know!

    I'd love to talk about it though because I think that hidden in here is something with immense potential for good by voluntarily even small commitments simply by realizing our sheer numbers. This is sort of the power of collectives. If you're looking for interesting analaogies, check out Pia Mancini's Talks/Media posts explaining things like how and why this can also work for governments.

    Consider that meetup.com alone lists something like 18 million people who identify as Web developers. Using that as just some kind of working number: If even 1/10th of those developers decided to voluntarily contribute a single dollar each month, this would be over 21.5 million dollars annually that we could use to collectively directly decide what should happen. This seems very plausible as you need far less than 1/10th if we assume that some people would be willing to give $2 or $5 or maybe even $10.

    In other words, in addition to all of the other ways that we can explore, even we developers actuallly have enormous potential untapped power even with very limited kinds of investment to play a role (not exclusively)... That's more than we can convince most big companies to invest, to directly shape things today. That would be some extreme diversification, and a real "commons". To me that seems really good to have in the mix... If we want it.

    I guess the question is: Do we?

    I am willing to accept the possibility that it is possible that the answer is simply "No thanks" - but I think it's great that at least we have the choice/opportunity and an avenue to begin to consider all of these things, directly, and have the discussion. I think that's a real positive. What do you think? I'd love to hear from you. If you like this idea, pass it on and maybe consider pledging $1 to one of the pledge collectives in our open-prioritization experiment to help advance the larger ideas and conversation.

    Open Prioritization, by Igalia: An experiment in crowdfunding prioritization

    August 12, 2020 04:00 AM

    July 31, 2020

    Alejandro Piñeiro

    v3dv status update 2020-07-31

    Iago talked recently about the work done testing and supporting well known applications, like the Vulkan ports of the Quake1, Quake 2 and Quake3. Let’s go back here to the lastest news on feature and bugfixing work.

    Pipeline cache

    Pipeline cache objects allow the result of pipeline construction to be reused. Usually (and specifically on our implementation) that means caching compiled shaders. Reuse can be achieved between pipelines creation during the same application run by passing the same pipeline cache object when creating multiple pipelines. Reuse across runs of an application is achieved by retrieving pipeline cache contents in one run of an application, saving the contents, and using them to preinitialize a pipeline cache on a subsequent run.

    Note that it may happens that a pipeline cache would not improve the performance of an application once it starts to render. This is because application developers are encouraged to create all the pipelines in advance, to avoid any hiccup during rendering. On that situation pipeline cache would help to reduce load times. In any case, that is not always avoidable. In that case the pipeline cache would allow to reduce the hiccup, as a cache hit is far faster than a shader recompilation.

    One specific detail about our implementation is that internally we keep a default pipeline cache, used if the user doesn’t provide a pipeline cache when creating a pipeline, and also to cache the custom shaders we use for internal operations. This allowed to simplify our code, discarding some custom caches that were already implemented.

    Uniform/storage texel buffer

    Uniform texel buffers define a tightly-packed 1-dimensional linear array of texels, with texels going through format conversion when read in a shader in the same way as they are for an image. They are mostly equivalent to OpenGL buffer texture, so you can see them as textures backed up by a VkBuffer (through a VkBufferView). With uniform texel buffers you can only do a formatted load.

    Storage texel buffers are the equivalent concept, but applied to images instead of textures. Unlike uniform texel buffers, they can also be written to in the same way as for storage images.

    Multisampling

    Multisampling is a technique that allows to reduce aliasing artifacts on images, by by sampling pixel coverage at multiple subpixel locations and then averaging subpixel samples to produce a final color value for each pixel. We have already started working on this feature, and included some patches on the development branch, but it is still a work in progress. Having said so, it is enough to get Sascha Willems’s basic multisampling demo working:

    Sascha Willems multisampling demo run on rpi4

    Bugfixing

    Again, in addition to work on specific features, we also spent some time fixing specific driver bugs, using failing Vulkan CTS tests as reference. So let’s take a look of some screenshots of Sascha Willem’s demos that are now working:

    Sascha Willems deferred demo run on rpi4

    Sascha Willems texture array demo run on rpi4

    Sascha Willems Compute N-Body demo run on rpi4

    Next

    We plan to work on supporting the following features next:

    • Robust access
    • Multisample (finish it)

    Previous updates

    Just in case you missed any of the updates of the vulkan driver so far:

    Vulkan raspberry pi first triangle
    Vulkan update now with added source code
    v3dv status update 2020-07-01
    V3DV Vulkan driver update: VkQuake1-3 now working

    by infapi00 at July 31, 2020 08:23 AM