Planet Igalia

December 02, 2021

Danylo Piliaiev

🎉 Turnip is Vulkan 1.1 Conformant 🎉

Khronos submission indicating Vulkan 1.1 conformance for Turnip on Adreno 618 GPU.

It is a great feat, especially for a driver that was created without hardware documentation. And we support features well beyond the bare minimum required for conformance.

But first of all, I want to thank and congratulate everyone working on the driver: Connor Abbott, Rob Clark, Emma Anholt, Jonathan Marek, Hyunjun Ko, Samuel Iglesias. And special thanks to Samuel Iglesias and Ricardo Garcia for tirelessly improving Khronos Vulkan Conformance Tests.


At the start of the year, when I started working on Turnip, I looked at the list of failing tests and thought “It wouldn’t take a lot to fix them!”. Right, sure… And so I started fixing issues alongside looking for missing features.

In June there were even more failures than in January. How could that be? Of course, we were adding new features, and that accounted for some of them. However, even this list was likely not exhaustive, because in GitLab CI we ran only 1/3 of the Vulkan CTS instead of the whole suite. We didn’t have enough devices to run the whole suite fast enough to make it usable in CI. So I just ran it locally from time to time.

Running 1/3 of the tests doesn’t sound bad, and for the most part it’s good enough, since we have a huge number of tests that look like this:

dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_copy
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_copy_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_load
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_load_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_texture
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_texture_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_copy
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_copy_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_load
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_load_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_texture
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_texture_format_list
...

Every format, every operation, etc. Tens of thousands of them.

Unfortunately, the selection of tests for a fractional run is as simple as possible: just every third test. That bites us when there are unique tests, like:

dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_depth
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_depth
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_depth_no_attachment
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil_no_attachment
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_depth_no_attachment
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil_no_attachment
...

Most of them test something unique that has a much higher probability of triggering a special path in a driver than the countless image tests do. And they fell through the cracks. I even had to fix one test twice because the CI didn’t run it.

A possible solution is to skip tests only when there is a large swath of them and run smaller groups as-is. But it’s likely more productive to just throw more hardware at the issue =).
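The difference between the naive "every third test" selection and the group-aware alternative described above can be sketched in C. This is a minimal illustration, not the actual CI tooling (which the post does not show): it assumes the test list is sorted so that tests of one group are contiguous, treats everything before the last "." as the group name, and min_group is a made-up threshold for what counts as a "large swath".

```c
#include <stddef.h>
#include <string.h>

/* Length of the group prefix: everything before the last '.' in a test name. */
static size_t group_len(const char *name) {
    const char *dot = strrchr(name, '.');
    return dot ? (size_t)(dot - name) : strlen(name);
}

static int same_group(const char *a, const char *b) {
    size_t la = group_len(a);
    return la == group_len(b) && strncmp(a, b, la) == 0;
}

/* Naive fractional run: keep every `fraction`-th test, ignoring groups. */
static size_t select_naive(const char **tests, size_t n, size_t fraction,
                           const char **out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (i % fraction == 0)
            out[k++] = tests[i];
    return k;
}

/* Group-aware run: sample only groups with at least `min_group` tests,
 * and run smaller groups in full. Assumes groups are contiguous. */
static size_t select_grouped(const char **tests, size_t n, size_t fraction,
                             size_t min_group, const char **out) {
    size_t k = 0, start = 0;
    while (start < n) {
        size_t end = start + 1;
        while (end < n && same_group(tests[start], tests[end]))
            end++;
        size_t size = end - start;
        for (size_t i = start; i < end; i++)
            if (size < min_group || (i - start) % fraction == 0)
                out[k++] = tests[i];
        start = end;
    }
    return k;
}
```

With nine image tests followed by four early_fragment tests, the naive 1/3 run keeps only two of the four unique tests, while the group-aware run (min_group = 8) samples the large group and keeps all four small ones.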

Not enough hardware in CI

Another problem is that we had only one 6xx sub-generation present in CI, the one Adreno 630 belongs to. We distinguish four sub-generations. Not only do they have some different capabilities, there are also differences in the capabilities they share, causing the same test to pass in CI and be broken on another, newer GPU. Presently in CI we test only Adreno 618 and 630, which are both “Gen 1” GPUs, and we claimed conformance only for Adreno 618.

Yet another issue is that we can render in either tiling or bypass (sysmem) mode. That’s because there are a few features we can support only when there is no tiling and we render directly into sysmem, and sometimes rendering directly into sysmem is simply faster. At the moment we use tiling rendering by default unless we hit an edge case, so by default the CTS exercises only tiling rendering.

We force sysmem mode for a subset of tests in CI; however, that’s not enough, because the difference between the modes matters for more than just a few tests. Ideally we should run twice as many tests, and even better thrice as many, to also cover tiling mode without the binning vertex shader.

That issue became apparent when I implemented a magical eight-ball that chooses between tiling and bypass modes based on run-time information, in order to squeeze out more performance (it’s still a work in progress). The basic idea is that a single draw call, or a few small ones, is faster to render directly into system memory than by loading the framebuffer into tile memory and storing it back. But almost every single CTS test does exactly this: a single draw call or a few per render pass, which causes all tests to run in bypass mode. Fun!
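To make the CTS problem concrete, here is a hypothetical sketch of such a run-time choice. The post does not describe Turnip's actual heuristic, so the struct, the function and both thresholds below are invented for illustration only; the point is that any rule which favours sysmem for cheap render passes will send a typical one-draw CTS pass down the bypass path.

```c
#include <stdbool.h>
#include <stdint.h>

enum render_mode { RENDER_TILING, RENDER_SYSMEM };

/* Hypothetical per-render-pass statistics gathered while recording commands. */
struct pass_stats {
    uint32_t draw_count;   /* number of draw calls in the render pass */
    uint64_t vertex_count; /* total vertices submitted in the pass */
    bool needs_sysmem;     /* uses a feature that only works without tiling */
};

/* Made-up thresholds; real values would have to come from profiling. */
#define MAX_SYSMEM_DRAWS    4u
#define MAX_SYSMEM_VERTICES 1024u

static enum render_mode choose_mode(const struct pass_stats *s) {
    if (s->needs_sysmem)
        return RENDER_SYSMEM; /* edge case: tiling is not possible */
    if (s->draw_count <= MAX_SYSMEM_DRAWS &&
        s->vertex_count <= MAX_SYSMEM_VERTICES)
        return RENDER_SYSMEM; /* cheap pass: skip tile load/store */
    return RENDER_TILING;     /* heavy pass: tiling pays off */
}
```

Under a rule like this, a pass with one small draw picks sysmem while a game-sized pass picks tiling, so CI has to force each mode explicitly to cover both.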

Now we will be forced to deal with this issue, since with the magic eight-ball, games would run partly in tiling mode and partly in bypass mode, making both modes equally important for real-world workloads.

Does conformance matter? Does it reflect anything real-world?

Unfortunately, no test suite could wholly reflect what game developers do in their games. However, the number of tests keeps growing, and new tests are being contributed based on issues found in games and other applications.

When I ran my stash of D3D11 game traces through DXVK on Turnip for the first time, I found a bunch of new crashes and hangs, but fixing just a few of them was enough for the majority of games to render correctly. This shows that Khronos Vulkan Conformance Tests are doing their job, and we at Igalia are striving to make them even better.

by Danylo Piliaiev at December 02, 2021 10:00 PM

Samuel Iglesias

VK_EXT_image_view_min_lod Vulkan extension released

One of the extensions released as part of Vulkan 1.2.199 was VK_EXT_image_view_min_lod. I’m happy to see it published, as I participated in the release process of this extension: from reviewing the spec exhaustively (I even contributed a few things to improve it!) to developing CTS tests for it that will eventually be merged into the CTS repo.

This extension was proposed by Valve to mirror a feature present in Direct3D 12 (see ResourceMinLODClamp) and Direct3D 11 (see SetResourceMinLOD). In other words, this extension allows clamping the minimum LOD value accessed by an image view to a minLod value set at image view creation time.

That way, any library or API layer that translates Direct3D 11/12 calls to Vulkan can use the extension to mirror the behavior above directly in Vulkan without workarounds, facilitating the porting of Direct3D applications such as games to Vulkan. For example, projects like Vkd3d, Vkd3d-proton and DXVK could benefit from it.

Going into more detail: when it is enabled, this extension changes how image level selection is calculated, and it sets an additional minimum image level for integer texel coordinate operations.
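The effect on sampled accesses can be illustrated with a simplified model of mip level selection. This is a sketch under stated assumptions, not the spec's exact formula: lambda is assumed to already include derivatives, bias and the sampler's own LOD clamps, min_lod is treated as an absolute level, and the separate rule for integer texel coordinate operations is left out. The function names are hypothetical; the Vulkan specification's image level selection section is the authority.

```c
/* Clamp v into [lo, hi]. */
static float clamp_f(float v, float lo, float hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

static float max_f(float a, float b) { return a > b ? a : b; }

/* Simplified level selection for a sampled image view.
 *   lambda      - LOD computed for this access (assumed fully prepared)
 *   base_level  - the view's baseMipLevel
 *   level_count - the view's levelCount (assumed >= 1)
 *   min_lod     - minLod from VkImageViewMinLodCreateInfoEXT, 0.0 if absent */
static float select_level(float lambda, unsigned base_level,
                          unsigned level_count, float min_lod) {
    float q = (float)(level_count - 1);                 /* highest level in the view */
    float d = (float)base_level + clamp_f(lambda, 0.0f, q);
    return max_f(d, min_lod);                           /* the extension's extra clamp */
}
```

For example, with ten levels and minLod = 2.0, an access that would otherwise read level 0 reads level 2, while an access already at level 5 is unaffected.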

The way to use this feature in an application is very simple:

  • Check that the extension is supported and that the physical device supports the corresponding feature:
// Provided by VK_EXT_image_view_min_lod
typedef struct VkPhysicalDeviceImageViewMinLodFeaturesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           minLod;
} VkPhysicalDeviceImageViewMinLodFeaturesEXT;
  • Once you know both are supported, enable the extension and the feature when creating the device.

  • When you want to create a VkImageView that defines a minLod for image accesses, add the following structure, filled in with the value you want, to the pNext chain of VkImageViewCreateInfo.

// Provided by VK_EXT_image_view_min_lod
typedef struct VkImageViewMinLodCreateInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    float              minLod;
} VkImageViewMinLodCreateInfoEXT;

And that’s all! As you see, it is a very simple extension.
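The pNext mechanism used in the last step can be shown without the Vulkan headers. The sketch below uses mock stand-ins (the Mock* types and sType values are made up, not vulkan.h) to show how an implementation might find the min-LOD structure in the chain and fall back to 0.0 when it is absent. In real code you would use VkImageViewMinLodCreateInfoEXT with sType set to VK_STRUCTURE_TYPE_IMAGE_VIEW_MIN_LOD_CREATE_INFO_EXT and point VkImageViewCreateInfo::pNext at it.

```c
#include <stddef.h>

/* Mock stand-ins for Vulkan types; NOT the real headers, values are made up. */
typedef enum {
    MOCK_STYPE_IMAGE_VIEW_CREATE_INFO = 1,
    MOCK_STYPE_IMAGE_VIEW_MIN_LOD_CREATE_INFO = 2
} MockStructureType;

/* Common header shared by every chained structure (like VkBaseInStructure). */
typedef struct MockBaseIn {
    MockStructureType sType;
    const struct MockBaseIn *pNext;
} MockBaseIn;

typedef struct {
    MockStructureType sType;
    const void *pNext;
    float minLod;
} MockImageViewMinLodCreateInfo;

typedef struct {
    MockStructureType sType;
    const void *pNext;
    unsigned baseMipLevel;
} MockImageViewCreateInfo;

/* Walk the pNext chain looking for a structure with the given sType. */
static const void *find_in_chain(const void *head, MockStructureType type) {
    for (const MockBaseIn *s = head; s != NULL; s = s->pNext)
        if (s->sType == type)
            return s;
    return NULL;
}

/* minLod for the view being created: 0.0 when no min-LOD struct is chained. */
static float view_min_lod(const MockImageViewCreateInfo *info) {
    const MockImageViewMinLodCreateInfo *ml =
        find_in_chain(info->pNext, MOCK_STYPE_IMAGE_VIEW_MIN_LOD_CREATE_INFO);
    return ml ? ml->minLod : 0.0f;
}
```

Because unknown structures are simply skipped while walking the chain, an application can add the min-LOD structure without touching the rest of its VkImageViewCreateInfo setup.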

Happy hacking!

December 02, 2021 02:48 PM

November 29, 2021

Brian Kardell

Webrise


Something in a recent piece by Jeremy Keith really clicked with something I’ve been thinking about, so I thought I’d write about it.

Recently Jeremy Keith published “The State of the Web”. It’s based on an opening talk from the An Event Apart Spring Summit earlier this year. He’s also made it available in audio format. Jeremy is a great storyteller/writer/speaker, so it is unsurprisingly a delightful read/listen, and I can’t recommend it enough. In it, he (beautifully) explains a lot about perspective. He points out that

“Astronauts have been known to experience something called the overview effect. It’s a profound change in perspective that comes from seeing the totality of our home planet in all its beauty and fragility.”

He notes that the famous “Earthrise” photo (below) that the astronauts took gave everyone here on Earth a very small taste of that too.

Earthrise, taken on December 24, 1968, by Apollo 8 astronaut William Anders.

Then he asks...

“I wonder if it’s possible to get an overview effect for the World Wide Web?”

When I heard this, I realized: This is exactly what I have been trying to get at too, just from a different angle.

Zoom out...

I have been trying to ask people to put aside a lot of the conversations that we typically have for a moment. Zoom out, and see the whole ecosystem.

It’s not that all of the things we talk about today are unimportant; in fact, some of them are profoundly important. But when you zoom way out and look at the whole thing, you gain some new perspective on all of them... And more.

Gaining new perspective can have big impacts. In How We Got to Now, Steven Johnson describes how mirrors, and the simple ability to see oneself, ultimately impacted art, literature and politics. It literally helped shape the world in profound ways.

Since coming to work at Igalia, I’ve had the rare privilege of observing the web ecosystem from a whole new point of view: not the sites and pages, but what makes all of that stuff possible and holds it all together. This has caused exactly that sort of “overview effect” shift for me, and I really want to share it.

There is, unfortunately, no camera with which to snap a nice neat “Webrise” photo that I can distribute, nor a mirror I can just show people. So, I’ll try to use words.

New Perspective

We spend so much time discussing particular details: “Why are none of them giving time to feature Q?”. Or, “Why does z push so many (or few) features?”. Or even, “Why are they all doing x instead of y?!”. We imagine larger motives. We fill volumes with debates.

But, from my vantage point, I see something that informs all of those, and seems far more important.

We’ve built the web so far on a very particular model, with web engine implementers at the center. The whole world is leaning extremely heavily on the independent, voluntary funding (and management and prioritization) of very few steward organizations.

Maybe that isn’t a great idea.

Fragility

However much we might have convinced ourselves that this is how it should work, it feels increasingly bad to me. It seems like it is not a lasting strategy, and we really need one that is. Web engines have to outlast each of those voluntary investments/investors. The current situation feels precarious.

There are, of course, a lot of variables, but I can easily imagine a lot of ways that it could all come apart - either with a bang, or a whimper.

Imagine, for example, that Apple is convinced to match or even exceed Google’s contributions. Yay! A boom in innovation! Great! But it seems hard to imagine Mozilla not being left behind. Maybe Google responds in kind and we enter a sort of new “arms race”. Again, great for a lot of things, but sustainability isn’t one of them. In this scenario it feels almost certain to me that Mozilla (the only foundation here) is the first casualty, but maybe not the last. The aim of a war is to win. And then what? Microsoft won the first browser wars, and then left the game.

Or perhaps legislation hits Google’s default search deal and seriously disrupts things in the ecosystem. That’s pretty much all of the actual funding for Mozilla. Uh oh. Interestingly, though, Apple’s entire earnings would also suddenly dip by something like 15-20%. Yikes! Perhaps there are simply changes in leadership. Or several of these happen together. All sorts of things like these cause companies to re-evaluate what they’re spending money on. If any of these things caused reevaluation by either company, it’s not impossible for me to imagine them deciding that the costs of maintaining an engine in the first place outweigh the benefits. That’s an entirely normal response, and there are historical precedents: Opera did it. Microsoft did it. And the problems in this space only get harder and harder (see next section).

The only thing I can say for sure is that things will change for the businesses we’re leaning on. In fact, things have changed. In 1993, when the web was still in its infancy, Microsoft had just entered the top 10 by market cap. By actual revenue, they weren’t even on the list. In fact, the first engine-making company to appear on the top 10 list by revenue was Apple, in 2014. At the same time, several other tech companies on that list don’t invest in implementations at all. Many have come and gone since. It is incredibly rare for a business to stay on the Fortune 500 list (let alone dominate it) for more than 10-15 years. When that status is lost, reevaluation and action usually follow, and as a result, key dominant names in computing have disappeared entirely.

Not just more sustainable… More.

Being reliant on the historical model isn’t just possibly precarious in the long term; it also has definite limits. All of the engine teams, no matter how big, have to do some fairly radical prioritization. The backlog is already miles long and subject to lots of filters. It’s not just about how big a team is, but tons of mundane things about its makeup and expertise, and often the vision for and current state of some area of code in their engine.

A really basic implication of this is that feature rollouts can be extremely ragged, but it’s much more than that. It also means that teams have to short-circuit things where they can. Even in standards discussions, it means a lot of potentially good stuff just can’t get discussed. In the end, if only a few are investing in the commons, it’s hard to work through all of those things in a way that is representative of everyone.

Solvable, but not solved anywhere

Luckily, this is all very solvable, and very much in our control. To some extent we’ve already started to address it: today there are also some more limited partners. I think people have a vague impression of this, but we don’t talk about the actual details much, so let’s…

It’s held up, for example, that Microsoft, Samsung and Intel are all Chromium partners. That’s great, and that isn’t even counting Igalia, who has, for the last few years, made more contributions than anyone outside of Google.

I’ve heard others say that Mozilla has a veritable army of independent contributors too.

Conversely, I often hear WebKit described as “mainly an Apple thing”. However, there are partners there too. Igalia, Sony and Red Hat all contribute significantly, for example.

However, if we look at commits: Over 80% of contributions to Chromium come from Google, about 77% of contributions in WebKit come from Apple, and at Mozilla Central - about 82% of commits are from Mozillans.

In other words, they aren’t all that different in terms of diversity of investment. In each engine project, only about a fifth (roughly 18-23%) of the contributions come from outside the lead organization. That’s way better than exclusive investment, but I think we’ve still got a long way to go.

One obvious solution seems to be for existing implementation partners to simply ramp up contribution budgets.

Sure, that would be great on its own, but that’s still a really small number of organizations. There are many big tech companies who aren’t on that list at all, and at least one of them is in the “trillion dollar club”.

Imagine what we could do if we changed our perspective, and built a model in which we invested and prioritized more collaboratively. Imagine how much more resilient that would be.

Collective Funding for Collective Benefits

We’ve spent a lot of time trying to solve problems together in standards, but we don’t then also act together. But… we could.

In fact, why should we stop at a dozen big tech companies making gigantic or general investments? We could decide that investments could also be shared in different ways. Funding doesn’t need to come from giant sources, or to be general-purpose. 10 companies agreeing to invest $10k apiece to advance and maintain some area of shared interest is every bit as useful as 1 agreeing to invest $100k generally. In fact, maybe it’s more representative.

Igalia has helped advance things that boost capabilities for everyone by working with individual organizations, often much smaller ones, who have fairly limited asks: more responsive cable box interfaces, or more fluid SVG interfaces on their cooking machines. We do this precisely because we can see how interconnected it all is.

We believe that there is a very long tail of increasingly smaller companies who could do something, if only they coordinated to fund it together. The further we stretch this out, the more sources we enable, the more its potential adds up.

That’s part of what our Open Prioritization efforts have been about. We’re trying to shine light on this in different ways, open new doors and help people see the web ecosystem from a different perspective.

My colleague Eric Meyer recently gave a talk to W3C member organizations on this topic, and we did a podcast together on it too, as part of announcing our new MathML-Core Support Collective. You can find links to both and learn more about it in this announcement.

If you find this interesting, please let us know. Consider talking to organizations interested in promoting more rapidly interoperable, standard, and accessible mathematical support on the web about adding some supporting funding through the collective, but also talk about the bigger idea with organizations that aren’t interested in math specifically. I’m hopeful that we can shift our perspective.

November 29, 2021 05:00 AM

November 23, 2021

Alexander Dunaev

Drop shadows on Linux, or why standards are good

Since the origins of graphical desktop environments, there have been two approaches to styling an application’s GUI: using the standard system toolkit or choosing a custom one.

When a single platform is targeted, choosing the approach is often a matter of aesthetics or of particular features that may be supported only in certain toolkits. The additional cost of adopting a custom toolkit may actually be a one-time investment, and if the decision to use it is taken at the right time, the cost may be low. However, when it comes to cross-platform applications, using a cross-platform toolkit is the obvious choice.

GUI toolkits do a good job at rendering the contents of the window, but there is an area where they usually step aside: window decorations. Even if we look at cross-platform toolkits, the best they can do is provide some façade for the standard options available on supported platforms. But what if we want to customise everything?

Let us take a look at some random window in a modern desktop environment.

This is KCalc, the standard calculator application built into the KDE Plasma desktop environment.

KCalc, the standard application built into KDE Plasma

What if we wanted to replicate that on our own? At first glance, no big deal. Drawing the title bar would not be that difficult, as long as we render everything in the window. The border is easy too, and rounded corners are also feasible if the window manager supports transparency.

But the window also has a drop shadow. We have to render it too, and this is where things become tricky.


KCalc vs. Chromium, note how different the shadows are

Yes, the drop shadow is essentially just one more area inside the window: we have to render it, and we also have to make things around it work smoothly. The inner strip of the shadow should act as the frame of the window, where the user sees the resize mouse pointer (and resizing should actually work there), while the outer part should be completely transparent to mouse events, but not completely transparent to the user’s eye.

The outermost rectangle is the edge of the “real” window; the innermost one is the “logical” one. The narrow strip (partially striped) that borders the logical window is the resize area.
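The three regions in the figure can be expressed as a small hit-test. This is a hypothetical sketch, not Chromium's code: it assumes a uniform shadow of shadow pixels on every side and a resize strip made of the innermost resize pixels of that shadow. It only classifies a point; actually making the outer part click-through is done separately, for example with an input region (wl_surface.set_input_region on Wayland, or the input shape of the X Shape extension on X11).

```c
enum hit_region { HIT_PASSTHROUGH, HIT_RESIZE, HIT_CLIENT };

/* Hypothetical layout: the "real" window is real_w x real_h pixels; the
 * logical window is inset by `shadow` pixels on every side; the resize strip
 * is the innermost `resize` pixels of the shadow (resize <= shadow). */
struct deco_layout { int real_w, real_h, shadow, resize; };

static enum hit_region hit_test(const struct deco_layout *l, int x, int y) {
    int left = l->shadow, top = l->shadow;
    int right = l->real_w - l->shadow, bottom = l->real_h - l->shadow;
    if (x >= left && x < right && y >= top && y < bottom)
        return HIT_CLIENT;                 /* inside the logical window */
    int slack = l->shadow - l->resize;     /* width of the purely visual part */
    if (x >= slack && x < l->real_w - slack &&
        y >= slack && y < l->real_h - slack)
        return HIT_RESIZE;                 /* inner strip: show resize cursor */
    return HIT_PASSTHROUGH;                /* outer shadow: events go through */
}
```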

Basically, to be able to do what we have just explained, we need two things. The first one is support for transparency in the window manager. The second one is some way to tell the window manager where our “logical” window resides within the “real” one, so that the environment could correctly snap our window to the edge of the screen or to other windows when we drag it there. (The inner part that makes sense as a window to the user is often called “window geometry”.)

On Wayland, transparency is always supported (yay!), and the concept of the window geometry is part of the desktop shell protocol, such as xdg_wm_base. Both requirements are met.

On X11 it is more complicated. First, transparency is not always supported, but let us assume that we have it; otherwise we cannot have any shadows at all. The major pain is setting the window geometry, or rather, the lack (at the time of writing) of a standard way to do so. There is a _GTK_FRAME_EXTENTS window property that, as its name suggests, was once introduced in GTK. There it seems to be used to define margins at the edges of the window. “It seems”, you may ask; are you not certain? Well, I am not, because that property is not documented. There are a few other posts about this issue on the internet. I would recommend What are _GTK_FRAME_EXTENTS and how does Gnome Window Sizing work? by Erwin and CSD support in KWin by Vlad Zahorodnii.

Currently _GTK_FRAME_EXTENTS is supported by GNOME (naturally) and KDE Plasma (reverse engineered). In other desktop environments (or rather, in window managers other than Mutter and KWin), setting it may cause weird issues.

That is precisely what happened to Chromium.

When it came to window decorations, the Linux port of Chromium lagged behind for a very long time. It had an old-style thick frame with sharp corners and no drop shadow. Finally, that was improved, and modern window decorations shipped in Chromium version 94. The new implementation used _GTK_FRAME_EXTENTS to define the shadow area.

Soon after that, a bug report came from users of Enlightenment. In that environment, things inside the Chromium window went haywire: mouse clicks strayed from the actual position of the pointer. A quick investigation (it was really quick thanks to the help of the people who reported the problem) showed that the culprit was that very window property. The window manager got confused when the frame extents were set to zeros for a maximised window; instead, it expected the property to be reset completely.

Soon after we landed the fix and people from Enlightenment confirmed that the issue was resolved, another bug report came in, this time from Xfce. There, the investigation took a bit longer, but finally we found (thanks to the help of the people who reported the problem and to the maintainers of the window manager) that the window manager in that environment actually expects quite the opposite: for a maximised window it wants all zeros, and it gets confused if the property is reset completely.

The situation came to a dead end: two window managers wanted exactly opposite things. What could be done to resolve the issue? We could easily end up with workarounds for every non-standard window manager, which is one of the most unpleasant situations in software maintenance.

Luckily, the maintainers of Xfwm4 (the window manager in Xfce) suggested fixing the issue from their side, and landed the fix really promptly. So this story has a happy end!

Or rather, the story will have a happy end, because we still had to put in a workaround that disables the new window decorations on Xfwm4. The workaround is temporary, and we will remove it once Linux distributions that ship Xfwm4 adopt the fix.
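The resulting decision logic can be sketched as a small policy function. Everything here is hypothetical and is one reading of the story, not Chromium's actual code: it assumes the client ends up unsetting the property for maximised windows (what Enlightenment expected, and what the patched Xfwm4 presumably tolerates), and skips custom decorations entirely on an unpatched Xfwm4. The left, right, top, bottom order of the four values mirrors what GTK is understood to write, which is itself an assumption since the property is undocumented. In a real X11 client, the SET and DELETE outcomes would map to Xlib's XChangeProperty (32-bit CARDINAL values) and XDeleteProperty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Margins between the "real" window and the logical one, in the order
 * left, right, top, bottom (assumed; the property is undocumented). */
struct frame_extents { uint32_t left, right, top, bottom; };

enum extents_action {
    EXTENTS_SET,           /* write the four values to _GTK_FRAME_EXTENTS */
    EXTENTS_DELETE,        /* remove the property entirely */
    NO_CUSTOM_DECORATIONS  /* temporary fallback: no custom shadow at all */
};

static enum extents_action plan_extents(bool wm_is_unpatched_xfwm4,
                                        bool maximized,
                                        const struct frame_extents *shadow,
                                        struct frame_extents *out) {
    if (wm_is_unpatched_xfwm4)
        return NO_CUSTOM_DECORATIONS; /* the workaround described above */
    if (maximized)
        return EXTENTS_DELETE;        /* unset the property rather than write zeros */
    *out = *shadow;                   /* normal window: advertise the shadow size */
    return EXTENTS_SET;
}
```

Keeping this choice in one place is what makes a temporary, window-manager-specific workaround easy to remove later.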

by Alex at November 23, 2021 01:19 PM

November 19, 2021

Tim Chevalier

The emotional roller coaster that is programming

I skipped a few days’ worth of updates; turns out it’s a bit difficult to fit in time to write an entire post when your work schedule isn’t very consistent.

In the meantime, I finished implementing all the record and tuple opcodes in the JIT. Having done some manual testing, it was time to start running the existing test suite with the compiler enabled. Fortunately, I figured out the flag to pass in so that I wouldn’t have to add it to each test file by hand (which would have been bad practice anyway):

mach jstests Record --args=--baseline-eager

This runs all the tests with Record in the name and passes the --baseline-eager flag to the JavaScript shell.

At this stage, failures are good — it means there’s still something interesting left to work on. Yay, a failure!

Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
REGRESSION - non262/Record/equality.js
[7|1|0|0] 100% ======================================================>|   1.1s
REGRESSIONS
    non262/Record/equality.js
FAIL
 

Narrowing down the code that caused the failure, I got:

js> Object.is(#{x: +0}, #{x: -0})
Object.is(withPosZ, withNegZ)
Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745

Thread 1 "js" received signal SIGSEGV, Segmentation fault.
0x00005555583763f8 in js::jit::CallIRGenerator::tryAttachObjectIs (this=0x7fffffffcd10, callee=...)
    at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
7745            MOZ_CRASH("Unexpected type");
(gdb) 

So this told me that I hadn’t yet implemented the cases for comparing records/tuples/boxes to each other in Object.is() in the JIT.

Fixing the problem seemed straightforward. I found the CallIRGenerator::tryAttachObjectIs() method in CacheIR.cpp. The CallIRGenerator stub takes care of generating code for built-in methods as they’re called; each time a known method is called on a particular combination of operand types that’s implemented in the baseline compiler, code gets generated that will be called next time instead of either interpreting the code or re-generating it from scratch.

For example, this code snippet from tryAttachObjectIs() shows that the first time Object.is() is called with two int32 operands, the compiler will generate a version of Object.is() that’s specialized to this case, which saves the need to call a more generic method and do more type checks. Of course, the generated code has to include a check that the operand types actually are int32, and either call a different generated method or generate a new stub (a specialized version of the method) if they are not.

    MOZ_ASSERT(lhs.type() == rhs.type());
    MOZ_ASSERT(lhs.type() != JS::ValueType::Double);

    switch (lhs.type()) {
      case JS::ValueType::Int32: {
        Int32OperandId lhsIntId = writer.guardToInt32(lhsId);
        Int32OperandId rhsIntId = writer.guardToInt32(rhsId);
        writer.compareInt32Result(JSOp::StrictEq, lhsIntId, rhsIntId);
        break;
      }

The existing code handles cases where both arguments have type Int32, String, Symbol, Object, et al. So it was easy to follow that structure and add a case where both operands have a box, record, or tuple type. After a fun adventure through the MacroAssembler, I had all the pieces implemented and the test passed; I was able to apply Object.is() to records (etc.) with the baseline compiler enabled.
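The stub-plus-guard idea described above can be modelled in a few lines of C. This is a hypothetical toy, not SpiderMonkey's CacheIR: the real engine emits machine code and can keep several stubs per call site, whereas this sketch keeps a single stub, replaces it when its guard fails, and only models int32 and object operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagged value, loosely modelled on an engine's boxed Value. */
enum tag { TAG_INT32, TAG_OBJECT };
struct value { enum tag tag; int32_t i; const void *obj; };

/* One cached stub: the operand type it was specialized for, plus its fast path. */
struct stub {
    bool valid;
    enum tag expected;
    bool (*fast)(struct value, struct value);
};

static bool same_int32(struct value a, struct value b) { return a.i == b.i; }
static bool same_object(struct value a, struct value b) { return a.obj == b.obj; }

static int stubs_attached = 0; /* counts how often a new stub was generated */

/* Slow path: inspect the tags on every call. */
static bool object_is_generic(struct value a, struct value b) {
    if (a.tag != b.tag)
        return false;
    return a.tag == TAG_INT32 ? same_int32(a, b) : same_object(a, b);
}

static bool object_is(struct stub *s, struct value a, struct value b) {
    /* Guard: reuse the stub only when both operands have the expected tag. */
    if (s->valid && a.tag == s->expected && b.tag == s->expected)
        return s->fast(a, b);
    /* Guard failed (or no stub yet): take the slow path, then attach a
     * stub specialized to this operand type for next time. */
    bool result = object_is_generic(a, b);
    if (a.tag == b.tag) {
        s->valid = true;
        s->expected = a.tag;
        s->fast = a.tag == TAG_INT32 ? same_int32 : same_object;
        stubs_attached++;
    }
    return result;
}
```

Adding records, tuples and boxes to such a scheme means adding a new tag, a fast path for it, and, crucially, a guard that tests that new tag.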

After that, all the tests for records passed, which isn’t too surprising since there aren’t many methods for records. Next, I tried running the tests for what’s currently called Box in the Records and Tuples proposal (subject to change), and got more failures; still a good thing.

mach-with record-tuple-with-jit jstests Box --args=--baseline-eager
[1|0|0|0]  20% ==========>                                            |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/unbox.js
[1|1|0|0]  40% =====================>                                 |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/json.js
[1|2|0|0]  60% ================================>                      |   1.3s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/constructor.js
[2|3|0|0] 100% ======================================================>|   1.3s
REGRESSIONS
    non262/Box/unbox.js
    non262/Box/json.js
    non262/Box/constructor.js
FAIL

The common cause: generating code for any method calls on Boxes invokes GetPropIRGenerator::tryAttachPrimitive() (also in CacheIR.cpp as above), which didn’t have a case for records/tuples/boxes. (In JavaScript, a method is just another property on an object; so the GetProp bytecode operation extracts the property, and calling it is a separate instruction.) Similarly to the above, I added a case, and the code worked; I was able to successfully call (Box({}).unbox()) with the compiler enabled.

The next test failure, in json.js, was harder. I minimized the test case to one line, but wasn’t able to get it any simpler than this:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

This code calls the JSON.stringify() standard library method on the value Box({}) (a box wrapped around an empty object); the second argument is a function that’s applied to the value of each property in the structure before converting it to a string. The change that fixed unbox.js got rid of the MOZ_CRASH(unexpected type) failure here, but replaced it with a segfault.

It took me too many hours to figure out that I had made the mistake of copying/pasting code without fully understanding it. The cached method stubs rely on “guards”, which is to say, runtime type checks, to ensure that we only call a previously-generated method in the future if the types of the operands match the ones from the past (when we generated the code for this particular specialization of the method). When making the change for Object.is(), I had looked at CacheIRCompiler.cpp and noticed that the CacheIRCompiler::emitGuardToObject() method generates code that tests whether an operand is an object or not:

bool CacheIRCompiler::emitGuardToObject(ValOperandId inputId) {
  JitSpew(JitSpew_Codegen, "%s", __FUNCTION__);
  if (allocator.knownType(inputId) == JSVAL_TYPE_OBJECT) {
    return true;
  }

  ValueOperand input = allocator.useValueRegister(masm, inputId);
  FailurePath* failure;
  if (!addFailurePath(&failure)) {
    return false;
  }
  masm.branchTestObject(Assembler::NotEqual, input, failure->label());
  return true;
}

The generated code contains a “failure” label that this code branches to when the operand inputId is not an object. (It’s up to the caller to put appropriate code under the “failure” label so that this result will be handled however the caller wants.) I copied and pasted this code to create an emitGuardToExtendedPrimitive() method (“extended primitives” are what we’re calling records/tuples/boxes for now), and changed JSVAL_TYPE_OBJECT to JSVAL_TYPE_EXTENDED_PRIMITIVE so that the code would check for the “extended primitive” runtime type tag instead of the “object” type tag. The problem is that I also needed to use something else instead of branchTestObject. As it was, whenever a stub that expects a record/tuple/box as an argument was generated, it would be re-used for operands that are objects. This is obviously unsound and, looking at the failing test case again, we can see why this code exposed the bug:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

The first time the (key, value) anonymous function is called, the name value is bound to Box({}). So a stub gets generated that’s a version of the typeof operation specialized to Box things (actually anything that’s a record, tuple, or box, for implementation-specific reasons). The stub checks that the operand is a record/tuple/box and, if so, returns the appropriate type tag string (such as “box”). Except that, because of the bug I introduced, this stub got re-used for any object operand. The JSON stringify code (JSON.cpp) calls the “replacer” (i.e. the anonymous (key, value) function) on the value of each property, and then, when serializing the replaced value, calls the replacer again on each of that value’s properties. So my generated stub, which worked perfectly well for Box({}), was subsequently called on the plain object {} stored in the x property of the replacement {x: {}}, which has an entirely different type; hence the segfault.
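To make the failure mode concrete, here is a minimal C++ model of a guarded stub, written under heavy simplifying assumptions: `Value`, `Tag`, `Stub`, `runTypeOf()` and the guard functions are hypothetical names, not SpiderMonkey’s, and a value is reduced to nothing but its runtime tag. The specialized fast path trusts its guard, so a copy/pasted guard that still tests for the object tag lets an object reach box-specific code.

```cpp
#include <cassert>
#include <string>

// Hypothetical model: a value is reduced to its runtime type tag.
// None of these names are SpiderMonkey's.
enum class Tag { Object, ExtendedPrimitive };

struct Value {
  Tag tag;
};

// Slow path: always correct because it inspects the tag itself.
std::string fallbackTypeOf(const Value& v) {
  return v.tag == Tag::Object ? "object" : "box";
}

// Fast path specialized for records/tuples/boxes. It does not re-check its
// input; it is only sound if the guard in front of it is right.
std::string typeOfExtendedPrimitive(const Value&) { return "box"; }

// The copy/pasted guard: it still tests for the Object tag.
bool buggyGuard(const Value& v) { return v.tag == Tag::Object; }

// The corrected guard: it tests the tag the stub was specialized for.
bool fixedGuard(const Value& v) { return v.tag == Tag::ExtendedPrimitive; }

// A cached stub pairs a guard with a specialized body.
struct Stub {
  bool (*guard)(const Value&);
  std::string (*body)(const Value&);
};

// Run the stub's body if its guard passes; otherwise take the slow path.
std::string runTypeOf(const Stub& stub, const Value& v) {
  return stub.guard(v) ? stub.body(v) : fallbackTypeOf(v);
}
```

In this model, a stub built with `buggyGuard` returns `"box"` for `Value{Tag::Object}`, a silently wrong answer; in the real stub, running box-specific code on an object is what ended in the segfault. With `fixedGuard`, the object fails the guard, takes the slow path and gets `"object"`.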

Finding this bug took a long time (partly because I couldn’t figure out how to enable the “CacheIR spew” code that prints out generated CacheIR code, so I was trying to debug generated code without being able to read it…), but I experimented with commenting out various bits of code and eventually deduced that typeof was probably the problem; once I read over the code related to typeof, I spotted that my emitGuardToExtendedPrimitive() method was calling branchTestObject(). Adding a branchTestExtendedPrimitive() method to the macro assembler was easy but tedious, since the code is architecture-specific. It would be nice if dynamic type-testing code were automatically generated, since the code that tests for tags denoting different runtime types is all basically the same. But rather than trying to automate that, I decided it was better to bite the bullet, since I already had enough cognitive load trying to understand the compiler as it is.
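As an illustration of that wish, here is a sketch in plain C++ (not macro-assembler code) of a single templated tag test standing in for one hand-written test per runtime type. The bit layout is an assumption loosely modeled on NaN-boxing with a 47-bit payload; the `TypeTag` values and the `makeValue()`/`testTag()` helpers are invented for the example and do not match SpiderMonkey’s actual encoding.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, loosely modeled on NaN-boxing: a 17-bit type tag above a
// 47-bit payload. These tag values are invented for the example.
enum class TypeTag : std::uint64_t {
  String = 0x1FFFB,
  Object = 0x1FFFC,
  ExtendedPrimitive = 0x1FFFD,
};

constexpr unsigned kTagShift = 47;
constexpr std::uint64_t kPayloadMask = (std::uint64_t{1} << kTagShift) - 1;

// Pack a tag and a payload into one 64-bit word.
constexpr std::uint64_t makeValue(TypeTag tag, std::uint64_t payload) {
  return (static_cast<std::uint64_t>(tag) << kTagShift) | (payload & kPayloadMask);
}

// One generic tag test instead of a hand-written testObject(),
// testString(), testExtendedPrimitive(), ... for every runtime type.
template <TypeTag Expected>
constexpr bool testTag(std::uint64_t bits) {
  return (bits >> kTagShift) == static_cast<std::uint64_t>(Expected);
}

// In this model, supporting a new runtime type only needs a new enum entry.
static_assert(testTag<TypeTag::ExtendedPrimitive>(makeValue(TypeTag::ExtendedPrimitive, 1)),
              "tag round-trips");
static_assert(!testTag<TypeTag::Object>(makeValue(TypeTag::ExtendedPrimitive, 1)),
              "different tags do not match");
```

The design choice is simply that the tag is a template parameter, so the comparison logic is written once; a real JIT would additionally have to emit the equivalent instructions for each target architecture.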

It turned out that the json.js test case, despite being designed to test something else, was perfect for catching this bug, since it involved applying the same method first to a Box and then to an object. Once I’d fixed the problem with guards, this test passed. The constructor.js test still fails, but that just means I’ll have something interesting to work on tomorrow.

Perhaps the swings from despair to elation are why programming can be so habit-forming. While trying to track down the bug, I felt like I was the dullest person in the world and had hit my limit of understanding, and would never make any further progress. When I found the bug, for a moment I felt like I was on top of the world. That’s what keeps us doing it, right? (Besides the money, anyway.)

By the way, I still don’t fully understand inline caching in SpiderMonkey, so other resources, such as “An Inline Cache Isn’t Just A Cache” by Matthew Gaudet, are better sources than my posts. I mean for this blog to be more of a journal of experimentation than a definitive source of facts about anything.

by Tim Chevalier at November 19, 2021 07:28 AM

November 17, 2021

Manuel Rego

The path of bringing :focus-visible to WebKit

Last weekend I was speaking at CSS Conf Armenia 2021 about the work Igalia has been doing adding support for :focus-visible in WebKit.

The slides of my talk are available on this blog and the video is on Igalia’s YouTube channel.

The presentation is divided into 4 parts:

  1. An introduction to the :focus-visible feature, paying attention to some special details.
  2. An explanation of the Open Prioritization effort from Igalia that led to the implementation of :focus-visible in WebKit.
  3. A summary of the work done during this year.
  4. Some discussion about the next steps toward shipping :focus-visible in Safari/WebKit.

Last but not least, thanks again to all the people and organizations that have sponsored the implementation of :focus-visible in WebKit. We’re closer than ever to seeing it ship there. We’ll keep you posted!

November 17, 2021 11:00 PM

November 16, 2021

Igalia Compilers Team

Recent talks at GUADEC and NodeConf

Over the summer and now going into autumn, Igalia compilers team members have been presenting talks at various venues about JavaScript and web engines. Today we’d like to share with you two of those talks that you can watch online.

First, Philip Chimento gave a talk titled “What’s new with JavaScript in GNOME: The 2021 edition” at GUADEC 2021 about GNOME’s integrated JavaScript engine GJS. This is part of a series of talks about JavaScript in GNOME that Philip has been giving at GUADEC for a number of years.

You can watch it on YouTube here and the slides for the talk are available here.


Second, Romulo Cintra gave a talk at NodeConf Remote 2021 titled “IPFS – InterPlanetary File System with Node.js”. In this talk, Romulo introduces IPFS: a new distributed file system protocol for sharing files and media in a peer-to-peer fashion. Romulo also talks about some of the efforts to bring this to the web (https://arewedistributedyet.com/) and goes over how IPFS can be used with Node.js.

You can watch Romulo’s talk on YouTube as well by going here.

The slides for the talk are available here or you can even use IPFS to download it: ipfs://QmQCZaHJBZVFncftY8YGsS3BEbgA9Pu6B3JT4gdE7EhELD

by Compilers Team at November 16, 2021 04:26 PM

November 10, 2021

Tim Chevalier

Adventures in gdb

I picked up where I left off yesterday, wanting to see what code was being generated for record initialization. A colleague pointed me to a page of SpiderMonkey debugging tips. This was helpful, but required being able to run the JS interpreter inside GDB and type some code into the REPL. The problem is that before it got to that point, the interpreter was trying to compile all the self-hosted code; I knew that this wasn’t going to succeed since I’ve only implemented one of the record/tuple opcodes. I wanted to be able to just do:

> x = #{}

(binding the variable x to an empty record literal) and see the generated code. But because the much-more-complicated self-hosted code has to get compiled first, I never get to that point.

Another colleague suggested looking at the IONFLAGS environment variable. This, in turn, seems to only have an effect if you build the compiler with the --enable-jitspew option. Once I did that, I was able to find out more:

$ IONFLAGS=zzzz mach run
obj-x64-debug/dist/bin/js
found tag: zzzz
Unknown flag.

usage: IONFLAGS=option,option,option,... where options can be:

  aborts        Compilation abort messages
  scripts       Compiled scripts
  mir           MIR information
    ...
    

And so on.

I found that IONFLAGS=codegen mach run would cause the interpreter to print out all the generated assembly code, including all the code for self-hosted methods. This wasn’t entirely helpful, since it was hard to see where the boundaries were between different methods.

I decided to try a different strategy and see what I could do inside gdb. I’ve avoided using debuggers as much as possible throughout my programming career. I’m a fan of printf-style debugging. So much so that I created the printf-style debugging page on Facebook. (This made more sense back when Facebook pages were “fan pages”, so you could be a “fan of” printf-style debugging.) I’ve always had the feeling that any more sophisticated debugging technology wasn’t worth the difficulty of use. Working on a compiler implemented in C++, though, it seems I’m finally having to suck it up and learn.

The first question was how to set a breakpoint on a templated function. I found the rbreak command in gdb, which takes a regular expression. I realized I could also just do:

(gdb) info functions .*emit_InitR.*
All functions matching regular expression ".*emit_InitR.*":

File js/src/jit/BaselineCodeGen.cpp:
2590:   bool js::jit::BaselineCodeGen::emit_InitRecord();
2590:   bool js::jit::BaselineCodeGen::emit_InitRecord();

File js/src/jit/BaselineIC.cpp:
2454:   bool js::jit::FallbackICCodeCompiler::emit_InitRecord();
(gdb)

So I set a breakpoint on the method I wrote to generate code for the InitRecord opcode:

(gdb) b js::jit::BaselineCodeGen::emit_InitRecord
Breakpoint 1 at 0x555558093884: file /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp, line 2591.
(gdb) b js::jit::FallbackICCodeCompiler::emit_InitRecord
Breakpoint 2 at 0x5555580807b1: file /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp, line 2455.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/tjc/gecko-fork/obj-x64-debug/dist/bin/js 
[snip]

Thread 1 "js" hit Breakpoint 2, js::jit::FallbackICCodeCompiler::emit_InitRecord (this=0x7fffffffd1b0)
    at /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp:2455
2455      EmitRestoreTailCallReg(masm);
(gdb) 

Finally! At this point, I was hoping to be able to view the code that was being generated for the empty record literal. Stepping through the code from here gave me what I was looking for:

(gdb) s
js::jit::FallbackICCodeCompiler::tailCallVMInternal (
    this=0x7fffffffd1b0, masm=..., 
    id=js::jit::TailCallVMFunctionId::DoInitRecordFallback)
    at /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp:510
510   TrampolinePtr code = cx->runtime()->jitRuntime()->getVMWrapper(id);
(gdb) n
511   const VMFunctionData& fun = GetVMFunction(id);
(gdb) n
512   MOZ_ASSERT(fun.expectTailCall == TailCall);
(gdb) n
513   uint32_t argSize = fun.explicitStackSlots() * sizeof(void*);
(gdb) n
514   EmitBaselineTailCallVM(code, masm, argSize);
(gdb) n
515   return true;
(gdb) p code
$18 = {value = 0x1e4412b875e0 "H\277"}
(gdb) p code.value
$19 = (uint8_t *) 0x1e4412b875e0 "H\277"
(gdb) x/64i code.value
   0x1e4412b875e0:  movabs $0x7ffff4219000,%rdi
   0x1e4412b875ea:  mov    0x1c0(%rdi),%rax
   0x1e4412b875f1:  mov    %rsp,0x70(%rax)
   0x1e4412b875f5:  movabs $0x55555903de60,%r11
   0x1e4412b875ff:  push   %r11
   0x1e4412b87601:  lea    0x18(%rsp),%r10
   0x1e4412b87606:  movabs $0xfff9800000000000,%r11
     

So that’s the generated code for DoInitRecordFallback (the fallback method implemented in the inline cache module of the baseline compiler), but I realized this wasn’t really what I was hoping to find. I wanted to see the intermediate representation first.

From there, I realized I was barking up the wrong tree, since the baseline compiler just goes straight from JS to assembly; only the more sophisticated compilers (which weren’t being invoked at this point) use MIR and LIR. (A blog post from Matthew Gaudet, “A Beginners Guide To SpiderMonkey’s MacroAssembler”, explains some of the pipeline.)

So at least I knew one way to get to the generated assembly code for one opcode, but it wasn’t particularly helpful. My co-worker suggested putting in no-op implementations for the other opcodes so that it would be able to compile all the self-hosted code (even if the generated code wouldn’t work). This seemed like the fastest way to get to a functioning REPL so I could experiment with simpler code snippets, and it worked. After just adding a no-op emit_ method in BaselineCodeGen.cpp for each opcode, the interpreter was able to start up.

When I typed code into the REPL, I could tell it was only being interpreted, not compiled, since everything still worked, and I would expect anything that used records/tuples except for an empty record literal to fail. I found the --baseline-eager flag with a little bit of digging, and:

obj-x64-debug/dist/bin/js --baseline-eager
js> function f() { return #{}; }
function f() { return #{}; }
js> f()
f()
Assertion failure: !BytecodeOpHasIC(op) (Missing entry in OpToFallbackKindTable for JOF_IC op), at js/src/jit/BaselineIC.cpp:353
Segmentation fault
$

Excellent! This pointed to something I didn’t change yesterday (since the compiler didn’t make me) — I had to update the OpToFallbackKindTable in BaselineIC.cpp.

Once I did that, I realized that I couldn’t get very far with just InitRecord, since I wouldn’t expect even the empty record to compile without being able to compile the FinishRecord opcode. (Since records are immutable, Nicolò’s implementation adds three opcodes for creating records: one to initialize the empty record, one to add a new record field, and one to finish initialization, the last of which marks the record as immutable so that no more fields can be added.)
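The three-phase protocol can be modeled in a few lines of C++. This is only an analogy under stated assumptions: the `Record` class and its methods are hypothetical, field values are restricted to `int`, and the real opcodes manipulate JIT-allocated memory rather than a `std::vector`. The point is the state machine: construction (InitRecord) creates an empty record with room for the expected fields, `addProperty()` (AddRecordProperty) appends fields, and `finish()` (FinishRecord) makes the record immutable.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the InitRecord / AddRecordProperty / FinishRecord
// sequence. Fields hold ints only; real records are not limited to ints.
class Record {
 public:
  // InitRecord: create an empty record with room for the expected fields.
  explicit Record(std::size_t expectedFields) { fields_.reserve(expectedFields); }

  // AddRecordProperty: only allowed while the record is still being built.
  void addProperty(std::string key, int value) {
    if (finished_) {
      throw std::logic_error("cannot add a field to a finished record");
    }
    fields_.emplace_back(std::move(key), value);
  }

  // FinishRecord: mark the record immutable; no more fields can be added.
  void finish() { finished_ = true; }

  bool isFinished() const { return finished_; }
  std::size_t size() const { return fields_.size(); }

 private:
  std::vector<std::pair<std::string, int>> fields_;
  bool finished_ = false;
};
```

In this sketch, calling `addProperty()` after `finish()` throws `std::logic_error` and leaves the record unchanged, which is the model’s stand-in for “no more fields can be added”.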

So I implemented FinishRecord, similarly to the work from yesterday. Now what? I was able to type in an empty record literal without errors:

> x = #{}
#{}

But how do I know that x is bound to a well-formed record that satisfies its interface? There’s not too much you can do with an empty record. I decided to check that typeof(x) worked (it should return “record”), and got an assertion failure in the emitGuardNonDoubleType() method in CacheIRCompiler.cpp. It took me some time to make sense of various calls through generated code, but the issue was the TypeOfIRGenerator::tryAttachStub() method in CacheIR.cpp:

AttachDecision TypeOfIRGenerator::tryAttachStub() {
[...snip...]
  TRY_ATTACH(tryAttachPrimitive(valId));
  TRY_ATTACH(tryAttachObject(valId));

  MOZ_ASSERT_UNREACHABLE("Failed to attach TypeOf");
  return AttachDecision::NoAction;
}
    

This code decides, based on the type of the operand (valId) whether to use the typeOf code for primitives or for objects. The record/tuple implementation adds “object primitives”, which share some qualities with objects but aren’t objects (since, among other things, objects are mutable). The tryAttachPrimitive() call was successfully selecting the typeOf code for primitives, since the isPrimitive() method on the Value type returns true for object primitives. Because there was no explicit case in the code for records, the code for double values was getting called as a fallback and that’s where the assertion failure was coming from. Tracking this down took much more time than actually implementing typeOf for records, which I proceeded to do. And now I can get the type of a record-valued variable in compiled code:

js> x = #{}
    #{}
js> typeof(x)
"record"

This provides at least some evidence that the code I’m generating is laying out records properly. Next up, I’ll try implementing the opcode that adds record properties, so that I can test out non-empty records!
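The typeof fix comes down to checking the more specific case before the generic primitive case. Below is a hedged C++ model of that dispatch order; `Kind`, `typeOf()` and the helper predicates are hypothetical, and, mirroring the post, `isPrimitive()` deliberately returns true for records, tuples and boxes. Where the real code hit an assertion in the double path, this model simply returns `"number"`, which makes the fall-through visible in an ordinary test.

```cpp
#include <cassert>
#include <string>

// Hypothetical value kinds; not SpiderMonkey's types.
enum class Kind { Undefined, Boolean, Double, String, Object, Record, Tuple, Box };

// Mirrors the post: the generic predicate also accepts records/tuples/boxes.
bool isPrimitive(Kind k) { return k != Kind::Object; }

bool isExtendedPrimitive(Kind k) {
  return k == Kind::Record || k == Kind::Tuple || k == Kind::Box;
}

// Path for ordinary primitives. Anything it does not recognize falls through
// to the double case, which is where the real code hit its assertion.
std::string typeOfBasicPrimitive(Kind k) {
  switch (k) {
    case Kind::Undefined: return "undefined";
    case Kind::Boolean: return "boolean";
    case Kind::String: return "string";
    default: return "number";
  }
}

std::string typeOfExtendedPrimitive(Kind k) {
  switch (k) {
    case Kind::Record: return "record";
    case Kind::Tuple: return "tuple";
    default: return "box";
  }
}

// Try the most specific case first, then generic primitives, then objects.
std::string typeOf(Kind k) {
  if (isExtendedPrimitive(k)) return typeOfExtendedPrimitive(k);
  if (isPrimitive(k)) return typeOfBasicPrimitive(k);
  return "object";
}
```

Swapping the first two checks in `typeOf()` reproduces the original problem in miniature: a record would be routed to `typeOfBasicPrimitive()` and come back as `"number"`.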

by Tim Chevalier at November 10, 2021 06:06 AM

November 09, 2021

Tim Chevalier

Adding record and tuple support to the JIT

Today I started working on implementing the Record and Tuples proposal for JavaScript in the JIT in SpiderMonkey. All of this work is building on code written by Nicolò Ribaudo, which isn’t merged into SpiderMonkey yet but can be seen in patches linked from the Bugzilla bug.

Up until now, SpiderMonkey would automatically disable the JIT if you built it with the compile-time flag that enables records and tuples. Currently, the interpreter implements records and tuples, but not the compiler. I started by searching through the code to figure out how to re-enable the JIT, but realized it would be faster to look through the commit history, and found it in js/moz.configure. (If you try to follow along, you won’t be able to see some of the code I’m referring to since it’s in unapplied patches, but I’m including some links anyway to give context.)

I saw that if I just pass in the --enable-jit build flag explicitly, it should override what the config file said, and it indeed did. I decided to operate on the assumption that the compiler error messages would tell me what I needed to implement, which isn’t always a safe assumption when working in C/C++, but seems to have served me okay in my SpiderMonkey work so far.

The first set of compiler errors I got had to do with adding the IsTuple() built-in method to the LIR. (The MIR and LIR, two of the intermediate languages used in SpiderMonkey, are explained briefly on the SpiderMonkey documentation page.) This involved implementing the EmitObjectIsTuple() and visitIsTuple() methods in CodeGenerator.cpp, part of the optimizing (Ion) compiler that consumes LIR (the documentation also explains the various compilers that make up the JIT). That was straightforward, since IsTuple() is just a predicate that returns true for tuple arguments and false for arguments of any other type. When I implemented this method before, I chose to implement it as a JS_INLINABLE_FN, not knowing what I was getting myself into. With the JIT disabled at compile time, the compiler had only made me implement it down to the MIR level, but now I had to implement it in LIR too.

Once that was done, I ran the interpreter and got an assertion failure: "Hit MOZ_CRASH(Record and Tuple are not supported by jit) at gecko-fork/js/src/jit/BaselineCodeGen.cpp:2589". This was excellent, since it told me exactly where to start. When I looked at BaselineCodeGen.cpp, I saw that the seven opcodes for records and tuples were all defined with the UNSUPPORTED_OPCODE macro, so I planned to proceed by removing each of the UNSUPPORTED_OPCODE calls one-by-one and seeing what that forced me to implement.

I started with the InitRecord opcode, which as you might guess, creates a new record with a specified number of fields. As a strategy, I followed the pattern for the existing NewArray and NewObject opcodes, since creating new arrays and objects is similar to creating new records.

By following the error messages, I found the files that I needed to change; I’m putting this list in logical order rather than in the order that the compile errors came up, which was quite different.

  • VMFunctionList-inl.h — added the RecordType::createUninitialized C++ function to the list of functions that can be called from the JIT
  • VMFunctions.h — added a TypeToDataType case for the RecordType C++ type
  • BaselineCodeGen.cpp, where I added an emit method for InitRecord
  • BaselineIC.cpp and CacheIR.cpp, where I added code to support inline caching (explained here) for InitRecord.
  • MIROps.yaml, the file that defines all MIR opcodes; a lot of other code is automatically generated from this file. I had to add a new InitRecord opcode.
  • MIR.h, where I had to define a new MInitRecord class, and MIR.cpp, where I implemented its methods.
  • Lowering.cpp, where I added code to translate the MIR representation for an InitRecord call to LIR.
  • LIROps.yaml, similarly to MIROps.yaml.
  • CodeGenerator.cpp, where I added the visitInitRecord method that translates the LIR code to assembly.
  • Recover.cpp — while I don’t understand this code very well, I think it’s what implements the “bailout” mechanism described in the docs. Similarly to the other modules, I had to add methods for InitRecord and a new class to the accompanying header file.

I love compiler errors! Without static typechecking, I wouldn’t have any information about what parts of the code I needed to change to add a new feature. As a functional programmer, I normally don’t give C++ a lot of credit for static typechecking, but whether it’s about modern language features or the coding style used in SpiderMonkey (or both), I actually find that I get a lot of helpful information from type error messages when working on SpiderMonkey. Without static type errors, I would have had to understand the JIT from the top down to know what parts I needed to change, maybe by reading through the code (slow and tedious) or maybe by reading through documentation (likely to be out of date). Types are documentation that can’t fall out of date, since the compiler won’t generate code for you if you give it something that doesn’t typecheck.

Once everything compiled and I started the interpreter again, I got a different assertion failure:

"Assertion failure: BytecodeOpHasIC(op), at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:649"

This pointed to the final change, in BytecodeLocation.h. I had added the code for inline caching, but hadn’t updated the opcode table defined in this file to indicate that the InitRecord opcode had an inline cache. Since the relationship between this table and the code itself exists only in the programmers’ heads, there’s no way for the compiler to check this for us.

Once I fixed this and started the interpreter again, I got a new error:

Hit MOZ_CRASH(Record and Tuple are not supported by jit) at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:2604
Thread 1 "js" received signal SIGSEGV, Segmentation fault. 0x000055555809ce62 in js::jit::BaselineCodeGen::emit_AddRecordProperty (this=0x7fffffffd080) at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:2604 2604 UNSUPPORTED_OPCODE(AddRecordProperty)

This is just saying that AddRecordProperty is an unsupported opcode, which is what I would expect since I only implemented one of the record/tuple opcodes. So that means that after my changes, SpiderMonkey was able to generate code for the InitRecord opcode. (The reason why these errors showed up as soon as I launched the interpreter, without having to execute any code, is that at startup time with JIT enabled, the interpreter compiles all the self-hosted libraries, which are implemented in JavaScript. Since on my working branch, there is library code that uses the Record and Tuple types, that means that the code path leading to those UNSUPPORTED_OPCODES was guaranteed to be reached.)

So what do I know now? The JIT seems to be able to generate code for the InitRecord opcode, at least for the first occurrence of it in the self-hosted libraries. Whether that code works (that is, implements the semantics in the spec) is a separate question. To know the answer, I would have to look at the generated code — I won’t be able to actually test any code in the interpreter until I implement all the opcodes, since each one will subsequently fail with the same error message as above. But that’s for another day.

by Tim Chevalier at November 09, 2021 05:32 AM

November 08, 2021

Tim Chevalier

Hello, world!

It’s been a long time since I’ve blogged regularly, especially about software. When I worked on the Rust team, I wrote an update post at the end of every single day about what I’d worked on that day, every day I possibly could. I’m going to try to do that again. I joined the Compilers team at Igalia this past September and am currently working on implementing new JavaScript features in the SpiderMonkey JavaScript engine; at the moment, the Records and Tuples proposal, which would add immutable data types to JavaScript. As much as possible, I’m going to document how I spend each work day and what problems arise. This is mostly for me (so that I don’t look back and wonder what I did all month), but if anyone else happens to find it interesting, that’s an added bonus.

by Tim Chevalier at November 08, 2021 02:12 AM

November 01, 2021

Qiuyi Zhang (Joyee)

Building V8 on an M1 MacBook

I’ve recently got an M1 MacBook and played around with it a bit. It seems many open source projects still haven’t added MacOS with ARM64

November 01, 2021 01:50 PM

My 2019

It’s that time of the year again! I did not manage to write a recap about my 2018, so I’ll include some reflection about that year in

November 01, 2021 01:50 PM

Uncaught exceptions in Node.js

In this post, I’ll jot down some notes that I took when refactoring the uncaught exception handling routines in Node.js. Hopefully it

November 01, 2021 01:50 PM

On deps/v8 in Node.js

I recently ran into a V8 test failure that only showed up in the V8 fork of Node.js but not in the upstream. Here I’ll write down my

November 01, 2021 01:50 PM

Tips and Tricks for Node.js Core Development and Debugging

I thought about writing some guides on this topic in the nodejs/node repo, but it’s easier to throw whatever tricks I personally use on

November 01, 2021 01:50 PM

My 2017

I decided to write a recap of my 2017 because looking back, it was a very important year to me.

November 01, 2021 01:50 PM

New Blog

I’ve been thinking about starting a new blog for a while now. So here it is.

Not sure if I am going to write about tech here.

November 01, 2021 01:50 PM

October 13, 2021

Nikolas Zimmermann

Accelerating SVG - an update

Yikes, it’s been more than a year since my last post.

October 13, 2021 12:00 AM

October 02, 2021

Alicia Boya

Setting up VisualStudio code to work with WebKitGTK using clangd

Lately I’ve been working on a refactor in the append pipeline of the MediaSource Extensions implementation of WebKit for the GStreamer ports. Working on refactors often triggers many build issues, not only because they often encompass a lot of code, but also because it’s very easy to miss errors in the client code when updating an interface.

The traditional way to tackle this problem is by doing many build cycles: compile, fix the topmost error and maybe some other errors in view that seem legit (note that in C++ it’s very common to have chained errors that are a consequence of previous errors), and repeat until it builds successfully.

This approach is not very pleasant in a project like WebKit, where an incremental build of a single file takes just long enough to invite a distraction. It gets worse when it’s not just one file but a complete build that may stop at any time, depending on the order the build system chooses for the files. Often it takes more time to wait for the compiler to show the error than to fix the error.

Unpleasantness hurts motivation, lack of motivation hurts productivity, and by the end of the day you are tired and still not done. Somehow it feels like the time spent fixing trivial build issues is substantially more than the time of a build cycle times the number of errors. Whether that perception is accurate or not, I am acutely aware of the huge impact helpful tooling has on both productivity and quality of life, both while doing the work and after you’re done with it, so I decided to have a look at the state of modern C++ language servers when working on a large codebase like WebKit. My previous experiences were very unsuccessful, but there are people dedicated to this and progress has been made.

Creating a WebKit project in VS Code

  1. Open the directory containing the WebKit checkout in VS Code.
  2. WebKit has A LOT of files. If you use Linux you will see a warning telling you to increase the number of inotify watchers. Do so if you haven’t done it before, but even then it will not be enough, because WebKit has more files than the maximum number of inotify watchers supported by the kernel. Also, watchers use memory.
  3. Go to File/Preferences/Settings, click the Workspace tab, search for Files: Watcher Exclude and add the following patterns:
    **/CMakeFiles/**
    **/JSTests/**
    **/LayoutTests/**
    **/Tools/buildstream/cache/**
    **/Tools/buildstream/repo/**
    **/WebKitBuild/UserFlatpak/repo/**

    This will keep the number of watches at a workable 258k. Still a lot, but under the 1M limit.

How to set up clangd

The following instructions assume you’re using WebKitGTK with the WebKit Flatpak SDK. They should also work for WPE with minimal substitutions.

  1. Microsoft has its own C++ plugin for VS Code, which may be installed by default. The authors of the clangd plugin recommend uninstalling the built-in C++ plugin, as running both doesn’t make much sense and could cause conflicts.
  2. Install the clangd extension for VS Code from the VS Code Marketplace.
  3. The WebKit flatpak SDK already includes clangd, so it’s not necessary to install it if you’re using it. On the other hand, because the flatpak has a virtual filesystem, it’s necessary to map paths from the flatpak to the outside. You can create this wrapper script for this purpose. Make sure to give it execution rights (chmod +x).
    #!/bin/bash
    set -eu
    # https://stackoverflow.com/a/17841619
    function join_by { local d=${1-} f=${2-}; if shift 2; then printf %s "$f" "${@/#/$d}"; fi; }
    
    local_webkit=/webkit
    include_path=("$local_webkit"/WebKitBuild/UserFlatpak/runtime/org.webkit.Sdk/x86_64/*/active/files/include)
    if [ ! -f "${include_path[0]}/stdio.h" ]; then
      echo "Couldn't find the directory hosting the /usr/include of the flatpak SDK."
      exit 1
    fi
    include_path="${include_path[0]}"
    mappings=(
      "$local_webkit/WebKitBuild/GTK/Debug=/app/webkit/WebKitBuild/Debug"
      "$local_webkit/WebKitBuild/GTK/Release=/app/webkit/WebKitBuild/Release"
      "$local_webkit=/app/webkit"
      "$include_path=/usr/include"
    )
    
    exec "$local_webkit"/Tools/Scripts/webkit-flatpak --gtk --debug run -c clangd --path-mappings="$(join_by , "${mappings[@]}")" "$@"

    Make sure to set the path of your WebKit repository in local_webkit.

    Then, in VS Code, go to File/Preferences/Settings, and in the left pane, search for Extensions/clangd. Change Clangd: Path to the absolute path of the script saved above. I recommend making these changes in the Workspace tab, so they apply only to WebKit.

  4. Create a symlink named compile_commands.json inside the root of the WebKit checkout directory pointing to the compile_commands.json file of the WebKit build you will be using, for instance: WebKitBuild/GTK/Debug/compile_commands.json
  5. Create a .clangd file inside the root of the WebKit checkout directory with these contents:
    If:
        PathMatch: "(/app/webkit/)?Source/.*\\.h"
        PathExclude: "(/app/webkit/)?Source/ThirdParty/.*"
    
    CompileFlags:
        Add: [-include, config.h]

    This force-includes config.h in WebKit header files under Source/, with the exception of those in Source/ThirdParty. Note: if you need to add more rules, you do so by adding additional YAML documents, separated by a --- line.

  6. The VS Code clangd plugin doesn’t read .clangd by default. Instead, it has to be instructed to do so by adding --enable-config to Clangd: Arguments. Also add --limit-results=5000, since the default limit for cross-reference search results (100) is too small for WebKit. Additional tip: clangd will also add #include lines when you autocomplete a type. While the intention is good, this can often lead to spurious redundant includes. I have disabled it by adding --header-insertion=never to clangd’s arguments.
  7. Restart VS Code. The next time you open a C++ file, you will get a prompt asking you to confirm your edited configuration.

VS Code will start indexing your code, and you will see a progress count in the status bar.

Debugging problems

clangd has a log. To see it, click View/Output, then in the Output panel combo box, select clangd.

The clangd database is stored in .cache/clangd inside the WebKit checkout directory. rm -rf’ing that directory will reset it back to its initial state.

For each compilation unit indexed, you’ll find a file following the pattern .cache/clangd/index/<Name>.<Hash>.idx. For instance: .cache/clangd/index/MediaSampleGStreamer.cpp.0E0C77DCC76C3567.idx. This way you can check whether a particular compilation unit has been indexed.
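If you want to script that check, here is a small C++17 sketch using `std::filesystem`. It relies only on the `<Name>.<Hash>.idx` naming described above; `matchesIndexFile()` and `isIndexed()` are hypothetical helpers, and nothing here talks to clangd itself.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// True if fileName looks like "<unitName>.<Hash>.idx", the naming used
// for entries in .cache/clangd/index.
bool matchesIndexFile(const std::string& fileName, const std::string& unitName) {
  const std::string prefix = unitName + ".";
  const std::string suffix = ".idx";
  // Require at least one character for the hash between prefix and suffix.
  if (fileName.size() <= prefix.size() + suffix.size()) return false;
  return fileName.compare(0, prefix.size(), prefix) == 0 &&
         fileName.compare(fileName.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// True if indexDir contains at least one index file for unitName.
// A missing or unreadable directory counts as "not indexed".
bool isIndexed(const fs::path& indexDir, const std::string& unitName) {
  std::error_code ec;
  for (const auto& entry : fs::directory_iterator(indexDir, ec)) {
    if (entry.is_regular_file(ec) &&
        matchesIndexFile(entry.path().filename().string(), unitName)) {
      return true;
    }
  }
  return false;
}
```

Run from the WebKit checkout directory, `isIndexed(".cache/clangd/index", "MediaSampleGStreamer.cpp")` would report whether that unit has an index file. Because the prefix includes the trailing dot, a query for MediaSample.cpp does not match MediaSampleGStreamer.cpp’s index file.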

Bug: Some files are not indexed

You may notice VS Code has not indexed all your files. This is apparent when using the Find All References feature, since you may be missing results. This particularly affects generated code, especially unified sources (.cpp files generated by concatenating a series of related .cpp files via #include, with the purpose of speeding up the build compared to compiling them as individual units).

I don’t know the reason for this bug, but I can confirm the following workaround: Open a UnifiedSources file. Any UnifiedSources file will do. You can find them in paths such as WebKitBuild/GTK/Debug/WebCore/DerivedSources/unified-sources/UnifiedSource-043dd90b-1.cpp. After you open any of them, you’ll see VS Code indexing over a thousand files that were skipped before. You can close the file now. Find all references should work once the indexing is done.

Things that work

Overall I’m quite satisfied with the setup. The following features work:

  • Autocompletion: when autocompleting a member of an object accessed through a pointer or smart pointer, . gets replaced with ->. (. will autocomplete not only the members of the object, but also those of the pointee.)
  • Right click/Find All References: What it finds is accurate, although I’m not fully confident it is exhaustive, as that requires a full index.
  • Right click/Show Call Hierarchy: This is a useful tool that shows what functions call the selected function, and so on, automating what is otherwise a very manual process. At least, when it’s exhaustive enough.
  • Right click/Type hierarchy: It shows the class tree containing a particular class (ancestors, children classes and siblings).
  • Error reporting: the right bar of VS Code will show errors and warnings that clangd identifies in the code. It’s important to note that there is a maximum number of errors per file, after which checking stops, so it’s a good idea to start from the top of the file. The errors seem quite precise and save a lot of trips to the compiler. Unfortunately, they’re not completely exhaustive, so even when clangd shows no errors in a file, the actual compiler might still report some. Still, it catches most of them, with very detailed information.
  • Signature completion: after completing a function, you get help showing you what types the parameters expect.

Known issues and workarounds

“Go to definition” not working sometimes

If “Go to definition” (Ctrl+click on the name of a function) doesn’t work in a header file, try opening the source file by pressing Ctrl+o, then go back to the header file by pressing Ctrl+o again, and try going to the definition again.

Base functions of overridden functions don’t show up when looking for references

Although this is supposed to be a closed issue, I can still reproduce it. For instance, when searching for uses of SourceBufferPrivateGStreamer::enqueueSample(), calls to the parent class method, SourceBufferPrivate::enqueueSample(), get ignored.

This is also a common issue when using Show Call Hierarchy.

Lots of strange errors after a rebase

Clean the cache, reindex the project. Close VS Code, rm -rf .cache/clangd/index inside the WebKit checkout directory, then open VS Code again. Remember to open a UnifiedSources file to create a complete index.

by aboya at October 02, 2021 01:07 PM

September 30, 2021

Brian Kardell

Making the whole web better, one canvas at a time.

One can have an entire career on the web and never write a single canvas.getContext('2d'), so "Why should I care about this new OffscreenCanvas thing?" is a decent question for many. In this post, I'll tell you why I'm certain that it will matter to you, in real ways.

How relevant is canvas?

As a user, you know from lived experience that <video> on the web is pretty popular. It isn't remotely niche. However, many developers I talk to think that <canvas> is. The sentiment seems to be something like...

I can see how it is useful if you want to make a photo editor or something, but... It's not really a thing I've ever added to a site or think I experience much... It's kind of niche, right?

What's interesting, though, is that in reality <canvas>'s prevalence in the HTTP Archive isn't so far behind <video> (63rd/70th most popular elements respectively). It's considerably more widely used than many other standard HTML elements.

Amazing, right? I mean, how could that even be?!

The short answer is, it's just harder to recognize. A great example of this is maps. As a user, you recognize maps. You know they are common and popular. But what you perhaps don't recognize is that they're drawn on a canvas.

As a developer, there is a fair chance you have included a <canvas> somewhere without even realizing it. But again, since it is harder to recognize "ah, this is a canvas", we don't identify it the way we do video. Think about it: we include videos similarly all the time - not by directly including a <video> but via an abstraction - maybe a custom element or an iframe. Still, as a user you clearly identify it, so in your mind, as a developer, you count it.

If canvas is niche, it is only so in the sense of who has to worry about those details. So let's talk about why you'll care, even if you don't directly use the API...

The trouble with canvas...

Unfortunately, <canvas> itself has a fundamental flaw. Let me show you...

Canvas (old)

This video was made by Andreas Hocevar using a common mapping library, on some fairly powerful hardware. You'll note how janky it gets - what you also can't tell from the video is that user interactions are intermittently interrupted as rendering tries to keep up. The interface feels a little broken and frustrating.

For whom the bell tolls

As bad as the video above is, as with all performance-related things, it's tempting to kind of shrug it off and think "Well, I don't know... it's still pretty usable - and hardware will catch up".

For all of the various appeals that have been made over the years to get us to care more about performance ("What about the fact that the majority of people use hardware less powerful than yours?" or "What about the fact that you're losing potential customers and users?" etc.), we haven't moved that ball as meaningfully as we'd like. But I'd like to add one more to the list of things to think about here...

Ask not for whom the performance bell tolls, because increasingly: It tolls for you.

While we've been busy talking about phones and computers, something interesting happened: billions of new devices using embedded web rendering engines appeared. TVs, game consoles, GPS systems, audio systems, infotainment systems in cars, planes and trains, kiosks, point of sale, digital signage, refrigerators, cooking appliances, e-readers, etc. They're all using web engines.

Interestingly, if you own a high-end computer or phone, you're also more likely to encounter even more of these, as a user.

Embedded systems are generally way less powerful than the general-purpose devices we usually talk about, even when they're brand new -- and they're replaced way less often.

So, while that moderately uncomfortable jank on your new iPhone still seems pretty bearable, it might translate to just a few (or even 1) FPS on your embedded device. Zoiks!

In other words, increasingly, that person that all of the other talks ask you to consider and empathize with... is you.

Enter: OffscreenCanvas

OffscreenCanvas is a solution to this. Its API surface is really small: it has a constructor and a getContext('2d') method. Unlike the canvas element itself, however, it is neatly decoupled from the DOM. It can be used in a worker - in fact, OffscreenCanvases are transferable, so you can pass them between windows and workers via postMessage. The existing DOM <canvas> API itself adds a .transferControlToOffscreen() method, which (explicitly) gives you one back that is then in charge of painting that element's rectangle.

If you are one of the many people who don't program against canvases yourself, don't worry about the details... Instead, let me show you what that means. The practical upshot of simply decoupling this is pretty clear, even on good hardware, as you can see in this demo...

OffscreenCanvas based maps
Using OffscreenCanvas, user interactions are not blocked - the rendering is way more fluid and the interface is able to feel smooth and responsive.

A Unique Opportunity

Canvas is also pretty unique in the history of the web because it began as unusually low level. That has its pros and its cons - but one positive thing is that the fact that most people use it through abstractions presents an interesting opportunity. We can radically improve things for pretty much all real users through the actions of the comparatively small group of people who directly write code against the actual canvas APIs. Your own work can benefit from this, in most cases without any changes to your code. Potentially without you even knowing. Nice.

New super powers, same great taste

There's a knock-on effect here too that might be hard to notice at first. OffscreenCanvas doesn't create a whole new API to do its work - it's basically the same canvas context. And so are Houdini Custom Paint worklets. In fact, it's pretty hard not to see the relationship between painting on a canvas in a worker, and painting on a canvas in a worklet - right? They are effectively the same idea. There is minimal new platform "stuff" but we gain whole new superpowers and a clearer architecture. To me, this seems great.

What's more, while breaking off control and decoupling from the main thread is a kind of easy win for performance and an interesting superpower on its own, we actually get more than that: in the case of Houdini, we are suddenly able to tap into all of the rest of the CSS infrastructure and use it to brainstorm, explore, test and polyfill interesting new paint ideas before we talk about standardizing them. Amazing! That's really good for both standards and users.

Really interestingly though: In the case of OffscreenCanvas, we now suddenly have the ability to parallelize tasks and throw more hardware at highly parallelizable problems. Maps are also an example of that, but they aren't the only one.

My colleague Chris Lord recently gave a talk in which he showed a great demo visualizing an interactive and animated Mandelbrot set (below). If you're unfamiliar with why this is impressive: a fractal is a self-repeating geometric pattern, and fractals can be pretty intense to visualize, and even harder to make explorable in a UI. At 1080p resolution and 250 iterations, that's up to about half a billion complex-number calculations per rendered frame. Fortunately, they are also an example of a highly parallelizable problem, so they make for a nice demo of something that was just totally impossible with web technology yesterday suddenly becoming possible with this new superpower.
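The arithmetic behind that figure, and why the problem parallelizes so well, can be sketched in Python. This is an illustrative escape-time renderer, not the code from Chris's demo. The viewport bounds and function names are assumptions. A full 1920×1080 frame at 250 iterations is at most 1920 × 1080 × 250 = 518,400,000 iteration steps. Because each row depends only on its own coordinates, rows can be handed to separate worker processes with no shared state.

```python
from concurrent.futures import ProcessPoolExecutor

WIDTH, HEIGHT, MAX_ITER = 1920, 1080, 250

# Assumed viewport in the complex plane (the classic full view of the set).
RE_MIN, RE_MAX = -2.5, 1.0
IM_MIN, IM_MAX = -1.0, 1.0


def escape_time(c: complex, max_iter: int = MAX_ITER) -> int:
    """Iterate z -> z*z + c; return the step at which |z| exceeds 2, or max_iter."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render_row(y: int, width: int = WIDTH, height: int = HEIGHT,
               max_iter: int = MAX_ITER) -> list[int]:
    """Compute one row of the image; rows share no state, so they run independently."""
    im = IM_MAX - (IM_MAX - IM_MIN) * y / (height - 1)
    return [
        escape_time(complex(RE_MIN + (RE_MAX - RE_MIN) * x / (width - 1), im), max_iter)
        for x in range(width)
    ]


def render_parallel(width: int, height: int, max_iter: int,
                    workers: int = 4) -> list[list[int]]:
    """Farm rows out to worker processes; map() returns the rows in order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(height),
                             [width] * height, [height] * height, [max_iter] * height))


# Upper bound on iteration steps per frame: every pixel runs all MAX_ITER steps.
# Points outside the set usually escape much earlier.
WORST_CASE = WIDTH * HEIGHT * MAX_ITER  # 518,400,000, i.e. about half a billion

if __name__ == "__main__":
    preview = render_parallel(width=80, height=45, max_iter=MAX_ITER)
    print(len(preview), len(preview[0]), WORST_CASE)
```

Splitting by rows is one reasonable choice here: each task is large enough to amortize the cost of sending it to another process. The same split maps naturally onto one worker per strip of a canvas.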

OffscreenCanvas super powers!
A video of a talk from a recent WebKit Contributors meeting, showing impressive rendering. It should jump to the right timestamp, but in case that fails, you can skip to about the 5 minute mark to see the demo.

What other doors will this open, and what will we see come from it? It will be super exciting to see!

September 30, 2021 04:00 AM

September 29, 2021

Thibault Saunier

GStreamer: one repository to rule them all

Over the last few years, the GStreamer community has been analysing and discussing the idea of merging all the modules into one single repository. Since all the official modules are released in sync and the code evolves simultaneously between those repositories, having the code split was a burden, and several core GStreamer developers believed that it was worth making the effort to consolidate them into a single repository. As announced a while back, this is now effective, and this post explains the technical choices and implications of that change.

You can also check out our Monorepo FAQ for a list of questions and answers.

Technical details of the unification

Since we moved to meson as a build system a few years ago we implemented gst-build which leverages the meson subproject feature to build all GStreamer modules as one single project. This greatly enhanced the development experience of the GStreamer framework but we considered that we could improve it even more by having all GStreamer code in a single repository that looks the same as gst-build.

This is what the new unified git repository looks like: gst-build in the main gstreamer repository, except that all the code from the GStreamer modules located in the subprojects/ directory is checked in.

This new setup now lives in the main default branch of the gstreamer repository. The master branches of all the other module repositories are now retired and frozen, and no new merge requests or code changes will be accepted there.

This is only the first step and we will consider reorganizing the repository in the future, but the goal is to minimize disruptions.

The technical process for merging the repositories looks like:

foreach GSTREAMER_MODULE
    git remote add GSTREAMER_MODULE.name GSTREAMER_MODULE.url
    git fetch GSTREAMER_MODULE.name
    git merge GSTREAMER_MODULE.name/master
    git mv list_all_files_from_merged_gstreamer_module() GSTREAMER_MODULE.shortname
    git commit -m "Moved all files from " + GSTREAMER_MODULE.name
endforeach

This allows us to keep the exact same history (and checksum of each commit) for all the old gstreamer modules in the new repository which guarantees that the code is still exactly the same as before.
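Git's object format explains why the checksums survive. In a default (SHA-1) repository, an object's ID is the SHA-1 of a short header (the object type, a space, the content size and a NUL byte) followed by the content. A commit's content records its tree and parent IDs. The minimal Python sketch below only computes IDs and does not touch a repository. The commit layout is simplified, and the author identity and timestamp are hypothetical. Merging a module creates a new commit that lists the module's existing head as a parent. The old commits' bytes are untouched, so their IDs, and the trees they point to, stay exactly as before.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID as Git computes it: '<type> <size>' + NUL, then the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], message: str) -> bytes:
    """Build a simplified commit object's content (hypothetical identity and time)."""
    ident = "Example Dev <dev@example.org> 1632950000 +0000"  # hypothetical
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


if __name__ == "__main__":
    empty_tree = git_object_id("tree", b"")  # placeholder tree for the example
    # An "old" module commit: its ID depends only on its own tree, parents and metadata.
    old = git_object_id("commit", commit_body(empty_tree, [], "Initial import"))
    monorepo_head = git_object_id("commit", commit_body(empty_tree, [], "Monorepo root"))
    # The merge is a *new* commit that merely points at the old one as a parent.
    merge = git_object_id("commit", commit_body(empty_tree, [monorepo_head, old], "Merge module"))
    print(old, merge)
```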

Releases with the new setup

In the same spirit of avoiding disruption, releases will look exactly the same as before. In the new single gstreamer repository, we still have meson subprojects for each GStreamer module, and they will have their own release tarballs. In practice, this means that not much (nothing?) should change for distribution packagers and consumers of GStreamer tarballs.

What should I do with my pending MRs in old modules repositories?

Since we cannot create new merge requests in your name on GitLab, we wrote a move_mrs_to_monorepo script that you can run yourself. The script is located in the gstreamer repository, and you can start moving all your pending MRs by simply running scripts/move_mrs_to_monorepo.py and following the instructions.

Thanks to everyone in the community for providing us with all the feedback and thanks to Xavier Claessens for co-leading the effort.

We are still working on making the transition as smooth as possible, and if you have any questions, don’t hesitate to come talk to us in #gstreamer on the OFTC IRC network.

Happy GStreamer hacking!

by thiblahute at September 29, 2021 09:34 PM

September 24, 2021

Samuel Iglesias

X.Org Developers Conference 2021

Last week we had our most loved annual conference: X.Org Developers Conference 2021. As a reminder, due to the COVID-19 situation in Europe (and the resulting restrictions on travel and events), we kept it virtual again this year… which is a pity, as the planned venue was Gdańsk, a very beautiful city in Poland (see picture below if you don’t believe me!). Let’s see if we can finally have an XDC there!

XDC 2021

This year we had a very strong program. There were talks covering all aspects of the open-source graphics stack: from the kernel (including an Outreachy talk about VKMS) and Mesa drivers of all kinds, to input, libraries, X.org security and Wayland robustness… We had talks about testing drivers, debugging them, our infrastructure at freedesktop.org, and even Vulkan specs (such as Vulkan Video and VK_EXT_multi_draw) and their support in the open-source graphics stack. Definitely a very complete program, very interesting to all open-source developers working in this area. You can watch all the talks here or here, and the slides have already been uploaded to the program.

On behalf of the Call For Papers Committee, I would like to thank all speakers for their talks… this conference wouldn’t make sense without you!

Big shout-out to the XDC 2021 organizers (Intel), represented by Radosław Szwichtenberg, Ryszard Knop and Maciej Ramotowski. They did an awesome job of running a very smooth conference. I can tell you that they promptly fixed any issue that came up, all of it behind the scenes, so that most of the time the attendees didn’t even notice anything! That is what good conference organizers do!

XDC 2021 Organizers. Can I at least invite you for a drink? You really deserve it!

If you want to know more details about what this virtual conference entailed, just watch Ryszard’s talk at XDC (info, video) or you can reuse their materials for future conferences. That’s very useful info for future conference organizers!

Talking about our streaming platforms, the big novelty this year was the use of media.ccc.de as a privacy-friendly alternative to our traditional YouTube setup (last year we got feedback about this). Media.ccc.de is an open-source platform that respects your privacy, and we hope it worked well for all attendees. Our stats indicate that ~50% of our audience connected to it during the three days of the conference. That’s awesome!

Last but not least, we couldn’t have made this conference happen without our sponsors. We are very lucky to have on board Intel as our Platinum sponsor and organizer, our Gold sponsors (Google, NVIDIA, ARM, Microsoft and AMD), our Silver sponsors (Igalia, Collabora, The Linux Foundation), our Bronze sponsors (Gitlab and Khronos Group) and our Supporters (C3VOC). Big thank you from the X.Org community!

XDC 2021 Sponsors

Feedback

We would like to hear from you and learn about what worked and what needs to be improved for future editions of XDC! Share your experience with us!

We have sent an email asking for feedback to different mailing lists (for example this). Don’t hesitate to send an email to X.Org Foundation board with all your feedback!

XDC 2022 announced!

X.Org Developers Conference 2022 has been announced! Jeremy White, from Codeweavers, gave a lightning talk presenting next year’s edition! Next year XDC will not be alone… WineConf 2022 is going to be organized by Codeweavers as well and co-located with XDC!

Save the dates! October 4-5-6, 2022 in Minneapolis, Minnesota, USA.

XDC 2022: Minneapolis, Minnesota, USA Image from Wikipedia. License CC BY-SA 4.0.

XDC 2023 hosting proposals

Have you enjoyed XDC 2021? Do you think you can do it better? ;-) We are looking for organizers for XDC 2023 (most likely in Europe but we are open to other places).

We know this is a decision that takes time (triggering internal discussions, looking for volunteers, budget, a venue suitable for the event, etc.). Therefore, we encourage potential interested parties to start the internal discussions now, so any questions they have can be answered before we open the call for proposals for XDC 2023 at some point next year. Please read what is required to organize this conference and feel free to contact me or the X.Org Foundation board for more info if needed.

Final acknowledgment

I would like to thank Igalia for all the support I got when I decided to run for re-election to the X.Org Foundation board this year, and for allowing me to participate in XDC organization during my work hours. It’s amazing that our Free Software and collaboration values are still present after 20 years rocking in the free world!

Igalia 20th anniversary Igalia

September 24, 2021 05:20 AM

Brian Kardell

Dad: A Personal Post

Last month, my dad passed away, very unexpectedly. That night, alone with my thoughts and unable to sleep, or do anything else, I wrote this post. I didn't write it for my blog, I wrote it for me. I needed to. I didn't post it then for a lot of reasons, not the least of which is that I don't generally share personal or vulnerable things here. I can understand if that's not why you're here. Today, I decided I would, as a kind of memorial... And immediately cried. So, please: Feel free to skip this one if it pops up in your feed and you're here for the tech. This isn't that post. This one isn't for you, it's for me, and my dad.

[Posted later] Today my dad passed away, unexpectedly. I am thinking a lot, and so sad. I need to put words on a page and get them out of my head.

my dad's obit photo
My dad's obit photo. He was barely 63.

My Dad

When I was 5, my mother and my biological father, barely in their mid-20s, got a divorce. Even I could see that they weren't compatible. My mom, just finishing college, had a lot of friends, and they would occasionally help us out in many ways: from picking me up from school because my mom was held up, to helping us move into our first apartment - or sometimes, just inviting us over.

One of those people, whom I saw more and more of, was a young man named Jim Wyse... Jimmy... My dad, who passed away today, unexpectedly.

Legally speaking, I guess, Jimmy became my "dad" in a ceremony when I was 7 - but that's bullshit, because the truth is, I can't even tell you when it became clear that this distinction was utterly meaningless to us. I was his son, and he was my dad. It wasn't because of biology or law or ceremony, but by virtue of all of the things that ultimately matter so much more...and by choice. I couldn't tell you when, because it is seamless in my mind.

From the very beginning he cared for me. He took me camping, and fishing. He taught me to shift gears while he worked the clutch. He played with me in the yard. We wrestled and "boxed". We swam and we boated. He took me to see the movies of my childhood: The Empire Strikes Back, Superman II and Rocky 3. He gave me my first tastes of coffee, beer and wine. He told me stories of his childhood. We laughed together. He taught me to build and fix things, or at least he included me, as if my "help" (often counter-productive) really mattered. What really mattered was something more than that. It's easy to see that, now.

Early photos of my dad and me, maybe even before he was technically my dad (I am in the black hat, with me is his nephew, my late cousin Jason who died a couple of years ago).

In fact, we spent what seems like, in retrospect, an impossible amount of time together. He cared when I was sad. He celebrated my victories. He taught me to be respectful and empathetic and generous and forgiving. He provided discipline too.

Jimmy came from a large family by today's standards, 4 brothers and a sister who all grew up and spent their entire lives in the same small 3 bedroom, 1 small bathroom house. It is generous, in fact, to call it 3 bedrooms. One of them, I believe, was converted out of the larger of the other two when my aunt was born. I worked on houses with my dad that had walk-in closets that are larger. They weren't wealthy by any stretch of the imagination, but they were close, and he still lived in that house with his parents when I met him. He was younger than my mom.

In this I got a whole new (big) family too. Cousins, aunts, uncles and grandparents with grandchildren who would become fixtures in my life. They were, of course, all actually biologically related, and yet this distinction seems to have been totally irrelevant to all of them from the beginning as well. We spent holidays and vacations together. In fact, while we lived near enough, we spent many weekends and evenings together too. Several of them lived with us and worked for him for a stint during difficult times in their own lives.

When I was 9, my sister Jennifer was born. It would be impossible to overstate how much I loved this new baby that came into our house. And it would be impossible not to see how much he did too. Perhaps it was the fact that some people began to congratulate him on his "first child" that led me to first hear him address the issue. It may well have been the first time, though it was certainly not the last, that I heard him express just how much he loved me and reassure me that I was every bit his child. It was genuine.

By the time my sister Sarah was born there was certainly nobody I met who doubted this. I was "Jimmy and Adele's kid" and most people referred to me as a Wyse.

My sisters are much younger than me. I don't tell them enough anymore, but I hope they know how much I love them, and how much he did. Because of our age differences, I probably have different memories than them. By the time they were probably old enough to remember much, I was already in my teenage years and spending less time at home. But I have so many wonderful memories of time we all spent together.

Somehow, it is amazing to me that the first time I heard anyone refer to us as "half-brother" or "half-sister", I was 40. Despite knowing in my mind that this is a biological truth, I was considerably taken aback just to hear it, and it still feels... wrong.

Tonight this memory dawned on me again as I realized it might be difficult for me to help with arrangements. He and my mother divorced long ago, so on paper we're as good as strangers, probably.

As I spoke to my sister on the phone, this realization fresh in my head, I began to worry that perhaps there was a difference. My heart broke again as I imagined the pain my sisters must feel - is it more than my own? Perhaps it is even insensitive not to acknowledge it? He was, after all, the man who held them in his arms at the hospital moments after their birth - they have known nothing else. I offered, struggling, "I know it probably isn't quite the same for us... I'm so sorry."

That this was the moment that finally prompted her to audible tears filled me with instant regret. "No one ever thought that. How could you say that? He definitely didn't see it that way." She's right, of course, and I know it. I'm saddened that I brought it up. He was my dad - and throughout my entire life he has always been there.

My teenage years were difficult. I was difficult. I didn't take school, or much of anything else, seriously. But through it all, he never gave up on me. By then he had started a small business as a general contractor, and he put me to work weekends and summers (and even the occasional school day when he was very shorthanded and it was clear I wasn't going to go to school anyway). He was a constant force who walked a thin line - both teaching me, with pride, valuable skills that I might need, and simultaneously constantly pushing me to please use my brain and not my back to make a living.

When I graduated high school, by some miracle, I went to work for him full time.

The following February we went to a job near Lake Erie to work on a roof. It was just about the last day anyone would want to do such a thing. It was windy, and biting cold - just above freezing. There was easily a foot of snow on the roof, and an inch of ice below it. By 9, a freezing rain had started whipping across us too.

Cold, soaked, and more uncomfortable than I have ever been, I realized that I couldn't imagine lasting the rest of the day. Did I really imagine doing this for another 40 years or more? It was then that I realized he was right: I should do something else. I wanted to leave right then, but the shame I'd feel walking off the job because I couldn't take it kept me going for another hour... But it couldn't last.

Around 10am, I quit.

Miles from home, I sat in his truck (still very cold, wet and without heat) for many hours pondering my future. I'm sure he took some shit from the rest of the crew about it. I spent months finding a college to accept me on a trial admission program.

I tell this story so that I can add that years later, after honors and success, he told me "That was the plan. I had to show you very clearly the choice in front of you. It was one of the happiest days of my life, when you realized you didn't have to do this.... But man it was cold. That was a shitty day.".

Throughout my life, he's always been teaching me - sometimes directly, sometimes indirectly by letting me fall flat and being there to pick me up and set me right.

He was the model in my young life that set the bar for what I wanted to be for my own children. I also watched him show kindness, patience and understanding to many people over the years, in ways that remain unparalleled examples to me. He was my example and the man I tried to be in so many ways.

He was the warmest soul in my darkest hours. There were times in my life when he was the only one I could talk to. On more than one occasion, he consoled and supported me in ways no one else could. He sat with me and calmed me while I cried so hard I couldn't speak. I tried to be there for him in some of his difficult times too, and he had some rough ones. He wasn't perfect either, but who is?

The truth is, he was more to me than "dad" expresses. Much more.

7 years ago, during one of those difficult times in my life, after my own long relationship broke up, I purchased the home that he grew up in from my Aunt. It needed a lot of work at the time, and he came and did some of it. He replaced the roof and installed new windows in the front. We were planning on doing the back last fall, until the pandemic. A boom in new work after things began to turn around meant we'd put it off till this fall.

It's funny, and sad, how much we (or at least I) put off till tomorrow, and then miss the chance because there are no more tomorrows. I haven't physically seen him (or much of anyone, really) in a year.

A lot of our conversation since the pandemic has centered on the old place: Me asking him questions about how to do something, or sending him pictures of improvements or changes I'd made. He'd always reply encouragingly, celebrating my work and expressing happiness that this home remained in the family. "Your grandparents would be happy".

Tonight, as I went to make a call, I realized that I have an unread message on my phone from him from last week. He was replying to a photo I sent him of some new landscaping. It was a simple message. "Looks great!" Two words and an exclamation point. That's it. Nothing deep, but it made me cry. I missed it at the time, and these are his last words to me. Encouraging me.

Each night I fall asleep in the same room that he did until we met. My bedroom is his old bedroom that I used to go and play in and wrestle with my cousins. I think about all of this often - and how lucky I am that Jim was my dad and that he loved me. I loved him too - and I'm glad I can say that we both knew it. Tonight won't be different in that respect - I'm sure I'll replay all of this in my mind... But... it is quite a bit different, isn't it? He's gone now.

Photo memories

One of the things I spent a long time doing since is looking through old photos. Most of these are bad photos of photos, but they give some context to all of this and are some great memories for me... Even if they aren't all of him, he's in all of the memories.

I was in my mom and dad's wedding party, in fact. I am pretty sure he helped pick my suit. This is me (in the suit), outside the reception where a bunch of us helped decorate their car with paper flowers.
This is a photo of my dad on his honeymoon after he married my mom. He was 21. So young. Only 14 years older than me, in fact.
A photo of me and my sister Jennifer. I was 9 by the time she was born. My dad took this photo of us on vacation.
Me and my youngest sister, Sarah, when she was born. Also, taken by dad. He loved taking pictures of us (he got much better at it later).
This is me at my 6th grade graduation. My dad got right up there on the stage to take a photo. He was like that - always cheering me on, boisterously. I almost didn't go to my high school graduation. He talked me into it. I could hear him over the entire crowd when I walked up.
A photo of me and my two sisters, taken by my dad.
My dad and I were always horsing around. These memories of him are so firmly ingrained in my mind, and still how I see him, that just a few years ago I initiated similar horseplay in his pool (we had fun), before remembering that he was 60 and had a bad back, and very quickly let him take me down. I say "let" only to say I physically stopped - but I'm not gonna lie, my dad was rugged as hell, even then.
In 2014, a photo of me, my dad and my two sisters (and my sister's husband) at his house for Christmas, after I moved back to Pittsburgh. He and my mom had been divorced for maybe a decade. He'd been remarried and divorced again since. He never stopped being my dad for a minute.

September 24, 2021 04:00 AM

September 20, 2021

Manuel Rego

Igalia 20th Anniversary

This is a brief post about an important event that is happening today.

Back in 2001 a group of 10 engineers from the University of A Coruña in Galicia (Spain) founded Igalia to run a cooperative business around the free software world. Today it’s its 20th anniversary so it’s time to celebrate! 🎂

Igalia 20th anniversary logo

In my particular case, I joined the company in 2007, just after graduating from the University. During these years I have had the chance to be involved in many interesting projects and communities. On top of that, I’ve learnt a ton of things about how a company is managed. I’ve also seen Igalia grow in size and move from local projects and customers to working with the biggest fish in the IT industry.

I’m very grateful to the founders and all the people who have been involved in making Igalia a successful project during all these years. I’m also very proud of all that we have achieved so far, and of how we have done it without sacrificing any of our principles.

We’re now more than 100 people from all over the world: awesome colleagues, partners and friends who share amazing values that define the direction of the company. Igalia is today the reference consultancy in some of the most important open source communities out there, which is an outstanding achievement for such a small company.

This time the celebration has to be at home, but we’ll surely have a big party when we can all meet again in the future. Meanwhile we have a brand new 20th anniversary logo, together with a small website in case you want to know more about Igalia’s history.

Myself with the Igalia 20th anniversary t-shirt

Sometimes the company you work for turns 20 years old. But it’s far less common for the company you co-own to turn 20. Let’s enjoy this moment. Looking forward to many more great years to come! 🎉

September 20, 2021 10:00 PM

September 02, 2021

Byungwoo Lee

CSS Selectors :has()

Selector? Combinator? Subject?

As described in the Selectors Level 4 spec, a selector represents a particular pattern of element(s) in a tree structure. We can select specific elements in a tree structure by matching the pattern to the tree.

Generally, this pattern involves two distinct concepts: first, a means to express conditions to be tested on an element itself (simple selectors or compound selectors); second, a means to express conditions on the relationship between two elements (combinators).

And the subject of a selector is any element matched by the selector.

The limits of subjects, so far

When you have a reference element in a DOM tree, you can select other elements with a CSS selector.

In a generic tree structure, an element can have four kinds of relationships to other elements:

  • an element is an ancestor of another element.
  • an element is a previous sibling of another element.
  • an element is a next sibling of another element.
  • an element is a descendant of another element.

CSS Selectors, to date, have only allowed the last 2 (‘is a next sibling of’ and ‘is a descendant of’).

So in the CSS world, Thor can say “I am Thor, son of Odin” like this: Odin > Thor. But there has been no way for Darth Vader to tell Luke, “I’m your father”.

At least, these are the limits of what has been implemented and is shipping in every browser to date. However, :has() in the CSS Selectors spec provides the expression: DarthVader:has(> Luke)

The reason for this limitation is mainly efficiency.

The primary use of selectors has always been in CSS itself. Pages often have 500-2000 CSS rules and slightly more elements than that. Selectors act as filters in the process of applying style rules to elements. If we have 2000 CSS rules for 2000 elements, matching could be done at least 2,000 times, and in the worst case (in theory) 4,000,000 times. In the browser, the tree is changing constantly (even a static document is rapidly mutated, i.e. built, as it is parsed), and we try to render all of this incrementally and at 60 fps. In summary, selector matching is performed very frequently in performance-critical code paths, so it must be designed and implemented for very high performance. One effective way to achieve that is to keep the problem simple by restricting the complex relationships that selectors can express.

In a tree structure, checking whether an element is a descendant of another is more efficient than checking whether it is an ancestor of another, because an element has only one parent but can have multiple children.

<div id=parent>
  <div id=subject>
    <div id=child1></div>
    <div id=child2></div>
    ...
    <div id=child10></div>
  </div>
</div>
<script>
subject.matches('#parent > :scope');
// matches   : Are you a child of #parent ?
// #subject  : Yes, my parent is #parent.

subject.matches(':has(> #child10)');
// matches   : Are you a parent of #child10 ?
// #subject  : Wait a second, I have to lookup all my children.
//             Yes, #child10 is one of my children.
</script>
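The asymmetry in the example can be made concrete with a small sketch on a hypothetical plain-object tree (not the real DOM): answering "is this a child of #parent?" takes one parent comparison, while answering "is this a parent of #child10?" may require scanning every child. The `makeNode` helper and the `steps` counter are made up for illustration.

```javascript
// Hypothetical minimal tree node: not the DOM, just enough to count work.
function makeNode(id, parent = null) {
  const node = { id, parent, children: [] };
  if (parent) parent.children.push(node);
  return node;
}

// '#parent > :scope': is `el` a child of the element with `parentId`?
// One comparison is enough, since an element has exactly one parent.
function isChildOf(el, parentId, stats) {
  stats.steps++;
  return el.parent !== null && el.parent.id === parentId;
}

// ':has(> #childId)': is `el` a parent of the element with `childId`?
// In the worst case every child has to be inspected.
function isParentOf(el, childId, stats) {
  for (const child of el.children) {
    stats.steps++;
    if (child.id === childId) return true;
  }
  return false;
}

const parent = makeNode("parent");
const subject = makeNode("subject", parent);
for (let i = 1; i <= 10; i++) makeNode(`child${i}`, subject);

const upStats = { steps: 0 };
const downStats = { steps: 0 };
console.log(isChildOf(subject, "parent", upStats), upStats.steps);       // true 1
console.log(isParentOf(subject, "child10", downStats), downStats.steps); // true 10
```

The upward check stays constant no matter how many children #subject has, while the downward check grows with the number of children (and with the whole subtree for a descendant argument such as `:has(.x)`).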

By removing one of the two opposite directions, we can always place the subject of a selector at the rightmost position, no matter how complex the selector is.

  • ancestor subject
    -> subject is a descendant of ancestor
  • previous_sibling ~ subject
    -> subject is a next sibling of previous_sibling
  • previous_sibling ~ ancestor subject
    -> subject is a descendant of ancestor, which is a next sibling of previous_sibling

With this limitation, we can get the advantages of having simple data structures and simple matching sequences.

<style>
A > B + C { color: red; }
</style>
<!--
'A > B + C' can be parsed as a list of selector/combinator pair.
[
  {selector: 'C', combinator: '+'},
  {selector: 'B', combinator: '>'},
  {selector: 'A', combinator: null}
]
-->
<A>       <!-- 3. match 'A' and apply style to C if matched-->
  <B></B> <!-- 2. match 'B' and move to parent if matched-->
  <C></C> <!-- 1. match 'C' and move to previous if matched-->
</A>
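The numbered steps in the comments can be sketched as a tiny right-to-left matcher over the selector/combinator pair list. This is a simplified illustration, not any engine's code: it assumes compound selectors are bare tag names, uses a hypothetical `el` helper instead of the DOM, and supports only the `>`, `+`, `~` and descendant (written here as `' '`) combinators.

```javascript
// Hypothetical element model: tag name plus parent and previous-sibling links.
function el(tag, parent = null) {
  const node = { tag, parent, prev: null, children: [] };
  if (parent) {
    const last = parent.children[parent.children.length - 1];
    if (last) node.prev = last;
    parent.children.push(node);
  }
  return node;
}

// `pairs` is the right-to-left list, e.g. 'A > B + C' =>
// [{C,'+'}, {B,'>'}, {A,null}]. Match the current compound, then move
// left (parent or previous sibling) according to its combinator.
function matchFrom(pairs, index, node) {
  const { selector, combinator } = pairs[index];
  if (!node || node.tag !== selector) return false;
  if (combinator === null) return true; // leftmost compound matched
  switch (combinator) {
    case ">": return matchFrom(pairs, index + 1, node.parent);
    case "+": return matchFrom(pairs, index + 1, node.prev);
    case " ": // descendant: try every ancestor
      for (let a = node.parent; a; a = a.parent)
        if (matchFrom(pairs, index + 1, a)) return true;
      return false;
    case "~": // general sibling: try every previous sibling
      for (let s = node.prev; s; s = s.prev)
        if (matchFrom(pairs, index + 1, s)) return true;
      return false;
    default:
      throw new Error(`unsupported combinator ${combinator}`);
  }
}

// Usage: the <A><B></B><C></C></A> tree from above.
const A = el("A");
const B = el("B", A);
const C = el("C", A);
const rule = [
  { selector: "C", combinator: "+" },
  { selector: "B", combinator: ">" },
  { selector: "A", combinator: null },
];
console.log(matchFrom(rule, 0, C)); // true
console.log(matchFrom(rule, 0, B)); // false: B is not a C
```

Because every step only moves to a parent or a previous sibling, each step follows a single link; only the descendant and `~` combinators need to walk a chain of such links.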

:has() allows you to select subjects at any position

With combinators, we can only select downward (descendants, next siblings or descendants of next siblings) from a reference element. But there are many other elements that we can select if the other two relationships, ancestors and previous siblings, are supported.

<div>               <!-- ? -->
  <div></div>         <!-- ? -->
</div>
<div>               <!-- ? -->
  <div>               <!-- ? -->
    <div></div>         <!-- ? -->
  </div>
  <div id=reference>  <!-- #reference -->
    <div></div>         <!-- #reference > div -->
  </div>
  <div>               <!-- #reference + div -->
    <div></div>         <!-- #reference + div > div -->
  </div>
</div>
<div>               <!-- ? -->
  <div></div>         <!-- ? -->
</div>

:has() provides the way of selecting upward (ancestors, previous siblings, previous siblings of ancestors) from a reference element.

<div>               <!-- div:has(+ div > #reference) -->
  <div></div>         <!-- ? -->
</div>
<div>               <!-- div:has(> #reference) -->
  <div>               <!-- div:has(+ #reference) -->
    <div></div>         <!-- ? -->
  </div>
  <div id=reference>  <!-- #reference -->
    <div></div>         <!-- #reference > div -->
  </div>
  <div>               <!-- #reference + div -->
    <div></div>         <!-- #reference + div > div -->
  </div>
</div>
<div>               <!-- ? -->
  <div></div>         <!-- ? -->
</div>
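Here is a minimal sketch of how these upward selections could be evaluated on a hypothetical plain-object tree that mirrors the markup above (not the DOM, and not how Blink implements it): `:has(> X)` inspects children, `:has(+ X)` inspects the next sibling, and `:has(+ div > X)` combines the two. The helper names and node ids are made up for illustration.

```javascript
// Hypothetical node model with parent/children links (not the DOM).
function node(id, parent = null) {
  const n = { id, tag: "div", parent, children: [] };
  if (parent) parent.children.push(n);
  return n;
}
const nextSibling = (n) => {
  if (!n.parent) return null;
  const sibs = n.parent.children;
  return sibs[sibs.indexOf(n) + 1] ?? null;
};
// :has(> pred): some child satisfies pred.
const hasChild = (n, pred) => n.children.some(pred);
// :has(+ pred): the next sibling satisfies pred.
const hasNext = (n, pred) => { const s = nextSibling(n); return s !== null && pred(s); };

// Mirror of the markup above; `root` is a virtual container, not a div.
const root = node("root");
const d1 = node("d1", root); node("d1a", d1);
const d2 = node("d2", root);
const d2a = node("d2a", d2); node("d2a1", d2a);
const ref = node("reference", d2); node("ref-child", ref);
const d2c = node("d2c", d2); node("d2c1", d2c);
const d3 = node("d3", root); node("d3a", d3);

const isRef = (n) => n.id === "reference";
const isDiv = (n) => n.tag === "div";
const all = [];
(function walk(n) { for (const c of n.children) { all.push(c); walk(c); } })(root);

// div:has(> #reference): the parent of #reference
const parentOfRef = all.filter((n) => isDiv(n) && hasChild(n, isRef));
// div:has(+ #reference): the previous sibling of #reference
const prevOfRef = all.filter((n) => isDiv(n) && hasNext(n, isRef));
// div:has(+ div > #reference): the previous sibling of #reference's parent
const prevOfParent = all.filter((n) =>
  isDiv(n) && hasNext(n, (s) => isDiv(s) && hasChild(s, isRef)));

console.log(parentOfRef.map((n) => n.id));  // [ 'd2' ]
console.log(prevOfRef.map((n) => n.id));    // [ 'd2a' ]
console.log(prevOfParent.map((n) => n.id)); // [ 'd1' ]
```

Note that answering each of these requires looking at other elements' children or siblings, which is exactly the downward lookup that plain combinators avoid.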

And with some simple combinations, we can select all elements around the reference element.

<div>               <!-- div:has(+ div > #reference) -->
  <div></div>         <!-- div:has(+ div > #reference) > div -->
</div>
<div>               <!-- div:has(> #reference) -->
  <div>               <!-- div:has(+ #reference) -->
    <div></div>         <!-- div:has(+ #reference) > div -->
  </div>
  <div id=reference>  <!-- #reference -->
    <div></div>         <!-- #reference > div -->
  </div>
  <div>               <!-- #reference + div -->
    <div></div>         <!-- #reference + div > div -->
  </div>
</div>
<div>               <!-- div:has(> #reference) + div -->
  <div></div>         <!-- div:has(> #reference) + div > div -->
</div>

What is the problem with :has() ?

As you might already know, this pseudo class has been delayed for a long time despite the constant interest.

There are many complex situations that make supporting :has() difficult.

  • There are many, many complex cases of selector combinations.
  • Those cases are handled in the selector matching operations and style invalidation operations in the style engine.
  • Selector matching and style invalidation are very critical to performance.
  • The style engine is carefully designed and highly optimized based on the existing two relationships (is a descendant of, is a next sibling of).
  • Each browser engine has its own design and optimizations for those operations.

In this context, :has() provides the other two relationships (is an ancestor of, is a previous sibling of), and this is where the problems and concerns begin.

When we meet a complex and difficult problem, the first strategy we can take is to break it down into smaller ones. For :has(), we can divide the problems along the lines of the CSS selector profiles.

Problems of the :has() matching operation

The :has() matching operation inherently involves the descendant lookup overhead described previously. This is an unavoidable overhead we have to accept when we want to use :has() functionality.

In some cases, :has() matching can be O(n²) because of duplicated argument matching operations. When we call document.querySelectorAll('A:has(B)') on the DOM <A><A><A><A><A><A><A><A><A><A><B>, there can be unnecessary argument selector matching, because the descendant traversal can occur for every A element. If so, the number of argument matching operations can be 55 (=10+9+8+7+6+5+4+3+2+1) without any optimization, whereas 10 is optimal for this case.
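The 55 versus 10 arithmetic can be checked with a small sketch that counts argument matching operations for `A:has(B)` on the ten nested `A` elements, modeled as plain objects rather than the DOM. The naive version re-traverses the subtree for every `A`. The second version is just one possible optimization, assumed here for illustration and not necessarily what Chromium does: a single bottom-up pass that tests each element against the argument once and reuses each child's "subtree contains a match" result.

```javascript
// Build <A><A>...(depth A's)...<B> as a chain of plain objects.
function buildChain(depth) {
  const root = { tag: "A", children: [] };
  let cur = root;
  for (let i = 1; i < depth; i++) {
    const next = { tag: "A", children: [] };
    cur.children.push(next);
    cur = next;
  }
  cur.children.push({ tag: "B", children: [] });
  return root;
}

function descendants(n, out = []) {
  for (const c of n.children) { out.push(c); descendants(c, out); }
  return out;
}

// Naive: for every A, test the argument 'B' against its descendants
// (stopping at the first match, as :has() can).
function naiveCount(root) {
  let ops = 0;
  for (const a of [root, ...descendants(root)].filter((n) => n.tag === "A")) {
    for (const d of descendants(a)) { ops++; if (d.tag === "B") break; }
  }
  return ops;
}

// Memoized: one post-order pass; each element is tested against 'B' once,
// and its "subtree has a B" result is reused by its parent.
function memoizedCount(root) {
  let ops = 0;
  const hasB = new Map();
  (function visit(n) {
    let found = false;
    for (const c of n.children) {
      visit(c);
      ops++; // match the argument 'B' against c
      if (c.tag === "B" || hasB.get(c)) found = true;
    }
    hasB.set(n, found);
  })(root);
  return { ops, hasB };
}

const chain = buildChain(10);
console.log(naiveCount(chain));        // 55
console.log(memoizedCount(chain).ops); // 10
```

In both versions all ten `A` elements match; only the amount of repeated argument matching differs.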

There can be more complex cases involving shadow tree boundary crossing.

Problems of the :has() style invalidation

In a nutshell, the style engine tries to invalidate the styles of elements that may be affected by a DOM mutation. It has long been designed and highly optimized based on the assumption that any possibly affected element is either the changed element itself or downward from it.

<style>
.mutation .subject { color: red; }
</style>
<div>          <!-- classList.toggle('mutation') affects .subject -->
  <div class="subject"></div>       <!-- .subject is downward -->
</div>

But :has() invalidation is different because the possibly affected element is upward of the changed element (an ancestor, rather than a descendant).

<style>
.subject:has(.mutation) { color: red; }
</style>
<div class="subject">                 <!-- .subject is in upward -->
  <div></div>  <!-- classList.toggle('mutation') affect .subject -->
</div>
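As a simplified sketch of what "upward" means for invalidation (an assumption for illustration; real style engines use far more targeted invalidation data than this): for a rule like `.subject:has(.mutation)`, toggling a class on an element can only change the match result of that element's ancestors, so the candidates to restyle are collected by walking up the parent chain. The `elem` helper is a hypothetical stand-in for DOM elements.

```javascript
// Hypothetical element model (not the DOM): a class set plus tree links.
function elem(classes, parent = null) {
  const e = { classes: new Set(classes), parent, children: [] };
  if (parent) parent.children.push(e);
  return e;
}

// The :has() argument check: does any descendant carry class `cls`?
function hasDescendantWithClass(e, cls) {
  return e.children.some((c) => c.classes.has(cls) || hasDescendantWithClass(c, cls));
}

// Rule: .subject:has(.mutation) { color: red; }
const matchesRule = (e) => e.classes.has("subject") && hasDescendantWithClass(e, "mutation");

// After a class change on `changed`, only its ancestors can change their
// result for this rule, so the candidates are found by walking upward.
function invalidationCandidates(changed) {
  const candidates = [];
  for (let a = changed.parent; a !== null; a = a.parent) {
    if (a.classes.has("subject")) candidates.push(a);
  }
  return candidates;
}

// Usage, mirroring the markup above.
const outer = elem(["subject"]);
const inner = elem([], outer);
console.log(matchesRule(outer));                   // false
inner.classes.add("mutation");                     // like classList.toggle('mutation')
console.log(invalidationCandidates(inner).length); // 1 (the outer .subject)
console.log(matchesRule(outer));                   // true
```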

In some cases, a change can affect elements in both the upward and downward directions.

<style>
.subject1:has(:is(.mutation1 .something)) { color: red; }
.something:has(.mutation2) .subject2 { color: red; }
</style>
<div class="subject1">              <!-- .subject1 is in upward -->
  <div>     <!-- classList.toggle('mutation1') affect .subject1 -->
    <div class="subject1">        <!-- .subject1 is in downward -->
      <div class="something"></div>
    </div>
  </div>
</div>
<div class="something">
  <div class="subject2">            <!-- .subject2 is in upward -->
    <div>   <!-- classList.toggle('mutation2') affect .subject2 -->
      <div class="subject2"></div><!-- .subject2 is in downward -->
    </div>
  </div>
</div>

Actually, a change can affect everywhere.

<style>
:has(~ .mutation) .subject { color: red; }
:has(.mutation) ~ .subject { color: red; }
</style>
<div>
  <div>
    <div class="subject">        <!-- not in upward or downward -->
    </div>
  </div>
  <div></div> <!-- classList.toggle('mutation') affect .subject -->
</div>
<div class="subject"></div>      <!-- not in upward or downward -->

Expanding the invalidation traversal scope (from the downward subtree to the entire tree) can degrade performance. Violating the basic assumption of the invalidation logic (finding a subject anywhere in the tree instead of only downward from the change) can also degrade performance and increase implementation complexity and maintenance overhead, because it will be hard or impossible for the existing invalidation logic to support :has() invalidation as it is.

(There are many more details about :has() invalidation, and those will be covered later.)

What is the current status of :has() ?

Thanks to funding from eyeo, Igalia started prototyping :has() in the Chromium project after some initial investigation.

(You can find rich background on this in Brian Kardell’s post “Can I :has()?”.)

Prototyping is still underway, but here is our progress so far.

  • Chromium
    • Landed CLs to support :has() selector matching (3 CLs)
    • Bug fix (2 CLs)
    • Add experimental feature flag for :has() in snapshot profile (1 CL)
  • WPT (web platform test)
    • Add tests (3 Pull requests)
  • CSS working group drafts

:has() in snapshot profile

As for :has() in the snapshot profile: as of now, Chrome Dev (version 94, released on Aug 19) supports all :has() functionality except some cases involving shadow tree boundary crossing.

You can try :has() with the JavaScript APIs (querySelectorAll, querySelector, matches, closest) in the snapshot profile after enabling the runtime flag enable-experimental-web-platform-features.

Screenshot: :has() in the snapshot profile is under the experimental flag

You can also enable it with the Blink feature flag CSSPseudoHasInSnapshotProfile on the command line:

$ google-chrome-unstable \
        --enable-blink-features=CSSPseudoHasInSnapshotProfile

:has() in both (snapshot/live) profile

You can enable :has() in both profiles with the Blink feature flag CSSPseudoHas on the command line:

$ google-chrome-unstable --enable-blink-features=CSSPseudoHas

Support for :has() in the live profile is still in progress. When you enable :has() with this flag, you will see that style rules with :has() only work at load time: styles are not recalculated after DOM changes.

Screenshot: :has() in the live profile only supports the initial style

by Byungwoo's Blog at September 02, 2021 03:00 PM

August 31, 2021

Juan A. Suárez

Implementing Performance Counters in V3D driver

Let me talk here about how we implemented support for performance counters in the Mesa V3D driver, the OpenGL driver used by the Raspberry Pi 4. For reference, the implementation is very similar to the one already available (not done by me, by the way) for VC4, the OpenGL driver for the Raspberry Pi 3 and earlier devices, also part of Mesa. If you are already familiar with how this is implemented in VC4, this will mostly be a refresher.

First of all, what are these performance counters? Most processors nowadays contain hardware facilities to take measurements of what is happening inside the processor, and graphics processors are no different. In this case, the graphics chips used by Raspberry Pi devices (manufactured by Broadcom) can record a bunch of different graphics-related parameters: how many quads pass or fail depth/stencil tests, how many clock cycles are spent on vertex/fragment shading, hits/misses in the GPU cache, and many other values. In fact, with the V3D driver it is possible to measure around 87 different parameters, and up to 32 of them simultaneously. Quite a few fewer in VC4, though. But still a lot.

On a hardware level, using these counters is just a matter of writing and reading some GPU registers: first write the registers that select what we want to measure, then a few more to start the measurement, and finally read other registers containing the results. But of course, much like we don’t expect users to write GPU assembly code, we don’t expect users to write GPU registers directly. Moreover, even Mesa drivers such as V3D can’t interact with the hardware directly; only the kernel can, through its DRM subsystem. For V3D (and the same applies to VC4, and in general to any other driver), we have a driver in user space (either the OpenGL driver, V3D, or the Vulkan driver, V3DV), and a driver in kernel space, unsurprisingly also called V3D. The user-space driver is in charge of translating all the commands and options created with OpenGL or other APIs into batches of commands to be executed by the GPU, which are submitted to the kernel driver as DRM jobs. The kernel takes the proper actions to send these to the GPU for execution, including touching the proper registers. Thus, to implement support for performance counters, we need to modify code in two places: the kernel and the (user-space) driver.

Implementation in the kernel

Here we need to think about how to deal with the GPU and the registers to make the performance counters work, as well as the API we provide to user space to use them. As mentioned before, the approach we follow here is the same as the one used in the VC4 driver: performance counter monitors. That is, the user-space driver creates one or more monitors, specifying for each monitor which counters it is interested in (up to 32 simultaneously, the hardware limit). The kernel returns a unique identifier for each monitor, which can be used later to do the measurement, query the results, and finally destroy it when done.
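The monitor lifecycle just described can be modeled roughly as follows. This is a hypothetical user-level sketch, not the kernel's actual ioctl interface: the class and method names and the counter names are made up, and only the 32-counter limit and the create/query/destroy flow come from the text.

```javascript
// Hypothetical model of the monitor lifecycle (not the kernel's actual ioctls).
const MAX_COUNTERS_PER_MONITOR = 32; // hardware limit mentioned above

class MonitorRegistry {
  constructor() {
    this.nextId = 1;
    this.monitors = new Map(); // id -> { counters, values }
  }
  // Create a monitor for the requested counters and return its unique id.
  createMonitor(counters) {
    if (counters.length > MAX_COUNTERS_PER_MONITOR) {
      throw new RangeError(`at most ${MAX_COUNTERS_PER_MONITOR} counters per monitor`);
    }
    const id = this.nextId++;
    this.monitors.set(id, { counters: [...counters], values: counters.map(() => 0) });
    return id;
  }
  // Query the values accumulated so far for a monitor.
  getValues(id) {
    const monitor = this.monitors.get(id);
    if (monitor === undefined) throw new Error(`unknown monitor ${id}`);
    return [...monitor.values];
  }
  // Destroy the monitor once the measurement is done.
  destroyMonitor(id) {
    return this.monitors.delete(id);
  }
}

const registry = new MonitorRegistry();
const id = registry.createMonitor(["counter-a", "counter-b"]); // hypothetical names
console.log(registry.getValues(id));      // [ 0, 0 ]
console.log(registry.destroyMonitor(id)); // true
```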

In this case, there is no explicit start/stop of the measurement. Rather, every time the driver wants to measure a job, it includes the identifier of the monitor it wants to use for that job, if any. Before submitting a job to the GPU, the kernel checks whether the job has a monitor identifier attached. If so, it checks whether the previous job executed by the GPU used the same monitor identifier, in which case it only needs to send the job to the GPU, as the required performance counters are already enabled. If the monitor is different, it first needs to read the current counter values (through the proper GPU registers) and add them to the current monitor, stop the measurement, configure the counters for the new monitor, start the measurement again, and finally submit the new job to the GPU. If it turns out there was no monitor active before, only the last steps are needed.

The reason for all this is that multiple applications can be executing at the same time, some using (different) performance counters, and most of them probably not using performance counters at all. The performance counter values of one application shouldn’t affect any other application, so we need to make sure we don’t mix up counters between applications. Keeping the values in their respective monitors helps to accomplish this. The user-space driver still has a small part to play in this, but in general, this is how we avoid the mixing.
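The per-job switching can be sketched as a small simulation. Everything here is hypothetical (the simulated `FakeGpu`, the job shape, the method names); it only mirrors the steps in the paragraph: same monitor, just run the job; different monitor, read and accumulate the running counters into the old monitor, stop it, then configure and start the new one. How jobs without a monitor are handled is an assumption of this sketch: they also stop the active monitor, so their work is not counted.

```javascript
// Simulated GPU: every programmed counter grows by `job.work` per job.
class FakeGpu {
  constructor() { this.counters = []; }
  configure(n) { this.counters = new Array(n).fill(0); } // select + reset counters
  run(job) { this.counters = this.counters.map((v) => v + job.work); }
  read() { return [...this.counters]; }
}

class Scheduler {
  constructor(gpu, monitors) {
    this.gpu = gpu;           // simulated GPU
    this.monitors = monitors; // id -> { counters, values }
    this.current = null;      // monitor whose counters are currently programmed
  }
  // Read the hardware counters, add them to the active monitor, and stop it.
  stopCurrent() {
    if (this.current === null) return;
    const m = this.monitors.get(this.current);
    this.gpu.read().forEach((v, i) => { m.values[i] += v; });
    this.gpu.configure(0);
    this.current = null;
  }
  submit(job) {
    const wanted = job.monitorId ?? null;
    if (wanted !== this.current) {
      this.stopCurrent(); // no-op if there was no active monitor
      if (wanted !== null) {
        this.gpu.configure(this.monitors.get(wanted).counters.length); // start
        this.current = wanted;
      }
    }
    this.gpu.run(job); // same monitor as before: just run the job
  }
}

const monitors = new Map([
  [1, { counters: ["a", "b"], values: [0, 0] }],
  [2, { counters: ["c"], values: [0] }],
]);
const sched = new Scheduler(new FakeGpu(), monitors);
sched.submit({ monitorId: 1, work: 5 });
sched.submit({ monitorId: 1, work: 3 }); // same monitor: counters keep running
sched.submit({ work: 100 });             // another app's job: not counted anywhere
sched.submit({ monitorId: 2, work: 7 });
sched.stopCurrent();
console.log(monitors.get(1).values); // [ 8, 8 ]
console.log(monitors.get(2).values); // [ 7 ]
```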

If you want to take a look at the full implementation, it is available in a single commit.

Implementation in the driver

Once we have a way to create and manage the monitors, using them in the driver is quite easy: as mentioned before, we only need to create a monitor with the counters we are interested in and attach it to the job to be submitted to the kernel. In order to make things easier, we keep a mirror-like version of the monitor inside the driver.

This approach is adequate when you are developing the driver and can add code directly to it to check performance. But what about the end user, who is writing an OpenGL application and wants to improve its performance or find its bottlenecks? We want to give the user a way to do this through OpenGL.

Fortunately, there is in fact a way to do this through OpenGL: the GL_AMD_performance_monitor extension. This OpenGL extension provides an API to query what counters the hardware supports, to create monitors, to start and stop them, and to retrieve the values. It looks very similar to what we have described so far, except for an important difference: the user needs to start and stop the monitors explicitly. We will explain later why this is necessary. But the key point here is that when we start a monitor, this means that from that moment on, until stopping it, any job created and submitted to the kernel will have the identifier of that monitor attached. This implies that only one monitor can be enabled in the application at the same time. But this isn’t a problem, as this restriction is part of the extension.

Our driver does not implement this API directly, but through “queries”, which the Gallium subsystem in Mesa then uses to implement the extension. For reference, the V3D driver (like VC4) is implemented as part of the Gallium subsystem. The Gallium part handles all the hardware-independent OpenGL functionality and only requires the driver to implement a set of hook functions. If the driver implements the proper functions, Gallium exposes the corresponding extension (in this case, GL_AMD_performance_monitor).

For our case, it requires the driver to implement functions to return which counters are available, to create or destroy a query (in this case, the query is the same as the monitor), start and stop the query, and once it is finished, to get the results back.

At this point, I would like to explain a bit better what it means to stop the monitor and get the results back. As explained earlier, stopping the monitor or query means that from that moment on, any new job submitted to the kernel (and thus to the GPU) won’t have a performance monitor identifier attached, and hence won’t be measured. But it is important to know that the driver submits jobs to the kernel at its own pace, and these aren’t executed immediately: the GPU needs time to execute them, so the kernel puts arriving jobs in a queue to be submitted to the GPU. This means that when the user stops the monitor, there could still be jobs in the queue that haven’t been executed yet and are thus still pending measurement.

And how do we know when those jobs have been executed by the GPU? The hook function that gets the query results has a “wait” parameter, which tells whether the function needs to wait for all the pending measured jobs to finish executing. If it doesn’t need to wait but there are pending jobs, it just returns, telling the caller so. This allows the caller to do other work meanwhile and query again later, instead of blocking until all the jobs are executed. This is implemented through sync objects. Every time a job is sent to the kernel, there is a sync object that is signaled when the job has finished executing. This is mainly used to synchronize jobs. In our case, when the user ends the query we save the sync object (fence) of the last submitted job, and we use it to know when this last job has been executed.
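The query flow can be sketched like this. It is a hypothetical model, not Mesa's Gallium hooks: a `Promise` plus a `signaled` flag stands in for the kernel sync object of the last job submitted while the query was active, and `getResult(wait)` either waits on it or immediately reports that results are not ready (`null`). All names here are made up for illustration.

```javascript
// Hypothetical stand-in for a kernel sync object (fence): `signaled` flips
// to true and `done` resolves once the associated job finishes on the GPU.
function makeFence() {
  const fence = { signaled: false };
  fence.done = new Promise((resolve) => {
    fence.signal = () => { fence.signaled = true; resolve(); };
  });
  return fence;
}

// Hypothetical model of a performance query (not Mesa's actual code).
class PerfQuery {
  constructor(readValues) {
    this.readValues = readValues; // callback returning the monitor's values
    this.active = false;
    this.lastFence = null;        // fence of the last job measured by this query
  }
  begin() { this.active = true; }
  end() { this.active = false; }
  // Called on every job submission; only jobs submitted while active are measured.
  onJobSubmitted(fence) { if (this.active) this.lastFence = fence; }
  // wait=false: return null right away if measured jobs are still pending.
  // wait=true: wait until the last measured job has finished, then read values.
  async getResult(wait) {
    if (this.lastFence !== null && !this.lastFence.signaled) {
      if (!wait) return null;
      await this.lastFence.done;
    }
    return this.readValues();
  }
}

// Usage: one measured job that the simulated GPU finishes after 10 ms.
(async () => {
  const query = new PerfQuery(() => [42]);
  query.begin();
  const fence = makeFence();
  query.onJobSubmitted(fence);
  query.end();
  console.log(await query.getResult(false)); // null: the job is still pending
  setTimeout(fence.signal, 10);
  console.log(await query.getResult(true));  // [ 42 ]
})();
```

Returning `null` instead of blocking is what lets the caller do other work and poll again later.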

There are quite a few details I’m not covering here. If you are interested though, you can take a look at the merge request.

Gallium HUD

So far we have seen how the performance counters are implemented and how to use them. In all cases it requires writing code to create the monitor/query, start/stop it, and query back the results, either in the driver itself or in the application through the GL_AMD_performance_monitor extension.¹

But what if we want to get some general measurements without adding code to the application or the driver? Fortunately, there is an environment variable, GALLIUM_HUD, that, when set correctly, shows some graphs of the measured counters on top of the application.

Using it is very easy: set it to help (GALLIUM_HUD=help) to learn how to use it, as well as to get a list of the available counters for the current hardware.

For example:

$ env GALLIUM_HUD=L2T-CLE-reads,TLB-quads-passing-z-and-stencil-test,QPU-total-active-clk-cycles-vertex-coord-shading scorched3d

You will see:

Performance Counters in Scorched 3D

Bear in mind that to use this you will need a kernel that supports performance counters for V3D. At the time of writing, no kernel has been released with this support yet. If you don’t want to wait, you can download the patch, apply it to your Raspberry Pi kernel (it has been tested on the 5.12 branch), then build and install it.

  1. All this applies to OpenGL; if your application uses Vulkan, there are other, similar extensions, which are not yet implemented in our V3DV driver at the time of writing.

August 31, 2021 10:00 PM

August 20, 2021

Brian Kardell

Experimenting with :has()


Back in May, I wrote Can I :has()?. In that piece, I discussed the :has() pseudo-class and the practical reasons it's been hard to advance. Today I'll give you some updates on advancing :has() efforts in Chromium, and how you can play with it today.

In my previous piece I explained that Igalia had been working to help move these discussions along by doing the research that has been difficult for vendors to prioritize (funded by eyeo), and that we believed we'd gotten somewhere: we'd done a lot of research, developed a prototype in a custom build of Chromium, and provided what we believed were good proofs for discussion. The day that I wrote that last piece, we were filing an intent to prototype in Chromium.

Today, I'd like to give some updates on those efforts...

Where things stand in Chromium, as of yesterday

As you may or may not know, the process for shipping new features in Chromium is pretty involved and careful. There are several 'intent' steps, many reviews along the way, and many channels (canary, dev, beta, stable). Atop this are also things which launch with command line flags, runtime feature flags, origin trials (experimentally on for some sites that opted in), reverse origin trials (some sites opted out) and field trials/finch flags (rollout to some % of users, on or off by default).

Effectively, things get more serious and certain, and as that happens we want to expand the reach of these things by making it easier for more developers to experiment with it.

Previously...

For a while now our up-streaming efforts have allowed you to pass command line flags to enable some support in early channels, with either of:

--enable-blink-features=CSSPseudoHasInSnapshotProfile
--enable-blink-features=CSSPseudoHas

The former adds support for the use of the :has() pseudo class in the JavaScript selector APIs ('the snapshot/static profile'), and the latter enables support in CSS stylesheets too.

These ways still work, but it's obviously a lot more friction than most developers will take the time to learn, figure out, and try. Most of us don't launch from a command line.

New Advancements!

As things have gotten more stable and serious, we're moving along and making some things easier...

As of the dev channel release 94.0.4606.12 (yesterday), enabling support in the JavaScript selector APIs is now as simple as enabling the experimental web platform features runtime flag. Chances are, a number of readers already have this flag flipped, so low friction indeed!

Support in the JavaScript APIs has always involved far fewer unknowns and challenges, but what's held us back from adding support there first has always been a desire to prevent splitting and a lack of ability to answer questions about whether the main, live CSS profile could be supported, what limits it would need, and so on. We feel like we have a much better grip on many of these questions now, and so things are moving along a bit.

We hope that this encourages more people to try it out and provide feedback, open bugs, or just add encouragement. Let us know if you do!

Much more at Ad Blocker Dev Summit 2021

I'm also happy to note that I'll be speaking, along with my colleague Byungwoo Lee and eyeo's @shwetank and @WebReflection at Ad Blocker Dev Summit 2021 on October 21. Looking forward to being able to provide a lot more information there on the history, technical challenges, process, use cases and impacts! Hope to see you there!

August 20, 2021 04:00 AM

August 11, 2021

Danylo Piliaiev

Testing Vulkan drivers with games that cannot run on the target device

Here I’m playing “Spelunky 2” on my laptop and simultaneously replaying the same Vulkan calls on an ARM board with Adreno GPU running the open source Turnip Vulkan driver. Hint: it’s an x64 Windows game that doesn’t run on ARM.

The bottom right is the game I’m playing on my laptop, the top left is GFXReconstruct immediately replaying Vulkan calls from the game on ARM board.

How is it done? And why would it be useful for debugging? Read below!


Debugging issues a driver faces with real-world applications requires the ability to capture and replay graphics API calls. For mobile GPUs this is even more challenging, since for a Vulkan driver the main “source” of real-world workloads is x86-64 apps that run via Wine + DXVK, mainly games which were made for desktop x86-64 Windows and do not run on ARM. Efforts are being made to run these apps on ARM, but that is still a work in progress. And we want to test the drivers NOW.

The obvious solution would be to run those applications on an x86-64 machine, capture all Vulkan calls, and then replay those calls on a second machine where we cannot run the app. This way it would be possible to test the driver without running the application directly on it.

The main trouble is that Vulkan calls made on one GPU + driver combo are generally not compatible with another GPU + driver combo, sometimes even within the same GPU vendor. There are different memory capabilities (VkPhysicalDeviceMemoryProperties), different memory requirements for buffers and images, different extensions available, and different optional features supported. It is easier with OpenGL, but there are also some incompatibilities there.

There are two open-source vendor-agnostic tools for capturing Vulkan calls: RenderDoc (captures single frame) and GFXReconstruct (captures multiple frames). RenderDoc at the moment isn’t suitable for the task of capturing applications on desktop GPUs and replaying on mobile because it doesn’t translate memory type and requirements (see issue #814). GFXReconstruct on the other hand has the necessary features for this.

I’ll show a couple of tricks with GFXReconstruct I’m using to test things on Turnip.


Capturing with GFXReconstruct

At this point you either have the application itself or, if it doesn’t use Vulkan, a trace of its calls that could be translated to Vulkan. There are detailed instructions on how to use GFXReconstruct to capture a trace on a desktop OS. However, there are no clear instructions on how to do this on Android (see issue #534); fortunately, there are some in Android’s documentation:

Android how-to
For Android 9 you should copy layers to the application which will be traced
For Android 10+ it's easier to copy them to com.lunarg.gfxreconstruct.replay
You should have a userdebug build of Android, or probably a rooted device

# Push GFXReconstruct layer to the device
adb push libVkLayer_gfxreconstruct.so /sdcard/

# Since there is no APK for the capture layer,
# copy the layer to e.g. folder of com.lunarg.gfxreconstruct.replay
adb shell run-as com.lunarg.gfxreconstruct.replay cp /sdcard/libVkLayer_gfxreconstruct.so .

# Enable layers
adb shell settings put global enable_gpu_debug_layers 1

# Specify target application
adb shell settings put global gpu_debug_app <package_name>

# Specify layer list (from top to bottom)
adb shell settings put global gpu_debug_layers VK_LAYER_LUNARG_gfxreconstruct

# Specify packages to search for layers
adb shell settings put global gpu_debug_layer_app com.lunarg.gfxreconstruct.replay

If the target application doesn’t have rights to write into external storage - you should change where the capture file is created:

adb shell "setprop debug.gfxrecon.capture_file '/data/data/<target_app_folder>/files/'"


However, when you try to replay the trace you captured on another GPU, it will most likely result in an error:

[gfxrecon] FATAL - API call vkCreateDevice returned error value VK_ERROR_EXTENSION_NOT_PRESENT that does not match the result from the capture file: VK_SUCCESS.  Replay cannot continue.
Replay has encountered a fatal error and cannot continue: the specified extension does not exist

Or other errors/crashes. Fortunately we could limit the capabilities of desktop GPU with VK_LAYER_LUNARG_device_simulation

When simulating another GPU, VK_LAYER_LUNARG_device_simulation should be told to intersect the capabilities of both GPUs, making the capture compatible with both of them. This can be done with the recently added environment variables:

VK_DEVSIM_MODIFY_EXTENSION_LIST=whitelist
VK_DEVSIM_MODIFY_FORMAT_LIST=whitelist
VK_DEVSIM_MODIFY_FORMAT_PROPERTIES=whitelist

The name whitelist is rather confusing, because here it essentially means “intersection”.
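To make the “intersection” meaning concrete, here is that set operation on two plain lists of extension names, one name per line. It only illustrates the semantics and is not how VK_LAYER_LUNARG_device_simulation is implemented; the function name and input files are hypothetical.

```shell
# Print the names present in both lists (their intersection), sorted.
intersect_lists() {
    # Deduplicate each list first, so a name repeated inside one file
    # is not mistaken for a name shared by both files; uniq -d then
    # keeps only the lines that occur in both inputs.
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}
```

For example, `intersect_lists desktop.txt adreno.txt` prints only the extension names listed in both files.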

You also need a JSON file that describes the target GPU’s capabilities; generate it by running vulkaninfo on the target device:

vulkaninfo -j &> <device_name>.json

The final command to capture a trace would be:

VK_LAYER_PATH=<path/to/device-simulation-layer>:<path/to/gfxreconstruct-layer> \
VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_gfxreconstruct:VK_LAYER_LUNARG_device_simulation \
VK_DEVSIM_FILENAME=<device_name>.json \
VK_DEVSIM_MODIFY_EXTENSION_LIST=whitelist \
VK_DEVSIM_MODIFY_FORMAT_LIST=whitelist \
VK_DEVSIM_MODIFY_FORMAT_PROPERTIES=whitelist \
<the_app>
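If you capture often, the command above can be wrapped in a small shell function. A sketch under my own naming assumptions: `capture_as`, `DEVSIM_LAYER_DIR` and `GFXR_LAYER_DIR` are hypothetical, and the two directories stand in for the same placeholder paths used above.

```shell
# Placeholder layer locations; override them in your environment.
DEVSIM_LAYER_DIR=${DEVSIM_LAYER_DIR:-/path/to/device-simulation-layer}
GFXR_LAYER_DIR=${GFXR_LAYER_DIR:-/path/to/gfxreconstruct-layer}

# Usage: capture_as <device_name>.json <the_app> [args...]
capture_as() {
    local devsim_json="$1"
    shift
    # Same variables as the capture command above, applied only to "$@".
    VK_LAYER_PATH="$DEVSIM_LAYER_DIR:$GFXR_LAYER_DIR" \
    VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_gfxreconstruct:VK_LAYER_LUNARG_device_simulation \
    VK_DEVSIM_FILENAME="$devsim_json" \
    VK_DEVSIM_MODIFY_EXTENSION_LIST=whitelist \
    VK_DEVSIM_MODIFY_FORMAT_LIST=whitelist \
    VK_DEVSIM_MODIFY_FORMAT_PROPERTIES=whitelist \
    "$@"
}
```

Passing `env` as the application is a quick way to check which variables the wrapper sets before pointing it at a real game.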

Replaying with GFXReconstruct

gfxrecon-replay -m rebind --skip-failed-allocations <trace_name>.gfxr
  • -m: enable memory translation for replay on GPUs whose memory types are not compatible with the capture GPU’s
    • rebind: change memory allocation behavior based on resource usage and replay memory properties. Resources may be bound to different allocations with different offsets.
  • --skip-failed-allocations (short form --sfa): skip vkAllocateMemory, vkAllocateCommandBuffers, and vkAllocateDescriptorSets calls that failed during capture

Without these options, replay would fail.

Now you can easily test any app or game on your ARM board, provided it has enough RAM =) I even successfully ran a capture of “Metro Exodus” on Turnip.

But what if I want to test something that requires interactivity?

Or what if you don’t want to save a huge trace to disk, which could grow to tens of gigabytes if the application runs for a considerable amount of time?

During recording, GFXReconstruct simply appends calls to a file; there are no additional post-processing steps. Given that, the next logical step is to skip writing to disk and send the Vulkan calls over the network!

This allows us to interact with the application and immediately see the results on another device with a different GPU. So I hacked together crude support for over-the-network replay.

The only difference from ordinary tracing is that, instead of a file, we specify the network address of the target device:

VK_LAYER_PATH=<path/to/device-simulation-layer>:<path/to/gfxreconstruct-layer> \
    ...
GFXRECON_CAPTURE_FILE="<ip>:<port>" \
<the_app>

And on the target device:

while true; do gfxrecon-replay -m rebind --sfa ":<port>"; done

Why while true? DXVK commonly calls vkCreateInstance several times, which leads to the creation of several traces. When replaying over the network, we therefore want gfxrecon-replay to restart immediately when one trace ends, so that it is ready for the next one.
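A variant of that loop as a function, with two additions of my own: an optional run limit and an optional pause between runs (off by default, so replay still restarts immediately). It assumes a gfxrecon-replay build that includes the over-the-network hack described above, since that is what accepts `":<port>"`; `replay_loop`, `REPLAY_BIN` and `RESTART_DELAY` are hypothetical names, and `REPLAY_BIN` lets you point at the patched binary.

```shell
# Usage: replay_loop <port> [max_runs]; max_runs=0 (default) loops forever.
# REPLAY_BIN: replay binary to run (assumed default: gfxrecon-replay).
# RESTART_DELAY: seconds to wait between runs (default 0, restart at once).
replay_loop() {
    local port="$1" max_runs="${2:-0}" runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        # Each run replays one incoming trace, e.g. one vkCreateInstance.
        "${REPLAY_BIN:-gfxrecon-replay}" -m rebind --sfa ":$port"
        runs=$((runs + 1))
        sleep "${RESTART_DELAY:-0}"
    done
}
```

`replay_loop 5555` behaves like the original `while true` loop; `replay_loop 5555 3` stops after three traces, which is handy when scripting a test session.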

You may want to lower the FPS to match the capabilities of the lower-power GPU and prevent constant hiccups. This can be done with either libstrangle or MangoHud:

  • stranglevk -f 10
  • MANGOHUD_CONFIG=fps_limit=10 mangohud

You have seen the result at the start of the post.

by Danylo Piliaiev at August 11, 2021 09:00 PM

August 10, 2021

Iago Toral

An update on feature progress for V3DV

I’ve been silent here for quite some time, so here is a quick summary of some of the new functionality we have been exposing in V3DV, the Vulkan driver for Raspberry Pi 4, over the last few months:

  • VK_KHR_bind_memory2
  • VK_KHR_copy_commands2
  • VK_KHR_dedicated_allocation
  • VK_KHR_descriptor_update_template
  • VK_KHR_device_group
  • VK_KHR_device_group_creation
  • VK_KHR_external_fence
  • VK_KHR_external_fence_capabilities
  • VK_KHR_external_fence_fd
  • VK_KHR_external_semaphore
  • VK_KHR_external_semaphore_capabilities
  • VK_KHR_external_semaphore_fd
  • VK_KHR_get_display_properties2
  • VK_KHR_get_memory_requirements2
  • VK_KHR_get_surface_capabilities2
  • VK_KHR_image_format_list
  • VK_KHR_incremental_present
  • VK_KHR_maintenance2
  • VK_KHR_maintenance3
  • VK_KHR_multiview
  • VK_KHR_relaxed_block_layout
  • VK_KHR_sampler_mirror_clamp_to_edge
  • VK_KHR_storage_buffer_storage_class
  • VK_KHR_uniform_buffer_standard_layout
  • VK_KHR_variable_pointers
  • VK_EXT_custom_border_color
  • VK_EXT_external_memory_dma_buf
  • VK_EXT_index_type_uint8
  • VK_EXT_physical_device_drm

Besides those extensions, we have also added basic support for Vulkan subgroups (a Vulkan 1.1 feature) and geometry shaders (which we use to implement multiview).

I think we now meet most (if not all) of the Vulkan 1.1 mandatory feature requirements, but we still need to check this properly and we also need to start doing Vulkan 1.1 CTS runs and fix test failures. In any case, the bottom line is that Vulkan 1.1 should be fairly close now.

by Iago Toral at August 10, 2021 08:10 AM

August 07, 2021

Enrique Ocaña

Beyond Google Bookmarks

I was a happy user of Del.icio.us for many years until the service closed. Then I moved my links to Google Bookmarks, which offered basically the same functionality (at least for my needs): link storage with title, tags and comments. I’ve carefully tagged and filed more than 2500 links since I started, and I’ve learnt to appreciate how useful searching by tag is for finding precious information again that was valuable to me in the past.

Google Bookmarks is a very old and simple service that “just works”. Sometimes it looked as if Google had just forgotten about it and let it run for years without anybody noticing… until now. It’s closing in September 2021.

I didn’t want to lose all my links, I still need a link database searchable by tags, and I don’t want to be locked in again to a similar service that might close in a few years, so I wrote my own super-simple alternative. It’s called bs, as in “bookmark search”.

Usage couldn’t be simpler: just pass the tag you want to look for and it prints a list of the links that have that tag:

$ bs webassembly
  title = Canvas filled three ways: JS, WebAssembly and WebGL | Compile 
    url = https://compile.fi/canvas-filled-three-ways-js-webassembly-and-webgl/ 
   tags = canvas,graphics,html5,wasm,webassembly,webgl 
   date = 2020-02-18 16:48:56 
comment =  
 
  title = Compiling to WebAssembly: It’s Happening! ★ Mozilla Hacks – the Web developer blog 
    url = https://hacks.mozilla.org/2015/12/compiling-to-webassembly-its-happening/ 
   tags = asm.js,asmjs,emscripten,llvm,toolchain,web,webassembly 
   date = 2015-12-18 09:14:35 
comment = 
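bs keeps its data in SQLite; purely to illustrate the lookup, the sketch below does the same kind of exact-tag search over a hypothetical tab-separated file (columns: title, url, tags, date, comment) instead of the real database, printing entries in the layout shown above. `bs_search` and `bookmarks.tsv` are made-up names, not part of bs.

```shell
# Hypothetical stand-in for `bs <tag>` that reads a TSV file, not SQLite.
# Usage: bs_search <tag> [file]
bs_search() {
    awk -F '\t' -v tag="$1" '
    {
        # Split the comma-separated tags column and require an exact
        # match, so searching "web" does not also match "webgl".
        n = split($3, t, ",")
        for (i = 1; i <= n; i++) {
            if (t[i] == tag) {
                printf "  title = %s\n    url = %s\n   tags = %s\n   date = %s\ncomment = %s\n\n", $1, $2, $3, $4, $5
                break
            }
        }
    }' "${2:-bookmarks.tsv}"
}
```

Matching whole tags rather than substrings is the design choice that makes tag search predictable: a tag is either attached to a link or it isn’t.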

If you call the tool without parameters, it prompts for the data to insert a new link, or to edit an existing one if the entered URL matches it:

$ bs 
url: https://compile.fi/canvas-filled-three-ways-js-webassembly-and-webgl/ 
title: Canvas filled three ways: JS, WebAssembly and WebGL | Compile 
tags: canvas,graphics,html5,wasm,webassembly,webgl 
comment: 

The data is stored in an SQLite database, and I’ve written some JavaScript snippets to import the bookmark files exported from Delicious and from Google Bookmarks. Those snippets are meant to be copy-pasted into your browser’s JavaScript console while the exported bookmarks HTML file is open in it. They generate SQL statements that populate the database for the first time with your preexisting data.
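The import snippets themselves are JavaScript for the browser console. The part any such generator has to get right is quoting, so here is a shell sketch of the same idea under stated assumptions: a hypothetical `bookmarks` table with title, url, tags, date and comment columns (the real bs schema may differ), fed from a tab-separated file with those columns in that order. Single quotes inside values are doubled, which is how SQL string literals escape them.

```shell
# Hypothetical sketch: emit one SQL INSERT per line of a TSV file
# (columns: title, url, tags, date, comment). Table and column names are assumed.
tsv_to_sql() {
    awk -F '\t' -v q="'" '
    # Wrap a value in single quotes, doubling any single quote inside it.
    function sqlstr(s) {
        gsub(q, q q, s)
        return q s q
    }
    {
        printf "INSERT INTO bookmarks (title, url, tags, date, comment) VALUES (%s, %s, %s, %s, %s);\n",
            sqlstr($1), sqlstr($2), sqlstr($3), sqlstr($4), sqlstr($5)
    }' "$1"
}
```

Assuming the table already exists, `tsv_to_sql export.tsv | sqlite3 bookmarks.db` would load the generated statements.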

For now, the tool doesn’t allow deleting bookmarks (I haven’t needed to yet), and I still need to find a way to simplify its use from the browser with a bookmarklet that makes adding new bookmarks easier. But that’s a task for another day. For now, it’s enough for me to know that my bookmarks are safe.

Enjoy!

[UPDATE: 2020-09-08]

I’ve now written an alternative version of the database client that can be hosted on any web server with PHP and SQLite3. The bookmarks can now be managed centrally from a browser, much as you could before with Google Bookmarks and Delicious. As you can see in the screenshot, the style somewhat resembles Google Bookmarks.

You can easily create a quick search (search engine) entry in Firefox and Chrome (I use “d” as the keyword, a tradition from the Delicious days, so typing “d debug” in the browser search bar looks up that tag on the bookmark search page). Also, the 🔖 button opens a popup showing bookmarklet code that you can add to your browser’s bookmark bar. Clicking that bookmarklet opens the edit page prefilled with the current page’s info, so you can insert a new entry or edit an existing one.

There’s a trick for using the bookmarklet in Chrome on Android: give the bookmarklet a sufficiently unusual name (I used “+ Bookmark 🔖”). Then, when you want to add the current page to the web app, just start typing “+ book”… in the search bar and the saved bookmarklet will appear as an autocomplete option. Click on it and that’s it.

Enjoy++!

by eocanha at August 07, 2021 12:29 PM