Planet Igalia

June 08, 2021

Eric Meyer

Back in the CSSWG

As you might have noticed, I recently wrote about how I got started with CSS a quarter century ago,  what I’ve seen change over that long span of time, and the role testing has played in both of those things.

After all, CSS tests are most of how I got onto the Cascading Style Sheets & Formatting Properties Working Group (as it was known then) back in the late 1990s.  After I’d finished creating tests for nearly all of CSS, I wrote the chair of the CSS&FP WG, Chris Lilley, about it.  The conversation went something like, “Hey, I have all these tests I’ve created, would the WG or browser makers be at all interested in using them?”  To which the answer was a resounding yes.

Not too much later, I made some pithy-snarky comment on www-style about how only the Cool Kids on the WG knew what was going on with something or other, and I wasn’t one of them, pout pout.  At which point Chris emailed me to say something like, “We have this role called Invited Expert; how would you like to be one?”  To which the answer was a resounding (if slightly stunned) yes.

I came aboard with a lot of things in mind, but the main thing was to merge my test suite with some other tests and input from smart folks to create the very first official W3C test suite.  Of any kind, not just for CSS.  It was announced alongside the promotion of CSS2 to Recommendation status in December 1998.

I stayed an Invited Expert for a few years, but around 2003 I withdrew from the group for lack of time and input, and for the last 17-some years, that’s how it’s stayed.  Until now, that is: as of yesterday, I’ve rejoined the CSS Working Group, this time as an official Member, one of several representing Igalia.  And fittingly, Chris Lilley was the first to welcome me back.

I’m returning to take back up the mantle I carried the first time around: testing CSS.  I intend to focus on creating Web Platform Test entries demonstrating new CSS features, clarifying changes to existing specifications, and filling in areas of CSS that are under-tested.  Maybe even to draft tests for things the WG is debating, to explore what a given proposal would mean in terms of real-world rendering.

My thanks to Igalia for enabling my return to the CSS WG, as well as supporting my contributions yet to come.  And many thanks to the WG for a warm welcome.  I have every hope that I’ll be able to once more help CSS grow and improve in my own vaguely unique way.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at June 08, 2021 01:43 PM

June 06, 2021

Manuel Rego

:focus-visible in WebKit - May 2021

Once again, this is a new report about the work on :focus-visible in WebKit; you can check the previous ones at:

As you might already know, this work is part of the Open Prioritization campaign by Igalia, which has been funded by a lot of people. Thank you all for your support!

The high-level summary is that the implementation in WebKit can be considered complete, and all the :focus-visible patches have been included in Safari Technology Preview 125 as an experimental feature. Moreover, Igalia has been in conversations with Apple to find a way to enable the feature by default at some point.

Implementation details

As I’ve just mentioned, the implementation was finished by the end of April, and no more patches have landed since then. It passes most of the WPT tests; there are still some minor differences here and there (like some input types matching or not matching :focus-visible), but those issues have been considered fine, as they depend on browser-specific behavior.

You can test this feature in Safari Technology Preview (since release 125) by enabling the runtime flag in the menu (Develop > Experimental Features > :focus-visible pseudo-class). Please play with it and report any issue you might find.

Debate time

During the last patch reviews, more Apple engineers got interested in the feature, and there were a bunch of discussions about whether it would (or should) change the default behavior in WebKit, and how.

So let’s start from the beginning: what is :focus-visible? Broadly speaking, :focus-visible matches when the browser would natively show a focus ring. The typical example is buttons: in general, when people click a button they don’t expect to see a focus ring, and for that reason most browsers haven’t shown one for years. When an element is focused, browsers use internal heuristics to decide whether or not to show a focus ring.
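In author stylesheets, that distinction plays out like this (the selectors are from the spec; the specific outline styling is just my own illustrative example):

```css
/* Show a focus ring only when the browser's heuristics say one should be
   visible (e.g. keyboard focus), not on every focus. */
button:focus-visible {
  outline: 2px solid dodgerblue;
  outline-offset: 2px;
}

/* Common companion rule: suppress the default ring when the element is
   focused but :focus-visible does not match (e.g. after a mouse click). */
button:focus:not(:focus-visible) {
  outline: none;
}
```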

However, buttons in Safari are different from other browsers, because Safari follows the Mac platform conventions. Buttons are not click-focusable in Safari (though you can still focus them via keyboard with Option + Tab); since they don’t receive focus on click, they don’t even match :focus, so they never show a focus ring on mouse interactions. This behavior tries to mimic what happens on the Mac platform, but there are still some differences. On the Mac, for example, you can be editing an input, click a button, and keep editing the input, as the focus stays there. That’s not exactly what happens in Safari: when you click the button, even though the button doesn’t get the focus, the focus leaves the input, so you cannot just continue editing it as you would on the platform. On top of that, an invisible navigation caret moves to that button on click, and further keyboard navigation starts from there. So it’s kind of similar to the platform, but with some nuances.

This is only part of the problem. The web is full of things that are focusable, like <div tabindex="0"> elements. These elements have always matched (and still match) :focus by default, and have usually shown a focus ring when focused via mouse click. Web authors generally want to hide the focus ring when clicking on <div tabindex="0"> elements, and that’s why the current :focus-visible implementations don’t match in this case. Chrome and Firefox use :focus-visible in the User Agent (UA) style sheet, so they don’t show a focus ring when clicking on such elements. However, Apple has expressed some concerns that this might change the default focus-indicator behavior in a way that differs from their platform philosophy, and thus needs more review.

During these conversations, an idea showed up as a potential solution: what if we show a focus ring when users click on a generic <div tabindex="0">, but not if that element has some specific role, e.g. <div tabindex="0" role="button">? This would let web authors get their desired behavior by just adding a role to those elements.

This would make <div tabindex="0" role="button"> work similarly to regular buttons on Mac, but there’s still one difference: those elements will still get the focus, so some use cases might break. James Craig came up with a scenario in which a user is scrolling the page with the spacebar and then clicks on a <div tabindex="0" role="button">; pressing the spacebar again wouldn’t keep scrolling the page anymore, and the user won’t know exactly why, as they haven’t seen any focus ring after the click (note that with the current :focus-visible implementation, the user will start to see a focus ring on that <div tabindex="0" role="button"> after pressing the spacebar).

In that discussion, James shared an idea: add a new CSS property (or it could be an HTML attribute) that marks an element so it cannot receive focus via mouse click. That would make it possible for buttons in other browsers to work like they do in Safari, or for a <div tabindex="0"> to work like a Mac button too. However, this would be something new that would need to be implemented in all browsers, not just WebKit, and that would need to be discussed and agreed upon with the web community.

On the same issue, Brian Kardell is proposing some alternatives, for example a special parameter like :focus-visible(platform) (syntax to be defined) that could behave differently in Safari than in other browsers, so Safari could use it in the UA style sheet while :focus-visible alone would work the same in all browsers.

As you can see, there’s no clear resolution to this discussion yet, but we’re following it closely and providing our feedback to try to reach a final proposal that makes everyone happy.

Some numbers

Let’s do a final review of the total numbers (as nothing has changed in May):

  • 26 PRs merged in WPT.
  • 27 patches landed in WebKit.
  • 9 patches landed in Chromium.
  • 2 PRs merged in CSS specs.
  • 1 PR merged in HTML spec.

Wrapping up

:focus-visible has been added to WebKit thanks to the support of many individual people and organizations, who made it happen through the Open Prioritization experiment by Igalia. Once more, big thanks to you all! 🙏

In addition, the WPT test suite has been improved, now counting ~40 tests for this feature. Also, in January neither Firefox nor Chrome was using :focus-visible in the UA style sheet, but they both do nowadays. Thus, doing the implementation in WebKit has helped move this feature forward in several places.

There is still the ongoing discussion about when and how this could be enabled by default in WebKit and eventually shipped in Safari. That conversation is moving, and we hope there’ll be some kind of positive resolution so this feature can be enjoyed by web authors in all the browser engines. Igalia will stay on top of the topic and keep pushing things forward to make it happen.

Finally, thanks to everyone who has helped in the different conversations, reviews, etc. during these months.

June 06, 2021 10:00 PM

June 02, 2021

Eric Meyer

Ancestors and Descendants

After my post the other day about how I got started with CSS 25 years ago, I found myself reflecting on just how far CSS itself has come over all those years.  We went from a multi-year agony of incompatible layout models to the tipping point of April 2017, when four major Grid implementations shipped in as many weeks, and were very nearly 100% consistent with each other.  I expressed delight and astonishment at the time, but it still, to this day, amazes me.  Because that’s not what it was like when I started out.  At all.

I know it’s still fashionable to complain about how CSS is all janky and weird and unapproachable, but child, the wrinkles of today are a sunny park stroll compared to the jagged icebound cliff we faced at the dawn of CSS.  Just a few examples, from waaaaay back in the day:

  • In the initial CSS implementation by Netscape Navigator 4, padding was sometimes a void.  What I mean is, you could give an element a background color, and you could set a border, but if you added any padding, in some situations it wouldn’t take on the background color, allowing the background of the parent element to show through.  Today, we can recreate that effect like so:
    border: 3px solid red;
    padding: 0.5em;
    background-color: cornflowerblue;
    background-clip: content-box;
    

    Padding as a void.

    But we didn’t have background-clip in those days, and backgrounds weren’t supposed to act like that.  It was just a bug that got fixed a few versions later. (It was easier to get browsers to fix bugs in those days, because the web was a lot smaller, and so were the stakes.)  Until that happened, if you wanted a box with border, background, padding, and content in Navigator, you wrapped a <div> inside another <div>, then applied the border and background to the outer and the padding (or a margin, at that point it didn’t matter) to the inner.
  • In another early Navigator 4 version, pica math was inverted: Instead of 12 points per pica, it was set to 12 picas per point — so 12pt equated to 144pc instead of 1pc.  Oops.
  • Navigator 4’s handling of color values was another fun bit of bizarreness.  It would try to parse any string as if it were hexadecimal, but it did so in this weird way that meant if you declared color: inherit it would render in, as one person put it, “monkey-vomit green”.
  • Internet Explorer for Windows started out by only tiling background images down and to the right.  Which was fine if you left the origin image in the top left corner, but as soon as you moved it with background-position, the top and left sides of the element just… wouldn’t have any background.  Sort of like Navigator’s padding void!
  • At one point, IE/Win (as we called it then) just flat out refused to implement background-position: fixed.  I asked someone on that team point blank if they’d ever do it, and got just laughter and then, “Ah no.” (Eventually they relented, opening the door for me to create complexspiral and complexspiral distorted.)
  • For that matter, IE/Win didn’t inherit font sizes into tables.  Which would be annoying even today, but in the era of still needing tables to do page-level layout, it was a real problem.
  • IE/Win had so many layout bugs, there were whole sites dedicated to cataloging and explaining them.  Some readers will remember, and probably shudder to do so, the Three-Pixel Text Jog, the Phantom Box Bug, the Peekaboo Bug, and more.  Or, for that matter, hasLayout/zoom.
  • And perhaps most famous of all, Netscape and Opera implemented the W3C box model (2021 equivalent: box-sizing: content-box) while Microsoft implemented an alternative model (2021 equivalent: box-sizing: border-box), which meant apparently simple CSS meant to size elements would yield different results in different browsers.  Possibly vastly different, depending on the size of the padding and so on.  Which model is more sensible or intuitive doesn’t actually matter here: the inconsistency literally threatened the survival of CSS itself.  Neither side was willing to change to match the other — “we have customers!” was the cry — and nobody could agree on a set of new properties to replace height and width.  It took the invention of DOCTYPE switching to rescue CSS from the deadlock, which in turn helped set the stage for layout-behavior properties like box-sizing.
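To make the divergence concrete, here is a sketch of how the two models size the same declarations (my own example, not from the post):

```css
.box {
  width: 100px;
  padding: 20px;
  border: 5px solid;
}

/* W3C model (box-sizing: content-box):
   rendered width = 100 + 2*20 + 2*5 = 150px */

/* Microsoft model (box-sizing: border-box):
   rendered width = 100px overall; the content area
   shrinks to 100 - 2*20 - 2*5 = 50px */
```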

I could go on.  I didn’t even touch on Opera’s bugs, for example.  There was just so much that was wrong.  Enough so that in a fantastic bit of code aikido, Tantek turned browsers’ parsing bugs against them, redirecting those failures into ways to conditionally deliver specific CSS rules to the browsers that needed them.  A non-JS, non-DOCTYPE form of browser sniffing, if you like — one of the earliest progenitors of feature queries.

I said DOCTYPE switching saved CSS, and that’s true, but it’s not the whole truth.  So did the Web Standards Project, WaSP for short.  A group of volunteers, sick of the chaotic landscape of browser incompatibilities (some intentional) and the extra time and cost of dealing with them, who made the case to developers, browser makers, and the tech press that there was a better way, one where browsers were compatible on the basics like W3C specifications, and could compete on other features.  It was a long, wearying, sometimes frustrating, often derided campaign, but it worked.

The state of the web today, with its vast capability and wide compatibility, owes a great deal to the WaSP and its allies within browser teams.  I remember the time that someone working on a browser — I won’t say which one, or who it was — called me to discuss the way the WaSP was treating their browser. “I want you to be tougher on us,” they said, surprising the hell out of me. “If we can point to outside groups taking us to task for falling short, we can make the case internally to get more resources.”  That was when I fully grasped that corporations aren’t monoliths, and formulated my version of Hanlon’s Razor: “Never ascribe to malice that which is adequately explained by resource constraints.”

The original Acid Test.

In order to back up what we said when we took browsers to task, we needed test cases.  This not only gave the CSS1 Test Suite a place of importance, but also the tests the WaSP’s CSS Action Committee (aka the CSS Samurai) devised.  The most famous of these is the first CSS Acid Test, which was added to the CSS1 Test Suite and was even used as an Easter egg in Internet Explorer 5 for Macintosh.

The need for testing, whether acid or basic, lives on in the Web Platform Tests, or WPT for short.  These tests form a vital link in the development of the web.  They allow specification authors to create reference results for the rules in those specifications, and they allow browser makers to see if the code they’re writing yields the correct results.  Sometimes, an implementation fails a test and the implementor can’t figure out why, which leads to a discussion with the authors of the specification, and that can lead to clarifications of the specification, or to fixing flawed tests, or even to both.  Realize just how harmonious browser support for HTML and CSS is these days, and know that WPT deserves a big part of the credit for that harmony.

As much as the Web Standards Project set us on the right path, the Web Platform Tests keep us on that path.  And I can’t lie, I feel like the WPT is to the CSS1 Test Suite much like feature queries are to those old CSS parser hacks.  The latter are much greater and more powerful than the former, but there’s an evolutionary line that connects them.  Forerunners and inheritors.  Ancestors and descendants.

It’s been a real privilege to be present as CSS first emerged, to watch as it’s developed into the powerhouse it is today, and to be a part of that story — a story that is, I believe, far from over.  There are still many ways for CSS to develop, and still so many things we have yet to discover in its feature set.  It’s still an entrancing language, and I hope I get to be entranced for another 25 years.

Thanks to Brian Kardell, Jenn Lukas, and Melanie Sumner for their input and suggestions.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at June 02, 2021 08:16 PM

May 27, 2021

Brian Kardell

Stranger Than Fractions

Stranger Than Fractions

There's a new Math Working Group in the W3C (and I'm co-chairing). In this post, I'll share some information on that, why I really hope your organizations will join, as well as some personal reflections.

Life is weird. If I could travel back in time and explain my life to a younger me, I couldn't even count the number of things that younger me would have just absolutely scoffed at in disbelief. Here's another one to add to the list: I'm co-chairing a new W3C Working Group focused on Math on the Web.

I'm not going to offer all of the reasons this would be surprising to a younger me, but suffice to say it's a pretty long list. Even a me of only 2-3 years ago would probably be pretty incredulous. See, I'd never really given math on the web much thought until then. The thing that really brought it to my attention was that a company I knew to be full of some pretty smart people (Igalia, where I now work) was suddenly talking about how to add MathML to Chromium, and why this is a thing we should do. It came up before the W3C Technical Architecture Group and was getting some larger discussion around the interwebs. In particular, there were connections to the Extensible Web Manifesto. I felt kind of compelled to really think about it and write something thoughtful about it myself. So in January 2019 (I didn't work for Igalia then) I wrote Harold Crick and the Web Platform.

Based on this, and some other observations I was having about the important role I thought Igalia could play in so many fundamentally important issues, I applied there (here). Since then I've tried to help "right the ship" and get math onto the web and on a stable footing that is integrated with the platform. I participated in the CG where we worked out MathML-Core, attempting to do just that. I helped write some tests, open (and resolve!) issues in a number of standards about how we integrate, draft a bit of spec, open implementation bugs (and ship changes in all browsers!), explain why this work is important from a significant number of angles (not the least of which is that it is societally important) in blog posts and talks (I won't link them all because there's already a lot of links here), prioritize work, draft an explainer, work through a TAG review, and draft the new Working Group Charter and gain support for it (I'm very pleased to say that every browser vendor supported its creation - Chrome was even the first one, if anyone has doubts).

A few weeks after the charter was approved, I was asked to sign on as a co-chair to lead the MathML-Core portion (the bit that goes in browsers). Last week I was officially added and "approved by the director" as co-chair.

Now for... you know... lots more important work as we try to reach a really great state of affairs.

We'll only really be able to do that with help and good, diverse (from many angles) participation in the Working Group. If you're a W3C member, consider getting involved yourself. If not, please still comment on issues and review things. Importantly: put aside any math phobias, doubts or pre-conceived notions, even if your present self is a little (or even a lot) incredulous at the idea that you can really help. Believe me, I get it. But that's wrong. Help and participation from people with backgrounds across the platform aren't only very welcome, they're necessary: there's a lot to do to ensure that the platform is as sensible and consistent as possible. Many disciplines need to coordinate to make sure that things stay on track, make sense and that important aspects don't get left behind.

We'll be starting up the MathML-Core meetings soon (end of June or early July, TBD soon), focusing on actually moving some of this through the standards process, working together to answer the remaining questions, and making sure we're driving toward really good, interoperable, well-integrated math on the Web.

We can do this.

May 27, 2021 04:00 AM

May 25, 2021

Eric Meyer

25 Years of CSS

It was the morning of Tuesday, May 7th and I was sitting in the Ambroisie conference room of the CNIT in Paris, France having my mind repeatedly blown by an up-and-coming web technology called “Cascading Style Sheets”, 25 years ago this month.

I’d been the Webmaster at Case Western Reserve University for just over two years at that point, and although I was aware of table-driven layout, I’d resisted using it for the main campus site.  All those table tags just felt… wrong.  Icky.  And yet, I could readily see how not using tables hampered my layout options.  I’d been holding out for something better, but increasingly unsure how much longer I could wait.

Having successfully talked the university into paying my way to Paris to attend WWW5, partly by having a paper accepted for presentation, I was now sitting in the W3C track of the conference, seeing examples of CSS working in a browser, and it just felt… right.  When I saw a single word turned a rich blue and 100-point size with just a single element and a few simple rules, I was utterly hooked.  I still remember the buzzing tingle of excitement that encircled my head as I felt like I was seeing a real shift in the web’s power, a major leap forward, and exactly what I’d been holding out for.

Page 4, HTML 3.2.

Looking back at my hand-written notes (laptops were heavy, bulky, battery-poor, and expensive in those days, so I didn’t bother taking one with me) from the conference, which I still have, I find a lot that interests me.  HTTP 1.1 and HTML 3.2 were announced, or at least explained in detail, at that conference.  I took several notes on the brand-new <OBJECT> element and wrote “CENTER is in!”, which I think was an expression of excitement.  Ah, to be so young and foolish again.

There are other tidbits: a claim that “standards will trail innovation” — something that I feel has really only happened in the past decade or so — and that “Math has moved to ActiveMath”, the latter of which is a term I freely admit I not only forgot, but still can’t recall in any way whatsoever.

My first impressions of CSS, split for no clear reason across two pages.

But I did record that CSS had about 35 properties, and that you could associate it with markup using <LINK REL=STYLESHEET>, <STYLE>…</STYLE>, or <H1 STYLE="…">.  There’s a question — “Gradient backgrounds?” — that I can’t remember any longer if it was a note to myself to check later, or something that was floated as a possibility during the talk.  I did take notes on image backgrounds, text spacing, indents (which I managed to misspell), and more.

What I didn’t know at the time was that CSS was still largely vaporware.  Implementations were coming, sure, but the demos I’d seen were very narrowly chosen and browser support was minimal at best, not to mention wildly inconsistent.  I didn’t discover any of this until I got back home and started experimenting with the language.  With a printed copy of the CSS1 specification next to me, I kept trying things that seemed like they should work, and they didn’t.  It didn’t matter if I was using the market-dominating behemoth that was Netscape Navigator or the scrappy, fringe-niche new kid Internet Explorer: very little seemed to line up with the specification, and almost nothing worked consistently across the browsers.

So I started creating little test pages, tackling a single property on each page with one test per value (or value type), each just a simple assertion of what should be rendered along with a copy of the CSS used on the page.  Over time, my completionist streak drove me to expand this smattering of tests to cover everything in CSS1, and the perfectionist in me put in the effort to make it easy to navigate.  That way, when a new browser version came out, I could run it through the whole suite of tests and see what had changed and make note of it.

Eventually, those tests became the CSS1 Test Suite, and the way it looks today is pretty much how I built it.  Some tests were expanded, revised, and added, plus it eventually all got poured into a basic test harness that I think someone else wrote, but most of the tests — and the overall visual design — were my work, color-blindness insensitivity and all.  Those tests are basically what got me into the Working Group as an Invited Expert, way back in the day.

Before that happened, though, with all those tests in hand, I was able to compile CSS browser support information into a big color-coded table, which I published on the CWRU web site (remember, I was Webmaster) and made freely available to all.  The support data was stored in a large FileMaker Pro database, with custom dropdown fields to enter the Y/N/P/B values and lots of fields for me to enter template fragments so that I could export to HTML.  That support chart eventually migrated to the late Web Review, where it came to be known as “the Mastergrid”, a term I find funny in retrospect because grid layout was still two decades in the future, and anyway, it was just a large and heavily styled data table.  Because I wasn’t against tables for tabular data.  I just didn’t like the idea of using them solely for layout purposes.

You can see one of the later versions of Mastergrid in the Wayback Machine, with its heavily classed and yet still endearingly clumsy markup.  My work maintaining the Mastergrid, and articles I wrote for Web Review, led to my first book for O’Reilly (currently in its fourth edition), which led to my being asked to write other books and speak at conferences, which led to my deciding to co-found a conference… and a number of other things besides.

And it all kicked off 25 years ago this month in a conference room in Paris, May 7th, 1996.  What a journey it’s been.  I wonder now, in the latter half of my life, what CSS — what the web itself — will look like in another 25 years.


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at May 25, 2021 03:30 PM

Enrique Ocaña

GStreamer WebKit debugging by using external tools (2/2)

This is the last post of the series showing interesting debugging tools; I hope you have found it useful. Don’t miss the custom scripts at the bottom to process GStreamer logs, help you highlight the interesting parts, and find the root cause of difficult bugs. Here are also the previous posts of the series:

How to debug pkgconfig

When pkg-config finds the PKG_CONFIG_DEBUG_SPEW env var set, it explains all the steps it follows to resolve the packages:

PKG_CONFIG_DEBUG_SPEW=1 /usr/bin/pkg-config --libs x11

This is useful to know why a particular package isn’t found, and what the default values for PKG_CONFIG_PATH are when it’s not defined. For example:

Adding directory '/usr/local/lib/x86_64-linux-gnu/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/local/lib/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/local/share/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/lib/x86_64-linux-gnu/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/lib/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/share/pkgconfig' from PKG_CONFIG_PATH

If we have tuned PKG_CONFIG_PATH, maybe we also want to add the default paths. For example:

SYSROOT=~/sysroot-x86-64
export PKG_CONFIG_PATH=${SYSROOT}/usr/local/lib/pkgconfig:${SYSROOT}/usr/lib/pkgconfig
# Add also the standard pkg-config paths to find libraries in the system
export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/usr/local/lib/x86_64-linux-gnu/pkgconfig:\
/usr/local/lib/pkgconfig:/usr/local/share/pkgconfig:/usr/lib/x86_64-linux-gnu/pkgconfig:\
/usr/lib/pkgconfig:/usr/share/pkgconfig
# This tells pkg-config where the "system" pkg-config dir is. This is useful when cross-compiling for another
# architecture, to avoid pkg-config using the system .pc files and mixing host and target libraries
export PKG_CONFIG_LIBDIR=${SYSROOT}/usr/lib
# This could have been used for cross compiling:
#export PKG_CONFIG_SYSROOT_DIR=${SYSROOT}

Man in the middle proxy for WebKit

Sometimes it’s useful to use our own modified/unminified files with a 3rd-party service we don’t control. Mitmproxy can be used as a man-in-the-middle proxy, but I haven’t personally tried it yet. What I have tried (with WPE) is this:

  1. Add an /etc/hosts entry to point the host serving the files we want to change to an IP address controlled by us.
  2. Configure a web server to provide the files in the expected path.
  3. Modify the ResourceRequestBase constructor to change the HTTPS requests to HTTP when the hostname matches the target:
ResourceRequestBase(const URL& url, ResourceRequestCachePolicy policy)
    : m_url(url)
    , m_timeoutInterval(s_defaultTimeoutInterval)
    ...
    , m_isAppBound(false)
{
    if (m_url.host().toStringWithoutCopying().containsIgnoringASCIICase(String("out-of-control-service.com"))
        && m_url.protocol().containsIgnoringASCIICase(String("https"))) {
        printf("### %s: URL %s detected, changing from https to http\n",
            __PRETTY_FUNCTION__, m_url.string().utf8().data()); 
        fflush(stdout);
        m_url.setProtocol(String("http"));
    }
}

💡 Pro tip: If you have to debug minified/obfuscated JavaScript code and don’t have a deobfuscated version to use in a man-in-the-middle fashion, use http://www.jsnice.org/ to deobfuscate it and get meaningful variable names.

Bandwidth control for a dependent device

If your computer has a “shared internet connection” enabled in Network Manager and provides access to a dependent device, you can control the bandwidth offered to that device. This is useful to trigger quality changes on adaptive streaming videos from services out of your control.

This can be done using tc, the Traffic Control tool from the Linux kernel. You can use this script to automate the process (edit it to suit your needs).
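
In case you’d rather build your own, the gist of such a script can be sketched in a couple of tc invocations using a token bucket filter. This is a minimal sketch, not the full script; the interface name and rate below are hypothetical placeholders:

```python
# Sketch: emit the tc commands that cap bandwidth on the shared interface.
# Interface and rate are placeholders; run the printed commands as root.
def tc_limit_commands(iface="eth0", rate_kbit=500):
    return [
        # Clear any previous qdisc (this may fail harmlessly if none is set).
        f"tc qdisc del dev {iface} root",
        # Cap the rate with a token bucket filter (tbf).
        f"tc qdisc add dev {iface} root tbf rate {rate_kbit}kbit burst 32kbit latency 400ms",
    ]

for cmd in tc_limit_commands():
    print(cmd)
```

Dropping the rate below the bitrate of the current quality level should force the adaptive streaming player on the dependent device to switch down.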

Useful scripts to process GStreamer logs

I use these scripts in my daily job to look for strange patterns in GStreamer logs that help me find the cause of the bugs I’m debugging:

  • h: Highlights each expression in the command line in a different color.
  • mgrep: Greps (only) for the lines with the expressions in the command line and highlights each expression in a different color.
  • filter-time: Gets a subset of the log lines between a start and (optionally) an end GStreamer log timestamp.
  • highlight-threads: Highlights each thread in a GStreamer log with a different color. That way it’s easier to follow a thread with the naked eye.
  • remove-ansi-colors: Removes the color codes from a colored GStreamer log.
  • aha: ANSI-HTML-Adapter converts plain text with color codes to HTML, so you can share your GStreamer logs from a web server (e.g. for bug discussion). Available in most distros.
  • gstbuffer-leak-analyzer: Analyzes a GStreamer log and shows unbalances in the creation/destruction of GstBuffer and GstMemory objects.
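
As an illustration of the kind of processing these scripts do, here’s a minimal Python sketch of the filter-time idea: it keeps only the log lines whose leading GStreamer timestamp falls between a start and an optional end timestamp. The sample log lines are made up; only the H:MM:SS.NNNNNNNNN timestamp format matches what GStreamer actually prints:

```python
import re

# GStreamer log lines start with a timestamp like 0:00:01.234567890.
TS = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d{9})")

def ts_ns(text):
    """Parse a leading GStreamer timestamp into nanoseconds, or None."""
    m = TS.match(text)
    if not m:
        return None
    h, mins, s, ns = (int(g) for g in m.groups())
    return ((h * 60 + mins) * 60 + s) * 1_000_000_000 + ns

def filter_time(lines, start, end=None):
    """Yield the lines whose timestamp lies in [start, end]."""
    lo, hi = ts_ns(start), ts_ns(end) if end else None
    for line in lines:
        t = ts_ns(line)
        if t is not None and t >= lo and (hi is None or t <= hi):
            yield line

log = [
    "0:00:00.500000000  1234 0x7f0 DEBUG basesrc made-up line",
    "0:00:01.200000000  1234 0x7f0 DEBUG qtdemux made-up line",
    "0:00:02.900000000  1234 0x7f0 DEBUG vaapidecode made-up line",
]
print(list(filter_time(log, "0:00:01.000000000", "0:00:02.000000000")))
```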

by eocanha at May 25, 2021 06:00 AM

May 18, 2021

Enrique Ocaña

GStreamer WebKit debugging by using external tools (1/2)

In this new post series, I’ll show you how both existing and ad-hoc tools can help find the root cause of some problems. The older posts in this series may also be useful if you missed them.

Use strace to know which config/library files are used by a program

If you’re going crazy because a program seems to ignore the config you think it should use, just use strace to check which config files, libraries, or other kinds of files the program is actually using. Use whatever grep rules you need to refine the search:

$ strace -f -e trace=%file nano 2> >(grep 'nanorc')
access("/etc/nanorc", R_OK)             = 0
access("/usr/share/nano/javascript.nanorc", R_OK) = 0
access("/usr/share/nano/gentoo.nanorc", R_OK) = 0
...

Know which process is killing another one

First, try attaching strace -e trace=signal -p 1234 to the process being killed.

If that doesn’t work (e.g. because it’s being killed with the uncatchable SIGKILL signal), then you can resort to modifying the kernel source code (signal.c) to log the calls to kill():

SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
{
    struct task_struct *tsk_p;
    ...
    /* Log SIGKILL */
    if (sig == SIGKILL) {
        tsk_p = find_task_by_vpid(pid);

        if (tsk_p) {
            printk(KERN_DEBUG "Sig: %d from pid: %d (%s) to pid: %d (%s)\n",
                sig, current->pid, current->comm, pid, tsk_p->comm);
        } else {
            printk(KERN_DEBUG "Sig: %d from pid: %d (%s) to pid: %d\n",
                sig, current->pid, current->comm, pid);
        }
    }
    ...
}

Wrap gcc/ld/make to tweak build parameters

If you ever find yourself with little time in front of a stubborn build system and, no matter what you try, you can’t get the right flags to the compiler, think about putting something (a wrapper) between the build system and the compiler. Example for g++:

#!/bin/bash
main() {
    # Build up arg[] array with all options to be passed
    # to subcommand.
    i=0
    for opt in "$@"; do
        case "$opt" in
        -O2) ;; # Removes this option
        *)
            arg[i]="$opt" # Keeps the others
            i=$((i+1))
            ;;
        esac
    done
    EXTRA_FLAGS="-O0" # Adds extra option
    echo "g++ ${EXTRA_FLAGS} ${arg[@]}" # >> /tmp/build.log # Logs the command
    /usr/bin/ccache g++ ${EXTRA_FLAGS} "${arg[@]}" # Runs the command
}
main "$@"

Make sure that the wrappers appear earlier than the real commands in your PATH.

The make wrapper can also call remake instead. Remake is fully compatible with make but has features to help debugging compilation and makefile errors.

Analyze the structure of MP4 data

The ISOBMFF Box Structure Viewer online tool allows you to upload an MP4 file and explore its structure.
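
If you’d rather explore the box structure from a script, the top level of the format is simple enough to parse by hand: each box is a 32-bit big-endian size followed by a four-character type, with special cases for 64-bit sizes and to-end-of-file boxes. A minimal sketch, run against a hand-built two-box buffer rather than a real file:

```python
import struct

def parse_boxes(data, offset=0):
    """Yield (type, size, payload_offset) for each top-level ISOBMFF box."""
    end = len(data)
    while offset + 8 <= end:
        size, btype = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type.
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = end - offset
        yield btype.decode("ascii"), size, offset + header
        offset += size

# Hand-built sample: an 'ftyp' box (major brand, minor version) and an empty 'mdat'.
sample = struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 512) + struct.pack(">I4s", 8, b"mdat")
print([(t, s) for t, s, _ in parse_boxes(sample)])  # [('ftyp', 16), ('mdat', 8)]
```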

by eocanha at May 18, 2021 06:00 AM

May 17, 2021

Delan Azabani

Chromium spelling and grammar features

Back in September, I wrote about my wonderful internship with Igalia’s web platform team. I’m thrilled to have since joined Igalia full-time, starting in the very last week of last year. My first project has been implementing the new CSS spelling and grammar features in Chromium. Life has been pretty hectic since Aria and I moved back to Perth, but more on that in another post. For now, let’s step back and review our progress.


The squiggly lines that indicate possible spelling or grammar errors have been a staple of word processing on computers for decades. But on the web, these indicators are powered by the browser, which doesn’t always have the information needed to place and render them most appropriately. For example, authors might want to provide their own grammar checker (placement), or tweak colors to improve contrast (rendering).

To address this, the CSS pseudo and text decoration specs have defined new pseudo-elements ::spelling-error and ::grammar-error, allowing authors to style those indicators, and new text-decoration-line values spelling-error and grammar-error, allowing authors to mark up their text with the same kind of decorations as native indicators.


Current status

I’ve sent an Intent to Prototype, as well as requests for positions from Mozilla and Apple.

I’ve landed a patch that paves the way for ::spelling-error and ::grammar-error support internally, and I’m hopefully(!) around halfway done with implementing both the new painting rules and the new processing model.

The spec updates, led by Florian Rivoal, were largely done by the end of 2017. As the first implementation of both the features themselves and much of the underlying highlight specs, there were always going to be questions and rough edges to be clarified.

Two issues were raised before we even started, I’ve since sent in another two, and I’ll need to raise at least two more by the time we’re done. I’ve also landed three WPT patches, including three new tests and fixes for countless more.

highlight-painting-003.html

In the course of my work on these features, I’ve already fixed at least two other bugs that weren’t of my own creation, and reported four more:

  • 1171741: Selecting text causes emphasis marks to be painted twice
  • 1172177: Erroneous viewport-size-dependent clipping of some text shadows
  • 1176649: text-shadow paints with incorrect offset for vertical scripts in vertical writing modes
  • 1180068: text-shadow erroneously paints over text proper in mixed upright/sideways fragments

CJK CSS unification

My colleague Rego noticed that the squiggly lines for spelling and grammar errors look slightly different to a naïve red or green wavy underline. How, why, and should we unify squiggly and wavy lines? Some further investigation revealed that the two kinds of decorations are drawn very differently with completely separate code paths.

non-macOS (demo0)

Left (bolder text): nearest wavy decorations.
Right (lighter text): native squiggly lines.

The case for unifying squiggly and wavy lines became a lot more complicated, too. For example, our squiggly lines are actually dots on macOS. More specifically, they are round dots with an alpha gradient, matching the platform’s native controls. These details are beyond what can be expressed in terms of a dotted underline, so if we were to unify by making squiggly lines equivalent to such a decoration, we would lose that platform fidelity.

macOS (demo0)

Left (bolder text): nearest dotted decorations.
Right (lighter text): native squiggly lines.

The spec doesn’t require that spelling-error and grammar-error lines be expressible in terms of other decoration lines, so unification won’t block shipping. I decided it would be best to revisit this once I landed some patches and familiarised myself with the code.

Fifteen years in the making

::spelling-error and ::grammar-error are defined as highlight pseudo-elements, together with ::selection and ::target-text. The spec’s processing model and rendering rules are both very different to how ::selection (or ::target-text) has been implemented in any browser so far. Now that we’re implementing more than just the first couple of pseudos, we really ought to comply with the new spec, which complicates our job somewhat.

I’ll talk about ::selection a fair bit below, because most of the spec discussion I found happened before the others were defined, going back as far as 2006. Highlight pseudos like ::selection are tricky because they aren’t tree-abiding: the selected parts of the document aren’t generally a child of any one element.

But even then, how hard could it be?

  • What is ::selection? How does it interact with other pseudo-elements? Is it a singleton, or does each element have a ::selection pseudo-element? How do we reconcile the ::selection “tree”, if any, with the element tree?
  • Can child ::selection styles override parent ::selection styles? What about the child’s “real element” styles? How exactly do parent ::selection styles propagate to child ::selection styles? Do we use a tweaked cascade or tweaked inheritance?
  • What happens when authors specify ::selection styles that affect layout? What about styles that rely on how ::selection relates to the element tree, like outline or translucent background-color?
  • What happens when child ::selection styles specify only color or only background-color but not both? Does the other inherit as usual? If we want a special case tying these two properties together, how does it interact with other properties?
  • Does the ::selection background-color paint over text, or under it? What about “replaced” content like images? If we paint over text, do we need to make the author’s color translucent, and if so, how?
  • Is text in the ::selection color painted in addition to, or instead of, the same text in its original color? What about background-color?
  • Can the default UA stylesheet describe the platform’s ::selection style? How?
  • How naughty were browsers that implemented ::selection without a -vendor-prefix before it was standardised? Are vendor prefixes even a good idea?
  • Most importantly, how do we introduce a new processing model and rendering rules without breaking existing content?

For answers to most of these questions, check out my notes5.


By the time I started to understand the problem space, two weeks had passed.

Pretty intense for my very first foray into www-style!

Highlight painting

The current spec isolates each highlight pseudo into an “overlay”, and allows each of them to have independent backgrounds, shadows, and other decorations.

Like other browsers, Chromium implemented an older model, where matching ::selection rules are only used to change things like the text color and shadows (except for background-color, which has always been independent).

But the closer I looked, the deeper the problems ran.

Shadows and backgrounds

everyone’s shadow code is complete made-up horseshit but mostly i blame the fact that someone decided to add ‘shadow’ to the (very small!) special list of styles ::selection could modify

— Gankra, 2021

I whipped up a quick demo3 with some backgrounds and shadows, and the result was… not good. “So the originating text shadow (yellow) paints over the ::selection background (grey), except when it paints under, and sometimes it even paints over the text (black)? Why is the ::selection shadow clipped to the ::selection background? What?”

highlight-painting-001.html (based on demo3)

Some of these were easier to fix than others. To fix backgrounds, we essentially push the code that paints the background waaaaay down NG­Text­Fragment­Painter, so that it’s before painting the selected text but after pretty much everything else. We then fix shadows similarly, reordering the text paints from “before with shadows, after with shadows, selected with shadows” to an order that keeps shadows behind text.

These initial fixes are now live in Chromium 90, but we still need to deal with the ::selection shadow clipping. What’s up with that?

Shadow clipping

The weird shadow clipping was a side effect of how we ensured that the ::selection text color changes exactly where the ::selection background starts:

  1. we clip out and paint the selected text in original color, then
  2. we clip (in) and paint the selected text in ::selection color.

This is useful for both subtle reasons, like ink overflow…

…and not so subtle reasons, like allowing the user to clearly and precisely select graphemes in ligature-heavy languages like Sorani. In this example, یلا is three letters (îla), but only two glyphs. This isn’t explicitly required by any spec, but it’s definitely intentional.

If you use Chromium, you may notice that the ref for that demo appears to select more text. What we’re really doing with ::selection painting is pretending that ligatures are divisible into horizontal parts and guessing how wide each part is. Current font technology just doesn’t provide the metadata to do this more “correctly”.

Firefox always allows splitting ligature styles, including with real elements, and there are at least two good arguments in favour of this approach. Chromium has (reasonably) decided that while the technique is ok for ::selection, perhaps even desirable, it’s not the way to go for ordinary markup.

But anyway, back to the point at hand. text-shadow means “paint the text again, under the text proper, with these colors and offsets”. We want to clip the ::selection shadow for the same reasons we clip the text proper in ::selection color, but the clip coordinates need to be offset for each shadow. The bug here is that they weren’t.

When painting the ::selection shadow (blue), we need to clip the canvas to the dotted line, but we were actually clipping to the solid line.

Consensus seems to be that not doing so is undesirable, and in theory, fixing this would be straightforward, but in practice… 😵‍💫


The first confounding factor was that NG­Text­Fragment­Painter and NG­Text­Painter were… a tangled mess. Even the owners weren’t sure this was the most helpful architecture:

// TODO(layout-dev): Does this distinction make sense?
class CORE_EXPORT NGTextPainter : public TextPainterBase { /* ... */ }

Years of typographical features have been duct-taped on without a systemic approach to managing complexity, including decorations, shadows, ellipses, background clipping, RTL text, vertical text, ruby text, emphasis marks, print rendering, drag-and-drop rendering, selections, highlights, “markers”, and SVG features like stroke and fill.

A third of the logic was in Text­Painter­Base, so good luck not breaking legacy. Shadows were painted with a now-deprecated Skia feature called a Draw­Looper, which allows you to repeat a procedure a bunch of times with different tweaks, such as canvas transformations and color changes. It’s almost specifically designed for shadows, but it’s technically possible to repeat procedures that have nothing to do with drawing text.

// SkCanvas* canvas;
// SkPaint paint;
// SkScalar x, y;
// sk_sp<SkTextBlob> blob;
// sk_sp<SkDrawLooper> looper;
looper->apply(canvas, paint, [&blob, x, y](SkCanvas* c, const SkPaint& p) {
    // procedure to be looped
    c->drawTextBlob(blob, x, y, p);
});

My solution was based on the observation that loopers draw offset shadows by “moving” the canvas with a transform before each iteration, but transforming the canvas only affects subsequent operations. We were clipping the canvas once, before running the looper, but if we could somehow reclip the canvas after each transform, the clip region would “move” together with each shadow, and we wouldn’t even need to change the coordinates!

I prototyped a fix that seemed to handle everything I threw at it, and informed by the challenges that involved, I also refactored out the code for selections, highlights, and markers. Stephen and I decided that adding clipping as a fixed function to Draw­Looper made more sense than adding it to the procedure. At the time, this was true.

The prototype made my most complex test case (at the time) pass, with the exception of ink overflow color, which was a limitation of my ref (both renderings are acceptable).

I then took a couple weeks off to move to Perth.

Vertical vertigo

“Wait… isn’t the original purpose of vertical writing modes, you know, vertical scripts? I wonder if those work as well as horizontal scripts being rotated sideways…”

“…what? Let’s see what they look like without my patch…”

“…what?”

Left: vertical script in vertical-rl, with patch.
Right: same test case, without patch.

Notice how the shadows are offset in the wrong direction. They should be painted southeast of the text proper, but were being painted northeast.

When painting a text fragment with a vertical writing-mode, we rotate the canvas by 90° cw (or ccw for sideways-lr). This is good for horizontal scripts like Latin or Sorani, because they usually need to be painted sideways.

Except when text-orientation is upright, which overrides the usual behaviour.

But for vertical scripts like Han, we usually need to keep the canvas unrotated. A single text fragment can contain text in multiple scripts, so we actually achieve this by rotating the canvas back for the parts in vertical scripts.

Except when text-orientation is sideways, which overrides the usual behaviour.

Note that the way text-orientation is defined means that none of its values are actually supposed to affect the rendering of vertical-only scripts like Mongolian. I would suggest not thinking about this too hard.

So far so good, right?


This is what we were doing when painting text with vertical scripts and shadows (example limited to a single script and single shadow for simplicity):

  1. Let space be our original “physical” coordinate space
  2. Let offset be the shadow’s offset in space
  3. Let selection be the selection rect coordinates in space
  4. Vertical writing mode, so rotate canvas by 90°, yielding space′
  5. Let offset′ be the result of mapping offset into space′
  6. Let selection′ be the result of mapping selection into space′
  7. Old: clip the canvas to selection′
  8. Configure a Draw­Looper that will:
    • move the canvas by offset′
    • New: clip the canvas to selection′
    • draw the text for the shadow
  9. Vertical script, so rotate canvas back by 90°, yielding space″
  10. Run the Draw­Looper, which carries out the steps above

The looper is told to move and clip the canvas to offset′ and selection′, which are coordinates in space′, but when it eventually tries to do that, the canvas is in space″.

offset′ being in the wrong space is why shadows have always been painted in the wrong place for vertical scripts. By reordering the clip to selection′ so it happens after the rotation to space″, we were now clipping the canvas to the wrong coordinates, which in turn made the text invisible in our demo6!
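
The mixup is easier to see with a toy model. This is purely an illustration of the coordinate-space bug, not Chromium code; the rotation convention and the offset values are arbitrary:

```python
def rot_cw(p):
    """Rotate a point 90° clockwise about the origin (one fixed convention)."""
    x, y = p
    return (y, -x)

def rot_ccw(p):
    """Inverse rotation: 90° counterclockwise."""
    x, y = p
    return (-y, x)

offset = (2, 3)              # shadow offset in the original physical space
offset_p = rot_cw(offset)    # offset′, mapped into space′ for the writing mode
# For a vertical script the canvas is rotated back again (space″ here coincides
# with the original space), but the looper still moves the canvas by offset′:
buggy = offset_p             # offset′ interpreted in space″: wrong direction
fixed = rot_ccw(offset_p)    # remapping into space″ recovers the true offset
print(buggy, fixed)          # (3, -2) (2, 3)
```

Using a coordinate computed in one space after a further canvas transform silently points the shadow somewhere else, which is exactly what the screenshots above show.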

Cursed

Fixing this again proved harder than it seemed on the surface, because text painting in Chromium involves the coordination of four components: paint, shaping, cc, and Skia.

In paint, the text painters are given a “fragment” of text to be painted in a given style. They know the writing mode, because that’s part of the style, but they know very little about the text itself. The first rotation (for the vertical writing mode) happens here, and we configure the Draw­Looper here (except for its procedure, which we pass in shaping).

In shaping, we find the best glyphs for each character, and determine what scripts the text fragment is made of, then split the text into “blobs”. The second rotation (for the vertical script) happens here, and we throw in a skew transform too if the text we’re painting is oblique (or fake italic, which is again known only to shaping).

In cc, we expose a Skia-like API that can either dispatch to Skia immediately or collect operations into a queue for later. Draw­Looper is in the process of being moved here, because the Skia maintainers don’t want it.

Skia provides a stateful canvas, which more or less creates visible output.


With each canvas transform, existing coordinates need to be remapped into the new space before they can be used again, and we were doing them imperatively in two different components. Worse still, while layout (ng) — the phase that happens before paint — uses the type system to enforce correct handling of coordinates (e.g. Physical­Offset, Logical­Rect), the same is not true for paint onwards.

Everything is in Physical­Rect and friends, often erroneously, or in “untyped” coordinates like Float­Rect or Sk­Rect. In one case, a Physical­Offset is used in both physical and non-physical (rotated for writing-mode) spaces, to refer to two different points at different corners of the text. Here… let me illustrate.

When painting horizontal text in vertical-rl, we rotate the canvas 90° cw around A so that the text’s left descent corner lands on B. The left ascent corner moves from B to C.

That single variable was used to intentionally refer to both B and C at different times in a function, because the coordinates for B in space happen to be numerically the same as those for C in space′. aaaa­aaaA­AAAA­AAAA­AAAA-

-AAAAAAAAAAAAA

To be fair, each of these flaws has a reasonable explanation.

Layout is a confusing place where we constantly need to deal with different coordinate spaces, so ideally we would iron everything out so that paint can work purely in physical space. Half the point of types like Logical­Rect is to provide getters and setters for concepts like “inline start” and “block end”.

For most of the things we paint, this is ok, even desirable. Rects like ::selection backgrounds must be painted in physical space, so we can round the coordinates to integers for crisp edges. Text is the only exception: the history of computer typography means that vertical text is, to some extent, seen internally as rotated horizontal text.

Draw­Looper is handy for painting shadows, and it might[citation needed] even reduce serialisation overhead in cc. But the way we currently configure them, baking coordinates into them before shaping, makes it even harder to handle vertical text correctly.

Last but not least, Chromium’s pre-standard text painting order was “all rects for highlights and markers first, then all texts”. This made the imperative canvas rotations almost acceptable, if you ignore the shadow bugs, because we didn’t need to rotate the canvas back and forth nearly as many times.

Once I moved to Perth, I spent over three weeks trying to find a systemic solution to these problems, but I just wasn’t getting anywhere meaningful. In the interests of working a bit more breadth-first and avoiding burnout, I’ve shelved highlight painting for now.

Processing model

Let’s return to how computed styles for highlight selectors should work.

The consensus was that parent ::selection styles should somehow propagate to the ::selection styles of their children, so authors can use their existing CSS skills to define both general ::selection styles and more specific styles under certain elements. This was unlike all existing implementations, where the only selector that worked the way you would expect was ::selection, that is to say, *::selection.

At first, that “somehow” was by tweaking the cascade to take parent ::selection rules into account. Emilio raised performance concerns with this, so the spec was changed, instead tweaking inheritance to make ::selection styles inherit from parent ::selection styles (and never from originating or “real” elements).
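
A toy model of the resulting lookup (my own sketch of the idea, not Chromium’s data structures): each highlight pseudo’s unspecified properties fall back to the parent element’s same highlight pseudo, never to the originating element’s own style. The element tree and rules below are hypothetical:

```python
# Hypothetical element tree (child -> parent) and per-pseudo style rules.
PARENT = {"em": "p", "p": "body", "body": None}
STYLES = {
    ("body", "::selection"): {"color": "white", "background-color": "navy"},
    ("em", "::selection"): {"color": "yellow"},
}

def highlight_style(element, pseudo, prop):
    """Resolve a highlight property by walking the highlight inheritance chain."""
    node = element
    while node is not None:
        value = STYLES.get((node, pseudo), {}).get(prop)
        if value is not None:
            return value
        node = PARENT[node]
    return None  # initial value: the originating element's style never applies

print(highlight_style("em", "::selection", "color"))             # yellow
print(highlight_style("em", "::selection", "background-color"))  # navy
```

Note how the em’s selection background comes from body’s ::selection rule, not from any style on em or p themselves.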

This is what I’m working on now. I’ve got a patch that gets most of the way, first by fixing inherit, then by fixing unset, then with a couple more fixes for styles where the cascade doesn’t yield any value, but there are still a few kinks ahead:

  • impl work has raised at least three questions that need CSSWG clarification;
  • we need to optimise it, maybe more than before, to avoid perf regressions;
  • we still need to check if style invalidation works correctly; and
  • we probably want new devtools features to visualise highlight inheritance.

Stay tuned!

Beyond my colleagues at Igalia, special thanks go to Stephen, Rune, Koji (Google), and Emilio (Mozilla) for putting up with all of my questions, not to mention Florian and fantasai from the CSSWG, plus Gankra (Mozilla) for her writing about text rendering, which has proved both inspiring and reassuring.

May 17, 2021 10:30 AM

May 13, 2021

Eric Meyer

Adding Pandoc Arguments in BBEdit

Thanks to the long and winding history of my blog, I write posts in Markdown in BBEdit, export them to HTML, and paste the resulting HTML into WordPress. I do it that way because switching WordPress over to auto-parsing Markdown in posts causes problems with rendering the markup of some posts I wrote 15-20 years ago, and finding and fixing every instance is a lengthy project for which I do not have the time right now.

(And I don’t use the block editor because whenever I use it to edit an old post, the markup in those posts gets mangled so much that it makes me want to hurl. This is as much the fault of my weird idiosyncratic bespoke-ancient setup as of WordPress itself, but it’s still super annoying and so I avoid it entirely.)

Anyway, the point here is that I write Markdown in BBEdit, and export it from there. This works okay, but there have always been things missing, like a way to easily add attributes to elements like my code blocks. BBEdit’s default Markdown exporter, CommonMark, sort of supports that, except it doesn’t appear to give me control over the class names: telling it I want a class value of css on a preformatted block means I get a class value of language-css instead. Also it drops that class value on the code element it inserts into the pre element, instead of attaching it directly to the pre element. Not good, unless I start using Prism, which I may one day but am not yet.

Pandoc, another exporter you can use in BBEdit, offers much more robust and yet simple element attribute attachment: you put {.class #id} or whatever at the beginning of any element, and you get those things attached directly to the element. But by default, it also wraps elements around, and adds attributes to, the pre element, apparently in anticipation of some other kind of syntax highlighting.

I spent an hour reading the Pandoc man page (just kidding, I was actually skimming, that’s the only way I could possibly get through all that in an hour) and found the --no-highlight option. Perfect! So I dropped into Preferences > Languages > Language-specific settings:Markdown > Markdown, set the “Markdown processor” dropdown to “Custom”, and filled in the following:

Command pandoc
Arguments --no-highlight

Done and done. I get a more powerful flavor of Markdown in an editor I know and love. It’s not perfect — I still have to tweak table markup by hand, for example — but it’s covering probably 95% of my use cases for writing blog posts.

Now all I need to do is find a Pandoc Markdown option or extensions or whatever that keeps it from collapsing the whitespace between elements in its HTML output, and I’ll be well and truly satisfied.



by Eric Meyer at May 13, 2021 07:08 PM

Andy Wingo

cross-module inlining in guile

Greetings, hackers of spaceship Earth! Today's missive is about cross-module inlining in Guile.

a bit of history

Back in the day... what am I saying? I always start these posts with loads of context. Probably you know it all already. 10 years ago, Guile's partial evaluation pass extended the macro-writer's bill of rights to Schemers of the Guile persuasion. This pass makes local function definitions free in many cases: if they should be inlined and constant-folded, you are confident that they will be. peval lets you write clear programs with well-factored code and still have good optimization.

The peval pass did have a limitation, though, which wasn't its fault. In Guile, modules have historically been a first-order concept: modules are a kind of object with a hash table inside, which you build by mutating. I speak crassly but that's how it is. In such a world, it's hard to reason about top-level bindings: what module do they belong to? Could they ever be overridden? When you have a free reference to a, and there's a top-level definition of a in the current compilation unit, is that the a that's being referenced, or could it be something else? Could the binding be mutated in the future?

During the Guile 2.0 and 2.2 cycles, we punted on this question. But for 3.0, we added the notion of declarative modules. For these modules, bindings which are defined once in a module and which are not mutated in the compilation unit are declarative bindings, which can be reasoned about lexically. We actually translate them to a form of letrec*, which then enables inlining via peval, contification, and closure optimization -- in descending order of preference.

The upshot is that with Guile 3.0, top-level bindings are no longer optimization barriers, in the case of declarative modules, which are compatible enough with historic semantics and usage that they are on by default.

However, module boundaries have still been an optimization barrier. Take (srfi srfi-1), a little utility library on lists. One definition in the library is xcons, which is cons with arguments reversed. It's literally (lambda (cdr car) (cons car cdr)). But does the compiler know that? Would it know that (car (xcons x y)) is the same as y? Until now, no, because no part of the optimizer will look into bindings from outside the compilation unit.

mr compiler, tear down this wall

But no longer! Guile can now inline across module boundaries -- in some circumstances. This feature will be part of a future Guile 3.0.8.

There are actually two parts of this. One is the compiler can identify a set of "inlinable" values from a declarative module. An inlinable value is a small copyable expression. A copyable expression has no identity (it isn't a fresh allocation), and doesn't reference any module-private binding. Notably, lambda expressions can be copyable, depending on what they reference. The compiler then extends the module definition that's residualized in the compiled file to include a little procedure that, when passed a name, will return the Tree-IL representation of that binding. The design of that was a little tricky; we want to avoid overhead when using the module outside of the compiler, even relocations. See compute-encoding in that module for details.

With all of that, we can call ((module-inlinable-exports (resolve-interface '(srfi srfi-1))) 'xcons) and get back the Tree-IL equivalent of (lambda (cdr car) (cons car cdr)). Neat!

The other half of the facility is the actual inlining. Here we lean on peval again, causing <module-ref> forms to trigger an attempt to copy the term from the imported module to the residual expression, limited by the same effort counter as the rest of peval.
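The inlining half can be sketched as a toy partial evaluator. This is illustrative Python over a made-up tuple AST, not Guile's Tree-IL or its actual peval: a call to an imported binding is replaced by the binding's body when a crude size budget allows, after which ordinary folding can reduce (car (xcons x y)) to y.

```python
from typing import Any

INLINABLE = {
    # xcons from (srfi srfi-1): (lambda (cdr car) (cons car cdr))
    "xcons": ("lambda", ("cdr", "car"),
              ("call", "cons", ("var", "car"), ("var", "cdr"))),
}

def size(term: Any) -> int:
    # crude size metric standing in for peval's effort counter
    if not isinstance(term, tuple):
        return 1
    return 1 + sum(size(t) for t in term[1:])

def subst(term: Any, env: dict) -> Any:
    # substitute argument terms for parameter names (toy, capture-unsafe)
    if isinstance(term, tuple):
        if term[0] == "var" and term[1] in env:
            return env[term[1]]
        return tuple(subst(t, env) for t in term)
    return term

def peval(term: Any, budget: int = 20) -> Any:
    if not isinstance(term, tuple):
        return term
    term = tuple(peval(t, budget) for t in term)
    # copy the body of a small imported binding into the call site
    if term[0] == "call" and term[1] in INLINABLE:
        lam = INLINABLE[term[1]]
        if size(lam) <= budget:
            params, body = lam[1], lam[2]
            return peval(subst(body, dict(zip(params, term[2:]))), budget)
    # fold (car (cons a b)) -> a and (cdr (cons a b)) -> b
    if (term[0] == "call" and term[1] in ("car", "cdr")
            and len(term) > 2 and isinstance(term[2], tuple)
            and term[2][0] == "call" and term[2][1] == "cons"):
        return term[2][2] if term[1] == "car" else term[2][3]
    return term

expr = ("call", "car", ("call", "xcons", ("var", "x"), ("var", "y")))
print(peval(expr))  # ('var', 'y')
```

Once the imported lambda is copied in, the compiler sees a plain (car (cons ...)) and folds it away; without the copy, the call is an opaque barrier.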

The end result is that we can be absolutely sure that constants in imported declarative modules will inline into their uses, and fairly sure that "small" procedures will inline too.

caveat: compiled imported modules

There are a few caveats about this facility, and they are sufficiently sharp that I should probably fix them some day. The first one is that for an imported module to expose inlinable definitions, the imported module needs to have been compiled already, not loaded from source. When you load a module from source using the interpreter instead of compiling it first, the pipeline is optimized for minimizing the latency between when you ask for the module and when it is available. There's no time to analyze the module to determine which exports are inlinable and so the module exposes no inlinable exports.

This caveat is mitigated by automatic compilation, enabled by default, which will compile imported modules as needed.

It could also be fixed by topologically ordering the module compilation sequence; this would allow some parallelism in the build, though less than before, and for module graphs with cycles (these exist!) you'd still have some weirdness.
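As a sketch of what that topological ordering might look like (plain Python, with made-up module names; Guile's build doesn't do this today): compile a module only after everything it imports is compiled, so imports always expose their inlinable exports, and report anything left over as sitting on a cycle.

```python
from collections import deque

def compile_order(deps):
    # deps maps module -> set of modules it imports (names are made up)
    indegree = {m: len(imports) for m, imports in deps.items()}
    importers = {m: [] for m in deps}
    for m, imports in deps.items():
        for i in imports:
            importers[i].append(m)
    ready = deque(m for m, d in indegree.items() if d == 0)
    order = []
    while ready:
        m = ready.popleft()          # everything m imports is compiled
        order.append(m)
        for user in importers[m]:
            indegree[user] -= 1
            if indegree[user] == 0:
                ready.append(user)
    cycle = [m for m in deps if m not in order]  # leftovers sit on a cycle
    return order, cycle

deps = {
    "(ice-9 boot)": set(),
    "(srfi srfi-1)": {"(ice-9 boot)"},
    "(ice-9 match)": {"(ice-9 boot)"},
    "(my app)": {"(srfi srfi-1)", "(ice-9 match)"},
}
order, cycle = compile_order(deps)
print(order)  # ['(ice-9 boot)', '(srfi srfi-1)', '(ice-9 match)', '(my app)']
print(cycle)  # [] (no cycles in this toy graph)
```

Modules at the same "depth" (here, (srfi srfi-1) and (ice-9 match)) have no edges between them and could still compile in parallel.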

caveat: abi fragility

Before Guile supported cross-module inlining, the only inlining across modules was explicit, facilitated by macros. If you write a module that has a define-inlinable export and you think about its ABI, then you know to consider any definition referenced by the inlinable export, and you know by experience that its code may be copied into other compilation units. Guile doesn't automatically recompile a dependent module when a macro that it uses changes, currently anyway. Admittedly this situation leans more on culture than on tools, which could be improved.

However, with automatically inlinable exports, this changes. Any definition in a module could be inlined into its uses in other modules. This may alter the ABI of a module in unexpected ways: you think that module C depends on module B, but after inlining it may depend on module A as well. Updating module B might not update the inlined copies of values from B into C -- as in the case of define-inlinable, but less lexically apparent.
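A toy model of that hazard, with Python dicts standing in for source and compiled files (nothing Guile-specific here): the constant is copied out of B when C is compiled, so later changes to B don't reach C until C is recompiled.

```python
module_b = {"limit": 10}                   # B's current source
compiled_c = {"limit": module_b["limit"]}  # C compiled: constant copied in
module_b["limit"] = 20                     # B updated, C not recompiled
print(compiled_c["limit"])                 # 10: C still runs the stale copy
```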

At higher optimization levels, even private definitions in a module can be inlinable; these may be referenced if an exported macro from the module expands to a term that references a module-private variable, or if an inlinable exported binding references the private binding. But these optimization levels are off by default precisely because I fear the bugs.

Probably what this cries out for is some more sensible dependency tracking in build systems, but that is a topic for another day.

caveat: reproducibility

When you make a fresh checkout of Guile from git and build it, the build proceeds in the following way.

Firstly, we build libguile, the run-time library implemented in C.

Then we compile a "core" subset of Scheme files at optimization level -O1. This subset should include the evaluator, reader, macro expander, basic run-time, and compilers. (There is a bootstrap evaluator, reader, and macro expander in C, to start this process.) Say we have source files S0, S1, S2 and so on; generally speaking, these files correspond to Guile modules M0, M1, M2 etc. This first build produces compiled files C0, C1, C2, and so on. When compiling a file S2 implementing module M2, which happens to import M1 and M0, it may be that M1 and M0 are provided by compiled files C1 and C0, or possibly that they are loaded from the source files S1 and S0, or C1 and S0, or S1 and C0.

The bootstrap build uses make for parallelism, with each compile process starting afresh, importing all the modules that comprise the compiler and then using them to compile the target file. As the build proceeds, more and more module imports can be "serviced" by compiled files instead of source files, making the build go faster and faster. However, this introduces system-specific nondeterminism as to the set of compiled files available when compiling any other file. This strategy works because it doesn't really matter whether module M1 is provided by compiled file C1 or source file S1; the compiler and the interpreter implement the same language.

Once the compiler is compiled at optimization level -O1, Guile then uses that freshly built compiler to build everything at -O2. We do it in this way because building some files at -O1 then all files at -O2 takes less time than going straight to -O2. If this sounds weird, that's because it is.

The resulting build is reproducible... mostly. There is a bug in which some unique identifiers generated as part of the implementation of macros can be non-reproducible in some cases; disabling parallel builds seems to solve the problem. The issue is that gensym (or equivalent) might be called a different number of times depending on whether you are loading a compiled module or whether you need to read and macro-expand it. The resulting compiled files are equivalent under alpha-renaming but not bit-identical. This is a bug to fix.
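That alpha-renaming equivalence can be checked mechanically. A sketch in Python, assuming a made-up gensym name shape of g<number>: two terms compare equal if a consistent renaming of generated names maps one onto the other.

```python
import re

GENSYM = re.compile(r"^g\d+$")  # assumed shape of generated names

def alpha_equal(a, b, mapping=None):
    # equal up to a consistent renaming of gensym'd names
    # (toy: the mapping is not checked for injectivity)
    mapping = {} if mapping is None else mapping
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            alpha_equal(x, y, mapping) for x, y in zip(a, b))
    if isinstance(a, str) and GENSYM.match(a):
        return (isinstance(b, str) and bool(GENSYM.match(b))
                and mapping.setdefault(a, b) == b)
    return a == b

# the same term, from builds where gensym ran a different number of times
t1 = ["let", "g12", ["lambda", "g13", "g13"]]
t2 = ["let", "g47", ["lambda", "g48", "g48"]]
print(alpha_equal(t1, t2))  # True: not bit-identical, but alpha-equivalent
```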

Anyway, at optimization level -O1, Guile will record inlinable definitions. At -O2, Guile will actually try to do cross-module inlining. We run into two issues when compiling Guile; one is if we are in the -O2 phase, and we compile a module M which uses module N, and N is not in the set of "core" modules. In that case depending on parallelism and compile order, N may be loaded from source, in which case it has no inlinable exports, or from a compiled file, in which case it does. This is not a great situation for the reliability of this optimization. I think probably in Guile we will switch so that all modules are compiled at -O1 before compiling at -O2.

The second issue is more subtle: inlinable bindings are recorded after optimization of the Tree-IL. This is more optimal than recording inlinable bindings before optimization, as a term that is not inlinable due to its size in its initial form may become small enough to inline after optimization. However, at -O2, optimization includes cross-module inlining! A term that is inlinable at -O1 may become not inlinable at -O2 because it gets slightly larger, or vice-versa: terms that are too large at -O1 could shrink at -O2. We don't even have a guarantee that we will reach a fixed point even if we repeatedly recompile all inputs at -O2, because we allow non-shrinking optimizations.

I think this probably calls for a topological ordering of module compilation inside Guile and perhaps in other modules. That would at least give us reproducibility, provided we avoid the feedback loop of keeping around -O2 files compiled from a previous round, even if they are "up to date" (their corresponding source file didn't change).

and for what?

People who have worked on inliners will know what I mean when I say that a good inliner is like a combine harvester: ruthlessly efficient, a qualitative improvement compared to not having one, but there is a pointy end with lots of whirling blades and it's important to stop at the end of the row. You do develop a sense of what will and won't inline, and I think Dybvig's "Macro writer's bill of rights" encompasses this sense. Luckily people don't lose fingers or limbs to inliners, but inliners can maim expectations, and cross-module inlining more so.

Still, what it buys us is the freedom to be abstract. I can define a module like:

(define-module (elf)
  #:export (ET_NONE ET_REL ET_EXEC ET_DYN ET_CORE))

(define ET_NONE		0)		; No file type
(define ET_REL		1)		; Relocatable file
(define ET_EXEC		2)		; Executable file
(define ET_DYN		3)		; Shared object file
(define ET_CORE		4)		; Core file

And if a module uses my (elf) module and references ET_DYN, I know that the module boundary doesn't prevent the value from being inlined as a constant (and possibly unboxed, etc).

I took a look and on our usual microbenchmark suite, cross-module inlining doesn't make a difference. But that's both a historical oddity and a bug: firstly, the benchmark suite comes from an old Scheme world that didn't have modules, and so won't benefit from cross-module inlining. Secondly, Scheme definitions from the "default" environment that aren't explicitly recognized as primitives aren't inlined, as the (guile) module isn't declarative. (Probably we should fix the latter at some point.)

But still, I'm really excited about this change! Guile developers use modules heavily and have been stepping around this optimization boundary for years. I count 100 direct uses of define-inlinable in Guile, a number of them inside macros, and many of these are to explicitly hack around the optimization barrier. I really look forward to seeing if we can remove some of these over time, to go back to plain old define and just trust the compiler to do what's needed.

by the numbers

I ran a quick analysis of the modules included in Guile to see what the impact was. Of the 321 files that define modules, 318 of them are declarative, and 88 contain inlinable exports (27% of total). Of the 6519 total bindings exported by declarative modules, 566 are inlinable (8.7%). Of the inlinable exports, 388 (69%) are functions (lambda expressions), 156 (28%) are constants, and 22 (4%) are "primitives" referenced by value and not by name, meaning definitions like (define head car) (instead of re-exporting car as head).
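The export-side ratios are easy to re-derive from the counts quoted above (the rounding here is mine):

```python
total_files = 321        # files that define modules
with_inlinable = 88      # files containing inlinable exports
total_exports = 6519     # bindings exported by declarative modules
inlinable = 566
funcs, consts, prims = 388, 156, 22

assert funcs + consts + prims == inlinable          # 388 + 156 + 22 = 566
print(round(100 * with_inlinable / total_files))    # 27
print(round(100 * inlinable / total_exports, 1))    # 8.7
print(round(100 * funcs / inlinable))               # 69
print(round(100 * consts / inlinable))              # 28
print(round(100 * prims / inlinable))               # 4
```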

On the use side, 90 declarative modules import inlinable bindings (29%), resulting in about 1178 total attempts to copy inlinable bindings. 902 of those attempts are to copy a lambda expression in operator position, which means that peval will attempt to inline their code. 46 of these attempts fail, perhaps due to size or effort constraints. 191 other attempts end up inlining constant values. 20 inlining attempts fail, perhaps because a lambda is used as a value. Somehow, 19 copied inlinable values get elided because they are evaluated only for their side effects, probably to clean up let-bound values that become unused due to copy propagation.

All in all, an interesting endeavor, and one to improve on going forward. Thanks for reading, and catch you next time!

by Andy Wingo at May 13, 2021 11:25 AM

Brian Kardell

Can I :has()


As you might know, my company (Igalia) works on all of the web engines and we contribute a lot. I'm very proud of all of the things we're able to do to improve both the features of the web platform, and the overall health of this commons. I'm especially pleased when this lets us tackle historically hard problems. A very incomplete list of things with some historical challenges that we've helped move in important ways the past few years would include: CSS Grid, MathML, JavaScript Class features, hardware accelerated SVGs and Container Queries. Today I'll be telling you about another one we're working on.

Today we're filing an intent to prototype, tackling yet another historically hard problem for the web: The :has() selector. In this post, I'd like to explain what this intent means, as well as why it matters, where it comes from and why I am very excited about it.

:has() for the unfamiliar

When you write a CSS rule, the last (right-most) bit of the selector is the thing that you're styling. We call that the "subject" of the rule. Most people writing CSS have, at some point, found themselves wanting to style something based on what is inside it. You might have heard people talk about wanting "a parent selector" or an "ancestor combinator". :has() is that - basically.

/* style an .x that contains a .y descendant - not the .y */ 
.x:has(.y) { ... }

The long history of postponed :has

The basic reasons to desire such powers are pretty obvious. Powerful selection ability greatly enables a real separation of concerns. This fact wasn't lost on anyone. XPath allowed it, and CSS specifications since the late 1990's have tried.

In fact, a lot of people learned about :has() first through jQuery. That's because when John Resig wrote it he wanted it to support all of the "new CSS 3 selectors" - and :has() was one of them. It was in the spec, so jQuery supported the :has() pseudo-class. The trouble, of course, is that no one actually knows at the start of the process what will gain implementations and reach recommendation status - and :has() didn't, and was postponed again to Selectors Level 4. The first draft of Selectors Level 4 was published in 2011. It is highly starred, was among the top requested features in Microsoft's old User Voice system, and every year remains among the top 2-3 most requested features. Every so often someone (frequently me) brings it up again in the CSS Working Group.

Why the hold up?

Primarily, even without :has() it's pretty hard to live up to the performance guarantees of CSS, where everything continues to evaluate and render "live" at 60fps. If you think, mathematically, about just how much work is conceptually involved in applying hundreds or thousands of rules as the DOM changes (including as it is parsing), it's quite a feat as is.

Engines have figured out how to optimize this based on clever patterns and observations that avoid the work that is conceptually necessary - and a lot of that is based on subject invariants that :has() would appear to throw to the wind.

At the same time, there are plenty of aspects of this problem that are considerably easier than others. CSS engines for print, for example, support :has() because they don't need to run at 60fps. DOM APIs like querySelector() / querySelectorAll() / matches() also check at a very specific point in time, in a completely different manner - it's very doable there, as jQuery showed.

There are limits that we could potentially place on this selector that might help a little. Or, there are things like :focus-within or :empty which seem similar, but internally, are very specifically easier - but very incomplete.
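To make the "limits" idea concrete, here is a sketch using Selectors Level 4 relative-selector syntax; the syntax below is real, but which restricted form (if any) would ship is purely hypothetical here:

```css
/* general form: any descendant - the hard case for live, 60fps CSS */
.x:has(.y) { ... }

/* direct children only - one conceivable restriction */
.x:has(> .y) { ... }

/* :focus-within already ships, and is roughly a focus-only :has() */
.x:focus-within { ... }
```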

And so, for the last decade this comes up once or twice a year as we try to find some way forward. In the end, we ultimately sort of go around in circles: it's impossible to make decisions and progress while everything is in limbo. We don't really know what the options are, and it's hard for anyone to take up trying to imagine a way forward. As we've seen with some other things, like Container Queries, this is also somewhat of a vicious cycle: the more it comes up, and the more it is discussed without real progress of any kind, the less likely it is that anyone actually wants to talk about it again.

We need prototyping, exploration, data we can point to and more concrete things we can discuss - but the longer it goes on, the more hopeless it looks and the less likely anyone is to do it.

eyeo, Igalia and prototypes

Igalia works on all of the web engines, and with lots of consumers of those engines, to expand investment in this wonderful commons. eyeo makes a number of products like the Adblock Browser and Adblock Plus. While some sites can offer workarounds that employ additional classes for intentionally styling these sorts of things, plenty of other things cannot. Lots of very useful things (reader mode, ad blockers, conformance checker plugins and search are just some examples) rely on selectors and heuristics about trees of markup that they didn't create. They have a definite separation of concerns and thus observe these sorts of shortcomings very acutely. Having no native solutions for some of these hard problems forces everyone to find their own ways to deal with it, and all of them have different performance characteristics and different edge cases, and all of them require additional JavaScript. That's not good for anyone. So, eyeo approached us about sponsoring work, research and prototyping on some things - among them :has(). Can we somehow get past these impasses and make progress on this one, and make things better for the entire community? What might that look like? Can it conceivably work in the main 60fps CSS? If not, can we provide some research and data that allows other paths, like support in the JavaScript DOM methods, or a static profile? Let's be sure to include all of the important uses of selectors.

For the past little while, Igalia engineers have been looking into this problem. We've been having discussions with Chromium developers, looking into the Firefox and WebKit codebases, and doing some initial prototypes and tests to really get our heads around it. Through this, we've provided lots of data about the performance of what we have already and where we believe challenges and possibilities lie. We've begun sketching out an explainer with all of our design notes and questions linked up - so it's all there in the open for people to review as we attempt to open this discussion.

Today's intent: What it means

The meaning of "intents" has occasionally been difficult for the larger community to understand in the same way, so I wanted to take a moment to suggest how to interpret it: with today's intent, we're simply stating that we feel we have gathered enough information and data on this that we're ready to share it for wider review and discussion, productively. We believe the data suggests that it is at least plausible to carry on with discussions around supporting a (partially limited) form of :has() in the main, live CSS. We would like for data, designs and limits to be discussed fairly concretely. We would like to carry forward with additional, concrete implementation prototyping and continue to help sort out a path forward.

May 13, 2021 04:00 AM

May 11, 2021

Enrique Ocaña

GStreamer WebKit debugging by instrumenting source code (3/3)

This is the last post in the instrumenting source code series. I hope you find the tricks below as useful as the previous ones.

In this post I show some more useful debugging tricks. Don’t forget to have a look at the other posts of the series:

Finding memory leaks in a RefCounted subclass

The source code shown below must be placed in the .h where the class to be debugged is defined. It’s written in a way that doesn’t need to rebuild RefCounted.h, so it saves a lot of build time. It logs all refs, unrefs and adoptPtrs, so that any anomaly in the refcounting can be traced and investigated later. To use it, just make your class inherit from LoggedRefCounted instead of RefCounted.

Example output:

void WTF::adopted(WTF::LoggedRefCounted<T>*) [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1
void WTF::adopted(WTF::LoggedRefCounted<T>*) [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1
^^^ Two adopts, this is not good.
void WTF::LoggedRefCounted<T>::ref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1 --> ...
void WTF::LoggedRefCounted<T>::ref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount ... --> 2
void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 2 --> ...
void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount ... --> 1
void WTF::adopted(WTF::LoggedRefCounted<T>*) [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1
void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1 --> ...
void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1 --> ...
^^^ Two recursive derefs, not good either.
#include "Logging.h"

namespace WTF {

template<typename T> class LoggedRefCounted : public WTF::RefCounted<T> {
    WTF_MAKE_NONCOPYABLE(LoggedRefCounted); WTF_MAKE_FAST_ALLOCATED;
public:
    void ref() {
        printf("%s: this=%p, refCount %d --> ...\n", __PRETTY_FUNCTION__, this, WTF::RefCounted<T>::refCount()); fflush(stdout);
        WTF::RefCounted<T>::ref();
        printf("%s: this=%p, refCount ... --> %d\n", __PRETTY_FUNCTION__, this, WTF::RefCounted<T>::refCount()); fflush(stdout);
    }

    void deref() {
        printf("%s: this=%p, refCount %d --> ...\n", __PRETTY_FUNCTION__, this, WTF::RefCounted<T>::refCount()); fflush(stdout);
        WTF::RefCounted<T>::deref();
        printf("%s: this=%p, refCount ... --> %d\n", __PRETTY_FUNCTION__, this, WTF::RefCounted<T>::refCount()); fflush(stdout);
    }

protected:
    LoggedRefCounted() { }
    ~LoggedRefCounted() { }
};

template<typename T> inline void adopted(WTF::LoggedRefCounted<T>* object)
{
    printf("%s: this=%p, refCount %d\n", __PRETTY_FUNCTION__, object, (object)?object->refCount():0); fflush(stdout);
    adopted(static_cast<RefCountedBase*>(object));
}

} // Namespace WTF

Pause WebProcess on launch

WebProcessMainGtk and WebProcessMainWPE will sleep for 30 seconds if a special environment variable is defined:

export WEBKIT2_PAUSE_WEB_PROCESS_ON_LAUNCH=1

It only works #if ENABLE(DEVELOPER_MODE), so you might want to remove those ifdefs if you’re building in Release mode.

Log tracers

In big pipelines (e.g. playbin) it can be very hard to find what element is replying to a query or handling an event. Even using gdb can be extremely tedious due to the very high level of recursion. My coworker Alicia commented that using log tracers is more helpful in this case.

GST_TRACERS=log enables additional GST_TRACE() calls all across GStreamer. The following example logs entries and exits into the query function.

GST_TRACERS=log GST_DEBUG='query:TRACE'

The names of the logging categories are somewhat inconsistent:

  • log (the log tracer itself)
  • GST_BUFFER
  • GST_BUFFER_LIST
  • GST_EVENT
  • GST_MESSAGE
  • GST_STATES
  • GST_PADS
  • GST_ELEMENT_PADS
  • GST_ELEMENT_FACTORY
  • query
  • bin

The log tracer code is in subprojects/gstreamer/plugins/tracers/gstlog.c.

by eocanha at May 11, 2021 06:00 AM

May 10, 2021

Fernando Jiménez

WPE WebKit for Android

WPE WebKit is the official WebKit port for embedded and low-consumption computer devices. It has been designed from the ground-up with performance, small footprint, accelerated content rendering, and simplicity of deployment in mind.

It brings the excellence of the WebKit engine to countless platforms and target devices, serving as a base for systems and environments that primarily or completely rely on web platform technologies to build their interfaces.

WPE WebKit’s architecture allows for inclusion in a variety of use cases and applications. It can be custom embedded into an existing application, or it can run as a standalone web runtime under a variety of presentation systems, from platform-specific display managers to existing window management protocols like Wayland or X11.

Today, we (Igalia) are happy to announce initial support of WPE for Android.

This effort was initiated back in 2017 by my colleague Žan Doberšek, who fully implemented a WPE backend for Android along with the required pieces to get rendering and basic input working. The work was paused for quite some time until the beginning of this year, when I joined Igalia and took over his work. Since then, I have been heads down working on it, trying to make it more usable thanks to Cerbero and a WebView-based Java API.

How it looks

A picture is worth a thousand words. This is how it currently looks running on an Android phone:

As you can see, we have enough basic functionality to implement a simple multi-tab web browser with progress reporting, navigation controls and IME support.

Support is not limited to mobile devices though. Thanks to the wide range of architectures and devices that support Android we can now run WPE WebKit on an even wider set of devices. Like a pair of XR glasses. This is a video of a port of Firefox Reality using WPEView instead of GeckoView:

Building blocks

Cerbero build system

WPE WebKit has a very long list of dependencies. Cross compiling all these dependencies manually can be quite cumbersome, so in order to ease the development process I focused my first weeks of work on setting up a more usable build system. We decided to use Cerbero, GStreamer’s cross compilation system, which already had recipes - this is how Cerbero names its build scripts - for many of the required dependencies. I wrote all the missing Cerbero recipes and integrated it into WPE Android’s build system, to the point that building everything requires a single python3 scripts/bootstrap.py --build command.

For now the only supported architecture is arm64. There are plans to support other architectures soon.

WPEView API

WPEView wraps the WPE WebKit browser engine in a reusable Android API. WPEView serves a similar purpose to Android's built-in WebView and tries to mimic its API, aiming to be an easy-to-use drop-in replacement with extended functionality.

Setting up WPEView in your Android application is fairly simple.

First, add the WPEView widget to your Activity layout

<com.wpe.wpeview.WPEView
        android:id="@+id/wpe_view"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        tools:context=".MainActivity"/>

And next, wire it in your Activity implementation to start using the API, for example, to load a URL:

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    var browser = findViewById(R.id.wpe_view)
    browser?.loadUrl(INITIAL_URL)
}

To get a better sense on how to use WPEView, check the code of the MiniBrowser demo in the examples folder.

Process model

In order to safeguard the rest of the system and to allow the application to remain responsive even if the user loads a web page that infinite loops or otherwise hangs, the modern incarnation of WebKit uses a multi-process architecture. Web pages are loaded in their own WebProcess. Multiple WebProcesses can share a browsing session, which lives in a shared NetworkProcess. In addition to handling all network accesses, this process is also responsible for managing the disk cache and the Web APIs that allow websites to store structured data, such as the Web Storage API and the IndexedDB API.

Given that Android forbids the fork syscall on non-rooted devices, we cannot directly spawn child processes. Instead, we use Android Services to host the logic of WebKit’s auxiliary processes. The life cycle of all WebKit’s auxiliary processes is managed by WebKit itself. The Android layer only proxies requests to spawn and terminate these processes/services.

In addition to the multi-process architecture, modern WebKit versions introduce the PSON model (Process Swap On Navigation) which aims to improve security by creating an independent WebProcess for each security origin. This is currently disabled for WPE Android, although partial support is already in place.

Browser and Pages

The central piece of WPE Android is the Browser top-level Singleton object. This is roughly the equivalent of WebKit's UIProcess. Among other duties it:

  • Manages the creation and destruction of Page instances.
  • Funnels WPEView API calls to the appropriate Page instance.
  • Manages the Android Services equivalent to WebKit’s auxiliary processes (Web and Network processes).
  • Hosts the UIProcess thread where the WebKitWebContext instance lives and where the main loop is run.

A Page roughly corresponds to a tab in a regular browser UI. There is a 1:1 relationship between WPEView and Page. Each Page instance has its own gfx.View and WebKitWebView instances associated.

WPE Backend

The common interface between WPEWebKit and its rendering backends is provided by libwpe. WPEBackend-android is our Android-oriented implementation of the libwpe API, bridging the gap between the WebKit architecture and the internal composition structure on one side and the Android system on the other.

gfx.View

gfx.View is an extension of android.opengl.GLSurfaceView living in the UI Process. It manages the life cycle of a Surface Texture, which is some sort of buffer consumer, that is handed off to the Web Process through Android’s IPC mechanisms, where the actual rendering happens.

It is also in charge of relaying input events to the internal WebKit input-methods.

This part is currently being significantly changed by Žan to use Native Hardware Buffers.

Future work

There are still plenty of things to do, and we have a growing list of issues in the main repository. The next steps will be towards extending support for other architectures - so far only arm64 is supported. Multimedia support is also on the list of immediate plans, along with the big rendering engine refactor that Žan is working on.

Try it yourself

If you want to try the current prototype, you can follow the instructions in the README of the main repo.

We welcome contributions of all kinds. Give it a try and file issues as you encounter them. And if you feel encouraged enough, send us patches!

Acknowledgements

  • I would like to thank Igalia for giving me the time and space to work on this project.
  • Huge thanks to Žan Doberšek for his amazing work and continuous guidance.
  • Kudos to Philippe Normand and Thibault Saunier for their recommendations and support around Cerbero.
  • Many thanks to Imanol Fernández for his contributions so far and for the VR demo.

May 10, 2021 12:00 AM

May 06, 2021

Eleni Maria Stea

Sharing texture data between ANGLE and the native system driver using DMA buffers and EGL on Linux (proof of concept)

This post is about an experiment I’ve performed to investigate if it’s possible to fill a texture from an ANGLE EGL/GLESv2 context (ANGLE is an EGL/GLESv2 implementation on top of other graphics APIs), and use it from a native (for example mesa3D) EGL/OpenGL context on Linux. I’ve written a program that is similar to my …

by hikiko at May 06, 2021 01:23 PM

May 05, 2021

Danylo Piliaiev

Turnips in the wild (Part 2)

In Turnips in the wild (Part 1) we walked through two issues, one in TauCeti Benchmark and the other in Genshin Impact. Today, I have an update about the one I didn’t plan to fix, and a showcase of two remaining issues I met in Genshin Impact.

Genshin Impact

Gameplay – Disco Water

In the previous post I said that I’m not planning to fix the broken water effect since it relied on undefined behavior.

Screenshot of the gameplay with body of water that has large colorful artifacts

However, I was notified that the same issue was fixed in the OpenGL driver for Adreno (Freedreno) and that the fix is rather easy. Even though for Vulkan it is clearly undefined behavior, with other APIs it might not be so clear. Thus, given that we want to support translation from other APIs, that there are already apps which rely on this behavior, and that it would be just a bit more performant - I made a fix for it.

Screenshot of the gameplay with body of water without artifacts

The issue was fixed by “tu: do not corrupt unwritten render targets (!10489)”

Login Screen

The login screen welcomes us with not-so-healthy colors:

Screenshot of a login screen in Genshin Impact which has wrong colors - columns and road are blue and white

And with a few failures to allocate registers in the logs. The failure to allocate registers isn’t good and may cause some important shader not to run, but let’s hope it’s not that. Thus, again, we should take a closer look at the frame.

Once the frame is loaded I’m staring at an empty image at the end of the frame… Not a great start.

Such things mostly happen due to a GPU hang. Since I’m inspecting frames on Linux I took a look at dmesg and confirmed the hang:

 [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence ...

Fortunately, after walking through draw calls, I found that the mis-rendering happens before the call which hangs. Let’s look at it:

Screenshot of a correct draw call right before the wrong one being inspected in RenderDoc
Draw call right before
Screenshot of a draw call, that draws the wrong colors, being inspected in RenderDoc
Draw call with the issue

It looks like some fullscreen effect. As in the previous case, the inputs are fine; the only image input is a depth buffer. There are also always uniforms passed to the shaders, but when there is only a single problematic draw call they are rarely an issue (and they are easily comparable with the proprietary driver if I spot some nonsensical values among them).

Now it’s time to look at the shader: ~150 assembly instructions, nothing fancy, nothing obvious, and a lonely kill near the top. Before going into the most “fun” part, it’s a good idea to make sure that the issue is 99% in the shader. RenderDoc has a cool feature which allows debugging a shader (its SPIR-V code) at a certain fragment (or vertex, or CS invocation); it does the evaluation on the CPU, so I can use it as a kind of reference implementation. In our case the output differs between RenderDoc and the actual shader evaluation on the GPU:

Screenshot of the color value calculated on CPU by RenderDoc
Evaluation on CPU: color = vec4(0.17134, 0.40289, 0.69859, 0.00124)
Screenshot of the color value calculated on GPU
On GPU: color = vec4(3.1875, 4.25, 5.625, 0.00061)

Knowing the above, there is only one thing left to do: reduce the shader until we find the problematic instruction(s). Fortunately, there is a proprietary driver which renders the scene correctly, therefore instead of relying on intuition, luck, and persistence, we could quickly bisect to the issue by editing the shader and comparing the result against the reference driver. Actually, it’s possible to do this with shader debugging in RenderDoc, but I had problems with it at that moment and it’s not that easy to do.

The process goes like this:

  1. Decompile SPIRV into GLSL and check that it compiles back (sometimes it requires some editing)
  2. Remove half of the code, write the most promising temporary variable as a color, and take a look at results
  3. Copy the edited code to RenderDoc instance which runs on proprietary driver
  4. Compare the results
  5. If there is a difference - restore the deleted code; now we know that the issue is probably in it. Bisect it further by returning to step 2.

This way I bisected to this fragment:

_243 = clamp(_243, 0.0, 1.0);
_279 = clamp(_279, 0.0, 1.0);
float _290;
if (_72.x) {
  _290 = _279;
} else {
  _290 = _243;
}

color0 = vec4(_290);
return;

Writing _279 or _243 to color0 produced reasonable results, but writing _290 produced nonsense. The only difference was the presence of the condition. Now, having a minimal change which reproduces the issue, it’s possible to compare the native assembly.

Bad:

mad.f32 r0.z, c0.y, r0.x, c6.w
sqrt r0.y, r0.y
mul.f r0.x, r1.y, c1.z
(ss)(nop2) mad.f32 r1.z, c6.x, r0.y, c6.y
(nop3) cmps.f.ge r0.y, r0.x, r1.w
(sat)(nop3) sel.b32 r0.w, r0.z, r0.y, r1.z

Good:

(sat)mad.f32 r0.z, c0.y, r0.x, c6.w
sqrt r0.y, r0.y
(ss)(sat)mad.f32 r1.z, c6.x, r0.y, c6.y
(nop2) mul.f r0.y, r1.y, c1.z
add.f r0.x, r0.z, r1.z
(nop3) cmps.f.ge r0.w, r0.y, r1.w
cov.u r1.w, r0.w
(rpt2)nop
(nop3) add.f r0.w, r0.x, r1.w

By running them in my head I reasoned that they should produce the same results, so something was not working as expected. After a few more changes in the GLSL, it became apparent that something was wrong with clamp(x, 0, 1), which is translated into a (sat) modifier on instructions. A bit more digging and I found out that the hardware doesn’t understand the saturation modifier being placed on the sel instruction (sel is a selection between two values based on a third).
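The effect can be illustrated with a scalar sketch (made-up helper names, purely illustrative - this is not driver code): if the hardware silently drops (sat) on sel, the selected value is written back unclamped.

```cpp
#include <algorithm>

// What the compiler meant by (sat)sel: clamp the selected value.
float sel_then_sat(bool cond, float a, float b) {
    return std::clamp(cond ? b : a, 0.0f, 1.0f);
}

// What the hardware effectively does when it ignores (sat) on sel:
// the raw, unclamped selection is written to the register.
float sel_sat_ignored(bool cond, float a, float b) {
    return cond ? b : a;
}
```

With an out-of-range input like 3.1875 the two disagree (1.0 vs 3.1875), matching the nonsense color values observed on the GPU above.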

Disallowing the compiler from placing saturation on the sel instruction resolved the bug:

Login screen after the fix

The issue was fixed by “ir3: disallow .sat on SEL instructions (!9666)”

Gameplay – Where did the trees go?

Screenshot of the gameplay with trees and grass being almost black

The trees and grass seem to be rendered incorrectly. After looking through the trace and not finding where they were actually rendered, I studied the trace on the proprietary driver and found them. However, there weren’t any such draw calls on Turnip!

The answer was simple: the shaders failed to compile due to the register allocation failures I mentioned earlier… The general solution would be an implementation of register spilling. However, in this case there was a pending merge request implementing a new register allocator, which would later help us implement register spilling. With it, the shaders can now be compiled!

Screenshot of the gameplay with trees and grass being rendered correctly

More Turnip adventures to come!

by Danylo Piliaiev at May 05, 2021 09:00 PM

May 04, 2021

Manuel Rego

:focus-visible in WebKit - April 2021

A new month is gone and it’s time for a new report; you can see the previous ones at:

Again, this is a status report about the work Igalia is doing on the implementation of :focus-visible in WebKit, which is part of the Open Prioritization campaign. Thanks to everyone supporting this!

There has been some nice progress in April, though some things are still under ongoing discussion.

Script focus and :focus-visible

Finally, we decided to reflect reality in the script focus tests that I created, and merge them on WPT. The ones where the Chromium and Firefox implementations don’t match the agreed expectations (the ones where you click a non-focusable element, and that click moves focus elsewhere) were marked as .tentative.

So we opened a separate issue to explain the situation and gather more feedback about what to do here.

Once we had the tests, I implemented in WebKit the same behavior as the other browsers, the one currently reflected in the tests. The patch got reviewed and merged, so script focus and :focus-visible now work the same in all browsers.

:focus-visible and Shadow DOM

The test that checks that :focus-visible doesn’t match on a ShadowRoot was also merged in WPT. That’s the current behavior of the WebKit implementation too. More about this in January’s post.

Implementation details

There was a crash in WebKit due to my initial implementation of script focus; that problem has already been fixed. An extra bug was found and fixed as well.

During the review of those patches, some new discussion started about different aspects of the :focus-visible feature, like why keyboard input triggers :focus-visible matching. The discussion with Apple engineers is ongoing on the bug; let’s see how it ends.

Some numbers

Let’s take a look at the numbers again:

  • 26 PRs merged in WPT (5 in April).
  • 27 patches landed in WebKit (10 in April).
  • 9 patches landed in Chromium (2 in April).
  • 2 PRs merged in CSS specs.
  • 1 PR merged in HTML spec.

Next steps

Implementation is mostly over, now the goal is to close the discussions with the different parties and check the possibilities of shipping this in WebKit.

Thanks to everyone who has provided input in the different discussions and jumped on the patch reviews. Your feedback has been really useful to keep moving this forward.

Stay tuned!

May 04, 2021 10:00 PM

Enrique Ocaña

GStreamer WebKit debugging by instrumenting source code (2/3)

In this post I show some more useful debugging tricks. Check also the other posts of the series:

Print current thread id

The thread id is assigned by Linux and, just like a PID, it can take values well beyond single digits. This thread number is useful to know which function calls are issued by the same thread, avoiding confusion between threads.

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

printf("%s [%ld]\n", __PRETTY_FUNCTION__, syscall(SYS_gettid));
fflush(stdout);

Debug GStreamer thread locks

We redefine the GST_OBJECT_LOCK/UNLOCK/TRYLOCK macros to print the calls, compare locks against unlocks, and see who’s not releasing its lock:

#include "wtf/Threading.h"
#define GST_OBJECT_LOCK(obj) do { \
  printf("### [LOCK] %s [%p]\n", __PRETTY_FUNCTION__, &Thread::current()); fflush(stdout); \
  g_mutex_lock(GST_OBJECT_GET_LOCK(obj)); \
} while (0)
#define GST_OBJECT_UNLOCK(obj) do { \
  printf("### [UNLOCK] %s [%p]\n", __PRETTY_FUNCTION__, &Thread::current()); fflush(stdout); \
  g_mutex_unlock(GST_OBJECT_GET_LOCK(obj)); \
} while (0)
#define GST_OBJECT_TRYLOCK(obj) ({ \
  gboolean result = g_mutex_trylock(GST_OBJECT_GET_LOCK(obj)); \
  if (result) { \
   printf("### [LOCK] %s [%p]\n", __PRETTY_FUNCTION__, &Thread::current()); fflush(stdout); \
  } \
  result; \
})

Warning: The statement expression that allows the TRYLOCK macro to return a value will only work on GCC.

There’s a way to know which thread has taken a lock in glib/GStreamer using gdb. First locate the stalled thread:

(gdb) thread 
(gdb) bt
#2  0x74f07416 in pthread_mutex_lock ()
#3  0x7488aec6 in gst_pad_query ()
#4  0x6debebf2 in autoplug_query_allocation ()

(gdb) frame 3
#3  0x7488aec6 in gst_pad_query (pad=pad@entry=0x54a9b8, ...)
4058        GST_PAD_STREAM_LOCK (pad);

Now get the process id (PID) and use the pthread_mutex_t structure to print the Linux thread id that has acquired the lock:

(gdb) call getpid()
$30 = 6321
(gdb) p ((pthread_mutex_t*)pad.stream_rec_lock.p)->__data.__owner
$31 = 6368
(gdb) thread find 6321.6368
Thread 21 has target id 'Thread 6321.6368'

Trace function calls (poor developer version)

If you’re using C++, you can define a tracer class. This example is for WebKit, but you get the idea:

#define MYTRACER() MyTracer(__PRETTY_FUNCTION__);
class MyTracer {
public:
    MyTracer(const gchar* functionName)
      : m_functionName(functionName) {
      printf("### %s : begin %d\n", m_functionName.utf8().data(), currentThread()); fflush(stdout);
    }
    virtual ~MyTracer() {
        printf("### %s : end %d\n", m_functionName.utf8().data(), currentThread()); fflush(stdout);
    }
private:
    String m_functionName;
};

And use it like this in all the functions you want to trace:

void somefunction() {
  MYTRACER();
  // Some other code...
}

The constructor will log when the execution flow enters into the function and the destructor will log when the flow exits.

Setting breakpoints from C

In the C code, just call raise(SIGINT) (it simulates Ctrl+C; normally the program would terminate).

And then, in a previously attached gdb, after breaking and debugging all you needed, just continue the execution by ignoring the signal or plainly continuing:

(gdb) signal 0
(gdb) continue

There’s a way to do the same but attaching gdb after the raise. Use raise(SIGSTOP) instead (simulate CTRL+Z). Then attach gdb, locate the thread calling raise and switch to it:

(gdb) thread apply all bt
[now search for "raise" in the terminal log]
Thread 36 (Thread 1977.2033): #1 0x74f5b3f2 in raise () from /home/enrique/buildroot/output2/staging/lib/libpthread.so.0
(gdb) thread 36

Now, from a terminal, send a continuation signal: kill -SIGCONT 1977. Finally instruct gdb to single-step only the current thread (IMPORTANT!) and run some steps until all the raises have been processed:

(gdb) set scheduler-locking on
(gdb) next    // Repeat several times...

Know the name of a GStreamer function stored in a pointer at runtime

Just use this macro:

GST_DEBUG_FUNCPTR_NAME(func)

Detecting memory leaks in WebKit

RefCountedLeakCounter is a tool class that can help debug reference leaks by printing this kind of message when WebKit exits:

  LEAK: 2 XMLHttpRequest
  LEAK: 25 CachedResource
  LEAK: 3820 WebCoreNode

To use it you have to modify the particular class you want to debug:

  • Include wtf/RefCountedLeakCounter.h
  • DEFINE_DEBUG_ONLY_GLOBAL(WTF::RefCountedLeakCounter, myClassCounter, ("MyClass"));
  • In the constructor: myClassCounter.increment()
  • In the destructor: myClassCounter.decrement()
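A rough standalone sketch of the same pattern (a simplified stand-in for illustration, not the real WTF::RefCountedLeakCounter class or the DEFINE_DEBUG_ONLY_GLOBAL machinery):

```cpp
#include <cstdio>

// Count constructions and destructions of a class, and report any
// imbalance when the counter itself is destroyed at program exit.
struct LeakCounter {
    const char* name;
    int count = 0;
    void increment() { ++count; }
    void decrement() { --count; }
    ~LeakCounter() {
        if (count)
            std::printf("LEAK: %d %s\n", count, name); // e.g. "LEAK: 2 MyClass"
    }
};

// One global counter per instrumented class:
static LeakCounter myClassCounter{"MyClass"};

struct MyClass {
    MyClass() { myClassCounter.increment(); }   // as in the real constructor
    ~MyClass() { myClassCounter.decrement(); }  // ...and destructor
};
```

Any MyClass instance that is never destroyed leaves the counter above zero, so the destructor of the global counter prints a LEAK line at exit.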

by eocanha at May 04, 2021 06:00 AM

April 30, 2021

Brian Kardell

ES Temporal: 2 Minute Standards


You might have heard that Temporal has recently reached Stage 3 in ECMA; here's a #StandardsIn2Min explanation of it...

Since its beginnings, JavaScript has had only a rudimentary Date object. It copied Date from an early edition of Java, and it was intended to be used for programming just about anything relating to time. While Java itself quickly deprecated its version and improved the situation, JavaScript implementations didn't follow suit. JavaScript also introduced its own warts and quirks to Date along the way. As a result, JavaScript libraries dedicated to reasoning about the complexities of time, like moment.js, became common and essential.

Temporal, now at stage 3 as of this writing, is the result of a lot of work initiated by maintainers of those projects and shepherded through the standards process. It introduces lots of rich APIs to the JavaScript standard library, hosted by a new top-level Temporal object. Large top-level introductions of this sort (like Temporal, Math and Intl) are exceedingly rare.

Temporal provides a number of objects, all immutable and serializable, and each with their own methods for reasoning about time in different ways.

It contains some fundamental concepts which implement standards for calendar systems and timezones respectively, and set the foundations for how a lot of methods do their work:

  • Temporal.Calendar
  • Temporal.TimeZone

It introduces Temporal.Instant which is used for dealing with an instant in time to various degrees of precision.

It also introduces a number of "plain" themed objects geared toward providing APIs for the different ways we think about time, not simply in terms of different kinds of precision. For example...

  • Temporal.PlainTime is for dealing with wall-clock time that is not associated with a particular date or time zone.
  • Temporal.PlainMonthDay is for a date without a year component, useful for annual events like "The fourth of July" or "Christmas Day".
  • Temporal.PlainDateTime is for representing a calendar date and wall-clock time that does not carry time zone information, e.g. December 7th, 1995 at 3:00 PM (in the Gregorian calendar).
  • Temporal.PlainYearMonth is useful for expressing things like "The October 2020 edition of Vanity Fair".

It also includes...

  • Temporal.ZonedDateTime is for reasoning about dates and times in a particular time zone, as reckoned by a particular calendar
  • Temporal.Duration is used for measuring the duration between two temporal objects.
  • Temporal.now provides APIs about the current moment in time.
A diagram illustrating the different types, relationships and concepts described that make up Temporal

You can learn a lot more by following the links in the Temporal proposal repo, including helpful Reference documentation and examples, and a cookbook to help you get started and learn the ins and outs of Temporal. Every page includes a (not production ready) polyfill, so you can open Dev Tools and explore for yourself.

If you're interested in hearing about the history, development, challenges, inner workings or rationale behind any of this, I recently hosted an edition of our podcast on this topic with guests who worked on the standard.

April 30, 2021 04:00 AM

April 29, 2021

Ricardo García

Vulkan Ray Tracing Resources and Overview

As you may know, I’ve been working on VK-GL-CTS for some time now. VK-GL-CTS is the Conformance Test Suite for Vulkan and OpenGL, a large collection of tests used to verify that implementations of the Vulkan and OpenGL APIs work as intended by the specification. My work has been mainly focused on the Vulkan side of things as part of Igalia's ongoing collaboration with Valve.

Last year, Khronos released the official specification of the Vulkan ray tracing extensions and I had the chance to participate in the final stages of the process by improving test coverage and fixing bugs in existing CTS tests, which is work that continues to this day mixed with other types of tasks in my backlog.

As part of this effort I learned many bits of the new Vulkan Ray Tracing API and even provided some very minor feedback about the spec, which resulted in me being listed as contributor to the VK_KHR_acceleration_structure extension.

Now that the waters are a bit more calm, I wanted to give you a list of resources and a small overview of the main concepts behind the Vulkan version of ray tracing.

General Overview

There are a few basic resources that can help you get acquainted with the new APIs.

  1. The official Khronos blog published an overview of the ray tracing extensions that explains some of the basic concepts like acceleration structures, ray tracing pipelines (and what their different shader stages do) and ray queries.

  2. Intel’s Jason Ekstrand gave an excellent talk about ray tracing in Vulkan at XDC 2020. I highly recommend watching it if you’re interested.

  3. For those wanting to get their hands on some code, the Khronos official Vulkan Samples repository includes a basic ray tracing sample.

  4. The official Vulkan specification text (warning: very large HTML document), while intimidating, is actually a good source to learn many new parts of the API. If you’re already familiar with Vulkan, the different sections about ray tracing and ray tracing pipelines are worth reading.

Acceleration Structures

The basic idea of ray tracing, as a tool, is that you must be able to choose an arbitrary point in space as the ray origin and a direction vector, and ask your implementation if that ray intersects anything along the way given a minimum and maximum distance.

In a modern computer or console game the number of triangles present in a scene is huge, so, as you can imagine, detecting intersections between them and your ray can be very expensive. The implementation typically needs to organize the scene geometry in a hierarchical tree-like structure that can be traversed more efficiently by discarding large amounts of geometry with some simple tests. That’s what an Acceleration Structure is.
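Those "simple tests" are typically ray-versus-bounding-box intersections: if a ray misses a box, everything inside it can be skipped at once. A minimal sketch of the classic slab test (illustrative only, not tied to any particular implementation):

```cpp
#include <algorithm>

// Minimal axis-aligned bounding box.
struct AABB { float min[3], max[3]; };

// Classic "slab" test: does a ray starting at `orig` with direction `dir`
// intersect the box anywhere in the distance range [tMin, tMax]?
// If a box is missed, everything inside it (child boxes, triangles)
// can be discarded without testing each triangle individually.
bool rayHitsBox(const float orig[3], const float dir[3],
                const AABB& box, float tMin, float tMax) {
    for (int i = 0; i < 3; ++i) {
        float inv = 1.0f / dir[i]; // assumes dir[i] != 0 for simplicity
        float t0 = (box.min[i] - orig[i]) * inv;
        float t1 = (box.max[i] - orig[i]) * inv;
        if (inv < 0.0f) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMax < tMin) return false; // slab intervals don't overlap: miss
    }
    return true;
}
```

A tree of nested boxes turns a linear scan over millions of triangles into a handful of tests like this per level.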

Fortunately, you don’t have to organize the scene geometry yourself. Implementations are free to choose the best and most suitable acceleration structure format according to the underlying hardware. They will build this acceleration structure for you and give you an opaque handle to it that you can use in your app with the rest of the API. You’re only required to provide the long list of geometries making up your scene.

You may be thinking, and you’d be right, that building the acceleration structure must be a complex and costly process itself, and it is. For this reason, you must try to avoid rebuilding them completely all the time, in every frame of the app. This is why acceleration structures are divided into two types: bottom level and top level.

Bottom level acceleration structures (BLAS) contain lists of geometries and typically represent whole models in your scene: a building, a tree, an object, etc.

Top level acceleration structures (TLAS) contain lists of “pointers” to bottom level acceleration structures, together with a transformation matrix for each pointer.

In the diagram below, taken from Jason Ekstrand’s XDC 2020 talk[1], you can see the blue square representing the TLAS, the red squares representing BLAS and the purple squares representing geometries.

Picture showing a hand-drawn cowboy, cactus and cow. A blue square surrounds the whole picture. Orange squares surround the cowboy, cactus and cow. Individual pieces of the cowboy, cactus and cow are surrounded by purple squares.

The whole idea behind this is that you may be able to build the bottom level acceleration structure for each model only once, as long as the model itself does not change, and include that model in your scene one or more times. Each instance will have an associated transformation matrix that allows you to translate, rotate or scale the model without rebuilding it. So, in each frame, you may only have to rebuild the top level acceleration structure while keeping the bottom level ones intact. Other tricks you can use include rebuilding the top level acceleration structure at a reduced rate compared to the app’s frame rate, or using a simplified version of the world geometry when tracing rays instead of the more detailed model used when rendering the scene normally.

Acceleration structures, ray origins and direction vectors typically use world-space coordinates.
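The instancing idea can be sketched like this (all names and layouts here are made up for illustration and do not match the real Vulkan structures):

```cpp
#include <vector>

// A BLAS is built once per model; each TLAS instance just points at a
// BLAS and carries its own transformation matrix.
struct Blas {
    // Opaque handle to a bottom level acceleration structure.
    unsigned long long handle;
};

struct TlasInstance {
    const Blas* blas;    // "pointer" to a bottom level structure
    float transform[12]; // 3x4 matrix: rotation/scale plus translation
};

// Rebuilding only this list of instances each frame is cheap compared to
// rebuilding the BLAS geometry itself.
std::vector<TlasInstance> buildTlasInstances(const Blas& tree, const Blas& cow) {
    return {
        {&tree, {1, 0, 0, 0,   0, 1, 0, 0,  0, 0, 1, 0}}, // tree at the origin
        {&tree, {1, 0, 0, 10,  0, 1, 0, 0,  0, 0, 1, 0}}, // same tree, moved +10 in x
        {&cow,  {2, 0, 0, 0,   0, 2, 0, 0,  0, 0, 2, 0}}, // cow, scaled 2x
    };
}
```

Note how the same tree BLAS appears twice with different transforms without being rebuilt.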

Ray Queries

In its most basic form, you can access the ray tracing facilities of the implementation by using ray queries. Before ray tracing, Vulkan already had graphics and compute pipelines. One of the main components of those pipelines are shader programs: application-provided instructions that run on the GPU telling it what to do and, in a graphics pipeline, how to process geometry data (vertex shaders) and calculate the color of each pixel that ends up on the screen (fragment shaders).

When ray queries are supported, you can trace rays from those “classic” shader programs for any purpose. For example, to implement lighting effects in a fragment shader.

Ray Tracing Pipelines

The full power of ray tracing in Vulkan comes in the form of a completely new type of pipeline, the ray tracing pipeline, that complements the existing compute and graphics pipelines.

Most Vulkan ray tracing tutorials, including the Khronos blog post I mentioned before, explain the basics of these pipelines, including the new shader stages (ray generation, intersection, any hit, closest hit, etc) and how they work together. They cover acceleration structure traversal for each ray and how that triggers execution of a particular shader program provided by your app. The image below, taken from the official Vulkan specification[2], contains the typical representation of this traversal process.

Ray Tracing Acceleration Structure traversal diagram showing the ray generation shader initiating the traversal procedure, the miss shader called when the ray does not intersect any geometry and the intersection, any hit and closest hit shaders called when an intersection is found

The main difference between traditional graphics pipelines and ray tracing pipelines is the following. If you’re familiar with the classic graphics pipelines, you know the app decides and has full control over what is being drawn at any moment. Your command stream usually looks like this.

  1. Begin render pass (I’ll be using this depth buffer to discard overlapping geometry on the screen and the resulting pixels need to be written to this image)

  2. Bind descriptor sets (I’ll be using these textures and data buffers)

  3. Bind pipeline (This is what the whole process looks like, including the crucial part of shader programs: let me tell you what to do with each vertex and how to calculate the color of each resulting pixel)

  4. Draw this

  5. Draw that

  6. Bind pipeline (I’ll be using different shader programs for the next draws, thank you)

  7. Draw some more

  8. Draw even more

  9. Bind descriptor sets (The textures and other data will be different from now on)

  10. Bind pipeline (The shaders will be different too)

  11. Additional draws

  12. Final draws (Almost there, buddy)

  13. End render pass (I’m done)

Each draw command in the command stream instructs the GPU to draw an object and, because the app is recording that command, the app knows what that object is and the appropriate resources that need to be used to draw that object, including textures, data buffers and shader programs. Before recording the draw command, the app can prepare everything in advance and tell the implementation which shaders and resources will be used with the draw command.

In a ray tracing pipeline, the scene geometry is organized in an acceleration structure. When tracing a ray, you don’t know, in advance, which geometry it’s going to intersect. Each geometry may need a particular set of resources and even the shader programs may need to change with each geometry or geometry type.

Shader Binding Table

For this reason, ray tracing APIs need you to create a Shader Binding Table or SBT for short. SBTs represent (potentially) large arrays of shaders organized in shader groups, where each shader group has a handle that sits in a particular position in the array. The implementation will access this table, for example, when the ray hits a particular piece of geometry. The index it will use to access this table or array will depend on several parameters. Some of them come from the ray tracing command call in a ray generation shader, and others come from the index of the geometry and instance data in the acceleration structure.

There’s a formula to calculate that index and, while it’s not very complex, it will determine the way you must organize your shader binding table so it matches your acceleration structure, which can be a bit of a headache if you’re new to the process.
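As a sketch of that formula (my reading of the specification; parameter names are simplified here): the record offset and stride come from the trace call in the ray generation shader, the geometry index is the geometry's position inside its BLAS, and the instance offset comes from the TLAS instance data.

```cpp
#include <cstdint>

// Index of the hit group record inside the hit section of the SBT.
uint32_t hitGroupRecordIndex(uint32_t sbtRecordOffset,    // from the trace call
                             uint32_t sbtRecordStride,    // from the trace call
                             uint32_t geometryIndex,      // position inside the BLAS
                             uint32_t instanceSbtOffset)  // from the TLAS instance
{
    return sbtRecordOffset + sbtRecordStride * geometryIndex + instanceSbtOffset;
}
```

The implementation then scales this index by the table stride and adds the table's base address to find the shader group handle (and any shader record data stored after it).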

I highly recommend taking a look at Will Usher’s Shader Binding Table Tutorial, which includes an interactive SBT builder tool that will let you get an idea of how things work and fit together.

The Shader Binding Table is complemented in Vulkan by a Shader Record Buffer. The concept is that entries in the Shader Binding Table don’t have a fixed size that merely corresponds to the size of a shader group handle identifying what to run when the ray hits that particular piece of geometry. Instead, each table entry can be a bit larger and you can put arbitrary data after each handle. That data block is called the Shader Record Buffer, and can be accessed from shader programs when they run. They may be used, for example, to store indices to resources and other data needed to draw that particular piece of geometry, so the shaders themselves don’t have to be completely unique per geometry and can be reused more easily.

Conclusion

As you can see, ray tracing can be more complex than usual but it’s a very powerful tool. I hope the basic explanations and resources I linked above help you get to know it better. Happy hacking!

Notes

[1] The Acceleration Structure representation image with the cowboy, cactus and cow is © 2020 Jason Ekstrand and licensed under the terms of CC-BY.

[2] The Acceleration Structure traversal diagram in a ray tracing pipeline is © 2020 The Khronos Group and released under the terms of CC-BY.

April 29, 2021 07:48 PM

April 27, 2021

Enrique Ocaña

GStreamer WebKit debugging by instrumenting source code (1/3)

This is the continuation of the GStreamer WebKit debugging tricks post series. In the next three posts, I’ll focus on what we can get by doing some little changes to the source code for debugging purposes (known as “instrumenting”), but before, you might want to check the previous posts of the series:

Know all the env vars read by a program by using LD_PRELOAD to intercept libc calls

// File getenv.c
// To compile: gcc -shared -Wall -fPIC -o getenv.so getenv.c -ldl
// To use: export LD_PRELOAD="./getenv.so", then run any program you want
// See http://www.catonmat.net/blog/simple-ld-preload-tutorial-part-2/

#define _GNU_SOURCE

#include <stdio.h>
#include <dlfcn.h>

// This function will take the place of the original getenv() in libc
char *getenv(const char *name) {
 printf("Calling getenv(\"%s\")\n", name);

 char *(*original_getenv)(const char*);
 original_getenv = dlsym(RTLD_NEXT, "getenv");

 return (*original_getenv)(name);
}

See the breakpoints with command example to know how to get the same using gdb. Check also Zan’s libpine for more features.

Track lifetime of GObjects by LD_PRELOADing gobject-list

The gobject-list project, written by Thibault Saunier, is a simple LD_PRELOAD library for tracking the lifetime of GObjects. When loaded into an application, it prints a list of living GObjects on exiting the application (unless the application crashes), and also prints reference count data when it changes. SIGUSR1 or SIGUSR2 can be sent to the application to trigger printing of more information.

Overriding the behaviour of a debugging macro

The usual debugging macros aren’t printing messages? Redefine them to do what you want:

#undef LOG_MEDIA_MESSAGE
#define LOG_MEDIA_MESSAGE(...) do { \
  printf("LOG %s: ", __PRETTY_FUNCTION__); \
  printf(__VA_ARGS__); \
  printf("\n"); \
  fflush(stdout); \
} while(0)

This can be done to enable asserts on demand in WebKit too:

#undef ASSERT
#define ASSERT(assertion) \
  (!(assertion) ? \
      (WTFReportAssertionFailure(__FILE__, __LINE__, WTF_PRETTY_FUNCTION, #assertion), \
       CRASH()) : \
      (void)0)

#undef ASSERT_NOT_REACHED
#define ASSERT_NOT_REACHED() do { \
  WTFReportAssertionFailure(__FILE__, __LINE__, WTF_PRETTY_FUNCTION, 0); \
  CRASH(); \
} while (0)

It may be interesting to enable WebKit LOG() and GStreamer GST_DEBUG() macros only on selected files:

#define LOG(channel, msg, ...) do { \
  printf("%s: ", #channel); \
  printf(msg, ## __VA_ARGS__); \
  printf("\n"); \
  fflush(stdout); \
} while (false)

#define _GST_DEBUG(msg, ...) do { \
  printf("### %s: ", __PRETTY_FUNCTION__); \
  printf(msg, ## __VA_ARGS__); \
  printf("\n"); \
  fflush(stdout); \
} while (false)

Note all the preprocessor trickery used here:

  • The first arguments (channel, msg) are captured independently
  • The remaining args are captured in __VA_ARGS__
  • do while(false) is a trick to avoid {braces} and make the code block work when used in if/then/else one-liners
  • #channel expands LOG(MyChannel,....) as printf("%s: ", "MyChannel"). It’s called “stringification”.
  • ## __VA_ARGS__ expands the variable argument list as a comma-separated list of items, but if the list is empty, it eats the comma after “msg”, preventing syntax errors

Print the compile-time type of an expression

Use typeid(<expression>).name(). Filter the output through c++filt -t:

std::vector<char *> v; 
printf("Type: %s\n", typeid(v.begin()).name());

Abusing the compiler to know all the places where a function is called

If you want to know all the places from where the GstClockTime toGstClockTime(float time) function is called, you can convert it to a template function and use static_assert on a wrong datatype like this (in the .h):

template <typename T = float> GstClockTime toGstClockTime(float time) { 
  static_assert(std::is_integral<T>::value,
    "Don't call toGstClockTime(float)!");
  return 0;
}

Note that T=float is not an integral type (is_integral), so the static_assert always fails. This has nothing to do with the float time parameter declaration.

You will get compile-time errors like this on every place the function is used:

WebKitMediaSourceGStreamer.cpp:474:87:   required from here
GStreamerUtilities.h:84:43: error: static assertion failed: Don't call toGstClockTime(float)!

Use pragma message to print values at compile time

Sometimes it’s useful to know whether a particular define is enabled:

#include <limits.h>

#define _STR(x) #x
#define STR(x) _STR(x)

#pragma message "Int max is " STR(INT_MAX)

#ifdef WHATEVER
#pragma message "Compilation goes by here"
#else
#pragma message "Compilation goes by there"
#endif

...

The code above would generate this output:

test.c:6:9: note: #pragma message: Int max is 0x7fffffff
 #pragma message "Int max is " STR(INT_MAX)
         ^~~~~~~
test.c:11:9: note: #pragma message: Compilation goes by there
 #pragma message "Compilation goes by there"
         ^~~~~~~

by eocanha at April 27, 2021 06:00 AM

April 25, 2021

Brian Kardell

Design Affordance Controls


We often use a single word (the actual word varies) to discuss "controls" . This, I believe, carries with it a fundamental assumption that they are all somehow "the same," and that has shaped how we think about them. In this post I'll explain why I have recently come to think that perhaps they aren't really quite the same at all. I'll also suggest that some additional terminology which could describe a few broad classes of controls, could help us better discuss (and perhaps shape) them.

What is a "control", exactly?

On its face, it seems like kind of an absurdly simple question, right?

However, consider this: Many UI toolkits outside the web have, at some point, explicitly defined some kind of control which could be described as "a container with scrollbars". In a way, this makes sense: When something is scrollable, there are several UI implications:

  • They paint scroll bars and mouse/touch affordances.
  • They become part of the sequential focus order.
  • Standard keyboard control affordances for managing the scroll are added.
  • There are events and UI states to track.
  • They might gain an affordance to become user-sizable.
  • ...and so on.

On the web, we don't look at the problem that way. There is no special "scroll container" element, nor even an ARIA role for it. Instead, elements are, first, just "Good Semantic Content", and whether they are scrollable (or not) is considered a matter of, and subject to, the design. Overflow, and affordances related to it, are decidedly presentational.

This is pretty interesting because, for example, authors can choose when a <section> should be "component-like" with affordances or not - they vary in pursuit of good design.

Form controls, on the other hand, are definitely not like that at all. Their entire nature is to always be a particular control. As we look to introducing new "controls" in HTML, I think it could be helpful to think about this sort of distinction.

"Design Affordance Controls"

I would like to argue that there are several other "common controls" (collapsible content, tabs and accordions are some examples) which have more in common with scrollable areas than they do with form controls, and that this might be worth careful consideration.

While these components aren't simply about "overflow" (they manage actual hiddenness and have ARIA roles), they do seem to share a lot of other qualities with scroll containers:

  • They can be (and should be, I will argue) thought of, first and foremost, as natural, meaningful document content.
  • Whether or not these elements are "control-like" is something authors should decide, subject to the design, in order to provide helpful affordances that let users consume the content more easily.
  • Their interaction is not primary to the control, but secondary.

I label these here as "Design Affordance Controls". To consider why this matters: HTML has <details> and <summary> elements which provide collapsible content and I will use them as a point of comparison.

What about print?

When you print a page containing input collection controls, what prints is the control, in its current state. This is entirely unsurprising. What else could it do, really?

However, is this uniformly desirable? I would suggest it is not; in fact, it is not universally true of any of the things I have labelled "Design Affordance Controls". Just as with scrolling, there isn't a simple "Yes, always" or "No, never" answer to whether they should be content-like or control-like. It is reasonable for an author to decide either way.

To illustrate: Imagine that I built a site about recipes. It has sections about the 'ingredients', 'instructions' and 'dietary information'. I might like those to display on someone's screen as collapsible sections of the sort provided by <details> and <summary>, in order to provide a nicer design and a set of affordances for a user to consume the content more easily.

Recipes displayed with disclosure sections

But really, at the end of the day, it's just that: A convenience of design affordance. As an author, in this case, I'd like it to print with all of the content, sans controls.

Recipes as simple sections

With <details> and <summary>, this is not an option. Their control-ness is hard-wired and fundamental. This feels like a mistake.

Interchangeability?

Unlike the collection of input, which wants to describe a single "right shape", there isn't really a "right" answer as to which of the things I have labelled Design Affordance Controls we should use. In the above example, one could easily and reasonably swap in several (maybe any) of them. Tabs, for example, are also a reasonable choice.

Recipes displayed with sections as tabs

Because this is all in pursuit of design, it might even be desirable to change our minds! "Responsive Tabs", which allow a control to be presented as either "tab-like" or "accordion-like" based on design constraints, aren't uncommon, and are an example of just this. Their existence helps illustrate that this observation is relevant.

(There is a Note about "browser tabs" later)

UnControling?

Even outside of print, as an author, the answer to whether I'd like something to even provide <details> and <summary> affordances varies based on the design. This website, in fact, is an example.

If your browser window is big enough, the left side of this website (as of this writing, at least) is a bunch of information about me. It's not, very probably, why you're here. You didn't come to learn about me. You want to read the <main> content. But, that's not a design problem either... In fact, I think that showing you both and making it easy for you to get that information without being overly distracting is a feature of the design.

However, in a smaller viewport, this would become really inconvenient for a user. It would involve scrolling through a lot of "noise" in order to get to the actual content. The "easy" solution might be to just hide it. However, as I said, making it easy to find that information is a feature. So, on a small screen, I choose to put a disclosure widget that is collapsed by default with 'author information' in the summary.

Almost every website on earth, it seems, has some "spiritually similar" idea (hamburger menus, for example). Cool.

Except... wait... is it? That's not how the API surface of <details> and <summary> works. At all. I believe that's because of how we approached its design. I can't think of any precedent of a control that becomes... not a control. I can, however, point to plenty of examples of "regular things" that can potentially gain affordances.

Enhancing vs... Unenhancing?

Just as scrollbars "enhance" regular content with affordances, Progressive Enhancement does something similar. That's useful to think about.

Assuming that content should be meaningful and non-interactive to start is a good exercise, generally. Any of a myriad of problems can get in the way, and if markup and styling imply something is interactive but it doesn't wind up being so, users are left in one of two bad states:

  • (ideally) a UI which seems to imply that a section could be collapsed, but frustratingly won't seem to work.
  • (much worse) a UI which seems to imply there is content (which there is) that the user should be able to expand, but frustratingly cannot.

This exercise is also potentially helpful in designing a new control like this for HTML itself. Until new controls are supported by every browser (and historically, that takes a long time), the situation is not dissimilar.

A more robust solution involves enhancing otherwise good and meaningful content (as you see in the print version above) with affordances, if that is both possible and desirable. In fact, this is precisely what several of the original PE examples/essays did. The essence of the element didn't change, only the affordances inside it did, and only if they could. They were perfectly good on their own, and were careful to avoid the above kind of situation.

But <details> and <summary> were not that. They were just unknown elements (effectively, <span>s). They ran together stylistically, and they had no useful meaning to assistive technology either. If we had thought of these as "Design Affordances" which could appear on a <section>, how much better would that have been in the interim?

Clues in Linking and Searching?

Pages allow people to share links to anchors. Find-in-page allows users to find content in the page by searching. Today, both are a problem for <details> and <summary>, again in part because of how 'control-ness' is fundamentally baked into them. Again, this feels like a mistake, and these are pain points that we're working on - but it's interesting to realize that the same would be the case for other controls of this class if we continue to think of them this way.

I feel like these offer further evidence that their UI affordances are necessarily secondary, not fundamental.

A Quick Note on "Browser Tabs"

Lots of interfaces that we use have things we'd all refer to in polite conversation as "Tabs". These are used as examples that confound a lot of conversations and take them in many directions. It's never the case, for example, that one would "print all the open tabs" or "view them all in a sequence". In fact, one could reply to many of the things I've written here with "...but, browser tabs...".

It's important to note that while we refer to all of these things as "tabs", most UI toolkits have entirely separate classes of "tab-like" things with distinct names. The fundamental distinction between them is that one of those kinds of tabs (the kind in browsers, chat clients, editors and so on) are actually managing windows, and the other kind is managing panels of content inside a window. On the web, our model is documents. But, there is an easy-to-imagine parallel with embedded documents here. That is, one could perhaps make tabs out of <section>s, or perhaps out of <iframe>s. These would have roughly the same kinds of boundaries as toolkits, but (even from a user's POV) actually different expectations on several of the things noted in this post.

Conclusions?

This post isn't actually intended to present a "solution" as much as it is to provide food for thought about how we shape the conversations as we design new things for HTML. Perhaps even toward helping shape answers to some long open questions. Even something as simple as "what is an accordion?" remains, believe it or not, just a bit elusive in some important ways that even the ARIA Practices Guide (APG) itself has struggled with. I hope that this line of thinking and discussion is helpful to that struggle.

In truth, it's probably not even this cut and dried. As I said, these controls aren't exactly like scrollbars either, and I'm definitely not suggesting we should just match that model. It's tempting to think of something like input's type attribute here - but it's not quite that either (which is good, because that is rife with issues -- see Monica's Input I <3 you, but you're bringing me down. for a rough sense of them).

Instead, it seems there is very probably a small spectrum of "classes of controls and concerns" here that is worth thinking about carefully as we design new controls. But we can't really do that unless we start talking about it somewhere. Starting that conversation is my hope here.

I'm currently working with lots of folks in OpenUI on trying to introduce some of these controls, and I'm hopeful we can incorporate these thoughts into our discussions. While there is a lot more to be fleshed out, there is a version of this that I can imagine that very closely resembles a proposal from Ian Hickson from long ago. Ian's proposal provided a wrapper element around otherwise good, sectioned content and would have offered minimal affordances (like grouping the headings as "tabs") and state management. There is a lot that appeals to me about that kind of approach, and it could play well here. A few of us (myself, Dave Rupert, Pascal Schilp, Jonathan Neal, Miriam Suzanne, Zach Leatherman, Greg Whitworth, Nicole Sullivan) have also been discussing bits of this. Some of us have also been working on creating a custom element along similar lines which can be plain old sections, "tab-like" or "accordion-like". This broke out from collaborative work that began with several of us attempting to align our ideas on Pascal Schilp's Generic Tabs repository (that isn't the component, but it already has some nice qualities). I'm looking forward to sharing our component and more details soon.

Honestly, I'd love to hear your thoughts. I feel like this is an area ripe for serious R&D, study and discussion.

Special thanks to several friends for proofing/commenting on this as it developed: Alice Boxhall, Jonathan Neal, Miriam Suzanne, Eric Meyer, Dave Rupert. Thanks don't imply some kind of endorsement.

April 25, 2021 04:00 AM

April 22, 2021

Alex Surkov

Hidden Gem of Chromium Accessibility

Low-level accessibility tools

There’s a hidden gem in Chromium accessibility. It is the command line accessibility tools. Similar to accessibility inspection tools available on all popular platforms such as Windows or OSX, these tools are designed for the very same purpose: they provide a low level access to accessibility features of a web site. They also have a superpower though that makes them mighty and unique, but let’s talk about it later. Let’s do some backgrounds for the start.

An accessible website is one that assistive technologies like screen readers can see, perceive and operate on. Not every website is accessible: if website content can't be expressed in a way the assistive technologies understand, then the website is inaccessible. It's like translating the website into the assistive technologies' language: if it cannot be done, then you are out of luck.

Each platform has its own accessibility API, the assistive technologies rely on those to work with websites. In particular it means the website content should be designed in such a way that it can be mapped to those APIs. So if you have an accessibility issue on a website, then quite likely it indicates the content is not properly mapped to the accessibility APIs.

If a website has an accessibility bug, then it's often a good idea to inspect how the website is mapped to the APIs. Each platform has its own set of accessibility tools: Linux has Accerciser, OSX has Accessibility Inspector, Windows has Inspect. Each browser also ships developer tools which incorporate functions for website accessibility inspection. They are all designed to help web authors better understand how screen readers see a web site. They represent website content in a quite techy way, in terms of APIs, objects and their properties, but they may give you a clue why accessibility doesn't work as expected. It's definitely worth a shot.

Let’s stop from burrowing further into tech details of how accessibility works. It should be good enough for the context to move back to the original topic. In case if you want to dive deeper, then you can check out this podcast where igalians discuss tech accessibility.

So what makes Chromium accessibility tools special?

  • First, they are cross-platform. You have a single tool for all major platforms: Windows, OSX and Linux. It is handy and makes the learning curve shallow.
  • Second, these tools are open source: you know what happens under the hood, and you can always add a new feature if one is missing.
  • And finally, these are CLI tools and can be used as building blocks to create other accessibility inspection applications. This is my favourite part. I think this last point captures the spirit of the Unix world quite nicely, but let's talk about it later.

Chromium accessibility ships with two tools:

  • ax_dump_tree, which dumps the accessible tree of an application, making it quite similar to the well-known inspection tools mentioned above, and
  • ax_dump_events, which logs all accessibility events coming from an application.

Being able to inspect an accessible tree, which shows the hierarchy of accessible elements and includes all accessibility properties like name and description, is quite a handy feature when you dig into accessibility bugs.

Accessibility event logging can also be super helpful. It's not something typically supported out of the box by existing accessibility tools, but it is a super important bit of information. Indeed, events are the primary notification mechanism used by the assistive technologies to pick up changes from a website. If events are broken, then the assistive technologies get broken too. This makes the ax_dump_events tool special, and it is another good reason to give it a try next time :)

These tools support a bunch of pre-defined selectors to grab a tree or record events in browsers in no time. For example:

  • --chrome and --chromium for Chrome and Chromium
  • --firefox for Firefox
  • --edge for Edge
  • --safari for Safari

For example, you do:

ax_dump_tree --edge

to dump the accessible tree of Microsoft Edge browser on Windows.

Same with events. To start recording events for Safari browser you do:

ax_dump_events --safari

If you need to scope the tool to a web site (as opposed to the whole browser), then you have the --active-tab selector. In case of multiple windows of the same application, you can tune your search with the --pattern option. For non-browser applications you can specify a process ID via the --pid option. See the docs for full documentation.

Let’s recapture some of the above. You’d want to use these tools when:

  • You need to inspect an accessible tree for a website/test case, or when you need to check which accessibility events are fired: sometimes it’s important to understand how the assistive technologies see your website (Say “hi” to web authors);
  • You need to compare accessible trees/events between browsers. It’s helpful when you need to figure out why a screen reader works in one browser and doesn’t work in another (say “hi” to browser developers);
  • It’s useful to find implementation gaps and inconsistencies (say “hi” to spec authors).

There is no official place for the tools yet. You can build them yourself from the Chromium sources. I have also prepared nightly builds and uploaded them to my Google Drive for your convenience.

The tools are awesome: fast, reliable and easy to use. They are perfect except … they have no GUI :) All console tools share this flaw, by design. Indeed, a CLI lacks certain useful features inherent to GUI tools, for example click-to-inspect, which allows you to select a DOM element and inspect its accessible tree. This is very handy. You can find the feature in literally any accessibility tool, and not having it is a real bummer.

But the no-GUI disadvantage can become a benefit: the tools can easily be incorporated into other accessibility tools as an accessibility inspection engine. You can create a cross-platform GUI tool backed, for example, by Electron, and powered by Chromium's ax_dump tools. Indeed, you don't have to bring C++ or Python into web application accessibility: you can rely on the ax_dump tools' output instead. Unsurprisingly, the Chromium developer tools are based on these very same CLI tools.

And last but not least: it opens a new world for cross-browser testing. Now you should be able to automate the platform accessibility mapping, for example HTML-AAM, and it might be an easier solution than the Accessible Testing Protocol. Needless to say, Chromium already uses these tools in its automated test suite.

This is the beauty of the Unix world: you have basic but powerful command-line tools which can serve as an engine to build anything you want upon them: GUI accessibility tools, automated testing, or you can use them as is, because they are just cool.

by Alexander Surkov at April 22, 2021 07:22 PM

Imanol Fernandez

WebXR landing in WebKit

Since I joined Igalia, I have been working on finishing up the core WebXR implementation in WebKit, focused on the DOM, render loop, graphics and input sources. We are targeting the OpenXR API and we have reached the point where we are able to run some demos, so it is a good time to share a summary of the work that we have done so far. You can check all the patches and the code at the WebKit WebXR module.

April 22, 2021 10:45 AM

April 21, 2021

Víctor Jáquez

Review of Igalia Multimedia activities (2020/H2)

As the first quarter of 2021 has already come to a close, we reckon it's time to recap our achievements from the second half of 2020, and update you on the improvements we have been making to the multimedia experience on the Web and Linux in general.

Our previous reports:

WPE / WebKitGTK

We have closed ~100 issues related to multimedia in WebKitGTK/WPE: we fixed seeking issues during playback, plugged memory leaks, gardened tests, improved the Flatpak-based development workflow, enabled new codecs, etc. Overall, we improved the multimedia user experience a bit on these WebKit engine ports.

To highlight a couple of tasks: we did some maintenance work on the WebAudio backends, and we upstreamed an internal audio mixer that keeps only one connection to the audio server (such as PulseAudio) instead of one connection per audio resource, combining all streams into that single connection.

Adaptive media streaming for the Web (MSE)

We have been working on a new MSE backend for a while, but along the way many related bugs have appeared and been squashed, and many code cleanups have been carried out. Though it has been like yak shaving, we are confident that we will reach the end of this long and winding road soonish.

DRM media playback for the Web (EME)

Regarding playback of digitally protected media, we worked to upstream OpenCDM support, with Widevine, through RDK's Thunder framework, while continuing the usual maintenance of the other key systems, such as Clear Key, Widevine and PlayReady.

For more details we published a blog post: Serious Encrypted Media Extensions on GStreamer based WebKit ports.

Realtime communications for the Web (WebRTC)

Just like EME, WebRTC is not currently enabled by default in browsers such as Epiphany because of licensing problems, but it is available for custom adopters, and we are maintaining it. For example, we collaborated on upgrading LibWebRTC to M87, fixing the expected regressions and doing the associated gardening.

Along the way we experimented a bit with the new GPUProcess for capture devices, but we decided to stop the experimentation while waiting for a broader adoption of the process, for example in graphics rendering, in WPE/WebKitGTK.

GPUProcess work will be resumed at some point, but it's not currently a hard requirement, since we have already moved capture-device handling from the UIProcess to the WebProcess, isolating all GStreamer operations in the latter.

GStreamer

GStreamer is one of our core multimedia technologies, and we contribute to it on a daily basis. We pushed ~400 commits, with a similar number of code reviews, during the second half of 2020. Among those contributions, let us highlight the following:

  • A lot of bug fixing aiming for release 1.18.
  • Reworked and enhanced decodebin3, the GstTranscoder API and encodebin.
  • Merged av1parse in video parsers plugin.
  • Merged qroverlay plugin.
  • Iterated on the mono-repo proposal, which requires consensus and coordination among the whole community.
  • The gstwpe element has been greatly improved, driven by new user requests.
  • Contributed to the new libgstcodecs library, which enables stateless video decoders across different platforms (for example, v4l2, d3d11, va, etc.).
  • Developed a new plugin for VA-API using this library, exposing H.264, H.265, VP9, VP8 and MPEG2 decoders and a full-featured postprocessor, with better performance than GStreamer-VAAPI according to our measurements.

Conferences

Although 2020 was not a year for conferences, many of them went virtual. We attended one, the Mile High Video conference, and participated in its Slack workspace.

Thank you for reading this report, and stay tuned for more of our work.

by vjaquez at April 21, 2021 04:49 AM

April 20, 2021

Danylo Piliaiev

Turnips in the wild (Part 1)

Running games and benchmarks is much more exciting than trying to fix a handful of remaining synthetic tests. Turnip, an open-source Vulkan driver for recent Adreno GPUs, should already be capable of running real-world applications, and those always find a way to break the driver in new, unexpected ways.

TauCeti Vulkan Technology Benchmark

The benchmark greeted me with this wonderful field which looked a little blocky:

Screenshot of the benchmark with grass/dirt field which looks more like a patchwork

It’s not a crash, it’s not a hang, there is something wrong either with textures or with a shader. So, let’s take a closer look at a frame. Here is the first major draw call which looks wrong:

Screenshot of a bad draw call being inspected in RenderDoc

Now, let’s take a look at textures used by the draw:

More than twelve textures used by the draw call

That’s a lot of textures! But all of them look fine; yes, there are some textures that look blocky, but these textures are just small, so nothing wrong here.

The next stop is the fragment shader; I recently implemented an extension which helps with that. VK_KHR_pipeline_executable_properties allows the reporting of additional information about shaders in a pipeline. In the case of Turnip, that information is register statistics and the assembly of the shader:

Part of the problematic shader with a few instructions having a warning printed near them
Excerpt from the problematic shader
sam.base0 (f32)(xy)r0.x, r0.z, a1.x	; no field 'HAS_SAMP', WARNING: unexpected bits[0:7] in #cat5-samp-s2en-bindless-a1: 0x6 vs 0x0

Bingo! Usually when the issue is in a shader I have to either make educated guesses and/or reduce the shader until the issue is apparent. However, here we can immediately spot instructions with a warning. They may not be the cause of the ground mis-rendering, but these instructions do sample from textures, soooo.

After fixing their encoding, which was caused by a mistake in the XML definition, the ground is now rendered correctly:

Screenshot of the benchmark with a field which now has proper grass and dirt

After looking a bit more at the frame I saw that the same issue plagued all rock formations, and now it is gone too. The final fix can be seen at

ir3/isa,parser: fix encoding and parsing of bindless s2en SAM (!9628)

Genshin Impact

Now it’s time for a more heavyweight opponent - Genshin Impact, one of the most popular mobile games at the moment. Genshin Impact supports both GLES and Vulkan, and defaults to GLES on all but several devices. In any case, Vulkan could be enabled by editing a config file.

There are several mis-renderings in the game; here I will describe the one which shows why it is important to use all available validation tooling for such a complex API as Vulkan. The other mis-renderings will be a matter for the second post.

Gameplay – Disco Water

Proceeding to the gameplay and running around for a bit revealed major artifacts on the water surface:

Screenshot of the gameplay with body of water that has large colorful artifacts

This one was a multi-frame effect, so I had to capture a trace of all frames and then find the one where the issue began. Here is the draw call I found:

GIF showing before and after problematic draw call

It adds water to the first framebuffer and a lot of artifacts to the second one. Looking at the framebuffer configuration, I could see that the fragment shader doesn't write to the second framebuffer. Is that allowed? Yes. Is the behavior defined? No! VK_LAYER_KHRONOS_validation warns us about it, if warnings are enabled:

UNASSIGNED-CoreValidation-Shader-InputNotProduced(WARN / SPEC): Validation Warning:
Attachment 1 not written by fragment shader; undefined values will be written to attachment

“Undefined values will be written to attachment” may mean that nothing is written into it, as the game expects, or that random values are written to the second attachment, as Turnip does. Such behavior should not be relied upon, so Turnip does not contradict the specification here and there is nothing to fix. Case closed.
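
As an aside for anyone wanting to reproduce this: for a native Vulkan application, the Khronos validation layer can typically be switched on from the environment via the standard Vulkan loader mechanism, without touching the application's code (the layer itself must be installed on the system):

```shell
# Ask the Vulkan loader to inject the Khronos validation layer into the app.
export VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation
echo "VK_INSTANCE_LAYERS=$VK_INSTANCE_LAYERS"   # then launch the application
```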

More mis-renderings and investigations to come in the next post(s).

Turnips in the wild (Part 2)

by Danylo Piliaiev at April 20, 2021 09:00 PM

Eleni Maria Stea

EXT_external_objects and EXT_external_objects_fd for the Intel iris driver have been merged into mesa3D! [updated]

This post is a quick status update on OpenGL and Vulkan Interoperability extensions for Linux mesa3D drivers: Both EXT_external_objects and EXT_external_objects_fd implementations for the Intel iris driver have been finally merged into mesa3D earlier today and will be available in next release! 🎉 Parts of these extensions were already upstream (EXT_semaphore, EXT_semaphore_fd, and some drivers …

by hikiko at April 20, 2021 07:49 PM

Enrique Ocaña

GStreamer WebKit debugging tricks using GDB (2/2)

This post is a continuation of a series of blog posts about the most interesting debugging tricks I’ve found while working on GStreamer WebKit on embedded devices. These are the other posts of the series published so far:

Print corrupt stacktraces

In some circumstances you may get stacktraces that eventually stop because of missing symbols or corruption (?? entries).

#3  0x01b8733c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

However, you can print the stack in a useful way that gives you leads about what was next in the stack:

  • For i386: x/256wa $esp
  • For x86_64: x/256ga $rsp
  • For ARM 32 bit: x/256wa $sp

You may want to enable asm-demangle: set print asm-demangle

Example output (the last 3 lines give interesting info):

0x7ef85550:     0x1b87400       0x2     0x0     0x1b87400
0x7ef85560:     0x0     0x1b87140       0x1b87140       0x759e88a4
0x7ef85570:     0x1b87330       0x759c71a9 <gst_base_sink_change_state+956>     0x140c  0x1b87330
0x7ef85580:     0x759e88a4      0x7ef855b4      0x0     0x7ef855b4
...
0x7ef85830:     0x76dbd6c4 <WebCore::AppendPipeline::resetPipeline()::__PRETTY_FUNCTION__>        0x4     0x3     0x1bfeb50
0x7ef85840:     0x0     0x76d59268      0x75135374      0x75135374
0x7ef85850:     0x76dbd6c4 <WebCore::AppendPipeline::resetPipeline()::__PRETTY_FUNCTION__>        0x1b7e300       0x1d651d0       0x75151b74

More info: 1

Sometimes the symbol names aren’t printed in the stack memdump. You can do this trick to iterate the stack and print the symbols found there (take with a grain of salt!):

(gdb) set $i = 0
(gdb) p/a *((void**)($sp + 4*$i++))

[Press ENTER multiple times to repeat the command]

$46 = 0xb6f9fb17 <_dl_lookup_symbol_x+250>
$58 = 0xb40a9001 <g_log_writer_standard_streams+128>
$142 = 0xb40a877b <g_return_if_fail_warning+22>
$154 = 0xb65a93d5 <WebCore::MediaPlayerPrivateGStreamer::changePipelineState(GstState)+180>
$164 = 0xb65ab4e5 <WebCore::MediaPlayerPrivateGStreamer::playbackPosition() const+420>
...

Many times it’s just a matter of gdb not having loaded the unstripped version of the library. /proc/<PID>/smaps and info proc mappings can help to locate the library providing the missing symbol. Then we can load it by hand.

For instance, for this backtrace:

#0  0x740ad3fc in syscall () from /home/enrique/buildroot-wpe/output/staging/lib/libc.so.6 
#1  0x74375c44 in g_cond_wait () from /home/enrique/buildroot-wpe/output/staging/usr/lib/libglib-2.0.so.0 
#2  0x6cfd0d60 in ?? ()

In a shell, we examine smaps and find out that the unknown piece of code comes from libgstomx:

$ cat /proc/715/smaps
...
6cfc1000-6cff8000 r-xp 00000000 b3:02 785380     /usr/lib/gstreamer-1.0/libgstomx.so
...

Now we load the unstripped .so in gdb and we’re able to see the new symbol afterwards:

(gdb) add-symbol-file /home/enrique/buildroot-wpe/output/build/gst-omx-custom/omx/.libs/libgstomx.so 0x6cfc1000
(gdb) bt
#0  0x740ad3fc in syscall () from /home/enrique/buildroot-wpe/output/staging/lib/libc.so.6
#1  0x74375c44 in g_cond_wait () from /home/enrique/buildroot-wpe/output/staging/usr/lib/libglib-2.0.so.0
#2  0x6cfd0d60 in gst_omx_video_dec_loop (self=0x6e0c8130) at gstomxvideodec.c:1311
#3  0x6e0c8130 in ?? ()

Useful script to prepare the add-symbol-file:

cat /proc/715/smaps | grep '[.]so' | sed -e 's/-[0-9a-f]*//' | { while read ADDR _ _ _ _ LIB; do echo "add-symbol-file $LIB 0x$ADDR"; done; }
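A small variant of the same idea writes those commands to a file that can then be sourced from gdb in one go. To keep the sketch self-contained it inspects the current shell instead of the real target (swap $$ for the target PID in practice):

```shell
# Generate one add-symbol-file command per mapped .so of a process.
PID=$$   # stand-in; use the real target PID in a debugging session
grep '[.]so' /proc/$PID/smaps | sed -e 's/-[0-9a-f]*//' \
  | while read ADDR _ _ _ _ LIB; do
      echo "add-symbol-file $LIB 0x$ADDR"
    done > /tmp/symbols.gdb
# Then, inside gdb: (gdb) source /tmp/symbols.gdb
```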

More info: 1

The “figuring out corrupt ARM stacktraces” post has some additional info about how to use addr2line to translate memory addresses to function names on systems with a hostile debugging environment.

Debugging a binary without debug symbols

There are times when there’s just no way to get debug symbols working, or when we’re simply debugging a release version of the software. In those cases, we must debug the assembly code directly. The gdb text user interface (TUI) can be used to examine the disassembled code and the CPU registers. It can be enabled with these commands:

layout asm
layout regs
set print asm-demangle

Some useful keybindings in this mode:

  • Arrows: scroll the disassembly window
  • CTRL+p/n: Navigate history (previously done with up/down arrows)
  • CTRL+b/f: Go backward/forward one character (previously left/right arrows)
  • CTRL+d: Delete character (previously “Del” key)
  • CTRL+a/e: Go to the start/end of the line

This screenshot shows how we can infer that an empty RefPtr is causing a crash in some WebKit code.

Wake up an unresponsive gdb on ARM

Sometimes, when you continue (‘c’) execution on ARM there’s no way to stop it again unless a breakpoint is hit. But there’s a trick to regain control: just send a harmless signal to the process.

kill -SIGCONT 1234
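Outside of a real session, the mechanics can be exercised against a throwaway process (in practice the target is whatever process gdb is attached to, and the signal delivery is what gives gdb the chance to stop it again):

```shell
# Start a stand-in process, poke it with the harmless SIGCONT, clean up.
sleep 30 &
target=$!
kill -SIGCONT "$target" && echo "signalled $target"
kill "$target"
```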

Know which GStreamer thread id matches with each gdb thread

Sometimes you need to match threads in the GStreamer logs with threads in a running gdb session. The simplest way is to ask for the GThread of each gdb thread:

(gdb) set output-radix 16
(gdb) thread apply all call g_thread_self()

This will print the GThread* for each gdb thread; we only need to find the one we’re looking for.
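As a sketch of that correlation step (the log contents and pointer values here are made up): GStreamer debug logs include the GThread* of the emitting thread as a field, so once the command above reveals which gdb thread owns a given GThread*, we can filter the log for that thread’s activity:

```shell
# A toy stand-in for a real GStreamer debug log:
cat > /tmp/gst-example.log <<'EOF'
0:00:01.234 715 0xb2f0a400 DEBUG basesrc gstbasesrc.c:2950 create ...
0:00:01.235 715 0xb2f0a9c0 DEBUG queue gstqueue.c:1200 push ...
EOF
# Suppose gdb told us that thread 7 has g_thread_self() = 0xb2f0a400:
grep 0xb2f0a400 /tmp/gst-example.log
```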

Generate a pipeline dump from gdb

If we have a pointer to the pipeline object, we can call the function that dumps the pipeline:

(gdb) call gst_debug_bin_to_dot_file_with_ts((GstBin*)0x15f0078, GST_DEBUG_GRAPH_SHOW_ALL, "debug")
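One detail worth remembering: gst_debug_bin_to_dot_file_with_ts() only writes anything if GST_DEBUG_DUMP_DOT_DIR was set in the debugged process’ environment before it started. A sketch of the surrounding workflow (the .dot file name is an assumption, as real names include a timestamp, and the graphviz step requires dot to be installed):

```shell
# Must be exported before launching the process that will be debugged:
export GST_DEBUG_DUMP_DOT_DIR=/tmp
echo "pipeline dumps will be written to $GST_DEBUG_DUMP_DOT_DIR"
# prints: pipeline dumps will be written to /tmp
# After triggering the dump from gdb, render it with graphviz:
# dot -Tpng /tmp/<timestamp>-debug.dot -o /tmp/pipeline.png
```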

by eocanha at April 20, 2021 06:00 AM

April 19, 2021

Samuel Iglesias

Low Resolution Z Buffer support on Turnip

Last year I worked on implementing support in Turnip for a HW feature present in Qualcomm Adreno GPUs: the low-resolution Z buffer (aka LRZ). This is a HW feature already supported in Freedreno, which is the open-source OpenGL driver for these GPUs.

What is low-resolution Z buffer

Low-resolution Z buffer is very similar to a depth prepass that helps the HW avoid executing the fragment shader on those fragments that will subsequently be discarded by the depth test (hidden surface removal). This feature comes with some limitations though, such as the fragment shader not being allowed to have side effects (writing to SSBOs, atomic operations, etc.), among others.

The interesting part of this feature is that it allows applications to submit vertices in any order (saving the CPU time that would otherwise be spent sorting them); the HW processes them in the binning pass, as explained below, detects which ones are occluded, and thereby improves performance in some specific use cases.

Tiled-rendering

To understand better how LRZ works, we need to talk a bit about tile-based rendering. This is a way of rendering based on subdividing the framebuffer into tiles and rendering each tile separately. The advantage of this design is that the amount of memory and bandwidth required is reduced compared to immediate-mode rendering systems that draw the entire frame at once. Tile-based rendering is very popular on embedded GPUs, including Qualcomm Adreno.

Going into more detail, the graphics pipeline is divided into three different passes, executed per tile of the frame.

Tiled-rendering architecture diagram.
  • The binning pass. This pass processes the geometry of the scene and records in a table the tiles on which each primitive will be rendered. By doing this, the HW only needs to render the primitives that affect a specific tile when it is processed.

  • The rendering pass. This pass gets the rasterized primitives and executes all the fragment related processes of the pipeline (fragment shader execution, depth pass, stencil pass, blending, etc). Once it finishes, the resolve pass starts.

  • The resolve pass. It first resolves the tile buffer (GMEM) if it is multisampled, and copies the final color and depth values for all tile pixels back to system memory. If it is the last tile of the framebuffer, it swaps buffers and starts the binning pass for the next frame.

Where is LRZ used then? Well, in both the binning and rendering passes. In the binning pass, it is possible to store the depth value of each vertex of the scene’s geometry in a buffer, as the HW has that data available. That is the depth buffer used internally for LRZ. It has a lower resolution, as that much detail is not needed, which helps save bandwidth when transferring its contents to system memory.

Thanks to LRZ, the rendering pass is only executed on the fragments that are going to be visible at the end. However, there are some limitations, as mentioned before: if a fragment shader has side effects (writing to SSBOs, atomic operations, etc.), if blending is enabled, or if the fragment shader could modify the fragment’s depth… then LRZ cannot be used, as the results may be wrong.

However, LRZ brings a couple of things to the table that make it interesting. One is that applications don’t need to reorder their primitives before submission to be more efficient; that is done by the HW with LRZ automatically. Another is performance improvements in some use cases. For example, imagine a fragment shader that discards some fragments but has no other side effects. In that case, although we cannot do early depth testing, we can do an early LRZ test, as we know that some fragments won’t pass a depth test even if they are not discarded by the fragment shader.

Turnip implementation

Talking about the LRZ implementation, I took Freedreno’s code as a starting point to implement LRZ on Turnip. After some months of work, it finally landed in Mesa master.

Last week, more patches related to LRZ landed in Mesa master: the ones fixing LRZ interactions with VK_EXT_extended_dynamic_state, as with this extension the application can change, at command buffer recording time, some states that could affect the LRZ state; therefore, we need to track them accordingly.

I also implemented some LRZ improvements that have now landed as well (thanks for the reviews, Eric Anholt!), such as support for the early-LRZ-late-depth test I mentioned before, which could bring a performance improvement in some applications.

Left: original vulkan tutorial demo implementation. Right: same demo modified to discard fragments with red component lower than 0.5f.

For instance, I did some measurements on a vulkan-tutorial.com implementation of my own that I modified to discard a significant amount of fragments (see the previous figure). This is one of the cases where the early-LRZ-late-depth test helps improve performance.

When running the modified demo with these patches, I found a performance improvement between 13-16%.

Acknowledgments

All this LRZ work was my first big contribution to this open-source reverse-engineered driver! I don’t want to finish this post without publicly thanking Rob Clark for the original Freedreno implementation and his reviews of my work, as well as Jonathan Marek and Connor Abbott for their insightful reviews, advice and tips to make it work. Edited: Many thanks to Eric Anholt for his reviews of the last two patch series!

Happy hacking!

April 19, 2021 07:40 AM

April 16, 2021

Eric Meyer

Scripted Server Startup for MDN and WPT

A sizeable chunk of my work at Igalia so far involves editing and updating the Mozilla Developer Network (MDN), and a smaller chunk has me working on the Web Platform Tests (WPT).  In both cases, the content is stored in large public repositories (MDN, WPT) and contributors are encouraged to fork the repositories, clone them locally, and push updates via the fork as PRs (Pull Requests).  And while both repositories roll in localhost web server setups so you can preview your edits locally, each has its own.

As useful as these are, if you ignore the whole “auto-force a browser page reload every time the file is modified in any way whatsoever” thing that I’ve been trying very hard to keep from discouraging me from saving often, each has to be started in its own way, from within their respective repository directories, and it’s generally a lot more convenient to do so in a separate Terminal window.

I was getting tired of constantly opening a new Terminal window, cding into the correct place, remembering the exact invocation needed to launch the local server, and on and on, so I decided to make my life slightly easier with a few short scripts and aliases.  Maybe this will be useful to you as well.

First, I decided to keep things relatively simple.  Instead of writing a small program that would handle all server startups by parsing shell arguments and what have you, I wrote a couple of very similar shell scripts.  Here’s the script for launching MDN’s localhost:

#!/bin/bash
cd ~/repos/mdn/content/
yarn start

Then I added an alias to ~/.bashrc which employs a technique I swiped from this Stack Overflow answer.

alias mdn-server="open -a Terminal.app ~/bin/mdn-start.bsh"

Translated into English, that means “open the file ~/bin/mdn-start.bsh using the -a(pplication) Terminal.app”.

Thus, when I type mdn-server in any command prompt, a new Terminal window will open and the shell script mdn-start.bsh will be run; the script switches into the needed directory and launches the localhost server using yarn, as per the MDN instructions.  What’s more, when I’m done working on MDN, I can switch to the window running the server, stop the server with ⌃C (control-C), and the Terminal window closes automatically.

I did something very similar for WPT, except in this case the alias reads:

alias wpt-server="open -a Terminal.app ~/bin/wpt-serve.bsh"

And the script to which it points reads:

#!/bin/bash
cd ~/repos/wpt/
./wpt serve

As I mentioned before, I chose to do it this way rather than writing a single alias (say, local-server) that would accept arguments (mdn, wpt, etc.) and fire off scripts accordingly, but that’s also an option and a viable one at that.
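For the curious, that single-script alternative could be sketched like this (a hypothetical local-server reusing the repository paths from the scripts above):

```shell
#!/bin/bash
# local-server: launch the localhost server for a given project.
local_server() {
  case "$1" in
    mdn) cd ~/repos/mdn/content/ && yarn start ;;
    wpt) cd ~/repos/wpt/ && ./wpt serve ;;
    *)   echo "usage: local-server {mdn|wpt}" >&2; return 1 ;;
  esac
}
# With no (or an unknown) argument it just prints its usage:
local_server 2>&1 || true
# prints: usage: local-server {mdn|wpt}
```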

So that’s my little QoL (Quality of Life) upgrade to make working on MDN and WPT a little easier.  I hope it helps you in some way!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at April 16, 2021 01:55 PM

April 13, 2021

Eleni Maria Stea

Using EGL and the dma_buf kernel framework to associate two textures with the contents of the same buffer without copy taking place

It’s been a few weeks since I started experimenting with EGL/GLESv2 as part of my work for the WebKit (Browsers) team of Igalia. One thing I wanted to familiarize myself with was using shared DMA buffers to avoid copying textures in graphics programs. I’ve been experimenting with the dma_buf API, which is a generic Linux kernel framework for … Continue reading Using EGL and the dma_buf kernel framework to associate two textures with the contents of the same buffer without copy taking place

by hikiko at April 13, 2021 01:12 PM

Enrique Ocaña

GStreamer WebKit debugging tricks using GDB (1/2)

I’ve been developing and debugging desktop and mobile applications on embedded devices over the last decade or so. For most of this period I’ve been focused on the multimedia side of the WebKit ports using GStreamer, an area that is a mix of C (glib, GObject and GStreamer) and C++ (WebKit).

Over these years I’ve had to work on ARM embedded devices (mobile phones, set-top boxes, Raspberry Pi using buildroot) where most of the environment aids and tools we take for granted on a regular x86 Linux desktop just aren’t available. In these situations you have to be imaginative and find your own way to get the work done and debug the issues you find along the way.

I’ve been writing down the most interesting tricks I’ve found in this journey and I’m sharing them with you in a series of 7 blog posts, one per week. Most of them aren’t mine, and the ones I learnt at the beginning of my career can even seem a bit naive, but I find them worth sharing anyway. I hope you find them as useful as I do.

Breakpoints with command

You can break at a place, run some commands and continue execution. Useful to get logs:

break getenv
command
 # This disables scroll continue messages
 # and suppresses output
 silent
 set pagination off
 p (char*)$r0
continue
end

break grl-xml-factory.c:2720 if (data != 0)
command
 call grl_source_get_id(data->source)
 # $ is the last value in the history, the result of
 # the previous call
 call grl_media_set_source (send_item->media, $)
 call grl_media_serialize_extended (send_item->media, 
  GRL_MEDIA_SERIALIZE_FULL)
 continue
end

This idea can be combined with watchpoints and applied to trace reference counting in GObjects and know from which places the refcount is increased and decreased.
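A sketch of that combination (the object address is hypothetical; in a real session you’d use the address of the GObject whose refcount you want to trace):

```
watch ((GObject*)0x7f1234)->ref_count
command
 # Show where each refcount change comes from
 silent
 bt 5
 continue
end
```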

Force execution of an if branch

Just wait until the if chooses a branch and then jump to the other one:

6 if (i > 3) {
(gdb) next
7 printf("%d > 3\n", i);
(gdb) break 9
(gdb) jump 9
9 printf("%d <= 3\n", i);
(gdb) next
5 <= 3

Debug glib warnings

If you get a warning message like this:

W/GLib-GObject(18414): g_object_unref: assertion `G_IS_OBJECT (object)' failed

the functions involved are g_return_if_fail_warning(), which calls g_log(). It’s good to set a breakpoint on either of the two:

break g_log

Another method is to export G_DEBUG=fatal_criticals, which will convert all the criticals into crashes, which will stop the debugger.
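A sketch of launching the target that way (the binary name is an assumption, and the gdb invocation is commented out so the snippet stands on its own):

```shell
# Every g_critical() will now abort the process, stopping gdb right
# where the warning is emitted.
export G_DEBUG=fatal_criticals
echo "G_DEBUG=$G_DEBUG"
# prints: G_DEBUG=fatal_criticals
# gdb --args ./my-app
```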

Debug GObjects

If you want to inspect the contents of a GObject that you have a reference to…

(gdb) print web_settings 
$1 = (WebKitWebSettings *) 0x7fffffffd020

you can dereference it…

(gdb) print *web_settings
$2 = {parent_instance = {g_type_instance = {g_class = 0x18}, ref_count = 0, qdata = 0x0}, priv = 0x0}

even if it’s an untyped gpointer…

(gdb) print user_data
(void *) 0x7fffffffd020
(gdb) print *((WebKitWebSettings *)(user_data))
{parent_instance = {g_type_instance = {g_class = 0x18}, ref_count = 0, qdata = 0x0}, priv = 0x0}

To find the type, you can use GType:

(gdb) call (char*)g_type_name( ((GTypeInstance*)0x70d1b038)->g_class->g_type )
$86 = 0x2d7e14 "GstOMXH264Dec-omxh264dec"

Instantiate C++ object from gdb

(gdb) call malloc(sizeof(std::string))
$1 = (void *) 0x91a6a0
(gdb) call ((std::string*)0x91a6a0)->basic_string()
(gdb) call ((std::string*)0x91a6a0)->assign("Hello, World")
$2 = (std::basic_string<char, std::char_traits<char>, std::allocator<char> > &) @0x91a6a0: {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x91a6f8 "Hello, World"}}
(gdb) call SomeFunctionThatTakesAConstStringRef(*(const std::string*)0x91a6a0)

See: 1 and 2

by eocanha at April 13, 2021 10:49 AM

April 11, 2021

Andy Wingo

guile's reader, in guile

Good evening! A brief(ish?) note today about some Guile nargery.

the arc of history

Like many language implementations that started life when you could turn on the radio and expect to hear Def Leppard, Guile has a bottom half and a top half. The bottom half is written in C and exposes a shared library and an executable, and the top half is written in the language itself (Scheme, in the case of Guile) and somehow loaded by the C code when the language implementation starts.

Since 2010 or so we have been working at replacing bits written in C with bits written in Scheme. Last week's missive was about replacing the implementation of dynamic-link from using the libltdl library to using Scheme on top of a low-level dlopen wrapper. I've written about rewriting eval in Scheme, and more recently about how the road to getting the performance of C implementations in Scheme has been sometimes long.

These rewrites have a quixotic aspect to them. I feel something in my gut about rightness and wrongness and I know at a base level that moving from C to Scheme is the right thing. Much of it is completely irrational and can be out of place in a lot of contexts -- like if you have a task to get done for a customer, you need to sit and think about minimal steps from here to the goal and the gut doesn't have much of a role to play in how you get there. But it's nice to have a project where you can do a thing in the way you'd like, and if it takes 10 years, that's fine.

But besides the ineffable motivations, there are concrete advantages to rewriting something in Scheme. I find Scheme code to be more maintainable, yes, and more secure relative to the common pitfalls of C, obviously. It decreases the amount of work I will have when one day I rewrite Guile's garbage collector. But also, Scheme code gets things that C can't have: tail calls, resumable delimited continuations, run-time instrumentation, and so on.

Taking delimited continuations as an example, five years ago or so I wrote a lightweight concurrency facility for Guile, modelled on Parallel Concurrent ML. It lets millions of fibers exist on a system. When a fiber needs to block on an I/O operation (read or write), it instead suspends its continuation, and arranges to restart it when the operation becomes possible.

A lot had to change in Guile for this to become a reality. Firstly, delimited continuations themselves. Later, a complete rewrite of the top half of the ports facility in Scheme, to allow port operations to suspend and resume. Many of the barriers to resumable fibers were removed, but the Fibers manual still names quite a few.

Scheme read, in Scheme

Which brings us to today's note: I just rewrote Guile's reader in Scheme too! The reader is the bit that takes a stream of characters and parses it into S-expressions. It was in C, and now is in Scheme.

One of the primary motivators for this was to allow read to be suspendable. With this change, read-eval-print loops are now implementable on fibers.

Another motivation was to finally fix a bug in which Guile couldn't record source locations for some kinds of datums. It used to be that Guile would use a weak-key hash table to associate datums returned from read with source locations. But this only works for fresh values, not for immediate values like small integers or characters, nor does it work for globally unique non-immediates like keywords and symbols. So for these, we just wouldn't have any source locations.

A robust solution to that problem is to return annotated objects rather than using a side table. Since Scheme's macro expander is already set to work with annotated objects (syntax objects), a new read-syntax interface would do us a treat.

With read in C, this was hard to do. But with read in Scheme, it was no problem to implement. Adapting the expander to expect source locations inside syntax objects was a bit fiddly, though, and the resulting increase in source location information makes the output files bigger by a few percent -- due somewhat to the increased size of the .debug_lines DWARF data, but also due to serialized source locations for syntax objects in macros.

Speed-wise, switching to read in Scheme is a regression, currently. The old reader could parse around 15 or 16 megabytes per second when recording source locations on this laptop, or around 22 or 23 MB/s with source locations off. The new one parses more like 10.5 MB/s, or 13.5 MB/s with positions off, when in the old mode where it uses a weak-key side table to record source locations. The new read-syntax runs at around 12 MB/s. We'll be noodling at these in the coming months, but unlike when the original reader was written, at least now the reader is mainly used only at compile time. (It still has a role when reading s-expressions as data, so there is still a reason to make it fast.)

As is the case with eval, we still have a C version of the reader available for bootstrapping purposes, before the Scheme version is loaded. Happily, with this rewrite I was able to remove all of the cruft from the C reader related to non-default lexical syntax, which simplifies maintenance going forward.

An interesting aspect of attempting to make a bug-for-bug rewrite is that you find bugs and unexpected behavior. For example, it turns out that since the dawn of time, Guile always read #t and #f without requiring a terminating delimiter, so reading "(#t1)" would result in the list (#t 1). Weird, right? Weirder still, when the #true and #false aliases were added to the language, Guile decided to support them by default, but in an oddly backwards-compatible way... so "(#false1)" reads as (#f 1) but "(#falsa1)" reads as (#f alsa1). Quite a few more things like that.

All in all it would seem to be a successful rewrite, introducing no new behavior, even producing the same errors. However, this is not the case for backtraces, which can expose the guts of read in cases where that previously wouldn't happen because the C stack was opaque to Scheme. Probably we will simply need to add more sensible error handling around callers to read, as a backtrace isn't a good user-facing error anyway.

OK enough rambling for this evening. Happy hacking to all and to all a good night!

by Andy Wingo at April 11, 2021 07:51 PM

April 08, 2021

Andy Wingo

sign of the times

Hello all! There is a mounting backlog of things that landed in Guile recently and to avoid having to eat the whole plate in one bite, I'm going to try to send some shorter missives over the next weeks.

Today's is about a silly thing, dynamic-link. This interface is dlopen, but "portable". See, back in the day -- like, 1998 -- there were lots of kinds of systems and how to make and load a shared library portably was hard. You'd have people with AIX and Solaris and all kinds of weird compilers and linkers filing bugs on your project if you hard-coded a GNU toolchain invocation when creating loadable extensions, or hard-coded dlopen or similar to use them.

Libtool provided a solution to create portable loadable libraries, which involved installing .la files alongside the .so files. You could use libtool to link them to a library or an executable, or you could load them at run-time via the libtool-provided libltdl library.

But, the .la files were a second source of truth, and thus a source of bugs. If a .la file is present, so is an .so file, and you could always just use the .so file directly. For linking against an installed shared library on modern toolchains, the .la files are strictly redundant. Therefore, all GNU/Linux distributions just delete installed .la files -- Fedora, Debian, and even Guix do so.

Fast-forward to today: there has been a winnowing of platforms, and a widening of the GNU toolchain (in which I include LLVM as well as it has a mostly-compatible interface). The only remaining toolchain flavors are GNU and Windows, from the point of view of creating loadable shared libraries. Whether you use libtool or not to create shared libraries, the result can be loaded either way. And from the user side, dlopen is the universally supported interface, outside of Windows; even Mac OS fixed their implementation a few years back.

So in Guile we have been in an unstable equilibrium: creating shared libraries by including a probably-useless libtool into the toolchain, and loading them by using a probably-useless libtool-provided libltdl.

But the use of libltdl has not been without its costs. Because libltdl intends to abstract over different platforms, it encourages you to leave off the extension when loading a library, instead promising to try a platform-specific set such as .so, .dll, .dylib etc as appropriate. In practice the abstraction layer was under-maintained and we always had some problems on Mac OS, for example.

Worse, as ltdl would search through the path for candidates, it would only report the last error it saw from the underlying dlopen interface. It was almost always the case that if A and B were in the search path, and A/foo.so failed to load because of a missing dependency, the error you would get as a user would instead be "file not found", because ltdl swallowed the first error and kept trucking to try to load B/foo.so which didn't exist.

In summary, this is a case where the benefits of an abstraction layer decline over time. For a few years now, libltdl hasn't been paying for itself. Libtool is dead, for all intents and purposes (last release in 2015); best to make plans to migrate away, somehow.

In the case of the dlopen replacement, in Guile we ended up rewriting the functionality in Scheme. The underlying facility is now just plain dlopen, for which we shim a version of dlopen on Windows, inspired by the implementation in cygwin. There are still platform-specific library extensions, but that is handled easily on the Scheme layer.

Looking forward, I think it's probably time to replace Guile's use of libtool to create its libraries and executables. I loathe the fact that libtool puts shell scripts in the place of executables in build directories and stashes the actual executables elsewhere -- like, visceral revulsion. There is no need for that nowadays. Not sure what to replace it with, nor on what timeline.

And what about autotools? That, my friends, would be a whole nother blog post. Until then, & probably sooner, happy hacking!

by Andy Wingo at April 08, 2021 07:09 PM

Jacobo Aragunde

A recap of Chromium dialog accessibility enhancements

It’s been a bit more than a year since I started working on Chromium accessibility. Although I’ve worked on several kinds of issues, a lot of them had to do with UI dialogs of some kind. Browser interfaces present many kinds of dialogs and these problems affected all the desktop platforms. This post is a recap of all the kinds of things that have been fixed related with different uses of dialogs in Chromium!

JavaScript prompt dialogs

Several JavaScript features cause dialogs to be shown: things like alert(), prompt() or even, in certain cases, the window.beforeunload event (if its handler returns a string, it is presented in a dialog before unload happens). These were reported in #779501, #1164927 and #1166370. Windows ATs were not reading the actual content of these dialogs when announcing them, thus defeating their purpose. This was fixed by removing some workarounds in the MessageBoxView class, a reusable component found inside these dialogs which used to implement alert-like behavior. That confused some ATs, which saw an alert inside a dialog; now it’s the parent dialog that might behave like an alert if needed.

Once this was fixed, we took also care of some speech duplication caused by several items having the same accessible name, reported as #1185092.

Save password and credit card dialogs

But browser interfaces themselves also use dialogs to create (hopefully) helpful interfaces for their users, offering to save sensitive information like a password or credit card details. Sadly, neither the save password nor the credit card dialogs were announced (#1079320, #1119367, #1125118). A previous attempt to enforce announcing “save password” (which I talked about in a previous post) caused it to be announced twice when using the toolbar icon to open it (#1132318).

These dialogs are particular because they have two behaviors: they can be explicitly opened using a toolbar icon, in which case they get focus immediately and must use the standard dialog role, or they can be opened when a password is submitted. In the latter case, they don’t get focus and so they need to be announced as alerts. We fixed the problem by making sure we use one role or the other depending on how they are triggered, which enforces the correct code paths to produce alert events only when necessary.

Another useful thing browsers offer is translating a page. The same problem affected the translate page dialog (#1119734). It shares code and behavior with the aforementioned ones and was fixed at the same time.

Accounts and sync menu

Accounts and sync, the avatar icon in the toolbar, looks and behaves more like a menu, but it’s implemented with dialog (bubble) code. There were, unfortunately, some problems with it: it was not announced when opened (#1098304) and contents that were informative but not focusable were not announced (#1078580 and #1161166). We used a variety of techniques to fix the problem:

  • The bubble already used a menu-like container role; so we enforce it to behave completely like a menu, using all the corresponding roles and emitting menu open/close events.
  • Use the informative, inaccessible titles as accessible names for containers, to make ATs announce them by context when focus jumps inside the container. Enforce certain role names for this to work on Windows.
  • Make the top-most, inaccessible content (user name and email) part of the menu title for it to be announced by context when the menu/dialog is opened.

Extension-related dialogs

Extension management also involves a lot of dialogs (install, manage, remove, confirmation). We detected and fixed issue #1123107: the “add extension” dialog was not properly announced on Linux.

System dialogs

Browsers also launch system dialogs (open, save, print, etc). In issue #1042864, keystroke events were not being produced in these cases. This was quite a complex one! I wrote two posts dedicated to just this problem and its solution.

Restore after crash dialog

If your browser crashes, a dialog asking the user if they’d like to restore the tabs that were open before the crash is shown. This dialog was not being announced (#1023822) and could not be focused with the Alt+Shift+A and F6 hotkeys (#1042010). These were the first issues I worked on, and I wrote a specific post about them as well.

Site permission dialogs

Dialogs are also used in browsers to ask for the user’s explicit permission to access powerful features. We briefly mentioned how these were fixed in a previous post; we fixed their announcement (#1052675) and their ability to be focused with hotkeys (#1052676). In the process, I wrote the required code to emit alert events in the Linux accessibility backend, which was still missing at that point.

Additionally, I added more tests and cleaned up things a bit, refactoring code and removing some redundant events. It’s been quite a trip, building and testing on all the desktop platforms, taking into account platform-specific behavior and AT quirks… I still plan to keep working on this area, hardening automated testing if possible to prevent regressions, which should reduce the amount of effort in the long run.

Thanks for reading, and happy hacking!

by Jacobo Aragunde Pérez at April 08, 2021 04:00 PM

April 06, 2021

Eleni Maria Stea

Setting up to debug ANGLE with GDB (on Linux)

ANGLE is an EGL/GLES2 implementation on top of other graphics APIs. It is mainly used in systems that lack a native GLES2/GLES3 driver and in some browsers, for example Chromium. Recently, I’ve used it for some browser-related work in Igalia‘s WebKit team (more on that coming soon) and had to set it up … Continue reading Setting up to debug ANGLE with GDB (on Linux)

by hikiko at April 06, 2021 06:53 PM

April 04, 2021

Manuel Rego

:focus-visible in WebKit - March 2021

Another month is gone, and we are back with another status update (see January and February ones).

This is about the work Igalia is doing on the implementation of :focus-visible in WebKit. This is part of the Open Prioritization campaign and is being sponsored by many people. Thank you!

The work slowed down in March, so this status update is shorter than previous ones. The main focus has been on spec discussions, trying to reach agreement.
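For readers who haven’t used the feature yet, here is a minimal illustration of the pattern :focus-visible enables (an example of my own, not code from the WebKit patch):

```css
/* Draw a focus ring only when the browser decides focus should be
   visible (e.g. keyboard navigation), instead of on every mouse click. */
button:focus-visible {
  outline: 2px solid royalblue;
}

/* Suppress the default ring for pointer-initiated focus. */
button:focus:not(:focus-visible) {
  outline: none;
}
```

The open questions discussed below are precisely about when the browser should consider focus “visible”, for example when a script moves it.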

Implementation details

The initial patch is available in the latest Safari Technology Preview (STP) releases behind a runtime flag, but it has an annoying bug that causes the body element to match :focus-visible when you use the keyboard to move focus. The issue was fixed last month, but the fix hasn’t been included in an STP release yet (hopefully it’ll make it into release 124). Apart from that, some minor patches related to implementation details have landed too. But this was just a small part of the work during March.

In addition, I realized that :focus-visible appears in the Chromium and Firefox DevTools, so I took a look at how to make that happen in WebKit too. At that point I noticed that :focus-within, which has been shipping for a long time, isn’t available in the WebKit Web Inspector yet, so I cooked up a simple patch to add it there. However, that hasn’t landed yet, because it needs some UI rework; otherwise the list of pseudo-classes would be too long and wouldn’t look nice in the inspector. So the patch is waiting for some UI changes before it can be merged. Once that’s solved, adding :focus-within and :focus-visible to the Web Inspector will be pretty straightforward.

Spec discussions

This was the main part of the work during March, and the goal was to reach some agreement before finishing the implementation bits.

The main issue was how :focus-visible should work when a script moves focus. The behavior of the current implementations was not interoperable, the spec was not totally clear and, as explained in the previous report, I created a set of new tests to clarify this. These tests demonstrated some interesting incompatibilities. Based on this, we also compared the results with the widely used polyfill. We found various misalignments on tricky cases, which generated significant discussion about which behavior was correct, and why. After considerable discussion with people from Google and Mozilla, it looks like we have finally reached an agreement on the expectations.

Next was to see if we could clarify the text so that these cases couldn’t be interpreted in importantly incompatible ways. Following the advice of the CSS Working Group, I worked on a PR for the HTML spec trying to define when a browser should draw a focus indicator, and thus match :focus-visible. Some discussion was raised about which elements should always match :focus-visible and how to define that in normative text (as some elements like <select> draw a focus ring when clicked in some browsers but not others, and some elements like <input type="date"> allow keyboard input or not depending on the platform). The discussion is still ongoing, and we’re still trying to find the proper way to define this in the HTML spec. If we manage to do that, it will be a great step forward for the interoperability of :focus-visible implementations, and a big win for the end users of this feature.

Apart from that, I’ve also created a test for my proposal about how :focus-visible should work in combination with Shadow DOM, but I don’t want to open that can of worms until the other part is done.

Some numbers

Let’s take a look at the numbers again, though things have moved slowly this time.

  • 21 PRs merged in WPT (1 in March).
  • 17 patches landed in WebKit (3 in March).
  • 7 patches landed in Chromium.
  • 2 PRs merged in CSS specs (1 in March).
  • 1 PR merged in HTML spec.

Next steps

The main goal for April is to close the spec discussions and land the PRs in the HTML spec, so we can focus again on finishing the implementation in WebKit.

However, if reaching an agreement on these HTML spec changes turns out not to be possible, we can probably still land some patches on the implementation side, to add some support for script focus in WebKit.

Stay tuned for more updates!

April 04, 2021 10:00 PM

March 31, 2021

Pablo Saavedra

Mini-PC Fulong 2.0 Inside

The history of the world is a continuous succession of contradictions. The announcement from MIPS Technologies about their decision to definitively abandon the MIPS architecture in favour of RISC-V is just another example. But the truth is that things are far from trivial in this topic. Even when the end-of-life date for the MIPS architecture looks closer in time than ever, there are still infrastructures and platforms that need to keep being supported and maintained for this architecture in the meantime. To make the situation more complex, at the same time as I am writing this post, Loongson Technology Ltd is announcing a new 16-core MIPS 12nm CPU for 256-core systems (Tom’s Hardware news). Loongson Technology also says that they keep a strong commitment to RISC-V for the future, but that they will keep their bet on MIPS64 in the meantime. So if MIPS is going to die, it is going to be a lovely death.

In this context, here in Igalia we are hosting and maintaining the CI workers for the JavaScriptCore 32-bit (MIPS) infrastructure for the WebKit web browser engine.

No one ever said that finding end-user hardware for this kind of system is easy-peasy. The options on the market often don’t reach a sufficient level of maturity or come with a poor set of hardware specifications. The choices are often not intended for long, CPU-intensive tasks, or they simply lack good OS support (maintenance, updates, custom kernels, open-source drivers …).

Nowadays we are using a parallelized cluster of MIPSel CI 20 boards to run the JavaScriptCore 32-bit (MIPS) CI workers. Don’t get me wrong: the CI 20 boards are certainly not bad. These boards are really great for development and evaluation purposes, but even rare failures become commonplace when you run 30 of them 24/7 in parallel. For this reason, some time ago we started looking for an alternative that could eventually replace them. And this was when we found the following candidate.

The candidate

We had a look at what Debian was using for their QA infrastructure and talked to the MIPS team – credits to Berto García who helped us with this – and we concluded that the Loongson 3B4000 MIPSel board was a promising option so we decided to explore it.

We started looking for information about this CPU model and found references to the Loongson 3A4000 + Fuloong 2.0 Mini-PC. This computer is a very interesting end-user product based on the MIPS64el architecture. In particular, it uses a similar but more recent and powerful evolution of the Loongson 3B4000 processor. The Fuloong 2.0 comes in a barebone format with the Loongson-3A R4 (Loongson-3A4000) @ 1500MHz, a quad-core processor, with 8GB DDR4 RAM and 1TB of internal NVMe storage. These technical specifications are completed with a Realtek ALC662 sound card, 2x USB 3.0 ports + 1x USB Type-C + 4x USB 2.0, 2x HDMI video outputs, 2x Ethernet (WGI211AT), audio connectors, an M.2 slot for a WiFi module and, finally, a Vivante GL1000 GPU (OpenGL ES 2.0/1.1). These specifications are clearly far from the common constraints of regular MIPS development boards, and they make this machine technically a serious candidate to replace the current boards used in the CI cluster.

However, the acquisition of this kind of product has some non-technical cons that are important to keep in mind before taking any decision. For example, it is very difficult to find a reseller in Europe providing this kind of machine. This means that the computer needs to be shipped directly from China, which also means that the acquisition process can suffer from the common problems of this kind of order: longer delivery time (~1 month), paperwork for customs, taxes, delivery tracking issues … Anyway, this post is intended to keep the focus on the technical details ;-). The fact is, once these issues are solved you will receive a machine similar to the one shown in the photos:

Photos: the Mini-PC Fuloong 2.0 running a Linux distro, closed, and opened to show the inside.

The unboxing

The machine comes with a pre-installed custom distro (“Dragon Dream F28”, based on Fedora 28). This distro is quite old, but it is the one provided by the manufacturer (Lemote) and apparently the only one that, in theory, fully supports the machine. The installed image comes with a desktop environment on top of an X server. The distro is also synced with an RPM repository hosted by Lemote. This is really convenient for starting to experiment with the computer, and very useful for getting information about the system before taking any action on it. Here is the output of some commands:

# cat /proc/cpuinfo
system type : generic-loongson-machine
machine : loongson,generic
processor : 0
cpu model : Loongson-3 V0.4 FPU V0.1
model name : Loongson-3A R4 (Loongson-3A4000) @ 1500MHz
CPU MHz : 1500.00
BogoMIPS : 2990.15
wait instruction : yes
microsecond timers : yes
tlb_entries : 2112
extra interrupt vector : no
hardware watchpoint : no
isa : mips1 mips2 mips3 mips4 mips5 mips32r1 mips32r2 mips64r1 mips64r2
ASEs implemented : vz msa loongson-mmi loongson-cam loongson-ext loongson-ext2
shadow register sets : 1
kscratch registers : 6
package : 0
core : 0
... (x4)

dmesg:

Mar 9 12:43:19 fuloong-01 kernel: [ 2.884260] Console: switching to colour frame buffer device 240x67 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.915928] loongson-drm 0000:00:06.1: fb0: loongson-drmdrm frame buffer device 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.919792] etnaviv 0000:00:06.0: Device 14:7a15, irq 93 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.920249] etnaviv 0000:00:06.0: model: GC1000, revision: 5037 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.920378] [drm] Initialized etnaviv 1.3.0 20151214 for 0000:00:06.0 on minor 1

lsblk:

# lsblk
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 190M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1,7G 0 part /boot
├─nvme0n1p3 259:3 0 7,5G 0 part [SWAP]
├─nvme0n1p4 259:4 0 46,6G 0 part /
└─nvme0n1p5 259:5 0 421,1G 0 part /home

Getting Debian into the Fuloong 2.0

The WebKitGTK and WPE WebKit CI infrastructure is entirely based on Debian Stable and/or Ubuntu LTS, in accordance with the WebKitGTK maintenance and development policy. For that reason we were pretty interested in getting the machine running Debian Stable (“buster” as of this writing). So what comes next is a description of the installation process of a pure Debian base system, hybridized with the Lemote Fedora Linux kernel, using an external USB storage stick as the bootable disk. The process is a mix between the following two documents:

Those documents provide a good, detailed explanation of the steps to follow to perform the installation. Only the installation of the kernel and of grub2-efi differs a bit, but let’s come back to that later. The idea is:

  • Set the EFI/BIOS to boot from the USB storage (EFI)
  • Install the base Debian OS in a external microSD card connected to the USB3-SS port
  • Keep using the internal nvme disk as the working dir (/home, /var/lib/lxc)

The installation process starts from the pre-installed Fedora image. The first action is to mount the external USB storage (sda) in the live system as follows:

# lsblk
sda 8:0 1 14,9G 0 disk
├─sda1 8:1 1 200M 0 part /mnt/debinst/boot/efi
└─sda2 8:2 1 10G 0 part /mnt/debinst
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 190M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1,7G 0 part /boot
├─nvme0n1p3 259:3 0 7,5G 0 part [SWAP]
├─nvme0n1p4 259:4 0 46,6G 0 part /
└─nvme0n1p5 259:5 0 421,1G 0 part /home

As I said, the steps to install the Debian system onto the SD card are quite straightforward. The problems begin during the installation of GRUB and the Linux kernel …

The Linux Kernel

Following the guide, we will reach the Install a Kernel step. Debian provides a Loongson Linux 4.19 kernel for the Loongson 3A/3B boards.

ii linux-image-4.19.0-14-loongson-3 4.19.171-2 mips64el Linux 4.19 for Loongson 3A/3B
ii linux-image-loongson-3 4.19+105+deb10u9 mips64el Linux for Loongson 3A/3B (meta-package)
ii linux-libc-dev:mips64el 4.19.171-2 mips64el Linux support headers for userspace development

It is quite old in comparison with the one the Lemote Fedora distro contains (5.4.63-20201012-def), so I preferred to keep the latter, although it should be possible to get the machine running with this kernel as well.

Grub2 EFI, first attempt trying to build it for the device

This is the main issue that I found. The first thing that I tried was to look for a GRUB package with EFI support in the mips64el Debian chroot:

root@fuloong-01:/# apt search grub | grep efi
<<empty>>

The frustration came quickly when I didn’t find any GRUB candidate. It was then that I remembered there was a grub-yeeloong package in the Debian repository that could be useful in this case. The Yeeloong is the predecessor of the Loongson, so what I tried next was to rebuild the GRUB package, adding the mips64el architecture to the grub-yeeloong package. Something like the following:

  • Getting the Debian sources and dependencies for the grub2 packages:
    apt source grub2
    apt install debhelper patchutils python flex bison po-debconf help2man texinfo xfonts-unifont libfreetype6-dev gettext libdevmapper-dev libsdl1.2-dev xorriso parted libfuse-dev ttf-dejavu-core liblzma-dev wamerican pkg-config bash-completion build-essential
    
  • Patching the /debian/control file using this patch
  • … and then to build the Debian package:
    ~/debs# cd grub2-2.02+dfsg1 && dpkg-buildpackage
    
    ~/debs/grub2-2.02+dfsg1# ls ../
    grub-common-dbgsym_2.02+dfsg1-20+deb10u3_mips64el.deb grub-yeeloong_2.02+dfsg1-20+deb10u3_mips64el.deb grub2_2.02+dfsg1-20+deb10u3.debian.tar.xz grub2_2.02+dfsg1.orig.tar.xz
    grub-common_2.02+dfsg1-20+deb10u3_mips64el.deb grub2-2.02+dfsg1 grub2_2.02+dfsg1-20+deb10u3.dsc
    grub-mount-udeb_2.02+dfsg1-20+deb10u3_mips64el.udeb grub2-common-dbgsym_2.02+dfsg1-20+deb10u3_mips64el.deb grub2_2.02+dfsg1-20+deb10u3_mips64el.buildinfo
    grub-yeeloong-bin_2.02+dfsg1-20+deb10u3_mips64el.deb grub2-common_2.02+dfsg1-20+deb10u3_mips64el.deb grub2_2.02+dfsg1-20+deb10u3_mips64el.changes
    

The .deb package builds correctly, but the problem is the binary: it lacks EFI runtime support, so it is not useful in our case:

*******************************************************
GRUB2 will be compiled with following components:
Platform: mipsel-none <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
With devmapper support: Yes
With memory debugging: No
With disk cache statistics: No
With boot time statistics: No
efiemu runtime: No (only available on i386)
grub-mkfont: Yes
grub-mount: Yes
starfield theme: Yes
With DejaVuSans font from /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf
With libzfs support: No (need zfs library)
Build-time grub-mkfont: Yes
With unifont from /usr/share/fonts/X11/misc/unifont.pcf.gz
With liblzma from -llzma (support for XZ-compressed mips images)
With quiet boot: No
*******************************************************

This is what happens if you still try to install it:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# dpkg -i ../grub-yeeloong-bin_2.02+dfsg1-20+deb10u3_mips64el.deb ../grub-common_2.02+dfsg1-20+deb10u3_mips64el.deb ../grub2-common_2.02+dfsg1-20+deb10u3_mips64el.deb
root@fuloong-01:~/debs/grub2-2.02+dfsg1# grub-install /dev/sda
Installing for mipsel-loongson platform.
...
grub-install: warning: WARNING: no platform-specific install was performed. <<<<<<<<<<
Installation finished. No error reported.

There is no glue between EFI and GRUB. Files like BOOTMIPS.EFI, gcdmips64el.efi and grub.efi are missing, so this package is not useful at all:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/
System.map-4.19.0-14-loongson-3 config-4.19.0-14-loongson-3 efi grub grub.elf initrd.img-4.19.0-14-loongson-3 vmlinux-4.19.0-14-loongson-3
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/grub
fonts grubenv locale mipsel-loongson
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/efi/
<<empty>>
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/
System.map-4.19.0-14-loongson-3 config-4.19.0-14-loongson-3 efi grub grub.elf initrd.img-4.19.0-14-loongson-3 vmlinux-4.19.0-14-loongson-3
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/grub
grub/ grub.elf

The grub-install command also confirms that the mips64el-efi target is not supported:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# /usr/sbin/grub-install --help
Usage: grub-install [OPTION...] [OPTION] [INSTALL_DEVICE]
Install GRUB on your drive.
...
--target=TARGET install GRUB for TARGET platform
[default=mipsel-loongson]; available targets:
arm-efi, arm-uboot, arm64-efi, i386-coreboot,
i386-efi, i386-ieee1275, i386-multiboot, i386-pc,
i386-qemu, i386-xen, i386-xen_pvh, ia64-efi,
mips-arc, mips-qemu_mips, mipsel-arc,
mipsel-loongson, mipsel-qemu_mips,
powerpc-ieee1275, sparc64-ieee1275, x86_64-efi,
x86_64-xen

Second attempt, the loongson-community Grub2 EFI

Now that we know that we cannot use an official Debian package to install and configure GRUB, it is time for a bit of google-fu.

I must have a lot of practice, since it only took me a short while to find that the Lemote Fedora distro provides its own GRUB package for the Loongson and, later, to find new hope reading this article, which explains how to build the GRUB from loongson-community with EFI support. So the obvious next logical step was to try to build it and check it:

    • git clone https://github.com/loongson-community/grub.git
      cd grub
      bash autogen.sh
      ./configure --prefix=/opt/alternative/
      make ; make install
    • The configure output looks promising:
      *******************************************************
      GRUB2 will be compiled with following components:
      Platform: mips64el-efi <<<<<<<<<<<<<<<<< Looks good.
      With devmapper support: Yes
      With memory debugging: No
      With disk cache statistics: No
      With boot time statistics: No
      efiemu runtime: No (not available on efi)
      grub-mkfont: No (need freetype2 library)
      grub-mount: Yes
      starfield theme: No (No build-time grub-mkfont)
      With libzfs support: No (need zfs library)
      Build-time grub-mkfont: No (need freetype2 library)
      Without unifont (no build-time grub-mkfont)
      With liblzma from -llzma (support for XZ-compressed mips images)
      *******************************************************

    … but unfortunately I got more and more build errors at every step. Errors like these:

cc1: error: position-independent code requires ‘-mabicalls’
grub_script.yy.c:19:22: error: statement with no effect [-Werror=unused-value]
build-grub-module-verifier: error: unsupported relocation 0x51807.

… so after several attempts I finally gave up trying to build the loongson-community GRUB with EFI support. Here is the patch with some of the modifications that I tried in the code, just in case you are better at solving these build errors than me.

Third attempt, reusing the GRUB2 EFI resources from the pre-installed system

… and the last one.

My winning horse was the simplest solution: reusing the /boot and /boot/efi directories installed by the Fedora system as the base for the new Debian system:

    • Clone the tree in the destination dir:
      cp -a /boot /mnt/debinst/boot
    • Replace the UUIDs patch
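The UUID fix-up can be sketched like this (a hypothetical illustration: the placeholder UUIDs and the /tmp demo file stand in for the real grub.cfg; on the actual machine the old and new UUIDs would come from `blkid -s UUID -o value` on /dev/nvme0n1p4 and /dev/sda2):

```shell
# grub.cfg cloned from Fedora still references the Fedora root UUID,
# so rewrite every occurrence to the UUID of the new Debian root.
OLD_UUID="11111111-2222-3333-4444-555555555555"   # placeholder: Fedora root
NEW_UUID="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"   # placeholder: Debian root
mkdir -p /tmp/debinst-demo
printf 'search --no-floppy --fs-uuid --set=root %s\n' "$OLD_UUID" \
    > /tmp/debinst-demo/grub.cfg
sed -i "s/$OLD_UUID/$NEW_UUID/g" /tmp/debinst-demo/grub.cfg
cat /tmp/debinst-demo/grub.cfg
```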

    The /boot dir in the target installation will look like this:

    [root@fuloong-01 boot]# tree /mnt/debinst/boot/
    /mnt/debinst/boot/
    ├── boot -> .
    ├── config-5.4.60-1.fc28.lemote.mips64el
    ├── e8a27b4e4fcc4db9ab7a64bd81393773
    │   └── 5.4.60-1.fc28.lemote.mips64el
    │   ├── initrd
    │   └── linux
    ├── efi
    │   ├── boot
    │   │   ├── grub.cfg
    │   │   └── grub.efi
    │   ├── EFI
    │   │   ├── BOOT
    │   │   │   ├── BOOTMIPS.EFI
    │   │   │   ├── fonts
    │   │   │   │   └── unicode.pf2
    │   │   │   ├── gcdmips64el.efi
    │   │   │   ├── grub.cfg
    │   │   │   └── grubenv
    │   │   └── fedora
    │   ├── mach_kernel
    │   └── System
    │   └── Library
    │   └── CoreServices
    │   └── SystemVersion.plist
    ├── extlinux
    ├── grub2
    │   ├── grubenv -> ../efi/EFI/BOOT/grubenv
    │   └── themes
    │   └── system
    │   ├── background.png
    │   └── fireworks.png
    ├── grub.cfg
    ├── grub.efi
    ├── initramfs-5.4.60-1.fc28.lemote.mips64el.img
    ├── loader
    │   └── entries
    │   └── e8a27b4e4fcc4db9ab7a64bd81393773-5.4.60-1.fc28.lemote.mips64el.conf
    ├── lost+found
    ├── System.map-5.4.60-1.fc28.lemote.mips64el
    ├── vmlinuz-205
    └── vmlinuz-5.4.60-1.fc28.lemote.mips64el

… et voilà!

Finally we have a pure Debian Buster root base system hybridized with the Lemote Fedora Linux kernel:

root@fuloong-01:~# cat /etc/debian_version
10.8
root@fuloong-01:~# uname -a
Linux fuloong-01 5.4.60-1.fc28.lemote.mips64el #1 SMP PREEMPT Mon Aug 24 09:33:35 CST 2020 mips64 GNU/Linux
root@fuloong-01:~# cat /etc/apt/sources.list
deb http://httpredir.debian.org/debian buster main contrib non-free 
deb-src http://httpredir.debian.org/debian buster main contrib non-free 
deb http://security.debian.org/ buster/updates main contrib non-free 
deb http://httpredir.debian.org/debian/ buster-updates main contrib non-free
root@fuloong-01:~# apt update
Hit:1 http://httpredir.debian.org/debian buster InRelease
Get:2 http://security.debian.org buster/updates InRelease [65,4 kB]
Get:3 http://httpredir.debian.org/debian buster-updates InRelease [51,9 kB]
Get:4 http://security.debian.org buster/updates/main mips64el Packages [242 kB]
Get:5 http://security.debian.org buster/updates/main Translation-en [142 kB]
Fetched 501 kB in 1s (417 kB/s)                                
Reading package lists... Done
Building dependency tree       
Reading state information... Done
3 packages can be upgraded. Run 'apt list --upgradable' to see them.

With this hardware we can reasonably run native GDB directly on the machine, and we have the possibility of running other tools on the host (e.g. any monitoring agent to get stats and so on). Definitely, having this hardware enabled for use in the CI infrastructure will be a promising step towards better QA for the project.
That is all from my side. I will probably keep dedicating some time to getting buildable packages of GRUB-EFI and the Linux kernel that we could use for this and similar machines (e.g. for tools like perf, which need the userspace binaries in sync with the kernel version). In the meantime, I really hope this can be useful to someone out there who is interested in this hardware. If you have a comment or question, or simply wish to share your thoughts, just leave a comment.

Stay safe!

by Pablo Saavedra at March 31, 2021 07:16 AM

March 25, 2021

Andy Wingo

here we go again

Around 18 months ago, Richard Stallman was forced to resign from the Free Software Foundation board of directors and as president. It could have been anything -- at that point he already had a history of behaving in a way that was particularly alienating to women -- but in the end it was his insinuation that it was somehow OK if his recently-deceased mentor Marvin Minsky, then in his 70s or 80s, had sex with a 17-year-old on Jeffrey Epstein's private island. A weird pick of hill to stake one's reputation on, to say the least.

At the time I was relieved that we would finally be getting some leadership renewal at the FSF, and hopeful that we could get some mission renewal as well. I was also looking forward to what the practical implications would be for the GNU project, as more people agreed that GNU was about software freedom and not about its founder.

But now we're back! Not only has RMS continued through this whole time to insist that he runs the GNU project -- something that is simply not the case, in my estimation -- but this week, a majority of a small self-selected group of people, essentially a subset of current and former members of the FSF board of directors and including RMS himself, elected to reinstate RMS to the board of the Free Software Foundation. Um... read the room, FSF voting members? What kind of message are you sending?

In this context I can only agree with the calls for the entire FSF board to resign. The board is clearly not fit for purpose, if it can make choices like this.

dissociation?

I haven't (yet?) signed the open letter because I would be in an inconsistent position if I did so. The letter enjoins people to "refuse to contribute to projects related to the FSF and RMS"; as a co-maintainer of GNU Guile, which has its origins in the heady 1990s of the FSF but has nothing to do any more with RMS, but whose copyrights are entirely held by the FSF, is hosted on FSF-run servers, and is even obliged (GPLv3 §5d, as referenced by LGPLv3) to print out Copyright (C) 1995-2021 Free Software Foundation, Inc. when it starts, I must admit that I contribute to a project that is "related to the FSF". But I don't see how Guile could continue this association, if the FSF board continues as it is. It's bad for contributors and for the future of the project.

It would be very tricky to disentangle Guile from the FSF -- consider hosting, for example -- so it's not the work of a day, but it's something to think about.

Of course I would rather that the FSF wouldn't allow itself to be seen as an essentially misogynist organization. So clean house, FSF!

on the nature of fire

Reflecting on how specifically we could have gotten here -- I don't know. I don't know the set of voting members at the FSF, what discussions were made, who voted what. But, having worked as a volunteer on GNU projects for almost two decades now, I have a guess. RMS and his closest supporters see themselves as guardians of the flame of free software -- a lost world of the late 70s MIT AI lab, reborn in a flurry of mid-80s hack, but since 25 years or so, slipping further and further away. These are dark times, in their view, and having the principled founder in a leadership role can only be a good thing.

(Of course, the environment in the AI lab was only good for some. The treatment of Margaret Hamilton as recounted in Levy's Hackers shows that not all were welcome. If this were just one story, I would discount it, but looking back, it does seem to be part of a pattern.)

But is that what the FSF is for today? If so, Guile should certainly leave. I'm not here for software as performative nostalgia -- I'm here to have fun with friends and start a fire. The FSF should look to do the same -- look at the world we are in, look where the energy is now, and engage in real conversations about success and failure and tactics. There is a world to win and doubling down on RMS won't get us there from here.

by Andy Wingo at March 25, 2021 12:22 PM

March 17, 2021

Samuel Iglesias

VK_KHR_depth_stencil_resolve support on Turnip

Last year, I worked on Turnip driver development as my daily job at Igalia. One of my tasks was implementing support for the VK_KHR_depth_stencil_resolve extension.

VK_KHR_depth_stencil_resolve

This extension adds support for automatically resolving multisampled depth/stencil attachments in a subpass, in a similar manner as for color attachments. It does not, however, add support for resolving MSAA depth/stencil images with the vkCmdResolveImage() command.

As you can imagine, this extension is easy to use in any application. Unless you are using a driver that supports Vulkan 1.2 or higher (VK_KHR_depth_stencil_resolve was promoted to core in Vulkan 1.2), you should first check that it is supported. Once you have done that, ask the driver which features are supported by extending VkPhysicalDeviceProperties2 with a VkPhysicalDeviceDepthStencilResolveProperties structure when calling vkGetPhysicalDeviceProperties2().

struct VkPhysicalDeviceDepthStencilResolveProperties {
    VkStructureType       sType;
    void*                 pNext;
    VkResolveModeFlags    supportedDepthResolveModes;
    VkResolveModeFlags    supportedStencilResolveModes;
    VkBool32              independentResolveNone;
    VkBool32              independentResolve;
}

This structure will be filled by the driver to indicate the different depth/stencil resolve modes supported by the driver (more info about their meaning and possible values in the spec).

Next step: just fill a VkSubpassDescriptionDepthStencilResolve struct with the desired resolve mode and the depth/stencil attachment used for the resolve. Then, extend the VkSubpassDescription2 struct with it. And that’s all.

struct VkSubpassDescriptionDepthStencilResolve {
    VkStructureType                  sType;
    const void*                      pNext;
    VkResolveModeFlagBits            depthResolveMode;
    VkResolveModeFlagBits            stencilResolveMode;
    const VkAttachmentReference2*    pDepthStencilResolveAttachment;
}

Turnip implementation

Implementing this extension on Turnip was more or less direct, although there were some issues to fix.

Support for all depth/stencil formats was added in both resolve paths used by the driver, one for sysmem (system memory) and the other for gmem (tile buffer), including flushing the depth cache when needed. However, for the VK_FORMAT_D32_SFLOAT_S8_UINT format, which has the stencil part in a separate plane, it was necessary to add specific code to extend the 2D path used in the driver for sysmem resolves.

However, the main issue appeared when testing the extension implementation on the Adreno 630 GPU. It turned out that the VK-GL-CTS tests exercising this extension for MSAA VK_FORMAT_D24_UNORM_S8_UINT formats were always failing, except when disabling UBWC via the TU_DEBUG=noubwc environment variable. UBWC (Universal Bandwidth Compression) is a HW feature designed to improve throughput to system memory by minimizing the bandwidth of data (which also gives some power savings). The problem was that UBWC support for the MSAA VK_FORMAT_D24_UNORM_S8_UINT format is known to be broken on Adreno 630 and Adreno 618 (see the merge request for the freedreno driver). I just needed to disable it in Turnip to fix these failures.

I also found other VK_KHR_depth_stencil_resolve CTS tests failing: the ones testing format compatibility for the VK_FORMAT_D32_SFLOAT_S8_UINT and VK_FORMAT_D24_UNORM_S8_UINT formats. For the VK_FORMAT_D32_SFLOAT_S8_UINT failures, it was necessary to take into account that this format has a separate plane for the stencil part when resolving it to VK_FORMAT_S8_UINT. For the VK_FORMAT_D24_UNORM_S8_UINT failures, the problem was that we were setting the resolve mode used by the HW wrongly: it was doing a sample average when we wanted the value of sample 0. This merge request fixed both issues.

And that’s all. This was an extension that allowed me to dive into the different resolve paths used by Turnip and learn one or two things about the HW ;-) Thanks a lot to Jonathan Marek for his reviews and suggestions to improve the implementation of this extension.

Happy hacking!

March 17, 2021 08:14 AM

March 16, 2021

Iago Toral

Improving performance of the V3D compiler for OpenGL and Vulkan

Lately, I have been looking at improving performance of the V3DV Vulkan driver for the Raspberry Pi 4. So far we had been toying a lot with some Vulkan ports of the Quake trilogy but we wanted to have a look at more modern games as well, and for that we started to look at some Unreal Engine 4 samples, particularly the Shooter demo.


Unreal Engine 4 Shooter Demo

In our initial tests with “High” settings, even at 480p we were running the sample at 8-15 fps, with 720p being mostly in the 5-10 fps range. Not great, obviously, but a good opportunity to guide and focus our performance efforts.

One aspect of the UE4 sample that was immediately obvious compared to the Quake games is that the shading is a lot more expensive in general, and more specifically, it involves more texture lookups and UBO loads, which require expensive accesses to memory from the shader cores, so this was the first area we targeted. The good thing about this is that because our Vulkan and OpenGL drivers share the compiler stack, any optimizations we do here benefit both drivers.

What follows is a brief discussion of some of the optimizations we did recently to our backend compiler and the results we have observed from this work.

Optimizing the backend compiler

So the first thing we tackled was better managing the latency of texture lookups. Interestingly, the NIR scheduler was setting this up so that we would try to put instructions that consumed the result of a texture lookup as far away as possible from the instruction that triggered the lookup, but then the backend compiler was not fully taking advantage of this and would still end up waiting on lookup results sooner than it should.
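
A toy model of why that distance matters: any lookup latency that is not covered by independent instructions between the lookup and its first consumer shows up as a stall. The latency value below is made up for illustration, not a real V3D figure:

```python
# Toy model (not the actual V3D scheduler) of latency hiding for
# texture lookups: latency not hidden by independent work in between
# the lookup and its first consumer is exposed as a stall.

TEX_LATENCY = 9  # hypothetical lookup latency, in instruction slots

def stall_cycles(issue_idx, consumer_idx, latency=TEX_LATENCY):
    independent_work = consumer_idx - issue_idx - 1
    return max(0, latency - independent_work)

# Consumer right after the lookup: almost the full latency is exposed.
print(stall_cycles(0, 1))   # 9
# Nine independent instructions scheduled in between: no stall at all.
print(stall_cycles(0, 10))  # 0
```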

Fixing this helped performance by 1%-3%, although it could go a bit above that in some cases. It has a caveat though: doing this extends the liveness of our lookup sequences, and that makes spilling more difficult (we can’t emit spills/unspills in the middle of an outstanding memory lookup), so when register pressure is high enough that we need to inject register spills to compile the shader, we would typically be a lot more constrained and end up producing significantly worse code, or even worse, failing to register allocate for the shader completely. To avoid this, we recompile the shader without the optimization if we detect that we need to do any spilling. One can use V3D_DEBUG=perf to detect if this is happening for any shaders, looking for messages like this:

Falling back to strategy 'disable TMU pipelining' for MESA_SHADER_FRAGMENT.
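
The fallback mechanism can be sketched like this in Python. The strategy names match the message above, but the shader representation and the compile function are hypothetical stand-ins for the real V3D compiler:

```python
# Hedged sketch of the compile-strategy fallback: try the aggressive
# strategy first and, if it forces spilling, recompile with TMU
# pipelining disabled. fake_compile() is a stand-in, not the driver.

def fake_compile(shader, strategy):
    # Pretend pipelining extends liveness enough to force spills
    # whenever register pressure is high.
    high_pressure = shader["registers_needed"] > 64
    spills = 4 if (high_pressure and strategy == "default") else 0
    return {"strategy": strategy, "spills": spills}

def compile_shader(shader, compile_fn=fake_compile):
    strategies = ["default", "disable TMU pipelining"]
    for strategy in strategies:
        result = compile_fn(shader, strategy)
        if result["spills"] == 0 or strategy == strategies[-1]:
            if strategy != "default":
                print(f"Falling back to strategy '{strategy}'")
            return result

print(compile_shader({"registers_needed": 32})["strategy"])  # default
print(compile_shader({"registers_needed": 80})["strategy"])
# -> disable TMU pipelining (after printing the fallback message)
```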

While the above optimization was useful, I was expecting it to make a larger impact for this particular demo, so I kept looking for ways to make our memory lookups more efficient. One thing relevant to this analysis is that we were using the same hardware unit for both texture and UBO lookups, but for the latter we could use a different strategy: handling our UBOs as uniform streams. This has some caveats, but to make a long story short: given that many UBO loads use uniform addresses, that we usually read a bunch of consecutive scalars from a UBO load (such as a full vec4), and that applications usually emit UBO loads for nearby addresses together, we can emit fairly optimal code for many of these, leading to more efficient memory access in general.
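
A toy illustration of that last point (plain Python, not the actual code generation): once the addresses are uniform, runs of consecutive scalar UBO loads can be folded into wider fetches instead of paying one memory lookup each:

```python
# Toy sketch: group consecutive scalar UBO load offsets (in dwords)
# into runs of up to vec_width, each run standing for a single wider
# fetch from the uniform stream. Assumes sorted, non-empty offsets.

def coalesce_ubo_loads(offsets, vec_width=4):
    groups, run = [], [offsets[0]]
    for off in offsets[1:]:
        if off == run[-1] + 1 and len(run) < vec_width:
            run.append(off)
        else:
            groups.append(run)
            run = [off]
    groups.append(run)
    return groups

# A full vec4 plus two nearby scalars: 6 scalar loads become 2 fetches.
print(coalesce_ubo_loads([0, 1, 2, 3, 8, 9]))  # [[0, 1, 2, 3], [8, 9]]
```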

Eventually, this turned out to be a big win, yielding 20%-30% improvements for the Shooter demo even with the initial basic implementation, which we would then tune and optimize further.

Again related to memory accesses, I have also been improving how we schedule the instructions involved in setting up memory lookups. Our scheduler was more restrictive here than it needed to be, and the extra flexibility can help reduce instruction counts, which will affect these more modern games most, as they are more likely to emit a larger number of texture/image operations in their shaders.

Another thing we did was to improve instruction packing. The V3D GPU is able to emit multiple instructions in the same cycle so long as the instructions meet some requirements. It turns out our assembly scheduler was being too restrictive with this and we could do better. Going by shader-db results, this led to ~5% fewer instructions in our programs and added another modest 1%-2% performance improvement for the Shooter demo.
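
The shape of the problem can be sketched with a greedy toy packer. The real pairing rules on the V3D QPU are much more involved; this sketch only models pairing one add-slot op with one mul-slot op per instruction word, with invented names:

```python
# Toy instruction packer (invented rules, not the real V3D scheduler):
# pair an "add"-slot op with a following "mul"-slot op into a single
# dual-issued instruction word when possible, preserving program order.

def pack(ops):
    """ops: list of (kind, dest) with kind in {'add', 'mul'}."""
    packed, pending_add = [], None
    for op in ops:
        kind, _ = op
        if kind == "add" and pending_add is None:
            pending_add = op
        elif kind == "mul" and pending_add is not None:
            packed.append([pending_add, op])  # dual-issued word
            pending_add = None
        else:
            if pending_add is not None:       # can't pair: flush it
                packed.append([pending_add])
                pending_add = None
            if kind == "add":
                pending_add = op
            else:
                packed.append([op])
    if pending_add is not None:
        packed.append([pending_add])
    return packed

prog = [("add", "r0"), ("mul", "r1"), ("add", "r2"), ("add", "r3")]
print(len(prog), "->", len(pack(prog)), "instruction words")  # 4 -> 3
```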

Another issue we noticed is that a lot of the shaders were passing a lot of varyings from the vertex to the fragment shaders, and our setup for fragment shader inputs was not optimal. The issue here was that there is a specific instruction involved in this process that writes two registers, one of them with one instruction of delay, and our dependency tracking was not able to handle this properly, effectively assuming that both registers are written by the same instruction, which then had an impact on how we scheduled instructions. Fixing this required handling this case specially in our scheduler, so we could be more effective at scheduling these instructions in a way that enables optimal pipelining of the instructions involved in varying setup for fragment shaders.

Another aspect we improved was related to our uniform handling. Generally, there are many instances in which we need to emit duplicate uniforms. There are reasons for that related to how the GPU works, but in some cases, for example with consecutive UBO loads, we would emit the uniform with the base address of the UBO multiple times very close to each other. We can obviously do better by tracking previous uses of a uniform/constant value and reusing them in nearby instructions. Of course, this comes at the expense of increased register pressure (for reasons beyond the scope of this post, our shaders typically require a lot of uniforms), so there is a balancing game to play here too. Reducing the size of our uniform streams this way also plays an important role in lowering some of the CPU overhead of the driver, since these streams need to be rebuilt often when certain pipeline states change.
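
The balancing game can be illustrated with a tiny Python sketch (invented representation, not the real uniform stream): reuse a recently loaded uniform instead of emitting a duplicate, but only within a small window, since keeping the old value alive costs a register:

```python
# Toy uniform deduplication: reuse a value loaded at most `window`
# instructions ago (keeping its register live) instead of re-emitting
# it. The window bounds the extra register pressure we accept.

def dedup_uniforms(stream, window=4):
    emitted = {}   # value -> index of the instruction that loaded it
    out = []
    for i, value in enumerate(stream):
        prev = emitted.get(value)
        if prev is not None and i - prev <= window:
            out.append(("reuse", prev))   # keep the old register live
        else:
            out.append(("load", value))
            emitted[value] = i
    return out

stream = ["ubo0_base", "ubo0_base", "k1", "ubo0_base"]
result = dedup_uniforms(stream)
loads = sum(1 for op, _ in result if op == "load")
print(loads)  # 2: the duplicate ubo0_base loads became reuses
```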

Finally, an optimization that was more targeted at reducing register pressure than at improving performance: we noticed that we would sometimes place instructions far away from their consumers with no obvious benefit. This was sometimes bad enough that it would even cause us to be unable to compile some shaders. Mesa has a handy NIR pass for this called nir_opt_sink, which proved helpful here as well. It allowed us to get a few more shaders from shader-db to compile and to reduce spilling for a bunch of shaders. For the Shooter demo, this changed a large compute shader involved in histogram post-processing from 48 spills and 50 fills to only 8 spills and 15 fills. While the impact on performance is probably very small, since the game only runs this pass once per frame, it made a notable difference in loading time, since compiling a shader with this much spilling is very slow at present. I have a mental note to improve this some day (I know Intel managed to fix this in their compiler), but for the time being, this alone made the loading time much more reasonable.
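
The effect of a sinking pass can be shown with a minimal Python sketch. nir_opt_sink is far more careful (dominance, side effects, loops, a configurable set of movable instructions); the core idea is just moving a definition down towards its first use to shorten its live range:

```python
# Minimal sketch of instruction sinking: move each definition down to
# just above its first use. Assumes single assignment, no side effects,
# and omits the dependency checks a real pass must do.

def sink(program):
    """program: list of (dest, srcs) tuples. Single bottom-up pass."""
    prog = list(program)
    i = len(prog) - 1
    while i >= 0:
        dest, _ = prog[i]
        uses = [j for j, (_, srcs) in enumerate(prog) if dest in srcs]
        if uses and uses[0] > i + 1:
            # Pop the def and re-insert it just above its first use.
            prog.insert(uses[0] - 1, prog.pop(i))
        i -= 1
    return prog

# "a" is defined at the top but only used at the bottom; sinking it
# shortens its live range from 3 instructions to 1.
prog = [("a", []), ("b", []), ("c", ["b"]), ("d", ["a"])]
print(sink(prog))  # [('b', []), ('c', ['b']), ('a', []), ('d', ['a'])]
```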

Results

First, here are some shader-db stats, which describe how these optimizations change various statistics for a large collections of real world shaders:

total instructions in shared programs: 14992758 -> 13731927 (-8.41%)
instructions in affected programs: 14003658 -> 12742827 (-9.00%)
helped: 80448
HURT: 4297

total threads in shared programs: 407932 -> 412242 (1.06%)
threads in affected programs: 4514 -> 8824 (95.48%)
helped: 2189
HURT: 34

total uniforms in shared programs: 4069524 -> 3790401 (-6.86%)
uniforms in affected programs: 2267834 -> 1988711 (-12.31%)
helped: 40588
HURT: 1186

total max-temps in shared programs: 2388462 -> 2322009 (-2.78%)
max-temps in affected programs: 897803 -> 831350 (-7.40%)
helped: 30598
HURT: 2464

total spills in shared programs: 6241 -> 5940 (-4.82%)
spills in affected programs: 3963 -> 3662 (-7.60%)
helped: 75
HURT: 24

total fills in shared programs: 14587 -> 13372 (-8.33%)
fills in affected programs: 11192 -> 9977 (-10.86%)
helped: 90
HURT: 14

total sfu-stalls in shared programs: 28106 -> 31489 (12.04%)
sfu-stalls in affected programs: 16039 -> 19422 (21.09%)
helped: 4382
HURT: 6429

total inst-and-stalls in shared programs: 15020864 -> 13763416 (-8.37%)
inst-and-stalls in affected programs: 14028723 -> 12771275 (-8.96%)
helped: 80396
HURT: 4305

Lower is better for all stats except threads. We can see significant improvements across the board: we generally produce shaders with fewer instructions and lower maximum register pressure, we reduce spills and uniform counts, and we can run with more threads. We only get worse on stalls, but that is generally because we now produce more compact code with fewer instructions, so more stalls are expected.

Another good thing in these stats is the large number of helped shaders compared to hurt shaders, meaning that it is very likely that these optimizations will help most applications to some extent.

But enough boring compiler statistics; most people won’t care about that. What they want to know is how this impacts performance in actual games and applications, which is what the following graph shows (these results were obtained by replaying specific traces with gfx-reconstruct). Keep in mind that while I am using a collection of Vulkan samples/games here, these optimizations are expected to apply to OpenGL too.


Before vs After

Framerate improvement after optimization (in %)

As can be observed from the graph above, this optimization work made a significant impact on the observed framerate in all cases. It is not surprising that the UE4 demo is the one that sees the largest improvement, considering it is the one we used to guide most of the optimization work.

Other optimizations and future work

In this post I have been focusing exclusively on compiler optimizations, but we have also been improving other parts of the Vulkan driver. While I won’t go into details to avoid making this post too long, we have also been improving aspects of the driver involved with buffer to image copies, depth buffer clears, dirty descriptor state management, usage of the TFU unit for transfer operations and more.

Finally, there is one other aspect of this UE4 demo that is pretty obvious as soon as you start a game: it can compile a lot of shaders in the middle of the gameplay loop, which can lead to significant stutter. While there is not much we can do about this on the driver side, adding support for an on-disk shader cache should eliminate the problem on sessions after the first, so this is something that we may work on in the future.

We will certainly continue looking at improving the driver performance in the future, so stay tuned for further updates on our progress, or maybe join us at #videocore on Freenode.

by Iago Toral at March 16, 2021 09:36 AM

Eric Meyer

First Month at Igalia

Today marks one month at Igalia.  It’s been a lot, and there’s more to come, but it’s been a really great experience.  I get to do things I really enjoy and value, and Igalia supports and encourages all of it without trying to steer me in specific directions.  I’ve been incredibly lucky to experience that kind of working environment twice in my life — and the other one was an outfit I helped create.

Here’s a summary of what I’ve been up to:

  • Generally got up to speed on what Igalia is working on (spoiler: a lot).
  • Redesigned parts of wpewebkit.org, fixed a few outstanding bugs, edited most of the rest. (The site runs on 11ty, so I’ve been learning that as well.)
  • Wrote a bunch of CSS tests/demos that will form the basis for other works, like articles and videos.
  • Drafted a few of said articles.  As I write this, two are very close to being complete, and a third is almost ready for editing.
  • Edited some pages on the Mozilla Developer Network (MDN), clarifying or upgrading text in some places and replacing unclear examples in others.
  • Joined the Open Web Docs Steering Committee.
  • Reviewed various specs and proposals (e.g., Miriam’s very interesting @scope proposal).

And that’s not all!  Here’s what I have planned for the next few months:

  • More contributions to MDN, much of it in the CSS space, but also branching out into documenting some up-and-coming APIs in areas that are fairly new to me.  (Details to come!)
  • Contributions to the Web Platform Tests (WPT), once I get familiar with how that process is structured.
  • Articles on topics that will include (but are not limited to!) gaps in CSS, logical properties, and styling based on writing direction.  I haven’t actually settled on outlets for those yet, so if you’d be interested in publishing any of them, hit me up.  I usually aim for about a thousand words, including example markup and CSS.
  • Very likely will rejoin the CSS Working Group after a (mumblecough)-year absence.
  • Assembling a Raspberry Pi system to test out WPEWebKit in its native, embedded environment, and to get a handle on how to create a “setting up WPEWebKit for total embedded-device noobs” guide (I am one of those noobs).

That last one will be an entirely new area for me, as I’ve never really worked with an embedded-device browser before.  WPEWebKit is a WebKit port, actually the official WebKit port for embedded devices, and as such is aggressively tuned for performance and low resource demand.  I’m really looking forward to not only seeing what it’s like to use it, but also how I might be able to leverage it into some interesting projects.

WPEWebKit is one of the reasons why Igalia is such a big contributor to WebKit, helping drive its standards support forward and raise its interoperability with other browser engines.  There’s a thread of self-interest there: a better WebKit means a better WPEWebKit, which means more capable embedded devices for Igalia’s clients.  But after a month on the inside, I feel comfortable saying most of Igalia’s commitment to interoperability is philosophical in nature — they truly believe that more consistency and capability in web browsers benefits everyone.  As in, THIS IS FOR EVERYONE.

And to go along with that, more knowledge and awareness is seen as an unvarnished good, which is why they’re having me working on MDN content.  To that end, I’m putting out an invitation here and now: if you come across a page on MDN about CSS or HTML that confuses you, or seems inaccurate, or just doesn’t have much information at all, please get in touch to let me know, particularly if you are not a native English speaker.

I can’t offer translation services, unfortunately, but I can do my best to make the English content of MDN as clear as possible.  Sometimes, what makes sense to a native English speaker is obscure or unclear to others.  So while this offer is open to everyone, don’t hold back if you’re struggling to parse the English.  It’s more likely the English is unclear and imprecise, and I’d like to erase that barrier if I can.

The best way to submit a report is to send me email with [MDN] and the URL of the page you’re writing about in the subject line.  If you’re writing about a collection of pages, put the URLs into the email body rather than the subject line, but please keep the [MDN] in the subject so I can track it more easily.  You can also ping me on Twitter, though I’ll probably ask you to email me so I don’t lose track of the report.  Just FYI.

I feel like there was more, but this is getting long enough and anyway, it already seems like a lot.  I can’t wait to share more with you in the coming months!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at March 16, 2021 12:53 AM

March 09, 2021

Igalia Compilers Team

Igalia’s Compilers Team in 2020

In a previous blog post, we introduced the kind of work the Igalia compilers team does and gave a mid-year update on our 2020 progress.

Now that we have made our way into 2021, we wanted to recap our achievements from 2020 and update you on the exciting improvements we have been making to the web programming platform. Of course, we couldn’t have done this work alone; all of this was brought to you through our collaborations with our clients and upstream partners in the web ecosystem.

https://twitter.com/rkirsling/status/1276299298020290561

JavaScript class features

Our engineers at Igalia have continued to push forward on improvements to JS classes. Earlier in 2020, we had landed support for public field declarations in JavaScriptCore (JSC). In the latter half of 2020, we achieved major milestones such as getting private class fields into JSC (with optimizing compiler support!):

https://twitter.com/robpalmer2/status/1276378092349657095
https://twitter.com/caitp88/status/1318919341467979776

as well as static public and private fields.

We also helped ship private methods and accessors in V8 version 84. Our work on private methods also landed in JSC and we expect it to be available in future releases of Safari.

These additions will help JS developers create better abstractions by encapsulating state and behavior in their classes.

TC39 and Temporal

Our compilers team also contributed throughout 2020 to web standards through its participation in TC39 and related standards bodies.

One of the big areas we have been working on is the Temporal proposal, which aims to provide better date and time handling in JS. When we blogged about this in mid-2020, the proposal was still in Stage 2 but we’re expecting it to go Stage 3 soon in 2021. Igalians have been working hard on many aspects of the proposal since mid-2020, including managing community feedback, working on the polyfill, and maintaining the documentation.

For more info on Temporal, also check out a talk by one of our engineers, Ujjwal Sharma, at Holy JS 2020 Piter.

Another area we have been contributing to for a number of years is the ECMA-402 Internationalization (Intl) standard, an important effort that provides i18n support for JS. We help maintain and edit the specification while also contributing tests and pushing Intl proposals forward. For example, we helped with the test suite of the Intl.Segmenter feature for implementing localized text segmentation, which recently shipped in Chrome. For a good overview of other recent Intl efforts, check out these slides from IUC44.

We’re also contributing to many other proposed features for JS, such as WeakRefs, Decimal (Daniel Ehrenberg from our team gave a talk on this at Node.TLV 2020), Import Assertions, Records & Tuples, Top-level await, and Module blocks & module bundling (Daniel also gave a talk on these topics at Holy JS 2020 Moscow).

Node.js

In addition to our contributions to the client side of the web, we are also contributing to server side use of web engines. In particular, we have continued to contribute to Node.js throughout 2020.

Some notable contributions include adding experimental support for per-context memory measurements in version 13 during early 2020.

Since late 2020, we have been working on improving Node.js startup speed by moving more of the bootstrap process into the startup snapshot. For more on this topic, you can watch a talk that one of our engineers, Joyee Cheung, presented at NodeConf Remote 2020 here (slides are available here).

JSC support on 32-bit platforms

Our group also continues to maintain support in JSC for 32-bit platforms. Earlier in 2020 we contributed improvements to JSC on 32-bit such as tail call optimizations, support for checkpoints, and others.

Since then we have been optimizing LLInt (the low-level interpreter for JSC) on 32-bit, and porting the support for inline caching of delete operations to 32-bit (to improve the performance of delete; you can read about the background of the original optimization on the WebKit blog here).

We also blogged about our efforts to support the for-of intrinsic on 32-bit to improve iteration on JS arrays.

WebAssembly

Finally, we have made a number of contributions to WebAssembly (Wasm), the new low-level compiler-target language for the web, on both the specification and implementation sides.

During 2020, we helped ship and standardize several Wasm features in web engines such as support for multiple-values, which can help compilers to Wasm produce better code, and support for BigInt/I64 conversion in the JS API, which lifts a restriction that made it harder to interact with Wasm programs from JS.

We’ve also improved support in tools such as LLVM for the reference types proposal, which adds new types to the language that can represent references to values from JS or other host languages. Eventually reference types will be key to supporting the garbage collection proposal (in which references are extended to new struct and array types), which will allow for easier compilation of languages that use GC to Wasm.

We’re also actively working on web engine support for exception handling, reference types, and other proposals while continuing to contribute to tools and specification work. We plan to help ship more WebAssembly features in browsers during 2021, so look forward to our mid-year update post!

by Compilers Team at March 09, 2021 09:35 AM

March 03, 2021

Samuel Iglesias

Igalia is hiring!

One of the best decisions I ever made was joining Igalia in 2012. Inside Igalia, I have been working on different open-source projects, most of the time related to graphics technologies, interacting with different communities, giving talks, organizing conferences and, more importantly, contributing to free software as my daily job.

Now I’m thrilled to announce that we are hiring for our Graphics team!

Igalia

Right now we have two open positions:

  • Graphics Developer.

    We are looking for candidates who would like to contribute to open-source OpenGL/Vulkan graphics drivers (Mesa), or to other areas of the open-source graphics stack such as X11 or Wayland, among others. If you have experience with them, or you are very motivated to become an expert there, just send us your CV!

  • Kernel Developer.

    We are looking for candidates who either have experience with kernel development or can ramp up quickly to contribute to Linux kernel drivers. Although no specific subsystem is mentioned in the job posting, I encourage you to apply if you have DRM experience and/or ARM[64]/MIPS-related knowledge.

Graphics technologies are not your cup of tea? We have positions in other areas like browsers, compilers, multimedia… Just check out our job offers on our website!

What we offer is to work in an open-source consultancy in which you can participate equally in the management and decision-making process of the company via our democratic, consensus-based assembly structure. As all of our positions are remote-friendly, we welcome submissions from any part of the world.

Are you still a student? We have launched the 2021 edition of our Coding Experience program. Check it out!

Igalia's office

March 03, 2021 01:21 PM

February 28, 2021

Manuel Rego

:focus-visible in WebKit - February 2021

One month has passed since the previous report so it’s time for a status update.

As you probably already know, Igalia is working on the implementation of :focus-visible in WebKit, a project that is being sponsored by many people and organizations through the Open Prioritization campaign. We’ve reached 84% of the goal, thank you all! 🎉 And if you haven’t contributed yet, you’re still in time to do so if you believe this work is important to you.

The main highlight for February is that initial work has started to land in WebKit, though some important bits are still under review.

Spec issues

There were some open discussions on the spec side regarding different topics, let’s review here the status of them:

  • :focus-visible on <select> element: After some discussion on the CSS Working Group (CSSWG) issue, it was agreed to remove the <select> element from the tests; each browser can decide whether to match :focus-visible when it’s clicked, as there was no clear agreement on how to interpret whether it allows keyboard input or not.

    In any case, Chromium still has a bug on elements that have a popup (like a <select>). When you click them they match :focus-visible, but they don’t show the focus indicator (because Chromium wants to avoid showing two focus indicators, one on the <select> and another one on the <option>). As it’s not showing a focus indicator, it shouldn’t actually match :focus-visible in that situation.

  • :focus-visible on script focus: The spec is not totally clear about when an element should match (or not match) :focus-visible after script focus. There was a nice proposal from Alice Boxhall and Brian Kardell on the CSSWG issue, but when we discussed this in the CSSWG it was decided that it was not the proper forum for these discussions. This is because the CSSWG has defined that :focus-visible should match when the browser shows a focus indicator to the user, without specifying when the indicator is actually shown. That definition is very clear, and even though each browser shows the focus indicator (or not) in different situations, the definition remains correct.

    Currently the heuristics in the spec are not normative; they’re just hints to browsers about when to match :focus-visible (when to show a focus indicator). But I believe it’d be really nice to have interoperability here, so people using this feature won’t run into weird problems here and there. The CSSWG’s suggestion was therefore to discuss this in the HTML spec directly, proposing there a set of rules about when a browser should show a focus indicator to the user; those rules would be the current heuristics from the :focus-visible spec with some slight modifications to cover the corner cases that have been discussed. Hopefully we can reach an agreement between the different parties and manage to define this properly in the HTML spec, so all the implementations can be interoperable in this regard.

    I believe we need to dig deeper into the specific case of script focus, as I’m not totally sure how some scenarios (e.g. blur() before focus() and things like that) should work. For that reason I worked on a set of tests trying to clarify the different situations in which the browser should or should not show a focus indicator. These need to be discussed with more people to see if we can reach an agreement and prepare some spec text for HTML.

  • :focus-visible and Shadow DOM: Another topic already explained in the previous report. My proposal to the CSSWG was to avoid matching :focus-visible on the ShadowRoot when some element in the Shadow Tree has the focus, in order to avoid having two focus indicators.

    A concern has been raised that this would make it possible to guess whether an element is a ShadowRoot by focusing it via script and then checking whether it matches :focus-visible (but ShadowRoot elements shouldn’t be observable). However, that’s already possible in WebKit, which currently uses :-webkit-direct-focus in the default User Agent style sheet to avoid the double focus indicator in this case. In WebKit you can focus an element via script and check whether it has an outline to determine if it’s a ShadowRoot.

    Anyway, as in the previous case, this would be part of the heuristics, so according to the CSSWG’s suggestion it should be discussed in the HTML spec directly.

Default User Agent style sheet

Early this month I landed a patch to start using :focus-visible in the Chromium User Agent style sheet; this is included in version 90. 🚀 This means that from that version on you won’t see an outline when you click on a <div tabindex="0">, only when you focus it with the keyboard. Also, the hack :focus:not(:focus-visible) won’t be needed anymore (it has actually been removed from the spec too).

In addition, Firefox is also using :focus-visible on their User Agent style sheet since version 87.

More about tests

During this month there has still been some extra work on the tests. While I was implementing things in WebKit I noticed some minor issues in the tests, which have been fixed along the way.

I also found some limitations in WebKit’s testdriver.js support for simulating keyboard input. Some of the :focus-visible tests use the method test_driver.send_keys() to send keys like Control or Enter, so I added support for them in WebKit. Apart from that, I fixed how modifier keys are identified in WebKitGTK and WPE, as they were not exactly following other browsers (e.g. event.ctrlKey was not set on the keydown event, only on keyup).

WebKit implementation

And now the most important part: the actual WebKit implementation has been moving forward during this month. I managed to get a patch that passed all the tests, and split it up a bit in order to merge things upstream.

The first patch, which just does the parsing of the new pseudo-class and adds an experimental flag, has already landed.

Now a second patch is under review. It originally contained the whole implementation that passes all the tests, but due to some discussion of the script focus issues, that part has been removed. Anyway, the review is ongoing and hopefully it’ll land soon, so you can start testing it in the WebKit nightlies.

Some numbers

As in the previous post, let’s review the numbers for what has been done in this project:

  • 20 PRs merged in WPT (7 in February).
  • 14 patches landed in WebKit (9 in February).
  • 7 patches landed in Chromium (3 in February).
  • 1 PR merged in Selectors spec (1 in February).
  • 1 PR merged in HTML spec (1 in February).

Next steps

First thing is to get the main patch landed in WebKit and verify that things are working as expected on the different platforms.

Another issue we need to solve is to reach an agreement on how script focus should work regarding :focus-visible, and then get that implemented in WebKit covering all the cases.

After that we can request enabling the feature by default in WebKit. Once that’s done, we can discuss the possibility of changing the default User Agent style sheet to use :focus-visible too.

There is some interop work pending. A few things are failing in Firefox and we could try to help fix them. Also, some weird issues with <select> elements in Chromium might need some love too. And depending on the ongoing spec discussions, there might be changes needed in the different browsers. Anyway, we may or may not find the time to do this; let’s see how things evolve in the next weeks.

Big thanks to everyone who has contributed to this project; you’re all wonderful for letting us work on this. 🙏 Stay tuned for more updates in the future!

February 28, 2021 11:00 PM

February 22, 2021

Eric Meyer

First Week at Igalia

The first week on the job at Igalia was… it was good, y’all.  Upon formally joining the Support Team, I got myself oriented, built a series of tests-slash-demos that will be making their way into some forthcoming posts and videos, and forked a copy of the Mozilla Developer Network (MDN) so I can start making edits and pushing them to the public site.  In fact, the first of those edits landed Sunday night!  And then there was the usual setting up of accounts and figuring out of internal processes and all that stuff.

A series of tests of the CSS logical property 'border-block'.
Illustrating the uses of border-block.

To be perfectly honest, a lot of my first-week momentum was provided by the rest of the Support Team, and setting expectations during the interview process.  You see, at one point in the past I had a position like this, and I had problems meeting expectations.  This was partly due to my inexperience working in that sort of setting, but also partly due to a lack of clear communication about expectations.  Which I know because I thought I was doing well in meeting them, and then was told otherwise in evaluations.

So when I was first talking with the folks at Igalia, I shared that experience.  Even though I knew Igalia has a different approach to management and evaluation, I told them repeatedly, “If I take this job, I want you to point me in a direction.”  They’ve done exactly that, and it’s been great.  Special thanks to Brian Kardell in this regard.

I’m already looking forward to what we’re going to do with the demos I built and am still refining, and to making more MDN edits, including some upgrades to code examples.  And I’ll have more to say about MDN editing soon.  Stay tuned!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at February 22, 2021 09:32 PM

February 15, 2021

Eric Meyer

First Day at Igalia

Today is my first day as a full-time employee at Igalia, where I’ll be doing a whole lot of things I love to do: document and explain web standards at MDN and other places, participate in standards work at the W3C, take on some webmaster duties, and play a part in planning Igalia’s strategy with respect to advancing the web.  And likely other things!

I’ll be honest, this is a pretty big change for me.  I haven’t worked for anyone other than myself since 2003.  But the last time I did work for someone else, it was for Netscape (slash AOL slash Time Warner) as a Standards Evangelist, a role I very much enjoyed.  In many ways, I’m taking that role back up at Igalia, in a company whose values and structure are much more in line with my own.  I’m really looking forward to finding out what we can do together.

If the name Igalia doesn’t ring any bells, don’t worry: nobody outside the field has heard of them, and most people inside the field haven’t either.  So, remember when CSS Grid came to browsers back in 2017?  Igalia did the implementation that landed in Safari and Chromium.  They’ve done a lot of other things besides that — some of which I’ll be helping to spread the word about — but it’s the thing that web folks will be most likely to recognize.

This being my first day and all, I’m still deep in the setting up of logins and filling out of forms and general orienting of oneself to a new team and set of opportunities to make a positive difference, so there isn’t much more to say besides I’m stoked and planning to say more a little further down the road.  For now, onward!


Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at February 15, 2021 05:23 PM

February 13, 2021

Eleni Maria Stea

About VK_EXT_sample_locations

More than a year ago, I worked on the implementation of the VK_EXT_sample_locations extension for anv, the Intel Vulkan driver in mesa3D, as part of my work for Igalia. The implementation had been reviewed (see acknowledgments) at the time, but as the conformance tests that were available back then had to be improved and that … Continue reading About VK_EXT_sample_locations

by hikiko at February 13, 2021 09:47 PM