Planet Igalia

September 21, 2016

Andy Wingo

is go an acceptable cml?

Yesterday I tried to summarize the things I know about Concurrent ML, and I came to the tentative conclusion that Go (and any Go-like system) was an acceptable CML. Turns out I was both wrong and right.

you were wrong when you said everything's gonna be all right

I was wrong, in the sense that programming against the CML abstractions lets you do more things than programming against channels-and-goroutines. Thanks to Sam Tobin-Hochstadt to pointing this out. As an example, consider a little process that tries to receive a message off a channel, and times out otherwise:

func withTimeout(ch chan int, timeout int) (result int) {
  var timeoutChannel chan int;
  var msg int;
  go func() {
    sleep(timeout)
    timeoutChannel <- 0
  }()
  select {
    case msg = <-ch: return msg;
    case msg = <-timeoutChannel: return 0;
  }
}

I think that's the first Go I've ever written. I don't even know if it's syntactically valid. Anyway, I think we see how it should work. We return the message from the channel, unless the timeout happens before.

But, what if the message is itself a composite message somehow? For example, say we have a transformer that reads a value from a channel and adds 1 to it:

func onePlus(in chan int) (result chan int) {
  var out chan int
  go func () { out <- 1 + <-in }()
  return out
}

What if we do a withTimeout(onePlus(numbers), 0)? Assume the timeout fires first and that's the result that select chooses. There's still that onePlus goroutine out there trying to read from in and at some point probably it will succeed, but nobody will read its value. At that point the number just vanishes into the ether. Maybe that's OK in certain domains, but certainly not in general!

What CML gives you is the ability to express an event (which is kinda like a possibility of sending or receiving a message on a channel) in such a way that we don't run into this situation. Specifically with the wrap combinator, we would make an event such that receiving on numbers would run a function on the received message and return that as the message value -- which is of course the same as what we have, except that in CML the select wouldn't actually read the message off unless it select'd that channel for input.

Of course in Go you could just rewrite your program, so that the select statement looks like this:

select {
  case msg = <-ch: return msg + 1;
  case msg = <-timeoutChannel: return 0;
}

But here we're operating at a lower level of abstraction; we were forced to intertwingle our concerns of adding 1 and our concerns of timeout. CML is more expressive than Go.

you were right when you said we're all just bricks in the wall

However! I was right in the sense that you can build a CML system on top of Go-like systems (though possibly not Go in particular). Thanks to Vesa Karvonen for this comment and the link to their proof-of-concept CML implementation in Clojure's core.sync. I understand Vesa also has an implementation in F# as well.

Folks should read Vesa's code, after reading the Reppy papers of course; it's delightfully short and expressive. The basic idea is that event composition operators like choose and wrap build up data structures instead of doing things. The sync operation then grovels through those data structures to collect a list of channels to pass on to core.async's equivalent of select. When select returns, sync determines which event that chosen channel and message corresponds to, and proceeds to "activate" the event (and, as a side effect, possibly issue NACK messages to other channels).

Provided you can map from the chosen select channel/message back to the event, (something that core.async can mostly do, with a caveat; see the code), then you can build CML on top of channels and goroutines.

o/~ yeah you were wrong o/~

On the other hand! One advantage of CML is that its events are not limited to channel sends and receives. I understand that timeouts, thread joins, and maybe some other event types are first-class event kinds in many CML systems. Michael Sperber, current Scheme48 maintainer and functional programmer, tells me that simply wrapping events in channels+goroutines works but can incur a big performance overhead relative to supporting those event types natively, due to the need to make the new goroutine and channel and the scheduling costs. He quotes 10X as the overhead!

So although CML and Go appear to be inter-expressible, maybe a proper solution will base the simple channel send/receive interface on CML rather than the other way around.

Also, since these events are now second-class, it must be OK to lose these events, for the same reason that the naïve withTimeout could lose a message from numbers. This is the case for timeouts usually but maybe you have to think about this more, and possibly provide an infinite stream of the message. (Of course the wrapper goroutine would be collected if the channel becomes unreachable.)

you were right when you said this is the end

I've long wondered how contemporary musicians deal with the enormous, crushing weight of recorded music. I don't really pick any more but hoo am I feeling this now. I think for Guile, I will continue hacking on fibers in a separate library, and I think that things will remain that way for the next couple years and possibly more. We need more experience and more mistakes before blessing and supporting any particular formulation of highly concurrent programming. I will say though that I am delighted that we are able to actually do this experimentation on a library level and I look forward to seeing what works out :)

Thanks again to Vesa, Michael, and Sam for sharing their time and knowledge; all errors are of course mine. Happy hacking!

by Andy Wingo at September 21, 2016 09:29 PM

September 20, 2016

Andy Wingo

concurrent ml versus go

Peoples! Lately I've been navigating the guile-ship through waters unknown. This post is something of an echolocation to figure out where the hell this ship is and where it should go.

Concretely, I have been working on getting a nice lightweight concurrency system rolling for Guile. I'll write more about that later, but you can think of it as being modelled on Go, though built as a library. (I had previously described it as "Erlang-like", but that's just not accurate.)

Earlier this year at Curry On this topic was burning in my mind and of course when I saw the language-hacker fam there I had to bend their ears. My targets: Matthew Flatt, the amazing boundary-crossing engineer, hacker, teacher, researcher, and implementor of Racket, and Matthias Felleisen, the godfather of the PLT research family. I saw them sitting together and I thought, you know what, what can they have to say to each other? These people have been talking together for 30 years right? Surely they are actually waiting for some ignorant dude to saunter up to the PL genius bar, right?

So saunter I do, saying, "if someone says to you that they want to build a server that will handle 100K or so simultaneous connections on Racket, what abstraction do you tell them to use? Racket threads?" Apparently: yes. A definitive yes, in the case of Matthias, with a pointer to Robby Findler's paper on kill-safe abstractions; and still a yes from Matthew with the caveat that for the concrete level of concurrency that I described, you'd have to run tests. More fundamentally, I was advised to look at Concurrent ML (on which Racket's concurrency facilities were based), that CML was much better put together than many modern variants like Go.

This was very interesting and new to me. As y'all probably know, I don't have a formal background in programming languages, and although I've read a lot of literature, reading things only makes you aware of the growing dimension of the not-yet-read. Concurrent ML was even beyond my not-yet-read horizon.

So I went back and read a bunch of papers. Turns out Concurrent ML is like Lisp in that it has a tribe and a tightly-clutched history and a diaspora that reimplements it in whatever language they happen to be working in at the moment. Kinda cool, and, um... a bit hard to appreciate in the current-day context when the only good references are papers from 10 or 20 years ago.

However, after reading a bunch of John Reppy papers, here is my understanding of what Concurrent ML is. I welcome corrections; surely I am getting this wrong.

1. CML is like Go, composed of channels and goroutines. (Forgive the modern referent; I assume most folks know Go at this point.)

2. Unlike Go, in CML a channel is never buffered. To make a buffered channel in CML, you spawn a thread that manages a buffer between two channels.

3. Message send and receive operations in CML are built on a lower-level primitive called "events". (send ch x) is instead euivalent to (sync (send-event ch x)). It's like an event is the derivative of a message send with respect to time, or something.

4. Events can be combined and transformed using the choose and wrap combinators.

5. Doing a sync on an event created by choose allows a user to build select in "user-space", as a library. Cool stuff. So this is what events are for.

6. There are separate event type implementations for timeouts, channel send/recv blocking operations, file descriptor blocking operations, syscalls, thread joins, and the like. These are supported by the CML implementation.

7. The early implementations of Concurrent ML were concurrent but not parallel; they did not run multiple "goroutines" on separate CPU cores at the same time. It was only in like 2009 that people started to do CML in parallel. I do not know if this late parallelism has a practical impact on the viability of CML.

ok go

What is the relationship of CML to Go? Specifically, is CML more expressive than Go? (I assume the reverse is not the case, but that would also be an interesting result!)

There are a few languages that only allow you to select over message receives (not sends), but Go's select doesn't have this limitation, so that's not a differentiator.

Some people say that it's nice to have events as the common denominator, but I don't get this argument. If the only event under consideration is message send or receive over a channel, events + choose + sync is the same in expressive power as a built-in select, as far as I can see. If there are other events, then your runtime already has to support them either way, and something like (let ((ch (make-channel))) (spawn-fiber (lambda () (put-message ch exp))) (get-message ch)) should be sufficient for any runtime-supported event in exp, like sleeps or timeouts or thread joins or whatever.

To me it seems like Go has made the right choices here. I do not see the difference, and that's why I wrote all this, is to be shown the error of my ways. Choosing channels, send, receive, and select as the primitives seems to have the same power as SML events.

Let this post be a pentagram on the floor, then, to summon the CML cognoscenti. Well-actuallies are very welcome; hit me up in the comments!

[edit: Sam Tobin-Hochstadt tells me I got it wrong and I believe him :) In the meantime while I work out how I was wrong, examples are welcome!]

by Andy Wingo at September 20, 2016 09:33 PM

Carlos García Campos

WebKitGTK+ 2.14

These six months has gone so fast and here we are again excited about the new WebKitGTK+ stable release. This is a release with almost no new API, but with major internal changes that we hope will improve all the applications using WebKitGTK+.

The threaded compositor

This is the most important change introduced in WebKitGTK+ 2.14 and what kept us busy for most of this release cycle. The idea is simple, we still render everything in the web process, but the accelerated compositing (all the OpenGL calls) has been moved to a secondary thread, leaving the main thread free to run all other heavy tasks like layout, JavaScript, etc. The result is a smoother experience in general, since the main thread is no longer busy rendering frames, it can process the JavaScript faster improving the responsiveness significantly. For all the details about the threaded compositor, read Yoon’s post here.

So, the idea is indeed simple, but the implementation required a lot of important changes in the whole graphics stack of WebKitGTK+.

  • Accelerated compositing always enabled: first of all, with the threaded compositor the accelerated mode is always enabled, so we no longer enter/exit the accelerating compositing mode when visiting pages depending on whether the contents require acceleration or not. This was the first challenge because there were several bugs related to accelerating compositing being always enabled, and even missing features like the web view background colors that didn’t work in accelerated mode.
  • Coordinated Graphics: it was introduced in WebKit when other ports switched to do the compositing in the UI process. We are still doing the compositing in the web process, but being in a different thread also needs coordination between the main thread and the compositing thread. We switched to use coordinated graphics too, but with some modifications for the threaded compositor case. This is the major change in the graphics stack compared to the previous one.
  • Adaptation to the new model: finally we had to adapt to the threaded model, mainly due to the fact that some tasks that were expected to be synchronous before became asyncrhonous, like resizing the web view.

This is a big change that we expect will drastically improve the performance of WebKitGTK+, especially in embedded systems with limited resources, but like all big changes it can also introduce new bugs or issues. Please, file a bug report if you notice any regression in your application. If you have any problem running WebKitGTK+ in your system or with your GPU drivers, please let us know. It’s still possible to disable the threaded compositor in two different ways. You can use the environment variable WEBKIT_DISABLE_COMPOSITING_MODE at runtime, but this will disable accelerated compositing support, so websites requiring acceleration might not work. To disable the threaded compositor and bring back the previous model you have to recompile WebKitGTK+ with the option ENABLE_THREADED_COMPOSITOR=OFF.

Wayland

WebKitGTK+ 2.14 is the first release that we can consider feature complete in Wayland. While previous versions worked in Wayland there were two important features missing that made it quite annoying to use: accelerated compositing and clipboard support.

Accelerated compositing

More and more websites require acceleration to properly work and it’s now a requirement of the threaded compositor too. WebKitGTK+ has supported accelerated compositing for a long time, but the implementation was specific to X11. The main challenge is compositing in the web process and sending the results to the UI process to be rendered on the actual screen. In X11 we use an offscreen redirected XComposite window to render in the web process, sending the XPixmap ID to the UI process that renders the window offscreen contents in the web view and uses XDamage extension to track the repaints happening in the XWindow. In Wayland we use a nested compositor in the UI process that implements the Wayland surface interface and a private WebKitGTK+ protocol interface to associate surfaces in the UI process to the web pages in the web process. The web process connects to the nested Wayland compositor and creates a new surface for the web page that is used to render accelerated contents. On every swap buffers operation in the web process, the nested compositor in the UI process is automatically notified through the Wayland surface protocol, and  new contents are rendered in the web view. The main difference compared to the X11 model, is that Wayland uses EGL in both the web and UI processes, so what we have in the UI process in the end is not a bitmap but a GL texture that can be used to render the contents to the screen using the GPU directly. We use gdk_cairo_draw_from_gl() when available to do that, falling back to using glReadPixels() and a cairo image surface for older versions of GTK+. This can make a huge difference, especially on embedded devices, so we are considering to use the nested Wayland compositor even on X11 in the future if possible.

Clipboard

The WebKitGTK+ clipboard implementation relies on GTK+, and there’s nothing X11 specific in there, however clipboard was read/written directly by the web processes. That doesn’t work in Wayland, even though we use GtkClipboard, because Wayland only allows clipboard operations between compositor clients, and web processes are not Wayland clients. This required to move the clipboard handling from the web process to the UI process. Clipboard handling is now centralized in the UI process and clipboard contents to be read/written are sent to the different WebKit processes using the internal IPC.

Memory pressure handler

The WebKit memory pressure handler is a monitor that watches the system memory (not only the memory used by the web engine processes) and tries to release memory under low memory conditions. This is quite important feature in embedded devices with memory limitations. This has been supported in WebKitGTK+ for some time, but the implementation is based on cgroups and systemd, that is not available in all systems, and requires user configuration. So, in practice nobody was actually using the memory pressure handler. Watching system memory in Linux is a challenge, mainly because /proc/meminfo is not pollable, so you need manual polling. In WebKit, there’s a memory pressure handler on every secondary process (Web, Plugin and Network), so waking up every second to read /proc/meminfo from every web process would not be acceptable. This is not a problem when using cgroups, because the kernel interface provides a way to poll an EventFD to be notified when memory usage is critical.

WebKitGTK+ 2.14 has a new memory monitor, used only when cgroups/systemd is not available or configured, based on polling /proc/meminfo to ensure the memory pressure handler is always available. The monitor lives in the UI process, to ensure there’s only one process doing the polling, and uses a dynamic poll interval based on the last system memory usage to read and parse /proc/meminfo in a secondary thread. Once memory usage is critical all the secondary processes are notified using an EventFD. Using EventFD for this monitor too, not only is more efficient than using a pipe or sending an IPC message, but also allows us to keep almost the same implementation in the secondary processes that either monitor the cgroups EventFD or the UI process one.

Other improvements and bug fixes

Like in all other major releases there are a lot of other improvements, features and bug fixes. The most relevant ones in WebKitGTK+ 2.14 are:

  • The HTTP disk cache implements speculative revalidation of resources.
  • The media backend now supports video orientation.
  • Several bugs have been fixed in the media backend to prevent deadlocks when playing HLS videos.
  • The amount of file descriptors that are kept open has been drastically reduced.
  • Fix the poor performance with the modesetting intel driver and DRI3 enabled.

by carlos garcia campos at September 20, 2016 07:42 AM

September 19, 2016

Michael Catanzaro

Epiphany 3.22 (and a couple new stable releases too!)

It’s that time of year again! A new major release of Epiphany is out now, representing another six months of incremental progress. That’s a fancy way of saying that not too much has changed (so how did this blog post get so long?). It’s not for lack of development effort, though. There’s actually lot of action in git master and on sidebranches right now, most of it thanks to my awesome Google Summer of Code students, Gabriel Ivascu and Iulian Radu. However, I decided that most of the exciting changes we’re working on would be deferred to Epiphany 3.24, to give them more time to mature and to ensure quality. And since this is a blog post about Epiphany 3.22, that means you’ll have to wait until next time if you want details about the return of the traditional address bar, the brand-new user interface for bookmarks, the new support for syncing data between Epiphany browsers on different computers with Firefox Sync, or Prism source code view, all features that are brewing for 3.24. This blog also does not cover the cool new stuff in WebKitGTK+ 2.14, like new support for copy/paste and accelerated compositing in Wayland.

New stuff

So, what’s new in 3.22?

  • A new Paste and Go context menu option in the address bar, implemented by Iulian. It’s so simple, but it’s also the greatest thing ever. Why did nobody implement this earlier?
  • A new Duplicate Tab context menu option on tabs, implemented by Gabriel. It’s not something I use myself, but it seems some folks who use it in other browsers were disappointed it was missing in Epiphany.
  • A new keyboard shortcuts dialog is available in the app menu, implemented by Gabriel.

Gabriel also redesigned all the error pages. My favorite one is the new TLS error page, based on a mockup from Jakub Steiner:

Web app improvements

Pivoting to web apps, Daniel Aleksandersen turned his attention to the algorithm we use to pick a desktop icon for newly-created web apps. It was, to say the least, subpar; in Epiphany 3.20, it normally always fell back to using the website’s 16×16 favicon, which doesn’t look so great in a desktop environment where all app icons are expected to be at least 256×256. Epiphany 3.22 will try to pick better icons when websites make it possible. Read more on Daniel’s blog, which goes into detail on how to pick good web app icons.

Also new is support for system-installed web apps. Previously, Epiphany could only handle web apps installed in home directories, which meant it was impossible to package a web app in an RPM or Debian package. That limitation has now been removed. (Update: I had forgotten that limitation was actually removed for GNOME 3.20, but the web apps only worked when running in GNOME and not in other desktops, so it wasn’t really usable. That’s fixed now in 3.22.) This was needed to support packaging Fedora Developer Portal, but of course it can be used to package up any website. It’s probably only interesting to distributions that ship Epiphany by default, though. (Epiphany is installed by default in Fedora Workstation as it’s needed by GNOME Software to run web apps, it’s just hidden from the shell overview unless you “install” it.) At least one media outlet has amusingly reported this as Epiphany attempting to compete generally with Electron, something I did write in a commit message, but which is only true in the specific case where you need to just show a website with absolutely no changes in the GNOME desktop. So if you were expecting to see Visual Studio running in Epiphany: haha, no.

Shortcut woes

On another note, I’m pleased to announce that we managed to accidentally stomp on both shortcuts for opening the GTK+ inspector this cycle, by mapping Duplicate Tab to Ctrl+Shift+D, and by adding a new Ctrl+Shift+I shortcut to open the WebKit web inspector (in addition to F12). Go team! We caught the problem with Ctrl+Shift+D and removed the shortcut in time for the release, so at least you can still use that to open the GTK+ inspector, but I didn’t notice the issue with the web inspector until it was too late, and Ctrl+Shift+I will no longer work as expected in GTK+ apps. Suggestions welcome for whether we should leave the clashing Ctrl+Shift+I shortcut or get rid of it. I am leaning towards removing it, because we normally match Epiphany behavior with GTK+, and only match other browsers when it doesn’t conflict with GTK+. That’s called desktop integration, and it’s worked well for us so far. But a case can be made for matching other browsers, too.

Stable releases

On top of Epiphany 3.22, I’ve also rolled new stable releases 3.20.4 and 3.18.8. I don’t normally blog about stable releases since they only include bugfixes and are usually boring, so why are these worth mentioning here? Two reasons. First, one of the fixes in these releases is quite significant: I discovered that a few important features were broken when multiple tabs share the same web process behind the scenes (a somewhat unusual condition): the load anyway button on the unacceptable TLS certificate error page, password storage with GNOME keyring, removing pages from the new tab overview, and deleting web applications. It was one subtle bug that was to blame for breaking all of those features in this odd corner case, which finally explains some difficult-to-reproduce complaints we’d been getting, so it’s good to put out that bug of the way. Of course, that’s also fixed in Epiphany 3.22, but new stable releases ensure users don’t need a full distribution upgrade to pick up a simple bugfix.

Additionally, the new stable releases are compatible with WebKitGTK+ 2.14 (to be released later this week). The Epiphany 3.20.4 and 3.18.8 releases will intentionally no longer build with older versions of WebKitGTK+, as new WebKitGTK+ releases are important and all distributions must upgrade. But wait, if WebKitGTK+ is kept API and ABI stable in order to encourage distributions to release updates, then why is the new release incompatible with older versions of Epiphany? Well, in addition to stable API, there’s also an unstable DOM API that changes willy-nilly without any soname bumps; we don’t normally notice when it changes, since it’s autogenerated from web IDL files. Sounds terrible, right? In practice, no application has (to my knowledge) ever been affected by an unstable DOM API break before now, but that has changed with WebKitGTK+ 2.14, and an Epiphany update is required. Most applications don’t have to worry about this, though; the unstable API is totally undocumented and not available unless you #define a macro to make it visible, so applications that use it know to expect breakage. But unannounced ABI changes without soname bumps are obviously a big a problem for distributions, which is why we’re fixing this problem once and for all in WebKitGTK+ 2.16. Look out for a future blog post about that, probably from Carlos Garcia.

elementary OS

Lastly, I’m pleased to note that elementary OS Loki is out now. elementary is kinda (totally) competing with us GNOME folks, but it’s cool too, and the default browser has changed from Midori to Epiphany in this release due to unfixed security problems with Midori. They’ve shipped Epiphany 3.18.5, so if there are any elementary fans in the audience, it’s worth asking them to upgrade to 3.18.8. elementary does have some downstream patches to improve desktop integration with their OS — notably, they’ve jumped ahead of us in bringing back the traditional address bar — but desktop integration is kinda the whole point of Epiphany, so I can’t complain. Check it out! (But be sure to complain if they are not releasing WebKit security updates when advised to do so.)

by Michael Catanzaro at September 19, 2016 02:00 PM

September 18, 2016

Michael Catanzaro

A WebKit Update for Ubuntu

I’m pleased to learn that Ubuntu has just updated WebKitGTK+ from 2.10.9 to 2.12.5 in Ubuntu 16.04. To my knowledge, this is the first time Ubuntu has released a major WebKit update. It includes fixes for 16 security vulnerabilities detailed in WSA-2016-0004 and WSA-2016-0005.

This is really great. Of course, it would have been better if it didn’t take three and a half months to respond to WSA-2016-0004, and the week before WebKitGTK+ 2.12 becomes obsolete was not the greatest timing, but late security updates are much better than no security updates. It remains to be seen if Ubuntu will keep up with WebKit updates in the future, but I think I can tentatively stop complaining about Ubuntu for now. Debian is looking increasingly isolated in not offering WebKit security updates to its users.

Thanks, Ubuntu!

by Michael Catanzaro at September 18, 2016 04:34 AM

September 14, 2016

Manuel Rego

TPAC, Web Engines Hackfest & Igalia 15th anniversary on the horizon

W3C’s TPAC

Next week I’ll be in Lisbon attending TPAC. This is the annual conference organized by the W3C where all the different groups meet face to face during one week. It seems a huge event where you can meet lots of important people working on the web. Looking forward to being there. 😃

Due to our involvement on the implementation of CSS Grid Layout specification in Chromium/Blink and and Safari/WebKit, we’ve been interacting quite a lot with the CSS Working Group (CSS WG). Thus, I’ll be participating on their meetings during the event and also following the work around Houdini, with a close eye on the Layout API. BTW, thanks to the CSS WG chairs for letting me join them.

Like past year, Igalia will have a booth in the conference where you can chat with us about our involvement on the W3C, from the specs edition to the implementation of web standards on the different browsers. My colleagues Joanie and Juanjo will be attending the event too, so don’t hesitate to ping any of us to talk about Igalia and our contributions to the open web platform.

Web Engines Hackfest

This year Igalia is organizing and hosting again a new edition of the Web Engines Hackfest. The event will take place during the last week of the month (26-28th September), it used to be on December but we looked for a better date this year and it seems we’ve been successful as we’ll be around 40 people (more than ever). Thanks everyone attending, we just hope you really enjoy the event.

As usual my main goal for the hackfest is related to CSS Grid Layout implementation. Probably it’d be a good moment to draft a plan for shipping it on Chromium as we’ll have Christian Biesinger around, who is usually reviewing most of our Grid Layout patches.

On top of that and due to my involvement on the MathML refactoring we did on WebKit, I’ll be very interested on keep discussing about MathML and the next steps to make it a reality on the different web engines. Some fonts experts will be around, so we won’t miss the opportunity to try to improve OpenType MATH table support in HarfBuzz.

Apart from this, it’s worth highlighting the big number of Servo contributors we’ll have in the event, from Mozilla employees to external collaborators (like my colleague Martin). I’m eager to check firsthand the status of this engine and their future plans.

Last, but not least, hopefully during the hackfest we’ll find some time to discuss about the upstreaming process of WebKit for Wayland, trying to convert it in an official WebKit port like WebKitGTK+.

And there’ll be more topics that will be discussed there like: accessibility, multimedia, V8, WebRTC, etc. We’re quite a lot of people, so surely we’ll have productive meetings on most of these things.

Igalia 15th Anniversary

This month Igalia is becoming 15 years old! It’s amazing that a company with a completely different model (focused on free software and with a flat cooperative-like structure) has survived during all these years. I’m very grateful to the people who founded a company with such wonderful values, and all the people that have made possible to reach this point in our history. Thanks for letting me be part of it! 😊

Igalia 15th Anniversary Logo Igalia 15th Anniversary Logo

We’ll be celebrating the anniversary on the last week of September, we’ll have several parties during that week and we’ll have one of our summits at the weekend.

It’s awesome to see back in time and realize how many contributions we’ve been doing to a lot of different free software projects, from our first days inside the GNOME project, to our current work on the different browsers or graphics drivers, among others. Lots of programs you’re using every day have some code contributed by Igalia; from your computer to your phones, TVs, watches, etc.

Happy birthday Igalia, I wish we’ll have many more years of success on the free world! 🎂

September 14, 2016 10:00 PM

August 30, 2016

Javier Muñoz

AWS4 chunked upload goes upstream in Ceph RGW S3

With AWS Signature Version 4 (AWS4) you have the option of uploading the payload in fixed or variable-size chunks.

This chunked upload option, also known as Transfer payload in multiple chunks or STREAMING-AWS4-HMAC-SHA256-PAYLOAD feature in the Amazon S3 ecosystem, avoids reading the payload twice (or buffer it in memory) to compute the signature in the client side.

AWS4 chunked upload support is now upstream code in Ceph. It will also ship with the next Jewel release.

Signing single vs multiple chunks payloads

In AWS Signature Version 4 using the HTTP Authorization header is the most common method of providing authentication information. There are two supported options in S3:

The first option, 'Transfer payload in a single chunk', is the way how Ceph RGW S3 authenticates requests under AWS4 right now. It supports signed and unsigned payloads.

For this case the major client-side drawback with signed payloads uploads is the necessity to read the payload twice or buffer it in memory. For big files this approach may be inefficient so you might prefer to upload data in chunks instead.

The second option, 'Transfer payload in multiple chunks (chunked upload)', is the new functionality being added upstream and backported to Jewel.

In this case you break up the payload in fixed or variable-size chunks. By uploading data in chunks, you avoid accesing the payload twice to calculate the signature in the client side.

Signing the chunks

Each chunk signature computation includes the signature of the previous chunk. The seed signature is generated using the request headers only:

Bear in mind the seed signature is used in the signature computation of the first chunk only. For each subsequent chunk, you create a chunk signature that includes the signature of the previous chunk.

The string to sign used in each subsequent chunk signature is:

The chunk signatures chaining ensures you send the chunks in correct order.

Uploading chunked payloads with the AWS SDK for Java

To test chunked uploads with Ceph RGW S3 you can use the AWS SDK for Java. It supports AWS4 chunked uploads. The last version is available here.

Feel free to use this code based on the AWS SDK Java's UploadObjectSingleOperation example.

Enjoy!

Acknowledgments

My work in Ceph is sponsored by Outscale and has been made possible by Igalia and the invaluable help of the Ceph development team. Thank you all!

by Javier at August 30, 2016 10:00 PM

August 29, 2016

Iago Toral

OpenGL terrain renderer

Lately I have been working on a simple terrain OpenGL renderer demo, mostly to have a playground where I could try some techniques like shadow mapping and water rendering in a scenario with a non trivial amount of geometry, and I thought it would be interesting to write a bit about it.

But first, here is a video of the demo running on my old Intel IvyBridge GPU:

And some screenshots too:

OpenGL Terrain screenshot 1 OpenGL Terrain screenshot 2
OpenGL Terrain screenshot 3 OpenGL Terrain screenshot 4

Note that I did not create any of the textures or 3D models featured in the video.

With that out of the way, let’s dig into some of the technical aspects:

The terrain is built as a 251×251 grid of vertices elevated with a heightmap texture, so it contains 63,000 vertices and 125,000 triangles. It uses a single 512×512 texture to color the surface.

The water is rendered in 3 passes: refraction, reflection and the final rendering. Distortion is done via a dudv map and it also uses a normal map for lighting. From a geometry perspective it is also implemented as a grid of vertices with 750 triangles.

I wrote a simple OBJ file parser so I could load some basic 3D models for the trees, the rock and the plant models. The parser is limited, but sufficient to load vertex data and simple materials. This demo features 4 models with these specs:

  • Tree A: 280 triangles, 2 materials.
  • Tree B: 380 triangles, 2 materials.
  • Rock: 192 triangles, 1 material (textured)
  • Grass: 896 triangles (yes, really!), 1 material.

The scene renders 200 instances of Tree A another 200 instances of Tree B, 50 instances of Rock and 150 instances of Grass, so 600 objects in total.

Object locations in the terrain are randomized at start-up, but the demo prevents trees and grass to be under water (except for maybe their base section only) because it would very weird otherwise :), rocks can be fully submerged though.

Rendered objects fade in and out smoothly via alpha blending (so there is no pop-in/pop-out effect as they reach clipping planes). This cannot be observed in the video because it uses a static camera but the demo supports moving the camera around in real-time using the keyboard.

Lighting is implemented using the traditional Phong reflection model with a single directional light.

Shadows are implemented using a 4096×4096 shadow map and Percentage Closer Filter with a 3x3 kernel, which, I read is (or was?) a very common technique for shadow rendering, at least in the times of the PS3 and Xbox 360.

The demo features dynamic directional lighting (that is, the sun light changes position every frame), which is rather taxing. The demo also supports static lighting, which is significantly less demanding.

There is also a slight haze that builds up progressively with the distance from the camera. This can be seen slightly in the video, but it is more obvious in some of the screenshots above.

The demo in the video was also configured to use 4-sample multisampling.

As for the rendering pipeline, it mostly has 4 stages:

  • Shadow map.
  • Water refraction.
  • Water reflection.
  • Final scene rendering.

A few notes on performance as well: the implementation supports a number of configurable parameters that affect the framerate: resolution, shadow rendering quality, clipping distances, multi-sampling, some aspects of the water rendering, N-buffering of dynamic VBO data, etc.

The video I show above runs at locked 60fps at 800×600 but it uses relatively high quality shadows and dynamic lighting, which are very expensive. Lowering some of these settings (very specially turning off dynamic lighting, multisampling and shadow quality) yields framerates around 110fps-200fps. With these settings it can also do fullscreen 1600×900 with an unlocked framerate that varies in the range of 80fps-170fps.

That’s all in the IvyBridge GPU. I also tested this on an Intel Haswell GPU for significantly better results: 160fps-400fps with the “low” settings at 800×600 and roughly 80fps-200fps with the same settings used in the video.

So that’s it for today, I had a lot of fun coding this and I hope the post was interesting to some of you. If time permits I intend to write follow-up posts that go deeper into how I implemented the various elements of the demo and I’ll probably also write some more posts about the optimization process I followed. If you are interested in any of that, stay tuned for more.

by Iago Toral at August 29, 2016 11:50 AM

August 22, 2016

Xavier Castaño

Igalia will be at IBC 2016

Do you work in the media, entertainment, broadcasting or content industry? If so, please join us at IBC2016, 9-13 September in Amsterdam. The IBC Exhibition covers fifteen halls across the Amsterdam RAI and hosts over 1,600 exhibitors spanning the creation, management and delivery of electronic and media entertainment.
You will be able to meet us at our booth where we will be showcasing demos of our latest work in browsers. The demo of WebKit for Wayland, a new port of WebKit that is currently being upstreamed, on an IVI shell is especially worth checking out.

Igalia will be in Hall 5, booth 19 so don’t miss your chance to attend and visit us! Find us on the interactive floorplan.

by xavier castano at August 22, 2016 10:15 AM

August 18, 2016

Adrián Pérez

SnabbWall's Traffic Analyzer: L7Spy

Wow, two posts just a few days apart from each other! Today I am writing about L7Spy, the application from the SnabbWall suite used to analyze network traffic and detect which applications they belong to. SnabbWall is being developed at Igalia with sponsorship from the NLnet Foundation. (Thanks!)

 

A Digression

There are two things related to L7Spy I want to mention beforehand:

nDPI 1.8 is Now Supported

The changes from the [recently released][ljndpi0.3.3] ljndpi v0.3.3 have been merged into the SnabbWall repository, which means that now it also supports using version 1.8 of the nDPI library.

Following Upstream

Snabb follows a monthly release schedule. Initially, I thought it would be better to do all the development for SnabbWall starting from a given Snabb release, and incorporate the changes from upstream later on, after the first one of SnabbWall is done. I noticed how it might end up being difficult to rebase SnabbWall on top of the changes accumulated from the upstream repository, and switched to the following scheme:

  • The branch containing the most up-to-date development code is always in the snabbwall branch.
  • Whenever a new Snabb version is released, changes are merged from the upstream repository using its corresponding release tag.
  • When a SnabbWall release is done, it tagged from current contents of the snabbwall branch, which means it is automatically based on the latest Snabb release.
  • If a released version needs fixing, a maintenance branch snabbwall-vX.Y for it is created from the release tag. Patches have to be backported or purposedfully made for that this new branch.

This approach reduces the burden of keeping up with changes in Snabb, while still allowing to make bugfix releases of any SnabbWall version when needed.

L7Spy

The L7Spy is a Snabb application, and as such it can be reused in your own application networks. It is made to be connected in a few different ways, which should allow for painless integration. The application has two bidirectional endpoints called south and north:

The L7Spy application.

Packets entering one of the ends are forwarded unmodified to the other end, in both directions, and they are scanned as they pass through the application.

The only mandatory connection is the receiving side of either south or north: this allows for using L7Spy as a “kitchen sink”, swallowing the scanned packets without sending them anywhere. As an example, this can be used to connect a RawSocket application to L7Spy, attach the RawSocket to an existing network interface, and do passive analysis of the traffic passing through that interface:

Passive analysis made easy

If you swap the RawSocket application with the one driving Intel NICs, and divert a copy of packets passing through your firewall towards the NIC, you would be doing essentially the same but at lightning speed and moving the load of packet analysis to a separate machine. Add a bit more of smarts on top, and this kind of setup would already look like a standalone Intrusion Detection System!

snabb wall

While implementing the L7Spy application, the snabb wall command has also come to life. Like all the other Snabb programs, the wall command is built into the snabb executable, and it has subcommands itself:

aperez@hikari ~/snabb/src % sudo ./snabb wall
[sudo] password for aperez:
Usage:
  snabb wall <subcommand> <arguments...>
  snabb wall --help

Available subcommands:

  spy     Analyze traffic and report statistics

Use --help for per-command usage. Example:

  snabb wall spy --help

aperez@hikari ~/snabb/src %

For the moment being, only the spy subcommand is provided: it exercises the L7Spy application by feeding into it packets captured from one of the supported sources. It can work in batch mode —the default—, which is useful to analyze a static pcap file. The following example analyzes the full contents of the pcap file, reporting at the end all the traffic flows detected:

aperez@hikari ~/snabb/src % sudo ./snabb wall spy \
      pcap program/wall/tests/data/EmergeSync.cap
0xfffffffff19ba8be   18p   134.68.220.74:21769 -     192.168.1.2:26883  RSYNC
0xffffffffc6f96d0f 4538p   134.68.220.74:22025 -     192.168.1.2:26883  RSYNC
aperez@hikari ~/snabb/src %

Two flows are identified, which correspond to two different connections between the client with IPv4 address 134.68.220.74 (from two different ports) to the server with address 192.168.1.2. Both flows are correctly identified as belonging to the rsync application. Notice how the application is properly detected even though the rsync daemon is running in port 26883 which is a non-standard one for this kind of service.

It is also possible to ingest traffic directly from a network interface in live mode. Apart from the drivers for Intel cards included with Snabb (selectable using the intel10g and intel1g sources) there are two hardware-independent sources for TAP devices (tap) and RAW sockets (raw). Let's use a RAW socket to sniff a copy of every packet which passes through the a kernel-managed network interface, while opening a well known search engine in a browser using their https:// URL:

aperez@hikari ~/snabb/src % sudo ./snabb wall spy --live eth0
...
0x77becdcb    2p    192.168.20.1:53    -   192.168.20.21:58870  SERVICE_GOOGLE:DNS
0x6e111c73    4p   192.168.20.21:443   -   216.58.210.14:48114  SERVICE_GOOGLE:SSL
...

Note how even the connection is encrypted, our nDPI-based analyzer has been able to correlate a DNS query immediately followed by a connection to the address from the DNS response at port 443 as a typical pattern used to browse web pages, and it has gone the extra mile to tell us that it belongs to a Google-operated service.

There is one more feature in snabb wall spy. Passing --stats will print a line at the end (in normal mode), or every two seconds (in live mode), with a summary of the amount of data processed:

aperez@hikari ~/snabb/src % sudo ./snabb wall spy --live --stats eth0
...
=== 2016-08-17T18:05:54+0300 === 71866 Bytes, 99 packets, 35933.272 B/s, 49.500 PPS
...
=== 2016-08-17T18:19:35+0300 === 3105473 Bytes, 2884 packets, 1.599 B/s, 0.001 PPS
...

One last note: even though it is not shown in the examples, scanning and identifying IPv6 traffic is already supported.

Get that IPv6 flowing!

What Next?

Now that traffic analysis and flow identification is working nicely, it has to be used for actual filtering. I am now in the process of implementing the firewall component of SnabbWall (L7fw) as a Snabb application. Also, there will be a snabb wall filter subcommand, providing an off-the-shelf L7 filtering solution. Expect a couple more of blog posts about them.

In the meanwhile, if you want to play with L7Spy, I will be updating its WIP documentation in the following days. Happy hacking!

by aperez (adrian@perezdecastro.org) at August 18, 2016 08:51 AM

August 11, 2016

Xabier Rodríguez Calvar

Going to GUADEC 2016

I am at the airport getting ready to board beginning my trip to Karlsruhe for GUADEC 2016. I’ll be there with my colleague Michael Catanzaro.Please talk to us if you are interested in browsers or Igalia.

by calvaris at August 11, 2016 07:30 AM

July 31, 2016

Adrián Pérez

ljndpi 0.0.3 released with nDPI 1.8 support

Although I have been away from Snabb-related work for a while, the fact that nDPI 1.8 was released didn't went unnoticed. This library is an important building block for SnabbWall, the Layer-7 firewall which I am developing at Igalia with sponsorship from the NLnet Foundation (thanks!).

 

SnabbWall uses the nDPI library via the ljndpi binding, about which I wrote back in January. As any other component which uses a native library, ljndpi is a consumer of the API and ABI exported by it, so it needs to be updated to accomodate to changes introduced in new releases of nDPI.

Fortuntately most of the API (and ABI) from nDPI 1.7 remains unchanged in version 1.8, making it possible to maintain the interface exposed to the Lua world. This is very good news, and it means that any program using ljndpi does not need to be modified to work with nDPI 1.8: just update ljndpi to version 0.0.3 and you are ready to go.

Nitty-Gritty Details

In order to support versions 1.7 and 1.8 of the nDPI library, ndpi_revision() is used to obtain the version string of the loaded library, and parsed into a (major, minor, patch) version triplet. The ndpi.c module uses this version information to determine which symbols to make known to the FFI extension via ffi.cdef:

if lib_version.minor == 7 then
  -- Declare functions available only in nDPI 1.7
  ffi.cdef [[
    ...
  ]]
elseif lib_version.minor == 8 then
  -- Ditto, for version 1.8
  ffi.cdef [[
    ...
  ]]
end

The high-level ndpi.wrap module does the same, ensuring that only functions available in the version of the nDPI library currently loaded are used. In some cases, parameters passed to functions with the same name in nDPI may differ, and ndpi.wrap shuffles them around as needed in order to avoid changes in the interface exposed to Lua:

if lib_version.major == 7 then
   -- ...
else
   -- ...

   -- In nDPI 1.8 the second parameter (uint8_t proto) has been dropped.
   detection_module.find_port_based_protocol = function (dm, dummy, ...)
      local proto = lib.ndpi_find_port_based_protocol(dm, ...)
      return proto.master_protocol, proto.protocol
   end
end

Last but not least, the numeric values which identify each protocol supported by nDPI have changed as well, due to the additional protocols supported by the new version of the library. The existing ndpi.protocol_ids module was renamed to ndpi.protocol_ids_1_7, and then a new ndpi.protocol_ids_1_8 generated using the update-protocol-ids tool. Finally, the ndpi.wrap module was updated to load the module which corresponds to the version of the nDPI library being used:

-- Return the module table.
return {
  protocol = require("ndpi.protocol_ids_"
            .. lib_version.major .. "_"
            .. lib_version.minor);
  -- ...
}

Easier Installation

As a bonus, it is now possible to install ljndpi using the LuaRocks package manager. LuaRocks provides similar functionality to NodeJS' npm or Python's pip. If your LuaJIT installation includes LuaRocks, getting the latest stable release of ljndpi is easier than ever:

luarocks install ljndpi

It is also possible to install a development snapshot directly. The following will fetch the source from the Git repository and install it using LuaRocks — though it is strongly recommended to use the stable releases:

luarocks install --server=https://luarocks.org/dev ljndpi

Hopefully, this will make ljndpi easier to discover and used by projects other than SnabbWall. Who knows, it may even make it easier for distributions to package it! (Hint: LuaRocks supports specifying a custom tree.)

One More Thing...

He said it 31 times on stage (look it up!)

My previous post ended up in a cliffhanger which promised another post about SnabbWall. Supporting nDPI 1.8 in ljndpi seemed important enough to write about it, but rest assured that the post on SnabbWall is next in my “polish and publish” queue.

by aperez (adrian@perezdecastro.org) at July 31, 2016 08:15 PM

July 29, 2016

Alejandro Piñeiro

One year of Mesa

Changes

During the last year and something, my work at Igalia was focused on the Intel i965 driver for Mesa, the open source OpenGL implementation. Iago and Samuel were working for some time, then joined Edu and Antía, and then I joined myself, as the fifth igalian.

One of the reasons I will always remember working on this project is that I was also being a father for a year and something, so it will be always easy to relate one and the other. Changes! A project change plus a total life change.

In fact the parenthood affected a little how I joined the project. We (as Igalia) decided that I would be a good candidate to join the project around April. But as my leave of absence was estimated to be around the last weeks of May, and it would be strange to start a project and suddenly disappear, we decided to postpone the joining just after the parental leave. So I spent some time closing and transferring the stuff I was doing at Igalia at that moment, and starting to get into the specifics of Intel driver, and then at the end of May, my parental leave started. Pedro is here!

 

Pedro’s day zero
Parenthood

Mesa can be a challenging project, but nothing compared to how to parent. A lot of changes. In Spain the more usual parental leave for the father is 15 days, where the only  flexibility is that you need to take 2 just after the child has born, and chose when take the other 13. But those 13 needs to be taken in a row. So I decided to took 15 days in a row. Igalia gives you 15 extra days. In fact, the equivalent to 15 days, as they gives you flexibility of how to take them. So instead of took them as 15 full-day, I decided to take 30 part-time days. Additionally Pedro’s grandmother (maternal) came to live with us for a month. That allowed us to do a step by step go back to work. So it was something like:

  • Pedro was born, I am in full parental leave, plus grandmother living at our house helping.
  • I start to work part-time
  • Grandmother goes back to Brazil
  • I start to work full-time.
  • Pedro’s mother goes back to work part-time

And probably the more complicated moment was when Pedro’s mother went back to work. More or less at that time we switched the usual grandparents (paternal) weekend visits to in-week visits. So what it is my usual timetable? Three days per week I work 8:00-14:30 at my home, and I take care of Pedro on the afternoon, when his mother goes to work. Two days per week is 8:00-12:30 at my home, and 15:00-20:00 at my parents home. And that is more or less, as no day is the same that the other 😉 And now I go more to the parks!

Pedro playing on a park
Pedro playing on a park

There were other changes. For example switched from being one of the usual suspects at Igalia office to start to work mostly from home. In any case the experience is totally worthing, and it is incredible how fast the time passed. Pedro is one year old already!

Pedro destroying his birthday cake
Mesa tasks

I almost forgot that this blog post started to talk about my work on Mesa. So many photos! During this year, I have participated (although not alone, specially on spec implementations) in the following tasks (and except last one, in chronological order):

About which task I liked the most, I think that it was the work done on optimizing the NIR to vec4 pass. And not forget that the work done by Igalia to implement internalformat_query2 and vertex_attrib_64bit, plus the work done by Iago and Samuel with fp64, helped to get Mesa 12.0 exposing OpenGL 4.2 support on Broadwell and later.

What happens with accessibility?

Being working full time for the same project for a full year, plus learning how to parent, means that I didn’t have too much spare time for accessibility development. Having said so, I have been reviewing ATK patches, doing the ATK releases regularly and keeping an eye on the GNOME accessibility mailing lists, so I’m still around. And we have Joanmarie Diggs, also fellow igalian, still rocking on improving Orca, WebKitGTK and WAI-ARIA.

Thanks

Finally, I’d like to thank both Intel and Igalia for supporting my work on Mesa and i965 all this time. Specially on allowing me so flexible timetable, where the important is what you deliver, and not when you do the work, allowing me to enjoy parenthood. I also want to thanks the igalian colleagues I has been working during this year, Iago, Samuel, Antía, Edu, Andrés, and Juan, and all those at Intel who have been helping and reviewing my work during all this year, like Jason Ekstrand, Kenneth Graunke, Matt Turner and Francisco Jerez.

 

by infapi00 at July 29, 2016 09:02 AM

July 14, 2016

Javier Muñoz

Virtual Data and Control Paths in Software-Defined Storage

Two of the promised promises in Software-Defined Storage (SDS) are higher automation and flexibility in order to reduce the costs around the storage administration tasks.

This entry will go over the required concepts to understand the virtual data and control paths in SDS solutions and how metadata are used to convey the data requirements into the automation software with the proper granularity and flexibility.

The entry will also include one of the most simple use cases ('x-amz-website-redirect-location') you can find to illustrate how all these concepts fit in Ceph.

Software-Defined Storage

Although there is no formal definition for Storage-Defined Storage (SDS), it is all about decoupling the hardware and the intelligence.

In SDS, the hardware becomes "irrelevant", the intelligence lives into the software, and the hardware may be considered a cheap commodity.

From an architectural perspective, SDS sees the storage in three layers: orchestration, data services (compression, deduplication...) and hardware.

SDS abstracts storage resources to enable pooling, replication, and on-demand provisioning of resources. The result is the ability to pool storage arrays into logical pools.

SDS and SDDC

Beyond of being a stand-alone technology, SDS is also a building block of the Software-Defined Data Center (SDDC), together with Software-Defined Compute (SDC) and Software-Defined Networking (SDS) technologies.

In SDDC, as introduced by the former VMware CTO Steve Herrod in 2012, "compute, storage, networking, security, and availability services are pooled, aggregated, and delivered as software, and managed by intelligent, policy-driven software".

Software-Defined technology brings new and challenging changes into the industry but all of them are part of a movement towards promoting a greater role for software systems above the hardware.

Many of the complex and advanced functions that used to require proprietary hardware are implemented now in software running in commodity hardware.

Data Services and Storage Virtualization

Data Services and Storage Virtualization are fundamental blocks to understand the virtual data and control paths in Software-Defined solutions.

Data Services are standards-based and uniform means of supplying a need such as provisioning, protection, availability, performance, security, etc. over the stored data. The behaviours of those data services are defined by policies.

On the other hand, Storage Virtualization is the process of grouping the physical storage from multiple network storage devices so that it looks like a single storage device. Note that this logical storage may be the result of stacking multiple physical and/or logical abstractions.

Virtual Data and Control Paths

As described by the Storage Networking Industry Association (SNIA), SDS builds on the virtualization of the Data Path although SDS is not virtualization alone. Bear in mind the Control Path needs to be abstracted as a service as well.

The virtual data path is formed by block, file and object interfaces that support applications written to these interfaces. At the same time, the control path uses metadata to express requirements, data services control and service level capabilities.

This is the way how SDS simplify the management in order to reduce the cost of maintaining the storage infrastructure. This data services management and defining storage policies avoids the manual administration. It enables flexibility and automation.

Following this approach, each data object is able to convey its own requirements independent of which virtual storage device it resides on.

Virtual Data and Control Paths in Ceph RGW S3

Let's see how all these concepts fit in Ceph. The RGW S3 component will be used to support this point.

In the previous picture, virtual data and control paths in RGW S3 are surrounded by one black line.

The data path comes defined by the own AWS S3 interface specification. This path is a virtual path because the virtual RGW S3 object store is built on the top of the native Ceph object store.

The control path comes also defined by the AWS S3 interface as part of the metadata attached over the data.

The data services are enabled/configured by the storage administrator using 'pools'. These pools are the divisions or partitions used to define key parameters such as the number of placement groups, the CRUSH ruleset or the ownership. RGW S3 runs on top of those pools to provide the final storage service.

In some use cases, choosing the right pool or attaching metadata (zone/region) is enough to specify the place where the data will live. In other cases this configuration is not needed.

Take as example the 'x-amz-website-redirect-location' system-defined metadata in the S3 static website hosting feature. It is used to redirect requests for the associated object to another object in the same bucket or an external URL.

This metadata is a good example to see how the policy allows request redirections specified by the user instead of the administator.

While metadata attachment could look like an arbitrary action to get the job done, it is a recurrent pattern in S3. Here the metadata are part of the control path and they drive the behaviour of the use case along the whole storage stack.

Wrap-up

Along this blog post we explored the virtual data and control paths in SDS, its relationship with SDDC and how metadata can be used to drive services in Ceph RGW S3.

If you are interested in SDS and metadata you might find the SNIA' Software Defined Storage White Paper a great resource. The paper covers the role of metadata in SDS and the user's view in detail.

In the case you are interested in SDDC, the SDDC VMware documentation is available on-line. Bear in mind the SDDC term covers a broad concept and this documentation covers the VMware perspective only.

As usual, the Ceph documentation is awesome to deepen more on Ceph and its S3 Object Gateway.

by Javier at July 14, 2016 10:00 PM

July 13, 2016

Iago Toral

Story and status of ARB_gpu_shader_fp64 on Intel GPUs

In case you haven’t heard yet, with the recently announced Mesa 12.0 release, Intel gen8+ GPUs expose OpenGL 4.3, which is quite a big leap from the previous OpenGL 3.3!

OpenGL 4.3

The Mesa i965 Intel driver now exposes OpenGL 4.3 on Broadwell and later!

Although this might surprise some, the truth is that even if the i965 driver only exposed OpenGL 3.3 it had been exposing many of the OpenGL 4.x extensions for quite some time, however, there was one OpenGL 4.0 extension in particular that was still missing and preventing the driver from exposing a higher version: ARB_gpu_shader_fp64 (fp64 for short). There was a good reason for this: it is a very large feature that has been in the works by Intel first and Igalia later for quite some time. We first started to work on this as far back as November 2015 and by that time Intel had already been working on it for months.

I won’t cover here what made this such a large effort because there would be a lot of stuff to cover and I don’t feel like spending weeks writing a series of posts on the subject :). Hopefully I will get a chance to talk about all that at XDC in September, so instead I’ll focus on explaining why we only have this working in gen8+ at the moment and the status of gen7 hardware.

The plan for ARB_gpu_shader_fp64 was always to focus on gen8+ hardware (Broadwell and later) first because it has better support for the feature. I must add that it also has fewer hardware bugs too, although we only found out about that later ;). So the plan was to do gen8+ and then extend the implementation to cover the quirks required by gen7 hardware (IvyBridge, Haswell, ValleyView).

At this point I should explain that Intel GPUs have two code generation backends: scalar and vector. The main difference between both backends is that the vector backend (also known as align16) operates on vectors (surprise, right?) and has native support for things like swizzles and writemasks, while the scalar backend (known as align1) operates on scalars, which means that, for example, a vec4 GLSL operation running is broken up into 4 separate instructions, each one operating on a single component. You might think that this makes the scalar backend slower, but that would not be accurate. In fact it is usually faster because it allows the GPU to exploit SIMD better than the vector backend.

The thing is that different hardware generations use one backend or the other for different shader stages. For example, gen8+ used to run Vertex, Fragment and Compute shaders through the scalar backend and Geometry and Tessellation shaders via the vector backend, whereas Haswell and IvyBridge use the vector backend also for Vertex shaders.

Because you can use 64-bit floating point in any shader stage, the original plan was to implement fp64 support on both backends. Implementing fp64 requires a lot of changes throughout the driver compiler backends, which makes the task anything but trivial, but the vector backend is particularly difficult to implement because the hardware only supports 32-bit swizzles. This restriction means that a hardware swizzle such as XYZW only selects components XY in a dvecN and therefore, there is no direct mechanism to access components ZW. As a consequence, dealing with anything bigger than a dvec2 requires more creative solutions, which then need to face some other hardware limitations and bugs, etc, which eventually makes the vector backend require a significantly larger development effort than the scalar backend.

Thankfully, gen8+ hardware supports scalar Geometry and Tessellation shaders and Intel‘s Kenneth Graunke had been working on enabling that for a while. When we realized that the vector fp64 backend was going to require much more effort than what we had initially thought, he gave a final push to the full scalar gen8+ implementation, which in turn allowed us to have a full fp64 implementation for this hardware and expose OpenGL 4.0, and soon after, OpenGL 4.3.

That does not mean that we don’t care about gen7 though. As I said above, the plan has always been to bring fp64 and OpenGL4 to gen7 as well. In fact, we have been hard at work on that since even before we started sending the gen8+ implementation for review and we have made some good progress.

Besides addressing the quirks of fp64 for IvyBridge and Haswell (yes, they have different implementation requirements) we also need to implement the full fp64 vector backend support from scratch, which as I said, is not a trivial undertaking. Because Haswell seems to require fewer changes we have started with that and I am happy to report that we have a working version already. In fact, we have already sent a small set of patches for review that implement Haswell‘s requirements for the scalar backend and as I write this I am cleaning-up an initial implementation of the vector backend in preparation for review (currently at about 100 patches, but I hope to trim it down a bit before we start the review process). IvyBridge and ValleView will come next.

The initial implementation for the vector backend has room for improvement since the focus was on getting it working first so we can expose OpenGL4 in gen7 as soon as possible. The good thing is that it is more or less clear how we can improve the implementation going forward (you can see an excellent post by Curro on that topic here).

You might also be wondering about OpenGL 4.1’s ARB_vertex_attrib_64bit, after all, that kind of goes hand in hand with ARB_gpu_shader_fp64 and we implemented the extension for gen8+ too. There is good news here too, as my colleague Juan Suárez has already implemented this for Haswell and I would expect it to mostly work on IvyBridge as is or with minor tweaks. With that we should be able to expose at least OpenGL 4.2 on all gen7 hardware once we are done.

So far, implementing ARB_gpu_shader_fp64 has been quite the ride and I have learned a lot of interesting stuff about how the i965 driver and Intel GPUs operate in the process. Hopefully, I’ll get to talk about all this in more detail at XDC later this year. If you are planning to attend and you are interested in discussing this or other Mesa stuff with me, please find me there, I’ll be looking forward to it.

Finally, I’d like to thank both Intel and Igalia for supporting my work on Mesa and i965 all this time, my igalian friends Samuel Iglesias, who has been hard at work with me on the fp64 implementation all this time, Juan Suárez and Andrés Gómez, who have done a lot of work to improve the fp64 test suite in Piglit and all the friends at Intel who have been helping us in the process, very especially Connor Abbot, Francisco Jerez, Jason Ekstrand and Kenneth Graunke.

by Iago Toral at July 13, 2016 02:48 PM

July 12, 2016

Frédéric Wang

MathML Improvements in WebKit

In a previous blog post, I explained the work made by Igalia’s web platform team to refactor WebKit’s MathML layout classes. I stated that although some rendering improvements were a nice side effect, the main goal of the first phase was really to clean the code up so that it is easier for developers to work on MathML in the future. Indeed this really made things easier to review: Quite unexpectedly to me, the second phase only took 4 days to be upstreamed… Kudos to Brent Fulgham for having reviewed so many patches in such a short period of time!

In this blog post, I am going to give an overview of the improvements made during these two first phases taking changeset r203109 as a reference. The changes will be available in WebKitGTK+ 2.14 in September and are likely to be included this month in the next Safari Technology Preview. It definitely remains more work to do such as the third phase or other rendering improvements, but I believe we have already made a big step forward!

Mathematical Fonts

Two years ago, basic support for operator stretching via the OpenType MATH table was added to WebKit. During the refactoring, we improved that support and also made use of more parameters to improve the math layout (see section about OpenType MATH parameters below). While Latin Modern Math will be used in most screenshots, the following one shows that you can use any math fonts. By default WebKit will try and use one of these fonts but if none are available or if you force a non-math font-family then the rendering quality may not be good.

The following screenshot gives the rendering for various fonts. For the last one we used the value sans-serif to illustrate issues with non-math fonts (displaystyle integral too small, mathvariant italic glyphs taken from another font, missing italic correction etc).

Screenshots of a formula with various fonts Integral obtained by contour integration
WebKitGTK+ r203109 with various font-family values.
From top to bottom: TeX Gyre Schola, Latin Modern, DejaVu Math, STIX, and sans-serif.

The href Attribute

This new feature is obvious: You can now create a hyperlink for any part of a mathematical formula! Here is a screenshot of the MathML Torture Test 21 with additional links, as displayed in WebKit r203109. Click the image to load the test case in your browser and test it.

Screenshot of MathML Torture test 21 with hyperlinks

The mathvariant Attribute

Unicode contains Mathematical Alphanumeric Symbols to convey special meaning such as double-struck or specific Arabic styles. Characters for these symbols are generally provided by math fonts. In MathML, mathematical variables are automatically rendered using the italic characters from this Unicode block. One can also access these characters via the mathvariant attribute and that attribute is actually used by many LaTeX-to-MathML converters.

In the following screenshot, you can see that the letters f, x and y are now drawn with this special mathematical italic glyphs and that WebKit uses the conventional fraktur style for the Lie algebra g. Note that the prime is still too small because WebKit does not make use of the ssty feature yet.

Screenshot of Wikipedia article on Lie Algebra Homomorphism of Lie algebra
Top: Safari 9.1.1. Bottom: Safari r203109.

Operators and Spacing

As said in my previous blog post, the rendering of large and stretchy operators have been rewritten a lot and as a consequence the rendering has improved. Also, I mentioned that the width of operators may depend on their height. This may cause accumulated approximations during the computation of preferred widths. The old flexbox-based implementation incorrectly forced layout during preferred computation to avoid that but a quick workaround for that security concern caused the approximate preferred widths to be used for the logical widths. With our new implementation, the logical width is now correctly calculated. Finally, we added partial support for the mpadded element which is often used to tweak spacing in mathematical formulas.

The screenshot below illustrates the fix for a serious regression with large operator (summation symbol) as well as improvements in the rendering of stretchy operators (horizontal braces). Note that the formula has a hack with a zero-width mpadded element which used to cause improper spacing (large gap between the group of a’s and the group of b’s).

Screenshot from Mozilla's MathML torture test using Safari 9.1.1 or the the current refactoring Tests 21 and 22 from the MathML torture test
Left: Safari 9.1.1. Right: Safari r203109.

The following screenshot shows how incorrect width computations used to cause excessive spacing after the stretchy radical and slash symbols:

Screenshot of Wikipedia article on Trigonometric Functions Sine of 1 degree
Top: Safari 9.1.1. Bottom: Safari r203109.

The displaystyle Property

Mathematical formulas can be integrated inside a paragraph of text (inline math in TeX terminology) or displayed in its own horizontally centered paragraph (display math in TeX terminology). In the latter case, the formula is in displaystyle and does not have any restrictions on vertical spacing. In the former case, the layout of the mathematical formula is modified a bit to optimize this vertical spacing and to better integrate within the surrounding text. The displaystyle property can also be set using the corresponding attribute or can change automatically in subformulas (e.g. in fractions).

In the following screenshot the fix for the large operator regression is obvious but you can also notice that the summation is now slightly different for the definition of a Bézier curve (top) and for the one of a rational Bézier curve (bottom). For example, to save some vertical space in the fractions, the sigma symbol is drawn smaller and the scripts attached to it are moved on its right. However, the script size could still be improved when we implement the scriptlevel property.

Screenshot of Wikipedia article on Bézier Curves Definitions of Bézier Curve
Left: Safari 9.1.1. Right: Safari r203109.

OpenType MATH Parameters

Many new OpenType MATH features have been implemented following the MathML in HTML5 Implementation Note:

  • Use of the AxisHeight parameter to set vertical position of fractions, tables and symmetric operators.
  • Use of layout constants for radicals (some of them were already used), scripts and fractions. This improves math spacing and positioning and allows to adjust them according to value of the displaystyle property discussed in the previous section.
  • Use of the italic correction of large operator glyph to set the position of subscripts.

The screenshots below illustrate some of these improvements. For the first one, the use of AxisHeight allows to better align the fraction bar with the plus sign. For the second one, the use of layout constants for scripts as well as the italic correction of the integral improves the placement of limits.

Screenshot of Wikipedia article on the Continued Fractions Generalized Continued Fraction
Top: Safari 9.1.1. Bottom: Safari r203109.
Screenshot of Wikipedia article on the Gamma Function Definition of the Gamma Function
Top: Safari 9.1.1. Bottom: Safari r203109.

Right-to-left radicals

One of the advantage of the old flexbox-based implementation is that right-to-left layout was available for free. This support has of course been preserved in the new implementation. We also added a simple workaround to mirror radicals using a scale transform as shown in the screenshot below. However, general glyph-level mirroring is still missing.

Screenshot of Wikipedia article on Lie Algebra Test 13 from the MathML torture test (Machrek Style)
Top: Safari 9.1.1. Bottom: Safari r203109.

Conclusion

Igalia’s web platform team has been able to follow the MathML in HTML5 Implementation Note in order to significantly improve the rendering of mathematical expressions in WebKit. More work remains to do but we will definitely appreciate any feedback that can help improving native MathML support in web engines. We are also excited to continue work and collaboration at the next Web Engines Hackfest!

July 12, 2016 10:00 PM

July 08, 2016

Frédéric Wang

MathML Refactoring in WebKit

Introduction

If you follow WebKit developments, you are certainly aware that Igalia has been working on WebKit’s MathML implementation for some time. More recently, effort has been made to write a clean implementation addressing issues reported by WebKit reviewers in the past. After joining Igalia in March, I have been in charge of getting this work reviewed and merged into WebKit’s development branch. In the past four months, we have been successful in upstreaming the first phase of the refactoring and the work accomplished is described in this blog post.

Screenshot from Mozilla's MathML torture test using Safari 9.1.1 or the the current refactoring
MathML torture tests with the Latin Modern Math font
Left: Safari 9.1.1. Right: Current MathML branch.

Note that the focus was on code refactoring so the improvement may not be obvious for non-developers. Nevertheless many issues have already been fixed as a consequence of that work: math italic, position of scripts, stretchy and large operators, rendering update and more. More importantly, this preliminary step opens the way for beautiful math rendering based on TeX rules and the OpenType MATH table. Rendering improvements and implementation of new features have already started in the next phase of the refactoring, so stay tuned…

Design Issues

As explained in a previous report, the main design issues in the flexbox-based implementation released in 2013 can essentially be summarized in three points:

  1. WebKit’s code to stretch operator was not efficient at all and was limited to some basic fences buildable via Unicode characters.
  2. WebKit’s MathML code violated many layout invariants, making the code unreliable.
  3. WebKit’s MathML code relied heavily on the C++ renderer classes for flexboxes and had to manage too many anonymous renderers.

For the first point, the performance issue had been fixed by Igalia developers right after the initial feedback from WebKit developers and we improved that again during our refactoring. Partial support for the OpenType MATH table was added during my crowdfunding project and allowed to stretch more operators with the help of math fonts. For the second point, the main issue was also fixed right after the initial feedback. However one could still have some doubts regarding the layout steps, given the complexity implied by the third point. That last issue was still problematic so far and addressing it was the main achievement of our refactoring.

Technically, the dependence on flexbox is unnecessary and the implementation actually only used a limited set of flexbox features. Thus executing the whole flexbox code was overkill. It can also be a burden for people working on other places of the layout code. For example Javi Fernández has worked on improving the box alignments in the past and he had hard time fixing the MathML code impacted by his changes. This is probably the cause of the bad position of the summation symbol that can be seen in the screenshot above.

From the layout perspective, most of the rendering logic was implemented in the flexbox classes and the MathML “renderer” classes were really just managing the creation and update of anonymous nodes and style. Although this sounds good code reuse it actually made impossible to understand how and when layout steps happen or to add more advanced features. The new implementation replaced this manipulation of the render tree with simple arithmetic calculations on box metrics which is safer and more reliable. Also, complex renderers such as RenderMathMLScripts or RenderMathMLRoot actually achieve better rendering quality with less code!

As an example of the complexity, RenderMathMLUnderOver can behave as a RenderMathMLScripts in some situation so we really want the former class to reuse the latter class. As we will see below the old implementation of the two renderers were quite different: RenderMathMLUnderOver only relied on setting column direction in the user agent stylesheet while RenderMathMLScripts created a complex render tree structures with anonymous style. Hence it seemed difficult to share the two cases or to handle DOM changes causing to move from one case to the other one. With our new implementation, this is simply reduced to simple C++ inheritance.

When I started to work on WebKit some years ago, I made the mistake of continuing with the existing approach. The implementation of multiscripts or automatic italic mathvariant added more anonymous objects and made the situation even worse. After the end of my crowdfunding project, Alex Castro did more cleanup and tried to implement important features such as displaystyle but he also soon realized that it was too hard to work with the current code base…

Layout Refactoring

In order to solve the issues discussed in the previous section, Javi and Alex worked on a new MathML branch where the first step was to remove the inheritance on the flexbox layout classes. During the Web Engines Hackfest 2015, I collaborated with the Igalia’s web platform team team to continue the work on this branch. In a second step, we rewrote many MathML renderer functions so that they stop creating anonymous nodes or style. We obtained very encouraging results: The implementation looked much simpler and much more understandable!

Alex announced the initial plan on the webkit-dev mailing list. He started opening bugs and attaching patches to merge the first step. However, that step still required many of the flexbox logic and so made code hard to understand for reviewers. Hence when I joined Igalia four months ago Alex asked me to try and see how to reorganize patches so that the two initial steps can be submitted in one go. This corresponds to the first phase mentioned in the introduction. As indicated on the wiki page, the layout refactoring consisted in rewriting the following member functions of each renderer class:

  • computePreferredLogicalWidths: calculate preferred widths, based on the preferred widths of child renderers.
  • layoutBlock: set final position and size of child renderers.
  • firstLineBaseLine: calculate the ascent of the renderer.
  • paint (optional): perform special painting such as fraction bars.

Refactored renderers no longer rely on any flexbox code nor anonymous renderers and the functions mentioned above essentially perform arithmetic computations. By reading the code, one can be sure that we follow standard layout rules and that we do not perform unnecessary reflow. Moreover, the rules specific to math rendering are only located in the MathML renderers and can be better understood. Details for each class are provided in the next subsections. After all the layout functions were rewritten and the code managing the render tree structure removed, we were able to make the RenderMathMLBlock class inherit from RenderBlock instead of RenderFlexibleBox. Many of the bugs could then be immediately closed or otherwise fixed with small follow-up patches.

Spacing

RenderMathMLSpace is a simple class inserting blank boxes for adjusting spacing of math formulas. Obviously, we do not need any of the complexity of flexbox so it was straightforward to write the layout functions.

3x Large space between 3 and x.

Grouping

RenderMathMLRow performs rendering of a row of math items. Since WebKit does not support linebreaking in MathML at the moment, this is just putting child boxes on a same baseline. One specificity is that some operators can be stretched vertically and so their width may depend on their height.

{ 2x x 3 Row containing a stretched brace, a fraction and a scripted element.

Again, flexbox features are useless here. With the old code, it was not clear whether we were violating the CSS invariant with preferred and logical widths and which kind of relayout or render tree changes would happen when doing the stretch call. By properly implementing the layout functions previously mentioned all of this became much more trustable.

Fractions

RenderMathMLFraction draws a fraction with numerator and denominator.

x+1y+2 Simple fraction.

This used to be implemented using a column direction for the fraction element. Numerator and denominator were wrapped into anonymous nodes with additional style to leave space for the fraction bar and to adjust the horizontal alignments.

RenderMathMLFraction (flexbox with column direction)
  RenderMathMLBlock (anonymous flexbox)
    RenderMathMLRow (numerator)
      ...
  RenderMathMLBlock (anonymous flexbox)
    RenderMathMLRow (denominator)
      ...

It was relatively easy to implement this without any anonymous nodes and again the use of flexbox did not sound justified. For example, to calculate the preferred width we just take the maximum of the preferred widths of the numerator and denominator. For the layout, the calculation of the logical width is similar and we calculate the horizontal coordinates of numerator and denominator so that their centers are aligned. Vertical metrics are similarly calculated from the vertical metrics of the numerator and denominator. During that step, we also fixed some bugs with the linethickness attribute and added support for some OpenType MATH table constants.

Scripts above and below

RenderMathMLUnderOver is used to attach some scripts above and below a base. Each child can itself be a horizontal stretchy operator.

base→ Base with stretchy arrow over it.

This was implemented in the user agent stylesheet by using flexboxes with column direction for the corresponding MathML elements and the C++ class had additional rules to fire the stretching. So the problems and solutions for this class were essentially a mixed of the cases of RenderMathMLFraction and RenderMathMLRow we just discussed.

Subscripts and Superscripts

RenderMathMLScripts is used for a base with some arbitrary number of scripts. All the scripts can have different positions (pre, post, sub, super) and metrics (width, ascent and descent). We must avoid collisions and take care of horizontal and vertical alignements.

base a b c d e f Base with pre and post scripts.

The old code used a complex render tree with additional style to achieve the best possible result. However, the quality was still bad as you can see for the script attached to the integral in the screenshot above. Managing the render tree was a nightmare: Just to give the idea, additional anonymous node and style were used to allow horizontal and vertical adjustments (similar to RenderMathMLFraction above) and prescripts had negative order property so that they were positioned before the base.

RenderMathMLScripts
    Base Wrapper (anonymous flexbox)
      RenderMathMLRow (base)
        ...
    SubSupPair Wrapper (anonymous flexbox with column direction)
      RenderMathMLRow (post-subscript)
        ...
      RenderMathMLRow (subperscript)
        ...
    SubSupPair Wrapper (anonymous flexbox with column direction)
      RenderMathMLRow (post-subscript)
        ...
      RenderMathMLRow (post-superscript)
        ...
    ... (more postscripts)
    RenderMathMLBlock (prescripts separator)
    SubSupPair Wrapper (anonymous flexbox with column direction and order -1)
      RenderMathMLRow (pre-subscript)
        ...
      RenderMathMLRow (pre-subperscript)
        ...
    SubSupPair Wrapper (anonymous flexbox with column direction and order -1)
      RenderMathMLRow (pre-subscript)
        ...
      RenderMathMLRow (pre-superscript)
        ...
    ... (more prescripts)

Rules from TeX and the OpenType MATH table are relatively complex and we decided to implement them directly in the new refactoring as otherwise it was impossible to get decent quality. The code is still complex but we now have clear rules, we only perform simple calculations and the render tree structure matches the DOM tree.

“Enclosing” Notations

RenderMathMLMenclose is a row of math items with some additional notations. Gurpreet Kaur implemented this element two years ago but she followed the same approch, combining anonymous nodes and style for some simple notations and special painting for others.

x+1 circle and strike notations

During the refactoring, the code has been completely rewritten so that RenderMathMLMenclose is now essentially a derived class of RenderMathMLRow with the measuring and painting functions adjusted to take into account the additional notations. During that refactoring, we also removed support for unused radical notation, which was implemented using an anonymous RenderMathMLSquareRoot (see Radicals section below).

Helper Classes for Operators

The RenderMathMLOperator class is used for math operators. It was quite complex class and we decided to extract from it two features that are unrelated to layout:

The remaining code was indeed the real layout part but the mess with anonymous node and style was only removed later (see Text Classes below). Although it seems we just needed to move the code out of RenderMathMLOperator into those new classes, the case of MathOperator was particularly difficult. We had to split the effort into several small steps to make review possible and also fixed many issues due to the entanglement and confusion of these three different features of the RenderMathMLOperator class… The work done for MathOperator actually improved the rendering of stretchy operators as you can see for the horizontal braces in the screenshot above.

Radicals

RenderMathMLRoot is used for square root or arbitrary N-th root. Many of the TeX and OpenType MATH table rules were already used by the old implementation with anonymous nodes and style. However, there were bugs difficult to fix related to zooming, child removal or style change due to the management of the anonymous RenderMathMLOperator to draw the radical sign.

x+1 + x+1 3 square and cube roots

The old implementation actually had two classes for the square and general cases (RenderMathMLSquareRoot and RenderMathMLRoot). The usual technique with various anonymous wrappers and style was used. After the refactoring, we were able to merge everything in a single RenderMathMLRoot class. Because the square root behaves as an mrow, we also made that class derive from RenderMathMLRow to reuse as much code as possible. Here is are how the render trees used to look like:

RenderMathMLSquareRoot
  RenderMathMLBlock (anonymous used for metric adjustements)
    RenderMathMLRadicalOperator (anonymous used for the radical symbol)
      ...
  RenderMathMLRootWrapper (anonymous used for the children)
    RenderMathMLRow (child 1)
      ...
    RenderMathMLRow (child 2)
      ...
    ...
    RenderMathMLRow (child N)
      ...

RenderMathMLRoot
  RenderMathMLRootWrapper (anonymous for the index)
    ...
  RenderMathMLBlock (anonymous used for metric adjustements)
     RenderMathMLRadicalOperator (anonymous used for the radical symbol)
       ...
  RenderMathMLRootWrapper (anonymous for the base)
    ...

Again, we rewrote the implementation using only simple box positioning. The difficult part was to get rid of the anonymous RenderMathMLRadicalOperator to draw the radical symbol. This class was derived from RenderMathMLOperator and extended it with some fallback drawing when math fonts were not available. After having extracted stretchy operator shaping from RenderMathMLOperator it became possible to use the MathOperator helper class to draw the radical symbol. We implemented the fallback for missing math fonts the same as Gecko: Use a scale transform to stretch the base glyph for U+221A SQUARE ROOT. As a bonus, we used such transform to implement glyph mirroring, as required to draw right-to-left radicals in some Arabic mathematical notations.

Text Classes

These classes are containers for math text such as variables or operators. There is a generic RenderMathMLToken class and a derived class RenderMathMLOperator adding features specific to operators such as spacing, dictionary property, stretching… Anonymous wrappers and style were used to implement automatic italic mathvariant or operator spacing. The RenderText child of RenderMathMLOperator was (re)built as an anonymous text node so that is was possible to convert U+002D HYPHEN-MINUS into U+2212 MINUS SIGN or to provide some text for anonymous operators created by RenderMathMLFenced (see Unchanged Classes section).

RenderMathMLToken (e.g. mi element)
  RenderMathMLBlock (anonymous flexbox used to apply CSS italic)
    RenderBlock (anonymous created elsewhere to honor CSS rules)
      RenderText
        text run "x"

RenderMathMLOperator (mo element)
   RenderMathMLBlock (anonymous flexbox used for spacing)
    RenderBlock (anonymous created elsewhere to honor CSS rules)
      RenderText (anonymous destroyed and built again)
         text run "−"

We did a big refactoring to remove all the anonymous nodes created by the MathML renderer classes. Just like for MathOperator, we had to be careful and submit various small pieces as the text rendering was quite sensible to code change.

The simplified operator spacing that was supported by WebKit was easy to implement with the new approach. To do automatic italic mathvariant, we modified the paint function to use Mathematical Alphanumeric Symbols instead of CSS italic as you can notice for the variables displayed in the screenshot above. Hence we could remove the RenderMathMLBlock anonymous wrapper.

The use of an anonymous node for the text prevented it to appear in the dumped render tree of layout tests and also required some hacks in the accessibility code to expose that text. In order to address the cases of the minus sign and of mfenced operators, we decided to use our new MathOperator class again. Indeed MathOperator is actually also able to draw unstretched operators made of a single character and this works for the minus sign and for mfenced operators used in practice.

Unchanged Classes

Two classes have not been modified but such modifications were not needed to remove the dependency on RenderFlexibleBox:

  • RenderMathMLFenced is used for an mrow-like element that is defined in the MathML specification as strictly equivalent to constructions with rows and operators. It is implemented as a derived class of RenderMathMLRow and creates anonymous RenderMathMLOperators. This is the only remaining class that modifies the render tree structure. Note that prominent MathML websites and generators do not use the mfenced element, so it is not a big concern.

  • RenderMathMLTable is used for table layout. It is just derived from RenderTable, not RenderFlexibleBox. We did not change anything for now but we considered creating our own implementation in order to make our code independent from HTML table, to support MathML-specific table features and to make it better integrated with the rest of the MathML code.

Accessibility

Even if our main focus was on rendering, the changes we made also had impact on the MathML accessibility code. Indeed, the accessibility tree is generated from the MathML renderer classes: Since we changed the latter during the refactoring, we also had to adjust the accessibility code. Fortunately, we are lucky to have Joanmarie Diggs in our team and she was able to provide some help here.

First, the accessibility code exposes the linethickness of fractions to implement Apple’s AXMathLineThickness attribute. In practice, this is really necessary to know whether the linethickness is null or not (e.g. binomial coefficient VS the Legendre symbol). Apple’s unit test seemed to expose the ratio between the actual thickness and the default thickness but the accessibility code really just reads the actual thickness calculated by RenderMathMLFraction. Our fix and improvement for linethickness made the Apple’s unit test fail so we had to adjust RenderMathMLFraction to expose the value expected by that test.

In general, the accessibility code does not care about anonymous nodes created for layout purpose and there was some code to avoid exposing them in the accessibility tree. So removing all the anonymous during the layout refactoring was actually a good and safe thing to do. There were some helper functions to implement Apple’s AXMathRootRadicand and AXMathRootIndex attributes that had to be adjusted, though. These functions used to do some work to skip the anonymous wrappers and we were actually able to simplify them.

There was also some specific code for the RenderMathMLOperators and their anonymous RenderText that were necessary to expose the text content. Actually, there was an old bug in the accessibility code and the anonymous RenderMathMLOperators created by mfenced were not correctly exposed. The unit test we had for mfenced operators was only checking the text content but it was still passing and so the regression had never been detected before. After the layout refactoring we removed the anonymous RenderText of mfenced operators and so broke that test… We thus spent some time to fix the RenderMathMLOperator code. Essentially, we removed all the old hacks and only left a specific handling for mfenced operators. We also used this opportunity to improve and extend our MathML accessibility tests.

Finally, the MathML accessibility code was directly implemented into a generic AccessibilityRenderObject class. There was some functions to access math nodes and properties but also specific cases scattered all over the code (anonymous boxes, mfenced operators, math roles etc). In order to facilitate future work and maintenance we decided to move all the MathML code into a new AccessibilityMathMLElement class. Hence the work implied by the layout refactoring actually encouraged us to improve the organization and testing of our accessibility code!

Conclusion

In the past four months, Igalia’s web platform team has successfully upstreamed the refactoring of WebKit’s MathML renderer classes and we are now very confident about the quality of the layout code. In addition to the people mentioned above I would personally like to thank everybody who helped with this work. More specifically, I am very grateful to other people from Igalia (Martin Robinson, Sergio Villar and Manuel Rego) or Apple (Brent Fulgham and Darin Adler) who have spent some time to review patches. As a nice side effect of this work, mathematical formulas look better and the accessibility code has been improved. More is happenning in the next two phases. We are looking forward to continuing implementation of Web standards and collaboration with browser vendors at the next Web Engines Hackfest!

July 08, 2016 10:00 PM

July 01, 2016

Víctor Jáquez

VA-API and DRM/KMS in MinnowBoard

In 2012 I started to work on a video renderer for GStreamer which uses directly the DRM/KMS kernel subsystem to display images. I even blogged about it, but I didn’t finished it.

Nonetheless, in December last year, a customer asked us to finish the element, but this time focusing on i.MX6, rather than OMAP4. Consequently, early this year, kmssink got merged into upstream.

The video sink has some nice features, such as DMA-buf importing through the PRIME kernel interface. This feature makes possible a zero-copy path when the video decoder delivers dmabuf based frames. For this particular project, the kernel we used supports the CODA media subsystem, and through the GStreamer element v4l2videodec we could link pipelines negotiating dmabuf sharing, and thus we played videos very efficiently.

During the last GStreamer Hackfest, Nicolas Dufresne got working the MFC media subsystem, for Exynos, with v4l2videodec and kmssink, but I don’t remember if he also got the dmabuf sharing working.

All in all, kmssink had only been tested in a couple of ARM devices, so I wonder, as a gstremer-vaapi co-maintainer, if it also works in x86 devices.

Some colleagues at Igalia, use a Minnowboard to test WebKit4Wayland, so I borrowed it, installed a minimal Fedora 23 in it, and, surprisingly, I found out that it has a nice VA-API support:

   $ vainfo
   error: can't connect to X server!
   libva info: VA-API version 0.38.1
   libva info: va_getDriverName() returns 0
   libva info: Trying to open /usr/lib64/dri/i965_drv_video.so
   libva info: Found init function __vaDriverInit_0_38
   libva info: va_openDriver() returns 0
   vainfo: VA-API version: 0.38 (libva 1.6.2)
   vainfo: Driver version: Intel i965 driver for Intel(R) Bay Trail - 1.6.2
   vainfo: Supported profile and entrypoints
         VAProfileMPEG2Simple            : VAEntrypointVLD
         VAProfileMPEG2Simple            : VAEntrypointEncSlice
         VAProfileMPEG2Main              : VAEntrypointVLD
         VAProfileMPEG2Main              : VAEntrypointEncSlice
         VAProfileH264ConstrainedBaseline: VAEntrypointVLD
         VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
         VAProfileH264Main               : VAEntrypointVLD
         VAProfileH264Main               : VAEntrypointEncSlice
         VAProfileH264High               : VAEntrypointVLD
         VAProfileH264High               : VAEntrypointEncSlice
         VAProfileH264StereoHigh         : VAEntrypointVLD
         VAProfileVC1Simple              : VAEntrypointVLD
         VAProfileVC1Main                : VAEntrypointVLD
         VAProfileVC1Advanced            : VAEntrypointVLD
         VAProfileNone                   : VAEntrypointVideoProc
         VAProfileJPEGBaseline           : VAEntrypointVLD


Afterwards, I compiled GStreamer using gst-uninstalled in order to have kmssink and the master branch of gstreamer-vaapi. And got it! I had hardware accelerated decoding with VA-API, and video rendering using KMS/DRM. But, sadly, no zero copy, since, in order to have dmabuf sharing, we have to finish and to merge bug 755072 in gstreamer-vaapi.

Running the typical Big Bunny video (1080p, H.264) the playback consumes less than the 50% of the CPU usage, compared with the 100% of CPU usage with software decoders (and a lot of frames dropped).

I recorded a small video with the experiment as a proof.

Update: Javer Martinez told me in twitter, that MFC can do dma-buf export since uses vb2 and Exynos DRM can import, but he couldn’t get zero copy to work due missing format support.

by vjaquez at July 01, 2016 05:00 PM

June 21, 2016

Xavier Castaño

We’re proud to sponsor #Linuxcon North America and to celebrate the 25th anniversary of #Linux and #opensource!

LinuxCon North America and ContainerCon features 175+ sessions for developers, operators, users and other open source professionals with a range of content covering Linux, Containers, Cloud and much more!

Join us in Toronto, Canada where you’ll experience:

  1. Bonus Content Day featuring Docker 101 lab and tutorials
  2. Co-located events such as CloudNativeDay, KVM Forum, Linux Security Summit, Open Source Storage Summit, and Xen Project Developer Summit
  3. 25th Anniversary of Linux Casino Royale Gala on Wednesday, August 24th

Igalia will have a booth there so don’t miss your chance to attend and visit us!

by xavier castano at June 21, 2016 09:54 AM

June 20, 2016

Javier Muñoz

Ansible AWS S3 core module now supports Ceph RGW S3

The Ansible AWS S3 core module now supports Ceph RGW S3. The patch was upstream today and it will be included in Ansible 2.2

This post will introduce the new RGW S3 support in Ansible together with the required bits to run Ansible playbooks handling S3 use cases in Ceph Jewel

.

The Ansible project

Ansible is a simple IT automation engine that automates cloud provisioning, configuration management, application deployment and intra-service orchestration among many other IT needs.

Ansible works by connecting to nodes (SSH/WinRM) and pushing out small programs, called 'Ansible modules' to them. These programs are written to be resource models of the desired state of the system. Ansible then executes these modules and removes them when finished.

Ansible uses playbooks to orchestrate the infrastructure with very detailed control. Those playbooks define configuration policies and orchestration workflows. They are a YAML definition of automation tasks that describe how a particular piece of automation should be done.

Playbooks are modeled as a collection of plays, each of which defines a set of tasks to be executed on a group of remote hosts. A play also defines the environment where the tasks will be executed.

Ansible modules ensure indempotence so it is possible running the same tasks over and over without affecting the final result.

Using the Ansible AWS S3 core module with Ceph

The Ceph RGW S3 support is part of the Amazon S3 core module in Ansible. The AWS S3 core module allows the user to manage S3 buckets and the objects within them. It includes support for creating and deleting both objects and buckets, retrieving objects as files or strings and generating download links.

The patch leverages the AWS S3 use cases without any restriction or limitation with regions, URLs, etc.

To enable the RGW S3 flavour in the S3 core module you set the 'rgw' boolean option to 'true' and the 's3_url' string option to the RGW S3 server.

controller:~$ ansible rgw.test.node -m s3 -a \
"mode=list bucket=my-bucket s3_url=http://rgw.test.server:8000 rgw=true"

The 's3_url' option is mandatory in the RGW S3 flavour.

Testing the RGW S3 core module flavour via Playbook

This Playbook example tests the RGW S3 flavour in a simple three boxes network ('controller', 'rgw.test.node' and 'rgw.test.server')

The 'controller' host is the box running the Ansible engine.

The 'rgw.test.node' is the host connecting to the Ceph RGW server and running the S3 use cases.

The 'rgw.test.server' is the Ceph RGW S3 server.

Running the testing Playbook...

controller:~$ ansible-playbook playbook-test-rgw-s3-core-module.yml

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [rgw.test.node]

TASK [create test_file test-file.txt] ******************************************
changed: [rgw.test.node]

TASK [put s3_bucket_items my-test-object-1] ************************************
changed: [rgw.test.node] => (item=my-test-object-1.txt)
changed: [rgw.test.node] => (item=my-test-object-2.txt)

TASK [get s3_bucket_items] *****************************************************
ok: [rgw.test.node]

TASK [download s3_bucket_items] ************************************************
changed: [rgw.test.node] => (item=my-test-object-1.txt)
changed: [rgw.test.node] => (item=my-test-object-2.txt)

TASK [remove s3_bucket_and_content] ********************************************
changed: [rgw.test.node]

PLAY RECAP *********************************************************************
rgw.test.node              : ok=6    changed=4    unreachable=0    failed=0

As expected, the Playbook runs the 'create', 'upload', 'list', 'download' and 'remove' S3 use cases for buckets and objects. Adding the '-v' switch will show a more verbose output.

Wrap-up

All examples, modules and playbooks related to RGW S3 were tested on the new Ceph Jewel release.

Beyond of the RGW S3 support in the AWS S3 core module you could be interested in Ansible Playbooks to set up and configure Ceph clusters automatically. You can find those Playbooks in ceph/ceph-ansible with general support for Monitors, OSDs, MDSs and RGW.

The primary documentation for Ansible is available here. I found the Ansible whitepapers a great resource too.

Acknowledgments

My work in Ansible is sponsored by Outscale and has been made possible by Igalia and the invaluable help of the Ansible community. Thank you all!

by Javier at June 20, 2016 10:00 PM

Manuel Rego

My BlinkOn 6 Summary: Grid Layout, Houdini &amp; MathML

Igalia could not miss the chance to participate in a new European edition of BlinkOn. So past week my colleague Dape and I were attending BlinkOn 6 in Munich. In this post I’d do a personal summary about the main conference highlights.

Status of CSS Grid Layout implementation

During the conference I gave a talk about the status of Grid Layout on Blink. I went over the whole spec checking the things that are DONE, WIP or TODO. The summary is that on top of the things we’re already working on, and that will be landing very soon (auto-fit repeat, orthogonal flows, normal value in align|justify-items properties, etc.), we’ve just a few tasks pending (most of them changes on the spec in the last 2 months).

The main features pending are:

  • Fragmentation: This is an issue on the whole project and not only related to Grid Layout. For example, Blink doesn’t have proper fragmentation support on Flexbox either, so probably it’s not a blocker in order to ship Grid Layout. This will affect you when you try to print a grid and the rows are cut in the middle, actually they should have been moved completely to the next page. Of course, it’d be really nice to have it fixed and working as expected.
  • Subgrids: This is a hot topic and the opinions vary a lot depending on whom you’re asking. For some people, we should ship without subgrids support and add it later (as the change won’t be breaking any content); other believe that we shouldn’t ship until subgrids are implemented. I think it’s still needed more discussion between the CSS Working Group and the different browser vendors to reach an agreement on this issue.

You can find the slides of my talk on this blog already (notice that you’d need Chrome Canary with the experimental flag enabled to see them properly). Sadly the talks outside the main room were not recorded, so there won’t be any video of mine.

Igalia and Bloomberg working together to build a better web Igalia and Bloomberg working together to build a better web

From the different conversations I had, everyone seems really happy about the work we’re doing on Grid Layout, so thank you very much for the kind words. 😉 As you probably already know Igalia work on this feature has been sponsored by Bloomgerg, big kudos to them!

CSS Houdini

Houdini had a big presence on the conference, it seems clear that Google is really pushing for this to happen. It was nice to see the current status of some APIs that are already working in an experimental way like Typed OM and Painting API. Also, it’s great to check that other browsers are already taking some initial steps too, like Firefox with the Properties and Values API.

Userspace vector graphics using Houdini & Custom Elements by Shane Stephens Userspace vector graphics using Houdini & Custom Elements by Shane Stephens

Apart from the introductory talk on the main room there were 2 more sessions:

MathML in Chromium?

Just in case you don’t know, Igalia has been lately working on the refactoring of the MathML implementation in WebKit. The rationale was that the previous code was a pain to maintain and improve (e.g. it depended a lot on Flexbox making it really hard to implement new features), actually this was one of the reasons why it was removed from Blink after the fork. The refactoring is still ongoing but several patches have already landed.

As the new code (after the refactoring) was looking much better in WebKit my colleague Fred Wang started to port it to Blink on a GitHub repository. The initial results are looking pretty good, but all the code from WebKit hasn’t been ported yet.

MathML mentioned on the Houdini session MathML mentioned on the Houdini session

Thus, I took advantage of the conference to talk about this work with several people in order to check the feasibility of bringing MathML back to Chromium. After some discussions it seems clear that Google would be interested in having MathML support, however at this moment they seem more inclined to do it through the new Houdini APIs like CSS Layout and Fonts Metrics that are still being worked on. MathML seems to be a nice use case to verify that they work as expected.

If they follow this approach, it probably means that we won’t see MathML supported on Blink in the short-term, as we’d need to wait for those APIs to be ready. In any case, Igalia will keep an eye on all this stuff looking for a good opportunity to make it a reality.

Summary

As usual for me the most important part of these conferences is having the chance to meet some new faces and old friends too. It’s really nice to have the chance to spend a while talking face to face with people that have been interacting with you on the Internet for a long time.

Regarding Grid Layout I hope that the different parties can find a good way to bring it to the web authors in the 3 major browser (Chrome, Safari and Firefox) almost synchronously. I cannot say if it’s going to happen this year or the next one, but I’m sure it’s going to be something really big for the Web!

All the CSS Houdini stuff sounds really cool but a long-term thing at this moment. Let’s see how long we need to wait until we can use these new APIs for real.

Munich's Town Hall from from the Tower of St. Peter's Church Munich’s Town Hall from from the Tower of St. Peter’s Church

Finally, it was nice to visit Munich and to have some time for a walk around the city. We couldn’t find any single hill around the city center, next time it might be a good idea to rent a bike.

June 20, 2016 10:00 PM

June 14, 2016

Antonio Gomes

Understanding Chromium’s runtime ozone platform selection

Requisites

For the context of this post, it is assumed a ‘content_shell’ ChromeOS GN build produced with the following commands:

$ gn gen --args='target_os="chromeos"
                 use_ozone=true
                 ozone_platform_wayland=true
                 use_wayland_egl=false
                 ozone_platform_x11=true
                 ozone_platform_headless=true
                 ozone_auto_platforms=false' out-gn-ozone

$ ninja -C out-gn-ozone blink_tests

Ozone at runtime

Looking at the build arguments above, one can foresee that the content_shell app will be able to run on the following Graphics backends: Wayland (no EGL), X11, and “headless” – and it indeed is. This happens thanks to Chromium’s Graphics layer abstraction, Ozone.

So, in order to run content_shell app on Ozone platform “bleh”, one simply does:

$ out-gn-ozone/content_shell --ozone-platform="bleh"

Simple no? Well, yes .. and not as much.

The way the desired Ozone platform classes/objects are instantiated is interesting, involving c++ templates, GYP/GN hooks, python generated code, and more. This post aims to detail the process some more.

Ozone platform selection logic

Two methods kick off OzonePlatform instantiation: ::InitializeForUI and ::InitializeForGPU . They both call ::CreateInstance(), which is our starting point.

This is how simple it looks:

63 void OzonePlatform::CreateInstance() {
64     if (!instance_) {
(..)
69         std::unique_ptr<OzonePlatform> platform =
70             PlatformObject<OzonePlatform>::Create();
71
72         // TODO(spang): Currently need to leak this object.
73         OzonePlatform* pl = platform.release();
74         DCHECK_EQ(instance_, pl);
75     }
76 }

Essentially, when PlatformObject<T>::Create is ran (lines 69 and 70), it ends up calling a method named Create{T}Bleh, where

  • “T” is the template argument name, e.g. “OzonePlatform”.
  • “bleh” is the value passed to –ozone-platform command line parameter.

For instance, in the case of ./content_shell –ozone-platform=x11, the method called would CreateOzonePlatformX11, following the pattern Create{T}Bleh (i.e. “Create”+”OzonePlatform”+”X11”).

The actual logic

In order to understand how PlatformObject class works, lets start by looking at its definition (ui/ozone/platform_object.h & platform_internal_object.h):

template <class T> class PlatformObject {
 public:
  static std::unique_ptr<T> Create();
};
16 template <class T>
17 std::unique_ptr<T> PlatformObject<T>::Create() {
18     typedef typename PlatformConstructorList<T>::Constructor Constructor;
19
20     // Determine selected platform (from --ozone-platform flag, or default).
21     int platform = GetOzonePlatformId();
22
23     // Look up the constructor in the constructor list.
24     Constructor constructor = PlatformConstructorList<T>::kConstructors[platform];
26     // Call the constructor.
27     return base::WrapUnique(constructor());
28 }

In line 24 (highlighted above), the ozone platform runtime selection machinery actually happens. It retrieves a Constructor, which is a typedef for a PlatformConstructorList<T>::Constructor.

By looking at the definition of PlatformConstructorList class (below), Constructor is actually a pointer to a function that returns a T*.

14 template <class T>
15 struct PlatformConstructorList {
16     typedef T* (*Constructor)();
17     static const Constructor kConstructors[kPlatformCount];
18 };

Ok, so basically here is what we know this far:

  1. OzonePlatform::CreateInstance method calls OzonePlatform<bleh>::Create
  2. OzonePlatform<bleh>::Create picks up an index and retrieves a PlatformConstructorList<bleh>::Constructor (via kConstructor[index])
  3. PlatformConstructorList<bleh>::Constructor is a typedef to a function pointer that returns a bleh*.
  4. (..)
  5. This chain ends up calling Create{bleh}{ozone_platform}()

But wait! kConstructors, the array of pointers to functions – that solves the puzzle – is not defined anywhere in src/! 🙂

This is because its actual definition is part of some generated code triggered by specific GN/GYP hooks. They are:

  • generate_ozone_platform_list which generates out/../platform_list.cc,h,txt
  • generate_constructor_list which generates out/../constructor_list.cc though generate_constructor_list.py

In the end out/../generate_constructor_list.cc has the definition of kConstructors.

Again, in the specific case of the given GN build arguments, kConstructors would look like:

template <> const OzonePlatformConstructor
PlatformConstructorList<ui::OzonePlatform>::kConstructors[] = {
    &ui::CreateOzonePlatformHeadless,
    &ui::CreateOzonePlatformWayland,
    &ui::CreateOzonePlatformX11,
};

Logic wrap up

  • GYP/GN hooks are ran at build time, and generate plaform_list.{txt,c,h} as well as constructor_list.cc files respecting ozone_platform_bleh parameters.
    • constructor_list.cc has PlatformConstructorList<bleh>kConstructors actually populated.
  • ./content_shell –ozone-platform=bleh is called
  • OzonePlatform::InitializeFor{UI,Gpu}()
  • OzonePlatform::CreateInstance()
  • PlaformObject<OzonePlatformBleh>::Create()
  • PlatformConstructorList<bleh>::Constructor is retrieved – it is a pointer to a function stored in PlatformConstructorList<bleh>::kConstructor
  • function is ran and an OzonePlatformBleh instance is returned.

by agomes at June 14, 2016 04:28 AM

June 02, 2016

Alejandro Piñeiro

Introducing Mesa intermediate representations on Intel drivers with a practical example

Introduction

The recent big news on the Igalia work on Mesa was that our effort getting the ARB_gpu_shader_fp64 and ARB_vertex_attrib_64bit extensions implemented for Intel Gen8+, allowed to expose OpenGL 4.2 for Gen8+. But I will let other igalians to talk in details about them (no pressures ;)).

In a previous blog post I mentioned that NIR was intended to replace GLSL IR. Although that was true on the context I was talking about, that comment could be somewhat misleading, so I will try to clarify it.

Intermediate representations on Intel Mesa drivers

So first, let’s list the intermediate representations that you would find when working on Mesa Intel drivers:

  • AST (Abstract Syntax Tree): calling it an Intermediate Language is somewhat an abuse of language. This is the tree representation of your GLSL shader just after parsing it with Flex/Bison.
  • Mesa IR (Intermediate Representation): also called HIR and GLSL IR. A real Intermediate Language. It is converted from AST. Here you have optimizations, link support, etc.
  • NIR (New Intermediate Representation): a new Intermediate Language added recently.

So the first questions would be, why three? Having AST and another intermediate representation is easier to explain. AST is a raw tree representation, not useful to generate code. But why IR and NIR?

Current Mesa IR was created some years ago. The design decisions behind it had their advantages. But it has also some disadvantages. For example, their tree-like structure make it complex to traverse, making difficult to navigate, implement optimizations, etc. Other disadvantage was that it was not on SSA form, making again some optimizations hard to write. You can see a summary of all this on Ian Romanick’s presentation “Three Years Experience with a Tree-like Shader IR“.

So at some point an effort was started to “flatenize” Mesa IR and adding SSA support. But the conclusion was that the effort to modify Mesa IR was so big, that it was worth to just start from scratch using the learned lessons, as explained on Connor Abbot’s email “A new IR for Mesa“, in which he proposed this new IR.

Some time later, NIR was ready for production, and as I mentioned on my blog post (that one I’m clarifying right now), some parts of Mesa Intel driver was reimplemented in order to use NIR instead of Mesa IR. So Mesa IR was being replaced there. Where exactly? The parts where the final assembly code was being generated. And now that that is finished (at least on the i965 driver), we can say that Mesa IR is not used to generate code at all.

So right now there are an AST->Mesa IR->NIR chain. What is the plan now? Generate an AST->NIR pass and completely remove Mesa IR? This same question was asked (among other things) on January 2016, on the mesa-dev email “Nir, SCons, and Gallium“.  And the answer was “no”, as you can see on two Ian Romanick’s replies (here and here). The summary is that Mesa IR has several GLSL specifics that aren’t appropriate for NIR’s level. In that sense, NIR is a step below Mesa IR, more near to the GPU needs. It is also worth to mention that since then, Vulkan support was added to Mesa. In order to support Spir-V (Vulkan’s shader language, that is an intermediate representation itself), a SPIRV->NIR pass was created. In that sense, for OpenGL there is an OpenGL-specific intermediate representation, that is Mesa IR, and for Vulkan there is a Vulkan-specific intermediate representation, that is Spirv, and both are translated to the same common representation, NIR.

Practical example:

So, what means that in the practice? Do I need to deal with all those intermediate representations? Well, as anything in life, that would depend. If for example, you want to provide the support for a new GLSL feature or a specific hw, you would need to touch all three. You would need to modify the flex/bison files, so AST would need to be updated, and then Mesa IR, NIR, and the passes that transform one to the other. But if you want to give support for an GLSL feature that is already supported, but on new hw, the most likely is that you will not need changes on any of them, but just on the code that generate the final assembly using NIR.

And what would happen to other features like warnings and errors? Right now most of them are detected at the AST level, and some at the IR level. NIR doesn’t trigger any error/warning yet. It contains several asserts, but basically because it assumes that at that moment the representation of the shader should be correct, so if you find something wrong, means that the developer working on NIR is doing something wrong, in opposite to the developer that wrote the GLSL shader.

So lets go for a practical example I was working on: uninitialized variable warnings (“undefined values” on the bug tracking the issue). In short, this is about warn the developer about things like this:

out color;

void main()

{
 vec4 myTemp;

 color = myTemp;
}

What would be the color on screen? Who knows. So although is a feature you can live without, it is a good nice-to-have.

On the original bug, it was mentioned that this kind of errors are easy to detect, as NIR has a type ssa_undefs, so we just need to check if they are used. And in fact, when I started to work on it, I quickly find how to raise a warning. For example, on the method nir_print.c:print_ssa_def, used to debug, it is easy to modify it in order to point that it is using a undef. But trying to raise the warning there have some problems:

  • As mentioned NIR doesn’t have any warning/error triggering mechanism implemented, you would need to add them.
  • You want to include this warning on the OpenGL InfoLog, but as mentioned NIR is used for both OpenGL and Vulkan, and right now it doesn’t maintains any info about the origin.
  • And perhaps more important, at that point you lack the context data that points which source code line you are working on.

In fact, the last bullet point also applies to Mesa IR. There are some warnings raised at the Mesa IR level, but are “line-less”. Not sure any other developer, but for me, this kind of warning without a reference to the source line number would be annoying to use. Does that mean that the only option would be the totally raw AST tree?

Fortunately, Mesa IR was already saving if a variable was being statically assigned or not (to check some other possible errors). This was being computed on the AST to Mesa IR pass, and in fact the documentation mentions that this value is only valid at that moment. We would be on the middle of AST and Mesa IR. So when to raise the warning? The straightforward solution would be when a variable is , just before/after the error “variableX undeclared” is raised. But that is not so easy. For example:

float myFloat1;
float myFloat2;

myFloat1 = myFloat2;

How many warnings should we raise? Just one, for myFloat2. But technically we are also using myFloat1, and it is uninitialized. So we need to differentiate both cases. Being AST so raw, at that point we don’t have that information, and in fact it is also impossible to go up to the parent expression in order to compute that information. So it was needed to add an attribute on the AST node, that I called is_lhs (as “is left hand side”). That variable would be set when parent expressions are being transformed.

If you are taking attention, probably you start to see what would be the collateral effect of this. Being AST so raw, and OpenGL specific, there would be several corner cases needed to be manually assigned. In fact the  first commit of the series is already covering several corner cases. And in spite of this, once the code reached master, there were two cases of false positives that needed extra checks (for builtin-variables and for inout/out function parameters)

After those two false positives, managing this warning was spread all along the code that made the AST to Mesa IR pass, so seemed easy to broke. So I decided to send too some unit tests to verify that it gets working. First I sent the make check test that tested that warning, and then the unit tests. 30 unit tests (it was initially 28, but reviewer asked two more). Not a bad number for a warning.

Final words

At this point, one would wonder if it still makes sense to have this warning on the AST to Mesa IR pass, and if it would have it better to do it as initially proposed, on NIR. But although it is true that “just detecting it” would be easier on NIR, without dealing with so many corner cases, I still think that adding the support for raising warning/errors compatible with OpenGL Infolog, and bringing somehow the original source code line number, would mean too many changes on both Mesa IR and NIR. More changes that dealing with those corner cases when using the variable on the AST to Mesa IR pass. And in any case, if in the future the situation changes, and makes sense to move the warning to NIR, we would have the unit tests that would help to ensure that we don’t introduce regressions.

In relation to the intermediate representations, just to note that I’m focusing on the Intel driver. Gallium drivers use other intermediate representation, called TGSI. As far as I know, on those drivers, they have a AST->Mesa IR->TGSI chain, and right now there is a work in progress AST->Mesa IR->NIR->TGSI chain that will be used on some specific cases. But all this is beyond my knowledge, so you would need to investigate if you are interested.

Appendix, extra documentation:

If you want more details about MESA IR, you can read:

  • Read Mesa IR README
  • Read past blog posts from fellow igalian Iago Toral (when NIR was not available yet): post 1 and post 2

If you want extra information about Mesa NIR, you can read:

 

 

by infapi00 at June 02, 2016 05:47 PM

Xavier Castaño

Igalia sponsors LinuxCon and ContainerCon Japan 2016, the premier Linux Conference in Asia

Join us at LinuxCon + ContainerCon Japan! LinuxCon Japan is the premier Linux conference in Asia that brings together a unique blend of core developers, administrators, users, community managers and industry experts. It is designed not only to encourage collaboration but to support future interaction between Japan and other Asia Pacific countries and the rest of the global Linux community.

ContainerCon also comes to Japan for the first time this year, after a successful launch in North America in 2015. New developments in Linux containers are driving the adoption of cloud and virtualization technologies in many industries and ContainerCon Japan will provide a platform for exhibiting the best work.

My colleague Mi Sun Silvia Cho will be talking about WebKit For Wayland and we will show some nice demos about our Browsers, Graphics and Compilers experience in our booth in Tokyo next month.

by xavier castano at June 02, 2016 07:10 AM