Planet Igalia

August 29, 2016

Iago Toral

OpenGL terrain renderer

Lately I have been working on a simple terrain OpenGL renderer demo, mostly to have a playground where I could try some techniques like shadow mapping and water rendering in a scenario with a non trivial amount of geometry, and I thought it would be interesting to write a bit about it.

But first, here is a video of the demo running on my old Intel IvyBridge GPU:

And some screenshots too:

OpenGL Terrain screenshot 1 OpenGL Terrain screenshot 2
OpenGL Terrain screenshot 3 OpenGL Terrain screenshot 4

Note that I did not create any of the textures or 3D models featured in the video.

With that out of the way, let’s dig into some of the technical aspects:

The terrain is built as a 251×251 grid of vertices elevated with a heightmap texture, so it contains 63,000 vertices and 125,000 triangles. It uses a single 512×512 texture to color the surface.

The water is rendered in 3 passes: refraction, reflection and the final rendering. Distortion is done via a dudv map and it also uses a normal map for lighting. From a geometry perspective it is also implemented as a grid of vertices with 750 triangles.

I wrote a simple OBJ file parser so I could load some basic 3D models for the trees, the rock and the plant models. The parser is limited, but sufficient to load vertex data and simple materials. This demo features 4 models with these specs:

  • Tree A: 280 triangles, 2 materials.
  • Tree B: 380 triangles, 2 materials.
  • Rock: 192 triangles, 1 material (textured)
  • Grass: 896 triangles (yes, really!), 1 material.

The scene renders 200 instances of Tree A another 200 instances of Tree B, 50 instances of Rock and 150 instances of Grass, so 600 objects in total.

Object locations in the terrain are randomized at start-up, but the demo prevents trees and grass to be under water (except for maybe their base section only) because it would very weird otherwise :), rocks can be fully submerged though.

Rendered objects fade in and out smoothly via alpha blending (so there is no pop-in/pop-out effect as they reach clipping planes). This cannot be observed in the video because it uses a static camera but the demo supports moving the camera around in real-time using the keyboard.

Lighting is implemented using the traditional Phong reflection model with a single directional light.

Shadows are implemented using a 4096×4096 shadow map and Percentage Closer Filter with a 3x3 kernel, which, I read is (or was?) a very common technique for shadow rendering, at least in the times of the PS3 and Xbox 360.

The demo features dynamic directional lighting (that is, the sun light changes position every frame), which is rather taxing. The demo also supports static lighting, which is significantly less demanding.

There is also a slight haze that builds up progressively with the distance from the camera. This can be seen slightly in the video, but it is more obvious in some of the screenshots above.

The demo in the video was also configured to use 4-sample multisampling.

As for the rendering pipeline, it mostly has 4 stages:

  • Shadow map.
  • Water refraction.
  • Water reflection.
  • Final scene rendering.

A few notes on performance as well: the implementation supports a number of configurable parameters that affect the framerate: resolution, shadow rendering quality, clipping distances, multi-sampling, some aspects of the water rendering, N-buffering of dynamic VBO data, etc.

The video I show above runs at locked 60fps at 800×600 but it uses relatively high quality shadows and dynamic lighting, which are very expensive. Lowering some of these settings (very specially turning off dynamic lighting, multisampling and shadow quality) yields framerates around 110fps-200fps. With these settings it can also do fullscreen 1600×900 with an unlocked framerate that varies in the range of 80fps-170fps.

That’s all in the IvyBridge GPU. I also tested this on an Intel Haswell GPU for significantly better results: 160fps-400fps with the “low” settings at 800×600 and roughly 80fps-200fps with the same settings used in the video.

So that’s it for today, I had a lot of fun coding this and I hope the post was interesting to some of you. If time permits I intend to write follow-up posts that go deeper into how I implemented the various elements of the demo and I’ll probably also write some more posts about the optimization process I followed. If you are interested in any of that, stay tuned for more.

by Iago Toral at August 29, 2016 11:50 AM

August 22, 2016

Xavier Castaño

Igalia will be at IBC 2016

Do you work in the media, entertainment, broadcasting or content industry? If so, please join us at IBC2016, 9-13 September in Amsterdam. The IBC Exhibition covers fifteen halls across the Amsterdam RAI and hosts over 1,600 exhibitors spanning the creation, management and delivery of electronic and media entertainment.
You will be able to meet us at our booth where we will be showcasing demos of our latest work in browsers. The demo of WebKit for Wayland, a new port of WebKit that is currently being upstreamed, on an IVI shell is especially worth checking out.

Igalia will be in Hall 5, booth 19 so don’t miss your chance to attend and visit us! Find us on the interactive floorplan.

by xavier castano at August 22, 2016 10:15 AM

August 18, 2016

Adrián Pérez

SnabbWall's Traffic Analyzer: L7Spy

Wow, two posts just a few days apart from each other! Today I am writing about L7Spy, the application from the SnabbWall suite used to analyze network traffic and detect which applications they belong to. SnabbWall is being developed at Igalia with sponsorship from the NLnet Foundation. (Thanks!)

 

A Digression

There are two things related to L7Spy I want to mention beforehand:

nDPI 1.8 is Now Supported

The changes from the [recently released][ljndpi0.3.3] ljndpi v0.3.3 have been merged into the SnabbWall repository, which means that now it also supports using version 1.8 of the nDPI library.

Following Upstream

Snabb follows a monthly release schedule. Initially, I thought it would be better to do all the development for SnabbWall starting from a given Snabb release, and incorporate the changes from upstream later on, after the first one of SnabbWall is done. I noticed how it might end up being difficult to rebase SnabbWall on top of the changes accumulated from the upstream repository, and switched to the following scheme:

  • The branch containing the most up-to-date development code is always in the snabbwall branch.
  • Whenever a new Snabb version is released, changes are merged from the upstream repository using its corresponding release tag.
  • When a SnabbWall release is done, it tagged from current contents of the snabbwall branch, which means it is automatically based on the latest Snabb release.
  • If a released version needs fixing, a maintenance branch snabbwall-vX.Y for it is created from the release tag. Patches have to be backported or purposedfully made for that this new branch.

This approach reduces the burden of keeping up with changes in Snabb, while still allowing to make bugfix releases of any SnabbWall version when needed.

L7Spy

The L7Spy is a Snabb application, and as such it can be reused in your own application networks. It is made to be connected in a few different ways, which should allow for painless integration. The application has two bidirectional endpoints called south and north:

The L7Spy application.

Packets entering one of the ends are forwarded unmodified to the other end, in both directions, and they are scanned as they pass through the application.

The only mandatory connection is the receiving side of either south or north: this allows for using L7Spy as a “kitchen sink”, swallowing the scanned packets without sending them anywhere. As an example, this can be used to connect a RawSocket application to L7Spy, attach the RawSocket to an existing network interface, and do passive analysis of the traffic passing through that interface:

Passive analysis made easy

If you swap the RawSocket application with the one driving Intel NICs, and divert a copy of packets passing through your firewall towards the NIC, you would be doing essentially the same but at lightning speed and moving the load of packet analysis to a separate machine. Add a bit more of smarts on top, and this kind of setup would already look like a standalone Intrusion Detection System!

snabb wall

While implementing the L7Spy application, the snabb wall command has also come to life. Like all the other Snabb programs, the wall command is built into the snabb executable, and it has subcommands itself:

aperez@hikari ~/snabb/src % sudo ./snabb wall
[sudo] password for aperez:
Usage:
  snabb wall <subcommand> <arguments...>
  snabb wall --help

Available subcommands:

  spy     Analyze traffic and report statistics

Use --help for per-command usage. Example:

  snabb wall spy --help

aperez@hikari ~/snabb/src %

For the moment being, only the spy subcommand is provided: it exercises the L7Spy application by feeding into it packets captured from one of the supported sources. It can work in batch mode —the default—, which is useful to analyze a static pcap file. The following example analyzes the full contents of the pcap file, reporting at the end all the traffic flows detected:

aperez@hikari ~/snabb/src % sudo ./snabb wall spy \
      pcap program/wall/tests/data/EmergeSync.cap
0xfffffffff19ba8be   18p   134.68.220.74:21769 -     192.168.1.2:26883  RSYNC
0xffffffffc6f96d0f 4538p   134.68.220.74:22025 -     192.168.1.2:26883  RSYNC
aperez@hikari ~/snabb/src %

Two flows are identified, which correspond to two different connections between the client with IPv4 address 134.68.220.74 (from two different ports) to the server with address 192.168.1.2. Both flows are correctly identified as belonging to the rsync application. Notice how the application is properly detected even though the rsync daemon is running in port 26883 which is a non-standard one for this kind of service.

It is also possible to ingest traffic directly from a network interface in live mode. Apart from the drivers for Intel cards included with Snabb (selectable using the intel10g and intel1g sources) there are two hardware-independent sources for TAP devices (tap) and RAW sockets (raw). Let's use a RAW socket to sniff a copy of every packet which passes through the a kernel-managed network interface, while opening a well known search engine in a browser using their https:// URL:

aperez@hikari ~/snabb/src % sudo ./snabb wall spy --live eth0
...
0x77becdcb    2p    192.168.20.1:53    -   192.168.20.21:58870  SERVICE_GOOGLE:DNS
0x6e111c73    4p   192.168.20.21:443   -   216.58.210.14:48114  SERVICE_GOOGLE:SSL
...

Note how even the connection is encrypted, our nDPI-based analyzer has been able to correlate a DNS query immediately followed by a connection to the address from the DNS response at port 443 as a typical pattern used to browse web pages, and it has gone the extra mile to tell us that it belongs to a Google-operated service.

There is one more feature in snabb wall spy. Passing --stats will print a line at the end (in normal mode), or every two seconds (in live mode), with a summary of the amount of data processed:

aperez@hikari ~/snabb/src % sudo ./snabb wall spy --live --stats eth0
...
=== 2016-08-17T18:05:54+0300 === 71866 Bytes, 99 packets, 35933.272 B/s, 49.500 PPS
...
=== 2016-08-17T18:19:35+0300 === 3105473 Bytes, 2884 packets, 1.599 B/s, 0.001 PPS
...

One last note: even though it is not shown in the examples, scanning and identifying IPv6 traffic is already supported.

Get that IPv6 flowing!

What Next?

Now that traffic analysis and flow identification is working nicely, it has to be used for actual filtering. I am now in the process of implementing the firewall component of SnabbWall (L7fw) as a Snabb application. Also, there will be a snabb wall filter subcommand, providing an off-the-shelf L7 filtering solution. Expect a couple more of blog posts about them.

In the meanwhile, if you want to play with L7Spy, I will be updating its WIP documentation in the following days. Happy hacking!

by aperez (adrian@perezdecastro.org) at August 18, 2016 08:51 AM

August 11, 2016

Xabier Rodríguez Calvar

Going to GUADEC 2016

I am at the airport getting ready to board beginning my trip to Karlsruhe for GUADEC 2016. I’ll be there with my colleague Michael Catanzaro.Please talk to us if you are interested in browsers or Igalia.

by calvaris at August 11, 2016 07:30 AM

July 31, 2016

Adrián Pérez

ljndpi 0.0.3 released with nDPI 1.8 support

Although I have been away from Snabb-related work for a while, the fact that nDPI 1.8 was released didn't went unnoticed. This library is an important building block for SnabbWall, the Layer-7 firewall which I am developing at Igalia with sponsorship from the NLnet Foundation (thanks!).

 

SnabbWall uses the nDPI library via the ljndpi binding, about which I wrote back in January. As any other component which uses a native library, ljndpi is a consumer of the API and ABI exported by it, so it needs to be updated to accomodate to changes introduced in new releases of nDPI.

Fortuntately most of the API (and ABI) from nDPI 1.7 remains unchanged in version 1.8, making it possible to maintain the interface exposed to the Lua world. This is very good news, and it means that any program using ljndpi does not need to be modified to work with nDPI 1.8: just update ljndpi to version 0.0.3 and you are ready to go.

Nitty-Gritty Details

In order to support versions 1.7 and 1.8 of the nDPI library, ndpi_revision() is used to obtain the version string of the loaded library, and parsed into a (major, minor, patch) version triplet. The ndpi.c module uses this version information to determine which symbols to make known to the FFI extension via ffi.cdef:

if lib_version.minor == 7 then
  -- Declare functions available only in nDPI 1.7
  ffi.cdef [[
    ...
  ]]
elseif lib_version.minor == 8 then
  -- Ditto, for version 1.8
  ffi.cdef [[
    ...
  ]]
end

The high-level ndpi.wrap module does the same, ensuring that only functions available in the version of the nDPI library currently loaded are used. In some cases, parameters passed to functions with the same name in nDPI may differ, and ndpi.wrap shuffles them around as needed in order to avoid changes in the interface exposed to Lua:

if lib_version.major == 7 then
   -- ...
else
   -- ...

   -- In nDPI 1.8 the second parameter (uint8_t proto) has been dropped.
   detection_module.find_port_based_protocol = function (dm, dummy, ...)
      local proto = lib.ndpi_find_port_based_protocol(dm, ...)
      return proto.master_protocol, proto.protocol
   end
end

Last but not least, the numeric values which identify each protocol supported by nDPI have changed as well, due to the additional protocols supported by the new version of the library. The existing ndpi.protocol_ids module was renamed to ndpi.protocol_ids_1_7, and then a new ndpi.protocol_ids_1_8 generated using the update-protocol-ids tool. Finally, the ndpi.wrap module was updated to load the module which corresponds to the version of the nDPI library being used:

-- Return the module table.
return {
  protocol = require("ndpi.protocol_ids_"
            .. lib_version.major .. "_"
            .. lib_version.minor);
  -- ...
}

Easier Installation

As a bonus, it is now possible to install ljndpi using the LuaRocks package manager. LuaRocks provides similar functionality to NodeJS' npm or Python's pip. If your LuaJIT installation includes LuaRocks, getting the latest stable release of ljndpi is easier than ever:

luarocks install ljndpi

It is also possible to install a development snapshot directly. The following will fetch the source from the Git repository and install it using LuaRocks — though it is strongly recommended to use the stable releases:

luarocks install --server=https://luarocks.org/dev ljndpi

Hopefully, this will make ljndpi easier to discover and used by projects other than SnabbWall. Who knows, it may even make it easier for distributions to package it! (Hint: LuaRocks supports specifying a custom tree.)

One More Thing...

He said it 31 times on stage (look it up!)

My previous post ended up in a cliffhanger which promised another post about SnabbWall. Supporting nDPI 1.8 in ljndpi seemed important enough to write about it, but rest assured that the post on SnabbWall is next in my “polish and publish” queue.

by aperez (adrian@perezdecastro.org) at July 31, 2016 08:15 PM

July 29, 2016

Alejandro Piñeiro

One year of Mesa

Changes

During the last year and something, my work at Igalia was focused on the Intel i965 driver for Mesa, the open source OpenGL implementation. Iago and Samuel were working for some time, then joined Edu and Antía, and then I joined myself, as the fifth igalian.

One of the reasons I will always remember working on this project is that I was also being a father for a year and something, so it will be always easy to relate one and the other. Changes! A project change plus a total life change.

In fact the parenthood affected a little how I joined the project. We (as Igalia) decided that I would be a good candidate to join the project around April. But as my leave of absence was estimated to be around the last weeks of May, and it would be strange to start a project and suddenly disappear, we decided to postpone the joining just after the parental leave. So I spent some time closing and transferring the stuff I was doing at Igalia at that moment, and starting to get into the specifics of Intel driver, and then at the end of May, my parental leave started. Pedro is here!

 

Pedro’s day zero
Parenthood

Mesa can be a challenging project, but nothing compared to how to parent. A lot of changes. In Spain the more usual parental leave for the father is 15 days, where the only  flexibility is that you need to take 2 just after the child has born, and chose when take the other 13. But those 13 needs to be taken in a row. So I decided to took 15 days in a row. Igalia gives you 15 extra days. In fact, the equivalent to 15 days, as they gives you flexibility of how to take them. So instead of took them as 15 full-day, I decided to take 30 part-time days. Additionally Pedro’s grandmother (maternal) came to live with us for a month. That allowed us to do a step by step go back to work. So it was something like:

  • Pedro was born, I am in full parental leave, plus grandmother living at our house helping.
  • I start to work part-time
  • Grandmother goes back to Brazil
  • I start to work full-time.
  • Pedro’s mother goes back to work part-time

And probably the more complicated moment was when Pedro’s mother went back to work. More or less at that time we switched the usual grandparents (paternal) weekend visits to in-week visits. So what it is my usual timetable? Three days per week I work 8:00-14:30 at my home, and I take care of Pedro on the afternoon, when his mother goes to work. Two days per week is 8:00-12:30 at my home, and 15:00-20:00 at my parents home. And that is more or less, as no day is the same that the other 😉 And now I go more to the parks!

Pedro playing on a park
Pedro playing on a park

There were other changes. For example switched from being one of the usual suspects at Igalia office to start to work mostly from home. In any case the experience is totally worthing, and it is incredible how fast the time passed. Pedro is one year old already!

Pedro destroying his birthday cake
Mesa tasks

I almost forgot that this blog post started to talk about my work on Mesa. So many photos! During this year, I have participated (although not alone, specially on spec implementations) in the following tasks (and except last one, in chronological order):

About which task I liked the most, I think that it was the work done on optimizing the NIR to vec4 pass. And not forget that the work done by Igalia to implement internalformat_query2 and vertex_attrib_64bit, plus the work done by Iago and Samuel with fp64, helped to get Mesa 12.0 exposing OpenGL 4.2 support on Broadwell and later.

What happens with accessibility?

Being working full time for the same project for a full year, plus learning how to parent, means that I didn’t have too much spare time for accessibility development. Having said so, I have been reviewing ATK patches, doing the ATK releases regularly and keeping an eye on the GNOME accessibility mailing lists, so I’m still around. And we have Joanmarie Diggs, also fellow igalian, still rocking on improving Orca, WebKitGTK and WAI-ARIA.

Thanks

Finally, I’d like to thank both Intel and Igalia for supporting my work on Mesa and i965 all this time. Specially on allowing me so flexible timetable, where the important is what you deliver, and not when you do the work, allowing me to enjoy parenthood. I also want to thanks the igalian colleagues I has been working during this year, Iago, Samuel, Antía, Edu, Andrés, and Juan, and all those at Intel who have been helping and reviewing my work during all this year, like Jason Ekstrand, Kenneth Graunke, Matt Turner and Francisco Jerez.

 

by infapi00 at July 29, 2016 09:02 AM

July 13, 2016

Iago Toral

Story and status of ARB_gpu_shader_fp64 on Intel GPUs

In case you haven’t heard yet, with the recently announced Mesa 12.0 release, Intel gen8+ GPUs expose OpenGL 4.3, which is quite a big leap from the previous OpenGL 3.3!

OpenGL 4.3

The Mesa i965 Intel driver now exposes OpenGL 4.3 on Broadwell and later!

Although this might surprise some, the truth is that even if the i965 driver only exposed OpenGL 3.3 it had been exposing many of the OpenGL 4.x extensions for quite some time, however, there was one OpenGL 4.0 extension in particular that was still missing and preventing the driver from exposing a higher version: ARB_gpu_shader_fp64 (fp64 for short). There was a good reason for this: it is a very large feature that has been in the works by Intel first and Igalia later for quite some time. We first started to work on this as far back as November 2015 and by that time Intel had already been working on it for months.

I won’t cover here what made this such a large effort because there would be a lot of stuff to cover and I don’t feel like spending weeks writing a series of posts on the subject :). Hopefully I will get a chance to talk about all that at XDC in September, so instead I’ll focus on explaining why we only have this working in gen8+ at the moment and the status of gen7 hardware.

The plan for ARB_gpu_shader_fp64 was always to focus on gen8+ hardware (Broadwell and later) first because it has better support for the feature. I must add that it also has fewer hardware bugs too, although we only found out about that later ;). So the plan was to do gen8+ and then extend the implementation to cover the quirks required by gen7 hardware (IvyBridge, Haswell, ValleyView).

At this point I should explain that Intel GPUs have two code generation backends: scalar and vector. The main difference between both backends is that the vector backend (also known as align16) operates on vectors (surprise, right?) and has native support for things like swizzles and writemasks, while the scalar backend (known as align1) operates on scalars, which means that, for example, a vec4 GLSL operation running is broken up into 4 separate instructions, each one operating on a single component. You might think that this makes the scalar backend slower, but that would not be accurate. In fact it is usually faster because it allows the GPU to exploit SIMD better than the vector backend.

The thing is that different hardware generations use one backend or the other for different shader stages. For example, gen8+ used to run Vertex, Fragment and Compute shaders through the scalar backend and Geometry and Tessellation shaders via the vector backend, whereas Haswell and IvyBridge use the vector backend also for Vertex shaders.

Because you can use 64-bit floating point in any shader stage, the original plan was to implement fp64 support on both backends. Implementing fp64 requires a lot of changes throughout the driver compiler backends, which makes the task anything but trivial, but the vector backend is particularly difficult to implement because the hardware only supports 32-bit swizzles. This restriction means that a hardware swizzle such as XYZW only selects components XY in a dvecN and therefore, there is no direct mechanism to access components ZW. As a consequence, dealing with anything bigger than a dvec2 requires more creative solutions, which then need to face some other hardware limitations and bugs, etc, which eventually makes the vector backend require a significantly larger development effort than the scalar backend.

Thankfully, gen8+ hardware supports scalar Geometry and Tessellation shaders and Intel‘s Kenneth Graunke had been working on enabling that for a while. When we realized that the vector fp64 backend was going to require much more effort than what we had initially thought, he gave a final push to the full scalar gen8+ implementation, which in turn allowed us to have a full fp64 implementation for this hardware and expose OpenGL 4.0, and soon after, OpenGL 4.3.

That does not mean that we don’t care about gen7 though. As I said above, the plan has always been to bring fp64 and OpenGL4 to gen7 as well. In fact, we have been hard at work on that since even before we started sending the gen8+ implementation for review and we have made some good progress.

Besides addressing the quirks of fp64 for IvyBridge and Haswell (yes, they have different implementation requirements) we also need to implement the full fp64 vector backend support from scratch, which as I said, is not a trivial undertaking. Because Haswell seems to require fewer changes we have started with that and I am happy to report that we have a working version already. In fact, we have already sent a small set of patches for review that implement Haswell‘s requirements for the scalar backend and as I write this I am cleaning-up an initial implementation of the vector backend in preparation for review (currently at about 100 patches, but I hope to trim it down a bit before we start the review process). IvyBridge and ValleView will come next.

The initial implementation for the vector backend has room for improvement since the focus was on getting it working first so we can expose OpenGL4 in gen7 as soon as possible. The good thing is that it is more or less clear how we can improve the implementation going forward (you can see an excellent post by Curro on that topic here).

You might also be wondering about OpenGL 4.1’s ARB_vertex_attrib_64bit, after all, that kind of goes hand in hand with ARB_gpu_shader_fp64 and we implemented the extension for gen8+ too. There is good news here too, as my colleague Juan Suárez has already implemented this for Haswell and I would expect it to mostly work on IvyBridge as is or with minor tweaks. With that we should be able to expose at least OpenGL 4.2 on all gen7 hardware once we are done.

So far, implementing ARB_gpu_shader_fp64 has been quite the ride and I have learned a lot of interesting stuff about how the i965 driver and Intel GPUs operate in the process. Hopefully, I’ll get to talk about all this in more detail at XDC later this year. If you are planning to attend and you are interested in discussing this or other Mesa stuff with me, please find me there, I’ll be looking forward to it.

Finally, I’d like to thank both Intel and Igalia for supporting my work on Mesa and i965 all this time, my igalian friends Samuel Iglesias, who has been hard at work with me on the fp64 implementation all this time, Juan Suárez and Andrés Gómez, who have done a lot of work to improve the fp64 test suite in Piglit and all the friends at Intel who have been helping us in the process, very especially Connor Abbot, Francisco Jerez, Jason Ekstrand and Kenneth Graunke.

by Iago Toral at July 13, 2016 02:48 PM

July 12, 2016

Frédéric Wang

MathML Improvements in WebKit

In a previous blog post, I explained the work made by Igalia’s web platform team to refactor WebKit’s MathML layout classes. I stated that although some rendering improvements were a nice side effect, the main goal of the first phase was really to clean the code up so that it is easier for developers to work on MathML in the future. Indeed this really made things easier to review: Quite unexpectedly to me, the second phase only took 4 days to be upstreamed… Kudos to Brent Fulgham for having reviewed so many patches in such a short period of time!

In this blog post, I am going to give an overview of the improvements made during these two first phases taking changeset r203109 as a reference. The changes will be available in WebKitGTK+ 2.14 in September and are likely to be included this month in the next Safari Technology Preview. It definitely remains more work to do such as the third phase or other rendering improvements, but I believe we have already made a big step forward!

Mathematical Fonts

Two years ago, basic support for operator stretching via the OpenType MATH table was added to WebKit. During the refactoring, we improved that support and also made use of more parameters to improve the math layout (see section about OpenType MATH parameters below). While Latin Modern Math will be used in most screenshots, the following one shows that you can use any math fonts. By default WebKit will try and use one of these fonts but if none are available or if you force a non-math font-family then the rendering quality may not be good.

The following screenshot gives the rendering for various fonts. For the last one we used the value sans-serif to illustrate issues with non-math fonts (displaystyle integral too small, mathvariant italic glyphs taken from another font, missing italic correction etc).

Screenshots of a formula with various fonts Integral obtained by contour integration
WebKitGTK+ r203109 with various font-family values.
From top to bottom: TeX Gyre Schola, Latin Modern, DejaVu Math, STIX, and sans-serif.

The href Attribute

This new feature is obvious: You can now create a hyperlink for any part of a mathematical formula! Here is a screenshot of the MathML Torture Test 21 with additional links, as displayed in WebKit r203109. Click the image to load the test case in your browser and test it.

Screenshot of MathML Torture test 21 with hyperlinks

The mathvariant Attribute

Unicode contains Mathematical Alphanumeric Symbols to convey special meaning such as double-struck or specific Arabic styles. Characters for these symbols are generally provided by math fonts. In MathML, mathematical variables are automatically rendered using the italic characters from this Unicode block. One can also access these characters via the mathvariant attribute and that attribute is actually used by many LaTeX-to-MathML converters.

In the following screenshot, you can see that the letters f, x and y are now drawn with this special mathematical italic glyphs and that WebKit uses the conventional fraktur style for the Lie algebra g. Note that the prime is still too small because WebKit does not make use of the ssty feature yet.

Screenshot of Wikipedia article on Lie Algebra Homomorphism of Lie algebra
Top: Safari 9.1.1. Bottom: Safari r203109.

Operators and Spacing

As said in my previous blog post, the rendering of large and stretchy operators have been rewritten a lot and as a consequence the rendering has improved. Also, I mentioned that the width of operators may depend on their height. This may cause accumulated approximations during the computation of preferred widths. The old flexbox-based implementation incorrectly forced layout during preferred computation to avoid that but a quick workaround for that security concern caused the approximate preferred widths to be used for the logical widths. With our new implementation, the logical width is now correctly calculated. Finally, we added partial support for the mpadded element which is often used to tweak spacing in mathematical formulas.

The screenshot below illustrates the fix for a serious regression with large operator (summation symbol) as well as improvements in the rendering of stretchy operators (horizontal braces). Note that the formula has a hack with a zero-width mpadded element which used to cause improper spacing (large gap between the group of a’s and the group of b’s).

Screenshot from Mozilla's MathML torture test using Safari 9.1.1 or the the current refactoring Tests 21 and 22 from the MathML torture test
Left: Safari 9.1.1. Right: Safari r203109.

The following screenshot shows how incorrect width computations used to cause excessive spacing after the stretchy radical and slash symbols:

Screenshot of Wikipedia article on Trigonometric Functions Sine of 1 degree
Top: Safari 9.1.1. Bottom: Safari r203109.

The displaystyle Property

Mathematical formulas can be integrated inside a paragraph of text (inline math in TeX terminology) or displayed in its own horizontally centered paragraph (display math in TeX terminology). In the latter case, the formula is in displaystyle and does not have any restrictions on vertical spacing. In the former case, the layout of the mathematical formula is modified a bit to optimize this vertical spacing and to better integrate within the surrounding text. The displaystyle property can also be set using the corresponding attribute or can change automatically in subformulas (e.g. in fractions).

In the following screenshot the fix for the large operator regression is obvious but you can also notice that the summation is now slightly different for the definition of a Bézier curve (top) and for the one of a rational Bézier curve (bottom). For example, to save some vertical space in the fractions, the sigma symbol is drawn smaller and the scripts attached to it are moved on its right. However, the script size could still be improved when we implement the scriptlevel property.

Screenshot of Wikipedia article on Bézier Curves Definitions of Bézier Curve
Left: Safari 9.1.1. Right: Safari r203109.

OpenType MATH Parameters

Many new OpenType MATH features have been implemented following the MathML in HTML5 Implementation Note:

  • Use of the AxisHeight parameter to set vertical position of fractions, tables and symmetric operators.
  • Use of layout constants for radicals (some of them were already used), scripts and fractions. This improves math spacing and positioning and allows to adjust them according to value of the displaystyle property discussed in the previous section.
  • Use of the italic correction of large operator glyph to set the position of subscripts.

The screenshots below illustrate some of these improvements. For the first one, the use of AxisHeight allows to better align the fraction bar with the plus sign. For the second one, the use of layout constants for scripts as well as the italic correction of the integral improves the placement of limits.

Screenshot of Wikipedia article on the Continued Fractions Generalized Continued Fraction
Top: Safari 9.1.1. Bottom: Safari r203109.
Screenshot of Wikipedia article on the Gamma Function Definition of the Gamma Function
Top: Safari 9.1.1. Bottom: Safari r203109.

Right-to-left radicals

One of the advantage of the old flexbox-based implementation is that right-to-left layout was available for free. This support has of course been preserved in the new implementation. We also added a simple workaround to mirror radicals using a scale transform as shown in the screenshot below. However, general glyph-level mirroring is still missing.

Screenshot of Wikipedia article on Lie Algebra Test 13 from the MathML torture test (Machrek Style)
Top: Safari 9.1.1. Bottom: Safari r203109.

Conclusion

Igalia’s web platform team has been able to follow the MathML in HTML5 Implementation Note in order to significantly improve the rendering of mathematical expressions in WebKit. More work remains to do but we will definitely appreciate any feedback that can help improving native MathML support in web engines. We are also excited to continue work and collaboration at the next Web Engines Hackfest!

July 12, 2016 10:00 PM

July 08, 2016

Frédéric Wang

MathML Refactoring in WebKit

Introduction

If you follow WebKit developments, you are certainly aware that Igalia has been working on WebKit’s MathML implementation for some time. More recently, effort has been made to write a clean implementation addressing issues reported by WebKit reviewers in the past. After joining Igalia in March, I have been in charge of getting this work reviewed and merged into WebKit’s development branch. In the past four months, we have been successful in upstreaming the first phase of the refactoring and the work accomplished is described in this blog post.

Screenshot from Mozilla's MathML torture test using Safari 9.1.1 or the the current refactoring
MathML torture tests with the Latin Modern Math font
Left: Safari 9.1.1. Right: Current MathML branch.

Note that the focus was on code refactoring so the improvement may not be obvious for non-developers. Nevertheless many issues have already been fixed as a consequence of that work: math italic, position of scripts, stretchy and large operators, rendering update and more. More importantly, this preliminary step opens the way for beautiful math rendering based on TeX rules and the OpenType MATH table. Rendering improvements and implementation of new features have already started in the next phase of the refactoring, so stay tuned…

Design Issues

As explained in a previous report, the main design issues in the flexbox-based implementation released in 2013 can essentially be summarized in three points:

  1. WebKit’s code to stretch operator was not efficient at all and was limited to some basic fences buildable via Unicode characters.
  2. WebKit’s MathML code violated many layout invariants, making the code unreliable.
  3. WebKit’s MathML code relied heavily on the C++ renderer classes for flexboxes and had to manage too many anonymous renderers.

For the first point, the performance issue had been fixed by Igalia developers right after the initial feedback from WebKit developers and we improved that again during our refactoring. Partial support for the OpenType MATH table was added during my crowdfunding project and allowed to stretch more operators with the help of math fonts. For the second point, the main issue was also fixed right after the initial feedback. However one could still have some doubts regarding the layout steps, given the complexity implied by the third point. That last issue was still problematic so far and addressing it was the main achievement of our refactoring.

Technically, the dependence on flexbox is unnecessary and the implementation actually only used a limited set of flexbox features. Thus executing the whole flexbox code was overkill. It can also be a burden for people working on other places of the layout code. For example Javi Fernández has worked on improving the box alignments in the past and he had hard time fixing the MathML code impacted by his changes. This is probably the cause of the bad position of the summation symbol that can be seen in the screenshot above.

From the layout perspective, most of the rendering logic was implemented in the flexbox classes and the MathML “renderer” classes were really just managing the creation and update of anonymous nodes and style. Although this sounds good code reuse it actually made impossible to understand how and when layout steps happen or to add more advanced features. The new implementation replaced this manipulation of the render tree with simple arithmetic calculations on box metrics which is safer and more reliable. Also, complex renderers such as RenderMathMLScripts or RenderMathMLRoot actually achieve better rendering quality with less code!

As an example of the complexity, RenderMathMLUnderOver can behave as a RenderMathMLScripts in some situation so we really want the former class to reuse the latter class. As we will see below the old implementation of the two renderers were quite different: RenderMathMLUnderOver only relied on setting column direction in the user agent stylesheet while RenderMathMLScripts created a complex render tree structures with anonymous style. Hence it seemed difficult to share the two cases or to handle DOM changes causing to move from one case to the other one. With our new implementation, this is simply reduced to simple C++ inheritance.

When I started to work on WebKit some years ago, I made the mistake of continuing with the existing approach. The implementation of multiscripts or automatic italic mathvariant added more anonymous objects and made the situation even worse. After the end of my crowdfunding project, Alex Castro did more cleanup and tried to implement important features such as displaystyle but he also soon realized that it was too hard to work with the current code base…

Layout Refactoring

In order to solve the issues discussed in the previous section, Javi and Alex worked on a new MathML branch where the first step was to remove the inheritance on the flexbox layout classes. During the Web Engines Hackfest 2015, I collaborated with the Igalia’s web platform team team to continue the work on this branch. In a second step, we rewrote many MathML renderer functions so that they stop creating anonymous nodes or style. We obtained very encouraging results: The implementation looked much simpler and much more understandable!

Alex announced the initial plan on the webkit-dev mailing list. He started opening bugs and attaching patches to merge the first step. However, that step still required many of the flexbox logic and so made code hard to understand for reviewers. Hence when I joined Igalia four months ago Alex asked me to try and see how to reorganize patches so that the two initial steps can be submitted in one go. This corresponds to the first phase mentioned in the introduction. As indicated on the wiki page, the layout refactoring consisted in rewriting the following member functions of each renderer class:

  • computePreferredLogicalWidths: calculate preferred widths, based on the preferred widths of child renderers.
  • layoutBlock: set final position and size of child renderers.
  • firstLineBaseLine: calculate the ascent of the renderer.
  • paint (optional): perform special painting such as fraction bars.

Refactored renderers no longer rely on any flexbox code nor anonymous renderers and the functions mentioned above essentially perform arithmetic computations. By reading the code, one can be sure that we follow standard layout rules and that we do not perform unnecessary reflow. Moreover, the rules specific to math rendering are only located in the MathML renderers and can be better understood. Details for each class are provided in the next subsections. After all the layout functions were rewritten and the code managing the render tree structure removed, we were able to make the RenderMathMLBlock class inherit from RenderBlock instead of RenderFlexibleBox. Many of the bugs could then be immediately closed or otherwise fixed with small follow-up patches.

Spacing

RenderMathMLSpace is a simple class inserting blank boxes for adjusting spacing of math formulas. Obviously, we do not need any of the complexity of flexbox so it was straightforward to write the layout functions.

3x Large space between 3 and x.

Grouping

RenderMathMLRow performs rendering of a row of math items. Since WebKit does not support linebreaking in MathML at the moment, this is just putting child boxes on a same baseline. One specificity is that some operators can be stretched vertically and so their width may depend on their height.

{ 2x x 3 Row containing a stretched brace, a fraction and a scripted element.

Again, flexbox features are useless here. With the old code, it was not clear whether we were violating the CSS invariant with preferred and logical widths and which kind of relayout or render tree changes would happen when doing the stretch call. By properly implementing the layout functions previously mentioned all of this became much more trustable.

Fractions

RenderMathMLFraction draws a fraction with numerator and denominator.

x+1y+2 Simple fraction.

This used to be implemented using a column direction for the fraction element. Numerator and denominator were wrapped into anonymous nodes with additional style to leave space for the fraction bar and to adjust the horizontal alignments.

RenderMathMLFraction (flexbox with column direction)
  RenderMathMLBlock (anonymous flexbox)
    RenderMathMLRow (numerator)
      ...
  RenderMathMLBlock (anonymous flexbox)
    RenderMathMLRow (denominator)
      ...

It was relatively easy to implement this without any anonymous nodes and again the use of flexbox did not sound justified. For example, to calculate the preferred width we just take the maximum of the preferred widths of the numerator and denominator. For the layout, the calculation of the logical width is similar and we calculate the horizontal coordinates of numerator and denominator so that their centers are aligned. Vertical metrics are similarly calculated from the vertical metrics of the numerator and denominator. During that step, we also fixed some bugs with the linethickness attribute and added support for some OpenType MATH table constants.

Scripts above and below

RenderMathMLUnderOver is used to attach some scripts above and below a base. Each child can itself be a horizontal stretchy operator.

base→ Base with stretchy arrow over it.

This was implemented in the user agent stylesheet by using flexboxes with column direction for the corresponding MathML elements and the C++ class had additional rules to fire the stretching. So the problems and solutions for this class were essentially a mixed of the cases of RenderMathMLFraction and RenderMathMLRow we just discussed.

Subscripts and Superscripts

RenderMathMLScripts is used for a base with some arbitrary number of scripts. All the scripts can have different positions (pre, post, sub, super) and metrics (width, ascent and descent). We must avoid collisions and take care of horizontal and vertical alignements.

base a b c d e f Base with pre and post scripts.

The old code used a complex render tree with additional style to achieve the best possible result. However, the quality was still bad as you can see for the script attached to the integral in the screenshot above. Managing the render tree was a nightmare: Just to give the idea, additional anonymous node and style were used to allow horizontal and vertical adjustments (similar to RenderMathMLFraction above) and prescripts had negative order property so that they were positioned before the base.

RenderMathMLScripts
    Base Wrapper (anonymous flexbox)
      RenderMathMLRow (base)
        ...
    SubSupPair Wrapper (anonymous flexbox with column direction)
      RenderMathMLRow (post-subscript)
        ...
      RenderMathMLRow (subperscript)
        ...
    SubSupPair Wrapper (anonymous flexbox with column direction)
      RenderMathMLRow (post-subscript)
        ...
      RenderMathMLRow (post-superscript)
        ...
    ... (more postscripts)
    RenderMathMLBlock (prescripts separator)
    SubSupPair Wrapper (anonymous flexbox with column direction and order -1)
      RenderMathMLRow (pre-subscript)
        ...
      RenderMathMLRow (pre-subperscript)
        ...
    SubSupPair Wrapper (anonymous flexbox with column direction and order -1)
      RenderMathMLRow (pre-subscript)
        ...
      RenderMathMLRow (pre-superscript)
        ...
    ... (more prescripts)

Rules from TeX and the OpenType MATH table are relatively complex and we decided to implement them directly in the new refactoring as otherwise it was impossible to get decent quality. The code is still complex but we now have clear rules, we only perform simple calculations and the render tree structure matches the DOM tree.

“Enclosing” Notations

RenderMathMLMenclose is a row of math items with some additional notations. Gurpreet Kaur implemented this element two years ago but she followed the same approch, combining anonymous nodes and style for some simple notations and special painting for others.

x+1 circle and strike notations

During the refactoring, the code has been completely rewritten so that RenderMathMLMenclose is now essentially a derived class of RenderMathMLRow with the measuring and painting functions adjusted to take into account the additional notations. During that refactoring, we also removed support for unused radical notation, which was implemented using an anonymous RenderMathMLSquareRoot (see Radicals section below).

Helper Classes for Operators

The RenderMathMLOperator class is used for math operators. It was quite complex class and we decided to extract from it two features that are unrelated to layout:

The remaining code was indeed the real layout part but the mess with anonymous node and style was only removed later (see Text Classes below). Although it seems we just needed to move the code out of RenderMathMLOperator into those new classes, the case of MathOperator was particularly difficult. We had to split the effort into several small steps to make review possible and also fixed many issues due to the entanglement and confusion of these three different features of the RenderMathMLOperator class… The work done for MathOperator actually improved the rendering of stretchy operators as you can see for the horizontal braces in the screenshot above.

Radicals

RenderMathMLRoot is used for square root or arbitrary N-th root. Many of the TeX and OpenType MATH table rules were already used by the old implementation with anonymous nodes and style. However, there were bugs difficult to fix related to zooming, child removal or style change due to the management of the anonymous RenderMathMLOperator to draw the radical sign.

x+1 + x+1 3 square and cube roots

The old implementation actually had two classes for the square and general cases (RenderMathMLSquareRoot and RenderMathMLRoot). The usual technique with various anonymous wrappers and style was used. After the refactoring, we were able to merge everything in a single RenderMathMLRoot class. Because the square root behaves as an mrow, we also made that class derive from RenderMathMLRow to reuse as much code as possible. Here is are how the render trees used to look like:

RenderMathMLSquareRoot
  RenderMathMLBlock (anonymous used for metric adjustements)
    RenderMathMLRadicalOperator (anonymous used for the radical symbol)
      ...
  RenderMathMLRootWrapper (anonymous used for the children)
    RenderMathMLRow (child 1)
      ...
    RenderMathMLRow (child 2)
      ...
    ...
    RenderMathMLRow (child N)
      ...

RenderMathMLRoot
  RenderMathMLRootWrapper (anonymous for the index)
    ...
  RenderMathMLBlock (anonymous used for metric adjustements)
     RenderMathMLRadicalOperator (anonymous used for the radical symbol)
       ...
  RenderMathMLRootWrapper (anonymous for the base)
    ...

Again, we rewrote the implementation using only simple box positioning. The difficult part was to get rid of the anonymous RenderMathMLRadicalOperator to draw the radical symbol. This class was derived from RenderMathMLOperator and extended it with some fallback drawing when math fonts were not available. After having extracted stretchy operator shaping from RenderMathMLOperator it became possible to use the MathOperator helper class to draw the radical symbol. We implemented the fallback for missing math fonts the same as Gecko: Use a scale transform to stretch the base glyph for U+221A SQUARE ROOT. As a bonus, we used such transform to implement glyph mirroring, as required to draw right-to-left radicals in some Arabic mathematical notations.

Text Classes

These classes are containers for math text such as variables or operators. There is a generic RenderMathMLToken class and a derived class RenderMathMLOperator adding features specific to operators such as spacing, dictionary property, stretching… Anonymous wrappers and style were used to implement automatic italic mathvariant or operator spacing. The RenderText child of RenderMathMLOperator was (re)built as an anonymous text node so that is was possible to convert U+002D HYPHEN-MINUS into U+2212 MINUS SIGN or to provide some text for anonymous operators created by RenderMathMLFenced (see Unchanged Classes section).

RenderMathMLToken (e.g. mi element)
  RenderMathMLBlock (anonymous flexbox used to apply CSS italic)
    RenderBlock (anonymous created elsewhere to honor CSS rules)
      RenderText
        text run "x"

RenderMathMLOperator (mo element)
   RenderMathMLBlock (anonymous flexbox used for spacing)
    RenderBlock (anonymous created elsewhere to honor CSS rules)
      RenderText (anonymous destroyed and built again)
         text run "−"

We did a big refactoring to remove all the anonymous nodes created by the MathML renderer classes. Just like for MathOperator, we had to be careful and submit various small pieces as the text rendering was quite sensible to code change.

The simplified operator spacing that was supported by WebKit was easy to implement with the new approach. To do automatic italic mathvariant, we modified the paint function to use Mathematical Alphanumeric Symbols instead of CSS italic as you can notice for the variables displayed in the screenshot above. Hence we could remove the RenderMathMLBlock anonymous wrapper.

The use of an anonymous node for the text prevented it to appear in the dumped render tree of layout tests and also required some hacks in the accessibility code to expose that text. In order to address the cases of the minus sign and of mfenced operators, we decided to use our new MathOperator class again. Indeed MathOperator is actually also able to draw unstretched operators made of a single character and this works for the minus sign and for mfenced operators used in practice.

Unchanged Classes

Two classes have not been modified but such modifications were not needed to remove the dependency on RenderFlexibleBox:

  • RenderMathMLFenced is used for an mrow-like element that is defined in the MathML specification as strictly equivalent to constructions with rows and operators. It is implemented as a derived class of RenderMathMLRow and creates anonymous RenderMathMLOperators. This is the only remaining class that modifies the render tree structure. Note that prominent MathML websites and generators do not use the mfenced element, so it is not a big concern.

  • RenderMathMLTable is used for table layout. It is just derived from RenderTable, not RenderFlexibleBox. We did not change anything for now but we considered creating our own implementation in order to make our code independent from HTML table, to support MathML-specific table features and to make it better integrated with the rest of the MathML code.

Accessibility

Even if our main focus was on rendering, the changes we made also had impact on the MathML accessibility code. Indeed, the accessibility tree is generated from the MathML renderer classes: Since we changed the latter during the refactoring, we also had to adjust the accessibility code. Fortunately, we are lucky to have Joanmarie Diggs in our team and she was able to provide some help here.

First, the accessibility code exposes the linethickness of fractions to implement Apple’s AXMathLineThickness attribute. In practice, this is really necessary to know whether the linethickness is null or not (e.g. binomial coefficient VS the Legendre symbol). Apple’s unit test seemed to expose the ratio between the actual thickness and the default thickness but the accessibility code really just reads the actual thickness calculated by RenderMathMLFraction. Our fix and improvement for linethickness made the Apple’s unit test fail so we had to adjust RenderMathMLFraction to expose the value expected by that test.

In general, the accessibility code does not care about anonymous nodes created for layout purpose and there was some code to avoid exposing them in the accessibility tree. So removing all the anonymous during the layout refactoring was actually a good and safe thing to do. There were some helper functions to implement Apple’s AXMathRootRadicand and AXMathRootIndex attributes that had to be adjusted, though. These functions used to do some work to skip the anonymous wrappers and we were actually able to simplify them.

There was also some specific code for the RenderMathMLOperators and their anonymous RenderText that were necessary to expose the text content. Actually, there was an old bug in the accessibility code and the anonymous RenderMathMLOperators created by mfenced were not correctly exposed. The unit test we had for mfenced operators was only checking the text content but it was still passing and so the regression had never been detected before. After the layout refactoring we removed the anonymous RenderText of mfenced operators and so broke that test… We thus spent some time to fix the RenderMathMLOperator code. Essentially, we removed all the old hacks and only left a specific handling for mfenced operators. We also used this opportunity to improve and extend our MathML accessibility tests.

Finally, the MathML accessibility code was directly implemented into a generic AccessibilityRenderObject class. There was some functions to access math nodes and properties but also specific cases scattered all over the code (anonymous boxes, mfenced operators, math roles etc). In order to facilitate future work and maintenance we decided to move all the MathML code into a new AccessibilityMathMLElement class. Hence the work implied by the layout refactoring actually encouraged us to improve the organization and testing of our accessibility code!

Conclusion

In the past four months, Igalia’s web platform team has successfully upstreamed the refactoring of WebKit’s MathML renderer classes and we are now very confident about the quality of the layout code. In addition to the people mentioned above I would personally like to thank everybody who helped with this work. More specifically, I am very grateful to other people from Igalia (Martin Robinson, Sergio Villar and Manuel Rego) or Apple (Brent Fulgham and Darin Adler) who have spent some time to review patches. As a nice side effect of this work, mathematical formulas look better and the accessibility code has been improved. More is happenning in the next two phases. We are looking forward to continuing implementation of Web standards and collaboration with browser vendors at the next Web Engines Hackfest!

July 08, 2016 10:00 PM

July 01, 2016

Víctor Jáquez

VA-API and DRM/KMS in MinnowBoard

In 2012 I started to work on a video renderer for GStreamer which uses directly the DRM/KMS kernel subsystem to display images. I even blogged about it, but I didn’t finished it.

Nonetheless, in December last year, a customer asked us to finish the element, but this time focusing on i.MX6, rather than OMAP4. Consequently, early this year, kmssink got merged into upstream.

The video sink has some nice features, such as DMA-buf importing through the PRIME kernel interface. This feature makes possible a zero-copy path when the video decoder delivers dmabuf based frames. For this particular project, the kernel we used supports the CODA media subsystem, and through the GStreamer element v4l2videodec we could link pipelines negotiating dmabuf sharing, and thus we played videos very efficiently.

During the last GStreamer Hackfest, Nicolas Dufresne got working the MFC media subsystem, for Exynos, with v4l2videodec and kmssink, but I don’t remember if he also got the dmabuf sharing working.

All in all, kmssink had only been tested in a couple of ARM devices, so I wonder, as a gstremer-vaapi co-maintainer, if it also works in x86 devices.

Some colleagues at Igalia, use a Minnowboard to test WebKit4Wayland, so I borrowed it, installed a minimal Fedora 23 in it, and, surprisingly, I found out that it has a nice VA-API support:

   $ vainfo
   error: can't connect to X server!
   libva info: VA-API version 0.38.1
   libva info: va_getDriverName() returns 0
   libva info: Trying to open /usr/lib64/dri/i965_drv_video.so
   libva info: Found init function __vaDriverInit_0_38
   libva info: va_openDriver() returns 0
   vainfo: VA-API version: 0.38 (libva 1.6.2)
   vainfo: Driver version: Intel i965 driver for Intel(R) Bay Trail - 1.6.2
   vainfo: Supported profile and entrypoints
         VAProfileMPEG2Simple            : VAEntrypointVLD
         VAProfileMPEG2Simple            : VAEntrypointEncSlice
         VAProfileMPEG2Main              : VAEntrypointVLD
         VAProfileMPEG2Main              : VAEntrypointEncSlice
         VAProfileH264ConstrainedBaseline: VAEntrypointVLD
         VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
         VAProfileH264Main               : VAEntrypointVLD
         VAProfileH264Main               : VAEntrypointEncSlice
         VAProfileH264High               : VAEntrypointVLD
         VAProfileH264High               : VAEntrypointEncSlice
         VAProfileH264StereoHigh         : VAEntrypointVLD
         VAProfileVC1Simple              : VAEntrypointVLD
         VAProfileVC1Main                : VAEntrypointVLD
         VAProfileVC1Advanced            : VAEntrypointVLD
         VAProfileNone                   : VAEntrypointVideoProc
         VAProfileJPEGBaseline           : VAEntrypointVLD


Afterwards, I compiled GStreamer using gst-uninstalled in order to have kmssink and the master branch of gstreamer-vaapi. And got it! I had hardware accelerated decoding with VA-API, and video rendering using KMS/DRM. But, sadly, no zero copy, since, in order to have dmabuf sharing, we have to finish and to merge bug 755072 in gstreamer-vaapi.

Running the typical Big Bunny video (1080p, H.264) the playback consumes less than the 50% of the CPU usage, compared with the 100% of CPU usage with software decoders (and a lot of frames dropped).

I recorded a small video with the experiment as a proof.

Update: Javer Martinez told me in twitter, that MFC can do dma-buf export since uses vb2 and Exynos DRM can import, but he couldn’t get zero copy to work due missing format support.

by vjaquez at July 01, 2016 05:00 PM

June 21, 2016

Xavier Castaño

We’re proud to sponsor #Linuxcon North America and to celebrate the 25th anniversary of #Linux and #opensource!

LinuxCon North America and ContainerCon features 175+ sessions for developers, operators, users and other open source professionals with a range of content covering Linux, Containers, Cloud and much more!

Join us in Toronto, Canada where you’ll experience:

  1. Bonus Content Day featuring Docker 101 lab and tutorials
  2. Co-located events such as CloudNativeDay, KVM Forum, Linux Security Summit, Open Source Storage Summit, and Xen Project Developer Summit
  3. 25th Anniversary of Linux Casino Royale Gala on Wednesday, August 24th

Igalia will have a booth there so don’t miss your chance to attend and visit us!

by xavier castano at June 21, 2016 09:54 AM

June 20, 2016

Javier Muñoz

Ansible AWS S3 core module now supports Ceph RGW S3

The Ansible AWS S3 core module now supports Ceph RGW S3. The patch was upstream today and it will be included in Ansible 2.2

This post will introduce the new RGW S3 support in Ansible together with the required bits to run Ansible playbooks handling S3 use cases in Ceph Jewel

.

The Ansible project

Ansible is a simple IT automation engine that automates cloud provisioning, configuration management, application deployment and intra-service orchestration among many other IT needs.

Ansible works by connecting to nodes (SSH/WinRM) and pushing out small programs, called 'Ansible modules' to them. These programs are written to be resource models of the desired state of the system. Ansible then executes these modules and removes them when finished.

Ansible uses playbooks to orchestrate the infrastructure with very detailed control. Those playbooks define configuration policies and orchestration workflows. They are a YAML definition of automation tasks that describe how a particular piece of automation should be done.

Playbooks are modeled as a collection of plays, each of which defines a set of tasks to be executed on a group of remote hosts. A play also defines the environment where the tasks will be executed.

Ansible modules ensure indempotence so it is possible running the same tasks over and over without affecting the final result.

Using the Ansible AWS S3 core module with Ceph

The Ceph RGW S3 support is part of the Amazon S3 core module in Ansible. The AWS S3 core module allows the user to manage S3 buckets and the objects within them. It includes support for creating and deleting both objects and buckets, retrieving objects as files or strings and generating download links.

The patch leverages the AWS S3 use cases without any restriction or limitation with regions, URLs, etc.

To enable the RGW S3 flavour in the S3 core module you set the 'rgw' boolean option to 'true' and the 's3_url' string option to the RGW S3 server.

controller:~$ ansible rgw.test.node -m s3 -a \
"mode=list bucket=my-bucket s3_url=http://rgw.test.server:8000 rgw=true"

The 's3_url' option is mandatory in the RGW S3 flavour.

Testing the RGW S3 core module flavour via Playbook

This Playbook example tests the RGW S3 flavour in a simple three boxes network ('controller', 'rgw.test.node' and 'rgw.test.server')

The 'controller' host is the box running the Ansible engine.

The 'rgw.test.node' is the host connecting to the Ceph RGW server and running the S3 use cases.

The 'rgw.test.server' is the Ceph RGW S3 server.

Running the testing Playbook...

controller:~$ ansible-playbook playbook-test-rgw-s3-core-module.yml

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [rgw.test.node]

TASK [create test_file test-file.txt] ******************************************
changed: [rgw.test.node]

TASK [put s3_bucket_items my-test-object-1] ************************************
changed: [rgw.test.node] => (item=my-test-object-1.txt)
changed: [rgw.test.node] => (item=my-test-object-2.txt)

TASK [get s3_bucket_items] *****************************************************
ok: [rgw.test.node]

TASK [download s3_bucket_items] ************************************************
changed: [rgw.test.node] => (item=my-test-object-1.txt)
changed: [rgw.test.node] => (item=my-test-object-2.txt)

TASK [remove s3_bucket_and_content] ********************************************
changed: [rgw.test.node]

PLAY RECAP *********************************************************************
rgw.test.node              : ok=6    changed=4    unreachable=0    failed=0

As expected, the Playbook runs the 'create', 'upload', 'list', 'download' and 'remove' S3 use cases for buckets and objects. Adding the '-v' switch will show a more verbose output.

Wrap-up

All examples, modules and playbooks related to RGW S3 were tested on the new Ceph Jewel release.

Beyond of the RGW S3 support in the AWS S3 core module you could be interested in Ansible Playbooks to set up and configure Ceph clusters automatically. You can find those Playbooks in ceph/ceph-ansible with general support for Monitors, OSDs, MDSs and RGW.

The primary documentation for Ansible is available here. I found the Ansible whitepapers a great resource too.

Acknowledgments

My work in Ansible is sponsored by Outscale and has been made possible by Igalia and the invaluable help of the Ansible community. Thank you all!

by Javier at June 20, 2016 10:00 PM

Manuel Rego

My BlinkOn 6 Summary: Grid Layout, Houdini &amp; MathML

Igalia could not miss the chance to participate in a new European edition of BlinkOn. So past week my colleague Dape and I were attending BlinkOn 6 in Munich. In this post I’d do a personal summary about the main conference highlights.

Status of CSS Grid Layout implementation

During the conference I gave a talk about the status of Grid Layout on Blink. I went over the whole spec checking the things that are DONE, WIP or TODO. The summary is that on top of the things we’re already working on, and that will be landing very soon (auto-fit repeat, orthogonal flows, normal value in align|justify-items properties, etc.), we’ve just a few tasks pending (most of them changes on the spec in the last 2 months).

The main features pending are:

  • Fragmentation: This is an issue on the whole project and not only related to Grid Layout. For example, Blink doesn’t have proper fragmentation support on Flexbox either, so probably it’s not a blocker in order to ship Grid Layout. This will affect you when you try to print a grid and the rows are cut in the middle, actually they should have been moved completely to the next page. Of course, it’d be really nice to have it fixed and working as expected.
  • Subgrids: This is a hot topic and the opinions vary a lot depending on whom you’re asking. For some people, we should ship without subgrids support and add it later (as the change won’t be breaking any content); other believe that we shouldn’t ship until subgrids are implemented. I think it’s still needed more discussion between the CSS Working Group and the different browser vendors to reach an agreement on this issue.

You can find the slides of my talk on this blog already (notice that you’d need Chrome Canary with the experimental flag enabled to see them properly). Sadly the talks outside the main room were not recorded, so there won’t be any video of mine.

Igalia and Bloomberg working together to build a better web Igalia and Bloomberg working together to build a better web

From the different conversations I had, everyone seems really happy about the work we’re doing on Grid Layout, so thank you very much for the kind words. 😉 As you probably already know Igalia work on this feature has been sponsored by Bloomgerg, big kudos to them!

CSS Houdini

Houdini had a big presence on the conference, it seems clear that Google is really pushing for this to happen. It was nice to see the current status of some APIs that are already working in an experimental way like Typed OM and Painting API. Also, it’s great to check that other browsers are already taking some initial steps too, like Firefox with the Properties and Values API.

Userspace vector graphics using Houdini & Custom Elements by Shane Stephens Userspace vector graphics using Houdini & Custom Elements by Shane Stephens

Apart from the introductory talk on the main room there were 2 more sessions:

MathML in Chromium?

Just in case you don’t know, Igalia has been lately working on the refactoring of the MathML implementation in WebKit. The rationale was that the previous code was a pain to maintain and improve (e.g. it depended a lot on Flexbox making it really hard to implement new features), actually this was one of the reasons why it was removed from Blink after the fork. The refactoring is still ongoing but several patches have already landed.

As the new code (after the refactoring) was looking much better in WebKit my colleague Fred Wang started to port it to Blink on a GitHub repository. The initial results are looking pretty good, but all the code from WebKit hasn’t been ported yet.

MathML mentioned on the Houdini session MathML mentioned on the Houdini session

Thus, I took advantage of the conference to talk about this work with several people in order to check the feasibility of bringing MathML back to Chromium. After some discussions it seems clear that Google would be interested in having MathML support, however at this moment they seem more inclined to do it through the new Houdini APIs like CSS Layout and Fonts Metrics that are still being worked on. MathML seems to be a nice use case to verify that they work as expected.

If they follow this approach, it probably means that we won’t see MathML supported on Blink in the short-term, as we’d need to wait for those APIs to be ready. In any case, Igalia will keep an eye on all this stuff looking for a good opportunity to make it a reality.

Summary

As usual for me the most important part of these conferences is having the chance to meet some new faces and old friends too. It’s really nice to have the chance to spend a while talking face to face with people that have been interacting with you on the Internet for a long time.

Regarding Grid Layout I hope that the different parties can find a good way to bring it to the web authors in the 3 major browser (Chrome, Safari and Firefox) almost synchronously. I cannot say if it’s going to happen this year or the next one, but I’m sure it’s going to be something really big for the Web!

All the CSS Houdini stuff sounds really cool but a long-term thing at this moment. Let’s see how long we need to wait until we can use these new APIs for real.

Munich's Town Hall from from the Tower of St. Peter's Church Munich’s Town Hall from from the Tower of St. Peter’s Church

Finally, it was nice to visit Munich and to have some time for a walk around the city. We couldn’t find any single hill around the city center, next time it might be a good idea to rent a bike.

June 20, 2016 10:00 PM

June 14, 2016

Antonio Gomes

Understanding Chromium’s runtime ozone platform selection

Requisites

For the context of this post, it is assumed a ‘content_shell’ ChromeOS GN build produced with the following commands:

$ gn gen --args='target_os="chromeos"
                 use_ozone=true
                 ozone_platform_wayland=true
                 use_wayland_egl=false
                 ozone_platform_x11=true
                 ozone_platform_headless=true
                 ozone_auto_platforms=false' out-gn-ozone

$ ninja -C out-gn-ozone blink_tests

Ozone at runtime

Looking at the build arguments above, one can foresee that the content_shell app will be able to run on the following Graphics backends: Wayland (no EGL), X11, and “headless” – and it indeed is. This happens thanks to Chromium’s Graphics layer abstraction, Ozone.

So, in order to run content_shell app on Ozone platform “bleh”, one simply does:

$ out-gn-ozone/content_shell --ozone-platform="bleh"

Simple no? Well, yes .. and not as much.

The way the desired Ozone platform classes/objects are instantiated is interesting, involving c++ templates, GYP/GN hooks, python generated code, and more. This post aims to detail the process some more.

Ozone platform selection logic

Two methods kick off OzonePlatform instantiation: ::InitializeForUI and ::InitializeForGPU . They both call ::CreateInstance(), which is our starting point.

This is how simple it looks:

63 void OzonePlatform::CreateInstance() {
64     if (!instance_) {
(..)
69         std::unique_ptr<OzonePlatform> platform =
70             PlatformObject<OzonePlatform>::Create();
71
72         // TODO(spang): Currently need to leak this object.
73         OzonePlatform* pl = platform.release();
74         DCHECK_EQ(instance_, pl);
75     }
76 }

Essentially, when PlatformObject<T>::Create is ran (lines 69 and 70), it ends up calling a method named Create{T}Bleh, where

  • “T” is the template argument name, e.g. “OzonePlatform”.
  • “bleh” is the value passed to –ozone-platform command line parameter.

For instance, in the case of ./content_shell –ozone-platform=x11, the method called would CreateOzonePlatformX11, following the pattern Create{T}Bleh (i.e. “Create”+”OzonePlatform”+”X11”).

The actual logic

In order to understand how PlatformObject class works, lets start by looking at its definition (ui/ozone/platform_object.h & platform_internal_object.h):

template <class T> class PlatformObject {
 public:
  static std::unique_ptr<T> Create();
};
16 template <class T>
17 std::unique_ptr<T> PlatformObject<T>::Create() {
18     typedef typename PlatformConstructorList<T>::Constructor Constructor;
19
20     // Determine selected platform (from --ozone-platform flag, or default).
21     int platform = GetOzonePlatformId();
22
23     // Look up the constructor in the constructor list.
24     Constructor constructor = PlatformConstructorList<T>::kConstructors[platform];
26     // Call the constructor.
27     return base::WrapUnique(constructor());
28 }

In line 24 (highlighted above), the ozone platform runtime selection machinery actually happens. It retrieves a Constructor, which is a typedef for a PlatformConstructorList<T>::Constructor.

By looking at the definition of PlatformConstructorList class (below), Constructor is actually a pointer to a function that returns a T*.

14 template <class T>
15 struct PlatformConstructorList {
16     typedef T* (*Constructor)();
17     static const Constructor kConstructors[kPlatformCount];
18 };

Ok, so basically here is what we know this far:

  1. OzonePlatform::CreateInstance method calls OzonePlatform<bleh>::Create
  2. OzonePlatform<bleh>::Create picks up an index and retrieves a PlatformConstructorList<bleh>::Constructor (via kConstructor[index])
  3. PlatformConstructorList<bleh>::Constructor is a typedef to a function pointer that returns a bleh*.
  4. (..)
  5. This chain ends up calling Create{bleh}{ozone_platform}()

But wait! kConstructors, the array of pointers to functions – that solves the puzzle – is not defined anywhere in src/! 🙂

This is because its actual definition is part of some generated code triggered by specific GN/GYP hooks. They are:

  • generate_ozone_platform_list which generates out/../platform_list.cc,h,txt
  • generate_constructor_list which generates out/../constructor_list.cc though generate_constructor_list.py

In the end out/../generate_constructor_list.cc has the definition of kConstructors.

Again, in the specific case of the given GN build arguments, kConstructors would look like:

template <> const OzonePlatformConstructor
PlatformConstructorList<ui::OzonePlatform>::kConstructors[] = {
    &ui::CreateOzonePlatformHeadless,
    &ui::CreateOzonePlatformWayland,
    &ui::CreateOzonePlatformX11,
};

Logic wrap up

  • GYP/GN hooks are ran at build time, and generate plaform_list.{txt,c,h} as well as constructor_list.cc files respecting ozone_platform_bleh parameters.
    • constructor_list.cc has PlatformConstructorList<bleh>kConstructors actually populated.
  • ./content_shell –ozone-platform=bleh is called
  • OzonePlatform::InitializeFor{UI,Gpu}()
  • OzonePlatform::CreateInstance()
  • PlaformObject<OzonePlatformBleh>::Create()
  • PlatformConstructorList<bleh>::Constructor is retrieved – it is a pointer to a function stored in PlatformConstructorList<bleh>::kConstructor
  • function is ran and an OzonePlatformBleh instance is returned.

by agomes at June 14, 2016 04:28 AM

June 02, 2016

Alejandro Piñeiro

Introducing Mesa intermediate representations on Intel drivers with a practical example

Introduction

The recent big news on the Igalia work on Mesa was that our effort getting the ARB_gpu_shader_fp64 and ARB_vertex_attrib_64bit extensions implemented for Intel Gen8+, allowed to expose OpenGL 4.2 for Gen8+. But I will let other igalians to talk in details about them (no pressures ;)).

In a previous blog post I mentioned that NIR was intended to replace GLSL IR. Although that was true on the context I was talking about, that comment could be somewhat misleading, so I will try to clarify it.

Intermediate representations on Intel Mesa drivers

So first, let’s list the intermediate representations that you would find when working on Mesa Intel drivers:

  • AST (Abstract Syntax Tree): calling it an Intermediate Language is somewhat an abuse of language. This is the tree representation of your GLSL shader just after parsing it with Flex/Bison.
  • Mesa IR (Intermediate Representation): also called HIR and GLSL IR. A real Intermediate Language. It is converted from AST. Here you have optimizations, link support, etc.
  • NIR (New Intermediate Representation): a new Intermediate Language added recently.

So the first questions would be, why three? Having AST and another intermediate representation is easier to explain. AST is a raw tree representation, not useful to generate code. But why IR and NIR?

Current Mesa IR was created some years ago. The design decisions behind it had their advantages. But it has also some disadvantages. For example, their tree-like structure make it complex to traverse, making difficult to navigate, implement optimizations, etc. Other disadvantage was that it was not on SSA form, making again some optimizations hard to write. You can see a summary of all this on Ian Romanick’s presentation “Three Years Experience with a Tree-like Shader IR“.

So at some point an effort was started to “flatenize” Mesa IR and adding SSA support. But the conclusion was that the effort to modify Mesa IR was so big, that it was worth to just start from scratch using the learned lessons, as explained on Connor Abbot’s email “A new IR for Mesa“, in which he proposed this new IR.

Some time later, NIR was ready for production, and as I mentioned on my blog post (that one I’m clarifying right now), some parts of Mesa Intel driver was reimplemented in order to use NIR instead of Mesa IR. So Mesa IR was being replaced there. Where exactly? The parts where the final assembly code was being generated. And now that that is finished (at least on the i965 driver), we can say that Mesa IR is not used to generate code at all.

So right now there are an AST->Mesa IR->NIR chain. What is the plan now? Generate an AST->NIR pass and completely remove Mesa IR? This same question was asked (among other things) on January 2016, on the mesa-dev email “Nir, SCons, and Gallium“.  And the answer was “no”, as you can see on two Ian Romanick’s replies (here and here). The summary is that Mesa IR has several GLSL specifics that aren’t appropriate for NIR’s level. In that sense, NIR is a step below Mesa IR, more near to the GPU needs. It is also worth to mention that since then, Vulkan support was added to Mesa. In order to support Spir-V (Vulkan’s shader language, that is an intermediate representation itself), a SPIRV->NIR pass was created. In that sense, for OpenGL there is an OpenGL-specific intermediate representation, that is Mesa IR, and for Vulkan there is a Vulkan-specific intermediate representation, that is Spirv, and both are translated to the same common representation, NIR.

Practical example:

So, what means that in the practice? Do I need to deal with all those intermediate representations? Well, as anything in life, that would depend. If for example, you want to provide the support for a new GLSL feature or a specific hw, you would need to touch all three. You would need to modify the flex/bison files, so AST would need to be updated, and then Mesa IR, NIR, and the passes that transform one to the other. But if you want to give support for an GLSL feature that is already supported, but on new hw, the most likely is that you will not need changes on any of them, but just on the code that generate the final assembly using NIR.

And what would happen to other features like warnings and errors? Right now most of them are detected at the AST level, and some at the IR level. NIR doesn’t trigger any error/warning yet. It contains several asserts, but basically because it assumes that at that moment the representation of the shader should be correct, so if you find something wrong, means that the developer working on NIR is doing something wrong, in opposite to the developer that wrote the GLSL shader.

So lets go for a practical example I was working on: uninitialized variable warnings (“undefined values” on the bug tracking the issue). In short, this is about warn the developer about things like this:

out color;

void main()

{
 vec4 myTemp;

 color = myTemp;
}

What would be the color on screen? Who knows. So although is a feature you can live without, it is a good nice-to-have.

On the original bug, it was mentioned that this kind of errors are easy to detect, as NIR has a type ssa_undefs, so we just need to check if they are used. And in fact, when I started to work on it, I quickly find how to raise a warning. For example, on the method nir_print.c:print_ssa_def, used to debug, it is easy to modify it in order to point that it is using a undef. But trying to raise the warning there have some problems:

  • As mentioned NIR doesn’t have any warning/error triggering mechanism implemented, you would need to add them.
  • You want to include this warning on the OpenGL InfoLog, but as mentioned NIR is used for both OpenGL and Vulkan, and right now it doesn’t maintains any info about the origin.
  • And perhaps more important, at that point you lack the context data that points which source code line you are working on.

In fact, the last bullet point also applies to Mesa IR. There are some warnings raised at the Mesa IR level, but are “line-less”. Not sure any other developer, but for me, this kind of warning without a reference to the source line number would be annoying to use. Does that mean that the only option would be the totally raw AST tree?

Fortunately, Mesa IR was already saving if a variable was being statically assigned or not (to check some other possible errors). This was being computed on the AST to Mesa IR pass, and in fact the documentation mentions that this value is only valid at that moment. We would be on the middle of AST and Mesa IR. So when to raise the warning? The straightforward solution would be when a variable is , just before/after the error “variableX undeclared” is raised. But that is not so easy. For example:

float myFloat1;
float myFloat2;

myFloat1 = myFloat2;

How many warnings should we raise? Just one, for myFloat2. But technically we are also using myFloat1, and it is uninitialized. So we need to differentiate both cases. Being AST so raw, at that point we don’t have that information, and in fact it is also impossible to go up to the parent expression in order to compute that information. So it was needed to add an attribute on the AST node, that I called is_lhs (as “is left hand side”). That variable would be set when parent expressions are being transformed.

If you are taking attention, probably you start to see what would be the collateral effect of this. Being AST so raw, and OpenGL specific, there would be several corner cases needed to be manually assigned. In fact the  first commit of the series is already covering several corner cases. And in spite of this, once the code reached master, there were two cases of false positives that needed extra checks (for builtin-variables and for inout/out function parameters)

After those two false positives, managing this warning was spread all along the code that made the AST to Mesa IR pass, so seemed easy to broke. So I decided to send too some unit tests to verify that it gets working. First I sent the make check test that tested that warning, and then the unit tests. 30 unit tests (it was initially 28, but reviewer asked two more). Not a bad number for a warning.

Final words

At this point, one would wonder if it still makes sense to have this warning on the AST to Mesa IR pass, and if it would have it better to do it as initially proposed, on NIR. But although it is true that “just detecting it” would be easier on NIR, without dealing with so many corner cases, I still think that adding the support for raising warning/errors compatible with OpenGL Infolog, and bringing somehow the original source code line number, would mean too many changes on both Mesa IR and NIR. More changes that dealing with those corner cases when using the variable on the AST to Mesa IR pass. And in any case, if in the future the situation changes, and makes sense to move the warning to NIR, we would have the unit tests that would help to ensure that we don’t introduce regressions.

In relation to the intermediate representations, just to note that I’m focusing on the Intel driver. Gallium drivers use other intermediate representation, called TGSI. As far as I know, on those drivers, they have a AST->Mesa IR->TGSI chain, and right now there is a work in progress AST->Mesa IR->NIR->TGSI chain that will be used on some specific cases. But all this is beyond my knowledge, so you would need to investigate if you are interested.

Appendix, extra documentation:

If you want more details about MESA IR, you can read:

  • Read Mesa IR README
  • Read past blog posts from fellow igalian Iago Toral (when NIR was not available yet): post 1 and post 2

If you want extra information about Mesa NIR, you can read:

 

 

by infapi00 at June 02, 2016 05:47 PM

Xavier Castaño

Igalia sponsors LinuxCon and ContainerCon Japan 2016, the premier Linux Conference in Asia

Join us at LinuxCon + ContainerCon Japan! LinuxCon Japan is the premier Linux conference in Asia that brings together a unique blend of core developers, administrators, users, community managers and industry experts. It is designed not only to encourage collaboration but to support future interaction between Japan and other Asia Pacific countries and the rest of the global Linux community.

ContainerCon also comes to Japan for the first time this year, after a successful launch in North America in 2015. New developments in Linux containers are driving the adoption of cloud and virtualization technologies in many industries and ContainerCon Japan will provide a platform for exhibiting the best work.

My colleague Mi Sun Silvia Cho will be talking about WebKit For Wayland and we will show some nice demos about our Browsers, Graphics and Compilers experience in our booth in Tokyo next month.

by xavier castano at June 02, 2016 07:10 AM

May 26, 2016

Manuel Rego

CSS Grid Layout and positioned items

As part of the work done by Igalia in the CSS Grid Layout implementation on Chromium/Blink and Safari/WebKit, we’ve been implementing the support for positioned items. Yeah, absolute positioning inside a grid. 😅

Probably the first idea is that come to your mind is that you don’t want to use positioned grid items, but maybe in some use cases it can be needed. The idea of this post is to explain how they work inside a grid container as they have some particularities.

Actually there’s not such a big difference compared to regular grid items. When the grid container is the containing block of the positioned items (e.g. using position: relative; on the grid container) they’re placed almost the same than regular grid items. But, there’re a few differences:

  • Positioned items don't stretch by default.
  • They don't use the implicit grid. They don't create implicit tracks.
  • They don't occupy cells regarding auto-placement feature.
  • autohas a special meaning when referring lines.

Let’s explain with more detail each of these features.

Positioned items shrink to fit

We’re used to regular items that stretch by default to fill their area. However, that’s not the case for positioned items, similar to what a positioned regular block does, they shrink to fit.

This is pretty easy to get, but a simple example will make it crystal clear:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 200px; grid-template-rows: 200px 100px;">
    <div style="grid-column: 1 / 3; grid-row: 1 / 3;"></div>
    <div style="position: absolute; grid-column: 1 / 3; grid-row: 1 / 3;"></div>
</div>

In this example we’ve a simple 2x2 grid. Both the regular item and the positioned one are placed with the same rules taking the whole grid. This defines the area for those items, which takes the 1st & 2nd rows and 1st & 2nd columns.

Positioned items shrink to fit Positioned items shrink to fit

The regular item stretches by default both horizontally and vertically, so it takes the whole size of the grid area. However, the positioned item shrink to fit and adapts its size to the contents.

For the examples in the next points I’m ignoring this difference, as I want to show the area that each positioned item takes. To get the same result than in the pictures, you’d need to set 100% width and height on the positioned items.

Positioned items and implicit grid

Positioned items don’t participate in the layout of the grid, neither they affect how items are placed.

You can place a regular item outside the explicit grid, and the grid will create the required tracks to accommodate the item. However, in the case of positioned items, you cannot even refer to lines in the implicit grid, they'll be treated as auto. Which means that you cannot place a positioned item in the implicit grid. they cannot create implicit tracks as they don't participate in the layout of the grid.

Let’s use an example to understand this better:

<div style="display: grid; position: relative;
            grid-template-columns: 200px 100px; grid-template-rows: 100px 100px;
            grid-auto-columns: 100px; grid-auto-rows: 100px;
            width: 300px; height: 200px;">
    <div style="position: absolute; grid-area: 4 / 4;"></div>
</div>

The example defines a 2x2 grid, but the positioned item is using grid-area: 4 / 4; so it tries to goes to the 4th row and 4th column. However the positioned items cannot create those implicit tracks. So it’s positioned like if it has auto, which in this case will take the whole explicit grid. auto has a special meaning in positioned items, it’ll be properly explained later.

Positioned items do not create implicit tracks Positioned items do not create implicit tracks

Imagine another example where regular items create implicit tracks:

<div style="display: grid; position: relative;
            grid-template-columns: 200px 100px; grid-template-rows: 100px 100px;
            grid-auto-columns: 100px; grid-auto-rows: 100px;
            width: 500px; height: 400px;">
    <div style="grid-area: 1 / 4;"></div>
    <div style="grid-area: 4 / 1;"></div>
    <div style="position: absolute; grid-area: 4 / 4;"></div>
</div>

In this case, the regular items will be creating the implicit tracks, making a 4x4 grid in total. Now the positioned item can be placed on the 4th row and 4th column, even if those columns are on the explicit grid.

Positioned items can be placed on the implicit grid Positioned items can be placed on the implicit grid

As you can see this part of the post has been modified, thanks to @fantasai for notifying me about the mistake.

Positioned items and placement algorithm

Again the positioned items do not affect the position of other items, as they don’t participate in the placement algorithm.

So, if you’ve a positioned item and you’re using auto-placement for some regular items, it’s expected that the positioned one overlaps the other. The positioned items are completely ignored during auto-placement.

Just showing a simple example to show this behavior:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 300px; grid-template-rows: 200px 200px;">
    <div style="grid-column: auto; grid-row: auto;"></div>
    <div style="grid-column: auto; grid-row: auto;"></div>
    <div style="grid-column: auto; grid-row: auto;"></div>
    <div style="position: absolute;
                grid-column: 2 / 3; grid-row: 1 / 2;"></div>
</div>

Here we’ve again a 2x2 grid, with 3 auto-placed regular items, and 1 absolutely positioned item. As you can see the positioned item is placed on the 1st row and 2nd column, but there’s an auto-placed item in that cell too, which is below the positioned one. This shows that the grid container doesn’t care about positioned items and it just ignores them when it has to place regular items.

Positioned items and placement algorithm Positioned items and placement algorithm

If all the children were not positioned, the last one would be placed in the given position (1st row and 2nd column), and the rest of them (auto-placed) will take the other cells, without overlapping.

Positioned items and auto lines

This is probably the biggest difference compared to regular grid items. If you don’t specify a line, it’s considered that you’re using auto, but auto is not resolved as span 1 like in regular items. For positioned items auto is resolved to the padding edge.

The specification introduces the concepts of the lines 0 and -0, despite how weird it can sound, it actually makes sense. The auto lines would be referencing to those 0 and -0 lines, that represent the padding edges of the grid container.

Again let’s use a few examples to explain this:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 200px; grid-template-rows: 200px 200px;
            padding: 50px 100px; width: 500px; height: 400px;">
    <div style="position: absolute;
                grid-column: 1 / auto; grid-row: 2 / auto;"></div>
</div>

Here we have a 2x2 grid container, which has some padding. The positioned item will be placed in the 2nd row and 1st column, but its area will take up to the padding edges (as the end line is auto in both axis).

Positioned items and auto lines Positioned items and auto lines

We could even place positioned grid items on the padding itself. For example using “grid-column: auto / 1;” the item would be on the left padding.

Positioned items using auto lines to be placed on the left padding Positioned items using auto lines to be placed on the left padding

Of course if the grid is wider and we’ve some free space on the content box, the items will take that space too. For example:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 200px; grid-template-rows: 200px 200px;
            padding: 50px 100px; width: 600px; height: 400px;">
    <div style="position: absolute;
                grid-column: 3 / auto; grid-row: 2 / 3;"></div>
</div>

Here the grid columns are 500px, but the grid container has 600px width. This means that we’ve 100px of free space in the grid content box. As you can see in the example, that space will be also used when the positioned items extend up to the padding edges.

Positioned items taking free space and right padding Positioned items taking free space and right padding

Offsets

Of course you can use offsets to place your positioned items (left, right, top and bottom properties).

These offsets will apply inside the grid area defined for the positioned items, following the rules explained above.

Let’s use another example:

<div style="display: grid; position: relative;
            grid-template-columns: 300px 200px; grid-template-rows: 200px 200px;
            padding: 50px 100px; width: 500px; height: 400px;">
    <div style="position: absolute;
                grid-column: 1 / auto; grid-row: 2 / auto;
                left: 90px; right: 70px; top: 80px; bottom: 60px;"></div>
</div>

Again a 2x2 grid container with some padding. The positioned item have some offsets which are applied inside its grid area.

Positioned items and offets Positioned items and offets

Wrap-up

I’m not completely sure about how important is the support of positioned elements for web authors using Grid Layout. You’ll be the ones that have to tell if you really find use cases that need this. I hope this post helps to understand it better and make your minds about real-life scenarios where this might be useful.

The good news is that you can test this already in the most recent versions of some major browsers: Chrome Canary, Safari Technology Preview and Firefox. We hope that the 3 implementations are interoperable, but please let us know if you find any issue.

There’s one last thing missing: alignment support for positioned items. This hasn’t been implemented yet in any of the browsers, but the behavior will be pretty similar to the one you can already use with regular grid items. Hopefully, we’ll have time to add support for this in the coming months.

Igalia and Bloomberg working together to build a better web Igalia and Bloomberg working together to build a better web

Last but least, thanks to Bloomberg for supporting Igalia in the CSS Grid Layout implementation on Blink and WebKit.

May 26, 2016 10:00 PM

May 24, 2016

Alberto Garcia

I/O bursts with QEMU 2.6

QEMU 2.6 was released a few days ago. One new feature that I have been working on is the new way to configure I/O limits in disk drives to allow bursts and increase the responsiveness of the virtual machine. In this post I’ll try to explain how it works.

The basic settings

First I will summarize the basic settings that were already available in earlier versions of QEMU.

Two aspects of the disk I/O can be limited: the number of bytes per second and the number of operations per second (IOPS). For each one of them the user can set a global limit or separate limits for read and write operations. This gives us a total of six different parameters.

I/O limits can be set using the throttling.* parameters of -drive, or using the QMP block_set_io_throttle command. These are the names of the parameters for both cases:

-drive block_set_io_throttle
throttling.iops-total iops
throttling.iops-read iops_rd
throttling.iops-write iops_wr
throttling.bps-total bps
throttling.bps-read bps_rd
throttling.bps-write bps_wr

It is possible to set limits for both IOPS and bps at the same time, and for each case we can decide whether to have separate read and write limits or not, but if iops-total is set then neither iops-read nor iops-write can be set. The same applies to bps-total and bps-read/write.

The default value of these parameters is 0, and it means unlimited.

In its most basic usage, the user can add a drive to QEMU with a limit of, say, 100 IOPS with the following -drive line:

-drive file=hd0.qcow2,throttling.iops-total=100

We can do the same using QMP. In this case all these parameters are mandatory, so we must set to 0 the ones that we don’t want to limit:

   { "execute": "block_set_io_throttle",
     "arguments": {
        "device": "virtio0",
        "iops": 100,
        "iops_rd": 0,
        "iops_wr": 0,
        "bps": 0,
        "bps_rd": 0,
        "bps_wr": 0
     }
   }

I/O bursts

While the settings that we have just seen are enough to prevent the virtual machine from performing too much I/O, it can be useful to allow the user to exceed those limits occasionally. This way we can have a more responsive VM that is able to cope better with peaks of activity while keeping the average limits lower the rest of the time.

Starting from QEMU 2.6, it is possible to allow the user to do bursts of I/O for a configurable amount of time. A burst is an amount of I/O that can exceed the basic limit, and there are two parameters that control them: their length and the maximum amount of I/O they allow. These two can be configured separately for each one of the six basic parameters described in the previous section, but here we’ll use ‘iops-total’ as an example.

The I/O limit during bursts is set using ‘iops-total-max’, and the maximum length (in seconds) is set with ‘iops-total-max-length’. So if we want to configure a drive with a basic limit of 100 IOPS and allow bursts of 2000 IOPS for 60 seconds, we would do it like this (the line is split for clarity):

   -drive file=hd0.qcow2,
          throttling.iops-total=100,
          throttling.iops-total-max=2000,
          throttling.iops-total-max-length=60

Or with QMP:

   { "execute": "block_set_io_throttle",
     "arguments": {
        "device": "virtio0",
        "iops": 100,
        "iops_rd": 0,
        "iops_wr": 0,
        "bps": 0,
        "bps_rd": 0,
        "bps_wr": 0,
        "iops_max": 2000,
        "iops_max_length": 60,
     }
   }

With this, the user can perform I/O on hd0.qcow2 at a rate of 2000 IOPS for 1 minute before it’s throttled down to 100 IOPS.

The user will be able to do bursts again if there’s a sufficiently long period of time with unused I/O (see below for details).

The default value for ‘iops-total-max’ is 0 and it means that bursts are not allowed. ‘iops-total-max-length’ can only be set if ‘iops-total-max’ is set as well, and its default value is 1 second.

Controlling the size of I/O operations

When applying IOPS limits all I/O operations are treated equally regardless of their size. This means that the user can take advantage of this in order to circumvent the limits and submit one huge I/O request instead of several smaller ones.

QEMU provides a setting called throttling.iops-size to prevent this from happening. This setting specifies the size (in bytes) of an I/O request for accounting purposes. Larger requests will be counted proportionally to this size.

For example, if iops-size is set to 4096 then an 8KB request will be counted as two, and a 6KB request will be counted as one and a half. This only applies to requests larger than iops-size: smaller requests will be always counted as one, no matter their size.

The default value of iops-size is 0 and it means that the size of the requests is never taken into account when applying IOPS limits.

Applying I/O limits to groups of disks

In all the examples so far we have seen how to apply limits to the I/O performed on individual drives, but QEMU allows grouping drives so they all share the same limits.

This feature is available since QEMU 2.4. Please refer to the post I wrote when it was published for more details.

The Leaky Bucket algorithm

I/O limits in QEMU are implemented using the leaky bucket algorithm (specifically the “Leaky bucket as a meter” variant).

This algorithm uses the analogy of a bucket that leaks water constantly. The water that gets into the bucket represents the I/O that has been performed, and no more I/O is allowed once the bucket is full.

To see the way this corresponds to the throttling parameters in QEMU, consider the following values:

  iops-total=100
  iops-total-max=2000
  iops-total-max-length=60
  • Water leaks from the bucket at a rate of 100 IOPS.
  • Water can be added to the bucket at a rate of 2000 IOPS.
  • The size of the bucket is 2000 x 60 = 120000.
  • If iops-total-max is unset then the bucket size is 100.

bucket

The bucket is initially empty, therefore water can be added until it’s full at a rate of 2000 IOPS (the burst rate). Once the bucket is full we can only add as much water as it leaks, therefore the I/O rate is reduced to 100 IOPS. If we add less water than it leaks then the bucket will start to empty, allowing for bursts again.

Note that since water is leaking from the bucket even during bursts, it will take a bit more than 60 seconds at 2000 IOPS to fill it up. After those 60 seconds the bucket will have leaked 60 x 100 = 6000, allowing for 3 more seconds of I/O at 2000 IOPS.

Also, due to the way the algorithm works, longer burst can be done at a lower I/O rate, e.g. 1000 IOPS during 120 seconds.

Acknowledgments

As usual, my work in QEMU is sponsored by Outscale and has been made possible by Igalia and the help of the QEMU development team.

igalia-outscale

Enjoy QEMU 2.6!

by berto at May 24, 2016 11:47 AM

May 23, 2016

Igalia Compilers Team

Awaiting the future of JavaScript in V8

On the evening of Monday, May 16th, 2016, we have made history. We’ve landed the initial implementation of “Async Functions” in V8, the JavaScript runtime in use by the Google Chrome and Node.js. We do these things not because they are easy, but because they are hard. Because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one we are willing to accept. It is very exciting to see this, roughly 2 months of implementation, codereview and standards finangling/discussion to land. It is truly an honour.

 

To introduce you to Async Functions, it’s first necessary to understand two things: the status quo of async programming in JavaScript, as well as Generators (previously implemented by fellow Igalian Andy).

 

Async programming in JavaScript has historically been implemented by callbacks. `window.setTimeout(function toExecuteLaterOnceTimeHasPassed() {}, …)` being the common example. Callbacks on their own are not scalable: when numerous nested asynchronous operations are needed, code becomes extremely difficult to read and reason about. Abstraction libraries have been tacked on to improve this, including caolan’s async package, or Promise libraries such as Q. These abstractions simplify control flow management and data flow management, and are a massive improvement over plain Callbacks. But we can do better! For a more detailed look at Promises, have a look at the fantastic MDN article. Some great resources on why and how callbacks can lead to utter non-scalable disaster exist too, check out http://callbackhell.com!

 

The second concept, Generators, allow a runtime to return from a function at an arbitrary line, and later re-enter that function at the following instruction, in order to continue execution. So right away you can imagine where this is going — we can continue execution of the same function, rather than writing a closure to continue execution in a new function. Async Functions rely on this same mechanism (and in fact, on the underlying Generators implementation), to achieve their goal, immensely simplifying non-trivial coordination of asynchronous operations.

 

As a simple example, lets compare the following two approaches:
function deployApplication() {
  return cleanDirectory(__DEPLOYMENT_DIR__).
    then(fetchNpmDependencies).
    then(
      deps => Promise.all(
        deps.map(
          dep => moveToDeploymentSite(
            dep.files,
            `${__DEPLOYMENT_DIR__}/deps/${dep.name}`
        ))).
    then(() => compileSources(__SRC_DIR__,
                              __DEPLOYMENT_DIR__)).
    then(uploadToServer);
}

The Promise boiler plate makes this preit harder to read and follow than it could be. And what happens if an error occurs? Do we want to add catch handlers to each link in the Promise chain? That will only make it even more difficult to follow, with error handling interleaved in difficult to read ways.

Lets refactor this using async functions:

async function deployApplication() {
    await cleanDIrectory(__DEPLOYMENT_DIR__);
    let dependencies = await fetchNpmDependencies();

    // *see below*
    for (let dep of dependencies) {
        await moveToDeploymentSite(
            dep.files,
            `${__DEPLOYMENT_DIR__}/deps/${dep.name}`);
    }

    await compileSources(__SRC_DIR__,
                         __DEPLOYMENT_DIR__);
    return uploadToServer();
}

 

You’ll notice that the “moveToDeploymentSite” step is slightly different in the async function version, in that it completes each operation in a serial pipeline, rather than completing each operation in parallel, and continuing once finished. This is an unfortunate limitation of the async function specification, which will hopefully be improved on in the future.

 

In the meantime, it’s still possible to use the Promise API in async functions, as you can `await` any Promise, and continue execution after it is resolved. This grants compatibility with  numerous existing Web Platform APIs (such as `fetch()`), which is ultimately a good thing! Here’s an alternative implementation of this step, which performs the `moveToDeploymentSite()` bits in parallel, rather than serially:

 


await Promise.all(dependencies.map(
  dep => moveToDeploymentSite(
    dep.files,
    `${__DEPLOYMENT_DIR__}/deps/${dep.name}`
)));

 

Now, it’s clear from the let dependencies = await fetchNpmDependencies(); line that Promises are unwrapped automatically. What happens if the promise is rejected with an error, rather than resolved with a value? With try-catch blocks, we can catch rejected promise errors inside async functions! And if they are not caught, they will automatically return a rejected Promise from the async function.

function throwsError() { throw new Error("oops"); }

async function foo() { throwsError(); }

// will print the Error thrown in `throwsError`.
foo().catch(console.error)

async function bar() {
  try {
    var value = await foo();
  } catch (error) {
    // Rejected Promise is unwrapped automatically, and
    // execution continues here, allowing us to recover
    // from the error! `error` is `new Error("oops!")`
  }
}

 

There are also lots of convenient forms of async function declarations, which hopefully serve lots of interesting use-cases! You can concisely declare methods as asynchronous in Object literals and ES6 classes, by preceding the method name with the `async` keyword (without a preceding line terminator!)

 


class C {
  async doAsyncOperation() {
    // ...
  }
};

var obj = {
  async getFacebookProfileAsynchronously() {
    /* ... */
  }
};

 

These features allow us to write more idiomatic, easier to understand asynchronous control flow in our applications, and future extensions to the ECMAScript specification will enable even more idiomatic forms for writing complex algorithms, in a maintainable and readable fashion. We are very excited about this! There are numerous other resources on the web detailing async functions, their benefits, and perhaps ways they might be improved in the future. Some good ones include [this piece from Google’s Jake Archibald](https://jakearchibald.com/2014/es7-async-functions/), so give that a read for more details. It’s a few years old, but it holds up nicely!

 

So, now that you’ve seen the overview of the feature, you might be wondering how you can try it out, and when it will be available for use. For the next few weeks, it’s still too experimental even for the “Experimental Javascript” flag.  But if you are adventurous, you can try it already!  Fetch the latest Chrome Canary build, and start Chrome with the command-line-flag `–js-flags=”–harmony-async-await”`. We can’t make promises about the shipping timeline, but it could ship as early as Chrome 53 or Chrome 54, which will become stable in September or October.

 

We owe a shout out to Bloomberg, who have provided us with resources to improve the web platform that we love. Hopefully, we are providing their engineers with ways to write more maintainable, more performant, and more beautiful code. We hope to continue this working relationship in the future!

 

As well, shoutouts are owed to the Chromium team, who have assisted in reviewing the feature, verifying its stability, getting devtools integration working, and ultimately getting the code upstream. Terriffic! In addition, the WebKit team has also been very helpful, and hopefully we will see the feature land in JavaScriptCore in the not too distant future.

by compilers at May 23, 2016 02:08 PM

May 20, 2016

Víctor Jáquez

GStreamer Hackfest 2016

Yes, it happened again: the Gstreamer Spring Hackfest 2016! This time in the beautiful city of Thessaloniki. Thanks a lot, Vivia and Sebastian, for making it happen.

Thessaloniki
Thessaloniki’s promenade

My objective this time was to work with dma-buf support in gstreamer-vaapi. Though it is supported already, it needs a major clean up, and to extend its usage for downstream buffers (bugs 755072 and 765435).

In the way I learned that we need to update our internal API (called `libgstvaapi`), when handling dma-buf, to support mult-plane formats.

On the other hand, Nicolas Dufresne and I talked a bit about kmssink, libdrm and dma-buf. He managed to hack his Odroid U (Exynos3) to enable its V4L2 mem2mem video decoder and share buffers with kmssink. It was amazing. By the way, he promised me to write a blog post with the instructions to replicate his deed.

GStreamer Hackfest
Photo by Luis de Bethencourt

Finally, we had a preview of Edward Hervey‘s decodebin3. It is fun his test of switching the different audio streams in a media container (the different available dubbings) in every second or less. It was truly a multi-language audio!

In the meantime, we shared beers and meals, learning and laughing.

by vjaquez at May 20, 2016 10:32 AM

May 18, 2016

Antonio Gomes

[Chromium] content_shell running on Wayland desktop (Weston Compositor)

During my first weeks at Igalia, I got an interesting and promising task of Understand the status of Wayland support in Chromium upstream.

At first I could see clear signs that Wayland is being actively considered by the Chromium community:

  1. Ozone/Wayland project by Intel – which was my starting point as described later on.
  2. The meta bug “upstream wayland backend for ozone (20% project)“, which has some recent activity by the Chromium community.
  3. This comment in the chromium-dev mailing list (by a Googler):
  4. “(..) I ‘d recommend using ToT. Wayland support is a work-in-progress and newer trees will probably be better.”.

    Chromium’s DEPS file has “wayland” as one its core dependency (search for “wayland”).

Next step, naturally, was get my hands dirty, compiling and experimenting with it. I decided to start with content_shell.


Environment:

Ubuntu 16.04 LTS, regular Gnome session (with X) and Weston Wayland compositor running (no ‘xwayland’ installed) to run Wayland apps.

GYP_DEFINES

component=static_library use_ash=1 use_aura=1 chromeos=0 use_ozone=1 ozone_platform_wayland=1 use_wayland_egl=1 ozone_auto_platforms=0 use_xkbcommon=1 clang=0 use_sysroot=0 linux_use_bundled_gold=0

(note: GYP was used for personal familiarity with it, but GN applies fine here).

Chromium version

Base SHA 5da650481 (as of 2016/May/17)

Initial results

As is, content_shell built fine, but hung at runtime upon launch, hitting a CHECK at desktop_factory_ozone.cc(21).

Analysis and Action

After understanding current Ozone/Wayland upstream support, compare designs/code against 01.org, I could start connecting some of the missing dots.

The following files were “ported” from 01.org:

  • ui/views/widget/desktop_aura/desktop_drag_drop_client_wayland.cc / h
  • ui/views/widget/desktop_aura/desktop_screen_ozone_wayland.cc / h
  • ui/views/widget/desktop_aura/desktop_window_tree_host_ozone_wayland.cc / h

And then, I implemented DesktopFactoryOzoneWayland class (ui/views/widget/desktop_factory_ozone_wayland.cc/h) – it inherits from DesktopFactoryOzone, and implements the following pure virtual methods ::CreateWindowTreeHost and ::CreateDesktopScreen.

Initial result

After that, I could build and run content_shell with Weston Wayland Compositor (with no ‘xwayland’ installed). See a quick preview below.

Remarks

As is, the UI process owns the the Wayland connection, and GPU process runs without GL support.

UI processes initializes Ozone by calling:

#0 ui::WaylandSurfaceFactory::WaylandSurfaceFactory
#1 ui::OzonePlatformWayland::InitializeUI
#2 ui::OzonePlatform::InitializeForUI();
#3 aura::Env::Init()
#4 aura::Env::CreateInstance()
#5 content::BrowserMainLoop::InitializeToolkit()
(…)
#X content::ContentMain()
<UI PROCESS LAUNCH>
On the other side, GPU process gets initialized by calling:
#0 ui::WaylandSurfaceFactory::WaylandSurfaceFactory
#1 ui::OzonePlatformWayland::InitializeGPU
#2 ui::OzonePlatform::InitializeForGPU();
#3 gfx::InitializeStaticGLBindings()
#4 gfx::GLSurface::InitializeOneOffImplementation()
#5 gfx::GLSurface::InitializeOneOff()
#6 content::GpuMain()
<GPU PROCESS LAUNCH>

Differently from UI process, the GPU process call does not initialize OzonePlatformWayland::display_ and instead passes ‘nullptr’ to WaylandSurfaceFactory ctor.

Down the road on the GPU processes initialization WaylandSurfaceFactory::LoadEGLGLES2Bindings is called but bails out earlier explicitly because display_ is NIL.
Then, UI process falls back to software rendering (see call to WaylandSurfaceFactory::CreateCanvasForWidget).

Next step

  • So far I have experimented Ozone/Wayland support using “linux” as the target OS. As far as I can tell, most of the Ozone work upstream though has been focusing on “chromeos” builds instead (e.g. ozone/x11).
  • Hence the idea is to clean up the code and agree with Googlers / 01.org (Intel) people about how to best make use of this code.
  • It is being also discussed with some Googlers what the best way to tackle this lack of GL support is. Some real nice stuff are on the pipe here.

 

by agomes at May 18, 2016 08:21 PM

May 16, 2016

Javier Muñoz

The Ceph RGW storage driver goes upstream in Libcloud

The Ceph RGW storage driver was upstream in Apache Libcloud today. It is available in the Libcloud trunk repository and it will ship with the next release Apache Libcloud 1.0.0.

This post will introduce the new RGW driver together with the proper configuration parameters to run some examples uploading/downloading objects in Ceph Jewel.

The Ceph RGW storage driver

The Ceph RGW storage driver requires Ceph Jewel or above. As of this writing, the last Ceph Jewel version is 10.2.1. This version is available in the downloads section.

The driver extends the Libcloud S3 storage driver to provide a compatible S3 API with Ceph RGW.

The driver also contains support for AWS signature versions 2 (AWS2) and 4 (AWS4). It leverages the Libcloud common auth support on the client side. On the Ceph RGW side it required a little patch to handle unsigned paylods in the AWS4 auth header.

Developers and apps can use the Ceph RGW driver via the S3_RGW provider easily. A simple snippet follows...

from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
import libcloud

api_key = 'api_key'
secret_key = 'secret_key'

cls = get_driver(Provider.S3_RGW)

driver = cls(api_key,
             secret_key,
	     signature_version='4',
	     region='my-region',
	     host='my-host',
	     port=8000)

container = driver.get_container(...)

If the region has not an explicit value, the driver will use the default region 'default'.

The valid signature versions are '2' (AWS2) and '4' (AWS4). AWS2 is the default signature version.

One host name is always required. No default value here.

The following two examples contain the minimal code to upload/download objects with the new provider:

Running the upload example...

$ ./test-upload-ceph-rgw-driver.py
<Object: name=my-name-abcdabcd-123,
size=110080,
hash=0a5cfeb3bb10e0971895f8899a64e816,
provider=Ceph RGW S3 (my-region) ...>

Running the download example...

$ ./test-download-ceph-rgw-driver.py
<Object: name=my-name-abcdabcd-123,
size=110080,
hash=0a5cfeb3bb10e0971895f8899a64e816,
provider=Ceph RGW S3 (my-region) ...>

Enjoy!

Acknowledgments

My work in Apache Libcloud is sponsored by Outscale and has been made possible by Igalia and the invaluable help of the Libcloud community. Thank you all!

by Javier at May 16, 2016 10:00 PM

May 10, 2016

Sergio Villar

Automatizing the Grid

My Igalia colleagues and me have extensively reviewed how to create grids and how to position items inside the grid using different CSS properties. So far everything was more or less static. We declare the sizes of our columns/rows or define a set of grid areas and that’s it. Well, actually there is room for automatic stuff,  you can dynamically create new tracks just by adding items to positions outside the explicit grid. Furthermore the grid is able to auto-position items for you if you don’t really care much about the final destination.

Use Cases

But imagine the following use case. Let’s assume that you are designing the product catalog of your pretty nice web store. CSS Grid Layout is the obvious choice for such layout. Just set some columns and some rows and that’s mostly it. Thing is that your catalog is likely not static, but automatically generated from a query to some database where you store all that data. You cannot know a priori how many items you’re going to show (users normally can also filter results).

Grid already supports that use case. You can already define the number of columns (rows) and let the grid create as many rows (columns) as needed. Something like this:

    grid-template-columns: repeat(3, 100px);
    grid-auto-rows: 100px;
    grid-auto-flow: row;

Picture showing auto row creation

 

The Multicol Example

But you need more flexibility. You’re happy with the grid creating new tracks on demand in one axis, but you also want to have as many tracks as possible (depending on the available size) in the other axis. You’ve already designed web sites using CSS Multicol and you want your grid to behave like columns: 100px;

Note that in the case of multicol the specified column width is an “optimal” size, meaning that it could be enlarged/narrowed to fill the container once the number of columns is calculated.

So is it possible to tell grid to add as many tracks as needed to fill some available space? It was not but now we have…

Repeat to fill: auto-fill and auto-fit

The grid way to implement it is by using the recently added auto repeat syntax. The already known repeat() function accepts as a first argument two new keywords, auto-fill and auto-fit. Both of them will generate as many repetitions as needed to fill the available space with the specified tracks without overflowing. The only difference between them is that, auto-fit will additionally drop any empty track (meaning no items spanning through it) generated by repeat() after positioning grid items.

The use of these two new keywords has some limitations:

  • Tracks with intrinsic or flexible sizes cannot be combined with auto repetitions
  • There can be just one auto-repeat per axis at the most
  • repeat(auto-fill|fit,) accepts only one track size as second argument (all auto repeat tracks will have the same size)

Some examples of valid declarations:

grid-template-columns: repeat(auto-fill, minmax(200px, 1fr)) [last];
grid-template-rows: 100px repeat(auto-fit, 2em) repeat(10, minmax(15%, 1fr));
grid-template-columns: repeat(auto-fill, minmax(max-content, 300px));

And some examples of invalid ones:

grid-template-columns: min-content repeat(auto-fill, 25px) 10px;
grid-template-rows: repeat(auto-fit, 15px) repeat(auto-fill, minmax(10px, 100px);
grid-template-columns: repeat(auto-fill, min-content);

The Details

I mentioned that we cannot mix flexible and/or intrinsic tracks with auto repetitions, so why repeat(auto-fill, minmax(200px, 1fr)) is a valid declaration? Well, according to the grammar, the auto repeat syntax require something called a <fixed-size> track, which is basically a track which has a length (like 10px) or a percentage in either its min or max function. The following are all <fixed-size> tracks:

15%
minmax(min-content, 200px)
minmax(5%, 1fr)

That length/percentage is the one used by the grid to compute the number of repetitions. You should also be aware of how the number of auto repeat tracks is computed. First of all you need a definite size on the axis where you want to use auto-repeat. If that is not the case then the number of repetitions will be 1.

The nice thing is that the definite size could be either the “normal” size (width: 250px), the max size (max-height: 15em) or even the min size (min-width: 200px). But there is an important difference, if either the size or max-size is definite, then the number of repetitions is the largest possible positive integer that does not cause the grid to overflow its grid container. Otherwise, if the min-size is definite, then the number of repetitions is the smallest possible positive integer that fulfills that minimum requirement.

For example, the following declaration will generate 4 rows:

height: 600px;
grid-template-rows: repeat(auto-fill, 140px);

But this one will generate 5:

min-height: 600px;
grid-template-rows: repeat(auto-fill, 140px);

I want to use it now!

Sure, no problem. I’ve just landed the support for auto-fill in both Blink and WebKit meaning that you’ll be able to test it (unprefixed) really soon in Chrome Canary and Safari Technology Preview (Firefox Nightly builds already support both auto-fill and auto-fit). Many thanks to our friends at Bloomberg for sponsoring this work. Enjoy!

by svillar at May 10, 2016 07:25 PM