Planet Igalia

January 24, 2015

Samuel Iglesias

FOSDEM 2015

Next week I am traveling to Brussels to attend FOSDEM, but this time as a speaker! My talk will explain why we need to test our OpenGL drivers with OpenGL conformance test suites, list the unofficial free software ones, and give a brief introduction on how to use them.

See you there!

FOSDEM 2015

by Samuel Iglesias at January 24, 2015 11:10 AM

January 22, 2015

Jacobo Aragunde

LibreOffice for Android at FOSDEM

FOSDEM

Next week I’m flying to FOSDEM to let you know about our work in LibreOffice for Android, which has just been released in the Play Store. In my talk I will focus on the document browser, the new features we are currently working on and my own vision for the future of this component.

LibreOffice for Android, provider selection

You can read more about LibreOffice for Android in this post from The Document Foundation, and some gory technical details in Michael Meeks’ blog.

by Jacobo Aragunde Pérez at January 22, 2015 05:30 PM

January 19, 2015

Manuel Rego

New Year, New Blog!

As you might have noticed, I’m starting the new year by changing my blog design and software.

The idea had often come to my mind since I discovered static site generators like Jekyll, and this year I finally decided to get rid of WordPress and move my blog to Jekyll for good. Since the previous blog theme was really old-fashioned and completely unresponsive, I took advantage of the software migration to improve the design of my blog too.

Basically, I found this nice Clean Blog theme by Start Bootstrap and decided to give it a chance. On top of that, they provide not only the theme but also a version already integrated with Jekyll, so most of the work was already in place.

In order to migrate the blog posts I used a WordPress plugin which, together with a few sed commands, was enough to move the old posts to Jekyll. In addition, I’m using Jekyll tags to categorize the posts like before, with a tag view page that shows all the posts and provides its own feed (required for planet subscriptions), e.g. the CSS tag page.

Regarding comments, I decided to go for a static solution too. Basically, I got very few comments during the last year, so I think that looking for a more complex solution isn’t worth it. This means that if you want to comment on my posts from now on, you should send me a mail (or a tweet) and that’s all. You can check a previous post with comments to see how they look.

This was just a short notice about this change; I hope you enjoy the new site. At least I like it a lot. :-)

January 19, 2015 11:00 PM

January 12, 2015

Javier Fernández

Box Alignment and Grid Layout (II)

Some time has passed since my first post about the implementation status of the Box Alignment spec in the Blink and WebKit web engines. I’ll give an update in this post and, since the gap between both web engines has grown considerably (I’ll do my best to reduce it as soon as possible), I’ll point out the differences between the two engines.

What’s new?

The ‘stretch’ value is now implemented for align-{self, items} and justify-{self, items} CSS properties. This behavior is quite important because it’s the default for these four properties in Flexible Box and Grid layouts. According to the specification, the ‘stretch’ value is defined as follows:

If the width or height (as appropriate) of the alignment subject is auto, its used value is the length necessary to make the alignment subject’s outer size as close to the size of the alignment container as possible, while still respecting the constraints imposed by min-height/max-width/etc. Otherwise, this is equivalent to start.

When defining the alignment properties in a grid layout it’s very important to consider how we want to manage the items’ overflow. It’s possible to specify an overflow alignment value for both the Content and Item Alignment definitions, but so far it’s implemented only for the Item Alignment properties. The Overflow Alignment concept is defined in the specification as follows:

To help combat undesirable data loss, an overflow alignment mode can be explicitly specified. “True” alignment honors the specified alignment mode in overflow situations, even if it causes data loss, while “safe” alignment changes the alignment mode in overflow situations in an attempt to avoid data loss.

The ‘stretch’ value in Grid Layout

This value applies to the Self Alignment properties {align, justify}-self, and obviously their corresponding Default Alignment ones {align, justify}-items. For grids, these properties consider that the alignment container is the grid cell, while the alignment subject is the grid item’s margin box.

The Box Alignment specification states that Default Alignment ‘auto’ values should be resolved to ‘stretch’ in case of Grid containers; this value will be used as the resolved value for ‘auto’ Self Alignment values.

So by default, or when explicitly defined as ‘stretch’, the grid item’s margin box will be stretched to fill its grid cell breadth. Let’s see it with a basic example:
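As a rough illustration, the CSS behind an example like the one in the screenshot below could look like this (a minimal sketch; sizes and class names are illustrative, not taken from the actual demo):

.grid {
    display: grid;
    grid-template-columns: 200px;
    grid-template-rows: 100px;
}
.item {
    /* 'stretch' is the resolved default in grid, so these two lines are
       redundant; an auto-sized item is stretched to fill its 200x100 cell */
    align-self: stretch;
    justify-self: stretch;
}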

align-stretch

All the examples are available at https://igalia.github.io/css-grid-layout/

This change affected only the layout codebase of the web engine, since the value was already introduced in the style parsing logic. The Grid Layout rendering logic uses an interesting abstraction to override the grid item’s block container. This abstraction allows us to use Grid Cells as block containers when computing the logical height and width.

Overflow Alignment in Grid Layout

The Overflow Alignment value used when defining a grid layout can be particularly useful, especially for fixed-size grids. The potential data loss may happen not only at the left and top box edges, but also between adjoining grid cells. Overflow Alignment is defined via the ‘safe’ and ‘true’ keywords. They were already introduced in Blink core’s style parsing logic as part of the CSS3 upgrade of the alignment properties (justify-self, align-self) used in the FlexBox implementation. The new CSS syntax is described by the following expression:

auto | stretch | <baseline-position> | [ <item-position> && <overflow-position>? ]

According to the current Box Alignment specification draft, the Overflow Alignment keywords have the following meaning:

  • safe: If the size of the alignment subject overflows the alignment container, the alignment subject is instead aligned as if the alignment mode were start.
  • true: Regardless of the relative sizes of the alignment subject and alignment container, the given alignment value is honored.
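For instance, a minimal sketch of how these keywords combine with a positional value, following the draft syntax quoted above (selectors are illustrative; note that later drafts renamed ‘true’ to ‘unsafe’):

.item-a {
    align-self: true end;      /* honor 'end' even if the item overflows its cell */
}
.item-b {
    justify-self: safe center; /* fall back to 'start' alignment on overflow */
}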

I’ll proceed now to show how Overflow Alignment is applied in the Grid Layout specification with an example:

align-overflow

All the examples are available at https://igalia.github.io/css-grid-layout/

As mentioned before, the new syntax allowing the Overflow Alignment keywords required modifying the style builder and parsing logic. The alignment properties became a CSSValueList instance instead of simple keyword IDs; both Blink and WebKit provide Style Builder code generation directives (CSSPropertyNames.in) for simple properties, but that was no longer the case for these properties.

The Style Builder is more complex now because it has to deal with the conditional overflow keyword, which can be specified before or after the <item-position> keyword. Blink provides a function template scheme for groups of properties sharing the same logic, which is the case for most of the CSS Box Alignment properties (align-self, align-items and justify-self; justify-items is slightly different, so it needs custom functions). WebKit is currently defining the new style builder and does not follow this approach yet, but I’d say it will eventually, since it makes a lot of sense.

{% macro apply_alignment(property_id, alignment_type) %}
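{# Emits the initial/inherit/value style builder functions for one alignment property #}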
{% set property = properties[property_id] %}
{{declare_initial_function(property_id)}}
{
    state.style()->set{{alignment_type}}(RenderStyle::initial{{alignment_type}}());
    state.style()->set{{alignment_type}}OverflowAlignment(RenderStyle::initial{{alignment_type}}OverflowAlignment());
}
 
{{declare_inherit_function(property_id)}}
{
    state.style()->set{{alignment_type}}(state.parentStyle()->{{property.getter}}());
    state.style()->set{{alignment_type}}OverflowAlignment(state.parentStyle()->{{property.getter}}OverflowAlignment());
}
 
{{declare_value_function(property_id)}}
{
    CSSPrimitiveValue* primitiveValue = toCSSPrimitiveValue(value);
    if (Pair* pairValue = primitiveValue->getPairValue()) {
        state.style()->set{{alignment_type}}(*pairValue->first());
        state.style()->set{{alignment_type}}OverflowAlignment(*pairValue->second());
    } else {
        state.style()->set{{alignment_type}}(*primitiveValue);
    }
}
{% endmacro %}
{{apply_alignment('CSSPropertyJustifySelf', 'JustifySelf')}}
{{apply_alignment('CSSPropertyAlignItems', 'AlignItems')}}
{{apply_alignment('CSSPropertyAlignSelf', 'AlignSelf')}}

Even though style building and parsing are shared among all the layout models using the Box Alignment properties (Flexible Box and Grid so far), Overflow Alignment is so far only supported by the CSS Grid Layout implementation. The Overflow Alignment logic affects how the grid items are positioned during the layout phase.

The Overflow Alignment keywords are also valid in the Content Distribution syntax, which I’m working on now with quite good progress. The first patches have already landed in trunk, providing an implementation of the justify-content property in Grid Layout. I’ll talk about it soon, hopefully, once the implementation is complete and the discussion on the www-style mailing list concludes with some agreement regarding Distributed Alignment for grids.

Current implementation status

The Box Alignment specification is quite complete now in Blink; unfortunately that’s not the case for WebKit. I’ll summarize the current implementation status in browsers based on each web engine, which are basically Chrome/Chromium vs Safari; I’ll also try to outline the roadmap for the next weeks.

align-grid-support

The flex-start and flex-end values are used only in Flexible Box layouts, so they don’t apply to this analysis of the Grid support of the Box Alignment spec. The Distributed Alignment values apply only to the Content Distribution properties (align-content and justify-content). Finally, ‘stretch’ is a valid value for both Positional and Distributed Alignment, so it’s not redundant but a different interpretation of the same value depending on the property it’s applied to.

Some conclusions we can extract from the table above:

  • Default and Self Alignment support is almost complete in Blink; only Baseline Alignment remains to be implemented.
  • Content Distribution support for justify-content in Blink. Only <content-position> values are implemented, since the current spec draft states that all the <content-distribution> values will fall back in Grid Layout; spec authors are still evaluating changes on this issue, though.
  • WebKit fails to provide CSS3 parsing for all the properties except justify-self, although there are some patches pending review to improve this situation.
  • There is no Grid support at all in WebKit for any of the Box Alignment properties.
Igalia & Bloomberg logos

Igalia and Bloomberg working to build a better web platform

by jfernandez at January 12, 2015 11:41 AM

January 07, 2015

Manuel Rego

CSS Grid Layout 2014 Recap: Implementation Status

After the review of the changes in the CSS Grid Layout spec in my previous post, let’s focus now on the status of the implementation in Blink and WebKit. This post will try to cover what we’ve been doing at Igalia during 2014 around grid layout, and it’ll talk about our plans for 2015.

Work done in 2014

Spec syntax

During the first part of the year we were updating the CSS Grid Layout syntax in order to match the last modifications introduced in the spec during 2013.
As part of this work, the grid and grid-template shorthands were introduced, which are deeply explained by my colleague Javi Fernández in a post.
Right now the implementation in both Blink (#337626) and WebKit (#127987) is complete and matches the spec regarding the syntax.

Painting optimizations

Back in 2013, Julien Chaffraix did some optimizations in the grid painting code (#248151). However, those changes introduced some issues that were fixed during 2014. The painting code has finally been stable for a while.
This optimization is not present in WebKit, so this work was only done in Blink.

Code refactoring

Another task that was done in the first half of this year was the refactoring of the code related to positions resolution. It’s been moved to its own class (GridResolvedPosition), so RenderGrid only has to deal with resolved positions now.
This change was done in both Blink (#258258) and WebKit (#131732).

Named grid lines

At the beginning of the year we implemented support for named grid lines. This completes the implementation of the different placement options available in a grid (numeric indexes, named areas, named lines and span). Once more, this is supported in Blink (#337779) and WebKit (#129372).
In this case, my fellow Sergio Villar talked about this work in another post.
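As a reminder of what the feature looks like, here is a minimal sketch using the parenthesis syntax in use at the time (names are illustrative):

.grid {
    display: grid;
    grid-template-columns: (first) 100px (mid central) 200px (last);
}
.item {
    grid-column: first / last; /* an item placed between two named lines */
}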

Named grid lines example

Track sizing algorithm

The track sizing algorithm was rewritten in the spec during 2014. Despite keeping the very same behaviour, the implementation was modified to follow the new algorithm closely.
During this work some missing features were detected and solved, making the current implementation more complete and robust.
Several patches have been done in both Blink (#392200) and WebKit (#137253, #136453, #137019 & #136575).

Automatic placement algorithm

The auto-placement feature has been completed, adding support for items spanning several tracks and implementing the “sparse” and “dense” packing modes.
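A minimal sketch of both features, assuming the syntax from the drafts of the time (class names are illustrative):

.grid {
    display: grid;
    grid-template-columns: 100px 100px 100px;
    grid-auto-flow: row dense; /* “sparse” is the default; 'dense' backfills holes */
}
.spanning-item {
    grid-column: span 2; /* auto-placed, but occupying two column tracks */
}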

Auto-placement example with spanning item

In this case you can read my post about how all this works.
This was done in both Blink (#353789 & #384099) and WebKit (#103316).

Auto-placement “sparse” and “dense” packing modes example

Fuzzinator bugfixing

Apart from generic bugfixing, this year we’ve fixed some issues detected by a tool called Fuzzinator in both Blink and WebKit. Renata Hodovan wrote a nice post explaining all the details (thanks for your great work).
The good news is that now the grid code is much more stable thanks to all the reports and patches done during 2014.

Alignment and justification

This work is still ongoing, but the main alignment properties (justify-self, align-self, justify-items, align-items, justify-content and align-content) are already supported, or on the way (with patches pending review), in Blink (#234199). For this feature the patches in WebKit (#133222 & #133224) are moving slowly.
You can check all the possibilities provided by these properties in a blog post by Javi.

Different options to align an item inside a grid cell

Absolutely positioned grid children

During the last part of 2014, special support for absolutely positioned grid children was implemented, since they have some particularities in grids.
The initial patch is available in Blink (#273898), but some things still need to be fixed to complete it. Then it’ll be ported to WebKit as usual.

Absolutely positioned grid children example

Writing modes

We’ve been working on writing modes support, fixing issues with margins, borders and paddings. Now the columns/rows are painted in the right order depending on the direction property.
Orthogonal flows were clarified in the last update of the spec, and the current issues are already being addressed.
Again, all this work was done in Blink (#437354) and will be ported to WebKit later on.

Example of direction support in grid

Testing

You can always increase the test coverage, especially for a big spec like CSS Grid Layout. We’ve been adding some missing tests here and there, and finally decided to start the road to create a W3C test suite for grid.
We’re still in the early stages, getting used to all the W3C testing infrastructure and processes. Gérard Talbot is helping us to take the first steps, big thanks!
We’ve already drafted a test plan where you can follow our progress. We hope to complete the test suite during 2015.
As expected, the nice part of being focused on writing tests in general (not only tests for the patch you’re developing) is that you write much better tests and you end up finding small issues in different places.

Plans for 2015

The main goal is to ship CSS Grid Layout in Chrome (Blink) and see if Safari (WebKit) follows the trend. In that hypothetical scenario three major browsers, Chrome, Safari and Internet Explorer (despite the latter implementing an old version of the spec), would have CSS Grid Layout support, which would be really great news.
Thinking about the next step in the short term, our intention is to send the “Intent to Ship” to the Blink mailing list during the first quarter of 2015.
WebKit is lagging a bit behind, but we have plans to update the implementation and reduce the gap between the Blink and WebKit grid codebases.

Of course, apart from completing all the ongoing tasks and other minor fixes, we’ll have to keep doing more work to fully implement the spec:

  • Add support for “auto” keyword for repeat() (recently added to the spec).
  • Allow the implicit grid to grow before the explicit grid (properly supporting negative indexes for grid line numbers).
  • Implement fragmentation support once the spec is definitive regarding this topic.

Apart from that, during 2015 we’ll review the performance and try to make grid faster with some optimizations.
Furthermore, it’d be really nice to add some support for grid in Chrome DevTools and Safari Web Inspector. That would make end users’ lives much easier.

Wrap-up

2015 will be the year of CSS Grid Layout in the browser. Hopefully you’ll be able to use it natively in all the major browsers but Firefox (where you could use the polyfill). Anyway, you can start to play with it already, enabling the experimental Web Platform features flag in Chrome (unprefixed) or using the WebKit nightly builds (with the “-webkit” prefix).
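For reference, a hedged sketch of what that looked like in the WebKit nightlies at the time (prefixed forms, as far as I recall them; in Chrome the same properties are used unprefixed behind the flag):

.grid {
    display: -webkit-grid;                    /* plain 'grid' in Chrome */
    -webkit-grid-template-columns: 100px 1fr; /* prefixed property names in WebKit */
}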
If you want to follow closely the implementation track the meta-bugs in Blink (#79180) and WebKit (#60731).

Igalia is doing all this work as part of our collaboration with Bloomberg.
We’re waiting for you to start testing grid layout, reporting feedback and bugs. We’ll do our best to make sure you enjoy it. Exciting times ahead!

Igalia and Bloomberg working together to build a better web

January 07, 2015 11:00 PM

Xabier Rodríguez Calvar

Streams API in WebKit at the Web Engines Hackfest

Yes, I know, I should have written this post before you know, blah, blah, excuse 1, blah, excuse 2, etc. ;)

First of course I would like to thank Igalia for allowing me to use company time to attend the hackfest and meet such a group of amazing programmers! It was quite intense and I tried to give my best, though for different reasons (coordination, personal and so on) I missed some sessions.

My purpose at the hackfest was to work with Youenn Fablet from Canon on implementing the Streams API in WebKit. When we began to work together in November, Youenn already had a prototype working with some tests, so the idea was to take that, complete it, polish it and ship it. Easy, huh? Not so…

What is Streams? As you can read in the spec, the idea is to create a way of handling different kinds of streams with a common high-level API. Those streams can be a mapping of low-level I/O system operations or can be easily created from JavaScript.

Fancy things you can do:

  • Create readable/writable streams mapping different operations
  • Read/write data from/to the streams
  • Pipe data between different streams
  • Handle backpressure (controlling the data flow) automagically
  • Handle chunks as the web application sees fit, including different data types
  • Implement custom loaders to feed different HTML tags (images, multimedia, etc.)
  • Map some existing APIs to Streams. XMLHttpRequest would be a wonderful first step.

First thing we did after the prototype was defining a roadmap:

  • General ReadableStream that you can create at JavaScript and read from it
  • XMLHttpRequest integration
  • Loaders for some HTML tags
  • WritableStream
  • Piping operations

As you can see in bugzilla we are close to finishing the first point, which took quite a lot of effort because it required:

  • Code cleaning
  • Making it build in debug
  • Improving the tests
  • Writing the promises based constructor
  • Fixing a lot of bugs

Of course we didn’t do all this at the hackfest; only Chuck Norris would have been able to do that. The hackfest provided the opportunity of meeting Youenn in person, working side by side and discussing different problems and possible strategies to solve them, such as error management, queueing chunks and handling their size, etc., which are not trivial given the complexity created by the flexibility of the API.

After the hackfest we continued working and, as I said before, the result you can find at bugzilla. We hope to be able to land this soon and continue working on the topic within the current roadmap.

To close the topic of the hackfest, it was a pleasure to work with so many awesome web engine hackers, and I would like to finish by thanking the sponsors Collabora and Adobe, and especially my employer, Igalia, which was both sponsor and host.

by calvaris at January 07, 2015 07:30 PM

January 02, 2015

Claudio Saavedra

Thu 2015/Jan/01

A recap of my reading in 2014. I definitely read less this year, although to some degree it was more exhausting. Let's see.

Books read to completion through the year: 23 (~7800 pages, according to Goodreads). Tolstoi's "War and Peace" and Marx's "Capital, vol. 1" were the ones that took the most time, and for both I had to take long breaks with lighter reads in between.

In the case of "War and Peace", I read from both the English translation by Pevear & Volokhonsky (Vintage Classics) and the Spanish translation by Kúper (Editorial El Aleph). From these two, P&V became my favorite one as I was approaching the end of the book.

The volume 1 of Capital was fascinating but exhausting, as its style and long chapters on the working conditions in England up to the 19th century and the development of industry and machinery are, well, not condensed enough. Someone had to tell Marx to just write a separate book about these topics (had not Engels' "The Condition of the Working Class in England" been enough?). Anyway, I read it side by side with Harvey's "Companion to Marx's Capital", which was pretty good at keeping one from getting lost in the intertwined historical and logical arguments and the dialectic journey from commodities to class struggle. I'm looking forward to volume 2, although I'll need a bit of a rest before I get to it.

The language count: 17 in English, 5 in Spanish, 2 in German. That's frustratingly not enough books in Spanish for a Spanish speaker. But on the upside I got a bit more confident reading in German. I even tried to read a few short stories by Dürrenmatt but his German was way beyond mine ("Der Tunnel" is mindblowing, by the way). I'll try again at some point in the future with him.

Steinbeck deserves a special mention. I loved both "Grapes of Wrath" and "Of Mice and Men", not only for their style but also for their historical component (especially in the Grapes). "Zur falschen Zeit" by the Swiss Alain Claude Sulzer is also a pretty good book (and one that I can recommend to people with an intermediate German level). Fuguet's "Las Películas de mi Vida" touched a nerve with me, as its central topics (earthquakes, Chile, cinema, family, and being an expat) all relate to my life. I read it on the plane that took me to Chile during these holidays and couldn't help but shed a tear or two with it, in anticipation of what would be an intensely emotional visit to the homeland after 4 years.

The worst book I read was a compilation of Marxist writings by C. Wright Mills called "The Marxists". Even if Mills was a relevant author in sociology (I still have to read "The Sociological Imagination"), this is a pretty lazy work. I won't repeat what I said in my Goodreads review.

The author count! 20 distinct authors, and again, very few women among them -- two to be precise. Steinbeck, Marx, and Lenin were the most read authors, with two works of each.

The sources for the books: two from local libraries and two electronic books, 13 second-hand books, one loan, and the rest probably bought new.

Books acquired in 2014: 73 (down from 110 in 2013, so that's an improvement). Most of them are second-hand books and a few are presents from my loving friends. I also have a bigger bookshelf now but it's full already -- story of my life.

Of my goals for 2014, I only fulfilled these: reading "War and Peace", buying fewer books, and reading more German. I didn't read the Quixote, Ulysses, or 2666 yet, and I read even fewer books in Spanish. For 2015, I'll be happy if I read vol. 2 of Capital, if I manage to read more books in Spanish, and if I get to read more German. Off we go, 2015.

I also took my hobby of curating quotes to what I've named Hel Literaria.

January 02, 2015 02:06 AM

December 29, 2014

Manuel Rego

CSS Grid Layout 2014 Recap: Specification Evolution

Year 2014 is coming to an end, so it’s the perfect timing to review what has happened regarding the CSS Grid Layout spec, which Igalia has been implementing in both Blink and WebKit engines, as part of our collaboration with Bloomberg.

I was wondering what would be the best approach to write this post, and finally I’m going to split it into two different posts: this one covering the changes in the spec during the whole year, and another one talking about the implementation details.

Two Working Drafts (WD) of the CSS Grid Layout Module were published during 2014. In addition, during the last month of the year, somewhat related to the work previously done at the CSS Working Group (WG) face-to-face (F2F) meeting at TPAC, several changes were made to the spec (I guess they’ll end up in a new WD in early 2015). So, let’s review the most important changes introduced in each version.

Working Draft: 23 Jan 2014

Subgrids

This is the first WD where the subgrids feature appears marked as at-risk. This means that it might end up out of Level 1 of the specification.
A subgrid is a grid inside another grid, but keeping a relationship between the rows/columns of the subgrid and the parent grid container. It shares the track sizing definitions with the parent. Just for the record, current implementations don’t support this feature yet.
However, nested grids are already available and will be part of Level 1. Basically, nested grids have their own track sizing definitions, completely independent of their parents. Of course, they’re not the same as subgrids.
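A minimal sketch of a nested grid, whose tracks are completely independent of its parent’s (the at-risk subgrid syntax was still in flux at the time, so it is omitted here; class names are illustrative):

.outer {
    display: grid;
    grid-template-columns: 100px 200px;
}
.inner {
    display: grid;                    /* a grid item that is itself a grid container */
    grid-template-columns: 50px 50px; /* independent of the .outer track definitions */
}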

Subgrid vs nested grid example

Implicit named areas

This is related to the concept of named grid areas. Briefly, in a grid you can name the different areas (groups of adjacent cells), for example: menu, main, aside and/or footer, using the grid-template-areas property.
Each area defines 4 implicit named lines: 2 called foo-start (marking the row and column start) and 2 called foo-end (row and column end), where foo is the name of the area.
This WD introduces the possibility to create implicit named areas by defining named grid lines using the previous pattern. That way, if you explicitly create lines called foo-start and foo-end, you’ll be defining an implicit area called foo that can be used to place items in the grid.
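For instance, a minimal sketch using the parenthesis syntax of the time (names are illustrative):

.grid {
    display: grid;
    /* menu-start/menu-end lines in both axes implicitly define a 'menu' area */
    grid-template-columns: (menu-start) 100px (menu-end) 1fr;
    grid-template-rows: (menu-start) auto (menu-end) auto;
}
.menu {
    grid-area: menu; /* placed as if a 'menu' area had been declared explicitly */
}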

Example of implicit named grid areas

Aligning the grid

This version adds the justify-content and align-content properties, which allow aligning the whole grid within the grid container.
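A minimal sketch (sizes are illustrative):

.grid {
    display: grid;
    width: 400px;
    height: 300px;
    grid-template-columns: 100px 100px; /* tracks smaller than the container */
    grid-template-rows: 100px 100px;
    justify-content: center; /* center the whole grid horizontally */
    align-content: center;   /* and vertically */
}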

Other

This WD adds a new informative section explaining basic examples of the grid placement options. It’s an informative section but very useful to get an overview of the different possibilities.
In addition, it includes an explanatory example for the absolutely-positioned grid items behavior.

Working Draft: 13 May 2014

Track sizing algorithm

Probably the most important change in this version is the complete rewrite of the track sizing algorithm. Anyway, despite the new wording, the algorithm keeps the very same behavior.
This is the main algorithm for grids; it defines how the track breadths should be calculated, taking into account all the multiple available options that define the track sizes.
An appendix with a “translation” between the old algorithm and the new one is included too, mostly to serve as a reference and to help detect possible mistakes.

Auto-placement changes

The grid-auto-flow property is modified in this WD:

  • none is substituted by a new stack packing mode.
  • As a consequence, property grid-auto-position (tied to none) is dropped.

Before this change, the default value for grid-auto-flow was none and, in that case, all the auto-placed items were positioned using the value determined by grid-auto-position (by default 1×1).
With this change, the default value is row. But you can specify stack, and the grid will look for the first empty cell and use it to place all the auto-positioned items there.
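A minimal sketch of this WD’s syntax (note that, as described later in this post, stack was eventually removed from the spec):

.grid {
    display: grid;
    grid-auto-flow: stack; /* every auto-placed item goes into the first empty cell */
}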

Other

Implementations now have the possibility to clamp the maximum number of repetitions in a grid.
Besides, it brings up a new section related to sizing grid containers, defining how they behave under max-content and min-content constraints.

Editor’s Draft: Last changes

Note: These changes are not yet in a WD version and might suffer some modifications before a new WD is published.

Automatic number of tracks

A new auto keyword has been added to the repeat() function.
This allows repeating the specified track list as many times as needed, depending on the grid container size; used together with the auto-placement feature, this can be a really nice combo.
For example, if the grid container is 350px wide and it uses repeat(auto, 100px); to define the columns, you’ll end up having 3 columns.
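A minimal sketch of the draft syntax described above:

.grid {
    display: grid;
    width: 350px;
    grid-template-columns: repeat(auto, 100px); /* resolves to 3 columns of 100px */
}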

Example of new auto keyword for repeat() function

Auto-placement stack removed

Finally, after some issues with the stack mode, it’s been decided to remove it from the spec. This means that the grid-auto-flow property gets simplified, allowing you to determine the direction: row (by default) or column; and the packing algorithm: dense or “sparse” (if omitted).
On top of that, the grid item placement algorithm now has more explicit wording regarding the different packing modes.
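For instance, a minimal sketch of the simplified syntax:

.grid {
    display: grid;
    grid-auto-flow: column dense; /* fill column by column, backfilling holes */
}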

Fragmentation

This section had been empty for a long time, and it finally has some content.
Anyway, this is still an initial proposal and more work is needed to settle it down.

Other

The scope of align-content and justify-content was reviewed; now they apply to the grid tracks rather than to the grid as a single unit.

As a side note, there’s an ongoing discussion regarding the symbol used to declare named grid lines. Currently it’s parentheses, e.g.:

grid-template-columns: (first) 100px (mid central) 200px (last);

However, these parentheses cause some issues for the Sass preprocessor. The proposal of using square brackets was not accepted in the last CSS WG F2F meeting, though it’ll be revisited again in the future.

Conclusion

Of course this list is not complete, and I may be missing some changes. At least these are the most important ones from the implementor’s perspective.
As you can see, despite not having big behavioral changes during this year, the spec has been evolving and becoming more and more mature. A bunch of glitches have been fixed, and some features have been adapted thanks to the feedback from users and implementors.
Thanks to the spec editors: Tab, fantasai and Rossen (and the rest of the CSS WG), for all their work and patience in the mailing list answering lots of doubts and questions.

Next year CSS Grid Layout will be hitting your browsers, but you’re still in time to provide feedback and propose changes to the spec. The editors will be more than happy to listen to your suggestions for improvements and to know what things you are missing.
If you want to have first-hand information regarding the evolution of the spec, you should follow the CSS WG blog and check the minutes of the meetings where they discuss grid. On top of that, if you want all the information, you should subscribe to the CSS WG mailing list and read the mails with “[css-grid]” in the subject.

Lastly, in the next post I’ll talk about the work we’ve been doing during 2014 on the implementation in Blink and WebKit and our plans for 2015. Stay tuned!

Igalia and Bloomberg working together to build a better web

December 29, 2014 11:00 PM

December 15, 2014

Philippe Normand

Web Engines Hackfest 2014

Last week I attended the Web Engines Hackfest. The event was sponsored by Igalia (also hosting the event), Adobe and Collabora.

As usual I spent most of the time working on the WebKitGTK+ GStreamer backend, and Sebastian Dröge kindly joined and helped out quite a bit; make sure to read his post about the event!

We first worked on the WebAudio GStreamer backend; Sebastian cleaned up various parts of the code, including the playback pipeline and the source element we use to bridge the WebCore AudioBus with the playback pipeline. On my side, I finished the AudioSourceProvider patch that had been abandoned for a few months (years) in Bugzilla. It’s an interesting feature to have so that web apps can use the WebAudio API with raw audio coming from Media elements.

I also hacked on GstGL support for video rendering. It’s quite interesting to be able to share the GL context of WebKit with GStreamer! The patch is not ready for landing yet, but thanks to the reviews from Sebastian, Matthew Waters and Julien Isorce I’ll improve it and hopefully commit it soon to WebKit ToT.

Sebastian also worked on Media Source Extensions support. We had a very basic, non-working, backend that required… a rewrite, basically :) I hope we will have this reworked backend soon in trunk. Sebastian already has it working on Youtube!

The event was interesting in general, with discussions about rendering engines, rendering and JavaScript.

by Philippe Normand at December 15, 2014 04:51 PM

December 11, 2014

Javier Fernández

Web Engines Hackfest 2014

An awesome week is coming to an end, and I’d like to thank the sponsors for helping to make it possible for such an amazing group of hackers to work together to improve the web. We focused on some of the consolidated web engines but also on the most promising ones and, of course, hacked on them, producing a good number of patches.


The talks were great, and the breakout sessions about many hot topics for the future of web engines were even better.

Because of the work I’ve been doing lately, I was especially involved in the CSS Features session, which, among other things, complemented the talk Sergio Villar gave about the Grid Layout implementation we are doing at Igalia in collaboration with Bloomberg. I also introduced the work I’ve been doing on the implementation of the Box Alignment specification in both the Blink and WebKit web engines; we evaluated how it would impact other layout models, like MathML, CSS Regions and CSS Flexible Box, to ease the logic of block and content alignment. We also discussed the future of CSS regarding new layout models, which is a bit uncertain; there is actually an interesting discussion inside the W3C about this topic, so we will see how it evolves. We talked about graphics and CSS, and the SVG specification (the 2.0 version is being defined), which will have an important role in the future, as I could personally notice during the last CSSConf EU in Berlin; it was also an important topic in other conferences throughout this year.

This week was a nice opportunity to discuss with other web core hackers the issues I’ve found in properly implementing the CSS Box Alignment specification in WebKit; see the discussion in the bugzilla for details. We concluded that it is not an easy problem and that it should be discussed on the mailing list, as it would imply assuming a performance overhead in CSS parsing. The ‘auto’ value defined by the spec for all the Box Alignment properties, to be resolved during the cascade depending on the type of element, affects the current implementation of Flexible Box and MathML, so we will have to find a solution.

I also produced a bunch of patches for WebKit to improve the way we were managing margins and float elements, properly identifying the elements creating a new formatting context and applying some refactoring to make the code clearer; these improvements fixed several issues in Grid Layout as well. Besides, borders, margins and paddings were finally adapted in Grid Layout to the different writing modes, a patch I had been working on for some weeks already and had the opportunity to complete during this hackfest.

I think that’s all for now; hope to see you all at the next Web Engines Hackfest in 2015.

sponsors

 

by jfernandez at December 11, 2014 10:10 AM

December 09, 2014

Manuel Rego

Web Engines Hackfest 2014

The Hackfest

Web Engines Hackfest 2014 (picture by Adrián Pérez)

Today we’re finishing the first edition of the Web Engines Hackfest. This is like the continuation of the “WebKitGTK+ Hackfest” that has been organized by Igalia since 2009.

Checking some numbers: we were 30 people working together for 4 full days (even nights in some cases). On top of that, there were 6 nice talks and several interesting discussions around the open web platform.

As usual the hackfest started with an initial brainstorming where everybody introduced themselves and their plans for the following days. Martin Robinson led the discussion and identified the main topics, that were later debated in detail on the breakout sessions.

The Talks

Despite being an un-conference format event, this year we arranged a few nice talks about different topics (you can find the slides on the hackfest wiki page):

  • Raspberry Pi Browser by Gustavo Noronha & ChangSeok Oh: They talked about the work done to have a full browser working on a low-specs device like the RPi. They used WebKitGTK+, but not WebKit2 yet, and they needed some cool tricks to work around the hardware limitations.
  • CSS Grid Layout by Sergio Villar: Sergio explained the work we’ve been doing around CSS Grid Layout on Blink and WebKit. Apart from explaining some of the nice things of the feature, he revealed our plans to ship it on Chrome during the first quarter of 2015.
  • State of JS Implementations by Andy Wingo: As usual, a great talk by Wingo summarizing the evolution of JavaScript during the last years in the open source engines (JSC, SpiderMonkey and V8) and what we can expect in the future. Next year will bring further optimizations and a faster web for users.

TyGL presentation by Zoltán Herczeg (picture by Adrián Pérez)

  • GPU Based 2D Rendering With TyGL by Zoltán Herczeg: A good introduction to the TyGL port, explaining all the internals and architecture, the different technical challenges, and the solutions provided by TyGL to face them.
  • WebKit for Wayland by Žan Doberšek: Žan presented a new WebKit port for the Wayland platform. The main use case for this port is to have just one full-screen web view (like in a set-top box). He explained the internal details and how the integration with Wayland is done.
  • A Short Introduction to Servo by Martin Robinson: Martin explained the new parallel browser engine Servo and the Rust programming language behind the scenes. He talked about the current status and the next steps; it should soon be in a dog-foodable state.

CSS Features breakout session (picture by Adrián Pérez)

My personal experience

From the personal point of view, I’ve been mostly focused on CSS stuff. We had a nice discussion regarding the new layout models (from multi-column, flexbox and grid to regions and shapes) and how long they take to become major features widely used on the web.

On the one hand, I’ve been carrying on with the regular work I’ve lately been doing on CSS Grid Layout. I found and fixed a crash in the absolutely positioned grid children support that landed last week. In addition, I managed to get a few more tests landed in the grid W3C test suite that we’re starting to create (thanks to Gérard Talbot for the reviews).

A Coruña from San Pedro (picture by Adrián Pérez)

On the other hand, I’ve been talking with Mihnea Ovidenie about CSS Regions and how CSS Grid Layout could help avoid the criticism regarding the empty HTML elements required to create regions.
Grid allows you to divide the webpage into areas directly from CSS; each area is not associated with an HTML element (a DOM node). So if regions were defined using those areas, it wouldn’t be necessary to create empty elements in the DOM tree.
Basically, this seems to match what is defined in the CSS Pagination Templates module that is being drafted by Alan Stearns.

Thanks

Finally, I’d like to thank all the people attending (this event wouldn’t be possible without all of you). We hope you enjoyed the hackfest and also your stay in A Coruña. See you next year!

Web Engines Hackfest 2014 sponsors: Adobe, Collabora and Igalia

Also big thanks to the sponsors Adobe, Collabora and Igalia for making such a great event possible.

December 09, 2014 11:00 PM

Žan Doberšek

Announcing WebKit for Wayland

Here at Igalia we are happy to announce the Wayland port of WebKit.

This port avoids using traditional GUI toolkits in favor of directly operating with the Wayland display protocol. Leveraging the WebKit2 multi-process architecture, the UIProcess is implemented as a shared library and loaded by the Wayland compositor, enabling the WebProcess to act as a direct client of the compositor while still being controlled by the UIProcess.

EGL, the Wayland EGL platform, and OpenGL ES are used for hardware-accelerated compositing of the rendered Web content. GLib, Libsoup and Cairo are used under the hood.

The port serves as a good base for building systems and environments that are mostly or completely relying on the Web platform technologies for building the desired interface.

Overall the port is still in its early days, with some basic functionality (e.g. functional keyboard and mouse input support) and many other Web platform features still not supported. But with Wayland EGL support constantly growing in graphics drivers for different GPUs, it can already be tested on devices like the Raspberry Pi or the Jetson TK1 development board.

In terms of supported Wayland compositors, for the moment we only support Weston (the reference Wayland compositor implementation), which is also used for development purposes. It’s also used for running the layout tests by again pushing WebKitTestRunner functionality into a shared library, though all that is still in very early stages.

The code is available on GitHub. There are also short instructions for building the dependencies and the port, and how to run it.

There are also additional repositories there (for Cairo and Weston) containing changes that haven’t yet been pushed upstream. In the following days we’ll also be providing Buildroot configurations that can be used for cross-compiling the whole software stack for the different supported hardware.

We look forward to continuing to evolve this work, enabling further features, improving performance on the software side and adding support for additional devices.

 

by Žan Doberšek at December 09, 2014 03:31 PM

Andy Wingo

state of js implementations, 2014 edition

I gave a short talk about the state of JavaScript implementations this year at the Web Engines Hackfest.


29 minutes, vorbis or mp3; slides (PDF)

The talk goes over a bit of the history of JS implementations, with a focus on performance and architecture. It then moves on to talk about what happened in 2014 and some ideas about where 2015 might be going. Have a look if that's a thing you are into. Thanks to Adobe, Collabora, and Igalia for sponsoring the event.

by Andy Wingo at December 09, 2014 10:29 AM

December 04, 2014

Gwang Yoon Hwang

Introducing Threaded Compositor

Since CSS animations appeared, the graphics performance of WebKit has become an important issue, especially in the embedded systems world. Many users (including me) want fancy and stunning user interfaces, even on small handset devices with very restricted CPUs and GPUs.

Luckily, I have been able to work over the years on improving the graphics performance of WebKit. And with the support of colleagues at Igalia, I have just brought the Threaded Compositor for WebKitGTK+ to a review-ready state.

The Threaded Compositor focuses on improving the performance of WebKitGTK+ for CSS animations by using a dedicated thread for compositing. But it’s not only about CSS animations: it also provides a way to improve the performance of scrolling, scaling, and the rendering of canvas and video elements.

Before diving into the details, it would be good to watch a video introducing the Threaded Compositor.

Click here if a video does not appear above this text.

The example used in this video can be accessed from here. I modified the original version to run automatically for benchmarking purposes.

Also, the current implementation of the Threaded Compositor is available on GitHub. It is based on WebKit r176538.

Motivation

The main-thread is where everything gets executed, including layout, JavaScript execution, and other heavy operations. Thus, running accelerated compositing in the main-thread severely limits responsiveness and rendering performance. By having a separate thread for compositing, we can bring a significant performance improvement in scrolling, zooming, and rendering.

Currently, several ports have already implemented a way to perform compositing off the main-thread. Coordinated Graphics, which is used by Qt and EFL, runs accelerated compositing in the UI Process, and Chromium has implemented a Compositor Thread that runs off the main render thread.

The Threaded Compositor is a threaded model of Coordinated Graphics. Unlike Coordinated Graphics, it composites contents in a dedicated thread of the Web Process and doesn’t use complicated inter-process shareable textures. Because of that, it is much easier to implement other multi-threaded rendering techniques, which are covered in Compositing the Media and the Canvas.

Off-the-main-thread Animation

Compositing contents using OpenGL[ES] is a basic technique to accelerate CSS animations. However, even when using GPU-accelerated compositing, we can face a V-sync pitfall (this happens more frequently in embedded systems). Most GPUs use V-sync (vertical synchronization) to prevent the screen tearing problem. It may block OpenGL’s SwapBuffer call for up to 16 ms, if the monitor’s refresh rate is 60 Hz, until the next vblank interrupt.

As you can see in the diagram below, this problem can waste the main-thread’s CPU time.

A diagram to visualize rendering pipeline of the current WebKitGTK+.

Again, when the main thread executes OpenGL[ES]’s SwapBuffer call, the driver has to wait for the V-sync interrupt from the DRM/KMS module. The above diagram shows the worst-case scenario: the main-thread could render 6 frames in the given time if there were no V-sync problem, but with V-sync it renders just 4 frames. This can happen even more frequently in embedded environments, which have less powerful CPUs and GPUs.

A diagram to visualize rendering pipeline of the Threaded Compositor.

To overcome this problem, we use the very common technique of using a different thread to update the screen. As visualized in the above diagram, the compositing thread absorbs all the overhead from the V-sync problem. The main-thread can do other work while the compositing thread handles the OpenGL calls. (Purple rectangles in the main-thread indicate inter-thread overheads.)

Moreover, the Threaded Compositor also accelerates CSS animations.

Because all of the layer properties (including GraphicsLayerAnimation) are passed to the compositing thread, CSS animations can run on the compositing thread without interrupting the main-thread. When CoordinatedGraphicsScene finishes rendering a scene, it checks whether there are any running animations in its layer tree. If there are, it sets a timer to render the scene again itself.

The attached sequence diagram can be helpful to understand this idea.

A sequence diagram of the updating animation in the Threaded Compositor

Scrolling and Scaling

Scrolling and scaling are also expensive jobs, especially on embedded devices.

When we scale a web page, WebCore needs to update the whole layout of the web page. As we all know, that is a really expensive operation on embedded platforms.

A simplified diagram to visualize scaling procedure of the current WebKitGTK+.

Let’s assume that a user tries to scale a web page using a two-finger gesture. WebCore in the Web Process tries to produce the scaled web page with an exact layout as soon as possible. However, on an embedded device, this can take more than 100 ms, i.e. less than 10 fps.

The Threaded Compositor renders the web page at the requested scale using its cached layer tree. That way, we can give immediate visual feedback to the user, and the re-laid-out web page will be shown later.

A simplified diagram to visualize scaling procedure of the Threaded Compositor.

Scrolling is similar. Scrolling the web page doesn’t need a huge re-layout operation, but it does need re-painting and blitting operations to update the page.

A simplified diagram to visualize scrolling procedure of the current WebKitGTK+.

Because the Threaded Compositor uses TILED_BACKING_STORE, it can render the scrolled web page from cached tiles with minimal latency. While the ThreadedCompositor renders the web page, the main-thread scrolls the "actual" view. Whenever the main-thread finishes the "actual" scroll, the changes are collected by CompositingCoordinator and passed to the ThreadedCompositor to update the view.

A simplified diagram to visualize scrolling procedure of the Threaded Compositor.

Unfortunately, these optimizations are only supported when using a fixed layout. To support them without fixed layout, we need to refactor TILED_BACKING_STORE in WebKit and implement proper overlay scrollbars in WebKitGTK+.

Compositing the Media and the Canvas

Not only CSS3 animations and scaling but also video and canvas can be accelerated by the Threaded Compositor. Because this is a little bit out of the scope of this post (it is not implemented for upstreaming yet), I’ll only describe it briefly here.

In the TextureMapper, Media (plugins, the video element, …) and Canvas (WebGL, 2D canvas) can be rendered by implementing TextureMapperPlatformLayer. The most important interface of the TextureMapperPlatformLayer is TextureMapperPlatformLayerClient::setPlatformLayerNeedsDisplay, which requests the compositor to composite its contents.

If the actual renderer of a Media or Canvas element runs off-the-main-thread, it is possible to bypass the main-thread entirely. The renderer can call TextureMapperPlatformLayerClient::setPlatformLayerNeedsDisplay when it updates its result from its own thread, and the compositing thread will composite the result without using the main-thread.

A diagram to visualize rendering pipeline of for Canvas and Video.

Also, if the target platform supports streams of textures, we can avoid pipeline hazards when drawing video layers on modern mobile GPUs, which use tile-based rendering.

Performance

This performance comparison was presented at Linux.Conf.AU 2013. It is based on a pretty old implementation, but it shows a meaningful performance improvement compared to the plain TextureMapper method. I could not regenerate this result using my current laptop because it is too fast to create a stressed condition. I hope I can share an updated performance comparison using an embedded device soon.

  • Test Cases [*]

    • leaves n

      • Modified the famous accelerated compositing example, WebKit Leaves, to draw n leaves.
    • 3DCube n

      • n cubes rotating using CSS3 animation.
  • Test Environment

    • Prototype of Threaded Compositor Implementation based on r134883
    • WebKit2 Gtk+ r140386
    • Intel Core i5-2400 3.10Ghz, Geforce GTS450, Ubuntu 12.04 x86_64
Tests        ThreadedCompositor (fps)   WebKit2 Gtk+ (fps)   Improvement (%)
leaves500              58.96                  35.86                64.42
leaves1000             58.98                  25.88               127.90
leaves2000             32.98                  18.04                82.82
3DCube360              57.27                  32.09                78.47
3DCube640              33.64                  23.52                43.03

[*] These tests were made by Company 100.

Class Diagram

I made this diagram to help my own understanding during the implementation, but it would be good to share it to help others who are interested.

A class diagram of the Threaded Compositor
  • CompositingCoordinator

    The class that takes responsibility for managing compositing resources in the main-thread.

  • ThreadedCompositor

    This class has a dedicated thread for compositing. It owns the ViewportController and the CoordinatedGraphicsScene to render the scene on the actual surface.

  • ThreadedCoordinatedLayerTreeHost

    A concrete class implementing LayerTreeHost and CompositingCoordinator::Client for Coordinated Graphics. It owns a ThreadedCompositor in order to use the Threaded Compositor.

  • CoordinatedGraphicsScene

    A class which has a complete scene graph and rendering functionality. It synchronizes its own scene graph with the graphics layer tree in the compositing coordinator.

  • ThreadSafeCoordinatedSurface

    This class implements a surface using ImageBuffer as a backend to use in the Web Process.

  • TextureMapperLayer

    It has actual properties to render a layer.

  • ViewportController

    This class is responsible for handling the scale factor and the scrolling position in the compositing thread.

In the main-thread, all of the visual changes of each GraphicsLayer are passed to CompositingCoordinator via CoordinatedGraphicsLayer. And CompositingCoordinator collects state changes from the GraphicsLayer tree when it is requested to do so.

From this point, Threaded Compositor and Coordinated Graphics behaves differently.

In Coordinated Graphics, the collected state changes of the layer tree are encoded and passed to CoordinatedLayerTreeHostProxy in the UI Process. Then CoordinatedGraphicsScene applies these state changes to its own TextureMapperLayer tree and renders them to the frame buffer.

But in the Threaded Compositor, these states are passed to the CoordinatedGraphicsScene owned by the ThreadedCompositor. The ThreadedCompositor has its own RunLoop and thread; all of the actual compositing operations are executed in the dedicated compositing thread in the Web Process.

Current Status

Most of the refactoring for the Threaded Compositor on the Coordinated Graphics side was done last year.

However, upstreaming of the Threaded Compositor was delayed by various issues. I had to simplify the design (which was quite complicated at that time) and resolve various issues before starting the upstreaming process.

This year, WebKitGTK+ decided to deprecate WebKit1. Because of that, a much less complicated design became possible. Also, I fixed the Threaded Compositor so as not to break the current behavior of WebKitGTK+ without fixed layout.

I’m happy that I can now start the upstreaming process of this implementation.

Things To Do

  • Reduce memory use when updating textures

    • RenderObjects draw their contents into an UpdateAtlas to pass bitmaps to the compositing thread. However, the size of each UpdateAtlas is 4MB, which is a lot of memory on an embedded device. So far, I could not find a way to solve this without using a proprietary GPU driver.
  • Avoid pipeline hazards in mobile GPUs

    • Modern mobile GPUs use tile-based [deferred] rendering. These GPUs can render multiple frames (~3 frames) concurrently to overcome their low performance. This technique works well in the game industry, which mostly uses static textures.
    • In WebKit, most textures are generated dynamically. Whenever we update a texture using texSubImage2D, the GPU can stall because the texture may still be in use by a previous rendering call.
    • To avoid this problem, Chromium uses a producer/consumer model for textures.
    • We can create a wrapper for a texture which provides multiple buffering; in this way we can avoid the pipeline hazards. (A sketch of the idea follows after this list.)
  • Reduce the maintenance burden

    • We need to refactor Coordinated Graphics to share code with Apple's UI SIDE COMPOSITING code. Most of the messages are duplicated, so this can be done by factoring out the platform-specific parts.
  • Accelerate the performance of the 2D Canvas element and the Video element

  • Check the performance of the Threaded Compositor on embedded devices

    • I hope I can share results about this soon.
  • Most of all, upstreaming!
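As a rough illustration of the multiple-buffering idea mentioned in the list above, here is a sketch in plain OpenGL (not WebKit code): uploads rotate across a small ring of textures, so texSubImage2D never writes to a texture the GPU may still be reading from.

#include <array>
#include <cstddef>
#include <GL/gl.h> // assumes desktop GL headers; adjust for GLES

class MultiBufferedTexture {
public:
    MultiBufferedTexture(GLsizei width, GLsizei height) {
        glGenTextures(static_cast<GLsizei>(textures_.size()), textures_.data());
        for (GLuint texture : textures_) {
            glBindTexture(GL_TEXTURE_2D, texture);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        }
    }

    // Producer: upload new pixels into the next texture of the ring,
    // leaving the one the GPU may still be sampling from untouched.
    void update(GLint x, GLint y, GLsizei w, GLsizei h, const void* pixels) {
        current_ = (current_ + 1) % textures_.size();
        glBindTexture(GL_TEXTURE_2D, textures_[current_]);
        glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, w, h,
                        GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    }

    // Consumer: the compositor samples from the most recent upload.
    GLuint textureForDrawing() const { return textures_[current_]; }

private:
    std::array<GLuint, 3> textures_ {}; // ~3 frames can be in flight
    std::size_t current_ = 0;
};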

by Gwang Yoon Hwang at December 04, 2014 01:20 AM

Adrián Pérez

Inception! JS-in-JS!

Apparently this November has been the gloomiest in a while, and that certainly may have slowed down my on-going quest to bring arrow functions to V8. Though arrow functions deserve a write-up themselves, my musings today are about a side quest that started in September, when I had the chance to visit enchanting Berlin one more time, this time as a speaker at JSConf EU.

So what does a guy hacking on JavaScript engines talk about in a conference which is not a lair of compiler neckbeards? Of course it had to be something related to the implementation of JavaScript engines, but it should also have something to do with what can be done with JavaScript. It turns out that V8 has large-ish parts implemented in a subset of JavaScript (and SpiderMonkey; and JavaScriptCore, a bit too!). Enter “Inception: JavaScript-in-JavaScript — where language features are implemented in the language itself (any similarity with the movie is a mere coincidence)”.

Intermission

Now that the video of my talk at JSConf EU is online, you may want to go watch it now (optionally, with the slides in another window) and come back afterwards to continue reading here.

Did you see the video of the talk? Good, let's recap a bit before continuing.

Engines use JavaScript, too

I made some quick numbers (with CLOC) to give you an idea of how much major engines are using JavaScript themselves. This is the number of thousands of lines of code (kLOC) of the engines themselves, how much of it is JavaScript, and its percentage over the total (excluding the test suites):

Engine           Total    JS      %
JavaScriptCore     269     1    0.3
SpiderMonkey       457    18    3.9
V8                 532    23    4.3

Both V8 and SpiderMonkey have a good deal of the runtime library implemented in JavaScript, relying on the JIT to produce machine code that provides good performance. JavaScriptCore might have an edge in cases where it is not possible to use the JIT, and its C++ implementation can be faster when using the interpreter (iOS, anyone?). In the case of V8 the choice of going JavaScript is clear, because it always compiles to machine code.

Most of the features introduced in the runtime library by the ES6 specification can be implemented without needing to touch low-level C++ code. Or do we?

Live-coding %TypedArray%.prototype.forEach()

For my talk in Berlin I wanted to show how accessible it is to use JavaScript itself as an entry point to the guts of V8. What would be better than picking something new from the spec, and live-coding it, let's say the .forEach() method for typed arrays? Because of course, coding while delivering a talk cannot ever go bad.

Live-coding, what could go wrong?

As a first approach, let's write down an initial version of .forEach(), in the very same way as we would if we were writing a polyfill:

function TypedArrayForEach(cb, thisArg) {
    if (thisArg === undefined)
        thisArg = this;
    for (var i = 0; i < this.length; i++)
        cb.call(thisArg, this[i], i, this);
}

But, in which file exactly should this go, and how is it made available in the runtime library? It turns out that there is a bunch of .js files under src/, one of them conspicuously named typedarray.js, prompting us to go ahead and write the above function in the file, plus the following initialization code inside its —also aptly named— SetupTypedArrays() function, right at the end of the macro definition:

function SetupTypedArrays() {
macro SETUP_TYPED_ARRAY(ARRAY_ID, NAME, ELEMENT_SIZE)
    // ...

    // Monkey-patch .forEach() into the prototype
    global.NAME.prototype.forEach = TypedArrayForEach;
endmacro

TYPED_ARRAYS(SETUP_TYPED_ARRAY)
}

Wait! What, macros? Yes, my friends, but before crying out in despair, rebuild V8 and run the freshly built d8 passing --harmony-arrays:

% ./out/x64.debug/d8 --harmony-arrays
V8 version 3.31.0 (candidate) [console: readline]
d8> Int8Array.prototype.forEach
function forEach() { [native code] }
d8>

It is alive!

Standards are hard

At this point we have a working polyfill-style implementation patched in the prototypes of typed arrays, and it works. Yet, it does not quite work as it should. To begin with, the function should throw TypeError if called on an object that is not a typed array, or if the callback is not callable. The spec also mentions that the .length property of the function should be 1 (that is: it is declared as having a single argument), and so on.

You see, formally specifying a programming language is a complex task because all those iffy details have to be figured out. Not to forget about the interactions with other parts of the language, or the existing code bases. In the particular case of JavaScript the backwards compatibility adds a good deal of complexity to the formal semantics, and wading through the hundreds of pages of the ES6 specification can be a daunting task. We may feel compelled to define things differently.

How standards are born, courtesy of XKCD.

Personally, I would eradicate one of the modes, keep just sloppy or strict, but not both. Only that then it would not be JavaScript anymore. Think about all that legacy code that lurks on the depths of the Internet, wouldn't it be nice that it continues to work?

That leads to yet another subtlety: the value passed as thisArg (available as this in the callback function) is expected to be wrapped into an object only for sloppy mode functions — but not for strict mode ones, because strict mode is, well, stricter. While .call() handles this, it is not particularly fast (more on this later), so we may end up needing a way to know which functions are in sloppy mode, do the wrapping ourselves (when needed) and arrange the calls in a more efficient way. We have to assume that there will be code in the wild relying on this behavior. If only we knew which functions are sloppy!

A better solution

As a matter of fact, the JavaScript engines already know which functions are strict and which ones not. We just cannot get access to that information directly. Fortunately V8 provides a way in which JavaScript code can invoke C++ runtime functions: the functions declared using the RUNTIME_FUNCTION() macro can be invoked from JavaScript, by prefixing their names with %, and if you take a peek into src/runtime/, there are plenty.

To begin with, there is %_CallFunction(), which is used pervasively in the implementation of the runtime library instead of .call(). Using it is an order of magnitude faster, so we will be keeping it in our utility belt. Then, there is %IsSloppyModeFunction(), which can be used to find out the “sloppiness” of our callback functions.

But wait! Do you remember that I mentioned macros above? Take a look at src/macros.py, go, now. Did you see something else interesting in there? Plenty of utility macros, used in the implementation of the runtime library, live there. We can use SHOULD_CREATE_WRAPPER() instead. Also, there are a number of utility functions in src/runtime.js, most annotated with sections of the EcmaScript specification because they do implement algorithms as written in the spec. Using those makes it easier to make sure our functions follow all the intricacies of the spec.

Let's give them a try:

// Do not add a "thisArg" parameter, to make
// TypedArrayForEach.length==1 as per the spec.
function TypedArrayForEach(cb /* thisArg */) {
    // IsTypedArray(), from src/runtime/runtime-typedarray.cc
    // MakeTypeError(), from src/messages.js
    if (!%IsTypedArray(this))
        throw MakeTypeError('not_typed_array', []);

    // IS_SPEC_FUNCTION(), from src/macros.py
    if (!IS_SPEC_FUNCTION(cb))
        throw MakeTypeError('called_non_callable', [cb]);

    var wrap = false;
    var thisArg = this;
    if (arguments.length > 1) {
        thisArg = arguments[1];
        // SHOULD_CREATE_WRAPPER(), from src/macros.py
        wrap = SHOULD_CREATE_WRAPPER(cb, thisArg);
    }

    for (var i = 0; i < this.length; i++) {
        // ToObject(), from src/runtime.js
        %_CallFunction(wrap ? ToObject(thisArg) : thisArg,
                       this[i], i, this, cb);
    }
}

Now that is much closer to the version that went into V8 (which has better debugging support on top), and it covers all the corner cases in the spec. Rejoice!

When in Rome...

Needless to say, there is no documentation for most of the helper functions used to implement the runtime library. The best approach is to look around and see how functions similar to the one we want to implement are done, while keeping a copy of the spec around for reference.

Do as the tigers do.

Those are a couple of things one can find by scouring the existing code of the runtime library. First: for typed arrays, the macro in SetupTypedArrays() installs a copy of the functions for each typed array kind, effectively duplicating the code. While this may seem gratuitous, each copy of the function is very likely to be used only on typed arrays whose elements are of a certain type. This helps the compiler to generate better code for each of the types.

Second: Runtime functions with the %_ prefix in their names (like %_CallFunction()) do not have their C++ counterpart in src/runtime/. Those are inline runtime functions, for which the code is emitted directly by the code generation phase of the compiler. They are real fast, use them whenever possible!

Next

After coming back from Berlin, I have been implementing the new typed array methods, and the plan is to continue tackling them one at a time.

As mentioned, %TypedArray%.prototype.forEach() is already in V8 (remember to use the --harmony-arrays flag), and so is %TypedArray%.of(), which I think is very handy to create typed arrays without needing to manually assign to each index:

var pow2s = Uint8Array.of(1, 2, 4, 8, 16, 32, 64, 128);

Apart from my occasional announcements (and rants) on Twitter, you can also follow the overall progress by subscribing to issue #3578 in the V8 bug tracker.

Last but not least, a mention to the folks at Bloomberg, who are sponsoring our ongoing work implementing ES6 features. It is great that they are able to invest in improving the web platform not just for themselves, but for everybody else as well. Thanks!

by aperez (adrian@perezdecastro.org) at December 04, 2014 12:00 AM

December 02, 2014

Andy Wingo

there are no good constant-time data structures

Imagine you have a web site that people can access via a password. No user name, just a password. There are a number of valid passwords for your service. Determining whether a password is in that set is security-sensitive: if a user has a valid password then they get access to some secret information; otherwise the site emits a 404. How do you determine whether a password is valid?

The go-to solution for this kind of problem for most programmers is a hash table. A hash table is a set of key-value associations, and its nice property is that looking up a value for a key is quick, because it doesn't have to check against each mapping in the set.

Hash tables are commonly implemented as an array of buckets, where each bucket holds a chain. If the bucket array is 32 elements long, for example, then keys whose hash is H are looked for in bucket H mod 32. The chain contains the key-value pairs in a linked list. Looking up a key traverses the list to find the first pair whose key equals the given key; if no pair matches, then the lookup fails.

Unfortunately, storing passwords in a normal hash table is not a great idea. The problem isn't so much in the hash function (the hash in H = hash(K)) as in the equality function; usually the equality function doesn't run in constant time. Attackers can detect differences in response times according to when the "not-equal" decision is made, and use that to break your passwords.
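For reference, a constant-time equality check is short; this C++ sketch accumulates differences over the entire (equal-length) inputs instead of returning at the first mismatching byte the way memcmp may, so the running time does not depend on where the inputs differ.

#include <cstddef>
#include <cstdint>

// Returns true when a and b (both of length n) are equal. The loop always
// runs to completion, so timing does not reveal the first mismatch position.
bool constant_time_equal(const std::uint8_t* a, const std::uint8_t* b,
                         std::size_t n) {
    std::uint8_t diff = 0;
    for (std::size_t i = 0; i < n; i++)
        diff |= a[i] ^ b[i];
    return diff == 0;
}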

Edit: Some people are getting confused by my use of the term "password". Really I meant something more like "secret token", for example a session identifier in a cookie. I thought using the word "password" would be a useful simplification but it also adds historical baggage of password quality, key derivation functions, value of passwords as an attack target for reuse on other sites, etc. Mea culpa.

So let's say you ensure that your hash table uses a constant-time string comparator, to protect against the hackers. You're safe! Or not! Because not all chains have the same length, "interested parties" can use lookup timings to distinguish chain lookups that take 2 comparisons compared to 1, for example. In general they will be able to determine the percentage of buckets for each chain length, and given the granularity will probably be able to determine the number of buckets as well (if that's not a secret).

Well, as we all know, small timing differences still leak sensitive information and can lead to complete compromise. So we look for a data structure that takes the same number of algorithmic steps to look up a value. For example, bisection over a sorted array of size SIZE will take ceil(log2(SIZE)) steps to find the value, independent of what the key is and also independent of what is in the set. At each step, we compare the key and a "mid-point" value to see which is bigger, and recurse on one of the halves.

One problem is, I don't know of a nice constant-time comparison algorithm for (say) 160-bit values. (The "passwords" I am thinking of are randomly generated by the server, and can be as long as I want them to be.) I would appreciate any pointers to such a constant-time less-than algorithm. However a bigger problem is that the time it takes to access memory is not constant; accessing element 0 of the sorted array might take more or less time than accessing element 10. In algorithms we typically model access on a more abstract level, but in hardware there's a complicated parallel and concurrent protocol of low-level memory that takes a non-deterministic time for any given access. "Hot" (more recently accessed) memory is faster to read than "cold" memory.

Non-deterministic memory access leaks timing information, and in the case of binary search the result is disaster: the attacker can literally bisect the actual values of all of the passwords in your set, by observing timing differences. The worst!

You could get around this by ordering "passwords" not by their actual values but by their cryptographic hashes (e.g. by their SHA256 values). This would force the attacker to bisect not over the space of password values but of the space of hash values, which would protect actual password values from the attacker. You still leak some timing information about which paths are "hot" and which are "cold", but you don't expose actual passwords.

It turns out that, as far as I am aware, it is impossible to design a key-value map on common hardware that runs in constant time and is sublinear in the number of entries in the map. As Zooko put it, running in constant time means that the best case and the worst case run in the same amount of time. Of course this is false for bucket-and-chain hash tables, but it's false for binary search as well, as "hot" memory access is faster than "cold" access. The only plausible constant-time operation on a data structure would visit each element of the set in the same order each time. All constant-time operations on data structures are linear in the size of the data structure. Thems the breaks! All you can do is account for the leak in your models, as we did above when ordering values by their hash and not their normal sort order.

Once you have resigned yourself to leaking some bits of the password via timing, you would be fine using normal hash tables as well -- just use a cryptographic hashing function and a constant-time equality function and you're good. No constant-time less-than operator need be invented. You leak something on the order of log2(COUNT) bits via timing, where COUNT is the number of passwords, but since that's behind a hash you can't use it to bisect on actual key values. Of course, you have to ensure that the hash table isn't storing values in sorted order and short-cutting early. This sort of detail isn't usually part of the contract of stock hash table implementations, so you probably still need to build your own.
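A sketch of that arrangement in C++, assuming a sha256() routine from whatever crypto library is at hand (it is only declared here): the table is keyed by digests rather than by the secrets themselves, so lookup timing can only leak information about hash values.

#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>

using Digest = std::array<std::uint8_t, 32>;

// Assumed to be provided by a crypto library; not implemented here.
Digest sha256(const std::string& data);

struct DigestHash {
    std::size_t operator()(const Digest& d) const {
        // Fine for bucket placement: the input is already uniformly random.
        std::size_t h = 0;
        for (std::uint8_t byte : d)
            h = h * 131 + byte;
        return h;
    }
};

bool is_valid_token(const std::unordered_set<Digest, DigestHash>& valid,
                    const std::string& token) {
    // Digest equality may short-cut early, but that only leaks bits of the
    // hashes, not of the secret tokens themselves.
    return valid.find(sha256(token)) != valid.end();
}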

Edit: People keep mentioning Cuckoo hashing for some reason, despite the fact that it's not a good open-hashing technique in general (Robin Hood hashes with linear probing are better). Thing is, any operation on a data structure that does not touch all of the memory in the data structure in exactly the same order regardless of input leaks cache timing information. That's the whole point of this article!

An alternative is to encode your data structure differently, for example for the "key" to itself contain the value, signed by some private key only known to the server. But this approach is limited by network capacity and the appropriateness of copying for the data in question. It's not appropriate for photos, for example, as they are just too big.

Edit: Forcing constant-time on the data structure via sleep() or similar calls is not a good mitigation. This either massively slows down your throughput, or leaks information via side channels. Remote attackers can measure throughput instead of latency to determine how long an operation takes.

Corrections appreciated from my knowledgeable readers :) I was quite disappointed when I realized that there were no good constant-time data structures and would be happy to be proven wrong. Thanks to Darius Bacon, Zooko Wilcox-O'Hearn, Jan Lehnardt, and Paul Khuong on Twitter for their insights; all mistakes are mine.

by Andy Wingo at December 02, 2014 10:01 PM

November 27, 2014

Andy Wingo

scheme workshop 2014

I just got back from the US, and after sleeping for 14 hours straight I'm in a position to type about stuff again. So welcome back to the solipsism, France and internet! It is good to see you on a properly-sized monitor again.

I had the enormously pleasurable and flattering experience of being invited to keynote this year's Scheme Workshop last week in DC. Thanks to John Clements, Jason Hemann, and the rest of the committee for making it a lovely experience.

My talk was on what Scheme can learn from JavaScript, informed by my work in JS implementations over the past few years; you can download the slides as a PDF. I managed to record audio, so here goes nothing:


55 minutes, vorbis or mp3

It helps to follow along with the slides. Some day I'll augment my slide-rendering stuff to synchronize a sequence of SVGs with audio, but not today :)

The invitation to speak meant a lot to me, for complicated reasons. See, Scheme was born out of academic research labs, and to a large extent that's been its spiritual core for the last 40 years. My way to the temple was as a money-changer, though. While working as a teacher in northern Namibia in the early 2000s, fleeing my degree in nuclear engineering, trying to figure out some future life for myself, for some reason I was recording all of my expenses in Gnucash. Like, all of them, petty cash and all. 50 cents for a fat-cake, that kind of thing.

I got to thinking "you know, I bet I don't spend any money on Tuesdays." See, there was nothing really to spend money on in the village besides fat cakes and boiled eggs, and I didn't go into town to buy things except on weekends or later in the week. So I thought that it would be neat to represent that as a chart. Gnucash didn't have such a chart but I knew that they were implemented in Guile, as part of this wave of Scheme consciousness that swept the GNU project in the nineties, and that I should in theory be able to write it myself.

Problem was, I also didn't have internet in the village, at least then, and I didn't know Scheme and I didn't really know Gnucash. I think what I ended up doing was just monkey-typing out something that looked like the rest of the code, getting terrible errors but hey, it eventually worked. I submitted the code, many years ago now, some of the worst code you'll read today, but they did end up incorporating it into Gnucash and to my knowledge that report is still there.

I got more into programming, but still through the back door, so to speak. I had done some free software work before going to Namibia, on GStreamer, and wanted to build a programmable modular synthesizer with it. I read about Supercollider, and decided I wanted to do something like that but with the "unit generators" defined in GStreamer and orchestrated with Scheme. If I knew then that Scheme could be fast, I probably would have started on an entirely different course of things, but that did at least result in gainful employment doing unrelated GStreamer things, if not a synthesizer.

Scheme became my dominant language for writing programs. It was fun, and the need to re-implement a bunch of things wasn't a barrier at all -- rather a fun challenge. After a while, though, speed was becoming a problem. It became apparent that the only way to speed up Guile would be to replace its AST interpreter with a compiler. Thing is, I didn't know how to write one! Fortunately there was previous work by Keisuke Nishida, jetsam from the nineties wave of Scheme consciousness. I read and read that code, mechanically monkey-typed it into compilation, and slowly reworked it into Guile itself. In the end, around 2009, Guile was faster and I found myself its co-maintainer to boot.

Scheme has been a back door for me for work, too. I randomly met Kwindla Hultman-Kramer in Namibia, and we found Scheme to be a common interest. Some four or five years later I ended up working for him with the great folks at Oblong. As my interest in compilers grew, and it grew as I learned more about Scheme, I wanted something closer there, and that's what I've been doing in Igalia for the last few years. My first contact there was a former Common Lisp person, and since then many contacts I've had in the JS implementation world have been former Schemers.

So it was a delight when the invitation came to speak (keynote, no less!) at the Scheme Workshop, behind the altar instead of in the foyer.

I think it's clear by now that Scheme as a language and a community isn't moving as fast now as it was in 2000 or even 2005. That's good because it reflects a certain maturity, and makes the lore of the tribe easier to digest, but bad in that people tend to ossify and focus on past achievements rather than future possibility. Ehud Lamm quoted Nietzsche earlier today on Twitter:

By searching out origins, one becomes a crab. The historian looks backward; eventually he also believes backward.

So it is with Scheme and Schemers, to an extent. I hope my talk at the conference inspires some young Schemer to make an adaptively optimized Scheme, or to solve the self-hosted adaptive optimization problem. Anyway, as users I think we should end the era of contorting our code to please compilers. Of course some discretion in this area is always necessary but there's little excuse for actively bad code.

Happy hacking with Scheme, and au revoir!

by Andy Wingo at November 27, 2014 05:48 PM

November 24, 2014

Samuel Iglesias

piglit (II): How to launch a tailored piglit run

In my last post, I gave an introduction to piglit, an open-source test suite for OpenGL implementations. There, I explained how to compile the source code, how to run the full piglit test suite and how to analyze the results.

However, you can tailor a piglit run to execute only the specific tests that match your needs. For example, I might want to check whether a specific subset of tests passes because I am testing a new feature in Mesa as part of my job at Igalia.

Configure the piglit run

There are several parameters that configure the piglit run to match our needs:

  • --dry-run: do not execute the tests. Very useful to check whether the parameters you give are doing what you expect.
  • --valgrind: it runs Valgrind memcheck on each test program it is going to execute. If it finds any error, the test fails and the Valgrind output is saved into the results file.
  • --all-concurrent: run all tests concurrently.
  • --no-concurrency: disable concurrent test runs.
  • --sync: sync results to disk after every test so you don't lose information if something bad happens to your system.
  • --platform {glx,x11_egl,wayland,gbm,mixed_glx_egl}: if you compiled waffle with a different window system, this is the name of the one passed to waffle.

There is a help parameter if you want further information:

$ ./piglit run -h

Skip tests

Sometimes you prefer to skip some tests because they are causing a GPU hang or taking a lot of time and their output is not interesting for you. The way to skip a test is using the -x parameter:

$ ./piglit run tests/all -x texture-rg results/all-reference-except-texture-rg

Run specific tests

You can run some specific subset of tests using a filter by test name. Remember that it’s filtering by test name but not by functionality, so you might miss some test programs that check the same functionality.

$ ./piglit run tests/all -t color -t format results/color-format

Also, you can concatenate more parameters to add/remove more tests that are going to be executed.

$ ./piglit run tests/all -t color -t format -t tex -x texture-rg results/color-format-tex

Run standalone tests

There is another way to run standalone tests apart from using the -t argument to piglit run. You might be interested in it if you want to run GDB on a failing test to debug what is going on (an example follows below).

If we recall what the HTML output for a given test looks like:

piglit-test-details

The command field specifies the program name executed for this test together with all its arguments. Let me explain what they mean in this example.

  • Some binaries receive arguments that specify the data type to test (like, in this case, GL_RGB16_SNORM) or other data: number of samples, msaa, etc.
  • -fbo: it draws in an off-screen framebuffer.
  • -auto: it automatically runs the test and, when it finishes, it closes the window and prints whether the test failed or passed.

Occasionally, you might run the test program without the -fbo and -auto parameters because you want to see what it draws in the window, to better understand the bug you are debugging.
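For example, supposing your results page listed the (hypothetical) command bin/teximage-colors GL_RGB16_SNORM -auto -fbo, you could debug it from the piglit directory with something like:

$ gdb --args bin/teximage-colors GL_RGB16_SNORM -auto -fbo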

Create your own test profile

Besides explicitly adding/removing tests from tests/all.py (or other profiles) in the piglit run command, there is another way of running a specific subset of tests: profiles.

A test profile in piglit is a script written in Python that selects the tests to execute.

There are several profiles already defined in piglit, two of which we already saw in this post series: tests/sanity.tests and tests/all. The first is useful to check whether piglit was correctly compiled together with its dependencies, while the second runs all piglit tests in one shot. Of course there are more profiles inside the tests/ directory: cl.py (OpenCL tests), es3conform.py (OpenGL ES 3.0 conformance tests), gpu.py, etc.

Eventually you will write your own profile because adding/removing tests in the console command is tiresome and prone to errors.

This is an example of a profile based on tests/gpu.py that I was recently using for testing the gallium llvmpipe driver.

# -*- coding: utf-8 -*-

# quick.tests minus compiler tests.

from tests.quick import profile
from framework.glsl_parser_test import GLSLParserTest

__all__ = ['profile']

# Remove all glsl_parser_tests, as they are compiler test
profile.filter_tests(lambda p, t: not isinstance(t, GLSLParserTest))

# Drop ARB_vertex_program/ARB_fragment_program compiler tests.
del profile.tests['spec']['ARB_vertex_program']
del profile.tests['spec']['ARB_fragment_program']
del profile.tests['asmparsertest']

# Too much time on gallium llvmpipe
profile.tests['spec']['!OpenGL 1.0'].pop('gl-1.0-front-invalidate-back', None)
profile.tests['spec']['!OpenGL 1.0'].pop('gl-1.0-swapbuffers-behavior', None)
profile.tests['spec']['!OpenGL 1.1'].pop('read-front', None)
profile.tests['spec']['!OpenGL 1.1'].pop('read-front clear-front-first', None)
profile.tests['spec']['!OpenGL 1.1'].pop('drawbuffer-modes', None)
profile.tests['spec']['EXT_framebuffer_blit'].pop('fbo-sys-blit', None)
profile.tests['spec']['EXT_framebuffer_blit'].pop('fbo-sys-sub-blit', None)

As you see, it picks the test lists from quick.py but with some changes: it drops all the tests related to a couple of OpenGL extensions (ARB_vertex_program and ARB_fragment_program) and it drops another seven tests because they took too much time when I was testing them with the gallium llvmpipe driver on my laptop.

I recommend spending some time playing with profiles, as they are a very powerful tool when tailoring a piglit run.

What else?

In some situations, you want to know which test produces kernel error messages in dmesg, or which test is being run at the moment. For both cases, piglit provides parameters for the run command:

$ ./piglit run tests/all -v --dmesg results/all-reference

  • Verbose (-v): prints a line of output for each test before and after it runs, so you can find which tests take longer or output errors without needing to wait until piglit finishes.
  • Dmesg (--dmesg): it saves the difference between the dmesg outputs before and after each test program is executed. Thanks to that, you can easily find which test produces kernel errors in the graphics device driver.

Wrapping up

After giving an introduction to piglit on my blog, this post explains how to configure a piglit run, change the list of tests to execute and run a standalone test. As you see, piglit is a very powerful tool that requires some time to learn to use appropriately.

In the next post I will talk about a more advanced topic: how to create your own tests.

by Samuel Iglesias at November 24, 2014 07:58 AM

November 19, 2014

Andrés Gómez

Switching between nouveau and the nVIDIA proprietary OpenGL driver in (Debian) GNU/Linux

So lately I’ve been devoting my time at Igalia to the GNU/Linux graphics stack, focusing, more specifically, on Mesa, the most popular open-source implementation of the OpenGL specification.

When working on Mesa and piglit, its testing suite, quite often you would like to compare the results obtained when running specific OpenGL code with one driver or another.

In the case of nVIDIA graphic cards, we have the chance of comparing nouveau, the default open source driver provided by Mesa, with the proprietary driver provided by nVIDIA. To install the nVIDIA driver you will have to run something like:

root$ apt-get install linux-headers nvidia-driver nvidia-kernel-dkms nvidia-xconfig

Changing from one driver to another involves several steps so I decided to create a dirty script for helping with this.

The actions done by this script are:

  1. Instruct your X Server to use the adequate X driver.

    These instructions apply to the X.org server only.

    When using the default nouveau driver in Debian, the X.org server is able to configure itself automatically. However, when using the nVIDIA driver you will most probably have to provide the proper settings to X.org.

    nVIDIA provides the package nvidia-xconfig. This package provides a tool of the same name that will generate a X.org configuration file suitable to work with the nVIDIA X driver:

root$ nvidia-xconfig 

WARNING: Unable to locate/open X configuration file.

Package xorg-server was not found in the pkg-config search path.
Perhaps you should add the directory containing `xorg-server.pc'
to the PKG_CONFIG_PATH environment variable
No package 'xorg-server' found
New X configuration file written to '/etc/X11/xorg.conf'

I have embedded this generated file into the provided custom script since it is suitable for my system:

echo 'Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
' > /etc/X11/xorg.conf

I would recommend you to substitute this with another configuration file generated with nvidia-xconfig on your system.

  2. Select the proper GLX library.

     Fortunately, Debian provides the alternatives mechanism to select between one or the other. Depending on the selected driver, the script sets the glx alternative to the corresponding path:

     ALTERNATIVE="/usr/lib/mesa-diverted"   # for nouveau
     ALTERNATIVE="/usr/lib/nvidia"          # for nvidia

     update-alternatives --set glx "${ALTERNATIVE}"

  3. Blacklist the module we don’t want the Linux kernel to load on start up.

     Again, in Debian, the nVIDIA driver package installs the file /etc/nvidia/nvidia-blacklists-nouveau.conf which is then linked from /etc/modprobe.d/nvidia-blacklists-nouveau.conf, instructing that the open source nouveau kernel driver for the graphic card should be avoided.

     When selecting nouveau, this script removes the soft link and creates a new file which, instead of blacklisting nouveau’s driver, does it for the nVIDIA proprietary one:

     rm -f /etc/modprobe.d/nvidia-blacklists-nouveau.conf
     echo "blacklist nvidia" > /etc/modprobe.d/nouveau-blacklists-nvidia.conf

     When selecting nVIDIA, the previous file is removed and the soft link is restored.

  4. Re-generate the image used in the initial booting.

     This will ensure that we are using the proper kernel driver from the very beginning of the system boot:

     update-initramfs -u

    With these actions you will be able to switch your running graphics driver.

    You will switch to nouveau with:

    root$ <path_to>/alternate-nouveau-nvidia.sh nouveau
    update-alternatives: using /usr/lib/mesa-diverted to provide /usr/lib/glx (glx) in manual mode
    update-initramfs: Generating /boot/initrd.img-3.17.0
    
    nouveau successfully set. Reboot your system to apply the changes ...

    And to the nVIDIA proprietary driver with:

    root$ <path_to>/alternate-nouveau-nvidia.sh nvidia
    update-alternatives: using /usr/lib/nvidia to provide /usr/lib/glx (glx) in manual mode
    update-initramfs: Generating /boot/initrd.img-3.17.0
    
    nvidia successfully set. Reboot your system to apply the changes ...

    It is recommended to reboot the system, although theoretically you could unload the kernel driver and restart the X.org server. The reason is that unloading the nVIDIA kernel driver and loading a different one has been reported not to always work correctly.

    I hope this will be helpful for your hacking time!

    by tanty at November 19, 2014 10:52 AM

    November 14, 2014

    Andy Wingo

    on yakshave, on color, on cosines, on glitchen

    Hold on to your butts, kids, because this is epic.

    on yaks

    As in all great epics, our prideful, stubborn hero starts in a perfectly acceptable state of things, decides on a lark to make a small excursion, and comes back much much later to inflict upon you pictures from his journey.

    So. I have a web photo gallery but I don't take many pictures these days. Dealing with photos is a bit of a drag, and the ways that are easier like Instagram or what-not give me the (peer, corporate, government: choose 3) surveillance hives. So, I had vague thoughts that I should update my web gallery. Yakpoint 1.

    At the same time, my web gallery was written for mod_python on the server, and I don't like hacking in Python any more and kinda wanted to switch away from Apache. Yakpoint 2.

    So I rewrote the server-side part in Scheme. (Yakpoint 3.) It worked fine but I found I needed the ability to get the dimensions of files on the server, so I wrote a quick-and-dirty JPEG parser. Yakpoint 4.

    I needed EXIF data as well, as the original version displayed EXIF data, and for that I used a binding to libexif that I had written a few years ago when I thought about starting this project (Yakpoint -1). However I found some crashers in the library, because it had never really been tested in production, and instead of fixing them I said "what the hell, I'll just write an EXIF parser". (Yakpoint 5.) So I did and adapted the web gallery to use it (Yakpoint 6, for the adaptation.)

    At this point, I looked back, and looked forward, and looked all around, and all was good, but what was with this uneasiness I was feeling? And indeed, I hadn't actually made anything better, and I wasn't taking more photos, and the workflow was the same.

    I was also concerned about the client side of things, which was still in Python and using some breakage-prone legacy libraries to do the photo scaling and transformations and what-not, and relied on a desktop application (f-spot) of dubious future. So I started to look at what it would take to port that script to Scheme (Yakpoint 7). Well it used some legacy libraries to copy files over SSH (gnome-vfs; switching away from that would be Yakpoint 8) and I didn't want to make a Scheme GIO binding (Yakpoint 9, narrowly avoided), and I then -- and then, dear reader -- so then I said "well WTF my caching story on the server is crap anyway, I never know when the sqlite database has changed or not so I never know what responses I can cache, what I really want is a functional datastore" (Yakpoint 10), which is what I have with Git and Tekuti (Yakpoint of yore), and so why not just store my photos in Git like I do in Tekuti for blog posts and serve them from there, indexing as needed? Of course I'd need some other server software (Yakpoint of fore, by which I meantersay the future), but then I could just git push to update my photo gallery, and I wouldn't have to endure the horror that is GVFS shelling out to ssh in a FUSE daemon (Yakpoint of ne'er).

    So. After mulling over these thoughts for a while I decided, during an autumnal walk on the Salève in which we had the greatest views of Mont Blanc everrrrr and yet where are the photos?, that really what I needed was new photo management software, not just a web gallery. I should be able to share photos from my phone or from my desktop, fix them up either place, tag and such, and OK woo hoo! Such is the future! And the present for many people? Thing is, I also needed good permissions management (Yakpoint what, 10 I guess?), because you know a dude just out of college is not the same as that dude many years later. Which means serving things over HTTPS (Yakpoints 11-47) in such a way that the app has some good control over who gets what.

    Well. Anyway. My mind ran ahead, and runs ahead, and yet we haven't actually tasted the awesome sauce yet. So! The photo management software, wherever it lives, needs to rotate photos at least, and scale them down to a few resolutions. I smell a yak! I looked at jpegtran which can do some lossless rotations but it's not available as a library, which is odd; and really I don't like shelling out for core program functionality, because every time I deal with the file system it's the wild west of concurrent mutation. If naming things is one of the two hardest problems in computer science, the file system is the worst because you have to give a global name to every intermediate value.

    At the same time to scale images, what was I to do? Make a binding to libjpeg? Well I started (Yakpoint 48) but for reals kids, libjpeg is not fun. It works great and is really clever but

    1. it's approximately impossible to use from a dynamic ffi; you want a compiler to verify that you are using the right structure definitions

    2. there has been an inane ABI and format break imposed by the official IJG libjpeg but which other implementations have not followed, but how could you know which one you are using?

    3. the error handling facility encourages longjmp in C programs; somewhat terrifying

    4. off-heap image manipulation libraries always interact poorly with GC, because the GC only sees the small pointer to the off-heap image, and so doesn't GC often enough

    5. I have zero guarantee that libjpeg won't change ABI in weird ways, and I don't want to touch this software for the next 10 years

    6. I want to do jpegtran-like lossless transformations, but that's not available as a library, and it's totes ridics that binding libjpeg does not help you out here

    7. it's still an unsafe C library, battle-tested yes, but terrifyingly unsafe, and I'd be putting it on my server and who knows?

    Friends, I arrived at the pasture, and I, I chose the yak less shaven. I took my lame JPEG parser and turned it into a full decoder (Yakpoint 49), realized it wasn't much more work to do an encoder (Yakpoint 50), and implemented the lossless transformations (Yakpoint 51).

    on haters

    Before we go on, I know some people would think "what is this kid about". I mean, custom gallery software, a custom JPEG library of all things, all bespoke, why don't you just use off-the-shelf solutions? Why aren't you normal and use a normal language and what about the best practices and where's your business case and I can't go on about this because there's a technical term for people that say this kind of thing and it's "hater".

    Thing is, when did a hater ever make anything cool? Come to think of it, when did a hater make anything at all? In my experience the most vocal haters have nothing behind their names except a long series of pseudonymous rants in other people's comment boxes. So friends, in the joyful spirit of earning-anew, let's talk about JPEG!

    on color

    JPEG is a funny thing. Photos are our lives and our memories, our first steps and our friends, and yet I for one didn't know very much about them. My mental model that "a JPEG is a rectangle of pixels" doesn't turn out to be quite right.

    If you actually look in a normal JPEG, you see three planes of information. If I take this image, for example:

    If I decode it, actually I get three images. Here's the first one:

    This is just the greyscale version of the image. So, storytime! Remember black and white television? We had an old one that got moved around the house sometimes, like if Mom was working at something in the kitchen. We also had a color one in the living room, and you could watch one or the other and they showed the same stuff. Strange when you think about it though -- one being in color and the other not. Well it turns out that color was literally just added on, both historically and technically. The main broadcast was still in black and white, and then in one part of the frequency band there were separate color signals, which color TVs would pick up, mix with the black and white signal, and come out with color. Wikipedia notes that "color TV" was really just "colored TV", which is a phrase whose cleverness I respect. Big ups to the W P.

    In the context of JPEG, this black-and-white signal is sometimes called "luma", but is more precisely called Y', where the "prime" (the apostrophe) indicates that the signal has gamma correction applied.

    In the image above, I replaced the color planes (sometimes collectively called the "chroma") with zeroes, while losslessly keeping the luma. Below is the first color plane, with the Y' plane replaced with a uniform 50% luma, and the other color plane replaced with zeros.

    This color signal is technically known as CB, which may be very imperfectly understood as the bluish component of the color. Well the original image wasn't very blue, so we don't see very much here.

    Indeed, our eyes have a harder time seeing differences in color than differences in intensity. Apparently this goes all the way down to biology -- we have more receptors in our eyes for "black and white" and fewer for color.

    Early broadcasters took advantage of this difference in perception by actually devoting more bandwidth in their broadcasts to luma than to chroma; if you check the Wikipedia page you will see that the area in the spectrum allocation devoted to color is much smaller than the area devoted to intensity. So it is in JPEG: the above image being half-width indicates that actually we're just encoding one CB sample for every two Y' samples.

    Finally, here we have the CR color plane, which can loosely be thought of as the "redness" of the image.

    These test images and crops preserve the actual encoding of this photo as it came from my camera, without re-encoding. That's partly why there's not much interesting going on; with the megapixels these days, it's hard to fit much of anything in a few hundred pixels square. This particular camera is sub-sampling in the horizontal direction, but it's also common to subsample vertically as well, producing color planes that are half-width and half-height. In my limited investigations I have found that cameras tend to sub-sample just in the X direction, producing what they call 4:2:2 images, and that standard software encoders subsample in both, producing 4:2:0.

    Incidentally, properly scaling up the color planes is quite an irritating endeavor -- the standard indicates that the color is sampled between the locations of the Y' samples ("centered" chroma), but these images originally have EXIF data that indicates that the color samples are taken at the position of the first Y' sample ("co-sited" chroma). I'm pretty sure libjpeg doesn't delve into the EXIF to check this though, so it would seem that all renderings I have seen of these photos are subtly off.

    But how do you get proper color out of these strange luma and chroma things? Well, the Y'CBCR colorspace is really just the same color cube as RGB, except rotated: the Y' axis traverses the diagonal from (0, 0, 0) (black) to (255, 255, 255) (white). CB and CR are perpendicular to that diagonal, pointing towards blue or red respectively. So to go back to RGB, you multiply by a matrix to rotate the cube.
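    As an illustration (the numbers here are the usual JFIF/BT.601 constants, not code from any particular library), that rotation back to RGB works out to something like:

    // Sketch of the JFIF Y'CbCr -> RGB conversion; the results still
    // need clamping to [0, 255].
    struct RGB { double r, g, b; };

    RGB ycbcr_to_rgb(double y, double cb, double cr) {
        cb -= 128.0; // chroma samples are stored offset by 128
        cr -= 128.0;
        return { y + 1.402 * cr,                    // R
                 y - 0.344136 * cb - 0.714136 * cr, // G
                 y + 1.772 * cb };                  // B
    }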

    It's not a very intuitive color system, as you can see from the images above. For one thing, at zero or full luma, the chroma axes have no meaning; black and white can have no hue. Indeed if you imagine trying to fit a cube corner-down into a similar-sized box, you end up either having empty space in the box, or you have to cut off corners from the cube, or both. Cut corners means that bits of the Y'CBCR signal are wasted; empty space means there are RGB colors that are not representable in Y'CBCR. I'm not sure, but I think both are true for the particular formulation of Y'CBCR used in JPEG.

    There's more to say about color here but frankly I don't know enough to do so, even though I worked in digital video for many years. If this is something you are mildly interested in, I highly, highly recommend watching Wim Taymans' presentation at this year's GStreamer conference. He takes a look at color in video that is constructive, building up from biology through math to engineering. His is a principled approach rather than a list of rules. It really clarified a number of things for me (and opened doors to unknown unknowns beyond).

    on cosines

    Where were we? Right, JPEG. So the proper way to understand what JPEG is is to understand the encoding process. We've covered colorspace conversion from RGB to Y'CBCR and sub-sampling. Next, the image canvas is divided into equal-sized "macroblocks". (These are called "minimum coded units" (MCUs) in the JPEG context, but in video they are usually called macroblocks, and it's a better name.) Without sub-sampling, each macro-block will contain one 8-sample-by-8-sample block for each component (Y', CB, CR) of the image. In my images above, the canvas space corresponding to one chroma block is the space of two luma blocks, so the macroblocks will be 16 samples wide and 8 samples tall, and contain two Y' blocks and one each of CB and CR. If the image canvas can't be evenly divided into macroblocks, it is padded to fit, usually by duplicating the last column or row of samples.

    Then to make a JPEG, each block is encoded separately, then the whole thing is just written out to a file, and you're done!

    This description glosses over a couple of important points, but it's a good big-picture view to have in mind. The pipeline goes from RGB pixels, to a padded RGB canvas, to separate Y'CBCR planes, to a possibly subsampled set of those planes, to macroblocks, to encoded macroblocks, to the file. Decoding is the reverse. It's a totally doable, comprehensible thing, and that was one of the big takeaways for me from this project. I took photography classes in high school and it was really cool to see how to shoot, develop, and print film, and this is similar in many ways. The real "film" is raw-format data, which some cameras produce, but understanding JPEG is like understanding enlargers and prints and fixer baths and such things. It's smelly and dark but pretty cool stuff.

    So, how do you encode a block? Well peoples, this is a kinda cool thing. Maybe you remember from some math class that, given n uniformly spaced samples, you can always represent that series as a sum of n cosine functions of equally spaced frequencies. In each litle 8-by-8 block, that's what we do: a "forward discrete cosine transformation" (FDCT), which is just multiplying together some matrices for every point in the block. The FDCT is completely separable in the X and Y directions, so the space of 8 horizontal coefficients multiplies by the space of 8 vertical coefficients at each column to yield 64 total coefficients, which is not coincidentally the number of samples in a block.
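    Spelled out as deliberately naive code (the textbook DCT-II with JPEG's normalization, not the clever factorized form real codecs use), the transform looks like this:

    #include <cmath>

    // Naive 8x8 forward DCT: out[u][v] is the coefficient for horizontal
    // frequency u and vertical frequency v; in[x][y] is the sample at
    // column x, row y.
    void fdct_8x8(const double in[8][8], double out[8][8]) {
        const double pi = 3.14159265358979323846;
        for (int u = 0; u < 8; u++) {
            for (int v = 0; v < 8; v++) {
                double cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
                double cv = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
                double sum = 0.0;
                for (int x = 0; x < 8; x++)
                    for (int y = 0; y < 8; y++)
                        sum += in[x][y]
                             * std::cos((2 * x + 1) * u * pi / 16.0)
                             * std::cos((2 * y + 1) * v * pi / 16.0);
                out[u][v] = 0.25 * cu * cv * sum;
            }
        }
    }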

    Funny thing about those coefficients: each one corresponds to a particular horizontal and vertical frequency. We can map these out as a space of functions; for example giving a non-zero coefficient to (0, 0) in the upper-left block of a 8-block-by-8-block grid, and so on, yielding a 64-by-64 pixel representation of the meanings of the individual coefficients. That's what I did in the test strip above. Here is the luma example, scaled up without smoothing:

    The upper-left corner corresponds to a frequency of 0 in both X and Y. The lower-right is a frequency of 4 "hertz", oscillating from highest to lowest value in both directions four times over the 8-by-8 block. I'm actually not sure why there are some greyish pixels around the right and bottom borders; it's not a compression artifact, as I constructed these DCT arrays programmatically. Anyway. Point is, your lover's smile, your sunny days, your raw urban graffiti, your child's first steps, all of these are reified in your photos as a sum of cosine coefficients.

    The odd thing is that what is reified into your pictures isn't actually all of the coefficients there are! Firstly, because the coefficients are rounded to integers. Mathematically, the FDCT is a lossless operation, but in the context of JPEG it is not because the resulting coefficients are rounded. And they're not just rounded to the nearest integer; they are probably quantized further, for example to the nearest multiple of 17 or even 50. (These numbers seem exaggerated, but keep in mind that the range of coefficients is about 8 times the range of the original samples.)

    The choice of what quantization factors to use is a key part of JPEG, and it's subjective: low quantization results in near-indistinguishable images, but in middle compression levels you want to choose factors that trade off subjective perception with file size. A higher quantization factor leads to coefficients with fewer bits of information that can be encoded into less space, but results in a worse image in general.

    JPEG proposes a standard quantization matrix, with one number for each frequency (coefficient). Here it is for luma:

    (define *standard-luma-q-table*
      #(16 11 10 16 24 40 51 61
        12 12 14 19 26 58 60 55
        14 13 16 24 40 57 69 56
        14 17 22 29 51 87 80 62
        18 22 37 56 68 109 103 77
        24 35 55 64 81 104 113 92
        49 64 78 87 103 121 120 101
        72 92 95 98 112 100 103 99))
    

    This matrix is used for "quality 50" when you encode an 8-bit-per-sample JPEG. You can see that lower frequencies (the upper-left part) are quantized less harshly, and vice versa for higher frequencies (the bottom right).

    (define *standard-chroma-q-table*
      #(17 18 24 47 99 99 99 99
        18 21 26 66 99 99 99 99
        24 26 56 99 99 99 99 99
        47 66 99 99 99 99 99 99
        99 99 99 99 99 99 99 99
        99 99 99 99 99 99 99 99
        99 99 99 99 99 99 99 99
        99 99 99 99 99 99 99 99))
    

    For chroma (CB and CR) we see that quantization is much more harsh in general. So not only will we sub-sample color, we will also throw away more high-frequency color variation. It's interesting to think about, but also makes sense in some way; again in photography class we did an exercise where we shaded our prints with colored pencils, and the results were remarkable. My poor, lazy coloring skills somehow rendered leaves lifelike in different hues of green; really though, they were shades of grey, colored in imprecisely. "Colored TV" indeed.

    With this knowledge under our chapeaux, we can now say what the "JPEG quality" setting actually is: it's simply that pair of standard quantization matrices scaled up or down. Towards "quality 100", the matrix approaches all-ones, for no quantization, and thus minimal loss (though you still have some rounding, often subsampling as well, and RGB-to-Y'CBCR gamut loss). Towards "quality 0" they scale to a matrix full of large values, for harsh quantization.
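    Roughly, the scaling done by the IJG encoder (jpeg_quality_scaling() in libjpeg's jcparam.c) behaves like this sketch, with the clamping simplified to the 8-bit baseline range:

    // Map a standard table entry plus a quality in [1, 100] to the
    // quantization factor actually used. Quality 50 leaves the table
    // untouched; towards 100 it degenerates to all-ones.
    int scale_quant(int base, int quality) {
        int scale = quality < 50 ? 5000 / quality : 200 - quality * 2;
        int q = (base * scale + 50) / 100;
        if (q < 1) q = 1;
        if (q > 255) q = 255;
        return q;
    }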

    This understanding also explains those wavy JPEG artifacts you get on low-quality images. Those artifacts look like waves because they are waves. They usually occur at sharp intensity transitions, which like a cymbal crash cause lots of high frequencies that then get harshly quantized. Incidentally I suspect (but don't know) that this is the same reason that cymbals often sound bad in poorly-encoded MP3s, because of harsh quantization in the frequency domain.

    Finally, the coefficients are written out to a file as a stream of bits. Each file gets a huffman code table allocated to it, ideally built from the distribution of quantized coefficient sizes seen in all of the blocks of an image. There are usually different encodings for luma and chroma, to reflect their different quantizations. Reading and writing this bitstream is a bit of a headache but the algorithm is specified in the JPEG standard, and all you have to do is implement it. Notably, though, there is special support for encoding a run of zero-valued coefficients, which happens often after quantization. There are rarely wavy bits in a blue blue sky.

    on transforms

    It's terribly common for photos to be wrongly oriented. Unfortunately, the way that many editors fix photo rotation is by setting a bit in the EXIF information of the JPEG. This is ineffectual, as web browsers don't look in the EXIF information, and silly, because it turns out you can losslessly rotate most JPEG images anyway.

    Consider that the body of a JPEG is an array of macroblocks. To rotate an image, you just have to rearrange those macroblocks, then rearrange the blocks inside the macroblocks (e.g. swap the two Y' blocks in my above example), then transform the blocks themselves.

    The lossless transformations that you can do on a block are transposition, vertical flipping, and horizontal flipping.

    Transposition flips a block along its downward-sloping diagonal. To do so, you just swap the coefficients at (u, v) with the coefficients at (v, u). Easy peasey.

    Flipping is trickier. Consider the enlarged DCT image from above. What would it take to horizontally flip the function at (0, 1)? Instead of going from light to dark, you want it to go from dark to light. Simple: you just negate the coefficients! But you only want to negate those coefficients that are "odd" in the X direction, which are those coefficients whose column is odd. And actually that's all there is to it. Flipping vertically is the same, but for coefficients whose row is odd.

    I said "most images" above because those whose size is not evenly divided by the macroblock size can't be losslessly rotated -- you will end up seeing some of the hidden data that falls off the edge of the canvas. Oh well. Most raw images are properly dimensioned, and if you're downscaling, you already have to re-encode anyway.

    But that's just flipping and transposition, you say! What about rotation? Well it turns out that you can express rotation in terms of these operations: rotating 90 degrees clockwise is just a transpose and a horizontal flip (in that order). Together, flipping horizontally, flipping vertically, and transposing form a group, in the same way that flipping and flopping form a group for mattresses. Yeah!
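
    In code, the whole per-block toolkit can be sketched in a few lines; again this is my own sketch, operating on 64-element row-major coefficient vectors like the tables above:

    (define (block-ref block u v)
      (vector-ref block (+ (* v 8) u)))

    (define (block-transform f block)
      ;; Build a new block whose coefficient at (u, v) is (f block u v).
      (let ((out (make-vector 64)))
        (do ((v 0 (1+ v))) ((= v 8) out)
          (do ((u 0 (1+ u))) ((= u 8))
            (vector-set! out (+ (* v 8) u) (f block u v))))))

    (define (transpose block)
      (block-transform (lambda (b u v) (block-ref b v u)) block))

    (define (hflip block)
      ;; Negate the coefficients whose column (horizontal frequency) is odd.
      (block-transform (lambda (b u v)
                         (if (odd? u)
                             (- (block-ref b u v))
                             (block-ref b u v)))
                       block))

    ;; Rotating 90 degrees clockwise: transpose, then flip horizontally.
    (define (rotate-90 block) (hflip (transpose block)))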

    on scheme

    I wrote this library in Scheme because that's my language of choice these days. I didn't run into any serious impedance mismatches; Guile has a generic multi-dimensional array facility that made it possible to express many of these operations as generic folds, unfolds, or maps over arrays. The huffman coding part was a bit irritating, but all in all things were pretty good. The speed is pretty bad, but I haven't optimized it at all, and it gives me a nice test case for the compiler. Anyway, it's been fun and it suits my needs. Check out the project page if you're interested. Yes, to shave a yak you have to get a bit bovine and smelly, but yaks live in awesome places!

    Finally I will leave you with a glitch, one of many that I have produced over the last couple weeks. Comments and corrections welcome below. Happy hacking!

    by Andy Wingo at November 14, 2014 04:49 PM

    Jacobo Aragunde

    LibreOffice workshop at A Coruña University

    Last week I went back to the University of A Coruña, this time to stand at the opposite side of the classroom and conduct a workshop about LibreOffice.

    I was invited by Juan José Sánchez Penas as part of the subject System Information Design (Deseño de Sistemas de Información), which belongs to the Master in Computer Science Engineering (Mestrado de Enxeñería Informática). The goal was to introduce students to a real-world project and show how the techniques they learn are applied in practice.

    The table of contents:

    1. An introduction to the LibreOffice project: its long history since the 80s, current status and the awesome community that powers it.
    2. A high-level overview of the project architecture to present its main design philosophy.
    3. Accessibility in LibreOffice: how it is designed and current status. You will probably find this chapter familiar.
    4. Quality assurance techniques and tools: what the community does to assure the quality of our releases.

    Find below the slides I prepared for the workshop, with versions both in Galician and English. Files are hybrid PDFs to make them easy to modify and reuse; feel free to do it under the terms of the CC-BY-SA license.

    by Jacobo Aragunde Pérez at November 14, 2014 12:37 PM

    Andy Wingo

    generators in firefox now twenty-two times faster

    It's with great pleasure that I can announce that, thanks to Mozilla's Jan de Mooij, the new ES6 generator functions are twenty-two times faster in Firefox!

    Some back-story, for the unawares. There's a new version of JavaScript coming, ECMAScript 6 (ES6). Among the new features that ES6 brings are generator functions: functions that can suspend. Firefox's JavaScript engine, SpiderMonkey, has had support for generators for many years, long before other engines. This support was upgraded to the new ES6 standard last year, thanks to sponsorship from Bloomberg, and was shipped out to users in Firefox 26.

    The generators implementation in Firefox 26 was quite basic. As you probably know, modern JavaScript implementations have a number of tiered engines. In the case of SpiderMonkey there are three tiers: the interpreter, the baseline compiler, and the optimizing compiler. Code begins execution in the interpreter, which is the quickest engine to start. If a piece of code is hot -- meaning that lots of time is being spent there -- then it will "tier up" to the next level, where it is analyzed, possibly optimized, and then compiled to machine code.

    Unfortunately, generators in SpiderMonkey have always been stuck at the lowest tier, the interpreter. This is because of SpiderMonkey's choice of implementation strategy for generators. Generators were implemented as "floating interpreter stack frames": heap-allocated objects whose shape was exactly the same as a stack frame in the interpreter. This had the advantage of being fairly cheap to implement in the beginning, but ultimately it made them unable to take advantage of JIT compilation, as JIT code runs on its own stack which has a different layout. The previous strategy also relied on trampolining through a helper written in C++ to resume generators, which killed optimization opportunities.

    The solution was to represent suspended generator objects as snapshots of the state of a stack frame, instead of as stack frames themselves. In order for this to be efficient, last year we did a number of block scope optimizations to try and reduce the amount of state that a generator frame would have to restore. Finally, around March of this year we were at the point where we could refactor the interpreter to implement generators on the normal interpreter stack, with normal interpreter bytecodes, with the vision of being able to JIT-compile those bytecodes.

    I ran out of time before I could land that patchset; although the patches were where we wanted to go, they actually caused generators to be even slower and so they languished in Bugzilla for a few more months. Sad monkey. It was with delight, then, that a month or so ago I saw that SpiderMonkey JIT maintainer Jan de Mooij was interested in picking up the patches. Since then he has been hacking off and on at getting my old patches into shape, and ended up applying them all.

    He went further, optimizing stack frames to not reserve space for "aliased" locals (locals allocated on the scope chain), speeding up object literal creation in the baseline compiler and finally has implemented baseline JIT compilation for generators.

    So, after all of that perf nargery, what's the upshot? Twenty-two times faster! In this microbenchmark:

    function *g(n) {
        for (var i=0; i<n; i++)
            yield i;
    }
    function f() {
        var t = new Date();
        var it = g(1000000);
        for (var i=0; i<1000000; i++)
            it.next();
        print(new Date() - t);
    }
    f();
    

    Before, it took SpiderMonkey 980 milliseconds to complete on Jan's machine. After? Only 43! It's actually marginally faster than V8 at this point, which has (temporarily, I think) regressed to 45 milliseconds on this test. Anyway. Competition is great and as a committer to both projects I find it very satisfactory to have good implementations on both sides.

    As in V8, in SpiderMonkey generators cannot yet reach the highest tier of optimization. I'm somewhat skeptical that it's necessary, too, as you expect generators to suspend fairly frequently. That said, a yield point in a generator is, from the perspective of the optimizing compiler, not much different from a call site, in that it causes all locals to be saved. The difference is that locals may have unboxed representations, so we would have to box those values when saving the generator state, and unbox on restore.

    Thanks to Bloomberg for funding the initial work, and big, big thanks to Mozilla's Jan de Mooij for picking up where we left off. Happy hacking with generators!

    by Andy Wingo at November 14, 2014 08:41 AM

    November 13, 2014

    Sergio Villar

    BlinkOn 3

    Last week I attended BlinkOn3 held at Google’s Mountain View office. Not only that but I also had the pleasure of giving a speech about what has been taking most of my time lately, the CSS Grid Layout implementation.

    Although several talks had already been scheduled for weeks, the conference itself is very dynamic, in the sense that new talks were added as people proposed new topics to discuss.

    I found quite interesting the talks about the State of Blink and also the incoming changes related to Slimming Paint. Exciting times ahead!

    My talk was about the CSS Grid Layout work Igalia has been carrying out for several months now (also thanks to Bloomberg's sponsorship). The talk was very well received (we were not a lot of people, mainly because my talk was rescheduled more than once), and people in general are quite excited about the new opportunities the spec will bring to web authors.

    The slides are here (3.3MB PDF). They look a bit blurry; that's because their original format was Google's HTML5 io-2012-slides, which allowed me to do a lot of live demos.

    I also had the opportunity to talk to Julien Chaffraix about the next steps. We both are quite confident about the status of the implementation, so we decided to eventually send the “Intent to ship” at some point during Q4. Very good news for web authors! The most important things to address before the shipping are the handling of absolutely positioned items and a couple of issues related to grids with indefinite remaining space.

    by svillar at November 13, 2014 11:29 AM

    November 11, 2014

    Samuel Iglesias

    Piglit, an open-source test suite for OpenGL implementations

    OpenGL is an API for rendering 2D and 3D vector graphics, now managed by the non-profit technology consortium Khronos Group. It is a multi-platform API found in devices of different form factors (from desktop computers to embedded devices) and operating systems (GNU/Linux, Microsoft Windows, Mac OS X, etc).

    As Khronos only defines the OpenGL API, implementors are free to write whatever OpenGL implementation they wish. For example, on GNU/Linux systems, NVIDIA provides its own proprietary libraries, while other manufacturers like Intel use Mesa, one of the most popular open source OpenGL implementations.

    Because of this implementation freedom, we need a way to check that implementations follow the OpenGL specification. Khronos provides its own OpenGL conformance test suite, but your company needs to become a Khronos Adopter member to have access to it. However, there is an unofficial open source alternative: piglit.

    Piglit

    Piglit is an open-source OpenGL implementation conformance test suite created by Nicolai Hähnle in 2007. Since then, it has increased the number of tests covering different OpenGL versions and extensions: today a complete piglit run executes more than 35,000 tests.

    Piglit is one of the tools widely used in Mesa to check that commits adding new functionality or modifying the source code don't break OpenGL conformance. If you are thinking of contributing to Mesa, this is definitely one of the tools you want to master.

    How to compile piglit

    Before compiling piglit, you need to have the following dependencies installed on your system. Some of them are available in modern GNU/Linux distributions (such as Python, numpy, make…), while others, such as waffle, you might need to compile yourself.

    • Python 2.7.x
    • Python mako module
    • numpy
    • cmake
    • GL, glu and glut libraries and development packages (i.e. headers)
    • X11 libraries and development packages (i.e. headers)
    • waffle

    But waffle is not available in the Debian/Ubuntu repositories, so you need to compile it manually and, optionally, install it in the system:

    $ git clone git://github.com/waffle-gl/waffle
    $ cd waffle
    $ cmake . -Dwaffle_has_glx=1
    $ make
    $ sudo make install

    Piglit is a project hosted in Freedesktop. To download it, you need to have git installed on your system, then run the corresponding git-clone command:

    $ git clone git://anongit.freedesktop.org/git/piglit

    Once it finishes cloning the repository, just compile it:

    $ cmake .
    $ make

    More info in the documentation.

    As a result, all the test binaries are inside the bin/ directory and it's possible to run them standalone… however, there are scripts to run all of them in a row.

    Your first piglit run

    After you have downloaded the piglit source code from its git repository and compiled it, you are ready to run the test suite.

    First of all, make sure that everything is correctly setup:

    $ ./piglit run tests/sanity.tests results/sanity.results

    The results will be inside the results/sanity.results directory. There is a way to process those results and show them in a human-readable output, but I will talk about it in the next section.

    If it fails, most likely it is because libwaffle is not found in the path. If everything went fine, you can execute the piglit test suite against your graphics driver.

    $ ./piglit run tests/all results/all-reference

    Remember that it’s going to take a while to finish, so grab a cup of coffee and enjoy it.

    Analyze piglit output

    Piglit provides several tools to convert the JSON-format results into a more readable output: the CLI output tool (piglit-summary.py) and the HTML output tool (piglit-summary-html.py). I'm going to explain the latter first because its output is very easy to understand when you are starting to use this test suite.

    You can run these scripts standalone, but the piglit binary calls each of them depending on its arguments. I am going to use this binary in all the examples because it's just one command to remember.

    HTML output

    In order to create an HTML output of a previously saved run, the following command is what you need:

    $ ./piglit summary html --overwrite summary/summary-all results/all-reference

    • You can append more results at the end if you would like to compare them. The first one is the reference for the others, like when counting the number of regressions.

    $ ./piglit summary html --overwrite summary/summary-all results/all-master results/all-test

    • The overwrite argument is to overwrite summary destination directory contents if they have been already created.

    Finally open the HTML summary web page in a browser:

    $ firefox summary/summary-all/index.html

    Each test has a background color depending on the result: red (failed), green (passed), orange (warning), grey (skipped) or black (crashed). If you click on its respective link in the right column, you will see the output of that test and how to run it standalone.

    There are more pages:

    • skipped.html: it lists all the skipped tests.
    • fixes.html: it lists all the tests that failed before but are now fixed.
    • problems.html: it lists all the failing tests.
    • disabled.html: it lists tests that were executed before but are now skipped.
    • changes.html: when you compare two or more different piglit runs, this page shows all the changes comparing the new results with the reference (first results/ argument):
      • Previously skipped tests that are now executed (although the result could be fail, crash or pass).
      • It also includes all regressions.html data.
      • Any other change of the tests compared with the reference: crashed tests, passed tests that before were failing or skipped, etc.
    • regressions.html: when you compare two or more different piglit runs, this page shows the number of previously passed tests that now fail.
    • enabled.html: it lists all the executed tests.

    I recommend exploring which pages are available and what kind of information each one provides. There are more pages, like info, which is in the first row of each results column at the right-most part of the screen and gathers all the information about hardware, drivers, supported OpenGL version, etc.

    Test details

    As I said before, you can see what error output (if any) a test has written, how long it took to execute and which arguments were given to the binary.

    There is also a dmesg field which shows the kernel errors that appeared during each test execution. If these errors are graphics-driver related, you can easily detect which test was guilty. To enable this output, you need to add the --dmesg argument to piglit run, but I will explain this and other parameters in the next post.

    Text output

    The usage of the CLI tool is very similar to the HTML one, except that its output appears in the terminal.

    $ ./piglit summary console results/all-reference

    As its output is not saved to any file, there is no argument to save it to a directory, and there is no overwrite argument either.

    Like the HTML output tool, you can append several result files to do a comparison between them. The tool will output one line per test together with its result (pass, fail, crash, skip) and a summary with all the stats at the end.

    As it prints the output to the console, you can take advantage of tools like grep to look for specific combinations of results:

    $ ./piglit summary console results/all-reference | grep fail

    This is an example of an output of this command:

    $ ./piglit summary console results/all-reference
    [...]
    spec/glsl-1.50/compiler/interface-block-name-uses-gl-prefix.vert: pass
    spec/EXT_framebuffer_object/fbo-clear-formats/GL_ALPHA16 (fbo incomplete): skip
    spec/ARB_copy_image/arb_copy_image-targets GL_TEXTURE_CUBE_MAP_ARRAY 32 32 18 GL_TEXTURE_2D_ARRAY 32 16 15 11 12 5 5 1 2 14 15 9: pass
    spec/glsl-1.30/execution/built-in-functions/fs-op-bitxor-uvec3-uint: pass
    spec/ARB_depth_texture/depthstencil-render-miplevels 146 d=z16: pass
    spec/glsl-1.10/execution/variable-indexing/fs-varying-mat2-col-row-rd: pass
    summary:
           pass: 25085
           fail: 262
          crash: 5
           skip: 9746
        timeout: 0
           warn: 13
     dmesg-warn: 0
     dmesg-fail: 0
          total: 35111

    And this is the output when you compare two different piglit results:

    $ ./piglit summary console results/all-reference results/all-test
    [...]
    spec/glsl-1.50/compiler/interface-block-name-uses-gl-prefix.vert: pass pass
    spec/glsl-1.30/execution/built-in-functions/fs-op-bitxor-uvec3-uint: pass pass
    summary:
           pass: 25023
           fail: 548
          crash: 7
           skip: 8264
        timeout: 0
           warn: 15
     dmesg-warn: 0
     dmesg-fail: 0
        changes: 376
          fixes: 2
    regressions: 2
          total: 33857

    Output for Jenkins-CI

    There is another script (piglit-summary-junit.py) that produces results in a format that Jenkins-CI understands, which is very useful when you have this continuous-integration suite running somewhere. As I have not played with it yet, I leave it as an exercise for readers.

    Conclusions

    Piglit is an open-source OpenGL implementation conformance test suite widely used in projects like Mesa.

    In this post I explained how to compile piglit, run it and convert the result files into a readable output. This is very useful when you are testing your latest Mesa patch before submitting it to the development mailing list, or when you are looking for regressions in the latest stable version of your graphics device driver.

    Next post will cover how to run specific tests in piglit and explain some arguments very interesting for specific cases. Stay tuned!

    by Samuel Iglesias at November 11, 2014 04:05 PM

    Iago Toral

    A brief overview of the 3D pipeline

    Recap

    In the previous post I discussed the Mesa development environment and gave a few tips for newcomers, but before we start hacking on the code we should have a look at what modern GPUs look like, since that has a definite impact on the design and implementation of driver code. Let's get to it.

    Fixed Function vs Programmable hardware

    Before the advent of shading languages like GLSL we did not have the option to program the 3D hardware at will. Instead, the hardware would have specific units dedicated to implementing certain operations (like vertex transformations) that could only be used through specific APIs, like those exposed by OpenGL. These units are usually labeled as Fixed Function, to differentiate them from modern GPUs that also expose fully programmable units.

    What we have now in modern GPUs is a fully programmable pipeline, where graphics developers can code graphics algorithms of various sorts in high-level programming languages like GLSL. These programs are then compiled and loaded into the GPU to execute specific tasks. This gives graphics developers a huge amount of freedom and power, since they are no longer limited to preset APIs exposing fixed functionality (like the old OpenGL lighting models, for example).

    Modern graphics drivers

    But of course all this flexibility and power that graphics developers enjoy today come at the expense of significantly more complex hardware and drivers, since the drivers are responsible for exposing all that flexibility to the developers while ensuring that we still obtain the best performance out of the hardware in each scenario.

    Rather than acting as a bridge between a fixed API like OpenGL and fixed function hardware, drivers also need to handle general purpose graphics programs written in high-level languages. This is a big change. In the case of OpenGL, this means that the driver needs to provide an implementation of the GLSL language, so, suddenly, the driver is required to incorporate a full compiler and deal with all sorts of problems that belong to the realm of compilers, like choosing an intermediary representation for the program code (IR), performing optimization passes and generating native code for the GPU.

    Overview of a modern 3D pipeline

    I have mentioned that modern GPUs expose fully programmable hardware units. These are called shading units, and the idea is that these units are connected in a pipeline so that the output of a shading unit becomes the input of the next. In this model, the application developer pushes vertices to one end of the pipeline and usually obtains rendered pixels on the other side. In between these two ends there are a number of units making this transition possible and a number of these will be programmable, which means that the graphics developer can control how these vertices are transformed into pixels at different stages.

    The image below shows a simplified example of a 3D graphics pipeline, in this case as exposed by the OpenGL 4.3 specification. Let’s have a quick look at some of its main parts:


    The OpenGL 4.3 3D pipeline (image via www.brightsideofnews.com)

    Vertex Shader (VS)

    This programmable shading unit takes vertices as input and produces vertices as output. Its main job is to transform these vertices in any way the graphics developer sees fit. Typically, this is where we would do transforms like vertex projection, rotation or translation and, generally, compute the per-vertex attributes that we want to provide to later stages in the pipeline.

    The vertex shader processes vertex data as provided by APIs like glDrawArrays or glDrawElements and outputs shaded vertices that will be assembled into primitives as indicated by the OpenGL draw command (GL_TRIANGLES, GL_LINES, etc).
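
    To make this a bit more tangible, here is a minimal GLSL vertex shader doing the classic projection transform (just an illustrative sketch; the attribute and uniform names are mine):

    #version 330 core

    layout(location = 0) in vec3 position;  // per-vertex input attributes
    layout(location = 1) in vec3 color;

    uniform mat4 mvp;  // model-view-projection matrix set by the application

    out vec3 vColor;   // per-vertex output for later stages

    void main()
    {
        // Transform the vertex from model space to clip space.
        gl_Position = mvp * vec4(position, 1.0);
        vColor = color;
    }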

    Geometry Shader

    Geometry shaders are similar to vertex shaders, but instead of operating on individual vertices, they operate on a geometry level (that is, a line, a triangle, etc), so they can take the output of the vertex shader as their input.

    The geometry shader unit is programmable and can be used to add or remove vertices from a primitive, clip primitives, spawn entirely new primitives or modify the geometry of a primitive (like transforming triangles into quads or points into triangles, etc). Geometry shaders can also be used to implement basic tessellation even if dedicated tessellation units present in modern hardware are a better fit for this job.

    In GLSL, some operations like layered rendering (which allows rendering to multiple textures in the same program) are only accessible through geometry shaders, although this is now also possible in vertex shaders via a particular extension.

    The output of a geometry shader is, again, primitives.

    Rasterization

    So far all the stages we discussed manipulated vertices and geometry. At some point, however, we need to render pixels. For this, primitives need to be rasterized, which is the process by which they are broken into individual fragments that are then colored by a fragment shader and eventually turned into pixels in a framebuffer. Rasterization is handled by the rasterizer fixed function unit.

    The rasterization process also assigns depth information to these fragments. This information is necessary when we have a 3D scene where multiple polygons overlap on the screen and we need to decide which polygon’s fragments should be rendered and which should be discarded because they are hidden by other polygons.

    Finally, the rasterization also interpolates per-vertex attributes in order to compute the corresponding fragment values. For example, let’s say that we have a line primitive where each vertex has a different color attribute, one red and one green. For each fragment in the line the rasterizer will compute interpolated color values by combining red and green depending on how close or far the fragments are to each vertex. With this, we will obtain red fragments on the side of the red vertex that will smoothly transition to green as we move closer to the green vertex.

    In summary, the input of the rasterizer are the primitives coming from a vertex, tessellation or geometry shader and the output are the fragments that build the primitive’s surface as projected on the screen including color, depth and other interpolated per-vertex attributes.

    Fragment Shader (FS)

    The programmable fragment shader unit takes the fragments produced by the rasterization process and executes an algorithm provided by a graphics developer to compute the final color, depth and stencil values for each fragment. This unit can be used to achieve numerous visual effects, including all kinds of post-processing filters, it is usually where we will sample textures to color polygon surfaces, etc.
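
    Continuing the illustrative sketch from the vertex shader above, a matching minimal fragment shader just turns the interpolated per-vertex color into the fragment's final color:

    #version 330 core

    in vec3 vColor;   // interpolated by the rasterizer for each fragment
    out vec4 fragColor;

    void main()
    {
        // For the red/green line example above, vColor transitions
        // smoothly from red to green along the primitive.
        fragColor = vec4(vColor, 1.0);
    }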

    This covers some of the most important elements in the 3D graphics pipeline and should be sufficient, for now, to understand some of the basics of a driver. Notice, however, that I have not covered things like transform feedback, tessellation or compute shaders. I hope I can get to cover some of these in future posts.

    But before we are done with the overview of the 3D pipeline we should cover another topic that is fundamental to how the hardware works: parallelization.

    Parallelization

    Graphics processing is a very resource-demanding task. We are continuously updating and redrawing our graphics 30/60 times per second. For a full HD resolution of 1920×1080 that means that we need to redraw over 2 million pixels in each go (124,416,000 pixels per second if we are doing 60 FPS). That's a lot.

    To cope with this, the architecture of GPUs is massively parallel, which means that the pipeline can process many vertices/pixels simultaneously. For example, in the case of Intel Haswell GPUs, programmable units like the VS and GS have multiple Execution Units (EUs), each with their own set of ALUs, etc., that can spawn up to 70 threads each (for the GS and VS), while the fragment shader can spawn up to 102 threads. But that is not the only source of parallelism: each thread may handle multiple objects (vertices or pixels depending on the case) at the same time. For example, a VS thread in Intel hardware can shade two vertices simultaneously, while an FS thread can shade up to 8 (SIMD8) or 16 (SIMD16) pixels in one go.

    Some of these means of parallelism are relatively transparent to the driver developer and some are not. For example, SIMD8 vs SIMD16 or single vs double vertex shading requires specific configuration, and driver code that is aligned with the selected configuration. Threads are more transparent, but in certain situations the driver developer may need to be careful when writing code that requires a sync between all running threads, which would obviously hurt performance, or at least be careful to do that kind of thing where it would hurt performance the least.

    Coming up next

    So that was a very brief introduction to what modern 3D pipelines look like. There is still plenty of stuff I have not covered, but I think we can go through a lot of that in later posts as we dig deeper into the driver code. My next post will discuss how Mesa models several of the programmable pipeline stages I have introduced here, so stay tuned!

    by Iago Toral at November 11, 2014 12:13 PM

    November 09, 2014

    Andy Wingo

    ffconf 2014

    Last week I had the great privilege of speaking at ffconf in Brighton, UK. It was lovely. The town put on a full demonstration of its range of November weather patterns, from blue skies to driving rain to hail (!) to sea-spray to drizzle and back again. Good times.

    The conference itself was quite pleasant as well, and from the speaker perspective it was amazing. A million thanks to Remy and Julie for making it such a pleasure. ffconf is mostly a front-end development conference, so it's not directly related with the practice of my work on JS implementations -- perhaps you're unaware, but there aren't so many browser implementors that actually develop content for their browsers, and indeed fewer JS implementors that actually write JS. Me, I sling C++ all day long and the most JavaScript I write is for tests. When in the weeds, sometimes we forget we're building an amazing runtime and that people do inspiring things with it, so it's nice to check in with front-end folks at their conferences to see what people are excited about.

    My talk was about the part of JavaScript implementations that are written in JavaScript itself. This is an area that isn't so well known, and it has its amusing quirks. I think it can be interesting to a curious JS hacker who wants to spelunk down a bit to see what's going on in their browsers. Intrepid explorers might even find opportunities to contribute patches. Anyway, nerdy stuff, but that's basically how I roll.

    The slides are here: without images (350kB PDF) or with images (3MB PDF).

    I haven't been to the UK in years, and being in a foreign country where everyone speaks my native language was quite refreshing. At the same time there was an awkward incident in which I was reminded that though close, American and English just aren't the same. I made this silly joke that if you get a polyfill into a JS implementation, then shucks, you have a "gollyfill", 'cause golly it made it in! In the US I think "golly" is just one of those milquetoast profanities, "golly" instead of "god" like saying "shucks" instead of "shit". Well in the UK that's a thing too I think, but there is also another less fortunate connotation, in which "golly" as a noun can be a racial slur. Check the Wikipedia if you're as ignorant as I was. I think everyone present understood that wasn't my intention, but if that is not the case I apologize. With slides though it's less clear, so I've gone ahead and removed the joke from the slides. It's probably not a ball to take and run with.

    However I do have a ball that you can run with though! And actually, this was another terrible joke that wasn't bad enough to inflict upon my audience, but that now chance fate gives me the opportunity to use. So never fear, there are still bad puns in the slides. But, you'll have to click through to the PDF for optimal groaning.

    Happy hacking, JavaScripters, and until next time.

    by Andy Wingo at November 09, 2014 05:36 PM

    November 04, 2014

    Sergio Villar

    I’m attending BlinkOn3

    Today I’m giving a speech at BlinkOn3, the Blink contributors’ conference held in Google’s Mountain View Office. Check the agenda for further details.

    The plan is to give an overview of the feature, present the most recent additions/improvements and also talk about the roadmap. My session is scheduled for 3:30PM in the Arctic Ocean room. See you there!

    UPDATE: we had many issues trying to setup the hangout so in the end we decided to move the session to Wednesday morning. It’s currently scheduled for 1:15PM just after lunch.

    by svillar at November 04, 2014 03:35 PM

    October 31, 2014

    Katerina Barone-Adesi

    A tale of sunk allocation

    Pflua is fast; Igalia wants it to be faster. When I joined Igalia, one of the big open questions was why some very similar workloads had extremely different speeds; matching a packet dump against a matching or non-matching host IP address could make the speed vary by more than a factor of 3! There was already a belief that allocation/trace/JIT interactions were a factor in this huge performance discrepancy, and it needed more investigation.

    As a first step, I tracked down and removed some questionable allocations, which showed up in LuaJIT's trace dumps, but were not obvious from the source code. Removing unnecessary allocations from the critical path generally speeds up code, and LuaJIT comes with an option to dump its bytecode, internal representation, and the resulting assembly code, which gave some useful clues about where to start.

    How were the results of removing unnecessary allocation? They were mixed, but they included double-digit performance improvements for some filters on some pcap dumps. "Portrange 0-6000" got 33% faster when benchmarked on the 1gb dump, and 96% faster on wingolog, nearly doubling in speed. The results were achieved by sinking all of the CNEWI allocations within the filter function of pflua-match - the performance gain was due to a small set of changes to one function. So, what was that change, and how was it made?

    The first step was to see the traces generated by running pflua-match.

    luajit -jdump=+rs ./pflua-match ~/path/wingolog.org.pcap tcp > trace1

    The thing to look for is allocations that show up in LuaJIT's SSA IR. They are documented at http://wiki.luajit.org/SSA-IR-2.0#Allocations. Specifically, the important ones are allocations that are not "sunk" - that is, that have not already been optimized away. http://wiki.luajit.org/Allocation-Sinking-Optimization has more details about the allocations LuaJIT can sink). In practice, for the critical path of pflua-match, these were all CNEWI.

    What is CNEWI? The allocation documentation tersely says "Allocate immutable cdata" and that CNEWI "is only used for immutable scalar cdata types. It combines an allocation with an initialization." What is cdata? The FFI documentation defines cdata as "a C data object. It holds a value of the corresponding ctype." So, all of the unsunk allocations have already been tracked down to something immutable related to the FFI at this point.

    Pflua compiles 'pflang' (the nameless filtering language of libpcap) to Lua; tools like pflua-match run the generated Lua code, and LuaJIT compiles hot traces in both pflua-match itself and the generated Lua. Here's the start of what the generated Lua code for the "tcp" filter above looks like:

    function match(P,length)
       if not (length >= 34) then return false end
       do
          local v1 = ffi.cast("uint16_t*", P+12)[0]
          if not (v1 == 8) then goto L3 end
    

    Here is an illustrative excerpt from an IR trace of pflua-match:

    0019 p64 ADD 0018 +12
    0020 {sink} cdt CNEWI +170 0019
    0021 > fun EQ 0014 ffi.cast
    0022 {sink} cdt CNEWI +173 0019
    0023 rbp u16 XLOAD 0019
    .... SNAP #3 [ ---- ---- ---- 0023 ]
    0024 > int EQ 0023 +8
    

    A few things stand out. There are two CNEWI calls, and they are both sunk. Also, the constant 12 on the first line looks awfully familiar, as does the 8 on the last line: this is the IR representation of the generated code! The 0019 at the end of lines refers back to the line starting with 0019 earlier in the trace, rbp is a register, the +170 and +173 are an internal representation of C types, etc. For more details, read about the IR on LuaJIT's wiki.

    This trace is not directly useful for optimization, because the CNEWI calls are already sunk (this is shown by the {sink}, rather than a register, proceeding them), but it illustrates the principles and tools involved. It also shows that raw data access can be done with sunk allocations, even when it involves the FFI.

    Here is a somewhat more complex trace, showing first the LuaJIT bytecode, and then the LuaJIT SSA IR.

    ---- TRACE 21 start 20/8 "tcp":2
    0004 . KPRI 2 1
    0005 . RET1 2 2
    0022 ISF 8
    0023 JMP 9 => 0025
    0025 ADDVN 3 3 0 ; 1
    0026 MOV 0 7
    0027 JMP 5 => 0003
    0003 ISGE 0 1
    0004 JMP 5 => 0028
    0000 . . FUNCC ; ffi.meta.__lt
    0005 JLOOP 5 20
    ---- TRACE 21 IR
    0001 [8] num SLOAD #4 PI
    0002 [10] num SLOAD #5 PI
    0003 r14 p64 PVAL #21
    0004 r13 p64 PVAL #59
    0005 rbp p64 PVAL #64
    0006 r15 + cdt CNEWI +170 0003
    0007 {sink} cdt CNEWI +172 0003
    0008 {sink} cdt CNEWI +170 0004
    0009 [18] + cdt CNEWI +170 0005
    

    There are 4 CNEWI calls; only two are sunk. The other two are ripe for optimization. The bytecode includes JLOOP, which turns out to be a big clue.

    Here is the original implementation of pflua-match's filter harness:

    local function filter(ptr, ptr_end, pred)
       local seen, matched = 0, 0
       while ptr < ptr_end do
          local record = ffi.cast("struct pcap_record *", ptr)
          local packet = ffi.cast("unsigned char *", record + 1)
          local ptr_next = packet + record.incl_len
          if pred(packet, record.incl_len) then
             matched = matched + 1
          end
          seen = seen + 1
          ptr = ptr_next
       end
       return seen, matched
    end
    

    It contains exactly one loop. There are fewer and fewer places where the unsunk CNEWI allocations could be hiding. The SSA IR documentation says that PVAL is a "parent value reference", which means it comes from the parent trace, and "TRACE 21 start 20/8" says that the parent trace of trace 21 is trace 20. Once again, constants provide a hint to how code lines up: the 16 is from local packet = ffi.cast("unsigned char *", record + 1), as 16 is the size of pcap_record.

    0015 rdi p64 ADD 0013 +16
    0017 {sink} cdt CNEWI +170 0015
    0018 p64 ADD 0013 +8
    0019 rbp u32 XLOAD 0018
    0020 xmm2 num CONV 0019 num.u32
    0021 rbp + p64 ADD 0019 0015
    

    This is further confirmation that 'filter' is the function to be looking at. The reference to a parent trace itself is shown not to be the problem, because both a sunk and an unsunk allocation do it (here are the relevant lines from trace 21):

    0003 r14 p64 PVAL #21
    0006 r15 + cdt CNEWI +170 0003
    0007 {sink} cdt CNEWI +172 0003
    

    LuaJIT does not do classical escape analysis; it is quite a bit more clever, and sinks allocations which classical escape analysis cannot. http://wiki.luajit.org/Allocation-Sinking-Optimization discusses at length what it does and does not do. However, the filter function has an if branch in the middle that suffices to prevent the update to ptr from being sunk. One workaround is to store an offset to the pointer, which is a simple number shared between traces, and confine the pointer arithmetic to within one trace; it turns out that this is sufficient. Here is one rewritten version of the filter function which makes all of the CNEWI calls be sunk:

    local function filter(ptr, max_offset, pred)
       local seen, matched, offset = 0, 0, 0
       local pcap_record_size = ffi.sizeof("struct pcap_record")
       while offset < max_offset do
          local cur_ptr = ptr + offset
          local record = ffi.cast("struct pcap_record *", cur_ptr)
          local packet = ffi.cast("unsigned char *", record + 1)
          offset = offset + pcap_record_size + record.incl_len
          if pred(packet, record.incl_len) then
             matched = matched + 1
          end
          seen = seen + 1
       end
       return seen, matched
    end
    

    How did removing the allocation impact performance? It turned out to be mixed: peak speed dropped, but so did variance, and some filters became a double-digit percentage faster. Here are the results for the 1gb pcap file that Igalia uses in some benchmarks, which show large gains on the "tcp port 5555" and "portrange 0-6000" filters, and a much smaller loss of performance on the "tcp", "ip" and "accept all" filters.

    The original file:

    The patched file, without CNEWI in the critical path:

    Benchmarks on three other pcap dumps can be found at https://github.com/Igalia/pflua/issues/57. More information about the packet captures tested and Igalia's benchmarking of pflua in general is available at Igalia's pflua-bench repository on Github. There is also information about the workloads and the environment of the benchmarks.

    There are several takeaways here. One is that examining the LuaJIT IR is extremely useful for some optimization problems. Another is that LuaJIT is extremely clever; it manages to make using the FFI quite cheap, and performs optimizations that common techniques cannot. The value of benchmarking, and how it can be misleading, are both highly relevant: a change that removed allocation from a tight loop also hurt performance in several cases, apparently by a large amount in some of the benchmarks on other pcap dumps. Lastly, inefficiency can lurk where it is least expected: while ptr < ptr_end looks like extremely innocent code, and is just a harness in a small tool script, but it was the cause of unsunk allocations in the critical path, which in turn were skewing benchmarks. What C programmer would suspect that line was at fault?

    by Katerina Barone-Adesi at October 31, 2014 10:51 PM

    October 27, 2014

    Manuel Rego

    Presenting the Web Engines Hackfest

    After the Google’s fork back in April 2013, WebKit and Blink communities have been working independently, however patches often move from one project to another. In addition, a fair amount of the source code continues to be similar. Thus, it seems interesting to have a common place to discuss topics of shared interest and make plans for the future.

    For that reason, Igalia is announcing the Web Engines Hackfest, which will happen on December 7-10 in our main office in A Coruña (Spain). This is like a new edition of the WebKitGTK+ Hackfest that we've been organizing since 2009, but this year broadening the focus in order to make a more inclusive event.

    The Hackfest

    The new event will include members from all parts of the Web Platform community, not restricted to WebKit and Blink, but open to the different Web (Gecko and Servo) and JavaScript (V8, JavaScriptCore and SpiderMonkey) free software engines.

    Past year hackfest picture by Claudio Saavedra

    The idea is to bring together implementors from the Open Web Platform community in a 4-day hacking/discussion/unconference event with the goal of moving different topics forward, discussing common issues, making plans for the future, etc.

    And that’s what makes this event something different. It’s a completely hacking oriented event, where developers sit together in small groups to work for a few days pursuing some specific goals. These groups of interest are defined at the beginning of the hackfest and they agree the different tasks to be done on during the event.
    Thereby, don’t expect a full scheduling in advance with a list of talks and speakers like in a regular conference, this is something totally different focused on real hacking.

    Join us

    After the first round of invitations we already have a nice list of people from different companies that will be attending the hackfest this year. The good news is that we still have room for a few more people, so if you're interested in coming to the event please contact us.

    Adobe and Igalia are sponsoring the Web Engines Hackfest 2014

    Thanks to Adobe and Igalia for sponsoring the Web Engines Hackfest and making such an exciting event possible.

    Looking forward to meeting you there!

    October 27, 2014 11:00 PM

    October 20, 2014

    Claudio Saavedra

    Mon 2014/Oct/20

    • Together with GNOME 3.14, we have released Web 3.14. Michael Catanzaro, who has been doing an internship at Igalia for the last few months, wrote an excellent blog post describing the features of this new release. Go and read his blog to find out what we've been doing while we wait for his new blog to be syndicated to Planet GNOME.

    • I've started doing two exciting things lately. The first one is Ashtanga yoga. I had been wanting to try yoga for a long time now, as swimming and running have been pretty good for me but at the same time have made my muscles pretty stiff. Yoga seemed like the obvious choice, so after much thought and hesitation I started visiting the local Ashtanga Yoga school. After a month I'm starting to get somewhere (i.e. my toes) and I'm pretty much addicted to it.

      The second thing is that I started playing the keyboards yesterday. I used to toy around with keyboards when I was a kid but I never really learned anything meaningful, so when I saw an ad for a second-hand WK-1200, I couldn't resist and got it. After an evening of practice I already got the feel of Cohen's Samson in New Orleans and the first 16 bars of Verdi's Va, pensiero, but I'm still horribly bad at playing with both hands.

    October 20, 2014 02:11 PM

    October 17, 2014

    Enrique Ocaña

    Hacking on Chromium for Android from Eclipse (part 3)

    In the previous posts we learnt how to code and debug Chromium for Android C++ code from Eclipse. In this post I'm going to explain how to open the ChromeShell Java code, so that you will be able to hack on it like you would in a normal Android app project. Remember, you will need to install the ADT plugin in Eclipse and the full-featured adb which comes with the standalone SDK from the official page. Don't try to reuse the Android SDK in "third_party/android_tools/sdk".

    Creating the Java project in Eclipse

    Follow these instructions to create the Java project for ChromeShell (for instance):

    • File, New, Project…, Android, Android project from existing code
    • Choose "src/chrome/android/shell/java" as project root, because that's where the AndroidManifest.xml is. Don't copy anything to the workspace.
    • The project will have a lot of unmet package dependencies. You have to manually import some jars:
      • Right click on the project, Properties, Java build path, Libraries, Add external Jars…
      • Then browse to “src/out/Debug/lib.java” (assuming a debug build) and import these jars (use CTRL+click for multiple selection in the file chooser):
        • base_java.jar, chrome_java.jar, content_java.jar, dom_distiller_core_java.jar, guava_javalib.jar,
        • jsr_305_javalib.jar, net_java.jar, printing_java.jar, sync_java.jar, ui_java.jar, web_contents_delegate_android.jar
      • If you keep having problems, go to "lib.java", run this script and find which jar contains the class you're missing:
    for i in *.jar; do echo "--- $i ---------------------"; unzip -l $i; done | most
    • The generated resources directory “gen” produced by Eclipse is going to lack a lot of stuff.
      • It’s better to make it point to the “right” gen directory used by the native build scripts.
      • Delete the “gen” directory in “src/chrome/android/shell/java” and make a symbolic link:
    ln -s ../../../../out/Debug/chrome_shell_apk/gen .
      • If you ever “clean project” by mistake, delete the chrome_shell_apk/gen directory and regenerate it using the standard ninja build command
    • The same for the “res” directory. From “src/chrome/android/shell/java”, do this (and ignore the errors):
    cp -sr $PWD/../res ./
    cp -sr $PWD/../../java/res ./
    • I haven’t been able to solve the problem of integrating all the string definitions. A lot of string related errors will appear in files under “res”. By the moment, just ignore those errors.
    • Remember to use a fresh standalone SDK. Install support for Android 4.4.2. Also, you will probably need to modify the project properties to match the same 4.4.2 version you have support for.

    And that’s all. Now you can use all the Java code indexing features of Eclipse. By the moment, you still need to build and install to the device using the command line recipe, though:

     ninja -C out/Debug chrome_shell_apk
     build/android/adb_install_apk.py --apk ChromeShell.apk --debug

    Debugging Java code

    To debug the Java side of the app running in the device, follow the same approach that you would if you had a normal Java Android app:

    • Launch the ChromeShell app manually from the device.
    • In Eclipse, use the DDMS perspective to locate the org.chromium.chrome.shell process. Select it in the Devices panel and connect the debugger using the "light green bug" icon (not to be confused with the normal debug icon available from the other perspectives).
    • Change to the Debug perspective and set breakpoints as usual.

    Enjoy!


    by eocanha at October 17, 2014 06:00 AM

    October 16, 2014

    Claudio Saavedra

    Thu 2014/Oct/16

    My first memories of Meritähti are from that first weekend, in late August 2008, when I had just arrived in Helsinki to spend what was supposed to be only a couple of months doing GTK+ and Hildon work for Nokia. Lucas, who was still in Finland at the time, had recommended that I check the program for the Night of the Arts, an evening that serves as the closing of the summer season in the Helsinki region and consists of several dozens of street stages set up all over with all kind of performances. It sounded interesting, and I was looking forward to check the evening vibe out.

    I was at the Ruoholahti office that Friday, when Kimmo came over to my desk to invite me to join his mates for dinner. Having the Night of the Arts in mind, I suggested we grab some food downtown before finding an interesting act somewhere, to which he replied emphatically "No! We first go to Meritähti to eat, a place nearby — it's our Friday tradition here." Surprised at the tenacity of his objection and being the new kid in town, I obliged. I can't remember now who else joined us in that summer evening before we headed to the Night of the Arts, probably Jörgen, Marius, and others, but that would be the first of many more to come in the future.

    I started taking part of that tradition and I always thought, somehow, that those Meritähti evenings would continue for a long time. Because even after the whole Hildon team was dismantled, even after many of the people in our gang left Nokia and others moved on to work on the now also defunct MeeGo, we still met in Meritähti once in a while for food, a couple of beers, and good laughs. Even after Nokia closed down the Ruoholahti NRC, even after everyone I knew had left the company, even after the company was sold out, and even after half the people we knew had left the country, we still met there for a good old super-special.

    But those evenings were not bound to be eternal, and like most good things in life, they are coming to an end. Meritähti is closing in the next weeks, and the handful of renegades who stuck in Helsinki will have to find a new place where to spend our evenings together. László, the friendly Hungarian who ran the place with his family, is moving on to less stressful endeavors. Keeping a bar is too much work, he told us, and everyone has the right to one day say enough. One would want to do or say something to change his mind, but what right do we have? We should instead be glad that the place was there for us and that we had the chance to enjoy uncountable evenings under the starfish lamps that gave the place its name. If we're feeling melancholic, we will always have Kaurismäki's Lights in the dusk and that glorious scene involving a dog in the cold, to remember one of those many times when conflict would ensue whenever a careless dog-owner would stop for a pint in the winter.

    Long live Meritähti, long live László, and köszönöm!

    October 16, 2014 05:49 PM

    October 14, 2014

    Enrique Ocaña

    Hacking on Chromium for Android from Eclipse (part 2)

    In the previous post, I showed all the references to get the Chromium for Android source code, set up Eclipse and build the ChromeShell app. Today I'm going to explain how to debug that app running on the device.

    Debugging from command line

    This is the first thing we must make sure works before trying to debug directly from Eclipse. The steps are explained in the debugging-on-android howto, but I'm showing them here for reference.

    Perform the “build Chrome shell” steps but using debug parameters:

     ninja -C out/Debug chrome_shell_apk
     build/android/adb_install_apk.py --apk ChromeShell.apk --debug

    To avoid the need for a rooted Android device, set up ChromeShell as the app to be debugged by going to Android Settings, Debugging, on your device. Now, to launch a gdb debugging session from a console:

     cd ~/ANDROID/src
     . build/android/envsetup.sh
     ./build/android/adb_gdb_chrome_shell --start

    You will see that the adb_gdb script called by adb_gdb_chrome_shell pulls some libraries from your device to /tmp. If everything goes fine, gdb shouldn’t have any problem finding all the symbols of the source code. If not, please check your setup again before trying to debug in Eclipse.

    Debugging from Eclipse

    Ok, this is going to be hacky. Hold on to your hat!

    Eclipse can’t use adb_gdb_chrome_shell and adb_gdb “as is”, because they don’t allow gdb command line parameters. We must create some wrappers in $HOME/ANDROID, our working dir. This means “/home/enrique/ANDROID/” for me. The wrappers are:

    Wrapper 1: adb_gdb

    This is a copy of ~/ANDROID/src/build/android/adb_gdb with some modifications. It computes everything the original does, but doesn't launch gdb. Instead, it creates two symbolic links in ~/ANDROID:

    • gdb is a link to the arm-linux-androideabi-gdb command used internally.
    • gdb.init is a link to the temporary gdb config file created internally.

    These two files will make life simpler for Eclipse. After that, the script prints the actual gdb command that it would have executed (but has not), and reads a line, waiting for ENTER. After the user presses ENTER, it just kills everything. Here are the modifications that you have to make to the original adb_gdb you've copied. Note that my $HOME (~) is "/home/enrique":

     # In the beginning:
     CHROMIUM_SRC=/home/enrique/ANDROID/src
     ...
     # At the end:
     log "Launching gdb client: $GDB $GDB_ARGS -x $COMMANDS"
    
     rm /home/enrique/ANDROID/gdb
     ln -s "$GDB" /home/enrique/ANDROID/gdb
     rm /home/enrique/ANDROID/gdb.init
     ln -s "$COMMANDS" /home/enrique/ANDROID/gdb.init
     echo
     echo "---------------------------"
     echo "$GDB $GDB_ARGS -x $COMMANDS"
     read
    
     exit 0
     # The original invocation below is kept but never reached:
     $GDB $GDB_ARGS -x $COMMANDS &&
     rm -f "$GDBSERVER_PIDFILE"

    Wrapper 2: adb_gdb_chrome_shell

    It’s a copy of ~/ANDROID/src/build/android/adb_gdb_chrome_shell with a simple modification in PROGDIR:

     PROGDIR=/home/enrique/ANDROID

    Wrapper 3: gdbwrapper.sh

    Loads envsetup, returns the gdb version if Eclipse asks for it, and invokes adb_gdb_chrome_shell. This is the script to run in a console before starting the debug session in Eclipse. It will invoke the other scripts and wait for ENTER.

     #!/bin/bash
     cd /home/enrique/ANDROID/src
     . build/android/envsetup.sh
     # Eclipse probes the debugger with --version before launching it
     if [ "X$1" = "X--version" ]
     then
      exec /home/enrique/ANDROID/src/third_party/android_tools/ndk/toolchains/arm-linux-androideabi-4.8/prebuilt/linux-x86_64/bin/arm-linux-androideabi-gdb --version
      exit 0
     fi
     # Call the wrapper created above instead of the original script
     exec ../adb_gdb_chrome_shell --start --debug
     #exec ./build/android/adb_gdb_chrome_shell --start --debug
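
    A detail that is easy to forget: the wrappers must be executable, or both the console session and Eclipse will fail in confusing ways. Assuming the paths used above:

     chmod +x /home/enrique/ANDROID/adb_gdb \
              /home/enrique/ANDROID/adb_gdb_chrome_shell \
              /home/enrique/ANDROID/gdbwrapper.sh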

    Setting up Eclipse to connect to the wrapper

    Now, the Eclipse part. From the “Run, Debug configurations” screen, create a new “C/C++ Application” configuration with these features:

    • Name: ChromiumAndroid 1 (name it as you wish)
    • Main:
      • C/C++ Application: /home/enrique/ANDROID/src/out/Debug/chrome_shell_apk/libs/armeabi-v7a/libchromeshell.so
      • IMPORTANT: From time to time, libchromeshell.so gets corrupted and is truncated to zero size. You must regenerate it by doing:

    rm -rf /home/enrique/ANDROID/src/out/Debug/chrome_shell_apk
    ninja -C out/Debug chrome_shell_apk

      • Project: ChromiumAndroid (the name of your project)
      • Build config: Use active
      • Uncheck “Select config using C/C++ Application”
      • Disable auto build
      • Connect process IO to a terminal
    • IMPORTANT: Change “Using GDB (DSF) Create Process Launcher” and use “Legacy Create Process Launcher” instead. This will enable “gdb/mi” and allow us to set the timeouts to connect to gdb.
    • Arguments: No changes
    • Environment: No changes
    • Debugger:
      • Debugger: gdb/mi
      • Uncheck “Stop on startup at”
      • Main:
        • GDB debugger: /home/enrique/ANDROID/gdb (IMPORTANT!)
        • GDB command file: /home/enrique/ANDROID/gdb.init (IMPORTANT!)
        • GDB command set: Standard (Linux)
        • Protocol: mi2
        • Uncheck: “Verbose console”
        • Check: “Use full file path to set breakpoints”
      • Shared libs:
        • Check: Load shared lib symbols automatically
    • Source: Use the default values without modification (absolute file path, program relative file path, ChromiumAndroid (your project name)).
    • Refresh: Uncheck “Refresh resources upon completion”
    • Common: No changes.

    When you have everything, apply (to save), then close and reopen the configuration.

    Running a debug session

    Now, run gdbwrapper.sh in an independent console. When it pauses and starts waiting for ENTER, switch to Eclipse, press the Debug button and wait for Eclipse to attach to the debugger. The execution will briefly pause in an ioctl() call and then continue.
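
    For reference, starting the session from the console looks like this (paths as in the wrappers above):

     cd /home/enrique/ANDROID
     ./gdbwrapper.sh
     # ... it prints the gdb command it would have run and waits for
     # ENTER; leave it waiting and switch to Eclipse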

    To test that the debugging session is really working, set a breakpoint in content/browser/renderer_host/render_message_filter.cc, at content::RenderMessageFilter::OnMessageReceived, and continue the execution. It should break there. Now, from the Debug perspective, you should be able to see the stack trace and access the local variables.
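
    If you want to double-check the same breakpoint from the plain command-line session of the previous section, the standard gdb syntax should work; a sketch (the exact function signature may differ in your checkout):

     (gdb) break content::RenderMessageFilter::OnMessageReceived
     (gdb) continue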

    Welcome to the wonderful world of Android native code debugging from Eclipse! It’s a bit slow, though.

    This completes the C++ side of this series of posts. In the next post, I will explain how to open the Java code of ChromeShellActivity, so that you will be able to hack on it like you would in a normal Android app project.

    [Screenshot: Chromium for Android debug session in Eclipse]

    by eocanha at October 14, 2014 06:00 AM

    October 11, 2014

    Enrique Ocaña

    Hacking on Chromium for Android from Eclipse (part 1)

    The Chromium Developers website has some excellent resources on how to set up an environment to build Chromium for Linux desktop and for Android. There’s also a detailed guide on how to set up Eclipse as your development environment, enabling you to take advantage of code indexing and enjoy features such as type hierarchy, call hierarchy, macro expansion and references: a lot of tools much better than the poor man’s trick of grepping the code.

    Unfortunately, there are some integration aspects not covered by those guides, so joining all the dots is not a smooth task. In this series of posts, I’m going to explain the missing parts needed to set up a working environment to code and debug Chromium for Android from Eclipse, both C++ and Java code. All the steps and commands from this series of posts have been tested in an Ubuntu Saucy chroot. See my previous post on how to set up a chroot if you want to know how to do this.

    Get the source code

    See the get-the-code guide. Don’t try to reconvert a normal Desktop build into an Android build. It just doesn’t work. The detailed steps to get the code from scratch and prepare the dependencies are the following:

     cd ANDROID # Or the directory you want
     fetch --nohooks android --nosvn=True
     cd src
     git checkout master
     build/install-build-deps.sh
     build/install-build-deps-android.sh
     gclient sync --nohooks

    Configure and generate the project (see AndroidBuildInstructions), from src:

     # Make sure that ANDROID/.gclient has this line:
     # target_os = [u'android']
     # And ANDROID/chromium.gyp_env has this line:
     # { 'GYP_DEFINES': 'OS=android', }
     gclient runhooks
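
    If you prefer to create those two configuration entries from the shell instead of editing the files by hand, a minimal sketch, assuming .gclient doesn’t already define target_os:

     cd ~/ANDROID  # or the directory you chose
     echo "target_os = [u'android']" >> .gclient
     echo "{ 'GYP_DEFINES': 'OS=android', }" > chromium.gyp_env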

    Build Chrome shell, from src:

     # This builds
     ninja -C out/Release chrome_shell_apk
     # This installs in the device
     # Remember the usual stuff to use a new device with adb:
     # http://developer.android.com/tools/device.html 
     # http://developer.android.com/tools/help/adb.html#Enabling
     # Ensure that you can adb shell into the device
     build/android/adb_install_apk.py --apk ChromeShell.apk --release

    If you ever need to update the source code, follow this recipe and use Release or Debug at your convenience:

     git pull origin master
     gclient sync
     # ninja -C out/Release chrome_shell_apk
     ninja -C out/Debug chrome_shell_apk
     # build/android/adb_install_apk.py --apk ChromeShell.apk --release
     build/android/adb_install_apk.py --apk ChromeShell.apk --debug

    As a curiosity, it’s worth mentioning that adb is installed at third_party/android_tools/sdk/platform-tools/adb.
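
    Since that copy of adb is the one matching the checkout, it can be handy to put it in your PATH for the rest of the session; for example, from src:

     export PATH="$PWD/third_party/android_tools/sdk/platform-tools:$PATH"
     adb devices  # sanity check: your device should show up here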

    Configure Eclipse

    To configure Eclipse, follow the instructions in LinuxEclipseDev. They work nicely with Eclipse Kepler.

    In order to open and debug the Java code properly, it’s also interesting to install the ADT plugin in Eclipse. Don’t try to reuse the Android SDK in “third_party/android_tools/sdk”; it seems to be missing some components. Download a fresh standalone SDK from the official page instead and point the ADT plugin to it.

    In the next post, I will explain how to debug C++ code running in the device, both from the command line and from Eclipse.

    [Screenshot: Chromium for Android C++ development in Eclipse]

    by eocanha at October 11, 2014 12:46 AM

    October 02, 2014

    Jacobo Aragunde

    LibreOffice on Android #4 – Document browser revisited

    I’m borrowing the post title that Tomaž and Andrzej used before to talk about the work that I have lately been doing at Igalia regarding LibreOffice on Android.

    You might know there are several projects living under android/experimental in our code tree; it is exciting to see that a new experimental document viewer with a fresh approach recently joined the party, one that can be the basis for an Android editor. I was happy to add support for receiving view or edit intents to the shiny new viewer, so we can open any document from other Android applications like file browsers.

    Besides, android/experimental hid some very interesting work on an Android-centric document browser that could be a good starting point to implement a native Android UI wrapping LibreOffice, although it had some problems that made it unusable. In particular, thumbnail generation was making the application crash – for that reason I’ve disabled it until we get a proper fix – and the code to open a document was broken. Fixing and working around these issues were the first steps to bring the document browser back to life.

    I noticed that the browser was inconveniently dependent on the ActionBarSherlock library, which is not really necessary now that we are targeting modern Android versions with out-of-the-box action bar support. I replaced Sherlock ActionBars with Android native ones, which allowed us to remove all the ABS library files from our source tree.

    I also took the liberty of reorganizing the application resources (design definitions, bitmaps and so on), removing duplicated ones. It was the preparation for the next task…

    Finally, I merged the document browser project into the new viewer with this huge patch, so they can be built and installed together. I also made the modifications for the browser to open documents with the new viewer, so they become one coherent, whole application.

    Now both the viewer and the document browser can evolve together to become a true LibreOffice for Android, which I hope to see in the not-too-distant future.

    LibreOffice document browser screenshot

    by Jacobo Aragunde Pérez at October 02, 2014 10:56 AM

    September 29, 2014

    Juan A. Suárez

    Highlights in Grilo 0.2.11 (and Plugins 0.2.13)

    Hello, readers!

    Some weeks ago we released a new version of Grilo and the plugins set (yes, it sounds like a 70′s music group :) ). You can read the announcements here and here. If you are curious about all the detailed changes, you can take a look at the changelogs here and here.

    But even though you can read that information in the above links, it is always a pleasure when someone highlights the main changes. So let’s go!

    Launch Tool

    Regarding the core system, among the typical bug fixes, I would highlight a new tool: grl-launch. This tool, like others, takes its inspiration from GStreamer’s gst-launch. So far, when you wanted to do some operation in Grilo, like performing a search in YouTube or getting the title of a video on disk, the recommended way was to use the Grilo Test UI. This is a basic application that allows you to perform the typical operations in Grilo, like browsing or searching, all from a graphical interface. The problem is that this tool is not flexible enough, so you can’t control all the details you might require. And while it is useful for visually checking the results, it can’t export them for further processing with another tool.

    So while the Test UI is still very useful, to cover the other cases we have grl-launch. It is a command-line tool that allows you to perform most of the operations available in Grilo with a great degree of control. You can browse, search, resolve the details of a Grilo media element, …, controlling how many elements to skip or return, the metadata keys (title, author, album, …) to retrieve, the flags to use, etc.

    And on top of that, the results can be exported directly to a CSV file, so they can be loaded later into a spreadsheet.

    As an example, getting the first 10 trailers from Apple’s iTunes Movie Trailers site:

    
    $ grl-launch-0.2 browse -c 10 -k title,url grl-apple-trailers
    23 Blast,http://trailers.apple.com/movies/independent/23blast/23blast-tlr_h480p.mov
    A Most Wanted Man,http://trailers.apple.com/movies/independent/amostwantedman/amostwantedman-tlr1_h480p.mov
    ABC's of Death 2,http://trailers.apple.com/movies/magnolia_pictures/abcsofdeath2/abcsofdeath2-tlr3_h480p.mov
    About Alex,http://trailers.apple.com/movies/independent/aboutalex/aboutalex-tlr1b_h480p.mov
    Addicted,http://trailers.apple.com/movies/lionsgate/addicted/addicted-tlr1_h480p.mov
    "Alexander and the Terrible, Horrible, No Good, Very Bad Day",http://trailers.apple.com/movies/disney/alexanderterribleday/alexanderterribleday-tlr1_h480p.mov
    Annabelle,http://trailers.apple.com/movies/wb/annabelle/annabelle-tlr1_h480p.mov
    Annie,http://trailers.apple.com/movies/sony_pictures/annie/annie-tlr2_h480p.mov
    Are You Here,http://trailers.apple.com/movies/independent/areyouhere/areyouhere-tlr1_h480p.mov
    As Above / So Below,http://trailers.apple.com/movies/universal/asabovesobelow/asabovesobelow-tlr1_h480p.mov
    10 results
    

    As said, if you redirect the output to a file and import it into a spreadsheet program as CSV, it will be much easier to read.
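
    The other core operations follow the same pattern. For instance, a search could look roughly like this; take it as a sketch, since I’m assuming the search term goes before the source id and that the YouTube source id is grl-youtube (check grl-launch-0.2 --help for the exact syntax):

     # Search YouTube for "grilo", returning title and URL of the
     # first 5 results
     grl-launch-0.2 search -c 5 -k title,url grilo grl-youtube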

    dLeyna/UPnP plugin

    Regarding the plugins, here is where the fun takes place. Almost all plugins were touched in some way or another, in most cases to fix bugs. But there are other changes I’d like to highlight, and among them, the UPnP plugin is the one that underwent the biggest changes.

    Well, strictly speaking, there is no UPnP plugin any more. Rather, it was replaced by the new dLeyna plugin, written mainly by Emanuele Aina. From a user’s point of view there shouldn’t be big differences, as the new plugin also provides access to UPnP/DLNA sources. So where are the differences?

    First off, let’s specify what dLeyna is. So far, if you wanted to interact with a UPnP source, either you had to deal with the protocol yourself, or use some low-level library, like gupnp. This is what the UPnP plugin was doing. gupnp is still a rather low-level API, but higher and better than dealing with the raw protocol.

    On the other hand, dLeyna, written by the Intel Open Source Technology Center, wraps the UPnP sources with a D-Bus layer. Actually, not only sources, but also UPnP media renderers and controllers, though in our case we are only interested in the UPnP sources. Thanks to dLeyna, you no longer need to interact with low-level UPnP, but with a higher-level D-Bus service, similar to the way we interact with other services in GNOME or on other platforms. This makes it easier to browse or search UPnP sources, and allows us to add new features. dLeyna also hides some details, specific to each UPnP server, that are of no interest to us but that we would otherwise need to deal with when using a lower-level API. The truth is that even though UPnP is quite well specified, no implementation follows the specification 100%: there are always slight differences that create nasty bugs. Here, dLeyna acts (or should act) as a shield, dealing with those differences itself.

    And what is needed to use this new plugin? Basically, having the dleyna-service D-Bus service installed. When the plugin starts, it wakes up the service, which exposes all the available UPnP servers on the network, and the plugin in turn exposes them as Grilo sources. Everything works as it did with the previous UPnP plugin.
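
    If you are curious about the D-Bus layer itself, you can poke the service directly. A sketch, assuming the bus name and Manager interface documented by dLeyna (adjust if your version differs):

     # List the UPnP media servers dLeyna currently sees on the network
     gdbus call --session \
           --dest com.intel.dleyna-server \
           --object-path /com/intel/dLeynaServer \
           --method com.intel.dLeynaServer.Manager.GetServers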

    In any case, I still keep a copy of the old UPnP plugin for reference, in case someone wants to use it or take a look. It is in “unmaintained” mode, so please use the new dLeyna plugin instead.

    Lua Factory plugin

    There aren’t big changes here, except fixes. But I want to mention it here because it is where most of the activity is happening. I must thank Bastien and Victor for the work they are doing. Just to refresh: this plugin allows sources written in Lua to be executed. That is, instead of writing your sources in GObject/C, you can use Lua, and the Lua Factory plugin will load and run them. Writing plugins in Lua is a pleasure, as it allows you to focus on solving the real problems and leave the boilerplate details to the factory. Honestly, if you are considering writing a new source, I would really think about writing it in Lua.

    And that’s all! This is a longer post than usual, but it is nice to explain what’s going on in Grilo. And remember, if you are considering using Grilo in your product, don’t hesitate to contact us.

    Happy week!

    by Juan A. Suárez at September 29, 2014 08:45 AM