Planet Igalia

March 23, 2015

Carlos García Campos

WebKitGTK+ 2.8.0

We are excited and proud to announce WebKitGTK+ 2.8.0: your favorite web rendering engine is now faster, even more stable, and comes with a bunch of new features and improvements.

Gestures

Touch support is one of the most important features missing since WebKitGTK+ 2.0.0. Thanks to the GTK+ gestures API, it’s now more pleasant to use a WebKitWebView on a touch screen. For now only the basic gestures are implemented: pan (scrolling by dragging from any point of the WebView), tap (handling clicks with the finger) and zoom (zooming in/out with two fingers). We plan to add more touch enhancements like kinetic scrolling, overshoot feedback animation, text selection, long press, etc. in future versions.

HTML5 Notifications

[Image: notifications]

Notifications are now transparently supported by WebKitGTK+, using libnotify by default. The default implementation can be overridden by applications to use their own notification system, or simply to disable notifications.
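
For example, an application could take over notifications with something like this (a minimal sketch based on the new WebKitWebView::show-notification signal; my_app_show_notification() is a hypothetical application function):

static gboolean
show_notification_cb (WebKitWebView      *web_view,
                      WebKitNotification *notification,
                      gpointer            user_data)
{
        /* Display the notification using the application's own system;
         * my_app_show_notification() is a hypothetical helper. */
        my_app_show_notification (webkit_notification_get_title (notification),
                                  webkit_notification_get_body (notification));

        /* Returning TRUE stops the default libnotify implementation. */
        return TRUE;
}

g_signal_connect (web_view, "show-notification",
                  G_CALLBACK (show_notification_cb), NULL);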

WebView background color

There’s new API to set the base background color of a WebKitWebView. The given color is used to fill the web view before the actual contents are rendered. This won’t have any visible effect if the web page contents set a background color, of course. If the web view’s parent window has an RGBA visual, we can even use transparent colors.
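
For example (a minimal sketch using the new webkit_web_view_set_background_color() entry point):

GdkRGBA transparent = { 1.0, 1.0, 1.0, 0.0 };

/* The web view is filled with this color before rendering the page;
 * a fully transparent color only shows through if the parent window
 * has an RGBA visual. */
webkit_web_view_set_background_color (web_view, &transparent);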

webkitgtk-2.8-bgcolor

A new WebKitSnapshotOptions flag has also been added to allow taking web view snapshots over a transparent surface, instead of filling the surface with the default background color (opaque white).
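
A sketch of taking such a snapshot with the asynchronous API (assuming the new flag is WEBKIT_SNAPSHOT_OPTIONS_TRANSPARENT_BACKGROUND):

static void
snapshot_ready_cb (WebKitWebView *web_view,
                   GAsyncResult  *result,
                   gpointer       user_data)
{
        cairo_surface_t *surface =
                webkit_web_view_get_snapshot_finish (web_view, result, NULL);

        if (surface) {
                /* The surface keeps its alpha channel instead of an
                 * opaque white background; use it here. */
                cairo_surface_destroy (surface);
        }
}

webkit_web_view_get_snapshot (web_view,
                              WEBKIT_SNAPSHOT_REGION_VISIBLE,
                              WEBKIT_SNAPSHOT_OPTIONS_TRANSPARENT_BACKGROUND,
                              NULL, /* cancellable */
                              (GAsyncReadyCallback)snapshot_ready_cb,
                              NULL);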

User script messages

The communication between the UI process and the Web Extensions is something that we have always left to the users, so that everybody can use their own IPC mechanism. Epiphany and most of the apps use D-Bus for this, and it works perfectly. However, D-Bus is often too much for simple cases where there are only a few messages sent from the Web Extension to the UI process. User script messages make these cases a lot easier to implement and can be used from JavaScript code or using the GObject DOM bindings.

Let’s see how it works with a very simple example:

In the UI process, we register a script message handler using the WebKitUserContentManager and connect to the “script-message-received” signal for the given handler:

webkit_user_content_manager_register_script_message_handler (user_content, 
                                                             "foo");
g_signal_connect (user_content, "script-message-received::foo",
                  G_CALLBACK (foo_message_received_cb), NULL);

Script messages are received in the UI process as a WebKitJavascriptResult:

static void
foo_message_received_cb (WebKitUserContentManager *manager,
                         WebKitJavascriptResult *message,
                         gpointer user_data)
{
        char *message_str;

        message_str = get_js_result_as_string (message);
        g_print ("Script message received for handler foo: %s\n", message_str);
        g_free (message_str);
}
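
Note that get_js_result_as_string() above is not part of the WebKit API but a helper; a possible implementation on top of the JavaScriptCore C API (a sketch that assumes the result is a string; a robust version would check JSValueIsString() first) could be:

static char *
get_js_result_as_string (WebKitJavascriptResult *js_result)
{
        JSGlobalContextRef context;
        JSValueRef value;
        JSStringRef js_string;
        size_t max_size;
        char *str;

        context = webkit_javascript_result_get_global_context (js_result);
        value = webkit_javascript_result_get_value (js_result);

        /* Copy the JavaScript string into a newly allocated UTF-8 buffer. */
        js_string = JSValueToStringCopy (context, value, NULL);
        max_size = JSStringGetMaximumUTF8CStringSize (js_string);
        str = g_malloc (max_size);
        JSStringGetUTF8CString (js_string, str, max_size);
        JSStringRelease (js_string);

        return str;
}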

Sending a message from the web process to the UI process using JavaScript is very easy:

window.webkit.messageHandlers.foo.postMessage("bar");

That will send the message “bar” to the registered foo script message handler. It’s not limited to strings: we can pass to postMessage() any JavaScript value that can be serialized. There’s also a convenient API to send script messages in the GObject DOM bindings:

webkit_dom_dom_window_webkit_message_handlers_post_message (dom_window, 
                                                            "foo", "bar");


Who is playing audio?

WebKitWebView now has a boolean read-only property, is-playing-audio, that is set to TRUE when the web view is playing audio (even if it comes from a video) and to FALSE when the audio is stopped. Browsers can use this to provide visual feedback about which tab is playing audio; Epiphany already does that :-)
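
For example, a browser could update its tab indicator like this (a sketch; update_tab_audio_icon() is a hypothetical application function, and the property has a matching webkit_web_view_is_playing_audio() getter):

static void
playing_audio_changed_cb (WebKitWebView *web_view,
                          GParamSpec    *pspec,
                          gpointer       user_data)
{
        /* Show or hide the tab's audio indicator. */
        update_tab_audio_icon (user_data,
                               webkit_web_view_is_playing_audio (web_view));
}

g_signal_connect (web_view, "notify::is-playing-audio",
                  G_CALLBACK (playing_audio_changed_cb), tab);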

[Image: ephy-is-playing-audio]

HTML5 color input

The color input element is now supported by default: instead of rendering a text field to manually enter the color as a hexadecimal color code, WebKit now renders a color button that, when clicked, shows a GTK+ color chooser dialog. As usual, the public API allows overriding the default implementation to use your own color chooser. MiniBrowser uses a popover, for example.
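
A sketch of such an override using the new WebKitWebView::run-color-chooser signal and WebKitColorChooserRequest (a real implementation would pop up its own chooser instead of picking a fixed color):

static gboolean
run_color_chooser_cb (WebKitWebView             *web_view,
                      WebKitColorChooserRequest *request,
                      gpointer                   user_data)
{
        GdkRGBA color = { 1.0, 0.0, 0.0, 1.0 };

        /* Set the chosen color and complete the request. */
        webkit_color_chooser_request_set_rgba (request, &color);
        webkit_color_chooser_request_finish (request);

        return TRUE; /* We handled the request ourselves. */
}

g_signal_connect (web_view, "run-color-chooser",
                  G_CALLBACK (run_color_chooser_cb), NULL);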

[Image: mb-color-input-popover]

APNG

APNG (Animated PNG) is a PNG extension for creating animated PNGs, similar to GIF but much better, supporting 24-bit images and transparency. Since 2.8, WebKitGTK+ can render APNG files. You can check how it works with the Mozilla demos.

[Image: webkitgtk-2.8-apng]

SSL

The POODLE vulnerability fix introduced compatibility problems with some websites when establishing the SSL connection. Those problems were actually server-side issues (servers incorrectly banning SSL 3.0 record packet versions), but they could be worked around in WebKitGTK+.

WebKitGTK+ already provided a WebKitWebView signal to notify about TLS errors when loading, but only for the connection of the main resource in the main frame. However, it’s still possible that subresources fail due to TLS errors when they use a connection different from the main resource’s. WebKitGTK+ 2.8 gained the WebKitWebResource::failed-with-tls-errors signal to notify when a subresource load fails because of an invalid certificate.
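
A sketch of how this can be used, connecting to the new signal for every resource as it starts loading:

static void
failed_with_tls_errors_cb (WebKitWebResource    *resource,
                           GTlsCertificate      *certificate,
                           GTlsCertificateFlags  errors,
                           gpointer              user_data)
{
        g_warning ("TLS error loading subresource %s",
                   webkit_web_resource_get_uri (resource));
}

static void
resource_load_started_cb (WebKitWebView     *web_view,
                          WebKitWebResource *resource,
                          WebKitURIRequest  *request,
                          gpointer           user_data)
{
        g_signal_connect (resource, "failed-with-tls-errors",
                          G_CALLBACK (failed_with_tls_errors_cb), NULL);
}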

Cipher suites based on RC4 are now disallowed when performing TLS negotiation, because RC4 is no longer considered secure.

Performance: bmalloc and concurrent JIT

bmalloc is a new memory allocator added to WebKit to replace TCMalloc. Apple had already been using it in the Mac and iOS ports for some time with very good results, but it needed some tweaks to work on Linux. WebKitGTK+ 2.8 now also uses bmalloc, which drastically improves the overall performance.

Concurrent JIT was not enabled in the GTK (and EFL) ports for no apparent reason. Enabling it also had an amazing impact on performance.

Both performance improvements were very noticeable in the performance bot:

[Image: webkitgtk-2.8-perf]


The first jump, on February 11th, corresponds to the bmalloc switch, while the second jump, on February 25th, is when concurrent JIT was enabled.

Plans for 2.10

WebKitGTK+ 2.8 is an awesome release, but the plans for 2.10 are quite promising.

  • More security: mixed content will be blocked by default for most resource types. New API will be provided for managing mixed content.
  • Sandboxing: seccomp filters will be used in the different secondary processes.
  • More performance: FTL will be enabled in JavaScriptCore by default.
  • Even more performance: this time on the graphics side, by using the threaded compositor.
  • Blocking plugins API: new API to provide full control over the plugin load process, allowing applications to block/unblock plugins individually.
  • Implementation of the Database process: to bring back IndexedDB support.
  • Editing API: full editing API to allow using a WebView in editable mode with all editing capabilities.

by carlos garcia campos at March 23, 2015 11:56 AM

March 18, 2015

Víctor Jáquez

GStreamer Hackfest 2015

Last weekend was the GStreamer Hackfest in Staines, UK, at Samsung’s premises; Samsung also sponsored the dinners and the lunches. Special thanks to Luis de Bethencourt, the almighty organizer!

My main purpose was to sip one or two pints with the GStreamer folks and, secondarily, to talk about gstreamer-vaapi, WebKitGTK+ and the new OpenGL/ES support in gst-plugins-bad.


About gstreamer-vaapi, there were a couple of questions about some problems seen downstream (in distributions’ stable releases), which I was happy to report are mostly fixed upstream. On the other hand, Sebastian Dröge was worried about the existing support for GStreamer 0.10, and I answered him that its removal is already in the pipeline. He looked pleased.

Related to gstreamer-vaapi and the new GstGL, we tested and merged a patch for GLES2/EGL, so now it is possible to render VA-API decoded video through glimagesink with (nearly) zero-copy. Sadly, this is not currently possible using GLX. Along the way I found a silly bug that came from a previous patch of mine and fixed it; we also fixed another small bug in the GL uploader.

In the WebKitGTK+ realm, I worked on a new functionality: sharing the browser’s OpenGL context and display with the GStreamer pipeline. With it, we could add GL filters into the pipeline. But honour to whom honour is due: this patch is a split of a previous patch by Philippe Normand. The ultimate goal is to ditch the custom video sink in WebKit and reuse glimagesink, with its new off-screen rendering feature.

Finally, on Sunday afternoon, I walked around Richmond, which is beautiful.


Thanks to Igalia, Intel and all the sponsors that made the hackfest and my attendance possible.

by vjaquez at March 18, 2015 06:36 PM

March 17, 2015

Jacobo Aragunde

Creating new document providers in LibreOffice for Android

We recently completed our tasks for The Document Foundation regarding the Android document browser; nonetheless, we had a pending topic regarding the documentation of our work: write and publish a guide to extend the cloud storage integration. This blog post covers how to integrate new cloud solutions using the framework for cloud storage we have implemented.

Writing a document provider

[Image: Document Provider class diagram]

We use the term “document providers” for the sets of classes that implement support for some storage solution. A document provider consists of two classes implementing the IDocumentProvider and IFile interfaces. Both contain extensive in-code documentation of their operations to help anybody implementing them.

The IDocumentProvider interface provides some general operations about the provider, intended to give the service a starting point. getRootDirectory() provides a pointer to the root of the service, while createFromUri() is required to restore the state of the document browser.

The IFile interface is an abstraction of the Java File class, with many similar operations. Those operations will be used by the document browser to display information about the files, browse the directories and open the final documents.

Once those classes have been implemented, the new provider must be linked with the rest of the application by making some modifications to the DocumentProviderFactory class. Modifying the initialize() method to add a new instance of the provider to the providers[] array should be enough:

    // initialize document providers list
    instance.providers = new IDocumentProvider[3];
    instance.providers[0] = new LocalDocumentsDirectoryProvider();
    instance.providers[1] = new LocalDocumentsProvider();
    instance.providers[2] = new OwnCloudProvider(context);

At this point, your provider should appear in the drawer that pops up with a swipe gesture from the left of the screen.

[Image: LibreOffice for Android, provider selection]

You are encouraged to create the classes for your document provider in a separate package inside org.libreoffice.storage. Your operations may throw a RuntimeException in case of error; it will be captured by the UI activity and the message inside the exception will be shown, so make sure that you internationalize the strings using the standard Android API. You can always take a look at the existing providers and use them as examples, especially OwnCloudProvider, which is the most complex one but still quite manageable.
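
Purely as an illustration (the class and method signatures here are simplified; the real contract is documented in the IDocumentProvider and IFile sources), a new provider could start out like this:

    // Hypothetical skeleton; the remaining interface methods are omitted.
    public class MyCloudProvider implements IDocumentProvider {

        public MyCloudProvider(Context context) {
            // Set up clients, read settings, etc.
        }

        @Override
        public IFile getRootDirectory() {
            // Entry point for the document browser.
            return new MyCloudFile(/* root of the remote tree */);
        }

        @Override
        public IFile createFromUri(URI uri) {
            // Rebuild an IFile from a URI to restore the browser state.
            return new MyCloudFile(uri);
        }
    }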

Making use of application settings

If you are implementing a generic provider for some cloud service, it is quite likely that you need some input from the user, like a login name or a password. For that reason we have added an activity for configuration related to document providers.

To add your settings to that screen, modify the file res/xml/documentprovider_preferences.xml and add a new PreferenceCategory containing your own preferences. The android:key attribute will allow you to use the preference from your code; you may want to add that preference string as a constant in DocumentProviderSettingsActivity.

At this point, you will be able to use the preferences in your DocumentProvider using the standard Android API. Take OwnCloudProvider as an example:

    public OwnCloudProvider(Context context) {
        ...
        // read preferences
        SharedPreferences preferences = PreferenceManager.getDefaultSharedPreferences(context);
        serverUrl = preferences.getString(
                DocumentProviderSettingsActivity.KEY_PREF_OWNCLOUD_SERVER, "");

Finally, we have added a way for providers to check if settings have changed; otherwise, the application would have to be restarted for the changes to take effect. Your provider must implement the OnSharedPreferenceChangeListener interface, which brings the onSharedPreferenceChanged() method. That method will be called whenever any preference is changed; you can check the ones you are interested in using the key parameter, and make the required changes to the internal state of your provider.
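
For example, following the standard Android listener interface (a sketch; the serverUrl field matches the OwnCloudProvider snippet above):

    @Override
    public void onSharedPreferenceChanged(SharedPreferences preferences,
                                          String key) {
        // Only react to the settings this provider cares about.
        if (DocumentProviderSettingsActivity.KEY_PREF_OWNCLOUD_SERVER.equals(key)) {
            serverUrl = preferences.getString(key, "");
            // ... reset any cached connection to the server ...
        }
    }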

[Image: Preference listener class diagram]

Future

This effort has been one more step towards building a full-featured version of LibreOffice for Android, and there are improvements we can, or even must, make in the future:

  • With the progress of the work on the LibreOffice Android editor, we will have to add a save operation to the document providers that takes care of uploading the new version of a file to the cloud.
  • It would be a good idea to implement an add-on mechanism to install the document providers. That way we would not add unnecessary weight to the main package, and plugins could be independently distributed.

That’s all for now; you can try the ownCloud provider by building the feature/owncloud-provider-for-android branch yourself, or wait for it to be merged. We hope to see other members of the community taking advantage of this framework to provide new services soon.

by Jacobo Aragunde Pérez at March 17, 2015 11:54 AM

March 09, 2015

Javier Fernández

Content Distribution in CSS Grid Layout

It’s been a while since Igalia and Bloomberg started to implement the Box Alignment specification for the CSS Grid Layout model. Some weeks ago we reached an important milestone of our roadmap by landing in Blink trunk the last patches implementing the Content Distribution properties: align-content and justify-content.

Quoting the CSS Box Alignment document, the content distribution properties are defined as follows:

Aligns the contents of the box as a whole along the box’s inline/row/main axis.
The alignment container is the grid container’s content box. The alignment subjects are the grid tracks.

The CSS syntax of these recently added properties gives an idea of how powerful and flexible they are for grid layout definitions, allowing every possible combination of alignment values:

auto | <baseline-position> | <content-distribution> || [ <overflow-position>? && <content-position> ]

It’s worth mentioning that Baseline Alignment is still not implemented, nor are the <content-distribution> values for Distribution Alignment. However, in the latter case I already have a quite promising draft implementation which, eventually, was very useful to start a discussion inside the W3C community about allowing these alignment values for grid. In previous versions of the specification it was stated that all <content-distribution> values should use their <content-position> fallback values for grid containers. I’m glad that the decision to allow them was finally made, because I think <content-distribution> values are really useful for defining fancier grid layouts. I’ll talk about this soon in a new post, as I consider it deserves a detailed description with proper examples.

Last, but not least, as with Self Alignment, these properties allow using overflow keywords to define how we want to handle the grid’s content overflow. It works the same way for Content Distribution, as we’ll see later with some examples.

Aligning the grid

When there is available space in the grid container block, it’s always useful to have a way to control how we want to use that space and how we want our grid to behave in it. It might happen that the container’s size changes (fullscreen mode), or we could have to deal with a content-sized grid whose content changes its size. There are quite a few possibilities, so I’ll leave this issue to the user/designer’s imagination and focus on very simple examples to illustrate the concept.

For now, let’s consider this case to understand what you can do with the different <content-alignment> values in a grid layout.

.grid {
    grid: 50px 50px / 100px 100px;
    position: relative;
    width: 200px;
    height: 300px;
}
.fixedSize {
    width: 20px;
    height: 40px;
}
<div class="grid">
   <div class="fixedSize" style="grid-column: 1; grid-row: 1; background: violet;"></div>
   <div style="grid-column: 1; grid-row: 2; background: yellow;"></div>
   <div style="grid-column: 2; grid-row: 1; background: green;"></div>
   <div class="fixedSize" style="grid-column: 2; grid-row: 2; background: red;"></div>
</div>

We are defining a 2×2 grid whose cells are 100px wide and 50px tall, where we are going to place 4 items, one in each cell. Notice that the items at (1,1) and (2,2) have a fixed size of 20×40 pixels, while the other 2 are auto-sized, so they will be stretched to fill their corresponding grid cell (if you don’t know why, reading the previous post might help). Also, bear in mind that both align-content and justify-content have start as their initial value for grids.

[Image: ContentAlignment]

Controlling the grid overflow

When the grid content’s size exceeds the container’s dimensions there is a risk of data loss. Some examples of this scenario are center or end alignment from the viewport’s edges: all the content overflowing the viewport’s area can’t be reached, hence we lose such data. In order to prevent this issue, the Box Alignment specification defines the safe overflow mode, which basically forces a start alignment value for the property handling the dimension where the overflow is detected.
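
For instance, with the <overflow-position> keyword from the draft, a sketch like this would center the tracks only while they fit:

.grid {
    /* Center the grid tracks when there is spare space, but fall
       back to start alignment when they would overflow, so no part
       of the grid becomes unreachable. */
    align-content: safe center;
    justify-content: safe center;
}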

Using the same CSS and HTML code as in the example above, we can easily define cases where this data-loss issue (red arrows) is clearly noticeable, simply by modifying the height or width to cause top or left overflow, respectively.

[Image: Content-Alignment-Overflow1]

There are other situations where Content Alignment and overflow interact in a different way, using margins, padding and/or borders, and even combining different writing modes and flow directions. The effect of the alignment values varies considerably depending on those factors, but I think you now have a clear idea of how to use these new properties in a grid layout.

Current status and next steps

With the grid support for the align-content and justify-content CSS properties in Blink, we’ve got most of the Box Alignment specification covered. As mentioned before, just Baseline Alignment is still pending to be implemented in Chromium browsers. I have to admit that there are also some bugs and wrong behavior with certain CSS combinations, especially regarding orthogonal flows, but we are working on them right now and I hope to land the patches in trunk soon.

For the time being, let’s consider the following table as the current implementation status of the Box Alignment specification for the Grid Layout model in WebKit (Safari/Epiphany) and Blink (Chrome/Chromium/Opera) based browsers:

[Image: align-grid-support-1]

The lack of progress in the implementation of the Box Alignment specification in the WebKit web engine is disappointing. I’ve been stuck for quite a lot of time trying to upgrade the CSS properties to the last version of the spec, mainly due to design and performance issues. I’ll discuss with the WebKit hackers the best approach to solve this, so I can put the Grid Layout implementation at the same level as in the Blink web engine.

Igalia and Bloomberg will continue working on the implementation of the CSS Grid Layout specification, and among my short/mid-term challenges is completing the Box Alignment support. These goals include the following tasks:

  • Fixing bugs and completing the orthogonal flows support.
  • Implementing the Baseline Alignment features.
  • Completing the Content Distribution Alignment with the <content-distribution> values.
  • Implementing the Box Alignment spec in WebKit.
[Image: Igalia & Bloomberg logos]

Igalia and Bloomberg working to build a better web platform

by jfernandez at March 09, 2015 01:58 PM

March 06, 2015

Iago Toral

An introduction to Mesa’s GLSL compiler (II)

Recap

My previous post served as an initial look into Mesa’s GLSL compiler, where we discussed the Mesa IR, which is a core aspect of the compiler. In this post I’ll introduce another relevant aspect: IR lowering.

IR lowering

There are multiple lowering passes implemented in Mesa (check src/glsl/lower_*.cpp for a complete list) but they all share a common denominator: their purpose is to re-write certain constructs in the IR so they better fit the underlying GPU hardware.

In this post we will look into the lower_instructions.cpp lowering pass, which replaces expression operations that may not be supported directly by GPU hardware with alternative implementations.

The lowering process involves traversing the IR, identifying the instructions we want to lower and modifying the IR accordingly, which fits well into the visitor pattern strategy discussed in my previous post. In this case, expression lowering is handled by the lower_instructions_visitor class, which implements the lowering pass in the visit_leave() method for ir_expression nodes.

The hierarchical visitor class, which serves as the base class for most visitors in Mesa, defines visit() methods for leaf nodes in the IR tree, and visit_leave()/visit_enter() methods for non-leaf nodes. This way, when traversing intermediary nodes in the IR we can decide to take action as soon as we enter them or when we are about to leave them.

In the case of our lower_instructions_visitor class, the visit_leave() method implementation is a large switch() statement with one case per operator that it can lower.
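
Abridged, its shape is roughly the following (only a couple of the many cases are shown here; see lower_instructions.cpp for the full list):

ir_visitor_status
lower_instructions_visitor::visit_leave(ir_expression *ir)
{
   switch (ir->operation) {
   case ir_binop_sub:
      if (lowering(SUB_TO_ADD_NEG))
         sub_to_add_neg(ir);
      break;
   case ir_binop_mod:
      if (lowering(MOD_TO_FLOOR) && ir->type->is_float())
         mod_to_floor(ir);
      break;
   /* ... one case per lowerable operation ... */
   default:
      break;
   }
   return visit_continue;
}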

The code in this file lowers common scenarios that are expected to be useful for most GPU drivers, but individual drivers can still select which of these lowering passes they want to use. For this purpose, hardware drivers create instances of the lower_instructions class passing the list of lowering passes to enable. For example, the Intel i965 driver does:

const int bitfield_insert = brw->gen >= 7
                            ? BITFIELD_INSERT_TO_BFM_BFI
                            : 0;
lower_instructions(shader->base.ir,
                   MOD_TO_FLOOR |
                   DIV_TO_MUL_RCP |
                   SUB_TO_ADD_NEG |
                   EXP_TO_EXP2 |
                   LOG_TO_LOG2 |
                   bitfield_insert |
                   LDEXP_TO_ARITH);

Notice how in the case of Intel GPUs, one of the lowering passes is conditionally selected depending on the hardware involved. In this case, brw->gen >= 7 selects GPU generations since IvyBridge.

Let’s have a look at the implementation of some of these lowering passes. For example, SUB_TO_ADD_NEG is a very simple one that transforms subtractions into negative additions:

void
lower_instructions_visitor::sub_to_add_neg(ir_expression *ir)
{
   ir->operation = ir_binop_add;
   ir->operands[1] =
      new(ir) ir_expression(ir_unop_neg, ir->operands[1]->type,
                            ir->operands[1], NULL);
   this->progress = true;
}

As we can see, the lowering pass simply changes the operator used by the ir_expression node and negates the second operand using the unary negate operator (ir_unop_neg), thus converting the original a = b - c into a = b + (-c).

Of course, if a driver does not have native support for the subtraction operation, it could still do this when it processes the IR to produce native code, but this way Mesa saves driver developers that work. Also, some lowering passes may enable optimization passes after the lowering that drivers might otherwise miss.

Let’s see a more complex example: MOD_TO_FLOOR. In this case the lowering pass provides an implementation of ir_binop_mod (modulo) for GPUs that don’t have a native modulo operation.

The modulo operation takes two operands (op0, op1) and implements the C equivalent of ‘op0 % op1’, that is, it computes the remainder of the division of op0 by op1. To achieve this, the lowering pass rewrites the modulo operation as mod(op0, op1) = op0 - op1 * floor(op0 / op1), which requires only multiplication, division and subtraction. This is the implementation:

ir_variable *x = new(ir) ir_variable(ir->operands[0]->type, "mod_x",
                                     ir_var_temporary);
ir_variable *y = new(ir) ir_variable(ir->operands[1]->type, "mod_y",
                                     ir_var_temporary);
this->base_ir->insert_before(x);
this->base_ir->insert_before(y);

ir_assignment *const assign_x =
   new(ir) ir_assignment(new(ir) ir_dereference_variable(x),
                         ir->operands[0], NULL);
ir_assignment *const assign_y =
   new(ir) ir_assignment(new(ir) ir_dereference_variable(y),
                         ir->operands[1], NULL);

this->base_ir->insert_before(assign_x);
this->base_ir->insert_before(assign_y);

ir_expression *const div_expr =
   new(ir) ir_expression(ir_binop_div, x->type,
                         new(ir) ir_dereference_variable(x),
                         new(ir) ir_dereference_variable(y));

/* Don't generate new IR that would need to be lowered in an additional
 * pass.
 */
if (lowering(DIV_TO_MUL_RCP) && (ir->type->is_float() ||
    ir->type->is_double()))
   div_to_mul_rcp(div_expr);

ir_expression *const floor_expr =
   new(ir) ir_expression(ir_unop_floor, x->type, div_expr);

if (lowering(DOPS_TO_DFRAC) && ir->type->is_double())
   dfloor_to_dfrac(floor_expr);

ir_expression *const mul_expr =
   new(ir) ir_expression(ir_binop_mul,
                         new(ir) ir_dereference_variable(y),
                         floor_expr);

ir->operation = ir_binop_sub;
ir->operands[0] = new(ir) ir_dereference_variable(x);
ir->operands[1] = mul_expr;
this->progress = true;

Notice how the first thing this does is to assign each operand to a variable. The reason for this is a bit tricky: since we are going to implement ir_binop_mod as op0 - op1 * floor(op0 / op1), we will need to refer to the IR nodes op0 and op1 twice in the tree. However, we can’t just do that directly, because that would mean having the same node (that is, the same pointer) linked from two different places in the IR expression tree. That is, we want to have this tree:

                             sub
                           /     \
                        op0       mult
                                 /    \
                              op1     floor
                                        |
                                       div
                                      /   \
                                   op0     op1

Instead of this other tree:


                             sub
                           /     \
                           |      mult
                           |     /   \
                           |   floor  |
                           |     |    |
                           |    div   |
                           |   /   \  |
                            op0     op1   

This second version of the tree is problematic. For example, let’s say that a hypothetical optimization pass detects that op1 is a constant integer with value 1, and realizes that in this case div(op0, op1) == op0. When doing that optimization, our div subtree is removed, and with it, op1 could be removed too (and possibly freed), leaving the other reference to that operand in the IR pointing to an invalid memory location… we have just corrupted our IR:

                             sub
                           /     \
                           |      mult
                           |     /    \
                           |   floor   op1 [invalid pointer reference]
                           |     |
                           |    /
                           |   /
                            op0 

Instead, what we want to do here is to clone the nodes each time we need a new reference to them in the IR. All IR nodes have a clone() method for this purpose. However, in this particular case, cloning the nodes creates a new problem: op0 and op1 are ir_expression nodes so, for example, op0 could be the expression a + b * c, and cloning it would replicate the expression. This, at best, will lead to slower compilation times due to optimization passes needing to detect and fix that, and at worst, it would go undetected by the optimizer and lead to worse performance where we compute the value of the expression multiple times:

                              sub
                           /        \
                         add         mult
                        /   \       /    \
                      a     mult  op1     floor
                            /   \          |
                           b     c        div
                                         /   \
                                      add     op1
                                     /   \
                                    a    mult
                                        /    \
                                        b     c

The solution to this problem is to assign the expression to a variable, then dereference that variable (i.e., read its value) wherever we need it. Thus, the implementation defines two variables (x, y), assigns op0 and op1 to them and creates new dereference nodes wherever we need to access the value of the op0 and op1 expressions:

                       =               =
                     /   \           /   \
                    x     op0       y     op1


                             sub
                           /     \
                         *x       mult
                                 /    \
                               *y     floor
                                        |
                                       div
                                      /   \
                                    *x     *y

In the diagram above, each variable dereference is marked with an ‘*’, and each one is a new IR node (so the two appearances of ‘*x’ are different IR nodes, representing two different reads of the same variable). With this solution we only evaluate the op0 and op1 expressions once (when they get assigned to the corresponding variables) and we never refer to the same IR node twice from different places (since each variable dereference is a new IR node).

Now that we know why we assign these two variables, let’s continue looking at the code of the lowering pass:

In the next step we implement op0 / op1 using an ir_binop_div expression. To speed up compilation, if the driver has the DIV_TO_MUL_RCP lowering pass enabled, which transforms a / b into a * (1 / b) (where 1 / b could be a native instruction), we immediately execute that lowering pass on the new expression. If we didn’t do this here, the resulting IR would contain a division operation that might have to be lowered again in a later pass, making the compilation process slower.

The next step uses an ir_unop_floor expression to compute floor(op0 / op1) and, again, tests whether this operation should be lowered too, which might be the case if the type of the operands is a 64-bit double instead of a regular 32-bit float, since GPUs may only have a native floor instruction for 32-bit floats.

Next, we multiply the result by op1 to get op1 * floor(op0 / op1).

Now we only need to subtract this from op0, which would be the root IR node for this expression. Since we want the new IR subtree spawning from this root node to replace the old implementation, we directly edit the IR node we are lowering: we replace the ir_binop_mod operator with ir_binop_sub, put a dereference to x (the variable holding op0) in the first operand and link the expression holding op1 * floor(op0 / op1) as the second operand, effectively attaching our new implementation in place of the old version. This is how the original and lowered IRs look:

Original IR:

[prev inst] -> mod -> [next inst]
              /   \            
           op0     op1         

Lowered IR:

[prev inst] -> var x -> var y ->   =   ->   =   ->   sub   -> [next inst]
                                  / \      / \      /   \
                                 x  op0   y  op1  *x     mult
                                                        /    \
                                                      *y      floor
                                                                |
                                                               div
                                                              /   \
                                                            *x     *y

Finally, we set progress to true to let the compiler know that we have modified the IR and, as a consequence, have introduced new nodes that may be subject to further lowering passes, so it can run a new pass. For example, the subtraction we just added may be lowered again to a negative addition, as we saw before.

Coming up next

Now that we have learnt about lowering passes we can also discuss optimization passes, which are very similar: they are also based on the visitor implementation in Mesa and transform the Mesa IR in a similar way.

by Iago Toral at March 06, 2015 02:06 PM

Samuel Iglesias

My FOSDEM 2015 talk’s slides and video

As I said in a previous post, this year I attended FOSDEM to give a talk about how to test our OpenGL drivers with Free Software. The slides are available online and the video recording has recently been released.

Last but not least, I would like to thank Igalia for sponsoring my trip :-)

[Image: Igalia logo]

by Samuel Iglesias at March 06, 2015 08:25 AM

March 05, 2015

Jacobo Aragunde

New features in LibreOffice for Android document browser

The Document Foundation recently assigned one of the packages of the Android tender to Igalia; in particular, the one about cloud storage and email sharing. Our proposal comprised the following tasks:

  • Integrate the “share” feature of the Android framework to be able to send documents by email, Bluetooth or any other means provided by the system.
  • Provide the means for the community to develop integration of cloud storage solutions.
  • Implement ownCloud integration as an example of how to integrate other cloud solutions.
  • Extensive documentation of the process to integrate more cloud solutions.

The work is completed and the patches are available in the repository; most of them are already merged in master, while ownCloud support lives in a different branch for now.

Sharing documents

The Android-provided share feature allows sending a document not only through email but also through Bluetooth or any other available method, depending on the software installed on your device.

We have made this feature available to users through a context menu in the document browser, which pops up after a long press on a document.
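
Under the hood, this maps to the standard Android share intent; a sketch of the idea (the exact MIME type handling in the app may differ):

    // Build a share intent for a document file.
    Intent intent = new Intent(Intent.ACTION_SEND);
    intent.setType("application/vnd.oasis.opendocument.text");
    intent.putExtra(Intent.EXTRA_STREAM, Uri.fromFile(documentFile));

    // Let the user pick email, Bluetooth or any other installed target.
    activity.startActivity(Intent.createChooser(intent, "Share document"));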

[Image: Context menu in Android document browser]

[Image: Share from the Android document browser]

Support for cloud storage solutions

This task consisted of creating an interface to develop integration of any cloud storage solution. The first step was abstracting the code that made direct access to the file system, so it could be replaced by the different implementations of storage services, which from now on will be called document providers.

Afterwards, we created two document providers for local storage: one to access the internal storage of the device and another one to conveniently access the Documents directory inside that storage. These two simple providers served as a test for the UI to switch between them; we used the Android drawer widget, which pops up with a swipe gesture from the left of the screen.

[Image: Side drawer in Android document browser]

All the operations in the Android document browser were being performed in the same thread. Besides this being suboptimal, the development framework actually forbids running network code in the main thread of the application. The next step for us was isolating the code that might need network access when interacting with a cloud provider, and running it in separate threads.
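
The pattern is the usual Android one (a sketch; listFiles() and refreshFileList() are illustrative names, not the actual LibreOffice code):

    // Run the potentially-networked provider call off the UI thread.
    new Thread(new Runnable() {
        @Override
        public void run() {
            final List<IFile> files =
                    provider.getRootDirectory().listFiles();

            // Post the result back to the UI thread to update the view.
            activity.runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    refreshFileList(files);
                }
            });
        }
    }).start();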

ownCloud document provider

At that point, we had everything in place to write the code to access an ownCloud server. We did it with the help of an Android library provided by ownCloud developers.

There was still another task, though; any cloud service will likely need some configuration from the user, such as login credentials. We had to implement a preferences screen to enter these settings and do the proper wiring so the provider can listen for any changes in them.

[Image: ownCloud settings screen]

Documentation

To help other developers write new document providers, we have tried to document the new code in detail, especially those interfaces that must be implemented to create new document providers. Besides, we will soon publish a document here explaining how to extend the cloud storage integration.

That’s all for now; to try the ownCloud provider you will have to build the feature/owncloud-provider-for-android branch yourself, while the share feature is already in the packages available in the Play Store or F-Droid. Hope you enjoy it!

by Jacobo Aragunde Pérez at March 05, 2015 03:46 PM

March 03, 2015

Iago Toral

An introduction to Mesa’s GLSL compiler (I)

Recap

In my last post I explained that modern 3D pipelines are programmable and how this has impacted graphics drivers. In the following posts we will go deeper into this aspect by looking at different parts of Mesa’s GLSL compiler. Specifically, this post will cover the GLSL parser, the Mesa IR and built-in variables and functions.

The GLSL parser

The job of the parser is to process the shader source code string provided via glShaderSource and transform it into a suitable binary representation that is stored in RAM and can be efficiently processed by other parts of the compiler in later stages.

The parser consists of a set of Lex/Yacc rules to process the incoming shader source. The lexer (glsl_parser.ll) takes care of tokenizing the source code and the parser (glsl_parser.yy) adds meaning to the stream of tokens identified in the lexer stage.

Just like C or C++, GLSL includes a pre-processor that goes through the shader source code before the main parser kicks in. Mesa’s implementation of the GLSL pre-processor lives in src/glsl/glcpp and is also based on Lex/Yacc rules.

The output of the parser is an Abstract Syntax Tree (AST), a binary representation of the shader source code that lives in RAM. The nodes that make up this tree are defined in src/glsl/ast.h.

For someone familiar with all the Lex/Yacc stuff, the parser implementation in Mesa should feel familiar enough.

The next step takes care of converting from the AST to a different representation that is better suited for the kind of operations that drivers will have to do with it. This new representation, called the IR (Intermediate Representation), is usually referenced in Mesa as Mesa IR, GLSL IR or simply HIR.

The AST to Mesa IR conversion is driven by the code in src/glsl/ast_to_hir.cpp.

Mesa IR

The Mesa IR is the main data structure used in the compiler. Most of the work that the compiler does can be summarized as:

  • Optimizations in the IR
  • Modifications in the IR for better/easier integration with GPU hardware
  • Linking multiple shaders (multiple IR instances) into a single program.
  • Generating native assembly code for the target GPU from the IR

As we can see, the Mesa IR is at the core of all the work that the compiler has to do, so understanding how it is setup is necessary to work in this part of Mesa.

The nodes in the Mesa IR tree are defined in src/glsl/ir.h. Let’s have a look at the most important ones:

At the top of the class hierarchy for the IR nodes we have exec_node, which is Mesa’s way of linking independent instructions together in a list to make a program. This means that each instruction has previous and next pointers to the instructions that are before and after it, respectively. So ir_instruction, the base class for all nodes in the tree, inherits from exec_node.

Another important node is ir_rvalue, which is the base class used to represent expressions. Generally, anything that can go on the right side of an assignment is an ir_rvalue. Subclasses of ir_rvalue include ir_expression, used to represent all kinds of unary, binary or ternary operations (supported operators are defined in the ir_expression_operation enumeration), ir_texture, which is used to represent texture operations like a texture lookup, ir_swizzle, which is used for swizzling values in vectors, all the ir_dereference nodes, used to access the values stored in variables, arrays, structs, etc. and ir_constant, used to represent constants of all basic types (bool, float, integer, etc).

We also have ir_variable, which represents variables in the shader code. Notice that the definition of ir_variable is quite large… in fact, this is by far the node with the biggest impact on the memory footprint of the compiler when compiling shaders in large games/applications. Also notice that the IR differentiates between variables and variable dereferences (the act of looking into a variable’s value), which are represented as an ir_rvalue.

Similarly, the IR also defines nodes for other language constructs like ir_loop, ir_if, ir_assignment, etc.

Debugging the IR is not easy, since the representation of a shader program in IR nodes can be quite complex to traverse and inspect with a debugger. To help with this, Mesa provides a means to print the IR in a human-readable text format. We can enable this by using the environment variable MESA_GLSL=dump, which instructs Mesa to print both the original shader source code and its IR representation. For example:

$ MESA_GLSL=dump ./test_program

GLSL source for vertex shader 1:
#version 140
#extension GL_ARB_explicit_attrib_location : enable

layout(location = 0) in vec3 inVertexPosition;
layout(location = 1) in vec3 inVertexColor;

uniform mat4 MVP;
smooth out vec3 out0;

void main()
{
  gl_Position = MVP * vec4(inVertexPosition, 1);
  out0 = inVertexColor;
}

GLSL IR for shader 1:
(
(declare (sys ) int gl_InstanceID)
(declare (sys ) int gl_VertexID)
(declare (shader_out ) (array float 0) gl_ClipDistance)
(declare (shader_out ) float gl_PointSize)
(declare (shader_out ) vec4 gl_Position)
(declare (uniform ) (array vec4 56) gl_CurrentAttribFragMESA)
(declare (uniform ) (array vec4 33) gl_CurrentAttribVertMESA)
(declare (uniform ) gl_DepthRangeParameters gl_DepthRange)
(declare (uniform ) int gl_NumSamples)
(declare () int gl_MaxVaryingComponents)
(declare () int gl_MaxClipDistances)
(declare () int gl_MaxFragmentUniformComponents)
(declare () int gl_MaxVaryingFloats)
(declare () int gl_MaxVertexUniformComponents)
(declare () int gl_MaxDrawBuffers)
(declare () int gl_MaxTextureImageUnits)
(declare () int gl_MaxCombinedTextureImageUnits)
(declare () int gl_MaxVertexTextureImageUnits)
(declare () int gl_MaxVertexAttribs)
(declare (shader_in ) vec3 inVertexPosition)
(declare (shader_in ) vec3 inVertexColor)
(declare (uniform ) mat4 MVP)
(declare (shader_out smooth) vec3 out0)
(function main
  (signature void
    (parameters
    )
    (
      (declare (temporary ) vec4 vec_ctor)
      (assign  (w) (var_ref vec_ctor)  (constant float (1.000000)) ) 
      (assign  (xyz) (var_ref vec_ctor)  (var_ref inVertexPosition) ) 
      (assign  (xyzw) (var_ref gl_Position)
            (expression vec4 * (var_ref MVP) (var_ref vec_ctor) ) ) 
      (assign  (xyz) (var_ref out0)  (var_ref inVertexColor) ) 
    ))
)
)

Notice, however, that the IR representation we get is not the one that is produced by the parser. As we will see later, that initial IR will be modified in multiple ways by Mesa, for example by adding different kinds of optimizations, so the IR that we see is the result after all these processing passes over the original IR. Mesa refers to this post-processed version of the IR as LIR (low-level IR) and to the initial version of the IR as produced by the parser as HIR (high-level IR). If we want to print the HIR (or any intermediary version of the IR as it transforms into the final LIR), we can edit the compiler and add calls to _mesa_print_ir as needed.

Traversing the Mesa IR

We mentioned before that some of the compiler’s work (a big part, in fact) has to do with optimizations and modifications of the IR. This means that the compiler needs to traverse the IR tree and identify subtrees that are relevant to this kind of operations. To achieve this, Mesa uses the visitor design pattern.

Basically, the idea is that we have a visitor object that can traverse the IR tree and we can define the behavior we want to execute when it finds specific nodes.

For instance, there is a very simple example of this in src/glsl/linker.cpp: find_deref_visitor, which detects if a variable is ever read. This involves traversing the IR, identifying ir_dereference_variable nodes (the ones where a variable’s value is accessed) and check if the name of that variable matches the one we are looking for. Here is the visitor class definition:

/**
 * Visitor that determines whether or not a variable is ever read.
 */
class find_deref_visitor : public ir_hierarchical_visitor {
public:
   find_deref_visitor(const char *name)
      : name(name), found(false)
   {
      /* empty */
   }

   virtual ir_visitor_status visit(ir_dereference_variable *ir)
   {
      if (strcmp(this->name, ir->var->name) == 0) {
         this->found = true;
         return visit_stop;
      }

      return visit_continue;
   }

   bool variable_found() const
   {
      return this->found;
   }

private:
   const char *name;       /**< Find reads of a variable with this name. */
   bool found;             /**< Was a read of the variable found? */
};

And this is how we get to use this, for example to check if the shader code ever reads gl_Vertex:

find_deref_visitor find("gl_Vertex");
find.run(sh->ir);
if (find.variable_found()) {
  (...)
}

Most optimization and lowering passes in Mesa are implemented as visitors and follow a similar idea. We will look at examples of these in a later post.

Built-in variables and functions

GLSL defines a set of built-in variables (with ‘gl_’ prefix) for each shader stage which Mesa injects into the shader code automatically. If you look at the example where we used MESA_GLSL=dump to obtain the generated Mesa IR you can see some of these variables.

Mesa implements support for built-in variables in _mesa_glsl_initialize_variables(), defined in src/glsl/builtin_variables.cpp.

Notice that some of these variables are common to all shader stages, while some are specific to particular stages or available only in specific versions of GLSL.

Depending on the type of variable, Mesa or the hardware driver may be able to provide the value immediately (for example for variables holding constant values like gl_MaxVertexAttribs or gl_MaxDrawBuffers). Otherwise, the driver will probably have to fetch (or generate) the value for the variable from the hardware at program run-time by generating native code that is added to the user program. For example, a geometry shader that uses gl_PrimitiveID will need that variable updated for each primitive processed by the Geometry Shader unit in a draw call. To achieve this, a driver might have to generate native code that fetches the current primitive ID value from the hardware and stores it in the register that provides the storage for the gl_PrimitiveID variable before the user code is executed.

The GLSL language also defines a number of built-in functions that must be provided by implementors, like texture(), mix(), or dot(), to name a few examples. The entry point in Mesa’s GLSL compiler for built-in functions is src/glsl/builtin_functions.cpp.

The method builtin_builder::create_builtins() takes care of registering built-in functions, and just like with built-in variables, not all functions are always available: some functions may only be available in certain shading units, others may only be available in certain GLSL versions, etc. For that purpose, each built-in function is registered with a predicate that can be used to test if that function is at all available in a specific scenario.

Built-in functions are registered by calling the add_function() method, which registers all versions of a specific function, for example mix() for float, vec2, vec3, vec4, etc. Each of these versions has its own availability predicate. For instance, mix() is always available for float arguments, but using it with integers requires GLSL 1.30 and the EXT_shader_integer_mix extension.

Besides the availability predicate, add_function() also takes an ir_function_signature, which tells Mesa about the specific signature of the function being registered. Notice that when Mesa creates signatures for the functions it also defines the function body. For example, the following code snippet defines the signature for modf():

ir_function_signature *
builtin_builder::_modf(builtin_available_predicate avail,
                       const glsl_type *type)
{
   ir_variable *x = in_var(type, "x");
   ir_variable *i = out_var(type, "i");
   MAKE_SIG(type, avail, 2, x, i);

   ir_variable *t = body.make_temp(type, "t");
   body.emit(assign(t, expr(ir_unop_trunc, x)));
   body.emit(assign(i, t));
   body.emit(ret(sub(x, t)));

   return sig;
}

GLSL’s modf() splits a number in its integer and fractional parts. It assigns the integer part to an output parameter and the function return value is the fractional part.

The signature we see above defines an input parameter ‘x’ of type ‘type’ (the number we want to split), an output parameter ‘i’ of the same type (which will hold the integer part of ‘x’) and a return value of type ‘type’.

The function implementation is based on the existence of the unary operator ir_unop_trunc, which can take a number and extract its integer part. Then it computes the fractional part by subtracting that from the original number.

When the modf() built-in function is used, the call will be expanded to include this IR code, which will later be transformed into native code for the GPU by the corresponding hardware driver. In this case, it means that the hardware driver is expected to provide an implementation of the ir_unop_trunc operator, which in the case of the Intel i965 driver is implemented as a single hardware instruction (see brw_vec4_visitor.cpp or brw_fs_visitor.cpp in src/mesa/drivers/dri/i965).

In some cases, the implementation of a built-in function can’t be defined at the IR level. In this case the implementation simply emits an ad-hoc IR node that drivers can identify and expand appropriately. An example of this is EmitVertex() in a geometry shader. This is not really a function call in the traditional sense, but a way to signal the driver that we have defined all the attributes of a vertex and it is time to “push” that vertex into the current primitive. The meaning of “pushing the vertex” is something that can’t be defined at the IR level because it will be different for each driver/hardware. Because of that, the built-in function simply injects an IR node ir_emit_vertex that drivers can identify and implement properly when the time comes. In the case of the Intel code, pushing a vertex involves a number of steps that are very intertwined with the hardware, but it basically amounts to generating native code that implements the behavior that the hardware expects for that to happen. If you are curious, the implementation of this in the i965 driver code can be found in brw_vec4_gs_visitor.cpp, in the visit() method that takes an ir_emit_vertex IR node as parameter.

Coming up next

In this post we discussed the parser, which is the entry point for the compiler, and introduced the Mesa IR, the main data structure. In following posts we will delve deeper into the GLSL compiler implementation. Specifically, we will look into the lowering and optimization passes as well as the linking process and the hooks for hardware drivers that deal with native code generation.

by Iago Toral at March 03, 2015 08:28 AM

March 02, 2015

Lorenzo Tilve

Fosdem 2015

[Image: Brussels]

This is as good an excuse as any to bring the blog back to life.

It has been a long, long time since I last wrote here, and I have been doing quite different things since then. As part of the Igalia browsers team, I have been working on WebKit-related projects (mostly WebKitGTK+ and Epiphany), contributing to Chromium and Blink, and dealing with the coverage of the Media Source Extensions specification.

Now, going back to the reason for this post: it was my first year both at FOSDEM and in Brussels. I had an idea about the size of the event, but it was indeed quite impressive.

This year we had four talks from Igalians there, sharing the state of a few things that some of our teams have been working on. In this case, we talked about OpenGL conformance validation, our improvements on performance and testing with Pflua, and the progress of LibreOffice for Android.

All four Igalia talks at FOSDEM 2015 were really cool, and they drew a lot of attendees (well done, mates :D).

Also, after digesting the huge FOSDEM schedule, my plan was to get the most out of the profiling and multimedia-related tracks, plus a few other talks that caught my attention.

Saturday

On top of my colleagues’ presentations, and part of the introductory talk by Karen Sandler, I could attend Valgrind Integration in the Eclipse IDE; What is wrong with Operating Systems; Tuning Valgrind for your Workload; GStreamer in the living room and in outer space; How to start hacking on Valgrind by example; and How to record all TV. Creating a 30 channel DVR.

There were some useful tricks at the Valgrind track and a nice overview of GStreamer portability. It was also curious to find RMS in the exhibition area. After a long conference day, it was Delirium Tremens time.

Sunday

My schedule for Sunday was Ubuntu on phones and beyond; Mobile == Web; Kodi mediacenter (XBMC) past, present and future; It’s not a bug, it’s an environment problem; Servo (the parallel web browser) and YOU!; and Living on Mars: A Beginner’s Guide.

I found particularly interesting the one by Stormy Peters on how limited the understanding of the web is going to be for the next billion new users, due to the devices they will use to access it. I had also planned to attend the one by Habib Virji (whom I met at the 2014 Web Engines Hackfest), but his talk on Web Security was full.

The closing keynote on the Mars One project was also curious. It would be fun if they managed to solve the (apparently still quite complex) open problems and this generation could witness a Mars colony.

To conclude, I think I probably rushed too much, packing an excessive number of talks into two days, but it was great in any case to get fresh updates on the state of a lot of projects, meet some old friends, and breathe some (literally) fresh air.

by lorenzo tilve at March 02, 2015 05:12 PM

February 24, 2015

Manuel Rego

Grid Auto-Placement Is Ready

Back in June I wrote my first post about the CSS Grid Layout automatic placement feature, where I talked about finishing the support for this feature “soon”. In the end it took a bit longer, but this month Igalia has finally completed the implementation and you can already start to test it in both Blink and WebKit.

A little bit of history

Basic support for the auto-placement feature had already been working since 2013. Last summer I added support for spanning auto-positioned items. However, the different packing modes of the grid item placement algorithm (aka the auto-placement algorithm) were not implemented yet.

These packing modes were sparse, dense and stack. The initial behavior implemented was dense. I implemented sparse without too much trouble; however, when implementing stack I had some doubts that I shared with the CSS Working Group. This ended up in some discussions and, finally, the removal of the stack mode. So you can forget what I explained about it in my previous post.

Final syntax

After some back and forth, it seems that we now have a definitive syntax for the grid-auto-flow property.

This property determines two different things:

  • Direction: row (by default) or column.
    Defines the direction in which the grid grows, if needed, to insert the auto-placed items.
  • Mode: sparse (if omitted) or dense.
    Depending on the packing mode, the algorithm will try to fill (dense) or not fill (sparse) the holes in the grid while inserting the auto-placed items.

So, you can use different combinations of these keywords (row, column and dense) to determine the desired behavior. Examples of some valid declarations:

grid-auto-flow: column;
grid-auto-flow: dense;
grid-auto-flow: row dense;
grid-auto-flow: dense column;

Let’s use an example to explain this better. Imagine the following 3x3 grid:

<div style="grid-template-rows: 50px 50px 50px; grid-template-columns: 100px 100px 100px;">
    <div style="grid-row: span 2; grid-column: 2;">item 1</div>
    <div style="grid-column: span 2;">item 2</div>
    <div>item 3</div>
    <div>item 4</div>
</div>

Depending on the value of grid-auto-flow property, the grid items will be placed in different positions as you can see in the next picture.

grid-auto-flow values example

Grid item placement algorithm

I’ve been talking about this algorithm for a while already. It describes how the items should be placed into the grid. Let’s use a new example that helps to better understand how the algorithm works:

<div style="grid-template-rows: repeat(4, 50px); grid-template-columns: repeat(5, 50px);">
    <div style="grid-row: 1; grid-column: 2;">i1</div>
    <div style="grid-row: 2; grid-column: 1;">i2</div>
    <div style="grid-row: 1; grid-column: span 2;">i3</div>
    <div style="grid-row: 1;">i4</div>
    <div style="grid-row: 2;">i5</div>
    <div style="grid-column: 2;">i6</div>
    <div>i7</div>
    <div style="grid-column: 3;">i8</div>
    <div>i9</div>
</div>

For items with definite positions (i1 and i2) you don’t need to calculate anything; you simply place them in the positions defined by the grid placement properties and that’s all. For the auto-placed items, however, the algorithm describes how to convert their automatic positions into actual ones.

A grid item’s location is determined by its row and column coordinates and the number of tracks it spans in each direction. So you can have 3 types of auto-placed items:

  • Both coordinates are auto: i7 and i9.
  • Major axis position is auto: i6 and i8.
  • Minor axis position is auto: i3, i4 and i5.

Note: The major axis refers to the direction determined by the grid-auto-flow property, and the minor axis to the opposite one. For example, in “grid-auto-flow: column;” the major axis is column and the minor axis is row.

Let’s briefly describe the main steps of the algorithm (considering that the direction determined by grid-auto-flow is row):

  1. First, all the non-auto-positioned items (the ones with a definite position) are placed in the grid.

    Example:

    • i1 (grid-row: 1; grid-column: 2;): 1st row and 2nd column.
    • i2 (grid-row: 2; grid-column: 1;): 2nd row and 1st column.

    Grid item placement algorithm: Step 1

  2. The next step is to place the auto-positioned items whose major axis position is not auto, so they’re locked to a given row.

    Here we have different behaviors depending on the packing mode.

    • sparse: Look for the first empty grid area in this row where the item fits and that is past any item previously placed in the same row by this step.

      Example:

      • i3 (grid-row: 1; grid-column: span 2;): 1st row and 3rd-4th columns.
      • i4 (grid-row: 1;): 1st row and 5th column.
      • i5 (grid-row: 2;): 2nd row and 2nd column.

      Grid item placement algorithm: Step 2 (sparse)

    • dense: Look for the first empty grid area in this row where the item fits (without caring about the previous items).

      Example:

      • i3 (grid-row: 1; grid-column: span 2;): 1st row and 3rd-4th columns.
      • i4 (grid-row: 1;): 1st row and 1st column.
      • i5 (grid-row: 2;): 2nd row and 2nd column.

      Grid item placement algorithm: Step 2 (dense)

  3. Finally, the rest of the auto-placed items are positioned.

    Again, the behavior depends on the packing mode, and the description is pretty similar to the one in the previous step, but without the row constraint.

    • sparse: Look for the first empty area where the item fits and that is past any item previously placed by this step. This means that we start looking for an empty area from the position of the last item placed.

      Example:

      • i6 (grid-column: 2;): 3rd row and 2nd column.
      • i7: 3rd row and 3rd column.
      • i8 (grid-column: 3;): 4th row and 3rd column.
      • i9: 4th row and 4th column.

      Grid item placement algorithm: Step 3 (sparse)

    • dense: Look for the first empty area where the item fits, always starting the search from the beginning of the grid.

      Example:

      • i6 (grid-column: 2;): 3rd row and 2nd column.
      • i7: 1st row and 5th column.
      • i8 (grid-column: 3;): 2nd row and 3rd column.
      • i9: 2nd row and 4th column.

      Grid item placement algorithm: Step 3 (dense)

Implementation details

Probably most of you won’t be interested in the details of the implementation of this feature in Blink and WebKit, so feel free to skip this section and move to the next one.

You can check the meta-bugs in Blink and WebKit to see all the patches involved in this feature. The code is almost the same in both projects and the main methods involved in the grid auto-placement are:

  • RenderGrid::placeItemsOnGrid() (Blink & WebKit): This is in charge of placing the non-auto-positioned items, covering step 1 explained before. It then calls the next 2 methods with the auto-positioned items, depending on their type.

  • RenderGrid::placeSpecifiedMajorAxisItemsOnGrid() (Blink & WebKit): This method will process the auto-placed items where only minor axis position is auto. So, they’re locked to a given row/column, which corresponds to step 2.

    Note that in the sparse packing mode we need to keep a cursor for each row/column to fulfill the condition related to have items placed after previous ones.

  • RenderGrid::placeAutoMajorAxisItemsOnGrid() (Blink & WebKit): This last method places the auto-positioned items whose major axis position, or both coordinates, are auto.

    In this case, it uses the auto-placement cursor in order to implement the sparse behavior.

It’s also important to mention the class RenderGrid::GridIterator (Blink & WebKit), which is responsible for finding the empty grid areas big enough to fit the auto-placed grid items.

Note: RenderGrid has just been renamed to LayoutGrid in Blink.

Peculiarities

On one side, the sparse packing mode is intended to preserve the DOM order of the grid items. However, this only holds if, for every item, the major axis coordinate (or both coordinates) is auto; otherwise, the DOM order cannot be guaranteed.

This is very easy to understand: if some elements have fixed positions, they’re placed at those given positions, and the algorithm cannot maintain the ordering.

Let’s use a very simple example to show it:

<div style="display: grid;">
    <div style="grid-row: 1; grid-column: span 2;">item 1</div>
    <div>item 2</div>
    <div style="grid-row: 1; grid-column: 2;">item 3</div>
</div>

Where the elements will be positioned in:

  • item 1: 1st row and 3rd-4th columns.
  • item 2: 1st row and 1st column.
  • item 3: 1st row and 2nd column.

Grid item placement algorithm ordering

On the other hand, there is another issue regarding sparse, this time in the second step of the algorithm (the one for items locked to a given row/column), where the items are placed after any other item previously positioned by that step in the same row/column.

Let’s take a look at the following example:

<div style="display: grid;">
    <div style="grid-row: 1; grid-column: 1;">item 1</div>
    <div style="grid-row: 1 / span 2;">item 2</div>
    <div style="grid-row: 2">item 3</div>
</div>

According to the algorithm these items will be placed at:

  • item 1: 1st row and 1st column.
  • item 2: 1st-2nd row and 2nd column.
  • item 3: 2nd row and 1st column.

Grid item placement algorithm sparse fixed row/column

However, in the case of item 3 we might expect it to be placed in the 2nd row and the 3rd column, because item 2 actually occupies both the 1st and 2nd rows. But, for the sake of simplicity, this isn’t the behavior described in the spec.

Use cases

So far this has been quite abstract, so let’s try to think of some real cases where the auto-placement feature might be handy.

The canonical example is formatting a form. You can check the example in my previous post, which puts labels in the first column and inputs in the second one, or the slightly more complex form example in the spec.
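
As a rough sketch of that kind of form (the markup below is made up for illustration; with the default grid-auto-flow: row, the labels auto-place into the first column and the inputs into the second):

<form style="display: grid; grid-template-columns: auto 1fr;">
    <!-- auto-placement fills cells in order: label (1,1), input (1,2), label (2,1), input (2,2) -->
    <label for="name">Name</label>
    <input id="name" type="text">
    <label for="mail">Email</label>
    <input id="mail" type="email">
</form>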

Another example could be formatting a definition list. Imagine that we have the following list:

<dl>
    <dt>Cat</dt>
    <dd>A carnivorous mammal (Felis catus) long domesticated as a pet and for catching rats and mice.</dd>
    <dt>Crocodile</dt>
    <dd>Any of several large carnivorous thick-skinned long-bodied aquatic reptiles (family Crocodylidae) of tropical and subtropical waters; broadly: crocodilian.</dd>
    <dt>Donkey</dt>
    <dd>The domestic ass (Equus asinus).</dd>
</dl>

In order to show it using a grid layout, you would just need the following CSS:

dl {
    display: grid;
    grid-template-columns: 10px auto 10px 1fr 10px;
}

dt {
    grid-column: 2;
    width: max-content;
    align-self: center;
    justify-self: end;
}

dd {
    grid-column: 4;
    margin: 0px;
}

The result will be a two-column layout, where the terms go in the first column, right-aligned and vertically centered, and the definitions go in the second column, with a 10px gutter wrapping the columns.

Definition list formatted using grid auto-placement feature

Conclusion

Grid automatic placement has just been finished, and you can already start to play with it in Chrome Canary (enabling the flag “Enable experimental Web Platform features”) or in the WebKit Nightly Builds. We think it’s a nice and pretty powerful feature, and we encourage front-end web developers to test it (like the rest of CSS Grid Layout). The best part is that the spec is still receiving its last changes, so you’ll have the chance to provide feedback to the CSS Working Group if you miss something.

To make things easier to test, I’ve adapted the demo for the grid item placement algorithm, adding the option to specify the grid-auto-flow property too: http://igalia.github.io/css-grid-layout/autoplacement.html

Igalia and Bloomberg working together to build a better web

All this work has been done as part of the collaboration between Igalia and Bloomberg. We’ll be happy to receive your feedback about this feature. Stay tuned to get the latest news regarding CSS Grid Layout development!

February 24, 2015 11:00 PM

January 27, 2015

Katerina Barone-Adesi

Generating arbitrary test data

Most tests are useless without data, and the data they need to run can be fairly complicated. Generating a lot of data by hand sounds tedious, and would throw away many of the advantages of generating tests: the flexibility and speed of having a large set of test cases would disappear. So, what does it look like to generate data for tests? Here is an example of generating pflua's AST, one of its intermediate representations, using a functional grammar; the complete code is available. The key thing to notice is that it is actually fairly easy to do this: it is less than a page of straightforward code, mainly choosing randomly between alternatives.

An AST is generated by running Logical(). The simplest alternatives generate a one-word AST, which can be 'true', 'false', or 'fail'. Alternatively, a conditional (with if) or a comparison (with a relational operator like '<') can be generated. But this text explanation is already longer than the code; here's the implementation of 'Logical':

function Logical()
   return choose({ Conditional, Comparison, True, False, Fail })()
end

To handle complications, like avoiding dividing by zero or accessing beyond the bounds of a packet, a Lua table ('db') of assertions also gets passed around to the arithmetic operations; for example, generating a division inserts the assertion that the right-hand side is not 0. 'Comparison' then unwraps the assertions, generating code that tests whether they are violated and fails if they are.

With the general concepts explained, without further ado, here is the actual code to generate random ASTs for tests.

function True() return { 'true' } end
function False() return { 'false' } end
function Fail() return { 'fail' } end
function ComparisonOp() return choose({ '<', '>' }) end
function BinaryOp() return choose({ '+', '-', '/' }) end
function UnaryOp()
   return choose({ 'uint32', 'int32', 'ntohs', 'ntohl' })
end
-- Boundary numbers are often particularly interesting; test them often
function Number()
   if math.random() < 0.2
      then return math.random(-2^31, 2^32 - 1)
   else
      return choose({ 0, 1, -2^31, 2^32-1, 2^31-1 })
   end
end
function Len() return 'len' end
function Unary(db) return { UnaryOp(), Arithmetic(db) } end
function Binary(db)
   local op, lhs, rhs = BinaryOp(), Arithmetic(db), Arithmetic(db)
   if op == '/' then table.insert(db, { '!=', rhs, 0 }) end
   return { op, lhs, rhs }
end
function PacketAccess(db)
   local pkt_access_size = choose({1, 2, 4})
   local position = {'uint32', Arithmetic(db) }
   table.insert(db, {'>=', 'len', {'+', position, pkt_access_size}})
   return { '[]', position, pkt_access_size }
end
function Arithmetic(db)
   return choose({ Unary, Binary, Number, Len, PacketAccess })(db)
end
function Comparison()
   local asserts = {}
   local expr = { ComparisonOp(), Arithmetic(asserts), Arithmetic(asserts) }
   while #asserts > 0 do
      expr = { 'if', table.remove(asserts), expr, { 'fail' } }
   end
   return expr
end
function Conditional() return { 'if', Logical(), Logical(), Logical() } end
function Logical()
   return choose({ Conditional, Comparison, True, False, Fail })()
end

The above approach is not perfect; it generates quite large test cases. 'choose' weights options equally, though Number() demonstrates how bias can be introduced if some cases should be tested more often than others. Nonetheless, it's a real-world example of something good enough to be useful, while being easy and flexible at the same time.

by Katerina Barone-Adesi at January 27, 2015 02:30 PM

Improving Pflua's correctness with property-based checking

Creating a property-based tester and fixing several bugs - in one afternoon

Pflua, Igalia's implementation of Libpcap's filter language, is fast, and works very well. Like most software, it contains a few bugs, which unusual circumstances can trigger. Pflua is largely compatible with a subset of Libpcap, but has a few minor differences. Igalia is actively fixing bugs in Pflua, and increasing Libpcap compatibility, so that our users have the best experience possible: we like to fix bugs before anyone else even finds them. Part of the way we are improving Pflua is with property-based testing.

Writing a lot of tests by hand is slow, boring, inflexible, and insufficient. It makes it harder to change the underlying code and try new things, and people tend not to think of the full range of edge cases. This is an even more serious problem when the same person writes both the code and the test cases, which is usually the case with Pflua: if you miss an edge case while writing code, it is very easy to miss the same one while writing tests. Luckily, there is an alternative to writing hundreds of thousands of unit tests by hand. It is called "property-based testing", and it lets you specify a "property" that should always hold, and then generates random test cases to try to falsify that property.

I love QuickCheck. Using it is one of the most enjoyable things about programming in Haskell. It is a fantastic tool for generating test cases that people tend not to think of. Pflua is written in Lua, not Haskell, so one afternoon, Andy Wingo and I sat down and wrote a simple program inspired by QuickCheck. Our program is not a port or clone of QuickCheck; it is a much simpler tool. Originally, it tested only one property - that the results of running a filter on a packet were the same before and after optimizing the Pflua IR for that filter. It is a very simple property, easily expressed in one line of code, but we found several flaws in Pflua, ranging from trivial to fairly important.

The simplest bug was due to not emitting spaces around binary operators. What is 8--2? In Lua, 8 - -2 is 10, as expected, but -- starts a comment, so 8--2 is just 8. In C11 or C++, it is like writing 8//2. After adding a space around each binary operator, that bug was eliminated. It was a trivial bug, but it produced silently incorrect results, and I am glad our users will never be bitten by it.

How does Pflua's property-based testing work? There are a few steps. Pflua-test loops up to the number of times the test is to be run. Each time, it generates random data for the test case and then checks whether the test passes. Concretely, for testing whether optimized and unoptimized IR give the same results, it generates an arbitrary valid IR sequence and applies it to a random packet from a pcap dump. The function "Logical" generates the random IR. The test runs the optimizer on the IR and compares the results of applying the optimized and unoptimized IR to the chosen packet. Originally, we did this in the main loop; later, we generalized the tool to take files that describe properties. This testing is not exhaustive; there are infinitely many IRs (and no effective way to collapse the state space), and the test could pass on the concrete packet it is run against but fail on other packets. Nonetheless, it is useful.

What is the idea behind the function "Logical"? Lua does not have an equivalent of Haskell typeclasses or 'instance Arbitrary', so we captured part of the spirit of those tools by setting up a functional grammar to generate arbitrary valid IR strings. The details are not essential, but I think it is worth sharing how little code it takes to gain this much flexibility. See the next post for the full implementation of the generator for the IR's grammar, in Lua.

So, what happened when we ran the code, testing the property that an IR string applied to a packet gives the same result before and after being optimized? Aside from the aforementioned comment/binary operator bug, the tool found a range-folding bug, a bug stemming from not setting the range of len, and a bug where Pflua generated invalid Lua (by returning, regardless of whether it was at the end of a block), as well as an invalid optimization involving ntohs and ntohl, and an internal crash with NaN.

Testing always takes time and effort, but property-based testing is often a lot more flexible and effective than writing tests by hand, and can be surprisingly quick to start using. Pflua's QuickCheck-inspired tool is still quite new and undergoing significant changes, but turning up half a dozen bugs in an afternoon was a good start. As usual, many bugs are trivial to fix as soon as their existence is known; several of the bugs were fixed within minutes of finding them.

by Katerina Barone-Adesi at January 27, 2015 02:30 PM

January 24, 2015

Samuel Iglesias

FOSDEM 2015

Next week I am traveling to Brussels to attend FOSDEM, but this time as a speaker! My talk will explain why we need to test our OpenGL drivers with OpenGL conformance test suites, list the unofficial free software ones, and give a brief introduction to how to use them.

See you there!

Update: The slides are available online.

by Samuel Iglesias at January 24, 2015 11:10 AM

January 22, 2015

Jacobo Aragunde

LibreOffice for Android at FOSDEM


Next week I’m flying to FOSDEM to let you know about our work on LibreOffice for Android, which has just been released in the Play Store. In my talk I will focus on the document browser, the new features we are currently working on, and my own vision for the future of this component.

LibreOffice for Android, provider selection

You can read more about LibreOffice for Android in this post from The Document Foundation, and some gory technical details in Michael Meeks’ blog.

EDIT: Slides! Find them here.

by Jacobo Aragunde Pérez at January 22, 2015 05:30 PM

January 19, 2015

Manuel Rego

New Year, New Blog!

As you might have noticed, I’m starting the new year by changing my blog design and software.

The idea had often come to my mind since I discovered static site generators like Jekyll, and this year I finally decided to get rid of WordPress and move my blog to Jekyll for good. Since the previous blog theme was really old-fashioned and completely unresponsive, I took advantage of the software migration to improve the design of my blog too.

Basically, I found this nice Clean Blog theme by Start Bootstrap and decided to give it a chance. On top of that, they provide not only the theme but also a version already integrated with Jekyll, so most of the work was already in place.

In order to migrate the blog posts I used a WordPress plugin which, together with a few sed commands, was enough to move the old posts to Jekyll. In addition, I’m using Jekyll tags to categorize the posts like before, with a tag view page that shows all the posts and provides its own feed (required for planet subscriptions), e.g. the CSS tag page.

Regarding comments, I decided to go for a static solution too. Basically, I got very few comments during the last year, so I don’t think a more complex solution is worth it. This means that if you want to comment on my posts from now on, you should send me a mail (or a tweet) and that’s all. You can check a previous post with comments to see what they look like.

This was just a short notice about this change; I hope you enjoy the new site. At least I like it a lot. :-)

January 19, 2015 11:00 PM

January 12, 2015

Javier Fernández

Box Alignment and Grid Layout (II)

Some time has passed since my first post about the Box Alignment spec implementation status in the Blink and WebKit web engines. In this post I’ll give an update and, since the gap between both web engines has grown considerably (I’ll do my best to reduce it as soon as possible), I’ll point out the differences between them.

What’s new?

The ‘stretch’ value is now implemented for align-{self, items} and justify-{self, items} CSS properties. This behavior is quite important because it’s the default for these four properties in Flexible Box and Grid layouts. According to the specification, the ‘stretch’ value is defined as follows:

If the width or height (as appropriate) of the alignment subject is auto, its used value is the length necessary to make the alignment subject’s outer size as close to the size of the alignment container as possible, while still respecting the constraints imposed by min-height/max-width/etc. Otherwise, this is equivalent to start.

When defining the alignment properties in a grid layout it’s very important to consider how we want to manage item overflow. An overflow alignment value can be specified for both the Content and Item Alignment definitions, but so far it’s implemented only for the Item Alignment properties. The Overflow Alignment concept is defined in the specification as follows:

To help combat undesirable data loss, an overflow alignment mode can be explicitly specified. “True” alignment honors the specified alignment mode in overflow situations, even if it causes data loss, while “safe” alignment changes the alignment mode in overflow situations in an attempt to avoid data loss.

The ‘stretch’ value in Grid Layout

This value applies to the Self Alignment properties ({align, justify}-self) and, obviously, to their corresponding Default Alignment ones ({align, justify}-items). For grids, these properties consider the alignment container to be the grid cell, while the alignment subject is the grid item’s margin box.

The Box Alignment specification states that the Default Alignment ‘auto’ value should be resolved to ‘stretch’ in the case of grid containers; this value is then used as the resolved value for ‘auto’ Self Alignment values.

So by default, or when explicitly set to ‘stretch’, the grid item’s margin box will be stretched to fill its grid cell’s breadth. Let’s see it with a basic example:

align-stretch

All the examples are available at https://igalia.github.io/css-grid-layout/
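
As a rough sketch of what the picture shows (the class names are made up for illustration):

.grid {
    display: grid;
    grid-template-columns: 200px;
    grid-template-rows: 100px;
}
.item {
    /* with auto width and height, the default 'stretch' makes the item's
       margin box fill the whole 200x100 grid cell */
    align-self: stretch;
    justify-self: stretch;
}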

This change affected only the layout codebase of the web engine, since the value was already handled by the style parsing logic. The Grid Layout rendering logic uses an interesting abstraction to override the grid item’s block container, which allows us to use grid cells as block containers when computing the logical height and width.

Overflow Alignment in Grid Layout

The Overflow Alignment value used when defining a grid layout can be particularly useful, especially for fixed-size grids. The potential data loss may happen not only at the left and top box edges, but also between adjoining grid cells. Overflow Alignment is specified via the ‘safe’ and ‘true’ keywords. These were already introduced in the Blink core’s style parsing logic as part of the CSS3 upgrade of the alignment properties (justify-self, align-self) used in the FlexBox implementation. The new CSS syntax is described by the following expression:

auto | stretch | <baseline-position> | [ <item-position> && <overflow-position>? ]

According to the current Box Alignment specification draft, the Overflow Alignment keywords have the following meaning:

  • safe: If the size of the alignment subject overflows the alignment container, the alignment subject is instead aligned as if the alignment mode were start.
  • true: Regardless of the relative sizes of the alignment subject and alignment container, the given alignment value is honored.

Let’s now see how Overflow Alignment is applied in the Grid Layout specification with an example:

align-overflow

All the examples are available at https://igalia.github.io/css-grid-layout/
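
A quick sketch of both keywords, using the ‘safe’ and ‘true’ syntax from this draft (class names are made up for illustration):

.grid {
    display: grid;
    grid-template-columns: 100px;
}
.safe-item {
    width: 200px;            /* wider than its 100px column */
    justify-self: safe end;  /* overflows, so it is aligned as 'start' to avoid data loss */
}
.true-item {
    width: 200px;
    justify-self: true end;  /* 'end' is honored even though the item overflows */
}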

Allowing the Overflow Alignment keywords in the new syntax required modifying the style builder and parsing logic, as mentioned before. The alignment properties became a CSSValueList instance instead of simple keyword IDs; both Blink and WebKit provide Style Builder code generation directives (CSSPropertyNames.in) for simple properties, but that was no longer the case for these properties.

The Style Builder is more complex now because it has to deal with the conditional overflow keyword, which can be specified before or after the <item-position> keyword. Blink provides a function template scheme for groups of properties sharing the same logic, which is the case for most of the CSS Box Alignment properties (align-self, align-items and justify-self; justify-items is slightly different, so it needs custom functions). WebKit is currently defining its new style builder and does not follow this approach yet, but I’d say it eventually will, since it makes a lot of sense.

{% macro apply_alignment(property_id, alignment_type) %}
{% set property = properties[property_id] %}
{{declare_initial_function(property_id)}}
{
    state.style()->set{{alignment_type}}(RenderStyle::initial{{alignment_type}}());
    state.style()->set{{alignment_type}}OverflowAlignment(RenderStyle::initial{{alignment_type}}OverflowAlignment());
}
 
{{declare_inherit_function(property_id)}}
{
    state.style()->set{{alignment_type}}(state.parentStyle()->{{property.getter}}());
    state.style()->set{{alignment_type}}OverflowAlignment(state.parentStyle()->{{property.getter}}OverflowAlignment());
}
 
{{declare_value_function(property_id)}}
{
    CSSPrimitiveValue* primitiveValue = toCSSPrimitiveValue(value);
    if (Pair* pairValue = primitiveValue->getPairValue()) {
        state.style()->set{{alignment_type}}(*pairValue->first());
        state.style()->set{{alignment_type}}OverflowAlignment(*pairValue->second());
    } else {
        state.style()->set{{alignment_type}}(*primitiveValue);
    }
}
{% endmacro %}
{{apply_alignment('CSSPropertyJustifySelf', 'JustifySelf')}}
{{apply_alignment('CSSPropertyAlignItems', 'AlignItems')}}
{{apply_alignment('CSSPropertyAlignSelf', 'AlignSelf')}}

Even though style building and parsing are shared among all the layout models that use the Box Alignment properties (Flexible Box and Grid so far), Overflow Alignment is currently only supported by the CSS Grid Layout implementation. The Overflow Alignment logic affects how the grid items are positioned during the layout phase.

The Overflow Alignment keywords are also valid in the Content Distribution syntax, which I’m working on now with quite good progress. The first patches have already landed in trunk, providing an implementation of the justify-content property in Grid Layout. Hopefully I’ll talk about it soon, once the implementation is complete and the discussion in the www-style mailing list concludes with some agreement regarding Distributed Alignment for grids.

Current implementation status

The Box Alignment specification is quite complete now in Blink; unfortunately, that’s not the case for WebKit. I’ll now summarize the current implementation status in browsers based on each web engine (basically Chrome/Chromium vs Safari), and I’ll also try to outline the roadmap for the next weeks.

align-grid-support

The flex-start and flex-end values are used only in Flexible Box layouts, so they don’t apply to this analysis of the Grid support in the Box Alignment spec. The Distributed Alignment values apply only to the Content Distribution properties (align-content and justify-content). Finally, ‘stretch’ is a valid value for both Positional and Distributed Alignment, so it’s not redundant but a different interpretation of the same value depending on the property it’s applied to.

Some conclusions we can extract from the table above:

  • Default and Self Alignment support is almost complete in Blink; only Baseline Alignment remains to be implemented.
  • Content Distribution support for justify-content is in Blink. Only <content-position> values are implemented, since the current spec draft states that all the <content-distribution> values will fall back in Grid Layout; spec authors are still evaluating changes on this issue, though.
  • WebKit fails to provide CSS3 parsing for all the properties except justify-self, although there are some patches pending review to improve this situation.
  • There is no Grid support at all in WebKit for any of the Box Alignment properties.

Igalia and Bloomberg working to build a better web platform

by jfernandez at January 12, 2015 11:41 AM

January 07, 2015

Manuel Rego

CSS Grid Layout 2014 Recap: Implementation Status

After reviewing the changes to the CSS Grid Layout spec in my previous post, let’s focus now on the status of the implementation in Blink and WebKit. This post will try to cover what we’ve been doing at Igalia during 2014 around grid layout, and it’ll talk about our plans for 2015.

Work done in 2014

Spec syntax

During the first part of the year we were updating the CSS Grid Layout syntax to match the latest modifications introduced in the spec during 2013.
As part of this work, the grid and grid-template shorthands were introduced, which are explained in depth by my colleague Javi Fernández in a post.
Right now the implementation in both Blink (#337626) and WebKit (#127987) is complete and matches the spec regarding the syntax.
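
As a quick illustration, a sketch of what the grid-template shorthand looks like (the values are made up for this sketch):

.grid {
    display: grid;
    /* two named areas, two rows (50px and 1fr) and two columns (3fr and 1fr) */
    grid-template: "header header" 50px
                   "main   aside"  1fr
                   / 3fr 1fr;
}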

Painting optimizations

Back in 2013, Julien Chaffraix did some optimizations in the grid painting code (#248151). However, those changes introduced some issues that were being fixed during 2014. Finally the painting code seems to have been stable for a while.
This optimization is not present in WebKit, so this work was only done in Blink.

Code refactoring

Another task done in the first half of the year was refactoring the code related to position resolution. It’s been moved to its own class (GridResolvedPosition), so RenderGrid only has to deal with resolved positions now.
This change was done in both Blink (#258258) and WebKit (#131732).

Named grid lines

At the beginning of the year we implemented support for named grid lines. This completes the implementation of the different placement options available in a grid (numeric indexes, named areas, named lines and spans). Once more, this is supported in both Blink (#337779) and WebKit (#129372).
My fellow Sergio Villar talked about this work in another post.

Named grid lines example
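
A small sketch of the feature, using the parenthesis syntax for line names (the names and placement are made up for illustration):

.grid {
    display: grid;
    grid-template-columns: (first) 100px (mid central) 200px (last);
}
.item {
    grid-column: mid / last;  /* spans from the 'mid' line to the 'last' line */
}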

Track sizing algorithm

The track sizing algorithm was rewritten in the spec during 2014. Despite keeping the very same behavior, the implementation was modified to follow the new algorithm closely.
During this work some missing features were detected and resolved, making the current implementation more complete and robust.
Several patches have landed in both Blink (#392200) and WebKit (#137253, #136453, #137019 & #136575).

Automatic placement algorithm

The auto-placement feature has been completed, adding support for items spanning several tracks and implementing the “sparse” and “dense” packing modes.

Auto-placement example with spanning item

In this case you can read my post about how all this works.
This was done in both Blink (#353789 & #384099) and WebKit (#103316).

Auto-placement “sparse” and “dense” packing modes example

Fuzzinator bugfixing

Apart from generic bugfixing, this year we’ve fixed some issues detected by a tool called Fuzzinator in both Blink and WebKit. Renata Hodovan wrote a nice post explaining all the details (thanks for your great work!).
The good news is that the grid code is now much more stable thanks to all the reports and patches done during 2014.

Alignment and justification

This work is still ongoing, but the main alignment properties (justify-self, align-self, justify-items, align-items, justify-content and align-content) are already supported, or on the way (with patches pending review), in Blink (#234199). For this feature the patches in WebKit (#133222 & #133224) are moving slowly.
You can check all the possibilities provided by these properties in a blog post by Javi.

Different options to align an item inside a grid cell

Absolutely positioned grid children

During the last part of 2014, special support for positioned grid children was implemented, since they have some particularities in grids.
The initial patch is available in Blink (#273898), but some things still need to be fixed to complete it. Then it’ll be ported to WebKit as usual.

Absolutely positioned grid children example

Writing modes

We’ve been working on writing modes support, fixing issues with margins, borders and paddings. Now the columns/rows are painted in the right order depending on the direction property.
Orthogonal flows were clarified in the last update of the spec, and the current issues are already being addressed.
Again, all this work was done in Blink (#437354) and will be ported to WebKit later on.

Example of direction support in grid

Testing

You can always increase test coverage, especially for a big spec like CSS Grid Layout. We’ve been adding some missing tests here and there, and finally decided to start down the road of creating a W3C test suite for grid.
We’re still in the early stages, getting used to all the W3C testing infrastructure and processes. Gérard Talbot is helping us take the first steps, big thanks!
We’ve already drafted a test plan where you can follow our progress. We hope to complete the test suite during 2015.
As expected, the nice part of focusing on writing tests in general (not only tests for the patch you’re developing) is that you write much better tests and end up finding small issues in different places.

Plans for 2015

The main goal is to ship CSS Grid Layout in Chrome (Blink) and see if Safari (WebKit) follows the trend. In that hypothetical scenario, 3 major browsers (Chrome, Safari and Internet Explorer, the latter implementing an old version of the spec) would have CSS Grid Layout support, which would be really great news.
Thinking about the next step in the short term, our intention is to send the “Intent to Ship” to the Blink mailing list during the first quarter of 2015.
WebKit is lagging a bit behind, but we have plans to update the implementation and reduce the gap between the Blink and WebKit grid codebases.

Of course, apart from completing all the ongoing tasks and other minor fixes, we’ll have to keep doing more work to fully implement the spec:

  • Add support for “auto” keyword for repeat() (recently added to the spec).
  • Allow the implicit grid to grow before the explicit grid (properly supporting negative indexes for grid line numbers).
  • Implement fragmentation support once the spec is definitive regarding this topic.

Apart from that, during 2015 we’ll review performance and try to make grid faster with some optimizations.
Furthermore, it’d be really nice to add some support for grid in Chrome DevTools and the Safari Web Inspector. That would make end users’ lives much easier.

Wrap-up

2015 will be the year of CSS Grid Layout in the browser. Hopefully, you’ll be able to use it natively in all the major browsers but Firefox (where you could use the polyfill). Anyway, you can already start to play with it by enabling the experimental Web Platform features flag in Chrome (unprefixed) or using the WebKit nightly builds (with the “-webkit” prefix).
If you want to follow the implementation closely, track the meta-bugs in Blink (#79180) and WebKit (#60731).

Igalia is doing all this work as part of our collaboration with Bloomberg.
We’re waiting for you to start testing grid layout and reporting feedback and bugs. We’ll do our best to make sure you enjoy it. Exciting times ahead!

Igalia and Bloomberg working together to build a better web

January 07, 2015 11:00 PM

Xabier Rodríguez Calvar

Streams API in WebKit at the Web Engines Hackfest

Yes, I know, I should have written this post before you know, blah, blah, excuse 1, blah, excuse 2, etc. ;)

First of course I would like to thank Igalia for allowing me to use company time to attend the hackfest and meet such a group of amazing programmers! It was quite intense, and I tried to give my best, though for different reasons (coordination, personal and so on) I missed some sessions.

My purpose at the hackfest was to work with Youenn Fablet from Canon on implementing the Streams API in WebKit. When we began to work together in November, Youenn already had a prototype working with some tests, so the idea was to take that and complete, polish and ship it. Easy, huh? Not so…

What is Streams? As you can read in the spec, the idea is to create a way of handling different kinds of streams with a common high-level API. Those streams can be a mapping of low-level I/O system operations or can be easily created from JavaScript.

Fancy things you can do:

  • Create readable/writable streams mapping different operations
  • Read/write data from/to the streams
  • Pipe data between different streams
  • Handle backpressure (controlling the data flow) automagically
  • Handle chunks as the web application sees fit, including different data types
  • Implement custom loaders to feed different HTML tags (images, multimedia, etc.)
  • Map some existing APIs to Streams. XMLHttpRequest would be a wonderful first step.

First thing we did after the prototype was defining a roadmap:

  • General ReadableStream that you can create at JavaScript and read from it
  • XMLHttpRequest integration
  • Loaders for some HTML tags
  • WritableStream
  • Piping operations

As you can see in bugzilla we are close to finishing the first point, which took quite a lot of effort because it required:

  • Code cleaning
  • Making it build in debug
  • Improving the tests
  • Writing the promises based constructor
  • Fixing a lot of bugs

Of course we didn’t do all of this at the hackfest; only Chuck Norris would have been able to do that. The hackfest provided the opportunity of meeting Youenn in person, working side by side and discussing different problems and possible strategies to solve them (for example, error management, queueing chunks and handling their size), which are not trivial given the complexity created by the flexibility of the API.

After the hackfest we continued working and, as I said before, you can find the result in Bugzilla. We hope to be able to land this soon and continue working on the topic within the current roadmap.

To close the topic of the hackfest, it was a pleasure to work with such an amount of awesome web engine hackers, and I would like to finish by thanking the sponsors Collabora and Adobe, and especially my employer, Igalia, which was both sponsor and host.

by calvaris at January 07, 2015 07:30 PM

January 02, 2015

Claudio Saavedra

Thu 2015/Jan/01

Some recap of my reading in 2014. I definitely read less this year, although to some degree it was more exhausting. Let's see.

Books read to completion through the year: 23 (~7800 pages, according to Goodreads). Tolstoi's "War and Peace" and Marx's "Capital, vol. 1" were the ones that took the most time, and for both I had to take long breaks with lighter reads in between.

In the case of "War and Peace", I read from both the English translation by Pevear & Volokhonsky (Vintage Classics) and the Spanish translation by Kúper (Editorial El Aleph). From these two, P&V became my favorite one as I was approaching the end of the book.

Volume 1 of Capital was fascinating but exhausting, as its style and long chapters on the working conditions in England up to the 19th century and the development of industry and machinery are, well, not condensed enough. Someone should have told Marx to just write a separate book about these topics (had Engels' "The Condition of the Working Class in England" not been enough?). Anyway, I read it side by side with Harvey's "Companion to Marx's Capital", which was pretty good at avoiding getting lost in the intertwined historical and logical arguments and the dialectic journey from commodities to class struggle. I'm looking forward to volume 2, although I'll need a bit of a rest before I get to it.

The language count: 17 in English, 5 in Spanish, 2 in German. That's frustratingly few books in Spanish for a Spanish speaker. But on the upside, I got a bit more confident reading in German. I even tried to read a few short stories by Dürrenmatt, but his German was way beyond mine ("Der Tunnel" is mindblowing, by the way). I'll try again with him at some point in the future.

Steinbeck deserves a special mention. I loved both "Grapes of Wrath" and "Of Mice and Men", not only for their style but also for their historical component (especially in the Grapes). "Zur falschen Zeit" by the Swiss Alain Claude Sulzer is also a pretty good book (and one I can recommend to people with an intermediate German level). Fuguet's "Las Películas de mi Vida" touched a nerve with me, as its central topics (earthquakes, Chile, cinema, family, and being an expat) all relate to my life. I read it on the plane that took me to Chile during these holidays and couldn't help but shed a tear or two with it, in anticipation of what would be an intensely emotional visit to the homeland after 4 years.

The worst book I read was a compilation of Marxist writings by C. Wright Mills called "The Marxists". If Mills was a relevant author in sociology (I still have to read "The Sociological Imagination"), this is a pretty lazy work. I won't repeat what I said in my Goodreads review.

The author count! 20 distinct authors and, again, very few women among them (two, to be precise). Steinbeck, Marx, and Lenin were the most-read authors, with two works each.

The sources for the books: two from local libraries, two electronic books, 13 second-hand books, one loan, and the rest probably bought new.

Books acquired in 2014: 73 (down from 110 in 2013, so that's an improvement). Most of them are second-hand books and a few are presents from my loving friends. I also have a bigger bookshelf now but it's full already -- story of my life.

Of my goals for 2014, I only fulfilled reading "War and Peace", buying fewer books, and reading more German. I didn't read the Quixote, Ulysses, nor 2666 yet, and I read even fewer books in Spanish. For 2015, I'll be happy if I read vol. 2 of Capital, if I manage to read more books in Spanish, and if I get to read more German. Off we go, 2015.

I also took my hobby of curating quotes to what I've named Hel Literaria.

January 02, 2015 02:06 AM

December 29, 2014

Manuel Rego

CSS Grid Layout 2014 Recap: Specification Evolution

Year 2014 is coming to an end, so it’s the perfect time to review what has happened regarding the CSS Grid Layout spec, which Igalia has been implementing in both the Blink and WebKit engines as part of our collaboration with Bloomberg.

I was wondering what would be the best approach to write this post, and finally I decided to split it into 2 different posts: this one, covering the changes in the spec during the whole year, and another one talking about the implementation details.

Two Working Drafts (WD) of the CSS Grid Layout Module have been published during 2014. In addition, during the last month of the year, somewhat related to the work previously done at the CSS Working Group (WG) face-to-face (F2F) meeting at TPAC, several changes have been made to the spec (I guess they’ll end up in a new WD in early 2015). So, let’s review the most important changes introduced in each version.

Working Draft: 23 Jan 2014

Subgrids

This is the first WD where the subgrids feature appears marked as at-risk. This means that it might end up out of Level 1 of the specification.
A subgrid is a grid inside another grid that keeps a relationship between its own rows/columns and those of the parent grid container: it shares the track sizing definitions with the parent. Just for the record, current implementations don’t support this feature yet.
However, nested grids are already available and will be part of Level 1. Basically, nested grids have their own track sizing definitions, completely independent of their parents. Of course, they’re not the same as subgrids.

Subgrid vs nested grid example
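
A minimal sketch of a nested grid (subgrid syntax is deliberately omitted here, since implementations don't support it; class names are made up for illustration):

.outer {
    display: grid;
    grid-template-columns: 1fr 1fr;
}
.inner {
    /* a nested grid: these track definitions are completely
       independent of .outer's tracks */
    display: grid;
    grid-template-columns: 50px 50px;
}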

Implicit named areas

This is related to the concept of named grid areas. Briefly, in a grid you can name the different areas (groups of adjacent cells), for example: menu, main, aside and/or footer, using the grid-template-areas property.
Each area defines 4 implicit named lines: 2 called foo-start (marking the row and column start) and 2 called foo-end (marking the row and column end), where foo is the name of the area.
This WD introduces the possibility of creating implicit named areas by defining named grid lines that follow the previous pattern. That way, if you explicitly create lines called foo-start and foo-end, you’ll be defining an implicit area called foo that can be used to place items in the grid.

Example of implicit named grid areas
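
A minimal sketch, using the parenthesis syntax for named lines from this draft (the names are made up for illustration):

.grid {
    display: grid;
    grid-template-columns: (menu-start) 150px (menu-end) 1fr;
    grid-template-rows: (menu-start) auto (menu-end);
}
.menu {
    grid-area: menu;  /* resolves to the implicit 'menu' area created by the lines above */
}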

Aligning the grid

In this version the justify-content and align-content properties are added, which allow aligning the whole grid within the grid container.
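
A rough sketch of how these properties could be used (the sizes are made up for illustration):

.grid {
    display: grid;
    height: 300px;                       /* the container is larger than the tracks */
    grid-template-columns: 100px 100px;
    grid-template-rows: 50px 50px;
    justify-content: center;             /* center the whole grid horizontally */
    align-content: end;                  /* push the whole grid to the bottom */
}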

Other

This WD adds a new informative section with basic examples of the grid placement options. It’s an informative section, but very useful to get an overview of the different possibilities.
In addition, it includes an explanatory example of the behavior of absolutely-positioned grid items.

Working Draft: 13 May 2014

Track sizing algorithm

Probably the most important change in this version is the complete rewrite of the track sizing algorithm. Anyway, despite the new wording, the algorithm keeps the very same behavior.
This is the main algorithm for grids: it defines how the track breadths should be calculated, taking into account all the multiple available options that define the track sizes.
An appendix with a “translation” between the old algorithm and the new one is included too, mostly to serve as a reference and to help detect possible mistakes.

Auto-placement changes

The grid-auto-flow property is modified in this WD:

  • none is substituted by a new stack packing mode.
  • As a consequence, property grid-auto-position (tied to none) is dropped.

Before this change, the default value for grid-auto-flow was none and, in that case, all the auto-placed items were positioned using the value determined by grid-auto-position (by default 1×1).
With this change, the default value is row. But you can specify stack, and the grid will then look for the first empty cell and use it to place all the auto-positioned items there.

Other

Implementations now have the possibility to clamp the maximum number of repetitions in a grid.
Besides, this version brings a new section related to sizing grid containers, defining how they behave under max-content and min-content constraints.

Editor’s Draft: Last changes

Note: These changes are not yet in a WD version and might suffer some modifications before a new WD is published.

Automatic number of tracks

A new auto keyword has been added to the repeat() function.
This allows repeating the track list as many times as needed, depending on the grid container size; used together with the auto-placement feature, it makes a really nice combo.
For example, if the grid container is 350px wide and it uses repeat(auto, 100px); to define the columns, you’ll end up having 3 columns, as the sketch after the picture shows.

Example of new auto keyword for repeat() function
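
In CSS, that example would look something like this sketch (using the draft's syntax):

.grid {
    width: 350px;
    display: grid;
    /* repeats as many 100px columns as fit in 350px: three in this case */
    grid-template-columns: repeat(auto, 100px);
}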

Auto-placement stack removed

Finally, after some issues with the stack mode, it’s been decided to remove it from the spec. This means that the grid-auto-flow property gets simplified, allowing you to determine the direction, row (by default) or column, and the packing mode, dense or “sparse” (if omitted).
On top of that, the grid item placement algorithm now has more explicit wording regarding the different packing modes.

Fragmentation

This section had been missing for a long time, and it has finally got some content.
Anyway, this is still an initial proposal and more work is needed to settle it down.

Other

The scope of align-content and justify-content was reviewed; they now apply to the grid tracks rather than to the grid as a single unit.

As a side note, there’s an ongoing discussion regarding the symbol used to mark named grid lines. Currently it’s parentheses, e.g.:

grid-template-columns: (first) 100px (mid central) 200px (last);

However, these parentheses cause some issues for the Sass preprocessor. The proposal to use square brackets instead was not accepted in the last CSS WG F2F meeting, though it’ll be revisited in the future.

Conclusion

Of course this list is not complete, and I may be missing some changes; at least, these are the most important ones from the implementor’s perspective.
As you can see, despite not having big behavioral changes during this year, the spec has been evolving and becoming more and more mature. A bunch of glitches have been fixed, and some features have been adapted thanks to the feedback from users and implementors.
Thanks to the spec editors, Tab, fantasai and Rossen (and the rest of the CSS WG), for all their work and patience in the mailing list, answering lots of doubts and questions.

Next year CSS Grid Layout will be hitting your browsers, but you’re still in time to provide feedback and propose changes to the spec. The editors will be more than happy to listen to your suggestions for improvements and to know what things you’re missing.
If you want first-hand information regarding the evolution of the spec, you should follow the CSS WG blog and check the minutes of the meetings where grid is discussed. On top of that, if you want all the information, you should subscribe to the CSS WG mailing list and read the mails with “[css-grid]” in the subject.

Finally, in the next post I’ll talk about the work we’ve been doing during 2014 on the implementation in Blink and WebKit, and about our plans for 2015. Stay tuned!

Igalia and Bloomberg working together to build a better web

December 29, 2014 11:00 PM

December 15, 2014

Philippe Normand

Web Engines Hackfest 2014

Last week I attended the Web Engines Hackfest. The event was sponsored by Igalia (also hosting the event), Adobe and Collabora.

As usual, I spent most of the time working on the WebKitGTK+ GStreamer backend, and Sebastian Dröge kindly joined and helped out quite a bit; make sure to read his post about the event!

We first worked on the WebAudio GStreamer backend: Sebastian cleaned up various parts of the code, including the playback pipeline and the source element we use to bridge the WebCore AudioBus with the playback pipeline. On my side, I finished the AudioSourceProvider patch that had been abandoned in Bugzilla for a few months (years). It’s an interesting feature to have, so that web apps can use the WebAudio API with raw audio coming from Media elements.

I also hacked on GstGL support for video rendering. It’s quite interesting to be able to share WebKit’s GL context with GStreamer! The patch is not ready for landing yet, but thanks to the reviews from Sebastian, Mathew Waters and Julien Isorce I’ll improve it and hopefully commit it soon to WebKit ToT.

Sebastian also worked on Media Source Extensions support. We had a very basic, non-working backend that required… a rewrite, basically :) I hope we will have this reworked backend in trunk soon. Sebastian already has it working on Youtube!

The event was interesting in general, with discussions about rendering engines, rendering and JavaScript.

by Philippe Normand at December 15, 2014 04:51 PM

December 11, 2014

Javier Fernández

Web Engines Hackfest 2014

An awesome week is coming to an end, and I’d like to thank the sponsors for helping us make it possible for such an amazing group of hackers to work together to improve the web. We focused on some of the consolidated web engines, but also on the most promising ones and, of course, hacked on them, producing a good amount of patches.


The talks were great, and even better the breakout sessions about many hot topics for the future of the web engines.

Because of the work I’ve been doing lately, I was especially involved in the CSS Features session, which, among other things, complemented the talk Sergio Villar gave us about the Grid Layout implementation we are doing at Igalia in collaboration with Bloomberg. I also introduced the work I’ve been doing on the implementation of the Box Alignment specification in both the Blink and WebKit web engines; we evaluated how it would impact other layout models, like MathML, CSS Regions and CSS Flexible Box, to ease the logic of block and content alignment. We also discussed the future of CSS regarding new layout models, which is a bit uncertain; there is actually an interesting discussion inside the W3C about this topic, so we will see how it evolves. We talked about graphics and CSS, and about the SVG specification (the 2.0 version is being defined), which will have an important role in the future, as I could personally notice during the last CSSConfEU conference in Berlin; it was also an important topic in other conferences this year.

This week was a nice opportunity to discuss with other web core hackers the issues I’ve found in properly implementing the CSS Box Alignment specification in WebKit; see the discussion in Bugzilla for details. We concluded that it is not an easy problem, and that it should be discussed on the mailing list, as it would imply accepting a performance overhead in CSS parsing. The ‘auto’ value defined by the spec for all the Box Alignment properties, to be resolved during the cascade depending on the type of element, affects the current implementation of Flexible Box and MathML, so we will have to find a solution.

I also produced a bunch of patches for WebKit to improve the way we were managing margins and float elements, properly identifying the elements that create a new formatting context and applying some refactoring to make the code clearer; these improvements fixed several issues in Grid Layout as well. Besides that, borders, margins and padding in Grid Layout were finally adapted to the different writing modes, a patch I had been working on for some weeks already and had the opportunity to complete during this hackfest.

I think that’s all for now; I hope to see you all at the next Web Engines Hackfest in 2015.

sponsors

 

by jfernandez at December 11, 2014 10:10 AM

December 09, 2014

Manuel Rego

Web Engines Hackfest 2014

The Hackfest

Web Engines Hackfest 2014 (picture by Adrián Pérez)

Today we’re finishing the first edition of the Web Engines Hackfest. It is the continuation of the “WebKitGTK+ Hackfest” that Igalia has been organizing since 2009.

Checking some numbers: we’ve been 30 people working together for 4 full days (and even nights, in some cases). On top of that, there have been 6 nice talks and several interesting discussions around the open web platform.

As usual the hackfest started with an initial brainstorming where everybody introduced themselves and their plans for the following days. Martin Robinson led the discussion and identified the main topics, that were later debated in detail on the breakout sessions.

The Talks

Despite being an unconference-format event, this year we arranged a few nice talks about different topics (you can find the slides on the hackfest wiki page):

  • Raspberry Pi Browser by Gustavo Noronha & ChangSeok Oh: They talked about the work done to get a full browser working on a low-spec device like the RPi. They used WebKitGTK+, though not WebKit2 yet, and they needed some cool tricks to work around the hardware limitations.
  • CSS Grid Layout by Sergio Villar: Sergio explained the work we’ve been doing around CSS Grid Layout on Blink and WebKit. Apart from explaining some of the nice things of the feature, he revealed our plans to ship it on Chrome during the first quarter of 2015.
  • State of JS Implementations by Andy Wingo: As usual a great talk by Wingo, summarizing the evolution of JavaScript during the last years in the open source engines (JSC, SpiderMonkey and V8) and what we can expect in the future. Next year will bring further optimizations and a faster web for users.

TyGL presentation by Zoltán Herczeg (picture by Adrián Pérez)

  • GPU Based 2D Rendering With TyGL by Zoltán Herczeg: A good introduction to the TyGL port, explaining its internals and architecture, the different technical challenges, and the solutions TyGL provides to address them.
  • WebKit for Wayland by Žan Doberšek: Žan presented a new WebKit port for the Wayland platform. The main use case for this port is to have just one full-screen web view (like in a set-top box). He explained the internal details and how the integration with Wayland is done.
  • A Short Introduction to Servo by Martin Robinson: Martin explained the new parallel browser engine Servo and the Rust programming language behind the scenes. He talked about the current status and the next steps; it should soon be in a dog-foodable state.

CSS Features breakout session (picture by Adrián Pérez)

My personal experience

From a personal point of view, I’ve been mostly focused on CSS stuff. We had a nice discussion regarding the new layout models (from multi-column, flexbox and grid to regions and shapes) and how long they take to become major features widely used on the web.

On the one hand, I’ve been carrying on with the regular work I’ve been doing lately related to CSS Grid Layout. I found and fixed a crash in the absolutely positioned grid children support that landed last week. In addition I managed to get a few more tests landed in the grid W3C test suite that we’re starting to create (thanks to Gérard Talbot for the reviews).

A Coruña from San Pedro (picture by Adrián Pérez)

On the other hand, I’ve been talking with Mihnea Ovidenie about CSS Regions and how CSS Grid Layout could help to avoid the criticism regarding the empty HTML elements required to create regions. Grid allows you to divide the webpage into areas directly from CSS; each area is not associated with an HTML element (a DOM node). So if regions were defined using those areas, there would be no need to create empty elements in the DOM tree. Basically, this seems to match what is defined in the CSS Pagination Templates module that is being drafted by Alan Stearns; the sketch below illustrates the idea of areas defined purely in CSS.
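
A hedged sketch of that idea, with the areas living entirely in the stylesheet:

// A hypothetical example: named grid areas defined purely in CSS, with
// no dedicated DOM elements behind them. The area names are invented,
// and the <style> injection only keeps the snippet self-contained;
// implementations at the time required prefixes or runtime flags.
var style = document.createElement('style');
style.textContent =
    'body {' +
    '  display: grid;' +
    '  grid-template-areas:' +
    '    "header header"' +
    '    "nav    main";' +
    '}';
document.head.appendChild(style);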

Thanks

Finally I’d like to thank all the people who attended (this event wouldn’t have been possible without all of you). We hope you enjoyed the hackfest and also your stay in A Coruña. See you next year!

Web Engines Hackfest 2014 sponsors: Adobe, Collabora and Igalia

Also big thanks to the sponsors Adobe, Collabora and Igalia for making such a great event possible.

December 09, 2014 11:00 PM

Žan Doberšek

Announcing WebKit for Wayland

Here at Igalia we are happy to announce the Wayland port of WebKit.

This port avoids using traditional GUI toolkits in favor of directly operating with the Wayland display protocol. Leveraging the WebKit2 multi-process architecture, the UIProcess is implemented as a shared library and loaded by the Wayland compositor, enabling the WebProcess to act as a direct client of the compositor while still being controlled by the UIProcess.

EGL, the Wayland EGL platform, and OpenGL ES are used for hardware-accelerated compositing of the rendered Web content. GLib, Libsoup and Cairo are used under the hood.

The port serves as a good base for building systems and environments that rely mostly or completely on Web platform technologies for building the desired interface.

Overall the port is still in its early days, with only some basic functionality in place (e.g. functional keyboard and mouse input support) and many other Web platform features still unsupported. But with Wayland EGL support constantly growing in graphics drivers for different GPUs, it can already be tested on devices like the Raspberry Pi or the Jetson TK1 development board.

In terms of supported Wayland compositors, for the moment we only support Weston (the reference Wayland compositor implementation), which is also used for development purposes. It’s also used for running the layout tests by again pushing WebKitTestRunner functionality into a shared library, though all that is still in very early stages.

The code is available on GitHub. There are also short instructions for building the dependencies and the port, and how to run it.

There are also additional repositories there (for Cairo, Weston), containing changes that haven’t yet been pushed upstream. In the following days we’ll also be providing Buildroot configurations that can be used for cross-compiling the whole software stack for the different supported hardware.

We look forward to continuing to evolve this work, enabling further features and improving performance on the software side, and adding support for additional devices.

 

by Žan Doberšek at December 09, 2014 03:31 PM

Andy Wingo

state of js implementations, 2014 edition

I gave a short talk about the state of JavaScript implementations this year at the Web Engines Hackfest.


29 minutes, vorbis or mp3; slides (PDF)

The talk goes over a bit of the history of JS implementations, with a focus on performance and architecture. It then moves on to talk about what happened in 2014 and some ideas about where 2015 might be going. Have a look if that's a thing you are into. Thanks to Adobe, Collabora, and Igalia for sponsoring the event.

by Andy Wingo at December 09, 2014 10:29 AM

December 04, 2014

Gwang Yoon Hwang

Introducing Threaded Compositor

Since CSS animations appeared, the graphics performance of WebKit has become an important issue, especially in the embedded systems world. Many users (including me) want fancy, stunning user interfaces, even on small handset devices with very restricted CPUs and GPUs.

Luckily, I have been able to work over the years on improving the graphics performance of WebKit. And with the support of colleagues at Igalia, I have just brought the Threaded Compositor for WebKitGTK+ to a review-ready state.

The Threaded Compositor focuses on improving the performance of WebKitGTK+ for CSS animations by using a dedicated thread for compositing. It is not limited to CSS animations, though: it also provides a way to improve the performance of scrolling, scaling, and the rendering of canvas and video elements.

Before diving into the details, it is worth watching a video introducing the Threaded Compositor.

Click here if a video does not appear above this text.

The example used in this video can be accessed from here. I modified the original version to run automatically for benchmarking purposes.

Also, the current implementation of the Threaded Compositor is available on GitHub. It is based on WebKit r176538.

Motivation

The main thread is where everything gets executed, including layout, JavaScript execution, and other heavy operations. Thus, running accelerated compositing in the main thread severely limits responsiveness and rendering performance. By having a separate thread for compositing, we can bring a significant performance improvement in scrolling, zooming, and rendering.

Currently, several ports have already implemented a way to perform compositing off the main thread. Coordinated Graphics, which is used by Qt and EFL, runs accelerated compositing in the UI Process, and Chromium has implemented a compositor thread that runs off the main render thread.

The Threaded Compositor is a threaded model of Coordinated Graphics. Unlike Coordinated Graphics, it composites contents in a dedicated thread of the Web Process and doesn’t use complicated inter-process shareable textures. Because of that, it is much easier to implement other multi-threaded rendering techniques, which are covered in Compositing the Media and the Canvas below.

Off-the-main-thread Animation

Compositing contents using OpenGL [ES] is a basic technique for accelerating CSS animations. However, even when using GPU-accelerated compositing, we can run into a V-sync pitfall (this happens more frequently on embedded systems). Most GPUs use V-sync (vertical synchronization) to prevent screen tearing. It can block OpenGL’s SwapBuffers call for up to 16 ms (if the monitor’s refresh rate is 60 Hz) until the next vblank interrupt.

As you can see in the diagram below, this problem can waste the main thread’s CPU time.

A diagram to visualize rendering pipeline of the current WebKitGTK+.

Again, when the main thread executes OpenGL [ES]’s SwapBuffers call, the driver has to wait for the V-sync interrupt from the DRM/KMS module. The diagram above shows the worst-case scenario: the main thread could have rendered 6 frames in the given time if there were no V-sync problem, but with V-sync it renders just 4 frames. This happens more frequently in embedded environments, which have less powerful CPUs and GPUs.

A diagram to visualize rendering pipeline of the Threaded Compositor.

To overcome this problem, we use the very common technique of updating the screen from a different thread. As visualized in the diagram above, the compositing thread absorbs all the overhead of the V-sync problem. The main thread can do other work while the compositing thread handles the OpenGL calls. (Purple rectangles in the main thread indicate inter-thread overheads.)

Moreover, the Threaded Compositor also accelerates CSS animations.

Because all of the layer properties (including GraphicsLayerAnimation) are passed to the compositing thread, CSS animations can run on the compositing thread without interrupting the main thread. When CoordinatedGraphicsScene finishes rendering a scene, it checks whether there are any running animations in its layer tree. If there are active ones, it sets a timer to render the scene again by itself.

The attached sequence diagram may be helpful to understand this idea.

A sequence diagram of the updating animation in the Threaded Compositor
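
As an illustration (mine, not from the post), this is the kind of web content that benefits:

// An illustrative, transform-only CSS animation: assuming, as in other
// accelerated compositors, that transform is among the properties the
// compositing thread can animate by itself, every frame of this spin
// can be produced without touching the main thread. The element id and
// the <style> injection are invented for the example.
var style = document.createElement('style');
style.textContent =
    '@keyframes spin { to { transform: rotate(360deg); } }' +
    '.spinner { animation: spin 2s linear infinite; }';
document.head.appendChild(style);
document.getElementById('logo').classList.add('spinner');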

Scrolling and Scaling

Scrolling and scaling are also expensive jobs, especially on embedded devices.

When we scale a web page, WebCore needs to update the whole layout of the page. As we all know, this is a really expensive operation on embedded platforms.

A simplified diagram to visualize scaling procedure of the current WebKitGTK+.

Let’s assume that a user tries to scale a web page using a two-finger gesture. WebCore in the Web Process tries to produce the scaled web page with an exact layout as soon as possible. However, on an embedded device this can take more than 100 ms, i.e. under 10 fps.

The Threaded Compositor renders the web page at the requested scale using its cached layer tree. By doing so, we can give immediate visual feedback to the user, and the re-laid-out web page is updated later.

A simplified diagram to visualize scaling procedure of the Threaded Compositor.

The scrolling operation is similar. Scrolling a web page doesn’t need a huge re-layout operation, but it does need repainting and blitting operations to update the page.

A simplified diagram to visualize scrolling procedure of the current WebKitGTK+.

Because the Threaded Compositor uses TILED_BACKING_STORE, it can render the scrolled web page from cached tiles with minimal latency. While the ThreadedCompositor renders the web page, the main thread scrolls the "actual" view. Whenever the main thread finishes an "actual" scroll, the changes are collected by the CompositingCoordinator and passed to the ThreadedCompositor to update the view.

A simplified diagram to visualize scrolling procedure of the Threaded Compositor.

Unfortunately, these optimizations are only supported when using fixed layout. To support them without fixed layout, we need to refactor TILED_BACKING_STORE in WebKit and implement proper overlay scrollbars in WebKitGTK+.

Compositing the Media and the Canvas

Not only CSS3 animations and scaling but also video and canvas can be accelerated by the Threaded Compositor. Because this is a little bit out of the scope of this post (it is not implemented for upstreaming yet), I’ll only describe it briefly here.

In the TextureMapper, media (plugins, video elements, …) and canvas (WebGL, 2D canvas) can be rendered by implementing TextureMapperPlatformLayer. The most important interface of TextureMapperPlatformLayer is TextureMapperPlatformLayerClient::setPlatformLayerNeedsDisplay, which requests that the compositor composite its contents.

If the actual renderer of a media or canvas element runs off the main thread, it is possible to bypass the main thread entirely. The renderer can call TextureMapperPlatformLayerClient::setPlatformLayerNeedsDisplay when it updates its result from its own thread, and the compositing thread will composite the result without involving the main thread.

A diagram to visualize rendering pipeline of for Canvas and Video.

Also, if the target platform supports texture streams, we can avoid pipeline hazards when drawing video layers on modern mobile GPUs, which use tile-based rendering.

Performance

This performance comparison was presented at linux.conf.au 2013. It is based on a pretty old implementation, but it shows a meaningful performance improvement compared to the plain TextureMapper method. I could not regenerate these results on my current laptop because it is too fast to create a stressed condition. I hope I can share an updated performance comparison on an embedded device soon.

  • Test Cases [*]

    • leaves n

      • Modified the famous accelerated compositing example, WebKit Leaves, to draw n leaves.
    • 3DCube n

      • n cubes rotating using CSS3 animation.
  • Test Environment

    • Prototype of Threaded Compositor Implementation based on r134883
    • WebKit2 Gtk+ r140386
    • Intel Core i5-2400 3.10Ghz, Geforce GTS450, Ubuntu 12.04 x86_64
Tests        ThreadedCompositor (fps)   WebKit2 Gtk+ (fps)   Improvement (%)
leaves500    58.96                      35.86                64.42
leaves1000   58.98                      25.88                127.90
leaves2000   32.98                      18.04                82.82
3DCube360    57.27                      32.09                78.47
3DCube640    33.64                      23.52                43.03
[*] These tests were made by Company 100.

Class Diagram

I made this diagram to aid my own understanding during the implementation, but it would be good to share it to help others who are interested.

A class diagram of the Threaded Compositor
  • CompositingCoordinator

    The class responsible for managing compositing resources in the main thread.

  • ThreadedCompositor

    This class has a dedicated thread for compositing. It owns the ViewportController and the CoordinatedGraphicsScene, and renders the scene on the actual surface.

  • ThreadedCoordinatedLayerTreeHost

    A concrete implementation of LayerTreeHost and CompositingCoordinator::Client for Coordinated Graphics. It owns a ThreadedCompositor to use the Threaded Compositor.

  • CoordinatedGraphicsScene

    A class holding a complete scene graph and the rendering functionality. It synchronizes its own scene graph with the graphics layer tree in the compositing coordinator.

  • ThreadSafeCoordinatedSurface

    This class implements a surface backed by an ImageBuffer, for use in the Web Process.

  • TextureMapperLayer

    It holds the actual properties needed to render a layer.

  • ViewportController

    This class is responsible for handling the scale factor and the scrolling position in the compositing thread.

In the main thread, all of the visual changes to each GraphicsLayer are passed to the CompositingCoordinator via CoordinatedGraphicsLayer, and the CompositingCoordinator collects state changes from the GraphicsLayer tree when it is requested to do so.

From this point on, the Threaded Compositor and Coordinated Graphics behave differently.

In Coordinated Graphics, the collected state changes of the layer tree are encoded and passed to the CoordinatedLayerTreeHostProxy in the UI Process. Then CoordinatedGraphicsScene applies these state changes to its own TextureMapperLayer tree and renders them to the frame buffer.

But in the Threaded Compositor, these states are passed to the CoordinatedGraphicsScene owned by the ThreadedCompositor. The ThreadedCompositor has its own RunLoop and thread; all of the actual compositing operations are executed in the dedicated compositing thread in the Web Process.

Current Status

Most of the refactoring for the Threaded Compositor on the Coordinated Graphics side was done last year.

However, upstreaming of the Threaded Compositor was delayed by various issues. I had to simplify the design (which was quite complicated at that time) and resolve various issues before starting the upstreaming process.

This year, WebKitGTK+ decided to deprecate WebKit1. Because of that, a much less complicated design became possible. Also, I fixed the Threaded Compositor so that it doesn’t break the current behavior of WebKitGTK+ without fixed layout.

I’m happy that I can now start the upstreaming process of this implementation.

Things To Do

  • Reduce memory use when updating textures

    • RenderObjects draw their contents into an UpdateAtlas to pass bitmaps to the compositing thread. However, the size of each UpdateAtlas is 4 MB, which is quite a lot of memory consumption on an embedded device. So far, I couldn’t find a way to solve this without using a proprietary GPU driver.
  • Avoid pipeline hazards in the mobile GPU

    • Modern mobile GPUs use tile-based [deferred] rendering. These GPUs can render multiple frames (~3 frames) concurrently to compensate for their low performance. This technique works well in the game industry, which mostly uses static textures.
    • In WebKit, most textures are dynamically generated. Whenever we update a texture using texSubImage2D, the GPU can stall, because the texture may still be in use by a previous rendering call.
    • To avoid this problem, Chromium uses a producer/consumer model for textures.
    • We can create a wrapper for a texture that provides multiple buffering; in this way we can avoid the pipeline hazards.
  • Reduce the maintenance burden

    • We need to refactor Coordinated Graphics to share code with Apple’s UI-side compositing code. Most of the messages are duplicated, so this can be done fairly easily by factoring out the platform-specific parts.
  • Accelerate the performance of 2D Canvas element and Video element

  • Check the performance of Threaded Compositor in the embedded device

    • I hope I can share more about this soon.
  • Most of all, upstreaming!

by Gwang Yoon Hwang at December 04, 2014 01:20 AM

Adrián Pérez

Inception! JS-in-JS!

Apparently this November has been the gloomiest in a while, and that certainly may have slowed down my on-going quest to bring arrow functions to V8. Though arrow functions deserve a write-up themselves, my musings today are about a side quest that started in September, when I had the chance to visit enchanting Berlin one more time, this time as a speaker at JSConf EU.

So what does a guy hacking on JavaScript engines talk about at a conference which is not a lair of compiler neckbeards? Of course it had to be something related to the implementation of JavaScript engines, but it should have something to do with what can be done with JavaScript. It turns out that V8 has large-ish parts implemented in a subset of JavaScript (and SpiderMonkey; and JavaScriptCore, a bit too!). Enter “Inception: JavaScript-in-JavaScript — where language features are implemented in the language itself (any similarity with the movie is a mere coincidence)”.

Intermission

Since the video of my talk at JSConf EU is now online, you may want to go watch it now (as an option, with the slides in another window) and come back afterwards to continue reading here.

Did you see the video of the talk? Good, let's recap a bit before continuing.

Engines use JavaScript, too

I ran some quick numbers (with CLOC) to give you an idea of how much the major engines use JavaScript themselves. The table below shows the size of each engine in thousands of lines of code (kLOC), how much of it is JavaScript, and the percentage that represents over the total (excluding the test suites):

Engine           Total (kLOC)   JS (kLOC)   JS %
JavaScriptCore   269            1           0.3
SpiderMonkey     457            18          3.9
V8               532            23          4.3

Both V8 and SpiderMonkey have a good deal of the runtime library implemented in JavaScript, relying on the JIT to produce machine code and provide good performance. JavaScriptCore might have an edge in cases where it is not possible to use the JIT, as its C++ implementation can be faster when running in the interpreter (iOS, anyone?). In the case of V8 the choice of going with JavaScript is clear, because it always compiles to machine code.

Most of the features introduced in the runtime library by the ES6 specification can be implemented without needing to touch low-level C++ code. Or can they?

Live-coding %TypedArray%.prototype.forEach()

For my talk in Berlin I wanted to show how accessible it is to use JavaScript itself as an entry point to the guts of V8. What would be better than picking something new from the spec, and live-coding it, let's say the .forEach() method for typed arrays? Because of course, coding while delivering a talk cannot ever go bad.

Live-coding, what could go wrong?

As a first approach, let's write down an initial version of .forEach(), in the very same way as we would if we were writing a polyfill:

function TypedArrayForEach(cb, thisArg) {
    if (thisArg === undefined)
        thisArg = this;
    // Pass the value, index, and array, as forEach() callbacks expect.
    for (var i = 0; i < this.length; i++)
        cb.call(thisArg, this[i], i, this);
}

But, in which file exactly should this go, and how is it made available in the runtime library? It turns out that there is a bunch of .js files under src/, one of them conspicuously named typedarray.js, prompting us to go ahead and write the above function in the file, plus the following initialization code inside its —also aptly named— SetupTypedArrays() function, right at the end of the macro definition:

function SetupTypedArrays() {
macro SETUP_TYPED_ARRAY(ARRAY_ID, NAME, ELEMENT_SIZE)
    // ...

    // Monkey-patch .forEach() into the prototype
    global.NAME.prototype.forEach = TypedArrayForEach;
endmacro

TYPED_ARRAYS(SETUP_TYPED_ARRAY)
}

Wait! What, macros? Yes, my friends, but before crying out in despair, rebuild V8 and run the freshly built d8 passing --harmony-arrays:

% ./out/x64.debug/d8 --harmony-arrays
V8 version 3.31.0 (candidate) [console: readline]
d8> Int8Array.prototype.forEach
function forEach() { [native code] }
d8>

It is alive!

Standards are hard

At this point we have a working polyfill-style implementation patched in the prototypes of typed arrays, and it works. Yet, it does not quite work as it should. To begin with, the function should throw TypeError if called on an object that is not a typed array, or if the callback is not callable. The spec also mentions that the .length property of the function should be 1 (that is: it is declared as having a single argument), and so on.

You see, formally specifying a programming language is a complex task because all those iffy details have to be figured out. Not to forget about the interactions with other parts of the language, or the existing code bases. In the particular case of JavaScript the backwards compatibility adds a good deal of complexity to the formal semantics, and wading through the hundreds of pages of the ES6 specification can be a daunting task. We may feel compelled to define things differently.

How standards are born, courtesy of XKCD.

Personally, I would eradicate one of the modes, keep just sloppy or strict, but not both. Only that then it would not be JavaScript anymore. Think about all that legacy code that lurks on the depths of the Internet, wouldn't it be nice that it continues to work?

That leads to yet another subtlety: the value passed as thisArg (available as this in the callback function) is expected to be wrapped into an object only for sloppy mode functions — but not for strict mode ones, because strict mode is, well, stricter. While .call() handles this, it is not particularly fast (more on this later), so we may end up needing a way to know which functions are in sloppy mode, do the wrapping ourselves (when needed) and arrange the calls in a more efficient way. We have to assume that there will be code in the wild relying on this behavior. If only we could know which functions are sloppy!

A better solution

As a matter of fact, the JavaScript engines already know which functions are strict and which ones not. We just cannot get access to that information directly. Fortunately V8 provides a way in which JavaScript code can invoke C++ runtime functions: the functions declared using the RUNTIME_FUNCTION() macro can be invoked from JavaScript, by prefixing their names with %, and if you take a peek into src/runtime/, there are plenty.

To begin with, there is %_CallFunction(), which is used pervasively in the implementation of the runtime library instead of .call(). Using it is an order of magnitude faster, so we will be keeping it in our utility belt. Then, there is %IsSloppyModeFunction(), which can be used to find out the “sloppyness” of our callback functions.

But wait! Do you remember that I mentioned macros above? Take a look at src/macros.py, go, now. Did you see something else interesting in there? Plenty of utility macros, used in the implementation of the runtime library, live there. We can use SHOULD_CREATE_WRAPPER() instead. Also, there are a number of utility functions in src/runtime.js, most annotated with sections of the ECMAScript specification because they do implement algorithms as written in the spec. Using those makes it easier to make sure our functions follow all the intricacies of the spec.

Let's give them a try:

// Do not add a "thisArg" parameter, to make
// TypedArrayForEach.length==1 as per the spec.
function TypedArrayForEach(cb /* thisArg */) {
    // IsTypedArray(), from src/runtime/runtime-typedarray.cc
    // MakeTypeError(), from src/messages.js
    if (!%IsTypedArray(this))
        throw MakeTypeError('not_typed_array', []);

    // IS_SPEC_FUNCTION(), from src/macros.py
    if (!IS_SPEC_FUNCTION(cb))
        throw MakeTypeError('called_non_callable', [cb]);

    var wrap = false;
    var thisArg = this;
    if (arguments.length > 1) {
        thisArg = arguments[1];
        // SHOULD_CREATE_WRAPPER(), from src/macros.py
        wrap = SHOULD_CREATE_WRAPPER(cb, thisArg);
    }

    for (var i = 0; i < this.length; i++) {
        // ToObject(), from src/runtime.js
        %_CallFunction(wrap ? ToObject(thisArg) : thisArg,
                       this[i], i, this, cb);
    }
}

Now that is much closer to the version that went into V8 (which has better debugging support on top), and it covers all the corner cases in the spec. Rejoice!

When in Rome...

Needless to say, there is no documentation for most of the helper functions used to implement the runtime library. The best approach is to look around and see how functions similar to the one we want to implement are done, while keeping a copy of the spec around for reference.

Do as the tigers do.

Those are a couple of things one can find by scouring the existing code of the runtime library. First: for typed arrays, the macro in SetupTypedArrays() installs a copy of the functions for each typed array kind, effectively duplicating the code. While this may seem gratuitous, each copy of the function is very likely to be used only on typed arrays whose elements are of a certain type. This helps the compiler to generate better code for each of the types.

Second: Runtime functions with the %_ prefix in their names (like %_CallFunction()) do not have a C++ counterpart in src/runtime/. Those are inline runtime functions, for which the code is emitted directly by the code generation phase of the compiler. They are really fast; use them whenever possible!

Next

After coming back from Berlin, I have been implementing the new typed array methods, and the plan is to continue tackling them one at a time.

As mentioned, %TypedArray%.prototype.forEach() is already in V8 (remember to use the --harmony-arrays flag), and so is %TypedArray%.of(), which I think is very handy to create typed arrays without needing to manually assign to each index:

var pow2s = Uint8Array.of(1, 2, 4, 8, 16, 32, 64, 128);
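
Combining the two (a sketch for the d8 shell with --harmony-arrays, where print() writes to standard output):

// The callback receives the value, the index, and the array itself.
Uint8Array.of(1, 2, 4, 8).forEach(function (value, index) {
    print(index + ' -> ' + value);
});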

Apart from my occasional announcements (and rants) on Twitter, you can also follow the overall progress by subscribing to issue #3578 in the V8 bug tracker.

Last but not least, a mention to the folks at Bloomberg, who are sponsoring our ongoing work implementing ES6 features. It is great that they are able to invest in improving the web platform not just for themselves, but for everybody else as well. Thanks!

by aperez (adrian@perezdecastro.org) at December 04, 2014 12:00 AM

December 02, 2014

Andy Wingo

there are no good constant-time data structures

Imagine you have a web site that people can access via a password. No user name, just a password. There are a number of valid passwords for your service. Determining whether a password is in that set is security-sensitive: if a user has a valid password then they get access to some secret information; otherwise the site emits a 404. How do you determine whether a password is valid?

The go-to solution for this kind of problem for most programmers is a hash table. A hash table is a set of key-value associations, and its nice property is that looking up a value for a key is quick, because it doesn't have to check against each mapping in the set.

Hash tables are commonly implemented as an array of buckets, where each bucket holds a chain. If the bucket array is 32 elements long, for example, then keys whose hash is H are looked for in bucket H mod 32. The chain contains the key-value pairs in a linked list. Looking up a key traverses the list to find the first pair whose key equals the given key; if no pair matches, then the lookup fails.

Unfortunately, storing passwords in a normal hash table is not a great idea. The problem isn't so much in the hash function (the hash in H = hash(K)) as in the equality function; usually the equality function doesn't run in constant time. Attackers can detect differences in response times according to when the "not-equal" decision is made, and use that to break your passwords.
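
To make that concrete, here is a sketch (mine, not from any particular library) of a constant-time equality check over equal-length byte sequences: it inspects every byte, accumulating differences with bitwise OR instead of returning at the first mismatch:

// Constant-time equality over byte arrays: the running time depends
// only on the length, never on where the first difference occurs.
// Lengths are assumed to be public information.
function constantTimeEquals(a, b) {
    if (a.length !== b.length)
        return false;
    var diff = 0;
    for (var i = 0; i < a.length; i++)
        diff |= a[i] ^ b[i];
    return diff === 0;
}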

Edit: Some people are getting confused by my use of the term "password". Really I meant something more like "secret token", for example a session identifier in a cookie. I thought using the word "password" would be a useful simplification but it also adds historical baggage of password quality, key derivation functions, value of passwords as an attack target for reuse on other sites, etc. Mea culpa.

So let's say you ensure that your hash table uses a constant-time string comparator, to protect against the hackers. You're safe! Or not! Because not all chains have the same length, "interested parties" can use lookup timings to distinguish chain lookups that take 2 comparisons compared to 1, for example. In general they will be able to determine the percentage of buckets for each chain length, and given the granularity will probably be able to determine the number of buckets as well (if that's not a secret).

Well, as we all know, small timing differences still leak sensitive information and can lead to complete compromise. So we look for a data structure that takes the same number of algorithmic steps to look up a value. For example, bisection over a sorted array of size SIZE will take ceil(log2(SIZE)) steps to find the value, independent of what the key is and also independent of what is in the set. At each step, we compare the key and a "mid-point" value to see which is bigger, and recurse on one of the halves.
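
As a sketch of what "the same number of algorithmic steps" means (my illustration, not from the post), here is a bisection whose comparison count depends only on the size of the set, never on the key:

// Membership test over a sorted array using exactly ceil(log2(size))
// comparisons, regardless of the key or its position. As discussed
// below, memory access timing still leaks.
function bisectContains(sorted, key) {
    // Compute ceil(log2(length)) with integer arithmetic.
    var steps = 0;
    for (var n = sorted.length; n > 1; n = (n + 1) >> 1)
        steps++;
    var lo = 0, hi = sorted.length;
    for (var i = 0; i < steps; i++) {
        var mid = (lo + hi) >> 1;
        if (sorted[mid] <= key)
            lo = mid;
        else
            hi = mid;
    }
    return sorted.length > 0 && sorted[lo] === key;
}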

One problem is, I don't know of a nice constant-time comparison algorithm for (say) 160-bit values. (The "passwords" I am thinking of are randomly generated by the server, and can be as long as I want them to be.) I would appreciate any pointers to such a constant-time less-than algorithm. However a bigger problem is that the time it takes to access memory is not constant; accessing element 0 of the sorted array might take more or less time than accessing element 10. In algorithms we typically model access on a more abstract level, but in hardware there's a complicated parallel and concurrent protocol of low-level memory that takes a non-deterministic time for any given access. "Hot" (more recently accessed) memory is faster to read than "cold" memory.

Non-deterministic memory access leaks timing information, and in the case of binary search the result is disaster: the attacker can literally bisect the actual values of all of the passwords in your set, by observing timing differences. The worst!

You could get around this by ordering "passwords" not by their actual values but by their cryptographic hashes (e.g. by their SHA256 values). This would force the attacker to bisect not over the space of password values but of the space of hash values, which would protect actual password values from the attacker. You still leak some timing information about which paths are "hot" and which are "cold", but you don't expose actual passwords.

It turns out that, as far as I am aware, it is impossible to design a key-value map on common hardware that runs in constant time and is sublinear in the number of entries in the map. As Zooko put it, running in constant time means that the best case and the worst case run in the same amount of time. Of course this is false for bucket-and-chain hash tables, but it's false for binary search as well, as "hot" memory access is faster than "cold" access. The only plausible constant-time operation on a data structure would visit each element of the set in the same order each time. All constant-time operations on data structures are linear in the size of the data structure. Thems the breaks! All you can do is account for the leak in your models, as we did above when ordering values by their hash and not their normal sort order.

Once you have resigned yourself to leaking some bits of the password via timing, you would be fine using normal hash tables as well -- just use a cryptographic hashing function and a constant-time equality function and you're good. No constant-time less-than operator need be invented. You leak something on the order of log2(COUNT) bits via timing, where COUNT is the number of passwords, but since that's behind a hash you can't use it to bisect on actual key values. Of course, you have to ensure that the hash table isn't storing values in sorted order and short-cutting early. This sort of detail isn't usually part of the contract of stock hash table implementations, so you probably still need to build your own.
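
A sketch of that recipe in Node.js (crypto.createHash() is real; the object-based table and function names are illustrative, and as noted above a production version would want control over the table's internals):

var crypto = require('crypto');

function sha256hex(token) {
    return crypto.createHash('sha256').update(token).digest('hex');
}

// The table is keyed by digests of the valid tokens, so any timing
// signal from the lookup can at best bisect digest values, never the
// tokens themselves.
function makeTokenTable(tokens) {
    var table = Object.create(null);
    tokens.forEach(function (token) {
        table[sha256hex(token)] = true;
    });
    return table;
}

function isValidToken(table, submitted) {
    return table[sha256hex(submitted)] === true;
}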

Edit: People keep mentioning Cuckoo hashing for some reason, despite the fact that it's not a good open-hashing technique in general (Robin Hood hashes with linear probing are better). Thing is, any operation on a data structure that does not touch all of the memory in the data structure in exactly the same order regardless of input leaks cache timing information. That's the whole point of this article!

An alternative is to encode your data structure differently, for example for the "key" to itself contain the value, signed by some private key only known to the server. But this approach is limited by network capacity and the appropriateness of copying for the data in question. It's not appropriate for photos, for example, as they are just too big.

Edit: Forcing constant-time on the data structure via sleep() or similar calls is not a good mitigation. This either massively slows down your throughput, or leaks information via side channels. Remote attackers can measure throughput instead of latency to determine how long an operation takes.

Corrections appreciated from my knowledgeable readers :) I was quite disappointed when I realized that there were no good constant-time data structures and would be happy to be proven wrong. Thanks to Darius Bacon, Zooko Wilcox-O'Hearn, Jan Lehnardt, and Paul Khuong on Twitter for their insights; all mistakes are mine.

by Andy Wingo at December 02, 2014 10:01 PM

November 27, 2014

Andy Wingo

scheme workshop 2014

I just got back from the US, and after sleeping for 14 hours straight I'm in a position to type about stuff again. So welcome back to the solipsism, France and internet! It is good to see you on a properly-sized monitor again.

I had the enormously pleasurable and flattering experience of being invited to keynote this year's Scheme Workshop last week in DC. Thanks to John Clements, Jason Hemann, and the rest of the committee for making it a lovely experience.

My talk was on what Scheme can learn from JavaScript, informed by my work in JS implementations over the past few years; you can download the slides as a PDF. I managed to record audio, so here goes nothing:


55 minutes, vorbis or mp3

It helps to follow along with the slides. Some day I'll augment my slide-rendering stuff to synchronize a sequence of SVGs with audio, but not today :)

The invitation to speak meant a lot to me, for complicated reasons. See, Scheme was born out of academic research labs, and to a large extent that's been its spiritual core for the last 40 years. My way to the temple was as a money-changer, though. While working as a teacher in northern Namibia in the early 2000s, fleeing my degree in nuclear engineering, trying to figure out some future life for myself, for some reason I was recording all of my expenses in Gnucash. Like, all of them, petty cash and all. 50 cents for a fat-cake, that kind of thing.

I got to thinking "you know, I bet I don't spend any money on Tuesdays." See, there was nothing really to spend money on in the village besides fat cakes and boiled eggs, and I didn't go into town to buy things except on weekends or later in the week. So I thought that it would be neat to represent that as a chart. Gnucash didn't have such a chart but I knew that they were implemented in Guile, as part of this wave of Scheme consciousness that swept the GNU project in the nineties, and that I should in theory be able to write it myself.

Problem was, I also didn't have internet in the village, at least then, and I didn't know Scheme and I didn't really know Gnucash. I think what I ended up doing was just monkey-typing out something that looked like the rest of the code, getting terrible errors but hey, it eventually worked. I submitted the code, many years ago now, some of the worst code you'll read today, but they did end up incorporating it into Gnucash and to my knowledge that report is still there.

I got more into programming, but still through the back door, so to speak. I had done some free software work before going to Namibia, on GStreamer, and wanted to build a programmable modular synthesizer with it. I read about Supercollider, and decided I wanted to do something like that but with the "unit generators" defined in GStreamer and orchestrated with Scheme. If I had known then that Scheme could be fast, I probably would have started on an entirely different course of things, but that did at least result in gainful employment doing unrelated GStreamer things, if not a synthesizer.

Scheme became my dominant language for writing programs. It was fun, and the need to re-implement a bunch of things wasn't a barrier at all -- rather a fun challenge. After a while, though, speed was becoming a problem. It became apparent that the only way to speed up Guile would be to replace its AST interpreter with a compiler. Thing is, I didn't know how to write one! Fortunately there was previous work by Keisuke Nishida, jetsam from the nineties wave of Scheme consciousness. I read and read that code, mechanically monkey-typed it into compilation, and slowly reworked it into Guile itself. In the end, around 2009, Guile was faster and I found myself its co-maintainer to boot.

Scheme has been a back door for me for work, too. I randomly met Kwindla Hultman-Kramer in Namibia, and we found Scheme to be a common interest. Some four or five years later I ended up working for him with the great folks at Oblong. As my interest in compilers grew, and it grew as I learned more about Scheme, I wanted something closer there, and that's what I've been doing in Igalia for the last few years. My first contact there was a former Common Lisp person, and since then many contacts I've had in the JS implementation world have been former Schemers.

So it was a delight when the invitation came to speak (keynote, no less!) at the Scheme Workshop, behind the altar instead of in the foyer.

I think it's clear by now that Scheme as a language and a community isn't moving as fast now as it was in 2000 or even 2005. That's good because it reflects a certain maturity, and makes the lore of the tribe easier to digest, but bad in that people tend to ossify and focus on past achievements rather than future possibility. Ehud Lamm quoted Nietzsche earlier today on Twitter:

By searching out origins, one becomes a crab. The historian looks backward; eventually he also believes backward.

So it is with Scheme and Schemers, to an extent. I hope my talk at the conference inspires some young Schemer to make an adaptively optimized Scheme, or to solve the self-hosted adaptive optimization problem. Anyway, as users I think we should end the era of contorting our code to please compilers. Of course some discretion in this area is always necessary but there's little excuse for actively bad code.

Happy hacking with Scheme, and au revoir!

by Andy Wingo at November 27, 2014 05:48 PM