Planet Igalia

June 13, 2013

Andy Wingo

but that would be anarchy!

Good morning, internets!

This is the second of a series of articles on what it's like to work for/in/with a cooperative; the first one is here. Eventually I'll update the first one to link to the whole series, whereever it goes.

I work for a worker's cooperative, Igalia. This article series is about moving beyond theory to practice: to report on our experience with collective self-determination, for the curious and for the interested. It's a kind of exercise in marketing the revolution :) I hope I can be free from accusations of commercial interest, however; for better or for worse, our customers couldn't care less how we are organized internally.

the ties that bind

The essence of a worker's cooperative is to enable people to make decisions about their work to the degree to which they are affected by those decisions. For decisions affecting the whole group, you need collective deliberation. You could think of it as workplace democracy, if you understand democracy to go beyond simple voting and referenda.

Collective decision-making works, but you need some preconditions -- see for example conditions for consensus, from this excellent article on consensus. More so than any other enterprise, a cooperative needs to have a set of goals or principles that binds all of its members together. In Igalia, we have a rather wordy document internally, but it's basically a commitment to finding a way to let hackers work on interesting things, centered around free software, in a self-organized way. Everything else is just strategy.

structure

There are two more-or-less parallel structures in Igalia: one for decision-making and one for work. There is also a legal structure that is largely vestigial; more on that at the end.

The most important structure in Igalia is the "assembly". It's composed of all workers that have been in Igalia for a certain amount of time (currently 1 year, though there are discussions to shorten that period).

The assembly is the ultimate decision-making body, with power over all things: bank accounts, salary levels, budgets, strategies, team assignments, etc. All company information is open to all assembly members, which currently comprises all of the people in Igalia.

The assembly meets in person every two months at the head office in A Coruña, with the option for people to dial in. Since many of us work in remote locations and just communicate over mail and jabber, it's good to see people in person to renew the kind of personal connections that help make agreement easier. Incidentally, this requirement for group face-to-face meetings influenced the architecture of our head office; there is a dividable room there that can be expanded into a kind of "round table" with microphones at all the posts. You can see a picture of it on the about page.

The in-person assemblies are usually a bit abstracted over day-to-day operations and focus more on proposals that people have brought up or on strategy. Sometimes they are easy and quick and sometimes they go late into the night. The ones associated with longer-term planning like the yearly budgeting assemblies are the longest.

How well does this work, you ask? I would summarize by saying that it works OK. There are so many advantages to collective decision-making that I now take for granted that it is difficult to imagine other ways. However, making decisions is hard on a personal level: it's a challenge to hold all of the ideas in your head at one time, and to feel the right level of responsibility for the success of the business. I'll write another article on this point, I think, because it is also part of the cooperative working experience.

"The assembly" is both the bimonthly meeting and also the body of people. We're all on a mailing list and a jabber channel, which is where a lot of the other day-to-day business decisions get made, like "do we accept this contract", "should we hire people soon", "should we hire X person in particular", etc. However with some 40 people it's tricky to establish an active consensus on a mailing list, so it's usually the people that lead with proposals and actively round up people to help them implement that get things done.

work groups

So that's the power structure in Igalia. However on a day-to-day level, unless a thread is really active on the assembly mailing list, folks just do their jobs. Sounds silly but it has to happen. We're organized into teams, like in a normal company, but without managers -- we also do consensus on a smaller, more informal level within the teams.

Since we're a consulting shop, most people write code all day, but there are also people that primarily focus on other things: sales, team coordination (who's available for what work when?), administrative work, finance and cash-flow monitoring, etc. This broader team allocation is also a result of consensus. (You can see the theme here.) Ideally we rotate around the "coordinator"-type jobs so everyone stays fresh, hacking-wise, and we don't entrench these informal power structures. We've done some of that but could do more.

I've heard people say that "if you don't have a boss, the customer is your boss", but that's simply not true for us in any of the ways that matter. Our working conditions, pay, holidays, hours -- all of this is up to us. Yes, we have to do good work, but that's already an expectation we have of ourselves as hackers. It's a much healthier perspective to consider your customer to be your partner: both providing value for the other. If this isn't the case, we should find other partners, and happily in our particular industry this is a possibility. (More on cooperatives, capitalism, and crisis in a future post.)

legalities

As I said in the beginning, the important thing is that a group share principles, and agree periodically on the strategy to implement them. In that light, the particular legal structure of Igalia is an afterthought, though an important one.

Although Spanish law explicitly provides for cooperatives as a kind of legal entity, Igalia is organized as a limited-liability corporation. The reasons are not entirely clear to me, and periodically come up for debate. One of the issues, though, is that in a cooperative, 85% of your partners have to be Spanish residents, and we did not want that restriction.

Spanish workers are employees of Igalia, and folks outside of Spain are freelancers. However once you've been in the assembly for an amount of time, currently 2 years (way too long IMO), you have the option to become a legal partner in the business, purchasing an equal share of the business at a fixed price. I say "option" but it's also an expectation; the idea is that being a partner is a logical and natural outcome of working at/with/on/in Igalia. We have partners in a number of countries now.

You see my confusion with prepositions, and it's because you have to fit all of these ideas in your head at the same time: working for Igalia as an employee, on it as a project, with it as a partner, at it as a social experiment, etc.

Partners are the legal owners of Igalia, and there are about 30 of them now. They meet every few months, mostly to assess the progression of "prepartners" (those in the assembly but not yet partners, like myself). Ideally they don't discuss other things, and I think that works out in practice. There is a small power differential there between the partners and the assembly. However all the really important things get done in the assembly.

Finally, since Igalia is an S.L. (like an LLC), there is a legal administrator as well -- someone who's theoretically responsible for the whole thing and has to sign certain legal papers. In fact we have three of them, and the positions rotate around every three years. If we were a legal cooperative we could remove this need, which would be convenient. But that's how it is right now.

domination

I want a society without hierarchical power: no state, no military, no boss. But that would be anarchy, right? Well yes, of course that's what it would be! "Anarchy" doesn't equate to a lack of structure, though. It's true that Igalia is embedded in capitalism, but I think it and other cooperatives are a kind of practical anarchy, or at least a step along the path.

I'll close this epistle with a quote, in Chomsky's halting but endearing style. The whole interview is well worth a read.

Anarchism is quite different from that. It calls for an elimination to tyranny, all kinds of tyranny. Including the kind of tyranny that's internal to private power concentrations. So why should we prefer it? Well I think because freedom is better than subordination. It's better to be free than to be a slave. Its' better to be able to make your own decisions than to have someone else make decisions and force you to observe them. I mean, I don't think you really need an argument for that. It seems like ... transparent.

The thing you need an argument for, and should give an argument for, is, How can we best proceed in that direction? And there are lots of ways within the current society. One way, incidentally, is through use of the state, to the extent that it is democratically controlled. I mean in the long run, anarchists would like to see the state eliminated. But it exists, alongside of private power, and the state is, at least to a certain extent, under public influence and control -- could be much more so. And it provides devices to constrain the much more dangerous forces of private power. Rules for safety and health in the workplace for example. Or insuring that people have decent health care, let's say. Many other things like that. They're not going to come about through private power. Quite the contrary. But they can come about through the use of the state system under limited democratic control ... to carry forward reformist measures. I think those are fine things to do. they should be looking forward to something much more, much beyond, -- namely actual, much larger-scale democratization. And that's possible to not only think about, but to work on. So one of the leading anarchist thinkers, Bakunin in the 19th cent, pointed out that it's quite possible to build the institutions of a future society within the present one. And he was thinking about far more autocratic societies than ours. And that's being done. So for example, worker- and community- controlled enterprises are germs of a future society within the present one. And those not only can be developed, but are being developed.

The Kind of Anarchism I Believe In, Noam Chomsky, 28 May 2013

by Andy Wingo at June 13, 2013 07:55 AM

June 11, 2013

Andy Wingo

ecmascript generators from a performance perspective

It's been really gratifying to see the recent interest in ES6 generators in V8. I'm still a Schemer at heart, but one thing all language communities should learn from JS is its enthusiasm and sense of joyful play. One can't spend much time around folks like James Long or Domenic Denicola without feeling the urge to go out and build something fun.

perfie perf perf perf

But this is a blog about nargery! A lot of people have been speculating about the performance impact of using generators, and I've been meaning to write about it, so perhaps this article will tickle your neuron.

A generator is a function that can suspend. So the first perf question to ask yourself is, does the performance of generators matter at all? If your generator spends more time suspended than active, then probably not. Though it's not the only application of generators, if you are using generators for asynchronous programming, then probably generator performance matters not at all.

But that's not very satisfying, right? Even though it probably doesn't matter, maybe it does, and then you need to make sure your mental model of what is fast and what is slow corresponds to reality.

crankshaft

So, the skinny: as implemented in V8, the primary perf impact of generators is that they do not get optimized by Crankshaft.

As you probably know, Crankshaft is V8's optimizing compiler. There are two ways a function can be optimized: on entry, because the function is called many times, or within a loop, because the loop is taking too long. In the first case it's fairly easy: you kick off a parallel task to recompile the function, reading the type feedback collected by the inline caches from the baseline compiler. Once you've made the optimized function, you swap the function's code pointer, and the next time you call the function you get the optimized version instead of the generic version.

The other way is if you are in a function, in a hot loop, and you decide to optimize the function while you are in it. This is called on-stack replacement (OSR), and is important for functions that are only called once but do have hot loops. (Cough KRAKEN cough.) It's trickier to do parallel compilation with OSR, as the function might have left the loop by the time recompilation finishes, but oh well.

So in summary, when V8 goes to optimize a function it will be in one of two states: inactive, because it hasn't started to run yet, or active, because it's stuck in a tight loop.

Generators introduce a new state for functions: suspended. You can know whether a function is active or inactive, but you can't know how many suspended generator activations are out in the JS heap. So you have multiple potential entry points to the function, which complicates the optimizer as it has to work on graphs with irreducible loops. Perhaps it's still possible to optimize, as OSR is a form of irreducible loops; I suspect though that you would end up compiling as many optimized functions as there are yield points, plus one for the function entry (and possibly another for OSR). Anyway it's a bit complicated.

Also, deoptimization (bailout) becomes more difficult if you can have suspended activations, and as a technical detail, it's tricky for V8 to efficiently capture the state of an activation in a way that the garbage collector can understand.

The upshot is that tight loops in generators won't run fast. If you need speed in a loop in a generator, you will need to call out to some other function (which itself would get optimized).

baseline

On the other hand, I should note that generators are optimized for quick suspend and resume. In a generator function, V8 stores local variables on the heap instead of on the stack. You would think this would make things slow, but that is not the case. Access to locals has the same speed characteristics: both stack and heap locals are accessed by index from a register (the stack pointer or the context pointer).

There is an allocation overhead when you enter a generator to create the heap storage, but it is almost never the case that you need to copy or allocate when suspending a generator. Since the locals are on the heap already, there is nothing to do! You just write the current instruction pointer (something you know at compile-time) into a slot in the generator object and return the result.

Similarly, when resuming you have to push on a few words to rebuild the stack frame, but after that's done you just jump back to where you were and all is good.

The exception to this case is when you yield and there are temporaries on the stack. Recall in my article on V8's baseline compiler that the full-codegen is a stack machine. It allocates slots to named locals, but temporary values go on the stack at run-time, and unfortunately V8's baseline compiler is too naive even to know what the stack height is at compile-time. So each suspension checks the stack height, and if it is nonzero, it calls out to a runtime helper to store the operands.

This run-time save would be avoidable in register machines like JavaScriptCore's baseline compiler, which avoids pushing and popping on the stack. Perhaps this note might spur the competitive urges of some fine JSC hacker, to show up V8's generators performance. I would accept JSC having faster generators if that's what it took to get generators into WebKit, at least for a while anyway :)

abstract musings

Many of you have read about the Futamura projections, in which one compiles a program via partial application of an interpreter with respect to a given input program. Though in the real world this does not appear to be a good strategy for implementing compilers, as so much of compilation is not simply about program structure but also about representation, there is a kernel of deep truth in this observation: the essence of compilation is partial evaluation of a strategy with respect to a particular program.

This observation most often helps me in the form of its converse. Whenever I find myself implementing some sort of generic evaluation strategy, there's a sign in my head that start blinking the word "interpreter". (Kinda like the "on-air" signs in studios.) That means I'm doing something that's probably not as efficient as it could be. I find myself wondering how I can make the strategy specific to a particular program.

In this case, we can think of suspending and resuming not as copying a stack frame out and in, but instead compiling specific stubs to copy out only the values that are needed, in the representations that we know that they have. Instead of looping from the bottom to the top of a frame, you emit code for each element. In this way you get a specific program, and if you manage to get here, you have a program that can be inlined too: the polymorphic "next" reduces to a monomorphic call of a particular continuation.

Anyway, Amdahl's law limits the juice we can squeeze from this orange. For a tight for-of loop with a couple of optimizations that should land soon in V8, I see generator performance that is only 2 times slower that the corresponding hand-written iterators that use stateful closures or objects: 25 nanoseconds per suspend/resume versus 12 for a hand-written iterator.

25 nanoseconds, people. You're using jQuery and asking for data from a server in the next country. It is highly unlikely that generators will be a bottleneck in your program :)

by Andy Wingo at June 11, 2013 01:48 PM

June 06, 2013

Iago Toral

On WebKit, Layers and Accelerated Compositing

This post is a brief and quick introduction to Accelerated Compositing and the role this plays in WebKit.

The point of accelerated compositing in WebKit is to take advantage of the GPU to accelerate the rendering of web content. To understand how this works one needs to know how a web page is actually rendered by WebKit.

Rendering in WebKit

WebKit groups DOM elements in layers that are rendered separately and then composited together to create the final page. This enables proper handling of transparent objects and overlapping contents for example. These layers, just like the DOM, conform a tree.

WebKit defines some rules to decide when a new layer is needed and which DOM elements should be included in it. For example, if you have a WebGL canvas or a video element, these will go to separate layers, if you use explicit CSS position properties on an object you get another layer for it, etc.

When it is time to paint the web page on the browser, the work consists of traversing the layer tree and composite the layers together to create the final view of the web page. For example, if you have a transparent layer sitting on top of some other layer in the page, the composition will take the two individual layers that have been rendered independently and blend them together in the final page, in the right order, at the right position and with the right
transparency.

Accelerated Compositing

Before we had accelerated compositing, the compositing process happend in software, that is, it was the CPU that would do all the layer compositing work, which is expensive and can hog the CPU, making for a worse user experience. Accelerated compositing however, involves offloading the compositing of the layers to the GPU hardware. It turns out that GPUs can do the compositing very fast and doing so would also free the CPU, delivering a better, more responsive, user experience too.

Conclusions

Accelerated compositing is all about making a better use of the available graphics hardware, offloading the work required to composite the final view of the webpage from the various layers it contains, which results in a better user experience and better overall rendering performance.

by itoral at June 06, 2013 02:36 PM

June 05, 2013

Andy Wingo

no master

It's difficult for anyone with an open heart and a fresh mind to look at the world without shuddering at its injustices: police brutality, obscene gaps between rich and poor, war and bombs and land mines, people without houses and houses without people. So much wrong! It is right and natural not only to feel revulsion, but to revolt as well. Fight the man! Down with the state! No god, no master!

So to those working against foreclosures and evictions of families in your neighborhoods, more power to you. Fight the good fight.

However, revolt is unfortunately not sufficient. We must also think beyond the now and imagine a better world: life after the revolution.

I'm sure I've lost some of you now, and that's OK. We are all at different points of political consciousness, and you can't speak to all at the same time. Some of the words like "revolution" probably bother some people. It sounds strident, right? But if you agree (and perhaps you do not) that there are fundamental problems with the way the world works, and that symptoms like banks kicking families out of houses using riot police are ultimately caused by those problems, then any real change must also be fundamental: at the foundation, where the roots are. Radical problems need radical solutions. Revolution is no more (and no less!) than a radically different world.

For me, one of these roots to dig up is hierarchy. I take as a principle that people should have power over and responsibility for decisions to the extent that they are affected, and hierarchy is all about some people having more power than others. Even if you do away with capitalism, as in the early Soviet Union, if you don't directly address the issue of hierarchy, classes will emerge again. A world in which some people are coordinators and other people are workers will result in a coordinator class that gradually accretes more and more power. Hierarchy is in essence tyrannical, though any specific instance may be more or less so.

On the other side of things, hierarchy deadens those at the bottom: the listless students, the mechanical workers, the passive and cynical voters. The revolution has to come about in a way that leaves people more alive and curious and energetic, and that seems to correspond not only with greater freedom but also with personal responsibility. I think that a world without hierarchy would be a world whose people would be more self-actualized and self-aware, as they would have more responsibility over the situations that affect their lives and those around them.

Well. I don't want to wax too theoretical here, in what I meant to be an introduction. I've been having these thoughts for a while and finally a couple years ago I decided to join in a collective experiment: to practice my chosen craft of computer programming within a cooperative context. I haven't had a boss since then, and I haven't been the boss of anyone. I'm not even my own boss. There are no masters!

I've gotten so used to this way of working that sometimes I forget how unusual it is to work in a cooperative. Also, we're usually too busy hacking to write about it ;) So in the next few articles I'll take a look at the internal structure of our company, Igalia, and try to give some insight into what it's like working in a cooperative.

by Andy Wingo at June 05, 2013 01:35 PM

May 26, 2013

Jacobo Aragunde

Playing with HTML5

In the last weeks I have been playing with HTML5 in a very literal way, merging two of my interests: web development and video games. The result is a small arcade game which makes use of several features from the new web standard.

You can take a look at the code on GitHub at any time and even play it online here. Let me comment on some of the implementation details below:

The game loop

The game loop is implemented using the function setInterval() from the standard JS libraries. In particular, a callback is invoked 30 times per second: it checks the state of the game entities, updates them and then redraws the scene. This was the best option given that I’m not using any development framework.

Canvas

The game area is contained in a canvas. The canvas has been around long before HTML5 took shape, but it has received quite a lot of love recently in the form of acceleration. The game code is not specially designed for performance (after all, it was mostly developed in a week), and the scene is being completely redrawn 30 times per second. Nonetheless, I could try it in a first-generation Atom laptop with no noticeable lag.

Bitmaps

The canvas element has primitive operations to draw vector graphics (lines, bezier curves, text…) but 99% of the scene has been implemented using bitmaps, piled in several layers. This way of working is really similar to using a low-level graphic abstraction like SDL in the end.

What’s more, the bitmaps are sent to the browser packed in sheets that contain several sprites. It is a common technique in web development to save bandwidth and to overcome the limit in the number of simultaneous requests that browsers can send, but its use in video games dates from longer ago, when programmers had to deal with memory limitations.

Input

The game is controlled with the keyboard, and I use callbacks for standard key events (keyupkeydown) to check its state. These events act asynchronously and independently from the game loop, so the state of the keys is stored in an object which is checked in the update entities phase of the loop.

It also has to be taken into account that the key event callbacks are set on the canvas object, as a consequence events are lost if the focus moves out of the canvas. I have found some weird interactions when the focus changes while keys are pressed, so this part of the code definitely needs some work yet to become more robust.

Sound

I’ve used the web audio API to be able to manipulate sound with a high level of detail. Though I haven’t used most of its capabilities by far, it was worth connecting all the wires to make it work.

This API is designed as an audio flow going through several nodes, similarly to other popular multimedia frameworks. Each node has the ability to process the input in some way before sending it to the next node. I used only one intermediate node, the GainNode, to be able to mute the audio output at will, but it could have been used to implement an internal volume control.

Wishlist

A project is never complete with a list of things one would like to add, complete or polish. I’ve already mentioned some details worth revisiting, but there are also some features that I would have liked to use; examples are local storage to implement high score boards or network code to implement some sort of interaction between players. But in the end of the day, this has been a worthwhile exercise; I’m satisfied with the result and  I might still improve it in the future!

by Jacobo Aragunde Pérez at May 26, 2013 11:38 AM

May 24, 2013

Juan A. Suárez

Grilo plugins 0.2.8 released

I did a new release of grilo plugins only one week after the previous release because it includes a patch that people from gnome-photos would like to see in next GNOME 3.9 pre-release.

Besides that patch, this new release also includes a new plugin to get content from the Magnatune service. Credits go to Victor Toso, who did a great job on it.

Happy weekend from Igalia headquarter!

 

by jasuarez at May 24, 2013 10:55 PM

May 18, 2013

Juan A. Suárez

What’s going on in Grilo?

There’s a lot of time I don’t blog about Grilo. But it doesn’t mean we are not working on it! Here at Igalia we do, and we also get lot of contributions from community. All the announcements are sent to the Grilo mailing list.

During this week we have released a new version of Grilo, both for core (v0.2.6) and for the plugins (v0.2.7). You can see the announcements here and here. If you are interested in more detailed changelogs, you can see them here and here.

So what happens in this release?

  • As usual, lot of bugfixes.
  • Bastien added support for non-file URIs in Filesystem plugin. What this means? In Filesystem plugin you can configure the base path from which the source can show content. Thus, if you setup the base path as ~/Music, it means Filesystem will show only content placed in that directory. With the new approach, you could use as base-path something like recent://, getting the list of recently used items. Cool, uh?
  • My mate Sergio fixed the cache system in GrlNet. Seems we broke it at some point in the history, and we didn’t realize. We rely on libsoup and its caching feature, and Sergio is very skilled in libsoup. So who’s better than him to fix it? :D
  • We improved grl-inspect tool. Inspired in gst-inspect from GStreamer, this tool helps to list all the available sources in the system, and list all the supported features. Now, it can also list all the available metadata keys developer can use. Moreover, it also list which sources support each key.
  • We added support for i18n! Now we speak Brazilian, Czech, Galician, Greek, Polish, Serbian, Slovenian, Spanish and Tajik. And we hope to support more languages in the coming releases. Many thanks to Rafael, Marek, Fran, Dimitris, PiotrМирослав, Martin, Miguel and Victor for their work on translating Grilo.
  • Marek Chalupa added support for GOA in the Flickr plugin. This means that if you have a Flickr account configured in GOA, you will get for free a Flickr source dealing with that account. Nice!

I think that’s all. If you use Grilo and needs help or support, don’t hesitate to contact us!

by jasuarez at May 18, 2013 11:49 AM

May 15, 2013

Claudio Saavedra

Wed 2013/May/15

May 15, 2013 08:53 AM

May 14, 2013

Iago Toral

WebKitGTK+ / Wayland Demo and Future Work

So first things first, check out the video below to see the demo, it showcases Web (Epiphany), the default browser of the GNOME platform, running on WebKit1 under Wayland (Weston) and illustrates:

  • Browsing of regular text/image based sites
  • Embedded HTML5 video playback (Youtube and native)
  • 2D and 3D CSS transforms
  • WebGL

If you are intrigued to know more about WebKit, Wayland and Web, keep reading. If you already know about this and are more interested in knowing what is still missing, go here.

So WebKit you say, what is that and why should I care?

There is a significant chance that when you are reading this you are using WebKit. WebKit is an open source web browser engine that powers a variety of software that we run on our desktops, phones and a myriad of other gadgets every day and with the popularity that HTML5 has achieved this will only get bigger and bigger in the future.

Igalia has always considered the web the platform of the future and with the passing of the years we have seen this become more and more real. In this process we are certain that WebKit has played a very significant role, providing an open framework where many companies and individuals have worked together for years, building something that is at the core of most HTML5 based solutions out there. These companies and individuals have done a remarkable work in pushing the technology forward and make all this possible, and at Igalia we are very proud to be a part of this too.

Ok, so WebKit is important… but what does that have to do with Wayland?

WebKit and HTML5 are not the only big things happening out there. Wayland will progressively replace X in the coming years, something outstanding when you consider the fact that X has been there for 3 decades! This replacement is not trivial though, X has established the basis for graphical user interfaces since… well since before I even knew computers existed! Much of the software we use every day is based on X directly or indirectly, so with the change, a lot of software that we love will need to undergo some changes and adaptations to work with Wayland.

Platforms like GNOME and KDE have plans to support Wayland. GNOME in particular plans to have complete Wayland support some time in 2014, but for that to happen efforts have already started at many levels. GTK+, the graphical toolkit of the GNOME platform, is already compatible with Wayland, which is a big part of what needs to be done to make GNOME itself run on Wayland. Still, there are plenty of X bits in many other parts of the platform that will need to be addressed. WebKitGTK+ is no exception to this.

The work to get WebKitGTK+ ported to Wayland was initiated by my colleague José Dapena and I have recently joined this effort. I started from Jose’s patches, checking the current state of things and trying to identify the missing bits. I hope the video at the top gives a good idea of where we are now and later in this post I will explain what is still missing. For now I have focused mostly on WebKit1, but I believe most of my findings are just as valid for WebKit2. Hopefully this will help us define specific tasks and make steady progress in the months to come.

And what did you do to get that demo exactly?

Since WebKitGTK+ is based on GTK+ and GTK3 already provides a Wayland backend, we need to make sure that we use GTK3 in WebKitGTK+ and Web, which we already do (except for the plugin process which still needs GTK2, more on this later), so thankfully a lot of the heavy lifting had already been taken care of.

The second thing we need to check  is the backing store backend we use. We need to replace the X11 implementation with something more agnostic. Good that we already have a Cairo implementation of the backing store available!

A few other minor fixes aside, these two items represent most of what I needed to get the demo running.

How far are you from full Wayland support then?

1. Accelerated compositing

Current implementation of accelerated compositing depends on XComposite and XDamage. We need to identify how to re-implement this functionality in Wayland (at least the bits that depend on X directly)  while we make sure that we keep the same performance advantage.

2. Fullscreen video playback

Fullscreen video playback with GStreamer is implemented using the video overlay interface which is platform agnostic. However, even if the interface is platform agnostic, it requires GSteamer to provide a platform-specific implementation. There is a Wayland sink in GStreamer, however this sink does not implement the video overlay interface and I still don’t know if this is to be addressed in GStreamer or when. If the waylandsink in GStreamer is extended to implement the video overlay interface then the good news is that the work on the WebKit end should be fairly straight forward.

3. Plugins

Various plugins won’t work in Wayland. The Flash plugin in particular is known to only work with GTK2 (this is the reason the Plugin process in WebKit2 requires GTK2 actually). Since GTK2 does not support Wayland the consequence is that we won’t be seeing the Flash plugin in Wayland for now. We contacted Adobe some time ago on this subject to see if they would be interested in porting their Flash plugin to GTK3, but apparently this is not a priority for them. Even more, this restricts all plugins to use GTK2 and hence, it will pretty much disable all plugins under Wayland (well, at least the ones that need to render). Another example of a plugin needing work is Java, which seems to have direct X dependencies in its code leading to a nice crash when you load a Java applet for example. I did not do a comprehensive analysis of all the plugins, but this is definitely a problematic area in the short term. Of course, this will probably change with time, when Wayland becomes mainstream plugin developers will have to port the plugins or lose user base and be replaced by other implementations. Some plugins may even become completely obsolete before Wayland becomes mainstream too, that could be the case of the Flash plugin maybe. Bottom line: plugins suck and if you are a web developer you probably want to avoid them if you have an alternative.

4. WebKit2

WebKit2 brings a bigger challenge to the Wayland port due to the split process architecture that it introduces. In this context we need to make sure that all the rendering and painting that takes place in coordinated form between the UI process and the Web process use Wayland mechanisms for GPU acceleration efficiently. We still need to design a good solution for this before getting hands-on with fixing it and we will need to approach the Wayland community and discuss how WebKit2′s architecture fits in Wayland as it is today. Wayland has just integrated support for sub-surfaces and this is something that may come in handy in the context of this problem too.

5. Web (Epiphany)

Web is the default browser of the GNOME platform and a good test ground for WebKitGTK+ too. Making WebKitGTK+ Wayland compatible would not be good enough for GNOME if Web isn’t. Fortunately, WebKitGTK+ and GTK3 do most of the work for Web in this regard, although there seem to be some X interactions in Web itself they do not look difficult to replace or find a workaround for. I will use the video above to prove my point ;)

Conclusions

So that’s it, even if there are still some issues that need further work, WebKitGTK+ and Web can run in Wayland with few changes as they are today, including things like WebGL, CSS Transforms or embedded video playback. Next in the roadmap are accelerated compositing and WebKit2. We are planning to make progress in these areas in the coming months, so if you are as excited as we are for full Wayland support in WebKitGTK+ including WebKit2 and Accelerated Compositing, stay tuned!

by itoral at May 14, 2013 03:49 PM

Samuel Iglesias

Introduction to Linux Graphics drivers: DRM

Linux support for graphics cards are very important for desktop and mobile users: they want to run games, composite their applications and have a nice and modern user experience.

AGP Video Card photo

So it’s usual that all the eyes are on this area when you want to optimize your embedded device’s user experience… but how the graphics drivers communicate with user-space applications?

Víctor Jáquez, from Igalia, wrote a very nice introduction to this topic. If you are interest on Linux graphics stack in general, you have this post wrote by Jasper St. Pierre.

Direct Rendering Infraestructure schema

Direct Rendering Infraestructure schema (by Víctor Jáquez)

The Direct Rendering Infrastructure (DRI) in Linux is like is shown at the above picture. Using Xorg as an X server example, we see that it is in charge of X11 drawing commands along with Mesa or Gallium3D as the OpenGL software implementation for rendering three-dimensional graphics.

How they communicate with the actual HW? One possibility is using libdrm as a backend to talk with the Direct Rendering Manager at Linux kernel. DRM is divided in two in-kernel drivers: a generic drm driver and another which has specific support for the video hardware.

There is a Xorg driver running in user-space who receives the data to be passed to the HW. Using Nouveau as an example, the xf86-video-nouveau is the DDX (Device Dependent X) component inside of X stack.

It communicates with the in-kernel driver using ioctl. Some of them are generic for all the drivers and they are defined in the generic drm driver. Inside of drivers/gpu/drm/drm_drv.c file you have the relationship between the ioctl number, its corresponding function and its capabilities.

/** Ioctl table */
static const struct drm_ioctl_desc drm_ioctls[] = {
	DRM_IOCTL_DEF(DRM_IOCTL_VERSION, drm_version, DRM_UNLOCKED),
	DRM_IOCTL_DEF(DRM_IOCTL_GET_UNIQUE, drm_getunique, 0),
	DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, 0),
	DRM_IOCTL_DEF(DRM_IOCTL_IRQ_BUSID, drm_irq_by_busid, DRM_MASTER|DRM_ROOT_ONLY),
	DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, DRM_UNLOCKED),
	DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, DRM_UNLOCKED),
	DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, DRM_UNLOCKED),
	DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_UNLOCKED),
	DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER),
[...]

However, the specific driver for the video hardware can define its own ioctls in a similar way. For example, here you have the corresponding ones for the Nouveau driver (drivers/gpu/drm/nouveau/nouveau_drm.c file).

static struct drm_ioctl_desc
nouveau_ioctls[] = {
	DRM_IOCTL_DEF_DRV(NOUVEAU_GETPARAM, nouveau_abi16_ioctl_getparam, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_SETPARAM, nouveau_abi16_ioctl_setparam, DRM_UNLOCKED|DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
	DRM_IOCTL_DEF_DRV(NOUVEAU_CHANNEL_ALLOC, nouveau_abi16_ioctl_channel_alloc, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_CHANNEL_FREE, nouveau_abi16_ioctl_channel_free, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_GROBJ_ALLOC, nouveau_abi16_ioctl_grobj_alloc, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_NOTIFIEROBJ_ALLOC, nouveau_abi16_ioctl_notifierobj_alloc, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_GPUOBJ_FREE, nouveau_abi16_ioctl_gpuobj_free, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_NEW, nouveau_gem_ioctl_new, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_PUSHBUF, nouveau_gem_ioctl_pushbuf, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_PREP, nouveau_gem_ioctl_cpu_prep, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_FINI, nouveau_gem_ioctl_cpu_fini, DRM_UNLOCKED|DRM_AUTH),
	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_INFO, nouveau_gem_ioctl_info, DRM_UNLOCKED|DRM_AUTH),
};

As you can see, most of these ioctls can be grouped by functionality:

  • Get information from the drm generic driver: stats, version number, capabilities, etc.
  • Get/set parameters: gamma values, planes, color pattern, etc.
  • Buffer’s management: ask for new buffer, destroy, push buffer, etc.
  • Memory management: GEM, setup MMIO, etc.
  • Framebuffer management.
  • In case of Nouveau: channel allocation (for context switching).

Basically, the xorg user-space driver may prepare a buffer of commands to be executed on the GPU and pass it through a ioctl call. Sometimes, it just want to use the framebuffer to draw on the screen directly. Others, it uses KMS capabilities to change parameters of the screen: video mode, resolution, gamma correction, etc.

There are more things that DRM can do inside of the kernel. It can setup the color pattern used to draw pixels on the screen, it can select which encoder is going to be used and on which connector (LVDS, D-Sub, HDMI, etc), pick EDID information from the monitor, manage vblank IRQ…

At Igalia we have a long background working in Linux multimedia stack (GStreamer, WebGL, OpenCL, drivers, etc). If you need help to develop, optimize or just you want advice, please don’t hesitate to contact us.

by Samuel Iglesias at May 14, 2013 07:50 AM

May 08, 2013

Andy Wingo

generators in v8

Hey y'all, ES6 generators have landed in V8! Excellent!

Many of you know what that means already, but for those of you that don't, a little story.

A few months ago I was talking with Andrew Paprocki over at Bloomberg. They use JavaScript in all kinds of ways over there, and as is usually the case in JS, it often involves communicating with remote machines. This happens on the server side and on the client side. JavaScript has some great implementations, but as a language it doesn't make asynchronous communication particularly easy. Sure, you can do anything you want with node.js, but it's pretty easy to get stuck in callback hell waiting for data from the other side.

Of course if you do a lot of JS you learn ways to deal with it. The eponymous Callback Hell site lists some, and lately many people think of Promises as the answer. But it would be nice if sometimes you could write a function and have it just suspend in the middle, wait for your request to the remote computer to come through, then continue on.

So Andrew asked if me if we could somehow make asynchronous programming easier to write and read, maybe by implementing something like C#'s await operator in the V8 JavaScript engine. I told him that the way into V8 was through standards, but that fortunately the upcoming ECMAScript 6 standard was likely to include an equivalent to await: generators.

ES6 generators

Instead of returning a value, when you first call a generator, it returns an iterator:

// notice: function* instead of function
function* values() {
  for (var i = 0; i < arguments.length; i++) {
    yield arguments[i];
  }
}

var o = values(1, 2, 3);  // => [object Generator]

Calling next on the iterator resumes the generator, lets it run until the next yield or return, and then suspends it again, resulting in a value:

o.next(); // => { value: 1, done: false }
o.next(); // => { value: 2, done: false }
o.next(); // => { value: 3, done: false }
o.next(); // => { value: undefined, done: true }

Maybe you're the kind of person that likes imprecise, incomplete, overly abstract analogies. Yes? Well generators are like functions, with their bodies taken to the first derivative. Calling next integrates between two yield points. Chew on that truthy nugget!

asynchrony

Anyway! Supending execution, waiting for something to happen, then picking up where you left off: put these together and you have a nice facility for asynchronous programming. And happily, it works really well with promises, a tool for asynchrony that lots of JS programmers are using these days.

Q is a popular promises library for JS. There are some 250+ packages that depend on it in NPM, node's package manager. So cool, let's take their example of the "Pyramid of Doom" from the github page:

step1(function (value1) {
    step2(value1, function(value2) {
        step3(value2, function(value3) {
            step4(value3, function(value4) {
                // Do something with value4
            });
        });
    });
});

The promises solution does at least fix the pyramid problem:

Q.fcall(step1)
.then(step2)
.then(step3)
.then(step4)
.then(function (value4) {
    // Do something with value4
}, function (error) {
    // Handle any error from step1 through step4
})
.done();

But to my ignorant eye, some kind of solution involving a straight-line function would be even better. Remember what generators do: they suspend computation, wait for someone to pass back in a result, and then continue on. So whenever you would register a callback, whenever you would chain a then onto your promise, instead you suspend computation by yielding.

Q.async(function* () {
  try {
    var value1 = yield step1();
    var value2 = yield step2(value1);
    var value3 = yield step3(value2);
    var value4 = yield step4(value3);
    // Do something with value4
  } catch (e) {
    // Handle any error from step1 through step4
  }
});

And for a super-mega-bonus, we actually get to use try and catch to handle exceptions, just as Gods and Brendan intended.

Now I know you're a keen reader and there are two things that you probably noticed here. One is, where are the promises, and where are the callbacks? Who's making these promises anyway? Well you probably saw already in the second example that using promises well means that you have functions that return promises instead of functions that take callbacks. The form of functions like step1 and such are different when you use promises. So in a way the comparison between the pyramid and the promise isn't quite fair because the functions aren't quite the same. But we're all reasonable people so we'll deal with it.

Note that it's not actually necessary that any of the stepN functions return promises. The promises library will lift a value to a promise if needed.

The second and bigger question would be, how does the generator resume? Of course you've already seen the answer: the whole generator function is decorated by Q.async, which takes care of resuming the generator when the yielded promises are fulfilled.

You don't have to use generators of course, and using them with Q does mean you have to understand more things: not only standard JavaScript, but newfangled features, and promises to boot. Well, if it's not your thing, that's cool. But it seems to me that the widespread appreciation of await in C# bodes well for generators in JS.

ES6, unevenly distributed

Q.async has been part of Q for a while now, because Firefox has been shipping generators for some time.

Note however that the current ES6 draft specification for generators is slightly different from what Firefox ships: calling next on the iterator returns an object with value and done properties, whereas the SpiderMonkey JS engine used by Firefox uses an exception to indicate that the iterator is finished.

This change was the result of some discussions at the TC39 meeting in March, and hasn't made it into a draft specification or to the harmony:generators page yet. All could change, but there seems to be a consensus.

I made a patch to Q to allow it to work both with the old SpiderMonkey-style generators as well as with the new ES6 style, and something like it should go in soon.

give it to me already!

So yeah, generators in V8! I've been working closely with V8 hackers Michael Starzinger and Andreas Rossberg on the design and implementation of generators in V8, and I'm happy to say that it's all upstream now, modulo yield* which should go in soon. Along with other future ES6 features, it's all behind a runtime flag. Pass --harmony or --harmony-generators to your V8 to enable it.

Barring unforeseen issues, this will probably see the light in Chromium 29, corresponding to V8 3.19. For now though this hack is so hot out of the fire that it hasn't even had time to cool down and make it to the Chrome Canary channel yet. Perhaps within a few weeks; whenever the V8 dependency in Chrome gets updated to the 3.19 tree.

As far as Node goes, they usually track the latest stable V8 release, and so for them it will probably also be a few weeks before it goes in. You'll still have to run Node with the --harmony flag. However if you want a sneak preview of the future, I uploaded a branch of Node with V8 3.19 that you can build from source. It might mulch your cat, but life is not without risk on the bleeding_edge.

Finally for Q, as I said ES6 compatibility should come soon; track the progress or check out your own copy here.

final notes

Thanks to the V8 team for their support in this endeavor, especially to Michael Starzinger for enduring my constant questions. There's lots of interesting things in the V8 pipeline coming out of Munich. Thanks also to Andrew Paprocki and the Bloomberg crew for giving me the opportunity to fill out this missing piece of real-world ES6. The Bloomberg folks really get the web.

This has been a hack from Igalia, your friendly neighborhood browser experts. Have fun with generators, and happy hacking!

by Andy Wingo at May 08, 2013 07:31 PM

May 05, 2013

Eduardo Lima Mitev

Introducing gocl, a gobject wrapper to OpenCL

For the past few months I have been working on this project to bring OpenCL closer to GNOME technologies, and today I’m glad to make the first public announcement. For the uninformed reader, OpenCL is a framework and language for writing programs that execute across heterogeneous HW pieces like CPUs, GPUs, DSPs, etc. While not applicable to any piece of software, OpenCL can unleash unparalleled performance and power efficiency on specific heavy algorithms like media decoding, cryptography, computer vision, big data indexing and processing, physics simulation, graphics, image compositing, among others.

Gocl is a GLib/GObject based library that aims at simplifying the use of OpenCL in GNOME software. It is intended to be a lightweight wrapper that adapts OpenCL programming patterns and boilerplate, and expose a simpler API that is known and comfortable to GNOME developers. Examples of such adaptations are the integration with GLib’s main loop, exposing non-blocking APIs, GError based error reporting and full gobject-introspection support. It will also be including convenient API to simplify code for the most common use patterns.

Gocl started as part of the work and research we do at Igalia on HW acceleration, that I decided to take a bit of, clean it up and release it in a way that can be useful to others. OpenCL is gaining relevance and popularity since the number of implementations and supported chips have grown significantly in recent years. Soon we are going to see OpenCL running anywhere and GNOME technologies should be ready to take advantage of it.

Full gtk-doc documentation is available, and source code is hosted at my GitHub account.

The API is very simple and limited at this stage, and should be considered very unstable. Although I’m not currently working on it full time, I do have kind of a roadmap for the API and features that I will prioritize:

  • Completing the missing asynchronous API
  • Adding API to query available OpencL extensions
  • Provide API to expose cl_khr_gl_sharing extension, for object sharing with OpenGL

You are welcome to suggest/request features that you would like to see in Gocl, as well as propose changes on the API. The GitHub issue tracking at project’s page is available for that, and also to report bugs.

So, do you know of a specific piece of software in GNOME that could potentially benefit from OpenCL? I would love to hear about it.

At Igalia, as part of our strong commitment to make the Web better and faster, we are already looking into ways of applying OpenCL to WebKit and its related technologies, and I’m personally interested on that line of work.

by elima at May 05, 2013 11:41 PM

Víctor Jáquez

GPhone v0.2

It is almost 5 months since I published the first release of GPhone. Though I haven’t abandoned it, I swork on it slowly but steadly.  And to prove it, I’m releasing a new version, the v0.2.

What does it include? Mainly bug fixes. The new features are still a work in progress. You can browse the next branch in git to glance it.

Change Log:

  • Able to use GSimpleAsyncResult for glib <= 2.35
  • Added Vala (0.16) back compatibility, including a gstreamer-1.0.vapi
  • Added a script for build and update a jhbuild environment: update-jhbuild-env. It handles the gphone’s dependencies.
  • Notify to the user if his NAT type is not compatible with a VoIP session.
  • Renamed accounts to registrars.
  • Handle networking signaling
  • Create a global header file
  • Many bug fixes

 

by vjaquez at May 05, 2013 09:26 PM

April 26, 2013

Claudio Saavedra

Fri 2013/Apr/26

  • Oh the WebKits! During the past few weeks, thanks to Igalia's collaboration with the good folks at Bloomberg, I have descended from the heights of Epiphany and WebKitGTK+ to the depths of WebCore, that obscure but cleverly assembled part of WebKit that magnificently takes care of the logic inherent to layouting, rendering, and the inner representation of HTML documents. A fascinating aspect of WebCore is that its architecture, completely decoupled from the actual implementation in the different WebKit ports, means that any change to its parts will affect all ports and browsers built upon this marvelous piece of engineering. Let me assure you, dear reader, the challenges this implies are comparable only to the joy it brings to this humble hacker, as the following will reveal!

    Among the many duties of WebCore lies controlling the logic behind user interaction with HTML documents — something that has changed considerably in recent years. While originally, most interactive editing in the web was limited to plain and boring web forms, in this brave new world of ours it is also possible to build complete HTML editors using nothing but HTML and JavaScript access to the DOM. Have you seen Wordpress' fantastic editor? Then you shall agree with me that this is an extremely powerful feature.

    But with great power comes great responsibility, as the old saying goes. And with great responsibility come bugs, says a more recent variation of the same maxim. And where bugs are to be found, relentless minds work tirelessly in order to ensure that your browsing experience never ceases to improve. This is one of the goals that Igalia, humbly but boldly, pursues with utmost seriousness. And so it has been that I, your humble servant, have spent countless hours mastering my way through the DOM and editing features of WebCore. Bugs have been fixed already — some affecting editing in Windows, others affecting editing in GNU/Linux, and others affecting all platforms equally. More will be fixed in the forthcoming weeks. I can only attempt to share my excitement through these words, for I am unable to express it in a way that would do it justice.

  • As a side note, I am a committer to the WebKit project for a little while now. This is pretty cool, as it means I get a direct chance to break your browser. Or unbreak it, shall it be the case. I try to lean towards the latter but trust me, it is not an easy task!

April 26, 2013 02:40 PM

April 22, 2013

Jacobo Aragunde

KiCad nas XII Xornadas Libres

De novo visito a  Facultade de Informática da Coruña para participar na seguinte edición das Xornadas Libres que organiza o GPUL. Para esta ocasión preparo un pequeno taller sobre KiCad coa axuda do meu compañeiro Samuel. Trataremos de mostrar un ciclo completo do deseño dun proxecto hardware usando esta ferramenta.

Así que este xoves 25 ás 18:00 haberá KiCad na Facultade, pero non olvidedes botarlle un ollo ao resto de actividades, que abranguen diversos aspectos da cultura libre; sen ir máis lonxe, o meu compañeiro Chema tamén estará alí para convencer aos desenvolvedores noveis da importancia de licenciar o seu software.

Vémonos alí!

by Jacobo Aragunde Pérez at April 22, 2013 03:18 PM

April 18, 2013

Andy Wingo

inside full-codegen, v8's baseline compiler

Greetings to all. This is another nargish article on the internals of the V8 JavaScript engine. If that's your thing, read on. Otherwise, here's an interesting interview with David Harvey. See you laters!

full-codegen

Today's topic is V8's baseline compiler, called "full-codegen" in the source. To recall, V8 has two compilers: one that does a quick-and-dirty job (full-codegen), and one that focuses on hot spots (Crankshaft).

When you work with V8, most of the attention goes to Crankshaft, and rightly so: Crankshaft is how V8 makes code go fast. But everything goes through full-codegen first, so it's still useful to know how full-codegen works, and necessary if you go to hack on V8 itself.

The goal of this article is to recall the full-codegen to mind, so as to refresh our internal "efficiency model" of how V8 runs your code (cf. Norvig's PAIP retrospective, #20), and to place it in a context of other engines and possible models.

stack machine

All JavaScript code executed by V8 goes through full-codegen, so full-codegen has to be very fast. This means that it doesn't have time to do very much optimization. Even if it did have time to do optimization (which it doesn't), full-codegen lacks the means to do so. Since the code hasn't run yet when full-codegen compiles it, it doesn't know anything about types or shapes of objects. And as in other JS engines, full-codegen only compiles a function at a time, so it can't see very far. (This lets it compile functions lazily, the first time they are executed. Overall syntactic validity is ensured by a quick "pre-parse" over the source.)

I have to admit my surprise however at seeing again how simple full-codegen actually is: it's a stack machine! You don't get much simpler than that. It's shocking in some ways that this is the technology that kicked off the JS performance wars, back in 2008. The article I wrote a couple years ago does mention this, but since then I have spent a lot of time in JSC and forgot how simple things could be.

Full-codegen does have a couple of standard improvements over the basic stack machine model, as I mentioned in the "two compilers" article. One is that the compiler threads a "context" through the compilation process, so that if (say) x++ is processed in "effect" context, it doesn't need to push the initial value of x on the stack as the result value. The other is that if the top-of-stack (ToS) value is cached in a register, so that even if we did need the result of x++, we could get it with a simple mov instruction. Besides these "effect" and "accumulator" (ToS) contexts, there are also contexts for pushing values on the stack, and for processing a value for "control" (as the test of a conditional branch).

and that's it!

That's the whole thing!

Still here?

What about type recording and all that fancy stuff, you say? Well, most of the fancy stuff is in crankshaft. Type recording happens in the inline caches that aren't really part of full-codegen proper.

Full-codegen does keep some counters on loops and function calls and such that it uses to determine when to tier up to Crankshaft, but that's not very interesting. (It is interesting to note that V8 used to determine what to optimizing using a statistical profiler, and that is no longer the case. It seems statistical profiling would be OK in the long run, but as far as I understand it didn't do a great job with really short-running code, and made it difficult to reproduce bugs.)

I think -- and here I'm just giving my impression, but fortunately I have readers that will call my bullshit -- I think that if you broke down basic JavaScript and ignored optimizable hot spots, an engine mostly does variable accesses, property accesses, and allocations. Those are the fundamental things to get right. In a browser you also have DOM operations, which Blink will speed up at some point; and there are some domain-specific things like string operations and regular expressions of course. But ignoring that, there are just variables, properties, and allocations.

We'll start with the last bit. Allocation is a function of your choice of value representation, your use of "hidden classes" ("maps" in V8; all engines do the same thing now), and otherwise it's more a property of the runtime than of the compiler. (Crankshaft can lower allocation cost by using unboxed values and inlining allocations, but that's Crankshaft, not full-codegen.)

Property access is mostly handled by inline caches, which also handle method dispatch and relate to hidden classes.

The only thing left for full-codegen to do is to handle variable accesses: to determine where to store local variables, and how to get at them. In this case, again there's not much magic: either variables are really local and don't need to be looked up by name at runtime, in which case they can be allocated on the stack, or they persist for longer or in more complicated scopes, in which case they go on the heap. Of course if a variable is never referenced, it doesn't need to be allocated at all.

So in summary, full-codegen is quick, and it's dirty. It does its job with the minimum possible effort and gets out of the way, giving Crankshaft a chance to focus on the hot spots.

comparison

How does this architecture compare with what other JS engines are doing, you ask?

I have very little experience with SpiderMonkey, but as far as I can tell, with every release they they get closer and closer to V8's model. Earlier this month Kannan Vijayan wrote about SpiderMonkey's new baseline compiler, which looks to be exactly comparable to full-codegen. It's a stack machine that emits native code. It looks a little different because it operates on bytecode and not the AST, as SpiderMonkey still has an interpreter hanging around.

JavaScriptCore, the WebKit JavaScript engine, is similar on the surface: it also has a baseline compiler and an optimizing compiler. (In fact it's getting another optimizing compiler soon, but that's a story for another article). But if you look deeper, I think JSC has diverged from V8's model in interesting ways.

Besides being a register machine, a model that has inherent advantages over a stack machine, JavaScriptCore also has a low-level interpreter that uses the same calling convention as the optimizing compiler. I don't think JSC is moving in V8's direction in this regard; in fact, they could remove their baseline compiler entirely, relying instead on the interpreter and the optimizing compiler. JSC also has some neat tricks related to scoping (lazy tear-off); again, a topic for some future article.

summary

The field is still a bit open as to what is the best approach to use for cold code. It seems that an interpreter could still be a win, because if not, why would SpiderMonkey keep one around, now that they have a baseline compiler? And why would JSC add a new one?

At the same time, it seems to me that one could do more compile-time analysis at little extra cost, to speed up full-codegen by allocating more temporaries into slots; you would push and pop at compile-time to avoid doing it at run-time. It's unclear whether it would be worth it, though, as that analysis would have to run on all code instead of just the hot spots.

Hopefully someone will embark on the experiment; it should just be a simple matter of programming :) Until next time, happy hacking!

by Andy Wingo at April 18, 2013 03:45 PM

April 11, 2013

Carlos García Campos

WebKitGTK+ 2.0.0

After more than two years of development the Igalia WebKit team is proud to announce WebKitGTK+ 2.0.0.

But what’s so special about WebKitGTK+ 2.0?

The WebKit2GTK+ API is now the default one. This means that it’s now considered stable from the API/ABI backwards compatibility point of view, and that the old WebKit1 API is in maintenance mode and kind of deprecated. We will maintain both APIs, but we don’t plan to work on WebKi1 other than fixing bugs.

We encourage everybody to port their existing WebKitGTK+ applications to WebKit2, although we know the WebKit2 GTK+ API is not ready for all applications yet. We will work on adding new API during next release cycle, so let us know if you are missing some API that prevents you from porting your project.

Epiphany, the GNOME Web browser, has been successfully ported to WebKit2 and uses it by default since GNOME 3.8.

What are the benefits of the WebKit2 GTK+ API?

We have talked several times about the advantages of the multi-process architecture of WebKit2, robustness, responsiveness, security, etc. All of the details of the multi-process separation are mostly transparent for the API users, bringing all those benefits for free to any application using WebKit2 GTK+. We have developed the API on top of this multi-process architecture, but also with the experience of several years developing and maintaining the WebKit1 GTK+ API, learning from the mistakes made in the past and keeping the good ideas. As a result, the WebKit2 API is very similar to the WebKi1 in some parts and quite different in others. We started from scratch with the following goals:

  • Simple and easy to use. Instead of porting the WebKit1 API to WebKit2, we decided to add new API on demand. We set some milestones based on porting real applications, adding new API required to port them. This also allowed us to design the API, not only thinking about what we want or need to expose, but also how the applications expect to use the API.
  • Consistency. We have tried hard to be consistent with the names of the functions, signals and properties exposed by the API.
  • Flexibility. When possible, the API allows to use your own implementation of some parts that can be adopted to different platforms. So, you can use your own file chooser, JavaScript dialogs, context menu, print dialog, etc.
  • It works by default. For all those features where a custom implementation can be used, there’s a default implementation in WebKit that just works by default.
  • Unit tests. We have enforced all new patches adding API to WebKit2 GTK+ to include also unit tests, so the whole API is covered by unit tests.

Let’s see the major changes and advantages of this new WebKi2 API.

WebKitWebView is a scrolling widget

For API users this means that WebKitWebView should not be added to a GtkScrolledWindow, the widget is scrollable by itself. Actually this is also the case of the WebKitWebView in WebKit1, but some hacks were introduced to allow the widget to be used inside a GtkScrolledWindow. This caused a lot of headaches due to the synchronization between the internal scrolling and the GTK+ scroll adjustments. So now the main scrollbars are also handled by the WebKitWebView which, among other things, fixed the problem of the double scrollbars in some web sites.

Double scrollbar issue

Embedded HTTP authentication dialog

The default implementation of the HTTP authentication embeds a dialog in the WebView instead of using a real GtkDialog. It’s also integrated with the keyring by default using libsecret.

HTTP authentication dialog

GTK+ 2 plugins (flash)

Plugins also run in a different process that is built with GTK+ 2 to support the most popular plugins like flash that still use GTK+ 2.

MiniBrowser showing a youtube video using flash plugin

Web Inspector

The Web Inspector works automatically in both docked and undocked states without requiring any API call.

Inspector docked

It also has support for remote inspecting.

Remote inspecting

Accelerated compositing

Accelerated compositing is always enabled in WebKit2.

Poster circle

Future plans

During the next release cycle we’ll work on fixing bugs and completing the API, see our RoadMap for further details, but we’ll also explore some other areas not directly related the the API:

  • Multiple web processes support
  • Sandboxing
  • Network Process

by carlos garcia campos at April 11, 2013 05:48 PM

April 10, 2013

Samuel Iglesias

A great year has passed

Exactly one year ago, it was my first day as Igalia employee. I had arrived from Geneva some days before, I had rented an apartment here in Coruña the day before. I was very nervous and willing to start.

Today, I can say I am more than happy to be working for this company: I started contributing drivers upstream, I became one of the official maintainers of them, I went to LinuxCon Europe, gave talks at OSHWCon Madrid 2012 and 7th White Rabbit workshop, I learned a lot of things… but the most important thing is that I made real friends here.

Regarding my personal life, everything has been getting better last year. Living so close to my parent’s home gives me the possibility to see my relatives and friends more often. I just need to drive 3 hours and I am there!

I don’t know what is going to happen next year, but I am sure it’s going to be awesome :-)

by Samuel Iglesias at April 10, 2013 02:41 PM

April 08, 2013

Víctor Jáquez

GStreamer Hackfest 2013 – Milan

Last week, from 28th to 31th of March, some of us gathered at Milan to hack some bits of the GStreamer internals. For me was a great experience interact with great hackers such as Sebastian Drödge, Wim Taymans, Edward Hervey, Alessandro Decina and many more. We talked about GStreamer and, more particularly, we agreed on new features which I would like to discuss here.

GStreamer Hackers at Milan

GStreamer Hackers at Milan

For sake of completeness, let me say that I have been interested in hardware accelerated multimedia for a while, and just lately I started to wet my feet in VAAPI and VDPAU, and their support in our beloved GStreamer.

GstContext

The first feature that reached upstream is the GstContext. Historically, in 2011, Nicolas Dufresne added GstVideoContext as an interface to a share video context (such as display name, X11 display, VA-API display, etc.) among the pipeline elements and the applications. But now, Sebastian, generalized the interface to a container to stores and shares any kind of contexts between multiple elements and the application.

The first approach, that is still living in gst-plugins-bad, was merely a wrapper to a custom query to set or request a video context. But now, the context sharing is part of the pipeline setup.

An element that needs a shared context must follow these actions:

  1. Check if the element already has a context
  2. Query downstream for the context
  3. Post a message in the bus to see if the application has one to share.
  4. Create the context if there is none, post a message and send an event letting know that the element has the context.

You can see the example of the eglglessink to know how to use this feature.

GstVideoGLTextureUploadMeta

Also in 2011, Nicolas Dufresne, added a helper class to upload a buffer into a surface (OpenGL texture, VA API surface, Wayland surface, etc.). This is quite important since the new video players are scene based, using framework such as Clutter or OpenGL directly, where the video display is composed by various actors, such as the multimedia controls widgets.

But still, this interface didn’t fit well for GStreamer 1.0, until now, where it was introduced in the figure of a buffer’s meta, though this meta is only specific for OpenGL textures. If the buffer provides this new GstVideoGLTextureUploadMeta meta, a new function gst_video_gl_texture_upload_meta_upload() is available to upload that buffer into an OpenGL texture specified by its numeric identifier.

Obviously, in order to use this meta, it should be proposed for allocation by the sink. Again, you can see the case of eglglesink as example.

Caps Features

The caps features are a new data type for specify a specific extension or requirement for the handled media.

From the practical point of view, we can say that caps structures with the same name but with a non-equal set of caps features are not compatible, and, if a pad supports multiple sets of features it has to add multiple equal structures with different feature sets to the caps.

Empty GstCapsFeatures are equivalent with the GstCapsFeatures handled by the common system memory. Other examples would be a specific memory types or the requirement of having a specific meta on the buffer.

Again, we can see the example of the capsfeatures in eglglessink, because now the gst-inspect also shows the caps feature of the pads:

Pad Templates:
  SINK template: 'sink'
    Availability: Always
    Capabilities:
      video/x-raw(memory:EGLImage)
                 format: { RGBA, BGRA, ARGB, ABGR, RGBx,
                           BGRx, xRGB, xBGR, AYUV, Y444,
                           I420, YV12, NV12, NV21, Y42B,
                           Y41B, RGB, BGR, RGB16 }
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]
      video/x-raw(meta:GstVideoGLTextureUploadMeta)
                 format: { RGBA, BGRA, ARGB, ABGR, RGBx,
                           BGRx, xRGB, xBGR, AYUV, Y444,
                           I420, YV12, NV12, NV21, Y42B,
                           Y41B, RGB, BGR, RGB16 }
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]
      video/x-raw
                 format: { RGBA, BGRA, ARGB, ABGR, RGBx,
                           BGRx, xRGB, xBGR, AYUV, Y444,
                           I420, YV12, NV12, NV21, Y42B,
                           Y41B, RGB, BGR, RGB16 }
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]

Parsers meta

This is a feature which has been pulled by Edward Hervey. The idea is that the video codec parsers (H264, MPEG, VC1) attach a meta into the buffer with a defined structure that carries that new information provided by the codified stream.

This is particularly useful by the decoders, which will not have to parse again the buffer in order to extract the information they need to decode the current buffer and the following.

For example, here is the H264 parser meta definition.

VDPAU

Another task pulled by Edward Hervey, for which I feel excited, is the port of VDPAU decoding elements to GStreamer 1.0.

Right now only the MPEG decoder is upstreamed, but MPEG4 and H264 are coming.

As a final note, I want to thank Collabora and Fluendo for sponsoring dinners. A special thank you, as well, for Igalia which covered my travel expenses and attendance to the hackfest.

by vjaquez at April 08, 2013 04:36 PM

April 04, 2013

Andy Wingo

April 01, 2013

Philippe Normand

GStreamer hackfest in Milan

I spent Easter in Milan, attending the second official GStreamer hackfest. It has been a productive event and, as always, a pleasure to hang out with some members of the GStreamer core dev team fame :)

I focused on a few GStreamer bugs that were blocking some other WebKit bugs related to the media player and the WebAudio backend. I also had in mind to port the osxaudio sink to 1.0, a task started one week before the hackfest, but didn't have time to come back to it.

The first task I worked on was about cleaning up some code in uridecodebin related to URI protocols and on-disk buffering. In WebKit we use a custom HTTP(S) source element used by playbin when loading remote media files. The issue is that that element is also exposed to applications using WebKit itself so we made it handle a custom protocol, webkit+http://. But that exposed a bug in urideocdebin where some protocols, including http, are hardcoded in a white list used to enable on-disk buffering. Unfortunately the aforementioned bug is not solved yet even though some good progress was made!

The second task I worked on was first reported by Wim, the WebKit media player wasn't pausing when receiving buffering messages. The fix was quite simple. After that I finally added some basic codec installer support in WebKit. The GStreamer codec installer API is quite simple :)

Lastly I started working again on the WebAudio createMediaSourceElement support for the GStreamer/WebKit backend. The patch has been sleeping in Bugzilla since last Christmas. That feature is interesting for WebAudio applications that need to use audio data coming from media elements to eg, do stuff like equalizers. The current patch in bugzilla gathers the audio buffers in lists and copy them to the AudioBus when needed. Those memcpys could probably be avoided by pre-allocating the memory for the AudioBus and telling GStreamer to write directly there. So, with the help of my friend Alessandro Decina (who was doing some similar stuff on his Firefox branch) I started working on a simple GstAllocator that would be configured in the pipeline by intercepting allocation queries toward the appsinks used to gather the audio buffers. Well, this exposed some bugs on the GStreamer side! The deinterleave element always uses the default allocator when pushing data to its source pads. So I started a patch that performs allocation queries downstream and use the allocator returned, if any. Unfortunately my custom allocator doesn't work yet but I'll cary on work on this, the GstAllocator API is quite an interesting thing to play with :)

All in all it was a great and productive event. Christian wrote a nice report about the hackfest too. I also wanted to thank Collabora and Fluendo for sponsoring dinners. A special thank you as well for Igalia which covered my travel expenses and attendance to the hackfest.

by Philippe Normand at April 01, 2013 08:29 AM

March 26, 2013

Víctor Jáquez

GStreamer Hackfest 2013

Next Thursday I’ll be flying to Milan to attend the 2013 edition of the GStreamer Hackfest. My main interests are hardware codecs and GL integration, particularly VA API integrated with GL-based sinks.

Thanks Igalia for sponsoring my trip!

Igalia

by vjaquez at March 26, 2013 09:51 AM

March 25, 2013

Xan López

Web 3.8: the peace dividends release

I’m very happy about this release. Thanks to hard choices that we dared to make in the past we are now breaking new ground and giving GNOME some of the tools it needs to be the premier free software operating system. It’s been a long way since I spent an entire GUADEC porting good old epiphany to this newfangled thing called “WebKit”, and what a ride it has been.

There’s a lot to talk about, so let’s get started.

WebKit2

For the first time in this release, Web uses by default the all new multi-process architecture known as WebKit2. The advantages of such a design are well known by most, but here is a brief list of what this means for us:

  • Stability: we now run all the WebKit code in a different process (WebProcess) than the UI code (UIProcess). Most crashes, by far, happen inside WebKit, so this will make the application a lot more stable. When crashes happen, you’ll get a message explaining how to continue browsing, just a reload away. Note that for 3.8 we run all the web pages in the same WebProcess, just like Safari for OSX did in 5.1, but we plan to move to multiple WebProcesses in the near future.
  • Responsiveness: not only most crashes happen inside WebKit, but most of your CPU (and GPU!) time will generally also be spent there. By moving this processing away we allow the UI to be always ready to react to your input, so tab switching and scrolling are now super smooth. You’ll have to see it to believe it, but trust me when I say the improvement is enormous.
  • Plugins: in WebKit2 plugins run in their own process (PluginProcess). In general this makes the browser more robust against the frequent crashes and hiccups of some plugins, but for GNOME in particular it means something even better. Flash, as our more loyal users know well, has not worked natively in Web since we switched to GTK+3. The Adobe plugin is stuck in GTK+2, and since using both libraries at the same time in one process is forbidden we have been forced to rely on things like nspluginwrapper for a while. Well, not anymore: we compile our PluginProcess with GTK+2 precisely for this reason, so Flash works again out of the box.
  • Security: finally, by isolating all the web content from the application we reduce the possibility of malicious code having access to sensitive information. This will be greatly enhanced in the future with the sandboxing of the WebProcess, but 3.8 lays the first stone for a safer browsing.

Our WebKit2 port and Web backend have been more or less in continuous development for more than 2 years. This is the result of a lot of hard work by all the WebKit contributors, the Igalia WebKit team and the Web developers. Despite being a collective effort, like most free software, I want to thank Carlos García Campos in particular for leading the WebKit2GTK+ effort with so much energy. His hard work and perseverance were instrumental in making this happen, so the usual beverages or thank you notes are more than deserved next time you see him.

All this would be more than enough for a single release, but there is much more to see. Let’s keep going.

Private browsing

As advanced during the last WebKitGTK+ hackfest, Web 3.8 brings a straightforward way of launching a temporary and private browsing session, free from any data that could possibly identify you, that will leave no trace on the hard drive once you are done with it. Just go to the application menu and click “New Incognito Window”, you are all set.

private

Undo close tab

You closed a tab just to realize you actually needed to check one more thing in there. You pressed the wrong key. Selected the wrong menu item. Nevermind, in 3.8 Web will remember the tabs (and windows!) you have recently closed, and bringing them back to live is as easy as pressing Ctrl-Shift-T or selecting “Reopen Closed Tab” in the application menu.

UI polish

You want a shiny new search bar? We have it:

search

A handy new tab button in the main toolbar? Got it:

new-tab

Vastly improved Web App icons and titles? Yep:

facebook

Fancy HTML5 media controls perhaps? You bet:

after

Plus less clutter in the gear menu, improved theming, simplified Preferences dialog, and many more small UI tweaks.

3.10 and beyond

As usual we are already working on the next major release. For 3.10 and beyond you can look forward to:

  • Multiple WebProcesses. This is still under development, but at the very least we think we’ll be able to provide either the current “just one” model or the “one per tab” variant. Ideally also the “one per domain instance” that Chrome uses by default, but this one will be trickier. Stay tuned.
  • Deeper GNOME integration. Hooking into the new Shell search APIs or the Settings privacy controls is high in our TODO list.
  • Overview improvements. Animations? Access to recent history or recently closed tabs?

We think GNOME 3.8 is the best GNOME yet, and Web 3.8 is the best Web yet. We hope you’ll enjoy both, we definitely enjoyed doing our part.

by xan at March 25, 2013 05:05 PM

March 22, 2013

Sergio Villar

Improving the editing code in WebKit

For a while now Igalia and Bloomberg have been collaborating to advance Web technologies. As part of that, I’ve been lately involved on improving some editing capabilities of WebKit (posts to follow soon).

As you probably know, in HTML5 any element can be editable. The feature was introduced some time ago, but was finally standardized by the WHATWG. It’s as easy as adding the attribute contenteditable=true and voilà, the magic unfolds (check it out!!!).

That’s indeed a very powerful feature that allows you to create fast and full-featured rich text editors on the Web. Thing is that, although the contenteditable is standardized, some of the behaviors are not, and thus, each Web engine must provide the one it considers the most correct.

Let me show an example of the above. Imagine that you have the typical HTML bulleted list but inside an editable content element (like a <div>). It’d look like something like this:

  • one
  • two
  • three
  • four

Nothing unusual so far. Now imagine that you select the item “two” and then drag it to the end of “four”. What would be the expected behavior? There is no clear answer to that, you might think that one possible outcome could be:

  • one
  • three
  • fourtwo

which matches Firefox behavior (and WebKit’s as well before I landed this). Another possible result would be this one:

  • one
  • three
  • four
  • two

Both of them seem pretty sensible, the question is whether you consider that you’re dragging just some text, or a text inside a <li> element. It is not an easy decision, so let’s examine some other cases, for example, what happens if we try with two items instead of just one?. In that case (assuming we select “two” and “three” and drop them after “four”)  the result is the following:

  • one
  • four
  • two
  • three

So despite apparently being both outcomes equally correct, it seems that the behavior is a bit inconsistent depending on the number of items you select. I don’t know how this is implemented in Firefox but I’ll try to explain why WebKit was behaving like that.

WebKit is actually pretty smart when dealing with this kind of operations. When doing a drag and drop of a selection, WebKit builds a markup representation (see this file for further details) of the selected contents. In the first case (one item selected) WebKit detects that the selection starts and ends inside the same node (a text element). That’s why the generated markup does not include any <li> tags, and thus, the content is pasted at the end of the “four” item as text.

In the case of multiple item selection, WebKit traverses the DOM tree looking for a common ancestor of the selected nodes in order to build a proper serialized markup representation of the data that is dragged. As you might have deduced, that operation generates a markup that does include all the nodes it finds until the common ancestor is reached (to preserve the structure and appearance). So instead of having a simple two we’d end up with something similar to <li>two</li><li>three</li> for the multiple selection case.

The actual code is a bit more complex than just “get this markup and insert it into the document” because it has to deal with several corner cases and also needs to perform some cleanups after the move operations. Also the generated markup is much more complex (for the single selected item the markup is normally a <span> element which includes styling data among other stuff) but I simplified it for the sake of simplicity.

So after some work (tracked in bug 111556) I came up with a solution that effectively makes WebCore’s editing code to behave the same way independently of the number of selected list items (note that partial selections inside a list item are not affected by this change).

I’d like to thank Ryosuke Niwa for his insightful reviews and Igalia for giving me the chance to work on exciting stuff around WebKit.

by svillar at March 22, 2013 03:07 PM

March 14, 2013

Diego Pino

Kandy: a Google Reader app for your Kindle

For the last couple of weeks I’ve been working on a Google Reader web app for the Kindle. I love my Kindle and I use it for reading wikipedia and blog posts too. Lately I was using it to read Google Reader but the interface, aimed for desktop computers, was hard to use with the Kindle.

So, as a pet project, I started to code a new web app that could access the Unofficial Google Reader API and present the subscriptions and entries in a more pleasant way.

Today Google announced it will discontinue Google Reader on 1st July. Kandy is still very basic but it’s functional. There were things I wanted to polish and I got some other ideas in mind, but with today’s news I don’t feel like investing more time on this app.

You can check Kandy online here: http://kandy.herokuapp.com (better access from your Kindle :) ). The code is available at github: https://github.com/dpino/Kandy

Enjoy it while you can :)

Some of the blogs I read Planet Igalia ^^ Entries read nicely, like in Article mode Sometimes shit happens :(

 

by diego pino at March 14, 2013 09:17 PM

March 04, 2013

Víctor Jáquez

Igalia (three) week in WebKit: Media controls, Notifications, and Clang build

Hi all,

Three weeks have passed since I wrote the last WebKit report, and they did so quick that it scares me. Many great things have happened since then.

Let’s start with my favorite area: multimedia. Phil landed a patch that avoids muting the sound if the audio pitch is preserved. And Calvaris finally landed his great new media controls. Now watching videos in WebKitGTK+ is a pleasure.

Rego keeps his work on adding more tests to WebKitGTK+, ando he also wrote a fine document on how to build Epiphany with WK2 from git/svn.

Claudio, besides his work in the snapshots API that we already commented, retook the implementation of the notifications API for WebKitGTK+. And, while implementing it, he fixed some crashers in WK2′s implementation. He has also given us an early screencast with the status of the notifications implementation: Check it out! (video).

Carlos García Campos, besides working hard on the stable and development releases of WebKitGTK+ library, has also landed a couple of fixes. Meanwhile, Dape removed some dependencies, making the code base more clean.

Žan, untiredly, has been poking all around the WebKit project, keeping the GTK port kicking and healthy: He fixed code; cleaned up scripts, autogenerated code and enhanced utility scripts; he also enabled more tests in the development builds. But his most impressive work in progress is enabling the Clang build of the GTK port.

But there’s more! Žan setup a web page were you can visualize the status of WebKit2 on several ports: http://www.iswk2buildbrokenyet.com/

by vjaquez at March 04, 2013 11:29 AM

February 27, 2013

Jacobo Aragunde

New release: PhpReport 2.1!

Although I’ve recently changed of scene, there’s still some love for one of the eldest tools we have in Igalia, PhpReport. I’ve been gathering some hours here and there in the last months to be able to develop and ship a new batch of improvements. I would like to highlight some of them:

  • Now we have a way to block the edition of tasks after some time. You can configure a number of days after which users will not be able to insert new tasks or modify existing ones. This will prevent that huge changes in the past affect to your numbers without notice.
  • There is also a new report for users to check their list of tasks and search among them. It gives a higher-level view of your work than the day-to-day tasks interface, while providing more detail than the user or projects details screens. There is also a complementary web service that can be accessed by external scripts. The first commit of this feature dates back from 2010, so it has indeed been stopped for too much!
  • The annoying bug in the tasks screen that prevented you from saving correct tasks under certain conditions is finally gone. And since we were at it, we have modified the notification banners to show up only once.
  • Finally there’s some documentation for system administrators beyond the installation instructions.
  • We added a new boolean field to the tasks to mark specific tasks as done “on site”.
  • Implemented some keyboard shortcuts to increase the productivity in the tasks screen. In particular, you can quick jump to a task with CTRL + 1, 2, etc., save with CTRL + S or add a new task with CTRL + N. Alternatives with ALT exists for browsers not allowing to override these specific combinations, see the corresponding section in the user documentation for details.

The list could be wider, but you can already read it in the changelog. Meanwhile, you can download the new version, or try it online. Enjoy!

by Jacobo Aragunde Pérez at February 27, 2013 06:55 PM

Adrián Pérez

Optimizing Qt for MIPS: QString::fromLatin1

During the last months at Igalia we have been working on optimizing open source components for the MIPS architecture, using the SIMD instructions supported by the latest MIPS 74K cores. More specifically, I have been working on the Qt framework.

One of the most used parts of Qt are the basic utilities in the QtCore module, being QString a basic building block that all applications are expected to use heavily, so making it faster would be a gain for any application. One of the functions I first worked on is QString::fromLatin1: it converts an array of characters in ISO–8859–1 encoding (also known as “Latin 1”) and returns a QString object, which internally stores text data in UTF-16.

Let’s take a look at the generic version of QString::fromLatin1. Taking apart the surrounding code, the loop contains a simple “pull-up” operation: each input byte is converted to an unsigned 16-bit integer, padding the high 8 bits with zeroes (the method works for ISO–8859–1, do not try this for other input encodings):

// "dst": array of 16-bit integers
// "src": array of  8-bit integers
 
while (len--)
   *dst++ = (uchar) *src++;

Almost every compiler out there —including GCC— will implement this doing 1-byte loads into a register, zeroing the upper part of the register, and then doing a halfword (16-bit) store. Essentially:

; a0: "dst" array
; a1: "src" array
; a2: "len"
 
1:  lbu    t1, 0 (a1)
    addiu  a2, a2, -1
    sh     t1, 0 (a0)
    addiu  a0, a0, 2
    bnez   a2, 1b
     addiu a1, a1, 1

(Note how the branch delay slot is taken advantage of to increment one of the pointers; some compilers may just insert a nop instead.)

The most important thing to do is unrolling the loop, but some care is needed: as Thiago Macieira mentions in one of his articles, in average most of the strings in Qt have a length of 17 characters. This means that doing a long unrolling of the loop could be worse because of the calculations needed to determine how many times the unrolled loop is executed and the handling of the remaining items. The first thing to do was making two versions of the algorithm: one with the loop unrolled 4 times, the other with it unrolled 8 times.

As next step, the usual optimization is changing the memory access so as load/store operations are aligned, and handle full words (instead of bytes and halfwords). Loading a word means that the loop is implicitly unrolled 4 times, and some bit-juggling needs to be done in order to extract the values before storing them. Something like:

1:  lw     t1, (a1)  ; t1=ABCD
    addiu  a1, a1,  4
    ; extract bytes from t1 to t2/t3, padding
    ; them with zeroes: t2=0A0B, t3=0C0D
    addiu  a2, a2, -4
    sw     t2, 0 (a0)
    sw     t3, 4 (a0)
    bnez   a2, 1b
     addiu a0, a0, 8

If the code needed to unpack the data needs more than 14 instructions, then this loop would use more instructions than the basic one above (maybe it still would be marginally faster due to the aligned load/store). This is where the new DSP instructions of the 74K cores come in handy: the preceu.ph.qbl and preceu.ph.qbr instructions unpack two 8-bit parts of a register to two 16-bit integers in another register. Like this:

Using the preceu.ph.* instructions avoids having to do quite a lot of shifting and masking to get the values right. The final loop is still quite easy to grasp:

1:  lw     t1, (a1)  ; t1=ABCD
    addiu  a1, a1,  4
    preceu.ph.qbl t2, t1  ; t2=0A0B
    preceu.ph.qbr t3, t1  ; t3=0C0D
    addiu  a2, a2, -4
    sw     t2, 0 (a0)
    sw     t3, 4 (a0)
    bnez   a2, 1b
     addiu a0, a0, 8

Note that the actual implementation is longer, handles the extra items remaining after the unrolled loop, and has a preamble to make the pointers aligned—which, most of the time, is worth to do.

For the sake of completeness I made an additional version of the function in assembler with the loop unrolled 8 times, which uses byte loads and halfword stores. For the benchmarking different string lengths were checked, being each version of the function called a million times. The plot uses the average of 10 runs:

Legend for the plot above:

  • Qt: This is the original code included in Qt, which is s simple pull-up operation that converts chars to short integers. The code was built with -O2 -march=74kf -mdspr2 to get the best possible optimizations from GCC.
  • lbu+unroll8: Loads each character using lbu (byte load) instructions, stores results using sh (store halfword), and the main body of the loop is 8-unrolled. It also uses cache prefetch hinting for both source and destination memory areas.
  • lw+unroll4+dsp: The main body of the loop is 4-unrolled, using lw instructions to load aligned words, DSP instructions to unpack bytes into halfwords, and finally store results with sw instructions.
  • lw+unroll8+dsp: Similar to the previous one, main body is 8-unrolled, and if the source address is not aligned, up to three bytes are consumed before starting to use lw instructions to load aligned words.

The following observations can be done:

  • All the optimized versions perform better than the simple Qt version, being the speedup up to 3.x (note: the time scale in the graphic is logarithmic).
  • For strings up to 20 characters, the 4-unrolled function performs clearly better, because of the small data set and the fact that it avoids the additional preamble used to align the source pointer address. Also, some string lengths (4, 12, 20) can be better handled by the 4-unrolled loop because the lengths are not divisible by 8.
  • For strings bigger than 20 characters, the 8-unrolled with aligned loading version is the best, and it reaches a 3x speedup for strings longer than 32 characters. For longer strings, the speedup is maintaned and becomes clearer as string are longer.

With this data, we can conclude that the best implementation is an hybrid approach, using both the code for the 4-unrolled loop and the 8-unrolled loop, and selecting one of them depending on the length of the input data. Checking the length of the input string and branching takes just two extra instructions (slti and a bnez), which is okay to have in the function preamble because it is not costly.

Wrapping up, the final version is up to 3 times faster than the original Qt implementation, without sacrificing the speed for the rather common small strings.

Last but not least, I would like to thank MIPS Technologies for sponsoring this work, and for making it possible to have a development board at Igalia for testing and benchmarking.

by aperez at February 27, 2013 02:14 PM

February 25, 2013

Andy Wingo

on generators

Hello! Is this dog?

Hi dog this is Andy, with some more hacking nargery. Today the topic is generators.

(shift k)

By way of introduction, you might recall (with a shudder?) a couple of articles I wrote in the past on delimited continuations. I even gave a talk on delimited continuations at FrOSCon last year, a talk that was so hot that thieves broke into the the FrOSCon video squad's room and stole their hard drives before the edited product could hit the tubes. Not kidding! The FrOSCon folks still have the tapes somewhere, but meanwhile time marches on without the videos; truly, the terrorists have won.

Anyway, the summary of all that is that Scheme's much-praised call-with-current-continuation isn't actually a great building block for composable abstractions, and that delimited continuations have much better properties.

So imagine my surprise when Oleg Kiselyov, in a mail to the Scheme standardization list, suggested a different replacement for call/cc:

My bigger suggestion is to remove call/cc and dynamic-wind from the base library into an optional feature. I'd like to add the note inviting the discussion for more appropriate abstractions to supersede call/cc -- such [as] generators, for example.

He elaborated in another mail:

Generators are a wide term, some versions of generators are equivalent (equi-expressible) to shift/reset. Roshan James and Amr Sabry have written a paper about all kinds of generators, arguing for the generator interface.

And you know, generators are indeed a very convenient abstraction for composing producers and consumers of values -- more so than the lower-level delimited continuations. Guile's failure to standardize a generator interface is an instance of a class of Scheme-related problems, in which generality is treated as more important than utility. It's true that you can do generators in Guile, but we shouldn't make the user build their own.

The rest of the programming world moved on and built generators into their languages. James and Sabry's paper is good because it looks at yield as a kind of mainstream form of delimited continuations, and uses that perspective to place the generators of common languages in a taxonomy. Some of them are as expressive as delimited continuations. There are are also two common restrictions on generators that trade off expressive power for ease of implementation:

  • Generators can be restricted to disallow backtracking, as in Python. This allows generators to yield without allocating memory for a new saved execution state, instead updating the stored continuation in-place. This is commonly called the one-shot restriction.

  • Generators can also restrict access to their continuation. Instead of reifying generator state as external iterators, internal iterators call their consumers directly. The first generators developed by Mary Shaw in the Alphard language were like this, as are the generators in Ruby. This can be called the linear-stack restriction, after an implementation technique that they enable.

With that long introduction out of the way, I wanted to take a look at the proposal for generators in the latest ECMAScript draft reports -- to examine their expressiveness, and to see what implementations might look like. ES6 specifies generators with the the first kind of restriction -- with external iterators, like Python.

Also, Kiselyov & co. have a hip new paper out that explores the expressiveness of linear-stack generators, Lazy v. Yield: Incremental, Linear Pretty-printing. I was quite skeptical of this paper's claim that internal iterators were significantly more efficient than external iterators. So this article will also look at the performance implications of the various forms of generators, including full resumable generators.

ES6 iterators

As you might know, there's an official ECMA committee beavering about on a new revision of the ECMAScript language, and they would like to include both iterators and generators.

For the purposes of ECMAScript 6 (ES6), an iterator is an object with a next method. Like this one:

var p = console.log;

function integers_from(n) {
  function next() {
    return this.n++;
  }
  return { n: n, next: next };
}

var ints = integers_from(0);
p(ints.next()); // prints 0
p(ints.next()); // prints 1

Here, int_gen is an iterator that returns successive integers from 0, all the way up to infinity2^53.

To indicate the end of a sequence, the current ES6 drafts follow Python's example by having the next method throw a special exception. ES6 will probably have an isStopIteration() predicate in the @iter module, but for now I'll just compare directly.

function StopIteration() {}
function isStopIteration(x) { return x === StopIteration; }

function take_n(iter, max) {
  function next() {
    if (this.n++ < max)
      return iter.next();
    else
      throw StopIteration;
  }
  return { n: 0, next: next };
}

function fold(f, iter, seed) {
  var elt;
  while (1) {
    try {
      elt = iter.next();
    } catch (e) {
      if (isStopIteration (e))
        break;
      else
        throw e;
    }
    seed = f(elt, seed);
  }
  return seed;
}

function sum(a, b) { return a + b; }

p(fold(sum, take_n(integers_from(0), 10), 0));
// prints 45

Now, I don't think that we can praise this code for being particularly elegant. The end result is pretty good: we take a limited amount of a stream of integers, and combine them with a fold, without creating intermediate data structures. However the means of achieving the result -- the implementation of the producers and consumers -- are nasty enough to prevent most people from using this idiom.

To remedy this, ES6 does two things. One is some "sugar" over the use of iterators, the for-of form. With this sugar, we can rewrite fold much more succinctly:

function fold(f, iter, seed) {
  for (var x of iter)
    seed = f(x, seed);
  return seed;
}

The other thing offered by ES6 are generators. Generators are functions that can yield. They are defined by function*, and yield their values using iterators:

function* integers_from(n) {
  while (1)
    yield n++;
}

function* take_n(iter, max) {
  var n = 0;
  while (n++ < max)
    yield iter.next();
}

These definitions are mostly equivalent to the previous iterator definitions, and are much nicer to read and write.

I say "mostly equivalent" because generators are closer to coroutines in their semantics because they can also receive values or exceptions from their caller via the send and throw methods. This is very much as in Python. Task.js is a nice example of how a coroutine-like treatment of generators can help avoid the fraction-of-an-action callback mess in frameworks like Twistednode.js.

There is also a yield* operator to delegate to another generator, as in Python's PEP 380.

OK, now that you know all the things, it's time to ask (and hopefully answer) a few questions.

design of ES6 iterators

Firstly, has the ES6 committee made the right decisions regarding the iterators proposal? For me, I think they did well in choosing external iterators over internal iterators, and the choice to duck-type the iterator objects is a good one. In a language like ES6, lacking delimited continuations or cross-method generators, external iterators are more expressive than internal iterators. The for-of sugar is pretty nice, too.

Of course, expressivity isn't the only thing. For iterators to be maximally useful they have to be fast, and here I think they could do better. Terminating iteration with a StopIteration exception is not so great. It will be hard for an optimizing compiler to rewrite the throw/catch loop terminating condition into a simple goto, and with that you lose some visibility about your loop termination condition. Of course you can assume that the throw case is not taken, but you usually don't know for sure that there are no more throw statements that you might have missed.

Dart did a better job here, I think. They split the next method into two: a moveNext method to advance the iterator, and a current getter for the current value. Like this:

function dart_integers_from(n) {
  function moveNext() {
    this.current++;
    return true;
  }
  return { current: n-1, moveNext: moveNext };
}

function dart_take_n(iter, max) {
  function moveNext() {
    if (this.n++ < max && iter.moveNext()) {
      this.current = iter.current;
      return true;
    }
    else
      return false;
  }
  return {
    n: 0,
    current: undefined,
    moveNext: moveNext
  };
}

function dart_fold(f, iter, seed) {
  while (iter.moveNext())
    seed = f(iter.current, seed);
}

I think it might be fair to say that it's easier to implement an ES6 iterator, but it is easier to use a Dart iterator. In this case, dart_fold is much nicer, and almost doesn't need any sugar at all.

Besides the aesthetics, Dart's moveNext is transparent to the compiler. A simple inlining operation makes the termination condition immediately apparent to the compiler, enabling range inference and other optimizations.

iterator performance

To measure the overhead of for-of-style iteration, we can run benchmarks on "desugared" versions of for-of loops and compare them to normal for loops. We'll throw in Dart-style iterators as well, and also a stateful closure iterator designed to simulate generators. (More on that later.)

I put up a test case on jsPerf. If you click through, you can run the benchmarks directly in your browser, and see results from other users. It's imprecise, but I believe it is accurate, more or less.

The summary of the results is that current engines are able to run a basic summing for loop in about five cycles per iteration (again, assuming 2.5GHz processors; the Epiphany 3.6 numbers, for example, are mine). That's suboptimal only by a factor of 2 or so; pretty good. Thigh fives all around!

However, performance with iterators is not so good. I was surprised to find that most of the other tests are suboptimal by about a factor of 20. I expected the engines to do a better job at inlining.

One data point that stands out of my limited data set is the development Epiphany 3.7.90 browser, which uses JavaScriptCore from WebKit trunk. This browser did significantly better with Dart-style iterators, something I suspect due to better inlining, though I haven't verified it.

Better inlining and stack allocation is already a development focus of modern JS engines. Dart-style iterators will get faster without being a specific focus of optimization. It seems like a much better strategy for the ES6 specification to piggy-back off this work and specify iterators in such a way that they will be fast by default, without requiring more work on the exception system, something that is universally reviled among implementors.

what about generators?

It's tough to know how ES6 generators will perform, because they're not widely implemented and don't have a straightforward "desugaring". As far as I know, there is only one practical implementation strategy for yield in JavaScript, and that is to save the function's stack and the set of live registers. Resuming a generator copies the activation back onto the stack and restores the registers. Any other strategy that involves higher-level control-flow rewriting would make debugging too difficult. This is not a terribly onerous requirement for an implementation. They already have to know exactly what is live and what is not, and can reconstruct a stack frame without too much difficulty.

In practice, invoking a generator is not much different from calling a stateful closure. For this reason, I added another test to the jsPerf benchmark I mentioned before that tests stateful closures. It does slightly worse than iterators that keep state in an object, but not much worse -- and there is no essential reason why it should be slow.

On the other hand, I don't think we can expect compilers to do a good job at inlining generators, especially if they are specified as terminating with a StopIteration. Terminating with an exception has its own quirks, as Chris Leary wrote a few years ago, though I think in his conclusion he is searching for virtues where there are none.

dogs are curious about multi-shot generators

So dog, that's ECMAScript. But since we started with Scheme, I'll close this parenthesis with a note on resumable, multi-shot generators. What is the performance impact of allowing generators to be fully resumable?

Being able to snapshot your program at any time is pretty cool, but it has an allocation cost. That's pretty much the dominating factor. All of the examples that we have seen so far execute (or should execute) without allocating any memory. But if your generators are resumable -- or even if they aren't, but you build them on top of full delimited continuations -- you can't re-use the storage of an old captured continuation when reifying a new one. A test of iterating through resumable generators is a test of short-lived allocations.

I ran this test in my Guile:

(define (stateful-coroutine proc)
  (let ((tag (make-prompt-tag)))
    (define (thunk)
      (proc (lambda (val) (abort-to-prompt tag val))))
    (lambda ()
      (call-with-prompt tag
                        thunk
                        (lambda (cont ret)
                          (set! thunk cont)
                          ret)))))

(define (integers-from n)
  (stateful-coroutine
   (lambda (yield)
     (let lp ((i n))
       (yield i)
       (lp (1+ i))))))

(define (sum-values iter n)
  (let lp ((i 0) (sum 0))
    (if (< i n)
        (lp (1+ i) (+ sum (iter)))
        sum)))

(sum-values (integers-from 0) 1000000)

I get the equivalent of 2 Ops/sec (in terms of the jsPerf results), allocating about 0.5 GB/s (288 bytes/iteration). This is similar to the allocation rate I get for JavaScriptCore, which is also not generational. V8's generational collector lets it sustain about 3.5 GB/s, at least in my tests. Test case here; multiply the 8-element test by 100 to yield approximate MB/s.

Anyway these numbers are much lower than the iterator tests. ES6 did right to specify one-shot generators as a primitive. In Guile we should perhaps consider implementing some form of one-shot generators that don't have an allocation overhead.

Finally, while I enjoyed the paper from Kiselyov and the gang, I don't see the point in praising simple generators when they don't offer a significant performance advantage. I think we will see external iterators in JavaScript achieve near-optimal performance within a couple years without paying the expressivity penalty of internal iterators.

Catch you next time, dog!

by Andy Wingo at February 25, 2013 04:17 PM

February 24, 2013

Martin Robinson

Edge-distance anti-aliasing

(You might want to go straight to the demo)

Some months ago, I noticed that the Chromium compositor, the code which powers Chromium’s accelerated compositing implementation (and also Aura!) was anti-aliasing layer edges. This was especially surprising to me since I knew for a fact that my hardware didn’t support multisample anti-aliasing yet. Some people from the Chromium graphics team, who are incredibly friendly and helpful, pointed me to a very clever bit of code they were using to do edge-distance anti-aliasing for composited layers.

Aliasing can happen when you sample a signal that has details that cannot be captured by your sample frequency. In the case of graphics, it’s what happens when features of the image exist in an area smaller than a pixel. One important feature is the edge of a piece of geometry, such as the edge of a shape drawn onto the page. When you start rotating shapes and projecting them with 3D CSS transforms, geometry that before aligned to pixel boundaries can move to the spaces between pixel boundaries.

For instance, let’s zoom into the edge of triangle that we are rendering. Maybe in this case it’s half of a rotated div.

A polygon without anti-aliasing

If we naively decided the color of every pixel based on whether any of the triangle covered it at all, we’d end up with a jagged edge.

One thing we could do to make the edge smooth is to determine how much area of the pixel is covered by the triangle. We could adjust the transparency of the shape color (the alpha value) by that percentage when painting the pixel. This process is a type of anti-aliasing and the percentage that the triangle covers the pixel is called coverage.

A polygon with anti-aliasing

Think of all the complicated geometric calculations we’d have to do to figure out the ratio exactly. Since we have to make these calculations for every pixel up to 60 times per second (for smooth animations), our rendering would be pretty slow if we actually did them! Instead, it’d be nice if we could somehow estimate how much of the pixel is covered and do less work. One approach is to calculate the distance from the pixel to the edge of the shape. If the pixel is more than one pixel’s distance away from the edge, we know that we should not paint the triangle color at all. If the pixel is closer than one one pixel’s distance from the edge, we should paint the triangle color, but reduced by some transparency factor.

Luckily, OpenGL can run a little program for every pixel 1 it paints. This program is called a fragment shader. We can write up this logic in the fragment shader and change the triangle color for every pixel.

Perhaps some of those reading who have had experience with OpenGL may notice here that this isn’t going to work as I’ve described. OpenGL already does something like the naive approach I talked about first. Some pixels (the ones OpenGL decided should be painted with the triangle color), will be properly anti-aliased, but many pixels will not be painted at all. For instance, if the coverage of the pixel is only 20% and OpenGL decided not to paint it, we wouldn’t even have a chance to determine how far it was from the triangle edge, because the fragment shader wouldn’t run for those missing pixels.

Let’s “trick” OpenGL into painting more pixels than it would otherwise. A simple way to do this is to just make the triangle a little bit bigger. In fact, we can expand all the edges by less than one pixel and OpenGL will paint all those missing pixels. Now we have smooth edges!

Expanding the drawing area

I spent some time implementing this for the TextureMapper accelerated compositor (and thus WebKitGTK+), so it’s not just the Chromium compositor that has this feature any longer. For me it makes a huge difference in the quality of 3D CSS content. For the purposes of demonstration, I’ve also reimplemented it in WebGL, so if you have a modern browser you can see it in action.

A screenshot of the edge-distance anti-aliasing demo
  1. Fragments and pixels are actually different things, but I'm glossing over this difference for the sake of simplicity.

February 24, 2013 05:00 AM

February 22, 2013

Carlos López

How to properly activate TRIM for your SSD on Linux: fstrim, lvm and dm-crypt

Unlike hard disk drives (HDDs), NAND flash memory that make SSD cannot overwrite existing data. This means that you first have to delete the old data before writing new one. Flash memory is divided into blocks, which is further divided in pages. The minimum write unit is a page, but the smallest erase unit is a block.

Data can be written directly into an empty page, but only whole blocks can be erased. Therefore, to reclaim the space taken up by invalid data, all the valid data from one block must be first copied and written into the empty pages of a new block. Only then can the invalid data in the original block be erased, making it ready for new valid data to be written.

Do you see the problem? This means that as time goes on, the SSD will internally fragment the blocks among the different pages, until that it reaches a point where there won’t be available any empty page. Then every time the drive needs to write a block into any of the semi-full pages, it first needs to copy the current blocks from the page to a buffer, then it has to delete the whole page to finally rewrite the old blocks along with the new one. This means that as time goes on the SSD performance degrades more and more, because for every write it has to go through a cycle of read-erase-modify-write. This is known as “write amplification”.

Without TRIM the disk is unable to know which blocks are in use by a file or which ones are marked as free space. This is because when a file is deleted, the only thing the OS does is to mark the blocks that were used by the file as free inside the file system index. But the OS won’t tell the disk about this. This means that over time the performance of the SSD disk will degrade more and more, and it don’t matters how much space you free, because the SSD won’t know about it.

What is TRIM? TRIM was invented for solving this problem. TRIM is the name of a command that the operating system can send to tell the SSD which blocks are free in the filesystem.
The SSD uses this information to internally defragment the blocks and keep free pages available to be written quickly and efficiently.

How to active TRIM on Linux? The first thing to know is that TRIM should be enabled on all I/O abstraction layers. This means that if you have an ext4 partition on top of LVM, which in turn is on top of an encrypted volume with LUKS/dm-crypt, then you must enable support for TRIM in these three layers: The filesystem, LVM and dm-crypt. There is no point in enabling it at the filesystem level if you don’t enable it also on the other layers. The TRIM command should be translated from one layer to another until reaching the SSD.

  1. Enabling TRIM support on dm-crypt

  2. We simply have to add the option discard inside our crypttab

    • $ cat /etc/crypttab
# <target name>    <source device>    <key file>       <options>
sda2_crypt         /dev/sda2          none             luks,discard

Note: The usage of TRIM on dm-crypt could cause some security issues like the revelation of which sectors of your disk are unused.

  • Enabling TRIM support on LVM

  • We have to enable the option issue_discards in the LVM configuration.

    • $ cat /etc/lvm/lvm.conf
    # [...]
    devices {
       # [...]
       issue_discards = 1
       # [...]
    }
    # [...]
    
  • Enabling TRIM support on the file system

  • This is the most interesting part. Most people simply add the option “discard” in the mounting options at /etc/fstab. However, this means that every time you delete a file, the OS will be reporting in real-time to the SSD which blocks were occupied by that file and are not longer in use, and then the SSD will have to perform a defragmentation and deletion of those internal blocks, operation which will take an amount of time higher than desired.

    In order to optimize the performance of the SSD, I strongly advise you to avoid doing the TRIM operation in real time (whenever a file is deleted) because you would be putting an unnecessary extra amount of work over the SSD. In other words: You should not enable the discard option in fstab.

    Instead, what I recommend is to run a script periodically to tell the SSD which blocks are free with the command fstrim. Doing this operation daily or weekly is more than enough. This way we do not lose any performance due to TRIM when deleting files and we periodically keep informed the SSD about the free blocks.

    Other advantages of the fstrim way are:

    • If you didn’t enabled correctly the TRIM support in the above layers of your setup, you will receive an error when executing fstrim. On the other hand, if you were using the discard option at fstab you wouldn’t have received any error and you would end thinking that you managed to get TRIM working properly when you didn’t.
    • If you delete a file by mistake (you know it happens), you can recover it before anacron runs your script fstrim. On the other hand, if you were using the discard-at-fstab option you wouldn’t have any chance of recovering the file, because the OS would have told the SSD to TRIM that blocks as soon as you deleted the file, and consequently the SSD has irreversibly destroyed such blocks.

    Here you have simple script to run fstrim on the /, /boot and /home partitions, which can be programmed to be executed periodically by anacron

    • $ cat /etc/cron.weekly/dofstrim
    #! /bin/sh
    for mount in / /boot /home; do
    	fstrim $mount
    done
    

    It’s worth mentioning that both fstrim and discard-at-fstab options only work on filesystems that implement the ioctl FITRIM. Today, such filesystems are:

    ~/kernel/linux (v3.8) $ grep -lr FITRIM fs/ | cut -d/ -f2 | sort | uniq | xargs echo
    btrfs ext3 ext4 gfs2 jfs ocfs2 xfs
    

    Note: On most setups you will have to rebuild your initramfs with update-initramfs -u (Debian and derivatives) or dracut -f (Redhat and derivatives) and reboot the machine after touching the configuration options of LVM or dm-crypt.

    Update 25-Feb: Seems that Fedora users with a dm-crypt volume will be affected by this problem: https://bugzilla.redhat.com/890533

    by clopez at February 22, 2013 11:14 PM

    February 20, 2013

    Xabier Rodríguez Calvar

    New media controls in WebKitGtk+

    So it looks like my patch for the rework of the WebKitGtk+ media controls was finally landed.

    First I would like to thank Igalia for giving me some time to complete this task, which took some work and began at WebKitGtk+ hackfest some time ago with Žan Doberšek and Jon McCann.

    Starting point was:

    Starting point

    As you can see the controls look like an old Gtk+ application without any theming. Jon suggested that we could began with mimicing Chromium controls as they look closer to any modern themed GNOME application and adapt them to use the GNOME symbolic icons and keep some other stuff like the volume bar, but of course making it look nicer.

    What was done:

    • Adding the GNOME symbolic icon theme and a method to replace the normal stock icons, though we keep them as fallback.
    • Deep adaptation of Chromium CSS and C++ code to make it suit the GNOME requirements.
    • Some buttons fell off the design, like seeking backwards and forward.
    • Aligned the elements with the pixel ruler to make them as close to perfect as possible in all conditions (as some buttons are hidden in certain situations, like fullscreen, volume…).
    • Fixed a bug about the buffering ranges that was in trunk at that point, but was independent of the code I was cooking.
    • Removed as much of the C++ code as possible to deviate the drawing to CSS, which is more maintainable for design purposes. The only things that are still painted with C++ code are the slider tracks, which depend on parameters than cannot be specified in CSS, like the buffering ranges and the volume (which was not before, but I introduced for design coherence).
    • Removed the focus ring which was making the controls uglier.
    • Removed the dead code.
    • New baselines for the tests, including the pixel ones. Flagged also some tests that are (and will) not working in Chromium either.

    I had a small issue with a Chromium guy landing a patch that forced me to change the display of some components from -webkit-box to -webkit-flex and of course, rebasing all related tests. This created a small delay in landing the patch, but it finally did as 143463.

    And the result is the following:
    New media controls

    I don’t know about you guys, but I like it!

    by calvaris at February 20, 2013 06:06 PM

    February 16, 2013

    Andy Wingo

    opengl particle simulation in guile

    Hello, internets! Did you know that today marks two years of Guile 2? Word!

    Guile 2 was a major upgrade to Guile's performance and expressiveness as a language, and I can say as a user that it is so nice to be able to rely on such a capable foundation for hacking the hack.

    To celebrate, we organized a little birthday hack-feast -- a communal potluck of programs that Guilers brought together to share with each other. Like last year, we'll collect them all and publish a summary article on Planet GNU later on this week. This is an article about the hack that I worked on.

    figl

    Figl is a Guile binding to OpenGL and related libraries, written using Guile's dynamic FFI.

    Figl is a collaboration between myself and Guile super-hacker Daniel Hartwig. It's pretty complete, with a generated binding for all of OpenGL 2.1 based on the specification files. It doesn't have a good web site yet -- it might change name, and probably we'll move to savannah -- but it does have great documentation in texinfo format, built with the source.

    But that's not my potluck program. My program is a little demo particle simulation built on Figl.

    Click to download video

    Source available here. It's a fairly modern implementation using vertex buffer objects, doing client-side geometry updates.

    To run, assuming you have some version of Guile 2.0 installed:

    git clone git://gitorious.org/guile-figl/guile-figl.git
    cd guile-figl
    autoreconf -vif
    ./configure
    make
    # 5000 particles
    ./env guile examples/particle-system/vbo.scm 5000
    

    You'll need OpenGL development packages installed -- for the.so links, not the header files.

    performance

    If you look closely on the video, you'll see that the program itself is rendering at 60 fps, using 260% CPU. (The CPU utilization drops as soon as the video starts, because of the screen capture program.) This is because the updates to the particle positions and to the vertices are done using as many processors as you have -- 4, in my case (2 real and 2 threads each). I've gotten it up to 310% or so, which is pretty good utilization considering that the draw itself is single-threaded and there are other things that need to use the CPU like the X server and the compositor.

    Each particle has a position and a velocity, and is attracted to the center using a gravity-like force. If you consider that updating a particle's position should probably take about 100 cycles (for a square root, some divisions, and a few other floating point operations), then my 2.4GHz processors should allow for a particle simulation with 1.6 million particles, updated at 60fps. In that context, getting to 5K particles at 60 fps is not so impressive -- it's sub-optimal by a factor of 320.

    This slowness is oddly exciting to me. It's been a while since I've felt Guile to be slow, and this is a great test case. The two major things to work on next in Guile are its newer register-based virtual machine, which should speed things up by a factor of 2 or so in this benchmark; and an ahead-of-time native compiler, which should add another factor of 5 perhaps. Optimizations to an ahead-of-time native compiler could add another factor of 2, and a JIT would probably make up the rest of the difference.

    One thing I was really wondering about when making this experiment was how the garbage collector would affect the feeling of the animation. Back when I worked with the excellent folks at Oblong, I realized that graphics could be held to a much higher standard of smoothness than I was used to. Guile allocates floating-point numbers on the heap and uses the Boehm-Demers-Weiser conservative collector, and I was a bit skeptical that things could be smooth because of the pause times.

    I was pleasantly surprised that the result was very smooth, without noticeable GC hiccups. However, GC is noticeable on an overloaded system. Since the simulation advances time by 1/60th of a second at each frame, regardless of the actual frame rate, differing frame processing times under load can show a sort of "sticky-zipper" phenomenon.

    If I ever got to optimal update performance, I'd probably need to use a better graphics card than my Intel i965 embedded controller. And of course vertex shaders are really the way to go if I cared about the effect and not the compiler -- but I'm more interested in compilers than graphics, so I'm OK with the state of things.

    Finally I would note that it has been pure pleasure to program with general, high-level macros, knowing that the optimizer would transform the code into really efficient low-level code. Of course I had to check the optimizations, with the ,optimize, command at the REPL. I even had to add a couple of transformations to the optimizer at one point, but I was able to do that without restarting the program, which was quite excellent.

    So that's my happy birthday dish for the Guile 2 potluck. More on other programs after the hack-feast is over; until then, see you in #guile!

    by Andy Wingo at February 16, 2013 07:20 PM