Planet Igalia

November 27, 2014

Andy Wingo

scheme workshop 2014

I just got back from the US, and after sleeping for 14 hours straight I'm in a position to type about stuff again. So welcome back to the solipsism, France and internet! It is good to see you on a properly-sized monitor again.

I had the enormously pleasurable and flattering experience of being invited to keynote this year's Scheme Workshop last week in DC. Thanks to John Clements, Jason Hemann, and the rest of the committee for making it a lovely experience.

My talk was on what Scheme can learn from JavaScript, informed by my work in JS implementations over the past few years; you can download the slides as a PDF. I managed to record audio, so here goes nothing:


55 minutes, vorbis or mp3

It helps to follow along with the slides. Some day I'll augment my slide-rendering stuff to synchronize a sequence of SVGs with audio, but not today :)

The invitation to speak meant a lot to me, for complicated reasons. See, Scheme was born out of academic research labs, and to a large extent that's been its spiritual core for the last 40 years. My way to the temple was as a money-changer, though. While working as a teacher in northern Namibia in the early 2000s, fleeing my degree in nuclear engineering, trying to figure out some future life for myself, for some reason I was recording all of my expenses in Gnucash. Like, all of them, petty cash and all. 50 cents for a fat-cake, that kind of thing.

I got to thinking "you know, I bet I don't spend any money on Tuesdays." See, there was nothing really to spend money on in the village besides fat cakes and boiled eggs, and I didn't go into town to buy things except on weekends or later in the week. So I thought that it would be neat to represent that as a chart. Gnucash didn't have such a chart but I knew that they were implemented in Guile, as part of this wave of Scheme consciousness that swept the GNU project in the nineties, and that I should in theory be able to write it myself.

Problem was, I also didn't have internet in the village, at least then, and I didn't know Scheme and I didn't really know Gnucash. I think what I ended up doing was just monkey-typing out something that looked like the rest of the code, getting terrible errors but hey, it eventually worked. I submitted the code, many years ago now, some of the worst code you'll read today, but they did end up incorporating it into Gnucash and to my knowledge that report is still there.

I got more into programming, but still through the back door, so to speak. I had done some free software work before going to Namibia, on GStreamer, and wanted to build a programmable modular synthesizer with it. I read about Supercollider, and decided I wanted to do something like that but with the "unit generators" defined in GStreamer and orchestrated with Scheme. If I knew then that Scheme could be fast, I probably would have started on an entirely different course of things, but that did at least result in gainful employment doing unrelated GStreamer things, if not a synthesizer.

Scheme became my dominant language for writing programs. It was fun, and the need to re-implement a bunch of things wasn't a barrier at all -- rather a fun challenge. After a while, though, speed was becoming a problem. It became apparent that the only way to speed up Guile would be to replace its AST interpreter with a compiler. Thing is, I didn't know how to write one! Fortunately there was previous work by Keisuke Nishida, jetsam from the nineties wave of Scheme consciousness. I read and read that code, mechanically monkey-typed it into compilation, and slowly reworked it into Guile itself. In the end, around 2009, Guile was faster and I found myself its co-maintainer to boot.

Scheme has been a back door for me for work, too. I randomly met Kwindla Hultman-Kramer in Namibia, and we found Scheme to be a common interest. Some four or five years later I ended up working for him with the great folks at Oblong. As my interest in compilers grew, and it grew as I learned more about Scheme, I wanted something closer there, and that's what I've been doing in Igalia for the last few years. My first contact there was a former Common Lisp person, and since then many contacts I've had in the JS implementation world have been former Schemers.

So it was a delight when the invitation came to speak (keynote, no less!) at the Scheme Workshop, behind the altar instead of in the foyer.

I think it's clear by now that Scheme as a language and a community isn't moving as fast now as it was in 2000 or even 2005. That's good because it reflects a certain maturity, and makes the lore of the tribe easier to digest, but bad in that people tend to ossify and focus on past achievements rather than future possibility. Ehud Lamm quoted Nietzsche earlier today on Twitter:

By searching out origins, one becomes a crab. The historian looks backward; eventually he also believes backward.

So it is with Scheme and Schemers, to an extent. I hope my talk at the conference inspires some young Schemer to make an adaptively optimized Scheme, or to solve the self-hosted adaptive optimization problem. Anyway, as users I think we should end the era of contorting our code to please compilers. Of course some discretion in this area is always necessary but there's little excuse for actively bad code.

Happy hacking with Scheme, and au revoir!

by Andy Wingo at November 27, 2014 05:48 PM

November 24, 2014

Samuel Iglesias

piglit (II): How to launch a tailored piglit run

In my last post I gave an introduction to piglit, an open-source test suite for OpenGL implementations. In that post, I explained how to compile the source code, how to run the full piglit test suite and how to analyze the results.

However, you can tailor a piglit run to execute the specific tests that match your needs. For example, I might want to check whether a specific subset of tests passes because I am testing a new feature in Mesa as part of my job at Igalia.

Configure the piglit run

There are several parameters that configure the piglit run to match our needs:

  • --dry-run: do not execute the tests. Very useful to check that the parameters you give are doing what you expect.
  • --valgrind: it runs Valgrind memcheck on each test program that is going to be executed. If it finds any error, the test fails and the Valgrind output is saved into the results file.
  • --all-concurrent: run all tests concurrently.
  • --no-concurrency: disable concurrent test runs.
  • --sync: sync results to disk after every test so you don’t lose information if something bad happens to your system.
  • --platform {glx,x11_egl,wayland,gbm,mixed_glx_egl}: if you compiled waffle with support for several window systems, this is the name of the one passed to waffle.

There is a help parameter if you want further information:

$ ./piglit run -h

Skip tests

Sometimes you prefer to skip some tests because they cause a GPU hang or take a lot of time and their output is not interesting for you. The way to skip a test is to use the -x parameter:

$ ./piglit run tests/all -x texture-rg results/all-reference-except-texture-rg

Run specific tests

You can run specific tests by appending the --name (-n) parameter together with the test’s binary name.

$ ./piglit run tests/all -n texture-rg results/reference-texture-rg

Or you may prefer to run a specific subset of tests by filtering on test name. Remember that it filters by test name, not by functionality, so you might miss some test programs that check the same functionality.

$ ./piglit run tests/all -t color -t format results/color-format

Also, you can concatenate more parameters to add/remove more tests that are going to be executed.

$ ./piglit run tests/all -t color -t format -t tex -x texture-rg results/color-format-tex

Run standalone tests

There is another way to run standalone tests apart from using the name argument to piglit run. You might be interested in this if you want to run GDB on a failing test to debug what is going on.

Recall what the HTML output for a given test looks like:

(screenshot: piglit test details)

The command field specifies the program name executed for this test together with all its arguments. Let me explain what they mean in this example.

  • Some binaries receive arguments that specify the data type to test (in this case GL_RGB16_SNORM) or other data: number of samples, MSAA, etc.
  • -fbo: it draws in an off-screen framebuffer.
  • -auto: it automatically runs the test and, when it finishes, closes the window and prints whether the test passed or failed.

Occasionally, you might run the test program without the -fbo and -auto parameters because you want to see what it draws on the window in order to better understand the bug you are debugging.

Create your own test profile

Besides explicitly adding/removing tests from tests/all.py (or other profiles) in the piglit run command, there is another way of running a specific subset of tests: profiles.

A test profile in piglit is a script written in Python that selects the tests to execute.

There are several profiles already defined in piglit, two of which we have already seen in this post series: tests/sanity.tests and tests/all. The first is useful to check that piglit was correctly compiled together with its dependencies, while the second runs all piglit tests in one shot. Of course there are more profiles inside the tests/ directory: cl.py (OpenCL tests), es3conform.py (OpenGL ES 3.0 conformance tests), gpu.py, etc.

Eventually you will write your own profile because adding/removing tests in the console command is tiresome and prone to errors.

This is an example of a profile based on tests/gpu.py that I was recently using for testing the gallium llvmpipe driver.

# -*- coding: utf-8 -*-

# quick.tests minus compiler tests.

from tests.quick import profile
from framework.glsl_parser_test import GLSLParserTest

__all__ = ['profile']

# Remove all glsl_parser_tests, as they are compiler test
profile.filter_tests(lambda p, t: not isinstance(t, GLSLParserTest))

# Drop ARB_vertex_program/ARB_fragment_program compiler tests.
del profile.tests['spec']['ARB_vertex_program']
del profile.tests['spec']['ARB_fragment_program']
del profile.tests['asmparsertest']

# Too much time on gallium llvmpipe
profile.tests['spec']['!OpenGL 1.0'].pop('gl-1.0-front-invalidate-back', None)
profile.tests['spec']['!OpenGL 1.0'].pop('gl-1.0-swapbuffers-behavior', None)
profile.tests['spec']['!OpenGL 1.1'].pop('read-front', None)
profile.tests['spec']['!OpenGL 1.1'].pop('read-front clear-front-first', None)
profile.tests['spec']['!OpenGL 1.1'].pop('drawbuffer-modes', None)
profile.tests['spec']['EXT_framebuffer_blit'].pop('fbo-sys-blit', None)
profile.tests['spec']['EXT_framebuffer_blit'].pop('fbo-sys-sub-blit', None)

As you see, it picks the test lists from quick.py but with some changes: it drops all the tests related to a couple of OpenGL extensions (ARB_vertex_program and ARB_fragment_program) and it drops seven other tests because they took too much time when I was testing them with the gallium llvmpipe driver on my laptop.

I recommend spending some time playing with profiles, as they are a very powerful tool for tailoring a piglit run.

What else?

In some situations, you want to know which test produces kernel error messages in dmesg, or which test is running right now. For both cases, piglit provides parameters for the run command:

$ ./piglit run tests/all -v --dmesg results/all-reference

  • Verbose (-v) prints a line of output for each test before and after it runs, so you can find which one takes longer or outputs errors without needing to wait until piglit finishes.
  • Dmesg (--dmesg): it saves the difference between the dmesg output before and after each test program is executed. Thanks to that, you can easily find which test produces kernel errors in the graphics device driver.

Wrapping up

After giving an introduction to piglit in my blog, this post explains how to configure a piglit run, change the list of tests to execute and run a standalone test. As you see, piglit is a very powerful tool that requires some time to learn how to use appropriately.

In the next post I will talk about a more advanced topic: how to create your own tests.

by Samuel Iglesias at November 24, 2014 07:58 AM

November 19, 2014

Andrés Gómez

Switching between nouveau and the nVIDIA proprietary OpenGL driver in (Debian) GNU/Linux

So lately I’ve been devoting my time at Igalia to the GNU/Linux graphics stack, focusing, more specifically, on Mesa, the most popular open-source implementation of the OpenGL specification.

When working on Mesa and piglit, its testing suite, quite often you would like to compare the results obtained when running specific OpenGL code with one driver or another.

In the case of nVIDIA graphics cards we have the chance of comparing the default open source driver provided by Mesa, nouveau, with the proprietary driver provided by nVIDIA. To install the nVIDIA driver you will have to run something like:

root$ apt-get install linux-headers nvidia-driver nvidia-kernel-dkms nvidia-xconfig

Changing from one driver to another involves several steps, so I decided to create a dirty script to help with this.

The actions done by this script are:

  1. Instruct your X server to use the appropriate X driver.

    These instructions apply to the X.org server only.

    When using the default nouveau driver in Debian, the X.org server is able to configure itself automatically. However, when using the nVIDIA driver you will most probably have to provide the proper settings to X.org yourself.

    nVIDIA provides the package nvidia-xconfig. This package provides a tool of the same name that will generate an X.org configuration file suitable for working with the nVIDIA X driver:

root$ nvidia-xconfig 

WARNING: Unable to locate/open X configuration file.

Package xorg-server was not found in the pkg-config search path.
Perhaps you should add the directory containing `xorg-server.pc'
to the PKG_CONFIG_PATH environment variable
No package 'xorg-server' found
New X configuration file written to '/etc/X11/xorg.conf'

I have embedded this generated file into the provided custom script since it is suitable for my system:

echo 'Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
' > /etc/X11/xorg.conf

I would recommend substituting this with a configuration file generated with nvidia-xconfig on your own system.

  2. Select the proper GLX library.

    Fortunately, Debian provides the alternatives mechanism to select between one and the other.

    ALTERNATIVE=""

    ALTERNATIVE="/usr/lib/mesa-diverted"

    ALTERNATIVE="/usr/lib/nvidia"

    update-alternatives --set glx "${ALTERNATIVE}"

  3. Blacklist the module we don’t want the Linux kernel to load on startup.

    Again, in Debian, the nVIDIA driver package installs the file /etc/nvidia/nvidia-blacklists-nouveau.conf, which is then linked from /etc/modprobe.d/nvidia-blacklists-nouveau.conf, instructing the kernel to avoid the open source nouveau driver for the graphics card.

    When selecting nouveau, this script removes that soft link and creates a new file which, instead of blacklisting the nouveau driver, blacklists the nVIDIA proprietary one:

    rm -f /etc/modprobe.d/nvidia-blacklists-nouveau.conf
    echo "blacklist nvidia" > /etc/modprobe.d/nouveau-blacklists-nvidia.conf

    When selecting nVIDIA, the previous file is removed and the soft link is restored.

  4. Re-generate the image used in the initial boot.

    This will ensure that we are using the proper kernel driver from the very beginning of the system boot:

    update-initramfs -u

    With these actions in place you will be able to switch your running graphics driver.

    You will switch to nouveau with:

    root$ <path_to>/alternate-nouveau-nvidia.sh nouveau
    update-alternatives: using /usr/lib/mesa-diverted to provide /usr/lib/glx (glx) in manual mode
    update-initramfs: Generating /boot/initrd.img-3.17.0
    
    nouveau successfully set. Reboot your system to apply the changes ...

    And to the nVIDIA proprietary driver with:

    root$ <path_to>/alternate-nouveau-nvidia.sh nvidia
    update-alternatives: using /usr/lib/nvidia to provide /usr/lib/glx (glx) in manual mode
    update-initramfs: Generating /boot/initrd.img-3.17.0
    
    nvidia successfully set. Reboot your system to apply the changes ...

    It is recommended to reboot the system, although theoretically you could unload the kernel driver and restart the X.org server. The reason is that it has been reported that unloading the nVIDIA kernel driver and loading a different one does not always work correctly.

    I hope this will be helpful for your hacking time!

    by tanty at November 19, 2014 10:52 AM

    November 14, 2014

    Andy Wingo

    on yakshave, on color, on cosines, on glitchen

    Hold on to your butts, kids, because this is epic.

    on yaks

    As in all great epics, our prideful, stubborn hero starts in a perfectly acceptable state of things, decides on a lark to make a small excursion, and comes back much much later to inflict upon you pictures from his journey.

    So. I have a web photo gallery but I don't take many pictures these days. Dealing with photos is a bit of a drag, and the ways that are easier like Instagram or what-not give me the (peer, corporate, government: choose 3) surveillance hives. So, I had vague thoughts that I should update my web gallery. Yakpoint 1.

    At the same time, my web gallery was written for mod_python on the server, and I don't like hacking in Python any more and kinda wanted to switch away from Apache. Yakpoint 2.

    So I rewrote the server-side part in Scheme. (Yakpoint 3.) It worked fine but I found I needed the ability to get the dimensions of files on the server, so I wrote a quick-and-dirty JPEG parser. Yakpoint 4.

    I needed EXIF data as well, as the original version displayed EXIF data, and for that I used a binding to libexif that I had written a few years ago when I thought about starting this project (Yakpoint -1). However I found some crashers in the library, because it had never really been tested in production, and instead of fixing them I said "what the hell, I'll just write an EXIF parser". (Yakpoint 5.) So I did and adapted the web gallery to use it (Yakpoint 6, for the adaptation.)

    At this point, I looked back, and looked forward, and looked all around, and all was good, but what was with this uneasiness I was feeling? And indeed, I hadn't actually made anything better, and I wasn't taking more photos, and the workflow was the same.

    I was also concerned about the client side of things, which was still in Python and using some breakage-prone legacy libraries to do the photo scaling and transformations and what-not, and relied on a desktop application (f-spot) of dubious future. So I started to look at what it would take to port that script to Scheme (Yakpoint 7). Well it used some legacy libraries to copy files over SSH (gnome-vfs; switching away from that would be Yakpoint 8) and I didn't want to make a Scheme GIO binding (Yakpoint 9, narrowly avoided), and I then -- and then, dear reader -- so then I said "well WTF my caching story on the server is crap anyway, I never know when the sqlite database has changed or not so I never know what responses I can cache, what I really want is a functional datastore" (Yakpoint 10), which is what I have with Git and Tekuti (Yakpoint of yore), and so why not just store my photos in Git like I do in Tekuti for blog posts and serve them from there, indexing as needed? Of course I'd need some other server software (Yakpoint of fore, by which I meantersay the future), but then I could just git push to update my photo gallery, and I wouldn't have to endure the horror that is GVFS shelling out to ssh in a FUSE daemon (Yakpoint of ne'er).

    So. After mulling over these thoughts for a while I decided, during an autumnal walk on the Salève in which we had the greatest views of Mont Blanc everrrrr and yet where are the photos?, that really what I needed was new photo management software, not just a web gallery. I should be able to share photos from my phone or from my desktop, fix them up either place, tag and such, and OK woo hoo! Such is the future! And the present for many people? Thing is, I also needed good permissions management (Yakpoint what, 10 I guess?), because you know a dude just out of college is not the same as that dude many years later. Which means serving things over HTTPS (Yakpoints 11-47) in such a way that the app has some good control over who gets what.

    Well. Anyway. My mind ran ahead, and runs ahead, and yet we haven't actually tasted the awesome sauce yet. So! The photo management software, wherever it lives, needs to rotate photos at least, and scale them down to a few resolutions. I smell a yak! I looked at jpegtran which can do some lossless rotations but it's not available as a library, which is odd; and really I don't like shelling out for core program functionality, because every time I deal with the file system it's the wild west of concurrent mutation. If naming things is one of the two hardest problems in computer science, the file system is the worst because you have to give a global name to every intermediate value.

    At the same time to scale images, what was I to do? Make a binding to libjpeg? Well I started (Yakpoint 48) but for reals kids, libjpeg is not fun. It works great and is really clever but

    1. it's approximately impossible to use from a dynamic ffi; you want a compiler to verify that you are using the right structure definitions

    2. there has been an inane ABI and format break imposed by the official IJG libjpeg which other implementations have not followed; but how could you know which one you are using?

    3. the error handling facility encourages longjmp in C programs; somewhat terrifying

    4. off-heap image manipulation libraries always interact poorly with GC, because the GC only sees the small pointer to the off-heap image, and so doesn't GC often enough

    5. I have zero guarantee that libjpeg won't change ABI in weird ways, and I don't want to touch this software for the next 10 years

    6. I want to do jpegtran-like lossless transformations, but that's not available as a library, and it's totes ridics that binding libjpeg does not help you out here

    7. it's still an unsafe C library, battle-tested yes, but terrifyingly unsafe, and I'd be putting it on my server and who knows?

    Friends, I arrived at the pasture, and I, I chose the yak less shaven. I took my lame JPEG parser and turned it into a full decoder (Yakpoint 49), realized it wasn't much more work to do an encoder (Yakpoint 50), and implemented the lossless transformations (Yakpoint 51).

    on haters

    Before we go on, I know some people would think "what is this kid about". I mean, custom gallery software, a custom JPEG library of all things, all bespoke, why don't you just use off-the-shelf solutions? Why aren't you normal and use a normal language and what about the best practices and where's your business case and I can't go on about this because there's a technical term for people that say this kind of thing and it's "hater".

    Thing is, when did a hater ever make anything cool? Come to think of it, when did a hater make anything at all? In my experience the most vocal haters have nothing behind their names except a long series of pseudonymous rants in other people's comment boxes. So friends, in the joyful spirit of earning-anew, let's talk about JPEG!

    on color

    JPEG is a funny thing. Photos are our lives and our memories, our first steps and our friends, and yet I for one didn't know very much about them. My mental model that "a JPEG is a rectangle of pixels" doesn't turn out to be quite right.

    If you actually look in a normal JPEG, you see three planes of information. If I take this image, for example:

    If I decode it, actually I get three images. Here's the first one:

    This is just the greyscale version of the image. So, storytime! Remember black and white television? We had an old one that got moved around the house sometimes, like if Mom was working at something in the kitchen. We also had a color one in the living room, and you could watch one or the other and they showed the same stuff. Strange when you think about it though -- one being in color and the other not. Well it turns out that color was literally just added on, both historically and technically. The main broadcast was still in black and white, and then in one part of the frequency band there were separate color signals, which color TVs would pick up, mix with the black and white signal, and come out with color. Wikipedia notes that "color TV" was really just "colored TV", which is a phrase whose cleverness I respect. Big ups to the W P.

    In the context of JPEG, this black-and-white signal is sometimes called "luma", but is more precisely called Y', where the "prime" (the apostrophe) indicates that the signal has gamma correction applied.

    In the image above, I replaced the color planes (sometimes collectively called the "chroma") with zeroes, while losslessly keeping the luma. Below is the first color plane, with the Y' plane replaced with a uniform 50% luma, and the other color plane replaced with zeros.

    This color signal is technically known as CB, which may be very imperfectly understood as the bluish component of the color. Well the original image wasn't very blue, so we don't see very much here.

    Indeed, our eyes have a harder time seeing differences in color than differences in intensity. Apparently this goes all the way down to biology -- we have more receptors in our eyes for "black and white" and fewer for color.

    Early broadcasters took advantage of this difference in perception by actually devoting more bandwidth in their broadcasts to luma than to chroma; if you check the Wikipedia page you will see that the area in the spectrum allocation devoted to color is much smaller than the area devoted to intensity. So it is in JPEG: the above image being half-width indicates that actually we're just encoding one CB sample for every two Y' samples.

    Finally, here we have the CR color plane, which can loosely be thought of as the "redness" of the image.

    These test images and crops preserve the actual encoding of this photo as it came from my camera, without re-encoding. That's partly why there's not much interesting going on; with the megapixels these days, it's hard to fit much of anything in a few hundred pixels square. This particular camera is sub-sampling in the horizontal direction, but it's also common to subsample vertically as well, producing color planes that are half-width and half-height. In my limited investigations I have found that cameras tend to sub-sample just in the X direction, producing what they call 4:2:2 images, and that standard software encoders subsample in both, producing 4:2:0.
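
    To make the sub-sampling concrete, here is a rough sketch -- in Python with numpy, purely illustrative and not taken from the Scheme library this post is about; the function name and the simple box filter are my own assumptions -- of how an encoder might shrink a chroma plane to 4:2:2 or 4:2:0:

    import numpy as np

    def subsample_chroma(plane, horizontal=True, vertical=False):
        # Average pairs of neighbouring chroma samples. Horizontal-only
        # averaging yields 4:2:2 (half-width planes); averaging in both
        # directions yields 4:2:0 (half-width, half-height planes).
        # Assumes even dimensions for simplicity.
        p = np.asarray(plane, dtype=float)
        if horizontal:
            p = (p[:, 0::2] + p[:, 1::2]) / 2.0
        if vertical:
            p = (p[0::2, :] + p[1::2, :]) / 2.0
        return p

    Real encoders may use better filters and must cope with odd dimensions, but the idea is the same: fewer chroma samples per luma sample.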

    Incidentally, properly scaling up the color planes is quite an irritating endeavor -- the standard indicates that the color is sampled between the locations of the Y' samples ("centered" chroma), but these images originally have EXIF data that indicates that the color samples are taken at the position of the first Y' sample ("co-sited" chroma). I'm pretty sure libjpeg doesn't delve into the EXIF to check this though, so it would seem that all renderings I have seen of these photos are subtly off.

    But how do you get proper color out of these strange luma and chroma things? Well, the Y'CBCR colorspace is really just the same color cube as RGB, except rotated: the Y' axis traverses the diagonal from (0, 0, 0) (black) to (255, 255, 255) (white). CB and CR are perpendicular to that diagonal, pointing towards blue or red respectively. So to go back to RGB, you multiply by a matrix to rotate the cube.
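
    As an illustration, this is roughly what that matrix multiply looks like for the full-range "JFIF-style" Y'CBCR used in JPEG -- a Python/numpy sketch of my own, not the author's Scheme code; the constants are the usual BT.601-derived ones:

    import numpy as np

    def ycbcr_to_rgb(y, cb, cr):
        # 8-bit planes; chroma is centred on 128. Each output channel is a
        # linear combination of Y', CB and CR, i.e. a rotation of the cube.
        y = np.asarray(y, dtype=float)
        cb = np.asarray(cb, dtype=float) - 128.0
        cr = np.asarray(cr, dtype=float) - 128.0
        r = y + 1.402 * cr
        g = y - 0.344136 * cb - 0.714136 * cr
        b = y + 1.772 * cb
        return np.clip(np.dstack([r, g, b]), 0, 255).astype(np.uint8)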

    It's not a very intuitive color system, as you can see from the images above. For one thing, at zero or full luma, the chroma axes have no meaning; black and white can have no hue. Indeed if you imagine trying to fit a cube corner-down into a similar-sized box, you end up either having empty space in the box, or you have to cut off corners from the cube, or both. Cut corners means that bits of the Y'CBCR signal are wasted; empty space means there are RGB colors that are not representable in Y'CBCR. I'm not sure, but I think both are true for the particular formulation of Y'CBCR used in JPEG.

    There's more to say about color here but frankly I don't know enough to do so, even though I worked in digital video for many years. If this is something you are mildly interested in, I highly, highly recommend watching Wim Taymans' presentation at this year's GStreamer conference. He takes a look at color in video that is constructive, building up from biology through math to engineering. His is a principled approach rather than a list of rules. It really clarified a number of things for me (and opened doors to unknown unknowns beyond).

    on cosines

    Where were we? Right, JPEG. So the proper way to understand what JPEG is is to understand the encoding process. We've covered colorspace conversion from RGB to Y'CBCR and sub-sampling. Next, the image canvas is divided into equal-sized "macroblocks". (These are called "minimum coded units" (MCUs) in the JPEG context, but in video they are usually called macroblocks, and it's a better name.) Without sub-sampling, each macro-block will contain one 8-sample-by-8-sample block for each component (Y', CB, CR) of the image. In my images above, the canvas space corresponding to one chroma block is the space of two luma blocks, so the macroblocks will be 16 samples wide and 8 samples tall, and contain two Y' blocks and one each of CB and CR. If the image canvas can't be evenly divided into macroblocks, it is padded to fit, usually by duplicating the last column or row of samples.
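
    For instance, here is a small sketch (illustrative Python, with names of my own choosing) of the macroblock geometry just described, for the common case of a three-component image with 2x1 chroma sub-sampling:

    import math

    def mcu_geometry(width, height, h_samp=2, v_samp=1):
        # With 2x1 sub-sampling an MCU covers 16x8 luma samples and holds
        # two Y' blocks plus one CB and one CR block. The canvas is padded
        # up to a whole number of MCUs before encoding.
        mcu_w, mcu_h = 8 * h_samp, 8 * v_samp
        mcus_x = math.ceil(width / mcu_w)
        mcus_y = math.ceil(height / mcu_h)
        return mcus_x, mcus_y, (mcus_x * mcu_w, mcus_y * mcu_h)

    # mcu_geometry(1000, 750) -> (63, 94, (1008, 752)): the padded canvas
    # is slightly wider and taller than the visible image.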

    Then to make a JPEG, each block is encoded separately, then the whole thing is just written out to a file, and you're done!

    This description glosses over a couple of important points, but it's a good big-picture view to have in mind. The pipeline goes from RGB pixels, to a padded RGB canvas, to separate Y'CBCR planes, to a possibly subsampled set of those planes, to macroblocks, to encoded macroblocks, to the file. Decoding is the reverse. It's a totally doable, comprehensible thing, and that was one of the big takeaways for me from this project. I took photography classes in high school and it was really cool to see how to shoot, develop, and print film, and this is similar in many ways. The real "film" is raw-format data, which some cameras produce, but understanding JPEG is like understanding enlargers and prints and fixer baths and such things. It's smelly and dark but pretty cool stuff.

    So, how do you encode a block? Well peoples, this is a kinda cool thing. Maybe you remember from some math class that, given n uniformly spaced samples, you can always represent that series as a sum of n cosine functions of equally spaced frequencies. In each little 8-by-8 block, that's what we do: a "forward discrete cosine transformation" (FDCT), which is just multiplying together some matrices for every point in the block. The FDCT is completely separable in the X and Y directions, so the space of 8 horizontal coefficients multiplies by the space of 8 vertical coefficients at each column to yield 64 total coefficients, which is not coincidentally the number of samples in a block.
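
    Spelled out, the FDCT is just the textbook DCT-II applied in both directions. A naive sketch in Python (illustrative only; a real encoder would use a fast factorization rather than four nested loops):

    import numpy as np

    def fdct_8x8(block):
        # block: 8x8 array of samples, conventionally level-shifted by -128.
        out = np.zeros((8, 8))
        for v in range(8):
            for u in range(8):
                cu = 1 / np.sqrt(2) if u == 0 else 1.0
                cv = 1 / np.sqrt(2) if v == 0 else 1.0
                s = 0.0
                for y in range(8):
                    for x in range(8):
                        s += (block[y][x]
                              * np.cos((2 * x + 1) * u * np.pi / 16)
                              * np.cos((2 * y + 1) * v * np.pi / 16))
                out[v, u] = 0.25 * cu * cv * s
        return out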

    Funny thing about those coefficients: each one corresponds to a particular horizontal and vertical frequency. We can map these out as a space of functions; for example giving a non-zero coefficient to (0, 0) in the upper-left block of a 8-block-by-8-block grid, and so on, yielding a 64-by-64 pixel representation of the meanings of the individual coefficients. That's what I did in the test strip above. Here is the luma example, scaled up without smoothing:

    The upper-left corner corresponds to a frequency of 0 in both X and Y. The lower-right is a frequency of 4 "hertz", oscillating from highest to lowest value in both directions four times over the 8-by-8 block. I'm actually not sure why there are some greyish pixels around the right and bottom borders; it's not a compression artifact, as I constructed these DCT arrays programmatically. Anyway. Point is, your lover's smile, your sunny days, your raw urban graffiti, your child's first steps, all of these are reified in your photos as a sum of cosine coefficients.

    The odd thing is that what is reified into your pictures isn't actually all of the coefficients there are! Firstly, because the coefficients are rounded to integers. Mathematically, the FDCT is a lossless operation, but in the context of JPEG it is not because the resulting coefficients are rounded. And they're not just rounded to the nearest integer; they are probably quantized further, for example to the nearest multiple of 17 or even 50. (These numbers seem exaggerated, but keep in mind that the range of coefficients is about 8 times the range of the original samples.)

    The choice of what quantization factors to use is a key part of JPEG, and it's subjective: low quantization results in near-indistinguishable images, but in middle compression levels you want to choose factors that trade off subjective perception with file size. A higher quantization factor leads to coefficients with fewer bits of information that can be encoded into less space, but results in a worse image in general.
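
    In code, quantization is the humblest step of all -- a divide and a round -- but it is where the loss happens. A quick illustrative sketch (Python, not from the library described here):

    import numpy as np

    def quantize(coeffs, qtable):
        # Divide each DCT coefficient by its quantization factor and round
        # to the nearest integer; information lost here never comes back.
        return np.rint(np.asarray(coeffs) / np.asarray(qtable)).astype(int)

    def dequantize(qcoeffs, qtable):
        # The decoder simply multiplies back, recovering an approximation.
        return np.asarray(qcoeffs) * np.asarray(qtable)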

    JPEG proposes a standard quantization matrix, with one number for each frequency (coefficient). Here it is for luma:

    (define *standard-luma-q-table*
      #(16 11 10 16 24 40 51 61
        12 12 14 19 26 58 60 55
        14 13 16 24 40 57 69 56
        14 17 22 29 51 87 80 62
        18 22 37 56 68 109 103 77
        24 35 55 64 81 104 113 92
        49 64 78 87 103 121 120 101
        72 92 95 98 112 100 103 99))
    

    This matrix is used for "quality 50" when you encode an 8-bit-per-sample JPEG. You can see that lower frequencies (the upper-left part) are quantized less harshly, and vice versa for higher frequencies (the bottom right).

    (define *standard-chroma-q-table*
      #(17 18 24 47 99 99 99 99
        18 21 26 66 99 99 99 99
        24 26 56 99 99 99 99 99
        47 66 99 99 99 99 99 99
        99 99 99 99 99 99 99 99
        99 99 99 99 99 99 99 99
        99 99 99 99 99 99 99 99
        99 99 99 99 99 99 99 99))
    

    For chroma (CB and CR) we see that quantization is much more harsh in general. So not only will we sub-sample color, we will also throw away more high-frequency color variation. It's interesting to think about, but also makes sense in some way; again in photography class we did an exercise where we shaded our prints with colored pencils, and the results were remarkable. My poor, lazy coloring skills somehow rendered leaves lifelike in different hues of green; really though, they were shades of grey, colored in imprecisely. "Colored TV" indeed.

    With this knowledge under our chapeaux, we can now say what the "JPEG quality" setting actually is: it's simply that pair of standard quantization matrices scaled up or down. Towards "quality 100", the matrix approaches all-ones, for no quantization, and thus minimal loss (though you still have some rounding, often subsampling as well, and RGB-to-Y'CBCR gamut loss). Towards "quality 0" they scale to a matrix full of large values, for harsh quantization.
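
    If I understand the convention correctly, the scaling used by IJG-style encoders looks more or less like the following Python sketch (an illustration of the usual formula, not a quote from any particular library):

    import numpy as np

    def scale_qtable(base, quality):
        # quality 50 returns the base table; higher quality scales the
        # factors down (less loss), lower quality scales them up.
        quality = max(1, min(100, quality))
        scale = 5000 // quality if quality < 50 else 200 - 2 * quality
        scaled = (np.asarray(base) * scale + 50) // 100
        return np.clip(scaled, 1, 255)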

    This understanding also explains those wavey JPEG artifacts you get on low-quality images. Those artifacts look like waves because they are waves. They usually occur at sharp intensity transitions, which like a cymbal crash cause lots of high frequencies that then get harshly quantized. Incidentally I suspect (but don't know) that this is the same reason that cymbals often sound bad in poorly-encoded MP3s, because of harsh quantization in the frequency domain.

    Finally, the coefficients are written out to a file as a stream of bits. Each file gets a huffman code allocated to it, which ideally is built from the distribution of quantized coefficient sizes seen in all of the blocks of an image. There are usually different encodings for luma and chroma, to reflect their different quantizations. Reading and writing this bitstream is a bit of a headache but the algorithm is specified in the JPEG standard, and all you have to do is implement it. Notably, though, there is special support for encoding a run of zero-valued coefficients, which happens often after quantization. There are rarely wavey bits in a blue blue sky.

    on transforms

    It's terribly common for photos to be wrongly oriented. Unfortunately, the way that many editors fix photo rotation is by setting a bit in the EXIF information of the JPEG. This is ineffectual, as web browsers don't look in the EXIF information, and silly, because it turns out you can losslessly rotate most JPEG images anyway.

    Consider that the body of a JPEG is an array of macroblocks. To rotate an image, you just have to rearrange those macroblocks, then rearrange the blocks inside the macroblocks (e.g. swap the two Y' blocks in my above example), then transform the blocks themselves.

    The lossless transformations that you can do on a block are transposition, vertical flipping, and horizontal flipping.

    Transposition flips a block along its downward-sloping diagonal. To do so, you just swap the coefficients at (u, v) with the coefficients at (v, u). Easy peasey.

    Flipping is trickier. Consider the enlarged DCT image from above. What would it take to horizontally flip the function at (0, 1)? Instead of going from light to dark, you want it to go from dark to light. Simple: you just negate the coefficients! But you only want to negate those coefficients that are "odd" in the X direction, which are those coefficients whose column is odd. And actually that's all there is to it. Flipping vertically is the same, but for coefficients whose row is odd.

    I said "most images" above because those whose size is not evenly divided by the macroblock size can't be losslessly rotated -- you will end up seeing some of the hidden data that falls off the edge of the canvas. Oh well. Most raw images are properly dimensioned, and if you're downscaling, you already have to re-encode anyway.

    But that's just flipping and transposition, you say! What about rotation? Well it turns out that you can express rotation in terms of these operations: rotating 90 degrees clockwise is just a transpose and a horizontal flip (in that order). Together, flipping horizontally, flipping vertically, and transposing form a group, in the same way that flipping and flopping form a group for mattresses. Yeah!
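
    Put together, the whole lossless-transform toolbox fits in a few lines. A sketch in Python (again illustrative; the real implementation lives in the author's Scheme library), operating on one 8x8 coefficient block indexed as [v][u]:

    import numpy as np

    def transpose_block(c):
        # Swap the coefficient at (u, v) with the one at (v, u).
        return np.asarray(c).T.copy()

    def hflip_block(c):
        # Negate the coefficients whose column (horizontal frequency) is odd.
        out = np.asarray(c).copy()
        out[:, 1::2] *= -1
        return out

    def vflip_block(c):
        # Negate the coefficients whose row (vertical frequency) is odd.
        out = np.asarray(c).copy()
        out[1::2, :] *= -1
        return out

    def rotate90_block(c):
        # 90 degrees clockwise: transpose, then flip horizontally.
        return hflip_block(transpose_block(c))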

    on scheme

    I wrote this library in Scheme because that's my language of choice these days. I didn't run into any serious impedance mismatches; Guile has a generic multi-dimensional array facility that made it possible to express many of these operations as generic folds, unfolds, or maps over arrays. The huffman coding part was a bit irritating, but all in all things were pretty good. The speed is pretty bad, but I haven't optimized it at all, and it gives me a nice test case for the compiler. Anyway, it's been fun and it suits my needs. Check out the project page if you're interested. Yes, to shave a yak you have to get a bit bovine and smelly, but yaks live in awesome places!

    Finally I will leave you with a glitch, one of many that I have produced over the last couple weeks. Comments and corrections welcome below. Happy hacking!

    by Andy Wingo at November 14, 2014 04:49 PM

    Jacobo Aragunde

    LibreOffice workshop at A Coruña University

    Last week I went back to the University of A Coruña, this time to stand at the opposite side of the classroom and conduct a workshop about LibreOffice.

    I was invited by Juan José Sánchez Penas as part of the subject System Information Design (Deseño de Sistemas de Información), which belongs to the Master in Computer Science Engineering (Mestrado de Enxeñería Informática). The goal was to introduce students to a real-world project and show how the techniques they learn are applied in practice.

    The table of contents:

    1. An introduction to LibreOffice project: its long history since the 80s, current status and the awesome community that powers it.
    2. A high-level overview of the project architecture to present its main design philosophy.
    3. Accessibility in LibreOffice: how it is designed and current status. You will probably find this chapter familiar.
    4. Quality assurance techniques and tools: what the community does to assure the quality of our releases.

    Find below the slides I prepared for the workshop, with versions both in Galician and English. Files are hybrid PDFs to make them easy to modify and reuse; feel free to do it under the terms of the CC-BY-SA license.

    by Jacobo Aragunde Pérez at November 14, 2014 12:37 PM

    Andy Wingo

    generators in firefox now twenty-two times faster

    It's with great pleasure that I can announce that, thanks to Mozilla's Jan de Mooij, the new ES6 generator functions are twenty-two times faster in Firefox!

    Some back-story, for the unawares. There's a new version of JavaScript coming, ECMAScript 6 (ES6). Among the new features that ES6 brings are generator functions: functions that can suspend. Firefox's JavaScript engine, SpiderMonkey, has had support for generators for many years, long before other engines. This support was upgraded to the new ES6 standard last year, thanks to sponsorship from Bloomberg, and was shipped out to users in Firefox 26.

    The generators implementation in Firefox 26 was quite basic. As you probably know, modern JavaScript implementations have a number of tiered engines. In the case of SpiderMonkey there are three tiers: the interpreter, the baseline compiler, and the optimizing compiler. Code begins execution in the interpreter, which is the quickest engine to start. If a piece of code is hot -- meaning that lots of time is being spent there -- then it will "tier up" to the next level, where it is analyzed, possibly optimized, and then compiled to machine code.

    Unfortunately, generators in SpiderMonkey have always been stuck at the lowest tier, the interpreter. This is because of SpiderMonkey's choice of implementation strategy for generators. Generators were implemented as "floating interpreter stack frames": heap-allocated objects whose shape was exactly the same as a stack frame in the interpreter. This had the advantage of being fairly cheap to implement in the beginning, but ultimately it made them unable to take advantage of JIT compilation, as JIT code runs on its own stack which has a different layout. The previous strategy also relied on trampolining through a helper written in C++ to resume generators, which killed optimization opportunities.

    The solution was to represent suspended generator objects as snapshots of the state of a stack frame, instead of as stack frames themselves. In order for this to be efficient, last year we did a number of block scope optimizations to try and reduce the amount of state that a generator frame would have to restore. Finally, around March of this year we were at the point where we could refactor the interpreter to implement generators on the normal interpreter stack, with normal interpreter bytecodes, with the vision of being able to JIT-compile those bytecodes.

    I ran out of time before I could land that patchset; although the patches were where we wanted to go, they actually caused generators to be even slower and so they languished in Bugzilla for a few more months. Sad monkey. It was with delight, then, that a month or so ago I saw that SpiderMonkey JIT maintainer Jan de Mooij was interested in picking up the patches. Since then he has been hacking off and on at getting my old patches into shape, and ended up applying them all.

    He went further, optimizing stack frames to not reserve space for "aliased" locals (locals allocated on the scope chain), speeding up object literal creation in the baseline compiler, and finally implementing baseline JIT compilation for generators.

    So, after all of that perf nargery, what's the upshot? Twenty-two times faster! In this microbenchmark:

    function *g(n) {
        for (var i=0; i<n; i++)
            yield i;
    }
    function f() {
        var t = new Date();
        var it = g(1000000);
        for (var i=0; i<1000000; i++)
    	it.next();
        print(new Date() - t);
    }
    f();
    

    Before, it took SpiderMonkey 980 milliseconds to complete on Jan's machine. After? Only 43! It's actually marginally faster than V8 at this point, which has (temporarily, I think) regressed to 45 milliseconds on this test. Anyway. Competition is great and as a committer to both projects I find it very satisfactory to have good implementations on both sides.

    As in V8, in SpiderMonkey generators cannot yet reach the highest tier of optimization. I'm somewhat skeptical that it's necessary, too, as you expect generators to suspend fairly frequently. That said, a yield point in a generator is, from the perspective of the optimizing compiler, not much different from a call site, in that it causes all locals to be saved. The difference is that locals may have unboxed representations, so we would have to box those values when saving the generator state, and unbox on restore.

    Thanks to Bloomberg for funding the initial work, and big, big thanks to Mozilla's Jan de Mooij for picking up where we left off. Happy hacking with generators!

    by Andy Wingo at November 14, 2014 08:41 AM

    November 13, 2014

    Sergio Villar

    BlinkOn 3

    Last week I attended BlinkOn3 held at Google’s Mountain View office. Not only that but I also had the pleasure of giving a speech about what has been taking most of my time lately, the CSS Grid Layout implementation.

    Although several talks had already been scheduled for weeks, the conference itself is very dynamic in the sense that new talks were added as people proposed new topics to discuss.

    I found the talks about the State of Blink and the upcoming changes related to Slimming Paint quite interesting. Exciting times ahead!

    My talk was about the CSS Grid Layout work Igalia has been carrying out for several months now (also thanks to Bloomberg sponsorship). The talk was very well received (we were not a lot of people, mainly because my talk was rescheduled twice), and people in general are quite excited about the new opportunities the spec will bring to web authors.

    The slides are here (3.3MB PDF). They look a bit blurry; that’s because their original format was Google’s HTML5 io-2012-slides template, which allowed me to do a lot of live demos.

    I also had the opportunity to talk to Julien Chaffraix about the next steps. We both are quite confident about the status of the implementation, so we decided to eventually send the “Intent to ship” at some point during Q4. Very good news for web authors! The most important things to address before shipping are the handling of absolutely positioned items and a couple of issues related to grids with indefinite remaining space.

    by svillar at November 13, 2014 11:29 AM

    November 11, 2014

    Samuel Iglesias

    Piglit, an open-source test suite for OpenGL implementations

    OpenGL is an API for rendering 2D and 3D vector graphics, now managed by the non-profit technology consortium Khronos Group. It is a multi-platform API found in devices of different form factors (from desktop computers to embedded devices) and operating systems (GNU/Linux, Microsoft Windows, Mac OS X, etc).

    As Khronos only defines the OpenGL API, implementors are free to write the OpenGL implementation they wish. For example, when talking about GNU/Linux systems, NVIDIA provides its own proprietary libraries while other manufacturers like Intel are using Mesa, one of the most popular open source OpenGL implementations.

    Because of this implementation freedom, we need a way to check that they follow the OpenGL specifications. Khronos provides its own OpenGL conformance test suite, but your company needs to become a Khronos Adopter member to have access to it. However, there is an unofficial open source alternative: piglit.

    Piglit

    Piglit is an open-source OpenGL implementation conformance test suite created by Nicolai Hähnle in 2007. Since then, it has increased the number of tests covering different OpenGL versions and extensions: today a complete piglit run executes more than 35,000 tests.

    Piglit is one of the tools widely used in Mesa to check that commits adding new functionality or modifying the source code don’t break OpenGL conformance. If you are thinking of contributing to Mesa, this is definitely one of the tools you want to master.

    How to compile piglit

    Before compiling piglit, you need to have the following dependencies installed on your system. Some of them are available in modern GNU/Linux distributions (such as Python, numpy, make…), while others (such as waffle) you might need to compile yourself.

    • Python 2.7.x
    • Python mako module
    • numpy
    • cmake
    • GL, glu and glut libraries and development packages (i.e. headers)
    • X11 libraries and development packages (i.e. headers)
    • waffle

    But waffle is not available in Debian/Ubuntu repositories, so you need to compile it manually and, optionally, install it in the system:

    $ git clone git://github.com/waffle-gl/waffle
    $ cmake . -Dwaffle_has_glx=1
    $ make
    $ sudo make install

    Piglit is a project hosted on Freedesktop. To download it, you need to have git installed in your system; then run the corresponding git clone command:

    $ git clone git://anongit.freedesktop.org/git/piglit

    Once it finishes cloning the repository, just compile it:

    $ cmake .
    $ make

    More info in the documentation.

    As a result, all the test binaries are inside the bin/ directory and it’s possible to run them standalone… however, there are scripts to run all of them in one go.

    Your first piglit run

    After you have downloaded the piglit source code from its git repository and compiled it, you are ready to run the testing suite.

    First of all, make sure that everything is correctly setup:

    $ ./piglit run tests/sanity.tests results/sanity.results

    The results will be inside results/sanity.results directory. There is a way to process those results and show them in a human readable output but I will talk about it in the next point.

    If it fails, most likely it is because libwaffle is not found in the path. If everything went fine, you can execute the piglit test suite against your graphics driver.

    $ ./piglit run tests/all results/all-reference

    Remember that it’s going to take a while to finish, so grab a cup of coffee and enjoy it.

    Analyze piglit output

    Piglit provides several tools to convert the JSON-format results into a more readable output: the CLI output tool (piglit-summary.py) and the HTML output tool (piglit-summary-html.py). I’m going to explain the latter first because its output is very easy to understand when you are starting to use this test suite.

    You can run these scripts standalone, but the piglit binary calls each of them depending on its arguments. I am going to use this binary in all the examples because it’s just one command to remember.

    HTML output

    In order to create an HTML output of a previously saved run, the following command is what you need:

    $ ./piglit summary html --overwrite summary/summary-all results/all-reference

    • You can append more results at the end if you would like to compare them. The first one is the reference for the others, like when counting the number of regressions.

    $ ./piglit summary html --overwrite summary/summary-all results/all-master results/all-test

    • The overwrite argument overwrites the summary destination directory contents if they already exist.

    Finally open the HTML summary web page in a browser:

    $ firefox summary/summary-all/index.html

    Each test has a background color depending on the result: red (failed), green (passed), orange (warning), grey (skipped) or black (crashed). If you click on its respective link in the right column, you will see the output of that test and how to run it standalone.

    There are more pages:

    • skipped.html: it lists all the skipped tests.
    • fixes.html: it lists all the tests that previously failed but now pass.
    • problems.html: it lists all the failing tests.
    • disabled.html: it lists tests that were executed before but are now skipped.
    • changes.html: when you compare two or more different piglit runs, this page shows all the changes comparing the new results with the reference (the first results/ argument):
      • Tests previously skipped and now executed (although the result could be fail, crash or pass).
      • It also includes all of the regressions.html data.
      • Any other change in the tests compared with the reference: crashed tests, tests that now pass but previously failed or were skipped, etc.
    • regressions.html: when you compare two or more different piglit runs, this page shows the number of previously passed tests that now fail.
    • enabled.html: it lists all the executed tests.

    I recommend exploring which pages are available and what kind of information each one provides. There are more pages, like info, which is in the first row of each results column at the right-most part of the screen; it gathers all the information about hardware, drivers, supported OpenGL version, etc.

    Test details

    As I said before, you can see what kind of error output (if any) a test has written, the time spent in its execution and which arguments were given to the binary.

    (screenshot: piglit test details)

    There is also a dmesg field which shows the kernel errors that appeared during each test execution. If these errors are graphics driver related, you can easily detect which test was guilty. To enable this output, you need to add the --dmesg argument to piglit run, but I will explain this and other parameters in the next post.

    Text output

    The usage of the CLI tool is very similar to the HTML one, except that its output appears in the terminal.

    $ ./piglit summary console results/all-reference

    As its output is not saved to any file, there is no argument to save it in a directory and no overwrite argument either.

    Like with the HTML output tool, you can append several result files to compare them. The tool will output one line per test together with its result (pass, fail, crash, skip) and a summary with all the stats at the end.

    As it prints the output in the console, you can take advantage of tools like grep to look for specific combinations of results:

    $ ./piglit summary console results/all-reference | grep fail

    This is an example of an output of this command:

    $ ./piglit summary console results/all-reference
    [...]
    spec/glsl-1.50/compiler/interface-block-name-uses-gl-prefix.vert: pass
    spec/EXT_framebuffer_object/fbo-clear-formats/GL_ALPHA16 (fbo incomplete): skip
    spec/ARB_copy_image/arb_copy_image-targets GL_TEXTURE_CUBE_MAP_ARRAY 32 32 18 GL_TEXTURE_2D_ARRAY 32 16 15 11 12 5 5 1 2 14 15 9: pass
    spec/glsl-1.30/execution/built-in-functions/fs-op-bitxor-uvec3-uint: pass
    spec/ARB_depth_texture/depthstencil-render-miplevels 146 d=z16: pass
    spec/glsl-1.10/execution/variable-indexing/fs-varying-mat2-col-row-rd: pass
    summary:
           pass: 25085
           fail: 262
          crash: 5
           skip: 9746
        timeout: 0
           warn: 13
     dmesg-warn: 0
     dmesg-fail: 0
          total: 35111

    And this is the output when you compare two different piglit results:

    $ ./piglit summary console results/all-reference results/all-test
    [...]
    spec/glsl-1.50/compiler/interface-block-name-uses-gl-prefix.vert: pass pass
    spec/glsl-1.30/execution/built-in-functions/fs-op-bitxor-uvec3-uint: pass pass
    summary:
           pass: 25023
           fail: 548
          crash: 7
           skip: 8264
        timeout: 0
           warn: 15
     dmesg-warn: 0
     dmesg-fail: 0
        changes: 376
          fixes: 2
    regressions: 2
          total: 33857

    Output for Jenkins-CI

    There is another script (piglit-summary-junit.py) that produces results in a format that Jenkins-CI understands, which is very useful when you have this continuous integration suite running somewhere. As I have not played with it yet, I leave it as an exercise for readers.

    Conclusions

    Piglit is an open-source conformance test suite for OpenGL implementations, widely used in projects like Mesa.

    In this post I explained how to compile piglit, run it and convert the result files to a readable output. This is very useful when you are testing your latest Mesa patch before submitting it to the development mailing list, or when you are looking for regressions in the latest stable version of your graphics device driver.

    The next post will cover how to run specific tests in piglit and explain some arguments that are very useful for specific cases. Stay tuned!

    by Samuel Iglesias at November 11, 2014 04:05 PM

    Iago Toral

    A brief overview of the 3D pipeline

    Recap

    In the previous post I discussed the Mesa development environment and gave a few tips for newcomers, but before we start hacking on the code we should have a look at what modern GPUs look like, since that has a definite impact on the design and implementation of driver code. Let's get to it.

    Fixed Function vs Programmable hardware

    Before the advent of shading languages like GLSL we did not have the option to program the 3D hardware at will. Instead, the hardware would have specific units dedicated to implement certain operations (like vertex transformations) that could only be used through specific APIs, like those exposed by OpenGL. These units are usually labeled as Fixed Function, to differentiate them from modern GPUs that also expose fully programmable units.

    What we have now in modern GPUs is a fully programmable pipeline, where graphics developers can code graphics algorithms of various sorts in high level programming languages like GLSL. These programs are then compiled and loaded into the GPU to execute specific tasks. This gives graphics developers a huge amount of freedom and power, since they are no longer limited to preset APIs exposing fixed functionality (like the old OpenGL lighting models, for example).

    Modern graphics drivers

    But of course all this flexibility and power that graphics developers enjoy today come at the expense of significantly more complex hardware and drivers, since the drivers are responsible for exposing all that flexibility to the developers while ensuring that we still obtain the best performance out of the hardware in each scenario.

    Rather than acting as a bridge between a fixed API like OpenGL and fixed function hardware, drivers also need to handle general purpose graphics programs written in high-level languages. This is a big change. In the case of OpenGL, this means that the driver needs to provide an implementation of the GLSL language, so suddenly the driver is required to incorporate a full compiler and deal with all sorts of problems that belong to the realm of compilers, like choosing an intermediate representation (IR) for the program code, performing optimization passes and generating native code for the GPU.

    Overview of a modern 3D pipeline

    I have mentioned that modern GPUs expose fully programmable hardware units. These are called shading units, and the idea is that these units are connected in a pipeline so that the output of a shading unit becomes the input of the next. In this model, the application developer pushes vertices to one end of the pipeline and usually obtains rendered pixels on the other side. In between these two ends there are a number of units making this transition possible and a number of these will be programmable, which means that the graphics developer can control how these vertices are transformed into pixels at different stages.

    The image below shows a simplified example of a 3D graphics pipeline, in this case as exposed by the OpenGL 4.3 specification. Let’s have a quick look at some of its main parts:


    The OpenGL 4.3 3D pipeline (image via www.brightsideofnews.com)

    Vertex Shader (VS)

    This programmable shading unit takes vertices as input and produces vertices as output. Its main job is to transform these vertices in any way the graphics developer sees fit. Typically, this is where we would do transforms like vertex projection, rotation and translation and, generally, compute per-vertex attributes that we want to provide to later stages in the pipeline.

    The vertex shader processes vertex data as provided by APIs like glDrawArrays or glDrawElements and outputs shaded vertices that will be assembled into primitives as indicated by the OpenGL draw command (GL_TRIANGLES, GL_LINES, etc).

    Geometry Shader

    Geometry shaders are similar to vertex shaders, but instead of operating on individual vertices, they operate at the geometry level (that is, a line, a triangle, etc), taking the output of the vertex shader as their input.

    The geometry shader unit is programmable and can be used to add or remove vertices from a primitive, clip primitives, spawn entirely new primitives or modify the geometry of a primitive (like transforming triangles into quads or points into triangles, etc). Geometry shaders can also be used to implement basic tessellation even if dedicated tessellation units present in modern hardware are a better fit for this job.

    In GLSL, some operations like layered rendering (which allows rendering to multiple textures in the same program) are only accessible through geometry shaders, although this is now also possible in vertex shaders via a particular extension.

    The output of a geometry shader is also primitives.

    Rasterization

    So far all the stages we discussed manipulated vertices and geometry. At some point, however, we need to render pixels. For this, primitives need to be rasterized, which is the process by which they are broken into individual fragments that are then colored by a fragment shader and eventually turned into pixels in a frame buffer. Rasterization is handled by the rasterizer, a fixed function unit.

    The rasterization process also assigns depth information to these fragments. This information is necessary when we have a 3D scene where multiple polygons overlap on the screen and we need to decide which polygon’s fragments should be rendered and which should be discarded because they are hidden by other polygons.

    Finally, the rasterization also interpolates per-vertex attributes in order to compute the corresponding fragment values. For example, let’s say that we have a line primitive where each vertex has a different color attribute, one red and one green. For each fragment in the line the rasterizer will compute interpolated color values by combining red and green depending on how close or far the fragments are to each vertex. With this, we will obtain red fragments on the side of the red vertex that will smoothly transition to green as we move closer to the green vertex.
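
    Expressed as a formula, this is just linear interpolation along the primitive; a minimal sketch, where t is a hypothetical normalized distance from the red vertex to the fragment:

    C(t) = (1 - t) \cdot C_{red} + t \cdot C_{green}, \quad t \in [0, 1]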

    In summary, the input of the rasterizer is the set of primitives coming from a vertex, tessellation or geometry shader, and its output is the set of fragments that build the primitive's surface as projected on the screen, including color, depth and other interpolated per-vertex attributes.

    Fragment Shader (FS)

    The programmable fragment shader unit takes the fragments produced by the rasterization process and executes an algorithm provided by a graphics developer to compute the final color, depth and stencil values for each fragment. This unit can be used to achieve numerous visual effects, including all kinds of post-processing filters; it is also usually where we sample textures to color polygon surfaces, etc.

    This covers some of the most important elements in the 3D graphics pipeline and should be sufficient, for now, to understand some of the basics of a driver. Notice, however, that I have not covered things like transform feedback, tessellation or compute shaders. I hope I can get to cover some of these in future posts.

    But before we are done with the overview of the 3D pipeline we should cover another topic that is fundamental to how the hardware works: parallelization.

    Parallelization

    Graphics processing is a very resource demanding task. We are continuously updating and redrawing our graphics 30/60 times per second. For a full HD resolution of 1920×1080 that means that we need to redraw over 2 million pixels in each go (124,416,000 pixels per second if we are doing 60 FPS). That's a lot.

    To cope with this the architecture of GPUs is massively parallel, which means that the pipeline can process many vertices/pixels simultaneously. For example, in the case of the Intel Haswell GPUs, programmable units like the VS and GS have multiple Execution Units (EU), each with their own set of ALUs, etc that can spawn up to 70 threads each (for GS and VS) while the fragment shader can spawn up to 102 threads. But that is not the only source of parallelism: each thread may handle multiple objects (vertices or pixels depending on the case) at the same time. For example, a VS thread in Intel hardware can shade two vertices simultaneously, while a FS thread can shade up to 8 (SIMD8) or 16 (SIMD16) pixels in one go.

    Some of these means of parallelism are relatively transparent to the driver developer and some are not. For example, SIMD8 vs SIMD16 or single vertex shading vs double vertex shading requires specific configuration and writing driver code that is aligned with the selected configuration. Threads are more transparent, but in certain situations the driver developer may need to be careful when writing code that can require a sync between all running threads, which would obviously hurt performance, or at least be careful to do that kind of thing when it would hurt performance the least.

    Coming up next

    So that was a very brief introduction to what modern 3D pipelines look like. There is still plenty of stuff I have not covered, but I think we can go through a lot of that in later posts as we dig deeper into the driver code. My next post will discuss how Mesa models several of the programmable pipeline stages I have introduced here, so stay tuned!

    by Iago Toral at November 11, 2014 12:13 PM

    November 09, 2014

    Andy Wingo

    ffconf 2014

    Last week I had the great privilege of speaking at ffconf in Brighton, UK. It was lovely. The town put on a full demonstration of its range of November weather patterns, from blue skies to driving rain to hail (!) to sea-spray to drizzle and back again. Good times.

    The conference itself was quite pleasant as well, and from the speaker perspective it was amazing. A million thanks to Remy and Julie for making it such a pleasure. ffconf is mostly a front-end development conference, so it's not directly related with the practice of my work on JS implementations -- perhaps you're unaware, but there aren't so many browser implementors that actually develop content for their browsers, and indeed fewer JS implementors that actually write JS. Me, I sling C++ all day long and the most JavaScript I write is for tests. When in the weeds, sometimes we forget we're building an amazing runtime and that people do inspiring things with it, so it's nice to check in with front-end folks at their conferences to see what people are excited about.

    My talk was about the part of JavaScript implementations that are written in JavaScript itself. This is an area that isn't so well known, and it has its amusing quirks. I think it can be interesting to a curious JS hacker who wants to spelunk down a bit to see what's going on in their browsers. Intrepid explorers might even find opportunities to contribute patches. Anyway, nerdy stuff, but that's basically how I roll.

    The slides are here: without images (350kB PDF) or with images (3MB PDF).

    I haven't been to the UK in years, and being in a foreign country where everyone speaks my native language was quite refreshing. At the same time there was an awkward incident in which I was reminded that though close, American and English just aren't the same. I made this silly joke that if you get a polyfill into a JS implementation, then shucks, you have a "gollyfill", 'cause golly it made it in! In the US I think "golly" is just one of those milquetoast profanities, "golly" instead of "god" like saying "shucks" instead of "shit". Well in the UK that's a thing too I think, but there is also another less fortunate connotation, in which "golly" as a noun can be a racial slur. Check the Wikipedia if you're as ignorant as I was. I think everyone present understood that wasn't my intention, but if that is not the case I apologize. With slides though it's less clear, so I've gone ahead and removed the joke from the slides. It's probably not a ball to take and run with.

    However I do have a ball that you can run with though! And actually, this was another terrible joke that wasn't bad enough to inflict upon my audience, but that now chance fate gives me the opportunity to use. So never fear, there are still bad puns in the slides. But, you'll have to click through to the PDF for optimal groaning.

    Happy hacking, JavaScripters, and until next time.

    by Andy Wingo at November 09, 2014 05:36 PM

    November 04, 2014

    Sergio Villar

    I’m attending BlinkOn3

    Today I’m giving a speech at BlinkOn3, the Blink contributors’ conference held in Google’s Mountain View Office. Check the agenda for further details.

    The plan is to give an overview of the feature, present the most recent additions/improvements and also talk about the roadmap. My session is scheduled for 3:30PM in the Arctic Ocean room. See you there!

    UPDATE: we had many issues trying to set up the hangout, so in the end we decided to move the session to Wednesday morning. It's currently scheduled for 1:15PM, just after lunch.

    by svillar at November 04, 2014 03:35 PM

    October 31, 2014

    Katerina Barone-Adesi

    A tale of sunk allocation

    Pflua is fast; Igalia wants it to be faster. When I joined Igalia, one of the big open questions was why some very similar workloads had extremely different speeds; matching a packet dump against a matching or non-matching host IP address could make the speed vary by more than a factor of 3! There was already a belief that allocation/trace/JIT interactions were a factor in this huge performance discrepancy, and it needed more investigation.

    As a first step, I tracked down and removed some questionable allocations, which showed up in LuaJIT's trace dumps, but were not obvious from the source code. Removing unnecessary allocations from the critical path generally speeds up code, and LuaJIT comes with an option to dump its bytecode, internal representation, and the resulting assembly code, which gave some useful clues about where to start.

    What were the results of removing unnecessary allocations? They were mixed, but they included double-digit performance improvements for some filters on some pcap dumps. "Portrange 0-6000" got 33% faster when benchmarked on the 1gb dump, and 96% faster on wingolog, nearly doubling in speed. These results were achieved by sinking all of the CNEWI allocations within the filter function of pflua-match - the performance gain was due to a small set of changes to one function. So, what was that change, and how was it made?

    The first step was to see the traces generated by running pflua-match.

    luajit -jdump=+rs ./pflua-match ~/path/wingolog.org.pcap tcp > trace1

    The thing to look for is allocations that show up in LuaJIT's SSA IR. They are documented at http://wiki.luajit.org/SSA-IR-2.0#Allocations. Specifically, the important ones are allocations that are not "sunk" - that is, that have not already been optimized away (http://wiki.luajit.org/Allocation-Sinking-Optimization has more details about the allocations LuaJIT can sink). In practice, for the critical path of pflua-match, these were all CNEWI.

    What is CNEWI? The allocation documentation tersely says "Allocate immutable cdata" and that CNEWI "is only used for immutable scalar cdata types. It combines an allocation with an initialization." What is cdata? The FFI documentation defines cdata as "a C data object. It holds a value of the corresponding ctype." So, all of the unsunk allocations have already been tracked down to something immutable related to the FFI at this point.

    Pflua compiles 'pflang' (the nameless filtering language of libpcap) to Lua; tools like pflua-match run the generated Lua code, and LuaJIT compiles hot traces in both pflua-match itself and the generated Lua. Here's the start of what the generated Lua code for the "tcp" filter above looks like:

    function match(P,length)
       if not (length >= 34) then return false end
       do
          local v1 = ffi.cast("uint16_t*", P+12)[0]
          if not (v1 == 8) then goto L3 end
    

    Here is an illustrative excerpt from an IR trace of pflua-match:

    0019 p64 ADD 0018 +12
    0020 {sink} cdt CNEWI +170 0019
    0021 > fun EQ 0014 ffi.cast
    0022 {sink} cdt CNEWI +173 0019
    0023 rbp u16 XLOAD 0019
    .... SNAP #3 [ ---- ---- ---- 0023 ]
    0024 > int EQ 0023 +8
    

    A few things stand out. There are two CNEWI calls, and they are both sunk. Also, the constant 12 on the first line looks awfully familiar, as does the 8 on the last line: this is the IR representation of the generated code! The 0019 at the end of lines refers back to the line starting with 0019 earlier in the trace, rbp is a register, the +170 and +173 are an internal representation of C types, etc. For more details, read about the IR on LuaJIT's wiki.

    This trace is not directly useful for optimization, because the CNEWI calls are already sunk (this is shown by the {sink}, rather than a register, preceding them), but it illustrates the principles and tools involved. It also shows that raw data access can be done with sunk allocations, even when it involves the FFI.

    Here is a somewhat more complex trace, showing first the LuaJIT bytecode, and then the LuaJIT SSA IR.

    ---- TRACE 21 start 20/8 "tcp":2
    0004 . KPRI 2 1
    0005 . RET1 2 2
    0022 ISF 8
    0023 JMP 9 => 0025
    0025 ADDVN 3 3 0 ; 1
    0026 MOV 0 7
    0027 JMP 5 => 0003
    0003 ISGE 0 1
    0004 JMP 5 => 0028
    0000 . . FUNCC ; ffi.meta.__lt
    0005 JLOOP 5 20
    ---- TRACE 21 IR
    0001 [8] num SLOAD #4 PI
    0002 [10] num SLOAD #5 PI
    0003 r14 p64 PVAL #21
    0004 r13 p64 PVAL #59
    0005 rbp p64 PVAL #64
    0006 r15 + cdt CNEWI +170 0003
    0007 {sink} cdt CNEWI +172 0003
    0008 {sink} cdt CNEWI +170 0004
    0009 [18] + cdt CNEWI +170 0005
    

    There are 4 CNEWI calls; only two are sunk. The other two are ripe for optimization. The bytecode includes JLOOP, which turns out to be a big clue.

    Here is the original implementation of pflua-match's filter harness:

    local function filter(ptr, ptr_end, pred)
       local seen, matched = 0, 0
       while ptr < ptr_end do
          local record = ffi.cast("struct pcap_record *", ptr)
          local packet = ffi.cast("unsigned char *", record + 1)
          local ptr_next = packet + record.incl_len
          if pred(packet, record.incl_len) then
             matched = matched + 1
          end
          seen = seen + 1
          ptr = ptr_next
       end
       return seen, matched
    end
    

    It contains exactly one loop. There are fewer and fewer places where the unsunk CNEWI allocations could be hiding. The SSA IR documentation says that PVAL is a "parent value reference", which means it comes from the parent trace, and "TRACE 21 start 20/8" says that the parent trace of trace 21 is trace 20. Once again, constants provide a hint to how the code lines up: the 16 is from local packet = ffi.cast("unsigned char *", record + 1), as 16 is the size of pcap_record.

    0015 rdi p64 ADD 0013 +16
    0017 {sink} cdt CNEWI +170 0015
    0018 p64 ADD 0013 +8
    0019 rbp u32 XLOAD 0018
    0020 xmm2 num CONV 0019 num.u32
    0021 rbp + p64 ADD 0019 0015
    

    This is further confirmation that 'filter' is the function to be looking at. The reference to a parent trace itself is shown not to be the problem, because both a sunk and an unsunk allocation do it (here are the relevant lines from trace 21):

    0003 r14 p64 PVAL #21
    0006 r15 + cdt CNEWI +170 0003
    0007 {sink} cdt CNEWI +172 0003
    

    LuaJIT does not do classical escape analysis; it is quite a bit more clever, and sinks allocations which classical escape analysis cannot. http://wiki.luajit.org/Allocation-Sinking-Optimization discusses at length what it does and does not do. However, the filter function has an if branch in the middle that suffices to prevent the update to ptr from being sunk. One workaround is to store an offset to the pointer, which is a simple number shared between traces, and confine the pointer arithmetic to within one trace; it turns out that this is sufficient. Here is one rewritten version of the filter function which makes all of the CNEWI calls be sunk:

    local function filter(ptr, max_offset, pred)
       local seen, matched, offset = 0, 0, 0
       local pcap_record_size = ffi.sizeof("struct pcap_record")
       while offset < max_offset do
          local cur_ptr = ptr + offset
          local record = ffi.cast("struct pcap_record *", cur_ptr)
          local packet = ffi.cast("unsigned char *", record + 1)
          offset = offset + pcap_record_size + record.incl_len
          if pred(packet, record.incl_len) then
             matched = matched + 1
          end
          seen = seen + 1
       end
       return seen, matched
    end
    

    How did removing the allocation impact performance? The results turned out to be mixed: peak speed dropped, but so did variance, and some filters became double-digit percentages faster. Here are the results for the 1gb pcap file that Igalia uses in some benchmarks, which show large gains on the "tcp port 5555" and "portrange 0-6000" filters, and a much smaller loss of performance on the "tcp", "ip" and "accept all" filters.

    [Benchmark chart: results for the original file.]

    [Benchmark chart: results for the patched file, without CNEWI in the critical path.]

    Benchmarks on three other pcap dumps can be found at https://github.com/Igalia/pflua/issues/57. More information about the packet captures tested and Igalia's benchmarking of pflua in general is available at Igalia's pflua-bench repository on Github. There is also information about the workloads and the environment of the benchmarks.

    There are several takeaways here. One is that examining the LuaJIT IR is extremely useful for some optimization problems. Another is that LuaJIT is extremely clever; it manages to make using the FFI quite cheap, and performs optimizations that common techniques cannot. The value of benchmarking, and how it can be misleading, are both highly relevant: a change that removed allocation from a tight loop also hurt performance in several cases, apparently by a large amount in some of the benchmarks on other pcap dumps. Lastly, inefficiency can lurk where it is least expected: while ptr < ptr_end looks like extremely innocent code and is just a harness in a small tool script, yet it was the cause of unsunk allocations in the critical path, which in turn were skewing benchmarks. What C programmer would suspect that line was at fault?

    by Katerina Barone-Adesi at October 31, 2014 10:51 PM

    October 28, 2014

    Manuel Rego

    Presenting the Web Engines Hackfest

    After Google's fork back in April 2013, the WebKit and Blink communities have been working independently; however, patches often move from one project to the other. In addition, a fair amount of the source code continues to be similar. Thus, it seems interesting to have a common place to discuss topics of shared interest and make plans for the future.

    For that reason, Igalia is announcing the Web Engines Hackfest, which will take place on December 7-10 in our main office in A Coruña (Spain). It is like a new edition of the WebKitGTK+ Hackfest that we have been organizing since 2009, but this year we are changing the focus in order to make it a more inclusive event.

    The Hackfest

    The new event will include members from all parts of the Web Platform community, not restricted to WebKit and Blink, but open to the different web (Gecko and Servo) and JavaScript (V8, JavaScriptCore and SpiderMonkey) free software engines.

    Past year hackfest picture by Claudio Saavedra

    The idea is to bring together implementors from the Open Web Platform community in a 4-day hacking/discussion/unconference event with the goal of moving different topics forward, discussing common issues, making plans for the future, etc.

    And that's what makes this event something different. It's a completely hacking-oriented event, where developers sit together in small groups to work for a few days pursuing specific goals. These groups of interest are defined at the beginning of the hackfest and agree on the different tasks to be done during the event.
    So don't expect a full schedule with a list of talks and speakers published in advance like in a regular conference; this is something totally different, focused on real hacking.

    Join us

    After the first round of invitations we already have a nice list of people from different companies that will be attending the hackfest this year.
    The good news is that we still have room for a few more people, so if you're interested in coming to the event please contact us.

    Adobe and Igalia are sponsoring the Web Engines Hackfest 2014

    Thanks to Adobe and Igalia for sponsoring the Web Engines Hackfest and making such an exciting event possible.

    Looking forward to meeting you there!

    by Manuel Rego Casasnovas at October 28, 2014 01:29 PM

    October 20, 2014

    Claudio Saavedra

    Mon 2014/Oct/20

    • Together with GNOME 3.14, we have released Web 3.14. Michael Catanzaro, who has been doing an internship at Igalia for the last few months, wrote an excellent blog post describing the features of this new release. Go and read his blog to find out what we've been doing while we wait for his new blog to be syndicated to Planet GNOME.

    • I've started doing two exciting things lately. The first one is Ashtanga yoga. I had been wanting to try yoga for a long time now, as swimming and running have been pretty good for me but at the same time have made my muscles pretty stiff. Yoga seemed like the obvious choice, so after much thought and hesitation I started visiting the local Ashtanga Yoga school. After a month I'm starting to get somewhere (i.e. my toes) and I'm pretty much addicted to it.

      The second thing is that I started playing the keyboards yesterday. I used to toy around with keyboards when I was a kid but I never really learned anything meaningful, so when I saw an ad for a second-hand WK-1200, I couldn't resist and got it. After an evening of practice I already got the feel of Cohen's Samson in New Orleans and the first 16 bars of Verdi's Va, pensiero, but I'm still horribly bad at playing with both hands.

    October 20, 2014 08:11 AM

    October 17, 2014

    Enrique Ocaña

    Hacking on Chromium for Android from Eclipse (part 3)

    In the previous posts we learnt how to code and debug Chromium for Android C++ code from Eclipse. In this post I'm going to explain how to open the ChromeShell Java code, so that you will be able to hack on it like you would in a normal Android app project. Remember, you will need to install the ADT plugin in Eclipse and the full-featured adb which comes with the standalone SDK from the official page. Don't try to reuse the Android SDK in "third_party/android_tools/sdk".

    Creating the Java project in Eclipse

    Follow these instructions to create the Java project for ChromeShell (for instance):

    • File, New, Project…, Android, Android project from existing code
    • Choose "src/chrome/android/shell/java" as project root, because that's where the AndroidManifest.xml is. Don't copy anything to the workspace.
    • The project will have a lot of unmet package dependencies. You have to manually import some jars:
      • Right click on the project, Properties, Java build path, Libraries, Add external Jars…
      • Then browse to “src/out/Debug/lib.java” (assuming a debug build) and import these jars (use CTRL+click for multiple selection in the file chooser):
        • base_java.jar, chrome_java.jar, content_java.jar, dom_distiller_core_java.jar, guava_javalib.jar,
        • jsr_305_javalib.jar, net_java.jar, printing_java.jar, sync_java.jar, ui_java.jar, web_contents_delegate_android.jar
      • If you keep having problems, go to “lib.java”, run this script and find in which jar is the class you’re missing:
    for i in *.jar; do echo "--- $i ---------------------"; unzip -l $i; done | most
    • The generated resources directory “gen” produced by Eclipse is going to lack a lot of stuff.
      • It’s better to make it point to the “right” gen directory used by the native build scripts.
      • Delete the “gen” directory in “src/chrome/android/shell/java” and make a symbolic link:
    ln -s ../../../../out/Debug/chrome_shell_apk/gen .
      • If you ever “clean project” by mistake, delete the chrome_shell_apk/gen directory and regenerate it using the standard ninja build command
    • The same for the “res” directory. From “src/chrome/android/shell/java”, do this (and ignore the errors):
    cp -sr $PWD/../res ./
    cp -sr $PWD/../../java/res ./
    • I haven't been able to solve the problem of integrating all the string definitions. A lot of string-related errors will appear in files under "res". For the moment, just ignore those errors.
    • Remember to use a fresh standalone SDK. Install support for Android 4.4.2. Also, you will probably need to modify the project properties to match the same 4.4.2 version you have support for.

    And that's all. Now you can use all the Java code indexing features of Eclipse. For the moment, you still need to build and install to the device using the command-line recipe, though:

     ninja -C out/Debug chrome_shell_apk
     build/android/adb_install_apk.py --apk ChromeShell.apk --debug

    Debugging Java code

    To debug the Java side of the app running in the device, follow the same approach that you would if you had a normal Java Android app:

    • Launch the ChromeShell app manually from the device.
    • In Eclipse, use the DDMS perspective to locate the org.chromium.chrome.shell process. Select it in the Devices panel and connect the debugger using the "light green bug" icon (not to be confused with the normal debug icon available from the other perspectives).
    • Change to the Debug perspective and set breakpoints as usual.

    Enjoy!


    by eocanha at October 17, 2014 06:00 AM

    October 16, 2014

    Claudio Saavedra

    Thu 2014/Oct/16

    My first memories of Meritähti are from that first weekend, in late August 2008, when I had just arrived in Helsinki to spend what was supposed to be only a couple of months doing GTK+ and Hildon work for Nokia. Lucas, who was still in Finland at the time, had recommended that I check the program for the Night of the Arts, an evening that serves as the closing of the summer season in the Helsinki region and consists of several dozens of street stages set up all over with all kind of performances. It sounded interesting, and I was looking forward to check the evening vibe out.

    I was at the Ruoholahti office that Friday, when Kimmo came over to my desk to invite me to join his mates for dinner. Having the Night of the Arts in mind, I suggested we grab some food downtown before finding an interesting act somewhere, to which he replied emphatically "No! We first go to Meritähti to eat, a place nearby — it's our Friday tradition here." Surprised at the tenacity of his objection and being the new kid in town, I obliged. I can't remember now who else joined us in that summer evening before we headed to the Night of the Arts, probably Jörgen, Marius, and others, but that would be the first of many more to come in the future.

    I started taking part in that tradition and I always thought, somehow, that those Meritähti evenings would continue for a long time. Because even after the whole Hildon team was dismantled, even after many of the people in our gang left Nokia and others moved on to work on the now also defunct MeeGo, we still met in Meritähti once in a while for food, a couple of beers, and good laughs. Even after Nokia closed down the Ruoholahti NRC, even after everyone I knew had left the company, even after the company was sold out, and even after half the people we knew had left the country, we still met there for a good old super-special.

    But those evenings were not bound to be eternal, and like most good things in life, they are coming to an end. Meritähti is closing in the next weeks, and the handful of renegades who stuck around in Helsinki will have to find a new place to spend our evenings together. László, the friendly Hungarian who ran the place with his family, is moving on to less stressful endeavors. Keeping a bar is too much work, he told us, and everyone has the right to one day say enough. One would want to do or say something to change his mind, but what right do we have? We should instead be glad that the place was there for us and that we had the chance to enjoy uncountable evenings under the starfish lamps that gave the place its name. If we're feeling melancholic, we will always have Kaurismäki's Lights in the Dusk and that glorious scene involving a dog in the cold, to remember one of those many times when conflict would ensue whenever a careless dog-owner would stop for a pint in the winter.

    Long live Meritähti, long live László, and köszönöm!

    October 16, 2014 11:49 AM

    October 14, 2014

    Enrique Ocaña

    Hacking on Chromium for Android from Eclipse (part 2)

    In the previous post, I showed all the references needed to get the Chromium for Android source code, set up Eclipse and build the ChromeShell app. Today I'm going to explain how to debug that app running on the device.

    Debugging from command line

    This is the first step we must get working before trying to debug directly from Eclipse. The steps are explained in the debugging-on-android howto, but I'm showing them here for reference.

    Perform the “build Chrome shell” steps but using debug parameters:

     ninja -C out/Debug chrome_shell_apk
     build/android/adb_install_apk.py --apk ChromeShell.apk --debug

    To avoid the need for a rooted Android device, set ChromeShell as the app to be debugged by going to Android Settings, Debugging on your device. Now, to launch a gdb debugging session from a console:

     cd ~/ANDROID/src
     . build/android/envsetup.sh
     ./build/android/adb_gdb_chrome_shell --start

    You will see that the adb_gdb script called by adb_gdb_chrome_shell pulls some libraries from your device to /tmp. If everything goes fine, gdb shouldn’t have any problem finding all the symbols of the source code. If not, please check your setup again before trying to debug in Eclipse.

    Debugging from Eclipse

    Ok, this is going to be hacky. Hold on to your hat!

    Eclipse can’t use adb_gdb_chrome_shell and adb_gdb “as is”, because they don’t allow gdb command line parameters. We must create some wrappers in $HOME/ANDROID, our working dir. This means “/home/enrique/ANDROID/” for me. The wrappers are:

    Wrapper 1: adb_gdb

    This is a copy of ~/ANDROID/src/build/android/adb_gdb with some modifications. It computes the same values as the original, but doesn't launch gdb. Instead, it creates two symbolic links in ~/ANDROID:

    • gdb is a link to the arm-linux-androideabi-gdb command used internally.
    • gdb.init is a link to the temporary gdb config file created internally.

    These two files will make life simpler for Eclipse. After that, the script prints the actual gdb command that it would have executed (but has not), and reads a line waiting for ENTER. After the user presses ENTER, it just kills everything. Here are the modifications that you have to make to the original adb_gdb you've copied. Note that my $HOME (~) is "/home/enrique":

     # In the beginning:
     CHROMIUM_SRC=/home/enrique/ANDROID/src
     ...
     # At the end:
     log "Launching gdb client: $GDB $GDB_ARGS -x $COMMANDS"
    
     rm /home/enrique/ANDROID/gdb
     ln -s "$GDB" /home/enrique/ANDROID/gdb
     rm /home/enrique/ANDROID/gdb.init
     ln -s "$COMMANDS" /home/enrique/ANDROID/gdb.init
     echo
     echo "---------------------------"
     echo "$GDB $GDB_ARGS -x $COMMANDS"
     read
    
     exit 0
     $GDB $GDB_ARGS -x $COMMANDS &&
     rm -f "$GDBSERVER_PIDFILE"

    Wrapper 2: adb_gdb_chrome_shell

    It’s a copy of ~/ANDROID/src/build/android/adb_gdb_chrome_shell with a simple modification in PROGDIR:

     PROGDIR=/home/enrique/ANDROID

    Wrapper 3: gdbwrapper.sh

    Loads envsetup, returns the gdb version for Eclipse if asked, and invokes adb_gdb_chrome_shell. This is the script to be run in the console before starting the debug session in Eclipse. It will invoke the other scripts and wait for ENTER.

     #!/bin/bash
     cd /home/enrique/ANDROID/src
     . build/android/envsetup.sh
     if [ "X$1" = "X--version" ]
     then
      exec /home/enrique/ANDROID/src/third_party/android_tools/ndk/toolchains/arm-linux-androideabi-4.8/prebuilt/linux-x86_64/bin/arm-linux-androideabi-gdb --version
      exit 0
     fi
     exec ../adb_gdb_chrome_shell --start --debug
     #exec ./build/android/adb_gdb_chrome_shell --start --debug

    Setting up Eclipse to connect to the wrapper

    Now, the Eclipse part. From the “Run, Debug configurations” screen, create a new “C/C++ Application” configuration with these features:

    • Name: ChromiumAndroid 1 (name it as you wish)
    • Main:
      • C/C++ Application: /home/enrique/ANDROID/src/out/Debug/chrome_shell_apk/libs/armeabi-v7a/libchromeshell.so
      • IMPORTANT: From time to time, libchromeshell.so gets corrupted and is truncated to zero size. You must regenerate it by doing:

    rm -rf /home/enrique/ANDROID/src/out/Debug/chrome_shell_apk
    ninja -C out/Debug chrome_shell_apk

      • Project: ChromiumAndroid (the name of your project)
      • Build config: Use active
      • Uncheck “Select config using C/C++ Application”
      • Disable auto build
      • Connect process IO to a terminal
    • IMPORTANT: Change “Using GDB (DSF) Create Process Launcher” and use “Legacy Create Process Launcher” instead. This will enable “gdb/mi” and allow us to set the timeouts to connect to gdb.
    • Arguments: No changes
    • Environment: No changes
    • Debugger:
      • Debugger: gdb/mi
      • Uncheck “Stop on startup at”
      • Main:
        • GDB debugger: /home/enrique/ANDROID/gdb (IMPORTANT!)
        • GDB command file: /home/enrique/ANDROID/gdb.init (IMPORTANT!)
        • GDB command set: Standard (Linux)
        • Protocol: mi2
        • Uncheck: “Verbose console”
        • Check: “Use full file path to set breakpoints”
      • Shared libs:
        • Check: Load shared lib symbols automatically
    • Source: Use the default values without modification (absolute file path, program relative file path, ChromiumAndroid (your project name)).
    • Refresh: Uncheck “Refresh resources upon completion”
    • Common: No changes.

    When you have everything: apply (to save), close and reopen.

    Running a debug session

    Now, run gdbwrapper.sh in an independent console. When it pauses and starts waiting for ENTER, change to Eclipse, press the Debug button and wait for Eclipse to attach to the debugger. The execution will briefly pause in an ioctl() call and then continue.

    To test that the debugging session is really working, set a breakpoint in content/browser/renderer_host/render_message_filter.cc, at content::RenderMessageFilter::OnMessageReceived, and continue the execution. It should break there. Now, from the Debug perspective, you should be able to see the stack trace and access the local variables.

    Welcome to the wonderful world of Android native code debugging from Eclipse! It’s a bit slow, though.

    This completes the C++ side of this series of posts. In the next post, I will explain how to open the Java code of ChromeShellActivity, so that you will be able to hack on it like you would in a normal Android app project.


    by eocanha at October 14, 2014 06:00 AM

    October 11, 2014

    Enrique Ocaña

    Hacking on Chromium for Android from Eclipse (part 1)

    The Chromium developers website has some excellent resources on how to set up an environment to build Chromium for Linux desktop and for Android. There's also a detailed guide on how to set up Eclipse as your development environment, enabling you to take advantage of code indexing and enjoy features such as type hierarchy, call hierarchy, macro expansion, references and a lot of tools much better than the poor man's trick of grepping the code.

    Unfortunately, there are some integration aspects not covered by those guides, so joining all the dots is not a smooth task. In this series of posts, I'm going to explain the missing parts to set up a working environment to code and debug Chromium for Android from Eclipse, both C++ and Java code. All the steps and commands from this series of posts have been tested in an Ubuntu Saucy chroot. See my previous post on how to set up a chroot if you want to know how to do this.

    Get the source code

    See the get-the-code guide. Don’t try to reconvert a normal Desktop build into an Android build. It just doesn’t work. The detailed steps to get the code from scratch and prepare the dependencies are the following:

     cd ANDROID # Or the directory you want
     fetch --nohooks android --nosvn=True
     cd src
     git checkout master
     build/install-build-deps.sh
     build/install-build-deps-android.sh
     gclient sync --nohooks

    Configure and generate the project (see AndroidBuildInstructions), from src:

     # Make sure that ANDROID/.gclient has this line:
     # target_os = [u'android']
     # And ANDROID/chromium.gyp_env has this line:
     # { 'GYP_DEFINES': 'OS=android', }
     gclient runhooks

    Build Chrome shell, from src:

     # This builds
     ninja -C out/Release chrome_shell_apk
     # This installs in the device
     # Remember the usual stuff to use a new device with adb:
     # http://developer.android.com/tools/device.html 
     # http://developer.android.com/tools/help/adb.html#Enabling
     # Ensure that you can adb shell into the device
     build/android/adb_install_apk.py --apk ChromeShell.apk --release

    If you ever need to update the source code, follow this recipe and use Release or Debug at your convenience:

     git pull origin master
     gclient sync
     # ninja -C out/Release chrome_shell_apk
     ninja -C out/Debug chrome_shell_apk
     # build/android/adb_install_apk.py --apk ChromeShell.apk --release
     build/android/adb_install_apk.py --apk ChromeShell.apk --debug

    As a curiosity, it's worth mentioning that adb is installed at third_party/android_tools/sdk/platform-tools/adb.

    Configure Eclipse

    To configure Eclipse, follow the instructions in LinuxEclipseDev. They work nicely with Eclipse Kepler.

    In order to open and debug the Java code properly, it's also interesting to install the ADT plugin in Eclipse. Don't try to reuse the Android SDK in "third_party/android_tools/sdk"; it seems to lack some things. Download a fresh standalone SDK from the official page instead and tell the ADT plugin to use it.

    In the next post, I will explain how to debug C++ code running in the device, both from the command line and from Eclipse.


    by eocanha at October 11, 2014 12:46 AM

    October 02, 2014

    Jacobo Aragunde

    LibreOffice on Android #4 – Document browser revisited

    I’m borrowing the post title that Tomaž and Andrzej used before to talk about the work that I have lately been doing at Igalia regarding LibreOffice on Android.

    You might know there are several projects living under android/experimental in our code tree; it is exciting to see that a new experiment for a document viewer that uses a fresh approach recently arrived at the party, which can be the basis for an Android editor. I was happy to add support for receiving view or edit intents to the shiny new viewer, so we could open any document from other Android applications like file browsers.

    Besides, android/experimental hid some very interesting work on an Android-centric document browser that could be a good starting point to implement a native Android UI wrapping LibreOffice, although it had some problems that made it unusable. In particular, thumbnail generation was making the application crash – for that reason I've disabled it until we get a proper fix – and the code to open a document was broken. Fixing and working around these issues were the first steps to bring the document browser back to life.

    I noticed that the browser was inconveniently dependent on the ActionBarSherlock library, which is not really necessary now that we are targeting modern Android versions with out-of-the-box action bar support. I replaced Sherlock ActionBars with Android native ones, which allowed us to remove all the ABS library files from our source tree.

    I also took the freedom to reorganize the application resources (design definitions, bitmaps and so on), removing duplicated ones. It was the preparation for the next task…

    Finally, I merged the document browser project into the new viewer with this huge patch, so they can be built and installed together. I also did the modifications for the browser to open the documents using the new viewer, so they become one coherent, whole application.

    Now both the viewer and the document browser can evolve together to become a true LibreOffice for Android, which I hope to see not too far away in the future.

    LibreOffice document browser screenshot

    by Jacobo Aragunde Pérez at October 02, 2014 10:56 AM

    September 29, 2014

    Juan A. Suárez

    Highlights in Grilo 0.2.11 (and Plugins 0.2.13)

    Hello, readers!

    Some weeks ago we released a new version of Grilo and the Plugins set (yes, it sounds like a '70s music group :) ). You can read the announcements here and here. If you are more curious about all the detailed changes, you can take a look at the Changelogs here and here.

    But even though you can read that information in the above links, it is always a pleasure if someone highlights the main changes. So let's go!

    Launch Tool

    Regarding the core system, among the typical bug fixes, I would highlight a new tool: grl-launch. This tool, like others, got its inspiration from GStreamer's gst-launch. So far, when you wanted to do some operation in Grilo, like performing a search in YouTube or getting the title of a video on disk, the recommended way was to use the Grilo Test UI. This is a basic application that allows you to perform the typical operations in Grilo, like browsing or searching, everything from a graphical interface. The problem is that this tool is not flexible enough, so you can't control all the details you may require. It is also useful for visually checking the results, but not for exporting them to be processed by another tool.

    So while the Test UI is still very useful, to cover the other cases we have grl-launch. It is a command-line tool that allows you to perform most of the operations available in Grilo, with a great degree of control. You can browse, search, resolve details of a Grilo media element, …, controlling things like how many elements to skip or return, the metadata keys (title, author, album, …) to retrieve, the flags to use, etc.

    And on top of that, the results can be exported directly to a CSV file so it can be loaded later in a spreadsheet.

    As example, getting the 10 first trailers from Apple’s iTunes Movie Trailers site:

    
    $ grl-launch-0.2 browse -c 10 -k title,url grl-apple-trailers
    23 Blast,http://trailers.apple.com/movies/independent/23blast/23blast-tlr_h480p.mov
    A Most Wanted Man,http://trailers.apple.com/movies/independent/amostwantedman/amostwantedman-tlr1_h480p.mov
    ABC's of Death 2,http://trailers.apple.com/movies/magnolia_pictures/abcsofdeath2/abcsofdeath2-tlr3_h480p.mov
    About Alex,http://trailers.apple.com/movies/independent/aboutalex/aboutalex-tlr1b_h480p.mov
    Addicted,http://trailers.apple.com/movies/lionsgate/addicted/addicted-tlr1_h480p.mov
    "Alexander and the Terrible, Horrible, No Good, Very Bad Day",http://trailers.apple.com/movies/disney/alexanderterribleday/alexanderterribleday-tlr1_h480p.mov
    Annabelle,http://trailers.apple.com/movies/wb/annabelle/annabelle-tlr1_h480p.mov
    Annie,http://trailers.apple.com/movies/sony_pictures/annie/annie-tlr2_h480p.mov
    Are You Here,http://trailers.apple.com/movies/independent/areyouhere/areyouhere-tlr1_h480p.mov
    As Above / So Below,http://trailers.apple.com/movies/universal/asabovesobelow/asabovesobelow-tlr1_h480p.mov
    10 results
    

    As said, if you redirect the output to a file and import it into a spreadsheet program as CSV, it will be much easier to read.
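
    For instance, reusing the browse command from above (the file name is just a placeholder):

    $ grl-launch-0.2 browse -c 10 -k title,url grl-apple-trailers > trailers.csv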

    dLeyna/UPnP plugin

    Regarding the plugins, here is where the fun takes place. Almost all plugins were touched in some way or other, in most cases to fix bugs. But there are other changes I'd like to highlight, and among them, UPnP is the one that suffered the biggest changes.

    Well, strictly speaking, there is no UPnP plugin any more. Rather, it was replaced by the new dLeyna plugin, written mainly by Emanuele Aina. From a user's point of view there shouldn't be big differences, as this new plugin also provides access to UPnP/DLNA sources. So where are the differences?

    First off, let's specify what dLeyna is. So far, if you wanted to interact with a UPnP source, either you needed to deal with the protocol directly, or use some low-level library, like gupnp. This is what the UPnP plugin was doing. It is still a rather low-level API, but higher and better than dealing with the raw protocol.

    On the other hand, dLeyna, written by the Intel Open Source Technology Center, wraps the UPnP sources with a D-Bus layer. Actually, not only sources, but also UPnP media renderers and controllers, though in our case we are only interested in the UPnP sources. Thanks to dLeyna, you don't need to interact with low-level UPnP any more, but with a higher-level D-Bus service layer, similar to the way we interact with other services in GNOME or in other platforms. This makes it easier to browse or search UPnP sources, and allows us to add new features. dLeyna also hides some details specific to each UPnP server that are of no interest to us, but that we would need to deal with if we used a lower-level API. The truth is that though UPnP is quite well specified, no implementation follows it 100%: there are always slight differences that create nasty bugs. In this case, dLeyna acts (or should act) as a protection, dealing itself with those differences.

    And what is needed to use this new plugin? Basically, having the dleyna-service D-Bus service installed. When the plugin is started, it wakes up the service, which will expose all the available UPnP servers in the network, and the plugin will expose them as Grilo sources. Everything works as it did with the previous UPnP plugin.

    In any case, I still keep a copy of the old UPnP plugin for reference, in case someone wants to use it or take a look. It is in "unmaintained" mode, so try to use the new dLeyna plugin instead.

    Lua Factory plugin

    There aren't big changes here, except fixes. But I want to mention it here because it is where most of the activity is happening. I must thank Bastien and Victor for the work they are doing here. Just to refresh, this plugin allows Grilo to execute sources written in Lua. That is, instead of writing your sources in GObject/C, you can use Lua, and the Lua Factory plugin will load and run them. Writing plugins in Lua is a pleasure, as it allows you to focus on fixing the real problems and leaves the boilerplate details to the factory. Honestly, if you are considering writing a new source, I would really think about writing it in Lua.

    And that's all! It is a longer post than usual, but it is nice to explain what's going on in Grilo. And remember, if you are considering using Grilo in your product, don't hesitate to contact us.

    Happy week!

    by Juan A. Suárez at September 29, 2014 08:45 AM

    September 23, 2014

    Sergio Villar

    Grids everywhere!

    Hi dear readers,

    it's awesome to see people really excited (including our friends at Bloomberg) about CSS Grid Layout, especially after Rachel Andrew's talk at CSSConf. I really believe CSS Grid Layout will be a revolution for web designers, as it will help them build amazing responsive web sites without having to add hacks all around.

    My fellow Igalians and I keep working on adjusting the code to match the specification, polishing it, adding new features and even drastically improving the performance of grid.

    by svillar at September 23, 2014 11:53 AM

    September 15, 2014

    Iago Toral

    Setting up a development environment for Mesa

    Recap

    In my previous post I provided an overview of the Mesa source tree and identified some of its main modules.

    Since we are on that subject, I thought it would make sense to give a few tips on how to set up the development environment for Mesa too, so here I go.

    Development environment

    Mesa is mostly written in a combination of C and C++, uses autotools for its build system and Git for version control, so it should be a fairly familiar environment for many people. I am not going to explain how to build autotools projects here; there is plenty of documentation available on that subject, so instead I will focus on the specifics of Mesa.

    First we need to check out the source code. If you do not have a developer account then do an anonymous checkout:

    # git clone git://anongit.freedesktop.org/git/mesa/mesa
    

    If you do have a developer account do this instead:

    # git clone git+ssh://username@git.freedesktop.org/git/mesa/mesa
    

    Next, we will have to deal with dependencies. This should not be too hard though. Mesa is fairly low in the software stack, so it does not have many dependencies, and the ones it has seem to have a fairly stable API and don't change too often. Typically, you should be able to build Mesa if you have a recent distribution and you keep it up to date. For reference, as of now I can build Mesa on my Ubuntu 14.04 without any problems.
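
    On Debian/Ubuntu, one convenient shortcut (assuming deb-src entries are enabled in your APT sources) is to install the build dependencies of the distribution's own Mesa package:

    $ sudo apt-get build-dep mesa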

    In any case, the actual dependencies you will need to get may vary depending on the drivers you want to build, the target platform and the features you want to enable. For example, the R300 Gallium driver requires LLVM, but the Intel i965 driver doesn’t.

    Notice, however, that if you are hacking on features that require specific builds of the XServer, Wayland/Weston or similar stuff the required setup will be more complex, since you would probably need to include these other projects into the mix, together with their respective dependencies.

    Configuring the source tree

    Here I will mention some of the Mesa specific options that I found to be more useful in my time with Mesa:

    --enable-debug: This is necessary, at least, to get assertions to work, and you want this while you are developing. Mesa and the drivers have assertions in many places to make sure that new code does not break certain assumptions or violate hardware constraints, so you really want to make sure these are active when you are developing. It also adds "-g -O0" to enable debug support.

    --with-dri-drivers: This is the list of classic Mesa DRI drivers you want to build. If you know you will only hack on the i965 driver, for example, then building other drivers will only slow down your builds.

    --with-gallium-drivers: This is the list of Gallium drivers you want to build. Again, if you are hacking on the classic DRI i965 driver you are probably not interested in building any Gallium drivers.

    Notice that if you are working on the Mesa framework layer, that is, the bits shared by all drivers, instead of the internals of a specific driver, you will probably want to include more drivers in the build to make sure that they keep building after your changes.

    --with-egl-platforms: This is a list of supported platforms. Same as with the options above, you probably only want to build Mesa for the platform or platforms you are working on.

    Besides using a combination of these options, you probably also want to set your CFLAGS and CXXFLAGS (remember that Mesa uses both C and C++). I for one like to pass "-g3", for example.
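
    Putting all of this together, a configure invocation for hacking on the classic i965 driver could look something like the following; the prefix, the platform list and the flags are just an example of my own setup, so adjust them to your needs:

    # ./autogen.sh --prefix=$HOME/mesa-install \
                   --enable-debug \
                   --with-dri-drivers=i965 \
                   --with-egl-platforms=x11,drm \
                   CFLAGS="-g3" CXXFLAGS="-g3"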

    Using your built version of Mesa

    Once you have built Mesa you can type 'make install' to install the libraries and drivers. You have probably configured autotools (via the --prefix option) to install to a safe location that does not conflict with your distribution's installation of Mesa, so now the problem is telling your OpenGL programs that they should use this version of Mesa instead of the one provided by your distro.

    You will have to adjust a couple of environment variables for this:

    LIBGL_DRIVERS_PATH: Set this to the path where your built drivers have been installed. This will tell Mesa’s loader to look for the drivers here.

    LD_LIBRARY_PATH: Set this to the path where your Mesa libraries have been installed. This will make it so that OpenGL programs load your recently built libGL.so rather than your system’s.
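
    For example, assuming a $HOME/mesa-install prefix (the exact lib subdirectory may vary depending on your distribution and configure options), the setup could look like this:

    # export LIBGL_DRIVERS_PATH=$HOME/mesa-install/lib/dri
    # export LD_LIBRARY_PATH=$HOME/mesa-install/lib:$LD_LIBRARY_PATH
    # glxgears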

    For more tips I'd suggest reading this short thread on the Mesa mailing list, in which some Mesa developers discuss their development environment setups.

    Coming up next

    In the next post I will provide an introduction to modern 3D graphics hardware. After all, the job of the graphics driver is all about programming the hardware, so having a basic understanding of how it works is a requirement if you want to do any meaningful driver development.

    by Iago Toral at September 15, 2014 02:44 PM

    September 14, 2014

    Claudio Saavedra

    Sun 2014/Sep/14

    You can try to disguise it in any way you want, but at the end of the day what we have is a boys' club that suddenly cannot invest all of its money into toys for the boys' amusement but now also needs to spend it leveling the field for the girls to be able to play too. Less money for the toys the boys like, surely that upsets them -- after all, boys were having so much fun so far and now that fun is being taken away.

    The fact that the fun in this case happens to be of a socially necessary technological nature (a free desktop, a free software stack, whatever you want to call it) doesn't make this any different. If you are objecting to OPW and your argument is that it hinders the technological advance of the GNOME project, well, admit it -- isn't the fact that you enjoy technology at heart (ie, you are the one having fun) one of the main reasons you're saying this?

    Male-chauvinism can take a thousand forms, and many of those forms are so well hidden and ingrained into our culture that they are terribly difficult to see, especially if you're a man and not the target of it. Once we are confronted with any of these forms, this might even give us a terrible headache -- we are faced with something we didn't even know existed -- and it can take a tremendous effort to accept they're here. But, frankly, that effort is long overdue and many of us will refuse to be around those not wanting to make it.

    September 14, 2014 03:52 PM

    September 11, 2014

    Javier Muñoz

    Pflua and high performance packet filtering

    Time to write another post! This time I will comment on one of our most recent projects here at Igalia, a high performance packet filtering toolkit written in Lua.

    Several weeks ago I received a phone call from Juan. Andy was looking for someone ready to jump into a new opportunity related to high performance networking, hypervisors, packet filtering and LuaJIT. Hey, this mix sounded great, so I joined Andy and we went ahead.

    Six weeks later, with Diego having joined the project too, a first implementation (Pflua) of the libpcap packet filtering language (pflang), together with the corresponding testing and benchmarking code (Pflua-bench), went live.

    During those weeks I hacked on the bindings/FFI implementation, performance and benchmarking, testing, and kernel-space to user-space code adaptation (the Linux BPF JIT wrapped as a dynamic library!). In this post I will share a quick overview of the project and the relevant links to explore it in detail.

    As mentioned, Pflua implements the libpcap packet filtering language, which we refer to as 'pflang' for short. Pflua is a high performance packet filtering toolkit implemented in LuaJIT (a tracing compiler for the Lua language). Together with Pflua we also developed Pflua-bench, a benchmark suite for pflang implementations.

    Pflua and Pflua-bench were developed for Snabb GmbH, the company behind the Snabb Switch network appliance toolkit. You can read more about this project, or get in touch with Luke and the other Snabb hackers, in the snabb-devel forum. They are working on very interesting and challenging use cases where virtualization and Software Defined Networking (SDN) are pulling more and more networking into servers. At the same time, user-space networking software is out-performing kernel-space software.

    At this point, you might be interested in the inner technical details of Pflua and Pflua-bench. If so, I would recommend reading the latest post by my colleague Andy, who introduces the project from a great compiler hacker's perspective. If you are in a hurry, I would highlight the following points:

    • Pflua implements two compilation pipelines or execution engines. It is able to generate Lua code starting either from a pflang expression or from Berkeley Packet Filter (BPF) VM bytecode. The first engine gives you great flexibility to craft complex, expert filters. Moreover, your final Lua filters are free from some of BPF's limitations and constraints, such as extra bounds checks or converting to host byte order. (See the usage sketch after this list.)
    • Pflua-bench compares 5 pflang implementations: the user-space BPF interpreter from libpcap (libpcap), the old Linux kernel-space BPF compiler (linux-bpf), the new Linux kernel-space BPF compiler (linux-ebpf), BPF bytecode cross-compiled to Lua (bpf) and pflang compiled directly to Lua (Pflua). You can see our benchmarking results and comparative analysis here.
    • Pflua seems to be an acceptable implementation of pflang and, in many circumstances, Pflua is the fastest pflang implementation by a long shot.
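
    As a taste of what using the native pipeline looks like, here is a minimal sketch; it assumes the pf module and its compile_filter entry point as exposed by the Pflua repository, so double-check the project's README for the exact API:

    -- Minimal sketch; the module layout (pf.compile_filter) is taken from
    -- the Pflua repository and should be checked against its README.
    local pf = require("pf")

    -- Compile a pflang expression down to a Lua predicate; LuaJIT will
    -- trace it and turn it into machine code at run time.
    local match = pf.compile_filter("ip or ip6")

    -- 'packet' is a pointer to the raw packet bytes and 'len' its length.
    if match(packet, len) then
       -- handle the matching packet
    end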

    As mentioned, Pflua was developed for Snabb GmbH around an open source virtualized Ethernet networking stack, and it has the potential to become a high performance packet filtering toolkit for SDN solutions (forwarding and control planes).

    We are incubating this project at Igalia. Feel free to follow the development and drop us a mail if you want to support the project or are just using it!

    by Javier at September 11, 2014 10:00 PM

    September 08, 2014

    Javier Fernández

    Box Alignment and Grid Layout

    As some of my readers already know, Igalia and Bloomberg are collaborating on the implementation of the Grid Layout specification for the Blink/Chromium and WebKit web engines. As part of this assignment, I had the opportunity to review and contribute to the implementation of another feature I consider quite useful for the web: the CSS Box Alignment Module (level 3).

    The Box Alignment specification was designed to generalize the alignment behavior of boxes within their containers, which is nowadays defined across multiple specifications. Several layout models are affected by this new specification: block, table, flex and grid. This post is about how it affects the Grid Layout implementation.

    I think it is a good idea to begin my exposition with a brief introduction to some concepts related to alignment and CSS Writing Modes, which I consider quite relevant for understanding the implications of this specification for the Grid Layout implementation and, more importantly, for realizing its potential.

    Examples are mandatory when analyzing W3C specifications; personally, I can’t see all the angles and implications of a feature described in a specification without the proper examples, both visual and source code.

    Finally, I'd like to conclude my article with a development angle, describing some interesting implementation details and technical challenges I faced while working on both the Blink and WebKit web engines, as well as, perhaps more interestingly, the ones I couldn't solve yet and am still working on. As always, comments and feedback are really welcome.

    Introduction to Box Alignment and Writing-Modes

    From the CSS Box Alignment specification:

    features of CSS relating to the alignment of boxes within their containers in the various CSS box layout models: block layout, table layout, flex layout, and grid layout.

    From the CSS Writing Modes specification:

    CSS features to support various international writing modes, such as left-to-right (e.g. Latin or Indic), right-to-left (e.g. Hebrew or Arabic), bidirectional (e.g. mixed Latin and Arabic) and vertical (e.g. Asian scripts).

    In order to get a better understanding of alignment some abstract dimensional and directional terms should be explained and taken into account. I’m going to briefly describe some of them, the ones I consider more relevant for my exposition; a more detailed definition can be obtained from the Abstract Box Terminology section of the specification.

    There are three sets of directional terms in CSS:

    • physical – Interpreted relative to the page, independent of writing mode. The physical directions are left, right, top, and bottom
    • flow-relative -  Interpreted relative to the flow of content. The flow-relative directions are start and end, or block-start, block-end, inline-start, and inline-end if the dimension is also ambiguous.
    • line-relative – Interpreted relative to the orientation of the line box. The line-relative directions are line-left, line-right, line-over, and line-under.

    The abstract dimensions are defined below:

    • block dimension – The dimension perpendicular to the flow of text within a line, i.e. the vertical dimension in horizontal writing modes, and the horizontal dimension in vertical writing modes.
    • inline dimension – The dimension parallel to the flow of text within a line, i.e. the horizontal dimension in horizontal writing modes, and the vertical dimension in vertical writing modes.
    • block axis – The axis in the block dimension, i.e. the vertical axis in horizontal writing modes and the horizontal axis in vertical writing modes.
    • inline axis - The axis in the inline dimension, i.e. the horizontal axis in horizontal writing modes and the vertical axis in vertical writing modes.
    • extent or logical height - A measurement in the block dimension: refers to the physical height (vertical dimension) in horizontal writing modes, and to the physical width (horizontal dimension) in vertical writing modes.
    • measure or logical width - A measurement in the inline dimension: refers to the physical width (horizontal dimension) in horizontal writing modes, and to the physical height (vertical dimension) in vertical writing modes. (The term measure derives from its use in typography.)

    Then, there are the flow-relative and line-relative directions. For the time being, I'll consider only the flow-relative direction terms, since they are more relevant for discussing alignment issues.

    • block-start - The side that comes earlier in the block progression, as determined by the writing-mode property: the physical top in horizontal-tb mode, the right in vertical-rl, and the left in vertical-lr.
    • block-end - The side opposite block-start.
    • inline-start - The side from which text of the inline base direction would start. For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side.
    • inline-end - The side opposite inline-start.

    writing-modes

    So now that we have defined the box edges and flow direction concepts, we can review how they are used when defining the alignment properties and values inside a Grid Layout, which can be defined along two axes:

    • which dimension they apply to (inline vs. stacking)
    • whether they control the position of the box within its parent, or the box’s content within itself.

    alignment-properties

    Regarding the alignment values, there are two concepts that are important to understand:

    • alignment subject - The alignment subject is the thing or things being aligned by the property. For justify-self and align-self, the alignment subject is the margin box of the box the property is set on. For justify-content and align-content, the alignment subject is defined by the layout mode.
    • alignment container - The alignment container is the rectangle that the alignment subject is aligned within. This is defined by the layout mode, but is usually the alignment subject’s containing block.

    Also, there are several kind of alignment behaviors:

    • Positional Alignment - specify a position for an alignment subject with respect to its alignment container.
    • Baseline Alignment - form of positional alignment that aligns multiple alignment subjects within a shared alignment context (such as cells within a row or column) by matching up their alignment baselines.
    • Distributed Alignment - used by justify-content and align-content to distribute the items in the alignment subject evenly between the start and end edges of the alignment container.
    • Overflow Alignment - when the alignment subject is larger than the alignment container, it will overflow. To help combat this problem, an overflow alignment mode can be explicitly specified.

    At the time of this writing, only Positional Alignment is implemented so I’ll focus on those values in the rest of the article. I’m still working on implementing the specification, though, so there will be time to talk about the other values in future posts.

    • center - Centers the alignment subject within its alignment container.
    • start - Aligns the alignment subject to be flush with the alignment container’s start edge.
    • end - Aligns the alignment subject to be flush with the alignment container’s end edge.
    • self-start - Aligns the alignment subject to be flush with the edge of the alignment container corresponding to the alignment subject’s start side. If the writing modes of the alignment subject and the alignment container are orthogonal, this value computes to start.
    • self-end - Aligns the alignment subject to be flush with the edge of the alignment container corresponding to the alignment subject’s end side. If the writing modes of the alignment subject and the alignment container are orthogonal, this value computes to end.
    • left - Aligns the alignment subject to be flush with the alignment container’s line-left edge. If the property’s axis is not parallel with the inline axis, this value computes to start.
    • right - Aligns the alignment subject to be flush with the alignment container’s line-right edge. If the property’s axis is not parallel with the inline axis, this value computes to start.

    So, after this introduction and with all these concepts in mind, it's now time to get hands-on with the Grid Layout implementation of the Box Alignment specification. As mentioned before, I'll try to use as many examples as possible.

    Aligning items inside a Grid Layout

    Before going into details with source code and examples, I'd like to summarize most of the concepts described above with some pretty simple diagrams:

    2×2 Grid Layout (LTR)

    grid-alignment-ltr

    2×2 Grid Layout (RTL)

    grid-alignment-rtl

    The diagram below illustrates how items are placed inside the grid using different writing modes:

    grid-writing-modes

    At this point, some real examples will help us understand how the CSS alignment properties work in Grid Layout and why they are so important for unlocking the full potential of this new layout model.

    Let’s consider this basic stylesheet which will be used in the examples from now on:

    <style>
      .grid {
          grid-auto-columns: 100px;
          grid-auto-rows: 200px;
          width: -webkit-fit-content;
          margin-bottom: 20px;
      }
       .item {
          width: 20px;
          height: 40px;
      }
       .content {
          width: 10px;
          height: 20px;
          background: white;
      }
       .verticalRL {
          -webkit-writing-mode: vertical-rl;
      }
       .verticalLR {
          -webkit-writing-mode: vertical-lr;
      }
       .horizontalBT {
          -webkit-writing-mode: horizontal-bt;
      }
       .directionRTL {
          direction: rtl;
      }
    </style>

    The item style will be used for the grid items, while content will be the style of the elements placed inside each grid item. There are also writing-mode related styles, which will be useful later for experimenting with different flow and text directions.

    In the first example we will center the content of all the cells, so we get a fully aligned grid, which is particularly interesting for many web applications.

    <div class="grid" style="align-items: center; 
                             justify-items: center">
      <div class="cell row1-column1">
        <div class="item"></div>
      </div>
      <div class="cell row1-column2">
        <div class="item"></div>
      </div>
      <div class="cell row2-column1">
        <div class="item"></div>
      </div>
      <div class="cell row2-column2">
        <div class="item"></div>
      </div>
    </div>
    grid-alignment-example1

    In the next example we will illustrate how to use all the Positional Alignment values so we can place nine items in the same grid cell.

     
    <div class="grid">
      <div class="cell row1-column1"
         style="align-self: start; justify-self: start;">
        <div class="item"></div>
      </div>
      <div class="cell row1-column1"
         style="align-self: center; justify-self: start;">
        <div class="item"></div>
      </div>
      <div class="cell row1-column1"
         style="align-self: end; justify-self: start;">
        <div class="item"></div>
      </div>
      <div class="cell row1-column1"
         style="align-self: start; justify-self: center;">
        <div class="item"></div>
      </div>
      <div class="cell row1-column1"
         style="align-self: center; justify-self: center;">
        <div class="item"></div>
      </div>
      <div class="cell row1-column1"
         style="align-self: end; justify-self: center;">
        <div class="item"></div>
      </div>
      <div class="cell row1-column1"
         style="align-self: start; justify-self: end;">
        <div class="item"></div>
      </div>
      <div class="cell row1-column1"
         style="align-self: center; justify-self: end;">
        <div class="item"></div>
      </div>
      <div class="cell row1-column1"
         style="align-self: end; justify-self: end;">
        <div class="item"></div>
      </div>
    </div>
    grid-alignment-example2

    Let's start playing with the inline and block-flow directions and see how they affect the different Positional Alignment values. I'll start with the inline direction, which affects only the justify-xxx set of CSS properties.

    <div class="grid" style="align-items: self-start; justify-items: self-start">
      <div class="cell row1-column1">
        <div class="item"></div>
      </div>
      <div class="cell row1-column2">
        <div class="item"></div>
      </div>
      <div class="cell row2-column1">
        <div class="item"></div>
      </div>
      <div class="cell row2-column2">
        <div class="item"></div>
      </div>
    </div>
    Direction LTR Direction RTL
    grid-alignment-example3 grid-alignment-example4

    The writing-mode CSS property applies to the block-flow direction, hence the align-xxx properties are the ones affected. In this case, orthogonal writing-modes can be specified in the HTML source code; however, these use cases are not yet fully supported by the current implementation of Grid Layout.

    <div class="grid"
          style="align-items: self-start; 
                 justify-items: self-start">
      <div class="cell row1-column1">
        <div class="item"></div>
      </div>
      <div class="cell row1-column2">
        <div class="item"></div>
      </div>
      <div class="cell row2-column1">
        <div class="item"></div>
      </div>
      <div class="cell row2-column2">
        <div class="item"></div>
      </div>
    </div>
    grid-alignment-example3
    Vertical LR Vertical RL
    grid-alignment-example5 grid-alignment-example6

    Technical challenges, accomplished and to be faced

    Implementing the Box Alignment specification has been a long task and there is still quite a lot of work ahead for both the WebKit and Blink/Chromium web engines. Perhaps one of the most tedious issues was the definition of a couple of new CSS properties, justify-self and justify-items, which required touching several core components, from the CSS parser to the style builder and resolver and finally the rendering.

    Another important technical challenge comes from the fact that the Box Alignment properties already present in both web engines were implemented as part of the Flexible Box specification. As mentioned before in this post, the Box Alignment specification aims to generalize the alignment behavior for several layout models, hence these properties are no longer tied to the Flexible Box implementation; this led to many technical issues, as I'll explain later.

    The patch implemented for issue 333423005 is a good example of the files to touch and the logic to be added in order to implement a new CSS property in Blink/Chromium. There is similar work to be done in the WebKit web engine; at the time of this writing the similarities are still big, even though some parts have changed considerably, like the CSS parsing and style builder logic. As an example, see the patch implemented in bug 134419.

    The following code is quite descriptive of the nature of the CSS Box Alignment properties and how they are applied during the style cascade:

    void StyleAdjuster::adjustStyleForAlignment(RenderStyle& style, const RenderStyle& parentStyle)
    {
        bool isFlexOrGrid = style.isDisplayFlexibleOrGridBox();
        bool absolutePositioned = style.position() == AbsolutePosition;
     
        // If the inherited value of justify-items includes the legacy keyword, 'auto'
        // computes to the inherited value.
        // Otherwise, auto computes to:
        //  - 'stretch' for flex containers and grid containers.
        //  - 'start' for everything else.
        if (style.justifyItems() == ItemPositionAuto) {
            if (parentStyle.justifyItemsPositionType() == LegacyPosition) {
                style.setJustifyItems(parentStyle.justifyItems());
                style.setJustifyItemsPositionType(parentStyle.justifyItemsPositionType());
            } else {
                style.setJustifyItems(isFlexOrGrid ? ItemPositionStretch : ItemPositionStart);
            }
        }
     
        // The 'auto' keyword computes to 'stretch' on absolutely-positioned elements,
        // and to the computed value of justify-items on the parent (minus
        // any 'legacy' keywords) on all other boxes (to be resolved during the layout).
        if ((style.justifySelf() == ItemPositionAuto) && absolutePositioned)
            style.setJustifySelf(ItemPositionStretch);
     
        // The 'auto' keyword computes to:
        //  - 'stretch' for flex containers and grid containers,
        //  - 'start' for everything else.
        if (style.alignItems() == ItemPositionAuto)
            style.setAlignItems(isFlexOrGrid ? ItemPositionStretch : ItemPositionStart);
     
        // The 'auto' keyword computes to 'stretch' on absolutely-positioned elements,
        // and to the computed value of align-items on the parent (minus
        // any 'legacy' keywords) on all other boxes (to be resolved during the layout).
        if ((style.alignSelf() == ItemPositionAuto) && absolutePositioned)
            style.setAlignSelf(ItemPositionStretch);
    }

    The WebKit web engine implements the same logic in the StyleResolver class; the StyleAdjuster class is just a helper class defined in the Blink/Chromium engine to assist the StyleResolver logic during the style cascade in order to make some final adjustments.

    The issue 297483005 implements the align-self CSS property support in Grid Layout; the following code, extracted from that patch, is a good example of how alignment interacts with the grid tracks.

    LayoutUnit RenderGrid::rowPositionForChild(const RenderBox* child) const
    {
        bool hasOrthogonalWritingMode = child->isHorizontalWritingMode() != isHorizontalWritingMode();
        ItemPosition alignSelf = resolveAlignment(style(), child->style());
     
        switch (alignSelf) {
        case ItemPositionSelfStart:
            // If orthogonal writing-modes, this computes to 'Start'.
            // FIXME: grid track sizing and positioning does not support orthogonal modes yet.
            if (hasOrthogonalWritingMode)
                return startOfRowForChild(child);
     
            // self-start is based on the child's block axis direction. That's why we need to check against the grid container's block flow.
            if (child->style()->writingMode() != style()->writingMode())
                return endOfRowForChild(child);
     
            return startOfRowForChild(child);
        case ItemPositionSelfEnd:
            // If orthogonal writing-modes, this computes to 'End'.
            // FIXME: grid track sizing and positioning does not support orthogonal modes yet.
            if (hasOrthogonalWritingMode)
                return endOfRowForChild(child);
     
            // self-end is based on the child's block axis direction. That's why we need to check against the grid container's block flow.
            if (child->style()->writingMode() != style()->writingMode())
                return startOfRowForChild(child);
     
            return endOfRowForChild(child);
     
        case ItemPositionLeft:
            // orthogonal modes make property and inline axes to be parallel, but in any case
            // this is always equivalent to 'Start'.
            //
            // self-align's axis is never parallel to the inline axis, except in orthogonal
            // writing-mode, so this is equivalent to 'Start'.
            return startOfRowForChild(child);
     
        case ItemPositionRight:
            // orthogonal modes make property and inline axes to be parallel.
            // FIXME: grid track sizing and positioning does not support orthogonal modes yet.
            if (hasOrthogonalWritingMode)
                return endOfRowForChild(child);
     
            // self-align's axis is never parallel to the inline axis, except in orthogonal
            // writing-mode, so this is equivalent to 'Start'.
            return startOfRowForChild(child);
     
        case ItemPositionCenter:
            return centeredRowPositionForChild(child);
            // Only used in flex layout, for other layout, it's equivalent to 'Start'.
        case ItemPositionFlexStart:
        case ItemPositionStart:
            return startOfRowForChild(child);
            // Only used in flex layout, for other layout, it's equivalent to 'End'.
        case ItemPositionFlexEnd:
        case ItemPositionEnd:
            return endOfRowForChild(child);
        case ItemPositionStretch:
            // FIXME: Implement the Stretch value. For now, we always start align the child.
            return startOfRowForChild(child);
        case ItemPositionBaseline:
        case ItemPositionLastBaseline:
            // FIXME: Implement the ItemPositionBaseline value. For now, we always start align the child.
            return startOfRowForChild(child);
        case ItemPositionAuto:
            break;
        }
     
        ASSERT_NOT_REACHED();
        return 0;
    }

    The resolveAlignment function call deserves a special mention, since it leads to the open issues I'm still working on. The Box Alignment specification states that the auto values must be resolved to either stretch or start depending on the kind of element. This is theoretically performed during the style cascade, so it wouldn't be necessary to resolve it at the rendering stage. The code is pretty simple:

    static ItemPosition resolveAlignment(const RenderStyle* parentStyle, const RenderStyle* childStyle)
    {
        ItemPosition align = childStyle->alignSelf();
        // The auto keyword computes to the parent's align-items computed value, or to "stretch", if not set or "auto".
        if (align == ItemPositionAuto)
            align = (parentStyle->alignItems() == ItemPositionAuto) ? ItemPositionStretch : parentStyle->alignItems();
        return align;
    }

    The RenderFlexibleBox implementation has to define similar logic and, more importantly, the default value of all the Box Alignment properties has been changed to auto, instead of stretch as stated in the Flexible Box specification.

    To make things even more complicated, many HTML elements are being rendered by RenderFlexibleBox objects as an implementation decision, without the proper display value set to indicate such an assumption. This causes many issues and layout test failures, since the resolved value for auto depends on the kind of element, which is defined by its display property value. Additionally, there are also problems with the anonymous render objects added to the tree in certain implementations.

    Both WebKit and Blink/Chromium are affected by these issues; MathML is a good example for the WebKit engine, since most of its render objects are implemented using a RenderFlexibleBox; also, it assigns and manipulates the align-{self, items} properties during layout. The RenderFullScreen object is a source of problems for the Blink/Chromium web engine in this regard; it uses a RenderFlexibleBox because of its stretch default behavior, which is not the case anymore according to the Box Alignment specification.

    I'm still working on these issues in both web engines; this issue tries to address part of the problems in Blink/Chromium, and there is a similar bug in the WebKit engine with similar challenges.

    Another pending issue present in both web engines is the lack of support for different writing-modes. Even though the Grid Layout logic is prepared to support them, it's still buggy and for certain combinations it does not produce the expected outcome.

    I'd like to finish this post by pointing out that anybody can follow the progress of the Box Alignment spec implementation for Grid Layout by tracking these bugs in whichever web engine they are more interested in:

    • Blink/Chromium
      • bug 249451: [CSS Grid Layout] Implement row-axis Alignment
      • bug 376823: [CSS Grid Layout] Implement column-axis Alignment
    • WebKit
      • bug 133224 – [meta] [CSS Grid Layout] Implement column-axis Alignment
      • bug 133222 – [meta] [CSS Grid Layout] Implement row-axis Alignment

    This work wouldn't be possible without the support of Bloomberg and Igalia, who are committed to providing a better web platform for developers.

    Igalia & Bloomberg logos

    Igalia and Bloomberg working to build a better web platform

    by jfernandez at September 08, 2014 01:48 PM

    Eduardo Lima Mitev

    Drawing Web content with OpenGL (ES 3.0) instanced rendering

    This is a follow-up article about my ongoing research on Web content rendering using aggressive batching and merging of draw operations, together with OpenGL (ES 3.0) instanced rendering.

    In a previous post, I discussed how relying on the Web engine's layer tree to figure out the non-overlapping content (layers) of a Web page would (theoretically) allow an OpenGL-based rasterizer to ignore the order of the drawing operations. This would allow the rasterizer to group together the drawing of similar geometry and submit it efficiently to the GPU using instanced rendering.

    I also presented some basic examples and comparisons of this technique with Skia, a popular 2D rasterizer, giving some hints on how much we can accelerate rendering if the overhead of the OpenGL API calls is reduced by using the instanced rendering technique.

    However, this idea remained to be validated for real cases and on real hardware, especially because of the complexity and pressure imposed on shader programs, which now become responsible for de-referencing the attributes of each batched geometry and rendering it correctly.

    Also, there are potential API changes in the rasterizer that could make this technique impractical to implement in any existing Web engine without significant changes in the rendering process.

    To try to keep this article short and focused, today I want to talk only about my latest experiments rendering some fairly complex Web elements using this technique, and leave the discussion about performance to future entries.

    Everything is a rectangle

    As mentioned in my previous article, almost everything in a Web page can be rendered with a rectangle primitive.

    Web pages are mostly character glyphs, which today’s rasterizers normally draw by texture mapping a pre-rendered image of the glyph onto a rectangular area. Then you have boxes, images, shadows, lines, etc; which can all be drawn with a rectangle with the correct layout, transformation and/or texturing.

    Primitives that are not rectangles are mostly seen in the element’s border specification, where you have borders with radius, and different styles: double, dotted, grooved, etc. There is a rich set of primitives coming from the combination of features in the borders spec alone.

    There is also the Canvas 2D and SVG APIs, which are created specifically for arbitrary 2D content. The technique I’m discussing here purposely ignores these APIs and focuses on accelerating the rest.

    In practice, however, these non-rectangular geometries account for just a tiny fraction of the typical rendering of a Web page, which allows me to effectively call them “exceptions”.

    The approach I'm currently following assumes everything in a Web page is a rectangle, and all non-rectangular geometry is treated as an exception and handled differently in shader code.

    This means I no longer need to worry about the ordering problem, since I always batch a rectangle for every single draw operation and then render all rectangles in order. This introduces a dramatic change compared to the previous approach I discussed: now I can (partially) implement this technique without changing the API of existing rasterizers. I say "partially" because to take full advantage of the performance gain, some API changes would be desired.
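
    As a rough illustration of what this batching means at the API level, the sketch below shows the core of an instanced draw with OpenGL ES 3.0: one shared unit quad plus a buffer of per-instance rectangle attributes, submitted with a single call. The RectInstance type and the attribute layout are my own invention for this example, not the actual code from this research:

    #include <stddef.h>
    #include <GLES3/gl3.h>

    /* Hypothetical per-instance data: layout and color of one rectangle. */
    typedef struct {
        float x, y, width, height;
        float color[4];
    } RectInstance;

    /* 'instances' holds one entry per batched draw operation, in draw order. */
    static void draw_rect_batch(const RectInstance *instances, int count,
                                GLuint quad_vbo, GLuint instance_vbo)
    {
        /* Upload the per-instance attributes for this frame. */
        glBindBuffer(GL_ARRAY_BUFFER, instance_vbo);
        glBufferData(GL_ARRAY_BUFFER, count * sizeof(RectInstance),
                     instances, GL_STREAM_DRAW);

        /* Attribute 0: the shared unit quad, advancing once per vertex. */
        glBindBuffer(GL_ARRAY_BUFFER, quad_vbo);
        glEnableVertexAttribArray(0);
        glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, 0);

        /* Attributes 1 and 2: rectangle geometry and color, advancing once
           per instance thanks to glVertexAttribDivisor. */
        glBindBuffer(GL_ARRAY_BUFFER, instance_vbo);
        glEnableVertexAttribArray(1);
        glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, sizeof(RectInstance),
                              (const void *) 0);
        glVertexAttribDivisor(1, 1);
        glEnableVertexAttribArray(2);
        glVertexAttribPointer(2, 4, GL_FLOAT, GL_FALSE, sizeof(RectInstance),
                              (const void *) offsetof(RectInstance, color));
        glVertexAttribDivisor(2, 1);

        /* One call draws every rectangle of the batch, in order. */
        glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, count);
    }

    The vertex shader would then combine the unit quad corner with the per-instance layout to produce the final position, and the exceptions discussed below would be resolved per fragment.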

    Drawing non-rectangular geometry using rectangles

    So, how do we deal with these exceptions? Remember that we want to draw only with rectangles so that no operation could ever break our batch, if we want to take full advantage of the instanced rendering acceleration.

    There are 3 ways of rendering non-rectangular geometry using rectangles:

    • 1. Using a geometry shader:

      This is the most elegant solution, and looks like it was designed for this case. But since it isn't yet widely deployed, I will not put much emphasis on it here, although we need to follow its evolution closely.

    • 2. Degenerating rectangles:

      This is basically to turn a rectangle into a triangle by degenerating one of its vertices. Then, with a set of degenerated rectangles one could draw any arbitrary geometry as we do today with triangles.

    • 3. Drawing geometry in the fragment shader:

      This sounds like a bad idea, and it is definitely a bad idea! However, given the small and limited number of cases that we need to consider, it can be feasible.

    I'm currently experimenting with 3). You might ask why, since it looks like the worst option. The reason is that going for 2), degenerating rectangles, seems overkill at this point, lacking a deeper understanding of exactly what non-rectangular geometry we will ever need. Implementing generic rectangle degeneration just for a tiny set of cases would have been a bad initial choice and a waste of time.

    So I decided to explore first the option of drawing these exceptions in the fragment shader and see how far I could go in terms of shader code complexity and performance (un)loss.

    Next, I will show some examples of simple Web features rendered this way.

    Experiments

    The setup:

    While my previous screen-casts were run on my work laptop with a powerful Haswell GPU, one of my goals was to focus on mobile devices. Hence, I started developing on an Arndale board I happen to have around. The details of the exact setup are out of scope now, but I will just mention that the board is running a Linaro distribution with the official Mali T604 drivers from ARM.

    My Arndale board

    Following is a video I assembled to show the different examples running on the Arndale board (and my laptop at the same time). This time I had to record using an external camera instead of screen-casting to avoid interfering with the performance, so please bear with my camera-in-hand video recording skills.



    This video file is also available on Vimeo.

    I won't talk about performance now, since I plan to cover that in future entries. Suffice it to say that the performance is pretty good, comparable to my laptop in most of the examples. Also, there are a lot of simple, known optimizations that I have not done yet because I'm focusing on validating the method first.

    One important thing to note is that when drawing is done in a fragment shader, you cannot benefit from multi-sampling anti-aliasing (MSAA), since sampling occurs at an earlier stage. Hence, you have to implement anti-aliasing yourself. In this case, I implemented a simple distance-to-edge linear anti-aliasing, and to my surprise, the end result is much better than the MSAA with 8 samples I was trying on my Haswell laptop before, and it is also faster.

    On a related note, I have found out that MSAA does not give me much when rendering character glyphs (the majority of content) since they come already anti-aliased by FreeType2. And MSAA will slow down the rendering of the entire scene for every single frame.

    I continue to dump the code from this research into a personal repository on GitHub. Go take a look if you are interested in the prototyping of these experiments.

    Conclusions and next steps

    There is one important conclusion coming out of these experiments: the fact that the rasterizer is stateless makes it very inefficient to modify a single element in a scene.

    By stateless I mean that it does not keep semantic information about the elements being drawn. For example, let's say I draw a rectangle in one frame, and in the next frame I want to draw the same rectangle somewhere else on the canvas. I already have a batch with all the elements of the scene happily stored in a vertex buffer object in GPU memory, and the rectangle in question is there somewhere. If I could keep the offset of that rectangle in the batch, I could modify its attributes without having to drop and re-submit the entire buffer.

    The solution: Moving to a scene graph. Web engines already implement a scene graph but at a higher level. Here I’m talking about a scene graph in the rasterizer itself, where nodes keep the offset of their attributes in the batch (layout, transformation, color, etc); and when you modify any of these attributes, only the deltas are uploaded to the GPU, rather than the whole batch.
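
    In OpenGL terms, uploading such a delta can be as cheap as a glBufferSubData call that touches only the bytes belonging to the node that changed (the names here are again illustrative, reusing the hypothetical RectInstance from the sketch above):

    /* Hypothetical single-node update: 'node->offset' is the byte offset of
       its per-instance attributes within the batch VBO. */
    glBindBuffer(GL_ARRAY_BUFFER, batch_vbo);
    glBufferSubData(GL_ARRAY_BUFFER, node->offset,
                    sizeof(RectInstance), &node->attributes);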

    I believe a scene graph approach has the potential to open a whole new set of opportunities for acceleration, especially for transitions, animations and scrolling.

    And that’s exciting!

    Apart from this, I also want to:

    • Benchmark! Set up a platform for reliable benchmarking and perf comparison with Skia/Cairo.
    • Take a subset of this technique and test it in Skia, behind the current API.
    • Validate the case of drawing drop shadows and multi-step gradient backgrounds.
    • Test in other different OpenGL ES 3.0 implementations (and more devices!).

    Let us not forget the fight we are fighting: Web applications must be as fast as native. I truly think we can do it.

    by elima at September 08, 2014 01:16 PM

    Iago Toral

    An eagle eye view into the Mesa source tree

    Recap

    My last post introduced Mesa’s loader as the module that takes care of auto-selecting the right driver for our hardware. If the loader fails to find a suitable hardware driver it will fall back to a software driver, but we can also force this situation ourselves, which may come in handy in some scenarios. We also took a quick look at the glxinfo tool that we can use to query the capabilities and features exposed by the selected driver.

    Today's topic is a quick overview of the Mesa source code tree, which will help us identify the parts of the code that are relevant to our interests, depending on the driver and/or the feature we intend to work on.

    Browsing the source code

    First off, there is already some documentation on this topic available on the Mesa 3D website that is a good place to start. Since that already gives some insight into what goes into each part of the repository, I'll focus on complementing that information with a little more detail on some of the most important parts I have interacted with so far:

    • In src/egl/ we have the implementation of the EGL standard. If you are working on EGL-specific features, tracking down an EGL-specific problem or you are simply curious about how EGL links into the GL implementation, this is the place you want to visit. This includes the EGL implementations for the X11, DRM and Wayland platforms.
    • In src/glx/ we have the OpenGL bits relating specifically to X11 platforms, known as GLX. So if you are working on the GLX layer, this is the place to go. Here there is all the stuff that takes care of interacting with the XServer, the client-side DRI implementation, etc.
    • src/glsl/ contains a critical aspect of Mesa: the GLSL compiler used by all Mesa drivers. It includes a GLSL parser, the definition of the Mesa IR, also referred to as GLSL IR, used to represent shader programs internally, the shader linker and various optimization passes that operate on the Mesa IR. The resulting Mesa IR produced by the GLSL compiler is then consumed by the various drivers which transform it into native GPU code that can be loaded and run in the hardware.
    • src/mesa/main/ contains the core Mesa elements. This includes hardware-independent views of core objects like textures, buffers, vertex array objects, the OpenGL context, etc as well as basic infrastructure, like linked lists.
    • src/mesa/drivers/ contains the actual classic drivers (not Gallium). DRI drivers in particular go into src/mesa/drivers/dri. For example the Intel i965 driver goes into src/mesa/drivers/dri/i965. The code here is, for the most part, very specific to the underlying hardware platforms.
    • src/mesa/swrast*/ and src/mesa/tnl*/ provide software implementations for things like rasterization or vertex transforms. Used by some software drivers and also by some hardware drivers to implement certain features for which they don’t have hardware support or for which hardware support is not yet available in the driver. For example, the i965 driver implements operations on the accumulation and selection buffers in software via these modules.
    • src/mesa/vbo/ is another important module. Across its various versions, OpenGL has specified many ways in which a program can tell OpenGL about its vertex data, from using functions of the glVertex*() family inside glBegin()/glEnd() blocks, to things like vertex arrays, vertex array objects, display lists, etc… The drivers, however, do not need to deal with all this; Mesa makes it so that they always receive their vertex data as a collection of vertex arrays, significantly reducing complexity on the side of the driver implementor. This is the module that takes care of managing all this, so no matter what type of drawing your GL program is doing or how it specifies its vertex data, it will always go through this module before it reaches the driver.
    • src/loader/, as we have seen in my previous post, contains the Mesa driver loader, which provides the logic necessary to decide which Mesa driver is the right one to use for a specific hardware so that Mesa’s libGL.so can auto-select the right driver when loaded.
    • src/gallium/ contains the Gallium3D framework implementation. If, like me, you only work on a classic driver, you don’t need to care about the contents of this at all. If you are working on Gallium drivers however, this is the place where you will find the various Gallium drivers in development (inside src/gallium/drivers/), like the various Gallium ATI/AMD drivers, Nouveau or the LLVM based software driver (llvmpipe) and the Gallium state trackers.

    So with this in mind, one should have enough information to know where to start looking for something specific:

    • If we are interested in how vertex data provided to OpenGL is manipulated and uploaded to the GPU, the vbo module is probably the right place to look.
    • If we are looking to work on a specific aspect of a concrete hardware driver, we should go to the corresponding directory in src/mesa/drivers/ if it is a classic driver, or src/gallium/drivers if it is a Gallium driver.
    • If we want to know about how Mesa, the framework, abstracts various OpenGL concepts like textures, vertex array objects, shader programs, etc. we should look into src/mesa/main/.
    • If we are interested in the platform specific support, be it EGL or GLX, we want to look into src/egl or src/glx.
    • If we are interested in the GLSL implementation, which involves anything from the compiler to the intermediary IR and the various optimization passes, we need to look into src/glsl/.

    Coming up next

    So now that we have an eagle-eye view of the contents of the Mesa repository, let's see how we can prepare a development environment so we can start hacking on some stuff. I'll cover this in my next post.

    by Iago Toral at September 08, 2014 11:59 AM

    September 04, 2014

    Iago Toral

    Driver loading and querying in Mesa

    Recap

    In my previous post I explained that Mesa is a framework for OpenGL driver development. As such, it provides code that can be reused by multiple driver implementations. This code is, of course, hardware agnostic, but frees driver developers from doing a significant part of the work. The framework also provides hooks for developers to add the bits of code that deal with the actual hardware. This design allows multiple drivers to co-exist and share a significant amount of code.

    I also explained that among the various drivers that Mesa provides, we can find both hardware drivers that take advantage of a specific GPU and software drivers, which are implemented entirely in software (so they work on the CPU and do not depend on a specific GPU). The latter are obviously slower, but as I discussed, they may come in handy in some scenarios.

    Driver selection

    So, Mesa provides multiple drivers, but how does it select the one that fits the requirements of a specific system?

    You have probably noticed that Mesa is deployed in multiple packages. In my Ubuntu system, the one that deploys the DRI drivers is libgl1-mesa-dri:amd64. If you check its contents you will see that this package installs OpenGL drivers for various GPUs:

    # dpkg -L libgl1-mesa-dri:amd64 
    (...)
    /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
    /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_r600.so
    /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_nouveau.so
    /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_vmwgfx.so
    /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_r300.so
    /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_swrast.so
    /usr/lib/x86_64-linux-gnu/dri/i915_dri.so
    /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
    /usr/lib/x86_64-linux-gnu/dri/r200_dri.so
    /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
    /usr/lib/x86_64-linux-gnu/dri/radeon_dri.so
    /usr/lib/x86_64-linux-gnu/dri/r300_dri.so
    /usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
    /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
    /usr/lib/x86_64-linux-gnu/dri/nouveau_vieux_dri.so
    /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
    /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
    (...)
    

    Since I have a relatively recent Intel GPU, the driver I need is the one provided by i965_dri.so. So how do we tell Mesa that this is the one we need? Well, the answer is that we don't: Mesa is smart enough to know which driver is the right one for our GPU, and selects it automatically when you load libGL.so. The part of Mesa that takes care of this is called the 'loader'.

    You can, however, point Mesa to look for suitable drivers in a specific directory other than the default, or force it to use a software driver using various environment variables.
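
    For example, to make the loader look for drivers in an alternative directory (the path below is just a placeholder for wherever you installed your own build), you can do something like:

    # LIBGL_DRIVERS_PATH=/path/to/your/dri glxgears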

    What driver is Mesa actually loading?

    If you want to know exactly what driver Mesa is loading, you can instruct it to dump this (and other) information to stderr via the LIBGL_DEBUG environment variable:

    # LIBGL_DEBUG=verbose glxgears 
    libGL: screen 0 does not appear to be DRI3 capable
    libGL: pci id for fd 4: 8086:0126, driver i965
    libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/i965_dri.so
    libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
    

    So we see that Mesa checks the existing hardware and realizes that the i965 driver is the one to use, so it first attempts to load the TLS version of that driver and, since I don’t have the TLS version, falls back to the normal version, which I do have.

    The code in src/loader/loader.c (loader_get_driver_for_fd) is responsible for detecting the right driver to use (i965 in my case). It receives a device fd as an input parameter, which is acquired previously by calling DRI2Connect() as part of the DRI bring-up process. Then the actual driver file is loaded in glx/dri_common.c (driOpenDriver).

    We can also obtain a more descriptive indication of the driver we are loading by using the glxinfo program that comes with the mesa-utils package:

    # glxinfo | grep -i "opengl renderer"
    OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Mobile 
    

    This tells me that I am using the Intel hardware driver, and it also shares information related to the specific Intel GPU I have (SandyBridge).

    Forcing a software driver

    I have mentioned that having software drivers available comes in handy at times, but how do we tell the loader to use them? Mesa provides an environment variable that we can set for this purpose, so switching between a hardware driver and a software one is very easy to do:

    # LIBGL_DEBUG=verbose LIBGL_ALWAYS_SOFTWARE=1 glxgears 
    libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/swrast_dri.so
    libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
    

    As we can see, setting LIBGL_ALWAYS_SOFTWARE will make the loader select a software driver (swrast).

    If I force a software driver and call glxinfo like I did before, this is what I get:

    # LIBGL_ALWAYS_SOFTWARE=1 glxinfo | grep -i "opengl renderer"
    OpenGL renderer string: Software Rasterizer
    

    So it is clear that I am using a software driver in this case.

    Querying the driver for OpenGL features

    The glxinfo program also comes in handy to obtain information about the specific OpenGL features implemented by the driver. If you want to check if the Mesa driver for your hardware implements a specific OpenGL extension you can inspect the output of glxinfo and look for that extension:

    # glxinfo | grep GL_ARB_texture_multisample
    

    You can also ask glxinfo to include hardware limits for certain OpenGL features by including the -l switch. For example:

    # glxinfo -l | grep GL_MAX_TEXTURE_SIZE
    GL_MAX_TEXTURE_SIZE = 8192
    

    Coming up next

    In my next posts I will cover the directory structure of the Mesa repository, identifying its main modules, which should give Mesa newcomers some guidance as to where they should look when they need to find the code that deals with something specific. We will then discuss how modern 3D hardware has changed the way GPU drivers are developed and explain how a modern 3D graphics pipeline works, which should pave the way to start looking into the real guts of Mesa: the implementation of shaders.

    by Iago Toral at September 04, 2014 11:43 AM

    September 02, 2014

    Andy Wingo

    high-performance packet filtering with pflua

    Greets! I'm delighted to be able to announce the release of Pflua, a high-performance packet filtering toolkit written in Lua.

    Pflua implements the well-known libpcap packet filtering language, which we call pflang for short.

    Unlike other packet filtering toolkits, which tend to use the libpcap library to compile pflang expressions to bytecode to be run by the kernel, Pflua is a completely new implementation of pflang.

    why lua?

    At this point, regular readers are asking themselves why this Schemer is hacking on a Lua project. The truth is that I've always been looking for an excuse to play with the LuaJIT high-performance Lua implementation.

    LuaJIT is a tracing compiler, which is different from other JIT systems I have worked on in the past. Among other characteristics, tracing compilers only emit machine code for branches that are taken at run-time. Tracing seems a particularly appropriate strategy for the packet filtering use case, as you end up with linear machine code that reflects the shape of actual network traffic. This has the potential to be much faster than anything static compilation techniques can produce.

    The other reason for using Lua was because it was an excuse to hack with Luke Gorrie, who for the past couple years has been building the Snabb Switch network appliance toolkit, also written in Lua. A common deployment environment for Snabb is within the host virtual machine of a virtualized server, with Snabb having CPU affinity and complete control over a high-performance 10Gbit NIC, which it then routes to guest VMs. The administrator of such an environment might want to apply filters on the kinds of traffic passing into and out of the guests. To this end, we plan on integrating Pflua into Snabb so as to provide a pleasant, expressive, high-performance filtering facility.

    Given its high performance, it is also reasonable to deploy Pflua on gateway routers and load-balancers, within virtualized networking appliances.

    implementation

    Pflua compiles pflang expressions to Lua source code, which are then optimized at run-time to native machine code.

    There are actually two compilation pipelines in Pflua. The main one is fairly traditional. First, a custom parser produces a high-level AST of a pflang filter expression. This AST is lowered to a primitive AST, with a limited set of operators and ways in which they can be combined. This representation is then exhaustively optimized, folding constants and tests, inferring ranges of expressions and packet offset values, hoisting assertions that post-dominate success continuations, etc. Finally, we residualize Lua source code, performing common subexpression elimination as we go.

    For example, if we compile the simple pflang expression ip or ip6 with the default compilation pipeline, we get the Lua source code:

    return function(P,length)
       if not (length >= 14) then return false end
       do
          local v1 = ffi.cast("uint16_t*", P+12)[0]
          if v1 == 8 then return true end
          do
             do return v1 == 56710 end
          end
       end
    end
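
    To make the calling convention concrete, here is a minimal, hypothetical driver for that predicate: the generated chunk is pasted in by hand (lightly flattened), and P is simply an FFI pointer to the raw packet bytes. This is only an illustration, not part of Pflua's API:

    local ffi = require("ffi")
    local pred = function (P, length)              -- the generated function from above
       if not (length >= 14) then return false end
       local v1 = ffi.cast("uint16_t*", P+12)[0]
       if v1 == 8 then return true end
       return v1 == 56710
    end
    local pkt = ffi.new("uint8_t[64]")             -- fake, zero-filled packet buffer
    pkt[12], pkt[13] = 0x08, 0x00                  -- EtherType 0x0800 (IPv4), in network byte order
    print(pred(pkt, 64))                           --> true: on a little-endian host, 0x0800 read natively is 8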
    

    The other compilation pipeline starts with bytecode for the Berkeley packet filter VM. Pflua can load up the libpcap library and use it to compile a pflang expression to BPF. In any case, whether you start from raw BPF or from a pflang expression, the BPF is compiled directly to Lua source code, which LuaJIT can gnaw on as it pleases. Compiling ip or ip6 with this pipeline results in the following Lua code:

    return function (P, length)
       local A = 0
       if 14 > length then return 0 end
       A = bit.bor(bit.lshift(P[12], 8), P[12+1])
       if (A==2048) then goto L2 end
       if not (A==34525) then goto L3 end
       ::L2::
       do return 65535 end
       ::L3::
       do return 0 end
       error("end of bpf")
    end
    

    We like the independence and optimization capabilities afforded by the native pflang pipeline. Pflua can hoist and eliminate bounds checks, whereas BPF is obligated to check that every packet access is valid. Also, Pflua can work on data in network byte order, whereas BPF must convert to host byte order. Both of these restrictions apply not only to Pflua's BPF pipeline, but also to all other implementations that use BPF (for example the interpreter in libpcap, as well as the JIT compilers in the BSD and Linux kernels).

    However, though Pflua does a good job in implementing pflang, it is inevitable that there will be bugs or differences of implementation relative to what libpcap does. For that reason, the libpcap-to-bytecode pipeline can be a useful alternative in some cases.

    performance

    When Pflua hits the sweet spots of the LuaJIT compiler, performance screams.


    (full image, analysis)

    This synthetic benchmark runs over a packet capture of a ping flood between two machines and compares the following pflang implementations:

    1. libpcap: The user-space BPF interpreter from libpcap

    2. linux-bpf: The old Linux kernel-space BPF compiler from 2011. We have adapted this library to work as a loadable user-space module (source)

    3. linux-ebpf: The new Linux kernel-space BPF compiler from 2014, also adapted to user-space (source)

    4. bpf-lua: BPF bytecodes, cross-compiled to Lua by Pflua.

    5. pflua: Pflang compiled directly to Lua by Pflua.

    To benchmark a pflang implementation, we use the implementation to run a set of pflang expressions over saved packet captures. The result is a corresponding set of benchmark scores measured in millions of packets per second (MPPS). The first set of results is thrown away as a warmup. After warmup, the run is repeated 50 times within the same process to get multiple result sets. Each run checks that the filter matches the expected number of packets, to verify that each implementation does the same thing, and also to ensure that the loop is not dead.

    In all cases the same Lua program is used to drive the benchmark. We have tested a native C loop when driving libpcap and gotten similar results, so we consider that the LuaJIT interface to C is not a performance bottleneck. See the pflua-bench project for more on the benchmarking procedure and a more detailed analysis.
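
    As a rough sketch of what that measurement loop looks like (with hypothetical names; the real harness lives in the pflua-bench repository), assume pred is a compiled filter and packets is a list of {ptr, len} records loaded from a capture:

    local function bench(pred, packets, expected_matches, runs)
       local mpps = {}
       for run = 0, runs do                        -- run 0 is the discarded warmup
          local matches = 0
          local start = os.clock()
          for _, p in ipairs(packets) do
             if pred(p.ptr, p.len) then matches = matches + 1 end
          end
          local elapsed = os.clock() - start
          -- checking the match count verifies the implementation and keeps the loop live
          assert(matches == expected_matches, "unexpected match count")
          if run > 0 then mpps[run] = #packets / elapsed / 1e6 end
       end
       return mpps
    end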

    The graph above shows that Pflua can stream in packets from memory and run some simple pflang filters over them at close to the memory bandwidth of this machine (100 Gbit/s). Because all of the filters are actually faster than the accept-all case, probably due to work causing prefetching, we actually don't know how fast the filters themselves can run. In any case, in this ideal situation, we're running at a handful of nanoseconds per packet. Good times!


    (full image, analysis)

    It's impossible to make real-world tests right now, especially since we're running over packet captures and not within a network switch. However, we can get more realistic. In the above test, we run a few filters over a packet capture from wingolog.org, which mostly operates as a web server. Here we see again that Pflua beats all of the competition. Oddly, the new Linux JIT appears to fare marginally worse than the old one. I don't know why that would be.

    Sadly, though, the last tests aren't running at that amazing flat-out speed we were seeing before. I spent days figuring out why that is, and that's part of the subject of my last section here.

    on lua, on luajit

    I implement programming languages for a living. That doesn't mean I know everything there is to know about everything, or that everything I think I know is actually true -- in particular, I was quite ignorant about trace compilers, as I had never worked with one, and I hardly knew anything about Lua at all. With all of those caveats, here are some ignorant first impressions of Lua and LuaJIT.

    LuaJIT has a ridiculously fast startup time. It also compiles really quickly: under a minute. Neither of these should be important but they feel important. Of course, LuaJIT is not written in Lua, so it doesn't have the bootstrap challenges that Guile has; but still, a fast compilation is refreshing.

    LuaJIT's FFI is great. Five stars, would program again.

    As a compilation target, Lua is OK. On the plus side, it has goto and efficient bit operations over 32-bit numbers. However, and this is a huge downer, the result range of bit operations is the signed int32 range, not the unsigned range. This means that bit.band(0xffffffff, x) might be negative. No one in the history of programming has ever wanted this. There are sensible meanings for negative results to bit operations, but only if an argument was negative. Grr. Otherwise, Lua shares the same concerns as other languages whose numbers are defined as 64-bit doubles.
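
    A quick illustration of the problem, assuming LuaJIT's bit module (or the equivalent BitOp library); the modulo trick at the end is just one possible workaround:

    local bit = require("bit")
    print(bit.band(0xffffffff, 0x80000000))        --> -2147483648, not 2147483648
    print(bit.tobit(0xffffffff))                   --> -1
    local function u32(x) return x % 2^32 end      -- normalize back to the unsigned range by hand
    print(u32(bit.band(0xffffffff, 0x80000000)))   --> 2147483648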

    Sometimes people get upset that Lua starts its indexes (in "arrays" or strings) with 1 instead of 0. It's foreign to me, so it's sometimes a challenge, but it can work as well as anything else. The problem comes in when working with the LuaJIT FFI, which starts indexes with 0, leading me to make errors as I forget which kind of object I am working on.
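
    For example, assuming LuaJIT and its ffi module:

    local ffi = require("ffi")
    local t = { 10, 20, 30 }                           -- a plain Lua table
    local a = ffi.new("uint8_t[3]", { 10, 20, 30 })    -- an FFI array
    print(t[1])   --> 10: Lua tables index from 1
    print(a[0])   --> 10: FFI arrays index from 0, like C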

    As a language to implement compilers, Lua desperately misses a pattern matching facility. Otherwise, a number of small gripes but no big ones; tables and closures abound, which leads to relatively terse code.

    Finally, how well does trace compilation work for this task? I offer the following graph.


    (full image, analysis)

    Here the tests are paired. The first test of a pair, for example the leftmost portrange 0-6000, will match most packets. The second test of a pair, for example the second-from-the-left portrange 0-5, will reject all packets. The generated Lua code will be very similar, except for some constants being different. See portrange-0-6000.md for an example.

    The Pflua performance of these filters is very different: the one that matches is slower than the one that doesn't, even though in most cases the non-matching filter will have to do more work. For example, a non-matching filter probably checks both src and dst ports, whereas a successful one might not need to check the dst.

    It hurts to see Pflua's performance fall below that of the Linux JIT compilers, and at times even below libpcap. I scratched my head for a long time about this. The Lua code is fine, and actually looks much like the BPF code. I had taken a look at the generated assembly code for previous traces and it looked fine -- some things were not as good as they should be (e.g. a fair number of conversions between integers and doubles, where these traces have no doubles), but things were OK. What changed?

    Well. I captured the traces for portrange 0-6000 to a file, and dove in. Trace 66 contains the inner loop. It's interesting to see that there's a lot of dynamic checks in the beginning of the trace, although the loop itself is not bad (scroll down to see the word LOOP:), though with the double conversions I mentioned before.

    It seems that trace 66 was captured for a packet whose src port was within range. Later, we end up compiling a second trace if the src port check fails: trace 67. The trace starts off with an absurd amount of loads and dynamic checks -- to a similar degree as trace 66, even though trace 66 dominates trace 67. It seems that there is a big penalty for transferring from one trace to another, even though they are both compiled.

    Finally, once trace 67 is done -- and recall that all it has to do is check the destination port, and then update the counters from the inner loop -- it jumps back to the top of trace 66 instead of the top of the loop, repeating all of the dynamic checks in trace 66! I can only think this is a current deficiency of LuaJIT, and not of trace compilation in general, although the amount of state transfer points to a lack of global analysis that you would get in a method JIT. I'm sure that values are being transferred that are actually dead.

    This explains the good performance for the match-nothing cases: the first trace that gets compiled residualizes the loop expecting that all tests fail, and so only matching cases or variations incur the trace transfer-and-re-loop cost.

    It could be that the Lua code that Pflua residualizes is in some way not idiomatic or not performant; tips in that regard are appreciated.

    conclusion

    I was going to pass some possible slogans by our marketing department, but we don't really have one, so I pass them on to you and you can tell me what you think:

    • "Pflua: A Totally Adequate Pflang Implementation"

    • "Pflua: Sometimes Amazing Performance!!!!1!!"

    • "Pflua: Organic Artisanal Network Packet Filtering"

    Pflua was written by Igalians Diego Pino, Javier Muñoz, and myself for Snabb Gmbh, fine purveyors of high-performance networking solutions. If you are interested in getting Pflua in a Snabb context, we'd be happy to talk; drop a note to the snabb-devel forum. For Pflua in other contexts, file an issue or drop me a mail at wingo@igalia.com. Happy hackings with Pflua, the totally adequate pflang implementation!

    by Andy Wingo at September 02, 2014 10:15 AM

    August 27, 2014

    Jacobo Aragunde

    Speaking in the next LibreOffice conference

    I’m happy to announce that I will be taking part in the 2014 edition of the LibreOffice Conference as a speaker. I’ll give an overview of the status of accessibility in our favorite productivity suite, starting with an introduction to accessibility support and how applications are supposed to implement it; then we will look at the particular case of LibreOffice: which accessibility backends are implemented and how the architecture is designed to support multiple backends while maximizing code reuse.

    The conference program looks hot too, and this time I’m particularly interested in hearing about the success cases that will be presented there, looking for ideas and lessons to apply to new deployments.

    Igalia is one of the sponsors of the conference, taking our commitment to the LibreOffice project a step further. The company will also be sponsoring my flight and stay in Bern.

    Last but not least, it will be great to meet the community members again, and get to know those I haven’t met yet in previous conferences or hackfests. Looking forward to seeing you at Bern!

    Igalia & LibreOffice

    EDIT: get the slides here!

    by Jacobo Aragunde Pérez at August 27, 2014 10:00 AM

    August 25, 2014

    Andy Wingo

    revisiting common subexpression elimination in guile

    A couple years ago I wrote about a common subexpression pass that I implemented in Guile 2.0.

    To recap, Guile 2.0 has a global, interprocedural common subexpression elimination (CSE) pass.

    In the context of compiler optimizations, "global" means that it works across basic block boundaries. Basic blocks are simple, linear segments of code without control-flow joins or branches. Working only within basic blocks is called "local". Working across basic blocks requires some form of understanding of how values can flow within the blocks, for example flow analysis.

    "Interprocedural" means that Guile 2.0's CSE operates across closure boundaries. Guile 2.0's CSE is "context-insensitive", in the sense that any possible effect of a function is considered to occur at all call sites; there are newer CSE passes in the literature that separate effects of different call sites ("context-sensitive"), but that's not a Guile 2.0 thing. Being interprocedural was necessary for Guile 2.0, as its intermediate language could not represent (e.g.) loops directly.

    The conclusion of my previous article was that although CSE could do cool things, in Guile 2.0 it was ultimately limited by the language that it operated on. Because the Tree-IL direct-style intermediate language didn't define order of evaluation, didn't give names to intermediate values, didn't have a way of explicitly representing loops and other kinds of first-order control flow, and couldn't precisely specify effects, the results, well, could have been better.

    I know you all have been waiting for the last 27 months for an update, probably forgoing meaningful social interaction in the meantime because what if I posted a followup while you were gone? Be at ease, fictitious readers, because that day has finally come.

    CSE over CPS

    The upcoming Guile 2.2 has a more expressive language for the optimizer to work on, called continuation-passing style (CPS). CPS explicitly names all intermediate values and control-flow points, and can integrate nested functions into first-order control-flow via "contification". At the same time, the Guile 2.2 virtual machine no longer penalizes named values, which was another weak point of CSE in Guile 2.0. Additionally, the CPS intermediate language enables more fine-grained effects analysis.

    All of these points mean that CSE has the possibility to work better in Guile 2.2 than in Guile 2.0, and indeed it does. The shape of the algorithm is a bit different, though, and I thought some compiler nerds might be interested in the details. I'll follow up in the next section with some things that the new CSE pass can do that the old one couldn't.

    So, by way of comparison, the old CSE pass was a once-through depth-first visit of the nested expression tree. As the visit proceeded, the pass built up an "environment" of available expressions -- for example, that (car a) was evaluated and bound to b, and so on. This environment could be consulted to see if an expression was already present in the environment. If so, the environment would be traversed from most-recently-added to the found expression, to see if any intervening expression invalidated the result. Control-flow joins would cause recomputation of the environment, so that it only held valid values.

    This simple strategy works for nested expressions without complex control-flow. CPS, on the other hand, can have loops and other control flow that Tree-IL cannot express, so for it to build up a set of "available expressions" requires a full-on flow analysis. So that's what the pass does: a flow analysis over the labelled expressions in a function to compute the set of "available expressions" for each label. A labelled expression a is available at label b if a dominates b, and no intervening expression could have invalidated the results. An expression invalidates a result if it may write to a memory location that the result may have read. The code, such as it is, may be found here.

    Once you have the set of available expressions for a function, you can proceed to the elimination phase. First, you start by creating an "eliminated variable" map, which initially maps each variable to itself, and an "equivalent expressions" table, which maps "keys" to a set of labels and bound variables. Then you visit each expression in a function, again in topologically sorted order. For each expression, you compute a "key", which is some unique representation of an expression that can be compared by structural equality. Keys that compare as equal are equivalent, and are subject to elimination.

    For example, consider a call to the add primitive with variables labelled b and c as arguments. Imagine that b maps to a in the eliminated variable table. The expression as a whole would then have a key representation as the list (primcall add a c). If this key is present in the equivalent expression table, you check to see if any of the equivalent labels is available at the current label. If so, hurrah! You mark the outputs of the current label as being replaced by the outputs of the equivalent label. Otherwise you add the key to the equivalent table, associated with the current label.
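
    Here is a rough sketch of that elimination phase, written in Lua purely for illustration (all names are hypothetical; Guile's actual pass is written in Scheme). avail[l] is assumed to be the set of labels available at label l, as computed by the flow analysis:

    local subst = {}   -- eliminated-variable map: variable -> replacement variable
    local equiv = {}   -- structural key -> list of { label = ..., defs = {...} }
    local function resolve(v) return subst[v] or v end
    local function key_of(expr)
       -- an add of b and c, with b already mapped to a, keys as "primcall add a c"
       local parts = { "primcall", expr.op }
       for _, arg in ipairs(expr.args) do parts[#parts + 1] = resolve(arg) end
       return table.concat(parts, " ")
    end
    local function visit(label, expr, avail)
       local key = key_of(expr)
       for _, cand in ipairs(equiv[key] or {}) do
          if avail[label][cand.label] then
             -- an equivalent expression dominates us and has not been clobbered:
             -- redirect our outputs to its outputs instead of re-evaluating
             for i, v in ipairs(expr.defs) do subst[v] = cand.defs[i] end
             return
          end
       end
       equiv[key] = equiv[key] or {}
       table.insert(equiv[key], { label = label, defs = expr.defs })
    end

    With subst filled in, a final rewriting pass replaces each use of an eliminated variable with its substitute.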

    This simple algorithm is enough to recursively eliminate common subexpressions. Sometimes the recursive aspect (i.e. noticing that b should be replaced by a), along with the creation of a common key, causes the technique to be called global value numbering (GVN), but CSE seems a better name to me.

    The algorithm as outlined above eliminates expressions that bind values. However not all expressions do that; some are used as control-flow branches. For this reason, Guile also computes a "truthy table" with another flow analysis pass. This table computes a set of which branches have been taken to get to each program point. In the elimination phase, if a branch is reached that is equivalent to a previously taken branch, we consult the truthy table to see which continuation the previous branch may have taken. If it can be proven to have taken just one of the legs, the test is elided and replaced with a direct jump.

    A few things to note before moving on. First, the "compute an analysis, then transform the function" sequence is quite common in this sort of problem. It leads to some challenges regarding space for the analysis; my last article deals with these in more detail.

    Secondly, the rewriting phase assumes that a value that is available may be substituted, and that the result would be a proper CPS term. This isn't always the case; see the discussion at the end of the article on CSE in Guile 2.0 about CPS, SSA, dominators, and scope. In essence, the scope tree doesn't necessarily reflect the dominator tree, so not all transformations you might like to make are syntactically valid. In Guile 2.2's CSE pass, we work around the issue by concurrently rewriting the scope tree to reflect the dominator tree. It's something I am seeing more and more and it gives me some pause as to the suitability of CPS as an intermediate language.

    Also, consider the clobbering part of analysis, where e.g. an expression that writes a value to memory has to invalidate previously read values. Currently this is implemented by traversing all available expressions. This is suboptimal and could be quadratic in the end. A better solution is to compute a dependency graph for expressions, which links together operations on the same regions of memory; see LLVM's memory dependency analysis for an idea of how to do this.

    Finally, note that this algorithm is global but intraprocedural, meaning that it doesn't propagate values across closure boundaries. It's possible to extend it to be interprocedural, though it's less necessary in the presence of contification.

    scalar replacement via fabricated expressions

    Let's say you get to an expression at label L, (cons a b). It binds a result c. You determine you haven't seen it before, so you add (primcall cons a b) → L, c to your equivalent expressions set. Cool. We won't be able to replace a future instance of (cons a b) with c, because that doesn't preserve object identity of the newly allocated memory, but it's definitely a cool fact, yo.

    What if we add an additional mapping to the table, (car c) → L, a? That way any expression at which L is available would replace (car c) with a, which would be pretty neat. To do so, you would have to add the &read effect to the cons call's effects analysis, but since the cons wasn't really up for elimination anyway it's all good.

    Similarly, for (set-car! c d) we can add a mapping of (car c) → d. Again we have to add the &read effect to the set-car, but that's OK too because the write invalidated previous reads anyway.
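
    Continuing the Lua sketch from the section above (again purely illustrative, with hypothetical labels and variable names), the fabricated entries amount to seeding the equivalence table by hand:

    local equiv = {}   -- stands in for the CSE equivalence table from the earlier sketch
    -- after visiting c = (cons a b) at label L:
    equiv["primcall cons a b"] = { { label = "L", defs = { "c" } } }   -- the ordinary entry
    equiv["primcall car c"]    = { { label = "L", defs = { "a" } } }   -- fabricated: (car c) is a
    equiv["primcall cdr c"]    = { { label = "L", defs = { "b" } } }   -- fabricated: (cdr c) is b
    -- after (set-car! c d) at label M, the write clobbers and re-seeds the car entry:
    equiv["primcall car c"]    = { { label = "M", defs = { "d" } } }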

    The same sort of transformation holds for other kinds of memory that Guile knows how to allocate and mutate. Taken together, they form a sort of store-to-load forwarding and scalar replacement that can entirely eliminate certain allocations, and many accesses as well. To actually eliminate the allocations requires a bit more work, but that will be the subject of the next article.

    future work

    So, that's CSE in Guile 2.2. It works pretty well. In the future I think it's probably worth considering an abstract heap-style analysis of effects; in the end, the precision of CSE is limited to how precisely we can model the effects of expressions.

    The trick of using CSE to implement scalar replacement is something I haven't seen elsewhere, though I doubt that it is novel. Fully removing the intermediate allocations needs a couple more tricks, which I will write about in my next nargy dispatch. Until then, happy hacking!

    by Andy Wingo at August 25, 2014 09:48 AM

    August 18, 2014

    Andy Wingo

    on gnu and on hackers

    Greetings, gentle hackfolk. 'Tis a lovely waning light as I write this here in Munich, Munich the green, Munich full of dogs and bikes, Munich the summer-fresh.

    Last weekend was the GNU hackers meeting up in Garching, a village a few metro stops north of town. Garching is full of quiet backs and fruit trees and small gardens bursting with blooms and beans, as if an eddy of Christopher Alexander whirled out and settled into this unlikely place. My French suburb could learn a thing or ten. We walked back from the hack each day, ate stolen apples and corn, and schemed the nights away.

    The program of GHM this year was great. It started off with a bang, as GNUnet hackers Julian Kirsch and Christian Grothoff broke the story that the Five-Eyes countries (US, UK, Canada, Australia, NZ) regularly port-scan the entire internet, looking for vulnerabilities. They then proceed to exploit those vulnerabilities, in regular hack-a-thons, trying to own as many boxes in as many countries as they can. They then use them as launchpads for attacks and for exfiltration of information from other networks.

    The presentation that broke this news also proposed a workaround based on port-knocking, Knock. Knock embeds the hash of a pre-shared key with some other information into the 32-bit initial sequence number of a TCP connection. Unlike previous incarnations of port-knocking, Knock also authenticates the first n payload bytes, so that the connection isn't vulnerable to hijacking (e.g. via GCHQ "quantum injection", where well-placed evil routers race the true destination server to provide the first response packet of a connection). Leaking the pwn-the-internet documents with Laura Poitras at the same time as the Knock unveiling was a pretty slick move!

    I was also really impressed by Christian's presentation on the GNU name system. GNS is a replacement for DNS whose naming structure mirrors our social naming structure. For example, www.alice.gnu would be my friend Alice, and www.alice.bob.gnu would be Alice's friend Bob. With some integration, it can work on normal desktops and mobile devices. There are lots more details, so check gnunet.org/gns for more information.

    Of course, a new naming system does need some operating system support. In this regard Ludovic Courtès' update on Guix was particularly impressive. Guix is a Nix-like system whose goal is reproducible, user-controlled GNU/Linux systems. A couple years ago I didn't think much of it, but now it's actually booting on raw hardware, not just under virtualization, and things seem to be rolling forth as if on rails. Guix manages to be technically innovative at the same time as being GNU-centered, so it can play a unique role in propagating GNU work like GNS.

    and yet.

    But now, as the dark clouds race above and the light truly goes, we arrive to the point I really wanted to discuss. GNU has a terrible problem with gender balance, and with diversity in general. Of about 70 attendees at this recent GHM, only two were women. We talk the talk about empowering users and working for freedom but, to a first approximation, it's really just a bunch of dudes that think the exact same things.

    There are many reasons for this, of course. Some people like to focus on what's called the "pipeline problem" -- that there aren't as many women coming out of computer science programs as men. While true, the proportion of women CS graduates is much higher than the proportion of women at GHM events, so something must be happening in between. And indeed, the attrition rates of women in the tech industry are higher than that of men -- often because we men make it a needlessly unpleasant place for women to be. Sometimes it's even dangerous. The incidence of sexual harassment and assault in tech, especially at events, is something terrible. Scroll down in that linked page to June, July, and August 2014, and ask yourself whether that's OK. (Hint: hell no.)

    And so you would think that people who consider themselves to be working for some abstract liberatory principle, as GNU is, would be happy to take a stand against this kind of asshaberdashery. There you would be wrong. Voilà a timeline of an incident.

    timeline

    March 2014
    Someone at the FSF asks a GHM organizer to add an anti-harassment policy to GHM. The organizer does so and puts one on the web page, copying the text from Libreplanet's web site. The policy posted is:

    Offensive or overly explicit sexual language or imagery is inappropriate during the event, including presentations.

    Participants violating these rules may be sanctioned or expelled from the meeting at the discretion of the organizers.

    Harassment includes offensive comments related to gender, sexual orientation, disability, appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, harassing photography or recording, persistent disruption of talks or other events, repeated unsolicited physical contact, or sexual attention.

    Monday, 11 August 2014
    The first mention of the policy is made on the mailing list, in a mail with details about the event that also includes the line:

    An anti-harrasment policy applies at GHM: http://gnu.org/ghm/policy.html

    Monday, 11 August 2014
    A speaker writes the list to say:

    Since I do not desire to be denounced, prosecuted and finally sanctioned or expelled from the event (especially considering the physical pain and inconvenience of attending due to my very recent accident) I withdraw my intention to lecture "Introducing GNU Posh" at the GHM, as it is not compliant with the policy described in the page above.

    Please remove the talk from the official schedule. Thanks.

    PS: for those interested, I may perform the talk off-event in case we find a suitable place, we will see..

    The resulting thread goes totes clownshoes and rages on up until the event itself.
    Friday, 15 August 2014
    Sheepish looks between people that flamed each other over the internet. Hallway-track discussion starts up quickly though. Disagreeing people are not rational enough to have a conversation though (myself included).
    Saturday, 16 August 2014
    In the morning and lunch break, people start to really discuss the issues (spontaneously). It turns out that the original mail that sparked the thread was based, to an extent, on a misunderstanding: that "offensive or overly explicit sexual language or imagery" was parsed (by a few non-native English speakers) as "offensive language or ...", which people thought was too broad. Once this misunderstanding was removed, there were still people that thought that any policy at all was unneeded, and others that were concerned that someone could say something without intending offense, but then be kicked out of the event. Arguments back and forth. Some people wonder why others can be hurt by "just words". Some discussion of rape culture as continuum between physical violence and cultural tropes. One of the presentations after lunch is by a GNU hacker. He starts his talk by stating his hope that he won't be seen as "offensive or part of rape culture or something". His microphone wasn't on, so once he gets it on he repeats the joke. I stomp out, slam the door, and tweet a few angry things. Later in the evening the presenter and I discuss the issue. He apologizes to me.
    Sunday, 17 August 2014
    A closed meeting for GNU maintainers to round up the state of GNU and GHM. No women present. After dealing with a number of banalities, we finally broach the topic of the harassment policy. More opposition to the policy Sunday than Saturday lunch. Eventually a proposal is made to replace "offensive" with "disparaging", and people start to consent to that. We run out of time and have to leave; resolution unclear.
    Monday, 18 August 2014
    GHM organizer updates the policy to remove the words "offensive or" from the beginning of the harassment policy.

    I think anyone involved would agree on this timeline.

    thoughts

    The problems seen over the last week with this anti-harassment policy are entirely to do with the men. It was a man who decided that he should withdraw his presentation because he found it offensive that he could be perceived as offensive. It was men who willfully misread the policy, comparing it to such examples as "I should have the right to offend Microsoft supporters", "if I say the wrong word I will go to jail", and who raised the ignorant, vacuous spectre of "political correctness" to argue that they should be able to say what they want, in a GNU conference, no matter who they hurt, no matter what the effects. That they are able to argue this position from a status-quo perspective is the definition of privilege.

    Now, there is ignorance, and there is malice. Both must be opposed, but the former may find a cure. Although I didn't begin my contribution to the discussion in the smoothest way, linking to an amusing article on the alchemy of intent that is probably misunderstood, it ended up that one of the main points was about intent. I know Ralph (say) and Ralph is a great person and so how could it be that anything Ralph would say could be a slur? You know he wouldn't mean it like that!

    To that, we of course have to say that as GNU grows, not everyone knows that Ralph is a great person. In the end what would it mean for someone to be anti-racist but who says racist things all the time? You would have to call them racist, right? Or if you just said something one time, but refused to own up to your error, and instead persisted in repeating a really racist slur -- you would be racist right? But I know you... But the thing that you said...

    But then to be honest I wonder sometimes. If someone repeats a joke trivializing rape culture, after making sure that the microphone is picking up his words -- I mean, that's a misogynist action, right? Put aside the question of whether the person is, in essence, misogynist or not. They are doing misogynist things. How do I know that this person isn't going to do it again, private apology or not?

    And how do I know that this community isn't going to permit it again? That remark was made to a room of 40 dudes or so. Not one woman was present. Although there was some discussion afterwards, if people left because of the phrase, it was only two or three. How can we then say that GNU is not a misogynist community -- is not a community that tolerates misogyny?

    Given all of this, what do you expect? Do you expect to grow GNU into a larger organization in the future, rich and strong and diverse? If that's not your goal, you are not my colleague. And if it is your goal, why do you put up with this kind of behavior?

    The discussion on intent and offense seems to have had its effect in the removal of "offensive or" from the anti-harassment policy language. I think it's terrible, though -- if you don't trust someone who says they were offended by sexual language or imagery, why would you trust them when they report sexual harassment or assault? I can only imagine this leading to some sort of argument where the person who has had the courage to report such an incident finds himself or herself in a public witness box, justifying that the incident was offensive. "I'm sorry my words offended you, but that was not my intent, and anyway the words were not offensive." Lolnope.

    There were so many other wrong things about this -- a suggestion that we the GNU cabal (lamentably, a bunch of dudes) should form a committee to make the wording less threatening to us; that we're just friends anyway; that illegal things are illegal anyway... it's as if the Code of Conduct FAQ that Ashe Dryden assembled were a Bingo card and we all lost.

    Finally I don't think I trusted the organizers enough with this policy. Both organizers expressed skepticism about the policy in such terms that if I personally hadn't won the privilege lottery (white male "western" hetero already-established GNU maintainer) I wouldn't feel comfortable bringing up a concern to them.

    In the future I will not be attending any conferences without strong, consciously applied codes of conduct, and I enjoin you to do the same.

    conclusion


    Propagandhi, "Refusing to Be a Man", Less Talk, More Rock (1996)

    There is no conclusion yet -- working for the better world we all know is possible is a process, as people and as a community.

    To outsiders, to outsiders everywhere, please keep up the vocal criticisms. Thank you for telling your story.

    To insiders, to insiders everywhere, this is your problem. The problem is you. Own it.

    by Andy Wingo at August 18, 2014 09:07 PM

    August 08, 2014

    Iago Toral

    Diving into Mesa

    Recap

    In my last post I gave a quick introduction to the Linux graphics stack. There I explained how what we call a graphics driver in Linux is actually a combination of three different drivers:

    • the user space X server DDX driver, which handles 2D graphics.
    • the user space 3D OpenGL driver, that can be provided by Mesa.
    • the kernel space DRM driver.

    Now that we know where Mesa fits let’s have a more detailed look into it.

    DRI drivers and non-DRI drivers

    As explained, Mesa handles 3D graphics by providing an implementation of the OpenGL API. Mesa OpenGL drivers are usually called DRI drivers too. Remember that, after all, the DRI architecture was brought to life precisely to enable efficient implementation of OpenGL drivers in Linux and, as I introduced in my previous post, DRI/DRM are the building blocks of the OpenGL drivers in Mesa.

    There are other implementations of the OpenGL API available too. Hardware vendors that provide drivers for Linux will provide their own implementation of the OpenGL API, usually in the form of a binary blob. For example, if you have an NVIDIA GPU and install NVIDIA’s proprietary driver this will install its own libGL.so.

    Notice that it is possible to create graphics drivers that do not follow the DRI architecture in Linux. For example, the NVIDIA proprietary driver installs a Kernel module that implements similar functionality to DRM but with a different API that has been designed by NVIDIA, and obviously, their corresponding user space drivers (DDX and OpenGL) will use this API instead of DRM to communicate with the NVIDIA kernel space driver.

    Mesa, the framework

    You have probably noticed that when I talk about Mesa I usually say ‘drivers’, in plural. That is because Mesa itself is not really a driver, but a project that hosts multiple drivers (that is, multiple implementations of the OpenGL API).

    Indeed, Mesa is best seen as a framework for OpenGL implementors that provides abstractions and code that can be shared by multiple drivers. Obviously, there are many aspects of an OpenGL implementation that are independent of the underlying hardware, so these can be abstracted and reused.

    For example, if you are familiar with OpenGL you know it provides a state-based API. This means that many API calls do not have an immediate effect; they only modify the values of certain variables in the driver but do not require pushing these new values to the hardware immediately. Indeed, usually that will happen later, when we actually render something by calling glDrawArrays() or a similar API: it is at that point that the driver will configure the 3D pipeline for rendering according to all the state that has been set by the previous API calls. Since these APIs do not interact with the hardware, their implementation can be shared by multiple drivers, and then each driver, in its implementation of glDrawArrays(), can fetch the values stored in this state and translate them into something meaningful for the hardware at hand.

    As such, Mesa provides abstractions for many things and even complete implementations for multiple OpenGL APIs that do not require interaction with the hardware, at least not immediate interaction.

    Mesa also defines hooks for the parts where drivers may need to do hardware specific stuff, for example in the implementation of glDrawArrays().

    Looking into glDrawArrays()

    Let’s see an example of these hooks into a hardware driver by inspecting the stacktrace produced from a call to glDrawArrays() inside Mesa. In this case, I am using the Mesa Intel DRI driver and I am calling glDrawArrays() from a function named render() in my program. This is the relevant part of the stacktrace:

    brw_upload_state () at brw_state_upload.c:651
    brw_try_draw_prims () at brw_draw.c:483
    brw_draw_prims () at brw_draw.c:578
    vbo_draw_arrays () at vbo/vbo_exec_array.c:667
    vbo_exec_DrawArrays () at vbo/vbo_exec_array.c:819
    render () at main.cpp:363
    

    Notice that glDrawArrays() is actually vbo_exec_DrawArrays(). What is interesting about this stack is that vbo_exec_DrawArrays() and vbo_draw_arrays() are hardware independent and reused by many drivers inside Mesa. If you don’t have an Intel GPU like me but are also using a Mesa driver, your backtrace should be similar. These generic functions would usually do things like checking for API usage errors, reformatting inputs in a way that is more appropriate for later processing, or fetching additional information from the current state that will be needed to implement the actual operation in the hardware.

    At some point, however, we need to do the actual rendering, which involves configuring the hardware pipeline according to the command we are issuing and the relevant state we have set in prior API calls. In the stacktrace above this starts with brw_draw_prims(). This function is part of the Intel DRI driver; it is the hook where the Intel driver does the work required to configure the Intel GPU for drawing and, as you can see, it will later call something named brw_upload_state(), which will upload a bunch of state to the hardware to do exactly this, like configuring the various shader stages required by the current program, etc.

    Registering driver hooks

    In future posts we will discuss how the driver configures the pipeline in more detail, but for now let’s just see how the Intel driver registers its hook for the glDrawArrays() call. If we look at the stacktrace, and knowing that brw_draw_prims() is the hook into the Intel driver, we can just inspect how it is called from vbo_draw_arrays():

    static void
    vbo_draw_arrays(struct gl_context *ctx, GLenum mode, GLint start,
                    GLsizei count, GLuint numInstances, GLuint baseInstance)
    {
       struct vbo_context *vbo = vbo_context(ctx);
       (...)
       vbo->draw_prims(ctx, prim, 1, NULL, GL_TRUE, start, start + count - 1,
                       NULL, NULL);
       (...)
    }
    

    So the hook is draw_prims() inside vbo_context. Doing some trivial searches in the source code we can see that this hook is set up in brw_draw_init() like this:

    void brw_draw_init( struct brw_context *brw )
    {
       struct gl_context *ctx = &brw->ctx;
       struct vbo_context *vbo = vbo_context(ctx);
       (...)
       /* Register our drawing function:
        */
       vbo->draw_prims = brw_draw_prims;
       (...)
    }
    

    Let’s put a breakpoint there and see when Mesa calls into that:

    brw_draw_init () at brw_draw.c:583
    brwCreateContext () at brw_context.c:767
    driCreateContextAttribs () at dri_util.c:435
    dri2_create_context_attribs () at dri2_glx.c:318
    glXCreateContextAttribsARB () at create_context.c:78
    setupOpenGLContext () at main.cpp:411
    init () at main.cpp:419
    main () at main.cpp:477
    

    So there it is: Mesa (unsurprisingly) calls into the Intel DRI driver when we set up the OpenGL context, and that is when the driver registers its various hooks, including the one for drawing primitives.

    We could do a similar thing to see how the driver registers its hook for context creation. We would see that the Intel driver (as well as other drivers in Mesa) assigns the hooks it needs to a global variable like this:

    static const struct __DriverAPIRec brw_driver_api = {
       .InitScreen           = intelInitScreen2,
       .DestroyScreen        = intelDestroyScreen,
       .CreateContext        = brwCreateContext,
       .DestroyContext       = intelDestroyContext,
       .CreateBuffer         = intelCreateBuffer,
       .DestroyBuffer        = intelDestroyBuffer,
       .MakeCurrent          = intelMakeCurrent,
       .UnbindContext        = intelUnbindContext,
       .AllocateBuffer       = intelAllocateBuffer,
       .ReleaseBuffer        = intelReleaseBuffer
    };
    
    PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
    {
       globalDriverAPI = &brw_driver_api;
    
       return brw_driver_extensions;
    }
    

    This global is then used throughout the DRI implementation in Mesa to call into the hardware driver as needed.

    We can see that there are two types of hooks, then: the ones needed to link the driver into the DRI implementation (which are the main entry points of the driver in Mesa), and the hooks the driver adds for tasks related to the hardware implementation of OpenGL bits, typically registered at context creation time.

    In order to write a new DRI driver one would only have to write implementations for all these hooks; the rest is already implemented in Mesa and reused across multiple drivers.

    Gallium3D, a framework inside a framework

    Currently, we can split Mesa DRI drivers in two kinds: the classic drivers (not based on the Gallium3D framework) and the new Gallium drivers.

    Gallium3D is part of Mesa and attempts to make 3D driver development easier and more practical than it was before. For example, classic Mesa drivers are tightly coupled with OpenGL, which means that implementing support for other APIs (like Direct3D) would pretty much require writing a completely new implementation/driver. This is addressed by the Gallium3D framework by providing an API that exposes hardware functions as present in modern GPUs rather than focusing on a specific API like OpenGL.

    Other benefits of Gallium include, for example, support for various Operating Systems by separating the part of the driver that relies on specific aspects of the underlying OS.

    In recent years we have seen a lot of drivers moving to the Gallium infrastructure, including nouveau (the open source driver for NVIDIA GPUs), various radeon drivers, some software drivers (swrast, llvmpipe) and more.


    Gallium3D driver model (image via wikipedia)

    Although there were some efforts to port the Intel driver to Gallium in the past, development of the Intel Gallium drivers (i915g and i965g) is stalled now as far as I know. Intel is focusing on the classic version of the drivers instead. This is probably because it would take a large amount of time and effort to bring the current classic driver to Gallium with the same features and stability that it has in its current classic form for many generations of Intel GPUs. Also, there is a lot of work going on to add support for new OpenGL features to the driver at the moment, which seems to be the priority right now.

    Gallium and LLVM

    As we will see in more detail in future posts, writing a modern GPU driver involves a lot of native code generation and optimization. Also, OpenGL includes the OpenGL Shading Language (GLSL), which directly requires having a GLSL compiler available in the driver too.

    It is no wonder then that Mesa developers thought that it would make sense to reuse existing compiler infrastructure rather than building and using their own: enter LLVM.

    By introducing LLVM into the mix, Mesa developers expect to bring new and better optimizations to shaders and produce better native code, which is critical to performance.

    This would also make it possible to eliminate a lot of code from Mesa and/or the drivers. Indeed, Mesa has its own complete implementation of a GLSL compiler, which includes a GLSL parser, compiler and linker as well as a number of optimizations, both for abstract representations of the code, in Mesa, and for the actual native code for a specific GPU, in the actual hardware driver.

    The way that Gallium plugs in LLVM is simple: Mesa parses GLSL and produces an LLVM intermediate representation of the shader code that it can then pass to LLVM, which will take care of the optimization. The role of hardware drivers in this scenario is limited to providing LLVM backends that describe their respective GPUs (instruction set, registers, constraints, etc.) so that LLVM knows how it can do its work for the target GPU.

    Hardware and Software drivers

    Even today I see people who believe that Mesa is just a software implementation of OpenGL. If you have read my posts so far it should be clear that this is not true: Mesa provides multiple implementations (drivers) of OpenGL, most of these are hardware accelerated drivers but Mesa also provides software drivers.

    Software drivers are useful for various reasons:

    • For development and testing purposes, when you want to take the hardware out of the equation. From this point of view, a software implementation can provide a reference for expected behavior that is not tied or constrained by any particular hardware. For example, if you have an OpenGL program that does not work correctly we can run it with the software driver: if it works fine then we know the problem is in the hardware driver, otherwise we can suspect that the problem is in the application itself.
    • To allow execution of OpenGL in systems that lack 3D hardware drivers. It would obviously be slow, but in some scenarios it could be sufficient and it is definitely better than not having any 3D support at all.

    I initially intended to cover more stuff in this post, but it is already getting long enough so let’s stop here for now. In the next post we will discuss how we can check and change the driver in use by Mesa, for example to switch between a software and hardware driver, and we will then start looking into Mesa’s source code and introduce its main modules.

    by Iago Toral at August 08, 2014 10:31 AM

    August 06, 2014

    Carlos García Campos

    GTK+ 3 Plugins in WebKitGTK+ and Evince Browser Plugin

    GTK+ 3 plugins in WebKitGTK+

    The WebKit2 GTK+ API has always been GTK+ 3 only, but WebKitGTK+ still had a hard dependency on GTK+ 2 because of the plugin process. Some popular browser plugins like flash or Java use GTK+ 2 unconditionally (and it seems they are not going to be ported to GTK+ 3, at least not in the short term). These plugins stopped working in Epiphany when it switched to GTK+ 3 and started to work again when Epiphany moved to WebKit2.

    To support GTK+ 2 plugins we had to build the plugin process with GTK+ 2, but also some parts of WebCore and WebKit2 (the ones depending on GTK+ and used by the plugin process) were built twice. As a result we had a WebKitPluginProcess binary of ~40MB, that was always used for all the plugins. This kind of made sense, since there were no plugins using GTK+ 3, and the GTK+ 2 dependency was harmless for plugins not using GTK+ at all. However, we realized we were making a rule for the exception, since most of the plugins don’t even use GTK+, and there weren’t plugins using GTK+ 3 because they were not supported by any browser (kind of chicken-egg problem).

    Since WebKitGTK+ 2.5.1 we have two binaries for the plugin process: WebKitPluginProcess2, which is exactly the same 40MB binary using GTK+ 2 that we have always had, but that now is only used to load plugins using GTK+ 2; and WebKitPluginProcess, a 7.4K binary that is now used by default for everything except loading plugins that use GTK+ 2. And since it links to GTK+ 3, it might load plugins using GTK+ 3 as well. Another side effect is that now we can make GTK+ 2 optional: in that case WebKitPluginProcess2 wouldn’t be built and only plugins using GTK+ 2 wouldn’t be supported.

    Evince Browser Plugin

    For a long time, we have maintained that PDF documents shouldn’t be opened inside the browser, but downloaded and then opened by the default document viewer. But then the GNOME design team came up with new mockups for Epiphany where everything was integrated in the browser, including PDF documents. It’s something all the major browsers do nowadays, using different approaches though (a custom PDF plugin inside the web engine, JavaScript libraries, etc.).

    At the WebKitGTK+ hackfest in 2012 we started to think about how to implement the integrated document reading in Epiphany based on the design mockups. We quickly discarded the idea of implementing it as a NPAPI plugin, because that would mean we had to use a very old evince version using GTK+ 2. We can’t implement it inside WebKit using libevince because it’s a GPL library, so the first approach was to implement it inside Epiphany using libevince. I wrote a first patch, mostly a proof of concept hack, that added a new view widget based on EvView to be used instead of a WebView when a document supported by evince was requested. This approach has a lot of limitations, since it only works when the main resource is a document, but not for documents embedded in an HTML page or an iframe, and it has a lot of integration problems that make it quite difficult to maintain inside Epiphany. All of these issues would be solved by implementing it as a NPAPI plugin, and it wouldn’t require any change in Epiphany. Now that WebKitGTK+ supports GTK+ 3 plugins, there’s no reason not to do so.

    Epiphany Evince Plugin

    Thanks to a project in Igalia I’ve been able to work on it, and today I’ve landed an initial implementation of the browser plugin to Evince git master. It’s only a first implementation (written in C++ 11) with the basic features (page navigation, view modes, zoom and printing), and a very simple UI that needs to be updated to match the mockups. It can be disabled at compile time like all other frontends inside Evince (thumbnailer, previewer, nautilus properties page).

    Epiphany embedded PDF document Epiphany standalone PDF document

    Another advantage of being a NPAPI plugin is that it’s scriptable so that you can control the viewer using JavaScript.

    Epiphany scriptable PDF

    And you can pass initial parameters (like current page, zoom level, view mode, etc.) from the HTML tag.

    <object data="test.pdf" type="application/pdf" width="600" height="300" 
                    currentPage="2" zoomMode="fit-page" continuous="false">
      The pdf could not be rendered.
    </object>

    You can even hide the default toolbar and build your own one using HTML and JavaScript.

    by carlos garcia campos at August 06, 2014 10:45 AM

    August 05, 2014

    Víctor Jáquez

    GUADEC 2014

    Last Friday, the 25th of July, National Day of Galicia, started very early because I had to travel to Strasbourg, official seat of the European Parliament, not for any political duty, but for the GNOME Users and Developers European Conference, the GUADEC!

    My last GUADEC was in The Hague, in 2010, though in 2012, when it was hosted in Coruña, I attended a couple of talks. Nonetheless, it had been a long time since I had met the community, and it was a pleasure for me to meet them again.

    My biggest impression was the number of attendees. I remember the times in Turkey or in Gran Canaria where hundreds packed the auditoriums and halls. This time the audience was smaller, but that is a good thing, since now you can easily get in touch with the core developers who drive and move the project.

    We, Igalia, as sponsors, had a banner in the main room and a table in a corridor. Here is a picture of Juan to prove it:

    Juan at the Igalia booth.

    I also ran into Emmanuele Bassi, who was setting up a booth to show off the Endless Mobile OS, based on GNOME 3. The people at GUADEC welcomed the user experience it provides and the purpose of the project with enthusiasm. Personally, I love it. If you don’t know the project, you should visit their web site.

    The first talk I attended was the classic GStreamer update by Sebastian Dröge and Tim Müller. They talked about the new features in GStreamer 1.4. Neat stuff in there. I like the new pace of GStreamer, compared to the old, stagnant evolution of the 0.10 series.

    Afterwards, Jim Hall gave us a keynote about Usability in GNOME. I really enjoyed that talk. He studied the usability of several GNOME applications such as Nautilus (aka Files), GEdit, Epiphany (aka Web), etc., as part of his Master’s research. It was a pleasure to hear that Epiphany is regarded as having good usability.

    After lunch I was in the main room hearing Sylvain Le Bon about sustainable business models for free software. He talked about crowd funding, community management and related stuff.

    The next talk was by Christian Hergert, about his project GOM, an object mapper from GObjects to SQLite, which is used in Grilo to prevent SQL injection by some plugins that use SQLite.

    Later on, Marina Zhurakhinskaya gave us one of the best talks of the GUADEC: How to be an ally to women in tech. I encourage you to download the slides and read them. There I learned about the unicorn law and the impostor syndrome.

    The day closed with the GNOME Foundation’s teams reports.

    Sunday came and I arrived at the venue for the second keynote: Should We Teach The Robot To Kill by Nathan Willis. In his particular style, Nathan presented a general survey of GNU/Linux in the automotive industry.

    Next came one of the main talks from Igalia: Web 3.12: a browser to make us proud, presented by Edu. It was fairly good. Edu showed us the latest developments in WebKitGTK+ and Epiphany (aka Web). There were quite a few questions at the end of the talk. Epiphany nowadays is actively used by a lot of people in the community.

    Afterwards, Zeeshan presented GNOME Boxes, a user interface for running virtual machines. Later on, Alberto Ruiz showed us Fleet Commander, a web application to handle large desktop deployments.

    And we took our classic group photo:

    Group photo

    That Sunday closed with the interns’ lightning talks. Cool stuff is being cooked by them.

    On Monday I was in the venue when Emmanuele Bassi talked to us about GSK, the GTK+ Scene Graph Kit, his new project, using as a starting point the lessons learned in Clutter. Its objective is to have a scene graph library fully integrated into GTK+.

    After the lunch and the second part of the Foundation’s Annual General Meeting, Benjamin Otte gave an amusing talk about the CSS implementation in GTK+. Later, Jasper St. Pierre talked about the Wayland support in GNOME.

    When the coffee break ended, the almighty Žan Doberšek gave the other talk from Igalia: Wayland support in WebKit2GTK+.

    On the last day of the GUADEC, I attended Bastien Nocera’s talk: Hardware integration, the GNOME way, where he reviewed the history of his contributions to GNOME related to hardware integration and the goal of nicely supporting most of the hardware in GNOME, like compasses, gyroscopes, et cetera.

    Afterwards, Owen Taylor talked to us about GNOME’s continuous integration performance testing, in order to know exactly why one release of GNOME is faster or slower than the last.

    And the third keynote came: Matthew Garrett talked to us about his experiences with the GNOME community and his vision about where it should go: to enhance the privacy and security of the users, something that many GNOMErs are excited about, such as Federico Mena.

    Later on, David King talked about his plans for Cheese, the webcam application, turning it into a DBus service, using the current development of kdbus to sandbox the interaction with the hardware.

    Afterwards Christian Hergert talked to us about his plans for Builder, a new IDE for GNOME. Promising stuff, but we will see how it goes. Christian said that he is going to spend a full year working on this project.

    The GUADEC ended with the lightning talks, where I enjoyed one about the problems around the current encryption and security tools.

    Finally, the next GUADEC host was unveiled: the Sweden Conspiracy: Gothenburg!

    by vjaquez at August 05, 2014 11:46 AM